LightLLM is a Python-based LLM (Large Language Model) inference and serving framework.

[English Docs](https://lightllm-en.readthedocs.io/en/latest/) | [中文文档](https://lightllm-cn.readthedocs.io/en/latest/) | [Blogs](https://modeltc.github.io/lightllm-blog/)

## Features

- Tri-process asynchronous collaboration: tokenization, model inference, and detokenization are performed asynchronously, leading to a considerable improvement in GPU utilization.
- Nopad (Unpad): offers support for nopad attention operations across multiple models to efficiently handle requests with large length disparities.
- Dynamic Batch: enables dynamic batch scheduling of requests.
- [FlashAttention](https://github.com/Dao-AILab/flash-attention): incorporates FlashAttention to improve speed and reduce GPU memory footprint during inference.
- Tensor Parallelism: utilizes tensor parallelism over multiple GPUs for faster inference.
- [Token Attention](./docs/TokenAttention.md): implements a token-wise KV cache memory management mechanism, allowing for zero memory waste during inference.
- High-performance Router: collaborates with Token Attention to meticulously manage the GPU memory of each token, thereby optimizing system throughput.
- Int8KV Cache: nearly doubles the token capacity of the KV cache. Currently supported for LLaMA models only.

## Supported Model List

The following table lists the supported models, along with any special launch arguments they require and related notes; a hedged launch example follows the table.

| Model Name | Comments |
|--------------------------------|-------------------------------------------------------------------------------------------------------|
| [BLOOM](https://huggingface.co/bigscience/bloom) | None |
| [LLaMA](https://github.com/facebookresearch/llama) | None |
| [LLaMA V2](https://huggingface.co/meta-llama) | None |
| [StarCoder](https://github.com/bigcode-project/starcoder) | None |
| [Qwen-7b](https://github.com/QwenLM/Qwen-7B) | `--eos_id 151643 --trust_remote_code` |
| [ChatGLM2-6b](https://github.com/THUDM/ChatGLM2-6B) | `--trust_remote_code` |
| [InternLM-7b](https://github.com/InternLM/InternLM) | `--trust_remote_code` |
| [InternVL-Chat](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5) | `--eos_id 32007 --trust_remote_code` (Phi3) or `--eos_id 92542 --trust_remote_code` (InternLM2) |
| [Qwen-VL](https://huggingface.co/Qwen/Qwen-VL) | None |
| [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) | None |
| [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) | `--eos_id 151645 --trust_remote_code`, and run `pip install git+https://github.com/huggingface/transformers` |
| [Llava-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) | None |
| [Llava-13b](https://huggingface.co/liuhaotian/llava-v1.5-13b) | None |
| [Mixtral](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) | None |
| [Stablelm](https://huggingface.co/stabilityai/stablelm-2-1_6b) | `--trust_remote_code` |
| [MiniCPM](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) | None |
| [Phi-3](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3) | Only supports Mini and Small |
| [CohereForAI](https://huggingface.co/CohereForAI/c4ai-command-r-plus) | None |
| [DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite) | `--data_type bfloat16` |
| [DeepSeek-V2](https://huggingface.co/deepseek-ai/DeepSeek-V2) | `--data_type bfloat16` |
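
The special arguments above are passed to the server launch command. As a minimal sketch (not an official recipe), launching Qwen-7b with its flags from the table might look like the following, assuming the weights are already downloaded locally (`/path/to/Qwen-7B` is a placeholder) and using the `lightllm.server.api_server` entry point and common flags (`--model_dir`, `--host`, `--port`, `--tp`) described in the documentation:

```shell
# Hedged sketch: launch LightLLM for Qwen-7b with the special arguments from the table.
# /path/to/Qwen-7B is a placeholder for a local weights directory.
python -m lightllm.server.api_server \
    --model_dir /path/to/Qwen-7B \
    --host 0.0.0.0 \
    --port 8080 \
    --tp 1 \
    --eos_id 151643 \
    --trust_remote_code
```

Other models in the table follow the same pattern, swapping in the arguments from their Comments column.
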
## News
- [2025/02] 🔥 LightLLM v1.0.0 release, achieving the **fastest DeepSeek-R1** serving performance on a single H200 machine.

## Get started

### Installation

Use LightLLM with Docker. First, pull the image:

```shell
docker pull ghcr.io/modeltc/lightllm:main
```

To start a container with GPU support and port mapping:

```shell
docker run -it --gpus all -p 8080:8080 \
--shm-size 1g -v your_local_path:/data/ \
ghcr.io/modeltc/lightllm:main /bin/bash
```


Note: If multiple GPUs are used, `--shm-size` in the `docker run` command should be increased, as sketched below.
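
As a quick sketch of that note, the only change is a larger `--shm-size` value (the `16g` below is purely illustrative, not a recommendation):

```shell
# Hedged sketch for multi-GPU use: same command as above with more shared memory.
# The 16g value is only an example; size it to your setup.
docker run -it --gpus all -p 8080:8080 \
    --shm-size 16g -v your_local_path:/data/ \
    ghcr.io/modeltc/lightllm:main /bin/bash
```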


Alternatively, you can [build the docker image](https://lightllm-en.readthedocs.io/en/latest/getting_started/installation.html#installing-with-docker) or [install from source with pip](https://lightllm-en.readthedocs.io/en/latest/getting_started/installation.html#installing-from-source).

### Quick Start

LightLLM provides LLM inference services with state-of-the-art throughput via its efficient request router and TokenAttention.

We provide examples of launching the LightLLM service and querying the model (from the console and from Python) for both text and multimodal models; minimal query sketches follow the list below.

- [Install LightLLM](https://lightllm-en.readthedocs.io/en/latest/getting_started/installation.html)
- [Quick Start](https://lightllm-en.readthedocs.io/en/latest/getting_started/quickstart.html)
- [LLM Service](https://lightllm-en.readthedocs.io/en/latest/models/test.html#llama)
- [VLM Service](https://lightllm-en.readthedocs.io/en/latest/models/test.html#llava)
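
Once a server is running (for example via the launch sketch above), the text model can be queried over HTTP. The following is a minimal sketch, assuming the server listens on `localhost:8080` and exposes the `/generate` endpoint with an `inputs`/`parameters` JSON body as shown in the Quick Start documentation:

```shell
# Hedged sketch: query a running LightLLM text model over HTTP.
# Assumes the server listens on localhost:8080 and exposes /generate.
curl http://localhost:8080/generate \
    -H "Content-Type: application/json" \
    -d '{
          "inputs": "What is AI?",
          "parameters": {"do_sample": false, "max_new_tokens": 64}
        }'
```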

Note: the additional parameters for multimodal models (`--enable_multimodal`, `--cache_capacity`) require a larger `--shm-size`.
If LightLLM is run with `--tp > 1`, the visual model runs on GPU 0.
Input images are passed as a list of dicts of the form `{'type': 'url'/'base64', 'data': xxx}`.
The special image tag is `<img></img>` for Qwen-VL (`<image>` for Llava); the length of `data["multimodal_params"]["images"]` should equal the number of image tags in the prompt, and that number can be 0, 1, 2, ... (see the sketch below).
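
As a hedged illustration of the format above (assuming a Qwen-VL style model launched with `--enable_multimodal` on port 8080, and a placeholder image URL), a request with one `<img></img>` tag carries exactly one matching entry in `multimodal_params.images`:

```shell
# Hedged sketch: one <img></img> tag in the prompt, so exactly one image entry.
# The image URL is a placeholder; 'base64' may be used instead of 'url'.
curl http://localhost:8080/generate \
    -H "Content-Type: application/json" \
    -d '{
          "inputs": "<img></img>Describe this image.",
          "parameters": {"max_new_tokens": 64},
          "multimodal_params": {
            "images": [{"type": "url", "data": "https://example.com/cat.jpg"}]
          }
        }'
```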


### Other

Please refer to the [documentation](https://lightllm-en.readthedocs.io/en/latest/) for more information.

## Performance

LightLLM delivers high-throughput serving. The performance comparison between LightLLM and vLLM is shown [here](https://lightllm-en.readthedocs.io/en/latest/dev/performance.html). Against vLLM up to version 0.1.2, we achieved 2x higher throughput.

Learn more in the release blogs: [v1.0.0 blog](https://www.light-ai.top/lightllm-blog//by%20mtc%20team/2025/02/16/lightllm/).

## FAQ

Please refer to the [FAQ](https://lightllm-en.readthedocs.io/en/latest/faq.html) for more information.

## Projects using lightllm

We welcome any cooperation and contribution. If there is a project that requires LightLLM's support, please contact us.

## Community

For further information and discussion, [join our Discord server](https://discord.gg/WzzfwVSguU). We welcome you as a member and look forward to your contributions!

## License

LightLLM is released under the Apache 2.0 license.