Commit 56f3bc8

* add lm-deploy

Co-authored-by: sjh <[email protected]>

1 parent a28a6d4. Showing 7 changed files with 251 additions and 15 deletions.
@@ -0,0 +1,10 @@

LMDeploy
===========

.. toctree::
   :maxdepth: 2

   install.rst
   quick_start.rst
@@ -0,0 +1,51 @@
Installation Guide
==================

LMDeploy is a Python library for compressing, deploying, and serving large language models (LLMs) and vision-language models (VLMs). Its core inference engines are the TurboMind engine and the PyTorch engine: the former is developed in C++ and CUDA and focuses on inference performance, while the latter is written in pure Python and aims to lower the barrier to entry for developers.
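To make the difference concrete, here is a minimal sketch of selecting each engine through the pipeline API; the model name is only a placeholder, and both config classes are importable from lmdeploy:

.. code-block:: python
   :linenos:

   from lmdeploy import pipeline, PytorchEngineConfig, TurbomindEngineConfig

   # PyTorch engine: pure Python, easier to extend to new devices
   pipe_pt = pipeline("internlm/internlm2_5-7b-chat",
                      backend_config=PytorchEngineConfig(tp=1))

   # TurboMind engine: C++/CUDA, optimized for inference performance
   pipe_tm = pipeline("internlm/internlm2_5-7b-chat",
                      backend_config=TurbomindEngineConfig(tp=1))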
This tutorial targets developers who use lm-deploy on Ascend hardware, and walks through installing lm-deploy in an Ascend environment.

lm_deploy Download and Installation
-----------------------------------

Install with pip (recommended)
++++++++++++++++++++++++++++++

We recommend installing lmdeploy in a clean conda environment (Python 3.8 - 3.12):

.. code-block:: shell
   :linenos:

   conda create -n lmdeploy python=3.8 -y
   conda activate lmdeploy
   pip install lmdeploy
Install from Source
+++++++++++++++++++

If you use the PyTorch engine for inference, installing from source is straightforward:

.. code-block:: shell
   :linenos:

   git clone https://github.com/InternLM/lmdeploy.git
   cd lmdeploy
   pip install -e .
Verifying the Installation
--------------------------

The installation succeeded if no errors occurred during installation and the following command prints the lmdeploy version number.

.. code-block:: shell
   :linenos:

   python -c "import lmdeploy; print(lmdeploy.__version__)"
   # Example output
   # 0.6.2
@@ -0,0 +1,161 @@
Quick Start
===========

Building on LMDeploy's PytorchEngine, we have added support for Huawei Ascend devices (Atlas 800T A2). Using LMDeploy on Huawei Ascend is therefore almost identical to using the PytorchEngine backend on NVIDIA GPUs. Before reading this tutorial, please read the original `Quick Start <https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/get_started/get_started.md>`_ first.

Installation
------------

We strongly recommend building a Docker image to simplify environment setup.
Clone the lmdeploy source code; the Dockerfile is located in the docker directory.

.. code-block:: shell
   :linenos:

   git clone https://github.com/InternLM/lmdeploy.git
   cd lmdeploy
Environment Preparation
-----------------------

The Docker version must be 18.03 or later, and the Ascend Docker Runtime must be installed following the `official guide <https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/clusterscheduling/clusterschedulingig/clusterschedulingig/dlug_installation_012.html>`_.
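As a quick sanity check (assuming `docker` is already on the PATH), confirm the version before proceeding:

.. code-block:: shell
   :linenos:

   docker --version   # should report 18.03 or later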
.. note::

   If the error `libascend_hal.so: cannot open shared object file` appears inside the container later, the Ascend Docker Runtime was not installed correctly.
Drivers, Firmware, and CANN
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The target machine needs Huawei drivers and firmware of version 23.0.3 or later installed; see
`CANN driver and firmware installation <https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha001/softwareinst/instg/instg_0005.html>`_
and the `download resources <https://www.hiascend.com/hardware/firmware-drivers/community?product=4&model=26&cann=8.0.RC2.beta1&driver=1.0.25.alpha>`_.
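If a driver is already installed, one way to check its version (assuming the standard Ascend tooling is present on the machine) is:

.. code-block:: shell
   :linenos:

   npu-smi info   # the output header includes the driver version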
In addition, **docker/Dockerfile_aarch64_ascend** does not ship the CANN installation packages. Users need to download the CANN (version 8.0.RC2.beta1) packages themselves from the `Ascend download center <https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.RC2.beta1&product=4&model=26>`_
and place **Ascend-cann-kernels-910b*.run**, **Ascend-cann-nnal_*.run**, and **Ascend-cann-toolkit*.run** in the root directory of the lmdeploy source tree.
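Before building, the lmdeploy source root should contain all three packages. The file names below are illustrative only; the exact versions depend on what you downloaded:

.. code-block:: shell
   :linenos:

   ls Ascend-cann-*.run
   # Ascend-cann-kernels-910b_8.0.RC2_linux.run
   # Ascend-cann-nnal_8.0.RC2_linux-aarch64.run
   # Ascend-cann-toolkit_8.0.RC2_linux-aarch64.run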
Build the Image
~~~~~~~~~~~~~~~

Run the following build command in the root directory of the lmdeploy source tree, which is also where the CANN installation packages were placed.

.. code-block:: shell
   :linenos:

   DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest \
       -f docker/Dockerfile_aarch64_ascend .

If the following command runs without any errors, the environment is set up successfully.

.. code-block:: shell
   :linenos:

   docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env
For details on running the `docker run` command on Ascend devices, refer to this `document <https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/dockerruntimeug/dlruntime_ug_013.html>`_.
Offline Batch Inference
-----------------------

.. note::

   Graph mode is now supported on the Atlas 800T A2. Currently, LLaMa3-8B/LLaMa2-7B/Qwen2-7B on a single card have been tested. Users can set `eager_mode=False` to enable graph mode, or `eager_mode=True` to disable it. (Enabling graph mode requires sourcing `/usr/local/Ascend/nnal/atb/set_env.sh` first.)
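As a sketch of enabling graph mode (the model name follows the example below; note that the environment script must be sourced in the same shell that launches Python):

.. code-block:: shell
   :linenos:

   source /usr/local/Ascend/nnal/atb/set_env.sh
   python -c "
   from lmdeploy import pipeline, PytorchEngineConfig
   pipe = pipeline('internlm/internlm2_5-7b-chat',
                   backend_config=PytorchEngineConfig(tp=1, device_type='ascend', eager_mode=False))
   print(pipe(['Hello']))
   "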
LLM Inference
~~~~~~~~~~~~~

Add `device_type="ascend"` to the parameters of `PytorchEngineConfig`.

.. code-block:: python
   :linenos:

   from lmdeploy import pipeline
   from lmdeploy import PytorchEngineConfig

   if __name__ == "__main__":
       pipe = pipeline("internlm/internlm2_5-7b-chat",
                       backend_config=PytorchEngineConfig(tp=1, device_type="ascend", eager_mode=True))
       question = ["Shanghai is", "Please introduce China", "How are you?"]
       response = pipe(question)
       print(response)
VLM Inference
~~~~~~~~~~~~~

Add `device_type="ascend"` to the parameters of `PytorchEngineConfig`.

.. code-block:: python
   :linenos:

   from lmdeploy import pipeline, PytorchEngineConfig
   from lmdeploy.vl import load_image

   if __name__ == "__main__":
       pipe = pipeline('OpenGVLab/InternVL2-2B',
                       backend_config=PytorchEngineConfig(tp=1, device_type='ascend', eager_mode=True))
       image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
       response = pipe(('describe this image', image))
       print(response)
Online Serving
--------------

.. note::

   Graph mode is now supported on the Atlas 800T A2. Currently, InternLM2-7B/LLaMa2-7B/Qwen2-7B on a single card have been tested.
   For online serving, graph mode is enabled by default; users can add `--eager-mode` to disable it. (Enabling graph mode requires sourcing `/usr/local/Ascend/nnal/atb/set_env.sh` first.)
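For example, to serve with graph mode enabled (the default), omit `--eager-mode` and source the environment script first; a sketch:

.. code-block:: shell
   :linenos:

   source /usr/local/Ascend/nnal/atb/set_env.sh
   lmdeploy serve api_server --backend pytorch --device ascend internlm/internlm2_5-7b-chat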
LLM Serving
~~~~~~~~~~~

Add `--device ascend` to the serve command.

.. code-block:: shell
   :linenos:

   lmdeploy serve api_server --backend pytorch --device ascend --eager-mode internlm/internlm2_5-7b-chat
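Once the server is up, it exposes an OpenAI-compatible API, by default on port 23333; a minimal request sketch:

.. code-block:: shell
   :linenos:

   curl http://0.0.0.0:23333/v1/chat/completions \
       -H "Content-Type: application/json" \
       -d '{"model": "internlm/internlm2_5-7b-chat",
            "messages": [{"role": "user", "content": "How are you?"}]}'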
VLM Serving
~~~~~~~~~~~

Add `--device ascend` to the serve command.

.. code-block:: shell
   :linenos:

   lmdeploy serve api_server --backend pytorch --device ascend --eager-mode OpenGVLab/InternVL2-2B
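For the VLM server, images can be passed in the OpenAI-style `image_url` message format; a sketch reusing the tiger image from the offline example (the request schema is assumed to follow the OpenAI-compatible API):

.. code-block:: shell
   :linenos:

   curl http://0.0.0.0:23333/v1/chat/completions \
       -H "Content-Type: application/json" \
       -d '{"model": "OpenGVLab/InternVL2-2B",
            "messages": [{"role": "user", "content": [
              {"type": "text", "text": "describe this image"},
              {"type": "image_url", "image_url": {"url": "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"}}
            ]}]}'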
Chat with an LLM from the Command Line
--------------------------------------

Add `--device ascend` to the chat command.

.. code-block:: shell
   :linenos:

   lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ascend --eager-mode

You can also run the following command to start an lmdeploy chat session after the container is launched:

.. code-block:: shell
   :linenos:

   docker exec -it lmdeploy_ascend_demo \
       bash -i -c "lmdeploy chat --backend pytorch --device ascend --eager-mode internlm/internlm2_5-7b-chat"
Quantization
------------

Run the command below to apply W4A16 quantization to model weights on the Atlas 800T A2.

.. code-block:: shell
   :linenos:

   lmdeploy lite auto_awq $HF_MODEL --work-dir $WORK_DIR --device npu
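Assuming quantization completes successfully, the weights written to $WORK_DIR can then be loaded like any local model path, e.g.:

.. code-block:: shell
   :linenos:

   lmdeploy chat $WORK_DIR --backend pytorch --device ascend --eager-mode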
For the list of supported models, see `Supported Models <https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/supported_models/supported_models.md>`_.