update llamafactory doc

1. add uninstall llamafactory in install.rst
2. add multi_npu.rst
3. update inference gif
Ascend · Jun 24, 2024 · 26192b6
1 parent 3a437ff
commit 26192b6

Showing 6 changed files with 73 additions and 39 deletions.
diff --git a/sources/llamafactory/faq.rst b/sources/llamafactory/faq.rst
@@ -83,6 +83,7 @@ A: This kind of error usually means that some Tensors were not moved to the NPU; make sure the operator in the error…
 
 
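 .. **Q: When training a model with DeepSpeed on a single NPU, the error AttributeError: 'GemmaForCausalLM' object has no attribute 'save_checkpoint' is raised (GemmaForCausalLM may be a different model here); the full error is shown in the figure below**
+
 **Q: When training a model with DeepSpeed on a single NPU, the error AttributeError: 'GemmaForCausalLM' object has no attribute 'save_checkpoint' is raised (GemmaForCausalLM may be a different model here)**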
 
 .. .. figure:: ./images/lf-bugfix.png
@@ -113,7 +114,7 @@ A: This problem usually occurs when the training script is launched with ``python src/train.py`` or with…
 Feedback
 ----------
 
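-If you run into any problems, feel free to open an issue in the `official community <https://github.com/hiyouga/LLaMA-Factory/issues/>`_; we will respond as soon as possible.
+If you run into any problems, feel free to open an issue in the `official community <https://github.com/hiyouga/LLaMA-Factory/issues/>`_ or ask in the `LLaMA-Factory × Ascend discussion group <https://github.com/hiyouga/LLaMA-Factory/blob/main/assets/wechat_npu.jpg>`_; we will respond as soon as possible.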
 
 *More updates to come ...*
 
diff --git a/sources/llamafactory/images/chat-llamafactory.gif b/sources/llamafactory/images/chat-llamafactory.gif
diff --git a/sources/llamafactory/index.rst b/sources/llamafactory/index.rst
@@ -6,4 +6,5 @@ LLaMA-Factory
 
    install.rst
    quick_start.rst
+   multi_npu.rst
    faq.rst
diff --git a/sources/llamafactory/install.rst b/sources/llamafactory/install.rst
@@ -11,7 +11,9 @@
 - TODO
 
 .. warning::
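-  The minimum CANN version supported by LLaMA-Factory is 8.0.rc1
+  The minimum CANN version supported by LLaMA-Factory is 8.0.rc1.
+
+  When installing CANN, also install the operator package so that the CANN operators can be used.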
 
 Creating the Python environment
 --------------------------------
@@ -41,7 +43,7 @@ LLaMA-Factory installation
 Installation check
 ----------------------
 
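-Use the ``llamafactory-cli env`` command to verify the LLaMA-Factory × Ascend installation. As shown in the figure below, the installation succeeded if the LLaMA-Factory, PyTorch NPU, and CANN versions and the NPU type are displayed correctly.
+Use the ``llamafactory-cli env`` command to verify the LLaMA-Factory × Ascend installation. As shown below, the installation succeeded if the LLaMA-Factory, PyTorch NPU, and CANN versions and the NPU type are displayed correctly.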
 
 .. code-block:: shell
   
@@ -57,3 +59,10 @@ LLaMA-Factory installation
   - NPU type: xxx
   - CANN version: 8.0.RC2.alpha001
 
+Uninstalling LLaMA-Factory
+--------------------------
+
+.. code-block:: shell
+  
+  pip uninstall llamafactory
+
diff --git a/sources/llamafactory/multi_npu.rst b/sources/llamafactory/multi_npu.rst
@@ -0,0 +1,55 @@
+Single-node multi-NPU fine-tuning
+==================================
+
+.. note::
+    Before reading this page, make sure you have prepared the Ascend environment and LLaMA-Factory by following the :doc:`installation guide <./install>`.
+
+This page builds on the :doc:`quick start <./quick_start>`. As there, first install DeepSpeed and ModelScope:
+
+.. code-block:: shell
+
+  pip install -e '.[deepspeed,modelscope]'
+
+Specifying multiple NPUs
+--------------------------
+
+Use ``export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3`` to select the NPUs to use; in this example, the four NPUs numbered 0 to 3.
+
+.. note::
+
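+    Ascend NPUs are numbered starting from 0, and this also holds inside Docker containers.
+
+    For example, if NPUs 6 and 7 of the physical host are mapped into a container, they appear inside the container as NPUs 0 and 1.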
+
+
+Alternatively, use the following script to detect and select multiple NPUs automatically:
+
+.. code-block:: shell
+
+    # ------------------------------ detect npu --------------------------------------
+    # detect npu via npu-smi
+    if command -v npu-smi &> /dev/null; then
+      num_npus=$(npu-smi info -l | grep "Total Count" | awk -F ":" '{print $NF}')
+      npu_list=$(seq -s, 0 $((num_npus-1)))
+    else
+      num_npus=-1
+      npu_list="-1"
+    fi
+    echo using npu : $npu_list
+    num_gpus=$(echo $npu_list | awk -F "," '{print NF}')
+    # --------------------------------------------------------------------------------
+    export ASCEND_RT_VISIBLE_DEVICES=$npu_list
+
+Multi-NPU distributed LoRA fine-tuning
+----------------------------------------
+
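+After selecting the NPUs through the ``ASCEND_RT_VISIBLE_DEVICES`` variable, launch distributed training with torchrun. Set ``nproc_per_node`` to the number of NPUs; all other parameters are the same as for single-NPU fine-tuning in the :doc:`quick start <./quick_start>`.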
+
+.. code-block:: shell
+    
+    torchrun --nproc_per_node $num_npus \
+        --nnodes 1 \
+        --node_rank 0 \
+        --master_addr 127.0.0.1 \
+        --master_port 7007 \
+        src/train.py <your_path>/qwen1_5_lora_sft_ds.yaml
diff --git a/sources/llamafactory/quick_start.rst b/sources/llamafactory/quick_start.rst
@@ -22,8 +22,6 @@
   export ASCEND_RT_VISIBLE_DEVICES=0
   export USE_MODELSCOPE_HUB=1
 
-Multi-NPU users, see :ref:`multi_npu`
-
 LoRA-based fine-tuning
 ------------------------
 
@@ -45,6 +43,8 @@
 
   ``nproc_per_node, nnodes, node_rank, master_addr, master_port`` are the arguments required by torchrun; for their detailed meaning, see the `official PyTorch documentation <https://pytorch.org/docs/stable/elastic/run.html>`_.
 
+For multi-NPU distributed training, see :doc:`Single-node multi-NPU fine-tuning <./multi_npu>`.
+
 Inference with dynamically merged LoRA
 ----------------------------------------
 
@@ -62,8 +62,8 @@
 
 You can now chat with the fine-tuned model in the terminal. The figure below shows an example of successful inference on an NPU:
 
-.. figure:: ./images/chat.png
-  :align: left
+.. figure:: ./images/chat-llamafactory.gif
+  :align: center
 
 .. note::
   The first round of conversation prints some warnings. They are caused by an update of the transformers library, do not affect inference, and can be ignored.
@@ -109,35 +109,3 @@ yaml configuration file
 .. literalinclude:: ./qwen1_5_lora_sft_ds.yaml
     :language: yaml
     :linenos:
-
-
-.. _multi_npu:
-
-Using multiple NPUs
--------------------
-
-Use the following script to detect and select multiple NPUs automatically:
-
-.. code-block:: shell
-
-    # ------------------------------ detect npu --------------------------------------
-    # detect npu via npu-smi
-    if command -v npu-smi info &> /dev/null; then
-      num_npus=$(npu-smi info -l | grep "Total Count" | awk -F ":" '{print $NF}')
-      npu_list=$(seq -s, 0 $((num_npus-1)))
-    else
-      num_npus=-1
-      npu_list="-1"
-    fi
-    echo using npu : $npu_list
-    num_gpus=$(echo $npu_list | awk -F "," '{print NF}')
-    # --------------------------------------------------------------------------------
-    export ASCEND_RT_VISIBLE_DEVICES=$npu_list
-
-Alternatively, use ``export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3`` to select the NPUs to use; in this example, the four NPUs numbered 0 to 3.
-
-.. note::
-
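-    Ascend NPUs are numbered starting from 0, and this also holds inside Docker containers.
-
-    For example, if NPUs 6 and 7 of the physical host are mapped into a container, they appear inside the container as NPUs 0 and 1.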
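
A note on the new multi_npu.rst page: the number passed to ``nproc_per_node`` has to match the number of entries in ``ASCEND_RT_VISIBLE_DEVICES``. The sketch below shows one way to derive each value from the other in plain POSIX shell. The helper names ``npu_list_from_count`` and ``count_visible_npus`` are hypothetical and are not part of LLaMA-Factory, CANN, or this commit; this is an illustration under those assumptions, not the documented method.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LLaMA-Factory or CANN).

# Print "0,1,...,n-1" for a positive integer n; return 1 for anything else.
npu_list_from_count() {
  _n="$1"
  case "$_n" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric (including "-1")
  esac
  [ "$_n" -gt 0 ] || return 1
  _i=0
  _out=""
  while [ "$_i" -lt "$_n" ]; do
    _out="${_out:+$_out,}$_i"  # append with a comma except for the first ID
    _i=$((_i + 1))
  done
  echo "$_out"
}

# Print how many device IDs a comma-separated list contains (0 if empty).
count_visible_npus() {
  _list="$1"
  [ -n "$_list" ] || { echo 0; return 0; }
  _commas=$(printf '%s' "$_list" | tr -cd ',' | wc -c)
  echo $((_commas + 1))
}
```

With these helpers, a launch could pass ``--nproc_per_node "$(count_visible_npus "$ASCEND_RT_VISIBLE_DEVICES")"`` to torchrun, so that a manually exported list such as ``0,1,2,3`` and the process count cannot drift apart. Rejecting a non-positive count also avoids exporting ``-1`` as a device list when no NPU is detected.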