Commit
1 parent
400ae6f
commit f72945f
Showing
4 changed files
with
282 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
ARG ASCEND_VERSION=8.0.rc2.alpha003-910b-openeuler22.03-py3.8 | ||
|
||
FROM cosdt/cann:$ASCEND_VERSION AS build | ||
|
||
WORKDIR /app | ||
|
||
COPY . . | ||
|
||
RUN yum install -y gcc g++ cmake make | ||
ENV LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/lib64:$LIBRARY_PATH | ||
ENV ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest | ||
ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:${ASCEND_TOOLKIT_HOME}/lib64/plugin/opskernel:${ASCEND_TOOLKIT_HOME}/lib64/plugin/nnengine:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe/op_tiling:${LD_LIBRARY_PATH} | ||
ENV PYTHONPATH=${ASCEND_TOOLKIT_HOME}/python/site-packages:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe:${PYTHONPATH} | ||
ENV PATH=${ASCEND_TOOLKIT_HOME}/bin:${ASCEND_TOOLKIT_HOME}/compiler/ccec_compiler/bin:${PATH} | ||
ENV ASCEND_AICPU_PATH=${ASCEND_TOOLKIT_HOME} | ||
ENV ASCEND_OPP_PATH=${ASCEND_TOOLKIT_HOME}/opp | ||
ENV TOOLCHAIN_HOME=${ASCEND_TOOLKIT_HOME}/toolkit | ||
ENV ASCEND_HOME_PATH=${ASCEND_TOOLKIT_HOME} | ||
|
||
|
||
RUN echo "Building with static libs" && \ | ||
# source /usr/local/Ascend/ascend-toolkit/set_env.sh && \ | ||
cmake -B build -DGGML_CANN=ON -DBUILD_SHARED_LIBS=OFF -DCMAKE_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64 && \ | ||
cmake --build build --config Release --target llama-cli | ||
|
||
# TODO: use image with NNRT | ||
FROM cosdt/cann:$ASCEND_VERSION AS runtime | ||
|
||
COPY --from=build /app/build/bin/llama-cli /llama-cli | ||
|
||
ENV LC_ALL=C.utf8 | ||
|
||
ENTRYPOINT [ "/llama-cli" ] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,223 @@ | ||
# llama.cpp for CANN | ||
|
||
- [Background](#background) | ||
- [News](#news) | ||
- [OS](#os) | ||
- [Hardware](#hardware) | ||
- [Model Supports](#model-supports) | ||
- [DataType Supports](#datatype-supports) | ||
- [Docker](#docker) | ||
- [Linux](#linux) | ||
- [TODO](#todo) | ||
|
||
## Background | ||
|
||
**Ascend NPU** is a family of AI processors built around a Neural Processing Unit (NPU). It efficiently handles matrix-matrix multiplication, dot products, and scalar operations. | ||
|
||
**CANN** (Compute Architecture for Neural Networks) is a heterogeneous computing architecture for AI scenarios, providing support for multiple AI frameworks on the top and serving AI processors and programming at the bottom. It plays a crucial role in bridging the gap between upper and lower layers, and is a key platform for improving the computing efficiency of Ascend AI processors. Meanwhile, it offers a highly efficient and easy-to-use programming interface for diverse application scenarios, allowing users to rapidly build AI applications and services based on the Ascend platform. | ||
|
||
**Llama.cpp + CANN** | ||
|
||
The llama.cpp CANN backend is designed to support Ascend NPUs. It uses AscendC and ACLNN, which are integrated into the CANN Toolkit and kernels, to run on the Ascend NPU directly. | ||
|
||
## News | ||
|
||
- 2024.8 | ||
  - Support Q4_0 and Q8_0 for Ascend NPU. | ||
- 2024.7 | ||
  - Create CANN backend for Ascend NPU. | ||
|
||
## OS | ||
|
||
| OS | Status | Verified | | ||
|:-------:|:-------:|:----------------------------------------------:| | ||
| Linux | Support | Ubuntu 22.04, openEuler 22.03 | | ||
|
||
|
||
## Hardware | ||
|
||
### Ascend NPU | ||
|
||
**Verified devices** | ||
| Ascend NPU | Status | | ||
|:-----------------------------:|:-------:| | ||
| Atlas 300T A2 | Support | | ||
|
||
*Notes:* | ||
|
||
- If you have trouble with an Ascend NPU device, please create an issue with the **[CANN]** prefix/tag. | ||
- If you run successfully on your Ascend NPU device, please help update the table above. | ||
|
||
|
||
## Model Supports | ||
|
||
| Model Name | Status | | ||
|:-----------------------------:|:-------:| | ||
| Baichuan | Support | | ||
| Baichuan 2 | Support | | ||
| Bloom | Support | | ||
| Falcon 2 | Support | | ||
| Gpt 2 | Support | | ||
| InternLM 2 | Support | | ||
| Llama 2 | Support | | ||
| Llama 3 | Support | | ||
| Mamba | Support | | ||
| Mistral 7B | Support | | ||
| OLMo | Support | | ||
| Phi 3 | Support | | ||
| Qwen 2 | Support | | ||
| Refact | Support | | ||
| Starcoder | Support | | ||
| Yi | Support | | ||
|
||
|
||
## DataType Supports | ||
|
||
| DataType | Status | | ||
|:----------------------:|:-------:| | ||
| FP16 | Support | | ||
| Q8_0 | Support | | ||
| Q4_0 | Support | | ||
|
||
## Docker | ||
|
||
### Get Images | ||
You can get a pre-built image at cosdt/cann:8.0.rc2.alpha003-910b-openeuler22.03-py3.8-llama.cpp and use llama-cli directly in this image without building llama.cpp. | ||
```sh | ||
docker pull cosdt/cann:8.0.rc2.alpha003-910b-ubuntu22.04-py3.8-llama.cpp | ||
``` | ||
TODO: Add instructions for building the image or getting a pre-built image. | ||
|
||
### Run container | ||
|
||
```sh | ||
# Find all cards. | ||
npu-smi info | ||
|
||
# Select the cards that you want to use, and make sure they are not in use by anyone else. | ||
# The following command uses device0 and device1. | ||
docker run --name llamacpp \ | ||
    --device /dev/davinci0 --device /dev/davinci1 \ | ||
    --device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc \ | ||
    -v /usr/local/dcmi:/usr/local/dcmi -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ | ||
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \ | ||
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \ | ||
    -v /PATH_TO_YOUR_MODELS/:/app/models -itd cosdt/cann:8.0.rc2.alpha003-910b-ubuntu22.04-py3.8-llama.cpp \ | ||
    -m /app/models/MODEL_PATH -ngl 32 -p "Building a website can be done in 10 simple steps:" | ||
``` | ||
|
||
*Notes:* | ||
|
||
- You may need to install Ascend Driver and firmware on the **host** machine *(Please refer to the [Linux configuration](#linux) for details)*. | ||
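
The `docker run` command above passes one `--device /dev/davinciN` flag per card. As a minimal sketch, the helper below builds those flags from a list of device IDs; `npu_device_flags` is a hypothetical name that is not part of llama.cpp or CANN, and it assumes the `/dev/davinciN` naming shown in the example.

```sh
#!/bin/sh
# Hypothetical helper (not part of llama.cpp or CANN):
# print one "--device /dev/davinciN" flag for each NPU ID given as an argument.
# Assumes the /dev/davinciN device naming used in the docker run example.
npu_device_flags() {
    flags=""
    for id in "$@"; do
        # Append a separating space only when flags is already non-empty.
        flags="${flags:+$flags }--device /dev/davinci${id}"
    done
    printf '%s\n' "$flags"
}

# Example: build the flags for device0 and device1.
npu_device_flags 0 1
```

The output can be spliced into the command line, for example `docker run $(npu_device_flags 0 1) ...`, with the remaining `--device` and `-v` options unchanged.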
|
||
## Linux | ||
|
||
### I. Setup Environment | ||
|
||
1. **Install Ascend Driver** | ||
|
||
```sh | ||
# Create the group and user that run the driver. | ||
sudo groupadd HwHiAiUser | ||
sudo useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser -s /bin/bash | ||
sudo usermod -aG HwHiAiUser $USER | ||
|
||
# download driver from https://www.hiascend.com/hardware/firmware-drivers/community according to your system | ||
# and install driver. | ||
sudo sh Ascend-hdk-910b-npu-driver_x.x.x_linux-{arch}.run --full --install-for-all | ||
``` | ||
|
||
Once installed, run `npu-smi info` to check whether the driver was installed successfully. | ||
```sh | ||
+-------------------------------------------------------------------------------------------+ | ||
| npu-smi 24.1.rc2 Version: 24.1.rc2 | | ||
+----------------------+---------------+----------------------------------------------------+ | ||
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)| | ||
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) | | ||
+======================+===============+====================================================+ | ||
| 2 xxx | OK | 64.4 51 15 / 15 | | ||
| 0 | 0000:01:00.0 | 0 1873 / 15077 0 / 32768 | | ||
+======================+===============+====================================================+ | ||
| 5 xxx | OK | 64.0 52 15 / 15 | | ||
| 0 | 0000:81:00.0 | 0 1874 / 15077 0 / 32768 | | ||
+======================+===============+====================================================+ | ||
| No running processes found in NPU 2 | | ||
+======================+===============+====================================================+ | ||
| No running processes found in NPU 5 | | ||
+======================+===============+====================================================+ | ||
``` | ||
|
||
2. **Install Ascend Firmware** | ||
```sh | ||
# Download the firmware from https://www.hiascend.com/hardware/firmware-drivers/community according to your system | ||
# and install it. | ||
sudo sh Ascend-hdk-910b-npu-firmware_x.x.x.x.X.run --full | ||
``` | ||
If the following message appears, the firmware was installed successfully. | ||
```sh | ||
Firmware package installed successfully! | ||
``` | ||
|
||
|
||
3. **Install CANN toolkit and kernels** | ||
|
||
CANN toolkit and kernels can be obtained from the official [CANN Toolkit](https://www.hiascend.com/zh/developer/download/community/result?module=cann) page. | ||
|
||
Please download the version that matches your system. The minimum required version is 8.0.RC2.alpha002; install it with the following commands. | ||
```sh | ||
pip3 install attrs numpy decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py wheel typing_extensions | ||
sh Ascend-cann-toolkit_8.0.RC2.alpha002_linux-aarch64.run --install | ||
sh Ascend-cann-kernels-910b_8.0.RC2.alpha002_linux.run --install | ||
``` | ||
|
||
Set Ascend Variables: | ||
```sh | ||
echo "source ~/Ascend/ascend-toolkit/set_env.sh" >> ~/.bashrc | ||
source ~/.bashrc | ||
``` | ||
|
||
Upon a successful installation, CANN is enabled for the available Ascend devices. | ||
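
Before building, it can help to confirm that the environment script has been sourced in the current shell. The sketch below is a hypothetical check, not an official CANN tool: it assumes that `set_env.sh` exports `ASCEND_TOOLKIT_HOME` (the variable the Dockerfile in this commit sets by hand) and only tests that it points at an existing directory.

```sh
#!/bin/sh
# Hypothetical check (not an official CANN tool).
# Assumption: sourcing set_env.sh exports ASCEND_TOOLKIT_HOME.
# Succeeds only if the variable is non-empty and names an existing directory.
cann_env_ok() {
    [ -n "${ASCEND_TOOLKIT_HOME:-}" ] && [ -d "$ASCEND_TOOLKIT_HOME" ]
}

if cann_env_ok; then
    echo "CANN toolkit found at $ASCEND_TOOLKIT_HOME"
else
    echo "ASCEND_TOOLKIT_HOME is not set or is not a directory; source set_env.sh first"
fi
```

This does not verify the toolkit version or that an NPU is usable; `npu-smi info` remains the check for the driver itself.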
|
||
### II. Build llama.cpp | ||
|
||
```sh | ||
cmake -B build -DGGML_CANN=on -DCMAKE_BUILD_TYPE=release | ||
cmake --build build --config release | ||
``` | ||
|
||
### III. Run the inference | ||
|
||
1. **Retrieve and prepare model** | ||
|
||
You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation. | ||
|
||
**Notes**: | ||
|
||
- The CANN backend currently supports only FP16, Q4_0, and Q8_0 models. | ||
|
||
2. **Launch inference** | ||
|
||
There are two device selection modes: | ||
|
||
- Single device: Use one device target specified by the user. | ||
- Multiple devices: Automatically choose the devices with the same backend. | ||
|
||
| Device selection | Parameter | | ||
|:----------------:|:--------------------------------------:| | ||
| Single device | --split-mode none --main-gpu DEVICE_ID | | ||
| Multiple devices | --split-mode layer (default) | | ||
|
||
Examples: | ||
|
||
- Use device 0: | ||
|
||
```sh | ||
./build/bin/llama-cli -m path_to_model -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0 | ||
``` | ||
|
||
- Use multiple devices: | ||
|
||
```sh | ||
./build/bin/llama-cli -m path_to_model -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer | ||
``` | ||
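
The two selection modes differ only in the flags passed to `llama-cli`. As a small sketch, the hypothetical helper `device_selection_args` (a name introduced here for illustration) maps an optional device ID to the flags listed in the device selection table: with an ID it selects single-device mode, and with no argument it falls back to layer splitting across devices.

```sh
#!/bin/sh
# Hypothetical helper (illustration only): print the llama-cli device
# selection flags from the table above.
#   device_selection_args ID  -> single-device mode on device ID
#   device_selection_args     -> layer split across multiple devices
device_selection_args() {
    if [ "$#" -ge 1 ]; then
        printf '%s\n' "--split-mode none --main-gpu $1"
    else
        printf '%s\n' "--split-mode layer"
    fi
}

# Example: flags for device 0, then flags for multiple devices.
device_selection_args 0
device_selection_args
```

For instance, `./build/bin/llama-cli -m path_to_model -ngl 33 $(device_selection_args 0)` corresponds to the single-device example, written with the long forms of `-sm` and `-mg`.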
|
||
### **GitHub contribution**: | ||
Please add the **[CANN]** prefix/tag to issue/PR titles so that the CANN team can check and address them without delay. | ||
|
||
|
||
## TODO | ||
- Support more models and data types. |