Add deepspeed autotp example readme #9289

Merged · 2 commits · Oct 27, 2023
34 changes: 34 additions & 0 deletions python/llm/example/GPU/Deepspeed-AutoTP/README.md
@@ -0,0 +1,34 @@
# Run BigDL-LLM on Multiple Intel GPUs using DeepSpeed AutoTP

This example demonstrates how to run a BigDL-LLM optimized low-bit model on multiple [Intel GPUs](../README.md) by leveraging DeepSpeed AutoTP.

## 0. Requirements
To run this example with BigDL-LLM on Intel GPUs, your machine should meet certain recommended requirements; please refer to [here](../README.md#recommended-requirements) for more information. For this particular example, you will need at least two GPUs on your machine.
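
If you are unsure how many GPUs are visible on your machine, one quick way to check (assuming the oneAPI Base Toolkit is installed and its environment has been sourced, as in step 2 below) is to list the SYCL devices:

```bash
# list visible Intel GPU devices; you should see at least two GPU entries
sycl-ls | grep -i gpu
```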

## Example:

### 1. Install

```bash
conda create -n llm python=3.9
conda activate llm
# the below command will install intel_extension_for_pytorch==2.0.110+xpu by default
# you can install a specific ipex/torch version for your needs
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install oneccl_bind_pt==2.0.100 -f https://developer.intel.com/ipex-whl-stable-xpu
pip install git+https://github.com/microsoft/DeepSpeed.git@78c518e
pip install git+https://github.com/intel/intel-extension-for-deepspeed.git@ec33277
pip install mpi4py
```
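
After installation, a quick import check can confirm the core packages are in place. This is just a sanity check; importing the XPU build of `intel_extension_for_pytorch` may require the oneAPI environment from step 2:

```bash
# sanity check: both imports should succeed and print version strings
python -c "import torch; import intel_extension_for_pytorch as ipex; print(ipex.__version__)"
python -c "import deepspeed; print(deepspeed.__version__)"
```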

### 2. Configure OneAPI environment variables
```bash
source /opt/intel/oneapi/setvars.sh
```
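
To verify that the environment took effect, you can check one of the variables `setvars.sh` exports and count the XPU devices PyTorch can see (assuming the XPU build of IPEX from step 1):

```bash
echo $ONEAPI_ROOT   # should point at your oneAPI installation, e.g. /opt/intel/oneapi
# count the XPU devices visible to PyTorch; expect >= 2 for this example
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.xpu.device_count())"
```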

### 3. Run tensor parallel inference on multiple GPUs
You may want to change some of the parameters in the script, such as `NUM_GPUS`, to match the number of GPUs on your machine.

```bash
bash run.sh
```
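
For example, to launch on a hypothetical two-GPU machine, change `NUM_GPUS` in `run.sh` before launching (shown here with an in-place `sed` edit, though editing the file manually works just as well):

```bash
# switch the script from the default 4 GPUs to 2, then launch
sed -i 's/^NUM_GPUS=4$/NUM_GPUS=2/' run.sh
bash run.sh
```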
7 changes: 4 additions & 3 deletions python/llm/example/GPU/Deepspeed-AutoTP/run.sh
```diff
@@ -1,12 +1,13 @@
 source bigdl-llm-init -t -g
 export MASTER_ADDR=127.0.0.1
 export CCL_ZE_IPC_EXCHANGE=sockets
+NUM_GPUS=4
 if [[ -n $OMP_NUM_THREADS ]]; then
-  export OMP_NUM_THREADS=$(($OMP_NUM_THREADS / 4))
+  export OMP_NUM_THREADS=$(($OMP_NUM_THREADS / $NUM_GPUS))
 else
-  export OMP_NUM_THREADS=$(($(nproc) / 4))
+  export OMP_NUM_THREADS=$(($(nproc) / $NUM_GPUS))
 fi
 torchrun --standalone \
          --nnodes=1 \
-         --nproc-per-node 4 \
+         --nproc-per-node $NUM_GPUS \
          deepspeed_autotp.py --repo-id-or-model-path "meta-llama/Llama-2-7b-hf"
```
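
The `OMP_NUM_THREADS` arithmetic above divides the available CPU threads evenly across the ranks that `torchrun` spawns, so the `NUM_GPUS` processes do not oversubscribe the cores. A worked example, assuming a hypothetical 32-core host:

```bash
# on a 32-core machine with NUM_GPUS=4, each rank gets 32 / 4 = 8 OpenMP threads
NUM_GPUS=4
export OMP_NUM_THREADS=$(( $(nproc) / NUM_GPUS ))
echo $OMP_NUM_THREADS   # prints 8 on a 32-core host
```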
**Contributor** commented:

> deepspeed_autotp.py is not uploaded

**Contributor (Author)** replied:

> I added it in a previous PR.