Refine Pipeline Parallel FastAPI example (#11168)
xiangyuT authored May 29, 2024
1 parent 9bfbf78 commit 2299698
Showing 2 changed files with 4 additions and 1 deletion.
2 changes: 2 additions & 0 deletions python/llm/example/GPU/Pipeline-Parallel-FastAPI/README.md
@@ -20,6 +20,8 @@ pip install oneccl_bind_pt==2.1.100 --extra-index-url https://pytorch-extension.
 source /opt/intel/oneapi/setvars.sh
 pip install mpi4py fastapi uvicorn
 conda install -c conda-forge -y gperftools=2.10 # to enable tcmalloc
+
+pip install transformers==4.31.0 # for llama2 models
 ```
 
 ### 2. Run pipeline parallel serving on multiple GPUs
3 changes: 2 additions & 1 deletion python/llm/example/GPU/Pipeline-Parallel-FastAPI/run.sh
@@ -8,4 +8,5 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2
 export TORCH_LLM_ALLREDUCE=0
 
 export MODEL_PATH=YOUR_MODEL_PATH
-CCL_ZE_IPC_EXCHANGE=sockets torchrun --standalone --nnodes=1 --nproc-per-node 2 pipeline_serving.py --repo-id-or-model-path $MODEL_PATH --low-bit fp8
+export NUM_GPUS=2
+CCL_ZE_IPC_EXCHANGE=sockets torchrun --standalone --nnodes=1 --nproc-per-node $NUM_GPUS pipeline_serving.py --repo-id-or-model-path $MODEL_PATH --low-bit fp8
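The run.sh change replaces the hard-coded `2` in `--nproc-per-node` with a `NUM_GPUS` variable. A minimal sketch, not part of the commit, of how the script could additionally fall back to a default when the caller does not export the variable, using standard POSIX parameter expansion:

```shell
# Sketch only (assumed extension, not in the commit): default to 2 GPUs
# when NUM_GPUS is not already set in the environment.
NUM_GPUS="${NUM_GPUS:-2}"
echo "launching with nproc-per-node=$NUM_GPUS"
```

With this pattern, `NUM_GPUS=4 bash run.sh` would launch four ranks, while a plain `bash run.sh` would keep the default of two.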
