add --blocksize to doc and script (#12187)
liu-shaojun authored Oct 12, 2024
1 parent 6ffaec6 commit 49eb206
Showing 2 changed files with 2 additions and 0 deletions.
docker/llm/serving/xpu/docker/start-vllm-service.sh (1 addition, 0 deletions)

@@ -19,6 +19,7 @@ python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
   --port 8000 \
   --model $model \
   --trust-remote-code \
+  --block-size 8 \
   --gpu-memory-utilization 0.9 \
   --device xpu \
   --dtype float16 \
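Once the script above has brought the server up on port 8000, the change can be smoke-tested with an OpenAI-style completion request. This is a sketch only: the model name and prompt below are placeholders, and the endpoint shape assumes vLLM's standard OpenAI-compatible server started by `api_server`.

```shell
# Illustrative smoke test against the locally started service.
# "YOUR_MODEL_NAME" must match the value passed via --model above.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "YOUR_MODEL_NAME",
        "prompt": "San Francisco is a",
        "max_tokens": 32
      }'
```

A successful launch returns a JSON completion object; a connection error usually means the service script exited before binding the port.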
docs/mddocs/DockerGuides/vllm_docker_quickstart.md (1 addition, 0 deletions)

@@ -103,6 +103,7 @@ Before performing benchmark or starting the service, you can refer to this [sect
 |`--max-model-len`| Model context length. If unspecified, will be automatically derived from the model config.|
 |`--max-num-batched-token`| Maximum number of batched tokens per iteration.|
 |`--max-num-seq`| Maximum number of sequences per iteration. Default: 256|
+|`--block-size`| vLLM block size. Set to 8 to achieve a performance boost.|

#### Single card serving

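For context on the flag this commit documents: `--block-size` sets how many tokens share one KV-cache block, so a sequence of `L` tokens occupies `ceil(L / block_size)` blocks. The helper below is illustrative arithmetic, not part of vLLM's API.

```python
import math

def kv_blocks_needed(seq_len: int, block_size: int = 8) -> int:
    """Blocks a sequence of seq_len tokens occupies at the given KV-cache
    block size (hypothetical helper; default of 8 matches this commit)."""
    return math.ceil(seq_len / block_size)

print(kv_blocks_needed(100))      # 13 blocks at block size 8
print(kv_blocks_needed(100, 16))  # 7 blocks at block size 16
```

A smaller block size wastes less cache in the final, partially filled block of each sequence, which is one reason a value like 8 can help throughput on memory-constrained devices.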
