A web demo based on Gradio is provided in this repo.
Supported models:
- ChatGLM
- ChatGLM2
- ChatGLM3
- ChatGLM4
- Llama2
- Llama3
- Gemma
- Yi
- Baichuan2
- Qwen
- Qwen2
Please refer to Installation. This example supports running from source code, which means you don't need to install xFasterTransformer via pip; just build the xFasterTransformer library, and the example will search for it in the `src` directory.
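For reference, a from-source build typically looks like the sketch below, assuming a standard CMake build tree under `build/` (adjust the steps to your checkout):
```shell
# Sketch of a from-source build; assumes CMake and a `build/` directory.
git clone https://github.com/intel/xFasterTransformer.git
cd xFasterTransformer
mkdir build && cd build
cmake ..
make -j
```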
Please refer to Prepare model to convert the model files into xFasterTransformer's format.
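As a sketch, the conversion can be driven from the Python package using the `Convert` class that matches your model family (e.g. `LlamaConvert`, `ChatGLM2Convert`); the paths below are placeholders:
```shell
# Sketch: convert a Hugging Face checkpoint into xFasterTransformer's format.
# Use the Convert class matching your model family (e.g. LlamaConvert, ChatGLM2Convert).
python -c 'import xfastertransformer as xft; xft.LlamaConvert().convert("${HF_MODEL_PATH}", "${MODEL_PATH}")'
```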
- Please refer to Prepare Environment to install oneCCL.
- Python dependencies:
  ```shell
  # requirements.txt is in `examples/web_demo/`.
  pip install -r requirements.txt
  ```
  PS: Due to potential compatibility issues between the model file and the `transformers` version, please select the appropriate `transformers` version.
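For example, pinning `transformers` to a specific release can resolve such mismatches (the version number below is purely illustrative; use the one your model requires):
```shell
# Illustrative only: pin `transformers` to the version your model files require.
pip install transformers==4.40.0
```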
After the web server has started, open the output URL in the browser to use the demo. Please specify the paths of the model and tokenizer directories, and the data type. The `transformers` tokenizer is used to encode and decode text, so `${TOKEN_PATH}` means the Hugging Face model directory.
```shell
# Recommend preloading `libiomp5.so` to get better performance,
# either via the export below or by setting LD_PRELOAD=libiomp5.so manually.
# The `libiomp5.so` file will be in the `3rdparty/mkl/lib` directory after building xFasterTransformer.
export $(python -c 'import xfastertransformer as xft; print(xft.get_env())')

# Run a single instance like:
python examples/web_demo/ChatGLM.py \
    --dtype=bf16 \
    --token_path=${TOKEN_PATH} \
    --model_path=${MODEL_PATH}

# Run multi-rank (one rank per NUMA node) like:
OMP_NUM_THREADS=48 mpirun \
    -n 1 numactl -N 0 -m 0 python examples/web_demo/ChatGLM.py --dtype=bf16 --token_path=${TOKEN_PATH} --model_path=${MODEL_PATH} : \
    -n 1 numactl -N 1 -m 1 python examples/web_demo/ChatGLM.py --dtype=bf16 --token_path=${TOKEN_PATH} --model_path=${MODEL_PATH}
```
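The `:` in the mpirun command is MPMD syntax separating the two ranks, one per NUMA node. Before choosing the `numactl -N`/`-m` values, you can inspect the machine's NUMA topology:
```shell
# Show NUMA topology: node count, CPUs, and memory per node.
numactl -H
```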
Parameter options:
- `-h`, `--help`: show help message and exit.
- `-t`, `--token_path`: path to tokenizer directory.
- `-m`, `--model_path`: path to model directory.
- `-d`, `--dtype`: data type, default is `fp16`, supports `{fp16, bf16, int8, w8a8, int4, nf4, bf16_fp16, bf16_int8, bf16_w8a8, bf16_int4, bf16_nf4, w8a8_int8, w8a8_int4, w8a8_nf4}`.
To connect the web demo to a running OpenAI-compatible API server, use `web_demo_api.py`:
```shell
python web_demo_api.py --url http://local:8000/v1 -m xft
```
Parameter options:
- `-h`, `--help`: show this help message and exit.
- `-u`, `--url`: base url, e.g. `http://local:8000/v1`.
- `-m`, `--model`: model name.
- `-t`, `--token`: API token key.
- `-i`, `--ip`: Gradio server IP, default `0.0.0.0`.
- `-p`, `--port`: Gradio server port, default `7860`.
- `-s`, `--share`: `true` or `false`, whether to create a share link.
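Putting the options together, a fuller invocation might look like this (the URL, model name, and token are placeholders for your own deployment):
```shell
# Sketch: launch the API-backed web demo on port 7860 with a public share link.
python web_demo_api.py \
    --url http://local:8000/v1 \
    --model xft \
    --token ${API_TOKEN} \
    --ip 0.0.0.0 \
    --port 7860 \
    --share true
```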