IPEX Speculative Support for Baichuan2 7B (intel-analytics#10112)
* IPEX Speculative Support for Baichuan2 7B

* fix license problems

* refine
Uxito-Ada authored Feb 19, 2024
1 parent d12242c commit 5553f43
Showing 4 changed files with 1,061 additions and 2 deletions.
16 changes: 14 additions & 2 deletions python/llm/example/CPU/Speculative-Decoding/baichuan2/README.md
@@ -63,7 +63,7 @@ First token latency x.xxxxs

### 4. Accelerate with BIGDL_OPT_IPEX

To accelerate speculative decoding on CPU, you can install our validated version of [IPEX 2.3.0+git0c63936](https://github.com/intel/intel-extension-for-pytorch/tree/0c63936d7a6740679987920367ae2e0cdb375b2e) by following steps: (Other versions of IPEX may have some conflicts and can not accelerate speculative decoding correctly.)
To accelerate speculative decoding on CPU, you can optionally install our validated version of [IPEX 2.3.0+git0c63936](https://github.com/intel/intel-extension-for-pytorch/tree/0c63936d7a6740679987920367ae2e0cdb375b2e) by following the steps below. (Other versions of IPEX may have conflicts and may not accelerate speculative decoding correctly.)

#### 4.1 Download IPEX installation script
```bash
@@ -89,7 +89,19 @@ bash compile_bundle.sh
pip install -r requirements.txt
```
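After the install completes, a quick sanity check can confirm which IPEX build Python actually picks up (a sketch; the exact version string depends on your build, but it should resemble `2.3.0+git0c63936` for the validated version):

```bash
# Optional: print the version of the IPEX build that Python imports,
# or a hint to re-check the build step if the import fails.
python -c "import intel_extension_for_pytorch as ipex; print(ipex.__version__)" \
  || echo "IPEX import failed - re-check the compile_bundle.sh step"
```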

After installed IPEX, you can set `BIGDL_OPT_IPEX=true` to get target model acceleration. Currently only `Baichuan2 13b` is supported.
#### 4.5 Run Baichuan2 Models with IPEX

After installing IPEX, **if your Baichuan2 model is the 7B size**, replace the `modeling_baichuan.py` file under your model directory with `./baichaun2_7b_opt_ipex/modeling_baichuan.ipex`, for example:

```bash
cp ./baichaun2_7b_opt_ipex/modeling_baichuan.ipex your_model_path/modeling_baichuan.py
```

Also replace the `tokenization_baichuan.py` file under your model directory with `./baichaun2_7b_opt_ipex/tokenization_baichuan.py`.
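As a command, this replacement mirrors the one above (paths assumed relative to this example directory; `your_model_path` stands in for your local model directory):

```bash
cp ./baichaun2_7b_opt_ipex/tokenization_baichuan.py your_model_path/tokenization_baichuan.py
```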

**The 13B model does not need the steps above; please skip them.**

Then you can set `BIGDL_OPT_IPEX=true` to enable target model acceleration:

```bash
source bigdl-llm-init -t
```
