Commit cfe0e1f

[LLM] support for Yi AWQ model (intel-analytics#9648)

Uxito-Ada authored Dec 11, 2023
1 parent b16a93f commit cfe0e1f

Showing 3 changed files with 7 additions and 0 deletions.
@@ -7,6 +7,7 @@ This example shows how to directly run 4-bit AWQ models using BigDL-LLM on Intel
 - [Mistral-7B-v0.1-AWQ](https://huggingface.co/TheBloke/Mistral-7B-v0.1-AWQ)
 - [vicuna-7B-v1.5-AWQ](https://huggingface.co/TheBloke/vicuna-7B-v1.5-AWQ)
 - [vicuna-13B-v1.5-AWQ](https://huggingface.co/TheBloke/vicuna-13B-v1.5-AWQ)
+- [Yi-6B-AWQ](https://huggingface.co/TheBloke/Yi-6B-AWQ)
 
 ## Requirements
 To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#system-support) for more information.
@@ -23,6 +24,7 @@ pip install autoawq==0.1.6 --no-deps
 pip install --pre --upgrade bigdl-llm[all] # install bigdl-llm with 'all' option
 pip install transformers==4.35.0
 pip install accelerate==0.24.1
+pip install einops
 ```
 
 ### 2. Run
@@ -7,6 +7,7 @@ This example shows how to directly run 4-bit AWQ models using BigDL-LLM on Intel
 - [Mistral-7B-v0.1-AWQ](https://huggingface.co/TheBloke/Mistral-7B-v0.1-AWQ)
 - [vicuna-7B-v1.5-AWQ](https://huggingface.co/TheBloke/vicuna-7B-v1.5-AWQ)
 - [vicuna-13B-v1.5-AWQ](https://huggingface.co/TheBloke/vicuna-13B-v1.5-AWQ)
+- [Yi-6B-AWQ](https://huggingface.co/TheBloke/Yi-6B-AWQ)
 
 ## Requirements
 To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
@@ -23,6 +24,7 @@ pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-w
 pip install transformers==4.35.0
 pip install autoawq==0.1.6 --no-deps
 pip install accelerate==0.24.1
+pip install einops
 ```
 
 ### 2. Configures OneAPI environment variables
3 changes: 3 additions & 0 deletions python/llm/src/bigdl/llm/transformers/awq/awq.py
@@ -70,6 +70,7 @@
     "mistral": "MistralDecoderLayer",
     "gpt_neox": "GPTNeoXDecoderLayer",
     "aquila": "AquilaDecoderLayer",
+    "Yi": "YiDecoderLayer",
 }
 
 
@@ -133,6 +134,8 @@ def get_blocks(model):
         layers = model.gpt_neox.layers
     elif "mistral" in str(model.__class__).lower():
         layers = model.model.layers
+    elif "yi" in str(model.__class__).lower():
+        layers = model.model.layers
     else:
         invalidInputError(False, f"Model type {type(model)} isn't supported.")
     return layers
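The patched `get_blocks` dispatches on the lowercased class name of the model and returns its stack of decoder blocks; the Yi branch reuses the Llama/Mistral layout, where the blocks live at `model.model.layers`. A minimal, self-contained sketch of that pattern is below — `_DummyInner` and `YiForCausalLM` are hypothetical stand-ins for the real transformers classes, and `ValueError` stands in for BigDL's `invalidInputError`:

```python
class _DummyInner:
    """Hypothetical stand-in for the inner model object."""
    def __init__(self):
        # Stand-in for the stack of decoder blocks.
        self.layers = ["block0", "block1"]


class YiForCausalLM:
    """Hypothetical stand-in whose class name contains 'yi'."""
    def __init__(self):
        self.model = _DummyInner()


def get_blocks(model):
    # Dispatch on the lowercased class name, as the patched function does.
    # Yi keeps its decoder blocks at model.model.layers, the same location
    # as Llama and Mistral, so the branches can share one attribute path.
    name = str(model.__class__).lower()
    if "mistral" in name or "yi" in name:
        return model.model.layers
    raise ValueError(f"Model type {type(model)} isn't supported.")


print(get_blocks(YiForCausalLM()))  # ['block0', 'block1']
```

One caveat with substring matching: any future class name containing "yi" would also take this branch, so the order and specificity of the checks matter in the real function.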
