
Commit ed35455: update
songhappy committed Oct 27, 2023
1 parent 205735d
Showing 2 changed files with 4 additions and 1 deletion.
@@ -1,6 +1,6 @@
# Low-Bit Streaming LLM using BigDL-LLM

-In this example, we apply [Streaming-LLM](https://github.com/mit-han-lab/streaming-llm/tree/main#efficient-streaming-language-models-with-attention-sinks) using BigDL-LLM, which can deploy low-bit(including INT8/INT5/INT4) LLMs for infinite-length inputs.
+In this example, we apply [Streaming-LLM](https://github.com/mit-han-lab/streaming-llm/tree/main#efficient-streaming-language-models-with-attention-sinks) using BigDL-LLM, which can deploy low-bit (including FP4/INT4/FP8/INT8) LLMs for infinite-length inputs.
Only one code change is needed to load the model using bigdl-llm as follows:
```python
from bigdl.llm.transformers import AutoModelForCausalLM
```
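Reading the two hunks of this commit together, the resulting loading pattern looks roughly like the sketch below. The function name `load_low_bit_model` is a hypothetical wrapper for illustration; `load_in_4bit=True`, `optimize_model=False`, and `trust_remote_code=True` come from the diff itself, and the import is deferred into the function body only so the sketch can be read and checked without bigdl-llm installed.

```python
def load_low_bit_model(model_name_or_path: str):
    """Load a low-bit (INT4) model via bigdl-llm's drop-in transformers API.

    Hypothetical helper sketching the pattern in this commit; in the
    example script the import and call sit at module/function level.
    """
    # The one code change vs. plain transformers: import AutoModelForCausalLM
    # from bigdl.llm.transformers instead of transformers.
    from bigdl.llm.transformers import AutoModelForCausalLM

    # load_in_4bit=True enables the low-bit performance boost;
    # optimize_model=False is kept off for now, per the TODO in the diff.
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        load_in_4bit=True,
        optimize_model=False,
        trust_remote_code=True,
    )
    return model
```

Everything else in the script (tokenizer, generation loop) stays on the standard transformers API, which is what makes this a one-line migration.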
@@ -48,6 +48,7 @@
 import urllib.request
 import os
 import json
+# code change to import from bigdl-llm API instead of using transformers API
 from bigdl.llm.transformers import AutoModelForCausalLM
 from transformers import LlamaTokenizer
 import intel_extension_for_pytorch as ipex
@@ -61,6 +62,8 @@ def load(model_name_or_path):
         trust_remote_code=True,
     )

+    # set load_in_4bit=True to get a performance boost; set optimize_model=False for now
+    # TODO: align the logic of optimize_model and streaming
     model = AutoModelForCausalLM.from_pretrained(
         model_name_or_path,
         load_in_4bit=True,
