Bug Report
I encounter a segmentation fault when using the INT4 version of vicuna-7b-v1.5 to generate tokens with model.generate(). The detailed code is as follows:
First, generate and store the quantized model:
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM as LLMINTAutoModelForCausalLM

model_dir = 'lmsys/vicuna-7b-v1.5'
quan_model_name = './model_repo/vicuna-7b-4bit'

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
# Load the model and quantize it to symmetric INT4 on the fly
model = LLMINTAutoModelForCausalLM.from_pretrained(model_dir, load_in_low_bit='sym_int4')

# Save the tokenizer and the quantized weights for later reuse
tokenizer.save_pretrained(quan_model_name)
model.save_low_bit(quan_model_name)
Then load the model for inference:
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM as LLMINTAutoModelForCausalLM

model_dir = './model_repo/vicuna-7b-4bit'
prompt = "Messi is"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
# Load the previously saved INT4 weights
model = LLMINTAutoModelForCausalLM.load_low_bit(model_dir)
model = model.to('cuda:1')

model_input = tokenizer.encode(prompt, return_tensors='pt').to('cuda:1')
output = model.generate(model_input)
text = tokenizer.batch_decode(output)
The inference produces no other output and only reports Segmentation fault. Using print statements, I observed that the segmentation fault happens in model.generate(model_input). I've also tried the fix for the segmentation fault in issue "infernece chatGLM2-6b with BigDL-LLM INT4 failed" #9336, raising the open-file limit with ulimit -n 2048, but it does not work.
Environment
uname -a
Linux iZ6we0ie5asd1wp4evzoq0Z 5.4.0-150-generic #167-Ubuntu SMP Mon May 15 17:35:05 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Hi @TaoLbr1993, it seems that you are running BigDL-LLM GPU code on an Nvidia GPU?
Actually, BigDL-LLM GPU code is designed for Intel GPUs and currently only works on Intel GPUs.
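In case it helps, here is a minimal sketch of what the Intel-GPU path looks like, assuming a machine with an Intel GPU, intel_extension_for_pytorch installed, and the oneAPI environment sourced (the max_new_tokens value is just illustrative):

import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_dir = './model_repo/vicuna-7b-4bit'
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# Load the saved INT4 weights and move them to the Intel GPU;
# 'xpu' is the device name used for Intel GPUs, not 'cuda'
model = AutoModelForCausalLM.load_low_bit(model_dir)
model = model.to('xpu')

model_input = tokenizer.encode("Messi is", return_tensors='pt').to('xpu')
output = model.generate(model_input, max_new_tokens=32)  # max_new_tokens is illustrative
text = tokenizer.batch_decode(output)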
Oh yes, I was just using an Nvidia GPU to test the BigDL code. Thanks very much for your quick reply!