Segmentation fault when inference on GPU #9363

Closed
TaoLbr1993 opened this issue Nov 6, 2023 · 2 comments

Comments


  1. Bug Report
    I encounter a segmentation fault when using the INT4 version of vicuna-7b-v1.5 to generate tokens with model.generate(). The detailed code is as follows:

First, generate and store the quantized model:

# Quantize the FP16 checkpoint to symmetric INT4 and save the low-bit weights
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM as LLMINTAutoModelForCausalLM

model_dir = 'lmsys/vicuna-7b-v1.5'
quan_model_name = './model_repo/vicuna-7b-4bit'
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = LLMINTAutoModelForCausalLM.from_pretrained(model_dir, load_in_low_bit='sym_int4')
tokenizer.save_pretrained(quan_model_name)
model.save_low_bit(quan_model_name)

Then load the quantized model for inference:

# Load the saved INT4 model and run generation (the crash occurs in model.generate)
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM as LLMINTAutoModelForCausalLM

model_dir = './model_repo/vicuna-7b-4bit'
prompt = "Messi is"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = LLMINTAutoModelForCausalLM.load_low_bit(model_dir)
model = model.to('cuda:1')

model_input = tokenizer.encode(prompt, return_tensors='pt').to('cuda:1')
output = model.generate(model_input)
text = tokenizer.batch_decode(output)

The inference produces no other output and only reports Segmentation fault. By adding print statements I confirmed that the segmentation fault happens in model.generate(model_input).
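For reference, a minimal debugging sketch (not part of the original report): Python's built-in faulthandler module can dump the Python-level stack when the process receives SIGSEGV, which localizes the crash more precisely than print statements.

import faulthandler
faulthandler.enable()  # dump a Python traceback on SIGSEGV before the process dies

output = model.generate(model_input)  # the call that crashes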

  2. Environment

uname -a:
Linux iZ6we0ie5asd1wp4evzoq0Z 5.4.0-150-generic #167-Ubuntu SMP Mon May 15 17:35:05 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

conda list (partial):
torchaudio                2.1.0               py311_cu121
torchtriton               2.1.0                     py311 
torchvision               0.16.0              py311_cu121 
pytorch                   2.1.0           py3.11_cuda12.1_cudnn8.9.2_0 
pytorch-cuda              12.1                 ha16c6d3_5
transformers              4.31.0                   pypi_0
datasets                  2.12.0          py311h06a4308_0
bigdl-llm                 2.4.0b20230809           pypi_0
  3. Others
    I've tried the solution for segmentation faults from issue "inference chatGLM2-6b with BigDL-LLM INT4 failed" #9336 (ulimit -n 2048), but it does not work.
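For reference, a minimal sketch (not part of the original attempt) of raising the same open-file limit from inside the Python process via the standard resource module, equivalent to ulimit -n 2048 in the shell:

import resource

# Equivalent of `ulimit -n 2048`: raise the soft open-file limit to 2048
# (must not exceed the hard limit, which stays unchanged)
_, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (2048, hard))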
rnwang04 (Contributor) commented Nov 6, 2023

Hi @TaoLbr1993, it seems that you are running BigDL-LLM GPU code on an NVIDIA GPU?
Actually, BigDL-LLM GPU code is designed for Intel GPUs and currently only works on Intel GPUs.
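For reference, a minimal sketch (not from this thread) of the Intel-GPU usage pattern in the BigDL-LLM GPU examples, assuming an Intel GPU and that intel_extension_for_pytorch is installed; the exact API may vary between BigDL-LLM versions:

# On Intel GPUs the model and inputs go to the 'xpu' device instead of 'cuda'
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_dir = './model_repo/vicuna-7b-4bit'
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.load_low_bit(model_dir).to('xpu')

model_input = tokenizer.encode("Messi is", return_tensors='pt').to('xpu')
output = model.generate(model_input)
print(tokenizer.batch_decode(output))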

TaoLbr1993 (Author) replied:


Oh yes, I just used an NVIDIA GPU to test the BigDL code. Many thanks for your quick reply!

rnwang04 closed this as completed Nov 9, 2023