Segmentation fault when inference on GPU #9363

Closed
TaoLbr1993 opened this issue Nov 6, 2023 · 2 comments

Comments


  1. Bug Report
    I encounter a segmentation fault when using the INT4 version of vicuna-7b-v1.5 to generate tokens with model.generate(). The detailed code is as follows:

First, generate and store the quantized model:

# Quantize the FP16 checkpoint to symmetric INT4 and save the low-bit weights
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM as LLMINTAutoModelForCausalLM

model_dir = 'lmsys/vicuna-7b-v1.5'
quan_model_name = './model_repo/vicuna-7b-4bit'
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = LLMINTAutoModelForCausalLM.from_pretrained(model_dir, load_in_low_bit='sym_int4')
tokenizer.save_pretrained(quan_model_name)
model.save_low_bit(quan_model_name)

Then load the quantized model for inference:

# Load the saved INT4 model and run generation (the crash occurs in model.generate)
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM as LLMINTAutoModelForCausalLM

model_dir = './model_repo/vicuna-7b-4bit'
prompt = "Messi is"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = LLMINTAutoModelForCausalLM.load_low_bit(model_dir)
model = model.to('cuda:1')

model_input = tokenizer.encode(prompt, return_tensors='pt').to('cuda:1')
output = model.generate(model_input)
text = tokenizer.batch_decode(output)

The inference produces no other output and only reports Segmentation fault. By adding print statements I confirmed that the segmentation fault happens in model.generate(model_input).
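For reference, a minimal debugging sketch (not part of the original report): Python's built-in faulthandler module can dump the Python-level stack when the process receives SIGSEGV, which localizes the crash more precisely than print statements.

import faulthandler
faulthandler.enable()  # dump a Python traceback on SIGSEGV before the process dies

output = model.generate(model_input)  # the call that crashes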

  2. Environment

uname -a:
Linux iZ6we0ie5asd1wp4evzoq0Z 5.4.0-150-generic #167-Ubuntu SMP Mon May 15 17:35:05 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

conda list (partial):
torchaudio                2.1.0               py311_cu121
torchtriton               2.1.0                     py311 
torchvision               0.16.0              py311_cu121 
pytorch                   2.1.0           py3.11_cuda12.1_cudnn8.9.2_0 
pytorch-cuda              12.1                 ha16c6d3_5
transformers              4.31.0                   pypi_0
datasets                  2.12.0          py311h06a4308_0
bigdl-llm                 2.4.0b20230809           pypi_0
  3. Others
    I've tried the solution for segmentation faults from issue "inference chatGLM2-6b with BigDL-LLM INT4 failed" #9336 (ulimit -n 2048), but it does not work.
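For reference, a minimal sketch (not part of the original attempt) of raising the same open-file limit from inside the Python process via the standard resource module, equivalent to ulimit -n 2048 in the shell:

import resource

# Equivalent of `ulimit -n 2048`: raise the soft open-file limit to 2048
# (must not exceed the hard limit, which stays unchanged)
_, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (2048, hard))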
rnwang04 (Contributor) commented Nov 6, 2023

Hi @TaoLbr1993, it seems that you are running BigDL-LLM GPU code on an NVIDIA GPU?
Actually, BigDL-LLM GPU code is designed for Intel GPUs and currently only works on Intel GPUs.
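For reference, a minimal sketch (not from this thread) of the Intel-GPU usage pattern in the BigDL-LLM GPU examples, assuming an Intel GPU and that intel_extension_for_pytorch is installed; the exact API may vary between BigDL-LLM versions:

# On Intel GPUs the model and inputs go to the 'xpu' device instead of 'cuda'
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_dir = './model_repo/vicuna-7b-4bit'
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.load_low_bit(model_dir).to('xpu')

model_input = tokenizer.encode("Messi is", return_tensors='pt').to('xpu')
output = model.generate(model_input)
print(tokenizer.batch_decode(output))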

TaoLbr1993 (Author) replied:


Oh yes, I just used an NVIDIA GPU to test the BigDL code. Many thanks for your quick reply!

rnwang04 closed this as completed Nov 9, 2023