
Error while deserializing header: HeaderTooLarge #12492

Open

lvjingax opened this issue Dec 4, 2024 · 5 comments

Comments

lvjingax commented Dec 4, 2024

This error occurs when executing minicpm.py.
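
For context, the "HeaderTooLarge" error is raised by safetensors when it cannot parse a model shard, typically because the file is truncated or is an un-pulled Git LFS pointer rather than the real weights. Below is a minimal sketch for checking each shard; the model directory is a hypothetical placeholder and the safetensors package is assumed to be installed.

# Minimal sketch: try to parse every .safetensors shard in a model directory.
# The directory path below is a hypothetical placeholder.
import glob
import os

from safetensors import safe_open

model_dir = "/path/to/MiniCPM-V-2_6"  # hypothetical placeholder

for shard in sorted(glob.glob(os.path.join(model_dir, "*.safetensors"))):
    size_mb = os.path.getsize(shard) / 1e6
    try:
        with safe_open(shard, framework="pt") as f:
            n_tensors = len(f.keys())
        print(f"OK   {os.path.basename(shard)}: {size_mb:.1f} MB, {n_tensors} tensors")
    except Exception as e:
        # A shard of only a few hundred bytes is usually a Git LFS pointer stub.
        print(f"FAIL {os.path.basename(shard)}: {size_mb:.1f} MB, {e}")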

hkvision (Contributor) commented Dec 5, 2024

Can you provide more information so that we can help you diagnose the issue? e.g. the exact file/command you ran, the full error stack, and the platform/hardware you are running on. Thanks.

lvjingax (Author) commented Dec 9, 2024

Sorry, this has already been resolved; it was caused by an issue with the model files. However, I have run into a new problem: the program is suddenly killed while running the Python script. Can you help me check?

(llm) root@localhost://home/lvjingang01/MiniCPM-V# python minicpm.py
/root/miniforge3/envs/llm/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/root/miniforge3/envs/llm/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
2024-12-06 21:02:00,642 - INFO - intel_extension_for_pytorch auto imported
2024-12-06 21:02:00,740 - INFO - vision_config is None, using default vision config
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 10.02it/s]
2024-12-06 21:02:02,143 - INFO - Converting the current model to sym_int4 format......
/root/miniforge3/envs/llm/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
Killed

hkvision (Contributor) commented Dec 9, 2024

There seems to be no obvious error in the log. You still need to provide more information for us to locate the issue, e.g. the link to or the content of minicpm.py, the exact MiniCPM model you use, and the platform/hardware you run on. Thanks.

lvjingax (Author) commented Dec 9, 2024

> There seems to be no obvious error in the log. You still need to provide more information for us to locate the issue, e.g. the link to or the content of minicpm.py, the exact MiniCPM model you use, and the platform/hardware you run on. Thanks.

This is the source code of minicpm.py:
import os
import time

import torch
from PIL import Image
from ipex_llm.transformers import AutoModel
# from transformers import AutoModel
from transformers import AutoTokenizer

model_path = "/home/lvjingang01/git/MiniCPM-V-2_6/"
# model_path = "// MiniCPM-V-2_6/"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, load_in_low_bit="sym_int4")  # "fp8",
# optimize_model=True, modules_to_not_convert=["vpm", "resampler"]

model = model.eval()
model = model.float()

# model = model.half()  # /transformers/generation/utils.py", line 2415, in _sample
#     next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
#     RuntimeError: probability tensor contains either inf, nan or element < 0
# model = model.bfloat16()  # RuntimeError: unsupported dtype, only fp32 and fp16 are supported
model = model.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)


def run_minicpm(image_path, question):
    # Run one chat turn on a single image and print latency/throughput stats.
    image = Image.open(image_path).convert('RGB')
    msgs = [{'role': 'user', 'content': question}]
    torch.xpu.synchronize()
    timeStart = time.time()

    res = model.chat(
        image=image,
        msgs=msgs,
        tokenizer=tokenizer,
        sampling=True,
        stream=True,
        temperature=0.7,
    )

    timeFirstRecord = False

    generated_text = ""
    for new_text in res:
        if not timeFirstRecord:
            torch.xpu.synchronize()
            timeFirst = time.time() - timeStart
            timeFirstRecord = True
        generated_text += new_text
        # print(new_text, flush=True, end='')

    torch.xpu.synchronize()
    timeCost = time.time() - timeStart
    token_count_input = len(tokenizer.tokenize(question))
    token_count_output = len(tokenizer.tokenize(generated_text))

    ms_first_token = timeFirst  # * 1000
    ms_rest_token = (timeCost - timeFirst) / (token_count_output - 1 + 1e-8) * 1000
    print("\ninput: ", question)
    print("output: ", generated_text)
    print("token count input: ", token_count_input)
    print("token count output: ", token_count_output)
    print("time cost(s): ", timeCost)
    print("First token latency(s): ", ms_first_token)
    print("After token latency(ms/token)", ms_rest_token)
    print("output token/s: ", token_count_output / timeCost)
    print("output char/s", len(generated_text) / timeCost)
    print("******** image path = ", image_path)
    print("_______________")

    print(res)


print("Start predict")
# run_minicpm('./test_image/guo.png', 'What are in the image?')
# run_minicpm('./cat.JPG', '这是什么品种的猫')
# run_minicpm('./dog.JPG', '这是什么?')
# run_minicpm('./green.JPG', '图片内是什么植物')
# run_minicpm('./umbrella.JPG', '图内的文字是什么意思?')
# run_minicpm('./road.JPG', '图片内是什么内容')
# run_minicpm('./tree.JPG', '图片内是什么植物')

# Directory containing the .jpg files
directory = 'Picture'

# Prompt used for each image
prompt = '请描述这张图片,危险吗?50字。'

for filename in os.listdir(directory):
    if filename.endswith('.jpg'):
        image_path = os.path.join(directory, filename)
        print(f"Found file: {filename}")
        run_minicpm(image_path, prompt)

hkvision (Contributor) commented Dec 9, 2024

We suspect the application gets killed due to OOM (out of memory).
Please follow the example we provide to run MiniCPM-V-2_6: https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2_6/chat.py
More specifically, use model.half() for the fp16 model and add modules_to_not_convert=["vpm", "resampler"] when loading the model. Please do not comment out this key code from our example. Thanks.
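
For reference, below is a minimal sketch of the loading pattern described above, reusing the model path from the user's script; the chat.py example linked above is the authoritative version.

# Minimal sketch of the recommended loading pattern (see the linked chat.py
# example for the authoritative version); the model path is taken from the script above.
from ipex_llm.transformers import AutoModel
from transformers import AutoTokenizer

model_path = "/home/lvjingang01/git/MiniCPM-V-2_6/"

# Keep the vision tower and resampler out of low-bit conversion, as advised above.
model = AutoModel.from_pretrained(model_path,
                                  trust_remote_code=True,
                                  load_in_low_bit="sym_int4",
                                  optimize_model=True,
                                  modules_to_not_convert=["vpm", "resampler"])
model = model.half()            # fp16 for the parts not converted to sym_int4
model = model.eval().to('xpu')  # move to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)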
