issue with chatglm2-6b #934

Open
QuPengfei opened this issue Oct 9, 2024 · 2 comments
Labels
category: LLM LLM pipeline (stateful, static)

Comments

@QuPengfei

I saw an issue with chatglm2-6b.

It runs successfully with numactl -m 0 -C 0-23.
It fails with numactl -m 0 -C 0-31, 0-47, or 0-55.

It can be reproduced with either INT8_ASYM or 4BIT_MAXIMUM quantization.

Here is the command used for quantization:
python3 convert.py --model_id $model_path -c $DATA_TYPE --output_dir $target_path

Here is the command line used for inference:
numactl -m 0 -C 0-47 python benchmark.py -m /app/savedmodels/THUDM/chatglm2-6b/pytorch/dldt/compressed_weights/OV_FP32-INT8_ASYM/ -d CPU -n 3 -p "It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I could fit any theme around it. In the end, the problem with a theme like 'Evolution' in a game is that evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. That control goes against what evolution is" -r /app/output/chatglm2-6b-4BIT_MAXIMUM-16-256-256.1.csv -ic 256 -mc 2 -bs 16 --torch_compile_backend openvino --fuse_decoding_strategy -od /app/output --genai

[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: chatglm2-6b
[ INFO ] OV Config={'CACHE_DIR': ''}
[ INFO ] OPENVINO_TORCH_BACKEND_DEVICE=CPU
[ INFO ] Model path=/app/savedmodels/THUDM/chatglm2-6b/pytorch/dldt/compressed_weights/OV_FP32-INT8_ASYM, openvino runtime version: 2024.4.0-16579-c3152d32c9c-releases/2024/4
[ INFO ] Pipeline initialization time: 0.98s
[ INFO ] Numbeams: 1, benchmarking iter nums(exclude warm-up): 3, prompt nums: 1, prompt idx: [0]
[ INFO ] [warm-up] Input text: It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I could fit any theme around it. In the end, the problem with a theme like 'Evolution' in a game is that evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. That control goes against what evolution is
[ INFO ] [warm-up] Batch_size=16, all input token size after padding: 256 * 16, all max_output_token_size: 256 * 16
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
File "/app/benchmark.py", line 856, in main
iter_data_list, pretrain_time = CASE_TO_BENCH[model_args['use_case']](model_path, framework, args.device, model_args, args.num_iters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/benchmark.py", line 462, in run_text_generation_benchmark
text_gen_fn(input_text, num, model, tokenizer, args, iter_data_list, md5_list, prompt_idx_list[idx], bench_hook, model_precision, proc_id)
File "/app/benchmark.py", line 348, in run_text_generation_genai
generation_result = model.generate(input_text_list, max_new_tokens=max_gen_tokens, num_beams=args["num_beams"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223:
Exception from src/plugins/intel_cpu/src/graph.cpp:1243:
Node __module.transformer.encoder.layers.0.self_attention.core_attention/aten::scaled_dot_product_attention/ScaledDotProductAttention of type ScaledDotProductAttentionWithKVCache
Check 'B == B_state' failed at src/plugins/intel_cpu/src/nodes/scaled_attn.cpp:1393:
beam idx batch: 14 is not equal to batch of state: 16
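
For reference, the failure can be isolated from benchmark.py. Below is a minimal sketch (assuming the openvino_genai Python API; the model path is taken from the log above and the prompt text is a placeholder) that mirrors the failing generate() call with a batch of 16 prompts:

# Minimal reproducer sketch (hypothetical), mirroring run_text_generation_genai() in benchmark.py
import openvino_genai as ov_genai

model_dir = "/app/savedmodels/THUDM/chatglm2-6b/pytorch/dldt/compressed_weights/OV_FP32-INT8_ASYM"
prompts = ["It is done, and submitted. ..."] * 16  # 16 copies of the prompt to emulate -bs 16

pipe = ov_genai.LLMPipeline(model_dir, "CPU")

# Same call shape as benchmark.py line 348: greedy search (num_beams=1), 256 new tokens.
# With the wider core bindings (0-31 / 0-47 / 0-55) this is where the
# "beam idx batch ... is not equal to batch of state" exception is raised.
results = pipe.generate(prompts, max_new_tokens=256, num_beams=1)
print(results)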

@peterchen-intel
Collaborator

@QuPengfei Can you please share the output of "lscpu", "numactl -H", and "lscpu -e"? Can this be reproduced with other models?

@andrei-kochin
Collaborator

@QuPengfei any updates here?

@ilya-lavrenov ilya-lavrenov added the category: LLM LLM pipeline (stateful, static) label Nov 5, 2024