issue with chatglm2-6b #934

Open
QuPengfei opened this issue Oct 9, 2024 · 2 comments
Labels
category: LLM LLM pipeline (stateful, static)

Comments

@QuPengfei

I saw an issue with chatglm2-6b.

It runs successfully with numactl -m 0 -C 0-23.
It fails with numactl -m 0 -C 0-31, 0-47, or 0-55.

It can be reproduced with either INT8_ASYM or 4BIT_MAXIMUM quantization.

Here is the command used for quantization:
python3 convert.py --model_id $model_path -c $DATA_TYPE --output_dir $target_path

Here is the command line used for inference:
numactl -m 0 -C 0-47 python benchmark.py -m /app/savedmodels/THUDM/chatglm2-6b/pytorch/dldt/compressed_weights/OV_FP32-INT8_ASYM/ -d CPU -n 3 -p "It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I could fit any theme around it. In the end, the problem with a theme like 'Evolution' in a game is that evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. That control goes against what evolution is" -r /app/output/chatglm2-6b-4BIT_MAXIMUM-16-256-256.1.csv -ic 256 -mc 2 -bs 16 --torch_compile_backend openvino --fuse_decoding_strategy -od /app/output --genai

[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: chatglm2-6b
[ INFO ] OV Config={'CACHE_DIR': ''}
[ INFO ] OPENVINO_TORCH_BACKEND_DEVICE=CPU
[ INFO ] Model path=/app/savedmodels/THUDM/chatglm2-6b/pytorch/dldt/compressed_weights/OV_FP32-INT8_ASYM, openvino runtime version: 2024.4.0-16579-c3152d32c9c-releases/2024/4
[ INFO ] Pipeline initialization time: 0.98s
[ INFO ] Numbeams: 1, benchmarking iter nums(exclude warm-up): 3, prompt nums: 1, prompt idx: [0]
[ INFO ] [warm-up] Input text: It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I could fit any theme around it. In the end, the problem with a theme like 'Evolution' in a game is that evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. That control goes against what evolution is
[ INFO ] [warm-up] Batch_size=16, all input token size after padding: 256 * 16, all max_output_token_size: 256 * 16
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
File "/app/benchmark.py", line 856, in main
iter_data_list, pretrain_time = CASE_TO_BENCH[model_args['use_case']](model_path, framework, args.device, model_args, args.num_iters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/benchmark.py", line 462, in run_text_generation_benchmark
text_gen_fn(input_text, num, model, tokenizer, args, iter_data_list, md5_list, prompt_idx_list[idx], bench_hook, model_precision, proc_id)
File "/app/benchmark.py", line 348, in run_text_generation_genai
generation_result = model.generate(input_text_list, max_new_tokens=max_gen_tokens, num_beams=args["num_beams"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223:
Exception from src/plugins/intel_cpu/src/graph.cpp:1243:
Node __module.transformer.encoder.layers.0.self_attention.core_attention/aten::scaled_dot_product_attention/ScaledDotProductAttention of type ScaledDotProductAttentionWithKVCache
Check 'B == B_state' failed at src/plugins/intel_cpu/src/nodes/scaled_attn.cpp:1393:
beam idx batch: 14 is not equal to batch of state: 16
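
For reference, the failure can be isolated from benchmark.py. Below is a minimal sketch (assuming the openvino_genai Python API; the model path is taken from the log above and the prompt text is a placeholder) that mirrors the failing generate() call with a batch of 16 prompts:

# Minimal reproducer sketch (hypothetical), mirroring run_text_generation_genai() in benchmark.py
import openvino_genai as ov_genai

model_dir = "/app/savedmodels/THUDM/chatglm2-6b/pytorch/dldt/compressed_weights/OV_FP32-INT8_ASYM"
prompts = ["It is done, and submitted. ..."] * 16  # 16 copies of the prompt to emulate -bs 16

pipe = ov_genai.LLMPipeline(model_dir, "CPU")

# Same call shape as benchmark.py line 348: greedy search (num_beams=1), 256 new tokens.
# With the wider core bindings (0-31 / 0-47 / 0-55) this is where the
# "beam idx batch ... is not equal to batch of state" exception is raised.
results = pipe.generate(prompts, max_new_tokens=256, num_beams=1)
print(results)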

@peterchen-intel
Collaborator

@QuPengfei Can you please share the output of "lscpu", "numactl -H", and "lscpu -e"? Can this be reproduced with other models?

@andrei-kochin
Collaborator

@QuPengfei any updates here?

@ilya-lavrenov ilya-lavrenov added the category: LLM LLM pipeline (stateful, static) label Nov 5, 2024