Hi all, thanks for the great work! I have a question about evaluating a chat-based model (qwen-72b-chat). As the OpenCompass leaderboard shows (https://opencompass.org.cn/leaderboard-llm), qwen-14b-chat got 71.7 acc on the C-Eval dataset. I've checked that its config file is "cmmlu_gen_c13365.py", which uses a 5-shot prompt. When I run my own evaluation of qwen-72b-chat with the same config file, the output is not as expected: sometimes it contains answers for all five few-shot sample questions plus the real question. Is this normal? How should the output be post-processed in this case? Did you filter out the last answer (A, B, C, or D) as the prediction label?
It seems that you are using the wrong qwen-72b-chat model config, as the user & bot prompts are missing. Please try this one: https://github.com/open-compass/opencompass/blob/8798336b8593ce059ff0e54b2f2faf78c328bccc/configs/models/qwen/hf_qwen_72b_chat.py
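For context, here is a rough sketch of what a chat-model config with the user & bot prompts (the `meta_template`) looks like in OpenCompass. This is illustrative, not a copy of the linked file: the exact paths, batch size, GPU count, and other field values are assumptions, so please refer to the config at the URL above for the authoritative version. The key point is that the `meta_template` wraps every turn in Qwen's ChatML tags, which is what keeps the chat model from rambling past the question.

```python
from opencompass.models import HuggingFaceCausalLM

# Chat prompt template: each user turn and bot turn is wrapped in Qwen's
# ChatML markers. Without this, the few-shot prompt is fed as raw text and
# the model may answer all of the in-context examples as well.
_meta_template = dict(
    round=[
        dict(role='HUMAN', begin='\n<|im_start|>user\n', end='<|im_end|>'),
        dict(role='BOT', begin='\n<|im_start|>assistant\n', end='<|im_end|>', generate=True),
    ],
)

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='qwen-72b-chat-hf',
        path='Qwen/Qwen-72B-Chat',           # assumed HF path; check the linked config
        tokenizer_path='Qwen/Qwen-72B-Chat',
        model_kwargs=dict(device_map='auto', trust_remote_code=True),
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            trust_remote_code=True,
        ),
        meta_template=_meta_template,         # this is what the base-model config lacks
        max_out_len=100,
        max_seq_len=2048,
        batch_size=8,                         # illustrative values
        run_cfg=dict(num_gpus=4, num_procs=1),
        end_str='<|im_end|>',                 # stop generation at the end of the bot turn
    )
]
```

With a config like this, the model should emit a single answer per question, and the dataset's usual post-processor can pick out the option letter from it.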