
Request for answer extraction script and clarification on some experiment details #1

jinghan23 opened this issue Jun 19, 2024 · 1 comment


Hi authors, thanks for your amazing work, which contributes a lot to long video understanding!
I'm reproducing your experiments on LLaVA-NeXT-Video. I've run into some problems and would like to know how you solved them.

  1. Would you mind providing details on which LLaVA-NeXT-Video model you tested: lmms-lab/LLaVA-NeXT-Video-34B-DPO, lmms-lab/LLaVA-NeXT-Video-7B-DPO, or the models without DPO?
  2. I experimented with lmms-lab/LLaVA-NeXT-Video-7B-DPO first and found that the current instruction doesn't ask the assistant to answer with one exact option. It sometimes replies with a paragraph of reasoning, which makes extracting answers difficult. Would you mind providing your answer extraction script, or shedding some light on how to evaluate the raw responses? (Or are you evaluating in a perplexity-based mode, e.g., appending each of the four options to the instruction separately and picking the one with the lowest perplexity?)
  3. When evaluating on LVBench with the official LLaVA-NeXT-Video repo, I found that some videos cannot be read because the decord library does not currently support the AV1 codec. I edited the video2dataset package as described in this issue and re-downloaded the LVBench videos using your download.sh, but four videos still fail to be processed. I would really appreciate it if you could share experiment details, such as how you download the videos and handle the AV1 codec.
  4. Section 4.4 of the paper makes a very interesting point about "using large language models (LLMs) to filter question-answer pairs", but I'm a little confused about what LLM filtering means. Does it mean providing only the instruction and question as input, without the video, and checking which option the LLM guesses? It's hard to understand how this method achieves an even higher score than providing the video as input. Would you mind briefly clarifying this interesting finding?

Thanks in advance for your helpful reply.

@huangshiyu13 (Member) commented Jun 20, 2024

Response:

  1. lmms-lab/LLaVA-NeXT-Video-34B-DPO
  2. You can use this script to get the final option (no perplexity-based mode); an illustrative extraction sketch follows this list.
  3. We save all the videos as mp4 files. You can convert the videos with ffmpeg first, then read them with decord (see the conversion sketch after this list).
  4. We found that the LLM can easily guess the answers to some questions, so we removed them from the original dataset (see the filtering sketch after this list). Below is a filtered-out example from the original dataset:
{
    'question': 'Why is the whole body of a man covered with white cloth?\n(A) He is sleepy\n(B) He is dead\n(C) He is tired\n(D) He is married',
    'answer': 'B'
}
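For point 2, the linked script is not reproduced in this thread. Purely as an illustration of what such option extraction might look like, here is a minimal Python sketch; the regex patterns and fallback behavior are assumptions, not the authors' actual logic:

```python
import re

def extract_option(response: str) -> str | None:
    """Pull a single option letter A-D out of a free-form model response.

    Illustrative sketch only: first try explicit patterns such as
    "(B)", "Answer: B", or "the answer is B", then fall back to the
    first standalone letter A-D anywhere in the text.
    """
    m = re.search(r"(?:answer\s*(?:is)?\s*[:\-]?\s*|\()\s*([ABCD])\b",
                  response, re.IGNORECASE)
    if m:
        return m.group(1).upper()
    # Fallback: first standalone capital A-D token in the response.
    m = re.search(r"\b([ABCD])\b", response)
    return m.group(1) if m else None
```

For example, extract_option("The correct answer is (B), because the man is covered...") returns 'B'.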

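For point 3, a minimal sketch of the convert-then-read flow; the paths, ffmpeg flags, and frame-sampling step here are our assumptions, not the authors' exact pipeline:

```python
import subprocess
import decord

def convert_and_read(src_path: str, dst_path: str, num_frames: int = 32):
    """Transcode an AV1 video to H.264 so decord can open it, then sample frames."""
    # Re-encode the video stream to H.264 and drop audio; decord reads H.264 mp4 reliably.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_path, "-c:v", "libx264", "-an", dst_path],
        check=True,
    )
    vr = decord.VideoReader(dst_path)
    # Uniformly sample num_frames frame indices across the whole video.
    step = max(len(vr) // num_frames, 1)
    indices = list(range(0, len(vr), step))[:num_frames]
    return vr.get_batch(indices)  # (num_frames, H, W, 3) array of decoded frames
```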
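And for point 4, the filtering idea as described amounts to asking a text-only LLM the question without the video and discarding items it answers correctly. A hedged sketch follows; query_llm is a hypothetical callable standing in for whatever model the authors used, and their actual prompt and removal criterion are not specified in this thread:

```python
def filter_guessable(dataset, query_llm):
    """Keep only QA pairs that a text-only LLM cannot answer without the video.

    `dataset` is a list of dicts like the example above; `query_llm` is a
    hypothetical callable taking a prompt string and returning an option letter.
    """
    kept = []
    for item in dataset:
        prompt = ("Answer the following multiple-choice question with a "
                  "single option letter.\n" + item["question"])
        if query_llm(prompt) != item["answer"]:
            # The LLM could not guess the answer blind, so the question
            # genuinely requires watching the video: keep it.
            kept.append(item)
    return kept
```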