
# Animal-Bench: Benchmarking Multimodal Video Models for Animal-centric Video Understanding

This codebase provides the data and code of our NeurIPS 2024 paper:

Yinuo Jing, Ruxu Zhang, Kongming Liang*, Yongxiang Li, Zhongjiang He, Zhanyu Ma and Jun Guo, "Animal-Bench: Benchmarking Multimodal Video Models for Animal-centric Video Understanding", in Proceedings of Neural Information Processing Systems (NeurIPS), 2024.

## 👀 About Animal-Bench


Previous benchmarks (left) relied on a limited set of agents, and the scenarios in editing-based benchmarks are unrealistic. Our proposed Animal-Bench (right) includes diverse animal agents and various realistic scenarios, and encompasses 13 different tasks.

### Task Demonstration


**Effectiveness evaluation results**

**Robustness evaluation results**

## Evaluation on Animal-Bench

**Data**: You can access and download the MammalNet, Animal Kingdom, LoTE-Animal, MSRVTT-QA, TGIF-QA, and NExT-QA datasets to obtain the data used in the paper, or you can use your own data.

**Annotations**: You can find our question-answer pair annotation files in /data.
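
The exact schema is defined by the files in /data; as a minimal sketch, assuming each annotation file is a JSON list of question-answer entries with hypothetical keys such as `video`, `question`, `candidates`, and `answer`, the annotations could be loaded like this:

```python
import json
from pathlib import Path

def load_annotations(path):
    """Load one QA annotation file (hypothetical schema: a JSON list of dicts)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Iterate over every annotation file shipped in /data.
for ann_file in sorted(Path("data").glob("*.json")):
    for item in load_annotations(ann_file):
        # Hypothetical field names; adjust to the actual keys used in /data.
        video_path = item.get("video")
        question = item.get("question")
        candidates = item.get("candidates", [])
        answer = item.get("answer")
        # ... feed (video_path, question, candidates) to your model here ...
```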

**Models**: We mainly referred to MVBench when writing the test code. You can follow the structure of the example model files in the /model folder to test your own model.
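
The snippet below is only an illustrative sketch of such a wrapper, not the actual interface used in /model; it assumes a hypothetical `answer` method for multiple-choice QA and a simple accuracy loop over the annotation items described above:

```python
class MyVideoModelWrapper:
    """Illustrative wrapper for plugging a custom multimodal video model
    into an MVBench-style evaluation loop (hypothetical interface)."""

    def __init__(self, checkpoint_path):
        # Load your own model weights here (placeholder).
        self.checkpoint_path = checkpoint_path

    def answer(self, video_path, question, candidates):
        """Return the candidate string the model picks for one question."""
        # 1. Sample frames from video_path.
        # 2. Build a prompt from the question and the candidate answers.
        # 3. Run the model and map its output back to one of the candidates.
        raise NotImplementedError("Implement inference for your own model.")


def evaluate(model, annotations):
    """Compute multiple-choice accuracy over a list of QA items (hypothetical schema)."""
    correct = 0
    for item in annotations:
        prediction = model.answer(
            video_path=item["video"],
            question=item["question"],
            candidates=item["candidates"],
        )
        correct += int(prediction == item["answer"])
    return correct / max(len(annotations), 1)
```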

## Acknowledgement

Thanks to the following open-source projects: Chat-UniVi, mPLUG-Owl, Valley, VideoChat, VideoChat2, Video-ChatGPT, Video-LLaMA, Video-LLaVA.