
# Animal-Bench: Benchmarking Multimodal Video Models for Animal-centric Video Understanding

This codebase provides the data and code of our NeurIPS 2024 paper:

Yinuo Jing, Ruxu Zhang, Kongming Liang*, Yongxiang Li, Zhongjiang He, Zhanyu Ma and Jun Guo, "Animal-Bench: Benchmarking Multimodal Video Models for Animal-centric Video Understanding", in Proceedings of Neural Information Processing Systems (NeurIPS), 2024.

## 👀 About Animal-Bench


Previous benchmarks (left) relied on a limited set of agents, and the scenarios in editing-based benchmarks are unrealistic. Our proposed Animal-Bench (right) includes diverse animal agents and various realistic scenarios, and encompasses 13 different tasks.

### Task Demonstration


**Effectiveness evaluation results**

**Robustness evaluation results**

## Evaluation on Animal-Bench

**Data**: You can access and download the MammalNet, Animal Kingdom, LoTE-Animal, MSRVTT-QA, TGIF-QA, and NExT-QA datasets to obtain the data used in the paper, or you can use your own data.

**Annotations**: You can find our question-answer pair annotation files in /data.
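
The exact schema is defined by the files in /data; as a minimal sketch, assuming each annotation file is a JSON list of question-answer entries with hypothetical keys such as `video`, `question`, `candidates`, and `answer`, the annotations could be loaded like this:

```python
import json
from pathlib import Path

def load_annotations(path):
    """Load one QA annotation file (hypothetical schema: a JSON list of dicts)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Iterate over every annotation file shipped in /data.
for ann_file in sorted(Path("data").glob("*.json")):
    for item in load_annotations(ann_file):
        # Hypothetical field names; adjust to the actual keys used in /data.
        video_path = item.get("video")
        question = item.get("question")
        candidates = item.get("candidates", [])
        answer = item.get("answer")
        # ... feed (video_path, question, candidates) to your model here ...
```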

**Models**: We mainly referred to MVBench when writing the test code. You can follow the structure of the example model files in the /model folder to test your own model.
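
The snippet below is only an illustrative sketch of such a wrapper, not the actual interface used in /model; it assumes a hypothetical `answer` method for multiple-choice QA and a simple accuracy loop over the annotation items described above:

```python
class MyVideoModelWrapper:
    """Illustrative wrapper for plugging a custom multimodal video model
    into an MVBench-style evaluation loop (hypothetical interface)."""

    def __init__(self, checkpoint_path):
        # Load your own model weights here (placeholder).
        self.checkpoint_path = checkpoint_path

    def answer(self, video_path, question, candidates):
        """Return the candidate string the model picks for one question."""
        # 1. Sample frames from video_path.
        # 2. Build a prompt from the question and the candidate answers.
        # 3. Run the model and map its output back to one of the candidates.
        raise NotImplementedError("Implement inference for your own model.")


def evaluate(model, annotations):
    """Compute multiple-choice accuracy over a list of QA items (hypothetical schema)."""
    correct = 0
    for item in annotations:
        prediction = model.answer(
            video_path=item["video"],
            question=item["question"],
            candidates=item["candidates"],
        )
        correct += int(prediction == item["answer"])
    return correct / max(len(annotations), 1)
```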

## Acknowledgement

Thanks to the following open-source projects: Chat-UniVi, mPLUG-Owl, Valley, VideoChat, VideoChat2, Video-ChatGPT, Video-LLaMA, Video-LLaVA.