This codebase provides the data and code for our NeurIPS 2024 paper:
Yinuo Jing, Ruxu Zhang, Kongming Liang*, Yongxiang Li, Zhongjiang He, Zhanyu Ma and Jun Guo, "Animal-Bench: Benchmarking Multimodal Video Models for Animal-centric Video Understanding", in Proceedings of Neural Information Processing Systems (NeurIPS), 2024.
Previous benchmarks (left) relied on a limited set of agents, and the scenarios in editing-based benchmarks are unrealistic. Our proposed Animal-Bench (right) includes diverse animal agents and various realistic scenarios, and encompasses 13 different tasks.
Task Demonstration
Effectiveness evaluation results:
Robustness evaluation results:
Data: You can access and download the MammalNet, Animal Kingdom, LoTE-Animal, MSRVTT-QA, TGIF-QA, and NExT-QA datasets to obtain the data used in the paper, or you can use your own data.
Annotations: You can find our question-answer pair annotation files in /data.
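A minimal sketch of loading one of the annotation files is shown below. It assumes the files are JSON lists of question-answer pairs; the file name and field names ("video", "question", "candidates", "answer") are illustrative assumptions, so check the actual files in /data for the exact schema.

```python
import json

# Load an annotation file from /data.
# File name and field names below are assumptions for illustration only;
# inspect the files in /data for the real schema.
with open("data/example_task.json", "r") as f:
    qa_pairs = json.load(f)

for item in qa_pairs[:3]:
    print(item.get("video"), item.get("question"))
    print("options:", item.get("candidates"), "| answer:", item.get("answer"))
```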
Models: Our test code is mainly based on MVBench. To evaluate your own model, follow the structure of the example model files in the /model folder.
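As a rough guide, a model wrapper might look like the sketch below. The class name, method names, and signatures are assumptions for illustration, not the repository's actual interface; mirror the files in /model when adding your own model.

```python
# Sketch of a model wrapper in the spirit of the examples in /model.
# All names and signatures here are hypothetical.
class MyVideoModelWrapper:
    def __init__(self, checkpoint_path: str):
        # Load weights, tokenizer/processor, etc. for your multimodal video model.
        self.checkpoint_path = checkpoint_path
        self.model = self._load_model(checkpoint_path)

    def _load_model(self, checkpoint_path: str):
        # Replace with your model's actual loading code.
        raise NotImplementedError

    def answer(self, video_path: str, question: str, candidates: list[str]) -> str:
        # Given a video and a multiple-choice question, return the chosen option.
        raise NotImplementedError
```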
We thank the following open-source projects: Chat-UniVi, mPLUG-Owl, Valley, VideoChat, VideoChat2, Video-ChatGPT, Video-LLaMA, Video-LLaVA.