CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
[NAACL 2025 🔥]

Overview

CAMEL-Bench is a Comprehensive Arabic LMM Benchmark designed to evaluate and improve the capabilities of Large Multimodal Models (LMMs) in the Arabic language. Our benchmark aims to bridge the gap in multimodal model evaluation for Arabic, a language with over 400 million speakers worldwide.

The benchmark includes eight diverse domains and 38 sub-domains to rigorously assess the performance of LMMs in visual reasoning and understanding tasks. It comprises over 29K questions, curated by native Arabic speakers, ensuring high-quality evaluation.

Key Features

Eight Domains of Evaluation: Multimodal Understanding and Reasoning, OCR and Document Understanding, Chart and Diagram Understanding, Video Understanding, Cultural-Specific Understanding, Medical Imaging, Agricultural Image Understanding, and Remote Sensing Understanding.
Over 29,000 Questions: Carefully curated by native Arabic speakers to ensure quality and accuracy.
Broad Scope: Evaluates models in domains such as medical imaging, cultural-specific understanding, and remote sensing.
Open and Closed Source Evaluation: We provide a leaderboard featuring results from both closed-source models (e.g., GPT-4o) and open-source LMMs.

📢 Latest Updates

Oct 2024 🔥 CAMEL-Bench is released on HuggingFace: CAMEL-Bench Dataset 🤗.
Jan 2025 🔥🔥 CAMEL-Bench is accepted to the NAACL 2025 conference.

Leaderboard

Our leaderboard provides a performance comparison of different models evaluated on CAMEL-Bench. Current top performers include GPT-4o with an overall score of 62% and other notable models such as Gemini-1.5-Pro.

Installation

To get started with CAMEL-Bench, clone the repository and install the dependencies:

$ git clone https://github.com/mbzuai-oryx/Camel-Bench.git
$ cd Camel-Bench
$ pip install -r requirements.txt

Getting Started

The benchmark can be easily executed using the provided scripts:

$ python scripts/eval_qwen.py

To evaluate your own model, modify the generate_qwen function in scripts/eval_qwen.py.
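
The internals of scripts/eval_qwen.py are not reproduced in this README, so the following is only a minimal sketch of the general idea: put your model behind a single generation function and score its answers. The names generate_my_model and evaluate, the sample keys image, question and answer, and the exact-match metric are all assumptions made for illustration; the repository's actual function signature and scoring may differ.

```python
from typing import Callable, Iterable

# Hypothetical sample layout: the real dataset fields may differ.
Sample = dict  # expected keys: "image", "question", "answer"


def generate_my_model(image: object, question: str) -> str:
    """Placeholder for your model call; replace the body with real inference."""
    # Assumption: the model takes one image and one Arabic question
    # and returns an Arabic answer string.
    return "نعم"


def normalize(text: str) -> str:
    """Trim and collapse whitespace so trivial spacing differences are ignored."""
    return " ".join(text.split())


def evaluate(samples: Iterable[Sample], generate: Callable[[object, str], str]) -> float:
    """Return exact-match accuracy of `generate` over `samples` (0.0 if empty)."""
    total = 0
    correct = 0
    for sample in samples:
        prediction = generate(sample["image"], sample["question"])
        correct += normalize(prediction) == normalize(sample["answer"])
        total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Made-up demo samples; no images are needed for the placeholder model.
    demo = [
        {"image": None, "question": "هل هذه صورة قطة؟", "answer": "نعم"},
        {"image": None, "question": "هل هذه صورة كلب؟", "answer": "لا"},
    ]
    print(f"accuracy: {evaluate(demo, generate_my_model):.2f}")
```

Keeping the model behind one callable means the evaluation loop stays unchanged when you switch models; only the body of the generation function needs to be rewritten.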

Dataset

Our dataset is hosted on HuggingFace, and can be accessed here: CAMEL-Bench Dataset 🤗.
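
As a rough sketch of how a Hub-hosted dataset can be inspected, the snippet below uses load_dataset from the Hugging Face datasets library. The repository ID and split name are deliberately left as caller-supplied arguments because they are not stated here, and the "domain" column name in the demo is an assumption; check the dataset card for the real values.

```python
from collections import Counter
from typing import Iterable, Mapping


def load_split(repo_id: str, split: str = "test"):
    """Load one split of a Hub dataset; `repo_id` and `split` are caller-supplied."""
    # Imported lazily so the helper below also works without the `datasets` package.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(repo_id, split=split)


def count_by_field(rows: Iterable[Mapping], field: str) -> Counter:
    """Count rows per value of `field`, e.g. how many questions each domain has."""
    return Counter(row.get(field, "<missing>") for row in rows)


if __name__ == "__main__":
    # Offline demo with made-up rows; the "domain" column name is an assumption.
    rows = [
        {"domain": "OCR", "question": "..."},
        {"domain": "OCR", "question": "..."},
        {"domain": "Medical Imaging", "question": "..."},
    ]
    print(count_by_field(rows, "domain"))
```

With a real repository ID, the result of load_split can be passed directly to count_by_field, since rows of a loaded split iterate as dictionaries.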

Citation

If you use CAMEL-Bench in your research, please consider citing:

@article{ghaboura2024camelbench,
  title={CAMEL-Bench: A Comprehensive Arabic LMM Benchmark},
  author={Sara Ghaboura, Ahmed Heakl, Omkar Thawakar, Ali Alharthi, Ines Riahi, Abduljalil Saif, Jorma Laaksonen, Fahad S. Khan, Salman Khan, Rao M. Anwer},
  journal={arXiv preprint arXiv:2410.18976},
  year={2024}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Join Us!

We welcome contributions to CAMEL-Bench! Open a pull request or an issue to get started.

Contact

Ahmed Heakl: [email protected]

For questions or suggestions, feel free to reach out to us on GitHub Discussions.
