
Commit

updated readme (#384)
firojalam authored Jan 21, 2025
1 parent 800926c commit 66a2970
Showing 1 changed file with 25 additions and 3 deletions.
28 changes: 25 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
# LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking

This repository contains code for the LLMeBench framework (described in <a href="https://arxiv.org/abs/2308.04945" target="_blank">this paper</a>). The framework currently supports evaluation of a variety of NLP tasks using **three** model providers: OpenAI (e.g., [GPT](https://platform.openai.com/docs/guides/gpt)), [HuggingFace Inference API](https://huggingface.co/docs/api-inference/), and Petals (e.g., [BLOOMZ](https://huggingface.co/bigscience/bloomz)); it can be seamlessly customized for any NLP task, LLM model and dataset, regardless of language.
This repository contains code for the LLMeBench framework (described in <a href="https://aclanthology.org/2024.eacl-demo.23/" target="_blank">this paper</a>). The framework currently supports evaluation of a variety of NLP tasks using **three** model providers: OpenAI (e.g., [GPT](https://platform.openai.com/docs/guides/gpt)), [HuggingFace Inference API](https://huggingface.co/docs/api-inference/), and Petals (e.g., [BLOOMZ](https://huggingface.co/bigscience/bloomz)); it can be seamlessly customized for any NLP task, LLM model and dataset, regardless of language.

<!---"https://github.com/qcri/LLMeBench/assets/3918663/15d989e0-edc7-489a-ba3b-36184a715383"--->

@@ -11,6 +11,7 @@ This repository contains code for the LLMeBench framework (described in <a href=
</p>

## Recent Updates
- 20th January, 2025 -- New assets added. Updated versions of the openai, anthropic, and sentence_transformers packages.
- 21st July, 2024 -- Multimodal capabilities have been added. Assets now include support for GPT-4 (OpenAI) and Sonnet (Anthropic).

## Overview
@@ -22,8 +23,8 @@ validated in LLMeBench." src="https://github.com/qcri/LLMeBench/assets/3918663/8
</p>

Developing **LLMeBench** is an ongoing effort and it will be continuously expanded. Currently, the framework features the following:
- Supports 34 [tasks](llmebench/tasks) featuring 7 [model providers](llmebench/models). Tested with 66 [datasets](llmebench/datasets) associated with 16 languages, resulting in **300 [benchmarking assets](assets/)** ready to run.
- Support for **multimodality**
- Supports 34 [tasks](llmebench/tasks) featuring 7 [model providers](llmebench/models). Tested with 66 [datasets](llmebench/datasets) associated with 16 languages, resulting in **800 [benchmarking assets](assets/)** ready to run.
- Support for **text, speech, and multimodality**
- Easily extensible to new models accessible through APIs.
- Extensive caching capabilities, to avoid costly API re-calls for repeated experiments.
- Supports zero- and few-shot learning paradigms.
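
The list above mentions ready-to-run benchmarking assets. For orientation, here is a minimal sketch of what such an asset looks like, following the config/prompt/post_process module layout described in the LLMeBench paper. The specific dataset, task, and model class names and arguments below are illustrative assumptions rather than content of this commit; consult the [assets](assets/) directory for authoritative, tested examples.

```
# Illustrative sketch of an LLMeBench benchmarking asset.
# Class names and arguments are assumptions; see the assets/ directory for real examples.
from llmebench.datasets import ArSASDataset   # assumed sentiment dataset class
from llmebench.models import OpenAIModel      # assumed OpenAI provider class
from llmebench.tasks import SentimentTask     # assumed task class


def config():
    # Wire together a dataset, a task, and a model provider.
    return {
        "dataset": ArSASDataset,
        "task": SentimentTask,
        "model": OpenAIModel,
        "model_args": {
            "class_labels": ["Positive", "Negative", "Neutral", "Mixed"],
            "max_tries": 3,
        },
    }


def prompt(input_sample):
    # Build the zero-shot prompt sent to the model for a single sample.
    return [
        {
            "role": "user",
            "content": (
                "Classify the sentiment of the following tweet as "
                "Positive, Negative, Neutral, or Mixed.\n\n" + input_sample
            ),
        }
    ]


def post_process(response):
    # Extract the predicted label from the raw provider response
    # (the access path below assumes the OpenAI chat-completions response shape).
    return response["choices"][0]["message"]["content"].strip()
```

A saved asset is then selected and run by filtering on its filename (for example, something like `python -m llmebench --filter '*ZeroShot*' assets/ results/`); check the usage section of the README for the exact invocation.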
@@ -127,4 +128,25 @@ Please cite our papers when referring to this framework:
address = {Malta},
publisher = {Association for Computational Linguistics},
}
```

Please consider citing the following papers if you use assets derived from them.

```
@inproceedings{kmainasi2024native,
title={Native vs non-native language prompting: A comparative analysis},
author={Kmainasi, Mohamed Bayan and Khan, Rakif and Shahroor, Ali Ezzat and Bendou, Boushra and Hasanain, Maram and Alam, Firoj},
booktitle={International Conference on Web Information Systems Engineering},
pages={406--420},
year={2024},
organization={Springer}
}
@article{hasan2024nativqa,
title={{NativQA}: Multilingual culturally-aligned natural query for {LLMs}},
author={Hasan, Md Arid and Hasanain, Maram and Ahmad, Fatema and Laskar, Sahinur Rahman and Upadhyay, Sunaya and Sukhadia, Vrunda N and Kutlu, Mucahid and Chowdhury, Shammur Absar and Alam, Firoj},
journal={arXiv preprint arXiv:2407.09823},
year={2024}
}
```
