You can refer to the structure of the `src/eval_suite/benchs/exampleqa` folder, which serves as a minimal benchmark example. Additionally, you might want to check the `src/eval_suite/benchs/base_dataset.py` and `src/eval_suite/benchs/base_evaluator.py` files, as they provide the base classes for benchmarks.
- **Creating a Benchmark Folder**
  - Create a new folder under the `benchs` directory (an illustrative layout follows this list).
  - The folder should contain the dataset, evaluator, and any other necessary files.
  - The folder should include a `README.md` file to provide an overview of the benchmark and any specific instructions for running it.
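For orientation, a new benchmark folder might look like the layout below. The name `myqa` and the exact file set are hypothetical; they simply follow the conventions described in the remaining steps, and `src/eval_suite/benchs/exampleqa` is the authoritative reference.

```
src/eval_suite/benchs/myqa/
├── dataset.py       # dataset loading logic (see "Dataset" below)
├── eval_myqa.py     # evaluator, named eval_{benchmark_name}.py
└── README.md        # benchmark-specific documentation
```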
- **Dataset**
  - Ensure the benchmark folder includes a `dataset.py` file, which contains the dataset loading logic.
  - Implement a subclass that inherits from `BaseDataset`.
  - The subclass must implement the `load` method, which returns a `list[dict]`, where each element is an evaluation data sample and must contain a unique `id` field (see the sketch after this list).
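A minimal `dataset.py` sketch is shown below. The class name, the sample fields other than `id`, and the import path of `BaseDataset` are assumptions for illustration; check `base_dataset.py` and the `exampleqa` benchmark for the real interface.

```python
# dataset.py: minimal sketch; the import path and sample fields are assumptions.
from eval_suite.benchs.base_dataset import BaseDataset


class MyQADataset(BaseDataset):
    """Hypothetical question-answering dataset with hard-coded samples."""

    def load(self) -> list[dict]:
        # Each element is one evaluation sample and must carry a unique "id".
        return [
            {"id": "myqa-0001", "question": "What is 2 + 2?", "answer": "4"},
            {"id": "myqa-0002", "question": "What is the capital of France?", "answer": "Paris"},
        ]
```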
- **Evaluator**
  - The folder must include an evaluator script, typically named `eval_{benchmark_name}.py`, which implements the evaluation logic.
  - This script should define a subclass of `BaseEvaluator` that contains the benchmark's evaluation logic.
  - The subclass should implement `set_generation_configs` to determine the default token generation settings for the LLM during evaluation.
  - Implement `load_batched_dataset` to load the batched dataset from `dataset.py`.
  - Implement `scoring` to evaluate one data item from the dataset, returning the evaluation result in a dictionary format.
  - Implement `compute_overall` to aggregate the evaluation results into an overall assessment, returning a dictionary with the final evaluation results (see the sketch after this list).
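The sketch below illustrates these four methods. The method signatures, the `self.model` attribute, the batching logic, and the import paths are assumptions made for illustration; `base_evaluator.py` and the `exampleqa` evaluator define the actual interface.

```python
# eval_myqa.py: illustrative sketch; signatures and the self.model attribute
# are assumptions, not the framework's guaranteed API.
from eval_suite.benchs.base_evaluator import BaseEvaluator

from .dataset import MyQADataset


class MyQAEvaluator(BaseEvaluator):
    def set_generation_configs(self) -> None:
        # Default token-generation settings used when querying the LLM.
        self.generation_configs = {"max_tokens": 32, "temperature": 0.0}

    def load_batched_dataset(self, batch_size: int = 8) -> list[list[dict]]:
        # Load the samples from dataset.py and group them into batches.
        samples = MyQADataset().load()
        return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

    def scoring(self, data_point: dict) -> dict:
        # Evaluate a single sample; self.model.generate is a hypothetical call.
        prediction = self.model.generate(data_point["question"])
        return {"id": data_point["id"], "correct": prediction.strip() == data_point["answer"]}

    def compute_overall(self, results: list[dict]) -> dict:
        # Aggregate per-item results into the final benchmark metrics.
        return {
            "accuracy": sum(r["correct"] for r in results) / len(results),
            "num_samples": len(results),
        }
```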
- **Registering the Benchmark**
  - Add the benchmark to the `__init__.py` file under the `benchs` directory to ensure it is discoverable by the framework.
  - Import the benchmark class in the `__init__.py` file and add it to the `__all__` list (see the snippet after this list).
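For example, the addition to `benchs/__init__.py` might look roughly like the snippet below; the class and module names are the hypothetical ones from the earlier sketches, and the existing entries are only indicated by a comment.

```python
# benchs/__init__.py: illustrative registration of the hypothetical MyQA benchmark.
from .myqa.eval_myqa import MyQAEvaluator

__all__ = [
    # ... existing benchmark classes ...
    "MyQAEvaluator",
]
```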
- **Documentation**
  - Create a `README.md` file in the benchmark folder.
  - Include any specific instructions or requirements for running the benchmark.
  - Add one line to the `README.md` file in the root directory, under the `## Eval Suite` section, to introduce the new benchmark.
You can refer to the `src/eval_suite/llms/huggingface.py` and `src/eval_suite/llms/openai_api.py` files as examples for loading LLMs.
- **Language Model Loader**
  - Create a new file under the `llms` directory.
  - The file should contain the logic for loading the LLM from a specific source (e.g., Hugging Face, OpenAI API).
- **Implementation Steps**
  - Implement a subclass that inherits from `BaseLLM`.
  - The subclass must implement `update_generation_configs` to handle parameter conversions, as different LLM loaders may have varying parameter names (e.g., `max_tokens` vs. `max_new_tokens`).
  - The subclass must implement `_request`, which accepts a `str` as input and returns a `str` as the generated output (see the sketch after this list).
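A possible loader sketch is shown below. The module name, class name, the location of `BaseLLM`, and the method signatures are guesses for illustration; `huggingface.py` and `openai_api.py` show the real pattern.

```python
# my_backend.py: sketch of a new LLM loader; module, class, and client details
# are hypothetical placeholders.
from eval_suite.llms.base_llm import BaseLLM  # assumed location of the base class


class MyBackendLLM(BaseLLM):
    def update_generation_configs(self, configs: dict) -> dict:
        # Translate the suite's generic parameter names into this backend's names,
        # e.g. "max_tokens" -> "max_new_tokens".
        configs = dict(configs)
        if "max_tokens" in configs:
            configs["max_new_tokens"] = configs.pop("max_tokens")
        return configs

    def _request(self, prompt: str) -> str:
        # Send the prompt to the backend and return the generated text.
        # A real implementation would call the backend's client here; this is a stub.
        return f"(stubbed response to: {prompt!r})"
```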
- **Registering the LLM Loader**
  - Register the new LLM loader in the `__init__.py` file under the `llms` directory.
  - Import the new LLM loader in the `__init__.py` file and add it to the `__all__` list (see the snippet after this list).
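As with the benchmark registration, the change to `llms/__init__.py` would look roughly like this (the loader name is the hypothetical one from the sketch above):

```python
# llms/__init__.py: illustrative registration of the new loader.
from .my_backend import MyBackendLLM

__all__ = [
    # ... existing loaders from huggingface.py and openai_api.py ...
    "MyBackendLLM",
]
```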