Evals is a synthetic data generation and evaluation framework for LLMs and RAG applications.
It has 2 main modules:
- `datagen`
- `eval`
A high-level architecture diagram of evals:

[Architecture diagram]
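For orientation, the files and directories referenced in this guide sit in the repository roughly as follows (only the paths mentioned in this README are shown; the actual tree contains more):

```
evals/
├── config/
│   ├── config.toml.template
│   └── config.toml          # your copy, created during setup
├── datagen/
│   ├── client.py            # sample data generation client
│   ├── data/                # place your source corpora here
│   └── qa_out/              # generated CSVs are written here
└── eval/
    └── client.py            # sample evaluation client
```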

To get started with evals, follow these steps:
- Clone the repository to your local machine.
- Install the necessary dependencies by running `pip install -r requirements.txt` in the project directory.
- Create a copy of `config/config.toml.template` and name it `config/config.toml`.
- Update the following sections in the `config.toml` file (a sketch of the resulting file appears after this list):
  - `MISC`
    - Configure your SSL cert file location.
  - `DATAGEN`
    - The `DATA_DIR` variable controls the location of the data corpus to generate synthetic data from; it is relative to the `datagen/data/` directory. In other words, add your data directories in there and specify their name in the variable.
    - The `GEN_PROVIDER` variable allows choosing between `azure` or `vertex`.
    - Add the rest of the variables desired for generative purposes.
  - `DATAEVAL`
    - The `EVAL_TESTS` variable takes a list of evaluation tests supported by the framework. The possible options are `AnswerRelevancy`, `Hallucination`, `Faithfulness`, `Bias`, `Toxicity`, `Correctness`, `Coherence`, `PromptInjection`, `PromptBreaking`, `PromptLeakage`.
    - The `EVAL_PROVIDER` variable allows choosing between `azure` or `vertex`.
    - Add the rest of the variables required for the model you want to use as the judge for evaluations.
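Below is a minimal sketch of what the resulting `config/config.toml` could look like. The section and variable names taken from the steps above are `MISC`, `DATAGEN`, `DATAEVAL`, `DATA_DIR`, `GEN_PROVIDER`, `EVAL_TESTS`, and `EVAL_PROVIDER`; the SSL key name, the example values, and the omitted provider settings are placeholders, so check `config/config.toml.template` for the exact keys:

```toml
[MISC]
# Hypothetical key name; use whatever the template defines for the SSL cert path.
SSL_CERT_FILE = "/etc/ssl/certs/ca-certificates.crt"

[DATAGEN]
# Name of a directory you created under datagen/data/ that holds your corpus.
DATA_DIR = "my_corpus"
# One of: "azure", "vertex".
GEN_PROVIDER = "azure"
# ...plus the remaining generation variables from the template.

[DATAEVAL]
# Any subset of the supported tests listed above.
EVAL_TESTS = ["AnswerRelevancy", "Faithfulness", "Toxicity"]
# One of: "azure", "vertex".
EVAL_PROVIDER = "azure"
# ...plus the judge-model variables from the template.
```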
To run the synthetic data generation module:

- Modify/adapt the sample client provided (`datagen/client.py`).
- Run `python -m datagen.client`.
- The synthetically generated data will be stored in the `datagen/qa_out/` directory as a CSV file with the format:

```csv
question,context,ground_truth
```
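For illustration only, a generated file could then look like the example below. The header row is the format prescribed above; the data row is an invented example, not real output:

```csv
question,context,ground_truth
"What does the DATA_DIR variable control?","The DATA_DIR variable controls the location of the data corpus to generate synthetic data from.","It selects which directory under datagen/data/ the source corpus is read from."
```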
To run the eval module:
- Modify/adapt the sample client provided (`eval/client.py`).
  - The input data needs to match the format of the data produced by the synthetic data generation (`question`, `context`, `ground_truth`).
  - The `ground_truth` column may or may not be used, depending on the `use_answers_from_dataset` setting. When set to `False`, it ignores that column and generates new outputs using the configured generative model.
- Start MLflow by running `mlflow ui --port 5000`.
- Run `python -m eval.client`.
- Monitor and analyse the eval results on your local MLflow interface at http://localhost:5000.
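Put together, a typical evaluation run from the repository root looks like the recap below. The commands are the ones listed above; the trailing `&`, which keeps the MLflow UI running in the background, is the only addition:

```sh
mlflow ui --port 5000 &    # start the MLflow tracking UI in the background
python -m eval.client      # run the configured evaluation tests
# then open http://localhost:5000 in a browser to inspect the results
```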