Evals is a synthetic data generation and evaluation framework for LLMs and RAG applications.
It has 2 main modules:
- `datagen`
- `eval`
A high-level architecture diagram of evals:

[Architecture diagram]
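For orientation, the files and directories referenced in this guide sit in the repository roughly as follows (only the paths mentioned in this README are shown; the actual tree contains more):

```
evals/
├── config/
│   ├── config.toml.template
│   └── config.toml          # your copy, created during setup
├── datagen/
│   ├── client.py            # sample data generation client
│   ├── data/                # place your source corpora here
│   └── qa_out/              # generated CSVs are written here
└── eval/
    └── client.py            # sample evaluation client
```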

To get started with evals, follow these steps:
- Clone the repository to your local machine.
- Install the necessary dependencies by running `pip install -r requirements.txt` in the project directory.
- Create a copy of `config/config.toml.template` and name it `config/config.toml`.
- Update the following sections in the `config.toml` file (a sketch of the resulting file appears after this list):
  - `MISC`
    - Configure your SSL cert file location.
  - `DATAGEN`
    - The `DATA_DIR` variable controls the location of the data corpus to generate synthetic data from; it is relative to the `datagen/data/` directory. In other words, add your data directories in there and specify their name in the variable.
    - The `GEN_PROVIDER` variable allows choosing between `azure` or `vertex`.
    - Add the rest of the variables desired for generative purposes.
  - `DATAEVAL`
    - The `EVAL_TESTS` variable takes a list of evaluation tests supported by the framework. The possible options are `AnswerRelevancy`, `Hallucination`, `Faithfulness`, `Bias`, `Toxicity`, `Correctness`, `Coherence`, `PromptInjection`, `PromptBreaking`, `PromptLeakage`.
    - The `EVAL_PROVIDER` variable allows choosing between `azure` or `vertex`.
    - Add the rest of the variables required for the model you want to use as the judge for evaluations.
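Below is a minimal sketch of what the resulting `config/config.toml` could look like. The section and variable names taken from the steps above are `MISC`, `DATAGEN`, `DATAEVAL`, `DATA_DIR`, `GEN_PROVIDER`, `EVAL_TESTS`, and `EVAL_PROVIDER`; the SSL key name, the example values, and the omitted provider settings are placeholders, so check `config/config.toml.template` for the exact keys:

```toml
[MISC]
# Hypothetical key name; use whatever the template defines for the SSL cert path.
SSL_CERT_FILE = "/etc/ssl/certs/ca-certificates.crt"

[DATAGEN]
# Name of a directory you created under datagen/data/ that holds your corpus.
DATA_DIR = "my_corpus"
# One of: "azure", "vertex".
GEN_PROVIDER = "azure"
# ...plus the remaining generation variables from the template.

[DATAEVAL]
# Any subset of the supported tests listed above.
EVAL_TESTS = ["AnswerRelevancy", "Faithfulness", "Toxicity"]
# One of: "azure", "vertex".
EVAL_PROVIDER = "azure"
# ...plus the judge-model variables from the template.
```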
To run the synthetic data generation module:

- Modify/adapt the sample client provided (`datagen/client.py`).
- Run `python -m datagen.client`.
- The synthetically generated data will be stored in the `datagen/qa_out/` directory as a CSV file with the format:

```csv
question,context,ground_truth
```
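For illustration only, a generated file could then look like the example below. The header row is the format prescribed above; the data row is an invented example, not real output:

```csv
question,context,ground_truth
"What does the DATA_DIR variable control?","The DATA_DIR variable controls the location of the data corpus to generate synthetic data from.","It selects which directory under datagen/data/ the source corpus is read from."
```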
To run the eval module:
- Modify/adapt the sample client provided (`eval/client.py`).
  - The input data needs to match the format of the data produced by the synthetic data generation (`question`, `context`, `ground_truth`).
  - The `ground_truth` column may or may not be used, depending on the `use_answers_from_dataset` setting. When set to `False`, it ignores that column and generates new outputs using the configured generative model.
- Start MLflow by running `mlflow ui --port 5000`.
- Run `python -m eval.client`.
- Monitor and analyse the eval results on your local MLflow interface at http://localhost:5000.
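Put together, a typical evaluation run from the repository root looks like the recap below. The commands are the ones listed above; the trailing `&`, which keeps the MLflow UI running in the background, is the only addition:

```sh
mlflow ui --port 5000 &    # start the MLflow tracking UI in the background
python -m eval.client      # run the configured evaluation tests
# then open http://localhost:5000 in a browser to inspect the results
```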