# 🚀 EvalGen Project

This project lets you create and run LLM judges built from annotated datasets, using Weights & Biases (wandb) and Weave for tracking and tracing.

## 🛠️ Setup

1. Create a `.env` file in the project root with the following variables (a sketch of loading them follows this list):

   ```
   WANDB_EMAIL=your_wandb_email
   WANDB_API_KEY=your_wandb_api_key
   OPENAI_API_KEY=your_openai_api_key
   ```

2. Install the required dependencies.
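
These variables need to be available to the app's process. A minimal sketch of loading them, assuming the `python-dotenv` package is used (an assumption; the project may read the environment differently):

```python
# Sketch: loading .env variables (assumes python-dotenv is installed).
import os

from dotenv import load_dotenv

load_dotenv()  # reads WANDB_EMAIL, WANDB_API_KEY, OPENAI_API_KEY from .env

wandb_key = os.environ["WANDB_API_KEY"]
openai_key = os.environ["OPENAI_API_KEY"]
```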

## 🏃‍♂️ Running the Annotation App

To start the annotation app, run:

```bash
python main.py
```

This will launch a web interface for annotating your dataset.

## 🧠 Creating an LLM Judge

To programmatically create an LLM judge from your wandb dataset annotations:

1. Open `forge_evaluation_judge.ipynb` in a Jupyter environment.
2. Run all cells in the notebook.

This will generate a judge like the one in `forged_judge`.
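
What the generated judge looks like depends on your annotations, but conceptually it is a Weave-tracked function that asks an LLM to grade an output against criteria distilled from your wandb annotations. The sketch below is illustrative only; the criterion text, model name, and function names are assumptions, not the actual generated code.

```python
# Illustrative sketch of a generated judge (criteria, model, and names are assumptions).
import weave
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment

# In the real notebook, the criteria are distilled from your annotations.
CRITERIA = "The answer must directly address the question and be concise."

@weave.op()
def forged_judge(question: str, answer: str) -> dict:
    """Ask an LLM whether `answer` satisfies the distilled criteria."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[
            {"role": "system", "content": f"You are a strict judge. Criteria: {CRITERIA}"},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}\nReply PASS or FAIL."},
        ],
    )
    verdict = (response.choices[0].message.content or "").strip()
    return {"pass": verdict.upper().startswith("PASS"), "raw": verdict}
```

With `weave.init("your-project")` called first, every call to the judge is traced automatically.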

## 🔍 Running the Generated Judge

To load and run the generated judge:

1. Open `run_forged_judge.ipynb` in a Jupyter environment.
2. Run all cells in the notebook.

This will evaluate your dataset using the forged judge, with results fully tracked and traced in Weave.
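
Conceptually, the evaluation loop resembles the sketch below, which uses Weave's `Evaluation` API with a tiny placeholder dataset, model, and scorer (all assumptions; the notebook drives your real dataset and the forged judge):

```python
# Sketch: evaluating a dataset with Weave (placeholder dataset, model, and scorer).
import asyncio

import weave

weave.init("evalgen")  # project name is an assumption

# Stand-in for your annotated wandb dataset.
dataset = [
    {"question": "What does EvalGen do?", "expected": "builds LLM judges"},
    {"question": "What traces the runs?", "expected": "Weave"},
]

@weave.op()
def system_under_test(question: str) -> str:
    """Placeholder for the system whose outputs you want judged."""
    return f"Stub answer to: {question}"

@weave.op()
def judge_scorer(expected: str, output: str) -> dict:
    """Placeholder scorer; the notebook calls the forged judge here instead."""
    return {"pass": expected.lower() in output.lower()}

evaluation = weave.Evaluation(dataset=dataset, scorers=[judge_scorer])
results = asyncio.run(evaluation.evaluate(system_under_test))
print(results)
```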

## 📊 Key Components

- `main.py`: Annotation app
- `forge_evaluation_judge.ipynb`: Judge creation notebook
- `run_forged_judge.ipynb`: Judge execution notebook

All components are integrated with Weave for comprehensive tracking and tracing of your machine learning workflow.

Happy evaluating! 🎉