A quick CLI tool to test whether one LLM outperforms another, based on the LLM-as-a-judge paper's method.
- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up the .env file:

  ```bash
  cp .env.sample .env
  ```

  By default it uses OLLAMA to run the judge models (a hypothetical example of the resulting .env is sketched after these steps).
- Run the code:

  ```bash
  python main.py
  ```
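For reference, a plausible .env might look like the sketch below. Both variable names are assumptions rather than confirmed contents (OLLAMA_HOST is Ollama's standard host variable; MODELS is inferred from the jury step below); the real keys come from .env.sample:

```dotenv
# Hypothetical example — the actual keys are defined in .env.sample
OLLAMA_HOST=http://localhost:11434  # assumed: where Ollama serves the judge models
MODELS=llama3,mistral,phi3          # assumed: comma-separated list of jury models
```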
The CLI will prompt for a question and for the responses from LLM A and LLM B, then run the benchmark using the models listed in MODELS as the jury.
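To make the jury step concrete, here is a minimal sketch of the LLM-as-a-judge loop — an illustration of the pattern, not the repo's actual main.py. It assumes the judge models are served by Ollama's local REST API and that MODELS is a comma-separated list of model names:

```python
# Minimal LLM-as-a-judge jury sketch (illustrative, not the repo's main.py).
# Assumes Ollama is running locally and MODELS is a comma-separated env var.
import os
import requests

OLLAMA_URL = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
JURY = os.environ.get("MODELS", "llama3,mistral").split(",")

PROMPT = """You are an impartial judge. Given a question and two answers,
reply with exactly "A" or "B" for the better answer, or "TIE".

Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}
"""

def ask_judge(model: str, question: str, answer_a: str, answer_b: str) -> str:
    """Ask one jury model for a verdict via Ollama's /api/chat endpoint."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": PROMPT.format(
                question=question, answer_a=answer_a, answer_b=answer_b)}],
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"].strip().upper()

def run_jury(question: str, answer_a: str, answer_b: str) -> dict:
    """Collect one vote per jury model and tally the results."""
    votes = {"A": 0, "B": 0, "TIE": 0}
    for model in JURY:
        verdict = ask_judge(model.strip(), question, answer_a, answer_b)
        # Count anything that isn't a clear A/B verdict as a tie.
        votes[verdict if verdict in votes else "TIE"] += 1
    return votes

if __name__ == "__main__":
    q = input("Question: ")
    a = input("LLM A's response: ")
    b = input("LLM B's response: ")
    print(run_jury(q, a, b))
```

A fuller implementation would typically also swap the order of the two answers on a second pass, since the LLM-as-a-judge paper reports position bias as a common judge failure mode.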