Working toward a functional state
Signed-off-by: Dan McPherson <[email protected]>
danmcp committed Jun 25, 2024
1 parent ad020ef commit 3e21919
Showing 28 changed files with 1,054 additions and 1,733 deletions.
10 changes: 9 additions & 1 deletion .spellcheck-en-custom.txt
@@ -2,5 +2,13 @@
# make spellcheck-sort
# Please keep this file sorted:
# SPDX-License-Identifier: Apache-2.0
eval
Tatsu
TODO
eval
gpt
instructlab
jsonl
justfile
openai
venv
vllm
101 changes: 100 additions & 1 deletion README.md
@@ -5,4 +5,103 @@
![Release](https://img.shields.io/github/v/release/instructlab/eval)
![License](https://img.shields.io/github/license/instructlab/eval)

Python library for Evaluation
Python Library for Evaluation

## MT-Bench / MT-Bench-Branch Testing Steps

```shell
# Optional: Use cloud-instance.sh to launch and set up the instance
./cloud-instance.sh ec2 launch -t g5.4xlarge
./cloud-instance.sh ec2 setup-rh-devenv
./cloud-instance.sh ec2 install-rh-nvidia-drivers
./cloud-instance.sh ec2 ssh sudo reboot
./cloud-instance.sh ec2 ssh


# Regardless of how you set up your instance
git clone https://github.com/instructlab/taxonomy.git && pushd taxonomy && git branch rc && popd
git clone --bare https://github.com/instructlab/eval.git && git clone eval.git/ && cd eval && git remote add syncrepo ../eval.git
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install -e .
pip install vllm==0.3.3
python -m vllm.entrypoints.openai.api_server --model instructlab/granite-7b-lab --tensor-parallel-size 1
```
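
Before running the tests, it can help to confirm that the server started above is reachable. The following is a minimal sketch, not part of this commit: it assumes the vLLM OpenAI-compatible server is listening on its default address `http://localhost:8000`, and the helper names `models_url` and `list_served_models` are hypothetical.

```python
import json
import urllib.request


def models_url(base_url="http://localhost:8000"):
    """Build the URL of the OpenAI-compatible model listing endpoint.

    The default base URL is an assumption (vLLM's default port); adjust it
    if the server was started with a different host or port.
    """
    return base_url.rstrip("/") + "/v1/models"


def list_served_models(base_url="http://localhost:8000", timeout=5):
    """Return the model ids reported by the server (hypothetical helper)."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    # The OpenAI-style response wraps the models in a "data" list.
    return [entry["id"] for entry in payload.get("data", [])]
```

If the server is up, calling `list_served_models()` should return a list that includes the model passed via `--model`; a connection error suggests the server is still loading or is listening elsewhere.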

In another shell window, generate the answers:

```shell
python3 tests/test_gen_answers.py
python3 tests/test_branch_gen_answers.py
```

Example output tree

```shell
eval_output/
├── mt_bench
│   └── model_answer
│       └── instructlab
│           └── granite-7b-lab.jsonl
└── mt_bench_branch
    ├── main
    │   ├── model_answer
    │   │   └── instructlab
    │   │       └── granite-7b-lab.jsonl
    │   ├── question.jsonl
    │   └── reference_answer
    │       └── instructlab
    │           └── granite-7b-lab.jsonl
    └── rc
        ├── model_answer
        │   └── instructlab
        │       └── granite-7b-lab.jsonl
        ├── question.jsonl
        └── reference_answer
            └── instructlab
                └── granite-7b-lab.jsonl
```

```shell
export INSTRUCT_LAB_EVAL_FIRST_N_QUESTIONS=10 # Optional if you want to shorten run times
python3 tests/test_judge_answers.py
python3 tests/test_branch_judge_answers.py
```
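
The optional `INSTRUCT_LAB_EVAL_FIRST_N_QUESTIONS` variable shortens a run by limiting how many questions are processed. How the library reads it is not shown in this diff, so the sketch below is only an illustration of the general pattern; `first_n_questions` is a hypothetical helper, and the fallback behavior for unset or invalid values is an assumption.

```python
import os


def first_n_questions(questions, env_var="INSTRUCT_LAB_EVAL_FIRST_N_QUESTIONS", environ=None):
    """Return the first N questions when env_var holds a positive integer.

    Hypothetical helper: an unset, non-numeric, or non-positive value
    leaves the question list unchanged (an assumption, not library behavior).
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(env_var)
    if raw is None:
        return list(questions)
    try:
        limit = int(raw)
    except ValueError:
        return list(questions)
    if limit <= 0:
        return list(questions)
    return list(questions[:limit])
```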

Example output tree

```shell
eval_output/
├── mt_bench
│   ├── model_answer
│   │   └── instructlab
│   │       └── granite-7b-lab.jsonl
│   └── model_judgment
│       └── instructlab
│           └── granite-7b-lab_single.jsonl
└── mt_bench_branch
    ├── main
    │   ├── model_answer
    │   │   └── instructlab
    │   │       └── granite-7b-lab.jsonl
    │   ├── model_judgment
    │   │   └── instructlab
    │   │       └── granite-7b-lab_single.jsonl
    │   ├── question.jsonl
    │   └── reference_answer
    │       └── instructlab
    │           └── granite-7b-lab.jsonl
    └── rc
        ├── model_answer
        │   └── instructlab
        │       └── granite-7b-lab.jsonl
        ├── model_judgment
        │   └── instructlab
        │       └── granite-7b-lab_single.jsonl
        ├── question.jsonl
        └── reference_answer
            └── instructlab
                └── granite-7b-lab.jsonl
```
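
To check that a run produced the files in the tree above, one option is to walk `eval_output/` and count the records in each `.jsonl` file. This is a rough sketch using only the standard library; `load_jsonl` and `summarize_outputs` are hypothetical names, and nothing is assumed about the fields inside each record.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Parse a JSON Lines file into a list of objects, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize_outputs(root="eval_output"):
    """Map each .jsonl file under root (relative POSIX path) to its record count."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): len(load_jsonl(path))
        for path in sorted(root.rglob("*.jsonl"))
    }
```

Printing the result of `summarize_outputs()` after each step gives a quick view of which answer and judgment files exist and how many entries each holds.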
80 changes: 0 additions & 80 deletions data/mt_bench/model_answer/instructlab/granite-7b-lab.jsonl

This file was deleted.

160 changes: 0 additions & 160 deletions data/mt_bench/model_judgment/gpt-4_single.jsonl

This file was deleted.

80 changes: 0 additions & 80 deletions data/mt_bench/reference_answer/gpt-4-turbo.jsonl

This file was deleted.

3 changes: 2 additions & 1 deletion requirements.txt
@@ -1,10 +1,11 @@
# SPDX-License-Identifier: Apache-2.0
FastChat
GitPython>=3.1.42,<4.0.0
shortuuid
openai<1.0.0
anthropic
psutil
torch
transformers
accelerate
pandas
pandas-stubs
6 changes: 6 additions & 0 deletions src/instructlab/eval/__init__.py
@@ -0,0 +1,6 @@
# Standard
import os

openai_api_key = os.environ.get("OPENAI_API_KEY")
if openai_api_key is None:
    os.environ["OPENAI_API_KEY"] = "NO_API_KEY"
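
The package initializer above sets a placeholder `OPENAI_API_KEY` when none is present, presumably so that client code talking to a local server does not fail on a missing key. The same pattern can be written with `dict.setdefault`, as in this sketch; `ensure_api_key` is a hypothetical function name, not something defined in this commit.

```python
import os


def ensure_api_key(environ=None, placeholder="NO_API_KEY"):
    """Set OPENAI_API_KEY to a placeholder only when it is not already set.

    Hypothetical helper mirroring the initializer's logic; an existing
    value (even an empty string) is left untouched.
    """
    environ = os.environ if environ is None else environ
    environ.setdefault("OPENAI_API_KEY", placeholder)
    return environ["OPENAI_API_KEY"]
```

Passing a plain dictionary as `environ` makes the behavior easy to exercise without modifying the real process environment.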