Fixing up test cases after API changes to add error_rate
Also added scripts/infra to the cloud-instance usage in the README to match the change to the CLI, and moved the first-N option up, since it now applies to answer generation as well.

Signed-off-by: Dan McPherson <[email protected]>
danmcp committed Jul 12, 2024
1 parent 6041f91 commit baa89ca
Showing 3 changed files with 11 additions and 9 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -10,12 +10,12 @@ Python Library for Evaluation
## MT-Bench / MT-Bench-Branch Testing Steps

```shell
-# Optional: Use cloud-instance.sh to launch and setup the instance
-./cloud-instance.sh ec2 launch -t g5.4xlarge
-./cloud-instance.sh ec2 setup-rh-devenv
-./cloud-instance.sh ec2 install-rh-nvidia-drivers
-./cloud-instance.sh ec2 ssh sudo reboot
-./cloud-instance.sh ec2 ssh
+# Optional: Use [cloud-instance.sh](https://github.com/instructlab/instructlab/tree/main/scripts/infra) to launch and setup the instance
+scripts/infra/cloud-instance.sh ec2 launch -t g5.4xlarge
+scripts/infra/cloud-instance.sh ec2 setup-rh-devenv
+scripts/infra/cloud-instance.sh ec2 install-rh-nvidia-drivers
+scripts/infra/cloud-instance.sh ec2 ssh sudo reboot
+scripts/infra/cloud-instance.sh ec2 ssh


# Regardless of how you setup your instance
@@ -33,6 +33,7 @@ python -m vllm.entrypoints.openai.api_server --model instructlab/granite-7b-lab
In another shell window

```shell
+export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10 # Optional if you want to shorten run times
python3 tests/test_gen_answers.py
python3 tests/test_branch_gen_answers.py
```
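As an aside on what the variable moved up in this hunk does: below is a minimal sketch of first-N truncation, assuming the library simply reads the value from the environment. The parsing and the placeholder question list are illustrative, not the library's actual code.

```python
# Illustrative sketch only: how a first-N cap like
# INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS can shorten a run.
import os

# Placeholder question set; the real tests load MT-Bench questions.
questions = ["q1", "q2", "q3", "q4"]

first_n = os.environ.get("INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS")
if first_n is not None:
    questions = questions[: int(first_n)]  # keep only the first N questions
print(f"Evaluating {len(questions)} question(s)")
```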
@@ -65,7 +66,6 @@ eval_output/
```

```shell
-export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=40 # Optional if you want to shorten run times
python3 tests/test_judge_answers.py
python3 tests/test_branch_judge_answers.py
```
3 changes: 2 additions & 1 deletion tests/test_branch_judge_answers.py
@@ -10,7 +10,8 @@
"../taxonomy",
"main",
)
-qa_pairs = mt_bench_branch.judge_answers("http://localhost:8000/v1")
+qa_pairs, error_rate = mt_bench_branch.judge_answers("http://localhost:8000/v1")
+print(f"Error Rate: {error_rate}")
print(f"QA Pair 0:")
pprint.pprint(qa_pairs[0])

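Pieced together from the hunk above, the updated branch test reads roughly as follows. Only the lines inside the hunk are confirmed by the diff; the import and the evaluator name and first two constructor arguments are assumptions made by analogy with tests/test_judge_answers.py below.

```python
# Hypothetical reconstruction of tests/test_branch_judge_answers.py after this
# commit; only the hunk lines are confirmed by the diff, the rest is assumed.
import pprint

from instructlab.eval.mt_bench import MTBenchBranchEvaluator  # assumed name

mt_bench_branch = MTBenchBranchEvaluator(
    "instructlab/granite-7b-lab",  # assumed, by analogy with the MT-Bench test
    "instructlab/granite-7b-lab",  # assumed
    "../taxonomy",
    "main",
)
qa_pairs, error_rate = mt_bench_branch.judge_answers("http://localhost:8000/v1")
print(f"Error Rate: {error_rate}")
print("QA Pair 0:")
pprint.pprint(qa_pairs[0])
```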
3 changes: 2 additions & 1 deletion tests/test_judge_answers.py
@@ -5,13 +5,14 @@
from instructlab.eval.mt_bench import MTBenchEvaluator

mt_bench = MTBenchEvaluator("instructlab/granite-7b-lab", "instructlab/granite-7b-lab")
-overall_score, qa_pairs, turn_scores = mt_bench.judge_answers(
+overall_score, qa_pairs, turn_scores, error_rate = mt_bench.judge_answers(
"http://localhost:8000/v1"
)

print(f"Overall Score: {overall_score}")
print(f"Turn 1 Score: {turn_scores[0]}")
print(f"Turn 2 Score: {turn_scores[1]}")
print(f"Error Rate: {error_rate}")
print(f"QA Pair 0:")
pprint.pprint(qa_pairs[0])

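For reference, here is the whole updated test once the hunk is applied; the `import pprint` line above the hunk is inferred from the `pprint.pprint` call, and everything else appears verbatim in the diff.

```python
# tests/test_judge_answers.py after this commit (reconstructed; the
# `import pprint` line above the hunk is inferred, the rest is from the diff).
import pprint

from instructlab.eval.mt_bench import MTBenchEvaluator

mt_bench = MTBenchEvaluator("instructlab/granite-7b-lab", "instructlab/granite-7b-lab")
overall_score, qa_pairs, turn_scores, error_rate = mt_bench.judge_answers(
    "http://localhost:8000/v1"
)

print(f"Overall Score: {overall_score}")
print(f"Turn 1 Score: {turn_scores[0]}")
print(f"Turn 2 Score: {turn_scores[1]}")
print(f"Error Rate: {error_rate}")
print("QA Pair 0:")
pprint.pprint(qa_pairs[0])
```

The diff does not define error_rate's semantics; presumably it reports the fraction of judgments that failed, so callers may want to treat a high value as a sign the scores are unreliable.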
