Working toward a functional state
Signed-off-by: Dan McPherson <[email protected]>
danmcp committed Jun 17, 2024
1 parent ad020ef commit b7dab55
Showing 5 changed files with 16 additions and 6 deletions.
2 changes: 1 addition & 1 deletion src/instructlab/eval/evaluator.py
@@ -5,7 +5,7 @@ class Evaluator:
     """
     Parent class for Evaluators
-    Atttributes:
+    Attributes:
         model_path  Path to the model to be evaluated
     """
 
2 changes: 1 addition & 1 deletion src/instructlab/eval/gen_api_answer.py
@@ -13,7 +13,7 @@
 import shortuuid
 import tqdm
 
-from .common import (
+from common import (
     load_questions,
     temperature_config,
     chat_completion_openai,

Check failure on line 16 in src/instructlab/eval/gen_api_answer.py (GitHub Actions / lint): E0401: Unable to import 'common' (import-error)
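The E0401 failure above comes from the bare `from common import (...)` form: a bare module name is resolved against `sys.path`, not against the enclosing package, so it breaks once the file lives inside `instructlab.eval`. A relative import (`from .common import ...`) or a fully qualified one resolves correctly. A minimal sketch of the difference, using a hypothetical package `demo_eval` built in a temporary directory (names and stub contents are illustrative only):

```python
# Sketch: why `from .common import ...` resolves inside a package while a bare
# `from common import ...` does not. Package and module names are hypothetical.
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_eval")
os.makedirs(pkg)

# Mark demo_eval as a package.
open(os.path.join(pkg, "__init__.py"), "w").close()

# demo_eval/common.py -- stands in for src/instructlab/eval/common.py
with open(os.path.join(pkg, "common.py"), "w") as f:
    f.write("def load_questions():\n    return []\n")

# demo_eval/gen_api_answer.py -- the relative import is resolved against the
# enclosing package, so it works when imported as demo_eval.gen_api_answer.
with open(os.path.join(pkg, "gen_api_answer.py"), "w") as f:
    f.write("from .common import load_questions\n")

sys.path.insert(0, root)
from demo_eval.gen_api_answer import load_questions

print(load_questions())  # the stub returns an empty question list
```

A bare `import common` inside `demo_eval/gen_api_answer.py` would only succeed if the `demo_eval` directory itself happened to be on `sys.path`, which is exactly the situation pylint's E0401 is flagging.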
2 changes: 1 addition & 1 deletion src/instructlab/eval/gen_judgment.py
@@ -8,7 +8,7 @@
 import numpy as np
 from tqdm import tqdm
 
-from instructlab.eval.common import (
+from common import (
     load_questions,
     load_model_answers,
     load_judge_prompts,

Check failure on line 11 in src/instructlab/eval/gen_judgment.py (GitHub Actions / lint): E0401: Unable to import 'common' (import-error)
6 changes: 3 additions & 3 deletions src/instructlab/eval/mtbench.py
@@ -4,7 +4,7 @@
 # Local
 from .evaluator import Evaluator
 import instructlab.eval.gen_api_answer as gen_api_answer
-import instructlab.eval.gen_judgement as gen_judgement
+import instructlab.eval.gen_judgment as gen_judgment
 
 
 class MT_Bench_Evaluator(Evaluator):

@@ -27,15 +27,15 @@ def gen_answers(self, answer_file, server_url) -> str:
     #def judge_answers(self, judge_endpoint) -> tuple:
     def judge_answers(self, judge_endpoint) -> str:
         """
-        Runs MT-Bench judgement
+        Runs MT-Bench judgment
         Returns:
             overall_score  MT-Bench score for the overall model evaluation
             qa_pairs       Question and answer pairs from the evaluation
         """
         os.environ['OPENAI_API_BASE'] = judge_endpoint
         os.environ['OPENAI_API_KEY'] = "NO_API_KEY"
-        output_file = gen_judgement.run(parallel=40)
+        output_file = gen_judgment.run(parallel=40)
         return output_file
 
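The `judge_answers` flow in the hunk above (point the OpenAI-compatible client at the judge endpoint via environment variables, then run judgment) can be sketched as follows. Here `_run_judgment` is a hypothetical stand-in for `gen_judgment.run`, and the endpoint and output path are illustrative only:

```python
import os

def _run_judgment(parallel=40):
    # Hypothetical stand-in for gen_judgment.run: the real function fans out
    # `parallel` judge requests and writes a judgment file, returning its path.
    return "data/mt_bench/model_judgment/judgment.jsonl"

def judge_answers(judge_endpoint):
    # Mirrors the committed method: the judge model is reached through an
    # OpenAI-compatible endpoint configured entirely via environment variables;
    # a placeholder key suffices because a local server does not check it.
    os.environ["OPENAI_API_BASE"] = judge_endpoint
    os.environ["OPENAI_API_KEY"] = "NO_API_KEY"
    return _run_judgment(parallel=40)

output_file = judge_answers("http://localhost:8000/v1")
print(output_file)
```

Passing configuration through `OPENAI_API_BASE` keeps the downstream MT-Bench scripts unchanged: anything that constructs an OpenAI client picks up the judge endpoint without new function parameters.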
10 changes: 10 additions & 0 deletions steps_mt_bench.md
@@ -1,5 +1,11 @@
 # MT-Bench Broken Down in Eval Scripts (not PR Bench)
 
+You should run with `--tensor-parallel-size <NUM GPUS>` and possibly increase `--max-model-len` to increase the context length
+
+```shell
+python -m vllm.entrypoints.openai.api_server --model instructlab/granite-7b-lab --served-model-name granite-7b-lab
+```
+
 ### From justfile: `run_bench`
 
 If dry run:

Check failure on line 9 in steps_mt_bench.md (GitHub Actions / markdown-lint): MD001/heading-increment Heading levels should only increment by one level at a time [Expected: h2; Actual: h3] https://github.com/DavidAnson/markdownlint/blob/v0.34.0/doc/md001.md
@@ -24,6 +30,10 @@ OPENAI_API_KEY="NO_API_KEY" python gen_api_answer.py \
 
 results are in data/mt_bench/model_answer/instructlab/granite-7b-lab.jsonl
 
+
+### TODO Figure out the right version. Latest fails with openai.types not found
+pip install vllm==0.3.3
+
 ### From justfile: `run_judge`
 
 For running judge model with vllm make sure you run with `--served-model-name gpt-4`

Check failure on line 33 in steps_mt_bench.md (GitHub Actions / markdown-lint): MD012/no-multiple-blanks Multiple consecutive blank lines [Expected: 1; Actual: 2] https://github.com/DavidAnson/markdownlint/blob/v0.34.0/doc/md012.md

Check failure on line 34 in steps_mt_bench.md (GitHub Actions / markdown-lint): MD022/blanks-around-headings Headings should be surrounded by blank lines [Expected: 1; Actual: 0; Below] [Context: "### TODO Figure out the right version. Latest fails with openai.types not found"] https://github.com/DavidAnson/markdownlint/blob/v0.34.0/doc/md022.md
