Working toward a functional state

Signed-off-by: Dan McPherson <[email protected]>
instructlab · Jun 25, 2024 · 3e21919
1 parent ad020ef

Showing 28 changed files with 1,054 additions and 1,733 deletions.
diff --git a/.spellcheck-en-custom.txt b/.spellcheck-en-custom.txt
@@ -2,5 +2,13 @@
 # make spellcheck-sort
 # Please keep this file sorted:
 # SPDX-License-Identifier: Apache-2.0
-eval
 Tatsu
+TODO
+eval
+gpt
+instructlab
+jsonl
+justfile
+openai
+venv
+vllm
diff --git a/README.md b/README.md
@@ -5,4 +5,103 @@
 ![Release](https://img.shields.io/github/v/release/instructlab/eval)
 ![License](https://img.shields.io/github/license/instructlab/eval)
 
-Python library for Evaluation
+Python Library for Evaluation
+
+## MT-Bench / MT-Bench-Branch Testing Steps
+
+```shell
+# Optional: Use cloud-instance.sh to launch and set up the instance
+./cloud-instance.sh ec2 launch -t g5.4xlarge
+./cloud-instance.sh ec2 setup-rh-devenv
+./cloud-instance.sh ec2 install-rh-nvidia-drivers
+./cloud-instance.sh ec2 ssh sudo reboot
+./cloud-instance.sh ec2 ssh
+
+
+# Regardless of how you set up your instance
+git clone https://github.com/instructlab/taxonomy.git && pushd taxonomy && git branch rc && popd
+git clone --bare https://github.com/instructlab/eval.git && git clone eval.git/ && cd eval && git remote add syncrepo ../eval.git
+python -m venv venv
+source venv/bin/activate
+pip install -r requirements.txt
+pip install -r requirements-dev.txt
+pip install -e .
+pip install vllm==0.3.3
+python -m vllm.entrypoints.openai.api_server --model instructlab/granite-7b-lab --tensor-parallel-size 1
+```
+
+In another shell window, generate the answers:
+
+```shell
+python3 tests/test_gen_answers.py
+python3 tests/test_branch_gen_answers.py
+```
+
+Example output tree
+
+```shell
+eval_output/
+├── mt_bench
+│   └── model_answer
+│       └── instructlab
+│           └── granite-7b-lab.jsonl
+└── mt_bench_branch
+    ├── main
+    │   ├── model_answer
+    │   │   └── instructlab
+    │   │       └── granite-7b-lab.jsonl
+    │   ├── question.jsonl
+    │   └── reference_answer
+    │       └── instructlab
+    │           └── granite-7b-lab.jsonl
+    └── rc
+        ├── model_answer
+        │   └── instructlab
+        │       └── granite-7b-lab.jsonl
+        ├── question.jsonl
+        └── reference_answer
+            └── instructlab
+                └── granite-7b-lab.jsonl
+```
+
+```shell
+export INSTRUCT_LAB_EVAL_FIRST_N_QUESTIONS=10 # Optional if you want to shorten run times
+python3 tests/test_judge_answers.py
+python3 tests/test_branch_judge_answers.py
+```
+
+Example output tree
+
+```shell
+eval_output/
+├── mt_bench
+│   ├── model_answer
+│   │   └── instructlab
+│   │       └── granite-7b-lab.jsonl
+│   └── model_judgment
+│       └── instructlab
+│           └── granite-7b-lab_single.jsonl
+└── mt_bench_branch
+    ├── main
+    │   ├── model_answer
+    │   │   └── instructlab
+    │   │       └── granite-7b-lab.jsonl
+    │   ├── model_judgment
+    │   │   └── instructlab
+    │   │       └── granite-7b-lab_single.jsonl
+    │   ├── question.jsonl
+    │   └── reference_answer
+    │       └── instructlab
+    │           └── granite-7b-lab.jsonl
+    └── rc
+        ├── model_answer
+        │   └── instructlab
+        │       └── granite-7b-lab.jsonl
+        ├── model_judgment
+        │   └── instructlab
+        │       └── granite-7b-lab_single.jsonl
+        ├── question.jsonl
+        └── reference_answer
+            └── instructlab
+                └── granite-7b-lab.jsonl
+```
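Note on the README hunk above: the two output trees list only `.jsonl` files, so a quick sanity check after each step is to count the records in every file under `eval_output/`. The sketch below is not part of this commit. The helper name `count_jsonl_records` is hypothetical, and it assumes only that each non-empty line in those files is one JSON object.

```python
import json
from pathlib import Path


def count_jsonl_records(root):
    """Return {relative path: record count} for every .jsonl file under root."""
    root = Path(root)
    counts = {}
    for path in sorted(root.rglob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            # json.loads raises on a malformed line, so a corrupt file fails loudly.
            records = [json.loads(line) for line in handle if line.strip()]
        counts[path.relative_to(root).as_posix()] = len(records)
    return counts


if __name__ == "__main__":
    # "eval_output" is the directory name shown in the README trees.
    for name, n in count_jsonl_records("eval_output").items():
        print(f"{n:6d}  {name}")
```

Run from the `eval` checkout, this prints one line per file. An empty listing after the answer-generation step would suggest the tests wrote their output somewhere else.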
diff --git a/data/mt_bench/model_answer/instructlab/granite-7b-lab.jsonl b/data/mt_bench/model_answer/instructlab/granite-7b-lab.jsonl
diff --git a/data/mt_bench/model_judgment/gpt-4_single.jsonl b/data/mt_bench/model_judgment/gpt-4_single.jsonl
diff --git a/data/mt_bench/reference_answer/gpt-4-turbo.jsonl b/data/mt_bench/reference_answer/gpt-4-turbo.jsonl
diff --git a/requirements.txt b/requirements.txt
@@ -1,10 +1,11 @@
 # SPDX-License-Identifier: Apache-2.0
 FastChat
+GitPython>=3.1.42,<4.0.0
 shortuuid
 openai<1.0.0
-anthropic
 psutil
 torch
 transformers
 accelerate
 pandas
+pandas-stubs
diff --git a/src/instructlab/eval/__init__.py b/src/instructlab/eval/__init__.py
@@ -0,0 +1,6 @@
+# Standard
+import os
+
+openai_api_key = os.environ.get("OPENAI_API_KEY")
+if openai_api_key is None:
+    os.environ["OPENAI_API_KEY"] = "NO_API_KEY"
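Note on the `__init__.py` hunk above: the six added lines give `OPENAI_API_KEY` a placeholder value when the variable is unset, presumably so that the pinned `openai<1.0.0` client can talk to a local OpenAI-compatible server (such as the vLLM one started in the README) without a real key. That motive is an assumption, since the commit does not state it. The same behavior can be sketched with `setdefault`. The wrapper `ensure_api_key` and its `env` parameter are hypothetical and exist only so the behavior can be exercised on a plain dict instead of the real environment.

```python
import os


def ensure_api_key(env=None, placeholder="NO_API_KEY"):
    """Set OPENAI_API_KEY to a placeholder only when it is absent.

    Returns the value that is in effect afterwards.
    """
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    # setdefault leaves an existing value untouched, matching the `is None` check.
    return env.setdefault("OPENAI_API_KEY", placeholder)


# Demonstration on plain dicts, so the real environment is not modified.
unset_env = {}
preset_env = {"OPENAI_API_KEY": "user-supplied-key"}
print(ensure_api_key(unset_env))   # placeholder is applied
print(ensure_api_key(preset_env))  # existing value is kept
```

Either form mutates `os.environ` at import time, so any code that imports `instructlab.eval` sees the placeholder when no key was exported beforehand.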