Add facilities for unit and functional tests
There was one unit test already existing and this commit adds a couple of starter unit tests for mt_bench and mmlu.

Signed-off-by: Dan McPherson <[email protected]>
Showing 23 changed files with 3,085 additions and 34 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
name: Test | ||
|
||
on: | ||
workflow_dispatch: | ||
push: | ||
branches: | ||
- "main" | ||
- "release-**" | ||
paths: | ||
- '**.py' | ||
- 'pyproject.toml' | ||
- 'requirements**.txt' | ||
- 'tox.ini' | ||
- 'scripts/*.sh' # Used by this workflow | ||
- '.github/workflows/test.yml' # This workflow | ||
pull_request: | ||
branches: | ||
- "main" | ||
- "release-**" | ||
paths: | ||
- '**.py' | ||
- 'pyproject.toml' | ||
- 'requirements**.txt' | ||
- 'tox.ini' | ||
- 'scripts/*.sh' # Used by this workflow | ||
- '.github/workflows/test.yml' # This workflow | ||
|
||
env: | ||
LC_ALL: en_US.UTF-8 | ||
|
||
defaults: | ||
run: | ||
shell: bash | ||
|
||
permissions: | ||
contents: read | ||
|
||
jobs: | ||
test: | ||
name: "test: ${{ matrix.python }} on ${{ matrix.platform }}" | ||
runs-on: "${{ matrix.platform }}" | ||
strategy: | ||
matrix: | ||
python: | ||
- "3.10" | ||
- "3.11" | ||
platform: | ||
- "ubuntu-latest" | ||
include: | ||
- python: "3.11" | ||
platform: "macos-latest" | ||
steps: | ||
- name: "Harden Runner" | ||
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1 | ||
with: | ||
egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs | ||
|
||
- name: Checkout | ||
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 | ||
with: | ||
# https://github.com/actions/checkout/issues/249 | ||
fetch-depth: 0 | ||
|
||
- name: Free disk space | ||
if: matrix.platform != 'macos-latest' | ||
uses: ./.github/actions/free-disk-space | ||
|
||
- name: Install the expect package | ||
if: startsWith(matrix.platform, 'ubuntu') | ||
run: | | ||
sudo apt-get install -y expect | ||
- name: Install tools on MacOS | ||
if: startsWith(matrix.platform, 'macos') | ||
run: | | ||
brew install expect coreutils bash | ||
- name: Setup Python ${{ matrix.python }} | ||
uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0 | ||
with: | ||
python-version: ${{ matrix.python }} | ||
cache: pip | ||
cache-dependency-path: | | ||
**/pyproject.toml | ||
**/requirements*.txt | ||
- name: Remove llama-cpp-python from cache | ||
run: | | ||
pip cache remove llama_cpp_python | ||
- name: Cache huggingface | ||
uses: actions/cache@6849a6489940f00c2f30c0fb92c6274307ccb58a # v4.1.2 | ||
with: | ||
path: ~/.cache/huggingface | ||
# config contains DEFAULT_MODEL | ||
key: huggingface-${{ hashFiles('src/instructlab/configuration.py') }} | ||
|
||
- name: Install dependencies | ||
run: | | ||
python -m pip install --upgrade pip | ||
python -m pip install tox "tox-gh>=1.2" | ||
- name: Run unit and functional tests with tox | ||
run: | | ||
tox | ||
- name: Remove llama-cpp-python from cache | ||
if: always() | ||
run: | | ||
pip cache remove llama_cpp_python | ||
test-workflow-complete: | ||
needs: ["test"] | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: Test Workflow Complete | ||
run: echo "Test Workflow Complete" |
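The `strategy.matrix` block in the workflow above combines a cross product with one `include` entry. The following is a minimal sketch of how that expansion yields the list of jobs, written as a simplified model rather than GitHub's actual implementation. The `expand` function is hypothetical; only the matrix values are taken from the workflow.

```python
from itertools import product

# Values copied from the workflow's strategy.matrix block.
matrix = {
    "python": ["3.10", "3.11"],
    "platform": ["ubuntu-latest"],
}
include = [{"python": "3.11", "platform": "macos-latest"}]


def expand(matrix, include):
    """Simplified model: cross product of the matrix plus include entries."""
    keys = list(matrix)
    jobs = [dict(zip(keys, combo)) for combo in product(*matrix.values())]
    for extra in include:
        # An include entry can only be merged into combinations whose
        # existing matrix values it would not overwrite.
        mergeable = [
            job
            for job in jobs
            if all(job.get(k, v) == v for k, v in extra.items() if k in matrix)
        ]
        if mergeable:
            for job in mergeable:
                job.update(extra)
        else:
            # Nothing to merge into, so the entry becomes its own job.
            jobs.append(dict(extra))
    return jobs


jobs = expand(matrix, include)
for job in jobs:
    print(f"test: {job['python']} on {job['platform']}")
```

Under this model the workflow runs three jobs: Python 3.10 and 3.11 on `ubuntu-latest`, plus Python 3.11 on `macos-latest`.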
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
# eval | ||
|
||
![Lint](https://github.com/instructlab/eval/actions/workflows/lint.yml/badge.svg?branch=main) | ||
![Tests](https://github.com/instructlab/eval/actions/workflows/test.yml/badge.svg?branch=main) | ||
![Build](https://github.com/instructlab/eval/actions/workflows/pypi.yaml/badge.svg?branch=main) | ||
![Release](https://img.shields.io/github/v/release/instructlab/eval) | ||
![License](https://img.shields.io/github/license/instructlab/eval) | ||
|
@@ -77,20 +78,32 @@ MMLU Branch is an adaptation of MMLU that is designed to test custom knowledge t | |
|
||
A teacher model is used to generate new multiple choice questions based on the knowledge document included in the taxonomy Git branch. A “task” is then constructed that references the newly generated answer choices. These tasks are then used to score the model’s grasp of new knowledge the same way MMLU works. Generation of these tasks is done as part of the [InstructLab SDG](https://github.com/instructlab/sdg) library. | ||
|
||
## MT-Bench / MT-Bench Branch Testing Steps | ||
## Development | ||
|
||
> **⚠️ Note:** Must use Python version 3.10 or later. | ||
### Set up your dev environment | ||
|
||
The following tools are required: | ||
|
||
- [`git`](https://git-scm.com) | ||
- [`python`](https://www.python.org) (v3.10 or v3.11) | ||
- [`pip`](https://pypi.org/project/pip/) (v23.0+) | ||
- [`bash`](https://www.gnu.org/software/bash/) (v5+, for functional tests) | ||
|
||
#### Optional: Use [cloud-instance.sh](https://github.com/instructlab/instructlab/tree/main/scripts/infra) to launch and set up an instance | ||
|
||
```shell | ||
# Optional: Use cloud-instance.sh (https://github.com/instructlab/instructlab/tree/main/scripts/infra) to launch and setup the instance | ||
scripts/infra/cloud-instance.sh ec2 launch -t g5.4xlarge | ||
scripts/infra/cloud-instance.sh ec2 launch -t g6.2xlarge | ||
scripts/infra/cloud-instance.sh ec2 setup-rh-devenv | ||
scripts/infra/cloud-instance.sh ec2 install-rh-nvidia-drivers | ||
scripts/infra/cloud-instance.sh ec2 ssh sudo reboot | ||
scripts/infra/cloud-instance.sh ec2 ssh | ||
``` | ||
|
||
#### Regardless of how you set up your instance | ||
|
||
# Regardless of how you setup your instance | ||
``` | ||
Check failure on line 106 in README.md (GitHub Actions / markdown-lint): Fenced code blocks should have a language specified
|
||
git clone https://github.com/instructlab/taxonomy.git && pushd taxonomy && git branch rc && popd | ||
git clone --bare https://github.com/instructlab/eval.git && git clone eval.git/ && cd eval && git remote add syncrepo ../eval.git | ||
python3 -m venv venv | ||
|
@@ -99,6 +112,68 @@ pip install -r requirements.txt | |
pip install -r requirements-dev.txt | ||
pip install -e . | ||
pip install vllm | ||
``` | ||
|
||
### Testing | ||
|
||
Before pushing changes to GitHub, run the tests described below. They can be run individually, as shown in each subsection, | ||
or all at once with a single command: | ||
|
||
```shell | ||
tox | ||
``` | ||
|
||
#### Unit tests | ||
|
||
Unit tests are enforced by the CI system using [`pytest`](https://docs.pytest.org/). When making changes, run these tests before pushing the changes to avoid CI issues. | ||
|
||
Running unit tests can be done with: | ||
|
||
```shell | ||
tox -e py3-unit | ||
``` | ||
|
||
By default, all tests found within the `tests` directory are run. However, specific unit tests can be run by passing filenames, classes, and/or methods to `pytest` using tox positional arguments. The following example invokes a single test method, `test_mt_bench`, that is declared in the `tests/test_mt_bench.py` file: | ||
|
||
```shell | ||
tox -e py3-unit -- tests/test_mt_bench.py::test_mt_bench | ||
``` | ||
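To illustrate how the selectors above map onto test code, here is a minimal sketch of a test module. The file name `tests/test_example.py`, the helper `average_score`, and both tests are hypothetical and are not part of this repository; they only show which names a `file::function` or `file::Class::method` selector refers to.

```python
# tests/test_example.py (hypothetical file, for illustration only)


def average_score(scores):
    """Return the mean of a list of numeric scores (hypothetical helper)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores) / len(scores)


def test_average_score():
    # Selected with: tox -e py3-unit -- tests/test_example.py::test_average_score
    assert average_score([7, 8, 9]) == 8


class TestAverageScore:
    def test_empty_input_raises(self):
        # Selected with:
        #   tox -e py3-unit -- tests/test_example.py::TestAverageScore::test_empty_input_raises
        try:
            average_score([])
        except ValueError:
            return
        raise AssertionError("expected ValueError")
```

Passing only the file path runs every test in the module, while the longer selectors narrow the run to a single function or method.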
|
||
#### Functional tests | ||
|
||
Functional tests are enforced by the CI system. When making changes, run the tests before pushing the changes to avoid CI issues. | ||
|
||
Running functional tests can be done with: | ||
|
||
```shell | ||
tox -e py3-functional | ||
``` | ||
|
||
#### Coding style | ||
|
||
The code follows the Python [`pep8`](https://peps.python.org/pep-0008/) coding style. The coding style is enforced by the CI system, and your PR will fail until the style has been applied correctly. | ||
|
||
We use [pre-commit](https://pre-commit.com/) to enforce coding style using [`black`](https://github.com/psf/black) and [`isort`](https://pycqa.github.io/isort/). | ||
|
||
You can invoke formatting with: | ||
|
||
```shell | ||
tox -e ruff | ||
``` | ||
|
||
In addition, we use [`pylint`](https://www.pylint.org) to perform static code analysis of the code. | ||
|
||
You can invoke the linting with the following command: | ||
|
||
```shell | ||
tox -e lint | ||
``` | ||
|
||
### MT-Bench / MT-Bench Branch Example Usage | ||
|
||
Launch vllm serving granite-7b-lab | ||
|
||
```shell | ||
python -m vllm.entrypoints.openai.api_server --model instructlab/granite-7b-lab --tensor-parallel-size 1 | ||
``` | ||
|
||
|
@@ -107,8 +182,8 @@ In another shell window | |
```shell | ||
export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10 # Optional if you want to shorten run times | ||
# Commands relative to eval directory | ||
python3 tests/test_gen_answers.py | ||
python3 tests/test_branch_gen_answers.py | ||
python3 scripts/test_gen_answers.py | ||
python3 scripts/test_branch_gen_answers.py | ||
``` | ||
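The `INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS` variable above shortens a run by limiting how many questions are processed. The sketch below shows one way such a limit could be applied. It is an assumption for illustration, not the library's actual code, and `limit_questions` is a hypothetical helper.

```python
import os


def limit_questions(questions, env_var="INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS"):
    """Return only the first N questions when the variable is set (illustrative)."""
    raw = os.environ.get(env_var)
    if raw is None:
        # Variable not set: keep the full question list.
        return list(questions)
    return list(questions)[: int(raw)]


# Equivalent to `export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10` in the shell.
os.environ["INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS"] = "10"
subset = limit_questions(range(80))  # 80 placeholder questions
print(len(subset))  # prints 10
```

Leaving the variable unset keeps the full question set, which matches the "Optional" note in the shell snippet.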
|
||
Example output tree | ||
|
@@ -139,8 +214,8 @@ eval_output/ | |
``` | ||
|
||
```shell | ||
python3 tests/test_judge_answers.py | ||
python3 tests/test_branch_judge_answers.py | ||
python3 scripts/test_judge_answers.py | ||
python3 scripts/test_branch_judge_answers.py | ||
``` | ||
|
||
Example output tree | ||
|