
Add facilities for unit and functional tests
One unit test already existed; this commit adds a couple of starter unit tests for mt_bench and mmlu.

Signed-off-by: Dan McPherson <[email protected]>
danmcp committed Nov 1, 2024
1 parent bd42ab8 commit 7c901a8
Showing 23 changed files with 3,085 additions and 34 deletions.
119 changes: 119 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,119 @@
# SPDX-License-Identifier: Apache-2.0

name: Test

on:
  workflow_dispatch:
  push:
    branches:
      - "main"
      - "release-**"
    paths:
      - '**.py'
      - 'pyproject.toml'
      - 'requirements**.txt'
      - 'tox.ini'
      - 'scripts/*.sh' # Used by this workflow
      - '.github/workflows/test.yml' # This workflow
  pull_request:
    branches:
      - "main"
      - "release-**"
    paths:
      - '**.py'
      - 'pyproject.toml'
      - 'requirements**.txt'
      - 'tox.ini'
      - 'scripts/*.sh' # Used by this workflow
      - '.github/workflows/test.yml' # This workflow

env:
  LC_ALL: en_US.UTF-8

defaults:
  run:
    shell: bash

permissions:
  contents: read

jobs:
  test:
    name: "test: ${{ matrix.python }} on ${{ matrix.platform }}"
    runs-on: "${{ matrix.platform }}"
    strategy:
      matrix:
        python:
          - "3.10"
          - "3.11"
        platform:
          - "ubuntu-latest"
        include:
          - python: "3.11"
            platform: "macos-latest"
    steps:
      - name: "Harden Runner"
        uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
        with:
          egress-policy: audit # TODO: change to 'egress-policy: block' after a couple of runs

      - name: Checkout
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          # https://github.com/actions/checkout/issues/249
          fetch-depth: 0

      - name: Free disk space
        if: matrix.platform != 'macos-latest'
        uses: ./.github/actions/free-disk-space

      - name: Install the expect package
        if: startsWith(matrix.platform, 'ubuntu')
        run: |
          sudo apt-get install -y expect
      - name: Install tools on MacOS
        if: startsWith(matrix.platform, 'macos')
        run: |
          brew install expect coreutils bash
      - name: Setup Python ${{ matrix.python }}
        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
        with:
          python-version: ${{ matrix.python }}
          cache: pip
          cache-dependency-path: |
            **/pyproject.toml
            **/requirements*.txt
      - name: Remove llama-cpp-python from cache
        run: |
          pip cache remove llama_cpp_python
      - name: Cache huggingface
        uses: actions/cache@6849a6489940f00c2f30c0fb92c6274307ccb58a # v4.1.2
        with:
          path: ~/.cache/huggingface
          # config contains DEFAULT_MODEL
          key: huggingface-${{ hashFiles('src/instructlab/configuration.py') }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install tox "tox-gh>=1.2"
      - name: Run unit and functional tests with tox
        run: |
          tox
      - name: Remove llama-cpp-python from cache
        if: always()
        run: |
          pip cache remove llama_cpp_python
  test-workflow-complete:
    needs: ["test"]
    runs-on: ubuntu-latest
    steps:
      - name: Test Workflow Complete
        run: echo "Test Workflow Complete"
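To make the trigger and matrix settings in this workflow concrete, here is a minimal Python sketch that approximates how GitHub Actions would treat them. It assumes simplified semantics: in path filters only `*` (which does not cross `/`) and `**` (which does) are handled, and each `include` entry extends every original combination whose matrix values it would not overwrite, otherwise becoming a new combination. The helper names (`glob_to_regex`, `triggers`, `expand_matrix`, `conditional_steps`) are hypothetical and not part of the repository.

```python
# Hypothetical sketch approximating GitHub Actions behaviour for the Test
# workflow above; simplified semantics, not an official implementation.
import itertools
import re

# Copied from `strategy.matrix` in .github/workflows/test.yml.
MATRIX = {"python": ["3.10", "3.11"], "platform": ["ubuntu-latest"]}
INCLUDE = [{"python": "3.11", "platform": "macos-latest"}]

# Copied from the `paths` filter used for both `push` and `pull_request`.
PATHS = [
    "**.py",
    "pyproject.toml",
    "requirements**.txt",
    "tox.ini",
    "scripts/*.sh",
    ".github/workflows/test.yml",
]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob: `**` matches anything, `*` never crosses `/`."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def triggers(path: str) -> bool:
    """True if a changed file path matches any of the workflow's path filters."""
    return any(glob_to_regex(p).fullmatch(path) for p in PATHS)


def expand_matrix(matrix: dict, include: list) -> list:
    """Cartesian product of the matrix, then apply each `include` entry."""
    keys = list(matrix)
    combos = [dict(zip(keys, vals)) for vals in itertools.product(*matrix.values())]
    originals = [dict(c) for c in combos]  # original values must never be overwritten
    for extra in include:
        matched = False
        # zip stops at the original combinations, so new jobs are never extended.
        for combo, original in zip(combos, originals):
            if all(original[k] == v for k, v in extra.items() if k in original):
                combo.update(extra)  # extend the combination with the extra keys
                matched = True
        if not matched:
            combos.append(dict(extra))  # no compatible combination: new job
    return combos


def conditional_steps(platform: str) -> list:
    """Platform-specific steps whose `if:` condition holds for this job."""
    steps = []
    if platform != "macos-latest":
        steps.append("Free disk space")
    if platform.startswith("ubuntu"):
        steps.append("Install the expect package")
    if platform.startswith("macos"):
        steps.append("Install tools on MacOS")
    return steps


if __name__ == "__main__":
    for job in expand_matrix(MATRIX, INCLUDE):
        print(f"test: {job['python']} on {job['platform']}", conditional_steps(job["platform"]))
    print(triggers("src/instructlab/eval/mmlu.py"), triggers("README.md"))
```

Under these assumptions the matrix yields three jobs (3.10 and 3.11 on `ubuntu-latest`, plus 3.11 on `macos-latest`), because the `include` entry would overwrite an original `platform` or `python` value in both Ubuntu combinations. Only the macOS job installs tools through Homebrew and skips the disk-space cleanup, and a change to `README.md` alone would not trigger the workflow.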
4 changes: 4 additions & 0 deletions .spellcheck-en-custom.txt
@@ -6,6 +6,8 @@ Backport
backported
benchmarking
codebase
cli
dev
dr
eval
gpt
@@ -16,9 +18,11 @@ jsonl
justfile
MMLU
openai
pre
SDG
Tatsu
tl
TODO
tox
venv
vllm
6 changes: 6 additions & 0 deletions Makefile
@@ -54,3 +54,9 @@ spellcheck-sort: .spellcheck-en-custom.txt ## Sort spellcheck directory
.PHONY: verify
verify: check-tox ## Run linting, typing, and formatting checks via tox
	tox p -e fastlint,mypy,ruff

##@ Development

.PHONY: tests
tests: check-tox ## Run unit and type checks
	tox -e py3-unit,mypy
91 changes: 83 additions & 8 deletions README.md
@@ -1,6 +1,7 @@
# eval

![Lint](https://github.com/instructlab/eval/actions/workflows/lint.yml/badge.svg?branch=main)
![Tests](https://github.com/instructlab/eval/actions/workflows/test.yml/badge.svg?branch=main)
![Build](https://github.com/instructlab/eval/actions/workflows/pypi.yaml/badge.svg?branch=main)
![Release](https://img.shields.io/github/v/release/instructlab/eval)
![License](https://img.shields.io/github/license/instructlab/eval)
@@ -77,20 +78,32 @@ MMLU Branch is an adaptation of MMLU that is designed to test custom knowledge t

A teacher model is used to generate new multiple choice questions based on the knowledge document included in the taxonomy Git branch. A “task” is then constructed that references the newly generated answer choices. These tasks are then used to score the model’s grasp of new knowledge in the same way as MMLU. Generation of these tasks is done as part of the [InstructLab SDG](https://github.com/instructlab/sdg) library.

## MT-Bench / MT-Bench Branch Testing Steps
## Development

> **⚠️ Note:** Must use Python version 3.10 or later.
### Set up your dev environment

The following tools are required:

- [`git`](https://git-scm.com)
- [`python`](https://www.python.org) (v3.10 or v3.11)
- [`pip`](https://pypi.org/project/pip/) (v23.0+)
- [`bash`](https://www.gnu.org/software/bash/) (v5+, for functional tests)

#### Optional: Use [cloud-instance.sh](https://github.com/instructlab/instructlab/tree/main/scripts/infra) to launch and set up an instance

```shell
# Optional: Use cloud-instance.sh (https://github.com/instructlab/instructlab/tree/main/scripts/infra) to launch and setup the instance
scripts/infra/cloud-instance.sh ec2 launch -t g5.4xlarge
scripts/infra/cloud-instance.sh ec2 launch -t g6.2xlarge
scripts/infra/cloud-instance.sh ec2 setup-rh-devenv
scripts/infra/cloud-instance.sh ec2 install-rh-nvidia-drivers
scripts/infra/cloud-instance.sh ec2 ssh sudo reboot
scripts/infra/cloud-instance.sh ec2 ssh
```

#### Regardless of how you set up your instance

# Regardless of how you setup your instance
```shell
git clone https://github.com/instructlab/taxonomy.git && pushd taxonomy && git branch rc && popd
git clone --bare https://github.com/instructlab/eval.git && git clone eval.git/ && cd eval && git remote add syncrepo ../eval.git
python3 -m venv venv
@@ -99,6 +112,68 @@ pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install -e .
pip install vllm
```

### Testing

Before pushing changes to GitHub, run the tests described below. You can run each suite individually, as shown in its sub-section, or run them all with a single command:

```shell
tox
```

#### Unit tests

Unit tests are enforced by the CI system using [`pytest`](https://docs.pytest.org/). To avoid CI failures, run them locally before pushing your changes:

```shell
tox -e py3-unit
```

By default, all tests in the `tests` directory are run. To run specific unit tests, pass filenames, classes, and/or methods to `pytest` as tox positional arguments. The following example runs the single test function `test_mt_bench` declared in `tests/test_mt_bench.py`:

```shell
tox -e py3-unit -- tests/test_mt_bench.py::test_mt_bench
```
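The `tests/test_mt_bench.py::test_mt_bench` argument above is a pytest node ID: a file path optionally followed by `::Class` and `::function` parts. The Python sketch below builds such an ID and the matching tox command. The helpers `node_id` and `tox_unit_command` are hypothetical, and the forwarding of arguments after `--` to pytest is assumed from the description above rather than read from the project's `tox.ini`, which is not shown here.

```python
# Hypothetical helpers for selecting unit tests; not part of the repository.
import shlex
from typing import Optional


def node_id(path: str, cls: Optional[str] = None, func: Optional[str] = None) -> str:
    """Build a pytest node ID such as `tests/test_mt_bench.py::test_mt_bench`."""
    return "::".join(part for part in (path, cls, func) if part)


def tox_unit_command(*selectors: str) -> list:
    """Argument vector for `tox -e py3-unit`, optionally forwarding selectors."""
    cmd = ["tox", "-e", "py3-unit"]
    if selectors:
        # Assumed: tox passes everything after `--` to pytest as positional args.
        cmd += ["--", *selectors]
    return cmd


if __name__ == "__main__":
    cmd = tox_unit_command(node_id("tests/test_mt_bench.py", func="test_mt_bench"))
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Printing the joined command reproduces the `tox -e py3-unit -- tests/test_mt_bench.py::test_mt_bench` invocation shown above; passing several node IDs selects several tests in one run.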

#### Functional tests

Functional tests are enforced by the CI system. To avoid CI failures, run them locally before pushing your changes:

```shell
tox -e py3-functional
```

#### Coding style

This project follows the Python [PEP 8](https://peps.python.org/pep-0008/) coding style. The style is enforced by the CI system, and your PR will fail until it has been applied correctly.

We use [pre-commit](https://pre-commit.com/) to enforce coding style using [`black`](https://github.com/psf/black) and [`isort`](https://pycqa.github.io/isort/).

You can invoke formatting with:

```shell
tox -e ruff
```

In addition, we use [`pylint`](https://www.pylint.org) for static code analysis.

You can run the linter with:

```shell
tox -e lint
```

### MT-Bench / MT-Bench Branch Example Usage

Launch vllm serving granite-7b-lab:

```shell
python -m vllm.entrypoints.openai.api_server --model instructlab/granite-7b-lab --tensor-parallel-size 1
```

@@ -107,8 +182,8 @@ In another shell window
```shell
export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10 # Optional if you want to shorten run times
# Commands relative to eval directory
python3 tests/test_gen_answers.py
python3 tests/test_branch_gen_answers.py
python3 scripts/test_gen_answers.py
python3 scripts/test_branch_gen_answers.py
```

Example output tree
@@ -139,8 +214,8 @@ eval_output/
```

```shell
python3 tests/test_judge_answers.py
python3 tests/test_branch_judge_answers.py
python3 scripts/test_judge_answers.py
python3 scripts/test_branch_judge_answers.py
```

Example output tree
