[WIP] Add new GPU runner for E2E job, incorporate unit tests into existing runner #71

nathan-weinberg · 2024-07-16T13:46:42Z

This PR does the following:

- Adds a new AWS-based GPU runner job for full E2E runs (the current runner only sanity-checks the integration)
- Changes the existing GPU runner to run our unit test suite
- Modifies the unit tests so they can run in CI

Resolves #12

.github/workflows/e2e-nvidia-a10g-x1.yml

alimaredia · 2024-07-16T17:58:05Z

README.md

@@ -1,6 +1,7 @@
 # eval

 ![Lint](https://github.com/instructlab/eval/actions/workflows/lint.yml/badge.svg?branch=main)
+![Test](https://github.com/instructlab/eval/actions/workflows/test.yml/badge.svg?branch=main)


@nathan-weinberg on lines 38 and 39 of this file, can you mention how to run the tests with pytest, both individually and as a group?
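For reference, the kind of invocations the README could document might look like the sketch below. The `tests/` directory, the example file name, and the helper function names are assumptions for illustration, not paths or scripts confirmed by this PR; only the `pytest` path and `path::name` node-ID forms are standard pytest usage.

```shell
# Sketch only: test paths and names are placeholders, not files known to exist in this repo.
PYTHON="${PYTHON:-python3.11}"

# Run the whole suite as a group (assumes the tests live under tests/).
run_all_tests() {
  "$PYTHON" -m pytest tests/
}

# Run every test in one file, e.g. run_test_file tests/test_example.py
run_test_file() {
  "$PYTHON" -m pytest "$1"
}

# Run a single test by node ID, e.g. run_single_test tests/test_example.py test_something
run_single_test() {
  "$PYTHON" -m pytest "$1::$2"
}
```

Wrapping the commands in functions keeps the interpreter configurable through `PYTHON`; in a README the bare `python3.11 -m pytest ...` lines would probably be enough.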

alimaredia · 2024-07-16T18:06:07Z

.github/workflows/test.yml

-          python3 -m pip install .
+          python3.11 -m pip install .
+          # start llama-cpp server
+          ilab model download --repository instructlab/granite-7b-lab-GGUF --filename granite-7b-lab-Q4_K_M.gguf


Since what we want to do with ilab is pretty basic, I'd suggest we just install instructlab from pypi.

I would agree, except the current PyPI package is pretty out of date. Using it now would just require us to change a bunch of stuff when the next release comes out. Once 0.18.0 lands, that'll make sense.

I agree we should install from PyPI; installing instructlab from PyPI won't change the ilab model download command.

@nathan-weinberg we can install the beta releases with --pre and then remove that once 0.18.0 lands
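The `--pre` suggestion above could be sketched roughly as follows. `--pre` is pip's standard flag for allowing pre-release versions; the `ILAB_PRE` switch and the `install_ilab` function name are hypothetical, introduced here only so the flag is easy to drop once 0.18.0 is released.

```shell
# Sketch only: ILAB_PRE and install_ilab are illustrative names, not part of this PR.
PYTHON="${PYTHON:-python3.11}"

install_ilab() {
  if [ "${ILAB_PRE:-0}" = "1" ]; then
    # Allow pre-release (beta) versions of instructlab from PyPI.
    "$PYTHON" -m pip install --pre instructlab
  else
    # Stable releases only; the intended path once 0.18.0 has landed.
    "$PYTHON" -m pip install instructlab
  fi
}
```

In a workflow step this would be called as `ILAB_PRE=1 install_ilab` for now, and as plain `install_ilab` later.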

alimaredia · 2024-07-16T18:10:52Z

.github/workflows/test.yml

+          python3.11 -m pip install .
+          # start llama-cpp server
+          ilab model download --repository instructlab/granite-7b-lab-GGUF --filename granite-7b-lab-Q4_K_M.gguf
+          ilab model serve --model-path /home/runner/.local/share/instructlab/models/granite-7b-lab-Q4_K_M.gguf


In the functional test scripts we shut down the server to clean up. I wonder if we should split all of this into a shell script that manages ilab installation, server startup, pytest running, and server shutdown.

Really, we shouldn't need ilab at all to run unit or functional tests for this library - it should operate independently of the CLI

I'm gonna look into some other possible approaches

> Really, we shouldn't need ilab at all to run unit or functional tests for this library - it should operate independently of the CLI

I think this makes sense for unit tests, but functional ones should prob still use a server from ilab etc
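One way to read the shell-script suggestion in this thread is the minimal sketch below: start the server in the background, run the tests, and always shut the server down afterwards. It is not the project's actual script. The function name `run_with_server` is hypothetical, and the sketch skips the readiness check a real script would need before running tests against the server.

```shell
#!/usr/bin/env bash
# Sketch only: wraps "start server -> run tests -> stop server" in one function.

# run_with_server SERVER_CMD TEST_CMD
# Starts SERVER_CMD in the background, runs TEST_CMD, then stops the server
# whether or not the tests passed. Returns the exit status of TEST_CMD.
run_with_server() {
  local server_cmd="$1" test_cmd="$2" status=0

  # Word splitting is intentional so each command string can carry arguments.
  $server_cmd &
  local server_pid=$!

  # A real script would poll the server's port here instead of assuming it is ready.
  $test_cmd || status=$?

  # Shut the server down and reap it so no process lingers on the runner.
  kill "$server_pid" 2>/dev/null || true
  wait "$server_pid" 2>/dev/null || true

  return "$status"
}

# Example invocation (the tests/ path is an assumed location):
# run_with_server \
#   "ilab model serve --model-path /home/runner/.local/share/instructlab/models/granite-7b-lab-Q4_K_M.gguf" \
#   "python3.11 -m pytest tests/"
```

Returning the test command's status means a failing test run still fails the CI step, while the cleanup happens on both paths.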

alinaryan

Good stuff

alinaryan · 2024-07-16T21:49:09Z

.github/workflows/test.yml

-          python3 -m pip install .
+          python3.11 -m pip install .
+          # start llama-cpp server
+          ilab model download --repository instructlab/granite-7b-lab-GGUF --filename granite-7b-lab-Q4_K_M.gguf


I agree we should install from PyPI; installing instructlab from PyPI won't change the ilab model download command.

alinaryan · 2024-07-16T21:50:57Z

.github/workflows/test.yml

+          python3.11 -m pip install .
+          # start llama-cpp server
+          ilab model download --repository instructlab/granite-7b-lab-GGUF --filename granite-7b-lab-Q4_K_M.gguf
+          ilab model serve --model-path /home/runner/.local/share/instructlab/models/granite-7b-lab-Q4_K_M.gguf


> Really, we shouldn't need ilab at all to run unit or functional tests for this library - it should operate independently of the CLI

I think this makes sense for unit tests, but functional ones should prob still use a server from ilab etc

booxter · 2024-07-31T22:00:04Z

.gitignore

@@ -53,6 +53,7 @@ coverage.xml
 .hypothesis/
 .pytest_cache/
 cover/
+eval_output/


taxonomy too?

booxter · 2024-07-31T22:03:24Z

.github/workflows/e2e-nvidia-a10g-x1.yml

+      - name: Install ilab
+        run: |
+          export CUDA_HOME="/usr/local/cuda"
+          export LD_LIBRRY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"


LIBRRY :)
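The typo matters because a misspelled variable name is simply ignored by the dynamic loader, so the CUDA library directories would never be searched. A corrected version of the two exports might look like the sketch below; the `${VAR:+...}` guard is an addition for illustration, not something the PR contains.

```shell
export CUDA_HOME="/usr/local/cuda"
# ${VAR:+...} prepends the old value plus a colon only when LD_LIBRARY_PATH is set and non-empty,
# which avoids a leading empty entry (the loader treats an empty entry as the current directory).
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CUDA_HOME/lib64:$CUDA_HOME/extras/CUPTI/lib64"
```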

booxter · 2024-07-31T22:05:58Z

.github/workflows/e2e-nvidia-a10g-x1.yml

+          # Install the local version of eval before installing the CLI so PR changes are included
+          python3.11 -m pip install .
+
+          python3.11 -m pip install instructlab


(Question) AFAIU this assumes that the instructlab package never caps / pins the eval library in an incompatible way. Otherwise, this line could revert the eval package to the one from PyPI. Is that an acceptable assumption?
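One possible way to make that assumption explicit, offered as a sketch and not as what the PR does: pass the local checkout and the CLI to pip in a single command. pip's resolver then has to satisfy both requirements together, so an incompatible pin should surface as a resolution error instead of a silent replacement of the local package. The function name below is hypothetical.

```shell
# Sketch only: install_eval_with_cli is an illustrative name, not part of this PR.
PYTHON="${PYTHON:-python3.11}"

install_eval_with_cli() {
  # "." is the local eval checkout (run from the repository root);
  # instructlab is resolved from PyPI in the same resolver pass.
  "$PYTHON" -m pip install . instructlab
}
```

This trades a quiet downgrade for a loud failure, which is usually easier to notice in CI logs.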

nathan-weinberg · 2024-10-21T21:13:13Z

closing in favor of #155

russellb reviewed Jul 16, 2024

.github/workflows/e2e-nvidia-a10g-x1.yml (resolved)

russellb reviewed Jul 16, 2024

.github/workflows/e2e-nvidia-a10g-x1.yml (outdated, resolved)

nathan-weinberg force-pushed the gpu-runner branch 3 times, most recently from 7b5a2d5 to dfd6430, July 16, 2024 14:53

nathan-weinberg requested review from alimaredia and alinaryan July 16, 2024 14:57

nathan-weinberg force-pushed the gpu-runner branch 3 times, most recently from 055be45 to 9707f67, July 16, 2024 15:59

nathan-weinberg mentioned this pull request Jul 16, 2024 in instructlab/instructlab#1743, "fix: remove unused flag from e2e job configs" (merged)

nathan-weinberg changed the title ~~[WIP] Add GPU runner job~~ [WIP] Add new GPU runner for E2E job, incorporate unit tests into existing runner Jul 16, 2024

nathan-weinberg force-pushed the gpu-runner branch 3 times, most recently from ab0af26 to 4f4be69, July 16, 2024 17:20

alimaredia reviewed Jul 16, 2024

alinaryan reviewed Jul 16, 2024

booxter reviewed Jul 31, 2024

[WIP] Add new GPU runner for E2E tests

772bc74

Use previous GPU runner for unit tests

Signed-off-by: Nathan Weinberg <[email protected]>

nathan-weinberg force-pushed the gpu-runner branch from 4f4be69 to 772bc74, September 24, 2024 14:19

mergify bot added the CI/CD, documentation, testing, and ci-failure labels Sep 24, 2024

nathan-weinberg closed this Oct 21, 2024

nathan-weinberg deleted the gpu-runner branch November 4, 2024 20:57
