Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automate validation of pipeline configs #132

Merged
merged 14 commits into from
Jul 15, 2024

Conversation

russellb
Copy link
Member

@russellb russellb commented Jul 14, 2024

Closes #131

1cfd4bb Add jsonschema for validating pipeline configuration
c3bd917 Add a script that will validate pipeline configurations
18b4741 Add tox env for validating pipeline configuration
3090e65 Run pipeline config validation in CI

commit 1cfd4bb
Author: Russell Bryant [email protected]
Date: Sun Jul 14 12:50:45 2024 -0400

Add jsonschema for validating pipeline configuration

This commit introduces a `v1.json` jsonschema for use with validating
pipeline configuration. This follows the same format used in the
`instructlab/schema` repository which we use for validating taxonomy
`qna.yaml` files.

It's very likely that the schema is missing some details that are
allowed. It does pass against all configurations currently in the
source tree, though.

Signed-off-by: Russell Bryant <[email protected]>

commit c3bd917
Author: Russell Bryant [email protected]
Date: Sun Jul 14 12:53:43 2024 -0400

Add a script that will validate pipeline configurations

This script loads the pipeline validation schema and validates all
configurations under `src/instructlab/sdg/pipelines/`.

Signed-off-by: Russell Bryant <[email protected]>

commit 18b4741
Author: Russell Bryant [email protected]
Date: Sun Jul 14 12:52:42 2024 -0400

Add tox env for validating pipeline configuration

To run the script that validates pipeline configurations, you may now
run:

    tox -e validate-pipelines

or

    make validate-pipelines

Signed-off-by: Russell Bryant <[email protected]>

commit 3090e65
Author: Russell Bryant [email protected]
Date: Sun Jul 14 13:00:03 2024 -0400

Run pipeline config validation in CI

Updating the existing linting job to include pipeline config
validation, as well.

Signed-off-by: Russell Bryant <[email protected]>

russellb added 4 commits July 14, 2024 12:50
This commit introduces a `v1.json` jsonschema for use with validating
pipeline configuration. This follows the same format used in the
`instructlab/schema` repository which we use for validating taxonomy
`qna.yaml` files.

It's very likely that the schema is missing some details that are
allowed. It does pass against all configurations currently in the
source tree, though.

Signed-off-by: Russell Bryant <[email protected]>
This script loads the pipeline validation schema and validates all
configurations under `src/instructlab/sdg/pipelines/`.

Signed-off-by: Russell Bryant <[email protected]>
To run the script that validates pipeline configurations, you may now
run:

    tox -e validate-pipelines

or

    make validate-pipelines

Signed-off-by: Russell Bryant <[email protected]>
Updating the existing linting job to include pipeline config
validation, as well.

Signed-off-by: Russell Bryant <[email protected]>
@mergify mergify bot added CI/CD Affects CI/CD configuration testing Relates to testing labels Jul 14, 2024
@russellb russellb marked this pull request as ready for review July 14, 2024 17:07
@russellb
Copy link
Member Author

russellb commented Jul 14, 2024

Also, a follow-up enhancement to this would be to utilize the schema at runtime to validate custom schemas as soon as they are loaded and prior to trying to execute them.

This would require some special care, as right now, validation will fail if any fields are present that were not expected. The code currently allows that. It may occur with a config using a newer minor version that the code doesn't understand.

If we are OK rejecting any configs that are a newer version than what the code knows about, I think it would work OK.

markmc added 5 commits July 15, 2024 16:36
This is how a pipeline can specify a custom model to use for an
LLMBlock.

Signed-off-by: Mark McLoughlin <[email protected]>
These are LLMBlock specific.

Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
src/instructlab/sdg/pipelines/schema/v1.json Outdated Show resolved Hide resolved
src/instructlab/sdg/pipelines/schema/v1.json Outdated Show resolved Hide resolved
src/instructlab/sdg/pipelines/schema/v1.json Show resolved Hide resolved
src/instructlab/sdg/pipelines/schema/v1.json Outdated Show resolved Hide resolved


def main():
schema_path = "src/instructlab/sdg/pipelines/schema/v1.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about loading this from the instructlab.sdg package

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that would be better

with open(schema_path, "r") as file:
schema = json.load(file)

yaml_files = glob.glob("src/instructlab/sdg/pipelines/**/*.yaml", recursive=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And loading these from a command-line argument?

Then it can be a tool that can be used to validate custom pipeline configs using the installed instructlab.sdg schema?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds good.

We could even leave this as the default when you don't pass an argument

src/instructlab/sdg/pipelines/schema/v1.json Show resolved Hide resolved
src/instructlab/sdg/pipelines/schema/v1.json Outdated Show resolved Hide resolved
src/instructlab/sdg/pipelines/schema/v1.json Outdated Show resolved Hide resolved
src/instructlab/sdg/pipelines/schema/v1.json Outdated Show resolved Hide resolved
src/instructlab/sdg/pipelines/schema/v1.json Outdated Show resolved Hide resolved
Copy link
Contributor

@markmc markmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main outstanding comment is about making this a bit more general purpose, but that can come later

I think this is good to go ... I'm fine if you'd prefer to squash all my commits though

@markmc markmc added this to the 0.1.3 milestone Jul 15, 2024
@russellb
Copy link
Member Author

My main outstanding comment is about making this a bit more general purpose, but that can come later

I think this is good to go ... I'm fine if you'd prefer to squash all my commits though

Thanks for the additions! lgtm.

I'm going to merge as having reviewed each others additions to the PR.

@russellb russellb merged commit b49f961 into instructlab:main Jul 15, 2024
12 checks passed
@russellb
Copy link
Member Author

filed this for making the validation script more generally useful -- #139

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/CD Affects CI/CD configuration testing Relates to testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a schema for validating pipeline configuration
2 participants