Automate validation of pipeline configs #132
This commit introduces a `v1.json` JSON Schema for validating pipeline configurations. It follows the same format used in the `instructlab/schema` repository, which we use to validate taxonomy `qna.yaml` files. The schema very likely omits some details that are actually allowed, but it does pass against all configurations currently in the source tree. Signed-off-by: Russell Bryant <[email protected]>
This script loads the pipeline validation schema and validates all configurations under `src/instructlab/sdg/pipelines/`. Signed-off-by: Russell Bryant <[email protected]>
To run the script that validates pipeline configurations, you may now run `tox -e validate-pipelines` or `make validate-pipelines`. Signed-off-by: Russell Bryant <[email protected]>
Update the existing linting job to also run pipeline config validation. Signed-off-by: Russell Bryant <[email protected]>
Also, a follow-up enhancement would be to use the schema at runtime to validate custom pipeline configs as soon as they are loaded, before trying to execute them. This would require some care: validation currently fails if any unexpected fields are present, whereas the code allows them today. That situation can arise when a config uses a newer minor version that the code doesn't understand. If we are OK with rejecting any config whose version is newer than what the code knows about, I think it would work.
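The version gate suggested in that comment could be sketched roughly as follows. This is only an illustration under stated assumptions: the `version` key, its "major.minor" string format, the `SUPPORTED_VERSION` constant, and the `check_version` helper are hypothetical names, not taken from this PR.

```python
# Hypothetical sketch: reject pipeline configs that declare a version
# newer than the one this code understands, before schema validation.
SUPPORTED_VERSION = (1, 0)  # assumed: highest config version the code knows


class UnsupportedVersionError(ValueError):
    """Raised when a config declares a version newer than supported."""


def parse_version(text):
    """Parse an assumed "major.minor" value into a tuple of ints."""
    major, minor = str(text).split(".")
    return int(major), int(minor)


def check_version(config, supported=SUPPORTED_VERSION):
    """Return the parsed version, or raise if it is newer than supported."""
    version = parse_version(config["version"])
    if version > supported:
        raise UnsupportedVersionError(
            f"config version {version} is newer than supported {supported}"
        )
    return version
```

Running a check like this first would let strict validation (no unexpected fields) be applied only to configs the code is known to understand.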
This is how a pipeline can specify a custom model to use for an LLMBlock. Signed-off-by: Mark McLoughlin <[email protected]>
These are LLMBlock specific. Signed-off-by: Mark McLoughlin <[email protected]>
def main():
    schema_path = "src/instructlab/sdg/pipelines/schema/v1.json"
How about loading this from the `instructlab.sdg` package?
yes, that would be better
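One way to load the schema from the installed package rather than from a source-tree path is `importlib.resources`. The sketch below is an assumption, not the change that was made: it presumes `v1.json` ships as package data under `pipelines/schema/` inside `instructlab.sdg`, and `load_packaged_schema` is a hypothetical helper name.

```python
import json
from importlib import resources


def load_packaged_schema(
    package="instructlab.sdg",
    relative_path="pipelines/schema/v1.json",
):
    """Load a JSON schema shipped as data inside an installed package.

    Both defaults are assumptions derived from the source-tree path
    used by the script; the schema must be included as package data.
    """
    schema_file = resources.files(package).joinpath(relative_path)
    with schema_file.open("r", encoding="utf-8") as handle:
        return json.load(handle)
```

This needs Python 3.9 or newer for `resources.files`, and it works regardless of the current working directory.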
    with open(schema_path, "r") as file:
        schema = json.load(file)

    yaml_files = glob.glob("src/instructlab/sdg/pipelines/**/*.yaml", recursive=True)
And loading these from a command-line argument?
Then it can be a tool for validating custom pipeline configs against the installed `instructlab.sdg` schema.
Yeah, that sounds good.
We could even keep the current glob as the default when no argument is passed.
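A minimal sketch of that idea, using `argparse` with optional positional paths and falling back to the in-tree glob. The `resolve_config_paths` name and the `paths` argument are hypothetical; only the glob pattern comes from the script under review.

```python
import argparse
import glob

# Pattern taken from the script under review; used when no paths are given.
DEFAULT_PATTERN = "src/instructlab/sdg/pipelines/**/*.yaml"


def resolve_config_paths(argv=None):
    """Return the config files to validate.

    Explicit paths on the command line win; otherwise fall back to
    every YAML file matched by DEFAULT_PATTERN.
    """
    parser = argparse.ArgumentParser(description="Validate pipeline configs.")
    parser.add_argument(
        "paths",
        nargs="*",
        help="pipeline config files to validate (default: in-tree configs)",
    )
    args = parser.parse_args(argv)
    if args.paths:
        return list(args.paths)
    return sorted(glob.glob(DEFAULT_PATTERN, recursive=True))
```

Calling `resolve_config_paths(["my_pipeline.yaml"])` would return just that file, while calling it with no arguments keeps the current behavior.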
My main outstanding comment is about making this a bit more general purpose, but that can come later.
I think this is good to go. I'm fine if you'd prefer to squash all my commits, though.
Thanks for the additions! LGTM. I'm going to merge this, since we have each reviewed the other's additions to the PR.
Filed #139 to track making the validation script more generally useful.
Closes #131
1cfd4bb Add jsonschema for validating pipeline configuration
c3bd917 Add a script that will validate pipeline configurations
18b4741 Add tox env for validating pipeline configuration
3090e65 Run pipeline config validation in CI
commit 1cfd4bb
Author: Russell Bryant [email protected]
Date: Sun Jul 14 12:50:45 2024 -0400
commit c3bd917
Author: Russell Bryant [email protected]
Date: Sun Jul 14 12:53:43 2024 -0400
commit 18b4741
Author: Russell Bryant [email protected]
Date: Sun Jul 14 12:52:42 2024 -0400
commit 3090e65
Author: Russell Bryant [email protected]
Date: Sun Jul 14 13:00:03 2024 -0400