Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON Schema to GBNF integration tests #7790

Merged
merged 7 commits into from
Jun 22, 2024

Conversation

HanClinto
Copy link
Collaborator

@HanClinto HanClinto commented Jun 6, 2024

This adds a number of FAILING tests to exercise JSON schema conversion. These failures are good, as they show bugs in our current schema-to-grammar system (as noted by #7703 and #7789 ).

I originally started writing these tests to help me dig into the root causes behind #7703 , but I ran into some other issues along the way (hence #7789 ).

Of the failing tests, in particular, there are a number of "pass" test cases that are marked with TODO that SHOULD pass and currently do NOT.

The "fail" test cases that are improperly passing are not as big of a deal (these are things that we can't really expect to support, such as numeric minimums or ensuring uniqueness of array entries), so feel free to ignore those TODO entries.

This is meant as an aide to anyone else who wants to chip in on any of the json-schema features / bugs and wants an easier way to debug the json schema generation and test it against sample generations. Very helpfully the json-schema.org documentation often includes examples of matching and non-matching strings for example schemas in their documentation, and I have begun copying those into these integration tests directly.

@github-actions github-actions bot added the testing Everything test related label Jun 6, 2024
@HanClinto
Copy link
Collaborator Author

@ochafik Check out these latest tests -- I think you might like them. :)

@ochafik
Copy link
Collaborator

ochafik commented Jun 6, 2024

Awesome! Was plotting something similar in #7797 :-)

I'll take a deeper look in the coming days!

@HanClinto
Copy link
Collaborator Author

Awesome! Was plotting something similar in #7797 :-)

I'll take a deeper look in the coming days!

Haha -- we went down nearly identical roads. Your framework is much nicer than mine -- I'll pull your framework into this branch and update my tests accordingly. Then you should be able to pull it into your branch more easily if you want. Nicely done!!

@mofosyne mofosyne added the Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level label Jun 12, 2024
@HanClinto HanClinto force-pushed the tests-json-schema-integration branch from bd3ebb6 to cfccb6c Compare June 12, 2024 17:42
@HanClinto
Copy link
Collaborator Author

HanClinto commented Jun 12, 2024

Okay, ready for review / merge.

I incorporated the improvements from @ochafik 's test harness in #7797 , and also added a #define INCLUDE_FAILING_TESTS 1 at the top that can be commented / uncommented to exclude / include the failing test cases. This is so that those test cases don't break CI, but the PR can still be merged into master so that other PRs can build on it (#7840, #7797, #7863, etc).

@HanClinto HanClinto force-pushed the tests-json-schema-integration branch from 307619d to 488cbfa Compare June 12, 2024 23:45
Copy link
Collaborator

@ochafik ochafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, thanks @HanClinto !

fclose(string_file);
}

fprintf(stderr, "\n NOTE: Debug grammar file generated. To analyze this failure in detail, run the following command: ./gbnf-validator test-grammar-integration.grammar.gbnf test-grammar-integration.string.txt\n\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now ./llama-gbnf-validator :-D

test_schema(
"string w/ min length 3",
// Schema
R"""(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just indent across string boundary?

        R"""({
          "type": "string",
          "minLength": 3
        })""",

},
// Failing strings
{
"",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""",
"true"

);

test_schema(
"exotic formats (list)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would split "data", "uuid", "time" and "date-time" test cases

},
// Failing strings
{
"{}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

""",
""a",
"a"",
""\"",

// "By extension, even an empty object is valid"
R"""({})""",
// "By default, providing additional properties is valid"
#ifdef INCLUDE_FAILING_TESTS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for preparing these!

I would just move them down to the failed branch (w/ caveat header), hopefully I'll move them back up very soon :-) (not a fan of #ifdef dead code)

)""",
// Passing strings
{
"{\"b\": \"foo\", \"a\": \"bar\"}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: TBH I'd be tempted to use R"""({...})""" as soon as any backslash is needed, to make it easier to copy-paste things.

{
"{}", // Missing all required properties
"{\"productName\": \"A green door\", \"price\": 12.50, \"productId\": 1}", // Out of order properties
// TODO: The following line should fail, but currently it passes. `exclusiveMinimum` is not supported, as it would likely be too difficult to implement.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: exclusiveMinimum for number is on my roadmap :-)

{
"{}", // Missing all required properties
"{\"productName\": \"A green door\", \"price\": 12.50, \"productId\": 1}", // Out of order properties
// TODO: The following line should fail, but currently it passes. `exclusiveMinimum` is not supported, as it would likely be too difficult to implement.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncomment & move it up / to passing?

@HanClinto HanClinto force-pushed the tests-json-schema-integration branch from 488cbfa to 8713e57 Compare June 22, 2024 02:47
…n validation against auto-generated JSON-schema grammars.
…ing the ability to automatically output improperly failing grammars to debug output files so they can more easily be examined in the gbnf-validator program.
…n pass CI, but still be useful for other PRs that want to leverage the framework.
@HanClinto HanClinto force-pushed the tests-json-schema-integration branch from 8713e57 to 3bdf7c3 Compare June 22, 2024 02:54
@HanClinto HanClinto merged commit c5a8d4b into ggerganov:master Jun 22, 2024
63 of 64 checks passed
@HanClinto
Copy link
Collaborator Author

@ochafik Thank you very much for the review! I implemented most of your suggestions, but there were a couple that I didn't bother with (that I figure can get resolved in #7797 and others).

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jun 30, 2024
* Adding simple bare-bones test for end-to-end integration test for json validation against auto-generated JSON-schema grammars.

* Adding additional examples as documented in ggerganov#7789 . Also adding the ability to automatically output improperly failing grammars to debug output files so they can more easily be examined in the gbnf-validator program.

* Uncommenting formerly commented tests so that they fail for others who are attempting to reproduce the bugs.

* Merging improved schema test methods added by @ochafik in ggerganov#7797

* Adding #define to temporarily remove failing tests so that this PR can pass CI, but still be useful for other PRs that want to leverage the framework.

* Fixing nits from ochafik. Removing escape slashes, adding additional failing cases, fixing some other strings.

* Fixing grammar indentation to be consistent throughout file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level testing Everything test related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants