GBNF Validator Program #5948
I've been using this program for the past several days, and while it has been invaluable for debugging my grammars, it has a number of flaws. In hindsight, I should not have developed it this way at all, and should instead have oriented everything around sampling.cpp. I will attempt to rewrite this program. Please do not merge as-is.
We should look into exposing a llama_grammar_context API in llama.h and integrating the grammar parsing functionality in the core library. For now we can merge this as an intermediate stage.
It may be valuable to expose the validator as an internal API too, e.g. to write e2e tests for the JSON -> grammar conversion (see #5978) that check that non-schema-compliant JSON is rejected by the converted grammars.
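As a rough illustration of that kind of e2e check, the sketch below runs a table of inputs through a validator and counts the cases where the outcome differs from the expectation. Everything here is hypothetical: `grammar_case`, `count_failures`, and `toy_boolean_matcher` are illustration-only names, and the toy matcher merely stands in for a grammar converted from a boolean schema. It does not use llama.cpp's grammar code.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// One expectation: an input document and whether the grammar should accept it.
struct grammar_case {
    std::string input;
    bool        should_match;
};

// Runs every case through `matches` and returns the number of mismatches.
// `matches` is a placeholder for whatever validator ends up being exposed.
int count_failures(const std::vector<grammar_case> & cases,
                   const std::function<bool(const std::string &)> & matches) {
    int failures = 0;
    for (const grammar_case & c : cases) {
        if (matches(c.input) != c.should_match) {
            std::fprintf(stderr, "unexpected %s: %s\n",
                         c.should_match ? "rejection" : "acceptance",
                         c.input.c_str());
            ++failures;
        }
    }
    return failures;
}

// Toy matcher (assumption): stands in for a grammar converted from
// the schema {"type": "boolean"}; it is not a real grammar engine.
bool toy_boolean_matcher(const std::string & s) {
    return s == "true" || s == "false";
}
```

A test would pass both compliant and non-compliant inputs, for example `{"true", true}` and `{"1", false}`, and require `count_failures` to return zero.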
That's an interesting idea. There have been a couple of recent PRs (#6004, #5950) that have expanded the "core" capabilities of the grammar parser for validation, so now the piece of code that validates the grammar is pretty short:
This code is copied nearly verbatim from […]. The main reason I didn't use that function in my program is that I didn't want to bother with generating a full sampling context, but in hindsight maybe it wouldn't be so bad. Regardless, if you're generating sampling contexts in your server, then you should (as of those recent PRs) have enough information to at least do basic validation of your grammars. Would that be enough for your needs, or would something additional be helpful?
Thanks for the snippet. Actually, I've already added validation to the #5978 tests (see verify_expectations_parseable in test-json-schema-to-grammar.cpp), but this PR (cool job btw!) will allow going one step further, e.g. testing that the grammar generated for the JSON schema […]
Aaah, that makes more sense to me. Yes, that sounds really great! I'm not sure of the best way to integrate this code into the core -- I think my original intention was for my […] I'm finishing up the requested changes (moving away from streams to […]
Changes are in place -- ready for re-review.
@ggerganov I think I might have accidentally had this marked as draft before, which might have blocked it from getting merged. Is there anything more I should do before this can be merged in?
I didn't merge it because it was a draft and I wasn't sure if more things were planned. All good now!
* Revising GBNF validator program to be much simpler.
* Changing from streams to using cstdio
* Adding final newline character.
This adds a program to do validation of arbitrary example input files against a GBNF grammar file. I built this to help me in some other experiments that I'm doing with using GBNF to guarantee syntactically-correct queries when doing text-to-SQL, and I needed a utility to help me validate my grammar.
I originally started building this tool in an external language (Python), but found myself re-implementing the parser structure. I figured that it would be safer to use the implementation within llama.cpp itself to ensure 1-to-1 behavior matching, so I switched from Python to C++.
Usage is:
I'm not sold that I did it in a very good way. I had to expose a lot of the grammar functions and structures via the internal API, and it feels like it might be a little too hacked together. That said, it feels like a useful tool as-is, and I at least wanted to submit it for review to get some feedback.
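For readers who want a feel for the validation flow described above (feed the input through the grammar one character at a time and report where it first fails), here is a minimal self-contained sketch. It is not the code from this PR and does not use llama.cpp's internals: `toy_acceptor`, `validation_result`, `validate`, and `report` are hypothetical names, the acceptor hard-codes the single rule `root ::= [0-9]+ ("," [0-9]+)*`, and the messages printed by `report` are illustrative.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a grammar engine (NOT llama.cpp's API).
// It hard-codes the rule:  root ::= [0-9]+ ("," [0-9]+)*
struct toy_acceptor {
    bool need_digit = true; // true at the start and right after a comma

    // Returns false when `c` cannot continue any valid parse.
    bool accept(char c) {
        if (c >= '0' && c <= '9') { need_digit = false; return true; }
        if (c == ',' && !need_digit) { need_digit = true; return true; }
        return false;
    }

    // True when the input consumed so far is a complete match.
    bool complete() const { return !need_digit; }
};

struct validation_result {
    bool        ok;
    std::size_t error_pos; // index of the first rejected character,
                           // or input.size() if the input ended too early
};

// Feeds the input one character at a time and stops at the first rejection.
validation_result validate(const std::string & input) {
    toy_acceptor acc;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (!acc.accept(input[i])) {
            return {false, i};
        }
    }
    return {acc.complete(), input.size()};
}

// Prints a one-line verdict using cstdio rather than streams.
void report(const std::string & input) {
    const validation_result r = validate(input);
    if (r.ok) {
        std::printf("Input is valid according to the grammar.\n");
    } else {
        std::printf("Input is invalid at position %zu\n", r.error_pos);
    }
}
```

Returning the failure offset rather than a bare boolean is the part that makes such a tool useful for debugging a grammar, since it points at the first character the grammar cannot consume.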