-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grammars: x{min,max} repetition operator #6640
Conversation
…PRINT_ALL=1 to force all)
I like the pretty print options that you've put in here. One thing that we might want to consider is to port your pretty-print functions to gbnf-validator.cpp so that people can pretty-print arbitrary grammars? |
haha -- for sure! If nothing else, hopefully I'm good at casting doubt. :) Daydreaming about this problem, and I'm pondering tackling it a completely different way. An alternate approach I was dreaming about is adding a new operator type to So something like
Or something like that. Then when building the stacks in llama_advance_grammar, the REPEAT_N operator would essentially function like a rule ref, except that it would hydrate the previous rule and if the repeat .value is > 1, then append another REPEAT_N operator with the value-1. Not sure if I'm making sense or not, but what I like about this is that (hopefully) wouldn't need to hydrate specific rules for repeat = 1000 -- we could just set .value to 1000 and let it advance a bit more intelligently. No idea if this would work or not, but this is a very loose daydream of an idea that has been flitting around in my head for a few days, and I wanted to write it down before it escaped again. :) I've not made any progress on doing a POC for this or anything -- I'm just wondering about the feasibility of this approach right now. |
@HanClinto I did wonder about something like this, I think it would work and the stack might just need to be made of a (edit) vector of element reference + times it's been repeated. I wonder how much better it would perform, maybe it would allow breaking the new 100k repetition barrier? On my side I'm probably done for this week, but keen to explore the head set optimization route later (cf. #4218 (comment) ; incubating here, which is mostly painful refactoring & prep work, next step is to update the stack with head-plausible alternatives only when accepting a char). Edit: half of me is hoping you'll have implemented this alternative approach by the end of the weekend, the other half is hoping my silly head set code won't have bitrotten too much with fancy new alternative logic :-p |
Co-authored-by: Clint Herron <[email protected]>
@HanClinto Might want to add different types for repeat exactly N, repeat unbounded, and repeat up to N times. |
Okay -- I think that's probably enough of an encouragement for me to at least try taking a stab at it this weekend.
Yeah, that's what I'm dreaming of! :)
This sounds really hopeful! I don't think I'm as familiar with classical grammar optimizations as you are, but this sounds really reasonable. Looking through the code, I think that there is a LOT of possible optimization work to be done in I could be seriously off base, but regardless, I'm excited to see what you have in store -- especially if it's a generally proven optimization commonly applied to parsers. :)
haha -- fair enough. :D I'll take a stab at it and see how far I get. If I can't get something working in a weekend, it might be a deadend, so we'll see what Monday brings. BTW, hacking on this with you has been a TON of fun -- thank you!! |
The way I'm imagining it, LLAMA_GRETYPE_REPEAT_N would mean "repeat exactly N times" (with a possible special case where value == 0 means to repeat unbounded). I was originally imagining that repeating up to N times would be handled with LLAMA_GRETYPE_REPEAT_N with value == N, and a blank alternate rule to follow (similar to how Thank you for helping me think this through! I'm still not confident that it's going to work, but I'm increasingly confident that it would be worth exploring. Regardless, unless I'm really off base with my understanding of what the head set optimization involves, I don't imagine that this new token would step on those toes at all, and I think that's worth exploring independently. |
Love this work @ochafik! I plan to take a look soon, need to wrap my head around the parser changes and the new JSON schema converter (now in C++?). |
@ejones thanks!!
Love your work on grammar sampling btw! Took me a while to wrap my own head around it, it's smart! (Some related changes you might have missed: #6616, #6609, and upcoming #6644)
Hehe yeah, as I was improving the schema support in #5978 I realised I had to port the changes to JS, and thought it would be cool to have it in the server (like llama-cpp-python & Together AI do, w/ param |
@ejones @HanClinto lemme know if I can help clarify any changes :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry for taking so long to get back to this. This all looks good to me, and it will be great to get this merged in!
Howdy, {0,N} seems broken. It only ever returns 0.
Also, with a range of numbers, it always seems to try to return as many as possible even if it means repeating the last! |
Out of curiosity, what if you try this with a different model? I wonder if we're running into odd edge-case tokenization bugs. Also, I noticed different responses whether I had a space at the end of my input prompt or not. Maybe play with that as well? |
@ShelbyJenkins Depending on the models & tasks I've had more success sometimes by using greedy sampling ( If you are able to share a self-contained invocation CLI of a pathological case that would help! The example from this PR doesn't consume all the repetitions allowed even if cranked up to 10k, for instance: ./main \
-mu https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf \
--grammar 'root ::= [a-zA-Z.,: \n]{0,10000}' \
-p "Here is a haiku for you: " \
--no-display-prompt --log-disable --seed 42
|
For {0,n}:
For {n,n}: |
@ShelbyJenkins Oh this one is weirding me out indeed. A workaround seems to be Could you please open a bug so we can keep tracking this? Tested w/ Llama3 and Phi3, even tried adding things like ./main --log-disable --no-display-prompt \
-mu https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q8_0.gguf \
--grammar 'root ::= item{0,7}
item ::= "- " [^\r\n\x0b\x0c\x85\u2028\u2029]+ "\n"' \
-p "instructions: For each discrete topic in this text, provide a short ELI5 sentence describing the topic.\\nuser input: In computer science, Backus–Naur form (/ˌbækəs ˈnaʊər/) (BNF or Backus normal form) is a notation used to describe the syntax of programming languages or other formal languages. It was developed by John Backus and Peter Naur. BNF can be described as a metasyntax notation for context-free grammars. Backus–Naur form is applied wherever exact descriptions of languages are needed, such as in official language specifications, in manuals, and in textbooks on programming language theory. BNF can be used to describe document formats, instruction sets, and communication protocols.\\n\\nOver time, many extensions and variants of the original Backus–Naur notation have been created; some are exactly defined, including extended Backus–Naur form (EBNF) and augmented Backus–Naur form (ABNF). Invented in 1976." \
./main --log-disable --no-display-prompt \
-mu https://huggingface.co/bartowski/Phi-3-mini-4k-instruct-GGUF/resolve/main/Phi-3-mini-4k-instruct-Q8_0.gguf \
--grammar 'root ::= item{0,7}
item ::= "- " [^\r\n\x0b\x0c\x85\u2028\u2029]+ "\n"' \
-p "instructions: For each discrete topic in this text, provide a short ELI5 sentence describing the topic.\\nuser input: In computer science, Backus–Naur form (/ˌbækəs ˈnaʊər/) (BNF or Backus normal form) is a notation used to describe the syntax of programming languages or other formal languages. It was developed by John Backus and Peter Naur. BNF can be described as a metasyntax notation for context-free grammars. Backus–Naur form is applied wherever exact descriptions of languages are needed, such as in official language specifications, in manuals, and in textbooks on programming language theory. BNF can be used to describe document formats, instruction sets, and communication protocols.\\n\\nOver time, many extensions and variants of the original Backus–Naur notation have been created; some are exactly defined, including extended Backus–Naur form (EBNF) and augmented Backus–Naur form (ABNF). Invented in 1976." \ |
I'm trying to port this to python wrapper but I did something wrong, and god knows why he decided to re-implement everything in python. If someone's ready to help, feel free to add a review in there : abetlen/llama-cpp-python#1637 |
Add bounded repetition operators
x{n}
,x{,n}
,x{m,n}
,x{m,}
to GBNF (unified w/+
/*
/?
), and update JSON schema converters to use them.Also improved the parser test w/ support to pretty print expectations for easier update.
Notes:
--grammar 'root ::= .{10,100}'
to generate between 10 and 100 chars.it should be more efficient to handle it in the grammar itself(edit: no measurable performance improvement grammars: x{min,max} repetition operator #6640 (comment) but generated grammars are much simpler) (+ benefits non-JSON use cases)Rewrite rules
Used to be:
Now it's:
Which means in practice
*
/?
don't change but+
does:TODO before undrafting:
{n}
repetition+
/*
/?
)cc/ @HanClinto (thanks for casting doubts on the rules rewrite in #4218 (comment) !)
cc/ @ejones