convert_hf : fix Gemma v1 conversion #8597
Conversation
* convert_hf : allow renaming tokens, but with a warning
@compilade Have you tried running the converted model? It outputs garbage for me.
Resulted in:
full log
I tried bigger contexts and flash attention, but it made no difference. I had the same issue in #7992. Model with duplicated keys worked. With master:
Concept makes sense, but should check G's observation as well. Would it make sense to error on certain warnings by default, but not if …? Not sure you can easily fix models that were previously converted incorrectly.
@Galunid I did not try yet (didn't finish downloading), but now I've tried https://huggingface.co/google/gemma-1.1-2b-it and I can confirm it's outputting garbage. I'll try to figure out what's wrong. Thanks for warning me about this.
@mofosyne In this case, the thing which was prevented before was renaming tokens which were not …
In this case these models don't exist, because conversion failed previously. The warnings for token renaming replace assertions. Unless you're talking about the changes in #8228, which make the Gemma v1 tokenizer correctly pre-tokenize HTML tags. I'm turning this PR into a draft until I figure out whether the garbage output is caused by something related to the tokenizer, lazy conversion, or something else.
Wait, this seems wrong:
Also
Difference between master and PR seems to be
That's the model from
outputs nothing (that's for the model from this PR). Here's a diff between the two:
4c4
< GGUF.kv_count : [27]
---
> GGUF.kv_count : [33]
20c20
< general.file_type : [1]
---
> general.file_type : [15]
25a26,31
> tokenizer.ggml.bos_token_id : [2]
> tokenizer.ggml.eos_token_id : [1]
> tokenizer.ggml.unknown_token_id : [3]
> tokenizer.ggml.padding_token_id : [0]
> tokenizer.ggml.add_bos_token : [ True]
> tokenizer.ggml.add_eos_token : [False]
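The keys added on the right side of the diff are the tokenizer special-token metadata that the broken conversion never wrote. As a rough illustration (a hypothetical helper, not code from this PR), checking a converted model's metadata for those keys could look like:

```python
# Hypothetical check for the special-token keys shown missing in the diff.
REQUIRED_TOKENIZER_KEYS = [
    "tokenizer.ggml.bos_token_id",
    "tokenizer.ggml.eos_token_id",
    "tokenizer.ggml.unknown_token_id",
    "tokenizer.ggml.padding_token_id",
]

def missing_tokenizer_keys(metadata: dict) -> list[str]:
    """Return the special-token keys absent from a model's GGUF metadata."""
    return [k for k in REQUIRED_TOKENIZER_KEYS if k not in metadata]

# The broken conversion (left side of the diff) lacks all four keys:
broken = {"general.file_type": 1}
assert missing_tokenizer_keys(broken) == REQUIRED_TOKENIZER_KEYS

# The fixed conversion (right side) has them, e.g. BOS = 2, EOS = 1:
fixed = {
    "general.file_type": 15,
    "tokenizer.ggml.bos_token_id": 2,
    "tokenizer.ggml.eos_token_id": 1,
    "tokenizer.ggml.unknown_token_id": 3,
    "tokenizer.ggml.padding_token_id": 0,
}
assert missing_tokenizer_keys(fixed) == []
```

Without `tokenizer.ggml.bos_token_id`, the runtime can't prepend the BOS token Gemma expects, which matches the garbage output described above.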
Had this happen for me as well, and from the logs it was because the model was continuously outputting
Yes, this is because … Running it twice could work, but … Alternatively, making the …
@Galunid I think the problem has been fixed in 50d1a03. It still runs:

$ ./build/bin/llama-cli --log-disable -m models/gemma-1.1-2B-it-Q8_0.gguf -p "To make pizza," -n 60 --temp 0
To make pizza, you need to:
A. Preheat the oven to 500 degrees Fahrenheit.
B. Spread pizza dough on a baking sheet.
C. Bake the pizza for 10 minutes.
D. Add cheese and toppings to the pizza and bake for an additional 5 minutes.
That did the trick
* convert_hf : fix Gemma v1 conversion
* convert_hf : allow renaming tokens, but with a warning
* convert_hf : fix Gemma v1 not setting BOS and EOS tokens
Should fix #7897 and #7923. Conversion for Gemma v1 instruct models was broken by #7827, because the conversion for Gemma originally (and erroneously) relied on duplicated GGUF keys, which are no longer allowed since that PR.
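To see why the old conversion broke, here's a minimal sketch of the duplicate-key rejection (this is an illustration, not the actual gguf-py implementation):

```python
# Sketch: since #7827, writing the same GGUF key twice is an error
# instead of silently producing a duplicated key.
class StrictKVWriter:
    def __init__(self):
        self.kv: dict = {}

    def add_key_value(self, key: str, value):
        if key in self.kv:
            raise ValueError(f"duplicated key name {key!r}")
        self.kv[key] = value

w = StrictKVWriter()
w.add_key_value("tokenizer.ggml.bos_token_id", 2)
try:
    # The Gemma v1 path used to write some keys a second time; that
    # second write now fails, aborting the conversion.
    w.add_key_value("tokenizer.ggml.bos_token_id", 2)
except ValueError as e:
    print(e)
```

Once the second write aborts instead of being tolerated, the Gemma v1 path has to be restructured so each key is written exactly once, which is what this PR does.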
I've also allowed renaming tokens, but with a warning. This allows converting finetunes of Gemma like https://huggingface.co/Columbia-NLP/gemma-2b-zephyr-dpo which change the text of some control tokens.
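The rename-with-warning behavior can be sketched like this (a simplified illustration of the idea, not the exact code from the PR):

```python
import logging

logger = logging.getLogger("convert")

def set_token_text(tokens: list[str], token_id: int, text: str):
    """Replace a token's text, warning (instead of asserting) on mismatch.

    Previously a mismatch was an assertion failure, so finetunes that
    renamed control tokens could not be converted at all.
    """
    if tokens[token_id] != text:
        logger.warning(
            "replacing token %d: %r -> %r", token_id, tokens[token_id], text
        )
    tokens[token_id] = text

# A finetune that renamed a control token (hypothetical example):
tokens = ["<pad>", "<eos>", "<bos>", "<unk>"]
set_token_text(tokens, 2, "<start_of_turn>")  # warns, then renames
```

The warning keeps the mismatch visible in the conversion log while still letting the conversion finish.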
cc @bartowski1182, @maab19, @TobiasKlapper, since you've noticed that problem before.