convert : refactor rope_freqs generation #9396
Conversation
This should also fix vocab-only conversion for Phi-3.
LGTM. Thanks for the implementation!
Btw, you can use …
Related to issue Support for Phi-3 models #6849
Should we merge this, or wait for the rest of the tests in OP to be confirmed?
I ran the test locally and can confirm that it passes. Let's wait for final confirmation from @compilade to merge this.
Since #9322 was merged, MiniCPM3's conversion also has to be updated before merging this. I'll update it today.
MiniCPM3's tokenizer is treated as a SentencePiece tokenizer to avoid having to run its custom Python code which mixes tokenization in the same file as tool calls.
gguf-py : add long and short RoPE factors to tensor mappings
Empty, but the key names are used to populate the mappings.
@compilade Hey bro, when I try to convert MiniCPM3 to …
Will it be implemented by this PR? Thanks!
Yes, actually, in e83d270 I've changed how MiniCPM3's tokenizer is loaded to exactly avoid that custom code. It uses SentencePiece directly instead. I think it results in the same model files, but I didn't test that yet because I can't really run the custom tokenization code since it depends on … That was a single-line change in …
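(For reference, a minimal, self-contained sketch of what "using SentencePiece directly" amounts to. This is an illustration of reading the vocab from `tokenizer.model` with the sentencepiece library, not the converter's exact code; the file path is hypothetical.)

```python
# Illustration only: read the vocab straight from tokenizer.model with the
# sentencepiece library, instead of running the repo's custom tokenizer code.
from sentencepiece import SentencePieceProcessor

sp = SentencePieceProcessor()
sp.LoadFromFile("MiniCPM3-4B/tokenizer.model")  # hypothetical local path

# piece/score pairs per token id, roughly what a GGUF vocab is built from
tokens = [sp.IdToPiece(i) for i in range(sp.vocab_size())]
scores = [sp.GetScore(i) for i in range(sp.vocab_size())]
```

In convert_hf_to_gguf.py the equivalent is routing the model's vocab setup through the converter's generic SentencePiece path; the specific helper involved is not shown here.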
Got that, as we have been turning to NixOS (especially for production use) these days. Hope everything goes well! ❤️
* convert : refactor rope_freqs generation
  This should also fix vocab-only conversion for Phi-3.
* convert : adapt MiniCPM3 to separate rope_freqs insertion
  MiniCPM3's tokenizer is treated as a SentencePiece tokenizer to avoid having to run its custom Python code which mixes tokenization in the same file as tool calls.
* gguf-py : add long and short RoPE factors to tensor mappings
  Empty, but the key names are used to populate the mappings.
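To illustrate what "empty, but the key names are used to populate the mappings" means in gguf-py's tensor-name mapping. The exact enum members and dict layout below are assumptions modeled on gguf-py's TensorNameMap style, not quoted from the PR:

```python
# Hypothetical excerpt in the style of gguf-py's tensor mapping table:
# the RoPE-factor entries carry no Hugging Face source-name aliases (empty
# tuples), but registering the keys makes the GGUF-side names valid targets
# for tensors that the converter generates itself.
from gguf.constants import MODEL_TENSOR  # assumed import path

mappings_cfg: dict[MODEL_TENSOR, tuple[str, ...]] = {
    MODEL_TENSOR.ROPE_FACTORS_LONG: (),   # generated, e.g. rope_factors_long.weight
    MODEL_TENSOR.ROPE_FACTORS_SHORT: (),  # generated, e.g. rope_factors_short.weight
    # ... regular tensors map from one or more source tensor names ...
}
```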
Follow-up from #9117 (comment).
This isolates handling of generated tensors like `rope_freqs` for Llama3, Phi-3 and Exaone. This should also fix `--vocab-only` conversion for Phi-3-128k and Phi-3.5 (which previously generated invalid GGUF files because they included a non-zero tensor count while not including any tensor data).
Note that this will also be relevant for MiniCPM3 (#9322), which re-uses the misbehaving Phi-3 rope tensors insertion.
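A rough sketch of the idea behind the refactor, under assumptions about the converter's structure (class and method names such as generate_extra_tensors() are illustrative, not necessarily the PR's exact API): generated tensors like `rope_freqs` come from a dedicated per-model hook that only runs when tensor data is actually written, so a `--vocab-only` conversion no longer declares tensors it never emits.

```python
# Illustrative sketch only; names and structure are assumptions, not the PR's code.
from typing import Iterable, Tuple
import numpy as np

class Model:
    def __init__(self, vocab_only: bool):
        self.vocab_only = vocab_only
        self.tensors: list[Tuple[str, np.ndarray]] = []

    def generate_extra_tensors(self) -> Iterable[Tuple[str, np.ndarray]]:
        # Most models generate nothing beyond what the checkpoint provides.
        return ()

    def prepare_tensors(self) -> None:
        # Generated tensors go through the same collection path as regular ones.
        for name, data in self.generate_extra_tensors():
            self.tensors.append((name, data))

    def write(self) -> None:
        if not self.vocab_only:
            # Only a full conversion prepares tensors, so the tensor count in
            # the header always matches the tensor data actually written.
            self.prepare_tensors()
        # ... write metadata, vocab, and the (possibly empty) tensor list ...

class Phi3Model(Model):
    def generate_extra_tensors(self) -> Iterable[Tuple[str, np.ndarray]]:
        # RoPE scaling factors derived from the config, yielded like any tensor.
        yield ("rope_factors_long.weight", np.ones(16, dtype=np.float32))   # placeholder values
        yield ("rope_factors_short.weight", np.ones(16, dtype=np.float32))  # placeholder values
```

The point is that vocab-only output and tensor generation no longer interleave, which is what previously let Phi-3's `--vocab-only` files claim tensors without writing any data.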
TODO
- [ ] Test vocab-only conversion for Phi-3 (with `llama-tokenize` too)
- [ ] Check that a full conversion still contains `rope_freqs.weight`, and has the same checksum as a previous conversion I did a while ago with the same checkout of the upstream model
- [ ] Test MiniCPM3 conversion (with `llama-tokenize`)
- [ ] Run `tests/test-lora-conversion-inference.sh` (none of the tested models use `rope_freqs`, though)
- [ ] Merge `master`