
convert : refactor rope_freqs generation #9396

Merged

Conversation

@compilade (Collaborator) commented Sep 10, 2024

Follow-up from #9117 (comment).

This isolates handling of generated tensors like rope_freqs for Llama3, Phi-3 and Exaone. This should also fix --vocab-only conversion for Phi-3-128k and Phi-3.5 (which previously generated invalid GGUF files because they included a non-zero tensor count while not including any tensor data).
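As a concrete illustration of what a generated rope_freqs tensor contains, here is a self-contained sketch of Llama-3.1-style per-dimension RoPE scaling factors. The parameter values are illustrative defaults, not read from any real config, and this is a sketch of the scheme rather than the converter's exact code:

```python
import math

def llama3_rope_factors(head_dim=128, rope_theta=500000.0, factor=8.0,
                        low_freq_factor=1.0, high_freq_factor=4.0,
                        old_context_len=8192):
    """Per-dimension RoPE scaling factors in the spirit of the
    "llama3" rope_scaling scheme (illustrative defaults)."""
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    factors = []
    for i in range(0, head_dim, 2):
        freq = 1.0 / (rope_theta ** (i / head_dim))
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            factors.append(1.0)      # high-frequency dims stay unscaled
        elif wavelen > low_freq_wavelen:
            factors.append(factor)   # low-frequency dims get full scaling
        else:
            # smooth interpolation between the two regimes
            smooth = (old_context_len / wavelen - low_freq_factor) \
                / (high_freq_factor - low_freq_factor)
            factors.append(1.0 / ((1.0 - smooth) / factor + smooth))
    return factors
```

The converter would store such a vector as the rope_freqs.weight tensor, one entry per pair of rotary dimensions.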

Note that this will also be relevant for MiniCPM3 (#9322), which reuses the misbehaving Phi-3 rope tensor insertion.
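The invalid-GGUF symptom is easiest to see from the file header: the tensor count it advertises must match the tensor data that actually follows. A minimal sketch of the GGUF v3 header fields (only the documented magic/version/count fields, not a full writer):

```python
import struct

GGUF_MAGIC = 0x46554747  # the bytes "GGUF" read as a little-endian uint32
GGUF_VERSION = 3

def gguf_header(n_tensors: int, n_kv: int) -> bytes:
    # magic (u32), version (u32), tensor count (u64), metadata KV count (u64)
    return struct.pack("<IIQQ", GGUF_MAGIC, GGUF_VERSION, n_tensors, n_kv)

# A --vocab-only file contains no tensor data, so it must advertise 0 tensors;
# writing a non-zero count here is what made the old output invalid.
header = gguf_header(0, 42)
```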

TODO

  • Test Phi-3-mini-128k LoRA
  • Test Phi-3-mini-128k vocab-only validity (with llama-tokenize too)
  • Test Phi-3-mini-128k conversion
  • Test MiniCPM3 conversion
  • Test MiniCPM3 vocab-only validity
  • Test Llama-3.1 LoRA
  • Test Llama-3.1 conversion
    • includes rope_freqs.weight, and has the same checksum as a previous conversion I did a while ago with the same checkout of the upstream model
  • Test Llama-3.1 vocab-only (also with llama-tokenize)
  • Run tests/test-lora-conversion-inference.sh
    • Doesn't really test rope_freqs, though.
  • Test Llama-3.2-1B-Instruct conversion
    • The resulting model file is identical to the one produced when converting from master
    • The vocab-only file is also identical to what it was before

This should also fix vocab-only conversion for Phi-3.
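The shape of the refactor can be sketched with a toy model class (method names are modeled on the converter's style but are not verbatim code): keeping generated tensors in their own generator means vocab-only conversion simply never iterates them, so the header's tensor count stays at zero.

```python
class ModelSketch:
    """Toy stand-in for a converter model class (illustrative only)."""

    def __init__(self, vocab_only: bool):
        self.vocab_only = vocab_only

    def generate_extra_tensors(self):
        # generated tensors such as rope_freqs.weight live here,
        # isolated from the checkpoint tensor modification path
        yield ("rope_freqs.weight", [1.0, 1.0, 4.2, 8.0])

    def iter_tensors(self):
        if self.vocab_only:
            return  # no tensors at all: count matches the (absent) data
        yield from self.generate_extra_tensors()
```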
@compilade added the refactoring, bugfix and python labels Sep 10, 2024
@compilade requested a review from ngxson September 10, 2024 00:59
@compilade added the Review Complexity : Low label Sep 10, 2024
@ngxson (Collaborator) left a comment

LGTM. Thanks for the implementation!

@ngxson (Collaborator) commented Sep 10, 2024

Btw, you can use scripts/test-lora-conversion-inference.sh. It performs an end-to-end conversion-and-inference test for LoRA on the gemma2, phi3 and llama archs (note: these are toy models, so they don't have rope_freqs).

@ThiloteE commented

Related to issue Support for Phi-3 models #6849

@bioinformatist mentioned this pull request Sep 12, 2024
@ggerganov (Owner) commented

Should we merge this, or wait for the rest of the tests in OP to be confirmed?

@ngxson (Collaborator) commented Sep 16, 2024

I ran the test locally and can confirm that it passes. Let's wait for final confirmation from @compilade to merge this.

@compilade (Collaborator, Author) commented Sep 16, 2024

Should we merge this, or wait for the rest of the tests in OP to be confirmed?

Since #9322 was merged, MiniCPM3's conversion also has to be updated before merging this. I'll update it today.

MiniCPM3's tokenizer is treated as a SentencePiece tokenizer to avoid
having to run its custom Python code which mixes tokenization
in the same file as tool calls.

gguf-py : add long and short RoPE factors to tensor mappings

Empty, but the key names are used to populate the mappings.
@bioinformatist commented Sep 18, 2024

@compilade Hey bro, when I try to convert Minicpm3 to .gguf format, it says:

The repository for /models contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//models.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Will it be implemented by this PR? Thanks!

@compilade (Collaborator, Author) commented

@bioinformatist

Yes, actually, in e83d270 I've changed how MiniCPM3's tokenizer is loaded, precisely to avoid that custom code. It uses SentencePiece directly instead. I think it results in the same model files, but I haven't tested that yet because I can't really run the custom tokenization code, since it depends on datamodel_code_generator, which is not in Nixpkgs.

That was a single line change in set_vocab.
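For illustration, that single-line delegation could look like this (the class hierarchy is a stub so the sketch is self-contained; in the real converter the SentencePiece path reads tokenizer.model with the sentencepiece library):

```python
class Model:
    """Stub of a converter base class (illustrative only)."""

    def _set_vocab_sentencepiece(self):
        # the real implementation loads tokenizer.model via sentencepiece;
        # stubbed here so the example runs without model files
        self.vocab_source = "sentencepiece"

    def set_vocab(self):
        raise NotImplementedError

class MiniCPM3Model(Model):
    def set_vocab(self):
        # the one-line change: take the SentencePiece path directly,
        # instead of running the repo's custom tokenizer code
        # (which would require trust_remote_code=True)
        self._set_vocab_sentencepiece()
```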

@bioinformatist commented

Got that, as we have been turning to NixOS (especially for production use) these days. Hope everything goes well! ❤️

@compilade added the merge ready label Sep 30, 2024
@ggerganov merged commit 1927378 into ggerganov:master Oct 1, 2024
9 checks passed
@compilade mentioned this pull request Oct 6, 2024
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024

* convert : refactor rope_freqs generation

This should also fix vocab-only conversion for Phi-3.

* convert : adapt MiniCPM3 to separate rope_freqs insertion

MiniCPM3's tokenizer is treated as a SentencePiece tokenizer to avoid
having to run its custom Python code which mixes tokenization
in the same file as tool calls.

gguf-py : add long and short RoPE factors to tensor mappings

Empty, but the key names are used to populate the mappings.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024