
WIP for adding support for Tekken tokenizer needed for Mistral NeMo #8578

Closed
wants to merge 1 commit

Conversation

HanClinto
Collaborator

Attempting to add support for Mistral NeMo (#8577), but I've never added support for a new model before, so this is very much a WIP. I need to take a break for a while, so I'm uploading my notes here in case they're useful for anyone else.

They claim it can be a drop-in replacement for Mistral 7B, so surely it shouldn't be too much work to get it working with ggml, since Mistral 7B already works.

While the model architecture may be a drop-in replacement for Mistral 7B, the tokenizer is not (yet) in our list of supported BPE pre-tokenizers. Attempting to quantize Mistral-NeMo via GGUF-my-repo results in:

Error: Error converting to fp16: b'INFO:hf-to-gguf:Loading model: Mistral-Nemo-Instruct-2407
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 1024000
INFO:hf-to-gguf:gguf: embedding length = 5120
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
WARNING:hf-to-gguf:

WARNING:hf-to-gguf:**********************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:** There are 2 possible reasons for this:
WARNING:hf-to-gguf:** - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:** - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref: https://github.com/ggerganov/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh: aa78fe8b04bc622b077520b1fb3d3a5c6f7a53dd375e2361e62599be3cf58de1
WARNING:hf-to-gguf:**********************************************************************************
WARNING:hf-to-gguf:
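
For context, the chkhsh printed in that warning is the fingerprint that convert_hf_to_gguf.py uses to select a pre-tokenizer. A minimal sketch of the two places that need wiring, assuming the entry shapes I remember from the current scripts (the "tekken" name is just this PR's working choice, not anything upstream):

```python
# convert_hf_to_gguf_update.py -- add the model so the script can download its
# tokenizer and regenerate the hash table (entry shape follows the existing list):
models = [
    # ... existing entries ...
    {"name": "tekken", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407", },
]

# convert_hf_to_gguf.py, inside get_vocab_base_pre() -- map the hash from the
# warning above to the new pre-tokenizer name:
if chkhsh == "aa78fe8b04bc622b077520b1fb3d3a5c6f7a53dd375e2361e62599be3cf58de1":
    # ref: https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
    res = "tekken"
```

If I understand the flow correctly, convert_hf_to_gguf_update.py then has to be re-run to fetch the tokenizer and regenerate the hash/test files.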

I have not yet expanded the tests to include the new tokenizer.

I haven't figured out any other settings or options that may need to be set for this tokenizer.

I haven't looked into the regex used by llm_tokenizer_bpe to see if it needs to be changed from the default or not.
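
For anyone picking this up, here is a rough sketch (not part of this PR) of how one might eyeball that: compare the splits from the default GPT-2-style BPE regex against what the model's own fast tokenizer reports as pre-tokenized pieces. It assumes the HF repo ships a tokenizer.json you have access to and that the regex and transformers packages are installed; the comparison is only indicative, since byte-level remapping can change the pieces.

```python
# Rough, illustrative check only: print the default BPE split next to the
# model's own pre-tokenizer split for a few sample strings.
import regex
from transformers import AutoTokenizer

# Default GPT-2-style split pattern (the fallback when no dedicated
# pre-tokenizer regex is wired up).
DEFAULT_PATTERN = r"""'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+"""

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")

for text in ["Hello, world!", "  leading spaces", "1234 apples", "café naïve"]:
    default_split = regex.findall(DEFAULT_PATTERN, text)
    # pre_tokenize_str() returns (piece, (start, end)) tuples for fast tokenizers.
    model_split = [p for p, _ in tok.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)]
    print(f"{text!r}\n  default: {default_split}\n  model:   {model_split}")
```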

Basically, it's largely untested, and I would have liked to get this further along before uploading a WIP.

github-actions bot added the python (python script changes) label on Jul 18, 2024
@HanClinto
Collaborator Author

Superseded by #8579

HanClinto closed this on Jul 18, 2024