Bug: WARNING: The BPE pre-tokenizer was not recognized! #9927

smileyboy2019 · 2024-10-17T15:13:26Z

What happened?

WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:** There are 2 possible reasons for this:
WARNING:hf-to-gguf:** - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:** - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref: #6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh: 8e62295832751ca1e8f92f2226f403dea30dc5165e448b5bfa05af5340c64ec7
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:

Traceback (most recent call last):
  File "/root/llama.cpp-master/convert_hf_to_gguf.py", line 4430, in <module>
    main()
  File "/root/llama.cpp-master/convert_hf_to_gguf.py", line 4424, in main
    model_instance.write()
  File "/root/llama.cpp-master/convert_hf_to_gguf.py", line 434, in write
    self.prepare_metadata(vocab_only=False)
  File "/root/llama.cpp-master/convert_hf_to_gguf.py", line 427, in prepare_metadata
    self.set_vocab()
  File "/root/llama.cpp-master/convert_hf_to_gguf.py", line 2554, in set_vocab
    tokens, toktypes, tokpre = self.get_vocab_base()
  File "/root/llama.cpp-master/convert_hf_to_gguf.py", line 515, in get_vocab_base
    tokpre = self.get_vocab_base_pre(tokenizer)
  File "/root/llama.cpp-master/convert_hf_to_gguf.py", line 671, in get_vocab_base_pre
    raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
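
For context, here is a minimal sketch of how the chkhsh value in the warning appears to be derived and checked. It assumes the converter encodes a fixed test text with the model's tokenizer, hashes the string form of the resulting token IDs with SHA-256, and compares the digest against a hard-coded list. The function names, the token IDs, and the lookup table below are hypothetical and only illustrate the idea; they are not the script's actual code.

```python
from hashlib import sha256


def pretokenizer_checksum(token_ids):
    """Hash the string form of a token-ID list, e.g. "[1, 2, 3]"."""
    return sha256(str(token_ids).encode()).hexdigest()


def lookup_pretokenizer(chkhsh, known):
    """Return the pre-tokenizer name for a checksum, or raise like the converter does."""
    if chkhsh not in known:
        raise NotImplementedError(
            "BPE pre-tokenizer was not recognized - update get_vocab_base_pre()"
        )
    return known[chkhsh]


# Made-up token IDs standing in for tokenizer.encode(<fixed test text>).
example_ids = [101, 7592, 2088, 102]
example_hash = pretokenizer_checksum(example_ids)

# Hypothetical table; the real script hard-codes its own checksum -> name pairs.
known_checksums = {example_hash: "example-pre"}
```

Under this assumption, any change to the tokenizer that alters the token IDs for the test text changes the digest, which would explain why a model not yet listed in convert_hf_to_gguf_update.py, or one whose pre-tokenization config changed upstream, fails with the error above.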

Name and Version

python convert_hf_to_gguf.py /data/model/BAAI/bge-large-zh-v1.5/ --outfile text2vec-base-chinese.gguf --model-name bert-bge

What operating system are you seeing the problem on?

No response

Relevant log output

No response

github-actions · 2024-12-01T01:08:01Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

smileyboy2019 added the bug-unconfirmed and medium severity labels Oct 17, 2024

github-actions bot added the stale label Nov 17, 2024

github-actions bot closed this as completed Dec 1, 2024
