Bug: GGML_ASSERT(llama_add_eos_token(model) != 1) failed llama-server critical error with flan-t5 models #8990
Comments
Probably fixed via #8997
@ggerganov I think #8997 fixes only T5 model loading, but the models won't work correctly. For T5 models to work, see llama.cpp/examples/main/main.cpp, lines 540 to 556 (at commit d3ae0ee).
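For reference, here is a rough sketch of the encoder pass that examples/main/main.cpp performs for encoder-decoder (T5-style) models around those lines. This is a paraphrase rather than the exact code at d3ae0ee, the helper name run_encoder_if_needed is made up for illustration, and llama_batch_get_one in particular has changed signature across llama.cpp versions:

```cpp
#include "llama.h"
#include <cstdio>
#include <vector>

// Sketch: for encoder-decoder models the prompt must first go through the encoder,
// and decoding then starts from the decoder-start token. `embd_inp` is the tokenized
// prompt, as in examples/main/main.cpp.
static bool run_encoder_if_needed(llama_context * ctx, const llama_model * model,
                                  std::vector<llama_token> & embd_inp) {
    if (!llama_model_has_encoder(model)) {
        return true; // decoder-only model: nothing extra to do
    }

    // run the whole tokenized prompt through the encoder
    if (llama_encode(ctx, llama_batch_get_one(embd_inp.data(), (int32_t) embd_inp.size(), 0, 0))) {
        fprintf(stderr, "failed to run the encoder\n");
        return false;
    }

    // decoding starts from the model's decoder-start token (BOS as a fallback)
    llama_token dec_start = llama_model_decoder_start_token(model);
    if (dec_start == -1) {
        dec_start = llama_token_bos(model);
    }
    embd_inp.clear();
    embd_inp.push_back(dec_start);
    return true;
}
```

The point of the comment is that llama-server did not yet perform this encoder step, so fixing model loading alone is not enough for T5 models to produce correct output.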
I have the same issue. Pulled #8997, but the issue remains.
In my case, I quantized the original HuggingFace SmolLM to https://huggingface.co/aisuko/SmolLM-135M-Instruct-gguf, and it works fine. However, it doesn't work on the fine-tuned version. When I fine-tune SmolLM, the tokenizer part is below:
After I convert the
@Aisuko I think the problem is that your model has add_eos_token set to true.
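For context, a hedged illustration of what the assertion in the title checks (the wrapper function is hypothetical, not the actual server code): in the llama.cpp API of that period, llama_add_eos_token() returned 1 when the GGUF tokenizer metadata asked for an EOS token to be auto-appended, and llama-server refused such models outright.

```cpp
#include "ggml.h"   // GGML_ASSERT
#include "llama.h"

// `model` is an already-loaded llama_model; this helper exists only for illustration.
static void reject_auto_eos_models(const llama_model * model) {
    // llama_add_eos_token(): 1 if tokenizer.ggml.add_eos_token is true, 0 if false, -1 if unknown
    GGML_ASSERT(llama_add_eos_token(model) != 1); // aborts for models converted with add_eos_token = true
}
```

So a fine-tune whose tokenizer config enables add_eos_token converts to GGUF without errors but trips this assert when loaded by llama-server; presumably the fix is to leave that flag unset, which the follow-up comment confirms worked.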
Cool, thank you @fairydreaming, I will test it later. Update: it works, and you are right. Thanks.
This issue was closed because it has been inactive for 14 days since being marked as stale.
What happened?
A direct llama-cli call for Flan-T5 based models works fine. When trying to set up llama-server, a critical error stops execution.
model repo: https://huggingface.co/Felladrin/gguf-LaMini-Flan-T5-248M
model file: https://huggingface.co/Felladrin/gguf-LaMini-Flan-T5-248M/resolve/main/LaMini-Flan-T5-248M.Q8_0.gguf
Name and Version
.\llama-cli.exe --version
Windows 11 with Python 3.11
What operating system are you seeing the problem on?
Windows
Relevant log output