Problem with phi-3.5-mini-instruct chat template and endless generation #2930
Comments
Thanks for the question, but I had already set that to Never. I did some more tests and, interestingly, now only new answers were generated without new questions (in the previous tests I modified the templates several times; maybe questions were only generated for certain prompt templates). Here is one new example (I had to stop the generation):
Please consider the generated text. Then I extended the prompt template so that the following lines were in the file:
That didn't resolve the problem. Here is an example (I had to stop the generation again):
Please note that there is no generated text. My assumption is now that the problem is that the stop token from the model is not handled correctly.
Ah, I understand now. I reproduced this by downloading that model (the Phi-3-medium variant) from Hugging Face and setting the standard prompt per the model card, and I had the same issue with the prompt text inserting itself into the output/reply in my session. Looks like ThiloteE is suggesting to use the Community-built variant of this model.
@ThiloteE: Thanks for supplying the model. I tried the Q6_K version. The result is:
Here is an example for the second case:
In my test I did the following:
My impression was that the first few cycles (step 2 up to step 4) after starting GPT4All worked well, but then the next cycles didn't stop. But that could have been a coincidence. Could it be that the model produces several stop tokens and only some of them are processed correctly by GPT4All?
Have you also changed the prompt template to the one that I suggested? Or do you still use the old one from Bartowski? I would suggest this chat template for the GPT4All-Community version:
Yes, the model is trained to use multiple stop tokens, and unfortunately GPT4All can only parse one of them. I believe the stop token in the template that I provided is the one that is triggered earlier in their chat template and more often than the other one, but stopping the generation early could confuse the model, so it's probably a little bit of a hack. In my personal tests I encountered zero problems though, otherwise I would not have uploaded the model. It is a little disheartening to hear that you report the model still behaves abnormally, because that means my methodology to create those quants might be far from perfect. The devs at Nomic know about the problem with parsing the EOS tokens in the chat templates and are (probably) working on a fix, but it is not ready at present.
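For illustration only, here is a minimal sketch (not GPT4All's actual code) of a streaming loop that cuts generation at whichever of several stop sequences appears first; a loop that only watches a single stop token can run straight past the other one. The stop strings below are the ones commonly associated with the Phi-3 family and are assumptions for this sketch:

```python
# Minimal sketch (hypothetical, not GPT4All's implementation): stop streaming as
# soon as the accumulated output contains ANY of the model's stop sequences.
STOP_SEQUENCES = ("<|end|>", "<|endoftext|>")  # assumed stop markers for Phi-3.5

def stream_until_stop(chunks, stop_sequences=STOP_SEQUENCES):
    """Yield decoded text pieces until any stop sequence shows up, then cut there."""
    buffer = ""
    # keep a tail long enough to hold a stop sequence split across two chunks
    tail = max(len(s) for s in stop_sequences) - 1
    for piece in chunks:
        buffer += piece
        for stop in stop_sequences:
            idx = buffer.find(stop)
            if idx != -1:
                yield buffer[:idx]   # emit text up to, but excluding, the stop marker
                return
        if len(buffer) > tail:       # flush text that can no longer be part of a stop marker
            yield buffer[:len(buffer) - tail]
            buffer = buffer[len(buffer) - tail:]

# Example with a fake token stream that ends its turn with "<|end|>":
pieces = ["The capital of ", "France is Paris.", "<|end|>", "<|user|> Next question ..."]
print("".join(stream_until_stop(pieces)))   # -> "The capital of France is Paris."
```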
Yes, I used the prompt template defined on the website of the model and cited in your comment. I just checked my impression that the first answer after starting GPT4All works well, i.e. the generation stops automatically. Unfortunately, the impression is not generally true. Thanks for your work again. I think it is an important part of investigating the problem and of solving it in the end.
Seems like an upstream issue with llama.cpp. See ggerganov/llama.cpp#9127. It has been reported to start at 4096 tokens. There are also more reports at the Koboldcpp repository. It has been suggested that it might be related to RoPE scaling, which is a technique to extend context length. Since people with the default prompt template have these issues too, the cause is probably not my quantization method. Phew.
Try this: leave the system prompt empty and use this for the prompt template:
ggerganov/llama.cpp#9396 has been merged in llama.cpp, which might fix this issue.
Bug Report
I want to use the new model Phi-3.5-mini-instruct and downloaded the file Phi-3.5-mini-instruct-Q5_K_M.gguf from https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF. There, the following prompt format is stated:
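(The exact text pasted in the original report is not reproduced here. For reference, the prompt format documented for the Phi-3.5-mini-instruct family is essentially the following, where the parts in curly braces are placeholders:)

```
<|system|>
{system_prompt}<|end|>
<|user|>
{prompt}<|end|>
<|assistant|>
```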
Therefore I used that information in the settings. As a result, I have the following section in the file GPT4All.ini:
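(The actual contents of that section are not reproduced in the report. The following is only a sketch of what such a section can look like; the section name and key names are illustrative and depend on the GPT4All version, and %1/%2 are GPT4All's placeholders for the user message and the model reply.)

```ini
[model-Phi-3.5-mini-instruct-Q5_K_M.gguf]
; illustrative key names, not necessarily the exact ones GPT4All writes
promptTemplate=<|user|>\n%1<|end|>\n<|assistant|>\n%2<|end|>\n
systemPrompt=
```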
But when I use the model, sometimes after the answer a new question is automatically generated and answered. I suppose the reason for that has to do with the prompt template or with the processing of the prompt template.
Can I modify the prompt template so that this model works correctly (and similarly for other models I download from Hugging Face)?
There seems to be information about the prompt template in the GGUF metadata. Would it be possible for GPT4All to use this information automatically?
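(The chat template and the end-of-sequence token are indeed stored in the GGUF key/value metadata. As an illustration only, here is a sketch that inspects them with the llama-cpp-python package rather than GPT4All; the file name is the one from this report, and the keys follow the usual GGUF conventions.)

```python
# Sketch: print the chat template and EOS token id stored in a GGUF file's metadata.
# Assumes the llama-cpp-python package is installed (pip install llama-cpp-python);
# this shows where the information lives, not how GPT4All processes it.
from llama_cpp import Llama

llm = Llama(model_path="Phi-3.5-mini-instruct-Q5_K_M.gguf", vocab_only=True)

meta = llm.metadata  # dict of GGUF metadata keys mapped to string values
print(meta.get("tokenizer.chat_template", "<no chat template stored>"))
print(meta.get("tokenizer.ggml.eos_token_id", "<no eos token id stored>"))
```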
Steps to Reproduce
GPT4All.ini
Expected Behavior
Only the questions from the user should be answered and no new question or task should be generated.
Your Environment