Repeated greeting in same chat session #801
Comments
If I put the first two lines of the dialog into the program manually as the initial value of messages, then the chatbot responds correctly. But after that, the chatbot starts to repeat itself and greet again.
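For context, seeding the history looks something like this (a minimal sketch; the roles follow the OpenAI-style message format, and the dialog content here is illustrative):

```python
# Illustrative only: seed `messages` with the first exchange of the
# dialog so the model treats it as prior conversation context.
messages = [
    {"role": "user", "content": "Hi Rob, nice to meet you."},
    {"role": "assistant", "content": "Hi! Nice to meet you too. How can I help?"},
]
```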
Sorry for the late response. I've been spread thin lately. I see you opened the issue. There's also a related issue, #800. I would consider this a duplicate, as issue #800 more clearly targets the root problem. Maybe we can consolidate our findings there? I have to get back to work, so I'll have less time, but I've decided to focus whatever available time and resources I have on llama.cpp related projects, especially because I depend on them.
Will go back to 0.2.7 to see whether this issue disappears. Thanks! @teleprint-me
Yes, I can confirm it starts with #711, and there is a noticeable performance drop as well. Here is the conversation I had with the commit before #711: `{'role': 'system', 'content': 'start chat'}`
This line doesn't look right:
If I change the `ret` line with the message to:
I believe this is the same/similar issue: #800
@earonesty It is the same issue. @delock It's related to the template structure. Look at my PR #781. I originally believed that simply removing
It looks like @abetlen applied a variation of the original function recommendations, but it didn't actually resolve the issue, as it's still present in the add-functionary-support branch. Removing the
While I do respect and appreciate the work being done here, I'd also like to state that hard-coding prompts is most likely not a good idea, especially in a library that others rely upon. I don't think I can express and emphasize this enough. It should be a responsibility left to the end consumer of the library.
I applied #781 but still see repeated greetings. If we print out the real prompt returned from format_llama2, we can see that with #781 the initial prompt generated is still wrong. The reason is that the highlighted line below lacks the role before the message, so the whole conversation after the system message lacks role switching. If we fix it like the following, adding the missing role before the message, the prompt returned from format_llama2 will be more consistent with the message history passed to create_chat_completion.
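For readers without the diff in front of them, here is a sketch of the fix being described, reconstructed from this thread rather than copied from the repository (the helper name follows llama_cpp.llama_chat_format, but treat the exact code as an approximation):

```python
from typing import List, Optional, Tuple

def _format_llama2(
    system_message: str,
    messages: List[Tuple[str, Optional[str]]],
    sep: str,
    sep2: str,
) -> str:
    """Join (role, message) pairs into a llama-2-style prompt string."""
    seps = [sep, sep2]
    ret = system_message + sep
    for i, (role, message) in enumerate(messages):
        if message:
            # The buggy version appended only `message + seps[i % 2]` for
            # the first turn after the system message, dropping the role
            # marker and breaking role switching for the rest of the chat.
            ret += role + message + seps[i % 2]
        else:
            ret += role
    return ret
```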
I'll close my PR for this issue since this fixes the core issue. I am drafting a more flexible template system because I think it would be useful for unsupported template structures based on custom fine-tunes; the chat templates rely on the structure of the dataset. I hope that's alright. I'll post a PR once it's ready.
This is still broken in the main release 0.2.11, and this PR does not fully fix all of the problems. Applying your fix, the first call works...
but the second call does not.
I suspect there are 2 bugs, one of which is fixed by the above.
If you guys have some time, check out my draft PR #809, which aims to resolve the overall issues with the current design. I suspect @abetlen is more focused on supporting function capabilities at the moment; I put my draft on hold for this reason. I don't see an immediate fix at the moment. The only solution is to use v0.2.7 for the time being, until either a redesign occurs or a proper solution emerges that allows for more clarity on templating support. The only thing I can say with any confidence is that the current implementation is difficult to reason about, which explains the bugs in the templates.
Hi @earonesty, do you have a link to your example.py in your description? I want to understand what else might still be wrong.
In this case I'm just calling the API twice, not even a long conversation, using the llama.cpp server.
The second time I hit the endpoint I get back INST stuff, even with the patch. The first time it's fine. It could be the OpenAI server wrapper that's the problem. Not sure yet. Haven't had time to dig into it, just running the older version for now.
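A minimal repro along those lines, assuming a llama-cpp-python server running with its default OpenAI-compatible endpoint (URL and payload here are illustrative):

```python
import requests

url = "http://localhost:8000/v1/chat/completions"
payload = {"messages": [{"role": "user", "content": "Hello, who are you?"}]}

for attempt in (1, 2):
    resp = requests.post(url, json=payload, timeout=60)
    content = resp.json()["choices"][0]["message"]["content"]
    # On the affected versions the first call looks fine, while the
    # second one leaks [INST] template markers into the response text.
    print(f"call {attempt}: {content!r}")
```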
I suspect it would be best if we could include the template script in the model config, inside a metadata variable in the GGUF file. That way we don't need the caller or user to know or care about the template.
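A sketch of that idea, assuming the template lives under the tokenizer.chat_template key that recent GGUF files use (the metadata attribute exists in recent llama-cpp-python versions, but treat this as an approximation):

```python
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b-chat.Q5_K_M.gguf", verbose=False)

# Recent llama-cpp-python versions expose the GGUF key/value metadata;
# chat templates, when present, live under "tokenizer.chat_template".
template = llm.metadata.get("tokenizer.chat_template")
print(template)
```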
Should be mostly resolved now with auto chat format detection since v0.2.37.
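For anyone landing here later, usage after that change looks roughly like this (the model path is a placeholder; leaving chat_format unset lets the library pick the format from the model's metadata):

```python
from llama_cpp import Llama

# Since v0.2.37, omitting chat_format triggers auto-detection of the
# chat template from the GGUF metadata when one is available.
llm = Llama(model_path="./llama-2-7b-chat.Q5_K_M.gguf")

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(resp["choices"][0]["message"]["content"])
```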
This is a replication of the problem described in #711 (comment). I met the same issue, so I created this issue to track it.
Problem statement
When chatting using the llm.create_chat_completion API, the chatbot keeps greeting me and repeating what I had input before.
Expected Behavior
I'm trying to have a continuous conversation with the chatbot, and I expect a smooth conversational flow.
Current Behavior
A conversation goes like the following with a simple chat program using llama-2-7b-chat.Q5_K_M.gguf downloaded from https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF; the simple chat code is from PR #575. The response from 'Rob' is generated by llama-cpp-python, and we can clearly see that 'Rob' greets me again and again.
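As a stand-in for the transcript, here is a minimal chat loop in the spirit of the simple_chat.py from PR #575 (a reconstruction, not the actual script):

```python
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b-chat.Q5_K_M.gguf")

messages = [{"role": "system", "content": "start chat"}]
while True:
    user_input = input("User: ")
    messages.append({"role": "user", "content": user_input})
    resp = llm.create_chat_completion(messages=messages)
    reply = resp["choices"][0]["message"]["content"]
    # On the affected versions, 'Rob' re-greets on every turn because the
    # formatted prompt loses role switching after the system message.
    print(f"Rob: {reply}")
    messages.append({"role": "assistant", "content": reply})
```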
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
$ lscpu
12th Gen Intel(R) Core(TM) i7-12700H, with hyper threading off
Operating System, e.g. for Linux:
$ uname -a
Linux cortex 6.5.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 13 Sep 2023 08:37:40 +0000 x86_64 GNU/Linux
SDK version, e.g. for Linux:
Failure Information (for bugs)
See the 'Current Behavior' section.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
1. Get simple_chat.py from PR #575.
2. Download llama-2-7b-chat.Q5_K_M.gguf from https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF.
3. Run `python simple_chat.py --model <path to the gguf file>`.
4. Chat freely, or input the prompt from 'User' in the 'Current Behavior' section.
llama-cpp-python version:
commit 43dfe1e
llama.cpp version:
commit 48edda3
Failure Logs
See the 'Current Behavior' section.