server: maintain chat completion id for streaming responses #5880
Conversation
This PR LGTM. One thing that would be nice, if you can do it: we currently call gen_chatcmplid in multiple places. It would be better to call gen_chatcmplid only once per incoming request, then use the generated id in both format_final_response_oaicompat and format_partial_response_oaicompat.
Just a small detail though; we can do it later. I'm planning to clean up all the functions related to OAI-compat.
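A minimal sketch of that suggestion, assuming a simplified request handler. `gen_chatcmplid`, `format_final_response_oaicompat`, and `format_partial_response_oaicompat` are the existing server helpers, but the signatures shown here (taking the id as an extra parameter) and `handle_chat_completion` are hypothetical, not the actual server code:

```cpp
// Sketch only: generate the chat completion id once per request and thread it
// through both formatters, instead of calling gen_chatcmplid() inside each of them.
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// assumed signatures -- the real helpers live in examples/server/oai.hpp
std::string gen_chatcmplid();
json format_partial_response_oaicompat(const json & result, const std::string & completion_id);
json format_final_response_oaicompat(const json & request, const json & result, const std::string & completion_id);

void handle_chat_completion(const json & request, bool stream) {
    // one id for the whole request, reused by every streamed chunk and the final response
    const std::string completion_id = gen_chatcmplid();

    if (stream) {
        json partial;  // filled by the inference loop (elided)
        json chunk = format_partial_response_oaicompat(partial, completion_id);
        // send_chunk(chunk);  // transport elided
    } else {
        json result;   // filled by the inference loop (elided)
        json response = format_final_response_oaicompat(request, result, completion_id);
        // send_response(response);  // transport elided
    }
}
```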
This LGTM, thank you. I'll wait for @ggerganov to decide if this can be merged now or after #5882
Thanks for taking a look. Will be reimplemented in #5882 - no need to merge this PR.
Thank you for the quick turnaround, @mscheong01 @ggerganov. I don't have much to add: it works fine and generates a consistent chat completion id. The only issue is that when consumed by the OpenAI client (as in the second code block of #5876 (comment)) it seems that I would still get an error (…). Adding …
llama.cpp/examples/server/oai.hpp Line 176 in 293378b
llama.cpp/examples/server/oai.hpp Line 197 in 293378b
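For context, a minimal illustrative sketch of what a streamed chunk carrying the per-request id looks like. The field layout follows the OpenAI `chat.completion.chunk` schema; the construction code below and the helper name `make_chunk` are hypothetical, not the actual oai.hpp implementation at the lines referenced above:

```cpp
// Illustrative only: build one streamed chunk that reuses the per-request id.
// In the real server this is done by format_partial_response_oaicompat in examples/server/oai.hpp.
#include <ctime>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

json make_chunk(const std::string & completion_id,   // result of gen_chatcmplid(), reused per request
                const std::string & model,
                const std::string & content) {
    return json {
        {"id",      completion_id},                   // identical across all chunks of one request
        {"object",  "chat.completion.chunk"},
        {"created", (int64_t) std::time(nullptr)},
        {"model",   model},
        {"choices", json::array({ json {
            {"index",         0},
            {"delta",         json {{"content", content}}},
            {"finish_reason", nullptr}
        }})}
    };
}
```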
@mscheong01 If you can re-apply the changes on top of latest …
fixes #5876
Tested code (provided by @xyc):
Result (before):
Result (after):