server: maintain chat completion id for streaming responses #5880
Conversation
This PR LGTM. One thing that would be nice, if you can do it: we currently call gen_chatcmplid in multiple places. It would be better to call gen_chatcmplid only once per incoming request, then use the generated id in both format_final_response_oaicompat and format_partial_response_oaicompat.
Just a small detail though; we can do it later. I'm planning to clean up all the functions related to OAI-compat.
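A minimal sketch of that suggestion, assuming a simplified request handler. `gen_chatcmplid`, `format_final_response_oaicompat`, and `format_partial_response_oaicompat` are the existing server helpers, but the signatures shown here (taking the id as an extra parameter) and `handle_chat_completion` are hypothetical, not the actual server code:

```cpp
// Sketch only: generate the chat completion id once per request and thread it
// through both formatters, instead of calling gen_chatcmplid() inside each of them.
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// assumed signatures -- the real helpers live in examples/server/oai.hpp
std::string gen_chatcmplid();
json format_partial_response_oaicompat(const json & result, const std::string & completion_id);
json format_final_response_oaicompat(const json & request, const json & result, const std::string & completion_id);

void handle_chat_completion(const json & request, bool stream) {
    // one id for the whole request, reused by every streamed chunk and the final response
    const std::string completion_id = gen_chatcmplid();

    if (stream) {
        json partial;  // filled by the inference loop (elided)
        json chunk = format_partial_response_oaicompat(partial, completion_id);
        // send_chunk(chunk);  // transport elided
    } else {
        json result;   // filled by the inference loop (elided)
        json response = format_final_response_oaicompat(request, result, completion_id);
        // send_response(response);  // transport elided
    }
}
```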
This LGTM, thank you. I'll wait for @ggerganov to decide if this can be merged now or after #5882
Thanks for taking a look. Will be reimplemented in #5882 - no need to merge this PR.
Thank you for the quick turnaround, @mscheong01 @ggerganov. I don't have much to add: it works fine and generates a consistent chat completion id. The only issue is that when consumed by the OpenAI client (as in the second code block of #5876 (comment)) it seems that I would still get an error (…). Adding …
llama.cpp/examples/server/oai.hpp Line 176 in 293378b
llama.cpp/examples/server/oai.hpp Line 197 in 293378b
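For context, a minimal illustrative sketch of what a streamed chunk carrying the per-request id looks like. The field layout follows the OpenAI `chat.completion.chunk` schema; the construction code below and the helper name `make_chunk` are hypothetical, not the actual oai.hpp implementation at the lines referenced above:

```cpp
// Illustrative only: build one streamed chunk that reuses the per-request id.
// In the real server this is done by format_partial_response_oaicompat in examples/server/oai.hpp.
#include <ctime>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

json make_chunk(const std::string & completion_id,   // result of gen_chatcmplid(), reused per request
                const std::string & model,
                const std::string & content) {
    return json {
        {"id",      completion_id},                   // identical across all chunks of one request
        {"object",  "chat.completion.chunk"},
        {"created", (int64_t) std::time(nullptr)},
        {"model",   model},
        {"choices", json::array({ json {
            {"index",         0},
            {"delta",         json {{"content", content}}},
            {"finish_reason", nullptr}
        }})}
    };
}
```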
@mscheong01 If you can re-apply the changes on top of latest …
fixes #5876
Tested code (provided by @xyc):
Result (before):
Result (after):