docs: update server streaming mode documentation #9519

Merged

merged 1 commit into ggml-org:master from CentricStorm:patch-2 on Dec 11, 2024

Conversation

CentricStorm
Contributor

@CentricStorm commented Sep 17, 2024

Server documentation:

  • Mentioned that streaming mode uses a different response format
  • Added a link to documentation of that response format
  • Reduced n_predict in the existing non-streamed example script, because generating 512 tokens can take a long time on some computers (a sketch of such a request follows this list)
  • Made the style of the existing non-streamed example script more consistent
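
For reference, a rough sketch of what such a non-streamed request might look like; the prompt, port, and n_predict value here are illustrative placeholders, not necessarily the exact values used in the server README:

// Illustrative sketch of a non-streamed /completion request; the prompt,
// port, and n_predict value are placeholders rather than the README's exact values.
// Run as an ES module (e.g. node script.mjs) so top-level await is available.
const response = await fetch("http://localhost:8080/completion", {
    method: "POST",
    body: JSON.stringify({
        prompt: "Building a website can be done in 10 simple steps:",
        n_predict: 64 // much smaller than 512, so it finishes quickly on slower machines
    })
})
const data = await response.json()
console.log(data.content)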

@CentricStorm force-pushed the patch-2 branch 4 times, most recently from aa3b48a to aa32f90, on November 28, 2024 03:54
@CentricStorm
Contributor Author

CentricStorm commented Nov 28, 2024

Updated the streaming mode example script with split data handling, which has been tested with unit tests using the following chunks:

// Chunks with a total of 9 tokens: " token token token token, token, token,", split at different positions.
const chunks = [
    `data: {"co`,
    `ntent":" token"}\n\n`,
    `data: {"content":" token"}\n`,
    `\n`,
    `data: {"content":" token"}\n\n`,
    `data: {"content":" token"}\n\ndata: {"co`,
    `ntent":","}\n\n`,
    `data: {"content":" token"}\n\ndata: {"content":","}\n`,
    `\n`,
    `data: {"content":" token"}\n\ndata: {"content":","}\n\n`
]

Avoided using Node.js readline so that the script can also work in browsers with minimal modification.
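
For reference, a minimal sketch of how split data handling along those lines could work, assuming the /completion endpoint's `data: {...}` events terminated by blank lines; the parseSSE name and buffering details are illustrative, not the exact script proposed in this PR:

// Minimal sketch (not the exact PR script): buffer incoming bytes and only
// parse an event once its terminating blank line ("\n\n") has arrived, so
// events split across chunks at arbitrary positions are handled correctly.
async function* parseSSE(response) {
    const decoder = new TextDecoder()
    const reader = response.body.getReader()
    let buffer = ""
    while (true) {
        const { done, value } = await reader.read()
        if (done) break
        buffer += decoder.decode(value, { stream: true })
        let boundary
        while ((boundary = buffer.indexOf("\n\n")) !== -1) {
            const event = buffer.slice(0, boundary)
            buffer = buffer.slice(boundary + 2)
            if (event.startsWith("data: ")) {
                yield JSON.parse(event.slice("data: ".length))
            }
        }
    }
}

// Example usage with the streamed /completion endpoint (run as an ES module):
const response = await fetch("http://localhost:8080/completion", {
    method: "POST",
    body: JSON.stringify({ prompt: "Hello", n_predict: 16, stream: true })
})
for await (const data of parseSSE(response)) {
    // In a browser, append data.content to the page instead of using process.stdout.
    process.stdout.write(data.content)
}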

@ngxson
Collaborator

ngxson commented Nov 28, 2024

Btw, have you been able to test it with the latest version on the master branch? We added [DONE] as the last data chunk at some point, to align with the OpenAI implementation.

@CentricStorm
Contributor Author

Btw, have you been able to test it with the latest version on the master branch? We added [DONE] as the last data chunk at some point, to align with the OpenAI implementation.

It seems like #9459 only added the [DONE] event for the OpenAI-compatible /chat/completions API, not for the /completion API that this example uses:

https://github.com/ggerganov/llama.cpp/blob/678d7994f4da0af3d29046be99950ac999ee9762/examples/server/server.cpp#L3027
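
For anyone adapting the example to the OpenAI-compatible /chat/completions endpoint, a minimal sketch (names illustrative) of how the trailing [DONE] sentinel could be filtered out before JSON parsing:

// Illustrative sketch: on /chat/completions the final SSE event carries the
// literal payload "[DONE]" instead of JSON, so it must be skipped before parsing.
function parseEventPayload(event) {
    if (!event.startsWith("data: ")) return null
    const payload = event.slice("data: ".length).trim()
    if (payload === "[DONE]") return null // end-of-stream sentinel, not JSON
    return JSON.parse(payload)
}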

@CentricStorm deleted the patch-2 branch December 9, 2024 04:43
@CentricStorm restored the patch-2 branch December 9, 2024 04:43
@CentricStorm deleted the patch-2 branch December 9, 2024 04:43
@CentricStorm restored the patch-2 branch December 9, 2024 04:45
@CentricStorm deleted the patch-2 branch December 9, 2024 04:46
@CentricStorm restored the patch-2 branch December 9, 2024 04:53
@CentricStorm reopened this Dec 9, 2024
@CentricStorm
Contributor Author

The example script still works with b4291 (ce8784b), but localhost was changed to 127.0.0.1 because of a separate issue with the latest version of Node.js.
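
As a hedged illustration of that change (the underlying Node.js issue is not detailed in this thread), the base URL simply switches from the localhost hostname to the IPv4 loopback address:

// Illustrative: use the IPv4 loopback address instead of "localhost", which
// newer Node.js versions may resolve differently (e.g. to ::1) than the
// address the server is actually listening on.
const baseUrl = "http://127.0.0.1:8080" // previously "http://localhost:8080"
const health = await fetch(`${baseUrl}/health`)
console.log(await health.json())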

Collaborator

@ngxson left a comment


On second thought, I think it's not a good idea to add this to our documentation. Because we already follow the SSE standard (except for the POST method), client code should be trivial to implement.

The documentation should be reserved for things that can only be found in llama.cpp and not on the internet.

In this case, the code you provided is the same as the OpenAI implementation (because they also use the SSE+POST method), and there are many libraries on npm that can handle this (for example, this). So adding it here brings no additional info to the docs, while adding maintenance cost in the future.

@CentricStorm
Contributor Author

On second thought, I think it's not a good idea to add this to our documentation. Because we already follow the SSE standard (except for the POST method), client code should be trivial to implement.

The documentation should be reserved for things that can only be found in llama.cpp and not on the internet.

Removed example code.

Provide more documentation for streaming mode.
@CentricStorm
Contributor Author

Suggestions implemented.

@ngxson merged commit 5555c0c into ggml-org:master Dec 11, 2024
6 checks passed
@CentricStorm deleted the patch-2 branch December 12, 2024 02:18
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
Provide more documentation for streaming mode.
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
Provide more documentation for streaming mode.
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
Provide more documentation for streaming mode.