docs: update server streaming mode documentation #9519
Conversation
Force-pushed from aa3b48a to aa32f90.
Updated the streaming mode example script with split data handling; the change has been tested with unit tests.
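For reference, here is a minimal sketch (not the PR's actual script) of what handling SSE data split across reads can look like. It assumes a llama.cpp server at `http://localhost:8080` exposing the `/completion` endpoint with `content`/`stop` fields in each event, and a Node 18+ runtime (for global `fetch` and `process.stdout`):

```typescript
// Consume the server's SSE stream, buffering text until a complete
// "\n\n"-terminated event has arrived before parsing it.
async function streamCompletion(prompt: string): Promise<void> {
  const response = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, n_predict: 128, stream: true }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // A single read may contain only part of an SSE event, so accumulate
    // text until the blank-line event terminator shows up.
    buffer += decoder.decode(value, { stream: true });
    let sep: number;
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const event = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      for (const line of event.split("\n")) {
        if (!line.startsWith("data: ")) continue;
        const data = line.slice("data: ".length).trim();
        if (data === "[DONE]") return; // sent by the OpenAI-compatible endpoint
        const payload = JSON.parse(data);
        process.stdout.write(payload.content ?? "");
        if (payload.stop) return;
      }
    }
  }
}

streamCompletion("Building a website can be done in 10 simple steps:").catch(console.error);
```

Buffering on the event terminator (rather than parsing each read directly) is what keeps a `data:` line intact even when the network layer delivers it in two pieces.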
Avoided using Node.js.
Btw, have you been able to test it with the latest version?
It seems like #9459 only added the …
The example script still works with b4291 (ce8784b), but changed …
On second thought, I think it's not a good idea to add this to our documentation. Because we already follow the SSE standard (except for using the POST method), client code should be trivial to implement.
The documentation should be reserved for things that can only be found in llama.cpp and not elsewhere on the internet.
In this case, the code you provided is the same as the OpenAI implementation (because they also use SSE + POST), and there are many libraries on npm that can handle this. So adding it here brings no additional info to the docs, while adding maintenance cost in the future.
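To illustrate the point that generic clients already cover this, here is a rough sketch using the `openai` npm package against the server's OpenAI-compatible endpoint; the base URL, placeholder API key, and model name are assumptions for the example:

```typescript
import OpenAI from "openai";

// Point the standard OpenAI client at llama-server's /v1 routes; the key is
// a placeholder because llama-server does not require one by default.
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "sk-no-key-required",
});

const stream = await client.chat.completions.create({
  model: "local-model", // the model name is typically ignored by llama-server
  messages: [{ role: "user", content: "Write a haiku about streaming." }],
  stream: true,
});

// The library handles the SSE parsing, so the consumer only sees deltas.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```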
Force-pushed from 5f375b6 to 9fbd0b4.
Removed example code.
Provide more documentation for streaming mode.
Force-pushed from 9fbd0b4 to 40c0724.
Suggestions implemented.
Server documentation:
- Lowered `n_predict` in the existing non-streamed example script (because on some computers 512 tokens can take a long time); see the sketch below.
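As a rough sketch of such a non-streamed request with a smaller `n_predict` (the endpoint, prompt, and exact value are illustrative assumptions, not necessarily what the docs use):

```typescript
// Single, non-streamed completion request; n_predict is kept small so the
// example finishes quickly even on slower machines.
const response = await fetch("http://localhost:8080/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "Building a website can be done in 10 simple steps:",
    n_predict: 128,
  }),
});

const result = await response.json();
console.log(result.content);
```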