@abetlen I noticed that llama.cpp's server also supports concurrent requests and continuous batching: https://github.com/ggerganov/llama.cpp/tree/master/examples/server. Would enabling this in this library be as straightforward as exposing the relevant command-line options, or am I missing something obvious?
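For reference, the upstream server enables this with `-np N` (number of parallel slots) and `-cb` (continuous batching). Below is a minimal sketch of a client-side probe for checking whether a server actually interleaves requests, assuming an OpenAI-compatible completions endpoint on `localhost:8000` (the URL, port, and prompt are placeholders, not anything from this library):

```python
# Concurrency probe: fire several completion requests at once and time them.
# If the server processes requests sequentially, total wall time grows
# roughly linearly with the number of requests; with continuous batching
# it should stay close to the latency of a single request.
import asyncio
import time

import httpx

URL = "http://localhost:8000/v1/completions"  # assumed endpoint; adjust as needed


async def one_request(client: httpx.AsyncClient, i: int) -> None:
    resp = await client.post(
        URL,
        json={"prompt": f"Request {i}: write a haiku about batching.",
              "max_tokens": 64},
        timeout=120.0,
    )
    resp.raise_for_status()


async def main(n: int = 4) -> None:
    async with httpx.AsyncClient() as client:
        start = time.perf_counter()
        # Launch all n requests concurrently and wait for them to finish.
        await asyncio.gather(*(one_request(client, i) for i in range(n)))
        print(f"{n} concurrent requests took {time.perf_counter() - start:.1f}s")


asyncio.run(main())
```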
This has been a pain point for me as well; I hope someone can answer this.