-
From the code I see that both …
-
@alexeyche that's correct: the object can only process a single request at a time. llama.cpp doesn't yet support batching requests, so there's no real way to make this possible until that happens. The alternative "solution" I'm working on is to let users load multiple models at the same time, but this will take roughly twice the RAM and will likely be quite slow.
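In the meantime, a pool of independently loaded instances can approximate concurrency at the cost of that extra RAM. Here's a minimal sketch, assuming a Python binding like llama-cpp-python; the `Llama` class, its call signature, and the model path are assumptions for illustration, not anything confirmed in this thread:

```python
# A rough sketch of the "load multiple models" workaround: N independent
# model instances behind a blocking queue, so concurrent callers each
# check out a free instance (or wait for one).
from queue import Queue

from llama_cpp import Llama  # assumed binding; swap in your own


class ModelPool:
    """Hands out model instances one caller at a time.

    Each instance holds a full copy of the weights, which is why two
    instances cost roughly twice the RAM, as noted above.
    """

    def __init__(self, model_path: str, n_instances: int = 2):
        self._free: Queue = Queue()
        for _ in range(n_instances):
            self._free.put(Llama(model_path=model_path))

    def generate(self, prompt: str, max_tokens: int = 128) -> str:
        llm = self._free.get()  # blocks until an instance is free
        try:
            result = llm(prompt, max_tokens=max_tokens)
            return result["choices"][0]["text"]
        finally:
            self._free.put(llm)  # always return the instance to the pool


# Usage: two threads can now generate at the same time.
# pool = ModelPool("models/7B/ggml-model.gguf", n_instances=2)
# print(pool.generate("Q: What is the capital of France? A:"))
```

With `n_instances=1` this degenerates to today's behavior, where every request queues behind one instance; the pool just makes that serialization explicit.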
-
I'd love to see the ability for simultaneous requests!