-
From the code I see that both …
-
@alexeyche that's correct: the object can only process a single request at a time. llama.cpp doesn't yet support batching requests, so there's no real way to make this possible until that happens. The alternative "solution" I'm working on is to let users load multiple models at the same time, but this will take roughly twice the RAM and will likely be quite slow.
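In the meantime, a pool of independently loaded instances can approximate concurrency at the cost of that extra RAM. Here's a minimal sketch, assuming a Python binding like llama-cpp-python; the `Llama` class, its call signature, and the model path are assumptions for illustration, not anything confirmed in this thread:

```python
# A rough sketch of the "load multiple models" workaround: N independent
# model instances behind a blocking queue, so concurrent callers each
# check out a free instance (or wait for one).
from queue import Queue

from llama_cpp import Llama  # assumed binding; swap in your own


class ModelPool:
    """Hands out model instances one caller at a time.

    Each instance holds a full copy of the weights, which is why two
    instances cost roughly twice the RAM, as noted above.
    """

    def __init__(self, model_path: str, n_instances: int = 2):
        self._free: Queue = Queue()
        for _ in range(n_instances):
            self._free.put(Llama(model_path=model_path))

    def generate(self, prompt: str, max_tokens: int = 128) -> str:
        llm = self._free.get()  # blocks until an instance is free
        try:
            result = llm(prompt, max_tokens=max_tokens)
            return result["choices"][0]["text"]
        finally:
            self._free.put(llm)  # always return the instance to the pool


# Usage: two threads can now generate at the same time.
# pool = ModelPool("models/7B/ggml-model.gguf", n_instances=2)
# print(pool.generate("Q: What is the capital of France? A:"))
```

With `n_instances=1` this degenerates to today's behavior, where every request queues behind one instance; the pool just makes that serialization explicit.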
-
I'd love to see the ability for simultaneous requests!