
[BUG] Sending two requests asking for streamed response kills the server #26

Closed
Cyb4Black opened this issue Nov 18, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@Cyb4Black

Cyb4Black commented Nov 18, 2024

See title.
If you have it handle two requests in parallel with streamed responses, it starts answering both but dies halfway through.
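
A minimal repro sketch of the reported scenario, using the OpenAI Python client against a local openedai-vision endpoint. The base URL, port, model name, and prompt are assumptions for illustration, not taken from the issue:

```python
# Hypothetical repro: fire two streamed chat completions at the same time.
# base_url/port ("http://localhost:5006/v1") and model ("default") are assumed.
import threading
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5006/v1", api_key="sk-none")

def stream_one(tag: str) -> None:
    # Request a streamed response and print chunks as they arrive.
    stream = client.chat.completions.create(
        model="default",
        messages=[{"role": "user", "content": f"Describe a sunset ({tag})."}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices:
            print(f"[{tag}] {chunk.choices[0].delta.content or ''}", end="", flush=True)

# Start both requests in parallel; the reported failure happens mid-stream.
threads = [threading.Thread(target=stream_one, args=(t,)) for t in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```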

Cyb4Black changed the title from "Sending two requests asking for streamed response kills the server" to "[BUG] Sending two requests asking for streamed response kills the server" Nov 18, 2024
@matatonic
Owner

Yes, this is a limit I haven't resolved yet. The only solution I have in the short term is to just block and only process one request at a time, which is probably better done client side so you don't get request timeouts. Is this something that you need?
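
A minimal sketch of that client-side workaround, assuming the OpenAI Python client and the same hypothetical local endpoint as above: a lock ensures only one streamed request is in flight at a time, so the server never sees two concurrent streams.

```python
# Client-side serialization sketch; endpoint and model name are assumptions.
import threading
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5006/v1", api_key="sk-none")
_one_at_a_time = threading.Lock()

def stream_serialized(prompt: str) -> str:
    # Block until any other in-flight request has finished streaming.
    with _one_at_a_time:
        stream = client.chat.completions.create(
            model="default",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        return "".join(
            chunk.choices[0].delta.content or ""
            for chunk in stream
            if chunk.choices
        )
```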

Dynamic batching is a much more complex solution for vision models: they don't have a consistent way to batch. Sometimes image contexts can be batched, sometimes they can't; most of the time only the chat can be batched, not the image context. This is inconsistent with the expectations of the API, so I have not implemented it at all.

The only practical suggestion I have is to run multiple copies of the server on different ports, perhaps with a load balancer in front. This is not a good general solution because vision models are typically huge, and multiple copies would require an enormous amount of VRAM, so again this is not implemented.
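
For illustration only, a naive client-side version of that idea, assuming two copies of the server were started on ports 5006 and 5007 (both assumed values): requests are handed out round-robin across the instances.

```python
# Round-robin across multiple server instances; ports and model are assumptions.
import itertools
from openai import OpenAI

clients = [
    OpenAI(base_url=f"http://localhost:{port}/v1", api_key="sk-none")
    for port in (5006, 5007)
]
_next_client = itertools.cycle(clients)

def stream_balanced(prompt: str) -> str:
    client = next(_next_client)  # pick the next instance in rotation
    stream = client.chat.completions.create(
        model="default",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return "".join(
        chunk.choices[0].delta.content or ""
        for chunk in stream
        if chunk.choices
    )
```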

@Cyb4Black
Author

Actually, for our hackathon we needed to be able to process multiple requests in parallel, and we were lucky that TGI by Hugging Face just recently added support for MLlama, so we aren't using openedai-vision for now.

Just wanted to make sure you are aware of the bug.

@matatonic
Owner

No problem. You may also be interested to know that vLLM supports a few of the good vision models and is great for multiple concurrent requests.
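
Since vLLM exposes an OpenAI-compatible API, the same client code can be pointed at it by changing the base URL. A brief sketch; the port (vLLM's common default of 8000) and the placeholder model name are assumptions:

```python
# Pointing the OpenAI client at a vLLM OpenAI-compatible server (sketch).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-none")
stream = client.chat.completions.create(
    model="your-vision-model",  # hypothetical placeholder for the served model
    messages=[{"role": "user", "content": "Describe this image."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
```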

matatonic added the bug (Something isn't working) label Nov 18, 2024
@matatonic
Owner

Batching is still not supported, but simultaneous requests should no longer hang the server as of 0.42.0.
