
[AI] Expose configuring the timeout for the AI Extension #10739

Open
npolshakova opened this issue Mar 4, 2025 · 0 comments
Labels
Type: Enhancement New feature or request

Comments

@npolshakova
Contributor

npolshakova commented Mar 4, 2025

kgateway version

main

Is your feature request related to a problem? Please describe.

We should expose configuring the timeout for the AI extension.
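A hypothetical sketch of what the exposed setting could look like. Everything below — the CRD shape, API group/version, and the `aiExtension`/`timeout` field names — is an illustrative assumption, not an existing or committed API:

```yaml
# Illustrative only -- all names here are assumptions, not kgateway API.
apiVersion: gateway.kgateway.dev/v1alpha1   # assumed group/version
kind: GatewayParameters
metadata:
  name: ai-gateway-params
spec:
  kube:
    aiExtension:
      # Proposed knob: how long Envoy waits on the AI ext_proc server
      # before timing out a processing request.
      timeout: 10s
```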

Describe alternatives you've considered

  • Having a default retries/timeouts policy hard-coded

Additional Context

Envoy can emit the "buffer size limit (64KB) for async client retries has been exceeded" warning if the retry_policy is set incorrectly. If the chat history grows and retries are set, this warning starts showing up in the Envoy log.

@andy-fong investigated this and found:

  • This buffer is used only for retries; Envoy appends incoming data to it as the stream flows through, in case it later needs to retry and resend everything.
  • Before Envoy 1.28, Envoy would keep appending to this buffer indefinitely whenever a retry_policy was set. In scenarios like long-lived gRPC streams this effectively "leaked" memory, since the buffer was held as long as the stream stayed open. Reference: memory leak in gRPC access logs envoyproxy/envoy#19699
  • As a temporary fix, a 64KB limit was added to stop the leak, which is why this warning now appears.
    The warning message itself is not an indication of a problem. It only becomes one if a retry actually happens, because Envoy will silently drop any data beyond 64KB.
  • Specific to the ext_proc filter: before Envoy 1.32.2, Envoy always added to this buffer even when no retry_policy was set, doing all that work and holding the memory for nothing. This has been addressed in this [commit]
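For context, the Envoy-side knobs involved can be sketched as follows. The field names come from the ext_proc v3 filter API, but the values and cluster name are placeholders, not what kgateway currently sets:

```yaml
# Sketch of an ext_proc HTTP filter entry; values are illustrative.
http_filters:
- name: envoy.filters.http.ext_proc
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: ai_ext_proc   # placeholder cluster name
        # No retry_policy here: setting one enables the 64KB async-client
        # retry buffer described above, which silently truncates retried
        # data beyond 64KB.
      timeout: 5s          # overall gRPC timeout to the processor
    message_timeout: 5s    # how long Envoy waits for each ext_proc response
```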

We are currently not setting retries or timeouts, but would like to make this configurable. As part of making these fields configurable, there are a couple of things to note:

  • Once Envoy has sent something to ext_proc, a retry can be harmful. For example, on a retry triggered by a timeout, ext_proc has already processed and buffered the data but may simply be slow to respond; if Envoy retries with the same data, ext_proc will duplicate it and likely break the JSON parsing.
  • Retrying on initial connection failure (probably the only legitimate retry scenario for our use case) can still lose data with the hard-coded retry solution, so it would likely do more harm than good by "randomly" and silently dropping data whenever it exceeds 64KB.

Returning an error and letting the user retry would be much better than silently dropping data. So instead of making retries configurable, we may want the behavior to be: if the connection dies, return an error code and start over. It might be worth investigating whether the filter chain iteration can be restarted here.

There's also a related bug that was recently addressed (envoyproxy/envoy#36119), where Envoy would send a duplicated request body to ext_proc when STREAMED mode is used and the body size is greater than ~1MB.

@npolshakova npolshakova added the Type: Enhancement New feature or request label Mar 4, 2025
@npolshakova npolshakova mentioned this issue Mar 4, 2025