
[AI] Expose configuring the timeout for the AI Extension #10739

Open
npolshakova opened this issue Mar 4, 2025 · 0 comments
Labels
Type: Enhancement New feature or request

Comments

@npolshakova
Contributor

npolshakova commented Mar 4, 2025

kgateway version

main

Is your feature request related to a problem? Please describe.

We should expose configuring the timeout for the AI extension.
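A hypothetical sketch of what the exposed setting could look like. Everything below — the CRD shape, API group/version, and the `aiExtension`/`timeout` field names — is an illustrative assumption, not an existing or committed API:

```yaml
# Illustrative only -- all names here are assumptions, not kgateway API.
apiVersion: gateway.kgateway.dev/v1alpha1   # assumed group/version
kind: GatewayParameters
metadata:
  name: ai-gateway-params
spec:
  kube:
    aiExtension:
      # Proposed knob: how long Envoy waits on the AI ext_proc server
      # before timing out a processing request.
      timeout: 10s
```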

Describe alternatives you've considered

  • Having a default retries/timeouts policy hard-coded

Additional Context

Envoy can emit the "buffer size limit (64KB) for async client retries has been exceeded" warning if the retry_policy is set incorrectly. If the chat history grows and retries are set, this warning starts showing up in the Envoy log.

@andy-fong investigated this and found:

  • This buffer is used only for retries; Envoy appends incoming data to it as the stream flows through, in case it later needs to retry and resend everything.
  • Before Envoy 1.28, Envoy would keep appending to this buffer indefinitely whenever a retry_policy was set. In scenarios like long-lived gRPC streams this effectively "leaked" memory, since the buffer was held as long as the stream stayed open. Reference: memory leak in gRPC access logs envoyproxy/envoy#19699
  • As a temporary fix, a 64KB limit was added to stop the leak, which is why this warning now appears.
    The warning message itself is not an indication of a problem. It only becomes one if a retry actually happens, because Envoy will silently drop any data beyond 64KB.
  • Specific to the ext_proc filter: before Envoy 1.32.2, Envoy always added to this buffer even when no retry_policy was set, doing all that work and holding the memory for nothing. This has been addressed in this [commit]
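For context, the Envoy-side knobs involved can be sketched as follows. The field names come from the ext_proc v3 filter API, but the values and cluster name are placeholders, not what kgateway currently sets:

```yaml
# Sketch of an ext_proc HTTP filter entry; values are illustrative.
http_filters:
- name: envoy.filters.http.ext_proc
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: ai_ext_proc   # placeholder cluster name
        # No retry_policy here: setting one enables the 64KB async-client
        # retry buffer described above, which silently truncates retried
        # data beyond 64KB.
      timeout: 5s          # overall gRPC timeout to the processor
    message_timeout: 5s    # how long Envoy waits for each ext_proc response
```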

We are currently not setting retries or timeouts, but would like to make this configurable. As part of making these fields configurable, there are a couple of things to note:

  • Once Envoy has sent something to ext_proc, a retry can be harmful. For example, on a retry triggered by a timeout, ext_proc has already processed and buffered the data but may simply be slow to respond; if Envoy retries with the same data, ext_proc will duplicate it and likely break the JSON parsing.
  • Retrying on initial connection failure (probably the only legitimate retry scenario for our use case) can still lose data with the hard-coded retry solution, so it would likely do more harm than good by "randomly" and silently dropping data whenever it exceeds 64KB.

Returning an error and letting the user retry would be much better than silently dropping data. So instead of making retries configurable, we may want the behavior to be: if the connection dies, return an error code and start over. It might be worth investigating whether the filter chain iteration can be restarted here.

There's also a related bug that was recently addressed (envoyproxy/envoy#36119), where Envoy would send a duplicated request body to ext_proc when STREAMED mode is used and the body size is greater than ~1MB.

@npolshakova npolshakova added the Type: Enhancement New feature or request label Mar 4, 2025
@npolshakova npolshakova mentioned this issue Mar 4, 2025