Is your feature request related to a problem? Please describe.
We should expose configuring the timeout for the AI extension.
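For reference, the timeout in question would ultimately map down to the ext_proc filter's `message_timeout` field on Envoy's ExternalProcessor config. A sketch of what that looks like on the Envoy side (the `message_timeout` field is real Envoy API; the cluster name and surrounding values here are illustrative, not an actual kgateway proposal):

```yaml
# Illustrative only: the Envoy-side knob a user-facing timeout would map to.
# message_timeout is a real ExternalProcessor field; cluster_name and the
# 5s value are placeholder assumptions.
typed_config:
  "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
  message_timeout: 5s
  grpc_service:
    envoy_grpc:
      cluster_name: ai-ext-proc
```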
Describe alternatives you've considered
Having a default retries/timeouts policy hard coded
Additional Context
Envoy can emit the "buffer size limit (64KB) for async client retries has been exceeded" warning if the retry_policy is incorrectly set. If the chat history grows and retries are set, this warning starts appearing in the Envoy log.
This buffer is used for retry only; however, Envoy appends data to it as the data streams in, just in case it needs to retry and resend all the data.
Before Envoy 1.28, it would just keep appending to this buffer if a retry_policy was set. This caused some scenarios, such as gRPC streams, to "leak" memory, because the buffered data hangs around for as long as the stream is open. Reference: memory leak in gRPC access logs envoyproxy/envoy#19699
As a temporary fix, they put a 64KB limit on the buffer to stop the leak, which is why this warning appears.
The warning message itself is not an indication of a problem. It only becomes a problem if you actually need to retry, in which case Envoy will silently drop any data beyond 64KB.
Specific to the ext_proc filter, before Envoy 1.32.2 it ALWAYS added to this buffer even when no retry_policy was set, so it was doing all that work and holding the memory for nothing. This has been addressed in this [commit]
We are currently not setting retries or timeouts, but would like to make this configurable. As part of making these fields configurable, there are a couple things to note:
Once Envoy has sent something to ext_proc, a retry can be harmful. For example, on a timeout retry: ext_proc has already processed and buffered the data but is taking too long to respond, so if Envoy retries with the same data, ext_proc will duplicate the data and probably break the JSON parsing.
On retry for an initial connection failure (which might be the only legitimate retry scenario for our use case), the hard-coded retry buffer can still lose data, so I think it would do more harm than good by "randomly" dropping data silently whenever the data exceeds 64KB.
Returning an error and letting the user retry would be much better than randomly dropping data. So instead of making retries configurable, we may want the behavior to be: if the connection dies, return an error code and start over. It might be worth investigating whether we can restart the filter chain iteration here.
There's also a related bug that was recently addressed (envoyproxy/envoy#36119), where Envoy would send a duplicated request body to ext_proc when STREAMED mode is used and the body size is > ~1MB.
kgateway version
main