unexpected EOF resulting in unable to reconnect #300
Comments
Hey!
Have you tried to manually reconnect on an error, or is it just stuck? I have similar problems with:

    try:
        await js.publish(stream, json.dumps(data).encode())
    except nats.errors.TimeoutError:
        # Recreate the JetStream context, then retry once
        js = nc.jetstream()
        await js.publish(stream, json.dumps(data).encode())
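The retry-on-timeout idea above can be generalized into a small helper with a bounded number of attempts and a backoff between them. This is only a sketch under stated assumptions: `publish_with_retry`, `publish`, and `refresh` are hypothetical names, and `asyncio.TimeoutError` stands in for whatever timeout exception the client raises (`nats.errors.TimeoutError` in the snippet above). In the NATS case, `refresh` would be the step that calls `nc.jetstream()` again.

```python
import asyncio
from typing import Awaitable, Callable


async def publish_with_retry(
    publish: Callable[[bytes], Awaitable[object]],
    payload: bytes,
    refresh: Callable[[], None],
    attempts: int = 3,
    base_delay: float = 0.1,
) -> object:
    """Call publish(payload); on timeout, call refresh() and retry with backoff."""
    for attempt in range(attempts):
        try:
            return await publish(payload)
        except asyncio.TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            refresh()  # hypothetical hook, e.g. recreate the JetStream context
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Bounding the attempts keeps a persistent outage from turning into an endless retry loop, and the backoff avoids hammering a server that is already struggling.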
Hey @tothandras :)
Problem
We are running an application built in Python to translate HTTP calls to NATS. This application acts as our main gateway. It is dockerized (python:3.10-slim) and runs on EKS.
The following shows how network calls come in:
For the past two weeks we have been getting problems with the NATS client disconnecting from the server and emitting the following error:
unexpected EOF
As you can see in the graphs, there is a spike in data received. I can assure you that this is not a DoS, just more requests than usual.
I have the feeling that this is due to too many requests being made from the app to the resource server, meaning the NATS client is unable to handle many requests at the same time. Do you know if this can be the case and what we can do about it? Is it possible to reconnect directly? Could the issue be related to FastAPI (web server) and NATS sharing the same event loop?
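One way to test the hypothesis that too many simultaneous requests overwhelm the client would be to cap the number of in-flight calls on the shared event loop and watch whether the disconnects stop. The sketch below uses only `asyncio`. `BoundedCaller` is a hypothetical helper, not part of any NATS library, and `fn` is assumed to wrap whatever request the gateway sends. It does not show that concurrency is the actual cause of the `unexpected EOF`.

```python
import asyncio
from typing import Awaitable, Callable


class BoundedCaller:
    """Caps how many awaited calls run at once on the current event loop."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.in_flight = 0  # calls currently running
        self.peak = 0       # highest concurrency observed so far

    async def call(self, fn: Callable[[], Awaitable[object]]) -> object:
        # Callers beyond the limit wait here instead of piling onto the client.
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await fn()
            finally:
                self.in_flight -= 1
```

Creating the `BoundedCaller` inside the running application (for example in a startup hook) keeps the semaphore on the same loop as the web server and the NATS client.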
App specs
Connection config:
Versions
Logs
Application
In loki:
In sentry:
Nats Server
No logs (even in debug mode)
Graphs
individual pod 5min
individual pod 1h
deployment (3 pods running in round robin)