Client seems to be hanging the process #159

Open
arun11299 opened this issue Jun 16, 2020 · 3 comments

Comments

@arun11299
Contributor

arun11299 commented Jun 16, 2020

This has happened 2 or 3 times in the past 1 to 1.5 months. It is still difficult to reproduce, but I suspect the issue occurs when the VM (Ubuntu 18) is running slowly.

We have a process whose liveness is monitored through heartbeats by another process. This setup has worked for us for a long time, even before we introduced the nats library, so we do not suspect any issue on that side.
In all 2 or 3 occurrences, the process was killed because it did not respond to the heartbeat messages for over 5 minutes.

These are the Python tracebacks we captured from the crashed process:

Thread 1 (Thread 0x7fef63ad0bc0 (LWP 5422)):
Traceback (most recent call first):
  File "/usr/lib/python3.6/asyncio/streams.py", line 638, in read
    del self._buffer[:n]
  File "/usr/local/lib/python3.6/dist-packages/nats/aio/client.py", line 1663, in _read_loop
    b = await self._io_reader.read(DEFAULT_BUFFER_SIZE)
  File "/usr/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.6/asyncio/base_events.py", line 1451, in _run_once
    handle._run()
  File "/usr/lib/python3.6/asyncio/base_events.py", line 438, in run_forever
    self._run_once()
  File "/usr/rift/usr/lib64/python3.6/site-packages/xxx/xxxs/xxx.py", line 156, in scheduler_tick
    self.run_forever()

  File "/usr/local/lib/python3.6/dist-packages/nats/aio/client.py", line 1043, in is_draining
    or self._status == Client.DRAINING_PUBS
  File "/usr/local/lib/python3.6/dist-packages/nats/aio/client.py", line 1033, in is_connected
    return (self._status == Client.CONNECTED) or self.is_draining
  File "/usr/local/lib/python3.6/dist-packages/nats/aio/client.py", line 1657, in _read_loop
    if self.is_connected and self._io_reader.at_eof():
  File "/usr/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.6/asyncio/base_events.py", line 1451, in _run_once
    handle._run()
  File "/usr/lib/python3.6/asyncio/base_events.py", line 438, in run_forever
    self._run_once()
  File "/usr/rift/usr/lib64/python3.6/site-packages/xxx/xxxs/xxx.py", line 156, in scheduler_tick
    self.run_forever()

I cannot reproduce this at will, so I don't have much more information. Any ideas what could be going on?
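Because the hang cannot be reproduced on demand, one option is to let the process capture its own Python stacks when its event loop stops making progress, rather than relying only on the external heartbeat monitor killing it. The sketch below is illustrative only: `LoopWatchdog` is a hypothetical helper, not part of nats.py or asyncio. It runs in a background thread, periodically schedules a cheap callback with `loop.call_soon_threadsafe()`, and if that callback has not run within `threshold` seconds it calls `on_stall` (by default `faulthandler.dump_traceback(all_threads=True)`, which writes directly to the stderr file descriptor). The 0.3 s threshold in the demo is only for illustration; in production it would presumably sit well below the 5-minute heartbeat timeout.

```python
import asyncio
import faulthandler
import sys
import threading
import time


class LoopWatchdog:
    """Hypothetical helper: reports when an asyncio event loop stops running callbacks."""

    def __init__(self, loop, threshold, on_stall=None):
        self._loop = loop
        self._threshold = threshold  # seconds without a beat before reporting
        self._on_stall = on_stall or self._dump_all_threads
        self._last_beat = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def _beat(self):
        # Executed on the loop thread, so it only runs while the loop is responsive.
        self._last_beat = time.monotonic()

    @staticmethod
    def _dump_all_threads():
        # Prints the Python stack of every thread, including the one running the loop.
        faulthandler.dump_traceback(file=sys.stderr, all_threads=True)

    def _watch(self):
        reported = False
        while not self._stop.wait(self._threshold / 4):
            self._loop.call_soon_threadsafe(self._beat)
            stalled = time.monotonic() - self._last_beat > self._threshold
            if stalled and not reported:
                self._on_stall()  # report each stall episode only once
            reported = stalled

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


async def demo():
    stalls = []
    watchdog = LoopWatchdog(asyncio.get_running_loop(), threshold=0.3,
                            on_stall=lambda: stalls.append(time.monotonic()))
    watchdog.start()
    await asyncio.sleep(0.3)   # loop stays responsive: no stall expected
    stalls_while_responsive = len(stalls)
    time.sleep(1.0)            # blocks the loop thread, like a task that never yields
    await asyncio.sleep(0.1)   # let the queued beats run again
    watchdog.stop()
    return stalls_while_responsive, len(stalls)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The watchdog has to live on a separate thread: if the loop itself is stuck, no coroutine scheduled on that loop can notice.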

@arun11299
Contributor Author

I am wondering whether there are cases where _read_loop could spin without ever yielding to the event loop.
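A minimal, self-contained sketch (Python 3.7+) of the kind of spin suspected above. It assumes a read loop that keeps calling `StreamReader.read()` after EOF without checking `at_eof()`; `read_loop`, `heartbeat`, and `count_heartbeats` are hypothetical names, and this is not the actual nats.py `_read_loop`. The relevant asyncio behaviour is that `StreamReader.read()` only suspends when its buffer is empty and EOF has not been reached; otherwise it returns immediately (with `b""` at EOF), so awaiting it does not hand control back to the event loop.

```python
import asyncio


async def read_loop(reader, iterations, yield_each_iteration):
    # Hypothetical read loop (NOT the nats.py implementation): it keeps
    # calling read() without checking reader.at_eof() first.
    for _ in range(iterations):
        # After feed_eof() with an empty buffer, read() returns b"" at once,
        # so this await does not suspend the task.
        data = await reader.read(1024)
        if not data and yield_each_iteration:
            # Explicitly give other tasks a turn on the event loop.
            await asyncio.sleep(0)


async def heartbeat(ticks):
    # Stand-in for any other task sharing the loop (e.g. heartbeat handling).
    while True:
        ticks.append(1)
        await asyncio.sleep(0)


async def count_heartbeats(yield_each_iteration, iterations=1000):
    reader = asyncio.StreamReader()
    reader.feed_eof()  # simulate a connection whose peer has gone away
    ticks = []
    hb = asyncio.ensure_future(heartbeat(ticks))
    await read_loop(reader, iterations, yield_each_iteration)
    observed = len(ticks)  # heartbeats that ran while the read loop was active
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return observed


if __name__ == "__main__":
    print(asyncio.run(count_heartbeats(yield_each_iteration=False)))  # 0
    print(asyncio.run(count_heartbeats(yield_each_iteration=True)))
```

In the non-yielding case the heartbeat task never gets a turn while the read loop runs, which is the kind of starvation that would make a process miss its heartbeat deadline. Breaking out of the loop on `reader.at_eof()`, or awaiting something that actually suspends, avoids it in this sketch.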

@wallyqs
Member

wallyqs commented Jun 16, 2020

Thank you for the report, @arun11299. First, a few questions:

  • Do you see any disconnections or reconnections, either in the client-side callbacks or in the NATS Server logs?
  • If there are reconnections, what is the maximum number of reconnection attempts configured on the client?
    I want to first rule out that the client is reaching the CLOSED state, where it gives up reconnecting. The client will also disconnect from the server if its own pings go unanswered, which takes about 6 minutes by default. You can configure the client to detect this faster, for example by sending a ping every 30 seconds:

nats.py/nats/aio/client.py, lines 279 to 280 (commit 0f1260d):

self.options["ping_interval"] = ping_interval
self.options["max_outstanding_pings"] = max_outstanding_pings
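For reference, a sketch of passing these options (plus the reconnect limit and callbacks discussed above) to `Client.connect()`. Assumptions: the nats.py `Client.connect()` keyword arguments `ping_interval`, `max_outstanding_pings`, `max_reconnect_attempts`, `error_cb`, `disconnected_cb`, `reconnected_cb`, and `closed_cb`, with callbacks defined as coroutine functions; a server at `nats://127.0.0.1:4222`; and the hypothetical helper `stale_detection_seconds`, which estimates detection time assuming the client counts one outstanding ping per interval and gives up once the count exceeds `max_outstanding_pings`. If the defaults in this version are a 120-second interval and 2 outstanding pings, that estimate matches the roughly 6 minutes mentioned above.

```python
import asyncio


def stale_detection_seconds(ping_interval, max_outstanding_pings):
    # Hypothetical estimate (assumption): one ping is counted as outstanding per
    # interval, and the connection is declared stale once the count exceeds
    # max_outstanding_pings.
    return ping_interval * (max_outstanding_pings + 1)


async def connect_with_faster_pings(servers):
    # Imported here so the helper above can be used without nats installed.
    from nats.aio.client import Client as NATS

    async def error_cb(e):
        print("NATS error:", e)

    async def disconnected_cb():
        print("NATS disconnected")

    async def reconnected_cb():
        print("NATS reconnected")

    async def closed_cb():
        print("NATS connection closed")

    nc = NATS()
    await nc.connect(
        servers=servers,
        ping_interval=30,           # seconds between client PINGs
        max_outstanding_pings=2,    # unanswered PINGs tolerated before giving up
        max_reconnect_attempts=60,  # illustrative value only
        error_cb=error_cb,
        disconnected_cb=disconnected_cb,
        reconnected_cb=reconnected_cb,
        closed_cb=closed_cb,
    )
    return nc


async def main():
    # Assumes a NATS server is listening locally on the default port.
    nc = await connect_with_faster_pings(["nats://127.0.0.1:4222"])
    await nc.close()


if __name__ == "__main__":
    print(stale_detection_seconds(120, 2))  # 360
    print(stale_detection_seconds(30, 2))   # 90
    asyncio.run(main())
```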

@arun11299
Copy link
Contributor Author

Hi @wallyqs

  • I do not see any disconnections or reconnections around that time. On the server side, I only see the usual client ping timer debug logs. On the client side, I have registered my own callbacks, and none of them printed anything.

  • I am not overriding the ping interval defaults, so I guess it's 30 seconds.
