NO-JIRA: [Python] IO: Add ENETUNREACH to the list of tolerated errors #365

Jmennius · 2022-04-07T09:59:25Z

...which will enable the reconnection logic to act in this case.
ENETUNREACH can occur when the target network is unreachable, for example
when the network stack is not yet fully initialized or when a network
is temporarily disconnected.
This makes ENETUNREACH handled just like EHOSTUNREACH
(which is, for some reason, reported as EINPROGRESS in this part of the code).

Signed-off-by: Ievgen Popovych <[email protected]>

codecov-commenter · 2022-04-07T10:49:24Z

Codecov Report

Merging #365 (bb27336) into main (a920192) will increase coverage by 20.11%.
The diff coverage is n/a.

❗ Current head bb27336 differs from pull request most recent head ba58d8c. Consider uploading reports for the commit ba58d8c to get more accurate results

@@             Coverage Diff             @@
##             main     #365       +/-   ##
===========================================
+ Coverage   68.24%   88.36%   +20.11%     
===========================================
  Files         367       47      -320     
  Lines       73285     2397    -70888     
===========================================
- Hits        50011     2118    -47893     
+ Misses      23274      279    -22995

Impacted Files	Coverage Δ
python/proton/_io.py
cpp/examples/encode_decode.cpp
cpp/examples/broker.cpp
...est_PROTON_2116_blocking_connection_object_leak.py
cpp/src/transport.cpp
c/src/sasl/default_sasl.c
cpp/src/connection_options.cpp
c/src/core/error.c
cpp/src/container.cpp
c/examples/raw_connect.c
... and 310 more


astitcher

Thanks for looking at and debugging the issue you found. I'm pretty sure this isn't the correct place for the fix (even if it was the easiest place to fix your issue).

In future, it really helps us if you raise an issue connected with a fix, both for tracking purposes and to understand the environment of the issue: was this under Windows, macOS or Linux, for example, as the errno behaviour can vary between platforms.
Certainly your comment about EHOSTUNREACH is not true for Linux, and that seems to me to be a much more important connect failure case than ENETUNREACH.

astitcher · 2022-04-07T14:25:51Z

python/proton/_io.py

@@ -65,7 +65,7 @@ def connect(addr) -> socket.socket:
        try:
            s.connect(addr[4])
        except socket.error as e:
-            if e.errno not in (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EAGAIN):
+            if e.errno not in (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EAGAIN, errno.ENETUNREACH):


I'm pretty sure this change doesn't make sense at this point in the code:
The very low-level connect only makes sure that a connect on a nonblocking socket worked. The only "non-errors" on a nonblocking socket are the ones already listed; they indicate that the operation is in progress.
ENETUNREACH indicates that the connect operation failed (at this low level). Any retries because of this kind of failure need to be handled at a higher level.
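
To illustrate the separation described here, the following is a minimal sketch, not the actual Proton code. The names `start_connect`, `connect_with_retry`, `IN_PROGRESS` and `RETRYABLE` are hypothetical, and the choice of which errnos count as retryable is an assumption made for illustration. The low-level function tolerates only the "in progress" errnos, and a separate higher-level wrapper decides whether a hard failure such as ENETUNREACH deserves another attempt.

```python
import errno
import socket
import time

# Errnos that mean "the nonblocking connect is still in progress".
IN_PROGRESS = (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EAGAIN)

# Assumption for this sketch: hard failures a caller may want to retry.
RETRYABLE = (errno.ENETUNREACH, errno.EHOSTUNREACH, errno.ECONNREFUSED)


def start_connect(addrinfo, socket_factory=socket.socket):
    """Begin a nonblocking connect; re-raise anything that is a real failure."""
    s = socket_factory(addrinfo[0], addrinfo[1], addrinfo[2])
    s.setblocking(False)
    try:
        s.connect(addrinfo[4])
    except OSError as e:
        if e.errno not in IN_PROGRESS:
            s.close()
            raise
    return s


def connect_with_retry(addrinfo, attempts=3, delay=1.0,
                       sleep=time.sleep, socket_factory=socket.socket):
    """Higher-level policy: retry hard failures a bounded number of times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return start_connect(addrinfo, socket_factory)
        except OSError as e:
            # Give up on non-retryable errors or when attempts are exhausted.
            if e.errno not in RETRYABLE or attempt == attempts:
                raise
            sleep(delay)
```

Here `addrinfo` is expected to be one entry from `socket.getaddrinfo()`, matching the `addr[4]` usage in the diff above. Keeping the retry policy outside `start_connect` means the low-level call still reports ENETUNREACH as a failure rather than hiding it behind an "in progress" result.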

Jmennius · 2022-04-07T15:49:38Z

> Thanks for looking at and debugging the issue you found. I'm pretty sure this isn't the correct place for the fix (even if it was the easiest place to fix your issue).
>
> In future, it really helps us if you raise an issue connected with a fix, both for tracking purposes and to understand the environment of the issue: was this under Windows, macOS or Linux, for example, as the errno behaviour can vary between platforms. Certainly your comment about EHOSTUNREACH is not true for Linux, and that seems to me to be a much more important connect failure case than ENETUNREACH.

Sure, thanks for the feedback! Should I open an issue (at this point)?
This was on Linux. I am pretty sure that it behaves like I've described it, which is indeed unexpected...

astitcher · 2022-04-07T15:55:27Z

> Sure, thanks for the feedback! Should I open an issue (at this point)?

I think this deserves an issue, although there may already be an issue about reconnect not working correctly if the failure is in the initial connect operation.

> This was on Linux. I am pretty sure that it behaves like I've described it, which is indeed unexpected...

It could well be that, in the case of EHOSTUNREACH, the failure isn't discovered immediately, so the connect reports EINPROGRESS and only fails later, when the target router sends back the ICMP unreachable packet.
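
A deferred failure of this kind can be observed by reading `SO_ERROR` once the socket becomes writable. The sketch below assumes POSIX-like socket behaviour; the helper names `finish_connect` and `connect_result` are hypothetical and not part of Proton. It shows how a connect that first reports EINPROGRESS can later resolve to an errno such as EHOSTUNREACH.

```python
import errno
import select
import socket

# Errnos that mean "the nonblocking connect is still in progress".
IN_PROGRESS = (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EAGAIN)


def finish_connect(sock, timeout=5.0):
    """Wait for a pending connect to settle; return its final errno (0 = success)."""
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        return errno.ETIMEDOUT
    # SO_ERROR holds the outcome of the asynchronous connect.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)


def connect_result(address, timeout=5.0):
    """Start a nonblocking TCP connect and report the eventual errno."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex(address)  # returns an errno instead of raising
    if err in IN_PROGRESS:
        err = finish_connect(sock, timeout)
    return sock, err
```

A caller could compare the returned value against `errno.EHOSTUNREACH` or `errno.ENETUNREACH` and hand the decision to retry to higher-level reconnect logic.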

Jmennius · 2022-04-07T17:02:34Z

Opened PROTON-2528.

astitcher requested changes Apr 7, 2022
