NO-JIRA: [Python] IO: Add ENETUNREACH to the list of tolerated errors #365

Open: Jmennius wants to merge 1 commit into main

Jmennius
Contributor

@Jmennius Jmennius commented Apr 7, 2022

...which will enable reconnection logic to act in this case.
ENETUNREACH can happen when the target network is unreachable, for example
when the network stack has not been fully initialized yet or when a network
is temporarily disconnected.
This makes ENETUNREACH handled just like EHOSTUNREACH
(which, for some reason, is reported as EINPROGRESS in this part of the code).

Signed-off-by: Ievgen Popovych <[email protected]>
@codecov-commenter

Codecov Report

Merging #365 (bb27336) into main (a920192) will increase coverage by 20.11%.
The diff coverage is n/a.

❗ Current head bb27336 differs from pull request most recent head ba58d8c. Consider uploading reports for the commit ba58d8c to get more accurate results

@@             Coverage Diff             @@
##             main     #365       +/-   ##
===========================================
+ Coverage   68.24%   88.36%   +20.11%     
===========================================
  Files         367       47      -320     
  Lines       73285     2397    -70888     
===========================================
- Hits        50011     2118    -47893     
+ Misses      23274      279    -22995     
Impacted Files Coverage Δ
python/proton/_io.py
cpp/examples/encode_decode.cpp
cpp/examples/broker.cpp
...est_PROTON_2116_blocking_connection_object_leak.py
cpp/src/transport.cpp
c/src/sasl/default_sasl.c
cpp/src/connection_options.cpp
c/src/core/error.c
cpp/src/container.cpp
c/examples/raw_connect.c
... and 310 more


Member

@astitcher astitcher left a comment

Thanks for looking at and debugging the issue you found. I'm pretty sure this isn't the correct place for the fix (even if it was the easiest place to fix your issue).

In future, it really helps us if you raise an issue to go with a fix, both for tracking purposes and so that we understand the environment of the issue. Was this under Windows, macOS or Linux, for example? The errno behaviour can vary between platforms.
Certainly your comment about EHOSTUNREACH is not true for Linux, and that seems to me to be a much more important connect failure case than ENETUNREACH.

@@ -65,7 +65,7 @@ def connect(addr) -> socket.socket:
 try:
     s.connect(addr[4])
 except socket.error as e:
-    if e.errno not in (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EAGAIN):
+    if e.errno not in (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EAGAIN, errno.ENETUNREACH):
Member

I'm pretty sure this change doesn't make sense at this point in the code.
The very low-level connect only makes sure that a connect on a non-blocking socket worked. The only "non-errors" on a non-blocking socket are the ones listed; they indicate that the operation is in progress.
ENETUNREACH indicates that the connect operation failed (at this low level). Any retries because of this kind of failure need to be handled at a higher level.
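
To illustrate the split described above, here is a minimal sketch. It is not the actual `proton/_io.py` code: `start_connect` and `connect_with_retry` are hypothetical names, and the retry count and delay are assumptions. The low-level function treats only the in-progress errnos as "not an error" and lets everything else, including ENETUNREACH, propagate. The retry policy lives in a separate, higher-level function.

```python
import errno
import socket
import time

# Errnos meaning "the non-blocking connect has started and is still in progress".
IN_PROGRESS = (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EAGAIN)


def start_connect(sockaddr, family=socket.AF_INET):
    """Start a non-blocking connect; raise OSError on an immediate failure."""
    s = socket.socket(family, socket.SOCK_STREAM)
    s.setblocking(False)
    try:
        s.connect(sockaddr)
    except OSError as e:
        if e.errno not in IN_PROGRESS:
            s.close()
            raise  # e.g. ENETUNREACH: a real failure, left to the caller
    return s


def connect_with_retry(sockaddr, attempts=3, delay=1.0,
                       connect=start_connect, sleep=time.sleep):
    """Hypothetical higher-level retry loop around the low-level connect."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(sockaddr)
        except OSError as e:
            last_error = e
            if attempt + 1 < attempts:
                sleep(delay)  # assumed fixed delay; a real policy might back off
    raise last_error
```

The `connect` and `sleep` parameters exist only so the retry loop can be exercised without a real network.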

@Jmennius
Contributor Author

Jmennius commented Apr 7, 2022

> Thanks for looking at and debugging the issue you found. I'm pretty sure this isn't the correct place for the fix (even if it was the easiest place to fix your issue).
>
> In future, it really helps us if you raise an issue to go with a fix, both for tracking purposes and so that we understand the environment of the issue. Was this under Windows, macOS or Linux, for example? The errno behaviour can vary between platforms. Certainly your comment about EHOSTUNREACH is not true for Linux, and that seems to me to be a much more important connect failure case than ENETUNREACH.

Sure, thanks for the feedback! Should I open an issue (at this point)?
This was on Linux. I am pretty sure that it behaves like I've described it, which is indeed unexpected...

@astitcher
Member

> Sure, thanks for the feedback! Should I open an issue (at this point)?

I think this deserves an issue, although there may already be one about reconnect not working correctly when the failure happens in the initial connect operation.

> This was on Linux. I am pretty sure that it behaves like I've described it, which is indeed unexpected...

It could well be that, in the case of EHOSTUNREACH, the failure isn't discovered immediately, so the connect reports EINPROGRESS and only fails later, when the target router sends back the ICMP unreachable packet.
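
A deferred failure like this is usually picked up by reading `SO_ERROR` once the socket becomes writable. Here is a minimal sketch of that step, assuming POSIX/Linux behaviour; `finish_connect` and `raise_for_so_error` are hypothetical names and the timeout value is an assumption, so this is not how Proton necessarily does it.

```python
import errno
import os
import select
import socket


def raise_for_so_error(s):
    """Raise OSError if the socket has a pending error (e.g. EHOSTUNREACH)."""
    err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err:
        raise OSError(err, os.strerror(err))


def finish_connect(s, timeout=5.0):
    """Wait for a non-blocking connect that reported EINPROGRESS to complete."""
    # On POSIX the socket becomes writable when the connect either
    # succeeds or fails; SO_ERROR tells the two outcomes apart.
    _, writable, _ = select.select([], [s], [], timeout)
    if not writable:
        raise TimeoutError("connect did not complete in time")
    raise_for_so_error(s)
```

An EHOSTUNREACH that arrives only after the ICMP reply would surface here as an `OSError`, at which point a higher-level reconnect policy could decide whether to try again.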

@Jmennius
Contributor Author

Jmennius commented Apr 7, 2022

Opened PROTON-2528.
