[demo_nodes_cpp] add_two_ints service is flakey #304
Comments
I was unable to reproduce this using Fast-RTPS in a from-source build on Bionic. Using
returned successfully all 200 times with both the regular and async client.
I ran the same test as @nuclearsandwich with Connext on Xenial. There, out of 200 runs, I got exactly 1 run where the client hung forever, and 1 run where the client had to try several times to discover the server (the server remained running the entire time).
The client hung in 59 of 200 runs with Fast-RTPS on High Sierra.
I was able to reproduce the issue on Windows fat binaries both with FastRTPS and Connext using this
With FastRTPS it usually hung on the 5th to 8th iteration, whereas with Connext it hung on the 15th to 17th.
I haven't been able to repro this. I've tried:
Also, running a Python server with a C++ client results in the same issue:
I was able to reproduce with Fast-RTPS on Xenial by also making the CPU busy during the test (e.g. building a workspace with colcon). The client remains hung even after the CPU usage drops. This sounds likely related to ros2/rmw_fastrtps#239.
FYI for a more reproducible stress test I recommend the |
Thanks, I discovered that soon after 👍
Completely separately, it's just been submitted as a rosdep key too: ros/rosdistro#19870
Potentially related: ros2/rmw_fastrtps#238 (review) It appears that when the client hangs, it could be due to the number of requested subscribers being reported as zero at the |
I'm not sure I follow the status of this thread. I see @nuclearsandwich said:
Do we have a table of which OS and rmw implementations have this problem? I ask because we see some behavior in the Navigation2 stack on Bionic that may be explained by this, but that's not clear to me from this thread.
Wait, now I see that ros2/examples#228 is the equivalent for Bionic (18.04). So what's the status? Has this bug been root-caused?
Could you give some more details about your problem?
I don't think so. If you look carefully at the reports above, it looks like it mostly happens with Fast-RTPS (or, at least, it was mostly tested with Fast-RTPS). There also seem to be some reports of it with Connext. There is some possibility that this problem will be improved by the discovery changes that went into FastRTPS 1.7.1 (specifically eProsima/Fast-DDS#411). But that is not a root cause analysis, and as it stands we can't upgrade Crystal to FastRTPS 1.7.1 since it is ABI incompatible (though it is API compatible). @mkhansen-intel There are a couple of things you could try:
This issue also appears frequently (usually within 20 iterations) with OpenSplice on arm64v8 Ubuntu 18.04, specifically using the build from https://ci.ros2.org/view/packaging/job/packaging_linux-aarch64/824/. Additionally, I see the same behavior more frequently (usually within 5 iterations) with other demos such as minimal_service/minimal_client.
Bug report
Required Info:
Steps to reproduce issue
In one terminal start the server:
In another terminal repeatedly run the client:
The same behavior also occurs for the async client:
Expected behavior
The server receives the client's request and the client gets the response.
Actual behavior
Sometimes (~30% of the time) the server never gets the request and the client hangs (does not return).
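To put a number on the hang rate, one option is to run the client repeatedly under a timeout and count the runs that never return. The following is only a rough sketch, not the script used in this thread: the helper name `count_hangs`, the 10 second timeout, and the executable name `add_two_ints_client` (guessed from the issue title) are all assumptions.

```python
import subprocess


def count_hangs(cmd, runs=200, timeout_s=10.0):
    """Run `cmd` `runs` times and return how many runs exceeded `timeout_s`.

    A run that does not exit within `timeout_s` seconds is counted as a hang.
    The timeout value is an assumption; pick one well above the normal
    discovery + request/response time on your machine.
    """
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the direct child on timeout. If the command
            # is a launcher (such as `ros2 run`), the process it started may
            # outlive it and need to be cleaned up separately.
            hangs += 1
    return hangs


# Hypothetical invocation (server already running in another terminal):
#   client = ["ros2", "run", "demo_nodes_cpp", "add_two_ints_client"]
#   print(count_hangs(client), "of 200 runs hung")
```

Counting timeouts instead of waiting on a hung client lets the loop finish unattended, which makes results easier to compare across rmw implementations and platforms.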
Additional information
I cannot reproduce this with OpenSplice or Connext, which leads me to believe that this is a Fast-RTPS issue.