Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(hesai_hw_monitor): fix 2s drops issue - again #135

Merged
merged 2 commits into from
Apr 3, 2024

Conversation

mojomex
Copy link
Collaborator

@mojomex mojomex commented Apr 2, 2024

PR Type

  • Bug Fix

Related Links

Description

This PR fixes the OT128's 2s UDP packet drop issue (again, this time for node containers).
The culprit was a race condition between the HW monitor's OnHesaiStatusTimer() and OnHesaiLidarMonitorTimer() callbacks, which both access the non-thread-safe HW interface.
(Thread-safety will probably be fixed by #131 as the SendReceive() function should be thread-safe there, as a side-effect of the other fixes made.)

Due to the race condition, the callbacks got into a deadlock, causing all TCP communication from HW monitor to stop. HW monitor is the only source of constant TCP traffic to the sensor (once per diag_span, default 1s) and is thought to refresh the sensor's ARP cache. When the monitor stops, after 1 minute the sensor sends 2 duplicate ARP requests (which are presumably a bug on the sensor side), causing the sensor to stop sending UDP packets for 2s.

The fix in this PR is to merge the two above-mentioned callbacks into a unified one:

[this](){
  OnHesaiStatusTimer();
  OnHesaiLidarMonitorTimer();
}

This gets rid of the sensor's duplicate ARP requests and 2s drops once per minute.

Review Procedure

Run Nebula with OT128 once with, once without this PR (using hesai_launch_component.launch.xml) and confirm in Wireshark -> Statistics -> I/O Graphs that once per minute there is a 2s drop without this PR, and no such drop with this PR.

Remarks

Pre-Review Checklist for the PR Author

PR Author should check the checkboxes below when creating the PR.

  • Assign PR to reviewer

Checklist for the PR Reviewer

Reviewers should check the checkboxes below before approval.

  • Commits are properly organized and messages are according to the guideline
  • (Optional) Unit tests have been written for new behavior
  • PR title describes the changes

Post-Review Checklist for the PR Author

PR Author should check the checkboxes below before merging.

  • All open points are addressed and tracked via issues or tickets

CI Checks

  • Build and test for PR: Required to pass before the merge.

@mojomex mojomex self-assigned this Apr 2, 2024
@mojomex mojomex force-pushed the ot128-2s-drops-fix-again branch from 3a68e4e to f8e5fd0 Compare April 2, 2024 12:30
@mojomex mojomex marked this pull request as ready for review April 2, 2024 12:44
@mojomex mojomex requested review from amc-nu and drwnz April 2, 2024 12:46
@mojomex mojomex merged commit eb5e1b7 into tier4:main Apr 3, 2024
7 checks passed
@mojomex mojomex deleted the ot128-2s-drops-fix-again branch April 3, 2024 04:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants