Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix socket can race condition #223

Merged
merged 7 commits into from
Mar 13, 2024
Merged

Conversation

jlaitine
Copy link

Summary

This fixes the https://ssrc.atlassian.net/browse/DP-8022

The cause for the crash seems to be IOB chain corruption, which is caused by insufficient locking in the can / network stack.

The same locking issue is actually the root cause for the earlier "semcount runaway issue". Therefor, I have reverted the earlier fix for the semcount runaway issue.

The locking is added in patch "arch/risc-v/src/mpfs/mpfs_fpga_canfd.c: Fix a race condition with devif_poll"

Since the reverted patch contained also other things, I created separate patches for those. @haitomatic could you please check that those are correct; I preserved you as the "author" for those.

This PR also contains the following changes, which are not directly causing the issue, but are related:
"net/can: Release IOB when failed to queue in datahandler" <- this is a cherry pick from upstream
"net/can/can_recvmsg.c: Cleanup can_readahead" <- this just removes useless code and improves atomicity of can_readahead

Impact

Fix random crashes when using uavcan on Saluki

Testing

On-desk trials with a drone using can escs

jlaitine and others added 7 commits March 12, 2024 11:09
If we just clear the IOB when failed to queue, we'll leak it.

Signed-off-by: Zhe Weng <[email protected]>
In case CONFIG_NET_TIMESTAMP is enabled, subtract the timestamp size from the buflen

Signed-off-by: Jukka Laitinen <[email protected]>
If pktlen is 0, there is no need to update

Signed-off-by: Jukka Laitinen <[email protected]>
It is better to just atomically remove the iob from the readahead queue first, and only after that work on the iob.

Signed-off-by: Jukka Laitinen <[email protected]>
@jlaitine jlaitine requested review from haitomatic and pussuw March 12, 2024 10:33
@haitomatic
Copy link

lgtm. The PR has reverted most of my previous workaround which is not needed if the race condition is fixed in arch/risc-v/src/mpfs/mpfs_fpga_canfd.c

@jlaitine jlaitine merged commit 51162be into master Mar 13, 2024
6 of 7 checks passed
@jlaitine jlaitine deleted the fix_socket_can_race_condition branch March 13, 2024 10:45
jlaitine pushed a commit that referenced this pull request Sep 27, 2024
"/home/guoshichao/work_profile/vela_os/vela_car_5/nuttx/include/crypto/curve25519.h", line 44: error #223-D:
          function arc4random_buf declared implicitly
      arc4random_buf(secret, CURVE25519_KEY_SIZE);
      ^

Signed-off-by: guoshichao <[email protected]>
jlaitine pushed a commit that referenced this pull request Sep 29, 2024
"/mnt/yang/qixinwei_commit/nuttx/include/nuttx/timers/timer.h", line 257: error #223-D:
          function DEBUGASSERT declared implicitly
    DEBUGASSERT(lower->ops->tick_getstatus);

Signed-off-by: yanghuatao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants