Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault if file descriptor unavailable #249

Merged
merged 1 commit into from
Feb 9, 2024

Conversation

claui
Copy link

@claui claui commented Jan 8, 2024

The get_java_var_long function returns 0 in several failure modes, e.g. if a file descriptor is unavailable.

However, one of the call sites is missing the result check, which causes a JVM segfault if the return value is 0. The segfault occurs on dereferencing the pointer:

eis->eventflags[SPE_DATA_AVAILABLE]

Add a result value check, throwing a proper IOException if it is 0.

See also similar issue #59.

Fixes #112, #136 and #242.

The `get_java_var_long` function returns 0 in several failure modes,
e.g. if a file descriptor is unavailable. [1]

However, one of the call sites is missing the result check, which causes
a JVM segfault if the return value is 0. The segfault occurs on
dereferencing the pointer: [2]

```c
eis->eventflags[SPE_DATA_AVAILABLE]
```

Add a result value check, throwing a proper IOException if it is 0.

See also similar issue NeuronRobotics#59. [3]

Fixes NeuronRobotics#112 [4], NeuronRobotics#136 [5] and NeuronRobotics#242 [6].

[1]: https://github.com/NeuronRobotics/nrjavaserial/blob/0df8b60485a56d7698b71183237b5615d02a8194/src/main/c/src/SerialImp.c#L5137-L5142
[2]: https://github.com/NeuronRobotics/nrjavaserial/blob/0df8b60485a56d7698b71183237b5615d02a8194/src/main/c/src/SerialImp.c#L3085
[3]: NeuronRobotics#59
[4]: NeuronRobotics#112
[5]: NeuronRobotics#136
[6]: NeuronRobotics#242

Reported-by: Alex Vasiliev <@alex-vas>
Reported-by: Łukasz Dywicki <[email protected]>
Reported-by: Jose Pacelli <[email protected]>
Reported-by: Frank Hartwig <[email protected]>
@claui
Copy link
Author

claui commented Feb 8, 2024

I can confirm that this fix is independent of PR #211.
Our tests on our hardware suggest we need both this PR and #211 to keep the JVM from segfaulting.

Both segfaults are easy to tell apart because they have unique fingerprints in the error log:

The segfault being tackled here always occurs in read_byte_array. It always has 0x0000000000000008 as the unmapped address being dereferenced. (The number 8 comes from the struct offset SPE_DATA_AVAILABLE).

Segfaults being addressed by PR #211 are occurring outside of read_byte_array, e.g. in interruptEventLoop.
The code in interruptEventLoop is interpreting stack garbage as an event_info_struct, causing a crash trying to dereference the next "pointer". The unmapped address mentioned in the crash log is not the number 8, but random stack garbage (e.g. 0x0000000d00000000).

@MrDOS
Copy link
Contributor

MrDOS commented Feb 9, 2024

Thank you for digging so deep into these issues. The check you're introducing here is perfectly sensible. What initially confused me was how this code is getting called at all with RXTXPort.eis == NULL. The only place I can find where eis is reset to 0 is if the event loop returns without throwing anything:

eventLoop();
eis = 0;
if (debug)
z.reportln( "eventLoop() returned, this is invalid.");

It looks like that happens when the event loop is shut down via RXTXPort.removeEventListener() (which is one of the things done by RXTXPort.close()):

interruptEventLoop( );

index->closing = 1;

if( eis.closing )
{
report("eventLoop: got interrupt\n");
finalize_threads( &eis );
finalize_event_info_struct( &eis );
LEAVE("eventLoop");
return;
}

The read_byte_array() function you're fixing here is called under the hood by all the different RXTXPort.read() methods. All of those are careful to check that the monitor thread hasn't been interrupted before issuing the read. They're careful to acquire a lock on the I/O mutex. They're even all synchronized.

if ( monThreadisInterrupted == true )
{
if (debug_read)
z.reportln( "RXTXPort:SerialInputStream:read() Interrupted");
return(0);
}
IOLockedMutex.readLock().lock();
try
{
waitForTheNativeCodeSilly();
result = readArray( b, off, Minimum);

...but because access to the monThreadisInterrupted property isn't synchronized (either in read() or in removeEventListener()), there's still a race condition whereby the monitor thread can pack up and go home in between the check and the native read code call 🤦‍♂️ It's a very narrow one, sure, but given that many (most?) applications will call read() in a tight loop, it's no surprise that it gets hit eventually.

I'll happily merge this fix now – thank you very much for the contribution. And I'll modify #211 to protect the monThreadisInterrupted checks inside the read() methods with a read lock on monitorThreadState.

@MrDOS MrDOS merged commit b1da316 into NeuronRobotics:master Feb 9, 2024
@claui claui deleted the fix-segfault branch February 9, 2024 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JVM crash on macOS.
2 participants