Fix segfault if file descriptor unavailable #249

claui · 2024-01-08T09:52:15Z

The get_java_var_long function returns 0 in several failure modes, e.g. if a file descriptor is unavailable.

However, one of the call sites is missing the result check, which causes a JVM segfault if the return value is 0. The segfault occurs on dereferencing the pointer:

eis->eventflags[SPE_DATA_AVAILABLE]

Add a result value check, throwing a proper IOException if it is 0.

See also similar issue #59.

Fixes #112, #136 and #242.

@alex-vas

The `get_java_var_long` function returns 0 in several failure modes, e.g. if a file descriptor is unavailable. [1] However, one of the call sites is missing the result check, which causes a JVM segfault if the return value is 0. The segfault occurs on dereferencing the pointer: [2] ```c eis->eventflags[SPE_DATA_AVAILABLE] ``` Add a result value check, throwing a proper IOException if it is 0. See also similar issue NeuronRobotics#59. [3] Fixes NeuronRobotics#112 [4], NeuronRobotics#136 [5] and NeuronRobotics#242 [6]. [1]: https://github.com/NeuronRobotics/nrjavaserial/blob/0df8b60485a56d7698b71183237b5615d02a8194/src/main/c/src/SerialImp.c#L5137-L5142 [2]: https://github.com/NeuronRobotics/nrjavaserial/blob/0df8b60485a56d7698b71183237b5615d02a8194/src/main/c/src/SerialImp.c#L3085 [3]: NeuronRobotics#59 [4]: NeuronRobotics#112 [5]: NeuronRobotics#136 [6]: NeuronRobotics#242 Reported-by: Alex Vasiliev <@alex-vas> Reported-by: Łukasz Dywicki <[email protected]> Reported-by: Jose Pacelli <[email protected]> Reported-by: Frank Hartwig <[email protected]>

claui · 2024-02-08T17:05:35Z

I can confirm that this fix is independent of PR #211.
Our tests on our hardware suggest we need both this PR and #211 to keep the JVM from segfaulting.

Both segfaults are easy to tell apart because they have unique fingerprints in the error log:

The segfault being tackled here always occurs in read_byte_array. It always has 0x0000000000000008 as the unmapped address being dereferenced. (The number 8 comes from the struct offset SPE_DATA_AVAILABLE).

Segfaults being addressed by PR #211 are occurring outside of read_byte_array, e.g. in interruptEventLoop.
The code in interruptEventLoop is interpreting stack garbage as an event_info_struct, causing a crash trying to dereference the next "pointer". The unmapped address mentioned in the crash log is not the number 8, but random stack garbage (e.g. 0x0000000d00000000).

MrDOS · 2024-02-09T11:33:47Z

Thank you for digging so deep into these issues. The check you're introducing here is perfectly sensible. What initially confused me was how this code is getting called at all with RXTXPort.eis == NULL. The only place I can find where eis is reset to 0 is if the event loop returns without throwing anything:

nrjavaserial/src/main/java/gnu/io/RXTXPort.java

Lines 108 to 111 in 0df8b60

    
           eventLoop(); 
        
                    eis = 0; 
        
           if (debug) 
        
           	z.reportln( "eventLoop() returned, this is invalid.");

It looks like that happens when the event loop is shut down via RXTXPort.removeEventListener() (which is one of the things done by RXTXPort.close()):

nrjavaserial/src/main/java/gnu/io/RXTXPort.java

Line 914 in 0df8b60

interruptEventLoop( );

nrjavaserial/src/main/c/src/SerialImp.c

Line 4984 in 0df8b60

index->closing = 1;

nrjavaserial/src/main/c/src/SerialImp.c

Lines 4261 to 4268 in 0df8b60

    
           if( eis.closing ) 
        
           { 
        
           	report("eventLoop: got interrupt\n"); 
        
           	finalize_threads( &eis ); 
        
           	finalize_event_info_struct( &eis ); 
        
           	LEAVE("eventLoop"); 
        
           	return; 
        
           }

The read_byte_array() function you're fixing here is called under the hood by all the different RXTXPort.read() methods. All of those are careful to check that the monitor thread hasn't been interrupted before issuing the read. They're careful to acquire a lock on the I/O mutex. They're even all synchronized.

nrjavaserial/src/main/java/gnu/io/RXTXPort.java

Lines 1482 to 1492 in 0df8b60

    
           if ( monThreadisInterrupted == true ) 
        
           { 
        
           	if (debug_read) 
        
           		z.reportln( "RXTXPort:SerialInputStream:read() Interrupted"); 
        
           	return(0); 
        
           } 
        
           IOLockedMutex.readLock().lock(); 
        
           try 
        
           { 
        
           	waitForTheNativeCodeSilly(); 
        
           	result = readArray( b, off, Minimum);

...but because access to the monThreadisInterrupted property isn't synchronized (either in read() or in removeEventListener()), there's still a race condition whereby the monitor thread can pack up and go home in between the check and the native read code call 🤦‍♂️ It's a very narrow one, sure, but given that many (most?) applications will call read() in a tight loop, it's no surprise that it gets hit eventually.

I'll happily merge this fix now – thank you very much for the contribution. And I'll modify #211 to protect the monThreadisInterrupted checks inside the read() methods with a read lock on monitorThreadState.

claui mentioned this pull request Feb 8, 2024

Fix macOS segfault on close after hardware error #211

Open

MrDOS merged commit b1da316 into NeuronRobotics:master Feb 9, 2024

claui deleted the fix-segfault branch February 9, 2024 11:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix segfault if file descriptor unavailable #249

Fix segfault if file descriptor unavailable #249

claui commented Jan 8, 2024

claui commented Feb 8, 2024

MrDOS commented Feb 9, 2024 •

edited

Loading

Fix segfault if file descriptor unavailable #249

Fix segfault if file descriptor unavailable #249

Conversation

claui commented Jan 8, 2024

claui commented Feb 8, 2024

MrDOS commented Feb 9, 2024 • edited Loading

MrDOS commented Feb 9, 2024 •

edited

Loading