Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix heartbeat read buffer bug #256

Merged
merged 15 commits into from
Oct 23, 2023
Merged

Fix heartbeat read buffer bug #256

merged 15 commits into from
Oct 23, 2023

Conversation

mingyech
Copy link
Member

Check if an insufficient buffer is used in hbConn.Read().

@mingyech mingyech requested a review from jmwample October 20, 2023 00:13
Copy link
Member

@jmwample jmwample left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding this and adding the fix. One minor change to maxMessageSize (described below) if possible.

I am intending this to be merged along with one of the other buffering fixes (#254, #255) just so we don't end up with this happening again in the future.

I am also not sure what will happen in the station when we do run into the ErrInsuficientBuffer case if a large upload / download happens over DTLS.

@@ -2,12 +2,14 @@ package dtls

import (
"bytes"
"errors"
"net"
"sync/atomic"
"time"
)

var maxMessageSize = 65535
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is internal and we control this and the place this is interacting with can we make this buffers the same size (32kB --- 32*1024) to match the buffer that is going to be reading from it? If that is going to mess with message sizes when sending SCTP with heartbeats then it might be fine as is.

Copy link
Member Author

@mingyech mingyech Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, yeah. I am wondering if it would be better to change the buffer in halfPipe to 65535 so that we can have a larger maximum message? The reason I put 65535 is because that's the pion SCTP's maxMessageSize, but we can change that if we want as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think I got halfPipe confused with the SCTP code. It is able to read in messages larger than 32kB because it will just read the next part in the for loop. SCTP had a maxMessageSize because it is a message-oriented stream instead of the byte-stream that halfPipe expects, which meant that it must read each message in full. Thanks to the additions in #253, it should now behave like a byte-stream and work well with halfPipe on large messages.

@mingyech
Copy link
Member Author

mingyech commented Oct 20, 2023

I am also not sure what will happen in the station when we do run into the ErrInsuficientBuffer case if a large upload / download happens over DTLS.

I've made the heartbeat code wrap sctp.Stream instead of the dtls.SCTPConn. This way the enhancements in #253 should work on the station side, so the size of the buffer in halfPipe shouldn't matter. still figuring out tests

@jmwample
Copy link
Member

I think there is a problem with the heartbeat tests wrt. memory usage.

# create mem profile for the heartbeat tests 
go test -race -v -run TestHeartbeat -memprofile mem.out

# view the mem profile
pprof -http :9000 mem.out

~7G of allocation in hbConn.recvLoop() over 4 tests. Trying to run with higher count (i.e. --count=10) results in OOM kill.

conjure/pkg/dtls/heartbeat.go

  Total:      6.96GB     6.96GB (flat, cum) 99.89%
     54            .          .           		time.Sleep(c.timeout) 
     55            .          .           	} 
     56            .          .            
     57            .          .           } 
     58            .          .            
     59            .          .           func (c *hbConn) recvLoop() { 
     60            .          .           	for { 
     61       6.96GB     6.96GB           		buffer := make([]byte, c.maxMessageSize) 
     62            .          .            
     63            .          .           		n, err := c.stream.Read(buffer) 
     64            .          .            
     65            .          .           		if bytes.Equal(c.hb, buffer[:n]) { 
     66            .          .           			atomic.AddUint32(&c.waiting, 1) 
     67            .          .           			continue 
     68            .          .           		} 
     69            .          .            
     70            .          .           		c.recvCh <- errBytes{buffer[:n], err} 
     71            .          .           	} 
     72            .          .            
     73            .          .           } 
     74            .          .            

@mingyech
Copy link
Member Author

Oof, yea that seemed to be the problem. Fixed it and the inconsistency in the tests are gone as well.

@mingyech
Copy link
Member Author

btw, my system time's a bit off it seems and it's making my commits show up like they're from two days ago 😅

@jmwample jmwample merged commit 3ee6b61 into master Oct 23, 2023
11 checks passed
@jmwample jmwample deleted the fix-hb-buf branch October 23, 2023 13:57
jmwample pushed a commit that referenced this pull request Oct 23, 2023
Fix heartbeat read buffer bug, by checking if an insufficient buffer is used in hbConn.Read().
Wrap heartbeat before wrapping SCTPConn so that it also benefits from SCTPConn enhancements
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants