-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: cf worker shouldn't disconnect peers #15
Comments
hmm - the peer list is stored in R2 which has global strong consistency (read after write) except if there is a network partition. at least, that's what they say. so there shouldn't be any latency. If a peer that was once in the list is no longer there, then they timed out or gracefully disconnected. if the timeout is too short, you can adjust it in the constructor. |
i can assure you it's not consistent, anyways this should probably work like a BitTorrent tracker where it announces peers and doesn't dictate them |
can you describe exactly what is happening? because the removal case happens when a peer is in the list, and then is not in the list. it's not going to do that if there was latency in the peer arriving into the list. |
I think a better change is to keep the peer if it is connected. The scenario this is trying to cover is where you've joined a room and there are latent peers leftover from previous sessions that never disconnected gracefully (so aren't actually going to work, but whose RTCPeerConnections are created anyway), and you want to remove the unneeded RTCPeerConnections after they are timed out by the signalling server. |
the worker returns inconsistent peers from those who are expected to be in the lobby, this leads to people connecting to eachother then disconnecting the very next refresh, or even before they can finish connecting, the worker really shouldn't dictate which peers you should disconnect from |
yep - if you update this PR to just remove peers who are not currently connected I'll merge it. I want to leave this case in for the other one I mentioned: you join a room where there are still peers registered who didn't gracefully disconnect their sessions and you need to clear up the peer connections you created for them. |
that said, if you are seeing peers disappear from the worker who were in there, this sounds like a much deeper problem that needs to be understood on its own. either R2 is not living up to its guarantees (unlikely) or there are race conditions I need to deal with (more likely.) |
oh now that you mention it I'm re-reading the code, and actually if there are a bunch of rapidly joining peers there might be some shuffling (despite R2's guarantees) since they do compete for slots in the list. the idea was that the state would eventually converge on subsequent polls. i never actually saw this happen in testing but i think what you're saying sounds possible with the current implementation, so i think I can cross off the concern that I don't understand what's going on. in any case, update your PR and it will fix this problem. |
the CF worker really shouldn't disconnect peers, this should happen between the peers themselves
the peer list often takes time to propagate tru cloudflare, or sometimes doesn't at all, leading to people randomly disconnecting with eachother
The text was updated successfully, but these errors were encountered: