From e756710b5d1c51b534c3ee280f3392c59420f34d Mon Sep 17 00:00:00 2001
From: Floris Bruynooghe
Date: Tue, 14 Jan 2025 12:35:30 +0100
Subject: [PATCH] fix(iroh): Queue sent datagrams longer (#3129)

## Description

The problem is that while the connection to the relay server is still being established, sent packets are already being dropped while they wait in the send queue. This means that when the connection is finally established they are no longer there to be sent, and depending on some scheduling luck connections will often fail.

Extending this timeout makes establishing relay-only connections much more reliable.

## Breaking Changes

## Notes & open questions

"Depending on scheduling luck" is a bit hand-wavy. I would have expected QUIC to recover from this and re-send the packets. I think it depends on exactly how long it takes to establish the connection: retries could still end up being dropped in this queue if badly timed.

It is hard to say if 3*PTO is sufficient. There is an argument for even longer, but it is a trade-off between blocking the entire relay queue if it is too long and giving enough time to establish a normal connection.

## Change checklist

- [x] Self-review.
- [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant.
- [x] Tests if relevant.
- [x] All breaking changes documented.
---
 iroh/src/magicsock/relay_actor.rs | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/iroh/src/magicsock/relay_actor.rs b/iroh/src/magicsock/relay_actor.rs
index cbc202420e..c8cacd1b85 100644
--- a/iroh/src/magicsock/relay_actor.rs
+++ b/iroh/src/magicsock/relay_actor.rs
@@ -96,7 +96,9 @@ const CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
 /// When the [`ActiveRelayActor`] is not connected it can not deliver datagrams. However it
 /// will still receive datagrams to send from the [`RelayActor`]. If connecting takes
 /// longer than this timeout datagrams will be dropped.
-const UNDELIVERABLE_DATAGRAM_TIMEOUT: Duration = Duration::from_millis(400);
+///
+/// This value is set to 3 times the QUIC initial Probe Timeout (PTO).
+const UNDELIVERABLE_DATAGRAM_TIMEOUT: Duration = Duration::from_secs(3);
 
 /// An actor which handles the connection to a single relay server.
 ///
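
The "3 times the QUIC initial PTO" figure can be checked with a little arithmetic. The sketch below is not code from this patch: it assumes the RFC 9002 defaults (an initial RTT of 333 ms, a 1 ms timer granularity, and no `max_ack_delay` term before the handshake completes). `initial_pto` and `is_undeliverable` are hypothetical helper names used only for illustration. Under those assumptions the initial PTO is just under one second, so three of them round to the 3 seconds used above.

```rust
use std::time::Duration;

/// Initial RTT assumed before any RTT sample exists (RFC 9002 default, an assumption here).
const INITIAL_RTT: Duration = Duration::from_millis(333);
/// Timer granularity (RFC 9002 default, an assumption here).
const GRANULARITY: Duration = Duration::from_millis(1);

/// PTO before any RTT sample: smoothed_rtt + max(4 * rttvar, granularity).
/// The max_ack_delay term is left out, as it is while the handshake is in progress.
fn initial_pto() -> Duration {
    let smoothed_rtt = INITIAL_RTT;
    let rttvar = INITIAL_RTT / 2;
    smoothed_rtt + (rttvar * 4).max(GRANULARITY)
}

/// Hypothetical helper: is a datagram that has been queued for `age`
/// older than `timeout`, and therefore a candidate for dropping?
fn is_undeliverable(age: Duration, timeout: Duration) -> bool {
    age > timeout
}

fn main() {
    let pto = initial_pto();
    println!("initial PTO: {:?}, 3 * PTO: {:?}", pto, pto * 3);

    // A datagram queued for one second while the relay connection is still
    // being established, checked against the old and the new timeout.
    let age = Duration::from_secs(1);
    let old_timeout = Duration::from_millis(400);
    let new_timeout = Duration::from_secs(3);
    println!("dropped with old timeout: {}", is_undeliverable(age, old_timeout));
    println!("dropped with new timeout: {}", is_undeliverable(age, new_timeout));
}
```

With the old 400 ms value, a datagram is dropped well before QUIC's first probe would fire, which fits the observation that retries could be lost in the queue as well.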