Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only sync with top peers #2983

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Only sync with top peers #2983

wants to merge 4 commits into from

Conversation

thomash-acinq
Copy link
Member

By default we sync with every peer when reconnecting, which can be a lot. We now only sync on reconnection with our top peers (by capacity of our shared channels).

@thomash-acinq thomash-acinq marked this pull request as ready for review January 20, 2025 15:56
@thomash-acinq thomash-acinq requested a review from t-bast January 21, 2025 13:00
Copy link
Member

@t-bast t-bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately moving this logic entirely to the Router doesn't work, because it doesn't take into account our private channels (which are added to the privateChannels map after being reconnected). It doesn't even take into account our new channels, because they're added to the publicChannels list only after being announced, so when a node starts without any channel, it won't sync the graph at all with anyone with your change because its new channels will only be taken into account after a node restart.

I think we must start with the channels information we have at the initialization of the Switchboard: that's where we can reliably extract the list of our top peers (including public and private channels) and add our sync whitelist to it to create a list of peers we want to sync with.

There is one additional thing that must be handled: if we don't have enough peers with channels to reach our router.sync.peer-limit, we must sync with new peers once we have a channel with them that becomes ready. That is very important to handle the case of a new node being started: initially it doesn't have any peer, but once it has connected to remote node and opened a channel with them, we want to immediately sync the graph.

I think that one way of achieving that is to keep the simplifications you made to Peer and PeerConnection (where we don't check any rate-limit) but send the SendChannelQuery message (or a "wrapped" version of it) to the switchboard instead of the router (to do that, the peer connection can send to its peer parent which can relay to its switchboard parent): then the switchboard can decide whether to forward to the router or drop, depending on whether we've exceeded the rate limit or not. The switchboard can thus update the list of peers we're syncing with dynamically when we initially didn't have enough peers with channels to fill our sync.peer-limit. What do you think?

docs/release-notes/eclair-vnext.md Show resolved Hide resolved
@thomash-acinq thomash-acinq force-pushed the sync-whitelist branch 4 times, most recently from 75fac18 to 352a786 Compare January 27, 2025 16:06
By default we sync with every peer when reconnecting, which can be a lot. We now only sync on reconnection with our top peers (by capacity of our shared channels).
Copy link
Member

@t-bast t-bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is indeed much simpler and seems to be working well!

One interesting thing to note is that even when we've already reached our peer limit, we will sync with new peers once we have a first channel created with them (in the event handler for ChannelIdAssigned in Peer.scala). This is a good thing, because we're only syncing once, and will re-sync on reconnection only if our peersToSyncWith wasn't full.

One tiny drawback is that if we create channels with a new peer that should replace an existing peer in our top capacity list, this will only be discovered when restarting the node. But it's fine, the important thing is that we sync with a subset of peers, it doesn't have to perfectly match our top capacity peers all the time, it's not worth the extra complexity.

@@ -116,13 +116,13 @@ class SwitchboardSpec extends TestKitBaseClass with AnyFunSuiteLike {
}

test("sync if no whitelist is defined and peer has channels") {
val nodeParams = Alice.nodeParams.copy(syncWhitelist = Set.empty)
val nodeParams = Alice.nodeParams.copy(routerConf = Alice.nodeParams.routerConf.copy(syncConf = Alice.nodeParams.routerConf.syncConf.copy(whitelist = Set.empty)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you import com.softwaremill.quicklens.ModifyPimp you can greatly simplify the changes in this file, it will let you do something like:

val nodeParams = Alice.nodeParams.modify(_.routerConf.syncConf.whitelist).setTo(Set.empty)

We're using this library in tests in a few other places.

sendFeatures(nodeParams, Nil, remoteNodeId, nodeParams.features.initFeatures(), expectedSync = true)
}

test("don't sync if whitelist doesn't contain peer") {
val nodeParams = Alice.nodeParams.copy(syncWhitelist = Set(randomKey().publicKey, randomKey().publicKey, randomKey().publicKey))
val nodeParams = Alice.nodeParams.copy(routerConf = Alice.nodeParams.routerConf.copy(syncConf = Alice.nodeParams.routerConf.syncConf.copy(peerLimit = 0, whitelist = Set(randomKey().publicKey, randomKey().publicKey, randomKey().publicKey))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no test of the new behavior, this is important to add. We absolutely must add tests whenever we implement new behavior. Can you test that:

  • when peerLimit is set without a whitelist, we only sync with our top peers and don't sync with others
  • when peerLimit and a whitelist is set, we sync with our top-capacity peers AND the whitelisted peers even if they're not part of our top-capacity peers

Comment on lines +72 to +76
val peerCapacities = channels.map {
case channelData: ChannelDataWithoutCommitments => (channelData.remoteNodeId, 0L)
case channelData: ChannelDataWithCommitments => (channelData.remoteNodeId, channelData.commitments.capacity.toLong)
}.groupMapReduce[PublicKey, Long](_._1)(_._2)(_ + _)
val topCapacityPeers = peerCapacities.toSeq.sortWith { case ((_, c1), (_, c2)) => c1 > c2 }.take(nodeParams.routerConf.syncConf.peerLimit).map(_._1).toSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this above the log.info line? It's important for readability to call unstashAll right before context.become, and we'd like the log line's timestamp to match the end of the init phase.

Also, I find the custom sort harder to read than using the default one and taking the last elements:

val topCapacityPeers = peerCapacities.toSeq.sortBy(_._2).takeRight(nodeParams.routerConf.syncConf.peerLimit).map(_._1).toSet

Nice usage of groupMapReduce, I didn't know we had that function in the standard library and it's quite useful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants