filter out unstaked NodeInstance from sent PullRequests #2637
lgtm, minor style comments.
gossip/src/cluster_info.rs
Outdated
    | CrdsData::RestartLastVotedForkSlots(_)
    | CrdsData::NodeInstance(_) => {
        let stake = stakes.get(&value.pubkey()).copied();
        stake.unwrap_or_default() >= MIN_STAKE_FOR_GOSSIP
I think you can leave this part as is, but add an extra branch before line 437:
CrdsData::NodeInstance(_) if !drop_unstaked_node_instance => true,
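To make the suggested branch concrete, here is a minimal, self-contained sketch of how the guard could sit inside retain_staked. The types below are simplified stand-ins (a three-variant CrdsData, an integer Pubkey, a placeholder MIN_STAKE_FOR_GOSSIP value) chosen for illustration; they are assumptions, not the real gossip definitions.

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration only; the real gossip types differ.
type Pubkey = u64;
const MIN_STAKE_FOR_GOSSIP: u64 = 1; // assumed threshold for this sketch

#[allow(dead_code)]
enum CrdsData {
    ContactInfo, // stands in for variants that are always retained
    Vote,        // stands in for variants that always require stake
    NodeInstance,
}

struct CrdsValue {
    pubkey: Pubkey,
    data: CrdsData,
}

fn retain_staked(
    values: &mut Vec<CrdsValue>,
    stakes: &HashMap<Pubkey, u64>,
    drop_unstaked_node_instance: bool,
) {
    values.retain(|value| match value.data {
        CrdsData::ContactInfo => true,
        // Extra branch: keep every NodeInstance unless the caller asks
        // for unstaked ones to be dropped.
        CrdsData::NodeInstance if !drop_unstaked_node_instance => true,
        // Otherwise NodeInstance falls through to the stake check.
        CrdsData::Vote | CrdsData::NodeInstance => {
            let stake = stakes.get(&value.pubkey).copied();
            stake.unwrap_or_default() >= MIN_STAKE_FOR_GOSSIP
        }
    });
}
```

With the flag set to false, a NodeInstance from an unstaked pubkey survives the filter; with the flag set to true, it is held to the same stake check as the other stake-gated variants.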
gossip/src/cluster_info.rs
Outdated
@@ -1646,7 +1656,7 @@ impl ClusterInfo {
             .add_relaxed(num_nodes as u64);
         if self.require_stake_for_gossip(stakes) {
             push_messages.retain(|_, data| {
-                retain_staked(data, stakes);
+                retain_staked(data, stakes, false);
For someone reading this code later, it is not clear what each of these false/true arguments means.
I think it would be good to add inline comments, as in:
retain_staked(data, stakes, /*drop_unstaked_node_instance:*/ false);
@@ -1646,7 +1651,7 @@ impl ClusterInfo {
             .add_relaxed(num_nodes as u64);
         if self.require_stake_for_gossip(stakes) {
             push_messages.retain(|_, data| {
-                retain_staked(data, stakes);
+                retain_staked(data, stakes, /* drop_unstaked_node_instance */ false);
shouldn't this also be true?
I purposely left this false since I wanted to minimize crds table differences between nodes in this first PR. If we stop propagating unstaked NodeInstance via push, then the crds tables will differ more between the non-upgraded nodes and the upgraded nodes. This will result in more unstaked NodeInstances getting sent from non-upgraded to upgraded nodes in PullResponses, which is the same issue we are trying to fix (although less extreme).
Step (2) completed in: #3653
* filter out unstaked NodeInstance from sent PullRequests
* add descriptor, refactor retain_staked()
---------
Co-authored-by: greg <[email protected]>
PR #2511 causes an issue where non-upgraded nodes consistently send unstaked NodeInstances in PullResponses that the receiving node (which has PR #2511) drops. This increases the network's gossip traffic, since the sender of the PullResponse keeps thinking the receiver doesn't have, and wants, the unstaked NodeInstance.

Summary of Changes

Revert the PR that stops propagating unstaked NodeInstances.

There are two steps in the process of getting to the point where we do not propagate any unstaked NodeInstance:

Step 1) This PR: Do not propagate unstaked NodeInstances in PullResponses. Unstaked NodeInstances are propagated as normal in sent/received PushMessages.

Step 2) Once all nodes have upgraded to this Step 1 PR, we fully remove unstaked NodeInstance propagation. This will ensure that we don't run into the issue created by PR #2511. Nodes that have not upgraded to the Step 2 PR will still propagate unstaked NodeInstances but will not send them in PullResponses, which is what caused the spike in gossip traffic.

NOTE: I tested this incremental update process on a cluster with 40 validators, 25 rpcs, and 1 bootstrap, and it did not increase gossip traffic the way PR #2511 did.
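
The two-step rollout described above can be summarized as a small decision table. The sketch below is illustrative only: RolloutStep, Channel, and the free function drop_unstaked_node_instance are hypothetical names that do not appear in the PR; they simply encode which outgoing channel filters unstaked NodeInstances at each step.

```rust
// Hypothetical names for this sketch; they are not part of the PR.
#[derive(Clone, Copy, Debug, PartialEq)]
enum RolloutStep {
    One, // this PR
    Two, // follow-up once all nodes run Step 1
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Channel {
    PushMessage,
    PullResponse,
}

/// Whether an upgraded node drops unstaked NodeInstances on the given
/// outgoing channel at the given rollout step.
fn drop_unstaked_node_instance(step: RolloutStep, channel: Channel) -> bool {
    match (step, channel) {
        // Step 1: push propagation is unchanged, so crds tables stay
        // close to those of non-upgraded nodes.
        (RolloutStep::One, Channel::PushMessage) => false,
        // Step 1: only pull responses filter unstaked NodeInstances.
        (RolloutStep::One, Channel::PullResponse) => true,
        // Step 2: unstaked NodeInstance propagation is removed entirely.
        (RolloutStep::Two, _) => true,
    }
}
```

The Step 1 push case returning false corresponds to the false argument discussed in the review thread.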