Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter out unstaked NodeInstance from sent PullRequests #2637

Merged

Conversation

gregcusack
Copy link

@gregcusack gregcusack commented Aug 16, 2024

PR #2511 causes an issue where non upgraded nodes consistently send unstaked NodeInstances in PullResponses that the receiving node (that has this PR) drops. This increases the network's gossip traffic since the sender of the PullResponse keeps thinking the receiver doesn't have and wants the unstaked NodeInstance.

Summary of Changes

Revert the PR that stops propagating unstaked NodeInstances.
There are two steps in this process of getting to the point where we do not propagate any unstaked NodeInstance
Step 1) This PR: Do not propagate unstaked NodeInstances in PullResponses. Unstaked NodeInstances are propagated as normal in sent/received PushMessages.

Step 2) Once all nodes have upgraded to this Step 1 PR, then we fully remove unstaked NodeInstance propagation. This will ensure that we don't run into the issue created by PR #2511. Nodes that have not upgraded to the Step 2 PR will still propagate unstaked NodeInstances but will not send them in PullResponses, which resulted in a spike in gossip traffic.

NOTE: I tested this incremental update process on a cluster with 40 validators, 25 rpcs, and 1 bootstrap and it did not increase the gossip traffic as seen in PR #2511.

@gregcusack gregcusack marked this pull request as ready for review August 16, 2024 19:45
@gregcusack gregcusack requested a review from behzadnouri August 16, 2024 19:45
behzadnouri
behzadnouri previously approved these changes Aug 19, 2024
Copy link

@behzadnouri behzadnouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, minor style comments.

Comment on lines 441 to 444
| CrdsData::RestartLastVotedForkSlots(_)
| CrdsData::NodeInstance(_) => {
let stake = stakes.get(&value.pubkey()).copied();
stake.unwrap_or_default() >= MIN_STAKE_FOR_GOSSIP

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can leave this part as is, but add an extra branch before line 437:

CrdsData::NodeInstance(_) if !drop_unstaked_node_instance => true,

@@ -1646,7 +1656,7 @@ impl ClusterInfo {
.add_relaxed(num_nodes as u64);
if self.require_stake_for_gossip(stakes) {
push_messages.retain(|_, data| {
retain_staked(data, stakes);
retain_staked(data, stakes, false);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for someone reading this code later, it is not clear what each of these false/true means.
I think it would be good to add some inline comments as in:

retain_staked(data, stakes, /*drop_unstaked_node_instance:*/ false);

@@ -1646,7 +1651,7 @@ impl ClusterInfo {
.add_relaxed(num_nodes as u64);
if self.require_stake_for_gossip(stakes) {
push_messages.retain(|_, data| {
retain_staked(data, stakes);
retain_staked(data, stakes, /* drop_unstaked_node_instance */ false);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this also be true?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i purposefully left this false since I wanted to minimize crds table differences between nodes on this first PR. If we stop propagating staked NodeInstance via push, then the crds tables will differ more between the non upgraded nodes and the upgraded nodes. This will result in more unstaked NodeInstances getting sent from non upgraded to upgraded nodes in PullResponses. Which is the same issue we are trying to fix (although less extreme)

@gregcusack gregcusack merged commit 485216b into anza-xyz:master Aug 20, 2024
40 checks passed
@gregcusack gregcusack deleted the rm-node-instance-from-pull-requests branch August 20, 2024 20:09
@gregcusack
Copy link
Author

Step (2) completed in: #3653

ray-kast pushed a commit to abklabs/agave that referenced this pull request Nov 27, 2024
* filter out unstaked NodeInstance from sent PullRequests

* add descriptor, refactor retain_staked()

---------

Co-authored-by: greg <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants