Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add all validators as entrypoint to local cluster #567

Merged
merged 5 commits into from
Apr 6, 2024

Conversation

carllin
Copy link

@carllin carllin commented Apr 3, 2024

Problem

When you have the following pattern:

  1. Killing the entrypoint/leader
  2. Kill validator A not the leader
  3. Restarting validator A

Because the entrypoint is dead, validator A can't discover the cluster. Then that means A must have the same ports for gossip OR tvu as before the restart otherwise validator A won't be discoverable/receive any shreds from turbine.

Summary of Changes

Add all alive validators as entrypoints to A on restart so it's not dependent on only the initial entrypoint

Fixes #

@carllin carllin requested a review from AshwinSekar April 3, 2024 19:53
AshwinSekar
AshwinSekar previously approved these changes Apr 3, 2024
@carllin carllin force-pushed the FixLocalClusterUtility branch from 2cc2e7c to 700611e Compare April 4, 2024 03:28
.map(|validator| validator.info.contact_info.clone())
.collect();
if entry_point_infos.is_empty() {
panic!("Validator has no alive entrypoints to rejoin cluster with");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saw some tests are failing here, might want to set the entry point to the node itself in case no one is alive (or single node cluster).

Copy link
Author

@carllin carllin Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this assertion is just not a good one then, since it's possible for a node to start up when everyone else is dead.

You're in a bind though if you restart by yourself with no entrypoints/outdated entrypoint and then the rest of the cluster boots up afterwards, you won't be able to be discovered by everybody else

Edit: I guess it's fine if everyone else adds you as an entrypoint though on startup

Copy link

@AshwinSekar AshwinSekar Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should just work right - once the second validator starts up it will use the first guy as an entrypoint from the above collect?
Also do we want to update the LocalCluster 's entrypoint? previous code was doing this when restarting the entrypoint:

                self.entry_point_info = node.info.clone();

we could do this on restart if the previous entry point is dead? probably only useful for those tests that use rpc

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's a good catch, updating the entrypoint as was done previously

@carllin carllin force-pushed the FixLocalClusterUtility branch from 611c942 to 1058e3d Compare April 4, 2024 20:24
AshwinSekar
AshwinSekar previously approved these changes Apr 4, 2024
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.8%. Comparing base (2c11b7a) to head (b97d6cc).
Report is 9 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master     #567     +/-   ##
=========================================
- Coverage    81.8%    81.8%   -0.1%     
=========================================
  Files         849      850      +1     
  Lines      229183   229369    +186     
=========================================
+ Hits       187585   187721    +136     
- Misses      41598    41648     +50     

@carllin carllin merged commit de8e9e6 into anza-xyz:master Apr 6, 2024
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants