-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add all validators as entrypoint to local cluster #567
Conversation
2cc2e7c
to
700611e
Compare
local-cluster/src/local_cluster.rs
Outdated
.map(|validator| validator.info.contact_info.clone()) | ||
.collect(); | ||
if entry_point_infos.is_empty() { | ||
panic!("Validator has no alive entrypoints to rejoin cluster with"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
saw some tests are failing here, might want to set the entry point to the node itself in case no one is alive (or single node cluster).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this assertion is just not a good one then, since it's possible for a node to start up when everyone else is dead.
You're in a bind though if you restart by yourself with no entrypoints/outdated entrypoint and then the rest of the cluster boots up afterwards, you won't be able to be discovered by everybody else
Edit: I guess it's fine if everyone else adds you as an entrypoint though on startup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should just work right - once the second validator starts up it will use the first guy as an entrypoint from the above collect?
Also do we want to update the LocalCluster
's entrypoint? previous code was doing this when restarting the entrypoint:
self.entry_point_info = node.info.clone();
we could do this on restart if the previous entry point is dead? probably only useful for those tests that use rpc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's a good catch, updating the entrypoint as was done previously
611c942
to
1058e3d
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #567 +/- ##
=========================================
- Coverage 81.8% 81.8% -0.1%
=========================================
Files 849 850 +1
Lines 229183 229369 +186
=========================================
+ Hits 187585 187721 +136
- Misses 41598 41648 +50 |
Problem
When you have the following pattern:
A
not the leaderA
Because the entrypoint is dead, validator
A
can't discover the cluster. Then that meansA
must have the same ports for gossip OR tvu as before the restart otherwise validatorA
won't be discoverable/receive any shreds from turbine.Summary of Changes
Add all alive validators as entrypoints to
A
on restart so it's not dependent on only the initial entrypointFixes #