Add all validators as entrypoint to local cluster #567

carllin · 2024-04-03T19:53:56Z

Problem

When a test follows this pattern:

Kill the entrypoint/leader
Kill validator A (not the leader)
Restart validator A

Because the entrypoint is dead, validator A can't discover the cluster through it. A must therefore come back with the same gossip or TVU ports as before the restart; otherwise A won't be discoverable and won't receive any shreds from turbine.

Summary of Changes

Add all alive validators as entrypoints for A on restart, so it no longer depends only on the initial entrypoint.
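
A minimal sketch of the idea, not the actual `LocalCluster` code: `ContactInfo`, `RunningValidator`, and `alive_entrypoints` are hypothetical stand-ins made up for illustration. It shows how a restarting validator could be given every other alive validator as an entrypoint.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a validator's gossip contact info.
#[derive(Clone, Debug, PartialEq)]
struct ContactInfo {
    id: String,
    gossip_port: u16,
}

/// Hypothetical record for a validator tracked by the test cluster.
struct RunningValidator {
    contact_info: ContactInfo,
    alive: bool,
}

/// Collect the contact info of every alive validator other than the one
/// being restarted, so the restarting node has several ways into gossip.
fn alive_entrypoints(
    validators: &HashMap<String, RunningValidator>,
    restarting_id: &str,
) -> Vec<ContactInfo> {
    let mut infos: Vec<ContactInfo> = validators
        .iter()
        .filter(|(id, v)| id.as_str() != restarting_id && v.alive)
        .map(|(_, v)| v.contact_info.clone())
        .collect();
    // HashMap iteration order is unspecified; sort for a deterministic result.
    infos.sort_by(|a, b| a.id.cmp(&b.id));
    infos
}
```

With the leader and A both dead and two other validators still running, this returns only the two running validators, so A no longer depends on the dead leader to rejoin.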

Fixes #

AshwinSekar · 2024-04-04T16:08:17Z

local-cluster/src/local_cluster.rs

+            .map(|validator| validator.info.contact_info.clone())
+            .collect();
+        if entry_point_infos.is_empty() {
+            panic!("Validator has no alive entrypoints to rejoin cluster with");


saw some tests are failing here, might want to set the entry point to the node itself in case no one is alive (or single node cluster).

maybe this assertion is just not a good one then, since it's possible for a node to start up when everyone else is dead.

You're in a bind though if you restart by yourself with no entrypoints (or an outdated entrypoint) and the rest of the cluster boots up afterwards: nobody else will be able to discover you

Edit: I guess it's fine if everyone else adds you as an entrypoint though on startup

this should just work, right? once the second validator starts up it will use the first one as an entrypoint via the collect above.
Also, do we want to update the LocalCluster's entrypoint? The previous code was doing this when restarting the entrypoint:

self.entry_point_info = node.info.clone();

we could do this on restart if the previous entrypoint is dead? probably only useful for the tests that use rpc

yeah, that's a good catch, updating the entrypoint as was done previously
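
One possible reading of where this thread lands, sketched with hypothetical types (`Cluster`, `ContactInfo`, `entrypoints_for_restart`, and `finish_restart` are invented here and are not the real `LocalCluster` API): always include the default entrypoint, add any alive peers, never panic when nobody else is alive, and repoint the cluster's entrypoint at the restarted node if the old one is dead.

```rust
/// Hypothetical stand-in for a validator's contact info.
#[derive(Clone, Debug, PartialEq)]
struct ContactInfo {
    id: String,
}

/// Hypothetical, heavily simplified view of a local test cluster.
struct Cluster {
    entry_point_info: ContactInfo,
    alive: Vec<ContactInfo>,
}

impl Cluster {
    /// Entrypoints for a restarting node: the default entrypoint first,
    /// then every alive peer. The list is never empty, so no panic is needed.
    fn entrypoints_for_restart(&self, restarting: &ContactInfo) -> Vec<ContactInfo> {
        let mut out = vec![self.entry_point_info.clone()];
        for info in &self.alive {
            if info != restarting && !out.contains(info) {
                out.push(info.clone());
            }
        }
        out
    }

    /// After the restart, if the old entrypoint is not alive, make the
    /// restarted node the cluster's entrypoint (assumed behavior).
    fn finish_restart(&mut self, restarted: ContactInfo) {
        if !self.alive.contains(&self.entry_point_info) {
            self.entry_point_info = restarted.clone();
        }
        if !self.alive.contains(&restarted) {
            self.alive.push(restarted);
        }
    }
}
```

In this sketch a node that restarts while everyone else is dead still gets a non-empty entrypoint list, and nodes that start later pick it up through the alive list.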

codecov-commenter · 2024-04-05T05:38:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.8%. Comparing base (2c11b7a) to head (b97d6cc).
Report is 9 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##           master     #567     +/-   ##
=========================================
- Coverage    81.8%    81.8%   -0.1%     
=========================================
  Files         849      850      +1     
  Lines      229183   229369    +186     
=========================================
+ Hits       187585   187721    +136     
- Misses      41598    41648     +50

carllin requested a review from AshwinSekar April 3, 2024 19:53

AshwinSekar previously approved these changes Apr 3, 2024

carllin force-pushed the FixLocalClusterUtility branch from 2cc2e7c to 700611e Compare April 4, 2024 03:28

AshwinSekar reviewed Apr 4, 2024

carllin dismissed AshwinSekar’s stale review via 611c942 April 4, 2024 20:16

carllin added 3 commits April 4, 2024 16:24

Add entrypoints to all validators

160e614

fix ordering

0c931c0

Remove assert

1058e3d

carllin force-pushed the FixLocalClusterUtility branch from 611c942 to 1058e3d Compare April 4, 2024 20:24

update entrypoint

11e01a8

AshwinSekar previously approved these changes Apr 4, 2024

Always add default entrypoint

b97d6cc

carllin dismissed AshwinSekar’s stale review via b97d6cc April 5, 2024 04:09

AshwinSekar approved these changes Apr 6, 2024

carllin merged commit de8e9e6 into anza-xyz:master Apr 6, 2024
38 checks passed
