-
Notifications
You must be signed in to change notification settings - Fork 268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wen_restart: Fix the epoch_stakes used in calculation. #2376
wen_restart: Fix the epoch_stakes used in calculation. #2376
Conversation
pub struct EpochStakesCache { | ||
epoch_stakes_map: HashMap<Epoch, EpochStakes>, | ||
epoch_schedule: EpochSchedule, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this functionality if we're gonna build this module should:
- live in solana runtime
- just replace the
epoch_stakes: HashMap<Epoch, EpochStakes>
in the bank - Be a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class used to do much more, but now that we only need epoch_schedule and epoch_stakes of root_bank, and we don't need to switch to another bank, I think it makes sense to just remove this class.
It might make sense to add node_id_to_stake to EpochStakes, let me figure out how to create EpochStakes in tests, it's so tangled with other classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added node_id_to_stake to EpochStakes, does it look better now?
…better to just keep root_bank around.
This doesn't have to be true, I think we can change by as much as 25% between epochs Seems to make more sense to repair slots which had 42% in any of the epochs, because it's possible the slot been was optimistically confirmed in the older epoch and there are no slots optimisticallly confirmed in the next epoch |
Hmm okay, worst case scenario is that we are stuck waiting for more
validators. Do you think it’s safe to exit if any of the Epoch has > 80%
stake? Then we risk not having enough stake for the Heaviest Fork selected
…On Sat, Aug 3, 2024 at 15:42 carllin ***@***.***> wrote:
This is probably a bit restrictive, but should work in practice, because
epoch_stakes shouldn't change too much from one Epoch to next, same set of
active validators could quickly satisfy >80% active
This doesn't have to be true, I think we can change by as much as 25%
between epochs
—
Reply to this email directly, view it on GitHub
<#2376 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/A3FJ5FL5YIGOMNWHPG6K6E3ZPVMETAVCNFSM6AAAAABLZLUTAKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRXGE3TSNBVGQ>
.
You are receiving this because you were assigned.Message ID:
***@***.***>
|
Seems to make more sense to repair slots which had 42% in any of the epochs, because it's possible the slot was optimistically confirmed in the older epoch and there are no slots optimisticallly confirmed in the next epoch |
Hmm...
|
We do that already, this PR just decides when we can exit aggregate
RestartLastVotedForkSlots stage. When you are at the boundary of two
Epochs, there are two possibilities: ( old Epoch should always have one
slot with > 42 stake, the local root)
1. The new Epoch has no slot with > 42% stake, then we exit when we have
80% stake for the old Epoch
2. The new Epoch has some slot with > 42% stake, then we exit when we have
… 80% stake for both Epochs
On Sat, Aug 3, 2024 at 15:53 carllin ***@***.***> wrote:
Seems to make more sense to repair slots which had 42% in any of the
epochs, because it's possible the slot been was optimistically confirmed in
the older epoch and there are no slots optimisticallly confirmed in the
next epoch
—
Reply to this email directly, view it on GitHub
<#2376 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/A3FJ5FJAK4OKKVFWS35VH6DZPVNNLAVCNFSM6AAAAABLZLUTAKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRXGE4DCMRXGU>
.
You are receiving this because you were assigned.Message ID:
***@***.***>
|
I think the only optimization we have on the current fix is decide "on-the-fly" where the heaviest fork slot would be. If it's in the new Epoch then we don't need > 80% stake for the old Epoch (but with the handoff fix we might get stuck if we don't have > 80% for the old Epoch). If the heaviest fork slot is in the old Epoch then we don't have to wait for > 80% stake in the new Epoch. But:
|
// the new epoch and update the old one. | ||
assert!(self.epoch_info_vec.first().unwrap().epoch == current_epoch); | ||
self.epoch_info_vec.truncate(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems impossible given slots_to_repair
is a btree and thus let highest_repair_slot = self.slots_to_repair.last();
is monotonically increasing, and the epoch must always be greater. I think we can panic here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible, because I do self.slots_to_repair.remove(slot);
above.
The logic is like this: we allow any user to replace the RestartLastVotedForkSlots
they sent out (maybe they screwed up the ledger directory and tried again). So if some validator X
sent out wrong Gossip messages last time claiming it voted up to slot 1000
, but now replace it with only slot 500
, then we have to subtract its stake from slots 500
~1000
, this may lead the highest slots_to_repair
to go backwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be a good comment to add to that case
The comment "Somehow wandering..." suggests this is an error/unforseen case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking now we should enforce for each validator that the RestartLastVotedForkSlots
cannot go down, only monotonically increase.
There should be no reason somebody posts a lower last voted forks other than corruption/malice.
In gossip we can enforce we take the latest one instead of latest one by timestamp.
Also would simplify this code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I thought about that at the beginning. But I was afraid that honest validator operators can't resolve the mistakes they had. Let's say someone has hot standby set up, one of them has a bad state voting into the far future, the other one is good. The operator made the mistake to restart the bad one first, then whatever he does, he can't revert back to the good one any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our conversation, will send out another PR to forbid any update to RestartLastVotedForkSlots/HeaviestFork. Because that has little to do with epoch_stakes.
wen-restart/proto/wen_restart.proto
Outdated
uint64 epoch = 1; | ||
uint64 total_active_stake = 2; | ||
uint64 total_stake = 3; | ||
bool considered_during_exit = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm do we need this extra flag, can we just ignore LastVotedForkSlots that are not in the current or next epoch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this extra flag because root_epoch + 1
is always in epoch_info_vec
now, but we don't always consider root_epoch + 1
for exit criteria. For example, we won't consider it at all if no one has ever voted in the next epoch.
We then need this extra flag in the proto file because the proto file is a snapshot, we can restart from this snapshot. Then we need this calculated correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For logging purposes and cleaner implementation instead of the considered_during_exit
flag, we could track for epoch N
and N+1
two fields, voted_percent
and voted_for_this_epoch_percent
voted_percent
is the % stake of the epoch that has voted for either epoch
voted_for_this_epoch_percent
is the % stake of this epoch that has voted for a slot in this epoch.
This might be useful for debugging as well if we log this information.
We then need this extra flag in the proto file because the proto file is a snapshot, we can restart from this snapshot. Then we need this calculated correctly.
This seems extractable from the bank on restart
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to voted_percent and voted_for_this_epoch_percent.
@@ -260,7 +262,7 @@ pub(crate) fn aggregate_restart_last_voted_fork_slots( | |||
// Because all operations on the aggregate are called from this single thread, we can | |||
// fetch all results separately without worrying about them being out of sync. We can | |||
// also use returned iterator without the vector changing underneath us. | |||
let active_percent = last_voted_fork_slots_aggregate.active_percent(); | |||
let active_percent = last_voted_fork_slots_aggregate.min_active_percent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this, is it possible that some validators only see the LastVotedForks
updates from the old epoch before they see any last LastVotedForks
from the next epoch?
So they may return thinking 80% of the validators have responded, but don't wait for the 80% from the next epoch
Seems like this should be rare, probably a non issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no limit on percentage of churns stakes can change at each Epoch, it's possible. But it's only an issue if:
- We have some slot OC'ed in the next Epoch (so 67% of the new Epoch stake confirmed it)
- When 80% of the old Epoch stake returned LastVotedForks, we didn't have any slot in the new Epoch with 42% of the new Epoch stake
Even if this happens, the stake of the new Epoch has already been generated at the beginning of the old Epoch, so very soon the newly restarted cluster needs to cross the Epoch boundary so at that point they figured they don't have majority of the stake in the new Epoch to proceed.
Given the current stake distribution, I think it would be rare.
}); | ||
total_active_stake as f64 / total_stake as f64 * 100.0 | ||
fn update_epoch_info_considered_during_exit(&mut self) { | ||
let highest_repair_slot_epoch = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now one malicious peer voting with a much higher slot could cause the entire cluster to add the next epoch's stakes to consider.
Is there any benefit to only considering the next epoch's stakes if:
Let's say we have 100 stake in first epoch
- In the next epoch, the previous epoch's valdiators should have at least 91 stake (assuming worst case 9% new stake and 9% old stake cooldown).
- If next epoch has OC'd, 67 of next epoch's eshould have voted, of which at least 67 + 91 - 100 = 58 stake is from the previoius epoch.
Thus we could say we ignore the next epoch until at least 58% of the previous epoch's stake has voted for the next epoch. Not sure if this provides more security.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One malicious peer voting shouldn't be a problem, we only consider if any slot has >42% of the stake, that should be a lot of validators already.
This 42% limit is just very convenient because we use it for repair as well. We could change it to 58% or 38% with different tradeoffs in mind. I think your proposal provide similar safety guarantee to the current requirement (>=1 slot with >42% epoch stake), but when you say 58% you didn't count the stake outside the restart. So if we have 80% of the old epoch stake joining the restart, then we have at least (80-9)/(100+9) = 65% of the new epoch stake, 65% - 20% = 45% of the old Epoch stake could lead to us waiting for new Epoch. We could change the
> 1 slot on 42%
limit to all stake >45%
, but I would think it doesn't make too much difference in reality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, now I think the limit should be at least 1 slot with >=33% stake in the new Epoch.
Let's say some slot X is OC'ed in the new epoch, it has >= 67% stake in the new epoch, we can have at most 9% churn, so it should have >=58% stake in the old epoch. Let's say we have >=80% of the old Epoch in the restart, and 5% can lie, then any slot OC'ed in the new epoch should have at least 58-(100-80)-5 = 33% stake when the old Epoch has >=80% stake in restart.
That said, if we are this close to the new epoch we might very soon need 80% of the new Epoch stake to be online for the cluster to continue functioning.
I'm now thinking, in the spirit of guaranteeing if the clock has ticked into the new Epoch for enough people then we require 80% stake to be online for the new epoch, how about waiting for the new Epoch as soon as you have 1/3 of the stake voting into the new cluster? We would never have >=1/3 malicious validator, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to check that >1/3 voted for at least 1 slot in the new Epoch.
If enough people have their clocks close to the new Epoch boundary, we probably should wait for >80% stake so we can reach consensus in the new Epoch.
// the new epoch and update the old one. | ||
assert!(self.epoch_info_vec.first().unwrap().epoch == current_epoch); | ||
self.epoch_info_vec.truncate(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking now we should enforce for each validator that the RestartLastVotedForkSlots
cannot go down, only monotonically increase.
There should be no reason somebody posts a lower last voted forks other than corruption/malice.
In gossip we can enforce we take the latest one instead of latest one by timestamp.
Also would simplify this code
wen-restart/proto/wen_restart.proto
Outdated
uint64 epoch = 1; | ||
uint64 total_active_stake = 2; | ||
uint64 total_stake = 3; | ||
bool considered_during_exit = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For logging purposes and cleaner implementation instead of the considered_during_exit
flag, we could track for epoch N
and N+1
two fields, voted_percent
and voted_for_this_epoch_percent
voted_percent
is the % stake of the epoch that has voted for either epoch
voted_for_this_epoch_percent
is the % stake of this epoch that has voted for a slot in this epoch.
This might be useful for debugging as well if we log this information.
We then need this extra flag in the proto file because the proto file is a snapshot, we can restart from this snapshot. Then we need this calculated correctly.
This seems extractable from the bank on restart
slot in the new Epoch - switch to voted_percent and voted_for_this_epoch_percent
// If at least 1/3 of the stake has voted for a slot in next Epoch, we think | ||
// the cluster's clock is in sync and everyone will enter the new Epoch soon. | ||
// So we require that we have >80% stake in the new Epoch to exit. | ||
const EPOCH_CONSIDERED_FOR_EXIT_THRESHOLD: f64 = 33.33; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep this consistent as a threshold const EPOCH_CONSIDERED_FOR_EXIT_THRESHOLD: f64 = 1f64 / 3f64
nstead of a percent? Seems more in line with all the other thresholds we have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
.insert(*from, last_vote_epoch) | ||
!= Some(last_vote_epoch) | ||
{ | ||
self.update_epoch_info(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to iterate updates for the entire map here, can we just update the entry for the from
validator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't. But right now when we allow updates, just updating the entry for the from validator would be a bit messy, because you may need to subtract stake from some epoc_info.
Can I clear up the update logic in the following PR forbidding entry update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm is it more complicated than removing the validator's stake from the latest epoch_info they last voted for and adding to the new one??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i see what you mean, when we prevent updates none of this matters, this is fine for now then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah it's fine, I implemented it already. But it's going away.
} | ||
} | ||
let last_vote_epoch = root_bank | ||
.get_epoch_and_slot_index(last_voted_fork_slots[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an assumption here last_voted_fork_slots
is sorted from largest to smallest? If so would be nice to have an assert somewhere checking this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also small nit: last_vote_epoch -> my_last_vote_epoch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. It should be sorted, but better use max() here to be safe.
Renamed.
} | ||
} | ||
let last_vote_epoch = root_bank | ||
.get_epoch_and_slot_index(last_voted_fork_slots[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also small nit: last_vote_epoch -> my_last_vote_epoch
for slot in last_voted_fork_slots { | ||
if slot >= &root_slot { | ||
slots_stake_map.insert(*slot, sender_stake); | ||
if let Some(sender_stake) = root_bank.epoch_node_id_to_stake(root_epoch, my_pubkey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this stake be based on the epoch slot
is in rather than the stake in the root_epoch
?
Latest vote slot could be bigger than root epoch
Also nit: last_voted_fork_slots
-> my_last_voted_fork_slots
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch, changed. Also added a testcase.
renamed.
voted_percent: 0.0, | ||
voted_for_this_epoch_percent: 0.0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this voted percent/voted_for_this_epoch_percent be updated for our own validator's vote?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pub total_stake: u64, | ||
// Percentage (0 to 100) of total stake of active peers in this epoch, | ||
// no matter they voted for a slot in this epoch or not. | ||
pub voted_percent: f64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: voted_percent -> actively_voting_stake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pub voted_percent: f64, | ||
// Percentage (0 to 100) of total stake of active peers which has voted for | ||
// a slot in this epoch. | ||
pub voted_for_this_epoch_percent: f64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to track the actively voting stake, and convert to percent when we need to by dividing by the total stake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if self.update_and_check_if_message_already_saved(new_slots, new_slots_vec) { | ||
return None; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be moved before we construct the record
object above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is written like this so we can avoid a new_slots.clone(). But maybe a clone isn't that bad, changed.
.insert(*from, last_vote_epoch) | ||
!= Some(last_vote_epoch) | ||
{ | ||
self.update_epoch_info(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i see what you mean, when we prevent updates none of this matters, this is fine for now then
let epoch = self.root_bank.epoch_schedule().get_epoch(*slot); | ||
let entry = self.slots_stake_map.get_mut(slot).unwrap(); | ||
*entry = entry.saturating_sub(sender_stake); | ||
if *entry < threshold_stake { | ||
self.slots_to_repair.remove(slot); | ||
if let Some(sender_stake) = self.root_bank.epoch_node_id_to_stake(epoch, from) { | ||
*entry = entry.saturating_sub(sender_stake); | ||
let repair_threshold_stake = (self.root_bank.epoch_total_stake(epoch).unwrap() | ||
as f64 | ||
* self.repair_threshold) as u64; | ||
if *entry < repair_threshold_stake { | ||
self.slots_to_repair.remove(slot); | ||
} | ||
} | ||
} | ||
for slot in new_slots_set.difference(&old_slots_set) { | ||
let epoch = self.root_bank.epoch_schedule().get_epoch(*slot); | ||
let entry = self.slots_stake_map.entry(*slot).or_insert(0); | ||
*entry = entry.saturating_add(sender_stake); | ||
if *entry >= threshold_stake { | ||
self.slots_to_repair.insert(*slot); | ||
if let Some(sender_stake) = self.root_bank.epoch_node_id_to_stake(epoch, from) { | ||
*entry = entry.saturating_add(sender_stake); | ||
let repair_threshold_stake = (self.root_bank.epoch_total_stake(epoch).unwrap() | ||
as f64 | ||
* self.repair_threshold) as u64; | ||
if *entry >= repair_threshold_stake { | ||
self.slots_to_repair.insert(*slot); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't wait for all of this to go away when we prevent updates :)
* wen_restart: Fix the epoch_stakes used in calculation. * Fix a bad merge. * Remove EpochStakesCache, it only caches epoch stakes from root_bank, better to just keep root_bank around. * Split aggregate into smaller functions. * Switch to node_id_to_stake which is simpler. * Rename update_slots_stake_map and switch to epoch_total_stake(). * Remove unnecessary utility functions. * Do not modify epoch_info_vec, just init it with two epochs we will consider. * Switch to epoch_node_id_to_stake() * Add test for divergence at Epoch boundary. * Make linter happy. * - wait for the new Epoch if > 1/3 of the validators voted for some slot in the new Epoch - switch to voted_percent and voted_for_this_epoch_percent * Fix a bad merge. * Fix a bad merge. * Change constant format. * Do not loop through the whole table. * Address reviewer feedback. * Address reviewer comments.
We used the epoch_stakes of local root previously, this is incorrect when we are crossing Epoch boundary.
We now calculate epoch_stakes one Epoch ahead, and wen_restart can only handle up to 9 hours of outage, so we should always have all the epoch_stakes we ever need in the epoch_stakes_map of local root. Any slot further away should be discarded.