From 2e99b5b7f59f0c84b29cb6d07d1ca482772790f4 Mon Sep 17 00:00:00 2001 From: Wen Date: Mon, 10 Apr 2023 09:02:12 -0700 Subject: [PATCH 001/119] Add repair and restart proposal. --- proposals/0024-repair-and-restart.md | 107 +++++++++++++++++++++++++++ 1 file changed, 107 insertions(+) create mode 100644 proposals/0024-repair-and-restart.md diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md new file mode 100644 index 00000000..c0da5a37 --- /dev/null +++ b/proposals/0024-repair-and-restart.md @@ -0,0 +1,107 @@ +--- +simd: '0024' +title: Automatially find confirmed slots and repair them before a cluster restart +authors: + - Wen Xu (Solana Labs) +category: Standard +type: Core +status: Draft +created: 2023-04-07 +feature: (fill in with feature tracking issues once accepted) +--- + +## Summary + +Improve the current [cluster restart](https://docs.solana.com/running-validator/restart-cluster) +procedure such that validators can automatically figure out confirmed slots, +generate a snapshot, and then proceed to restart if everything looks fine. + +## Motivation +Currently during a [cluster restart](https://docs.solana.com/running-validator/restart-cluster), validator operators need to decide latest optimistically confirmed slot, create a snapshot, +then restart the validators with new commandline arguments. + +The current process involves a lot of human intenvention, if people make a +mistake in deciding the latest optimistically confirmed slot, it could mean +rollback of user transactions after they have been confirmed, which is not +acceptable. + +We aim to automate the finding of oc slot and snapshot generation, so that +we can lower the possibility of human mistakes in the cluster restart process. + +## Alternatives Considered + +See [Handling Common Solana Outages](https://docs.google.com/document/d/1RkNAyz-5aKvv5FF44b8SoKifChKB705y5SdcEoqMPIc/edit#bookmark=id.jtrjbe6g4mk3) for details. + +There are many proposals about automatically detecting that the cluster is +in an outage so validators should enter a recovery process automatically. + +While getting human out of the loop greatly improves recovery speed, +automaticlly restarting the whole cluster still seems risky. Because if +the recovery process itself doesn't work, it might be some time before +we can get human's attention. And it doesn't solve the cases where new binary +is needed. So for now we still plan to have human in the loop. + +## New Terminology + +* SilentRepairMode - when validators restart in this new mode, they will +talk with each other to find the highest oc slot and repair all necessary +blocks. To improve speed and guarantee simplicity, Turbine, vote, and new +block generation are paused in this mode. + +## Detailed Design + +A new command line arg --RepairAndRestart is +added. When the cluster is in need of a restart, we assume at least 80% will +restart with this arg. Any validators restarted with this arg does not participate +in the normal Turbine protocol, update its vote, or generate new blocks until all +of the following steps are complted. + +### Gossip last vote before the restart and ancestors on that fork +Send Gossip message LastVotedForkSlots to everyone, it contains the last voted slot on +its tower and the ancestor slots on the last voted fork and is sent in a bitmap like +the EpicSlots data structure. The number of ancestor slots sent is determined by +, by default this number is 2000. 
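To make the bitmap encoding described above concrete, here is a minimal sketch of how the ancestor slots on the last voted fork could be packed into a bit vector keyed off the last voted slot. This is an illustrative sketch only, not the actual `EpochSlots` or gossip CRDS code: the struct and method names are invented, and only the 2000-slot window comes from the text above.

```rust
// Illustrative sketch only; not the actual Solana `EpochSlots`/CRDS types.
// Bit i of `ancestors` records whether slot (last_voted_slot - i) produced a
// block on the last voted fork, covering up to MAX_SLOTS ancestors.

const MAX_SLOTS: u64 = 2000; // default window size from the text above

struct LastVotedForkSlots {
    last_voted_slot: u64,
    ancestors: Vec<u8>, // bitmap, one bit per slot at or below last_voted_slot
}

impl LastVotedForkSlots {
    fn new(last_voted_slot: u64, fork_slots: &[u64]) -> Self {
        let mut ancestors = vec![0u8; (MAX_SLOTS as usize + 7) / 8];
        for &slot in fork_slots {
            // Only slots inside the window below the last voted slot fit.
            if slot <= last_voted_slot && last_voted_slot - slot < MAX_SLOTS {
                let offset = (last_voted_slot - slot) as usize;
                ancestors[offset / 8] |= 1 << (offset % 8);
            }
        }
        Self { last_voted_slot, ancestors }
    }

    fn contains(&self, slot: u64) -> bool {
        if slot > self.last_voted_slot || self.last_voted_slot - slot >= MAX_SLOTS {
            return false;
        }
        let offset = (self.last_voted_slot - slot) as usize;
        (self.ancestors[offset / 8] & (1 << (offset % 8))) != 0
    }
}

fn main() {
    let msg = LastVotedForkSlots::new(1_000, &[1_000, 999, 998, 995]);
    assert!(msg.contains(998) && !msg.contains(997));
    println!("bitmap is {} bytes", msg.ancestors.len());
}
```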
+ +### Aggregate, repair, and replay the slots in LastVotedForkSlots +Aggregate the slots in received LastVotedForkSlots messages, whenever some slot has +more than 47% stake and it's missing locally, start the repair process for this slot. +47% is chosen because we don't want to miss any optimistically confirmed slot, those +slots have at least 67% votes, and we assume that there can be 20% validators not +participating in restarts, so we need to repair all slots with at least +67% - 20% = 47% votes. + +### Gossip current heaviest fork +After receiving LastVotedForkSlots from 80% of the validators and reparing all slots +with more than 47% votes, count the heaviest fork and Gossip Heaviest(X, Hash(X)) out, +where X is the tip of the heaviest fork. + +### Generate local snapshot on the heaviest slot +Generate a local snapshot on the heaviest slot. + +### Proceed to restart if everything looks okay, halt otherwise +If things go well, all 80% of the validators should find the same heaviest fork. But +we are only sending slots instead of bank hashes in LastVotedForkSlots, so it's possible +that a duplicate block can make the cluster unable to reach consensus. If at least +2/3 of the people agree on one slot, they should proceed to restart from this slot. +Otherwise validators should halt and send alerts for human attention. + +## Impact +This proposal adds a new RepairAndRestart mode to validators, during this phase +the validators will not participate in normal cluster activities, which is the +same as now. Compared to today's cluster restart, the new mode may mean more +network bandwidth and memory on the restarting validators, but it guarantees the +safety of optimistically confirmed user transactions, and validator admins don't +need to manually generate and download snapshots again. + +## Security Considerations +The two added Gossip messages LastVotedForkSlots and Heavist will only be sent and +processed when the validator is restarted in RepairAndRestart mode. So random validator +restarting in the new mode will not bring extra burden to the system. + +## Backwards Compatibility +This change is backward compatible with previous versions, because validators only +enter the new mode during new restart mode which is controlled by a command line +argument. All current restart arguments like --wait-for-supermajority and +--expected-bank-hash will be kept as is for now. +However, this change does not work until at least 80% installed the new binary and +they are willing to use the new methods for restart. \ No newline at end of file From 03277f195df76ce0d97a080dd35223780d72bfa2 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Wed, 12 Apr 2023 11:20:27 -0700 Subject: [PATCH 002/119] Update proposals/0024-repair-and-restart.md Co-authored-by: mvines --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index c0da5a37..15ab60de 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -21,7 +21,7 @@ Currently during a [cluster restart](https://docs.solana.com/running-validator/r then restart the validators with new commandline arguments. 
The current process involves a lot of human intenvention, if people make a -mistake in deciding the latest optimistically confirmed slot, it could mean +mistake in deciding the highest optimistically confirmed slot, it could mean rollback of user transactions after they have been confirmed, which is not acceptable. From 7b71cad6da848411f3cbdd96d836b7b0f0c5d14d Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Wed, 12 Apr 2023 11:20:37 -0700 Subject: [PATCH 003/119] Update proposals/0024-repair-and-restart.md Co-authored-by: mvines --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 15ab60de..a88058f2 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -25,7 +25,7 @@ mistake in deciding the highest optimistically confirmed slot, it could mean rollback of user transactions after they have been confirmed, which is not acceptable. -We aim to automate the finding of oc slot and snapshot generation, so that +We aim to automate the finding of the highest optimistically confirmed slot and snapshot generation, so that we can lower the possibility of human mistakes in the cluster restart process. ## Alternatives Considered From 996d1ab90f2ded87978af1ebf90640b9572c9db7 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Wed, 12 Apr 2023 11:20:47 -0700 Subject: [PATCH 004/119] Update proposals/0024-repair-and-restart.md Co-authored-by: mvines --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index a88058f2..87533f72 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -44,7 +44,7 @@ is needed. So for now we still plan to have human in the loop. ## New Terminology * SilentRepairMode - when validators restart in this new mode, they will -talk with each other to find the highest oc slot and repair all necessary +talk with each other to find the highest optimistically confirmed slot and repair all necessary blocks. To improve speed and guarantee simplicity, Turbine, vote, and new block generation are paused in this mode. From de726266f5a2a719d73aee9fe396c292b7305300 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 12 Apr 2023 12:17:00 -0700 Subject: [PATCH 005/119] Add protocol overview and lint changes. 
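The protocol overview added in the next patch reduces the restart to a fixed sequence of phases: freeze, exchange last votes, repair potentially confirmed blocks, pick a heaviest fork, and either restart or halt for human attention. A compressed sketch of that flow follows, assuming invented enum and function names and collapsing the agreement check (at least 2/3 agreeing on one block) into a single percentage; the real implementation would be driven by gossip messages rather than a plain loop.

```rust
// Sketch of the restart flow summarized above; names are invented and the
// real implementation would be driven by gossip traffic, not booleans.

#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum RestartPhase {
    GossipLastVotedForkSlots, // everyone is frozen and exchanging last votes
    RepairMissingBlocks,      // fetch blocks that may have been confirmed
    GossipHeaviestFork,       // advertise the locally chosen heaviest block
    Restart,                  // enough agreement: proceed with the restart
    HaltForHuman,             // no agreement: stop and alert operators
}

fn next_phase(phase: RestartPhase, agreed_stake_pct: f64) -> RestartPhase {
    use RestartPhase::*;
    match phase {
        GossipLastVotedForkSlots => RepairMissingBlocks,
        RepairMissingBlocks => GossipHeaviestFork,
        // Step 4 above: proceed only if enough stake (at least 2/3 here)
        // agrees on one block, otherwise freeze and wait for a human.
        GossipHeaviestFork if agreed_stake_pct >= 200.0 / 3.0 => Restart,
        GossipHeaviestFork => HaltForHuman,
        done => done,
    }
}

fn main() {
    let mut phase = RestartPhase::GossipLastVotedForkSlots;
    for agreement_pct in [0.0, 0.0, 80.0] {
        phase = next_phase(phase, agreement_pct);
    }
    assert_eq!(phase, RestartPhase::Restart);
}
```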
--- proposals/0024-repair-and-restart.md | 66 +++++++++++++++++++++------- 1 file changed, 49 insertions(+), 17 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index c0da5a37..5b3f29e2 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -1,6 +1,6 @@ --- simd: '0024' -title: Automatially find confirmed slots and repair them before a cluster restart +title: Automatially repair and start for a cluster restart authors: - Wen Xu (Solana Labs) category: Standard @@ -12,25 +12,31 @@ feature: (fill in with feature tracking issues once accepted) ## Summary -Improve the current [cluster restart](https://docs.solana.com/running-validator/restart-cluster) +Improve the current [cluster restart] +(https://docs.solana.com/running-validator/restart-cluster) procedure such that validators can automatically figure out confirmed slots, generate a snapshot, and then proceed to restart if everything looks fine. ## Motivation -Currently during a [cluster restart](https://docs.solana.com/running-validator/restart-cluster), validator operators need to decide latest optimistically confirmed slot, create a snapshot, -then restart the validators with new commandline arguments. + +Currently during a cluster restart, validator operators need to decide latest +optimistically confirmed slot, create a snapshot, then restart the validators +with new commandline arguments. The current process involves a lot of human intenvention, if people make a -mistake in deciding the latest optimistically confirmed slot, it could mean +mistake in deciding the highest optimistically confirmed slot, it could mean rollback of user transactions after they have been confirmed, which is not acceptable. -We aim to automate the finding of oc slot and snapshot generation, so that -we can lower the possibility of human mistakes in the cluster restart process. +We aim to automate the finding of highest optimistically confirmed slot and +snapshot generation, so that we can lower the possibility of human mistakes +in the cluster restart process. ## Alternatives Considered -See [Handling Common Solana Outages](https://docs.google.com/document/d/1RkNAyz-5aKvv5FF44b8SoKifChKB705y5SdcEoqMPIc/edit#bookmark=id.jtrjbe6g4mk3) for details. +See [Handling Common Solana Outages] +(https://docs.google.com/document/d/1RkNAyz-5aKvv5FF44b8SoKifChKB705y5SdcEoqMPIc) +for details. There are many proposals about automatically detecting that the cluster is in an outage so validators should enter a recovery process automatically. @@ -44,25 +50,37 @@ is needed. So for now we still plan to have human in the loop. ## New Terminology * SilentRepairMode - when validators restart in this new mode, they will -talk with each other to find the highest oc slot and repair all necessary -blocks. To improve speed and guarantee simplicity, Turbine, vote, and new -block generation are paused in this mode. +talk with each other to find the highest optimistically confirmed slot and +repair all necessary blocks. To improve speed and guarantee simplicity, +Turbine, vote, and new block generation are paused in this mode. ## Detailed Design -A new command line arg --RepairAndRestart is -added. When the cluster is in need of a restart, we assume at least 80% will -restart with this arg. Any validators restarted with this arg does not participate -in the normal Turbine protocol, update its vote, or generate new blocks until all -of the following steps are complted. 
+The new protocol tries to do the following: +1. Everyone freeze, no new blocks, no new votes, and no Turbine +2. Make all blocks which can potentially have been optimistically confirmed +before the freeze propagate to everyone +3. Make restart partipants' last votes before the freze propagate to everyone +4. Now see if enough people can optimistically agree on one block (same +slot and hash) to restart from +4.1 If yes, proceed and restart +4.2 If no, freeze and print out what you think is wrong, wait for human + +A new command line arg --RepairAndRestart is added. +When the cluster is in need of a restart, we assume at least 80% will restart +with this arg. Any validators restarted with this arg does not participate in +the normal Turbine protocol, update its vote, or generate new blocks until all +of the following steps are completed. ### Gossip last vote before the restart and ancestors on that fork + Send Gossip message LastVotedForkSlots to everyone, it contains the last voted slot on its tower and the ancestor slots on the last voted fork and is sent in a bitmap like the EpicSlots data structure. The number of ancestor slots sent is determined by , by default this number is 2000. ### Aggregate, repair, and replay the slots in LastVotedForkSlots + Aggregate the slots in received LastVotedForkSlots messages, whenever some slot has more than 47% stake and it's missing locally, start the repair process for this slot. 47% is chosen because we don't want to miss any optimistically confirmed slot, those @@ -71,14 +89,21 @@ participating in restarts, so we need to repair all slots with at least 67% - 20% = 47% votes. ### Gossip current heaviest fork + After receiving LastVotedForkSlots from 80% of the validators and reparing all slots with more than 47% votes, count the heaviest fork and Gossip Heaviest(X, Hash(X)) out, where X is the tip of the heaviest fork. ### Generate local snapshot on the heaviest slot -Generate a local snapshot on the heaviest slot. + +Generate a local snapshot on the heaviest slot. In case validators can't agree on +the correct restart slot in the next step, we still need to restart using the old +method, then the snapshot generated here may help speed up that process, because +human operators can easily decide which is actually the correct restart slot and +use the snapshots generated in this step. ### Proceed to restart if everything looks okay, halt otherwise + If things go well, all 80% of the validators should find the same heaviest fork. But we are only sending slots instead of bank hashes in LastVotedForkSlots, so it's possible that a duplicate block can make the cluster unable to reach consensus. If at least @@ -86,6 +111,7 @@ that a duplicate block can make the cluster unable to reach consensus. If at lea Otherwise validators should halt and send alerts for human attention. ## Impact + This proposal adds a new RepairAndRestart mode to validators, during this phase the validators will not participate in normal cluster activities, which is the same as now. Compared to today's cluster restart, the new mode may mean more @@ -94,11 +120,17 @@ safety of optimistically confirmed user transactions, and validator admins don't need to manually generate and download snapshots again. ## Security Considerations + The two added Gossip messages LastVotedForkSlots and Heavist will only be sent and processed when the validator is restarted in RepairAndRestart mode. So random validator restarting in the new mode will not bring extra burden to the system. 
+Non-conforming validators could send out wrong LastVotedForkSlots and Heaviest +messages to mess with cluster restarts, these should be included in the Slashing +rules in the future. + ## Backwards Compatibility + This change is backward compatible with previous versions, because validators only enter the new mode during new restart mode which is controlled by a command line argument. All current restart arguments like --wait-for-supermajority and From 4feeb64141c9f8788ac910980ff9adfde083ffec Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 12 Apr 2023 16:48:21 -0700 Subject: [PATCH 006/119] Change threshold value from 47% to 34%. --- proposals/0024-repair-and-restart.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index d134470d..f1ea728a 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -57,7 +57,7 @@ Turbine, vote, and new block generation are paused in this mode. ## Detailed Design The new protocol tries to do the following: -1. Everyone freeze, no new blocks, no new votes, and no Turbine +1. Everyone freezes, no new blocks, no new votes, and no Turbine 2. Make all blocks which can potentially have been optimistically confirmed before the freeze propagate to everyone 3. Make restart partipants' last votes before the freze propagate to everyone @@ -82,16 +82,16 @@ the EpicSlots data structure. The number of ancestor slots sent is determined by ### Aggregate, repair, and replay the slots in LastVotedForkSlots Aggregate the slots in received LastVotedForkSlots messages, whenever some slot has -more than 47% stake and it's missing locally, start the repair process for this slot. -47% is chosen because we don't want to miss any optimistically confirmed slot, those -slots have at least 67% votes, and we assume that there can be 20% validators not -participating in restarts, so we need to repair all slots with at least -67% - 20% = 47% votes. +more than 34% stake and it's missing locally, start the repair process for this slot. +34% is chosen because we don't want to miss any optimistically confirmed slot, those +slots have at least 67% votes, there can be 33% validators not giving out dependable +answers (for example, claim they didn't vote for a slot when they actually did), so +we need to repair all slots with at least 67% - 33% = 34% votes. ### Gossip current heaviest fork After receiving LastVotedForkSlots from 80% of the validators and reparing all slots -with more than 47% votes, count the heaviest fork and Gossip Heaviest(X, Hash(X)) out, +with more than 34% votes, count the heaviest fork and Gossip Heaviest(X, Hash(X)) out, where X is the tip of the heaviest fork. ### Generate local snapshot on the heaviest slot From 0aff4cd78adcb3bdae411ec16073044b4cf78f74 Mon Sep 17 00:00:00 2001 From: Wen Date: Sat, 15 Apr 2023 13:06:59 -0700 Subject: [PATCH 007/119] Add introduction, and update default slots to send. --- proposals/0024-repair-and-restart.md | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index f1ea728a..f8cb9c2c 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -56,11 +56,16 @@ Turbine, vote, and new block generation are paused in this mode. 
## Detailed Design -The new protocol tries to do the following: +The new protocol tries to make all 80% restarting validators get the same +data blocks and the same set of last votes among them, then they can probably +make the same decision and then proceed. + +The steps roughly look like this: 1. Everyone freezes, no new blocks, no new votes, and no Turbine 2. Make all blocks which can potentially have been optimistically confirmed before the freeze propagate to everyone -3. Make restart partipants' last votes before the freze propagate to everyone +3. Make restart participants' last votes before the freeze propagate to +everyone 4. Now see if enough people can optimistically agree on one block (same slot and hash) to restart from 4.1 If yes, proceed and restart @@ -77,7 +82,10 @@ of the following steps are completed. Send Gossip message LastVotedForkSlots to everyone, it contains the last voted slot on its tower and the ancestor slots on the last voted fork and is sent in a bitmap like the EpicSlots data structure. The number of ancestor slots sent is determined by -, by default this number is 2000. +. By default this number is 108000, +because that's 400ms * 10800 = 12 hours, we assume most restart decisions to be made +in half a day. You can increase this number if you restart after the outage lasted +more than 12 hours. ### Aggregate, repair, and replay the slots in LastVotedForkSlots From e29d83ca7ca2c92c2c75347645e1d9d080843154 Mon Sep 17 00:00:00 2001 From: Wen Date: Mon, 17 Apr 2023 13:29:42 -0700 Subject: [PATCH 008/119] Remove snapshot generation from the new restart protocol and lint changes. --- proposals/0024-repair-and-restart.md | 65 ++++++++++++++-------------- 1 file changed, 32 insertions(+), 33 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index f8cb9c2c..541d4e48 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -14,14 +14,14 @@ feature: (fill in with feature tracking issues once accepted) Improve the current [cluster restart] (https://docs.solana.com/running-validator/restart-cluster) -procedure such that validators can automatically figure out confirmed slots, -generate a snapshot, and then proceed to restart if everything looks fine. +procedure such that validators can automatically figure out confirmed slots +and then proceed to restart if everything looks fine. ## Motivation Currently during a cluster restart, validator operators need to decide latest -optimistically confirmed slot, create a snapshot, then restart the validators -with new commandline arguments. +optimistically confirmed slot, then restart the validators with new commandline +arguments. The current process involves a lot of human intenvention, if people make a mistake in deciding the highest optimistically confirmed slot, it could mean @@ -29,7 +29,7 @@ rollback of user transactions after they have been confirmed, which is not acceptable. We aim to automate the finding of highest optimistically confirmed slot and -snapshot generation, so that we can lower the possibility of human mistakes +block data distribution, so that we can lower the possibility of human mistakes in the cluster restart process. ## Alternatives Considered @@ -61,14 +61,20 @@ data blocks and the same set of last votes among them, then they can probably make the same decision and then proceed. The steps roughly look like this: + 1. Everyone freezes, no new blocks, no new votes, and no Turbine + 2. 
Make all blocks which can potentially have been optimistically confirmed before the freeze propagate to everyone + 3. Make restart participants' last votes before the freeze propagate to everyone + 4. Now see if enough people can optimistically agree on one block (same slot and hash) to restart from + 4.1 If yes, proceed and restart + 4.2 If no, freeze and print out what you think is wrong, wait for human A new command line arg --RepairAndRestart is added. @@ -79,44 +85,37 @@ of the following steps are completed. ### Gossip last vote before the restart and ancestors on that fork -Send Gossip message LastVotedForkSlots to everyone, it contains the last voted slot on -its tower and the ancestor slots on the last voted fork and is sent in a bitmap like -the EpicSlots data structure. The number of ancestor slots sent is determined by -. By default this number is 108000, -because that's 400ms * 10800 = 12 hours, we assume most restart decisions to be made -in half a day. You can increase this number if you restart after the outage lasted -more than 12 hours. +Send Gossip message LastVotedForkSlots to everyone, it contains the last voted +slot on its tower and the ancestor slots on the last voted fork and is sent in +a bitmap like the EpicSlots data structure. The number of ancestor slots sent +is determined by . By default this +number is 108000, because that's 400ms * 10800 = 12 hours, we assume most +restart decisions to be made in half a day. You can increase this number if you +restart after the outage lasted more than 12 hours. ### Aggregate, repair, and replay the slots in LastVotedForkSlots -Aggregate the slots in received LastVotedForkSlots messages, whenever some slot has -more than 34% stake and it's missing locally, start the repair process for this slot. -34% is chosen because we don't want to miss any optimistically confirmed slot, those -slots have at least 67% votes, there can be 33% validators not giving out dependable -answers (for example, claim they didn't vote for a slot when they actually did), so -we need to repair all slots with at least 67% - 33% = 34% votes. +Aggregate the slots in received LastVotedForkSlots messages, whenever some slot +has more than 34% stake and it's missing locally, start the repair process for +this slot. 34% is chosen because we don't want to miss any optimistically +confirmed slot, those slots have at least 67% votes, there can be 33% validators +not giving out dependable answers (for example, claim they didn't vote for a slot +when they actually did), so we need to repair all slots with at least +67% - 33% = 34% votes. ### Gossip current heaviest fork -After receiving LastVotedForkSlots from 80% of the validators and reparing all slots -with more than 34% votes, count the heaviest fork and Gossip Heaviest(X, Hash(X)) out, -where X is the tip of the heaviest fork. - -### Generate local snapshot on the heaviest slot - -Generate a local snapshot on the heaviest slot. In case validators can't agree on -the correct restart slot in the next step, we still need to restart using the old -method, then the snapshot generated here may help speed up that process, because -human operators can easily decide which is actually the correct restart slot and -use the snapshots generated in this step. +After receiving LastVotedForkSlots from 80% of the validators and reparing all +slots with more than 34% votes, count the heaviest fork and Gossip +Heaviest(X, Hash(X)) out, where X is the tip of the heaviest fork. 
### Proceed to restart if everything looks okay, halt otherwise If things go well, all 80% of the validators should find the same heaviest fork. But -we are only sending slots instead of bank hashes in LastVotedForkSlots, so it's possible -that a duplicate block can make the cluster unable to reach consensus. If at least -2/3 of the people agree on one slot, they should proceed to restart from this slot. -Otherwise validators should halt and send alerts for human attention. +we are only sending slots instead of bank hashes in LastVotedForkSlots, so it's +possible that a duplicate block can make the cluster unable to reach consensus. If +at least 2/3 of the people agree on one slot, they should proceed to restart from +this slot. Otherwise validators should halt and send alerts for human attention. ## Impact From b0b2d47cad1998dfc00be277ad62efa737a9f982 Mon Sep 17 00:00:00 2001 From: Wen Date: Mon, 17 Apr 2023 19:37:53 -0700 Subject: [PATCH 009/119] Change must have block threshold. --- proposals/0024-repair-and-restart.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 541d4e48..35127067 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -96,12 +96,15 @@ restart after the outage lasted more than 12 hours. ### Aggregate, repair, and replay the slots in LastVotedForkSlots Aggregate the slots in received LastVotedForkSlots messages, whenever some slot -has more than 34% stake and it's missing locally, start the repair process for -this slot. 34% is chosen because we don't want to miss any optimistically -confirmed slot, those slots have at least 67% votes, there can be 33% validators -not giving out dependable answers (for example, claim they didn't vote for a slot -when they actually did), so we need to repair all slots with at least -67% - 33% = 34% votes. +has enough stake to be optimistically confirmed and it's missing locally, start +the repair process for this slot. + +We calculate "enough" stake as follows. When there are 80% validators joining +the restart, assuming 5% restarted validators can make mistakes in voting, any +block with more than 67% - 5% - (100-80)% = 42% could potentially be +optimistically confirmed before the restart. If there are 85% validators in the +restart, then any block with more than 67% - 5% - (100-85)% = 47% could be +optimistically confirmed before the restart. ### Gossip current heaviest fork From 838429d5cb82f3a03f22de81ebe3d25421146917 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 19 Apr 2023 11:53:53 -0700 Subject: [PATCH 010/119] Update the proposal to reflect changes in discussion. --- proposals/0024-repair-and-restart.md | 31 ++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 35127067..d6f65d79 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -23,10 +23,10 @@ Currently during a cluster restart, validator operators need to decide latest optimistically confirmed slot, then restart the validators with new commandline arguments. 
-The current process involves a lot of human intenvention, if people make a +The current process involves a lot of human intervention, if people make a mistake in deciding the highest optimistically confirmed slot, it could mean -rollback of user transactions after they have been confirmed, which is not -acceptable. +rollback of user transactions after those transactions have been confirmed, +which is not acceptable. We aim to automate the finding of highest optimistically confirmed slot and block data distribution, so that we can lower the possibility of human mistakes @@ -108,9 +108,24 @@ optimistically confirmed before the restart. ### Gossip current heaviest fork -After receiving LastVotedForkSlots from 80% of the validators and reparing all -slots with more than 34% votes, count the heaviest fork and Gossip -Heaviest(X, Hash(X)) out, where X is the tip of the heaviest fork. +After receiving LastVotedForkSlots from 80% of the validators and reparing slots +with "enough" stake, replay all blocks and pick the heaviest fork as follows: + +1. Pick block and update root for all blocks with more than 67% votes + +2. If a picked block has more than one children, compare the votes on two +heaviest children: + +2.1 If vote_on_child_B + stake_on_validators_not_in_restart < vote_on_child_A, +pick child A. For example, if 80% validators are in restart, child B has 33% +stakes, child A has 54% stakes, then 33 + (100-80) = 53 < 54, pick child A. + +2.2 Otherwise stop traversing the tree and use last picked block. + +After deciding heaviest block, Gossip +Heaviest(X, Hash(X), received_heaviest_stake) out, where X is the latest picked +block. We also send out stake of received Heaviest messages so that we can proceed +to next step when enough validators are ready. ### Proceed to restart if everything looks okay, halt otherwise @@ -120,6 +135,10 @@ possible that a duplicate block can make the cluster unable to reach consensus. at least 2/3 of the people agree on one slot, they should proceed to restart from this slot. Otherwise validators should halt and send alerts for human attention. +Also, there might be 5% of the validators not sending Heaviest, so we only require +that 75% of the people received 75% of the Heaviest messages and they all agree on +one block and hash. + ## Impact This proposal adds a new RepairAndRestart mode to validators, during this phase From 104946308fad3faf7ab01729c2cfe0820cc2e6ff Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 19 Apr 2023 14:45:11 -0700 Subject: [PATCH 011/119] Add the wait before restart. --- proposals/0024-repair-and-restart.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index d6f65d79..94032a06 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -139,6 +139,10 @@ Also, there might be 5% of the validators not sending Heaviest, so we only requi that 75% of the people received 75% of the Heaviest messages and they all agree on one block and hash. +So after a validator sees that 75% of the validators received 75% of the votes, +wait for 10 more minutes so that the message it sent out have propagated, then +restart from the Heaviest slot everyone agreed on. + ## Impact This proposal adds a new RepairAndRestart mode to validators, during this phase From 6e4a5cd7a7326029d21bdcb1fdf60c2831f13d6f Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 20 Apr 2023 15:09:44 -0700 Subject: [PATCH 012/119] Change Heaviest selection algorithm. 
--- proposals/0024-repair-and-restart.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 94032a06..4542c6a8 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -113,12 +113,13 @@ with "enough" stake, replay all blocks and pick the heaviest fork as follows: 1. Pick block and update root for all blocks with more than 67% votes -2. If a picked block has more than one children, compare the votes on two -heaviest children: +2. If a picked block has more than one children, compare the votes on the +heaviest child: -2.1 If vote_on_child_B + stake_on_validators_not_in_restart < vote_on_child_A, -pick child A. For example, if 80% validators are in restart, child B has 33% -stakes, child A has 54% stakes, then 33 + (100-80) = 53 < 54, pick child A. +2.1 If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child. +For example, if 80% validators are in restart, child has 42% votes, then +42 + (100-80) = 62%, pick child. 62% is chosen instead of 67% because 5% +could make the wrong votes. 2.2 Otherwise stop traversing the tree and use last picked block. From 8cb6ef615068f818b2f93bad6a5016baff830c0d Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 25 Apr 2023 17:14:29 -0700 Subject: [PATCH 013/119] Make linter happy. --- proposals/0024-repair-and-restart.md | 54 ++++++++++++++-------------- 1 file changed, 28 insertions(+), 26 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 4542c6a8..15052b69 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -70,8 +70,8 @@ before the freeze propagate to everyone 3. Make restart participants' last votes before the freeze propagate to everyone -4. Now see if enough people can optimistically agree on one block (same -slot and hash) to restart from +4. Now see if enough people can optimistically agree on one block (same slot +and hash) to restart from 4.1 If yes, proceed and restart @@ -113,8 +113,8 @@ with "enough" stake, replay all blocks and pick the heaviest fork as follows: 1. Pick block and update root for all blocks with more than 67% votes -2. If a picked block has more than one children, compare the votes on the -heaviest child: +2. If a picked block has more than one children, check if the votes on the +heaviest child is over threshold: 2.1 If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child. For example, if 80% validators are in restart, child has 42% votes, then @@ -130,15 +130,16 @@ to next step when enough validators are ready. ### Proceed to restart if everything looks okay, halt otherwise -If things go well, all 80% of the validators should find the same heaviest fork. But -we are only sending slots instead of bank hashes in LastVotedForkSlots, so it's -possible that a duplicate block can make the cluster unable to reach consensus. If -at least 2/3 of the people agree on one slot, they should proceed to restart from -this slot. Otherwise validators should halt and send alerts for human attention. +If things go well, all 80% of the validators should find the same heaviest +fork. But we are only sending slots instead of bank hashes in +LastVotedForkSlots, so it's possible that a duplicate block can make the +cluster unable to reach consensus. If at least 2/3 of the people agree on one +slot, they should proceed to restart from this slot. 
Otherwise validators +should halt and send alerts for human attention. -Also, there might be 5% of the validators not sending Heaviest, so we only require -that 75% of the people received 75% of the Heaviest messages and they all agree on -one block and hash. +Also, there might be 5% of the validators not sending Heaviest, so we only +require that 75% of the people received 75% of the Heaviest messages and they +all agree on one block and hash. So after a validator sees that 75% of the validators received 75% of the votes, wait for 10 more minutes so that the message it sent out have propagated, then @@ -149,25 +150,26 @@ restart from the Heaviest slot everyone agreed on. This proposal adds a new RepairAndRestart mode to validators, during this phase the validators will not participate in normal cluster activities, which is the same as now. Compared to today's cluster restart, the new mode may mean more -network bandwidth and memory on the restarting validators, but it guarantees the -safety of optimistically confirmed user transactions, and validator admins don't -need to manually generate and download snapshots again. +network bandwidth and memory on the restarting validators, but it guarantees +the safety of optimistically confirmed user transactions, and validator admins +don't need to manually generate and download snapshots again. ## Security Considerations -The two added Gossip messages LastVotedForkSlots and Heavist will only be sent and -processed when the validator is restarted in RepairAndRestart mode. So random validator -restarting in the new mode will not bring extra burden to the system. +The two added Gossip messages LastVotedForkSlots and Heavist will only be sent +and processed when the validator is restarted in RepairAndRestart mode. So +random validator restarting in the new mode will not bring extra burden to the +system. Non-conforming validators could send out wrong LastVotedForkSlots and Heaviest -messages to mess with cluster restarts, these should be included in the Slashing -rules in the future. +messages to mess with cluster restarts, these should be included in the +Slashing rules in the future. ## Backwards Compatibility -This change is backward compatible with previous versions, because validators only -enter the new mode during new restart mode which is controlled by a command line -argument. All current restart arguments like --wait-for-supermajority and ---expected-bank-hash will be kept as is for now. -However, this change does not work until at least 80% installed the new binary and -they are willing to use the new methods for restart. +This change is backward compatible with previous versions, because validators +only enter the new mode during new restart mode which is controlled by a +command line argument. All current restart arguments like +--wait-for-supermajority and --expected-bank-hash will be kept as is for now. +However, this change does not work until at least 80% installed the new binary +and they are willing to use the new methods for restart. From df5932d0325312cbe7eeb4337cc36ccfdb233440 Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 25 Apr 2023 22:13:43 -0700 Subject: [PATCH 014/119] Shorten title to make linter happy. 
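Before actually restarting, the criteria above ask each validator to confirm that the Heaviest reports it received cover enough stake, that they all name the same block and hash, and that the reporters themselves saw enough of the Heaviest messages. A sketch of that exit check follows, with invented types and with the 75% figure taken from the draft at this point in its history (later revisions raise it to 80%).

```rust
// Sketch of the pre-restart check described above; types are invented and
// the 75% threshold follows the draft at this point (later raised to 80%).

struct HeaviestReport {
    sender_stake_pct: f64,   // stake of the validator sending this report
    slot: u64,               // block it wants to restart from
    hash: [u8; 32],          // bank hash of that block
    received_stake_pct: f64, // stake of Heaviest senders it has seen itself
}

fn ready_to_restart(reports: &[HeaviestReport], threshold_pct: f64) -> bool {
    let Some(first) = reports.first() else { return false };
    // Every report we heard must agree on one block and one bank hash.
    if !reports.iter().all(|r| r.slot == first.slot && r.hash == first.hash) {
        return false;
    }
    // Enough stake must itself report having seen enough Heaviest messages.
    let covered: f64 = reports
        .iter()
        .filter(|r| r.received_stake_pct >= threshold_pct)
        .map(|r| r.sender_stake_pct)
        .sum();
    covered >= threshold_pct
}

fn main() {
    let reports = [
        HeaviestReport { sender_stake_pct: 40.0, slot: 7, hash: [1; 32], received_stake_pct: 80.0 },
        HeaviestReport { sender_stake_pct: 38.0, slot: 7, hash: [1; 32], received_stake_pct: 76.0 },
    ];
    assert!(ready_to_restart(&reports, 75.0));
}
```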
--- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 15052b69..9c3d2b9c 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -1,6 +1,6 @@ --- simd: '0024' -title: Automatially repair and start for a cluster restart +title: Repair and start in a cluster restart authors: - Wen Xu (Solana Labs) category: Standard From 5050a7c319934a56f35c0de3552b9ac8d2ec9b38 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 26 Apr 2023 13:14:07 -0700 Subject: [PATCH 015/119] Add details of messages and change command line. --- proposals/0024-repair-and-restart.md | 64 ++++++++++++++++------------ 1 file changed, 37 insertions(+), 27 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 9c3d2b9c..653acda2 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -12,8 +12,7 @@ feature: (fill in with feature tracking issues once accepted) ## Summary -Improve the current [cluster restart] -(https://docs.solana.com/running-validator/restart-cluster) +Improve the current [cluster restart](https://docs.solana.com/running-validator/restart-cluster) procedure such that validators can automatically figure out confirmed slots and then proceed to restart if everything looks fine. @@ -34,8 +33,7 @@ in the cluster restart process. ## Alternatives Considered -See [Handling Common Solana Outages] -(https://docs.google.com/document/d/1RkNAyz-5aKvv5FF44b8SoKifChKB705y5SdcEoqMPIc) +See [Handling Common Solana Outages](https://docs.google.com/document/d/1RkNAyz-5aKvv5FF44b8SoKifChKB705y5SdcEoqMPIc) for details. There are many proposals about automatically detecting that the cluster is @@ -47,13 +45,6 @@ the recovery process itself doesn't work, it might be some time before we can get human's attention. And it doesn't solve the cases where new binary is needed. So for now we still plan to have human in the loop. -## New Terminology - -* SilentRepairMode - when validators restart in this new mode, they will -talk with each other to find the highest optimistically confirmed slot and -repair all necessary blocks. To improve speed and guarantee simplicity, -Turbine, vote, and new block generation are paused in this mode. - ## Detailed Design The new protocol tries to make all 80% restarting validators get the same @@ -77,21 +68,29 @@ and hash) to restart from 4.2 If no, freeze and print out what you think is wrong, wait for human -A new command line arg --RepairAndRestart is added. -When the cluster is in need of a restart, we assume at least 80% will restart -with this arg. Any validators restarted with this arg does not participate in -the normal Turbine protocol, update its vote, or generate new blocks until all -of the following steps are completed. +A new command line arg --RepairAndRestart is added. When the cluster is in need +of a restart, we assume at least 80% will restart with this arg. Any validators +restarted with this arg does not participate in the normal Turbine protocol, +update its vote, or generate new blocks until all of the following steps are +completed. 
### Gossip last vote before the restart and ancestors on that fork -Send Gossip message LastVotedForkSlots to everyone, it contains the last voted +Send direct message LastVotedForkSlots to everyone, it contains the last voted slot on its tower and the ancestor slots on the last voted fork and is sent in -a bitmap like the EpicSlots data structure. The number of ancestor slots sent -is determined by . By default this -number is 108000, because that's 400ms * 10800 = 12 hours, we assume most -restart decisions to be made in half a day. You can increase this number if you -restart after the outage lasted more than 12 hours. +a compressed bitmap like the EpicSlots data structure. The number of ancestor +slots sent is hard coded at 81000, because that's 400ms * 8100 = 9 hours, we +assume most restart decisions to be made in 9 hours. If a validator restarts +after 9 hours past the outage, it cannot join the restart this way. If enough +validators failed to restart within 9 hours, then use the old restart method. + +The fields of LastVotedForkSlots are: + +- `last_voted_slot`: the slot last voted, this also serves as last_slot for the +bit vector. +- `last_voted_hash`: the bank hash of the slot last voted slot. +- `slots`: compressed bit vector representing the slots on the last voted fork, +last slot is always last_voted_slot, first slot is last_voted_slot-8100. ### Aggregate, repair, and replay the slots in LastVotedForkSlots @@ -108,8 +107,9 @@ optimistically confirmed before the restart. ### Gossip current heaviest fork -After receiving LastVotedForkSlots from 80% of the validators and reparing slots -with "enough" stake, replay all blocks and pick the heaviest fork as follows: +After receiving LastVotedForkSlots from 80% of the validators and reparing +slots with "enough" stake, replay all blocks and pick the heaviest fork as +follows: 1. Pick block and update root for all blocks with more than 67% votes @@ -125,8 +125,15 @@ could make the wrong votes. After deciding heaviest block, Gossip Heaviest(X, Hash(X), received_heaviest_stake) out, where X is the latest picked -block. We also send out stake of received Heaviest messages so that we can proceed -to next step when enough validators are ready. +block. We also send out stake of received Heaviest messages so that we can +proceed to next step when enough validators are ready. + +The fields of the Heaviest message is: + +- `slot`: slot of the picked block. +- `hash`: bank hash of the picked block. +- `received`: total of stakes of the validators it received Heaviest messages +from. ### Proceed to restart if everything looks okay, halt otherwise @@ -143,7 +150,10 @@ all agree on one block and hash. So after a validator sees that 75% of the validators received 75% of the votes, wait for 10 more minutes so that the message it sent out have propagated, then -restart from the Heaviest slot everyone agreed on. +do the following: + +- Issue a hard fork at the highest oc slot and change shred version in Gossip. +- Execute the current --wait-for-supermajority logic and wait for 75%. ## Impact From 90134b57951cb04955f11b216d534de9693f2eea Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 27 Apr 2023 09:47:36 -0700 Subject: [PATCH 016/119] Fix typos on numbers. 
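The repair rule in this proposal treats a block as possibly optimistically confirmed when its observed vote stake exceeds 67% minus a 5% allowance for wrong votes minus the stake that is not participating in the restart. A small sketch of that arithmetic, with invented constant and function names, reproduces the 42% and 47% worked examples from the text.

```rust
// Sketch of the "enough stake to possibly be optimistically confirmed" rule;
// the constants mirror the 67% confirmation bar and the 5% allowance for
// wrong votes used in the text, everything else is illustrative.

const OPTIMISTIC_CONFIRMATION_PCT: f64 = 67.0;
const WRONG_VOTE_ALLOWANCE_PCT: f64 = 5.0;

/// Blocks whose observed vote stake exceeds this value must be repaired,
/// because they may have been optimistically confirmed before the outage.
fn must_repair_threshold_pct(stake_in_restart_pct: f64) -> f64 {
    let not_in_restart_pct = 100.0 - stake_in_restart_pct;
    OPTIMISTIC_CONFIRMATION_PCT - WRONG_VOTE_ALLOWANCE_PCT - not_in_restart_pct
}

fn main() {
    // The two worked examples from the text: 80% participation gives 42%,
    // 85% participation gives 47%.
    assert_eq!(must_repair_threshold_pct(80.0), 42.0);
    assert_eq!(must_repair_threshold_pct(85.0), 47.0);
}
```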
--- proposals/0024-repair-and-restart.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 653acda2..5e1f73dc 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -79,7 +79,7 @@ completed. Send direct message LastVotedForkSlots to everyone, it contains the last voted slot on its tower and the ancestor slots on the last voted fork and is sent in a compressed bitmap like the EpicSlots data structure. The number of ancestor -slots sent is hard coded at 81000, because that's 400ms * 8100 = 9 hours, we +slots sent is hard coded at 81000, because that's 400ms * 81000 = 9 hours, we assume most restart decisions to be made in 9 hours. If a validator restarts after 9 hours past the outage, it cannot join the restart this way. If enough validators failed to restart within 9 hours, then use the old restart method. @@ -90,7 +90,7 @@ The fields of LastVotedForkSlots are: bit vector. - `last_voted_hash`: the bank hash of the slot last voted slot. - `slots`: compressed bit vector representing the slots on the last voted fork, -last slot is always last_voted_slot, first slot is last_voted_slot-8100. +last slot is always last_voted_slot, first slot is last_voted_slot-81000. ### Aggregate, repair, and replay the slots in LastVotedForkSlots From 85f62b4ad12d1b380629bbc0ce56b1c6957d2130 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Mon, 1 May 2023 11:02:01 -0700 Subject: [PATCH 017/119] Update proposals/0024-repair-and-restart.md Co-authored-by: mvines --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 5e1f73dc..321ea4b3 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -13,7 +13,7 @@ feature: (fill in with feature tracking issues once accepted) ## Summary Improve the current [cluster restart](https://docs.solana.com/running-validator/restart-cluster) -procedure such that validators can automatically figure out confirmed slots +procedure such that validators can automatically figure out the highest optimistically confirmed slot and then proceed to restart if everything looks fine. ## Motivation From eafd745b565598950312598d776d0d9d0d345204 Mon Sep 17 00:00:00 2001 From: Wen Date: Mon, 1 May 2023 11:05:37 -0700 Subject: [PATCH 018/119] Make linter happy. --- proposals/0024-repair-and-restart.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 321ea4b3..b50e48f9 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -13,8 +13,13 @@ feature: (fill in with feature tracking issues once accepted) ## Summary Improve the current [cluster restart](https://docs.solana.com/running-validator/restart-cluster) -procedure such that validators can automatically figure out the highest optimistically confirmed slot -and then proceed to restart if everything looks fine. +procedure such that validators can automatically figure out the highest +optimistically confirmed slot and then proceed to restart if everything looks +fine. 
+ +## New Terminology + +None ## Motivation From ecccadf4229c87be2df181d0ab757c855ba62378 Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 2 May 2023 13:15:12 -0700 Subject: [PATCH 019/119] All messages need to keep flowing before restart. --- proposals/0024-repair-and-restart.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index b50e48f9..66b6928c 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -160,6 +160,9 @@ do the following: - Issue a hard fork at the highest oc slot and change shred version in Gossip. - Execute the current --wait-for-supermajority logic and wait for 75%. +Before a validator enters restart, it will still respond to LastVotedForkSlots +and send Heaviest messages periodically. + ## Impact This proposal adds a new RepairAndRestart mode to validators, during this phase From e143136612256fba6831a64926e928c21ae28523 Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 4 May 2023 14:36:17 -0700 Subject: [PATCH 020/119] A snapshot should be generated first in a restart. --- proposals/0024-repair-and-restart.md | 1 + 1 file changed, 1 insertion(+) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 66b6928c..55975f01 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -157,6 +157,7 @@ So after a validator sees that 75% of the validators received 75% of the votes, wait for 10 more minutes so that the message it sent out have propagated, then do the following: +- Generate a snapshot at the highest oc slot. - Issue a hard fork at the highest oc slot and change shred version in Gossip. - Execute the current --wait-for-supermajority logic and wait for 75%. From 57b3b1644736b7a6fdc74770ccd3c981293678f8 Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 9 May 2023 10:49:25 -0700 Subject: [PATCH 021/119] Use Gossip instead of direct messaging in restart. --- proposals/0024-repair-and-restart.md | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 55975f01..d6e4ea75 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -81,13 +81,14 @@ completed. ### Gossip last vote before the restart and ancestors on that fork -Send direct message LastVotedForkSlots to everyone, it contains the last voted -slot on its tower and the ancestor slots on the last voted fork and is sent in -a compressed bitmap like the EpicSlots data structure. The number of ancestor -slots sent is hard coded at 81000, because that's 400ms * 81000 = 9 hours, we -assume most restart decisions to be made in 9 hours. If a validator restarts -after 9 hours past the outage, it cannot join the restart this way. If enough -validators failed to restart within 9 hours, then use the old restart method. +Send Gossip message LastVotedForkSlots to everyone in restart, it contains the +last voted slot on its tower and the ancestor slots on the last voted fork and +is sent in a compressed bitmap like the EpicSlots data structure. The number of +ancestor slots sent is hard coded at 81000, because that's 400ms * 81000 = 9 +hours, we assume most restart decisions to be made in 9 hours. If a validator +restarts after 9 hours past the outage, it cannot join the restart this way. If +enough validators failed to restart within 9 hours, then use the old restart +method. 
The fields of LastVotedForkSlots are: @@ -97,6 +98,10 @@ bit vector. - `slots`: compressed bit vector representing the slots on the last voted fork, last slot is always last_voted_slot, first slot is last_voted_slot-81000. +When a validator enters restart, it increments its current shred_version, so +the Gossip messages used in restart will not interfere with those outside the +restart. + ### Aggregate, repair, and replay the slots in LastVotedForkSlots Aggregate the slots in received LastVotedForkSlots messages, whenever some slot @@ -161,8 +166,9 @@ do the following: - Issue a hard fork at the highest oc slot and change shred version in Gossip. - Execute the current --wait-for-supermajority logic and wait for 75%. -Before a validator enters restart, it will still respond to LastVotedForkSlots -and send Heaviest messages periodically. +Before a validator enters restart, it will still propagate LastVotedForkSlots +and Heaviest messages in Gossip. After the restart,its shred_version will be +updated so it will no longer send or propagate Gossip messages for restart. ## Impact From 4b99230bed2f567db1cba338efe9b8d2f4f19da5 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 10 May 2023 16:38:30 -0700 Subject: [PATCH 022/119] Require 80% of the people receive 80% of Heaviest. --- proposals/0024-repair-and-restart.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index d6e4ea75..1a391efc 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -154,17 +154,17 @@ cluster unable to reach consensus. If at least 2/3 of the people agree on one slot, they should proceed to restart from this slot. Otherwise validators should halt and send alerts for human attention. -Also, there might be 5% of the validators not sending Heaviest, so we only -require that 75% of the people received 75% of the Heaviest messages and they -all agree on one block and hash. +We require that at least 80% of the people received the Heaviest messages from +validators with at least 80% stake, and that the Heaviest messages all agree on +one block and hash. -So after a validator sees that 75% of the validators received 75% of the votes, +So after a validator sees that 80% of the validators received 80% of the votes, wait for 10 more minutes so that the message it sent out have propagated, then do the following: - Generate a snapshot at the highest oc slot. - Issue a hard fork at the highest oc slot and change shred version in Gossip. -- Execute the current --wait-for-supermajority logic and wait for 75%. +- Execute the current --wait-for-supermajority logic and wait for 80%. Before a validator enters restart, it will still propagate LastVotedForkSlots and Heaviest messages in Gossip. After the restart,its shred_version will be From 198e742d4bdd66df26ad543b4b308ee84a6ca00a Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 11 May 2023 09:47:25 -0700 Subject: [PATCH 023/119] Add security check and some other changes. --- proposals/0024-repair-and-restart.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 1a391efc..721e9706 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -100,7 +100,10 @@ last slot is always last_voted_slot, first slot is last_voted_slot-81000. 
When a validator enters restart, it increments its current shred_version, so the Gossip messages used in restart will not interfere with those outside the -restart. +restart. There is slight chance that (current_shred_version+1) % 0xffff would +collide with the new shred_version calculated after the restart, but even if +this rare case occured, we plan to flush the CRDS table on successful restart, +so Gossip messages used in restart will be removed. ### Aggregate, repair, and replay the slots in LastVotedForkSlots @@ -154,6 +157,11 @@ cluster unable to reach consensus. If at least 2/3 of the people agree on one slot, they should proceed to restart from this slot. Otherwise validators should halt and send alerts for human attention. +We will also perform some safety checks, if the voted slot does not satisfy +safety checks, then the validators will panic and halt: + +- The voted slot is equal or a child of local optimistically confirmed slot. + We require that at least 80% of the people received the Heaviest messages from validators with at least 80% stake, and that the Heaviest messages all agree on one block and hash. From 7bd9b74739b24a87eb18fa2f50f4c78060726faa Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 12 May 2023 09:27:31 -0700 Subject: [PATCH 024/119] Update proposals/0024-repair-and-restart.md Co-authored-by: Trent Nelson --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 721e9706..c28ac8b9 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -1,6 +1,6 @@ --- simd: '0024' -title: Repair and start in a cluster restart +title: Optimistic cluster restart automation authors: - Wen Xu (Solana Labs) category: Standard From a44eeff899328b4c4698b1d95cdb771d4aca2663 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 12 May 2023 12:43:21 -0700 Subject: [PATCH 025/119] Update proposals/0024-repair-and-restart.md Co-authored-by: Trent Nelson --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index c28ac8b9..e7895f18 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -32,7 +32,7 @@ mistake in deciding the highest optimistically confirmed slot, it could mean rollback of user transactions after those transactions have been confirmed, which is not acceptable. -We aim to automate the finding of highest optimistically confirmed slot and +We aim to automate the negotiation of highest optimistically confirmed slot and block data distribution, so that we can lower the possibility of human mistakes in the cluster restart process. From 6147af11aa564425f8a884d121e6e339fcfffc52 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 12 May 2023 13:43:43 -0700 Subject: [PATCH 026/119] Update proposals/0024-repair-and-restart.md Co-authored-by: Trent Nelson --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index e7895f18..46e31961 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -83,7 +83,7 @@ completed. 
Send Gossip message LastVotedForkSlots to everyone in restart, it contains the last voted slot on its tower and the ancestor slots on the last voted fork and -is sent in a compressed bitmap like the EpicSlots data structure. The number of +is sent in a compressed bitmap like the `EpochSlots` data structure. The number of ancestor slots sent is hard coded at 81000, because that's 400ms * 81000 = 9 hours, we assume most restart decisions to be made in 9 hours. If a validator restarts after 9 hours past the outage, it cannot join the restart this way. If From bffcd1d93cb27a7be6ec565bd30633a3807d2481 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 12 May 2023 13:44:02 -0700 Subject: [PATCH 027/119] Update proposals/0024-repair-and-restart.md Co-authored-by: Trent Nelson --- proposals/0024-repair-and-restart.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 46e31961..8b162726 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -87,8 +87,8 @@ is sent in a compressed bitmap like the `EpochSlots` data structure. The number ancestor slots sent is hard coded at 81000, because that's 400ms * 81000 = 9 hours, we assume most restart decisions to be made in 9 hours. If a validator restarts after 9 hours past the outage, it cannot join the restart this way. If -enough validators failed to restart within 9 hours, then use the old restart -method. +enough validators failed to restart within 9 hours, then fallback to the +manual, interactive cluster restart method. The fields of LastVotedForkSlots are: From ebc0ceca01000b629aa2cce02f16d33d433f1a01 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 12 May 2023 13:45:29 -0700 Subject: [PATCH 028/119] Update proposals/0024-repair-and-restart.md Co-authored-by: Trent Nelson --- proposals/0024-repair-and-restart.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 8b162726..5e31ec0b 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -95,8 +95,9 @@ The fields of LastVotedForkSlots are: - `last_voted_slot`: the slot last voted, this also serves as last_slot for the bit vector. - `last_voted_hash`: the bank hash of the slot last voted slot. -- `slots`: compressed bit vector representing the slots on the last voted fork, -last slot is always last_voted_slot, first slot is last_voted_slot-81000. +- `ancestors`: bit vector representing the compressed slots which produced +a block on the last voted fork. the most significant bit is always +last_voted_slot, least significant bit is last_voted_slot-81000. 
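To make the bit ordering above concrete, here is a small sketch of how a slot on the last voted fork could be mapped to an offset in the bit vector; the function and constant names are hypothetical and only illustrate the described layout:

```rust
const LAST_VOTED_FORK_SLOTS: u64 = 81_000; // window size described above

// Offset 0 corresponds to last_voted_slot (the most significant bit) and
// offset 81_000 corresponds to last_voted_slot - 81_000 (the least
// significant bit). Slots outside the window are simply not represented.
fn bit_offset(last_voted_slot: u64, slot: u64) -> Option<u64> {
    if slot > last_voted_slot || last_voted_slot - slot > LAST_VOTED_FORK_SLOTS {
        return None;
    }
    Some(last_voted_slot - slot)
}
```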
When a validator enters restart, it increments its current shred_version, so the Gossip messages used in restart will not interfere with those outside the From dc9209f130f3c39cc07be76f789832d8d7dfe426 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 12 May 2023 13:48:30 -0700 Subject: [PATCH 029/119] Update proposals/0024-repair-and-restart.md Co-authored-by: Trent Nelson --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 5e31ec0b..ca1d01b6 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -154,7 +154,7 @@ from. If things go well, all 80% of the validators should find the same heaviest fork. But we are only sending slots instead of bank hashes in LastVotedForkSlots, so it's possible that a duplicate block can make the -cluster unable to reach consensus. If at least 2/3 of the people agree on one +cluster unable to reach consensus. If at least 2/3 of the nodes agree on one slot, they should proceed to restart from this slot. Otherwise validators should halt and send alerts for human attention. From eefa08752c5cf67278a8090c361390d121426bf3 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 12 May 2023 13:52:54 -0700 Subject: [PATCH 030/119] Update proposals/0024-repair-and-restart.md Co-authored-by: Trent Nelson --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index ca1d01b6..02aa5c81 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -159,7 +159,7 @@ slot, they should proceed to restart from this slot. Otherwise validators should halt and send alerts for human attention. We will also perform some safety checks, if the voted slot does not satisfy -safety checks, then the validators will panic and halt: +safety checks, then the restart will be aborted: - The voted slot is equal or a child of local optimistically confirmed slot. From 41453258f9bf359152ee000cf3fdd218bfd1bd01 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 12 May 2023 13:55:49 -0700 Subject: [PATCH 031/119] Update proposals/0024-repair-and-restart.md Co-authored-by: Trent Nelson --- proposals/0024-repair-and-restart.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 02aa5c81..9ec7d15e 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -69,9 +69,9 @@ everyone 4. Now see if enough people can optimistically agree on one block (same slot and hash) to restart from -4.1 If yes, proceed and restart + 1. If yes, proceed and restart -4.2 If no, freeze and print out what you think is wrong, wait for human + 1. If no, freeze and print out what it thinks is wrong, wait for human A new command line arg --RepairAndRestart is added. When the cluster is in need of a restart, we assume at least 80% will restart with this arg. 
Any validators From 6aeda8305a7037b81eaeadfc89b120658222b8d4 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 12 May 2023 13:57:21 -0700 Subject: [PATCH 032/119] Update proposals/0024-repair-and-restart.md Co-authored-by: Trent Nelson --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 9ec7d15e..8c25ae82 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -66,7 +66,7 @@ before the freeze propagate to everyone 3. Make restart participants' last votes before the freeze propagate to everyone -4. Now see if enough people can optimistically agree on one block (same slot +4. Now see if enough nodes can optimistically agree on one block (same slot and hash) to restart from 1. If yes, proceed and restart From 78c0e5b678c55236b5246b9d9c15fd910770ae11 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 12 May 2023 13:57:34 -0700 Subject: [PATCH 033/119] Update proposals/0024-repair-and-restart.md Co-authored-by: Trent Nelson --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 8c25ae82..fd8430a8 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -63,7 +63,7 @@ The steps roughly look like this: 2. Make all blocks which can potentially have been optimistically confirmed before the freeze propagate to everyone -3. Make restart participants' last votes before the freeze propagate to +3. Make restart participants' last vote prior to the freeze propagate to everyone 4. Now see if enough nodes can optimistically agree on one block (same slot From a938546def252353f12dbec03228949b908ffb31 Mon Sep 17 00:00:00 2001 From: Wen Date: Fri, 12 May 2023 14:47:17 -0700 Subject: [PATCH 034/119] Add some terminologies. --- proposals/0024-repair-and-restart.md | 134 ++++++++++++++++++--------- 1 file changed, 89 insertions(+), 45 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index c28ac8b9..1fcbe578 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -12,14 +12,32 @@ feature: (fill in with feature tracking issues once accepted) ## Summary -Improve the current [cluster restart](https://docs.solana.com/running-validator/restart-cluster) -procedure such that validators can automatically figure out the highest -optimistically confirmed slot and then proceed to restart if everything looks -fine. +During a cluster restart following an outage, use gossip to exchange local +status and automatically reach consensus on the block to restart from. Proceed +to restart if validators in the restart can reach agreement, or print debug +information and halt otherwise. ## New Terminology -None +* "cluster restart": When there is an outage such that the whole cluster +stalls, human may need to restart most of the validators with a sane state so +that the cluster can continue to function. This is different from sporadic +single validator restart which does not impact the cluster. See +[cluster restart](https://docs.solana.com/running-validator/restart-cluster) +for details. + +* "optimistically confirmed block": a block which gets the votes of validators +with > 2/3 stake. 
Our algorithm tries to guarantee that an optimistically +confirmed will never be rolled back. When we are performing cluster restart, we +normally start from the highest optimistically confirmed block, but it's also +okay to start from a child of the highest optimistcially confirmed block as +long as consensus can be reached. + +* `RESTART_STAKE_THRESHOLD`: We need enough validators to participate in a +restart so they can make decision for the whole cluster. If everything works +perfect, we only need 2/3 of the total stake. However, validators could die +or perform abnormally, so we currently set the `RESTART_STAKE_THRESHOLD` at +80%, which is the same as now. ## Motivation @@ -28,33 +46,59 @@ optimistically confirmed slot, then restart the validators with new commandline arguments. The current process involves a lot of human intervention, if people make a -mistake in deciding the highest optimistically confirmed slot, it could mean -rollback of user transactions after those transactions have been confirmed, -which is not acceptable. +mistake in deciding the highest optimistically confirmed slot, it is +detrimental to the viability of the ecosystem. We aim to automate the finding of highest optimistically confirmed slot and block data distribution, so that we can lower the possibility of human mistakes -in the cluster restart process. +in the cluster restart process. This also reduces the burden on validator +operators, because they don't have to stay around while the validators +automatically try to reach consensus, they will be paged if things go wrong. ## Alternatives Considered -See [Handling Common Solana Outages](https://docs.google.com/document/d/1RkNAyz-5aKvv5FF44b8SoKifChKB705y5SdcEoqMPIc) -for details. - -There are many proposals about automatically detecting that the cluster is -in an outage so validators should enter a recovery process automatically. +### Automatically detect outage and perform cluster restart +The reaction time of a human in case of emergency is measured in minutes, +while a cluster restart where human need to initiate restarts takes hours. +We consdiered various approaches to automatcially detect outage and perform +cluster restart, which can reduce recovery speed to minutes or even seconds. -While getting human out of the loop greatly improves recovery speed, -automaticlly restarting the whole cluster still seems risky. Because if -the recovery process itself doesn't work, it might be some time before +However, automaticlly restarting the whole cluster still seems risky. Because +if the recovery process itself doesn't work, it might be some time before we can get human's attention. And it doesn't solve the cases where new binary is needed. So for now we still plan to have human in the loop. +After we gain more experience with the restart apprach in this proposal, we +may slowly try to automate more parts to improve cluster reliability. + +### Use gossip and consensus to figure out restart slot before the restart +The main difference between current proposal and this proposal is that this +proposal will automatically enters restart preparation phase where local +status is exchanged via gossip without human intervention. + +While this improves recovery speed, there are concerns about recovery gossip +messages interfers with normal gossip messages, and automatically start a new +message in gossip seems risky. 
+ +### Automatically reduce block production in an outage +Right now we have vote-only mode, a validator will only pack vote transactions +into new blocks if the tower distance (last_vote - local_root) is greater than +400 slots. + +Unfortunately in the previous outages vote-only mode isn't enough to save the +cluster. There are proposals of more aggressive block production reduction to +save the cluster. For example, a leader could produce only one block in four +consecutive slots allocated to it. + +However, this only solves the problem in specific type of outage, and it seems +risky to aggressively reduce block production, so we are not proceeding with +this proposal. + ## Detailed Design -The new protocol tries to make all 80% restarting validators get the same -data blocks and the same set of last votes among them, then they can probably -make the same decision and then proceed. +The new protocol tries to make all restarting validators get the same +data blocks and the same set of last votes among them, then they will almost +certainly make the same decision and proceed. The steps roughly look like this: @@ -66,22 +110,23 @@ before the freeze propagate to everyone 3. Make restart participants' last votes before the freeze propagate to everyone -4. Now see if enough people can optimistically agree on one block (same slot -and hash) to restart from +4. Now see if enough people can agree on one block (same slot and hash) to +restart from 4.1 If yes, proceed and restart 4.2 If no, freeze and print out what you think is wrong, wait for human -A new command line arg --RepairAndRestart is added. When the cluster is in need -of a restart, we assume at least 80% will restart with this arg. Any validators +A new command line arg will be added. When the cluster is in need +of a restart, we assume validators holding at least `RESTART_STAKE_THRESHOLD` +percentage of stakes will restart with this arg. Any validators restarted with this arg does not participate in the normal Turbine protocol, update its vote, or generate new blocks until all of the following steps are completed. ### Gossip last vote before the restart and ancestors on that fork -Send Gossip message LastVotedForkSlots to everyone in restart, it contains the +Send gossip message LastVotedForkSlots to everyone in restart, it contains the last voted slot on its tower and the ancestor slots on the last voted fork and is sent in a compressed bitmap like the EpicSlots data structure. The number of ancestor slots sent is hard coded at 81000, because that's 400ms * 81000 = 9 @@ -99,11 +144,11 @@ bit vector. last slot is always last_voted_slot, first slot is last_voted_slot-81000. When a validator enters restart, it increments its current shred_version, so -the Gossip messages used in restart will not interfere with those outside the +the gossip messages used in restart will not interfere with those outside the restart. There is slight chance that (current_shred_version+1) % 0xffff would collide with the new shred_version calculated after the restart, but even if this rare case occured, we plan to flush the CRDS table on successful restart, -so Gossip messages used in restart will be removed. +so gossip messages used in restart will be removed. ### Aggregate, repair, and replay the slots in LastVotedForkSlots @@ -111,18 +156,19 @@ Aggregate the slots in received LastVotedForkSlots messages, whenever some slot has enough stake to be optimistically confirmed and it's missing locally, start the repair process for this slot. 
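As a rough illustration of this aggregation step, the sketch below tallies stake per slot from received LastVotedForkSlots messages and flags locally missing slots that reach the "enough stake" threshold derived in the next paragraph; all names and the percentage representation are assumptions made for the example:

```rust
use std::collections::{HashMap, HashSet};

// Sketch only: stake is expressed as a percentage of total cluster stake and
// `threshold_percent` is the "enough stake" value discussed below (e.g. 42.0
// when 80% of the stake is participating in the restart).
fn slots_needing_repair(
    received: &[(f64, Vec<u64>)], // (sender stake %, slots on its last voted fork)
    local_slots: &HashSet<u64>,   // slots already present locally
    threshold_percent: f64,
) -> Vec<u64> {
    let mut stake_per_slot: HashMap<u64, f64> = HashMap::new();
    for (stake, slots) in received {
        for slot in slots {
            *stake_per_slot.entry(*slot).or_insert(0.0) += stake;
        }
    }
    stake_per_slot
        .into_iter()
        .filter(|(slot, stake)| *stake >= threshold_percent && !local_slots.contains(slot))
        .map(|(slot, _)| slot)
        .collect()
}
```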
-We calculate "enough" stake as follows. When there are 80% validators joining -the restart, assuming 5% restarted validators can make mistakes in voting, any -block with more than 67% - 5% - (100-80)% = 42% could potentially be -optimistically confirmed before the restart. If there are 85% validators in the -restart, then any block with more than 67% - 5% - (100-85)% = 47% could be -optimistically confirmed before the restart. +We calculate "enough" stake as follows. Assume `RESTART_STAKE_THRESHOLD` is +80%. When there are 80% validators joining the restart, assuming 5% restarted +validators can make mistakes in voting, any block with more than +67% - 5% - (100-80)% = 42% could potentially be optimistically confirmed before +the restart. If there are 85% validators in the restart, then any block with +more than 67% - 5% - (100-85)% = 47% could be optimistically confirmed before +the restart. ### Gossip current heaviest fork -After receiving LastVotedForkSlots from 80% of the validators and reparing -slots with "enough" stake, replay all blocks and pick the heaviest fork as -follows: +After receiving LastVotedForkSlots from the validators holding stake more than +`RESTART_STAKE_THRESHOLD` and repairing slots with "enough" stake, replay all +blocks and pick the heaviest fork as follows: 1. Pick block and update root for all blocks with more than 67% votes @@ -136,7 +182,7 @@ could make the wrong votes. 2.2 Otherwise stop traversing the tree and use last picked block. -After deciding heaviest block, Gossip +After deciding heaviest block, gossip Heaviest(X, Hash(X), received_heaviest_stake) out, where X is the latest picked block. We also send out stake of received Heaviest messages so that we can proceed to next step when enough validators are ready. @@ -150,8 +196,8 @@ from. ### Proceed to restart if everything looks okay, halt otherwise -If things go well, all 80% of the validators should find the same heaviest -fork. But we are only sending slots instead of bank hashes in +If things go well, all of the validators in restart should find the same +heaviest fork. But we are only sending slots instead of bank hashes in LastVotedForkSlots, so it's possible that a duplicate block can make the cluster unable to reach consensus. If at least 2/3 of the people agree on one slot, they should proceed to restart from this slot. Otherwise validators @@ -171,12 +217,12 @@ wait for 10 more minutes so that the message it sent out have propagated, then do the following: - Generate a snapshot at the highest oc slot. -- Issue a hard fork at the highest oc slot and change shred version in Gossip. +- Issue a hard fork at the highest oc slot and change shred version in gossip. - Execute the current --wait-for-supermajority logic and wait for 80%. Before a validator enters restart, it will still propagate LastVotedForkSlots -and Heaviest messages in Gossip. After the restart,its shred_version will be -updated so it will no longer send or propagate Gossip messages for restart. +and Heaviest messages in gossip. After the restart,its shred_version will be +updated so it will no longer send or propagate gossip messages for restart. ## Impact @@ -189,7 +235,7 @@ don't need to manually generate and download snapshots again. ## Security Considerations -The two added Gossip messages LastVotedForkSlots and Heavist will only be sent +The two added gossip messages LastVotedForkSlots and Heaviest will only be sent and processed when the validator is restarted in RepairAndRestart mode. 
So random validator restarting in the new mode will not bring extra burden to the system. @@ -204,5 +250,3 @@ This change is backward compatible with previous versions, because validators only enter the new mode during new restart mode which is controlled by a command line argument. All current restart arguments like --wait-for-supermajority and --expected-bank-hash will be kept as is for now. -However, this change does not work until at least 80% installed the new binary -and they are willing to use the new methods for restart. From ebd19354f277b652d859f8c2c7e9b9183b0a5ce6 Mon Sep 17 00:00:00 2001 From: Wen Date: Mon, 15 May 2023 14:35:58 -0700 Subject: [PATCH 035/119] Rewording a few paragraphs to make things clear. --- proposals/0024-repair-and-restart.md | 233 ++++++++++++++++----------- 1 file changed, 141 insertions(+), 92 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index 847e3fac..bf222a5d 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -33,6 +33,19 @@ normally start from the highest optimistically confirmed block, but it's also okay to start from a child of the highest optimistcially confirmed block as long as consensus can be reached. +* "silent repair phase": In the new repair and restart plan, the validators in +restart will first spend some time to exchange information, repair missing +blocks, and finally reach consensus. The validators only continue normal block +production and voting after consensus is reached. We call this preparation +phase where block production and voting are paused the silent repair phase. + +* "ephemeral shred version": right now we update `shred_version` during a +cluster restart, it is used to verify received shreds and filter Gossip peers. +In the new repair and restart plan, we introduce a new temporary shred version +in the silent repair phase so validators in restart don't interfere with those +not in restart. Currently this ephemeral shred version is calculated using +`(current_shred_version + 1) % 0xffff`. + * `RESTART_STAKE_THRESHOLD`: We need enough validators to participate in a restart so they can make decision for the whole cluster. If everything works perfect, we only need 2/3 of the total stake. However, validators could die @@ -98,20 +111,23 @@ this proposal. The new protocol tries to make all restarting validators get the same data blocks and the same set of last votes among them, then they will almost -certainly make the same decision and proceed. +certainly make the same decision on the canonical fork and proceed. The steps roughly look like this: -1. Everyone freezes, no new blocks, no new votes, and no Turbine +1. The validator boots into the silent repair phase, it will not make new +blocks or change its votes. The validator propagates its local voted fork to +all other validators in restart. -2. Make all blocks which can potentially have been optimistically confirmed -before the freeze propagate to everyone +2. While counting local vote information from all others in restart, the +validator repairs all blocks which could potentially have been optimistically +confirmed. -3. Make restart participants' last vote prior to the freeze propagate to -everyone +3. After repair is complete, the validator counts votes on each fork and +sends out local heaviest fork. -4. Now see if enough nodes can agree on one block (same slot and hash) to -restart from +4. 
Each validator counts if enough nodes can agree on one block (same slot and +hash) to restart from: 1. If yes, proceed and restart @@ -124,52 +140,85 @@ restarted with this arg does not participate in the normal Turbine protocol, update its vote, or generate new blocks until all of the following steps are completed. -### Gossip last vote before the restart and ancestors on that fork - -Send gossip message LastVotedForkSlots to everyone in restart, it contains the -last voted slot on its tower and the ancestor slots on the last voted fork and -is sent in a compressed bitmap like the `EpochSlots` data structure. The number of -ancestor slots sent is hard coded at 81000, because that's 400ms * 81000 = 9 -hours, we assume most restart decisions to be made in 9 hours. If a validator -restarts after 9 hours past the outage, it cannot join the restart this way. If -enough validators failed to restart within 9 hours, then fallback to the -manual, interactive cluster restart method. - -The fields of LastVotedForkSlots are: - -- `last_voted_slot`: the slot last voted, this also serves as last_slot for the -bit vector. -- `last_voted_hash`: the bank hash of the slot last voted slot. -- `ancestors`: compressed bit vector representing the slots which produced -a block on the last voted fork. the most significant bit is always -last_voted_slot, least significant bit is last_voted_slot-81000. - -When a validator enters restart, it increments its current shred_version, so -the gossip messages used in restart will not interfere with those outside the -restart. There is slight chance that (current_shred_version+1) % 0xffff would -collide with the new shred_version calculated after the restart, but even if -this rare case occured, we plan to flush the CRDS table on successful restart, -so gossip messages used in restart will be removed. - -### Aggregate, repair, and replay the slots in LastVotedForkSlots - -Aggregate the slots in received LastVotedForkSlots messages, whenever some slot -has enough stake to be optimistically confirmed and it's missing locally, start -the repair process for this slot. - -We calculate "enough" stake as follows. Assume `RESTART_STAKE_THRESHOLD` is -80%. When there are 80% validators joining the restart, assuming 5% restarted -validators can make mistakes in voting, any block with more than -67% - 5% - (100-80)% = 42% could potentially be optimistically confirmed before -the restart. If there are 85% validators in the restart, then any block with -more than 67% - 5% - (100-85)% = 47% could be optimistically confirmed before -the restart. - -### Gossip current heaviest fork - -After receiving LastVotedForkSlots from the validators holding stake more than -`RESTART_STAKE_THRESHOLD` and repairing slots with "enough" stake, replay all -blocks and pick the heaviest fork as follows: +### 1. Gossip last vote before the restart and ancestors on that fork + +The main goal of this step is to propagate the locally selected fork to all +others in restart. + +We use a new Gossip message `LastVotedForkSlots`, its fields are: + +- `last_voted_slot`: `u64` the slot last voted, this also serves as last_slot +for the bit vector. +- `last_voted_hash`: `Hash` the bank hash of the slot last voted slot. +- `ancestors`: `BitVec` compressed bit vector representing the slots on +sender's last voted fork. the most significant bit is always +`last_voted_slot`, least significant bit is `last_voted_slot-81000`. 
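Putting the fields above together, the message could look roughly like the following Rust sketch. The container types (`bv::BitVec` as used by `EpochSlots`, `solana_sdk::hash::Hash`) and the absence of any serialization details are assumptions for illustration, not a wire-format specification:

```rust
use bv::BitVec;             // assumed: same bit vector type backing EpochSlots
use solana_sdk::hash::Hash; // assumed: existing bank hash type

// Sketch of the gossip payload described above; field names follow the
// proposal text, compression details are intentionally left out.
pub struct LastVotedForkSlots {
    pub last_voted_slot: u64,  // also the last slot covered by `ancestors`
    pub last_voted_hash: Hash, // bank hash of the last voted slot
    pub ancestors: BitVec<u8>, // slots on the last voted fork, down to
                               // last_voted_slot - 81_000
}
```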
+ +The number of ancestor slots sent is hard coded at 81000, because that's +400ms * 81000 = 9 hours, we assume most restart decisions to be made in 9 +hours. If a validator restarts after 9 hours past the outage, it cannot join +the restart this way. If enough validators failed to restart within 9 hours, +then fallback to the manual, interactive cluster restart method. + +When a validator enters restart, it uses ephemeral shred version to avoid +interfering with those outside the restart. There is slight chance that +the ephemeral shred version would collide with the shred version after the +silent repair phase, but even if this rare case occured, we plan to flush the +CRDS table on successful restart, so gossip messages used in restart will be +removed. + +### 2. Repair ledgers up to the restart slot + +The main goal of this step is to repair all blocks which could potentially be +optimistically confirmed. So during next step each validator can select its +own heaviest fork. + +We need to prevent false negative at all costs, because we can't rollback an +optimistcially confirmed block. However, false positive is okay. Because when +we select the heaviest fork in the next step, we should see all the potential +candidates for optimistically confirmed slots, there we can count the votes and +remove some false positive cases. + +However, it's also overkill to repair every block presented by others. When +`LastVotedForkSlots` messages are being received and aggregated, a validator +can categorize blocks missing locally into 3 categories: ignored, unsure, +and must-have. Depending on the stakes of validators currently in restart, some +slots with too few stake can be safely ignored, some have enough stake they +should definitely be repaired, and the rest would need more confirmation before +those blocks are worth repairing. + +Assume `RESTART_STAKE_THRESHOLD` is 80% and that 5% restarted validators can +make mistakes in voting. + +When only 5% validators are in restart, everything is in "unsure" category. + +When 67% validators are in restart, any slot with less than +67% - 5% - (100-67%) = 29% is in "ignored" category, because even if all +validators join the restart, the slot will not get 67% stake. Any slot with +more than 33% stake can potentially affect heaviest fork choice, so it is in +"must-have" category. Any slot with between 29% and 33% stake is "unsure". + +When 80% validators are in restart, any slot with less than +67% - 5% - (100-80%) = 42% is in "ignored" category, the rest is "must-have". + +From above examples, we can see the "must-have" threshold changes dynamically +depending on how many validators are in restart. The main benefit is that a +block will only move from "must-have/unsure" to "ignored" as more validators +join the restart, not vice versa. So validators will have an unchanged list of +blocks to repair when >71% validators joined the restart. + +### 3. Gossip current heaviest fork + +We use a new Gossip message `HeaviestFork`, its fields are: + +- `slot`: `u64` slot of the picked block. +- `hash`: `Hash` bank hash of the picked block. +- `received`: `u8` total percentage of stakes of the validators it received +`HeaviestFork` messages from. + +After receiving `LastVotedForkSlots` from the validators holding stake more +than `RESTART_STAKE_THRESHOLD` and repairing slots with "enough" stake, replay +all blocks and pick the heaviest fork as follows: 1. 
Pick block and update root for all blocks with more than 67% votes @@ -181,49 +230,49 @@ For example, if 80% validators are in restart, child has 42% votes, then 42 + (100-80) = 62%, pick child. 62% is chosen instead of 67% because 5% could make the wrong votes. +It's okay to use 62% here because the goal is to prevent false negative rather +than false positive. If validators pick a child of optimistically confirmed +block to start from, it's okay because if 75% of the validators all choose this +block, this block will be instantly confirmed on the chain. While not having +any optimistically confirmed block rolled back is the number one goal here. + 2.2 Otherwise stop traversing the tree and use last picked block. After deciding heaviest block, gossip -Heaviest(X, Hash(X), received_heaviest_stake) out, where X is the latest picked -block. We also send out stake of received Heaviest messages so that we can -proceed to next step when enough validators are ready. - -The fields of the Heaviest message is: +`HeaviestFork(X, Hash(X), received_heaviest_stake)` out, where X is the latest +picked block. We also send out stake of received `HeaviestFork` messages so +that we can proceed to next step when enough validators are ready. -- `slot`: slot of the picked block. -- `hash`: bank hash of the picked block. -- `received`: total of stakes of the validators it received Heaviest messages -from. +### 4. Proceed to restart if everything looks okay, halt otherwise -### Proceed to restart if everything looks okay, halt otherwise +All validators in restart keep counting the number of Heaviest where +`received_heaviest_stake` is higher than 80%. Once a validator counts that 80% +of the validators send out Heaviest where `received_heaviest_stake` is higher +than 80%, it starts the following checks: -If things go well, all of the validators in restart should find the same -heaviest fork. But we are only sending slots instead of bank hashes in -LastVotedForkSlots, so it's possible that a duplicate block can make the -cluster unable to reach consensus. If at least 2/3 of the nodes agree on one -slot, they should proceed to restart from this slot. Otherwise validators -should halt and send alerts for human attention. - -We will also perform some safety checks, if the voted slot does not satisfy -safety checks, then the restart will be aborted: +- Whether all `HeaviestFork` have the same slot and same block Hash. Because +validators are only sending slots instead of bank hashes in +`LastVotedForkSlots`, it's possible that a duplicate block can make the +cluster unable to reach consensus. So block hash needs to be checked as well. - The voted slot is equal or a child of local optimistically confirmed slot. -We require that at least 80% of the people received the Heaviest messages from -validators with at least 80% stake, and that the Heaviest messages all agree on -one block and hash. +If the above check passes, the validator immediately starts generation of +snapshot at the agreed upon slot. + +While the snapshot generation is in progress, the validator also checks to see +2 minutes has passed since agreement has been reached, to guarantee its +`HeaviestFork` message propagates to everyone, then proceeds to restart: -So after a validator sees that 80% of the validators received 80% of the votes, -wait for 10 more minutes so that the message it sent out have propagated, then -do the following: +1. Issue a hard fork at the highest oc slot and change shred version in gossip. +2. 
Execute the current tasks involved in --wait-for-supermajority and wait for 80%. -- Generate a snapshot at the highest oc slot. -- Issue a hard fork at the highest oc slot and change shred version in gossip. -- Execute the current --wait-for-supermajority logic and wait for 80%. +Before a validator enters restart, it will still propagate `LastVotedForkSlots` +and `HeaviestFork` messages in gossip. After the restart,its shred_version will +be updated so it will no longer send or propagate gossip messages for restart. -Before a validator enters restart, it will still propagate LastVotedForkSlots -and Heaviest messages in gossip. After the restart,its shred_version will be -updated so it will no longer send or propagate gossip messages for restart. +If any of the checks fails, the validator immediately prints out all debug info, +sends out metrics so that people can be paged, and then halts. ## Impact @@ -236,14 +285,14 @@ don't need to manually generate and download snapshots again. ## Security Considerations -The two added gossip messages LastVotedForkSlots and Heaviest will only be sent -and processed when the validator is restarted in RepairAndRestart mode. So -random validator restarting in the new mode will not bring extra burden to the -system. +The two added gossip messages `LastVotedForkSlots` and `HeaviestFork` will only +be sent and processed when the validator is restarted in RepairAndRestart mode. +So random validator restarting in the new mode will not bring extra burden to +the system. -Non-conforming validators could send out wrong LastVotedForkSlots and Heaviest -messages to mess with cluster restarts, these should be included in the -Slashing rules in the future. +Non-conforming validators could send out wrong `LastVotedForkSlots` and +`HeaviestFork` messages to mess with cluster restarts, these should be included +in the Slashing rules in the future. ## Backwards Compatibility From deee8ec908eef2c9d55593a8a5a9894967ab7267 Mon Sep 17 00:00:00 2001 From: Wen Date: Mon, 15 May 2023 15:36:58 -0700 Subject: [PATCH 036/119] Fix a few small sentences. --- proposals/0024-repair-and-restart.md | 136 +++++----- proposals/0024-repair-and-restart.md.bak | 303 +++++++++++++++++++++++ 2 files changed, 371 insertions(+), 68 deletions(-) create mode 100644 proposals/0024-repair-and-restart.md.bak diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index bf222a5d..b2e89747 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -26,12 +26,12 @@ single validator restart which does not impact the cluster. See [cluster restart](https://docs.solana.com/running-validator/restart-cluster) for details. -* "optimistically confirmed block": a block which gets the votes of validators -with > 2/3 stake. Our algorithm tries to guarantee that an optimistically -confirmed will never be rolled back. When we are performing cluster restart, we -normally start from the highest optimistically confirmed block, but it's also -okay to start from a child of the highest optimistcially confirmed block as -long as consensus can be reached. +* "optimistically confirmed block": a block which gets the votes from the +majority of the validators in a cluster (> 2/3 stake). Our algorithm tries to +guarantee that an optimistically confirmed will never be rolled back. 
When we +are performing cluster restart, we normally start from the highest +optimistically confirmed block, but it's also okay to start from a child of the +highest optimistically confirmed block as long as consensus can be reached. * "silent repair phase": In the new repair and restart plan, the validators in restart will first spend some time to exchange information, repair missing @@ -54,44 +54,45 @@ or perform abnormally, so we currently set the `RESTART_STAKE_THRESHOLD` at ## Motivation -Currently during a cluster restart, validator operators need to decide latest -optimistically confirmed slot, then restart the validators with new commandline -arguments. +Currently during a cluster restart, validator operators need to decide the +highest optimistically confirmed slot, then restart the validators with new +command-line arguments. The current process involves a lot of human intervention, if people make a mistake in deciding the highest optimistically confirmed slot, it is detrimental to the viability of the ecosystem. We aim to automate the negotiation of highest optimistically confirmed slot and -block data distribution, so that we can lower the possibility of human mistakes -in the cluster restart process. This also reduces the burden on validator -operators, because they don't have to stay around while the validators -automatically try to reach consensus, they will be paged if things go wrong. +the distribution of all blocks on that fork, so that we can lower the +possibility of human mistakes in the cluster restart process. This also reduces +the burden on validator operators, because they don't have to stay around while +the validators automatically try to reach consensus, they will be paged if +things go wrong. ## Alternatives Considered ### Automatically detect outage and perform cluster restart The reaction time of a human in case of emergency is measured in minutes, -while a cluster restart where human need to initiate restarts takes hours. -We consdiered various approaches to automatcially detect outage and perform +while a cluster restart where human initiate validator restarts takes hours. +We considered various approaches to automatically detect outage and perform cluster restart, which can reduce recovery speed to minutes or even seconds. -However, automaticlly restarting the whole cluster still seems risky. Because +However, automatically restarting the whole cluster seems risky. Because if the recovery process itself doesn't work, it might be some time before we can get human's attention. And it doesn't solve the cases where new binary is needed. So for now we still plan to have human in the loop. -After we gain more experience with the restart apprach in this proposal, we +After we gain more experience with the restart approach in this proposal, we may slowly try to automate more parts to improve cluster reliability. ### Use gossip and consensus to figure out restart slot before the restart The main difference between current proposal and this proposal is that this -proposal will automatically enters restart preparation phase where local -status is exchanged via gossip without human intervention. +proposal will automatically enter restart preparation phase without human +intervention. -While this improves recovery speed, there are concerns about recovery gossip -messages interfers with normal gossip messages, and automatically start a new -message in gossip seems risky. 
+While getting human out of the loop improves recovery speed, there are concerns +about recovery gossip messages interfering with normal gossip messages, and +automatically start a new message in gossip seems risky. ### Automatically reduce block production in an outage Right now we have vote-only mode, a validator will only pack vote transactions @@ -105,19 +106,22 @@ consecutive slots allocated to it. However, this only solves the problem in specific type of outage, and it seems risky to aggressively reduce block production, so we are not proceeding with -this proposal. +this proposal for now. ## Detailed Design -The new protocol tries to make all restarting validators get the same -data blocks and the same set of last votes among them, then they will almost -certainly make the same decision on the canonical fork and proceed. +The new protocol tries to make all restarting validators get the same data +blocks and the same set of last votes, then they will almost certainly make the +same decision on the canonical fork and proceed. -The steps roughly look like this: +A new command line arg will be added. When the cluster is in need +of a restart, we assume validators holding at least `RESTART_STAKE_THRESHOLD` +percentage of stakes will restart with this arg. Then the following steps +will happen: 1. The validator boots into the silent repair phase, it will not make new -blocks or change its votes. The validator propagates its local voted fork to -all other validators in restart. +blocks or change its votes. The validator propagates its local voted fork +information to all other validators in restart. 2. While counting local vote information from all others in restart, the validator repairs all blocks which could potentially have been optimistically @@ -129,16 +133,11 @@ sends out local heaviest fork. 4. Each validator counts if enough nodes can agree on one block (same slot and hash) to restart from: - 1. If yes, proceed and restart + 1. If yes, proceed and restart - 2. If no, freeze and print out what it thinks is wrong, wait for human + 2. If no, print out what it thinks is wrong, halt and wait for human -A new command line arg will be added. When the cluster is in need -of a restart, we assume validators holding at least `RESTART_STAKE_THRESHOLD` -percentage of stakes will restart with this arg. Any validators -restarted with this arg does not participate in the normal Turbine protocol, -update its vote, or generate new blocks until all of the following steps are -completed. +See each step explained in details below. ### 1. Gossip last vote before the restart and ancestors on that fork @@ -163,29 +162,28 @@ then fallback to the manual, interactive cluster restart method. When a validator enters restart, it uses ephemeral shred version to avoid interfering with those outside the restart. There is slight chance that the ephemeral shred version would collide with the shred version after the -silent repair phase, but even if this rare case occured, we plan to flush the +silent repair phase, but even if this rare case occurred, we plan to flush the CRDS table on successful restart, so gossip messages used in restart will be removed. ### 2. Repair ledgers up to the restart slot The main goal of this step is to repair all blocks which could potentially be -optimistically confirmed. So during next step each validator can select its -own heaviest fork. +optimistically confirmed. We need to prevent false negative at all costs, because we can't rollback an -optimistcially confirmed block. 
However, false positive is okay. Because when +optimistically confirmed block. However, false positive is okay. Because when we select the heaviest fork in the next step, we should see all the potential candidates for optimistically confirmed slots, there we can count the votes and remove some false positive cases. However, it's also overkill to repair every block presented by others. When `LastVotedForkSlots` messages are being received and aggregated, a validator -can categorize blocks missing locally into 3 categories: ignored, unsure, -and must-have. Depending on the stakes of validators currently in restart, some +can categorize blocks missing locally into 3 categories: ignored, must-have, +and unsure. Depending on the stakes of validators currently in restart, some slots with too few stake can be safely ignored, some have enough stake they -should definitely be repaired, and the rest would need more confirmation before -those blocks are worth repairing. +should definitely be repaired, and the rest would be undecided pending more +confirmations. Assume `RESTART_STAKE_THRESHOLD` is 80% and that 5% restarted validators can make mistakes in voting. @@ -194,9 +192,10 @@ When only 5% validators are in restart, everything is in "unsure" category. When 67% validators are in restart, any slot with less than 67% - 5% - (100-67%) = 29% is in "ignored" category, because even if all -validators join the restart, the slot will not get 67% stake. Any slot with -more than 33% stake can potentially affect heaviest fork choice, so it is in -"must-have" category. Any slot with between 29% and 33% stake is "unsure". +validators join the restart, the slot will not get 67% stake. When this +threshold is less than 33%, we temporarily put all blocks with >33% stake into +"must-have" category to speed up repairing. Any slot with between 29% and 33% +stake is "unsure". When 80% validators are in restart, any slot with less than 67% - 5% - (100-80%) = 42% is in "ignored" category, the rest is "must-have". @@ -204,11 +203,13 @@ When 80% validators are in restart, any slot with less than From above examples, we can see the "must-have" threshold changes dynamically depending on how many validators are in restart. The main benefit is that a block will only move from "must-have/unsure" to "ignored" as more validators -join the restart, not vice versa. So validators will have an unchanged list of -blocks to repair when >71% validators joined the restart. +join the restart, not vice versa. So the list of blocks a validator needs to +repair will never grow bigger when more validators join the restart. ### 3. Gossip current heaviest fork +The main goal of this step is to "vote" the heaviest fork to restart from. + We use a new Gossip message `HeaviestFork`, its fields are: - `slot`: `u64` slot of the picked block. @@ -217,26 +218,25 @@ We use a new Gossip message `HeaviestFork`, its fields are: `HeaviestFork` messages from. After receiving `LastVotedForkSlots` from the validators holding stake more -than `RESTART_STAKE_THRESHOLD` and repairing slots with "enough" stake, replay -all blocks and pick the heaviest fork as follows: +than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" category, +replay all blocks and pick the heaviest fork as follows: -1. Pick block and update root for all blocks with more than 67% votes +1. For all blocks with more than 67% votes, they must be on picked fork. 2. 
If a picked block has more than one children, check if the votes on the heaviest child is over threshold: -2.1 If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child. + 1. If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child. For example, if 80% validators are in restart, child has 42% votes, then 42 + (100-80) = 62%, pick child. 62% is chosen instead of 67% because 5% could make the wrong votes. It's okay to use 62% here because the goal is to prevent false negative rather than false positive. If validators pick a child of optimistically confirmed -block to start from, it's okay because if 75% of the validators all choose this -block, this block will be instantly confirmed on the chain. While not having -any optimistically confirmed block rolled back is the number one goal here. +block to start from, it's okay because if 80% of the validators all choose this +block, this block will be instantly confirmed on the chain. -2.2 Otherwise stop traversing the tree and use last picked block. + 2. Otherwise stop traversing the tree and use last picked block. After deciding heaviest block, gossip `HeaviestFork(X, Hash(X), received_heaviest_stake)` out, where X is the latest @@ -245,10 +245,10 @@ that we can proceed to next step when enough validators are ready. ### 4. Proceed to restart if everything looks okay, halt otherwise -All validators in restart keep counting the number of Heaviest where +All validators in restart keep counting the number of `HeaviestFork` where `received_heaviest_stake` is higher than 80%. Once a validator counts that 80% -of the validators send out Heaviest where `received_heaviest_stake` is higher -than 80%, it starts the following checks: +of the validators send out `HeaviestFork` where `received_heaviest_stake` is +higher than 80%, it starts the following checks: - Whether all `HeaviestFork` have the same slot and same block Hash. Because validators are only sending slots instead of bank hashes in @@ -257,15 +257,15 @@ cluster unable to reach consensus. So block hash needs to be checked as well. - The voted slot is equal or a child of local optimistically confirmed slot. -If the above check passes, the validator immediately starts generation of -snapshot at the agreed upon slot. +If all checks pass, the validator immediately starts generation of snapshot at +the agreed upon slot. While the snapshot generation is in progress, the validator also checks to see -2 minutes has passed since agreement has been reached, to guarantee its -`HeaviestFork` message propagates to everyone, then proceeds to restart: +whether two minutes has passed since agreement has been reached, to guarantee +its `HeaviestFork` message propagates to everyone, then proceeds to restart: -1. Issue a hard fork at the highest oc slot and change shred version in gossip. -2. Execute the current tasks involved in --wait-for-supermajority and wait for 80%. +1. Issue a hard fork at the designated slot and change shred version in gossip. +2. Execute the current tasks in --wait-for-supermajority and wait for 80%. Before a validator enters restart, it will still propagate `LastVotedForkSlots` and `HeaviestFork` messages in gossip. After the restart,its shred_version will @@ -276,7 +276,7 @@ sends out metrics so that people can be paged, and then halts. 
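For clarity, here is a minimal sketch of the child-selection threshold used in the fork traversal above, assuming stake is expressed as a fraction of total cluster stake; the function and parameter names are illustrative only:

```rust
// Returns true if the heaviest child of an already-picked block may also be
// picked. 0.62 rather than 0.67 leaves room for up to 5% of restarting stake
// having voted incorrectly, as explained above.
fn may_pick_child(child_vote_stake: f64, stake_in_restart: f64) -> bool {
    let stake_not_in_restart = 1.0 - stake_in_restart;
    child_vote_stake + stake_not_in_restart >= 0.62
}

// Worked example from the text: 80% of stake in restart, child has 42% of
// the votes: 0.42 + 0.20 = 0.62, so the child is picked.
```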
## Impact -This proposal adds a new RepairAndRestart mode to validators, during this phase +This proposal adds a new silent repair mode to validators, during this phase the validators will not participate in normal cluster activities, which is the same as now. Compared to today's cluster restart, the new mode may mean more network bandwidth and memory on the restarting validators, but it guarantees diff --git a/proposals/0024-repair-and-restart.md.bak b/proposals/0024-repair-and-restart.md.bak new file mode 100644 index 00000000..7857cf2b --- /dev/null +++ b/proposals/0024-repair-and-restart.md.bak @@ -0,0 +1,303 @@ +--- +simd: '0024' +title: Optimistic cluster restart automation +authors: + - Wen Xu (Solana Labs) +category: Standard +type: Core +status: Draft +created: 2023-04-07 +feature: (fill in with feature tracking issues once accepted) +--- + +## Summary + +During a cluster restart following an outage, use gossip to exchange local +status and automatically reach consensus on the block to restart from. Proceed +to restart if validators in the restart can reach agreement, or print debug +information and halt otherwise. + +## New Terminology + +* "cluster restart": When there is an outage such that the whole cluster +stalls, human may need to restart most of the validators with a sane state so +that the cluster can continue to function. This is different from sporadic +single validator restart which does not impact the cluster. See +[cluster restart](https://docs.solana.com/running-validator/restart-cluster) +for details. + +* "optimistically confirmed block": a block which gets the votes from the +majority of the validators in a cluster (> 2/3 stake). Our algorithm tries to +guarantee that an optimistically confirmed will never be rolled back. When we +are performing cluster restart, we normally start from the highest +optimistically confirmed block, but it's also okay to start from a child of the +highest optimistcially confirmed block as long as consensus can be reached. + +* "silent repair phase": In the new repair and restart plan, the validators in +restart will first spend some time to exchange information, repair missing +blocks, and finally reach consensus. The validators only continue normal block +production and voting after consensus is reached. We call this preparation +phase where block production and voting are paused the silent repair phase. + +* "ephemeral shred version": right now we update `shred_version` during a +cluster restart, it is used to verify received shreds and filter Gossip peers. +In the new repair and restart plan, we introduce a new temporary shred version +in the silent repair phase so validators in restart don't interfere with those +not in restart. Currently this ephemeral shred version is calculated using +`(current_shred_version + 1) % 0xffff`. + +* `RESTART_STAKE_THRESHOLD`: We need enough validators to participate in a +restart so they can make decision for the whole cluster. If everything works +perfect, we only need 2/3 of the total stake. However, validators could die +or perform abnormally, so we currently set the `RESTART_STAKE_THRESHOLD` at +80%, which is the same as now. + +## Motivation + +Currently during a cluster restart, validator operators need to decide the +highest optimistically confirmed slot, then restart the validators with new +commandline arguments. 
+ +The current process involves a lot of human intervention, if people make a +mistake in deciding the highest optimistically confirmed slot, it is +detrimental to the viability of the ecosystem. + +We aim to automate the negotiation of highest optimistically confirmed slot and +the distribution of all blocks on that fork, so that we can lower the +possibility of human mistakes in the cluster restart process. This also reduces +the burden on validator operators, because they don't have to stay around while +the validators automatically try to reach consensus, they will be paged if +things go wrong. + +## Alternatives Considered + +### Automatically detect outage and perform cluster restart +The reaction time of a human in case of emergency is measured in minutes, +while a cluster restart where human initiate validator restarts takes hours. +We consdiered various approaches to automatcially detect outage and perform +cluster restart, which can reduce recovery speed to minutes or even seconds. + +However, automaticlly restarting the whole cluster seems risky. Because +if the recovery process itself doesn't work, it might be some time before +we can get human's attention. And it doesn't solve the cases where new binary +is needed. So for now we still plan to have human in the loop. + +After we gain more experience with the restart apprach in this proposal, we +may slowly try to automate more parts to improve cluster reliability. + +### Use gossip and consensus to figure out restart slot before the restart +The main difference between current proposal and this proposal is that this +proposal will automatically enters restart preparation phase where local +status is exchanged via gossip without human intervention. + +While this improves recovery speed, there are concerns about recovery gossip +messages interfers with normal gossip messages, and automatically start a new +message in gossip seems risky. + +### Automatically reduce block production in an outage +Right now we have vote-only mode, a validator will only pack vote transactions +into new blocks if the tower distance (last_vote - local_root) is greater than +400 slots. + +Unfortunately in the previous outages vote-only mode isn't enough to save the +cluster. There are proposals of more aggressive block production reduction to +save the cluster. For example, a leader could produce only one block in four +consecutive slots allocated to it. + +However, this only solves the problem in specific type of outage, and it seems +risky to aggressively reduce block production, so we are not proceeding with +this proposal. + +## Detailed Design + +The new protocol tries to make all restarting validators get the same +data blocks and the same set of last votes among them, then they will almost +certainly make the same decision on the canonical fork and proceed. + +The steps roughly look like this: + +1. The validator boots into the silent repair phase, it will not make new +blocks or change its votes. The validator propagates its local voted fork to +all other validators in restart. + +2. While counting local vote information from all others in restart, the +validator repairs all blocks which could potentially have been optimistically +confirmed. + +3. After repair is complete, the validator counts votes on each fork and +sends out local heaviest fork. + +4. Each validator counts if enough nodes can agree on one block (same slot and +hash) to restart from: + + 1. If yes, proceed and restart + + 2. 
If no, freeze and print out what it thinks is wrong, wait for human + +A new command line arg will be added. When the cluster is in need +of a restart, we assume validators holding at least `RESTART_STAKE_THRESHOLD` +percentage of stakes will restart with this arg. Any validators +restarted with this arg does not participate in the normal Turbine protocol, +update its vote, or generate new blocks until all of the following steps are +completed. + +### 1. Gossip last vote before the restart and ancestors on that fork + +The main goal of this step is to propagate the locally selected fork to all +others in restart. + +We use a new Gossip message `LastVotedForkSlots`, its fields are: + +- `last_voted_slot`: `u64` the slot last voted, this also serves as last_slot +for the bit vector. +- `last_voted_hash`: `Hash` the bank hash of the slot last voted slot. +- `ancestors`: `BitVec` compressed bit vector representing the slots on +sender's last voted fork. the most significant bit is always +`last_voted_slot`, least significant bit is `last_voted_slot-81000`. + +The number of ancestor slots sent is hard coded at 81000, because that's +400ms * 81000 = 9 hours, we assume most restart decisions to be made in 9 +hours. If a validator restarts after 9 hours past the outage, it cannot join +the restart this way. If enough validators failed to restart within 9 hours, +then fallback to the manual, interactive cluster restart method. + +When a validator enters restart, it uses ephemeral shred version to avoid +interfering with those outside the restart. There is slight chance that +the ephemeral shred version would collide with the shred version after the +silent repair phase, but even if this rare case occured, we plan to flush the +CRDS table on successful restart, so gossip messages used in restart will be +removed. + +### 2. Repair ledgers up to the restart slot + +The main goal of this step is to repair all blocks which could potentially be +optimistically confirmed. So during next step each validator can select its +own heaviest fork. + +We need to prevent false negative at all costs, because we can't rollback an +optimistcially confirmed block. However, false positive is okay. Because when +we select the heaviest fork in the next step, we should see all the potential +candidates for optimistically confirmed slots, there we can count the votes and +remove some false positive cases. + +However, it's also overkill to repair every block presented by others. When +`LastVotedForkSlots` messages are being received and aggregated, a validator +can categorize blocks missing locally into 3 categories: ignored, unsure, +and must-have. Depending on the stakes of validators currently in restart, some +slots with too few stake can be safely ignored, some have enough stake they +should definitely be repaired, and the rest would need more confirmation before +those blocks are worth repairing. + +Assume `RESTART_STAKE_THRESHOLD` is 80% and that 5% restarted validators can +make mistakes in voting. + +When only 5% validators are in restart, everything is in "unsure" category. + +When 67% validators are in restart, any slot with less than +67% - 5% - (100-67%) = 29% is in "ignored" category, because even if all +validators join the restart, the slot will not get 67% stake. Any slot with +more than 33% stake can potentially affect heaviest fork choice, so it is in +"must-have" category. Any slot with between 29% and 33% stake is "unsure". 
+ +When 80% validators are in restart, any slot with less than +67% - 5% - (100-80%) = 42% is in "ignored" category, the rest is "must-have". + +From above examples, we can see the "must-have" threshold changes dynamically +depending on how many validators are in restart. The main benefit is that a +block will only move from "must-have/unsure" to "ignored" as more validators +join the restart, not vice versa. So validators will have an unchanged list of +blocks to repair when >71% validators joined the restart. + +### 3. Gossip current heaviest fork + +We use a new Gossip message `HeaviestFork`, its fields are: + +- `slot`: `u64` slot of the picked block. +- `hash`: `Hash` bank hash of the picked block. +- `received`: `u8` total percentage of stakes of the validators it received +`HeaviestFork` messages from. + +After receiving `LastVotedForkSlots` from the validators holding stake more +than `RESTART_STAKE_THRESHOLD` and repairing slots with "enough" stake, replay +all blocks and pick the heaviest fork as follows: + +1. Pick block and update root for all blocks with more than 67% votes + +2. If a picked block has more than one children, check if the votes on the +heaviest child is over threshold: + +2.1 If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child. +For example, if 80% validators are in restart, child has 42% votes, then +42 + (100-80) = 62%, pick child. 62% is chosen instead of 67% because 5% +could make the wrong votes. + +It's okay to use 62% here because the goal is to prevent false negative rather +than false positive. If validators pick a child of optimistically confirmed +block to start from, it's okay because if 75% of the validators all choose this +block, this block will be instantly confirmed on the chain. While not having +any optimistically confirmed block rolled back is the number one goal here. + +2.2 Otherwise stop traversing the tree and use last picked block. + +After deciding heaviest block, gossip +`HeaviestFork(X, Hash(X), received_heaviest_stake)` out, where X is the latest +picked block. We also send out stake of received `HeaviestFork` messages so +that we can proceed to next step when enough validators are ready. + +### 4. Proceed to restart if everything looks okay, halt otherwise + +All validators in restart keep counting the number of Heaviest where +`received_heaviest_stake` is higher than 80%. Once a validator counts that 80% +of the validators send out Heaviest where `received_heaviest_stake` is higher +than 80%, it starts the following checks: + +- Whether all `HeaviestFork` have the same slot and same block Hash. Because +validators are only sending slots instead of bank hashes in +`LastVotedForkSlots`, it's possible that a duplicate block can make the +cluster unable to reach consensus. So block hash needs to be checked as well. + +- The voted slot is equal or a child of local optimistically confirmed slot. + +If the above check passes, the validator immediately starts generation of +snapshot at the agreed upon slot. + +While the snapshot generation is in progress, the validator also checks to see +2 minutes has passed since agreement has been reached, to guarantee its +`HeaviestFork` message propagates to everyone, then proceeds to restart: + +1. Issue a hard fork at the highest oc slot and change shred version in gossip. +2. Execute the current tasks involved in --wait-for-supermajority and wait for 80%. + +Before a validator enters restart, it will still propagate `LastVotedForkSlots` +and `HeaviestFork` messages in gossip. 
After the restart,its shred_version will +be updated so it will no longer send or propagate gossip messages for restart. + +If any of the checks fails, the validator immediately prints out all debug info, +sends out metrics so that people can be paged, and then halts. + +## Impact + +This proposal adds a new RepairAndRestart mode to validators, during this phase +the validators will not participate in normal cluster activities, which is the +same as now. Compared to today's cluster restart, the new mode may mean more +network bandwidth and memory on the restarting validators, but it guarantees +the safety of optimistically confirmed user transactions, and validator admins +don't need to manually generate and download snapshots again. + +## Security Considerations + +The two added gossip messages `LastVotedForkSlots` and `HeaviestFork` will only +be sent and processed when the validator is restarted in RepairAndRestart mode. +So random validator restarting in the new mode will not bring extra burden to +the system. + +Non-conforming validators could send out wrong `LastVotedForkSlots` and +`HeaviestFork` messages to mess with cluster restarts, these should be included +in the Slashing rules in the future. + +## Backwards Compatibility + +This change is backward compatible with previous versions, because validators +only enter the new mode during new restart mode which is controlled by a +command line argument. All current restart arguments like +--wait-for-supermajority and --expected-bank-hash will be kept as is for now. From 9f013a063f0d683d8cc64196914eb69a306f8b42 Mon Sep 17 00:00:00 2001 From: Wen Date: Mon, 15 May 2023 15:40:10 -0700 Subject: [PATCH 037/119] Remove .bak file. --- proposals/0024-repair-and-restart.md.bak | 303 ----------------------- 1 file changed, 303 deletions(-) delete mode 100644 proposals/0024-repair-and-restart.md.bak diff --git a/proposals/0024-repair-and-restart.md.bak b/proposals/0024-repair-and-restart.md.bak deleted file mode 100644 index 7857cf2b..00000000 --- a/proposals/0024-repair-and-restart.md.bak +++ /dev/null @@ -1,303 +0,0 @@ ---- -simd: '0024' -title: Optimistic cluster restart automation -authors: - - Wen Xu (Solana Labs) -category: Standard -type: Core -status: Draft -created: 2023-04-07 -feature: (fill in with feature tracking issues once accepted) ---- - -## Summary - -During a cluster restart following an outage, use gossip to exchange local -status and automatically reach consensus on the block to restart from. Proceed -to restart if validators in the restart can reach agreement, or print debug -information and halt otherwise. - -## New Terminology - -* "cluster restart": When there is an outage such that the whole cluster -stalls, human may need to restart most of the validators with a sane state so -that the cluster can continue to function. This is different from sporadic -single validator restart which does not impact the cluster. See -[cluster restart](https://docs.solana.com/running-validator/restart-cluster) -for details. - -* "optimistically confirmed block": a block which gets the votes from the -majority of the validators in a cluster (> 2/3 stake). Our algorithm tries to -guarantee that an optimistically confirmed will never be rolled back. When we -are performing cluster restart, we normally start from the highest -optimistically confirmed block, but it's also okay to start from a child of the -highest optimistcially confirmed block as long as consensus can be reached. 
- -* "silent repair phase": In the new repair and restart plan, the validators in -restart will first spend some time to exchange information, repair missing -blocks, and finally reach consensus. The validators only continue normal block -production and voting after consensus is reached. We call this preparation -phase where block production and voting are paused the silent repair phase. - -* "ephemeral shred version": right now we update `shred_version` during a -cluster restart, it is used to verify received shreds and filter Gossip peers. -In the new repair and restart plan, we introduce a new temporary shred version -in the silent repair phase so validators in restart don't interfere with those -not in restart. Currently this ephemeral shred version is calculated using -`(current_shred_version + 1) % 0xffff`. - -* `RESTART_STAKE_THRESHOLD`: We need enough validators to participate in a -restart so they can make decision for the whole cluster. If everything works -perfect, we only need 2/3 of the total stake. However, validators could die -or perform abnormally, so we currently set the `RESTART_STAKE_THRESHOLD` at -80%, which is the same as now. - -## Motivation - -Currently during a cluster restart, validator operators need to decide the -highest optimistically confirmed slot, then restart the validators with new -commandline arguments. - -The current process involves a lot of human intervention, if people make a -mistake in deciding the highest optimistically confirmed slot, it is -detrimental to the viability of the ecosystem. - -We aim to automate the negotiation of highest optimistically confirmed slot and -the distribution of all blocks on that fork, so that we can lower the -possibility of human mistakes in the cluster restart process. This also reduces -the burden on validator operators, because they don't have to stay around while -the validators automatically try to reach consensus, they will be paged if -things go wrong. - -## Alternatives Considered - -### Automatically detect outage and perform cluster restart -The reaction time of a human in case of emergency is measured in minutes, -while a cluster restart where human initiate validator restarts takes hours. -We consdiered various approaches to automatcially detect outage and perform -cluster restart, which can reduce recovery speed to minutes or even seconds. - -However, automaticlly restarting the whole cluster seems risky. Because -if the recovery process itself doesn't work, it might be some time before -we can get human's attention. And it doesn't solve the cases where new binary -is needed. So for now we still plan to have human in the loop. - -After we gain more experience with the restart apprach in this proposal, we -may slowly try to automate more parts to improve cluster reliability. - -### Use gossip and consensus to figure out restart slot before the restart -The main difference between current proposal and this proposal is that this -proposal will automatically enters restart preparation phase where local -status is exchanged via gossip without human intervention. - -While this improves recovery speed, there are concerns about recovery gossip -messages interfers with normal gossip messages, and automatically start a new -message in gossip seems risky. - -### Automatically reduce block production in an outage -Right now we have vote-only mode, a validator will only pack vote transactions -into new blocks if the tower distance (last_vote - local_root) is greater than -400 slots. 
- -Unfortunately in the previous outages vote-only mode isn't enough to save the -cluster. There are proposals of more aggressive block production reduction to -save the cluster. For example, a leader could produce only one block in four -consecutive slots allocated to it. - -However, this only solves the problem in specific type of outage, and it seems -risky to aggressively reduce block production, so we are not proceeding with -this proposal. - -## Detailed Design - -The new protocol tries to make all restarting validators get the same -data blocks and the same set of last votes among them, then they will almost -certainly make the same decision on the canonical fork and proceed. - -The steps roughly look like this: - -1. The validator boots into the silent repair phase, it will not make new -blocks or change its votes. The validator propagates its local voted fork to -all other validators in restart. - -2. While counting local vote information from all others in restart, the -validator repairs all blocks which could potentially have been optimistically -confirmed. - -3. After repair is complete, the validator counts votes on each fork and -sends out local heaviest fork. - -4. Each validator counts if enough nodes can agree on one block (same slot and -hash) to restart from: - - 1. If yes, proceed and restart - - 2. If no, freeze and print out what it thinks is wrong, wait for human - -A new command line arg will be added. When the cluster is in need -of a restart, we assume validators holding at least `RESTART_STAKE_THRESHOLD` -percentage of stakes will restart with this arg. Any validators -restarted with this arg does not participate in the normal Turbine protocol, -update its vote, or generate new blocks until all of the following steps are -completed. - -### 1. Gossip last vote before the restart and ancestors on that fork - -The main goal of this step is to propagate the locally selected fork to all -others in restart. - -We use a new Gossip message `LastVotedForkSlots`, its fields are: - -- `last_voted_slot`: `u64` the slot last voted, this also serves as last_slot -for the bit vector. -- `last_voted_hash`: `Hash` the bank hash of the slot last voted slot. -- `ancestors`: `BitVec` compressed bit vector representing the slots on -sender's last voted fork. the most significant bit is always -`last_voted_slot`, least significant bit is `last_voted_slot-81000`. - -The number of ancestor slots sent is hard coded at 81000, because that's -400ms * 81000 = 9 hours, we assume most restart decisions to be made in 9 -hours. If a validator restarts after 9 hours past the outage, it cannot join -the restart this way. If enough validators failed to restart within 9 hours, -then fallback to the manual, interactive cluster restart method. - -When a validator enters restart, it uses ephemeral shred version to avoid -interfering with those outside the restart. There is slight chance that -the ephemeral shred version would collide with the shred version after the -silent repair phase, but even if this rare case occured, we plan to flush the -CRDS table on successful restart, so gossip messages used in restart will be -removed. - -### 2. Repair ledgers up to the restart slot - -The main goal of this step is to repair all blocks which could potentially be -optimistically confirmed. So during next step each validator can select its -own heaviest fork. - -We need to prevent false negative at all costs, because we can't rollback an -optimistcially confirmed block. However, false positive is okay. 
Because when -we select the heaviest fork in the next step, we should see all the potential -candidates for optimistically confirmed slots, there we can count the votes and -remove some false positive cases. - -However, it's also overkill to repair every block presented by others. When -`LastVotedForkSlots` messages are being received and aggregated, a validator -can categorize blocks missing locally into 3 categories: ignored, unsure, -and must-have. Depending on the stakes of validators currently in restart, some -slots with too few stake can be safely ignored, some have enough stake they -should definitely be repaired, and the rest would need more confirmation before -those blocks are worth repairing. - -Assume `RESTART_STAKE_THRESHOLD` is 80% and that 5% restarted validators can -make mistakes in voting. - -When only 5% validators are in restart, everything is in "unsure" category. - -When 67% validators are in restart, any slot with less than -67% - 5% - (100-67%) = 29% is in "ignored" category, because even if all -validators join the restart, the slot will not get 67% stake. Any slot with -more than 33% stake can potentially affect heaviest fork choice, so it is in -"must-have" category. Any slot with between 29% and 33% stake is "unsure". - -When 80% validators are in restart, any slot with less than -67% - 5% - (100-80%) = 42% is in "ignored" category, the rest is "must-have". - -From above examples, we can see the "must-have" threshold changes dynamically -depending on how many validators are in restart. The main benefit is that a -block will only move from "must-have/unsure" to "ignored" as more validators -join the restart, not vice versa. So validators will have an unchanged list of -blocks to repair when >71% validators joined the restart. - -### 3. Gossip current heaviest fork - -We use a new Gossip message `HeaviestFork`, its fields are: - -- `slot`: `u64` slot of the picked block. -- `hash`: `Hash` bank hash of the picked block. -- `received`: `u8` total percentage of stakes of the validators it received -`HeaviestFork` messages from. - -After receiving `LastVotedForkSlots` from the validators holding stake more -than `RESTART_STAKE_THRESHOLD` and repairing slots with "enough" stake, replay -all blocks and pick the heaviest fork as follows: - -1. Pick block and update root for all blocks with more than 67% votes - -2. If a picked block has more than one children, check if the votes on the -heaviest child is over threshold: - -2.1 If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child. -For example, if 80% validators are in restart, child has 42% votes, then -42 + (100-80) = 62%, pick child. 62% is chosen instead of 67% because 5% -could make the wrong votes. - -It's okay to use 62% here because the goal is to prevent false negative rather -than false positive. If validators pick a child of optimistically confirmed -block to start from, it's okay because if 75% of the validators all choose this -block, this block will be instantly confirmed on the chain. While not having -any optimistically confirmed block rolled back is the number one goal here. - -2.2 Otherwise stop traversing the tree and use last picked block. - -After deciding heaviest block, gossip -`HeaviestFork(X, Hash(X), received_heaviest_stake)` out, where X is the latest -picked block. We also send out stake of received `HeaviestFork` messages so -that we can proceed to next step when enough validators are ready. - -### 4. 
Proceed to restart if everything looks okay, halt otherwise - -All validators in restart keep counting the number of Heaviest where -`received_heaviest_stake` is higher than 80%. Once a validator counts that 80% -of the validators send out Heaviest where `received_heaviest_stake` is higher -than 80%, it starts the following checks: - -- Whether all `HeaviestFork` have the same slot and same block Hash. Because -validators are only sending slots instead of bank hashes in -`LastVotedForkSlots`, it's possible that a duplicate block can make the -cluster unable to reach consensus. So block hash needs to be checked as well. - -- The voted slot is equal or a child of local optimistically confirmed slot. - -If the above check passes, the validator immediately starts generation of -snapshot at the agreed upon slot. - -While the snapshot generation is in progress, the validator also checks to see -2 minutes has passed since agreement has been reached, to guarantee its -`HeaviestFork` message propagates to everyone, then proceeds to restart: - -1. Issue a hard fork at the highest oc slot and change shred version in gossip. -2. Execute the current tasks involved in --wait-for-supermajority and wait for 80%. - -Before a validator enters restart, it will still propagate `LastVotedForkSlots` -and `HeaviestFork` messages in gossip. After the restart,its shred_version will -be updated so it will no longer send or propagate gossip messages for restart. - -If any of the checks fails, the validator immediately prints out all debug info, -sends out metrics so that people can be paged, and then halts. - -## Impact - -This proposal adds a new RepairAndRestart mode to validators, during this phase -the validators will not participate in normal cluster activities, which is the -same as now. Compared to today's cluster restart, the new mode may mean more -network bandwidth and memory on the restarting validators, but it guarantees -the safety of optimistically confirmed user transactions, and validator admins -don't need to manually generate and download snapshots again. - -## Security Considerations - -The two added gossip messages `LastVotedForkSlots` and `HeaviestFork` will only -be sent and processed when the validator is restarted in RepairAndRestart mode. -So random validator restarting in the new mode will not bring extra burden to -the system. - -Non-conforming validators could send out wrong `LastVotedForkSlots` and -`HeaviestFork` messages to mess with cluster restarts, these should be included -in the Slashing rules in the future. - -## Backwards Compatibility - -This change is backward compatible with previous versions, because validators -only enter the new mode during new restart mode which is controlled by a -command line argument. All current restart arguments like ---wait-for-supermajority and --expected-bank-hash will be kept as is for now. 
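The "ignored" cut-off used when categorizing missing blocks during repair (see the "ignored" / "unsure" / "must-have" discussion in the proposal text above) reduces to simple arithmetic. The sketch below is illustrative only; the function name and the use of integer percentages are assumptions for exposition, not actual validator code.

```rust
/// Illustrative only: the stake percentage below which a locally missing slot
/// can be ignored during the silent repair phase, following the formula
/// 67% - 5% - (100% - percentage_of_stake_in_restart).
fn ignore_threshold_pct(stake_in_restart_pct: i64) -> i64 {
    67 - 5 - (100 - stake_in_restart_pct)
}

fn main() {
    // Worked examples from the proposal text:
    // with 67% of stake in restart, slots under 29% can be ignored;
    // with 80% of stake in restart, slots under 42% can be ignored.
    assert_eq!(ignore_threshold_pct(67), 29);
    assert_eq!(ignore_threshold_pct(80), 42);
}
```

Because this cut-off only rises as more stake joins the restart, a slot can move from "must-have" or "unsure" into "ignored" but never the other way around, which matches the observation above that the set of blocks to repair does not grow once enough validators have joined.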
From 5420a176f4734e88022eebae2d7d956fcc6a970e Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Mon, 15 May 2023 18:25:53 -0700 Subject: [PATCH 038/119] Update proposals/0024-repair-and-restart.md Co-authored-by: mvines --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index b2e89747..fdbe149a 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -33,7 +33,7 @@ are performing cluster restart, we normally start from the highest optimistically confirmed block, but it's also okay to start from a child of the highest optimistically confirmed block as long as consensus can be reached. -* "silent repair phase": In the new repair and restart plan, the validators in +* "silent repair phase": During the proposed optimistic cluster restart automation process, the validators in restart will first spend some time to exchange information, repair missing blocks, and finally reach consensus. The validators only continue normal block production and voting after consensus is reached. We call this preparation From 323ee802dc67762a4be3858f502eaecbc06e28e4 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Mon, 15 May 2023 18:26:26 -0700 Subject: [PATCH 039/119] Update proposals/0024-repair-and-restart.md Co-authored-by: mvines --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index fdbe149a..a0c9cc73 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -119,7 +119,7 @@ of a restart, we assume validators holding at least `RESTART_STAKE_THRESHOLD` percentage of stakes will restart with this arg. Then the following steps will happen: -1. The validator boots into the silent repair phase, it will not make new +1. The operator restarts the validator with a new command-line argument to cause it to enter the silent repair phase at boot, where it will not make new blocks or change its votes. The validator propagates its local voted fork information to all other validators in restart. From 3b3ef2dfabf5952cf316093f2bb8c429ae9374aa Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Mon, 15 May 2023 18:26:54 -0700 Subject: [PATCH 040/119] Update proposals/0024-repair-and-restart.md Co-authored-by: mvines --- proposals/0024-repair-and-restart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index a0c9cc73..f692ec1c 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -120,7 +120,7 @@ percentage of stakes will restart with this arg. Then the following steps will happen: 1. The operator restarts the validator with a new command-line argument to cause it to enter the silent repair phase at boot, where it will not make new -blocks or change its votes. The validator propagates its local voted fork +blocks or vote. The validator propagates its local voted fork information to all other validators in restart. 2. 
While counting local vote information from all others in restart, the From 87813b825c2d3ef7066a6c582a6d76e40364956b Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 16 May 2023 10:52:41 -0700 Subject: [PATCH 041/119] Fix a few wordings. --- proposals/0024-repair-and-restart.md | 55 ++++++++++++++++------------ 1 file changed, 31 insertions(+), 24 deletions(-) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0024-repair-and-restart.md index f692ec1c..e4e7268c 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0024-repair-and-restart.md @@ -33,18 +33,19 @@ are performing cluster restart, we normally start from the highest optimistically confirmed block, but it's also okay to start from a child of the highest optimistically confirmed block as long as consensus can be reached. -* "silent repair phase": During the proposed optimistic cluster restart automation process, the validators in -restart will first spend some time to exchange information, repair missing -blocks, and finally reach consensus. The validators only continue normal block -production and voting after consensus is reached. We call this preparation -phase where block production and voting are paused the silent repair phase. - -* "ephemeral shred version": right now we update `shred_version` during a +* "silent repair phase": During the proposed optimistic cluster restart +automation process, the validators in restart will first spend some time to +exchange information, repair missing blocks, and finally reach consensus. The +validators only continue normal block production and voting after consensus is +reached. We call this preparation phase where block production and voting are +paused the silent repair phase. + +* "silent repair shred version": right now we update `shred_version` during a cluster restart, it is used to verify received shreds and filter Gossip peers. -In the new repair and restart plan, we introduce a new temporary shred version -in the silent repair phase so validators in restart don't interfere with those -not in restart. Currently this ephemeral shred version is calculated using -`(current_shred_version + 1) % 0xffff`. +In the proposed optimistic cluster restart plan, we introduce a new temporary +shred version in the silent repair phase so validators in restart don't +interfere with those not in restart. Currently this silent repair shred version +is calculated using `(current_shred_version + 1) % 0xffff`. * `RESTART_STAKE_THRESHOLD`: We need enough validators to participate in a restart so they can make decision for the whole cluster. If everything works @@ -66,8 +67,9 @@ We aim to automate the negotiation of highest optimistically confirmed slot and the distribution of all blocks on that fork, so that we can lower the possibility of human mistakes in the cluster restart process. This also reduces the burden on validator operators, because they don't have to stay around while -the validators automatically try to reach consensus, they will be paged if -things go wrong. +the validators automatically try to reach consensus, the validator will halt +and print debug information if anything goes wrong, and operators can set up +their own monitoring accordingly. ## Alternatives Considered @@ -86,9 +88,9 @@ After we gain more experience with the restart approach in this proposal, we may slowly try to automate more parts to improve cluster reliability. 
### Use gossip and consensus to figure out restart slot before the restart -The main difference between current proposal and this proposal is that this -proposal will automatically enter restart preparation phase without human -intervention. +The main difference between this and the current proposal is this alternative +tries to make the cluster automatically enter restart preparation phase without +human intervention. While getting human out of the loop improves recovery speed, there are concerns about recovery gossip messages interfering with normal gossip messages, and @@ -159,13 +161,16 @@ hours. If a validator restarts after 9 hours past the outage, it cannot join the restart this way. If enough validators failed to restart within 9 hours, then fallback to the manual, interactive cluster restart method. -When a validator enters restart, it uses ephemeral shred version to avoid +When a validator enters restart, it uses silent repair shred version to avoid interfering with those outside the restart. There is slight chance that -the ephemeral shred version would collide with the shred version after the +the silent repair shred version would collide with the shred version after the silent repair phase, but even if this rare case occurred, we plan to flush the CRDS table on successful restart, so gossip messages used in restart will be removed. +To be extra cautious, we will also filter out `LastVotedForkSlots` and +`HeaviestFork` in gossip if a validator is not in silent repair phase. + ### 2. Repair ledgers up to the restart slot The main goal of this step is to repair all blocks which could potentially be @@ -186,7 +191,8 @@ should definitely be repaired, and the rest would be undecided pending more confirmations. Assume `RESTART_STAKE_THRESHOLD` is 80% and that 5% restarted validators can -make mistakes in voting. +change their votes from what they voted before the restart due to mistakes or +malicious behavior. When only 5% validators are in restart, everything is in "unsure" category. @@ -280,15 +286,16 @@ This proposal adds a new silent repair mode to validators, during this phase the validators will not participate in normal cluster activities, which is the same as now. Compared to today's cluster restart, the new mode may mean more network bandwidth and memory on the restarting validators, but it guarantees -the safety of optimistically confirmed user transactions, and validator admins -don't need to manually generate and download snapshots again. +the safety of optimistically confirmed user transactions, and validator +operators don't need to manually generate and download snapshots again. ## Security Considerations The two added gossip messages `LastVotedForkSlots` and `HeaviestFork` will only -be sent and processed when the validator is restarted in RepairAndRestart mode. -So random validator restarting in the new mode will not bring extra burden to -the system. +be sent and processed when the validator is restarted in the new proposed +optimistic cluster restart mode. They will also be filtered out if a validator +is not in this mode. So random validator restarting in the new mode will not +bring extra burden to the system. Non-conforming validators could send out wrong `LastVotedForkSlots` and `HeaviestFork` messages to mess with cluster restarts, these should be included From 5b69c78c903670c595edbd3e98e1569aefa2a57a Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 16 May 2023 13:36:28 -0700 Subject: [PATCH 042/119] This proposal is actually proposal 46. 
--- ...restart.md => 0046-optimistic-cluster-restart-automation.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename proposals/{0024-repair-and-restart.md => 0046-optimistic-cluster-restart-automation.md} (99%) diff --git a/proposals/0024-repair-and-restart.md b/proposals/0046-optimistic-cluster-restart-automation.md similarity index 99% rename from proposals/0024-repair-and-restart.md rename to proposals/0046-optimistic-cluster-restart-automation.md index e4e7268c..7e0bbc22 100644 --- a/proposals/0024-repair-and-restart.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -1,5 +1,5 @@ --- -simd: '0024' +simd: '0046' title: Optimistic cluster restart automation authors: - Wen Xu (Solana Labs) From fac8526cfe0efcfad78b571c8f2b44c88c6de644 Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 16 May 2023 14:33:58 -0700 Subject: [PATCH 043/119] Make linter happy. --- ...6-optimistic-cluster-restart-automation.md | 30 +++++++++++-------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 7e0bbc22..c4e19707 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -74,6 +74,7 @@ their own monitoring accordingly. ## Alternatives Considered ### Automatically detect outage and perform cluster restart + The reaction time of a human in case of emergency is measured in minutes, while a cluster restart where human initiate validator restarts takes hours. We considered various approaches to automatically detect outage and perform @@ -88,6 +89,7 @@ After we gain more experience with the restart approach in this proposal, we may slowly try to automate more parts to improve cluster reliability. ### Use gossip and consensus to figure out restart slot before the restart + The main difference between this and the current proposal is this alternative tries to make the cluster automatically enter restart preparation phase without human intervention. @@ -97,6 +99,7 @@ about recovery gossip messages interfering with normal gossip messages, and automatically start a new message in gossip seems risky. ### Automatically reduce block production in an outage + Right now we have vote-only mode, a validator will only pack vote transactions into new blocks if the tower distance (last_vote - local_root) is greater than 400 slots. @@ -121,7 +124,8 @@ of a restart, we assume validators holding at least `RESTART_STAKE_THRESHOLD` percentage of stakes will restart with this arg. Then the following steps will happen: -1. The operator restarts the validator with a new command-line argument to cause it to enter the silent repair phase at boot, where it will not make new +1. The operator restarts the validator with a new command-line argument to +cause it to enter the silent repair phase at boot, where it will not make new blocks or vote. The validator propagates its local voted fork information to all other validators in restart. @@ -135,9 +139,9 @@ sends out local heaviest fork. 4. Each validator counts if enough nodes can agree on one block (same slot and hash) to restart from: - 1. If yes, proceed and restart + 1. If yes, proceed and restart - 2. If no, print out what it thinks is wrong, halt and wait for human + 2. If no, print out what it thinks is wrong, halt and wait for human See each step explained in details below. @@ -148,10 +152,10 @@ others in restart. 
We use a new Gossip message `LastVotedForkSlots`, its fields are: -- `last_voted_slot`: `u64` the slot last voted, this also serves as last_slot +* `last_voted_slot`: `u64` the slot last voted, this also serves as last_slot for the bit vector. -- `last_voted_hash`: `Hash` the bank hash of the slot last voted slot. -- `ancestors`: `BitVec` compressed bit vector representing the slots on +* `last_voted_hash`: `Hash` the bank hash of the slot last voted slot. +* `ancestors`: `BitVec` compressed bit vector representing the slots on sender's last voted fork. the most significant bit is always `last_voted_slot`, least significant bit is `last_voted_slot-81000`. @@ -218,9 +222,9 @@ The main goal of this step is to "vote" the heaviest fork to restart from. We use a new Gossip message `HeaviestFork`, its fields are: -- `slot`: `u64` slot of the picked block. -- `hash`: `Hash` bank hash of the picked block. -- `received`: `u8` total percentage of stakes of the validators it received +* `slot`: `u64` slot of the picked block. +* `hash`: `Hash` bank hash of the picked block. +* `received`: `u8` total percentage of stakes of the validators it received `HeaviestFork` messages from. After receiving `LastVotedForkSlots` from the validators holding stake more @@ -232,7 +236,7 @@ replay all blocks and pick the heaviest fork as follows: 2. If a picked block has more than one children, check if the votes on the heaviest child is over threshold: - 1. If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child. + 1. If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child. For example, if 80% validators are in restart, child has 42% votes, then 42 + (100-80) = 62%, pick child. 62% is chosen instead of 67% because 5% could make the wrong votes. @@ -242,7 +246,7 @@ than false positive. If validators pick a child of optimistically confirmed block to start from, it's okay because if 80% of the validators all choose this block, this block will be instantly confirmed on the chain. - 2. Otherwise stop traversing the tree and use last picked block. + 2. Otherwise stop traversing the tree and use last picked block. After deciding heaviest block, gossip `HeaviestFork(X, Hash(X), received_heaviest_stake)` out, where X is the latest @@ -256,12 +260,12 @@ All validators in restart keep counting the number of `HeaviestFork` where of the validators send out `HeaviestFork` where `received_heaviest_stake` is higher than 80%, it starts the following checks: -- Whether all `HeaviestFork` have the same slot and same block Hash. Because +* Whether all `HeaviestFork` have the same slot and same block Hash. Because validators are only sending slots instead of bank hashes in `LastVotedForkSlots`, it's possible that a duplicate block can make the cluster unable to reach consensus. So block hash needs to be checked as well. -- The voted slot is equal or a child of local optimistically confirmed slot. +* The voted slot is equal or a child of local optimistically confirmed slot. If all checks pass, the validator immediately starts generation of snapshot at the agreed upon slot. From 351e675993a528b6acb27c3c24134b3d59b79e52 Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 18 May 2023 16:53:01 -0700 Subject: [PATCH 044/119] Fixes. 
--- ...6-optimistic-cluster-restart-automation.md | 52 ++++++++++--------- 1 file changed, 28 insertions(+), 24 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index c4e19707..e3e28d6e 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -12,10 +12,11 @@ feature: (fill in with feature tracking issues once accepted) ## Summary -During a cluster restart following an outage, use gossip to exchange local -status and automatically reach consensus on the block to restart from. Proceed -to restart if validators in the restart can reach agreement, or print debug -information and halt otherwise. +During a cluster restart following an outage, make validators enter a separate +recovery protocol that uses gossip to exchange local status and automatically +reach consensus on the block to restart from. Proceed to restart if validators +in the restart can reach agreement, or print debug information and halt +otherwise. ## New Terminology @@ -38,12 +39,12 @@ automation process, the validators in restart will first spend some time to exchange information, repair missing blocks, and finally reach consensus. The validators only continue normal block production and voting after consensus is reached. We call this preparation phase where block production and voting are -paused the silent repair phase. +paused the "silent repair phase". * "silent repair shred version": right now we update `shred_version` during a cluster restart, it is used to verify received shreds and filter Gossip peers. In the proposed optimistic cluster restart plan, we introduce a new temporary -shred version in the silent repair phase so validators in restart don't +shred version in the "silent repair phase" so validators in restart don't interfere with those not in restart. Currently this silent repair shred version is calculated using `(current_shred_version + 1) % 0xffff`. @@ -90,13 +91,13 @@ may slowly try to automate more parts to improve cluster reliability. ### Use gossip and consensus to figure out restart slot before the restart -The main difference between this and the current proposal is this alternative -tries to make the cluster automatically enter restart preparation phase without -human intervention. +The main difference between this and the current restart proposal is this +alternative tries to make the cluster automatically enter restart preparation +phase without human intervention. -While getting human out of the loop improves recovery speed, there are concerns -about recovery gossip messages interfering with normal gossip messages, and -automatically start a new message in gossip seems risky. +While getting humans out of the loop improves recovery speed, there are +concerns about recovery gossip messages interfering with normal gossip +messages, and automatically start a new message in gossip seems risky. ### Automatically reduce block production in an outage @@ -116,8 +117,8 @@ this proposal for now. ## Detailed Design The new protocol tries to make all restarting validators get the same data -blocks and the same set of last votes, then they will almost certainly make the -same decision on the canonical fork and proceed. +blocks and the same set of last votes, so that they will with high probability +converge on the same canonical fork and proceed. A new command line arg will be added. 
When the cluster is in need of a restart, we assume validators holding at least `RESTART_STAKE_THRESHOLD` @@ -125,11 +126,11 @@ percentage of stakes will restart with this arg. Then the following steps will happen: 1. The operator restarts the validator with a new command-line argument to -cause it to enter the silent repair phase at boot, where it will not make new +cause it to enter the "silent repair phase" at boot, where it will not make new blocks or vote. The validator propagates its local voted fork information to all other validators in restart. -2. While counting local vote information from all others in restart, the +2. While aggregating local vote information from all others in restart, the validator repairs all blocks which could potentially have been optimistically confirmed. @@ -147,8 +148,8 @@ See each step explained in details below. ### 1. Gossip last vote before the restart and ancestors on that fork -The main goal of this step is to propagate the locally selected fork to all -others in restart. +The main goal of this step is to propagate the last `n` ancestors of the last +voted fork to all others in restart. We use a new Gossip message `LastVotedForkSlots`, its fields are: @@ -168,12 +169,11 @@ then fallback to the manual, interactive cluster restart method. When a validator enters restart, it uses silent repair shred version to avoid interfering with those outside the restart. There is slight chance that the silent repair shred version would collide with the shred version after the -silent repair phase, but even if this rare case occurred, we plan to flush the -CRDS table on successful restart, so gossip messages used in restart will be -removed. +"silent repair phase", but even if this rare case occurred, we plan to flush +gossip on successful restart before entering normal validator operation. To be extra cautious, we will also filter out `LastVotedForkSlots` and -`HeaviestFork` in gossip if a validator is not in silent repair phase. +`HeaviestFork` in gossip if a validator is not in "silent repair phase". ### 2. Repair ledgers up to the restart slot @@ -216,6 +216,10 @@ block will only move from "must-have/unsure" to "ignored" as more validators join the restart, not vice versa. So the list of blocks a validator needs to repair will never grow bigger when more validators join the restart. +Once the validator gets LastVotedForkSlots, it can draw a line which are the +"must-have" blocks. When all the "must-have" blocks are repaired and replayed, +it can proceed to step 3. + ### 3. Gossip current heaviest fork The main goal of this step is to "vote" the heaviest fork to restart from. @@ -260,10 +264,10 @@ All validators in restart keep counting the number of `HeaviestFork` where of the validators send out `HeaviestFork` where `received_heaviest_stake` is higher than 80%, it starts the following checks: -* Whether all `HeaviestFork` have the same slot and same block Hash. Because +* Whether all `HeaviestFork` have the same slot and same bank Hash. Because validators are only sending slots instead of bank hashes in `LastVotedForkSlots`, it's possible that a duplicate block can make the -cluster unable to reach consensus. So block hash needs to be checked as well. +cluster unable to reach consensus. So bank hash needs to be checked as well. * The voted slot is equal or a child of local optimistically confirmed slot. 
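The heaviest fork selection in step 3 (every block with more than 67% of the votes stays on the picked fork; at a branch, descend into the heaviest child only while vote_on_child + stake_on_validators_not_in_restart >= 62%) can be sketched as a simple walk down the repaired block tree. The `Block` type, its fields, and the traversal below are illustrative assumptions, not the actual implementation.

```rust
/// Illustrative only: a block in the locally repaired and replayed tree,
/// annotated with the percentage of total cluster stake observed voting for it
/// in the aggregated `LastVotedForkSlots` messages.
struct Block {
    slot: u64,
    vote_stake_pct: u64,
    children: Vec<Block>,
}

/// Walk down from the local root and return the slot to advertise in
/// `HeaviestFork`. A child is picked while
/// vote_on_child + stake_on_validators_not_in_restart >= 62%; a block with
/// more than 67% of the votes always satisfies this, so in this sketch the
/// same check covers both rules of step 3.
fn pick_heaviest_slot(root: &Block, stake_in_restart_pct: u64) -> u64 {
    let stake_not_in_restart = 100u64.saturating_sub(stake_in_restart_pct);
    let mut picked = root;
    loop {
        match picked.children.iter().max_by_key(|c| c.vote_stake_pct) {
            Some(child) if child.vote_stake_pct + stake_not_in_restart >= 62 => picked = child,
            // Otherwise stop traversing the tree and use the last picked block.
            _ => return picked.slot,
        }
    }
}

fn main() {
    // Toy tree: root -> one child with 45% of the stake voting for it,
    // while 80% of the stake is in restart: 45 + (100 - 80) = 65 >= 62.
    let tree = Block {
        slot: 100,
        vote_stake_pct: 90,
        children: vec![Block { slot: 101, vote_stake_pct: 45, children: vec![] }],
    };
    assert_eq!(pick_heaviest_slot(&tree, 80), 101);
}
```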
From b130ee3c88e98321289dc33898e664d06220cb48 Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 18 May 2023 17:12:41 -0700 Subject: [PATCH 045/119] Add description of when to enter next step. --- .../0046-optimistic-cluster-restart-automation.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index e3e28d6e..a1df566d 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -146,7 +146,7 @@ hash) to restart from: See each step explained in details below. -### 1. Gossip last vote before the restart and ancestors on that fork +### 1. "silent repair phase": Gossip last vote before the restart and ancestors on that fork The main goal of this step is to propagate the last `n` ancestors of the last voted fork to all others in restart. @@ -175,7 +175,11 @@ gossip on successful restart before entering normal validator operation. To be extra cautious, we will also filter out `LastVotedForkSlots` and `HeaviestFork` in gossip if a validator is not in "silent repair phase". -### 2. Repair ledgers up to the restart slot +`LastVotedForkSlots` message will be written into gossip and then be +distributed to all others in restart. Meanwhile the validator can immediately +enter next step. + +### 2. "silent repair phase": Repair ledgers up to the restart slot The main goal of this step is to repair all blocks which could potentially be optimistically confirmed. @@ -220,7 +224,7 @@ Once the validator gets LastVotedForkSlots, it can draw a line which are the "must-have" blocks. When all the "must-have" blocks are repaired and replayed, it can proceed to step 3. -### 3. Gossip current heaviest fork +### 3. "silent repair phase": Gossip current heaviest fork The main goal of this step is to "vote" the heaviest fork to restart from. @@ -257,7 +261,7 @@ After deciding heaviest block, gossip picked block. We also send out stake of received `HeaviestFork` messages so that we can proceed to next step when enough validators are ready. -### 4. Proceed to restart if everything looks okay, halt otherwise +### 4. Exit "silent repair phase": Proceed to restart if everything looks okay, halt otherwise All validators in restart keep counting the number of `HeaviestFork` where `received_heaviest_stake` is higher than 80%. Once a validator counts that 80% From 3234699451738118042b22942018a8db01d9dc27 Mon Sep 17 00:00:00 2001 From: Wen Date: Fri, 19 May 2023 14:41:29 -0700 Subject: [PATCH 046/119] Make linter happy. --- proposals/0046-optimistic-cluster-restart-automation.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index a1df566d..c72c29b6 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -146,7 +146,8 @@ hash) to restart from: See each step explained in details below. -### 1. "silent repair phase": Gossip last vote before the restart and ancestors on that fork +### 1. "silent repair phase": Gossip last vote before the restart and ancestors +on that fork The main goal of this step is to propagate the last `n` ancestors of the last voted fork to all others in restart. @@ -261,7 +262,8 @@ After deciding heaviest block, gossip picked block. 
We also send out stake of received `HeaviestFork` messages so that we can proceed to next step when enough validators are ready. -### 4. Exit "silent repair phase": Proceed to restart if everything looks okay, halt otherwise +### 4. Exit "silent repair phase": Proceed to restart if everything looks okay, +halt otherwise All validators in restart keep counting the number of `HeaviestFork` where `received_heaviest_stake` is higher than 80%. Once a validator counts that 80% From 76f63bb75fe8d094e18a006a0dc60f8750185e7a Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 6 Jun 2023 15:56:48 -0700 Subject: [PATCH 047/119] Make linter happy. --- proposals/0046-optimistic-cluster-restart-automation.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index c72c29b6..6eef4798 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -146,8 +146,7 @@ hash) to restart from: See each step explained in details below. -### 1. "silent repair phase": Gossip last vote before the restart and ancestors -on that fork +### 1. "silent repair phase": Gossip last vote and ancestors on that fork The main goal of this step is to propagate the last `n` ancestors of the last voted fork to all others in restart. @@ -262,8 +261,7 @@ After deciding heaviest block, gossip picked block. We also send out stake of received `HeaviestFork` messages so that we can proceed to next step when enough validators are ready. -### 4. Exit "silent repair phase": Proceed to restart if everything looks okay, -halt otherwise +### 4. Exit "silent repair phase": Restart if everything okay, halt otherwise All validators in restart keep counting the number of `HeaviestFork` where `received_heaviest_stake` is higher than 80%. Once a validator counts that 80% From 18b3d87bab7d508fd5994ccf1b875a758831fdeb Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Mon, 17 Jul 2023 16:26:45 -0700 Subject: [PATCH 048/119] Update proposals/0046-optimistic-cluster-restart-automation.md Co-authored-by: Trent Nelson --- proposals/0046-optimistic-cluster-restart-automation.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 6eef4798..a59aec7c 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -105,10 +105,10 @@ Right now we have vote-only mode, a validator will only pack vote transactions into new blocks if the tower distance (last_vote - local_root) is greater than 400 slots. -Unfortunately in the previous outages vote-only mode isn't enough to save the -cluster. There are proposals of more aggressive block production reduction to -save the cluster. For example, a leader could produce only one block in four -consecutive slots allocated to it. +Unfortunately as previous outages demonstrate, vote-only mode isn't enough to +save the cluster in all situations. There are proposals of more aggressive block +production reduction to save the cluster. For example, a leader could produce +only one block in four consecutive slots allocated to it. 
However, this only solves the problem in specific type of outage, and it seems risky to aggressively reduce block production, so we are not proceeding with From 813c2cfddea98f5b48d98bf202990380813eb48f Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 18 Jul 2023 11:08:17 -0700 Subject: [PATCH 049/119] Try indent some paragraphs. --- ...46-optimistic-cluster-restart-automation.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 6eef4798..22475299 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -241,18 +241,18 @@ replay all blocks and pick the heaviest fork as follows: 1. For all blocks with more than 67% votes, they must be on picked fork. -2. If a picked block has more than one children, check if the votes on the +2. If a picked block has more than one child, check if the votes on the heaviest child is over threshold: 1. If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child. -For example, if 80% validators are in restart, child has 42% votes, then -42 + (100-80) = 62%, pick child. 62% is chosen instead of 67% because 5% -could make the wrong votes. - -It's okay to use 62% here because the goal is to prevent false negative rather -than false positive. If validators pick a child of optimistically confirmed -block to start from, it's okay because if 80% of the validators all choose this -block, this block will be instantly confirmed on the chain. + For example, if 80% validators are in restart, child has 42% votes, then + 42 + (100-80) = 62%, pick child. 62% is chosen instead of 67% because 5% + could make the wrong votes. + + It's okay to use 62% here because the goal is to prevent false negative + rather than false positive. If validators pick a child of optimistically + confirmed block to start from, it's okay because if 80% of the validators + all choose this block, this block will be instantly confirmed on the chain. 2. Otherwise stop traversing the tree and use last picked block. From 3fd02f182c19d992f485333507c32b963a5bda9e Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 18 Jul 2023 14:11:16 -0700 Subject: [PATCH 050/119] Backtick all new terminologies. --- ...6-optimistic-cluster-restart-automation.md | 72 +++++++++---------- 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 78d6cb9f..14dc6691 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -20,32 +20,32 @@ otherwise. ## New Terminology -* "cluster restart": When there is an outage such that the whole cluster +* `cluster restart`: When there is an outage such that the whole cluster stalls, human may need to restart most of the validators with a sane state so that the cluster can continue to function. This is different from sporadic single validator restart which does not impact the cluster. See -[cluster restart](https://docs.solana.com/running-validator/restart-cluster) +[`cluster restart`](https://docs.solana.com/running-validator/restart-cluster) for details. -* "optimistically confirmed block": a block which gets the votes from the +* `optimistically confirmed block`: a block which gets the votes from the majority of the validators in a cluster (> 2/3 stake). 
Our algorithm tries to guarantee that an optimistically confirmed will never be rolled back. When we -are performing cluster restart, we normally start from the highest -optimistically confirmed block, but it's also okay to start from a child of the -highest optimistically confirmed block as long as consensus can be reached. +are performing `cluster restart`, we normally start from the highest +`optimistically confirmed block`, but it's also okay to start from a child of the +highest `optimistically confirmed block` as long as consensus can be reached. -* "silent repair phase": During the proposed optimistic cluster restart +* `silent repair phase`: During the proposed optimistic `cluster restart` automation process, the validators in restart will first spend some time to exchange information, repair missing blocks, and finally reach consensus. The validators only continue normal block production and voting after consensus is reached. We call this preparation phase where block production and voting are -paused the "silent repair phase". +paused the `silent repair phase`. -* "silent repair shred version": right now we update `shred_version` during a -cluster restart, it is used to verify received shreds and filter Gossip peers. -In the proposed optimistic cluster restart plan, we introduce a new temporary -shred version in the "silent repair phase" so validators in restart don't -interfere with those not in restart. Currently this silent repair shred version +* `silent repair shred version`: right now we update `shred_version` during a +`cluster restart`, it is used to verify received shreds and filter Gossip peers. +In the proposed optimistic `cluster restart` plan, we introduce a new temporary +shred version in the `silent repair phase` so validators in restart don't +interfere with those not in restart. Currently this `silent repair shred version` is calculated using `(current_shred_version + 1) % 0xffff`. * `RESTART_STAKE_THRESHOLD`: We need enough validators to participate in a @@ -56,7 +56,7 @@ or perform abnormally, so we currently set the `RESTART_STAKE_THRESHOLD` at ## Motivation -Currently during a cluster restart, validator operators need to decide the +Currently during a `cluster restart`, validator operators need to decide the highest optimistically confirmed slot, then restart the validators with new command-line arguments. @@ -66,7 +66,7 @@ detrimental to the viability of the ecosystem. We aim to automate the negotiation of highest optimistically confirmed slot and the distribution of all blocks on that fork, so that we can lower the -possibility of human mistakes in the cluster restart process. This also reduces +possibility of human mistakes in the `cluster restart` process. This also reduces the burden on validator operators, because they don't have to stay around while the validators automatically try to reach consensus, the validator will halt and print debug information if anything goes wrong, and operators can set up @@ -74,12 +74,12 @@ their own monitoring accordingly. ## Alternatives Considered -### Automatically detect outage and perform cluster restart +### Automatically detect outage and perform `cluster restart` The reaction time of a human in case of emergency is measured in minutes, -while a cluster restart where human initiate validator restarts takes hours. +while a `cluster restart` where human initiate validator restarts takes hours. 
We considered various approaches to automatically detect outage and perform -cluster restart, which can reduce recovery speed to minutes or even seconds. +`cluster restart`, which can reduce recovery speed to minutes or even seconds. However, automatically restarting the whole cluster seems risky. Because if the recovery process itself doesn't work, it might be some time before @@ -105,10 +105,10 @@ Right now we have vote-only mode, a validator will only pack vote transactions into new blocks if the tower distance (last_vote - local_root) is greater than 400 slots. -Unfortunately as previous outages demonstrate, vote-only mode isn't enough to -save the cluster in all situations. There are proposals of more aggressive block -production reduction to save the cluster. For example, a leader could produce -only one block in four consecutive slots allocated to it. +Unfortunately in the previous outages vote-only mode isn't enough to save the +cluster. There are proposals of more aggressive block production reduction to +save the cluster. For example, a leader could produce only one block in four +consecutive slots allocated to it. However, this only solves the problem in specific type of outage, and it seems risky to aggressively reduce block production, so we are not proceeding with @@ -126,7 +126,7 @@ percentage of stakes will restart with this arg. Then the following steps will happen: 1. The operator restarts the validator with a new command-line argument to -cause it to enter the "silent repair phase" at boot, where it will not make new +cause it to enter the `silent repair phase` at boot, where it will not make new blocks or vote. The validator propagates its local voted fork information to all other validators in restart. @@ -146,7 +146,7 @@ hash) to restart from: See each step explained in details below. -### 1. "silent repair phase": Gossip last vote and ancestors on that fork +### 1. `silent repair phase`: Gossip last vote and ancestors on that fork The main goal of this step is to propagate the last `n` ancestors of the last voted fork to all others in restart. @@ -164,28 +164,28 @@ The number of ancestor slots sent is hard coded at 81000, because that's 400ms * 81000 = 9 hours, we assume most restart decisions to be made in 9 hours. If a validator restarts after 9 hours past the outage, it cannot join the restart this way. If enough validators failed to restart within 9 hours, -then fallback to the manual, interactive cluster restart method. +then fallback to the manual, interactive `cluster restart` method. -When a validator enters restart, it uses silent repair shred version to avoid +When a validator enters restart, it uses `silent repair shred version` to avoid interfering with those outside the restart. There is slight chance that -the silent repair shred version would collide with the shred version after the -"silent repair phase", but even if this rare case occurred, we plan to flush +the `silent repair shred version` would collide with the shred version after the +`silent repair phase`, but even if this rare case occurred, we plan to flush gossip on successful restart before entering normal validator operation. To be extra cautious, we will also filter out `LastVotedForkSlots` and -`HeaviestFork` in gossip if a validator is not in "silent repair phase". +`HeaviestFork` in gossip if a validator is not in `silent repair phase`. `LastVotedForkSlots` message will be written into gossip and then be distributed to all others in restart. 
Meanwhile the validator can immediately enter next step. -### 2. "silent repair phase": Repair ledgers up to the restart slot +### 2. `silent repair phase`: Repair ledgers up to the restart slot The main goal of this step is to repair all blocks which could potentially be optimistically confirmed. We need to prevent false negative at all costs, because we can't rollback an -optimistically confirmed block. However, false positive is okay. Because when +`optimistically confirmed block`. However, false positive is okay. Because when we select the heaviest fork in the next step, we should see all the potential candidates for optimistically confirmed slots, there we can count the votes and remove some false positive cases. @@ -224,7 +224,7 @@ Once the validator gets LastVotedForkSlots, it can draw a line which are the "must-have" blocks. When all the "must-have" blocks are repaired and replayed, it can proceed to step 3. -### 3. "silent repair phase": Gossip current heaviest fork +### 3. `silent repair phase`: Gossip current heaviest fork The main goal of this step is to "vote" the heaviest fork to restart from. @@ -261,7 +261,7 @@ After deciding heaviest block, gossip picked block. We also send out stake of received `HeaviestFork` messages so that we can proceed to next step when enough validators are ready. -### 4. Exit "silent repair phase": Restart if everything okay, halt otherwise +### 4. Exit `silent repair phase`: Restart if everything okay, halt otherwise All validators in restart keep counting the number of `HeaviestFork` where `received_heaviest_stake` is higher than 80%. Once a validator counts that 80% @@ -296,7 +296,7 @@ sends out metrics so that people can be paged, and then halts. This proposal adds a new silent repair mode to validators, during this phase the validators will not participate in normal cluster activities, which is the -same as now. Compared to today's cluster restart, the new mode may mean more +same as now. Compared to today's `cluster restart`, the new mode may mean more network bandwidth and memory on the restarting validators, but it guarantees the safety of optimistically confirmed user transactions, and validator operators don't need to manually generate and download snapshots again. @@ -305,12 +305,12 @@ operators don't need to manually generate and download snapshots again. The two added gossip messages `LastVotedForkSlots` and `HeaviestFork` will only be sent and processed when the validator is restarted in the new proposed -optimistic cluster restart mode. They will also be filtered out if a validator +optimistic `cluster restart` mode. They will also be filtered out if a validator is not in this mode. So random validator restarting in the new mode will not bring extra burden to the system. Non-conforming validators could send out wrong `LastVotedForkSlots` and -`HeaviestFork` messages to mess with cluster restarts, these should be included +`HeaviestFork` messages to mess with `cluster restart`s, these should be included in the Slashing rules in the future. ## Backwards Compatibility From a9447b44f89e9a288776de752f576684c9d2f145 Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 18 Jul 2023 14:31:36 -0700 Subject: [PATCH 051/119] Make linter happy. 
--- ...6-optimistic-cluster-restart-automation.md | 49 ++++++++++--------- 1 file changed, 25 insertions(+), 24 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 14dc6691..43eb3845 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -31,22 +31,23 @@ for details. majority of the validators in a cluster (> 2/3 stake). Our algorithm tries to guarantee that an optimistically confirmed will never be rolled back. When we are performing `cluster restart`, we normally start from the highest -`optimistically confirmed block`, but it's also okay to start from a child of the -highest `optimistically confirmed block` as long as consensus can be reached. +`optimistically confirmed block`, but it's also okay to start from a child of +the highest `optimistically confirmed block` as long as consensus can be +reached. * `silent repair phase`: During the proposed optimistic `cluster restart` automation process, the validators in restart will first spend some time to exchange information, repair missing blocks, and finally reach consensus. The -validators only continue normal block production and voting after consensus is -reached. We call this preparation phase where block production and voting are +validators only continue normal block production and voting after consensus is +reached. We call this preparation phase where block production and voting are paused the `silent repair phase`. * `silent repair shred version`: right now we update `shred_version` during a -`cluster restart`, it is used to verify received shreds and filter Gossip peers. -In the proposed optimistic `cluster restart` plan, we introduce a new temporary -shred version in the `silent repair phase` so validators in restart don't -interfere with those not in restart. Currently this `silent repair shred version` -is calculated using `(current_shred_version + 1) % 0xffff`. +`cluster restart`, it is used to verify received shreds and filter Gossip +peers. In the proposed optimistic `cluster restart` plan, we introduce a new +temporary shred version in the `silent repair phase` so validators in restart +don't interfere with those not in restart. Currently this `silent repair shred +version` is calculated using `(current_shred_version + 1) % 0xffff`. * `RESTART_STAKE_THRESHOLD`: We need enough validators to participate in a restart so they can make decision for the whole cluster. If everything works @@ -66,11 +67,11 @@ detrimental to the viability of the ecosystem. We aim to automate the negotiation of highest optimistically confirmed slot and the distribution of all blocks on that fork, so that we can lower the -possibility of human mistakes in the `cluster restart` process. This also reduces -the burden on validator operators, because they don't have to stay around while -the validators automatically try to reach consensus, the validator will halt -and print debug information if anything goes wrong, and operators can set up -their own monitoring accordingly. +possibility of human mistakes in the `cluster restart` process. This also +reduces the burden on validator operators, because they don't have to stay +around while the validators automatically try to reach consensus, the validator +will halt and print debug information if anything goes wrong, and operators can +set up their own monitoring accordingly. 
## Alternatives Considered @@ -168,9 +169,9 @@ then fallback to the manual, interactive `cluster restart` method. When a validator enters restart, it uses `silent repair shred version` to avoid interfering with those outside the restart. There is slight chance that -the `silent repair shred version` would collide with the shred version after the -`silent repair phase`, but even if this rare case occurred, we plan to flush -gossip on successful restart before entering normal validator operation. +the `silent repair shred version` would collide with the shred version after +the `silent repair phase`, but even if this rare case occurred, we plan to +flush gossip on successful restart before entering normal validator operation. To be extra cautious, we will also filter out `LastVotedForkSlots` and `HeaviestFork` in gossip if a validator is not in `silent repair phase`. @@ -185,7 +186,7 @@ The main goal of this step is to repair all blocks which could potentially be optimistically confirmed. We need to prevent false negative at all costs, because we can't rollback an -`optimistically confirmed block`. However, false positive is okay. Because when +`optimistically confirmed block`. However, false positive is okay. Because when we select the heaviest fork in the next step, we should see all the potential candidates for optimistically confirmed slots, there we can count the votes and remove some false positive cases. @@ -207,7 +208,7 @@ When only 5% validators are in restart, everything is in "unsure" category. When 67% validators are in restart, any slot with less than 67% - 5% - (100-67%) = 29% is in "ignored" category, because even if all validators join the restart, the slot will not get 67% stake. When this -threshold is less than 33%, we temporarily put all blocks with >33% stake into +threshold is less than 33%, we temporarily put all blocks with >33% stake into "must-have" category to speed up repairing. Any slot with between 29% and 33% stake is "unsure". @@ -305,13 +306,13 @@ operators don't need to manually generate and download snapshots again. The two added gossip messages `LastVotedForkSlots` and `HeaviestFork` will only be sent and processed when the validator is restarted in the new proposed -optimistic `cluster restart` mode. They will also be filtered out if a validator -is not in this mode. So random validator restarting in the new mode will not -bring extra burden to the system. +optimistic `cluster restart` mode. They will also be filtered out if a +validator is not in this mode. So random validator restarting in the new mode +will not bring extra burden to the system. Non-conforming validators could send out wrong `LastVotedForkSlots` and -`HeaviestFork` messages to mess with `cluster restart`s, these should be included -in the Slashing rules in the future. +`HeaviestFork` messages to mess with `cluster restart`s, these should be +included in the Slashing rules in the future. 
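A minimal sketch of the two safeguards described in this change, assuming illustrative names throughout: the temporary shred version is derived with the `(current_shred_version + 1) % 0xffff` formula given earlier, and the two restart-only gossip messages are dropped by any validator that is not itself in the `silent repair phase`.

```rust
/// Illustrative enum; the formula and the filtering rule are the ones stated
/// in this proposal, the surrounding code is not the real gossip path.
enum RestartGossip {
    LastVotedForkSlots, // payloads omitted in this sketch
    HeaviestFork,
}

/// Temporary shred version used while in the silent repair phase:
/// (current_shred_version + 1) % 0xffff.
fn silent_repair_shred_version(current_shred_version: u16) -> u16 {
    ((u32::from(current_shred_version) + 1) % 0xffff) as u16
}

/// Restart-only messages are processed only by validators that are themselves
/// in the silent repair phase; everyone else filters them out.
fn should_process(_msg: &RestartGossip, in_silent_repair_phase: bool) -> bool {
    in_silent_repair_phase
}

fn main() {
    assert_eq!(silent_repair_shred_version(4000), 4001);
    assert_eq!(silent_repair_shred_version(0xfffe), 0); // wraps below 0xffff
    assert!(!should_process(&RestartGossip::LastVotedForkSlots, false));
    assert!(should_process(&RestartGossip::HeaviestFork, true));
}
```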
## Backwards Compatibility From e9137032637a2b26330b8bafa2aa21bb974cef89 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Tue, 18 Jul 2023 15:11:24 -0700 Subject: [PATCH 052/119] Update proposals/0046-optimistic-cluster-restart-automation.md Co-authored-by: Trent Nelson --- proposals/0046-optimistic-cluster-restart-automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 43eb3845..42ef44e3 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -233,7 +233,7 @@ We use a new Gossip message `HeaviestFork`, its fields are: * `slot`: `u64` slot of the picked block. * `hash`: `Hash` bank hash of the picked block. -* `received`: `u8` total percentage of stakes of the validators it received +* `stake_committed_percent`: `u8` total percentage of stakes of the validators it received `HeaviestFork` messages from. After receiving `LastVotedForkSlots` from the validators holding stake more From eb359ac1e20cd090c3393351c277b2ef29635c19 Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 18 Jul 2023 15:11:53 -0700 Subject: [PATCH 053/119] Remove unnecessary paragraph. --- proposals/0046-optimistic-cluster-restart-automation.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 43eb3845..099074f6 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -176,10 +176,6 @@ flush gossip on successful restart before entering normal validator operation. To be extra cautious, we will also filter out `LastVotedForkSlots` and `HeaviestFork` in gossip if a validator is not in `silent repair phase`. -`LastVotedForkSlots` message will be written into gossip and then be -distributed to all others in restart. Meanwhile the validator can immediately -enter next step. - ### 2. `silent repair phase`: Repair ledgers up to the restart slot The main goal of this step is to repair all blocks which could potentially be From 192b01c3f4e5a3cbce877ec4a8ad81516a402206 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Tue, 18 Jul 2023 15:13:18 -0700 Subject: [PATCH 054/119] Update proposals/0046-optimistic-cluster-restart-automation.md Co-authored-by: Trent Nelson --- proposals/0046-optimistic-cluster-restart-automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 83958d2d..ab66ff6b 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -254,7 +254,7 @@ heaviest child is over threshold: 2. Otherwise stop traversing the tree and use last picked block. After deciding heaviest block, gossip -`HeaviestFork(X, Hash(X), received_heaviest_stake)` out, where X is the latest +`HeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X is the latest picked block. We also send out stake of received `HeaviestFork` messages so that we can proceed to next step when enough validators are ready. 
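Since the child-selection rule above carries most of the logic, here is a small sketch of that walk under stated assumptions: stake is tracked as plain percentages, the fork tree is a simple map keyed by slot, and the walk starts from the last block that already has more than 67% of the stake (rule 1). All names are illustrative, not the real implementation.

```rust
use std::collections::HashMap;

/// Illustrative fork-tree node: the stake percentage observed for this block
/// in the aggregated `LastVotedForkSlots` messages, plus its child slots.
struct ForkNode {
    vote_percent: f64,
    children: Vec<u64>,
}

/// Walk down from `start` (the last block already above 67%), descending into
/// the heaviest child only while vote_on_child + stake_not_in_restart >= 62%.
fn pick_heaviest_slot(
    tree: &HashMap<u64, ForkNode>,
    start: u64,
    percent_not_in_restart: f64,
) -> u64 {
    let mut picked = start;
    loop {
        let heaviest_child = tree[&picked]
            .children
            .iter()
            .copied()
            .max_by(|a, b| tree[a].vote_percent.total_cmp(&tree[b].vote_percent));
        match heaviest_child {
            Some(child) if tree[&child].vote_percent + percent_not_in_restart >= 62.0 => {
                picked = child; // the child could still be optimistically confirmed
            }
            _ => return picked, // stop traversing and keep the last picked block
        }
    }
}

fn main() {
    // The example from the text: 80% of stake is in restart (20% unaccounted
    // for) and the child has 42% of the votes, so 42 + 20 = 62 and it is picked.
    let mut tree = HashMap::new();
    tree.insert(10, ForkNode { vote_percent: 70.0, children: vec![11] });
    tree.insert(11, ForkNode { vote_percent: 42.0, children: vec![] });
    assert_eq!(pick_heaviest_slot(&tree, 10, 20.0), 11);
}
```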
From c4d3e3ed2eb4ecbf584ca2d906ecdf2341da8ddd Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Tue, 18 Jul 2023 15:13:25 -0700 Subject: [PATCH 055/119] Update proposals/0046-optimistic-cluster-restart-automation.md Co-authored-by: Trent Nelson --- proposals/0046-optimistic-cluster-restart-automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index ab66ff6b..aa6e985f 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -255,7 +255,7 @@ heaviest child is over threshold: After deciding heaviest block, gossip `HeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X is the latest -picked block. We also send out stake of received `HeaviestFork` messages so +picked block. We also gossip stake of received `HeaviestFork` messages so that we can proceed to next step when enough validators are ready. ### 4. Exit `silent repair phase`: Restart if everything okay, halt otherwise From 8a9990ddecd4851d65a3dcd44c0d02b1ddc8ba88 Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 18 Jul 2023 15:13:51 -0700 Subject: [PATCH 056/119] Change percent from u8 to u16. --- proposals/0046-optimistic-cluster-restart-automation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 83958d2d..eaebbb3e 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -229,8 +229,8 @@ We use a new Gossip message `HeaviestFork`, its fields are: * `slot`: `u64` slot of the picked block. * `hash`: `Hash` bank hash of the picked block. -* `stake_committed_percent`: `u8` total percentage of stakes of the validators it received -`HeaviestFork` messages from. +* `stake_committed_percent`: `u16` total percentage of stakes of the validators +it received `HeaviestFork` messages from. After receiving `LastVotedForkSlots` from the validators holding stake more than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" category, From 21878c8560158b2e503ff6484fb6278f7681300e Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 18 Jul 2023 15:15:21 -0700 Subject: [PATCH 057/119] Make linter happy. --- proposals/0046-optimistic-cluster-restart-automation.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 02f40b28..77d2374b 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -254,9 +254,9 @@ heaviest child is over threshold: 2. Otherwise stop traversing the tree and use last picked block. After deciding heaviest block, gossip -`HeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X is the latest -picked block. We also gossip stake of received `HeaviestFork` messages so -that we can proceed to next step when enough validators are ready. +`HeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X is the +latest picked block. We also gossip stake of received `HeaviestFork` messages +so that we can proceed to next step when enough validators are ready. ### 4. 
Exit `silent repair phase`: Restart if everything okay, halt otherwise From 879e92dde6e30042ae14f34fb1540e8c8a918c4c Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 18 Jul 2023 16:16:36 -0700 Subject: [PATCH 058/119] Remove command line reference. --- proposals/0046-optimistic-cluster-restart-automation.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 77d2374b..3d08b7e1 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -121,10 +121,9 @@ The new protocol tries to make all restarting validators get the same data blocks and the same set of last votes, so that they will with high probability converge on the same canonical fork and proceed. -A new command line arg will be added. When the cluster is in need -of a restart, we assume validators holding at least `RESTART_STAKE_THRESHOLD` -percentage of stakes will restart with this arg. Then the following steps -will happen: +When the cluster is in need of a restart, we assume validators holding at least +`RESTART_STAKE_THRESHOLD` percentage of stakes will enter the restart mode. +Then the following steps will happen: 1. The operator restarts the validator with a new command-line argument to cause it to enter the `silent repair phase` at boot, where it will not make new From 5b58b8a9bf4ff6ff0bf548012d2d4cf36c628423 Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 18 Jul 2023 21:32:39 -0700 Subject: [PATCH 059/119] Revise the threshold for block repair. --- ...6-optimistic-cluster-restart-automation.md | 77 +++++++++++++------ 1 file changed, 52 insertions(+), 25 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 3d08b7e1..19c58030 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -188,31 +188,57 @@ remove some false positive cases. However, it's also overkill to repair every block presented by others. When `LastVotedForkSlots` messages are being received and aggregated, a validator -can categorize blocks missing locally into 3 categories: ignored, must-have, -and unsure. Depending on the stakes of validators currently in restart, some -slots with too few stake can be safely ignored, some have enough stake they -should definitely be repaired, and the rest would be undecided pending more -confirmations. - -Assume `RESTART_STAKE_THRESHOLD` is 80% and that 5% restarted validators can -change their votes from what they voted before the restart due to mistakes or -malicious behavior. - -When only 5% validators are in restart, everything is in "unsure" category. - -When 67% validators are in restart, any slot with less than -67% - 5% - (100-67%) = 29% is in "ignored" category, because even if all -validators join the restart, the slot will not get 67% stake. When this -threshold is less than 33%, we temporarily put all blocks with >33% stake into -"must-have" category to speed up repairing. Any slot with between 29% and 33% -stake is "unsure". - -When 80% validators are in restart, any slot with less than -67% - 5% - (100-80%) = 42% is in "ignored" category, the rest is "must-have". +can categorize blocks missing locally into 2 categories: must-have and ignored. 
+Depending on the stakes of validators currently in restart, some slots with too +few stake can be safely ignored, while others will be repaired. + +In the following analysis, we assume: +* `RESTART_STAKE_THRESHOLD` is 80% +* `MALICIOUS_SET` which is validators which can disobey the protocol, is 5%. + For example, these validators can change their votes from what they + previously voted on. +* `OPTIMISTIC_CONFIRMED_THRESHOLD` is 67%, which is the percentage of stake + required to be a `optimistically confirmed block`. + +At any point in restart, let's call percentage of validators not in restart +`PERCENT_NOT_IN_RESTART`. We can draw a line at +`OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART`. + +Any slot above this line should be repaired, while other slots can be ignored +for now. + +If +`OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` +is not positive, then the validators don't have to start any repairs. + +We obviously want to repair all blocks above `OPTIMISTIC_CONFIRMED_THRESHOLD` +before the restart. The validators in `MALICIOUS_SET` could lie about their +votes, so we need to be conservative and lower the line accordingly. Also, +we don't know what the validators not in restart have voted, so we need to +be even more conservative and assume they voted for this block. Being +conservative means we might repair blocks which we didn't need, but we will +never miss any block we should have repaired. + +For example, when only 5% validators are in restart, `PERCENT_NOT_IN_RESTART` +is 100% - 5% = 95%. +`OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` += 67% - 5% - 95% < 0, so no validators would repair any block. + +When 70% validators are in restart, `PERCENT_NOT_IN_RESTART` +is 100% - 70% = 30%. +`OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` += 67% - 5% - 30% = 32%, so slots with above 32% votes in `LastVotedForkSlots` +would be repaired. + +When 80% validators are in restart, `PERCENT_NOT_IN_RESTART` +is 100% - 80% = 20%. +`OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` += 67% - 5% - 20% = 42%, so slots with above 42% votes in `LastVotedForkSlots` +would be repaired. From above examples, we can see the "must-have" threshold changes dynamically depending on how many validators are in restart. The main benefit is that a -block will only move from "must-have/unsure" to "ignored" as more validators +block will only move from "must-have" to "ignored" as more validators join the restart, not vice versa. So the list of blocks a validator needs to repair will never grow bigger when more validators join the restart. @@ -235,10 +261,11 @@ After receiving `LastVotedForkSlots` from the validators holding stake more than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" category, replay all blocks and pick the heaviest fork as follows: -1. For all blocks with more than 67% votes, they must be on picked fork. +1. For all blocks with more than 67% stake in `LastVotedForkSlots` messages, + they must be on the heaviest fork. -2. If a picked block has more than one child, check if the votes on the -heaviest child is over threshold: +2. If a picked block has more than one child, check if the heaviest child + should be picked using the following rule: 1. If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child. 
For example, if 80% validators are in restart, child has 42% votes, then From d817520894a742737d4b823b8dfbdc5414c04134 Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 18 Jul 2023 21:36:33 -0700 Subject: [PATCH 060/119] Make linter happy again. --- proposals/0046-optimistic-cluster-restart-automation.md | 1 + 1 file changed, 1 insertion(+) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 19c58030..adf77f66 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -193,6 +193,7 @@ Depending on the stakes of validators currently in restart, some slots with too few stake can be safely ignored, while others will be repaired. In the following analysis, we assume: + * `RESTART_STAKE_THRESHOLD` is 80% * `MALICIOUS_SET` which is validators which can disobey the protocol, is 5%. For example, these validators can change their votes from what they From fdc753413646ae79a663d3b7f7b7ec88c2d7f682 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 19 Jul 2023 09:49:08 -0700 Subject: [PATCH 061/119] Remove 80% reference when we mean RESTART_STAKE_THRESHOLD. --- .../0046-optimistic-cluster-restart-automation.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index adf77f66..9dd0226e 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -210,7 +210,7 @@ for now. If `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` -is not positive, then the validators don't have to start any repairs. +is less than 10%, then the validators don't have to start any repairs. We obviously want to repair all blocks above `OPTIMISTIC_CONFIRMED_THRESHOLD` before the restart. The validators in `MALICIOUS_SET` could lie about their @@ -223,7 +223,7 @@ never miss any block we should have repaired. For example, when only 5% validators are in restart, `PERCENT_NOT_IN_RESTART` is 100% - 5% = 95%. `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` -= 67% - 5% - 95% < 0, so no validators would repair any block. += 67% - 5% - 95% < 10%, so no validators would repair any block. When 70% validators are in restart, `PERCENT_NOT_IN_RESTART` is 100% - 70% = 30%. @@ -288,9 +288,10 @@ so that we can proceed to next step when enough validators are ready. ### 4. Exit `silent repair phase`: Restart if everything okay, halt otherwise All validators in restart keep counting the number of `HeaviestFork` where -`received_heaviest_stake` is higher than 80%. Once a validator counts that 80% -of the validators send out `HeaviestFork` where `received_heaviest_stake` is -higher than 80%, it starts the following checks: +`received_heaviest_stake` is higher than `RESTART_STAKE_THRESHOLD`. Once a +validator counts that `RESTART_STAKE_THRESHOLD` of the validators send out +`HeaviestFork` where `received_heaviest_stake` is higher than +`RESTART_STAKE_THRESHOLD`, it starts the following checks: * Whether all `HeaviestFork` have the same slot and same bank Hash. Because validators are only sending slots instead of bank hashes in @@ -307,7 +308,8 @@ whether two minutes has passed since agreement has been reached, to guarantee its `HeaviestFork` message propagates to everyone, then proceeds to restart: 1. 
Issue a hard fork at the designated slot and change shred version in gossip. -2. Execute the current tasks in --wait-for-supermajority and wait for 80%. +2. Execute the current tasks in --wait-for-supermajority and wait for + `RESTART_STAKE_THRESHOLD` of the total validators to be in ready state. Before a validator enters restart, it will still propagate `LastVotedForkSlots` and `HeaviestFork` messages in gossip. After the restart,its shred_version will From 6b2a0b2066459c38fc694858cb822320339e4004 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 19 Jul 2023 09:56:06 -0700 Subject: [PATCH 062/119] Rename HeaviestFork to RestartHeaviestFork. --- ...6-optimistic-cluster-restart-automation.md | 40 ++++++++++--------- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 9dd0226e..f25a3e73 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -173,7 +173,7 @@ the `silent repair phase`, but even if this rare case occurred, we plan to flush gossip on successful restart before entering normal validator operation. To be extra cautious, we will also filter out `LastVotedForkSlots` and -`HeaviestFork` in gossip if a validator is not in `silent repair phase`. +`RestartHeaviestFork` in gossip if a validator is not in `silent repair phase`. ### 2. `silent repair phase`: Repair ledgers up to the restart slot @@ -251,12 +251,12 @@ it can proceed to step 3. The main goal of this step is to "vote" the heaviest fork to restart from. -We use a new Gossip message `HeaviestFork`, its fields are: +We use a new Gossip message `RestartHeaviestFork`, its fields are: * `slot`: `u64` slot of the picked block. * `hash`: `Hash` bank hash of the picked block. * `stake_committed_percent`: `u16` total percentage of stakes of the validators -it received `HeaviestFork` messages from. +it received `RestartHeaviestFork` messages from. After receiving `LastVotedForkSlots` from the validators holding stake more than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" category, @@ -281,20 +281,20 @@ replay all blocks and pick the heaviest fork as follows: 2. Otherwise stop traversing the tree and use last picked block. After deciding heaviest block, gossip -`HeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X is the -latest picked block. We also gossip stake of received `HeaviestFork` messages -so that we can proceed to next step when enough validators are ready. +`RestartHeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X is +the latest picked block. We also gossip stake of received `RestartHeaviestFork` +messages so that we can proceed to next step when enough validators are ready. ### 4. Exit `silent repair phase`: Restart if everything okay, halt otherwise -All validators in restart keep counting the number of `HeaviestFork` where -`received_heaviest_stake` is higher than `RESTART_STAKE_THRESHOLD`. Once a -validator counts that `RESTART_STAKE_THRESHOLD` of the validators send out -`HeaviestFork` where `received_heaviest_stake` is higher than +All validators in restart keep counting the number of `RestartHeaviestFork` +where `received_heaviest_stake` is higher than `RESTART_STAKE_THRESHOLD`. 
Once +a validator counts that `RESTART_STAKE_THRESHOLD` of the validators send out +`RestartHeaviestFork` where `received_heaviest_stake` is higher than `RESTART_STAKE_THRESHOLD`, it starts the following checks: -* Whether all `HeaviestFork` have the same slot and same bank Hash. Because -validators are only sending slots instead of bank hashes in +* Whether all `RestartHeaviestFork` have the same slot and same bank Hash. +Because validators are only sending slots instead of bank hashes in `LastVotedForkSlots`, it's possible that a duplicate block can make the cluster unable to reach consensus. So bank hash needs to be checked as well. @@ -305,15 +305,17 @@ the agreed upon slot. While the snapshot generation is in progress, the validator also checks to see whether two minutes has passed since agreement has been reached, to guarantee -its `HeaviestFork` message propagates to everyone, then proceeds to restart: +its `RestartHeaviestFork` message propagates to everyone, then proceeds to +restart: 1. Issue a hard fork at the designated slot and change shred version in gossip. 2. Execute the current tasks in --wait-for-supermajority and wait for `RESTART_STAKE_THRESHOLD` of the total validators to be in ready state. Before a validator enters restart, it will still propagate `LastVotedForkSlots` -and `HeaviestFork` messages in gossip. After the restart,its shred_version will -be updated so it will no longer send or propagate gossip messages for restart. +and `RestartHeaviestFork` messages in gossip. After the restart,its +shred_version will be updated so it will no longer send or propagate gossip +messages for restart. If any of the checks fails, the validator immediately prints out all debug info, sends out metrics so that people can be paged, and then halts. @@ -329,14 +331,14 @@ operators don't need to manually generate and download snapshots again. ## Security Considerations -The two added gossip messages `LastVotedForkSlots` and `HeaviestFork` will only -be sent and processed when the validator is restarted in the new proposed -optimistic `cluster restart` mode. They will also be filtered out if a +The two added gossip messages `LastVotedForkSlots` and `RestartHeaviestFork` +will only be sent and processed when the validator is restarted in the new +proposed optimistic `cluster restart` mode. They will also be filtered out if a validator is not in this mode. So random validator restarting in the new mode will not bring extra burden to the system. Non-conforming validators could send out wrong `LastVotedForkSlots` and -`HeaviestFork` messages to mess with `cluster restart`s, these should be +`RestartHeaviestFork` messages to mess with `cluster restart`s, these should be included in the Slashing rules in the future. ## Backwards Compatibility From 6fcc5cfc9e2cc1939947e2a1f1a3e856f41ba4a6 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 19 Jul 2023 10:26:27 -0700 Subject: [PATCH 063/119] Rename LastVotedForkSlots to RestartLastVotedForkSlots. --- ...6-optimistic-cluster-restart-automation.md | 58 +++++++++---------- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index f25a3e73..2d9e3248 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -151,7 +151,7 @@ See each step explained in details below. 
The main goal of this step is to propagate the last `n` ancestors of the last voted fork to all others in restart. -We use a new Gossip message `LastVotedForkSlots`, its fields are: +We use a new Gossip message `RestartLastVotedForkSlots`, its fields are: * `last_voted_slot`: `u64` the slot last voted, this also serves as last_slot for the bit vector. @@ -172,7 +172,7 @@ the `silent repair shred version` would collide with the shred version after the `silent repair phase`, but even if this rare case occurred, we plan to flush gossip on successful restart before entering normal validator operation. -To be extra cautious, we will also filter out `LastVotedForkSlots` and +To be extra cautious, we will also filter out `RestartLastVotedForkSlots` and `RestartHeaviestFork` in gossip if a validator is not in `silent repair phase`. ### 2. `silent repair phase`: Repair ledgers up to the restart slot @@ -187,10 +187,10 @@ candidates for optimistically confirmed slots, there we can count the votes and remove some false positive cases. However, it's also overkill to repair every block presented by others. When -`LastVotedForkSlots` messages are being received and aggregated, a validator -can categorize blocks missing locally into 2 categories: must-have and ignored. -Depending on the stakes of validators currently in restart, some slots with too -few stake can be safely ignored, while others will be repaired. +`RestartLastVotedForkSlots` messages are being received and aggregated, a +validator can categorize blocks missing locally into 2 categories: must-have +and ignored. Depending on the stakes of validators currently in restart, some +slots with too few stake can be safely ignored, while others will be repaired. In the following analysis, we assume: @@ -228,14 +228,14 @@ is 100% - 5% = 95%. When 70% validators are in restart, `PERCENT_NOT_IN_RESTART` is 100% - 70% = 30%. `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` -= 67% - 5% - 30% = 32%, so slots with above 32% votes in `LastVotedForkSlots` -would be repaired. += 67% - 5% - 30% = 32%, so slots with above 32% votes accumulated from +`RestartLastVotedForkSlots` would be repaired. When 80% validators are in restart, `PERCENT_NOT_IN_RESTART` is 100% - 80% = 20%. `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` -= 67% - 5% - 20% = 42%, so slots with above 42% votes in `LastVotedForkSlots` -would be repaired. += 67% - 5% - 20% = 42%, so slots with above 42% votes accumulated from +`RestartLastVotedForkSlots` would be repaired. From above examples, we can see the "must-have" threshold changes dynamically depending on how many validators are in restart. The main benefit is that a @@ -243,9 +243,9 @@ block will only move from "must-have" to "ignored" as more validators join the restart, not vice versa. So the list of blocks a validator needs to repair will never grow bigger when more validators join the restart. -Once the validator gets LastVotedForkSlots, it can draw a line which are the -"must-have" blocks. When all the "must-have" blocks are repaired and replayed, -it can proceed to step 3. +Once the validator gets `RestartLastVotedForkSlots``, it can draw a line which +are the "must-have" blocks. When all the "must-have" blocks are repaired and +replayed, it can proceed to step 3. ### 3. 
`silent repair phase`: Gossip current heaviest fork @@ -258,12 +258,12 @@ We use a new Gossip message `RestartHeaviestFork`, its fields are: * `stake_committed_percent`: `u16` total percentage of stakes of the validators it received `RestartHeaviestFork` messages from. -After receiving `LastVotedForkSlots` from the validators holding stake more -than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" category, +After receiving `RestartLastVotedForkSlots` from the validators holding stake +more than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" category, replay all blocks and pick the heaviest fork as follows: -1. For all blocks with more than 67% stake in `LastVotedForkSlots` messages, - they must be on the heaviest fork. +1. For all blocks with more than 67% stake in `RestartLastVotedForkSlots` + messages, they must be on the heaviest fork. 2. If a picked block has more than one child, check if the heaviest child should be picked using the following rule: @@ -295,7 +295,7 @@ a validator counts that `RESTART_STAKE_THRESHOLD` of the validators send out * Whether all `RestartHeaviestFork` have the same slot and same bank Hash. Because validators are only sending slots instead of bank hashes in -`LastVotedForkSlots`, it's possible that a duplicate block can make the +`RestartLastVotedForkSlots`, it's possible that a duplicate block can make the cluster unable to reach consensus. So bank hash needs to be checked as well. * The voted slot is equal or a child of local optimistically confirmed slot. @@ -312,10 +312,10 @@ restart: 2. Execute the current tasks in --wait-for-supermajority and wait for `RESTART_STAKE_THRESHOLD` of the total validators to be in ready state. -Before a validator enters restart, it will still propagate `LastVotedForkSlots` -and `RestartHeaviestFork` messages in gossip. After the restart,its -shred_version will be updated so it will no longer send or propagate gossip -messages for restart. +Before a validator enters restart, it will still propagate +`RestartLastVotedForkSlots` and `RestartHeaviestFork` messages in gossip. After +the restart,its shred_version will be updated so it will no longer send or +propagate gossip messages for restart. If any of the checks fails, the validator immediately prints out all debug info, sends out metrics so that people can be paged, and then halts. @@ -331,17 +331,17 @@ operators don't need to manually generate and download snapshots again. ## Security Considerations -The two added gossip messages `LastVotedForkSlots` and `RestartHeaviestFork` -will only be sent and processed when the validator is restarted in the new -proposed optimistic `cluster restart` mode. They will also be filtered out if a -validator is not in this mode. So random validator restarting in the new mode -will not bring extra burden to the system. +The two added gossip messages `RestartLastVotedForkSlots` and +`RestartHeaviestFork` will only be sent and processed when the validator is +restarted in the new proposed optimistic `cluster restart` mode. They will also +be filtered out if a validator is not in this mode. So random validator\ +restarting in the new mode will not bring extra burden to the system. -Non-conforming validators could send out wrong `LastVotedForkSlots` and +Non-conforming validators could send out wrong `RestartLastVotedForkSlots` and `RestartHeaviestFork` messages to mess with `cluster restart`s, these should be included in the Slashing rules in the future. 
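A sketch of the exit check in step 4 above, assuming hypothetical glue code around the message described in this change: senders holding `RESTART_STAKE_THRESHOLD` of the stake must each report a committed stake percentage at or above the threshold, and every message must then name the same slot and bank hash before the validator proceeds.

```rust
const RESTART_STAKE_THRESHOLD_PERCENT: f64 = 80.0;

/// Fields follow the `RestartHeaviestFork` message described above; the rest
/// of this sketch is hypothetical glue code, not the actual implementation.
struct RestartHeaviestFork {
    slot: u64,
    hash: [u8; 32],               // bank hash of the picked block
    stake_committed_percent: u16, // stake % of senders this validator has seen
}

enum ExitDecision {
    KeepWaiting,
    Restart { slot: u64, hash: [u8; 32] },
    HaltForHumanAttention,
}

/// `messages` pairs each sender's own stake percentage with its message.
fn check_exit(messages: &[(f64, RestartHeaviestFork)]) -> ExitDecision {
    // Stake held by senders that themselves already saw >= 80% of stake agree.
    let ready_stake: f64 = messages
        .iter()
        .filter(|(_, m)| f64::from(m.stake_committed_percent) >= RESTART_STAKE_THRESHOLD_PERCENT)
        .map(|(stake, _)| *stake)
        .sum();
    if ready_stake < RESTART_STAKE_THRESHOLD_PERCENT {
        return ExitDecision::KeepWaiting;
    }
    // Every message must name the same slot and the same bank hash.
    let first = &messages[0].1;
    if messages.iter().all(|(_, m)| m.slot == first.slot && m.hash == first.hash) {
        ExitDecision::Restart { slot: first.slot, hash: first.hash }
    } else {
        ExitDecision::HaltForHumanAttention
    }
}

fn main() {
    let agreed = RestartHeaviestFork { slot: 42, hash: [0u8; 32], stake_committed_percent: 85 };
    let decision = check_exit(&[(85.0, agreed)]);
    assert!(matches!(decision, ExitDecision::Restart { slot: 42, .. }));
}
```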
-## Backwards Compatibility +## Backwards Compatibilityz This change is backward compatible with previous versions, because validators only enter the new mode during new restart mode which is controlled by a From 761458788db1ed923d59e5e1af9a4f0ba346dc7f Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 19 Jul 2023 13:27:18 -0700 Subject: [PATCH 064/119] Change format of examples. --- ...6-optimistic-cluster-restart-automation.md | 23 ++++++++----------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 2d9e3248..cae21289 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -220,19 +220,18 @@ be even more conservative and assume they voted for this block. Being conservative means we might repair blocks which we didn't need, but we will never miss any block we should have repaired. -For example, when only 5% validators are in restart, `PERCENT_NOT_IN_RESTART` -is 100% - 5% = 95%. +Next we illustrate the system behavior using concrete numbers: + +* 5% validators in restart: `PERCENT_NOT_IN_RESTART` is 100% - 5% = 95%. `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` = 67% - 5% - 95% < 10%, so no validators would repair any block. -When 70% validators are in restart, `PERCENT_NOT_IN_RESTART` -is 100% - 70% = 30%. +* 70% validators in restart: `PERCENT_NOT_IN_RESTART` is 100% - 70% = 30%. `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` = 67% - 5% - 30% = 32%, so slots with above 32% votes accumulated from `RestartLastVotedForkSlots` would be repaired. -When 80% validators are in restart, `PERCENT_NOT_IN_RESTART` -is 100% - 80% = 20%. +* 80% validators are in restart, `PERCENT_NOT_IN_RESTART` is 100% - 80% = 20%. `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` = 67% - 5% - 20% = 42%, so slots with above 42% votes accumulated from `RestartLastVotedForkSlots` would be repaired. @@ -304,13 +303,11 @@ If all checks pass, the validator immediately starts generation of snapshot at the agreed upon slot. While the snapshot generation is in progress, the validator also checks to see -whether two minutes has passed since agreement has been reached, to guarantee -its `RestartHeaviestFork` message propagates to everyone, then proceeds to -restart: - -1. Issue a hard fork at the designated slot and change shred version in gossip. -2. Execute the current tasks in --wait-for-supermajority and wait for - `RESTART_STAKE_THRESHOLD` of the total validators to be in ready state. +whether a full two minutes interval passed since agreement had been reached, +to guarantee its `RestartHeaviestFork` message propagates to everyone, and +whether the snapshot generation is complete. When above checks pass, it then +proceeds to issue a hard fork at the designated slot and change shred version +in gossip. After that it restarts into the normal (non-restart) state. Before a validator enters restart, it will still propagate `RestartLastVotedForkSlots` and `RestartHeaviestFork` messages in gossip. After From ba1c9d445470a8b7342c5c85eaaee67c195a80fe Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 19 Jul 2023 13:54:54 -0700 Subject: [PATCH 065/119] Change format of the bullet list. 
--- .../0046-optimistic-cluster-restart-automation.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index cae21289..2ced04bd 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -146,7 +146,9 @@ hash) to restart from: See each step explained in details below. -### 1. `silent repair phase`: Gossip last vote and ancestors on that fork +### Silent repair mode + +1. **Gossip last vote and ancestors on that fork** The main goal of this step is to propagate the last `n` ancestors of the last voted fork to all others in restart. @@ -175,7 +177,7 @@ flush gossip on successful restart before entering normal validator operation. To be extra cautious, we will also filter out `RestartLastVotedForkSlots` and `RestartHeaviestFork` in gossip if a validator is not in `silent repair phase`. -### 2. `silent repair phase`: Repair ledgers up to the restart slot +2. **Repair ledgers up to the restart slot** The main goal of this step is to repair all blocks which could potentially be optimistically confirmed. @@ -246,7 +248,7 @@ Once the validator gets `RestartLastVotedForkSlots``, it can draw a line which are the "must-have" blocks. When all the "must-have" blocks are repaired and replayed, it can proceed to step 3. -### 3. `silent repair phase`: Gossip current heaviest fork +3. **Gossip current heaviest fork** The main goal of this step is to "vote" the heaviest fork to restart from. @@ -284,7 +286,9 @@ After deciding heaviest block, gossip the latest picked block. We also gossip stake of received `RestartHeaviestFork` messages so that we can proceed to next step when enough validators are ready. -### 4. Exit `silent repair phase`: Restart if everything okay, halt otherwise +### Exit `silent repair phase` + +4. **Restart if everything okay, halt otherwise** All validators in restart keep counting the number of `RestartHeaviestFork` where `received_heaviest_stake` is higher than `RESTART_STAKE_THRESHOLD`. Once From 2dffa199282fb072c9fb4cbaa8a7b7cc28d6fa7d Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 19 Jul 2023 14:22:10 -0700 Subject: [PATCH 066/119] Change reasoning of 81000 slots. --- proposals/0046-optimistic-cluster-restart-automation.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 2ced04bd..90014bf8 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -163,10 +163,11 @@ sender's last voted fork. the most significant bit is always `last_voted_slot`, least significant bit is `last_voted_slot-81000`. The number of ancestor slots sent is hard coded at 81000, because that's -400ms * 81000 = 9 hours, we assume most restart decisions to be made in 9 -hours. If a validator restarts after 9 hours past the outage, it cannot join -the restart this way. If enough validators failed to restart within 9 hours, -then fallback to the manual, interactive `cluster restart` method. +400ms * 81000 = 9 hours, we assume that optimistic confirmation must halt +within 81k slots of the last finalized block. If a validator restarts after 9 +hours past the outage, it cannot join the restart this way. 
If enough +validators failed to restart within 9 hours, then fallback to the manual, +interactive `cluster restart` method. When a validator enters restart, it uses `silent repair shred version` to avoid interfering with those outside the restart. There is slight chance that From 5635ad345c99be6b38d90e42652dce7258c2325a Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 20 Jul 2023 09:49:24 -0700 Subject: [PATCH 067/119] Replace silent repair with new name "wen restart". --- ...6-optimistic-cluster-restart-automation.md | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 90014bf8..fd5de22d 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -1,6 +1,6 @@ --- simd: '0046' -title: Optimistic cluster restart automation +title: Wen restart: optimistic cluster restart automation authors: - Wen Xu (Solana Labs) category: Standard @@ -35,18 +35,18 @@ are performing `cluster restart`, we normally start from the highest the highest `optimistically confirmed block` as long as consensus can be reached. -* `silent repair phase`: During the proposed optimistic `cluster restart` +* `wen restart phase`: During the proposed optimistic `cluster restart` automation process, the validators in restart will first spend some time to exchange information, repair missing blocks, and finally reach consensus. The validators only continue normal block production and voting after consensus is reached. We call this preparation phase where block production and voting are -paused the `silent repair phase`. +paused the `wen restart phase`. -* `silent repair shred version`: right now we update `shred_version` during a +* `wen restart shred version`: right now we update `shred_version` during a `cluster restart`, it is used to verify received shreds and filter Gossip peers. In the proposed optimistic `cluster restart` plan, we introduce a new -temporary shred version in the `silent repair phase` so validators in restart -don't interfere with those not in restart. Currently this `silent repair shred +temporary shred version in the `wen restart phase` so validators in restart +don't interfere with those not in restart. Currently this `wen restart shred version` is calculated using `(current_shred_version + 1) % 0xffff`. * `RESTART_STAKE_THRESHOLD`: We need enough validators to participate in a @@ -126,7 +126,7 @@ When the cluster is in need of a restart, we assume validators holding at least Then the following steps will happen: 1. The operator restarts the validator with a new command-line argument to -cause it to enter the `silent repair phase` at boot, where it will not make new +cause it to enter the `wen restart phase` at boot, where it will not make new blocks or vote. The validator propagates its local voted fork information to all other validators in restart. @@ -169,14 +169,14 @@ hours past the outage, it cannot join the restart this way. If enough validators failed to restart within 9 hours, then fallback to the manual, interactive `cluster restart` method. -When a validator enters restart, it uses `silent repair shred version` to avoid +When a validator enters restart, it uses `wen restart shred version` to avoid interfering with those outside the restart. 
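The temporary shred version mentioned in the renamed terminology above is a one-line derivation. A tiny sketch, assuming the `(current_shred_version + 1) % 0xffff` formula quoted in the proposal, with an illustrative function name:

```rust
/// Sketch: derive the temporary shred version used while in restart,
/// per the formula quoted in the terminology section of the proposal.
fn wen_restart_shred_version(current_shred_version: u16) -> u16 {
    ((current_shred_version as u32 + 1) % 0xffff) as u16
}

fn main() {
    assert_eq!(wen_restart_shred_version(2405), 2406);
    assert_eq!(wen_restart_shred_version(0xfffe), 0); // wraps at 0xffff
}
```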
There is slight chance that -the `silent repair shred version` would collide with the shred version after -the `silent repair phase`, but even if this rare case occurred, we plan to +the `wen restart shred version` would collide with the shred version after +the `wen restart phase`, but even if this rare case occurred, we plan to flush gossip on successful restart before entering normal validator operation. To be extra cautious, we will also filter out `RestartLastVotedForkSlots` and -`RestartHeaviestFork` in gossip if a validator is not in `silent repair phase`. +`RestartHeaviestFork` in gossip if a validator is not in `wen restart phase`. 2. **Repair ledgers up to the restart slot** @@ -287,7 +287,7 @@ After deciding heaviest block, gossip the latest picked block. We also gossip stake of received `RestartHeaviestFork` messages so that we can proceed to next step when enough validators are ready. -### Exit `silent repair phase` +### Exit `wen restart phase` 4. **Restart if everything okay, halt otherwise** @@ -324,7 +324,7 @@ sends out metrics so that people can be paged, and then halts. ## Impact -This proposal adds a new silent repair mode to validators, during this phase +This proposal adds a new wen restart mode to validators, during this phase the validators will not participate in normal cluster activities, which is the same as now. Compared to today's `cluster restart`, the new mode may mean more network bandwidth and memory on the restarting validators, but it guarantees From 7adb22b79fbde0bf579c523201b404f22f3619cc Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 20 Jul 2023 10:41:52 -0700 Subject: [PATCH 068/119] Try to make linter happy. --- proposals/0046-optimistic-cluster-restart-automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index fd5de22d..cc8c66db 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -1,6 +1,6 @@ --- simd: '0046' -title: Wen restart: optimistic cluster restart automation +title: "Wen restart" - Optimistic cluster restart automation authors: - Wen Xu (Solana Labs) category: Standard From 16e4ec8649453aac67678e481291d15dcc26f934 Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 20 Jul 2023 10:50:01 -0700 Subject: [PATCH 069/119] Make linter happy again. --- proposals/0046-optimistic-cluster-restart-automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index cc8c66db..f8bb5e3e 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -1,6 +1,6 @@ --- simd: '0046' -title: "Wen restart" - Optimistic cluster restart automation +title: Wen Restart (Optimistic cluster restart automation) authors: - Wen Xu (Solana Labs) category: Standard From 3fa3b9aefa70fdb415b403e4b7977c5db9e8e69a Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 20 Jul 2023 10:52:30 -0700 Subject: [PATCH 070/119] Back to the title linter likes. 
--- proposals/0046-optimistic-cluster-restart-automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index f8bb5e3e..25687c49 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -1,6 +1,6 @@ --- simd: '0046' -title: Wen Restart (Optimistic cluster restart automation) +title: Optimistic cluster restart automation authors: - Wen Xu (Solana Labs) category: Standard From b6fc273748fce22cdbe084b71cdfdce20ab4b18b Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 20 Jul 2023 11:12:48 -0700 Subject: [PATCH 071/119] Add cluster restart slot to the doc. --- ...46-optimistic-cluster-restart-automation.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 25687c49..05737362 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -27,14 +27,17 @@ single validator restart which does not impact the cluster. See [`cluster restart`](https://docs.solana.com/running-validator/restart-cluster) for details. -* `optimistically confirmed block`: a block which gets the votes from the -majority of the validators in a cluster (> 2/3 stake). Our algorithm tries to -guarantee that an optimistically confirmed will never be rolled back. When we -are performing `cluster restart`, we normally start from the highest -`optimistically confirmed block`, but it's also okay to start from a child of +* `cluster restart slot`: In current `cluster restart` scheme, human normally +decide on one slot for all validators to restart from. This is very often the +highest `optimistically confirmed block`, because `optimistically confirmed +block` should never be rolled back. But it's also okay to start from a child of the highest `optimistically confirmed block` as long as consensus can be reached. +* `optimistically confirmed block`: a block which gets the votes from the +majority of the validators in a cluster (> 2/3 stake). Our algorithm tries to +guarantee that an optimistically confirmed block will never be rolled back. + * `wen restart phase`: During the proposed optimistic `cluster restart` automation process, the validators in restart will first spend some time to exchange information, repair missing blocks, and finally reach consensus. The @@ -291,6 +294,9 @@ messages so that we can proceed to next step when enough validators are ready. 4. **Restart if everything okay, halt otherwise** +The main purpose in this step is to decide the `cluster restart slot` and the +actual block to restart from. + All validators in restart keep counting the number of `RestartHeaviestFork` where `received_heaviest_stake` is higher than `RESTART_STAKE_THRESHOLD`. Once a validator counts that `RESTART_STAKE_THRESHOLD` of the validators send out @@ -305,7 +311,7 @@ cluster unable to reach consensus. So bank hash needs to be checked as well. * The voted slot is equal or a child of local optimistically confirmed slot. If all checks pass, the validator immediately starts generation of snapshot at -the agreed upon slot. +the agreed upon `cluster restart slot`. 
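The exit checks added in this patch can be pictured as a small predicate. The sketch below is illustrative only: the struct reuses the message name from the proposal, but the stake bookkeeping, the `f64` percentages and the final slot comparison (which stands in for a real ancestry check) are assumptions, not the validator's actual data structures.

```rust
#[derive(Clone)]
struct RestartHeaviestFork {
    slot: u64,
    hash: [u8; 32],
    received_heaviest_stake: f64, // percent of stake the sender has seen
}

fn can_exit_restart(
    my_pick: &RestartHeaviestFork,
    received: &[(f64, RestartHeaviestFork)], // (sender stake percent, message)
    local_optimistic_slot: u64,
    restart_stake_threshold: f64, // e.g. 80.0
) -> bool {
    // Count stake from senders that themselves report enough committed stake.
    let ready_stake: f64 = received
        .iter()
        .filter(|(_, m)| m.received_heaviest_stake >= restart_stake_threshold)
        .map(|(stake, _)| *stake)
        .sum();
    if ready_stake < restart_stake_threshold {
        return false;
    }
    // Everyone must agree on the same slot and bank hash, and the picked slot
    // must not roll back local optimistic confirmation (a real check would
    // walk ancestry instead of just comparing slot numbers).
    received
        .iter()
        .all(|(_, m)| m.slot == my_pick.slot && m.hash == my_pick.hash)
        && my_pick.slot >= local_optimistic_slot
}

fn main() {
    let pick = RestartHeaviestFork {
        slot: 1_000,
        hash: [0; 32],
        received_heaviest_stake: 81.0,
    };
    let received = vec![(81.0, pick.clone())];
    assert!(can_exit_restart(&pick, &received, 998, 80.0));
}
```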
While the snapshot generation is in progress, the validator also checks to see whether a full two minutes interval passed since agreement had been reached, From 005caae3fa9c973199cb961590b50dfd220488ac Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 20 Jul 2023 19:18:28 -0700 Subject: [PATCH 072/119] Small fixes. --- ...6-optimistic-cluster-restart-automation.md | 45 +++++++++---------- 1 file changed, 22 insertions(+), 23 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 05737362..da789a5a 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -16,7 +16,8 @@ During a cluster restart following an outage, make validators enter a separate recovery protocol that uses gossip to exchange local status and automatically reach consensus on the block to restart from. Proceed to restart if validators in the restart can reach agreement, or print debug information and halt -otherwise. +otherwise. To distinguish the new restart process from other operations, we +call the new process "Wen restart". ## New Terminology @@ -28,7 +29,7 @@ single validator restart which does not impact the cluster. See for details. * `cluster restart slot`: In current `cluster restart` scheme, human normally -decide on one slot for all validators to restart from. This is very often the +decide on one block for all validators to restart from. This is very often the highest `optimistically confirmed block`, because `optimistically confirmed block` should never be rolled back. But it's also okay to start from a child of the highest `optimistically confirmed block` as long as consensus can be @@ -46,7 +47,7 @@ reached. We call this preparation phase where block production and voting are paused the `wen restart phase`. * `wen restart shred version`: right now we update `shred_version` during a -`cluster restart`, it is used to verify received shreds and filter Gossip +`cluster restart`, it is used to verify received shreds and filter gossip peers. In the proposed optimistic `cluster restart` plan, we introduce a new temporary shred version in the `wen restart phase` so validators in restart don't interfere with those not in restart. Currently this `wen restart shred @@ -128,10 +129,9 @@ When the cluster is in need of a restart, we assume validators holding at least `RESTART_STAKE_THRESHOLD` percentage of stakes will enter the restart mode. Then the following steps will happen: -1. The operator restarts the validator with a new command-line argument to -cause it to enter the `wen restart phase` at boot, where it will not make new -blocks or vote. The validator propagates its local voted fork -information to all other validators in restart. +1. The operator restarts the validator into the `wen restart phase` at boot, +where it will not make new blocks or vote. The validator propagates its local +voted fork information to all other validators in restart. 2. While aggregating local vote information from all others in restart, the validator repairs all blocks which could potentially have been optimistically @@ -149,14 +149,14 @@ hash) to restart from: See each step explained in details below. -### Silent repair mode +### Wen restart phase -1. **Gossip last vote and ancestors on that fork** +1. 
**gossip last vote and ancestors on that fork** -The main goal of this step is to propagate the last `n` ancestors of the last +The main goal of this step is to propagate most recent ancestors on the last voted fork to all others in restart. -We use a new Gossip message `RestartLastVotedForkSlots`, its fields are: +We use a new gossip message `RestartLastVotedForkSlots`, its fields are: * `last_voted_slot`: `u64` the slot last voted, this also serves as last_slot for the bit vector. @@ -167,16 +167,16 @@ sender's last voted fork. the most significant bit is always The number of ancestor slots sent is hard coded at 81000, because that's 400ms * 81000 = 9 hours, we assume that optimistic confirmation must halt -within 81k slots of the last finalized block. If a validator restarts after 9 +within 81k slots of the last confirmed block. If a validator restarts after 9 hours past the outage, it cannot join the restart this way. If enough -validators failed to restart within 9 hours, then fallback to the manual, +validators failed to restart within 9 hours, then we fallback to the manual, interactive `cluster restart` method. When a validator enters restart, it uses `wen restart shred version` to avoid -interfering with those outside the restart. There is slight chance that +interfering with those outside the restart. There is a slight chance that the `wen restart shred version` would collide with the shred version after the `wen restart phase`, but even if this rare case occurred, we plan to -flush gossip on successful restart before entering normal validator operation. +flush gossip after successful restart so it should not be a problem. To be extra cautious, we will also filter out `RestartLastVotedForkSlots` and `RestartHeaviestFork` in gossip if a validator is not in `wen restart phase`. @@ -212,11 +212,10 @@ At any point in restart, let's call percentage of validators not in restart `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART`. Any slot above this line should be repaired, while other slots can be ignored -for now. - -If +for now. But if `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` -is less than 10%, then the validators don't have to start any repairs. +is less than 10%, then the validators don't have to start any repairs. Because +the signal now is too noisy. We obviously want to repair all blocks above `OPTIMISTIC_CONFIRMED_THRESHOLD` before the restart. The validators in `MALICIOUS_SET` could lie about their @@ -248,15 +247,15 @@ block will only move from "must-have" to "ignored" as more validators join the restart, not vice versa. So the list of blocks a validator needs to repair will never grow bigger when more validators join the restart. -Once the validator gets `RestartLastVotedForkSlots``, it can draw a line which -are the "must-have" blocks. When all the "must-have" blocks are repaired and +Once the validator gets `RestartLastVotedForkSlots`, it can calculate which +blocks must be repaired. When all those "must-have" blocks are repaired and replayed, it can proceed to step 3. -3. **Gossip current heaviest fork** +3. **gossip current heaviest fork** The main goal of this step is to "vote" the heaviest fork to restart from. -We use a new Gossip message `RestartHeaviestFork`, its fields are: +We use a new gossip message `RestartHeaviestFork`, its fields are: * `slot`: `u64` slot of the picked block. * `hash`: `Hash` bank hash of the picked block. 
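At this point in the series both restart-only gossip payloads have settled into the shape used for the rest of the document, so a compact sketch of them may help. Plain std types stand in for the cluster's hash type and the compressed bit vector, and the helper that fills the 81000-slot window is illustrative, not code from the implementation.

```rust
type Hash = [u8; 32]; // stand-in for the cluster's hash type

/// Sketch of the two restart-only gossip payloads described above; the real
/// messages live in gossip CRDS and use a compressed bit vector.
struct RestartLastVotedForkSlots {
    last_voted_slot: u64,  // also the reference slot for the bit vector
    last_voted_hash: Hash, // bank hash of the last voted slot
    ancestors: Vec<bool>,  // one bit per slot in the 81000-slot window
}

struct RestartHeaviestFork {
    slot: u64,                    // tip of the locally picked heaviest fork
    hash: Hash,                   // bank hash of that tip
    stake_committed_percent: u16, // stake already seen sending this message
}

const SLOTS_PER_MESSAGE: usize = 81_000;

/// Mark which slots in the window are ancestors on the last voted fork;
/// index i stands for `last_voted_slot - i`.
fn ancestors_bitmap(last_voted_slot: u64, ancestors: &[u64]) -> Vec<bool> {
    let mut bits = vec![false; SLOTS_PER_MESSAGE];
    for &slot in ancestors {
        if slot <= last_voted_slot {
            let offset = (last_voted_slot - slot) as usize;
            if offset < SLOTS_PER_MESSAGE {
                bits[offset] = true;
            }
        }
    }
    bits
}

fn main() {
    let msg = RestartLastVotedForkSlots {
        last_voted_slot: 1_000_000,
        last_voted_hash: [0; 32],
        ancestors: ancestors_bitmap(1_000_000, &[1_000_000, 999_998, 999_990]),
    };
    let pick = RestartHeaviestFork {
        slot: msg.last_voted_slot,
        hash: msg.last_voted_hash,
        stake_committed_percent: 0,
    };
    assert!(msg.ancestors[0] && msg.ancestors[2] && msg.ancestors[10]);
    assert_eq!(pick.slot, 1_000_000);
}
```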
From 5fcbcd1fc845e7ee10c16cef04bb619103996195 Mon Sep 17 00:00:00 2001 From: Wen Date: Mon, 24 Jul 2023 11:58:20 -0700 Subject: [PATCH 073/119] Add handling for oscillating info. --- proposals/0046-optimistic-cluster-restart-automation.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index da789a5a..6fc6092b 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -348,6 +348,14 @@ Non-conforming validators could send out wrong `RestartLastVotedForkSlots` and `RestartHeaviestFork` messages to mess with `cluster restart`s, these should be included in the Slashing rules in the future. +### Discarding oscillating votes +Non-conforming validators could change their last votes back and forth, this +could lead to instability in the system. Considering that during an outage, an +operator could find out that wrong info was sent out and try to correct it. We +allow `RestartLastVotedForkSlots` be changed 3 times, after that all updates +from the validator with the same pubkey will be simply ignored. We allow +`RestartHeaviestFork` to change until the validator exits `wen restart phase`. + ## Backwards Compatibilityz This change is backward compatible with previous versions, because validators From 8f7f7528c1c5e5652b2f380329b24e610ebbc1db Mon Sep 17 00:00:00 2001 From: Wen Date: Mon, 24 Jul 2023 12:06:43 -0700 Subject: [PATCH 074/119] Make linter happy. --- proposals/0046-optimistic-cluster-restart-automation.md | 1 + 1 file changed, 1 insertion(+) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 6fc6092b..4309fb74 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -349,6 +349,7 @@ Non-conforming validators could send out wrong `RestartLastVotedForkSlots` and included in the Slashing rules in the future. ### Discarding oscillating votes + Non-conforming validators could change their last votes back and forth, this could lead to instability in the system. Considering that during an outage, an operator could find out that wrong info was sent out and try to correct it. We From 50d5bcbf522e080798cef337f3ff792c21e1a302 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 26 Jul 2023 11:37:48 -0700 Subject: [PATCH 075/119] Add epoch boundary handling. --- ...0046-optimistic-cluster-restart-automation.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 4309fb74..e919d955 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -357,6 +357,22 @@ allow `RestartLastVotedForkSlots` be changed 3 times, after that all updates from the validator with the same pubkey will be simply ignored. We allow `RestartHeaviestFork` to change until the validator exits `wen restart phase`. +### Handling multiple epochs + +Even though it's not very common that an outage happens across an epoch +boundary, we do need to prepare for this rare case. 
Because the main purpose +of `wen restart` is to make everyone reach aggrement, the following choices +are made: + +* Every validator only handles 2 epochs, any validator will discard slots +which belong to an epoch which is > 1 epoch away from its root. If a validator +has very old root so it can't proceed, it will exit and report error. + +* The stake weight of each slot is calculated using the epoch the slot is in. +If a validator is missing epoch stakes for a new epoch, it will use the epoch +stakes of its root to approximate the results, and update all calculation once +the first bank has been accepted in the new epoch. + ## Backwards Compatibilityz This change is backward compatible with previous versions, because validators From 397e98bcaecd038fcd5a8853ac86321fb94226b5 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 26 Jul 2023 11:47:46 -0700 Subject: [PATCH 076/119] Add cluster wide threshold calculation across Epoch boundary. --- proposals/0046-optimistic-cluster-restart-automation.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index e919d955..38c9380e 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -373,6 +373,10 @@ If a validator is missing epoch stakes for a new epoch, it will use the epoch stakes of its root to approximate the results, and update all calculation once the first bank has been accepted in the new epoch. +* When calculating cluster wide threshold (e.g. how many validators are in the +restart), use the stake weight in the new Epoch. If there is no bank in the new +Epoch yet, use old Epoch stakes to approximate and update later. + ## Backwards Compatibilityz This change is backward compatible with previous versions, because validators From 6190f3514076a21f5020890714de5c818712856b Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 27 Jul 2023 11:28:35 -0700 Subject: [PATCH 077/119] Update cross epoch stake selection. --- proposals/0046-optimistic-cluster-restart-automation.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 38c9380e..eb9246e6 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -374,8 +374,12 @@ stakes of its root to approximate the results, and update all calculation once the first bank has been accepted in the new epoch. * When calculating cluster wide threshold (e.g. how many validators are in the -restart), use the stake weight in the new Epoch. If there is no bank in the new -Epoch yet, use old Epoch stakes to approximate and update later. +restart), use the stake weight of the slot selected in `RestartHeaviestFork`. +If there is no bank in the new Epoch or no slot has been selected yet, use +Epoch stakes of local root bank to approximate and update later. + +* The `stake_committed_percent` in `RestartHeaviestFork` should always be +calculated using the stakes on the selected slot. ## Backwards Compatibilityz From e4e8d84a259e58689f420f1cfdbf6926b881f98e Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 27 Jul 2023 11:45:58 -0700 Subject: [PATCH 078/119] Correct mistake in description. 
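The epoch-boundary rules being added here reduce to a small fallback when weighing a slot. The sketch below assumes an illustrative fixed epoch length and a plain map of per-epoch stake; neither is part of the proposal, which only fixes the fallback behaviour.

```rust
use std::collections::HashMap;

const SLOTS_PER_EPOCH: u64 = 432_000; // illustrative value, not fixed by the proposal

fn epoch_of(slot: u64) -> u64 {
    slot / SLOTS_PER_EPOCH
}

/// Sketch of the fallback rule above: weigh a slot with its own epoch's
/// stakes when they are known, otherwise approximate with the root epoch's
/// stakes until the first bank of the new epoch is accepted. Slots more than
/// one epoch past the root are discarded.
fn stake_weight_for_slot(
    slot: u64,
    root_slot: u64,
    epoch_total_stake: &HashMap<u64, u64>,
) -> Option<u64> {
    let epoch = epoch_of(slot);
    if epoch > epoch_of(root_slot) + 1 {
        return None;
    }
    epoch_total_stake
        .get(&epoch)
        .or_else(|| epoch_total_stake.get(&epoch_of(root_slot)))
        .copied()
}

fn main() {
    let stakes = HashMap::from([(0u64, 1_000u64)]);
    // A slot in epoch 1 with no epoch-1 stakes yet falls back to the root's epoch.
    assert_eq!(stake_weight_for_slot(432_100, 431_000, &stakes), Some(1_000));
}
```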
--- proposals/0046-optimistic-cluster-restart-automation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index eb9246e6..64c714c8 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -375,8 +375,8 @@ the first bank has been accepted in the new epoch. * When calculating cluster wide threshold (e.g. how many validators are in the restart), use the stake weight of the slot selected in `RestartHeaviestFork`. -If there is no bank in the new Epoch or no slot has been selected yet, use -Epoch stakes of local root bank to approximate and update later. +If no slot has been selected yet, use Epoch stakes of local root bank to +approximate and update later. * The `stake_committed_percent` in `RestartHeaviestFork` should always be calculated using the stakes on the selected slot. From 51e81d977d94793d6b5b4fc2bba83c800b612599 Mon Sep 17 00:00:00 2001 From: Wen Date: Tue, 1 Aug 2023 11:02:59 -0500 Subject: [PATCH 079/119] Make it clear we are generating incremental snapshot. --- proposals/0046-optimistic-cluster-restart-automation.md | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 64c714c8..cb17cb0a 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -309,13 +309,10 @@ cluster unable to reach consensus. So bank hash needs to be checked as well. * The voted slot is equal or a child of local optimistically confirmed slot. -If all checks pass, the validator immediately starts generation of snapshot at -the agreed upon `cluster restart slot`. +If all checks pass, the validator immediately starts setting root and +generating an incremental snapshot at the agreed upon `cluster restart slot`. -While the snapshot generation is in progress, the validator also checks to see -whether a full two minutes interval passed since agreement had been reached, -to guarantee its `RestartHeaviestFork` message propagates to everyone, and -whether the snapshot generation is complete. When above checks pass, it then +After the snapshot generation is complete, and above checks pass, it then proceeds to issue a hard fork at the designated slot and change shred version in gossip. After that it restarts into the normal (non-restart) state. From ac0940fbb49b0cd995a43bbf46e4f217447dd07a Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 2 Aug 2023 10:39:26 -0500 Subject: [PATCH 080/119] Fix typo --- proposals/0046-optimistic-cluster-restart-automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index cb17cb0a..424aba08 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -378,7 +378,7 @@ approximate and update later. * The `stake_committed_percent` in `RestartHeaviestFork` should always be calculated using the stakes on the selected slot. 
-## Backwards Compatibilityz +## Backwards Compatibility This change is backward compatible with previous versions, because validators only enter the new mode during new restart mode which is controlled by a From 3805d7a4224cd85ffed3fdf9688297454b970df7 Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 3 Aug 2023 22:53:29 -0700 Subject: [PATCH 081/119] Add more reasoning about how HeaviestFork is picked. --- ...6-optimistic-cluster-restart-automation.md | 60 ++++++++++++------- 1 file changed, 40 insertions(+), 20 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 424aba08..b1fc0c8b 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -263,26 +263,46 @@ We use a new gossip message `RestartHeaviestFork`, its fields are: it received `RestartHeaviestFork` messages from. After receiving `RestartLastVotedForkSlots` from the validators holding stake -more than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" category, -replay all blocks and pick the heaviest fork as follows: - -1. For all blocks with more than 67% stake in `RestartLastVotedForkSlots` - messages, they must be on the heaviest fork. - -2. If a picked block has more than one child, check if the heaviest child - should be picked using the following rule: - - 1. If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child. - For example, if 80% validators are in restart, child has 42% votes, then - 42 + (100-80) = 62%, pick child. 62% is chosen instead of 67% because 5% - could make the wrong votes. - - It's okay to use 62% here because the goal is to prevent false negative - rather than false positive. If validators pick a child of optimistically - confirmed block to start from, it's okay because if 80% of the validators - all choose this block, this block will be instantly confirmed on the chain. - - 2. Otherwise stop traversing the tree and use last picked block. +more than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" +category, replay all blocks and pick the heaviest fork by traversing from +local root like this: + +1. If a block has more than 67% stake in `RestartLastVotedForkSlots` + messages, traverse down this block. + +2. Define the "must have" threshold to be 62%. If traversing to a block with + more than one child, we check for each child `vote_on_child + stake_on_validators_not_in_restart >= 62%`. If so, traverse to the child. + + For example, if 80% validators are in restart, child has 42% votes, then + 42 + (100-80) = 62%, traverse to this child. 62% is chosen instead of 67% + because 5% could make the wrong votes. + + Otherwise stop traversing the tree and use last visited block. + +To see why the above algorithm is safe, assuming one block X is optimistically +confirmed before the restart, we prove that it will always be either the block +picked or the ancestor of the block picked. + +Assume X is not picked, then: + +1. If its parent Y is on the Heaviest fork, then either a sibling of X is + chosen or no child of Y is chosen. In either case + vote_on_X + stake_on_validators_not_in_restart < 62%, otherwise X would be + picked. This contradicts with the fact X got 67% votes before the restart, + because vote_on_X should have been greater than + 67% - 5% (non-conforming) - stake_on_validators_not_in_restart. + +2. 
If its parent Y is also not on the Heaviest fork, Y should have got > 67% + of the votes before the restart as well, then we can apply the same + reasoning to Y's parent, until we find an ancestor which is on the Heaviest + fork, then the contradiction in previous paragraph applies. + +In some cases, we might pick a child with less than 67% votes before the +restart. Say a block X has child A with 43% votes and child B with 42% votes, +child A will be picked as the restart slot. Note that block X which has more +than 43% + 42% = 85% votes is the ancestor of the picked restart slot, so the +constraint that optimistically confirmed block never gets rolled back is +satisfied. After deciding heaviest block, gossip `RestartHeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X is From 0466a125c6a409a785f298898395b47deb0c6be4 Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 3 Aug 2023 22:54:51 -0700 Subject: [PATCH 082/119] Make linter happy. --- proposals/0046-optimistic-cluster-restart-automation.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index b1fc0c8b..0bafc157 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -271,7 +271,9 @@ local root like this: messages, traverse down this block. 2. Define the "must have" threshold to be 62%. If traversing to a block with - more than one child, we check for each child `vote_on_child + stake_on_validators_not_in_restart >= 62%`. If so, traverse to the child. + more than one child, we check for each child + `vote_on_child + stake_on_validators_not_in_restart >= 62%`. If so, traverse + to the child. For example, if 80% validators are in restart, child has 42% votes, then 42 + (100-80) = 62%, traverse to this child. 62% is chosen instead of 67% From 6613293b50856a13cb8d0faf71a5e21ca8155d06 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 9 Aug 2023 11:47:35 -0700 Subject: [PATCH 083/119] Change indent. --- ...6-optimistic-cluster-restart-automation.md | 237 +++++++++--------- 1 file changed, 122 insertions(+), 115 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 0bafc157..89f047bd 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -153,124 +153,130 @@ See each step explained in details below. 1. **gossip last vote and ancestors on that fork** -The main goal of this step is to propagate most recent ancestors on the last -voted fork to all others in restart. - -We use a new gossip message `RestartLastVotedForkSlots`, its fields are: - -* `last_voted_slot`: `u64` the slot last voted, this also serves as last_slot -for the bit vector. -* `last_voted_hash`: `Hash` the bank hash of the slot last voted slot. -* `ancestors`: `BitVec` compressed bit vector representing the slots on -sender's last voted fork. the most significant bit is always -`last_voted_slot`, least significant bit is `last_voted_slot-81000`. - -The number of ancestor slots sent is hard coded at 81000, because that's -400ms * 81000 = 9 hours, we assume that optimistic confirmation must halt -within 81k slots of the last confirmed block. If a validator restarts after 9 -hours past the outage, it cannot join the restart this way. 
If enough -validators failed to restart within 9 hours, then we fallback to the manual, -interactive `cluster restart` method. - -When a validator enters restart, it uses `wen restart shred version` to avoid -interfering with those outside the restart. There is a slight chance that -the `wen restart shred version` would collide with the shred version after -the `wen restart phase`, but even if this rare case occurred, we plan to -flush gossip after successful restart so it should not be a problem. - -To be extra cautious, we will also filter out `RestartLastVotedForkSlots` and -`RestartHeaviestFork` in gossip if a validator is not in `wen restart phase`. + The main goal of this step is to propagate most recent ancestors on the last + voted fork to all others in restart. + + We use a new gossip message `RestartLastVotedForkSlots`, its fields are: + + * `last_voted_slot`: `u64` the slot last voted, this also serves as + last_slot for the bit vector. + * `last_voted_hash`: `Hash` the bank hash of the slot last voted slot. + * `ancestors`: `BitVec` compressed bit vector representing the slots on + sender's last voted fork. the most significant bit is always + `last_voted_slot`, least significant bit is `last_voted_slot-81000`. + + The number of ancestor slots sent is hard coded at 81000, because that's + 400ms * 81000 = 9 hours, we assume that optimistic confirmation must halt + within 81k slots of the last confirmed block. If a validator restarts after + 9 hours past the outage, it cannot join the restart this way. If enough + validators failed to restart within 9 hours, then we fallback to the manual, + interactive `cluster restart` method. + + When a validator enters restart, it uses `wen restart shred version` to + avoid interfering with those outside the restart. There is a slight chance + that the `wen restart shred version` would collide with the shred version + after the `wen restart phase`, but even if this rare case occurred, we plan + to flush gossip after successful restart so it should not be a problem. + + To be extra cautious, we will also filter out `RestartLastVotedForkSlots` + and `RestartHeaviestFork` in gossip if a validator is not in + `wen restart phase`. 2. **Repair ledgers up to the restart slot** -The main goal of this step is to repair all blocks which could potentially be -optimistically confirmed. + The main goal of this step is to repair all blocks which could potentially + be optimistically confirmed. -We need to prevent false negative at all costs, because we can't rollback an -`optimistically confirmed block`. However, false positive is okay. Because when -we select the heaviest fork in the next step, we should see all the potential -candidates for optimistically confirmed slots, there we can count the votes and -remove some false positive cases. + We need to prevent false negative at all costs, because we can't rollback an + `optimistically confirmed block`. However, false positive is okay. Because + when we select the heaviest fork in the next step, we should see all the + potential candidates for optimistically confirmed slots, there we can count + the votes and remove some false positive cases. -However, it's also overkill to repair every block presented by others. When -`RestartLastVotedForkSlots` messages are being received and aggregated, a -validator can categorize blocks missing locally into 2 categories: must-have -and ignored. 
Depending on the stakes of validators currently in restart, some -slots with too few stake can be safely ignored, while others will be repaired. + However, it's also overkill to repair every block presented by others. When + `RestartLastVotedForkSlots` messages are being received and aggregated, a + validator can categorize blocks missing locally into 2 categories: must-have + and ignored. Depending on the stakes of validators currently in restart, + some slots with too few stake can be safely ignored, while others will be + repaired. -In the following analysis, we assume: + In the following analysis, we assume: -* `RESTART_STAKE_THRESHOLD` is 80% -* `MALICIOUS_SET` which is validators which can disobey the protocol, is 5%. + * `RESTART_STAKE_THRESHOLD` is 80% + * `MALICIOUS_SET` which is validators which can disobey the protocol, is 5%. For example, these validators can change their votes from what they previously voted on. -* `OPTIMISTIC_CONFIRMED_THRESHOLD` is 67%, which is the percentage of stake + * `OPTIMISTIC_CONFIRMED_THRESHOLD` is 67%, which is the percentage of stake required to be a `optimistically confirmed block`. -At any point in restart, let's call percentage of validators not in restart -`PERCENT_NOT_IN_RESTART`. We can draw a line at -`OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART`. - -Any slot above this line should be repaired, while other slots can be ignored -for now. But if -`OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` -is less than 10%, then the validators don't have to start any repairs. Because -the signal now is too noisy. - -We obviously want to repair all blocks above `OPTIMISTIC_CONFIRMED_THRESHOLD` -before the restart. The validators in `MALICIOUS_SET` could lie about their -votes, so we need to be conservative and lower the line accordingly. Also, -we don't know what the validators not in restart have voted, so we need to -be even more conservative and assume they voted for this block. Being -conservative means we might repair blocks which we didn't need, but we will -never miss any block we should have repaired. - -Next we illustrate the system behavior using concrete numbers: - -* 5% validators in restart: `PERCENT_NOT_IN_RESTART` is 100% - 5% = 95%. -`OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` -= 67% - 5% - 95% < 10%, so no validators would repair any block. - -* 70% validators in restart: `PERCENT_NOT_IN_RESTART` is 100% - 70% = 30%. -`OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` -= 67% - 5% - 30% = 32%, so slots with above 32% votes accumulated from -`RestartLastVotedForkSlots` would be repaired. - -* 80% validators are in restart, `PERCENT_NOT_IN_RESTART` is 100% - 80% = 20%. -`OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - `PERCENT_NOT_IN_RESTART` -= 67% - 5% - 20% = 42%, so slots with above 42% votes accumulated from -`RestartLastVotedForkSlots` would be repaired. - -From above examples, we can see the "must-have" threshold changes dynamically -depending on how many validators are in restart. The main benefit is that a -block will only move from "must-have" to "ignored" as more validators -join the restart, not vice versa. So the list of blocks a validator needs to -repair will never grow bigger when more validators join the restart. - -Once the validator gets `RestartLastVotedForkSlots`, it can calculate which -blocks must be repaired. 
When all those "must-have" blocks are repaired and -replayed, it can proceed to step 3. + At any point in restart, let's call percentage of validators not in restart + `PERCENT_NOT_IN_RESTART`. We can draw a line at + `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` + - `PERCENT_NOT_IN_RESTART`. + + Any slot above this line should be repaired, while other slots can be + ignored for now. But if + `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - + `PERCENT_NOT_IN_RESTART` is less than 10%, then the validators don't have to + start any repairs. Because the signal now is too noisy. + + We obviously want to repair all blocks above + `OPTIMISTIC_CONFIRMED_THRESHOLD` before the restart. The validators in + `MALICIOUS_SET` could lie about their votes, so we need to be conservative + and lower the line accordingly. Also, we don't know what the validators not + in restart have voted, so we need to be even more conservative and assume + they voted for this block. Being conservative means we might repair blocks + which we didn't need, but we will never miss any block we should have + repaired. + + Next we illustrate the system behavior using concrete numbers: + + * 5% validators in restart: `PERCENT_NOT_IN_RESTART` is 100% - 5% = 95%. + `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - + `PERCENT_NOT_IN_RESTART` = 67% - 5% - 95% < 10%, so no validators would + repair any block. + + * 70% validators in restart: `PERCENT_NOT_IN_RESTART` is 100% - 70% = 30%. + `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - + `PERCENT_NOT_IN_RESTART` = 67% - 5% - 30% = 32%, so slots with above 32% + votes accumulated from `RestartLastVotedForkSlots` would be repaired. + + * 80% validators are in restart, `PERCENT_NOT_IN_RESTART` is 100% - 80% = + 20%. `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - + `PERCENT_NOT_IN_RESTART` = 67% - 5% - 20% = 42%, so slots with above 42% + votes accumulated from `RestartLastVotedForkSlots` would be repaired. + + From above examples, we can see the "must-have" threshold changes + dynamically depending on how many validators are in restart. The main + benefit is that a block will only move from "must-have" to "ignored" as more + validators join the restart, not vice versa. So the list of blocks a + validator needs to repair will never grow bigger when more validators join + the restart. + + Once the validator gets `RestartLastVotedForkSlots`, it can calculate which + blocks must be repaired. When all those "must-have" blocks are repaired and + replayed, it can proceed to step 3. 3. **gossip current heaviest fork** -The main goal of this step is to "vote" the heaviest fork to restart from. + The main goal of this step is to "vote" the heaviest fork to restart from. -We use a new gossip message `RestartHeaviestFork`, its fields are: + We use a new gossip message `RestartHeaviestFork`, its fields are: -* `slot`: `u64` slot of the picked block. -* `hash`: `Hash` bank hash of the picked block. -* `stake_committed_percent`: `u16` total percentage of stakes of the validators -it received `RestartHeaviestFork` messages from. + * `slot`: `u64` slot of the picked block. + * `hash`: `Hash` bank hash of the picked block. + * `stake_committed_percent`: `u16` total percentage of stakes of the + validators it received `RestartHeaviestFork` messages from. 
-After receiving `RestartLastVotedForkSlots` from the validators holding stake -more than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" -category, replay all blocks and pick the heaviest fork by traversing from -local root like this: + After receiving `RestartLastVotedForkSlots` from the validators holding + stake more than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" + category, replay all blocks and pick the heaviest fork by traversing from + local root like this: -1. If a block has more than 67% stake in `RestartLastVotedForkSlots` + 1. If a block has more than 67% stake in `RestartLastVotedForkSlots` messages, traverse down this block. -2. Define the "must have" threshold to be 62%. If traversing to a block with + 2. Define the "must have" threshold to be 62%. If traversing to a block with more than one child, we check for each child `vote_on_child + stake_on_validators_not_in_restart >= 62%`. If so, traverse to the child. @@ -281,39 +287,40 @@ local root like this: Otherwise stop traversing the tree and use last visited block. -To see why the above algorithm is safe, assuming one block X is optimistically -confirmed before the restart, we prove that it will always be either the block -picked or the ancestor of the block picked. + To see why the above algorithm is safe, assuming one block X is + optimistically confirmed before the restart, we prove that it will always be + either the block picked or the ancestor of the block picked. -Assume X is not picked, then: + Assume X is not picked, then: -1. If its parent Y is on the Heaviest fork, then either a sibling of X is + 1. If its parent Y is on the Heaviest fork, then either a sibling of X is chosen or no child of Y is chosen. In either case vote_on_X + stake_on_validators_not_in_restart < 62%, otherwise X would be picked. This contradicts with the fact X got 67% votes before the restart, because vote_on_X should have been greater than 67% - 5% (non-conforming) - stake_on_validators_not_in_restart. -2. If its parent Y is also not on the Heaviest fork, Y should have got > 67% + 2. If its parent Y is also not on the Heaviest fork, Y should have got > 67% of the votes before the restart as well, then we can apply the same reasoning to Y's parent, until we find an ancestor which is on the Heaviest fork, then the contradiction in previous paragraph applies. -In some cases, we might pick a child with less than 67% votes before the -restart. Say a block X has child A with 43% votes and child B with 42% votes, -child A will be picked as the restart slot. Note that block X which has more -than 43% + 42% = 85% votes is the ancestor of the picked restart slot, so the -constraint that optimistically confirmed block never gets rolled back is -satisfied. + In some cases, we might pick a child with less than 67% votes before the + restart. Say a block X has child A with 43% votes and child B with 42% + votes, child A will be picked as the restart slot. Note that block X which + has more than 43% + 42% = 85% votes is the ancestor of the picked restart + slot, so the constraint that optimistically confirmed block never gets + rolled back is satisfied. -After deciding heaviest block, gossip -`RestartHeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X is -the latest picked block. We also gossip stake of received `RestartHeaviestFork` -messages so that we can proceed to next step when enough validators are ready. 
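The traversal spelled out above can be condensed into a short walk over the block tree. The sketch below collapses the 67% rule and the 62% child rule into one check per branch and picks the most voted qualifying child; the tree representation, the names and the `f64` percentages are assumptions for illustration only.

```rust
use std::collections::HashMap;

/// Sketch of the heaviest-fork walk described above: follow a child only if
/// its observed votes plus the stake not yet in restart could still reach
/// the 62% line, otherwise stop at the last visited block.
fn pick_restart_slot(
    root: u64,
    children: &HashMap<u64, Vec<u64>>,
    vote_percent: &HashMap<u64, f64>, // stake % listing the slot in RestartLastVotedForkSlots
    percent_not_in_restart: f64,
) -> u64 {
    const MUST_HAVE_PERCENT: f64 = 62.0;
    let pct = |s: u64| vote_percent.get(&s).copied().unwrap_or(0.0);
    let mut current = root;
    loop {
        let kids = match children.get(&current) {
            Some(k) if !k.is_empty() => k,
            _ => return current, // leaf: nothing further to consider
        };
        // Among children that clear the line, follow the most voted one.
        let next = kids
            .iter()
            .copied()
            .filter(|&c| pct(c) + percent_not_in_restart >= MUST_HAVE_PERCENT)
            .max_by(|&a, &b| pct(a).partial_cmp(&pct(b)).unwrap());
        match next {
            Some(child) => current = child,
            None => return current, // no child qualifies: stop here
        }
    }
}

fn main() {
    // Mirrors the example discussed above: a parent with children at 43% and
    // 42% of the votes while 80% of stake is in restart (20% unaccounted for).
    let children = HashMap::from([(10u64, vec![11u64, 12u64])]);
    let votes = HashMap::from([(10u64, 85.0), (11u64, 43.0), (12u64, 42.0)]);
    assert_eq!(pick_restart_slot(10, &children, &votes, 20.0), 11);
}
```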
+ After deciding heaviest block, gossip + `RestartHeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X + is the latest picked block. We also gossip stake of received + `RestartHeaviestFork` messages so that we can proceed to next step when + enough validators are ready. ### Exit `wen restart phase` -4. **Restart if everything okay, halt otherwise** +**Restart if everything okay, halt otherwise** The main purpose in this step is to decide the `cluster restart slot` and the actual block to restart from. From dd60570388165d7a2869e6ea172a2713790562a7 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 9 Aug 2023 11:49:52 -0700 Subject: [PATCH 084/119] Make linter happy. --- proposals/0046-optimistic-cluster-restart-automation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 89f047bd..cbe521ca 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -211,8 +211,8 @@ See each step explained in details below. At any point in restart, let's call percentage of validators not in restart `PERCENT_NOT_IN_RESTART`. We can draw a line at - `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - - `PERCENT_NOT_IN_RESTART`. + `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - + `PERCENT_NOT_IN_RESTART`. Any slot above this line should be repaired, while other slots can be ignored for now. But if From b415733dc2b3b1cd0d912e3b0ad3b39146560a38 Mon Sep 17 00:00:00 2001 From: Wen Date: Wed, 9 Aug 2023 12:19:45 -0700 Subject: [PATCH 085/119] Rework the proof. --- .../0046-optimistic-cluster-restart-automation.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index cbe521ca..ce03e944 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -274,7 +274,9 @@ See each step explained in details below. local root like this: 1. If a block has more than 67% stake in `RestartLastVotedForkSlots` - messages, traverse down this block. + messages, traverse down this block. Note that votes for children do count + towards the parent. So being a parent on the chosen fork means the parent + will not be rolled back either. 2. Define the "must have" threshold to be 62%. If traversing to a block with more than one child, we check for each child @@ -300,10 +302,10 @@ See each step explained in details below. because vote_on_X should have been greater than 67% - 5% (non-conforming) - stake_on_validators_not_in_restart. - 2. If its parent Y is also not on the Heaviest fork, Y should have got > 67% - of the votes before the restart as well, then we can apply the same - reasoning to Y's parent, until we find an ancestor which is on the Heaviest - fork, then the contradiction in previous paragraph applies. + 2. If its parent Y is also not on the Heaviest fork, all ancestors of X + should have > 67% of the votes before restart. We should be able to find + the last ancestor of X on the Heaviest fork, then contradiction in previous + paragraph applies. In some cases, we might pick a child with less than 67% votes before the restart. 
Say a block X has child A with 43% votes and child B with 42% From acb041f5daf213013f396f54b5cd90d281739bd8 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Mon, 14 Aug 2023 10:05:50 -0700 Subject: [PATCH 086/119] Update proposals/0046-optimistic-cluster-restart-automation.md Co-authored-by: mvines --- proposals/0046-optimistic-cluster-restart-automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index ce03e944..5e66537c 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -179,7 +179,7 @@ See each step explained in details below. to flush gossip after successful restart so it should not be a problem. To be extra cautious, we will also filter out `RestartLastVotedForkSlots` - and `RestartHeaviestFork` in gossip if a validator is not in + and `RestartHeaviestFork` (described later) in gossip if a validator is not in `wen restart phase`. 2. **Repair ledgers up to the restart slot** From d85ce345c2a9e10adb9db125b1470566b3567d60 Mon Sep 17 00:00:00 2001 From: Wen Date: Mon, 14 Aug 2023 10:23:02 -0700 Subject: [PATCH 087/119] Explain 81000 slots and issue hard fork before snapshot generation. --- ...6-optimistic-cluster-restart-automation.md | 22 ++++++++++--------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index ce03e944..11f4a3ba 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -166,11 +166,12 @@ See each step explained in details below. `last_voted_slot`, least significant bit is `last_voted_slot-81000`. The number of ancestor slots sent is hard coded at 81000, because that's - 400ms * 81000 = 9 hours, we assume that optimistic confirmation must halt - within 81k slots of the last confirmed block. If a validator restarts after - 9 hours past the outage, it cannot join the restart this way. If enough - validators failed to restart within 9 hours, then we fallback to the manual, - interactive `cluster restart` method. + 400ms * 81000 = 9 hours, we assume that most validator administrators + would have noticed an outage within 9 hours, and the optimistic + confirmation must have halted within 81k slots of the last confirmed block. + If a validator restarts after 9 hours past the outage, it cannot join the + restart this way. If enough validators failed to restart within 9 hours, + then we fallback to the manual, interactive `cluster restart` method. When a validator enters restart, it uses `wen restart shred version` to avoid interfering with those outside the restart. There is a slight chance @@ -340,12 +341,13 @@ cluster unable to reach consensus. So bank hash needs to be checked as well. * The voted slot is equal or a child of local optimistically confirmed slot. -If all checks pass, the validator immediately starts setting root and -generating an incremental snapshot at the agreed upon `cluster restart slot`. +If all checks pass, the validator immediately starts setting root and issue a +hard fork at the designated slot. Then it will start generating an incremental +snapshot at the agreed upon `cluster restart slot`. This way the hard fork will +be included in the newly generated snapshot. 
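The ordering stressed in this patch (hard fork and root first, incremental snapshot second) is easy to lose in prose, so a placeholder sketch follows. None of the function names correspond to real validator APIs; they only mark the sequence described above.

```rust
/// Sketch of the exit ordering stressed above: the hard fork is registered
/// and the root moved before the incremental snapshot is taken, so the
/// snapshot already carries the hard fork.
struct AgreedRestartSlot {
    slot: u64,
    hash: [u8; 32],
}

fn exit_wen_restart(agreed: &AgreedRestartSlot) {
    register_hard_fork(agreed.slot); // 1. hard fork at the agreed slot
    set_root(agreed.slot); // 2. root moves up to the restart slot
    generate_incremental_snapshot(agreed.slot); // 3. snapshot now includes the fork
    // 4. re-enter normal operation, equivalent to restarting with the
    //    --wait-for-supermajority logic on the agreed slot and bank hash.
    wait_for_supermajority(agreed.slot, agreed.hash);
}

// Empty placeholders so the sketch compiles; the real logic lives elsewhere.
fn register_hard_fork(_slot: u64) {}
fn set_root(_slot: u64) {}
fn generate_incremental_snapshot(_slot: u64) {}
fn wait_for_supermajority(_slot: u64, _hash: [u8; 32]) {}

fn main() {
    exit_wen_restart(&AgreedRestartSlot { slot: 0, hash: [0; 32] });
}
```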
-After the snapshot generation is complete, and above checks pass, it then -proceeds to issue a hard fork at the designated slot and change shred version -in gossip. After that it restarts into the normal (non-restart) state. +After the snapshot generation is complete, it then changes the shred version +in gossip before restarting into the normal (non-restart) state. Before a validator enters restart, it will still propagate `RestartLastVotedForkSlots` and `RestartHeaviestFork` messages in gossip. After From eb99eaccb3d54bd55022316ad2c0b654773b0d34 Mon Sep 17 00:00:00 2001 From: Wen Date: Mon, 14 Aug 2023 11:54:52 -0700 Subject: [PATCH 088/119] Use a hard limit for must-have blocks and accept new RestartLastVotedForkSlots. --- ...6-optimistic-cluster-restart-automation.md | 77 ++++--------------- 1 file changed, 15 insertions(+), 62 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 88230f87..f2f79297 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -197,62 +197,13 @@ See each step explained in details below. However, it's also overkill to repair every block presented by others. When `RestartLastVotedForkSlots` messages are being received and aggregated, a validator can categorize blocks missing locally into 2 categories: must-have - and ignored. Depending on the stakes of validators currently in restart, - some slots with too few stake can be safely ignored, while others will be - repaired. - - In the following analysis, we assume: - - * `RESTART_STAKE_THRESHOLD` is 80% - * `MALICIOUS_SET` which is validators which can disobey the protocol, is 5%. - For example, these validators can change their votes from what they - previously voted on. - * `OPTIMISTIC_CONFIRMED_THRESHOLD` is 67%, which is the percentage of stake - required to be a `optimistically confirmed block`. - - At any point in restart, let's call percentage of validators not in restart - `PERCENT_NOT_IN_RESTART`. We can draw a line at - `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - - `PERCENT_NOT_IN_RESTART`. - - Any slot above this line should be repaired, while other slots can be - ignored for now. But if - `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - - `PERCENT_NOT_IN_RESTART` is less than 10%, then the validators don't have to - start any repairs. Because the signal now is too noisy. - - We obviously want to repair all blocks above - `OPTIMISTIC_CONFIRMED_THRESHOLD` before the restart. The validators in - `MALICIOUS_SET` could lie about their votes, so we need to be conservative - and lower the line accordingly. Also, we don't know what the validators not - in restart have voted, so we need to be even more conservative and assume - they voted for this block. Being conservative means we might repair blocks - which we didn't need, but we will never miss any block we should have - repaired. - - Next we illustrate the system behavior using concrete numbers: - - * 5% validators in restart: `PERCENT_NOT_IN_RESTART` is 100% - 5% = 95%. - `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - - `PERCENT_NOT_IN_RESTART` = 67% - 5% - 95% < 10%, so no validators would - repair any block. - - * 70% validators in restart: `PERCENT_NOT_IN_RESTART` is 100% - 70% = 30%. 
- `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - - `PERCENT_NOT_IN_RESTART` = 67% - 5% - 30% = 32%, so slots with above 32% - votes accumulated from `RestartLastVotedForkSlots` would be repaired. - - * 80% validators are in restart, `PERCENT_NOT_IN_RESTART` is 100% - 80% = - 20%. `OPTIMISTIC_CONFIRMED_THRESHOLD` - `MALICIOUS_SET` - - `PERCENT_NOT_IN_RESTART` = 67% - 5% - 20% = 42%, so slots with above 42% - votes accumulated from `RestartLastVotedForkSlots` would be repaired. - - From above examples, we can see the "must-have" threshold changes - dynamically depending on how many validators are in restart. The main - benefit is that a block will only move from "must-have" to "ignored" as more - validators join the restart, not vice versa. So the list of blocks a - validator needs to repair will never grow bigger when more validators join - the restart. + and ignored. + + We set the line at 42% when 80% join the restart, it's possible that + different validators see different 80%, so their must-have blocks might + be different, but in reality this case should be rare. Whenever some block + gets to 42%, repair could be started, because when more validators join the + restart, this number will only go up but will never go down. Once the validator gets `RestartLastVotedForkSlots`, it can calculate which blocks must be repaired. When all those "must-have" blocks are repaired and @@ -378,14 +329,16 @@ Non-conforming validators could send out wrong `RestartLastVotedForkSlots` and `RestartHeaviestFork` messages to mess with `cluster restart`s, these should be included in the Slashing rules in the future. -### Discarding oscillating votes +### Handling oscillating votes Non-conforming validators could change their last votes back and forth, this -could lead to instability in the system. Considering that during an outage, an -operator could find out that wrong info was sent out and try to correct it. We -allow `RestartLastVotedForkSlots` be changed 3 times, after that all updates -from the validator with the same pubkey will be simply ignored. We allow -`RestartHeaviestFork` to change until the validator exits `wen restart phase`. +could lead to instability in the system. But our algorithm already built in +safety buffers so < 5% of non-conforming validators will not change the +conclusion. Considering that during an outage, an operator could find out that +wrong info was sent out and try to correct it. We allow +`RestartLastVotedForkSlots` be changed and new values will be used. But we will +log warning messages about this. We allow `RestartHeaviestFork` to change until +the validator exits `wen restart phase`. ### Handling multiple epochs From 30a4d389ef1c54f0881d5d2e0c4fa984be900ca9 Mon Sep 17 00:00:00 2001 From: Wen Date: Thu, 17 Aug 2023 21:16:01 -0700 Subject: [PATCH 089/119] Reverse the order of bits to be consistent with EpochSlots. --- proposals/0046-optimistic-cluster-restart-automation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index f2f79297..cde8ab59 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -162,8 +162,8 @@ See each step explained in details below. last_slot for the bit vector. * `last_voted_hash`: `Hash` the bank hash of the slot last voted slot. * `ancestors`: `BitVec` compressed bit vector representing the slots on - sender's last voted fork. 
the most significant bit is always - `last_voted_slot`, least significant bit is `last_voted_slot-81000`. + sender's last voted fork. the least significant bit is always + `last_voted_slot`, most significant bit is `last_voted_slot-81000`. The number of ancestor slots sent is hard coded at 81000, because that's 400ms * 81000 = 9 hours, we assume that most validator administrators From 28373b011e57880975a3b1f1515bd859d64efc07 Mon Sep 17 00:00:00 2001 From: Wen Date: Fri, 8 Sep 2023 13:13:28 -0700 Subject: [PATCH 090/119] Update restart descriptions. --- ...046-optimistic-cluster-restart-automation.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index cde8ab59..38f04108 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -292,13 +292,14 @@ cluster unable to reach consensus. So bank hash needs to be checked as well. * The voted slot is equal or a child of local optimistically confirmed slot. -If all checks pass, the validator immediately starts setting root and issue a -hard fork at the designated slot. Then it will start generating an incremental -snapshot at the agreed upon `cluster restart slot`. This way the hard fork will -be included in the newly generated snapshot. +If all checks pass, the validator immediately starts add a hard fork at the +designated slot and update the root. Then it will start generating an +incremental snapshot at the agreed upon `cluster restart slot`. This way the +hard fork will be included in the newly generated snapshot. -After the snapshot generation is complete, it then changes the shred version -in gossip before restarting into the normal (non-restart) state. +After the snapshot generation is complete, it then automatically executes the +--wait_for_supermajority logic with the agreed upon slot and hash, the +validator operators do not need to change the command line arguments here. Before a validator enters restart, it will still propagate `RestartLastVotedForkSlots` and `RestartHeaviestFork` messages in gossip. After @@ -308,6 +309,10 @@ propagate gossip messages for restart. If any of the checks fails, the validator immediately prints out all debug info, sends out metrics so that people can be paged, and then halts. +After the restart is complete, validators will automatically function in normal +mode, the validator operators can update the command line arguments to update +shred_version and remove --wen_restart at a convenient time later. + ## Impact This proposal adds a new wen restart mode to validators, during this phase From 9558fcbbe393adaa3ba76d215b5a8f39f693e4be Mon Sep 17 00:00:00 2001 From: Wen Date: Sat, 18 Nov 2023 15:36:06 -0800 Subject: [PATCH 091/119] Update 81k to 64k. --- ...6-optimistic-cluster-restart-automation.md | 21 ++++++++++--------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 38f04108..bd5412ba 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -161,16 +161,17 @@ See each step explained in details below. * `last_voted_slot`: `u64` the slot last voted, this also serves as last_slot for the bit vector. * `last_voted_hash`: `Hash` the bank hash of the slot last voted slot. 
- * `ancestors`: `BitVec` compressed bit vector representing the slots on - sender's last voted fork. the least significant bit is always - `last_voted_slot`, most significant bit is `last_voted_slot-81000`. - - The number of ancestor slots sent is hard coded at 81000, because that's - 400ms * 81000 = 9 hours, we assume that most validator administrators - would have noticed an outage within 9 hours, and the optimistic - confirmation must have halted within 81k slots of the last confirmed block. - If a validator restarts after 9 hours past the outage, it cannot join the - restart this way. If enough validators failed to restart within 9 hours, + * `ancestors`: `Run-length encoding` compressed bit vector representing the + slots on sender's last voted fork. the least significant bit is always + `last_voted_slot`, most significant bit is `last_voted_slot-65535`. + + The number of ancestor slots sent is hard coded at 65535, because that's + 400ms * 65535 = 7.3 hours, we assume that most validator administrators + would have noticed an outage within 7 hours, and the optimistic + confirmation must have halted within 64k slots of the last confirmed block. + Also 65535 bits nicely fits into u16, which makes encoding more compact. + If a validator restarts after 7 hours past the outage, it cannot join the + restart this way. If enough validators failed to restart within 7 hours, then we fallback to the manual, interactive `cluster restart` method. When a validator enters restart, it uses `wen restart shred version` to From 1e9ea7454cd02eb4eef8f4b9006ea2469234f772 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Tue, 12 Mar 2024 11:21:35 -0700 Subject: [PATCH 092/119] Update the find heaviest algorithm and proof. --- ...6-optimistic-cluster-restart-automation.md | 70 +++++++++---------- 1 file changed, 33 insertions(+), 37 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index bd5412ba..ce7cac5b 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -223,49 +223,45 @@ See each step explained in details below. After receiving `RestartLastVotedForkSlots` from the validators holding stake more than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" - category, replay all blocks and pick the heaviest fork by traversing from - local root like this: + category, replay all blocks and pick the heaviest fork like this: - 1. If a block has more than 67% stake in `RestartLastVotedForkSlots` - messages, traverse down this block. Note that votes for children do count - towards the parent. So being a parent on the chosen fork means the parent - will not be rolled back either. + 1. Calculate the threshold for a block to be on the heaviest fork, the + heaviest fork should have all blocks with possibility to be optimistically + confirmed. The number is `67% - 5% - stake_on_validators_not_in_restart`. - 2. Define the "must have" threshold to be 62%. If traversing to a block with - more than one child, we check for each child - `vote_on_child + stake_on_validators_not_in_restart >= 62%`. If so, traverse - to the child. + For example, if 80% validators are in restart, the number would be + `67% - 5% - (100-80)% = 42%`. If 90% validators are in restart, the number + would be `67% - 5% - (100-90)% = 52%`. 
- For example, if 80% validators are in restart, child has 42% votes, then - 42 + (100-80) = 62%, traverse to this child. 62% is chosen instead of 67% - because 5% could make the wrong votes. + 2. Sort all blocks passing the calculated threshold, and verify that they + form a single chain. If any block doesn't satisfy the following contraints: + 1. Its stake is no greater than the stake of its parent block. + 2. If it's the first block in the list, its parent block is the current + root. Otherwise its parent block is the block immediately ahead of it + in the list. - Otherwise stop traversing the tree and use last visited block. + If any block does not satisfy any of the constraints, print the first + offending block and exit. + + 3. If the list is empty, then output local root as the HeaviestFork. + Otherwise output the last block in the list as the HeavistFork. To see why the above algorithm is safe, assuming one block X is - optimistically confirmed before the restart, we prove that it will always be - either the block picked or the ancestor of the block picked. - - Assume X is not picked, then: - - 1. If its parent Y is on the Heaviest fork, then either a sibling of X is - chosen or no child of Y is chosen. In either case - vote_on_X + stake_on_validators_not_in_restart < 62%, otherwise X would be - picked. This contradicts with the fact X got 67% votes before the restart, - because vote_on_X should have been greater than - 67% - 5% (non-conforming) - stake_on_validators_not_in_restart. - - 2. If its parent Y is also not on the Heaviest fork, all ancestors of X - should have > 67% of the votes before restart. We should be able to find - the last ancestor of X on the Heaviest fork, then contradiction in previous - paragraph applies. - - In some cases, we might pick a child with less than 67% votes before the - restart. Say a block X has child A with 43% votes and child B with 42% - votes, child A will be picked as the restart slot. Note that block X which - has more than 43% + 42% = 85% votes is the ancestor of the picked restart - slot, so the constraint that optimistically confirmed block never gets - rolled back is satisfied. + optimistically confirmed before the restart, it would have `67%` stake, + discounting `5%` malicious and people not participating in wen_restart, it + should have at least `67% - 5% - stake_on_validators_not_in_restart` stake, + so it should pass the threshold and be in the list. + + Also, any block in the list should only have at most one child in the list. + Let's use `X` to denote `stake_on_validators_not_in_restart` for brevity. + Assuming a block has child `A` and `B` both on the list, the children's + combined stake would be `2 * (67% - 5% - X)`. Because we only allow one + RestartHeaviestFork per pubkey, even if the any validator can put both + children in its RestartHeaviestFork, the children's total stake should be + less than `100% - 5% - X`. We can caculate that if `134% - 2 * X < 95% - X`, + then `X > 39%`, this is not possible when we have at least 80% of the + validators in restart. So we prove any block in the list can have at most + one child in the list by contradiction. After deciding heaviest block, gossip `RestartHeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X From 4ceebbb5010908224012d23069254594f172a789 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Tue, 12 Mar 2024 14:06:24 -0700 Subject: [PATCH 093/119] Update the proof for heaviest fork, we don't need to check stakes. 
--- ...6-optimistic-cluster-restart-automation.md | 44 +++++++++++++------ 1 file changed, 31 insertions(+), 13 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index ce7cac5b..ba00e862 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -234,25 +234,28 @@ See each step explained in details below. would be `67% - 5% - (100-90)% = 52%`. 2. Sort all blocks passing the calculated threshold, and verify that they - form a single chain. If any block doesn't satisfy the following contraints: - 1. Its stake is no greater than the stake of its parent block. - 2. If it's the first block in the list, its parent block is the current - root. Otherwise its parent block is the block immediately ahead of it - in the list. + form a single chain. If it's the first block in the list, its parent block + should be the local root. Otherwise its parent block should be the one + immediately ahead of it in the list. - If any block does not satisfy any of the constraints, print the first - offending block and exit. + If any block does not satisfy above constraint, print the first offending + block and exit. 3. If the list is empty, then output local root as the HeaviestFork. Otherwise output the last block in the list as the HeavistFork. - To see why the above algorithm is safe, assuming one block X is - optimistically confirmed before the restart, it would have `67%` stake, - discounting `5%` malicious and people not participating in wen_restart, it - should have at least `67% - 5% - stake_on_validators_not_in_restart` stake, - so it should pass the threshold and be in the list. + To see why the above algorithm is safe, we will prove that: + + 1. Any block optimistically confirmed before the restart will always be + on the list: + + Assume block X is one such block, it would have `67%` stake, discounting + `5%` non-conforming and people not participating in wen_restart, it should + have at least `67% - 5% - stake_on_validators_not_in_restart` stake, so it + should pass the threshold and be in the list. + + 2. Any block in the list should only have at most one child in the list: - Also, any block in the list should only have at most one child in the list. Let's use `X` to denote `stake_on_validators_not_in_restart` for brevity. Assuming a block has child `A` and `B` both on the list, the children's combined stake would be `2 * (67% - 5% - X)`. Because we only allow one @@ -263,6 +266,21 @@ See each step explained in details below. validators in restart. So we prove any block in the list can have at most one child in the list by contradiction. + 3. If a block not optimistically confirmed before the restart is on the + list, it can only be at the end of the list and none of its siblings are + on the list. + + Let's say block Y is the first not optimistically confirmed block on the + list, its parent Z is confirmed and on the list. We know from above point + that Z can only have 1 child on the list, therefore Y must be at the end + of the list while its siblings are not on the list. + + Even if the last block A on the list may not be optimistically confirmed, + it already has at least `42% - 5% = 37%` stake, with no competing sibling + B getting more than `42%` stake. This is equal to the case where `5%` stake + jumped ship from fork B to fork A, 80% of the cluster can switch to fork B + if that turns out to be the heavist fork. 
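For readers who prefer code to prose, a rough sketch of the fork-selection rule walked through in the hunks above, assuming per-slot stake has already been aggregated from `RestartLastVotedForkSlots`; the names and containers are illustrative and are not the validator's actual data structures.

```rust
use std::collections::HashMap;

type Slot = u64;

/// Sketch of the selection rule described above: keep every slot whose
/// aggregated stake clears `67% - 5% - stake_not_in_restart`, require the
/// survivors to form a single chain growing from the local root, then take
/// the last survivor (or fall back to the root itself).
fn pick_restart_slot(
    root: Slot,
    stake_percent_per_slot: &HashMap<Slot, f64>, // aggregated from RestartLastVotedForkSlots
    parent: &HashMap<Slot, Slot>,                // local fork structure after repair
    stake_not_in_restart_percent: f64,
) -> Result<Slot, String> {
    let threshold = 67.0 - 5.0 - stake_not_in_restart_percent;
    let mut candidates: Vec<Slot> = stake_percent_per_slot
        .iter()
        .filter(|(slot, stake)| **slot > root && **stake >= threshold)
        .map(|(slot, _)| *slot)
        .collect();
    candidates.sort_unstable();

    // Each candidate must chain off the previous one, starting from the root;
    // otherwise report the first offending block and bail out.
    let mut prev = root;
    for &slot in &candidates {
        match parent.get(&slot) {
            Some(&p) if p == prev => prev = slot,
            _ => return Err(format!("slot {slot} does not chain from {prev}")),
        }
    }
    Ok(candidates.last().copied().unwrap_or(root))
}
```

Per the text above, the fallback to the local root covers the case where no block clears the threshold.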
+ After deciding heaviest block, gossip `RestartHeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X is the latest picked block. We also gossip stake of received From f0d933cf5c813ee70959f1511c1e22c7e7dfe8ef Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Tue, 12 Mar 2024 14:09:48 -0700 Subject: [PATCH 094/119] Update notations in proof. --- .../0046-optimistic-cluster-restart-automation.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index ba00e862..a1839e67 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -249,7 +249,7 @@ See each step explained in details below. 1. Any block optimistically confirmed before the restart will always be on the list: - Assume block X is one such block, it would have `67%` stake, discounting + Assume block A is one such block, it would have `67%` stake, discounting `5%` non-conforming and people not participating in wen_restart, it should have at least `67% - 5% - stake_on_validators_not_in_restart` stake, so it should pass the threshold and be in the list. @@ -270,15 +270,15 @@ See each step explained in details below. list, it can only be at the end of the list and none of its siblings are on the list. - Let's say block Y is the first not optimistically confirmed block on the - list, its parent Z is confirmed and on the list. We know from above point - that Z can only have 1 child on the list, therefore Y must be at the end + Let's say block D is the first not optimistically confirmed block on the + list, its parent E is confirmed and on the list. We know from above point + that E can only have 1 child on the list, therefore D must be at the end of the list while its siblings are not on the list. - Even if the last block A on the list may not be optimistically confirmed, + Even if the last block D on the list may not be optimistically confirmed, it already has at least `42% - 5% = 37%` stake, with no competing sibling - B getting more than `42%` stake. This is equal to the case where `5%` stake - jumped ship from fork B to fork A, 80% of the cluster can switch to fork B + E getting more than `42%` stake. This is equal to the case where `5%` stake + jumped ship from fork E to fork D, 80% of the cluster can switch to fork E if that turns out to be the heavist fork. After deciding heaviest block, gossip From 821456dfc05df168efba0fdffb723f206e3c4acd Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Thu, 23 May 2024 13:49:16 -0700 Subject: [PATCH 095/119] Explain the 42% constant. --- .../0046-optimistic-cluster-restart-automation.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index a1839e67..955a49a5 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -200,11 +200,14 @@ See each step explained in details below. validator can categorize blocks missing locally into 2 categories: must-have and ignored. - We set the line at 42% when 80% join the restart, it's possible that - different validators see different 80%, so their must-have blocks might - be different, but in reality this case should be rare. 
Whenever some block - gets to 42%, repair could be started, because when more validators join the - restart, this number will only go up but will never go down. + We set the line at 42%. Because we require that at least 80% join the restart, + so any block with less than 67% - (100 - 80)% - 5% = 42% can never be + optimistically confirmed before the restart. + + It's possible that different validators see different 80%, so their must-have + blocks might be different, but in reality this case should be rare. Whenever + some block gets to 42%, repair could be started, because when more validators + join the restart, this number will only go up but will never go down. Once the validator gets `RestartLastVotedForkSlots`, it can calculate which blocks must be repaired. When all those "must-have" blocks are repaired and From 891d5eafc4fafe852c776fb9ae17aa6c9179757a Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 24 May 2024 10:57:58 -0700 Subject: [PATCH 096/119] Explain 5% as well. --- .../0046-optimistic-cluster-restart-automation.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 955a49a5..8de9e7b9 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -149,6 +149,10 @@ hash) to restart from: See each step explained in details below. +We assume that as most 5% of the validators in restart can be malicious or +contains bugs, this number is consistent with other algorithms in the consensus +protocol. We call these `non-conforming` validators. + ### Wen restart phase 1. **gossip last vote and ancestors on that fork** @@ -200,9 +204,10 @@ See each step explained in details below. validator can categorize blocks missing locally into 2 categories: must-have and ignored. - We set the line at 42%. Because we require that at least 80% join the restart, - so any block with less than 67% - (100 - 80)% - 5% = 42% can never be - optimistically confirmed before the restart. + We repairs all blocks with no less than 42% stake. The number is + `67% - 5% - stake_on_validators_not_in_restart`. We require that at least 80% + join the restart, any block with less than 67% - (100 - 80)% - 5% = 42% can + never be optimistically confirmed before the restart. It's possible that different validators see different 80%, so their must-have blocks might be different, but in reality this case should be rare. Whenever From 613384d9e0e27dbff26bfe0955b907d6a2e438dc Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 24 May 2024 13:26:10 -0700 Subject: [PATCH 097/119] Small fixes. --- ...046-optimistic-cluster-restart-automation.md | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 8de9e7b9..babb71b5 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -77,6 +77,12 @@ around while the validators automatically try to reach consensus, the validator will halt and print debug information if anything goes wrong, and operators can set up their own monitoring accordingly. +However, there are many ways an automatic restart can go wrong, mostly due to +unforseen situations or software bugs. 
To make things really safe, we apply +multiple checks durinng the restart, if any check fails, the automatic restart +is halted and debugging info printed, waiting for human intervention. Therefore +we say this is an optimistic cluster restart procedure. + ## Alternatives Considered ### Automatically detect outage and perform `cluster restart` @@ -269,7 +275,7 @@ protocol. We call these `non-conforming` validators. combined stake would be `2 * (67% - 5% - X)`. Because we only allow one RestartHeaviestFork per pubkey, even if the any validator can put both children in its RestartHeaviestFork, the children's total stake should be - less than `100% - 5% - X`. We can caculate that if `134% - 2 * X < 95% - X`, + less than `100% - 5% - X`. We can calculate that if `134% - 2 * X < 95% - X`, then `X > 39%`, this is not possible when we have at least 80% of the validators in restart. So we prove any block in the list can have at most one child in the list by contradiction. @@ -284,10 +290,11 @@ protocol. We call these `non-conforming` validators. of the list while its siblings are not on the list. Even if the last block D on the list may not be optimistically confirmed, - it already has at least `42% - 5% = 37%` stake, with no competing sibling - E getting more than `42%` stake. This is equal to the case where `5%` stake - jumped ship from fork E to fork D, 80% of the cluster can switch to fork E - if that turns out to be the heavist fork. + it already has at least `42% - 5% = 37%` stake. Say F is its sibling with + the most stake, F can only have less than `42%` stake because it's not on + the list. So picking D over F is equal to the case where `5%` stake + switched from fork F to fork D, 80% of the cluster can switch to fork D + if that turns out to be the heaviest fork. After deciding heaviest block, gossip `RestartHeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X From 75306bd52efa7511aa1332aa4da39d9668a8124f Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Sat, 3 Aug 2024 10:16:53 -0700 Subject: [PATCH 098/119] Update stake calculation when crossing Epoch boundaries. --- ...6-optimistic-cluster-restart-automation.md | 23 ++++++++++++------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index babb71b5..7a6d281d 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -387,14 +387,21 @@ which belong to an epoch which is > 1 epoch away from its root. If a validator has very old root so it can't proceed, it will exit and report error. * The stake weight of each slot is calculated using the epoch the slot is in. -If a validator is missing epoch stakes for a new epoch, it will use the epoch -stakes of its root to approximate the results, and update all calculation once -the first bank has been accepted in the new epoch. - -* When calculating cluster wide threshold (e.g. how many validators are in the -restart), use the stake weight of the slot selected in `RestartHeaviestFork`. -If no slot has been selected yet, use Epoch stakes of local root bank to -approximate and update later. +Because right now epoch stakes are calculated 1 epoch ahead of time, and we +only handle outages spanning 7 hours, the local root bank should have the +epoch stakes for all epochs we need. 
+ +* When aggregating `RestartLastVotedForkSlots`, for any epoch with at least one +slot X having > 42% stake, calculate the stake of active validators in this +epoch. Only exit this stage if all epochs reaching the above bar has > 80% +stake. This is a bit restrictive, but it guarantees that whichever slot we +select for HeaviestFork, we have enough validators in the restart. Note that +the epoch containing local root should always be considered, because root +should have > 42% stake. + +* When aggregating `RestartHeaviestFork`, use the stake weight of the slot +selected in `RestartHeaviestFork`. If others don't agree with us on the same +slot, we won't be able to proceed anyway. * The `stake_committed_percent` in `RestartHeaviestFork` should always be calculated using the stakes on the selected slot. From f65c6aa0490497502ce67aeea0f466fcfb794aaa Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Mon, 19 Aug 2024 13:41:21 -0700 Subject: [PATCH 099/119] Update exit criteria when crossing Epoch boundary. --- .../0046-optimistic-cluster-restart-automation.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 7a6d281d..534d63ca 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -391,13 +391,13 @@ Because right now epoch stakes are calculated 1 epoch ahead of time, and we only handle outages spanning 7 hours, the local root bank should have the epoch stakes for all epochs we need. -* When aggregating `RestartLastVotedForkSlots`, for any epoch with at least one -slot X having > 42% stake, calculate the stake of active validators in this -epoch. Only exit this stage if all epochs reaching the above bar has > 80% -stake. This is a bit restrictive, but it guarantees that whichever slot we -select for HeaviestFork, we have enough validators in the restart. Note that -the epoch containing local root should always be considered, because root -should have > 42% stake. +* When aggregating `RestartLastVotedForkSlots`, for any epoch with validators +voting for any slot in this epoch having at least 33% stake, calculate the +stake of active validators in this epoch. Only exit this stage if all epochs +reaching the above bar has > 80% stake. This is a bit restrictive, but it +guarantees that whichever slot we select for HeaviestFork, we have enough +validators in the restart. Note that the epoch containing local root should +always be considered, because root should have > 33% stake. * When aggregating `RestartHeaviestFork`, use the stake weight of the slot selected in `RestartHeaviestFork`. If others don't agree with us on the same From bf5529f4a26088451ae09a058e126d7a4a5d3c0a Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Wed, 4 Sep 2024 11:17:31 -0700 Subject: [PATCH 100/119] Add RestartHeaviestFork round 2. --- ...6-optimistic-cluster-restart-automation.md | 45 +++++++++++++++---- 1 file changed, 37 insertions(+), 8 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 534d63ca..833f7858 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -224,7 +224,7 @@ protocol. We call these `non-conforming` validators. 
blocks must be repaired. When all those "must-have" blocks are repaired and replayed, it can proceed to step 3. -3. **gossip current heaviest fork** +3. **gossip current heaviest fork (round 1)** The main goal of this step is to "vote" the heaviest fork to restart from. @@ -302,6 +302,37 @@ protocol. We call these `non-conforming` validators. `RestartHeaviestFork` messages so that we can proceed to next step when enough validators are ready. +4. **gossip current heaviest fork(round 2)** + +The above steps should converge on one restart slot if there is no duplicate +block and everyone has the same set of `RestartLastVotedForkSlots`. However, +in the rare case that many validators joined the restart when the almost 80% +of the cluster has already joined, it's possible that different validators +see a different set of validators with 80% stake, so they may make different +choices in `RestartHeaviestFork`. + +We add another round of `RestartHeaviestFork` to solve this problem. If the +first round of `RestartHeaviestFork` fails to reach conclusion, and there is +no hash mismatch on any block, then we start second round of `RestartHeaviestFork` +based solely on the first round of `RestartHeaviestFork`. Everyone will +select the common ancestor of any block with more than 5% stake in the +first round, then broadcast again. If there is still no block with more than +75% stake, then the algorithm fails and halts. Otherwise it proceeds. + +To see why this is safe, we will prove that: +1. Any block which was optimistically confirmed must be an ancestor of the +block selected in the first round. + +No matter which 80% of the cluster is selected, our previous analysis holds. + +2. If more than 5% of stake agree on a block to which my picked block is not +an ancestor, that means the block I picked wasn't optimistically confirmed. + +Because we can only have less than 5% non-conforming validators, even if one +honest validator saw that the block I picked should not be optimistically +confirmed with the set of 80% stake it sees, this is enough evidence that +we should retrace back to an ancestor of the block I picked previously. + ### Exit `wen restart phase` **Restart if everything okay, halt otherwise** @@ -367,13 +398,11 @@ included in the Slashing rules in the future. ### Handling oscillating votes Non-conforming validators could change their last votes back and forth, this -could lead to instability in the system. But our algorithm already built in -safety buffers so < 5% of non-conforming validators will not change the -conclusion. Considering that during an outage, an operator could find out that -wrong info was sent out and try to correct it. We allow -`RestartLastVotedForkSlots` be changed and new values will be used. But we will -log warning messages about this. We allow `RestartHeaviestFork` to change until -the validator exits `wen restart phase`. +could lead to instability in the system. We forbid any change of slot or hash +in `RestartLastVotedForkSlots` or `RestartHeaviestFork`, everyone will stick +with the first value received, and discrepancies will be recorded in the proto +file for later slashing. We do allow the slot and hash to change in different +rounds of `RestartHeaviestFork` though. ### Handling multiple epochs From 6fb1cf7519c6a3267bb4fe58c6ef1b6e988df8dc Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Wed, 4 Sep 2024 11:25:51 -0700 Subject: [PATCH 101/119] Make linter happy. 
--- proposals/0046-optimistic-cluster-restart-automation.md | 1 + 1 file changed, 1 insertion(+) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 833f7858..f85cd207 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -320,6 +320,7 @@ first round, then broadcast again. If there is still no block with more than 75% stake, then the algorithm fails and halts. Otherwise it proceeds. To see why this is safe, we will prove that: + 1. Any block which was optimistically confirmed must be an ancestor of the block selected in the first round. From 9baa635f4edf5ea2b5f811d78a146b1dd7ffcc0d Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Wed, 4 Sep 2024 16:00:21 -0700 Subject: [PATCH 102/119] Use round 0 and round 1 instead of round 1 and 2. --- proposals/0046-optimistic-cluster-restart-automation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index f85cd207..bd22546f 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -224,7 +224,7 @@ protocol. We call these `non-conforming` validators. blocks must be repaired. When all those "must-have" blocks are repaired and replayed, it can proceed to step 3. -3. **gossip current heaviest fork (round 1)** +3. **gossip current heaviest fork (round 0)** The main goal of this step is to "vote" the heaviest fork to restart from. @@ -302,7 +302,7 @@ protocol. We call these `non-conforming` validators. `RestartHeaviestFork` messages so that we can proceed to next step when enough validators are ready. -4. **gossip current heaviest fork(round 2)** +4. **gossip current heaviest fork(round 1)** The above steps should converge on one restart slot if there is no duplicate block and everyone has the same set of `RestartLastVotedForkSlots`. However, From 4f5469c1ec802237db7339bceb3b52bd4c74e6d8 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Thu, 12 Sep 2024 11:24:04 -0700 Subject: [PATCH 103/119] Replace previous HeaviestFork stage with a leader based design. --- ...6-optimistic-cluster-restart-automation.md | 125 ++++++------------ 1 file changed, 37 insertions(+), 88 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index bd22546f..19514674 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -224,16 +224,7 @@ protocol. We call these `non-conforming` validators. blocks must be repaired. When all those "must-have" blocks are repaired and replayed, it can proceed to step 3. -3. **gossip current heaviest fork (round 0)** - - The main goal of this step is to "vote" the heaviest fork to restart from. - - We use a new gossip message `RestartHeaviestFork`, its fields are: - - * `slot`: `u64` slot of the picked block. - * `hash`: `Hash` bank hash of the picked block. - * `stake_committed_percent`: `u16` total percentage of stakes of the - validators it received `RestartHeaviestFork` messages from. +3. 
**Calculate heaviest fork** After receiving `RestartLastVotedForkSlots` from the validators holding stake more than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" @@ -296,84 +287,50 @@ protocol. We call these `non-conforming` validators. switched from fork F to fork D, 80% of the cluster can switch to fork D if that turns out to be the heaviest fork. - After deciding heaviest block, gossip - `RestartHeaviestFork(X.slot, X.hash, committed_stake_percent)` out, where X - is the latest picked block. We also gossip stake of received - `RestartHeaviestFork` messages so that we can proceed to next step when - enough validators are ready. - -4. **gossip current heaviest fork(round 1)** - -The above steps should converge on one restart slot if there is no duplicate -block and everyone has the same set of `RestartLastVotedForkSlots`. However, -in the rare case that many validators joined the restart when the almost 80% -of the cluster has already joined, it's possible that different validators -see a different set of validators with 80% stake, so they may make different -choices in `RestartHeaviestFork`. - -We add another round of `RestartHeaviestFork` to solve this problem. If the -first round of `RestartHeaviestFork` fails to reach conclusion, and there is -no hash mismatch on any block, then we start second round of `RestartHeaviestFork` -based solely on the first round of `RestartHeaviestFork`. Everyone will -select the common ancestor of any block with more than 5% stake in the -first round, then broadcast again. If there is still no block with more than -75% stake, then the algorithm fails and halts. Otherwise it proceeds. +4. **Verify the heaviest fork of the leader** -To see why this is safe, we will prove that: + While everyone will calculate its own heaviest fork in previous step, only one + leader specified on command line will send out its heaviest fork via Gossip. + Everyone else will check and accept the choice from the leader only. -1. Any block which was optimistically confirmed must be an ancestor of the -block selected in the first round. - -No matter which 80% of the cluster is selected, our previous analysis holds. - -2. If more than 5% of stake agree on a block to which my picked block is not -an ancestor, that means the block I picked wasn't optimistically confirmed. - -Because we can only have less than 5% non-conforming validators, even if one -honest validator saw that the block I picked should not be optimistically -confirmed with the set of 80% stake it sees, this is enough evidence that -we should retrace back to an ancestor of the block I picked previously. + We use a new gossip message `RestartHeaviestFork`, its fields are: -### Exit `wen restart phase` + * `slot`: `u64` slot of the picked block. + * `hash`: `Hash` bank hash of the picked block. -**Restart if everything okay, halt otherwise** + After deciding heaviest block, the leader gossip + `RestartHeaviestFork(X.slot, X.hash)` out, where X is the latest picked block. + The leader will stay up until manually restarted by its operator. -The main purpose in this step is to decide the `cluster restart slot` and the -actual block to restart from. + Non-leader validator will discard `RestartHeaviestFork` sent by everyone else. + Upon receiving the heaviest fork from the leader, it will perform the + following checks: -All validators in restart keep counting the number of `RestartHeaviestFork` -where `received_heaviest_stake` is higher than `RESTART_STAKE_THRESHOLD`. 
Once -a validator counts that `RESTART_STAKE_THRESHOLD` of the validators send out -`RestartHeaviestFork` where `received_heaviest_stake` is higher than -`RESTART_STAKE_THRESHOLD`, it starts the following checks: + 1. If the bank selected is missing locally, repair this slot and all slots with + higher stake. -* Whether all `RestartHeaviestFork` have the same slot and same bank Hash. -Because validators are only sending slots instead of bank hashes in -`RestartLastVotedForkSlots`, it's possible that a duplicate block can make the -cluster unable to reach consensus. So bank hash needs to be checked as well. + 2. Check that the bankhash of selected slot matches the data locally. -* The voted slot is equal or a child of local optimistically confirmed slot. + 3. Verify that the selected fork contains local root is on the same fork as + local heaviest fork. -If all checks pass, the validator immediately starts add a hard fork at the -designated slot and update the root. Then it will start generating an -incremental snapshot at the agreed upon `cluster restart slot`. This way the -hard fork will be included in the newly generated snapshot. + If any of the above repair or check fails, exit with error message, the leader + may have made a mistake and this needs manual intervention. -After the snapshot generation is complete, it then automatically executes the ---wait_for_supermajority logic with the agreed upon slot and hash, the -validator operators do not need to change the command line arguments here. +5. **Generate incremental snapshot and exit** -Before a validator enters restart, it will still propagate -`RestartLastVotedForkSlots` and `RestartHeaviestFork` messages in gossip. After -the restart,its shred_version will be updated so it will no longer send or -propagate gossip messages for restart. +If the previous step succeeds, the validator immediately starts adding a hard +fork at the designated slot and perform set root. Then it will start generating +an incremental snapshot at the agreed upon `cluster restart slot`. This way the +hard fork will be included in the newly generated snapshot. After snapshot +generation completes, the `--wait_for_supermajority` args with correct shred +version, restart slot, and expected bankhash will be printed to the logs. -If any of the checks fails, the validator immediately prints out all debug info, -sends out metrics so that people can be paged, and then halts. +After the snapshot generation is complete, a non leader then exits with exit +code `200` to indicate work is complete. -After the restart is complete, validators will automatically function in normal -mode, the validator operators can update the command line arguments to update -shred_version and remove --wen_restart at a convenient time later. +A leader will stay up as long as possible to make sure any later comers get the +`RestartHeaviestFork` message. ## Impact @@ -389,12 +346,12 @@ operators don't need to manually generate and download snapshots again. The two added gossip messages `RestartLastVotedForkSlots` and `RestartHeaviestFork` will only be sent and processed when the validator is restarted in the new proposed optimistic `cluster restart` mode. They will also -be filtered out if a validator is not in this mode. So random validator\ +be filtered out if a validator is not in this mode. So random validator restarting in the new mode will not bring extra burden to the system. 
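A minimal sketch of the gossip filtering rule stated in the paragraph above, assuming a simplified record type; the real CRDS definitions in the validator are richer than this stand-in.

```rust
/// Illustrative stand-ins for the two restart-only gossip records discussed
/// above; only the variants matter for this sketch.
enum GossipRecord {
    RestartLastVotedForkSlots { slot: u64 },
    RestartHeaviestFork { slot: u64 },
    Other,
}

/// Drop restart-only records whenever the local node is not in wen restart
/// phase, so validators outside a restart never process them.
fn should_process(record: &GossipRecord, in_wen_restart: bool) -> bool {
    match record {
        GossipRecord::RestartLastVotedForkSlots { .. }
        | GossipRecord::RestartHeaviestFork { .. } => in_wen_restart,
        GossipRecord::Other => true,
    }
}
```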
-Non-conforming validators could send out wrong `RestartLastVotedForkSlots` and -`RestartHeaviestFork` messages to mess with `cluster restart`s, these should be -included in the Slashing rules in the future. +Non-conforming validators could send out wrong `RestartLastVotedForkSlots` +messages to mess with `cluster restart`s, these should be included in the +Slashing rules in the future. ### Handling oscillating votes @@ -402,8 +359,7 @@ Non-conforming validators could change their last votes back and forth, this could lead to instability in the system. We forbid any change of slot or hash in `RestartLastVotedForkSlots` or `RestartHeaviestFork`, everyone will stick with the first value received, and discrepancies will be recorded in the proto -file for later slashing. We do allow the slot and hash to change in different -rounds of `RestartHeaviestFork` though. +file for later slashing. ### Handling multiple epochs @@ -429,13 +385,6 @@ guarantees that whichever slot we select for HeaviestFork, we have enough validators in the restart. Note that the epoch containing local root should always be considered, because root should have > 33% stake. -* When aggregating `RestartHeaviestFork`, use the stake weight of the slot -selected in `RestartHeaviestFork`. If others don't agree with us on the same -slot, we won't be able to proceed anyway. - -* The `stake_committed_percent` in `RestartHeaviestFork` should always be -calculated using the stakes on the selected slot. - ## Backwards Compatibility This change is backward compatible with previous versions, because validators From 22dfc5790bda569d299ac6a195beabe6acd7cb4d Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Thu, 12 Sep 2024 11:52:41 -0700 Subject: [PATCH 104/119] Update the abstract as well. --- proposals/0046-optimistic-cluster-restart-automation.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 19514674..eb35a751 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -144,10 +144,12 @@ validator repairs all blocks which could potentially have been optimistically confirmed. 3. After repair is complete, the validator counts votes on each fork and -sends out local heaviest fork. +computes local heaviest fork. -4. Each validator counts if enough nodes can agree on one block (same slot and -hash) to restart from: +4. A leader which is configured on command line sends out its heaviest fork +to everyone. + +5. Each validator verifies that the leader's choice is reasonable: 1. If yes, proceed and restart From 560dd4dab5ae90e78ccd599e9c74680e873ef85f Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Thu, 12 Sep 2024 12:05:46 -0700 Subject: [PATCH 105/119] Update wording. --- proposals/0046-optimistic-cluster-restart-automation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index eb35a751..0df94a84 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -146,8 +146,8 @@ confirmed. 3. After repair is complete, the validator counts votes on each fork and computes local heaviest fork. -4. 
A leader which is configured on command line sends out its heaviest fork -to everyone. +4. A leader which is configured on everyone's command line sends out its +heaviest fork to everyone. 5. Each validator verifies that the leader's choice is reasonable: From 683a43a6bb0dc29055831258903e1003460ec8eb Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Thu, 12 Sep 2024 12:06:33 -0700 Subject: [PATCH 106/119] Update company info. --- proposals/0046-optimistic-cluster-restart-automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 0df94a84..db1e73cc 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -2,7 +2,7 @@ simd: '0046' title: Optimistic cluster restart automation authors: - - Wen Xu (Solana Labs) + - Wen Xu (Anza) category: Standard type: Core status: Draft From a722bc6ab9f8daaf9751b1758465c16b743a856b Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Thu, 12 Sep 2024 12:24:02 -0700 Subject: [PATCH 107/119] Update the exit condition of step 2. --- ...0046-optimistic-cluster-restart-automation.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index db1e73cc..f8fdc358 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -217,14 +217,14 @@ protocol. We call these `non-conforming` validators. join the restart, any block with less than 67% - (100 - 80)% - 5% = 42% can never be optimistically confirmed before the restart. - It's possible that different validators see different 80%, so their must-have - blocks might be different, but in reality this case should be rare. Whenever - some block gets to 42%, repair could be started, because when more validators - join the restart, this number will only go up but will never go down. - - Once the validator gets `RestartLastVotedForkSlots`, it can calculate which - blocks must be repaired. When all those "must-have" blocks are repaired and - replayed, it can proceed to step 3. + It's possible that different validators see different 80%, so their + must-have blocks might be different, but there will be another repair round + in the final step so this is fine. Whenever some block gets to 42%, repair + could be started, because when more validators join the restart, this number + will only go up but will never go down. + + When a validator gets `RestartLastVotedForkSlots` from 80% of the stake, and + all those "must-have" blocks are repaired, it can proceed to next step. 3. **Calculate heaviest fork** From 4e531bbbd43319c739130cd5fc22d725d4980f54 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Thu, 12 Sep 2024 12:40:17 -0700 Subject: [PATCH 108/119] Clarify step 4. 
--- ...6-optimistic-cluster-restart-automation.md | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index f8fdc358..a23dc304 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -291,9 +291,9 @@ protocol. We call these `non-conforming` validators. 4. **Verify the heaviest fork of the leader** - While everyone will calculate its own heaviest fork in previous step, only one - leader specified on command line will send out its heaviest fork via Gossip. - Everyone else will check and accept the choice from the leader only. + While everyone will calculate its own heaviest fork in previous step, only + one leader specified on command line will send out its heaviest fork via + Gossip. Everyone else will check and accept the choice from the leader only. We use a new gossip message `RestartHeaviestFork`, its fields are: @@ -301,20 +301,20 @@ protocol. We call these `non-conforming` validators. * `hash`: `Hash` bank hash of the picked block. After deciding heaviest block, the leader gossip - `RestartHeaviestFork(X.slot, X.hash)` out, where X is the latest picked block. - The leader will stay up until manually restarted by its operator. + `RestartHeaviestFork(X.slot, X.hash)` out, where X is the block the leader + picked locally in previous step. The leader will stay up until manually + restarted by its operator. - Non-leader validator will discard `RestartHeaviestFork` sent by everyone else. - Upon receiving the heaviest fork from the leader, it will perform the - following checks: + For every non-leader validator, it will perform the following actions on the + heaviest fork sent by the leader: - 1. If the bank selected is missing locally, repair this slot and all slots with - higher stake. + 1. If the bank selected is missing locally, repair this slot and all slots + with higher stake. 2. Check that the bankhash of selected slot matches the data locally. - 3. Verify that the selected fork contains local root is on the same fork as - local heaviest fork. + 3. Verify that the selected fork contains local root, and that its local + heaviest fork slot is on the same fork as the leader's choice. If any of the above repair or check fails, exit with error message, the leader may have made a mistake and this needs manual intervention. From 4ac10f29c0cffb006cdd0ac424f146900b0c5136 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Thu, 12 Sep 2024 12:43:56 -0700 Subject: [PATCH 109/119] Fix typo. --- proposals/0046-optimistic-cluster-restart-automation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index a23dc304..cd654f97 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -331,8 +331,8 @@ version, restart slot, and expected bankhash will be printed to the logs. After the snapshot generation is complete, a non leader then exits with exit code `200` to indicate work is complete. -A leader will stay up as long as possible to make sure any later comers get the -`RestartHeaviestFork` message. +A leader will stay up until restarted by the operator to make sure any late +comers get the `RestartHeaviestFork` message. 
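A rough illustration of the operator-facing behaviour in step 5 above, assuming hypothetical helper names; the real log wording is not specified here, only that the restart parameters are surfaced and a non-leader exits with code 200.

```rust
use std::process;

/// Illustrative only: surface the agreed-upon restart parameters in the
/// spirit of the `--wait_for_supermajority` hint described above, then
/// signal completion. The real log format may differ.
fn finish_wen_restart(restart_slot: u64, bank_hash: &str, shred_version: u16, is_leader: bool) {
    println!("wen restart done: slot {restart_slot}, bank hash {bank_hash}, shred version {shred_version}");
    if !is_leader {
        // A non-leader exits with code 200 to indicate its work is complete,
        // as described in step 5 above.
        process::exit(200);
    }
    // The leader stays up so late comers can still receive RestartHeaviestFork.
}
```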
## Impact From cfbcb5bcef2ccb5fab56ec76b294d012302c0107 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Sat, 21 Sep 2024 08:39:54 +0800 Subject: [PATCH 110/119] Rename the leader to coordinator. Add the final HeaviestFork aggregation. --- ...6-optimistic-cluster-restart-automation.md | 42 +++++++++++-------- 1 file changed, 24 insertions(+), 18 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index cd654f97..775a3fdc 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -146,10 +146,10 @@ confirmed. 3. After repair is complete, the validator counts votes on each fork and computes local heaviest fork. -4. A leader which is configured on everyone's command line sends out its +4. A coordinator which is configured on everyone's command line sends out its heaviest fork to everyone. -5. Each validator verifies that the leader's choice is reasonable: +5. Each validator verifies that the coordinator's choice is reasonable: 1. If yes, proceed and restart @@ -289,24 +289,25 @@ protocol. We call these `non-conforming` validators. switched from fork F to fork D, 80% of the cluster can switch to fork D if that turns out to be the heaviest fork. -4. **Verify the heaviest fork of the leader** +4. **Verify the heaviest fork of the coordinator** While everyone will calculate its own heaviest fork in previous step, only - one leader specified on command line will send out its heaviest fork via - Gossip. Everyone else will check and accept the choice from the leader only. + one coordinator specified on command line will send out its heaviest fork + via Gossip. Everyone else will check and accept the choice from the + coordinator only. We use a new gossip message `RestartHeaviestFork`, its fields are: * `slot`: `u64` slot of the picked block. * `hash`: `Hash` bank hash of the picked block. - After deciding heaviest block, the leader gossip - `RestartHeaviestFork(X.slot, X.hash)` out, where X is the block the leader - picked locally in previous step. The leader will stay up until manually - restarted by its operator. + After deciding heaviest block, the coordinator gossip + `RestartHeaviestFork(X.slot, X.hash)` out, where X is the block the + coordinator picked locally in previous step. The coordinator will stay up + until manually restarted by its operator. - For every non-leader validator, it will perform the following actions on the - heaviest fork sent by the leader: + For every non-coordinator validator, it will perform the following actions + on the heaviest fork sent by the coordinator: 1. If the bank selected is missing locally, repair this slot and all slots with higher stake. @@ -314,10 +315,13 @@ protocol. We call these `non-conforming` validators. 2. Check that the bankhash of selected slot matches the data locally. 3. Verify that the selected fork contains local root, and that its local - heaviest fork slot is on the same fork as the leader's choice. + heaviest fork slot is on the same fork as the coordinator's choice. - If any of the above repair or check fails, exit with error message, the leader - may have made a mistake and this needs manual intervention. + If any of the above repair or check fails, exit with error message, the + coordinator may have made a mistake and this needs manual intervention. 
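To make the three non-coordinator checks above concrete, a small sketch with placeholder types for the local ledger view; it returns errors rather than exiting, and none of these names come from the actual implementation.

```rust
use std::collections::HashMap;

type Slot = u64;
type Hash = [u8; 32];

/// Placeholder view of the local ledger, just enough for this sketch.
struct LocalForks {
    root: Slot,
    my_heaviest: Slot,
    bank_hashes: HashMap<Slot, Hash>,
    parents: HashMap<Slot, Slot>,
}

impl LocalForks {
    /// Walk parent links to decide whether `anc` is `slot` or one of its ancestors.
    fn is_ancestor(&self, anc: Slot, mut slot: Slot) -> bool {
        loop {
            if slot == anc {
                return true;
            }
            match self.parents.get(&slot) {
                Some(&p) => slot = p,
                None => return false,
            }
        }
    }
}

/// The three non-coordinator checks listed above, returning an error string
/// instead of exiting so the caller decides how to surface failures.
fn verify_coordinator_choice(
    local: &LocalForks,
    chosen_slot: Slot,
    chosen_hash: Hash,
) -> Result<(), String> {
    // 1. The chosen bank must be present locally (repair happens before this).
    let local_hash = local
        .bank_hashes
        .get(&chosen_slot)
        .ok_or_else(|| format!("slot {chosen_slot} still missing locally"))?;
    // 2. Its bank hash must match what the coordinator advertised.
    if *local_hash != chosen_hash {
        return Err(format!("bank hash mismatch at slot {chosen_slot}"));
    }
    // 3. The chosen fork must contain the local root, and the local heaviest
    //    fork slot must be on the same fork as the coordinator's choice.
    if !local.is_ancestor(local.root, chosen_slot) {
        return Err("coordinator's fork does not contain the local root".into());
    }
    if !(local.is_ancestor(local.my_heaviest, chosen_slot)
        || local.is_ancestor(chosen_slot, local.my_heaviest))
    {
        return Err("local heaviest fork is not on the coordinator's fork".into());
    }
    Ok(())
}
```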
+ + When exiting this step, no matter what a non-coordinator validator chooses, + it will send a `RestartHeaviestFork` back to leader to report its status. 5. **Generate incremental snapshot and exit** @@ -328,11 +332,13 @@ hard fork will be included in the newly generated snapshot. After snapshot generation completes, the `--wait_for_supermajority` args with correct shred version, restart slot, and expected bankhash will be printed to the logs. -After the snapshot generation is complete, a non leader then exits with exit -code `200` to indicate work is complete. +After the snapshot generation is complete, a non coordinator then exits with +exit code `200` to indicate work is complete. -A leader will stay up until restarted by the operator to make sure any late -comers get the `RestartHeaviestFork` message. +A coordinator will stay up until restarted by the operator to make sure any +late comers get the `RestartHeaviestFork` message. It also aggregates the +`RestartHeaviestFork` messages sent by the non-coordinators to report on the +status of the cluster. ## Impact From eb81566dc5c04b0608d525db33859fa5493560de Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Sat, 5 Oct 2024 14:04:38 -0700 Subject: [PATCH 111/119] Fix the correctness proof. --- .../0046-optimistic-cluster-restart-automation.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 775a3fdc..34d08f42 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -266,12 +266,13 @@ protocol. We call these `non-conforming` validators. Let's use `X` to denote `stake_on_validators_not_in_restart` for brevity. Assuming a block has child `A` and `B` both on the list, the children's combined stake would be `2 * (67% - 5% - X)`. Because we only allow one - RestartHeaviestFork per pubkey, even if the any validator can put both - children in its RestartHeaviestFork, the children's total stake should be - less than `100% - 5% - X`. We can calculate that if `134% - 2 * X < 95% - X`, - then `X > 39%`, this is not possible when we have at least 80% of the - validators in restart. So we prove any block in the list can have at most - one child in the list by contradiction. + RestartHeaviestFork per pubkey, every conforming validator should select + either `A` or `B`, even if the non-conforming validators could select both. + So the children's total stake should be less than `100% + 5% - X`. We can + calculate that if `134% - 2 * X < 100% + 5% - X`, then `X > 29%`, this is + not possible when we have at least 80% of the validators in restart. So we + prove any block in the list can have at most one child in the list by + contradiction. 3. If a block not optimistically confirmed before the restart is on the list, it can only be at the end of the list and none of its siblings are From a9ac86cb78031cec3133106d47eefdd29c1b2ae0 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Sat, 5 Oct 2024 14:31:38 -0700 Subject: [PATCH 112/119] Fix the correctness proof. 
--- .../0046-optimistic-cluster-restart-automation.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 34d08f42..10c42356 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -266,13 +266,12 @@ protocol. We call these `non-conforming` validators. Let's use `X` to denote `stake_on_validators_not_in_restart` for brevity. Assuming a block has child `A` and `B` both on the list, the children's combined stake would be `2 * (67% - 5% - X)`. Because we only allow one - RestartHeaviestFork per pubkey, every conforming validator should select - either `A` or `B`, even if the non-conforming validators could select both. - So the children's total stake should be less than `100% + 5% - X`. We can - calculate that if `134% - 2 * X < 100% + 5% - X`, then `X > 29%`, this is - not possible when we have at least 80% of the validators in restart. So we - prove any block in the list can have at most one child in the list by - contradiction. + RestartHeaviestFork per pubkey, every validator should select either `A` + or `B`, it's easy to find and filter out vialators who selected both. So the + children's total stake should be less than `100% - X`. We can calculate that + if `124% - 2 * X < 100% - X`, then `X > 24%`, this is not possible when we + have at least 80% of the validators in restart. So we prove any block in the + list can have at most one child in the list by contradiction. 3. If a block not optimistically confirmed before the restart is on the list, it can only be at the end of the list and none of its siblings are From b8185eb4dff83fc24e1a6f951b3225351e60735c Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Tue, 22 Oct 2024 09:48:34 -0700 Subject: [PATCH 113/119] Clarify that we pick the slot first then replay to get hash. --- proposals/0046-optimistic-cluster-restart-automation.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 10c42356..11c5d497 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -230,7 +230,7 @@ protocol. We call these `non-conforming` validators. After receiving `RestartLastVotedForkSlots` from the validators holding stake more than `RESTART_STAKE_THRESHOLD` and repairing slots in "must-have" - category, replay all blocks and pick the heaviest fork like this: + category, pick the heaviest fork like this: 1. Calculate the threshold for a block to be on the heaviest fork, the heaviest fork should have all blocks with possibility to be optimistically @@ -289,6 +289,9 @@ protocol. We call these `non-conforming` validators. switched from fork F to fork D, 80% of the cluster can switch to fork D if that turns out to be the heaviest fork. + After picking the appropriate slot, replay the block and all its ancestors + to get the bankhash for the picked slot. + 4. 
**Verify the heaviest fork of the coordinator** While everyone will calculate its own heaviest fork in previous step, only From fac38b682c40536be763535c211285117b489114 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Tue, 22 Oct 2024 09:57:50 -0700 Subject: [PATCH 114/119] Change status to Review --- proposals/0046-optimistic-cluster-restart-automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 11c5d497..e60f03e0 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -5,7 +5,7 @@ authors: - Wen Xu (Anza) category: Standard type: Core -status: Draft +status: Review created: 2023-04-07 feature: (fill in with feature tracking issues once accepted) --- From 7c9d0984e09a1355ff7d423b28bc97d515d87250 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 1 Nov 2024 20:33:00 -0700 Subject: [PATCH 115/119] Some small fixes. --- ...6-optimistic-cluster-restart-automation.md | 120 +++++++++--------- 1 file changed, 62 insertions(+), 58 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index e60f03e0..f1ab9fb3 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -13,7 +13,7 @@ feature: (fill in with feature tracking issues once accepted) ## Summary During a cluster restart following an outage, make validators enter a separate -recovery protocol that uses gossip to exchange local status and automatically +recovery protocol that uses Gossip to exchange local status and automatically reach consensus on the block to restart from. Proceed to restart if validators in the restart can reach agreement, or print debug information and halt otherwise. To distinguish the new restart process from other operations, we @@ -47,7 +47,7 @@ reached. We call this preparation phase where block production and voting are paused the `wen restart phase`. * `wen restart shred version`: right now we update `shred_version` during a -`cluster restart`, it is used to verify received shreds and filter gossip +`cluster restart`, it is used to verify received shreds and filter Gossip peers. In the proposed optimistic `cluster restart` plan, we introduce a new temporary shred version in the `wen restart phase` so validators in restart don't interfere with those not in restart. Currently this `wen restart shred @@ -57,7 +57,7 @@ version` is calculated using `(current_shred_version + 1) % 0xffff`. restart so they can make decision for the whole cluster. If everything works perfect, we only need 2/3 of the total stake. However, validators could die or perform abnormally, so we currently set the `RESTART_STAKE_THRESHOLD` at -80%, which is the same as now. +80%, which is the same as what we use now for `--wait_for_supermajority`. ## Motivation @@ -98,17 +98,17 @@ we can get human's attention. And it doesn't solve the cases where new binary is needed. So for now we still plan to have human in the loop. After we gain more experience with the restart approach in this proposal, we -may slowly try to automate more parts to improve cluster reliability. +may slowly try to make the process more automatic to improve reliability. 
-### Use gossip and consensus to figure out restart slot before the restart +### Use Gossip and consensus to figure out restart slot before the restart The main difference between this and the current restart proposal is this alternative tries to make the cluster automatically enter restart preparation phase without human intervention. While getting humans out of the loop improves recovery speed, there are -concerns about recovery gossip messages interfering with normal gossip -messages, and automatically start a new message in gossip seems risky. +concerns about recovery Gossip messages interfering with normal Gossip +messages, and automatically start a new message in Gossip seems risky. ### Automatically reduce block production in an outage @@ -143,8 +143,8 @@ voted fork information to all other validators in restart. validator repairs all blocks which could potentially have been optimistically confirmed. -3. After repair is complete, the validator counts votes on each fork and -computes local heaviest fork. +3. After enough validators are in restart and repair is complete, the validator +counts votes on each fork and computes local heaviest fork. 4. A coordinator which is configured on everyone's command line sends out its heaviest fork to everyone. @@ -163,12 +163,12 @@ protocol. We call these `non-conforming` validators. ### Wen restart phase -1. **gossip last vote and ancestors on that fork** +1. **Gossip last vote and ancestors on that fork** The main goal of this step is to propagate most recent ancestors on the last voted fork to all others in restart. - We use a new gossip message `RestartLastVotedForkSlots`, its fields are: + We use a new Gossip message `RestartLastVotedForkSlots`, its fields are: * `last_voted_slot`: `u64` the slot last voted, this also serves as last_slot for the bit vector. @@ -177,24 +177,28 @@ protocol. We call these `non-conforming` validators. slots on sender's last voted fork. the least significant bit is always `last_voted_slot`, most significant bit is `last_voted_slot-65535`. - The number of ancestor slots sent is hard coded at 65535, because that's - 400ms * 65535 = 7.3 hours, we assume that most validator administrators - would have noticed an outage within 7 hours, and the optimistic - confirmation must have halted within 64k slots of the last confirmed block. - Also 65535 bits nicely fits into u16, which makes encoding more compact. - If a validator restarts after 7 hours past the outage, it cannot join the - restart this way. If enough validators failed to restart within 7 hours, - then we fallback to the manual, interactive `cluster restart` method. + The max distance between oldest ancestor slot and last voted slot is hard + coded at 65535, because that's 400ms * 65535 = 7.3 hours, we assume that + most validator administrators would have noticed an outage within 7 hours, + and the optimistic confirmation must have halted within 64k slots of the + last confirmed block. Also 65535 bits nicely fits into u16, which makes + encoding more compact. If a validator restarts after 7 hours past the + outage, it cannot join the restart this way. If enough validators failed to + restart within 7 hours, then we fallback to the manual, interactive + `cluster restart` method. When a validator enters restart, it uses `wen restart shred version` to - avoid interfering with those outside the restart. 
There is a slight chance - that the `wen restart shred version` would collide with the shred version - after the `wen restart phase`, but even if this rare case occurred, we plan - to flush gossip after successful restart so it should not be a problem. - - To be extra cautious, we will also filter out `RestartLastVotedForkSlots` - and `RestartHeaviestFork` (described later) in gossip if a validator is not in - `wen restart phase`. + avoid interfering with those outside the restart. To be extra cautious, we + will also filter out `RestartLastVotedForkSlots` and `RestartHeaviestFork` + (described later) in Gossip if a validator is not in `wen restart phase`. + There is a slight chance that the `wen restart shred version` would collide + with the shred version after the `wen restart phase`, but with the filtering + described above it should not be a problem. + + When a validator receives `RestartLastVotedForkSlots` from someone else, it + will discard all slots smaller than the local root. Because the local root + should be an `optimistic confirmed` slot, it does not need to keep any slot + older than local root. 2. **Repair ledgers up to the restart slot** @@ -240,16 +244,13 @@ protocol. We call these `non-conforming` validators. `67% - 5% - (100-80)% = 42%`. If 90% validators are in restart, the number would be `67% - 5% - (100-90)% = 52%`. - 2. Sort all blocks passing the calculated threshold, and verify that they - form a single chain. If it's the first block in the list, its parent block - should be the local root. Otherwise its parent block should be the one - immediately ahead of it in the list. + 2. Sort all blocks over the threshold by slot number, and verify that they + form a single chain. The first block in the list should be the local root. If any block does not satisfy above constraint, print the first offending block and exit. - 3. If the list is empty, then output local root as the HeaviestFork. - Otherwise output the last block in the list as the HeavistFork. + The list should not be empty, it should contain at least the local root. To see why the above algorithm is safe, we will prove that: @@ -294,17 +295,17 @@ protocol. We call these `non-conforming` validators. 4. **Verify the heaviest fork of the coordinator** - While everyone will calculate its own heaviest fork in previous step, only - one coordinator specified on command line will send out its heaviest fork - via Gossip. Everyone else will check and accept the choice from the - coordinator only. + There will be one coordinator specified on the command line of everyone's + command line. Even though everyone will calculate its own heaviest fork in + previous step, only the coordinator's heaviest fork will be checked and + optionally accepted by others. - We use a new gossip message `RestartHeaviestFork`, its fields are: + We use a new Gossip message `RestartHeaviestFork`, its fields are: * `slot`: `u64` slot of the picked block. * `hash`: `Hash` bank hash of the picked block. - After deciding heaviest block, the coordinator gossip + After deciding the heaviest block, the coordinator Gossip `RestartHeaviestFork(X.slot, X.hash)` out, where X is the block the coordinator picked locally in previous step. The coordinator will stay up until manually restarted by its operator. @@ -325,15 +326,17 @@ protocol. We call these `non-conforming` validators. When exiting this step, no matter what a non-coordinator validator chooses, it will send a `RestartHeaviestFork` back to leader to report its status. 
+ This reporting is just for ease of aggregating the cluster's status at the + coordinator, it doesn't have other effects. 5. **Generate incremental snapshot and exit** If the previous step succeeds, the validator immediately starts adding a hard -fork at the designated slot and perform set root. Then it will start generating -an incremental snapshot at the agreed upon `cluster restart slot`. This way the -hard fork will be included in the newly generated snapshot. After snapshot -generation completes, the `--wait_for_supermajority` args with correct shred -version, restart slot, and expected bankhash will be printed to the logs. +fork at the designated slot and perform `set_root`. Then it will start +generating an incremental snapshot at the agreed upon `cluster restart slot`. +After snapshot generation completes, the `--wait_for_supermajority` args with +correct shred version, restart slot, and expected bankhash will be printed to +the logs. After the snapshot generation is complete, a non coordinator then exits with exit code `200` to indicate work is complete. @@ -345,20 +348,19 @@ status of the cluster. ## Impact -This proposal adds a new wen restart mode to validators, during this phase -the validators will not participate in normal cluster activities, which is the -same as now. Compared to today's `cluster restart`, the new mode may mean more -network bandwidth and memory on the restarting validators, but it guarantees -the safety of optimistically confirmed user transactions, and validator -operators don't need to manually generate and download snapshots again. +This proposal adds a new `wen restart` mode to validators, under this mode the +validators will not participate in normal cluster activities. Compared to +today's `cluster restart`, the new mode may mean more network bandwidth and +memory on the restarting validators, but it guarantees the safety of +optimistically confirmed user transactions, and validator operators don't need +to manually generate and download snapshots during a `cluster restart`. ## Security Considerations -The two added gossip messages `RestartLastVotedForkSlots` and +The two added Gossip messages `RestartLastVotedForkSlots` and `RestartHeaviestFork` will only be sent and processed when the validator is -restarted in the new proposed optimistic `cluster restart` mode. They will also -be filtered out if a validator is not in this mode. So random validator -restarting in the new mode will not bring extra burden to the system. +restarted in `wen restart` mode. So random validator restarting in the new +mode will not clutter the Gossip CRDS table of a normal system. Non-conforming validators could send out wrong `RestartLastVotedForkSlots` messages to mess with `cluster restart`s, these should be included in the @@ -381,12 +383,14 @@ are made: * Every validator only handles 2 epochs, any validator will discard slots which belong to an epoch which is > 1 epoch away from its root. If a validator -has very old root so it can't proceed, it will exit and report error. +has very old root so it can't proceed, it will exit and report error. Since +we assume an outage will be discovered within 7 hours and one epoch is roughly +two days, handling 2 epochs should be enough. * The stake weight of each slot is calculated using the epoch the slot is in. Because right now epoch stakes are calculated 1 epoch ahead of time, and we -only handle outages spanning 7 hours, the local root bank should have the -epoch stakes for all epochs we need. 
+only handle 2 epochs, the local root bank should have the epoch stakes for all +epochs we need. * When aggregating `RestartLastVotedForkSlots`, for any epoch with validators voting for any slot in this epoch having at least 33% stake, calculate the @@ -401,4 +405,4 @@ always be considered, because root should have > 33% stake. This change is backward compatible with previous versions, because validators only enter the new mode during new restart mode which is controlled by a command line argument. All current restart arguments like ---wait-for-supermajority and --expected-bank-hash will be kept as is for now. +`--wait-for-supermajority` and `--expected-bank-hash` will be kept as is. From 79b3be71dfb2cd39c115f3ed126edfa4739cf419 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Fri, 8 Nov 2024 12:39:55 -0800 Subject: [PATCH 116/119] Fix typo. --- proposals/0046-optimistic-cluster-restart-automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index f1ab9fb3..6ffc40af 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -79,7 +79,7 @@ set up their own monitoring accordingly. However, there are many ways an automatic restart can go wrong, mostly due to unforseen situations or software bugs. To make things really safe, we apply -multiple checks durinng the restart, if any check fails, the automatic restart +multiple checks during the restart, if any check fails, the automatic restart is halted and debugging info printed, waiting for human intervention. Therefore we say this is an optimistic cluster restart procedure. From ab33197b67e3227149124a7d3a20dfcaedf59c89 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Thu, 14 Nov 2024 15:57:39 -0800 Subject: [PATCH 117/119] Add proof for the 33% limit. --- ...46-optimistic-cluster-restart-automation.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 6ffc40af..1bfd6044 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -400,6 +400,24 @@ guarantees that whichever slot we select for HeaviestFork, we have enough validators in the restart. Note that the epoch containing local root should always be considered, because root should have > 33% stake. +Now we prove this is safe, whenever there is a slot being optimistically +confirmed in the new epoch, we will only exit the aggregating of +`RestartLastVotedForkSlots` stage if > 80% in the new epoch joined. + +1. Assume slot `X` is optimistically confirmed in the new epoch, it has +>67% stake in the new epoch. + +2. Our stake warmup/cooldown limit is at 9% currently, so at least +67% - 9% = 58% of the stake were from the old epoch. + +3. We always have >80% stake of the old epoch, so at least +58% - 20% = 38% of the stake were in restart. Excluding non-conforming +stake, at least 38% - 5% = 33% should be in the restart and they +should at least report they voted for `X` which is in the new epoch. + +4. According to the above rule we will require >80% stake in the new +epoch as well. 
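To make the stake arithmetic in the proof above easy to re-check, here is a small hedged Rust sketch; the constant names are made up for illustration, and the percentages are the ones used in the proof (67% optimistic confirmation, 9% stake warmup/cooldown, 80% old-epoch participation, 5% non-conforming stake).

```rust
// Worked arithmetic for the proof above, in percentage points of epoch stake.
// Constant names are illustrative; the numbers come straight from the proof.
fn main() {
    let optimistic_confirmation = 67;  // stake needed to optimistically confirm slot `X`
    let stake_change_limit = 9;        // current per-epoch warmup/cooldown limit
    let old_epoch_not_in_restart = 20; // we only require 80% of the old epoch to join
    let non_conforming = 5;            // misbehaving stake the protocol tolerates

    // Step 2: stake voting for `X` that was already staked in the old epoch.
    let from_old_epoch = optimistic_confirmation - stake_change_limit;
    assert_eq!(from_old_epoch, 58);

    // Step 3: of that, the conforming stake guaranteed to be in the restart,
    // all of which reports a vote for `X` in the new epoch.
    let conforming_in_restart = from_old_epoch - old_epoch_not_in_restart - non_conforming;
    assert_eq!(conforming_in_restart, 33);

    // Step 4: hitting the 33% threshold means the aggregation rule also
    // demands > 80% participation of the new epoch's stake before exiting
    // the `RestartLastVotedForkSlots` stage.
    assert!(conforming_in_restart >= 33);
    println!("worst case conforming stake voting for X: {conforming_in_restart}%");
}
```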
+ ## Backwards Compatibility This change is backward compatible with previous versions, because validators From 331fe01c1c0e3494303e2707b96fb740dfe36257 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Thu, 14 Nov 2024 16:00:59 -0800 Subject: [PATCH 118/119] Make linter happy. --- proposals/0046-optimistic-cluster-restart-automation.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 1bfd6044..39a5a300 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -402,7 +402,8 @@ always be considered, because root should have > 33% stake. Now we prove this is safe, whenever there is a slot being optimistically confirmed in the new epoch, we will only exit the aggregating of -`RestartLastVotedForkSlots` stage if > 80% in the new epoch joined. +`RestartLastVotedForkSlots` stage if > 80% in the new epoch joined: + 1. Assume slot `X` is optimistically confirmed in the new epoch, it has >67% stake in the new epoch. From d22ba4a21b239cff87f0a4da5d57c7597ab8acc3 Mon Sep 17 00:00:00 2001 From: Wen <113942165+wen-coding@users.noreply.github.com> Date: Thu, 14 Nov 2024 16:02:06 -0800 Subject: [PATCH 119/119] Make linter happy. --- proposals/0046-optimistic-cluster-restart-automation.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/proposals/0046-optimistic-cluster-restart-automation.md b/proposals/0046-optimistic-cluster-restart-automation.md index 39a5a300..88aed374 100644 --- a/proposals/0046-optimistic-cluster-restart-automation.md +++ b/proposals/0046-optimistic-cluster-restart-automation.md @@ -404,9 +404,8 @@ Now we prove this is safe, whenever there is a slot being optimistically confirmed in the new epoch, we will only exit the aggregating of `RestartLastVotedForkSlots` stage if > 80% in the new epoch joined: - -1. Assume slot `X` is optimistically confirmed in the new epoch, it has ->67% stake in the new epoch. +1. Assume slot `X` is optimistically confirmed in the new epoch, it has >67% +stake in the new epoch. 2. Our stake warmup/cooldown limit is at 9% currently, so at least 67% - 9% = 58% of the stake were from the old epoch.