-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validator set pruning using relayed Validator sets #327
Conversation
Like this change. cc @fedekunze @marbar3778 this will need to be in the refactor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great change!
Looks like integration tests are still broken. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs documentation within the PR.
// Question: what here can be epoched? | ||
slashing(ctx, k) | ||
attestationTally(ctx, k) | ||
cleanupTimedOutBatches(ctx, k) | ||
cleanupTimedOutLogicCalls(ctx, k) | ||
createValsets(ctx, k) | ||
pruneValsets(ctx, k, params) | ||
// TODO: prune claims, attestations when they pass in the handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please open a issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -111,6 +111,14 @@ func (a AttestationHandler) Handle(ctx sdk.Context, att types.Attestation, claim | |||
|
|||
// Add to denom-erc20 mapping | |||
a.keeper.setCosmosOriginatedDenomToERC20(ctx, claim.CosmosDenom, claim.TokenContract) | |||
case *types.MsgValsetUpdatedClaim: | |||
// TODO here we should check the contents of the validator set against |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please open an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctx.EventManager().EmitEvent( | ||
sdk.NewEvent( | ||
sdk.EventTypeMessage, | ||
sdk.NewAttribute(sdk.AttributeKeyModule, msg.Type()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message type is not the module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is generally the pattern we use in the file. So if this is an issue we should open an issue and change it throughout the fie.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good. It would be good to capture the design in a markdown file, this will help future teams with onboarding.
419a16c
to
13a5606
Compare
13a5606
to
15af0e1
Compare
This patch does somthing that we've tried to avoid for a while. Which is adding an event nonce to the validator set updated events. So as to make it possible to relay them into the Cosmos module oracle. This design change is driven by problems with validator set pruning. In the worst case validator sets can be produced every single block. But the worst case on the EVM chain is similarly bad. It could be halted for several hours or even days. In order to retain enough validator sets to ensure bridge continuity and also prune as many validator sets as safely possible we need this information.
This patch moves the Ethereum event signatures into constants in the gravity_utils module rather than repeated string values peppered across the code. Should make it easier to deal with changes there in the future.
This patch updates the event parsing in the orchestrator for the new valset update with event nonce.
This patch adds the MsgValsetUpdatedEvent to the cosmos module logic, this processes and stores the latest validator set as an event. We've resisted doing this for quite a long time. With concern that developers would want to use this value in ways it's not safe to use. But we need it for two purposes that it is safe to use for. 1) warnings about bridge highjacking 2) efficient pruning of validator sets without pruning too many sets
This code provides the most efficient possible validator set pruning that we can prove is correct. By pruning the last observed validator we can keep the number of valsets in the store to a very low value during normal operation and safely extend the required storage in the case that the bridge becomes halted for some reason.
Now that we are emitting validator set update events with an event nonce we have an event that is using event nonce zero. This is a big problem on the Cosmos side as the logic has all been developed with the assumption that event nonce 0 is a special unused value.
15af0e1
to
c13d12e
Compare
@jtremback is working on the spec here. Design is also outlined in #340 |
Reasoning
This pull request re-introduces pruning for validator sets but in a way that is guaranteed to be secure.
In the past we simply pruned validator sets after a given number of blocks. This solution was simple and would work well the vast majority of the time. But could result in a catastrophic failure of the bridge in some edge cases.
Lets take for example a case where the EVM chain halts for 24 hours for whatever reason. During this time the Cosmos chain continues normally and has a complete validator set change out with no hostile intent on the part of any validator. The former pruning logic would remove the signed validator sets from the history as the slashing window passed and by the time the EVM chain restarted no update signed by the validator set that controlled the chain 24 hours ago would still be in the store.
The bridge and all the funds in it would be lost at this point. Even though the signatures required to transition it to the new validator set where created and submitted, they where just deleted before they could be used.
The only solution to this edge case is to do something we've wanted to avoid for a while. Ferry the ValsetUpdated event from Ethereum to Cosmos as an Oracle claim.
We have tried to avoid this mainly because when it was around early in the development of Gravity developers had a habit of misusing it and generally assuming it's more accurate than is possible considering the constraints imposed by the separate timing of two different chains.
But without some awareness of the past state of the Ethereum chain we would have to wait weeks before we could safely assume that the validator set had been updated on the EVM chain. This was previously accepted as a 'solution' but the failure of testnet1 under mountains of validator set updates is a pretty strong takedown of that design decision.
Secondary Reasoning
In addition to our pruning problem having a reasonable idea of what validator set exists on Ethereum will resolve some audit issues. One of the audit issues Informal raised was that a Cosmos full node operator might not know that the bridge has been highjacked.
Now that we are ferrying the data we can (although this pull request does not implement it) validate that the validator set on Ethereum matches the stored value for that nonce on Cosmos. In the case of a failure the Cosmos chain can panic, or raise whatever warning is required without requiring the user to run any sort of external binary to determine that.