fix: fpd panics on jailed finality provider #78

Merged: gitferry merged 5 commits into main from fix/fp-panic-when-jailed on Oct 1, 2024

gitferry
Member

In the previous implementation, once a finality provider (fp) was jailed, restarting fpd caused a panic because of an error returned by Babylon during fast sync. This PR fixes the issue with the following changes:

  1. We implement our own unjail finality provider CLI command rather than using the one from Babylon. This implementation sets the fp instance status to inactive after the unjail tx is successfully sent (it is later updated to active if the fp has voting power).
  2. When a jailed error is detected while starting an fp instance, the start fails but fpd does not panic, so the loop that updates the stored fp status keeps running.

The jailing/unjailing flow is now as follows:

  1. fpd detects jailing either via the error returned when sending an fp signature or via the loop that checks fp status
  2. once jailing is detected, fpd terminates the fp instance without terminating the program
  3. the operator checks the fp signing info to get jail_until via babylond q finality signing-info [fp-pk-hex]
  4. after jail_until has passed, the operator can unjail the fp by executing fpd unjail-finality-provider [fp-pk-hex]
  5. if everything goes well, the fp resumes sending finality votes, provided it has voting power, after waiting for the state transition
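
The status transitions in the flow above can be sketched as a small state machine. This is only an illustrative model, not the code in this PR: the `Status` type, the `fpInstance` struct, and its three methods are hypothetical names, and only the `ErrFinalityProviderJailed` identifier comes from the diff (its message text here is an assumption). The sketch also does not model how the instance resumes signing after it becomes active.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a hypothetical stand-in for the fp status enum stored by fpd.
type Status string

const (
	StatusActive   Status = "ACTIVE"
	StatusInactive Status = "INACTIVE"
	StatusJailed   Status = "JAILED"
)

// ErrFinalityProviderJailed reuses the sentinel error name from this PR;
// the message text is an assumption.
var ErrFinalityProviderJailed = errors.New("the finality provider is jailed")

// fpInstance is a simplified, hypothetical model of one fp instance.
type fpInstance struct {
	status  Status
	running bool
}

// onError models steps 1-2: a jailed error stops the instance and records
// the status, but is never escalated to a panic.
func (fp *fpInstance) onError(err error) {
	if errors.Is(err, ErrFinalityProviderJailed) {
		fp.status = StatusJailed
		fp.running = false
	}
}

// onUnjailTxSent models step 4: after the unjail tx is sent successfully,
// the stored status becomes inactive.
func (fp *fpInstance) onUnjailTxSent() {
	if fp.status == StatusJailed {
		fp.status = StatusInactive
	}
}

// onStatusPoll models step 5: the status loop promotes an inactive fp to
// active once it has voting power.
func (fp *fpInstance) onStatusPoll(hasVotingPower bool) {
	if fp.status == StatusInactive && hasVotingPower {
		fp.status = StatusActive
	}
}

func main() {
	fp := &fpInstance{status: StatusActive, running: true}

	// The jailed error arrives wrapped, as it would from a failed submission.
	fp.onError(fmt.Errorf("submit finality sig: %w", ErrFinalityProviderJailed))
	fmt.Println(fp.status, fp.running)

	fp.onUnjailTxSent()
	fp.onStatusPoll(false) // no voting power yet, so the fp stays inactive
	fmt.Println(fp.status)

	fp.onStatusPoll(true)
	fmt.Println(fp.status)
}
```

Matching with `errors.Is` rather than comparing error values directly means the jailed case is still recognised when the error has been wrapped with extra context on its way up.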

Use: "unjail-finality-provider",
Aliases: []string{"ufp"},
Short: "Unjail the given finality provider.",
Example: fmt.Sprintf(`fpd unjail-finality-provider [eots_pk]--daemon-address %s ...`, defaultFpdDaemonAddress),

Contributor

Suggested change
- Example: fmt.Sprintf(`fpd unjail-finality-provider [eots_pk]--daemon-address %s ...`, defaultFpdDaemonAddress),
+ Example: fmt.Sprintf(`fpd unjail-finality-provider [eots_pk] --daemon-address %s ...`, defaultFpdDaemonAddress),

gitferry force-pushed the fix/fp-panic-when-jailed branch from 2aaae63 to 7ed2c90 on September 30, 2024 14:04
gitferry force-pushed the fix/fp-panic-when-jailed branch from 7ed2c90 to 171cc6d on September 30, 2024 14:14
@@ -247,6 +246,11 @@ func (fpm *FinalityProviderManager) StartFinalityProvider(fpPk *bbntypes.BIP340P
	}

	if err := fpm.addFinalityProviderInstance(fpPk, passphrase); err != nil {
		if errors.Is(err, ErrFinalityProviderJailed) {
			fpm.logger.Error("failed to start finality provider", zap.Error(err))
			// do not return error

Contributor

why not? 🤔

Member

Same question here. This seems like an error to me

Member Author

The point of not returning an error is that we still want to start the app even if the fp instance fails to start because the fp is jailed. But I agree that we should not handle the error here. I changed it to handle the error at the application level in 0eb53b3
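
As a rough illustration of handling the error at the application level, the sketch below lets the start function return the jailed error unchanged and leaves the decision to the caller. It is a minimal sketch under stated assumptions, not the code from 0eb53b3: `startInstance`, `startApp`, and `errUnrelated` are hypothetical names, and only `ErrFinalityProviderJailed` appears in the diff.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// ErrFinalityProviderJailed reuses the sentinel error name from this PR;
// the message text is an assumption.
var ErrFinalityProviderJailed = errors.New("the finality provider is jailed")

// errUnrelated is a hypothetical failure that should still abort startup.
var errUnrelated = errors.New("hypothetical unrelated startup failure")

// startInstance is a hypothetical stand-in for starting an fp instance.
// It reports the jailed condition to the caller instead of swallowing it.
func startInstance(jailed bool) error {
	if jailed {
		return fmt.Errorf("failed to start finality provider: %w", ErrFinalityProviderJailed)
	}
	return nil
}

// startApp is the application-level caller. A jailed fp is logged and
// tolerated so the app keeps running; any other error is returned.
func startApp(start func() error) error {
	if err := start(); err != nil {
		if errors.Is(err, ErrFinalityProviderJailed) {
			log.Printf("fp instance not started, app continues: %v", err)
			return nil
		}
		return err
	}
	return nil
}

func main() {
	// Jailed fp: startApp tolerates the error and returns nil.
	fmt.Println(startApp(func() error { return startInstance(true) }))

	// Unrelated failure: startApp returns it to the caller.
	fmt.Println(startApp(func() error { return errUnrelated }))
}
```

With this split, the lower-level function stays honest about failures, and the policy of tolerating a jailed fp lives in a single place.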

Contributor

RafilxTenfen left a comment

utACK, only one question

Comment on lines 108 to 110
if errors.Is(err, ErrFinalityProviderJailed) {
	fp.MustSetStatus(proto.FinalityProviderStatus_JAILED)
}

Member

Do you think this should be done here or inside bootstrap()?

Member Author

Good point. Moved the error handler to bootstrap()

Comment on lines +262 to +265
msg := &finalitytypes.MsgUnjailFinalityProvider{
	Signer:  bc.mustGetTxSigner(),
	FpBtcPk: bbntypes.NewBIP340PubKeyFromBTCPK(fpPk),
}

Member

Do you think it would be beneficial to check whether jail_until has passed before submitting this tx? If the operator submits the unjail tx before the jailing period has passed, the tx will still consume gas but the FP won't get unjailed, which is not expected

Member Author

Good point. I thought gas would not be consumed, as the tx would be rejected at the mempool level? Created issue #79 to track this, since it requires introducing signing info queries and will be handled in a separate PR
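
For reference, the pre-check discussed here could look roughly like the sketch below. It is a hypothetical illustration only, since the real change is deferred to issue #79. It assumes the caller has already obtained the jail_until timestamp from a signing info query; `checkUnjailAllowed`, `ErrStillJailed`, and `atUnix` are invented names, and no Babylon API is called.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrStillJailed is a hypothetical error for an unjail attempt made before
// the jailing period has passed.
var ErrStillJailed = errors.New("jailing period has not passed")

// atUnix builds a UTC timestamp from Unix seconds, so the example can use
// fixed, reproducible times.
func atUnix(sec int64) time.Time {
	return time.Unix(sec, 0).UTC()
}

// checkUnjailAllowed returns nil once now has reached jailedUntil and an
// error wrapping ErrStillJailed otherwise. How jailedUntil is obtained is
// assumed to be a signing info query, which is out of scope here.
func checkUnjailAllowed(jailedUntil, now time.Time) error {
	if now.Before(jailedUntil) {
		return fmt.Errorf("%w: %s remaining", ErrStillJailed, jailedUntil.Sub(now))
	}
	return nil
}

func main() {
	jailedUntil := atUnix(1_727_800_000)

	// One hour before jail_until: the unjail tx should not be submitted.
	if err := checkUnjailAllowed(jailedUntil, atUnix(1_727_800_000-3600)); err != nil {
		fmt.Println("skip unjail tx:", err)
	}

	// At jail_until or later: the unjail tx can be submitted.
	if err := checkUnjailAllowed(jailedUntil, atUnix(1_727_800_000)); err == nil {
		fmt.Println("ok to submit unjail tx")
	}
}
```

A client-side check like this would only save the operator a failed submission. The chain would still have to enforce the jailing period itself.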

gitferry merged commit 74baa4c into main on Oct 1, 2024
8 checks passed
gitferry added a commit that referenced this pull request Oct 1, 2024