Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: committee selection beacon slashing #65

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

snormore
Copy link
Member

Slash nodes that commit without revealing in the committee selection random beacon process.

  • NodeRegistryChange::Slashed is used to propagate the slashing event on the block execution response.
  • Metadata::EpochEra is introduced to reset the narwhal store/dag so that it can be reconfigured mid-epoch when a committee member is slashed and no longer has sufficient stake.

@snormore snormore force-pushed the s/committee-beacon-slashing branch 5 times, most recently from 94cb380 to 04bc674 Compare November 19, 2024 12:15
@matthias-wright
Copy link
Collaborator

This is not critical at all, but I'm wondering if it makes sense to have checks in place that ensure that CommitteeSelectionBeaconNonRevealSlashAmount is always less than or equal to MinimumNodeStake. We could check it when loading the genesis and then again when either of them is changed.

@snormore
Copy link
Member Author

snormore commented Nov 21, 2024

Are we sure we want to restrict the slash amount to be less than or equal to min stake? A node can stake more than min stake, but we'd be making the max possible value for slash amount that min stake. Currently if the slash amount is greater than the node has staked, it just zeros out the stake.

@matthias-wright
Copy link
Collaborator

Oh yeah in that case we probably shouldn't restrict the slash amount

@matthias-wright
Copy link
Collaborator

It looks like we are only slashing nodes after a reveal phase timeout, but not after a commit phase timeout. is this intended?

@snormore
Copy link
Member Author

Yep slashing for non-reveal is just to penalize attempts to cause a reroll after knowing the other reveals. There's no penalty for not participating in a commit phase. The incentive to participate is really just to move the epoch change along and get your end of epoch rewards.

@matthias-wright
Copy link
Collaborator

I ran a small testnet locally with the spawn_swarm binary and for one of the nodes, I commented out the part in the committee beacon where the reveal transaction is submitted:

self.execute_transaction_with_retry_on_invalid_nonce(

I also added a log here to print the non-revealing nodes:


I see the nodes that didn't sent the reveal transaction in the logs, but it's stake is never slashed. I've waited until after the era change.
Here are the logs: https://justpaste.it/emp9d

If you want to reproduce this test (I'm sorry, it's hacky 😄):

  • check out the branch https://github.com/fleek-network/lightning/tree/review-slashing
  • plug in your public keys that you have locally in ~/.lightning/keystore here:
    NodePublicKey::from_base58("8ZCBNU5PgsnLGAvevPPBGxABmm6B4tv1AeEnkw3X7UoA").unwrap();
  • remove the data dir if it exists: rm -r ~/.lightning/data
  • start the swarm: RUST_LOG=info cargo run --bin spawn_swarm -- --num-nodes 4 --committee-size 4 --epoch-time 90000
  • while the swarm binary is building, quickly comment out the part where the reveal transaction is sent:
    self.execute_transaction_with_retry_on_invalid_nonce(
  • in the same directory, but in a new shell, start your node: cargo run --bin lightning-node run
  • your node will be part of the swarm testnet
  • query your node's stake balance: curl -X POST http://127.0.0.1:12030/rpc/v0 -H 'Content-Type: application/json' -d '{"id":1,"jsonrpc":"2.0","method":"flk_get_node_info","params":{"public_key": "YOUR_NODE_PUBKEY"}}'
  • you might have to change the rpc port in the step above, the swarm binary will print all the ports at the beginning

There is also a good chance I'm missing something. I just want to make sure that this isn't an edge case.

@snormore
Copy link
Member Author

snormore commented Nov 22, 2024

Thanks for testing it out like this. In your logs we can see it was slashed but the slashing amount is set to 0 ("slashing node 4 by 0"). I pushed a commit to this branch and your test branch that sets the slash amount to 1000 for the e2e/swarm genesis.

Note too that there are a few test-utils/e2e tests using real consensus that do similar to your swarm test here if that's helpful while you're testing things.

@matthias-wright
Copy link
Collaborator

Oh okay perfect. I just tested it again with the newest commit, and now the stake balance goes to 0.

///
/// The parcels from the previous are garbage collected, and the parcels from the next becom
/// ethe parcels from the current after validating them.
pub fn change_committee(&mut self, committee: &[NodeIndex]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about how changing this to change_committee and calling it on every era change in addition to every epoch change affects the transaction store.
I wrote down how the transaction store works to remind myself. I'm just putting this here for context:

The transaction store keeps track of the parcels from the previous epoch, the current epoch, and the next epoch.
We only store parcels from the previous epoch to help nodes that are behind. Parcels from older epochs than the previous epoch are garbage collected.
We optimistically store parcels from the next epoch, because we might be slightly behind, and receive a parcel from epoch n+1, while we are still on epoch n.
Once we change epochs, we validate the parcels from the next epoch by checking that they were sent from a validator on the current committee (from epoch n+1).

With the new changes, each epoch can also have multiple eras. A new era begins when a node is slashed and afterwards its staking amount is less than the minimum required staking amount, rendering the node unfit to serve on the committee.
When a new era begins, change_committee is called on the transaction store, and the Narwhal service is restarted with the new committee.

I've been thinking about this scenario:

  • lets say we have epoch 0 and era 0
  • we change epochs to epoch 1
  • node A (non-validator) receives some parcel p from epoch 1 while still being on epoch 0 and stores it optimistically
  • a node on the committee from epoch 1 fails to reveal and is slashed, and removed from the committee
  • Narwhal is restarted with epoch 1 and era 1
  • once the epoch change transaction arrives at node A, it changes epoch, validates parcel p, and executes the transactions
  • now it receives the transaction that triggers the new era 1
  • node A will now rotate the parcels in the transaction store again
    • the transactions from the next epoch (2) are validated and become the transactions from the current epoch (however, there aren't any transactions from the next epoch since we are still on epoch 1)
    • the transactions from the current epoch (1) become the transactions from the previous epoch
    • the transactions from the previous epoch (0) are garbage-collected
  • however, now the transactions from the current epoch 1 are stored under "previous epoch"
  • and the transactions from the previous epoch 0 are garbage-collected, even though epoch 0 is still the previous epoch
  • this isn't an issue for node A, because in the transaction store get_parcel and get_attestation will also check for parcels in the previous epoch
  • but it makes it less likely that node A can help another node that fell behind by re-sending parcels
  • also, if we have another era change during epoch 1, we will remove all the parcels from the transaction store
  • I think at that point it can become a problem even for node A

I think an era change should not cause the transaction store to rotate parcels. I think that if we don't call change_committee on the transaction store after an era change, we should be fine.

  • we still store the parcels from the previous epoch
  • we still store the parcels from the current epoch
  • yes, the committee changed, but all parcels that came before the era change are still valid

I'm still wrapping my head around some of these edge cases. Can someone else go through this scenario and see if my reasoning is correct? @snormore and maybe also @daltoncoder

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this comment 2 days ago, it said "PENDING" next to it. I thought that meant that the review is pending, but turns out the comment wasn't even posted lol.

Copy link
Member Author

@snormore snormore Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That reasoning seems sound to me on first read, but I'm still wrapping my head around it too and wondering if I can write a test for the scenario.

Is 3b0c7ac what you had in mind for the change? Tests are still green with it, so that's good 😄

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is exactly what I had in mind. We just have to make sure that this won't cause a node to wrongfully reject a parcel or something like that. I don't think this can happen, but I will also go over it again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went over it again, and I think we do need to make some changes to the transaction store.

  • lets say we are in epoch n and era 0
  • a node is slashed and we move to era 1
  • parcels p1 and p2 are from era 0 and parcel p3 is from era 1 with the new committee
  • p2 contains the transaction that triggers the era change
  • node A that isn't on the committee receives the parcels out of order: p1, p3, p2
  • when node A receives p1, it checks the originator, which is from the committee in era 0. since node A is still on era 0 it correctly accepts the parcel
  • node A receives p3, which originates from a validator v in era 1, but node A is still on era 0
  • there three possibilities:
  1. validator v was both on the committee in era 0 and era 1
    • in this case node A will correctly accept parcel p3 since v is part of the committee in era 0 (which is node A's current committee)
    • we don't have to make any changes for this case
  2. validator v was not part of the committee in era 0, but is on the committee in era 1
    • then node A will reject p3, even though it is a valid parcel
    • however, as I understand it, this cannot happen, because for any two eras e1 and e2 in some epoch n: if e1 < e2, then the committee in e2 is a subset of the committee in e1, so there will never be a node in the committee of epoch e2 that wasn't on the committee in epoch e1.
    • we don't have to make any changes for this case
  3. validator v was on the committee in era 0, but not on the committee in era 1 (v was slashed and kicked)
    • in this case p3 is not a valid parcel
    • however, node A will accept p3 (since A is still in era 0 and v was on the committee in era 0)
    • we have to make changes for this case
    • in the transaction store instead of having previous epoch | current epoch | next epoch we can change it to previous epoch | current epoch | current epoch (pending) | next epoch
    • on every era change, we validate the parcels and attestations in current epoch (pending) the way we do it with the parcels and attestations in next epoch
    • the valid parcels and attestations are moved to current epoch

Please check my logic here.

If my logic is sound here, do you want me to push this change? I'm fine either way

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That logic makes sense to me. Yep if you're able to push the change that'd be great, appreciate it 🙏

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I started implementing this, I realized that this won't work in case there aren't any era changes in an epoch. Because then the parcels/attestations from current epoch (pending) are never moved to current epoch.
However, I also realized that we don't necessary need this, because in case 3 from my comment above, p3 will be stored in the transaction store even though it is an invalid parcel, but it won't receive 2f+1 attestations, so it won't be executed.

What we probably should do is on every era change, go over the parcels/attestations from the current epoch and throw out parcels/attestations like p3 that originate from nodes that aren't on the committee anymore.
But I can push this up once you merge this PR, since this is optimization/garbage collection and not critical for functionality.

Copy link
Collaborator

@matthias-wright matthias-wright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daltoncoder if you have time, can you read through @snormore and my discussion and our conclusions quickly, and see if it makes sense to you as well? Other than that, I'm done with my review, PR looks great to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants