Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re call add_learner for a restarted node don't sync the state (snapshot). #927

Closed
TheWaWaR opened this issue Nov 11, 2023 · 7 comments
Closed
Assignees

Comments

@TheWaWaR
Copy link

TheWaWaR commented Nov 11, 2023

openraft version: 0.8.3

I have a 3 nodes memory storage cluster setup by following steps:

  • start node 1, 2, 3
  • init node 1
  • add_learner(1)
  • add_learner(2)
  • add_learner(3)
  • change_membership([1, 2, 3])
  • write(1, [(key1, value1), (key2, value2), ...])

Then restart node-3 and call add_learner(3) for it, I can see node-3 got logs from node-1 but the the apply_to_state_machine() function not called, so the state is not synced.

So, my question is what's the proper way to restart a node and sync state from the leader?

NOTE: If I call change_membership(RemoveNodes([3])) and re-add it, it will works just as expected.

Copy link

👋 Thanks for opening this issue!

Get help or engage by:

  • /help : to print help messages.
  • /assignme : to assign this issue to you.

@drmingdrmer drmingdrmer self-assigned this Nov 13, 2023
@drmingdrmer
Copy link
Member

You don't need to invoke add_learner() again when rebooting a node that has previously been part of a cluster.

Once a node has been incorporated into a cluster, no further action is required upon restart.

As for the problem regarding logs being replicated but not applied, could you supply the source code of the application and the debug-level logs for us to investigate further?

@TheWaWaR
Copy link
Author

TheWaWaR commented Nov 13, 2023

Just use raft-kv-memstore example will reproduce the issue.

export RUST_LOG=debug
nohup ./target/debug/raft-key-value  --id 1 --http-addr 127.0.0.1:21001 > n1.log &
nohup ./target/debug/raft-key-value  --id 2 --http-addr 127.0.0.1:21002 > n2.log &
nohup ./target/debug/raft-key-value  --id 3 --http-addr 127.0.0.1:21003 > n3.log &

sleep 2

curl "127.0.0.1:21001/init" -H "Content-Type: application/json" -d "{}"
sleep 1
curl "127.0.0.1:21001/add-learner" -H "Content-Type: application/json" -d '[2, "127.0.0.1:21002"]'
sleep 1
curl "127.0.0.1:21001/add-learner" -H "Content-Type: application/json" -d '[3, "127.0.0.1:21003"]'
sleep 1
curl "127.0.0.1:21001/change-membership" -H "Content-Type: application/json" -d '[1, 2, 3]'
sleep 1

curl "127.0.0.1:21001/write" -H "Content-Type: application/json" -d '{"Set":{"key":"foo","value":"bar"}}'
sleep 1
curl "127.0.0.1:21003/read" -H "Content-Type: application/json" -d '"foo"'

curl "127.0.0.1:21001/write" -H "Content-Type: application/json" -d '{"Set":{"key":"foo","value":"zoo"}}'
sleep 1
curl "127.0.0.1:21003/read" -H "Content-Type: application/json" -d '"foo"'

Then kill 127.0.0.1:21003

Then restart it wait some seconds and read "foo" from it:

nohup ./target/debug/raft-key-value  --id 3 --http-addr 127.0.0.1:21003 > n3.log &

sleep 5

curl "127.0.0.1:21003/read" -H "Content-Type: application/json" -d '"foo"'
{"Ok":""}

As you can see read "foo" returns {"Ok":""} which means the state is not synced.

@drmingdrmer
Copy link
Member

Memstore does not persist data to disk. All state on node-3 will be lost upon restart.

And raft does not guarantee the state will be correctly synchronized when there is data loss.

In your scenario, the Leader still believes all raft-logs are replicated to node-3, and won't replicate any other data.

By removing node-3 and then re-add it. The replication state is reset and the leader forgets the previous state about that "logs are already replication to node-3" and starts to re-send all logs.

There is a FAQ(not yet published) discussing about this issue(https://github.com/datafuselabs/openraft/blob/main/openraft/src/docs/faq/faq.md):

  • 🤔 Can I wipe out the data of ONE node and wait for the leader to replicate all data to it again?

    💡 Avoid doing this. Doing so will panic the leader. But it is permitted
    if [loosen-follower-log-revert] feature flag is enabled.

    In a raft cluster, although logs are replicated to multiple nodes,
    wiping out a node and restarting it is still possible to cause data loss.
    Assumes the leader is N1, followers are N2, N3, N4, N5:

    • A log(a) that is replicated by N1 to N2, N3 is considered committed.
    • At this point, if N3 is replaced with an empty node, and at once the leader N1 is crashed. Then N5 may elected as a new leader with granted vote by N3, N4;
    • Then the new leader N5 will not have log a.
    Ni: Node i
    Lj: Leader   at term j
    Fj: Follower at term j
    
    N1 | L1  a  crashed
    N2 | F1  a
    N3 | F1  a  erased          F2
    N4 |                        F2
    N5 |                 elect  L2
    ----------------------------+---------------> time
                                Data loss: N5 does not have log `a`
    

    But for even number nodes cluster, Erasing exactly one node won't cause data loss.
    Thus, in a special scenario like this, or for testing purpose, you can use
    --feature loosen-follower-log-revert to permit erasing a node.


@TheWaWaR
Copy link
Author

TheWaWaR commented Nov 13, 2023

Even if node-3 have disk storage, what if when node-3 received the last-n logs from node-1 and node-3 starting to save the state to disk, at this point node-3 crashed and the last-n logs lost? Is node-1 still see the last-n logs replicated to node-3?

I think there maybe need a communication process, if node-3's last log index is small than node-1 should allow node-1 sync state to node-3.

If I understand correctly, remove node-3 and re-add node-3 is the only way to sync state from node-1?

@drmingdrmer
Copy link
Member

Raft can not provide any guarantee when any data is lost. It's undefined behavior.

As I mentioned previously, --feature loosen-follower-log-revert enables the ability for a follower's losing logs.

@TheWaWaR
Copy link
Author

TheWaWaR commented Nov 13, 2023

I see. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants