Commit 3b405c8 (1 parent: 85719cc)
Doc: refine FAQ, add "What actions are required when a node restarts"
Showing 1 changed file with 71 additions and 54 deletions.
# FAQ

### Why is log id a tuple of `(term, node_id, log_index)`?

In standard Raft the log id is `(term, log_index)`. In Openraft, the log id
`(term, node_id, log_index)` is used to minimize the chance of election
conflicts: this way, in every term there can be more than one leader elected,
and the last one is valid.
See: [`leader-id`](`crate::docs::data::leader_id`) for details.

<br/>

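To illustrate why including `node_id` lets two leaders of the same term be compared, consider a plain lexicographic tuple comparison. The types below are hypothetical (Openraft's real `LogId`/`LeaderId` types are richer); this only sketches the ordering idea:

```rust
// Illustration only: a simplified log id ordered as (term, node_id, index).
// Deriving `Ord` on a struct compares fields in declaration order, i.e.
// lexicographically: term first, then node_id, then index.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct SimpleLogId {
    term: u64,
    node_id: u64,
    index: u64,
}

fn main() {
    // Hypothetical: two leaders elected in the same term 3,
    // node 2 first, then node 5.
    let a = SimpleLogId { term: 3, node_id: 2, index: 9 };
    let b = SimpleLogId { term: 3, node_id: 5, index: 7 };

    // With only (term, index), `a` would look newer than `b`.
    // With node_id included, the later-elected leader's logs win.
    assert!(b > a);
    println!("{b:?} > {a:?}");
}
```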
### How to remove node-2 safely from a cluster `{1, 2, 3}`?

Call `Raft::change_membership(btreeset!{1, 3})` to exclude node-2 from
the cluster. Then wipe out node-2 data.
**NEVER** modify/erase the data of any node that is still in a raft cluster,
unless you know what you are doing.

<br/>

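The removal sequence can be sketched as below. This is an illustration only: `raft` is assumed to be a `Raft` handle that can reach the leader, and the exact `change_membership` signature and error handling depend on your Openraft version.

```ignore
// Sketch only; not a drop-in snippet.
use maplit::btreeset;

// 1. Exclude node-2 from the voting membership, and wait for the
//    membership change to be committed.
raft.change_membership(btreeset! {1, 3}).await?;

// 2. Only after the change is committed: shut node-2 down and erase
//    its data. Never erase data of a node still in the cluster.
```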
### What actions are required when a node restarts?

None. No calls, e.g., to either [`add_learner()`][] or [`change_membership()`][]
are necessary.

Openraft maintains the membership configuration in [`Membership`][] for all
nodes in the cluster, including voters and non-voters (learners). When a
`follower` or `learner` restarts, the leader will automatically re-establish
replication.

<br/>

### Can I wipe out the data of ONE node and wait for the leader to replicate all data to it again?

Avoid doing this. Doing so will panic the leader. But it is permitted
if the [`loosen-follower-log-revert`] feature flag is enabled.

In a raft cluster, although logs are replicated to multiple nodes,
wiping out a node and restarting it can still cause data loss.
Assume the leader is `N1` and the followers are `N2, N3, N4, N5`:
- A log (`a`) that is replicated by `N1` to `N2, N3` is considered committed.
- At this point, if `N3` is replaced with an empty node and, at the same time,
  the leader `N1` crashes, then `N5` may be elected as the new leader with
  votes granted by `N3, N4`;
- The new leader `N5` will not have log `a`.

```text
Ni: Node i
Lj: Leader at term j
Fj: Follower at term j

N1 | L1  a  crashed
N2 | F1  a
N3 | F1  a  erased   F2
N4 |                 F2
N5 |                 elect  L2
----------------------------+---------------> time
Data loss: N5 does not have log `a`
```

But for a cluster with an even number of nodes, erasing **exactly one** node
won't cause data loss. Thus, in a special scenario like this, or for testing
purposes, you can use `--feature loosen-follower-log-revert` to permit erasing
a node.

<br/>

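The quorum arithmetic behind this can be checked with a short counting function (an illustration, not Openraft code): a node that holds the committed log refuses to vote for a candidate with a shorter log, so a candidate lacking the log must assemble a quorum entirely from non-holders.

```rust
// Illustration of the quorum arithmetic above; not Openraft code.
// `n` voters; a log is committed once a quorum (n/2 + 1) holds it.
// If exactly one holder is erased, can the non-holders (including the
// erased node) alone elect a leader that lacks the log?
fn one_erase_can_lose_data(n: usize) -> bool {
    let quorum = n / 2 + 1;
    // Worst case: the log is held by exactly a quorum of nodes.
    // Erasing one holder leaves `quorum - 1` holders.
    let non_holders = n - (quorum - 1);
    // Holders won't vote for a candidate missing the log, so data loss
    // requires the non-holders alone to reach a quorum.
    non_holders >= quorum
}

fn main() {
    assert!(one_erase_can_lose_data(5));  // the N1..N5 scenario above
    assert!(!one_erase_can_lose_data(4)); // even-sized cluster: safe
    assert!(!one_erase_can_lose_data(6));
    println!("quorum arithmetic matches the scenario");
}
```

For odd-sized clusters the non-holders can reach a quorum by themselves, which is exactly the `N1..N5` diagram; for even-sized clusters they fall one vote short.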
### Is Openraft resilient to incorrectly configured clusters?

No. Openraft, like standard raft, cannot identify errors in cluster
configuration.

A common error is assigning a wrong network address to a node. In such
a scenario, if this node becomes the leader, it will attempt to replicate
logs to itself. This will cause Openraft to panic, because replication
messages can only be received by a follower.

```text
thread 'main' panicked at openraft/src/engine/engine_impl.rs:793:9:
assertion failed: self.internal_server_state.is_following()
```

```ignore
// openraft/src/engine/engine_impl.rs:793
pub(crate) fn following_handler(&mut self) -> FollowingHandler<C> {
    debug_assert!(self.internal_server_state.is_following());
    // ...
}
```

<br/>

[`loosen-follower-log-revert`]: `crate::docs::feature_flags#loosen_follower_log_revert`

[`add_learner()`]: `crate::Raft::add_learner`
[`change_membership()`]: `crate::Raft::change_membership`
[`Membership`]: `crate::Membership`