Commit 3b405c8 (1 parent: 85719cc)
Doc: refine FAQ, add "What actions are required when a node restarts"
Showing 1 changed file with 71 additions and 54 deletions.
# FAQ

### Why is log id a tuple of `(term, node_id, log_index)`?

In standard Raft the log id is `(term, log_index)`. In Openraft, the log id
`(term, node_id, log_index)` is used to minimize the chance of election
conflicts: this way, in every term there can be more than one leader elected,
and the last one is valid.
See: [`leader-id`](`crate::docs::data::leader_id`) for details.

<br/>

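To illustrate why including `node_id` lets two leaders of the same term be compared, consider a plain lexicographic tuple comparison. The types below are hypothetical (Openraft's real `LogId`/`LeaderId` types are richer); this only sketches the ordering idea:

```rust
// Illustration only: a simplified log id ordered as (term, node_id, index).
// Deriving `Ord` on a struct compares fields in declaration order, i.e.
// lexicographically: term first, then node_id, then index.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct SimpleLogId {
    term: u64,
    node_id: u64,
    index: u64,
}

fn main() {
    // Hypothetical: two leaders elected in the same term 3,
    // node 2 first, then node 5.
    let a = SimpleLogId { term: 3, node_id: 2, index: 9 };
    let b = SimpleLogId { term: 3, node_id: 5, index: 7 };

    // With only (term, index), `a` would look newer than `b`.
    // With node_id included, the later-elected leader's logs win.
    assert!(b > a);
    println!("{b:?} > {a:?}");
}
```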
### How to remove node-2 safely from a cluster `{1, 2, 3}`?

Call `Raft::change_membership(btreeset!{1, 3})` to exclude node-2 from
the cluster. Then wipe out node-2 data.
**NEVER** modify/erase the data of any node that is still in a raft cluster,
unless you know what you are doing.

<br/>

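The removal sequence can be sketched as below. This is an illustration only: `raft` is assumed to be a `Raft` handle that can reach the leader, and the exact `change_membership` signature and error handling depend on your Openraft version.

```ignore
// Sketch only; not a drop-in snippet.
use maplit::btreeset;

// 1. Exclude node-2 from the voting membership, and wait for the
//    membership change to be committed.
raft.change_membership(btreeset! {1, 3}).await?;

// 2. Only after the change is committed: shut node-2 down and erase
//    its data. Never erase data of a node still in the cluster.
```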
### What actions are required when a node restarts?

None. No calls, e.g., to either [`add_learner()`][] or [`change_membership()`][]
are necessary.

Openraft maintains the membership configuration in [`Membership`][] for all
nodes in the cluster, including voters and non-voters (learners). When a
`follower` or `learner` restarts, the leader will automatically re-establish
replication.

<br/>

### Can I wipe out the data of ONE node and wait for the leader to replicate all data to it again?

Avoid doing this. Doing so will panic the leader. But it is permitted
if the [`loosen-follower-log-revert`] feature flag is enabled.

In a raft cluster, although logs are replicated to multiple nodes,
wiping out a node and restarting it can still cause data loss.
Assume the leader is `N1` and the followers are `N2, N3, N4, N5`:
- A log (`a`) that is replicated by `N1` to `N2, N3` is considered committed.
- At this point, if `N3` is replaced with an empty node and, at the same time,
  the leader `N1` crashes, then `N5` may be elected as the new leader with
  votes granted by `N3, N4`;
- The new leader `N5` will not have log `a`.

```text
Ni: Node i
Lj: Leader at term j
Fj: Follower at term j

N1 | L1  a  crashed
N2 | F1  a
N3 | F1  a  erased   F2
N4 |                 F2
N5 |                 elect  L2
----------------------------+---------------> time
Data loss: N5 does not have log `a`
```

But for a cluster with an even number of nodes, erasing **exactly one** node
won't cause data loss. Thus, in a special scenario like this, or for testing
purposes, you can use `--feature loosen-follower-log-revert` to permit erasing
a node.

<br/>

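The quorum arithmetic behind this can be checked with a short counting function (an illustration, not Openraft code): a node that holds the committed log refuses to vote for a candidate with a shorter log, so a candidate lacking the log must assemble a quorum entirely from non-holders.

```rust
// Illustration of the quorum arithmetic above; not Openraft code.
// `n` voters; a log is committed once a quorum (n/2 + 1) holds it.
// If exactly one holder is erased, can the non-holders (including the
// erased node) alone elect a leader that lacks the log?
fn one_erase_can_lose_data(n: usize) -> bool {
    let quorum = n / 2 + 1;
    // Worst case: the log is held by exactly a quorum of nodes.
    // Erasing one holder leaves `quorum - 1` holders.
    let non_holders = n - (quorum - 1);
    // Holders won't vote for a candidate missing the log, so data loss
    // requires the non-holders alone to reach a quorum.
    non_holders >= quorum
}

fn main() {
    assert!(one_erase_can_lose_data(5));  // the N1..N5 scenario above
    assert!(!one_erase_can_lose_data(4)); // even-sized cluster: safe
    assert!(!one_erase_can_lose_data(6));
    println!("quorum arithmetic matches the scenario");
}
```

For odd-sized clusters the non-holders can reach a quorum by themselves, which is exactly the `N1..N5` diagram; for even-sized clusters they fall one vote short.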
### Is Openraft resilient to incorrectly configured clusters?

No. Openraft, like standard raft, cannot identify errors in cluster
configuration.

A common error is assigning a wrong network address to a node. In such
a scenario, if this node becomes the leader, it will attempt to replicate
logs to itself. This will cause Openraft to panic, because replication
messages can only be received by a follower.

```text
thread 'main' panicked at openraft/src/engine/engine_impl.rs:793:9:
assertion failed: self.internal_server_state.is_following()
```

```ignore
// openraft/src/engine/engine_impl.rs:793
pub(crate) fn following_handler(&mut self) -> FollowingHandler<C> {
    debug_assert!(self.internal_server_state.is_following());
    // ...
}
```

<br/>

[`loosen-follower-log-revert`]: `crate::docs::feature_flags#loosen_follower_log_revert`

[`add_learner()`]: `crate::Raft::add_learner`
[`change_membership()`]: `crate::Raft::change_membership`
[`Membership`]: `crate::Membership`