Missing log replay on single-node restart (bug?) #1246
It appears to be a bug in Openraft. As you mentioned, the committed log ID is not updated when the node is brought up as the sole leader. Currently, the committed log ID is updated in two scenarios during node startup: either by synchronization from an existing leader, or when the node itself becomes leader and appends a new blank log entry. Let me fix it!
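To make the sole-leader case concrete: a leader normally derives the committed index from what a majority of voters have durably flushed. The sketch below is a simplified, hypothetical model. `committed_index` is not an Openraft API, log positions are plain `u64` indexes rather than log IDs, and it ignores Raft's restriction that only entries proposed by the current leader are committed by counting replicas. It shows that with a single voter the committed index is simply the leader's own flushed index, so recomputing it at startup is all that is needed.

```rust
/// Hypothetical helper (not an Openraft API): derive the committed log index
/// from the highest index each voter has durably flushed (`None` = nothing).
/// Simplified majority rule only; it ignores the "own term / own leader" restriction.
fn committed_index(flushed: &[Option<u64>]) -> Option<u64> {
    if flushed.is_empty() {
        return None;
    }
    let mut sorted = flushed.to_vec();
    // Sort descending; `None` orders below any `Some`, so it sinks to the end.
    sorted.sort_by(|a, b| b.cmp(a));
    // With n voters a majority is n / 2 + 1 nodes, so the element at 0-based
    // position n / 2 of the descending list is flushed on at least a majority.
    sorted[sorted.len() / 2]
}
```

For example, `committed_index(&[Some(7)])` yields `Some(7)` (one voter, so the leader alone is the quorum), while `committed_index(&[Some(7), Some(5), None])` yields `Some(5)`.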
When a node starts up as the Leader, it now re-applies all logs at once.

Previously:
- New Leader only updated IO progress
- Committed log ID remained unchanged

Now:
- New Leader updates IO progress
- Triggers update of committed log ID

- Fix: databendlabs#1246
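The "Previously / Now" difference in the commit message can be illustrated with a toy model. This is a hypothetical sketch, not Openraft's internal code: `SoleLeader`, `on_io_flushed`, and the `fixed` flag are invented names, indexes are plain `u64` values, and the cluster is assumed to have exactly one voter. In both versions the leader records IO progress; only the fixed version also recomputes the committed index from it.

```rust
/// Hypothetical model of a single-voter leader (not Openraft's real types).
/// `fixed` toggles between the "Previously" and "Now" behavior.
#[derive(Debug)]
struct SoleLeader {
    fixed: bool,
    flushed: Option<u64>,   // last log index durably written locally
    committed: Option<u64>, // last log index known to be committed
}

impl SoleLeader {
    /// Called when local log IO completes up to `index`.
    /// Returns true if the committed index advanced, i.e. entries must be applied.
    fn on_io_flushed(&mut self, index: Option<u64>) -> bool {
        // Both versions record IO progress.
        self.flushed = index;
        if !self.fixed {
            // Previously: nothing else happens, so `committed` keeps its old value.
            return false;
        }
        // Now: IO progress also triggers a committed-index update. With a single
        // voter the quorum is the leader itself, so everything flushed is committed.
        if self.flushed > self.committed {
            self.committed = self.flushed;
            true
        } else {
            false
        }
    }
}
```

Restarting with log entries up to index 7 and no saved committed index, the old behavior leaves `committed` at `None`, while the fixed behavior advances it to `Some(7)`, which is what then allows the entries to be applied.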
… even when committed log id not saved

When a node starts up as the Leader, it now re-applies all logs at once, even when `save_committed()` is not implemented.

- Related issue: databendlabs#1246
@drmingdrmer Yes, with the update, the affected test case runs as expected :-). Thanks for the quick fix.
I'm not sure if it's really a bug in `openraft`, but we are seeing unapplied committed log entries in single-node mode. Our setup looks like this:
Due to a bug in our code, the vote was not reconstructed properly upon restart in a single-node unit test. This led to an "election" in which the node was first deemed a `Follower` with `is_leader == false` and immediately afterwards changed to `Leader`. At that point, `openraft` also sent an `Apply` command to the state machine worker to apply the missing log up to the end (after writing log entry 8 with a new term for itself).

After fixing vote recovery (reading the committed vote for itself), `is_leader` is set to `true` and the log end is also detected properly (log entry 7). However, no `Apply` command is sent to the state machine worker and the committed index is not advanced either (it stays at `None`), so the test fails immediately because it doesn't find the expected data in the state machine.

This is all with current `openraft` master and the latest Rust version. It happens on all our OSes (M1/M2/M3 macOS and x64/aarch64 Linux).

I suppose something is missing when handling the shortcuts for single-node Raft with a state machine that doesn't immediately apply entries persistently. That is, my expectation would be that in a single-node system `openraft` would issue an `Apply` command to apply the log in range (None, 7) and set the committed index to 7 as well. I suppose what's missing is merely an update of the leader state: it should set the committed index to the log end in this case, which would then trigger an `Apply` command for the state machine worker and apply the missing log.

@drmingdrmer Can you please check? Or are we doing something wrong?
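The expected behavior can be phrased as a small calculation: once the committed index is set to the log end, the entries to apply are everything after the last applied index up to and including the committed one. The helper below is a hypothetical sketch (`indexes_to_apply` is not an Openraft API) and assumes log indexes start at 0, so `None` means nothing has been applied or committed yet.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: the log indexes an `Apply` command would need to cover,
/// given the last applied and the committed index. Assumes indexes start at 0.
fn indexes_to_apply(
    last_applied: Option<u64>,
    committed: Option<u64>,
) -> Option<RangeInclusive<u64>> {
    // Nothing committed: nothing to apply.
    let end = committed?;
    let start = match last_applied {
        None => 0,
        // `checked_add` avoids overflow; at u64::MAX there is nothing left anyway.
        Some(i) => i.checked_add(1)?,
    };
    if start > end {
        None // the state machine is already up to date
    } else {
        Some(start..=end)
    }
}
```

With nothing applied after the restart and the committed index set to 7, `indexes_to_apply(None, Some(7))` returns `Some(0..=7)`; with the committed index stuck at `None`, as observed here, it returns `None`, so no `Apply` would be issued.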
Thanks in advance.
Here are some snippets from the traces (I tried to remove the irrelevant instrumentation):

"Successful" startup (with a wrong vote carrying node index 0, which is invalid):

Afterwards, the logs are applied, and once we get a metrics update showing applied == committed, regular operations continue and the test finds the persistent data.

After correcting the vote storage, the single node is immediately deemed leader (which is correct), but it doesn't apply logs or update the committed index:
No `Apply` command is sent. Although `openraft` continues running in the background and "ticks", the missing, previously committed entries are not applied.