-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snapshotting and log compaction #32
Conversation
e400fcc
to
b67d78a
Compare
65c8ba0
to
9a2f45e
Compare
@penberg This is ready for review. Please let me know your thoughts on the new methods in the |
I also opened #35 to add support for joint consensus. I think it should be addressed separately, in a different PR. |
Also, I didn't implement the logic to split Snapshots into multiple messages. Thinking about this more, I think this should be handled entirely by the network / Little Raft user, and not by the protocol itself. What do you think @penberg? |
little_raft/src/state_machine.rs
Outdated
&mut self, | ||
last_included_index: usize, | ||
last_included_term: usize, | ||
) -> Snapshot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ChiselStore, create_snapshot()
would make an online backup of the current SQLite database to serve as an immutable snapshot:
https://www.sqlite.org/c3ref/backup_finish.html#sqlite3backupinit
However, Little Raft now stores a Bytes
blob in Snapshot
, which seems little cumbersome to work with. Perhaps it makes more sense to add a type parameter to Snapshot
that allows Little Raft users to do whatever they want to represent the snapshot data?
offset: 0, | ||
data: snapshot.data.clone(), | ||
done: true, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me how to use this API with ChiselStore because Little Raft maintains a single snapshot and that's going to be a whole database (that can keep on growing in size). So whenever we compact the log on the leader, we have to send the whole database over?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, for clarification, log compaction happens on every node independently, not just leaders. Leaders are only special in that they might retransmit a snapshot to followers that are lagging too far behind.
Now to your question: if the whole database is the state, then it sounds like yes, you would have to transmit it. Conceptually, how else can you recover your state?
@andreev-io As this pull request is already quite large, I think we could go forward with it as-is, and address the issues I pointed out incrementally. |
This commit implements section 7 of the Raft paper, i.e. Log Compaction. Now, Replicas will merge logs into snapshots to avoid accumulating too many logs that hoard memory. Replicas now support InstallSnapshot RPCs, meaning that a Leader can detect when a Follower is so far behind that the logs it's missing have already been compacted, and then transmit the snapshot to the follower instead of transmitting a particular log entry. Non-leader nodes now know how to treat and respond to InstallSnapshot RPCs. The StateMachine trait has been extended to provide Little Raft user with ability to save and load state to and from permanent storage. The raft_unstable test has been updated to randomly drop and shuffle messages, causing Replicas to retransmit snapshots, asserting the behavior of the new InstallSnapshot RPC.
Due to GitHub quirks, this action will only run on pull requests NOT from a forked repository, which renders it mostly useless. Let's simply remove it.
9a2f45e
to
accf685
Compare
Agreed. But I also pushed a quick fixup to make Bytes a type argument. Can you take a look? One thing I don't like about this is that now you have to provide a type even if you are not using snapshotting. Any thoughts on how to make the interfaces better? |
The type argument to Snapshot looks good and should be nicely extendable. For users who don't need snapshots, maybe Little Raft could provide a "NoSnapshots" default implementation of the trait? |
This PR implements section 7 of the Raft paper, i.e. Log Compaction.
Now, Replicas will merge logs into snapshots to avoid accumulating too
many logs that hoard memory. Replicas now support InstallSnapshot RPCs,
meaning that a Leader can detect when a Follower is so far behind that
the logs it's missing have already been compacted, and then transmit the
snapshot to the follower instead of transmitting a particular log entry.
Non-leader nodes now know how to treat and respond to InstallSnapshot
RPCs.
The StateMachine trait has been extended to provide Little Raft user
with ability to save and load state to and from permanent storage. The
raft_unstable test has been updated to randomly drop and shuffle
messages, causing Replicas to retransmit snapshots, asserting the
behavior of the new InstallSnapshot RPC.