Refactor snapshot #896

zach-schoenberger · 2023-07-16T14:21:56Z

Checklist

Updated guide with pertinent info (may not always apply).
Squash down commits to one or two logical commits which clearly describe the work you've done.
Unittest is a friend:)

This change is

drmingdrmer

Thank you for sharing this draft design; it provides a clear picture.

The traits SnapshotManifest and SnapshotData seem to exhibit similar or
overlapping functionalities. IMHO, merging these two into a
single trait could streamline the code and eliminate redundancy.

Additionally, RaftTypeConfig appears to include three unnecessary type
definitions. A more efficient approach could be to associate these types with
the SnapshotData trait instead:

// Remove extraneous type declarations from RaftTypeConfig
trait RaftTypeConfig {
    // Other types...
    type SnapshotData: SnapshotData;
}

trait SnapshotData {
    // Other associated types...
    type SnapshotChunkId;
    type SnapshotChunk;
    type SnapshotManifest;
}

Opinions?

Reviewed 1 of 17 files at r1.
Reviewable status: 1 of 17 files reviewed, all discussions resolved

schreter

Opinions?

I'd also prefer to have associated types with SnapshotData to prevent type explosion in RaftTypeConfig :-).

Regarding manifest vs. data types, I don't know whether it's better to have them same or different types.

Reviewed 13 of 17 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @zach-schoenberger)

openraft/src/raft_types.rs line 1 at r2 (raw file):

use std::fmt::Debug;

Seems unused

openraft/src/type_config.rs line 60 at r2 (raw file):

    type Entry: RaftEntry<Self::NodeId, Self::Node> + FromAppData<Self::D>;

    type SnapshotChunkId: Eq + PartialEq + Send + Sync + Display + Debug + OptionalSerde + 'static;

As suggested above, use these as associated types from SnapshotData...

openraft/src/type_config.rs line 69 at r2 (raw file):

    /// See the [storage chapter of the guide](https://datafuselabs.github.io/openraft/getting-started.html#implement-raftstorage)
    /// for details on where and how this is used.
    type SnapshotData: SnapshotData<ChunkId = Self::SnapshotChunkId, Manifest = Self::SnapshotManifest, Chunk = Self::SnapshotChunk>

...then you also don't need to explicitly reference them here.

Instead, you'd reference C::SnapshotData::ChunkId and the like elsewhere (associated types are already introduced, so why not use them directly?).

openraft/src/core/snapshot_state.rs line 8 at r2 (raw file):

#[derive(Debug, Clone)]
#[derive(PartialEq, Eq)]
pub(crate) struct SnapshotRequestId<NID: NodeId, C> {

C is a bit non-descriptive and normally used for type config. I suggest to use the same name (or at least a more descriptive one) for generics as in the associated type, i.e., ChunkId.

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

                None => {
                    tracing::error!("should never happen");
                    false

Better error handling?

It should probably return an unexpected error or so. Also the message is too non-descriptive.

Code quote:

                    tracing::error!("should never happen");
                    false

zach-schoenberger

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @schreter)

openraft/src/raft_types.rs line 1 at r2 (raw file):

Previously, schreter wrote…

Seems unused

It wasn't showing as an unused import, but will double check.

openraft/src/type_config.rs line 69 at r2 (raw file):

Previously, schreter wrote…

...then you also don't need to explicitly reference them here.

Instead, you'd reference C::SnapshotData::ChunkId and the like elsewhere (associated types are already introduced, so why not use them directly?).

Yep. I like the idea. I'll update this setup.

openraft/src/core/snapshot_state.rs line 8 at r2 (raw file):

Previously, schreter wrote…

C is a bit non-descriptive and normally used for type config. I suggest to use the same name (or at least a more descriptive one) for generics as in the associated type, i.e., ChunkId.

Good point. It should really be the RaftTypeConfig and replace both generics.

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, schreter wrote…

Better error handling?

It should probably return an unexpected error or so. Also the message is too non-descriptive.

This was just a place holder. The error system is pretty rigid so I hadn't had a chance to figure out what should be used. I'll look into what makes sense..

zach-schoenberger · 2023-07-18T13:05:55Z

@drmingdrmer I originally tried to keep the SnapshotManifest and SnapshotData as one, but I couldn't find a way that made any sense to me. I think looking at the test implementation helps with the why. (this is a very immature implementation, no effort for optimizations) https://github.com/zach-schoenberger/openraft/blob/refactor_snapshot/openraft/src/engine/testing.rs#L13-L128

This setup views the manifest as a light weight entity that contains only the minimal amount of data to know if a snapshot is complete or not. If we combined the traits, it would be the same as combining these structs:

#[cfg_attr(feature = "serde", derive(serde::Deserialize, serde::Serialize))]
pub(crate) struct VecManifest {
    pub chunks: BTreeSet<VecChunkId>,
}

#[derive(Clone)]
pub(crate) struct VecSnapshot {
    pub len: usize,
    pub data: Vec<u8>,
}

// combined
#[derive(Clone)]
pub(crate) struct VecSnapshot {
    pub len: usize,
    pub data: Vec<u8>,
    pub chunks: BTreeSet<VecChunkId>,
}

So what happens is that the SnapshotData struct always needs 2 structs internally: one for the data and one for the manifest. So this would require every API user to find a way to not send both entities when serializing since SnapshotData is what would be sent in the manifest request.

Of course I could be completely missing something. So any ideas on how to address that issue are welcome :)

drmingdrmer · 2023-07-19T05:30:05Z

I originally tried to keep the SnapshotManifest and SnapshotData as one, but I couldn't find a way that made any sense to me. I think looking at the test implementation helps with the why. (this is a very immature implementation, no effort for optimizations) https://github.com/zach-schoenberger/openraft/blob/refactor_snapshot/openraft/src/engine/testing.rs#L13-L128

I agree. SnapshotData and Manifest should be two traits.

And I found that there is no need to introduce enum InstallSnapshotData{ Manifest(), Chunk() }, and Manifest can be removed.

The only thing an application needs is

pub trait SnapshotData: Send + Sync {
    type Chunk: SnapshotChunk;
    type ChunkIter: Iterator<Item=Self::Chunk>;

    // For leader to get all chunks to send
    async fn chunk_iter(&self) -> Result<Self::ChunkIter, AnyError>;

    // For follower to receive a chunk and return `true` when finished.
    async fn receive(&mut self, c: Self::Chunk) -> Result<bool, AnyError>;
}

The application uses an enum of Data and Manifest chunks through the Chunk implementation, if necessary. The chunk_iter() function returns the manifest-chunk as the first element. Here is the Rust code for it:

enum AppChunk {
    Data(D),
    Manifest(M),
}

Make sense?

schreter

Make sense?

From my PoV, that should be sufficient.

The only issue I see that this needs an async iterator for ChunkIter, not a synchronous one - you don't want to force reading all the data needed for the snapshot at the time of chunk_iter() call (OTOH, the Chunk could be also just some metadata object which will read the data at the time of sending them, which is async).

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @zach-schoenberger)

zach-schoenberger · 2023-07-19T13:19:45Z

The suggestion above is essentially what I did the first time when using a Stream. I actually think doing it that way will lose a lot of what makes your original suggestion for this pr nice. if we just return the iterator, then it really complicates the retry. The code would have to iterate over the iterator, mark which ones succeeded locally, and then try to iterate again and again until all chunks were sent. And on the receive side: it would need at least a finalize to be sent after all chunks were successful. And in the example you give using AppChunk enum, there is no guarantee it will be received first (based on the other PR where we say the goal is to send chunks in parallel).

Of course I would still love to make this super simple and put all this logic on the API user. Just have the network interface have a simple send_snapshot_data that takes the SnapshotData and let the user figure out how to send it and create the InstallSnapshot request on the client side to submit to the raft core.

schreter

if we just return the iterator, then it really complicates the retry.

You are right, retry is an important topic.

Of course I would still love to make this super simple

Mhm... What about initializing the iterator with chunk ID? I.e., something on these lines (not syntactically correct):

pub trait SnapshotData: Send + Sync {
    type Chunk: SnapshotChunk;
    type ChunkIter: Iterator<Item=impl Future<Output = Self::Chunk>>;
    async fn chunks_to_send(&self, restart_marker: Option<Self::Chunk::ChunkId>) -> Result<Self::ChunkIter, AnyError>;
    async fn receive_chunk(&mut self, c: Self::Chunk) -> Result<bool, AnyError>;
}

where

trait SnapshotChunk {
    type ChunkId: Sized + Send;
    fn id(&self) -> Self::ChunkId;
}

When the sending fails, you can request all chunks starting from the chunk ID of the last chunk ID confirmed by the receiver.

BTW, I don't particularly like having chunk receive on SnapshotData. I'd suggest to move it to a dedicated SnapshotReceiver trait and make it convertible into SnapshotData after all chunks are received. I.e., you'll start receiving into a new SnapshotReceiver and convert to (immutable) SnapshotData afterwards, keeping SnapshotData immutable at all times.

Otherwise, what would happen if by mistake someone calls SnapshotData::receive_chunk?

Opinions?

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @zach-schoenberger)

drmingdrmer · 2023-07-20T13:14:05Z

@zach-schoenberger @schreter
Building upon you two guys' suggestion, let's forget the simplified Stream like design and stick to the Manifest based design, right?
Since in order to efficiently re-send failed chunk, Openraft must understand the
structure of a SnapshotData in some way, and the structure will be defined by a Manifest.

I was expecting a Manifest to behave like a set of ChunkId, which supports adding or removing a ChunkId, so that it can be used to track the progress of received chunks on both sides. That's why I proposed to build series of chunk to send from the difference between the leader's Manifest and an empty Manifest:

// Pseudo code for the leader to send snapshot.
fn leader_sending_snapshot() {
    let snapshot = self.get_current_snapshot()
    let manifest: MyManifest = snapshot.manifest();

    // Use a Manifest for tracking progress.
    // The leader expects the receiver to respond with its already received Manifest.
    let received: MyManifest = self.send_install_snapshot(InstallSnapshotRequest::Manifest(manifest.clone())).await?;

    // When retrying sending chunks, pass in a partially finished Manifest:
    let chunks = manifest.chunks_to_send(received.clone());

    for c in chunks {
        self.send_install_snapshot(InstallSnapshotRequest::Chunk(c)).await?;
        // Track the progress if needed
        // received.add(c.chunk_id()).await;
    }
}

I'd suggest to move it to a dedicated SnapshotReceiver trait and make it convertible into SnapshotData after all chunks are received.

SnapshotReceiver is a good idea!

schreter · 2023-07-20T15:40:57Z

Building upon you two guys' suggestion, let's forget the simplified Stream like design and stick to the Manifest based design, right?

OK for me.

zach-schoenberger · 2023-07-20T21:35:14Z

@drmingdrmer Awesome, that's exactly how I implemented the manifest in the PR. I'm not seeing any differences API wise above from the PR besides adding SnapshotReceiver, which sounds good to me.

drmingdrmer

Reviewed 3 of 17 files at r1, 1 of 7 files at r3, all commit messages.
Reviewable status: 12 of 18 files reviewed, 6 unresolved discussions (waiting on @schreter and @zach-schoenberger)

openraft/src/core/streaming_state.rs line 24 at r3 (raw file):

    pub(crate) streaming_data: Box<RTCSnapshotData<C>>,

    pub(crate) manifest: RTCSnapshotManifest<C>,

Is there an assumption that Manifest can be retrieved from SnapshotData?
IMHO, Manifest is part of SnapshotData.

If it is, there is no need to store a separate Manifest.

Code quote:

    pub(crate) streaming_data: Box<RTCSnapshotData<C>>,

    pub(crate) manifest: RTCSnapshotManifest<C>,

schreter

Reviewed 7 of 7 files at r3, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @zach-schoenberger)

openraft/src/type_config.rs line 56 at r3 (raw file):

    type Entry: RaftEntry<Self::NodeId, Self::Node> + FromAppData<Self::D>;

    // type SnapshotChunkId: Eq + PartialEq + Send + Sync + Display + Debug + OptionalSerde + 'static;

Don't forget to remove commented-out code in the final change.

openraft/src/type_config.rs line 75 at r3 (raw file):

pub type RTCSnapshotChunkId<C> = <<C as RaftTypeConfig>::SnapshotData as SnapshotData>::ChunkId;
pub type RTCSnapshotChunk<C> = <<C as RaftTypeConfig>::SnapshotData as SnapshotData>::Chunk;
pub type RTCSnapshotManifest<C> = <<C as RaftTypeConfig>::SnapshotData as SnapshotData>::Manifest;

Are these aliases relevant also for the implementor of the code? Probably less so.

Maybe make crate-private?

If they indeed make sense for public use (which I doubt), then at least they need appropriate doc comments.

BTW, you used these aliases in a few API functions. I'd suggest to use the full type instead, since it will be confusing to the user, I think. The full type is clear.

Suggestion:

pub(crate) type RTCSnapshotData<C> = <C as RaftTypeConfig>::SnapshotData;
pub(crate) type RTCSnapshotChunkId<C> = <<C as RaftTypeConfig>::SnapshotData as SnapshotData>::ChunkId;
pub(crate) type RTCSnapshotChunk<C> = <<C as RaftTypeConfig>::SnapshotData as SnapshotData>::Chunk;
pub(crate) type RTCSnapshotManifest<C> = <<C as RaftTypeConfig>::SnapshotData as SnapshotData>::Manifest;

openraft/src/raft/message/client_write.rs line 3 at r3 (raw file):

use std::fmt::Debug;

#[cfg(feature = "serde")] use crate::AppDataResponse;

This change doesn't seem right.

zach-schoenberger

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @schreter)

openraft/src/type_config.rs line 75 at r3 (raw file):

Previously, schreter wrote…

Are these aliases relevant also for the implementor of the code? Probably less so.

Maybe make crate-private?

If they indeed make sense for public use (which I doubt), then at least they need appropriate doc comments.

BTW, you used these aliases in a few API functions. I'd suggest to use the full type instead, since it will be confusing to the user, I think. The full type is clear.

You are right, they likely do not need to be exposed outside of the crate. Although i would argue it is a lot easier to read/understand what the API functions are receiving using them than the expanded form. The full type is just extremely verbose.

openraft/src/raft/message/client_write.rs line 3 at r3 (raw file):

Previously, schreter wrote…

This change doesn't seem right.

when looking at the rest of the code, this import is only used when serde is configured. it generates an unused import warning otherwise.

schreter

@zach-schoenberger You might want to close comments you don't want to address and address open comments you want to address. GitHub doesn't give you the visibility - I'd suggest you use Reviewable to see immediately what's still open.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @drmingdrmer and @zach-schoenberger)

openraft/src/type_config.rs line 75 at r3 (raw file):

Previously, zach-schoenberger wrote…

You are right, they likely do not need to be exposed outside of the crate. Although i would argue it is a lot easier to read/understand what the API functions are receiving using them than the expanded form. The full type is just extremely verbose.

Well, in the end, up to you. I'd still go with verbose type for APIs :-).

openraft/src/core/streaming_state.rs line 24 at r3 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Is there an assumption that Manifest can be retrieved from SnapshotData?
IMHO, Manifest is part of SnapshotData.

If it is, there is no need to store a separate Manifest.

I don't know whether this is an almost-done PR, but if it is, it would be nice to have also doc comments on newly-added items explaining their purpose (and fix comments on the updated ones - like above on snapshot_meta).

openraft/src/raft/message/client_write.rs line 3 at r3 (raw file):

Previously, zach-schoenberger wrote…

when looking at the rest of the code, this import is only used when serde is configured. it generates an unused import warning otherwise.

Ah, OK, that's another story.

But, a) it's unrelated, so you might want to push a separate single-line PR and b) the formatting is likely off (I suppose, the use statement should be on a separate line - but not 100% sure about current rustfmt settings). Do you have rustfmt on save activated?

drmingdrmer

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @schreter and @zach-schoenberger)

openraft/src/raft/message/client_write.rs line 3 at r3 (raw file):

Previously, schreter wrote…

Ah, OK, that's another story.

But, a) it's unrelated, so you might want to push a separate single-line PR and b) the formatting is likely off (I suppose, the use statement should be on a separate line - but not 100% sure about current rustfmt settings). Do you have rustfmt on save activated?

Because I added inline_attribute_width = 80 to rustfmt.toml. In order to reduce number of lines. 🤔

schreter

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @drmingdrmer and @zach-schoenberger)

openraft/src/raft/message/client_write.rs line 3 at r3 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Because I added inline_attribute_width = 80 to rustfmt.toml. In order to reduce number of lines. 🤔

OK, it just looks strange to me, since I've never seen it like that in another project 🙂. I wouldn't be worried about number of lines, since whether it's space or linefeed, the compiler has exactly the same work to do. The human, however, has somewhat less cognitive load if it's split. But, it's up to you 🙂.

zach-schoenberger · 2023-07-25T13:12:08Z

Since there are a lot of tests and example crates in this repo that use Cursor<Vec<u8>> as the snapshot data, should we provide a default implementation of SnapshotData and the other traits to use where cursor is being used?

drmingdrmer · 2023-07-25T15:16:08Z

Since there are a lot of tests and example crates in this repo that use Cursor<Vec<u8>> as the snapshot data, should we provide a default implementation of SnapshotData and the other traits to use where cursor is being used?

Yes, it would be great if the changes required to an application is minimized. 👍

And from my perspective, the Manifest for a Cursor<Vec<u8>> could simply be two integers: (total_length, received_offset). However, this type of SnapshotData does not support parallel transfers.

zach-schoenberger · 2023-07-25T18:41:30Z

I wouldn't want to do it for Cursor directly since it would require hard coding a chunk size and users may want to use Cursor themselves. I was meaning something closer to making something VecSnapshot in the test are public. With name like ExampleSnapshot.

schreter · 2023-10-10T19:50:06Z

Any update on this one?

drmingdrmer · 2023-10-10T23:18:29Z

Any update on this one?

@zach-schoenberger
If you're busy with other matters, I can take over this task.

zach-schoenberger · 2023-10-15T14:26:24Z

Sorry just saw this. I got pretty busy and didn't have a chance to work on this. I'll try to finish up now.

zach-schoenberger · 2023-10-17T13:52:59Z

Hi @drmingdrmer, i figure it might be quicker to just ask first about a test failure I'm seeing. One of the tests for snapshot install is failing from what looks like a race condition between the assert and the snapshot installation. Here are the logs:

2023-10-17T13:39:16.702547Z  INFO ThreadId(05) openraft::core::sm@worker_loop#d800c000000001: openraft::core::sm@install_snapshot#140004000000004: new
2023-10-17T13:39:16.702586Z  INFO ThreadId(05) openraft::core::sm@worker_loop#d800c000000001: openraft::core::sm@install_snapshot#140004000000004: enter
2023-10-17T13:39:16.702858Z  INFO ThreadId(05) openraft::core::sm@worker_loop#d800c000000001: openraft::core::sm@install_snapshot#140004000000004: installing snapshot: SnapshotMeta { last_log_id: Some(LogId { leader_id: LeaderId { term: 5, node_id: 0 }, index: 9 }), last_membership: StoredMembership { log_id: Some(LogId { leader_id: LeaderId { term: 0, node_id: 0 }, index: 0 }), membership: Membership { configs: [{0}], nodes: {0: ()} } }, snapshot_id: "5-0-9-2" }
2023-10-17T13:39:16.702874Z ERROR ThreadId(02) snapshot_streaming::t33_snapshot_delete_conflict_logs@snapshot_delete_conflicting_logs#1: panicked at 'assertion failed: `(left == right)`
  left: `Membership { configs: [{0}], nodes: {0: ()} }`,
 right: `Membership { configs: [{2, 3}], nodes: {2: (), 3: ()} }`: membership should be overridden by the snapshot', tests/tests/snapshot_streaming/t33_snapshot_delete_conflict_logs.rs:194:9 backtrace=backtrace is disabled without --features 'bt' panic.file="tests/tests/snapshot_streaming/t33_snapshot_delete_conflict_logs.rs" panic.line=194 panic.column=9
2023-10-17T13:39:16.703506Z  INFO ThreadId(05) openraft::core::sm@worker_loop#d800c000000001: openraft::core::sm@install_snapshot#140004000000004: decoding snapshot for installation snapshot_size=275
2023-10-17T13:39:16.703788Z DEBUG ThreadId(05) openraft::core::sm@worker_loop#d800c000000001: openraft::core::sm@install_snapshot#140004000000004: SNAP META:SnapshotMeta { last_log_id: Some(LogId { leader_id: LeaderId { term: 5, node_id: 0 }, index: 9 }), last_membership: StoredMembership { log_id: Some(LogId { leader_id: LeaderId { term: 0, node_id: 0 }, index: 0 }), membership: Membership { configs: [{0}], nodes: {0: ()} } }, snapshot_id: "5-0-9-2" }
2023-10-17T13:39:16.703906Z DEBUG ThreadId(05) openraft::core::sm@worker_loop#d800c000000001: openraft::core::sm@install_snapshot#140004000000004: JSON SNAP DATA:{"last_applied_log":{"leader_id":{"term":5,"node_id":0},"index":9},"last_membership":{"log_id":{"leader_id":{"term":0,"node_id":0},"index":0},"membership":{"configs":[[0]],"nodes":{"0":null}}},"client_serial_responses":{"0":[7,"request-6"]},"client_status":{"0":"request-7"}}
2023-10-17T13:39:16.704831Z  INFO ThreadId(05) openraft::core::sm@worker_loop#d800c000000001: openraft::core::sm@install_snapshot#140004000000004: Done install_snapshot, meta: SnapshotMeta { last_log_id: Some(LogId { leader_id: LeaderId { term: 5, node_id: 0 }, index: 9 }), last_membership: StoredMembership { log_id: Some(LogId { leader_id: LeaderId { term: 0, node_id: 0 }, index: 0 }), membership: Membership { configs: [{0}], nodes: {0: ()} } }, snapshot_id: "5-0-9-2" }
2023-10-17T13:39:16.704877Z  INFO ThreadId(05) openraft::core::sm@worker_loop#d800c000000001: openraft::core::sm@install_snapshot#140004000000004: exit
2023-10-17T13:39:16.704928Z  INFO ThreadId(05) openraft::core::sm@worker_loop#d800c000000001: openraft::core::sm@install_snapshot#140004000000004: close time.busy=2.29ms time.idle=91.4µs

I'm not sure why the changes would cause the snapshot install to take longer... or if it some other aspect of how the test is setup to look for the changes. If it's something that jumps out at you please let me know, otherwise I'll keep poking around to see what I can find.

EDIT:
I forgot to add the test itself... It can be run with cargo test --package tests --test snapshot_streaming -- t33_snapshot_delete_conflict_logs::snapshot_delete_conflicting_logs --exact --nocapture

drmingdrmer · 2023-10-18T00:44:36Z

One of the tests for snapshot install is failing from what looks like a race condition between the assert and the snapshot installation.

@zach-schoenberger
You are right. It is a race condition. The in-memory state about snapshot is already updated before actually installing snapshot to RaftStorage. Thus metrics may be ahead of the storage stage.

👉🏻1 Update in-memory state when installing snapshot:
https://github.com/datafuselabs/openraft/blob/496a0f87fb6f03b640f19987ad4ea549c75a6a8f/openraft/src/engine/handler/following_handler/mod.rs#L320

and then:
https://github.com/datafuselabs/openraft/blob/6098f5cf61b074c8fccecf722278183c3ee5cd27/openraft/src/engine/handler/snapshot_handler/mod.rs#L63

👉🏻2 Metrics is updated to the in-memory state. At this time point, the install-snapshot command may not have reached RaftStorage:
https://github.com/datafuselabs/openraft/blob/942ec78bc367665ce82516ad3458e479d895b0ea/openraft/src/core/raft_core.rs#L524

The solution should be split the snapshot state into two: the expected state when all io-commands are executed and the actual state of the RaftStorage. The storage state should be updated at this point, when notification about a snapshot being installed is received.:
https://github.com/datafuselabs/openraft/blob/942ec78bc367665ce82516ad3458e479d895b0ea/openraft/src/core/raft_core.rs#L1319-L1329

Let me fix this first

drmingdrmer

Reviewable status: 6 of 33 files reviewed, 3 unresolved discussions (waiting on @schreter and @zach-schoenberger)

openraft/src/type_config.rs line 75 at r3 (raw file):

Previously, zach-schoenberger wrote…

I've used other crates that follow this kind of pattern to make it easier to use the crate. rkyv being the main one I can think of now. Otherwise users will generally get it wrong at first and then just guess and check what the compiler wants (at least thats what even I had to do when making these changes). I find them a pretty big quality of life improvement. But I'll use whatever you and @drmingdrmer would prefer.

Yes these aliases are life changer. I use such aliases in my repo too. And my alias does not have type parameter C because in my repo C is fixed.

Therefore I'd prefer having them private too. Application developers may have their own simpler way defining alias.

drmingdrmer

Thank you man!
I have thoroughly reviewed the changes and overall, they are quite good.
However, before we go into a detailed discussion, it would be beneficial to first resolve the ongoing discussions that were started during previous reviews.

Reviewed 3 of 15 files at r4, 3 of 12 files at r6, 5 of 11 files at r7, 1 of 1 files at r8, all commit messages.
Reviewable status: 18 of 33 files reviewed, 3 unresolved discussions (waiting on @schreter and @zach-schoenberger)

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, zach-schoenberger wrote…

This was just a place holder. The error system is pretty rigid so I hadn't had a chance to figure out what should be used. I'll look into what makes sense..

If it receives a chunk before receiving a Manifest, it is a protocol error, in which it should notify the sender about this error. And in order to receive such an error, in the receive-chunk response: sm::Response::ReceiveSnapshotChunk(meta), meta should be a Result<Option<_>, E>.

zach-schoenberger · 2023-10-19T14:18:13Z

openraft/src/core/streaming_state.rs line 24 at r3 (raw file):

Previously, schreter wrote…

I don't know whether this is an almost-done PR, but if it is, it would be nice to have also doc comments on newly-added items explaining their purpose (and fix comments on the updated ones - like above on snapshot_meta).

I'm really not sure what approach makes the most sense here. It seems really cumbersome to have the user impl return the snapshotdata for population that would also then have its own manifest (since manifest is part of snapshotdata), that we have to overwrite with the one sent over the wire.

Also if manifest were a part of the snapshot, it would make more sense to let the calls to snapshotdata::receive call the manifest::receive but that lives in user land. I could see people not reading the docs and using it incorrectly. If you really want to combine them I can, just for my mental model I like having them separate.

zach-schoenberger · 2023-10-19T14:18:51Z

openraft/src/type_config.rs line 75 at r3 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Yes these aliases are life changer. I use such aliases in my repo too. And my alias does not have type parameter C because in my repo C is fixed.

Therefore I'd prefer having them private too. Application developers may have their own simpler way defining alias.

These are private (like in the suggestion) in the current state of the PR.

zach-schoenberger · 2023-10-19T14:20:59Z

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

If it receives a chunk before receiving a Manifest, it is a protocol error, in which it should notify the sender about this error. And in order to receive such an error, in the receive-chunk response: sm::Response::ReceiveSnapshotChunk(meta), meta should be a Result<Option<_>, E>.

Sorry, I'm not sure I follow. The return type is Result<Option<SnapshotMeta<C::NodeId, C::Node>>, StorageError<C::NodeId>>. None of the StorageError Violation options available make much sense for this state (assuming using a defensive error makes sense). Should I be adding a new one?

drmingdrmer

Reviewable status: 18 of 33 files reviewed, 2 unresolved discussions (waiting on @schreter and @zach-schoenberger)

openraft/src/core/streaming_state.rs line 24 at r3 (raw file):

Previously, zach-schoenberger wrote…

I'm really not sure what approach makes the most sense here. It seems really cumbersome to have the user impl return the snapshotdata for population that would also then have its own manifest (since manifest is part of snapshotdata), that we have to overwrite with the one sent over the wire.

Also if manifest were a part of the snapshot, it would make more sense to let the calls to snapshotdata::receive call the manifest::receive but that lives in user land. I could see people not reading the docs and using it incorrectly. If you really want to combine them I can, just for my mental model I like having them separate.

As far as I see in your design, if Manifest and SnapshotData are separated, when all chunks are received and it is about to install the snapshot, Openraft must pass the Manifest to RaftStateMachine::install_snapshot().

Correctness relying on user implementation to call Manifest::receive() in SnapshotData::receive() is unacceptable to me either.

If there is an example implementation of the new snapshot API, it would be more clear.

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, zach-schoenberger wrote…

Sorry, I'm not sure I follow. The return type is Result<Option<SnapshotMeta<C::NodeId, C::Node>>, StorageError<C::NodeId>>. None of the StorageError Violation options available make much sense for this state (assuming using a defensive error makes sense). Should I be adding a new one?

My apology. It should have already checked whether a manifest is already received, in FollowerHandler::receive_snapshot_chunk.

Then if self.streaming is None, it's an severe bug, and a panic() would be more appropriate than just an error logging.

zach-schoenberger · 2023-10-20T02:00:11Z

openraft/src/core/streaming_state.rs line 24 at r3 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

As far as I see in your design, if Manifest and SnapshotData are separated, when all chunks are received and it is about to install the snapshot, Openraft must pass the Manifest to RaftStateMachine::install_snapshot().

Correctness relying on user implementation to call Manifest::receive() in SnapshotData::receive() is unacceptable to me either.

If there is an example implementation of the new snapshot API, it would be more clear.

Sorry I think this was an old comment thread. I don’t have any updates in mind. Sorry for the confusion.

zach-schoenberger · 2023-10-20T02:00:54Z

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

My apology. It should have already checked whether a manifest is already received, in FollowerHandler::receive_snapshot_chunk.

Then if self.streaming is None, it's an severe bug, and a panic() would be more appropriate than just an error logging.

Awesome. Panic makes sense to me.

zach-schoenberger · 2023-10-20T12:35:07Z

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, zach-schoenberger wrote…

Awesome. Panic makes sense to me.

I updated and ran locally having this panic, but it broke the testing setup in t10_api_install_snapshot::snapshot_argument. This test seems to be testing exactly this scenario: sending chunks before a manifest. I think it's still a valid test, but not sure how we would want to structure it with a panic.

drmingdrmer

Reviewable status: 18 of 33 files reviewed, 2 unresolved discussions (waiting on @schreter and @zach-schoenberger)

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, zach-schoenberger wrote…

I updated and ran locally having this panic, but it broke the testing setup in t10_api_install_snapshot::snapshot_argument. This test seems to be testing exactly this scenario: sending chunks before a manifest. I think it's still a valid test, but not sure how we would want to structure it with a panic.

Receiving a chunk before manifest is a protocol error and the client(leader) should receive an error response.

Before this PR, such checking is done in Engine, i.e., by FollowingHandler::receive_snapshot_chunk().
At the IO level, i.e., in this method: sm::Worker::receive_snapshot_chunk(), it did not handle protocol error.

I think there are 2 options for this situation.

You can just put the manifest to the engine level and let the FollowingHandler handle such error.
Or just let this method to be able to return a protocol error.

But before this, I still don't know if such a design is suitable to implement a snapshot data that has the leveldb structure.

zach-schoenberger · 2023-10-23T17:22:28Z

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Receiving a chunk before manifest is a protocol error and the client(leader) should receive an error response.

Before this PR, such checking is done in Engine, i.e., by FollowingHandler::receive_snapshot_chunk().
At the IO level, i.e., in this method: sm::Worker::receive_snapshot_chunk(), it did not handle protocol error.

I think there are 2 options for this situation.

You can just put the manifest to the engine level and let the FollowingHandler handle such error.

Or just let this method to be able to return a protocol error.

But before this, I still don't know if such a design is suitable to implement a snapshot data that has the leveldb structure.

Which aspect of the design? I'm definitely planning to use this for rocksdb. My high level plan is to have the manifest be the list of a checkpoing/backup's files and then each chunk just be a file name which triggers the follower to download the file from the snapshot provider.

drmingdrmer

Reviewable status: 18 of 33 files reviewed, 2 unresolved discussions (waiting on @schreter and @zach-schoenberger)

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, zach-schoenberger wrote…

Which aspect of the design? I'm definitely planning to use this for rocksdb. My high level plan is to have the manifest be the list of a checkpoing/backup's files and then each chunk just be a file name which triggers the follower to download the file from the snapshot provider.

If you are using Rocksdb checkpoint files, is there a manifest file stored on disk which maintains the list of checkpoint files?

If such a file exists, in this design, there is no step that presist the manifest file to disk.
If no such a file exists, ths snapshot can not be consistently reconstructed in the presence of dirty files, such as unfinished downloaded temporary files.

If the manifest is part of the snapshot data. It's necessary to incorporate a
step that allows the application to persist manifest.

On the other hand, the application doesn't have a snapshot manifest,
this would constitute a "no manifest" design, implying that the manifest is
constructed on the fly using snapshot data.

zach-schoenberger · 2023-11-08T03:14:53Z

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

If you are using Rocksdb checkpoint files, is there a manifest file stored on disk which maintains the list of checkpoint files?

If such a file exists, in this design, there is no step that presist the manifest file to disk.

If no such a file exists, ths snapshot can not be consistently reconstructed in the presence of dirty files, such as unfinished downloaded temporary files.

If the manifest is part of the snapshot data. It's necessary to incorporate a
step that allows the application to persist manifest.

On the other hand, the application doesn't have a snapshot manifest,
this would constitute a "no manifest" design, implying that the manifest is
constructed on the fly using snapshot data.

no such file exists. there is no need for there to be one
the assumption it cannot be consistently reconstructed is not accurate. files in the directory are all that is needed. the manifest aka list of files can be generated cheaply on the fly.

The manifest is not part of the snapshot data here. The "on the fly" would be closer. Checkpoints hard link or copy the files to create a full representation of the state at time of snapshot. Either that succeeds at which point you have a directory and list of files, or it doesn't and thats an error state.

drmingdrmer

Reviewable status: 18 of 33 files reviewed, 2 unresolved discussions (waiting on @schreter and @zach-schoenberger)

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, zach-schoenberger wrote…

no such file exists. there is no need for there to be one

the assumption it cannot be consistently reconstructed is not accurate. files in the directory are all that is needed. the manifest aka list of files can be generated cheaply on the fly.

The manifest is not part of the snapshot data here. The "on the fly" would be closer. Checkpoints hard link or copy the files to create a full representation of the state at time of snapshot. Either that succeeds at which point you have a directory and list of files, or it doesn't and thats an error state.

In the context you've provided, a snapshot is essentially a collection of data files within the directory.
Snapshot gets updated when logs are applied to the state machine, which involves the addition of new files into this directory.

However, without a manifest file, it outlines the file list of the snapshot. It's impossible to perform an atomic update.

Consider a situation where you are adding to files. If the system crashes after
the first file is added and before the second one gets added, then snapshot
directory will end up in an inconsistent state.

zach-schoenberger · 2023-11-08T13:47:28Z

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

In the context you've provided, a snapshot is essentially a collection of data files within the directory.
Snapshot gets updated when logs are applied to the state machine, which involves the addition of new files into this directory.

However, without a manifest file, it outlines the file list of the snapshot. It's impossible to perform an atomic update.

Consider a situation where you are adding to files. If the system crashes after
the first file is added and before the second one gets added, then snapshot
directory will end up in an inconsistent state.

Sorry, but Snapshot gets updated when logs are applied to the state machine, which involves the addition of new files into this doesn't make any sense. That's not what a snapshot is by definition. Snapshots should be immutable. The statemachine can be replaced/overwritten with a snapshot. The statemachine can have logs applied to it. But that is not the purpose of a snapshot. The situation that you bring up should never be the case because of this.

drmingdrmer

Reviewable status: 18 of 33 files reviewed, 2 unresolved discussions (waiting on @schreter and @zach-schoenberger)

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, zach-schoenberger wrote…

Sorry, but Snapshot gets updated when logs are applied to the state machine, which involves the addition of new files into this doesn't make any sense. That's not what a snapshot is by definition. Snapshots should be immutable. The statemachine can be replaced/overwritten with a snapshot. The statemachine can have logs applied to it. But that is not the purpose of a snapshot. The situation that you bring up should never be the case because of this.

I mean, when new logs are applied, and by some policy, a new snapshot should be build from the latest state machine.

Snapshot is immutable but it can be built upon a previous snapshot.

zach-schoenberger · 2023-11-08T14:16:47Z

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

I mean, when new logs are applied, and by some policy, a new snapshot should be build from the latest state machine.

Snapshot is immutable but it can be built upon a previous snapshot.

I view this Snapshot is immutable but it can be built upon a previous snapshot. as:

follower receives snapshot
follower applies snapshot to statemachine (in most cases replacing current statemachine)
now there are 2 distinct sets of data - statemachine and snapshot
new logs applied to statemachine
Is this not how the raft system views these entities?

drmingdrmer

Reviewable status: 18 of 33 files reviewed, 2 unresolved discussions (waiting on @schreter and @zach-schoenberger)

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, zach-schoenberger wrote…

I view this Snapshot is immutable but it can be built upon a previous snapshot. as:

follower receives snapshot

follower applies snapshot to statemachine (in most cases replacing current statemachine)

now there are 2 distinct sets of data - statemachine and snapshot

new logs applied to statemachine
Is this not how the raft system views these entities?

When a follower receives a snapshot, it should replace both its state machine and its local snapshot with the received one.

schreter

@zach-schoenberger Thanks for the good work and sorry for the delay in reviewing. I currently don't have any further comments except the one below, so from my POV it's good to go. We'll see later, how it works in real life.

Reviewed 5 of 15 files at r4, 11 of 12 files at r6, 10 of 11 files at r7, 1 of 1 files at r8, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @drmingdrmer and @zach-schoenberger)

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

When a follower receives a snapshot, it should replace both its state machine and its local snapshot with the received one.

On this one, I still don't know what's the best course of actions. Probably just fail installing the snapshot?

schreter

@drmingdrmer You are blocking the change. What is still open from your POV? I'd like to try to integrate it with our codebase "soon" (in a week or two, hopefully), so it would be nice to have the change in.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @drmingdrmer and @zach-schoenberger)

drmingdrmer

As far as I'm know, a typical design for snapshot implementation mirrors that of leveldb, which involves several data files on the disk along with a manifest file. Modifications to the database occur through the addition of new data files followed by an atomic update of the manifest.

However, in this pull request, there is no provided API that allows the application to modify the manifest. It operates under the assumption that the manifest is transient and does not need to be stored persistently.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @schreter and @zach-schoenberger)

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

Probably just fail installing the snapshot?

@schreter
I do not get it. Why does it have to fail? When a snapshot is received, replacing state machine with the snapshot is just the course of action.

now there are 2 distinct sets of data - statemachine and snapshot

new logs applied to statemachine
Is this not how the raft system views these entities?

@zach-schoenberger
Yes they are different, but the data in state machine is always a superset of the data snapshot: snapshot ⊆ state_machine.

drmingdrmer · 2024-01-13T04:09:50Z

@zach-schoenberger I will continue working on this based on your PR, in another PR

schreter

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @drmingdrmer and @zach-schoenberger)

openraft/src/core/sm/mod.rs line 285 at r2 (raw file):

I do not get it. Why does it have to fail? When a snapshot is received, replacing state machine with the snapshot is just the course of action.

Sorry for the late reply. What I have in mind, what should happen, for example, on resource shortage (memory, disk, etc.). But from the other discussion I take that this will simply result in the I/O error propagating to the raft core and shutting down the engine.

drmingdrmer · 2024-02-19T13:43:22Z

Closing this PR for this issue being resolved in:

Openraft does not need to understand Manifest or participate in the receiving process. These parts of job are application specific.

zach-schoenberger mentioned this pull request Jul 16, 2023

Refactor snapshot API #606

Closed

drmingdrmer self-requested a review July 16, 2023 14:50

drmingdrmer reviewed Jul 18, 2023

View reviewed changes

drmingdrmer requested a review from schreter July 18, 2023 02:12

schreter reviewed Jul 18, 2023

View reviewed changes

zach-schoenberger commented Jul 18, 2023

View reviewed changes

schreter reviewed Jul 19, 2023

View reviewed changes

drmingdrmer requested a review from schreter July 24, 2023 02:44

drmingdrmer reviewed Jul 24, 2023

View reviewed changes

schreter reviewed Jul 24, 2023

View reviewed changes

zach-schoenberger commented Jul 24, 2023

View reviewed changes

schreter reviewed Jul 24, 2023

View reviewed changes

drmingdrmer requested a review from schreter July 25, 2023 01:48

drmingdrmer reviewed Jul 25, 2023

View reviewed changes

schreter reviewed Jul 25, 2023

View reviewed changes

zach-schoenberger force-pushed the refactor_snapshot branch from 002d6fa to 4fd6452 Compare October 19, 2023 03:21

drmingdrmer requested a review from schreter October 19, 2023 03:42

drmingdrmer reviewed Oct 19, 2023

View reviewed changes

drmingdrmer reviewed Oct 20, 2023

View reviewed changes

drmingdrmer reviewed Oct 21, 2023

View reviewed changes

drmingdrmer reviewed Oct 24, 2023

View reviewed changes

drmingdrmer reviewed Nov 8, 2023

View reviewed changes

drmingdrmer reviewed Nov 9, 2023

View reviewed changes

schreter reviewed Nov 15, 2023

View reviewed changes

schreter reviewed Nov 21, 2023

View reviewed changes

drmingdrmer requested a review from schreter November 22, 2023 04:33

drmingdrmer reviewed Nov 22, 2023

View reviewed changes

drmingdrmer mentioned this pull request Dec 21, 2023

Tracking issue for refactoring snapshot API #857

Closed

4 tasks

schreter requested a review from drmingdrmer February 12, 2024 13:54

schreter reviewed Feb 12, 2024

View reviewed changes

drmingdrmer closed this Feb 19, 2024

Refactor snapshot #896

Refactor snapshot #896

Conversation

zach-schoenberger commented Jul 16, 2023 • edited by drmingdrmer Loading

drmingdrmer left a comment

Choose a reason for hiding this comment

schreter left a comment

Choose a reason for hiding this comment

zach-schoenberger left a comment

Choose a reason for hiding this comment

zach-schoenberger commented Jul 18, 2023

drmingdrmer commented Jul 19, 2023

schreter left a comment

Choose a reason for hiding this comment

zach-schoenberger commented Jul 19, 2023

schreter left a comment

Choose a reason for hiding this comment

drmingdrmer commented Jul 20, 2023

schreter commented Jul 20, 2023

zach-schoenberger commented Jul 20, 2023

drmingdrmer left a comment

Choose a reason for hiding this comment

schreter left a comment

Choose a reason for hiding this comment

zach-schoenberger left a comment

Choose a reason for hiding this comment

schreter left a comment

Choose a reason for hiding this comment

drmingdrmer left a comment

Choose a reason for hiding this comment

schreter left a comment

Choose a reason for hiding this comment

zach-schoenberger commented Jul 25, 2023

drmingdrmer commented Jul 25, 2023

zach-schoenberger commented Jul 25, 2023

schreter commented Oct 10, 2023

drmingdrmer commented Oct 10, 2023

zach-schoenberger commented Oct 15, 2023

zach-schoenberger commented Oct 17, 2023 • edited Loading

drmingdrmer commented Oct 18, 2023 • edited Loading

drmingdrmer left a comment

Choose a reason for hiding this comment

drmingdrmer left a comment

Choose a reason for hiding this comment

zach-schoenberger commented Oct 19, 2023

zach-schoenberger commented Oct 19, 2023

zach-schoenberger commented Oct 19, 2023

drmingdrmer left a comment

Choose a reason for hiding this comment

zach-schoenberger commented Oct 20, 2023

zach-schoenberger commented Oct 20, 2023

zach-schoenberger commented Oct 20, 2023

drmingdrmer left a comment

Choose a reason for hiding this comment

zach-schoenberger commented Oct 23, 2023

drmingdrmer left a comment

Choose a reason for hiding this comment

zach-schoenberger commented Nov 8, 2023

drmingdrmer left a comment

Choose a reason for hiding this comment

zach-schoenberger commented Nov 8, 2023

drmingdrmer left a comment

Choose a reason for hiding this comment

zach-schoenberger commented Nov 8, 2023

drmingdrmer left a comment

Choose a reason for hiding this comment

schreter left a comment

Choose a reason for hiding this comment

schreter left a comment

Choose a reason for hiding this comment

drmingdrmer left a comment

Choose a reason for hiding this comment

drmingdrmer commented Jan 13, 2024

schreter left a comment

Choose a reason for hiding this comment

drmingdrmer commented Feb 19, 2024

zach-schoenberger commented Jul 16, 2023 •

edited by drmingdrmer

Loading

zach-schoenberger commented Oct 17, 2023 •

edited

Loading

drmingdrmer commented Oct 18, 2023 •

edited

Loading