Feature: store heartbeat metric in RaftMetrics and RaftDataMetrics #1177

SteveLauC · 2024-07-18T09:07:49Z

Checklist

Updated guide with pertinent info (may not always apply).
Squash down commits to one or two logical commits which clearly describe the work you've done.
Unittest is a friend:)

Will check the above stuff when this PR is no longer a draft:)

What does this PR do

This PR attempts to implement the metric discussed in this thread,
adding the heartbeat information (time of the last replication or heartbeat) to
RaftDataMetrics and RaftMetrics so that users can subscribe to it.

It is a draft PR as I want to ensure it looks basically correct, then I will
try to add tests, code documents, and polish interfaces.

How it is implemented

For the time type, I used RaftTypeConfig::AsyncRuntime::Instant as it is used
widely.

And I found the last applied log ID is stored in struct ProcessEntry, so I
stored this time information there as well. When a response to a replication
request is received, the replication handler will update it in
ReplicationHandler::update_matching() using the sending_time of the replication
result.

And these metrics will be sent to the tokio watch channel in RaftCore::report_metrics().

Unsolved problems/Questions

There are several unsolved problems with this PR, which I want to discuss with
the maintainer before taking any further actions.

The serde::{Deserialize, Serialize} bounds

RaftDataMetris and RaftMetrics need to satisfy these bounds when the serde
feature of Openraft is enabled, but neither std::time::Instant nor
tokio::time::Instant does this.
Should we use the result.sending_time, or we should let the follower pass
the time when it receives the replication request? The later seems more
accurate.
Tests

Looks like most unit tests (related to replication) use special human-made
log ID to emulate specific cases, should this time info be done similarly?
I guess no as it is not that important compared to the log ID.

This change is

drmingdrmer · 2024-07-18T09:39:11Z

It looks like you can not use Instant in RaftMetrics which requires serde.

The metrics may have to store the interval since the last acked heartbeat, just like millis_since_quorum_ack does:
https://github.com/datafuselabs/openraft/blob/9dbcd4c0b6723fd4ac5d38ed032d8e656d0abc8d/openraft/src/metrics/raft_metrics.rs#L70

drmingdrmer

Reviewed all commit messages.
Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @SteveLauC)

openraft/src/progress/entry/mod.rs line 26 at r1 (raw file):

    /// The id of the last matching log on the target following node.
    pub(crate) matching: Option<LogId<C::NodeId>>,
    pub(crate) time: Option<InstantOf<C>>,

You do not need to store the ack time. there is already a field for it:
https://github.com/datafuselabs/openraft/blob/55d009c897d689534cb5cc3b6321b1f42d9b2066/openraft/src/proposer/leader.rs#L61

clock_progress stores the last timestamp which is acked by each follower/learner.

It can be transform to a BTreeMap as it does with progress -> replications:

https://github.com/datafuselabs/openraft/blob/55d009c897d689534cb5cc3b6321b1f42d9b2066/openraft/src/core/raft_core.rs#L520-L528

SteveLauC · 2024-07-18T10:11:29Z

Thanks for the review and guiding me!

It looks like you can not use Instant in RaftMetrics which requires serde.

The metrics may have to store the interval since the last acked heartbeat, just like millis_since_quorum_ack does:

Done

You do not need to store the ack time. there is already a field for it:

It can be transform to a BTreeMap as it does with progress -> replications:

Done

schreter

Reviewed 6 of 7 files at r2, all commit messages.
Reviewable status: 6 of 7 files reviewed, 2 unresolved discussions (waiting on @drmingdrmer and @SteveLauC)

openraft/src/progress/entry/mod.rs line 26 at r1 (raw file):

    /// The id of the last matching log on the target following node.
    pub(crate) matching: Option<LogId<C::NodeId>>,
    pub(crate) time: Option<InstantOf<C>>,

BTW, independent of the solution, naming a member time and especially w/o documentation is very confusing. Always add a documentation comment telling what this is and name members and methods appropriately.

For instance, this might be last_known_alive_timestamp or something like that.

drmingdrmer

We need to create a test to verify the functionality of these features.

Please include the heartbeat tests in this module:
https://github.com/datafuselabs/openraft/blob/ca156ead4af0624e990d1e1e069aab929a5379a3/tests/tests/metrics/main.rs#L10

Reviewed 7 of 7 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @SteveLauC)

openraft/src/metrics/raft_metrics.rs line 77 at r2 (raw file):

    /// The heartbeats states. It is Some() only when this node is leader.
    pub heartbeat: Option<HeartbeatMetrics<C>>,

Please enhance the documentation to clarify the meaning of the value. For example: "The value represents the interval since the timestamp of the last heartbeat acknowledged by a follower."

SteveLauC · 2024-07-18T14:55:27Z

naming a member time and especially w/o documentation is very confusing.

Please enhance the documentation to clarify the meaning of the value

Yeah, I will handle them, just want to ensure this PR is basically ok before doing these things.

It is a draft PR as I want to ensure it looks basically correct, then I will
try to add tests, code documents, and polish interfaces.

SteveLauC · 2024-07-19T01:10:33Z

Hi, I just added the code documents.

We need to create a test to verify the functionality of these features.

Please include the heartbeat tests in this module:

https://github.com/datafuselabs/openraft/blob/ca156ead4af0624e990d1e1e069aab929a5379a3/tests/tests/metrics/main.rs#L10

Should I add tests in this module or the module tests/tests/metrics/t30_leader_metrics.rs, looks that the later is for metrics and the replication tests are put in it:

$ rg "replication"
t30_leader_metrics.rs
52:                if let Some(ref q) = x.replication {
61:            "no replication with 1 node cluster",
95:                if let Some(ref q) = x.replication {
102:            "replication metrics to 4 nodes",
128:        "--- replication metrics should reflect the replication state"
137:                    if let Some(ref q) = x.replication {
144:                "replication metrics to 3 nodes",
159:        n1.wait(timeout()).metrics(|x| x.replication.is_some(), "node-1 starts replication").await?;
161:        n0.wait(timeout()).metrics(|x| x.replication.is_none(), "node-0 stopped replication").await?;

Or maybe it should be added in tests/tests/metrics/t10_server_metrics_and_data_metrics.rs?

SteveLauC · 2024-07-19T02:59:24Z

tests/tests/metrics/t10_server_metrics_and_data_metrics.rs

+        let mut data_metrics = leader.data_metrics();
+        let mut prev_node1 = 0;
+        let mut prev_node2 = 0;
+        for _ in 0..30 {


Here, since the heartbeat is disabled, the last ack time of node 1 and node 2 should never change, so I expect the time since the last ack should increase during the loop, but it won't sometimes, I am not sure about the reason:<

running 1 test panicked at tests/tests/metrics/t10_server_metrics_and_data_metrics.rs:109:13: interval should increase since last ack won't change tests/tests/metrics/t10_server_metrics_and_data_metrics.rs:109:13 backtrace is disabled without --features 'bt' test t10_server_metrics_and_data_metrics::heartbeat_metrics ... FAILED

Any change to RaftMetrics would result in changed() to be ready, it is possible other fields changed but the heartbeat did not change.

I dumped the node-0 metrics from the log (at the bottom of the page):
cat ut.2024-07-19-02 | grep heartbeat_metrics | grep report_metrics:.* -o | grep id:0

But it looks lite impl Display for RaftMetrics is not updated I can not see the heartbeat field:

report_metrics: Metrics{id:0, Learner, term:0, vote:<T0-N0:->, last_log:None, last_applied:None, leader:None(since_last_ack:None ms), membership:{log_id:None, {voters:[], learners:[]}}, snapshot:None, purged:None, replication:{}} report_metrics: Metrics{id:0, Learner, term:0, vote:<T0-N0:->, last_log:None, last_applied:None, leader:None(since_last_ack:None ms), membership:{log_id:None, {voters:[], learners:[]}}, snapshot:None, purged:None, replication:{}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:1, last_applied:None, leader:0(since_last_ack:0 ms), membership:{log_id:T0-N0.0, {voters:[{0:()}], learners:[]}}, snapshot:None, purged:None, replication:{0:T1-N0.1}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:1, last_applied:T1-N0.1, leader:0(since_last_ack:0 ms), membership:{log_id:T0-N0.0, {voters:[{0:()}], learners:[]}}, snapshot:None, purged:None, replication:{0:T1-N0.1}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:2, last_applied:T1-N0.1, leader:0(since_last_ack:0 ms), membership:{log_id:T1-N0.2, {voters:[{0:()}], learners:[1:()]}}, snapshot:None, purged:None, replication:{0:T1-N0.2,1:None}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:2, last_applied:T1-N0.2, leader:0(since_last_ack:0 ms), membership:{log_id:T1-N0.2, {voters:[{0:()}], learners:[1:()]}}, snapshot:None, purged:None, replication:{0:T1-N0.2,1:None}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:3, last_applied:T1-N0.2, leader:0(since_last_ack:0 ms), membership:{log_id:T1-N0.3, {voters:[{0:()}], learners:[1:(),2:()]}}, snapshot:None, purged:None, replication:{0:T1-N0.3,1:None,2:None}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:3, last_applied:T1-N0.3, leader:0(since_last_ack:0 ms), membership:{log_id:T1-N0.3, {voters:[{0:()}], learners:[1:(),2:()]}}, snapshot:None, purged:None, replication:{0:T1-N0.3,1:None,2:None}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:3, last_applied:T1-N0.3, leader:0(since_last_ack:0 ms), membership:{log_id:T1-N0.3, {voters:[{0:()}], learners:[1:(),2:()]}}, snapshot:None, purged:None, replication:{0:T1-N0.3,1:T1-N0.3,2:T1-N0.3}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:4, last_applied:T1-N0.3, leader:0(since_last_ack:14 ms), membership:{log_id:T1-N0.4, {voters:[{0:()},{0:(),1:(),2:()}], learners:[]}}, snapshot:None, purged:None, replication:{0:T1-N0.4,1:T1-N0.3,2:T1-N0.3}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:4, last_applied:T1-N0.4, leader:0(since_last_ack:10 ms), membership:{log_id:T1-N0.4, {voters:[{0:()},{0:(),1:(),2:()}], learners:[]}}, snapshot:None, purged:None, replication:{0:T1-N0.4,1:T1-N0.3,2:T1-N0.4}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:5, last_applied:T1-N0.4, leader:0(since_last_ack:19 ms), membership:{log_id:T1-N0.5, {voters:[{0:(),1:(),2:()}], learners:[]}}, snapshot:None, purged:None, replication:{0:T1-N0.5,1:T1-N0.4,2:T1-N0.4}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:5, last_applied:T1-N0.5, leader:0(since_last_ack:9 ms), membership:{log_id:T1-N0.5, {voters:[{0:(),1:(),2:()}], learners:[]}}, snapshot:None, purged:None, replication:{0:T1-N0.5,1:T1-N0.4,2:T1-N0.5}} report_metrics: Metrics{id:0, Leader, term:1, vote:<T1-N0:Q>, last_log:5, last_applied:T1-N0.5, leader:0(since_last_ack:8 ms), membership:{log_id:T1-N0.5, {voters:[{0:(),1:(),2:()}], learners:[]}}, snapshot:None, purged:None, replication:{0:T1-N0.5,1:T1-N0.5,2:T1-N0.5}}

Any change to RaftMetrics would result in changed() to be ready, it is possible other fields changed but the heartbeat did not change.

Actually and surprisingly, the interval value in the next loop can even be smaller than the previous one:

$ cargo t t10_server_metrics_and_data_metrics::heartbeat_metrics -- --nocapture running 1 test heartbeat: {0: Some(2), 1: Some(9), 2: Some(6)} heartbeat: {0: Some(2), 1: Some(6), 2: Some(5)} panicked at tests/tests/metrics/t10_server_metrics_and_data_metrics.rs:111:13: interval should increase since last ack won't change tests/tests/metrics/t10_server_metrics_and_data_metrics.rs:111:13 backtrace is disabled without --features 'bt' test t10_server_metrics_and_data_metrics::heartbeat_metrics ... FAILED

But it looks lite impl Display for RaftMetrics is not updated I can not see the heartbeat field:

Sorry for missing that, it has just been added:)

Hmmm, if another AppendEntries or heartbeat response is received, this interval will become smaller.

Heartbeat is disabled in this test cluster, so there shouldn't be heartbeat response. Then I guess it is made by a replication during the cluster initialization?

I removed this test, do we need more tests or the existing 2 tests are sufficient, let me know what you think:)

drmingdrmer

Reviewed 1 of 2 files at r3, 1 of 1 files at r5, all commit messages.
Reviewable status: 7 of 8 files reviewed, 5 unresolved discussions (waiting on @schreter and @SteveLauC)

openraft/src/metrics/raft_metrics.rs line 114 at r5 (raw file):

        write!(
            f,
            "membership:{}, heartbeat:{{{}}}, snapshot:{}, purged:{}, replication:{{{}}}",

The Display implementations should be consistent: always put heartbeat after replication.

SteveLauC · 2024-07-19T08:47:11Z

The Display implementations should be consistent: always put heartbeat after replication.

Sure

SteveLauC · 2024-07-19T09:01:53Z

https://github.com/datafuselabs/openraft/actions/runs/10005367869/job/27655934359?pr=1177

failures:

---- t11_add_learner::check_learner_after_leader_transferred stdout ----
panicked at /home/runner/work/openraft/openraft/openraft/src/raft_state/log_state_reader.rs:25:13:
assertion failed: Some(log_id) <= self.committed()
/home/runner/work/openraft/openraft/openraft/src/raft_state/log_state_reader.rs:25:13
backtrace is disabled without --features 'bt'
panicked at /home/runner/work/openraft/openraft/openraft/src/raft_state/log_state_reader.rs:25:13:
assertion failed: Some(log_id) <= self.committed()
/home/runner/work/openraft/openraft/openraft/src/raft_state/log_state_reader.rs:25:13
backtrace is disabled without --features 'bt'
panicked at /home/runner/work/openraft/openraft/openraft/src/raft_state/log_state_reader.rs:25:13:
assertion failed: Some(log_id) <= self.committed()
/home/runner/work/openraft/openraft/openraft/src/raft_state/log_state_reader.rs:25:13
backtrace is disabled without --features 'bt'
Error: timeout after 3s when node in new cluster finally commit at least one blank leader-initialize log .applied_index>=10 latest: Metrics{id:3, Leader, term:2, vote:<T2-N3:Q>, last_log:10, last_applied:T1-N0.9, leader:3(since_last_ack:None ms), membership:{log_id:T1-N0.9, {voters:[{1:(),3:(),4:()}], learners:[2:()]}}, snapshot:None, purged:None, replication:{1:None,2:None,3:T2-N3.10,4:None}, heartbeat:{1:None,2:None,3:76,4:None}}

This test failure seems irrelevant.

schreter

Reviewed 1 of 7 files at r2, 1 of 2 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 1 of 1 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @drmingdrmer and @SteveLauC)

openraft/src/metrics/raft_metrics.rs line 78 at r5 (raw file):

    /// Heartbeat metrics. It is Some() only when this node is leader.
    ///
    /// This field records a mapping between a node's ID and milliseconds since

I already opened an issue for this and similar topics. Using "millisecond since" or other relative times is unreliable. We should in general use just timestamps via standard Instant (well, the one from AsyncRuntime) and Duration.

There was just a question on how to serialize it for users who need serde. Possibly, we need a specialized serde serializer to serialize it properly. See also this issue on serde.

Code quote:

milliseconds since

openraft/src/metrics/raft_metrics.rs line 119 at r5 (raw file):

                .as_ref()
                .map(|x| { x.iter().map(|(k, v)| format!("{}:{}", k, DisplayOption(v))).collect::<Vec<_>>().join(",") })
                .unwrap_or_default(),

Ouch! Allocations in Display and similar traits are a very bad idea! Yes, I know there is some legacy stuff with summary() and similar, but continuing it is not good.

Please rewrite allocation-free, e.g., by adding properly-implemented Display trait to HeartbeatMetrics.

Suggestion:

            DisplayOption(&self.heartbeat),

openraft/src/metrics/raft_metrics.rs line 214 at r5 (raw file):

            self.heartbeat
                .as_ref()
                .map(|x| { x.iter().map(|(k, v)| format!("{}:{}", k, DisplayOption(v))).collect::<Vec<_>>().join(",") })

Same here, though it was broken also for replication. Feel free to fix it.

tests/tests/metrics/t10_server_metrics_and_data_metrics.rs line 96 at r4 (raw file):

Hmmm, if another AppendEntries or heartbeat response is received, this interval will become smaller.

That's why use Instant and not some sort of Duration...

SteveLauC · 2024-07-19T09:17:32Z

Please rewrite allocation-free, e.g., by adding properly-implemented Display trait to HeartbeatMetrics.

This would make HeartbeatMetrics a wrapper type, not an alias since we cannot impl std::fmt::Display for std::collections::BTreeMap due to the orphan rule, I will proceed and implement it (for HeartbeatMetric and ReplicationMetrics) if you guys are ok with this change.

drmingdrmer · 2024-07-19T09:25:07Z

@schreter
Shall we consider upgrading Duration in milliseconds to Instant in a separate PR to keep the changes in this PR more manageable?

And GPT-4o has given me a working example of serializing Instant 🤔 :

Below is a complete example that includes both serialization and deserialization for std::time::Instant using serde. We will create a newtype wrapper around Instant and implement both Serialize and Deserialize.

Add dependencies: Ensure you have the necessary dependencies in your Cargo.toml:

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

Create wrapper and implement serialization and deserialization:

use serde::{Serialize, Deserialize, Serializer, Deserializer};
use serde::de::{self, Visitor};
use std::time::{Instant, SystemTime, UNIX_EPOCH, Duration};
use std::fmt;

// Newtype wrapper around `Instant`
#[derive(Debug)]
struct SerializableInstant(Instant);

impl Serialize for SerializableInstant {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        // Convert `Instant` to `SystemTime`
        let system_time = SystemTime::now() + (self.0 - Instant::now());
        // Get the duration since the Unix epoch
        let duration = system_time.duration_since(UNIX_EPOCH).expect("Time went backwards");
        // Serialize the duration as seconds since the epoch
        serializer.serialize_u64(duration.as_secs())
    }
}

impl<'de> Deserialize<'de> for SerializableInstant {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct InstantVisitor;

        impl<'de> Visitor<'de> for InstantVisitor {
            type Value = SerializableInstant;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("a u64 representing seconds since the Unix epoch")
            }

            fn visit_u64<E>(self, value: u64) -> Result<Self::Value, E>
            where
                E: de::Error,
            {
                let system_time = UNIX_EPOCH + Duration::new(value, 0);
                let now = SystemTime::now();
                let instant = if system_time > now {
                    Instant::now() + (system_time.duration_since(now).unwrap())
                } else {
                    Instant::now() - (now.duration_since(system_time).unwrap())
                };
                Ok(SerializableInstant(instant))
            }
        }

        deserializer.deserialize_u64(InstantVisitor)
    }
}

// Example usage
fn main() {
    let now = Instant::now();
    let serializable_now = SerializableInstant(now);

    // Serialize to JSON
    let json = serde_json::to_string(&serializable_now).unwrap();
    println!("Serialized instant: {}", json);

    // Deserialize from JSON
    let deserialized_instant: SerializableInstant = serde_json::from_str(&json).unwrap();
    println!("Deserialized instant: {:?}", deserialized_instant);
}

Explanation

Serialization:
- We convert Instant to SystemTime by calculating the difference between self.0 (the stored Instant) and Instant::now(), and then adding it to SystemTime::now().
- We then get the duration since the Unix epoch and serialize it as a u64 representing seconds since the epoch.
Deserialization:
- We define a Visitor to handle the deserialization of a u64 value.
- We convert the u64 back to a SystemTime by adding the duration to the Unix epoch.
- We then convert SystemTime back to Instant by adjusting from the current SystemTime to the given SystemTime.

This ensures that we can both serialize an Instant to a JSON representation and deserialize it back to an Instant. The Instant is stored as the number of seconds since the Unix epoch, which is a common and portable format for timestamps.

SteveLauC · 2024-07-19T09:33:17Z

Shall we consider upgrading Duration in milliseconds to Instant in a separate PR to keep the changes in this PR more manageable?

And I am actually not sure if that allocation refactor should be done in this PR, it deserves a separate PR if we want a PR to only does one thing 🤪

schreter

Shall we consider upgrading Duration in milliseconds to Instant in a separate PR to keep the changes in this PR more manageable?

Yes, that's OK. And yes, the suggestion is a similar to one in the serde issue I linked. But not in seconds, better microseconds :-).

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @drmingdrmer and @SteveLauC)

openraft/src/metrics/raft_metrics.rs line 119 at r5 (raw file):

Previously, schreter wrote…

Ouch! Allocations in Display and similar traits are a very bad idea! Yes, I know there is some legacy stuff with summary() and similar, but continuing it is not good.

Please rewrite allocation-free, e.g., by adding properly-implemented Display trait to HeartbeatMetrics.

@SteveLauC I'd strongly suggest to keep discussions pertaining to code locations in Reviewable - it's much simpler to follow and you can easily track "done" items.

Regarding your question about wrappers, there is no need to make HearbeatMetrics a wrapper.

But, you can do the same as for DisplayOption - introduce a helper which wraps the struct only for Display purposes (DisplayBTreeMap, for example).

Suggestion:

DisplayOption(self.heartbeat.as_ref.map(DisplayBTreeMap)),

schreter

And I am actually not sure if that allocation refactor should be done in this PR, it deserves a separate PR if we want a PR to only does one thing 🤪

Since it's broken already for the other metrics, that's OK, but you know, technical debt...

My problem is that openraft is crashing all our malfunction tests which simulate OOMs (i.e., it is panicking instead of "proper" error handling, bubbling up the error to the caller and correctly shutting down). There will be some serious clean up needed :-). Unfortunately, I can't yet open-source our malfunction testing framework (which would be VERY handy for many library writers).

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @drmingdrmer and @SteveLauC)

SteveLauC · 2024-07-19T10:33:16Z

I just finished that display refactor:)

drmingdrmer

Reviewed 1 of 1 files at r7, 3 of 3 files at r8, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @SteveLauC)

openraft/src/display_ext/display_btreemap_opt_value.rs line 28 at r8 (raw file):

            }
            None => { /* do nothing, as map length is 0 */ }
        }

This snippet could be simplified to:

Suggestion:

        let len = self.0.len();
        for (idx, (key, value)) in self.0.iter().enumerate() {
            write!(f, "{}:{}", key, DisplayOption(value))?;
            if idx + 1 != len {
                write!(f, ",")?;
            }
        }

openraft/src/metrics/raft_metrics.rs line 119 at r8 (raw file):

            DisplayOption(&self.snapshot),
            DisplayOption(&self.purged),
            DisplayOption(&self.replication.as_ref().map(DisplayBtreeMapOptValueExt::display)),

Given that DisplayBTreeMapOptValue is already available, you can simplify the expression like this:

DisplayOption(&self.replication.as_ref().map(DisplayBTreeMapOptValue)),

SteveLauC · 2024-07-19T15:21:48Z

Given that DisplayBTreeMapOptValue is already available, you can simplify the expression like this

I wrote it in that way so that there won't be a allow unused attribute in display_btreemap_opt_value.rs, only one in display_ext.rs, to be consistent with other display_xxx.rs files, I agree it's kinda weird, can change to approach you suggested if you don't mind another unused attribute.

drmingdrmer · 2024-07-19T15:24:58Z

Given that DisplayBTreeMapOptValue is already available, you can simplify the expression like this

I wrote it in that way so that there won't be a allow unused attribute in display_btreemap_opt_value.rs, only one in display_ext.rs, to be consistent with other display_xxx.rs files, I agree it's kinda weird, can change to approach you suggested if you don't mind another unused attribute.

I do not quite mind about the unused import :)

schreter

Reviewed 3 of 3 files at r8, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @drmingdrmer and @SteveLauC)

openraft/src/display_ext.rs line 9 at r8 (raw file):

pub(crate) mod display_slice;

#[allow(unused_imports)]

I don't quite get it. Why are these imports unused? Are they just written for future use and not really used in the crate (yet)?

A possible path to handle such things might be a prelude module. Such as, using all these traits via use crate::display_ext::prelude::*; where all the use statements are pub-exports of the helpers. If the prelude itself is pub(crate) mod prelude;, it's still hidden for the rest of the world, but hopefully you won't get any warnings for unused imports.

openraft/src/display_ext/display_btreemap_opt_value.rs line 28 at r8 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

This snippet could be simplified to:

Or even further to:

        for (idx, (key, value)) in self.0.iter().enumerate() {
            if idx > 0 {
                write!(f, ",")?;
            }
            write!(f, "{}:{}", key, DisplayOption(value))?;
        }

SteveLauC · 2024-07-20T01:08:59Z

This snippet could be simplified to:

Done

I do not quite mind about the unused import :)

Done

I don't quite get it. Why are these imports unused? Are they just written for future use and not really used in the crate (yet)?

I am not sure about this as well, I just follow the convention:

define your wrapper type
define your XxxExt trait
re-export both of them in display_ext.rs

Previously, drmingdrmer (张炎泼) wrote…

Or even further to:

They are basically same, so I just use the one suggested by @drmingdrmer.

SteveLauC · 2024-07-20T01:28:59Z

When you guys this PR is in its last review round, ping me so that I will squash my commits:)

drmingdrmer

Reviewed 3 of 3 files at r9, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @SteveLauC)

SteveLauC · 2024-07-21T23:37:17Z

Gentle ping on @schreter, can I trouble you for another review?

drmingdrmer · 2024-07-22T02:19:12Z

Gentle ping on @schreter, can I trouble you for another review?

It's been good enough to me. Please squash the commits and let's merge it.

@schreter sometimes got delays for giving feedback. Do not worry. If any refinements are needed, he will provide feedback later soon, and you can update it in the next PR. This has become a routine working pattern:)

SteveLauC · 2024-07-22T02:32:28Z

Rust nightly 1.82 renamed std::panic::PanicInfo to std::panic::PanicHookInfo and mark the old name deprecated, this change is currently exclusive to nightly toolchain 1.82.0 and higher.

Looks like we have to use something like rustversion to handle it? 🤔 Or we can #[allow(deprecated)] it.

drmingdrmer · 2024-07-22T03:04:20Z

Rust nightly 1.82 renamed std::panic::PanicInfo to std::panic::PanicHookInfo and mark the old name deprecated, this change is currently exclusive to nightly toolchain 1.82.0 and higher.

Looks like we have to use something like rustversion to handle it? 🤔 Or we can #[allow(deprecated)] it.

It looks like a single #[allow(deprecated)] would be good enough.

SteveLauC · 2024-07-22T03:05:36Z

It looks like a single #[allow(deprecated)] would be good enough.

Let me do this in a separate PR.

drmingdrmer · 2024-07-22T03:16:00Z

It looks like a single #[allow(deprecated)] would be good enough.

Let me do this in a separate PR.

I'm gonna include this fixup in this:

Chore: mark StorageIOError as deprecated #1187

SteveLauC · 2024-07-22T03:18:12Z

I'm gonna include this fixup in this:

Feel free to close #1188 if you need to 😁

schreter

Gentle ping on @schreter, can I trouble you for another review?

I actually already did, but since I didn't add any additional comments you probably didn't get any mail. I just removed the blocker/marked the Display formatting as "done" in Reviewable.

The refactoring with Instant instead of some sort of duration at multiple places is a separate thing, I just added some random comments for that (no blockers).

Reviewed 3 of 3 files at r9, 4 of 4 files at r10, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions

SteveLauC · 2024-07-22T08:17:27Z

I actually already did, but since I didn't add any additional comments you probably didn't get any mail. I just removed the blocker/marked the Display formatting as "done" in Reviewable.

Ahh, sorry for missing that, looks like I really need to learn how to use Reviewable. 😶‍🌫️

SteveLauC changed the title ~~feat: store heartbeat metric in RaftMetrics and RaftDataMetrics~~ Feature: store heartbeat metric in RaftMetrics and RaftDataMetrics Jul 18, 2024

drmingdrmer self-requested a review July 18, 2024 09:33

drmingdrmer reviewed Jul 18, 2024

View reviewed changes

schreter reviewed Jul 18, 2024

View reviewed changes

drmingdrmer reviewed Jul 18, 2024

View reviewed changes

SteveLauC commented Jul 19, 2024

View reviewed changes

drmingdrmer requested a review from schreter July 19, 2024 07:42

drmingdrmer reviewed Jul 19, 2024

View reviewed changes

schreter reviewed Jul 19, 2024

View reviewed changes

drmingdrmer reviewed Jul 19, 2024

View reviewed changes

schreter reviewed Jul 19, 2024

View reviewed changes

SteveLauC marked this pull request as ready for review July 20, 2024 01:27

drmingdrmer requested a review from schreter July 20, 2024 02:42

drmingdrmer approved these changes Jul 20, 2024

View reviewed changes

Feature: store heartbeat metric in RaftMetrics and RaftDataMetrics

ca1fcfe

SteveLauC force-pushed the feat/heartbeat_metric branch from b3a23a8 to ca1fcfe Compare July 22, 2024 02:22

SteveLauC mentioned this pull request Jul 22, 2024

Test: allow deprecated std::panic::PanicInfo in example tests #1188

Merged

3 tasks

drmingdrmer merged commit d857a85 into databendlabs:main Jul 22, 2024
25 of 31 checks passed

SteveLauC deleted the feat/heartbeat_metric branch July 22, 2024 03:15

schreter reviewed Jul 22, 2024

View reviewed changes

SteveLauC mentioned this pull request Jul 23, 2024

Metrics: add last acked heartbeat timestamp for each follower/learner #1176

Closed

Feature: store heartbeat metric in RaftMetrics and RaftDataMetrics #1177

Feature: store heartbeat metric in RaftMetrics and RaftDataMetrics #1177

Conversation

SteveLauC commented Jul 18, 2024 • edited Loading

What does this PR do

How it is implemented

Unsolved problems/Questions

drmingdrmer commented Jul 18, 2024

drmingdrmer left a comment

Choose a reason for hiding this comment

SteveLauC commented Jul 18, 2024

schreter left a comment

Choose a reason for hiding this comment

drmingdrmer left a comment

Choose a reason for hiding this comment

SteveLauC commented Jul 18, 2024

SteveLauC commented Jul 19, 2024 • edited Loading

SteveLauC Jul 19, 2024

Choose a reason for hiding this comment

drmingdrmer Jul 19, 2024

Choose a reason for hiding this comment

SteveLauC Jul 19, 2024

Choose a reason for hiding this comment

drmingdrmer Jul 19, 2024

Choose a reason for hiding this comment

SteveLauC Jul 19, 2024

Choose a reason for hiding this comment

SteveLauC Jul 19, 2024 • edited Loading

Choose a reason for hiding this comment

drmingdrmer left a comment

Choose a reason for hiding this comment

SteveLauC commented Jul 19, 2024

SteveLauC commented Jul 19, 2024

schreter left a comment

Choose a reason for hiding this comment

SteveLauC commented Jul 19, 2024

drmingdrmer commented Jul 19, 2024

Explanation

SteveLauC commented Jul 19, 2024

schreter left a comment

Choose a reason for hiding this comment

schreter left a comment

Choose a reason for hiding this comment

SteveLauC commented Jul 19, 2024

drmingdrmer left a comment

Choose a reason for hiding this comment

SteveLauC commented Jul 19, 2024

drmingdrmer commented Jul 19, 2024

schreter left a comment

Choose a reason for hiding this comment

SteveLauC commented Jul 20, 2024

SteveLauC commented Jul 20, 2024

drmingdrmer left a comment

Choose a reason for hiding this comment

SteveLauC commented Jul 21, 2024

drmingdrmer commented Jul 22, 2024

SteveLauC commented Jul 22, 2024 • edited Loading

drmingdrmer commented Jul 22, 2024

SteveLauC commented Jul 22, 2024

drmingdrmer commented Jul 22, 2024

SteveLauC commented Jul 22, 2024

schreter left a comment

Choose a reason for hiding this comment

SteveLauC commented Jul 22, 2024

SteveLauC commented Jul 18, 2024 •

edited

Loading

SteveLauC commented Jul 19, 2024 •

edited

Loading

SteveLauC Jul 19, 2024 •

edited

Loading

SteveLauC commented Jul 22, 2024 •

edited

Loading