
Compact cluster status #2714

Merged
merged 1 commit into restatedev:main from pr2714 on Feb 18, 2025

Conversation

muhamadazmy
Contributor

@muhamadazmy muhamadazmy commented Feb 12, 2025

Compact cluster status

Summary:
Print out a compact cluster status by default

Extended view of the cluster status can still be accessed
via the --extra flag

There are also some minor fixes to the restatectl because of conflicting short flags across different commands.

@muhamadazmy
Contributor Author

muhamadazmy commented Feb 12, 2025

This is how it looks atm.

[screenshots]

@muhamadazmy muhamadazmy force-pushed the pr2714 branch 2 times, most recently from 9f03df3 to 3fe3dd3 on February 12, 2025 17:02
@muhamadazmy muhamadazmy marked this pull request as ready for review February 12, 2025 17:02
@AhmedSoliman
Contributor

Not a fan of the letter-per-role approach. Perhaps the proposal is too compact? I can't quickly understand most of the columns.

@muhamadazmy
Contributor Author

@AhmedSoliman I tried to use the full form, but to me it looks like a lot of data in one column. Maybe you are right about it being too compact, though. It's probably better to split PARTITIONS into LEADERS and FOLLOWERS, and NODESETS into NODESETS and SEQUENCERS.

@pcholakov
Contributor

This is definitely more readable; did we give up on trying to convey the sequencer-partition leader colocation status?

I would consider using singular `LEADER` or `FOLLOWER` headings. The way to read the table is something like "node N is a leader for X partitions". `NODESETS` sounds a bit awkward, but the best alternative I can offer is something longer like `NODESET-MEMBER` (in my head, that reads as "node N is a nodeset member of Y nodesets / serves Y log tail segments").

@muhamadazmy
Contributor Author

@pcholakov the sequencer/partition colocation status is only denoted by colour so far: green on the sequencer status means the colocation is optimal, yellow means it's not. But this might be confusing if colours are not enabled.
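The colour rule described above boils down to a simple comparison. A hypothetical sketch (names assumed, not taken from the PR):

```rust
// Hypothetical sketch of the colocation colour rule: the sequencer cell
// is rendered green when the sequencer runs on the same node as the
// partition leader (optimal colocation), yellow otherwise.
#[derive(Debug, PartialEq)]
enum CellColor {
    Green,
    Yellow,
}

fn sequencer_color(sequencer_node: &str, leader_node: &str) -> CellColor {
    if sequencer_node == leader_node {
        CellColor::Green
    } else {
        CellColor::Yellow
    }
}

fn main() {
    // N3:3 is both sequencer and leader -> optimal colocation.
    println!("{:?}", sequencer_color("N3:3", "N3:3"));
    println!("{:?}", sequencer_color("N3:3", "N1:10"));
}
```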

@pcholakov
Contributor

That's perfect, I think colour is okay as this is a relatively advanced concept.

@pcholakov
Contributor

pcholakov commented Feb 17, 2025

Some minor observations running this locally:

  • is it a bit odd that restatectl status reports uptime in the default view, but not in the expanded --extra view?
  • maybe in the default compact view, Node Configuration (vN) should just say Cluster overview or not even print any heading at all?
❯ rc st
Node Configuration (v21)
 NODE-ID  NAME   UPTIME  METADATA  LEADERS  FOLLOWERS  NODESETS  SEQUENCERS  ROLES
 N1:10    node1  0s      Member    0        2          2         0           admin | log-server | metadata-server | worker
 N2:3     node2  0s      Member    0        2          2         0           admin | log-server | metadata-server | worker
 N3:3     node3  0s      Member    2        0          2         2           admin | log-server | metadata-server | worker
❯ rc st --extra
Node Configuration (v21)
 NODE  GEN  NAME   ADDRESS                               ROLES
 N1    10   node1  http://node1.cluster.orb.local:5122/  admin | log-server | metadata-server | worker
 N2    3    node2  http://node2.cluster.orb.local:5122/  admin | log-server | metadata-server | worker
 N3    3    node3  http://node3.cluster.orb.local:5122/  admin | log-server | metadata-server | worker

Logs v6
└ Logs Provider: replicated
 ├ Log replication: {node: 2}
 └ Nodeset size: 0
 L-ID  FROM-LSN  KIND        LOGLET-ID  REPLICATION  SEQUENCER  NODESET
 0     3         Replicated  0_2        {node: 2}    N3:3       [N1, N2, N3]
 1     4         Replicated  1_3        {node: 2}    N3:3       [N1, N2, N3]

Alive partition processors (nodes config v21, partition table v29)
 P-ID  NODE   MODE      STATUS  LEADER  EPOCH  SEQUENCER  APPLIED-LSN  PERSISTED-LSN  SKIPPED-RECORDS  ARCHIVED-LSN  LAST-UPDATE
 0     N1:10  Follower  Active  N3:3    e3                3            -              0                -             1 second and 272 ms ago
 0     N2:3   Follower  Active  N3:3    e3                3            -              0                -             1 second and 181 ms ago
 0     N3:3   Leader    Active  N3:3    e3     N3:3       3            -              0                -             1 second and 570 ms ago
 1     N1:10  Follower  Active  N3:3    e4                4            -              0                -             1 second and 220 ms ago
 1     N2:3   Follower  Active  N3:3    e4                4            -              0                -             1 second and 378 ms ago
 1     N3:3   Leader    Active  N3:3    e4     N3:3       4            -              0                -             1 second and 30 ms ago

Metadata service
 NODE  STATUS  VERSION  LEADER  MEMBERS     APPLIED  COMMITTED  TERM  LOG-LENGTH  SNAP-INDEX  SNAP-SIZE
 N1    Member  v3       N2      [N1,N2,N3]  70       70         5     59          11          1.4 kiB
 N2    Member  v3       N2      [N1,N2,N3]  70       70         5     59          11          1.4 kiB
 N3    Member  v3       N2      [N1,N2,N3]  70       70         5     59          11          1.4 kiB
  • uptime reports 0s on mine because this was run against 1.2.0, which doesn't return age!

PS. I rebased it on my #2748 - there's a trivial Cargo conflict, but otherwise the two work well together! (my rebase is here: main...pcholakov:restate:pr2714)

@muhamadazmy
Contributor Author

@pcholakov I added the uptime to the cluster status response (check the PR), so unless you restart your nodes with that code, the value won't be available. I did that to avoid calling get-ident on each node, and to instead have everything in a single gRPC call.

I know that the uptime is not available in the extended view. My plan is to first get the compact view correct before enriching the extended view with more data.

Contributor

@pcholakov pcholakov left a comment

Great improvement, thank you @muhamadazmy! Approving to unblock merging, let's get this in 🚀 The only comment I'd love to see somehow addressed now or later is to render zero uptime as "n/a" as that's probably just an older server binary.

@@ -39,11 +39,13 @@ message SuspectNode {
 }

 message AliveNode {
-    restate.common.NodeId generational_node_id = 1;
+    restate.common.GenerationalNodeId generational_node_id = 1;
Contributor

Nice tightening of the contract! <3

table.add_row(row);
}

c_println!("Node Configuration ({})", nodes_config.version());
Contributor

WDYT about calling this "Cluster status" with no version? (since we technically have multiple metadata values blended into the same table, it doesn't make sense to render just one version.)


match node_state {
State::Alive(alive) => {
// test
Contributor

superfluous?

Suggested change
// test

),
Color::Green,
)
.with_uptime(Duration::from_secs(alive.age_s));
Contributor
The reason will be displayed to describe this comment to others. Learn more.

wdyt about rendering this as "n/a" if zero, which is more correct if we connect to an older version server that doesn't return uptime?

Contributor Author

Sounds good!
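The agreed rendering rule could be sketched as follows (function name assumed, not from the PR):

```rust
use std::time::Duration;

// Sketch: render an uptime of zero as "n/a", since a zero value most
// likely means the server is an older binary that doesn't report uptime.
fn render_uptime(uptime: Duration) -> String {
    if uptime.is_zero() {
        "n/a".to_string()
    } else {
        format!("{}s", uptime.as_secs())
    }
}

fn main() {
    println!("{}", render_uptime(Duration::from_secs(0))); // n/a
    println!("{}", render_uptime(Duration::from_secs(42))); // 42s
}
```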

Contributor Author

This is how it looks now when running nodes that do not recognize the uptime flag.

[screenshot]

google.protobuf.Timestamp last_heartbeat_at = 2;
// partition id is u16 but protobuf doesn't support u16. This must be a value
// that's safe to convert to u16
map<uint32, PartitionProcessorStatus> partitions = 3;
// age of node since the daemon started in seconds
uint64 age_s = 4;
Contributor

thoughts on naming this uptime_s rather? I think it lines up better with the way we render it, and it's less ambiguous (without the comment, could be misinterpreted to mean time since node was first created rather than server process started).

Contributor Author

I tried to use the same naming convention as the node service here: https://github.com/restatedev/restate/blob/main/crates/core/protobuf/node_ctl_svc.proto#L62, which uses age instead of uptime.

Contributor

I realized, but I also feel that uptime is just a touch more idiomatic, and this interface is somewhat more likely to be consumed by external developers than by the core Restate runtime team. No strong feelings on this, just felt the need to call it out :-)

Contributor Author

Totally agree. I will change it then.
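With the rename agreed, the field from the diff earlier in the thread would presumably end up looking something like this (a sketch, not the merged code):

```proto
message AliveNode {
  restate.common.GenerationalNodeId generational_node_id = 1;
  google.protobuf.Timestamp last_heartbeat_at = 2;
  // partition id is u16 but protobuf doesn't support u16. This must be a
  // value that's safe to convert to u16
  map<uint32, PartitionProcessorStatus> partitions = 3;
  // seconds since the server process started
  uint64 uptime_s = 4;
}
```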

}

// helper macro to unwrap cells
macro_rules! unwrap {
Contributor

Nice! I really need to play with macros more :-) thank you for simplifying this and giving me a great example to learn from!
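For readers following along, a toy version of a cell-unwrapping macro might look like this (a hypothetical sketch, not the macro from the PR):

```rust
// Toy cell-unwrapping macro: render Some(value) as a string and None as
// a "-" placeholder, keeping table-building code terse.
macro_rules! unwrap_cell {
    ($opt:expr) => {
        match $opt {
            Some(v) => v.to_string(),
            None => "-".to_string(),
        }
    };
}

fn main() {
    // e.g. an APPLIED-LSN that is known vs. a PERSISTED-LSN that isn't.
    let applied_lsn: Option<u64> = Some(3);
    let persisted_lsn: Option<u64> = None;
    println!("{} {}", unwrap_cell!(applied_lsn), unwrap_cell!(persisted_lsn));
}
```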

@muhamadazmy muhamadazmy merged commit 186150c into restatedev:main Feb 18, 2025
56 checks passed
@muhamadazmy muhamadazmy deleted the pr2714 branch February 18, 2025 08:44