-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compact cluster status #2714
Compact cluster status #2714
Conversation
9f03df3
to
3fe3dd3
Compare
Not a fan of the letter-per-role approach. Perhaps the proposal is too compact? I can't quickly understand most of the columns. |
@AhmedSoliman I tried to use the full form but to me it looks like a lot of data in one column. But maybe you are right about being too compact. Probably better to split the PARTITIONS into |
0555aaa
to
78508f6
Compare
f3458e2
to
58df8bf
Compare
This is definitely more readable; did we give up on trying to convey the sequencer-partition leader colocation status? I would consider using singular 'LEADER |
@pcholakov the sequence/partition colocation status is only denoted by the colour so far. The green on the sequencer status means it's optimal colocation, if yellow then it's not. But this might be confusing if colours are not enabled |
That's perfect, I think colour is okay as this is a relatively advanced concept. |
Some minor observations running this locally:
PS. I rebased in on my #2748 - there's a trivial Cargo conflict but otherwise the two work well together! (my rebase is here main...pcholakov:restate:pr2714) |
@pcholakov I added the uptime to the cluster status response (check the PR). So unless you restart your nodes with that code, the value won't be available. I know that the uptime is not available in the extended view. My plan to first get the compact view correct before enriching the extended view with more data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great improvement, thank you @muhamadazmy! Approving to unblock merging, let's get this in 🚀 The only comment I'd love to see somehow addressed now or later is to render zero uptime as "n/a" as that's probably just an older server binary.
@@ -39,11 +39,13 @@ message SuspectNode { | |||
} | |||
|
|||
message AliveNode { | |||
restate.common.NodeId generational_node_id = 1; | |||
restate.common.GenerationalNodeId generational_node_id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tightening of the contract! <3
table.add_row(row); | ||
} | ||
|
||
c_println!("Node Configuration ({})", nodes_config.version()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about calling this "Cluster status" with no version? (since we technically have multiple metadata values blended into the same table, it doesn't make sense to render just one version.)
|
||
match node_state { | ||
State::Alive(alive) => { | ||
// test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
superfluous?
// test |
), | ||
Color::Green, | ||
) | ||
.with_uptime(Duration::from_secs(alive.age_s)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdyt about rendering this as "n/a" if zero, which is more correct if we connect to an older version server that doesn't return uptime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
google.protobuf.Timestamp last_heartbeat_at = 2; | ||
// partition id is u16 but protobuf doesn't support u16. This must be a value | ||
// that's safe to convert to u16 | ||
map<uint32, PartitionProcessorStatus> partitions = 3; | ||
// age of node since the daemon started in seconds | ||
uint64 age_s = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thoughts on naming this uptime_s
rather? I think it lines up better with the way we render it, and it's less ambiguous (without the comment, could be misinterpreted to mean time since node was first created rather than server process started).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to use the same naming convention used in the node service here https://github.com/restatedev/restate/blob/main/crates/core/protobuf/node_ctl_svc.proto#L62
Which uses age
instead of uptime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized but I also feel that uptime is just a touch more idiomatic, and this interface is somewhat more likely to be externally consumed by more developers than the core Restate runtime time. No strong feelings on this, just felt the need to call this out :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree. I will change it then.
} | ||
|
||
// helper macro to unwrap cells | ||
macro_rules! unwrap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I really need to play with macros more :-) thank you for simplifying this and giving me a great example to learn from!
Summary: Print out a compact cluster status by default Extended view of the cluster status can still be accessed via the `--extra` flag > There are also some minor fixes to the restatectl because of conflicting short flags across different commands.
Compact cluster status
Summary:
Print out a compact cluster status by default
Extended view of the cluster status can still be accessed
via the
--extra
flag