[cluster-bus] Send a MEET packet to a node if there is no inbound link #1307
base: unstable
In some cases, when meeting a new node, if the handshake times out, we can end up with an inconsistent view of the cluster: the new node knows about all the nodes in the cluster, but the cluster does not know about the new node (or vice versa). To detect this inconsistency, we now check whether a node has an outbound link but no inbound link, which probably means that the node does not know us. In that case we (re-)send a MEET packet to the node to start a new handshake with it.
Signed-off-by: Pierre Turin <[email protected]>
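The detection rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `sketch_link`, `sketch_node`, `in_normal_state`, and `sketch_needs_meet` are hypothetical names, not the actual definitions in src/cluster_legacy.c, and the real condition in the patch contains additional state checks that are collapsed into a single boolean here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the cluster bus structures. */
typedef struct sketch_link {
    int fd;
} sketch_link;

typedef struct sketch_node {
    sketch_link *link;         /* outbound link: we connected to the peer */
    sketch_link *inbound_link; /* inbound link: the peer connected to us */
    bool in_normal_state;      /* assumed summary of the node state checks */
} sketch_node;

/* Returns true when we can reach the peer over our outbound link but the
 * peer never connected back, which suggests the peer does not know us. */
static bool sketch_needs_meet(const sketch_node *node) {
    assert(node != NULL);
    if (!node->in_normal_state) return false; /* handshake still in progress, etc. */
    return node->link != NULL && node->inbound_link == NULL;
}
```

A periodic task could call `sketch_needs_meet()` for every known node and queue a MEET packet for the ones that match.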
1920952 to d65423e
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1307      +/-   ##
============================================
- Coverage     70.69%   70.67%   -0.02%
============================================
  Files           115      115
  Lines         63153    63163     +10
============================================
  Hits          44643    44643
- Misses        18510    18520     +10
What if we disconnect the outbound link when the inbound link is not available? I think it would lead to the same reconnection flow. Would that give us simpler code and one unified flow? I'm not sure whether it would perform the MEET operation, though.
}

/* If this is a MEET packet from an unknown node, we still process
 * the gossip section here since we have to trust the sender because
 * of the message type. */
if (!sender && type == CLUSTERMSG_TYPE_MEET) clusterProcessGossipSection(hdr, link);
/* If this is a MEET packet from an unknown node, we still process
 * the gossip section here since we have to trust the sender because
 * of the message type. */
clusterProcessGossipSection(hdr, link);
}
We don't need this change.
True, but this double if with the same condition was driving me crazy.
I don't mind this. But in general we avoid changes to the lines of code not related to the PR.
src/cluster_legacy.c
Outdated
    clusterDelNode(node);
    return 1;
}
if (node->link != NULL && node->inbound_link == NULL &&
    !nodeInHandshake(node) && !nodeIsMeeting(node) && !nodeTimedOut(node) &&
Should we create a macro for this node state check? Not readable at this point.
nodeInNormalState()?
nodeInHealthyState()?
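A macro along the lines suggested above might look like the sketch below. The `SKETCH_*` flag names, their bit values, and the `sketch_flag_node` type are hypothetical placeholders chosen for illustration, and the set of flags that counts as "not normal" is an assumption based only on the three checks visible in the snippet under review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real flag names and values may differ. */
#define SKETCH_NODE_PFAIL     (1 << 0) /* node timed out */
#define SKETCH_NODE_HANDSHAKE (1 << 1) /* handshake in progress */
#define SKETCH_NODE_MEET      (1 << 2) /* MEET packet still to be sent */

typedef struct sketch_flag_node {
    int flags;
} sketch_flag_node;

/* One readable check instead of three negated ones in the caller. */
#define sketchNodeInNormalState(n) \
    (!((n)->flags & (SKETCH_NODE_HANDSHAKE | SKETCH_NODE_MEET | SKETCH_NODE_PFAIL)))

/* Function wrapper, convenient for callers that want a NULL check. */
static int sketch_is_normal(const sketch_flag_node *n) {
    assert(n != NULL);
    return sketchNodeInNormalState(n);
}
```

Grouping the flags into a single mask keeps the call site short and makes it easy to add another state to the definition later.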
[llength [R 0 CLUSTER NODES]] == 26 &&
[llength [R 1 CLUSTER NODES]] == 26 &&
[llength [R 2 CLUSTER NODES]] == 26
Could we match a specific value in the string output instead? I don't like this magic-number comparison, which could change in the near future.
In this case we would just re-open an outbound connection, which the other node will accept, but it won't force the other node to recognize us as being part of the cluster if it doesn't trust us yet. The only way to force the other node to add us to its cluster view is for us to send a MEET packet.
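The difference between the two packet types on the receiving side can be illustrated with a toy model. Everything here is a hypothetical sketch: `sketch_view`, `sketch_msg_type`, and `sketch_receive` are made-up names, nodes are reduced to integer IDs, and the real receiver does considerably more work than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_MSG_PING, SKETCH_MSG_MEET } sketch_msg_type;

#define SKETCH_MAX_NODES 8

/* Hypothetical cluster view: just the IDs of the nodes we know. */
typedef struct sketch_view {
    int known_ids[SKETCH_MAX_NODES];
    size_t count;
} sketch_view;

static bool sketch_knows(const sketch_view *v, int id) {
    for (size_t i = 0; i < v->count; i++) {
        if (v->known_ids[i] == id) return true;
    }
    return false;
}

/* Processes one packet; returns true if the sender is known afterwards. */
static bool sketch_receive(sketch_view *v, int sender_id, sketch_msg_type type) {
    assert(v != NULL);
    if (sketch_knows(v, sender_id)) return true;
    /* Unknown sender: only a MEET is trusted enough to extend the view. */
    if (type != SKETCH_MSG_MEET) return false;
    if (v->count == SKETCH_MAX_NODES) return false;
    v->known_ids[v->count++] = sender_id;
    return true;
}
```

In this model, reconnecting and sending a PING from an unknown node leaves the receiver's view unchanged, while a MEET adds the sender.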
Update test to check node IDs instead of relying on number of words. Rename nodeIsMeeting() to nodeInMeetState(). Introduce nodeInNormalState() macro. Signed-off-by: Pierre Turin <[email protected]>
CLUSTER MEET is an admin operation, but I guess we are fine with re-initiating it and retrying if the operation wasn't successful in the first place.
proc cluster_3_nodes_all_know_each_other {} {
    set node0_id [dict get [get_myself 0] id]
    set node1_id [dict get [get_myself 1] id]
    set node2_id [dict get [get_myself 2] id]

    if {
        [cluster_get_node_by_id 0 $node0_id] != {} &&
        [cluster_get_node_by_id 0 $node1_id] != {} &&
        [cluster_get_node_by_id 0 $node2_id] != {} &&
        [cluster_get_node_by_id 1 $node0_id] != {} &&
        [cluster_get_node_by_id 1 $node1_id] != {} &&
        [cluster_get_node_by_id 1 $node2_id] != {} &&
        [cluster_get_node_by_id 2 $node0_id] != {} &&
        [cluster_get_node_by_id 2 $node1_id] != {} &&
        [cluster_get_node_by_id 2 $node2_id] != {} &&
        [llength [R 0 CLUSTER LINKS]] == 4 &&
        [llength [R 1 CLUSTER LINKS]] == 4 &&
        [llength [R 2 CLUSTER LINKS]] == 4
    } {
        return 1
    } else {
        return 0
    }
}
From ChatGPT:
proc cluster_nodes_all_know_each_other {num_nodes} {
    # Collect node IDs dynamically
    set node_ids {}
    for {set i 0} {$i < $num_nodes} {incr i} {
        lappend node_ids [dict get [get_myself $i] id]
    }
    # Check if all nodes know each other
    foreach node_id $node_ids {
        foreach check_node_id $node_ids {
            for {set node_index 0} {$node_index < $num_nodes} {incr node_index} {
                if {[cluster_get_node_by_id $node_index $check_node_id] == {}} {
                    return 0
                }
            }
        }
    }
    # Verify cluster link counts for each node
    set expected_links [expr {2 * ($num_nodes - 1)}]
    for {set i 0} {$i < $num_nodes} {incr i} {
        if {[llength [R $i CLUSTER LINKS]] != $expected_links} {
            return 0
        }
    }
    return 1
}
Can we do something like #461? Only clear the CLUSTER_NODE_MEET flag when myself receives an "ack" (not the plain PONG, but a strong ack confirming that the sender has already met myself). I haven't thought about it carefully, but I feel it would be more reliable.
Do you mean adding a new flag? I think the concern is that we could still end up in the inverse state, where the node that received the "strong" ack puts the other node online, but that node might then go offline. My original thought was that as long as one node believes the other is part of the cluster, it should try to have the other node join. It's sort of like an "enhanced" version of how we build up the mesh when two disjoint clusters meet each other.
We could do a 3-way handshake to strengthen the handshake's reliability, instead of the current -> MEET/PONG, <- PING/PONG exchange. We could add an extra back-and-forth between the two nodes before clearing the flags. But we can always end up in a situation where one node thinks the handshake is done while the other node times out the handshake because the last packet was delayed or dropped. With this solution, if the handshake has succeeded on one side and not the other, we ensure both sides will eventually know each other.
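The "eventually know each other" argument can be pictured with a deliberately tiny two-node model. This is a sketch under strong assumptions, not the real protocol: `sketch_pair`, `sketch_tick_from_a`, and `sketch_ticks_until_consistent` are hypothetical names, each node's view is reduced to one boolean, and packet loss is supplied by the caller as a delivery pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-node model: each side only records whether it knows the peer. */
typedef struct sketch_pair {
    bool a_knows_b;
    bool b_knows_a;
} sketch_pair;

/* One periodic check on node A. If A knows B but B does not know A,
 * A re-sends MEET; `delivered` says whether that packet reached B. */
static void sketch_tick_from_a(sketch_pair *p, bool delivered) {
    assert(p != NULL);
    bool asymmetric = p->a_knows_b && !p->b_knows_a;
    if (asymmetric && delivered) p->b_knows_a = true;
}

/* Returns the number of ticks needed for both views to agree,
 * or -1 if they still disagree after n ticks. */
static int sketch_ticks_until_consistent(sketch_pair *p, const bool *delivered, int n) {
    assert(p != NULL && delivered != NULL);
    for (int i = 0; i < n; i++) {
        if (p->a_knows_b && p->b_knows_a) return i;
        sketch_tick_from_a(p, delivered[i]);
    }
    return (p->a_knows_b && p->b_knows_a) ? n : -1;
}
```

Because the check repeats on every tick, a lost MEET only delays convergence in this model; the views agree as soon as one retry gets through.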
With this fix, we can sometimes get into a (potentially) infinite loop where a node keeps sending a MEET packet to the other node even though both nodes know each other. This sometimes (although rarely) happens in the second test. The following sequence of events can trigger this issue:
Node 0 should eventually send a PING packet to node 1, but there is no guarantee as to when that can happen. When I reproduce the issue, node 0 never gets a chance to send a PING to node 1 because node 0 overrides Line 5045 in 33f42d7
I think there are various ways to mitigate this issue:
I think option 3 should work best without making too many changes to the current logic. But I'm open to suggestions.
This PR fixes the bug described in #1251.