[cluster-bus] Send a MEET packet to a node if there is no inbound link #1307

Open. Wants to merge 3 commits into the unstable base branch.

pieturin
Contributor

In some cases, when meeting a new node, if the handshake times out, we can end up with an inconsistent view of the cluster: the new node knows about all the nodes in the cluster, but the cluster does not know about the new node (or vice versa).
To detect this inconsistency, we now check whether a node has an outbound link but no inbound link. In that case, the node probably does not know us, so we (re-)send a MEET packet to it to start a new handshake.

This fixes the bug described in #1251.
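A minimal sketch of the detection rule described above, assuming a simplified node struct. The field names link and inbound_link mirror the ones in the reviewed snippet; the flag bits, struct layouts, and the helper should_send_meet are hypothetical and not the actual cluster_legacy.c code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only; the real values differ. */
#define NODE_HANDSHAKE (1 << 0) /* handshake with this node still in progress */
#define NODE_MEET      (1 << 1) /* we still have to send this node a MEET */
#define NODE_PFAIL     (1 << 2) /* node is suspected to be failing */

typedef struct conn {
    int fd; /* placeholder for a real connection handle */
} conn_t;

typedef struct node {
    int flags;
    conn_t *link;         /* outbound connection we opened to the node */
    conn_t *inbound_link; /* inbound connection the node opened to us */
} node_t;

/* True when we have an outbound link to a node in a settled state but no
 * inbound link from it: the node has probably never added us to its view,
 * so we should (re-)send a MEET to start a new handshake. */
static bool should_send_meet(const node_t *n) {
    bool settled = (n->flags & (NODE_HANDSHAKE | NODE_MEET | NODE_PFAIL)) == 0;
    return n->link != NULL && n->inbound_link == NULL && settled;
}
```

Nodes that are still in a handshake or MEET state are skipped, so the rule only fires once the regular handshake has had its chance.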

Signed-off-by: Pierre Turin <[email protected]>

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.67%. Comparing base (32f7541) to head (6c67d41).

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1307      +/-   ##
============================================
- Coverage     70.69%   70.67%   -0.02%     
============================================
  Files           115      115              
  Lines         63153    63163      +10     
============================================
  Hits          44643    44643              
- Misses        18510    18520      +10     
Files with missing lines    Coverage Δ
src/cluster_legacy.c        86.20% <100.00%> (+0.01%) ⬆️

... and 13 files with indirect coverage changes

Contributor

@hpatro hpatro left a comment

What if we disconnect the outbound link if the inbound link is not available? I think it would lead to the same reconnection flow. Would that allow simpler code and one unified flow? I'm not sure whether it would perform the MEET operation, though.

Comment on lines -3227 to +3241
-}
-
-/* If this is a MEET packet from an unknown node, we still process
-* the gossip section here since we have to trust the sender because
-* of the message type. */
-if (!sender && type == CLUSTERMSG_TYPE_MEET) clusterProcessGossipSection(hdr, link);
+/* If this is a MEET packet from an unknown node, we still process
+* the gossip section here since we have to trust the sender because
+* of the message type. */
+clusterProcessGossipSection(hdr, link);
+}
Contributor

We don't need this change.

Contributor Author

True, but this double if with the same condition was driving me crazy.

Contributor

I don't mind this, but in general we avoid changing lines of code unrelated to the PR.

src/cluster_legacy.h (outdated, resolved)
clusterDelNode(node);
return 1;
}
if (node->link != NULL && node->inbound_link == NULL &&
!nodeInHandshake(node) && !nodeIsMeeting(node) && !nodeTimedOut(node) &&
Contributor

Should we create a macro for this node state check? Not readable at this point.

Contributor Author

nodeInNormalState()?

Contributor

nodeInHealthyState()?
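A possible shape for the macro under discussion, as a sketch only. The clauses are copied from the snippet above (which is truncated after the last &&, so the real macro may check more), using the nodeInMeetState() name from the later commit; the flag bits and struct are hypothetical.

```c
/* Hypothetical flag bits for this sketch only. */
#define NODE_HANDSHAKE (1 << 0)
#define NODE_MEET      (1 << 1)
#define NODE_PFAIL     (1 << 2)

typedef struct node {
    int flags;
} node_t;

/* Per-flag helpers named after the ones in the snippet above. */
#define nodeInHandshake(n) ((n)->flags & NODE_HANDSHAKE)
#define nodeInMeetState(n) ((n)->flags & NODE_MEET)
#define nodeTimedOut(n)    ((n)->flags & NODE_PFAIL)

/* One readable name for "no transient state is set", so the long condition
 * becomes: node->link && !node->inbound_link && nodeInNormalState(node). */
#define nodeInNormalState(n) (!nodeInHandshake(n) && !nodeInMeetState(n) && !nodeTimedOut(n))
```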

src/cluster_legacy.c (outdated, resolved)
tests/unit/cluster/cluster-reliable-meet.tcl (resolved)
Comment on lines 231 to 233
[llength [R 0 CLUSTER NODES]] == 26 &&
[llength [R 1 CLUSTER NODES]] == 26 &&
[llength [R 2 CLUSTER NODES]] == 26
Contributor

Could we match specific values in the string output instead? I don't like this magic-number comparison, which could change in the near future.

tests/unit/cluster/cluster-reliable-meet.tcl (resolved)
@pieturin
Contributor Author

pieturin commented Nov 14, 2024

Quoting hpatro: "What if we disconnect the outbound link if inbound link is not available?"

In this case we would just re-open an outbound connection, which the other node will accept, but it won't force the other node to recognize us as being part of the cluster if it doesn't trust us yet. The only way to force the other node to add us to its cluster view is for us to send a MEET packet.
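A toy model of the receiving side, assuming the trust rule stated in this reply: a node only adds an unknown sender to its view when the packet is a MEET. The enum, struct, and receive_packet are hypothetical names for illustration.

```c
#include <stdbool.h>

typedef enum { MSG_PING, MSG_PONG, MSG_MEET } msg_type_t;

typedef struct {
    bool knows_sender; /* is the sender already in our cluster view? */
} receiver_t;

/* Handle one packet arriving on a freshly accepted inbound connection.
 * Reconnecting and sending PING changes nothing for an unknown sender;
 * only MEET makes the receiver add the sender to its view. */
static void receive_packet(receiver_t *r, msg_type_t type) {
    if (!r->knows_sender && type == MSG_MEET) {
        r->knows_sender = true;
    }
}
```

Dropping and re-opening the outbound link only replays the PING path, which leaves knows_sender unchanged; the MEET path is the one that changes the peer's view.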

Update test to check node IDs instead of relying on number of words.
Rename nodeIsMeeting() to nodeInMeetState().
Introduce nodeInNormalState() macro.

Signed-off-by: Pierre Turin <[email protected]>
@hpatro
Contributor

hpatro commented Nov 14, 2024

Quoting pieturin: "In this case we would just re-open an outbound connection, which the other node will accept, but it won't force the other node to recognize us as being part of the cluster if it doesn't trust us yet. The only way to force the other node to add us to its cluster view is for us to send a MEET packet."

CLUSTER MEET is an admin operation, but I guess we are fine with re-initiating and retrying it if it wasn't successful in the first place.

Comment on lines +91 to +114
proc cluster_3_nodes_all_know_each_other {} {
set node0_id [dict get [get_myself 0] id]
set node1_id [dict get [get_myself 1] id]
set node2_id [dict get [get_myself 2] id]

if {
[cluster_get_node_by_id 0 $node0_id] != {} &&
[cluster_get_node_by_id 0 $node1_id] != {} &&
[cluster_get_node_by_id 0 $node2_id] != {} &&
[cluster_get_node_by_id 1 $node0_id] != {} &&
[cluster_get_node_by_id 1 $node1_id] != {} &&
[cluster_get_node_by_id 1 $node2_id] != {} &&
[cluster_get_node_by_id 2 $node0_id] != {} &&
[cluster_get_node_by_id 2 $node1_id] != {} &&
[cluster_get_node_by_id 2 $node2_id] != {} &&
[llength [R 0 CLUSTER LINKS]] == 4 &&
[llength [R 1 CLUSTER LINKS]] == 4 &&
[llength [R 2 CLUSTER LINKS]] == 4
} {
return 1
} else {
return 0
}
}
Contributor

From ChatGPT:

Suggested change
proc cluster_nodes_all_know_each_other {num_nodes} {
    # Collect node IDs dynamically
    set node_ids {}
    for {set i 0} {$i < $num_nodes} {incr i} {
        lappend node_ids [dict get [get_myself $i] id]
    }
    # Check that every node knows every node ID
    for {set node_index 0} {$node_index < $num_nodes} {incr node_index} {
        foreach check_node_id $node_ids {
            if {[cluster_get_node_by_id $node_index $check_node_id] == {}} {
                return 0
            }
        }
    }
    # Verify cluster link counts for each node (one inbound and one outbound link per peer)
    set expected_links [expr {2 * ($num_nodes - 1)}]
    for {set i 0} {$i < $num_nodes} {incr i} {
        if {[llength [R $i CLUSTER LINKS]] != $expected_links} {
            return 0
        }
    }
    return 1
}
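The same invariant, written as a small C sketch over a hypothetical knows matrix and per-node link counts (neither exists in the test suite): every node must know every node ID, and in a full mesh each node holds one inbound and one outbound link per peer, hence 2 * (n - 1) links.

```c
#include <stdbool.h>
#include <stddef.h>

/* knows is an n x n row-major matrix: knows[i * n + j] is true when node i
 * has node j in its cluster view. links[i] is the number of entries node i
 * reports in CLUSTER LINKS. */
static bool all_nodes_know_each_other(size_t n, const bool *knows, const size_t *links) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            if (!knows[i * n + j]) return false;
        }
        /* Full mesh: one inbound and one outbound link per peer. */
        if (links[i] != 2 * (n - 1)) return false;
    }
    return true;
}
```

For three nodes this gives the 4 links per node that the original proc hard-coded.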

Member

@enjoy-binbin enjoy-binbin left a comment

Can we do something like #461? Only clear the CLUSTER_NODE_MEET flag when myself receives an "ack" (not the plain PONG, but a stronger ack confirming that the sender has already met myself). I haven't thought about it carefully, but I feel it would be more reliable.

@madolson
Member

Quoting enjoy-binbin: "only clear the CLUSTER_NODE_MEET flag when myself receive a 'ack' (not the plain PONG but something with a strong ack, ack that sender has already meet myself?"

Do you mean adding a new flag? I think the concern is that we could still end up in the inverse state, where the node that received the "strong" ack will put the other node online, but then it might go offline.

My original thought was that as long as one node believes the other is part of the cluster, it should try to get the other node to join. It's sort of like an "enhanced" version of how we build up the mesh when two disjoint clusters meet each other.

@pieturin
Contributor Author

Quoting enjoy-binbin: "can we do something like #461? only clear the CLUSTER_NODE_MEET flag when myself receive a 'ack' (not the plain PONG but something with a strong ack, ack that sender has already meet myself?) I haven't thought about it carefully, but i feel it is more reliable?"

We could do a 3-way handshake to strengthen the handshake's reliability, instead of the current -> MEET/PONG, <- PING/PONG exchange, by adding an extra back-and-forth between the two nodes before clearing the flags. But we can always end up in a situation where one node thinks the handshake is done while the other node times out the handshake because the last packet got delayed or dropped.

With this solution, if the handshake has succeeded on one side and not the other, we ensure both sides will eventually know each other.
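The argument that extra round trips cannot rule out a one-sided handshake can be illustrated with a deliberately simplified model (not the cluster-bus state machine): two sides alternate k messages, one message may be lost, and each side is "done" once it has sent or received every message. run_handshake and outcome_t are hypothetical names.

```c
#include <stdbool.h>

typedef struct {
    bool a_done; /* A considers the handshake complete */
    bool b_done; /* B considers the handshake complete */
} outcome_t;

/* Toy k-message handshake where A sends the odd-numbered messages and B the
 * even-numbered ones. Message `drop` (1-based; 0 means none) is lost and the
 * exchange stops there. A side is done once it has sent or received every
 * message of the exchange. */
static outcome_t run_handshake(int k, int drop) {
    int a_seen = 0, b_seen = 0;
    for (int i = 1; i <= k; i++) {
        bool a_sends = (i % 2 == 1);
        if (a_sends) a_seen++; else b_seen++; /* the sender knows what it sent */
        if (i == drop) break;                 /* lost: the receiver never sees it */
        if (a_sends) b_seen++; else a_seen++; /* delivered to the other side */
    }
    outcome_t o = {a_seen == k, b_seen == k};
    return o;
}
```

Whatever k is, losing the last message leaves exactly one side done, which is why repairing the asymmetric state afterwards is more robust than lengthening the handshake.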

@pieturin
Contributor Author

With this fix, we can sometimes get into a (potentially) infinite loop where a node keeps sending MEET packets to another node even though both nodes know each other. This happens occasionally (although rarely) in the second test, "Handshake eventually succeeds after node handshake timeout on one side with inconsistent view of the cluster".

The following sequence of events can trigger this issue:

  1. Nodes 1 & 2 know each other, but don't know node 0.
  2. We make node 0 meet node 1. The handshake times out on node 1's side but succeeds on node 0's side.
  3. When node 1 marks the handshake as timed out, it closes both connections with node 0.
  4. From node 0's perspective, both connections to node 1 are closed. But since it knows node 1, it re-opens an outbound connection to it.
  5. Node 1 accepts the inbound connection from node 0, but since it doesn't know node 0, it doesn't attach this connection to any known node (i.e., link->node stays NULL).
  6. With the change from this PR, node 0 detects that node 2 doesn't have any inbound connection to it, so node 0 sends a MEET packet to node 2. (For this bug to happen, node 2 must be met first.)
  7. The handshake between nodes 0 and 2 succeeds.
  8. Node 2 now gossips about node 0 to node 1, so node 1 adds node 0 to its list of known nodes.
  9. Node 1 opens a connection to node 0. At this point node 0 has both inbound and outbound connections to node 1. But from node 1's perspective, it only has an outbound connection to node 0: the inbound connection is not attached to any node (link->node is still NULL).
  10. So node 1 sends a MEET packet to node 0, since it doesn't think it has an inbound connection from it.
  11. The handshake completes successfully since node 0 responds to the MEET packet, but node 1 still has no inbound connection attached to node 0.
  12. So node 1 keeps sending MEET packets to node 0 until node 0 sends a PING packet to node 1. When node 1 receives a PING packet from node 0, it sets the node's inbound connection (here).
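The loop in steps 9 to 12 can be sketched as a small simulation of node 1's cron, assuming (per step 12) that only a PING attaches the pending inbound connection to node 0. The structs, on_ping_received, and count_meets are hypothetical; the conn->node field mirrors link->node from the text.

```c
#include <stddef.h>

struct node;

typedef struct conn {
    struct node *node; /* owning node; NULL until the peer is identified */
} conn_t;

typedef struct node {
    conn_t *link;         /* our outbound link to this node */
    conn_t *inbound_link; /* inbound link attributed to this node */
} node_t;

/* Step 12: a PING received on an unattributed inbound connection ties that
 * connection to the sender. */
static void on_ping_received(node_t *sender, conn_t *conn) {
    conn->node = sender;
    sender->inbound_link = conn;
}

/* Run `ticks` cron iterations on node 1 for its peer node 0. Node 0's first
 * PING on the pending inbound connection arrives at tick `ping_tick`
 * (negative: never). Returns how many MEET packets node 1 sends. */
static int count_meets(node_t *peer, conn_t *pending, int ticks, int ping_tick) {
    int meets = 0;
    for (int t = 0; t < ticks; t++) {
        if (t == ping_tick) on_ping_received(peer, pending);
        if (peer->link != NULL && peer->inbound_link == NULL) meets++; /* MEET re-sent */
    }
    return meets;
}
```

If the PING never comes, the number of MEETs grows with the number of ticks, which is the (potentially) infinite loop described above.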

Node 0 should eventually send a PING packet to node 1, but there is no guarantee about when that will happen. When I reproduce the issue, node 0 never gets a chance to send a PING to node 1, because node 0 overrides node->pong_received for node 1 whenever node 2 gossips about node 1 with a higher pong_received value. As a result, node 2 always has a lower pong_received value than node 1 when node 0 selects a node to send a PING to, here:

if (min_pong_node == NULL || min_pong > this->pong_received) {
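A simplified sketch of why node 1 keeps losing the selection. It only reproduces the quoted comparison over a fixed sample and models gossip as "a newer pong_received overrides ours", as described above; the random sampling and the other filters in the real cron are omitted, and pick_ping_target and apply_gossip are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct node {
    const char *name;
    int64_t pong_received; /* ms timestamp of the last PONG credited to this node */
} node_t;

/* Among the sampled candidates, pick the node whose last PONG is oldest,
 * using the comparison quoted above. */
static node_t *pick_ping_target(node_t *const *sample, size_t count) {
    node_t *min_pong_node = NULL;
    int64_t min_pong = 0;
    for (size_t i = 0; i < count; i++) {
        node_t *this = sample[i];
        if (min_pong_node == NULL || min_pong > this->pong_received) {
            min_pong_node = this;
            min_pong = this->pong_received;
        }
    }
    return min_pong_node;
}

/* Gossip about a node carrying a newer pong_received overrides our local value. */
static void apply_gossip(node_t *n, int64_t gossiped_pong_received) {
    if (gossiped_pong_received > n->pong_received) n->pong_received = gossiped_pong_received;
}
```

As long as node 2 keeps reporting a fresher pong_received for node 1, node 2 stays the oldest candidate and node 1 is never picked.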

I think there are various ways to mitigate this issue:

  1. Do nothing: the MEET packet loop should eventually stop (but I would have to update the test so that it's not flaky).
  2. Change the pinging decision logic to force a PING to every node at least once every X amount of time, even if we know that another node was able to ping it recently. X can be set to something like 2 * (server.cluster_ping_interval ? server.cluster_ping_interval : server.cluster_node_timeout / 2). This would increase cluster bus traffic on large clusters.
  3. If I receive a MEET packet and I already have an outbound link to that node, free the existing outbound link and open a new one.
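Option 2's interval can be sketched as two small helpers. The formula is the one proposed above; the per-node "our last PING" timestamp and both function names are hypothetical, since the current logic relies on pong_received instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* X = 2 * (ping_interval ? ping_interval : node_timeout / 2), as proposed in option 2. */
static int64_t forced_ping_interval_ms(int64_t ping_interval_ms, int64_t node_timeout_ms) {
    int64_t base = ping_interval_ms ? ping_interval_ms : node_timeout_ms / 2;
    return 2 * base;
}

/* Hypothetical check: PING the node ourselves once our own last PING to it is
 * at least X old, however fresh the gossiped pong_received looks. */
static bool must_force_ping(int64_t now_ms, int64_t our_last_ping_ms,
                            int64_t ping_interval_ms, int64_t node_timeout_ms) {
    return now_ms - our_last_ping_ms >= forced_ping_interval_ms(ping_interval_ms, node_timeout_ms);
}
```

With no ping interval configured and a node timeout of 15000 ms, X is 15000 ms, so every node gets at least one direct PING from each peer per node timeout, which is where the extra bus traffic comes from.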

I think option 3 should work best without making too many changes to the current logic. But I'm open to suggestions.
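A minimal sketch of option 3, assuming that a freshly opened outbound connection starts by sending a packet (such as a PING) that lets the peer attach it to us. The connection and node structs, the generation counter, and both functions are hypothetical stand-ins for the real link handling.

```c
#include <stdlib.h>

typedef struct conn {
    int generation; /* which connection attempt this is, for illustration */
} conn_t;

typedef struct node {
    conn_t *link;        /* our outbound link to the node */
    int next_generation; /* counter bumped on every (re)connect */
} node_t;

/* Stand-in for opening a new outbound cluster-bus connection. */
static conn_t *open_outbound_link(node_t *n) {
    conn_t *c = malloc(sizeof *c);
    if (c != NULL) c->generation = n->next_generation++;
    return c;
}

/* Option 3: a MEET from a node we already have an outbound link to suggests the
 * node has not attributed that connection to us. Drop it and reconnect, so the
 * peer sees a fresh connection whose first packet can identify us. */
static void on_meet_from_linked_node(node_t *n) {
    if (n->link == NULL) return;
    free(n->link);
    n->link = open_outbound_link(n);
}
```

In the sequence above, node 0 would run this when node 1's MEET arrives, breaking the loop without touching the PING target selection.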

Labels
None yet
Projects
None yet

4 participants