Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reconnection deadlock #23

Merged
merged 1 commit into from
Oct 20, 2023
Merged

Fix reconnection deadlock #23

merged 1 commit into from
Oct 20, 2023

Conversation

quinchs
Copy link
Collaborator

@quinchs quinchs commented Oct 19, 2023

Summary

When the binding idles for 2 ish minutes, the server sends an IDLE_SESSION_TIMEOUT_ERROR error indicating that the binding should disconnect until another query is ready to be run.

The bindings logic would reset the state of the connection and only attempt to reconnect on a query call, the issue lies with the implementation of this logic with Netty. When the client resets the promises for indicating both when the client is connected as well as the ready state (when the binding receives READY_FOR_COMMAND message). The issue lies with the connection ready state, since our code looks like the following psuedo-code:

public CompletionStage<Void> send(Sendable message) {
  return whenConnectionReady()
    .thenCompose(v -> {
      if(!this.isConnected) {
        return reconnect().thenCompose(v -> send0(message));
      }
      return send0(message);
    });
}

we rely on the connection ready event to preform reconnect logic, the reason for this is the following states:

  • If we're already starting a connection and queue up the client handshake to be sent, the send method should wait for that connection to succeed or fail before trying to send a message
  • If we're already connected this completion state is successful allowing the binding to send the message right away.

During reconnect logic, if the binding detected a disconnect it would call the client connect method, which is responsible for resetting the connection state which includes both promises, the way it would reset the connection promise is by triggering a custom Netty pipeline event that resets the promise:

@Override
public void userEventTriggered(ChannelHandlerContext ctx, @NotNull Object evt) {
  if(evt.equals("RESET")) {
    // resets the promise
  }

  ...
}

The issue was that this pipeline event wasn't triggered synchronously with the connection reset code, so ultimately it would keep the old promise (which has been completed) and follow the reconnect cycle again which is controlled by a lock, causing a deadlock.

This PR fixes that by making the reset promise synchronous within the pipeline by making the callee run the reset instead of the netty pipeline thread.

@quinchs quinchs requested review from nsidnev and scotttrinh October 19, 2023 17:25
@quinchs
Copy link
Collaborator Author

quinchs commented Oct 19, 2023

cc @petersamokhin this most definitely the issue you we're facing, this should fix it.

@quinchs quinchs merged commit 4647a17 into master Oct 20, 2023
2 checks passed
@quinchs quinchs deleted the fix/client-reconnect-deadlock branch October 20, 2023 15:46
quinchs added a commit that referenced this pull request Oct 25, 2023
commit 8c07ee4
Author: Quin Lynch <[email protected]>
Date:   Wed Oct 25 14:27:48 2023 -0300

    Update ChannelDuplexer.java

commit fdd6101
Merge: 28cebea dd661c6
Author: Quin Lynch <[email protected]>
Date:   Wed Oct 25 14:21:57 2023 -0300

    Merge branch 'master' into feat/protocol-v2

commit dd661c6
Author: Quin Lynch <[email protected]>
Date:   Fri Oct 20 13:18:18 2023 -0300

    meta: 0.2.4-SNAPSHOT

commit 81ee0d1
Author: Quin Lynch <[email protected]>
Date:   Fri Oct 20 13:17:59 2023 -0300

    meta: 0.2.3

commit 4647a17
Author: Quin Lynch <[email protected]>
Date:   Fri Oct 20 12:46:25 2023 -0300

    Fix reconnection deadlock (#23)

commit 3be463b
Author: Quin Lynch <[email protected]>
Date:   Tue Oct 17 21:10:13 2023 -0300

    fix the reconnect logic

commit dfb97dd
Author: Quin Lynch <[email protected]>
Date:   Tue Oct 17 20:49:55 2023 -0300

    remove duplicate reconnect behavior

    Binding would call `reconnect()` which then gets reconnected on duplexer level just to sent Terminate and hang :cry:

commit 6e52959
Author: Quin Lynch <[email protected]>
Date:   Sun Oct 15 13:36:14 2023 -0300

    Fix idle clients causing exceptions

commit a89822d
Author: Quin Lynch <[email protected]>
Date:   Sat Oct 14 21:52:20 2023 -0300

    Fix cloud connection logic

commit 5bb3ba2
Author: Quin Lynch <[email protected]>
Date:   Tue Oct 10 13:09:03 2023 -0300

    meta: 0.2.3-SNAPSHOT

commit ca6ad29
Author: Quin Lynch <[email protected]>
Date:   Tue Oct 10 12:58:18 2023 -0300

    meta: 0.2.2

commit 13bed4d
Author: Quin Lynch <[email protected]>
Date:   Tue Oct 10 12:55:27 2023 -0300

    Fix transaction deadlock in client pool (#22)

commit 28cebea
Merge: dbd4099 7fa55fa
Author: Quin Lynch <[email protected]>
Date:   Tue Sep 26 13:48:57 2023 -0300

    update from master

commit 7fa55fa
Author: Quin Lynch <[email protected]>
Date:   Tue Sep 26 13:28:29 2023 -0300

    Fix primitive deserialization on datatypes (#20)

commit 43da5b2
Author: Quin Lynch <[email protected]>
Date:   Thu Sep 21 15:55:04 2023 -0300

    Update README.md

    closes #19

commit 2a54501
Author: Quin Lynch <[email protected]>
Date:   Wed Sep 13 11:23:59 2023 -0300

    Make EdgeDBClient implement AutoCloseable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants