-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a dedicated method for disconnecting TLS connections #10005
Conversation
By now, I'm pretty sure the answer is yes. Evidence: Take two connected Icinga 2 nodes and break individual connections by dropping the packets specific to that connection in a firewall. Both nodes will detect that no messages were received and reconnect, however, the old connection remains in an established state:
That implies that there's probably a resource leak in that scenario (until the kernel decides that the connection is actually dead and returns an error for the socket operations). Unverified theory of what might happen: icinga2/lib/remote/jsonrpcconnection.cpp Line 226 in 9a8620d
Which waits for icinga2/lib/remote/jsonrpcconnection.cpp Lines 112 to 120 in 9a8620d
|
Thing to consider
|
e90acc5
to
3a72a6f
Compare
I resolved conflicts and started implementing |
3a72a6f
to
9d67c26
Compare
While continuing that work, I figured that this might become a bigger rework of This PR on it's own should already be enough of an improvement on its own, after all it even fixes a problem in the HTTP connection handling.
In that regard, I figured that adding comments to these calls why they are fine is good enough, especially when compared that was needed just to add a redundant timeout and spawn a pointless coroutine (e90acc5). I'm leaving the PR in a draft state for the moment because I still want to answer a few detail questions regarding the two new disconnect methods (I removed a |
9d67c26
to
e5b9ff4
Compare
I still couldn't find a good reason to call
That on the other hand turned out to be necessary: as confirmed using strace, without calling |
e5b9ff4
to
7430618
Compare
lib/remote/httpserverconnection.cpp
Outdated
|
||
m_Stream->next_layer().async_shutdown(yc[ec]); | ||
|
||
m_Stream->lowest_layer().shutdown(m_Stream->lowest_layer().shutdown_both, ec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ec
isn't needed anymore, right?
lib/base/tlsstream.hpp
Outdated
} | ||
|
||
void ForceDisconnect(); | ||
void GracefulDisconnect(boost::asio::io_context::strand strand, boost::asio::yield_context yc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void GracefulDisconnect(boost::asio::io_context::strand strand, boost::asio::yield_context yc); | |
void GracefulDisconnect(boost::asio::io_context::strand& strand, boost::asio::yield_context yc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, using a reference here wouldn't work because that method spawns a coroutine, and there's nothing that keeps the m_IoStrand
object from the Http
or Rpc
class alive till that coroutine finishes.
7430618
to
a6445c7
Compare
a6445c7
to
1cf8617
Compare
Force push was just a rebase to resolve merge conflicts due to changes in indentation, no other (intended) changes. |
1cf8617
to
e6d387d
Compare
And another rebase to get the GitHub Actions going again (actually, I thought the last rebase would already to that, but at that point, #10251 wasn't merged yet) |
@Al2Klimov Please clarify what your expectations are regarding your outstanding comments (#10005 (review), #10005 (review)). Yes, cleaning up the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is almost perfectly fine.
lib/base/tlsstream.cpp
Outdated
Timeout::Ptr shutdownTimeout(new Timeout(strand.context(), strand, boost::posix_time::seconds(10), | ||
[this, keepAlive = AsioTlsStream::Ptr(this)](boost::asio::yield_context yc) { | ||
// Forcefully terminate the connection if async_shutdown() blocked more than 10 seconds. | ||
ForceDisconnect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If keepAlive just protects against (SEGV on) ForceDisconnect() out of GracefulDisconnect() scope, please Defer a Timeout cancellation instead.
- Drop a85c188.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If keepAlive just protects against (SEGV on) ForceDisconnect() out of GracefulDisconnect() scope, please Defer a Timeout cancellation instead.
Yes, capturing a reference in a lambda is a pattern we use in a few places to prevent a use after free and that's why I did it here as well. Indeed, after closer inspection, due tue the timeout doing the cancellation check on the same strand that will also cancel the timeout, after Cancel()
, the timeout class won't use the callback, so the captures this
isn't used anymore either.
Surprising what a small insight is able to resolve that whole discussion nicely.
Calling `AsioTlsStream::async_shutdown()` performs a TLS shutdown which exchanges messages (that's why it takes a `yield_context`) and thus has the potential to block the coroutine. Therefore, it should be protected with a timeout. As `async_shutdown()` doesn't simply take a timeout, this has to be implemented using a timer. So far, these timers are scattered throughout the codebase with some places missing them entirely. This commit adds helper functions to properly shutdown a TLS connection with a single function call.
This new helper functions allows deduplicating the timeout handling for `async_shutdown()`.
This new helper function has proper timeout handling which was missing here.
The reason for introducing AsioTlsStream::GracefulDisconnect() was to handle the TLS shutdown properly with a timeout since it involves a timeout. However, the implementation of this timeout involves spwaning coroutines which are redundant in some cases. This commit adds comments to the remaining calls of async_shutdown() stating why calling it is safe in these places.
e6d387d
to
a506d56
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine.
m_CheckLivenessTimer.cancel(); | ||
m_HeartbeatTimer.cancel(); | ||
|
||
m_Stream->lowest_layer().cancel(ec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The worst what can hypothetically happen if we don't cancel the reader is that it reads...
shutdownTimeout->Cancel(); | ||
|
||
m_Stream->lowest_layer().shutdown(m_Stream->lowest_layer().shutdown_both, ec); | ||
m_Stream->GracefulDisconnect(m_IoStrand, yc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ... during this and makes the shutdown unclean. But it is already unclean, see 🪵Log TLS shutdown error🛑, only if any #10259, so... 🤷♂️
{ | ||
if (!lowest_layer().is_open()) { | ||
// Already disconnected, nothing to do. | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is consistent with GracefulDisconnect() and shouldn't hurt – idk how Windows handles the -1 thing (#10005 (comment)) if you close a closed socket.😅
|
||
{ | ||
Timeout::Ptr shutdownTimeout(new Timeout(strand.context(), strand, boost::posix_time::seconds(10), | ||
[this](boost::asio::yield_context yc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional
[this](boost::asio::yield_context yc) { | |
[this](boost::asio::yield_context) { |
Would otherwise add an unnecessary compiler warning.
ForceDisconnect(); | ||
} | ||
)); | ||
Defer cancelTimeout ([&shutdownTimeout]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional
Defer cancelTimeout ([&shutdownTimeout]() { | |
Defer cancelTimeout ([&shutdownTimeout] { |
Would otherwise add an unnecessary CLion hint.
Properly closing a TLS connection involves sending some shutdown messages so that both ends know that the connection wasn't truncated maliciously. Exchanging those messages can stall for a long time if the underlying TCP connection is broken. The HTTP connection handling was missing any kind of timeout for the TLS shutdown so that dead connections could hang around for a long time.
This PR introduces two new methods on
AsioTlsStream
, namelyForceDisconnect()
which just wraps the call for closing the TCP connection andGracefulShutdown()
which performs the TLS shutdown with a timeout similar to it was done inJsonRpcConnection::Disconnect()
before.fixes #9986