Trace HTTPClient request execution #320
base: tracing-development
Conversation
Can one of the admins verify this patch?
I chatted with @ktoso earlier to discuss the manual context propagation, and we agreed that we probably shouldn't deprecate the "old" API accepting a
So since technically we're 0.1 and something may change... how do we want to tackle adoption here?
I was thinking to kick off a branch like `tracing` for now, so we can polish up there and once we're all confident merge into mainline? We could also tag those tracing releases; they'd follow normal releases, e.g. 1.2.2-tracing.
I don't really expect anything breaking in the core APIs, but the OpenTelemetry support which we may want to use here could still fluctuate a little bit until it's final, hmmm...
Package.swift
Outdated
```diff
     ],
     targets: [
         .target(
             name: "AsyncHTTPClient",
             dependencies: ["NIO", "NIOHTTP1", "NIOSSL", "NIOConcurrencyHelpers", "NIOHTTPCompression",
-                           "NIOFoundationCompat", "NIOTransportServices", "Logging"]
+                           "NIOFoundationCompat", "NIOTransportServices", "Logging", "Instrumentation"]
```
Can we right away go with `Tracing` and do the full thing in a single PR?
That's my intention. I've added a checklist to the PR including creating a Span. I first wanted to get the instrumentation part down and then continue with tracing, but all inside this PR.
Force-pushed from 047fbb0 to 87085d9
@swift-server-bot add to whitelist
I'd like to punt this to a side-branch for iterative development if we can.
Sure, sounds like a good approach. I can change the target branch once it's created.
I've opened up the
@ktoso The CI seems to fail because the Baggage repo cannot be cloned through the Git URL. Should we pin Tracing to 0.1.1 here in order to get the fix? (apple/swift-distributed-tracing/pull/25)
No, we need to tag a 0.1.1, I'll do that in a moment.
0.1.1 tagged, please depend on that. Thanks Cory for the development branch, sounds good 👍
Force-pushed from 87085d9 to ae7268d
@swift-server-bot test this please
Can drafts get CI validation? 🤔
Yes, they can: I think the CI isn't targeting that branch at the moment. |
Force-pushed from ae7268d to 329522c
Motivation: Currently, when either we or the server send Connection: close, we correctly do not return that connection to the pool. However, we rely on the server actually performing the connection closure: we never call close() ourselves. This is unnecessarily optimistic: a server may absolutely fail to close this connection. To protect our own file descriptors, we should make sure that any connection we do not return to the pool is closed.
Modifications: If we think a connection is closing when we release it, we now call close() on it defensively.
Result: We no longer leak connections when the server fails to close them. Fixes swift-server#324.
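The defensive-close rule above can be sketched in a few lines of plain Swift. `Connection` and `ConnectionPool` here are simplified, hypothetical stand-ins, not the actual AsyncHTTPClient types:

```swift
// Sketch (hypothetical types, not AsyncHTTPClient internals): any
// connection that is not returned to the pool is closed explicitly,
// rather than trusting the server to close it.
final class Connection {
    private(set) var isClosed = false
    func close() { isClosed = true }
}

final class ConnectionPool {
    private(set) var available: [Connection] = []

    func release(_ connection: Connection, isReusable: Bool) {
        if isReusable {
            // Healthy keep-alive connection goes back into the pool.
            available.append(connection)
        } else {
            // The server *should* close it after `Connection: close`,
            // but we close it ourselves to protect our file descriptors.
            connection.close()
        }
    }
}
```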
Motivation: Flaky tests are bad. This test is flaky because the server closes the connection immediately upon channelActive. In practice this can mean that the handshake never even gets a chance to start: by the time the SSLHandler ends up in the pipeline the connection is already dead. Heck, by the time we attempt to complete the connection, the connection might be dead.
Modifications:
- Change the shutdown to be on first read.
- Remove the disabled autoRead.
- Change the expected NIOTS failure mode to connectTimeout, which is how this manifests in NIOTS.
Result: Test is no longer flaky.
Adding the product dependency to the target by name only produces an error in Xcode 12.4. Instead, the product dependency should be given as a `.product`. Updated the README with the new format, so that new users won't stumble over this.
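The manifest change described above looks roughly like this; the package name, target name, and version below are illustrative, so check the README for the current ones:

```swift
// swift-tools-version:5.3
// Package.swift — declaring the dependency as a `.product` rather than
// by bare name, which Xcode 12.4 rejects. Names/versions are illustrative.
import PackageDescription

let package = Package(
    name: "MyApp",
    dependencies: [
        .package(url: "https://github.com/swift-server/async-http-client.git", from: "1.0.0"),
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // The fix: reference the product explicitly.
                .product(name: "AsyncHTTPClient", package: "async-http-client"),
            ]
        ),
    ]
)
```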
Motivation: When we stream the request body, the current implementation expects that the body will finish streaming _before_ we start to receive response body parts. This is not correct: response body parts can start to arrive before we finish sending the request.
Modifications:
- Simplifies the state machine: we only care about the request being fully sent, to prevent sending body parts after .end; the response state machine is mostly ignored and the correct flow will be handled by the NIOHTTP1 pipeline
- Adds HTTPEchoHandler, which replies to each response body part
- Adds a bi-directional streaming test
Result: Closes swift-server#327
Motivation:
HTTPResponseAggregator attempts to build a single, complete response object. This necessarily means it loads the entire response payload into memory. It wants to provide this payload as a single contiguous buffer of data, and it does so by aggregating the data into a single contiguous buffer as it goes. Because ByteBuffer does exponential reallocation, the cost of doing this should be amortised constant-time, even though we do have to copy some data sometimes.
However, if this operation triggers a copy-on-write then the operation will become quadratic. For large buffers this will rapidly come to dominate the runtime.
Unfortunately, in at least Swift 5.3, Swift cannot safely see that during the body stanza the state variable is dead. Swift is not necessarily wrong about this: there's a cross-module call to ByteBuffer.writeBuffer in place and Swift cannot easily prove that that call will not lead to a re-entrant access of the `HTTPResponseAggregator` object. For this reason, during the call to `didReceiveBodyPart` there will be two copies of the body buffer alive, and so the write will CoW. This quadratic behaviour is a nasty performance trap that can become highly apparent even at quite small body sizes.
Modifications:
While Swift can't prove that the `self.state` variable is dead, we can! To that end, we temporarily set it to a different value that does not store the buffer in question. This will force Swift to drop the ref on the buffer, making it uniquely owned and avoiding the CoW. Sadly, it's extremely difficult to test for "does not CoW", so this patch does not currently come with any tests. I have experimentally verified the behaviour.
Result:
No copy-on-write in the HTTPResponseAggregator during body aggregation.
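The trick described in the modification can be illustrated with a plain Swift enum and Array instead of the real `HTTPResponseAggregator` and ByteBuffer; this is a hand-rolled sketch of the ownership reasoning, not the actual patch:

```swift
// Sketch of the CoW-avoidance trick (hypothetical types). While a
// buffer is stored inside `state` *and* bound to a local variable, its
// storage has two owners, so any write copies it. Clearing `state`
// first makes the local binding uniquely owned, so the append mutates
// in place instead of copying.
enum AggregationState {
    case buffering([UInt8])
    case mutating            // placeholder state that holds no buffer
}

struct Aggregator {
    var state = AggregationState.buffering([])

    mutating func didReceiveBodyPart(_ part: [UInt8]) {
        guard case .buffering(var buffer) = state else { return }
        state = .mutating            // drop the second reference before writing
        buffer.append(contentsOf: part)
        state = .buffering(buffer)   // store the (uniquely owned) buffer back
    }
}
```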
Motivation:
There is an awkward timing window in the TLSEventsHandler flow where it is possible for the NIOSSLClientHandler to fail the handshake on handlerAdded. If this happens, the TLSEventsHandler will not be in the pipeline, and so the handshake failure error will be lost and we'll get a generic one instead. This window can be resolved without performance penalty if we use the new synchronous pipeline operations view to add the two handlers backwards. If this is done then we can ensure that the TLSEventsHandler is always in the pipeline before the NIOSSLClientHandler, and so there is no risk of event loss.
While I'm here, AHC does a lot of pipeline modification. This has led to lengthy future chains with lots of event loop hops for no particularly good reason. I've therefore replaced all pipeline operations with their synchronous counterparts. All but one sequence was happening on the correct event loop, and for the one that may not, I've added a fast-path dispatch that should tolerate being on the wrong one. The result is cleaner, more linear code that also reduces the allocations and event loop hops.
Modifications:
- Use synchronous pipeline operations everywhere
- Change the order of adding TLSEventsHandler and NIOSSLClientHandler
Result:
Faster, safer, fewer timing windows.
Motivation:
AsyncHTTPClient attempts to avoid the problem of Happy Eyeballs making it hard to know which Channel will be returned by only inserting the TLSEventsHandler upon completion of the connect promise. Unfortunately, as this may involve event loop hops, there are some awkward timing windows in play where the connect may complete before this handler gets added. We should remove that timing window by ensuring that all channels always have this handler in place, and instead of trying to wait until we know which Channel will win, we can find the TLSEventsHandler that belongs to the winning channel after the fact.
Modifications:
- TLSEventsHandler no longer removes itself from the pipeline or throws away its promise.
- makeHTTP1Channel now searches for the TLSEventsHandler from the pipeline that was created and is also responsible for removing it.
- Better sanity checking that the proxy TLS case does not overlap with the connection-level TLS case.
Results:
Further shrinking windows for pipeline management issues.
Motivation:
Users of the HTTPClientResponseDelegate expect that the event loop futures returned from didReceiveHead and didReceiveBodyPart can be used to exert backpressure. To be fair to them, they somewhat can. However, the TaskHandler has a bit of a misunderstanding about how NIO backpressure works, and does not correctly manage the buffer of inbound data. The result of this misunderstanding is that multiple calls to didReceiveBodyPart and didReceiveHead can be outstanding at once. This would likely lead to severe bugs in most delegates, as they do not expect it. We should make things work the way delegate implementers believe it works.
Modifications:
- Added a buffer to the TaskHandler to avoid delivering data that the delegate is not ready for.
- Added a new "pending close" state that keeps track of a state where the TaskHandler has received .end but not yet delivered it to the delegate. This allows better error management.
- Added some more tests.
- Documented our backpressure commitments.
Result:
Better respect for backpressure. Resolves swift-server#348
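The buffering behaviour described above can be modelled without NIO: hold incoming parts back until the delegate signals completion for the previous one. This is an illustrative sketch with hypothetical types and callbacks standing in for event loop futures, not the actual `TaskHandler`:

```swift
// Sketch (hypothetical types): deliver at most one body part to the
// delegate at a time; further parts are buffered until the delegate's
// completion callback fires, modelling future-based backpressure.
final class BackpressureBuffer {
    private var pending: [[UInt8]] = []
    private var deliveryInFlight = false
    private let deliver: (_ part: [UInt8], _ done: @escaping () -> Void) -> Void

    init(deliver: @escaping (_ part: [UInt8], _ done: @escaping () -> Void) -> Void) {
        self.deliver = deliver
    }

    func didReceiveBodyPart(_ part: [UInt8]) {
        pending.append(part)
        pump()
    }

    private func pump() {
        guard !deliveryInFlight, !pending.isEmpty else { return }
        deliveryInFlight = true
        let part = pending.removeFirst()
        deliver(part) {
            self.deliveryInFlight = false
            self.pump()   // only now is the next buffered part delivered
        }
    }
}
```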
Force-pushed from 329522c to d68cb8f
Motivation: 5.4 is out!
Changes:
- update Dockerfile handling of rubygems
- add docker compose setup for ubuntu 20.04 and 5.4 toolchain
Motivation: test with nightly toolchain
Changes: add docker compose setup for ubuntu 20.04 and nightly toolchain
Adds support for request-specific TLS configuration: `Request(url: "https://webserver.com", tlsConfiguration: .forClient())`
Motivation:
At the moment, AHC assumes that creating a `NIOSSLContext` is both cheap and doesn't block. Neither of these two assumptions is true. To create a `NIOSSLContext`, BoringSSL will have to read a lot of certificates in the trust store (on disk) which requires a lot of ASN1 parsing and much, much more. On my Ubuntu test machine, creating one `NIOSSLContext` is about 27,000 allocations!!! To make it worse, AHC allocates a fresh `NIOSSLContext` for _every single connection_, whether HTTP or HTTPS. Yes, correct.
Modification:
- Cache NIOSSLContexts per TLSConfiguration in a LRU cache
- Don't get an NIOSSLContext for HTTP (plain text) connections
Result:
New connections should be _much_ faster in general, assuming that you're not using a different TLSConfiguration for every connection.
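A minimal version of the caching idea (reuse one expensive object per configuration, evicting the least recently used entry when the cache is full) might look like this. It is a hand-rolled sketch, not AHC's or NIOSSL's implementation:

```swift
// Sketch of an LRU cache keyed by configuration (hypothetical types).
// `makeValue` stands in for the ~27,000-allocation NIOSSLContext setup:
// it runs once per distinct key while the key stays in the cache.
final class LRUCache<Key: Hashable, Value> {
    private let capacity: Int
    private var storage: [Key: Value] = [:]
    private var order: [Key] = []   // most recently used last

    init(capacity: Int) { self.capacity = capacity }

    func value(for key: Key, makeValue: () -> Value) -> Value {
        if let cached = storage[key] {
            order.removeAll { $0 == key }
            order.append(key)       // refresh recency on a cache hit
            return cached
        }
        if storage.count >= capacity, let oldest = order.first {
            order.removeFirst()
            storage[oldest] = nil   // evict the least recently used entry
        }
        let value = makeValue()
        storage[key] = value
        order.append(key)
        return value
    }
}
```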
(swift-server#350) This PR is a result of another PR, swift-server#321. In that PR I provided an alternative structure to TLSConfiguration for when connecting with Transport Services. In this one I construct the NWProtocolTLS.Options from TLSConfiguration. It does mean a little more work whenever we make a connection, but having spoken to @weissi, he doesn't seem to think that is an issue. Also, there is no method to create a SecIdentity at the moment. We need to generate a pkcs#12 from the certificate chain and private key, which can then be used to create the SecIdentity. This should resolve swift-server#292
(swift-server#368)
Motivation:
In the vast majority of cases, we'll only ever create one and only one `NIOSSLContext`. It's therefore wasteful to keep around a whole thread doing nothing just for that. A `DispatchQueue` is absolutely fine here.
Modification:
Run the `NIOSSLContext` creation on a `DispatchQueue` instead.
Result:
Fewer threads hanging around.
Co-authored-by: Johannes Weiss <[email protected]>
Motivation:
In order to instrument distributed systems, metadata such as trace ids must be propagated across network boundaries. As HTTPClient operates at one such boundary, it should take care of injecting metadata into HTTP headers automatically using the configured instrument.
Modifications:
HTTPClient gains new method overloads accepting LoggingContext.
Result:
- New HTTPClient method overloads accepting LoggingContext
- Existing overloads accepting Logger construct a DefaultLoggingContext
- Existing methods that neither take Logger nor LoggingContext construct a DefaultLoggingContext
Fix building on macOS 12
Motivation:
Context Propagation
In order to instrument distributed systems, metadata such as trace ids
must be propagated across network boundaries through HTTP headers.
As HTTPClient operates at one such boundary, it should take care of
injecting metadata into HTTP headers automatically using the configured
instrument.
Built-in tracing
Furthermore, `HTTPClient` should create a `Span` for executed requests under the hood, so that users benefit from tracing effortlessly.
Modifications:
- New `HTTPClient` method overloads accepting `LoggingContext`
- Create a `Span` for each executed HTTP request
Result:
- New `HTTPClient` method overloads accepting `LoggingContext`
- Existing overloads accepting `Logger` construct a `DefaultLoggingContext`
- Existing methods that neither take `Logger` nor `LoggingContext` construct a `DefaultLoggingContext`