-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(de)compression: reduce memory allocation to improve performance #521
(de)compression: reduce memory allocation to improve performance #521
Conversation
Currently, every time `WrapBody::poll_frame` is called, new instance of `BytesMut` is created with the default capacity, which is effectively 64 bytes. This ends up with a lot of memory allocation in certain situations, making the throughput significantly worse. To optimize memory allocation, `WrapBody` now gets `BytesMut` as its field, with initial capacity of 4096 bytes. This buffer will be reused as much as possible across multiple `poll_frame` calls, and only when its capacity becomes 0, new allocation of another 4096 bytes is performed. Fixes: tower-rs#520
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phenomenal write-up, thanks!
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [tower-http](https://redirect.github.com/tower-rs/tower-http) | dependencies | patch | `0.6.0` -> `0.6.1` | --- ### Release Notes <details> <summary>tower-rs/tower-http (tower-http)</summary> ### [`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1): v0.6.1 [Compare Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1) #### Fixed - **decompression:** reuse scratch buffer to significantly reduce allocations and improve performance ([#​521]) [#​521]: https://redirect.github.com/tower-rs/tower-http/pull/521 #### New Contributors - [@​magurotuna](https://redirect.github.com/magurotuna) made their first contribution in [https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/apollographql/subgraph-template-rust-async-graphql). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC44MC4wIiwidXBkYXRlZEluVmVyIjoiMzguODAuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOltdfQ==--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [tower-http](https://redirect.github.com/tower-rs/tower-http) | dependencies | patch | `0.6.0` -> `0.6.1` | --- ### Release Notes <details> <summary>tower-rs/tower-http (tower-http)</summary> ### [`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1): v0.6.1 [Compare Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1) #### Fixed - **decompression:** reuse scratch buffer to significantly reduce allocations and improve performance ([#​521]) [#​521]: https://redirect.github.com/tower-rs/tower-http/pull/521 #### New Contributors - [@​magurotuna](https://redirect.github.com/magurotuna) made their first contribution in [https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521) </details> <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC45NC4yIiwidXBkYXRlZEluVmVyIjoiMzguOTQuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsidHlwZS9wYXRjaCJdfQ==--> Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [tower-http](https://redirect.github.com/tower-rs/tower-http) | dependencies | patch | `0.6.0` -> `0.6.1` | --- ### Release Notes <details> <summary>tower-rs/tower-http (tower-http)</summary> ### [`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1): v0.6.1 [Compare Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1) #### Fixed - **decompression:** reuse scratch buffer to significantly reduce allocations and improve performance ([#​521]) [#​521]: https://redirect.github.com/tower-rs/tower-http/pull/521 #### New Contributors - [@​magurotuna](https://redirect.github.com/magurotuna) made their first contribution in [https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521) </details> <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC45NC4yIiwidXBkYXRlZEluVmVyIjoiMzguOTQuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsidHlwZS9wYXRjaCJdfQ==--> Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [tower-http](https://redirect.github.com/tower-rs/tower-http) | dependencies | patch | `0.6.0` -> `0.6.1` | --- ### Release Notes <details> <summary>tower-rs/tower-http (tower-http)</summary> ### [`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1): v0.6.1 [Compare Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1) #### Fixed - **decompression:** reuse scratch buffer to significantly reduce allocations and improve performance ([#​521]) [#​521]: https://redirect.github.com/tower-rs/tower-http/pull/521 #### New Contributors - [@​magurotuna](https://redirect.github.com/magurotuna) made their first contribution in [https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/mist-id/mist). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC44MC4wIiwidXBkYXRlZEluVmVyIjoiMzguODAuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOltdfQ==--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
I just wanna say amazing work! I work on Relay, which get's quite a chunk of compressed traffic. I updated tower and axum the other day and noticed about a 20% decrease in tokio runtime busy threads, a 4x decrease in time spent reading (+decompressing) request bodies, a 50% reduction in the average latency and about 30% reduced latency in the max and p95's. Now looking through several changelogs and git commits trying to find what caused this, I am relatively sure it was this PR! |
Motivation
Currently, every time
WrapBody::poll_frame
is called, new instance ofBytesMut
is created with the default capacity, which is effectively 64 bytes. This ends up with a lot of memory allocation in certain situations, making the throughput significantly worse.#520 describes that a manual implementation of decompression logic using
async-compression
gets the performance issue withtower_http::decompression
to be resolved. To identify what is making this difference, I captured a flamegraph. Here's the result:As annotated,
tower_http::decompression
has more memory allocation which is the reason why the performance is degraded.Solution
To optimize memory allocation,
WrapBody
now getsBytesMut
as its field, with initial capacity of 4096 bytes. This buffer will be reused as much as possible across multiplepoll_frame
calls, and only when its capacity becomes 0, new allocation of another 4096 bytes is performed.The capacity of 4096 bytes is taken from the default capacity of the internal buffer used in
tokio_util::io::ReaderStream
: https://docs.rs/tokio-util/0.7.12/src/tokio_util/io/reader_stream.rs.html#8Performance Improvement
With this optimization, the flamegraph is changed as follows, where memory allocation is significantly reduced and optimized.
Also, the throughput gets 10x better when we measure it using Deno's
fetch
implementation that internally useshyper
andtower_http::decompresson
.This graph shows how long each Deno version takes to handle 2k requests. The rightmost one named
v2.0.0-rc.4-tower-http-patched
is the one with this patch applied. The leftmost one, v1.45.2, does not usetower_http::decompression
.For more information on how this result was measured, please refer to https://github.com/magurotuna/deno_fetch_decompression_throughput
Fixes: #520