Commit
Introduce thread-local storage variable to update thread's own used_memory and sum when reading to reduce atomic contention. (#674)

#### Description

This patch introduces a thread-local storage (TLS) variable so that each thread updates its own `used_memory`; the per-thread values are summed when they are read in `zmalloc_used_memory`. This removes unnecessary `lock add` contention on the atomic variable. As a safeguard, if the total number of threads exceeds 132, threads with index >= 132 fall back to the atomic operation.

#### Problem Statement

`zmalloc`, `zfree`, and related functions atomically update `used_memory` on every operation, and they are called very frequently. In the benchmark [memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.yml), `zmalloc` and `zfree` account for a high share of cycles. They are only wrappers around the underlying allocator library, so they should not cost this many cycles. Most of those cycles come from `lock add` and `lock sub`, which are expensive instructions. Profiling shows that the metric updates come mainly from the main thread, so using TLS removes much of the contention.

#### Performance Boost

**Note:** This optimization should benefit common benchmarks broadly. I chose the two scenarios below to validate the performance boost in my local environment.
| Test Suites | Performance Boost |
|-|-|
|[memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.yml)|8%|
|[memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10.yml)|4%|

##### Test Env

- OS: Ubuntu 22.04.4 LTS
- Platform: Intel Xeon Platinum 8380
- Server and Client in same socket

##### Start Server

```sh
taskset -c 0-3 ~/valkey/src/valkey-server /tmp/valkey_1.conf

port 9001
bind * -::*
daemonize yes
protected-mode no
save ""
```

---------

Signed-off-by: Lipeng Zhu <[email protected]>
Co-authored-by: Wangyang Guo <[email protected]>
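
##### Illustrative Sketch

The per-thread counting scheme from the Description can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: the names `used_memory_slot`, `used_memory_fallback`, `registered_threads`, `stat_alloc`, `stat_free`, and `used_memory_total` are hypothetical, and only the threshold of 132 is taken from the commit message. The sketch uses C11 relaxed atomic loads and stores for the per-thread slots, which keeps cross-thread reads well defined without a locked read-modify-write; the actual patch may do this differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 132 /* threshold quoted in the commit message */

static_assert(MAX_THREADS > 0, "need at least one per-thread slot");

/* One counter per registered thread; each slot has exactly one writer. */
static _Atomic size_t used_memory_slot[MAX_THREADS];
/* Shared fallback counter for threads whose index is >= MAX_THREADS. */
static _Atomic size_t used_memory_fallback;
/* Number of threads that have registered so far. */
static atomic_int registered_threads;
/* This thread's slot index; -1 until the thread's first update. */
static _Thread_local int thread_index = -1;

static void add_to_used_memory(size_t delta) {
    if (thread_index == -1) {
        /* One atomic increment per thread lifetime, not per allocation. */
        thread_index = atomic_fetch_add_explicit(&registered_threads, 1,
                                                 memory_order_relaxed);
    }
    if (thread_index >= MAX_THREADS) {
        /* Too many threads: fall back to the contended atomic add. */
        atomic_fetch_add_explicit(&used_memory_fallback, delta,
                                  memory_order_relaxed);
        return;
    }
    /* Single writer per slot, so a relaxed load plus a relaxed store is
     * enough; no locked read-modify-write instruction is required. */
    _Atomic size_t *slot = &used_memory_slot[thread_index];
    size_t cur = atomic_load_explicit(slot, memory_order_relaxed);
    atomic_store_explicit(slot, cur + delta, memory_order_relaxed);
}

void stat_alloc(size_t size) { add_to_used_memory(size); }

/* Subtraction is done as unsigned wraparound. A slot may wrap when a thread
 * frees memory that another thread allocated, but the sum over all slots is
 * still correct modulo SIZE_MAX + 1. */
void stat_free(size_t size) { add_to_used_memory((size_t)0 - size); }

/* Reader: sum the fallback counter and every registered slot. */
size_t used_memory_total(void) {
    size_t total = atomic_load_explicit(&used_memory_fallback,
                                        memory_order_relaxed);
    int n = atomic_load_explicit(&registered_threads, memory_order_relaxed);
    if (n > MAX_THREADS) n = MAX_THREADS;
    for (int i = 0; i < n; i++) {
        total += atomic_load_explicit(&used_memory_slot[i],
                                      memory_order_relaxed);
    }
    return total;
}
```

In this sketch a caller would invoke `stat_alloc(size)` after each allocation and `stat_free(size)` after each free, and read the total with `used_memory_total()`. The total is a snapshot that may lag concurrent updates slightly. The slots here are not padded to cache-line size, so neighbouring threads could still share a cache line.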