Add benchmark for cluster API #453

Merged
merged 1 commit into from
Jan 24, 2024

Conversation

akudiyar
Collaborator

@akudiyar akudiyar commented Jan 14, 2024

Before memory optimizations

Benchmark results on commit a6199e8 (before optimizations):

Total time: 00:33:33

Benchmark                                            Mode  Cnt      Score      Error  Units
ClusterBenchmarkRunner.readDataUsingCallAPI         thrpt   10  13761.025 ± 2657.463  ops/s
ClusterBenchmarkRunner.readDataUsingCallAPI:·jfr    thrpt             NaN               ---
ClusterBenchmarkRunner.readDataUsingSpaceAPI        thrpt   10  12033.690 ± 2032.909  ops/s
ClusterBenchmarkRunner.readDataUsingSpaceAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingCallAPI        thrpt   10   6175.897 ±  630.966  ops/s
ClusterBenchmarkRunner.writeDataUsingCallAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingSpaceAPI       thrpt   10   7123.615 ±  894.854  ops/s
ClusterBenchmarkRunner.writeDataUsingSpaceAPI:·jfr  thrpt             NaN               ---

Allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (before optimizations):

Allocation_before

TLAB allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (before optimizations):

TLAB_allocation_before

After memory optimizations

Benchmark results on commit 5f9e69c (after optimizations):

Total time: 00:24:56

Benchmark                                            Mode  Cnt      Score      Error  Units
ClusterBenchmarkRunner.readDataUsingCallAPI         thrpt   10  19674.739 ± 2977.708  ops/s
ClusterBenchmarkRunner.readDataUsingCallAPI:·jfr    thrpt             NaN               ---
ClusterBenchmarkRunner.readDataUsingSpaceAPI        thrpt   10  17092.066 ± 5721.286  ops/s
ClusterBenchmarkRunner.readDataUsingSpaceAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingCallAPI        thrpt   10   8012.627 ± 1120.023  ops/s
ClusterBenchmarkRunner.writeDataUsingCallAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingSpaceAPI       thrpt   10   9157.278 ± 3705.178  ops/s
ClusterBenchmarkRunner.writeDataUsingSpaceAPI:·jfr  thrpt             NaN               ---
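
When comparing the two result tables, it helps to keep the Error column in mind: by default JMH reports it as the half-width of a 99.9% confidence interval around the score. The following is a minimal sketch, not part of the PR, that checks whether the before and after intervals overlap for each benchmark. The class name `ScoreIntervals` and the helper `overlaps` are hypothetical; the numbers are copied from the tables above.

```java
public class ScoreIntervals {

    /**
     * Returns true when [a - aErr, a + aErr] and [b - bErr, b + bErr]
     * share at least one point.
     */
    static boolean overlaps(double a, double aErr, double b, double bErr) {
        return a - aErr <= b + bErr && b - bErr <= a + aErr;
    }

    public static void main(String[] args) {
        String[] names = {
            "readDataUsingCallAPI", "readDataUsingSpaceAPI",
            "writeDataUsingCallAPI", "writeDataUsingSpaceAPI"
        };
        // Score and error (ops/s) on commit a6199e8, before optimizations.
        double[][] before = {
            {13761.025, 2657.463}, {12033.690, 2032.909},
            {6175.897, 630.966}, {7123.615, 894.854}
        };
        // Score and error (ops/s) on commit 5f9e69c, after optimizations.
        double[][] after = {
            {19674.739, 2977.708}, {17092.066, 5721.286},
            {8012.627, 1120.023}, {9157.278, 3705.178}
        };
        for (int i = 0; i < names.length; i++) {
            boolean overlap = overlaps(
                before[i][0], before[i][1], after[i][0], after[i][1]);
            System.out.printf("%-24s intervals overlap: %b%n", names[i], overlap);
        }
    }
}
```

With these numbers the intervals are disjoint for the two CallAPI benchmarks and overlap for the two SpaceAPI benchmarks, whose after-optimization error margins are noticeably wider. Overlapping intervals do not mean there is no gain, only that more iterations would be needed to pin it down.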

Allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (after optimizations):

Allocation_after

TLAB allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (after optimizations):

TLAB_allocation_after

Summary

The average throughput gain is about 40% on read operations and about 30% on write operations. The benchmark run takes about 26% less time with the optimizations (00:24:56 versus 00:33:33). The improvement is most likely explained by the more than 50% reduction in allocations.
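
The percentages in the summary can be reproduced from the reported scores and total run times. Below is a minimal sketch of that arithmetic, not part of the PR; the class name `ThroughputGain` and the helper `gainPercent` are hypothetical, and the inputs are copied from the benchmark results above.

```java
import java.time.Duration;

public class ThroughputGain {

    /** Relative change from {@code before} to {@code after}, in percent. */
    static double gainPercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {
            "readDataUsingCallAPI", "readDataUsingSpaceAPI",
            "writeDataUsingCallAPI", "writeDataUsingSpaceAPI"
        };
        // Throughput scores in ops/s, before and after the optimizations.
        double[] before = {13761.025, 12033.690, 6175.897, 7123.615};
        double[] after = {19674.739, 17092.066, 8012.627, 9157.278};

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-24s %+.1f%%%n",
                names[i], gainPercent(before[i], after[i]));
        }

        // Total benchmark run times reported by JMH.
        Duration runBefore = Duration.ofMinutes(33).plusSeconds(33);
        Duration runAfter = Duration.ofMinutes(24).plusSeconds(56);
        System.out.printf("%-24s %+.1f%%%n", "total run time",
            gainPercent(runBefore.getSeconds(), runAfter.getSeconds()));
    }
}
```

This gives gains of roughly 43% and 42% for the read benchmarks, roughly 30% and 29% for the write benchmarks, and a total run time that is about 26% shorter.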

Benchmark runs were performed with JDK 17, G1 GC (default settings), using the following command:

mvn exec:exec -Pbenchmark -Dbenchmark="ClusterBenchmarkRunner" -DbenchmarkArgs="-prof=jfr"

@akudiyar akudiyar force-pushed the optimize-netty-handlers branch from 252a38f to 79551cd Compare January 19, 2024 20:53
@akudiyar akudiyar marked this pull request as ready for review January 19, 2024 21:01
@akudiyar akudiyar requested review from ArtDu, nickkkccc and bitgorbovsky and removed request for ArtDu and nickkkccc January 19, 2024 21:05
Contributor

@ArtDu ArtDu left a comment

Looks magnificent. Only nitpick comments.

Comment on lines +57 to +58
.withConnections(2)
.withEventLoopThreadsNumber(2)
Contributor

nit:

I think it would be more interesting to compare with 1 connection = 1 thread. I don't think we need more. Throughput will increase as the number of threads increases, but that depends on how many cores the machine running the benchmark has.

Collaborator Author

Yeah, perhaps, but I also believe that with such a setup the benchmark would mostly reflect the behavior of Netty internals, which is too far from a real-world scenario.

@akudiyar akudiyar added this pull request to the merge queue Jan 24, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jan 24, 2024
Additionally
 - add custom profile for running the benchmark
 - fix read benchmarks being run without data in the cluster
@akudiyar akudiyar force-pushed the optimize-netty-handlers branch from 79551cd to 0248c04 Compare January 24, 2024 01:45
@akudiyar akudiyar added this pull request to the merge queue Jan 24, 2024
Merged via the queue into master with commit 9ca555e Jan 24, 2024
3 checks passed
@akudiyar akudiyar deleted the optimize-netty-handlers branch January 24, 2024 01:50

2 participants