Add benchmark for cluster API #453

akudiyar · 2024-01-14T19:25:51Z

Before memory optimizations

Benchmark results on commit a6199e8 (before optimizations):

Total time: 00:33:33

Benchmark                                            Mode  Cnt      Score      Error  Units
ClusterBenchmarkRunner.readDataUsingCallAPI         thrpt   10  13761.025 ± 2657.463  ops/s
ClusterBenchmarkRunner.readDataUsingCallAPI:·jfr    thrpt             NaN               ---
ClusterBenchmarkRunner.readDataUsingSpaceAPI        thrpt   10  12033.690 ± 2032.909  ops/s
ClusterBenchmarkRunner.readDataUsingSpaceAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingCallAPI        thrpt   10   6175.897 ±  630.966  ops/s
ClusterBenchmarkRunner.writeDataUsingCallAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingSpaceAPI       thrpt   10   7123.615 ±  894.854  ops/s
ClusterBenchmarkRunner.writeDataUsingSpaceAPI:·jfr  thrpt             NaN               ---

Allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (before optimizations):

TLAB allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (before optimizations):

After memory optimizations

Benchmark results on commit 5f9e69c (after optimizations):

Total time: 00:24:56

Benchmark                                            Mode  Cnt      Score      Error  Units
ClusterBenchmarkRunner.readDataUsingCallAPI         thrpt   10  19674.739 ± 2977.708  ops/s
ClusterBenchmarkRunner.readDataUsingCallAPI:·jfr    thrpt             NaN               ---
ClusterBenchmarkRunner.readDataUsingSpaceAPI        thrpt   10  17092.066 ± 5721.286  ops/s
ClusterBenchmarkRunner.readDataUsingSpaceAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingCallAPI        thrpt   10   8012.627 ± 1120.023  ops/s
ClusterBenchmarkRunner.writeDataUsingCallAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingSpaceAPI       thrpt   10   9157.278 ± 3705.178  ops/s
ClusterBenchmarkRunner.writeDataUsingSpaceAPI:·jfr  thrpt             NaN               ---

Allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (after optimizations):

TLAB allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (after optimizations):

Summary

The average throughput gain is about 40% on read operations and about 30% on write operations. The benchmark run takes about 26% less time with the optimizations (00:24:56 vs. 00:33:33). The most likely reason for the improvement is a >50% reduction in allocations.
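For reference, the percentages can be recomputed from the two result tables. The following is a minimal sketch, not part of the PR: the class and method names are made up for illustration, and only the scores and total times are taken from the output above.

```java
import java.time.Duration;

public class BenchmarkGain {

    /** Relative throughput gain in percent: (after / before - 1) * 100. */
    public static double gainPercent(double before, double after) {
        return (after / before - 1.0) * 100.0;
    }

    /** Relative reduction of the total run time in percent. */
    public static double timeReductionPercent(Duration before, Duration after) {
        return (1.0 - (double) after.getSeconds() / before.getSeconds()) * 100.0;
    }

    public static void main(String[] args) {
        // Benchmark names and scores (ops/s) copied from the tables: {before, after}
        String[] names = {
            "readDataUsingCallAPI", "readDataUsingSpaceAPI",
            "writeDataUsingCallAPI", "writeDataUsingSpaceAPI"
        };
        double[][] scores = {
            {13761.025, 19674.739},
            {12033.690, 17092.066},
            {6175.897, 8012.627},
            {7123.615, 9157.278},
        };
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-24s %+.1f%%%n", names[i], gainPercent(scores[i][0], scores[i][1]));
        }

        // Total times copied from the two runs: 00:33:33 and 00:24:56
        Duration before = Duration.ofMinutes(33).plusSeconds(33);
        Duration after = Duration.ofMinutes(24).plusSeconds(56);
        System.out.printf("total time reduction     %.1f%%%n", timeReductionPercent(before, after));
    }
}
```

The reported error intervals are wide (for example, ± 5721.286 on a score of 17092.066), so these per-benchmark percentages should be read as indicative rather than exact.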

Benchmark runs were performed with JDK 17, G1 GC (default settings), using the following command:

mvn exec:exec -Pbenchmark -Dbenchmark="ClusterBenchmarkRunner" -DbenchmarkArgs="-prof=jfr"

ArtDu

Looks magnificent. Only nitpick comments.

src/main/java/io/tarantool/driver/api/MessagePackMapperBuilder.java

ArtDu · 2024-01-23T20:09:38Z

src/test/java/io/tarantool/driver/benchmark/ClusterTarantoolSetup.java

+            .withConnections(2)
+            .withEventLoopThreadsNumber(2)


nit:

I think it would be more interesting to compare with 1 connection = 1 thread; I don't think we need more. Throughput will increase with the number of threads, but that depends on how many cores the machine running the benchmark has.

akudiyar · 2024-01-23

Yeah, perhaps, but I also believe that with such a setup we would mostly be measuring the behavior of Netty internals, which is too far from a real-world scenario.
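One way to make the dependency on the core count explicit is to derive the thread count from the machine instead of hard-coding it. This is only a sketch of that idea and is not part of this PR: `EventLoopSizing` and `eventLoopThreads` are hypothetical names, and the result would be passed to `withEventLoopThreadsNumber(...)` from the diff above.

```java
public class EventLoopSizing {

    /**
     * Hypothetical helper: use one event loop thread per connection,
     * but never more threads than there are CPU cores, and at least one.
     */
    public static int eventLoopThreads(int connections, int availableCores) {
        if (connections < 1) {
            throw new IllegalArgumentException("connections must be >= 1");
        }
        return Math.max(1, Math.min(connections, availableCores));
    }

    public static void main(String[] args) {
        // Number of cores visible to the JVM on the machine running the benchmark
        int cores = Runtime.getRuntime().availableProcessors();
        int connections = 2; // the value used in ClusterTarantoolSetup in this PR
        int threads = eventLoopThreads(connections, cores);
        System.out.printf("cores=%d connections=%d eventLoopThreads=%d%n", cores, connections, threads);
    }
}
```

With this rule the 2 connections / 2 threads setup is kept on any machine with at least two cores and falls back to a single thread otherwise, which keeps results from different machines easier to compare.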

Additionally:
- add custom profile for running the benchmark
- fix read benchmarks being run without data in the cluster

akudiyar force-pushed the optimize-netty-handlers branch from 252a38f to 79551cd Compare January 19, 2024 20:53

akudiyar mentioned this pull request Jan 19, 2024

Replace streams with loops in DefaultMessagePackMapper #439

Merged

akudiyar marked this pull request as ready for review January 19, 2024 21:01

akudiyar requested review from ArtDu, nickkkccc and bitgorbovsky and removed request for ArtDu and nickkkccc January 19, 2024 21:05

ArtDu approved these changes Jan 23, 2024

akudiyar added this pull request to the merge queue Jan 24, 2024

github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jan 24, 2024

Add benchmark for inserting a tuple using call API

0248c04

Additionally:
- add custom profile for running the benchmark
- fix read benchmarks being run without data in the cluster

akudiyar force-pushed the optimize-netty-handlers branch from 79551cd to 0248c04 Compare January 24, 2024 01:45

akudiyar added this pull request to the merge queue Jan 24, 2024

Merged via the queue into master with commit 9ca555e Jan 24, 2024
3 checks passed

akudiyar deleted the optimize-netty-handlers branch January 24, 2024 01:50
