Refactor NatsMessage to use ByteBuffer internally. #349

jameshilliard · 2020-10-16T13:13:47Z

Some groundwork for #347.

jameshilliard · 2020-10-17T16:20:53Z

src/main/java/io/nats/client/impl/NatsMessage.java

        this.data = null; // will set data and size after we read it
    }

+    private static int fastIntLength(int number) {


This implementation while a bit verbose is significantly faster than a naive implementation and allows us to eliminate a separate length buffer which lets us write directly to protocolBytes avoiding some somewhat expensive copy operations.

jameshilliard · 2020-10-17T16:25:21Z

src/main/java/io/nats/client/impl/NatsMessage.java

-
-    NatsMessage next; // for linked list
-
-    static final byte[] digits = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};


This isn't needed, should be faster to just add 0x30 to convert a byte to ascii.

add to the PR, should be ok - although i might - since i am paranoid - convert 0 to a byte to get the value :-)

ASCII "0" to byte gives you 0x30, I'm doing this in integer mode then casting to byte when writing to the buffer, at least I think that's probably the most efficient way to do it.

jameshilliard · 2020-10-17T16:28:16Z

src/main/java/io/nats/client/impl/NatsMessage.java

        } else {
            // Convert the length to bytes
-            byte[] lengthBytes = new byte[12];


Better to write this directly to protocolBytes.

jameshilliard · 2020-10-17T16:29:27Z

src/main/java/io/nats/client/impl/NatsMessage.java

+                int base = protocolBytes.limit();
+                for (int i = size; i > 0; i /= 10) {
+                    base--;
+                    protocolBytes.put(base, (byte)(i % 10 + 0x30));


We can just add 0x30 instead of using the lookup table approach.

jameshilliard · 2020-10-17T16:30:30Z

src/main/java/io/nats/client/impl/NatsMessage.java

    boolean isProtocol() {
        return this.subject == null;
    }

    // Will be null on an incoming message
    byte[] getProtocolBytes() {
-        return this.protocolBytes;
+        if (this.protocolBytes.hasArray() && this.protocolBytes.limit() == this.protocolBytes.capacity()) {
+            return this.protocolBytes.array();


This is the normal fast path that avoids copying.

sasbury · 2020-10-20T03:39:14Z

Nothing bad jumped out - will let Colin or someone from Synadia confirm.

jameshilliard · 2020-10-20T11:57:30Z

src/main/java/io/nats/client/impl/NatsMessage.java

    }

    // Create a protocol only message to publish
    NatsMessage(CharBuffer protocol) {
-        ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(protocol);


This does not allocate a correctly sized ByteBuffer.

good catch!

Yep, this sort of issue is one of the reasons I'm trying to move away from any low level char/string based protocol encoding/decoding to UTF-8/ASCII byte/bytebuffer based encoding/decoding as much as possible.

jameshilliard · 2020-10-20T11:59:13Z

src/main/java/io/nats/client/impl/NatsMessage.java

-        Arrays.fill(byteBuffer.array(), (byte) 0); // clear sensitive data
-        this.sizeInBytes = this.protocolBytes.length + 2;// for \r\n
+        protocol.mark();
+        this.protocolBytes = ByteBuffer.allocate(fastUtf8Length(protocol));


By pre-allocating a correctly sized ByteBuffer we can eliminate the slow path that involves copying bytes in getProtocolBytes().

jameshilliard · 2020-10-20T12:04:00Z

src/main/java/io/nats/client/impl/NatsMessage.java

    boolean isProtocol() {
        return this.subject == null;
    }

    // Will be null on an incoming message
    byte[] getProtocolBytes() {
-        return this.protocolBytes;
+        return this.protocolBytes.array();


Since all ByteBuffer's now have correctly allocated sizes we can unconditionally use the fast path here.

jameshilliard · 2020-10-20T17:34:36Z

Here's what I'm seeing performance wise:
Before:

11:25:10 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Running warmup
Current memory usage is 1019.25 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          5,138,146 msg/s       0.00 b/s
PubOnly 8            10,000,000          4,193,068 msg/s     31.99 mb/s
PubOnly 32           10,000,000          4,184,061 msg/s    127.69 mb/s
PubOnly 256          10,000,000          4,602,890 msg/s      1.10 gb/s
PubOnly 512          10,000,000          4,071,716 msg/s      1.94 gb/s
PubOnly 1024          1,000,000          3,113,506 msg/s      2.97 gb/s
PubOnly 4096            500,000            936,171 msg/s      3.57 gb/s
PubOnly 8192            100,000            461,962 msg/s      3.52 gb/s

PubSub 0             10,000,000          1,499,659 msg/s       0.00 b/s
PubSub 8             10,000,000          1,446,349 msg/s     11.03 mb/s
PubSub 32            10,000,000          1,330,344 msg/s     40.60 mb/s
PubSub 256           10,000,000          1,115,961 msg/s    272.45 mb/s
PubSub 512           10,000,000            981,733 msg/s    479.36 mb/s
PubSub 1024           1,000,000            839,707 msg/s    820.03 mb/s
PubSub 4096             500,000            347,289 msg/s      1.32 gb/s
PubSub 8192             100,000            144,818 msg/s      1.10 gb/s

PubDispatch 0        10,000,000          1,561,210 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,527,177 msg/s     11.65 mb/s
PubDispatch 32       10,000,000          1,414,433 msg/s     43.17 mb/s
PubDispatch 256      10,000,000          1,269,441 msg/s    309.92 mb/s
PubDispatch 512      10,000,000          1,105,003 msg/s    539.55 mb/s
PubDispatch 1024      1,000,000          1,002,108 msg/s    978.62 mb/s
PubDispatch 4096        500,000            347,199 msg/s      1.32 gb/s
PubDispatch 8192        100,000            195,271 msg/s      1.49 gb/s

ReqReply 0               20,000              9,905 msg/s       0.00 b/s
ReqReply 8               20,000              9,844 msg/s     76.91 kb/s
ReqReply 32              10,000              9,873 msg/s    308.54 kb/s
ReqReply 256             10,000              9,835 msg/s      2.40 mb/s
ReqReply 512             10,000              9,883 msg/s      4.83 mb/s
ReqReply 1024            10,000             10,172 msg/s      9.93 mb/s
ReqReply 4096            10,000              9,430 msg/s     36.84 mb/s
ReqReply 8192            10,000              9,154 msg/s     71.52 mb/s

Latency 0     5,000     39 /  52.55 / 287    +/- 0.78  (microseconds)
Latency 8     5,000     42 /  51.54 / 260    +/- 0.78  (microseconds)
Latency 32    5,000     41 /  50.87 / 167    +/- 0.76  (microseconds)
Latency 256   5,000     42 /  55.18 / 253    +/- 0.89  (microseconds)
Latency 512   5,000     43 /  59.28 / 260    +/- 0.90  (microseconds)
Latency 1024  5,000     42 /  51.17 / 187    +/- 0.75  (microseconds)
Latency 4096  5,000     42 /  52.90 / 267    +/- 0.79  (microseconds)
Latency 8192  5,000     45 /  55.60 / 859    +/- 0.85  (microseconds)

Final memory usage is 67.88 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 25s
2 actionable tasks: 1 executed, 1 up-to-date
11:27:36 AM: Task execution finished 'NatsAutoBench.main()'.

After:

11:30:12 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Running warmup
Current memory usage is 999.58 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          4,826,092 msg/s       0.00 b/s
PubOnly 8            10,000,000          5,019,848 msg/s     38.30 mb/s
PubOnly 32           10,000,000          4,330,835 msg/s    132.17 mb/s
PubOnly 256          10,000,000          4,942,284 msg/s      1.18 gb/s
PubOnly 512          10,000,000          4,086,727 msg/s      1.95 gb/s
PubOnly 1024          1,000,000          3,057,141 msg/s      2.92 gb/s
PubOnly 4096            500,000            916,903 msg/s      3.50 gb/s
PubOnly 8192            100,000            467,968 msg/s      3.57 gb/s

PubSub 0             10,000,000          1,533,636 msg/s       0.00 b/s
PubSub 8             10,000,000          1,433,106 msg/s     10.93 mb/s
PubSub 32            10,000,000          1,285,728 msg/s     39.24 mb/s
PubSub 256           10,000,000          1,096,267 msg/s    267.64 mb/s
PubSub 512           10,000,000            927,871 msg/s    453.06 mb/s
PubSub 1024           1,000,000            843,623 msg/s    823.85 mb/s
PubSub 4096             500,000            300,964 msg/s      1.15 gb/s
PubSub 8192             100,000            149,556 msg/s      1.14 gb/s

PubDispatch 0        10,000,000          1,303,114 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,347,450 msg/s     10.28 mb/s
PubDispatch 32       10,000,000          1,323,138 msg/s     40.38 mb/s
PubDispatch 256      10,000,000          1,157,187 msg/s    282.52 mb/s
PubDispatch 512      10,000,000          1,053,176 msg/s    514.25 mb/s
PubDispatch 1024      1,000,000            916,186 msg/s    894.71 mb/s
PubDispatch 4096        500,000            339,930 msg/s      1.30 gb/s
PubDispatch 8192        100,000            195,462 msg/s      1.49 gb/s

ReqReply 0               20,000              8,654 msg/s       0.00 b/s
ReqReply 8               20,000              8,855 msg/s     69.18 kb/s
ReqReply 32              10,000              8,619 msg/s    269.35 kb/s
ReqReply 256             10,000              8,262 msg/s      2.02 mb/s
ReqReply 512             10,000              8,927 msg/s      4.36 mb/s
ReqReply 1024            10,000              9,861 msg/s      9.63 mb/s
ReqReply 4096            10,000              9,109 msg/s     35.58 mb/s
ReqReply 8192            10,000              9,483 msg/s     74.09 mb/s

Latency 0     5,000     39 /  51.76 / 144    +/- 0.76  (microseconds)
Latency 8     5,000     39 /  50.99 / 301    +/- 0.76  (microseconds)
Latency 32    5,000     40 /  49.90 / 237    +/- 0.75  (microseconds)
Latency 256   5,000     40 /  50.53 / 255    +/- 0.78  (microseconds)
Latency 512   5,000     41 /  51.41 / 256    +/- 0.78  (microseconds)
Latency 1024  5,000     41 /  51.35 / 202    +/- 0.77  (microseconds)
Latency 4096  5,000     43 /  54.97 / 272    +/- 0.82  (microseconds)
Latency 8192  5,000     45 /  55.23 / 807    +/- 0.85  (microseconds)

Final memory usage is 67.84 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 29s
2 actionable tasks: 1 executed, 1 up-to-date
11:32:42 AM: Task execution finished 'NatsAutoBench.main()'.

jameshilliard · 2020-10-20T19:49:14Z

Before:

1:46:59 PM: Executing task 'MessageProtocolCreationBenchmark.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :compileTestJava UP-TO-DATE
> Task :processTestResources UP-TO-DATE
> Task :testClasses UP-TO-DATE

> Task :MessageProtocolCreationBenchmark.main()
### Running benchmarks with 50,000,000 messages.

### Total time to create 50,000,000 non-utf8 messages for sending was 1,748 ms
	34.977587 ns/op
	28,589,736.279 op/sec

### Total time to create 50,000,000 utf8 messages for sending was 3,065 ms
	61.308911 ns/op
	16,310,842.564 op/sec

### Total time to create 50,000,000 a protocol message was 913 ms
	18.262107 ns/op
	54,758,195.611 op/sec

BUILD SUCCESSFUL in 6s
4 actionable tasks: 1 executed, 3 up-to-date
1:47:05 PM: Task execution finished 'MessageProtocolCreationBenchmark.main()'.

Benchmarks after further optimization of fastUtf8Length:

1:47:11 PM: Executing task 'MessageProtocolCreationBenchmark.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :compileTestJava UP-TO-DATE
> Task :processTestResources UP-TO-DATE
> Task :testClasses UP-TO-DATE

> Task :MessageProtocolCreationBenchmark.main()
### Running benchmarks with 50,000,000 messages.

### Total time to create 50,000,000 non-utf8 messages for sending was 1,870 ms
	37.406164 ns/op
	26,733,561.637 op/sec

### Total time to create 50,000,000 utf8 messages for sending was 3,242 ms
	64.849350 ns/op
	15,420,354.966 op/sec

### Total time to create 50,000,000 a protocol message was 393 ms
	7.870244 ns/op
	127,060,856.575 op/sec

BUILD SUCCESSFUL in 5s
4 actionable tasks: 1 executed, 3 up-to-date
1:47:17 PM: Task execution finished 'MessageProtocolCreationBenchmark.main()'.

jameshilliard · 2020-10-21T17:37:36Z

Latest round of benchmarks:
New UTF-8:

11:14:17 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Enabling UTF-8 subjects
Running warmup
Current memory usage is 993.80 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          3,915,004 msg/s       0.00 b/s
PubOnly 8            10,000,000          4,202,807 msg/s     32.06 mb/s
PubOnly 32           10,000,000          5,110,407 msg/s    155.96 mb/s
PubOnly 256          10,000,000          4,335,994 msg/s      1.03 gb/s
PubOnly 512          10,000,000          3,805,189 msg/s      1.81 gb/s
PubOnly 1024          1,000,000          2,985,986 msg/s      2.85 gb/s
PubOnly 4096            500,000            759,566 msg/s      2.90 gb/s
PubOnly 8192            100,000            447,271 msg/s      3.41 gb/s

PubSub 0             10,000,000          1,164,335 msg/s       0.00 b/s
PubSub 8             10,000,000          1,048,153 msg/s      8.00 mb/s
PubSub 32            10,000,000          1,022,247 msg/s     31.20 mb/s
PubSub 256           10,000,000            890,375 msg/s    217.38 mb/s
PubSub 512           10,000,000            845,849 msg/s    413.01 mb/s
PubSub 1024           1,000,000            692,647 msg/s    676.41 mb/s
PubSub 4096             500,000            319,870 msg/s      1.22 gb/s
PubSub 8192             100,000            148,555 msg/s      1.13 gb/s

PubDispatch 0        10,000,000          1,234,253 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,211,989 msg/s      9.25 mb/s
PubDispatch 32       10,000,000          1,088,454 msg/s     33.22 mb/s
PubDispatch 256      10,000,000            998,732 msg/s    243.83 mb/s
PubDispatch 512      10,000,000            883,689 msg/s    431.49 mb/s
PubDispatch 1024      1,000,000            798,693 msg/s    779.97 mb/s
PubDispatch 4096        500,000            297,048 msg/s      1.13 gb/s
PubDispatch 8192        100,000            156,335 msg/s      1.19 gb/s

ReqReply 0               20,000              8,962 msg/s       0.00 b/s
ReqReply 8               20,000              9,406 msg/s     73.49 kb/s
ReqReply 32              10,000              8,426 msg/s    263.32 kb/s
ReqReply 256             10,000              8,458 msg/s      2.07 mb/s
ReqReply 512             10,000              8,271 msg/s      4.04 mb/s
ReqReply 1024            10,000              8,175 msg/s      7.98 mb/s
ReqReply 4096            10,000              8,183 msg/s     31.97 mb/s
ReqReply 8192            10,000              7,761 msg/s     60.63 mb/s

Latency 0     5,000     41 /  57.16 / 171    +/- 0.88  (microseconds)
Latency 8     5,000     41 /  55.82 / 819    +/- 0.87  (microseconds)
Latency 32    5,000     41 /  53.26 / 234    +/- 0.79  (microseconds)
Latency 256   5,000     43 /  54.25 / 360    +/- 0.81  (microseconds)
Latency 512   5,000     41 /  53.72 / 607    +/- 0.80  (microseconds)
Latency 1024  5,000     43 /  58.88 / 172    +/- 0.88  (microseconds)
Latency 4096  5,000     44 /  58.03 / 199    +/- 0.87  (microseconds)
Latency 8192  5,000     44 /  58.94 / 905    +/- 0.89  (microseconds)

Final memory usage is 67.86 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 51s
2 actionable tasks: 1 executed, 1 up-to-date
11:17:08 AM: Task execution finished 'NatsAutoBench.main()'.

Old UTF-8:

11:18:38 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Enabling UTF-8 subjects
Running warmup
Current memory usage is 997.58 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          4,400,967 msg/s       0.00 b/s
PubOnly 8            10,000,000          4,737,855 msg/s     36.15 mb/s
PubOnly 32           10,000,000          5,010,269 msg/s    152.90 mb/s
PubOnly 256          10,000,000          4,553,563 msg/s      1.09 gb/s
PubOnly 512          10,000,000          3,556,745 msg/s      1.70 gb/s
PubOnly 1024          1,000,000          2,858,358 msg/s      2.73 gb/s
PubOnly 4096            500,000            791,843 msg/s      3.02 gb/s
PubOnly 8192            100,000            445,880 msg/s      3.40 gb/s

PubSub 0             10,000,000          1,085,614 msg/s       0.00 b/s
PubSub 8             10,000,000          1,084,548 msg/s      8.27 mb/s
PubSub 32            10,000,000          1,005,820 msg/s     30.70 mb/s
PubSub 256           10,000,000            926,938 msg/s    226.30 mb/s
PubSub 512           10,000,000            831,612 msg/s    406.06 mb/s
PubSub 1024           1,000,000            679,077 msg/s    663.16 mb/s
PubSub 4096             500,000            336,255 msg/s      1.28 gb/s
PubSub 8192             100,000            191,322 msg/s      1.46 gb/s

PubDispatch 0        10,000,000          1,384,756 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,217,177 msg/s      9.29 mb/s
PubDispatch 32       10,000,000          1,153,245 msg/s     35.19 mb/s
PubDispatch 256      10,000,000          1,041,144 msg/s    254.19 mb/s
PubDispatch 512      10,000,000            964,513 msg/s    470.95 mb/s
PubDispatch 1024      1,000,000            833,387 msg/s    813.85 mb/s
PubDispatch 4096        500,000            312,898 msg/s      1.19 gb/s
PubDispatch 8192        100,000            157,818 msg/s      1.20 gb/s

ReqReply 0               20,000              7,519 msg/s       0.00 b/s
ReqReply 8               20,000              8,873 msg/s     69.32 kb/s
ReqReply 32              10,000              8,622 msg/s    269.44 kb/s
ReqReply 256             10,000              8,639 msg/s      2.11 mb/s
ReqReply 512             10,000              8,608 msg/s      4.20 mb/s
ReqReply 1024            10,000              8,825 msg/s      8.62 mb/s
ReqReply 4096            10,000              8,038 msg/s     31.40 mb/s
ReqReply 8192            10,000              8,058 msg/s     62.96 mb/s

Latency 0     5,000     42 /  55.63 / 227    +/- 0.84  (microseconds)
Latency 8     5,000     41 /  51.51 / 226    +/- 0.77  (microseconds)
Latency 32    5,000     41 /  51.05 / 197    +/- 0.76  (microseconds)
Latency 256   5,000     41 /  52.33 / 185    +/- 0.77  (microseconds)
Latency 512   5,000     42 /  53.45 / 618    +/- 0.82  (microseconds)
Latency 1024  5,000     45 /  67.08 / 209    +/- 1.02  (microseconds)
Latency 4096  5,000     44 /  57.10 / 187    +/- 0.84  (microseconds)
Latency 8192  5,000     44 /  60.40 / 853    +/- 0.91  (microseconds)

Final memory usage is 67.70 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 47s
2 actionable tasks: 1 executed, 1 up-to-date
11:21:26 AM: Task execution finished 'NatsAutoBench.main()'.

New ASCII:

11:24:59 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Running warmup
Current memory usage is 998.83 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          4,268,314 msg/s       0.00 b/s
PubOnly 8            10,000,000          4,572,550 msg/s     34.89 mb/s
PubOnly 32           10,000,000          4,891,019 msg/s    149.26 mb/s
PubOnly 256          10,000,000          4,234,485 msg/s      1.01 gb/s
PubOnly 512          10,000,000          3,788,739 msg/s      1.81 gb/s
PubOnly 1024          1,000,000          3,028,853 msg/s      2.89 gb/s
PubOnly 4096            500,000            938,663 msg/s      3.58 gb/s
PubOnly 8192            100,000            457,150 msg/s      3.49 gb/s

PubSub 0             10,000,000          1,317,297 msg/s       0.00 b/s
PubSub 8             10,000,000          1,290,515 msg/s      9.85 mb/s
PubSub 32            10,000,000          1,242,484 msg/s     37.92 mb/s
PubSub 256           10,000,000          1,113,833 msg/s    271.93 mb/s
PubSub 512           10,000,000            966,016 msg/s    471.69 mb/s
PubSub 1024           1,000,000            847,912 msg/s    828.04 mb/s
PubSub 4096             500,000            321,717 msg/s      1.23 gb/s
PubSub 8192             100,000            155,384 msg/s      1.19 gb/s

PubDispatch 0        10,000,000          1,520,825 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,300,240 msg/s      9.92 mb/s
PubDispatch 32       10,000,000          1,363,389 msg/s     41.61 mb/s
PubDispatch 256      10,000,000          1,143,525 msg/s    279.18 mb/s
PubDispatch 512      10,000,000            964,844 msg/s    471.12 mb/s
PubDispatch 1024      1,000,000            731,858 msg/s    714.71 mb/s
PubDispatch 4096        500,000            343,613 msg/s      1.31 gb/s
PubDispatch 8192        100,000            155,827 msg/s      1.19 gb/s

ReqReply 0               20,000              9,339 msg/s       0.00 b/s
ReqReply 8               20,000              8,703 msg/s     68.00 kb/s
ReqReply 32              10,000              8,757 msg/s    273.66 kb/s
ReqReply 256             10,000              8,578 msg/s      2.09 mb/s
ReqReply 512             10,000              8,483 msg/s      4.14 mb/s
ReqReply 1024            10,000              9,412 msg/s      9.19 mb/s
ReqReply 4096            10,000              8,774 msg/s     34.27 mb/s
ReqReply 8192            10,000              8,489 msg/s     66.32 mb/s

Latency 0     5,000     42 /  52.85 / 247    +/- 0.78  (microseconds)
Latency 8     5,000     40 /  51.37 / 242    +/- 0.77  (microseconds)
Latency 32    5,000     41 /  50.43 / 191    +/- 0.74  (microseconds)
Latency 256   5,000     41 /  51.07 / 198    +/- 0.75  (microseconds)
Latency 512   5,000     41 /  52.01 / 175    +/- 0.78  (microseconds)
Latency 1024  5,000     41 /  52.89 / 201    +/- 0.81  (microseconds)
Latency 4096  5,000     43 /  53.01 / 168    +/- 0.80  (microseconds)
Latency 8192  5,000     45 /  56.32 / 863    +/- 0.85  (microseconds)

Final memory usage is 67.87 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 34s
2 actionable tasks: 1 executed, 1 up-to-date
11:27:34 AM: Task execution finished 'NatsAutoBench.main()'.

Old ASCII:

11:30:55 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Running warmup
Current memory usage is 1016.10 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          4,674,445 msg/s       0.00 b/s
PubOnly 8            10,000,000          4,913,360 msg/s     37.49 mb/s
PubOnly 32           10,000,000          4,546,422 msg/s    138.75 mb/s
PubOnly 256          10,000,000          4,074,866 msg/s    994.84 mb/s
PubOnly 512          10,000,000          3,704,323 msg/s      1.77 gb/s
PubOnly 1024          1,000,000          2,898,953 msg/s      2.76 gb/s
PubOnly 4096            500,000            835,169 msg/s      3.19 gb/s
PubOnly 8192            100,000            452,756 msg/s      3.45 gb/s

PubSub 0             10,000,000          1,302,092 msg/s       0.00 b/s
PubSub 8             10,000,000          1,330,735 msg/s     10.15 mb/s
PubSub 32            10,000,000          1,196,178 msg/s     36.50 mb/s
PubSub 256           10,000,000          1,070,706 msg/s    261.40 mb/s
PubSub 512           10,000,000          1,026,833 msg/s    501.38 mb/s
PubSub 1024           1,000,000            871,019 msg/s    850.61 mb/s
PubSub 4096             500,000            312,244 msg/s      1.19 gb/s
PubSub 8192             100,000            145,726 msg/s      1.11 gb/s

PubDispatch 0        10,000,000          1,362,437 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,536,844 msg/s     11.73 mb/s
PubDispatch 32       10,000,000          1,373,791 msg/s     41.92 mb/s
PubDispatch 256      10,000,000          1,244,384 msg/s    303.80 mb/s
PubDispatch 512      10,000,000          1,013,914 msg/s    495.08 mb/s
PubDispatch 1024      1,000,000            907,392 msg/s    886.13 mb/s
PubDispatch 4096        500,000            333,031 msg/s      1.27 gb/s
PubDispatch 8192        100,000            184,617 msg/s      1.41 gb/s

ReqReply 0               20,000              8,461 msg/s       0.00 b/s
ReqReply 8               20,000              9,213 msg/s     71.98 kb/s
ReqReply 32              10,000              8,684 msg/s    271.38 kb/s
ReqReply 256             10,000              8,367 msg/s      2.04 mb/s
ReqReply 512             10,000              9,099 msg/s      4.44 mb/s
ReqReply 1024            10,000              8,838 msg/s      8.63 mb/s
ReqReply 4096            10,000              9,045 msg/s     35.33 mb/s
ReqReply 8192            10,000              7,772 msg/s     60.72 mb/s

Latency 0     5,000     38 /  56.89 / 194    +/- 0.88  (microseconds)
Latency 8     5,000     40 /  54.53 / 199    +/- 0.84  (microseconds)
Latency 32    5,000     42 /  53.64 / 150    +/- 0.80  (microseconds)
Latency 256   5,000     43 /  54.43 / 157    +/- 0.81  (microseconds)
Latency 512   5,000     42 /  57.23 / 166    +/- 0.86  (microseconds)
Latency 1024  5,000     43 /  56.61 / 252    +/- 0.86  (microseconds)
Latency 4096  5,000     43 /  58.39 / 264    +/- 0.89  (microseconds)
Latency 8192  5,000     44 /  56.61 / 831    +/- 0.84  (microseconds)

Final memory usage is 67.89 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 32s
2 actionable tasks: 1 executed, 1 up-to-date
11:33:28 AM: Task execution finished 'NatsAutoBench.main()'.

ColinSullivan1 · 2020-10-21T21:56:02Z

src/main/java/io/nats/client/impl/NatsMessage.java

+            charset = StandardCharsets.US_ASCII;
+        }
+        this.protocolBytes = ByteBuffer.allocate(len);
+        protocolBytes.put((byte)'P').put((byte)'U').put((byte)'B').put((byte)' ');


Would it be faster to use a backing array, initialized with PUB vs put APIs? Not sure which is faster in java...

I recall this was faster, and it will have better interoperability with the next round of changes which use ByteBuffer's in more places, ByteBuffer should be optimized for these types of operations.

src/main/java/io/nats/client/impl/NatsMessage.java

ColinSullivan1 · 2020-10-21T23:41:03Z

src/main/java/io/nats/client/impl/NatsMessage.java

+            protocol.mark();
+            this.protocolBytes = ByteBuffer.allocate(fastUtf8Length(protocol));
+            protocol.reset();
+            StandardCharsets.UTF_8.newEncoder().encode(protocol, this.protocolBytes, true);


Could you use a static instance rather than allocating every new msg?

I recall I tested that and this was a good bit faster in benchmarks, not 100% sure why, best guess is having multiple threads accessing the same static encoder instance causes cache flushing or something. I plan to refactor this anyways down the line once I refactor some prerequisite code.

Yep, verified this has a major performance impact.
Static instance:

5:59:11 PM: Executing task 'MessageProtocolCreationBenchmark.main()'... > Task :compileJava UP-TO-DATE > Task :processResources NO-SOURCE > Task :classes UP-TO-DATE > Task :compileTestJava UP-TO-DATE > Task :processTestResources UP-TO-DATE > Task :testClasses UP-TO-DATE > Task :MessageProtocolCreationBenchmark.main() ### Running benchmarks with 50,000,000 messages. ### Total time to create 50,000,000 non-utf8 messages for sending was 1,835 ms 36.718829 ns/op 27,233,984.034 op/sec ### Total time to create 50,000,000 utf8 messages for sending was 3,552 ms 71.053063 ns/op 14,073,988.718 op/sec ### Total time to create 50,000,000 a protocol message was 341 ms 6.833834 ns/op 146,330,744.581 op/sec BUILD SUCCESSFUL in 6s 4 actionable tasks: 1 executed, 3 up-to-date 5:59:17 PM: Task execution finished 'MessageProtocolCreationBenchmark.main()'.

non-static(current) implementation:

5:58:33 PM: Executing task 'MessageProtocolCreationBenchmark.main()'... > Task :compileJava > Task :processResources NO-SOURCE > Task :classes > Task :compileTestJava UP-TO-DATE > Task :processTestResources UP-TO-DATE > Task :testClasses UP-TO-DATE Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. > Task :MessageProtocolCreationBenchmark.main() ### Running benchmarks with 50,000,000 messages. ### Total time to create 50,000,000 non-utf8 messages for sending was 1,802 ms 36.054858 ns/op 27,735,513.355 op/sec ### Total time to create 50,000,000 utf8 messages for sending was 2,841 ms 56.833210 ns/op 17,595,346.163 op/sec ### Total time to create 50,000,000 a protocol message was 334 ms 6.693376 ns/op 149,401,436.293 op/sec BUILD SUCCESSFUL in 5s 4 actionable tasks: 2 executed, 2 up-to-date 5:58:39 PM: Task execution finished 'MessageProtocolCreationBenchmark.main()'.

Wow, would have lost a bet on that one. Thanks!

Also noticed this here:

Instances of this class are not safe for use by multiple concurrent threads.

ColinSullivan1

LGTM other than some minor suggestions and nitpicks.

ColinSullivan1 · 2020-10-22T00:38:56Z

Thanks for the contribution!

jameshilliard force-pushed the bytebuffer branch 2 times, most recently from b0e4a71 to 3c0ad72 Compare October 17, 2020 16:13

jameshilliard commented Oct 17, 2020

View reviewed changes

jameshilliard force-pushed the bytebuffer branch from 3c0ad72 to 7c3e96e Compare October 20, 2020 11:56

jameshilliard commented Oct 20, 2020

View reviewed changes

jameshilliard force-pushed the bytebuffer branch from 7c3e96e to 7d6f316 Compare October 20, 2020 18:58

ColinSullivan1 self-requested a review October 21, 2020 21:46

ColinSullivan1 reviewed Oct 21, 2020

View reviewed changes

src/main/java/io/nats/client/impl/NatsMessage.java Outdated Show resolved Hide resolved

ColinSullivan1 reviewed Oct 21, 2020

View reviewed changes

src/main/java/io/nats/client/impl/NatsMessage.java Outdated Show resolved Hide resolved

ColinSullivan1 reviewed Oct 21, 2020

View reviewed changes

Refactor NatsMessage to use ByteBuffer internally.

ca458e0

jameshilliard force-pushed the bytebuffer branch from 7d6f316 to ca458e0 Compare October 21, 2020 23:52

ColinSullivan1 merged commit f154624 into nats-io:master Oct 22, 2020

jameshilliard deleted the bytebuffer branch October 22, 2020 00:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor NatsMessage to use ByteBuffer internally. #349

Refactor NatsMessage to use ByteBuffer internally. #349

jameshilliard commented Oct 16, 2020 •

edited

Loading

jameshilliard Oct 17, 2020

jameshilliard Oct 17, 2020 •

edited

Loading

sasbury Oct 20, 2020

jameshilliard Oct 20, 2020

jameshilliard Oct 17, 2020

jameshilliard Oct 17, 2020

jameshilliard Oct 17, 2020

sasbury commented Oct 20, 2020

jameshilliard Oct 20, 2020

ColinSullivan1 Oct 21, 2020

jameshilliard Oct 21, 2020

jameshilliard Oct 20, 2020

jameshilliard Oct 20, 2020

jameshilliard commented Oct 20, 2020

jameshilliard commented Oct 20, 2020

jameshilliard commented Oct 21, 2020

ColinSullivan1 Oct 21, 2020

jameshilliard Oct 21, 2020

ColinSullivan1 Oct 21, 2020

jameshilliard Oct 21, 2020

jameshilliard Oct 22, 2020

ColinSullivan1 Oct 22, 2020

jameshilliard Oct 22, 2020

ColinSullivan1 left a comment

ColinSullivan1 commented Oct 22, 2020


		NatsMessage next; // for linked list

		static final byte[] digits = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};

Refactor NatsMessage to use ByteBuffer internally. #349

Refactor NatsMessage to use ByteBuffer internally. #349

Conversation

jameshilliard commented Oct 16, 2020 • edited Loading

Choose a reason for hiding this comment

jameshilliard Oct 17, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sasbury commented Oct 20, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jameshilliard commented Oct 20, 2020

jameshilliard commented Oct 20, 2020

jameshilliard commented Oct 21, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ColinSullivan1 left a comment

Choose a reason for hiding this comment

ColinSullivan1 commented Oct 22, 2020

jameshilliard commented Oct 16, 2020 •

edited

Loading

jameshilliard Oct 17, 2020 •

edited

Loading