Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor NatsMessage to use ByteBuffer internally. #349

Merged
merged 1 commit into from
Oct 22, 2020

Conversation

jameshilliard
Copy link
Contributor

@jameshilliard jameshilliard commented Oct 16, 2020

Some groundwork for #347.

@jameshilliard jameshilliard force-pushed the bytebuffer branch 2 times, most recently from b0e4a71 to 3c0ad72 Compare October 17, 2020 16:13
this.data = null; // will set data and size after we read it
}

private static int fastIntLength(int number) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation while a bit verbose is significantly faster than a naive implementation and allows us to eliminate a separate length buffer which lets us write directly to protocolBytes avoiding some somewhat expensive copy operations.


NatsMessage next; // for linked list

static final byte[] digits = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
Copy link
Contributor Author

@jameshilliard jameshilliard Oct 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't needed, should be faster to just add 0x30 to convert a byte to ascii.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add to the PR, should be ok - although i might - since i am paranoid - convert 0 to a byte to get the value :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASCII "0" to byte gives you 0x30, I'm doing this in integer mode then casting to byte when writing to the buffer, at least I think that's probably the most efficient way to do it.

} else {
// Convert the length to bytes
byte[] lengthBytes = new byte[12];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to write this directly to protocolBytes.

int base = protocolBytes.limit();
for (int i = size; i > 0; i /= 10) {
base--;
protocolBytes.put(base, (byte)(i % 10 + 0x30));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just add 0x30 instead of using the lookup table approach.

boolean isProtocol() {
return this.subject == null;
}

// Will be null on an incoming message
byte[] getProtocolBytes() {
return this.protocolBytes;
if (this.protocolBytes.hasArray() && this.protocolBytes.limit() == this.protocolBytes.capacity()) {
return this.protocolBytes.array();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the normal fast path that avoids copying.

@sasbury
Copy link
Contributor

sasbury commented Oct 20, 2020

Nothing bad jumped out - will let Colin or someone from Synadia confirm.

}

// Create a protocol only message to publish
NatsMessage(CharBuffer protocol) {
ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(protocol);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not allocate a correctly sized ByteBuffer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this sort of issue is one of the reasons I'm trying to move away from any low level char/string based protocol encoding/decoding to UTF-8/ASCII byte/bytebuffer based encoding/decoding as much as possible.

Arrays.fill(byteBuffer.array(), (byte) 0); // clear sensitive data
this.sizeInBytes = this.protocolBytes.length + 2;// for \r\n
protocol.mark();
this.protocolBytes = ByteBuffer.allocate(fastUtf8Length(protocol));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By pre-allocating a correctly sized ByteBuffer we can eliminate the slow path that involves copying bytes in getProtocolBytes().

boolean isProtocol() {
return this.subject == null;
}

// Will be null on an incoming message
byte[] getProtocolBytes() {
return this.protocolBytes;
return this.protocolBytes.array();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all ByteBuffer's now have correctly allocated sizes we can unconditionally use the fast path here.

@jameshilliard
Copy link
Contributor Author

Here's what I'm seeing performance wise:
Before:

11:25:10 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Running warmup
Current memory usage is 1019.25 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          5,138,146 msg/s       0.00 b/s
PubOnly 8            10,000,000          4,193,068 msg/s     31.99 mb/s
PubOnly 32           10,000,000          4,184,061 msg/s    127.69 mb/s
PubOnly 256          10,000,000          4,602,890 msg/s      1.10 gb/s
PubOnly 512          10,000,000          4,071,716 msg/s      1.94 gb/s
PubOnly 1024          1,000,000          3,113,506 msg/s      2.97 gb/s
PubOnly 4096            500,000            936,171 msg/s      3.57 gb/s
PubOnly 8192            100,000            461,962 msg/s      3.52 gb/s

PubSub 0             10,000,000          1,499,659 msg/s       0.00 b/s
PubSub 8             10,000,000          1,446,349 msg/s     11.03 mb/s
PubSub 32            10,000,000          1,330,344 msg/s     40.60 mb/s
PubSub 256           10,000,000          1,115,961 msg/s    272.45 mb/s
PubSub 512           10,000,000            981,733 msg/s    479.36 mb/s
PubSub 1024           1,000,000            839,707 msg/s    820.03 mb/s
PubSub 4096             500,000            347,289 msg/s      1.32 gb/s
PubSub 8192             100,000            144,818 msg/s      1.10 gb/s

PubDispatch 0        10,000,000          1,561,210 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,527,177 msg/s     11.65 mb/s
PubDispatch 32       10,000,000          1,414,433 msg/s     43.17 mb/s
PubDispatch 256      10,000,000          1,269,441 msg/s    309.92 mb/s
PubDispatch 512      10,000,000          1,105,003 msg/s    539.55 mb/s
PubDispatch 1024      1,000,000          1,002,108 msg/s    978.62 mb/s
PubDispatch 4096        500,000            347,199 msg/s      1.32 gb/s
PubDispatch 8192        100,000            195,271 msg/s      1.49 gb/s

ReqReply 0               20,000              9,905 msg/s       0.00 b/s
ReqReply 8               20,000              9,844 msg/s     76.91 kb/s
ReqReply 32              10,000              9,873 msg/s    308.54 kb/s
ReqReply 256             10,000              9,835 msg/s      2.40 mb/s
ReqReply 512             10,000              9,883 msg/s      4.83 mb/s
ReqReply 1024            10,000             10,172 msg/s      9.93 mb/s
ReqReply 4096            10,000              9,430 msg/s     36.84 mb/s
ReqReply 8192            10,000              9,154 msg/s     71.52 mb/s

Latency 0     5,000     39 /  52.55 / 287    +/- 0.78  (microseconds)
Latency 8     5,000     42 /  51.54 / 260    +/- 0.78  (microseconds)
Latency 32    5,000     41 /  50.87 / 167    +/- 0.76  (microseconds)
Latency 256   5,000     42 /  55.18 / 253    +/- 0.89  (microseconds)
Latency 512   5,000     43 /  59.28 / 260    +/- 0.90  (microseconds)
Latency 1024  5,000     42 /  51.17 / 187    +/- 0.75  (microseconds)
Latency 4096  5,000     42 /  52.90 / 267    +/- 0.79  (microseconds)
Latency 8192  5,000     45 /  55.60 / 859    +/- 0.85  (microseconds)

Final memory usage is 67.88 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 25s
2 actionable tasks: 1 executed, 1 up-to-date
11:27:36 AM: Task execution finished 'NatsAutoBench.main()'.

After:

11:30:12 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Running warmup
Current memory usage is 999.58 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          4,826,092 msg/s       0.00 b/s
PubOnly 8            10,000,000          5,019,848 msg/s     38.30 mb/s
PubOnly 32           10,000,000          4,330,835 msg/s    132.17 mb/s
PubOnly 256          10,000,000          4,942,284 msg/s      1.18 gb/s
PubOnly 512          10,000,000          4,086,727 msg/s      1.95 gb/s
PubOnly 1024          1,000,000          3,057,141 msg/s      2.92 gb/s
PubOnly 4096            500,000            916,903 msg/s      3.50 gb/s
PubOnly 8192            100,000            467,968 msg/s      3.57 gb/s

PubSub 0             10,000,000          1,533,636 msg/s       0.00 b/s
PubSub 8             10,000,000          1,433,106 msg/s     10.93 mb/s
PubSub 32            10,000,000          1,285,728 msg/s     39.24 mb/s
PubSub 256           10,000,000          1,096,267 msg/s    267.64 mb/s
PubSub 512           10,000,000            927,871 msg/s    453.06 mb/s
PubSub 1024           1,000,000            843,623 msg/s    823.85 mb/s
PubSub 4096             500,000            300,964 msg/s      1.15 gb/s
PubSub 8192             100,000            149,556 msg/s      1.14 gb/s

PubDispatch 0        10,000,000          1,303,114 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,347,450 msg/s     10.28 mb/s
PubDispatch 32       10,000,000          1,323,138 msg/s     40.38 mb/s
PubDispatch 256      10,000,000          1,157,187 msg/s    282.52 mb/s
PubDispatch 512      10,000,000          1,053,176 msg/s    514.25 mb/s
PubDispatch 1024      1,000,000            916,186 msg/s    894.71 mb/s
PubDispatch 4096        500,000            339,930 msg/s      1.30 gb/s
PubDispatch 8192        100,000            195,462 msg/s      1.49 gb/s

ReqReply 0               20,000              8,654 msg/s       0.00 b/s
ReqReply 8               20,000              8,855 msg/s     69.18 kb/s
ReqReply 32              10,000              8,619 msg/s    269.35 kb/s
ReqReply 256             10,000              8,262 msg/s      2.02 mb/s
ReqReply 512             10,000              8,927 msg/s      4.36 mb/s
ReqReply 1024            10,000              9,861 msg/s      9.63 mb/s
ReqReply 4096            10,000              9,109 msg/s     35.58 mb/s
ReqReply 8192            10,000              9,483 msg/s     74.09 mb/s

Latency 0     5,000     39 /  51.76 / 144    +/- 0.76  (microseconds)
Latency 8     5,000     39 /  50.99 / 301    +/- 0.76  (microseconds)
Latency 32    5,000     40 /  49.90 / 237    +/- 0.75  (microseconds)
Latency 256   5,000     40 /  50.53 / 255    +/- 0.78  (microseconds)
Latency 512   5,000     41 /  51.41 / 256    +/- 0.78  (microseconds)
Latency 1024  5,000     41 /  51.35 / 202    +/- 0.77  (microseconds)
Latency 4096  5,000     43 /  54.97 / 272    +/- 0.82  (microseconds)
Latency 8192  5,000     45 /  55.23 / 807    +/- 0.85  (microseconds)

Final memory usage is 67.84 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 29s
2 actionable tasks: 1 executed, 1 up-to-date
11:32:42 AM: Task execution finished 'NatsAutoBench.main()'.

@jameshilliard
Copy link
Contributor Author

Before:

1:46:59 PM: Executing task 'MessageProtocolCreationBenchmark.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :compileTestJava UP-TO-DATE
> Task :processTestResources UP-TO-DATE
> Task :testClasses UP-TO-DATE

> Task :MessageProtocolCreationBenchmark.main()
### Running benchmarks with 50,000,000 messages.

### Total time to create 50,000,000 non-utf8 messages for sending was 1,748 ms
	34.977587 ns/op
	28,589,736.279 op/sec

### Total time to create 50,000,000 utf8 messages for sending was 3,065 ms
	61.308911 ns/op
	16,310,842.564 op/sec

### Total time to create 50,000,000 a protocol message was 913 ms
	18.262107 ns/op
	54,758,195.611 op/sec

BUILD SUCCESSFUL in 6s
4 actionable tasks: 1 executed, 3 up-to-date
1:47:05 PM: Task execution finished 'MessageProtocolCreationBenchmark.main()'.

Benchmarks after further optimization of fastUtf8Length:

1:47:11 PM: Executing task 'MessageProtocolCreationBenchmark.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :compileTestJava UP-TO-DATE
> Task :processTestResources UP-TO-DATE
> Task :testClasses UP-TO-DATE

> Task :MessageProtocolCreationBenchmark.main()
### Running benchmarks with 50,000,000 messages.

### Total time to create 50,000,000 non-utf8 messages for sending was 1,870 ms
	37.406164 ns/op
	26,733,561.637 op/sec

### Total time to create 50,000,000 utf8 messages for sending was 3,242 ms
	64.849350 ns/op
	15,420,354.966 op/sec

### Total time to create 50,000,000 a protocol message was 393 ms
	7.870244 ns/op
	127,060,856.575 op/sec

BUILD SUCCESSFUL in 5s
4 actionable tasks: 1 executed, 3 up-to-date
1:47:17 PM: Task execution finished 'MessageProtocolCreationBenchmark.main()'.

@jameshilliard
Copy link
Contributor Author

Latest round of benchmarks:
New UTF-8:

11:14:17 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Enabling UTF-8 subjects
Running warmup
Current memory usage is 993.80 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          3,915,004 msg/s       0.00 b/s
PubOnly 8            10,000,000          4,202,807 msg/s     32.06 mb/s
PubOnly 32           10,000,000          5,110,407 msg/s    155.96 mb/s
PubOnly 256          10,000,000          4,335,994 msg/s      1.03 gb/s
PubOnly 512          10,000,000          3,805,189 msg/s      1.81 gb/s
PubOnly 1024          1,000,000          2,985,986 msg/s      2.85 gb/s
PubOnly 4096            500,000            759,566 msg/s      2.90 gb/s
PubOnly 8192            100,000            447,271 msg/s      3.41 gb/s

PubSub 0             10,000,000          1,164,335 msg/s       0.00 b/s
PubSub 8             10,000,000          1,048,153 msg/s      8.00 mb/s
PubSub 32            10,000,000          1,022,247 msg/s     31.20 mb/s
PubSub 256           10,000,000            890,375 msg/s    217.38 mb/s
PubSub 512           10,000,000            845,849 msg/s    413.01 mb/s
PubSub 1024           1,000,000            692,647 msg/s    676.41 mb/s
PubSub 4096             500,000            319,870 msg/s      1.22 gb/s
PubSub 8192             100,000            148,555 msg/s      1.13 gb/s

PubDispatch 0        10,000,000          1,234,253 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,211,989 msg/s      9.25 mb/s
PubDispatch 32       10,000,000          1,088,454 msg/s     33.22 mb/s
PubDispatch 256      10,000,000            998,732 msg/s    243.83 mb/s
PubDispatch 512      10,000,000            883,689 msg/s    431.49 mb/s
PubDispatch 1024      1,000,000            798,693 msg/s    779.97 mb/s
PubDispatch 4096        500,000            297,048 msg/s      1.13 gb/s
PubDispatch 8192        100,000            156,335 msg/s      1.19 gb/s

ReqReply 0               20,000              8,962 msg/s       0.00 b/s
ReqReply 8               20,000              9,406 msg/s     73.49 kb/s
ReqReply 32              10,000              8,426 msg/s    263.32 kb/s
ReqReply 256             10,000              8,458 msg/s      2.07 mb/s
ReqReply 512             10,000              8,271 msg/s      4.04 mb/s
ReqReply 1024            10,000              8,175 msg/s      7.98 mb/s
ReqReply 4096            10,000              8,183 msg/s     31.97 mb/s
ReqReply 8192            10,000              7,761 msg/s     60.63 mb/s

Latency 0     5,000     41 /  57.16 / 171    +/- 0.88  (microseconds)
Latency 8     5,000     41 /  55.82 / 819    +/- 0.87  (microseconds)
Latency 32    5,000     41 /  53.26 / 234    +/- 0.79  (microseconds)
Latency 256   5,000     43 /  54.25 / 360    +/- 0.81  (microseconds)
Latency 512   5,000     41 /  53.72 / 607    +/- 0.80  (microseconds)
Latency 1024  5,000     43 /  58.88 / 172    +/- 0.88  (microseconds)
Latency 4096  5,000     44 /  58.03 / 199    +/- 0.87  (microseconds)
Latency 8192  5,000     44 /  58.94 / 905    +/- 0.89  (microseconds)

Final memory usage is 67.86 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 51s
2 actionable tasks: 1 executed, 1 up-to-date
11:17:08 AM: Task execution finished 'NatsAutoBench.main()'.

Old UTF-8:

11:18:38 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Enabling UTF-8 subjects
Running warmup
Current memory usage is 997.58 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          4,400,967 msg/s       0.00 b/s
PubOnly 8            10,000,000          4,737,855 msg/s     36.15 mb/s
PubOnly 32           10,000,000          5,010,269 msg/s    152.90 mb/s
PubOnly 256          10,000,000          4,553,563 msg/s      1.09 gb/s
PubOnly 512          10,000,000          3,556,745 msg/s      1.70 gb/s
PubOnly 1024          1,000,000          2,858,358 msg/s      2.73 gb/s
PubOnly 4096            500,000            791,843 msg/s      3.02 gb/s
PubOnly 8192            100,000            445,880 msg/s      3.40 gb/s

PubSub 0             10,000,000          1,085,614 msg/s       0.00 b/s
PubSub 8             10,000,000          1,084,548 msg/s      8.27 mb/s
PubSub 32            10,000,000          1,005,820 msg/s     30.70 mb/s
PubSub 256           10,000,000            926,938 msg/s    226.30 mb/s
PubSub 512           10,000,000            831,612 msg/s    406.06 mb/s
PubSub 1024           1,000,000            679,077 msg/s    663.16 mb/s
PubSub 4096             500,000            336,255 msg/s      1.28 gb/s
PubSub 8192             100,000            191,322 msg/s      1.46 gb/s

PubDispatch 0        10,000,000          1,384,756 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,217,177 msg/s      9.29 mb/s
PubDispatch 32       10,000,000          1,153,245 msg/s     35.19 mb/s
PubDispatch 256      10,000,000          1,041,144 msg/s    254.19 mb/s
PubDispatch 512      10,000,000            964,513 msg/s    470.95 mb/s
PubDispatch 1024      1,000,000            833,387 msg/s    813.85 mb/s
PubDispatch 4096        500,000            312,898 msg/s      1.19 gb/s
PubDispatch 8192        100,000            157,818 msg/s      1.20 gb/s

ReqReply 0               20,000              7,519 msg/s       0.00 b/s
ReqReply 8               20,000              8,873 msg/s     69.32 kb/s
ReqReply 32              10,000              8,622 msg/s    269.44 kb/s
ReqReply 256             10,000              8,639 msg/s      2.11 mb/s
ReqReply 512             10,000              8,608 msg/s      4.20 mb/s
ReqReply 1024            10,000              8,825 msg/s      8.62 mb/s
ReqReply 4096            10,000              8,038 msg/s     31.40 mb/s
ReqReply 8192            10,000              8,058 msg/s     62.96 mb/s

Latency 0     5,000     42 /  55.63 / 227    +/- 0.84  (microseconds)
Latency 8     5,000     41 /  51.51 / 226    +/- 0.77  (microseconds)
Latency 32    5,000     41 /  51.05 / 197    +/- 0.76  (microseconds)
Latency 256   5,000     41 /  52.33 / 185    +/- 0.77  (microseconds)
Latency 512   5,000     42 /  53.45 / 618    +/- 0.82  (microseconds)
Latency 1024  5,000     45 /  67.08 / 209    +/- 1.02  (microseconds)
Latency 4096  5,000     44 /  57.10 / 187    +/- 0.84  (microseconds)
Latency 8192  5,000     44 /  60.40 / 853    +/- 0.91  (microseconds)

Final memory usage is 67.70 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 47s
2 actionable tasks: 1 executed, 1 up-to-date
11:21:26 AM: Task execution finished 'NatsAutoBench.main()'.

New ASCII:

11:24:59 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Running warmup
Current memory usage is 998.83 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          4,268,314 msg/s       0.00 b/s
PubOnly 8            10,000,000          4,572,550 msg/s     34.89 mb/s
PubOnly 32           10,000,000          4,891,019 msg/s    149.26 mb/s
PubOnly 256          10,000,000          4,234,485 msg/s      1.01 gb/s
PubOnly 512          10,000,000          3,788,739 msg/s      1.81 gb/s
PubOnly 1024          1,000,000          3,028,853 msg/s      2.89 gb/s
PubOnly 4096            500,000            938,663 msg/s      3.58 gb/s
PubOnly 8192            100,000            457,150 msg/s      3.49 gb/s

PubSub 0             10,000,000          1,317,297 msg/s       0.00 b/s
PubSub 8             10,000,000          1,290,515 msg/s      9.85 mb/s
PubSub 32            10,000,000          1,242,484 msg/s     37.92 mb/s
PubSub 256           10,000,000          1,113,833 msg/s    271.93 mb/s
PubSub 512           10,000,000            966,016 msg/s    471.69 mb/s
PubSub 1024           1,000,000            847,912 msg/s    828.04 mb/s
PubSub 4096             500,000            321,717 msg/s      1.23 gb/s
PubSub 8192             100,000            155,384 msg/s      1.19 gb/s

PubDispatch 0        10,000,000          1,520,825 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,300,240 msg/s      9.92 mb/s
PubDispatch 32       10,000,000          1,363,389 msg/s     41.61 mb/s
PubDispatch 256      10,000,000          1,143,525 msg/s    279.18 mb/s
PubDispatch 512      10,000,000            964,844 msg/s    471.12 mb/s
PubDispatch 1024      1,000,000            731,858 msg/s    714.71 mb/s
PubDispatch 4096        500,000            343,613 msg/s      1.31 gb/s
PubDispatch 8192        100,000            155,827 msg/s      1.19 gb/s

ReqReply 0               20,000              9,339 msg/s       0.00 b/s
ReqReply 8               20,000              8,703 msg/s     68.00 kb/s
ReqReply 32              10,000              8,757 msg/s    273.66 kb/s
ReqReply 256             10,000              8,578 msg/s      2.09 mb/s
ReqReply 512             10,000              8,483 msg/s      4.14 mb/s
ReqReply 1024            10,000              9,412 msg/s      9.19 mb/s
ReqReply 4096            10,000              8,774 msg/s     34.27 mb/s
ReqReply 8192            10,000              8,489 msg/s     66.32 mb/s

Latency 0     5,000     42 /  52.85 / 247    +/- 0.78  (microseconds)
Latency 8     5,000     40 /  51.37 / 242    +/- 0.77  (microseconds)
Latency 32    5,000     41 /  50.43 / 191    +/- 0.74  (microseconds)
Latency 256   5,000     41 /  51.07 / 198    +/- 0.75  (microseconds)
Latency 512   5,000     41 /  52.01 / 175    +/- 0.78  (microseconds)
Latency 1024  5,000     41 /  52.89 / 201    +/- 0.81  (microseconds)
Latency 4096  5,000     43 /  53.01 / 168    +/- 0.80  (microseconds)
Latency 8192  5,000     45 /  56.32 / 863    +/- 0.85  (microseconds)

Final memory usage is 67.87 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 34s
2 actionable tasks: 1 executed, 1 up-to-date
11:27:34 AM: Task execution finished 'NatsAutoBench.main()'.

Old ASCII:

11:30:55 AM: Executing task 'NatsAutoBench.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :NatsAutoBench.main()
Connecting to NATS server at nats://localhost:4222
Running warmup
Current memory usage is 1016.10 mb / 1.02 gb / 16.00 gb free/total/max
Executing tests ........................................

PubOnly 0            10,000,000          4,674,445 msg/s       0.00 b/s
PubOnly 8            10,000,000          4,913,360 msg/s     37.49 mb/s
PubOnly 32           10,000,000          4,546,422 msg/s    138.75 mb/s
PubOnly 256          10,000,000          4,074,866 msg/s    994.84 mb/s
PubOnly 512          10,000,000          3,704,323 msg/s      1.77 gb/s
PubOnly 1024          1,000,000          2,898,953 msg/s      2.76 gb/s
PubOnly 4096            500,000            835,169 msg/s      3.19 gb/s
PubOnly 8192            100,000            452,756 msg/s      3.45 gb/s

PubSub 0             10,000,000          1,302,092 msg/s       0.00 b/s
PubSub 8             10,000,000          1,330,735 msg/s     10.15 mb/s
PubSub 32            10,000,000          1,196,178 msg/s     36.50 mb/s
PubSub 256           10,000,000          1,070,706 msg/s    261.40 mb/s
PubSub 512           10,000,000          1,026,833 msg/s    501.38 mb/s
PubSub 1024           1,000,000            871,019 msg/s    850.61 mb/s
PubSub 4096             500,000            312,244 msg/s      1.19 gb/s
PubSub 8192             100,000            145,726 msg/s      1.11 gb/s

PubDispatch 0        10,000,000          1,362,437 msg/s       0.00 b/s
PubDispatch 8        10,000,000          1,536,844 msg/s     11.73 mb/s
PubDispatch 32       10,000,000          1,373,791 msg/s     41.92 mb/s
PubDispatch 256      10,000,000          1,244,384 msg/s    303.80 mb/s
PubDispatch 512      10,000,000          1,013,914 msg/s    495.08 mb/s
PubDispatch 1024      1,000,000            907,392 msg/s    886.13 mb/s
PubDispatch 4096        500,000            333,031 msg/s      1.27 gb/s
PubDispatch 8192        100,000            184,617 msg/s      1.41 gb/s

ReqReply 0               20,000              8,461 msg/s       0.00 b/s
ReqReply 8               20,000              9,213 msg/s     71.98 kb/s
ReqReply 32              10,000              8,684 msg/s    271.38 kb/s
ReqReply 256             10,000              8,367 msg/s      2.04 mb/s
ReqReply 512             10,000              9,099 msg/s      4.44 mb/s
ReqReply 1024            10,000              8,838 msg/s      8.63 mb/s
ReqReply 4096            10,000              9,045 msg/s     35.33 mb/s
ReqReply 8192            10,000              7,772 msg/s     60.72 mb/s

Latency 0     5,000     38 /  56.89 / 194    +/- 0.88  (microseconds)
Latency 8     5,000     40 /  54.53 / 199    +/- 0.84  (microseconds)
Latency 32    5,000     42 /  53.64 / 150    +/- 0.80  (microseconds)
Latency 256   5,000     43 /  54.43 / 157    +/- 0.81  (microseconds)
Latency 512   5,000     42 /  57.23 / 166    +/- 0.86  (microseconds)
Latency 1024  5,000     43 /  56.61 / 252    +/- 0.86  (microseconds)
Latency 4096  5,000     43 /  58.39 / 264    +/- 0.89  (microseconds)
Latency 8192  5,000     44 /  56.61 / 831    +/- 0.84  (microseconds)

Final memory usage is 67.89 mb / 80.00 mb / 16.00 gb free/total/max

BUILD SUCCESSFUL in 2m 32s
2 actionable tasks: 1 executed, 1 up-to-date
11:33:28 AM: Task execution finished 'NatsAutoBench.main()'.

@ColinSullivan1 ColinSullivan1 self-requested a review October 21, 2020 21:46
charset = StandardCharsets.US_ASCII;
}
this.protocolBytes = ByteBuffer.allocate(len);
protocolBytes.put((byte)'P').put((byte)'U').put((byte)'B').put((byte)' ');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be faster to use a backing array, initialized with PUB vs put APIs? Not sure which is faster in java...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall this was faster, and it will have better interoperability with the next round of changes which use ByteBuffer's in more places, ByteBuffer should be optimized for these types of operations.

protocol.mark();
this.protocolBytes = ByteBuffer.allocate(fastUtf8Length(protocol));
protocol.reset();
StandardCharsets.UTF_8.newEncoder().encode(protocol, this.protocolBytes, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use a static instance rather than allocating every new msg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall I tested that and this was a good bit faster in benchmarks, not 100% sure why, best guess is having multiple threads accessing the same static encoder instance causes cache flushing or something. I plan to refactor this anyways down the line once I refactor some prerequisite code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, verified this has a major performance impact.
Static instance:

5:59:11 PM: Executing task 'MessageProtocolCreationBenchmark.main()'...

> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :compileTestJava UP-TO-DATE
> Task :processTestResources UP-TO-DATE
> Task :testClasses UP-TO-DATE

> Task :MessageProtocolCreationBenchmark.main()
### Running benchmarks with 50,000,000 messages.

### Total time to create 50,000,000 non-utf8 messages for sending was 1,835 ms
	36.718829 ns/op
	27,233,984.034 op/sec

### Total time to create 50,000,000 utf8 messages for sending was 3,552 ms
	71.053063 ns/op
	14,073,988.718 op/sec

### Total time to create 50,000,000 a protocol message was 341 ms
	6.833834 ns/op
	146,330,744.581 op/sec

BUILD SUCCESSFUL in 6s
4 actionable tasks: 1 executed, 3 up-to-date
5:59:17 PM: Task execution finished 'MessageProtocolCreationBenchmark.main()'.

non-static(current) implementation:

5:58:33 PM: Executing task 'MessageProtocolCreationBenchmark.main()'...


> Task :compileJava

> Task :processResources NO-SOURCE
> Task :classes
> Task :compileTestJava UP-TO-DATE
> Task :processTestResources UP-TO-DATE
> Task :testClasses UP-TO-DATE
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :MessageProtocolCreationBenchmark.main()
### Running benchmarks with 50,000,000 messages.

### Total time to create 50,000,000 non-utf8 messages for sending was 1,802 ms
	36.054858 ns/op
	27,735,513.355 op/sec

### Total time to create 50,000,000 utf8 messages for sending was 2,841 ms
	56.833210 ns/op
	17,595,346.163 op/sec

### Total time to create 50,000,000 a protocol message was 334 ms
	6.693376 ns/op
	149,401,436.293 op/sec

BUILD SUCCESSFUL in 5s
4 actionable tasks: 2 executed, 2 up-to-date
5:58:39 PM: Task execution finished 'MessageProtocolCreationBenchmark.main()'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, would have lost a bet on that one. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also noticed this here:

Instances of this class are not safe for use by multiple concurrent threads.

Copy link
Member

@ColinSullivan1 ColinSullivan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than some minor suggestions and nitpicks.

@ColinSullivan1 ColinSullivan1 merged commit f154624 into nats-io:master Oct 22, 2020
@ColinSullivan1
Copy link
Member

Thanks for the contribution!

@jameshilliard jameshilliard deleted the bytebuffer branch October 22, 2020 00:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants