-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor NatsMessage to use ByteBuffer internally. #349
Conversation
b0e4a71
to
3c0ad72
Compare
this.data = null; // will set data and size after we read it | ||
} | ||
|
||
private static int fastIntLength(int number) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation while a bit verbose is significantly faster than a naive implementation and allows us to eliminate a separate length buffer which lets us write directly to protocolBytes
avoiding some somewhat expensive copy operations.
|
||
NatsMessage next; // for linked list | ||
|
||
static final byte[] digits = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't needed, should be faster to just add 0x30
to convert a byte to ascii.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add to the PR, should be ok - although i might - since i am paranoid - convert 0 to a byte to get the value :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ASCII "0" to byte gives you 0x30
, I'm doing this in integer mode then casting to byte
when writing to the buffer, at least I think that's probably the most efficient way to do it.
} else { | ||
// Convert the length to bytes | ||
byte[] lengthBytes = new byte[12]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to write this directly to protocolBytes
.
int base = protocolBytes.limit(); | ||
for (int i = size; i > 0; i /= 10) { | ||
base--; | ||
protocolBytes.put(base, (byte)(i % 10 + 0x30)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just add 0x30
instead of using the lookup table approach.
boolean isProtocol() { | ||
return this.subject == null; | ||
} | ||
|
||
// Will be null on an incoming message | ||
byte[] getProtocolBytes() { | ||
return this.protocolBytes; | ||
if (this.protocolBytes.hasArray() && this.protocolBytes.limit() == this.protocolBytes.capacity()) { | ||
return this.protocolBytes.array(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the normal fast path that avoids copying.
Nothing bad jumped out - will let Colin or someone from Synadia confirm. |
3c0ad72
to
7c3e96e
Compare
} | ||
|
||
// Create a protocol only message to publish | ||
NatsMessage(CharBuffer protocol) { | ||
ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(protocol); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not allocate a correctly sized ByteBuffer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this sort of issue is one of the reasons I'm trying to move away from any low level char/string based protocol encoding/decoding to UTF-8/ASCII byte/bytebuffer based encoding/decoding as much as possible.
Arrays.fill(byteBuffer.array(), (byte) 0); // clear sensitive data | ||
this.sizeInBytes = this.protocolBytes.length + 2;// for \r\n | ||
protocol.mark(); | ||
this.protocolBytes = ByteBuffer.allocate(fastUtf8Length(protocol)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By pre-allocating a correctly sized ByteBuffer
we can eliminate the slow path that involves copying bytes in getProtocolBytes()
.
boolean isProtocol() { | ||
return this.subject == null; | ||
} | ||
|
||
// Will be null on an incoming message | ||
byte[] getProtocolBytes() { | ||
return this.protocolBytes; | ||
return this.protocolBytes.array(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all ByteBuffer
's now have correctly allocated sizes we can unconditionally use the fast path here.
Here's what I'm seeing performance wise:
After:
|
7c3e96e
to
7d6f316
Compare
Before:
Benchmarks after further optimization of
|
Latest round of benchmarks:
Old UTF-8:
New ASCII:
Old ASCII:
|
charset = StandardCharsets.US_ASCII; | ||
} | ||
this.protocolBytes = ByteBuffer.allocate(len); | ||
protocolBytes.put((byte)'P').put((byte)'U').put((byte)'B').put((byte)' '); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be faster to use a backing array, initialized with PUB
vs put APIs? Not sure which is faster in java...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall this was faster, and it will have better interoperability with the next round of changes which use ByteBuffer
's in more places, ByteBuffer
should be optimized for these types of operations.
protocol.mark(); | ||
this.protocolBytes = ByteBuffer.allocate(fastUtf8Length(protocol)); | ||
protocol.reset(); | ||
StandardCharsets.UTF_8.newEncoder().encode(protocol, this.protocolBytes, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use a static instance rather than allocating every new msg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall I tested that and this was a good bit faster in benchmarks, not 100% sure why, best guess is having multiple threads accessing the same static encoder instance causes cache flushing or something. I plan to refactor this anyways down the line once I refactor some prerequisite code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, verified this has a major performance impact.
Static instance:
5:59:11 PM: Executing task 'MessageProtocolCreationBenchmark.main()'...
> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :compileTestJava UP-TO-DATE
> Task :processTestResources UP-TO-DATE
> Task :testClasses UP-TO-DATE
> Task :MessageProtocolCreationBenchmark.main()
### Running benchmarks with 50,000,000 messages.
### Total time to create 50,000,000 non-utf8 messages for sending was 1,835 ms
36.718829 ns/op
27,233,984.034 op/sec
### Total time to create 50,000,000 utf8 messages for sending was 3,552 ms
71.053063 ns/op
14,073,988.718 op/sec
### Total time to create 50,000,000 a protocol message was 341 ms
6.833834 ns/op
146,330,744.581 op/sec
BUILD SUCCESSFUL in 6s
4 actionable tasks: 1 executed, 3 up-to-date
5:59:17 PM: Task execution finished 'MessageProtocolCreationBenchmark.main()'.
non-static(current) implementation:
5:58:33 PM: Executing task 'MessageProtocolCreationBenchmark.main()'...
> Task :compileJava
> Task :processResources NO-SOURCE
> Task :classes
> Task :compileTestJava UP-TO-DATE
> Task :processTestResources UP-TO-DATE
> Task :testClasses UP-TO-DATE
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :MessageProtocolCreationBenchmark.main()
### Running benchmarks with 50,000,000 messages.
### Total time to create 50,000,000 non-utf8 messages for sending was 1,802 ms
36.054858 ns/op
27,735,513.355 op/sec
### Total time to create 50,000,000 utf8 messages for sending was 2,841 ms
56.833210 ns/op
17,595,346.163 op/sec
### Total time to create 50,000,000 a protocol message was 334 ms
6.693376 ns/op
149,401,436.293 op/sec
BUILD SUCCESSFUL in 5s
4 actionable tasks: 2 executed, 2 up-to-date
5:58:39 PM: Task execution finished 'MessageProtocolCreationBenchmark.main()'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, would have lost a bet on that one. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also noticed this here:
Instances of this class are not safe for use by multiple concurrent threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than some minor suggestions and nitpicks.
7d6f316
to
ca458e0
Compare
Thanks for the contribution! |
Some groundwork for #347.