-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a virtual threads friendly pool with Jackson #35816
Conversation
@@ -63,6 +64,9 @@ public ObjectMapper objectMapper(@All List<ObjectMapperCustomizer> customizers, | |||
for (ObjectMapperCustomizer customizer : sortedCustomizers) { | |||
customizer.customize(objectMapper); | |||
} | |||
if (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default and mostly for backward compatibility reasons Jackson uses the ThreadLocal
based pool that however isn't virtual thread friendly.
We should decide under which condition we may want to replace that default pool with something not relying on ThreadLocal
. Personally I wouldn't leave this decision to end users, but I'd take it automatically. For instance we could use a virtual threads friendly pool when the project is running on Java 21+ and uses the @RunOnVirtualThread
annotation at least once.
@@ -63,6 +64,9 @@ public ObjectMapper objectMapper(@All List<ObjectMapperCustomizer> customizers, | |||
for (ObjectMapperCustomizer customizer : sortedCustomizers) { | |||
customizer.customize(objectMapper); | |||
} | |||
if (true) { | |||
objectMapper.getFactory().setBufferRecyclerPool(BufferRecyclerPool.LockFreePool.shared()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm just demonstrating how to configure a different pool implementation using the new Jackson API. The specific pool that I'm using is one of the implementations now offered out of the box by Jackson. This pool works reasonably well also with virtual threads, but it's pretty simple also because one of the constraint understandably required by Jackson maintainers was to not add any dependency to jackson-core
. I believe that we could develop our own pool implementation, for instance based on some JCTools
highly concurrent data structures, and if we find that it performs satisfyingly in all conditions (with both native and virtual threads) we may also decide to always use it regardless.
@geoand @franz1981 @mariofusco I think for the first step we should have a condition like this:
I would use one of the default implementation for now, until we can measure the performance and see which one is the
We also need to see if we can leverage the JCTool queues to improve the performances. |
Currently What I am thinking, is that we might want to be more grained about which Jackson ObjectMapper we update. |
If I understand this comment correctly "being more grained" will imply that we will have more than one pool instance around. While the current Jackson implementation supports this scenario without any problem, I advice against this because we will have an unnecessary proliferation of those |
Gotcha thanks. What I am concerned about is that is there is say a single |
As I wrote my ideal solution would be to have a pool implementation (possibly leveraging |
Sounds good! |
@geoand @cescoffier Thanks to the change by @mariofusco I think we could leverage this already to speedup non-loom too :P:P See here This is the flamegraph of the JSON test on techempower (which is using the reactive stack AND no pipelining): the encoding to JSON is fairly dominated by FYI soft ref are pretty useless (in practice) due to how Just for reference from https://docs.oracle.com/en/java/javase/17/gctuning/other-considerations.html#GUID-9E3E5371-20F5-4B70-A003-9D7851B115AF
Which means that |
Yeah I agree, @mariofusco's change opens this up for us, that's great! In summary, if we can prove that our own |
@geoand I can easily run (ehm ehm gotta check if the CI is working TBH) a custom quarkus build - as long as the the jackson version is on maven - or use my shiny new Ryzen and play with it already Beware that the fast thread local one is for java < 21 because similarly to thread locals, it won't play nicely with loom, but still...given that it requires few years before everything will run on VT, we can have benefits for everyone in the meantime |
7ec1f8c
to
87c87b1
Compare
As anticipated my original goal was implementing a pool that is good enough for "all seasons" so I put together all suggestions from @cescoffier and @geoand (yeah, why not doing something more fine grained that behaves differently for native and virtual threads and takes the best of the two worlds?) other than the irreplaceable help of @franz1981 to implement this new This pool works regardless of the version of the JVM in use and internally uses 2 distinct pools one for native threads (which is exactly the same Note that this pool also guarantees that the pooled resource is always released to the same internal pool from where it has been acquired, regardless if the releasing thread is different from the one that originally made the acquisition. I developed and ran some preliminary benchmarks and the results looks extremely promising. In particular comparing the original jackson pool with my hybrid one using a benchmark having 100 threads (either native or virtual) running in parallel and marshalling a simple object in json I got the following outcome:
If these performances are confirmed (I will ask @franz1981 to double check them) I don't see any reason why we shouldn't unconditionally use this pool. |
Awesome! |
|
||
private static Predicate<Thread> findIsVirtual() { | ||
try { | ||
MethodHandle virtualMh = MethodHandles.publicLookup().findVirtual(Thread.class, "isVirtual", MethodType.methodType(boolean.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would save the MethodHandle in a static final variable or sadly it would be quite costy.
I know that's captured by the predicate but I suggest to benchmark to be sure that will be optimized the same...if we care to not have the MethodHandle populated until necessary, we could use a nice static holder pattern
ie a VirtualChecker static class which would hold the method handle ONLY when it will be called the first time (if will be called the first time).
In this way the usual exceptions thrown by the JVM during the reflective search with method handle, won't happen, until used
|
||
private static long getProbeOffset() { | ||
try { | ||
return UnsafeAccess.UNSAFE.objectFieldOffset(Thread.class.getDeclaredField("threadLocalRandomProbe")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cescoffier there's anything to do here considering that we use the UnsafeAccess from JCTools AND probably native image won't be happy about it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it might be a problem in native. We avoid Unsafe.
@jponge that's also something to consider when using JCTools in Mutiny.
|
||
private final int mask; | ||
|
||
private final MpmcUnboundedXaddArrayQueue<BufferRecycler>[] queues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This queue is cool, but uses Unsafe
and I didn't (ehm ehm my fault!) implemented a varhandle/atomicref/long version of it, as per other JCTools variants...so maybe @cescoffier got some suggestion about native image or the approach to follow here; I'm not aware of quarkus code directly using Unsafe, but I can be wrong (I didn't searched it)...
Last but not least: JCTools is great (not by merit :P), but the benchmarks doesn't tell it to be the best by large margin tool for the job, because of how cpu works:
- the mpmc queue uses an array adding/removing elements in FIFO order out of it, BUT: if the queue is empty or with less than 16/32 elements in, producer(s) and consumer(s) will perform store(s) (by adding new values, or reading and adding
null
to mark consumption) in the same cache-line, competing for it. The Cache Coherency protocol of CPU will make the ping-pong of such up-to-date information, the primary bottleneck. In order to benefit from using a queue (and get the contention to spread between head/tail), you need a much larger already populated pool (more than 32 elements), but it would cost! In the "real" world (but we can add telemetry and report it, to see if I'm right or not!) the queue will be mostly empty, causing this contention to happen for real, slowing down everyone - there is a LOCK_FREE variant which use a single atomic ref as the top of a linked-list stack: i'm not a BIG FAN of linked-lists, but still, for the most practical cases,it will deliver similar performance, no Unsafe nor JCTools usage, and it's all jackson stuff. And although counterintuitive the reason is what I've explained in the previous point: we fight for a single cache line in both cases. The only difference is that with JCTools releasing doesn't cause any form of competition to change the state, but will still hit the contention of the cache-line in the array, while the LOCK_FREE pool, instead, is always competing to update the state, but is the only relevant difference.
The striping instead, can be made A LOT better, known the fact we have talked about related the contention, in short:
we can use a AtomicReferenceArray which elements are distant 16 elements each (or 32, to be safe? this is because of some Intel quirks with HW prefercher which can make false-sharing to happen with 2 cache lines, not just one), and the slots we can really use are padded to avoid false sharing while we update them (because of java heap noisy neighbours or just JIT code which use/read the java Object header, which is likely on the same cache line of the node value in top element).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of quarkus code directly using Unsafe
Directly, no, we do not use it, but JBoss Threads which we do lean on uses it and I don't see any GraalVM substitutions for it, so I assume it works in native
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no Unsafe nor JCTools usage, and it's all jackson stuff
All else being equal, this would definitely be preferable for Quarkus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no Unsafe nor JCTools usage, and it's all jackson stuff
All else being equal, this would definitely be preferable for Quarkus
I agree, I discussed this yesterday with @franz1981 and we believe that we can achieve similar performances without using Unsafe or JCTools. I will work on this in the next days.
* Mix thread id with golden ratio and then xorshift it | ||
* to spread consecutive ids (see Knuth multiplicative method as reference). | ||
*/ | ||
int probe = (int) ((Thread.currentThread().getId() * 0x9e3779b9) & Integer.MAX_VALUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the perf if we use such? @mariofusco
As discussed I developed a new pool, not depending on JCTools, that is a more sophisticated variation of my original lock free pool. I compared this new pool with the JCTools based one, with and without using
In case you want to give a look, or try the benchmarks on your own, all the experiments that I made with the various pool implementations that I tried are available here. In essence this new version of the pool that neither depends on JCTools nor uses |
extensions/jackson/runtime/src/main/java/io/quarkus/jackson/runtime/HybridJacksonPool.java
Outdated
Show resolved
Hide resolved
return virtualMh != null ? t -> { | ||
try { | ||
return (boolean) virtualMh.invokeExact(t); | ||
} catch (Throwable e) { | ||
throw new RuntimeException(e); | ||
} | ||
} : t -> false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We reallly try hard to avoid using lambas as much as possible in code that ends up on the runtime classpath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminder, fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏
} | ||
|
||
private int probe() { | ||
int probe = (int) ((Thread.currentThread().getId() * 0x9e3779b9) & Integer.MAX_VALUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs some comment about golden ratio hashing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Jackson 2.16 has been released, so should this be moved forward? |
@geoand as anticipated I moved the pool to vert.x ( see eclipse-vertx/vert.x#4920 ) and already upgraded my PR to jackson 2.16 there. My suggestion is to merge the vert.x pull request (and deploy a 4.5.1 release of vert.x containing it) asap, close this pull request and create a new smaller pull request for quarkus reusing the same pool implementation that we have in vert.x /cc @vietj @franz1981 |
Great, thanks for the update! |
FWIW, the Jackson update is causing some CI issues: #37188 . Not sure if it's a regression or something on our side, I didn't have the time to investigate (and won't have anytime soon). |
I can try and take a look at the end of the week |
@@ -63,6 +63,9 @@ public ObjectMapper objectMapper(@All List<ObjectMapperCustomizer> customizers, | |||
for (ObjectMapperCustomizer customizer : sortedCustomizers) { | |||
customizer.customize(objectMapper); | |||
} | |||
|
|||
objectMapper.getFactory().setBufferRecyclerPool(HybridJacksonPool.INSTANCE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the user decides to use their own ObjectMapper, as defined in https://quarkus.io/guides/resteasy#json, and forgets to set the correct buffer recycler pool. Maybe raise a warning? Or force the buffer recycler pool in any other way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very good point, we need to see how it behaves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, good point!
Replaced by #38196 |
This pull request is a follow up of the work done to make the internal pool used by Jackson configurable and pluggable. It is not intended to be merged (also because we will have to wait the stable release of Jackson 2.16 before doing so), but to start some discussion on how and when to use this feature.
/cc @geoand @cescoffier @franz1981