BoundedLocalCache.scheduleDrainBuffers can perform draining on I/O threads #32317
Yep, you can configure this in the builder via the `executor` setting. The cache's own logic doesn't need it to maintain good performance: the maintenance work is very cheap (e.g. LRU pointer swaps) and the predecessor libraries amortized it on the calling thread. However, we don't know the cost of the eviction callback, or whether a user's long-running map computation blocks an entry's removal (ConcurrentHashMap's lock). Therefore deferring, so as not to penalize the caller and to get more predictable latencies, made sense, especially since it is easy to disable.

I don't know if you intend the I/O threads to perform user operations, as the executor is also used for the removal listener notifications and passed to the async cache loader. If you wanted those to not run on your I/O pool but remain asynchronous on FJP, you could have your removal listener forward the work onto an executor. Similarly, the async cache loader produces a future, so the given executor does not need to be used. So you can adjust all of this depending on what features are enabled and where you want the work to be performed.

Longer term we can switch the default to virtual threads when they stop pinning on mutexes. It doesn't make sense until that is fixed, because ConcurrentHashMap uses synchronized for internal locking. I am a bit concerned that when initially released in Java 21 it will be quite error prone due to this, as an ad hoc map computation will pin the carrier. It will be a better fit for I/O or these inexpensive compute tasks, and will avoid this native thread scheduling overhead.
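(For illustration, a minimal sketch of that wiring, assuming the standard Caffeine builder methods `executor` and `removalListener`; the single-thread `notificationPool` and the logged message are placeholders, not part of the discussion above.)

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

public class CacheExecutorSketch {
  public static void main(String[] args) {
    // hypothetical pool that absorbs potentially expensive eviction callbacks
    ExecutorService notificationPool = Executors.newSingleThreadExecutor();

    Cache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(10_000)
        // run the maintenance work on the calling thread instead of ForkJoinPool.commonPool()
        .executor(Runnable::run)
        // with a same-thread executor the listener would otherwise run on the caller,
        // so forward the (possibly expensive) work onto a dedicated executor
        .removalListener((String key, String value, RemovalCause cause) ->
            notificationPool.execute(() -> System.out.printf("evicted %s (%s)%n", key, cause)))
        .build();

    cache.put("k", "v");
    cache.invalidate("k");
    notificationPool.shutdown();
  }
}
```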
@ben-manes thanks for the fantastic explanations you often provide :)

@franz1981 could you share a bit more detail? I'd love to see the full stack to know which cache use we're discussing: we use Caffeine in many areas. I take it from Ben's explanation that it would be wise to check if there are any callbacks and/or if this is an "internal only" cache.
@ben-manes
Good point, I'll check with @Sanne (by showing him how we use it) whether such eviction work would be too costly for running it on the I/O thread(s) to be a good idea. Just a note: by I/O threads I mean the Netty event loop threads, and it's a "historical" name: with Vert.x (built on top of Netty) we usually run both user tasks and I/O work on them, as long as they are non-blocking (partially false: the Netty allocator, built on top of jemalloc, CAN and WILL block on arena locks if the thread-local buffers are exhausted).
OT: if Doug L had switched to using Cliff Click's non-blocking algorithm (which we host in JCTools), caring about any locks in CHM would no longer be a problem :P (I will ask on the concurrency list why he didn't; I couldn't find any info about it around)
I'm collecting some info to share here
Thanks for the info @franz1981! I am not deeply familiar with netty, vertx, or quarkus so that's interesting stuff. I haven't followed NBHM since high-scale-lib, but originally it had some cases that could make it surprising. The ones that I remember offhand are that keys must be immutable even after removal due to tombstones, removal can return stale entries, and until a resize the removed entries could not be GC'd. All of those might be fixed by now, though.

The locking in this case is for atomic computations, which NBHM does not support, though lock striping could be added on top to emulate that (reintroducing the problem of blocking computes). There were some early comparisons that you might find interesting. Unfortunately, last I heard the CS dept's server was damaged by a power outage and parts of his website, like concurrency-interest and the csv repo, were not recovered.

Doug was experimenting with avoiding built-in monitor locks by using a bitwise trie of AQS locks, which would be loom friendly and reduce the locking from hash bin to per-entry. I'd bet on monitors being supported by loom before he finishes that work, though.
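(As a sketch of the lock-striping idea: emulating an atomic computeIfAbsent on top of a map without built-in atomic computations. The class, stripe count, and use of a plain Map standing in for an NBHM-style implementation are invented for illustration; note it reintroduces the blocking computes the discussion is about.)

```java
import java.util.Map;
import java.util.function.Function;

public final class StripedComputeMap<K, V> {
  private final Object[] locks;
  private final Map<K, V> map; // a non-blocking map implementation could be dropped in here

  public StripedComputeMap(Map<K, V> map, int stripes) {
    this.map = map;
    this.locks = new Object[stripes];
    for (int i = 0; i < stripes; i++) {
      locks[i] = new Object();
    }
  }

  public V computeIfAbsent(K key, Function<? super K, ? extends V> mappingFunction) {
    V existing = map.get(key);
    if (existing != null) {
      return existing; // fast path, no locking
    }
    // pick a lock stripe by the key's hash; computes for keys on the same stripe block each other
    Object lock = locks[Math.floorMod(key.hashCode(), locks.length)];
    synchronized (lock) {
      existing = map.get(key);
      if (existing == null) {
        existing = mappingFunction.apply(key);
        if (existing != null) {
          map.put(key, existing);
        }
      }
      return existing;
    }
  }
}
```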
Some are still there it seems, good memory!
This one didn't change; indeed we had a few users complaining about this behavior being different from CHM. At this point NBHM seems a better solution for the long/int key variants we currently offer in JCTools.
Woah! Thanks for that; this fell off my radar; very, very interesting indeed!
@franz1981 when you have some cycles, a draft is here: #32448
fwiw, I think Quarkus' usage of Caffeine has the following properties: If those can be held true and other features like cache refresh (#25921) are not supported, then I believe you can safely force it to any executor you'd like (likely the calling thread).
Oh, I missed the hibernate cache implementation that you updated. That one is synchronous with its computes, so never mind the above.
@Sanne @ben-manes With such a test, the performance numbers suggest that INLINE is the best performer, followed by ON_FORKJOIN and ON_EVENTLOOP.

Note: ON_EVENTLOOP seems the same as the default behaviour, but it's not! The version I've tested includes another modification to the local cache, i.e. … Thanks to this, the most frequently issued drain-buffers task (it happens twice for each query, at very near distance) is already happening on the same caller thread, saving waking up the FJ pool (for nothing, given that the same added key is removed right after); see

```java
PutFromLoadValidator.Lock lock = putValidator.acquirePutFromLoadLock(session, key, txTimestamp);
if (lock == null) {
    if (trace) {
        log.tracef("Put from load lock not acquired for key %s", key);
    }
    return false;
}
try {
    cache.putIfAbsent(key, value);
} finally {
    putValidator.releasePutFromLoadLock(key, lock);
}
```

The … What has been surprising to me is that …
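(A rough illustration of how the three modes compared above could be wired through Caffeine's executor; this is not the benchmark code, and the Vert.x event-loop adapter in particular is an assumption.)

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import io.vertx.core.Context;
import io.vertx.core.Vertx;

public class DrainExecutorModes {

  // INLINE: maintenance runs on the thread that touched the cache
  static Cache<String, Object> inline() {
    return Caffeine.newBuilder().maximumSize(1_000).executor(Runnable::run).build();
  }

  // ON_FORKJOIN: Caffeine's default executor, made explicit here
  static Cache<String, Object> onForkJoin() {
    return Caffeine.newBuilder().maximumSize(1_000).executor(ForkJoinPool.commonPool()).build();
  }

  // ON_EVENTLOOP: hand maintenance to a Vert.x event-loop context (hypothetical wiring)
  static Cache<String, Object> onEventLoop(Vertx vertx) {
    Context context = vertx.getOrCreateContext();
    Executor eventLoop = command -> context.runOnContext(v -> command.run());
    return Caffeine.newBuilder().maximumSize(1_000).executor(eventLoop).build();
  }
}
```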
While running a few performance tests with https://github.com/quarkusio/quarkus-quickstarts/tree/main/hibernate-reactive-panache-quickstart I can see BoundedLocalCache.scheduleDrainBuffers using the FJ pool to schedule the draining of buffers and, given that this pool is mostly idle, it causes paying the cost of waking it up.

It would be great to set up the cache to use the I/O threads, which are often busy and already awake; is it a good idea? Does it make sense? Is it recommended?
see https://github.com/ben-manes/caffeine/blob/6445bb7504219169e10ad75cf72bcedf1932f603/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L4005 for the API to configure the cache executor.
@ben-manes @Sanne wdyt?