BoundedLocalCache.scheduleDrainBuffers can perform draining on I/O threads #32317
Yep, you can configure this in the builder via the `executor` setting. The cache's own logic doesn't need it to maintain good performance: the maintenance work is very cheap (e.g. LRU pointer swaps) and the predecessor libraries amortized it on the calling thread. However, we don't know the cost of the eviction callback, or whether a user's long-running map computation blocks an entry's removal (ConcurrentHashMap's lock). Therefore deferring, so as not to penalize the caller and to get more predictable latencies, made sense, especially since it is easy to disable.

I don't know if you intend the I/O threads to perform user operations, as the executor is also used for the removal listener notifications and passed to the async cache loader. If you wanted those to not run on your I/O pool but remain asynchronous on FJP, you could have your removal listener forward the work onto an executor. Similarly, the async cache loader produces a future, so the given executor does not need to be used. So you can adjust all of this depending on what features are enabled and where you want the work to be performed.

Longer term we can switch the default to virtual threads when they stop pinning on mutexes. It doesn't make sense until that is fixed, because ConcurrentHashMap uses synchronized for internal locking. I am a bit concerned that when initially released in Java 21 it will be quite error prone due to this, as an ad hoc map computation will pin the carrier. It will be a better fit for I/O or these inexpensive compute tasks, and will avoid this native thread scheduling overhead.
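(For illustration, a minimal sketch of that wiring, assuming the standard Caffeine builder methods `executor` and `removalListener`; the single-thread `notificationPool` and the logged message are placeholders, not part of the discussion above.)

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

public class CacheExecutorSketch {
  public static void main(String[] args) {
    // hypothetical pool that absorbs potentially expensive eviction callbacks
    ExecutorService notificationPool = Executors.newSingleThreadExecutor();

    Cache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(10_000)
        // run the maintenance work on the calling thread instead of ForkJoinPool.commonPool()
        .executor(Runnable::run)
        // with a same-thread executor the listener would otherwise run on the caller,
        // so forward the (possibly expensive) work onto a dedicated executor
        .removalListener((String key, String value, RemovalCause cause) ->
            notificationPool.execute(() -> System.out.printf("evicted %s (%s)%n", key, cause)))
        .build();

    cache.put("k", "v");
    cache.invalidate("k");
    notificationPool.shutdown();
  }
}
```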
@ben-manes thanks for the fantastic explanations you often provide :)

@franz1981 could you share a bit more detail? I'd love to see the full stack to know which cache use we're discussing: we use Caffeine in many areas. I take it from Ben's explanation that it would be wise to check if there are any callbacks and/or if this is an "internal only" cache.
@ben-manes
Good point, I'll check with @Sanne (by showing him how we use it) whether such eviction work would be too costly for running it on the I/O thread(s) to be a good idea. Just a note: by I/O threads I mean the Netty event loop threads, and it's a "historical" name: with Vert.x (built on top of Netty) we usually run both user tasks and I/O work on them, as long as they are non-blocking (partially false: the Netty allocator, built on top of jemalloc, CAN and WILL block on arena locks if the thread-local buffers are exhausted).
OT: if Doug L had switched to using Cliff Click's non-blocking algorithm (which we host in JCTools), caring about any locks in CHM would no longer be a problem :P (I will ask on the concurrency list why he didn't; I couldn't find any info about it around)
I'm collecting some info to share here
Thanks for the info @franz1981! I am not deeply familiar with netty, vertx, or quarkus so that's interesting stuff. I haven't followed NBHM since high-scale-lib, but originally it had some cases that could make it surprising. The ones that I remember offhand are that keys must be immutable even after removal due to tombstones, removal can return stale entries, and until a resize the removed entries could not be GC'd. All of those might be fixed by now, though.

The locking in this case is for atomic computations, which NBHM does not support, though lock striping could be added on top to emulate that (reintroducing the problem of blocking computes). There were some early comparisons that you might find interesting. Unfortunately, last I heard the CS dept's server was damaged by a power outage and parts of his website, like concurrency-interest and the csv repo, were not recovered.

Doug was experimenting with avoiding built-in monitor locks by using a bitwise trie of AQS locks, which would be loom friendly and reduce the locking from hash bin to per-entry. I'd bet on monitors being supported by loom before he finishes that work, though.
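(As a sketch of the lock-striping idea: emulating an atomic computeIfAbsent on top of a map without built-in atomic computations. The class, stripe count, and use of a plain Map standing in for an NBHM-style implementation are invented for illustration; note it reintroduces the blocking computes the discussion is about.)

```java
import java.util.Map;
import java.util.function.Function;

public final class StripedComputeMap<K, V> {
  private final Object[] locks;
  private final Map<K, V> map; // a non-blocking map implementation could be dropped in here

  public StripedComputeMap(Map<K, V> map, int stripes) {
    this.map = map;
    this.locks = new Object[stripes];
    for (int i = 0; i < stripes; i++) {
      locks[i] = new Object();
    }
  }

  public V computeIfAbsent(K key, Function<? super K, ? extends V> mappingFunction) {
    V existing = map.get(key);
    if (existing != null) {
      return existing; // fast path, no locking
    }
    // pick a lock stripe by the key's hash; computes for keys on the same stripe block each other
    Object lock = locks[Math.floorMod(key.hashCode(), locks.length)];
    synchronized (lock) {
      existing = map.get(key);
      if (existing == null) {
        existing = mappingFunction.apply(key);
        if (existing != null) {
          map.put(key, existing);
        }
      }
      return existing;
    }
  }
}
```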
Some are still there it seems, good memory!
This one didn't change; indeed we had a few users complaining about this behavior being different from CHM. At this point NBHM seems a better solution for the long/int key variants we currently offer in JCTools.
Woah! Thanks for that; this fell off my radar; very, very interesting indeed!
@franz1981 when you have some cycles, a draft is here: #32448
fwiw, I think Quarkus' usage of Caffeine has the following properties: If those can be held true and other features like cache refresh (#25921) are not supported, then I believe you can safely force it to any executor you'd like (likely the calling thread).
Oh, I missed the hibernate cache implementation that you updated. That one is synchronous with its computes, so never mind the above.
@Sanne @ben-manes With such a test, the performance numbers suggest that INLINE is the best performer, followed by ON_FORKJOIN and ON_EVENTLOOP.

Note: ON_EVENTLOOP seems the same as the default behaviour, but it's not! The version I've tested includes another modification to the local cache, i.e. … Thanks to this, the most frequently issued drain-buffers task (it happens twice for each query, at very near distance) is already happening on the same caller thread, saving waking up the FJ pool (for nothing, given that the same added key is removed right after); see

```java
PutFromLoadValidator.Lock lock = putValidator.acquirePutFromLoadLock(session, key, txTimestamp);
if (lock == null) {
    if (trace) {
        log.tracef("Put from load lock not acquired for key %s", key);
    }
    return false;
}
try {
    cache.putIfAbsent(key, value);
} finally {
    putValidator.releasePutFromLoadLock(key, lock);
}
```

The … What has been surprising to me is that …
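(A rough illustration of how the three modes compared above could be wired through Caffeine's executor; this is not the benchmark code, and the Vert.x event-loop adapter in particular is an assumption.)

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import io.vertx.core.Context;
import io.vertx.core.Vertx;

public class DrainExecutorModes {

  // INLINE: maintenance runs on the thread that touched the cache
  static Cache<String, Object> inline() {
    return Caffeine.newBuilder().maximumSize(1_000).executor(Runnable::run).build();
  }

  // ON_FORKJOIN: Caffeine's default executor, made explicit here
  static Cache<String, Object> onForkJoin() {
    return Caffeine.newBuilder().maximumSize(1_000).executor(ForkJoinPool.commonPool()).build();
  }

  // ON_EVENTLOOP: hand maintenance to a Vert.x event-loop context (hypothetical wiring)
  static Cache<String, Object> onEventLoop(Vertx vertx) {
    Context context = vertx.getOrCreateContext();
    Executor eventLoop = command -> context.runOnContext(v -> command.run());
    return Caffeine.newBuilder().maximumSize(1_000).executor(eventLoop).build();
  }
}
```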
While running a few performance tests with https://github.com/quarkusio/quarkus-quickstarts/tree/main/hibernate-reactive-panache-quickstart I can see BoundedLocalCache.scheduleDrainBuffers using the FJ pool to schedule the draining of buffers and, given that this pool is mostly idle, it causes paying the cost of waking it up.

It would be great to set up the cache to use the I/O threads, which are often busy and already awake; is it a good idea? Does it make sense? Is it recommended?
see https://github.com/ben-manes/caffeine/blob/6445bb7504219169e10ad75cf72bcedf1932f603/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L4005 for the API to configure the cache executor.
@ben-manes @Sanne wdyt?