
BoundedLocalCache.scheduleDrainBuffers can perform draining on I/O threads #32317

Closed
franz1981 opened this issue Mar 31, 2023 · 10 comments · Fixed by hibernate/quarkus-local-cache#6

@franz1981
Contributor

franz1981 commented Mar 31, 2023

While running a few performance tests with https://github.com/quarkusio/quarkus-quickstarts/tree/main/hibernate-reactive-panache-quickstart I can see BoundedLocalCache.scheduleDrainBuffers using the ForkJoin pool to schedule the draining of buffers; given that this pool is mostly idle, we pay the cost of waking it up, see

[profiler screenshot]

It would be great to set up the cache to use the I/O threads, which are usually busy and already awake; is that a good idea? Does it make sense? Is it recommended?

see https://github.com/ben-manes/caffeine/blob/6445bb7504219169e10ad75cf72bcedf1932f603/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L4005 for the API to configure the cache executor.

@ben-manes @Sanne wdyt?

@quarkus-bot

quarkus-bot bot commented Mar 31, 2023

/cc @gwenneg (cache), @mkouba (scheduler)

@ben-manes

Yep, you can configure this in the builder using Caffeine.executor(e) when creating the cache. It is fairly common to see a same-thread executor applied (e.g. Runnable::run), as Infinispan does. That's especially nice for unit tests.
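For illustration, a minimal sketch of that configuration (the key/value types and the maximum size are made up):

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;

    // Maintenance work (and any removal listener) runs on the calling thread
    // instead of the default ForkJoinPool.commonPool().
    Cache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(10_000)          // illustrative bound
        .executor(Runnable::run)      // same-thread executor, as Infinispan does
        .build();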

The cache's own logic doesn't need it to maintain good performance. The maintenance work is very cheap (e.g. LRU pointer swaps) and the predecessor libraries amortized it on the calling thread. However, we don't know the cost of the eviction callback or whether a user's long-running map computation blocks an entry's removal (ConcurrentHashMap's lock). Therefore it made sense to defer the work so the caller isn't penalized and latencies stay more predictable, especially since it is easy to disable.

I don't know if you intend the I/O threads to perform user operations, as the executor is also used for removal listener notifications and passed to the async cache loader. If you want those to stay off your I/O pool but remain asynchronous on the FJP, you could have your removal listener forward the work onto an executor. Similarly, the async cache loader produces a future, so the given executor does not need to be used. You can adjust all of this depending on which features are enabled and where you want the work to be performed.
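As a rough sketch of the forwarding idea (notificationPool and handleRemoval are hypothetical names, not part of any API discussed here): keep the maintenance inline, but hand the removal notification off to a pool of your choosing.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.RemovalCause;

    ExecutorService notificationPool = Executors.newSingleThreadExecutor(); // hypothetical pool

    Cache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .executor(Runnable::run) // cache maintenance stays on the calling thread
        .removalListener((String key, String value, RemovalCause cause) ->
            // hand the (possibly expensive) callback off the caller / I/O thread;
            // handleRemoval is a placeholder for application logic
            notificationPool.execute(() -> handleRemoval(key, value, cause)))
        .build();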

Longer term we can switch the default to virtual threads once they stop pinning on mutexes. It doesn't make sense until that is fixed, because ConcurrentHashMap uses synchronized for internal locking. I am a bit concerned that when initially released in Java 21 this will be quite error prone, since an ad hoc map computation will pin the carrier thread. Virtual threads will be a better fit for I/O or these inexpensive compute tasks, and will avoid this native thread scheduling overhead.
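If and when that changes, wiring virtual threads in would look roughly like this (Java 21+, purely illustrative and not a recommendation, for the reasons above):

    import java.util.concurrent.Executors;

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;

    // Illustrative only: as noted above, a compute that hits ConcurrentHashMap's
    // synchronized bins would currently pin the carrier thread.
    Cache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .executor(Executors.newVirtualThreadPerTaskExecutor())
        .build();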

@Sanne
Member

Sanne commented Apr 2, 2023

@ben-manes thanks for the fantastic explanations you often provide :)

@franz1981 could you share a bit more detail? I'd love to see the full stack to know which cache use we're discussing: we use Caffeine in many areas. I take it from Ben's explanation that it would be wise to check whether there are any callbacks and/or whether this is an "internal only" cache.

@franz1981
Contributor Author

franz1981 commented Apr 3, 2023

@ben-manes
Thanks for the detailed answer!

However, we don't know the cost of the eviction callback or if a user's long running map computation blocks an entry's removal (ConcurrentHashMap's lock)

Good point, I'll check with @Sanne (by showing him how we use it): if such eviction work turns out to be too costly, running it on the I/O thread(s) could be a good idea.

Just a note: by I/O threads I mean the Netty event loop threads, and it's a "historical" name: with Vert.x (built on top of Netty) we usually run both user tasks and I/O work on them, as long as it's non-blocking (partially false: the Netty allocator, built on top of jemalloc, CAN and WILL block on arena locks if the thread-local buffers are exhausted).
Hence the sole requirement for work submitted there is that it must not block (or be super computationally intensive, which basically causes head-of-line blocking for the other tasks and I/O events).
If it's blocking work, it's better to run it on the "blocking thread pool" we already provide; but I see no point in creating FJ pools that are unlikely to be used for any other purpose in Quarkus, unless users already use them (possible but unlikely) or are running a Loom-powered task (via @RunOnVirtualThread, but still unlikely in prod, see FasterXML/jackson-core#919).
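As a concrete sketch of the idea, assuming a Vert.x Context is at hand when the cache is built (names are illustrative):

    import java.util.concurrent.Executor;

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import io.vertx.core.Context;
    import io.vertx.core.Vertx;

    Vertx vertx = Vertx.vertx();
    Context context = vertx.getOrCreateContext();

    // Adapter that schedules Caffeine's maintenance tasks on the event loop;
    // only acceptable because the drain work is cheap and non-blocking.
    Executor eventLoopExecutor = command -> context.runOnContext(v -> command.run());

    Cache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .executor(eventLoopExecutor)
        .build();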

It doesn't make sense until that is fixed, because ConcurrentHashMap uses synchronized for internal locking

OT: if Doug Lea had switched to Cliff Click's non-blocking algorithm (which we host in JCTools), worrying about any locks in CHM would be a non-problem :P (I'll ask on the concurrency list why he didn't; I couldn't find any info about it around)

@Sanne

could you share a bit more detail?

I'm collecting some info to share here

@ben-manes

Thanks for the info @franz1981! I am not deeply familiar with netty, vertx, or quarkus so that's interesting stuff.

I haven't followed NBHM since high-scale-lib, but originally it had some cases that could make it surprising. The ones that I remember off hand are that keys must be immutable even after removal due to tombstones, removal can return stale entries, and until a resize the removed entries could not be GC'd. All of those might be fixed by now, though. The locking in this case is for atomic computations which NBHM does not support, though lock striping could be added on top to emulate that (reintroducing the problem of blocking computes). There were some early comparisons that you might find interesting.
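To make the lock-striping remark concrete, a rough sketch (all names are made up; the ConcurrentHashMap here merely stands in for an NBHM-style map that lacks atomic computes):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.function.BiFunction;

    // Emulates an atomic compute() on top of a map without one by striping
    // explicit locks over the key space; note this reintroduces blocking.
    final class StripedComputeMap<K, V> {
      private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>(); // stand-in for NBHM
      private final Object[] locks = new Object[64];

      StripedComputeMap() {
        for (int i = 0; i < locks.length; i++) {
          locks[i] = new Object();
        }
      }

      V compute(K key, BiFunction<? super K, ? super V, ? extends V> remapping) {
        Object lock = locks[Math.floorMod(key.hashCode(), locks.length)];
        synchronized (lock) { // a long-running remapping blocks every key on this stripe
          V oldValue = map.get(key);
          V newValue = remapping.apply(key, oldValue);
          if (newValue == null) {
            map.remove(key);
          } else {
            map.put(key, newValue);
          }
          return newValue;
        }
      }
    }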

Unfortunately, last I heard the CS department's server was damaged by a power outage, and parts of his website like concurrency-interest and the CVS repo were not recovered. Doug was experimenting with avoiding built-in monitor locks by using a bitwise trie of AQS locks, which would be Loom-friendly and reduce the locking from per-hash-bin to per-entry. I'd bet on monitors being supported by Loom before he finishes that work, though.

@franz1981
Contributor Author

@ben-manes

The ones that I remember off hand are that keys must be immutable even after removal due to tombstones, removal can return stale entries, and until a resize the removed entries could not be GC'd. All of those might be fixed by now, though

Some of those are still there, it seems; good memory!
See https://github.com/JCTools/JCTools/blob/29e455969bca5443e24b7a46bf93dfca2c401f3b/jctools-core/src/test/java/org/jctools/maps/NBHMRemoveTest.java#L65

The locking in this case is for atomic computations which NBHM does not support

This one hasn't changed; indeed, we had a few users complaining about this behavior differing from CHM

At this point NBHM seems a better fit for the long/int key variants we currently offer in JCTools

There were some early comparisons that you might find interesting.

Whoa! Thanks for that; this had fallen off my radar; very interesting indeed!

@Sanne
Member

Sanne commented Apr 5, 2023

@franz1981 when you have some cycles, a draft is here: #32448
(it's super simple but also requires building hibernate/quarkus-local-cache#6)

@ben-manes

fwiw, I think Quarkus' usage of Caffeine has the following properties:
a. AsyncCache operations are performed by the calling thread through a putIfAbsent of an incomplete future that it then runs (see the sketch below).
b. There is no support for users to configure a removal callback (#32413).
c. There are no long-running computes, if we can assume that getAsync(K key, Function<K, Uni<V>> valueLoader) is a trivial operation that just returns the reactive type (but we can't trust that for a user-provided foreign function call).

If those hold true and other features like cache refresh (#25921) are not supported, then I believe you can safely force it onto any executor you'd like (likely the calling thread).
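A minimal sketch of the pattern in (a), using Caffeine's AsyncCache view directly (loadValue is a placeholder for whatever produces the value):

    import java.util.concurrent.CompletableFuture;

    import com.github.benmanes.caffeine.cache.AsyncCache;
    import com.github.benmanes.caffeine.cache.Caffeine;

    AsyncCache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .buildAsync();

    String key = "some-key";
    CompletableFuture<String> future = new CompletableFuture<>();
    // Publish the incomplete future first, so concurrent callers share it...
    CompletableFuture<String> existing = cache.asMap().putIfAbsent(key, future);
    if (existing == null) {
      // ...then the calling thread runs the load itself; no executor is involved.
      try {
        future.complete(loadValue(key)); // loadValue is a placeholder
      } catch (Throwable t) {
        future.completeExceptionally(t);
        cache.asMap().remove(key, future); // don't cache the failure
      }
    }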

@ben-manes

Oh, I missed the hibernate cache implementation that you updated. That one is synchronous with its computes, so never mind the above.

@franz1981
Contributor Author

franz1981 commented Apr 6, 2023

@Sanne @ben-manes
I've performed some runs of https://github.com/quarkusio/quarkus-quickstarts/tree/main/hibernate-reactive-panache-quickstart using the 3 options provided: INLINE, ON_FORKJOIN and ON_EVENTLOOP.
Remember: the test is not caching the query result as a whole, but just the single entities found; basically it keeps querying the DB while updating the cache each time.

With this test, the performance numbers suggest that INLINE is the best performer, followed by ON_FORKJOIN and ON_EVENTLOOP.

Note: ON_EVENTLOOP seems the same as the default behaviour, but it's not!

The version I've tested includes another modification to the local cache, i.e.
hibernate/quarkus-local-cache#6 (comment)

Thanks to this, the most frequently issued drain-buffers task (it happens twice for each query, very close together) already runs on the calling thread, saving a wake-up of the FJ pool for nothing (given that the same added key is removed right after); see

        PutFromLoadValidator.Lock lock = putValidator.acquirePutFromLoadLock(session, key, txTimestamp);
        if (lock == null) {
            if (trace) {
                log.tracef("Put from load lock not acquired for key %s", key);
            }
            return false;
        }

        try {
            cache.putIfAbsent(key, value);
        } finally {
            putValidator.releasePutFromLoadLock(key, lock);
        }

The putValidator.* ops add and remove a key in a different Caffeine cache than the one receiving cache.putIfAbsent.

What surprised me is that ON_EVENTLOOP performed slightly worse than ON_FORKJOIN, though within the error margins.
