EvictionMediator === weight of zero? #11

Open

ben-manes opened this issue Oct 21, 2015 · 7 comments

@ben-manes

MediatedEvictionConcurrentHashMap is a fork of ConcurrentLinkedHashMap v1.3 that provides more control over which victim is selected for eviction. If an entry is currently in use, e.g. locked as part of a transaction, then it is not eligible and the cache should find another candidate. The mediator helps coordinate this in a state machine, and also provides a synchronous hook for persisting the evicted entry.

ConcurrentLinkedHashMap is being replaced by Caffeine, so I'd like to see if the features of that library would serve your purpose. If so, it provides an upgrade path and lets you avoid maintaining a fork.

The challenge with mediation as a feature in a generic cache is that in the worst case it is O(n), and perhaps no candidate is available at all. That breaks the contract and, while arguably a user error, is not good API design for a general-purpose library. This is evident in Ehcache's EvictionVeto, which tries to serve this case. However, if all (8) evaluated entries are vetoed, then Ehcache evicts a non-eligible entry anyway; this is not documented and may lead to surprising behavior.

An alternative idea might be to model this using weights. An entry with a weight of 0 is not evicted by the size-based policy and does not count towards the cache's maximum size. The cache can still find a victim to remove by skipping these entries, so its functionality remains correct. The caveat that those entries don't count towards eviction may lead to a larger overall cache size than a mediated cache, which would find a victim in the common case. It also requires a write into the cache to modify the entry's state (via compute), which is slower than atomically toggling a flag on the entry itself. However, when combined with a CacheWriter for a synchronous hook on eviction, it would serve the purpose Hydra needs.
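Roughly, the combination would look something like the sketch below, against the v2 builder API; the Entry type, its pinned flag, and the persist hook are hypothetical stand-ins, and it assumes the write (via compute) re-runs the weigher as described above.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.CacheWriter;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

public final class PinnedWeightSketch {
  // Hypothetical value type: a "pinned" entry is in use (e.g. locked by a
  // transaction) and must not be chosen as an eviction victim.
  static final class Entry {
    final byte[] data;
    volatile boolean pinned;
    Entry(byte[] data) { this.data = data; }
  }

  public static void main(String[] args) {
    Cache<String, Entry> cache = Caffeine.newBuilder()
        .maximumWeight(1_000_000)
        // A pinned entry weighs 0, so it neither counts towards the maximum
        // size nor is selected as a victim by the size-based policy.
        .weigher((String key, Entry value) -> value.pinned ? 0 : value.data.length)
        // Synchronous hook invoked when an entry is removed, e.g. to persist it.
        .writer(new CacheWriter<String, Entry>() {
          @Override public void write(String key, Entry value) { /* no-op */ }
          @Override public void delete(String key, Entry value, RemovalCause cause) {
            if (cause.wasEvicted()) {
              persist(key, value); // hypothetical persistence hook
            }
          }
        })
        .build();

    cache.put("a", new Entry(new byte[128]));

    // Toggling the pin requires a write so the weight is re-evaluated;
    // compute(...) performs the update atomically under the entry's lock.
    cache.asMap().compute("a", (key, value) -> {
      value.pinned = true;
      return value;
    });
  }

  static void persist(String key, Entry value) { /* ... */ }
}
```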

Thoughts?

@tea-dragon
Contributor

I actually spent quite a bit of time thinking about this a while ago. I think I might have even switched to Caffeine for one of our simpler loading-cache cases because it looked so neat. I figured even just the cache writer's synchronous listener, combined with a transient layer where leased-but-evicted entries could live, would suffice and probably be fine -- presumably if something has just expressed interest in using an entry, it isn't likely to be evicted anyway. I've never considered trying to modify an entry's weight before, though. I'm seeing now that Caffeine has a small but important "and update time" clause added to when weights can be computed. That might be useful in a few other places as well.

One of the issues I have with the current (mediated) cache is that occasionally you get these pathological cases where the cache becomes way oversized, but every entry is considered ineligible for eviction. The API provided here only allows that knowledge to be shared as it slowly iterates through candidates. I've always thought Hydra's needs would be best met by either:

  • further specializing the cache to handle leases -- and so only adding entries to the eviction queue when the lease count hits zero
  • or just doing a better job of keeping keys around rather than the values themselves, and then using the shared locks of the concurrent map methods on the cache itself (e.g. compute) for any logic that used to use leasing -- roughly as sketched below
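A rough sketch of that second option, leaning on the map's per-entry locking instead of explicit leases; Page, loadFromDisk, and mutate are hypothetical stand-ins for Hydra's types:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public final class ComputeInsteadOfLeases {
  static final class Page { /* ... */ }

  final Cache<String, Page> cache = Caffeine.newBuilder()
      .maximumSize(10_000)
      .build();

  void updatePage(String key) {
    // The remapping function runs while the map holds the entry's lock, so the
    // read-modify-write is atomic without a separate lease/pin mechanism.
    cache.asMap().compute(key, (k, page) -> {
      if (page == null) {
        page = loadFromDisk(k); // hypothetical
      }
      mutate(page);             // hypothetical critical-section work
      return page;
    });
  }

  static Page loadFromDisk(String key) { return new Page(); }
  static void mutate(Page page) { /* ... */ }
}
```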

If entries with zero weight don't add to eviction scan times, then that sounds like it could be a real improvement. The information is available ahead of time, so I think it should be possible, but it relies on that optimization existing. If not, it's still pretty much a lateral move at worst, but maybe we'd rather just have a fallback layer to address the cache thrashing at the same time.

Oh. I think at the time I was last evaluating it, cache writer might not have been in any released version. I trust that's since changed, but I have a hard time keeping track of every library I like.

@ben-manes
Author

It's been a long development slog, so I finally got to a point where it seemed reasonable to start this discussion. I don't think versions prior to v2 will fit your use case.

In v1.x there's a race where an eviction candidate is selected, the weight is concurrently updated to zero, and the entry is still evicted. This should be fixed in v2; it wasn't a user-reported bug, but something I caught when rewriting the eviction policy. So v1.x won't work for you using the weight trick.

One reason for bringing this topic up was to find out whether zero weights will be used enough to deserve being optimized. In my backlog is the idea of putting those entries on their own queue so that they are not scanned by the eviction policy. The queue is needed, rather than letting them float free, because (1) they still count for expireAfterAccess and (2) policy-ordered iteration is a seldom-used but useful feature (so we need to retain access order). If you think it's an important optimization, then I can try to cram it into the upcoming release or include it in the following one.

A difference that might help reduce the impact of the pathological case is that, by default, the eviction work is delegated to an Executor, specifically ForkJoinPool.commonPool(), which was added in Java 8. That doesn't remove the worst case, but at least the penalty is not visible to a user thread.
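For illustration, a sketch of the two configurations: the default, which runs maintenance on the common pool, versus opting back into CLHM-style behavior with a same-thread executor via the builder.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public final class ExecutorChoice {
  public static void main(String[] args) {
    // Default: maintenance (including eviction) is delegated to
    // ForkJoinPool.commonPool(), keeping the penalty off of user threads.
    Cache<String, String> asyncMaintenance = Caffeine.newBuilder()
        .maximumSize(10_000)
        .build();

    // A same-thread executor makes the calling thread perform the
    // maintenance work itself, similar to CLHM's amortized approach.
    Cache<String, String> sameThreadMaintenance = Caffeine.newBuilder()
        .maximumSize(10_000)
        .executor(Runnable::run)
        .build();
  }
}
```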

On a side topic, if this worked out it would be interesting to see the impact of the new eviction policy, Window TinyLfu.

@tea-dragon
Contributor

It would be a convenient optimization, and maybe important to other cases that expect to have many zero weight entries. However, I feel like the more correct solution is to not have so many transiently unevictable entries. I guess they are somewhat unavoidable if you have to lock on multiple cache values, but I'm not sure how commonly that is required. Certainly convenient though. People tend to do convenient things.

Not punishing a user thread is nice, but at this point, every thread is a user thread to me (even the ones that aren't in globally accessible thread pools). Any CPU core spinning on nothing is a pretty bad place to be. On the other hand, the unimpeded user threads would probably either quickly clear the pathological case or OOM the whole JVM anyway. That's definitely more helpful/noticeable than stalling out.

Speaking of OOMing the JVM, is there a cap on how many pending evictions can be foisted onto the fork-join pool before additional loads block? If not, I'd probably have to disable it anyway, and if so, I guess I just wouldn't get those earlier benefits. This is getting hard to look forward to...

New eviction algorithms are sweet, though! Especially if I end up being able to replace our skip list cache as well. And even if not, free Java code!

@tea-dragon
Contributor

Thank you so much for taking an interest by the way. Nothing eases my mind as much as deleting code.

@ben-manes
Author

There aren't any explicit caps.

In CLHM the amount of work for the policy maintenance was capped and amortized. That made sense so as not to overly penalize calling threads, but now that the work is on the FJP it isn't as useful. I guess you could say there is a cap with respect to the read buffers, which are lossy ring buffers used to replay accesses on the policy.

There isn't any throttling of loads vs evictions, except from the perspective of CHM's bucket locks. Because the expectation is that the work is small batches of O(1) operations, some form of rate limiting hasn't made sense.

The FJP is only spammed when notifying the removal listener; maintenance is a single task with guards to avoid thrashing the pool. So only a slow CacheWriter.delete(...) could be harmful, but that's usually easy to mitigate.

I'm hopeful that this will end up letting you delete the code, but if not at least we'll learn more about the boundary conditions.

@tea-dragon
Contributor

No concerns about user threads outnumbering the single maintenance task and outpacing evictions with creations?

@ben-manes
Author

Nope, artificial stress testing never showed that happening. Writes take a slower path: going through the hash table, submitting the work to a write buffer (a custom lock-free queue), and trying to schedule the maintenance task if it's not already running. The maintenance always rips through its work faster than the writes come in. None of this is very different from CLHM, which would be mimicked by using a same-thread executor.

The only concern is if a custom CacheWriter makes the eviction really slow, so that the above no longer holds. That would have the exact same impact as if your mediator were slow, since they intercept at a similar point. It's usually easy to mitigate by having the synchronous write do the minimal work (e.g. persist header data) and then schedule the rest of the work asynchronously. So just like your mediator patch, it puts more responsibility on the user, since we're calling foreign code in a critical section.
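For example, a hypothetical two-phase writer along these lines; persistHeader and persistBody are stand-ins for whatever the synchronous and deferred work would be.

```java
import com.github.benmanes.caffeine.cache.CacheWriter;
import com.github.benmanes.caffeine.cache.RemovalCause;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Do only the minimal synchronous work in delete(...) and hand the expensive
// part off to another thread so eviction stays fast.
final class TwoPhaseWriter implements CacheWriter<String, byte[]> {
  private final ExecutorService background = Executors.newSingleThreadExecutor();

  @Override public void write(String key, byte[] value) { /* no-op */ }

  @Override public void delete(String key, byte[] value, RemovalCause cause) {
    if (cause.wasEvicted()) {
      persistHeader(key, value);                          // small, synchronous
      background.execute(() -> persistBody(key, value));  // slow, asynchronous
    }
  }

  private void persistHeader(String key, byte[] value) { /* ... */ }
  private void persistBody(String key, byte[] value) { /* ... */ }
}
```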
