Concurrent access issue in EvictableCache
#23384
Comments
@raunaqmorarka can you take a look?
@Pluies awesome analysis! If this fails, we should leverage the fact that you have a repro inside your environment.
Here is a test that breaks:

```java
@RepeatedTest(1000)
public void testLongLoadingCacheEntries()
{
    Cache<String, String> cache = EvictableCacheBuilder.newBuilder()
            .expireAfterWrite(Duration.ofSeconds(60))
            .maximumSize(10)
            .build();
    try (ExecutorService executor = Executors.newFixedThreadPool(2)) {
        Runnable longRunningCacheLoad = () -> {
            try {
                // adding Thread.sleep() to the loader will make it fail even more frequently
                cache.get("key", () -> "value");
            }
            catch (ExecutionException e) {
                throw new RuntimeException(e);
            }
        };
        executor.submit(longRunningCacheLoad);
        executor.submit(longRunningCacheLoad);
    }
    assertThat(cache.getIfPresent("key")).isNotNull();
}
```

The race is inside trino/lib/trino-cache/src/main/java/io/trino/cache/EvictableCache.java (lines 234 to 240 in b87e216).
is correct, as per this comment:

The key field inside Token is the same, but because Token uses instance-based equality, the fact that the token was removed means a new cache key gets produced.

Opened a PR fixing this behavior:
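To illustrate the mechanism described above, here is a minimal, self-contained sketch (not Trino's actual `EvictableCache` code; the two maps and the simplified `Token` are stand-ins) showing how instance-based `Token` equality strands a `dataCache` entry once the token has been removed from the `tokens` map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DanglingTokenSketch
{
    // Like the Token described above: it carries the key but relies on identity equality
    static final class Token<K>
    {
        final K key;

        Token(K key)
        {
            this.key = key;
        }
    }

    public static void main(String[] args)
    {
        Map<String, Token<String>> tokens = new ConcurrentHashMap<>();
        Map<Token<String>, String> dataCache = new ConcurrentHashMap<>();

        // Loader A registers a token for "key" and starts a slow load
        Token<String> tokenA = tokens.computeIfAbsent("key", Token::new);

        // While A is still loading, the token gets removed (e.g. by eviction)...
        tokens.remove("key", tokenA);

        // ...and a concurrent loader B registers a brand-new token for the same key
        Token<String> tokenB = tokens.computeIfAbsent("key", Token::new);

        // A finishes and stores its value under the now-stale tokenA
        dataCache.put(tokenA, "value");

        // Lookups resolve key -> current token -> dataCache, so they miss:
        System.out.println(dataCache.get(tokens.get("key"))); // null
        System.out.println(dataCache.get(tokenB)); // null; the value is stranded under tokenA
    }
}
```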
we know the ongoing loads (

@oskar-szwajkowski thanks for the analysis.
@findepi Here is a screenshot from the debugger plus a diagram of one possible thread interleaving that could lead to this. I cannot find the root cause in the cache code, but I can see that it happens.

Here is the timeline of the thread race:

This screenshot from the debugger shows this. Thread 1 locks … There must be a race between …

Here is code that also does not exhibit the race behavior (the one in the linked PR is also race-free as far as my testing goes), but it looks hacky:

```java
// Token eviction via removalListener is blocked during loading, so we may need to do manual cleanup
private void removeDangling(Token<K> token, Object value)
{
    try {
        dataCache.get(token, () -> {
            throw new RuntimeException("No value in cache");
        });
    }
    catch (ExecutionException e) {
        throw new RuntimeException(e);
    }
    catch (RuntimeException e) {
        if (e.getMessage().equals("No value in cache")) {
            tokens.remove(token.getKey(), token);
        }
    }
}
```
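One general `java.util.concurrent` detail worth noting about the cleanup above (this is standard `ConcurrentMap` behaviour, not anything Trino-specific): the two-argument `tokens.remove(token.getKey(), token)` only removes the mapping if the key still points at that exact token instance, so a token that a concurrent loader has already replaced is left untouched. A minimal illustration:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConditionalRemoveSketch
{
    public static void main(String[] args)
    {
        ConcurrentMap<String, Object> tokens = new ConcurrentHashMap<>();
        Object staleToken = new Object();
        Object currentToken = new Object();

        tokens.put("key", currentToken);

        // No-op: "key" is mapped to currentToken, not staleToken
        boolean removed = tokens.remove("key", staleToken);

        System.out.println(removed); // false
        System.out.println(tokens.get("key") == currentToken); // true
    }
}
```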
@Pluies let us know whether 458 works for you and once again, thank you for the detailed issue report :)
This is an awesome summary of the problem @oskar-szwajkowski, thank you
Hey folks,
I've been tracking down a rare issue of really long optimizing or planning times in Delta Lake after upgrading from Trino 448 to 453, leading to infrequent optimizer timeouts or planning timeouts.
Eventually, I managed to consistently reproduce it by running two concurrent MERGE INTO queries that use the same source table. They look something like:
And:
Where `src_query_foo` and `src_query_bar` are both complex-ish queries involving the same source table, let's call it `instruction_calls_decoded_0018` (picked totally at random 😄).

In Trino 448, these queries used to take 45s or so. In Trino 453, they fail after reaching the default optimizer timeout of 3 minutes.
Looking at tracing, what takes the most time is the optimizing step, which calls `Metadata.getTableStatistics` several times in a row:
Interestingly, things work correctly (like in Trino 448) when:
For example a single-threaded run looks like:
We can see just as many `Metadata.getTableStatistics` calls happening, but only the first one takes a long time; subsequent calls use the cache and return very quickly (ish - it still works through a bunch of data, but at least it doesn't reload everything from object store).

(Aside: the fact this takes 35 to 40 seconds is due to the table being massive and under-optimized. The checkpoint itself comes in at 250MB and the table has 1M files. We know this is bad, and we've brought it down to ~20 seconds with some optimisations, so the next logs might only show 20s instead of 40s for each getTableStatistics call. On the plus side, it's been a great way to spot this bug!)
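For reference, this is the caching behaviour we expect (and see in the single-threaded case): the first lookup runs the expensive loader, and subsequent lookups for the same key are served from the cache. A generic Guava sketch, with a made-up key and loader rather than the actual Delta Lake statistics code:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

public class CacheHitSketch
{
    public static void main(String[] args) throws ExecutionException
    {
        AtomicInteger loads = new AtomicInteger();
        Cache<String, String> statsCache = CacheBuilder.newBuilder()
                .expireAfterWrite(Duration.ofMinutes(30))
                .maximumSize(1_000)
                .build();

        // First call runs the (expensive) loader
        statsCache.get("table@version", () -> {
            loads.incrementAndGet();
            return "statistics";
        });

        // Second call for the same key should be a cache hit and not reload
        statsCache.get("table@version", () -> {
            loads.incrementAndGet();
            return "statistics";
        });

        System.out.println(loads.get()); // expected: 1
    }
}
```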
Looking at the code leads us to `TransactionLogAccess`, which has two caches:

- `tableSnapshots` (link), which caches whole-table snapshots
- `activeDataFileCache` (link), which caches active files for a given table version

The first cache is not hit in our queries, as they use different connector handles (with different filters etc.) from `instruction_calls_decoded_0018`. The behaviour is identical in Trino 448 and 453.

`activeDataFileCache` however should be used. It is correctly hit in Trino 448 and returns cached files, avoiding the costly retrieval of all these files from object store.

In Trino 453 with concurrent access, we're seeing the two queries each populating this cache and then not being able to read back from it after the other query has written to it, leading to a situation where we see:
The additional logging checks the cache before looking it up here, logs `Starting getActiveFiles` at the start of getActiveFiles, and `STARTING READING` / `FINISHED READING` at the start & end of the lambda function that populates the cache.
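To make that instrumentation concrete, the logged lookup is shaped roughly like this. This is only a sketch: `activeDataFileCache` and the table-version key are as described above, but the class, method signatures, and `loadActiveDataFilesFromObjectStore` are simplified stand-ins, and the log messages are the ones quoted:

```java
import com.google.common.cache.Cache;

import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.logging.Logger;

// Sketch only: approximates the shape of the TransactionLogAccess lookup, not its real signatures
class ActiveFilesLookupSketch
{
    private static final Logger LOG = Logger.getLogger(ActiveFilesLookupSketch.class.getName());

    record TableVersion(String tableLocation, long version) {}

    private final Cache<TableVersion, List<String>> activeDataFileCache;

    ActiveFilesLookupSketch(Cache<TableVersion, List<String>> activeDataFileCache)
    {
        this.activeDataFileCache = activeDataFileCache;
    }

    List<String> getActiveFiles(TableVersion tableVersion)
            throws ExecutionException
    {
        LOG.info("Starting getActiveFiles"); // logged on every call
        LOG.info("cached: " + (activeDataFileCache.getIfPresent(tableVersion) != null)); // check before looking it up
        return activeDataFileCache.get(tableVersion, () -> {
            LOG.info("STARTING READING"); // the loader should only run on a cache miss
            List<String> files = loadActiveDataFilesFromObjectStore(tableVersion);
            LOG.info("FINISHED READING");
            return files;
        });
    }

    private List<String> loadActiveDataFilesFromObjectStore(TableVersion tableVersion)
    {
        // placeholder for the expensive transaction-log / object-store scan
        return List.of();
    }
}
```

With a healthy cache, a concurrent run should show `STARTING READING` / `FINISHED READING` only once per table version; in the failing case each query logs its own pair.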
We thought of potential issues being:

-> the connector is running `delta.metadata.live-files.cache-size=21.76GB` and `delta.metadata.live-files.cache-ttl=30m`, which are both large values. We tried bumping the cache size to 60GB to double-check, and the problem persisted.

-> this doesn't make sense, as the logic hasn't changed between Trino 448 and 453. The table name, schema, and version are all identical, so the cache should be hit. To confirm this wasn't the cause, we tried rewriting the cache to use a super-simple `record` of two strings as the cache key instead of the more complex `TableVersion`, but the problem persisted.

Eventually, after several debugging sessions and code dives, and looking at changes between 448 and 453, we zeroed in on #22302 as the root cause of this issue.
When this commit is reverted, the original behaviour is restored and concurrent cache access works again. From a fresh cache, logs then look like:
The thing is... We're still not fully sure why! We've tried writing a test case to confirm this behaviour, but the cache behaves properly under test conditions, so we haven't been able to confirm at this level unfortunately.
However, because the issue is in the `EvictableCache` implementation, this has the potential to impact things everywhere in the Trino codebase, so this is probably fairly important, and I thought it'd be best to raise it straight away.

cc @findepi as you wrote the code to fix the cache 🙏 sorry there isn't a proper repro, but hopefully this is enough to get you going!