Concurrent access issue in EvictableCache
#23384
Comments
@raunaqmorarka can you take a look?
@Pluies awesome analysis! If this fails, we should leverage the fact that you have a repro inside your environment.
Here is a test that breaks:

```java
@RepeatedTest(1000)
public void testLongLoadingCacheEntries()
{
    Cache<String, String> cache = EvictableCacheBuilder.newBuilder()
            .expireAfterWrite(Duration.ofSeconds(60))
            .maximumSize(10)
            .build();
    try (ExecutorService executor = Executors.newFixedThreadPool(2)) {
        Runnable longRunningCacheLoad = () -> {
            try {
                // adding Thread.sleep() to the loader will make it fail even more frequently
                cache.get("key", () -> "value");
            }
            catch (ExecutionException e) {
                throw new RuntimeException(e);
            }
        };
        executor.submit(longRunningCacheLoad);
        executor.submit(longRunningCacheLoad);
    }
    assertThat(cache.getIfPresent("key")).isNotNull();
}
```

The race is inside trino/lib/trino-cache/src/main/java/io/trino/cache/EvictableCache.java (lines 234 to 240 in b87e216).
is correct, as per this comment:

The key field inside Token is the same, but because Token uses instance-based equality, the fact that the token was removed means a new cache key gets produced.

Opened a PR fixing this behavior:
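To illustrate the mechanism described above, here is a minimal, self-contained sketch (not Trino's actual `EvictableCache` code; the two maps and the simplified `Token` are stand-ins) showing how instance-based `Token` equality strands a `dataCache` entry once the token has been removed from the `tokens` map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DanglingTokenSketch
{
    // Like the Token described above: it carries the key but relies on identity equality
    static final class Token<K>
    {
        final K key;

        Token(K key)
        {
            this.key = key;
        }
    }

    public static void main(String[] args)
    {
        Map<String, Token<String>> tokens = new ConcurrentHashMap<>();
        Map<Token<String>, String> dataCache = new ConcurrentHashMap<>();

        // Loader A registers a token for "key" and starts a slow load
        Token<String> tokenA = tokens.computeIfAbsent("key", Token::new);

        // While A is still loading, the token gets removed (e.g. by eviction)...
        tokens.remove("key", tokenA);

        // ...and a concurrent loader B registers a brand-new token for the same key
        Token<String> tokenB = tokens.computeIfAbsent("key", Token::new);

        // A finishes and stores its value under the now-stale tokenA
        dataCache.put(tokenA, "value");

        // Lookups resolve key -> current token -> dataCache, so they miss:
        System.out.println(dataCache.get(tokens.get("key"))); // null
        System.out.println(dataCache.get(tokenB)); // null; the value is stranded under tokenA
    }
}
```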
we know the ongoing loads (

@oskar-szwajkowski thanks for the analysis.
@findepi Here is a screenshot from the debugger plus a diagram of one possible thread interleaving that could lead to this. I cannot find the root cause in the cache code, but I can see that it happens.

Here is the timeline of the thread race:

This screenshot from the debugger shows this. Thread 1 locks … There must be a race between …

Here is code that also does not exhibit the race behavior (the one in the linked PR is also race-free as far as my testing goes), but it looks hacky:

```java
// Token eviction via removalListener is blocked during loading, so we may need to do manual cleanup
private void removeDangling(Token<K> token, Object value)
{
    try {
        dataCache.get(token, () -> {
            throw new RuntimeException("No value in cache");
        });
    }
    catch (ExecutionException e) {
        throw new RuntimeException(e);
    }
    catch (RuntimeException e) {
        if (e.getMessage().equals("No value in cache")) {
            tokens.remove(token.getKey(), token);
        }
    }
}
```
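One general `java.util.concurrent` detail worth noting about the cleanup above (this is standard `ConcurrentMap` behaviour, not anything Trino-specific): the two-argument `tokens.remove(token.getKey(), token)` only removes the mapping if the key still points at that exact token instance, so a token that a concurrent loader has already replaced is left untouched. A minimal illustration:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConditionalRemoveSketch
{
    public static void main(String[] args)
    {
        ConcurrentMap<String, Object> tokens = new ConcurrentHashMap<>();
        Object staleToken = new Object();
        Object currentToken = new Object();

        tokens.put("key", currentToken);

        // No-op: "key" is mapped to currentToken, not staleToken
        boolean removed = tokens.remove("key", staleToken);

        System.out.println(removed); // false
        System.out.println(tokens.get("key") == currentToken); // true
    }
}
```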
@Pluies let us know whether 458 works for you and once again, thank you for the detailed issue report :)
This is an awesome summary of the problem @oskar-szwajkowski, thank you
Hey folks,
I've been tracking down a rare issue of really long optimizing or planning times in Delta Lake after upgrading from Trino 448 to 453, leading to infrequent optimizer timeouts or planning timeouts.
Eventually, I managed to consistently reproduce it by running two concurrent MERGE INTO queries that use the same source table. They look something like:
And:
Where `src_query_foo` and `src_query_bar` are both complex-ish queries involving the same source table, let's call it `instruction_calls_decoded_0018` (picked totally at random 😄).

In Trino 448, these queries used to take 45s or so. In Trino 453, they fail after reaching the default optimizer timeout of 3 minutes.
Looking at tracing, what takes the most time is the optimizing step, which calls `Metadata.getTableStatistics` several times in a row:
Interestingly, things work correctly (like in Trino 448) when:
For example a single-threaded run looks like:
We can see just as many `Metadata.getTableStatistics` calls happening, but only the first one takes a long time; subsequent calls use the cache and return very quickly (ish - it still works through a bunch of data, but at least it doesn't reload everything from object store).

(Aside: the fact this takes 35 to 40 seconds is due to the table being massive and under-optimized. The checkpoint itself comes in at 250MB and the table has 1M files. We know this is bad, and we've brought it down to ~20 seconds with some optimisations, so the next logs might only show 20s instead of 40s for each getTableStatistics call. On the plus side, it's been a great way to spot this bug!)
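For reference, this is the caching behaviour we expect (and see in the single-threaded case): the first lookup runs the expensive loader, and subsequent lookups for the same key are served from the cache. A generic Guava sketch, with a made-up key and loader rather than the actual Delta Lake statistics code:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

public class CacheHitSketch
{
    public static void main(String[] args) throws ExecutionException
    {
        AtomicInteger loads = new AtomicInteger();
        Cache<String, String> statsCache = CacheBuilder.newBuilder()
                .expireAfterWrite(Duration.ofMinutes(30))
                .maximumSize(1_000)
                .build();

        // First call runs the (expensive) loader
        statsCache.get("table@version", () -> {
            loads.incrementAndGet();
            return "statistics";
        });

        // Second call for the same key should be a cache hit and not reload
        statsCache.get("table@version", () -> {
            loads.incrementAndGet();
            return "statistics";
        });

        System.out.println(loads.get()); // expected: 1
    }
}
```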
Looking at the code leads us to `TransactionLogAccess`, which has two caches:

- `tableSnapshots` (link), which caches whole-table snapshots
- `activeDataFileCache` (link), which caches active files for a given table version

The first cache is not hit in our queries, as they use different connector handles (with different filters etc.) from `instruction_calls_decoded_0018`. The behaviour is identical in Trino 448 and 453.

`activeDataFileCache` however should be used. It is correctly hit in Trino 448 and returns cached files, avoiding the costly retrieval of all these files from object store.

In Trino 453 with concurrent access, we're seeing the two queries each populating this cache and then not being able to read back from it after the other query has written to it, leading to a situation where we see:
The additional logging checks the cache before looking it up here, logs `Starting getActiveFiles` at the start of getActiveFiles, and `STARTING READING` / `FINISHED READING` at the start & end of the lambda function that populates the cache.
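To make that instrumentation concrete, the logged lookup is shaped roughly like this. This is only a sketch: `activeDataFileCache` and the table-version key are as described above, but the class, method signatures, and `loadActiveDataFilesFromObjectStore` are simplified stand-ins, and the log messages are the ones quoted:

```java
import com.google.common.cache.Cache;

import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.logging.Logger;

// Sketch only: approximates the shape of the TransactionLogAccess lookup, not its real signatures
class ActiveFilesLookupSketch
{
    private static final Logger LOG = Logger.getLogger(ActiveFilesLookupSketch.class.getName());

    record TableVersion(String tableLocation, long version) {}

    private final Cache<TableVersion, List<String>> activeDataFileCache;

    ActiveFilesLookupSketch(Cache<TableVersion, List<String>> activeDataFileCache)
    {
        this.activeDataFileCache = activeDataFileCache;
    }

    List<String> getActiveFiles(TableVersion tableVersion)
            throws ExecutionException
    {
        LOG.info("Starting getActiveFiles"); // logged on every call
        LOG.info("cached: " + (activeDataFileCache.getIfPresent(tableVersion) != null)); // check before looking it up
        return activeDataFileCache.get(tableVersion, () -> {
            LOG.info("STARTING READING"); // the loader should only run on a cache miss
            List<String> files = loadActiveDataFilesFromObjectStore(tableVersion);
            LOG.info("FINISHED READING");
            return files;
        });
    }

    private List<String> loadActiveDataFilesFromObjectStore(TableVersion tableVersion)
    {
        // placeholder for the expensive transaction-log / object-store scan
        return List.of();
    }
}
```

With a healthy cache, a concurrent run should show `STARTING READING` / `FINISHED READING` only once per table version; in the failing case each query logs its own pair.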
We thought of potential issues being:

-> the connector is running `delta.metadata.live-files.cache-size=21.76GB` and `delta.metadata.live-files.cache-ttl=30m`, which are both large values. We tried bumping the cache size to 60GB to double-check, and the problem persisted.

-> this doesn't make sense, as the logic hasn't changed between Trino 448 and 453. The table name, schema, and version are all identical, so the cache should be hit. To confirm this wasn't the cause, we tried rewriting the cache to use a super-simple `record` of two strings as the cache key instead of the more complex `TableVersion`, but the problem persisted.

Eventually, after several debugging sessions and code dives, and looking at changes between 448 and 453, we zeroed in on #22302 as the root cause of this issue.
When this commit is reverted, the original behaviour is restored and concurrent cache access works again. From a fresh cache, logs then look like:
The thing is... We're still not fully sure why! We've tried writing a test case to confirm this behaviour, but the cache behaves properly under test conditions, so we haven't been able to confirm at this level unfortunately.
However, because the issue is in the `EvictableCache` implementation, this has the potential to impact things everywhere in the Trino codebase, so this is probably fairly important, and I thought it'd be best to raise it straight away.

cc @findepi as you wrote the code to fix the cache 🙏 sorry there isn't a proper repro, but hopefully this is enough to get you going!