Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent access issue in EvictableCache #23384

Closed
Pluies opened this issue Sep 12, 2024 · 7 comments · Fixed by #23401
Closed

Concurrent access issue in EvictableCache #23384

Pluies opened this issue Sep 12, 2024 · 7 comments · Fixed by #23401
Assignees

Comments

@Pluies
Copy link
Contributor

Pluies commented Sep 12, 2024

Hey folks,

I've been tracking down a rare issue of really long optimizing or planning times in Delta Lake after upgrading from Trino 448 to 453, leading to infrequent optimizer timeouts or planning timeouts.

Eventually, I managed to consistently reproduce it by running two concurrent MERGE INTO queries that use the same source table. They look something like:

MERGE INTO dest_table_a USING src_query_foo (...)

And:

MERGE INTO dest_table_b USING src_query_bar (...)

Where src_query_foo and src_query_bar are both complex-ish queries involving the same source table, let's call it instruction_calls_decoded_0018 (picked totally at random 😄).

In Trino 448, these queries used to take 45s or so. In Trino 453, they fail after reaching the default optimizer timeout of 3 minutes.

Looking at tracing, what takes the most time in this is the optimizing step, which calls `Metadata.getTableStatistics several times in a row:

image

Interestingly, things work correctly (like in Trino 448) when:

  • Either query runs by itself (no concurrency)
  • Both queries run concurrently after at least one successful run (i.e. some cache is loaded)

For example a single-threaded run looks like:

image

We can see as many Metadata.getTableStatistics calls happening, but only the first one takes a long time, subsequent calls use the cache and return very quickly (ish - it still works through a bunch of data, but at least it doesn't reload everything from object store).

(Aside: the fact this takes 35 to 40 seconds is due to the table being massive, and under-optimized. The checkpoint itself comes in at 250MB and the table is 1M files. We know this is bad, and we've brought it down to ~20 seconds with some optimisations, so the next logs might only show 20s instead of 40s for each getTableStatistics call. On the plus side, it's been a great way to spot this bug!)

Looking at leads us to TransactionLogAccess, which has two caches:

  • tableSnapshots (link) which caches whole-table snapshots
  • activeDataFileCache (link) which caches active files for a given table version

The first cache is not hit in our queries as they use different connector handles (with different filters etc) from instruction_calls_decoded_0018. The behavious is identical in Trino 448 and 453.

activeDataFileCache however should be used. It is correctly hit in Trino 448 and returns cached files, avoiding the costly retrieval of all these files from object store.

In Trino 453 with concurrent access, we're seeing the two queries each populating this cache then not being able to read back from it after the other query has written to it, leading to a situation where we see:

2024-09-11T19:33:57.205Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:33:57.205Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:33:57.205Z	Query-20240911_193351_00050_6rwdr-470	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:33:57.630Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:33:57.630Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:34:20.447Z	Query-20240911_193351_00050_6rwdr-470	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:34:20.883Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:34:20.884Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:34:20.884Z	Query-20240911_193352_00053_6rwdr-486	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:34:21.261Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:34:21.261Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:34:43.926Z	Query-20240911_193352_00053_6rwdr-486	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:34:44.237Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:34:44.237Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:34:44.237Z	Query-20240911_193351_00050_6rwdr-470	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:34:44.765Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:34:44.765Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:35:07.201Z	Query-20240911_193351_00050_6rwdr-470	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:35:07.531Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:35:07.531Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:35:07.531Z	Query-20240911_193352_00053_6rwdr-486	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:35:07.998Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:35:07.998Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:35:30.815Z	Query-20240911_193352_00053_6rwdr-486	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:35:31.122Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:35:31.122Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:35:31.122Z	Query-20240911_193351_00050_6rwdr-470	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:35:31.627Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:35:31.627Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:35:54.518Z	Query-20240911_193351_00050_6rwdr-470	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:35:54.827Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:35:54.827Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:35:55.003Z	Query-20240911_193352_00053_6rwdr-486	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:35:55.305Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:35:55.305Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:36:17.330Z	Query-20240911_193352_00053_6rwdr-486	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:36:18.487Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:36:18.487Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:36:18.487Z	Query-20240911_193351_00050_6rwdr-470	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:36:18.915Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:36:18.915Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:36:42.218Z	Query-20240911_193351_00050_6rwdr-470	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:36:42.220Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:36:42.220Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:36:42.220Z	Query-20240911_193352_00053_6rwdr-486	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:36:42.676Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:36:42.676Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:37:06.286Z	Query-20240911_193352_00053_6rwdr-486	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:37:06.288Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:37:06.288Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:37:06.288Z	Query-20240911_193351_00050_6rwdr-470	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:37:06.799Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:37:06.799Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:37:30.129Z	Query-20240911_193351_00050_6rwdr-470	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:37:30.131Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:37:30.131Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:37:30.131Z	Query-20240911_193352_00053_6rwdr-486	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:37:30.625Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:37:30.625Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:37:54.108Z	Query-20240911_193352_00053_6rwdr-486	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:37:54.110Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:37:54.110Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:37:54.110Z	Query-20240911_193351_00050_6rwdr-470	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:37:54.632Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:37:54.632Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:38:18.161Z	Query-20240911_193351_00050_6rwdr-470	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:38:18.163Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:38:18.163Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:38:18.163Z	Query-20240911_193352_00053_6rwdr-486	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:38:18.621Z	Query-20240911_193351_00050_6rwdr-470	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:38:18.621Z	Query-20240911_193351_00050_6rwdr-470	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=1)
2024-09-11T19:38:43.318Z	Query-20240911_193352_00053_6rwdr-486	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161, active files: 574606
2024-09-11T19:38:44.263Z	Query-20240911_193352_00053_6rwdr-486	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161
2024-09-11T19:38:44.263Z	Query-20240911_193352_00053_6rwdr-486	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161 already cached? true (matching entries in cache=0)
2024-09-11T19:38:44.263Z	Query-20240911_193352_00053_6rwdr-486	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627161

Additional logging checks cache before looking it up here, has Starting getActiveFiles at the start of getActiveFiles, and STARTING READING / FINISHED READING at the start & end of the lambda function that populates the cache.

We thought of potential issues being:

  1. Is the cache filling up and simply discarding entries due to size? Or maybe ttl?

-> the connector is runningdelta.metadata.live-files.cache-size=21.76GB and delta.metadata.live-files.cache-ttl=30m, which are both large values. We tried bumping the cache size to 60GB to double-check, and the problem persisted.

  1. Could it be that the cache key we're using is different, meaning these aren't really cache misses, but instead caching different things?

-> this doesn't make sense as the logic hasn't changed between Trino 448 and 453. The table name, schema, and version are all identical, so the cache should be hit. To confirm this wasn't the cause, we tried rewriting the cache to use a super-simple record of two strings as the cache key instead of the more complex TableVersion, but the problem persisted.

Eventually after several debugging sessions and code dives, and looking at changes between 448 and 453, we zeroed in on #22302 as the root cause of this issue.

When this commit is reverted, the original behaviour is restored and concurrent cache access works again. From a fresh cache, logs then look like:

2024-09-12T08:06:41.730Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:06:41.730Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? false (size=0)
2024-09-12T08:06:41.730Z	Query-20240912_080639_00051_puvbq-393	STARTING READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:06:42.837Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:06:42.837Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:00.374Z	Query-20240912_080639_00051_puvbq-393	FINISHED READING s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838, active files: 481683
2024-09-12T08:07:00.738Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:00.738Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:01.081Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:01.081Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:01.113Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:01.113Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:01.345Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:01.345Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:01.377Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:01.377Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:01.611Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:01.611Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:01.643Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:01.643Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:01.876Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:01.876Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:01.898Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:01.898Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.140Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.140Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.793Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.793Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.794Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.794Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.796Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.796Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.798Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.798Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.799Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.799Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.801Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.801Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.803Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.803Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.804Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.804Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.805Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.805Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.806Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.806Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.807Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.807Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.807Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.807Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.808Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.808Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.809Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.809Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.810Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.810Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.811Z	Query-20240912_080640_00054_puvbq-473	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.811Z	Query-20240912_080640_00054_puvbq-473	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.812Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.812Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
2024-09-12T08:07:02.814Z	Query-20240912_080639_00051_puvbq-393	Starting getActiveFiles for s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838
2024-09-12T08:07:02.814Z	Query-20240912_080639_00051_puvbq-393	Is s3a://redacted/hive_metastore/instruction_calls_decoded_0018@627838 already cached? true (matching entries in cache=1)
(.....)

The thing is... We're still not fully sure why! We've tried writing a test case to confirm this behaviour, but the cache behaves properly under test conditions, so we haven't been able to confirm at this level unfortunately.

However, because the issue is in the EvictableCache implementation, this has the potential to impact things everywhere in the Trino codebase, so this is probably fairly important, and I thought it'd be best to raise it straight away.

cc @findepi as you wrote the code to fix the cache 🙏 sorry there isn't a proper repro, but hopefully this is enough to get you going!

@wendigo
Copy link
Contributor

wendigo commented Sep 12, 2024

@raunaqmorarka can you take a look?

@findepi
Copy link
Member

findepi commented Sep 12, 2024

@Pluies awesome analysis!
it looks like you should be able to simulate this within TestEvictableCache. let me know how it goes.

If this fails, we should leverage the fact you have repro inside your environment.
You can add some logging for the cache invalidation (explicit, time based, etc.) especially here:

.<Token<K>, V>removalListener(removal -> {
Token<K> token = removal.getKey();
verify(token != null, "token is null");
if (removal.getCause() == RemovalCause.REPLACED) {
return;
}
if (removal.getCause() == RemovalCause.EXPIRED && ongoingLoads.containsKey(token)) {
return;
}
tokens.remove(token.getKey(), token);
}),

and here
tokens.remove(token.getKey(), token);

@oskar-szwajkowski
Copy link
Contributor

Here is test that breaks EvictableCache (although it might pass sometimes, adding Thread.sleep() to cache loader will make it more obvious, as in reported case cache load is long):

    @RepeatedTest(1000)
    public void testLongLoadingCacheEntries()
    {
        Cache<String, String> cache = EvictableCacheBuilder.newBuilder()
                .expireAfterWrite(Duration.ofSeconds(60))
                .maximumSize(10)
                .build();
        try (ExecutorService executor = Executors.newFixedThreadPool(2)) {
            Runnable longRunningCacheLoad = () -> {
                try {
                    // adding Thread.sleep() to loader will make it fail even more frequently
                    cache.get("key", () -> "value");
                }
                catch (ExecutionException e) {
                    throw new RuntimeException(e);
                }
            };
            executor.submit(longRunningCacheLoad);
            executor.submit(longRunningCacheLoad);
        }
        assertThat(cache.getIfPresent("key")).isNotNull();
    }

Race is inside

// Token eviction via removalListener is blocked during loading, so we may need to do manual cleanup
private void removeDangling(Token<K> token)
{
if (!dataCache.asMap().containsKey(token)) {
tokens.remove(token.getKey(), token);
}
}

dataCache.asMap().containsKey(token) will report false when cache is loading, and it doesn't take into account ongoing loads so token will be removed. Then when next thread visits, it has to generate new Token and populate cache with it, so:

Could it be that the cache key we're using is different, meaning these aren't really cache misses, but instead caching different things?

is correct, as by this comment:


key field inside Token is the same, but as its instance based equality, given the fact that token was removed, produces new cache key

Opened PR fixing this behavior:
#23401

@findepi
Copy link
Member

findepi commented Sep 14, 2024

dataCache.asMap().containsKey(token) will report false when cache is loading, and it doesn't take into account ongoing loads so token will be removed.

we know the ongoing loads (this.ongoingLoads)

@oskar-szwajkowski thanks for the analysis.
are you able to describe the race condition in sequential terms?

@oskar-szwajkowski
Copy link
Contributor

oskar-szwajkowski commented Sep 15, 2024

@findepi Here is screenshot from debugger + diagram of one of possible threads behavior that could lead to this. I cannot find root cause in caches code, but I can see that it happen.

Here is timeline of threads race:
thread_race_cache (2)

This screenshot from debugger shows this.
image

Thread 1 locks key and computes cache value, Thread 2 waits on lock.
Thread 1 finishes loading value, code proceeds to finally. Thread 2 wakes up and sees value in cache which is returned.
Thread 2 gets to removeDangling first, although it knows about value from dataCache.get(), dataCache.asMap().containsKey returns false for some reason, but after a moment it can see it and dataCache.asMap().containsKey return true (debugger value vs watch)

There must be a race between dataCache.get() and dataCache.asMap().containsKey(), where one returns value but in the same time second one returns false
Same race I can see when swapping out dataCache.asMap().containsKey() with cacheMap.getIfPresent(), cacheMap.get() looks fine, but we would have to catch exception - but I prefer using synchronize on token, see below

Here is code that also does not present race behavior (one in linked PR is also race-free as far as my testing goes), but it looks hacky:

    // Token eviction via removalListener is blocked during loading, so we may need to do manual cleanup
    private void removeDangling(Token<K> token, Object value)
    {
        try {
            dataCache.get(token, () -> {
                throw new RuntimeException("No value in cache");
            });
        }
        catch (ExecutionException e) {
            throw new RuntimeException(e);
        }
        catch (RuntimeException e) {
            if (e.getMessage().equals("No value in cache")) {
                tokens.remove(token.getKey(), token);
            }
        }
    }

@wendigo
Copy link
Contributor

wendigo commented Sep 18, 2024

@Pluies let us know whether 458 works for you and once again, thank you for detailed issue report :)

@findepi
Copy link
Member

findepi commented Sep 28, 2024

Here is screenshot from debugger + diagram of one of possible threads behavior that could lead to this

this is an awesome summary of the problem @oskar-szwajkowski , thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

4 participants