Cached delta log gets corrupted when dropping and recreating delta table with Trino #21451
Comments
The problem also persists in Trino 445.
Hi all. I'm having the same issue. Every time I get these errors I just delete the cache data directly on the disk and that solves my problem. My stack is: Trino (450) + Delta Lake connector + AWS Glue. The cache configuration for the catalog is pretty straightforward:
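For context, a minimal file system cache configuration for a Delta Lake catalog typically looks something like the following (property names follow the Trino file system caching documentation; the paths are placeholders rather than this user's actual values, so check the docs for your Trino version):

```properties
# Illustrative catalog properties for enabling the file system cache
# (paths and values are examples only; adjust to your environment)
fs.cache.enabled=true
fs.cache.directories=/mnt/nvme/trino-cache
# additional sizing/TTL properties are usually set as well; see the docs for your version
```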
When I get the snapshot errors now and then, I just do a
Use cases where we can no longer assume that the commit and checkpoint files within the Delta log are immutable should likely just disable the cache on the coordinator. Having mutable commit and checkpoint files is not really addressed by the Delta protocol, and makes any concurrent access of Delta tables brittle. So it's better to avoid this, and use a unique location every time a table is created.
Hello @jkylling, I am definitely using the same unique and deterministic location for a given Delta table (it simply depends on the Delta table name). Maybe I misunderstood your response; could you please elaborate a bit more?
Sorry, I might be confusing this with another issue. Are you able to share a bit more about your environment? Which object store are you using? You mention that this happens when overwriting a table. What operation do you run on Spark or Trino when you do an overwrite? Can it ever happen that the content of a commit at a given path changes?

Unfortunately, it looks like disabling the cache on the coordinator only is no longer possible.
I am running Trino in AWS EKS using EC2 nodes with NVMe support where I mount the cache folders (the EC2 nodes are provisioned by Karpenter, and the NVMe disks are automatically mounted as RAID0 ephemeral storage). The object store is S3 and I am using Glue as the data catalog. With Spark I overwrite the tables like so:
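A minimal sketch of that kind of full overwrite, written here with the Spark Java API and placeholder table and path names (the actual job may differ, for example by using PySpark or partition-level overwrites):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class OverwriteDeltaTable
{
    public static void main(String[] args)
    {
        SparkSession spark = SparkSession.builder()
                .appName("delta-overwrite")
                .getOrCreate();

        // Placeholder source data; the real job reads whatever needs to be loaded
        Dataset<Row> df = spark.read().parquet("s3://my-bucket/staging/my_table/");

        // Full overwrite of the Delta table registered in the Glue catalog, which
        // rewrites the table data and appends a new commit to _delta_log
        df.write()
                .format("delta")
                .mode(SaveMode.Overwrite)
                .option("overwriteSchema", "true")
                .saveAsTable("my_db.my_table");

        spark.stop();
    }
}
```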
Regarding the
@jkylling It seems that the stack @sdaberdaku is using is pretty similar to mine. The cache seems to think the previously created table is still valid, so it keeps searching for a snapshot that no longer exists. Maybe it should flush the cache when this kind of error happens? I also tried with: But it doesn't work; I have to manually delete the cache to make it work again.
Discussed this issue with @raunaqmorarka and we will add a configuration option to disable caching of files within _delta_log. That said, having mutable files within _delta_log is not really addressed by the Delta protocol, so it is still best avoided.
Hello @jkylling, Thanks, Sebastian
Hi @sdaberdaku, no one has had time to look into this yet. If you want to contribute this yourself I'd be happy to give some guidance. Basically, a new implementation of https://github.com/trinodb/trino/blob/c8568a9ccfcf2876ef441588a6040270d82f95b2/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/cache/DeltaLakeCacheKeyProvider.java must be added which returns an empty cache key for any file within _delta_log. The places to hook it in are trino/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeConfig.java (line 45 in c8568a9) and trino/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeModule.java (line 154 in c8568a9).
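A rough sketch of what such a provider could look like, assuming the interface exposes a single getCacheKey(TrinoInputFile) method returning Optional<String> (the class name, method signature, and path check below are assumptions to verify against the linked interface, not the actual Trino implementation):

```java
import io.trino.filesystem.TrinoInputFile;
import io.trino.plugin.deltalake.cache.DeltaLakeCacheKeyProvider;

import java.util.Optional;

// Hypothetical sketch: opt transaction-log files out of the file system cache by
// returning no cache key for them. Verify the real interface before copying this.
public class TransactionLogAwareCacheKeyProvider
        implements DeltaLakeCacheKeyProvider
{
    @Override
    public Optional<String> getCacheKey(TrinoInputFile inputFile)
    {
        String path = inputFile.location().toString();
        // No cache key means the file is never cached, so _delta_log commit and
        // checkpoint files are always read from the object store
        if (path.contains("/_delta_log/")) {
            return Optional.empty();
        }
        return Optional.of(path);
    }
}
```

The new provider would then be bound in DeltaLakeModule behind the configuration option mentioned above, so the existing caching behavior stays the default.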
Alternatively, you can reuse code from closed PR #21131 and use skip-paths to skip over file paths containing _delta_log.
Hello @jkylling, I implemented your suggestions and submitted a PR (I also submitted the CLA). Best, Sebastian
In Trino 444 with Alluxio cache enabled, when dropping and then recreating a delta table, I occasionally get the following error:
Unfortunately, the error is not easy to replicate; I can drop and recreate the same table multiple times without issues.
The actual Delta log files in the object storage are not corrupted; they can be read by Spark, and by Trino if the coordinator is restarted.