Add option to disable filesystem caching of /_delta_log/ directory #23408
Conversation
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
@cla-bot check |
The cla-bot has been summoned, and re-checked this pull request! |
Thank you for the PR! It would be good if @raunaqmorarka and @wendigo could take a look as well.
.../src/main/java/io/trino/plugin/deltalake/cache/MutableDeltaLogDeltaLakeCacheKeyProvider.java
.../test/java/io/trino/plugin/deltalake/cache/TestMutableDeltaLogDeltaLakeCacheKeyProvider.java
Since the delta lake support |
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeConfig.java
@raunaqmorarka Overwriting the table with some Spark operations, or externally deleting the Delta table (i.e. manually deleting its files from object storage) and then recreating it, makes the Delta log mutable.
Dear @jkylling, @raunaqmorarka, and @Praveen2112, I implemented the suggested changes. Best, Sebastian PS: For some reason, the verification/cla-signed check is still failing although I submitted the CLA on Friday. Maybe it needs more time to be processed.
Could you clarify which Spark operations we would need to run to encounter this situation?
The CLA is processed manually, so it will take some time.
It has happened to me when running:
With Spark 3.5.1 and Delta 3.1.0. |
@cla-bot check |
The cla-bot has been summoned, and re-checked this pull request! |
Hello @jkylling, @Praveen2112, @raunaqmorarka, and @wendigo! My CLA has finally been processed! Best, S. |
.../io/trino/plugin/deltalake/TestDeltaLakeAlluxioCacheFileOperationsMutableTransactionLog.java
...rino-delta-lake/src/main/java/io/trino/plugin/deltalake/cache/DeltaLakeCacheKeyProvider.java
371639e to 6671578
Hello @jkylling and @raunaqmorarka, I don't know if you guys had a chance to review the minimal test I wrote. Thanks again for your support! Best, |
minor comment about test, lgtm otherwise
...no-delta-lake/src/test/java/io/trino/plugin/deltalake/AbstractTestDeltaLakeAlluxioCache.java
.../src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAlluxioCacheMutableTransactionLog.java
lgtm % minor comment
...ta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAlluxioCacheFileOperations.java
LGTM
8877a18 to bd63db7
Description
Adds a configuration option that disables filesystem caching of files whose path contains /_delta_log/, to avoid issues with Delta tables that have mutable commits. This is useful when Delta tables are deleted and re-created: the files inside the _delta_log directory can then no longer be considered immutable and are therefore unsafe to cache.
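As a rough illustration of the behavior described above, the following minimal sketch shows a cache-key decision that skips transaction log files when the option is enabled. The class name `DeltaLogCacheKeySketch`, the flag `transactionLogCachingDisabled`, and the key format are hypothetical and chosen for illustration only; they are not the classes or signatures changed in this PR.

```java
import java.util.Optional;

// Hypothetical sketch, not the PR's actual code: decides whether a file may be cached.
final class DeltaLogCacheKeySketch
{
    // Assumed flag mirroring the new configuration option (name is illustrative).
    private final boolean transactionLogCachingDisabled;

    DeltaLogCacheKeySketch(boolean transactionLogCachingDisabled)
    {
        this.transactionLogCachingDisabled = transactionLogCachingDisabled;
    }

    // Returns a cache key, or Optional.empty() when the file must bypass the cache.
    Optional<String> cacheKey(String location, long length)
    {
        // Files under /_delta_log/ may be rewritten when a table is deleted and re-created,
        // so they are not cached while the option is enabled.
        if (transactionLogCachingDisabled && location.contains("/_delta_log/")) {
            return Optional.empty();
        }
        // All other files are treated as immutable here; location plus length is the assumed key.
        return Optional.of(location + "#" + length);
    }
}
```

With the flag set to `true`, a path such as `s3://bucket/t/_delta_log/00000000000000000000.json` yields no cache key, while a data file such as `s3://bucket/t/part-00000.parquet` is still cached. With the flag set to `false`, both receive a key.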
Additional context and related issues
Fixes #21451
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: