
Disable legacy filesystem implementation by default #23343

Merged: 1 commit merged into trinodb:master on Sep 13, 2024

Conversation

@anusudarsan (Member) commented Sep 9, 2024

Description

Additional context and related issues

Docs must be ready and merged at the same time so the change lands in a release with the required docs. This is a breaking change for most users.

#23366

Release notes

(x) Release notes are required, with the following suggested text:

## Hive, Delta Lake, Iceberg, and Hudi connectors

* {{breaking}} Deactivate legacy file system support for all catalogs. You must 
  activate the desired [file system support](file-system-configuration) with 
  `fs.native-azure.enabled`, `fs.native-gcs.enabled`, `fs.native-s3.enabled`, or 
  `fs.hadoop.enabled` in each catalog. Use the migration guides for Azure Storage, 
  Google Cloud Storage, and S3 to assist if you have not switched from legacy support.

and link to migration guides
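
For illustration, a minimal sketch of what this requirement looks like in a catalog properties file. The catalog file name, `connector.name`, and metastore setting are placeholders for this example; the point is only that exactly one of the `fs.*` options listed above must now be enabled explicitly in each catalog:

```properties
# etc/catalog/example_hive.properties (hypothetical example catalog)
connector.name=hive
hive.metastore.uri=thrift://example-metastore:9083

# Previously implicit: with this change no file system support is enabled
# by default, so one of these must be set explicitly per catalog.
fs.native-s3.enabled=true
# ...or fs.native-azure.enabled=true, fs.native-gcs.enabled=true,
# or fs.hadoop.enabled=true to keep the legacy support.
```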

@cla-bot cla-bot bot added the cla-signed label Sep 9, 2024
@mosabua (Member) commented Sep 9, 2024

Looks like there are a whole bunch of failures that are not false alarms @anusudarsan

@anusudarsan force-pushed the anu/enable-nfs-default branch 2 times, most recently from b650ce3 to 56a5e63 on September 10, 2024 17:00
@github-actions bot added the jdbc (Relates to Trino JDBC driver) and hive (Hive connector) labels Sep 10, 2024
@mosabua (Member) commented Sep 10, 2024

I assume you will have to update a whole bunch of catalog properties files used for testing and stuff like that @anusudarsan

@mosabua (Member) commented Sep 10, 2024

fyi @jhlodin and @Joelg96 .. you can work on the docs PR in parallel

```java
@@ -222,6 +222,9 @@ public DistributedQueryRunner build()
    hiveProperties.put("hive.metastore", "file");
    hiveProperties.put("hive.metastore.catalog.dir", queryRunner.getCoordinator().getBaseDataDir().resolve("hive_data").toString());
}
if (!hiveProperties.buildOrThrow().containsKey("fs.hadoop.enabled")) {
```
Member

Should we not use the new native file systems for most of the tests?

@anusudarsan (Member, Author) commented Sep 10, 2024

All those tests were already migrated to the native file system (including product tests) by explicitly setting fs.hadoop.enabled to false and enabling one of the S3, Azure, or GCS implementations.

Member

cool

Member

So why do we need to add this?

@anusudarsan (Member, Author) commented:

@electrum there are still tests with no fs.hadoop.enabled value set. When the default was changed to false, those tests fail with errors like `No factory set for location /tmp...`. Hence this change.

Member

Should we change those tests to make sure they have fs.hadoop.enabled set to true instead?

@anusudarsan (Member, Author) commented:

> Should we change those tests to make sure they have fs.hadoop.enabled set to true instead?

We could, but there would be a lot of them. I followed the pattern in the runner where we default to the file metastore when no metastore is set.
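
For context, a minimal sketch of the defaulting pattern being discussed. The property names match the diff above, but the surrounding builder code is paraphrased for illustration and is not the actual DistributedQueryRunner source:

```java
import java.util.HashMap;
import java.util.Map;

class FileSystemTestDefaults
{
    // Sketch only: mirrors the pattern from the diff above. If a test did not
    // configure a file system explicitly, fall back to the legacy Hadoop file
    // system so existing tests keep working now that the default is disabled.
    static Map<String, String> withTestDefaults(Map<String, String> hiveProperties)
    {
        Map<String, String> properties = new HashMap<>(hiveProperties);
        // Existing pattern: default to the file metastore when none is set.
        properties.putIfAbsent("hive.metastore", "file");
        // New: no file system implementation is enabled by default anymore,
        // so default tests to the legacy Hadoop file system unless they
        // explicitly chose one (for example fs.native-s3.enabled=true).
        properties.putIfAbsent("fs.hadoop.enabled", "true");
        return properties;
    }
}
```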

@findinpath (Contributor) commented Sep 12, 2024

> When the default was changed to false, tests fail with errors like No factory set for location /tmp...

Here is a stack trace:

```
io.trino.spi.TrinoException: Could not read database schema
	at io.trino.plugin.hive.metastore.file.FileHiveMetastore.readFile(FileHiveMetastore.java:1406)
	at io.trino.plugin.hive.metastore.file.FileHiveMetastore.readSchemaFile(FileHiveMetastore.java:1391)
	at io.trino.plugin.hive.metastore.file.FileHiveMetastore.getDatabase(FileHiveMetastore.java:285)
	at io.trino.plugin.hive.metastore.file.FileHiveMetastore.createDatabase(FileHiveMetastore.java:198)
	at io.trino.metastore.tracing.TracingHiveMetastore.lambda$createDatabase$9(TracingHiveMetastore.java:173)
	at io.trino.metastore.tracing.Tracing.lambda$withTracing$0(Tracing.java:31)
	at io.trino.metastore.tracing.Tracing.withTracing(Tracing.java:39)
	at io.trino.metastore.tracing.Tracing.withTracing(Tracing.java:30)
	at io.trino.metastore.tracing.TracingHiveMetastore.createDatabase(TracingHiveMetastore.java:173)
	at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore.createDatabase(CachingHiveMetastore.java:572)
	at io.trino.plugin.deltalake.TestDeltaLakeProjectionPushdownPlans.createPlanTester(TestDeltaLakeProjectionPushdownPlans.java:119)
```

It looks to me like we need a mapping for io.trino.filesystem.local.LocalFileSystemFactory in FileSystemModule, or in an extension of FileSystemModule used for testing purposes, in order to cope with local file system locations.
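
A minimal sketch of what such a test-only mapping could look like. It assumes the module registers TrinoFileSystemFactory instances in a Guice MapBinder keyed by URI scheme and that LocalFileSystemFactory accepts a root path; both are assumptions for illustration, not the actual FileSystemModule wiring:

```java
import com.google.inject.Binder;
import com.google.inject.Module;
import com.google.inject.multibindings.MapBinder;
import io.trino.filesystem.TrinoFileSystemFactory;
import io.trino.filesystem.local.LocalFileSystemFactory;

import java.nio.file.Path;

// Hypothetical test-only module layered on top of FileSystemModule.
// Assumes factories are keyed by URI scheme in a MapBinder and that
// LocalFileSystemFactory takes a root directory; verify both against the
// actual Trino sources before relying on this.
public class TestingLocalFileSystemModule
        implements Module
{
    private final Path rootDirectory;

    public TestingLocalFileSystemModule(Path rootDirectory)
    {
        this.rootDirectory = rootDirectory;
    }

    @Override
    public void configure(Binder binder)
    {
        MapBinder<String, TrinoFileSystemFactory> factories =
                MapBinder.newMapBinder(binder, String.class, TrinoFileSystemFactory.class);
        // Map "local" locations used by tests to the local file system factory.
        factories.addBinding("local").toInstance(new LocalFileSystemFactory(rootDirectory));
    }
}
```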

Member

Supporting local fs would be cool anyway.

@github-actions bot added the iceberg (Iceberg connector) and delta-lake (Delta Lake connector) labels Sep 10, 2024
@anusudarsan changed the title from "Enable native filesystem implementation by default" to "Disable legacy filesystem implementation by default" Sep 10, 2024
@mosabua mosabua marked this pull request as ready for review September 11, 2024 18:49
@jhlodin (Contributor) commented Sep 11, 2024

Release note edit suggestion: change "have to now explicitly enable the filesystem implementation" to "have to now explicitly enable the desired filesystem implementation", with "filesystem implementation" linking to https://trino.io/docs/current/object-storage.html#configuration.

Reason: we should make it clear that there is a choice between multiple options. As written now, it sounds like we expect users to simply re-enable the Hadoop filesystem they were previously using.

@anusudarsan force-pushed the anu/enable-nfs-default branch 2 times, most recently from 1bf8df1 to 8f8ab99 on September 11, 2024 21:28
@mosabua (Member) commented Sep 11, 2024

@anusudarsan @electrum I'm wondering if, for a lot of the tests, we should default to a file system other than Hadoop, to make the later removal and migration easier. But maybe that can also be a follow-up pull request.

@anusudarsan force-pushed the anu/enable-nfs-default branch 2 times, most recently from a8b6215 to 7c60795 on September 12, 2024 13:29
@mosabua (Member) commented Sep 12, 2024

/test-with-secrets sha=7c60795d814d6b7d1ec2e1d3f0a4118e491d2c98


github-actions bot commented Sep 12, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/10838968944

@mosabua (Member) commented Sep 12, 2024

/test-with-secrets sha=13ff61e6f20bb1c208d9dae0d72a55a677335da2


github-actions bot commented Sep 12, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/10840306232

@mosabua (Member) commented Sep 13, 2024

/test-with-secrets sha=58f28d1e02179c4b0c4e3b4ae37192cd94b27e7f


github-actions bot commented Sep 13, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/10842020393

@mosabua (Member) commented Sep 13, 2024

/test-with-secrets sha=494c0ae41b2cca6ea43d59c445027f56d323b900

github-actions bot commented Sep 13, 2024

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/10851945727

This means we disable all filesystem implementations by default, forcing users to explicitly configure either the legacy or a native filesystem implementation.
@anusudarsan (Member, Author) commented

Rebased the PR.

@mosabua (Member) commented Sep 13, 2024

CI is good. Failure is a flaky false alarm.

@mosabua (Member) left a review comment

Nice work down the rabbit hole.

@mosabua mosabua merged commit 3d66499 into trinodb:master Sep 13, 2024
56 of 57 checks passed
@github-actions github-actions bot added this to the 458 milestone Sep 13, 2024
@anusudarsan anusudarsan deleted the anu/enable-nfs-default branch September 13, 2024 20:44
@ebyhr (Member) commented Sep 13, 2024

@mosabua (Member) commented Sep 13, 2024

Dang .. that passed on the PR on the runs with secrets

Labels: cla-signed, delta-lake (Delta Lake connector), hive (Hive connector), iceberg (Iceberg connector), jdbc (Relates to Trino JDBC driver)