
Add new Directory implementation for AWS S3 #13949

Open · wants to merge 1 commit into main

Conversation

albogdano

Description

This PR adds a new module s3directory to Lucene, containing a new Directory implementation for AWS S3.
The code was adapted from the lucene-s3-directory project, as requested by @jpountz in #13868.

Currently, there are a few issues that cause the build to fail:

  • the AWS SDK module is not found or cannot be resolved in module-info.java
  • the tests require an AWS account and valid credentials to execute

@jpountz (Contributor) left a comment:

Thanks for contributing this!

I'm surprised at the number of dependencies. Does the S3 SDK actually have this many? E.g. https://central.sonatype.com/artifact/software.amazon.awssdk/s3/dependencies suggests that netty-transport is a test-only dependency (or is it also a transitive dependency, which is why you had to add it here)?

Regarding testing, a quick search suggests that there are mocks for S3 that avoid connecting to an actual S3 bucket; would that work? https://stackoverflow.com/questions/6615988/how-to-mock-amazon-s3-in-an-integration-test

I left some other comments from a quick pass over the code; it would be nice to make the code conform to some Lucene expectations, such as no logging.

moduleTestImplementation project(':lucene:test-framework')
moduleTestImplementation project(':lucene:analysis:common')
moduleTestImplementation project(':lucene:backward-codecs')
moduleTestImplementation project(':lucene:queryparser')
Contributor:

We shouldn't need to depend on the analysis-common, backward-codecs and queryparser modules for tests. For analysis, our tests usually use MockAnalyzer rather than whitespace or keyword analyzers.

ramDirectory = new MMapDirectory(FileSystems.getDefault().getPath("target/index"));
fsDirectory = FSDirectory.open(FileSystems.getDefault().getPath("target/index"));
} catch (IOException ex) {
logger.log(Level.SEVERE, null, ex);
Contributor:

Let's throw this exception instead of swallowing it; we should fail the test suite if this throws an exception?
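The fix being suggested is simply to let the checked exception escape the setup method so the test framework reports the failure, rather than catching and logging it. A minimal stdlib sketch of that pattern (the openIndexDir method and directory layout are hypothetical, not the PR's actual code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SetupPropagation {
  // Suggested pattern: declare `throws IOException` and let the caller
  // (here, the test framework) fail loudly instead of catch-and-log,
  // which would let the suite run against a broken directory.
  static Path openIndexDir(Path base) throws IOException {
    return Files.createDirectories(base.resolve("index"));
  }
}
```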

}
iwriter.forceMerge(1, true);
} catch (Exception e) {
logger.log(Level.SEVERE, null, e);
Contributor:

Likewise here; let's not swallow exceptions in general.

// Parse a simple query that searches for "text":

final QueryParser parser = new QueryParser("fieldname", analyzer);
final Query query = parser.parse("text");
Contributor:

Let's create the query manually instead and remove the dependency on the query parser module? This could be:

final Query query = new TermQuery(new Term("fieldname", "text"));

import org.junit.Assert;
import org.junit.Test;

public class TestS3Directory extends LuceneTestCase {
Contributor:

Could we make this test extend BaseDirectoryTestCase? It has a few tests for various things that directories are supposed to be able to do, like cloning and slicing.

*/
public class S3Directory extends Directory {

private static final Logger logger = Logger.getLogger(S3Directory.class.getName());
Contributor:

As a library, Lucene never logs, so let's remove this logger and make sure we propagate exceptions when appropriate?


private void initialize(final String bucket, final String path, LockFactory lockFactory) {
this.s3 = S3Client.create();
this.bucket = bucket.toLowerCase(Locale.getDefault());
Contributor:

Nit: let's not rely on the default locale. I would expect our forbidden-apis build task to fail the build because of this.
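The forbidden-apis concern here is concrete: String.toLowerCase with the default locale produces locale-dependent results. A small stdlib sketch of the classic Turkish-locale pitfall (the bucket name is made up for illustration):

```java
import java.util.Locale;

public class BucketLowercase {
  // Locale-sensitive lowercasing: under a Turkish locale, 'I' maps to
  // dotless 'ı', so the derived name no longer matches the bucket on S3.
  static String lowerTurkish(String s) {
    return s.toLowerCase(new Locale("tr"));
  }

  // Locale-independent lowercasing: Locale.ROOT always yields the ASCII
  // result, which is what forbidden-apis steers Lucene code toward.
  static String lowerRoot(String s) {
    return s.toLowerCase(Locale.ROOT);
  }
}
```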

return true;
} catch (Exception e) {
logger.log(Level.WARNING, null, e);
return false;
Contributor:

E.g. here we should not swallow the exception and log, but rather propagate it, possibly wrapped inside of an IOException (if the method signature allows it) or UncheckedIOException otherwise.
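The wrapping pattern being suggested looks roughly like this; the exists method and its underlying S3 call are hypothetical stand-ins, not the PR's actual code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class PropagateNotSwallow {
  // Hypothetical stand-in for an S3 HEAD request that may fail.
  static void headObject(boolean unreachable) throws IOException {
    if (unreachable) throw new IOException("bucket not reachable");
  }

  // Instead of catch-log-return-false, wrap the checked exception so the
  // caller still sees the failure even though the method signature has
  // no `throws IOException` clause.
  static boolean exists(boolean unreachable) {
    try {
      headObject(unreachable);
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```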

.getS3()
.getObject(b -> b.bucket(s3Directory.getBucket()).key(getPath() + name));

synchronized (this) {
Contributor:

Synchronization shouldn't be needed as IndexInputs should be consumed from a single thread anyway?
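For context, the convention being referenced (hedged summary) is that an IndexInput instance is consumed by a single thread, and concurrent readers obtain independent cursors by cloning. The idea can be sketched with a toy stdlib reader; the Reader class below is illustrative, not Lucene's API:

```java
public class CursorPerThread {
  // Toy positional reader over shared, immutable bytes. Each consumer
  // works on its own clone with an independent position, so no
  // synchronization of reads is needed.
  static class Reader {
    final byte[] data;
    int pos;

    Reader(byte[] data) {
      this.data = data;
    }

    byte readByte() {
      return data[pos++];
    }

    // A clone shares the bytes but carries its own cursor.
    Reader cloneReader() {
      Reader r = new Reader(data);
      r.pos = pos;
      return r;
    }
  }
}
```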

* A simple base class that performs index output memory-based buffering. The buffer size is
* configurable.
*/
// NEED TO BE MONITORED AGAINST LUCENE
Contributor:

This is Lucene now. :)

@rmuir
Member

rmuir commented Nov 2, 2024

Have we explored trying a fuse-based mount instead of writing Java code?

There seem to be several potentially viable approaches there: https://github.com/s3fs-fuse/s3fs-fuse, or possibly the higher-performance https://github.com/kahing/goofys + https://github.com/kahing/catfs.

@josefschiefer27

Amazon S3 also offers its own solution for mounting S3 buckets, available at https://github.com/awslabs/mountpoint-s3/.

There are benchmarks at https://github.com/awslabs/mountpoint-s3/blob/main/doc/BENCHMARKING.md

[Benchmark charts: throughput, throughput with cache, throughput on S3 Express; latency, latency on S3 Express]

@albogdano
Author

@rmuir @josefschiefer27 I haven't yet tried experimenting with S3 fuse mounts, but I doubt the performance would be any better than using the S3 API directly. Also, this project uses some specific API calls around legal hold locks as its locking mechanism; I don't know whether those are supported by the libraries you mentioned.
Right now I'm focused on minimizing dependencies by integrating an alternative S3 client (this lightweight AWS client). Ideally, IMHO, we should have a lightweight S3 Directory implementation that is suitable for serverless deployments.

@reta
Member

reta commented Nov 4, 2024

Just to share a similar development: we built a prototype S3 filesystem store plugin for OpenSearch based on the same library [1]. It works, but latency-wise it is not great (obviously, no caching etc. has been done).

[1] https://github.com/reta/OpenSearch/tree/s3.fs/plugins/store-s3

@jpountz
Contributor

jpountz commented Nov 13, 2024

Since we have an S3 Directory implemented already, let's run a comparison with the fuse-mount approach?

5 participants