-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial commit of Iceberg integration. #5277
Conversation
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergTools.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergTools.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergTools.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergTools.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergInstructions.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/location/IcebergColumnLocation.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergPartitionedLayout.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergPartitionedLayout.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergFlatLayout.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/test/java/io.deephaven.iceberg/IcebergToolsTest.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalog.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalog.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergPartitionedLayout.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergFlatLayout.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/location/IcebergTableLocation.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergTools.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/test/java/io.deephaven.iceberg/IcebergToolsTest.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergInstructions.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalog.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergInstructions.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/location/IcebergTableLocationFactory.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/test/java/io.deephaven.iceberg/IcebergToolsTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a discussion with @malhotrashivam and @devinrsmith about configuration strategies to unify with Parquet and existing S3 support.
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergTools.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergTools.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergTools.java
Outdated
Show resolved
Hide resolved
...ions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergKeyValuePartitionedLayout.java
Outdated
Show resolved
Hide resolved
...ions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergKeyValuePartitionedLayout.java
Outdated
Show resolved
Hide resolved
...ions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergKeyValuePartitionedLayout.java
Outdated
Show resolved
Hide resolved
...ions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergKeyValuePartitionedLayout.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergFlatLayout.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergFlatLayout.java
Outdated
Show resolved
Hide resolved
...ions/iceberg/src/main/java/io/deephaven/iceberg/location/IcebergTableParquetLocationKey.java
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/location/IcebergTableLocationKey.java
Outdated
Show resolved
Hide resolved
As part of this, the bootstrapping S3 client was changed from sync to async to support S3TransferManager. Deleting keys has also been automated (unit tests no longer responsible for keeping track of keys used). SingletonContainers, LocalStack, and MinIO have been made public; this allows the container bootstrapping logic to be used by other projects' tests. These changes should make it easier for Iceberg testing in #5277
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergBaseLayout.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
// Add the S3 instructions. | ||
if (instructions.s3Instructions().isPresent()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we require s3 instructions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to verify that (e.g.) an all-AWS install would not need explicit instructions but could function with defaults.
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergBaseLayout.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergBaseLayout.java
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergBaseLayout.java
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
...ions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergKeyValuePartitionedLayout.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergInstructions.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/test/java/io.deephaven.iceberg/IcebergToolsTest.java
Outdated
Show resolved
Hide resolved
...ions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergKeyValuePartitionedLayout.java
Show resolved
Hide resolved
...ions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergKeyValuePartitionedLayout.java
Outdated
Show resolved
Hide resolved
...ions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergKeyValuePartitionedLayout.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergInstructions.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm good with this, modulo a few quibbles. Maybe @niloc132 should look as a gradle expert.
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
settings.gradle
Outdated
@@ -270,6 +270,9 @@ project(':extensions-s3').projectDir = file('extensions/s3') | |||
include(':extensions-iceberg') | |||
project(':extensions-iceberg').projectDir = file('extensions/iceberg') | |||
|
|||
include(':extensions-iceberg-aws') | |||
project(':extensions-iceberg-aws').projectDir = file('extensions/iceberg-aws') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be iceberg/aws
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nest subdir vs nested subproject - technically orthogonal. ryan's comment is asking for the former, which is a way to structure the directories (replace the string passed to the file() method). I would argue that not only do we want that, we also want nested subprojects, using :
delimiters.
The Gradle convention is that nested directories also always means nested projects. Deephaven (starting with DHE, inherited in DHC) doesn't follow this as a rule, though DHC sometimes uses this if the parent directory is itself a project.
That said, using subprojects also conflicts a bit with our naming policies (which is "deephaven-" plus the project's own name), since nesting projects removes the parent project from the name.
(for the rest of this i'm assuming we're calling it s3
rather than aws
, but aws-s3
might also make sense)
From talking to Larry it sounds like we anticipate other iceberg implementations that aren't s3 specific. So, I would propose something like:
extensions/
iceberg/ # name this project deephaven-extensions-iceberg?
build.gradle
src/
s3/ # name this project deephaven-iceberg-s3?
build.gradle
src/
r2/ # possible other impl?
build.gradle
src/
extensions/iceberg-aws/src/main/java/io/deephaven/iceberg/util/IcebergToolsAWS.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have only reviewed the gradle file in the py dir.
testcontainers.localstack.image=localstack/localstack:3.1.0 | ||
testcontainers.minio.image=minio/minio:RELEASE.2024-02-04T22-36-13Z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned about these version numbers, since unlike maven versions, these are docker tags that are mutable. Typically we use our docker-registry project to pin to a particular image hash and ensure that even if a tag moves, our old builds will behave the same.
That said, Devin is out, and this is how he wrote the S3 testing code, so we should be consistent until he can tell us why we want it that way (or offer another way forward).
settings.gradle
Outdated
@@ -270,6 +270,9 @@ project(':extensions-s3').projectDir = file('extensions/s3') | |||
include(':extensions-iceberg') | |||
project(':extensions-iceberg').projectDir = file('extensions/iceberg') | |||
|
|||
include(':extensions-iceberg-aws') | |||
project(':extensions-iceberg-aws').projectDir = file('extensions/iceberg-aws') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nest subdir vs nested subproject - technically orthogonal. ryan's comment is asking for the former, which is a way to structure the directories (replace the string passed to the file() method). I would argue that not only do we want that, we also want nested subprojects, using :
delimiters.
The Gradle convention is that nested directories also always means nested projects. Deephaven (starting with DHE, inherited in DHC) doesn't follow this as a rule, though DHC sometimes uses this if the parent directory is itself a project.
That said, using subprojects also conflicts a bit with our naming policies (which is "deephaven-" plus the project's own name), since nesting projects removes the parent project from the name.
(for the rest of this i'm assuming we're calling it s3
rather than aws
, but aws-s3
might also make sense)
From talking to Larry it sounds like we anticipate other iceberg implementations that aren't s3 specific. So, I would propose something like:
extensions/
iceberg/ # name this project deephaven-extensions-iceberg?
build.gradle
src/
s3/ # name this project deephaven-iceberg-s3?
build.gradle
src/
r2/ # possible other impl?
build.gradle
src/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Py dir changes look ok.
Labels indicate documentation is required. Issues for documentation have been opened: Community: deephaven/deephaven-docs-community#226 |
Highlights
Can select specific snapshots or automatically choose the latest when loading a table. Supports single-file, multi-file and key-value partitioned data.
TODO (known limits):