-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: expose Iceberg features to python users #5590
Conversation
Column("z", dtypes.double), | ||
] | ||
|
||
iceberg_instructions = iceberg.IcebergInstructions(table_definition=table_def) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing tests:
IcebergCatalogAdapter
s3_rest_adapter
aws_glue_adapter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These objects aren't instantiable without a working S3 or AWS backend. I have tested manually, but not sure how or if we can test these in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IcebergCatalogAdapter
is tested as a Java object in CI (IcebergToolsTest.java
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Testing the Java is not the same as making sure the python actually works. Python tests don't need to be the same detail, but they should confirm that the code runs.
- In python, I think we test against Kafka, SQL, etc. Is there a reason that we shouldn't actually be testing against some S3 here? Is it significantly harder to setup than these other cases?
- There is no way to run the tests in a repeatable way when there is S3 available if the tests don't exist.
- I have low confidence that this (or any other code) will keep working without some kind of test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing with @devinrsmith and @malhotrashivam , it is significantly harder to test the python wrappers directly. But these wrappers are very light wrapping over better tested Java objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless, the python isn't tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're quite confident that we don't want build additional dockerized integration tests. They tend to be flaky, and the work involved greatly exceeds the scope of this ticket. We'll file a ticket to expand testcontainers-based testing for glue, but per @devinrsmith it appears to be a paid feature so we may not want to add it.
I think manual testing is enough for such a thin wrapper, and I don't intend to sponsor docker-based automation at this time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created the following ticket to address missing CI testing: #5656
Column("z", dtypes.double), | ||
] | ||
|
||
iceberg_instructions = iceberg.IcebergInstructions(table_definition=table_def) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Testing the Java is not the same as making sure the python actually works. Python tests don't need to be the same detail, but they should confirm that the code runs.
- In python, I think we test against Kafka, SQL, etc. Is there a reason that we shouldn't actually be testing against some S3 here? Is it significantly harder to setup than these other cases?
- There is no way to run the tests in a repeatable way when there is S3 available if the tests don't exist.
- I have low confidence that this (or any other code) will keep working without some kind of test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java-only light review.
extensions/iceberg/s3/src/main/java/io/deephaven/iceberg/util/IcebergToolsS3.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Show resolved
Hide resolved
extensions/iceberg/src/main/java/io/deephaven/iceberg/util/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Content with Java changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
Note that this PR fixes #5642 |
Labels indicate documentation is required. Issues for documentation have been opened: Community: deephaven/deephaven-docs-community#241 |
Exposes Iceberg table support and adapter creation through python.
Will close #5574
Example Usage:
Local (MinIO + REST Catalog):
AWS Glue:
NOTE: the region and credentials are specified locally in the
~/.aws/config
and~/.aws/credentials
files.