Can get wrong results when querying Delta tables #1121

Open
Kimahriman opened this issue Nov 25, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@Kimahriman
Contributor

Describe the bug

Delta scans work by using a subclass of ParquetFileFormat within a normal Hadoop relation. Comet sees this and simply replaces it with a CometParquetFileFormat, losing all the Delta-specific behavior in that subclass, such as deletion vector support and column mapping.

Steps to reproduce

I don't have exact steps right now. I noticed this randomly while testing things out, and I had somewhat expected it to be a problem.

Expected behavior

A Delta scan should not be eligible for conversion to a CometScan. The check for ParquetFileFormat probably needs to be an exact class comparison that excludes subclasses.
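
To illustrate the difference between the two kinds of check, here is a minimal sketch. The classes below are empty stand-ins defined only for this example (they are not the real Spark or Delta classes), and `ScanEligibility` and its method names are hypothetical, not Comet's actual code.

```scala
// Stand-ins for illustration only; the real classes live in Spark and Delta.
class ParquetFileFormat
class DeltaParquetFileFormat extends ParquetFileFormat

// Hypothetical helper, not part of Comet.
object ScanEligibility {
  // Subclass-inclusive check: a type pattern also matches any subclass,
  // so a Delta format would be treated as plain Parquet.
  def matchesIncludingSubclasses(format: AnyRef): Boolean = format match {
    case _: ParquetFileFormat => true
    case _                    => false
  }

  // Exact check: only the base class itself is accepted.
  def matchesExactly(format: AnyRef): Boolean =
    format.getClass == classOf[ParquetFileFormat]
}
```

With an exact comparison, the Delta stand-in is rejected while the plain Parquet format is still accepted, so only the latter would be considered for replacement.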

Longer term, it would be interesting to see whether the necessary behavior can be delegated to custom file formats. However, all the work on pushing Parquet scans down to DataFusion might make that impossible unless a different approach is used, such as using delta-rs directly as in #174.

Additional context

No response

@Kimahriman Kimahriman added the bug Something isn't working label Nov 25, 2024