Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Parquet user-defined predicate to push down StartsWith #682

Closed
rdblue opened this issue Dec 4, 2019 · 9 comments
Closed

Add Parquet user-defined predicate to push down StartsWith #682

rdblue opened this issue Dec 4, 2019 · 9 comments
Labels
good first issue Good for newcomers

Comments

@rdblue
Copy link
Contributor

rdblue commented Dec 4, 2019

Parquet cannot currently push down startsWith predicates because it doesn't have a built-in implementation. Iceberg should add a user-defined predicate to push it down.

@rdblue rdblue added the good first issue Good for newcomers label Dec 4, 2019
@waterlx
Copy link
Contributor

waterlx commented Jan 17, 2020

Hi @rdblue has this issue addressed by PR #327 ? Could you please close this issue?

@aokolnychyi
Copy link
Contributor

I think #327 provided a basis and #398 implemented startsWith in our dictionary and metrics filters in Parquet. @rdblue, is this what you mean by this issue?

@maqroll
Copy link
Contributor

maqroll commented Jan 17, 2020

Sorry to jump in but maybe this issue refers to handle STARTS_WITH case with an UserDefinedPredicate here.

@rdblue
Copy link
Contributor Author

rdblue commented Jan 17, 2020

When I opened this issue, I was thinking about the old read path. That's the one that @maqroll correctly pointed to. But we now use the new read path and the row group filters are ParquetMetricsRowGroupFilter and ParquetDictionaryRowGroupFilter. We needed to add support to those classes for the startsWith predicate added by #327, which was done by @aokolnychyi in #398.

I think the only remaining task for this (and for IN/NOT_IN) is to update the conversion from Spark filters to use the new Iceberg expressions.

@rdblue
Copy link
Contributor Author

rdblue commented Jan 17, 2020

I'm going to close this one since I think it covers a case that we don't really need to worry about.

@bachar84
Copy link

In my humble opinion, issue #327 is the roots of persecution

@rdblue
Copy link
Contributor Author

rdblue commented Jan 24, 2020

In my humble opinion, issue #327 is the roots of persecution

I'm not sure I understand you. What do you mean?

@aokolnychyi
Copy link
Contributor

aokolnychyi commented Jan 24, 2020

I think SparkFilters already converts startsWith correctly. However, we need to update it to use new expressions for IN as it still rewrites it as OR/AND.

@rdblue
Copy link
Contributor Author

rdblue commented Jan 24, 2020

I agree. I opened #748 to track updating SparkFilters.

@rdblue rdblue closed this as completed Jan 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

5 participants