-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce startsWith Predicate #327
Conversation
@renato2099 & @Liorba had submitted PRs previously for this issue. I am taking up on their work, and trying to see if it can shepherded into |
@rdblue Could you please take a look? Thank you. |
@@ -113,6 +116,7 @@ public String toString() { | |||
predicate.op(), name, apply(predicate.literal().value())); | |||
// case IN: | |||
// return Expressions.predicate(); | |||
case STARTS_WITH: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if a startsWith
predicate makes sense in case of Bucket
transforms. Hence, leaving it to be handled by the default
case. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Transformed values tell us nothing about whether the original predicate is true or not.
a3c38b7
to
3180cec
Compare
assertProjectionInclusive(spec, startsWith("someStringCol", "ababab"), "abab"); | ||
|
||
assertProjectionStrict(spec, startsWith("someStringCol", "ab"), "ab"); | ||
assertProjectionStrict(spec, startsWith("someStringCol", "abab"), "abab"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also test strict projection for startsWith("someStringCol", "ababab")
since that doesn't have a strict projection.
api/src/test/java/org/apache/iceberg/transforms/TestStartsWith.java
Outdated
Show resolved
Hide resolved
return null; | ||
switch (predicate.op()) { | ||
case STARTS_WITH: | ||
if (predicate.literal().value().length() <= width()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the case where the length of the literal equals width, is there value in converting to an equality predicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have converted it into an equality predicate.
@sujithjay, nice work! I think this is about ready to go. I just had a few comments. @moulimukherjee, could you take a look at this as well, since you're familiar with the projections? |
Co-authored-by: Renato Marroquin <[email protected]> Co-authored-by: Lior Baber <[email protected]> Co-authored-by: Sujith Jay Nair <[email protected]>
} else if (predicate.literal().value().length() == width()) { | ||
return Expressions.equal(name, predicate.literal().value()); | ||
} else { | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does returning null
here still make sense, in light of recent changes? Or should it be returning ProjectionUtil.truncateArrayStrict(name, predicate, this)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning ProjectionUtil.truncateArrayStrict(name, predicate, this)
would still return null anyways, since it can't handle the STARTS_WITH
case. So I think it makes sense to return null
here.
api/src/test/java/org/apache/iceberg/transforms/TestStartsWith.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@Test | ||
public void assertTruncateString() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can some tests be added in the TestTruncatesResiduals.testStringTruncateTransformResiduals()
too with the STARTS_WITH
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExpressionVisitors
for startsWith
currently throw an UnsupportedOperationException
(c.f. discussion-thread).
Given that, tests in TestTruncatesResiduals
with STARTS_WITH
would not add much value. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sujithjay, I agree with @moulimukherjee. But you're right that currently an UnsupportedOperationException
will be thrown. That's a problem because it will prevent creating splits for a scan when there is a startsWith
predicate.
The fix is to implement startsWith
in ResidualEvaluator
to support residual calculations and to add a couple of tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for ^ (implement startsWith
in ResidualEvaluator
and add tests)
Some nits, but the changes look good to me 👍. |
// starts with | ||
assertResidualValue(spec, startsWith("value", "bcd"), "ab", Expression.Operation.FALSE); | ||
assertResidualPredicate(spec, startsWith("value", "bcd"), "bc"); | ||
assertResidualValue(spec, startsWith("value", "bcd"), "cd", Expression.Operation.FALSE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@rdblue, @moulimukherjee I have updated the PR. Please review. |
Changes look good to me 👍 |
@@ -120,6 +124,9 @@ public R or(R leftResult, R rightResult) { | |||
return in(pred.ref(), pred.literal()); | |||
case NOT_IN: | |||
return notIn(pred.ref(), pred.literal()); | |||
case STARTS_WITH: | |||
/* startsWith accepts only Strings, hence type-casting */ | |||
return startsWith((BoundReference<String>) pred.ref(), (Literal<String>) pred.literal()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't catch this the first time through, but if the startsWith
method above is parameterized by T
then there shouldn't be a need to cast these to literals and bound references parameterized by String
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startsWith
was parameterised with T
, but I missed removing the casts here.
api/src/test/java/org/apache/iceberg/transforms/TestStartsWith.java
Outdated
Show resolved
Hide resolved
@sujithjay, I just found 2 minor issues and I'll merge this after those are fixed. Thanks for pushing this feature through! |
Thank you for the help, @rdblue & @moulimukherjee. 🙏 |
I would like to follow up with a PR to enable pushdown of Apart from changes in I think one way to implement this is to take the same number of bytes from prefix and bounds (e.g. slice) and then rely on @sujithjay @rdblue @moulimukherjee Any thoughts? |
That plan sounds good to me! |
Implements #31.