-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Optimize for CASE WHEN .. THEN column ELSE null END #672
Conversation
I also see a ~5% performance improvement with TPC-DS q43 @ sf=100 |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #672 +/- ##
============================================
+ Coverage 33.69% 33.74% +0.04%
+ Complexity 840 835 -5
============================================
Files 109 109
Lines 42527 42529 +2
Branches 9343 9343
============================================
+ Hits 14331 14352 +21
+ Misses 25245 25211 -34
- Partials 2951 2966 +15 ☔ View full report in Codecov by Sentry. |
@parthchandra @kazuyukitanimura @huaxingao @viirya This is ready for review now |
@@ -541,6 +542,24 @@ impl PhysicalPlanner { | |||
Some(self.create_expr(case_when.else_expr.as_ref().unwrap(), input_schema)?) | |||
} | |||
}; | |||
|
|||
// optimized path for CASE WHEN predicate THEN expr ELSE null END | |||
if else_phy_expr.is_none() && when_then_pairs.len() == 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we have added this as an optimization in DF itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed apache/datafusion#11484 to track up streaming this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFusion PR: apache/datafusion#11534
Co-authored-by: Oleks V <[email protected]>
Co-authored-by: Oleks V <[email protected]>
Co-authored-by: Oleks V <[email protected]>
…into expr-or-null
Let's just wait to get these optimizations from upstream DataFusion. I can test against the lastest DF code when benchmarking. |
Which issue does this PR close?
Part of #669
Rationale for this change
CaseExpr
is an expensive expression and there are some cases where can use specialized expressions instead.For the usage
CASE WHEN predicate THEN column ELSE null END
which is seen in numerous TPC-DS queries, we can just evaluate thecolumn
and then change the null mask based on thepredicate
, which is a much faster operation (more than 10x faster than using the standard case expression).What changes are included in this PR?
How are these changes tested?
I manually tested with this query.
Here are the timings in seconds for five iterations.
Spark
Comet before this PR
Comet with this PR