Optimize CASE expression for "expr or expr" usage. #13953

aweltsch · 2024-12-31T12:34:11Z

Which issue does this PR close?

Rationale for this change

The objective of this PR is to optimize the case_when: expr or expr benchmark. I measured a small but consistent improvement of around 10% on this benchmark.

What changes are included in this PR?

I implemented an additional evaluation method to improve the performance of CASE WHEN condition THEN expr ELSE expr.
The implementation deliberately stays very close to the existing implementation of the more general cases.
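To make the shape of this special case concrete, here is a minimal std-only sketch. It is not the code from this PR: it assumes both branches have already been evaluated for the whole batch and models a nullable column as a `Vec<Option<T>>` rather than an Arrow array. The names `Column` and `case_expr_or_expr` are hypothetical.

```rust
/// A nullable column, modeled as one `Option` per row.
/// This is a stand-in for an Arrow array, used only for illustration.
type Column<T> = Vec<Option<T>>;

/// Evaluates `CASE WHEN predicate THEN then_values ELSE else_values END`
/// row by row. Assumption: both branches were already computed for every
/// row, so the predicate only selects between them.
fn case_expr_or_expr<T: Clone>(
    predicate: &[Option<bool>],
    then_values: &[Option<T>],
    else_values: &[Option<T>],
) -> Column<T> {
    assert_eq!(predicate.len(), then_values.len());
    assert_eq!(predicate.len(), else_values.len());
    predicate
        .iter()
        .zip(then_values.iter().zip(else_values.iter()))
        .map(|(p, (t, e))| match p {
            Some(true) => t.clone(),
            // SQL semantics: a NULL predicate is not true, so ELSE applies.
            Some(false) | None => e.clone(),
        })
        .collect()
}
```

With a predicate of `[true, false, NULL]`, the first row takes the THEN value and the other two take the ELSE value, which are the three predicate states an end-to-end test would want to cover.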

Are these changes tested?

I added a basic test case for the new evaluation method.

Are there any user-facing changes?

No, the changes should not affect the semantics of the CASE expression.

alamb · 2025-01-01T13:11:23Z

Thanks @aweltsch -- this looks quite nice. I am running the benchmarks on my test rig to verify

alamb

Thank you @aweltsch -- this looks good and is a very nice first contribution

I think the only thing it is missing is some end-to-end tests (.slt)

The instructions for adding such tests are here
https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/README.md

Perhaps you can extend
https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/case.slt

My benchmark run shows about a 25% performance improvement. Nice work 🚀

++ critcmp main GH-11638-inner-loop
group                          GH-11638-inner-loop                    main
-----                          -------------------                    ----
case_when: CASE expr           1.01     24.4±0.35µs        ? ?/sec    1.00     24.3±0.28µs        ? ?/sec
case_when: column or null      1.01   1493.9±5.22ns        ? ?/sec    1.00   1481.8±3.38ns        ? ?/sec
case_when: expr or expr        1.00     31.6±0.13µs        ? ?/sec    1.24     39.3±1.34µs        ? ?/sec
case_when: scalar or scalar    1.02      8.5±0.02µs        ? ?/sec    1.00      8.3±0.02µs        ? ?/sec

2010YOUY01 · 2025-01-02T05:22:15Z

datafusion/physical-expr/src/expressions/case.rs

+            DataFusionError::Context(
+                "WHEN expression did not return a BooleanArray".to_string(),
+                Box::new(e),
+            )


Suggested change

-            DataFusionError::Context(
-                "WHEN expression did not return a BooleanArray".to_string(),
-                Box::new(e),
-            )
+            internal_datafusion_err!("WHEN expression did not return a BooleanArray")

nit: We can assume all type checks have already been done, so any cast failure inside this function should be unreachable and we can use an internal error instead

Thanks @2010YOUY01 for the feedback. I was not aware that all of the type-checking is guaranteed at this point.
One of my main motivations for having this here was to keep it consistent with the rest of the code in the file and to minimize any deviation from the previous behavior. I can apply this change to the newly added code, but what should happen to the rest of the code? Do you think it would make sense to add a new issue to clean up the other functions too?

I agree, we can keep the code consistent now, and do clean-up later if possible
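The reasoning in this thread can be sketched without DataFusion at all. The example below is an illustration under stated assumptions, not the project's API: `SketchError` and `as_boolean_column` are hypothetical names, and `std::any::Any` stands in for the array downcast. The point is that once type checking has run earlier, a failed downcast indicates a bug, so it is reported as an internal error rather than wrapped as context around another error.

```rust
use std::any::Any;

/// Hypothetical error type for this sketch; it is not `DataFusionError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    /// Signals a broken invariant (a bug), not a user-facing failure.
    Internal(String),
}

/// Downcasts a value that earlier type checking should already have
/// guaranteed to be a boolean column. A failure here is treated as
/// unreachable in correct code, hence the internal error.
fn as_boolean_column(value: &dyn Any) -> Result<&Vec<Option<bool>>, SketchError> {
    value.downcast_ref::<Vec<Option<bool>>>().ok_or_else(|| {
        SketchError::Internal(
            "WHEN expression did not return a BooleanArray".to_string(),
        )
    })
}
```

Passing a `Vec<Option<bool>>` returns a reference to it, while any other type yields `SketchError::Internal` with the message shown in the suggestion above.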

2010YOUY01 · 2025-01-02T05:23:47Z

datafusion/physical-expr/src/expressions/case.rs

+        let e = self.else_expr.as_ref().unwrap();
+        // keep `else_expr`'s data type and return type consistent
+        let expr = try_cast(Arc::clone(e), &batch.schema(), return_type.clone())
+            .unwrap_or_else(|_| Arc::clone(e));


This is similar: we can return an internal error directly and avoid propagating the casting failure

Since this is also used in all of the other evaluation methods for the CaseExpr, I would like to include it in the same clean-up issue. Would that be fine with you?

This would be great, thank you

aweltsch · 2025-01-02T10:47:34Z

Thanks for your feedback @alamb, I have added a new .slt test case in the file you mentioned. From my POV it should cover all relevant cases for the predicate (true, false, null) with proper expressions in the branches.

aweltsch · 2025-01-02T20:03:28Z

I added a follow-up issue: #13990.
I hope it is worded clearly and accurately reflects the changes desired. @2010YOUY01 feel free to chime in.

aweltsch added 2 commits December 30, 2024 09:54

Apply optimization for ExprOrExpr.

2629bec

Implement optimization similar to existing code.

175a111

github-actions bot added the physical-expr Physical Expressions label Dec 31, 2024

alamb reviewed Jan 1, 2025


alamb mentioned this pull request Jan 1, 2025

Minor: improve zip kernel docs apache/arrow-rs#6928

Open

2010YOUY01 reviewed Jan 2, 2025


Add sqllogictest.

0c973ae

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 2, 2025

aweltsch mentioned this pull request Jan 2, 2025

Simplify error handling in case.rs #13990

Open
