
WIP Optimize hash_aggregate when there are no null group keys #922

Closed
wants to merge 2 commits into from

Conversation

novemberkilo

ref #850

WIP - for purposes of discussion and feedback.

@alamb I have done the simplest possible thing to get the branching code path that avoids the is_valid check. However, I'd appreciate some input on how best to proceed:

  • I'd like to figure out how best to refactor this code to reduce duplication. I'm not sure that the duplication is that bad in the first place. If we were to improve it, is this best done via an additional macro, or perhaps by changing the API of eq_array to take an additional parameter that signals whether the array has no nulls?
  • How do I test eq_array_no_nulls? Should I look to follow the test for eq_array?

Thanks much in advance for the guidance and input.

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Aug 22, 2021
Contributor

@alamb alamb left a comment

Thanks @novemberkilo. This is looking like a good start.

How do I test eq_array_no_nulls? Should I look to follow the test for eq_array?

Yes, I think the tests for eq_array would be a good model.

I'm not sure that the duplication is that bad in the first place. If we were to improve it, is this best done via an additional macro

The goal of the optimization is to ensure that checking if the array has any nulls is done once per array, and not once per row (e.g. once per function call to eq_array).

If you used a macro I think you could have the compiler do the duplication for you. For example, you could define a macro like

macro_rules! eq_array_general {
    ($self:expr, $array:expr, $index:expr, $has_nulls:expr) => {{
        if $has_nulls { /* check for nulls */ }
        ...
    }};
}

And then you can call it like

    pub fn eq_array(&self, array: &ArrayRef, index: usize) -> bool {
        eq_array_general!(self, array, index, true)
    }
    pub fn eq_array_no_nulls(&self, array: &ArrayRef, index: usize) -> bool {
        eq_array_general!(self, array, index, false)
    }

And then we would be counting on the Rust compiler to optimize out the if false { .. } branch for the null check.
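As an illustration of the same specialization idea, here is a sketch using a const generic instead of a macro. The NullableArray type below is a simplified stand-in, not DataFusion's actual ScalarValue/ArrayRef API; the const generic plays the role of the macro's $has_nulls argument.

```rust
// Sketch only: `NullableArray` is a simplified stand-in for an Arrow
// array. The compiler emits two monomorphized copies of `eq_value`,
// and in the HAS_NULLS = false copy the validity check is removed
// entirely, just as with the `if false { .. }` in the macro expansion.
struct NullableArray {
    values: Vec<i64>,
    validity: Option<Vec<bool>>, // None means the array has no nulls
}

impl NullableArray {
    fn is_valid(&self, index: usize) -> bool {
        self.validity.as_ref().map_or(true, |v| v[index])
    }

    fn eq_value<const HAS_NULLS: bool>(&self, index: usize, scalar: i64) -> bool {
        // Compile-time flag: this branch disappears when HAS_NULLS is false.
        if HAS_NULLS && !self.is_valid(index) {
            return false;
        }
        self.values[index] == scalar
    }

    pub fn eq_array(&self, index: usize, scalar: i64) -> bool {
        self.eq_value::<true>(index, scalar)
    }

    pub fn eq_array_no_nulls(&self, index: usize, scalar: i64) -> bool {
        self.eq_value::<false>(index, scalar)
    }
}

fn main() {
    let no_nulls = NullableArray { values: vec![1, 2, 3], validity: None };
    assert!(no_nulls.eq_array_no_nulls(1, 2));

    let with_nulls = NullableArray { values: vec![1, 2], validity: Some(vec![true, false]) };
    assert!(!with_nulls.eq_array(1, 2)); // row 1 is null, so it never matches
}
```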

#[inline]
pub fn eq_array_no_nulls(&self, array: &ArrayRef, index: usize) -> bool {
    if let DataType::Dictionary(key_type, _) = array.data_type() {
        return self.eq_array_dictionary(array, index, key_type);
Contributor

I think this should perhaps also be self.eq_array_dictionary_no_nulls and have a corresponding dictionary special case.

@novemberkilo
Author

@alamb @Dandandan I've added a commit that tries to incorporate your suggestions and feedback. Did I understand them correctly? Can you please comment again on whether I am headed in the right direction?

I've not added tests to cover the additional code paths yet - I am aware that I need to do that.

Also, do you have any pointers in the direction of how best to measure the effects of the attempted optimisation? Thanks.

@alamb
Contributor

alamb commented Aug 30, 2021

Hi @novemberkilo -- Thanks! I will try and review this tomorrow. In terms of measuring performance, the "Performance Summary" in #808 (comment) might give you some good ideas of what to try and ways to show this is faster.

@alamb
Contributor

alamb commented Aug 30, 2021

(sorry for delays in reviewing, I am on vacation this week and am about to run out of time for the day)

@novemberkilo
Author

Ah please don't stress about delays @alamb -- no rush on my behalf ... especially if you are on vacation.

    .zip(null_information.iter())
    .all(|((array, scalar), has_nulls)| {
        scalar.eq_array(array, row, has_nulls)
    })
Contributor

I think broadly speaking this is the pattern, though I am not sure how well the compiler will be able to optimize given a runtime has_nulls parameter -- the check might need to be hoisted out of this loop (the call to all here).

Have you had a chance to try profiling?
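The hoisting being discussed could look like the following sketch, with plain slices standing in for Arrow arrays (none of these names are DataFusion's): the null/no-null decision is taken once per array, and the per-row loop in the no-null path never touches validity at all.

```rust
// Sketch only: plain slices stand in for Arrow arrays. The branch on
// nullability is made once per array, outside the per-row loop,
// instead of once per row inside it.
fn rows_equal_to(values: &[i64], validity: Option<&[bool]>, scalar: i64) -> Vec<bool> {
    match validity {
        // No-null fast path: the loop body has no validity check.
        None => values.iter().map(|v| *v == scalar).collect(),
        // Nullable path: the validity bitmap is consulted per row.
        Some(valid) => values
            .iter()
            .zip(valid)
            .map(|(v, ok)| *ok && *v == scalar)
            .collect(),
    }
}

fn main() {
    assert_eq!(rows_equal_to(&[1, 2, 1], None, 1), vec![true, false, true]);
    assert_eq!(
        rows_equal_to(&[1, 2, 1], Some(&[true, true, false]), 1),
        vec![true, false, false]
    );
}
```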

Author

@novemberkilo novemberkilo Aug 31, 2021

Hi, I don't understand the suggestion about hoisting has_nulls out of the all.

Would you be able to please sketch it out in pseudo code or similar?

I haven't used macros much and will go educate myself on how they work (particularly the optimisation role of the compiler). Will also get to profiling (although maybe not until the weekend).

@novemberkilo
Author

novemberkilo commented Sep 5, 2021

Hi @alamb - I've used datafusion_aggregate_bench to run up a comparison with master and the ratios are coming up at 0.99 so there's nothing significant in the way of a performance improvement yet.

Could you please help me understand your suggestion:

am not sure how well the compiler will be able to optimize given a runtime has_nulls parameter -- the check might need to be hoisted out of this loop (the call to all here)

I'm not sure what this means. Are you perhaps suggesting that I use an index lookup of null_information rather than using zip?

@alamb
Contributor

alamb commented Sep 10, 2021

Hi @alamb - I've used datafusion_aggregate_bench to run up a comparison with master and the ratios are coming up at 0.99 so there's nothing significant in the way of a performance improvement yet.

One reason you may not have seen any difference is that the aggregate benchmark has nulls in the grouping keys. Perhaps adjusting the following numbers and ensuring your code path is hit would be a good next step:

https://github.com/alamb/datafusion_aggregate_bench/blob/main/src/main.rs#L23-L24

@novemberkilo
Author

Thanks much @alamb

I followed your suggestion and set both of those constants to 0.0, and confirmed that the code path being hit is the one where array.is_valid() is skipped. I reran the benchmarks and still have no appreciable difference to report between my code and what is on master.

This is confusing to me ... perhaps the is_valid() call is not as expensive as we thought? Or else the optimisation is still somehow not visible. I'll think about it some more but thought I would post these findings as an interim update.

@alamb
Contributor

alamb commented Sep 12, 2021

This is confusing to me ... perhaps the is_valid() call is not as expensive as we thought?

Thanks for the update @novemberkilo 👍

This is definitely possible. I don't think we have measured the specific contribution that checking for nulls added to the overall runtime. If it turns out we can't see a difference in performance, perhaps the additional code complexity isn't worth it.

Unfortunately this does happen sometimes while doing performance optimizations :( I am sorry if I misled you here.

@Dandandan
Contributor

@novemberkilo thanks for checking this and opening this PR.

Next to the remarks by @alamb, I think there is also a more fundamental change that would be needed to get the branch outside of the loop.
Currently, the boolean check if *$has_nulls is still done per value while checking value equality. Ideally, this should be done entirely outside of the loop, as is done in the Arrow kernels too. Quite invasive changes and experiments might be needed to accomplish this.

A similar thing that could be tried here, as I started (but have not completed yet) in PR
#864 for the hash join, is to vectorize the hash collision check: instead of checking the collision for each value individually, it is delayed to a later stage, so it can be done in a simpler loop which can be specialized for non-null cases. The core idea is that the inner loop should contain as few branches, downcasts, etc. as possible.

Another reason might be that this is currently not a really hot part of the code. This could be checked by running some profiling tools (e.g. perf or valgrind), which can give hints about where the hot code lives. A micro benchmark could help too, to measure performance changes in this part of the code more accurately (instead of the performance of executing a full query).

Another source of overhead might be that currently one of the two values is a ScalarValue (instead of living in a contiguous array), which is potentially boxed, does not have good locality, and cannot be specialized for having no nulls. Also, the array is downcast in the inner loop based on the data type.
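As an illustration of the "downcast once, outside the inner loop" point above, here is a sketch with a simplified stand-in for ScalarValue (not DataFusion's actual enum) and a plain slice standing in for a downcast Arrow array.

```rust
// Sketch only: `ScalarValue` here is a simplified stand-in for
// DataFusion's enum. The enum match (and, in real code, the array
// downcast) happens once per array, so the hot loop compares plain
// primitives with no branching on type or nullability.
#[allow(dead_code)]
enum ScalarValue {
    Int64(Option<i64>),
    Utf8(Option<String>),
}

fn count_matches(scalar: &ScalarValue, values: &[i64]) -> usize {
    // Unpack the scalar once, outside the loop.
    let target = match scalar {
        ScalarValue::Int64(Some(v)) => *v,
        // A null or mismatched-type scalar can never match.
        _ => return 0,
    };
    // Hot loop: primitive comparison only.
    values.iter().filter(|v| **v == target).count()
}

fn main() {
    assert_eq!(count_matches(&ScalarValue::Int64(Some(2)), &[1, 2, 2, 3]), 2);
    assert_eq!(count_matches(&ScalarValue::Int64(None), &[1, 2]), 0);
}
```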

macro_rules! eq_array_general {
    ($array:expr, $index:expr, $ARRAYTYPE:ident, $VALUE:expr, $has_nulls:expr) => {{
        let array = $array.as_any().downcast_ref::<$ARRAYTYPE>().unwrap();
        if *$has_nulls {
Contributor

@Dandandan Dandandan Sep 12, 2021

Ideally, this branch should be outside of the loop at
https://github.com/apache/arrow-datafusion/pull/922/files#diff-03876812a8bef4074e517600fdcf8e6b49f1ea24df44905d6d806836fd61b2a8L360

But this might be hard to accomplish given the current design. I posted some ideas in my earlier comment on this PR.

@novemberkilo
Author

Unfortunately this does happen sometimes while doing performance optimizations :( I am sorry if I misled you here.

Hi @alamb - I am familiar with these sorts of outcomes when working on performance optimisation problems ... experimentation that yields negative results is also valuable. Please, no need for any apology :)

@novemberkilo
Author

novemberkilo commented Sep 13, 2021

@Dandandan and @alamb thanks much for your responses. I'm not clear on how to proceed with this PR, though. My read is that we probably close this PR and chalk up the implementation I presented here as unsuccessful/unnecessary. WDYT?

The approach that you are suggesting, @Dandandan, appears to be non-trivial and should probably be broken out or written up on the original issue? I can try to do some of the benchmarking you have suggested, but tbh I don't understand the actual problem well enough at the moment to work on this in a focused and productive manner. I will revisit the original issue and ask questions, but if either of you can point to any other documentation that may be relevant, that would be great. I will also keep an eye on #864 👍

@novemberkilo
Author

Concluding that this approach is not yielding the performance optimisation that we were reaching for. Per #922 (comment) we should probably look for a method similar to the one being adopted in #864.

Closing ...

@alamb
Contributor

alamb commented Sep 17, 2021

👍 thanks again for giving it a try
