WIP Optimize hash_aggregate when there are no null group keys #922
Thanks @novemberkilo. This looks like a good start.
How do I test eq_array_no_nulls? Should I look to follow the test for eq_array?
Yes I think the tests for eq_array would be good.
I'm not sure that the duplication is that bad in the first place. If we were to improve it, is this best done via an additional macro
The goal of the optimization is to ensure that checking if the array has any nulls is done once per array, and not once per row (e.g. once per function call to eq_array).
If you used a macro I think you could have the compiler do the duplication for you. For example, you could define a macro like
macro_rules! eq_array_general {
    ($self:expr, $array:expr, $index:expr, $has_nulls:expr) => {{
        if $has_nulls { /* check for nulls */ }
        ...
    }};
}
And then you can call it like
pub fn eq_array(&self, array: &ArrayRef, index: usize) -> bool {
    eq_array_general!(self, array, index, true)
}
pub fn eq_array_no_nulls(&self, array: &ArrayRef, index: usize) -> bool {
    eq_array_general!(self, array, index, false)
}
And then we would be counting on the Rust compiler to optimize out the if false { .. } for the null check.
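For illustration, here is a minimal self-contained sketch of the macro idea above. It is not the DataFusion implementation: Int64Column is a hypothetical stand-in for ArrayRef, and a plain Option<i64> stands in for the scalar value. The point it tries to show is that passing a literal true or false into the macro gives each expansion a constant condition.

```rust
// Hypothetical stand-in for an Arrow array: None marks a null slot.
struct Int64Column {
    values: Vec<Option<i64>>,
}

impl Int64Column {
    fn is_valid(&self, index: usize) -> bool {
        self.values[index].is_some()
    }

    // Returns the stored value, or 0 for a null slot.
    fn value(&self, index: usize) -> i64 {
        self.values[index].unwrap_or_default()
    }
}

// $has_nulls is expected to be a literal `true` or `false`, so each
// expansion contains a constant condition that the compiler can fold.
macro_rules! eq_array_general {
    ($scalar:expr, $array:expr, $index:expr, $has_nulls:expr) => {{
        if $has_nulls && !$array.is_valid($index) {
            // A null slot only matches a null scalar.
            $scalar.is_none()
        } else {
            $scalar == Some($array.value($index))
        }
    }};
}

// General path: checks validity on every call.
fn eq_array(scalar: Option<i64>, array: &Int64Column, index: usize) -> bool {
    eq_array_general!(scalar, array, index, true)
}

// Fast path: the caller must guarantee the column contains no nulls.
fn eq_array_no_nulls(scalar: Option<i64>, array: &Int64Column, index: usize) -> bool {
    eq_array_general!(scalar, array, index, false)
}
```

Whether the removed branch is measurable in practice still depends on how hot this comparison is, so the sketch says nothing about the actual speedup.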
datafusion/src/scalar.rs
#[inline]
pub fn eq_array_no_nulls(&self, array: &ArrayRef, index:usize) -> bool {
    if let DataType::Dictionary(key_type, _) = array.data_type() {
        return self.eq_array_dictionary(array, index, key_type);
I think this should perhaps also be self.eq_array_dictionary_no_nulls and have a corresponding dictionary special case.
@alamb @Dandandan I've added a commit that tries to incorporate your suggestions and feedback. Did I understand them correctly ... can you please comment again on whether I am headed in the right direction? I've not added tests to cover the additional code paths yet - I am aware that I need to do that. Also, do you have any pointers in the direction of how best to measure the effects of the attempted optimisation? Thanks.
Hi @novemberkilo -- Thanks! I will try and review this tomorrow. In terms of measuring performance, the "Performance Summary" in #808 (comment) -- might give you some good ideas of what to try / ways to show this is faster
(sorry for delays in reviewing, I am on vacation this week and am about to run out of time for the day)
Ah please don't stress about delays @alamb -- no rush on my behalf ... especially if you are on vacation.
    .zip(null_information.iter())
    .all(|((array, scalar), has_nulls)| {
        scalar.eq_array(array, row, has_nulls)
    })
I think broadly speaking this is the pattern, though I am not sure how well the compiler will be able to optimize given that has_nulls is a runtime parameter -- it might need to be hoisted out of this loop (the call to all here).
Have you had a chance to try profiling?
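One way to picture the hoisting idea is the sketch below. It is only an illustration under assumptions: Column, eq_general, eq_no_nulls and matching_rows are hypothetical names, not DataFusion or Arrow APIs. The null check runs once per batch, and the per-row loop then calls a comparison that has no null branch at all.

```rust
// Hypothetical stand-in for an Arrow array with an optional validity mask.
struct Column {
    values: Vec<i64>,
    // None means the column has no null slots.
    validity: Option<Vec<bool>>,
}

impl Column {
    // Computed once per column, not once per row.
    fn has_nulls(&self) -> bool {
        self.validity
            .as_ref()
            .map_or(false, |v| v.iter().any(|&ok| !ok))
    }

    // General comparison: consults the validity mask for every row.
    fn eq_general(&self, row: usize, key: Option<i64>) -> bool {
        let valid = self.validity.as_ref().map_or(true, |v| v[row]);
        match key {
            Some(k) => valid && self.values[row] == k,
            None => !valid,
        }
    }

    // Fast comparison: only correct when the column has no nulls.
    fn eq_no_nulls(&self, row: usize, key: Option<i64>) -> bool {
        key == Some(self.values[row])
    }
}

// Returns the rows whose values equal `keys` across all columns.
fn matching_rows(columns: &[Column], keys: &[Option<i64>], num_rows: usize) -> Vec<usize> {
    // The branch on nulls is taken here, outside the per-row loop.
    let any_nulls = columns.iter().any(|c| c.has_nulls());
    if any_nulls {
        (0..num_rows)
            .filter(|&row| columns.iter().zip(keys).all(|(c, k)| c.eq_general(row, *k)))
            .collect()
    } else {
        (0..num_rows)
            .filter(|&row| columns.iter().zip(keys).all(|(c, k)| c.eq_no_nulls(row, *k)))
            .collect()
    }
}
```

The design choice here is to duplicate the loop rather than pass a flag into it, which is the part that may be hard to fit into the existing hash_aggregate code.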
Hi, I don't understand the suggestion about hoisting has_nulls out of the all. Would you be able to please sketch it out in pseudo code or similar?
I haven't used macros much and will go educate myself on how they work (particularly the optimisation role of the compiler). Will also get to profiling (although maybe not until the weekend).
Hi @alamb - I've used Could you please help me understand your suggestion:
I'm not sure what this means. Are you perhaps suggesting that I use an index lookup of
One reason you may not have seen any difference is that the aggregate benchmark has nulls in the grouping keys. Perhaps adjusting the following numbers and ensuring your code path is hit would be a good next step https://github.com/alamb/datafusion_aggregate_bench/blob/main/src/main.rs#L23-L24
Thanks much @alamb I followed your suggestion and set both of those constants to This is confusing to me ... perhaps the
Thanks for the update @novemberkilo 👍 This is definitely possible. I don't think we have measured the specific contribution that checking for nulls added to the overall runtime. If it turns out we can't see a difference in performance, perhaps the additional code complexity isn't worth it. Unfortunately this does happen sometimes while doing performance optimizations :( I am sorry if I misled you here.
@novemberkilo thanks for checking this and opening this PR. Next to the remarks by @alamb, I think there is also some more principal change that would be needed to get the branch outside of the loop. A similar thing that could be tried here as I started (but didn't complete yet) in this PR for the hash join Some other reasons might be that this currently is not a really hot part of the code. This could be checked by running some profiling tools (e.g. I think some other sources of overhead might live in that currently one of the two values is a
macro_rules! eq_array_general {
    ($array:expr, $index:expr, $ARRAYTYPE:ident, $VALUE:expr, $has_nulls:expr) => {{
        let array = $array.as_any().downcast_ref::<$ARRAYTYPE>().unwrap();
        if *$has_nulls {
This branch should ideally be outside of the loop at
https://github.com/apache/arrow-datafusion/pull/922/files#diff-03876812a8bef4074e517600fdcf8e6b49f1ea24df44905d6d806836fd61b2a8L360
But this might be hard to accomplish given the current design. I posted some ideas in my earlier comment on this PR.
Hi @alamb - I am familiar with these sorts of outcomes when working on performance optimisation problems ... experimentation that results in a lack of positive results is also valuable. Please no need for any apology :)
@Dandandan and @alamb thanks much for your responses. I'm not clear on how to proceed with this PR though. My read is that we probably close this PR and chalk up the implementation I presented here as being unsuccessful/unnecessary. WDYT? The approach that you are suggesting @Dandandan appears to be non-trivial and should probably be broken out or written up on the original issue? I can try and do some of the benchmarking you have suggested but tbh I don't understand the actual problem well enough at the moment to work on this in a focused and productive manner. I will revisit the original issue and ask questions but if either of you can point to any other documentation that may be relevant, that would be great. I will also keep an eye on #864 👍
Concluding that this approach is not yielding the performance optimisation that we were reaching for. Per #922 (comment) we should probably look for a method that is similar to that being adopted in #864 Closing ...
👍 thanks again for giving it a try
ref #850
WIP - for purposes of discussion and feedback.
@alamb I have done the simplest possible thing to get the branching code path that avoids the is_valid check. However, I'd appreciate some input on how best to proceed:
- I'm not sure that the duplication is that bad in the first place. If we were to improve it, is this best done via an additional macro or by changing eq_array to take an additional parameter that signals whether the array has no nulls?
- How do I test eq_array_no_nulls? Should I look to follow the test for eq_array?
Thanks much in advance for the guidance and input.