-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specialize single column primitive group values #7043
Specialize single column primitive group values #7043
Conversation
} | ||
|
||
/// A [`GroupValues`] making use of [`Rows`] | ||
struct GroupValuesRows { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved into a new module
let hash = key.hash(state); | ||
let insert = self.map.find_or_find_insert_slot( | ||
hash, | ||
|g| unsafe { self.values.get_unchecked(*g).is_eq(key) }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is awesome 🚀
Ok(v) => *v.as_ref(), | ||
Err(slot) => { | ||
let g = self.values.len(); | ||
self.map.insert_in_slot(hash, slot, g); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should still track the allocated memory (like insert_accounted
did?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is accounted in the size method
I just pushed a fix for the CI failure. I am now running benchmarks on this branch to confirm. I expect this change may make a substantial difference on some of the ClickBench results as well, but I don't have them automated quite yet |
My benchmark results are similar. I plan to review this PR carefully tomorrow morning FYI @yahoNanJing
|
}; | ||
} | ||
|
||
// TODO: More primitives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @tustvold -- this PR is awesome ❤️ I will rerun the clickbench numbers with this PR once it is merged.
I left a bunch of "improve the comment" type suggestions which I think would add value but are optional and can be done as a follow on PR (I will do them if you choose not to).
I do think we should add hash collision testing prior to merge, which I left a comment about
/// Stores the group index based on the hash of its value | ||
map: RawTable<usize>, | ||
/// The group index of the null value if any | ||
null_group: Option<usize>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
EmitTo::First(n) => { | ||
// SAFETY: self.map outlives iterator and is not modified concurrently | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code is largely replicated from the row version. I wonder if it could be refactored into a (templated) common function (with appropriate documentation)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't an easy way to make this generic, as one stores tuples and one isn't... I at least can't see a way that doesn't just obfuscate the code
use hashbrown::raw::RawTable; | ||
use std::sync::Arc; | ||
|
||
/// A trait to allow hashing of floating point numbers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this doesn't use create_hash
es? And perhaps add comments in the code about the rationale?
If it is important not to use create_hashes
I recommend
- Move this code to
hash_utils.rs
so it is easier to find - Implement the
force_hash_collisions
version (that always hashes things to 0) to ensure appropriate coverage
Here is an example of force_hash_collisions
https://github.com/apache/arrow-datafusion/blob/368f6e606a3cfca8e04638b8d5ff0ff116a20b57/datafusion/physical-expr/src/hash_utils.rs#L214-L224
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use create_hashes as we are generating the hashes from the native values, not an array
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
use crate::physical_plan::aggregates::group_values::GroupValues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code is just moved, right?
|
||
hash_float!(f16, f32, f64); | ||
|
||
/// A [`GroupValues`] storing raw primitive values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// A [`GroupValues`] storing raw primitive values | |
/// A [`GroupValues`] storing a single column of raw primitive values | |
/// | |
/// This specialization is significantly faster than using the more general | |
/// purpose `Row`s format |
FYI @yahoNanJing -- I think this PR will make it even easier to evaluate the improvements that a fixed width arrow row format would provide (We can make a specialized GroupsValue under the correct circumstances) |
I saw some strange results when running with this branch on my clickbench testing. Please standby |
TLDR is I think this PR is good to go from a performance perspective. I don't see massive gains but I do see small improvements |
Which issue does this PR close?
Part of #6969
Rationale for this change
The slower tests appear to just be noise, at least they don't seem to reproduce consistently
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?