-
Notifications
You must be signed in to change notification settings - Fork 835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Null Mask to Prefix and Suffix Iters #6312
Conversation
arrow-string/src/predicate.rs
Outdated
@@ -134,7 +134,9 @@ impl<'a> Predicate<'a> { | |||
string_view_array | |||
.prefix_bytes_iter(v.len()) | |||
.map(|haystack| { | |||
equals_bytes(haystack, v.as_bytes(), equals_kernel) != negate | |||
haystack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if you use unwrap_or_default instead of map_or
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be worse than map_or(false, ...) as an additional vtable call is introduced.
https://doc.rust-lang.org/1.80.1/src/core/option.rs.html#1003-1013
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the vtable, the types are all known statically here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I made a mistake. The above statement is wrong.
pub fn prefix_bytes_iter(&self, prefix_len: usize) -> impl Iterator<Item = Option<&[u8]>> { | ||
self.views().into_iter().enumerate().map(move |(i, v)| { | ||
if self.is_null(i) { | ||
None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to performance if you don't check the null mask, but still return Option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_null
is cached null values which might faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting simply not checking the null mask at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't get... The latest commit has removed the is_null
check...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment still points to the old code... I have squash'ed everything in one commit and please see the changes.
New benchmark like_utf8view scalar complex
----------------------------
nullable_08_27 1.00 171.2±2.01ms ? ?/sec
master_08_27 1.00 171.9±3.44ms ? ?/sec
nullable_08_27_2 1.02 174.8±6.85ms ? ?/sec
like_utf8view scalar contains
-----------------------------
nullable_08_27_2 1.00 124.7±1.50ms ? ?/sec
master_08_27 1.02 127.1±1.40ms ? ?/sec
nullable_08_27 1.04 129.7±2.94ms ? ?/sec
like_utf8view scalar ends with 13 bytes
---------------------------------------
master_08_27 1.00 30.9±0.95ms ? ?/sec
nullable_08_27 1.09 33.8±0.72ms ? ?/sec
nullable_08_27_2 1.19 36.8±0.79ms ? ?/sec
like_utf8view scalar ends with 4 bytes
--------------------------------------
master_08_27 1.00 31.9±0.55ms ? ?/sec
nullable_08_27 1.10 35.0±0.51ms ? ?/sec
nullable_08_27_2 1.19 38.0±3.34ms ? ?/sec
like_utf8view scalar ends with 6 bytes
--------------------------------------
master_08_27 1.00 31.5±0.45ms ? ?/sec
nullable_08_27 1.10 34.7±0.42ms ? ?/sec
nullable_08_27_2 1.18 37.1±0.90ms ? ?/sec
like_utf8view scalar equals
---------------------------
nullable_08_27 1.00 26.9±0.44ms ? ?/sec
master_08_27 1.00 26.9±1.08ms ? ?/sec
nullable_08_27_2 1.02 27.4±1.58ms ? ?/sec
like_utf8view scalar starts with 13 bytes
-----------------------------------------
master_08_27 1.00 30.5±0.49ms ? ?/sec
nullable_08_27 1.14 34.6±0.42ms ? ?/sec
nullable_08_27_2 1.19 36.2±0.61ms ? ?/sec
like_utf8view scalar starts with 4 bytes
----------------------------------------
nullable_08_27 1.00 17.5±0.51ms ? ?/sec
master_08_27 1.25 21.9±0.31ms ? ?/sec
nullable_08_27_2 1.29 22.6±0.30ms ? ?/sec
like_utf8view scalar starts with 6 bytes
----------------------------------------
master_08_27 1.00 30.9±0.46ms ? ?/sec
nullable_08_27 1.15 35.7±0.61ms ? ?/sec
nullable_08_27_2 1.23 38.2±2.02ms ? ?/sec |
Update comment Update comment
a255da7
to
5e55c42
Compare
Which issue does this PR close?
This is a follow-up (possible?) of #6231 and incur discussions https://github.com/apache/arrow-rs/pull/6306/files#r1731703196.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?