Optimize like/ilike kernels for StringView #5951
Comments
take
I read through the code and have some initial ideas. There are several suboptimal spots. Good to go:
Side effects (these might be removed completely by specialized string view functions):
I would start with starts_with and istarts_with, and I need some input on the latter two. Please let me know your thoughts on generating better implementations :).
Sounds good to me! Here's the current comparison function, which has similar handling for 4/12 bytes: https://github.com/apache/arrow-rs/blob/master/arrow-array/src/array/byte_view_array.rs#L340-L399
It seems to me like this ticket was completed with #6231. @xinlifoobar has some ideas for improving other kernels (like regexp) here: #5951 (comment). I did file #6370 for
Let me know if there are other kernels that we should update.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
@XiangpengHao added support for StringView to the like/ilike kernels in #5931 (see #5931 (review)).
This PR does not leverage the special 4-byte inlined prefix for large string views, which might help optimize certain cases. Specifically, the prefix could be used to quickly determine that the starts_with variant of like doesn't match, without having to consult the actual strings, as @wjones127 discusses in #5931 (review).
I added a benchmark in #5931.
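To make the idea concrete, here is a minimal, dependency-free sketch of a prefix check that looks only at the 16-byte view. It assumes the Arrow view layout (little-endian u128: bytes 0..4 hold the length; strings of at most 12 bytes are stored inline in bytes 4..16; longer strings keep their first 4 bytes in bytes 4..8). The names `PrefixCheck`, `make_view`, and `check_prefix` are hypothetical, made up for illustration, and are not part of arrow-rs.

```rust
/// Result of testing a `starts_with` pattern against the view alone.
#[derive(Debug, PartialEq)]
enum PrefixCheck {
    /// The view proves the string does not start with the pattern.
    NoMatch,
    /// The view proves the string starts with the pattern.
    Match,
    /// The inlined bytes agree, but the data buffer must be consulted.
    NeedsFullCompare,
}

/// Hypothetical helper: build a view for `s`.
/// For long strings the buffer index and offset fields are left as zero.
fn make_view(s: &[u8]) -> u128 {
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
    // Short strings are fully inlined; long strings inline only 4 bytes.
    let n = if s.len() <= 12 { s.len() } else { 4 };
    bytes[4..4 + n].copy_from_slice(&s[..n]);
    u128::from_le_bytes(bytes)
}

/// Test `pattern` as a prefix using only the bytes stored in the view.
fn check_prefix(view: u128, pattern: &[u8]) -> PrefixCheck {
    let bytes = view.to_le_bytes();
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    // A pattern longer than the string can never be its prefix.
    if pattern.len() > len {
        return PrefixCheck::NoMatch;
    }
    // Number of string bytes that live inside the view itself.
    let available = if len <= 12 { len } else { 4 };
    let n = pattern.len().min(available);
    if bytes[4..4 + n] != pattern[..n] {
        return PrefixCheck::NoMatch;
    }
    if pattern.len() <= available {
        PrefixCheck::Match
    } else {
        PrefixCheck::NeedsFullCompare
    }
}
```

Under these assumptions, `check_prefix(make_view(b"a long string value"), b"b long")` rejects the row without touching the data buffer, and only the `NeedsFullCompare` case falls back to reading the full string.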
Describe the solution you'd like
Investigate various ways to make the benchmark faster. Run the benchmark like this:
Describe alternatives you've considered
Maybe it is fast enough as is
Additional context