-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Utf8View for lpad scalar function #11941
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very impressive @Omega359 👏 I especially like the testing improvements of this PR
I mentioned some places this function could be made faster, but that can totally be done as a follow on (maybe waiting for someone for whom the performance of lpad
is important). I can file a follow on ticket
s.push_str(string); | ||
Ok(Some(s)) | ||
} | ||
let mut s: String = " ".repeat(length - graphemes.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR just follows the existing pattern so I don't think any changes are needed, but I think using a GenericStringBuilder
would be much faster here (as it wouldn't allocate a string for each output array element)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a test locally with using GenericStringBuilder
vs what is in this PR and the differences are within +/- 10% for 1024 and 2048 element arrays for each of the 3 string types. Worth noting though is that timing between runs on my laptop is not really stable at all (+/- 10% or so between runs of the exact same code).
Interestingly the utf8view seems to be consistently slightly slower than just plain utf8 at least in this function.
lpad function/utf8 type/2048
time: [715.78 µs 718.43 µs 721.89 µs]
lpad function/largeutf8 type/2048
time: [745.44 µs 759.62 µs 779.84 µs]
lpad function/stringview type/2048
time: [734.25 µs 751.13 µs 773.67 µs]
let length = if length < 0 { 0 } else { length as usize }; | ||
if length == 0 { | ||
Ok(Some("".to_string())) | ||
if length < graphemes.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise here I think GenericStringBuilder would likely be faster
I was looking at @Lordworms implementation in #11942 and I think it would make sense to align the two implementations. In some aspects I like his approach much more than the one I took in this PR. |
Do you think we should merge this PR and work on improvements in a follow on PR? |
I have no objections to that. |
Thanks @Omega359 -- let's merge this to get the initial implementation and tests in and then we can work on improvements as a follow on PR. Let me know if you would like me to file a ticket for
|
I've got the work done locally in a branch along with the GenericStringBuilder work and the benchmark to go with it. I'll create a ticket tomorrow for it thanks @alamb |
Which issue does this PR close?
Closes #11857
Rationale for this change
Support utf8view for lpad udf.
What changes are included in this PR?
Code, tests.
Are these changes tested?
Yes.
Are there any user-facing changes?
No