-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement native support StringView for overlay #11968
Conversation
Signed-off-by: Chojan Shang <[email protected]>
Signed-off-by: Chojan Shang <[email protected]>
Signed-off-by: Chojan Shang <[email protected]>
@@ -726,7 +726,7 @@ EXPLAIN SELECT | |||
FROM test; | |||
---- | |||
logical_plan | |||
01)Projection: overlay(CAST(test.column1_utf8view AS Utf8), Utf8("foo"), Int64(2)) AS c1 | |||
01)Projection: overlay(test.column1_utf8view, Utf8View("foo"), Int64(2)) AS c1 | |||
02)--TableScan: test projection=[column1_utf8view] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, can we please also add actually running these queries to the tests
like
query
SELECT OVERLAY(column1_utf8view PLACING 'foo' FROM 2 ) as c1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! 907e27e
Signed-off-by: Chojan Shang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @PsiACE
DataType::LargeUtf8 => make_scalar_function(overlay::<i64>, vec![])(args), | ||
other => exec_err!("Unsupported data type {other:?} for function overlay"), | ||
} | ||
} | ||
} | ||
|
||
macro_rules! process_overlay { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have some sort of trait in arrow-rs that allowed us to write this as a generic function
It actually does have
https://github.com/apache/arrow-rs/blob/2461a16c19ee5032531b1c05dd7e7192bc842e0f/arrow-string/src/like.rs#L158-L161
But that is not public
@XiangpengHao do you know of anything that is pub?
We could also implement such a trait for DataFusion's convenience, and then propose upstreaming it 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just behind -- it seems that @Omega359 did exactly this in StringArrayType
#11941
datafusion/datafusion/functions/src/unicode/lpad.rs
Lines 251 to 263 in 94034be
trait StringArrayType<'a>: ArrayAccessor<Item = &'a str> + Sized { | |
fn iter(&self) -> ArrayIter<Self>; | |
} | |
impl<'a, T: OffsetSizeTrait> StringArrayType<'a> for &'a GenericStringArray<T> { | |
fn iter(&self) -> ArrayIter<Self> { | |
GenericStringArray::<T>::iter(self) | |
} | |
} | |
impl<'a> StringArrayType<'a> for &'a StringViewArray { | |
fn iter(&self) -> ArrayIter<Self> { | |
StringViewArray::iter(self) | |
} | |
} |
Maybe we can start to pull that trait into its own module and start reusing it across the string functions 🤔
Also, there is the ArrayAccessor
pattern used elegantly by @devanbenz in #11967 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current style is inspired by rpad in #11942 . I'll be rewriting it with ArrayAccessor
, which I've used in other PRs, and it's a much more elegant way.
Thanks @PsiACE |
* Implement native support StringView for overlay Signed-off-by: Chojan Shang <[email protected]> * Re-write impl of overlay Signed-off-by: Chojan Shang <[email protected]> * Minor update Signed-off-by: Chojan Shang <[email protected]> * Add more tests Signed-off-by: Chojan Shang <[email protected]> --------- Signed-off-by: Chojan Shang <[email protected]>
Which issue does this PR close?
Closes #11909 .
Rationale for this change
Ref: #11942
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?