-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement utf8_view for replace #12004
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking nice @thinh2 🙏 -- I'll start CI for this PR
other => { | ||
exec_err!("Unsupported data type {other:?} for function replace") | ||
} | ||
} | ||
} | ||
} | ||
|
||
fn replace_view(args: &[ArrayRef]) -> Result<ArrayRef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
01)Projection: replace(__common_expr_1, Utf8("foo"), Utf8("bar")) AS c1, replace(__common_expr_1, CAST(test.column2_utf8view AS Utf8), Utf8("bar")) AS c2 | ||
02)--Projection: CAST(test.column1_utf8view AS Utf8) AS __common_expr_1, test.column2_utf8view | ||
03)----TableScan: test projection=[column1_utf8view, column2_utf8view] | ||
01)Projection: replace(test.column1_utf8view, Utf8("foo"), Utf8("bar")) AS c1, replace(test.column1_utf8view, test.column2_utf8view, Utf8("bar")) AS c2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a test that shows this query running (aka run
SELECT
REPLACE(column1_utf8view, 'foo', 'bar') as c1,
REPLACE(column1_utf8view, column2_utf8view, 'bar') as c2
FROM test;
in addition to EXPLAIN
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @alamb, after updating the replace.rs to process Utf8View, my sql logic test still uses the CAST in its logical plan and this test is failing. Do you have any idea to debug this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @alamb, I already found the issue. The reason is that I didn't update the function's signature to Utf8View.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me @thinh2 -- thank you 🙏
I pushed another commit to your branch that adds a slt test to very execution, and merged up from main
Thank you very much for your contribution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @thinh2
vec![ | ||
Exact(vec![Utf8View, Utf8View, Utf8View]), | ||
Exact(vec![Utf8, Utf8, Utf8]), | ||
Exact(vec![LargeUtf8, LargeUtf8, LargeUtf8]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for also implementing support for LargeUTF8
Looks like there are some clippy errors and test updates needed to get CI clean on this PR |
Hi @alamb, currently my code fails at this sql logic test https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/functions.slt#L825
The reason is because of replace's signature order. If I push Utf8 before Utf8View, it will pass. However, test in string_view.slt will fail. I am trying to understand the functions test above to fix it. |
Hi @alamb, I already fixed all the test errors. Could you please check and run the CI pipeline again? |
Thank you @thinh2 |
🚀 |
Which issue does this PR close?
Closes #11913 .
Rationale for this change
Before this change, the
replace
function only supportutf8
andlargeUtf8
. We used to castutf8View
toutf8
, which reduces the performance of the function. This PR is to add support forutf8View
insidereplace
function.What changes are included in this PR?
replace
forutf8View
replace
functionsqllogictest
to match ourutf8View
expectation (no CAST in the logical plans)replace
to handleutf8View
. Without this change, the sql query will still castutf8View
toutf8
Are these changes tested?
replace
functionsqllogictest
to match ourutf8View
expectation (no CAST in the logical plans)Are there any user-facing changes?
No