Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support Utf8View for get_wider_type + binary_to_string_coercion functions #13370

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jonathanc-n
Copy link
Contributor

@jonathanc-n jonathanc-n commented Nov 12, 2024

Which issue does this PR close?

Closes #13361
Closes #13360.

Rationale for this change

What changes are included in this PR?

Added string view to functions

Are these changes tested?

Are there any user-facing changes?

@jayzhan211
Copy link
Contributor

Same with this, I would expect test that failed if the change is reverted

@@ -1550,6 +1552,62 @@ mod tests {
);
}

#[test]
fn test_get_wider_type_with_utf8view() {
assert_eq!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not this kind of test, but a valid sql query that goes through the coercion function

Copy link
Contributor Author

@jonathanc-n jonathanc-n Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jayzhan211 Correct me if I'm wrong, but this seems to be a problem with arrow casting from numerics to utf8view, the same can be said for this, #13366. However correct me if I'm wrong, I can't seem to find a sql query that triggers it, although i believe there should be one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems only like_coercion utilize this coercion. I think current coercion is unnecessary complex for like operator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think get_wider_type is actually only used by ArrayConcat

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean binary_to_string_coercion

@jonathanc-n jonathanc-n reopened this Nov 15, 2024
Copy link
Contributor

@Omega359 Omega359 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond the one bug I noted this looks fine to me. I do think however there should be a ticket filed to look at the get_wider_type function and see if it can be removed altogether as there seems to be just one use case and this may (I haven't checked but I feel it's likely) duplicate a function in arrow-cast.

@alamb
Copy link
Contributor

alamb commented Nov 25, 2024

I am trying to clean up the review backlog -- what is the status of this PR?

It seems like it is waiting on an end to end (e.g. sql or dataframe test) that actually has a problem / doesn't run without this PR but does run with it

I played around with it and I think I came up with one:

> create or replace table foo as select arrow_cast('1', 'Utf8View') as c;
0 row(s) fetched.
Elapsed 0.002 seconds.

> select c from foo;
+---+
| c |
+---+
| 1 |
+---+
1 row(s) fetched.
Elapsed 0.001 seconds.

> select c = 1 from foo;
Error during planning: Cannot infer common argument type for comparison operation Utf8View = Int64

But it works for strings:

> create or replace table foo as select arrow_cast('1', 'Utf8') as c;
0 row(s) fetched.
Elapsed 0.002 seconds.

> select c = 1 from foo;
+------------------+
| foo.c = Int64(1) |
+------------------+
| true             |
+------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

Looks like we need to test with Utf8 + Utf8View (aka string concat)

Also since the argument type order is important, also in both directions

Does that make sense @jonathanc-n ?

Thank you for your help

@jonathanc-n
Copy link
Contributor Author

@alamb I also believe that this one is also waiting on numeric to utf8view coercion pr in arrow-rs, correct me if I'm wrong though.

@alamb
Copy link
Contributor

alamb commented Jan 17, 2025

We just updated to a version of arrow-rs that I think has the necessary changes:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants