Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: arrow_cast function as UDF #9298

Closed

Conversation

brayanjuls
Copy link
Contributor

Which issue does this PR close?

Closes #9143
Closes #9287

Rationale for this change

Migration of core functions to UDF. This is the remediation for the issues after merging in the original PR #9235.

What changes are included in this PR?

arrow_cast function migration.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added sql SQL Planner core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Feb 21, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @brayanjuls -- this is a great contrbution and a great start 🙏

I have a few comments, but I think my biggest concern is that with this PR a bunch of the existing special case handling of cast will not be applicable to arrow_cast

Thus I think we should hold this PR until we have completed #9289 and thus allow arrow_cast to be rewritten to cast.

If no one else has had a chance to work on #9289 by the weekend I will do so

"| 2020-09-04 |",
"+------------+",
"+-----------------------------------+",
"| arrow_cast(t.values,Utf8(\"Utf8\")) |",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a direct consequence of the fact that the original arrow-cast function was "rewritten" (well really planned) to a CAST which gets various special treatment throughout the codebase

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think after @jayzhan211 has completed #9304 we will be able to avoid these changes by implementing ArrowCastFunc::simplify

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @brayanjuls -- I apologize for the delay in reviewing. this looks almost perfect. The only issue is that I think this PR will cause a performance regression in certain cases because DataFusion won't be able to "see" the cast and it has special case handling for Expr::Cast in several places

The good news is that once after @jayzhan211 has completed #9304 we will be able to fix this with a simple implementation of simplify

"| 2020-09-04 |",
"+------------+",
"+-----------------------------------+",
"| arrow_cast(t.values,Utf8(\"Utf8\")) |",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think after @jayzhan211 has completed #9304 we will be able to avoid these changes by implementing ArrowCastFunc::simplify

@brayanjuls
Copy link
Contributor Author

The good news is that once after @jayzhan211 has completed #9304 we will be able to fix this with a simple implementation of simplify

@alamb No worries, once is ready I will do the required changes.

@alamb
Copy link
Contributor

alamb commented Mar 5, 2024

Hi @brayanjuls -- sorry for the delay but we have now finally merged ScalarUDF::simplify #9304

Here is an example of how to implement a rewrite of a function to use a cast: https://github.com/apache/arrow-datafusion/blob/2873fd083d2af39f85315eea837070ef28cd5be0/datafusion/core/tests/user_defined/user_defined_scalar_functions.rs#L550-L576

Sorry again for the delay

@brayanjuls
Copy link
Contributor Author

Here is an example of how to implement a rewrite of a function to use a cast:

https://github.com/apache/arrow-datafusion/blob/2873fd083d2af39f85315eea837070ef28cd5be0/datafusion/core/tests/user_defined/user_defined_scalar_functions.rs#L550-L576

@alamb I have been I bit busy but I will try to upload the changes for simplify tomorrow. Thanks for the example.

@alamb
Copy link
Contributor

alamb commented Mar 8, 2024

@alamb I have been I bit busy but I will try to upload the changes for simplify tomorrow. Thanks for the example.

No worries -- thank you! I have also been traveling this week and so am still digging out of my own backlog. Thank you again for your help

@alamb
Copy link
Contributor

alamb commented Mar 14, 2024

I had some time and wanted to write some code today, so I merged this PR up from main and updated it to use the new simplify API -- since I can't push to this branch / repo directly I created a new PR: #9610

@alamb
Copy link
Contributor

alamb commented Mar 16, 2024

Marking this as a draft as I have made the necessary chanages in #9610

@alamb alamb marked this pull request as draft March 16, 2024 12:22
@brayanjuls
Copy link
Contributor Author

I had some time and wanted to write some code today, so I merged this PR up from main and updated it to use the new simplify API -- since I can't push to this branch / repo directly I created a new PR: #9610

@alamb Thanks for taking the time to implement it, I got stuck and I did not have much time.

@brayanjuls
Copy link
Contributor Author

Marking this as a draft as I have made the necessary chanages in #9610

@alamb Do you want me to close this PR?

@alamb
Copy link
Contributor

alamb commented Mar 16, 2024

@alamb Thanks for taking the time to implement it, I got stuck and I did not have much time.

No worries @brayanjuls -- I am sorry the appropriate APIs weren't ready when you worked on this PR

@alamb Do you want me to close this PR?

Feel free if you want to. Otherwise when #9610 is merged it will close this one too

@alamb alamb closed this in #9610 Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move arrow_cast to datafusion-functions crate Issue using arrow_cast in ORDER BY expressions
2 participants