-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: arrow_cast function as UDF #9298
feat: arrow_cast function as UDF #9298
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @brayanjuls -- this is a great contrbution and a great start 🙏
I have a few comments, but I think my biggest concern is that with this PR a bunch of the existing special case handling of cast
will not be applicable to arrow_cast
Thus I think we should hold this PR until we have completed #9289 and thus allow arrow_cast to be rewritten to cast.
If no one else has had a chance to work on #9289 by the weekend I will do so
"| 2020-09-04 |", | ||
"+------------+", | ||
"+-----------------------------------+", | ||
"| arrow_cast(t.values,Utf8(\"Utf8\")) |", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a direct consequence of the fact that the original arrow-cast function was "rewritten" (well really planned) to a CAST
which gets various special treatment throughout the codebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think after @jayzhan211 has completed #9304 we will be able to avoid these changes by implementing ArrowCastFunc::simplify
Co-authored-by: Andrew Lamb <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @brayanjuls -- I apologize for the delay in reviewing. this looks almost perfect. The only issue is that I think this PR will cause a performance regression in certain cases because DataFusion won't be able to "see" the cast and it has special case handling for Expr::Cast
in several places
The good news is that once after @jayzhan211 has completed #9304 we will be able to fix this with a simple implementation of simplify
"| 2020-09-04 |", | ||
"+------------+", | ||
"+-----------------------------------+", | ||
"| arrow_cast(t.values,Utf8(\"Utf8\")) |", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think after @jayzhan211 has completed #9304 we will be able to avoid these changes by implementing ArrowCastFunc::simplify
@alamb No worries, once is ready I will do the required changes. |
Hi @brayanjuls -- sorry for the delay but we have now finally merged Here is an example of how to implement a rewrite of a function to use a cast: https://github.com/apache/arrow-datafusion/blob/2873fd083d2af39f85315eea837070ef28cd5be0/datafusion/core/tests/user_defined/user_defined_scalar_functions.rs#L550-L576 Sorry again for the delay |
@alamb I have been I bit busy but I will try to upload the changes for simplify tomorrow. Thanks for the example. |
No worries -- thank you! I have also been traveling this week and so am still digging out of my own backlog. Thank you again for your help |
I had some time and wanted to write some code today, so I merged this PR up from main and updated it to use the new simplify API -- since I can't push to this branch / repo directly I created a new PR: #9610 |
Marking this as a draft as I have made the necessary chanages in #9610 |
@alamb Thanks for taking the time to implement it, I got stuck and I did not have much time. |
No worries @brayanjuls -- I am sorry the appropriate APIs weren't ready when you worked on this PR
Feel free if you want to. Otherwise when #9610 is merged it will close this one too |
Which issue does this PR close?
Closes #9143
Closes #9287
Rationale for this change
Migration of core functions to UDF. This is the remediation for the issues after merging in the original PR #9235.
What changes are included in this PR?
arrow_cast function migration.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.