Coerce BinaryView/Utf8View to LargeBinary/LargeUtf8 on output. #12271
Thanks @wiedld -- this looks about perfect. I think it needs one more test, but otherwise it looks great
I had some small code improvement suggestions too but I don't think they are necessary
…coverage, and expand with PlanD scenario
Thank you @wiedld -- looks great to me
let if_not_coerced = "Projection: a\n Sort: a ASC NULLS FIRST\n Projection: a\n EmptyRelation";
do_not_coerce_on_output(plan.clone(), if_not_coerced)?;
// Plan B: coerce requested: Utf8View => LargeUtf8 only on outermost
let if_coerced = "Projection: CAST(a AS LargeUtf8)\n Sort: a ASC NULLS FIRST\n Projection: a\n EmptyRelation";
❤️
Which issue does this PR close?
Closes #12119
Rationale for this change
We plan to change DataFusion to use BinaryView and Utf8View by default. However, the rest of the Arrow ecosystem may not be ready for these data types, so we want to provide an option to cast the query output to a non-view type.
Because either Utf8 or LargeUtf8 can be represented as a Utf8View, the cast on output converts to the larger type, LargeUtf8. The same applies to BinaryView, which is cast to LargeBinary.
What changes are included in this PR?
During type coercion, after the plan has been coerced, the final output type is inspected to determine whether an additional cast is needed.
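As a rough illustration of the rule described above, the following self-contained sketch models the output-side decision. It is not the code from this PR: the `DataType` enum is a simplified stand-in for Arrow's type, and `expanded_output_type` and `output_projection` are hypothetical helper names introduced only for this example. The boolean parameter mirrors the `expand_views_at_output` option, and the generated strings follow the `CAST(a AS LargeUtf8)` form seen in the test plans.

```rust
/// Simplified stand-in for the Arrow data types discussed here
/// (an assumption for illustration, not the real arrow `DataType`).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Binary,
    LargeBinary,
    BinaryView,
    Int64,
}

/// Hypothetical helper: returns the non-view type a view column would be
/// cast to on output, or `None` if the column is not a view type.
fn expanded_output_type(dt: &DataType) -> Option<DataType> {
    match dt {
        // A Utf8View may hold data that came from Utf8 or LargeUtf8,
        // so the larger type is the safe target.
        DataType::Utf8View => Some(DataType::LargeUtf8),
        DataType::BinaryView => Some(DataType::LargeBinary),
        _ => None,
    }
}

/// Hypothetical helper: renders the expressions of the outermost projection.
/// Casts are added only when `expand_views_at_output` is true and only for
/// view-typed columns; everything else passes through unchanged.
fn output_projection(
    fields: &[(&str, DataType)],
    expand_views_at_output: bool,
) -> Vec<String> {
    fields
        .iter()
        .map(|(name, dt)| match expanded_output_type(dt) {
            Some(target) if expand_views_at_output => {
                format!("CAST({name} AS {target:?})")
            }
            _ => name.to_string(),
        })
        .collect()
}
```

Calling `output_projection` with the flag set to `false` returns the column names untouched, which corresponds to the "do not coerce" plan; with `true`, only the view-typed columns are wrapped in a cast, and inner plan nodes are not modelled at all because the cast applies only at the outermost level.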
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes.
I made one existing API public (no longer crate-private), and we added the optimizer config option expand_views_at_output.