-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EPIC] Decouple logical from physical types #12622
Comments
I would like to organise a call so we can discuss the plan of action of this epic (as proposed #12536 (review) and in the ASF slack). In the meantime I'll try to collect as many open questions as possible and outline a meeting document. cc @alamb , @findepi , @jayzhan211 , @ozankabak and anyone else interested in this effort. |
I suggest to propose the plan directly, since it requires thinking to response, maybe not that efficiently to work in a call synchronously. And it would be more open to random people in the community. |
We should make it a goal that physical planning also abstracts over physical representation of individual batches. We should also make it a goal that function are expressible directly on logical types. Otherwise we won't be able to use them in logical planning. While function invocation can continue to work on physical representation, it does not necessarily have to be function implementor's burden to cater for them. See #12635 |
While I support both ideas (one of them was also mentioned in the original proposal) I think that there is still some discussion to be had before making them goals of this effort (#11513 (comment), #11513 (comment), and #11513 (comment)). I suggest we first define a plan (I'm drafting it now) for decoupling logical types from physical types (including the issue with functions that you are mentioning) and in parallel we can continue to validate this two ideas. |
Thanks @notfilippo . I understand it's not your goal to remove Arrow types from physical plans. |
What follows is my idea of the plan we might want to take to tackle this issue. Status and goalsCurrent status
#11513 initial goal
"Record batches as the physical type source" goal
How do we get there?
Introducing logical type limitations internally while keeping inputs and outputs the same should be helpful in order not to introduce too many breaking changes at once. At a high level, we could keep the type sources unchanged and just gradually limit the internal behaviour of the engine in order to reason about logical types instead of physical. Towards a Logical LogicalPlanThe
|
This seems like a good first step to me (and also a massive project in itself)
I think it is wise to split this out into its own second goal (as it can scope down the first part). It is probably good to note that this means we won't have "adaptive schema" in DataFusion at runtime until the second part is complete
I think UDFs will need to retain physical information (at least in All in all this sounds like an epic project. I hope we have the people who are excited to help make it happen! |
I've opened #12793 in order to continue the effort according to the plan. cc @jayzhan211 @findepi |
I can work on the logical type that replace current function signature on |
@jayzhan211 -- I was planning on introducing the logical types in the following PR, as I already started working on it. Do you mind waiting for it and then using it to replace the function signature? |
Sure |
@notfilippo this (#12622 (comment)) is a very nice graphical representation of what we want to do. thank you
@notfilippo awesome! |
Can a committer merge #13016 on |
Is your feature request related to a problem or challenge?
This epic tracks an ordered list tasks related to the proposal: Decouple logical from physical types (#11513). The goal is:
Describe the solution you'd like
Make
ScalarValue
values logical:Scalar
type for ColumnarValue #12536 (merged inlogical-types
)logical-types
)ScalarValue::LargeUtf8
andScalarValue::Utf8View
in favour ofScalarValue::Utf8
ScalarValue::LargeBinary
andScalarValue::BinaryView
in favour ofScalarValue::Binary
ScalarValue::Dictionary
(from Remove ScalarValue::Dictionary #12488)The text was updated successfully, but these errors were encountered: