-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce Scalar
type for ColumnarValue
#12536
Merged
alamb
merged 5 commits into
apache:logical-types
from
notfilippo:fr/scalar-columnar-value
Oct 1, 2024
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
9b2b60c
Introduce `Scalar` type for ColumnarValue
notfilippo 24dd660
Add constructor constraints for `Scalar`
notfilippo 9b19149
Add rustdoc for `Scalar`
notfilippo c049e85
Add TODO note on `ColumnarValue::cast_to`
notfilippo c98efba
Add more `Scalar` rustdoc
notfilippo File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when i read @jayzhan211 's #12488 (comment) o thought that ScalarValue is going to be left in ColumnarValue (least overhead, as relating to @alamb 's #12488 (review)).
But here is't being replaced.
What's the next step here? what's the end state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next step is to slowly remove variants of
ScalarValue
while still accounting for them viadata_type
(which in my opinion is the least intrusive change to support this effort). After we are satisfied with the variants that remain we reconsider the logic in expressions and operators in order to support what @jayzhan211 in proposing:by sourcing the
data_type
directly from the execution schema.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do we need this for?
(maybe that's obvious for someone with clarity on the preceding questions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is hard to tell the role given we are in the intermediate state. I would said all of
ColumnarValue
andScalarValue
andScalar
are physical types now, given they have eitherArrayRef
orDataType
The end state in my mind is that,
ColumnarValue
storesArrayRef
for multiple rows andScalarValue
for single rows case.ScalarValue
has either native type likei64
orarrow::Scalar
.We will have
LogicalType
which could be resolved fromDataType
given the mapping we determined, which should be customizable as well for user defined type. Similar to the roleScalar
in this PR, which record the relationship betweenScalarValue
toDataType
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our first step is to minimize the variant of types, so we don't need
ScalarValue::Utf8
,ScalarValue::LargeUtf8
any more butScalar
which hasScalarValue::String
+DataType::Utf8
orScalarValue::String
+DataType::LargeUtf8
. We determine the actualphysical type
based on arrow's DataType.ScalarValue::String
is then something close tological type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is nothing difference for literal case, we still get
Literal(ScalarValue)
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ScalarValue
along withDataType
is a transition state, I guess we could get the DataType from schema so we don't even requireDataType
inScalarValue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rough idea is as the following:
As long as we have
Schema
we can get theDataType
, and we can get the LogicalType from it if each DataType has at most one LogicalType.ScalarValue
is just a wrapper for the scalar case ofArrayRef
or rust native type. We can carryString
forUtf8
/Utf8View
/LargeUtf8
and decode it if needed, and also carryScalar<ArrayRef>
forList
/LargeList
and so on.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not assume this to be the case.
I suppose #12644 may convey why, but even if we don't do Extension Types, this assumption will be very limiting.
This comment was marked as outdated.
Sorry, something went wrong.