Support Utf8View and BinaryView in substrait serialization. #12199
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,6 +52,7 @@ pub const DATE_32_TYPE_VARIATION_REF: u32 = 0; | |
pub const DATE_64_TYPE_VARIATION_REF: u32 = 1; | ||
pub const DEFAULT_CONTAINER_TYPE_VARIATION_REF: u32 = 0; | ||
pub const LARGE_CONTAINER_TYPE_VARIATION_REF: u32 = 1; | ||
pub const VIEW_CONTAINER_TYPE_VARIATION_REF: u32 = 2; | ||
FWIW, hardcoding the numbers isn't really the proper way to do type variations. (Rather, we should add the variation as an extension and refer to the extension's id.) However, given that this is already used for default vs. large, I guess adding view makes sense, and they can all be migrated at once to the proper way someday.
Thank you. I'll craft a follow-up ticket later.
I filed #12355 to track |
||
pub const DECIMAL_128_TYPE_VARIATION_REF: u32 = 0; | ||
pub const DECIMAL_256_TYPE_VARIATION_REF: u32 = 1; | ||
|
||
|
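The hardcoded-variation approach discussed in the review above can be sketched as a pair of mapping functions. This is a minimal illustration, not the code in this PR: `StringKind`, `to_variation`, and `from_variation` are hypothetical names, and the enum is only a stand-in for the corresponding Arrow `DataType` variants. Only the three constants come from the diff.

```rust
/// Hypothetical stand-in for the string-like Arrow data types
/// (the real code works with arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StringKind {
    Utf8,
    LargeUtf8,
    Utf8View,
}

// Constants as they appear in the diff above.
pub const DEFAULT_CONTAINER_TYPE_VARIATION_REF: u32 = 0;
pub const LARGE_CONTAINER_TYPE_VARIATION_REF: u32 = 1;
pub const VIEW_CONTAINER_TYPE_VARIATION_REF: u32 = 2;

/// Producer side (assumed shape): pick the hardcoded variation
/// reference that distinguishes the physical layout.
pub fn to_variation(kind: StringKind) -> u32 {
    match kind {
        StringKind::Utf8 => DEFAULT_CONTAINER_TYPE_VARIATION_REF,
        StringKind::LargeUtf8 => LARGE_CONTAINER_TYPE_VARIATION_REF,
        StringKind::Utf8View => VIEW_CONTAINER_TYPE_VARIATION_REF,
    }
}

/// Consumer side (assumed shape): recover the layout from the
/// variation reference, rejecting values that are not known.
pub fn from_variation(variation: u32) -> Option<StringKind> {
    match variation {
        DEFAULT_CONTAINER_TYPE_VARIATION_REF => Some(StringKind::Utf8),
        LARGE_CONTAINER_TYPE_VARIATION_REF => Some(StringKind::LargeUtf8),
        VIEW_CONTAINER_TYPE_VARIATION_REF => Some(StringKind::Utf8View),
        _ => None,
    }
}
```

Because both sides share the same constants, a round trip such as `from_variation(to_variation(StringKind::Utf8View))` yields the original kind. The reviewer's concern is that these numbers are a private convention rather than ids resolved through a declared extension.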
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -716,12 +716,29 @@ async fn all_type_literal() -> Result<()> { | |
date32_col = arrow_cast('2020-01-01', 'Date32') AND | ||
binary_col = arrow_cast('binary', 'Binary') AND | ||
large_binary_col = arrow_cast('large_binary', 'LargeBinary') AND | ||
view_binary_col = arrow_cast(arrow_cast('binary_view', 'Binary'), 'BinaryView') AND | ||
See test
I removed this workaround in a5bfedd |
||
utf8_col = arrow_cast('utf8', 'Utf8') AND | ||
large_utf8_col = arrow_cast('large_utf8', 'LargeUtf8');", | ||
large_utf8_col = arrow_cast('large_utf8', 'LargeUtf8') AND | ||
view_utf8_col = arrow_cast('utf8_view', 'Utf8View');", | ||
) | ||
.await | ||
} | ||
|
||
/// Arrow-cast does not currently handle direct casting from utf8 to binaryView. | ||
#[tokio::test] | ||
async fn binaryview_type_literal_needs_casting_fix() -> Result<()> { | ||
let err = roundtrip_all_types( | ||
"select * from data where | ||
view_binary_col = arrow_cast('binary_view', 'BinaryView');", | ||
) | ||
.await; | ||
|
||
assert!( | ||
matches!(err, Err(e) if e.to_string().contains("Unsupported CAST from Utf8 to BinaryView")) | ||
); | ||
Ok(()) | ||
Looks like we have a few missing Note that DataFusion's type coercion has previously been updated to prefer coercion to the view types. It's the explicit casting that has coverage gaps.
I see sqllogictests that demonstrate what is supported by arrow_cast. My follow-ups will then be: (a) add sqllogictests showing what is, and is not, supported for the new view types, and then (b) make the upstream arrow-rs changes (with some correctness guidance during code review).
Sqllogictests added: #12200. It turns out the arrow-cast changes have already been made, but they are not in the current release used in DataFusion.
We have now updated to the latest arrow-rs, so we'll have the correct code: #12032
Sweet. I've removed the workarounds and deleted this (no longer applicable) test. Thank you. |
||
} | ||
|
||
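The assertion style used in the test above, `matches!` with a guard on the error's message, can be sketched in isolation. This is a toy under stated assumptions: `CastError`, `try_cast`, and `fails_with` are hypothetical, and the rule that only Utf8 to BinaryView fails merely mirrors the error string in the test, not the actual cast support in arrow-rs.

```rust
use std::fmt;

/// Hypothetical error type standing in for the real error returned
/// by the roundtrip helper.
#[derive(Debug)]
pub struct CastError(String);

impl fmt::Display for CastError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for CastError {}

/// Toy cast check (assumption): rejects only Utf8 -> BinaryView,
/// echoing the message asserted in the test above.
pub fn try_cast(from: &str, to: &str) -> Result<(), CastError> {
    if from == "Utf8" && to == "BinaryView" {
        return Err(CastError(format!("Unsupported CAST from {from} to {to}")));
    }
    Ok(())
}

/// True only when `result` is an `Err` whose message contains `needle`.
/// The guard after `if` runs only if the `Err(e)` pattern matched.
pub fn fails_with(result: &Result<(), CastError>, needle: &str) -> bool {
    matches!(result, Err(e) if e.to_string().contains(needle))
}
```

Checking a substring of the message keeps the test independent of the concrete error variant, at the cost of breaking if the wording changes upstream.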
#[tokio::test] | ||
async fn roundtrip_literal_list() -> Result<()> { | ||
roundtrip("SELECT [[1,2,3], [], NULL, [NULL]] FROM data").await | ||
|
@@ -1231,9 +1248,11 @@ async fn create_all_type_context() -> Result<SessionContext> { | |
Field::new("date64_col", DataType::Date64, true), | ||
Field::new("binary_col", DataType::Binary, true), | ||
Field::new("large_binary_col", DataType::LargeBinary, true), | ||
Field::new("view_binary_col", DataType::BinaryView, true), | ||
Field::new("fixed_size_binary_col", DataType::FixedSizeBinary(42), true), | ||
Field::new("utf8_col", DataType::Utf8, true), | ||
Field::new("large_utf8_col", DataType::LargeUtf8, true), | ||
Field::new("view_utf8_col", DataType::Utf8View, true), | ||
Field::new_list("list_col", Field::new("item", DataType::Int64, true), true), | ||
Field::new_list( | ||
"large_list_col", | ||
|
👍