feat: Implement CAST between struct types #1074

Merged
merged 11 commits into from
Nov 11, 2024

Conversation

andygrove
Member

Which issue does this PR close?

Closes #815

Rationale for this change

We need support for casting between struct types so that structs can be read from Parquet using DataFusion's ParquetExec.

What changes are included in this PR?

How are these changes tested?

@andygrove andygrove changed the title [WIP] feat: Implement CAST between struct types feat: [WIP] Implement CAST between struct types Nov 11, 2024
@andygrove andygrove changed the title feat: [WIP] Implement CAST between struct types feat: Implement CAST between struct types Nov 11, 2024
@andygrove andygrove marked this pull request as ready for review November 11, 2024 17:03
@andygrove
Member Author

@parthchandra @mbutrovich could you review?

We need more extensive tests for sure, but it will be easier to add those as part of the comet-parquet-exec feature branch.

) -> DataFusionResult<ArrayRef> {
    match (from_type, to_type) {
        (DataType::Struct(from_fields), DataType::Struct(to_fields)) => {
            assert!(to_fields.len() <= from_fields.len());
Member

Hmm, why do we have this assert? In Spark, the Cast expression requires the number of from_fields to equal the number of to_fields, so we shouldn't encounter unequal lengths in an analyzed query plan.

Member Author

Thanks, I have removed that. I was confused by an error about an unsupported cast that was dropping a struct field, but this came from DataFusion and I understand why now.
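
To illustrate the rule discussed in this thread, here is a minimal sketch of a struct-to-struct cast that rejects mismatched field counts and casts each child by position. It is not the code from this PR: `DataType`, `Value`, and `cast_value` are hypothetical, simplified stand-ins rather than the arrow-rs or Comet API, and positional field matching is an assumption based on the Spark behavior described above.

```rust
// Simplified stand-ins for Arrow data types and values. These are
// hypothetical types for illustration, not the arrow-rs or Comet API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
    // Each struct field is a (name, type) pair.
    Struct(Vec<(String, DataType)>),
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int32(i32),
    Int64(i64),
    Utf8(String),
    // Child values, in field order.
    Struct(Vec<Value>),
}

/// Cast a single value to `to_type`, recursing into struct fields.
fn cast_value(value: &Value, to_type: &DataType) -> Result<Value, String> {
    match (value, to_type) {
        (Value::Int32(v), DataType::Int32) => Ok(Value::Int32(*v)),
        (Value::Int32(v), DataType::Int64) => Ok(Value::Int64(i64::from(*v))),
        (Value::Int32(v), DataType::Utf8) => Ok(Value::Utf8(v.to_string())),
        (Value::Int64(v), DataType::Int64) => Ok(Value::Int64(*v)),
        (Value::Int64(v), DataType::Utf8) => Ok(Value::Utf8(v.to_string())),
        (Value::Utf8(s), DataType::Utf8) => Ok(Value::Utf8(s.clone())),
        (Value::Struct(children), DataType::Struct(to_fields)) => {
            // Return an error instead of asserting: an analyzed plan should
            // never reach this branch with unequal field counts.
            if children.len() != to_fields.len() {
                return Err(format!(
                    "cannot cast struct with {} fields to struct with {} fields",
                    children.len(),
                    to_fields.len()
                ));
            }
            // Assumption: fields are matched by position, and the target
            // type supplies the field names.
            let cast_children = children
                .iter()
                .zip(to_fields.iter())
                .map(|(child, (_name, field_type))| cast_value(child, field_type))
                .collect::<Result<Vec<_>, _>>()?;
            Ok(Value::Struct(cast_children))
        }
        (v, t) => Err(format!("unsupported cast from {:?} to {:?}", v, t)),
    }
}
```

In this sketch, casting a two-field struct to a one-field struct returns an error rather than silently dropping a field, which is similar in spirit to the unsupported-cast error mentioned above.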

@viirya
Member

viirya commented Nov 11, 2024

> We need more extensive tests for sure, but it will be easier to add those as part of the comet-parquet-exec feature branch.

Spark should already have many Cast expression tests. Since we pass the Spark tests, the general cases should be covered.

Member

@viirya viirya left a comment

Looks good to me. Just one question about the assert added.

@andygrove andygrove merged commit 712658e into apache:main Nov 11, 2024
91 of 148 checks passed
@andygrove andygrove deleted the cast-struct-struct branch November 11, 2024 21:25
Development

Successfully merging this pull request may close these issues.

Implement cast between struct types
2 participants