-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support unnest as subexpression #9592
support unnest as subexpression #9592
Conversation
|
||
## Unnest multiple columns | ||
query II | ||
select unnest(column1), unnest(column2) from unnest_table; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of this query should be incorrect. In DuckDB:
D CREATE TABLE unnest_table
AS VALUES
([1,2,3], [7], 1),
([4,5], [8,9,10], 2),
([6], [11,12], 3),
([12], [null, 42, null], null),
-- null array to verify the `preserve_nulls` option
(null, null, 4)
;
D select unnest(col0), unnest(col1) from unnest_table;
┌──────────────┬──────────────┐
│ unnest(col0) │ unnest(col1) │
│ int32 │ int32 │
├──────────────┼──────────────┤
│ 1 │ 7 │
│ 2 │ │
│ 3 │ │
│ 4 │ 8 │
│ 5 │ 9 │
│ │ 10 │
│ 6 │ 11 │
│ │ 12 │
│ 12 │ │
│ │ 42 │
│ │ │
├──────────────┴──────────────┤
│ 11 rows 2 columns │
└─────────────────────────────┘
For multiple unnest expressions, we might need to extend the functionality of UnnestExec
to support it. I think we could implement it later, maybe in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonahgao Agreed, I will remove this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonahgao Removed
@@ -135,11 +135,6 @@ select array_remove(column1, 4), unnest(column2), column3 * 10 from unnest_table | |||
query error DataFusion error: Error during planning: unnest\(\) can only be applied to array, struct and null | |||
select unnest(column3) from unnest_table; | |||
|
|||
## Unnest multiple columns | |||
query error DataFusion error: This feature is not implemented: Only support single unnest expression for now | |||
select unnest(column1), unnest(column2) from unnest_table; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems multiple columns are not supported yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean multiple columns is actually multiple unnest
expression, it seems misleading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jayzhan211 I renamed it to "multiple unnest functions"
data: transformed_expr, | ||
transformed, | ||
tnr: _, | ||
} = expr.transform_up_mut(&mut |expr: Expr| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the logic of unnest is split out of others two?
is doing if else directly without transformed more readable?
if unnest
else if column
else {
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jayzhan211 The transform is essential for this feature. We need to rewrite the expression in bottom-up recursion such that we can extract the child and parent expression of the unnest function. The reason to split out the Column variant is that: we need to retain the relation
field
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I got it. First time seeing transform_up_mut
, not familiar with it.
Ok(arg_data_types[0].clone()) | ||
let arg_data_type = arg_data_types[0].clone(); | ||
// Unnest's output type is the inner type of the list | ||
match arg_data_type{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @YjyJeff
datafusion/sql/src/select.rs
Outdated
data: transformed_expr, | ||
transformed, | ||
tnr: _, | ||
} = expr.transform_up_mut(&mut |expr: Expr| { | ||
let column_name = expr.display_name()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be moved inside the if statement, because it is not used in the else statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
My only slight concern is that there are two projections around the unnest plan, and there are always an extra copy of projection expressions. However, this seems hard to avoid unless UnnestExec
can unnest an expression, rather than just a column. This could perhaps be a future work.
Thank you for your nice contribution @YjyJeff
Which issue does this PR close?
Closees 9591 and 9589
Rationale for this change
Users could use unnest function with other expressions instead of writing a new wrapper query
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No