feat: support `unnest` multiple arrays #10044
@@ -186,7 +186,16 @@ pub enum Expr {

 #[derive(Clone, PartialEq, Eq, Hash, Debug)]
 pub struct Unnest {
-    pub exprs: Vec<Expr>,
+    pub expr: Box<Expr>,
In the expression position, only one argument can be accepted.
That is a pretty sweet list of things this closes. Thank you @jonahgao. I will try to review this PR over the next day or two if no one beats me to it.
query error DataFusion error: This feature is not implemented: Only support single unnest expression for now
select unnest(column1), unnest(column2) from unnest_table;
query ?I
select unnest([]), unnest(NULL::int[]);
I think we should support null as well 🤔
query error DataFusion error: This feature is not implemented: unnest\(\) does not support null yet
select unnest([]), unnest(NULL);
I'm not sure how to support it yet. PostgreSQL, BigQuery, and ClickHouse don't support unnesting untyped NULLs.
postgres=# select unnest(NULL);
ERROR: function unnest(unknown) is not unique
LINE 1: select unnest(NULL);
^
HINT: Could not choose a best candidate function. You might need to add explicit type casts.
Although DuckDB supports it, its behavior seems a bit strange to me.
D select unnest(NULL), unnest([1,2]);
┌──────────────┬───────────────────────────────┐
│ unnest(NULL) │ unnest(main.list_value(1, 2)) │
│ int32 │ int32 │
├──────────────────────────────────────────────┤
│ 0 rows │
└──────────────────────────────────────────────┘
D select unnest(NULL::int[]), unnest([1,2]);
┌─────────────────────────────────┬───────────────────────────────┐
│ unnest(CAST(NULL AS INTEGER[])) │ unnest(main.list_value(1, 2)) │
│ int32 │ int32 │
├─────────────────────────────────┼───────────────────────────────┤
│ │ 1 │
│ │ 2 │
└─────────────────────────────────┴───────────────────────────────┘
Unnesting an untyped NULL and a typed NULL produced different results.
I think there may be two appropriate approaches:
1. Use type coercion to convert an untyped NULL into a certain List type, like `cast(NULL, NULL::int[])`.
2. Explicitly reject unnesting untyped NULLs.
I prefer 2. Since PostgreSQL and ClickHouse return errors in this case, we follow the majority.
> I prefer 2. Since PostgreSQL and ClickHouse return errors in this case, we follow the majority.

Makes sense to me!
I double checked and this PR does error as expected. I'll add a small test for this case here and merge this PR.
❯ select unnest(null) from t;
This feature is not implemented: unnest() does not support null yet
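The outcome of this thread (reject an untyped NULL, accept a typed NULL list) can be sketched roughly as below. This is only an illustration: the `ArgType` enum and the `check_unnest_arg` function are hypothetical stand-ins, not DataFusion's actual types or planner code, and only the first error message is taken from the output above.

```rust
/// Hypothetical, simplified stand-in for the data type of an `unnest` argument.
#[derive(Debug, Clone, PartialEq)]
enum ArgType {
    /// An untyped NULL literal, as in `unnest(NULL)`.
    Null,
    /// A list with a known element type, as in `NULL::int[]` or `[1, 2]`.
    List(String),
    /// Any other, non-list type.
    Other(String),
}

/// Returns the element type produced by unnesting `arg`, or an error message.
fn check_unnest_arg(arg: &ArgType) -> Result<String, String> {
    match arg {
        // Option 2 from the discussion: untyped NULLs are rejected explicitly.
        ArgType::Null => Err(
            "This feature is not implemented: unnest() does not support null yet".to_string(),
        ),
        // A typed list (even a NULL one) is fine: the element type is known.
        ArgType::List(elem) => Ok(elem.clone()),
        // Illustrative message only; not an actual DataFusion error string.
        ArgType::Other(t) => Err(format!("unnest() expects a list, got {t}")),
    }
}

fn main() {
    println!("{:?}", check_unnest_arg(&ArgType::Null));
    println!("{:?}", check_unnest_arg(&ArgType::List("Int64".to_string())));
}
```

The point of the sketch is that the element type of the output column has to come from somewhere; an untyped NULL carries none, which is why rejecting it is the simplest consistent choice.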
unnest_field.name(),
field.data_type().clone(),
// Unnesting may produce NULLs even if the list is not null.
// For example: unnest([1], []) -> 1, null
The tests do not cover this case, and sadly I don't think there is currently a way to create a list with nullable set to false 🤔 Maybe we can write a simple Rust test for this?
I think so too. `true` should be equivalent to the previous `unnest_field.is_nullable()`.
I planned to write some tests related to DataFrame, which could potentially implement it. But the changes in this PR are larger than I expected, so those tests have been delayed.
> I planned to write some tests related to DataFrame, which could potentially implement it. But the changes in this PR are larger than I expected, so those tests have been delayed.

We can do it in another PR!
> We can do it in another PR!

👌
it looks pretty nice!
        return Ok(input);
    }
};
let mut unnested_fields: HashMap<usize, _> = HashMap::with_capacity(columns.len());
💯
🚀
Thank you @alamb for the new test and the review. Thank you @jayzhan211 for the review.
Which issue does this PR close?
Closes #212.
Closes #1608.
Closes #6555.
Closes #6796.
Closes #7087.
Rationale for this change
From the PostgreSQL documentation:
An `unnest` function with N parameters is equivalent to N single-parameter `unnest` functions; that is, `select * from unnest(arg1, arg2, arg3)` is equivalent to `select unnest(arg1), unnest(arg2), unnest(arg3)`.
This PR will unify them all into the latter form, to be handled by `UnnestExec`.
In order to support `UnnestExec` with multiple unnest columns:
- `RecordBatch`, some list columns need to be unnested. We will expand the values in each list into multiple rows, taking the longest length among these lists; shorter lists are padded with NULLs.
What changes are included in this PR?
Expand `UnnestExec` to support unnesting multiple arrays.
Also, enable the following two types of queries:
- `unnest` functions in the select list.
- `from` clause, `unnest` can support multiple arguments.
Are these changes tested?
Yes
Are there any user-facing changes?
Yes.
Several unnest-related structs have changed.
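The padding rule from the rationale (expand each list into rows, take the longest length, pad shorter lists with NULLs) can be sketched for a single input row as follows. This is a minimal, standalone illustration and not the `UnnestExec` implementation: `unnest_row` is a hypothetical helper, `None` stands in for NULL, and treating a NULL list as having length zero is an assumption of this sketch.

```rust
/// Unnest several list values from one input row side by side.
/// Each element of `lists` is one list column's value for that row;
/// `None` models a NULL list (assumed here to behave like an empty list).
fn unnest_row(lists: &[Option<Vec<i64>>]) -> Vec<Vec<Option<i64>>> {
    // The number of output rows is the length of the longest list.
    let longest = lists
        .iter()
        .map(|l| l.as_ref().map_or(0, |v| v.len()))
        .max()
        .unwrap_or(0);

    // Output row `i` takes element `i` of every list, or NULL if the list
    // is shorter than `i + 1` (or is NULL itself).
    (0..longest)
        .map(|i| {
            lists
                .iter()
                .map(|l| l.as_ref().and_then(|v| v.get(i).copied()))
                .collect()
        })
        .collect()
}

fn main() {
    // Roughly: select unnest([1, 2, 3]), unnest([10]);
    let rows = unnest_row(&[Some(vec![1, 2, 3]), Some(vec![10])]);
    for row in &rows {
        println!("{:?}", row);
    }
}
```

Because a shorter list yields NULLs in the padded positions, the output column has to be nullable even when the input list contains no NULL elements, which is the point made in the review comment about `unnest([1], [])`.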