-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move nested union optimization from plan builder to logical optimizer #7695
Conversation
e8e6a32
to
79e20c6
Compare
4c42dc4
to
770bf79
Compare
a3a2484
to
b926721
Compare
b926721
to
7259e9f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @maruschin -- this is looking quite close. The basic idea and testing looks very solid. I think several of the CI failures on this PR are due to #7701, so merging / rebasing with master will help.
I also double checked this branch gets the same plan as main
❯ explain select * from (select 1 UNION ALL (select 2 UNION ALL select 3));
+---------------+----------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------+
| logical_plan | Union |
| | Projection: Int64(1) AS Int64(1) |
| | EmptyRelation |
| | Projection: Int64(2) AS Int64(1) |
| | EmptyRelation |
| | Projection: Int64(3) AS Int64(1) |
| | EmptyRelation |
| physical_plan | UnionExec |
| | ProjectionExec: expr=[1 as Int64(1)] |
| | EmptyExec: produce_one_row=true |
| | ProjectionExec: expr=[2 as Int64(1)] |
| | EmptyExec: produce_one_row=true |
| | ProjectionExec: expr=[3 as Int64(1)] |
| | EmptyExec: produce_one_row=true |
| | |
+---------------+----------------------------------------+
2 rows in set. Query took 0.008 seconds.
I found a few sqllogictests seem to fail, specifically
cargo test --test sqllogictests -- timestamp
Running "timestamps.slt"
External error: query result mismatch:
[SQL] SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00+00:45') as ts_irregular_offset
UNION ALL
SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00+00:30') as ts_irregular_offset
UNION ALL
SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00+00:15') as ts_irregular_offset
UNION ALL
SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00-00:15') as ts_irregular_offset
UNION ALL
SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00-00:30') as ts_irregular_offset
UNION ALL
SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00-00:45') as ts_irregular_offset
[Diff] (-expected|+actual)
- 1999-12-31T23:00:00Z
- 1999-12-31T23:00:00Z
- 1999-12-31T23:00:00Z
- 2000-01-01T00:00:00Z
- 2000-01-01T00:00:00Z
- 2000-01-01T00:00:00Z
+ 1999-12-31T23:00:00
+ 1999-12-31T23:00:00
+ 1999-12-31T23:00:00
+ 2000-01-01T00:00:00
+ 2000-01-01T00:00:00
+ 2000-01-01T00:00:00
at test_files/timestamps.slt:1471
Which looks to me like the timestamp logic got lost somehow 🤔
c4a11b7
to
cdad0be
Compare
dfd4dab
to
4e116ea
Compare
d8024ab
to
e874a08
Compare
6258d57
to
e59e54c
Compare
e59e54c
to
65f09fa
Compare
65f09fa
to
bd6c4d4
Compare
plan: &LogicalPlan, | ||
_config: &dyn OptimizerConfig, | ||
) -> Result<Option<LogicalPlan>> { | ||
// TODO: Add optimization for nested distinct unions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at one point (maybe as a TODO) we probably should also remove unions that have only a single child since they are basically a no-op/pass-through. Not sure if this should be an separate optimizer rule or if this should be done in this rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the EliminateOneUnion
rule, also added in this PR, handles the one union case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @maruschin -- this is a really nice piece of work. It is well tested and the code is very clean.
Thank you @crepererum and @jackwener for the reviews
plan: &LogicalPlan, | ||
_config: &dyn OptimizerConfig, | ||
) -> Result<Option<LogicalPlan>> { | ||
// TODO: Add optimization for nested distinct unions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the EliminateOneUnion
rule, also added in this PR, handles the one union case
I took the liberty of merging this branch up from main and fixing clippy as I had it all checked out already I also wrote some end to end tests in #7787 |
Thanks again @maruschin |
…apache#7695) * Add naive implementation of eliminate_nested_union * Remove union optimization from LogicalPlanBuilder::union * Fix propagate_union_children_different_schema test * Add implementation of eliminate_one_union * Simplified eliminate_nested_union test * Fix * clippy --------- Co-authored-by: Evgeny Maruschenko <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #7481.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?