-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: add rule to merge GroupBy and Window operators #117499
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice speedup! A couple of drive-by comments from me.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)
-- commits
line 20 at r2:
Does the ordering within the window frame not matter here? I think so but thought I'd ask.
pkg/sql/opt/memo/logical_props_builder.go
line 2927 at r2 (raw file):
continue } // Aggregations and first, last, and nth value functions always produce the
nit: nth_value
should be removed from the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Still wrapping my head around this.
Reviewed 1 of 2 files at r1, 2 of 2 files at r2, 1 of 7 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Does the ordering within the window frame not matter here? I think so but thought I'd ask.
Maybe for array_agg
or string_agg
? But even if there is no ORDER BY
in the window, I would expect every row within the partition to process the partition rows in the same order, so should get the same result.
pkg/sql/opt/norm/window_funcs.go
line 223 at r3 (raw file):
if windows[i].Frame.FrameExclusion != treewindow.NoExclusion || windows[i].Frame.StartBoundType != treewindow.UnboundedPreceding || windows[i].Frame.EndBoundType != treewindow.UnboundedFollowing {
It seems like we can lift the unbounded window requirement if the partition cols are a strict key? Because then each partition is only a single row, so there's only one single-row window frame? (Do I have that right?)
If that's true, instead of checking whether the windows are unbounded, would it make sense to check that all of the window functions results are functional dependencies of the partition columns?
pkg/sql/opt/norm/rules/groupby.opt
line 593 at r3 (raw file):
# Here's an example with slightly altered SQL syntax: # # SELECT max(b), const_agg(foo), const_agg(bar)
micro-nit: there's some whitespace or control character at the end of this line and a few of the other lines in this SQL statement
pkg/sql/opt/norm/rules/groupby.opt
line 598 at r3 (raw file):
# SELECT *, count(c) OVER w AS foo, array_agg(d) OVER w AS bar # FROM abcd # WINDOW w AS (PARTITION BY a ORDER BY d)
Seems like this example doesn't satisfy (4)? I guess if a
is a strict key of abcd
it still works?
pkg/sql/opt/norm/rules/groupby.opt
line 602 at r3 (raw file):
# GROUP BY a; # => # SELECT max(b), count(c), array_agg(d) FROM abcd GROUP BY a ORDER BY d;
Shouldn't the ORDER BY d
become part of the array_agg
(and any other window functions that are lifted to the GROUP BY) rather than becoming the overall ordering of the GROUP BY? I.e. shouldn't this be something like:
SELECT max(b), count(c ORDER BY d), array_agg(d ORDER BY d) FROM abcd GROUP BY a;
And then assuming something like this works, seems like we could lift the requirement that the GROUP BY be unordered?
pkg/sql/opt/norm/rules/groupby.opt
line 627 at r3 (raw file):
(WindowPartition $windowPrivate) ) & (ColsAreSubset $groupingCols $inputCols)
Why is this necessary if we've already checked that the grouping cols are equal to the partition cols?
pkg/sql/opt/memo/testdata/logprops/window
line 329 at r2 (raw file):
FROM kv WINDOW w AS ( PARTITION BY w ORDER BY v, s, d, f
This same case without ORDER BY
might be interesting (should make the window frame unbounded, I think?).
335ecd5
to
61c1cb6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @yuzefovich)
Previously, michae2 (Michael Erickson) wrote…
Maybe for
array_agg
orstring_agg
? But even if there is noORDER BY
in the window, I would expect every row within the partition to process the partition rows in the same order, so should get the same result.
That's a good point. If we hypothetically aggregated in a different order for rows within a frame, then this wouldn't work. But, I think we can rely on execution enforcing this invariant. I added a note to the getWindowPartitionDeps
function mentioning the assumption.
pkg/sql/opt/memo/logical_props_builder.go
line 2927 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
nth_value
should be removed from the comment?
Done.
pkg/sql/opt/norm/window_funcs.go
line 223 at r3 (raw file):
Previously, michae2 (Michael Erickson) wrote…
It seems like we can lift the unbounded window requirement if the partition cols are a strict key? Because then each partition is only a single row, so there's only one single-row window frame? (Do I have that right?)
If that's true, instead of checking whether the windows are unbounded, would it make sense to check that all of the window functions results are functional dependencies of the partition columns?
Hm, good point. I'll make the change. I'll add the strict-key case to the logical props builder so that it's reflected in the window's FDs.
I also noticed that we were missing some logic in the logical props builder to project the new columns, so I added that as well to the second commit.
pkg/sql/opt/norm/rules/groupby.opt
line 593 at r3 (raw file):
Previously, michae2 (Michael Erickson) wrote…
micro-nit: there's some whitespace or control character at the end of this line and a few of the other lines in this SQL statement
Done.
pkg/sql/opt/norm/rules/groupby.opt
line 598 at r3 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Seems like this example doesn't satisfy (4)? I guess if
a
is a strict key ofabcd
it still works?
Oops, just forgot to add the window frame. Done.
pkg/sql/opt/norm/rules/groupby.opt
line 602 at r3 (raw file):
You're right that this syntax is incorrect, I'll fix it.
And then assuming something like this works, seems like we could lift the requirement that the GROUP BY be unordered?
The syntax is still implemented as an ordering on the GroupBy
operator as an internal ordering, not on its individual aggregate functions. So I think the restriction stands, at least for now.
pkg/sql/opt/norm/rules/groupby.opt
line 627 at r3 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Why is this necessary if we've already checked that the grouping cols are equal to the partition cols?
Great point, removed it.
pkg/sql/opt/memo/testdata/logprops/window
line 329 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
This same case without
ORDER BY
might be interesting (should make the window frame unbounded, I think?).
Good point. I added the test case, and opened an issue to track normalizing the window frame #117716.
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
61c1cb6
to
f4ef5eb
Compare
f4ef5eb
to
6fa9093
Compare
@mgartner pinging myself to review this |
054433d
to
4958a18
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 9 of 9 files at r4, 1 of 18 files at r7, 18 of 18 files at r10, 10 of 10 files at r11, 6 of 8 files at r12, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)
-- commits
line 29 at r11:
nit: "functin"
pkg/sql/opt/norm/testdata/rules/groupby
line 4760 at r12 (raw file):
I may be confused, but this sounds contradictory to:
// Condition 2: the aggregate function must be a AnyNotNullAgg, ConstAgg, // ConstNotNullAgg, or FirstAggOp that references a window function.
pkg/sql/opt/memo/logical_props_builder.go
line 1411 at r11 (raw file):
if !determinedCols.Empty() { // The partition columns determine some of the window function outputs. rel.FuncDeps.AddStrictDependency(window.Partition, determinedCols)
Just to be sure: this is a strict key even if some of the columns in determinedCols
are nullable, correct?
pkg/sql/opt/memo/logical_props_builder.go
line 2996 at r11 (raw file):
// NOTE: getWindowPartitionDeps assumes that execution performs aggregation in // the same order for every row in the window, even when there is no explicit // ORDER BY.
Is this really the case? I assume this is necessary to efficiently evaluate window functions? Is it possible this invariant changes in the future?
pkg/sql/opt/memo/testdata/logprops/window
line 461 at r11 (raw file):
lead(v, 2) OVER w FROM kv WINDOW w AS (PARTITION BY k)
nit: do you want to add an ORDER BY
here or something else that makes the frame "unbounded"?
pkg/sql/opt/norm/groupby_funcs.go
line 435 at r12 (raw file):
// MergeAggsAndWindow returns an AggregationsExpr that is equivalent to the // combination of the given (outer) aggregations and (inner) window functions. // ConstAgg outer aggregations that reference a window function are replaced with
nit: it's more than just ConstAgg aggregations, it's also AnyNotNull, ConstNotNullAgg, and FirstAgg, correct?
pkg/sql/opt/norm/testdata/rules/groupby
line 4796 at r12 (raw file):
│ └── count:8 └── const-agg [as=array_agg:9, outer=(9)] └── array_agg:9
I think there's a few more no-op test cases you could add:
- When the window functions are not deteremined by the grouping columns
- When there is a non-agg window function
- When there is a group-by aggregate that is not a const, anynotnull, constnotnull, or first.
- When there is a group-by aggregate that is one of the 4 above, but it references a column that is not produced by a window function.
This commit adds a new method, `AddStrictDependency`, to `FuncDepSet`. This will be used in the following commit to add a dependency between a Window operator's partition columns and its functions. Existing methods don't work for this because the dependency is not a key. Epic: None Release note: None
This commit adds logic to infer strict functional dependencies from a Window operator's partition column(s) to some or all of its window functions when the following conditions are satisfied: 1. The window function must be an aggregate, or first_value or last_value. 2. The window frame must be unbounded. The above conditions ensure that the window function always produces the same result given the same window frame, as well as that every row in a partition has the same window frame. This means that the window function produces the same output for every row in the partition, and therefore, the partition columns functionally determine the output of the window function. This patch also fixes a small omission made in the window function FD calculation, which caused the window's FDs to preserve an input key without extending the key to apply to all the window's output cols. Epic: None Release note: None
This commit adds a new norm rule, `FoldGroupByAndWindow`, which can merge a Window operator with a parent GroupBy operator when the grouping columns are the same as the partition columns. See the rule comment for the complete list of conditions. In addition to removing a potentially expensive Window operator, this transformation makes way for other rules to match. Fixes cockroachdb#113292 Release note: None
4958a18
to
147054d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @yuzefovich)
Previously, mgartner (Marcus Gartner) wrote…
nit: "functin"
Done.
pkg/sql/opt/memo/logical_props_builder.go
line 1411 at r11 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Just to be sure: this is a strict key even if some of the columns in
determinedCols
are nullable, correct?
Yes, a strict key just requires that any equivalency group for the determining columns is paired with a single set of values for the determined cols. NULLs are allowed in either set, as long as that condition is satisfied. This is analogous to the logic for GroupBy
that adds a strict key on the grouping columns.
pkg/sql/opt/memo/logical_props_builder.go
line 2996 at r11 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is this really the case? I assume this is necessary to efficiently evaluate window functions? Is it possible this invariant changes in the future?
It's true right now, and I would be very surprised if we ever changed it. I'll note the invariant in both execution engines.
pkg/sql/opt/norm/groupby_funcs.go
line 435 at r12 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: it's more than just ConstAgg aggregations, it's also AnyNotNull, ConstNotNullAgg, and FirstAgg, correct?
I changed it to ConstAgg-like
.
pkg/sql/opt/memo/testdata/logprops/window
line 461 at r11 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: do you want to add an
ORDER BY
here or something else that makes the frame "unbounded"?
Done. Note that the ordering will have no effect, since the window function is operating on single-row partitions.
pkg/sql/opt/norm/testdata/rules/groupby
line 4760 at r12 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I may be confused, but this sounds contradictory to:
// Condition 2: the aggregate function must be a AnyNotNullAgg, ConstAgg, // ConstNotNullAgg, or FirstAggOp that references a window function.
I clarified the comment to say (non ConstAgg-like) Sum GroupBy aggregate
.
pkg/sql/opt/norm/testdata/rules/groupby
line 4796 at r12 (raw file):
When the window functions are not deteremined by the grouping columns
We can't actually test this without hitting the check that the grouping and partition columns are equal, but the test on line 4645 does this.
When there is a non-agg window function
Done. I added one with row_number.
When there is a group-by aggregate that is not a const, anynotnull, constnotnull, or first.
The last case does this with the sum
.
When there is a group-by aggregate that is one of the 4 above, but it references a column that is not produced by a window function.
Done. Note that this case is actually allowed.
TFTR! bors r+ |
opt: add method to add strict dependency to FuncDepSet
This commit adds a new method,
AddStrictDependency
, toFuncDepSet
.This will be used in the following commit to add a dependency between
a Window operator's partition columns and its functions. Existing
methods don't work for this because the dependency is not a key.
Epic: None
Release note: None
opt: infer functional dependencies for window functions
This commit adds logic to infer strict functional dependencies from
a Window operator's partition column(s) to some or all of its window
functions when the following conditions are satisfied:
The above conditions ensure that the window function always produces the
same result given the same window frame, as well as that every row in a
partition has the same window frame. This means that the window function
produces the same output for every row in the partition, and therefore,
the partition columns functionally determine the output of the window
function.
Epic: None
Release note: None
opt: add rule to merge GroupBy and Window
This commit adds a new norm rule,
FoldGroupByAndWindow
, which canmerge a Window operator with a parent GroupBy operator when the grouping
columns are the same as the partition columns. See the rule comment for
the complete list of conditions. In addition to removing a potentially
expensive Window operator, this transformation makes way for other rules
to match.
Fixes #113292
Release note: None