Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: add rule to merge GroupBy and Window operators #117499

Merged
merged 3 commits into from
Dec 13, 2024

Conversation

DrewKimball
Copy link
Collaborator

opt: add method to add strict dependency to FuncDepSet

This commit adds a new method, AddStrictDependency, to FuncDepSet.
This will be used in the following commit to add a dependency between
a Window operator's partition columns and its functions. Existing
methods don't work for this because the dependency is not a key.

Epic: None

Release note: None

opt: infer functional dependencies for window functions

This commit adds logic to infer strict functional dependencies from
a Window operator's partition column(s) to some or all of its window
functions when the following conditions are satisfied:

  1. The window function must be an aggregate, or first_value or last_value.
  2. The window frame must be unbounded.

The above conditions ensure that the window function always produces the
same result given the same window frame, as well as that every row in a
partition has the same window frame. This means that the window function
produces the same output for every row in the partition, and therefore,
the partition columns functionally determine the output of the window
function.

Epic: None

Release note: None

opt: add rule to merge GroupBy and Window

This commit adds a new norm rule, FoldGroupByAndWindow, which can
merge a Window operator with a parent GroupBy operator when the grouping
columns are the same as the partition columns. See the rule comment for
the complete list of conditions. In addition to removing a potentially
expensive Window operator, this transformation makes way for other rules
to match.

Fixes #113292

Release note: None

@DrewKimball DrewKimball requested review from a team as code owners January 8, 2024 17:36
@DrewKimball DrewKimball requested review from michae2 and a team and removed request for a team January 8, 2024 17:36
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DrewKimball
Copy link
Collaborator Author

//pkg/bench/rttanalysis:rttanalysis_test                                 PASSED in 154.6s

Executed 1 out of 1 test: 1 test passes.
name                         old time/op     new time/op     delta
ORMQueries/asyncpg_types-10      7.30s ± 1%      0.06s ± 1%  -99.23%  (p=0.000 n=10+9)

name                         old roundtrips  new roundtrips  delta
ORMQueries/asyncpg_types-10       0.00            4.00 ± 0%    +Inf%  (p=0.000 n=10+10)

name                         old alloc/op    new alloc/op    delta
ORMQueries/asyncpg_types-10     5.99GB ± 0%     0.04GB ± 2%  -99.27%  (p=0.000 n=10+10)

name                         old allocs/op   new allocs/op   delta
ORMQueries/asyncpg_types-10      32.9M ± 0%       0.3M ± 1%  -99.19%  (p=0.000 n=9+10)

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice speedup! A couple of drive-by comments from me.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)


-- commits line 20 at r2:
Does the ordering within the window frame not matter here? I think so but thought I'd ask.


pkg/sql/opt/memo/logical_props_builder.go line 2927 at r2 (raw file):

			continue
		}
		// Aggregations and first, last, and nth value functions always produce the

nit: nth_value should be removed from the comment?

@mgartner mgartner self-requested a review January 8, 2024 22:04
Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Still wrapping my head around this.

Reviewed 1 of 2 files at r1, 2 of 2 files at r2, 1 of 7 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @yuzefovich)


-- commits line 20 at r2:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Does the ordering within the window frame not matter here? I think so but thought I'd ask.

Maybe for array_agg or string_agg? But even if there is no ORDER BY in the window, I would expect every row within the partition to process the partition rows in the same order, so should get the same result.


pkg/sql/opt/norm/window_funcs.go line 223 at r3 (raw file):

		if windows[i].Frame.FrameExclusion != treewindow.NoExclusion ||
			windows[i].Frame.StartBoundType != treewindow.UnboundedPreceding ||
			windows[i].Frame.EndBoundType != treewindow.UnboundedFollowing {

It seems like we can lift the unbounded window requirement if the partition cols are a strict key? Because then each partition is only a single row, so there's only one single-row window frame? (Do I have that right?)

If that's true, instead of checking whether the windows are unbounded, would it make sense to check that all of the window functions results are functional dependencies of the partition columns?


pkg/sql/opt/norm/rules/groupby.opt line 593 at r3 (raw file):

# Here's an example with slightly altered SQL syntax:
#
#   SELECT max(b), const_agg(foo), const_agg(bar) 

micro-nit: there's some whitespace or control character at the end of this line and a few of the other lines in this SQL statement


pkg/sql/opt/norm/rules/groupby.opt line 598 at r3 (raw file):

#       SELECT *, count(c) OVER w AS foo, array_agg(d) OVER w AS bar
#       FROM abcd
#       WINDOW w AS (PARTITION BY a ORDER BY d)

Seems like this example doesn't satisfy (4)? I guess if a is a strict key of abcd it still works?


pkg/sql/opt/norm/rules/groupby.opt line 602 at r3 (raw file):

#   GROUP BY a;
#   =>
#   SELECT max(b), count(c), array_agg(d) FROM abcd GROUP BY a ORDER BY d;

Shouldn't the ORDER BY d become part of the array_agg (and any other window functions that are lifted to the GROUP BY) rather than becoming the overall ordering of the GROUP BY? I.e. shouldn't this be something like:

SELECT max(b), count(c ORDER BY d), array_agg(d ORDER BY d) FROM abcd GROUP BY a;

And then assuming something like this works, seems like we could lift the requirement that the GROUP BY be unordered?


pkg/sql/opt/norm/rules/groupby.opt line 627 at r3 (raw file):

            (WindowPartition $windowPrivate)
        ) &
        (ColsAreSubset $groupingCols $inputCols)

Why is this necessary if we've already checked that the grouping cols are equal to the partition cols?


pkg/sql/opt/memo/testdata/logprops/window line 329 at r2 (raw file):

FROM kv
WINDOW w AS (
  PARTITION BY w ORDER BY v, s, d, f

This same case without ORDER BY might be interesting (should make the window frame unbounded, I think?).

Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @yuzefovich)


-- commits line 20 at r2:

Previously, michae2 (Michael Erickson) wrote…

Maybe for array_agg or string_agg? But even if there is no ORDER BY in the window, I would expect every row within the partition to process the partition rows in the same order, so should get the same result.

That's a good point. If we hypothetically aggregated in a different order for rows within a frame, then this wouldn't work. But, I think we can rely on execution enforcing this invariant. I added a note to the getWindowPartitionDeps function mentioning the assumption.


pkg/sql/opt/memo/logical_props_builder.go line 2927 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: nth_value should be removed from the comment?

Done.


pkg/sql/opt/norm/window_funcs.go line 223 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

It seems like we can lift the unbounded window requirement if the partition cols are a strict key? Because then each partition is only a single row, so there's only one single-row window frame? (Do I have that right?)

If that's true, instead of checking whether the windows are unbounded, would it make sense to check that all of the window functions results are functional dependencies of the partition columns?

Hm, good point. I'll make the change. I'll add the strict-key case to the logical props builder so that it's reflected in the window's FDs.

I also noticed that we were missing some logic in the logical props builder to project the new columns, so I added that as well to the second commit.


pkg/sql/opt/norm/rules/groupby.opt line 593 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

micro-nit: there's some whitespace or control character at the end of this line and a few of the other lines in this SQL statement

Done.


pkg/sql/opt/norm/rules/groupby.opt line 598 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Seems like this example doesn't satisfy (4)? I guess if a is a strict key of abcd it still works?

Oops, just forgot to add the window frame. Done.


pkg/sql/opt/norm/rules/groupby.opt line 602 at r3 (raw file):
You're right that this syntax is incorrect, I'll fix it.

And then assuming something like this works, seems like we could lift the requirement that the GROUP BY be unordered?

The syntax is still implemented as an ordering on the GroupBy operator as an internal ordering, not on its individual aggregate functions. So I think the restriction stands, at least for now.


pkg/sql/opt/norm/rules/groupby.opt line 627 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Why is this necessary if we've already checked that the grouping cols are equal to the partition cols?

Great point, removed it.


pkg/sql/opt/memo/testdata/logprops/window line 329 at r2 (raw file):

Previously, michae2 (Michael Erickson) wrote…

This same case without ORDER BY might be interesting (should make the window frame unbounded, I think?).

Good point. I added the test case, and opened an issue to track normalizing the window frame #117716.

Copy link

blathers-crl bot commented Jan 12, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@mgartner
Copy link
Collaborator

@mgartner pinging myself to review this

@DrewKimball DrewKimball force-pushed the slow-async branch 2 times, most recently from 054433d to 4958a18 Compare December 5, 2024 22:05
@mgartner mgartner requested a review from yuzefovich December 6, 2024 20:31
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Great work!

Reviewed 2 of 2 files at r1, 9 of 9 files at r4, 1 of 18 files at r7, 18 of 18 files at r10, 10 of 10 files at r11, 6 of 8 files at r12, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)


-- commits line 29 at r11:
nit: "functin"


pkg/sql/opt/norm/testdata/rules/groupby line 4760 at r12 (raw file):
I may be confused, but this sounds contradictory to:

  // Condition 2: the aggregate function must be a AnyNotNullAgg, ConstAgg,
  // ConstNotNullAgg, or FirstAggOp that references a window function.

pkg/sql/opt/memo/logical_props_builder.go line 1411 at r11 (raw file):

		if !determinedCols.Empty() {
			// The partition columns determine some of the window function outputs.
			rel.FuncDeps.AddStrictDependency(window.Partition, determinedCols)

Just to be sure: this is a strict key even if some of the columns in determinedCols are nullable, correct?


pkg/sql/opt/memo/logical_props_builder.go line 2996 at r11 (raw file):

// NOTE: getWindowPartitionDeps assumes that execution performs aggregation in
// the same order for every row in the window, even when there is no explicit
// ORDER BY.

Is this really the case? I assume this is necessary to efficiently evaluate window functions? Is it possible this invariant changes in the future?


pkg/sql/opt/memo/testdata/logprops/window line 461 at r11 (raw file):

  lead(v, 2) OVER w
FROM kv
WINDOW w AS (PARTITION BY k)

nit: do you want to add an ORDER BY here or something else that makes the frame "unbounded"?


pkg/sql/opt/norm/groupby_funcs.go line 435 at r12 (raw file):

// MergeAggsAndWindow returns an AggregationsExpr that is equivalent to the
// combination of the given (outer) aggregations and (inner) window functions.
// ConstAgg outer aggregations that reference a window function are replaced with

nit: it's more than just ConstAgg aggregations, it's also AnyNotNull, ConstNotNullAgg, and FirstAgg, correct?


pkg/sql/opt/norm/testdata/rules/groupby line 4796 at r12 (raw file):

           │    └── count:8
           └── const-agg [as=array_agg:9, outer=(9)]
                └── array_agg:9

I think there's a few more no-op test cases you could add:

  • When the window functions are not deteremined by the grouping columns
  • When there is a non-agg window function
  • When there is a group-by aggregate that is not a const, anynotnull, constnotnull, or first.
  • When there is a group-by aggregate that is one of the 4 above, but it references a column that is not produced by a window function.

This commit adds a new method, `AddStrictDependency`, to `FuncDepSet`.
This will be used in the following commit to add a dependency between
a Window operator's partition columns and its functions. Existing
methods don't work for this because the dependency is not a key.

Epic: None

Release note: None
This commit adds logic to infer strict functional dependencies from
a Window operator's partition column(s) to some or all of its window
functions when the following conditions are satisfied:
1. The window function must be an aggregate, or first_value or last_value.
2. The window frame must be unbounded.

The above conditions ensure that the window function always produces the
same result given the same window frame, as well as that every row in a
partition has the same window frame. This means that the window function
produces the same output for every row in the partition, and therefore,
the partition columns functionally determine the output of the window
function.

This patch also fixes a small omission made in the window function FD
calculation, which caused the window's FDs to preserve an input key
without extending the key to apply to all the window's output cols.

Epic: None

Release note: None
This commit adds a new norm rule, `FoldGroupByAndWindow`, which can
merge a Window operator with a parent GroupBy operator when the grouping
columns are the same as the partition columns. See the rule comment for
the complete list of conditions. In addition to removing a potentially
expensive Window operator, this transformation makes way for other rules
to match.

Fixes cockroachdb#113292

Release note: None
Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @yuzefovich)


-- commits line 29 at r11:

Previously, mgartner (Marcus Gartner) wrote…

nit: "functin"

Done.


pkg/sql/opt/memo/logical_props_builder.go line 1411 at r11 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Just to be sure: this is a strict key even if some of the columns in determinedCols are nullable, correct?

Yes, a strict key just requires that any equivalency group for the determining columns is paired with a single set of values for the determined cols. NULLs are allowed in either set, as long as that condition is satisfied. This is analogous to the logic for GroupBy that adds a strict key on the grouping columns.


pkg/sql/opt/memo/logical_props_builder.go line 2996 at r11 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Is this really the case? I assume this is necessary to efficiently evaluate window functions? Is it possible this invariant changes in the future?

It's true right now, and I would be very surprised if we ever changed it. I'll note the invariant in both execution engines.


pkg/sql/opt/norm/groupby_funcs.go line 435 at r12 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: it's more than just ConstAgg aggregations, it's also AnyNotNull, ConstNotNullAgg, and FirstAgg, correct?

I changed it to ConstAgg-like.


pkg/sql/opt/memo/testdata/logprops/window line 461 at r11 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: do you want to add an ORDER BY here or something else that makes the frame "unbounded"?

Done. Note that the ordering will have no effect, since the window function is operating on single-row partitions.


pkg/sql/opt/norm/testdata/rules/groupby line 4760 at r12 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I may be confused, but this sounds contradictory to:

  // Condition 2: the aggregate function must be a AnyNotNullAgg, ConstAgg,
  // ConstNotNullAgg, or FirstAggOp that references a window function.

I clarified the comment to say (non ConstAgg-like) Sum GroupBy aggregate.


pkg/sql/opt/norm/testdata/rules/groupby line 4796 at r12 (raw file):

When the window functions are not deteremined by the grouping columns

We can't actually test this without hitting the check that the grouping and partition columns are equal, but the test on line 4645 does this.

When there is a non-agg window function

Done. I added one with row_number.

When there is a group-by aggregate that is not a const, anynotnull, constnotnull, or first.

The last case does this with the sum.

When there is a group-by aggregate that is one of the 4 above, but it references a column that is not produced by a window function.

Done. Note that this case is actually allowed.

@DrewKimball
Copy link
Collaborator Author

TFTR!

bors r+

@craig craig bot merged commit 32886df into cockroachdb:master Dec 13, 2024
22 checks passed
@DrewKimball DrewKimball deleted the slow-async branch December 13, 2024 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow introspection query in asyncpg
5 participants