feat(query): TopN window operator #16726
Signed-off-by: coldWater <[email protected]>
@forsaken628 Are there any perf results for the window TopN operator to share?
benchmark:

SELECT * FROM (
    SELECT number, rank() OVER ( PARTITION BY number % 3 ORDER BY number ) AS c
    FROM numbers(1000000)
) WHERE c < 3;

[Result table comparing the WindowPartition and Window operators on this PR and on main; the measured values are not preserved.]
benchmark:

create table random_tab (a int16, b String) Engine = Random;
explain ANALYZE select * from (select a % 1000, rank() over (PARTITION by a % 1000 ORDER by b) rk from (select * from random_tab limit 10000000)) where rk < 3;

[Results for this PR and for main; the EXPLAIN ANALYZE output is not preserved.]
benchmark:

create table random_tab (a int16, b String) Engine = Random;
explain ANALYZE select * from (select a % 100000, rank() over (PARTITION by a % 100000 ORDER by b) rk from (select * from random_tab limit 10000000)) where rk < 1000;

[Results for this PR and for main; the EXPLAIN ANALYZE output is not preserved.]
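For readers unfamiliar with the query shape being benchmarked, here is a minimal sketch of what `rank() OVER (PARTITION BY ... ORDER BY ...)` followed by `WHERE rk < N` computes. It is only an illustration of the semantics, not the operator added in this PR; the function name `top_n_by_rank` and the `(partition_key, order_value)` row layout are assumptions made for the example.

```rust
/// Illustrative only: keep the rows whose rank() within their partition is
/// below `limit`. Each row is a hypothetical (partition_key, order_value) pair.
/// Returns (partition_key, order_value, rank) triples.
fn top_n_by_rank(mut rows: Vec<(u64, u64)>, limit: usize) -> Vec<(u64, u64, usize)> {
    // Sorting by (partition_key, order_value) groups each partition together
    // and orders the rows inside it.
    rows.sort_unstable();

    let mut out = Vec::new();
    let mut i = 0;
    while i < rows.len() {
        let key = rows[i].0;
        let part_start = i;
        let mut rank = 1;
        while i < rows.len() && rows[i].0 == key {
            let pos = i - part_start;
            // rank() semantics: ties share a rank, and the next distinct
            // value gets rank = (number of preceding rows) + 1.
            if pos > 0 && rows[i].1 != rows[i - 1].1 {
                rank = pos + 1;
            }
            // Rank never decreases inside a partition, so rows past the
            // limit can simply be dropped.
            if rank < limit {
                out.push((rows[i].0, rows[i].1, rank));
            }
            i += 1;
        }
    }
    out
}
```

Because at most a bounded number of rows per partition survive the filter (more only when ties occur), an operator that knows the limit up front does not need to materialise whole partitions, which is the case the benchmarks above exercise.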
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
fixes: #16394
This PR replaces row-by-row hashing with a sort followed by a single hash per partition. The added cost of the sort is offset by the reduced cost of copying and hashing. The optimisation looks highly generalisable.
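The idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this PR: rows are hypothetical `(partition_key, order_value)` pairs, and `bucket_of`, `scatter_row_by_row`, and `scatter_sorted` are names invented for the example. Both functions also return the number of hash computations so the difference is visible.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a partition key to one of `buckets` output slots (hypothetical helper).
fn bucket_of(key: u64, buckets: usize) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() % buckets as u64) as usize
}

/// Baseline: hash every row and copy it individually.
fn scatter_row_by_row(rows: &[(u64, u64)], buckets: usize) -> (Vec<Vec<(u64, u64)>>, usize) {
    let mut out = vec![Vec::new(); buckets];
    let mut hashes = 0;
    for &row in rows {
        hashes += 1;
        out[bucket_of(row.0, buckets)].push(row);
    }
    (out, hashes)
}

/// Sort first, then hash once per run of equal keys and copy the run as a slice.
fn scatter_sorted(rows: &mut [(u64, u64)], buckets: usize) -> (Vec<Vec<(u64, u64)>>, usize) {
    // Sorting by (key, order_value) makes each partition one contiguous run.
    rows.sort_unstable();
    let mut out = vec![Vec::new(); buckets];
    let mut hashes = 0;
    let mut start = 0;
    while start < rows.len() {
        let key = rows[start].0;
        let mut end = start + 1;
        while end < rows.len() && rows[end].0 == key {
            end += 1;
        }
        // One hash and one bulk copy for the whole partition.
        hashes += 1;
        out[bucket_of(key, buckets)].extend_from_slice(&rows[start..end]);
        start = end;
    }
    (out, hashes)
}
```

In this sketch the hash count drops from one per row to one per distinct partition key, and each partition arrives in its bucket already sorted, at the price of the up-front sort.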