Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(query): TopN window operator #16726

Merged
merged 17 commits into from
Nov 13, 2024
Merged

Conversation

forsaken628
Copy link
Collaborator

@forsaken628 forsaken628 commented Oct 29, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

fixes: #16394

This PR replaces row-by-row hash with sorting and then hashing only once per partition. the increased overhead of sort is offset by the reduced overhead of copy and hash. It looks like this optimisation is highly generalisable.

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Oct 29, 2024
@forsaken628 forsaken628 self-assigned this Oct 29, 2024
@forsaken628 forsaken628 added the ci-benchmark Benchmark: run all test label Oct 29, 2024
Copy link
Contributor

Docker Image for PR

  • tag: pr-16726-23a65eb-1730218502

note: this image tag is only available for internal use,
please check the internal doc for more details.

@forsaken628 forsaken628 removed the ci-benchmark Benchmark: run all test label Oct 29, 2024
@forsaken628 forsaken628 requested a review from sundy-li October 30, 2024 03:09
Signed-off-by: coldWater <[email protected]>
Signed-off-by: coldWater <[email protected]>
Signed-off-by: coldWater <[email protected]>
Signed-off-by: coldWater <[email protected]>
Signed-off-by: coldWater <[email protected]>
Signed-off-by: coldWater <[email protected]>
Signed-off-by: coldWater <[email protected]>
Signed-off-by: coldWater <[email protected]>
Signed-off-by: coldWater <[email protected]>
@forsaken628 forsaken628 force-pushed the top-n-window branch 2 times, most recently from cc57d0e to cf8d5c9 Compare November 4, 2024 07:41
Signed-off-by: coldWater <[email protected]>
@forsaken628 forsaken628 force-pushed the top-n-window branch 2 times, most recently from cde5b15 to 91750d6 Compare November 5, 2024 07:38
Signed-off-by: coldWater <[email protected]>
@forsaken628 forsaken628 marked this pull request as ready for review November 5, 2024 08:22
@sundy-li
Copy link
Member

sundy-li commented Nov 5, 2024

@forsaken628 Is there any perf results for window topn to share?

@forsaken628 forsaken628 added the ci-cloud Build docker image for cloud test label Nov 5, 2024
Copy link
Contributor

github-actions bot commented Nov 5, 2024

Docker Image for PR

  • tag: pr-16726-96169f1-1730810767

note: this image tag is only available for internal use,
please check the internal doc for more details.

@forsaken628
Copy link
Collaborator Author

benchmark:

SELECT * FROM (
  SELECT number, rank() OVER ( PARTITION BY number % 3 ORDER BY number ) AS c 
  FROM numbers(1000000)
) WHERE c < 3

this pr
version: pr-16726-96169f1-1730810767
size: Small
Total Execution Time: 23ms

WindowPartition
estimated rows: 1000000.00
cpu time: 48.109029ms
wait time: 812.222µs
output rows: 96

Window
cpu time: 28.382µs

main
version: cloud dev latest
size: Small
Total Execution Time: 127ms

WindowPartition
estimated rows: 1000000.00
cpu time: 100.431594ms
wait time: 1.078021ms
output rows: 1 million

Window
cpu time: 46.962569ms

@forsaken628
Copy link
Collaborator Author

benchmark:

create table random_tab (a int16, b String) Engine = Random;
explain ANALYZE select * from (select a % 1000, rank() over (PARTITION by a % 1000 ORDER by b) rk from (select * from random_tab limit 10000000)) where rk < 3;

this pr
version: pr-16726-96169f1-1730810767
size: Small

Window
├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2), rank() OVER ( PARTITION BY a % 1000 ORDER BY b ) (#3)]
├── aggregate function: [rank]
├── partition by: [rank_part_0]
├── order by: [b]
├── frame: [Range: Preceding(None) ~ CurrentRow]
├── cpu time: 10.049676ms
├── output rows: 64 thousand
├── output bytes: 1.67 MiB
└── WindowPartition
    ├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2)]
    ├── hash keys: [rank_part_0]
    ├── estimated rows: 0.00
    ├── cpu time: 1.941062005s
    ├── wait time: 3.682208ms
    ├── output rows: 64 thousand
    ├── output bytes: 1.18 MiB
    └── EvalScalar
        ├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2)]
        ├── expressions: [random_tab.a (#0) % 1000]
        ├── estimated rows: 0.00
        ├── cpu time: 27.624338ms
        ├── output rows: 10 million
        ├── output bytes: 184.77 MiB

main
version: cloud dev latest
size: Small

Window
├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2), rank() OVER ( PARTITION BY a % 1000 ORDER BY b ) (#3)]
├── aggregate function: [rank]
├── partition by: [rank_part_0]
├── order by: [b]
├── frame: [Range: Preceding(None) ~ CurrentRow]
├── cpu time: 912.368447ms
├── output rows: 10 million
├── output bytes: 261.07 MiB
└── WindowPartition
    ├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2)]
    ├── hash keys: [rank_part_0]
    ├── estimated rows: 0.00
    ├── cpu time: 2.927932431s
    ├── wait time: 4.966152ms
    ├── output rows: 10 million
    ├── output bytes: 184.78 MiB
    └── EvalScalar
        ├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2)]
        ├── expressions: [random_tab.a (#0) % 1000]
        ├── estimated rows: 0.00
        ├── cpu time: 27.556557ms
        ├── output rows: 10 million
        ├── output bytes: 184.77 MiB

@forsaken628
Copy link
Collaborator Author

forsaken628 commented Nov 6, 2024

benchmark:

create table random_tab (a int16, b String) Engine = Random;
explain ANALYZE select * from (select a % 100000, rank() over (PARTITION by a % 100000 ORDER by b) rk from (select * from random_tab limit 10000000)) where rk < 1000;

this pr
version: pr-16726-96169f1-1730810767
size: Small

Window
├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2), rank() OVER ( PARTITION BY a % 100000 ORDER BY b ) (#3)]
├── aggregate function: [rank]
├── partition by: [rank_part_0]
├── order by: [b]
├── frame: [Range: Preceding(None) ~ CurrentRow]
├── cpu time: 576.44352ms
├── output rows: 5.02 million
├── output bytes: 150.06 MiB
└── WindowPartition
    ├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2)]
    ├── hash keys: [rank_part_0]
    ├── estimated rows: 0.00
    ├── cpu time: 2.87370872s
    ├── wait time: 5.195948ms
    ├── output rows: 5.02 million
    ├── output bytes: 111.80 MiB
    └── EvalScalar
        ├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2)]
        ├── expressions: [random_tab.a (#0) % 100000]
        ├── estimated rows: 0.00
        ├── cpu time: 27.332388ms
        ├── output rows: 10 million
        ├── output bytes: 222.92 MiB

main
version: cloud dev latest
size: Small

Window
├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2), rank() OVER ( PARTITION BY a % 100000 ORDER BY b ) (#3)]
├── aggregate function: [rank]
├── partition by: [rank_part_0]
├── order by: [b]
├── frame: [Range: Preceding(None) ~ CurrentRow]
├── cpu time: 935.685435ms
├── output rows: 10 million
├── output bytes: 299.22 MiB
└── WindowPartition
    ├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2)]
    ├── hash keys: [rank_part_0]
    ├── estimated rows: 0.00
    ├── cpu time: 3.020992844s
    ├── wait time: 6.188028ms
    ├── output rows: 10 million
    ├── output bytes: 222.92 MiB
    └── EvalScalar
        ├── output columns: [random_tab.a (#0), random_tab.b (#1), rank_part_0 (#2)]
        ├── expressions: [random_tab.a (#0) % 100000]
        ├── estimated rows: 0.00
        ├── cpu time: 23.836146ms
        ├── output rows: 10 million
        ├── output bytes: 222.92 MiB

@sundy-li sundy-li added this pull request to the merge queue Nov 13, 2024
Merged via the queue into databendlabs:main with commit aab0f85 Nov 13, 2024
74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-cloud Build docker image for cloud test pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: TopN window operator
2 participants