The destruction of GroupState in high cardinality aggregation takes a lot of time #1504

Closed
ic4y opened this issue Dec 29, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@ic4y
Contributor

ic4y commented Dec 29, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The test is as follows (4 cores, 16 GB RAM, macOS):
select count(1) from (select user_id from event group by user_id)a

The data set totals 350 million rows, with 5 million distinct user_id values. The entire query takes 15s. Profiling with pprof shows that about 60% of the time is spent destructing the GroupState.


Describe the solution you'd like

Use bumpalo to allocate the GroupState values in a single chunk of memory as far as possible, and then release that chunk all at once. Would the destruction time be much better this way?
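To illustrate the idea, here is a minimal sketch of arena-style storage. It is not DataFusion's code and does not use bumpalo: a plain `Vec` stands in for the arena so the example needs only the standard library. `GroupState` here is a simplified, hypothetical struct, and `GroupTable`, `update`, `group_count`, `rows_for`, and `count_distinct_users` are names invented for this example.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified per-group state (not DataFusion's real GroupState).
#[derive(Debug, Clone, Copy, PartialEq)]
struct GroupState {
    row_count: u64,
}

/// Arena-style table: every state lives in one contiguous Vec, and the
/// hash map stores only an index into that Vec.
struct GroupTable {
    states: Vec<GroupState>,
    slots: HashMap<u64, usize>,
}

impl GroupTable {
    fn new() -> Self {
        GroupTable { states: Vec::new(), slots: HashMap::new() }
    }

    /// Record one input row for `user_id`, creating its state on first sight.
    fn update(&mut self, user_id: u64) {
        let states = &mut self.states;
        let slot = *self.slots.entry(user_id).or_insert_with(|| {
            states.push(GroupState { row_count: 0 });
            states.len() - 1
        });
        self.states[slot].row_count += 1;
    }

    /// Number of distinct groups seen so far.
    fn group_count(&self) -> usize {
        self.states.len()
    }

    /// Rows recorded for `user_id`, if that group exists.
    fn rows_for(&self, user_id: u64) -> Option<u64> {
        self.slots.get(&user_id).map(|&slot| self.states[slot].row_count)
    }
}

/// Mirrors `select count(1) from (select user_id from event group by user_id)`.
fn count_distinct_users(user_ids: &[u64]) -> usize {
    let mut table = GroupTable::new();
    for &id in user_ids {
        table.update(id);
    }
    // `table` is dropped when this function returns. The states were never
    // boxed individually, so there is no separate free per group.
    table.group_count()
}
```

Because the states sit in one buffer and have no destructor of their own, dropping the table should not scale with one deallocation per group. Whether this carries over to the real GroupState depends on what each state owns (for example, heap-allocated accumulators), which this sketch does not model.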

@ic4y ic4y added the enhancement New feature or request label Dec 29, 2021
@ic4y
Contributor Author

ic4y commented Dec 30, 2021

By using --features "mimalloc", it was found that the test results did not differ much.

@rdettai
Contributor

rdettai commented Jan 3, 2022

Thanks for the analysis, @ic4y! I am quite surprised that the fragmentation caused by the row-oriented structure of Accumulators costs us so much more at deallocation time than when computing the actual aggregates.

I guess the arena strategy that you are suggesting should work, though I don't know bumpalo specifically.

It is worth referencing your discussion in #956 about making accumulators column-based, which I believe should also solve this issue.

@ic4y
Contributor Author

ic4y commented Jan 6, 2022

> By using --features "mimalloc", it was found that the test results did not differ much.

The --features "mimalloc" flag had not actually taken effect.

Adding the following code to main.rs (as mentioned in the user guide) solves the problem:

#[global_allocator]
static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
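Since the earlier attempt silently had no effect, it may help to check that a `#[global_allocator]` declaration is really in use. One possible approach, sketched below, is a counting wrapper. This is an assumption-laden illustration, not something from the user guide: it forwards to the standard library's `System` allocator rather than snmalloc, and `CountingAlloc`, `ALLOCS`, `FREES`, and `frees_during` are hypothetical names. A program can have only one `#[global_allocator]`, so this would replace the declaration above during a diagnostic run rather than sit next to it.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical diagnostic allocator: counts calls, then forwards to `System`.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static FREES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        FREES.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Only one global allocator may be declared per program.
#[global_allocator]
static ALLOC: CountingAlloc = CountingAlloc;

/// Run `f` and return how many deallocations happened while it ran.
/// Other threads can add to the count, so treat the result as a lower bound
/// on nothing more precise than "at least what `f` itself freed".
fn frees_during<F: FnOnce()>(f: F) -> usize {
    let before = FREES.load(Ordering::Relaxed);
    f();
    FREES.load(Ordering::Relaxed) - before
}
```

If the counters stay at zero while the program clearly allocates, the declaration is not active. The same helper also shows how drop cost scales: dropping a `Vec<Box<u64>>` with 1000 elements triggers at least 1000 deallocations, which is the per-group pattern the profile above points at.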

@ic4y ic4y closed this as completed Jan 6, 2022