-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace the filter and aggregate query in stats call with subqueries … #133
Conversation
…to increase performance on larger datasets.
Why are you not mentioning the PostgreSQL versions you have tested this on? Where is the |
We support PostgreSQL versions from 9.5 to 16. |
The last post in #132 had the info. I can post the raw explain. The postgres database data was also in the linked issue. Environment data:Minion version: 10.29
PostgreSQL: 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit The current call to the stats query takes about 8-10 seconds: Changing the query to something without the filter aggregations using selects, ~1 second: |
Please don't use those abstract formats for posting query plans. I just want to see the raw plan. Here's examples for PostgreSQL 16 with 2.5 million jobs. Before:
After:
|
PostgreSQL 14.5: Existing
Plan
Proposed
Plan
Edited by @kraih: fixed code formatting |
And don't forget to squash commits when you fixed the SQL formatting. |
Pretty good sign that the plans from 14.5 to 16 are so similar, we can probably merge this PR once it is ready. |
…to increase performance on larger datasets. reformat sql to match existing style
lib/Minion/Backend/Pg.pm
Outdated
FROM minion_jobs" | ||
)->hash; | ||
my $stats = $self->pg->db->query(qq{ | ||
SELECT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation and quoting is still inconsistent, otherwise it looks ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see three different quoting formats in the existing module:
- '
- "
- q{}
I can modify to whichever one is preferable. Below is modified to use q{} instead of the current qq{} and I pushed the first line indentation in to match some of the other queries at 4 spaces.
sub stats {
my $self = shift;
my $stats = $self->pg->db->query(q{
SELECT
(SELECT COUNT(*) FROM minion_jobs WHERE state = 'inactive' AND (expires IS NULL OR expires > NOW())) AS inactive_jobs,
(SELECT COUNT(*) FROM minion_jobs WHERE state = 'active') AS active_jobs,
(SELECT COUNT(*) FROM minion_jobs WHERE state = 'failed') AS failed_jobs,
(SELECT COUNT(*) FROM minion_jobs WHERE state = 'finished') AS finished_jobs,
(SELECT COUNT(*) FROM minion_jobs WHERE state = 'inactive' AND delayed > NOW()) AS delayed_jobs,
(SELECT COUNT(*) FROM minion_locks WHERE expires > NOW()) AS active_locks,
(SELECT COUNT(DISTINCT worker) FROM minion_jobs mj WHERE state = 'active') AS active_workers,
(SELECT CASE WHEN is_called THEN last_value ELSE 0 END FROM minion_jobs_id_seq) AS enqueued_jobs,
(SELECT COUNT(*) FROM minion_workers) AS workers,
EXTRACT(EPOCH FROM NOW() - PG_POSTMASTER_START_TIME()) AS uptime
})->hash;
$stats->{inactive_workers} = $stats->{workers} - $stats->{active_workers};
return $stats;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make it look exactly like one of the existing cases, including quotation token in front of and after the SQL string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We indent with 2 spaces everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to let me know if the indentation is how you want it. Since this is not like any other queries in the module, I indented 2 spaces after the initial SELECT statement for each sub-select and the extract, there are no WHERE or FROM statements.
I modified quoting to just use double quotes and wrapped it similarly to the other queries in the module.
…to increase performance on larger datasets. reformat sql to match existing style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change LGTM, and the numbers are convincing.
Unfortunate that the commits were not squashed. |
Increase performance on larger datasets
Summary
Replace the filter/aggregation queries with subqueries. Query performance on data sets with records over 2-3 million records goes from 5-10 seconds to ~1 second depending on the backing database resources and configuration.
Example:
PG Database:
16 CPU / 32 GB MEM
Current state:
Raw query times:
Existing query time:
Proposed query time:
UI Response Times:
Existing query time:
Proposed query time:
Motivation
Over the course of the past several weeks I've implemented approximately 14 separate Minion installs of varying sizes. 1 or 2 of these implementations has a much heavier workload and is sitting at ~6,000,000 completed jobs. As the number of records in the minion_jobs table increased, I noticed a significant slowdown in response times, specifically in the admin UI. After doing some investigation and database tuning, I found the most noticeable slowdown to be in the stats query.
My last post in the closed issue below links to relevant data regarding the queries and additional information.
References
Minion becomes slow, admin unusable with processed jobs between 2-3 million records