Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the filter and aggregate query in stats call with subqueries … #133

Merged
merged 3 commits into from
Jun 5, 2024

Conversation

rshingleton
Copy link

Increase performance on larger datasets

Summary

Replace the filter/aggregation queries with subqueries. Query performance on data sets with records over 2-3 million records goes from 5-10 seconds to ~1 second depending on the backing database resources and configuration.

Example:

PG Database:
16 CPU / 32 GB MEM

Current state:

inactive_jobs active_jobs failed_jobs finished_jobs delayed_jobs active_locks active_workers enqueued_jobs workers uptime
0 229 41666 6097633 0 0 24 9193564 24 81400.937263

Raw query times:

Existing query time:

  • execution 1: 12s
  • execution 2: 9s
  • execution 3: 10s

Proposed query time:

  • execution 1: .990s
  • execution 2: 1s
  • execution 3: 1s

UI Response Times:

Existing query time:

  • execution 1: 200 OK (9.671184s, 0.103/s)
  • execution 2: 200 OK (9.210999s, 0.109/s)
  • execution 3: 200 OK (9.304173s, 0.107/s)

Proposed query time:

  • execution 1: 200 OK (1.682046s, 0.595/s)
  • execution 2: 200 OK (1.727212s, 0.579/s)
  • execution 3: 200 OK (1.660772s, 0.602/s)

Motivation

Over the course of the past several weeks I've implemented approximately 14 separate Minion installs of varying sizes. 1 or 2 of these implementations has a much heavier workload and is sitting at ~6,000,000 completed jobs. As the number of records in the minion_jobs table increased, I noticed a significant slowdown in response times, specifically in the admin UI. After doing some investigation and database tuning, I found the most noticeable slowdown to be in the stats query.

My last post in the closed issue below links to relevant data regarding the queries and additional information.

References

Minion becomes slow, admin unusable with processed jobs between 2-3 million records

…to increase performance on larger datasets.
@kraih
Copy link
Member

kraih commented Jun 2, 2024

Why are you not mentioning the PostgreSQL versions you have tested this on? Where is the EXPLAIN ANALYZE output for comparison?

lib/Minion/Backend/Pg.pm Outdated Show resolved Hide resolved
@kraih
Copy link
Member

kraih commented Jun 2, 2024

We support PostgreSQL versions from 9.5 to 16.

@rshingleton
Copy link
Author

The last post in #132 had the info. I can post the raw explain. The postgres database data was also in the linked issue.

Environment data:

Minion version: 10.29
Perl test version:

  • perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux-thread-multi
  • perl 5, version 36, subversion 3 (v5.36.3) built for darwin-2level

PostgreSQL: 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit

The current call to the stats query takes about 8-10 seconds:
https://explain.dalibo.com/plan/3633h4g79884d2g6#plan

Changing the query to something without the filter aggregations using selects, ~1 second:
https://explain.dalibo.com/plan/17ac42f69a623353

@kraih
Copy link
Member

kraih commented Jun 2, 2024

Please don't use those abstract formats for posting query plans. I just want to see the raw plan. Here's examples for PostgreSQL 16 with 2.5 million jobs.

Before:

EXPLAIN ANALYZE SELECT COUNT(*) FILTER (WHERE state = 'inactive' AND (expires IS NULL OR expires > NOW())) AS inactive_jobs,
       COUNT(*) FILTER (WHERE state = 'active') AS active_jobs, COUNT(*) FILTER (WHERE state = 'failed') AS failed_jobs,
       COUNT(*) FILTER (WHERE state = 'finished') AS finished_jobs,
       COUNT(*) FILTER (WHERE state = 'inactive' AND delayed > NOW()) AS delayed_jobs,
       (SELECT COUNT(*) FROM minion_locks WHERE expires > NOW()) AS active_locks,
       COUNT(DISTINCT worker) FILTER (WHERE state = 'active') AS active_workers,
       (SELECT CASE WHEN is_called THEN last_value ELSE 0 END FROM minion_jobs_id_seq) AS enqueued_jobs,
       (SELECT COUNT(*) FROM minion_workers) AS workers,
       EXTRACT(EPOCH FROM NOW() - PG_POSTMASTER_START_TIME()) AS uptime
     FROM minion_jobs;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=675682.03..675682.05 rows=1 width=104) (actual time=1551.513..1551.516 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=3.03..3.04 rows=1 width=8) (actual time=0.029..0.030 rows=1 loops=1)
           ->  Seq Scan on minion_locks  (cost=0.00..3.03 rows=2 width=0) (actual time=0.027..0.028 rows=0 loops=1)
                 Filter: (expires > now())
   InitPlan 2 (returns $1)
     ->  Seq Scan on minion_jobs_id_seq  (cost=0.00..1.51 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)
   InitPlan 3 (returns $2)
     ->  Aggregate  (cost=1.51..1.52 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1)
           ->  Seq Scan on minion_workers  (cost=0.00..1.51 rows=1 width=0) (actual time=0.006..0.007 rows=1 loops=1)
   ->  Sort  (cost=570206.08..576410.19 rows=2481644 width=28) (actual time=1065.599..1297.608 rows=2553971 loops=1)
         Sort Key: minion_jobs.worker
         Sort Method: external merge  Disk: 84992kB
         ->  Seq Scan on minion_jobs  (cost=0.00..162417.44 rows=2481644 width=28) (actual time=0.089..539.705 rows=2553971 loops=1)
 Planning Time: 0.294 ms
 Execution Time: 1563.877 ms
(16 rows)

After:

EXPLAIN ANALYZE   select
  (select count(*) from minion_jobs mj where state = 'inactive' and (expires IS NULL OR expires > NOW()))  as inactive_jobs,
  (select count(*) from minion_jobs mj where state = 'active') as active_jobs,
  (select count(*) from minion_jobs mj where state = 'failed') as failed_jobs,
  (select count(*) from minion_jobs mj where state = 'finished') as finished_jobs,
  (select count(*) from minion_jobs mj where state = 'inactive' and delayed > NOW()) as delayed_jobs,
  (select COUNT(*) from minion_locks where expires > NOW()) as active_locks,
  (select count(DISTINCT worker)  from minion_jobs mj  where state = 'active') as active_workers,
  (select CASE WHEN is_called THEN last_value ELSE 0 END FROM minion_jobs_id_seq) as enqueued_jobs,
  (select COUNT(*) from minion_workers) as workers,
  EXTRACT(EPOCH FROM NOW() - PG_POSTMASTER_START_TIME()) as uptime;
                                                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=85910.23..85910.25 rows=1 width=104) (actual time=147.450..154.471 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=2556.08..2556.09 rows=1 width=8) (actual time=0.023..0.023 rows=1 loops=1)
           ->  Index Scan using minion_jobs_state_priority_id_idx on minion_jobs mj  (cost=0.43..2552.56 rows=1406 width=0) (actual time=0.021..0.021 rows=0 loops=1)
                 Index Cond: (state = 'inactive'::minion_state)
                 Filter: ((expires IS NULL) OR (expires > now()))
   InitPlan 2 (returns $1)
     ->  Aggregate  (cost=4.45..4.46 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)
           ->  Index Only Scan using minion_jobs_state_priority_id_idx on minion_jobs mj_1  (cost=0.43..4.45 rows=1 width=0) (actual time=0.010..0.010 rows=1 loops=1)
                 Index Cond: (state = 'active'::minion_state)
                 Heap Fetches: 1
   InitPlan 3 (returns $2)
     ->  Aggregate  (cost=4.45..4.46 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
           ->  Index Only Scan using minion_jobs_state_priority_id_idx on minion_jobs mj_2  (cost=0.43..4.45 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=1)
                 Index Cond: (state = 'failed'::minion_state)
                 Heap Fetches: 0
   InitPlan 4 (returns $4)
     ->  Finalize Aggregate  (cost=80780.33..80780.34 rows=1 width=8) (actual time=147.368..154.383 rows=1 loops=1)
           ->  Gather  (cost=80780.12..80780.33 rows=2 width=8) (actual time=147.331..154.380 rows=3 loops=1)
                 Workers Planned: 2
                 Workers Launched: 2
                 ->  Partial Aggregate  (cost=79780.12..79780.13 rows=1 width=8) (actual time=142.654..142.655 rows=1 loops=3)
                       ->  Parallel Index Only Scan using minion_jobs_state_priority_id_idx on minion_jobs mj_3  (cost=0.43..77196.54 rows=1033432 width=0) (actual time=0.047..105.646 rows=851323 loops=3)
                             Index Cond: (state = 'finished'::minion_state)
                             Heap Fetches: 2143
   InitPlan 5 (returns $5)
     ->  Aggregate  (cost=2552.57..2552.58 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1)
           ->  Index Scan using minion_jobs_state_priority_id_idx on minion_jobs mj_4  (cost=0.43..2552.56 rows=1 width=0) (actual time=0.006..0.006 rows=0 loops=1)
                 Index Cond: (state = 'inactive'::minion_state)
                 Filter: (delayed > now())
   InitPlan 6 (returns $6)
     ->  Aggregate  (cost=3.03..3.04 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1)
           ->  Seq Scan on minion_locks  (cost=0.00..3.03 rows=2 width=0) (actual time=0.007..0.007 rows=0 loops=1)
                 Filter: (expires > now())
   InitPlan 7 (returns $7)
     ->  Aggregate  (cost=6.22..6.23 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=1)
           ->  Sort  (cost=6.21..6.22 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=1)
                 Sort Key: mj_5.worker
                 Sort Method: quicksort  Memory: 25kB
                 ->  Index Scan using minion_jobs_state_priority_id_idx on minion_jobs mj_5  (cost=0.43..6.20 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=1)
                       Index Cond: (state = 'active'::minion_state)
   InitPlan 8 (returns $8)
     ->  Seq Scan on minion_jobs_id_seq  (cost=0.00..1.51 rows=1 width=8) (actual time=0.003..0.004 rows=1 loops=1)
   InitPlan 9 (returns $9)
     ->  Aggregate  (cost=1.51..1.52 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=1)
           ->  Seq Scan on minion_workers  (cost=0.00..1.51 rows=1 width=0) (actual time=0.003..0.003 rows=1 loops=1)
 Planning Time: 0.356 ms
 Execution Time: 154.546 ms
(48 rows)

@rshingleton
Copy link
Author

rshingleton commented Jun 2, 2024

PostgreSQL 14.5:

Existing

EXPLAIN ANALYZE SELECT COUNT(*) FILTER (WHERE state = 'inactive' AND (expires IS NULL OR expires > NOW())) AS inactive_jobs,
       COUNT(*) FILTER (WHERE state = 'active') AS active_jobs, COUNT(*) FILTER (WHERE state = 'failed') AS failed_jobs,
       COUNT(*) FILTER (WHERE state = 'finished') AS finished_jobs,
       COUNT(*) FILTER (WHERE state = 'inactive' AND delayed > NOW()) AS delayed_jobs,
       (SELECT COUNT(*) FROM minion_locks WHERE expires > NOW()) AS active_locks,
       COUNT(DISTINCT worker) FILTER (WHERE state = 'active') AS active_workers,
       (SELECT CASE WHEN is_called THEN last_value ELSE 0 END FROM minion_jobs_id_seq) AS enqueued_jobs,
       (SELECT COUNT(*) FROM minion_workers) AS workers,
       EXTRACT(EPOCH FROM NOW() - PG_POSTMASTER_START_TIME()) AS uptime
     FROM minion_jobs;

Plan

Aggregate  (cost=1829427.65..1829427.67 rows=1 width=104) (actual time=9728.803..9728.806 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Aggregate  (cost=24.74..24.75 rows=1 width=8) (actual time=0.012..0.013 rows=1 loops=1)
          ->  Index Only Scan using minion_locks_name_expires_idx on minion_locks  (cost=0.15..23.85 rows=357 width=0) (actual time=0.010..0.011 rows=0 loops=1)
                Index Cond: (expires > now())
                Heap Fetches: 0
  InitPlan 2 (returns $1)
    ->  Seq Scan on minion_jobs_id_seq  (cost=0.00..1.01 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)
  InitPlan 3 (returns $2)
    ->  Aggregate  (cost=6.37..6.38 rows=1 width=8) (actual time=0.041..0.041 rows=1 loops=1)
          ->  Index Only Scan using minion_workers_pkey on minion_workers  (cost=0.14..6.31 rows=24 width=0) (actual time=0.012..0.038 rows=24 loops=1)
                Heap Fetches: 20
  ->  Seq Scan on minion_jobs  (cost=0.00..1582458.30 rows=6173430 width=28) (actual time=0.026..8965.461 rows=6179362 loops=1)
Planning Time: 0.242 ms
Execution Time: 9728.917 ms

Proposed

EXPLAIN ANALYZE select 
(select count(*) from minion_jobs mj where state = 'inactive' and (expires IS NULL OR expires > NOW()))  as inactive_jobs,
(select count(*) from minion_jobs mj where state = 'active') as active_jobs,
(select count(*) from minion_jobs mj where state = 'failed') as failed_jobs,
(select count(*) from minion_jobs mj where state = 'finished') as finished_jobs,
(select count(*) from minion_jobs mj where state = 'inactive' and delayed > NOW()) as delayed_jobs,
(select COUNT(*) from minion_locks where expires > NOW()) as active_locks,
(select count(DISTINCT worker)  from minion_jobs mj  where state = 'active') as active_workers,
(select CASE WHEN is_called THEN last_value ELSE 0 END FROM minion_jobs_id_seq) as enqueued_jobs,
(select COUNT(*) from minion_workers) as workers,
EXTRACT(EPOCH FROM NOW() - PG_POSTMASTER_START_TIME()) as uptime

Plan

Result  (cost=362396.01..362396.03 rows=1 width=104) (actual time=1466.206..1515.356 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Aggregate  (cost=137.29..137.30 rows=1 width=8) (actual time=0.737..0.739 rows=1 loops=1)
          ->  Index Scan using minion_jobs_state_priority_id_idx on minion_jobs mj  (cost=0.43..137.23 rows=27 width=0) (actual time=0.733..0.733 rows=0 loops=1)
                Index Cond: (state = 'inactive'::minion_state)
                Filter: ((expires IS NULL) OR (expires > now()))
  InitPlan 2 (returns $1)
    ->  Aggregate  (cost=25.00..25.01 rows=1 width=8) (actual time=2.649..2.649 rows=1 loops=1)
          ->  Index Only Scan using minion_jobs_state_priority_id_idx on minion_jobs mj_1  (cost=0.43..24.48 rows=206 width=0) (actual time=0.017..2.626 rows=231 loops=1)
                Index Cond: (state = 'active'::minion_state)
                Heap Fetches: 2257
  InitPlan 3 (returns $2)
    ->  Aggregate  (cost=4750.11..4750.12 rows=1 width=8) (actual time=19.534..19.535 rows=1 loops=1)
          ->  Index Only Scan using minion_jobs_state_priority_id_idx on minion_jobs mj_2  (cost=0.43..4647.22 rows=41156 width=0) (actual time=0.013..17.481 rows=42868 loops=1)
                Index Cond: (state = 'failed'::minion_state)
                Heap Fetches: 10280
  InitPlan 4 (returns $4)
    ->  Finalize Aggregate  (cost=357177.47..357177.48 rows=1 width=8) (actual time=1440.811..1489.951 rows=1 loops=1)
          ->  Gather  (cost=357177.25..357177.46 rows=2 width=8) (actual time=1440.677..1489.944 rows=3 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                ->  Partial Aggregate  (cost=356177.25..356177.26 rows=1 width=8) (actual time=1436.191..1436.192 rows=1 loops=3)
                      ->  Parallel Index Only Scan using minion_jobs_state_priority_id_idx on minion_jobs mj_3  (cost=0.43..349789.68 rows=2555028 width=0) (actual time=0.052..1322.820 rows=2046647 loops=3)
                            Index Cond: (state = 'finished'::minion_state)
                            Heap Fetches: 1316115
  InitPlan 5 (returns $5)
    ->  Aggregate  (cost=137.23..137.24 rows=1 width=8) (actual time=0.374..0.375 rows=1 loops=1)
          ->  Index Scan using minion_jobs_state_priority_id_idx on minion_jobs mj_4  (cost=0.43..137.23 rows=1 width=0) (actual time=0.372..0.373 rows=0 loops=1)
                Index Cond: (state = 'inactive'::minion_state)
                Filter: (delayed > now())
  InitPlan 6 (returns $6)
    ->  Aggregate  (cost=24.74..24.75 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=1)
          ->  Index Only Scan using minion_locks_name_expires_idx on minion_locks  (cost=0.15..23.85 rows=357 width=0) (actual time=0.004..0.004 rows=0 loops=1)
                Index Cond: (expires > now())
                Heap Fetches: 0
  InitPlan 7 (returns $7)
    ->  Aggregate  (cost=136.71..136.72 rows=1 width=8) (actual time=2.033..2.034 rows=1 loops=1)
          ->  Index Scan using minion_jobs_state_priority_id_idx on minion_jobs mj_5  (cost=0.43..136.20 rows=206 width=8) (actual time=0.023..1.985 rows=231 loops=1)
                Index Cond: (state = 'active'::minion_state)
  InitPlan 8 (returns $8)
    ->  Seq Scan on minion_jobs_id_seq  (cost=0.00..1.01 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
  InitPlan 9 (returns $9)
    ->  Aggregate  (cost=6.37..6.38 rows=1 width=8) (actual time=0.037..0.037 rows=1 loops=1)
          ->  Index Only Scan using minion_workers_pkey on minion_workers  (cost=0.14..6.31 rows=24 width=0) (actual time=0.016..0.033 rows=24 loops=1)
                Heap Fetches: 24
Planning Time: 0.348 ms
Execution Time: 1515.482 ms

Edited by @kraih: fixed code formatting

@kraih
Copy link
Member

kraih commented Jun 2, 2024

And don't forget to squash commits when you fixed the SQL formatting.

@kraih
Copy link
Member

kraih commented Jun 2, 2024

Pretty good sign that the plans from 14.5 to 16 are so similar, we can probably merge this PR once it is ready.

…to increase performance on larger datasets.

reformat sql to match existing style
FROM minion_jobs"
)->hash;
my $stats = $self->pg->db->query(qq{
SELECT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation and quoting is still inconsistent, otherwise it looks ok.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see three different quoting formats in the existing module:

  • '
  • "
  • q{}

I can modify to whichever one is preferable. Below is modified to use q{} instead of the current qq{} and I pushed the first line indentation in to match some of the other queries at 4 spaces.

sub stats {
  my $self = shift;

  my $stats = $self->pg->db->query(q{
    SELECT
    (SELECT COUNT(*) FROM minion_jobs WHERE state = 'inactive' AND (expires IS NULL OR expires > NOW())) AS inactive_jobs,
    (SELECT COUNT(*) FROM minion_jobs WHERE state = 'active') AS active_jobs,
    (SELECT COUNT(*) FROM minion_jobs WHERE state = 'failed') AS failed_jobs,
    (SELECT COUNT(*) FROM minion_jobs WHERE state = 'finished') AS finished_jobs,
    (SELECT COUNT(*) FROM minion_jobs WHERE state = 'inactive' AND delayed > NOW()) AS delayed_jobs,
    (SELECT COUNT(*) FROM minion_locks WHERE expires > NOW()) AS active_locks,
    (SELECT COUNT(DISTINCT worker) FROM minion_jobs mj WHERE state = 'active') AS active_workers,
    (SELECT CASE WHEN is_called THEN last_value ELSE 0 END FROM minion_jobs_id_seq) AS enqueued_jobs,
    (SELECT COUNT(*) FROM minion_workers) AS workers,
    EXTRACT(EPOCH FROM NOW() - PG_POSTMASTER_START_TIME()) AS uptime
  })->hash;
  $stats->{inactive_workers} = $stats->{workers} - $stats->{active_workers};

  return $stats;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make it look exactly like one of the existing cases, including quotation token in front of and after the SQL string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We indent with 2 spaces everywhere.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to let me know if the indentation is how you want it. Since this is not like any other queries in the module, I indented 2 spaces after the initial SELECT statement for each sub-select and the extract, there are no WHERE or FROM statements.

I modified quoting to just use double quotes and wrapped it similarly to the other queries in the module.

…to increase performance on larger datasets.

reformat sql to match existing style
Copy link
Member

@jberger jberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

Copy link
Member

@marcusramberg marcusramberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change LGTM, and the numbers are convincing.

@mergify mergify bot merged commit 1c33cd6 into mojolicious:main Jun 5, 2024
7 checks passed
@kraih
Copy link
Member

kraih commented Jun 5, 2024

Unfortunate that the commits were not squashed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants