Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The home page query for new games was flagged as a slow query #1287

Open
dfabulich opened this issue Feb 17, 2025 · 9 comments
Open

The home page query for new games was flagged as a slow query #1287

dfabulich opened this issue Feb 17, 2025 · 9 comments
Labels
performance Make the site faster site stability slow query Slow database queries

Comments

@dfabulich
Copy link
Collaborator

This slow query log confuses me. It claims that the engine had to examine 14,000 rows to run this query, which really doesn't seem right to me.

# Time: 250217  2:36:52
# User@Host: ifdb[ifdb] @ localhost [127.0.0.1]
# Thread_id: 54364846  Schema: ifdb  QC_hit: No
# Query_time: 10.457211  Lock_time: 0.000006  Rows_sent: 7  Rows_examined: 28329
# Rows_affected: 0  Bytes_sent: 7893
# Tmp_tables: 1  Tmp_disk_tables: 1  Tmp_table_sizes: 8872420
# Full_scan: Yes  Full_join: No  Tmp_table: Yes  Tmp_table_on_disk: Yes
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: No
#
# explain: id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
# explain: 1	SIMPLE	games	index	NULL	created	5	NULL	14161	14161.00	100.00	100.00	Using temporary
# explain: 1	SIMPLE	gameRatingsSandbox0_mv	eq_ref	PRIMARY	PRIMARY	130	ifdb.games.id	1	1.00	100.00	100.00	
#
use `ifdb`;
SET timestamp=1739759812;
select 
              distinct games.id as id,
                       games.title as title,
                       games.author as author,
                       games.desc as description,
                       games.tags as tags,
                       games.created as createdate,
                       games.moddate as moddate,
                       games.system as devsys,
                       if (time(games.published) = '00:00:00',
                           date_format(games.published, '%Y'),
                           date_format(games.published, '%M %e, %Y'))
                         as pubfmt,
                       if (time(games.published) = '00:00:00',
                           date_format(games.published, '%Y'),
                           date_format(games.published, '%Y-%m-%d'))
                         as published,
                       date_format(games.published, '%Y') as pubyear,
                       (games.coverart is not null) as hasart,
                       avgRating as avgrating,
                       numRatingsInAvg as ratingcnt,
                       stdDevRating as ratingdev,
                       numRatingsTotal,
                       numMemberReviews,
                       starsort,
                       games.sort_title as sort_title,
                       games.sort_author as sort_author,
                       ifnull(games.published, '9999-12-31') as sort_pub,
                       games.pagevsn,
                       games.flags
              
            from
              games
                          left join gameRatingsSandbox0_mv on games.id = gameid
              
            where
              1
              
            
            
            order by
              games.created desc
              
            limit 7;
@dfabulich dfabulich added performance Make the site faster site stability slow query Slow database queries labels Feb 17, 2025
@dfabulich
Copy link
Collaborator Author

Somehow, this appears to have taken down the site for a few minutes today.

@dfabulich
Copy link
Collaborator Author

Here's what I get when I explain it.

MariaDB [ifdb]> explain select
    ->               distinct games.id as id,
    ->                        games.title as title,
    ->                        games.author as author,
    ->                        games.desc as description,
    ->                        games.tags as tags,
    ->                        games.created as createdate,
    ->                        games.moddate as moddate,
    ->                        games.system as devsys,
    ->                        if (time(games.published) = '00:00:00',
    ->                            date_format(games.published, '%Y'),
    ->                            date_format(games.published, '%M %e, %Y'))
    ->                          as pubfmt,
    ->                        if (time(games.published) = '00:00:00',
    ->                            date_format(games.published, '%Y'),
    ->                            date_format(games.published, '%Y-%m-%d'))
    ->                          as published,
    ->                        date_format(games.published, '%Y') as pubyear,
    ->                        (games.coverart is not null) as hasart,
    ->                        avgRating as avgrating,
    ->                        numRatingsInAvg as ratingcnt,
    ->                        stdDevRating as ratingdev,
    ->                        numRatingsTotal,
    ->                        numMemberReviews,
    ->                        starsort,
    ->                        games.sort_title as sort_title,
    ->                        games.sort_author as sort_author,
    ->                        ifnull(games.published, '9999-12-31') as sort_pub,
    ->                        games.pagevsn,
    ->                        games.flags
    ->
    ->             from
    ->               games
    ->                           left join gameRatingsSandbox0_mv on games.id = gameid
    ->
    ->             where
    ->               1
    ->
    ->
    ->
    ->             order by
    ->               games.created desc
    ->
    ->             limit 7;
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+------+-----------------+
| id   | select_type | table                  | type   | possible_keys | key     | key_len | ref           | rows | Extra           |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+------+-----------------+
|    1 | SIMPLE      | games                  | index  | NULL          | created | 5       | NULL          | 7    | Using temporary |
|    1 | SIMPLE      | gameRatingsSandbox0_mv | eq_ref | PRIMARY       | PRIMARY | 130     | ifdb.games.id | 1    |                 |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+------+-----------------+

@dfabulich
Copy link
Collaborator Author

Well, today I learned (or… maybe I'd forgotten and relearned?) about analyze instead of explain.

MariaDB [ifdb]> analyze select
    ->               distinct games.id as id,
    ->                        games.title as title,
    ->                        games.author as author,
    ->                        games.desc as description,
    ->                        games.tags as tags,
    ->                        games.created as createdate,
    ->                        games.moddate as moddate,
    ->                        games.system as devsys,
    ->                        if (time(games.published) = '00:00:00',
    ->                            date_format(games.published, '%Y'),
    ->                            date_format(games.published, '%M %e, %Y'))
    ->                          as pubfmt,
    ->                        if (time(games.published) = '00:00:00',
    ->                            date_format(games.published, '%Y'),
    ->                            date_format(games.published, '%Y-%m-%d'))
    ->                          as published,
    ->                        date_format(games.published, '%Y') as pubyear,
    ->                        (games.coverart is not null) as hasart,
    ->                        avgRating as avgrating,
    ->                        numRatingsInAvg as ratingcnt,
    ->                        stdDevRating as ratingdev,
    ->                        numRatingsTotal,
    ->                        numMemberReviews,
    ->                        starsort,
    ->                        games.sort_title as sort_title,
    ->                        games.sort_author as sort_author,
    ->                        ifnull(games.published, '9999-12-31') as sort_pub,
    ->                        games.pagevsn,
    ->                        games.flags
    ->
    ->             from
    ->               games
    ->                           left join gameRatingsSandbox0_mv on games.id = gameid
    ->
    ->             where
    ->               1
    ->
    ->
    ->
    ->             order by
    ->               games.created desc
    ->
    ->             limit 7;
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+-----------------+
| id   | select_type | table                  | type   | possible_keys | key     | key_len | ref           | rows  | r_rows   | filtered | r_filtered | Extra           |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+-----------------+
|    1 | SIMPLE      | games                  | index  | NULL          | created | 5       | NULL          | 14161 | 14161.00 |   100.00 |     100.00 | Using temporary |
|    1 | SIMPLE      | gameRatingsSandbox0_mv | eq_ref | PRIMARY       | PRIMARY | 130     | ifdb.games.id | 1     | 1.00     |   100.00 |     100.00 |                 |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+-----------------+

I whittled it down to this:

MariaDB [ifdb]> analyze select id from games order by created desc limit 7;
+------+-------------+-------+-------+---------------+---------+---------+------+-------+--------+----------+------------+-------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | r_rows | filtered | r_filtered | Extra |
+------+-------------+-------+-------+---------------+---------+---------+------+-------+--------+----------+------------+-------+
|    1 | SIMPLE      | games | index | NULL          | created | 5       | NULL | 14161 | 7.00   |   100.00 |     100.00 |       |
+------+-------------+-------+-------+---------------+---------+---------+------+-------+--------+----------+------------+-------+

I had a very depressing conversation with ChatGPT about this. Basically, ChatGPT thinks we have to add some kind of filtering condition to force the index to cough up the seven newest games.

@dfabulich
Copy link
Collaborator Author

So, this looks good, but I feel there must be a better way. @salty-horse @alice-blue any bright ideas?

MariaDB [ifdb]> analyze select
    ->               distinct games.id as id,
    ->                        games.title as title,
    ->                        games.author as author,
    ->                        games.desc as description,
    ->                        games.tags as tags,
    ->                        games.created as createdate,
    ->                        games.moddate as moddate,
    ->                        games.system as devsys,
    ->                        if (time(games.published) = '00:00:00',
    ->                            date_format(games.published, '%Y'),
    ->                            date_format(games.published, '%M %e, %Y'))
    ->                          as pubfmt,
    ->                        if (time(games.published) = '00:00:00',
    ->                            date_format(games.published, '%Y'),
    ->                            date_format(games.published, '%Y-%m-%d'))
    ->                          as published,
    ->                        date_format(games.published, '%Y') as pubyear,
    ->                        (games.coverart is not null) as hasart,
    ->                        avgRating as avgrating,
    ->                        numRatingsInAvg as ratingcnt,
    ->                        stdDevRating as ratingdev,
    ->                        numRatingsTotal,
    ->                        numMemberReviews,
    ->                        starsort,
    ->                        games.sort_title as sort_title,
    ->                        games.sort_author as sort_author,
    ->                        ifnull(games.published, '9999-12-31') as sort_pub,
    ->                        games.pagevsn,
    ->                        games.flags
    ->
    ->             from
    ->               games
    ->                           left join gameRatingsSandbox0_mv on games.id = gameid
    ->
    ->             where
    ->               created > date_sub(now(), interval 90 day)
    ->
    ->
    ->
    ->             order by
    ->               games.created desc
    ->
    ->             limit 7;
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+------+--------+----------+------------+------------------------------+
| id   | select_type | table                  | type   | possible_keys | key     | key_len | ref           | rows | r_rows | filtered | r_filtered | Extra                        |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+------+--------+----------+------------+------------------------------+
|    1 | SIMPLE      | games                  | range  | created       | created | 5       | NULL          | 329  | 188.00 |   100.00 |     100.00 | Using where; Using temporary |
|    1 | SIMPLE      | gameRatingsSandbox0_mv | eq_ref | PRIMARY       | PRIMARY | 130     | ifdb.games.id | 1    | 1.00   |   100.00 |     100.00 |                              |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+------+--------+----------+------------+------------------------------+

@dfabulich
Copy link
Collaborator Author

I guess we could just go back to caching logged-out results… or at least caching the date of the 7th most recently created game.

@alice-blue
Copy link
Contributor

alice-blue commented Feb 17, 2025

Are you concerned about logged-in queries as well as logged-out queries?

Caching seems like it would work for logged-out queries.

A limit on the past N days could work for both logged-out and logged-in queries (though for logged-in queries, I guess we'd want to see how the chosen number of days interacts with sandboxes and--if the filter games PR gets merged--game filters).

For logged-out queries, I don't know what makes one way better than another (caching vs. limiting days). I guess for caching, we'd have to be sure to re-run the query every time a game is added, deleted, merged, etc.?

I don't know what other options there might be. Or what it was that caused this query to slow down if it wasn't being flagged as slow before.

@dfabulich
Copy link
Collaborator Author

dfabulich commented Feb 17, 2025

Or what it was that caused this query to slow down if it wasn't being flagged as slow before.

That I can tell you: the site gets overloaded when it tries to run "too many" full-scan queries at once. Let's say the site can normally handle 10 requests per second, and each one takes 0.1s. (I'm making these numbers up.) But if we get 20 requests per second, each one starts taking 1s. This causes a cascade where the next 20 requests come in while the previous 20 requests are still running, so each query starts taking 2s, 10s, 30s. Eventually the site starts timing out. Eventually, the wave of traffic passes (perhaps because users/crawlers notice that the site is timing out), and then the "slow" queries stop being slow… until traffic rises again.

Fixing full-scan queries increases the maximum traffic capacity of the site.

So:

Are you concerned about logged-in queries as well as logged-out queries?

There are way more logged-out queries (partly because search-engine crawlers are always logged out), so logged-out queries are more important, but if we can help logged-in queries, too, that's good.

for logged-in queries, I guess we'd want to see how the chosen number of days interacts with sandboxes and--if the filter games PR gets merged--game filters

Game filters is the one most likely to get tripped up by this, but I think we could probably just ignore the problem for the game-filter PR. (I've been meaning to spend more time on this… work has been busy for the last few weeks.)

guess for caching, we'd have to be sure to re-run the query every time a game is added, deleted, merged, etc.?

Yeah, I'd say caching makes things more complex.

@dfabulich
Copy link
Collaborator Author

I think this took down the site for a few minutes again today. I'll try to take a crack at this in the next day or two.

@alice-blue
Copy link
Contributor

It occurred to me that a limit of 90 days could affect testing in the future. Because when we're testing, we're using database contents that are up to a quarter of a year old (I think?), and with a 90-day limit, there might be a few days a year when no new games would show up on the home page, or maybe a few, but not enough games to fill the slots.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Make the site faster site stability slow query Slow database queries
Projects
None yet
Development

No branches or pull requests

2 participants