Compound vs Singular Indexes? #327

bcollierjones · 2021-08-10T03:15:27Z

I started playing around with good_job tonight because it seems like a pretty compelling option for our use case: ActiveJob-native, uses the existing DB server, nice development runtime, job uniqueness, and the newly added cron was the final killer feature. Thank you!

I'm also new to PostgreSQL, so while I'm enjoying some of the new toys to play with, I'm also learning some new best practices vs MySQL.

I noticed that the generated migrations create a few compound indexes with overlapping columns and no uniqueness requirements. I had to do this a fair amount in MySQL, but my understanding is that PostgreSQL is smart about mixing and matching its use of indexes. Is there something in particular about your use case that the compound indexes optimize for? I ask mostly to help correct any misconceptions I have.

As a thought, I broke up the indexes to remove the overlap (I think) while maintaining the partial indexes as appropriate. I'd think the biggest win would be eliminating duplication of created_at (on all rows) and scheduled_at (on non-finished rows) in multiple indexes. This does add an additional index, so maybe that's worse.

Edit: After a few moments, I'm wondering if it's a sorting thing, since job_id and cron_key aren't ints...

```ruby
# Original indexes from migration (with name parameter removed for clarity)
add_index :good_jobs, :scheduled_at, where: "(finished_at IS NULL)"
add_index :good_jobs, [:queue_name, :scheduled_at], where: "(finished_at IS NULL)"
add_index :good_jobs, [:active_job_id, :created_at]
add_index :good_jobs, :concurrency_key, where: "(finished_at IS NULL)"
add_index :good_jobs, [:cron_key, :created_at]

# Singular indexes based on the same conditions
add_index :good_jobs, :scheduled_at, where: "(finished_at IS NULL)"
add_index :good_jobs, :queue_name, where: "(finished_at IS NULL)"
add_index :good_jobs, :concurrency_key, where: "(finished_at IS NULL)"
add_index :good_jobs, :active_job_id
add_index :good_jobs, :created_at
add_index :good_jobs, :cron_key
```
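To picture why the original list keeps both a `scheduled_at` index and a `[:queue_name, :scheduled_at]` one, here is a toy model in plain Ruby. It assumes a B-tree index behaves like a sorted array of tuples, which ignores pages, visibility and most of what PostgreSQL actually does; `ToyIndex` and the sample rows are hypothetical, made up for illustration.

```ruby
# Toy model of a multicolumn B-tree index: a sorted array of
# [leading_value, second_value, row_id] entries. Illustration only.
class ToyIndex
  # comparisons: entries examined by the last lookup (for lookup_leading,
  # only the binary-search probes are counted, not the forward read).
  attr_reader :entries, :comparisons

  # columns: indexed column names, leftmost first.
  # where:   optional predicate, mimicking a partial index (add_index ... where:).
  def initialize(rows, columns, where: nil)
    kept = where ? rows.select(&where) : rows
    @entries = kept.map { |r| columns.map { |c| r[c] } << r[:id] }.sort
    @comparisons = 0
  end

  # Equality on the leading column: binary search to the first match,
  # then read forward while the leading value still matches.
  def lookup_leading(value)
    @comparisons = 0
    i = @entries.bsearch_index { |e| @comparisons += 1; e[0] >= value } || @entries.size
    ids = []
    while i < @entries.size && @entries[i][0] == value
      ids << @entries[i].last
      i += 1
    end
    ids
  end

  # Equality on a non-leading column: entries are not ordered by it,
  # so every entry has to be examined.
  def lookup_non_leading(position, value)
    @comparisons = 0
    @entries.each_with_object([]) do |e, ids|
      @comparisons += 1
      ids << e.last if e[position] == value
    end
  end
end

# Hypothetical rows: three queues, every fourth job still unfinished.
rows = (1..1000).map do |i|
  { id: i, queue_name: %w[default mailers reports][i % 3],
    scheduled_at: 1000 - i, finished_at: (i % 4).zero? ? nil : i }
end

index = ToyIndex.new(rows, [:queue_name, :scheduled_at], where: ->(r) { r[:finished_at].nil? })
p index.lookup_leading("mailers").size # prints 84
p index.comparisons                    # single digits: a binary search over 250 entries
p index.lookup_non_leading(1, 500)     # prints [500]
p index.comparisons                    # prints 250
```

In this model the composite index answers a `queue_name`-only lookup with a short binary search (the leftmost-prefix behavior described in the PostgreSQL multicolumn-index docs), while a `scheduled_at`-only lookup has to examine every entry. That is one plausible reading of the overlap: the composite can stand in for a separate `queue_name` index, but not for a `scheduled_at` one.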

Thank you again!

reczy · 2021-08-10T04:05:41Z

Without commenting on specific decisions made for this gem, you might find the following two pages from the postgres documentation helpful (if you haven't seen them already):

https://www.postgresql.org/docs/current/indexes-multicolumn.html
https://www.postgresql.org/docs/current/indexes-bitmap-scans.html

I find the last paragraph of the second link particularly informative:

> In all but the simplest applications, there are various combinations of indexes that might be useful, and the database developer must make trade-offs to decide which indexes to provide. Sometimes multicolumn indexes are best, but sometimes it's better to create separate indexes and rely on the index-combination feature. For example, if your workload includes a mix of queries that sometimes involve only column x, sometimes only column y, and sometimes both columns, you might choose to create two separate indexes on x and y, relying on index combination to process the queries that use both columns. You could also create a multicolumn index on (x, y). This index would typically be more efficient than index combination for queries involving both columns, but as discussed in Section 11.3, it would be almost useless for queries involving only y, so it should not be the only index. A combination of the multicolumn index and a separate index on y would serve reasonably well. For queries involving only x, the multicolumn index could be used, though it would be larger and hence slower than an index on x alone. The last alternative is to create all three indexes, but this is probably only reasonable if the table is searched much more often than it is updated and all three types of query are common. If one of the types of query is much less common than the others, you'd probably settle for creating just the two indexes that best match the common types.
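The index-combination feature mentioned in that passage can be sketched in a few lines. This is a simplified model, assuming each single-column index lookup yields a set of row positions that can be ANDed as a bitmap; the helper names (`build_index`, `bitmap_scan`, `heap_positions`) and the rows are hypothetical, and real PostgreSQL bitmaps are per-page and can become lossy, which this ignores.

```ruby
# Single-column "index": value => list of row positions (stand-ins for TIDs).
def build_index(rows, column)
  index = Hash.new { |h, k| h[k] = [] }
  rows.each_with_index { |row, pos| index[row[column]] << pos }
  index
end

# One index lookup turned into a bitmap: bit N set means row N matches.
def bitmap_scan(index, value)
  index.fetch(value, []).reduce(0) { |bits, pos| bits | (1 << pos) }
end

# Read set bits lowest first, i.e. in physical row order,
# the way a bitmap heap scan visits the table.
def heap_positions(bits)
  positions = []
  pos = 0
  while bits > 0
    positions << pos if bits.odd?
    bits >>= 1
    pos += 1
  end
  positions
end

# Hypothetical rows using two column names from the good_jobs migration.
rows = [
  { queue_name: "default", concurrency_key: "a" },
  { queue_name: "mailers", concurrency_key: "a" },
  { queue_name: "default", concurrency_key: "b" },
  { queue_name: "default", concurrency_key: "a" },
  { queue_name: "mailers", concurrency_key: "b" },
]

by_queue = build_index(rows, :queue_name)
by_key   = build_index(rows, :concurrency_key)

# WHERE queue_name = 'default' AND concurrency_key = 'a'
combined = bitmap_scan(by_queue, "default") & bitmap_scan(by_key, "a")
p heap_positions(combined) # prints [0, 3]
```

Note what the sketch gives up compared with a multicolumn index: the combined result comes back in physical row order, not in any indexed column's order, so a query that also needs `ORDER BY` still has to sort.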


bensheldon · 2021-08-11T13:39:33Z

Maintainer

To be honest, I don't have a strong reason (e.g. based on explicit profiling of each index and use case), just my own inertia and experience. The explanation is that the majority of queries are time- and order-based, and I'm trying to optimize for index scans. And I read Use The Index, Luke maybe 5 years ago (it's an excellent book).
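A rough sketch of what an index scan can buy for a time-ordered query, under toy assumptions: the partial composite index is modeled as a sorted array, and `Job`, `composite_entries`, `next_jobs_via_index` and `next_jobs_via_sort` are hypothetical names. The query shape ("next N unfinished jobs in a queue by `scheduled_at`") is an assumed example, not taken from good_job's source.

```ruby
Job = Struct.new(:id, :queue_name, :scheduled_at, :finished_at)

# Entries of a partial composite index on (queue_name, scheduled_at)
# WHERE finished_at IS NULL, kept in key order like B-tree leaf pages.
def composite_entries(jobs)
  jobs.select { |j| j.finished_at.nil? }
      .map { |j| [j.queue_name, j.scheduled_at, j.id] }
      .sort
end

# WHERE queue_name = ? AND finished_at IS NULL ORDER BY scheduled_at LIMIT n,
# served by the composite index: seek to the queue, read forward, stop after n.
# Entries already come out in scheduled_at order, so there is no sort step.
def next_jobs_via_index(entries, queue, limit)
  i = entries.bsearch_index { |e| e[0] >= queue } || entries.size
  ids = []
  while i < entries.size && entries[i][0] == queue && ids.size < limit
    ids << entries[i][2]
    i += 1
  end
  ids
end

# The same query without a composite index to lean on: gather every
# matching job, sort all of them by scheduled_at, then keep the first n.
def next_jobs_via_sort(jobs, queue, limit)
  jobs.select { |j| j.queue_name == queue && j.finished_at.nil? }
      .sort_by(&:scheduled_at)
      .first(limit)
      .map(&:id)
end

jobs = [
  Job.new(1, "default", 30, nil),
  Job.new(2, "mailers", 10, nil),
  Job.new(3, "default", 10, 5),   # finished: not in the partial index
  Job.new(4, "default", 20, nil),
  Job.new(5, "default", 40, nil),
]

p next_jobs_via_index(composite_entries(jobs), "default", 2) # prints [4, 1]
p next_jobs_via_sort(jobs, "default", 2)                      # prints [4, 1]
```

Both paths return the same jobs. The difference is that the composite index hands them over pre-sorted and lets the scan stop at the limit, while the other path has to fetch and sort every match first.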

More generally, I think indexes are relatively inexpensive and a slow query can be expensive, which also serves as an explanation/apology for defensive indexing.

If you did want to dig into query plans, that would be really helpful. I like this Active Record Explain-Analyze gem (there are a couple): 6/activerecord-explain-analyze#3


bcollierjones Aug 13, 2021
Author

What caught me off guard was that the first column of these indexes is a "key"-type field, which is probably unique and random and would work against any ordering the second column might provide. If these keys are duplicated (old jobs, retries, concurrent jobs), though, it's a different story.
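The unique-versus-repeated distinction can be sketched with another toy model, assuming a composite index is a sorted array of `[cron_key, created_at, id]` tuples. `Run`, `cron_index`, `latest_run_id` and `group_sizes` are hypothetical, as is the assumption that something queries for the latest run per cron key.

```ruby
Run = Struct.new(:id, :cron_key, :created_at)

# Composite (cron_key, created_at) entries, sorted as a B-tree keeps them.
def cron_index(runs)
  runs.map { |r| [r.cron_key, r.created_at, r.id] }.sort
end

# WHERE cron_key = ? ORDER BY created_at DESC LIMIT 1: all runs for one key
# are adjacent and already ordered by created_at, so the answer is the last
# entry of that group, found here with two binary searches.
def latest_run_id(entries, key)
  first = entries.bsearch_index { |e| e[0] >= key }
  return nil unless first && entries[first][0] == key
  past_last = entries.bsearch_index { |e| e[0] > key } || entries.size
  entries[past_last - 1][2]
end

# How many entries share each leading value.
def group_sizes(entries)
  entries.group_by(&:first).transform_values(&:size)
end

runs = [
  Run.new(1, "nightly", 100),
  Run.new(2, "hourly", 105),
  Run.new(3, "nightly", 200),
  Run.new(4, "hourly", 110),
  Run.new(5, "nightly", 150),
]
entries = cron_index(runs)
p latest_run_id(entries, "nightly") # prints 3
p group_sizes(entries)              # hourly has 2 entries, nightly has 3

# With a unique leading key every group has one entry, so the second
# column never orders anything within a group.
unique = [["job-a", 100, 1], ["job-b", 90, 2], ["job-c", 95, 3]].sort
p group_sizes(unique).values.uniq   # prints [1]
```

So the trailing `created_at` only earns its place when the leading key repeats often enough that "newest within this key" is a real query.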

As I play with the gem I'll see what I come up with. I'm still early in the process so might be a bit. I appreciate the reply.
