Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combining ts_query AND IN (BIGINT[]) is ~4x slower than GIN #17

Open
andreak opened this issue May 24, 2017 · 0 comments
Open

Combining ts_query AND IN (BIGINT[]) is ~4x slower than GIN #17

andreak opened this issue May 24, 2017 · 0 comments

Comments

@andreak
Copy link

andreak commented May 24, 2017

Hi.

Having this index:

create index origo_email_delivery_fts_all_folder_idx ON origo_email_delivery using gin (fts_all, folder_id)

And this query:

EXPLAIN ANALYZE
SELECT
    del.entity_id,
    del.received_timestamp,
    del.received_timestamp <=> '3000-01-01' :: TIMESTAMP,
    del.folder_id
FROM origo_email_delivery del
WHERE del.fts_all @@ to_tsquery('simple', 'andreas&joseph')
      AND del.folder_id IN (44961, 204483, 44965, 2470519)
ORDER BY del.received_timestamp <=> '3000-01-01' :: TIMESTAMP
LIMIT 10 offset 10000

Using GIN-index:

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                              QUERY PLAN                                                                               │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=32018.97..32018.97 rows=1 width=32) (actual time=56.675..56.676 rows=10 loops=1)                                                                         │
│   ->  Sort  (cost=31994.41..32018.97 rows=9822 width=32) (actual time=56.077..56.469 rows=10010 loops=1)                                                              │
│         Sort Key: ((received_timestamp <=> '3000-01-01 00:00:00'::timestamp without time zone))                                                                       │
│         Sort Method: quicksort  Memory: 1541kB                                                                                                                        │
│         ->  Bitmap Heap Scan on origo_email_delivery del  (cost=404.67..31343.13 rows=9822 width=32) (actual time=35.911..53.027 rows=14806 loops=1)                  │
│               Recheck Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))                       │
│               Heap Blocks: exact=13043                                                                                                                                │
│               ->  Bitmap Index Scan on origo_email_delivery_fts_all_folder_idx  (cost=0.00..402.22 rows=9822 width=0) (actual time=34.493..34.493 rows=14806 loops=1) │
│                     Index Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))                   │
│ Planning time: 0.614 ms                                                                                                                                               │
│ Execution time: 56.726 ms                                                                                                                                             │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

And with RUM-index:

create index rum_idx ON origo_email_delivery using rum (fts_all rum_tsvector_addon_ops, folder_id, received_timestamp) WITH (attach=received_timestamp, "to"=fts_all)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                       QUERY PLAN                                                                       │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=32066.97..32066.97 rows=1 width=32) (actual time=217.899..217.900 rows=10 loops=1)                                                        │
│   ->  Sort  (cost=32042.41..32066.97 rows=9822 width=32) (actual time=217.300..217.688 rows=10010 loops=1)                                             │
│         Sort Key: ((received_timestamp <=> '3000-01-01 00:00:00'::timestamp without time zone))                                                        │
│         Sort Method: quicksort  Memory: 1541kB                                                                                                         │
│         ->  Bitmap Heap Scan on origo_email_delivery del  (cost=452.67..31391.13 rows=9822 width=32) (actual time=196.797..214.216 rows=14806 loops=1) │
│               Recheck Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))        │
│               Heap Blocks: exact=13043                                                                                                                 │
│               ->  Bitmap Index Scan on rum_idx  (cost=0.00..450.22 rows=9822 width=0) (actual time=195.369..195.369 rows=14806 loops=1)                │
│                     Index Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))    │
│ Planning time: 0.721 ms                                                                                                                                │
│ Execution time: 217.969 ms                                                                                                                             │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

217 / 56 = 3,875 time slower. Can anything be done to speed this up?

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants
@andreak @za-arthur and others