Skip to content

Commit

Permalink
Enable Posgres trigram extension and add class instance name index
Browse files Browse the repository at this point in the history
This adds a trigram index to the class_instance table to speed up name
queries that involve wildcards and regular expressions. These are
common, especially in the Neuron Search Widget. Since we also allow
wildcards at the end of a name (e.g. when searching for a neuron name),
B-tree can't be used. Trigram however works fine with this and is
included in a regular Postgres setup. It only needs to be enabled
through an extension.

For small instances, these queries are already fast. However on large
instances, sequential scans on the class_instance table can be quite
slow when doing name lookups. For instance, in a large test instance I
have 215,707,715 class instance entries and looking for a name takes
about ~30 seconds on a heavily optimized Postgres database. Using the
trigram index bring this down to 2 ms (!). The size of this index is
comparable to the foreign key index. The example table has a size of 19
GB, the primary key index is ~4.5 GB and the trigram index ~5 GB. Given
that neuron name search and annotation search is done a lot, I feel this
increase in space requirement is justified.

A simple name search looks like this (typing "aSIP" into the neuron name
search):

  SELECT ci.id, ci.user_id, ci.creation_time, ci.edition_time,
    ci.project_id, ci.class_id, ci.name, skel_link.skeletons
  FROM class_instance ci
  LEFT JOIN LATERAL (
    SELECT cici_n.class_instance_b AS id,
    array_agg(cici_n.class_instance_a) AS skeletons
    FROM class_instance_class_instance cici_n
    WHERE cici_n.class_instance_b = ci.id
    AND cici_n.relation_id = 10
    GROUP BY 1
  ) skel_link ON ci.id = skel_link.id
  WHERE ci.project_id = 1
    AND ci.class_id = ANY (ARRAY[3,6])
    AND ci.name ~~* '%aSIP%'
  ORDER BY id ASC;

The query plan without the trigram index looks like this:

 Sort  (cost=3443312.34..3443339.28 rows=10779 width=83) (actual time=25818.498..25818.498 rows=5 loops=1)
   Sort Key: ci.id
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=9144 read=2442104
   ->  Nested Loop Left Join  (cost=1000.57..3442590.36 rows=10779 width=83) (actual time=26.659..25818.478 rows=5 loops=1)
         Buffers: shared hit=9144 read=2442104
         ->  Gather  (cost=1000.00..3397022.14 rows=10779 width=51) (actual time=26.596..25860.168 rows=5 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               Buffers: shared hit=9119 read=2442104
               ->  Parallel Seq Scan on class_instance ci  (cost=0.00..3394944.24 rows=2695 width=51) (actual time=10386.978..25810.516 rows=1 loops=5)
                     Filter: ((class_id = ANY ('{3,6}'::integer[])) AND ((name)::text ~~* '%aSIP%'::text) AND (project_id = 1))
                     Rows Removed by Filter: 43141542
                     Buffers: shared hit=9119 read=2442104
         ->  Subquery Scan on skel_link  (cost=0.57..4.22 rows=1 width=36) (actual time=0.028..0.028 rows=0 loops=5)
               Filter: (ci.id = skel_link.id)
               Buffers: shared hit=25
               ->  GroupAggregate  (cost=0.57..4.21 rows=1 width=36) (actual time=0.024..0.025 rows=0 loops=5)
                     Group Key: cici_n.class_instance_b
                     Buffers: shared hit=25
                     ->  Index Scan using class_instance_class_instance_class_instance_b on class_instance_class_instance cici_n  (cost=0.57..4.19 rows=1 width=8) (actual time=0.016..0.016 rows=0 loops=5)
                           Index Cond: (class_instance_b = ci.id)
                           Filter: (relation_id = 10)
                           Rows Removed by Filter: 1
                           Buffers: shared hit=25
 Planning Time: 1.321 ms
 Execution Time: 25860.449 ms

And the query plan using the trigram index looks like this:

 Sort  (cost=84130.89..84157.84 rows=10779 width=83) (actual time=0.123..0.123 rows=5 loops=1)
   Sort Key: ci.id
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=33
   ->  Nested Loop Left Join  (cost=413.44..83408.92 rows=10779 width=83) (actual time=0.070..0.110 rows=5 loops=1)
         Buffers: shared hit=33
         ->  Bitmap Heap Scan on class_instance ci  (cost=412.88..37840.70 rows=10779 width=51) (actual time=0.041..0.053 rows=5 loops=1)
               Recheck Cond: ((name)::text ~~* '%aSIP%'::text)
               Filter: ((class_id = ANY ('{3,6}'::integer[])) AND (project_id = 1))
               Heap Blocks: exact=3
               Buffers: shared hit=8
               ->  Bitmap Index Scan on class_instance_name_trgm_idx  (cost=0.00..410.18 rows=21571 width=0) (actual time=0.030..0.030 rows=5 loops=1)
                     Index Cond: ((name)::text ~~* '%aSIP%'::text)
                     Buffers: shared hit=5
         ->  Subquery Scan on skel_link  (cost=0.57..4.22 rows=1 width=36) (actual time=0.010..0.010 rows=0 loops=5)
               Filter: (ci.id = skel_link.id)
               Buffers: shared hit=25
               ->  GroupAggregate  (cost=0.57..4.21 rows=1 width=36) (actual time=0.010..0.010 rows=0 loops=5)
                     Group Key: cici_n.class_instance_b
                     Buffers: shared hit=25
                     ->  Index Scan using class_instance_class_instance_class_instance_b on class_instance_class_instance cici_n  (cost=0.57..4.19 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=5)
                           Index Cond: (class_instance_b = ci.id)
                           Filter: (relation_id = 10)
                           Rows Removed by Filter: 1
                           Buffers: shared hit=25
 Planning Time: 1.362 ms
 Execution Time: 0.194 ms

No sequential scan or even parallel workers are needed anymore.
  • Loading branch information
tomka committed Dec 1, 2018
1 parent 8f0d4bd commit 8a8ce7e
Show file tree
Hide file tree
Showing 6 changed files with 53 additions and 5 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,10 @@

- Python 3.7 is now supported.

- Should migration 0057 fail due a permission error, the Postgres extension
"pg_trgm" has to be installed manually into the CATMAID database using a
Postgres superuser: CREATE EXTENSION pg_trgm;


### Features and enhancements

Expand All @@ -22,6 +26,8 @@ Miscellaneous:
- Confirming a radius selection (Shift + Y) can now also be done using the Enter
key.

- Neuron name searches and annotations should be much faster on larger
instances.

### Bug fixes

Expand Down
4 changes: 4 additions & 0 deletions UPDATE.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,10 @@ and other administration related changes are listed in order.
`django/applications/performancetests/migrations/` and
`django/applications/pgcompat/migrations/` need to be deleted.

- Should migration 0057 fail due a permission error, the Postgres extension
"pg_trgm" has to be installed manually into the CATMAID database using a
Postgres superuser: CREATE EXTENSION pg_trgm;

## 2018.11.09

- Python 3 is now required for the back-end. We recommend the use of Python 3.6.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# -*- coding: utf-8 -*-
from django.db import migrations
from django.contrib.postgres.operations import TrigramExtension


forward = """
CREATE INDEX class_instance_name_trgm_idx
ON class_instance USING gin (name gin_trgm_ops);
"""


backward = """
DROP INDEX class_instance_name_trgm_idx;
"""


class Migration(migrations.Migration):
"""This adds a trigram index to the class_instance table to speed up name
queries that involve wildcards and regular expressions. Since we also allow
wildcards at the end of a name (e.g. when searching for a neuron name),
B-tree can't be used. Trigram however works fine with this and is included
in a regular Postgres setup. It only needs to be enabled through an
extension. This migration tries to install the extension, but will also try
to setup the index if extension creation is not allowed for the CATMAID
user. This is done, because we expect the database administrator to have
installed this extension separately using "CREATE EXTENSION pg_trgm;"
"""

dependencies = [
('catmaid', '0057_merge_summary_bugfix'),
]

operations = [
TrigramExtension(),
migrations.RunSQL(forward, backward),
]
1 change: 1 addition & 0 deletions scripts/createuser.sh
Original file line number Diff line number Diff line change
Expand Up @@ -28,4 +28,5 @@ CREATE DATABASE "$CATMAID_DATABASE" OWNER "$CATMAID_USER" ENCODING 'UTF8';
\connect $CATMAID_DATABASE
CREATE EXTENSION postgis;
CREATE EXTENSION pg_trgm;
EOSQL
1 change: 1 addition & 0 deletions scripts/travis/setup_database.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,4 @@ set -ev

psql -c 'CREATE DATABASE catmaid;' -U postgres
psql -c 'CREATE EXTENSION postgis;' -U postgres catmaid
psql -c 'CREATE EXTENSION pg_trgm;' -U postgres catmaid
10 changes: 5 additions & 5 deletions sphinx-doc/source/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -168,11 +168,11 @@ supports tiff (check e.g. with the help of "gm convert -list format").
3. Install and configure PostgreSQL
###################################

If you are comfortable with creating a new PostgreSQL database
for CATMAID, then you should do that and continue to the next
section. If you decide to do so, please make sure to also install the
``postgis`` extension for the new CATMAID database. The advice here is a
suggested approach for people who are unsure what to do.
If you are comfortable with creating a new PostgreSQL database for CATMAID, then
you should do that and continue to the next section. If you decide to do so,
please make sure to also install the ``postgis`` extension and the ``pg_trgm``
extension for the new CATMAID database. The advice here is a suggested approach
for people who are unsure what to do.

If you are uncomfortable with using the PostgreSQL interactive
terminal from the command line, you may wish to install an
Expand Down

0 comments on commit 8a8ce7e

Please sign in to comment.