django-postgres-searchindex


A bit like django-haystack, but with everything in PostgreSQL, accessible via the Django ORM and using PostgreSQL's full-text search capabilities. The goal is to ease setup and maintenance for small and medium-sized projects, without depending on search technology such as Elasticsearch, Solr or Whoosh.

During conception, I considered developing a backend for django-haystack, but decided against it so I could build from the ground up and keep things as simple as possible. The project might still provide a haystack backend one day, but it is not my priority.

Features

  • Search index stored in PostgreSQL
  • No dependencies besides Django and PostgreSQL
  • contrib.djangocms, for easy indexing of django-cms sites

Quickstart

Describe, index, search.

Define index(es) in Django settings

Default value, the simplest possible configuration:

POSTGRES_SEARCHINDEX = {
    "default": {},
}

Example for a multilanguage setup:

POSTGRES_SEARCHINDEX = {
    "de": {
        "kwargs": {
            "language": "de",
        }
    },
    "fr": {
        "kwargs": {
            "language": "fr",
        }
    },
}

More complex configurations could include Django's SITE_ID or other relevant information in the index key and kwargs.
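For example, a project serving several sites could generate one index per site and language. This is a minimal sketch under stated assumptions: the SITES mapping and the "site_id" kwarg are hypothetical, and the extra kwarg only has an effect if your own index sources read it.

```python
# settings.py (sketch): one search index per (site, language) pair.
# ASSUMPTION: SITES and the "site_id" kwarg are hypothetical; this package
# does not interpret "site_id" itself, your index sources would have to.
SITES = {1: ["de", "fr"], 2: ["de"]}  # hypothetical SITE_ID -> languages map

POSTGRES_SEARCHINDEX = {
    f"{site_id}-{language}": {
        "kwargs": {
            "language": language,
            "site_id": site_id,
        }
    }
    for site_id, languages in SITES.items()
    for language in languages
}
```

Searches would then filter on the composite key, for instance the string built from the current SITE_ID and language code.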

Define sources

Here is an example, hopefully self-explanatory:

import html

from django.utils.html import strip_tags
from postgres_searchindex.base import MultiLanguageIndexSource  # or IndexSource
from postgres_searchindex.source_pool import source_pool

from news.models import News

@source_pool.register
class NewsIndexSource(MultiLanguageIndexSource):  # subclass IndexSource if the model is not multilingual
    model = News

    def get_title(self, obj):
        return strip_tags(obj.title)

    def get_content(self, obj):
        # strip markup first, then decode HTML entities such as &amp;
        return html.unescape(strip_tags(obj.description))

    def get_queryset(self):
        # only published news end up in the index
        return self.model.objects.published()

Place this code in your app's index_sources.py, and it will be autodiscovered.
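The get_content method above strips tags before decoding entities, because stripping tags leaves entities such as &amp; untouched. The following stdlib-only sketch illustrates that order. html_to_text is a hypothetical helper, not part of this package, built on html.parser.HTMLParser as a stand-in for Django's strip_tags.

```python
import html
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes; convert_charrefs=False keeps entities undecoded."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        # keep named entities as-is (e.g. &amp;), like strip_tags does
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        # keep numeric entities as-is (e.g. &#39; or &#x27;)
        self.parts.append(f"&#{name};")


def html_to_text(markup):
    """Hypothetical helper: drop tags first, then decode HTML entities."""
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    return html.unescape("".join(parser.parts))


print(html_to_text("<p>Fish &amp; <b>Chips</b></p>"))  # Fish & Chips
```

If you decoded entities before stripping tags, an escaped "&lt;b&gt;" in the text would turn into a real tag and be removed.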

Populate the index

Run ./manage.py postgres_searchindex_update to update/build the index.

» ./manage.py postgres_searchindex_update
====================================
Updating index "de" with kwargs {'language': 'de'}
Person. Indexing 5 entries
> Done. Removed from index: 0
Project. Indexing 66 entries
> Done. Removed from index: 0
Media. Indexing 36 entries
> Done. Removed from index: 2
====================================
Updating index "fr" with kwargs {'language': 'fr'}
Person. Indexing 5 entries
> Done. Removed from index: 0
Project. Indexing 66 entries
> Done. Removed from index: 0
Media. Indexing 36 entries
> Done. Removed from index: 2

To check how things were indexed, inspect your IndexEntry instances in the Django admin.

Search!

You can now search your index. You are free to use Django's built-in full-text search features as you like, either as in the following example or in a far more advanced way.

from django.contrib.postgres.search import SearchVector
from postgres_searchindex.models import IndexEntry

# inside a view: index_key is one of the keys from POSTGRES_SEARCHINDEX ("de", "fr", ...)
# the German stemmer removes umlauts, so this returns entries
# containing "überhaupt" as well as "uberhaupt"
results = IndexEntry.objects.annotate(
    search=SearchVector("content", "title", config="german")
).filter(index_key=self.request.LANGUAGE_CODE, search="uberhaupt")

A full example is included in the source; views.py and urls.py will give you an idea.

To do: a |highlight:query template filter to highlight the search query in the search result text.
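Until such a filter exists, a highlighter could look roughly like the following sketch. highlight is a hypothetical helper, not part of this package. It wraps case-insensitive occurrences of each query word in a tag and HTML-escapes everything else. Unlike PostgreSQL's ts_headline (exposed in Django as SearchHeadline), it does no stemming, so the query "uberhaupt" does not mark "überhaupt".

```python
import html
import re


def highlight(text, query, tag="mark"):
    """Hypothetical helper: wrap case-insensitive matches of query words in <tag>."""
    # longest words first, so a longer word wins over its prefix
    words = sorted(set(query.split()), key=len, reverse=True)
    if not words:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # escape the raw text between matches, so markup in text stays inert
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f"<{tag}>{html.escape(match.group(0))}</{tag}>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight("Search and find", "search"))  # <mark>Search</mark> and find
```

Registered with @register.filter in a templatetags module, it could back {{ entry.content|highlight:query }}. Because the function escapes its input itself, the result would need to be marked safe.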

Keep the index fresh

Either run ./manage.py postgres_searchindex_update regularly, or implement a real-time or near-real-time solution with signals, through the POSTGRES_SEARCHINDEX_SIGNAL_PROCESSOR setting.

There are two built-in processors (at the time of writing, one of them is not yet available):

  • postgres_searchindex.signal_processors.RealtimeSyncedSignalProcessor
  • postgres_searchindex.signal_processors.RealtimeCelerySignalProcessor

The asynchronous (Celery) signal processor requires a working Celery setup.
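For example, the setting could select the synchronous processor. This is a sketch that assumes the setting takes a dotted-path string, as haystack's HAYSTACK_SIGNAL_PROCESSOR does. Check the package source before relying on it.

```python
# settings.py (sketch)
# ASSUMPTION: the setting expects the processor's dotted import path as a string
POSTGRES_SEARCHINDEX_SIGNAL_PROCESSOR = (
    "postgres_searchindex.signal_processors.RealtimeSyncedSignalProcessor"
)
```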

contrib.djangocms

A few tools to speed up indexing of django-cms sites.

AppHook

Add postgres_searchindex.contrib.djangocms to settings.INSTALLED_APPS. Configure one of your CMS pages to use the apphook "Search Form (postgres_searchindex)". It provides a very basic search form, and you can override the template postgres_searchindex/search.html to customize it.

Indexing of cms pages

Add postgres_searchindex.contrib.djangocms to settings.INSTALLED_APPS.
Then set settings.POSTGRES_SEARCHINDEX_USE_CMS_INDEX = True to have your django-cms pages indexed automagically (on the next run of ./manage.py postgres_searchindex_rebuild).
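In settings, the two steps above could look like this sketch. It assumes the core app postgres_searchindex (which provides IndexEntry) is installed as well. The comment line stands in for your project's existing apps.

```python
# settings.py (sketch)
INSTALLED_APPS = [
    # ... your existing apps, including django-cms ...
    "postgres_searchindex",  # ASSUMPTION: core app listed alongside contrib
    "postgres_searchindex.contrib.djangocms",
]

# index django-cms pages on the next index run
POSTGRES_SEARCHINDEX_USE_CMS_INDEX = True
```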

Indexing models with a PlaceholderField

Example Event model, with a PlaceholderField called "content":

import html

from django.utils.html import strip_tags
from postgres_searchindex.base import MultiLanguageIndexSource
from postgres_searchindex.contrib.djangocms.base import PlaceholderIndexSourceMixin
from postgres_searchindex.source_pool import source_pool

from .models import Event

@source_pool.register
class EventIndexSource(PlaceholderIndexSourceMixin, MultiLanguageIndexSource):
    model = Event
    placeholder_field_name = "content"

    def get_content(self, obj):
        c = strip_tags(obj.description)  # prepend the preview/description
        c += " " + super().get_content(obj)  # append rendered placeholder content
        c = html.unescape(c)  # decode HTML entities, e.g. &amp; to &
        return c

    def get_queryset(self):
        return self.model.objects.published()

Inspired by haystack

I used django-haystack for a decade, and I really like the concept. Building my first index, though, was quite time-consuming. After development of haystack and some of its backends stalled at times, I regularly thought about writing my own search index using only PostgreSQL.

TODO

See open issues.