This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Store fuzzy/bucketed positions in word_position_docids database #746

Draft
wants to merge 1 commit into
base: main

Conversation

loiclec
Contributor

@loiclec loiclec commented Dec 15, 2022

Pull Request

Related issue

Fixes (when merged into meilisearch) meilisearch/meilisearch#3222

Implementation

The design is described in detail in the related issue. For how the different relative positions are grouped together, see the test bucketed_position.

In short, we no longer store the exact position of words that appear far into an attribute. Instead, we group their relative positions into buckets whose size grows exponentially with the original position. This improves both the relevancy and the performance of the attribute ranking rule.
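
To make the bucketing idea concrete, here is a minimal sketch. It is not the code from this PR: the function name bucket_position, the EXACT_LIMIT threshold of 8, and the choice of power-of-two buckets are all assumptions made for illustration. The grouping actually used is the one exercised by the bucketed_position test.

```rust
/// Hypothetical threshold: relative positions below it are stored exactly.
/// (Assumed for illustration; not taken from the PR.)
const EXACT_LIMIT: u16 = 8;

/// Maps a relative word position to the representative of its bucket.
///
/// Positions below `EXACT_LIMIT` are returned unchanged. Any later position
/// is rounded down to the largest power of two that does not exceed it, so
/// the buckets are [8, 16), [16, 32), [32, 64), ... and each bucket is twice
/// as large as the previous one.
fn bucket_position(relative: u16) -> u16 {
    if relative < EXACT_LIMIT {
        relative
    } else {
        // `relative` is non-zero here, so the index of its highest set bit
        // is well defined and lies between 3 and 15.
        let highest_bit = u16::BITS - 1 - relative.leading_zeros();
        1 << highest_bit
    }
}
```

Under these assumptions, positions 0 to 7 are kept as they are, positions 8 to 15 all map to 8, and positions 16 to 31 all map to 16. The number of distinct stored positions therefore grows logarithmically with the length of the attribute rather than linearly.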


This is a draft until #742 is merged and the results of the benchmarks are available.


EDIT: I also realised just now that the iterative version of the algorithm needs to be updated as well!

@loiclec added the enhancement (New feature or request), indexing (Related to the documents/settings indexing algorithms), querying (Related to the searching/fetch data algorithms), DB breaking (The related changes break the DB), and performance (Related to the performance in terms of search/indexation speed or RAM/CPU/Disk consumption) labels Dec 15, 2022
@loiclec loiclec marked this pull request as draft December 15, 2022 10:41
@loiclec
Contributor Author

loiclec commented Jan 2, 2023

I think I am going to postpone this improvement to v1.1 because:

  1. The iterative version of the algorithm also needs to be updated.
  2. I found an unrelated bug in the implementation of the set-based version of the algorithm, and I would like to debug it first.
  3. The whole design of the attribute ranking rule will change a lot soon, as will the structure of almost all search algorithms, so I don't want to duplicate too much of the work.
