feat: Use forward and reverse tokenization
We enable the "reverse" tokenization mode, which allows searching on
tokens in both the forward and backward directions.
The forward mode matches from left to right, while the reverse mode
also matches from right to left. For example, with the word "example",
you can search "exam" in forward mode, and "ample" in reverse.
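
To make the difference concrete, here is a minimal sketch of which partial tokens each mode would make searchable for a single word. It is an illustration only, not FlexSearch's actual implementation: the helpers `forwardTokens` and `reverseTokens` are hypothetical, and the default minimum length of 2 is assumed from the `minlength: 2` option in the changed file.

```typescript
// Hypothetical helpers for illustration; FlexSearch's internals differ.

// Forward mode: every prefix of the word, at least minLength characters long.
function forwardTokens(word: string, minLength = 2): string[] {
  const tokens: string[] = []
  for (let end = minLength; end <= word.length; end++) {
    tokens.push(word.slice(0, end))
  }
  return tokens
}

// Reverse mode: the forward prefixes plus every suffix of the word.
function reverseTokens(word: string, minLength = 2): string[] {
  const tokens = new Set(forwardTokens(word, minLength))
  for (let start = word.length - minLength; start >= 0; start--) {
    tokens.add(word.slice(start))
  }
  return Array.from(tokens)
}

const forward = forwardTokens('example')
const reverse = reverseTokens('example')

console.log(forward.includes('exam')) // prefix, found in forward mode
console.log(forward.includes('ample')) // suffix, not found in forward mode
console.log(reverse.includes('ample')) // suffix, found in reverse mode
```

Under these assumptions, a query such as "amp" (the middle of the word) is matched by neither mode.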

We measured a 15-20% increase in memory usage when enabling reverse
tokenization, compared to the forward mode.
The "full" mode, which allows searching on all combinations, including
in the middle of a word, comes with a ~70% memory increase. So we
decided not to enable it for now, as the cost/benefit ratio of such a
feature is unclear.
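
As a rough intuition for why "full" costs more, the sketch below enumerates every substring of a word that a full tokenizer would have to make searchable. The helper `fullTokens` is hypothetical and the minimum length of 2 is an assumption taken from the `minlength: 2` option. Token counts are not a model of memory usage, so they should not be expected to reproduce the measured percentages above.

```typescript
// Hypothetical helper for illustration; FlexSearch's internals differ.

// Full mode: every contiguous substring of at least minLength characters.
function fullTokens(word: string, minLength = 2): string[] {
  const tokens = new Set<string>()
  for (let start = 0; start < word.length; start++) {
    for (let end = start + minLength; end <= word.length; end++) {
      tokens.add(word.slice(start, end))
    }
  }
  return Array.from(tokens)
}

const full = fullTokens('example')

console.log(full.includes('amp')) // middle of the word, only reachable in full mode
console.log(full.length) // grows quadratically with the word length
```

The number of prefixes and suffixes grows linearly with the word length, while the number of substrings grows quadratically, which is consistent with "full" being the most expensive mode.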

The memory cost of enabling the reverse mode, however, seems reasonable.
paultranvan committed Nov 26, 2024
1 parent bf664ef commit 8ac96fd
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion packages/cozy-dataproxy-lib/src/search/SearchEngine.ts
@@ -180,7 +180,7 @@ export class SearchEngine {
const fieldsToIndex = SEARCH_SCHEMA[doctype]

const flexsearchIndex = new FlexSearch.Document<CozyDoc, true>({
- tokenize: 'forward',
+ tokenize: 'reverse', // See https://github.com/nextapps-de/flexsearch?tab=readme-ov-file#tokenizer
encode: getSearchEncoder(),
// @ts-expect-error minlength is not described by Flexsearch types but exists
minlength: 2,