-
Notifications
You must be signed in to change notification settings - Fork 559
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Use word boundary detection and give scores based on keyword matching
I'm using a helper library to implement a unicode algorithm, but I'm also detecting case changes within a word (from lower to upper case, or no case to some case, so the "o" in "Documents" doesn't count as a new word). Words are not searched--rather, the string is searched starting at a word boundary. That way multi-word sequences will correctly match. This is a basic solution to #260. Some things to consider: - We don't have options to control the case. If smart-case is disabled, the weights in compute_kw_score need to change. - Right now keyword score totally overpowers the frequency score. The frequency score is only a tie-breaker. They could be normalized and weighted so a much better frequency score would win despite a slightly worse keyword score. - Should we detect word endings for exact matches? I'm not sure it would give a good user experience. If I frequently access "src9" but not "src", I don't want "src" to win just because it more exactly matches what I typed. It's hard to refrain from typing a whole word. This gives an interesting wrong result with "c" being a perfect match for "/mnt/c/anything". - The above issue can be solved if we consider digits to start a new word. - I'm testing these changes with `cargo r -- query --list --score b`.
- Loading branch information
Showing
5 changed files
with
138 additions
and
14 deletions.
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters