terms fuzzy matching #7

Open
cambro opened this issue May 10, 2016 · 3 comments

Comments

@cambro
Member

cambro commented May 10, 2016

Ideally, some amount of fuzzy matching on terms would be applied automatically to improve recall. Examples include pluralizations (stromatolite, stromatolites) and hyphenations/slashes (stromatolitic-thrombolitic, strom/throm).

A fancier option might include a flag to explicitly disable this.
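
To make the request concrete, here is a minimal sketch of what such matching could look like on the client side. It is an assumption for illustration only, not how the service works: the function names `build_pattern` and `term_matches` and the `fuzzy` flag are hypothetical, and the plural handling is a naive optional "s"/"es" suffix rather than real stemming.

```python
import re


def build_pattern(term, fuzzy=True):
    """Compile a regex for `term` (hypothetical helper, not part of any API).

    With fuzzy=True, the words of a multi-word term may be joined by
    whitespace, hyphens, or slashes, and the last word may carry a
    naive plural suffix ("s" or "es"). With fuzzy=False, only the
    literal term is matched.
    """
    if fuzzy:
        words = [re.escape(w) for w in term.split()]
        body = r"[\s\-/]+".join(words) + r"(?:s|es)?"
    else:
        body = re.escape(term)
    # \b keeps "stromatolite" from matching inside a longer word.
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def term_matches(term, text, fuzzy=True):
    """Return True if `term` occurs in `text` under the chosen mode."""
    return build_pattern(term, fuzzy).search(text) is not None


if __name__ == "__main__":
    print(term_matches("stromatolite", "Stromatolites are common here"))
    print(term_matches("stromatolite", "Stromatolites are common here", fuzzy=False))
```

Passing `fuzzy=False` plays the role of the "explicitly not doing this" flag. Note that this sketch would not connect "stromatolitic" to "stromatolite"; that would need stemming or a dictionary of variants.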

@cambro
Member Author

cambro commented Jun 20, 2016

Related: an optional flag for variations on terms (not in known dictionaries) and for combinations of terms (i.e., AND vs. OR).

@iross
Member

iross commented Jun 21, 2016

Could you explain the "optional flag variations"? I'm not sure I follow that one.

Some (hyphens/slashes) we already have in place -- a paper with "stromatolitic-thrombolitic" will match for a term search of "stromatolite". Some of this (pluralizations) we'll get when we next build a new index (which I'm planning to do alongside an upgrade to ES 2.3 when the new servers arrive). Other pieces seem similar in concept to our proposed hierarchy crawler to clean up the signal (obviously requiring other terms in the hierarchy is just a special case of an AND combination).

@cambro
Member Author

cambro commented Jul 9, 2016

Here is a pretty complex example:
Include documents that match: [term='permeability' OR term='hydraulic conductivity' OR term='transmissivity'] AND [term='St Peter Sandstone' OR term='Tuscaloosa' OR term='Carbondale' OR term='Niobrara' OR term='Mount Simon Sandstone' OR term='Chattanooga Shale']
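
For reference, an AND of OR groups like this maps fairly directly onto an Elasticsearch `bool` query: each bracketed group becomes a `should` clause with `minimum_should_match` set to 1, and the groups are combined under `must`. The sketch below only builds the query body as a plain dict. The helper name `build_and_of_ors` and the field name `contents` are assumptions for illustration; the real index mapping and API parameters may differ.

```python
def build_and_of_ors(groups, field="contents"):
    """Build an Elasticsearch query body for (A OR B ...) AND (C OR D ...).

    `groups` is a list of lists of phrases. `field` is a hypothetical
    field name; substitute whatever the index actually uses.
    """
    return {
        "query": {
            "bool": {
                # Every group must be satisfied (AND across groups).
                "must": [
                    {
                        "bool": {
                            # Any one phrase satisfies the group (OR within it).
                            "should": [
                                {"match_phrase": {field: term}} for term in group
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                    for group in groups
                ]
            }
        }
    }


properties = ["permeability", "hydraulic conductivity", "transmissivity"]
units = [
    "St Peter Sandstone",
    "Tuscaloosa",
    "Carbondale",
    "Niobrara",
    "Mount Simon Sandstone",
    "Chattanooga Shale",
]
query = build_and_of_ors([properties, units])
```

Requiring other terms from a hierarchy would be the same structure with one extra group appended to the list.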


2 participants