-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add filtering commands. #63
base: master
Are you sure you want to change the base?
Conversation
These new commands allow running queries against a filter that matches 1% or 10% of documents. Filters are interesting because some optimizations that are easy/obvious for exhaustive evaluation become more complicated when a filter is applied. Yet filters are common, think of an e-commerce search filtered by category for instance. Only the Lucene 10.0 engine supports filtering for now because I'm not too familiar with Rust, but I assume that it should be easy to add support for it in a similar fashion.
Thanks for the PR, filtering is a great addition. I think the query side should be handled via the query list, with an added tag. The commands are more like different collectors. |
This is how I started, but I would really like to see how all queries perform when a filter is applied, not just a few of them, and duplicating all queries didn't feel right (especially if duplicated twice, once for the 1% filter and another time for the 10% filter). This would also make it annoying to add more queries to the benchmark. So my second idea was to add it as another dimension to the benchmark (one dimension is the command (=collector), another one the query, another one the filter density) but it felt a bit over-engineered. So I came to this 3rd approach of coupling it with the command, which didn't feel great at first, but now feels to me like the least worst approach? |
I think duplicating should be fine, but we could have it in code when loading the queries. This has the advantage that you can easily get an overview and compare the different results with a single run. We may add a FILTER_TAG option or similar to filter queries with certain tags. Another things that's missing currently is searching on multiple fields, which is probably the much more common use case. |
FWIW I have started looking at applying @PSeitz 's approach: https://github.com/quickwit-oss/search-benchmark-game/compare/master...jpountz:search-benchmark-game:filtered_queries?expand=1 in case you want to take a look (the query parsing bits are still missing). |
Good point. I think the complexity should be the same for both, where we need to have a special query handling to pass in the filter, but the usage with the queries approach should be easier and would require just one run instead of 3 or 4 runs to get the full picture. |
These new commands allow running queries against a filter that matches 1% or 10% of documents.
Filters are interesting because some optimizations that are easy/obvious for exhaustive evaluation become more complicated when a filter is applied. Yet filters are common, think of an e-commerce search filtered by category for instance.
Only the Lucene 10.0 engine supports filtering for now because I'm not too familiar with Rust, but I assume that it should be easy to add support for it to the Tantivy engine.