Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filtering commands. #63

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jpountz
Copy link

@jpountz jpountz commented Nov 14, 2024

These new commands allow running queries against a filter that matches 1% or 10% of documents.

Filters are interesting because some optimizations that are easy/obvious for exhaustive evaluation become more complicated when a filter is applied. Yet filters are common, think of an e-commerce search filtered by category for instance.

Only the Lucene 10.0 engine supports filtering for now because I'm not too familiar with Rust, but I assume that it should be easy to add support for it to the Tantivy engine.

These new commands allow running queries against a filter that matches 1% or
10% of documents.

Filters are interesting because some optimizations that are easy/obvious for
exhaustive evaluation become more complicated when a filter is applied. Yet
filters are common, think of an e-commerce search filtered by category for
instance.

Only the Lucene 10.0 engine supports filtering for now because I'm not too
familiar with Rust, but I assume that it should be easy to add support for it
in a similar fashion.
@PSeitz
Copy link

PSeitz commented Nov 15, 2024

Thanks for the PR, filtering is a great addition.

I think the query side should be handled via the query list, with an added tag. The commands are more like different collectors.

@jpountz
Copy link
Author

jpountz commented Nov 15, 2024

This is how I started, but I would really like to see how all queries perform when a filter is applied, not just a few of them, and duplicating all queries didn't feel right (especially if duplicated twice, once for the 1% filter and another time for the 10% filter). This would also make it annoying to add more queries to the benchmark.

So my second idea was to add it as another dimension to the benchmark (one dimension is the command (=collector), another one the query, another one the filter density) but it felt a bit over-engineered. So I came to this 3rd approach of coupling it with the command, which didn't feel great at first, but now feels to me like the least worst approach?

@PSeitz
Copy link

PSeitz commented Nov 16, 2024

I think duplicating should be fine, but we could have it in code when loading the queries. This has the advantage that you can easily get an overview and compare the different results with a single run.

We may add a FILTER_TAG option or similar to filter queries with certain tags.

Another things that's missing currently is searching on multiple fields, which is probably the much more common use case.

@fulmicoton
Copy link
Collaborator

@PSeitz We'd also need to make sure the query language handles it though (filters should not impact scoring). It might be a pain.

I'd go with @jpountz solution for simplicity.

@jpountz
Copy link
Author

jpountz commented Nov 18, 2024

FWIW I have started looking at applying @PSeitz 's approach: https://github.com/quickwit-oss/search-benchmark-game/compare/master...jpountz:search-benchmark-game:filtered_queries?expand=1 in case you want to take a look (the query parsing bits are still missing).

@jpountz jpountz mentioned this pull request Nov 18, 2024
@PSeitz
Copy link

PSeitz commented Nov 19, 2024

@PSeitz We'd also need to make sure the query language handles it though (filters should not impact scoring). It might be a pain.

I'd go with @jpountz solution for simplicity.

Good point. I think the complexity should be the same for both, where we need to have a special query handling to pass in the filter, but the usage with the queries approach should be easier and would require just one run instead of 3 or 4 runs to get the full picture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants