Add filtering commands. #63

jpountz · 2024-11-14T21:48:38Z

These new commands allow running queries against a filter that matches 1% or 10% of documents.

Filters are interesting because some optimizations that are easy/obvious for exhaustive evaluation become more complicated when a filter is applied. Yet filters are common: think of an e-commerce search filtered by category, for instance.

Only the Lucene 10.0 engine supports filtering for now because I'm not too familiar with Rust, but I assume that it should be easy to add support for it to the Tantivy engine.
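The PR does not say how the 1% and 10% filters are built, so the helper below is only a minimal sketch under an assumed scheme. `density_filter` picks documents by `doc_id % 100` and `filtered_top_k` ranks only the matches that pass the filter. Both names are hypothetical. The point is that the filter restricts which documents are candidates and never changes a score, which is exactly what makes optimizations that assume exhaustive evaluation harder to apply.

```python
import heapq


def density_filter(num_docs: int, percent: int) -> set:
    """Hypothetical filter: keep docs whose id falls in the first `percent`
    slots of every block of 100, so roughly `percent`% of docs match."""
    return {doc_id for doc_id in range(num_docs) if doc_id % 100 < percent}


def filtered_top_k(scores: dict, allowed: set, k: int) -> list:
    """Return the k best (doc_id, score) pairs among docs that pass the filter.

    The filter only narrows the candidate set; scores are used unchanged.
    """
    candidates = ((doc, score) for doc, score in scores.items() if doc in allowed)
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    one_percent = density_filter(1000, 1)
    print(len(one_percent))
    print(filtered_top_k({0: 1.0, 1: 5.0, 100: 3.0, 250: 9.0}, one_percent, 2))
```

Here doc 1 and doc 250 have the highest scores but are dropped because they fall outside the 1% filter.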


PSeitz · 2024-11-15T09:47:49Z

Thanks for the PR, filtering is a great addition.

I think the query side should be handled via the query list, with an added tag. The commands are more like different collectors.

jpountz · 2024-11-15T22:37:38Z

This is how I started, but I would really like to see how all queries perform when a filter is applied, not just a few of them, and duplicating all queries didn't feel right (especially if duplicated twice, once for the 1% filter and another time for the 10% filter). This would also make it annoying to add more queries to the benchmark.

So my second idea was to add it as another dimension to the benchmark (one dimension being the command, i.e. the collector, another the query, and another the filter density), but it felt a bit over-engineered. So I came to this third approach of coupling it with the command, which didn't feel great at first but now feels to me like the least bad approach?
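To make the "another dimension" option concrete, here is a minimal sketch of the run matrix it implies. The command names, query strings and the `benchmark_matrix` helper are all hypothetical placeholders, not the benchmark's actual lists; `None` stands for "no filter".

```python
from itertools import product

COMMANDS = ["count", "top_10"]  # hypothetical command (collector) names
QUERIES = ["the", "+griffith +observatory"]  # hypothetical query strings
FILTER_DENSITIES = [None, 1, 10]  # None = unfiltered, otherwise % of docs matched


def benchmark_matrix(commands, queries, densities):
    """Every (command, query, filter density) combination that would need a run."""
    return list(product(commands, queries, densities))


if __name__ == "__main__":
    for combo in benchmark_matrix(COMMANDS, QUERIES, FILTER_DENSITIES):
        print(combo)
```

Coupling the density with the command instead amounts to folding the third dimension into the first one (e.g. a hypothetical `top_10_filter_1pct` command), which keeps the harness two-dimensional at the cost of multiplying the number of commands.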

PSeitz · 2024-11-16T01:42:54Z

I think duplicating should be fine, but we could do the duplication in code when loading the queries. This has the advantage that you can easily get an overview and compare the different results with a single run.
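A minimal sketch of what "duplicating in code when loading the queries" could look like, assuming each query carries a list of tags. `BenchQuery`, `expand_with_filters` and the `filter_<n>pct` tag format are hypothetical names chosen for illustration; the query file itself stays unchanged.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BenchQuery:
    text: str
    tags: Tuple[str, ...] = ()
    filter_percent: Optional[int] = None  # None means the query runs unfiltered


def expand_with_filters(queries, densities=(1, 10)):
    """Return each loaded query plus one filtered copy per density."""
    expanded = []
    for q in queries:
        expanded.append(q)  # keep the original, unfiltered query
        for pct in densities:
            # Hypothetical tag so filtered variants can be grouped in the results.
            expanded.append(BenchQuery(q.text, q.tags + (f"filter_{pct}pct",), pct))
    return expanded


if __name__ == "__main__":
    for q in expand_with_filters([BenchQuery("the", ("term",))]):
        print(q)
```

Since the expansion happens at load time, adding a new query to the list automatically gives its filtered variants too, which addresses the concern about making new queries annoying to add.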

We may add a FILTER_TAG option or similar to filter queries with certain tags.
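A minimal sketch of such a tag filter, assuming queries are loaded as dicts with a `tags` list and the option is read from a `FILTER_TAG` environment variable. Both the data shape and reading it from the environment are assumptions for illustration; the thread only floats the option name.

```python
import os


def select_by_tag(queries, tag=None):
    """Keep only queries carrying `tag`; with no tag set, keep them all."""
    if tag is None:
        # Hypothetical: fall back to the FILTER_TAG option suggested in the thread.
        tag = os.environ.get("FILTER_TAG")
    if not tag:
        return list(queries)
    return [q for q in queries if tag in q["tags"]]


if __name__ == "__main__":
    queries = [
        {"text": "the", "tags": ["term"]},
        {"text": "the", "tags": ["term", "filter_1pct"]},
    ]
    print(select_by_tag(queries, "filter_1pct"))
```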

Another thing that's missing currently is searching on multiple fields, which is probably the much more common use case.

fulmicoton · 2024-11-18T08:39:20Z

@PSeitz We'd also need to make sure the query language handles it though (filters should not impact scoring). It might be a pain.
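The "filters should not impact scoring" requirement can be illustrated with a toy combination step. This is a sketch in plain Python, not how either engine implements it: `score_with_filter` and its `filter_as_scoring_clause` switch are hypothetical, and the switch only exists to show what goes wrong when the filter is treated like a regular scoring clause. In Lucene this distinction corresponds to adding the filter with `BooleanClause.Occur.FILTER` rather than `Occur.MUST`.

```python
def score_with_filter(query_scores, filter_docs,
                      filter_as_scoring_clause=False, filter_boost=1.0):
    """Combine per-document query scores with a filter.

    query_scores: doc id -> score computed from the query alone.
    filter_docs: doc ids matched by the filter.
    With filter_as_scoring_clause=True the filter (wrongly) contributes
    `filter_boost` to every surviving score.
    """
    results = {}
    for doc, score in query_scores.items():
        if doc not in filter_docs:
            continue  # the filter only decides whether a doc matches
        results[doc] = (score + filter_boost) if filter_as_scoring_clause else score
    return results


if __name__ == "__main__":
    scores = {1: 2.0, 2: 3.0, 3: 0.5}
    print(score_with_filter(scores, {2, 3}))
    print(score_with_filter(scores, {2, 3}, filter_as_scoring_clause=True))
```

With a proper filter the surviving documents keep exactly the scores the unfiltered query gave them, so filtered and unfiltered results stay comparable.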

I'd go with @jpountz's solution for simplicity.

jpountz · 2024-11-18T08:47:55Z

FWIW I have started looking at applying @PSeitz's approach: https://github.com/quickwit-oss/search-benchmark-game/compare/master...jpountz:search-benchmark-game:filtered_queries?expand=1 in case you want to take a look (the query parsing bits are still missing).

PSeitz · 2024-11-19T04:19:10Z

> @PSeitz We'd also need to make sure the query language handles it though (filters should not impact scoring). It might be a pain.
>
> I'd go with @jpountz's solution for simplicity.

Good point. I think the complexity should be the same for both, since either way we need special query handling to pass in the filter. But the queries approach should be easier to use, and it would require just one run instead of 3 or 4 runs to get the full picture.

jpountz mentioned this pull request on Nov 18, 2024 in "Add filtered queries. #64" (draft).
