Add filtered queries. #64

Draft
wants to merge 5 commits into
base: master

@jpountz jpountz commented Nov 18, 2024

This is an alternative to #63 that adds filtered queries instead of new filtering commands.

Queries support optional filtering via the following syntax: "<query> WHERE <filter>", so engines can split on " WHERE " and interpret the left part as a required clause that computes scores and the right part as a filter. Only the Lucene 10 engines support this for now.
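
As a rough illustration of the split described above, here is a minimal Python sketch. The names `split_filtered_query` and `ParsedQuery` and the sample query strings are hypothetical and not taken from this PR. The sketch also assumes that only the first " WHERE " separates the scoring clause from the filter.

```python
from typing import NamedTuple, Optional

# Separator named in the PR description (note the surrounding spaces).
SEPARATOR = " WHERE "


class ParsedQuery(NamedTuple):
    """Hypothetical container: scoring clause plus optional filter."""
    query: str
    filter: Optional[str]


def split_filtered_query(line: str) -> ParsedQuery:
    """Split a query line on the first " WHERE " (assumed behavior).

    The left part is the required clause that computes scores,
    the right part is the filter. Lines without the separator
    are treated as unfiltered queries.
    """
    query, sep, flt = line.partition(SEPARATOR)
    if not sep:
        return ParsedQuery(line, None)
    return ParsedQuery(query, flt)


# Hypothetical inputs, for illustration only.
print(split_filtered_query("+the +who WHERE category:music"))
print(split_filtered_query("+the +who"))
```

An engine that does not support filtering could check whether `filter` is `None` and report the query as unsupported otherwise.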

These new commands allow running queries against a filter that matches 1% or
10% of documents.

Filters are interesting because some optimizations that are easy/obvious for
exhaustive evaluation become more complicated when a filter is applied. Yet
filters are common: think of an e-commerce search filtered by category, for
instance.

Only the Lucene 10.0 engine supports filtering for now because I'm not too
familiar with Rust, but I assume that it should be easy to add support for it
in a similar fashion.
@jpountz jpountz marked this pull request as draft November 18, 2024 09:49

PSeitz commented Nov 19, 2024

Thanks for the PR. I think we could make the handling slightly simpler if we move the WHERE clause to an extra "filter" field next to "query". What do you think?


jpountz commented Nov 19, 2024

Can you explain a bit more what you have in mind? It was convenient to avoid introducing a third column in the protocol that src/client.py uses to submit queries to engines, or in the results web page. And then I kept the queries.txt file the same for consistency.


PSeitz commented Nov 20, 2024

> Can you explain a bit more what you have in mind? It was convenient to avoid introducing a third column in the protocol that src/client.py uses to submit queries to engines, or in the results web page. And then I kept the queries.txt file the same for consistency.

I was thinking the API with a third "filter" parameter would be a little bit clearer.
For the results we could still have a single column with query + "WHERE " + filter
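
To make the suggestion concrete, here is a minimal sketch of how a separate filter value could still be shown as one results column. The helper name `render_results_label` and the sample strings are hypothetical and not part of src/client.py. The sketch assumes a space on both sides of WHERE, to match the " WHERE " separator from the PR description.

```python
from typing import Optional


def render_results_label(query: str, filter: Optional[str] = None) -> str:
    """Join query and filter into a single display string (assumed format)."""
    if filter is None:
        # Unfiltered queries are shown unchanged.
        return query
    return query + " WHERE " + filter


# Hypothetical inputs, for illustration only.
print(render_results_label("+the +who", "category:music"))
print(render_results_label("+the +who"))
```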


jpountz commented Nov 25, 2024

@PSeitz I looked into your suggestion, but having different formats in the queries.txt and the results file while doing the conversion in src/client.py looks a bit awkward. It's possible that I misunderstood your suggestion. If so, I'd appreciate it if you could describe what you expect queries to look like in queries.txt, in results.json, and in the protocol that is used between src/client.py and the engines.
