WTF? Two versions of QueryEncoder and two versions of AutoQueryEncoder #2009

lintool · 2024-10-08T13:36:29Z

We have two versions of QueryEncoder and two versions of AutoQueryEncoder. One set of classes is in pyserini.search.faiss, the other set is in pyserini.encode.

QueryEncoder in pyserini/search/faiss/_searcher.py
QueryEncoder in pyserini/encode/_base.py
AutoQueryEncoder in pyserini/search/faiss/_searcher.py
AutoQueryEncoder in pyserini/encode/_auto.py

(I'm pointing to code at commit just prior to #1997 - because I think that commit breaks a number of things.)

@MXueguang has clearly stated in #1728 that the version in pyserini.encode is actually the one we should use. This is true, because only that version has the option to correctly handle the query prefix, which is needed for BGE to work correctly. However, the QueryEncoder in pyserini.search.faiss is the one that actually works, because only that version downloads query encodings.

So, here's the puzzle: How did we get into a state where we're using AutoQueryEncoder in pyserini.encode but QueryEncoder in pyserini.search.faiss, where the code is so crazily intertwined, and all the regressions pass?

Here's the crazy answer:

In pyserini/search/faiss/__main__.py, this is the import statement:

from pyserini.search import FaissSearcher, BinaryDenseSearcher, TctColBertQueryEncoder, QueryEncoder, \
    DprQueryEncoder, BprQueryEncoder, DkrrDprQueryEncoder, AnceQueryEncoder, AggretrieverQueryEncoder, DenseVectorAveragePrf, \
    DenseVectorRocchioPrf, DenseVectorAncePrf, OpenAIQueryEncoder, ClipQueryEncoder

from pyserini.encode import PcaEncoder, CosDprQueryEncoder, AutoQueryEncoder

So the Faiss searcher is getting most of the models from pyserini.search, and if you trace the imports to pyserini/search/__init__.py, we see the imports "loop back to itself":

from .faiss import DenseSearchResult, PRFDenseSearchResult, FaissSearcher, BinaryDenseSearcher, QueryEncoder, \
    DprQueryEncoder, BprQueryEncoder, DkrrDprQueryEncoder, TctColBertQueryEncoder, AnceQueryEncoder, AggretrieverQueryEncoder, AutoQueryEncoder, ClipQueryEncoder
from .faiss import AnceEncoder
from .faiss import DenseVectorAveragePrf, DenseVectorRocchioPrf, DenseVectorAncePrf
from .faiss import OpenAIQueryEncoder

Which means that for most of the encoder classes, the implementations in pyserini/search/faiss/_searcher.py are used.

Except for AutoQueryEncoder (and CosDprQueryEncoder, but that's an aside).

So now that we understand what's going on, it's probably easier to fix. This also means that #1997 is broken, because it uses the wrong implementation of AutoQueryEncoder.

The text was updated successfully, but these errors were encountered:

MXueguang · 2024-10-08T14:33:25Z

We actually also need to take care about imports in https://github.com/castorini/pyserini/blob/master/pyserini/encode/query.py
What do you think on this?

lintool · 2024-10-08T20:10:18Z

Yes, definitely, that will need refactoring. First this though: #2008

lintool · 2024-10-15T21:19:10Z

Leaving open because we still need to think about what to do with:

pyserini/encode/query.py
pyserini/encode/__main__.py

This was referenced Oct 9, 2024

Refactoring to remove duplicates in pyserini.encode and pyserini.search.faiss #2008

Merged

Reconcile/merge encoders in pyserini.search.faiss and pyserini.encode #2013

Closed

This was referenced Oct 15, 2024

Importing LuceneSearcher relies on FAISS and Torch #1375

Closed

dynamically import dependencies #853

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WTF? Two versions of QueryEncoder and two versions of AutoQueryEncoder #2009

WTF? Two versions of QueryEncoder and two versions of AutoQueryEncoder #2009

lintool commented Oct 8, 2024

MXueguang commented Oct 8, 2024

lintool commented Oct 8, 2024

lintool commented Oct 15, 2024

WTF? Two versions of QueryEncoder and two versions of AutoQueryEncoder #2009

WTF? Two versions of QueryEncoder and two versions of AutoQueryEncoder #2009

Comments

lintool commented Oct 8, 2024

MXueguang commented Oct 8, 2024

lintool commented Oct 8, 2024

lintool commented Oct 15, 2024