We have two versions of `QueryEncoder` and two versions of `AutoQueryEncoder`. One set of classes is in `pyserini.search.faiss`, the other set is in `pyserini.encode`.

- `QueryEncoder` in `pyserini/search/faiss/_searcher.py`
- `QueryEncoder` in `pyserini/encode/_base.py`
- `AutoQueryEncoder` in `pyserini/search/faiss/_searcher.py`
- `AutoQueryEncoder` in `pyserini/encode/_auto.py`

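
For anyone who wants to check which file a name really comes from, here is a minimal sketch that looks at a class's `__module__` attribute. The helper `defining_module` is a hypothetical name of mine, not part of Pyserini, and the demo uses the standard library's `json` package (which re-exports classes from its submodules) so that it runs without Pyserini installed.

```python
import importlib


def defining_module(import_path, class_name):
    """Return the module where a class is defined, not where it was imported from.

    `import_path` is the dotted path a caller would import from;
    `class_name` is the attribute looked up on that module.
    """
    module = importlib.import_module(import_path)
    cls = getattr(module, class_name)
    return cls.__module__


# `json` re-exports these names from its submodules, so the import path
# and the defining module differ.
print(defining_module("json", "JSONDecoder"))  # json.decoder
print(defining_module("json", "JSONEncoder"))  # json.encoder
```

Assuming Pyserini is installed and the names are importable as described above, the same call with `"pyserini.search"` or `"pyserini.encode"` as the import path and `"QueryEncoder"` or `"AutoQueryEncoder"` as the class name should show which of the four definitions each import resolves to.
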
(I'm pointing to code at commit just prior to #1997 - because I think that commit breaks a number of things.)

@MXueguang has clearly stated in #1728 that the version in `pyserini.encode` is actually the one we should use. This is true, because only that version has the option to correctly handle the query prefix, which is needed for BGE to work correctly. However, the `QueryEncoder` in `pyserini.search.faiss` is the one that actually works, because only that version downloads query encodings.

So, here's the puzzle: How did we get into a state where we're using `AutoQueryEncoder` in `pyserini.encode` but `QueryEncoder` in `pyserini.search.faiss`, where the code is so crazily intertwined, and all the regressions pass?

Here's the crazy answer:

In `pyserini/search/faiss/__main__.py`, this is the import statement:

So the Faiss searcher is getting most of the models from `pyserini.search`, and if you trace the imports to `pyserini/search/__init__.py`, we see the imports "loop back to itself":

Which means that for most of the encoder classes, the implementations in `pyserini/search/faiss/_searcher.py` are used.

Except for `AutoQueryEncoder` (and `CosDprQueryEncoder`, but that's an aside).

So now that we understand what's going on, it's probably easier to fix. This also means that #1997 is broken, because it uses the wrong implementation of `AutoQueryEncoder`.
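
To make the mechanism concrete, here is a toy reproduction of the situation described above. It is only a sketch: the `toy_*` module names and the helper functions are hypothetical stand-ins built in memory, not Pyserini's real code or imports. It shows how a parent package that re-exports from one submodule, combined with a caller that pulls one name from elsewhere, silently mixes two implementations.

```python
import sys
import types


def register_module(name, **attrs):
    """Create an in-memory module and register it so `import name` works."""
    module = types.ModuleType(name)
    module.__dict__.update(attrs)
    sys.modules[name] = module
    return module


def make_class(class_name, module_name):
    """Create an empty class that reports `module_name` as its home."""
    return type(class_name, (), {"__module__": module_name})


# Two hypothetical modules that each define both encoder classes.
toy_faiss = register_module(
    "toy_search_faiss",
    QueryEncoder=make_class("QueryEncoder", "toy_search_faiss"),
    AutoQueryEncoder=make_class("AutoQueryEncoder", "toy_search_faiss"),
)
toy_encode = register_module(
    "toy_encode",
    QueryEncoder=make_class("QueryEncoder", "toy_encode"),
    AutoQueryEncoder=make_class("AutoQueryEncoder", "toy_encode"),
)

# The parent package re-exports the faiss module's QueryEncoder.
register_module("toy_search", QueryEncoder=toy_faiss.QueryEncoder)

# A caller that mixes import sources, in the spirit of what the issue describes.
from toy_search import QueryEncoder
from toy_encode import AutoQueryEncoder

print(QueryEncoder.__module__)      # toy_search_faiss
print(AutoQueryEncoder.__module__)  # toy_encode
```

Both names import without error, which is presumably why nothing looks wrong at the call site, even though the two classes come from different implementations.
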