-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add raw() to DenseSearchResult and PRFDenseSearchResult #1876
Conversation
hey @manveertamber can you take a look at this? |
@DanielKohn1208 can you add some documentation to explain how to fetch document content for documents retrieved using a dense index: https://github.com/castorini/pyserini/blob/master/docs/usage-fetch.md. It looks like right now we only explain how to do this for a lucene index |
We can call .doc() on a FaissSearcher too, which routes to the lucene index .doc() but the documentation doesn't explain this either. |
@DanielKohn1208 looks good! But maybe change the headings in the documentation to Using a Lucene Index and Using a Faiss Index instead of Using a Sparse Representation and Using a Dense Representation. And you should add some example code in the documentation that showcases your changes. Like calling .raw() on some retrieved hits. |
@manveertamber After going through the documentation, I think that I approached this issue in completely the wrong way. As it is now, my fix actually doesn't actually solve any problems. Someone could already get a hit's raw content from a FaissIndex as follows from pyserini.search.faiss import FaissSearcher, AutoQueryEncoder
DENSE_INDEX = 'beir-v1.0.0-nfcorpus.bge-base-en-v1.5'
encoder = AutoQueryEncoder('BAAI/bge-base-en-v1.5', device='cpu', pooling='mean', l2_norm=True)
searcher = FaissSearcher.from_prebuilt_index(DENSE_INDEX, encoder)
hits = searcher.search('How to Help Prevent Abdominal Aortic Aneurysms')
doc = searcher.doc(hits[0].docid)
raw_content = doc.raw() Also note that my original assumption about the Perhaps I am completely misunderstanding the original issue, but I think all that is actually needed is a documentation update? |
Should I close this PR and create a new one at some point to update the docs? |
Yeah that works. |
Closing PR |
closes #1856 by adding a
raw
method to bothDenseSearchResult
andPRFDenseSearchResult
Had to change how
DenseSearchResult
andPRFDenseSearchResult
is initialized as well.