The following benchmarks (and runs) are available. Results are reported on the dev2 set:
Benchmark | Runfiles | NDCG@10 | NDCG@1000 | MRR@1000 | R@1000 |
---|---|---|---|---|---|
BM25 (k1=1, b=1.0) | runs | 0.0657 | 0.1033 | 0.0590 | 0.3600 |
Dense Retrieval (SBERT) (DR) | runs | 0.1040 | 0.1665 | 0.0901 | 0.5600 |
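The numbers above can be recomputed from a run file with `pytrec_eval`. Below is a minimal sketch, assuming a TREC-format run file and the dev2 `qrel.txt`; the file paths, the run-file name, and the macro-averaging are illustrative, not the repository's exact evaluation script:

```python
from collections import defaultdict

import pytrec_eval


def read_qrels(path):
    """Read a TREC qrels file: <query_id> <iteration> <doc_id> <relevance>."""
    qrels = defaultdict(dict)
    with open(path) as f:
        for line in f:
            qid, _, docid, rel = line.split()
            qrels[qid][docid] = int(rel)
    return dict(qrels)


def read_run(path):
    """Read a TREC run file: <query_id> Q0 <doc_id> <rank> <score> <tag>."""
    run = defaultdict(dict)
    with open(path) as f:
        for line in f:
            qid, _, docid, _, score, _ = line.split()
            run[qid][docid] = float(score)
    return dict(run)


qrels = read_qrels("DATA_PATH/dev2-2024/qrel.txt")  # adjust to your DATA_PATH
run = read_run("runs/bm25-dev2.trec")               # placeholder run-file name

evaluator = pytrec_eval.RelevanceEvaluator(
    qrels, {"ndcg_cut.10,1000", "recall.1000", "recip_rank"}
)
per_query = evaluator.evaluate(run)

# Macro-average the per-query scores over all evaluated queries.
for metric in ["ndcg_cut_10", "ndcg_cut_1000", "recip_rank", "recall_1000"]:
    mean = sum(q[metric] for q in per_query.values()) / len(per_query)
    print(f"{metric}: {mean:.4f}")
```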
Note: the current repository only supports the 2024 version of the corpus/queries. To use the 2023 version, refer to the 2023 release, use `tot23.py` instead, and change the `ir_datasets` names used by the baselines inside the code.
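For illustration, this is roughly where those names come into play. A minimal sketch, assuming the baselines load query splits through `ir_datasets.load`; the identifier shown is a placeholder, and the real IDs are defined inside `tot.py` / `tot23.py`:

```python
import ir_datasets


def load_split(dataset_id: str):
    """Load the queries of a split by its ir_datasets identifier.

    The identifiers actually used by the baselines are set inside
    tot.py / tot23.py; pass whichever name your version of the code uses.
    """
    dataset = ir_datasets.load(dataset_id)
    return list(dataset.queries_iter())


# Example (placeholder identifier -- swap in the name from your tot*.py):
# queries = load_split("trec-tot/2024/dev2")
```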
## optional: create a new environment using pyenv-virtualenv
## pyenv virtualenv 3.8.11 trec-tot-benchmarks
# install requirements
pip install ir_datasets sentence-transformers==2.2.2 pyserini==0.20.0 pytrec_eval faiss-cpu==1.6.5
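To confirm the installation succeeded, a quick sanity check is to import the pinned dependencies; this is only a convenience snippet, not part of the repository:

```python
# Sanity check that the pinned baseline dependencies import cleanly.
import faiss                   # noqa: F401
import ir_datasets             # noqa: F401
import pyserini                # noqa: F401
import pytrec_eval             # noqa: F401
import sentence_transformers   # noqa: F401

print("All baseline dependencies imported successfully.")
```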
After downloading the files (see guidelines), set DATA_PATH to the folder that contains the uncompressed files, so that the layout looks like this:
DATA_PATH/
| train-2024
| | - queries.jsonl
| | - qrel.txt
| dev1-2024
| | - queries.jsonl
| | - qrel.txt
| dev2-2024
| | - queries.jsonl
| | - qrel.txt
| corpus.jsonl
Quick test to check whether the data is set up properly:
python tot.py
The command above should print the correct number of train/dev queries and the number of documents in the corpus, along with example queries and documents.
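If you want to inspect the raw files directly, the sketch below reads the first dev2 query and counts the qrel entries. It assumes `queries.jsonl` contains one JSON object per line and that `qrel.txt` follows the standard TREC qrels format (`<query_id> <iteration> <doc_id> <relevance>`); it only prints field names rather than assuming any particular schema:

```python
import json
import os

DATA_PATH = os.environ.get("DATA_PATH", ".")

# Peek at the first dev2 query; field names vary, so just print the keys.
with open(os.path.join(DATA_PATH, "dev2-2024", "queries.jsonl")) as f:
    first_query = json.loads(next(f))
print("query fields:", sorted(first_query))

# Count qrel entries, assuming standard TREC qrels lines.
with open(os.path.join(DATA_PATH, "dev2-2024", "qrel.txt")) as f:
    qrels = [line.split() for line in f if line.strip()]
print("number of qrel entries:", len(qrels))
```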