
Over-penalization against Extra Information in Neural IR Models

paper: https://dl.acm.org/doi/pdf/10.1145/3627673.3679975

Requirements

For experiments with robust04, follow the ir-datasets instructions to set up TREC Disks 4 and 5: https://ir-datasets.com/disks45.html#disks45/nocr/trec-robust-2004
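Once the disks are configured, a quick sanity check like the following (just a suggestion, assuming ir_datasets is installed in your environment) confirms that the collection is readable:

# print the first document id of the Robust04 collection
python -c "import ir_datasets; ds = ir_datasets.load('disks45/nocr/trec-robust-2004'); print(next(iter(ds.docs_iter())).doc_id)"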

Getting started

Install dependencies.

conda env create -f=envs/denserr.yml
conda activate denserr

Run the Sentence Deletion Analysis experiments:

python main.py denserr.DamagedAnalyze --local-scheduler

The task result is written to resources/denserr/analyzer/damaged_analyzer/{cache_file_name}.
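The exact cache file name is generated at run time; listing the output directory is one way to find it (this listing command is only a suggestion, not part of the project tooling):

# show the most recently written result files first
ls -t resources/denserr/analyzer/damaged_analyzer/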

Then, collect and visualize the ranking shift results:

python scripts/compare_ranking_shifts.py \
resources/denserr/analyzer/damaged_analyzer/{cache_file_name}

You can also compare multiple results with this script:

python scripts/compare_ranking_shifts.py \
resources/denserr/analyzer/damaged_analyzer/{BM25_result_filename} \
resources/denserr/analyzer/damaged_analyzer/{ANCE_result_filename} \
resources/denserr/analyzer/damaged_analyzer/{ColBERT_result_filename} \
resources/denserr/analyzer/damaged_analyzer/{DeepCT_result_filename} \
resources/denserr/analyzer/damaged_analyzer/{SPLADE_result_filename}

To change the datasets, models, and other settings used in the experiments, edit conf/param.ini.

For example, to run experiments on the MS MARCO document dataset, set the parameters like this:

[DenseErrConfig]
dataset_name=msmarco-doc

The available datasets are listed in denserr/dataset/load_dataset.py.
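For instance, to switch to the Robust04 experiments mentioned under Requirements, the setting would look like the following (the dataset key robust04 is an assumption here; check denserr/dataset/load_dataset.py for the name actually registered):

[DenseErrConfig]
dataset_name=robust04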

Tasks

To run the Sentence Addition Analysis experiments, execute the SentenceInstactAnalyze task:

python main.py denserr.SentenceInstactAnalyze --local-scheduler

To evaluate retrieval effectiveness, run the Evaluate task:

python main.py denserr.Evaluate --local-scheduler

For ColBERT and SPLADE

To resolve dependency issues, we have prepared a separate conda environment YAML file for each of ColBERT and SPLADE. If you want to use these models, create and activate the respective conda env. If you are using pyenv, don't forget to set the appropriate Python version using pyenv local [version].

colbert: envs/colbert.yml
splade: envs/ptsplade.yml

e.g.

conda env create -f=envs/ptsplade.yml
conda activate ptsplade

# for pyenv
pyenv local {your conda version}/envs/ptsplade
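The ColBERT environment is set up the same way (the env name colbert below is an assumption; it should match the name defined in envs/colbert.yml):

conda env create -f=envs/colbert.yml
conda activate colbert

# for pyenv
pyenv local {your conda version}/envs/colbert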
