Measurement tool for Effectiveness , Retrievability Bias , and Fairness , given the output files from an Information Retrieval experiment
-
Use a Linux interpreter to run the bash commands
On Windows , use the WSL Ubuntu interpreter
-
Download Anserini . Under the Anserini main path do the following :
- Create an "indexes" folder and put the required indexes of [AQUAINT - CORE17 - WAPO] there
- Move the resource folder from our GitHub account to that location for the queries
-
Download Trec_Eval
-
Download cwl_eval
-
Set the locations of [Anserini - Trec_Eval - CWL_Eval] in the constants list in general.py
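A hypothetical sketch of those constants in general.py (the variable names and paths below are assumptions; the actual names in the repository may differ):

```python
# Hypothetical constant names -- adjust to the actual names used in general.py.
ANSERINI_ROOT = "/home/user/anserini"             # Anserini main path
TREC_EVAL = "/home/user/trec_eval/trec_eval"      # trec_eval binary
CWL_EVAL = "cwl-eval"                             # cwl_eval entry point (pip install cwl-eval)
```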
File : performanceCalculator - function : measurePerformance
Runs a retrieval experiment on 50 queries and returns performance measurements based on the input :
corpus , exp , model , docs , terms , beta , parameter , index , res_file , qry_file
- corpus = ( a = AQUAINT , c = CORE17 , w = WAPO )
- exp = (b = Baseline , a = Axiom , r = RM3 )
- model = ( b = BM25 , P = PL2 , l = LMD )
- docs = number of FbDocs - set to 0 automatically for Baseline
- terms = number of FbTerms - set to 0 automatically for Baseline
- beta = original query weight
- parameter = length-normalization parameter
- index = index to use , with its path - if empty , the default index path Anserini_root/indexes/ is used (Anserini index name , Ex. lucene-index.robust05.pos+docvectors+rawdocs)
- res_file = res file from the experiment - if an empty string , Anserini_root/ is used (our naming , Ex = AQ-BM25-AX-b0.1-200K-beta0.5-10-05.res)
- qry_file = qry file from the experiment - if an empty string , Anserini_root/resource is used
Dummy files (the res_file and the bash command file) are stored in the "dum" directory under the Anserini root
Outputs : [TrecMAP , TrecBref , TrecP10 , TrecNDCG , CWLMAP , CWLNDCG , CWLP10 , CWLRBP0.4 , CWLRBP0.6 , CWLRBP0.8]
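Under the hood the experiment issues an Anserini SearchCollection command; a minimal sketch of how such a command could be assembled from the parameters above (the paths and the exact flag set used by this repository are assumptions; the flags shown are Anserini's standard BM25/RM3 options):

```python
def build_search_cmd(anserini_root, index, qry_file, res_file,
                     docs=0, terms=0, beta=0.5, parameter=0.4):
    """Build an Anserini SearchCollection command line (BM25 + optional RM3).

    A sketch only: the driver script in this repository may assemble the
    command differently, and the Axiom/PL2/LMD variants are omitted.
    """
    cmd = [
        f"{anserini_root}/target/appassembler/bin/SearchCollection",
        "-index", index,
        "-topics", qry_file,
        "-output", res_file,
        "-bm25", "-bm25.b", str(parameter),  # length-normalization parameter
    ]
    if docs > 0:  # Baseline runs leave docs/terms at 0, so no feedback flags
        cmd += ["-rm3",
                "-rm3.fbDocs", str(docs),
                "-rm3.fbTerms", str(terms),
                "-rm3.originalQueryWeight", str(beta)]
    return cmd
```

The returned list can be passed to subprocess.run, or joined into the bash command file kept in the "dum" directory.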
If you already have a res file , you may evaluate it directly.
File : eval.py
eval_performance (res_file,corpus)
Evaluates the given res_file on the 50 TREC queries using both trec_eval and cwl_eval. (AQUAINT - CORE17 - WAPO) are supported
- res_file : path of the res file (Linux path)
- corpus : ( a = AQUAINT , c = CORE17 , w = WAPO ) , used to detect the query and qrel files.
Outputs : [TrecMAP , TrecBref , TrecP10 , TrecNDCG , CWLMAP , CWLNDCG , CWLP10 , CWLRBP0.4 , CWLRBP0.6 , CWLRBP0.8]
You may use eval_performance_trec or eval_performance_cwl with the same input for tool-specific results
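trec_eval prints one measure per line as three whitespace-separated columns (measure, query id, value), with `all` rows carrying the aggregate scores. A minimal sketch of a parser for collecting those aggregates into a dict:

```python
def parse_trec_eval(output):
    """Parse trec_eval's three-column output (measure, query-id, value)
    into a {measure: value} dict, keeping only the aggregate 'all' rows."""
    scores = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] == "all":
            try:
                scores[parts[0]] = float(parts[2])
            except ValueError:
                pass  # skip non-numeric rows such as runid
    return scores
```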
eval_bias (res_file , corpus , b)
- Calculates a retrievability map [docid - r] based on the given input
- Produces 6 outputs [Gini - count of (r=0) docs - total retrievability] , for individual documents and for author cohorts.
- res_file : path of res_file (Linux path)
- corpus : ( a = AQUAINT , c = CORE17 , w = WAPO ). for detecting the query and qrel files.
- b : retrievability b value that defines the gain.
- g(d) : Gini across individual documents.
- ctr_zero(d) : count of documents with zero retrievability.
- rSum(d) : total retrievability over individual documents.
- g(a) : Gini across author cohorts.
- ctr_zero(a) : count of authors with zero retrievability.
- rSum(a) : total retrievability over author cohorts ; usually equal to rSum(d) , but provided in case it is needed.
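A sketch of the two core computations behind eval_bias, assuming the gravity-based gain f(k) = 1/k**b that is common in the retrievability literature (the exact gain function used by this repository is an assumption here):

```python
from collections import defaultdict

def retrievability(rankings, b=0.5):
    """Retrievability score r(d) accumulated over a set of ranked lists
    (one list of doc ids per query). Uses the gravity-based gain
    f(k) = 1 / k**b, where k is the 1-based rank of the document."""
    r = defaultdict(float)
    for ranked in rankings:
        for k, docid in enumerate(ranked, start=1):
            r[docid] += 1.0 / k ** b
    return dict(r)

def gini(values):
    """Gini coefficient of non-negative scores: 0 means perfectly equal
    exposure, values near 1 mean exposure concentrated on few items."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    cum = sum(i * x for i, x in enumerate(xs, start=1))  # 1-based ranks
    return 2.0 * cum / (n * total) - (n + 1.0) / n
```

Note that ctr_zero needs the full document (or author) list, since documents that never appear in the res file get r = 0 but are absent from the map; rSum is simply the sum of the map's values.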
eval_fairness (res_file , corpus , b)
- Uses a combination of :
- a calculated author-relevance map [docid - {author , rel_sum , rel_count}] , where :
- rel_sum = sum of the relevance scores in the judgement file for a specific document.
- rel_count = count of the relevance scores in the judgement file for a specific document.
- a calculated retrievability map [docid - r] based on the given input
- Produces 4 outputs [rel_sum_exposure , size_exposure , grp_exposure , rel_avg_exposure] based on author cohorts.
- res_file : path of res_file (Linux path)
- corpus : ( a = AQUAINT , c = CORE17 , w = WAPO ). for detecting the query and qrel files.
- b : retrievability b value that defines the gain.
- rel_sum_exposure (Unfairness by Relevance) : distribution of retrievability (exposure) over rel_sum scores.
- size_exposure (Unfairness by Size) : distribution of retrievability (exposure) over the number of documents written by each author.
- grp_exposure (Unfairness by Equality) : distribution of retrievability (exposure) assuming all authors are worth the same.
- rel_avg_exposure (Unfairness by Relevance) : distribution of the mean retrievability (exposure) over rel_sum.
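One plausible sketch of how such unfairness outputs can be computed: normalize each author cohort's share of total retrievability (exposure) and compare it against a target distribution (rel_sum for relevance, document count for size, equal shares for equality). The L1 distance used below is an assumption; the repository may use a different divergence:

```python
def unfairness(exposure, target):
    """Distance between the authors' shares of total exposure and a target
    share. Both arguments map author -> non-negative score. Returns the L1
    distance between the two normalized distributions (0 = perfectly fair
    with respect to that target)."""
    e_tot = sum(exposure.values()) or 1.0
    t_tot = sum(target.values()) or 1.0
    authors = set(exposure) | set(target)
    return sum(abs(exposure.get(a, 0.0) / e_tot - target.get(a, 0.0) / t_tot)
               for a in authors)

# The target distributions described above would then be:
# - by relevance : target[a] = sum of rel_sum over a's documents
# - by size      : target[a] = number of documents written by a
# - by equality  : target[a] = 1 for every author
```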