Fuzzy Jaccard Index (FUJI) implementation and data

Petkomat/fuji-score

FUJI: Fuzzy Jaccard Index: A robust comparison of ordered lists

This repository contains the code (and many feature rankings computed on over twenty real-life benchmark data sets) from the paper Fuzzy Jaccard Index: A robust comparison of ordered lists (also available on arXiv).

This code is distributed under the Creative Commons Attribution license (CC BY 4.0), so the authors would greatly appreciate it if you acknowledged its use by citing the paper above (the corresponding BibTeX entry is shown below).

@article{fuji,
    title = {Fuzzy Jaccard Index: A robust comparison of ordered lists},
    journal = {Applied Soft Computing},
    volume = {113},
    pages = {107849},
    year = {2021},
    issn = {1568-4946},
    doi = {https://doi.org/10.1016/j.asoc.2021.107849},
    url = {https://www.sciencedirect.com/science/article/pii/S1568494621007717},
    author = {Matej Petkovi\'{c} and Bla\v{z} \v{S}krlj and Dragi Kocev and Nikola Simidjievski},
    keywords = {Ordered lists, Fuzzy scores, Feature ranking, Information retrieval, Jaccard index}
}

Example

The code is easy to use and implements the FUJI score (fuzzy_jaccard), as well as all the baselines that we compare to (jaccard, hamming, pog, npog, kuncheva, wald, lustgarten, krizek, cwrel, pearson, correlation, fuzzy_gamma).

For example, once we obtain the rankings r and s, e.g.,

r = [1.0, 0.9, 0.3, 0.14, 0.1]
s = [0.8, 0.9, 0.3, 0.14, 0.1]

(where r[i] and s[i] give the importance of the i-th feature), FUJI can be computed as

curve, auc = compute_similarity(r, s, "fuzzy_jaccard")

Here, curve is a list containing the FUJI value at each point, and auc is the area under this curve. For further examples, see main.py.
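To make the (curve, auc) output shape more concrete, here is a minimal sketch of a plain, non-fuzzy Jaccard curve over top-k feature sets. It is not the repository's implementation: the function name top_k_jaccard_curve is hypothetical, treating each point of the curve as a top-k cutoff is an assumption, and the area is approximated by the mean of the curve values, which may differ from how compute_similarity defines it.

```python
def top_k_jaccard_curve(r, s):
    """Plain (non-fuzzy) Jaccard index of the top-k feature sets, for k = 1..n.

    Hypothetical helper for illustration only; not part of the repository.
    """
    if len(r) != len(s):
        raise ValueError("both rankings must score the same features")
    n = len(r)
    # Feature indices sorted by decreasing importance (ties keep original order).
    order_r = sorted(range(n), key=lambda i: -r[i])
    order_s = sorted(range(n), key=lambda i: -s[i])

    curve = []
    top_r, top_s = set(), set()
    for k in range(n):
        # Grow both top-k sets by one feature and compare them.
        top_r.add(order_r[k])
        top_s.add(order_s[k])
        curve.append(len(top_r & top_s) / len(top_r | top_s))

    # Assumption: the mean of the curve stands in for the area under it.
    auc = sum(curve) / n if n else 0.0
    return curve, auc


r = [1.0, 0.9, 0.3, 0.14, 0.1]
s = [0.8, 0.9, 0.3, 0.14, 0.1]
curve, auc = top_k_jaccard_curve(r, s)
print(curve, auc)
```

For these two rankings, the only disagreement is which feature comes first, so the plain Jaccard value is 0 at the first cutoff and 1 afterwards.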

Dependencies

The code implements many similarity scores, some of which require numpy or scipy. Optionally, tqdm can be used to show progress.

.fimp files

Each .fimp file has the following structure:

<meta data (if available)>
<fimp table>

<fimp table> consists of four columns:

  • index of the feature in the dataset
  • name of the feature
  • rank of the feature (>= 1)
  • feature relevance score

The values are tab-separated.
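The four-column layout described above could be read with a short parser along these lines. This is only a sketch under assumptions: the names FimpRow and parse_fimp_lines are hypothetical, the rank and score columns are assumed to hold a single plain number each, and any line that does not split into four tab-separated fields of the expected types is assumed to be metadata or a header and is skipped. The sample content is made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class FimpRow(NamedTuple):
    index: int    # index of the feature in the dataset
    name: str     # name of the feature
    rank: int     # rank of the feature (>= 1)
    score: float  # feature relevance score


def parse_fimp_lines(lines: Iterable[str]) -> List[FimpRow]:
    """Collect rows of the fimp table, skipping lines that do not fit it."""
    rows = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 4:
            continue  # assumed to be metadata or a blank line
        try:
            row = FimpRow(int(fields[0]), fields[1], int(fields[2]), float(fields[3]))
        except ValueError:
            continue  # assumed to be a header or other non-numeric line
        rows.append(row)
    return rows


# Hypothetical file content, not taken from the repository.
sample = [
    "some meta data\n",
    "1\tage\t2\t0.25\n",
    "2\theight\t1\t0.75\n",
]
rows = parse_fimp_lines(sample)
scores = [row.score for row in sorted(rows, key=lambda row: row.index)]
```

With a real file, the same function can be applied to an open file handle, since it only needs an iterable of lines. The resulting scores list, ordered by feature index, has the same shape as the r and s lists used in the example section.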
