gcgbarbosa/cidacs-rl-v1

Cidacs-RL

Citing:

@article{cidacsrl,
  title={CIDACS-RL: a novel indexing search and scoring-based record linkage system for huge datasets with high accuracy and scalability},
  author={George C. G. Barbosa and M. S. Ali and Bruno Rodrigues De Ara{\'u}jo and Sandra Reis and Samila Sena and Maria Yuri Ichihara and J{\'u}lia Moreira Pescarini and Rosemeire L. Fiaccone and Leila D Amorim and Robespierre Pita and Marcos Ennes Barreto and Liam Smeeth and Mauricio Lima Barreto},
  journal={BMC Medical Informatics and Decision Making},
  year={2020},
  volume={20}
}

How to run

To run Cidacs-RL, follow these steps:

  1. Clone this repository, enter the repo folder, and create a folder called assets:
git clone https://github.com/gcgbarbosa/cidacs-rl-v1.git
cd cidacs-rl-v1
mkdir assets
  2. Inside the assets folder, download the datasets:
cd assets
wget https://github.com/cidacslab/atyimo/raw/master/atyimo_spark/datasets_sample/small/DATASET_1_5K_records.csv.gz
wget https://github.com/cidacslab/atyimo/raw/master/atyimo_spark/datasets_sample/small/DATASET_2_1M_records.csv.gz
  3. Unpack and rename the files:
gunzip DATASET_1_5K_records.csv.gz
gunzip DATASET_2_1M_records.csv.gz
mv DATASET_1_5K_records.csv dsa.csv
mv DATASET_2_1M_records.csv dsb.csv
  4. Replace the separator in dsa.csv and dsb.csv from ; to , (Cidacs-RL uses the comma as its separator):
sed -i 's/;/,/g' dsa.csv
sed -i 's/;/,/g' dsb.csv
  5. Go back to the cidacs-rl-v1 folder and generate the jar file:
cd ..
mvn install
  6. Finally, run Cidacs-RL:
mvn exec:java
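
The separator replacement in the steps above rewrites every ; in the file, so it is only safe when no field already contains a comma. The sketch below is one cautious way to do the same conversion. The function name convert_separator is hypothetical and not part of Cidacs-RL, and the sample datasets are only assumed, not verified, to be free of embedded commas.

```shell
# Hypothetical helper (not part of Cidacs-RL): convert ';' to ',' only when safe.
convert_separator() {
  file="$1"
  # Refuse to run on a missing file.
  [ -f "$file" ] || { echo "error: $file not found" >&2; return 1; }
  # Abort if the file already contains commas: a blind replacement would
  # make those commas indistinguishable from the new separators.
  if grep -q ',' "$file"; then
    echo "error: $file already contains commas; not converting" >&2
    return 1
  fi
  # Write to a temporary file first, then replace the original.
  sed 's/;/,/g' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

It could be called from the assets folder as convert_separator dsa.csv and convert_separator dsb.csv in place of the two sed commands.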

After the program finishes, a folder called linkage-*date*-*time* will be created containing the files produced by the Spark run. The folder will contain multiple files. To combine them into a single csv file, run the following command inside the linkage folder:

cat * > linkage.csv
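
As a variation on the command above, the following sketch concatenates only the data files, so that a rerun does not feed an existing linkage.csv back into itself. It assumes the linkage folder follows Spark's usual output layout, with data in files named part-* next to marker files such as _SUCCESS; this has not been checked against Cidacs-RL's actual output. The function name merge_parts is hypothetical. If each part file carries its own header row, the headers will be repeated in the result; this sketch does not handle that case.

```shell
# Hypothetical helper (not part of Cidacs-RL): merge Spark part files into one CSV.
merge_parts() {
  dir="$1"
  out="$2"
  # Match only part-* files so that marker files (e.g. _SUCCESS) and any
  # previously generated linkage.csv are left out.
  set -- "$dir"/part-*
  # If the glob matched nothing, "$1" is the literal pattern and does not exist.
  [ -e "$1" ] || { echo "error: no part files in $dir" >&2; return 1; }
  # The glob expands in lexicographic order, so parts are joined in sequence.
  cat "$@" > "$out"
}
```

From inside the linkage folder, it could be called as merge_parts . linkage.csv.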

Cases:

  1. Pescarini JM, Williamson E, Nery JS, et al. Effect of a conditional cash transfer programme on leprosy treatment adherence and cure in patients from the nationwide 100 Million Brazilian Cohort: a quasi-experimental study. Lancet Infect Dis. 2020;20(5):618-627. doi:10.1016/S1473-3099(19)30624-3

  2. de Andrade KVF, Silva Nery J, Moreira Pescarini J, et al. Geographic and socioeconomic factors associated with leprosy treatment default: An analysis from the 100 Million Brazilian Cohort. PLoS Negl Trop Dis. 2019;13(9):e0007714. Published 2019 Sep 6. doi:10.1371/journal.pntd.0007714