Citing:
@article{cidacsrl,
title={CIDACS-RL: a novel indexing search and scoring-based record linkage system for huge datasets with high accuracy and scalability},
author={George C. G. Barbosa and M. S. Ali and Bruno Rodrigues De Ara{\'u}jo and Sandra Reis and Samila Sena and Maria Yuri Ichihara and J{\'u}lia Moreira Pescarini and Rosemeire L. Fiaccone and Leila D Amorim and Robespierre Pita and Marcos Ennes Barreto and Liam Smeeth and Mauricio Lima Barreto},
journal={BMC Medical Informatics and Decision Making},
year={2020},
volume={20}
}
Follow these steps:
- Clone the repository, enter the repo folder, and create a folder called assets:
git clone https://github.com/gcgbarbosa/cidacs-rl-v1.git
cd cidacs-rl-v1
mkdir assets
- Inside the assets folder, download the datasets:
cd assets
wget https://github.com/cidacslab/atyimo/raw/master/atyimo_spark/datasets_sample/small/DATASET_1_5K_records.csv.gz
wget https://github.com/cidacslab/atyimo/raw/master/atyimo_spark/datasets_sample/small/DATASET_2_1M_records.csv.gz
- Unpack and rename the files:
gunzip DATASET_1_5K_records.csv.gz
gunzip DATASET_2_1M_records.csv.gz
mv DATASET_1_5K_records.csv dsa.csv
mv DATASET_2_1M_records.csv dsb.csv
- Replace the separator in the dsa.csv and dsb.csv files from ; to , (because Cidacs-RL uses a comma as the separator):
sed -i 's/;/,/g' dsa.csv
sed -i 's/;/,/g' dsb.csv
- Go back to cidacs-rl-v1 folder and generate the jar file:
cd ..
mvn install
- Finally, run Cidacs-RL:
mvn exec:java
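The separator step above rewrites every ; as , with sed, so a field that already contains either character would shift the column count. A minimal sketch of a sanity check that could be run on dsa.csv and dsb.csv before starting the linkage; check_columns is a hypothetical helper, not part of Cidacs-RL, and it assumes the files have no quoted fields with embedded commas:

```shell
# check_columns FILE: succeed only if every line has the same number of
# comma-separated fields as the first (header) line.
# Assumption: no quoted fields containing commas; awk -F',' does not parse quotes.
check_columns() {
  awk -F',' '
    NR == 1 { expected = NF }        # remember the header width
    NF != expected {                 # report the first mismatching row and stop
      printf "line %d: %d fields, expected %d\n", NR, NF, expected
      bad = 1
      exit
    }
    END { exit bad }                 # exit status 0 when every row matched
  ' "$1"
}
```

For example, inside the assets folder: check_columns dsa.csv && check_columns dsb.csv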
After the program finishes, a folder called linkage-*date*-*time* will be created containing the files produced by the Spark run. The folder will contain multiple files. To generate a single CSV file, execute the following command inside the linkage folder:
cat * > linkage.csv
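The command above concatenates every file in the folder, so running it a second time would try to read the existing linkage.csv as input. A slightly more defensive sketch, assuming the result files follow Spark's default part-* naming (not confirmed for this project); merge_parts is a hypothetical helper name:

```shell
# merge_parts DIR OUT: concatenate Spark part files from DIR into OUT.
# Assumption: the output files follow Spark's default "part-*" naming.
merge_parts() {
  # Matching only part-* skips the _SUCCESS marker and, on a re-run,
  # a previously generated linkage.csv in the same folder.
  cat "$1"/part-* > "$2"
}
```

Inside the linkage folder this would be called as merge_parts . linkage.csv. If each part file turns out to carry its own header row, the repeated header lines would still need to be removed afterwards.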
- Pescarini JM, Williamson E, Nery JS, et al. Effect of a conditional cash transfer programme on leprosy treatment adherence and cure in patients from the nationwide 100 Million Brazilian Cohort: a quasi-experimental study. Lancet Infect Dis. 2020;20(5):618-627. doi:10.1016/S1473-3099(19)30624-3
- de Andrade KVF, Silva Nery J, Moreira Pescarini J, et al. Geographic and socioeconomic factors associated with leprosy treatment default: An analysis from the 100 Million Brazilian Cohort. PLoS Negl Trop Dis. 2019;13(9):e0007714. Published 2019 Sep 6. doi:10.1371/journal.pntd.0007714