Medical-Record-Linkage-Ensemble Paper Reproduction

Course: Deep Learning for Healthcare, Gargi Deb

This repository contains two notebooks, one for each dataset (FEBRL and ePBRN). They use the code provided by the authors of the original paper, Statistical supervised meta-ensemble algorithm for medical record linkage, to reproduce its results and claims, and build on that code for additional ablations and experiments.

Authors of the original paper: Kha Vo, Jitendra Jonnagaddala, Siaw-Teng Liaw

Resources used to reproduce results:

Original Code provided by authors:

Kha Vo and Jitendra Jonnagaddala and Siaw-Teng Liaw. (2019). Medical-Record-Linkage-Ensemble. Retrieved from https://github.com/ePBRN/Medical-Record-Linkage-Ensemble. Paper: "Statistical supervised meta-ensemble algorithm for data linkage"

Original Paper:

Kha Vo, Jitendra Jonnagaddala, Siaw-Teng Liaw, Statistical supervised meta-ensemble algorithm for medical record linkage, Journal of Biomedical Informatics, Volume 95, 2019, 103220, ISSN 1532-0464, https://doi.org/10.1016/j.jbi.2019.103220.

Requirements

All code was run successfully in the Google Colab Pro environment with Python 3.6. You will need a Google Colab Pro account to run the notebooks. Google Colab comes with most common ML packages preinstalled, so they need no additional installation. The only package used by the authors that is not preinstalled in Google Colab Pro is recordlinkage. Each notebook contains a cell that installs it when run in Google Colab Pro.

To access the datasets used by the authors, download the following files from the original authors' repository and save them to your local file system. You will upload them from there when you run the corresponding cells in the notebooks.

Download and save:

febrl4_UNSW.csv

ePBRN_D_dup.csv

ePBRN_F_dup.csv

febrl3_UNSW.csv

These files are in the repository provided by the original authors: https://github.com/ePBRN/Medical-Record-Linkage-Ensemble
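Before uploading, it can help to confirm that all four files were saved. The following is a minimal sketch, not part of the original notebooks: the helper name `missing_dataset_files` and the folder name `data` are hypothetical and only serve as an illustration.

```python
from pathlib import Path

# The four CSV files listed above, from the original authors' repository.
REQUIRED_FILES = [
    "febrl3_UNSW.csv",
    "febrl4_UNSW.csv",
    "ePBRN_F_dup.csv",
    "ePBRN_D_dup.csv",
]


def missing_dataset_files(folder):
    """Return the required file names that are not present in `folder`."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # "data" is a hypothetical folder name; use wherever you saved the files.
    missing = missing_dataset_files("data")
    print("All files found." if not missing else f"Missing: {missing}")
```

An empty list means every file is in place and ready to be uploaded when the notebook asks for it.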

Packages used:

numpy, pandas, sklearn, torch, recordlinkage
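To see which of these packages are available in a given environment, a check along the following lines may be useful. This is a sketch under the assumption that the names above are the import names; `missing_packages` is a hypothetical helper, not something the notebooks define.

```python
import importlib.util

# Import names of the packages listed above. The import name `sklearn`
# corresponds to the pip distribution `scikit-learn`.
PACKAGES = ["numpy", "pandas", "sklearn", "torch", "recordlinkage"]


def missing_packages(names=PACKAGES):
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing packages:", missing_packages() or "none")
```

In a Colab cell, a missing package can typically be installed with `!pip install <name>`, for example `!pip install recordlinkage`.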

Training & Evaluation

Each notebook has dedicated cells for training and evaluating the models. The hyperparameters are already set in those cells, so no additional setup or commands are needed beyond running the cells in Google Colab Pro.

Results

The following baseline performance results come from this reproduction of the original paper; they are not the results reported by the original authors:

FEBRL dataset (Source A):

| Model name | Precision | Recall | F-Score |
| --- | --- | --- | --- |
| SVM | 98.72% | 99.63% | 99.18% |
| NN | 96.96% | 99.43% | 99.19% |
| LR | 97.64% | 99.63% | 99.62% |

ePBRN dataset (Source B):

| Model name | Precision | Recall | F-Score |
| --- | --- | --- | --- |
| SVM | 31.78% | 98.61% | 48.07% |
| NN | 69.20% | 96.46% | 80.59% |
| LR | 59.06% | 96.84% | 73.37% |
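For reference, the relationship between the three columns can be sketched as follows, assuming the F-Score column is the standard F1 measure (the harmonic mean of precision and recall). The function names and the example counts are illustrative and do not come from the notebooks.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); inputs share one scale."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(tp, fp, fn):
    """Compute precision, recall, and F1 from match counts.

    tp: true matches predicted as matches
    fp: non-matches predicted as matches
    fn: true matches predicted as non-matches
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f_score(precision, recall)


# SVM row of the Source B table: precision 31.78%, recall 98.61%.
print(round(f_score(31.78, 98.61), 2))  # 48.07

# Hypothetical counts, for illustration only.
print(precision_recall_f(tp=90, fp=10, fn=30))
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, which is why a model with high recall but low precision still gets a modest F-score.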