Record linkage accuracy assessment using machine learning (rl-acc)

How to run

rl-acc is a supervised method for learning and assessing the accuracy of record linkage. It requires a labeled dataset. The dataset must have at least three columns: the label (y) and two columns with information from the two linked datasets.
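
To make the expected input concrete, here is a minimal sketch that reads a labeled dataset and checks that the three required columns are present. It is not part of rl-acc: the column names (y, name_a, name_b), the 1/0 label encoding, the helper name read_labeled_pairs, and the toy rows are all assumptions made for illustration.

```python
import csv
import io

# Toy rows invented for illustration only; 1 = true match, 0 = non-match (assumed encoding).
SAMPLE_CSV = """y,name_a,name_b
1,maria silva,maria silva
1,joao souza,joao sousa
0,ana costa,carlos lima
"""

# Assumed column names: the label plus one column from each linked dataset.
REQUIRED_COLUMNS = ("y", "name_a", "name_b")


def read_labeled_pairs(handle, required=REQUIRED_COLUMNS):
    """Hypothetical helper: return (label, value_a, value_b) tuples from a CSV handle."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in required if column not in header]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    label_col, col_a, col_b = required
    return [(int(row[label_col]), row[col_a], row[col_b]) for row in reader]


pairs = read_labeled_pairs(io.StringIO(SAMPLE_CSV))
print(pairs[0])
```

A file with more columns is still accepted; only the absence of one of the three required columns raises an error.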

Config file

The first step before running rl-acc is to edit the config file according to your data. Assuming you have three columns (y, name_a, and name_b, where a and b are the two linked datasets), you would have the following config file:

[DEFAULT]
DBA = a
DBB = b
IndexScore = 
Y = y

[0]
Id = name
Type = name
IndexA = name_a
IndexB = name_b
Features = jaro
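
The file appears to follow INI syntax, so Python's standard configparser module can read it. The sketch below is not the project's configLoader.py; the function name load_columns and the returned dictionary layout are hypothetical, and the Features value is extended to jaro,hamming only to show the comma-separated case.

```python
import configparser

CONFIG_TEXT = """
[DEFAULT]
DBA = a
DBB = b
IndexScore =
Y = y

[0]
Id = name
Type = name
IndexA = name_a
IndexB = name_b
Features = jaro,hamming
"""


def load_columns(text):
    """Hypothetical loader: return the label column and one dict per numbered section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    label_column = parser["DEFAULT"]["Y"]
    columns = []
    # sections() skips DEFAULT, so only the per-column sections ([0], [1], ...) are visited.
    for section in parser.sections():
        entry = parser[section]
        columns.append({
            "id": entry["Id"],
            "type": entry["Type"],
            "index_a": entry["IndexA"],
            "index_b": entry["IndexB"],
            # Split the comma-separated feature list and drop surrounding whitespace.
            "features": [name.strip() for name in entry["Features"].split(",") if name.strip()],
        })
    return label_column, columns


label_column, columns = load_columns(CONFIG_TEXT)
print(label_column, columns[0]["features"])
```

Note that configparser treats option names case-insensitively and makes every [DEFAULT] value visible in the other sections, which is why DEFAULT is read separately here.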

In the config above, Y is the name of the column that contains the label, and IndexA and IndexB are the names of the columns that hold the variable used for feature extraction in each dataset. Features lists the features extracted for this column pair; it can be a comma-separated list (e.g., jaro,hamming). Each entry in Features must be defined in featuresExtractor.py, inside the extractFeatures function.
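
As an illustration of what a feature named in Features might compute, the sketch below implements Jaro similarity and a Hamming distance for one pair of strings and dispatches on the feature name. This is not the code in featuresExtractor.py: the function extract_features, the FEATURES table, and the choice to count the length difference as mismatches in hamming are assumptions.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def hamming(s1, s2):
    """Positions that differ; extra characters of the longer string count as mismatches (assumption)."""
    differing = sum(a != b for a, b in zip(s1, s2))
    return differing + abs(len(s1) - len(s2))


# Hypothetical lookup from the names used in the Features entry to functions.
FEATURES = {"jaro": jaro, "hamming": hamming}


def extract_features(value_a, value_b, feature_names):
    """Return one number per requested feature for a pair of values."""
    return [FEATURES[name](value_a, value_b) for name in feature_names]


print(extract_features("MARTHA", "MARHTA", ["jaro", "hamming"]))
```

Each labeled pair would yield one such feature vector, which a supervised classifier can then use together with the label.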

Running

Once the config file is set up, use the main.ipynb notebook to run rl-acc. The notebook is self-explanatory. If you have any questions, please contact me at [email protected]
