CSG Data Fusion by Stephen Larroque, 2016-2019

This toolset provides several tools to unify databases from multiple sources (data fusion).

Here are descriptions of a few of the included tools (not exhaustive: each script has its own description and readme at the top):

PDF/DOC reports extractor to CSV database.
CSV databases comparison and merge (written to merge the reports database with the fmp database, but usable with any two CSV files as long as they have two columns: name and final_diagnosis).
DICOM and CSV anonymizer (Note: this is an old version. The latest anonymizer is now a standalone script with a GUI, available in a separate folder or on GitHub: lrq3000/csg_dicoms_anonymizer).
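
As an illustration of the comparison and merge step, here is a minimal sketch using pandas. It is not the code of the merger notebook: the function name compare_and_merge, the column suffixes, the conflict column and the sample rows are all made up for this example, and it assumes names match exactly in both files (the actual tool may match names differently).

```python
import io

import pandas as pd

# Made-up sample data standing in for the two CSV files (not real records).
reports_csv = io.StringIO(
    "name,final_diagnosis\nDoe John,diag_A\nSmith Anna,diag_B\n"
)
fmp_csv = io.StringIO(
    "name,final_diagnosis\nDoe John,diag_A\nSmith Anna,diag_C\nLee Kim,diag_B\n"
)


def compare_and_merge(left, right):
    """Outer-join two tables on `name` and flag diagnosis disagreements.

    Hypothetical helper: both inputs must have the columns
    `name` and `final_diagnosis`.
    """
    merged = left.merge(
        right,
        on="name",
        how="outer",                      # keep rows present in only one file
        suffixes=("_reports", "_fmp"),    # tell the two diagnosis columns apart
        indicator=True,                   # adds `_merge`: left_only/right_only/both
    )
    in_both = merged["_merge"] == "both"
    differs = merged["final_diagnosis_reports"] != merged["final_diagnosis_fmp"]
    # A conflict only makes sense when the name exists in both files.
    merged["conflict"] = in_both & differs
    return merged


reports = pd.read_csv(reports_csv)
fmp = pd.read_csv(fmp_csv)
merged = compare_and_merge(reports, fmp)
print(merged)
```

With real files, the two io.StringIO objects would be replaced by file paths passed to pd.read_csv. Rows flagged in the conflict column are the ones worth reviewing by hand before trusting the merged database.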

To use them, you need to pip install pandas (if you have installed Anaconda, you don't need to install pandas).

For the reports extractor specifically, you will also need to pip install textract. If you want to use OCR (to extract the few PDF documents that cannot be extracted otherwise), you will also need to install Tesseract OCR v3 (this is not a Python library, so you will have to grab the installer for your platform). Tesseract OCR v4 might also work, but at the time of writing it was still in alpha, so it could not be tested.
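
Before running the notebooks, it can help to check which of these dependencies are present. The following is a small sketch, not part of the toolset: check_dependencies is a hypothetical helper, and it assumes the Tesseract installer put an executable named tesseract on the PATH. It only checks for presence, not for versions.

```python
import importlib.util
import shutil


def check_dependencies(modules=("pandas", "textract"), executables=("tesseract",)):
    """Return a dict mapping each dependency name to True if it was found.

    Hypothetical helper: Python modules are looked up without importing them,
    external programs are looked up on the PATH.
    """
    status = {}
    for mod in modules:
        # find_spec returns None when a top-level module is not installed.
        status[mod] = importlib.util.find_spec(mod) is not None
    for exe in executables:
        # shutil.which returns None when the program is not on the PATH.
        status[exe] = shutil.which(exe) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A missing tesseract entry only matters if OCR is needed; the other tools do not depend on it.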

You might need additional libraries depending on the script, but the author tried his best to keep the number of dependencies to a minimum (to avoid complicating usage and to avoid scripts breaking when dependencies are updated).

Folders and files:
csg_fileutil_libs
.gitignore
LICENSE
README.md
_version.py
csg_datafusion_db_merger.ipynb
csg_datafusion_db_merger_old.ipynb
csg_datafusion_deanonymizer.ipynb
csg_datafusion_dicoms_anonymizer.ipynb
csg_datafusion_dicoms_extract.ipynb
csg_datafusion_ecg_db_generator.ipynb
csg_datafusion_extract_fields_from_reports.ipynb
csg_datafusion_fmp_db_cleaner.ipynb
csg_datafusion_sarah_db_cleaner.ipynb
csg_datafusion_shortendb.ipynb
csg_datafusion_stats_analysis_fmp_dicoms_db.ipynb
csg_datafusion_stats_analysis_fmp_dicoms_db_acute.ipynb

GIGA-Consciousness/csg_datafusion
