caretech-owl/Text-De-Identifizierer


Text De-Identifizierer

This project provides a script for the automatic removal of direct personal identifiers from PDF, DOCX, TXT, and log files. Note that masking direct identifiers does not amount to full anonymization in the sense of the GDPR. However, it can be one of the technical measures taken to protect personal data.
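To illustrate what masking direct identifiers can look like, here is a minimal sketch based on regular expressions. It is not the method used by this project: the patterns, the placeholder labels, and the function name `mask_direct_identifiers` are assumptions chosen for illustration, and they cover only e-mail addresses and phone-like digit sequences.

```python
import re

# Hypothetical patterns for illustration only; the project's own rules may differ.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d /-]{6,}\d"),
}


def mask_direct_identifiers(text: str) -> str:
    """Replace every pattern match with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(mask_direct_identifiers("Contact: jane.doe@example.org, phone +49 521 1234567."))
# prints: Contact: [EMAIL], phone [PHONE].
```

Names, addresses, and other free-text identifiers are much harder to catch with fixed patterns, which is one reason such masking alone is not a full anonymization scheme.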

Installation

We assume that Python and Git are installed and that you have basic knowledge of both tools. The commands below install the requirements in a Python virtual environment; they have been tested on Linux.

git clone https://github.com/caretech-owl/Text-De-Identifizierer
cd Text-De-Identifizierer
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Usage

For de-identifying a single file:

source venv/bin/activate
python anonymize.py path/to/file

For de-identifying files in a directory:

source venv/bin/activate
python anonymize.py path/

Results are saved as TXT files in a directory called output.
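After a run, the results can be inspected from Python. The following is a minimal sketch that assumes the result files carry a `.txt` extension and that the output directory sits in the current working directory; the helper name `list_results` is hypothetical and not part of the project.

```python
from pathlib import Path


def list_results(output_dir: str = "output") -> list[str]:
    """Return the names of the .txt files in output_dir, sorted alphabetically."""
    directory = Path(output_dir)
    if not directory.is_dir():
        # Nothing has been written yet, or the script ran from another directory.
        return []
    return sorted(p.name for p in directory.glob("*.txt"))


for name in list_results():
    print(name)
```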

For simple testing, we added small examples in the examples folder. Give it a try:

source venv/bin/activate
python anonymize.py examples/
