periodicals-analysis

Description

This repository implements optical character recognition (OCR), natural-language processing (NLP), and topic modeling for historical organic periodicals.

Requirements

Python 3.6+
tesseract-ocr (https://github.com/tesseract-ocr/tesseract)
imagemagick
nltk
scikit-learn
jupyter
pyLDAvis (for visualizing models)

I recommend using virtualenv or virtualenvwrapper to install and manage the Python modules.
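Before running anything, it can help to confirm that the requirements listed above are actually visible from the active environment. The following is a minimal sketch, not part of perysis: the function name check_requirements is hypothetical, and the executable names (tesseract, magick or convert, jupyter) and import names (sklearn for scikit-learn) are assumptions about a typical installation.

```python
import importlib.util
import shutil

# Executables to look for on PATH (assumed names).
# ImageMagick is usually exposed as "magick" (v7) or "convert" (v6).
BINARIES = {
    "tesseract": ["tesseract"],
    "imagemagick": ["magick", "convert"],
    "jupyter": ["jupyter"],
}

# Import names of the Python packages; scikit-learn is imported as "sklearn".
MODULES = ["nltk", "sklearn", "pyLDAvis"]


def check_requirements():
    """Return a dict mapping each requirement name to True if it was found."""
    found = {}
    for name, candidates in BINARIES.items():
        # A requirement counts as present if any candidate executable is on PATH.
        found[name] = any(shutil.which(c) is not None for c in candidates)
    for module in MODULES:
        # find_spec returns None when a top-level module cannot be located.
        found[module] = importlib.util.find_spec(module) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Note that on Windows a system utility is also called convert, so a "found" result for imagemagick there is only a hint, not proof.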

Install

Use pip to install directly from GitHub:

pip install git+https://github.com/cloudbopper/perysis.git@master#egg=perysis

Add the '-e' flag after 'pip install' for an editable install.

Running

Preprocessing OCR-ed text

python -m perysis.preprocess -input_dir documents_raw -output_dir documents_processed
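This README does not describe what perysis.preprocess does internally. As a rough illustration only, the sketch below shows the kind of cleanup commonly applied to OCR output before topic modeling: rejoining words hyphenated across line breaks, lowercasing, stripping non-letters, and dropping very short tokens. The functions clean_ocr_text and preprocess_dir are hypothetical and are not the package's API; the assumption that documents are stored as .txt files is also mine.

```python
import re
from pathlib import Path


def clean_ocr_text(text):
    """Apply a few generic cleanup steps to raw OCR output (illustrative only)."""
    # Rejoin words hyphenated across a line break: "periodi-\ncal" -> "periodical".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Lowercase, then replace anything that is not a letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Drop one- and two-letter tokens, which in OCR output are often noise.
    tokens = [tok for tok in text.split() if len(tok) > 2]
    return " ".join(tokens)


def preprocess_dir(input_dir, output_dir):
    """Clean every .txt file in input_dir and write the result to output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(input_dir).glob("*.txt")):
        raw = path.read_text(encoding="utf-8", errors="replace")
        (out / path.name).write_text(clean_ocr_text(raw), encoding="utf-8")
```

For example, clean_ocr_text("The periodi-\ncal of 1923, on Soil & Health!") returns "the periodical soil health". Dropping short tokens is a blunt filter; a stop-word list (for instance from nltk) would be a more selective alternative.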

License

periodicals-analysis is free, open source software, released under the MIT license. See LICENSE for details.

Contact

Akshay Sood
