This repository has been archived by the owner on May 11, 2021. It is now read-only.

Decrease reliance on non-Python APIs #2

Open
jczaplew opened this issue Aug 20, 2018 · 0 comments

Comments

@jczaplew
Contributor

jczaplew commented Aug 20, 2018

The OCR step could be streamlined somewhat by using a Python binding for Tesseract, such as tesserocr or pyocr, instead of shell scripts.
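As a rough illustration of what that might look like, here is a minimal sketch that OCRs already-rendered page images from Python. It assumes tesserocr is installed and uses its `file_to_text` helper; the names `tesserocr_engine` and `ocr_pages` are hypothetical and not part of this repository. The engine is passed in as a callable, so pyocr or anything else could be swapped in.

```python
from typing import Callable, Dict, Iterable


def tesserocr_engine(image_path: str) -> str:
    """OCR a single image file with tesserocr.

    The import is lazy so this module still loads when tesserocr
    (and the Tesseract language data it needs) is not installed.
    """
    import tesserocr  # assumption: the tesserocr package is available

    return tesserocr.file_to_text(image_path)


def ocr_pages(
    image_paths: Iterable[str],
    engine: Callable[[str], str] = tesserocr_engine,
) -> Dict[str, str]:
    """Return a mapping of image path -> recognized text.

    `engine` is any callable taking a path and returning text.
    """
    return {path: engine(path) for path in image_paths}
```

Calling `ocr_pages(["page-1.png", "page-2.png"])` would replace a shell loop over `tesseract`, with the page file names here being placeholders.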

Additionally, it would be great if there were a way to extract entities from a PDF without needing to run preprocess.sh to convert each page to an image and run tesseract on it.

Ghostscript might be an option: https://stackoverflow.com/a/36113000/1956065
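For reference, a sketch of how Ghostscript could be driven from Python to pull text straight out of a PDF. This assumes the `gs` binary is on the PATH and uses Ghostscript's `txtwrite` device; whether that matches the approach in the linked answer is an assumption, and `gs_text_command` and `extract_pdf_text` are hypothetical names. It only helps for PDFs that already carry a text layer; scanned pages would still need OCR.

```python
import subprocess
from typing import List


def gs_text_command(pdf_path: str, gs_binary: str = "gs") -> List[str]:
    """Build a Ghostscript command that writes a PDF's text to stdout."""
    return [
        gs_binary,
        "-q",                 # suppress startup banner
        "-dNOPAUSE",          # do not pause between pages
        "-dBATCH",            # exit after processing the file
        "-sDEVICE=txtwrite",  # text-extraction output device
        "-sOutputFile=-",     # "-" sends output to stdout
        pdf_path,
    ]


def extract_pdf_text(pdf_path: str, gs_binary: str = "gs") -> str:
    """Run Ghostscript and return the extracted text.

    Raises subprocess.CalledProcessError if Ghostscript exits non-zero.
    """
    result = subprocess.run(
        gs_text_command(pdf_path, gs_binary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

This still shells out to a non-Python tool, so it trades the per-page image conversion for a single Ghostscript call rather than removing the external dependency.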
