redo_ocr_PDF some version conflict #1
Well, clearly I should be catching more errors earlier in the script. :-) I think the problem traces back to the script being unable to open the output file, which I suspect is due to Apple's file system protections. Can you try writing the output to a file in your Documents folder? It looks like you're using a location in the root directory right now.
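Here is a minimal sketch of what catching this earlier might look like. It is not code from redo_ocr_PDF.sh, and `check_output_dir` is a hypothetical name. The function tries to create a throwaway file in the chosen output directory and stops with a clear message if it can't. It does a real write instead of only testing permission bits, on the assumption that some OS-level restrictions may not show up in `test -w`.

```shell
# Hypothetical helper (not part of redo_ocr_PDF.sh): stop early if we
# cannot actually create a file in the chosen output directory.
check_output_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "Error: output directory '$dir' does not exist" >&2
        return 1
    fi
    # Attempt a real write, since permission bits alone may not reflect
    # every restriction the OS applies.
    if ! probe=$(mktemp "$dir/.write_test.XXXXXX" 2>/dev/null); then
        echo "Error: cannot create files in '$dir'" >&2
        return 1
    fi
    # Remove the probe file so the directory is left as we found it.
    rm -f "$probe"
    return 0
}
```

Called near the top of the script, e.g. `check_output_dir "$HOME/Documents" || exit 1`, this would turn a confusing failure later on into an immediate, readable error.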
Thanks for your reply!
I'm going on this first report of an error:
AFAICT, it looks like the script is trying to create a file in the root dir (
I doubt that's a problem. I'm using basic ocrmypdf functionality, and this has worked for me through a few versions. I'm also not sure pdfminer is needed. If it still doesn't work, try this from the command line and let me know what it says: PS: I've also removed the Mac-specific notification at the end.
That was definitely heading in the right direction.
to make it work. I also changed:
to
and
to
so that the file would be created in the temp folder. I guess the tmpdir should ideally also be removed by the script after everything has finished, but for now I do it manually. The pdfminer issue was a temporary problem caused by an old version of ocrmypdf, which I was able to fix. The script now runs without errors, but unfortunately it still does not give me a final.pdf (not in the temp folder, not in the script's working directory, and not in the original file's folder).
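Here is a sketch of the automatic cleanup mentioned above. It assumes the script keeps its intermediate files in a variable called `tmpdir`, and the `redo_ocr.XXXXXX` template is made up. The directory is created with `mktemp -d`, and an `EXIT` trap deletes it when the script finishes.

```shell
#!/bin/sh
# Sketch: create a private temp directory and delete it when the script exits.
set -eu

# The "redo_ocr" prefix is illustrative; mktemp replaces the X's with random characters.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/redo_ocr.XXXXXX")

# The EXIT trap runs when the script ends normally, calls exit,
# or stops because a command failed under set -e.
trap 'rm -rf "$tmpdir"' EXIT

# ... OCR steps would write their intermediate files into "$tmpdir" here ...
echo "working in $tmpdir"
```

With this in place the manual cleanup isn't needed. The trade-off is that the intermediate files are gone when you want to debug, so you might comment out the `trap` line while testing.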
I'm not sure how
Also, if you just add a "/" to the end of your tmpdir string, you can leave the other commands alone. But if you want to test, maybe set it to a known folder so you can make sure everything else is working OK.
The text-only file should be fairly small, since all it contains is the text. 4.9k is certainly very small, though I don't know what your file looks like.
The output file isn't called "final.pdf"; it gets a date-stamped version of the original file's name, like
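This shows why the trailing "/" matters when the script builds paths by gluing strings together. The folder and file names are made up for the example. Without the slash, the file lands next to the folder rather than inside it.

```shell
#!/bin/sh
# Illustrative paths only; these are not the script's real variable values.
tmpdir_no_slash="/tmp/ocrwork"
tmpdir_slash="/tmp/ocrwork/"

echo "${tmpdir_no_slash}text.pdf"   # prints /tmp/ocrworktext.pdf (beside the folder)
echo "${tmpdir_slash}text.pdf"      # prints /tmp/ocrwork/text.pdf (inside the folder)
```

Writing `"$tmpdir/text.pdf"` everywhere is a slightly more robust habit. An extra slash such as `/tmp/ocrwork//text.pdf` is harmless, while a missing one silently changes where the file goes.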
Hi Jmuccigr,
your redo_ocr script looks really interesting, and I would love to use it for all the JSTOR PDFs sitting on my hard drive that I cannot properly annotate because of the poor OCR. Unfortunately, there seems to be some version conflict that I cannot solve on my own. Could you maybe help me out with some advice?
This is the error I get when I run redo_ocr_PDF.sh: