hocr-cut gives error #154

Open
sarangtc opened this issue Aug 28, 2019 · 6 comments


@sarangtc

hocr-cut.py gives the following error:

Traceback (most recent call last):
  File "../hocr-cut.py", line 48, in <module>
    filename = os.path.join(os.path.dirname(args.file), filename)
  File "/usr/lib/python2.7/posixpath.py", line 68, in join
    if b.startswith('/'):
AttributeError: 'NoneType' object has no attribute 'startswith'

@zuphilip
Collaborator

The message refers to line 48:

https://github.com/tmbdev/hocr-tools/blob/b3e380779e5c88ad99dca2a6b8b292c0f375fd68/hocr-cut#L48

What is the exact hocr-cut command you are running? Can you share the hOCR file here?

@sarangtc
Author

sarangtc commented Aug 31, 2019 via email

@zuphilip
Collaborator

zuphilip commented Sep 1, 2019

The pip package is out of date, which is why hocr-cut was not found at first. Try installing directly from the repository instead:

pip install git+https://github.com/tmbdev/hocr-tools.git

However, I am not sure this will solve your problems...

Your example file is not attached to this issue on GitHub (I guess attachments are dropped when you reply by email). Can you upload it directly to this issue on GitHub? Or upload it somewhere like https://pastebin.com/ and post the link here.

@sarangtc
Author

sarangtc commented Sep 1, 2019

here is the file
test_0012.txt

@zuphilip
Collaborator

zuphilip commented Sep 1, 2019

Okay, I see that you have not specified the image in your hOCR file on line 13. Try changing this line to something like

<div class='ocr_page' lang='unknown' title='image IMAGENAME.PNG; bbox 0 0 6169 4648'>

where you should replace IMAGENAME.PNG with the name of your image file. Does that work?

(We can try to make a better error message for this.)
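As a rough idea of what a friendlier check could look like, here is a minimal sketch. It is not the actual hocr-cut code: the helper names `page_image_from_title` and `resolve_image_path` are made up for illustration, and it assumes the `ocr_page` title is a semicolon-separated list of properties as in the example line above.

```python
import os


def page_image_from_title(title):
    """Return the file name from the `image` property of an ocr_page title.

    Returns None when the title has no `image` property, which is the
    situation that ends in the AttributeError reported above.
    (Hypothetical helper, not part of hocr-tools.)
    """
    for prop in title.split(";"):
        prop = prop.strip()
        if prop.startswith("image "):
            # Drop the key, then any surrounding whitespace or quotes.
            return prop[len("image "):].strip().strip("\"'")
    return None


def resolve_image_path(hocr_path, title):
    """Join the hOCR file's directory with the page image name.

    Raises a readable ValueError instead of passing None on to
    os.path.join. (Hypothetical helper, not part of hocr-tools.)
    """
    filename = page_image_from_title(title)
    if filename is None:
        raise ValueError(
            "no 'image' property found in the ocr_page title of %s; "
            "expected something like "
            "title='image IMAGENAME.PNG; bbox 0 0 6169 4648'" % hocr_path
        )
    return os.path.join(os.path.dirname(hocr_path), filename)
```

With a title such as `bbox 0 0 6169 4648` (no image), `resolve_image_path` raises the ValueError with a hint about the expected format rather than failing inside `posixpath.join`.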

@sarangtc
Author

sarangtc commented Sep 1, 2019

OK, that worked.
It gave me myimage.left.jpg and myimage.right.jpg.
I was primarily expecting two hOCR files, one for each half
(to be merged later with the images to make the hocr-pdf).

I assumed this from the description:
Cut a page (horizontally) into two pages in the middle
such that the most of the bounding boxes are separated
nicely, e.g. cutting double pages or double columns

I guess you meant the image itself and not the hocr file !!!
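For anyone who needs the hOCR side split as well, the following is only a sketch of the idea, not something hocr-cut does. It assumes the cut position `cut_x` (in pixels) is already known, takes the `title` attribute strings of the elements as input, and uses made-up helper names (`parse_bbox`, `split_by_cut`). Boxes entirely left of the cut stay as they are, boxes entirely right of it are shifted so they are relative to the right-hand image, and boxes crossing the cut are reported separately.

```python
import re

# Matches the four integers of an hOCR "bbox x0 y0 x1 y1" property.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from a title string, or None if absent."""
    match = BBOX_RE.search(title)
    if match is None:
        return None
    return tuple(int(value) for value in match.groups())


def split_by_cut(titles, cut_x):
    """Partition bounding boxes into left, right, and straddling lists.

    Boxes on the right are translated by -cut_x so that their
    coordinates refer to the right-hand half image.
    """
    left, right, straddling = [], [], []
    for title in titles:
        bbox = parse_bbox(title)
        if bbox is None:
            continue  # element without a bbox: nothing to assign
        x0, y0, x1, y1 = bbox
        if x1 <= cut_x:
            left.append(bbox)
        elif x0 >= cut_x:
            right.append((x0 - cut_x, y0, x1 - cut_x, y1))
        else:
            straddling.append(bbox)
    return left, right, straddling
```

Writing the two halves back out as valid hOCR files (including a new `ocr_page` line with the matching image name and bbox for each half) is left out here.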
