region segmentation crashes #62

Closed
EEngl52 opened this issue Aug 20, 2020 · 10 comments · Fixed by #61

Comments


EEngl52 commented Aug 20, 2020

ocrd-cis-ocropy-segment crashed completely on this picture with the following workflow:
ocrd-cis-ocropy-binarize|MAX|OCR-D-BIN1| | |ERROR
ocrd-anybaseocr-crop|OCR-D-BIN1|OCR-D-CROP| | |ERROR
ocrd-olena-binarize|OCR-D-CROP|OCR-D-BIN| | |ERROR
ocrd-cis-ocropy-deskew|OCR-D-BIN|OCR-D-DESKEW| | /test/data/ocrd/taverna/models/param-cis-deskew-page.json |ERROR
ocrd-cis-ocropy-denoise|OCR-D-DESKEW|OCR-D-DENOISE| | |ERROR
ocrd-cis-ocropy-segment|OCR-D-DENOISE|OCR-D-SEG-REGION| | /test/data/ocrd/taverna/models/param-cis-seg-page.json |ERROR

Contributor

cneud commented Aug 20, 2020

ocrd-cis-ocropy-segment --mets /test/data/almahide/mets.xml --working-dir /test/data/almahide --input-file-grp OCR-D-DENOISE --output-file-grp OCR-D-SEG-REGION --parameter /test/data/ocrd/taverna/models/param-cis-seg-page.json --log-level ERROR
19:14:52.753 INFO root - Overriding log level globally to ERROR
19:29:35.092 ERROR shapely.geos - TopologyException: Input geom 0 is invalid: Self-intersection at or near point -1 561 at -1 561
Traceback (most recent call last):
  File "/home/habocr/newinstallation/ocrd_all/venv/bin/ocrd-cis-ocropy-segment", line 8, in <module>
    sys.exit(ocrd_cis_ocropy_segment())
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/cli.py", line 54, in ocrd_cis_ocropy_segment
    return ocrd_cli_wrap_processor(OcropySegment, *args, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd/decorators.py", line 102, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd/processor/base.py", line 61, in run_processor
    processor.process()
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 306, in process
    page_id, file_id, zoom, rogroup=rogroup)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 579, in _process_element
    line_polygon = polygon_for_parent(line_polygon, region)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 676, in polygon_for_parent
    interp = childp.intersection(parentp)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/shapely/geometry/base.py", line 649, in intersection
    return geom_factory(self.impl['intersection'](self, other))
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/shapely/topology.py", line 70, in __call__
    self._check_topology(err, this, other)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/shapely/topology.py", line 38, in _check_topology
    self.fn.__name__, repr(geom)))
shapely.errors.TopologicalError: The operation 'GEOSIntersection_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.polygon.Polygon object at 0x7f12f8f12828>

Collaborator

bertsky commented Aug 20, 2020

Sorry, cannot reproduce yet. My segmentation runs correctly. Can you tell me the parameters you used in that workflow? And which versions (esp. ocrd_anybaseocr and ocrd_olena)?

Author

EEngl52 commented Aug 21, 2020

thanks for looking into this so quickly!
I'm using ocrd_all natively, last commit ca24263 (ocrd-olena-binarize version 1.2.0 and ocrd-anybaseocr-crop version 0.0.5). I didn't specify any parameters except region level for ocrd-cis-ocropy-deskew and ocrd-cis-ocropy-segment.

Collaborator

bertsky commented Aug 21, 2020

I'm using ocrd_all natively, last commit ca24263 (ocrd-olena-binarize version 1.2.0 and ocrd-anybaseocr-crop version 0.0.5). I didn't specify any parameters except region level for ocrd-cis-ocropy-deskew and ocrd-cis-ocropy-segment.

Okay, sadly, ocrd_all hasn't updated the modules for some time now (because we want to migrate all OCR-D/spec#164 changes at once). So the difference might be caused by ocrd-cis-ocropy-binarize's DPI zoom change, checking...

Collaborator

bertsky commented Aug 21, 2020

I'm using ocrd_all natively, last commit ca24263 (ocrd-olena-binarize version 1.2.0 and ocrd-anybaseocr-crop version 0.0.5). I didn't specify any parameters except region level for ocrd-cis-ocropy-deskew and ocrd-cis-ocropy-segment.

Okay, sadly, ocrd_all hasn't updated the modules for some time now (because we want to migrate all OCR-D/spec#164 changes at once). So the difference might be caused by ocrd-cis-ocropy-binarize's DPI zoom change, checking...

Cannot reproduce with current ocrd/all:maximum (built from OCR-D/ocrd_all@5413688 which has identical submodules to your native OCR-D/ocrd_all@ca24263).

Author

EEngl52 commented Aug 21, 2020

OK, I tried it again now. For whatever reason it works if I only process this single image, but it keeps failing if I try to process the whole book.
mets.zip

Collaborator

bertsky commented Aug 21, 2020

ok, tried it again now. For whatever reason it works if I only process this single image but it keeps failing if I try to process the whole book

all I need is an image file which fails …

Author

EEngl52 commented Aug 21, 2020

file-max-idp140325664

Collaborator

bertsky commented Aug 21, 2020

But that's the same as above!

To test your hypothesis that it happens only on non-first pages in a sequence, I re-added this page as another. Cannot reproduce it with this setup.

As to your METS: I need the images of course! I can see only local JPG references in MAX, but DEFAULT has some remote URLs. Is that the right fileGrp?

Collaborator

bertsky commented Sep 14, 2020

Thanks @EEngl52 for sharing the METS and images! I can reproduce now. The problem seems to be an instance of what I described in point 3 of OCR-D/ocrd_segment#43 – namely that rounding (here: when converting the line polygon from relative to absolute coordinates via coordinate_for_segment) can make a valid Polygon shape invalid (self-intersect). I will try to make an analogous fix here.
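To illustrate the effect described above, here is a minimal sketch in plain Python (no Shapely required) of how rounding coordinates to integers can turn a simple ring into a self-intersecting one. The coordinates and the helper names (`orient`, `segments_cross`, `ring_self_intersects`) are made up for this illustration; they are not taken from ocrd_cis or from the page that failed, and the check only detects proper edge crossings, which is narrower than the validity check Shapely performs via `Polygon.is_valid`.

```python
from itertools import combinations


def orient(a, b, p):
    """Twice the signed area of triangle (a, b, p): >0 left of ab, <0 right, 0 collinear."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (mere touching is not counted)."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def ring_self_intersects(points):
    """True if any two non-adjacent edges of the closed ring properly cross."""
    n = len(points)
    edges = [(points[i], points[(i + 1) % n]) for i in range(n)]
    for i, j in combinations(range(n), 2):
        # Neighbouring edges share a vertex by construction, so skip them.
        if j == i + 1 or (i == 0 and j == n - 1):
            continue
        if segments_cross(*edges[i], *edges[j]):
            return True
    return False


# Hypothetical ring: the vertex (5, 0.4) sits just above the nearly flat
# edge from (0, 0) to (10, 0.6), so no edges cross.
relative = [(0, 0), (10, 0.6), (10, 10), (5, 0.4), (0, 10)]

# Rounding moves that edge up, to end at (10, 1), and the vertex down,
# to (5, 0), so the vertex now lies on the other side of the edge.
rounded = [(round(x), round(y)) for x, y in relative]

print(ring_self_intersects(relative))
print(ring_self_intersects(rounded))
```

The first call should report no crossing and the second should report one, which mirrors the "Self-intersection" that GEOS complains about in the traceback. How the processor should repair or avoid such shapes is a separate question that this sketch does not address.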
