Version: included in Docker Image ocrd/all from 2020-08-04 (docker image id: 158ea3d64eae)
Current Behavior:
When executing something like:
docker run --rm -u "40366" -w /data -v "/home/aqayv/project/ulb-it-migration/WORKSPACE_OCR/203074":/data -v /usr/share/tesseract-ocr/4.00/tessdata:/usr/local/share/tessdata/ ocrd/all:2020-08-04 ocrd-make -f ulb-ocrd-vd18-02.mk .
the following output is produced:
make: Entering directory '/data'
make -R -C . -I /data/ -f /data/ulb-ocrd-vd18-02.mk 2>&1 | tee ..ulb-ocrd-vd18-02.log
make[1]: Entering directory '/data'
building OCR-D-SEGMENT-OCROPY from OCR-D-CLIP with pattern rule for ocrd-cis-ocropy-segment
STAMP=`test -e OCR-D-SEGMENT-OCROPY && date -Ins -r OCR-D-SEGMENT-OCROPY`; ocrd-cis-ocropy-segment -I OCR-D-CLIP -p OCR-D-SEGMENT-OCROPY.json -O OCR-D-SEGMENT-OCROPY --overwrite 2>&1 | tee OCR-D-SEGMENT-OCROPY.log && touch -c OCR-D-SEGMENT-OCROPY || { if test -z "$STAMP"; then rm -fr OCR-D-SEGMENT-OCROPY; else touch -c -d "$STAMP" OCR-D-SEGMENT-OCROPY; fi; false; }
05:42:29.063 WARNING matplotlib - Matplotlib created a temporary config/cache directory at /.config/matplotlib because the default path (/tmp/matplotlib-ib2pg3_l) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
05:42:39.158 ERROR shapely.geos - TopologyException: Input geom 1 is invalid: Self-intersection at or near point 238 1073 at 238 1073
Traceback (most recent call last):
  File "/usr/bin/ocrd-cis-ocropy-segment", line 8, in <module>
    sys.exit(ocrd_cis_ocropy_segment())
  File "/usr/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd_cis/ocropy/cli.py", line 54, in ocrd_cis_ocropy_segment
    return ocrd_cli_wrap_processor(OcropySegment, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd/decorators.py", line 102, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd/processor/base.py", line 61, in run_processor
    processor.process()
  File "/usr/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 387, in process
    region.id, file_id + '_' + region.id, zoom)
  File "/usr/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 653, in _process_element
    line_polygon = polygon_for_parent(line_polygon, element)
  File "/usr/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 676, in polygon_for_parent
    interp = childp.intersection(parentp)
  File "/usr/lib/python3.6/site-packages/shapely/geometry/base.py", line 649, in intersection
    return geom_factory(self.impl['intersection'](self, other))
  File "/usr/lib/python3.6/site-packages/shapely/topology.py", line 70, in __call__
    self._check_topology(err, this, other)
  File "/usr/lib/python3.6/site-packages/shapely/topology.py", line 38, in _check_topology
    self.fn.__name__, repr(geom)))
shapely.errors.TopologicalError: The operation 'GEOSIntersection_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.polygon.Polygon object at 0x7fca99544160>
Makefile:320: recipe for target 'OCR-D-SEGMENT-OCROPY' failed
make[1]: *** [OCR-D-SEGMENT-OCROPY] Error 1
make[1]: Leaving directory '/data'
make: *** [.] Error 2
Makefile:205: recipe for target '.' failed
make: Leaving directory '/data'
Expected Behavior:
Please do not crash, but log an error and move on gracefully.
This looks similar to #62 and OCR-D/ocrd_tesserocr#149. I'd very much like to hunt this down, but the problem lies with the producers of invalid coordinates; we cannot make each and every consuming processor robust to that kind of error.
Looking into your workflow and PAGE results, there's a self-intersection in TextRegion region0010 with 238,1073 240,1935 1929,1931 1927,932 1719,932 1719,909 238,913 238,936 238,1074 238,1073 (see the last 2 points). That region was introduced by ocrd-segment-repair (when reducing overlaps from bbox to polygon). I'll try to transfer the issue there and see what I can do.
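For anyone who wants to reproduce the diagnosis outside the workflow, here is a minimal sketch that feeds the points of region0010 to shapely and asks whether the polygon is valid. The helper names parse_points and check_polygon are made up for this illustration and are not part of ocrd_cis or ocrd-segment-repair; only Polygon, is_valid and explain_validity come from shapely itself.

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity

# Points of TextRegion region0010, copied from the comment above.
POINTS = ("238,1073 240,1935 1929,1931 1927,932 1719,932 "
          "1719,909 238,913 238,936 238,1074 238,1073")


def parse_points(points):
    """Turn a PAGE-style 'x,y x,y ...' string into a list of (x, y) tuples.

    Hypothetical helper for this sketch, not an OCR-D API.
    """
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def check_polygon(points):
    """Return (is_valid, reason) for the polygon described by `points`.

    Hypothetical helper for this sketch, not an OCR-D API.
    """
    polygon = Polygon(parse_points(points))
    return polygon.is_valid, explain_validity(polygon)


valid, reason = check_polygon(POINTS)
print("valid:", valid)
print("reason:", reason)
```

The edge from 238,936 to 238,1074 is immediately followed by a step back to 238,1073 along the same vertical line, so the outline overlaps itself there. A check like this on the producing side (before the coordinates are written to PAGE) would catch the problem where it originates, rather than in every consumer.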
2020-09-10-bug-203074.zip