ocrd-anybaseocr-tiseg not applying default wiring #54

sepastian · 2020-03-13T09:41:02Z

The --help of ocrd-anybaseocr-tiseg states a default wiring of ['OCR-D-IMG-CROP'] -> ['OCR-D-SEG-TISEG'].

root@38fa7aad0b43:/data/ocrd_workspace# ocrd-anybaseocr-tiseg --help
Using TensorFlow backend.

Usage: ocrd-anybaseocr-tiseg [OPTIONS]
  
  separate text and non-text part with anyBaseOCR

Options:
  -V, --version                   Show version
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -J, --dump-json                 Dump tool description as JSON and exit
  -p, --parameter TEXT            Parameters, either JSON string or path
                                  JSON file
  -g, --page-id TEXT              ID(s) of the pages to process
  -O, --output-file-grp TEXT      File group(s) used as output.
  -I, --input-file-grp TEXT       File group(s) used as input.
  -w, --working-dir TEXT          Working Directory
  -m, --mets TEXT                 METS to process
  -h, --help                      This help message

Parameters:
  "operation_level" [string - page] PAGE XML hierarchy level to operate
      on Possible values: ["page", "region", "line"]

Default Wiring:
  ['OCR-D-IMG-CROP'] -> ['OCR-D-SEG-TISEG']

The workspace contains a file group named OCR-D-IMG-CROP, and a corresponding folder exists.

root@38fa7aad0b43:/data/ocrd_workspace# ls -1
OCR-D-BINPAGE
OCR-D-CROP
OCR-D-DESKEW
OCR-D-IMG
OCR-D-IMG-BIN
OCR-D-IMG-CROP
OCR-D-IMG-DESKEW
mets.xml

I would expect that running ocrd-anybaseocr-tiseg without specifying file groups would default to using OCR-D-IMG-CROP as input and OCR-D-SEG-TISEG as output. However, the program fails with the following error, because it is using the non-existent file groups INPUT and OUTPUT as input and output.

root@38fa7aad0b43:/data/ocrd_workspace# ocrd-anybaseocr-tiseg -m mets.xml 
Using TensorFlow backend.
09:22:34.382 INFO ocrd.workspace_validator - input_file_grp=['INPUT'] output_file_grp=['OUTPUT']
Traceback (most recent call last):
  File "/usr/bin/ocrd-anybaseocr-tiseg", line 8, in <module>
    sys.exit(ocrd_anybaseocr_tiseg())
  File "/usr/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd_anybaseocr/cli/cli.py", line 37, in ocrd_anybaseocr_tiseg
    return ocrd_cli_wrap_processor(OcrdAnybaseocrTiseg, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd/decorators.py", line 53, in ocrd_cli_wrap_processor
    raise Exception("Invalid input/output file grps:\n\t%s" % '\n\t'.join(report.errors))
Exception: Invalid input/output file grps:
        Input fileGrp[@USE='INPUT'] not in METS!

From what I can tell, this is due to class OcrdAnybaseocrTiseg(Processor) not overriding input_file_grp and output_file_grp in __init__, along the lines of:

kwargs['input_file_grp'] = 'OCR-D-IMG-CROP'
kwargs['output_file_grp'] = 'OCR-D-SEG-TISEG'
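
To make the idea concrete, here is a minimal, self-contained sketch of such an override. The Processor class below is a hypothetical stand-in written for this example, not the real ocrd base class; the keyword names input_file_grp and output_file_grp are taken from the log line above, and the INPUT/OUTPUT defaults mirror the values shown there. Whether this would actually take effect is an open question: the traceback shows the error being raised inside ocrd_cli_wrap_processor, so the CLI wrapper may validate or pass its own defaults regardless of what the subclass does.

```python
class Processor:
    """Hypothetical stand-in for the base class (NOT the real ocrd Processor)."""

    def __init__(self, workspace=None, input_file_grp="INPUT", output_file_grp="OUTPUT"):
        self.workspace = workspace
        self.input_file_grp = input_file_grp
        self.output_file_grp = output_file_grp


class OcrdAnybaseocrTiseg(Processor):
    def __init__(self, *args, **kwargs):
        # setdefault only fills in values the caller did not pass,
        # so explicitly given file groups are left untouched.
        kwargs.setdefault("input_file_grp", "OCR-D-IMG-CROP")
        kwargs.setdefault("output_file_grp", "OCR-D-SEG-TISEG")
        super().__init__(*args, **kwargs)
```

Using setdefault rather than plain assignment means the default wiring applies only when no file group was given, which seems closer to what "default" should mean here.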

bertsky · 2020-04-06T10:34:43Z

You are right, this should work as you expect. (At least as long as we keep describing it as default wiring.) But this has not been implemented yet in ocrd (the base package), cf. OCR-D/core#274.

You have to call with explicit input and output file groups for now.
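
For scripted runs, the workaround can be sketched as a small Python wrapper that always passes the file groups explicitly. The helper names build_tiseg_command and run_tiseg are hypothetical and exist only for this example; the -m, -I and -O options are the ones listed in the --help output above, and the group names are those from the advertised default wiring.

```python
import subprocess


def build_tiseg_command(mets="mets.xml",
                        input_grp="OCR-D-IMG-CROP",
                        output_grp="OCR-D-SEG-TISEG"):
    """Return the argv list for a call with explicit file groups (hypothetical helper)."""
    return ["ocrd-anybaseocr-tiseg", "-m", mets, "-I", input_grp, "-O", output_grp]


def run_tiseg(**kwargs):
    """Run the processor; assumes ocrd-anybaseocr-tiseg is installed and on PATH."""
    return subprocess.run(build_tiseg_command(**kwargs), check=True)
```

Calling run_tiseg() from the workspace directory would then correspond to running ocrd-anybaseocr-tiseg -m mets.xml -I OCR-D-IMG-CROP -O OCR-D-SEG-TISEG on the command line.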

kba transferred this issue from OCR-D/OLD_ocrd_anybaseocr Apr 6, 2020
