ocrd-anybaseocr-tiseg not applying default wiring #54

Open
sepastian opened this issue Mar 13, 2020 · 1 comment

The --help of ocrd-anybaseocr-tiseg states a default wiring of ['OCR-D-IMG-CROP'] -> ['OCR-D-SEG-TISEG'].

root@38fa7aad0b43:/data/ocrd_workspace# ocrd-anybaseocr-tiseg --help
Using TensorFlow backend.

Usage: ocrd-anybaseocr-tiseg [OPTIONS]
  
  separate text and non-text part with anyBaseOCR

Options:
  -V, --version                   Show version
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -J, --dump-json                 Dump tool description as JSON and exit
  -p, --parameter TEXT            Parameters, either JSON string or path
                                  JSON file
  -g, --page-id TEXT              ID(s) of the pages to process
  -O, --output-file-grp TEXT      File group(s) used as output.
  -I, --input-file-grp TEXT       File group(s) used as input.
  -w, --working-dir TEXT          Working Directory
  -m, --mets TEXT                 METS to process
  -h, --help                      This help message

Parameters:
  "operation_level" [string - page] PAGE XML hierarchy level to operate
      on Possible values: ["page", "region", "line"]

Default Wiring:
  ['OCR-D-IMG-CROP'] -> ['OCR-D-SEG-TISEG']

The workspace contains a file group named OCR-D-IMG-CROP, a corresponding folder exists.

root@38fa7aad0b43:/data/ocrd_workspace# ls -1
OCR-D-BINPAGE
OCR-D-CROP
OCR-D-DESKEW
OCR-D-IMG
OCR-D-IMG-BIN
OCR-D-IMG-CROP
OCR-D-IMG-DESKEW
mets.xml

I would expect that running ocrd-anybaseocr-tiseg without specifying input or output file groups would default to OCR-D-IMG-CROP as input and OCR-D-SEG-TISEG as output. However, the program fails with the following error, because it's using the nonexistent file groups INPUT (as input) and OUTPUT (as output).

root@38fa7aad0b43:/data/ocrd_workspace# ocrd-anybaseocr-tiseg -m mets.xml 
Using TensorFlow backend.
09:22:34.382 INFO ocrd.workspace_validator - input_file_grp=['INPUT'] output_file_grp=['OUTPUT']
Traceback (most recent call last):
  File "/usr/bin/ocrd-anybaseocr-tiseg", line 8, in <module>
    sys.exit(ocrd_anybaseocr_tiseg())
  File "/usr/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd_anybaseocr/cli/cli.py", line 37, in ocrd_anybaseocr_tiseg
    return ocrd_cli_wrap_processor(OcrdAnybaseocrTiseg, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd/decorators.py", line 53, in ocrd_cli_wrap_processor
    raise Exception("Invalid input/output file grps:\n\t%s" % '\n\t'.join(report.errors))
Exception: Invalid input/output file grps:
        Input fileGrp[@USE='INPUT'] not in METS!

From what I can tell, this is because class OcrdAnybaseocrTiseg(Processor) does not override input_file_grp and output_file_grp in __init__, along the lines of:

kwargs['input_file_grp'] = 'OCR-D-IMG-CROP'
kwargs['output_file_grp'] = 'OCR-D-SEG-TISEG'
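
To make the idea concrete, here is a minimal sketch of such an override. It does not import ocrd: the Processor class below is a hypothetical stand-in that only mirrors the INPUT/OUTPUT defaults visible in the log line above, and the keyword names input_file_grp and output_file_grp are taken from that same log line. The real constructor takes more arguments, so treat this as an illustration rather than a patch.

```python
class Processor:
    """Hypothetical stand-in for the ocrd base class (assumption, not the real API)."""

    def __init__(self, input_file_grp="INPUT", output_file_grp="OUTPUT"):
        self.input_file_grp = input_file_grp
        self.output_file_grp = output_file_grp


class OcrdAnybaseocrTiseg(Processor):
    # Defaults copied from the "Default Wiring" section of --help.
    DEFAULT_INPUT = "OCR-D-IMG-CROP"
    DEFAULT_OUTPUT = "OCR-D-SEG-TISEG"

    def __init__(self, *args, **kwargs):
        # setdefault keeps any file group the caller passed explicitly
        # and only fills in the wiring when the keyword is absent.
        kwargs.setdefault("input_file_grp", self.DEFAULT_INPUT)
        kwargs.setdefault("output_file_grp", self.DEFAULT_OUTPUT)
        super().__init__(*args, **kwargs)
```

One caveat: the traceback shows the exception being raised inside ocrd_cli_wrap_processor, so the file groups may be validated before the processor is constructed. If so, an override in __init__ alone would not be enough and the default would have to be applied in the wrapper as well.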

kba transferred this issue from OCR-D/OLD_ocrd_anybaseocr on Apr 6, 2020

bertsky commented Apr 6, 2020

You are right, this should work as you expect. (At least as long as we keep describing it as default wiring.) But this has not been implemented yet in ocrd (the base package), cf. OCR-D/core#274.

You have to call with explicit input and output file groups for now.
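
As a rough sketch of that workaround, the helper below builds the explicit command line. It uses only the -m, -I and -O options listed in the --help output above; the function name tiseg_command is made up for illustration, and the file group names are the ones from this issue, so adjust them to your workspace.

```python
import shlex


def tiseg_command(mets="mets.xml",
                  input_grp="OCR-D-IMG-CROP",
                  output_grp="OCR-D-SEG-TISEG"):
    """Build the argv for calling ocrd-anybaseocr-tiseg with explicit file groups."""
    return ["ocrd-anybaseocr-tiseg", "-m", mets, "-I", input_grp, "-O", output_grp]


cmd = tiseg_command()
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
# prints: ocrd-anybaseocr-tiseg -m mets.xml -I OCR-D-IMG-CROP -O OCR-D-SEG-TISEG
```

The returned list can also be handed to subprocess.run(cmd, check=True) from inside the workspace directory.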
