Workflow Guide text recognition
This processor recognizes text in segmented lines.
An overview of the existing model repositories and short descriptions of the most important models can be found here.
We strongly recommend using the OCR-D resource manager to download the models, because you then don't have to specify the path to each model.
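As a minimal sketch of what downloading with the resource manager can look like, the snippet below wraps the calls in a hypothetical `run` helper (not part of OCR-D) that only prints the commands unless `DRY_RUN=0` is set. The model name `qurator-gt4histocr-1.0` is taken from the example call in this guide; check the output of `list-available` for the names actually offered in your installation.

```shell
# Hypothetical dry-run wrapper: prints each command and executes it
# only when DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# List the models the resource manager knows for one processor.
run ocrd resmgr list-available -e ocrd-calamari-recognize

# Download the Calamari model used in the example call of this guide.
run ocrd resmgr download ocrd-calamari-recognize qurator-gt4histocr-1.0
```

With the default `DRY_RUN=1` the snippet is safe to paste anywhere; run it with `DRY_RUN=0` in an environment where `ocrd` is installed to perform the download.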
Processor | Parameter | Remarks | Call |
---|---|---|---|
ocrd-tesserocr-recognize | `-P model GT4HistOCR_50000000.997_191951` | Recommended model can be found here; a faster variant is here | `TESSDATA_PREFIX="/test/data/tesseractmodels/" ocrd-tesserocr-recognize -I OCR-D-DEWARP-LINE -O OCR-D-OCR -P model Fraktur+Latin` |
ocrd-calamari-recognize | If you downloaded your model with the [OCR-D resource manager](https://ocr-d.de/en/models), use `-P checkpoint_dir modelname`; otherwise use `-P checkpoint_dir /path/to/models` | Recommended model can be found here. For checkpoint you need to pass the local path on your hard drive as the parameter value, and keep the verbatim asterisk (`*`). | `ocrd-calamari-recognize -I OCR-D-DEWARP-LINE -O OCR-D-OCR -P checkpoint_dir qurator-gt4histocr-1.0` |
Note: For ocrd-tesserocr, the environment variable `TESSDATA_PREFIX` has to be set to point to the directory where the models in use are stored, unless the default directory (normally `$VIRTUAL_ENV/share/tessdata`) is used. The directory should contain at least the following models: `deu.traineddata`, `eng.traineddata`, `osd.traineddata`.
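The requirement above can be checked before starting a run. The following is a minimal sketch, not part of OCR-D: `check_tessdata` is a hypothetical helper that looks for the three models named in the note, using the directory passed as argument, else `TESSDATA_PREFIX`, else the default location mentioned above.

```shell
# Hypothetical helper: report which of the required models are missing.
# Usage: check_tessdata [DIR]
# DIR defaults to $TESSDATA_PREFIX, then to $VIRTUAL_ENV/share/tessdata.
check_tessdata() {
  dir="${1:-${TESSDATA_PREFIX:-${VIRTUAL_ENV:-}/share/tessdata}}"
  missing=0
  for model in deu eng osd; do
    if [ ! -f "$dir/$model.traineddata" ]; then
      echo "missing: $dir/$model.traineddata"
      missing=1
    fi
  done
  # Exit status 0 if all three models are present, 1 otherwise.
  return "$missing"
}
```

For example, after `export TESSDATA_PREFIX="/test/data/tesseractmodels/"`, calling `check_tessdata` prints one line per missing model and returns a non-zero status if any is absent, so it can gate the recognition call with `&&`.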
Note: Faster models for tesserocr-recognize are available from https://ub-backup.bib.uni-mannheim.de/~stweil/ocrd-train/data/Fraktur_5000000/tessdata_fast/. A good and currently the fastest model is Fraktur-fast. UB Mannheim provides many more models online which were trained on different GT data sets, for example from Austrian Newspapers.
Note: If you want to continue with the optional post-correction, you should also set `textequiv_level` to `glyph`, or in the case of ocrd-calamari-recognize to at least `word` (which is already the default for ocrd-tesserocr-recognize).
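As an illustration of this note, the sketch below assembles the two example calls from this guide with `textequiv_level` added and prints them. The variable names are only for illustration; the file groups and model names are the ones used in the example calls above and may differ in your workspace.

```shell
# File groups from the example calls in this guide.
IN=OCR-D-DEWARP-LINE
OUT=OCR-D-OCR

# Tesseract: raise textequiv_level from its default (word) to glyph.
TESS_ARGS="-I $IN -O $OUT -P model Fraktur+Latin -P textequiv_level glyph"

# Calamari: request at least word level.
CALA_ARGS="-I $IN -O $OUT -P checkpoint_dir qurator-gt4histocr-1.0 -P textequiv_level word"

# Print the resulting calls; run one of them inside a workspace directory.
echo "ocrd-tesserocr-recognize $TESS_ARGS"
echo "ocrd-calamari-recognize $CALA_ARGS"
```

Both calls write to the same output file group here, so only one of them should be run per workspace unless the output group is changed.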
E.g.
- which parameters do you use with what values?
- which parameters are insufficiently documented?
- which aspects of a processor should be parameterizable but are not?
E.g. which processors worked best with what material? -- feel free to post sample images here, too.