Workflow Guide binarization

All images should be binarized at the very beginning of your workflow, because many of the subsequent processors require binarized images. Some implementations (for deskewing, segmentation or recognition) may produce better results on the original image, but these can always retrieve the raw image automatically instead of the binarized version.

In this processing step, a scanned color or grayscale document image is taken as input and a black-and-white (binarized) image is produced. This step should separate the foreground (e.g. text) from the background.
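To make the principle concrete, here is a minimal sketch of global binarization with Otsu's method, using only NumPy. It only illustrates the idea and is not the algorithm of any processor listed below. The function names are hypothetical, and the sketch assumes an 8-bit grayscale input with dark text on a light background.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the global Otsu threshold t for an 8-bit grayscale image.

    Pixels <= t form the dark (foreground) class, pixels > t the background.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # probability of the dark class for each t
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to t
    mu_total = mu[-1]
    # Between-class variance; 0/0 at the histogram ends is mapped to 0.
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[~np.isfinite(sigma_b)] = 0.0
    return int(np.argmax(sigma_b))


def binarize_global(gray: np.ndarray) -> np.ndarray:
    """Map pixels at or below the Otsu threshold to 0 (black), all others to 255 (white)."""
    t = otsu_threshold(gray)
    return np.where(gray <= t, 0, 255).astype(np.uint8)


# Usage: a light page (value 200) with a dark text block (value 50).
page = np.full((20, 20), 200, dtype=np.uint8)
page[5:10, 5:15] = 50
binary = binarize_global(page)
```

A single global threshold works well for evenly lit scans; for uneven backgrounds, local methods such as those offered by the processors below are usually preferable.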

Note: Binarization tools usually provide a threshold parameter that lets you increase or decrease the weight of the foreground. Adjusting it is optional, but it can be especially useful for images that have not been enhanced.
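The direction in which such a parameter acts is not always obvious. The following sketch implements Sauvola's local thresholding, where each pixel's threshold is T = m * (1 + k * (s / R - 1)), with m and s the mean and standard deviation in a window around the pixel and R = 128 for 8-bit images. It assumes that a `k` parameter like the one in the table below behaves like Sauvola's `k`; the actual processors may differ in details. The names `sauvola_binarize` and `window` are hypothetical. Because s stays below R for 8-bit values, a larger `k` lowers the threshold, so fewer pixels count as foreground.

```python
import numpy as np


def _window_sum(a: np.ndarray, w: int) -> np.ndarray:
    """Sum of `a` over a w x w window centred on each pixel (edge-padded), via an integral image."""
    r = w // 2
    p = np.pad(a, r, mode="edge")
    ii = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    ii[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    h, wd = a.shape
    return (ii[w:w + h, w:w + wd] - ii[:h, w:w + wd]
            - ii[w:w + h, :wd] + ii[:h, :wd])


def sauvola_binarize(gray: np.ndarray, window: int = 15,
                     k: float = 0.10, R: float = 128.0) -> np.ndarray:
    """Sauvola thresholding: pixels darker than the local threshold become 0 (foreground)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    g = gray.astype(np.float64)
    n = window * window
    mean = _window_sum(g, window) / n
    sq_mean = _window_sum(g * g, window) / n
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    thresh = mean * (1.0 + k * (std / R - 1.0))
    return np.where(g < thresh, 0, 255).astype(np.uint8)


# Usage: light page with dark text (50) and a faint gray stroke (170).
page = np.full((40, 40), 200, dtype=np.uint8)
page[10:20, 10:20] = 50
page[25:30, 25:35] = 170
low_k = sauvola_binarize(page, k=0.05)
high_k = sauvola_binarize(page, k=0.5)
```

In this sketch, raising `k` from 0.05 to 0.5 makes the faint stroke drop out of the foreground while the dark text stays black.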

Available processors

| Processor | Parameter | Remark | Call |
|---|---|---|---|
| ocrd-olena-binarize | `-P impl wolf -P k 0.10` | Fast | `ocrd-olena-binarize -I OCR-D-IMG -O OCR-D-BIN` |
| ocrd-cis-ocropy-binarize | `-P threshold 0.1` | Fast | `ocrd-cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN` |
| ocrd-sbb-binarize | `-P model` | Recommended; pre-trained models can be downloaded from here or via the OCR-D resource manager | `ocrd-sbb-binarize -I OCR-D-IMG -O OCR-D-BIN -P model modelname` |
| ocrd-skimage-binarize | `-P k 0.10` | Slow | `ocrd-skimage-binarize -I OCR-D-IMG -O OCR-D-BIN` |
| ocrd-doxa-binarize | `-P algorithm ISauvola` | Fast | `ocrd-doxa-binarize -I OCR-D-IMG -O OCR-D-BIN` |
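The Call column shows the minimal invocation, while the Parameter column lists typical overrides. A minimal sketch that combines both into one command line is shown here. The helpers `ocrd_call` and `run` are hypothetical; the sketch assumes it is executed from inside an OCR-D workspace with the processor installed, and by default it only prints the command instead of running it.

```python
import shlex
import subprocess


def ocrd_call(processor, input_grp="OCR-D-IMG", output_grp="OCR-D-BIN", params=None):
    """Build the argument list for an OCR-D processor call with -I, -O and -P options."""
    argv = [processor, "-I", input_grp, "-O", output_grp]
    for key, value in (params or {}).items():
        argv += ["-P", key, str(value)]
    return argv


def run(argv, dry_run=True):
    """Print the shell-quoted command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


# Usage: the olena row of the table with its parameters applied.
olena = ocrd_call("ocrd-olena-binarize", params={"impl": "wolf", "k": 0.10})
run(olena)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when parameter values contain spaces.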

Notes on parameter usage

For example:

- Which parameters do you use, and with what values?
- Which parameters are insufficiently documented?
- Which aspects of a processor should be parameterizable but are not?

Notes on document-specific usage

For example: which processors worked best with which material? Feel free to post sample images here, too.
