Tools for Visualizing (intermediate) OCR-D results

PAGE Tools
- PageViewer
- Aletheia
- Transkribus
- Transkribus SWT-Client
- LAREX
- nw-page-editor
METS Tools
- OCRD Browser
Image Viewer and Tools
- Feh
- Evince
- ImageMagick

PAGE Tools

PageViewer

The Page Viewer is a stand alone application for viewing page layout and text content of segmentation ground truth and results of page recognition/OCR systems. The natively supported file format is PAGE XML. However, ALTO XML, FineReader XML, and HOCR can be opened as well.

The viewer shows the page layout as a transparent overlay on the document image. Text content and object attributes are displayed as tooltips.

The Page Viewer requires a Java Runtime Environment version 6 or later. Both 32 and 64 bit installations are supported. Supported platforms are: Windows, Linux, and MacOS. (https://www.primaresearch.org/tools/PAGEViewer)

Installation

Download a pre-built release from GitHub.
Unzip it somewhere.
Copy or symlink the startup script from your platform's subdirectory into a directory on your PATH. Consider adding --resolve-dir $PWD (or similar) to its arguments, so that PageViewer resolves relative image paths against the current working directory instead of the location of the XML file, which is more useful for OCR-D workspaces.
For example, on Linux, add this to your ~/.bash_aliases or ~/.bashrc:

    alias jpageviewer='java -jar ~/path/to/JPageViewer\ 1.4\ \(Linux\,\ 64\ bit\)/JPageViewer.jar --resolve-dir "$PWD"'

Usage

    # cd into workspace directory
    jpageviewer OCR-D-SEG-TESS/PAGE1.xml

(Then continue with the Open button, navigating to the next PAGE file, or close the UI and start a new instance from the shell.)
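
Because each page has to be opened individually, a pair of small shell functions can save some typing. The following is a minimal sketch and not part of PageViewer: the names pageview and pageview_all and the JPAGEVIEWER_JAR variable are made up for illustration, the jar path is the same placeholder as in the alias, and the loop assumes that the viewer stays in the foreground until its window is closed.

```shell
# Hypothetical location of the unpacked release; adjust to your installation.
JPAGEVIEWER_JAR="${JPAGEVIEWER_JAR:-$HOME/path/to/JPageViewer 1.4 (Linux, 64 bit)/JPageViewer.jar}"

# Open one PAGE file with the same options as the alias above.
# $PWD is quoted so that working directories containing spaces are passed intact.
pageview() {
    java -jar "$JPAGEVIEWER_JAR" --resolve-dir "$PWD" "$@"
}

# Open every PAGE file of one fileGrp directory, one after the other.
pageview_all() {
    local dir="$1" page
    for page in "$dir"/*.xml; do
        # If the directory has no XML files, the pattern stays unexpanded; skip it.
        [ -e "$page" ] || continue
        pageview "$page"
    done
}
```

Run from the workspace directory, `pageview_all OCR-D-SEG-TESS` would then step through all pages of that fileGrp.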

Advantages

Schema support: all PAGE versions, and also ALTO
Shows fully recursive regions, including reading order
Shows all hierarchy levels from Border to Glyph
Platforms: Win, Linux, Mac
Recommended usage: viewing

Drawbacks

Bugs related to zooming (which breaks tooltips)
Sometimes fails to open a document when the PAGE annotation is insufficient (although valid), and gives no error message indicating the cause
Does not show AlternativeImage content
Does not rotate image according to annotated skew in the PAGE-XML file
Fixed colour scheme
No METS or directory navigation (pages have to be opened individually)

Aletheia

Aletheia is an advanced system for accurate and yet cost-effective analysis, recognition and annotation of scanned documents. It aids the user with a number of automated and semi-automated tools which were developed and fine-tuned based on feedback from major libraries across Europe and from their digitisation service providers which are using it in a production environment.

Cutting-edge features are, among others, the support of top-down ground truthing with sophisticated split and shrink tools as well as bottom-up ground truthing supporting the aggregation of lower-level elements to more complex structures. The integrated rules and guidelines validator, in combination with powerful correction tools, enable efficient production of highly accurate ground truth as well as standardised electronic renditions of digitised documents.

In addition, special features such as a customisable virtual keyboard and the Aletheia Sans font with extensive coverage of special characters in Unicode have been developed to support working with the complexities of historical documents. (https://www.primaresearch.org/tools/Aletheia)

Aletheia is available either as a free Lite version (only requires registration via email) or as a Pro version (annual paid subscription, added features and support).

See also the feature comparison for both versions.

Installation

unzip somewhere
run Aletheia.exe

Advantages

Schema support: all PAGE versions, and also ALTO
Shows fully recursive regions, including reading order
Shows all hierarchy levels from Border to Glyph
Offers lots of check/fixup tools for consistency
Platforms: Win
Recommended usage: editing and viewing
Some directory navigation (pages have to be opened collectively)

Drawbacks

~~Does not show AlternativeImage content~~
~~Does not rotate image according to annotated skew~~
Fixed colour scheme
No OCR-D METS navigation (Aletheia uses its own METS format)

Transkribus

Service platform for collaborative creation of HTR and OCR.
Provides a desktop client for advanced use and a lightweight browser version with reduced functionality.
Requires signing up with READ-COOP to use

Installation

Desktop client: https://readcoop.eu/transkribus/download/
Lite: Requires registration

Advantages

Supports editing polygons, tables and structural metadata annotations

Drawbacks

Not free software: the recognition backend and trained models are proprietary
Commercial software
Does not support recent PAGE versions
Produces invalid PAGE-XML because of extensions in the same namespace (which can be repaired via transkribus-to-prima, though)
Enhanced name matching for images and corresponding OCR-Files

Transkribus SWT-Client

Transkribus SWT-Client is an open source alternative client based on the Transkribus desktop client.

Installation

Requires local build. For detailed instructions, please see the project's README.

Advantages

Supports editing polygons, tables and structural metadata annotations
No registration required; works in local mode only
Platforms: Win, Linux, Mac (recent OpenJDK included for Win64)
Imports recent ALTO 3.0+ with ComponentBlock elements from Tesseract-OCR 4.x+
Imports recent PAGE 2019
Enhanced name matching for images and corresponding OCR-Files

Drawbacks

Only supports export to the older PAGE-XML 2013 format with extensions in the same namespace (which can be repaired via transkribus-to-prima, though)
Only region-line-word hierarchy; no glyphs or super regions possible

LAREX

Installation

Native: as described in the README
Docker:
- docker pull bertsky/larex and then as described here, e.g. docker run --rm -u 0:$GROUPS -p 8080:8080 -v path/to/workspace:/data bertsky/larex
- docker pull maxnth/larex and then as described here, e.g. docker run --rm -u 0:$GROUPS -p 8080:8080 -v path/to/workspace:/home/books -v path/to/larex.config:/larex.config maxnth/larex
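
When the workspace is the current directory, the first command can be wrapped so that no path has to be typed. This is only a sketch under stated assumptions: larex_here is a hypothetical helper name, `id -g` stands in for the bash-specific `$GROUPS`, and `$PWD` is used because docker's `-v` option generally expects an absolute host path (a bare relative path is interpreted as a volume name).

```shell
# Serve the OCR-D workspace in the current directory with the bertsky/larex image.
# Hypothetical helper; the options mirror the docker run example above.
larex_here() {
    docker run --rm \
        -u "0:$(id -g)" \
        -p 8080:8080 \
        -v "$PWD:/data" \
        bertsky/larex
}
```

After `cd path/to/workspace`, calling `larex_here` starts the container in the foreground; stopping it (for example with Ctrl-C) also removes it because of `--rm`.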

Usage

Go to http://localhost:8080/Larex with your browser (preferably Chrome/Chromium)

Advantages

Very efficient for large amounts of pages (fast, has keyboard shortcuts for everything), esp. for text correction
Offers custom auto-segmentation, including reading order
Variable colour scheme
Platforms: Linux or Docker-capable
Recommended usage: editing and viewing

Drawbacks

Does not show Border or hierarchy levels below TextLine
Does not show recursive regions (e.g. table contents)
~~Does not show AlternativeImage content~~ (fixed in current dev version / v0.6)
~~Does not rotate image according to annotated skew~~ (fixed in current dev version / v0.6)
~~No direct METS navigation (custom, flat bookpath directory structure which needs to be exported from OCR-D fileGrps via ocrd-export-larex)~~ (fixed in current dev version / v0.6)

nw-page-editor

nw-page-editor is an application for editing ground truth information for diverse purposes related to the areas of document processing and text recognition. The edition is done interactively and visually on top of images of scanned documents. Additionally the app supports many keyboard shortcuts to allow more efficient editing, see section Application usage shortcuts.

The app is available in two variants. The first variant is as a desktop application based on the NW.js framework thus making it cross-platform. The second variant is as a web application that allows remote editing by multiple users and can be easily setup via a docker container. (https://github.com/mauvilsa/nw-page-editor)

Installation

Advantages

Schema support: PAGE XML version 2013-07-15 (http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15) and property extensions (https://github.com/omni-us/pageformat)
Platforms: Win, Linux, Mac
Recommended usage: ~~editing and~~ viewing

Drawbacks

Custom PAGE extensions when editing

METS Tools

OCRD Browser

An extensible viewer for OCRD mets.xml files (https://github.com/hnesk/browse-ocrd)

Installation

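    # run inside a clone of https://github.com/hnesk/browse-ocrd (make needs that repository's Makefile)
    sudo make deps-ubuntu
    pip install browse-ocrd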

Usage

    browse-ocrd path/to/mets.xml # or open METS interactively

Advantages

Scheme support: OCR-D METS conventions (https://ocr-d.de/en/spec/mets)
Shows pages on all fileGrps, including AlternativeImages (on all hierarchy levels)
Shows segmentation (in PageViewer-like colour scheme), with
- structural elements selectable (Border, ReadingOrder, Region, TextLine, Baseline, Word, Glyph)
- mouse-over element ID and content, and exact coordinates
- warnings where polygon path is invalid
- AlternativeImages (including cropped and deskewed)
Shows concatenated text
Shows raw PAGE-XML with syntax highlighting
Can start JPageViewer for current PAGE-XML
Shows rendered HTML (from ocrd-dinglehopper comparison reports)
Allows fast zooming into/out of images or text
Can show multiple pages or views next to each other
Gives page/segment IDs in mouse-over tooltips
Platforms: Linux
Recommended usage: viewing

Drawbacks

No text search currently

Image Viewer and Tools

Feh

feh is an X11 image viewer aimed mostly at console users. Unlike most other viewers, it does not have a fancy GUI, but simply displays images. It is controlled via commandline arguments and configurable key/mouse actions. (https://feh.finalrewind.org/)

Installation

    sudo apt install feh

Usage

    # cd into workspace directory
    feh OCR-D-IMG-BIN/

Advantages

Exact zoom interpolation
Extensive keyboard shortcuts
Allows keeping zoom level across pages
Very versatile and fast
Can browse multiple files, including thumbnail mode

Drawbacks

No multi-page TIFF display

Evince

Installation

    sudo apt install evince

Usage

    # cd into workspace directory
    evince OCR-D-IMG-BIN/PAGE1.png

Advantages

Has multi-page TIFF display

Drawbacks

Artefacts and/or decreased sharpness in zoom interpolation
Cannot browse multiple files

ImageMagick

Use ImageMagick® to create, edit, compose, or convert bitmap images. It can read and write images in a variety of formats (over 200) including PNG, JPEG, GIF, HEIC, TIFF, DPX, EXR, WebP, Postscript, PDF, and SVG. ImageMagick can resize, flip, mirror, rotate, distort, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.

Installation

    sudo apt install imagemagick

Usage

    # cd into workspace directory
    identify -verbose OCR-D-IMG/*.tiff
    compare OCR-D-IMG-BIN1/PAGE1.png OCR-D-IMG-BIN2/PAGE1.png PAGE1-BIN1-BIN2.png
    display OCR-D-IMG-BIN1/PAGE1.png OCR-D-IMG-BIN2/PAGE1.png PAGE1-BIN1-BIN2.png

Advantages

Query images with identify
Compare images with compare
View images with display
Process images with convert
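
The usage examples above cover identify, compare and display, but not convert. As an illustration, the following sketch writes a half-size PNG preview for every TIFF in a fileGrp directory. The function name make_previews, the 50% scale and the output directory are arbitrary choices for this example; on ImageMagick 7 the `convert` command is a legacy name and `magick` is preferred.

```shell
# Write a half-size PNG copy of every TIFF in a directory (hypothetical helper).
# usage: make_previews SOURCE_DIR TARGET_DIR
make_previews() {
    local src="$1" dst="$2" img name
    mkdir -p "$dst"
    for img in "$src"/*.tif "$src"/*.tiff; do
        # Patterns without a match stay unexpanded; skip them.
        [ -e "$img" ] || continue
        # Strip the directory and the file extension to build the output name.
        name=$(basename "${img%.*}")
        convert "$img" -resize 50% "$dst/$name.png"
    done
}
```

From the workspace directory, `make_previews OCR-D-IMG previews` would fill a previews directory that can then be browsed with feh.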

Drawbacks
