Tools for Visualizing (intermediate) OCR-D results
The Page Viewer is a standalone application for viewing page layout and text content of segmentation ground truth and results of page recognition/OCR systems. The natively supported file format is PAGE XML. However, ALTO XML, FineReader XML, and hOCR can be opened as well.
The viewer shows the page layout as a transparent overlay on the document image. Text content and object attributes are displayed as tooltips.
The Page Viewer requires a Java Runtime Environment version 6 or later. Both 32 and 64 bit installations are supported. Supported platforms are: Windows, Linux, and MacOS. (https://www.primaresearch.org/tools/PAGEViewer)
- download a pre-built release from GitHub
- unzip somewhere
- copy/symlink the startup script from your platform's subdirectory to your search PATH, probably adding --resolve-dir $PWD (or similar) to the arguments (in order to make PageViewer resolve relative image paths w.r.t. the current working directory instead of the XML file, which is more useful for OCR-D workspaces).
For example, on Linux, add this to your ~/.bash_aliases or ~/.bashrc:
alias jpageviewer='java -jar ~/path/to/JPageViewer\ 1.4\ \(Linux\,\ 64\ bit\)/JPageViewer.jar --resolve-dir $PWD'
# cd into workspace directory
jpageviewer OCR-D-SEG-TESS/PAGE1.xml
(Then continue with the Open button, navigating to the next PAGE file, or close the UI and start a new instance from the shell.)
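Since pages have to be opened one at a time, a small shell helper can step through a whole fileGrp directory. The following is only a sketch under stated assumptions: the JAR location is the placeholder path from the alias above, and the names `JPAGEVIEWER_JAR`, `PAGE_VIEWER`, `open_page` and `view_pages` are hypothetical, introduced here for illustration. A function is used instead of the alias because aliases are not expanded inside scripts.

```shell
# Assumed location of the unzipped release; adjust to your own setup.
JPAGEVIEWER_JAR="${JPAGEVIEWER_JAR:-$HOME/path/to/JPageViewer 1.4 (Linux, 64 bit)/JPageViewer.jar}"

# Function counterpart of the alias: resolve image paths against the
# current working directory (the workspace), not against the XML file.
open_page() {
    java -jar "$JPAGEVIEWER_JAR" --resolve-dir "$PWD" "$@"
}

# Open every PAGE file of one fileGrp directory, one viewer instance at a time.
# PAGE_VIEWER is a hypothetical override, e.g. PAGE_VIEWER=echo for a dry run.
view_pages() {
    dir=$1
    for page in "$dir"/*.xml; do
        [ -e "$page" ] || continue   # the glob matched nothing
        "${PAGE_VIEWER:-open_page}" "$page"
    done
}
```

Run from within the workspace directory, `view_pages OCR-D-SEG-TESS` opens the next page each time the viewer window is closed; `PAGE_VIEWER=echo view_pages OCR-D-SEG-TESS` only lists the files that would be opened.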
- Schema support: all PAGE versions, but also ALTO
- Shows fully recursive regions, including reading order
- Shows all hierarchy levels from Border to Glyph
- Platforms: Win, Linux, Mac
- Recommended usage: viewing
- Bugs related to zooming (which breaks tooltips)
- Sometimes does not open the document if PAGE is not sufficient (although valid) – without error message indicating cause
- Does not show AlternativeImage content
- Does not rotate image according to annotated skew in the Page-XML file
- Fixed colour scheme
- No METS or directory navigation (pages have to be opened individually)
Aletheia is an advanced system for accurate and yet cost-effective analysis, recognition and annotation of scanned documents. It aids the user with a number of automated and semi-automated tools which were developed and fine-tuned based on feedback from major libraries across Europe and from their digitisation service providers which are using it in a production environment.
Cutting-edge features are, among others, the support of top-down ground truthing with sophisticated split and shrink tools as well as bottom-up ground truthing supporting the aggregation of lower-level elements to more complex structures. The integrated rules and guidelines validator, in combination with powerful correction tools, enable efficient production of highly accurate ground truth as well as standardised electronic renditions of digitised documents.
In addition, special features such as a customisable virtual keyboard and the Aletheia Sans font with extensive coverage of special characters in Unicode have been developed to support working with the complexities of historical documents. (https://www.primaresearch.org/tools/Aletheia)
Aletheia is available either as a free Lite version (only requires registration via email) or as a Pro version (annual paid subscription, added features and support).
See also the feature comparison for both versions.
- unzip somewhere
- run
Aletheia.exe
- Schema support: all PAGE versions, but also ALTO
- Shows fully recursive regions, including reading order
- Shows all hierarchy levels from Border to Glyph
- Offers lots of check/fixup tools for consistency
- Platforms: Win
- Recommended usage: editing and viewing
- Some directory navigation (pages have to be opened collectively)
- Does not show AlternativeImage content
- Does not rotate image according to annotated skew
- Fixed colour scheme
- No OCR-D METS navigation, but Aletheia uses its own METS format
- Service platform for collaborative creation of HTR and OCR.
- Provides a desktop client for advanced use and a lightweight browser version with reduced functionality
- Requires signing up with READ-COOP to use
- Desktop client: https://readcoop.eu/transkribus/download/
- Lite: Requires registration
- Supports editing polygons, Tables and structural Metadata-Annotations
- Not Free Software, recognition backend and trained models are proprietary
- Commercial software
- Does not support recent PAGE versions
- Produces invalid PAGE-XML because of extensions in the same namespace (which can be repaired via transkribus-to-prima, though)
- Enhanced name matching for images and corresponding OCR-Files
Transkribus SWT-Client is an open source alternative client based on the Transkribus desktop client.
Requires local build. For detailed instructions, please see the project's README.
- Supports editing polygons, Tables and structural Metadata-Annotations
- No Registration required, only local working mode
- Platforms: Win, Linux, Mac with recent OpenJDK included (Win64)
- Imports recent ALTO 3.0+ with ComponentBlock elements from Tesseract-OCR 4.x+
- Imports recent PAGE 2019
- Enhanced name matching for images and corresponding OCR-Files
- Only supports export to the older PAGE-XML 2013 format with extensions in the same namespace (which can be repaired via transkribus-to-prima, though)
- Only region-line-word hierarchy, no glyphs or super regions possible
- native: as described in the README
- Docker:
  - docker pull bertsky/larex and then as described here, e.g. docker run --rm -u 0:$GROUPS -p 8080:8080 -v path/to/workspace:/data bertsky/larex
  - docker pull maxnth/larex and then as described here, e.g. docker run --rm -u 0:$GROUPS -p 8080:8080 -v path/to/workspace:/home/books -v path/to/larex.config:/larex.config maxnth/larex
- go to http://localhost:8080/Larex with your browser (preferably Chrome/Chromium)
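Docker generally expects an absolute host path as the source of a bind mount, so the `path/to/workspace` placeholder has to be resolved before it is passed to `-v`. The wrapper below is a minimal sketch for the bertsky/larex variant only; the function name `larex_run` and the `DOCKER` override are hypothetical, and the `id -g` fallback is an assumption for shells that do not define `$GROUPS`.

```shell
# Start LAREX on a workspace given as a relative or absolute path.
# DOCKER is a hypothetical override, e.g. DOCKER=echo to print the command.
larex_run() {
    # Resolve the workspace to an absolute path for the bind mount.
    ws=$(cd "$1" && pwd) || return 1
    # $GROUPS is bash-specific; fall back to the primary group ID elsewhere.
    "${DOCKER:-docker}" run --rm -u "0:${GROUPS:-$(id -g)}" -p 8080:8080 \
        -v "$ws":/data bertsky/larex
}
```

For example, `larex_run path/to/workspace` starts the container in the foreground, after which the browser URL is reachable until the container is stopped.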
- Very efficient for large amounts of pages (fast, has keyboard shortcuts for everything), esp. for text correction
- Offers custom auto-segmentation, including reading order
- Variable colour scheme
- Platforms: Linux or Docker-capable
- Recommended usage: editing and viewing
- Does not show Border or hierarchy levels below TextLine
- Does not show recursive regions (e.g. table contents)
- Does not show AlternativeImage content (fixed in current dev version / v0.6)
- Does not rotate image according to annotated skew (fixed in current dev version / v0.6)
- No direct METS navigation (custom, flat bookpath directory structure which needs to be exported from OCR-D fileGrps via ocrd-export-larex) (fixed in current dev version / v0.6)
nw-page-editor is an application for editing ground truth information for diverse purposes related to the areas of document processing and text recognition. Editing is done interactively and visually on top of images of scanned documents. Additionally, the app supports many keyboard shortcuts to allow more efficient editing; see the section "Application usage shortcuts" of its README.
The app is available in two variants. The first variant is a desktop application based on the NW.js framework, which makes it cross-platform. The second variant is a web application that allows remote editing by multiple users and can be easily set up via a Docker container. (https://github.com/mauvilsa/nw-page-editor)
- Schema support: PAGE XML version 2013-07-15 (http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15) and property extensions (https://github.com/omni-us/pageformat)
- Platforms: Win, Linux, Mac
- Recommended usage: editing and viewing
- Custom PAGE extensions when editing
An extensible viewer for OCR-D mets.xml files (https://github.com/hnesk/browse-ocrd)
sudo make deps-ubuntu
pip install browse-ocrd
browse-ocrd path/to/mets.xml # or open METS interactively
- Schema support: OCR-D METS conventions (https://ocr-d.de/en/spec/mets)
- Shows pages on all fileGrps, including AlternativeImages (on all hierarchy levels)
- Shows segmentation (in PageViewer-like colour scheme), with
  - structural elements selectable (Border, ReadingOrder, Region, TextLine, Baseline, Word, Glyph)
  - mouse-over element ID and content, and exact coordinates
  - warnings where polygon path is invalid
  - AlternativeImages (including cropped and deskewed)
- Shows concatenated text
- Shows raw PAGE-XML with syntax highlighting
- Can start JPageViewer for current PAGE-XML
- Shows rendered HTML (from ocrd-dinglehopper comparison reports)
- Allows fast zooming into/out of images or text
- Can show multiple pages or views next to each other
- Gives page/segment IDs in mouse-over tooltips
- Platforms: Linux
- Recommended usage: viewing
- No text search currently
feh is an X11 image viewer aimed mostly at console users. Unlike most other viewers, it does not have a fancy GUI, but simply displays images. It is controlled via commandline arguments and configurable key/mouse actions. (https://feh.finalrewind.org/)
sudo apt install feh
# cd into workspace directory
feh OCR-D-IMG-BIN/
- Exact zoom interpolation
- Extensive keyboard shortcuts
- Allows keeping zoom level across pages
- Very versatile and fast
- Can browse multiple files, including thumbnail mode
- No multi-page TIFF display
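Two of the points above map onto feh command-line options: `--keep-zoom-vp` keeps zoom and viewport when switching images, and `--thumbnails` opens the thumbnail mode. The wrappers below are a sketch only; the function names and the `FEH` override are hypothetical, and option availability may depend on the installed feh version.

```shell
# FEH is a hypothetical override, e.g. FEH=echo to print the arguments.

# Browse images while keeping zoom level and viewport across pages.
browse_images() {
    "${FEH:-feh}" --keep-zoom-vp "$@"
}

# Show all images of a directory as clickable thumbnails.
browse_thumbnails() {
    "${FEH:-feh}" --thumbnails "$@"
}
```

From within the workspace directory, `browse_images OCR-D-IMG-BIN/` allows zooming into one region and then paging through the same region on all pages.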
sudo apt install evince
# cd into workspace directory
evince OCR-D-IMG-BIN/PAGE1.png
- Has multi-page TIFF display
- Artefacts and/or decreased sharpness in zoom interpolation
- Cannot browse multiple files
Use ImageMagick® to create, edit, compose, or convert bitmap images. It can read and write images in a variety of formats (over 200) including PNG, JPEG, GIF, HEIC, TIFF, DPX, EXR, WebP, Postscript, PDF, and SVG. ImageMagick can resize, flip, mirror, rotate, distort, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.
sudo apt install imagemagick
# cd into workspace directory
identify -verbose OCR-D-IMG/*.tiff
compare OCR-D-IMG-BIN1/PAGE1.png OCR-D-IMG-BIN2/PAGE1.png PAGE1-BIN1-BIN2.png
display OCR-D-IMG-BIN1/PAGE1.png OCR-D-IMG-BIN2/PAGE1.png PAGE1-BIN1-BIN2.png
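To compare two binarization results for all pages rather than a single one, the `compare` call can be wrapped in a loop. This is a sketch under the assumption, taken from the example above, that both fileGrp directories contain PNG files with identical base names; the function name `compare_groups` and the `COMPARE` override are hypothetical.

```shell
# Write one difference image per page that exists in both directories.
# COMPARE is a hypothetical override, e.g. COMPARE=echo for a dry run.
compare_groups() {
    grp1=$1; grp2=$2; outdir=$3
    mkdir -p "$outdir"
    for img in "$grp1"/*.png; do
        [ -e "$img" ] || continue   # the glob matched nothing
        name=$(basename "$img")
        if [ -e "$grp2/$name" ]; then
            # compare exits non-zero when the images differ, which is expected here
            "${COMPARE:-compare}" "$img" "$grp2/$name" "$outdir/$name" || true
        else
            echo "skipping $name: not found in $grp2" >&2
        fi
    done
}
```

For example, `compare_groups OCR-D-IMG-BIN1 OCR-D-IMG-BIN2 BIN1-BIN2` fills the directory `BIN1-BIN2` with difference images that can then be browsed with `display` or `feh`.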