browse-ocrd in Docker
The Python package browse-ocrd is a GTK-based viewer for OCR-D mets.xml files.
Setting up GTK on Windows is tedious and tends to clutter your OS with libraries.
To test browse-ocrd on Windows, you can instead put the GTK application into a Docker container and use Broadway, the browser-based GTK backend.
Note that this is just an experiment and not a production-ready setup!
- We use the official Debian-based Python 3.7 image as our base image.
The browse-ocrd repository suggests using Ubuntu 18.04, which ships with Python 3.6, but browse-ocrd also requires Python 3.7. Using python:3.7 saves us from having to install Python 3.7 on the ubuntu:18.04 Docker image.
- We then install the packages mentioned in the browse-ocrd installation instructions and add libgtk-3-bin to be able to use the Broadway backend.
- We install browse-ocrd itself via pip.
Using the 2020-resolver and updating setuptools avoids some package installation problems.
- To set up Broadway we have to set the two environment variables GDK_BACKEND and BROADWAY_DISPLAY and expose port 8085.
- Starting Broadway and browse-ocrd is handled by a separate init.sh script.
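The port 8085 is tied to the display number :5. As a small sketch, assuming broadwayd's default behaviour of serving display :N on port 8080 + N, the mapping can be computed like this (broadway_port is a hypothetical helper name, not part of GTK or browse-ocrd):

```shell
# Sketch: derive the HTTP port broadwayd uses for a given display.
# Assumption: the default port is 8080 plus the display number.
# broadway_port is a hypothetical helper, not part of GTK or browse-ocrd.
broadway_port() {
  display="${1#:}"              # strip the leading colon, ":5" -> "5"
  echo $((8080 + ${display}))
}

broadway_port ":5"
```

If you change BROADWAY_DISPLAY, the exposed and published ports would need to change accordingly.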
# Dockerfile
FROM python:3.7
# System libraries for GTK 3, plus libgtk-3-bin, which provides broadwayd
RUN apt-get update \
&& apt-get install -y --no-install-recommends libcairo2-dev libgtk-3-bin libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake \
&& pip3 install -U setuptools --use-feature=2020-resolver \
&& pip3 install browse-ocrd --use-feature=2020-resolver
# Render GTK through the Broadway backend on display :5
ENV GDK_BACKEND=broadway
ENV BROADWAY_DISPLAY=:5
# Port on which Broadway is reachable from the browser
EXPOSE 8085
COPY init.sh /init.sh
RUN chmod +x /init.sh
CMD ["/init.sh"]
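#!/usr/bin/env bash
# init.sh
set -x
# Start the Broadway display server for display :5 in the background
nohup broadwayd :5 &
# Open the METS file from the mounted workspace
browse-ocrd /data/mets.xml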
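init.sh starts browse-ocrd right after sending broadwayd to the background. Should the viewer ever come up before Broadway accepts connections, a small wait loop might help. The following is only a sketch: it assumes bash (for the /dev/tcp redirection) and that broadwayd listens on port 8085 inside the container; wait_for_port is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: wait until a TCP port accepts connections.
# wait_for_port is a hypothetical helper; /dev/tcp is a bash feature.
wait_for_port() {
  host="$1"; port="$2"; retries="${3:-20}"
  i=1
  while [ "$i" -le "$retries" ]; do
    # Try to open a connection in a subshell; success means the port is open
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    # Pause before the next attempt, but not after the last one
    if [ "$i" -lt "$retries" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  return 1
}

# Possible use in init.sh (left commented out here):
# nohup broadwayd :5 &
# wait_for_port 127.0.0.1 8085 || exit 1
# browse-ocrd /data/mets.xml
```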
You can then build and run the container, with WORKSPACE pointing to the directory that contains your mets.xml:
docker build --tag browse-ocrd .
WORKSPACE=/path/to/your/data
docker run -it --rm -v "${WORKSPACE}":/data -w /data -p 8085:8085 browse-ocrd
Open the application in your browser at http://localhost:8085.
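Since init.sh opens /data/mets.xml, the container has nothing to show if the mounted directory lacks that file. A minimal pre-flight check could look like the sketch below; check_workspace is a hypothetical helper name and not part of browse-ocrd or OCR-D.

```shell
# Sketch: check that a workspace directory contains the mets.xml that init.sh opens.
# check_workspace is a hypothetical helper, not part of browse-ocrd or OCR-D.
check_workspace() {
  if [ -f "$1/mets.xml" ]; then
    return 0
  fi
  echo "No mets.xml found in: $1" >&2
  return 1
}

# Example: only start the container if the check passes
# check_workspace "${WORKSPACE}" && docker run -it --rm -v "${WORKSPACE}":/data -w /data -p 8085:8085 browse-ocrd
```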
Source: https://gist.github.com/b2m/d29c9e5dba9658bb3e5aad6d6d93c3bb