browse-ocrd in Docker

Running browse-ocrd in a Docker container

The Python package browse-ocrd is a GTK-based viewer for OCR-D mets.xml files.

Setting up GTK on Windows is a bit tedious and tends to clutter your OS with libraries.

So to test browse-ocrd on Windows, you can put the GTK application into a Docker container and use Broadway, the browser-based GTK backend.

Build setup

Note that this is just an experiment and not a production-ready setup!

We use the official Debian-based Python 3.7 image as our base image.

The browse-ocrd repository suggests Ubuntu 18.04, which ships with Python 3.6, but browse-ocrd itself requires Python 3.7. Starting from python:3.7 saves us from having to install Python 3.7 on the ubuntu:18.04 Docker image.

We then install the packages listed in the browse-ocrd installation instructions and add libgtk-3-bin to be able to use the Broadway backend.
browse-ocrd itself is installed via pip.

Passing pip's --use-feature=2020-resolver flag and updating setuptools first avoids some package installation problems.

To set up Broadway, we set the two environment variables GDK_BACKEND and BROADWAY_DISPLAY and expose port 8085, which is where broadwayd serves display :5 by default (8080 plus the display number).
Starting Broadway and browse-ocrd is handled by a separate init.sh script.
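
The relationship between the display name and the exposed port can be sketched as a small shell function. This is only an illustration: broadway_port is a hypothetical helper, not part of GTK or browse-ocrd, and it assumes broadwayd is started without an explicit --port option, so that the port defaults to 8080 plus the display number, as in the GTK documentation's example.

```shell
#!/usr/bin/env bash
# Sketch: derive the default Broadway HTTP port from a display name such as ":5".
# broadway_port is a hypothetical helper used for illustration only.
broadway_port() {
    local display="${1#:}"      # strip the leading colon, ":5" -> "5"
    echo $((8080 + display))    # assumed default: 8080 + display number
}

broadway_port ":5"   # prints 8085
```

If you pick a different display number, the EXPOSE line in the Dockerfile and the -p mapping of docker run would have to change accordingly.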

Dockerfile:

FROM python:3.7

RUN apt-get update \
    && apt-get install -y --no-install-recommends libcairo2-dev libgtk-3-bin libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake \
    && pip3 install -U setuptools --use-feature=2020-resolver \
    && pip3 install browse-ocrd --use-feature=2020-resolver

ENV GDK_BACKEND=broadway
ENV BROADWAY_DISPLAY=:5

EXPOSE 8085

COPY init.sh /init.sh
RUN chmod +x /init.sh

CMD ["/init.sh"]

init.sh:

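#!/usr/bin/env bash

# Print each command before it runs, which helps when reading the container log.
set -x
# Start the Broadway display server for display :5 in the background.
nohup broadwayd :5 &
# Open the workspace mounted at /data in browse-ocrd.
browse-ocrd /data/mets.xml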
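
Since init.sh starts broadwayd in the background and launches browse-ocrd right afterwards, it may be worth waiting until the Broadway port accepts connections. The following is a minimal sketch, not part of the original gist: wait_for_port is a hypothetical helper, the retry count and the 127.0.0.1 address are assumptions, and it relies on bash's /dev/tcp pseudo-device being available in the image.

```shell
#!/usr/bin/env bash
# Sketch: wait until a TCP port accepts connections, or give up after N tries.
# wait_for_port is a hypothetical helper; host, port and retry count are assumptions.
wait_for_port() {
    local host="$1" port="$2" tries="${3:-20}"
    local i
    for ((i = 0; i < tries; i++)); do
        # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection;
        # the subshell keeps the file descriptor from leaking into the caller.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 0.5
    done
    return 1
}
```

In init.sh, a call such as `wait_for_port 127.0.0.1 8085 || exit 1` could be placed between the broadwayd line and the browse-ocrd line.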

Build and run

You can then build and run the container:

docker build --tag browse-ocrd .
WORKSPACE=/path/to/your/data
docker run -it --rm -v "${WORKSPACE}:/data" -w /data -p 8085:8085 browse-ocrd

Open the application in your browser at http://localhost:8085.
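
Because init.sh opens /data/mets.xml, the mounted workspace directory needs a mets.xml file at its top level. A small pre-flight check before docker run can catch a wrong path early. This is a sketch under that assumption; has_mets is a hypothetical helper and the fallback path is the placeholder used above.

```shell
#!/usr/bin/env bash
# Sketch: only start the container if the workspace contains a mets.xml.
# has_mets is a hypothetical helper used for illustration only.
has_mets() {
    [ -f "${1}/mets.xml" ]
}

WORKSPACE="${WORKSPACE:-/path/to/your/data}"
if has_mets "${WORKSPACE}"; then
    docker run -it --rm -v "${WORKSPACE}:/data" -w /data -p 8085:8085 browse-ocrd
else
    echo "No mets.xml found in ${WORKSPACE}" >&2
fi
```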

Source: https://gist.github.com/b2m/d29c9e5dba9658bb3e5aad6d6d93c3bb
