RFC: Debian/Ubuntu packaging of ocrd_all components and OCR models #130

kba · 2020-07-23T13:25:24Z

Now that a solution to the conflicting dependency problem is imminent, we should discuss how we can reduce build times and simplify management of OCR models by supporting OS package management.

I see three areas where package management can improve ocrd_all:

1. Providing packages for processors with full dependencies, e.g. with AppImage as @stweil proposed
2. Providing packages for compile-intensive components, i.e. tesseract and olena
3. Packaging models, like the GT4HistOCR-based ones, for tesseract, calamari, ocropy and kraken

Ad 1.: The only way this can work without creating system-wide dependency conflicts would basically be a repackaging of the maximum Docker image. This is also of interest, and AppImage is probably a good solution.

Ad 2.: The scope is limited (tesseract and olena), @mikegerber has already built Debian/Ubuntu packages for olena, and @AlexanderP builds tesseract for a Launchpad PPA, so this would be relatively straightforward.

Ad 3.: For tesseract models, we can take the official tesseract-ocr-* model packages as a blueprint. ocropy and kraken models can also be packaged relatively easily. For calamari models, we should probably agree on a convention for where and how models should be stored (ping @maxnth @andbue @chreul if you already have ideas/plans in that regard).
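To make the tesseract-ocr-* blueprint a bit more concrete, here is a minimal Python sketch that stages the directory tree of a model package, which `dpkg-deb --build` can then turn into a .deb. The package name `tesseract-ocr-gt4histocr`, the maintainer and the model file name are hypothetical. The install path `usr/share/tesseract-ocr/4.00/tessdata` is an assumption based on where Ubuntu's Tesseract 4 packages put their models; other releases use a different directory.

```python
import tempfile
from pathlib import Path

# Assumption: directory used by Ubuntu's Tesseract 4 model packages
# (tesseract-ocr-*); other distributions/releases may differ.
TESSDATA_DIR = Path("usr/share/tesseract-ocr/4.00/tessdata")


def stage_model_package(root, package, version, model_name, model_bytes,
                        maintainer, description):
    """Stage DEBIAN/control plus one .traineddata file below `root`.

    Afterwards, `dpkg-deb --build ROOT PACKAGE.deb` builds the package.
    Returns the path of the staged model file.
    """
    root = Path(root)

    model_path = root / TESSDATA_DIR / f"{model_name}.traineddata"
    model_path.parent.mkdir(parents=True, exist_ok=True)
    model_path.write_bytes(model_bytes)

    control_dir = root / "DEBIAN"
    control_dir.mkdir(exist_ok=True)
    control_dir.chmod(0o755)  # dpkg-deb wants 0755..0775 on DEBIAN/
    control = "\n".join([
        f"Package: {package}",
        f"Version: {version}",
        "Architecture: all",  # a model file is architecture-independent
        f"Maintainer: {maintainer}",
        f"Description: {description}",
    ]) + "\n"  # the control file must end with a newline
    (control_dir / "control").write_text(control)
    return model_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        staged = stage_model_package(
            tmp, "tesseract-ocr-gt4histocr", "0.1-1", "GT4HistOCR",
            b"placeholder, not a real model", "Jane Doe <jane@example.org>",
            "Tesseract model trained on GT4HistOCR (hypothetical package)")
        print(staged.relative_to(tmp))
```

ocropy and kraken models could follow the same pattern with a different target directory, once a location convention is agreed on.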

The model packaging in particular would be of benefit also outside the OCR-D "ecosphere".

My questions for the ocrd_all users/developers:

Which of the three approaches are worth exploring in your opinion?
Who has experience in Debian/Ubuntu packaging and can help with setting up the tooling necessary?
How should we distribute the models? A PPA seems like a straightforward choice but only supports Ubuntu (?), not Debian. Another proposal was https://packagecloud.io. Or could we build a repository as a GitHub Pages static site, or use GitHub releases as a pseudo-repository?
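For the GitHub Pages option, apt can read a so-called flat repository: one directory holding the .deb files and a `Packages` (or `Packages.gz`) index, referenced by a sources.list line like `deb [trusted=yes] https://USER.github.io/REPO/ ./` (USER and REPO are placeholders). The index is normally generated with `dpkg-scanpackages`; the Python sketch below only illustrates what an index stanza contains. `trusted=yes` skips signature checks, so this is an assumption suitable for experiments only; a real repository should publish a signed Release file.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path


def packages_stanza(control_fields, deb_path, repo_root):
    """Return the `Packages` index stanza for one .deb file.

    `control_fields` holds the package's control data (Package, Version,
    Architecture, ...). Filename, Size and the checksums are added so apt
    can locate the file relative to the repository root and verify it.
    Assumes single-line field values.
    """
    deb_path = Path(deb_path)
    data = deb_path.read_bytes()
    fields = dict(control_fields)
    fields["Filename"] = deb_path.relative_to(repo_root).as_posix()
    fields["Size"] = str(len(data))
    fields["MD5sum"] = hashlib.md5(data).hexdigest()
    fields["SHA256"] = hashlib.sha256(data).hexdigest()
    return "".join(f"{key}: {value}\n" for key, value in fields.items())


def write_index(repo_root, stanzas):
    """Write Packages and Packages.gz, separating stanzas by a blank line."""
    text = "\n".join(stanzas)
    root = Path(repo_root)
    (root / "Packages").write_text(text)
    (root / "Packages.gz").write_bytes(gzip.compress(text.encode()))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        deb = Path(repo) / "tesseract-ocr-gt4histocr_0.1-1_all.deb"
        deb.write_bytes(b"stand-in for a real .deb")  # hypothetical package
        stanza = packages_stanza(
            {"Package": "tesseract-ocr-gt4histocr", "Version": "0.1-1",
             "Architecture": "all"}, deb, repo)
        write_index(repo, [stanza])
        print(stanza)
```

Because the result is just static files, it could be pushed to a gh-pages branch, while the .deb files themselves might also live on GitHub releases.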

Feedback and pointers to solutions are very welcome.


mikegerber · 2020-07-23T15:33:41Z

Quick-and-dirty ocrd AppImage recipe, to be built with pkg2appimage:

# Based on https://github.com/AppImage/pkg2appimage/blob/9249a99e653272416c8ee8f42cecdde12573ba3e/recipes/ProcDump.yml


app: ocrd

ingredients:
  dist: bionic
  sources:
    - deb http://us.archive.ubuntu.com/ubuntu/ bionic main universe
    - deb http://us.archive.ubuntu.com/ubuntu/ bionic-updates main universe
    - deb http://us.archive.ubuntu.com/ubuntu/ bionic-security main universe
  packages:
    - python3.6-venv

script:
  - virtualenv --python=python3 usr
  - ./usr/bin/pip3 install ocrd
  - ./usr/bin/pip3 freeze | grep "^ocrd==" | cut -d "=" -f 3 > ../VERSION

  # XXX at least pkg2appimage needs a desktop file and an icon, might want to use something
  # else to build, but this is a POC, so...
  - mkdir -p usr/share/applications/
  - cat > usr/share/applications/ocrd.desktop <<\EOF
  - [Desktop Entry]
  - Name=ocrd
  - Exec=ocrd
  - Icon=ocrd
  - Comment=OCR-D core
  - Categories=Office;
  - Type=Application
  - Terminal=true
  - EOF
  - mkdir -p usr/share/icons/hicolor/512x512/apps/
  - touch usr/share/icons/hicolor/512x512/apps/ocrd.png # FIXME: empty placeholder icon
  - cp usr/share/icons/hicolor/512x512/apps/ocrd.png .
  - cp usr/share/applications/ocrd.desktop .

This has some quirks (the .desktop file, the icon, and the handling of the working directory), but it was pleasingly easy to build, and the result works:

% ~/devel/app-image-ocrd/out/ocrd-2.12.2.glibc2.3.3-x86_64.AppImage workspace -d /tmp/actevedef_718448162 get-id 
http://resolver.staatsbibliothek-berlin.de/SBB00008F1000000000

(ugly bagit.py error message removed)

mikegerber · 2020-07-29T09:03:36Z

My opinion(!) on this:

If OCR-D has everything either

1. pip-installable (for Python source), or
2. apt-installable on Ubuntu LTS (everything else):
   a. OCR-D things not covered by pip
   b. binary dependencies like Olena or Tesseract

then, with a little experience, it is easy to build and maintain dependency-isolated AppImages or Docker containers. I would aim for this situation.

This way it's possible to:

- just put an AppImage into /usr/local/bin and have a working processor
- still go wild and install everything "by hand", if you prefer
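Installing a processor this way amounts to copying one file and setting its executable bit. A minimal Python sketch of that step; the AppImage file name and the `ocrd` command name are only examples, and writing to /usr/local/bin normally requires root.

```python
import shutil
import tempfile
from pathlib import Path


def install_appimage(appimage, bindir="/usr/local/bin", name=None):
    """Copy a self-contained AppImage into `bindir` and make it executable.

    `name` sets the command name (e.g. "ocrd"); by default the AppImage's
    own file name is kept.
    """
    src = Path(appimage)
    dest = Path(bindir) / (name or src.name)
    shutil.copyfile(src, dest)
    dest.chmod(0o755)  # rwxr-xr-x
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "ocrd-2.12.2-x86_64.AppImage"  # stand-in file
        image.write_bytes(b"#!/bin/sh\necho ocrd\n")
        bindir = Path(tmp) / "bin"
        bindir.mkdir()
        print(install_appimage(image, bindir, name="ocrd"))
```

Since the AppImage carries its own dependencies, nothing else on the system has to change, and removing the processor is just deleting the file.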

Packaging everything into classical Ubuntu packages will produce the same Gordian knot of dependency problems as the original ocrd_all concept. (I call it a Gordian knot because I am currently upgrading ocrd_calamari to TF2 and now need TF 2.3 to solve some issues → I am sure some other processor will have issues with that.)
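The knot can be made concrete: in one shared environment (or one set of system packages) there must be a single TensorFlow version that satisfies every processor's pin at once, while an isolated bundle only has to satisfy its own. The toy Python check below illustrates this; the processor names and all pins except the TF 2.3 requirement mentioned above are hypothetical, and real resolvers such as pip or apt are far more elaborate.

```python
import operator

# Only the comparison operators this toy example needs.
OPS = {">=": operator.ge, "<": operator.lt, "==": operator.eq}


def parse(version):
    """'2.3' -> (2, 3, 0): pad to three numeric parts so versions compare sanely."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfiable(requirements, candidates):
    """Return the candidate versions that satisfy every (op, version) pin."""
    return [c for c in candidates
            if all(OPS[op](parse(c), parse(v)) for op, v in requirements)]


if __name__ == "__main__":
    pins = {
        # stands in for ocrd_calamari after the move to TF 2.3
        "processor-a": [(">=", "2.3")],
        # hypothetical processor still pinned to TensorFlow 1.x
        "processor-b": [(">=", "1.15"), ("<", "2.0")],
    }
    available = ["1.15.0", "2.2.0", "2.3.0"]
    shared_env = [pin for reqs in pins.values() for pin in reqs]
    print(satisfiable(shared_env, available))           # [] -> no common version
    print(satisfiable(pins["processor-a"], available))  # ['2.3.0'] when isolated
```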

(There are some quirks with AppImage we should have a look at, but it looks really good.)

mikegerber · 2020-07-29T09:28:55Z

(My fat-container approach, https://travis-ci.org/github/mikegerber/my_ocrd_workflow, has the same Gordian knot; I just include fewer processors.)

mikegerber · 2020-07-30T18:51:08Z

And you can then still stick an AppImage into an Ubuntu package. It's a bit perverse, but easy to do.

(This needs a bit more work if you have, e.g., a classical ocrd_olena package and another one that includes everything as an AppImage.)
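For that combination, Debian's package relationship fields look like the relevant tool: the bundle package can declare `Conflicts:` and `Replaces:` on the classical package, so dpkg removes the classical one when the bundle is installed, and `Provides:`, so packages depending on the classical name are still satisfied. The Python sketch below renders such a control paragraph; `ocrd-all-appimage` and `ocrd-olena` are hypothetical names (Debian package names cannot contain underscores, so ocrd_olena would become ocrd-olena), and whether this scheme fits OCR-D is an open assumption.

```python
def render_control(fields):
    """Render a dict as one Debian control paragraph ("Key: value" per line)."""
    return "".join(f"{key}: {value}\n" for key, value in fields.items())


def appimage_bundle_control(package, version, replaced, maintainer):
    """Control data for a .deb whose payload is an AppImage bundling `replaced`.

    Conflicts keeps dpkg from installing the bundle next to the classical
    packages, Replaces (together with Conflicts) lets dpkg remove them when
    the bundle is installed, and Provides keeps dependencies on the
    classical package names satisfied.
    """
    names = ", ".join(replaced)
    return render_control({
        "Package": package,
        "Version": version,
        "Architecture": "amd64",  # the AppImage is a native x86_64 binary
        "Maintainer": maintainer,
        "Conflicts": names,
        "Replaces": names,
        "Provides": names,
        "Description": "OCR-D processors bundled as an AppImage (hypothetical)",
    })


if __name__ == "__main__":
    print(appimage_bundle_control("ocrd-all-appimage", "2020.07-1",
                                  ["ocrd-olena"], "Jane Doe <jane@example.org>"))
```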

mikegerber mentioned this issue on Jul 23, 2020: [RFC] AppImage for OCR-D #106 (open)

cneud mentioned this issue on Jul 28, 2020: Building conda packages for OCR-D OCR-D/core#528 (open)

mikegerber mentioned this issue on Jul 31, 2020: Allow building with thin module Docker containers #69 (open)
