RFC: Debian/Ubuntu packaging of ocrd_all components and OCR models #130
Comments
Q&D ocrd AppImage to be built with
This has some quirks like .desktop and the icon and the handling of the working directory, but it was pleasingly easy to build this:
(ugly bagit.py error message removed)
My opinion(!) on this: If OCR-D has everything either
then - with a little experience - it is easy to build and maintain dependency-isolated AppImages or Docker containers. I would aim for this situation. This way it's possible to:
Packaging everything into classical Ubuntu packages will produce the same Gordian knot of dependency problems as the original ocrd_all concept. (I call it a Gordian knot because I am currently upgrading ocrd_calamari to TF2 and now need TF2.3 to solve some issues → I am sure some other processor will have issues with that.) (There are some quirks with AppImage we should have a look at, but it looks really good.)
(My fat container approach https://travis-ci.org/github/mikegerber/my_ocrd_workflow has the same Gordian knot, I just include fewer processors.)
And you can then still stick an AppImage into an Ubuntu package. It's a bit perverse but easy to do. (Needs a bit more work if you have e.g. a classical ocrd_olena package and then another one that includes everything as an AppImage.)
Now that a solution to the conflicting dependency problem is imminent, we should discuss how we can reduce build times and simplify management of OCR models by supporting OS package management.
I see three areas where package management can improve ocrd_all:
Ad 1.: The only way this can work without creating system-wide dependency conflicts would basically be a repackaging of the maximum docker image. This is also of interest, and AppImage is probably a good solution.
Ad 2.: Since the scope is limited (tesseract and olena), and @mikegerber has already built Debian/Ubuntu packages for olena while @AlexanderP builds tesseract for a Launchpad PPA, this would be relatively straightforward.
Ad 3.: For tesseract models we can take the official tesseract-ocr-* models as a blueprint. ocropy and kraken models can also be packaged relatively easily. For calamari models, we should probably agree on a convention for where and how models should be stored (ping @maxnth @andbue @chreul if you already have ideas/plans in that regard).
The model packaging in particular would also be of benefit outside the OCR-D "ecosphere".
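To make the tesseract-ocr-* blueprint from Ad 3 a bit more concrete, here is a minimal sketch of how model package names could be derived. It is only an illustration: the helper `model_package_name` and the `calamari-ocr-model` prefix are hypothetical, not an agreed convention. The only hard constraint it relies on is the Debian Policy rule for package names (lowercase letters, digits, `+`, `-`, `.`, at least two characters, starting with an alphanumeric character).

```python
import re

# Debian Policy: package names consist of lowercase letters, digits,
# '+', '-' and '.', are at least two characters long and start with
# an alphanumeric character.
_VALID_PACKAGE_NAME = re.compile(r"^[a-z0-9][a-z0-9+.-]+$")


def model_package_name(prefix: str, model: str) -> str:
    """Build a package name as '<prefix>-<model slug>'.

    Hypothetical helper: the scheme mirrors the tesseract-ocr-* pattern,
    it is not an existing OCR-D or Debian tool.
    """
    # Lowercase the model name and turn every run of characters that is
    # not allowed in a package name (e.g. '_' or spaces) into one hyphen.
    slug = re.sub(r"[^a-z0-9+.]+", "-", model.lower()).strip("-")
    name = f"{prefix}-{slug}"
    if not slug or not _VALID_PACKAGE_NAME.match(name):
        raise ValueError(f"cannot derive a valid package name from {model!r}")
    return name


if __name__ == "__main__":
    # Existing pattern for tesseract language models.
    print(model_package_name("tesseract-ocr", "deu"))
    print(model_package_name("tesseract-ocr", "chi_sim"))
    # Assumed prefix for calamari models, purely for illustration.
    print(model_package_name("calamari-ocr-model", "my_model"))
```

Whatever prefix is chosen, keeping one prefix per engine would let users discover models with a simple package search, the same way it already works for tesseract.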
My questions for the ocrd_all users/developers:
Feedback and pointers to solutions are very welcome.