Draft for slim containers #386
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
export PYTHON ?= python3 | ||
VIRTUAL_ENV = $(CURDIR)/venv2 | ||
BIN = $(VIRTUAL_ENV)/bin | ||
ACTIVATE_VENV = $(BIN)/activate | ||
OCRD_MODULES = OCRD_CIS OCRD_TESSEROCR | ||
OCRD_CIS = ocrd-cis-ocropy-binarize ocrd-cis-ocropy-dewarp | ||
OCRD_TESSEROCR = ocrd-tesserocr-recognize ocrd-tesserocr-segment-region | ||
PROCESSORS = $(foreach mod,$(OCRD_MODULES),$(foreach proc,$($(mod)), $(proc) )) | ||
DELEGATORS = $(foreach proc,$(PROCESSORS),$(BIN)/$(proc)) | ||
|
||
slim-venv: docker-compose.yaml .env $(DELEGATORS) | $(VIRTUAL_ENV) | ||
|
||
|
||
# create a delegator to the processing server for the processor | ||
$(BIN)/ocrd-%: | $(VIRTUAL_ENV) | ||
	@sed "s/{{\s*processor_name\s*}}/$(subst $(BIN)/,,$@)/" slim-containers-files/delegator_template.py > $@; | ||
	@chmod u+x $@ | ||
|
||
|
||
$(VIRTUAL_ENV): $(ACTIVATE_VENV) | ||
	. $(ACTIVATE_VENV) && $(MAKE) -C core install | ||
|
||
%/bin/activate: | ||
	$(PYTHON) -m venv $(subst /bin/activate,,$@) | ||
	. $@ && pip install --upgrade pip setuptools wheel | ||
|
||
# append the service to docker-compose for a processor | ||
add_proc = sed -e "s/{{\s*processor_name\s*}}/$1/" -e "s/{{\s*processor_group_name\s*}}/\L$2/" \ | ||
	slim-containers-files/docker-compose.processor.template.yaml >> docker-compose.yaml; | ||
|
||
docker-compose.yaml: | ||
	@cat slim-containers-files/docker-compose.template.yaml > docker-compose.yaml | ||
	@$(foreach mod,$(OCRD_MODULES),$(foreach proc,$($(mod)),$(call add_proc,$(proc),$(mod)))) | ||
|
||
.env: | ||
	@rm -rf .env | ||
	@echo OCRD_PS_PORT=8000 >> .env | ||
	@echo OCRD_PS_MTU=1300 >> .env | ||
	@echo MONGODB_URL=mongodb://ocrd-mongodb:27017 >> .env | ||
	@echo RABBITMQ_URL=amqp://admin:admin@ocrd-rabbitmq:5672 >> .env | ||
|
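The Makefile above generates both the delegators and the Compose services by filling `{{ ... }}` placeholders with `sed`. As a rough illustration of what that substitution does, here is a minimal Python sketch. The function names `placeholder` and `render_service` are hypothetical and not part of the PR; the sketch assumes, like `sed` without the `g` flag, that only the first match on each line is replaced, and it lowercases the module name the way `\L$2` does.

```python
import re


def placeholder(name):
    """Regex equivalent of the sed pattern {{\\s*NAME\\s*}} used in the Makefile."""
    return re.compile(r"\{\{\s*" + re.escape(name) + r"\s*\}\}")


def render_service(template, processor, module):
    """Fill one processor template line by line, similar to add_proc.

    Hypothetical helper for illustration only.
    """
    rendered = []
    for line in template.splitlines():
        # first match per line only, like sed's s/.../.../ without the g flag
        line = placeholder("processor_name").sub(lambda m: processor, line, count=1)
        # \L$2 in the Makefile lowercases the module variable name
        line = placeholder("processor_group_name").sub(lambda m: module.lower(), line, count=1)
        rendered.append(line)
    return "\n".join(rendered)


if __name__ == "__main__":
    template = (
        "  {{ processor_name }}:\n"
        "    file: slim-containers-files/{{ processor_group_name}}/docker-compose.yaml"
    )
    print(render_service(template, "ocrd-cis-ocropy-binarize", "OCRD_CIS"))
```

The tolerant whitespace pattern is what lets the template get away with inconsistent spacing such as `{{ processor_group_name}}`.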
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# I need this because i need the network-for-slim-branch and this contains a comment how to make | ||
# pudb run. | ||
FROM ocrd/core:latest AS base | ||
WORKDIR /build-ocrd | ||
RUN apt install vim-tiny -y | ||
RUN git clone https://github.com/ocr-d/core.git && \ | ||
cd core && \ | ||
git checkout network-for-slim-prep && \ | ||
#sed -i "290 i \ from pudb.remote import set_trace; set_trace(term_size=(160, 40), host='0.0.0.0', port=6900)" ocrd_network/ocrd_network/processing_server.py && \ | ||
make install-dev && \ | ||
pip install pudb | ||
EXPOSE 6900 | ||
WORKDIR /data |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,24 @@ | ||||||
#!/usr/bin/env python | ||||||
import sys | ||||||
from pathlib import Path | ||||||
import subprocess | ||||||
|
||||||
# Later the address (or rather the port) should be dynamic | ||||||
processing_server_address = "http://localhost:8000" | ||||||
Why not the exposed [...]

Suggested change [...]

How does that work? (I mean the hostname part; I think I know how to set the port via env.) From the host, ocrd-processing-server cannot be resolved out of the box. What else do I have to change to query the container from the host via its service/host name?

Not from the host (network), but from the virtual Docker network (i.e. from inside another container). See the Compose documentation. (The port is then the container-internal port, BTW.)

After trying some things, I think I cannot solve this properly. First, the hostname cannot be the service name, because the delegator is run from the host. Second, I cannot read the port from .env, because I don't know the working directory the delegator is executed from. So I decided to set the port dynamically in the Makefile, like the processor name. The proposed suggestion, at least, does not work. Commit: 7736969

Oh, right! Sorry, weak thinking on my side.
Ok, good point. However, we could write all the .env values into the venv's activate script:

    %/bin/activate:
    	$(PYTHON) -m venv $(subst /bin/activate,,$@)
    	. $@ && pip install --upgrade pip setuptools wheel
    	@echo ". $(CURDIR)/.env" >> $@

IMO the .env should be the central source for configuration, so you should be able to modify it to your needs after it was generated.

I see a problem with that approach: it only works with the venv activated in bash (not csh or fish), and invoking executables directly would not work with this approach (e.g.: [...]

Have reverted the last commit regarding this and decided to use the proposed approach |
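To make the outcome of this thread concrete, here is a minimal sketch of how the delegator could build its address from the environment instead of hard-coding it. It assumes that `OCRD_PS_PORT` really reaches the delegator's process environment; note that sourcing plain `KEY=VALUE` lines sets shell variables without exporting them, so some extra export step would be needed. The function name and the fallback port are illustrative assumptions, not code from the PR.

```python
import os


def processing_server_address(env=os.environ, default_port="8000"):
    """Return the Processing Server address as seen from the host.

    Hypothetical helper: the Compose service name is only resolvable inside
    the Docker network, so from the host we talk to localhost on the
    published port.
    """
    port = env.get("OCRD_PS_PORT", default_port)
    return f"http://localhost:{port}"


if __name__ == "__main__":
    print(processing_server_address())
```

Passing the mapping as a parameter keeps the function easy to check without touching the real environment.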
||||||
processor_name = "{{ processor_name }}" | ||||||
|
||||||
args = list(sys.argv) | ||||||
if "-m" in args: | ||||||
    idx = args.index("-m") | ||||||
    metspath = args[idx + 1] | ||||||
    if Path(metspath).is_absolute(): | ||||||
        print("absolute path is not supported") | ||||||
        exit(1) | ||||||
    args[idx + 1] = f"/data/{metspath}" | ||||||
|
||||||
|
||||||
cmd = [ | ||||||
    "ocrd", "network", "client", "processing", "processor", | ||||||
    processor_name, "--address", processing_server_address | ||||||
] | ||||||
subprocess.run(cmd + args[1:]) | ||||||
The Processing Server API is asynchronous, and so is its network client. So this CLI will not give the same user experience as the native CLI (for example, you cannot script these calls). Thus, either we add some callback mechanism here and block until the job is done, or we switch to the Processor Server API, which is synchronous (but has no client CLI yet).

It could be as simple as passing [...]

    PORT = None
    async def serve_callback():
        nonlocal PORT
        # set up web server
        def get(path):
            raise Exception("done")
        ...
        # run web server on arbitrary available port
        PORT = ...
    server = asyncio.create_task(serve_callback)

Then after the [...]

I have added a waiting mechanism with the Python built-in [...] |
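The blocking idea discussed above can be sketched with the standard library alone. This is only an illustration under stated assumptions: it assumes the server notifies the client with an HTTP POST to a callback URL once the job is done, which is not confirmed by the diff shown here, and the names `CallbackHandler`, `start_callback_server` and `wait_for_callback` are hypothetical. It is not necessarily the mechanism the PR ended up using.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CallbackHandler(BaseHTTPRequestHandler):
    """Accept a single POST and signal the waiting thread."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.server.payload = self.rfile.read(length)
        self.send_response(200)
        self.end_headers()
        self.server.done.set()

    def log_message(self, *args):
        pass  # keep the delegator's output quiet


def start_callback_server(host="127.0.0.1"):
    """Start a callback listener on an arbitrary free port (port 0)."""
    server = HTTPServer((host, 0), CallbackHandler)
    server.done = threading.Event()
    server.payload = None
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def wait_for_callback(server, timeout=None):
    """Block until the callback arrives; return its body, or None on timeout."""
    finished = server.done.wait(timeout)
    server.shutdown()
    server.server_close()
    return server.payload if finished else None
```

In a delegator, the port chosen by the operating system is available as `server.server_address[1]` and would have to be passed along with the job request, so that the other side knows where to report completion.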
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,14 @@ | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
  {{ processor_name }}: | ||||||||||||||||||||||||
    extends: | ||||||||||||||||||||||||
      file: slim-containers-files/{{ processor_group_name}}/docker-compose.yaml | ||||||||||||||||||||||||
So IIUC, in the final setup, when we have correct Dockerfiles and compose files in all modules, this will simply become [...] |
||||||||||||||||||||||||
      service: {{ processor_name }} | ||||||||||||||||||||||||
    command: ocrd network processing-worker --database $MONGODB_URL --queue $RABBITMQ_URL --create-queue {{ processor_name }} | ||||||||||||||||||||||||
    depends_on: | ||||||||||||||||||||||||
      - ocrd-processing-server | ||||||||||||||||||||||||
      - ocrd-mongodb | ||||||||||||||||||||||||
      - ocrd-rabbitmq | ||||||||||||||||||||||||
    # restart: The worker creates its queue but rabbitmq needs a few seconds to be available | ||||||||||||||||||||||||
Comment on lines +7 to +11

If timing is an issue, I suggest changing the dependency type:

Suggested change [...]

The underlying problem is the following: the container for the queue is started and running, but it needs 1-3 seconds before queue creation is possible. The processing worker, however, tries to create its queue right away. This suggestion ( [...]

For this PR to function, some extension to core is needed anyway. There I want to add an optional queue-creation timeout to the worker startup, so that it waits a few seconds before adding its queue or retries a few times. But this restart fix was the fastest way to do that, which is why it is here, and I agree that it should be removed eventually. I will mark this as resolved as soon as the needed changes to core are made (which need one change to this PR as well). |
||||||||||||||||||||||||
    restart: on-failure:3 | ||||||||||||||||||||||||
    volumes: | ||||||||||||||||||||||||
      - "$PWD/data:/data" |
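The queue-creation timing issue discussed in this file's review thread (the broker container is up, but queues cannot be created for the first few seconds) is a classic case for retrying with a delay. The following is a generic sketch of that idea, not the change planned for core. The helper name `retry` and its parameters are assumptions for illustration.

```python
import time


def retry(action, attempts=3, delay=2.0, exceptions=(Exception,), sleep=time.sleep):
    """Call action() until it succeeds or the attempts are used up.

    Hypothetical helper: waits `delay` seconds between failed attempts and
    re-raises the last error if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions as error:
            last_error = error
            if attempt < attempts:
                sleep(delay)
    raise last_error
```

A worker could wrap its queue-creation call in such a helper instead of relying on `restart: on-failure:3`. The `sleep` parameter exists only so the waiting can be replaced in tests.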
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
networks: | ||
  default: | ||
    driver: bridge | ||
    driver_opts: | ||
      com.docker.network.driver.mtu: ${OCRD_PS_MTU} | ||
|
||
services: | ||
  ocrd-processing-server: | ||
    build: | ||
      # later real ocrd-core image should be referenced here | ||
      dockerfile: slim-containers-files/Dummy-Core-Dockerfile | ||
      args: | ||
        BASE_IMAGE: ubuntu:20.04 | ||
    ports: | ||
      - ${OCRD_PS_PORT}:8000 | ||
    volumes: | ||
      - "./slim-containers-files/ps-config.yaml:/ocrd-processing-server-config.yaml" | ||
    command: ocrd network processing-server -a 0.0.0.0:8000 /ocrd-processing-server-config.yaml | ||
|
||
  ocrd-mongodb: | ||
    image: mongo | ||
    # Ports are only needed during the implementation phase to test. To be removed later | ||
    ports: | ||
      - "27018:27017" | ||
|
||
  ocrd-rabbitmq: | ||
    image: rabbitmq:3-management | ||
    # Ports are only needed during the implementation phase to test. To be removed later | ||
    ports: | ||
      - "5672:5672" | ||
      - "15672:15672" | ||
    environment: | ||
      - "RABBITMQ_DEFAULT_USER=admin" | ||
      - "RABBITMQ_DEFAULT_PASS=admin" | ||
|
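The template above relies on Compose replacing `${OCRD_PS_PORT}` and `${OCRD_PS_MTU}` with values from the generated `.env` file. Compose does this itself; the sketch below only mimics the idea in Python to show how the two files fit together. `parse_env` and `interpolate` are hypothetical helpers, and `string.Template` covers only the plain `${NAME}` form, not the full Compose interpolation syntax.

```python
from string import Template


def parse_env(text):
    """Parse simple KEY=VALUE lines as written by the Makefile's .env rule."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def interpolate(text, values):
    """Replace ${NAME} references; raises KeyError for undefined names."""
    return Template(text).substitute(values)


if __name__ == "__main__":
    env = parse_env("OCRD_PS_PORT=8000\nOCRD_PS_MTU=1300\n")
    print(interpolate("- ${OCRD_PS_PORT}:8000", env))
```

Because the values are read at `docker compose` time, editing `.env` after generation changes the published port and MTU without touching the Compose file.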
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
FROM ocrd/core:latest AS base | ||
WORKDIR /build-ocrd | ||
# Remove the next RUN, this is only to checkout my branch while the changes are not in core yet | ||
RUN git clone https://github.com/ocr-d/core.git && \ | ||
cd core && \ | ||
git checkout network-for-slim-prep && \ | ||
make install | ||
|
||
# Not based on ocrd_cis "original" Dockerfile. That seems out of date and in ocrd_all ocrd_cis is | ||
# simply installed with pip so I do the same here | ||
COPY ocrd_cis/ ./ocrd_cis/ | ||
COPY setup.py README.md LICENSE ocrd-tool.json Manifest.in ./ | ||
RUN pip install . && rm -rf /build-ocrd | ||
# TODO: install models for ocrd-cis | ||
WORKDIR /data |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
services: | ||
  ocrd-cis-ocropy-binarize: | ||
    build: | ||
      context: ../../ocrd_cis | ||
      dockerfile: ../slim-containers-files/ocrd_cis/Dockerfile | ||
    command: | ||
      ocrd network processing-worker ocrd-cis-ocropy-binarize --database $MONGODB_URL --queue $RABBITMQ_URL --create-queue | ||
|
||
  ocrd-cis-ocropy-dewarp: | ||
    build: | ||
      context: ../../ocrd_cis | ||
      dockerfile: ../slim-containers-files/ocrd_cis/Dockerfile | ||
    command: | ||
      ocrd network processing-worker ocrd-cis-ocropy-dewarp --database $MONGODB_URL --queue $RABBITMQ_URL --create-queue |
Why is this necessary at all? ocrd_tesserocr already contains a suitable Dockerfile...

I think it's just for proof of concept, so @joschrew does not need to keep multiple PRs in sync. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
FROM ocrd/core:latest AS base | ||
WORKDIR /build-ocrd-core | ||
# Remove the next RUN, this is only to checkout my branch while the changes are not in core yet | ||
RUN git clone https://github.com/ocr-d/core.git && \ | ||
cd core && \ | ||
git checkout network-for-slim-prep && \ | ||
make install | ||
|
||
# copied from https://github.com/OCR-D/ocrd_tesserocr/blob/master/Dockerfile and modified | ||
ARG VCS_REF | ||
ARG BUILD_DATE | ||
LABEL \ | ||
maintainer="https://ocr-d.de/kontakt" \ | ||
org.label-schema.vcs-ref=$VCS_REF \ | ||
org.label-schema.vcs-url="https://github.com/OCR-D/ocrd_tesserocr" \ | ||
org.label-schema.build-date=$BUILD_DATE | ||
|
||
ENV DEBIAN_FRONTEND noninteractive | ||
ENV PYTHONIOENCODING utf8 | ||
|
||
# avoid HOME/.local/share (hard to predict USER here) | ||
# so let XDG_DATA_HOME coincide with fixed system location | ||
# (can still be overridden by derived stages) | ||
ENV XDG_DATA_HOME /usr/local/share | ||
|
||
WORKDIR /build-ocrd | ||
COPY setup.py . | ||
COPY ocrd_tesserocr/ocrd-tool.json . | ||
COPY README.md . | ||
COPY requirements.txt . | ||
COPY requirements_test.txt . | ||
COPY ocrd_tesserocr ./ocrd_tesserocr | ||
COPY Makefile . | ||
RUN make deps-ubuntu && \ | ||
apt-get install -y --no-install-recommends \ | ||
g++ \ | ||
&& make deps install \ | ||
&& rm -rf /build-ocrd \ | ||
&& apt-get -y remove --auto-remove g++ libtesseract-dev make | ||
RUN ocrd resmgr download ocrd-tesserocr-recognize Fraktur.traineddata | ||
RUN ocrd resmgr download ocrd-tesserocr-recognize deu.traineddata | ||
|
||
WORKDIR /data | ||
VOLUME /data |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,14 @@ | ||||||
services: | ||||||
  ocrd-tesserocr-recognize: | ||||||
    build: | ||||||
      context: ../../ocrd_tesserocr | ||||||
      dockerfile: ../slim-containers-files/ocrd_tesserocr/Dockerfile | ||||||
    command: | ||||||
      ocrd network processing-worker ocrd-tesseroc-recognize --database $MONGODB_URL --queue $RABBITMQ_URL --create-queue | ||||||
Suggested change [...]
|
||||||
|
||||||
  ocrd-tesserocr-segment-region: | ||||||
    build: | ||||||
      context: ../../ocrd_tesserocr | ||||||
      dockerfile: ../slim-containers-files/ocrd_tesserocr/Dockerfile | ||||||
    command: | ||||||
      ocrd network processing-worker ocrd-tesserocr-segment-region --database $MONGODB_URL --queue $RABBITMQ_URL --create-queue | ||||||
We should now use the [...] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
process_queue: | ||
  address: ocrd-rabbitmq | ||
  port: 5672 | ||
  skip_deployment: true | ||
  credentials: | ||
    username: admin | ||
    password: admin | ||
database: | ||
  address: ocrd-mongodb | ||
  port: 27017 | ||
  skip_deployment: true | ||
hosts: [] |
None of this would be needed if you added to the existing Makefile directly. We already have (sensible+configurable) definitions for VIRTUAL_ENV, OCRD_MODULES (not a variable but the true submodule name) and OCRD_EXECUTABLES (your PROCESSORS). There is even an existing delegator mechanism (used for sub-venvs on some modules).