Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Processing-Server #974

Merged
merged 230 commits into from
Apr 2, 2023
Merged
Show file tree
Hide file tree
Changes from 146 commits
Commits
Show all changes
230 commits
Select commit Hold shift + click to select a range
f2cf4e6
prototype for processing broker
joschrew Dec 7, 2022
2a4919c
refactoring: try to make it easier to read
joschrew Dec 16, 2022
d23bb3c
adapt code to changed config-file
joschrew Dec 16, 2022
7e20fef
change config representation class
joschrew Dec 19, 2022
d5a8641
add config validation
joschrew Dec 19, 2022
fd8e724
add deployment with docker-sdk
joschrew Dec 19, 2022
8f0796d
deploy queue and mongodb with docker-sdk
joschrew Dec 20, 2022
65dc60f
refactor code
joschrew Dec 20, 2022
ad8acc6
Add the required dependencies
MehmedGIT Jan 2, 2023
55ba0e4
Make queue and db deploy more flexible
MehmedGIT Jan 2, 2023
22bc5de
Refactor deployment of processors
MehmedGIT Jan 2, 2023
efdc5da
Conceptual integration of the RabbitMQ library
MehmedGIT Jan 2, 2023
1724662
Refactor: separate deplyoer from deployment utils
MehmedGIT Jan 2, 2023
16bcb4b
Refactor processing broker/worker
MehmedGIT Jan 2, 2023
9901f90
Revert requirements.txt to keep track of future conflicting files
MehmedGIT Jan 2, 2023
0ce527b
Merge remote-tracking branch 'origin/master' into dev-processing-broker
MehmedGIT Jan 2, 2023
86979f2
Extend requirements
MehmedGIT Jan 3, 2023
07b1733
Adopt useful methods
MehmedGIT Jan 3, 2023
6639c79
Create network package by refactoring
MehmedGIT Jan 3, 2023
76c40a3
Add example broker configuration
MehmedGIT Jan 3, 2023
3e4e76c
Change logger names
joschrew Jan 6, 2023
f1c27f3
Replace double quoted strings with single quoted
joschrew Jan 6, 2023
4fcaeb5
Add typehints for network files
joschrew Jan 9, 2023
d700522
Add address-option to processing broker
joschrew Jan 9, 2023
019e94e
Remove informational non-code files again
joschrew Jan 9, 2023
36160cc
Change functionsignature docker/ssh clientcreation
joschrew Jan 9, 2023
23b0db9
Remove (not working) close_clients function
joschrew Jan 9, 2023
f218669
Move, rename and modify config-classes
joschrew Jan 9, 2023
078fa31
Change enum value comparision for deploy_type
joschrew Jan 10, 2023
4f8d700
Improve for loops over processors
joschrew Jan 10, 2023
6b549f1
Remove and change some asserts
joschrew Jan 10, 2023
f5ad740
Add a few explanatory comments
joschrew Jan 10, 2023
02e3c4c
Change and and comments
joschrew Jan 10, 2023
fde789a
Change config file validation and handling
joschrew Jan 10, 2023
ab74eec
refactor: port -> ports_mapping
MehmedGIT Jan 11, 2023
edaf55d
refactor: _queue -> _rabbitmq
MehmedGIT Jan 11, 2023
87a83a8
Fix typo in kill_processing_workers
MehmedGIT Jan 11, 2023
e82de10
Implement kill_processing_worker separately
MehmedGIT Jan 11, 2023
612dc0b
Improve comments and checks
MehmedGIT Jan 11, 2023
76825fc
Add processing worker cli template
MehmedGIT Jan 12, 2023
8c60172
Copy RabbitMQ utility library from the WebAPI impl repo
MehmedGIT Jan 12, 2023
736a606
Add the reference note to the RabbitMQ utils
MehmedGIT Jan 12, 2023
1ec9371
Use actual RMQConsumer inside the processing_worker
MehmedGIT Jan 12, 2023
51690ee
Use actual RMQPublisher inside the processing_broker
MehmedGIT Jan 12, 2023
af5c0b4
Refactor and optimize imports
MehmedGIT Jan 12, 2023
fb213e8
Fix imports in rabbitmq_utils, add pika as a requirement
MehmedGIT Jan 12, 2023
41defe3
Remove config file and tomli as a requirement
MehmedGIT Jan 12, 2023
03c0946
Refactor defaults in rabbitmq_utils webapi->network
MehmedGIT Jan 12, 2023
ca1b1b9
Comment out RabbitMQ publisher/consumer
MehmedGIT Jan 12, 2023
49089c7
Improve deploy/kill order
MehmedGIT Jan 12, 2023
346bb2c
Split configurations for better readability
MehmedGIT Jan 12, 2023
1a290a2
Remove typing which was auto added by the IDE
MehmedGIT Jan 13, 2023
1453b74
change scope of get_processor
MehmedGIT Jan 13, 2023
2933a06
Move methods from Processing Worker to Deployer
MehmedGIT Jan 13, 2023
3300391
Refactor name -> processor_name
MehmedGIT Jan 13, 2023
9a7641b
Finish transfering of methods from Worker to Deployer
MehmedGIT Jan 13, 2023
a059f1c
Improve the template of processing worker
MehmedGIT Jan 13, 2023
29c30e5
Import the job and tool models from prev implementation
MehmedGIT Jan 16, 2023
908b118
Add processing worker wrappers for run_cli and run_processor
MehmedGIT Jan 17, 2023
680b706
Fix import of ocrd_messages
MehmedGIT Jan 17, 2023
9de2175
RabbitMQ and MongoDB addr handling helpers
MehmedGIT Jan 17, 2023
a2481bd
Add helpers for addr parsing
MehmedGIT Jan 17, 2023
c83fec9
Extend decorators to accept --queue and --database
MehmedGIT Jan 17, 2023
1d8c1b0
Extend processing worker
MehmedGIT Jan 17, 2023
62ae01b
Remove default values for --queue and --database
MehmedGIT Jan 17, 2023
eeb797b
Increase sleep command of the dummy docker container
MehmedGIT Jan 17, 2023
fd24c4f
Fix rabbitmq path in logging of processing_worker
MehmedGIT Jan 17, 2023
6359bdf
Refactor processing broker + activate queue creation
MehmedGIT Jan 17, 2023
ef34e67
Log messages received from the Management UI on port 15672
MehmedGIT Jan 17, 2023
cfcc036
Improve logging + add some TODOs and remarks
MehmedGIT Jan 18, 2023
3bd6f70
Retrieve ocrd_tool by using util method
MehmedGIT Jan 18, 2023
7138979
Force logging settings - paramiko=INFO, ocrd.network=DEBUG
MehmedGIT Jan 18, 2023
0f826fd
Implement on_consumed_message method
MehmedGIT Jan 18, 2023
2a5d442
default message constructing methods
MehmedGIT Jan 18, 2023
7912ae3
Append default global vhost, when not provided
MehmedGIT Jan 18, 2023
fb11953
database prefix should be passed as an argument
MehmedGIT Jan 18, 2023
e93d702
Fix --queue and --database, not flags
MehmedGIT Jan 18, 2023
952e6e9
Add remarks about start_*_processor methods
MehmedGIT Jan 18, 2023
43050ab
Provide a basic testing endpoint to processing broker (aka server)
MehmedGIT Jan 19, 2023
ae18c08
Make address param for processing broker required
joschrew Jan 20, 2023
fdfaf8d
Use credentials from config for rabbitmq messaging
joschrew Jan 20, 2023
f387f92
Add new parameters to processor help output
joschrew Jan 20, 2023
d35d006
Improve error message for processing worker
joschrew Jan 20, 2023
beca781
Insert credentials on rabbitmq startup
joschrew Jan 23, 2023
f2c3e6e
Change message exchange from pickle to str for now
joschrew Jan 23, 2023
2f045b3
Change rabbitmq default credentials
joschrew Jan 24, 2023
d1c1f50
Add optional path_to_bin_dir to config and use it
joschrew Jan 24, 2023
344e593
Add route to p-broker to start a p-worker
joschrew Jan 25, 2023
77a316d
Add typehints for a method in processing_worker
joschrew Jan 25, 2023
e16e3d4
Replace double quoted strings with single quoted
joschrew Jan 25, 2023
512ea77
Add missing typehints for network modules
joschrew Jan 25, 2023
e876cf9
Review (mainly comments for) CustomDockerClient
joschrew Jan 25, 2023
59e8418
Remind myself to rename the broker
joschrew Jan 26, 2023
5900c73
Reuse parts of older impl for running a processor
joschrew Jan 26, 2023
2323fac
Add processor enpoints from previous impl (pr 884)
joschrew Jan 26, 2023
c57e75c
Change comments and format and do some cleanup
joschrew Jan 26, 2023
868d58d
Adapt to latest config changes
joschrew Jan 30, 2023
53e9b47
Use definitions.json for rabbitmq
joschrew Jan 30, 2023
72a8e5c
Rename processing broker to processing server
joschrew Jan 30, 2023
0f399ea
Set job state after worker finishes
joschrew Jan 30, 2023
8143614
Add enpoint to list processors
joschrew Jan 30, 2023
871323b
Remove redundant log output
joschrew Feb 2, 2023
7f4f1d7
Ensure processor availability before processing
joschrew Feb 2, 2023
1caffd6
Implement the result-queue logic - worker uses RMQPublisher
MehmedGIT Feb 2, 2023
a16aa02
Send encoded result message
MehmedGIT Feb 2, 2023
ab326f7
Add exception handler to debug validation
MehmedGIT Feb 8, 2023
4368b53
Workers push finished job status to result queue
MehmedGIT Feb 9, 2023
e2452e4
Example NF script (still broken) executed by the WF Server to submit …
MehmedGIT Feb 9, 2023
de3b41a
Add workspace_id parameter to processor-call
joschrew Feb 9, 2023
8b583d9
Add workspace model basically that from webapi
joschrew Feb 9, 2023
5d218a1
Add workspace_id/path param-check to run_processor
joschrew Feb 9, 2023
1142449
Resolve mets path with workspace id
joschrew Feb 10, 2023
be0fe51
Check off some trivial todos
joschrew Feb 10, 2023
67753b1
Stop splitting config in ProcessingServer
joschrew Feb 10, 2023
6a526af
Stop storing pid for queue and mongo in config
joschrew Feb 10, 2023
d9d959e
Separating config and data for hosts
joschrew Feb 10, 2023
f51b3b8
Removing unnecessary things after meeting with Jonas
MehmedGIT Feb 13, 2023
30498a7
Add cleanup and exceptionhandling to startup
joschrew Feb 14, 2023
6a3836d
Remove unnecessary loggging and clean comments
joschrew Feb 14, 2023
c838ed7
Add job response model
joschrew Feb 14, 2023
d763d4c
Add mechanism to ensure queue startup
joschrew Feb 14, 2023
5b4e22d
Remove a few more unneeded todo-notes
joschrew Feb 14, 2023
f6fe1c3
Merge branch 'master' into dev-processing-broker
MehmedGIT Feb 16, 2023
84acd77
Remove unnecessary parts
MehmedGIT Feb 16, 2023
b586f70
Add logging to file for processing workers
MehmedGIT Feb 16, 2023
0bb9bdb
Wait 100ms between deployment of ocr-d processors
MehmedGIT Feb 17, 2023
ec9ffc6
Merge master
MehmedGIT Feb 17, 2023
9ab1146
Dirty fix - disable interactive shell logging for calamari
MehmedGIT Feb 17, 2023
6cd2a89
Enable instance caching for processors - tested and working
MehmedGIT Feb 17, 2023
144f262
Add tensorflow to core requirements
MehmedGIT Feb 17, 2023
6c3c6d5
Add changes from fix-972 branch
MehmedGIT Feb 20, 2023
cb54a67
Replace ocrd logging with python logging
MehmedGIT Feb 20, 2023
5a328c2
Lower the scope of tf keras import
MehmedGIT Feb 20, 2023
c02f0ba
Revert logging back to ocrd logging in worker
MehmedGIT Feb 20, 2023
cef1d0d
Refactor imports, logging, spacing
MehmedGIT Feb 21, 2023
b73c899
Remove config-var path_to_bin_dir again
joschrew Feb 21, 2023
3135a4b
Rearrange docker and ssh client creation
joschrew Feb 21, 2023
c00dc04
Fix copy paste mistake from last commit
joschrew Feb 22, 2023
bd3a6c2
Add workaround for rabbitmq startup on remote host
joschrew Feb 22, 2023
a1d5611
Remove the unnecessary rabbitmq.conf file
MehmedGIT Feb 23, 2023
de492b8
Rename table for workspace in mongo
joschrew Feb 24, 2023
6c09711
Use config credentials for server and workers
MehmedGIT Feb 24, 2023
b264652
Create ocrd_network package
joschrew Feb 24, 2023
826d9a0
Remove usage of definitions.json with rabbitmq
joschrew Feb 24, 2023
ad98e12
Fix dependency in requirements
MehmedGIT Feb 24, 2023
fffb7dc
Update doc for processing server
joschrew Feb 27, 2023
f5e45b3
Improve exception message
MehmedGIT Mar 2, 2023
9829b18
Remove parsing of stdout in ocrd decorators
MehmedGIT Mar 2, 2023
b73e99a
Update processing worker cli help
MehmedGIT Mar 2, 2023
cf7321d
Update ocrd-network package info in __init__
MehmedGIT Mar 2, 2023
132b93f
Update ocrd_network/ocrd_network/deployer.py info
MehmedGIT Mar 2, 2023
a2a7fe9
Change getcwd() call location
MehmedGIT Mar 2, 2023
22311f1
Update ocrd_network/ocrd_network/deployer.py help
MehmedGIT Mar 2, 2023
f9c5414
Update ocrd_network/ocrd_network/deployer.py
MehmedGIT Mar 2, 2023
4e66ed3
Raise the error instead of returning it
MehmedGIT Mar 2, 2023
d1f287d
Revert getcwd() location - failing tests
MehmedGIT Mar 2, 2023
ddc92d7
Fix set_job_state
MehmedGIT Mar 2, 2023
75b5d4e
Make conditional TF import
MehmedGIT Mar 3, 2023
b3ada1f
Remove unused function
joschrew Mar 9, 2023
f154949
Implement custom type check
MehmedGIT Mar 10, 2023
8a256d5
Change native processor startup output handling
joschrew Mar 15, 2023
e125a14
Fix uvicorn logging bug
joschrew Mar 15, 2023
b5d3496
Add possibility to run bashlib processors too
joschrew Mar 15, 2023
5ff8860
Add module docstring for database
joschrew Mar 16, 2023
2b5cc3b
parameters -> arguments in RMQ connector
MehmedGIT Mar 16, 2023
ad04cb6
Fix quotes
MehmedGIT Mar 16, 2023
8e7b1f8
better error message
MehmedGIT Mar 16, 2023
44e1eaa
Improve validation and parsing of URIs
MehmedGIT Mar 17, 2023
c791260
Clean, refactor, and add small TODOs
MehmedGIT Mar 17, 2023
494c0ab
chdir to ws as in #987
MehmedGIT Mar 20, 2023
d784a75
Call getcwd before get_processor
MehmedGIT Mar 20, 2023
43a5660
Set ws outside the constructor
MehmedGIT Mar 20, 2023
e396b2b
Modify OcrdResultMessage
MehmedGIT Mar 21, 2023
57d4791
Move check before assignments
MehmedGIT Mar 21, 2023
7288d6d
Remove pymongo, add call_sync wrapper
MehmedGIT Mar 21, 2023
5e6d0ad
Refactor helpers -> utils
MehmedGIT Mar 21, 2023
66851d4
Add callback_url, improve models
MehmedGIT Mar 21, 2023
07828a6
Remove the debugging file
MehmedGIT Mar 21, 2023
0eb74b9
Remove result queue related info logs
MehmedGIT Mar 21, 2023
da2e7e1
Initiate DB client after db addr verification
MehmedGIT Mar 22, 2023
31d3a08
Improve exception message
MehmedGIT Mar 22, 2023
b82446c
refactor result_queue, callback_url checks
MehmedGIT Mar 22, 2023
0f0a6d3
Set workspace_id when available
MehmedGIT Mar 22, 2023
44b6b6e
minor variable and comment improvements
MehmedGIT Mar 22, 2023
8e0d7d8
fix result queue related things
MehmedGIT Mar 22, 2023
d1e7ff9
path -> path_to_mets in Job
MehmedGIT Mar 22, 2023
c6be974
Fix job.path -> job.path_to_mets, refactor
MehmedGIT Mar 22, 2023
4dd0ea9
Create job_id, don't rely on document_id
MehmedGIT Mar 22, 2023
69399fb
Refactor status -> state to have same standard
MehmedGIT Mar 22, 2023
d1d85c6
Remove json.dumps
MehmedGIT Mar 22, 2023
93b7f5b
Refactor the setup.py
MehmedGIT Mar 22, 2023
8f32004
Big refactoring - use relative imports
MehmedGIT Mar 22, 2023
3c3273b
Fix setup.py
MehmedGIT Mar 22, 2023
32b935b
Merge master v2.48.0
MehmedGIT Mar 22, 2023
e080a0f
Refactor processing config validation
MehmedGIT Mar 23, 2023
59a06ab
Validate processing message against message schema
MehmedGIT Mar 23, 2023
787510e
Fix: catch db workspace exception
MehmedGIT Mar 24, 2023
1679382
impove: match the returned http status code
MehmedGIT Mar 24, 2023
709c46c
fix: resolve mets path in worker, not server
MehmedGIT Mar 24, 2023
39840e2
fix: process job model
MehmedGIT Mar 24, 2023
9ba05e0
db single update method
MehmedGIT Mar 24, 2023
0fe51f2
calculate execution time in ms
MehmedGIT Mar 24, 2023
9910482
resolve: suggestions by kba
MehmedGIT Mar 24, 2023
4774f51
bashlib: implement Processing Worker args
bertsky Mar 24, 2023
6741d55
improve generate_processor_help:
bertsky Mar 24, 2023
0ccc963
ocrd_cli_options: remove redundant help kwarg
bertsky Mar 24, 2023
e2e6e69
start native cli: no need for is_bashlib_processor
bertsky Mar 24, 2023
4ab2657
rm is_bashlib_processor
bertsky Mar 24, 2023
a4b9c96
rm Py36 test – no longer supported
bertsky Mar 24, 2023
3f13747
CI: use Python images from CircleCI instead of DH
bertsky Mar 24, 2023
6ead599
require Py37+
bertsky Mar 24, 2023
89570a7
require Py37+
bertsky Mar 24, 2023
e1f6ed8
require Py37+
bertsky Mar 24, 2023
fed4128
require Py37+
bertsky Mar 24, 2023
af08d1e
require Py37+
bertsky Mar 24, 2023
21c46c1
require Py37+
bertsky Mar 24, 2023
fe3303e
fix CI:
bertsky Mar 25, 2023
df7bfee
--help: clearer description of --page-id
kba Mar 26, 2023
378400e
.circleci/config.yml: fix syntax
kba Mar 26, 2023
dbae7c4
.circleci/config.yml: fix syntax
kba Mar 26, 2023
53f94b7
Merge branch 'dev-processing-broker-add-bashlib-processing-worker' of…
kba Mar 26, 2023
25cc6fc
.circleci/config.yml: fix syntax
kba Mar 26, 2023
176cf73
.circleci/config.yml: fix syntax
kba Mar 26, 2023
eeaafc4
Merge pull request #1024 from OCR-D/dev-processing-broker-add-bashlib…
MehmedGIT Mar 26, 2023
2fa674f
Make receiving job info for procesor work again
joschrew Mar 27, 2023
2fc3856
provide flexible queue checks
MehmedGIT Mar 27, 2023
7835154
remove unnecessary checks
MehmedGIT Mar 27, 2023
123366b
bruh, I need some rest for today
MehmedGIT Mar 27, 2023
6fac7b8
check defaults, pass shallow copy
MehmedGIT Mar 28, 2023
bc9a29a
Update README.md
kba Apr 2, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ COPY ocrd_models ./ocrd_models
COPY ocrd_utils ./ocrd_utils
RUN mv ./ocrd_utils/ocrd_logging.conf /etc
COPY ocrd_validators/ ./ocrd_validators
COPY ocrd_network/ ./ocrd_network
COPY Makefile .
COPY README.md .
COPY LICENSE .
Expand Down
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ TESTDIR = tests

SPHINX_APIDOC =

BUILD_ORDER = ocrd_utils ocrd_models ocrd_modelfactory ocrd_validators ocrd
BUILD_ORDER = ocrd_utils ocrd_models ocrd_modelfactory ocrd_validators ocrd_network ocrd
bertsky marked this conversation as resolved.
Show resolved Hide resolved

FIND_VERSION = grep version= ocrd_utils/setup.py|grep -Po "([0-9ab]+\.?)+"

Expand Down
7 changes: 7 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
* [ocrd_models](#ocrd_models)
* [ocrd_modelfactory](#ocrd_modelfactory)
* [ocrd_validators](#ocrd_validators)
* [ocrd_network](#ocrd_network)
* [ocrd](#ocrd)
* [bash library](#bash-library)
* [bashlib API](#bashlib-api)
Expand Down Expand Up @@ -122,6 +123,12 @@ Schemas and routines for validating BagIt, `ocrd-tool.json`, workspaces, METS, p

See [README for `ocrd_validators`](./ocrd_validators/README.md) for further information.

### ocrd_network

Tools for offering (web-)services with OCR-D
kba marked this conversation as resolved.
Show resolved Hide resolved

See [README for `ocrd_network`](./ocrd_network/README.md) for further information.

### ocrd

Depends on all of the above, also contains decorators and classes for creating OCR-D processors and CLIs.
Expand Down
5 changes: 5 additions & 0 deletions ocrd/ocrd/cli/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,9 @@ def get_help(self, ctx):
from ocrd.decorators import ocrd_loglevel
from .zip import zip_cli
from .log import log_cli
from .processing_server import processing_server_cli
from .processing_worker import processing_worker_cli


@click.group()
@click.version_option()
Expand All @@ -48,3 +51,5 @@ def cli(**kwargs): # pylint: disable=unused-argument
cli.add_command(validate_cli)
cli.add_command(log_cli)
cli.add_command(resmgr_cli)
cli.add_command(processing_server_cli)
cli.add_command(processing_worker_cli)
37 changes: 37 additions & 0 deletions ocrd/ocrd/cli/processing_server.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
"""
OCR-D CLI: start the processing server

.. click:: ocrd.cli.processing_server:zip_cli
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved
:prog: ocrd processing-server
:nested: full
"""
import click
from ocrd_utils import initLogging
from ocrd_network import ProcessingServer
import logging


@click.command('processing-server')
@click.argument('path_to_config', required=True, type=click.STRING)
@click.option('-a', '--address', help='Host (name/IP) and port to bind the Processing-Server to. Example: localhost:8080', required=True)
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved
def processing_server_cli(path_to_config, address: str):
"""
Start and manage processing workers with the processing server

PATH_TO_CONFIG is a yaml file to configure the server and the workers. See
https://github.com/OCR-D/spec/pull/222/files#diff-a71bf71cbc7d9ce94fded977f7544aba4df9e7bdb8fc0cf1014e14eb67a9b273
for further information (TODO: update path when spec is available/merged)

"""
initLogging()
# TODO: Remove before the release
logging.getLogger('paramiko.transport').setLevel(logging.INFO)
logging.getLogger('ocrd.network').setLevel(logging.DEBUG)
kba marked this conversation as resolved.
Show resolved Hide resolved

try:
host, port = address.split(":")
port_int = int(port)
except ValueError:
raise click.UsageError('The --address option must have the format IP:PORT')
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved
processing_server = ProcessingServer(path_to_config, host, port_int)
processing_server.start()
56 changes: 56 additions & 0 deletions ocrd/ocrd/cli/processing_worker.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
"""
OCR-D CLI: start the processing worker

.. click:: ocrd.cli.processing_worker:zip_cli
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved
:prog: ocrd processing-worker
:nested: full
"""
import click
import logging
from ocrd_utils import (
initLogging,
get_ocrd_tool_json
)
from ocrd_network.processing_worker import ProcessingWorker


@click.command('processing-worker')
@click.argument('processor_name', required=True, type=click.STRING)
@click.option('-q', '--queue',
default="admin:admin@localhost:5672/",
help='The username, password, host, port, and virtual host of the RabbitMQ Server. '
'Format: username:password@host:port/vhost')
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved
@click.option('-d', '--database',
default="mongodb://localhost:27018",
help='The host and port of the MongoDB')
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved
def processing_worker_cli(processor_name: str, queue: str, database: str):
"""
Start a processing worker (a specific ocr-d processor)
"""
initLogging()
# TODO: Remove before the release
logging.getLogger('ocrd.network').setLevel(logging.DEBUG)
kba marked this conversation as resolved.
Show resolved Hide resolved

# Get the ocrd_tool dictionary
# ocrd_tool = parse_json_string_with_comments(
# run([processor_name, '--dump-json'], stdout=PIPE, check=True, universal_newlines=True).stdout
# )

ocrd_tool = get_ocrd_tool_json(processor_name)
if not ocrd_tool:
raise Exception(f"The ocrd_tool is empty or missing")

try:
processing_worker = ProcessingWorker(
rabbitmq_addr=queue,
mongodb_addr=database,
processor_name=ocrd_tool['executable'],
ocrd_tool=ocrd_tool,
processor_class=None, # For readability purposes assigned here
)
# The RMQConsumer is initialized and a connection to the RabbitMQ is performed
processing_worker.connect_consumer()
# Start consuming from the queue with name `processor_name`
processing_worker.start_consuming()
except Exception as e:
raise Exception("Processing worker has failed with error") from e
41 changes: 40 additions & 1 deletion ocrd/ocrd/decorators/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
from os.path import isfile
from os import environ
import sys
from contextlib import redirect_stdout
from io import StringIO

import click

Expand All @@ -10,9 +12,11 @@
set_json_key_value_overrides,
)

from ocrd_utils import getLogger, initLogging
from ocrd_utils import getLogger, initLogging, parse_json_string_with_comments
from ocrd_validators import WorkspaceValidator

from ocrd_network import ProcessingWorker

from ..resolver import Resolver
from ..processor.base import run_processor

Expand All @@ -35,6 +39,8 @@ def ocrd_cli_wrap_processor(
overwrite=False,
show_resource=None,
list_resources=False,
queue=None,
database=None,
**kwargs
):
if not sys.argv[1:]:
Expand All @@ -51,6 +57,39 @@ def ocrd_cli_wrap_processor(
list_resources=list_resources
)
sys.exit()
# If either of these two is provided but not both
if bool(queue) != bool(database):
raise Exception("Both queue and database addresses must be provided - not just either of them.")
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved
# If both of these are provided - start the processing worker instead of the processor - processorClass
if queue and database:
initLogging()
# TODO: Remove before the release
# We are importing the logging here because it's not the ocrd logging but python one
import logging
logging.getLogger('ocrd.network').setLevel(logging.DEBUG)
kba marked this conversation as resolved.
Show resolved Hide resolved

# Get the ocrd_tool dictionary
f_out = StringIO()
with redirect_stdout(f_out):
processorClass(workspace=None, dump_json=True)
# TODO: Verify this. There is `ocrd_tool` parameter passed as an argument.
# The following line overwrites the passed parameter.
ocrd_tool = parse_json_string_with_comments(f_out.getvalue())
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved

try:
processing_worker = ProcessingWorker(
rabbitmq_addr=queue,
mongodb_addr=database,
processor_name=ocrd_tool['executable'],
ocrd_tool=ocrd_tool,
processor_class=processorClass,
)
# The RMQConsumer is initialized and a connection to the RabbitMQ is performed
processing_worker.connect_consumer()
# Start consuming from the queue with name `processor_name`
processing_worker.start_consuming()
except Exception as e:
raise Exception("Processing worker has failed with error") from e
else:
initLogging()
LOG = getLogger('ocrd_cli_wrap_processor')
Expand Down
4 changes: 3 additions & 1 deletion ocrd/ocrd/decorators/ocrd_cli_options.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from click import option
from click import option, STRING
from .parameter_option import parameter_option, parameter_override_option
from .loglevel_option import loglevel_option

Expand Down Expand Up @@ -26,6 +26,8 @@ def cli(mets_url):
option('-O', '--output-file-grp', help='File group(s) used as output.', default='OUTPUT'),
option('-g', '--page-id', help="ID(s) of the pages to process"),
option('--overwrite', help="Overwrite the output file group or a page range (--page-id)", is_flag=True, default=False),
option('--queue', help="The RabbitMQ server address in format: {host}:{port}/{vhost}", type=STRING),
option('--database', help="The MongoDB address in format: mongodb://{host}:{port}", type=STRING),
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved
option('-C', '--show-resource', help='Dump the content of processor resource RESNAME', metavar='RESNAME'),
option('-L', '--list-resources', is_flag=True, default=False, help='List names of processor resources'),
parameter_option,
Expand Down
42 changes: 30 additions & 12 deletions ocrd/ocrd/processor/helpers.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
"""
Helper methods for running and documenting processors
"""
from os import environ
from os import chdir, environ, getcwd
from time import perf_counter, process_time
from functools import lru_cache
import json
Expand All @@ -14,7 +14,7 @@

from click import wrap_text
from ocrd.workspace import Workspace
from ocrd_utils import freeze_args, getLogger
from ocrd_utils import freeze_args, getLogger, pushd_popd


__all__ = [
Expand Down Expand Up @@ -95,6 +95,9 @@ def run_processor(
instance_caching=instance_caching
)

old_cwd = getcwd()
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved
chdir(processor.workspace.directory)

ocrd_tool = processor.ocrd_tool
name = '%s v%s' % (ocrd_tool['executable'], processor.version)
otherrole = ocrd_tool['steps'][0]
Expand All @@ -104,21 +107,34 @@ def run_processor(
t0_cpu = process_time()
if any(x in environ.get('OCRD_PROFILE', '') for x in ['RSS', 'PSS']):
backend = 'psutil_pss' if 'PSS' in environ['OCRD_PROFILE'] else 'psutil'
mem_usage = memory_usage(proc=processor.process,
# only run process once
max_iterations=1,
interval=.1, timeout=None, timestamps=True,
# include sub-processes
multiprocess=True, include_children=True,
# get proportional set size instead of RSS
backend=backend)
try:
mem_usage = memory_usage(proc=processor.process,
# only run process once
max_iterations=1,
interval=.1, timeout=None, timestamps=True,
# include sub-processes
multiprocess=True, include_children=True,
# get proportional set size instead of RSS
backend=backend)
except Exception as err:
log.exception("Failure in processor '%s'" % ocrd_tool['executable'])
return err
finally:
chdir(old_cwd)
mem_usage_values = [mem for mem, _ in mem_usage]
mem_output = 'memory consumption: '
mem_output += ''.join(sparklines(mem_usage_values))
mem_output += ' max: %.2f MiB min: %.2f MiB' % (max(mem_usage_values), min(mem_usage_values))
logProfile.info(mem_output)
else:
processor.process()
try:
processor.process()
except Exception as err:
log.exception("Failure in processor '%s'" % ocrd_tool['executable'])
return err
finally:
chdir(old_cwd)

t1_wall = perf_counter() - t0_wall
t1_cpu = process_time() - t0_cpu
logProfile.info("Executing processor '%s' took %fs (wall) %fs (CPU)( [--input-file-grp='%s' --output-file-grp='%s' --parameter='%s' --page-id='%s']" % (
Expand Down Expand Up @@ -198,7 +214,7 @@ def run_cli(

def generate_processor_help(ocrd_tool, processor_instance=None):
"""Generate a string describing the full CLI of this processor including params.

Args:
ocrd_tool (dict): this processor's ``tools`` section of the module's ``ocrd-tool.json``
processor_instance (object, optional): the processor implementation
Expand Down Expand Up @@ -247,6 +263,8 @@ def wrap(s):
-g, --page-id ID Physical page ID(s) to process
--overwrite Remove existing output pages/images
(with --page-id, remove only those)
--queue The RabbitMQ server address in format: {host}:{port}/{vhost}"
--database The MongoDB address in format: mongodb://{host}:{port}"
--profile Enable profiling
--profile-file Write cProfile stats to this file. Implies --profile
-p, --parameter JSON-PATH Parameters, either verbatim JSON string
Expand Down
1 change: 1 addition & 0 deletions ocrd/setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
install_requires.append('ocrd_models == %s' % VERSION)
install_requires.append('ocrd_modelfactory == %s' % VERSION)
install_requires.append('ocrd_validators == %s' % VERSION)
install_requires.append('ocrd_network == %s' % VERSION)

setup(
name='ocrd',
Expand Down
5 changes: 5 additions & 0 deletions ocrd_network/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# ocrd_network

> OCR-D framework - web API

See also: https://github.com/OCR-D/core
17 changes: 17 additions & 0 deletions ocrd_network/ocrd_network/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# This network package is supposed to contain all the packages and modules to realize the network architecture:
# https://user-images.githubusercontent.com/7795705/203554094-62ce135a-b367-49ba-9960-ffe1b7d39b2c.jpg
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved

# For reference, currently:
# 1. The WebAPI is available here:
# https://github.com/OCR-D/ocrd-webapi-implementation
# 2. The RabbitMQ Library (i.e., utils) is available here:
# https://github.com/OCR-D/ocrd-webapi-implementation/tree/main/ocrd_webapi/rabbitmq
# 3. Some potentially more useful code to be adopted for the Processing Server/Worker is available here:
# https://github.com/OCR-D/core/pull/884
# 4. The Mets Server discussion/implementation is available here:
# https://github.com/OCR-D/core/pull/966
bertsky marked this conversation as resolved.
Show resolved Hide resolved

# Note: The Mets Server is still not placed on the architecture diagram and probably won't be a part of
# the network package. The reason, Mets Server is tightly coupled with the `OcrdWorkspace`.
kba marked this conversation as resolved.
Show resolved Hide resolved
from .processing_server import ProcessingServer
from .processing_worker import ProcessingWorker
13 changes: 13 additions & 0 deletions ocrd_network/ocrd_network/database.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
from beanie import init_beanie
MehmedGIT marked this conversation as resolved.
Show resolved Hide resolved
from motor.motor_asyncio import AsyncIOMotorClient

from ocrd_network.models.job import Job
from ocrd_network.models.workspace import Workspace


async def initiate_database(db_url: str):
client = AsyncIOMotorClient(db_url)
await init_beanie(
database=client.get_default_database(default='ocrd'),
document_models=[Job, Workspace]
)
Loading