Processing-Server #974
We decided to use only single quotes for strings, for consistency. I kept docstrings in triple double quotes because of PEP 257.
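For illustration, a minimal made-up snippet of that convention (the function and string are invented, not from this PR):

```python
def fetch_greeting():
    """Docstrings keep triple double quotes, as recommended by PEP 257."""
    # All other string literals use single quotes
    return 'hello from the processing server'
```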
They were added to give additional information, but they are no longer needed in this place.
…-processing-worker add bashlib processing worker, require Python 3.7
Should be good to be merged now!
try:
    # Only checks if the process queue exists, if not raises ValueError
    self.rmq_publisher.create_queue(processor_name, passive=True)
except ChannelClosedByBroker as error:
I was expecting this to raise just a `ValueError` if the queue doesn't exist, according to the doc; however, that turns out not to be the case: the broker closes the channel instead.
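For reference, a minimal standalone pika sketch that reproduces the behaviour described above (broker address and queue name are made up):

```python
import pika
from pika.exceptions import ChannelClosedByBroker

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
try:
    # passive=True only checks whether the queue exists; it never creates it
    channel.queue_declare(queue='nonexistent-queue', passive=True)
except ChannelClosedByBroker as error:
    # The broker answers 404 NOT_FOUND and closes the channel,
    # leaving the channel object unusable afterwards
    print(f'queue check failed: {error}')
```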
finally:
    # Reconnect publisher - not efficient, but works
    # TODO: Revisit when reconnection strategy is implemented
    self.connect_publisher(enable_acks=True)
Hence, a reconnection is needed here. I know that it's more efficient to just open a new channel, but I don't want to deal with that now; it will be more appropriate to invest the time in implementing the reconnection scheme correctly inside rabbitmq_utils.
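A rough sketch of that cheaper alternative, assuming plain pika objects rather than the rabbitmq_utils wrappers (the helper itself is hypothetical):

```python
import pika

def recover_channel(connection):
    """Reopen just the channel if the connection survived, else reconnect.

    Hypothetical helper, not part of rabbitmq_utils.
    """
    if connection.is_open:
        # ChannelClosedByBroker closes only the channel; the connection
        # usually survives, so a fresh channel is enough
        return connection, connection.channel()
    # Fallback: full reconnect (what the code above effectively does)
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    return connection, connection.channel()
```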
Fix the silly mistake I made.
>
> ```shell
> # 1.
> $ ocrd processing-worker --queue=<queue-address> --database=<database-address>
>
> # 2.
> $ <processor-name> --queue=<queue-address> --database=<database-address>
> ```
>
> We (@joschrew and I) are aware that the [spec](https://github.com/OCR-D/spec/blob/master/web_api.md) contains a change that is not adapted here yet:
>
> ```shell
> # 1. Use ocrd CLI bundled with OCR-D/core
> $ ocrd server --type=worker --queue=<queue-address> --database=<database-address>
>
> # 2. Use processor name
> $ <processor-name> --server --type=worker --queue=<queue-address> --database=<database-address>
> ```
>
> Where `--type` can be either `worker` (processing worker):
>
> ```shell
> ocrd server --type=worker --queue=<queue-address> --database=<database-address>
> ```
>
> or `server` (the REST API wrapper based on #884).
>
> ```shell
> ocrd server --type=server --database=<database-address>
> ```
>
> However, it's a bad idea to extend the processing worker code to support the REST API processing server (aka a standalone processing worker that has nothing to do with the bigger Processing Server in the spec and does not need the queues).
>
> Now, try to imagine a good way to explain the following to the OCR-D community without confusing them:
>
> 1. why the `<processor-name>` is a server, but is not a server when `--type=worker`, is referred to with `processing-worker`, and no direct requests can be sent to it
> 2. why the `<processor-name>` is a server, an actual server, when `--type=server` and is referred to with `processing-server`
> 3. why both are grouped together under `ocrd server ...` or `... --server ...` but potentially implemented together under `processing_worker.py`
> 4. why the Processing Server (PS-big) and processing servers (PS-small), i.e. standalone ocrd processors, are different concepts but both referred to with `processing server`.
>
> Sounds confusing? It gets even more confusing when it has to be implemented in a clean way. So there are 3 main reasons not to adapt that yet:
>
> * The priority has changed a bit, and the higher priority now is to deploy a working Processing Server, Workflow Server, and Workspace Server together on a live VM instance, so that KITODO can use that to continue their development.
> * Identify and fix problems that arise when combining the 3 servers above from 2 different repositories ([Processing-Server #974](https://github.com/OCR-D/core/pull/974) and the reference WebAPI implementation).
> * To not complicate the current implementation before first thinking through how to separate the concepts properly and avoid potential problems. Ideally:
>
> 1. the standalone processing servers (aka server processing workers) should be implemented as a separate class (name suggestions?), a sibling of `ProcessingWorker`, and both should share the common methods.
> 2. the CLI for both should be separated, with improved naming conventions.
continued in #1032
Co-authored-by: Robert Sachunsky <[email protected]>
This pull request implements the processing server (OCR-D/spec#222). The processing server starts a RabbitMQ instance, a MongoDB instance, and processing workers via SSH and Docker. It receives calls to run OCR-D processors; the jobs are enqueued, and the workers read the jobs from the queue and execute them. The jobs are stored in MongoDB, where status changes of the jobs are reflected and can be queried through the processing server's API.
To start a worker, it must be defined in the configuration file, where the host of the worker has to be set. It is expected that the worker executables (e.g. ocrd-dummy) are available in the PATH; the PATH can be modified via ~/.bash_profile or ~/.profile.
Example configuration file:
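A minimal sketch of what such a file could look like, assuming a single host running one ocrd-dummy worker; all key names and values here are illustrative assumptions, not the exact schema shipped with this pull request:

```yaml
# Hypothetical sketch - key names and structure are assumptions for illustration
process_queue:
  address: localhost
  port: 5672
database:
  address: localhost
  port: 27017
hosts:
  - address: localhost
    username: testuser
    workers:
      - name: ocrd-dummy
        number_of_instance: 1
```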
Example calls for the processing-server endpoints:
Run a processor:
curl 'http://localhost:8080/processor/ocrd-dummy' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{ "path": "/home/testuser/.local/share/ocrd-workspaces/data/mets.xml", "input_file_grps": ["OCR-D-IMG"], "output_file_grps": ["IMG-COPY-1"], "parameters": { "copy_files": true } }'
Request processor status:
curl 'http://localhost:8080/processor/ocrd-dummy/<insert-job-id-here>'
List available processors:
curl 'http://localhost:8080/processor'
Get information about a single processor:
curl 'http://localhost:8080/processor/ocrd-dummy'
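For completeness, the submit-then-poll flow from the curl calls above as a small Python sketch using `requests`; treating `job_id` as the response key is an assumption about the response schema, not a confirmed detail:

```python
import requests

BASE = 'http://localhost:8080'

# Submit a job to ocrd-dummy (payload mirrors the curl example above)
response = requests.post(f'{BASE}/processor/ocrd-dummy', json={
    'path': '/home/testuser/.local/share/ocrd-workspaces/data/mets.xml',
    'input_file_grps': ['OCR-D-IMG'],
    'output_file_grps': ['IMG-COPY-1'],
    'parameters': {'copy_files': True},
})
job = response.json()

# Poll the job status; 'job_id' as the response key is an assumption
job_id = job['job_id']
status = requests.get(f'{BASE}/processor/ocrd-dummy/{job_id}').json()
print(status)
```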