Refactor network module - prepare it for easier sub-module testing #1191

MehmedGIT · 2024-02-17T05:02:40Z

Please feel free to suggest more changes that would lead to easier testing or improve the readability of the code. The Scrutinizer analysis already reports some good improvements so far.

Implemented refactoring changes:

Makefile extension to support ocrd network test commands: network-module-test, network-integration-test
Harmonize code - spacing between method parameters to maintain readability while still reducing the lines of the files. More variable typing wherever it was missing. Extensive use of " instead of ' for strings to remove the need of escaping the ' symbol when the string includes '.
Improved error messages to include extra variable data if available for easier debugging
Replaced repetitive constants with extensive use of Enums such as: AgentType, DeployType, JobState, NetworkLoggingDirs, ServerApiTags
Env variable refactor to be generally used for all RabbitMQ clients and not just workers OCRD_NETWORK_WORKER_QUEUE_CONNECT_ATTEMPTS -> OCRD_NETWORK_RABBITMQ_CLIENT_CONNECT_ATTEMPTS
The default job state is now set to UNSET when not provided
StateEnum -> JobState for better clarification
All occurrences of status -> state to maintain the same variable name everywhere
The logging module to avoid potential name collisions: src/ocrd_network/logging.py -> src/ocrd_network/logging_utils.py
Combined all RabbitMQ-related data into a single dictionary variable instead of 5 separate variables (processing server and processing worker). The validators also return a dictionary of validated variables.
Deployer module - increased readability and testability of separate deployments.
Run-time data modules - created separate files for different concepts. Check here.
Provided config parser wrapper methods for the different run-time data (for easier testing). Check here.
Processing server improvements (~20% less lines):
- Introduced raise_http_exception method to wrap repetitive lines
- Created wrapper methods for separate concepts in big methods for easier concept testing, e.g., parse_workflow_tasks, validate_first_task_input_file_groups_existence and moved these to utils or server_utils depending on whether a server response is involved or not. Moreover, moved some other full methods which were better fitting inside utils or server_utils such as create_processing_message
- Overall improved naming conventions for methods, e.g., push_processor_job -> validate_and_forward_job_to_network_agent where network agent is of kind AgentType (check Enums above)
- Refactored server cache implementation + provided sync versions of the async methods to directly reuse these in the tests.
Refactored rabbitmq_utils:
- Moved the create_queue method from processing worker to rabbitmq_utils module
- Provided extra 4 wrapper methods to remove code duplication in the processing server and processing worker. Check here.
Extended the OcrdProcessingMessage schema to support agent_type. Check here.
Fixed the OcrdResultMessage schema - the schema required either path_to_mets or workspace_id. However, setting workspace_id to None was triggering an error that it's not of type string according to the schema. On the other hand, assigning an empty string to workspace_id was triggering an error that either path_to_mets or workspace_id could be set but not both. So there was no way to create a result message without a crash. By default, the fields path_to_mets or workspace_id are set to empty strings when nothing is assigned. Check here.

Implemented tests:

The naming conventions are test_integration_* or test_modules_* - the integration tests require access to either the MongoDB or RabbitMQ, but the module tests do not. Hence, the module tests can be executed without starting any extra services (faster for testing).
Refactor test_db.py -> test_integration_1_db.py and replaced magic values with constants
Refactor test_rabbitmq.py -> test_integration_2_rabbitmq.py and replaced magic values with constants
test_integration_3_server_cache_requests.py - tests of the server cache for processing requests
test_integration_4_processing_worker.py - tests of the processing worker
test_integration_5_processing_server.py - tests of the processing server
test_modules_logging_utils.py
test_modules_param_validators.py
test_modules_process_helpers.py - Testing invocation of pythonic and bashlib processor
test_modules_server_cache_pages.py - Unlike the server cache requests, the pages cache does not require database access.

Open issues:

The bashlib processor test is still not working and is disabled currently - _test_invoke_processor_bash() - Shall be fixed in this PR.
There is still the deploy_agent_native_get_pid_hack method, but now isolated away from the deployer. An eye should be kept on that in another PR.

Note: I have not merged the master branch to this PR yet, since v2.63.3 broke some mets caching. I will potentially do that before this PR can be finally merged.

Note2: The macOS tests are disabled since they fail due to a reason not related to this PR itself.

MehmedGIT · 2024-03-26T08:31:40Z

For me this PR is hard to review, because this are really a lot of changes.

Understandable.

I have tried to look into all the files but especially the changes in the processing_server where hard to follow.

For the processing server specifically, it is better to check the result not the changes. That one file got a lot of modifications and had to become fewer lines (~20% less).

I run the tests though and I tried to run the changes in a self build Dockercontainer together with a pythonic and bashlib worker. This tests all passed on my machine.

Have you also tried the _test_invoke_processor_bash() test by removing the underscore? That is the one I am not sure how to run yet to test the spawning of a bashlib processor separately from a worker.

The good thing is this changes nearly only affect the network package

Yes, tried to avoid moving stuff inside other modules (shall be the last thing to do) for now since this PR is already big enough.

kba

LGTM, very thorough. I'll now try to fix the bashlib test case, besides that looks good to go.

README.md

src/ocrd_network/constants.py

src/ocrd_network/processor_server.py

src/ocrd_network/rabbitmq_utils/helpers.py

src/ocrd_network/runtime_data/hosts.py

Co-authored-by: Konstantin Baierer <[email protected]>

joschrew · 2024-03-27T07:19:50Z

Have you also tried the _test_invoke_processor_bash() test by removing the underscore? That is the one I am not sure how to run yet to test the spawning of a bashlib processor separately from a worker.

No, I didn't try that.

(consider case where for_fileIds is an empty list)

MehmedGIT added 30 commits February 13, 2024 16:19

fix shadow name config

6afc982

remove unnecessary params

1126c4d

refine exceptions

1b2bf19

refine imports

6f798e4

remove unnecessary code

3aee09e

fix processor server

30809fb

move expand pages

4084129

simplify verifications

ef9264f

improve server cache formatting

e382ca5

refactor workflow endpoint

1cba489

fileGrp -> file_group

2147c29

simplify run workflow method

377464d

http exception wrapper

ca22496

greatly refactor deployer, agents, services

c48fe0c

further refactor, fix broken hosts

41c92d3

change deployer location

b822f03

fix weird spacing in messages

75784d1

get rid of extra RMQ connection

80ff855

improve config parsing

4871aa2

refactor deployer mets server

cb48505

get rid of leftovers

ecb9afe

fix broken connections

e4e9bfc

fix typo

60399c5

fix circular imports, move to constants

55654f8

revert some changes

bc1f007

fix typing

fdd34be

add extra filter catcher

57c2568

update validators

b3c609a

more verbose error print

f4b7b69

fix typing

28f359e

add missed call in processor server

8a137e8

fix spacing

0d3def2

kba approved these changes Mar 26, 2024

View reviewed changes

MehmedGIT and others added 3 commits March 26, 2024 17:07

Update README.md

0345829

Co-authored-by: Konstantin Baierer <[email protected]>

Update src/ocrd_network/constants.py

911724b

Co-authored-by: Konstantin Baierer <[email protected]>

move rmq uri pattern to const

61c7ad8

MehmedGIT mentioned this pull request Mar 26, 2024

Update the RabbitMQ Version to 3.12 #1147

Closed

kba and others added 16 commits April 8, 2024 15:26

make coverage: omit generateDS code

81939bc

ocrd_mets: type hints

e9f5800

ocrd_mets: forgot import Tuple

459127a

ocrd_mets.py: cannot use defined class in type hints

b1b8fbf

ocrd_mets.py: forgot Any

afb25f0

test_ocrd_mets.py: also test empty for_fileIds list

4169579

ocrd_mets.py: fix get_physical_pages_for

f563757

(consider case where for_fileIds is an empty list)

get_physical_pages_for: if for_fileIds is empty list, return empty list

76293a9

OcrdMets.set_physical_page_for_file: fix cached case

0041ada

OcrdMets: simplify

e9b23ac

📦 v2.63.1

a3720e0

📦 v2.63.2

da4fb66

run test-logging in a temp dir to avoid "logging" package being shadowed

77b0f9b

put ocrd-tool.json for bashlib_cp_processor in separate file

5a1f501

enable bashlib test in test_modules_process_helpers

cfb26f1

forgot to add the renamed ocrd-cp processor

9c08d85

kba added a commit that referenced this pull request Apr 8, 2024

commit network-testing.yml workflow here, so it runs on #1191

537b1f7

kba added 2 commits April 8, 2024 16:01

merge master

9319743

network-testing: workflow_dispatch

7851f8a

kba merged commit 7851f8a into master Apr 8, 2024
17 checks passed

kba deleted the refactor-network branch April 8, 2024 14:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor network module - prepare it for easier sub-module testing #1191

Refactor network module - prepare it for easier sub-module testing #1191

MehmedGIT commented Feb 17, 2024 •

edited

Loading

MehmedGIT commented Mar 26, 2024

kba left a comment

joschrew commented Mar 27, 2024

Refactor network module - prepare it for easier sub-module testing #1191

Refactor network module - prepare it for easier sub-module testing #1191

Conversation

MehmedGIT commented Feb 17, 2024 • edited Loading

MehmedGIT commented Mar 26, 2024

kba left a comment

Choose a reason for hiding this comment

joschrew commented Mar 27, 2024

MehmedGIT commented Feb 17, 2024 •

edited

Loading