Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow endpoint of the Processing Server #1083

Merged
merged 97 commits into from
Oct 11, 2023
Merged

Workflow endpoint of the Processing Server #1083

merged 97 commits into from
Oct 11, 2023

Conversation

joschrew
Copy link
Contributor

@joschrew joschrew commented Aug 28, 2023

Endpoint to run a workflow with the processing-server

Usage described here

Base automatically changed from processing_server_ext_1046 to master September 4, 2023 11:38
ocrd_network/ocrd_network/processing_server.py Outdated Show resolved Hide resolved
try:
tasks_list = workflow.splitlines()
tasks = [ProcessorTask.parse(task_str) for task_str in tasks_list if task_str.strip()]
except BaseException as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need more specific exceptions in ocrd/ocrd/task_sequence.py...

ocrd_network/ocrd_network/processing_server.py Outdated Show resolved Hide resolved
ocrd_network/ocrd_network/processing_server.py Outdated Show resolved Hide resolved
ocrd_network/ocrd_network/server_utils.py Outdated Show resolved Hide resolved
@MehmedGIT
Copy link
Contributor

MehmedGIT commented Sep 8, 2023

There is a need of request revocation method to remove stuck requests from the internal queue of a workspace. This situation happens when a processing job fails, and there are other jobs depending on the failed job. A simple solution would be to cancel all jobs that depend on the failed job. Then further cancel all jobs that are depending on the cancelled jobs.

EDIT: This is implemented.

@MehmedGIT MehmedGIT changed the title First version of the workflow endpoint Workflow endpoint of the Processing Server Sep 12, 2023
@kba kba mentioned this pull request Sep 28, 2023
@MehmedGIT
Copy link
Contributor

MehmedGIT commented Sep 28, 2023

Things to be considered yet (maybe not in #1083 directly since WebAPI docs adaptations would be required too):

  1. Ease the user for checking job statuses, i.e., currently the processing server supports GET /processor/{processor_name}/{job_id} for checking the status of the processing job. However, this is inconvenient for the general user. The user should be able to check the processing step job status without knowing the name of the processor. The job_id is a globally unique id.
    Suggestion: GET /processor/{processor_name}/{job_id} -> GET /processor/job_status/{job_id}
  2. Since there is no way to check the workflow job status, a separate endpoint is needed. In a similar manner to checking the processing job status. There is already a DB and Response model for the Workflow.
    Suggestion: provide an endpoint GET /workflow/job_status/{job_id}
  3. Find an optimal way to update the job status of a workflow based on the job statuses of the building processing steps.

@MehmedGIT
Copy link
Contributor

MehmedGIT commented Sep 28, 2023

For the page-wise processing, I was also considering a way to reduce the amount of total requests. With a workflow that utilizes 9 processors and a workspace of 100 pages that makes 900 requests per workflow... We could also provide something like page_step. Instead of creating a one-page request, the page_step amount of pages per request will be created. This adds more complexity for testing but is also more flexible and convenient for the user. For now, I would just keep it as an idea - till we have everything tested and working.

EDIT: Feature postponed since it will bring more complexity and further increase the required timeout value for the METS Server.

Adding it as an extra note so it does not disappear somewhere above.

@kba kba merged commit e6b7ae0 into master Oct 11, 2023
1 check passed
@kba kba deleted the workflow-endpoint branch November 17, 2023 13:11
@bertsky
Copy link
Collaborator

bertsky commented Jul 31, 2024

Things to be considered yet (maybe not in #1083 directly since WebAPI docs adaptations would be required too):

  1. Ease the user for checking job statuses, i.e., currently the processing server supports GET /processor/{processor_name}/{job_id} for checking the status of the processing job. However, this is inconvenient for the general user. The user should be able to check the processing step job status without knowing the name of the processor. The job_id is a globally unique id.
    Suggestion: GET /processor/{processor_name}/{job_id} -> GET /processor/job_status/{job_id}

Good idea.

Note: the current spec still says we should use GET /processor/job/{job_id} for this. So either core or spec needs to be updated anyway.

  1. Since there is no way to check the workflow job status, a separate endpoint is needed. In a similar manner to checking the processing job status. There is already a DB and Response model for the Workflow.
    Suggestion: provide an endpoint GET /workflow/job_status/{job_id}

Also 👍

But @joschrew already implemented e6ce39e as GET /workflow/{job_id} now.

IMO this is ambiguous: as of the spec, this looks like querying the workflow definition.

The current spec still says we should use GET /workflow/job/{job_id} for this.

Both processing step and workflow status endpoints are also explained in the human-readable readme in these terms.

  1. Find an optimal way to update the job status of a workflow based on the job statuses of the building processing steps.

IIUC @joschrew already found a way of doing that in the aforementioned commit. That solves it, right?

@bertsky
Copy link
Collaborator

bertsky commented Jul 31, 2024

We still need a network client CLI implementation for the new workflow endpoint. Should we track this in a new issue?

@MehmedGIT
Copy link
Contributor

Anything you see in the spec should be up-to-date and matching with core v2.67.2. If that is not the case, let me know.

Regarding the network client CLI - that idea was left behind at some point and needs to be implemented. A separate issue makes sense.

@bertsky
Copy link
Collaborator

bertsky commented Jul 31, 2024

Anything you see in the spec should be up-to-date and matching with core v2.67.2. If that is not the case, let me know.

oops, sry, I did not look at master – everything is in order between spec and core, as you stated.

Regarding the network client CLI - that idea was left behind at some point and needs to be implemented. A separate issue makes sense.

#1264

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants