Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additions to the workflow-endpoint #1108

Merged
merged 6 commits into from
Oct 11, 2023
Merged

Conversation

joschrew
Copy link
Contributor

@joschrew joschrew commented Oct 3, 2023

This contains a new endpoint to query the status of a workflow run and improved error handling for the workflow-enpoint. This were some of the parts we realized in our last meeting we need improvements in.

@joschrew joschrew requested review from MehmedGIT and kba October 3, 2023 08:18
Copy link
Contributor

@MehmedGIT MehmedGIT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works just great on my end! Could and should be merged to master together with #1083 and #1105 in the next core release.

@MehmedGIT
Copy link
Contributor

MehmedGIT commented Oct 5, 2023

Okay, I thought of a few other small quality-of-life improvements. Since there is an endpoint for checking the workflow job status, potentially we could further improve:

  1. The user is no longer interested in getting a big list with all processing job IDs when a workflow is triggered.
    Suggestion: Simplify the response to contain just the workflow job ID (and all other fields except processing_job_ids).

  2. When a few pages fail in the workflow, currently there is no way for the user to know exactly which processing tasks, i.e. pages, have failed.
    Suggestion: When returning the dictionary with states, in case some page ids have a state "FAILED", then a more detailed list inside the response dictionary should be filled with all the DBProcessorJob parameters for the failed tasks. That would help the user to identify directly the failed pages and check the correct log files.

@MehmedGIT MehmedGIT self-requested a review October 5, 2023 07:15
@MehmedGIT MehmedGIT self-requested a review October 5, 2023 12:50
if failed_tasks_key not in res:
res[failed_tasks_key] = failed_tasks
failed_tasks.setdefault(job.processor_name, [])
failed_tasks[job.processor_name].append(job.job_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should append here more than just the job.job_id. In my suggestion, it was the entire job model but that may still fill the output with unnecessary details (bad). My idea there was that the user could already get all the information needed for the failed jobs (mainly the page_id and potentially the log file path that will come in the future) instead of checking the details of each job with separate requests to the processing server again.

Although I am not sure what is the best here. It all comes down to what will be easier for the user when automatically parsing the response. Maybe @kba could provide more hints regarding that. We could also keep it as it is now. This is fine for the current state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I didn't realize that you wanted it that way. I think then it would be better if we e.g. I at least add the page_id for now so that it is easier to adapt later and that it is more obvious that more fields would potentially be available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now added the page_id. I agree that the logfile-path would be helpful. We should add that sometime when it is possible.

Base automatically changed from workflow-endpoint-logging-2023-09-26 to workflow-endpoint October 11, 2023 11:25
Copy link
Member

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We should continue the discussion about what best to return to users in a separate issue after release.

@kba kba merged commit 155ea2a into workflow-endpoint Oct 11, 2023
1 check passed
@kba kba deleted the workflow-additions branch October 11, 2023 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants