Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate Continuation after Task Failure and Success Determination #12530

Open
ryancurrah opened this issue Jan 16, 2024 · 9 comments
Open

Separate Continuation after Task Failure and Success Determination #12530

ryancurrah opened this issue Jan 16, 2024 · 9 comments
Labels
area/controller Controller issues, panics area/spec Changes to the workflow specification. area/templates/dag solution/workaround There's a workaround, might not be great, but exists type/feature Feature request

Comments

@ryancurrah
Copy link
Contributor

ryancurrah commented Jan 16, 2024

Description

I've encountered a situation in Argo Workflows where, when using a Directed Acyclic Graph (DAG) with FailFast set to true, there's a need to allow a workflow to continue even after a task fails, but still have the workflow recognize the failure at the end. Currently, by using depends: "B.Succeeded || B.Failed" in a task, we can ensure that a subsequent task (like task C) will run even if task B fails. However, this approach also leads to the workflow being marked as successful, despite the failure of task B (assuming no other tasks fail).

This behavior is not explicitly documented in the enhanced depends logic documentation, and we need to distinguish between continuing a workflow after a task failure and the final success/failure determination of the workflow itself.

Related Slack conversation.

Suggested Enhancement

I propose an enhancement where there's a clear separation between allowing a workflow to continue after a task failure and determining the overall success or failure of the workflow. This could potentially be an additional attribute or a modification in the depends logic.

Use Case

This enhancement would be beneficial in scenarios where certain tasks are critical but not blockers for the continuation of the workflow. Teams would have the flexibility to allow the workflow to proceed with subsequent tasks, while still being able to flag the workflow as failed if any critical task doesn't succeed.

Expected Behavior

With this enhancement, users should be able to:

  • Allow subsequent tasks to run after a task failure without automatically marking the entire workflow as successful.
  • Have the workflow's final status reflect the failure of critical tasks, even if other tasks were able to run successfully afterwards.

Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritize the proposals with the most 👍.

@ryancurrah ryancurrah added the type/feature Feature request label Jan 16, 2024
@Joibel Joibel added the area/controller Controller issues, panics label Jan 16, 2024
@agilgur5 agilgur5 added the area/spec Changes to the workflow specification. label Jan 20, 2024
@agilgur5
Copy link

agilgur5 commented Mar 9, 2024

This could also apply to hooks (and exit handlers), which have a bit of a similar confusion among them.

@agilgur5
Copy link

agilgur5 commented Mar 9, 2024

Perhaps this could be something like a ignoreForFinalStatus field (or a simpler name). Defaults to false for backward-compat, but if you set it to true, the node's exit code / status has no impact on the overall Workflow status.

@ryancurrah
Copy link
Contributor Author

ryancurrah commented Mar 16, 2024

That seems like an acceptable solution if we change the depends logic to no longer affect the outcome of workflow results.

  1. Adjusting depends Logic: The depends logic would continue to dictate the execution order of tasks based on their success or failure but would not directly influence the final workflow status. This change ensures that workflow execution is still flexible, allowing subsequent tasks to run even if previous tasks fail, based on the defined dependencies.
  2. Introducing ignoreFailure Field: A new field (e.g., ignoreFailure) would be added to each task definition. This field allows workflow designers to explicitly mark certain tasks as non-critical for the overall workflow success. If ignoreFailure is set to true for a task, its failure will not prevent the workflow from being marked as successful, provided all other tasks deemed critical are successful.
tasks:
  - name: optional-setup
    template: setup-template
    ignoreFailure: true  # This task's failure won't affect the workflow's final status

  - name: critical-analysis
    template: analysis-template
    ignoreFailure: false  # This task's failure will cause the workflow to fail

  - name: cleanup
    template: cleanup-template
    depends: "optional-setup.Succeeded || optional-setup.Failed"
    ignoreFailure: true  # Cleanup runs regardless of previous task success but also doesn't affect final status

I think this would be a breaking change though if people expected depends to affect the final results of a workflow.

@ryancurrah
Copy link
Contributor Author

ryancurrah commented Mar 16, 2024

Also for others here is a workaround:

Provide the task step that has the depends logic depends: "B.Succeeded || B.Failed" with parameters of the previous task step B status and if it failed return a non-zero exit code.

- name: reportquality
  arguments:
    parameters:
      - name: b_status
        value: "{{tasks.B.status}}"
[ "{{inputs.parameters.b_status}}" = "Failed" ] && { echo "B task step failed. exiting with non-zero status."; exit 1; } || true

@agilgur5 agilgur5 added the solution/workaround There's a workaround, might not be great, but exists label Mar 16, 2024
@agilgur5 agilgur5 changed the title Separate Control for Continuing Workflow After a Task Failure and Final Workflow Success Determination Separate Continuation after Task Failure and Success Determination Mar 17, 2024
@agilgur5
Copy link

Wait a minute, isn't this already possible with the continueOn field?

@ryancurrah
Copy link
Contributor Author

Wait a minute, isn't this already possible with the continueOn field?

My understanding is that it doesn't work with depends and I tried continueOn without depends and using dependencies and it still did not work.

@agilgur5
Copy link

agilgur5 commented Mar 19, 2024

However, this approach also leads to the workflow being marked as successful, despite the failure of task B (assuming no other tasks fail).

Oh right, you're trying to do the opposite of continueOn.

I think this would be a breaking change though if people expected depends to affect the final results of a workflow.

That makes more sense why it would need a breaking change. This is unlikely to happen though, and I'm not sure that that behavior is more intuitive; arguably it is less intuitive.

A non-breaking way to do this would be to introduce a new top-level Workflow flag, similar to failFast, that affects the processing of the whole DAG. But I don't think we've ironed out how to do this intuitively yet.

My understanding is that it doesn't work with depends and I tried continueOn without depends and using dependencies and it still did not work.

You might be able to do this by adding steps inside of your dag.

But I think this still wouldn't achieve your goal, since you need the opposite of continueOn. Something like "continueOn but still fail". Or a custom ignoreStatus field that lets you choose which statuses to ignore (then you could ignore Succeeded as well). This is getting pretty complicated 😅

@ryancurrah
Copy link
Contributor Author

Yeah it seems like a simple ask at first. This is coming from a need to upload reports with the results of previous tasks (If they failed or not) and fail the entire workflow if the reports contain issues.

@jjkavalam
Copy link

This is coming from a need to upload reports with the results of previous tasks (If they failed or not) and fail the entire workflow if the reports contain issues.

Exactly my use case. (In Jenkins, such components are called "publishers"; here is a list of related use cases)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Controller issues, panics area/spec Changes to the workflow specification. area/templates/dag solution/workaround There's a workaround, might not be great, but exists type/feature Feature request
Projects
None yet
Development

No branches or pull requests

4 participants