Proposal: argo cron suspend-all / argo cron resume-all Command #13595

Open
hmoulart opened this issue Sep 13, 2024 · 3 comments
Labels
  • area/cli: The `argo` CLI
  • area/cron-workflows
  • solution/workaround: There's a workaround, might not be great, but exists
  • type/feature: Feature request

Comments

hmoulart commented Sep 13, 2024

Proposal: argo cron suspend-all / argo cron resume-all Command

Background and Motivation

Currently, Argo Workflows lacks a native way to suspend multiple workflows simultaneously. Users managing large-scale deployments (500+ workflows) face significant time and resource constraints when trying to suspend all workflows, especially in critical situations requiring quick action.

The existing method of suspending workflows individually or through scripted solutions is inefficient and time-consuming, often taking up to 30 minutes for 500+ workflows. This limitation poses a significant operational challenge and may impact system performance and reliability in production environments.

argo cron list -o name | awk '{print "argo cron suspend "$1}' | bash
argo cron list -o name | awk '{print "argo cron resume "$1}' | bash

Note: A similar demand was raised on Apr 11, 2023 (#10876), but the proposal was incomplete and did not gain traction. This proposal aims to provide a more comprehensive and actionable solution to address this long-standing need.

Proposed Feature

We propose adding a new command to the Argo CLI: argo cron suspend-all. This command would provide a native, efficient way to suspend multiple workflows at once, significantly improving the management of large-scale Argo Workflows deployments.

Command Syntax

argo cron suspend-all [-n namespace] [--selector=<label-selector>] [--dry-run] [--force]

Parameters

  • -n, --namespace: Specify the namespace (optional, defaults to the current namespace)
  • --selector: Label selector to filter workflows (optional)
  • --dry-run: Print the workflows that would be suspended without actually suspending them
  • --force: Skip confirmation prompt

Behavior

  1. The command will identify all running workflows in the specified namespace (or in the current namespace if -n is not given, matching the default listed under Parameters).
  2. If a selector is provided, it will filter the workflows based on the given labels.
  3. It will display a summary of workflows to be suspended and prompt for confirmation (unless --force is used).
  4. Upon confirmation, it will suspend all identified workflows in parallel, with appropriate rate limiting to prevent API overload.
  5. After completion, it will display a summary of the results (e.g., "Successfully suspended 495/500 workflows").
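
Until such a command exists, steps 1, 4, and 5 can be approximated with the subcommands already used in this thread. The following is a minimal sketch, not a proposed implementation: `cron_suspend_all`, `ARGO_BIN`, and `PARALLEL` are hypothetical names, the confirmation prompt is left out, and it assumes that `argo cron list -o name` prints one name per line and nothing else.

```shell
# Sketch only: emulates the proposed behavior with existing subcommands.
# ARGO_BIN, PARALLEL and cron_suspend_all are hypothetical names, not part of Argo.
ARGO_BIN="${ARGO_BIN:-argo}"   # path to the argo CLI
PARALLEL="${PARALLEL:-4}"      # maximum number of concurrent suspend calls

# Usage: cron_suspend_all <namespace> [--dry-run]
cron_suspend_all() {
  local ns dry_run names total failed
  ns="$1"
  dry_run="${2:-}"

  # Step 1: list the cron workflows in the namespace, one name per line.
  names="$("$ARGO_BIN" cron list -n "$ns" -o name)" || return 1
  if [ -z "$names" ]; then
    echo "No cron workflows found in namespace '$ns'."
    return 0
  fi
  total="$(printf '%s\n' "$names" | wc -l | tr -d ' ')"

  # --dry-run: print what would be suspended, then stop.
  if [ "$dry_run" = "--dry-run" ]; then
    printf '%s\n' "$names"
    echo "Would suspend $total cron workflows in namespace '$ns'."
    return 0
  fi

  # Step 4: suspend in parallel. -P only caps concurrency, which is a crude
  # stand-in for real rate limiting. Names that fail are echoed and counted.
  failed="$(printf '%s\n' "$names" |
    xargs -P "$PARALLEL" -I {} sh -c \
      '"$0" cron suspend -n "$1" "$2" >/dev/null 2>&1 || echo "$2"' \
      "$ARGO_BIN" "$ns" {} | wc -l | tr -d ' ')"

  # Step 5: summary in the format suggested above.
  echo "Successfully suspended $((total - failed))/$total cron workflows."
  [ "$failed" -eq 0 ]
}
```

For example, `cron_suspend_all production --dry-run` only lists the targets, while `cron_suspend_all production` suspends them and returns non-zero if any call failed.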

Example Usage

$ argo cron suspend-all -n production --selector=env=staging

Found 527 running workflows in namespace 'production' matching selector 'env=staging'.

Operation completed. Results:
- Successfully suspended: 525 workflows
- Failed to suspend: 2 workflows
- Errors:
  - workflow-abc: Permission denied
  - workflow-xyz: Workflow already completed

Time taken: 45 seconds

Implementation Considerations

  1. Performance: Implement parallel processing with appropriate rate limiting to handle large numbers of workflows efficiently.
  2. Error Handling: Provide clear error messages and continue processing other workflows if suspension fails for some.
  3. Reversibility: Consider adding a complementary resume-all command for easy reversal of this operation.
  4. Logging: Ensure detailed logs are generated for auditing purposes.
  5. Testing: Include comprehensive unit and integration tests, especially for large-scale scenarios.
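
Points 2 and 3 can be illustrated together by parameterizing the action, so that resuming is the same code path as suspending and a single failure does not stop the run. This is a rough sketch under stated assumptions: `cron_bulk` and `ARGO_BIN` are hypothetical names, the loop is serial for brevity, and it assumes your version of `argo cron list` accepts `-l`/`--selector` (check `argo cron list --help`).

```shell
# Sketch only: one helper for both directions, so "resume-all" mirrors "suspend-all".
# cron_bulk and ARGO_BIN are hypothetical names; -l on `argo cron list` is assumed.
ARGO_BIN="${ARGO_BIN:-argo}"

# Usage: cron_bulk <suspend|resume> <namespace> [label-selector]
cron_bulk() {
  local action ns selector names
  action="$1"
  ns="$2"
  selector="${3:-}"

  case "$action" in
    suspend|resume) ;;
    *) echo "usage: cron_bulk <suspend|resume> <namespace> [selector]" >&2
       return 2 ;;
  esac

  # Build the list command; add -l only when a selector was given.
  set -- cron list -n "$ns" -o name
  if [ -n "$selector" ]; then
    set -- "$@" -l "$selector"
  fi
  names="$("$ARGO_BIN" "$@")" || return 1

  printf '%s\n' "$names" | while IFS= read -r name; do
    [ -n "$name" ] || continue
    # Error handling: report the failure and keep processing the rest.
    "$ARGO_BIN" cron "$action" -n "$ns" "$name" >/dev/null ||
      echo "failed to $action $name" >&2
  done
}
```

With this shape, `cron_bulk suspend production env=staging` followed later by `cron_bulk resume production env=staging` reverses the operation. Note that it would also resume cron workflows that were already suspended before the first call.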

Benefits

  1. Significantly reduced time for bulk workflow management operations.
  2. Improved disaster recovery and maintenance procedures.
  3. Enhanced user experience for large-scale Argo Workflows deployments.
  4. Standardized approach to mass workflow suspension, reducing the need for custom scripts.

Next Steps

  1. Gather community feedback on this proposal.
  2. Refine the design based on maintainer and user input.
  3. Create a detailed technical design document if required.
  4. Implement the feature with thorough testing and documentation.

We believe this feature would greatly enhance Argo Workflows' capabilities in managing large-scale deployments and improve overall user experience.

@hmoulart hmoulart added the type/feature Feature request label Sep 13, 2024
@agilgur5 agilgur5 added the area/cli The `argo` CLI label Sep 13, 2024

agilgur5 commented Sep 13, 2024

> We propose adding a new command to the Argo CLI: argo suspend-all

There is already a keyword @latest in the CLI, so rather than a new command, it could be a new keyword, @all.

> through scripted solutions is inefficient and time-consuming, often taking up to 30 minutes for 500+ workflows.
>
> argo cron list -o name | awk '{print "argo cron suspend "$1}' | bash

If I'm not mistaken, this is being run in serial, hence the slow throughput. Parallelizing this should be substantially faster.

> Note: A similar demand was raised on Apr 11, 2023 (#10876)

Also, that issue and your command above are for argo cron suspend, not argo suspend. Please differentiate the two.

> Behavior

Having a confirmation, progress bar, etc is quite complicated and involved; I would strongly suggest having an MVP for a first pass rather than all these requirements.

Retrieving all Workflows and then suspending one by one for progress reporting is also multiple API requests, whereas nearly all the CLI commands correspond 1-to-1 to an API method. That ensures consistency and simplicity

@agilgur5 agilgur5 changed the title from "Feature Proposal: argo suspend-all / argo resume-all Command" to "Proposal: argo suspend-all / argo resume-all Command" Sep 13, 2024
@hmoulart hmoulart changed the title from "Proposal: argo suspend-all / argo resume-all Command" to "Proposal: argo cron suspend-all / argo cron resume-all Command" Sep 14, 2024
@agilgur5 agilgur5 added the solution/workaround There's a workaround, might not be great, but exists label Sep 14, 2024

hmoulart commented Sep 15, 2024

Hello,
Thank you for your answer and suggestions!
I've updated the description to simplify the behavior. I'm going to try parallelizing the command:
argo cron list -o name | xargs -P 4 -I {} argo cron suspend {}

This command uses xargs with the -P flag to run up to 4 parallel processes, which should improve performance when suspending multiple cron workflows. The -I {} flag allows us to use {} as a placeholder for each workflow name.
Let me know if you need any further clarification or have any questions about this approach.
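
One gap in the one-liner is that failures are easy to miss. Because xargs exits non-zero when any invocation fails, the pipeline can be wrapped to surface that. A minimal sketch, where `suspend_parallel` and `ARGO_BIN` are hypothetical names and `-P` is assumed to be available (it is in GNU and BSD xargs, but is not required by POSIX):

```shell
# Sketch only: the same pipeline, wrapped so that failures are not silent.
# suspend_parallel and ARGO_BIN are hypothetical names, not part of Argo.
ARGO_BIN="${ARGO_BIN:-argo}"

# Usage: suspend_parallel [max-parallel]
suspend_parallel() {
  local max_procs
  max_procs="${1:-4}"

  # xargs returns a non-zero status if any invocation fails, so the status
  # of the pipeline tells us whether every suspend call succeeded.
  if "$ARGO_BIN" cron list -o name |
    xargs -P "$max_procs" -I {} "$ARGO_BIN" cron suspend {}; then
    echo "all cron workflows suspended"
  else
    echo "at least one suspend failed; inspect the output above" >&2
    return 1
  fi
}
```

Calling `suspend_parallel 8` raises the concurrency to eight processes; whether that is faster in practice depends on how the API server handles the extra load.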

@agilgur5

> I've updated the description to simplify the behavior

It looks like you only removed the progress bar element?
Rather than removing features, I would suggest adding a "Stretch Goals" subsection to the "Behavior" section and moving certain features, like the progress bar, there.
