Add the dataMoverPrepareTimeout and resourceTimeout to the Node-Agent #1657

Open
wants to merge 1 commit into
base: master

mpryc (Contributor) commented Mar 6, 2025

Add the dataMoverPrepareTimeout and resourceTimeout to the DPA's Node-Agent

Fixes #1368

Adds the following fields to the Node-Agent within the DPA:

  • dataMoverPrepareTimeout: how long to wait for a DataUpload/DataDownload to be prepared. Default is 30 minutes.
  • resourceTimeout: how long to wait for resource processes that are not covered by other, more specific timeout parameters. Default is 10 minutes.

Why the changes were made

To fix #1368

How to test the changes made

  1. Run make tests
  2. Manually:
  • Create a DPA with the following options:
  configuration:
    nodeAgent:
      dataMoverPrepareTimeout: 20m
      enable: true
      resourceTimeout: 10m
      uploaderType: kopia
  • In the node-agent pod shell, confirm the node-agent server is running with proper flags set:
$ tr '\0' ' ' < /proc/1/cmdline
/velero node-agent server --data-mover-prepare-timeout=20m0s --resource-timeout=10m0s
  • (update DPA scenario) Modify the DPA with an updated dataMoverPrepareTimeout:
  configuration:
    nodeAgent:
      dataMoverPrepareTimeout: 25m30s
      enable: true
      resourceTimeout: 10m
      uploaderType: kopia
  • In the node-agent pod shell, confirm the node-agent server is running with proper flags set:
$ tr '\0' ' ' < /proc/1/cmdline
/velero node-agent server --data-mover-prepare-timeout=25m30s --resource-timeout=10m0s
  • (update DPA scenario - removed timeout) Modify the DPA so that dataMoverPrepareTimeout is removed and resourceTimeout is changed to 15m:
  configuration:
    nodeAgent:
      enable: true
      resourceTimeout: 15m
      uploaderType: kopia
  • In the node-agent pod shell, confirm the node-agent server is running with proper flags:
$ tr '\0' ' ' < /proc/1/cmdline
/velero node-agent server --resource-timeout=15m0s

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 6, 2025
@mpryc mpryc force-pushed the issue-1368 branch 2 times, most recently from 8d45879 to 7c6ccf7 Compare March 6, 2025 12:29
…-Agent

Fixes openshift#1368

Adds the following fields to the Node-Agent within the DPA:
 - dataMoverPrepareTimeout
   How long to wait for preparing a DataUpload/DataDownload. Default is 30 minutes.
 - resourceTimeout
   How long to wait for resource processes which are not covered by other specific
   timeout parameters. Default is 10 minutes.

Signed-off-by: Michal Pryc <[email protected]>
mpryc (Contributor, Author) commented Mar 6, 2025

/retest

@mpryc mpryc requested a review from mateusoliveira43 March 6, 2025 17:49
mpryc (Contributor, Author) commented Mar 6, 2025

/test 4.19-e2e-test-aws

mateusoliveira43 (Contributor) commented

/test unit-test

@@ -403,6 +403,19 @@ func (r *DataProtectionApplicationReconciler) customizeNodeAgentDaemonset(ds *ap
nodeAgentContainer.ImagePullPolicy = imagePullPolicy
setContainerDefaults(nodeAgentContainer)

// append data mover prepare timeout and resource timeout to nodeAgent container args
if !useResticConf {

I think this PR needs a rebase, since I removed this variable in 9ce4670#diff-080fe4f6649a4b1692368f7a41402aa8c6217c4ad5e5f8afe572fdd87974f939

mateusoliveira43 (Contributor) left a comment

I think a rebase is needed.

openshift-ci bot commented Mar 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mpryc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot commented Mar 7, 2025

@mpryc: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/unit-test | e9e506f | link | true | /test unit-test |


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
Development

Successfully merging this pull request may close these issues.

OADP-3736 Need to set data-mover-prepare-timeout parameter in DPA
2 participants