Data mover not working with velero 1.15.0 and Azure Workload Identity #8433

Closed
fredgate opened this issue Nov 19, 2024 · 4 comments · Fixed by #8487

fredgate commented Nov 19, 2024

Follow-up to issue vmware-tanzu/helm-charts#627

What steps did you take and what happened:

On AKS, we back up persistent volumes backed by Azure Disk via CSI snapshots and data movement.
Authentication against the object storage (Azure Blob) used to upload backup metadata and CSI snapshot data is performed via Azure Workload Identity.

Starting with Velero 1.15.0 (helm chart 8.0.0), the data upload actions were moved out of the node agent into microservice pods, each dedicated to one DataUpload.
These pods, however, do not inherit the labels set on the velero and node-agent pods via the helm value podLabels. They do, on the other hand, correctly use the velero-server service account.

Azure Workload Identity requires the label azure.workload.identity/use: "true" to be set on the pod so that the pod can source the client ID from the service account.
Because this label is missing, authentication against Azure Blob fails and the data upload cannot be completed.

Here is the description of such a pod:

Name:             test-cf5q7
Namespace:        velero
Priority:         0
Service Account:  velero-server
Node:             node1
Start Time:       Tue, 19 Nov 2024 16:42:54 +0100
Labels:           velero.io/data-upload=test-cf5q7
                  velero.io/exposer-pod-group=snapshot-exposer
Annotations:      <none>
Status:           Failed
Controlled By:    DataUpload/test-cf5q7
Containers:
  dce57391-c34d-4b84-9ec6-9b04f1dd4d78:
    Image:         registry.contoso.com/velero/velero:v1.15.0
    Command:
      /velero
      data-mover
      backup

What did you expect to happen:

The CSI snapshot is restored into a temporary PVC and uploaded to Azure Blob.

Anything else you would like to add:

The microservice pod for data movement is created by the csiSnapshotExposer.Expose method, which sets the pod labels from csiExposeParam.HostingPodLabels.
These labels are a map initialized with the data-upload label and then completed with the exposer-pod-group label.
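
The label construction described above can be sketched roughly as follows. This is not Velero's actual code: the helper name buildHostingPodLabels and its parameter are hypothetical, and only the two label keys and the "snapshot-exposer" value are taken from the pod description in this issue.

```go
package main

import "fmt"

// Label keys observed on the data mover pod in this issue.
const (
	dataUploadLabel      = "velero.io/data-upload"
	exposerPodGroupLabel = "velero.io/exposer-pod-group"
)

// buildHostingPodLabels is a hypothetical helper that mimics the behaviour
// described in the issue: the map starts with the data-upload label and is
// then completed with the exposer-pod-group label. Nothing else is copied.
func buildHostingPodLabels(dataUploadName string) map[string]string {
	labels := map[string]string{dataUploadLabel: dataUploadName}
	labels[exposerPodGroupLabel] = "snapshot-exposer"
	return labels
}

func main() {
	labels := buildHostingPodLabels("test-cf5q7")
	// The Workload Identity label is absent, which matches the failure reported here.
	_, hasIdentityLabel := labels["azure.workload.identity/use"]
	fmt.Println(len(labels), hasIdentityLabel)
}
```

With only these two labels on the pod, the Azure Workload Identity label is never present, whatever is configured through podLabels.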

Environment:

  • Velero version : 1.15.0
  • Velero features (use velero client config get features): CSI, data mover
  • Kubernetes version : 1.29
  • Kubernetes installer & version: AKS
  • Cloud provider or hardware configuration: Azure

@reasonerjt
Contributor

I recall it was a decision that the microservice DM pod should not copy all the labels and annotations of the velero server and node-agent. We may add this label to the "white-list" and consider making the label configurable in the future, if needed.

@fredgate
Author

The labels of the velero server and node-agent pods are, respectively:

labels:
  app.kubernetes.io/instance=velero
  app.kubernetes.io/managed-by=Helm
  app.kubernetes.io/name=velero
  app.kubernetes.io/version=1.15.0
  azure.workload.identity/use=true
  helm.sh/chart=velero-8.0.0
  name=velero
  pod-template-hash=54d8684d9

labels:
  app.kubernetes.io/instance=velero
  app.kubernetes.io/managed-by=Helm
  app.kubernetes.io/name=velero
  azure.workload.identity/use=true
  controller-revision-hash=556d89b4c6
  helm.sh/chart=velero-8.0.0
  name=node-agent
  pod-template-generation=5

Instead of having a green-list of labels to copy to the microservice pods, it could be better to use a red-list of known labels, so that any custom labels someone uses are carried over without velero needing to know about them. These labels could be in the red-list and excluded from the copy:

  app.kubernetes.io/instance
  app.kubernetes.io/managed-by
  app.kubernetes.io/name
  app.kubernetes.io/version
  helm.sh/chart
  name
  controller-revision-hash
  pod-template-hash
  pod-template-generation
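
As an illustration of this red-list proposal, here is a minimal Go sketch. The function filterByDenylist is hypothetical and does not exist in Velero; the label values are the node-agent labels listed earlier in this comment.

```go
package main

import "fmt"

// filterByDenylist returns a copy of src without any key listed in deny.
// Hypothetical helper for illustration only.
func filterByDenylist(src map[string]string, deny []string) map[string]string {
	denied := make(map[string]struct{}, len(deny))
	for _, k := range deny {
		denied[k] = struct{}{}
	}
	out := make(map[string]string)
	for k, v := range src {
		if _, skip := denied[k]; !skip {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Node-agent pod labels as reported in this issue.
	nodeAgentLabels := map[string]string{
		"app.kubernetes.io/instance":   "velero",
		"app.kubernetes.io/managed-by": "Helm",
		"app.kubernetes.io/name":       "velero",
		"azure.workload.identity/use":  "true",
		"controller-revision-hash":     "556d89b4c6",
		"helm.sh/chart":                "velero-8.0.0",
		"name":                         "node-agent",
		"pod-template-generation":      "5",
	}
	// The proposed red-list of well-known labels to exclude from the copy.
	deny := []string{
		"app.kubernetes.io/instance", "app.kubernetes.io/managed-by",
		"app.kubernetes.io/name", "app.kubernetes.io/version",
		"helm.sh/chart", "name", "controller-revision-hash",
		"pod-template-hash", "pod-template-generation",
	}
	fmt.Println(filterByDenylist(nodeAgentLabels, deny))
}
```

With the labels above, only azure.workload.identity/use survives the filter, and any other custom label would survive as well without being listed anywhere.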

@Lyndon-Li
Contributor

As discussed, we will continue to use a white list, which is more controllable. PR #8487 has been submitted.
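
For comparison with the red-list idea, a white-list copy could look roughly like the Go sketch below. This is an assumption-laden illustration, not the content of PR #8487: the function copyAllowedLabels and the choice of a single allowed key are hypothetical.

```go
package main

import "fmt"

// copyAllowedLabels copies into dst only the keys of src that appear in allow.
// Hypothetical helper for illustration only.
func copyAllowedLabels(dst, src map[string]string, allow []string) {
	for _, k := range allow {
		if v, ok := src[k]; ok {
			dst[k] = v
		}
	}
}

func main() {
	// A subset of the node-agent pod labels reported in this issue.
	nodeAgentLabels := map[string]string{
		"app.kubernetes.io/name":      "velero",
		"azure.workload.identity/use": "true",
		"name":                        "node-agent",
	}
	// Labels already set on the data mover pod, as seen in the pod description.
	podLabels := map[string]string{
		"velero.io/data-upload":       "test-cf5q7",
		"velero.io/exposer-pod-group": "snapshot-exposer",
	}
	// Assumed white list containing only the Workload Identity label.
	copyAllowedLabels(podLabels, nodeAgentLabels, []string{"azure.workload.identity/use"})
	fmt.Println(len(podLabels), podLabels["azure.workload.identity/use"])
}
```

The trade-off is the one discussed in this thread: the white list keeps full control over what reaches the data mover pod, while custom labels are not propagated unless they are added to the list.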

@fredgate
Author

fredgate commented Dec 17, 2024

Great news :-)
When is version 1.15.1 scheduled for release?
