[Azure] Add AAD Workload Identity support #5116

Closed
pinlast opened this issue Jul 8, 2022 · 24 comments

@pinlast

pinlast commented Jul 8, 2022

Describe the problem/challenge you have
We are using AAD Workload Identity to grant permissions in our Azure Kubernetes cluster, and it seems Velero has no support for it: credentials can only be read from a file or from environment variables.
Since Azure is going to deprecate pod identity in favor of workload identity, adding this support would make sense.

Describe the solution you'd like
Add AAD Workload Identity support, getting token from azure-identity-token secret.

Environment:

  • helm version: version.BuildInfo{Version:"v3.9.0", GitCommit:"7ceeda6c585217a19a1131663d8cd1f7d641b2a7", GitTreeState:"clean", GoVersion:"go1.18.2"}

  • helm chart version and app version: chart: velero-2.30.1 app: 1.9.0

  • Kubernetes version : Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"5a97ee6d15525f6e4a1c2646bf1dfd2ebd5220b5", GitTreeState:"clean", BuildDate:"2022-06-15T04:26:33Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes installer & version: aks

  • Cloud provider or hardware configuration: azure

@jenting jenting transferred this issue from vmware-tanzu/helm-charts Jul 13, 2022
@stale

stale bot commented Sep 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label Sep 16, 2022
@hunter86bg

Do we have something working that I can test?
I'm planning to set up Velero for our AKS clusters, and it would be nice to deploy it directly with Azure Workload Identity.

@stale stale bot removed the staled label Sep 20, 2022
@flbla

flbla commented Oct 31, 2022

I tried to use Velero + AAD WI with the proxy mode of AAD WI (https://azure.github.io/azure-workload-identity/docs/topics/service-account-labels-and-annotations.html#annotations-1) : azure.workload.identity/inject-proxy-sidecar: true

It's working, but my backups are shown as "failed". The only error I found is:

velero-669595d7f8-6clb2 velero time="2022-10-31T08:39:11Z" level=error msg="backup failed" controller=backup error="[rpc error: code = Unavailable desc = error reading from server: EOF, rpc error: code = Unavailable desc = connection error: desc = \"transport: error while dialing: dial unix /tmp/plugin1080651534: connect: connection refused\"]" key=kube-system/backup logSource="pkg/controller/backup_controller.go:301"

But the backup is done: I can see it in my storage account. I also tried with disk snapshots; that works too, but the backup is also shown as failed.
I was able to restore them without any issue.

@flbla

flbla commented Oct 31, 2022

After some investigation, it turned out that the memory request/limit of my pod was too low, which was causing this issue.

So if you want to use AAD Workload Identity with Velero, you can.
This is what I added to the Helm values:

serviceAccount:
  server:
    create: true
    name: velero-server
    labels:
      azure.workload.identity/use: "true"
    annotations: 
      azure.workload.identity/client-id: ${application_client_id}

podAnnotations:
  azure.workload.identity/inject-proxy-sidecar: "true"

Before that, you'll need to create the AAD identity: https://azure.github.io/azure-workload-identity/docs/quick-start.html

@hunter86bg

Thanks, I will give it a try.

@stevehipwell

Is there any progress on this? I don't want to have to use the proxy.

@ywk253100
Contributor

@pinlast @hunter86bg @stevehipwell As I understand it, nothing needs to be done on the Velero side to support Azure AD Workload Identity.
For the Velero Helm chart, as shown in @flbla's comment, you can already set the related configuration (service account/label/annotation); for installation from the CLI, you can run velero install --dry-run first and then edit the generated YAML files.

Correct me if I'm wrong.
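
For CLI installs, the manual edit described above might look roughly like the sketch below. It mirrors the label and annotations from @flbla's Helm values, applied to the manifests printed by velero install --dry-run -o yaml. The ServiceAccount name and namespace are assumed defaults and the client ID is a placeholder, so check them against your own generated output.

```yaml
# Sketch only: ServiceAccount after editing the dry-run output.
# "velero"/"velero" are assumed default name and namespace.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: velero
  namespace: velero
  labels:
    azure.workload.identity/use: "true"
  annotations:
    azure.workload.identity/client-id: "<application-client-id>"  # placeholder
---
# Fragment of the Velero Deployment: only the pod template metadata is shown.
# This annotation enables the proxy sidecar used in the workaround.
spec:
  template:
    metadata:
      annotations:
        azure.workload.identity/inject-proxy-sidecar: "true"
```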

@ywk253100 ywk253100 added the Needs info Waiting for information label Jan 9, 2023
@pearj

pearj commented Jan 9, 2023

@ywk253100 This ticket is about getting explicit support for AAD Workload Identity directly into Velero. @flbla's solution is a workaround that uses a sidecar (i.e. an additional container) that offers backwards compatibility with AAD Pod Identity.

It would still be better for Velero to properly support AAD Workload Identity without needing the sidecar.

@adamrushuk
Contributor

@ywk253100
Contributor

Thanks @pearj @adamrushuk. Let's put it into the 1.11 milestone.

There is a PR that updates the Azure libraries used by the Velero Azure plugin, which may help with this issue as well. Let's verify after the PR is merged.

@ywk253100 ywk253100 added this to the v1.11 milestone Jan 9, 2023
@ywk253100 ywk253100 removed Needs info Waiting for information 1.11-candidate labels Jan 9, 2023
@pearj

pearj commented Jan 9, 2023

@ywk253100 It looks like that PR updates azure-sdk-for-go/sdk/azidentity to v1.2.0; however, full support doesn't arrive until v1.3.0-beta.1 (which is the most recent release at the time of writing). It seems that v1.3.0-beta.1 brings automatic support for AAD Workload Identity as long as the correct environment variables are present (which they should be). That is probably the easiest way to implement this. That same issue also has example code if you need to stay on sdk/azidentity v1.2.0 for some reason.
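
For reference, the environment variables in question are the ones the workload identity mutating webhook injects into labelled pods. The fragment below is a rough sketch of the relevant part of a mutated pod spec, not output captured from a real cluster: the container name and the client/tenant ID values are placeholders, and the exact volume layout may differ between webhook versions.

```yaml
# Sketch of what the Azure AD Workload Identity webhook adds to a pod.
# Placeholder values are marked; treat the layout as illustrative.
spec:
  serviceAccountName: velero-server
  containers:
    - name: velero                            # assumed container name
      env:
        - name: AZURE_CLIENT_ID
          value: "<application-client-id>"    # placeholder
        - name: AZURE_TENANT_ID
          value: "<tenant-id>"                # placeholder
        - name: AZURE_FEDERATED_TOKEN_FILE
          value: /var/run/secrets/azure/tokens/azure-identity-token
        - name: AZURE_AUTHORITY_HOST
          value: https://login.microsoftonline.com/
      volumeMounts:
        - name: azure-identity-token
          mountPath: /var/run/secrets/azure/tokens
          readOnly: true
  volumes:
    # Projected service account token that is exchanged for an AAD token.
    - name: azure-identity-token
      projected:
        sources:
          - serviceAccountToken:
              path: azure-identity-token
              audience: api://AzureADTokenExchange
              expirationSeconds: 3600
```

If these variables are missing from the Velero container, the pod was most likely not mutated by the webhook, which is worth checking before looking at SDK versions.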

@ywk253100
Contributor

@pearj Got it. Will test it after the PR is merged

@jkroepke

jkroepke commented Feb 3, 2023

after some investigations, it was the memory request/limit too low of my pod which was causing this issue.

So if you want to use AAD Workload Identity with Velero, you can, this is what I added to the Helm values :

serviceAccount:
  server:
    create: true
    name: velero-server
    labels:
      azure.workload.identity/use: "true"
    annotations: 
      azure.workload.identity/client-id: ${application_client_id}

podAnnotations:
  azure.workload.identity/inject-proxy-sidecar: "true"

before, you'll need to create the AAD Identity : azure.github.io/azure-workload-identity/docs/quick-start.html

Does this work with https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure, too? What do I have to configure there?

@ywk253100
Contributor

As we are close to the FC date of v1.11.0 and the GA version of v1.3.0 of the Azure SDK sdk/azidentity hasn't been released yet, I'm going to move this issue out of the scope of v1.11.0. We'll fix it once v1.3.0 of the SDK is GA.

BTW, making changes only on the Velero Azure plugin side isn't enough: Restic/Kopia, used by Velero, doesn't support AAD workload identity either at the moment. An issue has been opened for Kopia.

@paytience

paytience commented May 10, 2023

after some investigations, it was the memory request/limit too low of my pod which was causing this issue.

So if you want to use AAD Workload Identity with Velero, you can, this is what I added to the Helm values :

serviceAccount:
  server:
    create: true
    name: velero-server
    labels:
      azure.workload.identity/use: "true"
    annotations: 
      azure.workload.identity/client-id: ${application_client_id}

podAnnotations:
  azure.workload.identity/inject-proxy-sidecar: "true"

before, you'll need to create the AAD Identity : https://azure.github.io/azure-workload-identity/docs/quick-start.html

Using this workaround does work for making backups and restoring them. However, I found that if I delete my HelmRelease, my pods get stuck in the Terminating state:

node-agent-zv8g6          2/2     Terminating   6 (24h ago)   24h
velero-54d6979d77-9dpxx   2/2     Terminating   6 (24h ago)   24h

We decided it is not worth it if pods can hang. We'll wait for v1.12.

@flbla

flbla commented May 10, 2023

@paytience It depends on the version of AAD workload identity you use.
I had the same issue: Azure/azure-workload-identity#774 (comment)

@paytience

@paytience It depends on the version of AAD workload identity you use I had same issue : Azure/azure-workload-identity#774 (comment)

I understand. Are there any fixes for this issue? Does a newer release of AAD workload identity fix the problem with the sidecar annotation?

@flbla

flbla commented Jun 20, 2023

@paytience: with the latest version of AAD workload identity (1.1.0), I don't have the issue anymore.

@admincasper

@flbla Unfortunately, we're using webhook controller 1.0.0 through the AKS add-on to run the AAD workload identity webhook.

@ywk253100
Contributor

The Azure Workload Identity is supported by the Velero Azure plugin, please refer to the doc https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure/blob/main/README.md#option-2-use-azure-ad-workload-identity.

But it is not supported by Kopia yet, so taking file system backup with the Azure Workload Identity is not working, please note that.

@reasonerjt reasonerjt added backlog 1.13-candidate issue/pr that should be considered to target v1.13 minor release labels Jul 19, 2023
@reasonerjt reasonerjt removed this from the v1.12 milestone Jul 19, 2023
@admincasper

The Azure Workload Identity is supported by the Velero Azure plugin, please refer to the doc https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure/blob/main/README.md#option-2-use-azure-ad-workload-identity.

But it is not supported by Kopia yet, so taking file system backup with the Azure Workload Identity is not working, please note that.

Actually, it is not available yet, since they have not made a release with it. In v1.7.1, workload identity is unfortunately still not supported.

@reasonerjt reasonerjt removed the 1.13-candidate issue/pr that should be considered to target v1.13 minor release label Aug 18, 2023
@reasonerjt reasonerjt added this to the v1.13 milestone Aug 18, 2023
@ywk253100
Contributor

Kopia repository part is fixed by #6686

@ccadruvi

For anyone else stumbling upon this who wants to use the Helm chart, the following are the important Helm values:

podLabels:
  azure.workload.identity/use: "true"
labels:
  azure.workload.identity/use: "true"
serviceAccount:
  server:
    create: true
    name: velero-server
    annotations:
      azure.workload.identity/client-id: ${velero_mgid_client_id}
      azure.workload.identity/tenant-id: ${tenant_id}
configuration:
  backupStorageLocation:
  - provider: azure
    bucket: velero
    config:
      resourceGroup: ${backup_resource_group}
      storageAccount: ${backup_storage_account}
      useAAD: "true"
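
For installs that don't use the chart, the backupStorageLocation values above correspond roughly to a BackupStorageLocation resource like the sketch below. The resource name and namespace are assumptions, and the angle-bracket values are placeholders for the same variables used in the Helm values.

```yaml
# Sketch only: BackupStorageLocation equivalent of the Helm values above.
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default          # assumed name
  namespace: velero      # assumed namespace
spec:
  provider: azure
  objectStorage:
    bucket: velero
  config:
    resourceGroup: "<backup-resource-group>"    # placeholder
    storageAccount: "<backup-storage-account>"  # placeholder
    useAAD: "true"
```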

@mbauer-dilt

Anybody stumbling upon this at some later point: I can confirm @ccadruvi has the right fix.
I swear to god people, please update the plugin manual/readme page. I have spent more time than I am willing to admit on setting up Velero and all because I was simply missing the two label entries.

Especially since Microsoft's "Multiple User Assigned Identities" error message is really not clear.
