
Velero Restic Backup Fails with "Identity not found" Error on Azure Kubernetes Service (AKS) #526

Closed
Mahagon opened this issue Dec 4, 2023 · 1 comment


Mahagon commented Dec 4, 2023

What steps did you take and what happened:
I encountered an issue while trying to back up a volume with Restic in Velero. Here are the steps I took and the resulting error:

  1. I annotated a test deployment with backup.velero.io/backup-volumes to enable backup of a volume using Restic. The volume uses the storage class private-azurefile-csi.
  2. I enabled the node-agent deployment in my setup.
  3. Upon triggering a backup, I received the following error message:
azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/%SUBSCRIPTIONID%/resourceGroups/%RESOURCEGROUP%/providers/Microsoft.Storage/storageAccounts/%STORAGE_ACCOUNT%/listKeys?%24expand=kerb&api-version=2019-06-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=%CLIENT_ID%&resource=https%3A%2F%2Fmanagement.azure.com%2F
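For reference, the annotation from step 1 was applied on the pod template, roughly like this (the deployment, container, volume, and PVC names below are illustrative, not our actual manifests):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment          # illustrative name
spec:
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
      annotations:
        # Tells Velero's node agent to back up this pod volume with Restic
        backup.velero.io/backup-volumes: data
    spec:
      containers:
        - name: app
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: test-pvc  # PVC backed by storage class private-azurefile-csi
```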

The identity mentioned in the error does exist, and I can locate it in the Azure portal. I'm wondering if there's an issue with how the node agent handles workload identities or if I'm missing a configuration step.
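For completeness, my understanding is that AKS workload identity requires both a client-id annotation on the service account and an explicit opt-in label on the pods; if the node-agent pods are missing either, the token request can fail with an error like the one above. A minimal sketch of what that looks like (the names and client ID are placeholders, not taken from our setup):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: velero                 # placeholder; must match the node-agent's service account
  namespace: velero
  annotations:
    # Client ID of the user-assigned managed identity that has a federated credential
    azure.workload.identity/client-id: "00000000-0000-0000-0000-000000000000"
---
# The node-agent pods also need this label (e.g. via the Helm chart's pod labels):
#   labels:
#     azure.workload.identity/use: "true"
# so the AKS mutating webhook injects the federated token volume and AZURE_* env vars.
```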

What did you expect to happen:
I expected the node agent to correctly identify and use the client ID for authentication, allowing Velero to perform the Restic backup operation without errors.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

Anything else you would like to add:
Additional resources including our Helmfile, logs, and backup repository description are available in this GitHub gist.

Some historical context that might be relevant: about a year ago, we successfully backed up an Azure File volume using a similar setup. However, at that time, we were utilizing aad-pod-identity instead of the current workload identity. This change might be a contributing factor to the issue, although I cannot rule out other changes like updates in the Kubernetes version.

It's also worth noting that our regular backups, which do not involve File System Backup (FSB), are running without any issues. This problem seems specific to backups involving Restic and is potentially related to the transition from aad-pod-identity to workload identity.

Environment:

  • helm version (use helm version): v3.13.2
  • helm chart version and app version (use helm list -n <YOUR NAMESPACE>): Chart: velero-5.1.6, App: 1.12.2
  • Kubernetes version (use kubectl version): 1.28.4
  • Kubernetes installer & version: AKS 1.27.3
  • Cloud provider or hardware configuration: Azure
  • OS (e.g. from /etc/os-release): Ubuntu 22.04
Mahagon closed this as not planned (won't fix, can't repro, duplicate, stale) on Dec 7, 2023

Mahagon commented Dec 7, 2023

I would guess it's an issue in our configuration, therefore I closed the issue.
We found a way to avoid using the Azure File storage. ;)
