
Velero Restic Backup Fails with "Identity not found" Error on Azure Kubernetes Service (AKS) #526

Closed
Mahagon opened this issue Dec 4, 2023 · 1 comment


Mahagon commented Dec 4, 2023

What steps did you take and what happened:
I encountered an issue while trying to back up a volume with Restic in Velero. Here are the steps I took and the resulting error:

  1. I annotated a test deployment with backup.velero.io/backup-volumes to enable backup of a volume using Restic. The volume uses the storage class private-azurefile-csi.
  2. I enabled the node-agent deployment in my setup.
  3. Upon triggering a backup, I received the following error message:
azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/%SUBSCRIPTIONID%/resourceGroups/%RESOURCEGROUP%/providers/Microsoft.Storage/storageAccounts/%STORAGE_ACCOUNT%/listKeys?%24expand=kerb&api-version=2019-06-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=%CLIENT_ID%&resource=https%3A%2F%2Fmanagement.azure.com%2F
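For reference, the annotation from step 1 was applied on the pod template, roughly like this (the deployment, container, volume, and PVC names below are illustrative, not our actual manifests):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment          # illustrative name
spec:
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
      annotations:
        # Tells Velero's node agent to back up this pod volume with Restic
        backup.velero.io/backup-volumes: data
    spec:
      containers:
        - name: app
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: test-pvc  # PVC backed by storage class private-azurefile-csi
```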

The identity mentioned in the error does exist, and I can locate it in the Azure portal. I'm wondering if there's an issue with how the node agent handles workload identities or if I'm missing a configuration step.
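For completeness, my understanding is that AKS workload identity requires both a client-id annotation on the service account and an explicit opt-in label on the pods; if the node-agent pods are missing either, the token request can fail with an error like the one above. A minimal sketch of what that looks like (the names and client ID are placeholders, not taken from our setup):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: velero                 # placeholder; must match the node-agent's service account
  namespace: velero
  annotations:
    # Client ID of the user-assigned managed identity that has a federated credential
    azure.workload.identity/client-id: "00000000-0000-0000-0000-000000000000"
---
# The node-agent pods also need this label (e.g. via the Helm chart's pod labels):
#   labels:
#     azure.workload.identity/use: "true"
# so the AKS mutating webhook injects the federated token volume and AZURE_* env vars.
```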

What did you expect to happen:
I expected the node agent to correctly identify and use the client ID for authentication, allowing Velero to perform the Restic backup operation without errors.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

Anything else you would like to add:
Additional resources including our Helmfile, logs, and backup repository description are available in this GitHub gist.

Some historical context that might be relevant: about a year ago, we successfully backed up an Azure File volume using a similar setup. However, at that time, we were utilizing aad-pod-identity instead of the current workload identity. This change might be a contributing factor to the issue, although I cannot rule out other changes like updates in the Kubernetes version.

It's also worth noting that our regular backups, which do not involve File System Backup (FSB), are running without any issues. This problem seems specific to backups involving Restic and is potentially related to the transition from aad-pod-identity to workload identity.

Environment:

  • helm version (use helm version): v3.13.2
  • helm chart version and app version (use helm list -n <YOUR NAMESPACE>): Chart: velero-5.1.6, App: 1.12.2
  • Kubernetes version (use kubectl version): 1.28.4
  • Kubernetes installer & version: AKS 1.27.3
  • Cloud provider or hardware configuration: Azure
  • OS (e.g. from /etc/os-release): Ubuntu 22.04
Mahagon closed this as not planned (won't fix, can't repro, duplicate, stale) on Dec 7, 2023

Mahagon commented Dec 7, 2023

I would guess it's an issue in our configuration, therefore I closed the issue.
We found a way to avoid using the Azure File storage. ;)
