
Velero crashes with an "invalid memory address or nil pointer dereference" #8440

Closed
sentia-be-ops opened this issue Nov 21, 2024 · 4 comments
Assignees
Labels
Bug target/1.15.1 Volumes Relating to volume backup and restore
Milestone

Comments

@sentia-be-ops

sentia-be-ops commented Nov 21, 2024

What steps did you take and what happened:
Installed velero v1.15.0 using the helm chart v8.0.0 on a k8s v1.30 cluster with vmware-csi based PVs.

Velero starts correctly after install, but as soon as the first backup runs, the velero pod goes into a CrashLoopBackOff state.
The error shown in the log mentions an "invalid memory address or nil pointer dereference".

I initially installed with the previous helm chart and figured the problem might be fixed by chart v8.0.0, as that includes new CRD fields, but it made no difference.

What did you expect to happen:
I expected Velero not to crash and to back up my cluster.

The following information will help us better understand what's going on:

Logs

Environment:

  • Velero version: v1.15.0
  • Velero features: EnableCSI
  • Kubernetes version: v1.30.5
  • Kubernetes installer & version: ClusterAPI v1.8.4
  • Cloud provider or hardware configuration: vsphere-cpi
  • OS: Flatcar Container Linux by Kinvolk 3975.2.2 (Oklo)

Other Info
Backup repository is backed by a NetApp based S3 bucket.

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@blackpiglet blackpiglet self-assigned this Nov 22, 2024
@blackpiglet blackpiglet added Volumes Relating to volume backup and restore Needs info Waiting for information Needs triage We need discussion to understand problem and decide the priority labels Nov 22, 2024
@blackpiglet
Contributor

volumeInfos[index].SnapshotDataMovementInfo.SnapshotHandle = dataUpload.Status.SnapshotID

The code that triggers the panic is clear, but it is still important to find out how this happens.

The panic only happens when the DataUpload CR's related VolumeInfo doesn't have the SnapshotDataMovementInfo section.
The Velero backup creates a VolumeInfo metadata file in the Object Storage bucket.
[Screenshot (2024-11-22): VolumeInfo metadata file in the object storage bucket]
Could you help check the content of your failed backup's VolumeInfo metadata file?
The name should be something like this: backup-name-volumeinfo.json.gz.
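In Go, writing through a nil struct pointer such as `SnapshotDataMovementInfo` panics with exactly this "invalid memory address or nil pointer dereference" message. A minimal sketch of the guard the fix needs, using simplified, hypothetical stand-ins for Velero's types (the real definitions live in the velero repo):

```go
package main

import "fmt"

// Hypothetical, simplified shapes of the volume metadata types,
// for illustration only.
type SnapshotDataMovementInfo struct {
	SnapshotHandle string
}

type VolumeInfo struct {
	PVName                   string
	Skipped                  bool
	SnapshotDataMovementInfo *SnapshotDataMovementInfo // nil for skipped volumes
}

// patchSnapshotHandle mimics the guarded update: entries whose
// SnapshotDataMovementInfo is nil are skipped instead of dereferenced.
func patchSnapshotHandle(infos []*VolumeInfo, snapshotID string) {
	for _, info := range infos {
		if info == nil || info.SnapshotDataMovementInfo == nil {
			continue // skipped / non-CSI PV: nothing to patch
		}
		info.SnapshotDataMovementInfo.SnapshotHandle = snapshotID
	}
}

func main() {
	infos := []*VolumeInfo{
		{PVName: "csi-pv", SnapshotDataMovementInfo: &SnapshotDataMovementInfo{}},
		{PVName: "legacy-pv", Skipped: true}, // no SnapshotDataMovementInfo
	}
	patchSnapshotHandle(infos, "snap-123")
	fmt.Println(infos[0].SnapshotDataMovementInfo.SnapshotHandle)
	fmt.Println(infos[1].SnapshotDataMovementInfo == nil)
}
```

Without the nil check in the loop, the second entry would reproduce the reported panic.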

@sentia-be-ops
Author

I've added the volumeinfo.json to the gist.

Could the issue be caused by a mix of CSI-based and non-CSI-based PVs on the cluster? The PVs Velero can't back up for that reason have skipped: true and no SnapshotDataMovementInfo section.

The goal is to get rid of the non-CSI-based PVs, but that is a work in progress. In the meantime Velero can't back up the data from those PVs, which is fine, though I don't think it should fail completely because of them.
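For illustration, the difference between the two kinds of entries in the VolumeInfo metadata might look roughly like this (field names here are assumptions for the sketch, not copied from an actual file):

```json
[
  {
    "pvName": "csi-pv-1",
    "skipped": false,
    "snapshotDataMovementInfo": { "snapshotHandle": "snap-abc" }
  },
  {
    "pvName": "legacy-pv-1",
    "skipped": true
  }
]
```

The second entry carries no snapshotDataMovementInfo at all, so code that unconditionally dereferences that section would panic on it.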

@blackpiglet
Contributor

Thanks for the information.
It's a reasonable request. I will create a PR to fix it, and the fix should be included in the coming v1.15.1 patch release.

@reasonerjt reasonerjt removed Needs info Waiting for information Needs triage We need discussion to understand problem and decide the priority labels Nov 25, 2024
@reasonerjt reasonerjt added this to the v1.16 milestone Nov 25, 2024
@reasonerjt reasonerjt added the Bug label Nov 25, 2024
@blackpiglet
Contributor

Hi @sentia-be-ops,
I created PR #8465 to fix this issue, but after thinking it over, I still couldn't figure out a scenario that could trigger this error.
I built an image based on PR #8465, and the image address is blackpiglet/velero:8440. It's a public image. You should have permission to access it.
Could you help verify whether this PR can fix your issue?
