velero completely ignores --csi-snapshot-timeout flag
#8473
Comments
kubectl get backup -oyaml |
Also please give an example how you used |
Sorry about missing that detail. I just updated the original shell command to show it. What's really strange is that I have a very small volume that I can snapshot just fine; it's only the volumes with actual data (8 GB) that take longer and have this problem. My demo volume is 50 GB but probably holds less than 1 MB of dummy data, and snapshotting it works fine. The one with real data always fails. |
I guess it is sending that value over? But then why is it timing out before it sees the volume contents in the attached logs 🤔 For the record, manual backups (outside of Velero) work just fine if I create a volume snapshot using Kubernetes manifests. So Oracle CSI is working as expected, and I've just tried a different backup scheduler that isn't having any issues at all. I would like to keep using Velero, though, since it has the neat CLI to go with it.
- apiVersion: velero.io/v1
  kind: Backup
  metadata:
    annotations:
      velero.io/resource-timeout: 10m0s
      velero.io/source-cluster-k8s-gitversion: v1.29.1
      velero.io/source-cluster-k8s-major-version: "1"
      velero.io/source-cluster-k8s-minor-version: "29"
    creationTimestamp: "2024-12-03T17:45:36Z"
    generation: 6
    labels:
      velero.io/storage-location: default
    name: manual-backup-11
    namespace: velero
    resourceVersion: "509394137"
    uid: 62b9fd3d-abfe-4a3e-b804-0d2278e21d17
  spec:
    csiSnapshotTimeout: 20m0s
    defaultVolumesToFsBackup: false
    hooks: {}
    includedNamespaces:
    - app-12345-425
    includedResources:
    - persistentvolumeclaims
    itemOperationTimeout: 4h0m0s
    labelSelector:
      matchLabels:
        test: velero-backup
    metadata: {}
    snapshotMoveData: false
    snapshotVolumes: true
    storageLocation: default
    ttl: 720h0m0s
  status:
    completionTimestamp: "2024-12-03T17:46:32Z"
    errors: 1
    expiration: "2025-01-02T17:45:41Z"
    formatVersion: 1.1.0
    hookStatus: {}
    phase: PartiallyFailed
    progress:
      itemsBackedUp: 2
      totalItems: 2
    startTimestamp: "2024-12-03T17:45:41Z"
    version: 1
|
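As a quick sanity check on the Backup object above, the run time can be worked out from its own status timestamps and compared with the csiSnapshotTimeout in its spec. The snippet is only an illustrative sketch: the helper name `parse_k8s_time` is made up here, and the values are copied from the YAML in this thread.

```python
from datetime import datetime, timedelta


def parse_k8s_time(value: str) -> datetime:
    """Parse a UTC timestamp in the form shown in the Backup status."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


# Values copied from status.startTimestamp and status.completionTimestamp.
start = parse_k8s_time("2024-12-03T17:45:41Z")
completed = parse_k8s_time("2024-12-03T17:46:32Z")

elapsed = completed - start
csi_snapshot_timeout = timedelta(minutes=20)  # spec.csiSnapshotTimeout: 20m0s

print(elapsed.total_seconds())         # 51.0
print(elapsed < csi_snapshot_timeout)  # True
```

The backup ended roughly 51 seconds after it started, far short of the 20 minutes recorded in the spec, which suggests the flag value was stored on the object and that the run stopped for some reason other than that timeout expiring.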
Yours did not fail due to a timeout. Otherwise you would see the log from velero/pkg/util/csi/volume_snapshot.go Lines 754 to 772 in 23ca089
|
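To illustrate the distinction drawn in the comment above, here is a minimal sketch of a wait loop in which a timeout produces its own recognizable error while any other failure surfaces immediately. It is not Velero's code: `wait_for_ready`, `SnapshotTimeout`, and the `check` callback are hypothetical names used only for this example.

```python
import time


class SnapshotTimeout(Exception):
    """Raised only when the deadline passes before the check succeeds."""


def wait_for_ready(check, timeout_s, interval_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout_s seconds elapse.

    Errors raised by check() are not swallowed, so a failure that is
    unrelated to the deadline is reported as itself, not as a timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return
        if clock() >= deadline:
            raise SnapshotTimeout(f"not ready after {timeout_s}s")
        sleep(interval_s)


# Hypothetical check that reports ready on the third poll.
polls = {"count": 0}


def ready_on_third_poll():
    polls["count"] += 1
    return polls["count"] >= 3


wait_for_ready(ready_on_third_poll, timeout_s=5, interval_s=0)
print(polls["count"])  # 3
```

With a loop shaped like this, a run that fails well before the deadline cannot have failed because of the timeout, which is the same reasoning applied to the logs in this issue.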
So I changed nothing, and it's now working on a schedule.
|
Was it reliably reproducible (how many fails out of 10, etc.)? I think it might've been a one-off. We can add some retries in here to make it more reliable if the apiserver is slow. |
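The retry idea mentioned above could look roughly like the following sketch: a bounded number of attempts with an exponentially growing pause between them. This is an assumption about the general shape only, not the change that was proposed for Velero; `retry`, `flaky_call`, and the delay values are hypothetical.

```python
import time


def retry(op, attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Call op() up to `attempts` times, doubling the pause after each failure.

    Returns the first successful result, or re-raises the last error
    once every attempt has failed.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return op()
        except Exception as err:  # a real client would catch narrower errors
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)
    raise last_error


# Hypothetical call that fails twice (a slow API server) and then succeeds.
state = {"calls": 0}
pauses = []


def flaky_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("api server slow")
    return "ok"


result = retry(flaky_call, attempts=3, base_delay_s=0.5, sleep=pauses.append)
print(result, pauses)  # ok [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the example instant to run; a real caller would leave the default so the pauses actually happen.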
Yes, it was. For the past 24 hours it ran every 5 minutes and only one run succeeded, and now it's succeeding every time. |
ok.. so not 100% reproducible given enough retries.. so env flakes. |
I can try to put up a PR. But since your env is now succeeding every time, I assume you won't be able to test it. |
This is what I did just before it started working:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-12345-425-as-as-dc1-app-app-12345-425-as-as-dc1-app-0
  namespace: app-12345-425
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 250Gi
  dataSource:
    name: app-12345-425-as-as-dc1-app-app-12345-425-as-as-dc1-app-0-app-12345-425-snapshot-202412031840
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
Then I scaled my StatefulSet back up. |
I can still test it because I have a lot more deployments that need to be backed up as well. |
|
What steps did you take and what happened:
I created a backup like this:
velero backup create data-backup \
  --include-namespaces app-12345-425 \
  --include-resources=persistentvolumeclaims \
  --selector='test=velero-backup' \
  --csi-snapshot-timeout=20m
This results in the following logs. It actually does create the volume backup, but then deletes it. I've timed how long it takes to create a backup manually, and it takes about 47 seconds, so I think the issue might be that it's only checking for 50 seconds.
What did you expect to happen:
A successfully created backup.
Anything else you would like to add:
Creating a volume snapshot manually like this works, but it takes about 47 seconds to complete and reach the "Ready" state in Kubernetes. We should be able to raise the wait time to maybe a few minutes so that the backup can always succeed.
Environment:
velero version:
velero client config get features:
kubectl version: