
Not able to restore a backup that has PVs and PVCs to a cluster in a different region [AWS EKS] #4799

Closed
Shapai opened this issue Mar 31, 2022 · 5 comments
Labels:  Area/Cloud/AWS, Needs info (waiting for information)

Shapai commented Mar 31, 2022

What steps did you take and what happened:

Hello, we are using Velero for DR planning and are working on a cross-region backup/restore strategy.
We take backups of workloads, PVs, and PVCs.
We are facing issues while restoring a backup to a cluster in a second region (us-west-2) from the source region (us-east-2).

The installation goes through without any issues on both clusters using the command below:

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.4.0 \
    --bucket velerobucket \
    --backup-location-config region=us-east-2 \
    --snapshot-location-config region=us-east-2 \
    --secret-file secret-file

The backup creation also goes through without any errors:

velero backup create zookeeperbkp --include-namespaces zookeeper --snapshot-volumes

While restoring on the us-west-2 cluster from the us-east-2 backup, the restore completes successfully without any errors in the Velero restore logs,
but the zookeeper pods stay in Pending state:

velero restore create  --from-backup zookeeperbkp
kubectl get pods -n zookeeper
NAME          READY   STATUS    RESTARTS   AGE
zookeeper-0   0/2     Pending   0          3m24s
zookeeper-1   0/2     Pending   0          3m24s
zookeeper-2   0/2     Pending   0          3m24s

Describing the pods shows:

0/1 nodes are available: 1 node(s) had volume node affinity conflict.

Describing the PVs shows that Velero is trying to create them in us-east-2 (the labels are for us-east-2),
whereas they should be in us-west-2 (the restore cluster).

After all this I read more about Velero's limitations when restoring PVs and PVCs into a cluster in a different region. There are links where people have modified the Velero JSON files in S3.

I tried doing the same, modifying the Velero snapshot JSON file in S3:

aws s3 cp s3://velerobkpxyz/backups/zookeeper/ ./ --recursive
gunzip zookeeper-volumesnapshots.json.gz
sed -i "s/us-east-2/us-west-2/g" zookeeper-volumesnapshots.json
gzip zookeeper-volumesnapshots.json
aws s3 cp zookeeper-volumesnapshots.json.gz s3://velerobkp/backups/zookeeper/zookeeper-volumesnapshots.json.gz

Similarly, I made the change for zookeeper.tar.gz:

mkdir zookeeper-temp
tar xzf zookeeper.tar.gz -C zookeeper-temp/
cd zookeeper-temp/
find . -name \*.json -exec sh -c "sed -i 's/us-east-2/us-west-2/g' {}" \;
tar czf ../zookeeper.tar.gz *
aws s3 cp zookeeper.tar.gz s3://velerobkp/backups/zookeeper/
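A blanket sed across every extracted file can also rewrite unrelated strings that happen to contain the region name. A more targeted sketch in Python, which parses each gzipped JSON document and replaces the region substring only inside string values (so AZ names like us-east-2a become us-west-2a, while snapshot IDs and other values are untouched), might look like this. The key names in the test data are illustrative, not confirmed against Velero's file schema:

```python
import gzip
import json

def rewrite_region(obj, old_region="us-east-2", new_region="us-west-2"):
    """Recursively replace the region substring in all string values.

    AZ names such as 'us-east-2a' share the region prefix, so one
    substring replacement updates both region and zone fields.
    """
    if isinstance(obj, dict):
        return {k: rewrite_region(v, old_region, new_region) for k, v in obj.items()}
    if isinstance(obj, list):
        return [rewrite_region(v, old_region, new_region) for v in obj]
    if isinstance(obj, str):
        return obj.replace(old_region, new_region)
    return obj

def rewrite_file(path, old_region="us-east-2", new_region="us-west-2"):
    # Velero stores these files gzip-compressed in the bucket.
    with gzip.open(path, "rt") as f:
        doc = json.load(f)
    doc = rewrite_region(doc, old_region, new_region)
    with gzip.open(path, "wt") as f:
        json.dump(doc, f)
```

Parsing and re-serializing keeps the JSON well-formed, unlike sed on the raw bytes. Note that renaming zones in metadata does not move the snapshot data itself, which is why the restore can still fail.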

After this, the backup describe output shows the correct region names for the PVs:

velero backup describe zookeeper9 --details

Name:         zookeeper9
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.21.5-eks-bc4871b
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=21+

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
  Included:  zookeeper
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  true

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2022-03-30 20:37:53 +0530 IST
Completed:  2022-03-30 20:37:57 +0530 IST

Expiration:  2022-04-29 20:37:53 +0530 IST

Total items to be backed up:  52
Items backed up:              52

Resource List:
  apiextensions.k8s.io/v1/CustomResourceDefinition:
    - servicemonitors.monitoring.coreos.com
  apps/v1/ControllerRevision:
    - zookeeper/zookeeper-596cddb599
    - zookeeper/zookeeper-5977bdccb6
    - zookeeper/zookeeper-5cd569cbf9
    - zookeeper/zookeeper-6585c9bc89
    - zookeeper/zookeeper-6bf55cfd99
    - zookeeper/zookeeper-856646d9f6
    - zookeeper/zookeeper-8cdd5f46
    - zookeeper/zookeeper-ccf87988c
  apps/v1/StatefulSet:
    - zookeeper/zookeeper
  discovery.k8s.io/v1/EndpointSlice:
    - zookeeper/zookeeper-headless-2tnx5
    - zookeeper/zookeeper-mzdlc
  monitoring.coreos.com/v1/ServiceMonitor:
    - zookeeper/zookeeper-exporter
  policy/v1/PodDisruptionBudget:
    - zookeeper/zookeeper
  v1/ConfigMap:
    - zookeeper/kube-root-ca.crt
    - zookeeper/zookeeper
  v1/Endpoints:
    - zookeeper/zookeeper
    - zookeeper/zookeeper-headless
  v1/Namespace:
    - zookeeper
  v1/PersistentVolume:
    - pvc-261b9803-8e55-4880-bb31-b29ca3a6c323
    - pvc-89cfd5b9-65da-4fd1-a095-83d21d1d21db
    - pvc-9e027e4c-cc9e-11ea-9ce3-061b42a2865e
    - pvc-a835d78d-9dfd-41f7-92bd-7f2e752dbeb7
    - pvc-c0e454f7-cc9e-11ea-9ce3-061b42a2865e
    - pvc-ee6aad46-cc9e-11ea-9ce3-061b42a2865e
  v1/PersistentVolumeClaim:
    - zookeeper/data-zookeeper-0
    - zookeeper/data-zookeeper-1
    - zookeeper/data-zookeeper-2
    - zookeeper/data-zookeeper-3
    - zookeeper/data-zookeeper-4
    - zookeeper/data-zookeeper-5
  v1/Pod:
    - zookeeper/zookeeper-0
    - zookeeper/zookeeper-1
    - zookeeper/zookeeper-2
    - zookeeper/zookeeper-3
    - zookeeper/zookeeper-4
    - zookeeper/zookeeper-5
  v1/Secret:
    - zookeeper/default-token-kcl4m
    - zookeeper/sh.helm.release.v1.zookeeper.v1
    - zookeeper/sh.helm.release.v1.zookeeper.v10
    - zookeeper/sh.helm.release.v1.zookeeper.v11
    - zookeeper/sh.helm.release.v1.zookeeper.v12
    - zookeeper/sh.helm.release.v1.zookeeper.v13
    - zookeeper/sh.helm.release.v1.zookeeper.v4
    - zookeeper/sh.helm.release.v1.zookeeper.v5
    - zookeeper/sh.helm.release.v1.zookeeper.v6
    - zookeeper/sh.helm.release.v1.zookeeper.v7
    - zookeeper/sh.helm.release.v1.zookeeper.v8
    - zookeeper/sh.helm.release.v1.zookeeper.v9
  v1/Service:
    - zookeeper/zookeeper
    - zookeeper/zookeeper-headless
  v1/ServiceAccount:
    - zookeeper/default

Velero-Native Snapshots:
  pvc-9e027e4c-cc9e-11ea-9ce3-061b42a2865e:
    Snapshot ID:        snap-0f81f2f62e476584a
    Type:               gp2
    Availability Zone:  us-west-2b
    IOPS:               <N/A>
  pvc-c0e454f7-cc9e-11ea-9ce3-061b42a2865e:
    Snapshot ID:        snap-0c689771f3dbfa361
    Type:               gp2
    Availability Zone:  us-west-2a
    IOPS:               <N/A>
  pvc-ee6aad46-cc9e-11ea-9ce3-061b42a2865e:
    Snapshot ID:        snap-068c63f1bb31af3cc
    Type:               gp2
    Availability Zone:  us-west-2b
    IOPS:               <N/A>
  pvc-89cfd5b9-65da-4fd1-a095-83d21d1d21db:
    Snapshot ID:        snap-050e2e51eac92bd74
    Type:               gp2
    Availability Zone:  us-west-2a
    IOPS:               <N/A>
  pvc-261b9803-8e55-4880-bb31-b29ca3a6c323:
    Snapshot ID:        snap-08e45396c99e7aac3
    Type:               gp2
    Availability Zone:  us-west-2b
    IOPS:               <N/A>
  pvc-a835d78d-9dfd-41f7-92bd-7f2e752dbeb7:
    Snapshot ID:        snap-07ad93657b0bdc1a6
    Type:               gp2
    Availability Zone:  us-west-2a
    IOPS:               <N/A> 

But when trying to restore, it fails:

velero restore create --from-backup zookeeper9

velero restore describe zookeeper9-20220331145320
Name:         zookeeper9-20220331145320
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:                       PartiallyFailed (run 'velero restore logs zookeeper9-20220331145320' for more information)
Total items to be restored:  52
Items restored:              52

Started:    2022-03-31 14:53:24 +0530 IST
Completed:  2022-03-31 14:53:36 +0530 IST

Warnings:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    zookeeper:  could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version.

Errors:
  Velero:     <none>
  Cluster:  error executing PVAction for persistentvolumes/pvc-261b9803-8e55-4880-bb31-b29ca3a6c323: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.
  status code: 400, request id: 2b5ae55c-dfd5-4c52-8494-105e46bce78b
    error executing PVAction for persistentvolumes/pvc-89cfd5b9-65da-4fd1-a095-83d21d1d21db: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2a' does not exist.
  status code: 400, request id: ed91b698-d3b9-450f-b7b4-a3869cbae6ae
    error executing PVAction for persistentvolumes/pvc-9e027e4c-cc9e-11ea-9ce3-061b42a2865e: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.
  status code: 400, request id: 2b493106-84c6-4210-9663-4d00f47c06de
    error executing PVAction for persistentvolumes/pvc-a835d78d-9dfd-41f7-92bd-7f2e752dbeb7: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2a' does not exist.
  status code: 400, request id: 387c6c27-6b18-4bc6-9bb8-3ed152cb49d1
    error executing PVAction for persistentvolumes/pvc-c0e454f7-cc9e-11ea-9ce3-061b42a2865e: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2a' does not exist.
  status code: 400, request id: 7d7d2931-e7d9-4bc5-8cb1-20e3b2849fe2
    error executing PVAction for persistentvolumes/pvc-ee6aad46-cc9e-11ea-9ce3-061b42a2865e: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.
  status code: 400, request id: 75648031-97ca-4e2a-a079-8f6618902b2a
  Namespaces: <none>

Backup:  zookeeper9

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Preserve Service NodePorts:  auto

It complains with:

 Cluster:  error executing PVAction for persistentvolumes/pvc-261b9803-8e55-4880-bb31-b29ca3a6c323: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.
  status code: 400, request id: 2b5ae55c-dfd5-4c52-8494-105e46bce78b

Not sure why this is happening. Is there anything that I have missed?

What did you expect to happen:

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue; for more options refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

bundle-2022-03-31-18-04-57.tar.gz

Anything else you would like to add:

Environment:

  • Velero version (use velero version): Version: v1.8.1
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-13+d2965f0db10712", GitCommit:"d2965f0db1071203c6f5bc662c2827c71fc8b20d", GitTreeState:"clean", BuildDate:"2021-06-26T01:02:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: AWS EKS
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
reasonerjt commented Apr 5, 2022

After all this i read more on limitations of velero on restoring PV's and PVC's in cross region cluster. There are links where people have modified the velero jsons in S3

Could you show me the link? I haven't tried it, but I think at best this only works in certain cases. AFAIK AWS snapshots are region-specific; did you replicate the snapshot from us-east-2 to us-west-2?

And as for the particular error:
The zone 'us-west-2b' does not exist.
It seems the AZ in the JSON is set to us-west-2b; please double-check.

@reasonerjt reasonerjt self-assigned this Apr 5, 2022
@reasonerjt reasonerjt added the Needs info Waiting for information label Apr 5, 2022
Shapai commented Apr 7, 2022

@reasonerjt thanks for your response. Yes, AWS snapshots are region-specific; for Velero to restore them in another region, the snapshots have to be available in that region.
There is a branch, https://github.com/jglick/velero-plugin-for-aws/tree/x-region, which serves the purpose of copying the snapshots to an alternate region and then restoring them there, along with this branch for Velero core: https://github.com/jglick/velero/tree/concurrent-snapshot
We built images (for Velero and the Velero plugin for AWS) from these branches and were able to restore in other regions using these features.
Are there any plans to merge these feature branches? This would help those doing cross-region restores on AWS with Velero.

reasonerjt commented
@Shapai
This branch is not on our radar.
You can ask the author to open an issue and a PR to discuss with the maintainers.

So this seems to be an issue you encountered while using a forked repo. Do you think we need to keep this issue open?

reasonerjt commented
Closing due to inactivity.


kmadel commented Mar 16, 2023

Actually, there was a PR, #4242, but it was declined for future-architecture reasons or something like that. As far as I know, there is still no cross-region support with snapshots in AWS or Azure.
