Backup and Restore
The Cassandra operator supports taking backups of a cluster managed by the operator and restoring those backups into a new cluster. This document outlines how to configure and manage backups and restores.
Backing up a cluster means that the whole state of the cluster, per node, is uploaded to a remote location. By default this means uploading all SSTables to a cloud destination. We currently support uploads to S3, Azure and GCP.
The backup procedure is initiated by the Cassandra operator itself once you apply a backup spec to Kubernetes. The backup controller watches this CRD and calls Icarus on each node via HTTP, submitting a backup operation. Internally, Icarus uses our other project - Instaclustr Esop - via which it takes a snapshot of a node and uploads all created SSTables. This happens in parallel on each node, and the SSTables themselves are also uploaded in parallel. SSTables are stored in a bucket for whichever cloud is used. If a bucket does not exist, it can be created automatically.
Restoring a cluster means that before a node is started, SSTables are downloaded from the remote location where they were previously uploaded. They are downloaded into the location where Cassandra picks them up upon start, so it seems as if these files had been there all the time. Restoration is done in a node-by-node fashion via a restore init container.
Upon restoration, we set auto_bootstrap to false and set initial_token to the tokens the node was running with when it was backed up.
Restoration of a whole cluster does not mean that we can "rewind" the state of the current cluster. We can currently restore "from scratch" only. However, you can restore a particular keyspace / table by sending a restoration operation request to the Sidecar, which will restore that keyspace / table on a running node.
The very first thing you need to do in order to make a backup happen is to specify credentials for the cloud you want your SSTables to be backed up to. The authentication mechanism varies across clouds. If you were to use all targets we currently support, you would have to create a Kubernetes Secret which would look like this:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-backup-secrets
type: Opaque
stringData:
  awssecretaccesskey: _enter_here_aws_secret_access_key_
  awsaccesskeyid: _enter_here_aws_access_key_id_
  awsregion: _enter_here_aws_region_eu-central-1_
  awsendpoint: _enter_aws_endpoint_if_any_
  azurestorageaccount: _enter_here_azure_storage_account_
  azurestoragekey: _enter_here_azure_storage_key_
  gcp: 'put here content of gcp.json from Google GCP backend'
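Create the secret in the same namespace as your cluster, for example like this (assuming the manifest above is saved as backup-secrets.yaml; the file name is arbitrary):
kubectl apply -f backup-secrets.yaml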
In order to back up to your cloud of choice, you have to use the same keys in stringData as above. The Cassandra operator reacts to these keys specifically and to nothing else. You do not need to specify every key. For example, if you plan to upload only to Azure, just create a secret which contains only the Azure-specific keys, as shown below.
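Such an Azure-only secret could look like this minimal sketch, which simply keeps only the Azure keys from the full example above:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-backup-secrets
type: Opaque
stringData:
  azurestorageaccount: _enter_here_azure_storage_account_
  azurestoragekey: _enter_here_azure_storage_key_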
After the secret is created, you can proceed to the backup itself by applying this spec:
apiVersion: cassandraoperator.instaclustr.com/v1alpha1
kind: CassandraBackup
metadata:
  name: test-cassandra-backup-restore-s3
  labels:
    app: cassandra
spec:
  cdc: test-cluster-dc1-cassandra
  cluster: test-cluster
  datacenter: dc1
  storageLocation: "s3://cassandra-bucket"
  snapshotTag: "mySnapshotTag"
  secret: cloud-backup-secrets
  globalRequest: true
First of all, notice the name of our backup - test-cassandra-backup-restore-s3; we will use this name once we want to restore the cluster. Secondly, cdc is the name of our CassandraDataCenter resource; datacenter and cluster are taken from that CDC too. Notice also storageLocation: its prefix is s3, which means we are going to perform the backup to S3, and the bucket name is cassandra-bucket, so this is the bucket the backup operation will upload all files to. Unless told otherwise (see the bucket-related fields below), this bucket has to exist beforehand. snapshotTag follows - this is the name of the snapshot the backup procedure takes. As you can imagine, if you apply this spec multiple times with different snapshot tags over time, you will end up with different data reflecting different states of your cluster.
If your bucket does not exist, it can be created automatically. The backup spec has these bucket-related fields (an example spec combining them follows the list):
- createMissingBucket - if set to true and your bucket does not exist, it will be created automatically
- skipBucketVerification - if set to true, we skip checking whether the bucket exists; keep in mind that if the bucket does not exist and you skip its verification, your backup fails
- insecure - if set to true, communication with a bucket, e.g. with S3 or Azure, is carried out over HTTP instead of HTTPS; insecure is false by default, so all communication is done securely by default
- metadataDirective - specific to S3 only, specifies whether the metadata is copied from the source object or replaced with metadata provided in the request, defaults to COPY
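As an illustration, a backup spec combining these bucket-related fields might look like the following sketch; apart from the fields from the list above, it just mirrors the earlier example:
apiVersion: cassandraoperator.instaclustr.com/v1alpha1
kind: CassandraBackup
metadata:
  name: test-cassandra-backup-restore-s3
  labels:
    app: cassandra
spec:
  cdc: test-cluster-dc1-cassandra
  cluster: test-cluster
  datacenter: dc1
  storageLocation: "s3://cassandra-bucket"
  snapshotTag: "mySnapshotTag"
  secret: cloud-backup-secrets
  createMissingBucket: true
  skipBucketVerification: false
  insecure: false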
Lastly, you have to specify secret. Here, we reference the name of the secret created at the beginning.
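To start the backup, apply the spec (assuming you saved it as backup.yaml; the file name is arbitrary):
kubectl apply -f backup.yaml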
If you apply this CRD, you can track the progress by getting or describing the respective resource:
$ kubectl get cassandrabackups.cassandraoperator.instaclustr.com
NAME                               STATUS    PROGRESS
test-cassandra-backup-restore-s3   RUNNING   83%
$ kubectl describe cassandrabackups.cassandraoperator.instaclustr.com
Name: test-cassandra-backup-restore-s3
... other fields omitted
Status:
  Node:      cassandra-test-dc-cassandra-west1-a-0
  Progress:  66%
  State:     RUNNING
  Node:      cassandra-test-dc-cassandra-west1-b-0
  Progress:  48%
  State:     RUNNING
  Node:      cassandra-test-dc-cassandra-west1-c-0
  Progress:  45%
  State:     RUNNING
Once all pods are backed up, Progress will be 100% and State will become Completed.
Congratulations, you have backed up your cluster. Let's see how you actually restore it.
The restoration is very simple. Firstly, be sure your secret exists as specified so we can talk to the cloud storage upon restore. Restoration is done by an init container, before Cassandra and the Sidecar are even started. The init container downloads all data from the cloud, so when Cassandra starts, it seems as if the files had been there all the time.
All you need to do is to specify this snippet in the CDC:
apiVersion: cassandraoperator.instaclustr.com/v1alpha1
kind: CassandraDataCenter
metadata:
  name: test-dc-cassandra
  labels:
    app: cassandra
spec:
  # a bunch of other configuration parameters
  restore:
    backupName: test-cassandra-backup-restore-s3
The CDC spec is otherwise the same as you are used to; it differs only in the restore section. backupName is, unsurprisingly, the name of a backup; that backup object has to exist. There may also be a secret field - the name of the secret from the backup spec above. We inject the name of this secret into the init container so the restoration procedure resolves all necessary credentials from Kubernetes dynamically. When not specified, it is taken automatically from the backup spec.
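If you want to be explicit about it, a restore section with the secret field spelled out might look like this sketch; cloud-backup-secrets is the secret created at the beginning:
spec:
  # a bunch of other configuration parameters
  restore:
    backupName: test-cassandra-backup-restore-s3
    secret: cloud-backup-secrets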
Can I change the cloud credentials after a cluster is deployed?
Absolutely. This is the reason why we have implemented it that way. The trick is that once a backup operation request is sent to a Sidecar container, it internally reaches the Kubernetes API, from within, via the official Kubernetes Java API client, and it tries to resolve the credentials for the cloud whose prefix you have specified in storageLocation. Hence you can change your credentials as you wish, because they are retrieved from Kubernetes dynamically every time.
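For example, rotating the AWS credentials is just a matter of updating the secret in place; the key names are the ones from the secret shown earlier and the values below are placeholders:
kubectl create secret generic cloud-backup-secrets \
  --from-literal=awsaccesskeyid=_new_access_key_id_ \
  --from-literal=awssecretaccesskey=_new_secret_access_key_ \
  --from-literal=awsregion=eu-central-1 \
  --dry-run=client -o yaml | kubectl apply -f -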
What if I want to restore into a completely different Kubernetes cluster, or I have lost my backup spec?
No worries. Imagine you have a completely different Kubernetes cluster you want to restore a Cassandra cluster into, or maybe you have just lost your backup spec accidentally. In either case, we can create a backup spec, but when we create it, it will not proceed to an actual backup because there is nothing to back up. You have to specify a field named justCreate and set it to true, like this:
apiVersion: cassandraoperator.instaclustr.com/v1alpha1
kind: CassandraBackup
metadata:
  name: test-cassandra-backup
  labels:
    app: cassandra
spec:
  cdc: test-cluster-dc1-cassandra
  cluster: test-cluster
  datacenter: dc1
  storageLocation: "s3://cassandra-bucket"
  snapshotTag: "mySnapshotTag"
  secret: cloud-backup-secrets
  justCreate: true # <----- see?
For Cassandra 3/4, you should also back up system_schema. You need to set this explicitly; otherwise a new node would, upon restore, regenerate the system tables as well as system_schema and it would not know about the keyspaces you are trying to restore. Hence it is important to back up this keyspace explicitly, so specify it like entities: system_schema,testks1,testks2, for example.
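Put together, a backup spec that captures the schema alongside two user keyspaces could look like this sketch; test-cassandra-backup-with-schema, testks1 and testks2 are placeholder names:
apiVersion: cassandraoperator.instaclustr.com/v1alpha1
kind: CassandraBackup
metadata:
  name: test-cassandra-backup-with-schema
  labels:
    app: cassandra
spec:
  cdc: test-cluster-dc1-cassandra
  cluster: test-cluster
  datacenter: dc1
  storageLocation: "s3://cassandra-bucket"
  snapshotTag: "schemaAwareSnapshot"
  secret: cloud-backup-secrets
  entities: system_schema,testks1,testks2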
Can I clone a whole cluster from a backup?
Sure. Imagine you have a cluster of 5 nodes with data, and then you back it up. When you restore, you can restore under a different cluster name, which effectively creates a completely new cluster running in the same namespace. Hence you have effectively cloned your whole cluster and you have an exact copy of that cluster from a particular date (the snapshot you took).
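As a sketch, such a clone is just a new CDC with a different name whose restore section points at the existing backup; test-dc-cassandra-clone is a made-up name:
apiVersion: cassandraoperator.instaclustr.com/v1alpha1
kind: CassandraDataCenter
metadata:
  name: test-dc-cassandra-clone
  labels:
    app: cassandra
spec:
  # the rest of the CDC configuration as usual
  restore:
    backupName: test-cassandra-backup-restore-s3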
Can I upload two backups under the same snapshot tag?
No. Even though snapshots are deleted automatically after the files have been uploaded, it does not make sense to upload something twice under the same snapshot. Why would you even want that? A rule of thumb is to include some date and time information in the snapshot name so you can return to it in the future, referencing an arbitrary snapshot.
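For instance, a timestamped tag in the backup spec could look like this; the exact naming scheme is entirely up to you:
spec:
  snapshotTag: "test-cluster-dc1-2021-06-01T12-00"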
What if I want to use instance credentials for S3 instead of keys in the secret?
That is fine; in that case, do not specify any AWS-related credentials. The backup will first try to look them up and, if that fails, it will eventually fall back to the last-chance authentication, which is delegated to the S3 client builder itself.
If awsendpoint is set but awsregion is not, the backup request fails. If awsendpoint is not set but awsregion is set, only the region will be set.
If you specify neither awssecretaccesskey nor awsaccesskeyid, as stated above, it will fall back to the instance authentication mechanism.
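As an illustration, a secret targeting an S3-compatible endpoint would carry both keys; the endpoint URL below is just a placeholder:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-backup-secrets
type: Opaque
stringData:
  awsaccesskeyid: _enter_here_aws_access_key_id_
  awssecretaccesskey: _enter_here_aws_secret_access_key_
  awsregion: eu-central-1
  awsendpoint: https://s3.my-private-endpoint.example.com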
How does a restored node keep the same token ownership?
We set auto_bootstrap: false and initial_token to the tokens the respective node was running with when it was backed up.
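In effect, the restored node starts with cassandra.yaml settings along these lines; the token values are, of course, only examples:
auto_bootstrap: false
initial_token: -9223372036854775808,-3074457345618258603,3074457345618258602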