diff --git a/docs/docs/workflows/backup.md b/docs/docs/workflows/backup.md
new file mode 100644
index 00000000..12b93fae
--- /dev/null
+++ b/docs/docs/workflows/backup.md
@@ -0,0 +1,108 @@
+# Backing up and restoring MarbleRun state
+
+In a production environment, you should regularly back up the state of the MarbleRun Coordinator to be able to restore it in case of failure. Backup is easy, but there are differences based on how you deployed MarbleRun.
+
+## Prerequisites
+
+Restoring a backup includes [recovering the Coordinator](../features/recovery.md), so you need to [define recovery keys](define-manifest.md#recoverykeys) in the manifest.
+
+## Backing up the Coordinator state
+
+The Coordinator supports live backup, so you can back up its state without stopping it.
+
+Make a copy of the `marblerun-state` Secret in the `marblerun` namespace:
+
+```bash
+kubectl -n marblerun get secret marblerun-state -o yaml > marblerun-state-backup.yaml
+```
+
+The state is stored on a PersistentVolume bound to the PersistentVolumeClaim `coordinator-pv-claim` in the `marblerun` namespace.
+Check the documentation of your Kubernetes distribution on the available options to back up PersistentVolumes.
+
+Alternatively, you can copy the `sealed_data` file from the PersistentVolume (via the Coordinator Pod) to your machine:
+
+```bash
+podname=$(kubectl -n marblerun get pods -l app.kubernetes.io/name=coordinator -o jsonpath='{.items[0].metadata.name}')
+kubectl -n marblerun cp $podname:/coordinator/data/sealed_data sealed_data_backup
+```
+
+Make a copy of the `sealed_data` file in the `marblerun-coordinator-data` directory.
+
+## Restoring the Coordinator state
+
+1. Stop all Coordinator instances:
+
+   ```bash
+   kubectl -n marblerun scale --replicas=0 deployment/marblerun-coordinator
+   ```
+
+2. Apply the state from the backup:
+
+   ```bash
+   kubectl apply -f marblerun-state-backup.yaml
+   ```
+
+3. Scale the Coordinator back to the desired number of instances:
+
+   ```bash
+   kubectl -n marblerun scale --replicas=3 deployment/marblerun-coordinator
+   ```
+
+:::tip
+
+If you want to restore MarbleRun in a fresh cluster, you can apply the state from the backup before installing MarbleRun:
+
+```bash
+kubectl create ns marblerun
+kubectl apply -f marblerun-state-backup.yaml
+marblerun install ...
+```
+
+:::
+
+* If you have a backup of the PersistentVolume, stop the Coordinator instance, restore the volume, and start the Coordinator again.
+* If you have a backup of the `sealed_data` file, copy it to the PersistentVolume and then restart the Coordinator:
+
+  ```bash
+  podname=$(kubectl -n marblerun get pods -l app.kubernetes.io/name=coordinator -o jsonpath='{.items[0].metadata.name}')
+  kubectl -n marblerun cp sealed_data_backup $podname:/coordinator/data/sealed_data
+  kubectl -n marblerun delete pod $podname
+  ```
+
+Stop the Coordinator, copy back the `sealed_data` file to the `marblerun-coordinator-data` directory, and start the Coordinator again.
+
+After restoring the state from the backup, you may need to [recover the Coordinator](recover-coordinator.md).
+
+## Things to consider
+
+**Backup events**: In addition to regular backups, you may want to back up the state after significant changes, such as manifest updates.
+
+**Cluster backup**: If you use a Kubernetes cluster backup solution, the MarbleRun state may already be included in that backup. You should check if restoring and recovering the Coordinator works as expected.
+
+**Marbles**: Marbles may have state and that state may depend on the Coordinator state (e.g., secrets, monotonic counters). If so, you may need to back up Marble state and Coordinator state together.
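
The regular and event-driven backups described under "Things to consider" could be scripted roughly as follows for the deployment that keeps its state in the `marblerun-state` Secret. This is a minimal sketch and not part of MarbleRun: the function name `backup_marblerun_state`, the `BACKUP_DIR` variable, and the timestamped file naming are assumptions made for illustration. Only the `kubectl -n marblerun get secret marblerun-state -o yaml` call is taken from the backup step in the page itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MarbleRun): writes a timestamped
# copy of the marblerun-state Secret and prints the path of the new file.
# BACKUP_DIR and the file naming scheme are illustrative assumptions.
backup_marblerun_state() {
  local dir="${BACKUP_DIR:-.}"
  local out="${dir}/marblerun-state-backup-$(date -u +%Y%m%dT%H%M%SZ).yaml"
  mkdir -p "$dir" || return 1
  # Same command as in the documented backup step, redirected to the
  # timestamped file instead of a fixed file name.
  if kubectl -n marblerun get secret marblerun-state -o yaml > "$out"; then
    echo "$out"
  else
    rm -f "$out"  # do not keep an empty or partial backup file
    return 1
  fi
}
```

Calling `backup_marblerun_state` from a scheduled job, or right after a manifest update, would then produce one file per backup event. Restoring works as documented, with `kubectl apply -f` pointed at the chosen file.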
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 206c2b7d..ec164185 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -155,6 +155,11 @@ const sidebars = {
         label: 'Monitoring and logging',
         id: 'workflows/monitoring',
       },
+      {
+        type: 'doc',
+        label: 'Backup and restore',
+        id: 'workflows/backup',
+      },
       {
         type: 'doc',
         label: 'Update a manifest',
diff --git a/docs/versioned_docs/version-1.6/workflows/backup.md b/docs/versioned_docs/version-1.6/workflows/backup.md
new file mode 100644
index 00000000..12b93fae
--- /dev/null
+++ b/docs/versioned_docs/version-1.6/workflows/backup.md
@@ -0,0 +1,108 @@
+# Backing up and restoring MarbleRun state
+
+In a production environment, you should regularly back up the state of the MarbleRun Coordinator to be able to restore it in case of failure. Backup is easy, but there are differences based on how you deployed MarbleRun.
+
+## Prerequisites
+
+Restoring a backup includes [recovering the Coordinator](../features/recovery.md), so you need to [define recovery keys](define-manifest.md#recoverykeys) in the manifest.
+
+## Backing up the Coordinator state
+
+The Coordinator supports live backup, so you can back up its state without stopping it.
+
+Make a copy of the `marblerun-state` Secret in the `marblerun` namespace:
+
+```bash
+kubectl -n marblerun get secret marblerun-state -o yaml > marblerun-state-backup.yaml
+```
+
+The state is stored on a PersistentVolume bound to the PersistentVolumeClaim `coordinator-pv-claim` in the `marblerun` namespace.
+Check the documentation of your Kubernetes distribution on the available options to back up PersistentVolumes.
+
+Alternatively, you can copy the `sealed_data` file from the PersistentVolume (via the Coordinator Pod) to your machine:
+
+```bash
+podname=$(kubectl -n marblerun get pods -l app.kubernetes.io/name=coordinator -o jsonpath='{.items[0].metadata.name}')
+kubectl -n marblerun cp $podname:/coordinator/data/sealed_data sealed_data_backup
+```
+
+Make a copy of the `sealed_data` file in the `marblerun-coordinator-data` directory.
+
+## Restoring the Coordinator state
+
+1. Stop all Coordinator instances:
+
+   ```bash
+   kubectl -n marblerun scale --replicas=0 deployment/marblerun-coordinator
+   ```
+
+2. Apply the state from the backup:
+
+   ```bash
+   kubectl apply -f marblerun-state-backup.yaml
+   ```
+
+3. Scale the Coordinator back to the desired number of instances:
+
+   ```bash
+   kubectl -n marblerun scale --replicas=3 deployment/marblerun-coordinator
+   ```
+
+:::tip
+
+If you want to restore MarbleRun in a fresh cluster, you can apply the state from the backup before installing MarbleRun:
+
+```bash
+kubectl create ns marblerun
+kubectl apply -f marblerun-state-backup.yaml
+marblerun install ...
+```
+
+:::
+
+* If you have a backup of the PersistentVolume, stop the Coordinator instance, restore the volume, and start the Coordinator again.
+* If you have a backup of the `sealed_data` file, copy it to the PersistentVolume and then restart the Coordinator:
+
+  ```bash
+  podname=$(kubectl -n marblerun get pods -l app.kubernetes.io/name=coordinator -o jsonpath='{.items[0].metadata.name}')
+  kubectl -n marblerun cp sealed_data_backup $podname:/coordinator/data/sealed_data
+  kubectl -n marblerun delete pod $podname
+  ```
+
+Stop the Coordinator, copy back the `sealed_data` file to the `marblerun-coordinator-data` directory, and start the Coordinator again.
+
+After restoring the state from the backup, you may need to [recover the Coordinator](recover-coordinator.md).
+
+## Things to consider
+
+**Backup events**: In addition to regular backups, you may want to back up the state after significant changes, such as manifest updates.
+
+**Cluster backup**: If you use a Kubernetes cluster backup solution, the MarbleRun state may already be included in that backup. You should check if restoring and recovering the Coordinator works as expected.
+
+**Marbles**: Marbles may have state and that state may depend on the Coordinator state (e.g., secrets, monotonic counters). If so, you may need to back up Marble state and Coordinator state together.
diff --git a/docs/versioned_sidebars/version-1.6-sidebars.json b/docs/versioned_sidebars/version-1.6-sidebars.json
index 8ca2d215..f5006f35 100644
--- a/docs/versioned_sidebars/version-1.6-sidebars.json
+++ b/docs/versioned_sidebars/version-1.6-sidebars.json
@@ -138,6 +138,11 @@
           "label": "Monitoring and logging",
           "id": "workflows/monitoring"
         },
+        {
+          "type": "doc",
+          "label": "Backup and restore",
+          "id": "workflows/backup"
+        },
         {
           "type": "doc",
           "label": "Update a manifest",