From 05fb1648d5a7532bc5ff9046d54c34ffdd874dd4 Mon Sep 17 00:00:00 2001 From: Matt Linville Date: Tue, 5 Nov 2024 15:47:50 -0800 Subject: [PATCH] Update Kubernetes self-hosted upgrade docs --- .../common/upgrade/finalize-kubernetes.md | 40 +++ .../common/upgrade/finalize-self-hosted.md | 10 +- .../major-version-upgrade-kubernetes.md | 264 ++++++++++++++++++ .../upgrade/patch-rollback-kubernetes.md | 1 + .../upgrade/patch-upgrade-kubernetes.md | 125 +++++++++ .../upgrade/patch-upgrade-self-hosted.md | 10 +- .../upgrade/prepare-to-upgrade-kubernetes.md | 16 ++ .../common/upgrade/rollback-kubernetes.md | 30 ++ .../kubernetes-upgrade-cluster-helm.md | 259 ----------------- .../kubernetes-upgrade-cluster-manual.md | 248 ---------------- .../v24.3/upgrade-cockroachdb-kubernetes.md | 122 ++------ 11 files changed, 510 insertions(+), 615 deletions(-) create mode 100644 src/current/_includes/common/upgrade/finalize-kubernetes.md create mode 100644 src/current/_includes/common/upgrade/major-version-upgrade-kubernetes.md create mode 100644 src/current/_includes/common/upgrade/patch-rollback-kubernetes.md create mode 100644 src/current/_includes/common/upgrade/patch-upgrade-kubernetes.md create mode 100644 src/current/_includes/common/upgrade/prepare-to-upgrade-kubernetes.md create mode 100644 src/current/_includes/common/upgrade/rollback-kubernetes.md delete mode 100644 src/current/_includes/v24.3/orchestration/kubernetes-upgrade-cluster-helm.md delete mode 100644 src/current/_includes/v24.3/orchestration/kubernetes-upgrade-cluster-manual.md diff --git a/src/current/_includes/common/upgrade/finalize-kubernetes.md b/src/current/_includes/common/upgrade/finalize-kubernetes.md new file mode 100644 index 00000000000..77b5bd58429 --- /dev/null +++ b/src/current/_includes/common/upgrade/finalize-kubernetes.md @@ -0,0 +1,40 @@ +{% assign major_version_numeric = page.version.version | remove_first: 'v' %} + +To finalize a major-version upgrade: + +1. 
Connect to the cluster using the SQL shell:
+
+    ~~~ shell
+    $ kubectl exec -it cockroachdb-client-secure \
+    -- ./cockroach sql \
+    --certs-dir=/cockroach-certs \
+    --host=cockroachdb-public
+    ~~~
+
+1. Run the following command:
+
+    {% include_cached copy-clipboard.html %}
+    ~~~ sql
+    > RESET CLUSTER SETTING cluster.preserve_downgrade_option;
+    ~~~
+
+    A series of migration jobs runs to enable certain types of features and changes in the new major version that cannot be rolled back. These include changes to system schemas, indexes, and descriptors, and enabling certain types of improvements and new features. Until the upgrade is finalized, these features and functions will not be available and the command `SHOW CLUSTER SETTING version` will return the previous version.
+
+    You can monitor the process of the migration in the DB Console [Jobs page]({% link {{ page.version.version }}/ui-jobs-page.md %}). Migration jobs have names in the format `{{ major_version_numeric }}-{migration-id}`. If a migration job fails or stalls, Cockroach Labs can use the migration ID to help diagnose and troubleshoot the problem. Each major version has different migration jobs with different IDs.
+
+    The amount of time required for finalization depends on the amount of data in the cluster, because finalization runs various internal maintenance and migration tasks. During this time, the cluster will experience a small amount of additional load.
+
+    {{site.data.alerts.callout_info}}
+    Finalization is not complete until all [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) jobs reach a terminal state. Finalization can take as long as the longest-running schema change.
+    {{site.data.alerts.end}}
+
+    When all migration jobs have completed, the upgrade is complete.
+
+1. 
To confirm that finalization has completed, check the cluster version: + + {% include_cached copy-clipboard.html %} + ~~~ sql + > SHOW CLUSTER SETTING version; + ~~~ + + If the cluster continues to report that it is on the previous version, finalization has not completed. If auto-finalization is enabled but finalization has not completed, check for the existence of [decommissioning nodes]({% link {{ page.version.version }}/node-shutdown.md %}?filters=decommission#status-change) where decommission has stalled. In most cases, issuing the `decommission` command again resolves the issue. If you have trouble upgrading, [contact Support](https://cockroachlabs.com/support/hc/). diff --git a/src/current/_includes/common/upgrade/finalize-self-hosted.md b/src/current/_includes/common/upgrade/finalize-self-hosted.md index edf61601916..324752098b2 100644 --- a/src/current/_includes/common/upgrade/finalize-self-hosted.md +++ b/src/current/_includes/common/upgrade/finalize-self-hosted.md @@ -1,4 +1,6 @@ -To finalize a major-version upgrade, run the following command. Replace `{VERSION}` new major version, such as `{{ page.version.version }}: +{% assign major_version_numeric = page.version.version | remove_first: 'v' %} + +To finalize a major-version upgrade: 1. Connect to the cluster using the SQL shell: @@ -7,14 +9,14 @@ To finalize a major-version upgrade, run the following command. Replace `{VERSIO cockroach sql ~~~ -1. Run the following command. Replace `{VERSION}` with the new major version, such as `{{ page.version.version }}`. +1. Run the following command. Replace `{VERSION}` with the new major version, such as `{{ major_version_numeric }}`. {% include_cached copy-clipboard.html %} - ~~~ shell + ~~~ sql SET CLUSTER SETTING version '{VERSION}'; ~~~ - A series of migration jobs runs to enable certain types of features and changes in the new major version that cannot be rolled back. 
These include changes to system schemas, indexes, and descriptors, and enabling certain types of improvements and new features. Until the upgrade is finalized, these features and functions will not be available and the command `SHOW CLUSTER SETTING version` will return `{{ previous_version_numeric }}`. + A series of migration jobs runs to enable certain types of features and changes in the new major version that cannot be rolled back. These include changes to system schemas, indexes, and descriptors, and enabling certain types of improvements and new features. Until the upgrade is finalized, these features and functions will not be available and the command `SHOW CLUSTER SETTING version` will return the previous version. You can monitor the process of the migration in the DB Console [Jobs page]({% link {{ page.version.version }}/ui-jobs-page.md %}). Migration jobs have names in the format `{{ major_version_numeric }}-{migration-id}`. If a migration job fails or stalls, Cockroach Labs can use the migration ID to help diagnose and troubleshoot the problem. Each major version has different migration jobs with different IDs. diff --git a/src/current/_includes/common/upgrade/major-version-upgrade-kubernetes.md b/src/current/_includes/common/upgrade/major-version-upgrade-kubernetes.md new file mode 100644 index 00000000000..d8cbeb4717e --- /dev/null +++ b/src/current/_includes/common/upgrade/major-version-upgrade-kubernetes.md @@ -0,0 +1,264 @@ +To perform a major upgrade: + +
+
+1. Change the container image in the custom resource:
+
+    ~~~
+    image:
+      name: cockroachdb/cockroach:{{page.release_info.version}}
+    ~~~
+
+1. Apply the new settings to the cluster:
+
+    {% include_cached copy-clipboard.html %}
+    ~~~ shell
+    kubectl apply -f example.yaml
+    ~~~
+
+    The Operator will perform the staged update.
+
+1. To check the status of the rolling upgrade, run `kubectl get pods`.
+1. Verify that all pods have been upgraded:
+
+    {% include_cached copy-clipboard.html %}
+    ~~~ shell
+    $ kubectl get pods \
+    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}'
+    ~~~
+
+    You can also check the CockroachDB version of each node in the [DB Console]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}#node-details).
+
+1. Before beginning a major-version upgrade, the Operator disables auto-finalization by setting the cluster setting `cluster.preserve_downgrade_option` to the cluster's current major version. Before finalizing an upgrade, follow your organization's testing procedures to decide whether to [finalize](#finalize-a-major-version-upgrade) or [roll back](#roll-back-a-major-version-upgrade) the upgrade. After finalization begins, you can no longer roll back to the cluster's previous major version.
+
+
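The "Verify that all pods have been upgraded" step above can be scripted. The following is a minimal sketch, not part of the patch: `check_images` is a hypothetical helper name, the image tag in the usage comment is an illustrative assumption, and the sketch assumes that only CockroachDB pods appear in the listing (a client pod on a different image would be reported as a mismatch).

```shell
#!/bin/sh
# Sketch: read "NAME<TAB>IMAGE" lines on stdin (the shape printed by the
# jsonpath query in the step above) and compare each image to one expected value.
# check_images is a hypothetical helper, not a kubectl feature.
check_images() {
  expected="$1"
  awk -F '\t' -v want="$expected" '
    # Report every pod whose image differs from the expected one.
    NF >= 2 && $2 != want { printf "%s is still on %s\n", $1, $2; bad = 1 }
    # Exit non-zero if at least one mismatch was seen.
    END { exit bad }
  '
}

# Example usage against a live cluster (commented out; the tag is an assumption):
# kubectl get pods \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}' \
#   | check_images "cockroachdb/cockroach:v24.3.0"
```

The helper exits with a non-zero status when any pod reports a different image, so it can gate a later step in a script.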
+ +
+ +1. +1. Add a [partition](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#staging-an-update) to the update strategy defined in the StatefulSet. Only the pods numbered greater than or equal to the partition value will be updated. For a cluster with 3 pods (e.g., `cockroachdb-0`, `cockroachdb-1`, `cockroachdb-2`) the partition value should be 2: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl patch statefulset cockroachdb \ + -p='{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}' + ~~~ + + ~~~ + statefulset.apps/cockroachdb patched + ~~~ + +1. Change the container image in the StatefulSet: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl patch statefulset cockroachdb \ + --type='json' \ + -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"cockroachdb/cockroach:{{page.release_info.version}}"}]' + ~~~ + + ~~~ + statefulset.apps/cockroachdb patched + ~~~ + +1. To check the status of the rolling upgrade, run `kubectl get pods`. +1. After the pod has been restarted with the new image, start the CockroachDB [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}): + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl exec -it cockroachdb-client-secure \-- ./cockroach sql \ + --certs-dir=/cockroach-certs \ + --host=cockroachdb-public + ~~~ + +1. Run the following SQL query to verify that the number of underreplicated ranges is zero: + + {% include_cached copy-clipboard.html %} + ~~~ sql + SELECT sum((metrics->>'ranges.underreplicated')::DECIMAL)::INT AS ranges_underreplicated FROM crdb_internal.kv_store_status; + ~~~ + + ~~~ + ranges_underreplicated + -------------------------- + 0 + (1 row) + ~~~ + + This indicates that it is safe to proceed to the next pod. + +1. Exit the SQL shell: + + {% include_cached copy-clipboard.html %} + ~~~ sql + > \q + ~~~ + +1. 
Decrement the partition value by 1 to allow the next pod in the cluster to update: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl patch statefulset cockroachdb \ + -p='{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":1}}}}' + ~~~ + + ~~~ + statefulset.apps/cockroachdb patched + ~~~ + +1. Repeat steps 4-8 until all pods have been restarted and are running the new image (the final partition value should be `0`). + +1. Check the image of each pod to confirm that all have been upgraded: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl get pods \ + -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}' + ~~~ + + ~~~ + cockroachdb-0 cockroachdb/cockroach:{{page.release_info.version}} + cockroachdb-1 cockroachdb/cockroach:{{page.release_info.version}} + cockroachdb-2 cockroachdb/cockroach:{{page.release_info.version}} + ... + ~~~ + + You can also check the CockroachDB version of each node in the [DB Console]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}#node-details). +1. If auto-finalization is disabled, the upgrade is not complete until you [finalize the upgrade](#finalize-a-major-version-upgrade). + +
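The partition-by-partition rollout in the manual steps above follows a simple pattern: start at the highest pod ordinal and decrement to `0`. The sketch below only prints the `kubectl patch` commands rather than running them. `partition_patch` and `print_rollout_plan` are hypothetical helper names, and the StatefulSet name `cockroachdb` is taken from the steps above. Between each printed command, the under-replicated ranges check should still be performed by hand.

```shell
#!/bin/sh
# partition_patch is a hypothetical helper: it prints the patch body used in
# the steps above for a given partition value.
partition_patch() {
  printf '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":%d}}}}\n' "$1"
}

# Print (do not run) one kubectl patch command per pod, highest ordinal first.
# Assumes a StatefulSet named "cockroachdb", as in the steps above.
print_rollout_plan() {
  replicas="$1"
  p=$((replicas - 1))
  while [ "$p" -ge 0 ]; do
    printf "kubectl patch statefulset cockroachdb -p='%s'\n" "$(partition_patch "$p")"
    p=$((p - 1))
  done
}

# For a 3-pod cluster this prints the commands for partitions 2, 1, and 0.
print_rollout_plan 3
```

Printing the plan first makes it easy to review the sequence before touching the cluster.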
+ +
+ +1. Add a [partition](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#staging-an-update) to the update strategy defined in the StatefulSet. Only the pods numbered greater than or equal to the partition value will be updated. For a cluster with 3 pods (e.g., `cockroachdb-0`, `cockroachdb-1`, `cockroachdb-2`) the partition value should be 2: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ helm upgrade \ + my-release \ + cockroachdb/cockroachdb \ + --set statefulset.updateStrategy.rollingUpdate.partition=2 + ~~~ + +1. Connect to the cluster using the SQL shell: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl exec -it cockroachdb-client-secure \ + -- ./cockroach sql \ + --certs-dir=/cockroach-certs \ + --host=my-release-cockroachdb-public + ~~~ + +1. Remove the cluster initialization job from when the cluster was created: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl delete job my-release-cockroachdb-init + ~~~ + +1. Change the container image in the StatefulSet: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ helm upgrade \ + my-release \ + cockroachdb/cockroachdb \ + --set image.tag={{page.release_info.version}} \ + --reuse-values + ~~~ + + ~~~ + NAME READY STATUS RESTARTS AGE + my-release-cockroachdb-0 1/1 Running 0 2m + my-release-cockroachdb-1 1/1 Running 0 3m + my-release-cockroachdb-2 0/1 ContainerCreating 0 25s + my-release-cockroachdb-init-nwjkh 0/1 ContainerCreating 0 6s + ... + ~~~ + + {{site.data.alerts.callout_info}} + Ignore the pod for cluster initialization. It is re-created as a byproduct of the StatefulSet configuration but does not impact your existing cluster. + {{site.data.alerts.end}} + +1. 
After the pod has been restarted with the new image, start the CockroachDB [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}):
+
+    {% if page.secure == true %}
+
+    {% include_cached copy-clipboard.html %}
+    ~~~ shell
+    $ kubectl exec -it cockroachdb-client-secure \
+    -- ./cockroach sql \
+    --certs-dir=/cockroach-certs \
+    --host=my-release-cockroachdb-public
+    ~~~
+
+    {% else %}
+
+    {% include_cached copy-clipboard.html %}
+    ~~~ shell
+    $ kubectl run cockroachdb -it \
+    --image=cockroachdb/cockroach \
+    --rm \
+    --restart=Never \
+    -- sql \
+    --insecure \
+    --host=my-release-cockroachdb-public
+    ~~~
+    {% endif %}
+
+1. Run the following SQL query to verify that the number of underreplicated ranges is zero:
+
+    {% include_cached copy-clipboard.html %}
+    ~~~ sql
+    SELECT sum((metrics->>'ranges.underreplicated')::DECIMAL)::INT AS ranges_underreplicated FROM crdb_internal.kv_store_status;
+    ~~~
+
+    ~~~
+    ranges_underreplicated
+    --------------------------
+    0
+    (1 row)
+    ~~~
+
+    This indicates that it is safe to proceed to the next pod.
+
+1. Exit the SQL shell:
+
+    {% include_cached copy-clipboard.html %}
+    ~~~ sql
+    > \q
+    ~~~
+
+1. Decrement the partition value by 1 to allow the next pod in the cluster to update:
+
+    {% include_cached copy-clipboard.html %}
+    ~~~ shell
+    $ helm upgrade \
+    my-release \
+    cockroachdb/cockroachdb \
+    --set statefulset.updateStrategy.rollingUpdate.partition=1
+    ~~~
+
+1. Repeat steps 4-8 until all pods have been restarted and are running the new image (the final partition value should be `0`).
+
+1. 
Check the image of each pod to confirm that all have been upgraded: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl get pods \ + -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}' + ~~~ + + ~~~ + my-release-cockroachdb-0 cockroachdb/cockroach:{{page.release_info.version}} + my-release-cockroachdb-1 cockroachdb/cockroach:{{page.release_info.version}} + my-release-cockroachdb-2 cockroachdb/cockroach:{{page.release_info.version}} + ... + ~~~ + + You can also check the CockroachDB version of each node in the [DB Console]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}#node-details). + +1. If auto-finalization is disabled, the upgrade is not complete until you [finalize the upgrade](#finalize-a-major-version-upgrade). + +
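The under-replicated ranges check in the steps above can also be run non-interactively between pods. The following is a rough sketch under stated assumptions: `ranges_ok` is a hypothetical helper name, and it expects CSV output (a header row followed by one value), such as `cockroach sql --format=csv` produces for a single-column query. The pod name, certificate directory, and host in the usage comment are the ones used in the secure Helm steps above.

```shell
#!/bin/sh
# ranges_ok is a hypothetical helper. It reads CSV on stdin (header row, then
# one value) and succeeds only when that value is exactly 0.
ranges_ok() {
  # Take the second line (the value) and strip surrounding whitespace.
  count=$(sed -n '2p' | tr -d '[:space:]')
  [ "$count" = "0" ]
}

# Example usage (commented out; assumes the secure client pod from the steps above):
# kubectl exec cockroachdb-client-secure -- ./cockroach sql \
#   --certs-dir=/cockroach-certs \
#   --host=my-release-cockroachdb-public \
#   --format=csv \
#   -e "SELECT sum((metrics->>'ranges.underreplicated')::DECIMAL)::INT AS ranges_underreplicated FROM crdb_internal.kv_store_status;" \
#   | ranges_ok && echo "safe to proceed to the next pod"
```

Any value other than `0`, including an empty result, is treated as "not safe", so the check fails closed.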
diff --git a/src/current/_includes/common/upgrade/patch-rollback-kubernetes.md b/src/current/_includes/common/upgrade/patch-rollback-kubernetes.md
new file mode 100644
index 00000000000..c0d0644e57d
--- /dev/null
+++ b/src/current/_includes/common/upgrade/patch-rollback-kubernetes.md
@@ -0,0 +1 @@
+To roll back a patch upgrade, repeat the steps in [Perform a patch upgrade](#perform-a-patch-upgrade), but set the container image for the pods to the previous patch version.
diff --git a/src/current/_includes/common/upgrade/patch-upgrade-kubernetes.md b/src/current/_includes/common/upgrade/patch-upgrade-kubernetes.md
new file mode 100644
index 00000000000..950fe5cd5ae
--- /dev/null
+++ b/src/current/_includes/common/upgrade/patch-upgrade-kubernetes.md
@@ -0,0 +1,125 @@
+
+ +To upgrade from one patch release to another within the same major version, perform the following steps on one node at a time: + +1. Change the container image in the custom resource: + + ~~~ + image: + name: cockroachdb/cockroach:{{page.release_info.version}} + ~~~ + +1. Apply the new settings to the cluster: + + {% include_cached copy-clipboard.html %} + ~~~ shell + kubectl apply -f example.yaml + ~~~ + + The Operator will perform the staged update. + +1. To check the status of the rolling upgrade, run `kubectl get pods`. +1. Verify that all pods have been upgraded: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl get pods \ + -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}' + ~~~ + + You can also check the CockroachDB version of each node in the [DB Console]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}#node-details). + +
+ +
+ +1. Add a [partition](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#staging-an-update) to the update strategy defined in the StatefulSet. Only the pods numbered greater than or equal to the partition value will be updated. For a cluster with 3 pods (e.g., `cockroachdb-0`, `cockroachdb-1`, `cockroachdb-2`) the partition value should be 2: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl patch statefulset cockroachdb \ + -p='{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}' + ~~~ + + ~~~ + statefulset.apps/cockroachdb patched + ~~~ + +1. Change the container image in the StatefulSet: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl patch statefulset cockroachdb \ + --type='json' \ + -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"cockroachdb/cockroach:{{page.release_info.version}}"}]' + ~~~ + + ~~~ + statefulset.apps/cockroachdb patched + ~~~ + +1. To check the status of the rolling upgrade, run `kubectl get pods`. +1. Verify that all pods have been upgraded: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl get pods \ + -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}' + ~~~ + + You can also check the CockroachDB version of each node in the [DB Console]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}#node-details). + +
+ +
+ +1. Add a [partition](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#staging-an-update) to the update strategy defined in the StatefulSet. Only the pods numbered greater than or equal to the partition value will be updated. For a cluster with 3 pods (e.g., `cockroachdb-0`, `cockroachdb-1`, `cockroachdb-2`) the partition value should be 2: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ helm upgrade \ + my-release \ + cockroachdb/cockroachdb \ + --set statefulset.updateStrategy.rollingUpdate.partition=2 + ~~~ + +1. Connect to the cluster using the SQL shell: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl exec -it cockroachdb-client-secure \ + -- ./cockroach sql \ + --certs-dir=/cockroach-certs \ + --host=my-release-cockroachdb-public + ~~~ + +1. Remove the cluster initialization job from when the cluster was created: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl delete job my-release-cockroachdb-init + ~~~ + +1. Change the container image in the StatefulSet: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ helm upgrade \ + my-release \ + cockroachdb/cockroachdb \ + --set image.tag={{page.release_info.version}} \ + --reuse-values + ~~~ + +1. To check the status of the rolling upgrade, run `kubectl get pods`. +1. Verify that all pods have been upgraded: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl get pods \ + -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}' + ~~~ + + You can also check the CockroachDB version of each node in the [DB Console]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}#node-details). + +
diff --git a/src/current/_includes/common/upgrade/patch-upgrade-self-hosted.md b/src/current/_includes/common/upgrade/patch-upgrade-self-hosted.md index 5845e2f1b06..135554599b2 100644 --- a/src/current/_includes/common/upgrade/patch-upgrade-self-hosted.md +++ b/src/current/_includes/common/upgrade/patch-upgrade-self-hosted.md @@ -5,11 +5,11 @@ To upgrade from one patch release to another within the same major version, perf 1. Verify that the node has rejoined the cluster. 1. Ensure that the node is ready to accept a SQL connection. - Unless there are tens of thousands of ranges on the node, it's usually sufficient to wait one minute. To be certain that the node is ready, run the following command: + Unless there are tens of thousands of ranges on the node, it's usually sufficient to wait one minute. To be certain that the node is ready, run the following command: - {% include_cached copy-clipboard.html %} - ~~~ shell - cockroach sql -e 'select 1' - ~~~ + {% include_cached copy-clipboard.html %} + ~~~ shell + cockroach sql -e 'select 1' + ~~~ When all nodes are running the new patch version, the upgrade is complete. diff --git a/src/current/_includes/common/upgrade/prepare-to-upgrade-kubernetes.md b/src/current/_includes/common/upgrade/prepare-to-upgrade-kubernetes.md new file mode 100644 index 00000000000..5875f314b32 --- /dev/null +++ b/src/current/_includes/common/upgrade/prepare-to-upgrade-kubernetes.md @@ -0,0 +1,16 @@ +Before beginning a major-version or patch upgrade: + +1. Verify the overall health of your cluster using the [DB Console]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}): + - Under **Node Status**, make sure all nodes that should be live are listed as such. If any nodes are unexpectedly listed as `SUSPECT` or `DEAD`, identify why the nodes are offline and either restart them or [decommission]({% link {{ page.version.version }}/node-shutdown.md %}?filters=decommission#remove-nodes) them before beginning your upgrade. 
If there are `DEAD` and non-decommissioned nodes in your cluster, the upgrade cannot be finalized. + + If any node is not fully decommissioned, try the following: + 1. First, reissue the [decommission command]({% link {{ page.version.version }}/node-shutdown.md %}?filters=decommission#decommission-the-node). The second command typically succeeds within a few minutes. + 1. If the second decommission command does not succeed, [recommission]({% link {{ page.version.version }}/node-shutdown.md %}?filters=decommission#recommission-nodes) and then decommission it again. Before continuing the upgrade, the node must be marked as `decommissioned`. + - Under **Replication Status**, make sure there are `0` under-replicated and unavailable ranges. Otherwise, performing a rolling upgrade increases the risk that ranges will lose a majority of their replicas and cause cluster unavailability. Therefore, it's important to identify and resolve the cause of range under-replication and/or unavailability before beginning your upgrade. + - In the **Node List**, make sure all nodes are on the same version. Upgrade them to the cluster's current version before continuing. If any nodes are behind, this also indicates that the previous major-version upgrade may not be finalized. + - In the **Metrics** dashboards, make sure [CPU]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#cpu-usage), [memory]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#database-memory-usage), and [storage]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#storage-capacity) capacity are within acceptable values for each node. Nodes must be able to tolerate some increase in case the new version uses more resources for your workload. If any of these metrics is above healthy limits, consider [adding nodes]({% link {{ page.version.version }}/cockroach-start.md %}) to your cluster before beginning your upgrade. +1. 
Make sure your cluster is behind a [load balancer]({% link {{ page.version.version }}/recommended-production-settings.md %}#load-balancing), or your clients are configured to talk to multiple nodes. If your application communicates with only a single node, stopping that node to upgrade its CockroachDB binary will cause your application to fail. +1. {% include {{page.version.version}}/storage/compaction-concurrency.md %} +1. {% include {{page.version.version}}/backups/recommend-backups-for-upgrade.md %} Refer to [Restoring backups across versions]({% link {{ page.version.version }}/restoring-backups-across-versions.md %}#support-for-restoring-backups-into-a-newer-version). +1. Review the [{{ page.version.version }} Release Notes]({% link releases/{{ page.version.version }}.md %}), as well as the release notes for any skipped major version. Pay careful attention to the sections for backward-incompatible changes, deprecations, changes to default cluster settings, and features that are not available until the upgrade is finalized. +1. Optionally [disable auto-finalization](#disable-auto-finalization) to preserve the ability to roll back a major-version upgrade instead of finalizing it. If auto-finalization is disabled, a major-version upgrade is not complete until it is finalized. diff --git a/src/current/_includes/common/upgrade/rollback-kubernetes.md b/src/current/_includes/common/upgrade/rollback-kubernetes.md new file mode 100644 index 00000000000..83efa02140c --- /dev/null +++ b/src/current/_includes/common/upgrade/rollback-kubernetes.md @@ -0,0 +1,30 @@ +To roll back to the previous major version before an upgrade is finalized: + +1. Change the container image in the custom resource to use the previous major version: + + ~~~ + image: + name: cockroachdb/cockroach:{{ page.version.version }} + ~~~ + +1. 
Apply the new settings to the cluster: + + {% include_cached copy-clipboard.html %} + ~~~ shell + kubectl apply -f example.yaml + ~~~ + + The Operator will perform the staged rollback. + +1. To check the status of the rollback, run `kubectl get pods`. +1. Verify that all pods have been rolled back: + + {% include_cached copy-clipboard.html %} + ~~~ shell + $ kubectl get pods \ + -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}' + ~~~ + + You can also check the CockroachDB version of each node in the [DB Console]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}#node-details). + +1. Like a major-version upgrade, a rollback must be [finalized](#finalize-a-major-version-upgrade). When the rollback is finalized, the rollback is complete. diff --git a/src/current/_includes/v24.3/orchestration/kubernetes-upgrade-cluster-helm.md b/src/current/_includes/v24.3/orchestration/kubernetes-upgrade-cluster-helm.md deleted file mode 100644 index 4506ae28c74..00000000000 --- a/src/current/_includes/v24.3/orchestration/kubernetes-upgrade-cluster-helm.md +++ /dev/null @@ -1,259 +0,0 @@ -{% assign previous_version = site.data.versions | where_exp: "previous_version", "previous_version.major_version == page.version.version" | first | map: "previous_version" %} - -1. Verify that you can upgrade. - - To upgrade to a new major version, you must first be on a production release of the previous version. The release does not need to be the latest production release of the previous version, but it must be a production [release]({% link releases/index.md %}) and not a testing release (alpha/beta). - - Therefore, in order to upgrade to {{ page.version.version }}, you must be on a production release of {{ previous_version }}. - - 1. 
If you are upgrading to {{ page.version.version }} from a production release earlier than {{ previous_version }}, or from a testing release (alpha/beta), first [upgrade to a production release of {{ previous_version }}]({% link {{ previous_version }}/upgrade-cockroachdb-kubernetes.md %}?filters=helm). Be sure to complete all the steps. - - 1. Then return to this page and perform a second upgrade to {{ page.version.version }}. - - 1. If you are upgrading from any production release of {{ previous_version }}, or from any earlier {{ page.version.version }} patch release, you do not have to go through intermediate releases; continue to step 2. - -1. Verify the overall health of your cluster using the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}). On the **Overview**: - - Under **Node Status**, make sure all nodes that should be live are listed as such. If any nodes are unexpectedly listed as suspect or dead, identify why the nodes are offline and either restart them or [decommission](scale-cockroachdb-kubernetes.html?filters=helm#remove-nodes) them before beginning your upgrade. If there are dead and non-decommissioned nodes in your cluster, it will not be possible to finalize the upgrade (either automatically or manually). - - Under **Replication Status**, make sure there are 0 under-replicated and unavailable ranges. Otherwise, performing a rolling upgrade increases the risk that ranges will lose a majority of their replicas and cause cluster unavailability. Therefore, it's important to [identify and resolve the cause of range under-replication and/or unavailability]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#replication-issues) before beginning your upgrade. - - In the **Node List**: - - Make sure all nodes are on the same version. If not all nodes are on the same version, upgrade them to the cluster's highest current version first, and then start this process over. 
- - Make sure capacity and memory usage are reasonable for each node. Nodes must be able to tolerate some increase in case the new version uses more resources for your workload. Also go to **Metrics > Dashboard: Hardware** and make sure CPU percent is reasonable across the cluster. If there's not enough headroom on any of these metrics, consider [adding nodes](scale-cockroachdb-kubernetes.html?filters=helm#add-nodes) to your cluster before beginning your upgrade. - -{% assign rd = site.data.versions | where_exp: "rd", "rd.major_version == page.version.version" | first %} - -1. Review the [backward-incompatible changes in {{ page.version.version }}]({% link releases/{{ page.version.version }}.md %}{% unless rd.release_date == "N/A" or rd.release_date > today %}#{{ page.version.version | replace: ".", "-" }}-0-backward-incompatible-changes{% endunless %}) and [deprecated features]({% link releases/{{ page.version.version }}.md %}#{% unless rd.release_date == "N/A" or rd.release_date > today %}{{ page.version.version | replace: ".", "-" }}-0-deprecations{% endunless %}). If any affect your deployment, make the necessary changes before starting the rolling upgrade to {{ page.version.version }}. - -1. Decide how the upgrade will be finalized. - - By default, after all nodes are running the new version, the upgrade process will be **auto-finalized**. This will enable certain [features and performance improvements introduced in {{ page.version.version }}. After finalization, however, it will no longer be possible to perform a downgrade to {{ previous_version }}. In the event of a catastrophic failure or corruption, the only option is to start a new cluster using the old binary and then restore from a [backup]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}) created prior to the upgrade. 
For this reason, **we recommend disabling auto-finalization** so you can monitor the stability and performance of the upgraded cluster before finalizing the upgrade, but note that you will need to follow all of the subsequent directions, including the manual finalization in a later step. - - {{site.data.alerts.callout_info}} - Finalization only applies when performing a major version upgrade (for example, from {{ previous_version }}.x to {{ page.version.version }}). Patch version upgrades (for example, within the {{ page.version.version }}.x series) can always be downgraded. - {{site.data.alerts.end}} - - {% if page.secure == true %} - - 1. Get a shell into the pod with the `cockroach` binary created earlier and start the CockroachDB [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}): - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl exec -it cockroachdb-client-secure \ - -- ./cockroach sql \ - --certs-dir=/cockroach-certs \ - --host=my-release-cockroachdb-public - ~~~ - - {% else %} - - 1. Launch a temporary interactive pod and start the [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}) inside it: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl run cockroachdb -it \ - --image=cockroachdb/cockroach \ - --rm \ - --restart=Never \ - -- sql \ - --insecure \ - --host=my-release-cockroachdb-public - ~~~ - - {% endif %} - - 1. Set the `cluster.preserve_downgrade_option` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) to the version you are upgrading from: - - {% include_cached copy-clipboard.html %} - ~~~ sql - > SET CLUSTER SETTING cluster.preserve_downgrade_option = '{{ previous_version | remove_first: "v" }}'; - ~~~ - - 1. Exit the SQL shell and delete the temporary pod: - - {% include_cached copy-clipboard.html %} - ~~~ sql - > \q - ~~~ - -1. 
Add a [partition](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#staging-an-update) to the update strategy defined in the StatefulSet. Only the pods numbered greater than or equal to the partition value will be updated. For a cluster with 3 pods (e.g., `cockroachdb-0`, `cockroachdb-1`, `cockroachdb-2`) the partition value should be 2: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ helm upgrade \ - my-release \ - cockroachdb/cockroachdb \ - --set statefulset.updateStrategy.rollingUpdate.partition=2 - ~~~ - -1. Kick off the upgrade process by changing the Docker image used in the CockroachDB StatefulSet: - - {{site.data.alerts.callout_info}} - For Helm, you must remove the cluster initialization job from when the cluster was created before the cluster version can be changed. - {{site.data.alerts.end}} - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl delete job my-release-cockroachdb-init - ~~~ - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ helm upgrade \ - my-release \ - cockroachdb/cockroachdb \ - --set image.tag={{page.release_info.version}} \ - --reuse-values - ~~~ - -1. Check the status of your cluster's pods. You should see one of them being restarted: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl get pods - ~~~ - - ~~~ - NAME READY STATUS RESTARTS AGE - my-release-cockroachdb-0 1/1 Running 0 2m - my-release-cockroachdb-1 1/1 Running 0 3m - my-release-cockroachdb-2 0/1 ContainerCreating 0 25s - my-release-cockroachdb-init-nwjkh 0/1 ContainerCreating 0 6s - ... - ~~~ - - {{site.data.alerts.callout_info}} - Ignore the pod for cluster initialization. It is re-created as a byproduct of the StatefulSet configuration but does not impact your existing cluster. - {{site.data.alerts.end}} - -1. 
After the pod has been restarted with the new image, start the CockroachDB [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}): - - {% if page.secure == true %} - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl exec -it cockroachdb-client-secure \ - -- ./cockroach sql \ - --certs-dir=/cockroach-certs \ - --host=my-release-cockroachdb-public - ~~~ - - {% else %} - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl run cockroachdb -it \ - --image=cockroachdb/cockroach \ - --rm \ - --restart=Never \ - -- sql \ - --insecure \ - --host=my-release-cockroachdb-public - ~~~ - {% endif %} - -1. Run the following SQL query to verify that the number of underreplicated ranges is zero: - - {% include_cached copy-clipboard.html %} - ~~~ sql - SELECT sum((metrics->>'ranges.underreplicated')::DECIMAL)::INT AS ranges_underreplicated FROM crdb_internal.kv_store_status; - ~~~ - - ~~~ - ranges_underreplicated - -------------------------- - 0 - (1 row) - ~~~ - - This indicates that it is safe to proceed to the next pod. - -1. Exit the SQL shell: - - {% include_cached copy-clipboard.html %} - ~~~ sql - > \q - ~~~ - -1. Decrement the partition value by 1 to allow the next pod in the cluster to update: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ helm upgrade \ - my-release \ - cockroachdb/cockroachdb \ - --set statefulset.updateStrategy.rollingUpdate.partition=1 \ - ~~~ - -1. Repeat steps 4-8 until all pods have been restarted and are running the new image (the final partition value should be `0`). - -1. 
Check the image of each pod to confirm that all have been upgraded: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl get pods \ - -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}' - ~~~ - - ~~~ - my-release-cockroachdb-0 cockroachdb/cockroach:{{page.release_info.version}} - my-release-cockroachdb-1 cockroachdb/cockroach:{{page.release_info.version}} - my-release-cockroachdb-2 cockroachdb/cockroach:{{page.release_info.version}} - ... - ~~~ - - You can also check the CockroachDB version of each node in the [DB Console]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}#node-details). - - -1. If you disabled auto-finalization earlier, monitor the stability and performance of your cluster until you are comfortable with the upgrade (generally at least a day). - - If you decide to roll back the upgrade, repeat the rolling restart procedure with the old binary. - - {{site.data.alerts.callout_info}} - This is only possible when performing a major version upgrade (for example, from {{ previous_version }}.x to {{ page.version.version }}). Patch version upgrades (for example, within the {{ page.version.version }}.x series) are auto-finalized. - {{site.data.alerts.end}} - - To finalize the upgrade, re-enable auto-finalization: - - {% if page.secure == true %} - - 1. Get a shell into the pod with the `cockroach` binary created earlier and start the CockroachDB [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}): - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl exec -it cockroachdb-client-secure \ - -- ./cockroach sql \ - --certs-dir=/cockroach-certs \ - --host=my-release-cockroachdb-public - ~~~ - - {% else %} - - 1. 
Launch a temporary interactive pod and start the [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}) inside it: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl run cockroachdb -it \ - --image=cockroachdb/cockroach \ - --rm \ - --restart=Never \ - -- sql \ - --insecure \ - --host=my-release-cockroachdb-public - ~~~ - - {% endif %} - - 1. Re-enable auto-finalization: - - {% include_cached copy-clipboard.html %} - ~~~ sql - > RESET CLUSTER SETTING cluster.preserve_downgrade_option; - ~~~ - - After the upgrade to {{ page.version.version }} is finalized, you may notice an increase in compaction activity due to a background migration within the storage engine. To observe the migration's progress, check the **Compactions** section of the [Storage Dashboard]({% link {{ page.version.version }}/ui-storage-dashboard.md %}) in the DB Console or monitor the `storage.marked-for-compaction-files` [time-series metric]({% link {{ page.version.version }}/metrics.md %}). When the metric's value nears or reaches `0`, the migration is complete and compaction activity will return to normal levels. - - 1. Exit the SQL shell and delete the temporary pod: - - {% include_cached copy-clipboard.html %} - ~~~ sql - > \q - ~~~ diff --git a/src/current/_includes/v24.3/orchestration/kubernetes-upgrade-cluster-manual.md b/src/current/_includes/v24.3/orchestration/kubernetes-upgrade-cluster-manual.md deleted file mode 100644 index 078858ee083..00000000000 --- a/src/current/_includes/v24.3/orchestration/kubernetes-upgrade-cluster-manual.md +++ /dev/null @@ -1,248 +0,0 @@ -{% assign previous_version = site.data.versions | where_exp: "previous_version", "previous_version.major_version == page.version.version" | first | map: "previous_version" %} - -1. Verify that you can upgrade. - - To upgrade to a new major version, you must first be on a production release of the previous version. 
The release does not need to be the latest production release of the previous version, but it must be a production [release]({% link releases/index.md %}) and not a testing release (alpha/beta). - - Therefore, in order to upgrade to {{ page.version.version }}, you must be on a production release of {{ previous_version }}. - - 1. If you are upgrading to {{ page.version.version }} from a production release earlier than {{ previous_version }}, or from a testing release (alpha/beta), first [upgrade to a production release of {{ previous_version }}]({% link {{ previous_version }}/upgrade-cockroachdb-kubernetes.md %}?filters=manual). Be sure to complete all the steps. - - 1. Then return to this page and perform a second upgrade to {{ page.version.version }}. - - 1. If you are upgrading from any production release of {{ previous_version }}, or from any earlier {{ page.version.version }} patch release, you do not have to go through intermediate releases; continue to step 2. - -1. Verify the overall health of your cluster using the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}). On the **Overview**: - - Under **Node Status**, make sure all nodes that should be live are listed as such. If any nodes are unexpectedly listed as suspect or dead, identify why the nodes are offline and either restart them or [decommission](scale-cockroachdb-kubernetes.html?filters=manual#remove-nodes) them before beginning your upgrade. If there are dead and non-decommissioned nodes in your cluster, it will not be possible to finalize the upgrade (either automatically or manually). - - Under **Replication Status**, make sure there are 0 under-replicated and unavailable ranges. Otherwise, performing a rolling upgrade increases the risk that ranges will lose a majority of their replicas and cause cluster unavailability. 
Therefore, it's important to [identify and resolve the cause of range under-replication and/or unavailability]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#replication-issues) before beginning your upgrade. - - In the **Node List**: - - Make sure all nodes are on the same version. If not all nodes are on the same version, upgrade them to the cluster's highest current version first, and then start this process over. - - Make sure capacity and memory usage are reasonable for each node. Nodes must be able to tolerate some increase in case the new version uses more resources for your workload. Also go to **Metrics > Dashboard: Hardware** and make sure CPU percent is reasonable across the cluster. If there's not enough headroom on any of these metrics, consider [adding nodes](scale-cockroachdb-kubernetes.html?filters=manual#add-nodes) to your cluster before beginning your upgrade. - -{% assign rd = site.data.versions | where_exp: "rd", "rd.major_version == page.version.version" | first %} - -1. Review the [backward-incompatible changes in {{ page.version.version }}]({% link releases/{{ page.version.version }}.md %}{% unless rd.release_date == "N/A" or rd.release_date > today %}#{{ page.version.version | replace: ".", "-" }}-0-backward-incompatible-changes{% endunless %}) and [deprecated features]({% link releases/{{ page.version.version }}.md %}#{% unless rd.release_date == "N/A" or rd.release_date > today %}{{ page.version.version | replace: ".", "-" }}-0-deprecations{% endunless %}). If any affect your deployment, make the necessary changes before starting the rolling upgrade to {{ page.version.version }}. - -1. Decide how the upgrade will be finalized. - - By default, after all nodes are running the new version, the upgrade process will be **auto-finalized**. This will enable certain [features and performance improvements introduced in {{ page.version.version }}. 
After finalization, however, it will no longer be possible to perform a downgrade to {{ previous_version }}. In the event of a catastrophic failure or corruption, the only option is to start a new cluster using the old binary and then restore from a [backup]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}) created prior to the upgrade. For this reason, **we recommend disabling auto-finalization** so you can monitor the stability and performance of the upgraded cluster before finalizing the upgrade, but note that you will need to follow all of the subsequent directions, including the manual finalization in a later step. - - {{site.data.alerts.callout_info}} - Finalization only applies when performing a major version upgrade (for example, from {{ previous_version }}.x to {{ page.version.version }}). Patch version upgrades (for example, within the {{ page.version.version }}.x series) can always be downgraded. - {{site.data.alerts.end}} - - {% if page.secure == true %} - - 1. Start the CockroachDB [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}). For example, if you followed the steps in [Deploy CockroachDB with Kubernetes](deploy-cockroachdb-with-kubernetes.html?filters=manual#step-3-use-the-built-in-sql-client) to launch a secure client pod, get a shell into the `cockroachdb-client-secure` pod: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl exec -it cockroachdb-client-secure \-- ./cockroach sql \ - --certs-dir=/cockroach-certs \ - --host=cockroachdb-public - ~~~ - - {% else %} - - 1. Launch a temporary interactive pod and start the [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}) inside it: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl run cockroachdb -it \ - --image=cockroachdb/cockroach \ - --rm \ - --restart=Never \ - -- sql \ - --insecure \ - --host=cockroachdb-public - ~~~ - - {% endif %} - - 1. 
Set the `cluster.preserve_downgrade_option` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) to the version you are upgrading from: - - {% include_cached copy-clipboard.html %} - ~~~ sql - > SET CLUSTER SETTING cluster.preserve_downgrade_option = '{{ previous_version | remove_first: "v" }}'; - ~~~ - - 1. Exit the SQL shell and delete the temporary pod: - - {% include_cached copy-clipboard.html %} - ~~~ sql - > \q - ~~~ - -1. Add a [partition](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#staging-an-update) to the update strategy defined in the StatefulSet. Only the pods numbered greater than or equal to the partition value will be updated. For a cluster with 3 pods (e.g., `cockroachdb-0`, `cockroachdb-1`, `cockroachdb-2`) the partition value should be 2: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl patch statefulset cockroachdb \ - -p='{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}' - ~~~ - - ~~~ - statefulset.apps/cockroachdb patched - ~~~ - -1. Kick off the upgrade process by changing the Docker image used in the CockroachDB StatefulSet: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl patch statefulset cockroachdb \ - --type='json' \ - -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"cockroachdb/cockroach:{{page.release_info.version}}"}]' - ~~~ - - ~~~ - statefulset.apps/cockroachdb patched - ~~~ - -1. Check the status of your cluster's pods. You should see one of them being restarted: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl get pods - ~~~ - - ~~~ - NAME READY STATUS RESTARTS AGE - cockroachdb-0 1/1 Running 0 2m - cockroachdb-1 1/1 Running 0 2m - cockroachdb-2 0/1 Terminating 0 1m - ... - ~~~ - -1. 
After the pod has been restarted with the new image, start the CockroachDB [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}): - - {% if page.secure == true %} - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl exec -it cockroachdb-client-secure \-- ./cockroach sql \ - --certs-dir=/cockroach-certs \ - --host=cockroachdb-public - ~~~ - - {% else %} - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl run cockroachdb -it \ - --image=cockroachdb/cockroach \ - --rm \ - --restart=Never \ - -- sql \ - --insecure \ - --host=cockroachdb-public - ~~~ - - {% endif %} - -1. Run the following SQL query to verify that the number of underreplicated ranges is zero: - - {% include_cached copy-clipboard.html %} - ~~~ sql - SELECT sum((metrics->>'ranges.underreplicated')::DECIMAL)::INT AS ranges_underreplicated FROM crdb_internal.kv_store_status; - ~~~ - - ~~~ - ranges_underreplicated - -------------------------- - 0 - (1 row) - ~~~ - - This indicates that it is safe to proceed to the next pod. - -1. Exit the SQL shell: - - {% include_cached copy-clipboard.html %} - ~~~ sql - > \q - ~~~ - -1. Decrement the partition value by 1 to allow the next pod in the cluster to update: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl patch statefulset cockroachdb \ - -p='{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":1}}}}' - ~~~ - - ~~~ - statefulset.apps/cockroachdb patched - ~~~ - -1. Repeat steps 4-8 until all pods have been restarted and are running the new image (the final partition value should be `0`). - -1. 
Check the image of each pod to confirm that all have been upgraded: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl get pods \ - -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}' - ~~~ - - ~~~ - cockroachdb-0 cockroachdb/cockroach:{{page.release_info.version}} - cockroachdb-1 cockroachdb/cockroach:{{page.release_info.version}} - cockroachdb-2 cockroachdb/cockroach:{{page.release_info.version}} - ... - ~~~ - - You can also check the CockroachDB version of each node in the [DB Console]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}#node-details). - -1. If you disabled auto-finalization earlier, monitor the stability and performance of your cluster until you are comfortable with the upgrade (generally at least a day). - - If you decide to roll back the upgrade, repeat the rolling restart procedure with the old binary. - - {{site.data.alerts.callout_info}} - This is only possible when performing a major version upgrade (for example, from {{ previous_version }}.x to {{ page.version.version }}). Patch version upgrades (for example, within the {{ page.version.version }}.x series) are auto-finalized. - {{site.data.alerts.end}} - - To finalize the upgrade, re-enable auto-finalization: - - {% if page.secure == true %} - - 1. Start the CockroachDB [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}): - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl exec -it cockroachdb-client-secure \ - -- ./cockroach sql \ - --certs-dir=/cockroach-certs \ - --host=cockroachdb-public - ~~~ - - {% else %} - - 1. 
Launch a temporary interactive pod and start the [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}) inside it: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl run cockroachdb -it \ - --image=cockroachdb/cockroach \ - --rm \ - --restart=Never \ - -- sql \ - --insecure \ - --host=cockroachdb-public - ~~~ - - {% endif %} - - 1. Re-enable auto-finalization: - - {% include_cached copy-clipboard.html %} - ~~~ sql - > RESET CLUSTER SETTING cluster.preserve_downgrade_option; - ~~~ - - After the upgrade to {{ page.version.version }} is finalized, you may notice an increase in compaction activity due to a background migration within the storage engine. To observe the migration's progress, check the **Compactions** section of the [Storage Dashboard]({% link {{ page.version.version }}/ui-storage-dashboard.md %}) in the DB Console or monitor the `storage.marked-for-compaction-files` [time-series metric]({% link {{ page.version.version }}/metrics.md %}). When the metric's value nears or reaches `0`, the migration is complete and compaction activity will return to normal levels. - - 1. Exit the SQL shell and delete the temporary pod: - - {% include_cached copy-clipboard.html %} - ~~~ sql - > \q - ~~~ diff --git a/src/current/v24.3/upgrade-cockroachdb-kubernetes.md b/src/current/v24.3/upgrade-cockroachdb-kubernetes.md index e59a58e6918..c68292b835f 100644 --- a/src/current/v24.3/upgrade-cockroachdb-kubernetes.md +++ b/src/current/v24.3/upgrade-cockroachdb-kubernetes.md @@ -1,5 +1,5 @@ --- -title: Cluster Upgrades +title: Upgrade a cluster in Kubernetes summary: How to upgrade a secure 3-node CockroachDB cluster with Kubernetes. 
toc: true toc_not_nested: true @@ -7,15 +7,15 @@ secure: true docs_area: deploy --- -{% assign previous_version = site.data.versions | where_exp: "previous_version", "previous_version.major_version == page.version.version" | first | map: "previous_version" %} +This page shows how to upgrade a CockroachDB cluster that is [deployed on a Kubernetes cluster]({% link {{ page.version.version }}/deploy-cockroachdb-with-kubernetes.md %}). -{{site.data.alerts.callout_info}} -This article assumes you have already [deployed CockroachDB on a single Kubernetes cluster]({% link {{ page.version.version }}/deploy-cockroachdb-with-kubernetes.md %}). -{{site.data.alerts.end}} +## Overview + +{% include common/upgrade/overview.md %} -We strongly recommend that you regularly upgrade your CockroachDB version in order to pick up bug fixes, performance improvements, and new features. +On Kubernetes, the upgrade is a [staged update](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#staging-an-update) in which each pod's container image for CockroachDB is updated in a rolling fashion. The cluster remains available during the upgrade. -The upgrade process on Kubernetes is a [staged update](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#staging-an-update) in which the Docker image is applied to the pods one at a time, with each pod being stopped and restarted in turn. This is to ensure that the cluster remains available during the upgrade. +Select the cluster's deployment method to continue.
@@ -23,113 +23,37 @@ The upgrade process on Kubernetes is a [staged update](https://kubernetes.io/doc
-
-{% include {{ page.version.version }}/orchestration/operator-check-namespace.md %} +## Before you begin {{site.data.alerts.callout_success}} If you [deployed CockroachDB on Red Hat OpenShift]({% link {{ page.version.version }}/deploy-cockroachdb-with-kubernetes-openshift.md %}), substitute `kubectl` with `oc` in the following commands. {{site.data.alerts.end}} -1. Verify that you can upgrade. - - To upgrade to a new major version, you must first be on a production release of the previous version. The release does not need to be the latest production release of the previous version, but it must be a production [release]({% link releases/index.md %}) and not a testing release (alpha/beta). - - Therefore, in order to upgrade to {{ page.version.version }}, you must be on a production release of {{ previous_version }}. - - 1. If you are upgrading to {{ page.version.version }} from a production release earlier than {{ previous_version }}, or from a testing release (alpha/beta), first [upgrade to a production release of {{ previous_version }}]({% link {{ previous_version }}/upgrade-cockroachdb-kubernetes.md %}). Be sure to complete all the steps. - - 1. Then return to this page and perform a second upgrade to {{ page.version.version }}. - - 1. If you are upgrading from a production release of {{ previous_version }}, or from any earlier {{ page.version.version }} patch release, you do not have to go through intermediate releases; continue to step 2. - -1. Verify the overall health of your cluster using the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}). On the **Overview**: - - Under **Node Status**, make sure all nodes that should be live are listed as such. If any nodes are unexpectedly listed as suspect or dead, identify why the nodes are offline and either restart them or [decommission]({% link {{ page.version.version }}/scale-cockroachdb-kubernetes.md %}#remove-nodes) them before beginning your upgrade. 
If there are dead and non-decommissioned nodes in your cluster, it will not be possible to finalize the upgrade (either automatically or manually). - - Under **Replication Status**, make sure there are 0 under-replicated and unavailable ranges. Otherwise, performing a rolling upgrade increases the risk that ranges will lose a majority of their replicas and cause cluster unavailability. Therefore, it's important to [identify and resolve the cause of range under-replication and/or unavailability]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#replication-issues) before beginning your upgrade. - - In the **Node List**: - - Make sure all nodes are on the same version. If not all nodes are on the same version, upgrade them to the cluster's highest current version first, and then start this process over. - - Make sure capacity and memory usage are reasonable for each node. Nodes must be able to tolerate some increase in case the new version uses more resources for your workload. Also go to **Metrics > Dashboard: Hardware** and make sure CPU percent is reasonable across the cluster. If there's not enough headroom on any of these metrics, consider [adding nodes]({% link {{ page.version.version }}/scale-cockroachdb-kubernetes.md %}#add-nodes) to your cluster before beginning your upgrade. - -{% assign rd = site.data.versions | where_exp: "rd", "rd.major_version == page.version.version" | first %} - -1. Review the [backward-incompatible changes in {{ page.version.version }}]({% link releases/{{ page.version.version }}.md %}{% unless rd.release_date == "N/A" or rd.release_date > today %}#{{ page.version.version | replace: ".", "-" }}-0-backward-incompatible-changes{% endunless %}) and [deprecated features]({% link releases/{{ page.version.version }}.md %}#{% unless rd.release_date == "N/A" or rd.release_date > today %}{{ page.version.version | replace: ".", "-" }}-0-deprecations{% endunless %}). 
If any affect your deployment, make the necessary changes before starting the rolling upgrade to {{ page.version.version }}. - -1. Change the desired Docker image in the custom resource: - - ~~~ - image: - name: cockroachdb/cockroach:{{page.release_info.version}} - ~~~ - -1. Apply the new settings to the cluster: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl apply -f example.yaml - ~~~ - - The Operator will perform the staged update. - - {{site.data.alerts.callout_info}} - The Operator automatically sets the `cluster.preserve_downgrade_option` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) to the version you are upgrading from. This disables auto-finalization of the upgrade so that you can monitor the stability and performance of the upgraded cluster before manually finalizing the upgrade. This will enable certain features and performance improvements introduced in {{ page.version.version }}. - - Note that after finalization, it will no longer be possible to perform a downgrade to {{ previous_version }}. In the event of a catastrophic failure or corruption, the only option will be to start a new cluster using the previous binary and then restore from a [backup]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}) created prior to performing the upgrade. - - Finalization only applies when performing a major version upgrade (for example, from {{ previous_version }}.x to {{ page.version.version }}). Patch version upgrades (for example, within the {{ page.version.version }}.x series) can always be downgraded. - {{site.data.alerts.end}} - -1. To check the status of the rolling upgrade, run `kubectl get pods`. The pods are restarted one at a time with the new image. - -1. 
Verify that all pods have been upgraded by running: - - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl get pods \ - -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}' - ~~~ - - You can also check the CockroachDB version of each node in the [DB Console]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}#node-details). - -1. Monitor the stability and performance of your cluster until you are comfortable with the upgrade (generally at least a day). +{% include {{ page.version.version }}/orchestration/operator-check-namespace.md %} +{% include common/upgrade/prepare-to-upgrade-self-hosted.md %} - If you decide to roll back the upgrade, revert the image name in the custom resource and apply the new value. +## Perform a patch upgrade - {{site.data.alerts.callout_info}} - This is only possible when performing a major version upgrade (for example, from {{ previous_version }}.x to {{ page.version.version }}). Patch version upgrades (for example, within the {{ page.version.version }}.x series) are auto-finalized. - {{site.data.alerts.end}} +{% include_cached common/upgrade/patch-upgrade-kubernetes.md %} - To finalize the upgrade, re-enable auto-finalization: +### Roll back a patch upgrade - 1. Start the CockroachDB [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}). For example, if you followed the steps in [Deploy CockroachDB with Kubernetes]({% link {{ page.version.version }}/deploy-cockroachdb-with-kubernetes.md %}#step-3-use-the-built-in-sql-client) to launch a secure client pod, get a shell into the `cockroachdb-client-secure` pod: +{% include_cached common/upgrade/patch-rollback-kubernetes.md %} - {% include_cached copy-clipboard.html %} - ~~~ shell - $ kubectl exec -it cockroachdb-client-secure \-- ./cockroach sql \ - --certs-dir=/cockroach/cockroach-certs \ - --host={cluster-name}-public - ~~~ +## Perform a major-version upgrade - 1. 
Re-enable auto-finalization: +{% include common/upgrade/major-version-upgrade-kubernetes.md %} - {% include_cached copy-clipboard.html %} - ~~~ sql - > RESET CLUSTER SETTING cluster.preserve_downgrade_option; - ~~~ +### Finalize a major-version upgrade - After the upgrade to {{ page.version.version }} is finalized, you may notice an increase in [compaction]({% link {{ page.version.version }}/architecture/storage-layer.md %}#compaction) activity due to a background migration within the storage engine. To observe the migration's progress, check the **Compactions** section of the [Storage Dashboard]({% link {{ page.version.version }}/ui-storage-dashboard.md %}) in the DB Console or monitor the `storage.marked-for-compaction-files` [time-series metric]({% link {{ page.version.version }}/metrics.md %}). When the metric's value nears or reaches `0`, the migration is complete and compaction activity will return to normal levels. {% include {{page.version.version}}/storage/compaction-concurrency.md %} +{% include common/upgrade/finalize-kubernetes.md %} - 1. Exit the SQL shell and pod: +### Roll back a major-version upgrade - {% include_cached copy-clipboard.html %} - ~~~ sql - > \q - ~~~ +{% include common/upgrade/rollback-kubernetes.md %} -
+## Troubleshooting -
-{% include {{ page.version.version }}/orchestration/kubernetes-upgrade-cluster-manual.md %} -
+{% include common/upgrade/troubleshooting-self-hosted.md %} -
-{% include {{ page.version.version }}/orchestration/kubernetes-upgrade-cluster-helm.md %} -
+## See also