Add docs for PCR on Cloud in Preview #19320

Closed

kathancox wants to merge 1 commit into main from test-pcr-cloud-location

Contributor

kathancox commented Jan 24, 2025

Fixes DOC-10050

PR adds docs for PCR on Advanced clusters in Preview phase.

- Adds a setup tutorial page with a short technical reference to cover "replication lag" and "retained time" (without the VC mentions in the self-hosted docs).
- Updates the self-hosted PCR docs to call out Advanced availability and to navigate Cloud users to the correct docs.
- Updates the general PCR overview page to adapt the Features section to a comparison table for the supported features in Advanced vs. self-hosted.
- Adds PCR on Advanced in the Preview section of the feature availability page.
- Includes the new PCR on Advanced page under the Cross-cluster Replication nav item, and also under Cloud Deployments.

Rendered Preview

Cloud PCR page
General PCR Overview page updates

github-actions bot commented Jan 24, 2025

Files changed:

src/current/_includes/v25.1/known-limitations/physical-cluster-replication.md
src/current/_includes/v25.1/sidebar-data/cloud-deployments.json
src/current/_includes/v25.1/sidebar-data/cross-cluster-replication.json
src/current/cockroachcloud/physical-cluster-replication.md
src/current/css/customstyles.scss
src/current/v25.1/cockroachdb-feature-availability.md
src/current/v25.1/failover-replication.md
src/current/v25.1/physical-cluster-replication-monitoring.md
src/current/v25.1/physical-cluster-replication-overview.md
src/current/v25.1/set-up-physical-cluster-replication.md

netlify bot commented Jan 24, 2025

✅ Deploy Preview for cockroachdb-api-docs canceled.

Name	Link
🔨 Latest commit	`8ed31f0`
🔍 Latest deploy log	https://app.netlify.com/sites/cockroachdb-api-docs/deploys/6797e4c54840f20008c7b12e

netlify bot commented Jan 24, 2025

✅ Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name	Link
🔨 Latest commit	`8ed31f0`
🔍 Latest deploy log	https://app.netlify.com/sites/cockroachdb-interactivetutorials-docs/deploys/6797e4c504140c0008ab998b

netlify bot commented Jan 24, 2025

✅ Netlify Preview

Name	Link
🔨 Latest commit	`8ed31f0`
🔍 Latest deploy log	https://app.netlify.com/sites/cockroachdb-docs/deploys/6797e4c534d2c80008313be4
😎 Deploy Preview	https://deploy-preview-19320--cockroachdb-docs.netlify.app

kathancox force-pushed the test-pcr-cloud-location branch from fd5db34 to ac6fef7 Compare

January 27, 2025 19:36

PCR on Cloud

8ed31f0

kathancox force-pushed the test-pcr-cloud-location branch from ac6fef7 to 8ed31f0 Compare

January 27, 2025 19:55

kathancox marked this pull request as ready for review

January 27, 2025 19:57

kathancox requested review from alicia-l2 and davidwding

January 27, 2025 20:21

alicia-l2 reviewed

alicia-l2 left a comment

overall LGTM, had some comments

src/current/_includes/v25.1/sidebar-data/cloud-deployments.json

@@ -551,6 +551,12 @@
                                   }
                               ]
                             },
+                            {
+                              "title": "Physical Cluster Replication",

alicia-l2 Jan 28, 2025

Do we want to add "on CockroachDB Cloud"?

src/current/cockroachcloud/physical-cluster-replication.md


		The following describes the required roles for the `replication-streams` endpoint methods:

		Method | Required roles | Description

alicia-l2 Jan 28, 2025

this is super clear! thanks :)

davidwding Jan 28, 2025

It's worth noting that the POST requires the role on both clusters. The GET and PATCH only require permissions on one of the two clusters.
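
The rule in this comment can be sketched as a small check. This is an illustration only: the function name and its boolean parameters are hypothetical, and it models just the "both clusters vs. one cluster" distinction described above, not the actual role names or the Cloud API's authorization logic.

```python
def caller_is_authorized(method: str, role_on_source: bool, role_on_target: bool) -> bool:
    """Model the per-method rule from the review comment.

    POST needs the required role on both clusters; GET and PATCH need it on
    only one of the two. The parameters are hypothetical flags indicating
    whether the caller holds the required role on each cluster.
    """
    method = method.upper()
    if method == "POST":
        return role_on_source and role_on_target
    if method in ("GET", "PATCH"):
        return role_on_source or role_on_target
    raise ValueError(f"unsupported method: {method}")
```

For example, a caller with the role on only the standby cluster could read or update a stream under this model, but could not create one.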

src/current/cockroachcloud/physical-cluster-replication.md

+              - Clusters must be in the same cloud (AWS, GCP, or Azure).
+              - Clusters must be single [region]({% link cockroachcloud/regions.md %}) (multiple availability zones per cluster is supported).
+              - The primary and standby cluster in AWS and Azure must be in different regions.
+              - The primary and standby cluster in GCP can be in the same region, but must not have overlapping CIDR ranges.

alicia-l2 Jan 28, 2025

@davidwding is the CIDR range part of the public cloud API? I feel like it was an undocumented field last I checked.

davidwding Jan 28, 2025

I see it on the public API. See spec.dedicated.cidr_range under the "schema" tab: https://www.cockroachlabs.com/docs/api/cloud/v1#post-/api/v1/clusters
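
A minimal sketch of the placement rules quoted in the diff above, assuming each cluster is described by a plain dictionary. The keys `provider`, `region`, and `cidr_range`, and the provider string `"AZURE"`, are assumptions made for illustration and are not taken from the Cloud API schema. The overlap test uses Python's standard `ipaddress` module.

```python
import ipaddress


def check_pcr_pair(primary: dict, standby: dict) -> list:
    """Return a list of rule violations for a primary/standby pair (empty if none)."""
    problems = []
    if primary["provider"] != standby["provider"]:
        problems.append("clusters must be in the same cloud")
    elif primary["provider"] in ("AWS", "AZURE"):
        # AWS and Azure: primary and standby must be in different regions.
        if primary["region"] == standby["region"]:
            problems.append("primary and standby must be in different regions")
    elif primary["provider"] == "GCP":
        # GCP: the same region is allowed, but CIDR ranges must not overlap.
        a = ipaddress.ip_network(primary["cidr_range"])
        b = ipaddress.ip_network(standby["cidr_range"])
        if a.overlaps(b):
            problems.append("CIDR ranges must not overlap")
    return problems
```

A check like this could run before the create-cluster requests are sent, so that a misconfigured pair is caught before any cluster is provisioned.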

src/current/cockroachcloud/physical-cluster-replication.md

+              - Clusters must be single [region]({% link cockroachcloud/regions.md %}) (multiple availability zones per cluster is supported).
+              - The primary and standby cluster in AWS and Azure must be in different regions.
+              - The primary and standby cluster in GCP can be in the same region, but must not have overlapping CIDR ranges.
+              - Clusters can have different [node topology]({% link cockroachcloud/plan-your-cluster-advanced.md %}#cluster-topology) and [hardware configurations]({% link cockroachcloud/plan-your-cluster-advanced.md %}#cluster-sizing-and-scaling). For disaster recovery purposes (failover and redirecting application traffic to a standby), we recommend configuring the primary and standby clusters with similar hardware.

alicia-l2 Jan 28, 2025

For disaster recovery purposes (failover and redirecting application traffic to a standby), we recommend configuring the primary and standby clusters with similar hardware.

Instead of calling out the DR-specific purpose, can we say something along the lines of "To avoid hitting performance constraints, we recommend configuring the primary and standby clusters with similar node topology and hardware"?

src/current/cockroachcloud/physical-cluster-replication.md


		### Step 1. Create the clusters

		To use PCR, it is necessary to set the standby cluster with the `support_physical_cluster_replication` field to `true`, which indicates that a cluster should start using an architecture that supports PCR. For details on supported cluster cloud provider and region setup, refer to [Configuration](#configuration).

alicia-l2 Jan 28, 2025

Do we want to call out that it's highly recommended that you start your primary cluster with the `support_physical_cluster_replication` field set to `true`, but that you can still start PCR from an existing cluster if you must?

davidwding Jan 28, 2025

We should also note that an existing cluster that was started without the support_physical_cluster_replication flag can be the source of a PCR stream, but never the target.

src/current/cockroachcloud/physical-cluster-replication.md

+                  curl --location --request POST 'https://cockroachlabs.cloud/api/v1/clusters' --header "Authorization: Bearer api_secret_key" --header 'Content-Type: application/json' --data '{"name": "standby_cluster_name", "provider": "AWS", "spec": {"dedicated": {"cockroachVersion": "v24.3", "hardware": {"disk_iops": 0, "machine_spec": {"num_virtual_cpus": 4}, "storage_gib": 16}, "region_nodes": {"us-east-2": 3}, "support_physical_cluster_replication": true}}}'
+                  ~~~
+                  If you're creating clusters in AWS or Azure, you must start the primary and standby clusters in different regions.

alicia-l2 Jan 28, 2025

If we're going to say this, we should also call out the GCP overlapping CIDR range thing
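
For reference, the request body from the curl example in the diff can be built programmatically. This is a sketch that copies the field names and hardware values from that example; the function name and its parameters are hypothetical, and the body is only constructed here, not sent.

```python
import json


def build_create_cluster_body(name: str, provider: str, region: str,
                              node_count: int, version: str = "v24.3") -> dict:
    """Build a create-cluster body shaped like the curl example in the diff."""
    return {
        "name": name,
        "provider": provider,
        "spec": {
            "dedicated": {
                "cockroachVersion": version,
                # Hardware values are copied from the example, not recommendations.
                "hardware": {
                    "disk_iops": 0,
                    "machine_spec": {"num_virtual_cpus": 4},
                    "storage_gib": 16,
                },
                "region_nodes": {region: node_count},
                # Per the diff text, the standby must be created with this set to true.
                "support_physical_cluster_replication": True,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_create_cluster_body("standby_cluster_name", "AWS", "us-east-2", 3)))
```

Building the primary and standby bodies from one helper makes it harder to forget the `support_physical_cluster_replication` flag on either cluster.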

src/current/cockroachcloud/physical-cluster-replication.md

+              ### Step 2. Start the PCR stream
+              {{site.data.alerts.callout_info}}
+              The standby cluster must be empty upon starting PCR. It is possible to write to both clusters before initiating the PCR stream, however, we recommend keeping the standby empty. That is, not writing to the standby prior to starting PCR. When you initiate the PCR stream, CockroachDB {{ site.data.products.cloud }} will take a full cluster backup of the standby cluster, delete all data from the standby, and then start the PCR stream.

alicia-l2 Jan 28, 2025

Should we start this section with: "We recommend starting with an empty standby cluster when starting PCR. When you start the PCR stream....then start the PCR stream." and then add "this is to ensure that the standby will be fully consistent with the primary during PCR"

src/current/cockroachcloud/physical-cluster-replication.md

+              curl --location --request PATCH "https://cockroachlabs.cloud/api/v1/replication-streams/7487d7a6-868b-4c6f-aa60-cc306cc525fe" --header "Authorization: Bearer api_secret_key" --header 'Content-Type:application/json' --data '{"status": "FAILING_OVER", "failover_at": "2025-01-13T19:35:14.472670Z"}'
+              ~~~
+              To fail over to the latest consistent time, you only need to include `"status": "FAILING_OVER"` in your request with one of the cluster IDs or PCR stream ID:

alicia-l2 Jan 28, 2025

Small nit: can we swap the order of this to match the above bullet points?
I.e., latest time first, then pick a time second.
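
The two failover variants discussed here differ only in whether `failover_at` is present. A minimal sketch of building either body, using the field names and timestamp format from the curl example in the diff; the function name is hypothetical, and the timestamp formatting is an assumption that the API accepts the same microsecond UTC form shown in that example.

```python
from datetime import datetime, timezone
from typing import Optional


def build_failover_body(failover_at: Optional[datetime] = None) -> dict:
    """Build a PATCH body for failover.

    With no argument, the body requests failover to the latest consistent
    time. With a timezone-aware datetime, it requests failover at that time.
    """
    body = {"status": "FAILING_OVER"}
    if failover_at is not None:
        if failover_at.tzinfo is None:
            raise ValueError("failover_at must be timezone-aware")
        utc = failover_at.astimezone(timezone.utc)
        # Matches the shape of the example value 2025-01-13T19:35:14.472670Z.
        body["failover_at"] = utc.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return body
```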

src/current/cockroachcloud/physical-cluster-replication.md

+              After the failover is complete, both clusters can receive traffic and operate as separate clusters. It is necessary to redirect application traffic manually.
+              {{site.data.alerts.callout_info}}
+              PCR is on the cluster level, which means that the job also replicates all system tables. Users that need to access the standby cluster after failover should use the user roles for the primary cluster, because the standby cluster is a copy of the primary cluster. PCR overwrites all previous system tables on the standby cluster.

alicia-l2 Jan 28, 2025

PCR replicates on the cluster level?

src/current/cockroachcloud/physical-cluster-replication.md


		The tracked replicated time and the advancing protected timestamp provide the PCR job with enough information to track _retained time_, which is a timestamp in the past indicating the lower bound that the PCR stream could fail over to. Therefore, the _failover window_ for a PCR stream falls between the retained time and the replicated time.

		<img src="{{ 'images/v25.1/failover.svg' | relative_url }}" alt="Timeline showing how the failover window is between the retained time and replicated time." style="border:0px solid #eee;width:100%" />

alicia-l2 Jan 28, 2025

I like this a lot! Would it also make sense to add it to the self-hosted PCR technical overview page?
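
The failover window described in the diff text can be expressed as a simple bounds check. This is a sketch: the function name is hypothetical, and treating both bounds as inclusive is an assumption, since the text only says the window falls between the retained time and the replicated time.

```python
from datetime import datetime, timezone


def in_failover_window(candidate: datetime, retained_time: datetime,
                       replicated_time: datetime) -> bool:
    """Return True if candidate lies within [retained_time, replicated_time]."""
    if retained_time > replicated_time:
        raise ValueError("retained_time must not be later than replicated_time")
    return retained_time <= candidate <= replicated_time
```

A client could use a check like this to reject a requested failover timestamp locally before sending it.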

davidwding reviewed

src/current/cockroachcloud/physical-cluster-replication.md


		### Fail back to the primary cluster

		To fail back from the standby to the primary cluster, start another PCR stream with the standby cluster as the `sourceClusterId` and the original primary cluster as the `targetClusterId`.

davidwding Jan 28, 2025

Is it worth noting that we attempt to perform fast failback if possible, and fall back to regular failback if not?
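
The role swap described in the diff text can be sketched as follows. The field names `sourceClusterId` and `targetClusterId` come from the diff; the function name and the example IDs are hypothetical, and the sketch says nothing about whether the service performs a fast or regular failback.

```python
def build_failback_body(original_primary_id: str, original_standby_id: str) -> dict:
    """Build the body for a new PCR stream that fails back to the original primary.

    The roles are swapped: the former standby becomes the source and the
    original primary becomes the target.
    """
    return {
        "sourceClusterId": original_standby_id,
        "targetClusterId": original_primary_id,
    }
```

For example, `build_failback_body("primary-id", "standby-id")` yields a body whose source is `"standby-id"` and whose target is `"primary-id"`.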

Contributor Author

kathancox commented Feb 5, 2025

Closing this PR as we're pushing back the launch of this feature.

kathancox closed this
