From 7f3947a4a3203ff9c28a4f6cfe7280b11e2fc3bb Mon Sep 17 00:00:00 2001
From: Kathryn May
Date: Tue, 12 Nov 2024 10:40:36 -0500
Subject: [PATCH] Michael's feedback

---
 .../logical-data-replication-monitoring.md   |  8 +++-----
 .../logical-data-replication-overview.md     |  8 ++++++--
 .../v24.3/manage-logical-data-replication.md | 20 --------------------
 3 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/src/current/v24.3/logical-data-replication-monitoring.md b/src/current/v24.3/logical-data-replication-monitoring.md
index 2e324be2085..cf54b302765 100644
--- a/src/current/v24.3/logical-data-replication-monitoring.md
+++ b/src/current/v24.3/logical-data-replication-monitoring.md
@@ -64,15 +64,11 @@ SHOW LOGICAL REPLICATION JOBS WITH details;
 - Replication latency: The commit-to-commit replication latency, which is tracked from when a row is committed on the source cluster, to when it is "committed" on the destination cluster. A _commit_ is when the LDR job either adds a row to the [dead letter queue (DLQ)]({% link {{ page.version.version }}/manage-logical-data-replication.md %}#dead-letter-queue-dlq) or applies a row successfully to the destination cluster.
   - `logical_replication.commit_latency-p50`
   - `logical_replication.commit_latency-p99`
-- Replication lag: How far behind the source cluster is from the destination cluster at a specific point in time. The replication lag is equivalent to [RPO]({% link {{ page.version.version }}/disaster-recovery-overview.md %}) during a disaster.
+- Replication lag: How far the destination cluster is behind the source cluster at a specific point in time. The replication lag is equivalent to [RPO]({% link {{ page.version.version }}/disaster-recovery-overview.md %}) during a disaster. Calculate the replication lag from the following metric as `time.now() - replicated_time_seconds`.
   - `logical_replication.replicated_time_seconds`
 - Row updates applied: These metrics indicate whether the destination cluster is actively receiving and applying data from the source cluster.
   - `logical_replication.events_ingested`
   - `logical_replication.events_dlqed`
-- Events dead letter queued: How often the LDR job is putting writes in the DLQ because they cannot be applied successfully on the destination cluster.
-  - `logical_replication.events_dlqed_age`
-  - `logical_replication.events_dlqed_space`
-  - `logical_replication.events_dlqed_errtype`
 
 ## DB Console
 
@@ -120,6 +116,8 @@ To view metrics at the job level, you can use the `label` option when you start
 - `logical_replication.replicated_time_seconds`
 - `logical_replication.events_ingested`
 - `logical_replication.events_dlqed`
+- `logical_replication.scanning_ranges`
+- `logical_replication.catchup_ranges`
 
 To use metrics labels, ensure you have enabled the child metrics cluster setting:
 
diff --git a/src/current/v24.3/logical-data-replication-overview.md b/src/current/v24.3/logical-data-replication-overview.md
index 57094682d3d..9ce2e733fca 100644
--- a/src/current/v24.3/logical-data-replication-overview.md
+++ b/src/current/v24.3/logical-data-replication-overview.md
@@ -8,7 +8,7 @@ toc: true
 {% include feature-phases/preview.md %}
 {{site.data.alerts.end}}
 
-{% include_cached new-in.html version="v24.3" %} **Logical data replication (LDR)** continuously replicates tables between active CockroachDB clusters. Both source and destination cluster can receive application reads and writes, with LDR enabling bidirectional replication for eventual consistency in the replicating tables. The active-active setup between clusters can provide protection against cluster, datacenter, or region failure while still achieving single-region low latency reads and writes in the individual CockroachDB clusters. Each cluster in an LDR job still benefits individually from [multi-active availability]({% link {{ page.version.version }}/multi-active-availability.md %}) with CockroachDB's built-in [Raft replication]({% link {{ page.version.version }}/demo-replication-and-rebalancing.md %}) providing data consistency across nodes, zones, and regions.
+{% include_cached new-in.html version="v24.3" %} **Logical data replication (LDR)** continuously replicates tables from an active _source_ CockroachDB cluster to an active _destination_ CockroachDB cluster. Both the source and destination clusters can receive application reads and writes, and both can participate in [_bidirectional_](#use-cases) LDR for eventual consistency in the replicating tables. The active-active setup between clusters can provide protection against cluster, datacenter, or region failure while still achieving single-region, low-latency reads and writes in the individual CockroachDB clusters. Each cluster in an LDR job still benefits individually from [multi-active availability]({% link {{ page.version.version }}/multi-active-availability.md %}) with CockroachDB's built-in [Raft replication]({% link {{ page.version.version }}/demo-replication-and-rebalancing.md %}) providing data consistency across nodes, zones, and regions.
 
 {{site.data.alerts.callout_success}}
 Cockroach Labs also has a [physical cluster replication]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) tool that continuously replicates data for transactional consistency from a primary cluster to an independent standby cluster.
@@ -21,6 +21,10 @@ You can run LDR in a _unidirectional_ or _bidirectional_ setup to meet different
 - [High availability and single-region write latency in two-datacenter deployments](#achieve-high-availability-and-single-region-write-latency-in-two-datacenter-deployments)
 - [Workload isolation between clusters](#achieve-workload-isolation-between-clusters)
 
+{{site.data.alerts.callout_info}}
+For a comparison of CockroachDB high availability and resilience features and tooling, refer to the [Data Resilience]({% link {{ page.version.version }}/data-resilience.md %}) page.
+{{site.data.alerts.end}}
+
 ### Achieve high availability and single-region write latency in two-datacenter deployments
 
 Maintain [high availability]({% link {{ page.version.version }}/data-resilience.md %}#high-availability) and resilience to region failures with a two-datacenter topology. You can run bidirectional LDR to ensure [data resilience]({% link {{ page.version.version }}/data-resilience.md %}) in your deployment, particularly in datacenter or region failures. If you set up two single-region clusters, in LDR, both clusters can receive application reads and writes with low, single-region write latency. Then, in a datacenter, region, or cluster outage, you can redirect application traffic to the surviving cluster with [low downtime]({% link {{ page.version.version }}/data-resilience.md %}#high-availability). In the following diagram, the two single-region clusters are deployed in US East and West to provide low latency for that region. The two LDR jobs ensure that the tables on both clusters will reach eventual consistency.
@@ -29,7 +33,7 @@ Maintain [high availability]({% link {{ page.version.version }}/data-resilience.
 
 ### Achieve workload isolation between clusters
 
-Isolate critical application workloads from non-critical application workloads in a unidirectional setup. For example, you may want to run jobs like [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) or [backups]({% link {{ page.version.version }}/backup-and-restore-overview.md %}) from one cluster to isolate these jobs from the cluster receiving the principal application traffic.
+Isolate critical application workloads from non-critical application workloads. For example, you may want to run jobs like [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) or [backups]({% link {{ page.version.version }}/backup-and-restore-overview.md %}) from one cluster to isolate these jobs from the cluster receiving the principal application traffic.
 
 Diagram showing unidirectional LDR from a source cluster to a destination cluster with the destination cluster supporting secondary workloads plus jobs and the source cluster accepting the main application traffic.
 
diff --git a/src/current/v24.3/manage-logical-data-replication.md b/src/current/v24.3/manage-logical-data-replication.md
index 8c847c5d71d..6e9b47fa1cf 100644
--- a/src/current/v24.3/manage-logical-data-replication.md
+++ b/src/current/v24.3/manage-logical-data-replication.md
@@ -155,26 +155,6 @@ You have a bidirectional LDR setup with a stream between cluster A to cluster B,
     CREATE LOGICAL REPLICATION STREAM FROM TABLE {database.public.table_name} ON 'external://{source_external_connection}' INTO TABLE {database.public.table_name};
     ~~~
 
-<<<<<<< HEAD
-=======
-#### Continue application traffic on both clusters independently
-
-1. Drop the LDR job on both clusters A and B. Canceling the LDR streams will remove the history retention job, which will cause the data to be garbage collected according to the [`gc.ttlseconds`]({% link {{ page.version.version }}/configure-replication-zones.md %}#gc-ttlseconds) setting. Use [`CANCEL JOB`]({% link {{ page.version.version }}/cancel-job.md %}):
-
-    {% include_cached copy-clipboard.html %}
-    ~~~ sql
-    CANCEL JOB {ldr_job_id};
-    ~~~
-
-1. Perform the schema change on the table in both clusters independently. Application traffic can run to both clusters as usual.
-1. After the schema changes on both clusters have completed successfully, create new LDR streams **without** a `cursor` timestamp for the table on both clusters A and B. The LDR jobs will start a full initial scan of the source tables. Run `CREATE LOGICAL REPLICATION STREAM` from the **destination** cluster for each stream:
-
-    {% include_cached copy-clipboard.html %}
-    ~~~ sql
-    CREATE LOGICAL REPLICATION STREAM FROM TABLE {database.public.table_name} ON 'external://{source_external_connection}' INTO TABLE {database.public.table_name};
-    ~~~
-
->>>>>>> f838b6900 (Update schema change directions)
 #### Coordinate schema changes for unidirectional LDR
 
 If you have a unidirectional LDR setup, you should cancel the running LDR stream and redirect all application traffic to the source cluster.
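The monitoring change in this patch describes replication lag as `time.now() - replicated_time_seconds`. The following is a minimal Python sketch of that calculation. It assumes the metric is scraped in Prometheus text format (for example, from a node's `/_status/vars` endpoint) and that the dots in `logical_replication.replicated_time_seconds` are exported as underscores. The function name `replication_lag_seconds`, the `label` label name, and the sample scrape text are hypothetical and are included only for illustration.

```python
import re
import time
from typing import Dict, Optional

# Prometheus-style name for logical_replication.replicated_time_seconds
# (assumption: dots are exported as underscores).
METRIC = "logical_replication_replicated_time_seconds"

# Matches "<name>{<labels>} <value>" or "<name> <value>"; any trailing
# sample timestamp after the value is ignored.
_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)"
)


def replication_lag_seconds(scrape: str, now: Optional[float] = None) -> Dict[str, float]:
    """Return lag in seconds (now - replicated_time_seconds), keyed by label set."""
    if now is None:
        now = time.time()  # current Unix time, the "time.now()" in the formula
    lags: Dict[str, float] = {}
    for raw in scrape.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        m = _LINE.match(line)
        if m is None or m.group("name") != METRIC:
            continue  # not the replicated-time gauge
        replicated_time = float(m.group("value"))
        if replicated_time <= 0:
            continue  # assumption: 0 means no replicated time has been reported yet
        lags[m.group("labels") or ""] = now - replicated_time
    return lags


if __name__ == "__main__":
    # Hypothetical scrape output: one unlabeled series and one labeled series.
    sample = (
        "# TYPE logical_replication_replicated_time_seconds gauge\n"
        "logical_replication_replicated_time_seconds 1731425000\n"
        'logical_replication_replicated_time_seconds{label="ldr_job"} 1731424990\n'
    )
    print(replication_lag_seconds(sample, now=1731425030.0))
```

Keying the result by label set is a design choice. If the scrape includes labeled per-job series, each job's lag stays separate instead of being collapsed into one number.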