Commit

Michael's feedback
kathancox committed Nov 14, 2024
1 parent 4e934b3 commit 955cc90
Showing 3 changed files with 9 additions and 27 deletions.
8 changes: 3 additions & 5 deletions src/current/v24.3/logical-data-replication-monitoring.md
@@ -64,15 +64,11 @@ SHOW LOGICAL REPLICATION JOBS WITH details;
- Replication latency: The commit-to-commit replication latency, which is tracked from when a row is committed on the source cluster, to when it is "committed" on the destination cluster. A _commit_ is when the LDR job either adds a row to the [dead letter queue (DLQ)]({% link {{ page.version.version }}/manage-logical-data-replication.md %}#dead-letter-queue-dlq) or applies a row successfully to the destination cluster.
- `logical_replication.commit_latency-p50`
- `logical_replication.commit_latency-p99`
-- Replication lag: How far behind the source cluster is from the destination cluster at a specific point in time. The replication lag is equivalent to [RPO]({% link {{ page.version.version }}/disaster-recovery-overview.md %}) during a disaster.
+- Replication lag: How far behind the source cluster is from the destination cluster at a specific point in time. The replication lag is equivalent to [RPO]({% link {{ page.version.version }}/disaster-recovery-overview.md %}) during a disaster. Calculate the replication lag with this metric. For example, `time.now() - replicated_time_seconds`.
- `logical_replication.replicated_time_seconds`
- Row updates applied: These metrics indicate whether the destination cluster is actively receiving and applying data from the source cluster.
- `logical_replication.events_ingested`
- `logical_replication.events_dlqed`
-- Events dead letter queued: How often the LDR job is putting writes in the DLQ because they cannot be applied successfully on the destination cluster.
-- `logical_replication.events_dlqed_age`
-- `logical_replication.events_dlqed_space`
-- `logical_replication.events_dlqed_errtype`

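As a rough illustration of the lag calculation described in the metrics list above, the following Python sketch subtracts the value of `logical_replication.replicated_time_seconds` from the current time. It assumes the metric value is a Unix timestamp in seconds. The function name `replication_lag_seconds` and its `now` parameter are hypothetical and are not part of CockroachDB.

```python
import time


def replication_lag_seconds(replicated_time_seconds, now=None):
    """Return the replication lag in seconds.

    replicated_time_seconds: value of the
        logical_replication.replicated_time_seconds metric, assumed here
        to be a Unix timestamp in seconds.
    now: optional Unix timestamp to compare against; defaults to the
        local clock. This parameter is hypothetical and exists so the
        calculation can be checked with fixed inputs.
    """
    if now is None:
        now = time.time()
    # Clamp at zero so that clock skew between the machine running this
    # code and the cluster never yields a negative lag.
    return max(0.0, now - replicated_time_seconds)


# Usage with fixed, made-up timestamps.
lag = replication_lag_seconds(1000.0, now=1012.5)
print(f"replication lag: {lag:.1f}s")
```

Clamping at zero is a design choice for this sketch. If you would rather detect clock skew than hide it, return the raw difference instead.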
## DB Console

@@ -120,6 +116,8 @@ To view metrics at the job level, you can use the `label` option when you start
- `logical_replication.replicated_time_seconds`
- `logical_replication.events_ingested`
- `logical_replication.events_dlqed`
+- `logical_replication.scanning_ranges`
+- `logical_replication.catchup_ranges`

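A minimal Python sketch of reading one of the per-job metrics listed above from Prometheus-format text, such as the output of a node's `_status/vars` endpoint. The sample text, the `label` key name, and the helper `metric_by_label` are assumptions for illustration only, so check your cluster's actual output before relying on them. Prometheus-format metric names use underscores where the names above use dots.

```python
import re

# One sample line: a metric name, an optional {label="value",...} set, a value.
_SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
_LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')


def metric_by_label(exposition_text, metric_name, label_key="label"):
    """Map each value of `label_key` to the metric's value.

    Lines without `label_key` (for example, the aggregate parent metric)
    are skipped. Label values containing '}' are not handled by this
    simplified parser.
    """
    values = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_LINE.match(line)
        if match is None or match.group("name") != metric_name:
            continue
        labels = dict(_LABEL_PAIR.findall(match.group("labels") or ""))
        if label_key in labels:
            values[labels[label_key]] = float(match.group("value"))
    return values


# Hypothetical sample text; real output contains many more metrics and labels.
SAMPLE_VARS = """
# TYPE logical_replication_events_ingested counter
logical_replication_events_ingested 1500
logical_replication_events_ingested{label="ldr_job"} 1200
logical_replication_events_dlqed{label="ldr_job"} 3
"""

per_job = metric_by_label(SAMPLE_VARS, "logical_replication_events_ingested")
print(per_job)
```

In practice, a Prometheus server or client library would normally do this parsing. The sketch only shows how a job-level label separates a child metric from its aggregate.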
To use metrics labels, ensure you have enabled the child metrics cluster setting:

8 changes: 6 additions & 2 deletions src/current/v24.3/logical-data-replication-overview.md
@@ -8,7 +8,7 @@ toc: true
{% include feature-phases/preview.md %}
{{site.data.alerts.end}}

-{% include_cached new-in.html version="v24.3" %} **Logical data replication (LDR)** continuously replicates tables between active CockroachDB clusters. Both source and destination cluster can receive application reads and writes, with LDR enabling bidirectional replication for eventual consistency in the replicating tables. The active-active setup between clusters can provide protection against cluster, datacenter, or region failure while still achieving single-region low latency reads and writes in the individual CockroachDB clusters. Each cluster in an LDR job still benefits individually from [multi-active availability]({% link {{ page.version.version }}/multi-active-availability.md %}) with CockroachDB's built-in [Raft replication]({% link {{ page.version.version }}/demo-replication-and-rebalancing.md %}) providing data consistency across nodes, zones, and regions.
+{% include_cached new-in.html version="v24.3" %} **Logical data replication (LDR)** continuously replicates tables between an active _source_ CockroachDB cluster to an active _destination_ CockroachDB cluster. Both source and destination can receive application reads and writes, and participate in [_bidirectional_](#use-cases) LDR replication for eventual consistency in the replicating tables. The active-active setup between clusters can provide protection against cluster, datacenter, or region failure while still achieving single-region low latency reads and writes in the individual CockroachDB clusters. Each cluster in an LDR job still benefits individually from [multi-active availability]({% link {{ page.version.version }}/multi-active-availability.md %}) with CockroachDB's built-in [Raft replication]({% link {{ page.version.version }}/demo-replication-and-rebalancing.md %}) providing data consistency across nodes, zones, and regions.

{{site.data.alerts.callout_success}}
Cockroach Labs also has a [physical cluster replication]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) tool that continuously replicates data for transactional consistency from a primary cluster to an independent standby cluster.
@@ -21,6 +21,10 @@ You can run LDR in a _unidirectional_ or _bidirectional_ setup to meet different
- [High availability and single-region write latency in two-datacenter deployments](#achieve-high-availability-and-single-region-write-latency-in-two-datacenter-deployments)
- [Workload isolation between clusters](#achieve-workload-isolation-between-clusters)

+{{site.data.alerts.callout_info}}
+For a comparison of CockroachDB high availability and resilience features and tooling, refer to the [Data Resilience]({% link {{ page.version.version }}/data-resilience.md %}) page.
+{{site.data.alerts.end}}
+
### Achieve high availability and single-region write latency in two-datacenter deployments

Maintain [high availability]({% link {{ page.version.version }}/data-resilience.md %}#high-availability) and resilience to region failures with a two-datacenter topology. You can run bidirectional LDR to ensure [data resilience]({% link {{ page.version.version }}/data-resilience.md %}) in your deployment, particularly in datacenter or region failures. If you set up two single-region clusters, in LDR, both clusters can receive application reads and writes with low, single-region write latency. Then, in a datacenter, region, or cluster outage, you can redirect application traffic to the surviving cluster with [low downtime]({% link {{ page.version.version }}/data-resilience.md %}#high-availability). In the following diagram, the two single-region clusters are deployed in US East and West to provide low latency for that region. The two LDR jobs ensure that the tables on both clusters will reach eventual consistency.
@@ -29,7 +33,7 @@ Maintain [high availability]({% link {{ page.version.version }}/data-resilience.

### Achieve workload isolation between clusters

-Isolate critical application workloads from non-critical application workloads in a unidirectional setup. For example, you may want to run jobs like [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) or [backups]({% link {{ page.version.version }}/backup-and-restore-overview.md %}) from one cluster to isolate these jobs from the cluster receiving the principal application traffic.
+Isolate critical application workloads from non-critical application workloads. For example, you may want to run jobs like [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) or [backups]({% link {{ page.version.version }}/backup-and-restore-overview.md %}) from one cluster to isolate these jobs from the cluster receiving the principal application traffic.

<image src="{{ 'images/v24.3/unidirectional.svg' | relative_url }}" alt="Diagram showing unidirectional LDR from a source cluster to a destination cluster with the destination cluster supporting secondary workloads plus jobs and the source cluster accepting the main application traffic." style="width:80%" />

20 changes: 0 additions & 20 deletions src/current/v24.3/manage-logical-data-replication.md
@@ -155,26 +155,6 @@ You have a bidirectional LDR setup with a stream between cluster A to cluster B,
CREATE LOGICAL REPLICATION STREAM FROM TABLE {database.public.table_name} ON 'external://{source_external_connection}' INTO TABLE {database.public.table_name};
~~~

-<<<<<<< HEAD
-=======
-#### Continue application traffic on both clusters independently
-
-1. Drop the LDR job on both clusters A and B. Canceling the LDR streams will remove the history retention job, which will cause the data to be garbage collected according to the [`gc.ttlseconds`]({% link {{ page.version.version }}/configure-replication-zones.md %}#gc-ttlseconds) setting. Use [`CANCEL JOB`]({% link {{ page.version.version }}/cancel-job.md %}):
-
-{% include_cached copy-clipboard.html %}
-~~~ sql
-CANCEL JOB {ldr_job_id};
-~~~
-
-1. Perform the schema change on the table in both clusters independently. Application traffic can run to both clusters as usual.
-1. After the schema changes on both clusters have completed successfully, create new LDR streams **without** a `cursor` timestamp for the table on both clusters A and B. The LDR jobs will start a full initial scan of the source tables. Run `CREATE LOGICAL REPLICATION STREAM` from the **destination** cluster for each stream:
-
-{% include_cached copy-clipboard.html %}
-~~~ sql
-CREATE LOGICAL REPLICATION STREAM FROM TABLE {database.public.table_name} ON 'external://{source_external_connection}' INTO TABLE {database.public.table_name};
-~~~
-
->>>>>>> f838b6900 (Update schema change directions)
#### Coordinate schema changes for unidirectional LDR

If you have a unidirectional LDR setup, you should cancel the running LDR stream and redirect all application traffic to the source cluster.