LDR overview and monitoring pages #19085

kathancox · 2024-10-31T21:07:19Z

This PR adds the Monitoring page and General Overview page for the LDR section.

Preview

This PR is 2 of 4(?) for LDR docs. It contains:

Overview: General description of LDR and its features/use cases
Monitoring LDR: This is a page that describes what is available for monitoring and links out to the relevant docs material

Reviewers: You'll see a lot of comments for follow-up PRs to connect pages with links and such.

Content pages to come:
CREATE/SHOW sql ref
Technical Overview
LDR Metrics Dashboard docs

PR 1 #19043

github-actions · 2024-10-31T21:07:42Z

Files changed:

src/current/_includes/v24.3/ldr/multiple-tables.md:

src/current/_includes/v24.3/ldr/show-logical-replication-responses.md:

src/current/v24.3/logical-data-replication-monitoring.md

src/current/_includes/v24.3/sidebar-data/cross-cluster-replication.json
src/current/images/v24.3/east-west-region.svg:

src/current/v24.3/logical-data-replication-overview.md

src/current/images/v24.3/unidirectional.svg:

src/current/v24.3/logical-data-replication-overview.md

src/current/v24.3/logical-data-replication-monitoring.md
src/current/v24.3/logical-data-replication-overview.md
src/current/v24.3/manage-logical-data-replication.md
src/current/v24.3/set-up-logical-data-replication.md

netlify · 2024-10-31T21:07:49Z

✅ Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name	Link
🔨 Latest commit	`e869cad`
🔍 Latest deploy log	https://app.netlify.com/sites/cockroachdb-interactivetutorials-docs/deploys/67364717bd88aa0008800852

netlify · 2024-10-31T21:07:50Z

✅ Deploy Preview for cockroachdb-api-docs canceled.

Name	Link
🔨 Latest commit	`e869cad`
🔍 Latest deploy log	https://app.netlify.com/sites/cockroachdb-api-docs/deploys/673647173e98c50008e0ff69

netlify · 2024-10-31T21:11:12Z

✅ Netlify Preview

Name	Link
🔨 Latest commit	`e869cad`
🔍 Latest deploy log	https://app.netlify.com/sites/cockroachdb-docs/deploys/673647171a3286000824eba9
😎 Deploy Preview	https://deploy-preview-19085--cockroachdb-docs.netlify.app

src/current/v24.3/logical-data-replication-overview.md

alicia-l2

Left some comments, LGTM

src/current/v24.3/logical-data-replication-overview.md

src/current/_includes/v24.3/ldr/multiple-tables.md

src/current/v24.3/logical-data-replication-monitoring.md

alicia-l2 · 2024-11-05T20:12:06Z

src/current/_includes/v24.3/ldr/show-logical-replication-responses.md

@@ -0,0 +1,8 @@
+Field    | Response


Is this missing the "description" field in LDR responses?

Yes, it is; I'm adding it. Thanks for catching that.

alicia-l2 · 2024-11-05T20:13:39Z

src/current/v24.3/logical-data-replication-monitoring.md

+
+## Recommended LDR metrics to track
+
+- Replication latency: The commit-to-commit replication latency. A _commit_ is when the LDR job either adds a row to the [dead letter queue (DLQ)]({% link {{ page.version.version }}/manage-logical-data-replication.md %}#dead-letter-queue-dlq) or applies a row successfully to the destination cluster.


"This is tracked from when a row is committed on the source cluster to when it is "committed" on the target cluster. A commit on the target cluster is when..."

Could we add this sentence after "The commit-to-commit replication latency"?

alicia-l2 · 2024-11-05T20:27:05Z

src/current/v24.3/set-up-logical-data-replication.md

+
+For more details, refer to the LDR [Known limitations]({% link {{ page.version.version }}/set-up-logical-data-replication.md %}#known-limitations).
+
+When you run LDR in `immediate` mode, you cannot replicate a table with [SQL constraints]({% link {{ page.version.version }}/constraints.md %}). In `validated` mode, SQL constraints **must** match. 


SQL constraints --> it should be "you cannot replicate a table with foreign key constraints". Other constraints are fine.
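Following this correction, one hedged way to check a table before starting LDR in `immediate` mode is to list its foreign key constraints. This sketch uses CockroachDB's `SHOW CONSTRAINTS` statement as a subquery; the table name `app.public.orders` is hypothetical, so substitute the table you plan to replicate.

```sql
-- List any foreign key constraints on the table you plan to replicate.
-- app.public.orders is a hypothetical table name.
-- An empty result means no foreign keys were found on this table.
SELECT constraint_name, details
FROM [SHOW CONSTRAINTS FROM app.public.orders]
WHERE constraint_type = 'FOREIGN KEY';
```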

alicia-l2

LGTM added some small comments. thank you!

src/current/v24.3/logical-data-replication-overview.md

alicia-l2 · 2024-11-08T18:29:07Z

src/current/v24.3/logical-data-replication-overview.md

+
+Maintain [high availability]({% link {{ page.version.version }}/data-resilience.md %}#high-availability) with a two-datacenter topology. You can run bidirectional LDR to ensure [data resilience]({% link {{ page.version.version }}/data-resilience.md %}) in your deployment, particularly in datacenter or region failures. Both clusters can receive application reads and writes with low, single-region write latency. In a datacenter or cluster outage, you can redirect application traffic to the surviving cluster with [low downtime]({% link {{ page.version.version }}/data-resilience.md %}#high-availability). In the following diagram, the clusters are deployed in US East and West to provide low latency for that region. The two LDR jobs ensure that the tables on both clusters will reach eventual consistency.
+
+<image src="{{ 'images/v24.3/east-west-region.svg' | relative_url }}" alt="Diagram showing bidirectional LDR from cluster A to B and back again from cluster B to A." style="width:50%" />


small nit -- on top of the arrows can we write LDR stream #1 and LDR stream #2
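The "LDR stream #1" and "LDR stream #2" labels correspond to two separate LDR jobs, one started on each cluster, each pulling from the other. A minimal sketch of that shape, assuming the `CREATE LOGICAL REPLICATION STREAM FROM TABLE ... ON ... INTO TABLE ...` form and that the statement runs on the destination cluster. The table name and the external connection names `cluster_a_conn` and `cluster_b_conn` are hypothetical, and any options the second stream may need (such as the `WITH` option raised later in this review) are omitted, so treat the set-up page as authoritative.

```sql
-- LDR stream #1: run on cluster B to replicate from cluster A into B.
-- cluster_a_conn is a hypothetical external connection pointing at cluster A.
CREATE LOGICAL REPLICATION STREAM FROM TABLE app.public.orders
  ON 'external://cluster_a_conn'
  INTO TABLE app.public.orders;

-- LDR stream #2: run on cluster A to replicate from cluster B back into A.
-- cluster_b_conn is a hypothetical external connection pointing at cluster B.
CREATE LOGICAL REPLICATION STREAM FROM TABLE app.public.orders
  ON 'external://cluster_b_conn'
  INTO TABLE app.public.orders;
```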

alicia-l2 · 2024-11-08T18:33:24Z

src/current/v24.3/logical-data-replication-monitoring.md

+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+CREATE LOGICAL REPLICATION STREAM FROM TABLE {database.public.table_name} 


is this WITH option included in the setup page too?

I need to add this in a few places

msbutler

looks great! left mostly nits

src/current/v24.3/logical-data-replication-overview.md

msbutler · 2024-11-11T15:58:44Z

src/current/v24.3/logical-data-replication-overview.md

+
+### Achieve high availability and single-region write latency in two-datacenter deployments
+
+Maintain [high availability]({% link {{ page.version.version }}/data-resilience.md %}#high-availability) and resilience to region failures with a two-datacenter topology. You can run bidirectional LDR to ensure [data resilience]({% link {{ page.version.version }}/data-resilience.md %}) in your deployment, particularly during datacenter or region failures. If you set up two single-region clusters in LDR, both clusters can receive application reads and writes with low, single-region write latency. Then, in a datacenter, region, or cluster outage, you can redirect application traffic to the surviving cluster with [low downtime]({% link {{ page.version.version }}/data-resilience.md %}#high-availability). In the following diagram, the two single-region clusters are deployed in US East and West to provide low latency in each region. The two LDR jobs ensure that the tables on both clusters will reach eventual consistency.


@alicia-l2 are we planning to publicly document the pros and cons of using our MR feature suite vs. LDR for the first use case, as well as zone configs + execution locality backup/CDC vs. LDR for the second use case?

We can link to blog posts describing this for the first use case.
For second use case, @kathancox maybe we can try to get more specific about hardware/cluster specific isolation?

Maybe above both use cases we should add a note saying that we consider this a tool and an alternative deployment option to our native Raft/MR architecture.

Should we just link to the Data Resilience page? I think that covers some of this, particularly when it comes to MR feature comparison. I'm adding a note linking to that now.

Anything further in terms of execution locality + cdc comps, we should probably create another docs issue for that.

yeah, i think we should revisit how we publicly document the pros and cons of LDR vs PCR vs CRDB replication, but i don't think we need to block this PR.

OK yes, I think it's something that @alicia-l2 and I are working on gradually. We recently published this page https://www.cockroachlabs.com/docs/dev/data-resilience and created a new top-level section, so I think building out this area is a good idea.

src/current/v24.3/logical-data-replication-overview.md

msbutler · 2024-11-11T18:04:45Z

src/current/v24.3/logical-data-replication-monitoring.md

+
+### Responses
+
+{% include {{ page.version.version }}/ldr/show-logical-replication-responses.md %}


ugh, i may have to backport some changes to this table. this is fine to merge as is.
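For context on where these response fields appear, a minimal sketch of the statement this include documents. It assumes the `SHOW LOGICAL REPLICATION JOBS` syntax, plus a `WITH details` option for extra columns such as the `description` field added earlier in this review; verify both against the CREATE/SHOW SQL reference planned for a follow-up PR.

```sql
-- Returns one row per LDR job, with the fields in the responses table
-- (job_id, status, targets, replicated_time).
SHOW LOGICAL REPLICATION JOBS;

-- Assumed option: adds further columns, such as description.
SHOW LOGICAL REPLICATION JOBS WITH details;
```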

src/current/v24.3/logical-data-replication-monitoring.md

msbutler · 2024-11-11T18:14:34Z

src/current/v24.3/logical-data-replication-monitoring.md

+
+- `logical_replication.replicated_time_seconds`
+- `logical_replication.events_ingested`
+- `logical_replication.events_dlqed`


while we're here, you could add the .scanning_ranges and .lagging_ranges metrics. added on friday cockroachdb/cockroach@9eb6c8b

Thanks, I have added these to this label list
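A quick way to spot-check these metrics from SQL, sketched under the assumption that LDR metrics are registered in the node's metric registry under the `logical_replication.` prefix and therefore appear in `crdb_internal.node_metrics`. That table is internal, its schema is not guaranteed to be stable, and it reports values only for the node you are connected to. The `LIKE` pattern avoids hard-coding the newly added `scanning_ranges` and `lagging_ranges` names.

```sql
-- Assumption: LDR metrics use the logical_replication. prefix.
-- Values are for the gateway node only, not cluster-wide totals.
SELECT name, value
FROM crdb_internal.node_metrics
WHERE name LIKE 'logical_replication.%'
ORDER BY name;
```

For cluster-wide views and dashboards, the DB Console and a Prometheus-compatible scrape of the metrics endpoint remain the better fit.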

rmloveland

LGTM modulo that i strongly suggest if at all possible switching from scare quote "committed" to "applied" and then defining what "applied" means (see my comment) since COMMIT is already super duper in use and we should not be redefining it, it will only lead to tears IMO

src/current/v24.3/logical-data-replication-overview.md

rmloveland · 2024-11-14T15:53:49Z

src/current/_includes/v24.3/ldr/multiple-tables.md

@@ -0,0 +1 @@
+There are some tradeoffs between enabling one table per LDR job versus multiple tables in one LDR job. Multiple tables in one LDR job can be easier to operate. For example, if you pause and resume the single job, LDR will stop and resume for all the tables. However, the most granular observability will be at the job level. One table in one LDR job will allow for table-level observability.


lol i was just checking if this was an include
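To make the operational tradeoff concrete: with several tables in one LDR job, a single job-level command affects every table listed in the job's `targets`. A minimal sketch using the generic job control statements linked from the responses table; the job ID is hypothetical.

```sql
-- Find the LDR job ID and the tables it replicates (the targets column).
SHOW LOGICAL REPLICATION JOBS;

-- Hypothetical job ID. Pausing stops replication for every table in this job.
PAUSE JOB 1012877040439486465;

-- Resuming restarts replication for all of the job's tables together.
RESUME JOB 1012877040439486465;
```

With one table per job, the same commands act on a single table, which matches the job-level versus table-level observability tradeoff in the include text.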

rmloveland · 2024-11-14T15:56:50Z

src/current/_includes/v24.3/ldr/show-logical-replication-responses.md

+`job_id` | The job's ID. Use with [`CANCEL JOB`]({% link {{ page.version.version }}/cancel-job.md %}), [`PAUSE JOB`]({% link {{ page.version.version }}/pause-job.md %}), [`RESUME JOB`]({% link {{ page.version.version }}/resume-job.md %}), [`SHOW JOB`]({% link {{ page.version.version }}/show-jobs.md %}).
+`status` | Status of the job: `running`, `paused`, or `canceled`. {% comment  %}check these{% endcomment %}
+`targets` | The fully qualified name of the table(s) that are part of the LDR job.
+`replicated_time` | The latest timestamp at which the destination cluster has consistent data. This time advances automatically as long as the LDR job proceeds without error. `replicated_time` is updated periodically (every 30s). {% comment %}To confirm this line is accurate{% endcomment %}


timestamp could link to our TIMESTAMP type docs

rmloveland · 2024-11-14T16:01:54Z

src/current/v24.3/logical-data-replication-monitoring.md

+
+## Recommended LDR metrics to track
+
+- Replication latency: The commit-to-commit replication latency, which is tracked from when a row is committed on the source cluster, to when it is "committed" on the destination cluster. A _commit_ is when the LDR job either adds a row to the [dead letter queue (DLQ)]({% link {{ page.version.version }}/manage-logical-data-replication.md %}#dead-letter-queue-dlq) or applies a row successfully to the destination cluster.


I don't like the scare quotes around "committed" and i don't think overloading the term commit is a good idea

COMMIT is already very well defined semantically in the SQL world and in our docs

i'd actually reverse this and say "to when it is applied on the destination cluster" where "applied" means either:

COMMIT to target table, OR

inserted to DLQ (which is also a COMMIT)

pls ignore for now if there isn't time for this but i think it's a serious problem if we are sprinkling "committed" to mean "LDR's special notion of committed" around LDR docs when we really mean "applied" per the definition i'm using above which doesn't cause the same confusion IMO

Got it, so I updated here where I could to "apply". Ofc the metric name has "commit" in the syntax and the same is true for at least one changefeed metric.

src/current/v24.3/logical-data-replication-monitoring.md

rmloveland · 2024-11-14T16:05:07Z

src/current/v24.3/logical-data-replication-monitoring.md

+
+### Jobs page
+
+On the **Jobs** page, select:


same as above re: recommend linking to jobs page docs for those of us who live the memento lifestyle

kathancox · 2024-11-14T19:00:17Z

TFTRs!

kathancox force-pushed the ldr-overview-monitoring branch 3 times, most recently from cf4b5b4 to 62a9a70 November 5, 2024 19:50

kathancox changed the title ~~WIP: LDR overview and monitoring pages~~ LDR overview and monitoring pages Nov 5, 2024

alicia-l2 reviewed Nov 5, 2024 (5 reviews)

src/current/v24.3/logical-data-replication-overview.md (outdated, resolved)

kathancox force-pushed the ldr-overview-monitoring branch from 62a9a70 to fdd7d51 November 5, 2024 20:04

alicia-l2 approved these changes Nov 5, 2024

alicia-l2 reviewed Nov 5, 2024

kathancox force-pushed the ldr-overview-monitoring branch 2 times, most recently from 200f88e to fea64d0 November 5, 2024 21:47

kathancox marked this pull request as ready for review November 5, 2024 21:48

kathancox marked this pull request as draft November 7, 2024 16:11

kathancox force-pushed the ldr-overview-monitoring branch from fea64d0 to 31d37a6 November 7, 2024 19:10

kathancox marked this pull request as ready for review November 7, 2024 19:19

alicia-l2 approved these changes Nov 8, 2024

kathancox force-pushed the ldr-overview-monitoring branch from 31d37a6 to 7c1ba99 November 8, 2024 20:44

msbutler reviewed Nov 11, 2024

kathancox force-pushed the ldr-overview-monitoring branch from 7c1ba99 to f1df57f November 12, 2024 15:40

kathancox requested a review from msbutler November 12, 2024 15:50

msbutler approved these changes Nov 13, 2024

kathancox requested a review from rmloveland November 13, 2024 15:21

rmloveland approved these changes Nov 14, 2024

kathancox force-pushed the ldr-overview-monitoring branch 2 times, most recently from 0f7daee to 2e29c5a November 14, 2024 18:42

kathancox added 11 commits November 14, 2024 13:53

Feedback + known limitations · e39af5a
Feedback/corrections · 073b656
Update write modes · c9f07c7
Michael's feedback · 7e343df
Update schema change directions · 5f743dd
overview and monitoring pages · f53fa76
Feedback · 9c7506a
Alicia feedback · 021672f
Alicia feedback · 4e934b3
Michael's feedback · 955cc90
Rich's feedback · e869cad

kathancox force-pushed the ldr-overview-monitoring branch from 2e29c5a to e869cad November 14, 2024 18:53

kathancox merged commit 36c94c6 into main Nov 14, 2024
6 checks passed

kathancox deleted the ldr-overview-monitoring branch November 14, 2024 19:00
