
Commit

Add new changefeed DB Console graphs (#18011)
kathancox authored Nov 2, 2023
1 parent a64b31b commit 986ca68
Showing 17 changed files with 97 additions and 29 deletions.
2 changes: 1 addition & 1 deletion src/current/_includes/v23.2/ui/ui-metrics-navigation.md
@@ -2,4 +2,4 @@ Use the **Graph** menu to display metrics for your entire cluster or for a speci

To the right of the Graph and Dashboard menus, a time interval selector allows you to filter the view for a predefined or custom time interval. Use the navigation buttons to move to the previous, next, or current time interval. When you select a time interval, the same interval is selected in the [SQL Activity]({% link {{ page.version.version }}/ui-overview.md %}#sql-activity) pages. However, if you select 10 or 30 minutes, the interval defaults to 1 hour in SQL Activity pages.

When viewing graphs, a tooltip will appear at your mouse cursor providing further insight into the data under the mouse cursor. Click anywhere within the graph to pin the tooltip in place, decoupling the tooltip from your mouse movements. Click anywhere within the graph to cause the tooltip to follow your mouse once more.
When viewing graphs, two perpendicular lines appear at your mouse cursor to provide further insight into the data, and the metric values at that point are displayed in the legend under the graph. Click anywhere within the graph to pin the values in place, decoupling them from your mouse movements. Click anywhere within the graph again to make the values follow your mouse movements once more.
Binary file added src/current/images/v23.2/ui-changefeed-status.png
Binary file added src/current/images/v23.2/ui-commit-latency.png
Binary file added src/current/images/v23.2/ui-emitted-bytes.png
Binary file added src/current/images/v23.2/ui-sink-counts.png
Binary file removed src/current/images/v23.2/ui_changefeed_restarts.png
Binary file removed src/current/images/v23.2/ui_sink_byte_traffic.png
Binary file removed src/current/images/v23.2/ui_sink_counts.png
8 changes: 4 additions & 4 deletions src/current/v23.2/monitor-and-debug-changefeeds.md
@@ -5,6 +5,10 @@ toc: true
docs_area: stream_data
---

{{site.data.alerts.callout_info}}
Monitoring is only available for [{{ site.data.products.enterprise }} changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}#stream-row-level-changes-with-changefeeds).
{{site.data.alerts.end}}

Changefeeds work as jobs in CockroachDB, which allows for [monitoring](#monitor-a-changefeed) and [debugging](#debug-a-changefeed) through the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}) [**Jobs**]({% link {{ page.version.version }}/ui-jobs-page.md %}) page and [`SHOW JOBS`]({% link {{ page.version.version }}/show-jobs.md %}) SQL statements using the job ID.

<a name="changefeed-retry-errors"></a>
@@ -24,10 +28,6 @@ We recommend monitoring changefeeds with [Prometheus]({% link {{ page.version.ve

## Monitor a changefeed

{{site.data.alerts.callout_info}}
Monitoring is only available for {{ site.data.products.enterprise }} changefeeds.
{{site.data.alerts.end}}

Changefeed progress is exposed as a high-water timestamp that advances as the changefeed progresses. The high-water timestamp guarantees that all changes at or before that timestamp have been emitted. You can monitor a changefeed:

- On the [**Changefeeds** dashboard]({% link {{ page.version.version }}/ui-cdc-dashboard.md %}) of the DB Console.
116 changes: 92 additions & 24 deletions src/current/v23.2/ui-cdc-dashboard.md
@@ -9,65 +9,133 @@ The **Changefeeds** dashboard in the DB Console lets you monitor the [changefeed

To view this dashboard, [access the DB Console]({% link {{ page.version.version }}/ui-overview.md %}#db-console-access), click **Metrics** on the left-hand navigation bar, and then select **Dashboard** > **Changefeeds**.

{{site.data.alerts.callout_success}}
For more general recommendations on metrics and tooling options for change data capture monitoring, refer to the [Monitor and Debug Changefeeds]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}) page.
{{site.data.alerts.end}}

## Dashboard navigation

{% include {{ page.version.version }}/ui/ui-metrics-navigation.md %}

The **Changefeeds** dashboard displays the following time series graphs:

## Max Changefeed Latency
## Changefeed Status

This graph shows the maximum latency for resolved timestamps of any running changefeed.
This graph displays the number of changefeeds in each status: running, paused, or failed.

<img src="{{ 'images/v23.2/ui_max_changefeed.png' | relative_url }}" alt="DB Console Max Changefeed Latency graph" style="border:1px solid #eee;max-width:100%" />
<img src="{{ 'images/v23.2/ui-changefeed-status.png' | relative_url }}" alt="DB Console Changefeed Status graph showing running, paused, and failed changefeeds." style="border:1px solid #eee;max-width:100%" />

{{site.data.alerts.callout_info}}
The maximum latency for resolved timestamps is distinct from and slower than the commit-to-emit latency for individual change messages. For more information about resolved timestamps, see [Ordering guarantees]({% link {{ page.version.version }}/changefeed-messages.md %}#ordering-guarantees).
{{site.data.alerts.end}}
Metric | Description
--------|----
**Running** | The number of changefeeds running.
**Paused** | The number of [paused]({% link {{ page.version.version }}/pause-job.md %}) changefeeds.
**Failed** | The number of changefeeds that have failed.

Refer to [`RESUME JOB`]({% link {{ page.version.version }}/resume-job.md %}) and [`PAUSE JOB`]({% link {{ page.version.version }}/pause-job.md %}) to manage a changefeed.

In the case of a failed changefeed, you may want to use the [`cursor`]({% link {{ page.version.version }}/create-changefeed.md %}#cursor-option) option to start a new changefeed from where the failed one stopped. Refer to the [Start a new changefeed where another ended]({% link {{ page.version.version }}/create-changefeed.md %}#start-a-new-changefeed-where-another-ended) example.
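
As a minimal sketch of the `cursor` approach, the Python helper below builds a `CREATE CHANGEFEED` statement that starts from a failed changefeed's high-water timestamp. It assumes you have already read that timestamp from the failed job as a decimal HLC string (for example, `1700000000000000000.0000000000`). The table name and sink URI are hypothetical placeholders, and the statement omits any other options the original changefeed used.

```python
from decimal import Decimal, InvalidOperation


def cursor_statement(table: str, sink_uri: str, high_water: str) -> str:
    """Build a CREATE CHANGEFEED statement that starts at a failed job's high-water timestamp.

    `table` and `sink_uri` are hypothetical placeholders. Inputs are not escaped,
    so only use trusted values. Copy any other options (format, envelope, and so on)
    from the original changefeed yourself.
    """
    try:
        value = Decimal(high_water)  # decimal HLC: "<wall-clock ns>.<logical counter>"
    except InvalidOperation:
        raise ValueError(f"not a decimal HLC timestamp: {high_water!r}") from None
    if not value.is_finite() or value < 0:
        raise ValueError(f"not a decimal HLC timestamp: {high_water!r}")
    # The cursor value is quoted here as a string literal.
    return (
        f"CREATE CHANGEFEED FOR TABLE {table} INTO '{sink_uri}' "
        f"WITH cursor = '{high_water}';"
    )


print(cursor_statement("movr.rides", "kafka://broker:9092", "1700000000000000000.0000000000"))
```

Validating the timestamp before formatting catches a common copy-paste mistake (an empty or truncated value) before the statement reaches the cluster.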

## Commit Latency

This graph displays the 99th, 90th, and 50th percentiles of commit latency for running changefeeds. Commit latency is the difference between an event's MVCC timestamp and the time the [downstream sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) acknowledged receiving it.

<img src="{{ 'images/v23.2/ui-commit-latency.png' | relative_url }}" alt="DB Console Commit Latency graph showing the 99th, 90th, and 50th percentile of commit latency." style="border:1px solid #eee;max-width:100%" />

If the sink batches events, the recorded latency is the difference between the MVCC timestamp of the oldest event in the batch and the time the sink acknowledged the batch. Latency during a backfill is excluded.
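
To make the batching rule concrete, here is a minimal Python sketch, assuming hypothetical `Batch` records that carry each event's MVCC timestamp and the sink's acknowledgement time in nanoseconds. It measures each batch from its oldest event, skips batches emitted during a backfill, and reports the 50th, 90th, and 99th percentiles. It uses an exact nearest-rank percentile, which may not match how the DB Console computes its percentiles.

```python
import math
from dataclasses import dataclass


@dataclass
class Batch:
    """Hypothetical record of one batch delivered to the sink (all times in nanoseconds)."""
    event_mvcc_ts: list[int]  # MVCC timestamps of the events in the batch
    acked_at: int             # time the sink acknowledged the batch
    backfill: bool = False    # True if the batch was emitted during a backfill


def batch_latency(batch: Batch) -> int:
    # Latency is measured from the oldest event in the batch to the acknowledgement.
    return batch.acked_at - min(batch.event_mvcc_ts)


def nearest_rank(values: list[int], p: float) -> int:
    # Exact nearest-rank percentile over the collected samples.
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def commit_latency_percentiles(batches: list[Batch]) -> dict[int, int]:
    # Backfill batches are excluded, matching the description of the graph.
    samples = [batch_latency(b) for b in batches if not b.backfill]
    if not samples:
        return {}
    return {p: nearest_rank(samples, p) for p in (50, 90, 99)}


batches = [
    Batch([100, 120], acked_at=150),            # latency 50
    Batch([200], acked_at=230),                 # latency 30
    Batch([300, 310, 320], acked_at=400),       # latency 100
    Batch([0], acked_at=1_000, backfill=True),  # ignored
]
print(commit_latency_percentiles(batches))  # {50: 50, 90: 100, 99: 100}
```

Measuring from the oldest event means a large batch reports the worst-case wait of any event in it, not the average.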

## Sink Byte Traffic
## Emitted Bytes

This graph shows the number of bytes emitted by CockroachDB into the sink for changefeeds.
This graph shows the number of bytes emitted by CockroachDB into the changefeed's [downstream sink]({% link {{ page.version.version }}/changefeed-sinks.md %}).

<img src="{{ 'images/v23.2/ui_sink_byte_traffic.png' | relative_url }}" alt="DB Console Sink Byte Traffic graph" style="border:1px solid #eee;max-width:100%" />
<img src="{{ 'images/v23.2/ui-emitted-bytes.png' | relative_url }}" alt="DB Console Emitted Bytes graph showing emitted bytes over time" style="border:1px solid #eee;max-width:100%" />

Metric | Description
--------|----
**Emitted Bytes** | The number of bytes emitted by CockroachDB into the sink for changefeeds.
**Emitted Bytes** | The number of bytes emitted by CockroachDB into the configured changefeed sinks.

## Sink Counts

This graph shows:
This graph displays data relating to the number of messages and flushes at the changefeed sink.

- The number of messages that CockroachDB sent to the sink.
- The number of flushes that the sink performed for changefeeds.

<img src="{{ 'images/v23.2/ui_sink_counts.png' | relative_url }}" alt="DB Console Sink Counts graph" style="border:1px solid #eee;max-width:100%" />
<img src="{{ 'images/v23.2/ui-sink-counts.png' | relative_url }}" alt="DB Console Sink Counts graph" style="border:1px solid #eee;max-width:100%" />

Metric | Description
--------|----
**Messages** | The number of messages that CockroachDB sent to the sink for changefeeds.
**Flushes** | The number of flushes that the sink performed for changefeeds.
**Messages** | The number of messages that CockroachDB sent to the downstream sink.
**Flushes** | The total number of flushes to the sink.
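
The two series can be combined to estimate how many messages are batched into each flush. A minimal sketch, assuming you have two readings of each cumulative count taken at the same moments (for example, from your own metrics export); this ratio is a derived value, not a metric the DB Console plots.

```python
from typing import Optional


def messages_per_flush(
    msgs_start: int, msgs_end: int, flushes_start: int, flushes_end: int
) -> Optional[float]:
    """Average messages per flush between two readings of cumulative counts (hypothetical inputs)."""
    msgs = msgs_end - msgs_start
    flushes = flushes_end - flushes_start
    if msgs < 0 or flushes < 0:
        # A cumulative count went backwards, for example after a node restart.
        raise ValueError("count decreased between readings; discard this window")
    if flushes == 0:
        return None  # no flushes in the window, so the ratio is undefined
    return msgs / flushes


print(messages_per_flush(1_000, 5_000, 10, 50))  # 100.0
```

A result near 1 means roughly one flush per message; larger values mean more messages share each flush.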

## Sink Timings
## Max Checkpoint Latency

This graph shows:
This graph displays the maximum amount of time by which any changefeed's persisted [checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) lags behind the present time. Larger values indicate issues with successfully ingesting or emitting changes. If errors cause a changefeed to restart, or the changefeed is [paused]({% link {{ page.version.version }}/pause-job.md %}) and then resumed, data emitted since the last checkpoint may be re-emitted.

- The time in milliseconds per second required by CockroachDB to send messages to the sink.
- The time CockroachDB spent waiting for the sink to flush the messages for changefeeds.
<img src="{{ 'images/v23.2/ui-max-checkpoint-latency.png' | relative_url }}" alt="DB Console Max Checkpoint Latency graph" style="border:1px solid #eee;max-width:100%" />

<img src="{{ 'images/v23.2/ui_sink_timings.png' | relative_url }}" alt="DB Console Sink Timings graph" style="border:1px solid #eee;max-width:100%" />
{{site.data.alerts.callout_info}}
The maximum checkpoint latency is distinct from, and slower than, the commit latency for individual change messages. For more information about resolved timestamps, refer to the [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}#resolved-messages) page.
{{site.data.alerts.end}}
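
The lag this graph plots can be sketched as the present time minus each changefeed's checkpoint, taking the maximum. The Python below assumes the checkpoints are available as decimal HLC strings keyed by hypothetical job IDs; the integer part of a decimal HLC timestamp is wall-clock nanoseconds since the Unix epoch, and the fractional part is a logical counter.

```python
import time
from decimal import Decimal
from typing import Optional


def hlc_wall_ns(hlc: str) -> int:
    # Decimal HLC "<wall-clock ns>.<logical counter>": keep only the wall-clock part.
    return int(Decimal(hlc))


def max_checkpoint_lag_seconds(checkpoints: dict[str, str], now_ns: Optional[int] = None) -> float:
    """Largest gap, in seconds, between now and any changefeed's persisted checkpoint."""
    now = time.time_ns() if now_ns is None else now_ns
    lags = ((now - hlc_wall_ns(ts)) / 1e9 for ts in checkpoints.values())
    return max(lags, default=0.0)  # no changefeeds means no lag


checkpoints = {
    "job-a": "1700000000000000000.0000000000",  # 30 s behind now_ns below
    "job-b": "1699999940000000000.0000000000",  # 90 s behind now_ns below
}
print(max_checkpoint_lag_seconds(checkpoints, now_ns=1_700_000_030_000_000_000))  # 90.0
```

Because the graph reports the maximum, a single stalled changefeed dominates it even when every other changefeed is healthy.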

## Changefeed Restarts

This graph displays the number of times changefeeds restarted due to [retryable errors]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#changefeed-retry-errors).

<img src="{{ 'images/v23.2/ui-changefeed-restarts.png' | relative_url }}" alt="DB Console Changefeed Restarts graph" style="border:1px solid #eee;max-width:100%" />

Metric | Description
--------|----
**Message Emit Time** | The time in milliseconds per second required by CockroachDB to send messages to the sink for changefeeds.
**Flush Time** | The time in milliseconds per second that CockroachDB spent waiting for the sink to flush the messages for changefeeds.
**Retryable Errors** | The number of retryable errors that changefeeds are encountering, that is, errors that the changefeed automatically retries.

## Changefeed Restarts
## Oldest Protected Timestamp

This graph displays the oldest [protected timestamp]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) of any running changefeed on the cluster.

<img src="{{ 'images/v23.2/ui-oldest-protected-timestamp.png' | relative_url }}" alt="DB Console Oldest Protected Timestamp graph" style="border:1px solid #eee;max-width:100%" />

Metric | Description
--------|----
**Protected Timestamp Age** | The age of the oldest protected timestamp (of any changefeed) that is protecting data from being [garbage collected]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection).

## Backfill Pending Ranges

This graph displays the number of ranges being backfilled that are yet to enter the changefeed pipeline. An [initial scan]({% link {{ page.version.version }}/create-changefeed.md %}#initial-scan) or [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) can cause a backfill.

<img src="{{ 'images/v23.2/ui-backfill-pending-ranges.png' | relative_url }}" alt="DB Console Backfill Pending Ranges graph" style="border:1px solid #eee;max-width:100%" />

Metric | Description
--------|----
**Backfill Pending Ranges** | The number of ranges in the backfill process that have not yet entered the changefeed stream.

## Schema Registry Registrations

This graph displays the rate of schema registration requests made by CockroachDB nodes to a configured schema registry endpoint, for example, when a [Kafka sink]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka) points to a [Confluent Schema Registry]({% link {{ page.version.version }}/stream-a-changefeed-to-a-confluent-cloud-kafka-cluster.md %}).

This graph displays the number of times changefeeds restarted due to retryable errors.
<img src="{{ 'images/v23.2/ui-schema-registry-registrations.png' | relative_url }}" alt="DB Console Schema Registry Registrations graph" style="border:1px solid #eee;max-width:100%" />

Metric | Description
--------|----
**Schema Registry Registrations** | The number of registrations to a downstream schema registry.

<img src="{{ 'images/v23.2/ui_changefeed_restarts.png' | relative_url }}" alt="DB Console Changefeed Restarts graph" style="border:1px solid #eee;max-width:100%" />
## Ranges in Catchup Mode

This graph displays the total number of ranges with an active [rangefeed]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) that is performing a catchup scan.

<img src="{{ 'images/v23.2/ui-ranges-in-catchup-mode.png' | relative_url }}" alt="DB Console Ranges in Catchup Mode graph" style="border:1px solid #eee;max-width:100%" />

Metric | Description
--------|----
**Ranges** | The number of ranges that are performing a catchup scan.

## Rangefeed Catchup Scans Duration

This graph displays the duration of catchup scans that changefeeds are performing.

<img src="{{ 'images/v23.2/ui-rangefeed-catchup-scans-duration.png' | relative_url }}" alt="DB Console Rangefeed Catchup Scans Duration graph" style="border:1px solid #eee;max-width:100%" />

Metric | Description
--------|----
**(Node Hostname)** | The duration of the catchup scan displayed per node.

{% include {{ page.version.version }}/ui/ui-summary-events.md %}
