Debezium migration (#17945)
* placeholders

* drafting

* dustin's revision

* shell and clipboard

* review

* jdbc link
gemma-shay authored Oct 10, 2023
1 parent efa06de commit 8fc4b42
Showing 6 changed files with 202 additions and 0 deletions.
1 change: 1 addition & 0 deletions src/current/_includes/v23.1/misc/tooling.md
@@ -93,6 +93,7 @@ For a list of tools supported by the CockroachDB community, see [Third-Party Too
| [Qlik Replicate](https://www.qlik.com/us/products/qlik-replicate) | November 2022 | Full | [Migrate and Replicate Data with Qlik Replicate]({% link {{ page.version.version }}/qlik.md %})
| [Striim](https://www.striim.com) | 4.1.2 | Full | [Migrate and Replicate Data with Striim]({% link {{ page.version.version }}/striim.md %})
| [Oracle GoldenGate](https://www.oracle.com/integration/goldengate/) | 21.3 | Partial | [Migrate and Replicate Data with Oracle GoldenGate]({% link {{ page.version.version }}/goldengate.md %})
| [Debezium](https://debezium.io/) | 2.4 | Full | [Migrate Data with Debezium]({% link {{ page.version.version }}/debezium.md %})

## Provisioning tools
| Tool | Latest tested version | Support level | Documentation |
6 changes: 6 additions & 0 deletions src/current/_includes/v23.1/sidebar-data/migrate.json
@@ -52,6 +52,12 @@
"urls": [
"/${VERSION}/goldengate.html"
]
},
{
"title": "Debezium",
"urls": [
"/${VERSION}/debezium.html"
]
}
]
},
1 change: 1 addition & 0 deletions src/current/_includes/v23.2/misc/tooling.md
@@ -93,6 +93,7 @@ For a list of tools supported by the CockroachDB community, see [Third-Party Too
| [Qlik Replicate](https://www.qlik.com/us/products/qlik-replicate) | November 2022 | Full | [Migrate and Replicate Data with Qlik Replicate]({% link {{ page.version.version }}/qlik.md %})
| [Striim](https://www.striim.com) | 4.1.2 | Full | [Migrate and Replicate Data with Striim]({% link {{ page.version.version }}/striim.md %})
| [Oracle GoldenGate](https://www.oracle.com/integration/goldengate/) | 21.3 | Partial | [Migrate and Replicate Data with Oracle GoldenGate]({% link {{ page.version.version }}/goldengate.md %})
| [Debezium](https://debezium.io/) | 2.4 | Full | [Migrate Data with Debezium]({% link {{ page.version.version }}/debezium.md %})

## Provisioning tools
| Tool | Latest tested version | Support level | Documentation |
6 changes: 6 additions & 0 deletions src/current/_includes/v23.2/sidebar-data/migrate.json
@@ -52,6 +52,12 @@
"urls": [
"/${VERSION}/goldengate.html"
]
},
{
"title": "Debezium",
"urls": [
"/${VERSION}/debezium.html"
]
}
]
},
94 changes: 94 additions & 0 deletions src/current/v23.1/debezium.md
@@ -0,0 +1,94 @@
---
title: Migrate Data with Debezium
summary: Use Debezium to migrate data to a CockroachDB cluster.
toc: true
docs_area: migrate
---

[Debezium](https://debezium.io/) is a self-hosted distributed platform that can read data from a variety of sources and import it into Kafka. You can use Debezium to [migrate data to CockroachDB](#migrate-data-to-cockroachdb) from another database that is accessible over the public internet.

As of this writing, Debezium supports the following database [sources](https://debezium.io/documentation/reference/stable/connectors/index.html):

- MongoDB
- MySQL
- PostgreSQL
- SQL Server
- Oracle
- Db2
- Cassandra
- Vitess (incubating)
- Spanner (incubating)
- JDBC (incubating)

{{site.data.alerts.callout_info}}
Migrating with Debezium requires familiarity with Kafka. Refer to the [Debezium documentation](https://debezium.io/documentation/reference/stable/architecture.html) for information on how Debezium is deployed with Kafka Connect.
{{site.data.alerts.end}}

## Before you begin

Complete the following items before using Debezium:

- Configure a secure [publicly-accessible]({% link cockroachcloud/network-authorization.md %}) CockroachDB cluster running the latest **{{ page.version.version }}** [production release](https://www.cockroachlabs.com/docs/releases/{{ page.version.version }}) with at least one [SQL user]({% link {{ page.version.version }}/security-reference/authorization.md %}#sql-users). Make a note of the credentials for the SQL user.
- Install and configure [Debezium](https://debezium.io/), [Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html), and [Kafka](https://kafka.apache.org/). This documentation assumes you have already added data from your [source database](https://debezium.io/documentation/reference/stable/connectors/index.html) to a Kafka topic.

## Migrate data to CockroachDB

Once all of the [prerequisite steps](#before-you-begin) are completed, you can use Debezium to migrate data to CockroachDB.

1. To write data from Kafka to CockroachDB, use the Confluent JDBC Sink Connector. First, use the following Dockerfile to create a custom image that includes the [JDBC driver](https://www.confluent.io/hub/confluentinc/kafka-connect-jdbc):

{% include_cached copy-clipboard.html %}
~~~ dockerfile
FROM quay.io/debezium/connect:latest
ENV KAFKA_CONNECT_JDBC_DIR=$KAFKA_CONNECT_PLUGINS_DIR/kafka-connect-jdbc
ARG POSTGRES_VERSION=latest
ARG KAFKA_JDBC_VERSION=latest
# Deploy the PostgreSQL JDBC driver
RUN cd /kafka/libs && curl -sO https://jdbc.postgresql.org/download/postgresql-$POSTGRES_VERSION.jar
# Deploy the Kafka Connect JDBC connector
RUN mkdir $KAFKA_CONNECT_JDBC_DIR && cd $KAFKA_CONNECT_JDBC_DIR && \
    curl -sO https://packages.confluent.io/maven/io/confluent/kafka-connect-jdbc/$KAFKA_JDBC_VERSION/kafka-connect-jdbc-$KAFKA_JDBC_VERSION.jar
~~~
1. Create the JSON configuration file that you will use to create the sink:
{% include_cached copy-clipboard.html %}
~~~ json
{
  "name": "pg-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "10",
    "topics": "{topic.example.table}",
    "connection.url": "jdbc:postgresql://{host}:{port}/{database}?sslmode=require",
    "connection.user": "{username}",
    "connection.password": "{password}",
    "insert.mode": "upsert",
    "pk.mode": "record_value",
    "pk.fields": "id",
    "database.time_zone": "UTC",
    "auto.create": true,
    "auto.evolve": false,
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
  }
}
~~~
Specify the **Connection URL** in [JDBC format]({% link {{ page.version.version }}/connect-to-the-database.md %}?filters=java&#step-5-connect-to-the-cluster). For information about where to find the CockroachDB connection parameters, see [Connect to a CockroachDB Cluster]({% link {{ page.version.version }}/connect-to-the-database.md %}).
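    As an illustrative sketch, the connection URL can be assembled from the individual CockroachDB connection parameters. The host, port, and database values below are placeholder assumptions; substitute your own:

    ```shell
    # Assemble the JDBC connection URL from the CockroachDB connection
    # parameters. All values below are placeholder assumptions.
    HOST="my-cluster.aws-us-east-1.cockroachlabs.cloud"
    PORT=26257
    DATABASE="defaultdb"

    CONNECTION_URL="jdbc:postgresql://${HOST}:${PORT}/${DATABASE}?sslmode=require"
    echo "${CONNECTION_URL}"
    ```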
1. To create the sink, `POST` the JSON configuration file to the Kafka Connect `/connectors` endpoint. Refer to the [Kafka Connect API documentation](https://kafka.apache.org/documentation/#connect_rest) for more information.
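    This step can be sketched as follows, assuming the configuration was saved as `pg-sink.json` and Kafka Connect is listening on `localhost:8083` (both the file name and the endpoint are assumptions; adjust for your deployment):

    ```shell
    # Write an abbreviated sink configuration to pg-sink.json for illustration.
    # In practice, use the full configuration from the previous step.
    cat > pg-sink.json <<'EOF'
    {
      "name": "pg-sink",
      "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector"
      }
    }
    EOF

    # Sanity-check that the file is valid JSON before sending it.
    python3 -m json.tool pg-sink.json > /dev/null && echo "pg-sink.json: valid JSON"

    # Register the connector (run once Kafka Connect is reachable):
    # curl -X POST -H "Content-Type: application/json" \
    #      --data @pg-sink.json http://localhost:8083/connectors
    ```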

## See also
- [Migration Overview]({% link {{ page.version.version }}/migration-overview.md %})
- [Schema Conversion Tool](https://www.cockroachlabs.com/docs/cockroachcloud/migrations-page)
- [Change Data Capture Overview]({% link {{ page.version.version }}/change-data-capture-overview.md %})
- [Third-Party Tools Supported by Cockroach Labs]({% link {{ page.version.version }}/third-party-database-tools.md %})
- [Stream a Changefeed to a Confluent Cloud Kafka Cluster]({% link {{ page.version.version }}/stream-a-changefeed-to-a-confluent-cloud-kafka-cluster.md %})
94 changes: 94 additions & 0 deletions src/current/v23.2/debezium.md
@@ -0,0 +1,94 @@
---
title: Migrate Data with Debezium
summary: Use Debezium to migrate data to a CockroachDB cluster.
toc: true
docs_area: migrate
---

[Debezium](https://debezium.io/) is a self-hosted distributed platform that can read data from a variety of sources and import it into Kafka. You can use Debezium to [migrate data to CockroachDB](#migrate-data-to-cockroachdb) from another database that is accessible over the public internet.

As of this writing, Debezium supports the following database [sources](https://debezium.io/documentation/reference/stable/connectors/index.html):

- MongoDB
- MySQL
- PostgreSQL
- SQL Server
- Oracle
- Db2
- Cassandra
- Vitess (incubating)
- Spanner (incubating)
- JDBC (incubating)

{{site.data.alerts.callout_info}}
Migrating with Debezium requires familiarity with Kafka. Refer to the [Debezium documentation](https://debezium.io/documentation/reference/stable/architecture.html) for information on how Debezium is deployed with Kafka Connect.
{{site.data.alerts.end}}

## Before you begin

Complete the following items before using Debezium:

- Configure a secure [publicly-accessible]({% link cockroachcloud/network-authorization.md %}) CockroachDB cluster running the latest **{{ page.version.version }}** [production release](https://www.cockroachlabs.com/docs/releases/{{ page.version.version }}) with at least one [SQL user]({% link {{ page.version.version }}/security-reference/authorization.md %}#sql-users). Make a note of the credentials for the SQL user.
- Install and configure [Debezium](https://debezium.io/), [Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html), and [Kafka](https://kafka.apache.org/). This documentation assumes you have already added data from your [source database](https://debezium.io/documentation/reference/stable/connectors/index.html) to a Kafka topic.

## Migrate data to CockroachDB

Once all of the [prerequisite steps](#before-you-begin) are completed, you can use Debezium to migrate data to CockroachDB.

1. To write data from Kafka to CockroachDB, use the Confluent JDBC Sink Connector. First, use the following Dockerfile to create a custom image that includes the [JDBC driver](https://www.confluent.io/hub/confluentinc/kafka-connect-jdbc):

{% include_cached copy-clipboard.html %}
~~~ dockerfile
FROM quay.io/debezium/connect:latest
ENV KAFKA_CONNECT_JDBC_DIR=$KAFKA_CONNECT_PLUGINS_DIR/kafka-connect-jdbc
ARG POSTGRES_VERSION=latest
ARG KAFKA_JDBC_VERSION=latest
# Deploy the PostgreSQL JDBC driver
RUN cd /kafka/libs && curl -sO https://jdbc.postgresql.org/download/postgresql-$POSTGRES_VERSION.jar
# Deploy the Kafka Connect JDBC connector
RUN mkdir $KAFKA_CONNECT_JDBC_DIR && cd $KAFKA_CONNECT_JDBC_DIR && \
    curl -sO https://packages.confluent.io/maven/io/confluent/kafka-connect-jdbc/$KAFKA_JDBC_VERSION/kafka-connect-jdbc-$KAFKA_JDBC_VERSION.jar
~~~
1. Create the JSON configuration file that you will use to create the sink:
{% include_cached copy-clipboard.html %}
~~~ json
{
  "name": "pg-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "10",
    "topics": "{topic.example.table}",
    "connection.url": "jdbc:postgresql://{host}:{port}/{database}?sslmode=require",
    "connection.user": "{username}",
    "connection.password": "{password}",
    "insert.mode": "upsert",
    "pk.mode": "record_value",
    "pk.fields": "id",
    "database.time_zone": "UTC",
    "auto.create": true,
    "auto.evolve": false,
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
  }
}
~~~
Specify the **Connection URL** in [JDBC format]({% link {{ page.version.version }}/connect-to-the-database.md %}?filters=java&#step-5-connect-to-the-cluster). For information about where to find the CockroachDB connection parameters, see [Connect to a CockroachDB Cluster]({% link {{ page.version.version }}/connect-to-the-database.md %}).
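    As an illustrative sketch, the connection URL can be assembled from the individual CockroachDB connection parameters. The host, port, and database values below are placeholder assumptions; substitute your own:

    ```shell
    # Assemble the JDBC connection URL from the CockroachDB connection
    # parameters. All values below are placeholder assumptions.
    HOST="my-cluster.aws-us-east-1.cockroachlabs.cloud"
    PORT=26257
    DATABASE="defaultdb"

    CONNECTION_URL="jdbc:postgresql://${HOST}:${PORT}/${DATABASE}?sslmode=require"
    echo "${CONNECTION_URL}"
    ```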
1. To create the sink, `POST` the JSON configuration file to the Kafka Connect `/connectors` endpoint. Refer to the [Kafka Connect API documentation](https://kafka.apache.org/documentation/#connect_rest) for more information.
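    This step can be sketched as follows, assuming the configuration was saved as `pg-sink.json` and Kafka Connect is listening on `localhost:8083` (both the file name and the endpoint are assumptions; adjust for your deployment):

    ```shell
    # Write an abbreviated sink configuration to pg-sink.json for illustration.
    # In practice, use the full configuration from the previous step.
    cat > pg-sink.json <<'EOF'
    {
      "name": "pg-sink",
      "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector"
      }
    }
    EOF

    # Sanity-check that the file is valid JSON before sending it.
    python3 -m json.tool pg-sink.json > /dev/null && echo "pg-sink.json: valid JSON"

    # Register the connector (run once Kafka Connect is reachable):
    # curl -X POST -H "Content-Type: application/json" \
    #      --data @pg-sink.json http://localhost:8083/connectors
    ```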

## See also
- [Migration Overview]({% link {{ page.version.version }}/migration-overview.md %})
- [Schema Conversion Tool](https://www.cockroachlabs.com/docs/cockroachcloud/migrations-page)
- [Change Data Capture Overview]({% link {{ page.version.version }}/change-data-capture-overview.md %})
- [Third-Party Tools Supported by Cockroach Labs]({% link {{ page.version.version }}/third-party-database-tools.md %})
- [Stream a Changefeed to a Confluent Cloud Kafka Cluster]({% link {{ page.version.version }}/stream-a-changefeed-to-a-confluent-cloud-kafka-cluster.md %})
