Commit

updates to options for detect and propogate schema changes
Adorism committed May 17, 2024
1 parent 1c3a6c4 commit 007395c
Showing 5 changed files with 109 additions and 4 deletions.
@@ -32,7 +32,7 @@ You can configure the following settings:
| [Schedule Type](/using-airbyte/core-concepts/sync-schedules.md) | How often data syncs (can be scheduled, cron, API-triggered or manual) |
| [Destination Namespace](/using-airbyte/core-concepts/namespaces.md) | Where the replicated data is written to in the destination |
| Destination Stream Prefix | A prefix added to each table name in the destination |
| [Detect and propagate schema changes](/cloud/managing-airbyte-cloud/manage-schema-changes.md) | How Airbyte handles schema changes in the source |
| [Detect and propagate schema changes](using-airbyte/schema-change-management.md) | How Airbyte handles schema changes in the source |
| [Connection Data Residency](/cloud/managing-airbyte-cloud/manage-data-residency.md) | Where data will be processed (Cloud only) |

## Modify Streams
@@ -33,7 +33,7 @@ If a sync starts to fail, it will automatically be disabled after multiple conse

If a new major version of the connector has been released, you will also see a banner on this page indicating the cutoff date for the version. Airbyte recommends upgrading before the cutoff date to ensure your data continues syncing. If you do not upgrade before the cutoff date, Airbyte will automatically disable your connection.

Learn more about version upgrades in our [resolving breaking change documentation](/cloud/managing-airbyte-cloud/manage-schema-changes#resolving-breaking-changes).
Learn more about version upgrades in our [resolving breaking change documentation](/using-airbyte/schema-change-management.md#resolving-breaking-changes).

## Review the stream status

2 changes: 1 addition & 1 deletion docs/using-airbyte/core-concepts/readme.md
@@ -30,7 +30,7 @@ A connection is an automated data pipeline that replicates data from a source to
| [Sync Mode](/using-airbyte/core-concepts/sync-modes/README.md) | How should the streams be replicated (read and written)? |
| [Sync Schedule](/using-airbyte/core-concepts/sync-schedules.md) | When should a data sync be triggered? |
| [Destination Namespace and Stream Prefix](/using-airbyte/core-concepts/namespaces.md) | Where should the replicated data be written? |
| [Schema Propagation](/cloud/managing-airbyte-cloud/manage-schema-changes.md) | How should Airbyte handle schema drift in sources? |
| [Schema Propagation](using-airbyte/schema-change-management.md) | How should Airbyte handle schema drift in sources? |

## Stream

105 changes: 105 additions & 0 deletions docs/using-airbyte/schema-change-management.md
@@ -0,0 +1,105 @@
---
products: all
---

# Schema Change Management

For each connection, you can specify how Airbyte handles schema changes in the source. Handling these changes automatically helps keep syncs accurate and reduces the manual effort of maintaining your data pipelines.

## Types of Schema Changes

With propagation enabled, Airbyte automatically updates the destination when the source schema changes. The behavior depends on the type of change:

| Type of Schema Change    | Propagation Behavior |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| New column | The new column will be created in the destination. Values for the column will be filled in for the updated rows. To fill in values for rows that were not updated, run a backfill by completing a full resync or by using the `Backfill new or renamed columns` option (see below). |
| Removal of column | The old column will be removed from the destination. |
| New stream | The first sync will create the new stream in the destination and fill all data in as if it is an initial sync. |
| Removal of stream | The stream will stop updating, and any existing data in the destination will remain. |
| Column data type changes | The data in the destination will remain the same. For those syncing on a Destinations V2 destination, any new or updated rows with incompatible data types will result in a row error in the destination tables and show an error in the `airbyte_meta` field. You will need to refresh the schema and do a full resync to ensure the data types are consistent. |
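
The change types in the table can be pictured as the result of comparing two snapshots of the source schema. The following is a minimal sketch for illustration only, not Airbyte's implementation: the `SchemaChange` class, the `diff_schemas` function, and the `{stream: {column: type}}` snapshot format are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaChange:
    # Hypothetical record of one detected change.
    kind: str  # new_stream, removed_stream, new_column, removed_column, type_change
    stream: str
    column: str | None = None


def diff_schemas(
    old: dict[str, dict[str, str]], new: dict[str, dict[str, str]]
) -> list[SchemaChange]:
    """Compare two {stream: {column: type}} snapshots and list the differences."""
    changes: list[SchemaChange] = []
    for stream in sorted(new.keys() - old.keys()):
        changes.append(SchemaChange("new_stream", stream))
    for stream in sorted(old.keys() - new.keys()):
        changes.append(SchemaChange("removed_stream", stream))
    for stream in sorted(old.keys() & new.keys()):
        old_cols, new_cols = old[stream], new[stream]
        for col in sorted(new_cols.keys() - old_cols.keys()):
            changes.append(SchemaChange("new_column", stream, col))
        for col in sorted(old_cols.keys() - new_cols.keys()):
            changes.append(SchemaChange("removed_column", stream, col))
        for col in sorted(old_cols.keys() & new_cols.keys()):
            if old_cols[col] != new_cols[col]:
                changes.append(SchemaChange("type_change", stream, col))
    return changes
```

In this sketch a renamed column shows up as one removed column plus one new column, because the comparison only looks at column names.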

## Detect and Propagate Schema Changes

The **Detect and propagate schema changes** setting controls how Airbyte responds when the source schema changes. Depending on the option you choose, Airbyte automatically propagates the changes, waits for your approval, or pauses the connection. The following options are available:

| Setting | Description |
| ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Propagate all field and stream changes | All stream and column changes in the source will automatically be propagated and reflected in the destination. This includes stream changes (additions or deletions), column changes (additions or deletions), and data type changes. |
| Propagate field changes only | Only column changes will be propagated. New or removed streams will be ignored. |
| Approve all changes myself | Schema changes will be detected but not propagated. Syncs will continue running with the schema you've set up until you manually approve the detected changes. |
| Stop future syncs | Connections will be automatically paused as soon as any schema changes are detected. |
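
The effect of each option on a set of detected non-breaking changes can be summarized in a short sketch. This is an illustration under stated assumptions, not Airbyte code: the `Preference` enum and `plan` function are hypothetical names, the change kinds are plain strings invented for the example, and data type changes are assumed to count as column changes under "Propagate field changes only".

```python
from enum import Enum


class Preference(Enum):
    # Values mirror the option labels in the table above.
    PROPAGATE_ALL = "Propagate all field and stream changes"
    PROPAGATE_FIELDS = "Propagate field changes only"
    APPROVE_MANUALLY = "Approve all changes myself"
    STOP_SYNCS = "Stop future syncs"


# Assumption: data type changes are treated as column-level changes.
COLUMN_CHANGES = {"new_column", "removed_column", "type_change"}


def plan(preference, detected):
    """Return (changes_to_propagate, pause_connection) for non-breaking changes."""
    if not detected:
        return [], False
    if preference is Preference.PROPAGATE_ALL:
        return list(detected), False
    if preference is Preference.PROPAGATE_FIELDS:
        return [kind for kind in detected if kind in COLUMN_CHANGES], False
    if preference is Preference.APPROVE_MANUALLY:
        # Changes are detected but wait for manual approval.
        return [], False
    # Preference.STOP_SYNCS: pause the connection on any detected change.
    return [], True
```

For example, with `["new_stream", "new_column"]` detected, `PROPAGATE_FIELDS` propagates only `new_column`, while `STOP_SYNCS` propagates nothing and pauses the connection.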

Airbyte currently checks for changes in your source schema immediately before syncing, at most once every 24 hours. As a result, a schema change made in the source may not be detected and propagated until a later sync.
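
To see why a change can be missed by the next sync, consider a rough model of the 24-hour limit. The `should_check_schema` function and `CHECK_INTERVAL` constant below are hypothetical and only illustrate the timing described above; they are not taken from Airbyte.

```python
from datetime import datetime, timedelta

CHECK_INTERVAL = timedelta(hours=24)


def should_check_schema(last_check, now):
    """Check before a sync only if no check ran within the last 24 hours."""
    return last_check is None or now - last_check >= CHECK_INTERVAL
```

Under this model, if a check ran at 08:00 and a column is added to the source at 09:00, a sync starting at 14:00 skips the check and runs with the old schema. The change is picked up by the first sync that starts at or after 08:00 the next day.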

:::tip
Ensure you receive schema notifications for your connection by enabling notifications in the connection's settings.
:::

In all cases, if a breaking schema change is detected, the connection will be paused immediately for manual review to prevent future syncs from failing. Breaking schema changes occur when:

- An existing primary key is removed from the source
- An existing cursor is removed from the source

To re-enable the streams, ensure the correct **Primary Key** and **Cursor** are selected for each stream and save the connection. You will be prompted to clear the affected streams so that Airbyte can ensure future syncs are successful.
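
The two breaking conditions listed above amount to checking that a stream's configured primary key and cursor still exist in the source. A minimal sketch, assuming a hypothetical `breaking_reasons` helper that receives the configured fields and the set of column names currently present in the source:

```python
from __future__ import annotations


def breaking_reasons(
    primary_key: list[str], cursor_field: str | None, source_columns: set[str]
) -> list[str]:
    """Return the reasons a stream's schema change is breaking (empty if none)."""
    reasons = []
    missing_pk = [col for col in primary_key if col not in source_columns]
    if missing_pk:
        reasons.append(f"primary key removed: {', '.join(missing_pk)}")
    if cursor_field is not None and cursor_field not in source_columns:
        reasons.append(f"cursor removed: {cursor_field}")
    return reasons
```

An empty list means the change is non-breaking and is handled according to the **Detect and propagate schema changes** setting. Any other result corresponds to the case where the connection is paused for manual review.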

### Backfill new or renamed columns

To further automate the propagation of schema changes, Airbyte also offers the option to backfill new or renamed columns as a part of the sync. This means that anytime a new column is detected through the auto-propagation of schema changes, Airbyte will sync the entire stream again so that all values in the new columns will be completely filled, even if the row was not updated. If this option is not enabled, only rows that are updated as a part of the regular sync will be populated with a value.

The backfill is only performed when `Detect and propagate schema changes` is set to `Propagate all field and stream changes` or `Propagate field changes only` and Airbyte detects the schema change as part of a sync. If you refresh the schema manually and apply the changes yourself, no backfill occurs.
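
The conditions for a backfill can be expressed as a single boolean check. The sketch below is illustrative only: `should_backfill` and the change-kind strings `new_column` and `renamed_column` are hypothetical names, and the option labels are taken from the settings table earlier on this page.

```python
# Options under which changes are propagated automatically.
AUTO_PROPAGATE = {
    "Propagate all field and stream changes",
    "Propagate field changes only",
}
# Hypothetical change kinds that qualify a stream for a backfill.
BACKFILL_KINDS = {"new_column", "renamed_column"}


def should_backfill(preference, backfill_enabled, detected_during_sync, change_kinds):
    """Return True only when every condition for an automatic backfill holds."""
    return (
        backfill_enabled
        and preference in AUTO_PROPAGATE
        and detected_during_sync  # manual schema refreshes do not qualify
        and bool(set(change_kinds) & BACKFILL_KINDS)
    )
```

For instance, a new column detected during a sync with "Propagate field changes only" and the backfill option enabled returns `True`, while the same change applied after a manual schema refresh returns `False`.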

:::tip
Enabling automatic backfills may incur increased destination costs from refreshing the entire stream.
:::

For Cloud users, any stream that contains a new or renamed column will not be billed and the free usage will be noted on the billing page. Streams that are synced in the same sync and do not contain a new or renamed column will be billed as usual.

## Review non-breaking schema changes

If the connection's **Detect and propagate schema changes** setting is **Approve all changes myself**, Airbyte continues syncing according to your last saved schema. You need to manually approve any detected schema changes for the schema in the destination to change.

1. In the Airbyte UI, click **Connections**. Select a connection and navigate to the **Schema** tab. If schema changes are detected, you'll see a blue "i" icon next to the Replication tab.

2. Click **Review changes**.

3. The **Refreshed source schema** dialog displays the changes detected.

4. Review the changes and click **OK** to close the dialog.

5. Scroll to the bottom of the page and click **Save changes**.

## Resolving breaking changes

Breaking changes require your attention to resolve. If the breaking change comes from your source, the connection may be disabled immediately. If it comes from a new major connector version, you can review the changes and manually upgrade the connector before a cutoff date.

A connection will always automatically be disabled if an existing primary key or cursor field is removed. You must review and fix the changes before editing the connection or resuming syncs.

Breaking changes can also occur when a new major version of the connector is released. In these cases, the connection will alert you of a breaking change but continue to sync until the cutoff date for the upgrade. On the cutoff date, the connection will automatically be disabled to prevent failure or unexpected behavior. It is **highly recommended** to upgrade before the cutoff date to ensure you continue syncing without interruption.

A major version upgrade will include a breaking change if any of these apply:

| Type of Change | Description |
| ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| Connector Spec Change | The configuration has been changed and syncs will fail until users reconfigure or re-authenticate. |
| Schema Change | The type of a property previously present within a record has changed and a refresh of the source schema is required. |
| Stream or Property Removal | Data that was previously being synced is no longer going to be synced. |
| Destination Format / Normalization Change | The way the destination writes the final data or how Airbyte cleans that data is changing in a way that requires a full refresh. |
| State Changes | The format of the source’s state has changed, and the full dataset will need to be re-synced. |

To review and fix breaking schema changes:

1. In the Airbyte UI, click **Connections** and select the connection with breaking changes.

2. Review the description of what has changed in the new version. The breaking change will require you to upgrade your source or destination to a new version by a specific cutoff date.

3. Update the source or destination to the new version to continue syncing. Follow the connector-specific migration guide to ensure your connections continue syncing successfully.

### Manually refresh the source schema

In addition to Airbyte's automatic schema change detection, you can manually refresh the source schema to stay up to date with changes in your schema. To manually refresh the source schema:

1. In the Airbyte UI, click **Connections** and then click the connection you want to refresh. Click the **Schema** tab.

2. In the **Select streams** table, click **Refresh source schema** to fetch the schema of your data source.

3. If there are changes to the schema, you can review them in the **Refreshed source schema** dialog.
2 changes: 1 addition & 1 deletion docusaurus/sidebars.js
@@ -458,7 +458,7 @@ module.exports = {
"using-airbyte/core-concepts/basic-normalization"
],
},
"cloud/managing-airbyte-cloud/manage-schema-changes",
"using-airbyte/schema-change-management",
{
type: "category",
label: "Transformations",