Add resource-config pages for microbatch configs (#6575)
## What are you changing in this pull request and why?
We had a resource config page for `event_time` but not for `begin`,
`lookback`, or `batch_size` (all of which are configs for microbatch
incremental models). This seemed like a gap, so I've added pages for
`begin`, `lookback`, and `batch_size` and linked them from the
incremental-microbatch page.

## Checklist
- [ ] I have reviewed the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
so my content adheres to these guidelines.
- [ ] The topic I'm writing about is for specific dbt version(s) and I
have versioned it according to the [version a whole
page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version)
and/or [version a block of
content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content)
guidelines.
- [ ] I have added checklist item(s) to this list for anything that
needs to happen before this PR is merged, such as "needs technical
review" or "change base branch."
- [ ] The content in this PR requires a dbt release note, so I added one
to the [release notes
page](https://docs.getdbt.com/docs/dbt-versions/dbt-cloud-release-notes).
<!--
PRE-RELEASE VERSION OF dbt (if so, uncomment):
- [ ] Add a note to the prerelease version [Migration
Guide](https://github.com/dbt-labs/docs.getdbt.com/tree/current/website/docs/docs/dbt-versions/core-upgrade)
-->
<!-- 
ADDING OR REMOVING PAGES (if so, uncomment):
- [ ] Add/remove page in `website/sidebars.js`
- [ ] Provide a unique filename for new pages
- [ ] Add an entry for deleted pages in `website/vercel.json`
- [ ] Run link testing locally with `npm run build` to update the links
that point to deleted pages
-->

---------

Co-authored-by: Mirna Wong
Co-authored-by: Matt Shaver
3 people authored Dec 6, 2024
1 parent f1fe9f4 commit c274111
Showing 5 changed files with 175 additions and 6 deletions.
12 changes: 6 additions & 6 deletions website/docs/docs/build/incremental-microbatch.md
@@ -179,12 +179,12 @@ It does not matter whether the table already contains data for that day. Given t

Several configurations are relevant to microbatch models, and some are required:

| Config | Description | Default | Type | Required |
|----------|---------------|---------|------|---------|
| [`event_time`](/reference/resource-configs/event-time) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A | Column | Required |
| `begin` | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01'` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | Date | Required |
| `batch_size` | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | String | Required |
| `lookback` | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | Integer | Optional |
| Config | Type | Description | Default |
|----------|------|---------------|---------|
| [`event_time`](/reference/resource-configs/event-time) | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
| [`begin`](/reference/resource-configs/begin) | Date (required) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01'` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A |
| [`batch_size`](/reference/resource-configs/batch-size) | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A |
| [`lookback`](/reference/resource-configs/lookback) | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` |

<Lightbox src="/img/docs/building-a-dbt-project/microbatch/event_time.png" title="The event_time column configures the real-world time of this record"/>
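
To illustrate how the configs in the table fit together, here is a minimal sketch of a properties file for a daily microbatch model. The model name `user_sessions` follows the reference examples; the `session_start` column is a hypothetical placeholder, and the sketch assumes the model is materialized as `incremental` with the `microbatch` strategy.

```yml
models:
  - name: user_sessions
    config:
      materialized: incremental
      incremental_strategy: microbatch
      event_time: session_start   # hypothetical timestamp column on the model
      begin: "2024-01-01"         # lower bound for initial and full-refresh builds
      batch_size: day             # one batch per day
      lookback: 1                 # the default, shown explicitly
```

Setting the three required configs in one place keeps them easy to review together; `lookback` can be omitted when the default of `1` is acceptable.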

56 changes: 56 additions & 0 deletions website/docs/reference/resource-configs/batch_size.md
@@ -0,0 +1,56 @@
---
title: "batch_size"
id: "batch-size"
sidebar_label: "batch_size"
resource_types: [models]
description: "dbt uses `batch_size` to determine how large batches are when running a microbatch incremental model."
datatype: hour | day | month | year
---

Available in dbt Cloud Versionless and dbt Core v1.9 and higher.

## Definition

The `batch_size` config determines how large batches are when running a microbatch incremental model. Accepted values are `hour`, `day`, `month`, or `year`. You can configure `batch_size` for a [model](/docs/build/models) in your `dbt_project.yml` file, property YAML file, or config block.

## Examples

The following examples set `day` as the `batch_size` for the `user_sessions` model.

Example of the `batch_size` config in the `dbt_project.yml` file:

<File name='dbt_project.yml'>

```yml
models:
  my_project:
    user_sessions:
      +batch_size: day
```
</File>

Example in a properties YAML file:

<File name='models/properties.yml'>

```yml
models:
  - name: user_sessions
    config:
      batch_size: day
```

</File>

Example in a SQL model config block:

<File name="models/user_sessions.sql">

```sql
{{ config(
    batch_size='day'
) }}
```

</File>

55 changes: 55 additions & 0 deletions website/docs/reference/resource-configs/begin.md
@@ -0,0 +1,55 @@
---
title: "begin"
id: "begin"
sidebar_label: "begin"
resource_types: [models]
description: "dbt uses `begin` to determine when a microbatch incremental model should begin. When defined on a microbatch incremental model, `begin` is used as the lower time bound when the model is built for the first time or fully refreshed."
datatype: string
---

Available in dbt Cloud Versionless and dbt Core v1.9 and higher.

## Definition

Set the `begin` config to the timestamp at which your microbatch model's data should begin, that is, the point at which the data becomes relevant for the model. You can configure `begin` for a [model](/docs/build/models) in your `dbt_project.yml` file, property YAML file, or config block. The value for `begin` must be a string representing an ISO-formatted date or an ISO-formatted date and time.
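
Because the value may be a date alone or a date and time, a date-only form is also possible. The following is a minimal sketch in a properties file, reusing the `user_sessions` model name from the examples on this page; it assumes a date-only value is interpreted as the start of that day.

```yml
models:
  - name: user_sessions
    config:
      # Date-only ISO string; assumed equivalent to "2024-01-01 00:00:00"
      begin: "2024-01-01"
```

Quoting the value keeps YAML from parsing it as a native date, so it reaches dbt as the string the config expects.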

## Examples

The following examples set `2024-01-01 00:00:00` as the `begin` config for the `user_sessions` model.

Example in the `dbt_project.yml` file:

<File name='dbt_project.yml'>

```yml
models:
  my_project:
    user_sessions:
      +begin: "2024-01-01 00:00:00"
```
</File>

Example in a properties YAML file:

<File name='models/properties.yml'>

```yml
models:
  - name: user_sessions
    config:
      begin: "2024-01-01 00:00:00"
```

</File>

Example in a SQL model config block:

<File name="models/user_sessions.sql">

```sql
{{ config(
    begin='2024-01-01 00:00:00'
) }}
```

</File>
55 changes: 55 additions & 0 deletions website/docs/reference/resource-configs/lookback.md
@@ -0,0 +1,55 @@
---
title: "lookback"
id: "lookback"
sidebar_label: "lookback"
resource_types: [models]
description: "dbt uses `lookback` to determine how many 'batches' of `batch_size` to reprocess when a microbatch incremental model is running incrementally."
datatype: int
---

Available in dbt Cloud Versionless and dbt Core v1.9 and higher.

## Definition

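Set `lookback` to an integer greater than or equal to zero. The default value is `1`. During an incremental run, dbt reprocesses that many batches of `batch_size` prior to the latest bookmark, which captures late-arriving records. You can configure `lookback` for a [model](/docs/build/models) in your `dbt_project.yml` file, property YAML file, or config block.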
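
Because `lookback` counts batches rather than a fixed time span, its effect depends on `batch_size`. The following minimal sketch pairs the two configs in a properties file, reusing the `user_sessions` model name from the examples on this page; the values are illustrative only.

```yml
models:
  - name: user_sessions
    config:
      batch_size: hour
      # Reprocess the 3 hourly batches prior to the latest bookmark,
      # in addition to the latest batch itself
      lookback: 3
```

With `batch_size: day`, the same `lookback: 3` would instead cover three daily batches, so the two values are best chosen together.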

## Examples

The following examples set `2` as the `lookback` config for the `user_sessions` model.

Example in the `dbt_project.yml` file:

<File name='dbt_project.yml'>

```yml
models:
  my_project:
    user_sessions:
      +lookback: 2
```
</File>

Example in a properties YAML file:

<File name='models/properties.yml'>

```yml
models:
  - name: user_sessions
    config:
      lookback: 2
```

</File>

Example in a SQL model config block:

<File name="models/user_sessions.sql">

```sql
{{ config(
    lookback=2
) }}
```

</File>
3 changes: 3 additions & 0 deletions website/sidebars.js
@@ -926,6 +926,8 @@ const sidebarSettings = {
items: [
"reference/resource-configs/access",
"reference/resource-configs/alias",
"reference/resource-configs/batch-size",
"reference/resource-configs/begin",
"reference/resource-configs/database",
"reference/resource-configs/enabled",
"reference/resource-configs/event-time",
@@ -934,6 +936,7 @@ const sidebarSettings = {
"reference/resource-configs/grants",
"reference/resource-configs/group",
"reference/resource-configs/docs",
"reference/resource-configs/lookback",
"reference/resource-configs/persist_docs",
"reference/resource-configs/pre-hook-post-hook",
"reference/resource-configs/schema",
