Merge pull request #1197 from elementary-data/ele-1790-period-param-r…
…enaming-finalization-docs

ELE-1790: Renaming, Detection Delay Docs
dapollak authored Nov 12, 2023
2 parents 92db102 + fd66a5e commit 84d2cc5
Showing 15 changed files with 178 additions and 147 deletions.
58 changes: 0 additions & 58 deletions docs/guides/anomaly-detection-configuration/backfill-days.mdx

This file was deleted.

64 changes: 0 additions & 64 deletions docs/guides/anomaly-detection-configuration/days-back.mdx

This file was deleted.

Original file line number Diff line number Diff line change
@@ -10,6 +10,7 @@ detection_delay:
```

The duration for retracting the detection period.
This is useful when the latest data should be excluded from the test. For example, because of scheduling issues, the test might run before the table is populated. The detection delay is the period of time to ignore, after the detection period.

- _Default: 0_
- _Relevant tests: Anomaly detection tests with `timestamp_column`_
@@ -50,4 +51,4 @@ vars:
#### How it works?
The `detection_delay` param only works for tests that have `timestamp_column` configuration.
It does not affect the other duration parameters, like `backfill_days` or `days_back`.
It does not affect the other duration parameters, like `detection_period` or `training_period`.
63 changes: 63 additions & 0 deletions docs/guides/anomaly-detection-configuration/detection-period.mdx
@@ -0,0 +1,63 @@
---
title: "detection_period"
sidebarTitle: "detection_period"
---

```
detection_period:
  period: < time period > # supported periods: day, week, month
  count: < number of periods >
```

Configuration to define the detection period.
If `detection_period` is set to 2 days, only data points in the last 2 days will be included in the detection period and could be flagged as anomalous.
If `detection_period` is set to 7 days, the detection period will be 7 days long.

For incremental models, this is also the period for re-calculating metrics.
If metrics for buckets in the detection period were already calculated, Elementary will overwrite them. The reason is to monitor recent backfills of data, if there were any.
This configuration should be changed according to your data delays.

- _Default: 2 days_
- _Relevant tests: Anomaly detection tests with `timestamp_column`_

<img src="/pics/anomalies/detection-period.png" alt="Detection Period" />

<RequestExample>

```yml test
models:
  - name: this_is_a_model
    tests:
      - elementary.volume_anomalies:
          detection_period:
            period: day
            count: 30
```
```yml model
models:
  - name: this_is_a_model
    config:
      elementary:
        detection_period:
          period: month
          count: 1
```
```yml dbt_project.yml
vars:
  detection_period:
    period: week
    count: 2
```
</RequestExample>
#### How it works?
The `detection_period` param only works for tests that have `timestamp_column` configuration.

It works differently according to the table materialization:

- **Regular tables and views** - `detection_period` defines the detection period.
- **Incremental models and sources** - `detection_period` defines the detection period, and the period for which metrics will be re-calculated.
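
The re-calculation behavior for incremental models can be pictured with a small sketch. This is not Elementary's implementation. The helper name `recalculate_metrics`, the daily buckets, and the dictionary storage are all assumptions made for illustration. It only shows the idea that buckets inside the detection period are overwritten while older buckets are left as stored.

```python
from datetime import date, timedelta


def recalculate_metrics(stored, fresh, run_date, detection_days=2):
    """Return stored metrics with detection-period buckets overwritten.

    Hypothetical helper: `stored` and `fresh` map a daily bucket (a date)
    to a metric value such as a row count.
    """
    window_start = run_date - timedelta(days=detection_days)
    updated = dict(stored)
    for bucket, value in fresh.items():
        # Only buckets inside the detection period are recalculated.
        if bucket >= window_start:
            updated[bucket] = value
    return updated


stored = {date(2023, 11, 9): 100, date(2023, 11, 10): 100, date(2023, 11, 11): 40}
fresh = {date(2023, 11, 9): 999, date(2023, 11, 10): 120, date(2023, 11, 11): 95}
result = recalculate_metrics(stored, fresh, run_date=date(2023, 11, 12))
```

With the default of 2 days, the bucket for November 9 keeps its stored value, while the two most recent buckets pick up the freshly computed values, which is how a late backfill would become visible to the test.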
69 changes: 69 additions & 0 deletions docs/guides/anomaly-detection-configuration/training-period.mdx
@@ -0,0 +1,69 @@
---
title: "training_period"
sidebarTitle: "training_period"
---

```
training_period:
  period: < time period > # supported periods: day, week, month
  count: < number of periods >
```

The maximum timeframe for which the test will collect data.
This timeframe includes both the training period and the detection period. If a detection delay is defined, the whole training period is delayed.

- _Default: 14 days_
- _Relevant tests: Anomaly detection tests with `timestamp_column`_

<img src="/pics/anomalies/training-period.png" alt="Training Period" />

<RequestExample>

```yml test
models:
  - name: this_is_a_model
    tests:
      - elementary.volume_anomalies:
          training_period:
            period: day
            count: 30
```
```yml model
models:
  - name: this_is_a_model
    config:
      elementary:
        training_period:
          period: week
          count: 1
```
```yml dbt_project.yml
vars:
  training_period:
    period: month
    count: 1
```
</RequestExample>
#### How it works?
The `training_period` param only works for tests that have `timestamp_column` configuration.

It works differently according to the table materialization:

- **Regular tables and views** - The values of the full `training_period` are calculated on each run.
- **Incremental models and sources** - The values of the full `training_period` are calculated on the first test run, and on full refresh. The following test runs will only calculate the values of the `detection_period`.
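
As a rough illustration of the two cases above, the sketch below computes the timeframe a single test run would cover. It is a simplified model under stated assumptions, not Elementary's code: the function name `collection_window`, the day-based arguments, and the `incremental` / `first_run` flags are hypothetical, and time bucket alignment is ignored.

```python
from datetime import datetime, timedelta


def collection_window(run_time, incremental, first_run,
                      training_days=14, detection_days=2, delay_days=0):
    """Return (start, end) of the timeframe a test run would collect."""
    # A detection delay shifts the whole window back from the run time.
    end = run_time - timedelta(days=delay_days)
    # Tables, views, and the first run (or a full refresh) of an incremental
    # model cover the full training period; later incremental runs cover
    # only the detection period.
    full = (not incremental) or first_run
    days = training_days if full else detection_days
    return end - timedelta(days=days), end


run = datetime(2023, 11, 12)
table_window = collection_window(run, incremental=False, first_run=False)
later_incremental = collection_window(run, incremental=True, first_run=False)
delayed_first_run = collection_window(run, incremental=True, first_run=True, delay_days=1)
```

Here `table_window` spans the full 14 days, `later_incremental` spans only the last 2 days, and `delayed_first_run` spans 14 days ending one day before the run.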

**Changes from default:**

- **Full time buckets** - Elementary will increase the `training_period` automatically to ensure full time buckets. For example, if the `time_bucket` of the test is `period: week`, and a `training_period` of 14 days starts on a Tuesday, the test will collect 2 more days back to complete a week (starting on Sunday).
- **Seasonality training set** - If seasonality is configured, Elementary will increase the `training_period` automatically to ensure there are enough training set values to calculate an anomaly. For example if the `seasonality` of the test is `day_of_week`, `training_period` will be increased to ensure enough Sundays, Mondays, Tuesdays, etc. to calculate an anomaly for each.
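
The full time buckets adjustment can be checked with a little date arithmetic. The sketch below is an illustration only: `extend_to_week_start` is a hypothetical helper, and it assumes weekly buckets that start on Sunday, as in the example above.

```python
from datetime import date, timedelta


def extend_to_week_start(training_start):
    """Move a start date back to the most recent Sunday (or keep it if it is one)."""
    # date.weekday() returns Monday=0 ... Sunday=6, so shift to count days since Sunday.
    days_since_sunday = (training_start.weekday() + 1) % 7
    return training_start - timedelta(days=days_since_sunday)


run_date = date(2023, 11, 14)                   # a Tuesday
training_start = run_date - timedelta(days=14)  # 14 days back is also a Tuesday
aligned = extend_to_week_start(training_start)  # moved 2 days back, to Sunday
```

In this case the training window grows from 14 to 16 days so that the first weekly bucket is complete.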

#### The impact of changing `training_period`

If you **increase `training_period`** your test training set will be larger. This means a larger sample size for calculating the expected range, which should make the test less sensitive to outliers. The result is a lower chance of false positive anomalies, but also less sensitivity, because anomalies have a higher threshold.

If you **decrease `training_period`** your test training set will be smaller. This means a smaller sample size for calculating the expected range, which might make the test more sensitive to outliers. The result is a higher chance of false positive anomalies, but also more sensitivity, because anomalies have a lower threshold.
9 changes: 6 additions & 3 deletions docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -29,9 +29,12 @@ No mandatory configuration, however it is highly recommended to configure a `tim
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/where-expression"><font color="#CD7D55">where_expression: sql expression</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/anomaly-sensitivity"><font color="#CD7D55">anomaly_sensitivity: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/anomaly-direction"><font color="#CD7D55">anomaly_direction: [both | spike | drop]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/days-back"><font color="#CD7D55">days_back: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/backfill-days"><font color="#CD7D55">backfill_days: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/min-training-set-size"><font color="#CD7D55">min_training_set_size: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">training_period:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">count: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">detection_period:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">count: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">time_bucket:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">count: int</font></a>
9 changes: 6 additions & 3 deletions docs/guides/anomaly-detection-tests/column-anomalies.mdx
@@ -26,9 +26,12 @@ No mandatory configuration, however it is highly recommended to configure a `tim
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/where-expression"><font color="#CD7D55">where_expression: sql expression</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/anomaly-sensitivity"><font color="#CD7D55">anomaly_sensitivity: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/anomaly-direction"><font color="#CD7D55">anomaly_direction: [both | spike | drop]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/days-back"><font color="#CD7D55">days_back: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/backfill-days"><font color="#CD7D55">backfill_days: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/min-training-set-size"><font color="#CD7D55">min_training_set_size: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">training_period:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">count: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">detection_period:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">count: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">time_bucket:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">count: int</font></a>
9 changes: 6 additions & 3 deletions docs/guides/anomaly-detection-tests/dimension-anomalies.mdx
@@ -26,9 +26,12 @@ _Required configuration: `dimensions`_
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/where-expression"><font color="#CD7D55">where_expression: sql expression</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/anomaly-sensitivity"><font color="#CD7D55">anomaly_sensitivity: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/anomaly-direction"><font color="#CD7D55">anomaly_direction: [both | spike | drop]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/days-back"><font color="#CD7D55">days_back: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/backfill-days"><font color="#CD7D55">backfill_days: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/min-training-set-size"><font color="#CD7D55">min_training_set_size: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">training_period:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">count: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">detection_period:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">count: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">time_bucket:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">count: int</font></a>
Original file line number Diff line number Diff line change
@@ -32,9 +32,12 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/update_timestamp_column"><font color="#CD7D55">update_timestamp_column: column name</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/where-expression"><font color="#CD7D55">where_expression: sql expression</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/anomaly-sensitivity"><font color="#CD7D55">anomaly_sensitivity: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/days-back"><font color="#CD7D55">days_back: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/backfill-days"><font color="#CD7D55">backfill_days: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/min-training-set-size"><font color="#CD7D55">min_training_set_size: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">training_period:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/training-period"><font color="#CD7D55">count: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">detection_period:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/detection-period"><font color="#CD7D55">count: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">time_bucket:</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/guides/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">count: int</font></a>