Merge pull request #340 from dbt-labs/feature-allow-excluding-models-…
…packages

Feature to allow excluding models and packages
b-per authored May 10, 2023
2 parents e3fa5f6 + fd43832 commit 6dc05ba
Showing 37 changed files with 188 additions and 13 deletions.
1 change: 1 addition & 0 deletions docs/contributing.md
@@ -24,6 +24,7 @@ Docs are then automatically pushed to the website as part of our CI/CD process.
"markdownlint.config": {
"ul-indent": {"indent": 4},
"MD036": false,
"MD046": false,
}
```

14 changes: 10 additions & 4 deletions docs/customization/customization.md
@@ -1,7 +1,14 @@
# Disabling Models
# Disabling checks from the package

If there is a particular model or set of models that you *do not want this package to execute*, you can
disable these models as you would any other model in your `dbt_project.yml` file
!!! note

    This section describes how to completely deactivate tests from the package.
    If you want to exclude models/sources from being tested, see [excluding packages and paths](excluding-packages-and-paths.md).

All the tests done as part of the package are tied to `fct` models.

If there is a particular test or set of tests that you *do not want this package to execute*, you can
disable the corresponding `fct` models as you would any other model in your `dbt_project.yml` file:

``` yaml title="dbt_project.yml"
models:
@@ -14,5 +21,4 @@ models:
# disable single DAG model
fct_model_fanout:
+enabled: false

```
57 changes: 57 additions & 0 deletions docs/customization/excluding-packages-and-paths.md
@@ -0,0 +1,57 @@
# Excluding packages or sources/models based on their path

!!! note

    This section describes how to entirely exclude models/sources and packages from being evaluated.
    If you want to document exceptions to the rules, see the section [on exceptions](exceptions.md);
    if you want to deactivate entire tests, follow the instructions on [this page](customization.md).

There might be cases where you want to exclude models/sources from being tested:

- they come from a package over which you have no control
- you are refactoring your project and want to exclude entire folders so that only the new models are held to best practices

In those cases, this package lets you exclude whole packages and/or models and sources based on their path.

## Configuration

The variables `exclude_packages` and `exclude_paths_from_project` let you define which packages and which paths should be excluded from being reported as errors.

- `exclude_packages` accepts a list of package names to exclude from the evaluation. To exclude all packages except the current project, set it to `["all"]`.
- `exclude_paths_from_project` accepts a list of regular expressions matching paths to exclude from the current project. Patterns are matched case-insensitively and can match anywhere in the string:
    - **for models**, the regex is matched against the string `<path/to/model.sql>`, which lets you exclude whole folders or individual models
    - **for sources**, the regex is matched against `<path/to/sources.yml>:<source_name>.<source_table_name>` *(the pattern differs from the one for models because the path alone doesn't let us exclude individual sources)*
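The following is a minimal Python sketch of the two match-string formats. It assumes project-relative file paths such as `models/marts/legacy/fct_orders.sql`, and every path, function name and pattern in it is hypothetical. It calls `re.search` with `re.IGNORECASE`, which is what the `set_is_excluded` macro calls through dbt's `modules.re`. As a result, a pattern matches anywhere in the string, regardless of case.

```python
import re


def model_match_string(original_file_path: str) -> str:
    # Models (hypothetical helper): the regex is matched against the file path alone.
    return original_file_path


def source_match_string(original_file_path: str, source_name: str, table_name: str) -> str:
    # Sources (hypothetical helper): the YAML path gets a ":<source_name>.<source_table_name>"
    # suffix so that tables declared in the same file can be told apart.
    return f"{original_file_path}:{source_name}.{table_name}"


def matches_any(match_string: str, patterns: list[str]) -> bool:
    # re.search looks for the pattern anywhere in the string; case is ignored.
    return any(re.search(p, match_string, re.IGNORECASE) for p in patterns)


patterns = ["/legacy/", "source_3.table_6"]  # hypothetical exclude_paths_from_project value
print(matches_any(model_match_string("models/marts/legacy/fct_orders.sql"), patterns))  # True
print(matches_any(model_match_string("models/marts/fct_orders.sql"), patterns))  # False
print(matches_any(source_match_string("models/staging/sources.yml", "source_3", "table_6"), patterns))  # True
```

The source string starts with the YAML path, so a folder pattern like `/legacy/` also excludes every source declared in a YAML file under a `legacy` folder.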

!!! note

    We currently don't allow excluding metrics and exposures: if those need to be excluded entirely, they can be deactivated in the project instead.

    If you have a specific use case requiring this ability, please raise a GitHub issue explaining the situation you'd like to solve, and we can revisit this decision!

### Example to exclude a whole package

```yaml title="dbt_project.yml"
vars:
  exclude_packages: ["upstream_package"]
```

### Example to exclude models/sources in a given path

```yaml title="dbt_project.yml"
vars:
  exclude_paths_from_project: ["/models/legacy/"]
```

### Example to exclude both a package and models/sources in 2 different paths

```yaml title="dbt_project.yml"
vars:
  exclude_packages: ["upstream_package"]
  exclude_paths_from_project: ["/models/legacy/", "/my_date_spine.sql"]
```
## Tips and tricks

Regular expressions are very powerful but can quickly become complex. After setting `exclude_paths_from_project`, we recommend running the package, inspecting the model `int_all_graph_resources`, and checking that the values in the `is_excluded` column match your expectations.

A useful tool to debug regular expressions is [regex101](https://regex101.com/). You can provide a pattern and a list of strings to see which ones actually match it.
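To test patterns locally, a short Python script can do the same job as regex101. The helper below is hypothetical and not part of the package. It uses `re.search` with `re.IGNORECASE`, the same call the `set_is_excluded` macro makes through `modules.re`, and lists which patterns hit each candidate string. That makes overly broad patterns easy to spot, for example an unescaped `.` that matches any character. The candidate paths are made up for illustration.

```python
import re


def pattern_hits(patterns: list[str], candidates: list[str]) -> dict[str, list[str]]:
    """Map each candidate string to the patterns that match it (case-insensitive search)."""
    return {
        candidate: [p for p in patterns if re.search(p, candidate, re.IGNORECASE)]
        for candidate in candidates
    }


patterns = ["/to_exclude/", "source_3.table_6"]
candidates = [  # hypothetical match strings
    "models/to_exclude/model_a.sql",
    "models/staging/source_1/source.yml:source_3.table_6",
    "models/staging/source_1/source.yml:source_3_table_6",  # "." also matches "_"
    "models/marts/fct_orders.sql",
]
for candidate, hits in pattern_hits(patterns, candidates).items():
    print(f"{candidate!r}: {hits or 'not excluded'}")

# Escaping the literal part restricts the match to an actual dot.
print(re.search(re.escape("source_3.table_6"), candidates[2]))  # None
```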
2 changes: 2 additions & 0 deletions integration_tests/dbt_project.yml
@@ -59,6 +59,8 @@ seeds:

vars:
# ensure integration tests run successfully when there are 0 of a given model type (extra)
exclude_packages: []
exclude_paths_from_project: ["/to_exclude/","source_3.table_6"]
model_types: ['staging', 'intermediate', 'marts', 'other', 'extra', 'new_model_type']
# dummy variable used for testing fct_hard_coded_references
my_table_reference: 'grace_table'
@@ -0,0 +1,3 @@
-- depends on: {{ ref('fct_model_9') }}

select 1 as id
@@ -0,0 +1,3 @@
-- depends on: {{ ref('fct_model_9') }}

select 1 as id
7 changes: 6 additions & 1 deletion integration_tests/models/staging/source_1/source.yml
@@ -17,4 +17,9 @@ sources:
schema: real_schema_2
# database: real_database
tables:
- name: table_3
- name: table_3

- name: source_3
schema: real_schema_3
tables:
- name: table_6
13 changes: 12 additions & 1 deletion macros/recursive_dag.sql
@@ -25,6 +25,7 @@ all_relationships (
parent_file_path,
parent_directory_path,
parent_file_name,
parent_is_excluded,
child_id,
child,
child_resource_type,
@@ -34,6 +35,7 @@ all_relationships (
child_file_path,
child_directory_path,
child_file_name,
child_is_excluded,
distance,
path,
is_dependent_on_chain_of_views
@@ -49,6 +51,7 @@ all_relationships (
file_path as parent_file_path,
directory_path as parent_directory_path,
file_name as parent_file_name,
is_excluded as parent_is_excluded,
resource_id as child_id,
resource_name as child,
resource_type as child_resource_type,
@@ -58,6 +61,7 @@ all_relationships (
file_path as child_file_path,
directory_path as child_directory_path,
file_name as child_file_name,
is_excluded as child_is_excluded,
0 as distance,
{{ dbt.array_construct(['resource_name']) }} as path,
cast(null as boolean) as is_dependent_on_chain_of_views
@@ -78,6 +82,7 @@ all_relationships (
all_relationships.parent_file_path as parent_file_path,
all_relationships.parent_directory_path as parent_directory_path,
all_relationships.parent_file_name as parent_file_name,
all_relationships.parent_is_excluded as parent_is_excluded,
direct_relationships.resource_id as child_id,
direct_relationships.resource_name as child,
direct_relationships.resource_type as child_resource_type,
@@ -87,6 +92,7 @@ all_relationships (
direct_relationships.file_path as child_file_path,
direct_relationships.directory_path as child_directory_path,
direct_relationships.file_name as child_file_name,
direct_relationships.is_excluded as child_is_excluded,
all_relationships.distance+1 as distance,
{{ dbt.array_append('all_relationships.path', 'direct_relationships.resource_name') }} as path,
case
@@ -139,7 +145,8 @@ with direct_relationships as (
resource_id as parent_id,
resource_id as child_id,
resource_name,
materialized as child_materialized
materialized as child_materialized,
is_excluded as child_is_excluded
from direct_relationships
)

@@ -148,6 +155,7 @@ with direct_relationships as (
parent_id,
child_id,
child_materialized,
child_is_excluded,
0 as distance,
{{ dbt.array_construct(['resource_name']) }} as path,
cast(null as boolean) as is_dependent_on_chain_of_views
@@ -161,6 +169,7 @@ with direct_relationships as (
cte_{{i - 1}}.parent_id as parent_id,
direct_relationships.resource_id as child_id,
direct_relationships.materialized as child_materialized,
direct_relationships.is_excluded as child_is_excluded,
cte_{{i - 1}}.distance+1 as distance,
{{ dbt.array_append(prev_cte_path, 'direct_relationships.resource_name') }} as path,
case
@@ -200,6 +209,7 @@ with direct_relationships as (
parent.file_path as parent_file_path,
parent.directory_path as parent_directory_path,
parent.file_name as parent_file_name,
parent.is_excluded as parent_is_excluded,
child.resource_id as child_id,
child.resource_name as child,
child.resource_type as child_resource_type,
@@ -209,6 +219,7 @@ with direct_relationships as (
child.file_path as child_file_path,
child.directory_path as child_directory_path,
child.file_name as child_file_name,
child.is_excluded as child_is_excluded,
all_relationships_unioned.distance,
all_relationships_unioned.path,
all_relationships_unioned.is_dependent_on_chain_of_views
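The changes above add `is_excluded` to both variants of the DAG query, the recursive CTE and the looped CTEs. The parent's flag is fixed by the anchor row, and each new child's flag is read from `direct_relationships` as the walk extends. The Python sketch below mimics that idea with a breadth-first walk over a hypothetical edge list. The data shapes and names are assumptions, it drops columns such as `is_dependent_on_chain_of_views`, and, like a dbt DAG, it assumes there are no cycles.

```python
from collections import deque


def all_relationships(edges: dict[str, list[str]], is_excluded: dict[str, bool]) -> list[dict]:
    """One row per (ancestor, descendant) path, including each resource paired with itself."""
    rows = []
    for start in is_excluded:
        # Anchor row: distance 0, parent and child are the same resource.
        queue = deque([(start, 0, [start])])
        while queue:
            child, distance, path = queue.popleft()
            rows.append(
                {
                    "parent": start,
                    "parent_is_excluded": is_excluded[start],  # fixed by the anchor row
                    "child": child,
                    "child_is_excluded": is_excluded[child],  # looked up at every step
                    "distance": distance,
                    "path": path,
                }
            )
            # Recursive step: extend the path by one direct child (assumes no cycles).
            for next_child in edges.get(child, []):
                queue.append((next_child, distance + 1, path + [next_child]))
    return rows


edges = {"source_a": ["stg_a"], "stg_a": ["fct_a"], "source_b": ["fct_a"]}
is_excluded = {"source_a": False, "stg_a": True, "fct_a": False, "source_b": False}

rows = all_relationships(edges, is_excluded)
# Direct relationships where neither side is excluded, similar to the fct_ model filters.
kept = [
    (r["parent"], r["child"])
    for r in rows
    if r["distance"] == 1 and not r["parent_is_excluded"] and not r["child_is_excluded"]
]
print(kept)  # [('source_b', 'fct_a')]
```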
39 changes: 39 additions & 0 deletions macros/set_is_excluded.sql
@@ -0,0 +1,39 @@
{% macro set_is_excluded(resource, resource_type) %}
{{ return(adapter.dispatch('set_is_excluded', 'dbt_project_evaluator')(resource, resource_type)) }}
{% endmacro %}

{% macro default__set_is_excluded(resource, resource_type) %}

{% set re = modules.re %}
{%- set ns = namespace(exclude=false) -%}

{% if resource_type == 'node' %}
{%- set resource_path = resource.original_file_path | replace("\\","\\\\") -%}
{% elif resource_type == 'source' %}
{%- set resource_path = resource.original_file_path | replace("\\","\\\\") ~ ":" ~ resource.fqn[-2] ~ "." ~ resource.fqn[-1] -%}
{% else %}
{{ exceptions.raise_compiler_error(
"`set_is_excluded()` macro does not support resource type: " ~ resource_type
) }}
{% endif %}


{#- we exclude the resource if it is from the current project and matches one of the patterns -#}
{%- for exclude_paths_pattern in var('exclude_paths_from_project',[]) -%}
{%- set matched_path = re.search(exclude_paths_pattern, resource_path, re.IGNORECASE) -%}
{%- if matched_path and resource.package_name == project_name %}
{% set ns.exclude = true %}
{%- endif -%}
{%- endfor -%}

{#- we exclude the resource if it comes from another package and that package is listed in `exclude_packages`, or if `exclude_packages` contains "all" -#}
{%- if (
resource.package_name != project_name)
and (resource.package_name in var('exclude_packages',[]) or 'all' in var('exclude_packages',[]))
-%}
{% set ns.exclude = true %}
{%- endif -%}

{{ return(ns.exclude) }}

{% endmacro %}
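For readers less used to Jinja, here is a rough Python equivalent of `default__set_is_excluded`, intended only as an illustration. The `Resource` dataclass and its fields are assumptions standing in for dbt's node and source objects. `project_name` and the two variables are passed in as arguments here, whereas the macro reads them from dbt's context.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical stand-in for a dbt node or source object.
    package_name: str
    original_file_path: str
    fqn: list[str] = field(default_factory=list)


def set_is_excluded(resource, resource_type, project_name, exclude_paths_from_project=(), exclude_packages=()):
    # Build the string the patterns are matched against (backslashes doubled, as in the macro).
    path = resource.original_file_path.replace("\\", "\\\\")
    if resource_type == "node":
        resource_path = path
    elif resource_type == "source":
        resource_path = f"{path}:{resource.fqn[-2]}.{resource.fqn[-1]}"
    else:
        raise ValueError(f"set_is_excluded() does not support resource type: {resource_type}")

    # Path patterns only apply to resources of the current project.
    if resource.package_name == project_name and any(
        re.search(p, resource_path, re.IGNORECASE) for p in exclude_paths_from_project
    ):
        return True

    # Resources from other packages are excluded if their package is listed or "all" is set.
    if resource.package_name != project_name and (
        resource.package_name in exclude_packages or "all" in exclude_packages
    ):
        return True

    return False


src = Resource(
    package_name="my_project",
    original_file_path="models/staging/source_1/source.yml",
    fqn=["my_project", "staging", "source_1", "source_3", "table_6"],
)
print(set_is_excluded(src, "source", "my_project", exclude_paths_from_project=["source_3.table_6"]))  # True

pkg_model = Resource(package_name="upstream_package", original_file_path="models/stg_x.sql")
print(set_is_excluded(pkg_model, "node", "my_project", exclude_packages=["all"]))  # True
print(set_is_excluded(pkg_model, "node", "my_project", exclude_paths_from_project=["stg_x"]))  # False
```

The two variables work independently. `exclude_paths_from_project` never affects resources from installed packages, and `exclude_packages` never affects the current project, even when it is set to `["all"]`.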
5 changes: 4 additions & 1 deletion macros/unpack/get_node_values.sql
@@ -11,6 +11,8 @@
{%- for node in nodes_list -%}

{%- set hard_coded_references = dbt_project_evaluator.find_all_hard_coded_references(node) -%}
{%- set exclude_node = dbt_project_evaluator.set_is_excluded(node, resource_type="node") -%}


{%- set values_line =
[
@@ -30,7 +32,8 @@
wrap_string_with_quotes(node.meta | tojson),
wrap_string_with_quotes(dbt.escape_single_quotes(hard_coded_references)),
wrap_string_with_quotes(node.get('depends_on',{}).get('macros',[]) | tojson),
"cast(" ~ dbt_project_evaluator.is_not_empty_string(node.test_metadata) | trim ~ " as boolean)"
"cast(" ~ dbt_project_evaluator.is_not_empty_string(node.test_metadata) | trim ~ " as boolean)",
"cast(" ~ exclude_node ~ " as boolean)",
]
%}

5 changes: 4 additions & 1 deletion macros/unpack/get_source_values.sql
@@ -10,6 +10,8 @@

{%- for node in nodes_list -%}

{%- set exclude_source = dbt_project_evaluator.set_is_excluded(node, resource_type="source") -%}

{%- set values_line =
[
wrap_string_with_quotes(node.unique_id),
@@ -27,7 +29,8 @@
wrap_string_with_quotes(node.package_name),
wrap_string_with_quotes(node.loader),
wrap_string_with_quotes(node.identifier),
wrap_string_with_quotes(node.meta | tojson)
wrap_string_with_quotes(node.meta | tojson),
"cast(" ~ exclude_source ~ " as boolean)",
]
%}
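In both `get_node_values` and `get_source_values`, `set_is_excluded` returns a Python boolean. Jinja's `~` operator converts its operands to strings before concatenating them. The new value in each row therefore renders as `cast(True as boolean)` or `cast(False as boolean)`, which SQL reads as the usual boolean literals because keywords are case-insensitive. The tiny Python sketch below reproduces that string construction; the function name is hypothetical.

```python
def cast_flag(flag: bool) -> str:
    # Mirrors the Jinja expression "cast(" ~ flag ~ " as boolean)":
    # `~` stringifies the Python bool, giving "True" or "False".
    return "cast(" + str(flag) + " as boolean)"


print(cast_flag(True))   # cast(True as boolean)
print(cast_flag(False))  # cast(False as boolean)
```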

7 changes: 6 additions & 1 deletion mkdocs.yml
@@ -9,13 +9,17 @@ theme:
# Palette toggle for light mode
- media: "(prefers-color-scheme: light)"
scheme: default
primary: custom
accent: custom
toggle:
icon: material/brightness-7
name: Switch to dark mode

# Palette toggle for dark mode
- media: "(prefers-color-scheme: dark)"
scheme: slate
primary: custom
accent: custom
toggle:
icon: material/brightness-4
name: Switch to light mode
@@ -71,9 +75,10 @@ nav:
- Structure: rules/structure.md
- Performance: rules/performance.md
- Customization:
- Disabling models: customization/customization.md
- Overriding variables: customization/overriding-variables.md
- Disabling checks: customization/customization.md
- Configuring exceptions to the rules: customization/exceptions.md
- Excluding packages and models/sources based on path: customization/excluding-packages-and-paths.md
- Display issues in the logs: customization/issues-in-log.md
- Run in CI Check: ci-check.md
- Querying the DAG: querying-the-dag.md
3 changes: 2 additions & 1 deletion models/marts/core/int_all_graph_resources.sql
@@ -96,7 +96,8 @@ joined as (
unioned_with_calc.loaded_at_field,
unioned_with_calc.loader,
unioned_with_calc.identifier,
unioned_with_calc.hard_coded_references -- NULL for non-model resources
unioned_with_calc.hard_coded_references, -- NULL for non-model resources
unioned_with_calc.is_excluded -- NULL for metrics and exposures

from unioned_with_calc
left join naming_convention_prefixes
3 changes: 2 additions & 1 deletion models/marts/core/int_direct_relationships.sql
@@ -11,7 +11,8 @@ all_graph_resources as (
file_name,
model_type,
materialized,
source_name
source_name,
is_excluded
from {{ ref('int_all_graph_resources') }}
),

2 changes: 2 additions & 0 deletions models/marts/dag/fct_direct_join_to_source.sql
@@ -6,6 +6,8 @@ with direct_model_relationships as (
from {{ ref('int_all_dag_relationships') }}
where child_resource_type = 'model'
and distance = 1
and not parent_is_excluded
and not child_is_excluded
),

model_and_source_joined as (
1 change: 1 addition & 0 deletions models/marts/dag/fct_duplicate_sources.sql
@@ -8,6 +8,7 @@ with sources as (
end as source_db_location
from {{ ref('int_all_graph_resources') }}
where resource_type = 'source'
and not is_excluded
-- we order the CTE so that listagg returns values correctly sorted for some warehouses
order by 1, 2
),
1 change: 1 addition & 0 deletions models/marts/dag/fct_hard_coded_references.sql
@@ -3,6 +3,7 @@
with models as (
select * from {{ ref('int_all_graph_resources') }}
where resource_type = 'model'
and not is_excluded
),

final as (
@@ -4,6 +4,8 @@ with direct_relationships as (
*
from {{ ref('int_all_dag_relationships') }}
where distance = 1
and not parent_is_excluded
and not child_is_excluded
),
final as (
select
2 changes: 2 additions & 0 deletions models/marts/dag/fct_model_fanout.sql
@@ -2,6 +2,8 @@ with all_dag_relationships as (
select
*
from {{ ref('int_all_dag_relationships') }}
where not parent_is_excluded
and not child_is_excluded
),

-- find all models without children
2 changes: 2 additions & 0 deletions models/marts/dag/fct_multiple_sources_joined.sql
@@ -6,6 +6,8 @@ with direct_source_relationships as (
from {{ ref('int_all_dag_relationships') }}
where distance = 1
and parent_resource_type = 'source'
and not parent_is_excluded
and not child_is_excluded
-- we order the CTE so that listagg returns values correctly sorted for some warehouses
order by 1, 2
),
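The updated `fct_` models filter with `not parent_is_excluded and not child_is_excluded` (or `not is_excluded`). Under SQL's three-valued logic, `not null` is itself null, and a `where` clause keeps only rows whose condition is true. Rows in which either flag is null are therefore dropped as well, and per the comment in `int_all_graph_resources`, metrics and exposures have a null flag. The Python sketch below emulates that behaviour, using `None` for SQL null; the helper names and rows are hypothetical.

```python
def sql_not(value):
    # SQL NOT: null stays null, otherwise negate.
    return None if value is None else not value


def sql_and(a, b):
    # SQL AND under three-valued logic: false wins, then null, then true.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def where(rows, predicate):
    # A WHERE clause keeps a row only when the predicate is exactly true.
    return [row for row in rows if predicate(row) is True]


rows = [
    {"parent": "stg_a", "child": "fct_a", "parent_is_excluded": False, "child_is_excluded": False},
    {"parent": "stg_b", "child": "fct_b", "parent_is_excluded": True, "child_is_excluded": False},
    {"parent": "fct_a", "child": "exposure_x", "parent_is_excluded": False, "child_is_excluded": None},
]
kept = where(rows, lambda r: sql_and(sql_not(r["parent_is_excluded"]), sql_not(r["child_is_excluded"])))
print([(r["parent"], r["child"]) for r in kept])  # [('stg_a', 'fct_a')]
```

Whether that is the desired behaviour depends on the check. Wrapping a flag as `coalesce(child_is_excluded, false)` is one way to treat null as "not excluded".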