Merge pull request #441 from dbt-labs/feature/calculate-sql-complexity
b-per authored Apr 25, 2024
2 parents e6bbe8f + 2f63142 commit 8c7d958
Showing 10 changed files with 100 additions and 3 deletions.
20 changes: 20 additions & 0 deletions dbt_project.yml
@@ -90,3 +90,23 @@ vars:

# -- Code complexity variables --
comment_chars: ["--"]
token_costs: {
"and": 0.1,
"or": 0.1,
"when": 0.5,
"coalesce": 1,
"distinct": 1,
"greatest": 1,
"least": 1,
"group": 1,
"join": 1,
"order": 1,
"select": 1,
"where": 1,
"having": 2,
"flatten": 3,
"unnest": 3,
"pivot": 3,
"partition by": 3,
"qualify": 3,
}
7 changes: 7 additions & 0 deletions docs/customization/overriding-variables.md
@@ -92,6 +92,13 @@ vars:
chained_views_threshold: 8
```

## SQL code analysis

| variable | description | default |
| ----------- | ----------- | ----------- |
| `comment_chars` | a list of strings used for inline comments | `["--"]` |
| `token_costs` | a dictionary of SQL tokens (words) and their associated complexity weights, <br>used to estimate model complexity | see the `dbt_project.yml` file of the package |

## Execution

| variable | description | default |
@@ -1,6 +1,8 @@
# Querying columns with SQL
# Querying columns names and descriptions with SQL

The model `stg_columns` ([source](https://github.com/dbt-labs/dbt-project-evaluator/tree/main/models/staging/graph/stg_columns.sql)), created with the package, lists all the columns from all the dbt nodes (models, sources, tests, snapshots)
The model `stg_columns` ([source](https://github.com/dbt-labs/dbt-project-evaluator/tree/main/models/staging/graph/stg_columns.sql)), created with the package, lists all the columns configured in all the dbt nodes (models, sources, tests, snapshots).

It will not list the columns of the models that have not explicitly been added to the YAML files.

You can use this model to help with questions such as:

2 changes: 2 additions & 0 deletions docs/querying-the-dag.md
@@ -17,6 +17,8 @@ Building additional models and snapshots on top of this model could allow:

## Getting insights on potential refactoring work

- identifying models with a lot of lines of code
- identifying the models with the highest level of complexity leveraging the column `sql_complexity` from the table `int_all_graph_resources`, based on the weights defined in the `token_costs` variable
- looking at the longest "chains" of models in a project
- reviewing models with many/few direct dependents
- identifying potential bottlenecks
21 changes: 21 additions & 0 deletions macros/calculate_number_lines.sql
@@ -0,0 +1,21 @@
{% macro calculate_number_lines(node) %}
{{ return(adapter.dispatch('calculate_number_lines', 'dbt_project_evaluator')(node)) }}
{% endmacro %}

{% macro default__calculate_number_lines(node) %}

{% if node.resource_type == 'model' %}

{% if execute %}
{%- set model_raw_sql = node.raw_sql or node.raw_code -%}
{%- else -%}
{%- set model_raw_sql = '' -%}
{%- endif -%}

{{ return(model_raw_sql.count("\n") + 1) }}

{% endif %}

{{ return(0) }}

{% endmacro %}
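
As a rough illustration of the line-count logic, here is a minimal Python sketch. It assumes the macro is meant to return the number of newline characters plus one; the function name `number_of_lines` is hypothetical and not part of the package.

```python
def number_of_lines(raw_sql: str) -> int:
    """Count lines as the number of newline characters plus one."""
    # Assumption: this mirrors the intent of calculate_number_lines,
    # where raw_sql stands in for node.raw_sql / node.raw_code.
    return raw_sql.count("\n") + 1


print(number_of_lines("select 1"))              # single line, no newline
print(number_of_lines("select\n  1\nfrom t"))   # two newlines
```

With this approach a model file that ends with a trailing newline is counted as having one extra line, and an empty string counts as one line.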
37 changes: 37 additions & 0 deletions macros/calculate_sql_complexity.sql
@@ -0,0 +1,37 @@
{% macro calculate_sql_complexity(node) %}
{{ return(adapter.dispatch('calculate_sql_complexity', 'dbt_project_evaluator')(node)) }}
{% endmacro %}

{% macro default__calculate_sql_complexity(node) %}

{% if node.resource_type == 'model' and node.language == 'sql' %}

{% if execute %}
{%- set model_raw_sql = node.raw_sql or node.raw_code -%}
{%- else -%}
{%- set model_raw_sql = '' -%}
{%- endif -%}

{%- set re = modules.re -%}
{%- set ns = namespace(complexity = 0) -%}

{# we remove the comments that start with -- , or other characters configured #}
{%- set comment_chars_match = "(" ~ var('comment_chars') | join("|") ~ ").*" -%}
{%- set model_raw_sql_no_comments = re.sub(comment_chars_match, '', model_raw_sql) -%}

{%- for token, token_cost in var('token_costs').items() -%}

{# this is not 100% perfect, but it checks more or less whether the token exists as a word by itself or followed by "(", as for least()/greatest() #}
{%- set token_with_boundaries = "\\b" ~ token ~ "[\\t\\r\\n (]" -%}
{%- set all_regex_matches = re.findall(token_with_boundaries, model_raw_sql_no_comments, re.IGNORECASE) -%}
{%- set ns.complexity = ns.complexity + token_cost * (all_regex_matches | length) -%}

{%- endfor -%}

{{ return(ns.complexity) }}

{% endif %}

{{ return(0) }}

{% endmacro %}
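
The scoring in the macro above can be sketched in plain Python, since the macro calls Python's `re` module through `modules.re`. This is a sketch under assumptions, not the package's code: the function name `sql_complexity` is hypothetical, and `TOKEN_COSTS` is only a subset of the defaults shown in `dbt_project.yml`.

```python
import re

# Illustrative subset of the default `token_costs` variable (assumption:
# the real variable holds more tokens, see dbt_project.yml in this commit).
TOKEN_COSTS = {"and": 0.1, "select": 1, "join": 1, "where": 1, "coalesce": 1}
COMMENT_CHARS = ["--"]


def sql_complexity(raw_sql, token_costs=TOKEN_COSTS, comment_chars=COMMENT_CHARS):
    """Sum cost * number_of_matches over all configured tokens."""
    # Drop everything from a comment marker to the end of its line.
    comment_pattern = "(" + "|".join(comment_chars) + ").*"
    sql_no_comments = re.sub(comment_pattern, "", raw_sql)

    complexity = 0
    for token, cost in token_costs.items():
        # Word boundary before the token; whitespace or "(" right after it.
        pattern = r"\b" + token + r"[\t\r\n (]"
        matches = re.findall(pattern, sql_no_comments, re.IGNORECASE)
        complexity += cost * len(matches)
    return complexity


example_sql = """
select a.id, coalesce(b.name, 'n/a') as name -- select join where
from a
join b on a.id = b.id
where a.active and b.active
"""
print(sql_complexity(example_sql))
```

In the example, the keywords inside the trailing comment are stripped before matching, so each of the five tokens is counted once. A keyword that is not followed by whitespace or `(`, such as `select*`, would not be counted under this pattern, which is consistent with the macro's own note that the check is not perfect.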
4 changes: 4 additions & 0 deletions macros/unpack/get_node_values.sql
@@ -11,6 +11,8 @@
{%- for node in nodes_list -%}

{%- set hard_coded_references = dbt_project_evaluator.find_all_hard_coded_references(node) -%}
{%- set number_lines = dbt_project_evaluator.calculate_number_lines(node) -%}
{%- set sql_complexity = dbt_project_evaluator.calculate_sql_complexity(node) -%}
{%- set contract = node.contract.enforced if node.contract else false -%}
{%- set exclude_node = dbt_project_evaluator.set_is_excluded(node, resource_type="node") -%}

@@ -40,6 +42,8 @@
"''" if not node.column_name else wrap_string_with_quotes(dbt.escape_single_quotes(node.column_name)),
wrap_string_with_quotes(node.meta | tojson),
wrap_string_with_quotes(dbt.escape_single_quotes(hard_coded_references)),
number_lines,
sql_complexity,
wrap_string_with_quotes(node.get('depends_on',{}).get('macros',[]) | tojson),
"cast(" ~ dbt_project_evaluator.is_not_empty_string(node.test_metadata) | trim ~ " as boolean)",
"cast(" ~ exclude_node ~ " as boolean)",
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -81,7 +81,7 @@ nav:
- Configuring exceptions to the rules: customization/exceptions.md
- Excluding packages and models/sources based on path: customization/excluding-packages-and-paths.md
- Display issues in the logs: customization/issues-in-log.md
- Querying columns: customization/querying-columns.md
- Querying columns names and descriptions: customization/querying-columns-names-and-descriptions.md
- Run in CI Check: ci-check.md
- Querying the DAG: querying-the-dag.md
- Contributing: contributing.md
2 changes: 2 additions & 0 deletions models/marts/core/int_all_graph_resources.sql
@@ -112,6 +112,8 @@ joined as (
unioned_with_calc.loader,
unioned_with_calc.identifier,
unioned_with_calc.hard_coded_references, -- NULL for non-model resources
unioned_with_calc.number_lines, -- NULL for non-model resources
unioned_with_calc.sql_complexity, -- NULL for non-model resources
unioned_with_calc.is_excluded -- NULL for metrics and exposures

from unioned_with_calc
2 changes: 2 additions & 0 deletions models/staging/graph/stg_nodes.sql
@@ -39,6 +39,8 @@ select
cast(null as {{ dbt.type_string() }}) as column_name,
cast(null as {{ dbt.type_string() }}) as meta,
cast(null as {{ dbt.type_string() }}) as hard_coded_references,
cast(null as {{ dbt.type_int() }}) as number_lines,
cast(null as {{ dbt.type_float() }}) as sql_complexity,
cast(null as {{ dbt.type_string() }}) as macro_dependencies,
cast(True as boolean) as is_generic_test,
cast(True as boolean) as is_excluded