This package builds a mart of tables from dbt artifacts that have been loaded into a table. It is compatible with Snowflake only. The models are based on the v1 artifact schema introduced in dbt 0.19.0.
Models included:
- dim_dbt__models
- fct_dbt__model_executions
- fct_dbt__latest_full_model_executions
- fct_dbt__critical_path
- fct_dbt_run_results
The critical path model determines the slowest route through your DAG, giving you the information needed to make a targeted effort to reduce dbt run times.
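For example, a query along the following lines (the column names are illustrative assumptions; check the fct_dbt__critical_path model for the columns the package actually produces) would list the models on the critical path together with how long each one took:

-- Hypothetical column names; adjust to the model's actual output.
select
    name,
    execution_time
from your_db.your_destination_schema.fct_dbt__critical_path
order by execution_time desc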
Installation:
- Add this package to your packages.yml, following these instructions.
- Configure the following variables in your dbt_project.yml:
vars:
  dbt_artifacts:
    dbt_artifacts_database: your_db # optional, default is your target database
    dbt_artifacts_schema: your_schema # optional, default is 'dbt_artifacts'
    dbt_artifacts_table: your_table # optional, default is 'artifacts'

models:
  ...
  dbt_artifacts:
    +schema: your_destination_schema
    staging:
      +schema: your_destination_schema
Note that the model materializations are defined in this package's dbt_project.yml, so do not set them in your project.
- Run dbt deps.
This package requires that the source data exists in a table in Snowflake.
Snowflake makes it possible to load local files into your warehouse. We've included a number of macros to assist with this. This method can be used by both dbt Cloud users and users of other orchestration tools.
- To initially create these tables, execute dbt run-operation create_artifact_resources (source). This will create a stage and a table named {{ target.database }}.dbt_artifacts.artifacts; you can override this name using the variables listed in the Installation section, above. A sketch of the resulting table's shape follows the example commands below.
- Add operations to your production run to load files into your table, via the upload_dbt_artifacts macro (source). You'll need to specify which files to upload through use of the --args flag. Here's an example setup:
$ dbt seed
$ dbt run-operation upload_dbt_artifacts --args '{filenames: [manifest, run_results]}'
$ dbt run
$ dbt run-operation upload_dbt_artifacts --args '{filenames: [manifest, run_results]}'
$ dbt test
$ dbt run-operation upload_dbt_artifacts --args '{filenames: [run_results]}'
$ dbt source snapshot-freshness
$ dbt run-operation upload_dbt_artifacts --args '{filenames: [sources]}'
$ dbt docs generate
$ dbt run-operation upload_dbt_artifacts --args '{filenames: [catalog]}'
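For reference, the artifacts table created by create_artifact_resources holds one row per uploaded artifact file. The sketch below is only an approximation inferred from the columns referenced later in this document (data, generated_at, path, artifact_type); see the create_artifact_resources macro for the authoritative DDL:

-- Approximate shape only; the macro in the package is the source of truth.
create table if not exists dbt_artifacts.artifacts (
    data variant,              -- raw artifact JSON
    generated_at timestamp_tz, -- metadata:generated_at from the artifact
    path string,               -- file name the artifact was loaded from
    artifact_type string       -- e.g. manifest.json or run_results.json
);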
If you are using an orchestrator, you might instead upload these files to cloud storage; the method for doing this will depend on your orchestrator. Then link the cloud storage destination to a Snowflake external stage, and use a Snowpipe to copy these files into the source table:
copy into ${snowflake_table.dbt_artifacts.database}.${snowflake_table.dbt_artifacts.schema}.${snowflake_table.dbt_artifacts.name}
from (
    select
        $1 as data,
        $1:metadata:generated_at::timestamp_tz as generated_at,
        metadata$filename as path,
        regexp_substr(metadata$filename, '([a-z_]+.json)$') as artifact_type
    from @${snowflake_stage.dbt_artifacts.database}.${snowflake_stage.dbt_artifacts.schema}.${snowflake_stage.dbt_artifacts.name}
)
file_format = (type = 'JSON')
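As a rough sketch of the Snowflake objects this approach relies on (the stage name, pipe name, bucket URL, and storage integration below are placeholders, not objects this package creates), the external stage and Snowpipe could be defined like this:

-- Placeholder object names and URL; adapt to your own cloud storage setup.
create stage dbt_artifacts.artifact_stage
    url = 's3://your-bucket/dbt-artifacts/'
    storage_integration = your_storage_integration
    file_format = (type = 'JSON');

create pipe dbt_artifacts.artifact_pipe auto_ingest = true as
    copy into dbt_artifacts.artifacts
    from (
        select
            $1 as data,
            $1:metadata:generated_at::timestamp_tz as generated_at,
            metadata$filename as path,
            regexp_substr(metadata$filename, '([a-z_]+.json)$') as artifact_type
        from @dbt_artifacts.artifact_stage
    )
    file_format = (type = 'JSON');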
The models will be picked up on your next dbt run command. You can also run the package specifically with dbt run -m dbt_artifacts.
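Once built, the mart can be queried like any other set of models. A minimal sketch, again with assumed column names (check fct_dbt__model_executions for the columns the package actually produces), to surface your slowest models:

-- Hypothetical column names; adjust to the model's actual output.
select
    name,
    avg(execution_time) as avg_execution_time
from your_db.your_destination_schema.fct_dbt__model_executions
group by name
order by avg_execution_time desc
limit 10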
The macros in this package have been adapted from code shared by Kevin Chan and Jonathan Talmi of Snaptravel.
Thank you for sharing your work with the community!