upstream-prod
is a dbt package for easily using production data in a development environment. It's like an alternative to the defer flag - only without the need to find and download a production manifest.
It is inspired by (but unrelated to) similar work by Monzo.
When developing locally, I would often get errors because my dev database had outdated or missing data. Rebuilding the models was often time-consuming and it wasn't easy to tell how many layers deep I needed to go.
upstream-prod
fixes this by overriding the {{ ref() }}
macro, redirecting ref
s in selected models to their equivalent table in production (unless the parent model was also selected).
For example, imagine a simple DAG like:
model_1 -> model_2 -> model_3
With a profiles.yml
similar to:
jaffle_shop:
target: dev
outputs:
dev:
...
prod:
...
If I run dbt build -s model_2+
, the following happens:
model_2
is built indev
with data from theprod
version ofmodel_1
.model_3
is also built in mydev
but it refers to thedev
version ofmodel_2
.- Tests are run against the
dev
versions ofmodel_2
andmodel_3
.
In short, your selected models are available in your dev
environment with all that lovely prod
quality!
If desired, the package can return a dev
model when the prod
version can't be found. This is useful when adding several new models at once.
Let's assume only model_1
is in prod
and I want to create model_2
and model_3
. First I create model_2
in dev
with dbt run -s model_2
, then start working on model_3
. By default, dbt run -s model_3
would fail because model_2
doesn't exist in prod
. With fallback mode enabled, the package would use the dev
model and the command would be successful.
After installing the package you'll need to add some variables to dbt_project.yml
.
Set either of upstream_prod_schema
or upstream_prod_database
to tell the package where to find production data. Open profiles.yml
and set the relevant variables for your setup:
- If dev and prod are in different schemas in the same database, set
upstream_prod_schema
to theschema
value of your prod target. - If dev and prod are in different logical databases, set
upstream_prod_database
to thedatabase
value of your prod target.
upstream_prod_enabled
: Disables the package when False. Defaults to True.upstream_prod_disabled_targets
: List of targets where the package should be disabled.upstream_prod_fallback
: Whether to fall back to the default target when a model can't be found in prod. Defaults to False.
Example
I use Snowflake and each developer has a separate database with identically-named schemas. Here's how I have configured my project:
# dbt_project.yml
vars:
upstream_prod_database: <prod_db>
upstream_prod_fallack: True
upstream_prod_disabled_targets:
- ci
- prod
Next, you need to tell dbt to use this package's version of {{ ref() }}
instead of the builtin macro. The recommended approach is to create a thin wrapper around upstream-prod
. In your macros
directory, create a file called ref.sql
with the following contents:
{% macro ref(
parent_model,
current_model=this.name,
prod_database=var("upstream_prod_database", None),
prod_schema=var("upstream_prod_schema", None),
enabled=var("upstream_prod_enabled", True),
fallback=var("upstream_prod_fallback", False)
) %}
{% do return(upstream_prod.ref(parent_model, current_model, prod_database, prod_schema, enabled, fallback)) %}
{% endmacro %}
Alternatively, you can find any instances of {{ ref() }}
in your project and replace them with {{ upstream_prod.ref() }}
upstream-prod
has been designed and tested on Snowflake only. It should work with other officially supported adapters but I can't be sure. If it doesn't, PRs are welcome!