[Feature] Recognize API Relation Table Used in SQL Header as Parent of Model #11038

aboomer07 · 2024-11-22T20:59:15Z

Is this a new bug in dbt-core?

I believe this is a new bug in dbt-core
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

In one of my DBT models my_downstream_model, I am defining a table reference using api.relation.create like this:

{% set my_table = api.Relation.create(
    database=<my_database>,
    schema=<my_schema>,
    identifier=<my_identifier>
) -%}

This table is then used to leverage the declare variables functionality of BigQuery within a SQL header. I am doing this since the main table used in the model is a hive partitioned external table, so declaring the partitions as a variable greatly reduces the query cost compared to having the partitions be defined by the result of another query.

The SQL header call looks something like this:

{% call set_sql_header(config) %}
    DECLARE my_partition_list ARRAY<STRING> DEFAULT(ARRAY(select partition_field from {{ my_table }}));
{%- endcall %}

my_table is not used anywhere else in the main model query, and it is not recognized as a parent of my_downstream_model.

Is this the intended behavior? I also tried to set it as a dependency manually, doing -- depends_on: {{ my_table }} but that didn't get it to be recognized as a parent either.

Expected Behavior

My expectation was that my_table would be recognized as a parent of my_downstream_model, so that the DAG execution ordering would be correct.

Steps To Reproduce

Create a DBT SQL model with my_table defined as above, and use it in a SQL header call as I defined. I ran dbt ls --select=my_table+ and my_downstream_model was not in its list of dependencies

Relevant log output

No response

Environment

- OS: macOS 14.5 and Debian 11.4
- Python: 3.10.8
- dbt: 1.8.9

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

The text was updated successfully, but these errors were encountered:

dbeatty10 · 2024-11-24T22:06:22Z

Thanks for reaching out @aboomer07 !

What you described sounds like the expected behavior to me. This is because dbt will only recognize dependencies that come from a call to the ref or source functions. The dbt DAG does not support dynamic graph building -- all the dependencies must be specified up-front.

Is there a reason why you can't create a source and then utilize it like the following?

First, create a source with your database, schema, and identifier:

models/_sources.yml

sources:
  - name: <my_source_name>  # often same as <my_schema>, but you can name the source whatever you want
    database: <my_database>  
    schema: <my_schema>  
    tables:
      - name: <my_identifier>

Then, utilize it within a dbt model:

models/my_downstream_model.sql

-- ...

-- depends_on: {{ source("<my_source_name>", "<my_identifier>") }}

-- ...

I didn't try one way or another with placing the source reference within the sql_header, but the at the very least the -- depends_on: approach should work.

aboomer07 · 2024-11-25T16:45:21Z

Thanks for getting back!

I now see that wasn't quite enough information on my part, but in general that makes sense what you're saying about dynamic graph building.

So I have a couple more things going on here, I have both a DBT project (<my_project>) as well as a private external (to the project) package (<my_package>) I built that I am using in that project. <my_package> has some generalized DBT macros and models that can be used across multiple projects.

So <my_table> is a model within <my_project> and gets referenced with {{ ref }} in other parts of the project.

<my_downstream_table> is actually a generic model within <my_package>. The way the models within <my_package> are built is that the project using <my_package> gets to supply a variable in the project specifying the name of the tables required by <my_downstream_table>. The goal of this was to set up a package where the user would only need to supply the table names specific to their project (and containing the correct fields), and then the package would be able to generate results from those.

So following the DAG I have <my_table> which is a model defined within and created by <my_project>. <my_downstream_table> is a generic model defined within <my_package>. <my_project> uses <my_downstream_table> and creates a realization of it dependent on <my_table> (and a couple others).

<my_downstream_table> is actually dependent on a few other tables either created / with their sources defined within <my_project> that are referenced within the main query, and the dependency graphing of those works fine!

Then <my_project> has some other tables that are dependent on <my_downstream_table>.

Hopefully this makes sense? I found a workaround for now, and if this isn't a good pattern to pursue then I'm fine with closing this issue! I guess this is also sort of a BigQuery issue, since this kind of pattern is only needed for specifying the partition filters.

dbeatty10 · 2024-11-26T17:03:45Z

Thank you for that additional context @aboomer07 💡

<my_downstream_table> is actually a generic model within <my_package>. The way the models within <my_package> are built is that the project using <my_package> gets to supply a variable in the project specifying the name of the tables required by <my_downstream_table>. The goal of this was to set up a package where the user would only need to supply the table names specific to their project (and containing the correct fields), and then the package would be able to generate results from those.

This would be a feature request rather than a bug report, so I'm going to update the title accordingly. The ability to pass in a list of table dependencies is not something we'd be interested in adopting especially since you have a workaround so I'm going to close this as "not planned".

aboomer07 added bug Something isn't working triage labels Nov 22, 2024

aboomer07 changed the title ~~[Bug] <title> API Relation Table Used in SQL Header Not Recognized as Parent of Model~~ [Bug] API Relation Table Used in SQL Header Not Recognized as Parent of Model Nov 22, 2024

dbeatty10 added awaiting_response and removed triage labels Nov 24, 2024

github-actions bot added triage and removed awaiting_response labels Nov 25, 2024

dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Nov 26, 2024

dbeatty10 added enhancement New feature or request wontfix Not a bug or out of scope for dbt-core and removed bug Something isn't working triage labels Nov 26, 2024

dbeatty10 changed the title ~~[Bug] API Relation Table Used in SQL Header Not Recognized as Parent of Model~~ [Feature] Recognize API Relation Table Used in SQL Header as Parent of Model Nov 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Recognize API Relation Table Used in SQL Header as Parent of Model #11038

[Feature] Recognize API Relation Table Used in SQL Header as Parent of Model #11038

aboomer07 commented Nov 22, 2024 •

edited

Loading

dbeatty10 commented Nov 24, 2024

aboomer07 commented Nov 25, 2024 •

edited

Loading

dbeatty10 commented Nov 26, 2024

[Feature] Recognize API Relation Table Used in SQL Header as Parent of Model #11038

[Feature] Recognize API Relation Table Used in SQL Header as Parent of Model #11038

Comments

aboomer07 commented Nov 22, 2024 • edited Loading

Is this a new bug in dbt-core?

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

Which database adapter are you using with dbt?

Additional Context

dbeatty10 commented Nov 24, 2024

aboomer07 commented Nov 25, 2024 • edited Loading

dbeatty10 commented Nov 26, 2024

aboomer07 commented Nov 22, 2024 •

edited

Loading

aboomer07 commented Nov 25, 2024 •

edited

Loading