Add tags to models #16

Merged
merged 10 commits into from
Jan 26, 2024

Conversation

chowbao (Contributor) commented Jan 24, 2024

  • Add tags to models for airflow scheduling
  • Move current tables into dbt public
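
For readers less familiar with dbt tags, here is a minimal sketch of what tagging a model for scheduled selection might look like. The tag name `enriched_history_operations` and the `stg_history_ledgers` ref come from this thread; the model body and file path are illustrative assumptions, not the actual diff in this PR.

```sql
-- Hypothetical model file, e.g. models/marts/some_model.sql (illustrative only).
-- The tags config attaches one or more labels to the model.
{{ config(
    tags=["enriched_history_operations"]
) }}

select *
from {{ ref('stg_history_ledgers') }}

-- A scheduler task (for example, an Airflow task) could then build only
-- the models carrying this tag with dbt's tag selector:
--   dbt run --select tag:enriched_history_operations
```

Tag-based selection only picks up models that carry the tag, which is why the question of how to tag upstream staging models comes up later in the conversation.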

sydneynotthecity (Contributor) left a comment

I'm curious, what would be the impact if we didn't tag any of the stg_ source views?

I noticed that each staging view is tagged with current_state or enriched_history_operations, but those sources are used by more than just those final models, so I'm curious how you determined what the tag should be, and whether there is merit in keeping the tags for those models.

models/marts/history_assets.sql (outdated, resolved)
models/staging/stg_history_ledgers.sql (resolved)
chowbao (Contributor, Author) commented Jan 25, 2024

> I'm curious, what would be the impact if we didn't tag any of the stg_ source views?

The stg_* would still run. But actually the dbt-public staging tables are ephemeral so it doesn't matter. Looking at it again we should probably make all staging (dbt private) ephemeral

Oh, they never run. Everything was working because test_sdf_raw already had all the views.

I think this means that the staging tables need tags for anything that uses them if we keep the dbt-public staging ephemeral

sydneynotthecity (Contributor) left a comment

> Oh they never run. Everything was working because test_sdf_raw already had all the views

Interesting, I thought the views were supposed to be ephemeral, and should have been built as part of the CTE. So if you run something like trade_agg with an untagged stg_history_trades, it doesn't rebuild the source SQL?

If we have to rebuild the views, what are your thoughts on just tagging them as enriched_history_operations and running them in the half hourly DAG? I think adding new tags any time we build out a new pipeline is gonna get old
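
To make the "built as part of the CTE" expectation concrete, here is a rough sketch of how dbt generally treats an ephemeral model: nothing is created in the warehouse, and the model's SQL is inlined as a CTE into whatever model refs it. The body below is a placeholder, not the actual stg_history_trades SQL; the source name `some_source` is hypothetical, and the exact CTE name dbt generates can vary by version.

```sql
-- Hypothetical ephemeral staging model (placeholder body, illustrative only).
-- With materialized="ephemeral", dbt creates no view or table for this model.
{{ config(materialized="ephemeral") }}

select *
from {{ source('some_source', 'history_trades') }}

-- A downstream model that refs this one would compile to roughly:
--   with __dbt__cte__stg_history_trades as (
--       select * from <the resolved source table>
--   )
--   select * from __dbt__cte__stg_history_trades
```

If the staging models are instead materialized as views, they are separate nodes that must themselves be selected (for example by tag) to be rebuilt, which matches the behavior discussed in the replies.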

order by s.last_modified_ledger desc, s.ledger_entry_change desc
) as row_nr
from {{ ref('stg_account_signers') }} as s
join {{ ref('stg_history_ledgers') }} as l
Contributor

TODO: I think we can remove the join against history_ledgers on the current state tables, since the state tables now all have closed_at. We will need to validate that, though, and confirm that the closed_at field is backfilled appropriately.

chowbao (Contributor, Author) commented Jan 25, 2024

> Interesting, I thought the views were supposed to be ephemeral, and should have been built as part of the CTE. So if you run something like trade_agg with an untagged stg_history_trades, it doesn't rebuild the source SQL?

Yeah that's what I found when I tried it locally.

> If we have to rebuild the views, what are your thoughts on just tagging them as enriched_history_operations and running them in the half hourly DAG? I think adding new tags any time we build out a new pipeline is gonna get old

Yeah that works. The problem then becomes where do you want to put the stellar-dbt-public views? Hard code prod to write to sdf_raw in the internal project?

sydneynotthecity (Contributor) commented

> The problem then becomes where do you want to put the stellar-dbt-public views? Hard code prod to write to sdf_raw in the internal project?

Sounds like we have two options:

  1. Keep the tags for all stg_* tables, which then allows us to keep staging tables ephemeral (built only in the CTE during runtime)
  2. Materialize the views during the half-hourly DAG run, but then the views must be written to a dataset (TBD).

Do you have a preference on which we move forward with? If we opt for option #2, I would prefer the views be materialized in a new, private crypto_stellar_raw dataset in crypto-stellar project. Is there an option that makes more sense for community devs?
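
As a rough illustration of option 2, a staging model could be configured along the following lines. This is a sketch under assumptions: the dataset name `crypto_stellar_raw` and the tag come from this thread, while the source name `some_source` and the model body are hypothetical placeholders. It also assumes BigQuery, where dbt's `schema` config corresponds to a dataset.

```sql
-- Hypothetical staging model under option 2 (illustrative only).
-- materialized="view" makes dbt create a view in the warehouse.
-- schema= sets a custom schema (a dataset on BigQuery). By default dbt
-- appends the custom schema to the target schema, unless the
-- generate_schema_name macro is overridden in the project.
{{ config(
    materialized="view",
    schema="crypto_stellar_raw",
    tags=["enriched_history_operations"]
) }}

select *
from {{ source('some_source', 'history_trades') }}
```

Option 1 would differ mainly in using `materialized="ephemeral"` and carrying a tag for every pipeline that depends on the staging model.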

chowbao (Contributor, Author) commented Jan 25, 2024

> > The problem then becomes where do you want to put the stellar-dbt-public views? Hard code prod to write to sdf_raw in the internal project?
>
> Sounds like we have two options:
>
>   1. Keep the tags for all stg_* tables, which then allows us to keep staging tables ephemeral (built only in the CTE during runtime)
>   2. Materialize the views during the half-hourly DAG run, but then the views must be written to a dataset (TBD).
>
> Do you have a preference on which we move forward with? If we opt for option #2, I would prefer the views be materialized in a new, private crypto_stellar_raw dataset in crypto-stellar project. Is there an option that makes more sense for community devs?

Hmm, we should probably go with option 2. You're right that option 1 would get old really fast. Option 2 also probably makes more sense for community devs, because dbt-public does create views when it is run on its own rather than as a package in dbt private.

@chowbao chowbao merged commit 4ab7b7d into master Jan 26, 2024
@sydneynotthecity sydneynotthecity deleted the add-tags-to-models branch November 14, 2024 14:33