Move tests from scheduled queries / business queries to DBT #83
eed3dec to c62a947
models/marts/history_assets.yml
Outdated
    meta:
      description: "Monitors the freshness of your table over time, as the expected time between data updates."
  - incremental_unique_combination_of_columns:
      combination_of_columns:
        - batch_run_date
I'm not sure batch_run_date should be included. My assumption was that the history_assets table is unique on asset_code, asset_issuer, and asset_type. Otherwise we would have "duplicate assets", where each asset appears under multiple batch_run_dates.
Addressed in 6bd6abf
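The uniqueness assumption discussed in this thread can be checked directly. The following is a minimal sketch of a singular test, assuming the model can be referenced as history_assets and that the three columns named above form its key. It is an illustration only, not the incremental_unique_combination_of_columns test used in the PR.

```sql
-- Sketch: returns one row per asset that appears more than once.
-- A dbt singular test passes when the query returns zero rows.
-- Assumption: the model name is history_assets (taken from the .yml file name).
select
    asset_code
    , asset_issuer
    , asset_type
    , count(*) as row_count
from {{ ref('history_assets') }}
group by
    asset_code
    , asset_issuer
    , asset_type
having count(*) > 1
```

If batch_run_date were added to the group by, the same asset loaded in two batches would no longer be reported, which is the "duplicate assets" concern raised above.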
models/sources/src_accounts.yml
Outdated
@@ -14,7 +14,6 @@ sources:
   - incremental_unique_combination_of_columns:
       combination_of_columns:
         - account_id
-        - sequence_number
I think sequence_number should be included. IIRC, the src/staging accounts table should record an instance of an account for every sequence_number in which the account changed in that ledger sequence.
Got it, I will keep it then. It was not present in the scheduled query, and I treated that as the source of truth.
Addressed in 6bd6abf
@@ -0,0 +1,17 @@
{{ config(
    severity="error"
    , tags=["singular_test"]
FYI, this tag runs every 30 minutes in Airflow, which is more frequent than the cloud function and scheduled query tests used to run. That is good.
I mention it only in case we get noisy alerts, in which case we may want to adjust the query and/or the frequency at which the tests run (possibly with a separate dbt tag).
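The "separate dbt tag" idea could look roughly like the sketch below. The tag name singular_test_daily is hypothetical and is not defined in this PR. The query body is only a placeholder, and the Airflow side would need its own task that selects the new tag.

```sql
-- Sketch: give a noisy test its own tag so it can run on a different schedule.
-- "singular_test_daily" is an assumed tag name for illustration.
-- It could then be selected with: dbt test --select tag:singular_test_daily
{{ config(
    severity="error"
    , tags=["singular_test_daily"]
) }}

-- Placeholder body: a singular test passes when the query returns zero rows.
select failure
from (select 1 as failure) as placeholder
where false
```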
tests/bucketlist_db_size_check.sql
Outdated
select sequence,
    closed_at,
    total_byte_size_of_bucket_list / 1000000000 as bl_db_gb
from {{ source('crypto_stellar', 'history_ledgers') }}
I think this should be a ref('stg_history_ledgers') instead of a source. Otherwise these tests would be hardcoded to just prod, right?
Won't the test still use crypto_stellar.history_ledgers instead of test_crypto_stellar.history_ledgers?
Example:
stellar-dbt-public/models/staging/stg_history_ledgers.sql
Lines 6 to 12 in 2e63a8a
with
    raw_table as (
        select *
        from {{ source('crypto_stellar', 'history_ledgers') }}
    )
    , history_ledgers as (
It will use the test project/dataset because the source for staging tables is overridden by the dbt_project.yml in the private dbt repo:
https://github.com/stellar/stellar-dbt/blob/master/dbt_project.yml#L63-L68
+project: "{% if target.name == 'prod' %} crypto-stellar {% else %} {{ target.project }} {% endif %}"
I don't think generic tests have such an override defined. Technically you could add a source override for generic tests to dbt_project.yml, but my preference is to change the generic test's source to ref. That is cleaner because you only need to define a single override for the staging tables instead of two (staging + generic tests).
Got it. Yes, I agree; in that case we should just use ref.
Addressed in 74dad5a
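For reference, the outcome of this thread can be sketched as follows. The select list is taken from the diff above. The where clause and its threshold are hypothetical, since the diff excerpt does not show the test's real filter, and the sketch assumes stg_history_ledgers exposes total_byte_size_of_bucket_list.

```sql
-- Sketch: the size check reading from the staging model instead of the raw source.
-- ref() resolves to whatever project/dataset the staging model was built in,
-- so the test follows the target instead of being pinned to prod.
select sequence,
    closed_at,
    total_byte_size_of_bucket_list / 1000000000 as bl_db_gb
from {{ ref('stg_history_ledgers') }}
-- Hypothetical threshold for illustration only; the real value is not shown in the diff.
where total_byte_size_of_bucket_list / 1000000000 > 12
```

With source(), the test depends on a second override in dbt_project.yml to reach the test dataset. With ref(), it inherits the override already defined for the staging tables.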
tests/eho_by_ops.sql
Outdated
FROM {{ source('crypto_stellar', 'history_operations') }} op
LEFT OUTER JOIN {{ ref('enriched_history_operations') }} eho
Same comment here: I think this should be a ref('stg_history_operations') instead of a source. Otherwise these tests would be hardcoded to just prod, right?
Edit: in this case there would also be a data mismatch when run in test, because the other side of the join is a ref('enriched_history_operations').
Addressed in 74dad5a
@@ -11,7 +11,7 @@ with
         , batch_id
         , closed_at
         , max(sequence) as max_sequence
-    from {{ source('crypto_stellar', 'history_ledgers') }}
+    from {{ ref('stg_history_ledgers') }}
🙏
What
This PR moves tests from scheduled queries / business queries to dbt.
Why
To centralize the data quality tests
Related PR: https://github.com/stellar/stellar-dbt/pull/227
Known limitations
None