
dbt Elementary and tests configuration #11

Merged · 7 commits · Jan 18, 2024

Conversation

@enercolini (Contributor) commented on Dec 12, 2023

Initial commit for reviewing the public sources and marts test configurations.

Confirm that the period defined for testing is as expected. Also confirm that the date column chosen as the reference for the defined periods (closed_at, batch_insert_ts, etc.) is correct.

During the execution of dbt test on the configured sources, there were a few test failures. If you want more details and the list of failures, @sydneynotthecity, just let me know and I can provide them.

There are also a few docs that need updates. If you provide the descriptions, I can update them.
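For context, the period and reference-column settings under discussion look roughly like the following Elementary test configuration. This is a hedged sketch only: the model name, column, and window values are illustrative assumptions, not the PR's actual values.

```yaml
# Illustrative sketch of an Elementary anomaly test with an explicit
# reference date column and test period. Model name, column, and
# window sizes are assumptions for illustration.
models:
  - name: enriched_history_operations
    tests:
      - elementary.freshness_anomalies:
          timestamp_column: closed_at   # reference date column under review
          time_bucket:
            period: day
            count: 1
          days_back: 14                 # test period to confirm
```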

@sydneynotthecity (Contributor)

> there were a few test failures. If you want more details and the list of failures @sydneynotthecity just let me know and I can provide it to you.

Yes, if you can send me those failures that would be great. Feel free to send through Slack or email

@sydneynotthecity (Contributor) left a comment:

A couple of nits need to be addressed before approving.

For the src_* tables, it might also be worthwhile to investigate how the query performs with batch_insert_ts as the column for the freshness timestamp column. I have no idea if it will do a full table scan since that column is not the partition column, but I would like confirmation that it will not scan the entire table.

There are a couple of other tables where the configured date column is not the partition key. Given how large these tables are, can you confirm that BQ is not processing all bytes in the table when running the queries? If it is, the columns will need to be updated to the partition column.

Similar to this PR, I think it would be nice to have some volume alerts too, for cases where the volume of data drastically increases or decreases over a timespan. If you are able to implement volume alerts in the other PR, please also apply them here.
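Both concerns here (freshness on the partition column and volume alerts) can be expressed in the source YAML. A hedged sketch, assuming a hypothetical source and table name; `loaded_at_field`, `freshness`, and `elementary.volume_anomalies` are real dbt/Elementary configuration, but every value below is an illustrative assumption:

```yaml
# Illustrative sketch: source freshness keyed on the partition column
# plus an Elementary volume anomaly test. Names and thresholds are
# placeholders, not values from this PR.
sources:
  - name: crypto_stellar          # hypothetical source name
    tables:
      - name: history_transactions
        loaded_at_field: closed_at          # partition column, to avoid full scans
        freshness:
          warn_after: {count: 2, period: hour}
        tests:
          - elementary.volume_anomalies:
              timestamp_column: closed_at
              time_bucket:
                period: day
                count: 1
```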

models/marts/enriched_history_operations.yml — 2 resolved review threads (outdated)
```yaml
      - relationships:
          to: ref('stg_history_transactions')
          field: transaction_id

  - name: type
    description: '{{ doc("type") }}'
    tests:
      - not_null
      - not_null:
```
Comment from @sydneynotthecity (Contributor):

Could we add a test that confirms that the type <= 26?

Reply from @enercolini (Contributor, Author):

Implemented through the dbt_utils test `expression_is_true`.
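A sketch of how that check can be written at the model level. `dbt_utils.expression_is_true` and its `expression` argument are the real dbt_utils API; the model name is an assumption:

```yaml
# Illustrative sketch of the dbt_utils test mentioned above.
# The model name is a placeholder.
models:
  - name: enriched_history_operations
    tests:
      - dbt_utils.expression_is_true:
          expression: "type <= 26"
```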

Other resolved review threads (outdated):
models/marts/history_assets.yml (2 threads)
models/sources/src_accounts.yml
models/sources/src_history_ledgers.yml
models/sources/src_history_operations.yml
models/sources/src_history_trades.yml
models/sources/src_history_transactions.yml
@enercolini (Contributor, Author)

Once these tests are approved, I'll move forward with configuring the Slack alerts as requested.

@enercolini enercolini requested a review from chowbao January 17, 2024 01:41
@chowbao (Contributor) commented on Jan 17, 2024

Wanted to confirm this before approving.

> For the src_* tables, it might also be worthwhile to investigate how the query performs with batch_insert_ts as the column for the freshness timestamp column. I have no idea if it will do a full table scan since that column is not the partition column, but I would like confirmation that it will not scan the entire table.

Was this ever tested or confirmed? If it causes problems, the current batch_run_date/closed_at will work.

@enercolini (Contributor, Author) commented on Jan 17, 2024

> Wanted to confirm this before approving.
>
> For the src_* tables, it might also be worthwhile to investigate how the query performs with batch_insert_ts as the column for the freshness timestamp column. I have no idea if it will do a full table scan since that column is not the partition column, but I would like confirmation that it will not scan the entire table.
>
> Was this ever tested or confirmed? If it does cause problems the current batch_run_date/closed_at will work

From what I recall, using batch_insert_ts did not process much more data than batch_run_date, because of the way the query Elementary builds to analyze the metrics works. I compared two tests configured for the same table (account_signers) using the different columns, and both processed the same amount of data. In any case, I changed the timestamp column to the partition column for all tables (basically batch_run_date and closed_at). Let me know if this makes sense to you as well. Thank you.
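One way to confirm a bytes-processed comparison like this is a BigQuery dry run, which reports how much data a query would scan without executing it. A sketch only; the project/dataset qualifiers are placeholders:

```sh
# Hypothetical dry-run comparison: BigQuery reports the bytes each
# query would process. Project and dataset names are placeholders.
bq query --use_legacy_sql=false --dry_run \
  'SELECT MAX(batch_insert_ts) FROM `project.dataset.account_signers`'

# vs. the partition column:
bq query --use_legacy_sql=false --dry_run \
  'SELECT MAX(batch_run_date) FROM `project.dataset.account_signers`'
```

If the non-partition column forces a full scan, the first dry run will report far more bytes than the second.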

@enercolini enercolini merged commit 4e914ad into master Jan 18, 2024
@enercolini enercolini deleted the feature/sources_tests_updates branch January 18, 2024 11:50
4 participants