Add dag to create elementary report #506
I'm curious about the implications of running this DAG hourly. Will the report only contain data for the past day? Is it only sent if there are failures?
airflow_variables_dev.json
Outdated
@@ -124,7 +124,7 @@
 "partnership_assets__account_holders_activity_fact": false,
 "partnership_assets__asset_activity_fact": false
 },
-"dbt_image_name": "stellar/stellar-dbt:53375b5f9",
+"dbt_image_name": "stellar/stellar-dbt-dev:dba580d7c",
TODO: Update this image once final dbt image is created. Also update corresponding var in prod
Done
airflow_variables_dev.json
Outdated
@@ -288,21 +288,28 @@
 "dbt": {
 "requests": {
 "cpu": "1",
-"ephemeral-storage": "500Mi",
+"ephemeral-storage": "1Gi",
The default value of ephemeral-storage is 1Gi. We were not propagating this value:
https://github.com/stellar/stellar-etl-airflow/pull/506/files#diff-f7be361360b2906f863ccfd863e335513b7b5ccc42728f16c2ae1ef43a798465R36
If the default value is 1Gi, do we even need to set it for the dbt and default tasks?
Not really, good to 🔪
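To illustrate why the explicit entries become redundant once the default is propagated, here is a minimal sketch of a fallback lookup. The `RESOURCES` dict, the `elementary` key, the concrete CPU and memory values, and the `resolve_requests` helper are all hypothetical and are not taken from this repository; only the 1Gi default comes from the thread above.

```python
# Hypothetical subset of a per-task resources config (values are illustrative).
RESOURCES = {
    "default": {"requests": {"cpu": "0.3", "memory": "600Mi"}},
    "dbt": {"requests": {"cpu": "1", "memory": "1Gi"}},
    # Assumed entry for a storage-hungry task that overrides the default.
    "elementary": {"requests": {"cpu": "1", "ephemeral-storage": "2Gi"}},
}

# Code-level default mentioned in the review thread.
FALLBACK_REQUESTS = {"ephemeral-storage": "1Gi"}


def resolve_requests(resources, task_name):
    """Return the resource requests for a task.

    Unknown tasks use the "default" entry, and any key missing from the
    task's requests is filled in from FALLBACK_REQUESTS.
    """
    task_cfg = resources.get(task_name, resources["default"])
    requests = dict(FALLBACK_REQUESTS)          # start from code-level defaults
    requests.update(task_cfg.get("requests", {}))  # explicit config wins
    return requests


if __name__ == "__main__":
    for name in ("dbt", "elementary", "some_other_task"):
        print(name, resolve_requests(RESOURCES, name))
```

With a lookup like this, only tasks that need more than 1Gi have to carry an explicit `ephemeral-storage` entry.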
one suggestion but looks good. We won't need to upgrade worker size, correct?
dags/elementary_report_dag.py
Outdated
default_args=get_default_dag_args(),
start_date=datetime(2024, 11, 11, 0, 0),
description="This DAG creates elementary report and send it to slack",
schedule="0 0 * * MON", # Runs every Monday
TIL you can specify day of week by abbreviation instead of number 🙃
I suggest shifting this schedule back to 2 or 3 UTC so that it captures the latest dbt_stellar_marts execution. Otherwise this data will be 23 hours stale for the marts tables specifically.
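The staleness arithmetic behind this suggestion can be sketched as follows. The marts run hour of 01:00 UTC is an assumption inferred from the "23 hours stale" figure, not something stated in the PR, and `hours_since_last_daily_run` is a hypothetical helper. It measures time since the daily run started, not since it finished.

```python
from datetime import datetime, timedelta, timezone

# Assumption: dbt_stellar_marts starts once a day at 01:00 UTC.
MARTS_RUN_HOUR_UTC = 1


def hours_since_last_daily_run(report_time, daily_run_hour):
    """Hours between report_time and the most recent daily run start."""
    last_run = report_time.replace(
        hour=daily_run_hour, minute=0, second=0, microsecond=0
    )
    if last_run > report_time:
        # Today's run has not started yet, so the latest one was yesterday.
        last_run -= timedelta(days=1)
    return (report_time - last_run).total_seconds() / 3600


# 2024-11-11 is the DAG's start_date and falls on a Monday.
monday = datetime(2024, 11, 11, tzinfo=timezone.utc)

if __name__ == "__main__":
    for hour in (0, 2, 3):
        cron = f"0 {hour} * * MON"
        age = hours_since_last_daily_run(
            monday.replace(hour=hour), MARTS_RUN_HOUR_UTC
        )
        print(f"{cron!r}: marts data is {age:.0f}h old")
```

Under that assumption, moving the cron expression from `0 0 * * MON` to `0 2 * * MON` or `0 3 * * MON` cuts the age of the marts data from 23 hours to 1 or 2 hours, provided the marts run has finished by then.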
No, we don't need to.
What
This PR adds a new DAG that generates an Elementary report and sends it to Slack. The DAG runs once a week (on Monday) and generates a report covering the last 7 days.
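As a rough illustration of what the task ends up running, here is a sketch that assembles an `edr send-report` invocation for a 7-day window. The helper name `build_edr_send_report_args` and the example channel and token values are hypothetical, and the flag names (`--days-back`, `--slack-token`, `--slack-channel-name`) are my understanding of the Elementary CLI and should be checked against the installed version. The actual DAG in this PR may build the command differently.

```python
import shlex


def build_edr_send_report_args(slack_token, slack_channel, days_back=7):
    """Build the argument list for a weekly Elementary report sent to Slack.

    Flag names are assumed from the Elementary CLI docs; verify them with
    `edr send-report --help` before relying on this.
    """
    if days_back < 1:
        raise ValueError("days_back must be at least 1")
    return [
        "edr",
        "send-report",
        "--days-back", str(days_back),
        "--slack-token", slack_token,
        "--slack-channel-name", slack_channel,
    ]


if __name__ == "__main__":
    # Placeholder values; a real token would come from a secret store.
    args = build_edr_send_report_args("xoxb-placeholder", "data-alerts")
    print(shlex.join(args))
```

Building the command as a list rather than a single string avoids shell-quoting problems when it is passed as container arguments.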
Why
The observability report will help us monitor any pending test failures during the week.
Known limitations
The edr command is a little wonky with its memory requirements. This is a known issue that many community members have also observed:
elementary-data/elementary#761
Even though the issue above is closed, it is not fully resolved. The command also requires a lot of ephemeral storage, since it downloads all the data and then computes the report.
Testing: