Add workflow to calculate schema changes #517
run: |
  cd $GITHUB_WORKSPACE
  output=$(. scripts/update_dbt_marts_schema_changelog.sh)
  echo "$output" > changelog/dbt_marts.md
Hmm, is it possible to make this append instead of overwriting the entire changelog on each run?
There might be a time when we delete the `elementary.alerts_schema_changes` table, like the times we have deleted the whole `elementary` dataset because it was broken or elementary was doing something unexpected.
If it's too hard to make this append instead of overwrite, we should remember to keep a copy of `elementary.alerts_schema_changes`; otherwise we'd lose changelog history.
Hmm, this process is pretty dependent on `elementary.alerts_schema_changes`. If we ever deleted this dataset, that in itself would affect the changelog computation, so ideally we should not delete it.
However, as far as appending is concerned, we could:
- parse tracked date if any
- add date filter to get data between tracked date and today
- update the tracked date as max(date)
Structure of .md file:
Date: 2024-10-10
Table 1
Date: 2024-09-01
Table 2
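The append steps above could be sketched roughly as follows. This is a minimal sketch, not what the PR implements: it assumes the changelog keeps `Date: YYYY-MM-DD` headers with the newest section first, and the helper names `last_tracked_date` and `prepend_section` are hypothetical. The date filter itself would live in whatever query `scripts/update_dbt_marts_schema_changelog.sh` runs, which is not shown here.

```shell
#!/bin/sh
# Sketch only: helper names and the "Date: YYYY-MM-DD" layout are assumptions.

# Print the most recent tracked date (the first "Date:" header),
# or nothing if the changelog does not exist or has no header yet.
last_tracked_date() {
  [ -f "$1" ] || return 0
  sed -n 's/^Date: \([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)$/\1/p' "$1" | head -n 1
}

# Prepend a dated section, keeping all older sections below it.
# Usage: prepend_section <changelog> <date> <body>
prepend_section() {
  changelog_file=$1
  section_date=$2
  section_body=$3
  tmp_file=$(mktemp)
  {
    printf 'Date: %s\n%s\n' "$section_date" "$section_body"
    if [ -f "$changelog_file" ]; then
      cat "$changelog_file"
    fi
  } > "$tmp_file"
  mv "$tmp_file" "$changelog_file"
}
```

In the workflow, the value printed by `last_tracked_date changelog/dbt_marts.md` would be the lower bound of the date filter, and the max(date) of the newly fetched rows would become the header of the prepended section. Because old sections are never regenerated, history would survive even if the source table were dropped.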
But also, I think it's quite a bit of text formatting. We can leave it as is and can revisit later if needed. In the first place, I don't think we are going to see a lot of schema changes in general.
Yup sounds good
What
This PR creates a GitHub workflow which:
Since the results from the script are deterministic, no new PRs will be created if no schema has changed.
Why
To track data schema changes.
I will follow up with 2 PRs:
Question
Does the format for the changelog look alright? Happy to discuss other ideas.