Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add workflow to calculate schema changes #517

Merged
merged 20 commits into from
Oct 29, 2024
Merged

Conversation

amishas157
Copy link
Contributor

@amishas157 amishas157 commented Oct 29, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with the jira ticket associated with the PR.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated the README with the added features, breaking changes, new instructions on how to use the repository.

What

This PR creates a github workflow which:

Since the results from script will be deterministic, no new PRs will be created if no schema has changed

Why

To track data schema changes.

I will follow up with 2 PRs:

  • Add support to list models added/deleted
  • Make CI job to do diff for public datasets schema (from /schemas dir) and create changelog/source_data.md

Question

Does the format for changelog look alright? Happy to discuss other ideas

@amishas157 amishas157 force-pushed the patch/add-data-changelog branch from ef8ff2d to 711a51f Compare October 29, 2024 21:39
@amishas157 amishas157 marked this pull request as ready for review October 29, 2024 21:54
@amishas157 amishas157 requested a review from a team as a code owner October 29, 2024 21:54
run: |
cd $GITHUB_WORKSPACE
output=$(. scripts/update_dbt_marts_schema_changelog.sh)
echo "$output" > changelog/dbt_marts.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm is it possible to make this append instead of overwrite the entire changelog each run?

There might be a time we delete the elementary.alerts_schema_changes table like the times we have deleted the whole elementary dataset because it was broken or elementary was doing something unexpected.

If it's too hard to make this append instead of overwrite we should remember to keep a copy of elementary.alerts_schema_changes otherwise we'd lose changelog history

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrm this process is pretty dependent on elementary.alerts_schema_changes. Even if we were to delete this dataset ever, that in itself will affect changelog computing part. So ideally we should not delete that dataset.

However as far as append is concerned, we could

  • parse tracked date if any
  • add date filter to get data between tracked date and today
  • update the tracked date as max(date)

Structure of .md file:

Date: 2024-10-10

Table 1

Date: 2024-09-01

Table 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But also, I think it's quite a bit of text formatting. We can leave it as is and can revisit later if needed. In the first place, I don't think we are going to see a lot of schema changes in general.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup sounds good

update
@amishas157 amishas157 force-pushed the patch/add-data-changelog branch from d7dbf9f to 8b420dc Compare October 29, 2024 22:32
@amishas157 amishas157 merged commit 56f581c into master Oct 29, 2024
2 checks passed
@sydneynotthecity sydneynotthecity deleted the patch/add-data-changelog branch November 21, 2024 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants