Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/country names #246

Merged
merged 15 commits into from
Dec 24, 2024
Merged

Fix/country names #246

merged 15 commits into from
Dec 24, 2024

Conversation

coryamanda
Copy link
Collaborator

@coryamanda coryamanda commented Dec 1, 2024

Description

Fix to standardize country names in user_geos according to Martina's sheet: https://docs.google.com/spreadsheets/d/1UB3flfF2brnsWcSdp5khbrGf3Mp1PA7t9GgQMSZrENE/edit?gid=1271278978#gid=1271278978

Secondary goals:

  • create a reference table in Hydrone with ISO country info so that they can create standard regional groups more easily
  • standardize country names in dim_hoc_starts
  • apply capitalization logic to city and country in dim_hoc_starts

Links

Jira ticket(s): https://codedotorg.atlassian.net/browse/DATAOPS-1046

Testing story

  • Does your change include appropriate tests on key columns?
    eg.
    - not_null
    - unique
    - `dbt_utils.unique_combination_of_columns: , ["value","value","value"...]

Note: when submitting a new model for review please make sure the following have been tested:

  1. The model compiles (dbt build -m 'your_model')
    or: has the dbt Cloud job succeeded?
  2. The model runs (dbt run -m 'your_model')
  3. The model produces accessible data in the DW (select 1 from 'your_model')

Privacy

  • 1. Does this change involve the collection, use, or sharing of new Personal Data? No
  • 2. Do these data exist in the appropriate schema(s)? Yes
  • 3. Does this change involve a new or changed use or sharing of existing Personal Data? No
  • 4. Consider: will this data be visible on Tableau? will this data be surfaced in a report exported from Trevor? No
  • 5. If yes to any of the above, please list the models, columns, and justification below:
    i.
    ii.
    iii.

PR Checklist:

--> Note: if these are not all checked, the PR will be sent back.

  • Tests provide adequate coverage
  • Privacy and Security impacts have been assessed
  • Code adheres to style guide👀 and is DRY
  • Code is well-commented (**please do not leave extraneous commentary in model code, if it is for the purpose of documentation, please relocate accordingly)
  • Appropriate documentation has been provided (see .yml., did dbt docs generate succeed?)
  • New features are translatable or updates will not break up/downstream models
  • Relevant documentation has been added or updated (i.e. dbt docs has been updated successfully on Github Pages
  • Pull Request is labeled appropriately (eg. chore/, feature/, fix/)
  • Follow-up work items (including potential tech debt) are tracked and linked (if applicable)

Copy link
Collaborator

@allison-code-dot-org allison-code-dot-org left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few notes to consider- otherwise LGTM!

dbt/models/marts/misc/_misc_models.yml Show resolved Hide resolved
from {{ref('seed_country_standardizations')}}
),

combined as (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see an argument for having this in user_geos, but if we wanted to keep the principle of staging tables only referencing a single model, we could integrate the country mappings in dim_users instead. This data is not surfaced to the end user in the staging model, so I think it might make more sense if this logic was in the dimensional layer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user_geos table feeds dim_student_projects, not just dim_users. These are corrections we want to apply universally and they won't vary for different models, so I'd rather do them as upstream as possible. To keep the idea of staging referring to a single input model, would it make more sense to you to just do the corrections as code directly in the staging table? There aren't a lot (<15) but that number could grow over time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, I think how you have it makes sense then. The only other way I could think of doing it is having a dim_user_geos model, instead of only a staging and then changing all the references downstream. Thoughts?

@coryamanda coryamanda marked this pull request as draft December 4, 2024 21:35
- Converts the seed file lookup for country name edits to a macro
- Applies macro to HOC + user_geos
- Adds additional columns to dim_country_reference
- Docs
@coryamanda coryamanda marked this pull request as ready for review December 20, 2024 07:10
@coryamanda coryamanda merged commit 950b679 into main Dec 24, 2024
1 check was pending
@coryamanda coryamanda deleted the fix/country_names branch December 24, 2024 05:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants