This PR adds the section, teacher, school, and district associated with the student in the school year of their activity. This duplicates records: a student gets one row per section they are in during a given school year, for every day of activity. Since almost every aggregation on this model already requires a count distinct, we anticipate the risk of overcounting to be low, and worth the simplicity of adding the ids to a single model.
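To illustrate the duplication and why a count distinct is unaffected by it, here is a minimal sketch in plain Python. The student ids, section ids, and dates are made up for illustration and are not taken from the actual model.

```python
from datetime import date

# Hypothetical daily activity rows: (student_id, activity_date).
activity = [
    ("s1", date(2024, 9, 3)),
    ("s1", date(2024, 9, 4)),
    ("s2", date(2024, 9, 3)),
]

# Hypothetical enrollments for the school year: student_id -> section ids.
sections = {"s1": ["sec_a", "sec_b"], "s2": ["sec_c"]}

# Joining sections onto activity yields one row per (student, day, section),
# so a student in two sections appears twice for each active day.
joined = [
    (student_id, day, section_id)
    for student_id, day in activity
    for section_id in sections[student_id]
]

row_count = len(joined)  # inflated by the join
distinct_students = len({student_id for student_id, _, _ in joined})

print(row_count, distinct_students)
```

The row count grows from 3 to 5, but the distinct student count stays at 2. A plain row count on the joined data would overcount, which is the risk described above.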
Another risk to consider is the number of rows this change adds to dim_active_students. Before the change, the row counts by school year were:
2019-20 | 63707364
2020-21 | 72704669
2021-22 | 78850475
2022-23 | 77495252
2023-24 | 77697216
2024-25 | 31403006
After the addition of section_id:
2019-20 | 69489948 (9.1% increase)
2020-21 | 79677993 (9.6% increase)
2021-22 | 86568860 (9.8% increase)
2022-23 | 85426341 (10.2% increase)
2023-24 | 85116388 (9.5% increase)
2024-25 | 33936091 (8.1% increase)
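The percentage increases can be reproduced from the row counts above. This is a small sketch; the variable names are illustrative, and the counts are copied from the two lists in this description.

```python
# Row counts per school year before and after adding section_id,
# copied from the lists in this description.
before = {
    "2019-20": 63707364,
    "2020-21": 72704669,
    "2021-22": 78850475,
    "2022-23": 77495252,
    "2023-24": 77697216,
    "2024-25": 31403006,
}
after = {
    "2019-20": 69489948,
    "2020-21": 79677993,
    "2021-22": 86568860,
    "2022-23": 85426341,
    "2023-24": 85116388,
    "2024-25": 33936091,
}

# Percent increase per year, rounded to one decimal place.
pct_increase = {
    year: round((after[year] - before[year]) / before[year] * 100, 1)
    for year in before
}

for year, pct in pct_increase.items():
    print(f"{year} | {pct}% increase")
```

Every year lands between roughly 8% and 10%, so the added rows scale with the existing table size rather than growing in any one year.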
The other solution proposed was a separate model on the school_year/student_id/section_id grain, which would not have the daily duplication that this solution has. However, that grain carries no activity date, so many use cases for counting active students (e.g. the district team wanting to count active students by month among enrolled districts) could not be served by the separate model.