Describe your problem or feature you'd like added
When the cron runs for the first time, it goes all the way back to the earliest start_date in the database. This could result in a huge query if the database contains old courses. We should probably have a configurable limit that can be overridden via the cron, since most of the time we don't need this old data. I'm not sure how else to avoid it without changes to BigQuery, such as partitions, which we are unable to make.
In addition, the cron processes older courses that have no activity and don't need to be processed. We should add a "last_accessed_date" field or similar to the course table to track whether a course actually needs updating.
Describe the solution you'd like
Add a date limit, either dynamic or set in config, that caps how far back data is returned. This could be 4 months, 6 months, or a year; just something shorter than the full history. We also need a way to override this limit on the cron when it is absolutely necessary.
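As a rough illustration of the limit described above, here is a minimal sketch of how the earliest query date could be resolved. The function name `resolve_earliest_date`, the `DEFAULT_LOOKBACK_DAYS` setting, and the `override` parameter are all hypothetical and are not taken from the existing cron code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical default: roughly 6 months. The real value would come from config.
DEFAULT_LOOKBACK_DAYS = 180


def resolve_earliest_date(
    earliest_start_date: datetime,
    now: datetime,
    lookback_days: int = DEFAULT_LOOKBACK_DAYS,
    override: Optional[datetime] = None,
) -> datetime:
    """Return the earliest date the cron should query from.

    An explicit override (for example, passed to the cron for a one-off
    backfill) always wins. Otherwise the date is clamped so it never
    reaches further back than `lookback_days` before `now`.
    """
    if override is not None:
        return override
    limit = now - timedelta(days=lookback_days)
    # Use the later of the two dates: never go back past the limit.
    return max(earliest_start_date, limit)
```

With this shape, a normal run only passes the earliest start_date and the current time, while a deliberate backfill passes `override` to reach older data.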
This is a two-part fix, and this is the first part. The second part is PR #1574.
This part adds the last_accessed_date field to the course table. The second part will use that field to limit the earliest date for which data is returned.
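To make the intended use of last_accessed_date concrete, here is a hedged sketch of the filtering it could enable. The `Course` dataclass and the `courses_needing_update` helper are hypothetical stand-ins for the real course table and cron logic. Treating a missing last_accessed_date as "needs processing" is an assumption made here so that a first run still covers every course.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Course:
    # Hypothetical stand-in for a row in the course table.
    id: int
    last_accessed_date: Optional[datetime]


def courses_needing_update(courses: List[Course], cutoff: datetime) -> List[Course]:
    """Select courses that should be processed on this run.

    A course with no last_accessed_date has never been evaluated, so it
    is included (assumption). Otherwise it is included only if it was
    accessed on or after the cutoff.
    """
    return [
        c
        for c in courses
        if c.last_accessed_date is None or c.last_accessed_date >= cutoff
    ]
```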
Describe any possible alternatives you've considered
We have considered running only on active courses, but in testing and on the first run it might not be known which courses are active. Once this has run once, we shouldn't have this problem again.
Additional context