-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not store output share in report aggregations. #1299
Conversation
This was never necessary: when an output share is recovered, its contribution to the relevant batch aggregation is always recorded, and the collection process solely uses the batch aggregation values. The output share was stored but never (meaningfully) read. Historically, this is an artifact of the initial implementation of the aggregation process, which occurred before the collection process was written. At time of initial implementation, there was no batch aggregation to record aggregated values into.
I dropped some code which would attempt to accumulate any aggregations that Finished during initialization, because it is impossible for an aggregation to finish during initialization (i.e. this was effectively dead code). I'll need to fix this in a smarter way in #1234, since the new aggregation scheme does allow aggregation to complete during initialization. |
@@ -0,0 +1 @@ | |||
ALTER TABLE report_aggregations DROP COLUMN out_share; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the migration plan here? I think there's no data in staging-dap-04
's report_aggregations
table, but will Janus start throwing errors if we deploy this migration ahead of deploying a new Janus? Or maybe it won't since staging-dap-04
is idle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the current plan is to accept the errors, since as you say staging-dap-04
is effectively unused for the time being. Our current migrations effectively exercise the migration infra, but may not fulfill all the requirements of a production migration -- for example, 20230417204528 would have required a data migration/backfill if there were any data to be migrated/backfilled. (one might argue we should eventually yank releases that can't be migrated to higher release versions, except that I'm not sure anyone currently cares.)
That said, maybe that's not the right choice? What we really want is for this migration to run after the code in this PR is fully deployed. There are a couple of ways we could engineer this:
- Implement & document a way to split migrations into "pre-release" and "post-release" migrations, and phrase this as a "post-release" migration.
- Just split the data migration part of the change to a separate PR, and manually wrangle it to be released with the next release after the code changes in this PR.
WDYT? I'd like the ability to engineer data migrations to be pre- or post-release, but this might be too much work immediately. Manually wrangling each data change that needs to happen post-release so that it is included in a later release is easy to choose as a policy, but finicky & somewhat error-prone.
For now, my vote would be "accept whatever errors occur during the release process" since there is no practical downside to doing this, but I'm interested in your thoughts -- especially as to whether pre-/post-migrations are a good idea/feasible to implement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm comfortable with temporarily breaking staging-dap-04
while we deploy this migration, because nobody is currently relying on that deployment and thus we shouldn't take on extra work to maintain its availability.
But suppose we were doing this in a live production environment, as a thought exercise: I think what we'd do is this:
- Make a change to Janus that adds the migration script and makes code changes so that Janus can work with or without the migration applied (we can have confidence of this because our tests run against multiple schema versions)
- Deploy the new Janus
- Apply the migration
- Make another change to Janus that removes support for the old schema version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need any additional features to differentiate such migrations, rather, more complicated changes should be broken up into multiple deployable changes, and the sequencing of database migrations and code deploys will be determined on a case-by-case basis, then enforced by SUPPORTED_SCHEMA_VERSIONS
checks at runtime and in tests. This change is uniquely simple, since we're deleting a write-only column. If we were deleting a non-null column that we were both reading and writing, the sequence could look something like:
- Existing state: schema version 0 is deployed, and the deployed code only supports schema version 0
- Code deploy: stop reading from column, supports schema versions 0 and 1
- Schema migration, 0 => 1: make column nullable
- Code deploy: stop writing to column, supports schema versions 0, 1, and 2
- Schema migration, 1 => 2: delete column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change LGTM, assuming tests pass.
This was never necessary: when an output share is recovered, its contribution to the relevant batch aggregation is always recorded, and the collection process solely uses the batch aggregation values. The output share was stored but never (meaningfully) read.
Historically, this is an artifact of the initial implementation of the aggregation process, which occurred before the collection process was written. At time of initial implementation, there was no batch aggregation to record aggregated values into.
Closes #1284.