-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[0.6] Aggregation: repeat same request on retries. #2556
Conversation
9df2ee2
to
2bb030b
Compare
99f9cc4
to
1e8c518
Compare
This is implemented by copying the necessary report information into the report aggregations at time of aggregation job creation. This means that client-report GC will not change the request that is made.
1e8c518
to
35afcf0
Compare
(This is rebased & ready for review.) |
@@ -99,7 +99,7 @@ macro_rules! supported_schema_versions { | |||
// version is seen, [`Datastore::new`] fails. | |||
// | |||
// Note that the latest supported version must be first in the list. | |||
supported_schema_versions!(3, 2, 1); | |||
supported_schema_versions!(3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll want to discuss in release notes that deploying whichever Janus version this PR goes out in is not backward compatible and that operators will want to work out a DB migration plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻, but as you mentioned in https://github.com/divviup/janus-ops/issues/1257, the DB migration is actually related to #2555 which is going out in this week's release. This PR will go out with the next release, once that migration has stabilized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. My biggest concern here is the upgrade path. The unit tests appear to exercise it, but it might be good to de-risk it further by running a heterogeneous load test, using this week's release image as the aggregation job creator, and this PR's code for the aggregation job driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. My biggest concern here is the upgrade path. The unit tests appear to exercise it, but it might be good to de-risk it further by running a heterogeneous load test, using this week's release image as the aggregation job creator, and this PR's code for the aggregation job driver.
Great idea -- I'll set up a local load test to do just that before merging this PR.
I ran a local load test with the suggested configuration for about an hour, and I can see collection jobs completing with the expected values. I think this PR is about as good as it's gonna get, I'm going to go ahead and merge. |
This is implemented by copying the necessary report information into the report aggregations at time of aggregation job creation. This means that client-report GC will not change the request that is made.
I plan to follow this up with a PR to move report scrubbing from the end of the first aggregation step to aggregation job creation. This would have been included in this PR, but this PR is already quite large.
See individual commits for more detail in some cases--aggregation job creation & driving test cases are enhanced somewhat.
Backport of #2516.