Add per task report upload metrics (0.6 backport) #2513
Conversation
Using this for load testing, since 0.7 is WIP. In reality I think this will need to be two PRs over two releases, to account for the DB migration.
Force-pushed from f3bcc7f to 86e70e2
Load test results: Each trial is the last 15 minutes of our standard load test run at 100QPS. The task is Time Interval Prio3Count. This should exercise the slowest path--validation that requires DB access and a successful report. System details:
100QPS Baseline (Janus release 0.6.9)

N.b. there is no transaction error graph, since this transaction can't fail due to serialization errors.

Overall the results look to be non-regressive, as long as the shard count is greater than 1 to avoid serialization errors. Note that these graphs record the total memory usage of the postgresql container, but I think that turned out to be meaningless since it includes the memory used for buffers. Either way, I don't observe any strange behavior with respect to memory. Note that some graphs contain a period of high latency; this is noise from the Kubernetes cluster HPA not having scaled the aggregator deployment to a happy state.
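As an aside on the shard-count point above, here is a minimal sketch of how a sharded counter avoids serialization conflicts: increments for a task's upload counter are spread across several rows, so concurrent transactions usually touch different rows. The table and column names, the round-robin shard selection, and the SQL shape are illustrative assumptions rather than Janus's actual schema or API; the default of 32 shards is taken from the commit list further down.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical shard count; with a single shard every writer contends on one row.
const SHARD_COUNT: u64 = 32;

// Round-robin shard picker (a random pick would work just as well).
static NEXT_SHARD: AtomicU64 = AtomicU64::new(0);

fn next_shard() -> u64 {
    NEXT_SHARD.fetch_add(1, Ordering::Relaxed) % SHARD_COUNT
}

/// Build the (illustrative) statement that bumps one shard's counter.
fn increment_query(shard: u64) -> String {
    format!(
        "UPDATE task_upload_counters \
         SET report_count = report_count + 1 \
         WHERE task_id = $1 AND ord = {shard}"
    )
}

fn main() {
    // Reads would SUM(report_count) over all shards for the task.
    for _ in 0..3 {
        println!("{}", increment_query(next_shard()));
    }
}
```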
I'll need to break this PR into two to facilitate a zero-downtime PostgreSQL migration.
Actually, I don't think this is true. Our rollout cadence is like so:
Note that we don't wait for 1's success during deployment. The pods will remain in a crash loop until the database has been updated. Meanwhile, traffic won't be shifted to the new deployment.
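For context, here is a minimal sketch of the crash-loop behavior described above, assuming the aggregator refuses to start when the database schema version isn't one it supports. The constant, function, and version numbers are hypothetical illustrations, not Janus's actual startup code.

```rust
// Schema versions this (hypothetical) binary knows how to talk to.
const SUPPORTED_SCHEMA_VERSIONS: &[i64] = &[1, 2];

fn check_schema_version(current: i64) -> Result<(), String> {
    if SUPPORTED_SCHEMA_VERSIONS.contains(&current) {
        Ok(())
    } else {
        Err(format!("unsupported schema version {current}; refusing to start"))
    }
}

fn main() {
    // In a real deployment this value would be read from the database at startup.
    let current_version = 0;
    if let Err(err) = check_schema_version(current_version) {
        eprintln!("{err}");
        // Exiting non-zero makes Kubernetes restart the pod, producing the crash
        // loop until the schema migration has been applied.
        std::process::exit(1);
    }
    println!("schema version {current_version} supported; starting up");
}
```

Under this behavior, new pods keep restarting until the migration lands, and traffic stays on the old deployment in the meantime.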
I think it would be preferable to do this in two deploys as follows:
If we did this in a single deploy, note that the old ReplicaSet wouldn't be able to start any new pods until the schema migration was applied. If we encountered issues during the deploy, this would complicate the response.
Force-pushed from ec1bf58 to aa76c41
Fair enough. I was relying on the "old ReplicaSet won't start new pods" behavior to make this work, but that is indeed chaotic.
Force-pushed from aa76c41 to 6480c91
Force-pushed from 6480c91 to f910486
* Add per task report upload metrics.
* Change default to 32, add documentation on option
* Fix test
* Build query instead of brute forcing each possible one
* Don't wait on bad reports
* Use Runtime and RuntimeManager instead of sleeping in tests
* Clippy
* Cargo doc
* Don't use macro needlessly

Co-authored-by: Brandon Pitman <[email protected]>

---------

Co-authored-by: Brandon Pitman <[email protected]>
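The "Build query instead of brute forcing each possible one" item above refers to assembling the update statement from whichever counters actually changed, rather than enumerating a separate statement for every combination. A minimal sketch of that idea follows; the struct, table, and column names are illustrative assumptions, not the PR's actual code.

```rust
// Hypothetical per-task upload counter deltas accumulated in memory.
struct UploadCounterIncrements {
    report_success: u64,
    report_decode_failure: u64,
    report_too_early: u64,
}

/// Build a single UPDATE whose SET clause covers only the non-zero counters.
fn build_update_query(inc: &UploadCounterIncrements) -> Option<String> {
    let columns = [
        ("report_success", inc.report_success),
        ("report_decode_failure", inc.report_decode_failure),
        ("report_too_early", inc.report_too_early),
    ];
    let assignments: Vec<String> = columns
        .iter()
        .filter(|(_, delta)| *delta > 0)
        .map(|(name, delta)| format!("{name} = {name} + {delta}"))
        .collect();
    if assignments.is_empty() {
        return None; // nothing to write this round
    }
    Some(format!(
        "UPDATE task_upload_counters SET {} WHERE task_id = $1 AND ord = $2",
        assignments.join(", ")
    ))
}

fn main() {
    let inc = UploadCounterIncrements {
        report_success: 3,
        report_decode_failure: 1,
        report_too_early: 0,
    };
    println!("{}", build_update_query(&inc).unwrap());
}
```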
Don't change existing schema
Force-pushed from f910486 to 401e53d
Supports #2293
Backport of #2508 and #2537.
This PR should not be merged until #2553 has been merged and included in a release. In other words, this PR and #2553 should not ship in the same release.