Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up deploy optimistically #106

Merged
merged 1 commit into from
Jan 31, 2025
Merged

Conversation

yosifkit
Copy link
Member

Trust that if an item was pushed successfully in a previous deploy, then we don't need to push it this time.

We can "Replay" a job with an adjusted filter stage to skip the filtering (i.e., skip the wget and just use an empty past-deploy.json) to get the previous behavior of pushing everything.

Test jobs:

@yosifkit yosifkit requested a review from a team as a code owner January 25, 2025 00:52
@yosifkit
Copy link
Member Author

With this, we could likely change the rateLimitBuilds to allow the job to run more often.

@yosifkit
Copy link
Member Author

Time testing with the large deploy.json from amd64:

$ time jq -L. 'include "deploy"; arch_tagged_manifests("amd64") | deploy_objects[]' ../meta/builds.json > deploy.json

real    0m3.578s
user    0m3.504s
sys     0m0.040s
$ cp deploy.json past-deploy.json
$ ls -lnh *deploy.json
-rw-r--r-- 1 1000 1000 3.3M Jan 24 16:59 deploy.json
-rw-r--r-- 1 1000 1000 3.3M Jan 24 16:59 past-deploy.json
$ time jq --slurpfile past past-deploy.json 'select( IN($past[]) | not )' ./deploy.json

real    0m0.694s
user    0m0.691s
sys     0m0.000s

! wget --timeout=5 -qO past-deploy.json "$JOB_URL/lastSuccessfulBuild/artifact/deploy.json" \\
|| ! jq 'empty' past-deploy.json \\
; then
touch past-deploy.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the second half of the conditional checking for valid JSON, this should probably do truncate with a zero size instead, right? (or we remove that check and let the jobs fail until we take explicit action, which is maybe safer since it'd be something unexpected and failing is easier to notice than slower)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah. let me adjust it. 🤦

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 We only need the if ! wget for the first time it is run. After that, does it make sense to fail if there isn't one? So, just as committed in the PR or perhaps without the jq test the first time and once they all run, drop the if and just let wget fail?

				if ! wget --timeout=5 -qO past-deploy.json "$JOB_URL/lastSuccessfulBuild/artifact/deploy.json"; then
					touch past-deploy.json
				fi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah, we could do that, although if we ever need to bootstrap again it's a little bit of a hassle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could even start with the raw failing wget version, perhaps leaving the touch version commented so we have an easier "replay" target and use replay to bootstrap all of them the first time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(then we also have an obvious place to leave a comment right where we might investigate in the future, describing how if anything goes wrong here and we need to re-bootstrap, here's how to do it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated it to fail if the wget fails with a note and touch command for bootstrapping.

Trust that if an item was pushed successfully in a previous deploy, then we don't need to push it this time
Copy link
Member

@tianon tianon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great -- we also need to couple this with a new job that does "trust ... but verify", but that can reasonably be a separate disconnected effort. 👍

We probably should wait to merge this until at least tomorrow, especially with the changes we deployed today (so we have enough time to make sure they're actually reasonably stable before we ramp down the load we're putting on them).

@tianon tianon merged commit 4f22d05 into docker-library:main Jan 31, 2025
1 check passed
@tianon tianon deleted the save-deploy branch January 31, 2025 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants