Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git-push subject to race conditions when many stages push to same mono repo #3071

Closed
jessesuen opened this issue Dec 5, 2024 · 11 comments · Fixed by #3119
Closed

git-push subject to race conditions when many stages push to same mono repo #3071

jessesuen opened this issue Dec 5, 2024 · 11 comments · Fixed by #3119

Comments

@jessesuen
Copy link
Member

jessesuen commented Dec 5, 2024

Description

Two or more stages might attempt to push to same repo at the same time (e.g. using promote downstream button). When this happens, the git-push step might intermittently fail like:

step execution failed: failed to run step "git-push": error pushing commits to remote: error pushing branch: error executing cmd [/usr/bin/git push origin HEAD]: remote: Bypassed rule violations for refs/heads/main: remote: remote: - Changes must be made through a pull request. remote: To https://github.com/example/guestbook-deploy.git ! [remote rejected] HEAD -> main (cannot lock ref 'refs/heads/main': is at 74291c2cb80ee3f5c6e5e6e329e13a5a1866ca61 but expected 473aff63b3da037e6c6d4a3de2dabaee3c196c83) error: failed to push some refs to 'https://github.com/example/guestbook-deploy.git'

A workaround may be to use the retry feature, but I haven't verified this.

Steps to Reproduce

Have multiple stages promoting at same time while modifying the same gitops repo. Timing should try to overlap

Version

v1.2.0-unstable-20241205
@krancour
Copy link
Member

krancour commented Dec 5, 2024

A workaround may be to use the retry feature, but I haven't verified this.

I don't think retry will work. It will just retry the same push.

This was discussed offline on Monday and @hiddeco's suggestion had been to make the push step able to detect when the local branch is behind the remote and rebase.

If multiple Stages are working with the same branch, they should at least be working with different files, so merge conflicts should be rare.

@hiddeco
Copy link
Contributor

hiddeco commented Dec 5, 2024

I think there are a couple of additional error strings which we should look for:

The easiest way to rebase is likely: git pull --rebase.

@krancour
Copy link
Member

krancour commented Dec 6, 2024

@hiddeco instead of trying the push and then looking for specific error messages, what if we pre-emptively pull --rebase?

@hiddeco
Copy link
Contributor

hiddeco commented Dec 9, 2024

Assuming that when this still fails, we can retry the step if such a configuration is set — I think this could work.

@krancour
Copy link
Member

krancour commented Dec 9, 2024

Assuming that when this still fails, we can retry the step if such a configuration is set — I think this could work.

I don't think retrying the step changes much, right? If there's a merge conflict that cannot be resolved automatically with a rebase, I think we'd need to retry the whole promotion process from the beginning. I'm not opposed to that, but "retry everything" seems like its own separate feature.

Edit: The notion of retrying a whole Promotion is discussed in #2972

@krancour krancour modified the milestones: v1.1.1, v1.1.2 Dec 9, 2024
@krancour krancour self-assigned this Dec 9, 2024
@ksawerykarwacki
Copy link

This is a critical blocker, without it you effectively cannot have more than one flow (warehouse + stages) or multiple projects working on single repo. Effectively forcing everyone into multirepo approach even if they use different branches for distinct stages (but stages contain multiple apps as you can end up with the race condition).

@hiddeco
Copy link
Contributor

hiddeco commented Dec 10, 2024

I don't think retrying the step changes much, right?

My retry was aimed at the scenario where there would still be a new push from another Promotion between the time the Promotion does a git pull --rebase and a git push, which hypothetically is possible and could be retried.

@krancour
Copy link
Member

Ah. That makes sense.

@krancour
Copy link
Member

This is a critical blocker, without it you effectively cannot have more than one flow (warehouse + stages) or multiple projects working on single repo. Effectively forcing everyone into multirepo approach even if they use different branches for distinct stages (but stages contain multiple apps as you can end up with the race condition).

We're aware of the problem this poses, which is why we opened the issue.

You're overstating the implications a bit, however. This does not force you into using multiple repositories.

The problem is simply one of multiple Promotion processes writing to the same branch. If you consume charts or templates or something from the main branch of your monorepo, but write relevant, updated, or rendered parts to a Stage-specific branch (which you even mention) then there it no issue at all and you need not use multiple repos.

@ksawerykarwacki
Copy link

ksawerykarwacki commented Dec 10, 2024

@krancour yes but this assumes that you have one promotion flow per repository. Theoretically you can have multiple promotion flows (e.g. one per app or team) which may use shared stages (e.g. specific Kubernetes clusters) in monorepo. As there is no locking between promotion flows / projects on access to repo you can end up with this race condition easily.

Of course, this is not an issue if you have one project and shared steps for multiple warehouses, but I assume that many teams (including mine) would prefer to have deeper separation of promotion flows. It might be an antipattern but if that is the case then this should be highlighted in the documentation which currently lacks examples for scaling and multitenancy handling.

It might not be a big issue from technical perspective, but this leads to unpleasant experience for everyone trying to adopt Kargo to their workflow (I run into it within 10 minutes after my first installation) which may discourage some folks.

Regarding the issue itself, pull + push and trying to merge before failing seems like a way to go. Optional flag to pull and rebase before push is just enough. If you have deeper conflict, then you must resolve it manually anyway so you can restart promotion from scratch.

@krancour
Copy link
Member

I will re-iterate that the only problem is multiple stages writing to the same branch. If that is not something you are doing, there is no problem.

If it is something you are doing, then the fix will be in 1.2 or possibly even in 1.1.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants