Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(controller): git-push step: pull --rebase before push #3119

Merged
merged 9 commits into from
Dec 17, 2024

Conversation

krancour
Copy link
Member

@krancour krancour commented Dec 11, 2024

Fixes #3071

Although this is WIP 👀 would still be helpful, @hiddeco. 🙏

Apart from the git bits of this, this PR is the catalyst for the step execution engine to start distinguishing between errors/failures that are terminal and ones that can be retried. This seemed like the time, because once a merge conflict that requires manual resolution has been detected, there is no point in performing any retries, no matter what the retry policy says.

I tried to implement what we discussed offline, but I think you'll have some good constructive feedback on this.

r4r

Copy link

netlify bot commented Dec 11, 2024

Deploy Preview for docs-kargo-io ready!

Name Link
🔨 Latest commit 54b2a27
🔍 Latest deploy log https://app.netlify.com/sites/docs-kargo-io/deploys/6761ab52e96182000823138d
😎 Deploy Preview https://deploy-preview-3119.docs.kargo.io
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 56.17978% with 78 lines in your changes missing coverage. Please review.

Project coverage is 51.27%. Comparing base (69493aa) to head (54b2a27).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
internal/controller/git/work_tree.go 8.51% 40 Missing and 3 partials ⚠️
internal/directives/simple_engine_promote.go 61.22% 18 Missing and 1 partial ⚠️
internal/directives/git_pusher.go 80.59% 11 Missing and 2 partials ⚠️
internal/directives/errors.go 62.50% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3119      +/-   ##
==========================================
+ Coverage   51.26%   51.27%   +0.01%     
==========================================
  Files         283      285       +2     
  Lines       25563    25706     +143     
==========================================
+ Hits        13104    13182      +78     
- Misses      11762    11824      +62     
- Partials      697      700       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@krancour krancour changed the title Krancour/pull rebase fix(controller): git-push step: pull --rebase before push Dec 11, 2024
@krancour krancour force-pushed the krancour/pull-rebase branch from 9b3979c to 0fac579 Compare December 11, 2024 05:03
@krancour krancour marked this pull request as draft December 11, 2024 05:06
internal/controller/git/errors.go Show resolved Hide resolved
internal/controller/git/work_tree.go Outdated Show resolved Hide resolved
@krancour krancour force-pushed the krancour/pull-rebase branch from 3d4ccb1 to efa806f Compare December 16, 2024 18:39
@krancour krancour marked this pull request as ready for review December 16, 2024 18:40
@krancour krancour requested a review from a team as a code owner December 16, 2024 18:40
// is indeed the maximum number of attempts.
backoff.Steps = int(*cfg.MaxAttempts)
} else {
backoff.Steps = 50
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This default seems pretty high, I think something like 10 would probably be better. Unless you have a specific reason to opt for this value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the following, and ~20 promos running concurrently and writing to the same branch, I was occasionally able to exhaust attempts:

wait.Backoff{
	Duration: 1 * time.Second,
	Factor:   2,
	Steps:    10,
	Cap:      2 * time.Minute,
	Jitter:   0.1,
}

Being aware of how extensive some users' monorepos are, I presumed 10 to be not nearly enough.

The effective defaults are now:

wait.Backoff{
	Steps:    50,
	Duration: 10 * time.Millisecond,
	Factor:   5.0,
	Jitter:   0.1,
}

I think I still like defaulting to a reasonably high number of steps, but what I definitely didn't account for here is that retry.DefaultBackoff has no cap -- which makes 50 steps absurd.

What do you think about leaving it 50, but capping it at something like 5s?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I could use a mutex to make the pull + rebase + push a critical section of code.

Given that controllers can be sharded, it won't eliminate the need for retries entirely, but it would improve the situation considerably.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using this default backoff because, iirc, you'd suggested it, but I'm rethinking that.

In general, the best way to avoid conflicts will be spacing out the retries of the multiple concurrent promos...

That's got me thinking about a larger starting interval and a larger jitter.

At the same time, increasing the interval by a factor of five at each step seems way too aggressive. Even two seems aggressive.

I'm contemplating something like this:

wait.Backoff{
	Steps:    10,
	Duration: time.Second,
	Factor:   1.5,
	Jitter:   0.5,
	Cap:   30 * time.Second,
}

I'll try that in the morning to see how it performs.

Copy link
Member Author

@krancour krancour Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following works well on its own. It reduced the maximum number of retries I saw in my own example to 6 and most time, there was success with fewer retries, if any at all.

wait.Backoff{
	Steps:    10,
	Duration: time.Second,
	Factor:   1.5,
	Jitter:   0.5,
	Cap:   30 * time.Second,
}

I also added a repo + branch lock, so retries should only ever be necessary in the event of sharded controllers concurrently pushing to the same branch.

With this added, I see no retries at all.

Both improvements are in 54b2a27.

@krancour krancour added this pull request to the merge queue Dec 17, 2024
Merged via the queue into akuity:main with commit 3223783 Dec 17, 2024
17 checks passed
@krancour krancour deleted the krancour/pull-rebase branch December 17, 2024 17:21
github-actions bot pushed a commit that referenced this pull request Dec 17, 2024
Signed-off-by: Kent Rancourt <[email protected]>
Co-authored-by: Hidde Beydals <[email protected]>
(cherry picked from commit 3223783)
@akuitybot
Copy link

Successfully created backport PR for release-1.1:

diegocaspi pushed a commit to diegocaspi/kargo that referenced this pull request Dec 18, 2024
Signed-off-by: Kent Rancourt <[email protected]>
Co-authored-by: Hidde Beydals <[email protected]>
Signed-off-by: Diego Caspi <[email protected]>
fykaa pushed a commit to fykaa/kargo that referenced this pull request Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

git-push subject to race conditions when many stages push to same mono repo
3 participants