Infinite loop when promoting image #2968
Comments
@stephaneetje my apologies, as I seem to have misunderstood the nature of your loop. With the previous discussion having taken place in a thread about This means the cause of your loop has nothing to do with a mismatch of the desired revision. The continuous syncing is probably due to the argocd-update step syncing and then, when checking to see if the sync completed, discovering that the most recently completed sync wasn't the one it asked for. It's waiting for its request to complete and seeing that a sync completed, but that operation must have been initiated by someone or something else, so it tries again. Do you happen to have auto-sync enabled on your Apps? It is possible that this is what the update step is fighting a losing battle against. We do generally suggest disabling auto-sync on Apps when you use Kargo. I will address it in the docs to make that more clear. If this ends up not being the issue, enabling debug logging on the controller should provide more insight into what conditions are causing the update step to be caught in this sync loop. |
Oh. I had no doubt of that. The mystery is why Kargo is syncing over and over again -- which would usually be because:
I'm trying to piece together what's going wrong at step 3 and it was conjecture that something besides Kargo might (additionally) be initiating sync operations. I am attempting to replicate this. |
Running a build with #2936 in it should make it much easier to trace the reason without having to try to reproduce it. |
Also, I have tried quite hard to replicate this. I have 13 Projects x 3 Stages per project, each with an associated Argo CD App. All 39 Stages are reading from and writing to just one single branch. Here's a gist of what one Project looks like: https://gist.github.com/krancour/dafa7ded1b4db4c0683c05e7f01129e8 There are 12 more just like it. I've been unable to trigger any kind of infinite sync loop. I'll note the problem I did run into: With lots and lots of Promos running concurrently, all racing to push to the same remote branch, unsurprisingly, some of those failed because in the space between a Promotion checking out the branch and pushing to it, some other Promotion pushed its own commit. Re-trying any Promotion that failed for these reasons put things back on track. This will be a problem for anyone who has many Stages making updates to a single branch and a large volume of Promotions. (Add this to the list of problems that are solved for free by embracing the notion of branches as env-specific config storage.) Not to get too sidetracked here, but there's a limited amount we can do about the above because Promotion processes are user-defined and the actions carried out by each step are opaque to the rest of Kargo. This means there's no possibility of "try again starting from step n," but I will open a separate issue with a proposal for a "FailurePolicy" that could have, among other options, the option to start over from step 0. Circling back to the subject at hand, as I said, I couldn't reproduce that issue, so it would be interesting @stephaneetje if you are able to identify what you and I have done differently. I attempted to replicate your configuration as best I could, but there's clearly some consequential difference. |
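The push race described above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: `RemoteBranch`, `promote`, and the `interfering_pushes` parameter are hypothetical names invented for illustration, and nothing here is Kargo's or Git's actual code. It shows why a rejected push can only be recovered by starting the whole process over from the checkout.

```python
class RemoteBranch:
    """In-memory stand-in for a shared Git branch (hypothetical, for illustration)."""

    def __init__(self):
        self.head = 0  # commit counter standing in for a commit SHA

    def push(self, based_on):
        # Reject the push if the branch moved since the caller checked it out,
        # as a Git remote rejects a non-fast-forward push.
        if based_on != self.head:
            return False
        self.head += 1
        return True


def promote(remote, interfering_pushes=0, max_attempts=3):
    """Check out, then push; start over from the first step on rejection.

    interfering_pushes simulates other Promotions landing a commit between
    our checkout and our push on the first N attempts.
    Returns the attempt number that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        base = remote.head                 # first step: check out the branch
        if attempt <= interfering_pushes:  # someone else pushes in the gap
            remote.push(remote.head)
        if remote.push(base):              # last step: push our own commit
            return attempt
    raise RuntimeError("push rejected on every attempt")
```

With no interference, `promote(RemoteBranch())` succeeds on the first attempt. With `interfering_pushes=2` it succeeds on the third, after two full restarts.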
I have the same problem and this is what I found in the logs:
While Is it by any chance related to using the same source branch to merge the results back to? |
@krancour I just tried with only the name:
The step passes, but I end up with the application being synced successfully and then going OutOfSync (the image has not been updated). I checked the revision used by the sync, and it is the correct one with the image change. I have to do another manual sync. Also, the Kargo documentation says that |
This has the appearance of an Argo CD quirk/issue and not a Kargo issue.
Ahhhhh. This explains everything. 😄 I had not noticed that source update.
When you do perform any kind of update on a source, there's a whole additional layer of stuff that happens when checking if the sync was successful. It compares what the sources look like to what you want them to look like. The discrepancy was here (somehow?), and because of it the sync was being re-attempted over and over again because it kept looking like it hadn't worked. |
Does Kargo wait for refresh to finish before sending a sync signal? |
We patch the Application with a sync operation. When Argo CD syncs, it is smart about not relying on its cache. So the short answer is no. The longer answer is, "no because there's no need." After the sync is initiated, Kargo follows up on it as many times as necessary to find evidence that the exact sync operation it initiated has since completed. |
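As a rough illustration of the "request, then look for evidence" approach described above, here is a minimal sketch that works on plain dictionaries. The field names (`operation`, `initiatedBy`, `info`, `sync.revision`, `status.operationState`, `phase`) follow the Argo CD Application resource as I understand it. The `request-id` info entry, the function names, and the exact comparison are assumptions made for illustration and are not taken from Kargo's code.

```python
def build_sync_patch(revision, requested_by, request_id):
    """Build a merge-patch body asking the Application controller for a sync."""
    return {
        "operation": {
            "initiatedBy": {"username": requested_by},
            # Hypothetical marker used to recognize our own request later.
            "info": [{"name": "request-id", "value": request_id}],
            "sync": {"revision": revision},
        }
    }


def is_our_completed_sync(app, patch):
    """True only if the last recorded operation is ours AND it succeeded."""
    state = app.get("status", {}).get("operationState")
    if not state:
        return False
    wanted = patch["operation"]
    recorded = state.get("operation", {})
    same_request = (
        recorded.get("initiatedBy") == wanted["initiatedBy"]
        and recorded.get("info") == wanted["info"]
    )
    return same_request and state.get("phase") == "Succeeded"
```

A completed sync that was started by something else (auto-sync, for example) fails the `same_request` check even when it used the same revision, which is the situation in which a caller would keep waiting or request the sync again.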
Seems like app starts showing out of sync only when sync is already running, indicating there can be some race. Idk about exact reasons though. |
Do you mean you patch the application sync operation directly instead of calling the server endpoint to do so? That might explain what we saw when trying to repro using ArgoCD CLI and being unable to. I'm pretty sure some steps are missing to do sync properly. How hard would it be to just trigger sync via endpoint/calling the code instead of patching the application? |
In particular, the observed behavior in ArgoCD UI is that Sync is clickable, shows a new duration, but shows old sync's data for a few seconds. This doesn't look intended. |
This doesn't sound like a Kargo problem. Note that #2980 should probably fix the original problem described by this thread. |
@krancour, I don't think ArgoCD is expected to work correctly with surgical replacements of things that are supposed to be done through higher-level APIs. The Sync info showing both old and new data for a bit only supports that, or at least that Kargo doesn't update the sync operation on the app properly. But operation state seems like an internal field managed by ArgoCD, so manually updating it doesn't seem like something that should be done. It's like using a library's internal function. |
I'm afraid this is incorrect. When you sync an App via Argo CD API, it's merely a request to sync and is handled the same way we do it. i.e. It patches the application with a sync operation: In either case, it is the Application controller that is doing all the actual work. |
The original issue here should be closed by #2980. |
How does Kargo patch the manifest? Can you point me to the code, please? What if ArgoCD controller changes the way it patches? |
Kargo is indeed setting the
Kargo would have to adapt. We do copy the implementation: However, I'm comfortable with this design choice, having written the original Argo CD code and understanding that the sync operation logic has not changed in an incompatible way since basically its inception. As Kent already pointed out, we are merely requesting the sync operation by setting the initial |
@jessesuen, the Kargo link points to ArgoCD repo. I think I've found the relevant code though kargo/internal/directives/argocd_updater.go Line 476 in cc293f0
When comparing it with SetAppOperation, I don't see an analog to clearing I've seen different approaches for constructing the operation in server and app controller https://github.com/argoproj/argo-cd/blob/32cc6638f78e15652c24eb2cd6403e6e7ec26ab0/server/application/application.go#L1968-L1985, https://github.com/argoproj/argo-cd/blob/32cc6638f78e15652c24eb2cd6403e6e7ec26ab0/controller/appcontroller.go#L1987-L2051. Compared with the manual sync in the server, Kargo doesn't set prune, for example. Compared with the automated sync in the controller, it doesn't check for already attempted syncs, though some of that is specific to autosync's options. There are more differences, but I'm not sure they'd cause the behavior described. Overall, I think clearing the app.Status.OperationState needs to be added. As for Prune etc., I'm not sure if Kargo has controls for this, but if it does, they should be updated as well. |
This is correct. There are options it doesn't set. There is no reason that certain high-value options cannot be surfaced in the
It's not clear what you think this will accomplish. Great care is taken already to verify that a completed operation is the exact operation we requested. This is accomplished by comparing the details of Here is the relevant code: kargo/internal/directives/argocd_updater.go Lines 365 to 474 in cc293f0
Edit: It's also not correct to set status on a resource we're not the controller for. |
Here we go #2984.
Parity with how ArgoCD does things, as well as consistency of the state. If the previous sync has finished and we add a new operation, keeping old status's operation state would not reflect the operation and probably results in the UI inconsistency showing new sync started, but state of the previous sync. I'll open a separate issue for this though. |
@jessesuen, @krancour, I've created #2985. |
Hello,
I've been trying to use Kargo to only promote image tags (using helm-image-update), but on all my stages I get an infinite loop at the argocd-update step.
Here is my stage:
I also tried removing desiredCommitFromStep, but it didn't change anything.
My warehouse only subscribes to an image repository:
You can see in the Argo CD App's events that it is looping
Discussion has begun on this issue; opening another one as asked by @krancour