Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kargo app updater doesn't clear the app.Status.OperationState when setting application operation, leading to inconsistencies during ArgoCD sync #2985

Closed
2 of 4 tasks
andrii-korotkov-verkada opened this issue Nov 23, 2024 · 9 comments

Comments

@andrii-korotkov-verkada

Checklist

  • I've searched the issue queue to verify this is not a duplicate bug report.
  • I've included steps to reproduce the bug.
  • I've pasted the output of kargo version.
  • I've pasted logs, if applicable.

Description

Kargo app updater doesn't clear the app.Status.OperationState when setting application operation, see

func (a *argocdUpdater) syncApplication(
ctx context.Context,
stepCtx *PromotionStepContext,
app *argocd.Application,
desiredSources argocd.ApplicationSources,
) error {
// Initiate a "hard" refresh.
if app.ObjectMeta.Annotations == nil {
app.ObjectMeta.Annotations = make(map[string]string, 1)
}
app.ObjectMeta.Annotations[argocd.AnnotationKeyRefresh] = string(argocd.RefreshTypeHard)
// Update the desired source(s) in the Argo CD Application.
if app.Spec.Source != nil {
app.Spec.Source = desiredSources[0].DeepCopy()
} else {
app.Spec.Sources = desiredSources.DeepCopy()
}
// Initiate a new operation.
app.Operation = &argocd.Operation{
InitiatedBy: argocd.OperationInitiator{
Username: applicationOperationInitiator,
Automated: true,
},
Info: []*argocd.Info{
{
Name: "Reason",
Value: "Promotion triggered a sync of this Application resource.",
},
{
Name: freightCollectionInfoKey,
Value: stepCtx.Freight.ID,
},
},
Sync: &argocd.SyncOperation{
Revisions: []string{},
},
}
if app.Spec.SyncPolicy != nil {
if app.Spec.SyncPolicy.Retry != nil {
app.Operation.Retry = *app.Spec.SyncPolicy.Retry
}
if app.Spec.SyncPolicy.SyncOptions != nil {
app.Operation.Sync.SyncOptions = app.Spec.SyncPolicy.SyncOptions
}
}
if app.Spec.Source != nil {
app.Operation.Sync.Revisions = []string{app.Spec.Source.TargetRevision}
}
for _, source := range app.Spec.Sources {
app.Operation.Sync.Revisions = append(app.Operation.Sync.Revisions, source.TargetRevision)
}
. But ArgoCD does that in its SetAppOperation https://github.com/argoproj/argo-cd/blob/32cc6638f78e15652c24eb2cd6403e6e7ec26ab0/util/argo/argo.go#L820. This seems to result in ArgoCD UI showing a new sync operation started with details of the old sync for a few seconds, as operation and status are not consistent between each other. There might be deeper consequences like app remaining out of sync.

Screenshots

Screenshot 2024-11-22 at 5 45 17 PM
Screenshot 2024-11-22 at 5 46 12 PM
Screenshot 2024-11-22 at 5 46 15 PM
Courtesy of @stephaneetje.

Steps to Reproduce

Change the tag in helm values and trigger an update through Kargo.

Version

Not provided

Logs

Not provided

@andrii-korotkov-verkada
Copy link
Author

See the discussion from this comment #2968 (comment) and below for more context.

@krancour
Copy link
Member

I did mention before in the other thread that this is kind of a no-no. A resource's Status is meant to be manipulated by the controller responsible for it and not by others.

@andrii-korotkov-verkada
Copy link
Author

But you manipulate the operation field with bypassing server APIs ;) What's the substantial difference making it a no-no and operation field updates yes? However, I'll check if it's possible to clear the status field by the controller if it's about a completed operation finished before the current sync started.

@krancour
Copy link
Member

krancour commented Nov 23, 2024

But you manipulate the operation field with bypassing server APIs ;) What's the substantial difference making it a no-no and operation field updates yes?

This is Kubernetes 101. Status is a special field. It is one of only a handful of fields that Kubernetes regards as a "subresource" and treats differently. What is different about a subresource is that the endpoints for creating/updating a resource ignore changes to subresources. Instead, the subresources get their own endpoints.

RBAC is an area where you can see this on full display. Permission to update a resource kind X doesn't automatically come with permission to update X.status. That has to be requested separately. (Since, as I said -- it's a separate endpoint.) i.e. You would need to grant permissions on resource kind x/status. In practice, you probably see this pretty rarely though, because the only principals who usually get such permissions are the controllers responsible for those resources.

So if you were to look at the permissions of the Kargo controller, for instance, in addition to resource kinds like stages and promotions, you'd also find stages/status and promotions/status. For the Argo CD Application controller, in addition to applications, you'd see applications/status.

The reason you don't see this more often is precisely because, by convention, status subresources aren't meant to be manipulated by third parties. By convention, that field is exclusively for the controller responsible for the resource to reflect the observed state of the world. No one else should be touching it.

@andrii-korotkov-verkada
Copy link
Author

Thanks for the explanation. I'll look into whether it's possible to handle better at ArgoCD side. Maybe it's just as simple as reset status field on app update if it became obsolete.

andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 23, 2024
…y a direct object update

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 23, 2024
…y a direct object update (argoproj#20875)

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 23, 2024
…y a direct object update (argoproj#20875)

May fix argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 23, 2024
…y a direct object update (argoproj#20875)

May fix argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
@andrii-korotkov-verkada
Copy link
Author

Trying this argoproj/argo-cd#20915

andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 23, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 23, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 24, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 25, 2024
…y a direct object update (argoproj#20875)

Fixes argoproj#20875

Some systems like Kargo update the application operation directly to initiate a new sync, bypassing the server. However, they can't update the status operation state to become nil, resulting in discrepancy, see akuity/kargo#2985 and the discussion. Handle this case on the app controller side.

Let's cherry-pick to v2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
@andrii-korotkov-verkada
Copy link
Author

I've made it ready for review, working for both live environment and tests. Hope ArgoCD maintainers accept it.

@krancour
Copy link
Member

krancour commented Dec 4, 2024

Closing this as I believe there's agreement that no one but Argo CD should update Application status and argoproj/argo-cd#20915 seems as if it may, pending review, address the reported issue on the Argo CD end.

@krancour krancour closed this as completed Dec 4, 2024
@jessesuen
Copy link
Member

After further internal discussion, we conceded that Kargo can clear .status.operationState (at least temporarily), since it's unclear if & when Argo CD will change it's behavior.

So .status.operationState will be cleared the same time the operation is set. Implemented in #3069 and will be in v1.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants