Environment variables to configure (shorten) Informer ResyncPeriods #13690

    // If the pod was deleted, then it is possible that the controller never get another informer message about it.
    // In this case, the workflow will only be requeued after the resync period (20m). This means
    // workflow will not update for 20m. Requeuing here prevents that happening.

#4423

agilgur5 · 2024-10-01T13:43:26Z

shortening might solve #13671 / #10947 (which is linked to a k8s client bug)

That would be a workaround, not a solution. Cache rebuilds are expensive, especially if you have a large number of Workflows. We leave it at the k8s default, so if it's not tuned in Argo, making it user-configurable is a bit confusing, to say the least.

There's also one of these for every informer

Also please fill out the issue templates in full, especially if you want to be a good role model to others.

tooptoop4 · 2024-10-01T19:53:04Z

@agilgur5 can you clarify expensive in what terms (k8s API calls, controller CPU/memory, something else)? That might be preferable to missing SLAs for me.

From reading kubernetes/kubernetes#127964 and kubernetes/client-go#571, the informer seems unreliable compared to listing the current state.

So the choice seems to be between relying on events/cache to decide which workflows should be operated on (non-zero chance of missing some) and simply listing all workflows (guaranteed to have all).

agilgur5 · 2024-10-01T21:31:45Z

All of the above. It can do a full relist, which is expensive in k8s API calls and network I/O, and it iterates through the entire cache, which uses CPU and memory. Depending on your usage, you might be able to see the rebuild as a clear spike in your metrics, as with #12206 (comment)

In #12125 (comment) (I forgot that issue existed, very similar) and #13466 (comment) I linked to some readings upstream in kubernetes-client/java#725 (comment), this k8s SIG API Machinery Google Group thread, argoproj/gitops-engine#617 (comment). According to those, Informers are supposed to be quite stable now and no longer relist, although it's unclear whether that applies outside of "core controllers".
But core controllers, kubebuilder, controller-runtime, etc. all make heavy use of Informers, so they're an essential piece of k8s controllers upstream, and not necessarily something Argo should be working around if there are bugs.

I would say it's more of an upstream question whether that even makes sense to expose to users, since it seems like k8s maintainers don't recommend changing the default for other tooling either.

That might be preferable to missing SLAs for me

that's a bit of a different question that is potentially worth exposing in its own right, although the argument against that would be that if Informers are acting up, your entire cluster is going to be having some problems, not just Argo

github-actions · 2024-10-24T02:24:07Z

This issue has been automatically marked as stale because it has not had recent activity and needs more information. It will be closed if no further activity occurs.

tooptoop4 · 2024-10-25T18:51:01Z

/unrotten

agilgur5 · 2024-10-25T19:10:42Z

/unrotten

This is still missing information...

tooptoop4 · 2024-10-28T10:33:24Z

According to kubernetes/kubernetes#128183 (comment), this is not an upstream issue.

tooptoop4 added the type/feature Feature request label Oct 1, 2024

tooptoop4 changed the title ~~Environment variable to configure (shorten) workflowResyncPeriod~~ Environment variable to configure (shorten) workflowResyncPeriod/podResyncPeriod Oct 1, 2024

agilgur5 changed the title ~~Environment variable to configure (shorten) workflowResyncPeriod/podResyncPeriod~~ Environment variables to configure (shorten) Informer ResyncPeriods Oct 1, 2024

agilgur5 added the solution/workaround (There's a workaround, might not be great, but exists), area/controller (Controller issues, panics), and area/upstream (This is an issue with an upstream dependency, not Argo itself) labels Oct 1, 2024

tooptoop4 mentioned this issue Oct 2, 2024

PodGC too slowly #13212


agilgur5 added the problem/more information needed Not enough information has been provide to diagnose this issue. label Oct 7, 2024

github-actions bot added the problem/stale This has not had a response in some time label Oct 24, 2024

github-actions bot removed problem/stale This has not had a response in some time problem/more information needed Not enough information has been provide to diagnose this issue. labels Oct 26, 2024

gzb1128 mentioned this issue Nov 17, 2024

Add some doc about environment variable "INFORMER_WRITE_BACK" #12352
