fix(controller): set WatchListPageSize when WatchError #13466
About pageSize: it is aligned with the default pageSize of the pager in client-go. See https://github.com/kubernetes/client-go/blob/ee1a5aaf793a9ace9c433f5fb26a19058ed5f37c/tools/pager/pager.go#L31
…alled Signed-off-by: Yuping Fan <[email protected]>
Some stylistic comments below
workflow/controller/informer/tolerant_cluster_workflow_template_informer.go
workflow/controller/informer/tolerant_workflow_template_informer.go
this makes sense to me, although I'm not sure if this could have other ramifications/consequences, both during relist, which makes sense to be a full list, and during other operations, as this code block is not necessarily specific to relist
@terrytangyuan - could you look at this. Given how #12133 fixed quite a subtle problem, it'd be good to have your input on this one.
Signed-off-by: Yuping Fan <[email protected]>
Interesting discovery. cc @jessesuen to take a look as well
Fixed, Thanks |
Signed-off-by: Yuping Fan <[email protected]>
workflow/controller/controller.go
if options.ResourceVersion == "0" {
	options.ResourceVersion = ""
}
The recorded LastResourceVersion cannot be overwritten. see https://github.com/kubernetes/client-go/blob/ee1a5aaf793a9ace9c433f5fb26a19058ed5f37c/tools/cache/reflector.go#L574
Thank you for the thorough explanation as well as the links to the client-go code to support the need for this change. The rationale for this PR makes sense to me.
Here is my summary of the original issue, the history, and this PR (if only for my own edification):
- The Kubernetes cache informer will call list with `resourceVersion=0`. `resourceVersion=0` tells the K8s API server that it is allowed to use its API server cache. However, it has the consequence of pagination (a.k.a. limits/chunking) not working and returns all workflows in the response. This can lead to OOM issues in the controller in systems with a lot of workflows.
- To solve this, "fix: Correct limit in controller List API calls. Fixes #11134" (#11343) had blindly set `resourceVersion=""` in all list calls, in order to ensure that pagination would always work. This would prevent OOM situations in the controller by avoiding non-paginated huge LIST responses.
- However, as @fyp711 discovered, unconditionally setting `resourceVersion=""` is not correct, since the informer might also perform a "relist", which is a list call containing an actual numeric version, e.g. `resourceVersion=1234`. The fact that our controller overrides `1234` with `""` is incorrect behavior because ignoring `resourceVersion=1234` could cause consistency issues, but also causes the request to reach etcd.
- The fix here is to achieve both concerns:
  - We will still ignore the cache informer's request to list with `resourceVersion=0`, replacing it with `resourceVersion=""`, since we know `0` will cause memory issues
  - But we will now honor the cache informer's request to list with `resourceVersion=1234`. List with `resourceVersion=1234` does honor page sizes (I verified this), and we avoid potential consistency issues as well as strain on etcd (since `resourceVersion=1234` allows the server to use its API server cache).
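To make the two cases in the summary concrete, here is a minimal sketch of the rewrite rule it describes. The `listOptions` type and the `normalize` helper are hypothetical stand-ins for illustration (the real code operates on `metav1.ListOptions` inside the controller's list function); only the `"0"` to `""` rewrite comes from the PR.

```go
package main

import "fmt"

// listOptions is a hypothetical, reduced stand-in for metav1.ListOptions.
type listOptions struct {
	ResourceVersion string
	Limit           int64
}

// normalize rewrites only the "0" case to "" so that pagination works,
// and leaves every other resourceVersion (e.g. "1234" on a relist) untouched.
func normalize(o listOptions) listOptions {
	if o.ResourceVersion == "0" {
		o.ResourceVersion = ""
	}
	return o
}

func main() {
	for _, rv := range []string{"0", "", "1234"} {
		out := normalize(listOptions{ResourceVersion: rv})
		fmt.Printf("in=%q out=%q\n", rv, out.ResourceVersion)
	}
}
```

The point of keeping the rule this narrow is that a numeric version requested by the informer is passed through as-is, which is the behavior the summary argues for.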
The subtle differences between the list options that are worth repeating:
Thanks for the great summary Jesse, that's very helpful! My only remaining question is the effect/ramifications of the |
Thank you! LGTM. Can we add the link to the PR to the code comments and record the rationale of the changes?
Signed-off-by: Yuping Fan <[email protected]>
@terrytangyuan fixed, Thanks |
workflow/controller/controller.go
}
// The reflector sets the Limit to `0` when `ResourceVersion != "" && ResourceVersion != "0"`, which fails
// to limit the number of workflows returned. Timeouts and other errors may occur when there are a lot of workflows.
// see https://github.com/kubernetes/client-go/blob/ee1a5aaf793a9ace9c433f5fb26a19058ed5f37c/tools/cache/reflector.go#L286
Thanks for adding this.
So in https://github.com/kubernetes/client-go/blob/ee1a5aaf793a9ace9c433f5fb26a19058ed5f37c/tools/cache/reflector.go#L289-L294 it says that no limit retrieves from watch cache and is intentionally done to avoid a thundering herd on etcd; so that sounds like adding a limit could have negative consequences and cause reading from etcd?
This behavior is confusing enough that I'm genuinely surprised this isn't documented better in client-go with usage examples
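For readers following along, the page-size decision under discussion can be modeled as a small pure function. This is a simplified sketch based on my reading of the linked `reflector.go` lines, not the client-go code itself; the function name `choosePageSize` and the `defaultPageSize` constant are hypothetical, with 500 taken from the pager default mentioned earlier in this thread.

```go
package main

import "fmt"

// defaultPageSize mirrors the client-go pager default referenced in this PR.
const defaultPageSize int64 = 500

// choosePageSize is a hypothetical model of how the reflector picks the page
// size for its list call. A return value of 0 means "no Limit is sent".
func choosePageSize(watchListPageSize int64, paginatedResult bool, resourceVersion string) int64 {
	switch {
	case watchListPageSize != 0:
		// An explicit WatchListPageSize always wins.
		return watchListPageSize
	case paginatedResult:
		// A previous list was paginated, so the default page size is kept.
		return defaultPageSize
	case resourceVersion != "" && resourceVersion != "0":
		// Relist at a concrete version: pagination is switched off so the
		// API server can answer from its watch cache.
		return 0
	default:
		return defaultPageSize
	}
}

func main() {
	fmt.Println(choosePageSize(0, false, "1234"))   // relist, no explicit page size
	fmt.Println(choosePageSize(500, false, "1234")) // relist with WatchListPageSize set
}
```

Under this model, the only way to get a non-zero Limit on a relist at a concrete version is to set `WatchListPageSize` (or to have seen a paginated result before), which is the trade-off against watch-cache reads raised above.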
@agilgur5 Thank you for reading, let me share my opinion:
First of all, the purpose of this submission is to make the Limit take effect, in order to solve a series of problems caused by returning a large number of workflows at once.
Next, the `ListAndWatch` method executes List only once each time and then starts a continuous Watch, so I think this operation will not put too much load on etcd.
Summary: ListWatch does not issue List requests frequently, and setting `Limit = xx` ensures that results are returned in pages.
Yea I get that the List portion is not as frequent (it will also happen during informer rebuilds), though part of the purpose of this PR is to reduce load on etcd, and this actually causes a relist from etcd instead of the watch cache
I checked kubernetes source, 1.30 and former versions may not honor this, but, set |
Signed-off-by: Yuping Fan <[email protected]>
I just tested with v1.30, v1.29, v1.26. In all cases, when using a |
This check makes sure that we never perform a list request that wants all objects in the cluster, instead always ensuring that responses are paginated. Since the caller did not specify a limit, we set this to the client-go default of 500. |
Sorry, maybe I didn't express clearly, when |
@jessesuen @Joibel @agilgur5 @terrytangyuan @imliuda Unfortunately, I did a stress test and found some issues. The current way of setting the Limit is incorrect, which may result in only 500 records of informer data in certain situations, as shown below. The correct way is to set the |
Thanks for stress testing this!
Yea this is what I was concerned about, that setting it unconditionally could break cache rebuilds or other times the informer decides to do a full relist, i.e. it can break the expected behavior of an informer. In higher load scenarios, the informer is likely to keep track of more than 500 records |
Apparently the informer framework's resync does not do a full re-list anymore per argoproj/gitops-engine#617 (comment)? @wojtek-t if you have a minute to check this PR, your expertise could be very valuable here as well! Especially regarding k8s 1.30+ changes/impact🙏 |
if got a // len(list) = 500, paginatedResult=false, err=nil
list, paginatedResult, err = pager.List(context.Background(), options) |
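One plausible way to end up with only 500 records is a list request that carries a Limit but whose continue token is never followed. The toy model below illustrates that effect with an in-memory store; it is not client-go code, and `page`, `listPage`, and `listAll` are hypothetical names invented for this sketch.

```go
package main

import "fmt"

// page is a hypothetical paginated list response.
// Continue is the index of the next item, or -1 when nothing is left.
type page struct {
	Items    []int
	Continue int
}

// listPage returns at most limit items starting at start.
// A limit of 0 or less means "return everything that remains".
func listPage(store []int, limit, start int) page {
	if limit <= 0 || start+limit >= len(store) {
		return page{Items: store[start:], Continue: -1}
	}
	return page{Items: store[start : start+limit], Continue: start + limit}
}

// listAll keeps requesting pages until the continue marker is exhausted,
// which is the job a pager is expected to do on behalf of the caller.
func listAll(store []int, limit int) []int {
	var out []int
	for start := 0; start != -1; {
		p := listPage(store, limit, start)
		out = append(out, p.Items...)
		start = p.Continue
	}
	return out
}

func main() {
	store := make([]int, 1200)

	first := listPage(store, 500, 0)
	fmt.Println(len(first.Items), first.Continue != -1) // prints "500 true"

	fmt.Println(len(listAll(store, 500))) // prints "1200"
}
```

The first call stops at 500 items even though more remain; only the loop in `listAll` recovers all 1200. Whether this matches the exact failure seen in the stress test is an assumption.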
Signed-off-by: Yuping Fan <[email protected]>
func (wfc *WorkflowController) WatchErrorHandler(r *cache.Reflector, err error) {
	cache.DefaultWatchErrorHandler(r, err)
	if err != io.EOF {
		// The reflector sets the Limit to `0` when `ResourceVersion != "" && ResourceVersion != "0"`, which fails
		// to limit the number of workflows returned. Timeouts and other errors may occur when there are a lot of workflows.
		// see https://github.com/kubernetes/client-go/blob/ee1a5aaf793a9ace9c433f5fb26a19058ed5f37c/tools/cache/reflector.go#L286
		r.WatchListPageSize = common.DefaultPageSize
		wfc.WatchListPageSize = r.WatchListPageSize
	}
}
I added an error handler to set the `WatchListPageSize` of the reflector to ensure that it can meet the Limit restriction when an error occurs in `ListAndWatch`.
Currently, I have only found this way to set `WatchListPageSize`. If anyone has a better solution, I am happy to discuss it with you.
@wojtek-t Could you help us review this PR? We believe that your experience can make this submission better, if you have free time. Thanks!
Hi @agilgur5, I have pushed a new fix. What do you think about it? Thanks.
@fyp711 do u have a screenshot showing metrics graph of before vs after impact of this change?
@tooptoop4 Currently no. This submission is an optimization based on #11343
would be good to know how much this helps etcd
I modified the description of the issue. The previous description was not very accurate; the previous discussion should be about another issue.
Fixes #13461
Motivation
During a relist, the reflector hits the branch `case options.ResourceVersion != "" && options.ResourceVersion != "0": pager.PageSize = 0`. This sets `options.Limit=0`, and the controller then sets `ResourceVersion=""`. So when a relist happens, it lists all objects from etcd.
This branch is part of `reflector.go`.
If `ResourceVersion=""`, the K8s apiserver always requests etcd directly.
Modifications
Add `Limit=500` when setting `options.ResourceVersion = ""`.
Verification
Wait for the informer watch to fail, after which relist is called. In debug mode you can see that `options.Limit=0`.