Cluster resolver should produce better errors when there are no healthy endpoints #7749
Yeah, that definitely seems like a bug in the child policy, even completely independent of xDS. I think the child policy should report TF and its
Yeah, the more useful we can make the error message, the better. I did a bunch of work for cases like this in C-core as part of grpc/grpc#22883. The way we're handling this is:
PF will return a non-OK result and that will trigger a re-resolution, but the point here is that the policy will keep using the old addresses, which is the defined behavior of PF.
I was expecting this to be handled by some custom logic in clusterresolver that detects when there are no addresses and replaces the child picker with one that fails RPCs with a "no endpoints" error. If you're pushing the EDS resolver update to the child policy, how do you avoid the same situation, where we keep using the old addresses when a zero-address update is encountered?
Oops, I guess I misread this part the first time. I don't think the child policy is supposed to report TF in these cases, right? IIUC, PF is supposed to treat a resolver result of 0 addresses the same as a resolver error, which we ignore if we were already READY.
If you remember back when we agreed on the "always trust the control plane" principle (internally, see go/grpc-client-channel-principles-revamp), we decided that if the control plane sends us zero addresses, then we need to honor that immediately. Our currently agreed behavior of record is that PF should report TF in this case. It should not ignore the update if it was already READY; we do that only if we get an error (as opposed to a valid but empty address list). As you know, I'm not a big fan of the "always trust the control plane" principle, and I would still like us to reevaluate whether that's the right thing, especially now that we have more o11y infrastructure in place in OSS. But until we do that, the above is the behavior that we should be providing in all languages.
Is that not what you are seeing? Where did you run into this? And what exact error were you seeing? I did check our codebase, and it looks like we do not have an e2e style test for a scenario where the
It's a theoretical case that I quickly modified the existing
Also, an update: we may actually want to change the expected behavior of some of these things. I'll follow up here when I know more.
It looks like my understanding of our PF implementation was wrong, and that (1) is false. We will accept a zero-address update from the resolver and apply it, making future RPCs fail. This was implemented in #5274. Sorry for the mistake. We still need to improve our error messages when a cluster has no routable endpoints, however.
EDIT: See #7749 (comment) -- (1) below is not correct. (2) should still be fixed.
If there are no healthy addresses, clusterresolver just passes an empty address list to the child LB policy:
grpc-go/xds/internal/balancer/clusterresolver/configbuilder.go, lines 258 to 290 at commit 4544b8a
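One way to get a better error out of this step is to record why endpoints were filtered out and report that when nothing survives, rather than handing the child an empty list. The sketch below is hypothetical throughout (`healthStatus`, `endpoint` and `healthyAddrs` are not grpc-go or xDS types), and it assumes the common xDS default of treating HEALTHY and UNKNOWN endpoints as usable.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// healthStatus is a hypothetical, simplified stand-in for EDS health
// statuses; it is not the real xDS/Envoy enum.
type healthStatus string

const (
	healthy   healthStatus = "HEALTHY"
	unknown   healthStatus = "UNKNOWN"
	unhealthy healthStatus = "UNHEALTHY"
	draining  healthStatus = "DRAINING"
)

type endpoint struct {
	addr   string
	health healthStatus
}

// healthyAddrs keeps endpoints whose status is HEALTHY or UNKNOWN (an
// assumed default). If none survive, it returns an error summarizing what
// was dropped instead of silently returning an empty list.
func healthyAddrs(cluster string, eps []endpoint) ([]string, error) {
	var addrs []string
	dropped := map[healthStatus]int{}
	for _, ep := range eps {
		if ep.health == healthy || ep.health == unknown {
			addrs = append(addrs, ep.addr)
			continue
		}
		dropped[ep.health]++
	}
	if len(addrs) > 0 {
		return addrs, nil
	}
	if len(eps) == 0 {
		return nil, fmt.Errorf("cluster %q: EDS resource contains no endpoints", cluster)
	}
	// Sort the per-status counts so the message is deterministic.
	var parts []string
	for st, n := range dropped {
		parts = append(parts, fmt.Sprintf("%d %s", n, st))
	}
	sort.Strings(parts)
	return nil, fmt.Errorf("cluster %q: no healthy endpoints (%s)", cluster, strings.Join(parts, ", "))
}
```

An error like `cluster "c1": no healthy endpoints (1 DRAINING, 2 UNHEALTHY)` tells the user both which cluster is affected and why, which an RPC failure caused by an empty address list does not.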
This means two things:
Both of these should be considered bugs.
cc @markdroth @ejona86