correct way to check internet connection goes away #7841

Open
ukai opened this issue Nov 14, 2024 · 5 comments
Labels
Area: Client Includes Channel/Subchannel/Streams, Connectivity States, RPC Retries, Dial/Call Options and more. stale Status: Requires Reporter Clarification Type: Question

Comments

@ukai

ukai commented Nov 14, 2024

When the internet connection goes away in a long-running process, gRPC returns

code = Unavailable desc = last connection error: connection error: desc = "transport: Error while dialing: dial tcp: lookup remotebuildexecution.googleapis.com: no such host"

(using google.golang.org/api/transport/grpc's DialPool).

How should we check for this error type?
In general we want to retry on Unavailable, but we'd like to report this particular error to the user without retrying.

@arjan-bal
Contributor

@ukai do you know how DialPool handles failures? Does it transparently retry using a new ClientConn/Channel from the pool when one ClientConn fails? If yes, then you would always see a DNS resolution failure when sending the first request on a new ClientConn.

@arjan-bal arjan-bal added Area: Client Includes Channel/Subchannel/Streams, Connectivity States, RPC Retries, Dial/Call Options and more. Area: RPC Features Includes Compression, Encoding, Attributes/Metadata, Interceptors. and removed Area: RPC Features Includes Compression, Encoding, Attributes/Metadata, Interceptors. labels Nov 15, 2024
@ukai
Author

ukai commented Nov 18, 2024

I don't know how DialPool handles the failure, as I'm not a maintainer of google.golang.org/api/transport/grpc...

@ukai
Author

ukai commented Nov 18, 2024

It looks like it just calls grpc.DialContext once per connection in the pool, and picks among them round-robin?

How can we distinguish DNS resolution failures (want no retry) from transient Unavailable failures (want retry)?

@arjan-bal
Contributor

When creating a channel using grpc.Dial(), the default resolver scheme is passthrough. With passthrough, gRPC does not resolve the hostname itself; resolution happens inside net.Dial when a transport is created, so the hostname is resolved again every time a transport is created. I can't think of a way of differentiating between network connectivity failures and DNS resolution failures when using passthrough.

If the DNS resolver is selected explicitly with the dns:/// scheme in the target URI, the hostname resolution is managed by gRPC instead of relying on net.Dial. The gRPC DNS resolver keeps polling DNS for updates, and existing IP addresses are retained if name resolution fails. This means that if the connection was working earlier, you would not see a resolution failure when network connectivity breaks. Instead, you would probably see a context deadline exceeded error:

code = DeadlineExceeded desc = context deadline exceeded

If the DNS lookup fails in the first connection attempt using a ClientConn, an error similar to the following will be returned:

code = Unavailable desc = name resolver error: produced zero addresses

Note: There is a known issue with the DNS resolver (#7556) in which the hostname is resolved on the client even when using an HTTP CONNECT proxy. We plan to fix this in the next gRPC Go release.


This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Nov 25, 2024