-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optionally allow specification of custom DNS resolver #584
Conversation
cd7537c
to
8a49af2
Compare
583db73
to
f0698d4
Compare
I've realized that this isn't working with Slack, so I have some work to do I guess. |
custom_dns.go
Outdated
// Google anomaly detection can be triggered very often over IPv6. | ||
// Prefer IPv4 to mitigate, see issue #97 | ||
// If no IPv4 is available, fall back to IPv6 | ||
for _, candidate := range ips { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This replaces the logic from preferIPv4.
Okay, this is now ready for review. I verified that it works both with the system resolver and with specifying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty neat! I'll defer to @hwh33 since he's more familiar with the codebase, but this seems good to me.
custom_dns.go
Outdated
r := &net.Resolver{ | ||
PreferGo: true, | ||
Dial: func(ctx context.Context, network, address string) (net.Conn, error) { | ||
return netx.DialContext(ctx, "udp", dnsServer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For responses that are larger than a full-sized UDP payload can be, the server may tell the client to retry over TCP. While this isn't terribly common, it can happen quite a lot with larger TXT and DNSSEC records. Since (eventually) the Go resolver is only looking for A and AAAA records, this is probably fine, but it's worth noting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we'd ever want to overwrite the network parameter... Shouldn't we trust that the Go DNS resolver is specifying the correct network?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. It looks like both unbound and Google support TCP connections on port 53, so I'll just pass through the protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
I have another take on this in getlantern/lantern-cloud#423 as well. I haven't had time to test it yet, but manually running |
Closing in favor of https://github.com/getlantern/lantern-cloud/pull/423 |
@Crosse I've changed this so that it resolves in parallel now. I tested the parallel resolution on route |
custom_dns.go
Outdated
|
||
// Returns a dialer that uses custom DNS servers to resolve the host. It uses all DNS servers | ||
// in parallel and uses the first response it gets. | ||
func customDNSDialer(dnsServers []string, timeout time.Duration) (func(context.Context, string, string) (net.Conn, error), error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the parameter should be named dialTimeout
. As it stands, it would intuitively be the resolution timeout (to me anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the resolution timeout. I just happen to also set it as a dial timeout because there's no point waiting to dial past the resolution timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, it is a dialTimeout, even I was confused!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
custom_dns.go
Outdated
r := &net.Resolver{ | ||
PreferGo: true, | ||
Dial: func(ctx context.Context, network, address string) (net.Conn, error) { | ||
return netx.DialContext(ctx, "udp", dnsServer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we'd ever want to overwrite the network parameter... Shouldn't we trust that the Go DNS resolver is specifying the correct network?
custom_dns.go
Outdated
var resolveErr error | ||
select { | ||
case resolveErr = <-errs: | ||
// got an error | ||
default: | ||
// no error, we just timed out | ||
} | ||
return nil, errors.New("unable to resolve host %v, last resolution error: %v", host, resolveErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be mildly confusing in the case of a timeout, where resolveErr
will be nil
. You could just return different errors in each of the select cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
custom_dns.go
Outdated
|
||
resolvedAddr := fmt.Sprintf("%s:%s", ip, port) | ||
d := &net.Dialer{ | ||
Deadline: time.Now().Add(timeout), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it'd be a little more direct to just set Timeout: timeout
(it's kind of weird that net.Dialer
supports deadlines and timeouts simultaneously).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh cool, they must have added that somewhere along the way. I have a memory of only Deadline being available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
errors <- err | ||
return | ||
} | ||
if len(ips) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will mask misses as timeouts. We could instead have a miss
channel or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that if the DNS resolver can't find the host, it will return a "no such host" error, so I don't think we have to worry about masking. I did adjust the logic so that if all resolvers error, we fail immediately instead of waiting to hit the timeout. I added a unit test for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging into it!
} | ||
if len(ips) > 0 { | ||
// Google anomaly detection can be triggered very often over IPv6. | ||
// Prefer IPv4 to mitigate, see issue #97 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this referencing? https://github.com/getlantern/engineering/issues/97 doesn't look related
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, this comment was moved from elsewhere. @oxtoacart do you know what the deal is with this? I don't quite understand it. Is the comment saying that Google services don't like it when our proxy connects over IPv6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's referencing https://github.com/getlantern/http-proxy-lantern/issues/97
IIRC, we were getting tons of CAPTCHAs and changing this to force IPv4 reduced that quite a bit.
@hwh33 Thanks for the great feedback! I've addressed your comments and retested on route |
errorCount++ | ||
if errorCount == len(resolvers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
start := time.Now() | ||
_, err = d(context.Background(), "tcp", "blubbaasdfsadfsadf.dude:443") | ||
require.Error(t, err) | ||
fmt.Println(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: leftover print statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit-pick, but looks good to me! Thanks for addressing that stuff =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally good with this, code-wise. I would suggest not having a default value for the DNS servers to use and making it an explicit opt-in (principal of least astonishment).
After thinking about it longer, my only reservation is parallelizing the DNS queries. In our current situation, we would send simultaneous queries to our local resolver and Google and let them race for an answer, which sounds good on the face of it. But note that each of our phosts average around 3,000-5,000 queries per second, for an aggregate of around 100,000 queries per second. The reason Colin and I began using a local resolver in the first place was because we suspected that we were having our DNS queries rate-limited by the public resolver we had configured, and this was at much lower levels that we see today. If we unconditionally send queries to some non-Lantern-controlled DNS servers, we risk the same result and, potentially, a full block.
In my opinion, the best solution for all of this is actually to fix our local resolver issues (i.e., unbound segfaulting and locking up; see getlantern/lantern-cloud#428 for an idea of replacing Unbound altogether with BIND) and, eventually, run our own caching resolvers in the lantern cloud to act as upstreams for all of the phost-local forwarding resolvers.
I agree 100%. |
Should we close this one then? I generally agree that hammering Google DNS is probably not a great idea. |
I'm on the fence. I think having the option to let lantern-proxy do custom DNS resolution could be useful in the future even if we don't use it right now, but I also wonder if we should avoid adding a knob for functionality we don't intend to use in the near-term. |
Yeah, I would just close this PR and if you ever need that knob in the future, you can resurrect it. |
I tested this on route
d1192fd7-4b4f-48dc-87d7-d99850445f53
. I defaulteddns-servers
to172.16.0.53:53,8.8.8.8:53
and put in some debug statements to make sure our custom dialer is being used (it is). I was able to successfully proxy through the host.This will allow us to specify explicit nameservers for http-proxy to use, bypassing Docker for DNS. This is useful because Docker has been hanging lately, causing name resolution on the proxies to fail.