
Fix consistent udp packet loss after the proxy read loop stopped #393

Merged (1 commit), Sep 18, 2024


@fatanugraha (Contributor) commented Sep 3, 2024

Currently we never close the tcpip.Endpoint that we create when we receive a *udp.ForwarderRequest. As a result, all packets sent from the same source ip:port after UDPProxy.Run returns are "dropped".

By closing the endpoint, we get a new forwarder request after UDPProxy.Run returns, so new packets can be processed again.
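The mechanism can be illustrated with a toy model. This is a hypothetical simulation, not gVisor or gvproxy code: `toyStack`, `deliver`, `stopProxy` and `run` are invented names. It assumes, as the description states, that a packet whose source ip:port still has a registered endpoint is handed to that endpoint instead of producing a new forwarder request.

```go
package main

import "fmt"

// endpoint stands in for the tcpip.Endpoint created from a *udp.ForwarderRequest.
type endpoint struct {
	reading bool // true while a proxy goroutine is still consuming packets
}

// toyStack is a hypothetical model of the netstack's UDP demultiplexing.
type toyStack struct {
	endpoints   map[string]*endpoint // registered endpoints keyed by src ip:port
	requests    int                  // forwarder requests handled so far
	closeOnStop bool                 // whether the endpoint is closed when the proxy stops
}

// deliver routes one packet and reports whether a running proxy received it.
func (s *toyStack) deliver(src string) bool {
	ep, ok := s.endpoints[src]
	if !ok {
		// No endpoint for this flow: the forwarder handler runs,
		// creates an endpoint and starts a new proxy.
		s.requests++
		ep = &endpoint{reading: true}
		s.endpoints[src] = ep
	}
	// An endpoint that is still registered but no longer read from swallows the packet.
	return ep.reading
}

// stopProxy models UDPProxy.Run returning after its idle timeout.
func (s *toyStack) stopProxy(src string) {
	s.endpoints[src].reading = false
	if s.closeOnStop {
		// The fix, in model form: closing the endpoint unregisters it.
		delete(s.endpoints, src)
	}
}

// run sends one packet, lets the proxy time out, then sends a second packet
// from the same source address.
func run(closeOnStop bool) (first, second bool, requests int) {
	s := &toyStack{endpoints: map[string]*endpoint{}, closeOnStop: closeOnStop}
	const src = "192.168.5.1:40001"
	first = s.deliver(src)
	s.stopProxy(src)
	second = s.deliver(src)
	return first, second, s.requests
}

func main() {
	for _, fix := range []bool{false, true} {
		first, second, requests := run(fix)
		// closeOnStop=false first=true second=false requests=1
		// closeOnStop=true first=true second=true requests=2
		fmt.Printf("closeOnStop=%v first=%v second=%v requests=%d\n", fix, first, second, requests)
	}
}
```

Without the close, the second packet lands on an endpoint that nobody reads; with it, the next packet triggers a fresh forwarder request and therefore a new proxy.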

Here's my reproduction code:

  1. Reuse the same local address when sending UDP requests
  2. Send one DNS request (succeeds)
  3. Wait until UDPProxy.Run returns (after 90s)
  4. Send one DNS request (fails)

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// Every query is sent from the same local ip:port, so all of them belong
	// to the same UDP flow (and the same forwarder session) in gvproxy.
	localAddr, err := net.ResolveUDPAddr("udp", "192.168.5.1:40001")
	if err != nil {
		panic(err)
	}

	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{
				Timeout:   10 * time.Second,
				KeepAlive: -1,
				LocalAddr: localAddr,
			}
			// Ignore the server picked by the resolver and always query 8.8.8.8.
			return d.DialContext(ctx, network, "8.8.8.8:53")
		},
	}

	lookup := func() {
		_, err := r.LookupIP(context.Background(), "ip4", "www.google.com")
		if err != nil {
			fmt.Println("err", err)
		} else {
			fmt.Println("ok")
		}
	}

	lookup()                     // ok
	time.Sleep(95 * time.Second) // wait longer than the UDP proxy timeout (UDPConnTimeout)
	lookup()                     // this fails without the fix
}
```

@fatanugraha (Contributor, Author)

/assign cfergeau

@fatanugraha (Contributor, Author)

/cc baude cfergeau

openshift-ci bot requested review from baude and cfergeau on September 9, 2024 06:02
@evidolob (Collaborator)

@fatanugraha I was trying to test this PR. I ran the test you provided and it works fine (no errors, just two "ok"s). I tried it on macOS and Fedora 40.
Am I missing something?

@fatanugraha (Contributor, Author) commented Sep 15, 2024

Hi @evidolob I've put more detailed reproduction steps here: https://github.com/fatanugraha/gvisor-tap-proxy-393

Do let me know if you have further questions 🙇

Attached are debug logs from gvproxy. Notice that DNS queries from the same local address start failing after this line is printed: `DEBU[0122] Stopping udp proxy (read udp 8.8.8.8:53: i/o timeout)`

Attachments: gvproxy.log, capture.pcap.zip, [Screenshot 2024-09-15 at 23.28.04]

evidolob self-requested a review on September 17, 2024 11:38
@evidolob (Collaborator)

@cfergeau I can confirm that the problem described in this PR exists, and that the PR indeed fixes it.

@cfergeau (Collaborator)

I force-pushed to the branch to fix a few typos in the comment.
/lgtm
/approve

openshift-ci bot added the lgtm label on Sep 18, 2024

@openshift-ci (Contributor) bot commented Sep 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfergeau, evidolob, fatanugraha


openshift-merge-bot bot merged commit ded1408 into containers:main on Sep 18, 2024
6 of 21 checks passed
@cfergeau (Collaborator)

I wonder if this PR could help with #387? (Dropping a note here as I can't test/look closely now.)

@cfergeau (Collaborator)

> I wonder if this PR could help with #387? (Dropping a note here as I can't test/look closely now.)

Yevhen tested this, and it does not help.
