vfkit: gvproxy exits on high network traffic #367
Comments
I am not an expert in how this works, but shouldn't gvproxy just retry on ENOBUFS? Also, I would have assumed the sendto call to block instead of returning such an error. Was the socket configured as non-blocking, maybe?
|
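The "just retry on ENOBUFS" idea from the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not gvproxy's actual tx code: `writeFunc`, `writeRetryENOBUFS`, and `flakyWriter` are hypothetical names, and the retry limit and backoff are arbitrary choices.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// writeFunc is a hypothetical stand-in for a datagram write such as conn.Write.
type writeFunc func(p []byte) (int, error)

// writeRetryENOBUFS calls write until it succeeds, fails with an error other
// than ENOBUFS, or maxRetries retries have been used up.
func writeRetryENOBUFS(write writeFunc, p []byte, maxRetries int, backoff time.Duration) (int, error) {
	for attempt := 0; ; attempt++ {
		n, err := write(p)
		if err == nil {
			return n, nil
		}
		// Only a full socket buffer is treated as transient here.
		if !errors.Is(err, syscall.ENOBUFS) || attempt >= maxRetries {
			return n, err
		}
		time.Sleep(backoff) // give the receiver time to drain its queue
	}
}

// flakyWriter returns a writeFunc that fails with ENOBUFS for the first
// `failures` calls, plus a pointer to the number of calls made so far.
// It exists only to exercise the retry loop without a real socket.
func flakyWriter(failures int) (writeFunc, *int) {
	calls := 0
	return func(p []byte) (int, error) {
		calls++
		if calls <= failures {
			return 0, fmt.Errorf("write unixgram: %w", syscall.ENOBUFS)
		}
		return len(p), nil
	}, &calls
}

func main() {
	write, calls := flakyWriter(2)
	n, err := writeRetryENOBUFS(write, []byte("frame"), 5, time.Millisecond)
	fmt.Println("bytes:", n, "err:", err, "attempts:", *calls)
}
```

Bounding the retries is a design choice made for this sketch; an unbounded loop would also be possible, at the cost of spinning if the peer never drains its buffer.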
I also did not feel confident enough at the time to make significant changes to the inner tx/rx code shared by all virt providers (qemu, vfkit, hyperkit, ...). |
Please ping me for more debug output and to run experimental versions, as I can reliably trigger the problem. |
What are the most minimal instructions to trigger this? |
On my system, an M1 with 16 GB of RAM and a wired 1 Gbit internet connection, creating a machine with 3 CPUs is enough to trigger the problem with a near 100% success rate. With 4 or more, I think you will get 100%. I ruled out the
For the docker-compose.yaml, I think any would work. I'm using this one just because I have it handy on my home
|
just the creation with updated: ah, a compose script. I'll try to recreate a 'simpler' reproducer |
I assume you need a high-speed connection to trigger it; maybe try using iperf3 between host and VM. |
On the macos host run Then in another terminal run iperf3 in a container as client, using
|
It should be easy to reproduce with
|
Can you give this branch a try: https://github.com/balajiv113/gvisor-tap-vsock/tree/rx-queue What's done |
@balajiv113 Using my reproducer from above, that still fails in the same way. This is the gvproxy log:
|
@Luap99 I now limit writes to the socket to 15 writes per 15 microseconds. This gave me better success without affecting performance. Do try it and let me know. TODO: find a better way to set the capacity. Note: please do not remove the SO_SNDBUF and SO_RCVBUF configs. |
@balajiv113 Your patch seems to work, but it affects performance a lot: I am down to ~600 Mbit/s from ~1.9 Gbit/s before. Also, looking at your code, this will always leak a goroutine, and there is no way for a caller to stop it, which is not great. It is likely not a problem for gvproxy directly, as it only calls this once and then runs until the process exits, but it can cause trouble for anyone else using this. I am not familiar with the code base, but I would expect something like |
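As a side illustration of the goroutine-leak concern (this is not the snippet referred to above, and not the code from the rx-queue branch), a paced background writer can be given an explicit way to stop. Everything here is a hypothetical sketch: `pacedWriter`, its methods, the queue size, and the batch and tick values are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// pacedWriter is a hypothetical helper: it forwards queued frames to a write
// callback in batches, one batch per tick, on a background goroutine.
type pacedWriter struct {
	queue chan []byte
	done  chan struct{}
	once  sync.Once
	wg    sync.WaitGroup
}

func newPacedWriter(write func([]byte), batch int, every time.Duration) *pacedWriter {
	w := &pacedWriter{queue: make(chan []byte, 64), done: make(chan struct{})}
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		ticker := time.NewTicker(every)
		defer ticker.Stop()
		for {
			select {
			case <-w.done:
				return // the caller asked us to stop, so the goroutine ends
			case <-ticker.C:
			drain:
				// Write at most `batch` frames per tick.
				for i := 0; i < batch; i++ {
					select {
					case frame := <-w.queue:
						write(frame)
					default:
						break drain // queue is empty, wait for the next tick
					}
				}
			}
		}
	}()
	return w
}

// Send queues a frame; it blocks while the queue is full.
func (w *pacedWriter) Send(frame []byte) { w.queue <- frame }

// Close stops the background goroutine and waits for it to exit.
// Frames still queued at that point are dropped.
func (w *pacedWriter) Close() {
	w.once.Do(func() { close(w.done) })
	w.wg.Wait()
}

// runDemo sends three frames, waits for delivery, and shuts the writer down.
func runDemo() int64 {
	var written int64
	w := newPacedWriter(func([]byte) { atomic.AddInt64(&written, 1) }, 15, time.Millisecond)
	for i := 0; i < 3; i++ {
		w.Send([]byte("frame"))
	}
	deadline := time.Now().Add(2 * time.Second)
	for atomic.LoadInt64(&written) < 3 && time.Now().Before(deadline) {
		time.Sleep(time.Millisecond)
	}
	w.Close()
	return atomic.LoadInt64(&written)
}

func main() {
	fmt.Println("frames written before Close:", runDemo())
}
```

The point of the sketch is only the lifecycle: the goroutine exits once `Close` is called, so a library user who creates more than one writer does not accumulate goroutines.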
I guess Apple Vz closes it in case of any error, so the best option is to make sure this doesn't occur. Edit: That check works. Thanks @Luap99 |
FYI: increasing the capacity or reducing the time will improve performance. We need to find the best possible combination of the two. |
@balajiv113 I am thinking of this: Luap99@5806d21 It seems to work for me with transfers around 2 Gbit/s. |
I can open a PR with that if we agree that this is the right fix; I have to check whether it still compiles for Windows before that. |
It works well for me 💯 This looks good to me. Will wait for others to comment. |
created #370 |
This was reported in containers/podman#23114: gvproxy exits when pulling big images on a fast network connection.
The issue is coming from:
I was seeing this when I added vfkit support to gvisor-tap-vsock until I added
gvisor-tap-vsock/pkg/transport/unixgram_darwin.go
Lines 24 to 30 in 6dbbe08
This is unfortunately not good enough: the maximum for these values is 8*1024*1024, and Riccardo is still having this issue with the maximum.
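For readers unfamiliar with this kind of tuning, here is a minimal sketch of requesting larger send and receive buffers on a unixgram socket using only the Go standard library. It is not the code from unixgram_darwin.go; the helper names, the socket path, and the 1 MiB sizes are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// newUnixgramWithBuffers (hypothetical helper) opens a unixgram socket at path
// and asks the kernel for the given SO_SNDBUF / SO_RCVBUF sizes.
func newUnixgramWithBuffers(path string, sndBuf, rcvBuf int) (*net.UnixConn, error) {
	conn, err := net.ListenUnixgram("unixgram", &net.UnixAddr{Name: path, Net: "unixgram"})
	if err != nil {
		return nil, err
	}
	if err := conn.SetWriteBuffer(sndBuf); err != nil {
		conn.Close()
		return nil, fmt.Errorf("set SO_SNDBUF: %w", err)
	}
	if err := conn.SetReadBuffer(rcvBuf); err != nil {
		conn.Close()
		return nil, fmt.Errorf("set SO_RCVBUF: %w", err)
	}
	return conn, nil
}

// demoBuffers creates a socket in a temporary directory with 1 MiB buffers
// (an arbitrary example size) and cleans up afterwards.
func demoBuffers() error {
	dir, err := os.MkdirTemp("", "gram")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	conn, err := newUnixgramWithBuffers(filepath.Join(dir, "s.sock"), 1<<20, 1<<20)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	if err := demoBuffers(); err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("buffers configured")
}
```

Larger buffers only absorb bursts. If the sender stays faster than the receiver for long enough, the buffer still fills up, which matches the report that the maximum value did not make the problem go away.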
If I remember correctly, the "buffer is full" errors were coming from the tx/rx functions in https://github.com/containers/gvisor-tap-vsock/blob/main/pkg/tap/switch.go