
vfkit: gvproxy exits on high network traffic #367

Closed
cfergeau opened this issue Jun 28, 2024 · 19 comments · Fixed by #370
Labels
bug Something isn't working

Comments

@cfergeau
Collaborator

This was reported in containers/podman#23114, gvproxy exits when pulling big images on a fast network connection.

The issue is coming from:

time="2024-06-28T10:59:10+02:00" level=error msg="write unixgram /var/folders/09/9bv34hm11vb94tmwhtqyyx880000gn/T/podman/test-gvproxy.sock->/Users/riccardoforina/Library/Application Support/vfkit/net-15351-479784434.sock: sendto: no buffer space available"

I was seeing this when I added vfkit support to gvisor-tap-vsock until I added

err = rawConn.Control(func(fd uintptr) {
        if err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_SNDBUF, 1*1024*1024); err != nil {
                return
        }
        if err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_RCVBUF, 4*1024*1024); err != nil {
                return
        }
})

This is unfortunately not good enough: the maximum for these values is 8*1024*1024, and Riccardo is still hitting the issue even with the maximum.

If I remember correctly, the "buffer is full" errors were coming from the tx/rx functions in https://github.com/containers/gvisor-tap-vsock/blob/main/pkg/tap/switch.go

@Luap99
Member

Luap99 commented Jun 28, 2024

Now I am not an expert in how this works, but shouldn't gvproxy just retry on ENOBUFS? Also, I would have assumed the sendto call would block instead of returning such an error. Was the socket configured non-blocking, maybe?

https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/sendto.2.html

If no messages space is available at the socket to hold the message to be
transmitted, then send() normally blocks, unless the socket has been
placed in non-blocking I/O mode. The select(2) call may be used to
determine when it is possible to send more data.

@cfergeau
Collaborator Author

cfergeau commented Jun 28, 2024

> Now I am not an expert in how this works, but shouldn't gvproxy just retry on ENOBUFS? Also, I would have assumed the sendto call would block instead of returning such an error. Was the socket configured non-blocking, maybe?

I also did not feel confident enough at the time to make significant changes to the inner tx/rx code shared by all virt providers (qemu, vfkit, hyperkit, ...).
However, the socket is wrapped in a vfkit-specific net.Conn, so we most likely could add blocking/retries there https://github.com/containers/gvisor-tap-vsock/blob/main/pkg/transport/unixgram_darwin.go
This definitely needs a closer look now that the workaround has proven not to be enough and results in nasty failures.

@gbraad gbraad added the bug Something isn't working label Jun 28, 2024
@gbraad gbraad moved this to Todo in Project planning: crc Jun 28, 2024
@riccardo-forina

Please ping me for more debugging and to run experimental versions, as I can reliably trigger the problem.

@gbraad
Member

gbraad commented Jun 28, 2024

What are the most minimal instructions to trigger this?

@riccardo-forina

riccardo-forina commented Jun 28, 2024

On my system, an M1 with 16 GB of RAM and a 1 Gbit wired internet connection, creating a machine with 3 CPUs is enough to trigger the problem with a near-100% success rate; 4 and above should give you 100%. I ruled out the rootful option, as it doesn't seem to play any role in this.

podman machine init --cpus 4

For the docker-compose.yaml, I think any would work. I'm using this one just because I have it handy at home:


services:  
  postgresql:
    image: postgres:14
    hostname: postgresql
    volumes:
      - pg_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: "conduktor-platform"
      POSTGRES_USER: "conduktor"
      POSTGRES_PASSWORD: "change_me"
      POSTGRES_HOST_AUTH_METHOD: "scram-sha-256"
    ports:
      - "5432:5432"

  conduktor-platform:
    image: conduktor/conduktor-platform:1.19.0
    depends_on:
      - postgresql
    ports:
      - "8081:8080"
    volumes:
      - conduktor_data:/var/conduktor
    environment:
      CDK_DATABASE_URL: "postgresql://conduktor:change_me@postgresql:5432/conduktor-platform"
      CDK_MONITORING_CORTEX-URL: http://conduktor-monitoring:9009/
      CDK_MONITORING_ALERT-MANAGER-URL: http://conduktor-monitoring:9010/
      CDK_MONITORING_CALLBACK-URL: http://conduktor-platform:8080/monitoring/api/
      CDK_MONITORING_NOTIFICATIONS-CALLBACK-URL: http://localhost:8080
    healthcheck:
      test: curl -f http://localhost:8080/platform/api/modules/health/live || exit 1
      interval: 10s
      start_period: 10s
      timeout: 5s
      retries: 3
  conduktor-monitoring:
    image: conduktor/conduktor-platform-cortex:1.19.0
    environment:
      CDK_CONSOLE-URL: "http://conduktor-platform:8080"
volumes:
  pg_data: {}
  conduktor_data: {}

@gbraad
Member

gbraad commented Jun 28, 2024

just the creation with podman machine init already triggers this?

updated: ah, a compose script.

I'll try to recreate a 'simpler' reproducer.

@Luap99
Member

Luap99 commented Jun 28, 2024

I assume you need a high speed connection to trigger it, maybe try to use iperf3 between host and VM.

@Luap99
Member

Luap99 commented Jun 28, 2024

On the macOS host, run iperf3 -s to start the server.

Then in another terminal, run iperf3 in a container as the client, using --network host to avoid any slowdowns from container networking, -R to send data from the server to the client (like pulling images does), and, the important bit, -P 8 to run streams in parallel; without it I was not able to reproduce.

$ podman run --network host -it networkstatic/iperf3 -c host.containers.internal -R -P 8
Connecting to host host.containers.internal, port 5201
Reverse mode, remote host host.containers.internal is sending
[  5] local 192.168.127.2 port 58958 connected to 192.168.127.254 port 5201
[  7] local 192.168.127.2 port 58960 connected to 192.168.127.254 port 5201
[  9] local 192.168.127.2 port 58976 connected to 192.168.127.254 port 5201
[ 11] local 192.168.127.2 port 58980 connected to 192.168.127.254 port 5201
[ 13] local 192.168.127.2 port 58992 connected to 192.168.127.254 port 5201
[ 15] local 192.168.127.2 port 59000 connected to 192.168.127.254 port 5201
[ 17] local 192.168.127.2 port 59012 connected to 192.168.127.254 port 5201
[ 19] local 192.168.127.2 port 59018 connected to 192.168.127.254 port 5201
Error: Post "http://d/v5.1.1/libpod/containers/7ed13089ab5ece5c87c25b74c6bd842222f1992e6021e628e5b4c79ced226157/wait": EOF

@cfergeau
Collaborator Author

It should be easy to reproduce with podman pull if you rebuild gvproxy with:

diff --git a/pkg/transport/unixgram_darwin.go b/pkg/transport/unixgram_darwin.go
index 12d3c50a..db473ade 100644
--- a/pkg/transport/unixgram_darwin.go
+++ b/pkg/transport/unixgram_darwin.go
@@ -8,7 +8,6 @@ import (
        "encoding/hex"
        "fmt"
        "net"
-       "syscall"
 )
 
 type connectedUnixgramConn struct {
@@ -17,22 +16,6 @@ type connectedUnixgramConn struct {
 }
 
 func connectListeningUnixgramConn(conn *net.UnixConn, remoteAddr *net.UnixAddr) (*connectedUnixgramConn, error) {
-       rawConn, err := conn.SyscallConn()
-       if err != nil {
-               return nil, err
-       }
-       err = rawConn.Control(func(fd uintptr) {
-               if err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_SNDBUF, 1*1024*1024); err != nil {
-                       return
-               }
-               if err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_RCVBUF, 4*1024*1024); err != nil {
-                       return
-               }
-       })
-       if err != nil {
-               return nil, err
-       }
-
        return &connectedUnixgramConn{
                UnixConn:   conn,
                remoteAddr: remoteAddr,

@balajiv113
Contributor

Can you give this branch a try?

https://github.com/balajiv113/gvisor-tap-vsock/tree/rx-queue

What's done:
Added support for an rxQueue: a goroutine keeps reading from the socket and submits packets to gvisor, so reads are not blocked until gvisor has finished processing.
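To make the rx-queue idea concrete, here is a minimal, self-contained sketch (not the actual branch code; `startRxQueue`, `read`, and `deliver` are illustrative names): a reader goroutine drains the socket into a buffered channel, and a separate consumer goroutine hands packets to the stack at its own pace.

```go
package main

import "fmt"

type packet []byte

// startRxQueue sketches the rx-queue approach: decouple socket reads
// from packet delivery via a buffered channel, so a slow network stack
// does not stall reads from the socket.
func startRxQueue(read func() (packet, error), deliver func(packet)) chan struct{} {
	queue := make(chan packet, 1024) // buffered rx queue
	done := make(chan struct{})

	// Reader: keep pulling packets off the socket as fast as possible.
	go func() {
		defer close(queue)
		for {
			pkt, err := read()
			if err != nil {
				return // socket closed or failed
			}
			queue <- pkt
		}
	}()

	// Consumer: submit queued packets to the stack at its own pace.
	go func() {
		for pkt := range queue {
			deliver(pkt)
		}
		close(done)
	}()
	return done
}

func main() {
	// Simulated socket that yields three packets, then EOF.
	pkts := []packet{packet("a"), packet("bb"), packet("ccc")}
	i := 0
	read := func() (packet, error) {
		if i >= len(pkts) {
			return nil, fmt.Errorf("EOF")
		}
		p := pkts[i]
		i++
		return p, nil
	}
	total := 0
	done := startRxQueue(read, func(p packet) { total += len(p) })
	<-done
	fmt.Println(total) // 6
}
```

The channel capacity (1024 here) plays the role of the queue size the branch would have to tune.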

@Luap99
Member

Luap99 commented Jul 2, 2024

@balajiv113 Using my reproducer from above that still fails in the same way, this is the gvproxy log:

time="2024-07-02T14:06:37+02:00" level=info msg="gvproxy version v0.7.3-58-g9a4a0c4"
time="2024-07-02T14:06:37+02:00" level=info msg="waiting for clients..."
time="2024-07-02T14:06:37+02:00" level=info msg="new connection from /Users/paul/Library/Application Support/vfkit/net-3192-3936363403.sock to /var/folders/k1/56nx4hg12rb5pyrgt3l6lfdr0000gn/T/podman/podman-machine-default-gvproxy.sock"
time="2024-07-02T14:06:48+02:00" level=error msg="cannot receive packets from /Users/paul/Library/Application Support/vfkit/net-3192-3936363403.sock, disconnecting: cannot read size from socket: read unixgram /var/folders/k1/56nx4hg12rb5pyrgt3l6lfdr0000gn/T/podman/podman-machine-default-gvproxy.sock: use of closed network connection"
time="2024-07-02T14:06:48+02:00" level=error msg="write unixgram /var/folders/k1/56nx4hg12rb5pyrgt3l6lfdr0000gn/T/podman/podman-machine-default-gvproxy.sock->/Users/paul/Library/Application Support/vfkit/net-3192-3936363403.sock: sendto: no buffer space available"
time="2024-07-02T14:06:48+02:00" level=error msg="error closing unixgram:///var/folders/k1/56nx4hg12rb5pyrgt3l6lfdr0000gn/T/podman/podman-machine-default-gvproxy.sock: \"close unixgram /var/folders/k1/56nx4hg12rb5pyrgt3l6lfdr0000gn/T/podman/podman-machine-default-gvproxy.sock: use of closed network connection\""
time="2024-07-02T14:06:48+02:00" level=error msg="gvproxy exiting: cannot read size from socket: read unixgram /var/folders/k1/56nx4hg12rb5pyrgt3l6lfdr0000gn/T/podman/podman-machine-default-gvproxy.sock: use of closed network connection"

@balajiv113
Contributor

@Luap99
Can you give a try now with changes from
https://github.com/balajiv113/gvisor-tap-vsock/tree/rx-queue

I have rate-limited writes to the socket to 15 writes per 15 microseconds. This had better success for me without affecting performance. Do try it and let me know.
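The pacing described above can be sketched as follows (illustrative only; `pacedWriter` and its fields are made-up names, and the burst/interval values are just the ones mentioned in the comment): allow a burst of writes, then pause briefly so the kernel can drain the unixgram socket buffer.

```go
package main

import (
	"fmt"
	"time"
)

// pacedWriter allows `burst` writes per `interval`, then sleeps,
// giving the kernel time to drain the socket buffer and reducing
// the chance of ENOBUFS. Tuning burst/interval trades throughput
// against the risk of filling the buffer.
type pacedWriter struct {
	write    func([]byte) error
	burst    int           // writes allowed per interval (e.g. 15)
	interval time.Duration // pause between bursts (e.g. 15µs)
	count    int
}

func (p *pacedWriter) Write(buf []byte) error {
	if p.count >= p.burst {
		time.Sleep(p.interval) // let the socket buffer drain
		p.count = 0
	}
	p.count++
	return p.write(buf)
}

func main() {
	sent := 0
	w := &pacedWriter{
		write:    func([]byte) error { sent++; return nil },
		burst:    15,
		interval: 15 * time.Microsecond,
	}
	for i := 0; i < 100; i++ {
		_ = w.Write([]byte("pkt"))
	}
	fmt.Println(sent) // 100
}
```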

TODO: Better way to set capacity.

Note: Please do not remove SO_SNDBUF and SO_RCVBUF configs

@Luap99
Member

Luap99 commented Jul 17, 2024

@balajiv113 Your patch seems to work, but it affects performance a lot. I am down to ~600 Mbits from ~1.9 Gbits before.

Also, looking at your code, this will always leak a goroutine with no way for a caller to stop it, which is not great. It's likely not a problem for gvproxy directly, as it only calls this once and then runs until the process exits, but if someone else is using this it can cause trouble.

I am not familiar with the code base, but I would expect something like: if err == ENOBUFS, then try the write again.
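A minimal sketch of this retry-on-ENOBUFS idea (not the actual fix in the repo; `writeWithRetry`, `flakyWriter`, and the 50µs backoff are illustrative assumptions): when a write fails because the kernel socket buffer is full, back off briefly and retry instead of propagating the error.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"syscall"
	"time"
)

// writeWithRetry retries a write that fails with ENOBUFS (kernel
// socket buffer full) after a short sleep, instead of returning the
// error, which would otherwise tear the connection down.
func writeWithRetry(w io.Writer, buf []byte) (int, error) {
	for {
		n, err := w.Write(buf)
		if err != nil && errors.Is(err, syscall.ENOBUFS) {
			time.Sleep(50 * time.Microsecond) // let the kernel drain the buffer
			continue
		}
		return n, err
	}
}

// flakyWriter simulates a socket that reports ENOBUFS a few times
// before accepting the write, to exercise the retry loop.
type flakyWriter struct{ failures int }

func (f *flakyWriter) Write(p []byte) (int, error) {
	if f.failures > 0 {
		f.failures--
		return 0, syscall.ENOBUFS
	}
	return len(p), nil
}

func main() {
	w := &flakyWriter{failures: 3}
	n, err := writeWithRetry(w, []byte("packet"))
	fmt.Println(n, err) // 6 <nil>
}
```

In the real code the writer would be the vfkit-side unixgram connection; `errors.Is` still matches ENOBUFS when it is wrapped in a `*net.OpError`.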

@balajiv113
Contributor

balajiv113 commented Jul 17, 2024

> if err == ENOBUFS then try write again

We cannot do it like that: the problem is that as soon as we get this buffer error, the connection is closed. This UDP connection is for host <-> Apple Vz.

I guess Apple Vz closes it in case of any error, so the best possibility is to make sure this doesn't occur.

Edit: That check works. Thanks @Luap99

@balajiv113
Contributor

> I am down to ~600 Mbits from ~1.9 Gbits before.

FYI, increasing the capacity / reducing the interval will improve performance. We need to find the best possible combination.

@Luap99
Member

Luap99 commented Jul 17, 2024

@balajiv113 I am thinking this: Luap99@5806d21

It seems to work for me, with transfers around 2 Gbits.

@Luap99
Member

Luap99 commented Jul 17, 2024

I can open a PR with that if we agree that this is the right fix; I have to check that this still compiles on Windows before that.

@balajiv113
Contributor

> @balajiv113 I am thinking this: Luap99@5806d21
>
> It seems to work for me, with transfers around 2 Gbits.

It works well for me 💯 This looks good to me. Will wait for others to comment.

@Luap99
Member

Luap99 commented Jul 17, 2024

created #370
