Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle ENOBUFS when writing to VM socket #370

Merged
merged 2 commits into from
Jul 23, 2024

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Jul 17, 2024

When a high throughput is happeing we send a lot of data into the VM socket, the VM has to read it in order to empty the buffer. This is inherently race so it is possible to get ENOBUFS here. In this case just keep trying writing until it works.

Fixes #367

When a high throughput is happeing we send a lot of data into the VM
socket, the VM has to read it in order to empty the buffer. This is
inherently race so it is possible to get ENOBUFS here. In this case just
keep trying writing until it works.

Fixes containers#367

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99
Copy link
Member Author

Luap99 commented Jul 23, 2024

The test failure doesn't seem related, looks like it relies on a stable TXT entry for wikipedia.org which isn't the case.

It should use a domain under our control that is unlikely to change (podman.io or crc.dev maybe?) or the test must setup a local dns server which returns a stable known TXT record.

@Luap99
Copy link
Member Author

Luap99 commented Jul 23, 2024

@cfergeau PTAL

@cfergeau
Copy link
Collaborator

The test failure doesn't seem related, looks like it relies on a stable TXT entry for wikipedia.org which isn't the case.

It should use a domain under our control that is unlikely to change (podman.io or crc.dev maybe?) or the test must setup a local dns server which returns a stable known TXT record.

Yeah, I hit the test failure while looking at the PR, and came to the same conclusion, especially as there was already a change last month ( 79001db ). I'll add a record to crc.dev

Copy link
Collaborator

@cfergeau cfergeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@@ -77,7 +77,7 @@ var _ = ginkgo.Describe("dns", func() {
ginkgo.It("should resolve TXT for wikipedia.org", func() {
out, err := sshExec("nslookup -query=txt wikipedia.org")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change wikipedia.org to crc.dev? It already returns what's expected by this test:

$ nslookup -query=txt crc.dev
[...]
crc.dev	text = "v=spf1 include:_mailcust.gandi.net ?all"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, also added text = part to the match just be be sure it is TXT

// again until it works or we get a different error
// https://github.com/containers/gvisor-tap-vsock/issues/367
continue
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we should use conn.SetWriteDeadline when there's ENOBUFS to be sure we're not getting into an infinite loop.
But looks good to me as is, we can refine it further if there are issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is an option I guess, my basic understanding is that could only happen if the reader side (VM process) no longer reads anything from the socket but still keeps the socket open.
Sounds rather unlikely to me but yes if this turn out to cause problems we can add a deadline later

It sems the TXT entry for wikipedia.org changed which is causing the
test to fail. Fix it by only checking for a partial match and use a
domain that is under our control (crc.dev).

Signed-off-by: Paul Holzinger <[email protected]>
@cfergeau
Copy link
Collaborator

/lgtm
/approve

Copy link
Contributor

openshift-ci bot commented Jul 23, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfergeau, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cfergeau
Copy link
Collaborator

/hold
till CI gets a chance to run

@cfergeau
Copy link
Collaborator

/unhold

@openshift-merge-bot openshift-merge-bot bot merged commit a3a9c41 into containers:main Jul 23, 2024
13 of 20 checks passed
@Luap99 Luap99 deleted the ENOBUFS branch July 23, 2024 12:05
@Luap99
Copy link
Member Author

Luap99 commented Jul 23, 2024

Thanks

@cfergeau Can you create a new release sometime this week? We release podman 5.2 next week and it would be good to get a new gvproxy version into the installer to fix this for podman users.

@cfergeau
Copy link
Collaborator

Thanks

@cfergeau Can you create a new release sometime this week? We release podman 5.2 next week and it would be good to get a new gvproxy version into the installer to fix this for podman users.

Yup, for sure, I'll try to do this today or tomorrow.

@cfergeau
Copy link
Collaborator

@Luap99 I've made the release and created containers/podman#23387

nirs added a commit to nirs/lima that referenced this pull request Oct 2, 2024
We used external package (tcpproxy) for proxying between unix stream and
datagram sockets. This package cannot handle ENOBUFS error, expected
condition on BSD based systems, and worse, it hides errors and stop
forwarding packets silently when write to vz socket fails with
ENOBUFS[1].

Fix the issues by replacing tcpproxy with a simpler and more direct
implementation that will be easier to maintain.

Fixes:

- Fix error handling if write to vz datagram socket fail with ENOBUFS.
  We retry the write until it succeeds. Same solution is used in
  gvisor-tap-vsock[2].

- Fix error handling if we could not read packet header or body from
  socket_vmnet stream socket. Previously we logged an error and
  continue, sending corrupted packet to VZ.

- Fix error handling if writing a packet to socket_vmnet stream socket
  returned after writing partial packet. Now we handle short writes and
  write the complete packet. Previously this ended with 2 corrupted
  packets.

- Log error if forwarding packets from vz to socket_vmnet or from
  socket_vmnet to vz failed.

Simplification:

- Use binary.Read() and binary.Write() to read and write qemu packet
  header.

[1] lima-vm/socket_vmnet#39
[2] containers/gvisor-tap-vsock#370

Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/lima that referenced this pull request Oct 2, 2024
We used external package (tcpproxy) for proxying between unix stream and
datagram sockets. This package cannot handle ENOBUFS error, expected
condition on BSD based systems, and worse, it hides errors and stop
forwarding packets silently when write to vz socket fails with
ENOBUFS[1].

Fix the issues by replacing tcpproxy with a simpler and more direct
implementation that will be easier to maintain.

Fixes:

- Fix error handling if write to vz datagram socket fail with ENOBUFS.
  We retry the write until it succeeds. Same solution is used in
  gvisor-tap-vsock[2].

- Fix error handling if we could not read packet header or body from
  socket_vmnet stream socket. Previously we logged an error and
  continue, sending corrupted packet to VZ.

- Fix error handling if writing a packet to socket_vmnet stream socket
  returned after writing partial packet. Now we handle short writes and
  write the complete packet. Previously this ended with 2 corrupted
  packets.

- Log error if forwarding packets from vz to socket_vmnet or from
  socket_vmnet to vz failed.

Simplification:

- Use binary.Read() and binary.Write() to read and write qemu packet
  header.

[1] lima-vm/socket_vmnet#39
[2] containers/gvisor-tap-vsock#370

Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/lima that referenced this pull request Oct 2, 2024
We used external package (tcpproxy) for proxying between unix stream and
datagram sockets. This package cannot handle ENOBUFS error, expected
condition on BSD based systems, and worse, it hides errors and stop
forwarding packets silently when write to vz socket fails with
ENOBUFS[1].

Fix the issues by replacing tcpproxy with a simpler and more direct
implementation that will be easier to maintain.

Fixes:

- Fix error handling if write to vz datagram socket fail with ENOBUFS.
  We retry the write until it succeeds. Same solution is used in
  gvisor-tap-vsock[2].

- Fix error handling if we could not read packet header or body from
  socket_vmnet stream socket. Previously we logged an error and
  continue, sending corrupted packet to VZ.

- Fix error handling if writing a packet to socket_vmnet stream socket
  returned after writing partial packet. Now we handle short writes and
  write the complete packet. Previously this ended with 2 corrupted
  packets.

- Log error if forwarding packets from vz to socket_vmnet or from
  socket_vmnet to vz failed.

Simplification:

- Use binary.Read() and binary.Write() to read and write qemu packet
  header.

Visibility:

- Make QEMUPacketConn private since it is an implementation detail
  correct only for lima vz-socket_vmnet use case.

[1] lima-vm/socket_vmnet#39
[2] containers/gvisor-tap-vsock#370

Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/lima that referenced this pull request Oct 3, 2024
We used external package (tcpproxy) for proxying between unix stream and
datagram sockets. This package cannot handle ENOBUFS error, expected
condition on BSD based systems, and worse, it hides errors and stop
forwarding packets silently when write to vz socket fails with
ENOBUFS[1].

Fix the issues by replacing tcpproxy with a simpler and more direct
implementation that will be easier to maintain.

Fixes:

- Fix error handling if write to vz datagram socket fail with ENOBUFS.
  We retry the write until it succeeds with a very short sleep between
  retries.. Similar solution is used in gvisor-tap-vsock[2].

- Fix error handling if we could not read packet header or body from
  socket_vmnet stream socket. Previously we logged an error and
  continue, sending corrupted packet to VZ.

- Fix error handling if writing a packet to socket_vmnet stream socket
  returned after writing partial packet. Now we handle short writes and
  write the complete packet. Previously this ended with 2 corrupted
  packets.

- Log error if forwarding packets from vz to socket_vmnet or from
  socket_vmnet to vz failed.

Simplification:

- Use binary.Read() and binary.Write() to read and write qemu packet
  header.

Visibility:

- Make QEMUPacketConn private since it is an implementation detail
  correct only for lima vz-socket_vmnet use case.

[1] lima-vm/socket_vmnet#39
[2] containers/gvisor-tap-vsock#370

Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/lima that referenced this pull request Oct 3, 2024
We used external package (tcpproxy) for proxying between unix stream and
datagram sockets. This package cannot handle ENOBUFS error, expected
condition on BSD based systems, and worse, it hides errors and stop
forwarding packets silently when write to vz socket fails with
ENOBUFS[1].

Fix the issues by replacing tcpproxy with a simpler and more direct
implementation that will be easier to maintain.

Fixes:

- Fix error handling if write to vz datagram socket fail with ENOBUFS.
  We retry the write until it succeeds with a very short sleep between
  retries.. Similar solution is used in gvisor-tap-vsock[2].

- Fix error handling if we could not read packet header or body from
  socket_vmnet stream socket. Previously we logged an error and
  continue, sending corrupted packet to VZ.

- Fix error handling if writing a packet to socket_vmnet stream socket
  returned after writing partial packet. Now we handle short writes and
  write the complete packet. Previously this ended with 2 corrupted
  packets.

- Log error if forwarding packets from vz to socket_vmnet or from
  socket_vmnet to vz failed.

Simplification:

- Use binary.Read() and binary.Write() to read and write qemu packet
  header.

Visibility:

- Make QEMUPacketConn private since it is an implementation detail
  correct only for lima vz-socket_vmnet use case.

[1] lima-vm/socket_vmnet#39
[2] containers/gvisor-tap-vsock#370

Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/lima that referenced this pull request Oct 3, 2024
We used external package (tcpproxy) for proxying between unix stream and
datagram sockets. This package cannot handle ENOBUFS error, expected
condition on BSD based systems, and worse, it hides errors and stop
forwarding packets silently when write to vz socket fails with
ENOBUFS[1].

Fix the issues by replacing tcpproxy with a simpler and more direct
implementation that will be easier to maintain.

Fixes:

- Fix error handling if write to vz datagram socket fail with ENOBUFS.
  We retry the write until it succeeds with a very short sleep between
  retries. Similar solution is used in gvisor-tap-vsock[2].

- Fix error handling if we could not read packet header or body from
  socket_vmnet stream socket. Previously we logged an error and continue
  to send corrupted packet to vz from the point of the failure.

- Fix error handling if writing a packet to socket_vmnet stream socket
  returned after writing partial packet. Now we handle short writes and
  write the complete packet. Previously would break the protocol and
  continue to send corrupted packet from the point of the failure.

- Log error if forwarding packets from vz to socket_vmnet or from
  socket_vmnet to vz failed.

Simplification:

- Use binary.Read() and binary.Write() to read and write qemu packet
  header.

Visibility:

- Make QEMUPacketConn private since it is an implementation detail of vz
  when using socket_vmnet.

[1] lima-vm/socket_vmnet#39
[2] containers/gvisor-tap-vsock#370

Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/lima that referenced this pull request Oct 3, 2024
We used external package (tcpproxy) for proxying between unix stream and
datagram sockets. This package cannot handle ENOBUFS error, expected
condition on BSD based systems, and worse, it hides errors and stop
forwarding packets silently when write to vz socket fails with
ENOBUFS[1].

Fix the issues by replacing tcpproxy with a simpler and more direct
implementation that will be easier to maintain.

Fixes:

- Fix error handling if write to vz datagram socket fail with ENOBUFS.
  We retry the write until it succeeds with a very short sleep between
  retries. Similar solution is used in gvisor-tap-vsock[2].

- Fix error handling if we could not read packet header or body from
  socket_vmnet stream socket. Previously we logged an error and continue
  to send corrupted packet to vz from the point of the failure.

- Fix error handling if writing a packet to socket_vmnet stream socket
  returned after writing partial packet. Now we handle short writes and
  write the complete packet. Previously would break the protocol and
  continue to send corrupted packet from the point of the failure.

- Log error if forwarding packets from vz to socket_vmnet or from
  socket_vmnet to vz failed.

Simplification:

- Use binary.Read() and binary.Write() to read and write qemu packet
  header.

Visibility:

- Make QEMUPacketConn private since it is an implementation detail of vz
  when using socket_vmnet.

[1] lima-vm/socket_vmnet#39
[2] containers/gvisor-tap-vsock#370

Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/lima that referenced this pull request Oct 6, 2024
We used external package (tcpproxy) for proxying between unix stream and
datagram sockets. This package cannot handle ENOBUFS error, expected
condition on BSD based systems, and worse, it hides errors and stop
forwarding packets silently when write to vz socket fails with
ENOBUFS[1].

Fix the issues by replacing tcpproxy with a simpler and more direct
implementation that will be easier to maintain.

Fixes:

- Fix error handling if write to vz datagram socket fail with ENOBUFS.
  We retry the write until it succeeds with a very short sleep between
  retries. Similar solution is used in gvisor-tap-vsock[2].

- Fix error handling if we could not read packet header or body from
  socket_vmnet stream socket. Previously we logged an error and continue
  to send corrupted packet to vz from the point of the failure.

- Fix error handling if writing a packet to socket_vmnet stream socket
  returned after writing partial packet. Now we handle short writes and
  write the complete packet. Previously would break the protocol and
  continue to send corrupted packet from the point of the failure.

- Log error if forwarding packets from vz to socket_vmnet or from
  socket_vmnet to vz failed.

Simplification:

- Use binary.Read() and binary.Write() to read and write qemu packet
  header.

Visibility:

- Make QEMUPacketConn private since it is an implementation detail of vz
  when using socket_vmnet.

[1] lima-vm/socket_vmnet#39
[2] containers/gvisor-tap-vsock#370

Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/lima that referenced this pull request Oct 10, 2024
We used external package (tcpproxy) for proxying between unix stream and
datagram sockets. This package cannot handle ENOBUFS error, expected
condition on BSD based systems, and worse, it hides errors and stop
forwarding packets silently when write to vz socket fails with
ENOBUFS[1].

Fix the issues by replacing tcpproxy with a simpler and more direct
implementation that will be easier to maintain.

Fixes:

- Fix error handling if write to vz datagram socket fail with ENOBUFS.
  We retry the write until it succeeds with a very short sleep between
  retries. Similar solution is used in gvisor-tap-vsock[2].

- Fix error handling if we could not read packet header or body from
  socket_vmnet stream socket. Previously we logged an error and continue
  to send corrupted packet to vz from the point of the failure.

- Fix error handling if writing a packet to socket_vmnet stream socket
  returned after writing partial packet. Now we handle short writes and
  write the complete packet. Previously would break the protocol and
  continue to send corrupted packet from the point of the failure.

- Log error if forwarding packets from vz to socket_vmnet or from
  socket_vmnet to vz failed.

Simplification:

- Use binary.Read() and binary.Write() to read and write qemu packet
  header.

Visibility:

- Make QEMUPacketConn private since it is an implementation detail of vz
  when using socket_vmnet.

Testing:

- Add a packet forwarding test covering the happy path in 10
  milliseconds.

[1] lima-vm/socket_vmnet#39
[2] containers/gvisor-tap-vsock#370

Signed-off-by: Nir Soffer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vfkit: gvproxy exits on high network traffic
2 participants