virtio_netmap: Use half the vring's size as netmap slots #726
base: master
Conversation
For every slot in a netmap ring, two descriptors in the virtqueue's vring are used. Therefore, we can only use half the vring's size as slots for netmap rings. Otherwise, the vring and the netmap ring are out of sync. This currently corrupts proper operation and produces bad data (see below).

This patch also sets an explicit nm_config to ensure we keep the values for num_rx_desc/num_tx_desc from attach time, and marks the VQ_FULL case in virtio_netmap_init_buffers() as an issue that should never be triggered.

Background: Experiment / Observation

Running a qemu guest with a virtio_net NIC (rx_queue_size=256) attached to a bridge on the host. Sending known/predictable packets in a loop from the host via scapy. On the guest side, running tcpdump with native netmap support and inspecting the packet data:

```
tcpdump -i netmap:enp8s0 -Xn#
```

The first 128 packets look as expected, but thereafter come 128 packets of "null data":

```
128  15:59:46.592378 IP 10.0.0.128.128 > 127.0.0.1.256: UDP, length 141
	0x0000:  4500 00a9 0001 0000 4011 f0c2 0a00 0080  E.......@.......
	0x0010:  7f00 0001 0080 0100 0095 7f07 3030 3132  ............0012
	0x0020:  3820 3031 3330 2046 4646 4646 4646 4646  8.0130.FFFFFFFFF
	0x0030:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
	0x0040:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
	0x0050:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
	0x0060:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
	0x0070:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
	0x0080:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
	0x0090:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
	0x00a0:  4646 4646 4646 4646 46                   FFFFFFFFF

129  15:59:46.712936 00:00:00:00:00:00 > 00:00:00:00:00:00 Null Information, send seq 0, rcv seq 0, Flags [Command], length 121
	0x0000:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0050:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0060:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0070:  0000 0000 00                             .....
```

At packet 256, tcpdump receives non-null data, but the payloads represent what should've been received for packets 129 and later. Because the length is taken from the vring, some of the slots also expose stale/corrupt data:

```
281  16:03:51.846490 IP 10.0.0.153.153 > 127.0.0.1.306: UDP, length 74
	0x0000:  4500 0066 0001 0000 4011 f0ec 0a00 0099  E..f....@.......
	0x0010:  7f00 0001 0099 0132 0052 d633 3030 3135  .......2.R.30015
	0x0020:  3320 3030 3633 2046 4646 4646 4646 4646  3.0063.FFFFFFFFF
	0x0030:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
	0x0040:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
	0x0050:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
	0x0060:  4646 4646 4646 5757 5757 5757 5757 5757  FFFFFFWWWWWWWWWW
	0x0070:  5757 5757 5757 5757 5757 5757 5757 5757  WWWWWWWWWWWWWWWW
	0x0080:  5757 5757 5757 5757 5757 5757 5757 5757  WWWWWWWWWWWWWWWW
	0x0090:  5757 5757 5757 5757 5757 5757 0000 0000  WWWWWWWWWWWW....
	0x00a0:  0000 0000 0000 0000                      ........
```
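The descriptor accounting behind this failure mode can be sketched in a short, self-contained simulation (this is a toy model, not driver code; the constants mirror the rx_queue_size=256 setup above). Each netmap slot consumes two vring descriptors, so a 256-entry vring can only back 128 posted buffers; a netmap ring that advertises 256 slots leaves its second half reading buffers that were never filled, matching the "128 good packets, then 128 null packets" observation:

```python
def postable_slots(vring_size: int, netmap_slots: int, desc_per_slot: int = 2) -> int:
    """Count how many netmap slots can actually be backed by vring descriptors."""
    free_desc = vring_size
    posted = 0
    for _ in range(netmap_slots):
        if free_desc < desc_per_slot:
            break  # vring full: the remaining slots never receive data
        free_desc -= desc_per_slot
        posted += 1
    return posted

# Unpatched: netmap ring sized to the full vring, so only half the
# slots ever hold real packets.
print(postable_slots(256, 256))        # 128

# Patched: netmap ring sized to vring/2, so every slot is backed.
print(postable_slots(256, 256 // 2))   # 128 (all of them)
```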
As an additional comment, FreeBSD uses only half the slots, too, unless indirect descriptors are in use. Linux virtio_net doesn't seem to use indirect descriptors (checked 4.4, 4.19), so it seems reasonable to do the same. https://github.com/freebsd/freebsd/blob/master/sys/dev/netmap/if_vtnet_netmap.h#L386
```c
	 * num_rx_desc descriptors and so the queue is not
	 * allowed to become full.
	 */
	if (vq->num_free == 0) {
```
Use the `VQ_FULL` macro for portability.
```diff
@@ -677,6 +685,20 @@ virtio_netmap_intr(struct netmap_adapter *na, int onoff)
 	}
 }
```
Defining this function is unnecessary, because nothing is going to change the values listed in `struct nm_config_info`.
...
Here you are not updating them, indeed.
This seems like a biggie and I wonder why this hasn't come up before, maybe there's something off in the analysis?
Scapy code to produce packets
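The generator script itself is collapsed above. As a hypothetical reconstruction from the hex dumps (the payload appears to be a zero-padded sequence number, a fill length, then that many 'F' bytes; the function name and the addressing scheme in the comment are guesses, not the author's code):

```python
def payload(seq: int, fill_len: int) -> bytes:
    """Guessed payload layout from the dumps: 'SSSSS LLLL ' + fill_len * b'F'."""
    return f"{seq:05d} {fill_len:04d} ".encode() + b"F" * fill_len

# Packet 128 above carried "00128 0130 " followed by 130 'F's: UDP length 141.
assert len(payload(128, 130)) == 141
# Packet 281 carried "00153 0063 " followed by 63 'F's: UDP length 74.
assert len(payload(153, 63)) == 74

# Sending would then look roughly like this (scapy, untested sketch; the
# src/sport/dport pattern is inferred from the dumps):
# from scapy.all import IP, UDP, Raw, send
# send(IP(src=f"10.0.0.{seq % 256}", dst="127.0.0.1")
#      / UDP(sport=seq, dport=2 * seq)
#      / Raw(payload(seq, fill_len)))
```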