Skip to content

Commit

Permalink
kernel: Fix vsock packets drop when the vsock driver starts
Browse files Browse the repository at this point in the history
The virtio vsock driver has a small window during initialization
where it can silently drop replies to connection requests.
Because no reply is sent, kata waits for 10 seconds and in the
end it generates a connection timeout error in HybridVSockDialer.

Fixes: kata-containers#8291

Signed-off-by: Alexandru Matei <[email protected]>
  • Loading branch information
alex-matei committed Nov 14, 2023
1 parent dffc6f6 commit bfd1ce3
Show file tree
Hide file tree
Showing 13 changed files with 834 additions and 3 deletions.
2 changes: 1 addition & 1 deletion tools/packaging/kernel/kata_config_version
Original file line number Diff line number Diff line change
@@ -1 +1 @@
116
117
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
From d03fb89b5e6dcaf399480708ee2d5a32dca70e7a Mon Sep 17 00:00:00 2001
From: "Longpeng(Mike)" <[email protected]>
Date: Thu, 12 Aug 2021 13:30:56 +0800
Subject: [PATCH 1/7] vsock/virtio: avoid potential deadlock when vsock device
remove

There's a potential deadlock case when remove the vsock device or
process the RESET event:

vsock_for_each_connected_socket:
spin_lock_bh(&vsock_table_lock) ----------- (1)
...
virtio_vsock_reset_sock:
lock_sock(sk) --------------------- (2)
...
spin_unlock_bh(&vsock_table_lock)

lock_sock() may do initiative schedule when the 'sk' is owned by
other thread at the same time, we would receivce a warning message
that "scheduling while atomic".

Even worse, if the next task (selected by the scheduler) try to
release a 'sk', it need to request vsock_table_lock and the deadlock
occur, cause the system into softlockup state.
Call trace:
queued_spin_lock_slowpath
vsock_remove_bound
vsock_remove_sock
virtio_transport_release
__vsock_release
vsock_release
__sock_release
sock_close
__fput
____fput

So we should not require sk_lock in this case, just like the behavior
in vhost_vsock or vmci.

Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko")
Cc: Stefan Hajnoczi <[email protected]>
Signed-off-by: Longpeng(Mike) <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
---
net/vmw_vsock/virtio_transport.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 2700a63ab095..3a056f8affd1 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -356,11 +356,14 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)

static void virtio_vsock_reset_sock(struct sock *sk)
{
- lock_sock(sk);
+ /* vmci_transport.c doesn't take sk_lock here either. At least we're
+ * under vsock_table_lock so the sock cannot disappear while we're
+ * executing.
+ */
+
sk->sk_state = TCP_CLOSE;
sk->sk_err = ECONNRESET;
sk->sk_error_report(sk);
- release_sock(sk);
}

static void virtio_vsock_update_guest_cid(struct virtio_vsock *vsock)
--
2.34.1

Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
From a44b98fc926521d6cf642ed945e1d0781973d0d8 Mon Sep 17 00:00:00 2001
From: Jiyong Park <[email protected]>
Date: Fri, 11 Mar 2022 11:00:16 +0900
Subject: [PATCH 2/7] vsock: each transport cycles only on its own sockets

When iterating over sockets using vsock_for_each_connected_socket, make
sure that a transport filters out sockets that don't belong to the
transport.

There actually was an issue caused by this; in a nested VM
configuration, destroying the nested VM (which often involves the
closing of /dev/vhost-vsock if there was h2g connections to the nested
VM) kills not only the h2g connections, but also all existing g2h
connections to the (outmost) host which are totally unrelated.

Tested: Executed the following steps on Cuttlefish (Android running on a
VM) [1]: (1) Enter into an `adb shell` session - to have a g2h
connection inside the VM, (2) open and then close /dev/vhost-vsock by
`exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb
session is not reset.

[1] https://android.googlesource.com/device/google/cuttlefish/

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jiyong Park <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
---
drivers/vhost/vsock.c | 3 ++-
include/net/af_vsock.h | 3 ++-
net/vmw_vsock/af_vsock.c | 9 +++++++--
net/vmw_vsock/virtio_transport.c | 7 +++++--
net/vmw_vsock/vmci_transport.c | 5 ++++-
5 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index a483cec31d5c..e413aaaca8f9 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -695,7 +695,8 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)

/* Iterating over all connections for all CIDs to find orphans is
* inefficient. Room for improvement here. */
- vsock_for_each_connected_socket(vhost_vsock_reset_orphans);
+ vsock_for_each_connected_socket(&vhost_transport.transport,
+ vhost_vsock_reset_orphans);

vhost_vsock_stop(vsock);
vhost_vsock_flush(vsock);
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index b1c717286993..4d8589244dc7 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -197,7 +197,8 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
struct sockaddr_vm *dst);
void vsock_remove_sock(struct vsock_sock *vsk);
-void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
+void vsock_for_each_connected_socket(struct vsock_transport *transport,
+ void (*fn)(struct sock *sk));
int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
bool vsock_find_cid(unsigned int cid);

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 791955f5e7ec..bc3b85838158 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -333,7 +333,8 @@ void vsock_remove_sock(struct vsock_sock *vsk)
}
EXPORT_SYMBOL_GPL(vsock_remove_sock);

-void vsock_for_each_connected_socket(void (*fn)(struct sock *sk))
+void vsock_for_each_connected_socket(struct vsock_transport *transport,
+ void (*fn)(struct sock *sk))
{
int i;

@@ -342,8 +343,12 @@ void vsock_for_each_connected_socket(void (*fn)(struct sock *sk))
for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++) {
struct vsock_sock *vsk;
list_for_each_entry(vsk, &vsock_connected_table[i],
- connected_table)
+ connected_table) {
+ if (vsk->transport != transport)
+ continue;
+
fn(sk_vsock(vsk));
+ }
}

spin_unlock_bh(&vsock_table_lock);
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 3a056f8affd1..e131121533ad 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -24,6 +24,7 @@
static struct workqueue_struct *virtio_vsock_workqueue;
static struct virtio_vsock __rcu *the_virtio_vsock;
static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
+static struct virtio_transport virtio_transport; /* forward declaration */

struct virtio_vsock {
struct virtio_device *vdev;
@@ -383,7 +384,8 @@ static void virtio_vsock_event_handle(struct virtio_vsock *vsock,
switch (le32_to_cpu(event->id)) {
case VIRTIO_VSOCK_EVENT_TRANSPORT_RESET:
virtio_vsock_update_guest_cid(vsock);
- vsock_for_each_connected_socket(virtio_vsock_reset_sock);
+ vsock_for_each_connected_socket(&virtio_transport.transport,
+ virtio_vsock_reset_sock);
break;
}
}
@@ -635,7 +637,8 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
synchronize_rcu();

/* Reset all connected sockets when the device disappear */
- vsock_for_each_connected_socket(virtio_vsock_reset_sock);
+ vsock_for_each_connected_socket(&virtio_transport.transport,
+ virtio_vsock_reset_sock);

/* Stop all work handlers to make sure no one is accessing the device,
* so we can safely call vdev->config->reset().
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 8b65323207db..87de3477cb6d 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -75,6 +75,8 @@ static u32 vmci_transport_qp_resumed_sub_id = VMCI_INVALID_ID;

static int PROTOCOL_OVERRIDE = -1;

+static struct vsock_transport vmci_transport; /* forward declaration */
+
/* Helper function to convert from a VMCI error code to a VSock error code. */

static s32 vmci_transport_error_to_vsock_error(s32 vmci_error)
@@ -883,7 +885,8 @@ static void vmci_transport_qp_resumed_cb(u32 sub_id,
const struct vmci_event_data *e_data,
void *client_data)
{
- vsock_for_each_connected_socket(vmci_transport_handle_detach);
+ vsock_for_each_connected_socket(&vmci_transport,
+ vmci_transport_handle_detach);
}

static void vmci_transport_recv_pkt_work(struct work_struct *work)
--
2.34.1

Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
From d57616af2cfcd67e964f91867cdcc20124b49091 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <[email protected]>
Date: Wed, 23 Mar 2022 18:36:23 +0100
Subject: [PATCH 3/7] vsock/virtio: initialize vdev->priv before using VQs

When we fill VQs with empty buffers and kick the host, it may send
an interrupt. `vdev->priv` must be initialized before this since it
is used in the virtqueue callbacks.

Fixes: 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock")
Suggested-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
---
net/vmw_vsock/virtio_transport.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e131121533ad..0711aaed17da 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -599,6 +599,8 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
INIT_WORK(&vsock->event_work, virtio_transport_event_work);
INIT_WORK(&vsock->send_pkt_work, virtio_transport_send_pkt_work);

+ vdev->priv = vsock;
+
mutex_lock(&vsock->tx_lock);
vsock->tx_run = true;
mutex_unlock(&vsock->tx_lock);
@@ -613,7 +615,6 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
vsock->event_run = true;
mutex_unlock(&vsock->event_lock);

- vdev->priv = vsock;
rcu_assign_pointer(the_virtio_vsock, vsock);

mutex_unlock(&the_virtio_vsock_mutex);
--
2.34.1

Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
From cbb4f49c1f4e761d1d35dab0bbed3b6aed52d456 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <[email protected]>
Date: Wed, 23 Mar 2022 18:36:25 +0100
Subject: [PATCH 4/7] vsock/virtio: enable VQs early on probe

virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, but virtio-vsock
driver uses VQs in the probe function to fill rx and event VQs
with new buffers.

Let's fix this, calling virtio_device_ready() before using VQs
in the probe function.

Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko")
Signed-off-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
---
net/vmw_vsock/virtio_transport.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 0711aaed17da..ae22cc8e1b27 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -601,6 +601,8 @@ static int virtio_vsock_probe(struct virtio_device *vdev)

vdev->priv = vsock;

+ virtio_device_ready(vdev);
+
mutex_lock(&vsock->tx_lock);
vsock->tx_run = true;
mutex_unlock(&vsock->tx_lock);
--
2.34.1

Loading

0 comments on commit bfd1ce3

Please sign in to comment.