[vsock] tx/rx hungup after large packet is delivered #494
Comments
I ran qemu with |
right!
mmm, so it could be a problem in the driver then. Which host & guest kernel are you using? |
For the host, I tested with the latest one (6.5?), and for the guest I used 5.4 (based on Ubuntu 20.04). |
Both guest->host and host->guest have the problem. |
@ikicha thanks, replicated! We need to check better what is going on with credits and socket buffer. Could it be that ncat does not set the socket buffer properly and waits for the whole message instead of reading it in pieces? |
I suspected credits, because there is no response to the CREDIT REQUEST. QQ: which components negotiate credit with each other? vhost-vsock and vhost-device-vsock? Or the VMM and vhost-device-vsock? |
I can't see a CREDIT REQUEST using
The VMM is not involved at all in the data path with both So the credit information is exchanged between the driver and the device, usually in every packet. So, if there is traffic, they don't need to explicitly send a CREDIT REQUEST, since every packet header contains the credit info. |
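For reference, the per-packet credit accounting mentioned above can be modeled in a few lines of Python. This is only an illustrative sketch, not code from vhost-device-vsock or the kernel: the `CreditState` class and its method names are made up for this example. The `buf_alloc` and `fwd_cnt` header fields and the free-space formula follow the virtio-vsock specification.

```python
from dataclasses import dataclass

U32_MASK = 0xFFFFFFFF  # the counters in the packet header are 32-bit and wrap around


@dataclass
class CreditState:
    """Hypothetical sender-side view of the peer's receive buffer."""
    peer_buf_alloc: int = 0  # last buf_alloc seen in a packet header from the peer
    peer_fwd_cnt: int = 0    # last fwd_cnt seen (bytes the peer has consumed so far)
    tx_cnt: int = 0          # free-running count of bytes we have sent

    def update_from_header(self, buf_alloc: int, fwd_cnt: int) -> None:
        # Every received packet carries these two fields, so any traffic
        # from the peer refreshes our view of its buffer.
        self.peer_buf_alloc = buf_alloc
        self.peer_fwd_cnt = fwd_cnt

    def peer_free(self) -> int:
        # Bytes sent but not yet consumed by the peer (wrap-safe subtraction).
        in_flight = (self.tx_cnt - self.peer_fwd_cnt) & U32_MASK
        return self.peer_buf_alloc - in_flight

    def try_send(self, nbytes: int) -> int:
        # Send at most as many bytes as the peer has room for.
        n = min(nbytes, max(self.peer_free(), 0))
        self.tx_cnt = (self.tx_cnt + n) & U32_MASK
        return n


state = CreditState()
state.update_from_header(buf_alloc=256 * 1024, fwd_cnt=0)
accepted = state.try_send(1024 * 1024)   # only part of the 1 MiB fits
state.update_from_header(buf_alloc=256 * 1024, fwd_cnt=100 * 1024)
print(accepted, state.peer_free())
```

Once `peer_free()` reaches zero the sender has to wait until a later header reports a larger `fwd_cnt`. If the peer sends nothing, an explicit CREDIT REQUEST is the only way to learn about newly freed space.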
@ikicha can you try increasing the vsock buffer size in this way:
|
For |
In the guest, I don't know if
import socket
import os

# Guest side: accept one vsock connection and echo back whatever arrives.
server = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
server.bind((socket.VMADDR_CID_ANY, 1234))
server.listen()
client, _ = server.accept()
# Raise the vsock buffer limit first, then the buffer size itself (2 MiB each).
client.setsockopt(socket.AF_VSOCK, socket.SO_VM_SOCKETS_BUFFER_MAX_SIZE, 2*1024*1024)
client.setsockopt(socket.AF_VSOCK, socket.SO_VM_SOCKETS_BUFFER_SIZE, 2*1024*1024)
i = 0
while True:
    print(i, ": waiting response")
    response = client.recv(1024*1024)
    print(i, ": received ", len(response), "bytes")
    if not response:
        break
    print(i, ": sending ", len(response), "bytes")
    client.sendall(response)
    i += 1
client.close()
I tested it with vhost-vsock, and it mitigates the symptom, but it happens again as the packet size increases. The interesting thing is that both host and guest are stuck at |
If you don't want to increase the buffer size, you need another thread that reads. Otherwise this is what happens: the host wants to send 1 MB and the guest echoes the data back, but once 256 K (the default vsock buffer size) has been echoed, the guest blocks because the host's receive buffer is full and no one is reading from the socket. The guest then stops reading too, so the host's send blocks as well and both sides deadlock. So the host should have another thread reading while it sends. This works fine if the code I sent here runs in the guest (even without increasing the buffer size), while with ncat it doesn't work; I think ncat has a similar problem. |
You need to set the buffer accordingly if you don't read from it with another thread. |
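The deadlock described above can be reproduced without a VM using a small single-threaded model. This is only a sketch under stated assumptions: `simulate_echo` and its parameters are hypothetical names, the 64 KiB chunk size is arbitrary, and the two bounded counters stand in for the host and guest vsock receive buffers. It does not model the real kernel or device code.

```python
def simulate_echo(total, buf_size, chunk, host_reads):
    """Model a host sending `total` bytes to a guest that echoes them back."""
    sent = 0      # bytes the host has pushed into the guest's receive buffer
    guest_rx = 0  # bytes queued in the guest's receive buffer
    pending = 0   # bytes the guest has read but not yet echoed back
    host_rx = 0   # echoed bytes queued in the host's receive buffer
    consumed = 0  # echoed bytes the host has actually read

    while True:
        progress = False

        # Host: send the next chunk if the guest's buffer has room.
        n = min(chunk, total - sent, buf_size - guest_rx)
        if n > 0:
            guest_rx += n
            sent += n
            progress = True

        # Guest: recv() a new chunk only after the previous echo is fully sent.
        if pending == 0 and guest_rx > 0:
            pending = min(chunk, guest_rx)
            guest_rx -= pending
            progress = True

        # Guest: sendall() makes no progress while the host's buffer is full.
        n = min(pending, buf_size - host_rx)
        if n > 0:
            host_rx += n
            pending -= n
            progress = True

        # Host: echoed data is drained only if something reads the socket.
        if host_reads and host_rx > 0:
            consumed += host_rx
            host_rx = 0
            progress = True

        if not progress:
            break  # either everything was transferred or both sides are stuck

    finished = sent == total and guest_rx == 0 and pending == 0
    return {"sent": sent, "echoed": host_rx + consumed, "finished": finished}


MIB, KIB = 1024 * 1024, 1024
print(simulate_echo(MIB, 256 * KIB, 64 * KIB, host_reads=False))
print(simulate_echo(MIB, 256 * KIB, 64 * KIB, host_reads=True))
```

Without a reader, the run stalls once 256 KiB of echoed data is sitting unread in the host's buffer. With a reader, the full 1 MiB goes through. Making `buf_size` at least as large as `total` also lets the run without a reader finish, which matches the buffer-size workaround.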
I just tried these 2 scripts with
import threading
import socket
import os

# Reader thread: keeps draining the socket so the echoed data never fills the host buffer.
class thread(threading.Thread):
    def __init__(self, client):
        threading.Thread.__init__(self)
        self.client = client

    def run(self):
        i = 0
        while True:
            print(i, ": waiting response")
            response = self.client.recv(1024*1024)
            if not response:
                break
            print(i, ": received ", len(response), "bytes")
            i += 1

client = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
client.connect((3, 1234))
message = 'connect 1234\n'
client.sendall(message.encode())
response = client.recv(1024)
print("response:", response)
thread = thread(client)
thread.start()
i = 0
# Main thread: send 10 MiB messages forever while the reader thread receives.
while True:
    message = ' '*1024*1024*10
    print(i, ": sending ", len(message), "bytes")
    client.sendall(message.encode())
    i += 1
# Not reached: the loop above runs until the script is interrupted.
client.close()
thread.join()

import socket
import os

# Guest side: accept one connection and echo back everything that is received.
server = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
server.bind((socket.VMADDR_CID_ANY, 1234))
server.listen()
client, _ = server.accept()
i = 0
while True:
    print(i, ": waiting response")
    response = client.recv(1024*1024)
    print(i, ": received ", len(response), "bytes")
    if not response:
        break
    print(i, ": sending ", len(response), "bytes")
    client.sendall(response)
    i += 1
client.close()
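The two scripts above need AF_VSOCK and a running VM. For a quick local check of the same pattern (one thread sends while another drains the echo), here is a minimal sketch that swaps in an AF_UNIX `socket.socketpair()`. The swap is an assumption made for portability and the buffer sizes differ from vsock. The names `echo_peer` and `send_and_drain` are hypothetical.

```python
import socket
import threading


def echo_peer(conn):
    """Stand-in for the guest script: echo everything until EOF."""
    with conn:
        while True:
            data = conn.recv(64 * 1024)
            if not data:
                break
            conn.sendall(data)


def send_and_drain(total_bytes, chunk=64 * 1024):
    """Send total_bytes while a second thread reads the echo; return bytes read back."""
    local, remote = socket.socketpair()  # AF_UNIX pair used here instead of AF_VSOCK
    peer = threading.Thread(target=echo_peer, args=(remote,), daemon=True)
    peer.start()

    received = 0

    def reader():
        # Drain echoed data so the sender below never blocks on a full buffer.
        nonlocal received
        while received < total_bytes:
            data = local.recv(64 * 1024)
            if not data:
                break
            received += len(data)

    drain = threading.Thread(target=reader, daemon=True)
    drain.start()

    payload = b" " * chunk
    sent = 0
    while sent < total_bytes:
        n = min(chunk, total_bytes - sent)
        local.sendall(payload[:n])
        sent += n

    drain.join()   # wait until the whole echo has been read back
    local.close()  # EOF lets echo_peer leave its loop
    peer.join()
    return received


print(send_and_drain(4 * 1024 * 1024), "bytes echoed back")
```

Unlike the host script above, this version sends a fixed amount and then shuts down cleanly, so it can be used as a pass/fail check.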
Thanks for sharing, I confirm that it works for vhost-vsock, but it still doesn't work with vhost-user-vsock. Do we need separate threads for tx and rx in vhost-user-vsock (like proposal #323)? |
Thanks for confirming!
Yes, perhaps that might be the case. As a workaround, have you tried increasing |
Increasing the tx buffer size just postpones the error. In the guest, the program hung at
I think vhost-device-vsock needs some kind of flow control logic, and I was curious whether vhost-vsock has something like that (because it doesn't happen with vhost-vsock). BTW, the counterpart of vhost-device-vsock in vhost-vsock is drivers/vhost/vsock.c, isn't it? |
Yeah, we need to check better that part.
Right. |
@ikicha ah, maybe increasing the vsock buffer in guest.py in #494 (comment) will help, since I guess the problem in vhost-device-vsock could be similar to the problem in the python script of #494 (comment) |
@ikicha FYI there was a potentially uninitialized field in the virtio-vsock common code (host-guest) affecting
Introduced in Linux v6.3. |
Thanks for sharing! It might be part of the problem, but I think I found the root cause. There is a silent return on an error from epoll_register, between reading from the UDS and writing into the vq in conn.recv_pkt, and it causes packets to be dropped. |
I'll send a pull request about that soon. |
guest side:
while true; do ncat -e /bin/cat --vsock -l 1234 -k; done;
host side:
While stress testing vhost-device-vsock, I found a bug with large rx packets. It got stuck after VSOCK_OP_CREDIT_REQUEST in VsockConnection.recv_pkt() (and it looks like there is no VSOCK_OP_CREDIT_UPDATE in tx). It happens with both qemu and crosvm. I was curious whether similar bugs have been seen before. (BTW, I found https://lore.kernel.org/netdev/[email protected]/ which is similar)