
[vioscsi] Fix SendSRB notify regression #1150

Merged

Conversation

benyamin-codez
Contributor

@benyamin-codez commented Sep 19, 2024

Related issues:
#756
#623
#907
...
[likely others]

Regression was introduced in #684 between 7dc052d and fdf56dd.
Specifically due to commits f1338bb and fdf56dd (considered together).

Prior to the regression, virtqueue_notify() was issued outside the spinlock, and this patch keeps it outside the spinlock.

Prior to the regression, we issued virtqueue_kick_prepare() inside the spinlock. This behaviour has been restored.

Note: We also do not want to issue virtqueue_notify() for a corrupt or failed buffer, so the notification has been removed where virtqueue_add_buf() does not return SUCCESS. The virtqueue_add_buf() routine returns 0 on success, otherwise a negative number, usually -28 (ENOSPC).

Freedom for Windows guests once held captive...! 8^D

cc: @vrozenfe @JonKohler @sb-ntnx @frwbr @foxmox

Related external issues (at least confounded by this regression):

https://bugzilla.proxmox.com/show_bug.cgi?id=4295
https://bugzilla.kernel.org/show_bug.cgi?id=199727
...
[likely others]

Fixes regression in SendSRB of [vioscsi]

Regression was introduced in virtio-win#684 between 7dc052d and fdf56dd.
Specifically due to commits f1338bb and fdf56dd (considered together).

Signed-off-by: benyamin-codez <[email protected]>
@benyamin-codez
Contributor Author

benyamin-codez commented Sep 19, 2024

Note my edit above, lots of distractions atm... 8^d

Prior to the regression, we did not issue virtqueue_notify() outside inside spinlock.

Edit: Still got it wrong. Thanks @frwbr for the correction..!

Prior to the regression, we issued virtqueue_kick_prepare() inside spinlock. This behaviour has been restored.

@benyamin-codez
Contributor Author

@vrozenfe @JonKohler @sb-ntnx @frwbr @foxmox

All the checks have passed.

If there are no concerns, I'll move this from draft to review in about 12 hours...

@frwbr

frwbr commented Sep 20, 2024

Hi, many thanks for opening this targeted PR!

I applied your patch on top of current master (22d0908) and ran the reproducer from #756 (comment) again (details on setup below). With this patch, I have not seen Reset to device, \Device\RaidPort[...] vioscsi warnings in the event log yet, nor Desc next is 3 messages issued by QEMU on the host. On current master (without this patch), I see both. So this patch looks very promising!

I have one comment:

Regression was introduced in #684 between 7dc052d and fdf56dd. Specifically due to commits f1338bb and fdf56dd (considered together).

Prior to the regression, we did not issue virtqueue_notify() outside inside spinlock. This behaviour has been restored.

Has been some time since I last looked into the code -- but isn't it the call to virtqueue_kick_prepare (not virtqueue_notify) that is currently (on current master) issued outside the spinlock, and with this patch moves inside the spinlock? If I'm not mistaken, the call to virtqueue_kick_prepare used to be inside the spinlock before commit f1338bb, which moved it outside the spinlock.

The call to virtqueue_notify, on the other hand, was issued outside the spinlock before and after f1338bb, and with this patch is still issued outside the spinlock -- I think :)


Guest: Windows 10
Host:

./qemu-stable-9.0.1/qemu-system-x86_64 \
  -accel kvm \
  -name 'win10-vioscsi-versions,debug-threads=on' \
  -chardev 'socket,id=qmp,path=/var/run/qemu-server/151.qmp,server=on,wait=off' \
  -mon 'chardev=qmp,mode=control' \
  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
  -mon 'chardev=qmp-event,mode=control' \
  -pidfile /var/run/qemu-server/151.pid \
  -smbios 'type=1,uuid=32b5c31e-ec75-4d62-9d0b-a756f876e943' \
  -smp '4,sockets=1,cores=4,maxcpus=4' \
  -nodefaults \
  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
  -vnc 'unix:/var/run/qemu-server/151.vnc,password=on' \
  -cpu 'qemu64,+aes,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,+pni,+popcnt,+sse4.1,+sse4.2,+ssse3' \
  -m 8192 \
  -object 'iothread,id=iothread-virtioscsi0' \
  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
  -device 'pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5' \
  -device 'vmgenid,guid=98ea6dc4-8be3-4c04-9b90-ce97bd6ba7b2' \
  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
  -device 'VGA,id=vga,bus=pci.0,addr=0x2,edid=off' \
  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
  -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0' \
  -drive 'file=/dev/pve/vm-151-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' \
  -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
  -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
  -drive 'file=/dev/pve/vm-151-disk-0,if=none,id=drive-sata0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' \
  -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=100' \
  -netdev 'type=tap,id=net0,ifname=tap151i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' \
  -device 'e1000,mac=BA:FF:FF:84:10:E7,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' \
  -rtc 'driftfix=slew,base=localtime' \
  -machine 'hpet=off,type=pc-i440fx-9.0' \
  -global 'kvm-pit.lost_tick_policy=discard'

Contributor

@JonKohler left a comment


LGTM in general

@benyamin-codez
Contributor Author

@JonKohler & @frwbr, thank you for the feedback.

Friedrich, you are of course correct re:

Has been some time since I last looked into the code -- but isn't it the call to virtqueue_kick_prepare (not virtqueue_notify) that is currently (on current master) issued outside the spinlock, and with this patch moves inside the spinlock? If I'm not mistaken, the call to virtqueue_kick_prepare used to be inside the spinlock before commit f1338bb, which moved it outside the spinlock.

The call to virtqueue_notify, on the other hand, was issued outside the spinlock before and after f1338bb, and with this patch is still issued outside the spinlock -- I think :)

I will update the post.

I have way too many distractions at the moment...
...and working with 24 git clones of this repo (!!!)...
...all of which is somewhat dangerous...

I've also been going down that NumberOfPhysicalBreaks rabbit hole...
Looking at that and its analogue MaximumPhysicalPages...
Plus HwMaxXferLen <--> MaximumTransferLength...
Including all the history of that ^^, e.g. MaximumSGList = ((MAX_BLOCK_SIZE)/PAGE_SIZE) + 1
Plus accommodating 64-bit DMA in the SrbExtensionSize using NOPB or equiv. (which I'm not sure we actually do)...
Plus stepping through the debugger to check out what's actually going on with all of that ^^.
8^d

I think my brain might need rebasing before I'm done... 8^O

Anyway, I'm sure I'll get there in the next week or two, just in time for viostor to start getting mainstream multiqueue support and become the new default...!! That said, I have enjoyed the edification, improving what I can, and it's been nice touching C again.

@benyamin-codez
Contributor Author

@vrozenfe

Vadim, any issues with this one...?

Regards,
Ben

@vrozenfe
Collaborator

@benyamin-codez

Sorry for the delay in response.
And thank you for your work.

All the best,
Vadim.

@benyamin-codez
Contributor Author

@vrozenfe

My pleasure Vadim. Happy to help.

This one should bring welcome relief to many once merged and packaged.

Best regards,
Ben

@vrozenfe vrozenfe merged commit 0755ca6 into virtio-win:master Oct 3, 2024
5 checks passed
benyamin-codez added a commit to benyamin-codez/kvm-guest-drivers-windows that referenced this pull request Oct 23, 2024
Backports fixes and improvements from vioscsi PRs virtio-win#1150 and virtio-win#1162

virtqueue struct vq was also removed in favour of adaptExt->vq[QueueNumber],
which results in a minor performance increase.

Signed-off-by: benyamin-codez <[email protected]>
YanVugenfirer pushed a commit that referenced this pull request Nov 11, 2024
Backports fixes and improvements from vioscsi PRs #1150 and #1162

virtqueue struct vq was also removed in favour of adaptExt->vq[QueueNumber],
which results in a minor performance increase.

Signed-off-by: benyamin-codez <[email protected]>
benyamin-codez added a commit to benyamin-codez/kvm-guest-drivers-windows that referenced this pull request Nov 19, 2024
Backports fixes and improvements from vioscsi PRs virtio-win#1150 and virtio-win#1162

virtqueue struct vq was also removed in favour of adaptExt->vq[QueueNumber],
which results in a minor performance increase.

Signed-off-by: benyamin-codez <[email protected]>
MartinDrab pushed a commit to MartinDrab/kvm-guest-drivers-windows that referenced this pull request Dec 19, 2024
Backports fixes and improvements from vioscsi PRs virtio-win#1150 and virtio-win#1162

virtqueue struct vq was also removed in favour of adaptExt->vq[QueueNumber],
which results in a minor performance increase.

Signed-off-by: benyamin-codez <[email protected]>