Integrate EGM fixes + sysfs linkage required for libvirt #33
base: 24.04_linux-nvidia-adv-6.8-next
Commits on Nov 22, 2024
-
vfio/nvgrace-egm: Free region memory during unregistration
Free the kmalloc'd region when the EGM is unregistered. Signed-off-by: Matthew R. Ochs <[email protected]>
Commit 488ba9d
-
vfio/nvgrace-egm: Move region hash initialization
Move region hash initialization alongside the other region initialization statements to avoid situations where the hash table was not properly initialized. Signed-off-by: Matthew R. Ochs <[email protected]>
Commit 5a2a63a
-
vfio/nvgrace-egm: Handle and convey EGM registration errors
Update error handling within the EGM registration routine to catch and return errors to the caller. Signed-off-by: Matthew R. Ochs <[email protected]>
Commit f05e845
-
vfio/nvgrace-gpu: Handle EGM registration failure
Detect and handle a failure from the EGM registration service. Signed-off-by: Matthew R. Ochs <[email protected]>
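The first four commits share one pattern: the registration routine reports failures as an errno, the caller checks the result, and unregistration frees what registration allocated. The following is a minimal userspace sketch of that pattern, not the driver's code. The structures, field names, and function names (`egm_region`, `egm_node`, `egm_register`, `egm_unregister`, `gpu_probe`) are hypothetical, and `malloc`/`free` stand in for the kernel allocator.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's structures; not the real layout. */
struct egm_region {
	uint64_t base;
	uint64_t size;
};

struct egm_node {
	struct egm_region *region;
};

/* Allocate and fill the region; report failures to the caller as -errno. */
static int egm_register(struct egm_node *node, uint64_t base, uint64_t size)
{
	struct egm_region *region;

	if (!node || size == 0)
		return -EINVAL;

	region = malloc(sizeof(*region));	/* kmalloc() in kernel code */
	if (!region)
		return -ENOMEM;

	/* Initialize every field before publishing the region. */
	region->base = base;
	region->size = size;
	node->region = region;
	return 0;
}

/* Release what egm_register() allocated so nothing leaks. */
static void egm_unregister(struct egm_node *node)
{
	free(node->region);
	node->region = NULL;
}

/* Caller side: stop the probe if registration fails. */
static int gpu_probe(struct egm_node *node, uint64_t base, uint64_t size)
{
	int ret = egm_register(node, base, size);

	if (ret)
		return ret;	/* propagate instead of ignoring the error */
	return 0;
}
```

Whether the real probe path aborts or continues without EGM after a registration failure is not stated in the commit messages; the sketch assumes it aborts.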
Commit 9f2e0ad
-
vfio/nvgrace-gpu: Address checkpatch warnings
Fix source to resolve checkpatch warnings. Signed-off-by: Matthew R. Ochs <[email protected]>
Commit 1981e05
-
vfio/nvgrace-egm: Address sparse errors
Fix minor syntax errors from sparse. Signed-off-by: Matthew R. Ochs <[email protected]>
Commit eeef16d
-
vfio/nvgrace-egm: Address smatch errors
Return the intended errno upon a copyout fault, remove unnecessary checks following container_of pointer derivation, and use the correct macro and types for overflow checking. Signed-off-by: Matthew R. Ochs <[email protected]>
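As an illustration of the overflow-checking part of this commit, the sketch below validates an offset/length pair against a region size without letting the addition wrap. It is a userspace approximation: it uses the GCC/Clang builtin `__builtin_add_overflow`, which is what the kernel's `check_add_overflow()` macro from `<linux/overflow.h>` is built on. The helper name `egm_range_check` and its parameters are hypothetical, and all three operands deliberately share one type, which is the kind of type agreement the overflow helpers expect.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: validate that [offset, offset + len) lies inside a
 * region of region_size bytes without letting offset + len wrap around.
 * Returns 0 on success or -EINVAL if the range is invalid.
 */
static int egm_range_check(uint64_t offset, uint64_t len, uint64_t region_size)
{
	uint64_t end;

	/* __builtin_add_overflow returns true when the sum does not fit. */
	if (__builtin_add_overflow(offset, len, &end))
		return -EINVAL;

	if (end > region_size)
		return -EINVAL;

	return 0;
}
```

A naive `offset + len > region_size` test would accept a huge `offset` whose sum wraps to a small value; checking the overflow flag first closes that hole.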
Commit 299bc85
-
vfio/nvgrace-gpu: Address smatch errors
Use the correct macro and types for overflow checking. Signed-off-by: Matthew R. Ochs <[email protected]>
Commit 228d53d
-
vfio/nvgrace-egm: Ensure ACPI value reads are successful
Ensure ACPI table reads are successful prior to using the value. Signed-off-by: Matthew R. Ochs <[email protected]>
Commit dcbb01f
-
vfio/nvgrace-egm: Avoid invalid retired pages base
Some environments may provide a "nvidia,egm-retired-pages-data-base" but fail to populate it with a base address, leaving it NULL. Mapping this invalid value results in a synchronous exception when the region is first touched. Detect a NULL value, generate a warning to draw attention to the firmware bug, and return without mapping.

  INFO: th500_ras_intr_handler: External Abort reason=1 syndrome=0x92000410 flags=0x1
  [ 82.104493] Internal error: synchronous external abort: 0000000096000410 [#1] SMP
  [ 82.114898] Modules linked in: nvgrace_gpu_vfio_pci(E) nvgrace_egm(E)
  [ 82.257218] CPU: 0 PID: 10 Comm: kworker/0:1 Tainted: G OE 6.8.12+ #5
  [ 82.265135] Hardware name: NVIDIA GH200 P5042, BIOS 24103110 20241031
  [ 82.271720] Workqueue: events work_for_cpu_fn
  [ 82.276180] pstate: 03400009 (nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  [ 82.283298] pc : register_egm_node+0x2cc/0x440 [nvgrace_egm]
  [ 82.289087] lr : register_egm_node+0x2c4/0x440 [nvgrace_egm]
  [ 82.294872] sp : ffff8000802ebc30
  [ 82.298254] x29: ffff8000802ebc60 x28: 00000000000000ff x27: 0000000000000000
  [ 82.305550] x26: ffff000087a320c8 x25: ffff0000a5700000 x24: ffff000087a32000
  [ 82.312846] x23: ffffa77cd758e368 x22: 0000000000000000 x21: ffffa77cd758c640
  [ 82.320141] x20: ffffa77cd758e170 x19: ffff800081e7d000 x18: ffff800080293038
  [ 82.327437] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  [ 82.334732] x14: 0000000000000000 x13: 65203a65646f6e5f x12: 0000000000000000
  [ 82.342027] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
  [ 82.349322] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
  [ 82.356618] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
  [ 82.363913] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff800081e7d000
  [ 82.371210] Call trace:
  [ 82.373705] register_egm_node+0x2cc/0x440 [nvgrace_egm]
  [ 82.379135] nvgrace_gpu_probe+0x2ac/0x528 [nvgrace_gpu_vfio_pci]
  [ 82.385366] local_pci_probe+0x4c/0xe0
  [ 82.389198] work_for_cpu_fn+0x28/0x58
  [ 82.393026] process_one_work+0x168/0x3f0
  [ 82.397123] worker_thread+0x360/0x480
  [ 82.400952] kthread+0x11c/0x128
  [ 82.404248] ret_from_fork+0x10/0x20
  [ 82.407906] Code: d2820001 940002b3 aa0003f3 b4fffac0 (f9400017)
  [ 82.414134] ---[ end trace 0000000000000000 ]---

Signed-off-by: Matthew R. Ochs <[email protected]>
Commit db5fdd3
-
vfio/nvgrace-egm: Link egm and PCI devices
Create a sysfs link between the egm character device and its associated GPU (PCI device) for correlation.

Example:
  $ realpath /sys/class/egm/egm4/0009\:01\:00.0
  /sys/devices/pci0009:00/0009:00:00.0/0009:01:00.0
  $ realpath /sys/bus/pci/devices/0009:01:00.0/egm4
  /sys/devices/virtual/egm/egm4

Signed-off-by: Matthew R. Ochs <[email protected]>
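A management tool could follow this link programmatically instead of shelling out to `realpath`. The sketch below is one possible userspace approach, assuming the layout shown in the example: it scans an egm device directory for an entry named like a PCI address and resolves it with POSIX `realpath()`. The function names `looks_like_pci_addr` and `egm_find_gpu` are hypothetical and the name match is intentionally loose; this is not how libvirt or the driver implements the lookup.

```c
#define _XOPEN_SOURCE 700	/* for realpath() under strict -std= modes */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if name looks like a PCI address "DDDD:BB:DD.F", else 0. */
static int looks_like_pci_addr(const char *name)
{
	unsigned int dom, bus, dev, fn;
	int used = 0;

	if (sscanf(name, "%4x:%2x:%2x.%1x%n", &dom, &bus, &dev, &fn, &used) != 4)
		return 0;
	return name[used] == '\0';	/* reject trailing characters */
}

/*
 * Scan egm_dir (e.g. "/sys/class/egm/egm4") for an entry named like a PCI
 * address and resolve it with realpath(). On success, copies the resolved
 * path into out and returns 0; returns -1 if no link is found or on error.
 */
static int egm_find_gpu(const char *egm_dir, char *out, size_t outlen)
{
	DIR *dir = opendir(egm_dir);
	struct dirent *ent;
	int ret = -1;

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		char link[PATH_MAX];
		char resolved[PATH_MAX];
		int n;

		if (!looks_like_pci_addr(ent->d_name))
			continue;
		n = snprintf(link, sizeof(link), "%s/%s", egm_dir, ent->d_name);
		if (n < 0 || (size_t)n >= sizeof(link))
			continue;	/* path too long, skip */
		if (!realpath(link, resolved))
			continue;	/* dangling or unreadable link */
		if (strlen(resolved) >= outlen)
			break;		/* caller's buffer is too small */
		strcpy(out, resolved);
		ret = 0;
		break;
	}

	closedir(dir);
	return ret;
}
```

Calling `egm_find_gpu("/sys/class/egm/egm4", buf, sizeof(buf))` on a system like the one in the example would be expected to fill `buf` with the PCI device path; the reverse direction (PCI device to egm node) could be handled the same way by matching on an `egm` name prefix.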
Commit cd68d43
-
cover-letter: vfio/nvgrace-egm: Support EGM/GPU correlation and improve error handling
Small series of fixes/improvements to the nvgrace VFIO modules. Signed-off-by: Matthew R. Ochs <[email protected]>
Commit 420d79b