layers: Allow using uncached buffer for Debug Printf #6128

dorian-apanel-intel · 2023-07-12T12:43:10Z

Adds option to force using AMD_DEVICE_COHERENT_MEMORY for debug printf buffer to print messages even if VK_ERROR_DEVICE_LOST is encountered.

Forcing extension and device feature if not enabled by application.

Added workaround for atomic operations in
uncached memory being in cache anyway.

Added workaround to failing MapMemory after DEVICE_LOST (occurs on AMD): When using uncached buffer, do not unmap buffer until messages are analyzed.

This PR attempts to implement #6101 Tested on AMD and Intel (extension not public yet) using this modified vkcube: dorian-apanel-intel/Vulkan-Tools@e022693

CLAassistant · 2023-07-12T12:43:16Z

All committers have signed the CLA.

ci-tester-lunarg · 2023-07-12T12:43:17Z

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

ci-tester-lunarg · 2023-07-12T12:55:13Z

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

ci-tester-lunarg · 2023-07-12T12:57:50Z

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

ci-tester-lunarg · 2023-07-12T13:12:43Z

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

ci-tester-lunarg · 2023-07-12T13:17:19Z

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

layers/gpu_validation/debug_printf.cpp

ci-tester-lunarg · 2023-07-12T16:02:00Z

CI Vulkan-ValidationLayers build queued with queue ID 5177.

TonyBarbour

Has this been tested? Is there a test that can go with this?

juan-lunarg

I don't see any testing being added. Is the plan to add them later?

ci-tester-lunarg · 2023-07-12T16:05:01Z

CI Vulkan-ValidationLayers build # 12676 running.

ci-tester-lunarg · 2023-07-12T16:54:46Z

CI Vulkan-ValidationLayers build # 12676 passed.

ci-tester-lunarg · 2023-07-13T06:19:45Z

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

dorian-apanel-intel · 2023-07-13T06:32:09Z

@juan-lunarg @Tony-LunarG

I don't see any testing being added. Is the plan to add them later?
Has this been tested? Is there a test that can go with this?

I've only tested it manually so far.
Would it be enough to copy one test from tests\negative\debug_printf.cpp and modify it to use uncached buffer and hang?

ci-tester-lunarg · 2023-07-13T15:14:04Z

CI Vulkan-ValidationLayers build queued with queue ID 5894.

ci-tester-lunarg · 2023-07-13T15:15:01Z

CI Vulkan-ValidationLayers build # 12692 running.

TonyBarbour · 2023-07-13T15:24:18Z

@juan-lunarg @Tony-LunarG

I don't see any testing being added. Is the plan to add them later?
Has this been tested? Is there a test that can go with this?

I've only tested it manually so far. Would it be enough to copy one test from tests\negative\debug_printf.cpp and modify it to use uncached buffer and hang?

I think so, as long as the test gets a DEVICE_LOST and then successfully parses a debug printf

ci-tester-lunarg · 2023-07-13T16:06:46Z

CI Vulkan-ValidationLayers build # 12692 passed.

juan-lunarg · 2023-07-13T17:16:37Z

Would it be enough to copy one test from tests\negative\debug_printf.cpp and modify it to use uncached buffer and hang?

This sounds like it would cause problems for Jenkins CI (Although if we switched to ctest maybe not... since the test would be in it's own process).

Because currently CI invokes the test executable and tries running all tests.

ci-tester-lunarg · 2023-07-17T10:35:47Z

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

Adds option to force using AMD_DEVICE_COHERENT_MEMORY for debug printf buffer to print messages even if VK_ERROR_DEVICE_LOST is encountered. * Option added to layer json to be visible in vkconfig, * Forcing extension and device feature if not enabled by application, * Added workaround for atomic operations in uncached memory being in cache anyway, * Added workaround to failing MapMemory after DEVICE_LOST (occurs on AMD): When using uncached buffer, do not unmap buffer until messages are analyzed.

ci-tester-lunarg · 2023-07-17T12:22:16Z

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

dorian-apanel-intel · 2023-07-17T12:33:29Z

@Tony-LunarG @juan-lunarg @jeremyg-lunarg

I've added unit test, had to pass a value using push constants, because hardcoded infinite loop was not hanging on AMD.
I modified layer_options.cpp file to move option from test to layer. It is not perfect, but after looking at layer_configuration.md does not describe programmatic configuration of layers #5895, I think it will need rewrite soon anyway.
After updating AMD drivers, Atomics WA is no longer needed there, but leaving it for Intel for now.
NVidia does not support required extension yet.
Force push was required due to printf test file rename.

I plan to add also unit test for Page Fault, but preferably in another pull request.

ci-tester-lunarg · 2023-07-17T13:20:51Z

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

ci-tester-lunarg · 2023-07-17T15:44:40Z

CI Vulkan-ValidationLayers build queued with queue ID 7794.

ci-tester-lunarg · 2023-07-17T16:22:04Z

CI Vulkan-ValidationLayers build # 12728 running.

ci-tester-lunarg · 2023-07-17T16:22:06Z

CI Vulkan-ValidationLayers build # 12728 running.

ci-tester-lunarg · 2023-07-17T17:17:14Z

CI Vulkan-ValidationLayers build # 12728 failed.

juan-lunarg · 2023-07-17T17:25:02Z

Galaxy S22 testing failed

[ RUN      ] NegativeDebugPrintf.UncachedBuffer
jni/../../tests/framework/error_monitor.cpp:200: Failure
Failed
Validation Error: [ UNASSIGNED-DEBUG-PRINTF ] Object 0: handle = 0x6c1d064220, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x92394c89 | Setup Error. Detail: (Unable to allocate device memory.  Device could become unstable.)
jni/../../tests/framework/error_monitor.cpp:241: Failure
Failed
Did not receive expected error 'Here are three float values 1.000000, 3.141500, 0.000000'
[  FAILED  ] NegativeDebugPrintf.UncachedBuffer (5836 ms)

juan-lunarg

LGTM. Thanks for the test 👍🏾

I'll defer to @Tony-LunarG and @jeremyg-lunarg for the actual review.

TonyBarbour · 2023-07-17T18:11:04Z

Galaxy S22 testing failed

[ RUN      ] NegativeDebugPrintf.UncachedBuffer
jni/../../tests/framework/error_monitor.cpp:200: Failure
Failed
Validation Error: [ UNASSIGNED-DEBUG-PRINTF ] Object 0: handle = 0x6c1d064220, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x92394c89 | Setup Error. Detail: (Unable to allocate device memory.  Device could become unstable.)
jni/../../tests/framework/error_monitor.cpp:241: Failure
Failed
Did not receive expected error 'Here are three float values 1.000000, 3.141500, 0.000000'
[  FAILED  ] NegativeDebugPrintf.UncachedBuffer (5836 ms)

I don't think there's much the test can do about that. I'll just add the new test to the S22 blacklist.

TonyBarbour

The message processing loop won't exit correctly

TonyBarbour · 2023-07-17T19:55:43Z

layers/gpu_validation/debug_printf.cpp


    uint32_t index = spvtools::kDebugOutputDataOffset;
-    while (debug_output_buffer[index]) {
+    while ((index < output_buffer_size) && debug_output_buffer[index]) {


I don't think this loop will work if the output buffer has one message but isn't full. To see this, use uncached buffer memory in the test NegativeDebugPrintf.BasicUsage and run it. This loop doesn't stop when it's supposed to.

Could you check in which vkQueueSubmit/AnalyzeAndGenerateMessages you observe it? BasicUsage does it 15 times.

Here is how I understand it:

In my first one, debug_output_buffer == 0x0000028b8e520000
Memory of that buffer is:

0x0000028B8E520000 00 00 00 00 MBZ 0x0000028B8E520004 0a 00 00 00 expect (size of all messages, written by atomic in shader) 0x0000028B8E520008 0a 00 00 00 DPFOutputRecord.size (size in dwords of this message) 0x0000028B8E52000C 00 00 00 00 DPFOutputRecord.shader_id 0x0000028B8E520010 70 00 00 00 DPFOutputRecord.instruction_position 0x0000028B8E520014 00 00 00 00 DPFOutputRecord.stage 0x0000028B8E520018 00 00 00 00 DPFOutputRecord.stage_word_1 0x0000028B8E52001C 00 00 00 00 DPFOutputRecord.stage_word_2 0x0000028B8E520020 00 00 00 00 DPFOutputRecord.stage_word_3 0x0000028B8E520024 29 00 00 00 DPFOutputRecord.format_string_id ("Here are two float values %f, %f") 0x0000028B8E520028 00 00 80 3f DPFOutputRecord.values (float 1.0) 0x0000028B8E52002C 56 0e 49 40 DPFOutputRecord.values (float 3.14150000) 0x0000028B8E520030 00 00 00 00 DPFOutputRecord.size (no more messages) 0x0000028B8E520034 00 00 00 00 DPFOutputRecord. 0x0000028B8E520038 00 00 00 00 DPFOutputRecord.

Index starts at 0x2. At the end of first loop iteration:
index += debug_record->size; // 0x2 + 0xa = 0xc
At second loop beginning
debug_output_buffer[index] == 0x0000028b8e520030 {0x0}
So the loop ends.

(index < output_buffer_size) means "while we are still in the buffer"
&& debug_output_buffer[index]) means "while next message is not empty"

I don't have access to the system I was using to test this today - I'll get back to it tomorrow. The system I do have is getting VK_TIMEOUT instead of VK_ERROR_DEVICE_LOST or VK_SUCCESS at the vkQueueWaitIdle, and I had to add that to the test's expected results. You should add that to your test.

Thanks, I've added VK_TIMEOUT to asserts in test.
I assume you used AMD for that test. it happened to me once too.
VK_TIMEOUT should be not returned from vkQueueSubmit, vkDeviceWaitIdle or vkQueueWaitIdle.

I see the problem. Because debug_output_buffer[1] comes back as 0, the memset at the end of message processing doesn't clear the buffer:

memset(debug_output_buffer, 0, 4 * (debug_output_buffer[spvtools::kDebugOutputSizeOffset] + spvtools::kDebugOutputDataOffset));

And the memset clears 4 * (0 + 2) bytes which doesn't cover all of the records. Then on the next printf, the loop depends on debug_output_buffer[index] being zero when the loop is supposed to stop, but previous printfs may have left values that didn't get memset to 0. I suppose one answer would be to memset the whole buffer.

But I'm seeing another problem. If I modify your test to do two debug printfs:

if (gl_VertexIndex == 0) { debugPrintfEXT("Here are three float values %f, %f, %f", 1.0, myfloat, gl_Position.x); debugPrintfEXT("Here's another debug printf"); float x = constants.x[0]; while(x > -1.f) { // infinite loop x += 0.000001f; } debugPrintfEXT("Here is a value that should not be printed %f", x); }

I only get the second printf back, as if the first one was overwritten in the shader. Could that be a result of the atomic operation problem and debug_output_buffer[1] not getting written?
(This is all on an AMD RX6600 with driver 23.7.1)

The atomic add is performed by the instrumentation added to the shader via spirv-tools code, which I'm not very familiar with.

https://github.com/KhronosGroup/SPIRV-Tools/blob/main/source/opt/instrument_pass.cpp#L823

Ok so after a bit of digging, this extension was enabled before we had the MALL cache in hardware, and nobody ever went back and changed the interface to bypass MALL, so there IS caching enabled on this memory type for newer hardware. Whoops. I've filed an internal bug to fix this and to add a more comprehensive test to make sure this actually works on the off chance we make another big change to the caching hierarchy...

You can go look at the public driver source if for some reason you want to see where this is happening 🙃

I'll let you know when it's fixed and subsequently released.

@Tobski I assume the fix is released now?

A fix has been implemented but I'm not sure on release status, I'll take a look.

ci-tester-lunarg · 2023-07-19T11:02:01Z

CI Vulkan-ValidationLayers build queued with queue ID 9060.

ci-tester-lunarg · 2023-07-19T11:25:07Z

CI Vulkan-ValidationLayers build # 12764 running.

ci-tester-lunarg · 2023-07-19T12:19:23Z

CI Vulkan-ValidationLayers build # 12764 passed.

ci-tester-lunarg · 2023-07-21T10:12:51Z

CI Vulkan-ValidationLayers build queued with queue ID 10076.

ci-tester-lunarg · 2023-07-21T10:15:01Z

CI Vulkan-ValidationLayers build # 12781 running.

ci-tester-lunarg · 2023-07-21T11:07:46Z

CI Vulkan-ValidationLayers build # 12781 failed.

ci-tester-lunarg · 2023-07-21T21:26:24Z

CI Vulkan-ValidationLayers build queued with queue ID 10420.

lunarpapillo · 2023-07-21T21:27:03Z

Last LunarG CI failure was due to a system issue. I've restarted.

ci-tester-lunarg · 2023-07-21T21:59:21Z

CI Vulkan-ValidationLayers build # 12787 running.

ci-tester-lunarg · 2023-07-21T22:54:24Z

CI Vulkan-ValidationLayers build # 12787 passed.

Tobski · 2023-08-01T11:05:48Z

layers/gpu_validation/debug_printf.cpp

@@ -672,6 +821,9 @@ void DebugPrintf::AllocateDebugPrintfResources(const VkCommandBuffer cmd_buffer,
    buffer_info.usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
    VmaAllocationCreateInfo alloc_info = {};
    alloc_info.requiredFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
+    if (use_uncached_buffer) {
+        alloc_info.requiredFlags |= VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD | VK_MEMORY_PROPERTY_DEVICE_UNCACHED_BIT_AMD;


You probably don't want to require UNCACHED here, since theoretically in future you might get COHERENT and CACHED memory. Uncached is meant to be informative, not something you generally rely upon unless you're doing some very specific performance tuning.

I don't quite understand. Here I want to get device uncached to read it after device is lost.

dorian-apanel-intel requested a review from a team as a code owner July 12, 2023 12:43

dorian-apanel-intel changed the title ~~layers: gpu-av Debug Printf using uncached buffer~~ layers: Allow using uncached buffer for Debug Printf Jul 12, 2023

dorian-apanel-intel force-pushed the dapanel/uncached_debug_printf_buffer branch from 749483d to af6ca10 Compare July 12, 2023 12:55

dorian-apanel-intel force-pushed the dapanel/uncached_debug_printf_buffer branch from af6ca10 to d148df2 Compare July 12, 2023 12:57

dorian-apanel-intel force-pushed the dapanel/uncached_debug_printf_buffer branch from 2a5f0a9 to ada331e Compare July 12, 2023 13:17

jeremyg-lunarg requested review from TonyBarbour and jeremyg-lunarg July 12, 2023 14:41

jeremyg-lunarg reviewed Jul 12, 2023

View reviewed changes

layers/gpu_validation/debug_printf.cpp Outdated Show resolved Hide resolved

layers/gpu_validation/debug_printf.cpp Outdated Show resolved Hide resolved

TonyBarbour reviewed Jul 12, 2023

View reviewed changes

juan-lunarg reviewed Jul 12, 2023

View reviewed changes

dorian-apanel-intel force-pushed the dapanel/uncached_debug_printf_buffer branch from 99bb797 to 090d262 Compare July 17, 2023 12:22

tests: Removed unused variables

c0c5d42

juan-lunarg reviewed Jul 17, 2023

View reviewed changes

TonyBarbour suggested changes Jul 17, 2023

View reviewed changes

tests: Added VK_TIMEOUT handling

cb05cb6

layers: Fix printf debug_output_buffer clear

4b956cf

dorian-apanel-intel mentioned this pull request Jul 21, 2023

VK_AMD_device_coherent_memory atomic operations. KhronosGroup/Vulkan-Docs#2172

Open

Tobski reviewed Aug 1, 2023

View reviewed changes

layers: Allow using uncached buffer for Debug Printf #6128

Are you sure you want to change the base?

layers: Allow using uncached buffer for Debug Printf #6128

Conversation

dorian-apanel-intel commented Jul 12, 2023

CLAassistant commented Jul 12, 2023 • edited Loading

ci-tester-lunarg commented Jul 12, 2023

ci-tester-lunarg commented Jul 12, 2023

ci-tester-lunarg commented Jul 12, 2023

ci-tester-lunarg commented Jul 12, 2023

ci-tester-lunarg commented Jul 12, 2023

ci-tester-lunarg commented Jul 12, 2023

TonyBarbour left a comment

Choose a reason for hiding this comment

juan-lunarg left a comment

Choose a reason for hiding this comment

ci-tester-lunarg commented Jul 12, 2023

ci-tester-lunarg commented Jul 12, 2023

ci-tester-lunarg commented Jul 13, 2023

dorian-apanel-intel commented Jul 13, 2023

ci-tester-lunarg commented Jul 13, 2023

ci-tester-lunarg commented Jul 13, 2023

TonyBarbour commented Jul 13, 2023

ci-tester-lunarg commented Jul 13, 2023

juan-lunarg commented Jul 13, 2023

ci-tester-lunarg commented Jul 17, 2023

ci-tester-lunarg commented Jul 17, 2023

dorian-apanel-intel commented Jul 17, 2023

ci-tester-lunarg commented Jul 17, 2023

ci-tester-lunarg commented Jul 17, 2023

ci-tester-lunarg commented Jul 17, 2023

ci-tester-lunarg commented Jul 17, 2023

ci-tester-lunarg commented Jul 17, 2023

juan-lunarg commented Jul 17, 2023

juan-lunarg left a comment

Choose a reason for hiding this comment

TonyBarbour commented Jul 17, 2023

TonyBarbour left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TonyBarbour Jul 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ci-tester-lunarg commented Jul 19, 2023

ci-tester-lunarg commented Jul 19, 2023

ci-tester-lunarg commented Jul 19, 2023

ci-tester-lunarg commented Jul 21, 2023

ci-tester-lunarg commented Jul 21, 2023

ci-tester-lunarg commented Jul 21, 2023

ci-tester-lunarg commented Jul 21, 2023

lunarpapillo commented Jul 21, 2023

ci-tester-lunarg commented Jul 21, 2023

ci-tester-lunarg commented Jul 21, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CLAassistant commented Jul 12, 2023 •

edited

Loading

TonyBarbour Jul 18, 2023 •

edited

Loading