-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Vulkan deferred operation handling errors #1192
Fix Vulkan deferred operation handling errors #1192
Conversation
CI gfxreconstruct build queued with queue ID 4468. |
CI gfxreconstruct build # 2958 running. |
CI gfxreconstruct build # 2958 failed. |
3cf98ed
to
7b11eaf
Compare
CI gfxreconstruct build queued with queue ID 8587. |
CI gfxreconstruct build # 2993 running. |
CI gfxreconstruct build # 2993 passed. |
7b11eaf
to
15e0f69
Compare
CI gfxreconstruct build queued with queue ID 16590. |
CI gfxreconstruct build # 3050 running. |
CI gfxreconstruct build # 3050 passed. |
@@ -6155,7 +6161,33 @@ VkResult VulkanReplayConsumerBase::OverrideCreateRayTracingPipelinesKHR( | |||
createInfoCount, | |||
modified_create_infos.data(), | |||
in_pAllocator, | |||
out_pPipelines); | |||
deferred_operation_info->replayPipelines.data()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deferred_operation_info->replayPipelines
is resized when if (deferred_operation_info)
. It means if it doesn't use deferred operation, it will cause an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out. I'll follow up with a commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
CI gfxreconstruct build queued with queue ID 30721. |
CI gfxreconstruct build # 3137 running. |
CI gfxreconstruct build # 3137 passed. |
void VulkanExportJsonConsumerBase::Process_vkCreateRayTracingPipelinesKHR( | ||
const ApiCallInfo& call_info, | ||
VkResult returnValue, | ||
format::HandleId device, | ||
format::HandleId deferredOperation, | ||
format::HandleId pipelineCache, | ||
uint32_t createInfoCount, | ||
StructPointerDecoder<Decoded_VkRayTracingPipelineCreateInfoKHR>* pCreateInfos, | ||
StructPointerDecoder<Decoded_VkAllocationCallbacks>* pAllocator, | ||
HandlePointerDecoder<VkPipeline>* pPipelines) | ||
{ | ||
nlohmann::ordered_json& jdata = WriteApiCallStart(call_info, "vkCreateRayTracingPipelinesKHR"); | ||
FieldToJson(jdata[NameReturn()], returnValue, json_options_); | ||
auto& args = jdata[NameArgs()]; | ||
HandleToJson(args["device"], device, json_options_); | ||
HandleToJson(args["deferredOperation"], deferredOperation, json_options_); | ||
HandleToJson(args["pipelineCache"], pipelineCache, json_options_); | ||
FieldToJson(args["createInfoCount"], createInfoCount, json_options_); | ||
FieldToJson(args["pCreateInfos"], pCreateInfos, json_options_); | ||
FieldToJson(args["pAllocator"], pAllocator, json_options_); | ||
HandleToJson(args["pPipelines"], pPipelines, json_options_); | ||
WriteBlockEnd(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this, it is the same as the default generated version. We don't want custom code where generated code can work. You probably added this because the default implementation disappeared when you added vkCreateRayTracingPipelinesKHR
to vulkan_generators/blacklists.json
. Each generator can have its own blocklisk.blacklist JSON file: they don't have to share this one.
Either give the JSON consumer its own blocklist or remove this function from the default blocklist in the JSON consumer header and body generators with a line like self.APICALL_BLACKLIST.remove('vkCreateRayTracingPipelinesKHR')
(untested), probably at the botton of __init__()
or the top of generate_feature()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated it accordingly, thanks.
StructPointerDecoder<Decoded_VkRayTracingPipelineCreateInfoKHR>* pCreateInfos, | ||
StructPointerDecoder<Decoded_VkAllocationCallbacks>* pAllocator, | ||
HandlePointerDecoder<VkPipeline>* pPipelines) override; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this copy of the generated version. See other comment on body of same module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated it accordingly, thanks.
CI gfxreconstruct build queued with queue ID 72512. |
CI gfxreconstruct build # 3436 running. |
CI gfxreconstruct build # 3436 failed. |
This failure is just a replay failure for an unknown struct in a |
I'll rebase, thank you. |
dfadf84
to
46813fd
Compare
CI gfxreconstruct build queued with queue ID 74963. |
I'm looking into it, will keep you updated, thanks. |
deferred_operation_info->record_modified_create_infos.clear(); | ||
deferred_operation_info->record_modified_pgroups.clear(); | ||
deferred_operation_info->replayPipelines.resize(createInfoCount); | ||
deferred_operation_info->capturePipelines.resize(createInfoCount); | ||
memcpy(deferred_operation_info->capturePipelines.data(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use deferred_operation_info->capturePipelines->insert
? t allows you to omit the sizeof
and is more idiomatic C++. I realize there are uses of memcpy in this file but we'd like to move away from memcpy
when possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if ((result != VK_OPERATION_DEFERRED_KHR) || | ||
(!device_wrapper->property_feature_info.feature_rayTracingPipelineShaderGroupHandleCaptureReplay)) | ||
{ | ||
// If the operation is not deferred by driver. or the system doesn't support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, this block ends up being written twice by the code in this branch. I don't think that's what we want for this solution. If there has to be a deferred operation read of the output variable, we probably need a MetaData command to write that value, and then replay should handle the case of postponing creation of the VkPipeline handles until that MetaData command is written.
I'm confused about what's happening here. Is the problem that the application creates an RT pipeline and gets VK_OPERATION_DEFERRED_KHR then joins the deferred operation until success, then uses the pipeline handles but GFXR somehow writes the operations out of order so the pipeline handles are used in replay before their creation is complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a test case where the application creates an RT pipeline and gets VK_OPERATION_DEFERRED_KHR then joins the deferred operation until success, then uses the pipeline handles but GFXR somehow writes the operations out of order so the pipeline handles are used in replay before their creation is complete?
If that's what's happening, then I think I need to understand how that could happen and we may need to introduce synchronization around the pipeline handles and anything using those pipeline handles, and not create this block twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe in steps how the problem occurs so I can understand how this branch is expected to the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modified app is valid.
Yes, I think the hardware doesn't support the feature rayTracingPipelineShaderGroupHandleCaptureReplay, it triggers the issue that the above code block generates one more trace block. I'll add a commit for the issue. It may playback the trace of raytracingbasic as current dev branch. However, for this hardware, it's not guarantee playback always work, the reason is the following:
GFXR current framework requires the feature for capture/replay raytracing titles. It needs the related GPU address/ShaderGroupHandle capture/replay extension to reproduce these capture time GPU address/handle value during replay time. Without the feature (rayTracingPipelineShaderGroupHandleCaptureReplay) support, it's not guarantee for replaying raytracing trace file because shadergroup handle value may have different value during replay time, especially for complicated apps or real game, simple app and full trace file replay sometimes may work because the used GPU addresses/handles way less than real title.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
CI gfxreconstruct build queued with queue ID 87408. |
CI gfxreconstruct build # 3536 running. |
CI gfxreconstruct build # 3536 passed. |
Thank you, @mizhen , for fixing it. I didn't realize the One idea is maybe we could still write the block of |
Thank you for your review, I will make some changes. |
CI gfxreconstruct build queued with queue ID 97935. |
Done. |
CI gfxreconstruct build # 3612 running. |
CI gfxreconstruct build # 3612 passed. |
@mizhen Thank you. Could you please |
|
when using deferred operation object in command vkcreateraytracingpipelines , driver has two choices. One is driver proceed all workload in current thread within this command, another choice is deferring this command to other threads. The commit added handling to the second choice and fixed related errors on pipeline reading and capture/playback shader group handle error. Change-Id: Id4704502714b866211555d007e09dcefa69a4138
The deferred operation object info pointer will be nullptr on the condition and the following access to this pointer will cause access error. The commit fixed the issue. Change-Id: Ia9e729f4cf083360801a7ffbefcef759e00efe79
Remove the condition of checking system feature rayTracingPipelineShaderGroupHandleCaptureReplay which may trigger error for some hardware/driver which support raytracing without the feature.
Add warning message for delaying writing the block of vkCreateRayTracingPipelinesKHR to the capture file.
5c4311c
to
adee9fb
Compare
CI gfxreconstruct build queued with queue ID 101804. |
Done. |
CI gfxreconstruct build # 3626 running. |
CI gfxreconstruct build # 3626 failed. |
CI gfxreconstruct build # 3629 passed. It's good to merge. |
The problem
when using deferred operation object in command vkcreateraytracingpipelines, there are two cases with the driver. One is driver proceed all workload in current thread without deferring this command, another case is deferring this command to other threads. The second case is not covered by existing handling. It cause some
title playback failure because reading wrong pipeline data and capture/playback shader group handle, it also cause trim playback failure because ray tracing pipeline creation failure.
The solution
The pull request is the fix for these errors. It added needed handling for deferred command and fixed related errors on pipeline reading, capture/playback shader group handle handling and trim deferred operation object handling.
Result
All tests passed for the target title and other applications include trim trace and full trace capture/playback, these test results show the pull-request fixed the target title playback issue caused by these deferred operation handling errors.