-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updated to compile against more recent opencl headers #13
base: maint-3.8
Are you sure you want to change the base?
Conversation
hmmm, maybe this is not quite correct yet. i get segfaults in |
i tested with /usr/bin/test-clkernel, /usr/bin/test-clenabled, /usr/bin/test-clfilter, /usr/bin/test-clxcorrelate, /usr/bin/test-clxengine and they all run correctly |
i managed to get a backtrace in gdb:
|
Hi Stef, thanks for looking at this. Was something I knew I was eventually going to have to get to. When it's all tested out, happy to merge it! |
i dug down into this, and i'm convinced this is a different issue, the thing is, i can only test it with my patch in this pull request. anyway here is my analysis: minimal testcases can be found at: https://gist.github.com/stef/5e2d8a88f1d9d220f623a55531a9fb06 the report below has been test with intel-opencl-icd version: 20.44.18297 on
and on
The segfault happens in two different locations. In the case of the first location, the backtrace of the faulting thread is the following:
the crash happens in: this function looks like this: void DrmAllocation::makeBOsResident(OsContext *osContext, uint32_t vmHandleId, std::vector<BufferObject *> *bufferObjects, bool bind) {
if (this->fragmentsStorage.fragmentCount) {
for (unsigned int f = 0; f < this->fragmentsStorage.fragmentCount; f++) {
if (!this->fragmentsStorage.fragmentStorageData[f].residency->resident[osContext->getContextId()]) { The fault occurs in the last line of this snippet. the dissassembly of this function looks like this:
The problem is, that
The 0x38th offset of
which is inside the
this (gdb) ptype/o this->fragmentsStorage apparently the
since each of them is 40 bytes long, it is obvious that the offending address that gets loaded in we can verify this quickly:
which is equal to what is the value there, surely it is 0?
This address belongs to some mmapped memory region (which supposedly is shared with the gpu). Since 3 assembly operations before the segfault we read this value as 0, this is a strong indicator of some kind of race condition. Interestingly, running this process multiple times sometimes did result sometimes in a value of 0 there when already in gdb, suggesting that whatever updates this memory has not yet written there. Sometimes running the testcases the crash happens at a different location:
which contains:
apparently this function fails: https://github.com/intel/compute-runtime/blob/3015b9575251d380e24a4569566b7b8a467d6380/shared/source/memory_manager/host_ptr_manager.cpp#L261 my opencl-fu is very limited, this is where i'd like to hand this issue over to you or the intel guys. i'll open an issue in their tracker about this. |
It's definitely interesting. I've mostly been running against NVIDIA GPUs, I haven't tested against Intel graphics sets in quite a while (I had it in a really old laptop maybe 5-7 years ago, but nothing in my current setups). 2 things do immediately come to mind:
You may be able to try in the section where the memory gets allocated adding some checks that the memory creation appears to have succeeded even if it doesn't throw an exception? In setBufferLength() for instance. If new cl::Buffer isn't throwing an exception, maybe it's just returning a NULL if it can't allocate? |
would you be able to propose specific code to patch in, i'd be happy to run tests |
As a first test, just make sure aBuffer, etc. are not coming back NULL. Just something like aBuffer != NULL. What data type are you feeding it? I can glance over that kernel there one more time too. |
i feed a sinus and a cosinus signal source in my gnuradiocompanion flowgraph into a cl-multiply and that into a nullsink. i don't really do buffers at all. |
I was referring to the cl::buffers that get created in the cl_mathop class. Are the sine/cosines complex or float data types? |
they are all complex. |
i prepared testcases linked in my analysis under https://gist.github.com/stef/5e2d8a88f1d9d220f623a55531a9fb06 |
the intel people have posted some insights, does any of this ring a bell with you? intel/compute-runtime#409 (comment) |
i created a much simpler c++ only testcase which still fails for me: https://gist.github.com/stef/e7818dc85b48c06d98a333e01f3d526f |
there seems to be a violation of the opencl spec according to the intel engineers reproducing the testcase: intel/compute-runtime#409 (comment) |
I did try to start a conversion to the newer headers and ran into issues. So there's a foundation in the code base now to switch it over. It's just not turned on. If Intel provides any feedback, let me know. |
i'm not sure what other feedback would intel be providing besides the:
|
In the math op blocks, there's only one queue, there's a thread lock to ensure multiple work() calls don't happen at the same time to cause any overlap, and there's a blocking read at the end of a sequence (which is the correct way to do it). So I'm not sure what they're looking at. |
this commit should fix #4
i deliberately added
#defines
to mark which files need updating to the newer APIi managed to compile this on a devuan host, against the following package:
opencl-clhpp-headers 3.0~2.0.13-1 - which seems to be from
khronos-opencl-clhpp