updated to compile against more recent opencl headers #13

Open
wants to merge 1 commit into
base: maint-3.8

stef

@stef stef commented Mar 5, 2021

this commit should fix #4

i deliberately added #defines to mark which files need updating to the newer API

i managed to compile this on a devuan host, against the following package:

opencl-clhpp-headers 3.0~2.0.13-1 - which seems to be from khronos-opencl-clhpp

@stef
Author

stef commented Mar 5, 2021

hmmm, maybe this is not quite correct yet. i get segfaults in intel-opencl/libigdrcl.so - can anyone else test this, to see whether my setup is broken or whether this is a consequence of making it compile?

@stef
Author

stef commented Mar 6, 2021

i tested with /usr/bin/test-clkernel, /usr/bin/test-clenabled, /usr/bin/test-clfilter, /usr/bin/test-clxcorrelate, /usr/bin/test-clxengine and they all run correctly

@stef
Author

stef commented Mar 6, 2021

i managed to get a backtrace in gdb:

Thread 65 "clMathOp6" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff497fa700 (LWP 16741)]
NEO::DrmAllocation::makeBOsResident (this=0x6ddaec0, osContext=0x5f15080, vmHandleId=vmHandleId@entry=0, bufferObjects=bufferObjects@entry=0x13de378, bind=bind@entry=false) at ./shared/source/os_interface/linux/drm_allocation.cpp:35
Download failed: Invalid argument.  Continuing without source file ./build/shared/source/./shared/source/os_interface/linux/drm_allocation.cpp.
35      ./shared/source/os_interface/linux/drm_allocation.cpp: No such file or directory.
(gdb) bt
#0  NEO::DrmAllocation::makeBOsResident (this=0x6ddaec0, osContext=0x5f15080, vmHandleId=vmHandleId@entry=0, bufferObjects=bufferObjects@entry=0x13de378, bind=bind@entry=false)
    at ./shared/source/os_interface/linux/drm_allocation.cpp:35
#1  0x00007fffe6d27d34 in NEO::DrmCommandStreamReceiver<NEO::SKLFamily>::processResidency (this=0x13de150, inputAllocationsForResidency=..., handleId=0)
    at ./opencl/source/os_interface/linux/drm_command_stream.inl:131
#2  0x00007fffe6d286ce in NEO::DrmCommandStreamReceiver<NEO::SKLFamily>::flushInternal (this=this@entry=0x13de150, batchBuffer=..., allocationsForResidency=std::vector of length 10, capacity 20 = {...})
    at ./opencl/source/os_interface/linux/drm_command_stream_bdw_plus.inl:16
#3  0x00007fffe6d2890b in NEO::DrmCommandStreamReceiver<NEO::SKLFamily>::flush (this=this@entry=0x13de150, batchBuffer=..., allocationsForResidency=std::vector of length 10, capacity 20 = {...})
    at ./opencl/source/os_interface/linux/drm_command_stream.inl:86
#4  0x00007fffe6c817c5 in NEO::CommandStreamReceiverHw<NEO::SKLFamily>::flushTask (this=this@entry=0x13de150, commandStreamTask=..., commandStreamStartTask=commandStreamStartTask@entry=576, dsh=..., 
    ioh=..., ssh=..., taskLevel=<optimized out>, dispatchFlags=..., device=...) at ./shared/source/command_stream/command_stream_receiver_hw_base.inl:545
#5  0x00007fffe6b2bfdc in NEO::CommandQueueHw<NEO::SKLFamily>::enqueueNonBlocked<4596u> (this=this@entry=0x9c8060, surfaces=surfaces@entry=0x7fff497f84d0, surfaceCount=surfaceCount@entry=2, 
    commandStream=..., commandStreamStart=commandStreamStart@entry=576, blocking=@0x7fff497f635c: false, multiDispatchInfo=..., enqueueProperties=..., timestampPacketDependencies=..., eventsRequest=..., 
    eventBuilder=..., taskLevel=22, printfHandler=0x0) at ./opencl/source/command_queue/enqueue_common.h:789
#6  0x00007fffe6b57e3f in NEO::CommandQueueHw<NEO::SKLFamily>::enqueueHandler<4596u> (this=this@entry=0x9c8060, surfacesForResidency=surfacesForResidency@entry=0x7fff497f84d0, 
    numSurfaceForResidency=numSurfaceForResidency@entry=2, blocking=<optimized out>, blocking@entry=false, multiDispatchInfo=..., numEventsInWaitList=<optimized out>, numEventsInWaitList@entry=0, 
    eventWaitList=<optimized out>, event=0x0) at /usr/include/c++/10/bits/unique_ptr.h:173
#7  0x00007fffe6b58387 in NEO::CommandQueueHw<NEO::SKLFamily>::enqueueHandler<4596u, 2ul> (event=0x0, eventWaitList=0x0, numEventsInWaitList=0, dispatchInfo=..., blocking=false, surfacesForResidency=..., 
    this=0x9c8060) at ./opencl/source/command_queue/command_queue_hw.h:335
#8  NEO::CommandQueueHw<NEO::SKLFamily>::dispatchBcsOrGpgpuEnqueue<4596u, 2ul> (this=this@entry=0x9c8060, dispatchInfo=..., surfaces=..., builtInOperation=builtInOperation@entry=1, 
    numEventsInWaitList=numEventsInWaitList@entry=0, eventWaitList=eventWaitList@entry=0x0, event=0x0, blocking=false) at ./opencl/source/command_queue/enqueue_common.h:1120
#9  0x00007fffe6b598a7 in NEO::CommandQueueHw<NEO::SKLFamily>::enqueueWriteBuffer (this=0x9c8060, buffer=0xf64170, blockingWrite=0, offset=0, size=28672, ptr=<optimized out>, 
    mapAllocation=<optimized out>, numEventsInWaitList=0, eventWaitList=0x0, event=0x0) at ./opencl/source/command_queue/enqueue_write_buffer.h:114
#10 0x00007fffe69f54bd in clEnqueueWriteBuffer (commandQueue=<optimized out>, buffer=<optimized out>, blockingWrite=<optimized out>, offset=<optimized out>, cb=<optimized out>, ptr=<optimized out>, 
    numEventsInWaitList=<optimized out>, eventWaitList=<optimized out>, event=<optimized out>) at ./opencl/source/api/api.cpp:2356
#11 0x00007fffeb4372a8 in cl::CommandQueue::enqueueWriteBuffer (this=<optimized out>, this=<optimized out>, event=0x0, events=0x0, ptr=<optimized out>, size=28672, offset=0, blocking=0, buffer=...)
    at /usr/include/CL/opencl.hpp:7632
#12 gr::clenabled::clMathOp_impl::processOpenCL (this=0x59dbae0, noutput_items=3584, ninput_items=..., input_items=std::vector of length 2, capacity 2 = {...}, 
    output_items=std::vector of length 1, capacity 1 = {...}) at /home/s/srcs/gr-clenabled/lib/clMathOp_impl.cc:421
#13 0x00007ffff68fa167 in gr::sync_block::general_work (this=0x59dbcc0, noutput_items=<optimized out>, ninput_items=..., input_items=..., output_items=...) at ../gnuradio-runtime/lib/sync_block.cc:61
#14 0x00007ffff68b70f3 in gr::block_executor::run_one_iteration (this=this@entry=0x7fff497f9dc0) at ../gnuradio-runtime/lib/block_executor.cc:514
#15 0x00007ffff690a634 in gr::tpb_thread_body::tpb_thread_body (this=0x7fff497f9dc0, block=..., start_sync=..., max_noutput_items=<optimized out>) at ../gnuradio-runtime/lib/tpb_thread_body.cc:122
#16 0x00007ffff68f9264 in gr::tpb_container::operator() (this=0x6720a80) at ../gnuradio-runtime/lib/scheduler_tpb.cc:50
#17 gr::thread::thread_body_wrapper<gr::tpb_container>::operator() (this=0x6720a80) at ../gnuradio-runtime/lib/../include/gnuradio/thread/thread_body_wrapper.h:52
#18 0x00007ffff6385787 in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:179
#19 0x00007ffff7f92ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x00007ffff7d25def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@ghostop14
Owner

Hi Stef, thanks for looking at this. Was something I knew I was eventually going to have to get to. When it's all tested out, happy to merge it!

@stef
Author

stef commented Mar 8, 2021

i dug into this, and i'm convinced it is a different issue; the catch is that i can only test it with my patch from this pull request. anyway, here is my analysis:

minimal testcases can be found at: https://gist.github.com/stef/5e2d8a88f1d9d220f623a55531a9fb06

the report below has been tested with intel-opencl-icd version: 20.44.18297 on

  • devuan running on an Intel(R) Celeron(R) J4115 CPU @ 1.80GHz, the gpu is an integrated Intel gen9 UHD Graphics 600 (Gemini Lake). clview reports:
Platform Id: 0
Device Id: 0
Platform Name: Intel(R) OpenCL HD Graphics
Device Name: Intel(R) Graphics Gen9 [0x3185]
Device Type: GPU
Clock Frequency: 750 MHz
Compute Units: 12 (A workgroup executes on a compute unit.  This represents parallel workgroups.)
Max Workgroup Size: 256
Constant Memory: 1677720K (429496320 floats)
Local Memory: 64K (16384 floats)
Double Precision Math Support: Yes
Double Precision Fused Multiply/Add [FMA] Support: Yes
Single Precision Fused Multiply/Add [FMA] Support: Yes

and on

  • debian running where clview reports:
Platform Id: 0
Device Id: 0
Platform Name: Intel(R) OpenCL HD Graphics
Device Name: Intel(R) Graphics Gen9 [0x5917]
Device Type: GPU
Clock Frequency: 1150 MHz
Compute Units: 24 (A workgroup executes on a compute unit.  This represents parallel workgroups.)
Max Workgroup Size: 256
Constant Memory: -8K (-2048 floats)
Local Memory: 64K (16384 floats)
Double Precision Math Support: Yes
Double Precision Fused Multiply/Add [FMA] Support: Yes
Single Precision Fused Multiply/Add [FMA] Support: Yes

Platform Id: 1
Device Id: 0
Platform Name: Intel(R) CPU Runtime for OpenCL(TM) Applications
Device Name: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
Device Type: CPU
Clock Frequency: 1800 MHz
Compute Units: 8 (A workgroup executes on a compute unit.  This represents parallel workgroups.)
Max Workgroup Size: 8192
Constant Memory: 128K (32768 floats)
Local Memory: 32K (8192 floats)
Double Precision Math Support: Yes
Double Precision Fused Multiply/Add [FMA] Support: Yes
Single Precision Fused Multiply/Add [FMA] Support: No

The segfault happens in two different locations.

In the case of the first location, the backtrace of the faulting thread is the following:

#0  NEO::DrmAllocation::makeBOsResident (this=0x66bfa90, osContext=0x5f11ee0, vmHandleId=vmHandleId@entry=0, bufferObjects=bufferObjects@entry=0x1792718, bind=bind@entry=false)
    at ./shared/source/os_interface/linux/drm_allocation.cpp:35
#1  0x00007fffe6d27d34 in NEO::DrmCommandStreamReceiver<NEO::SKLFamily>::processResidency (this=0x17924f0, inputAllocationsForResidency=..., handleId=0)
    at ./opencl/source/os_interface/linux/drm_command_stream.inl:131
#2  0x00007fffe6d286ce in NEO::DrmCommandStreamReceiver<NEO::SKLFamily>::flushInternal (this=this@entry=0x17924f0, batchBuffer=..., allocationsForResidency=std::vector of length 9, capacity 20 = {...})
    at ./opencl/source/os_interface/linux/drm_command_stream_bdw_plus.inl:16
#3  0x00007fffe6d2890b in NEO::DrmCommandStreamReceiver<NEO::SKLFamily>::flush (this=this@entry=0x17924f0, batchBuffer=..., allocationsForResidency=std::vector of length 9, capacity 20 = {...})
    at ./opencl/source/os_interface/linux/drm_command_stream.inl:86
#4  0x00007fffe6c817c5 in NEO::CommandStreamReceiverHw<NEO::SKLFamily>::flushTask (this=this@entry=0x17924f0, commandStreamTask=..., commandStreamStartTask=commandStreamStartTask@entry=576, dsh=..., 
    ioh=..., ssh=..., taskLevel=<optimized out>, dispatchFlags=..., device=...) at ./shared/source/command_stream/command_stream_receiver_hw_base.inl:545
#5  0x00007fffe6b2bfdc in NEO::CommandQueueHw<NEO::SKLFamily>::enqueueNonBlocked<4596u> (this=this@entry=0x60e9420, surfaces=surfaces@entry=0x7fff497f84d0, surfaceCount=surfaceCount@entry=2, 
    commandStream=..., commandStreamStart=commandStreamStart@entry=576, blocking=@0x7fff497f635c: false, multiDispatchInfo=..., enqueueProperties=..., timestampPacketDependencies=..., eventsRequest=..., 
    eventBuilder=..., taskLevel=40, printfHandler=0x0) at ./opencl/source/command_queue/enqueue_common.h:789
#6  0x00007fffe6b57e3f in NEO::CommandQueueHw<NEO::SKLFamily>::enqueueHandler<4596u> (this=this@entry=0x60e9420, surfacesForResidency=surfacesForResidency@entry=0x7fff497f84d0, 
    numSurfaceForResidency=numSurfaceForResidency@entry=2, blocking=<optimized out>, blocking@entry=false, multiDispatchInfo=..., numEventsInWaitList=<optimized out>, numEventsInWaitList@entry=0, 
    eventWaitList=<optimized out>, event=0x0) at /usr/include/c++/10/bits/unique_ptr.h:173
#7  0x00007fffe6b58387 in NEO::CommandQueueHw<NEO::SKLFamily>::enqueueHandler<4596u, 2ul> (event=0x0, eventWaitList=0x0, numEventsInWaitList=0, dispatchInfo=..., blocking=false, surfacesForResidency=..., 
    this=0x60e9420) at ./opencl/source/command_queue/command_queue_hw.h:335
#8  NEO::CommandQueueHw<NEO::SKLFamily>::dispatchBcsOrGpgpuEnqueue<4596u, 2ul> (this=this@entry=0x60e9420, dispatchInfo=..., surfaces=..., builtInOperation=builtInOperation@entry=1, 
    numEventsInWaitList=numEventsInWaitList@entry=0, eventWaitList=eventWaitList@entry=0x0, event=0x0, blocking=false) at ./opencl/source/command_queue/enqueue_common.h:1120
#9  0x00007fffe6b598a7 in NEO::CommandQueueHw<NEO::SKLFamily>::enqueueWriteBuffer (this=0x60e9420, buffer=0x60f26a0, blockingWrite=0, offset=0, size=28672, ptr=<optimized out>, 
    mapAllocation=<optimized out>, numEventsInWaitList=0, eventWaitList=0x0, event=0x0) at ./opencl/source/command_queue/enqueue_write_buffer.h:114
#10 0x00007fffe69f54bd in clEnqueueWriteBuffer (commandQueue=<optimized out>, buffer=<optimized out>, blockingWrite=<optimized out>, offset=<optimized out>, cb=<optimized out>, ptr=<optimized out>, 
    numEventsInWaitList=<optimized out>, eventWaitList=<optimized out>, event=<optimized out>) at ./opencl/source/api/api.cpp:2356
#11 0x00007fffeb4372a8 in cl::CommandQueue::enqueueWriteBuffer (this=<optimized out>, this=<optimized out>, event=0x0, events=0x0, ptr=<optimized out>, size=28672, offset=0, blocking=0, buffer=...)
    at /usr/include/CL/opencl.hpp:7632
#12 gr::clenabled::clMathOp_impl::processOpenCL (this=0x62718a0, noutput_items=3584, ninput_items=..., input_items=std::vector of length 2, capacity 2 = {...}, 
    output_items=std::vector of length 1, capacity 1 = {...}) at /home/s/srcs/gr-clenabled/lib/clMathOp_impl.cc:421
#13 0x00007ffff68fa167 in gr::sync_block::general_work (this=0x6271a80, noutput_items=<optimized out>, ninput_items=..., input_items=..., output_items=...) at ../gnuradio-runtime/lib/sync_block.cc:61
#14 0x00007ffff68b70f3 in gr::block_executor::run_one_iteration (this=this@entry=0x7fff497f9dc0) at ../gnuradio-runtime/lib/block_executor.cc:514
#15 0x00007ffff690a634 in gr::tpb_thread_body::tpb_thread_body (this=0x7fff497f9dc0, block=..., start_sync=..., max_noutput_items=<optimized out>) at ../gnuradio-runtime/lib/tpb_thread_body.cc:122
#16 0x00007ffff68f9264 in gr::tpb_container::operator() (this=0x819eb50) at ../gnuradio-runtime/lib/scheduler_tpb.cc:50
#17 gr::thread::thread_body_wrapper<gr::tpb_container>::operator() (this=0x819eb50) at ../gnuradio-runtime/lib/../include/gnuradio/thread/thread_body_wrapper.h:52
#18 0x00007ffff6385787 in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:179
#19 0x00007ffff7f92ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x00007ffff7d25def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

the crash happens in:

https://github.com/intel/compute-runtime/blob/20.44.18297/shared/source/os_interface/linux/drm_allocation.cpp#L35

this function looks like this:

      void DrmAllocation::makeBOsResident(OsContext *osContext, uint32_t vmHandleId, std::vector<BufferObject *> *bufferObjects, bool bind) {                                                              
          if (this->fragmentsStorage.fragmentCount) {                                                                                                                                                      
              for (unsigned int f = 0; f < this->fragmentsStorage.fragmentCount; f++) {             
                  if (!this->fragmentsStorage.fragmentStorageData[f].residency->resident[osContext->getContextId()]) {

The fault occurs in the last line of this snippet.

the disassembly of this function looks like this:

Dump of assembler code for function _ZN3NEO13DrmAllocation15makeBOsResidentEPNS_9OsContextEjPSt6vectorIPNS_12BufferObjectESaIS5_EEb:
   0x00007fffe6d1da10 <+0>:     push   r15
   0x00007fffe6d1da12 <+2>:     push   r14
   0x00007fffe6d1da14 <+4>:     mov    r14,rcx
   0x00007fffe6d1da17 <+7>:     push   r13
   0x00007fffe6d1da19 <+9>:     mov    r13d,edx
   0x00007fffe6d1da1c <+12>:    push   r12
   0x00007fffe6d1da1e <+14>:    mov    r12,rsi
   0x00007fffe6d1da21 <+17>:    push   rbp
   0x00007fffe6d1da22 <+18>:    mov    rbp,rdi
   0x00007fffe6d1da25 <+21>:    push   rbx
   0x00007fffe6d1da26 <+22>:    mov    ebx,r8d
   0x00007fffe6d1da29 <+25>:    sub    rsp,0x8
   0x00007fffe6d1da2d <+29>:    mov    edx,DWORD PTR [rdi+0x90]
   0x00007fffe6d1da33 <+35>:    test   edx,edx
   0x00007fffe6d1da35 <+37>:    je     0x7fffe6d1db18 <_ZN3NEO13DrmAllocation15makeBOsResidentEPNS_9OsContextEjPSt6vectorIPNS_12BufferObjectESaIS5_EEb+264>
   0x00007fffe6d1da3b <+43>:    mov    eax,DWORD PTR [rsi+0x10]
   0x00007fffe6d1da3e <+46>:    mov    rsi,QWORD PTR [rdi+0x38]
   0x00007fffe6d1da42 <+50>:    mov    r15d,0x1
=> 0x00007fffe6d1da48 <+56>:    mov    rdi,QWORD PTR [rsi]

The problem is that rsi is 0, so this is a null dereference. rsi is set two instructions earlier, in

   0x00007fffe6d1da3e <+46>:    mov    rsi,QWORD PTR [rdi+0x38]

rdi in fact holds the this pointer of this function, as can be seen in frame #0 of the backtrace above (this=0x66bfa90).

The 0x38th offset of rdi translates to the address

(gdb) p $rdi+0x38
$2 = 0x66bfac8

which is inside the fragmentsStorage member variable of the NEO::DrmAllocation class as seen by the following calculation:

(gdb) p &this->fragmentsStorage
$1 = (NEO::OsHandleStorage *) 0x66bfaa8

this fragmentsStorage looks like this:

(gdb) ptype/o this->fragmentsStorage
/* offset    |  size */  type = struct NEO::OsHandleStorage {
/*    0      |   120 */    NEO::AllocationStorageData fragmentStorageData[3];

apparently the fragmentsStorage has at most 3 fragmentStorageData array items, which look like this:

(gdb) ptype/o this->fragmentsStorage.fragmentStorageData
type = struct NEO::AllocationStorageData {
/*    0      |     8 */    NEO::OsHandle *osHandleStorage;
/*    8      |     8 */    size_t fragmentSize;
/*   16      |     8 */    const void *cpuPtr;
/*   24      |     1 */    bool freeTheFragment;
/* XXX  7-byte hole  */
/*   32      |     8 */    NEO::ResidencyData *residency;

                           /* total size (bytes):   40 */
                         } [3]

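fragmentsStorage starts 0x18 bytes into the object (0x66bfaa8 - 0x66bfa90), and residency sits at offset 32 (0x20) inside each 40-byte array item, so 0x18 + 0x20 = 0x38: the value that gets loaded into rsi is the last member variable of the first array item, residency.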

we can verify this quickly:

(gdb) p &this->fragmentsStorage.fragmentStorageData[0].residency
$4 = (NEO::ResidencyData **) 0x66bfac8

which is equal to $2, calculated above by adding 0x38 to rdi.
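
The same offset arithmetic can be double-checked at compile time. The sketch below is only a mirror of the layout printed by ptype/o above: the `...Mirror` and `...Stub` names, the constants and the helper function are assumptions made for illustration, not the real NEO definitions, and it assumes a 64-bit target.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins that only mirror the field order and sizes printed
// by ptype/o; these are NOT the real NEO class definitions.
struct ResidencyDataStub;   // opaque, only used through a pointer
struct OsHandleStub;        // opaque, only used through a pointer

struct AllocationStorageDataMirror {
    OsHandleStub *osHandleStorage;   // offset  0, 8 bytes
    std::size_t fragmentSize;        // offset  8, 8 bytes
    const void *cpuPtr;              // offset 16, 8 bytes
    bool freeTheFragment;            // offset 24, 1 byte followed by a 7-byte hole
    ResidencyDataStub *residency;    // offset 32, 8 bytes
};

// Addresses copied from the gdb session: `this` and &this->fragmentsStorage.
constexpr std::uintptr_t kThis = 0x66bfa90;
constexpr std::uintptr_t kFragmentsStorage = 0x66bfaa8;
constexpr std::uintptr_t kStorageOffset = kFragmentsStorage - kThis;  // 0x18

// Offset of fragmentStorageData[0].residency relative to `this`.
constexpr std::uintptr_t residencyOffsetFromThis() {
    return kStorageOffset + offsetof(AllocationStorageDataMirror, residency);
}

static_assert(sizeof(void *) == 8, "sketch assumes a 64-bit target");
static_assert(sizeof(AllocationStorageDataMirror) == 40, "total size from ptype/o");
static_assert(residencyOffsetFromThis() == 0x38, "matches mov rsi,[rdi+0x38]");
```

If this compiles, the mirrored layout agrees with the gdb output: the load from [rdi+0x38] reads fragmentStorageData[0].residency.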

so what is the value there? surely it is 0?

(gdb) p this->fragmentsStorage.fragmentStorageData[0].residency
$5 = (NEO::ResidencyData *) 0x7fff740018a0

This address belongs to some mmapped memory region (which supposedly is shared with the gpu). Since this value was read as 0 only two instructions before the segfault, this is a strong indicator of some kind of race condition.

Interestingly, when running this process multiple times, the value there was sometimes still 0 when already in gdb, suggesting that whatever updates this memory had not yet written to it.

Sometimes running the testcases the crash happens at a different location:

Abort was called at 250 line in file:
/build/intel-compute-runtime-7vSeZ9/intel-compute-runtime-20.44.18297/shared/source/memory_manager/host_ptr_manager.cpp

which contains:

UNRECOVERABLE_IF(checkAllocationsForOverlapping(memoryManager, &requirements) == RequirementsStatus::FATAL);

apparently this function fails: https://github.com/intel/compute-runtime/blob/3015b9575251d380e24a4569566b7b8a467d6380/shared/source/memory_manager/host_ptr_manager.cpp#L261
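
For intuition only, here is what an overlap test between two host-memory ranges looks like. This is a minimal sketch, not the compute-runtime implementation (which was not examined here): `HostRange` and `rangesOverlap` are hypothetical names, and the only assumption is that "overlapping" refers to the address ranges of host pointers handed to the runtime.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not taken from compute-runtime: a host memory range
// described by its start address and its length in bytes.
struct HostRange {
    std::uintptr_t start;
    std::size_t size;
};

// Two half-open ranges [start, start + size) overlap when each one begins
// before the other one ends. Empty ranges never overlap anything.
constexpr bool rangesOverlap(HostRange a, HostRange b) {
    if (a.size == 0 || b.size == 0) {
        return false;
    }
    return a.start < b.start + b.size && b.start < a.start + a.size;
}
```

Under this definition, two pending transfers that use the same host pointer and the same size (28672 bytes in the backtrace above) always overlap.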

my opencl-fu is very limited, this is where i'd like to hand this issue over to you or the intel guys. i'll open an issue in their tracker about this.
(thanks to fabs for helping me debug this far)

@ghostop14
Owner

It's definitely interesting. I've mostly been running against NVIDIA GPUs, I haven't tested against Intel graphics sets in quite a while (I had it in a really old laptop maybe 5-7 years ago, but nothing in my current setups). 2 things do immediately come to mind:

  1. What's the size of the memory block created and is it potentially too big for that GPU? Maybe it's silently failing somewhere, hence the NULL?
  2. Could be something in the Intel OpenCL implementation related to (1). I'd think if there isn't enough memory or something, maybe it throws an exception?

In the section where the memory gets allocated (setBufferLength() for instance), you could try adding some checks that the memory creation actually succeeded even if it doesn't throw an exception. If new cl::Buffer isn't throwing an exception, maybe it's just returning a NULL if it can't allocate?

@stef
Author

stef commented Mar 8, 2021

would you be able to propose specific code to patch in, i'd be happy to run tests

@ghostop14
Owner

As a first test, just make sure aBuffer, etc. are not coming back NULL. Just something like aBuffer != NULL. What data type are you feeding it? I can glance over that kernel there one more time too.

@stef
Author

stef commented Mar 8, 2021

i feed a sine and a cosine signal source in my gnuradio-companion flowgraph into a cl-multiply and that into a null sink. i don't really do buffers at all.

@ghostop14
Owner

I was referring to the cl::buffers that get created in the cl_mathop class. Are the sine/cosines complex or float data types?

@stef
Author

stef commented Mar 8, 2021

they are all complex.

@stef
Author

stef commented Mar 8, 2021

i prepared testcases linked in my analysis under https://gist.github.com/stef/5e2d8a88f1d9d220f623a55531a9fb06

@stef
Author

stef commented Mar 19, 2021

the intel people have posted some insights, does any of this ring a bell with you? intel/compute-runtime#409 (comment)

@stef
Author

stef commented Mar 24, 2021

i created a much simpler c++ only testcase which still fails for me: https://gist.github.com/stef/e7818dc85b48c06d98a333e01f3d526f

@stef
Author

stef commented May 21, 2021

there seems to be a violation of the opencl spec according to the intel engineers reproducing the testcase: intel/compute-runtime#409 (comment)

@ghostop14
Owner

I did try to start a conversion to the newer headers and ran into issues. So there's a foundation in the code base now to switch it over. It's just not turned on. If Intel provides any feedback, let me know.

@stef
Author

stef commented May 21, 2021

i'm not sure what other feedback intel would be providing besides this:

After debugging I can see that the reason for the crash is using the same host ptr in non-blocking clEnqueueWriteBuffer calls in different threads and with different command queues.
According to the spec another write with the same host ptr can't be enqueued before previous one finishes.
Please see: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_API.html#_reading_writing_and_copying_buffer_objects "The memory pointed to by ptr cannot be reused by the application after the call returns".
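
The ordering rule quoted above can be illustrated without OpenCL. The sketch below is a toy model only: `ToyQueue`, `enqueueWrite`, `finish` and `firstValueAfterTwoWrites` are hypothetical names, not the OpenCL API, and the model assumes nothing more than that a non-blocking write reads the host memory at some point after the call has returned. In the real runtime the symptom was a crash rather than wrong data; the toy only shows why the host buffer must not be touched or reused while a write is still pending.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical toy model of a command queue, NOT the OpenCL API: an enqueued
// write only records the host pointer and copies from it later, in finish().
class ToyQueue {
public:
    // Non-blocking write: returns immediately, host memory is read later.
    void enqueueWrite(std::vector<float> &device, const float *hostPtr,
                      std::size_t count) {
        pending_.push_back([&device, hostPtr, count] {
            device.assign(hostPtr, hostPtr + count);
        });
    }

    // Runs all pending commands in order, like waiting for the queue to drain.
    void finish() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop_front();
        }
    }

private:
    std::deque<std::function<void()>> pending_;
};

// Reuses one host buffer for two writes. With waitBetween == false the
// buffer is refilled while the first write is still pending, so the first
// destination receives the second frame's data.
inline float firstValueAfterTwoWrites(bool waitBetween) {
    ToyQueue queue;
    std::vector<float> host(4, 1.0f), devA, devB;
    queue.enqueueWrite(devA, host.data(), host.size());
    if (waitBetween) {
        queue.finish();  // first transfer completed, host buffer is free again
    }
    std::fill(host.begin(), host.end(), 2.0f);
    queue.enqueueWrite(devB, host.data(), host.size());
    queue.finish();
    return devA.front();
}
```

Calling `firstValueAfterTwoWrites(true)` leaves the first destination with the first frame, while `firstValueAfterTwoWrites(false)` leaves it with the second frame's data. In OpenCL terms, the usual ways to get the "wait" behaviour are a blocking write, or waiting on the event of the pending write before the host memory is reused.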

@ghostop14
Owner

In the math op blocks, there's only one queue, there's a thread lock to ensure multiple work() calls don't happen at the same time to cause any overlap, and there's a blocking read at the end of a sequence (which is the correct way to do it). So I'm not sure what they're looking at.
