SYCL implementation #44
base: master
Look for HAVE_SYCL and SYCL_EXTERNAL for most of the changes. This works on GPU and HOST but not CPU (OpenCL issue).
Signed-off-by: Jeff Hammond <[email protected]>
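For readers unfamiliar with the two macros named in the commit message, here is a minimal sketch of how such a guard is commonly written. It is not taken from the diff: `QS_DEVICE_FN` and `scale_weight` are hypothetical names, and the header `<sycl/sycl.hpp>` is an assumption (the PR may use a different include). `SYCL_EXTERNAL` itself is the standard SYCL macro that lets a function defined in another translation unit be called from device code.

```cpp
// Assumption: HAVE_SYCL is defined by the build system only for SYCL builds.
#ifdef HAVE_SYCL
#include <sycl/sycl.hpp>
// Expose functions defined in other translation units to SYCL kernels.
#define QS_DEVICE_FN SYCL_EXTERNAL
#else
// Non-SYCL builds: the annotation expands to nothing.
#define QS_DEVICE_FN
#endif

// Hypothetical helper; the annotation goes on the first declaration.
QS_DEVICE_FN double scale_weight(double weight, double factor);

// The definition is ordinary C++ and compiles with or without SYCL.
double scale_weight(double weight, double factor) {
    return weight * factor;
}
```

With a pattern like this, searching the tree for the two macro names finds every place where the SYCL build differs from the others.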
I am also leaving in a bit of debug code that I think is useful until we figure out the right way to control SYCL device dispatch. Because SYCL is a pluripotent back-end, it isn't obvious how to do this. Intel DPC++ allows one to set the default device with an environment variable, but I wanted to push control into QS for debugging purposes.
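To make the idea of application-level device control concrete, here is a rough sketch of one way it could look. This is not the debug code in the PR: the `DeviceKind` enum, both function names, and the environment variable `QS_SYCL_DEVICE` are hypothetical. The returned value would then be mapped to whichever SYCL device selector the application wants; that mapping is left out so the sketch compiles without a SYCL toolchain.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical device kinds, mirroring the targets discussed in this PR.
enum class DeviceKind { Default, Host, Cpu, Gpu };

// Map a name such as "gpu" to a DeviceKind.
// Unknown names fall back to Default rather than failing.
inline DeviceKind parse_device_kind(const std::string& name) {
    if (name == "host") return DeviceKind::Host;
    if (name == "cpu")  return DeviceKind::Cpu;
    if (name == "gpu")  return DeviceKind::Gpu;
    return DeviceKind::Default;
}

// Read the (hypothetical) QS_SYCL_DEVICE environment variable.
// An unset variable means "let the runtime pick its default device".
inline DeviceKind device_kind_from_env(const char* var = "QS_SYCL_DEVICE") {
    const char* value = std::getenv(var);
    return value ? parse_device_kind(value) : DeviceKind::Default;
}
```

Keeping the parsing separate from queue construction means the same switch can be driven from a command-line flag later, if that turns out to be the preferred control mechanism.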
I have verified that the SYCL implementation also runs correctly on NVIDIA (Pascal). There is a compiler bug with
Compiler
Hardware
Host execution
NVIDIA execution
CUDA execution
For reference, here is the execution with CUDA 11.0 and the
This is the SYCL/DPC++ port. It currently depends on two features that are not widely available:
- USM (unified shared memory), which serves the same purpose as cudaMallocManaged. It is part of SYCL 2020 but only Intel DPC++ on Intel back-ends supports it today. Codeplay ComputeCpp has started implementing it but I don't think it is finished yet and I didn't test it.
- sycl::intel::experimental::printf, which is, as one might expect, an Intel extension to support printf. There is an alternative extension in Codeplay ComputeCpp but I didn't bother with that because of the previous issue.

The following output appears to be identical to that of GCC OpenMP, but please let me know what other verification I need to do.
Today, the Intel DPC++ implementation is working with the host and Gen9 GPU devices, but not the CPU device because of an Intel OpenCL issue that is known and in the process of being fixed.
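For context on the USM dependency described above, here is a minimal sketch of the pattern, assuming a SYCL 2020 compiler and the `<sycl/sycl.hpp>` header. It is not code from this PR: the function name is hypothetical, and `HAVE_SYCL` is reused here only as an illustrative guard. A single pointer from `sycl::malloc_shared` is written in a kernel and then read on the host, much like memory from `cudaMallocManaged`. Without `HAVE_SYCL` the same function falls back to plain `malloc` so it still builds.

```cpp
#include <cstddef>
#include <cstdlib>
#ifdef HAVE_SYCL
#include <sycl/sycl.hpp>
#endif

// Hypothetical example: fill a buffer with 2*i and return the sum.
// Returns -1.0 if the allocation fails.
inline double usm_sum_of_doubled_indices(std::size_t n) {
#ifdef HAVE_SYCL
    sycl::queue q;  // runtime's default device
    // Shared allocation: the same pointer is valid on host and device.
    double* data = sycl::malloc_shared<double>(n, q);
    if (!data) return -1.0;
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        data[i] = 2.0 * static_cast<double>(i[0]);
    }).wait();  // finish the kernel before the host reads
#else
    // Fallback without SYCL: ordinary host memory and a serial loop.
    double* data = static_cast<double*>(std::malloc(n * sizeof(double)));
    if (!data) return -1.0;
    for (std::size_t i = 0; i < n; ++i) data[i] = 2.0 * static_cast<double>(i);
#endif
    // The host reads through the same pointer; no explicit copy is needed.
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
#ifdef HAVE_SYCL
    sycl::free(data, q);
#else
    std::free(data);
#endif
    return sum;
}
```

The absence of explicit host/device copies is what makes USM a close match for code that was originally written against managed memory in CUDA.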