Contributing a SYCL version #127

Open: wants to merge 1 commit into base: main

mgrabban

Thank you for this nice repo! We at Intel use it as one of our GPU benchmark workloads. We now would like to contribute the SYCL version here.

These are the changes that were made:

  • We took the CUDA version of bitcracker and converted it to SYCL using Intel's DPC++ Compatibility Tool (DPCT) available here.
  • NOTE 1: This version bypasses use of FAISS by running input images through an offline Python version of FAISS and using its output as input to this SYCL version. So this is more suitable for hardware and framework (SYCL, CUDA, HIP) benchmarking.
  • NOTE 2: This version also does not use fft from MKL. Instead it uses a manually implemented fft. For apples-to-apples comparison, we do have a corresponding (modified) CUDA version available here in Velocity-Bench. I am happy to add that CUDA version here, if that will be useful.
  • The SYCL code runs on Intel GPUs, CPUs, NVIDIA GPUs and AMD GPUs. The README has detailed instructions for building and running on these machines.

The README.md for this version has detailed build and run instructions for different machines.

Any feedback is welcome.

@DavidMChan
Member

Thanks for this contribution! It's exciting to see that this project was useful for you! Do you have a link to some benchmarking results for the code on different devices? This is an interesting direction for the code base to allow support for multiple different underlying architectures, but I do wonder what the performance tradeoffs might be. I also wonder how hard it might be to integrate this into the existing project and delivery pipelines -- it may take some time to go through this code, and see what the added build implications are for our system.

@mgrabban
Author

The code should give good performance and one can also run the different versions available in Velocity-Bench on different devices to compare performance. But we can't share performance numbers due to some restrictions. This link has performance of some of our other workloads.
In terms of integration, as mentioned before, FAISS has been completely bypassed and a simple, manually implemented FFT is used (in all versions). Beyond these changes, the code structure remains more or less the same.

@mgrabban
Author

We understand integration might take time but we are happy with keeping the SYCL version as a separate branch in this repo. Thanks.
