forked from NVIDIA/grcuda
Commit
Grcuda 61 cleanup for release 1 (#19)
* adding changelog, removed unused thread manager
* fixed install.sh, now using env variables to retrieve the absolute path to grcuda.jar
* added curl to install script
* added license to demo
* removed unused cuda code; added license to benchmarks
* added license to tests
* added license to functions and libraries
* fixed removal of thread manager breaking build
* added more updated licenses
* added license to runtime files
* Updated changelog
* fixed typo
* added grcuda-data info to readme
* udpated tracking of grcuda-data
* temporarily removed submodule grcuda-data
* readded grcuda-data submodule
* tracking master?
* updated grcuda-data tracking
* Added the possibility to send execution times to the frontend
* Display execution times in race mode
* clarified streamattach in changelog

Co-authored-by: Guido Walter Di Donato <[email protected]>
Co-authored-by: Francesco Sgherzi <[email protected]>
1 parent 311ea3e, commit d63678d
Showing 265 changed files with 5,117 additions and 3,286 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
[submodule "grcuda-data"] | ||
path = grcuda-data | ||
url = https://github.com/AlbertoParravicini/grcuda-data.git | ||
url = https://github.com/AlbertoParravicini/grcuda-data.git | ||
branch = master |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
# 2021-09-30, Release 1 | ||
|
||
## API Changes | ||
|
||
* Added option to specify arguments in NFI kernel signatures as `const` | ||
* The effect is the same as marking them as `in` in the NIDL syntax | ||
* It is not strictly required to have the corresponding arguments in the CUDA kernel marked as `const`, although that's recommended | ||
* Marking arguments as `const` or `in` enables the async scheduler to overlap kernels that use the same read-only arguments | ||
|
||
## New asynchronous scheduler | ||
|
||
* Added a new asynchronous scheduler for GrCUDA; enable it with `--experimental-options --grcuda.ExecutionPolicy=async` | ||
* With this scheduler, GPU kernels are executed asynchronously. Once they are launched, the host execution resumes immediately | ||
* The computation is synchronized (i.e. the host thread is stalled and waits for the kernel to finish) only once GPU data are accessed by the host thread | ||
* Execution of multiple kernels (operating on different data, e.g. distinct DeviceArrays) is overlapped using different streams | ||
* Data transfer and execution (on different data, e.g. distinct DeviceArrays) are overlapped using different streams | ||
* The scheduler supports different options, see `README.md` for the full list | ||
* It is the scheduler presented in "DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime" (IPDPS 2021) | ||
|
||
* Enabled partial support for cuBLAS and cuML in the async scheduler | ||
* **Known limitation:** functions in these libraries work with the async scheduler, although they still run on the default stream (i.e. they are not asynchronous) | ||
* They do benefit from prefetching | ||
* Set TensorRT support to experimental | ||
* TensorRT is currently not supported on CUDA 11.4, making it impossible to use alongside a recent version of cuML | ||
* **Known limitation:** due to this incompatibility, TensorRT is currently not available on the async scheduler | ||
|
||
## New features | ||
|
||
* Added generic AbstractArray data structure, which is extended by DeviceArray, MultiDimDeviceArray, MultiDimDeviceArrayView, and provides high-level array interfaces | ||
* Added API for prefetching | ||
* If enabled (and using a GPU with the Pascal architecture or newer), it prefetches data to the GPU before executing a kernel, instead of relying on page faults for data transfer. It can greatly improve performance | ||
* Added API for stream attachment | ||
* Always enabled on GPUs with an architecture older than Pascal when the async scheduler is active. With the sync scheduler, it can be enabled manually | ||
* It restricts the visibility of GPU data to the specified stream | ||
* On the Pascal architecture or newer, it can provide a small performance benefit | ||
* Added `copyTo/copyFrom` functions on generic arrays (Truffle interoperable objects that expose the array API) | ||
* Internally, the copy is implemented as a for loop, instead of using CUDA's `memcpy` | ||
* It is still faster than copying using loops in the host languages, in many cases, and especially if host code is not JIT-ted | ||
* It is also used for copying data to/from DeviceArrays with column-major layout, as `memcpy` cannot copy non-contiguous data | ||
|
||
## Demos, benchmarks and code samples | ||
|
||
* Added demo used at SeptembeRSE 2021 (`demos/image_pipeline_local` and `demos/image_pipeline_web`) | ||
* It shows an image processing pipeline that applies a retro look to images. We have a local version and a web version that displays results in a web page | ||
* Added benchmark suite written in Graalpython, used in "DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime" (IPDPS 2021) | ||
* It is a collection of complex multi-kernel benchmarks meant to show the benefits of asynchronous scheduling. | ||
|
||
## Miscellaneous | ||
|
||
* Added dependency on the `grcuda-data` submodule, used to store data, results and plots used in publications and demos. | ||
* Updated name "grCUDA" to "GrCUDA". It looks better, doesn't it? | ||
* Added support for Java 11 along with Java 8 | ||
* Added option to specify the location of cuBLAS and cuML with environment variables (`LIBCUBLAS_DIR` and `LIBCUML_DIR`) | ||
* Refactored package hierarchy to reflect changes to current GrCUDA (e.g. `gpu -> runtime`) | ||
* Added basic support for TruffleLogger | ||
* Removed a number of existing deprecation warnings | ||
* Added around 800 unit tests, with support for extensive parametrized testing and GPU mocking | ||
* Updated documentation | ||
* Bumped GraalVM version to 21.2 | ||
* Added scripts to set up a new machine from scratch (e.g. on OCI), plus other OCI-specific utility scripts (see `oci_setup/`) | ||
* Added documentation on setting up IntelliJ IDEA for GrCUDA development | ||
* Added documentation about Python benchmark suite | ||
* Added documentation on asynchronous scheduler options |
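
The changelog states that marking kernel arguments as `const` or `in` lets the async scheduler overlap kernels that share read-only arguments. The sketch below is a simplified model of that idea in plain Python, not GrCUDA's actual implementation: the `Launch` class and the `conflicts` and `dependencies` helpers are hypothetical names introduced here for illustration. It assumes two launches must be ordered only when they share an array that at least one of them may write.

```python
from dataclasses import dataclass


@dataclass
class Launch:
    """One kernel launch (hypothetical model, not a GrCUDA class).

    `args` maps an array name to True if the argument is read-only
    (marked `const` or `in`), or False if the kernel may write to it.
    """
    name: str
    args: dict


def conflicts(a, b):
    """Two launches conflict if they share an array that either may write."""
    shared = a.args.keys() & b.args.keys()
    return any(not (a.args[x] and b.args[x]) for x in shared)


def dependencies(launches):
    """For each launch, list the earlier launches it has to wait for."""
    deps = {}
    for i, cur in enumerate(launches):
        deps[cur.name] = [p.name for p in launches[:i] if conflicts(p, cur)]
    return deps


# x is marked read-only in k1 and k2, so they are independent of each other.
marked = [
    Launch("k1", {"x": True, "y": False}),               # reads x, writes y
    Launch("k2", {"x": True, "z": False}),               # reads x, writes z
    Launch("k3", {"y": True, "z": True, "out": False}),  # reads y and z
]
deps_marked = dependencies(marked)

# Same launches, but x is not marked read-only: k2 now has to wait for k1.
unmarked = [
    Launch("k1", {"x": False, "y": False}),
    Launch("k2", {"x": False, "z": False}),
    Launch("k3", {"y": True, "z": True, "out": False}),
]
deps_unmarked = dependencies(unmarked)
```

In this model, launches with no mutual dependency (here `k1` and `k2` in the marked case) are the ones a scheduler could place on different streams, while `k3` waits for both because it reads arrays they write. A real scheduler would also track data accessed by the host and prune redundant edges, which this sketch leaves out.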