GrCUDA MultiGPU Pre-release

Pre-release

gwdidonato released this 15 Apr 20:31

grcuda-0.4-beta.0

d26261d

New features

Enabled support for multiple GPUs in the asynchronous scheduler:
- Added the GrCUDADeviceManager component that encapsulates the status of the multi-GPU system. It tracks the currently active GPUs, the streams and the currently active computations associated with each GPU, and what data is up-to-date on each device.
- Added the GrCUDAStreamPolicy component that encapsulates new scheduling heuristics to select the best device for each new computation (each CUDA stream is associated with exactly one GPU), using information such as data locality and the current load of the device. Five scheduling heuristics of increasing complexity are currently supported:
  - ROUND_ROBIN: simply rotate the scheduling between GPUs. Used as the initialization strategy of other policies;
  - STREAM_AWARE: assign the computation to the device with the fewest busy streams, i.e. select the device with the fewest ongoing computations;
  - MIN_TRANSFER_SIZE: select the device that requires the fewest bytes to be transferred, maximizing data locality;
  - MINMIN_TRANSFER_TIME: select the device whose minimum total transfer time is the lowest;
  - MINMAX_TRANSFER_TIME: select the device whose maximum total transfer time is the lowest.
- Modified the GrCUDAStreamManager component to select streams using the heuristics provided by the policy manager.
- Extended the CUDARuntime component with APIs for selecting and managing multiple GPUs.
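
To make the device-selection heuristics above more concrete, here is a minimal Python sketch of the first three policies. It is an illustration only and not GrCUDA's implementation: the `Device` class, the function names, and the `sizes` dictionary are all hypothetical, and the per-device state is reduced to a busy-stream count and a set of up-to-date array names. The two transfer-time policies are left out because they depend on interconnect bandwidth information that these notes do not describe.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Device:
    """Hypothetical snapshot of one GPU (not a GrCUDA class)."""
    device_id: int
    busy_streams: int = 0
    # Names of the arrays whose data is already up to date on this GPU.
    up_to_date: set = field(default_factory=set)


def make_round_robin():
    """ROUND_ROBIN: rotate over the devices, ignoring their state."""
    counter = count()

    def select(devices, args, sizes):
        return devices[next(counter) % len(devices)]

    return select


def stream_aware(devices, args, sizes):
    """STREAM_AWARE: pick the device with the fewest busy streams."""
    return min(devices, key=lambda d: d.busy_streams)


def bytes_to_transfer(device, args, sizes):
    """Total size of the arguments that are not up to date on the device."""
    return sum(sizes[a] for a in args if a not in device.up_to_date)


def min_transfer_size(devices, args, sizes):
    """MIN_TRANSFER_SIZE: pick the device that needs the fewest bytes moved."""
    return min(devices, key=lambda d: bytes_to_transfer(d, args, sizes))


# Two GPUs: device 0 is busier but already holds most of the data.
devices = [
    Device(0, busy_streams=2, up_to_date={"x", "y"}),
    Device(1, busy_streams=0),
]
sizes = {"x": 4096, "y": 4096, "z": 1024}  # array sizes in bytes
args = ["x", "y", "z"]                     # arguments of the new computation

round_robin = make_round_robin()
rr_choices = [round_robin(devices, args, sizes).device_id for _ in range(3)]
sa_choice = stream_aware(devices, args, sizes).device_id
mts_choice = min_transfer_size(devices, args, sizes).device_id

print(rr_choices)  # [0, 1, 0]
print(sa_choice)   # 1
print(mts_choice)  # 0
```

In this toy setup the load-based and locality-based policies disagree: STREAM_AWARE prefers the idle device 1, while MIN_TRANSFER_SIZE prefers device 0 because only `z` (1024 bytes) would need to be moved there.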
