GPUDirect Async

Introduction

GPUDirect Async is all about moving control logic from third-party devices to the GPU.

LibGDSync implements GPUDirect Async support on InfiniBand Verbs by bridging the gap between the CUDA and the Verbs APIs. It consists of a set of low-level APIs that closely resemble IB Verbs but operate on CUDA streams.

Requirements

CUDA

A recent CUDA Toolkit, minimally 8.0, because of the CUDA driver MemOP APIs.
A recent display driver, i.e. r361, r367 or later, is required.
Explicitly enable GPU peer mappings

GPUDirect Async depends on the ability to create GPU peer mappings of the HCA BAR space.

GPU peer mappings are mappings (in the sense of cuMemHostRegister) to the PCI Express resource space of a third-party device. That feature is normally disabled due to potential security problems, see https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-5053. In fact, unless the PeerMappingOverride registry key of the NVIDIA kernel-mode driver is enabled, only the root user can use that feature.

To enable GPU peer mappings for all users, the PeerMappingOverride registry must be set to 1:

$ cat /etc/modprobe.d/nvidia.conf
options nvidia NVreg_RegistryDwords="PeerMappingOverride=1;"

If the display driver is r387 or newer, the CUDA Memory Operations API must be explicitly enabled by means of NVreg_EnableStreamMemOPs=1:

$ cat /etc/modprobe.d/nvidia.conf
options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"
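The choice between the two option lines above depends only on the driver branch, so it can be scripted. The following is a minimal sketch, not part of LibGDSync; the function name `nvidia_modprobe_options` is hypothetical, and it assumes the driver version string starts with the branch number (for example `387.26`).

```shell
# Print the modprobe options line matching a given NVIDIA driver version.
# Assumption: the version string starts with the branch number, e.g. "384.81".
nvidia_modprobe_options() {
    branch=${1%%.*}    # "387.26" -> "387"
    opts='NVreg_RegistryDwords="PeerMappingOverride=1;"'
    if [ "$branch" -ge 387 ]; then
        # r387 and newer: the Memory Operations API must be enabled explicitly
        opts="NVreg_EnableStreamMemOPs=1 $opts"
    fi
    printf 'options nvidia %s\n' "$opts"
}

# Sample driver versions, one on each side of the r387 threshold
nvidia_modprobe_options 384.81
nvidia_modprobe_options 387.26
```

On a machine with the driver installed, the version argument could be taken from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, and the printed line written to /etc/modprobe.d/nvidia.conf.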

After that, either reboot or manually reload the NVIDIA kernel module:

# unload all kernel modules which depend on nvidia.ko
$ service gdrcopy stop
$ service nv_peer_mem stop
$ modprobe -r nvidia_uvm
$ modprobe -r nvidia
$ modprobe nvidia
...
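After the module has been reloaded, it can be useful to confirm that the options were actually picked up. The sketch below assumes that the driver exposes its current parameters in /proc/driver/nvidia/params, with lines of the form `EnableStreamMemOPs: 1` and `RegistryDwords: "PeerMappingOverride=1;"`; the function name `check_nvidia_params` is hypothetical.

```shell
# Report whether the two driver options are active.
# Assumption: the file uses "Name: value" lines, as /proc/driver/nvidia/params
# is expected to.
check_nvidia_params() {
    params=${1:-/proc/driver/nvidia/params}
    if grep -q '^RegistryDwords:.*PeerMappingOverride=1' "$params"; then
        echo "PeerMappingOverride: enabled"
    else
        echo "PeerMappingOverride: NOT enabled"
    fi
    if grep -q '^EnableStreamMemOPs: *1$' "$params"; then
        echo "EnableStreamMemOPs: enabled"
    else
        echo "EnableStreamMemOPs: NOT enabled"
    fi
}

# Only run the check when the NVIDIA driver is loaded on this machine
if [ -r /proc/driver/nvidia/params ]; then
    check_nvidia_params
fi
```

On drivers older than r387, the EnableStreamMemOPs line is not required, so a "NOT enabled" result for it is not necessarily an error there.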

InfiniBand

Mellanox OFED (MOFED) 4.2 or newer is required, because of the peer-direct verbs extensions. As an alternative, it is possible to use MOFED 3.4 and replace the stock libmlx5 with the one at https://github.com/gpudirect/libmlx5/tree/fixes.

Peer-direct verbs are only supported on the libmlx5 low-level plug-in module, so either Connect-IB or ConnectX-4 HCAs are required.

The Mellanox OFED GPUDirect RDMA kernel module is required to allow the HCA to access the GPU memory.
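A quick way to check that the required kernel modules are loaded is to look them up in /proc/modules. This is a sketch under assumptions: the function name `missing_modules` is hypothetical, `nv_peer_mem` is assumed to be the GPUDirect RDMA module (matching the service name used earlier), and `mlx5_ib` is assumed to be the kernel-side mlx5 Verbs driver.

```shell
# Print each requested module that is absent from a /proc/modules-style
# listing (the module name is the first field of every line).
missing_modules() {
    listing=$1
    shift
    for mod in "$@"; do
        awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$listing" \
            || echo "$mod"
    done
}

# Module names below are assumptions, adjust them to your installation
if [ -r /proc/modules ]; then
    missing_modules /proc/modules nv_peer_mem mlx5_ib
fi
```

No output means that every listed module was found.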

Caveats

Tests have been run on Mellanox Connect-IB. Any HCA driven by the mlx5 driver should work.

Kepler or newer Tesla/Quadro GPUs are required because of GPUDirect RDMA.

A special HCA firmware setting is currently necessary in combination with GPUs prior to Pascal. Use mlxconfig to set the NON_PREFETCHABLE_PF_BAR parameter on your HCA to 1. For more information see the Mellanox Firmware Tools (MFT) User Manual.
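The firmware setting above could be applied along these lines. This is a hedged sketch: the function name `set_non_prefetchable_bar` and the `RUN` switch are hypothetical, and the device path is only an example of what `mst status` might report on a Connect-IB system. By default the function only prints the commands, so they can be reviewed before being run as root.

```shell
# Print (or, with RUN=1, execute) the mlxconfig commands for one HCA.
# Assumption: $1 is an MST device path as listed by `mst status`.
set_non_prefetchable_bar() {
    dev=$1
    for args in "query" "set NON_PREFETCHABLE_PF_BAR=1"; do
        if [ "${RUN:-0}" = 1 ]; then
            # $args is intentionally unquoted so it splits into separate words
            mlxconfig -d "$dev" $args
        else
            echo "mlxconfig -d $dev $args"
        fi
    done
}

# Example device path, replace it with the one reported on your system
set_non_prefetchable_bar /dev/mst/mt4113_pciconf0
```

Firmware configuration changes made with mlxconfig normally take effect only after a reboot or a firmware reset.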

Additional libraries

The GDRCopy library is required to create CPU-side user-space mappings of GPU memory, currently used when allocating verbs objects on GPU memory.

Platforms

This prototype has been tested on RHEL 6.x and Ubuntu 16.04.

Build

The Git repository does not include the generated autotools files. After the first checkout, the source directory must be configured by running:

$ autoreconf -if

If that fails complaining about AX_CHECK_COMPILE_FLAG, you will need to install a library of extra autoconf macros, for example:

$ yum install autoconf-archive
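To find out beforehand whether the macro is available, you can look for its file in the aclocal macro directory. The sketch below assumes that autoconf-archive installs `ax_check_compile_flag.m4` into the directory printed by `aclocal --print-ac-dir`, which is the usual layout for distribution packages; the function name `has_ax_check_compile_flag` is hypothetical.

```shell
# Succeed if the AX_CHECK_COMPILE_FLAG macro file exists in the given
# aclocal macro directory.
has_ax_check_compile_flag() {
    [ -f "$1/ax_check_compile_flag.m4" ]
}

# aclocal --print-ac-dir prints the system-wide third-party macro directory
if command -v aclocal >/dev/null 2>&1; then
    acdir=$(aclocal --print-ac-dir)
    if has_ax_check_compile_flag "$acdir"; then
        echo "AX_CHECK_COMPILE_FLAG found in $acdir"
    else
        echo "AX_CHECK_COMPILE_FLAG missing: install autoconf-archive"
    fi
fi
```

On Ubuntu the package carries the same name and can be installed with `apt-get install autoconf-archive`.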

As an example, the build.sh script is provided. You should modify it according to the desired destination paths as well as the location of the dependencies.

Before building LibGDSync, make sure the GDRCopy libraries and headers are available on your system.
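A small pre-build check for GDRCopy might look like the following. It is a sketch under assumptions: GDRCopy is assumed to install `gdrapi.h` under `include/` and `libgdrapi.so` under `lib/` or `lib64/` of its prefix, and both the function name `have_gdrcopy` and the `GDRCOPY_PREFIX` variable are hypothetical, not something build.sh is known to read.

```shell
# Succeed if PREFIX contains the GDRCopy header and shared library.
# Assumption: header in include/, library in lib/ or lib64/.
have_gdrcopy() {
    prefix=$1
    [ -f "$prefix/include/gdrapi.h" ] || return 1
    [ -e "$prefix/lib/libgdrapi.so" ] || [ -e "$prefix/lib64/libgdrapi.so" ]
}

# Hypothetical variable; defaults to /usr/local when unset
GDRCOPY_PREFIX=${GDRCOPY_PREFIX:-/usr/local}
if have_gdrcopy "$GDRCOPY_PREFIX"; then
    echo "GDRCopy found under $GDRCOPY_PREFIX"
else
    echo "GDRCopy not found under $GDRCOPY_PREFIX"
fi
```

If the check fails, point the prefix at your GDRCopy installation before adapting the dependency paths in build.sh.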

LibMP

LibMP is a lightweight messaging library built on top of LibGDSync APIs, developed as a technology demonstrator to easily deploy the GPUDirect Async technology in applications.

GPUDirect Async suite

We created a separate repository that collects all the components of the GPUDirect Async technology in a single project. It contains several scripts for configuring, building and running all the GPUDirect Async libraries, tests, benchmarks and examples.

Acknowledging GPUDirect Async

If you find this software useful in your work, please cite:

"GPUDirect Async: exploring GPU synchronous communication techniques for InfiniBand clusters", E. Agostini, D. Rossetti, S. Potluri. Journal of Parallel and Distributed Computing, Vol. 114, Pages 28-45, April 2018

"Offloading communication control logic in GPU accelerated applications", E. Agostini, D. Rossetti, S. Potluri. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid '17), IEEE Conference Publications, Pages 248-257, Nov 2016
