support for extended memops #59

Merged: 99 commits, May 30, 2018
69fe4e9
fixes for experimental CUDA MemOps
drossetti May 15, 2017
e8461b6
fix bug such that GDS_DISABLE_WEAK_CONSISTENCY was not effective
drossetti Jun 16, 2017
4a4a2be
bump version to 2.2
drossetti Jun 19, 2017
2e4426c
add support for WRITE_MEMORY
drossetti Jun 19, 2017
1b95132
introduce GDSCHECK
drossetti Jun 19, 2017
638dc49
use gds_prepare_write_memory to init descriptor
drossetti Jun 19, 2017
220ea6a
use gpu.h tracing funs
drossetti Jun 19, 2017
9fac000
convert most of prints into gpu_dbg()
drossetti Jun 19, 2017
9a36de6
translate CUDA_ERROR_NOT_SUPPORTED, propagate error from gds_stream_b…
drossetti Jun 19, 2017
de395b0
remove most of tools APIs, easily replaced by using gds descriptors
drossetti Jun 20, 2017
64c6104
rename GDS_DISABLE_INLINECOPY to GDS_DISABLE_WRITEMEMORY
drossetti Jun 20, 2017
9a2d701
replacing gds_stream_post_poll_dword with gds_prepare_wait_value32+gd…
drossetti Jun 20, 2017
dfeb15d
introduce -w (use_wrmem) to enable use of WRITE_MEMORY. add GDSCHECK …
drossetti Jun 20, 2017
51aa395
support global id attribute. should solve issue #10
drossetti Jun 20, 2017
f95f717
use gpu debug/error/tracing functions. refactor code to support non-p…
drossetti Jun 20, 2017
2c0539f
get rid of autogen.sh, use autoreconf instead
drossetti Jun 21, 2017
235b206
fix CU_STREAM_MEMORY_BARRIER flags
drossetti Jan 3, 2018
c5bbf41
improve error reporting
drossetti Jan 3, 2018
2388e32
refactor passing extra configure params
drossetti Jan 4, 2018
9e1731a
cleanups, passing on K40m
drossetti Jan 4, 2018
be2e8cd
increase CHUNK_SIZE to 16 in gds_sanity
drossetti Feb 15, 2018
b5efbca
use CUDATK if set
drossetti May 10, 2018
186b2b4
fix merge breakages
drossetti May 10, 2018
bbaf2e2
don't use extended memops by default
drossetti May 10, 2018
5f56b54
move CUDA <= 9.2 memop check in general configure section. add checki…
drossetti May 11, 2018
81d3196
introduce OFED and GDRCOPY env vars. pass --enable-extended-memops by…
drossetti May 11, 2018
f49eb80
introduce GDS_MEMBAR_MLX5 membar flag as an optimization. use it in g…
drossetti May 11, 2018
6afebc4
consistently deploy HAS_DECL macros. refactor weak consistency enabli…
drossetti May 11, 2018
8d1f7bb
warm gpu_stream_client as well
drossetti May 11, 2018
edce254
!peersync is not supported so message accordingly
drossetti May 11, 2018
0c2285e
add support for NVTX style profiling annotations
drossetti May 15, 2018
86ae4a8
fix bug such that nvtx was always enabled
drossetti May 16, 2018
e7e4589
introduce gds_post_descriptors
drossetti May 11, 2018
08f0ab1
fix decl of gds_post_descriptors
drossetti May 11, 2018
9b93e8c
implement missing cases in gds_post_descriptors and gds_post_ops_on_c…
drossetti May 11, 2018
a75b939
fix typo in comment
drossetti May 15, 2018
85ee32c
add gpu_warnc & gpu_warn_once
drossetti May 15, 2018
0577d24
make gpu_launch_calc_kernel_on_stream calculate grid
drossetti May 15, 2018
6c27f5a
warn once if size==0
drossetti May 16, 2018
5eb44b0
route dbg macro to gpu_dbg. introduce --skip-kernel-launch option, wh…
drossetti May 16, 2018
33bd1b9
fix help test for -P option
drossetti May 16, 2018
733d772
add support for WRITE_MEMORY
drossetti Jun 19, 2017
debd246
introduce GDSCHECK
drossetti Jun 19, 2017
8c62fb4
use gds_prepare_write_memory to init descriptor
drossetti Jun 19, 2017
90ad6f7
use gpu.h tracing funs
drossetti Jun 19, 2017
ba15238
convert most of prints into gpu_dbg()
drossetti Jun 19, 2017
2ca857d
translate CUDA_ERROR_NOT_SUPPORTED, propagate error from gds_stream_b…
drossetti Jun 19, 2017
0e407cc
remove most of tools APIs, easily replaced by using gds descriptors
drossetti Jun 20, 2017
2f23a59
rename GDS_DISABLE_INLINECOPY to GDS_DISABLE_WRITEMEMORY
drossetti Jun 20, 2017
eadd319
replacing gds_stream_post_poll_dword with gds_prepare_wait_value32+gd…
drossetti Jun 20, 2017
6000cbe
introduce -w (use_wrmem) to enable use of WRITE_MEMORY. add GDSCHECK …
drossetti Jun 20, 2017
da10534
support global id attribute. should solve issue #10
drossetti Jun 20, 2017
e94c87a
use gpu debug/error/tracing functions. refactor code to support non-p…
drossetti Jun 20, 2017
59caac4
get rid of autogen.sh, use autoreconf instead
drossetti Jun 21, 2017
f2102b4
fix CU_STREAM_MEMORY_BARRIER flags
drossetti Jan 3, 2018
05bb8ed
improve error reporting
drossetti Jan 3, 2018
bad134f
refactor passing extra configure params
drossetti Jan 4, 2018
2c99cc8
cleanups, passing on K40m
drossetti Jan 4, 2018
98933af
increase CHUNK_SIZE to 16 in gds_sanity
drossetti Feb 15, 2018
ac20a78
use CUDATK if set
drossetti May 10, 2018
50fb20a
don't use extended memops by default
drossetti May 10, 2018
e232f25
move CUDA <= 9.2 memop check in general configure section. add checki…
drossetti May 11, 2018
b4d5e7f
introduce OFED and GDRCOPY env vars. pass --enable-extended-memops by…
drossetti May 11, 2018
09055e3
introduce GDS_MEMBAR_MLX5 membar flag as an optimization. use it in g…
drossetti May 11, 2018
19753e4
consistently deploy HAS_DECL macros. refactor weak consistency enabli…
drossetti May 11, 2018
d02e5cc
warm gpu_stream_client as well
drossetti May 11, 2018
778dead
!peersync is not supported so message accordingly
drossetti May 11, 2018
63f5955
Merge pull request #63 from gpudirect/streamcb
e-ago May 17, 2018
e9726b7
Merge branch 'new_memops' into nvtx
e-ago May 17, 2018
c1470fc
Merge pull request #62 from gpudirect/nvtx
e-ago May 17, 2018
6227ee4
fix free on error in stream callback. force use_desc_apis when !peers…
drossetti May 18, 2018
95a9a30
handle stream callback errors
drossetti May 18, 2018
7be8f7b
handle stream callback errors. introduce --gpu-mem (replacing gpu_id=…
drossetti May 18, 2018
80df4c7
Merge pull request #65 from gpudirect/fixtests
e-ago May 18, 2018
fa155dd
NVTX annotate pp_post_work()
drossetti May 22, 2018
babd92c
introduce --hide-cpu-launch-latency/-L option to ease benchmarking.
drossetti May 22, 2018
7f6866c
cleanup: remove unused options
drossetti May 22, 2018
d671068
refactor gpu registration to accomodate on-demand registration happen…
drossetti May 22, 2018
2f8d3d7
improve help. actually implement -m option. fix detection of WRITE_ME…
drossetti May 22, 2018
27ac382
remove stale text
drossetti May 22, 2018
834f026
remove cudaDeviceSynchronize from NVTX macros. always flush stderr in…
drossetti May 22, 2018
78e2c84
add a comment clarifying that gds_ordinal_from_device() needs a bette…
drossetti May 22, 2018
d83b81c
add/fix options help text
drossetti May 22, 2018
f771a63
add more tracing. fix PROF usage for !peersync case and make decrese …
drossetti May 22, 2018
36a8f39
add GDS_ASSERT utility function
drossetti May 23, 2018
904bc6e
fortify gds_ordinal_from_device implementation
drossetti May 23, 2018
9240b4d
fix implementation of GDS_ASSERT()
drossetti May 24, 2018
475813b
cleanup warnings of Ubuntu
drossetti May 24, 2018
7c77f2e
abort if relaxed ordering is not consistently detected
drossetti May 24, 2018
c0965b9
fix bug when calling gds_prepare_write_memory when CU_STREAM_MEM_OP_W…
drossetti May 24, 2018
e15534b
add dbg tracing for POLL_NOR
drossetti May 24, 2018
d767910
document --gpu-mem option in help
drossetti May 25, 2018
3a8faef
improve messaging in case of errors in support_weak_consistency
drossetti May 25, 2018
23da4fa
fix more warnings on Ubuntu
drossetti May 25, 2018
929516e
remove stale code. introduce GDS_DISABLE_WAIT_NOR off by default. fix…
drossetti May 25, 2018
2fb9295
disabling WAIT NOR as a work-around for #68
drossetti May 25, 2018
c11aee7
error prints MPI rank
drossetti May 29, 2018
1489810
Merge branch 'new_memops' of https://github.com/gpudirect/libgdsync i…
drossetti May 29, 2018
4e430de
Merge branch 'master' into new_memops
e-ago May 30, 2018
11 changes: 5 additions & 6 deletions Makefile.am
@@ -8,6 +8,7 @@ AM_CPPFLAGS += -D__STDC_FORMAT_MACROS

#AM_LDFLAGS = -L$(CUDA_PATH)/lib64
LIBGDSTOOLS = @LIBGDSTOOLS@
LIBNVTX = @LIBNVTX@

lib_LTLIBRARIES = src/libgdsync.la

@@ -34,21 +35,19 @@ bin_PROGRAMS = tests/gds_kernel_latency tests/gds_poll_lat tests/gds_kernel_loop
noinst_PROGRAMS = tests/rstest

tests_gds_kernel_latency_SOURCES = tests/gds_kernel_latency.c tests/gpu_kernels.cu tests/pingpong.c tests/gpu.cpp
tests_gds_kernel_latency_LDADD = $(top_builddir)/src/libgdsync.la -lmpi $(LIBGDSTOOLS) -lgdrapi -lcuda -lcudart
tests_gds_kernel_latency_LDADD = $(top_builddir)/src/libgdsync.la -lmpi $(LIBGDSTOOLS) -lgdrapi $(LIBNVTX) -lcuda -lcudart

tests_rstest_SOURCES = tests/rstest.cpp
tests_rstest_LDADD =

#tests_gds_poll_lat_CFLAGS = -DUSE_PROF -DUSE_PERF -I/ivylogin/home/drossetti/work/p4/cuda_a/sw/dev/gpu_drv/cuda_a/drivers/gpgpu/cuda/inc
#tests_gds_poll_lat_SOURCES = tests/gds_poll_lat.c tests/gpu.cpp tests/gpu_kernels.cu tests/perfutil.c tests/perf.c
tests_gds_poll_lat_SOURCES = tests/gds_poll_lat.c tests/gpu.cpp tests/gpu_kernels.cu
tests_gds_poll_lat_LDADD = $(top_builddir)/src/libgdsync.la $(LIBGDSTOOLS) -lgdrapi -lmpi -lcuda -lcudart
tests_gds_poll_lat_LDADD = $(top_builddir)/src/libgdsync.la $(LIBGDSTOOLS) -lgdrapi -lmpi $(LIBNVTX) -lcuda -lcudart

tests_gds_sanity_SOURCES = tests/gds_sanity.c tests/gpu.cpp tests/gpu_kernels.cu
tests_gds_sanity_LDADD = $(top_builddir)/src/libgdsync.la $(LIBGDSTOOLS) -lgdrapi -lmpi -lcuda -lcudart
tests_gds_sanity_LDADD = $(top_builddir)/src/libgdsync.la $(LIBGDSTOOLS) -lgdrapi -lmpi $(LIBNVTX) -lcuda -lcudart

tests_gds_kernel_loopback_latency_SOURCES = tests/gds_kernel_loopback_latency.c tests/pingpong.c tests/gpu.cpp tests/gpu_kernels.cu
tests_gds_kernel_loopback_latency_LDADD = $(top_builddir)/src/libgdsync.la $(LIBGDSTOOLS) -lgdrapi -lcuda -lcudart
tests_gds_kernel_loopback_latency_LDADD = $(top_builddir)/src/libgdsync.la $(LIBGDSTOOLS) -lgdrapi $(LIBNVTX) -lcuda -lcudart


SUFFIXES= .cu
5 changes: 4 additions & 1 deletion README.md
@@ -88,7 +88,10 @@ This prototype has been tested on RHEL 6.x and Ubuntu 16.04
## Build

Git repository does not include autotools files. The first time the directory
must be configured by running autogen.sh
must be configured by running:
```shell
$ autoreconf -if
```

As an example, the build.sh script is provided. You should modify it
according to the desired destination paths as well as the location
7 changes: 0 additions & 7 deletions autogen.sh

This file was deleted.

44 changes: 36 additions & 8 deletions build.sh
@@ -2,27 +2,55 @@

[ ! -d config ] && mkdir -p config

[ ! -e configure ] && ./autogen.sh
[ ! -e configure ] && autoreconf -fv -i

[ ! -d build ] && mkdir build

cd build
echo "PREFIX=$PREFIX"
echo "CUDADRV=$CUDADRV"
echo "CUDATK=$CUDATK"
echo "CUDA=$CUDA"
echo "MPI_HOME=$MPI_HOME"

if [ ! -e Makefile ]; then
echo "configuring..."
WITHCUDADRV=
EXTRA=
if [ "x$CUDADRV" != "x" ]; then
WITHCUDADRV="--with-cuda-driver=${CUDADRV}"
EXTRA+=" --with-cuda-driver=${CUDADRV}"
fi
if [ "x$CUDATK" != "x" ]; then
EXTRA+=" --with-cuda-toolkit=$CUDATK"
elif [ "x$CUDA" != "x" ]; then
EXTRA+=" --with-cuda-toolkit=$CUDA"
else
echo "ERROR: CUDA toolkit path not passed"
exit
fi
if [ "x$OFED" != "x" ]; then
echo "picking OFED libibverbs from $OFED"
EXTRA+=" --with-libibverbs=$OFED"
else
echo "WARNING: assuming IB Verbs is installed in /usr"
EXTRA+=" --with-libibverbs=/usr"
fi

if [ "x$GDRCOPY" != "x" ]; then
EXTRA+=" --with-gdrcopy=$GDRCOPY"
else
echo "WARNING: assuming GDRcopy is installed in /usr"
EXTRA+=" --with-gdrcopy=/usr"
fi

EXTRA+=" --enable-test"
EXTRA+=" --enable-extended-memops"
#EXTRA+=" --enable-nvtx"
#EXTRA="$EXTRA --with-gdstools=$PREFIX"

../configure \
--prefix=$PREFIX \
--with-libibverbs=$PREFIX \
$WITHCUDADRV \
--with-cuda-toolkit=$CUDA \
--with-gdrcopy=$PREFIX \
--with-mpi=$MPI_HOME \
--enable-test
$EXTRA

fi

48 changes: 40 additions & 8 deletions configure.ac
@@ -27,7 +27,7 @@ AM_CONDITIONAL(TEST_ENABLE, test x$enable_test = xyes)
AC_ARG_ENABLE(
[extended-memops],
[AC_HELP_STRING([--enable-extended-memops],
[Enable support for CUDA 9.0 MemOps (default=no)])],
[Enable support for CUDA 10.0 MemOps (default=no)])],
[enable_ext_memops=$enableval],
[enable_ext_memops=no])
AM_CONDITIONAL(EXT_MEMOPS, test x$enable_ext_memops = xyes)
@@ -106,12 +106,30 @@ AC_ARG_WITH(cuda-driver,
)

dnl Specify GPU Arch
AC_ARG_ENABLE(gpu-arch,
AC_HELP_STRING([--enable-gpu-arch=arch], [ Set GPU arch: sm_20, sm_21, sm_30, sm_35, sm_50, sm_52 (default: sm_35)]),
[ gpu_arch=${enableval} ],
[ gpu_arch="sm_35" ]
AC_ARG_WITH(
[gpu-arch],
AC_HELP_STRING([--with-gpu-arch=arch],
[ Set GPU arch: sm_30, sm_35, sm_50, sm_52, sm_60, sm_70 (default: sm_35)]),
[ gpu_arch=${withval} ],
[ gpu_arch="sm_35" ]
)

AC_ARG_ENABLE(
[nvtx],
[AC_HELP_STRING([--enable-nvtx],
[Use NVTX profiling extensions (default=no)])],
[enable_nvtx=$enableval],
[enable_nvtx=no])
if test x$enable_nvtx = x || test x$enable_nvtx = xno; then
want_nvtx=no
LIBNVTX=
else
want_nvtx=yes
CPPFLAGS="$CPPFLAGS -DUSE_NVTX"
LIBNVTX=-lnvToolsExt
AC_MSG_NOTICE([Enabling use of NVTX])
AC_SUBST(LIBNVTX)
fi

dnl Checks for programs
AC_PROG_CC
@@ -169,11 +187,25 @@ dnl Checks for CUDA >= 8.0
AC_CHECK_LIB(cuda, cuStreamBatchMemOp, [],
AC_MSG_ERROR([cuStreamBatchMemOp() not found. libgdsync requires CUDA 8.0 or later.]))

dnl Checks for CUDA >= 9.0
AC_CHECK_DECLS([CU_STREAM_MEM_OP_WRITE_VALUE_64], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_STREAM_MEM_OP_WAIT_VALUE_64], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_STREAM_WAIT_VALUE_NOR], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_DEVICE_ATTRIBUTE_CAN_USE_64_BIT_STREAM_MEM_OPS], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_WAIT_VALUE_NOR], [], [], [[#include <cuda.h>]])

dnl Checks for CUDA >= 9.2
AC_CHECK_DECLS([CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_DEVICE_ATTRIBUTE_CAN_FLUSH_REMOTE_WRITES], [], [], [[#include <cuda.h>]])

if test x$enable_ext_memops = xyes; then
AC_CHECK_DECLS([CU_STREAM_MEM_OP_INLINE_COPY], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_STREAM_MEM_OP_WRITE_MEMORY], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_STREAM_MEM_OP_MEMORY_BARRIER], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_STREAM_MEM_OP_WRITE_VALUE_64], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_STREAM_BATCH_MEM_OP_CONSISTENCY_WEAK], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_STREAM_BATCH_MEM_OP_RELAXED_ORDERING], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_BATCH_MEMOP_RELAXED_ORDERING], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_WRITE_MEMORY], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEMORY_BARRIER], [], [], [[#include <cuda.h>]])
AC_CHECK_DECLS([CU_DEVICE_ATTRIBUTE_MAXIMUM_STREAM_WRITE_MEMORY_SIZE], [], [], [[#include <cuda.h>]])
fi

AC_CONFIG_FILES([Makefile libgdsync.spec])
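
A note on how these checks surface in C code. `AC_CHECK_DECLS` defines `HAVE_DECL_<SYMBOL>` to 1 when the declaration is found and to 0 when it is not, so guards should use `#if` rather than `#ifdef`. The checks inside the `enable_ext_memops` branch are skipped entirely when the option is off, in which case the macro is undefined and `#if` still evaluates it as 0. The sketch below hard-codes the macro in place of `config.h`; `demo_has_write_memory` is a hypothetical helper for illustration, not a libgdsync function.

```c
#include <assert.h>

/* Stand-in for config.h. In a real build this value comes from
 * AC_CHECK_DECLS([CU_STREAM_MEM_OP_WRITE_MEMORY], ...); here it is
 * hard-coded to 0 purely for the demonstration. */
#ifndef HAVE_DECL_CU_STREAM_MEM_OP_WRITE_MEMORY
#define HAVE_DECL_CU_STREAM_MEM_OP_WRITE_MEMORY 0
#endif

/* The macro is always 0 or 1 once configure has run the check. */
static_assert(HAVE_DECL_CU_STREAM_MEM_OP_WRITE_MEMORY == 0 ||
              HAVE_DECL_CU_STREAM_MEM_OP_WRITE_MEMORY == 1,
              "HAVE_DECL_* macros are 0 or 1");

/* Hypothetical helper: reports whether the WRITE_MEMORY code path
 * would be compiled in. Note the use of #if, not #ifdef. */
static inline int demo_has_write_memory(void)
{
#if HAVE_DECL_CU_STREAM_MEM_OP_WRITE_MEMORY
    return 1;
#else
    return 0;
#endif
}
```

Using `#ifdef` here would take the "available" branch even when configure recorded the symbol as missing, because the macro is still defined (to 0).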
64 changes: 58 additions & 6 deletions include/gdsync/core.h
@@ -33,7 +33,7 @@
#endif

#define GDS_API_MAJOR_VERSION 2U
#define GDS_API_MINOR_VERSION 1U
#define GDS_API_MINOR_VERSION 2U
#define GDS_API_VERSION ((GDS_API_MAJOR_VERSION << 16) | GDS_API_MINOR_VERSION)
#define GDS_API_VERSION_COMPATIBLE(v) \
( ((((v) & 0xffff0000U) >> 16) == GDS_API_MAJOR_VERSION) && \
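
For reference, the version word packs the major number into the upper 16 bits and the minor number into the lower 16, so the bump to 2.2 yields `0x00020002`. The sketch below copies the three macros visible in this hunk; the `demo_version_*` helpers are hypothetical and only illustrate the unpacking. The minor-version clause of `GDS_API_VERSION_COMPATIBLE` is cut off in this view, so it is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the hunk above (after the bump to 2.2). */
#define GDS_API_MAJOR_VERSION 2U
#define GDS_API_MINOR_VERSION 2U
#define GDS_API_VERSION ((GDS_API_MAJOR_VERSION << 16) | GDS_API_MINOR_VERSION)

/* Hypothetical helpers (not part of libgdsync): unpack a version word. */
static inline uint32_t demo_version_major(uint32_t v)
{
    return (v & 0xffff0000U) >> 16;
}

static inline uint32_t demo_version_minor(uint32_t v)
{
    return v & 0x0000ffffU;
}

/* Compile-time check of the packed value for 2.2. */
static_assert(GDS_API_VERSION == 0x00020002U, "2.2 packs to 0x00020002");
```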
@@ -120,6 +120,7 @@ typedef enum gds_memory_type {
GDS_MEMORY_MASK = 0x7
} gds_memory_type_t;

// Note: those flags below must not overlap with gds_memory_type_t
typedef enum gds_wait_flags {
GDS_WAIT_POST_FLUSH = 1<<3,
} gds_wait_flags_t;
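
The new comment states an invariant: operation flags share one `int` with the memory type, so they must stay above the bits covered by `GDS_MEMORY_MASK`. A minimal sketch of how that can be checked at compile time, using only the two values visible in this hunk. The `demo_*` helpers and the memory-type value `0x1` used in the usage note are placeholders, not taken from the header.

```c
#include <assert.h>

/* Values copied from the diff. */
enum { GDS_MEMORY_MASK = 0x7 };
enum { GDS_WAIT_POST_FLUSH = 1 << 3 };

/* Compile-time check of the invariant stated in the new comment. */
static_assert((GDS_WAIT_POST_FLUSH & GDS_MEMORY_MASK) == 0,
              "wait flags must not overlap the memory-type bits");

/* Hypothetical helpers: split a combined flags word into its two parts. */
static inline int demo_memory_type(int flags)
{
    return flags & GDS_MEMORY_MASK;
}

static inline int demo_op_flags(int flags)
{
    return flags & ~GDS_MEMORY_MASK;
}
```

With a placeholder memory type of `0x1`, `demo_memory_type(0x1 | GDS_WAIT_POST_FLUSH)` gives back `0x1` and `demo_op_flags` gives back `GDS_WAIT_POST_FLUSH`.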
@@ -128,14 +129,15 @@ typedef enum gds_write_flags {
GDS_WRITE_PRE_BARRIER = 1<<4,
} gds_write_flags_t;

typedef enum gds_immcopy_flags {
GDS_IMMCOPY_POST_TAIL_FLUSH = 1<<4,
} gds_immcopy_flags_t;
typedef enum gds_write_memory_flags {
GDS_WRITE_MEMORY_POST_BARRIER_SYS = 1<<4, /*< add a trailing memory barrier to the memory write operation */
} gds_write_memory_flags_t;

typedef enum gds_membar_flags {
GDS_MEMBAR_FLUSH_REMOTE = 1<<4,
GDS_MEMBAR_DEFAULT = 1<<5,
GDS_MEMBAR_SYS = 1<<6,
GDS_MEMBAR_MLX5 = 1<<7,
} gds_membar_flags_t;

enum {
@@ -244,7 +246,32 @@ int gds_prepare_write_value32(gds_write_value32_t *desc, uint32_t *ptr, uint32_t



typedef enum gds_tag { GDS_TAG_SEND, GDS_TAG_WAIT, GDS_TAG_WAIT_VALUE32, GDS_TAG_WRITE_VALUE32 } gds_tag_t;
/**
* Represents a staged copy operation
* the src buffer can be reused after the API call
*/

typedef struct gds_write_memory {
uint8_t *dest;
const uint8_t *src;
size_t count;
int flags; // takes gds_memory_type_t | gds_write_memory_flags_t
} gds_write_memory_t;

/**
* flags: gds_memory_type_t | gds_write_memory_flags_t
*/
int gds_prepare_write_memory(gds_write_memory_t *desc, uint8_t *dest, const uint8_t *src, size_t count, int flags);



typedef enum gds_tag {
GDS_TAG_SEND,
GDS_TAG_WAIT,
GDS_TAG_WAIT_VALUE32,
GDS_TAG_WRITE_VALUE32,
GDS_TAG_WRITE_MEMORY
} gds_tag_t;

typedef struct gds_descriptor {
gds_tag_t tag; /**< selector for union below */
@@ -253,14 +280,39 @@ typedef struct gds_descriptor {
gds_wait_request_t *wait;
gds_wait_value32_t wait32;
gds_write_value32_t write32;
gds_write_memory_t writemem;
};
} gds_descriptor_t;

/**
* flags: must be 0
* \brief: post descriptors for peer QPs synchronized to the specified CUDA stream
*
* \param flags - must be 0
*
* \return
* 0 on success or one standard errno error
*
*/
int gds_stream_post_descriptors(CUstream stream, size_t n_descs, gds_descriptor_t *descs, int flags);

/**
* \brief: CPU-synchronous post descriptors for peer QPs
*
*
* \param flags - must be 0
*
* \return
* 0 on success or one standard errno error
*
*
* Notes:
* - This API might have higher overhead than issuing multiple ibv_post_send.
* - It is provided for convenience only.
* - It might fail if trying to access CUDA device memory pointers
*/
int gds_post_descriptors(size_t n_descs, gds_descriptor_t *descs, int flags);
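
To make the intended call pattern concrete, here is a rough sketch of filling a write-memory descriptor before handing it to one of the post functions. It uses a trimmed local mirror of the types in this diff (the real `gds_descriptor_t` also carries the send, wait, wait32 and write32 members), and `demo_prepare_write_memory` is a hypothetical stand-in that only records its arguments. The real `gds_prepare_write_memory` may validate or translate them differently, and the `EINVAL` return on NULL pointers is an assumption. The staging behavior described in the header comment (the `src` buffer being reusable after the API call) is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of gds_write_memory_t as shown in the diff. */
typedef struct demo_write_memory {
    uint8_t *dest;
    const uint8_t *src;
    size_t count;
    int flags; /* memory type | write-memory flags */
} demo_write_memory_t;

/* Trimmed mirror of gds_tag_t / gds_descriptor_t: one member only. */
typedef enum demo_tag { DEMO_TAG_WRITE_MEMORY } demo_tag_t;

typedef struct demo_descriptor {
    demo_tag_t tag; /* selector for the union below */
    union {
        demo_write_memory_t writemem;
    };
} demo_descriptor_t;

/* Hypothetical stand-in for gds_prepare_write_memory(): stores the
 * arguments and returns 0, or EINVAL on a NULL pointer (assumed). */
static inline int demo_prepare_write_memory(demo_write_memory_t *desc,
                                            uint8_t *dest,
                                            const uint8_t *src,
                                            size_t count, int flags)
{
    if (!desc || !dest || !src)
        return EINVAL;
    desc->dest = dest;
    desc->src = src;
    desc->count = count;
    desc->flags = flags;
    return 0;
}
```

A caller would set `tag` to the write-memory tag, prepare the `writemem` member, and then pass an array of such descriptors with `flags` set to 0, as the doc comments above require.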


/*
* Local variables:
* c-indent-level: 8
11 changes: 0 additions & 11 deletions include/gdsync/tools.h
@@ -46,15 +46,4 @@ typedef struct gds_mem_desc {
int gds_alloc_mapped_memory(gds_mem_desc_t *desc, size_t size, int flags);
int gds_free_mapped_memory(gds_mem_desc_t *desc);

// flags is combination of gds_memory_type and gds_poll_flags
int gds_stream_post_poll_dword(CUstream stream, uint32_t *ptr, uint32_t magic, gds_wait_cond_flag_t cond_flag, int flags);
int gds_stream_post_poke_dword(CUstream stream, uint32_t *ptr, uint32_t value, int flags);
int gds_stream_post_inline_copy(CUstream stream, void *ptr, void *src, size_t nbytes, int flags);
int gds_stream_post_polls_and_pokes(CUstream stream,
size_t n_polls, uint32_t *ptrs[], uint32_t magics[], gds_wait_cond_flag_t cond_flags[], int poll_flags[],
size_t n_pokes, uint32_t *poke_ptrs[], uint32_t poke_values[], int poke_flags[]);
int gds_stream_post_polls_and_immediate_copies(CUstream stream,
size_t n_polls, uint32_t *ptrs[], uint32_t magics[], gds_wait_cond_flag_t cond_flags[], int poll_flags[],
size_t n_imms, void *imm_ptrs[], void *imm_datas[], size_t imm_bytes[], int imm_flags[]);

GDS_END_DECLS