In this benchmark, we deploy containerized GPU applications on a cluster node. On the same node, we submit LULESH and MILC as traditional batch jobs.
This benchmark has been tested and evaluated on the Piz Daint system with SLURM and node sharing enabled.
- Build the extended rFaaS with remote memory functions by running `build.sh` in `src/remote_memory`. To configure rFaaS dependencies, please inspect the README of the package. It should be sufficient to set the `PKG_CONFIG_PATH` environment variable to point to the installation directories of `pistache`, `rdmacm`, and `ibverbs`.
- Build LULESH from sources in `external/LULESH` (build scripts are provided).
- Build MILC from sources in `external/MILC` (build scripts are provided).
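
Setting `PKG_CONFIG_PATH` for the rFaaS build can be sketched as follows. This is a minimal example under stated assumptions: the helper `pkgconfig_path_for` is hypothetical, the install prefixes are placeholders, and the `.pc` files are assumed to live in `<prefix>/lib/pkgconfig` (some installations use `lib64` instead).

```shell
#!/bin/sh
# Sketch: assemble PKG_CONFIG_PATH from dependency install prefixes.
# The helper name and the prefixes below are illustrative assumptions.

# Print a colon-separated list of <prefix>/lib/pkgconfig directories,
# one entry per prefix given as an argument.
pkgconfig_path_for() (
  joined=""
  for prefix in "$@"; do
    joined="${joined:+${joined}:}${prefix}/lib/pkgconfig"
  done
  printf '%s\n' "${joined}"
)

# Placeholder prefixes (assumed locations, not taken from the README).
deps_path="$(pkgconfig_path_for "${HOME}/opt/pistache" "${HOME}/opt/rdma-core")"

# Prepend to any existing value so system-wide packages remain visible.
export PKG_CONFIG_PATH="${deps_path}${PKG_CONFIG_PATH:+:${PKG_CONFIG_PATH}}"
echo "PKG_CONFIG_PATH=${PKG_CONFIG_PATH}"
```

With the variable exported in the same shell, run `build.sh` in `src/remote_memory` as described above.
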
- SSH to the node where you will deploy the co-located batch job and the remote memory function.
- Verify that Docker is running correctly on the system.
- Create a copy of `devices.json` and change the IP address in it to that of the network interface used for communication.
- Start the rFaaS executor with `<rfaas-build-dir>/bin/executor_manager --config exec_manager.json --device-database devices_copy.json --skip-resource-manager`.
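
The copy-and-edit step for `devices.json` can be sketched as a plain text substitution. The helper `copy_with_new_ip` is hypothetical and assumes nothing about the JSON layout beyond the old address appearing literally in the file; editing the copy by hand works just as well.

```shell
#!/bin/sh
# Sketch: copy a device database while replacing one IP address with another.
# Plain text substitution: no assumption is made about the JSON schema.
# Caveat: an address that is a prefix of another one in the file
# (for example 10.0.0.1 inside 10.0.0.10) would also be rewritten.

copy_with_new_ip() (
  src="$1"; dst="$2"; old_ip="$3"; new_ip="$4"
  # Escape the dots so that sed matches them literally.
  pattern="$(printf '%s' "${old_ip}" | sed 's/\./\\./g')"
  sed "s/${pattern}/${new_ip}/g" "${src}" > "${dst}"
)

# Run only when four arguments are supplied:
#   sh copy_ip.sh devices.json devices_copy.json <old-ip> <new-ip>
if [ "$#" -eq 4 ]; then
  copy_with_new_ip "$1" "$2" "$3" "$4"
fi
```

The script name `copy_ip.sh` in the usage comment is a placeholder. The resulting `devices_copy.json` is the file passed to `--device-database`.
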
- SSH to the second node.
- Create a copy of `devices.json` and change the IP address in it to that of the network interface used for communication.
- In `exec_db.json`, change the executor address to the address of the rFaaS executor from the previous part.
- Start the rFaaS benchmark with `<rfaas-build-dir>/benchmarks/rma --config benchmark.json --device-database devices_copy.json --name empty --functions examples/libfunctions2.so --executors-database exec_db.json -s 1024 --read_size <size> --rdma_type <rdma-type> --pause <pause> --rma_address <executor-ip-address>`.
- To reproduce the results from the paper, vary `<size>`, `<pause>`, and `<rdma-type>` to issue read and write operations of different sizes, launched with different pauses between consecutive invocations.
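
A parameter sweep over these three options can be sketched as a dry run that only prints the commands. The values in `SIZES`, `PAUSES`, and `TYPES` are placeholders chosen for illustration, not the values used in the paper, and the spellings `read` and `write` for `--rdma_type` are an assumption to check against the benchmark itself.

```shell
#!/bin/sh
# Sketch: print one benchmark command per parameter combination (dry run).
# All parameter values below are illustrative placeholders.

RFAAS_BUILD_DIR="${RFAAS_BUILD_DIR:-<rfaas-build-dir>}"
EXECUTOR_IP="${EXECUTOR_IP:-<executor-ip-address>}"

SIZES="1024 4096"     # values for --read_size (assumed)
PAUSES="0 100"        # values for --pause (assumed)
TYPES="read write"    # values for --rdma_type (assumed spellings)

print_rma_commands() (
  for type in ${TYPES}; do
    for size in ${SIZES}; do
      for pause in ${PAUSES}; do
        printf '%s\n' "${RFAAS_BUILD_DIR}/benchmarks/rma --config benchmark.json --device-database devices_copy.json --name empty --functions examples/libfunctions2.so --executors-database exec_db.json -s 1024 --read_size ${size} --rdma_type ${type} --pause ${pause} --rma_address ${EXECUTOR_IP}"
      done
    done
  done
)

# Print the commands; review them, then run each one by hand or pipe to sh.
print_rma_commands
```

Printing instead of executing keeps the sketch safe to run anywhere. Set `RFAAS_BUILD_DIR` and `EXECUTOR_IP` in the environment to replace the placeholders before using the output.
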
- In `slurm_lulesh_27.sh`, change `nodelist` to use the node hosting the co-located rFaaS executor and LULESH.
- Launch the co-located LULESH job with `sbatch < slurm_lulesh_27.sh`.
- Stop the client benchmark.
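
The `nodelist` change can be sketched with `sed`, assuming the script selects the node through an `#SBATCH --nodelist` directive (written with either `=` or a space before the value). The helper `set_nodelist` is hypothetical, and how `slurm_lulesh_27.sh` actually sets the node should be checked first.

```shell
#!/bin/sh
# Sketch: rewrite the "#SBATCH --nodelist" directive of a batch script.
# Assumes the directive is present on a line of its own; the modified
# script is written to stdout and the input file is left unchanged.

set_nodelist() (
  script="$1"; node="$2"
  sed "s/^\(#SBATCH[[:space:]]\{1,\}--nodelist\)[=[:space:]].*/\1=${node}/" "${script}"
)

# Run only when two arguments are supplied:
#   sh set_nodelist.sh slurm_lulesh_27.sh <node-name> | sbatch
if [ "$#" -eq 2 ]; then
  set_nodelist "$1" "$2"
fi
```

The script name `set_nodelist.sh` in the usage comment is a placeholder. Piping the output to `sbatch` works because `sbatch` reads the batch script from standard input when no file name is given, which is the same mechanism as `sbatch < slurm_lulesh_27.sh`.
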