`libdying` combines 1) a failure emulator and 2) an implementation of corrected broadcast algorithms that can use either Gossip or trees for dissemination. The library is designed to interpose MPI calls via the PMPI interface; calls to `MPI_Bcast` are then replaced by a call to the library's own broadcast.
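For illustration, here is a minimal sketch of how such PMPI interposition typically looks; the internal function `corrt_bcast` is a made-up name standing in for the library's corrected broadcast, not its actual API:

```c
#include <mpi.h>

/* Hypothetical internal entry point; the real library would run its corrected
 * Gossip/tree broadcast here. This placeholder simply forwards to PMPI_Bcast. */
static int corrt_bcast(void *buf, int count, MPI_Datatype type, int root, MPI_Comm comm)
{
    return PMPI_Bcast(buf, count, type, root, comm);
}

/* Because the library is preloaded, its MPI_Bcast shadows the one provided by
 * the MPI library, so the application's call is redirected without recompiling. */
int MPI_Bcast(void *buf, int count, MPI_Datatype type, int root, MPI_Comm comm)
{
    return corrt_bcast(buf, count, type, root, comm);
}
```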
"Dead nodes" are emulated by not calling the broadcast function for them. Thus they do not participate in the algorithm at all. All other (non-fault-aware) MPI functions are called normally by all ranks. Not doing so would result in a deadlock as the MPI library is not able to handle faults and would wait indefinitely for the missing ranks.
Building requires:

- CMake >= 3.1
- Python
- a compiler with C99 support
- an MPI library
```sh
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug
make
```
The library is designed to be preloaded into an MPI application via the `LD_PRELOAD` mechanism. It is controlled by several environment variables:
- `CORRT_DIST` — correction distance
- `CORRT_COUNT_MAX` — maximum supported message size
- `CORRT_DISS_TYPE` — dissemination type
  - `gossip` — Gossip, based on hops instead of time (see paper)
  - `tree_binomial` — interleaved binomial tree
  - `tree_lame` — interleaved Lamé tree
  - `tree_binomial_in_order` — non-interleaved binomial tree
- `CORRT_GOSSIP_SEED` — random seed used to determine Gossip partners (used only with Gossip)
- `CORRT_GOSSIP_ROUNDS` — number of hops a message travels in Gossip (used only with Gossip)
- `TREE_LAME_K` — order of the Lamé tree (used only with the Lamé tree)
- `DYING_LIST` — comma-separated list of ranks that "should die"
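The string values of `CORRT_DISS_TYPE` suggest a simple lookup at startup. The following is only a sketch with made-up enum and helper names, not the library's actual code:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative mapping from CORRT_DISS_TYPE to an internal dissemination type. */
enum diss_type {
    DISS_GOSSIP,
    DISS_TREE_BINOMIAL,
    DISS_TREE_LAME,
    DISS_TREE_BINOMIAL_IN_ORDER
};

static enum diss_type diss_type_from_env(void)
{
    const char *s = getenv("CORRT_DISS_TYPE");
    if (s == NULL || strcmp(s, "gossip") == 0)     return DISS_GOSSIP;
    if (strcmp(s, "tree_binomial") == 0)           return DISS_TREE_BINOMIAL;
    if (strcmp(s, "tree_lame") == 0)               return DISS_TREE_LAME;
    if (strcmp(s, "tree_binomial_in_order") == 0)  return DISS_TREE_BINOMIAL_IN_ORDER;
    return DISS_GOSSIP;  /* fallback for unknown values (assumption) */
}
```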
Please note that the broadcast implementation is currently considered a prototype only. It overwrites the first byte of the payload data to store an epoch number so that different broadcasts can be told apart. For Gossip, a second byte is used for the current Gossip step. A production-grade implementation would store these values as part of the message header.
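To make that caveat concrete, here is a sketch of the in-band tagging described above (assumed layout, not the prototype's actual code):

```c
/* Prototype trick: the epoch is stamped into payload byte 0 and, for Gossip,
 * the current step into byte 1, overwriting the user's data at those positions. */
static void tag_payload(unsigned char *buf, unsigned char epoch,
                        int is_gossip, unsigned char step)
{
    buf[0] = epoch;      /* lets receivers tell different broadcasts apart */
    if (is_gossip)
        buf[1] = step;   /* Gossip additionally records its current step  */
}
```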
Example invocation with the OSU broadcast benchmark:

```sh
srun --export=DYING_LIST='',CORRT_DIST=2,CORRT_COUNT_MAX=256,CORRT_DISS_TYPE=tree_lame,TREE_LAME_K=2,LD_PRELOAD=./build/libdying.so -n 48 ./osu-micro-benchmarks-5.4.1/mpi/collective/osu_bcast -m 256 -f -i 1000
```