The instructions below assume that the source code is placed under the $HOME directory.
cd $HOME
git clone --recursive
The following modules can be used on ECRC systems:
mkl/2018-update-1 gcc/5.5.0 cmake/3.17.3 openmpi/3.0.0-gcc-5.5.0
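As a minimal sketch, the modules listed above could be loaded as follows. It assumes a bash-like shell and that the cluster provides the usual `module` command (Environment Modules or Lmod); on other systems the module names will differ and the fallback branch simply prints a reminder.

```shell
# Module names are copied from the list above; availability depends on the system.
MODULES="mkl/2018-update-1 gcc/5.5.0 cmake/3.17.3 openmpi/3.0.0-gcc-5.5.0"

if command -v module >/dev/null 2>&1; then
    for m in $MODULES; do
        module load "$m"
    done
    module list          # show what is loaded now
else
    echo "module command not found; install equivalent packages manually"
fi
```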
DPLASMA (which includes PaRSEC as a submodule), HCORE, and STARS-H are required. These libraries are provided as submodules of this repository, so use these submodules for installation.
git submodule update --init --recursive
can be used to get the submodules.
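To confirm that the submodules were actually fetched, one option is to inspect `git submodule status`, which prefixes each submodule that is not yet initialized with a `-`. The helper below is a hypothetical convenience function, not part of the repository, and the checkout path `$HOME/hicma-x-dev` is the one used in the installation steps of this document.

```shell
# Print the paths of submodules that `git submodule status` marks with a
# leading "-" (not initialized). Reads the status output on stdin.
list_uninitialized() {
    awk '/^-/ { print $2 }'
}

# Only query git if the checkout exists at the assumed location.
if [ -e "$HOME/hicma-x-dev/.git" ]; then
    (cd "$HOME/hicma-x-dev" && git submodule status --recursive | list_uninitialized)
fi
```

An empty result means every submodule is initialized; any path printed here still needs `git submodule update --init --recursive`.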
HCORE and STARS-H are installed manually, as described in the following subsections. DPLASMA and HiCMA are installed together with a single command, as described in the next section.
A sample installation of HCORE:
cd $HOME/hicma-x-dev
cd hcore && mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=`pwd`/installdir
make -j install
export PKG_CONFIG_PATH=$HOME/hicma-x-dev/hcore/build/installdir/lib/pkgconfig:$PKG_CONFIG_PATH
STARS-H can be installed by following its own installation instructions. Make sure to export PKG_CONFIG_PATH afterwards.
A sample installation of STARS-H:
cd $HOME/hicma-x-dev
cd stars-h && mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=`pwd`/installdir
make -j install
export PKG_CONFIG_PATH=$HOME/hicma-x-dev/stars-h/build/installdir/lib/pkgconfig:$PKG_CONFIG_PATH
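Since the HiCMA configuration step relies on PKG_CONFIG_PATH, it can help to verify both install directories before moving on. The following is a sketch with a hypothetical helper, `check_pc_dir`; it only checks that a directory is listed in PKG_CONFIG_PATH and contains at least one `.pc` file, without assuming the package names. Some systems install into `lib64` instead of `lib`, in which case the paths need adjusting.

```shell
# check_pc_dir DIR: succeed if DIR is on PKG_CONFIG_PATH and holds .pc files.
check_pc_dir() {
    dir=$1
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$dir:"*) echo "on PKG_CONFIG_PATH: $dir" ;;
        *)          echo "NOT on PKG_CONFIG_PATH: $dir"; return 1 ;;
    esac
    if ls "$dir"/*.pc >/dev/null 2>&1; then
        echo "pkg-config files found in $dir"
    else
        echo "no .pc files in $dir"
        return 1
    fi
}

# Paths are the ones exported in the HCORE and STARS-H steps.
check_pc_dir "$HOME/hicma-x-dev/hcore/build/installdir/lib/pkgconfig" || true
check_pc_dir "$HOME/hicma-x-dev/stars-h/build/installdir/lib/pkgconfig" || true
```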
Submodules must be updated via git submodule update --init --recursive
cd $HOME/hicma-x-dev
mkdir -p build && cd build
cmake ..
If the Intel compiler is used, also add -DCMAKE_Fortran_FLAGS="-nofor-main" to the cmake command.
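One way to apply the Intel-specific flag only when needed is sketched below. The helper `fortran_flag_for` is hypothetical, and the detection by compiler name (`ifort`, `ifx`) and via the `FC` environment variable is an assumption; compiler wrappers on a given cluster may need to be added to the list. The script prints the resulting configure command instead of running it.

```shell
# fortran_flag_for COMPILER: print the extra CMake option for Intel Fortran,
# or nothing for other compilers.
fortran_flag_for() {
    case "$(basename "$1")" in
        ifort|ifx) echo '-DCMAKE_Fortran_FLAGS=-nofor-main' ;;
        *)         echo '' ;;
    esac
}

# FC is assumed to name the Fortran compiler; gfortran is used if it is unset.
EXTRA=$(fortran_flag_for "${FC:-gfortran}")
echo "cmake .. $EXTRA"
```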
A sample configuration for Shaheen II:
Run make to compile. If the following command fails, try removing -j 8:
make -j 8
Go to the HiCMA folder:
cd hicma_parsec
Running examples:
mpirun -np 4 -npernode 1 ./testing_dpotrf_tlr -N 2700 -t 270 -e 1e-8 -u 130 -D 2 -P 2 -v
With numerical correctness checking enabled:
mpirun -n 4 -npernode 1 ./testing_dpotrf_tlr -N 2700 -t 270 -e 1e-8 -u 130 -D 2 -P 2 -v --check
mpirun -np 4 --npernode 1 ./testing_dpotrf_tlr -N 108000 -t 2700 -e 1e-8 -u 1200 -D 4 -P 2 -v -- -mca runtime_comm_coll_bcast 0
3-flow version:
mpirun -n 4 ./testing_dpotrf_tlr -N 2700 -t 270 -e 1e-8 -u 130 -D 2 -P 2 -v -E 0 -Z 1 --check
-N: matrix size; required
-t: tile size; required
-e: accuracy threshold; default: 1.0e-8
-u: maxrank threshold for compressed tiles; default: tile_size/2
-P: row process grid; default: number_of_nodes
-D: kind of problem; default: 2
-v: print more info
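As a small worked example of the options above, the sketch below derives the documented default for -u (tile_size/2) from the tile size and assembles a command line like the first running example. The tiles-per-dimension figure assumes square tiles of size -t, which this document does not state explicitly. The command is printed rather than executed.

```shell
N=2700      # -N: matrix size
T=270       # -t: tile size

MAXRANK=$((T / 2))            # documented default for -u: tile_size/2
NT=$(( (N + T - 1) / T ))     # tiles per dimension, assuming square T-by-T tiles

echo "tiles per dimension: $NT, default maxrank: $MAXRANK"
echo "mpirun -np 4 -npernode 1 ./testing_dpotrf_tlr -N $N -t $T -e 1e-8 -u $MAXRANK -D 2 -P 2 -v"
```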
More information:
./testing_dpotrf_tlr --help
Additional PaRSEC flags:
./testing_dpotrf_tlr -- --help
(1) If the problem is somewhat dense, i.e., band_size > 1 after auto-tuning (e.g., in the statistics-3d-sqexp application with accuracy threshold -e 1.0e-8), add "-- -mca runtime_comm_coll_bcast 0" for better performance;
(2) Set argument -c to number_of_cores - 1;
(3) Choose the process grid to be as square as possible with P < Q;
(4) In most cases:
for -D 2 (statistics-2d-sqexp), set maxrank = 150;
for -D 3 (statistics-3d-sqexp), set maxrank = 500;
for -D 4 (statistics-3d-exp), set maxrank = tile_size / 2.
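Tips (2) to (4) can be turned into a small helper script, sketched below. The function names are hypothetical, and `getconf _NPROCESSORS_ONLN` is assumed to report the core count of a compute node. For a square number of processes the grid helper returns P = Q, as in the 4-process examples with -P 2.

```shell
# grid_rows NP: largest P that divides NP with P*P <= NP, so P <= Q = NP/P.
grid_rows() {
    np=$1; p=1; i=1
    while [ $((i * i)) -le "$np" ]; do
        if [ $((np % i)) -eq 0 ]; then p=$i; fi
        i=$((i + 1))
    done
    echo "$p"
}

# maxrank_for KIND TILE: maxrank suggested in tip (4) for problem kind -D.
maxrank_for() {
    case "$1" in
        2) echo 150 ;;
        3) echo 500 ;;
        *) echo $(( $2 / 2 )) ;;   # -D 4, and the documented default otherwise
    esac
}

NP=4; KIND=4; TILE=2700
CORES=$(getconf _NPROCESSORS_ONLN)
echo "-c $((CORES - 1)) -P $(grid_rows "$NP") -D $KIND -t $TILE -u $(maxrank_for "$KIND" "$TILE")"
```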
Q. Cao, Y. Pei, T. Herault, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. E. Keyes, and J. Dongarra, Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools, 2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools), Denver, CO, USA, 2019, pp. 25-32.
Q. Cao, Y. Pei, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. E. Keyes, and J. Dongarra, Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications, The Platform for Advanced Scientific Computing (PASC 2020).
Q. Cao, Y. Pei, K. Akbudak, G. Bosilca, H. Ltaief, D. E. Keyes, and J. Dongarra, Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems, IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021).