Benchmark suggestions requested #3672
-
I got AMReX built and ran a few of the tests on one or two nodes of a cluster of Cascade Lake processors. I also ran the one under Tests/Amr/Advection_AmrLevel/Exec/UniformVelocity on up to 32 nodes of the same cluster, increasing amr.n_cell to 512x512x512. My interest is in examining the impact of the interconnection network HCA's link speed and width on the performance of AMReX. What would be a good benchmark to run for that purpose?

Saludos,
Gerardo
Senior Engineer, Networking HPC Applications Performance (at NVIDIA)
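(A rough sketch of how such a run can be set up; the build options, launcher, rank count, and executable name below are illustrative and depend on the local toolchain and GNUmakefile settings.)

```sh
# Sketch only: make options, launcher, rank count, and the executable name
# (main3d.gnu.MPI.ex here) vary with the local setup.
cd Tests/Amr/Advection_AmrLevel/Exec/UniformVelocity
make -j8 DIM=3 USE_MPI=TRUE

# Edit the test's inputs file so that
#   amr.n_cell = 512 512 512
# then launch across the nodes under test:
mpiexec -n <nranks> ./main3d.gnu.MPI.ex <inputs-file>
```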
-
I think https://github.com/AMReX-Codes/amrex/tree/development/Tests/LinearSolvers/ABecLaplacian_C could be a good test. The multigrid solver is often the bottleneck of many AMReX applications, and at large scales the communication cost starts to dominate. You can use the setup in https://github.com/AMReX-Codes/amrex/tree/development/Tests/LinearSolvers/ABecLaplacian_C/scalingtest. 256^3 cells per GPU is probably a good target for weak-scaling runs. (For example, n_cell=256 for 1 GPU and n_cell=512 for 8 GPUs.)
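A possible weak-scaling sequence along those lines is sketched below. The executable and inputs file names are placeholders for whatever the scalingtest setup provides; n_cell is passed as a command-line override, since AMReX ParmParse accepts key=value arguments after the inputs file.

```sh
# Weak scaling at ~256^3 cells per GPU (one MPI rank per GPU is typical).
# Executable and inputs file names are placeholders for the scalingtest setup.
mpiexec -n 1 ./<executable> <inputs-file> n_cell=256   # 1 GPU
mpiexec -n 8 ./<executable> <inputs-file> n_cell=512   # 8 GPUs
```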
-
Weiqun, thanks again. I rebuilt with TINY_PROFILE=TRUE in the GNUmakefile. Is it correct to assume the time of interest is the one reported for the LinearSolver region? For instance, on 32 nodes:

| Name | NCalls | Incl. Min | Incl. Avg | Incl. Max | Max % |
|------|--------|-----------|-----------|-----------|-------|
| REG::LinearSolver | 1 | 16.47 | 16.5 | 16.53 | 98.36% |

Saludos,
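(For completeness, the rebuild step looks roughly like the sketch below; any other GNUmakefile options should match the original build. The TinyProfiler summary, including the REG::LinearSolver region, is printed to stdout at the end of the run.)

```sh
# Sketch: rebuild with the tiny profiler enabled; other make options
# (compiler, MPI, dimension) should match the original build.
make clean
make -j8 TINY_PROFILE=TRUE USE_MPI=TRUE
```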
-
To use GPU-aware MPI, run with amrex.use_gpu_aware_mpi=1.
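Since this is a runtime ParmParse parameter, it can be set in the inputs file or appended to the run command line, for example as sketched below (executable and inputs file names are placeholders):

```sh
# Sketch: append the runtime flag to the command line (or put
# amrex.use_gpu_aware_mpi = 1 in the inputs file).
mpiexec -n 8 ./<executable> <inputs-file> n_cell=512 amrex.use_gpu_aware_mpi=1
```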