This case study uses E3SM-IO to evaluate the performance of the HDF5 log-layout based VOL against methods built on other I/O libraries. E3SM-IO is an I/O benchmark suite that measures the performance of the I/O kernel of E3SM, a state-of-the-science Earth system modeling, simulation, and prediction project. The benchmark replays the I/O patterns of E3SM production runs, as captured by Scorpio, E3SM's I/O module.
- Prerequisites
  - HDF5 1.13.0, required by any HDF5 VOL connector
  - HDF5 log-layout based VOL version 1.3.0
- Clone E3SM-IO from its GitHub repository:
git clone https://github.com/Parallel-NetCDF/E3SM-IO.git
- Configure E3SM-IO with the HDF5 and log-layout based VOL features enabled.
Full configuration options are available in E3SM-IO's INSTALL.md.
cd E3SM-IO
autoreconf -i
./configure --with-hdf5=${HOME}/hdf5/1.13.0 --with-logvol=${HOME}/log_based_vol/1.3.0
- Compile and link:
make -j 64
The executable src/e3sm_io will be created.
- Run with HDF5 log-layout based VOL as the I/O method (a minimal sketch of selecting this VOL from application code is shown after this list)
mpiexec -np 16 src/e3sm_io -a hdf5_log -x log -k -o ${HOME}/e3sm_io_log datasets/f_case_866x72_16p.nc
- Run with HDF5 native VOL as the I/O method
mpiexec -np 16 src/e3sm_io -a hdf5 -x canonical -k -o ${HOME}/e3sm_io_native datasets/f_case_866x72_16p.nc
- The above two commands run the small-scale F case using the data partitioning patterns, referred to as decomposition maps in Scorpio, that were generated from a 16-process run. This decomposition map file comes with E3SM-IO, along with maps for two other cases, G and I.
- Information about the decomposition maps is available in datasets/README.md
- Details of command-line options can be found in E3SM-IO's INSTALL.md
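For reference, the following is a minimal sketch, not E3SM-IO code, of how an HDF5 application can select the log-layout based VOL programmatically; E3SM-IO already handles this when run with `-a hdf5_log`. The connector name "LOG" and the output file name are assumptions made for illustration. Alternatively, HDF5 1.12 and later can load a VOL connector without any code change through the `HDF5_VOL_CONNECTOR` and `HDF5_PLUGIN_PATH` environment variables.

```c
/* select_log_vol.c -- a minimal sketch (not part of E3SM-IO) showing one way
 * an HDF5 application can create a file through the log-layout based VOL.
 * Build with an MPI compiler wrapper and link against parallel HDF5, e.g.
 *   mpicc select_log_vol.c -o select_log_vol -lhdf5
 */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Register the log-layout based VOL connector by name. "LOG" is assumed
     * to be the name the connector registers under; its shared library must
     * be discoverable via HDF5_PLUGIN_PATH. */
    hid_t log_vol_id = H5VLregister_connector_by_name("LOG", H5P_DEFAULT);

    /* Create a file access property list that uses MPI-IO and the log VOL. */
    hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL);
    H5Pset_vol(fapl_id, log_vol_id, NULL);

    /* All objects created in this file now go through the log VOL. */
    hid_t file_id = H5Fcreate("e3sm_io_log_demo.h5", H5F_ACC_TRUNC,
                              H5P_DEFAULT, fapl_id);

    H5Fclose(file_id);
    H5Pclose(fapl_id);
    H5VLclose(log_vol_id);
    MPI_Finalize();
    return 0;
}
```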
The E3SM-IO benchmark studies the I/O performance of three E3SM cases.
- F case - the atmospheric component
- G case - the oceanic component
- I case - the land component

The F and I cases each produce two history files, referred to as h0 and h1. The G case produces only one history file.
The I/O-related information for our evaluations is provided in the table below.
| Output file | F-H0 | F-H1 | G | I-H0 | I-H1 |
|---|---|---|---|---|---|
| Number of MPI processes | 21600 | 21600 | 9600 | 1344 | 1344 |
| Total size of data written (GiB) | 14.09 | 6.68 | 79.69 | 86.11 | 0.36 |
| Number of fixed-size variables | 15 | 15 | 11 | 18 | 10 |
| Number of record variables | 399 | 36 | 41 | 542 | 542 |
| Number of time records | 1 | 25 | 1 | 240 | 1 |
| Number of non-partitioned variables | 27 | 27 | 11 | 14 | 14 |
| Number of partitioned variables | 387 | 24 | 41 | 546 | 538 |
| Number of non-contiguous requests | 174953 | 83261 | 20888 | 9248875 | 38650 |
| Number of attributes | 1427 | 148 | 858 | 2789 | 2759 |
The performance numbers presented here compare three I/O methods used in E3SM-IO: the log-layout based VOL, PnetCDF, and ADIOS.
The PnetCDF method stores E3SM variables in files in a canonical storage layout. For each partitioned variable, each process writes multiple non-contiguous requests. PnetCDF's non-blocking APIs are used to enable request aggregation and improve performance. However, storing data in the canonical order requires inter-process communication during MPI collective I/O, which can be expensive. Because E3SM's data partitioning patterns contain large numbers of non-contiguous write requests, this communication cost can become very high. Therefore, the PnetCDF method is expected to perform slower than the log-layout based VOL, which stores data in a log layout and thus requires no inter-process communication.
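To illustrate the request aggregation described above, here is a minimal sketch of posting several non-blocking PnetCDF writes and flushing them with a single collective call. The file name, variable layout, and decomposition are made up for this example and do not come from E3SM-IO.

```c
/* pnetcdf_aggregate.c -- a minimal sketch (not E3SM-IO code) of PnetCDF's
 * non-blocking APIs: multiple non-contiguous write requests are posted first
 * and then flushed together by one collective ncmpi_wait_all() call.
 */
#include <mpi.h>
#include <pnetcdf.h>

#define NREQS 4   /* number of non-contiguous requests posted per process */

int main(int argc, char **argv) {
    int rank, nprocs, ncid, dimids[2], varid, reqs[NREQS], stats[NREQS];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    ncmpi_create(MPI_COMM_WORLD, "pnetcdf_demo.nc", NC_CLOBBER | NC_64BIT_DATA,
                 MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0]);
    ncmpi_def_dim(ncid, "ncol", (MPI_Offset)nprocs * NREQS * 10, &dimids[1]);
    ncmpi_def_var(ncid, "var", NC_DOUBLE, 2, dimids, &varid);
    ncmpi_enddef(ncid);

    double buf[NREQS][10];
    for (int i = 0; i < NREQS; i++)
        for (int j = 0; j < 10; j++) buf[i][j] = rank + 0.01 * i;

    /* Post NREQS non-contiguous write requests; nothing is written yet. */
    for (int i = 0; i < NREQS; i++) {
        MPI_Offset start[2] = {0, (MPI_Offset)(i * nprocs + rank) * 10};
        MPI_Offset count[2] = {1, 10};
        ncmpi_iput_vara_double(ncid, varid, start, count, buf[i], &reqs[i]);
    }

    /* One collective call aggregates and flushes all pending requests. */
    ncmpi_wait_all(ncid, NREQS, reqs, stats);

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}
```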
Scorpio implements an I/O option that uses the ADIOS library to write data. In Scorpio's implementation, each process stores its write data in ADIOS local variables by appending one write request after another. These local variables are only collections of data blocks, without any ADIOS metadata describing their logical locations. Instead, Scorpio stores the metadata, such as the write data's canonical location, as additional ADIOS variables, which are later used to convert the BP files into NetCDF files.
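The sketch below is only a rough illustration of this idea, not Scorpio's actual code: it writes a data block into an ADIOS2 local variable, i.e., one defined without a global shape or start, using the ADIOS2 C API (version 2.8 or later is assumed for adios2_init_mpi). Scorpio's real implementation additionally defines the metadata variables described above and follows its own naming scheme.

```c
/* adios_local_var.c -- a conceptual sketch (not Scorpio code) of writing a
 * data block into an ADIOS2 "local" variable, i.e. one defined with no
 * global shape/start, so the block carries no canonical (global) location.
 */
#include <mpi.h>
#include <adios2_c.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    adios2_adios *adios = adios2_init_mpi(MPI_COMM_WORLD);
    adios2_io    *io    = adios2_declare_io(adios, "demo");

    /* A local variable: shape and start are NULL, only the local count is
     * given; each process contributes its own block of data. */
    size_t count[1] = {100};
    adios2_variable *var = adios2_define_variable(
        io, "local_blocks", adios2_type_double, 1,
        NULL /* shape */, NULL /* start */, count,
        adios2_constant_dims_true);

    double block[100];
    for (size_t i = 0; i < 100; i++) block[i] = rank + 0.001 * i;

    adios2_engine *engine = adios2_open(io, "demo.bp", adios2_mode_write);
    /* Each adios2_put appends another data block for this variable. */
    adios2_put(engine, var, block, adios2_mode_deferred);
    adios2_close(engine);  /* flushes the deferred put */

    adios2_finalize(adios);
    MPI_Finalize();
    return 0;
}
```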
The performance chart below shows the execution times, collected in September 2021, on Cori at NERSC. All runs were on KNL nodes, with 64 MPI processes allocated per node.
Both the log-layout based VOL and the ADIOS runs enabled their subfiling feature, which creates one file per compute node. The Lustre striping configuration was set to a stripe count of 8 and a stripe size of 1 MiB.
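A striping configuration like this can be requested in more than one way; one common approach in MPI-based HDF5 applications is to pass ROMIO hints on the file access property list, as in the minimal sketch below. This is only an illustration; the actual runs may instead have set striping on the output directory with Lustre's lfs tool, and the hints take effect only when the file is newly created on a Lustre file system.

```c
/* lustre_hints.c -- a minimal sketch (not from E3SM-IO) of passing a Lustre
 * striping request as MPI-IO hints when creating an HDF5 file in parallel.
 * "striping_factor" and "striping_unit" are ROMIO hint names.
 */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "8");        /* stripe count = 8     */
    MPI_Info_set(info, "striping_unit",   "1048576");  /* stripe size  = 1 MiB */

    hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl_id, MPI_COMM_WORLD, info);

    hid_t file_id = H5Fcreate("striped_output.h5", H5F_ACC_TRUNC,
                              H5P_DEFAULT, fapl_id);

    H5Fclose(file_id);
    H5Pclose(fapl_id);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```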
The performance chart below shows the execution times, collected in September 2021, on Summit at OLCF. All runs allocated 84 MPI processes per node. Summit's parallel file system, IBM Spectrum Scale (GPFS), was used.