Merge pull request #24 from mlcommons/main
Merge main to v0.5 release branch
johnugeorge authored May 24, 2023
2 parents 2a837ba + 537f0eb commit 673843a
Showing 8 changed files with 384 additions and 270 deletions.
3 changes: 3 additions & 0 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# These owners will be the default owners for everything in the repo.
# Unless a later match takes precedence, they will be requested for review when someone opens a pull request.
* @mlcommons/wg-storage
305 changes: 177 additions & 128 deletions LICENSE.md

Large diffs are not rendered by default.

64 changes: 43 additions & 21 deletions README.md
@@ -1,15 +1,17 @@
# MLPerf™ Storage Benchmark Suite
MLPerf Storage is a benchmark suite to characterize the performance of storage systems that support machine learning workloads.

- [Overview](#Overview)
- [Installation](#Installation)
- [Configuration](#Configuration)
- [Workloads](#Workloads)
- [U-Net3D](#U-Net3D)
- [BERT](#BERT)
- [DLRM](#DLRM)
- [Parameters](#Parameters)
- [Releases](#Releases)
- [Overview](#overview)
- [Installation](#installation)
- [Configuration](#configuration)
- [Workloads](#workloads)
- [U-Net3D](#u-net3d)
- [BERT](#bert)
- [DLRM](#dlrm)
- [Parameters](#parameters)
- [CLOSED](#closed)
- [OPEN](#open)
- [Submission Rules](#submission-rules)
## Overview

This section describes how to use the MLPerf™ Storage Benchmark to measure the performance of a storage system supporting a compute cluster running AI/ML training tasks.
@@ -69,6 +71,7 @@ The working directory structure is as follows
```
|---storage
|---benchmark.sh
|---report.py
|---dlio_benchmark
|---storage-conf
|---workload(folder contains configs of all workloads)
@@ -165,21 +168,21 @@ For running benchmark on `unet3d` workload with data located in `unet3d_data` directory
./benchmark.sh run --workload unet3d --num-accelerators 4 --results-dir unet3d_results --param dataset.data_folder=unet3d_data
```

4. Reports are generated from the benchmark results
4. The benchmark submission report is generated by aggregating the individual run results.

```bash
./benchmark.sh reportgen -h

Usage: ./benchmark.sh reportgen [options]
Generate a report from the benchmark results. Supports single host and multi host run.
Generate a report from the benchmark results.


Options:
-h, --help Print this message
-r, --results-dir Location to the results directory
```

For multi-host run, the results need to be in the following structure.
The results directory must follow the structure below and include 5 runs.

```
sample-results
@@ -200,7 +203,7 @@ sample-results
|---host-n
|---summary.json
.....
|---run-n
|---run-5
|---host-1
|---summary.json
|---host-2
@@ -210,20 +213,21 @@ sample-results
|---summary.json
```
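
As a quick sanity check before running `reportgen`, the layout above can be verified with a small script. This is an illustrative sketch, not part of the benchmark: it builds a sample tree matching the structure shown (here with 2 hosts per run) and confirms that all 5 runs carry a per-host `summary.json`.

```bash
#!/usr/bin/env bash
# Illustrative check of the results layout described above:
# 5 run directories, each holding per-host summary.json files.
set -eu

results=$(mktemp -d)
# Build a sample tree (2 hosts per run) mirroring the structure above.
for run in 1 2 3 4 5; do
  for host in 1 2; do
    mkdir -p "$results/run-$run/host-$host"
    echo '{}' > "$results/run-$run/host-$host/summary.json"
  done
done

# Validate: exactly 5 run-* directories, each host has a summary.json.
runs=$(find "$results" -mindepth 1 -maxdepth 1 -type d -name 'run-*' | wc -l)
[ "$runs" -eq 5 ] || { echo "expected 5 runs, found $runs" >&2; exit 1; }
for summary in "$results"/run-*/host-*/summary.json; do
  [ -s "$summary" ] || { echo "missing $summary" >&2; exit 1; }
done
echo "layout OK"
```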

To generate multi host report,
To generate the benchmark report,

```bash
./benchmark.sh reportgen --results-dir sample-results/
```

For reference, a sample result directory structure can be found [here](https://github.com/johnugeorge/mlperf-storage-sample-results).

## Workloads
Currently, the storage benchmark suite supports benchmarking of three deep learning workloads:
- Image segmentation using U-Net3D model ([unet3d](./storage-conf/workloads/unet3d.yaml))
- Natural language processing using BERT model ([bert](./storage-conf/workloads/bert.yaml))
- Recommendation using DLRM model (TODO)

### U-Net3D Workload
### U-Net3D

Calculate minimum dataset size required for the benchmark run

@@ -243,14 +247,14 @@ Run the benchmark.
./benchmark.sh run --workload unet3d --num-accelerators 8 --param dataset.num_files_train=3200
```

All results will be stored in ```results/unet3d/$DATE-$TIME``` folder or in the directory when overriden using `--results-dir`(or `-r`) argument. To generate the final report, one can do
All results will be stored in the ```results/unet3d/$DATE-$TIME``` folder, or in the directory specified with the `--results-dir` (or `-r`) argument. To generate the final report, run

```bash
./benchmark.sh reportgen --results-dir results/unet3d/$DATE-$TIME
```
This will generate ```mlperf_storage_report.json``` in the output folder.
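
The report is plain JSON, so it can be sanity-checked with standard tooling. The snippet below is self-contained for illustration only: it writes a stand-in file (the real keys depend on the report format) and confirms it parses; in practice, point `python3 -m json.tool` at the `mlperf_storage_report.json` in your results directory.

```bash
# Sanity-check that a report file parses as valid JSON.
# A stand-in file is created so this sketch is self-contained;
# its contents are illustrative, not the real report schema.
report=$(mktemp)
printf '{"status": "ok"}' > "$report"
python3 -m json.tool "$report" > /dev/null && echo "report is valid JSON"
```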

### BERT Workload
### BERT

Calculate minimum dataset size required for the benchmark run

@@ -269,21 +273,22 @@ Run the benchmark
./benchmark.sh run --workload bert --num-accelerators 8 --param dataset.num_files_train=350
```

All results will be stored in ```results/bert/$DATE-$TIME``` folder or in the directory when overriden using `--results-dir`(or `-r`) argument. To generate the final report, one can do
All results will be stored in the ```results/bert/$DATE-$TIME``` folder, or in the directory specified with the `--results-dir` (or `-r`) argument. To generate the final report, run

```bash
./benchmark.sh reportgen -r results/bert/$DATE-$TIME
```
This will generate ```mlperf_storage_report.json``` in the output folder.


### DLRM Workload
### DLRM

To be added

## Parameters

Below table displays the list of configurable paramters for the benchmark.
### CLOSED
The table below lists the configurable parameters for the benchmark in the CLOSED category.

| Parameter | Description |Default|
| ------------------------------ | ------------------------------------------------------------ |-------|
@@ -293,10 +298,27 @@ Below table displays the list of configurable paramters for the benchmark.
| dataset.data_folder | The path where dataset is stored | --|
| **Reader params** | | |
| reader.read_threads | Number of threads to load the data | --|
| reader.computation_threads | Number of threads to preprocess the data(only for bert) | --|
| reader.computation_threads     | Number of threads to preprocess the data (only for BERT)      | --|
| **Checkpoint params** | | |
| checkpoint.checkpoint_folder | The folder to save the checkpoints | --|
| **Storage params** | | |
| storage.storage_root | The storage root directory | ./|
| storage.storage_type | The storage type |local_fs|
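
For illustration, these overrides are passed as repeated `--param key=value` flags. The sketch below only assembles and echoes such a command line; the values and the data path are examples, and `benchmark.sh` is not executed here.

```bash
#!/usr/bin/env bash
# Compose a benchmark.sh invocation from closed-category overrides.
# Echoed rather than executed; paths and values are illustrative.
overrides=(
  "dataset.num_files_train=3200"
  "dataset.data_folder=/mnt/unet3d_data"
  "reader.read_threads=8"
)
cmd=(./benchmark.sh run --workload unet3d --num-accelerators 8)
for kv in "${overrides[@]}"; do
  cmd+=(--param "$kv")
done
echo "${cmd[@]}"
```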


### OPEN
In addition to what can be changed in the CLOSED category, the following parameters can be changed in the OPEN category.

| Parameter | Description |Default|
| ------------------------------ | ------------------------------------------------------------ |-------|
| framework | The machine learning framework | PyTorch for 3D U-Net, TensorFlow for BERT |
| **Dataset params** | | |
| dataset.format | Format of the dataset | `.npz` for 3D U-Net, `tfrecord` for BERT |
| dataset.num_samples_per_file | Number of samples per file (only for TensorFlow with `tfrecord` datasets) | 1 for 3D U-Net, 313532 for BERT |
| **Reader params** | | |
| reader.data_loader | Data loader type (TensorFlow, PyTorch, or custom) | PyTorch for 3D U-Net, TensorFlow for BERT |
| reader.transfer_size | Number of bytes in the read buffer (only for TensorFlow) | 262144 for BERT |

## Submission Rules

MLPerf™ Storage Benchmark submission rules are described in this [doc](https://docs.google.com/document/d/1QOaCLiWb82H9cwdVX5KyeDZWt0781y4SgMQPhoij-b4/edit). If you have questions, please contact [Storage WG chairs](https://mlcommons.org/en/groups/research-storage/).
10 changes: 8 additions & 2 deletions benchmark.sh
@@ -10,7 +10,7 @@ WORKLOADS=("unet3d" "bert")
UNET3D_CONFIG_FILE=${CONFIG_PATH}/workload/unet3d.yaml
BERT_CONFIG_FILE=${CONFIG_PATH}/workload/bert.yaml
# Currently only "closed" category is supported
CATEGORIES=("closed")
CATEGORIES=("closed" "open")
DEFAULT_CATEGORY="closed"
CLOSED_CATEGORY_PARAMS=(
# dataset params
@@ -25,6 +25,12 @@ CLOSED_CATEGORY_PARAMS=(
OPEN_CATEGORY_PARAMS=(
# all closed params
"${CLOSED_CATEGORY_PARAMS[@]}"
# framework params
"framework"
# dataset params
"dataset.format" "dataset.num_samples_per_file"
# reader params
"reader.data_loader" "reader.transfer_size"
)
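
The effect of these arrays is an allow-list: a `--param` key is accepted only if it appears in the array for the chosen category. A minimal sketch of that check follows; the helper name and the trimmed-down parameter lists are illustrative, not the script's actual implementation.

```bash
#!/usr/bin/env bash
# Minimal allow-list check mirroring the category arrays above.
CLOSED_PARAMS=("dataset.num_files_train" "dataset.data_folder" "reader.read_threads")
OPEN_PARAMS=("${CLOSED_PARAMS[@]}" "framework" "dataset.format" "reader.transfer_size")

# is_allowed CATEGORY KEY -> succeeds if KEY is permitted in CATEGORY
is_allowed() {
  local key=$2 allowed p
  case $1 in
    closed) allowed=("${CLOSED_PARAMS[@]}") ;;
    open)   allowed=("${OPEN_PARAMS[@]}") ;;
    *)      return 2 ;;
  esac
  for p in "${allowed[@]}"; do
    [ "$p" = "$key" ] && return 0
  done
  return 1
}

is_allowed open framework && echo "open accepts framework"
is_allowed closed framework || echo "closed rejects framework"
```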
HYDRA_OUTPUT_CONFIG_DIR="configs"
EXTRA_PARAMS=(
@@ -272,7 +278,7 @@ configview() {

postprocess() {
local results_dir=$1
python3 report.py --result-dir $results_dir --multi-host --create-report
python3 report.py --result-dir $results_dir
}

main() {
