- Build the repository with
cmake . make all
- Edit the config files
configGPU.ini for GPUOffloading configTaskTree.ini for TaskTree config.ini for everything else
- Run the benchmark. The config file is automatically detected if it's in the same directory
- Look at the output with Extra-P or in the stdout
This make file is automatically created by cmake with the CMakeList.txt, which uses the system's standard compiler. If you want to set a different compiler, you can define it and build the project the following way:
export CXX=/path/to/compiler cmake /path/to/this/repo
make all
If the compiler is set up correctly with all functionality needed for offloading code for your GPU, then the compilation process is no different from those described above.
Parameter | Description |
---|---|
Repetitions | Space separated list of the number of repetitions of each microbenchmark |
Threads | Space separated list of the number of threads to be used for benchmarking |
Iterations | Space separated list of the number of times the workload should be repeated (for taskbench this is the number of tasks) |
Workload | Space separated list of the amount of workload in each loop iteration |
Directive | The number of times a directive should be repeated |
ExtraP | Whether the output should be saved as a Extra-P-readable JSON format |
Quiet | If set to true, the results will not get printed to stdout |
EPCC | [EXPERIMENTAL] Enables EPCC-style overhead calculations. BEWARE: Some microbenchmarks behave differently than EPCC and therefore have different results |
ClampLow | Clamp low/negative overheads to 1.0 (ExtraP does not like values less than 1) |
EmptyParallelRegion | Add an empty parallel region before the benchmark is run. (Most OpenMP implementations have more overhead when creating threads for the first time.) |
Parameter | Description |
---|---|
Repetitions | Space separated list of the number of repetitions of each microbenchmark |
Threads | Space separated list of the amount of threads per team on the GPU |
Teams | Space separated list of the amount of teams on the GPU |
Iterations | Space separated list of the amount of times the workload should be repeated on the GPU |
ArraySizes | Space separated list of the amount of floats that should be transferred to the GPU |
Workload | Space separated list of the amount of workload in each iteration |
Reference | The amount of CPU threads the GPU overhead is calculated with (default = max threads) |
Directive | The amount of times a directive should be repeated |
ExtraP | Whether the output should be saved as a ExtraP-readable JSON format |
Quiet | If set to true the results wont get printed to stdout |
Parameter | Description |
---|---|
Repetitions | Space separated list of the number of repetitions of each microbenchmark |
Children | Space separated list of the amount of children that each node creates |
Tasks | Space separated list of the total amount of tasks that should be created |
Workload | Space separated list of the amount of workload for each task |
ExtraP | Whether the output should be saved as a ExtraP-readable JSON format |
Quiet | If set to true the results wont get printed to stdout |
Repetitions=10
Threads=16 12 8 4 2
Iterations=100 1000 2000 4000 5000
Workload=10 50 100 200 500 1000
Directive=5
ExtraP=true
Quiet=false
EPCC=false
Repetitions=10
Threads=10
Teams=5
Iterations=10
ArraySizes=1000
Workload=100
Reference=8
ExtraP=true
Quiet=true
Repetitions=10
Children=4 2 1
Tasks=10 20 1000
Workload=1 20 50
ExtraP=true
Quiet=true
- Seyed Ali Mohammadi, Lukas Rothenberger, Gustavo de Morais, Bertin Nico Görlich, Erik Lille, Hendrik Rüthers, and Felix Wolf. 2023. Filtering and Ranking of Code Regions for Parallelization via Hotspot Detection and OpenMP Overhead Analysis. In Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W '23). Association for Computing Machinery, New York, NY, USA, 1368–1379. https://doi.org/10.1145/3624062.3624206
Please cite in your publications if it helps your research:
@inproceedings{10.1145/3624062.3624206,
author = {Mohammadi, Seyed Ali and Rothenberger, Lukas and de Morais, Gustavo and G\"{o}rlich, Bertin Nico and Lille, Erik and R\"{u}thers, Hendrik and Wolf, Felix},
title = {Filtering and Ranking of Code Regions for Parallelization via Hotspot Detection and OpenMP Overhead Analysis},
year = {2023},
isbn = {9798400707858},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3624062.3624206},
doi = {10.1145/3624062.3624206},
booktitle = {Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis},
pages = {1368–1379},
numpages = {12},
keywords = {parallelization overhead, expected benefits, OpenMP microbenchmarks, Hotspot detection, ranking, performance analysis},
location = {Denver, CO, USA},
series = {SC-W '23}
}