Benchmark Data

Accelerator Device Project - Benchmark Data

Methodology

The tests used the gr-bench approach and scripts.
Tests were run using CUDA loopback blocks in the flow graph shown above for each of the following three cases:
- stock GR 3.9 + legacy (double copy) loopback - shown in blue in the graphs below
- ngsched + legacy (double copy) loopback - shown in orange in the graphs below
- ngsched + single mapped custom buffer - shown in green in the graphs below
Each test case iterated over the following values for "veclen" (batch size) and number of loopback blocks:
- "veclen" (batch size) values: 1024, 2048, 4096, 8192, 16384, 32768
- number of loopback blocks values: 1, 2, 4, 16
Each test case was run 10 times
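The parameter sweep described above can be sketched as follows. This is only an illustration of the test matrix, not the actual gr-bench scripts: the names CASES, VECLENS, NUM_BLOCKS, RUNS_PER_CONFIG and sweep() are hypothetical, and the sketch assumes that "run 10 times" means 10 repetitions of every (case, veclen, number of blocks) combination.

```python
from itertools import product

# Hypothetical labels for the three cases; the values are the ones listed above.
CASES = [
    "stock GR 3.9 + legacy (double copy) loopback",
    "ngsched + legacy (double copy) loopback",
    "ngsched + single mapped custom buffer",
]
VECLENS = [1024, 2048, 4096, 8192, 16384, 32768]
NUM_BLOCKS = [1, 2, 4, 16]
RUNS_PER_CONFIG = 10  # assumption: 10 repetitions per combination


def sweep():
    """Yield one (case, veclen, nblocks, run) tuple per benchmark run."""
    for case, veclen, nblocks in product(CASES, VECLENS, NUM_BLOCKS):
        for run in range(RUNS_PER_CONFIG):
            yield case, veclen, nblocks, run


configs = list(sweep())
# 3 cases * 6 veclen values * 4 block counts * 10 runs
print(len(configs))
```

Under that assumption the full matrix comes to 720 individual runs, 240 per case.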
Each plot below shows execution time plotted against veclen. Note an equivalent plot could be made showing throughput (MB/s) vs. veclen.
Total data copied was 100,000,000 gr_complex values of 8 bytes each, for a total of 800,000,000 bytes (~762.94 MiB).
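As a quick check of the arithmetic, and to show how an execution time could be converted into the equivalent throughput figure, here is a minimal sketch. The function name throughput_mib_per_s and the 2.0 second example time are hypothetical and are not measured results; the sketch divides by 2^20 bytes, which is the convention that yields the ~762.94 figure.

```python
NUM_SAMPLES = 100_000_000
BYTES_PER_SAMPLE = 8  # one gr_complex value (two 32-bit floats)
TOTAL_BYTES = NUM_SAMPLES * BYTES_PER_SAMPLE
BYTES_PER_MIB = 1024 * 1024


def throughput_mib_per_s(exec_time_s):
    """Convert an execution time in seconds into throughput in MiB/s."""
    return TOTAL_BYTES / BYTES_PER_MIB / exec_time_s


total_mib = TOTAL_BYTES / BYTES_PER_MIB
print(TOTAL_BYTES)          # 800000000
print(round(total_mib, 2))  # 762.94

# Hypothetical execution time, for illustration only (not a measured result).
print(round(throughput_mib_per_s(2.0), 2))  # 381.47
```

Because the total data size is fixed across all runs, throughput is simply inversely proportional to execution time, which is why the two kinds of plot carry the same information.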