Benchmark Data

David Sorber edited this page Aug 25, 2021 · 1 revision

Accelerator Device Project - Benchmark Data

Methodology and Description

[Figure: methodology flow graph]

  • Used gr-bench approach and scripts
  • Tests were run using CUDA loopback blocks in the flow graph shown above for each of the following three cases:
    • stock GR 3.9 + legacy (double copy) loopback - shown in blue in the graphs below
    • ngsched + legacy (double copy) loopback - shown in orange in the graphs below
    • ngsched + single mapped custom buffer - shown in green in the graphs below
  • Each test case iterated over various values for "veclen" (batch size) and number of loopback blocks
    • "veclen" (batch size) values: 1024, 2048, 4096, 8192, 16384, 32768
    • number of loopback blocks values: 1, 2, 4, 16
  • Each test case was run 10 times
  • Each plot below shows execution time plotted against veclen. Note that an equivalent plot could be made showing throughput (MB/s) vs. veclen, since the amount of data copied is fixed.
  • Total data copied was 100,000,000 gr_complex values at 8 bytes each, for a total of 800,000,000 bytes (~762.94 MiB)
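
The parameter sweep and the time-to-throughput conversion described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the gr-bench scripts themselves: the names `test_matrix` and `throughput_mib_per_s` are hypothetical, and it assumes that every combination of case, veclen, and block count was run 10 times. The throughput is reported in MiB/s (2^20 bytes), which is the unit that matches the ~762.94 figure.

```python
from itertools import product

# Values taken from the methodology list above.
CASES = [
    "stock GR 3.9 + legacy (double copy) loopback",
    "ngsched + legacy (double copy) loopback",
    "ngsched + single mapped custom buffer",
]
VECLENS = [1024, 2048, 4096, 8192, 16384, 32768]
NUM_BLOCKS = [1, 2, 4, 16]
RUNS_PER_CONFIG = 10  # assumption: each combination is run 10 times

NUM_SAMPLES = 100_000_000
BYTES_PER_SAMPLE = 8  # gr_complex is two 32-bit floats
TOTAL_BYTES = NUM_SAMPLES * BYTES_PER_SAMPLE


def test_matrix():
    """Return every (case, veclen, num_blocks) combination (hypothetical helper)."""
    return list(product(CASES, VECLENS, NUM_BLOCKS))


def throughput_mib_per_s(exec_time_s):
    """Convert one run's execution time in seconds to throughput in MiB/s."""
    if exec_time_s <= 0:
        raise ValueError("execution time must be positive")
    return TOTAL_BYTES / (1024 * 1024) / exec_time_s


if __name__ == "__main__":
    configs = test_matrix()
    print(f"{len(configs)} configurations, {len(configs) * RUNS_PER_CONFIG} runs")
    # 2.0 s is a made-up execution time for illustration, not a measured result.
    print(f"{throughput_mib_per_s(2.0):.2f} MiB/s")
```

Because the byte count is the same for every run, throughput is just a constant divided by execution time, so a throughput plot would carry the same information as the execution-time plots with the vertical ordering of the curves inverted.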

Results

Dell XPS 15 laptop + NVidia GTX 1650 GPU

  • Dell XPS 15 laptop
    • Intel i9-10885H (8 cores/16 threads)
    • 32 GB DDR4-2933
    • NVidia GTX 1650 GPU

[Plots: xps15_1, xps15_2]

SuperMicro SM-X11DGQ Server + NVidia P100

  • SuperMicro SM-X11DGQ Server
    • 2x Intel Gold 6148 (20 cores/40 threads each)
    • 512 GB DDR4-
    • NVidia P100 GPU

[Plots: sm-x11dgq_1, sm-x11dgq_2]

NVidia Jetson AGX Xavier

  • NVidia Jetson AGX Xavier
    • 8-core ARM v8.2 CPU
    • 32 GB 256-bit LPDDR4x
    • 512-core Volta GPU with Tensor Cores

[Plots: xavier_1, xavier_2]