Memory bandwidth limit #4
-
Hello, thanks for the very nice example and optimization guide. I do not fully understand your reasoning in part 1 to establish an upper bound for the memory bandwidth. Starting from kernel 1, we see an actual fetch size double the theoretical fetch size. Ciao,
Replies: 2 comments
-
Hi Manual, Thanks for your response. To answer your questions:
Hope this helps,
-
Thanks for the clarification. I understand your motivation, and it gives me more confidence that I have grasped some basics of performance tuning on this device. Thanks!
Hi Manual,
Thanks for your response. To answer your questions:
Yes, it should be 30.6%; thanks for catching that.
Part 1 does not claim that we cannot get closer to 100% of peak HBM bandwidth. The reported 1165 GB/s, which happens to be 71% of the HBM peak, is a loose target one should minimally achieve if reduction of the memory traffic is the sole optimization goal. In the subsequent parts, there are cases where the FOM exceeds 1165 GB/s. We have deliberately avoided making any strong performance claims, as achieving the highest possible FOM will depend on a variety of factors, e.g., ROCm versions, system configurations, and other optimization tricks not covered in this Laplacian series. That s…