Hi,
I'm interested in optimizing a matmul operation where I know the dims (e.g. 1024x512 @ 512x2048) and the GPU involved (A100).
I know I could just empirically test a lot of different options, but I'd like to get a better understanding of how CUDA and Triton interact, so that in other cases I have a better idea of what to do (I usually work in a context where some dims are variable).
Since I know the GPU, I know the number of SMs available, the size of the caches, etc. Might it be possible to get a guide on what values one should pick depending on those?
Based on this I think I should be making sure that BLOCK_SIZE_M and BLOCK_SIZE_N (to use the vars from the tutorial) result in tiles that optimize the use of the SMs. Is that going in the right direction?
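To make the SM-utilization idea concrete, here is a rough back-of-envelope sketch (my own, not from the tutorial) that counts how many output tiles a given BLOCK_SIZE_M/BLOCK_SIZE_N pair produces and how many "waves" that is across the A100's 108 SMs. It assumes one Triton program instance per tile and ignores per-SM occupancy of multiple CTAs, so it is only a first-order check:

```python
import math

def tile_occupancy(M, N, block_m, block_n, num_sms=108):
    """Rough launch-grid sizing check for a tiled matmul.

    Each Triton program instance computes one block_m x block_n output
    tile, so the grid has ceil(M/block_m) * ceil(N/block_n) programs.
    'waves' is that count divided by the SM count: values just above an
    integer mean a nearly idle last wave (the "tail effect").
    """
    num_tiles = math.ceil(M / block_m) * math.ceil(N / block_n)
    waves = num_tiles / num_sms
    return num_tiles, waves

# Output of 1024x512 @ 512x2048 is 1024x2048.
print(tile_occupancy(1024, 2048, 128, 128))  # 8 * 16 = 128 tiles
print(tile_occupancy(1024, 2048, 128, 256))  # 8 * 8  = 64 tiles
```

With 128x128 tiles you get 128 tiles, i.e. just over one full wave of 108 SMs, so the second wave runs only 20 tiles; with 128x256 tiles you get 64 tiles and leave 44 SMs idle. In practice this interacts with occupancy, shared-memory usage, and BLOCK_SIZE_K, which is partly why autotuning is usually still needed.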
I can find a lot of info about optimizing CUDA kernels, but it's difficult to understand how that translates to writing Triton code. Any help with that would be much appreciated!