How to handle different devices / autotune (shared memory sizes) #248
blefaudeux started this conversation in Ideas
I'm wondering about a Tritonic way to handle different classes of devices, P100/V100/A100 for instance. The excellent matmul example shows this in practice: some autotune configurations will OOM on a P100 (which has less shared memory) and produce a useful error message.
It's not a big issue, since removing the biggest block sizes fixes it, and one could query the CUDA device at runtime and adjust the presets accordingly. But I was wondering whether building that into the Triton language would make sense (or maybe it's already there and I missed it).