How to handle different devices / autotune (shared memory sizes) #248
blefaudeux started this conversation in Ideas
I'm wondering about a Tritonic way to handle different classes of devices, P100/V100/A100 for instance. The excellent matmul example shows this in practice: some autotune configurations will OOM on a P100 (which has less shared memory) and produce a useful error message.
It's not a big issue, since removing the biggest block sizes fixes it, and one could query the CUDA device at runtime and adjust the presets accordingly. But I was wondering whether building that into the Triton language would make sense (or maybe it's already there and I missed it).