With the code from the matmul tutorial (commenting out the …), the output is of shape [128, 32]. For rows [0, 63], the results match exactly; starting from row 64, there is a systematic difference:
Wonder if anyone can reproduce? Thanks!
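Roughly the kind of comparison I mean, as a sketch (the inner dimension K, the dtype, and the `matmul` wrapper from the tutorial are assumptions here, not my exact script):

```python
import torch

# Sketch of the comparison (K and float32 are guesses; `matmul` would be
# the launcher function from the Triton matmul tutorial).
torch.manual_seed(0)
M, K, N = 128, 64, 32
a = torch.randn((M, K), device="cuda", dtype=torch.float32)
b = torch.randn((K, N), device="cuda", dtype=torch.float32)

torch_out = torch.matmul(a, b)
# triton_out = matmul(a, b)  # the Triton tutorial kernel
# Per-row max abs difference: rows [0, 63] agree, rows [64, 127] don't.
# print((triton_out - torch_out).abs().max(dim=1).values)
```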
Replies: 1 comment 1 reply
So, I repro'ed this, and it seemed like an oddly large difference to me, until I remembered that Triton is almost certainly using tf32, and that while torch may be using tf32 (if you call `torch.set_float32_matmul_precision("high")`), it may not be! In fact, for a problem this small, I observe `ampere_sgemm_*` kernels in the profile, which are fp32 (not tf32) kernels.

Comparing gemms for equality is hard :). I usually like an approach like this: https://twitter.com/bwasti/status/1621370782436687872 Basically, `(torch.randn(shape)+1.0)/k`.
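A minimal sketch of how I read that suggestion (assuming `k` is the inner/reduction dimension, and comparing with a tolerance and a max-difference report rather than exact equality):

```python
import torch

# Sketch only: K is assumed to be the reduction dimension, so the
# (randn + 1.0) / K inputs keep each dot product small and mostly positive,
# which limits cancellation when accumulation orders differ.
M, K, N = 128, 64, 32
a = (torch.randn(M, K, device="cuda") + 1.0) / K
b = (torch.randn(K, N, device="cuda") + 1.0) / K

ref = torch.matmul(a, b)
# out = matmul(a, b)  # the Triton tutorial kernel, if imported
# Report the worst-case difference instead of asserting bitwise equality:
# print((out - ref).abs().max().item())
# print(torch.allclose(out, ref, atol=1e-4, rtol=1e-3))
```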