
[NVIDIA][Backend] Add CoalesceAsyncCopy Pass for in-DotOpEnc Upcasting #5222

Open · ggengnv wants to merge 7 commits into main
Conversation

ggengnv (Contributor) commented Nov 21, 2024

This is a follow-up to the dotOp hoisting optimization for WGMMA (MMAv3). See #5003 (comment)

In short, when operand A is upcast in registers prior to WGMMA and pipelining is enabled, AsyncCopyGlobalToLocal's src gmem blocked encoding will have sizePerThread greater than the smem view's vec along the contiguous dimension. This results in multiple cp.async instructions being generated for one contiguous segment of global data, i.e. uncoalesced loads. This was previously confirmed in ncu; see the comment above for an example.

I've added a generalized fix in a new pass that runs after the pipeliner. It reuses the logic from the LLVM lowering of AsyncCopyGlobalToLocal to calculate the max contiguous copy size, and compares that to the blocked encoding's sizePerThread along the inner (contiguous) dimension. If the former is less than the latter, the latter is clamped to the former.

When A is k-major, I can verify a small perf improvement and that ncu no longer reports uncoalesced loads.
When A is m-major, this pass is a no-op because copy size == sizePerThread == 16.
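
For reference, a minimal sketch of the rewrite described above (hypothetical, not the PR's actual code: getCopyVecSize stands in for the copy-size logic reused from the LLVM lowering, and the direct BlockedEncodingAttr builder is assumed):

    struct ClampAsyncCopySizePerThread
        : public OpRewritePattern<AsyncCopyGlobalToLocalOp> {
      using OpRewritePattern::OpRewritePattern;

      LogicalResult matchAndRewrite(AsyncCopyGlobalToLocalOp copyOp,
                                    PatternRewriter &rewriter) const override {
        auto srcTy = cast<RankedTensorType>(copyOp.getSrc().getType());
        auto blockEnc = dyn_cast<BlockedEncodingAttr>(srcTy.getEncoding());
        if (!blockEnc)
          return failure();

        // Max contiguous elements a single cp.async can copy, bounded by
        // the smem view's vec (hypothetical helper standing in for the
        // logic reused from the LLVM lowering).
        unsigned copyVec = getCopyVecSize(copyOp);

        // Inner (most contiguous) dimension of the blocked gmem layout.
        unsigned innerDim = blockEnc.getOrder()[0];
        SmallVector<unsigned> sizePerThread(blockEnc.getSizePerThread());
        if (sizePerThread[innerDim] <= copyVec)
          return failure(); // loads are already coalesced; nothing to do

        // Clamp sizePerThread so each thread's contiguous global segment
        // matches what one cp.async instruction can move.
        sizePerThread[innerDim] = copyVec;
        auto newEnc = BlockedEncodingAttr::get(
            getContext(), sizePerThread, blockEnc.getThreadsPerWarp(),
            blockEnc.getWarpsPerCTA(), blockEnc.getOrder(),
            blockEnc.getCTALayout());

        // Convert src (and mask/other, whose encodings must match) to the
        // clamped layout, then rebuild the copy with converted operands.
        auto cvt = [&](Value v) -> Value {
          if (!v)
            return v;
          auto ty = cast<RankedTensorType>(v.getType());
          auto newTy =
              RankedTensorType::get(ty.getShape(), ty.getElementType(), newEnc);
          return rewriter.create<ConvertLayoutOp>(copyOp.getLoc(), newTy, v);
        };
        auto newCopy = rewriter.create<AsyncCopyGlobalToLocalOp>(
            copyOp.getLoc(), cvt(copyOp.getSrc()), copyOp.getResult(),
            cvt(copyOp.getMask()), cvt(copyOp.getOther()), copyOp.getCache(),
            copyOp.getEvict(), copyOp.getIsVolatile());
        rewriter.replaceOp(copyOp, newCopy);
        return success();
      }
    };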

ptal, thanks @ThomasRaoux

@ggengnv ggengnv changed the title Add CoalesceAsyncCopy Pass for in-DotOpEnc Upcasting [Nvidia][Backend] Add CoalesceAsyncCopy Pass for in-DotOpEnc Upcasting Nov 21, 2024
@ggengnv ggengnv changed the title [Nvidia][Backend] Add CoalesceAsyncCopy Pass for in-DotOpEnc Upcasting [NVIDIA][Backend] Add CoalesceAsyncCopy Pass for in-DotOpEnc Upcasting Nov 21, 2024
ThomasRaoux (Collaborator) left a comment:

looks good, few minor comments

Value mask = copyOp.getMask();
Value other = copyOp.getOther();
auto srcTy = cast<RankedTensorType>(src.getType());
auto blockEnc = cast<BlockedEncodingAttr>(srcTy.getEncoding());
ThomasRaoux (Collaborator):

you can't assume the copy will use blocked layout
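
For illustration, one way to address this (a sketch, not necessarily the fix adopted) is to guard with dyn_cast instead of asserting:

    // Skip copies whose src is not in a blocked layout rather than
    // asserting with cast<>, which would crash on other encodings.
    auto srcTy = cast<RankedTensorType>(src.getType());
    auto blockEnc = dyn_cast<BlockedEncodingAttr>(srcTy.getEncoding());
    if (!blockEnc)
      return failure(); // leave non-blocked layouts untouched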

Comment on lines 98 to 102
// replace the asyncCopy
auto newCopyOp = rewriter.create<AsyncCopyGlobalToLocalOp>(
copyOp.getLoc(), src, copyOp.getResult(), mask, other,
copyOp.getCache(), copyOp.getEvict(), copyOp.getIsVolatile());
rewriter.replaceOp(copyOp, newCopyOp);
ThomasRaoux (Collaborator):

nit: you could do an in-place update
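
For illustration, a sketch of the suggested in-place update, assuming the rewriter's modifyOpInPlace (updateRootInPlace in older MLIR) and that newSrc/newMask/newOther are the layout-converted operands built earlier in the pattern:

    // Mutate the existing op under the rewriter instead of creating a
    // replacement op and calling replaceOp.
    rewriter.modifyOpInPlace(copyOp, [&] {
      copyOp.getSrcMutable().assign(newSrc);
      if (newMask)
        copyOp.getMaskMutable().assign(newMask);
      if (newOther)
        copyOp.getOtherMutable().assign(newOther);
    });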

#include "mlir/Support/LLVM.h"
#include "mlir/Transforms/Passes.h"
#include "triton/Analysis/Utility.h"
#include "triton/Conversion/TritonGPUToLLVM/Utility.h"
ThomasRaoux (Collaborator):

nit: this is a bit of a layering violation, getRegToSharedLayout probably belongs to triton gpu dialect utils.

ggengnv (Contributor, Author) commented Nov 22, 2024

Addressed comments - moved util to lib/Dialect/Transforms/Utility.cpp
