[GPU] Add pattern to fuse tensor.collapse_shape into forall producer #19295
Conversation
PR is mostly ready, but I need to add more lit tests and improve the docs. EDIT: Ready now
@@ -228,4 +228,38 @@ def FuseForallOp : Op<Transform_Dialect, "iree.fuse_forall",
  let cppNamespace = "mlir::iree_compiler::IREE::transform_dialect";
}

def FuseCollapseShapeWithForallOp : Op<Transform_Dialect, "iree.fuse_collapse_shape_with_forall",
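For context, a transform op like this would typically be driven from a transform-dialect script. The following is a hypothetical usage sketch only; the operand syntax and result types of `iree.fuse_collapse_shape_with_forall` are assumptions, not taken from this PR:

```mlir
module attributes { transform.with_named_sequence } {
  transform.named_sequence @__transform_main(%root: !transform.any_op) {
    // Find the consumer reshape and its producer loop.
    %collapse = transform.structured.match ops{["tensor.collapse_shape"]} in %root
        : (!transform.any_op) -> !transform.any_op
    %forall = transform.structured.match ops{["scf.forall"]} in %root
        : (!transform.any_op) -> !transform.any_op
    // Hypothetical invocation of the new op; exact assembly format is assumed.
    %fused = transform.iree.fuse_collapse_shape_with_forall %collapse into %forall
        : (!transform.any_op, !transform.any_op) -> !transform.any_op
    transform.yield
  }
}
```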
Suggested rename: `def FuseCollapseShapeIntoForallOp`
A few comments and a suggestion about how to split up the pattern a bit. If you can wait for #19231, we can reuse the pattern there for swapping expand_shape with extract_slice as well.
This PR now depends on #19231. The tile_and_fuse pipeline test is failing because the pattern from that PR is needed here.
Is this ready for another review or does it need a rebase first?
I am going to send another separate PR that this should rebase on (to fix the test failure), but the code in here shouldn't change, so it can be reviewed now.
Cool, LGTM
Rebasing and preparing to land this PR now, after finishing the initial data tiling op codegen prototype. I'll wait a bit in case there are comments.
Based on #19730 now. Edit: Rebased now.
This PR moves the `SwapExpandShapeWithSlicePattern` to Codegen/Common/Transforms, and adds the pattern to the FuseAndHoistParallelLoops pass.

This pattern is generally useful for tiling fusion, because it exposes more producer fusion opportunities when there are reshapes in the IR. More specifically, it is useful in combination with the pattern introduced in #19295. That pattern creates an expanded parallel_insert_slice, and an expand_shape on the corresponding init block arg in the forall loop body, which makes the slice on the init argument lower dimensional than the parallel_insert_slice at the end. It is better for bufferization if these slices are the same, and this pattern makes that happen by bubbling the slice of the init arg up through the expand_shape, increasing its dimensionality to match the parallel_insert_slice.

---------

Signed-off-by: Max Dawkins <[email protected]>
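A rough sketch of what the swap does (the shapes, offsets, and SSA names here are illustrative assumptions, not code from the PR): a slice of an expanded tensor is rewritten into a slice of the collapsed source followed by an expand_shape of the smaller tile, which raises the slice's dimensionality.

```mlir
// Before: the slice is taken on the expanded (2-D) tensor.
%e = tensor.expand_shape %src [[0, 1]] output_shape [8, 64]
    : tensor<512xf32> into tensor<8x64xf32>
%s = tensor.extract_slice %e[2, 0] [1, 64] [1, 1]
    : tensor<8x64xf32> to tensor<1x64xf32>

// After: slice the collapsed (1-D) source first, then expand only the
// small tile. Offset 128 = 2 * 64 in the collapsed index space.
%t = tensor.extract_slice %src[128] [64] [1]
    : tensor<512xf32> to tensor<64xf32>
%s2 = tensor.expand_shape %t [[0, 1]] output_shape [1, 64]
    : tensor<64xf32> into tensor<1x64xf32>
```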
…19296)

This PR adds a pattern to fuse a consumer tensor.extract_slice into a producer scf.forall op. The transform is added to FuseAndHoistParallelLoops, where it helps to fuse tensor.unpack ops with extract_slice semantics into producer loops. This is needed when targeting MFMA intrinsics for unaligned shapes, and also in generating code for unset encoding ops on GPU.

This is a follow up to #19295, which has the complementing pattern for collapse_shape. The PR also adds a transform op to keep the long lit tests separate from the FuseAndHoistParallelLoops tests.

---------

Signed-off-by: Max Dawkins <[email protected]>
Signed-off-by: Max Dawkins <[email protected]>
Co-authored-by: Max Dawkins <[email protected]>
This PR adds a pattern to fuse a consumer tensor.collapse_shape into a producer scf.forall op. The transform is added to FuseAndHoistParallelLoops, where it helps to fuse tensor.unpack ops with extract_slice semantics into producer loops. This is needed when targeting MFMA intrinsics for unaligned shapes, and also in generating code for unset encoding ops on GPU.
The PR also adds a transform op to keep the long lit tests separate from the FuseAndHoistParallelLoops tests.
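A rough before/after sketch of the rewrite (the shapes, reassociation, and loop internals below are illustrative assumptions, not code from the PR): the collapse_shape is absorbed into the loop, and the expanded view the body needs is recovered with an expand_shape on the init block argument.

```mlir
// Before: the consumer collapse_shape sits outside the producer forall.
%0 = scf.forall (%i) in (8) shared_outs(%o = %init) -> (tensor<8x4x16xf32>) {
  // ... compute a tile and parallel_insert_slice it into %o ...
}
%1 = tensor.collapse_shape %0 [[0], [1, 2]]
    : tensor<8x4x16xf32> into tensor<8x64xf32>

// After: the forall yields the collapsed type directly, and the body
// recovers the expanded view of the (now collapsed) init argument.
%init_c = tensor.collapse_shape %init [[0], [1, 2]]
    : tensor<8x4x16xf32> into tensor<8x64xf32>
%2 = scf.forall (%i) in (8) shared_outs(%o = %init_c) -> (tensor<8x64xf32>) {
  %o_e = tensor.expand_shape %o [[0], [1, 2]] output_shape [8, 4, 16]
      : tensor<8x64xf32> into tensor<8x4x16xf32>
  // ... the original body uses %o_e, with the insertion at the end
  // adjusted to match ...
}
```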