
[GPU] Add pattern to fuse tensor.collapse_shape into forall producer #19295

Merged 1 commit into iree-org:main on Jan 22, 2025

Conversation

@Max191 (Contributor) commented Nov 25, 2024

This PR adds a pattern to fuse a consumer tensor.collapse_shape into a producer scf.forall op. The transform is added to FuseAndHoistParallelLoops, where it helps to fuse tensor.unpack ops with extract_slice semantics into producer loops. This is needed when targeting MFMA intrinsics for unaligned shapes, and also in generating code for unset encoding ops on GPU.

The PR also adds a transform op to keep the long lit tests separate from the FuseAndHoistParallelLoop tests.
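Schematically, the rewrite pulls the trailing collapse_shape into the loop by changing the loop's result type (a hand-written sketch with illustrative shapes; the exact assembly in the PR's lit tests may differ):

```mlir
// Before: tensor.collapse_shape consumes the scf.forall result.
%0 = scf.forall (%i) in (4) shared_outs(%init = %empty) -> (tensor<4x8x32xf32>) {
  // ... compute %tile : tensor<1x8x32xf32> ...
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %tile into %init[%i, 0, 0] [1, 8, 32] [1, 1, 1]
        : tensor<1x8x32xf32> into tensor<4x8x32xf32>
  }
}
%collapsed = tensor.collapse_shape %0 [[0, 1], [2]]
    : tensor<4x8x32xf32> into tensor<32x32xf32>

// After: the loop yields the collapsed type directly; the init block argument
// is expanded back inside the body, and the tile is written with an expanded
// parallel_insert_slice.
%1 = scf.forall (%i) in (4) shared_outs(%init = %empty2) -> (tensor<32x32xf32>) {
  %expanded = tensor.expand_shape %init [[0, 1], [2]] output_shape [4, 8, 32]
      : tensor<32x32xf32> into tensor<4x8x32xf32>
  // ... compute %tile : tensor<1x8x32xf32> from %expanded ...
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %tile into %expanded[%i, 0, 0] [1, 8, 32] [1, 1, 1]
        : tensor<1x8x32xf32> into tensor<4x8x32xf32>
  }
}
```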

@Max191 (Contributor, Author) commented Nov 25, 2024

PR is mostly ready, but I need to add more lit tests and improve the docs.

EDIT: Ready now

@Max191 force-pushed the collapse-shape-forall-fusion branch from 4eb7167 to 04c4f2b on November 26, 2024 at 18:30
@Max191 marked this pull request as ready for review on November 26, 2024 at 18:31
@Max191 force-pushed the collapse-shape-forall-fusion branch from 04c4f2b to 5639b90 on November 26, 2024 at 18:49
@Max191 force-pushed the collapse-shape-forall-fusion branch from 5639b90 to 688e8cb on November 26, 2024 at 19:55
@@ -228,4 +228,38 @@ def FuseForallOp : Op<Transform_Dialect, "iree.fuse_forall",
let cppNamespace = "mlir::iree_compiler::IREE::transform_dialect";
}

def FuseCollapseShapeWithForallOp : Op<Transform_Dialect, "iree.fuse_collapse_shape_with_forall",
A reviewer (Contributor) suggested renaming the op:

def FuseCollapseShapeIntoForallOp

@qedawkins (Contributor) left a comment:

A few comments and a suggestion about how to split up the pattern a bit. If you can wait for #19231, we can reuse the pattern there for swapping expand_shape with extract_slice as well.

@Max191 force-pushed the collapse-shape-forall-fusion branch from 688e8cb to 102bc5a on December 4, 2024 at 20:17
@Max191 (Contributor, Author) commented Dec 4, 2024

This PR now depends on #19231. The tile_and_fuse pipeline test is failing because the pattern from that PR is needed here.

@Max191 requested review from qedawkins and pashu123 on December 4, 2024 at 21:01
@qedawkins (Contributor) left a comment:

Is this ready for another review or does it need a rebase first?

@Max191 (Contributor, Author) commented Dec 9, 2024

> Is this ready for another review or does it need a rebase first?

I am going to send another separate PR that this should rebase on (to fix the test failure), but the code in here shouldn't change, so it can be reviewed now.

@Max191 requested a review from qedawkins on December 9, 2024 at 21:02
@qedawkins (Contributor) left a comment:

Cool, LGTM

@Max191 force-pushed the collapse-shape-forall-fusion branch from 82d6046 to 2ba3856 on January 16, 2025 at 17:52
@Max191 (Contributor, Author) commented Jan 16, 2025

Rebasing and preparing to land this PR now after finishing initial data tiling op codegen prototype. I'll wait a bit in case there are comments.
cc @qedawkins

@Max191 force-pushed the collapse-shape-forall-fusion branch from e9ab180 to 6425acc on January 17, 2025 at 16:12
@Max191 requested a review from hanhanW as a code owner on January 17, 2025 at 16:12
@Max191 (Contributor, Author) commented Jan 17, 2025

Based on #19730 now

Edit: Rebased now

Max191 added a commit that referenced this pull request Jan 21, 2025
This PR moves the `SwapExpandShapeWithSlicePattern` to
Codegen/Common/Transforms, and adds the pattern to the
FuseAndHoistParallelLoops pass.

This pattern is generally useful for tiling fusion, because it exposes
more producer fusion opportunities when there are reshapes in the IR,
but more specifically, it is useful in combination with the pattern
introduced in #19295. That pattern
creates an expanded parallel_insert_slice, and an expand_shape on the
corresponding init block arg in the forall loop body. This makes the
slice on the init argument lower dimensional than the
parallel_insert_slice at the end. It is better for bufferization if
these slices are the same, and this pattern makes that happen by
bubbling the slice of the init arg up through the expand_shape,
increasing the dimensionality to match the parallel_insert_slice.
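In IR terms, the swap looks roughly like this (illustrative shapes and a hypothetical offset computation, not copied from the patch):

```mlir
// Before: the slice is taken from the expanded tensor.
%e = tensor.expand_shape %src [[0, 1], [2]] output_shape [4, 8, 32]
    : tensor<32x32xf32> into tensor<4x8x32xf32>
%s = tensor.extract_slice %e[%i, 0, 0] [1, 8, 32] [1, 1, 1]
    : tensor<4x8x32xf32> to tensor<1x8x32xf32>

// After: slice the collapsed source first (offset %i * 8 folds the two
// expanded dims back into one), then expand only the small slice.
%off = affine.apply affine_map<(d0) -> (d0 * 8)>(%i)
%s2 = tensor.extract_slice %src[%off, 0] [8, 32] [1, 1]
    : tensor<32x32xf32> to tensor<8x32xf32>
%e2 = tensor.expand_shape %s2 [[0, 1], [2]] output_shape [1, 8, 32]
    : tensor<8x32xf32> into tensor<1x8x32xf32>
```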

---------

Signed-off-by: Max Dawkins <[email protected]>
@Max191 force-pushed the collapse-shape-forall-fusion branch 3 times, most recently from 13fc8f9 to da68bc2 on January 21, 2025 at 19:00
@Max191 force-pushed the collapse-shape-forall-fusion branch from da68bc2 to 8d74baa on January 21, 2025 at 20:29
@Max191 Max191 merged commit a430695 into iree-org:main Jan 22, 2025
40 checks passed
Max191 added a commit that referenced this pull request Jan 28, 2025
…19296)

This PR adds a pattern to fuse a consumer tensor.extract_slice into a
producer scf.forall op. The transform is added to
FuseAndHoistParallelLoops, where it helps to fuse tensor.unpack ops with
extract_slice semantics into producer loops. This is needed when
targeting MFMA intrinsics for unaligned shapes, and also in generating
code for unset encoding ops on GPU. This is a follow up to
#19295, which has the complementing
pattern for collapse_shape.

The PR also adds a transform op to keep the long lit tests separate from
the FuseAndHoistParallelLoop tests.
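As a rough sketch (shapes are illustrative for the unaligned case, not taken from the PR), the complementing rewrite folds the trailing extract_slice into the loop:

```mlir
// Before: tensor.extract_slice trims padding off the scf.forall result,
// as produced by a tensor.unpack with extract_slice semantics.
%0 = scf.forall (%i, %j) in (2, 2) shared_outs(%init = %full) -> (tensor<36x36xf32>) {
  // ... tiles inserted into %init ...
}
%s = tensor.extract_slice %0[0, 0] [33, 33] [1, 1]
    : tensor<36x36xf32> to tensor<33x33xf32>

// After: the loop directly produces the sliced type, and each
// parallel_insert_slice writes a correspondingly clipped tile.
%1 = scf.forall (%i, %j) in (2, 2) shared_outs(%init = %trimmed) -> (tensor<33x33xf32>) {
  // ... clipped tiles inserted into %init ...
}
```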

---------

Signed-off-by: Max Dawkins <[email protected]>
Signed-off-by: Max Dawkins <[email protected]>
Co-authored-by: Max Dawkins <[email protected]>
3 participants