Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GPU] Avoid fusing slices of already tiled ops #19404

Merged
merged 1 commit into from
Jan 17, 2025

Conversation

Max191
Copy link
Contributor

@Max191 Max191 commented Dec 6, 2024

This fixes a bug in FuseAndHoistParallelLoops that can cause an infinite loop for imperfectly aligned unpack producer fusion. The FuseTilableSliceProducers pattern does not check that the producer operation is not already in the tiled loop, and the tiling implementation of imperfectly aligned unpack ops generates an extract_slice on the result of the tiled unpack. This would cause an infinite recursive application of the producer fusion pattern. This fixes the bug by enforcing that the producer operation is outside of the slice's parent loop.

%c128 = arith.constant 128 : index
%c0 = arith.constant 0 : index
%0 = tensor.empty() : tensor<128xf16>
%unpack = tensor.unpack %arg0 inner_dims_pos = [0] inner_tiles = [31] into %0 : tensor<5x31xf16> -> tensor<128xf16>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this unpack is tiled with a tile size of 64, an extract_slice must be generated on its result. This is what used to cause the infinite loop.

Copy link
Contributor

@qedawkins qedawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has a bad smell, but the existing pattern has a worse smell :P

That said, the tiling implementation of unpack isn't great if this is the case, we probably want to think about decomposing before we get here.

@Max191
Copy link
Contributor Author

Max191 commented Dec 9, 2024

This PR has a bad smell, but the existing pattern has a worse smell :P

Yeah, the real problem is that producer fusion is happening as a pattern rewrite in FuseAndHoistParallelLoops. In the normal TilingInterface fusion, this is done with a worklist, and the problematic extract_slice is never added to the worklist.

As a side note, I think the FuseAndHoistParallelLoops pass is due for a refactoring. There are a lot of things happening in this pass, and it can be very difficult to debug/understand for those who don't already have a lot of context about the pass. Maybe we can rethink some of these smelly patterns when we eventually refactor the pass.

That said, the tiling implementation of unpack isn't great if this is the case, we probably want to think about decomposing before we get here.

This is what I'm doing now on my data tiling branch. This bug is just something that I came across in early experimentation, and I didn't want to leave it unfixed. We only need to tile unpack ops at workgroup distribution level, and then they get decomposed before any other tiling.

@Max191 Max191 force-pushed the fuse-and-hoist-avoid-tiling-tiled-ops branch from 9139220 to aef7c7e Compare January 17, 2025 15:15
@Max191 Max191 merged commit 75c9e86 into iree-org:main Jan 17, 2025
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants