[Stream] Add layouts to encodings for all stream tensor AffinityOp. #19726
base: main
Conversation
The revision adds support for the rest of the AffinityOp ops that have the TensorPhase trait, i.e., TensorCloneOp, TensorSliceOp, TensorFillOp, and TensorUpdateOp. Two stream tensor ops do not implement the AffinityOpInterface, so they are not supported in this revision: the stream.tensor.load op and the stream.tensor.store op. We should be able to track the resource affinity for these two ops, but it requires additional analysis, so they are out of scope for this revision. The revision also adds the missing documentation to the `addLayoutsToTensorPhaseOps` method. Signed-off-by: hanhanW <[email protected]>
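For context, a minimal sketch of what adding a resolved layout to one op's result encoding might look like, assuming TypeAttr-style `getResultEncoding`/`setResultEncoding` accessors and an already-resolved layout attribute (both assumptions; the PR's actual helpers may differ):

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Hypothetical sketch: rebuild a stream tensor op's result encoding type so
// its tensor encoding carries the resolved layout. Accessor names follow the
// stream dialect's TypeAttr conventions but are assumptions here.
template <typename OpTy>
static LogicalResult updateResultEncoding(RewriterBase &rewriter, OpTy op,
                                          Attribute resolvedLayout) {
  auto tensorType = dyn_cast<RankedTensorType>(op.getResultEncoding());
  if (!tensorType)
    return failure(); // Only ranked tensor encodings are handled.
  // Rebuild the tensor type with the resolved layout as its encoding,
  // dropping whatever (now outdated) encoding attribute was there before.
  auto newType = RankedTensorType::get(
      tensorType.getShape(), tensorType.getElementType(), resolvedLayout);
  rewriter.modifyOpInPlace(op, [&] { op.setResultEncoding(newType); });
  return success();
}
```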
Hmm, transfer ops are ones where we can't really have encoding changes, or possibly even encodings at all - we could at the cost of never hitting the DMA path with them, and that's probably bad. We may have to brainstorm on that - today I'd say we need to add a trait (SameOperandsAndResultEncodings) to these ops and verify that they can be implemented with memcpy, but I'm not sure even that is enough. Basically, if any transfer op besides clone (which is "entire tensor") has an encoding, we have to generate a dispatch (codegen it, bundle it with the binary, and ship it) for that specific set of operand and result encodings, and those dispatches will run on execution units instead of DMA ones (we can't cuMemcpy a slice of an encoded tensor if it has funny padding, and can't have cuMemcpy swizzle/change encodings for us). Any code doing this today with this change would be incorrect (unless it's an entire tensor with the same operand and result encoding).
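For illustration, here's a sketch of the check such a trait's verifier might perform if it existed - the trait name comes from the comment above, and everything else (the function, the type-walking approach) is an assumption, not existing IREE code. For stream.tensor.* ops the tensor types actually live in encoding attributes rather than operand/result types, so a real verifier would inspect those instead:

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// Sketch of a hypothetical SameOperandsAndResultEncodings verification:
// every ranked tensor type on the op must carry the same encoding attribute,
// which keeps the op expressible as a flat memcpy.
static LogicalResult verifySameOperandsAndResultEncodings(Operation *op) {
  Attribute expected;
  bool seenTensor = false;
  auto matches = [&](Type type) {
    auto tensorType = dyn_cast<RankedTensorType>(type);
    if (!tensorType)
      return true; // Ignore non-tensor types (sizes, affinities, ...).
    if (!seenTensor) {
      expected = tensorType.getEncoding();
      seenTensor = true;
      return true;
    }
    return tensorType.getEncoding() == expected;
  };
  for (Type type : op->getOperandTypes())
    if (!matches(type))
      return op->emitOpError("operand encoding differs from prior encodings");
  for (Type type : op->getResultTypes())
    if (!matches(type))
      return op->emitOpError("result encoding differs from operand encodings");
  return success();
}
```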
Such conversions to dispatches happen in MaterializeBuiltinsPass - e.g., if you try to fill with an i64 value we have to use a dispatch, as GPUs don't support i64 fills. That pass would need to look at any of these ops, decide if they can be implemented with memcpy/memset, and if not, emit new executables with custom load/store logic for the encodings.
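As a sketch of that kind of decision (the function name and the 8/16/32-bit whitelist are assumptions - actual fill-width support varies per device/HAL backend):

```cpp
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Illustrative predicate in the spirit of that pass: a fill can stay on the
// DMA path only if the target is unencoded and the pattern width is one the
// device fill API supports.
static bool canLowerFillToMemset(Type patternType, Attribute encoding) {
  if (encoding)
    return false; // Encoded layouts may pad/swizzle; memset can't honor that.
  if (!patternType.isIntOrFloat())
    return false;
  unsigned bitWidth = patternType.getIntOrFloatBitWidth();
  return bitWidth == 8 || bitWidth == 16 || bitWidth == 32;
}
```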
We may have to look at some IR and see if we can handle that better earlier on - after MaterializeBuiltinsPass we have to have everything be either supportable by devices or turned into dispatches, but that's pretty late in the flow and we want to try to guarantee that we rarely (if ever) end up down those paths (we're properly fusing with producers/consumers).
IOW I'm worried this may not be implementable - we need to iterate a bit.
return success();
}

/// Updates the update_encoding for `op`. The op have to define a
Suggested change:
- /// Updates the update_encoding for `op`. The op have to define a
+ /// Updates the update_encoding for `op`. The op has to define a
@@ -141,6 +159,52 @@ updateResultEncoding(RewriterBase &rewriter, OpTy op,
  return success();
}

/// Updates the target_encoding for `op`. The op have to define a
Suggested change:
- /// Updates the target_encoding for `op`. The op have to define a
+ /// Updates the target_encoding for `op`. The op has to define a
/// tensor type, the method resolves the layouts, strips outdated information,
/// and adds the resolved layouts to the encodings. The updated encodings should
/// have enough information for other lowering transformations.
/// TODO(hanchung): Add support for stream.tensor.load ops and
/// stream.tensor.store ops.
There's some TBD work to make load/store better - it'll likely require the same behavior as the current implementation, and it's something we'll want to do earlier on in flow. Otherwise we'll need a stream builtin that can adjust to different data types and perform the conversion (for multiple elements), or something like a switch that translates the load/store indices (for single element) on the host.
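For the single-element case, the index translation could look something like this sketch - the padded layout struct is hypothetical, and a real implementation would derive strides from the encoding's layout attribute:

```cpp
#include <array>
#include <cstdint>

// Illustrative host-side index translation: mapping a logical 2-D index into
// the physical offset of a tensor stored with padded rows.
struct PaddedRowMajor2D {
  int64_t rowStride; // Elements per row, including trailing padding.
  int64_t offsetOf(std::array<int64_t, 2> index) const {
    return index[0] * rowStride + index[1];
  }
};

// Single-element load against the padded storage, performed on the host.
static float hostLoad(const float *storage, PaddedRowMajor2D layout,
                      std::array<int64_t, 2> index) {
  return storage[layout.offsetOf(index)];
}
```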