
[Stream] Add layouts to encodings for all stream tensor AffinityOp. #19726

Open
wants to merge 1 commit into base: main

Conversation

@hanhanW hanhanW commented Jan 17, 2025

The revision adds support for the rest of the AffinityOp ops that have the TensorPhase trait, i.e., TensorCloneOp, TensorSliceOp, TensorFillOp, and TensorUpdateOp. Two stream tensor ops, stream.tensor.load and stream.tensor.store, do not implement the AffinityOpInterface, so they are not supported in this revision. We should be able to track the resource affinity for these two ops, but doing so requires additional analysis, so they are out of scope here.

The revision also adds the missing documentation to the `addLayoutsToTensorPhaseOps` method.
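
For illustration, here is a minimal sketch of what the per-op update could look like, assuming a hypothetical `resolveLayoutAttr` helper and assumed accessor names; this is not the actual implementation in the revision:

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Assumed helper (hypothetical): maps an execution affinity to a
// layout-bearing encoding attribute for the given tensor type.
Attribute resolveLayoutAttr(Attribute affinityAttr, RankedTensorType type);

// Hypothetical sketch: rewrite one TensorPhase op (e.g. TensorCloneOp) so
// that its result encoding carries resolved layout information.
template <typename OpTy>
static LogicalResult updateResultLayout(RewriterBase &rewriter, OpTy op) {
  auto encodingType = dyn_cast<RankedTensorType>(op.getResultEncoding());
  if (!encodingType || !encodingType.getEncoding()) {
    return success(); // No encoding attached; nothing to resolve.
  }
  // Resolve layouts for the affinity the op executes on and rebuild the
  // tensor type with the layout-bearing encoding attribute.
  Attribute layoutAttr = resolveLayoutAttr(op.getAffinityAttr(), encodingType);
  auto newEncodingType = RankedTensorType::get(
      encodingType.getShape(), encodingType.getElementType(), layoutAttr);
  // `setResultEncoding` stands in for whatever generated setter the op
  // provides for its result_encoding attribute.
  rewriter.modifyOpInPlace(op, [&] { op.setResultEncoding(newEncodingType); });
  return success();
}
```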

Signed-off-by: hanhanW <[email protected]>
@hanhanW hanhanW requested a review from lialan January 17, 2025 08:51
@hanhanW hanhanW requested a review from benvanik as a code owner January 17, 2025 08:51

@benvanik benvanik left a comment

Hmm, transfer ops are ones where we can't really have encoding changes, or possibly even encodings at all - we could allow them at the cost of never hitting the DMA path, and that's probably bad. We may have to brainstorm on that -- today I'd say we need to add a trait (SameOperandsAndResultEncodings) to these ops and verify that they can be implemented with memcpy, but I'm not sure even that is enough.

Basically, if any transfer op besides clone (which is "entire tensor") has an encoding, we have to generate a dispatch (codegen it, bundle it with the binary, and ship it) for that specific set of operand and result encodings, and those will run on execution units instead of DMA ones (we can't cuMemcpy a slice of an encoded tensor if it has funny padding, and can't have cuMemcpy swizzle/change encodings for us).

If you were to run any code exercising this today with this change, it would be incorrect (unless it's an entire tensor with the same operand and result encoding).
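
To make the suggested constraint concrete, here is a hedged sketch of what a `SameOperandsAndResultEncodings` trait verifier could look like. The trait does not exist today, and stream tensor ops carry encodings as type attributes rather than on their resource-typed SSA values, so a real implementation would compare those attributes instead; this version checks tensor-typed values for simplicity:

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/OpDefinition.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

using namespace mlir;

// Hypothetical trait: rejects ops whose tensor operands/results carry
// mismatched encodings, since those cannot lower to a plain memcpy/DMA.
template <typename ConcreteType>
class SameOperandsAndResultEncodings
    : public OpTrait::TraitBase<ConcreteType, SameOperandsAndResultEncodings> {
public:
  static LogicalResult verifyTrait(Operation *op) {
    Attribute expectedEncoding;
    bool seenTensor = false;
    SmallVector<Type> types = llvm::to_vector(op->getOperandTypes());
    llvm::append_range(types, op->getResultTypes());
    for (Type type : types) {
      auto tensorType = dyn_cast<RankedTensorType>(type);
      if (!tensorType)
        continue;
      if (!seenTensor) {
        expectedEncoding = tensorType.getEncoding();
        seenTensor = true;
        continue;
      }
      if (tensorType.getEncoding() != expectedEncoding)
        return op->emitOpError(
            "operand and result encodings must match to stay on the DMA path");
    }
    return success();
  }
};
```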

Such conversions to dispatches happen in MaterializeBuiltinsPass - e.g., if you try to fill with an i64 value we have to use a dispatch as GPUs don't support i64 fills. That pass would need to look at any of the ops, decide if they can be implemented with memcpy/memset, and if not emit new executables with the custom load/store with the encodings.
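
As a rough illustration of that decision (not the actual pass code; `emitFillBuiltinDispatch` and the `getValue` accessor are assumptions):

```cpp
#include "iree/compiler/Dialect/Stream/IR/StreamOps.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;
using namespace mlir::iree_compiler;

// Assumed helper (hypothetical): emits an executable performing the fill and
// replaces the op with a dispatch to it.
LogicalResult emitFillBuiltinDispatch(RewriterBase &rewriter,
                                      IREE::Stream::TensorFillOp fillOp);

// Hypothetical sketch of the memset-vs-dispatch decision for a fill: narrow
// patterns stay on the device fill/memset path, while wide (e.g. i64)
// patterns are routed to a generated builtin dispatch.
static LogicalResult materializeFillIfNeeded(RewriterBase &rewriter,
                                             IREE::Stream::TensorFillOp fillOp) {
  Type patternType = fillOp.getValue().getType();
  if (patternType.isIntOrFloat() &&
      patternType.getIntOrFloatBitWidth() <= 32) {
    return success(); // Directly supported by device fill/memset.
  }
  return emitFillBuiltinDispatch(rewriter, fillOp);
}
```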

We may have to look at some IR and see if we can handle that better earlier on - after MaterializeBuiltinsPass we have to have everything be either supportable by devices or turned into dispatches, but that's pretty late in the flow and we want to try to guarantee that we rarely (if ever) end up down those paths (we're properly fusing with producers/consumers).

IOW I'm worried this may not be implementable - we need to iterate a bit.

return success();
}

/// Updates the update_encoding for `op`. The op have to define a

Suggested change
/// Updates the update_encoding for `op`. The op have to define a
/// Updates the update_encoding for `op`. The op has to define a

@@ -141,6 +159,52 @@ updateResultEncoding(RewriterBase &rewriter, OpTy op,
return success();
}

/// Updates the target_encoding for `op`. The op have to define a

Suggested change
/// Updates the target_encoding for `op`. The op have to define a
/// Updates the target_encoding for `op`. The op has to define a

/// tensor type, the method resolves the layouts, strips outdated information,
/// and adds the resolved layouts to the encodings. The updated encodings should
/// have enough information for other lowering transformations.
/// TODO(hanchung): Add support for stream.tensor.load ops and
/// stream.tensor.store ops.

There's some TBD work to make load/store better - it'll likely require the same behavior as the current implementation, and it's something we'll want to do earlier on in flow; otherwise we'll need a stream builtin that can adjust to different data types and perform the conversion (for multiple elements), or something like a switch that translates the load/store indices (for a single element) on the host.
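
For the single-element case, the index-translation idea might look something like this sketch (the strided-layout model and all names here are assumptions for illustration):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include <cstdint>

// Hypothetical host-side translation of logical load/store indices into a
// storage offset for an encoded (e.g. padded, row-major) tensor. The strides
// would be derived from the resolved encoding layout, with padding baked in.
static int64_t mapToStorageOffset(llvm::ArrayRef<int64_t> indices,
                                  llvm::ArrayRef<int64_t> encodedStrides) {
  int64_t offset = 0;
  for (auto [index, stride] : llvm::zip_equal(indices, encodedStrides))
    offset += index * stride;
  return offset;
}
```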
