Allow optimizing mask conversions on x64 as well #110195
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
// We don't actually have a convert here, but we do have a case where
// the mask is being used in a ConditionalSelect and therefore can be
// consumed directly as a mask. While the IR shows TYP_SIMD, it gets
// handled in lowering as part of the general embedded-mask support.
In other words, the conditional select operations support both TYP_SIMD and TYP_MASK for the mask operand, right? With the TYP_SIMD one being a 0/1 in each lane, and the TYP_MASK one being a bit mask.
In other words the conditional select operations support both TYP_SIMD and TYP_MASK for the mask operand, right?
Right.
With the TYP_SIMD one being a 0/1 in each lane, and the TYP_MASK one being a bit mask.
Rather, TYP_SIMD is Zero or AllBitsSet in each lane, and TYP_MASK is a compressed form: a bitmask with 1 bit per element.
This comes about from SIMD comparisons returning Zero/AllBitsSet per lane, so that the result can be used with all bitmask operations, not just with conditional select or similar.
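To make the two representations concrete, here is a minimal standalone sketch. It is not JIT code: `Vec4`, `CompareEqual`, `VectorToMask`, and `MaskToVector` are hypothetical names for a toy 4-lane model, and taking the top bit of each lane when compressing is an assumption of this model.

```cpp
#include <cstdint>

// Hypothetical 4-lane vector of 32-bit elements; stands in for a TYP_SIMD value.
struct Vec4 { std::uint32_t lane[4]; };

// Per-lane comparison: each result lane is AllBitsSet (0xFFFFFFFF) or Zero.
constexpr Vec4 CompareEqual(Vec4 a, Vec4 b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        r.lane[i] = (a.lane[i] == b.lane[i]) ? 0xFFFFFFFFu : 0u;
    return r;
}

// Vector -> mask: compress to one bit per element (assumed: the lane's top bit).
constexpr std::uint8_t VectorToMask(Vec4 v) {
    std::uint8_t m = 0;
    for (int i = 0; i < 4; ++i)
        m |= static_cast<std::uint8_t>((v.lane[i] >> 31) << i);
    return m;
}

// Mask -> vector: expand each bit back to a Zero or AllBitsSet lane.
constexpr Vec4 MaskToVector(std::uint8_t m) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        r.lane[i] = ((m >> i) & 1u) ? 0xFFFFFFFFu : 0u;
    return r;
}
```

In this model, comparing `{1, 2, 3, 4}` with `{1, 0, 3, 0}` gives AllBitsSet in lanes 0 and 2, which compresses to the mask `0x5`. Expanding a mask and compressing it again returns the original mask, which is why a back-to-back conversion pair carries no information.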
LGTM. Also cc @a74nh for awareness
* Allow optimizing mask conversions on x64 as well
* Ensure the right operand is accessed on xarch
* Minimally handle CndSel as part of optimizing mask conversions
* Add some additional comments and clean up the logic a bit
* Apply formatting patch
This mostly just extends the mask conversion optimization to light up on x64 as well. To achieve that, it adds the slightly different handling needed for the conversion cost and ensures the right operand is accessed.
It additionally adds support for one more important scenario: recognizing that ConditionalSelect, despite taking a vector in IR, has special support to be lowered/contained such that the mask can be consumed directly. The same is technically also possible for the various bitwise operations, but those are less important to handle initially.
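As a rough illustration of why the mask can be consumed directly, the sketch below models a conditional select in both forms. This is a toy model rather than the JIT's lowering: `Vec4`, `MaskToVector`, `SelectByVector`, `SelectByMask`, and `Equal` are hypothetical names, and the 4-lane, 32-bit layout is an assumption.

```cpp
#include <cstdint>

// Hypothetical 4-lane vector of 32-bit elements; stands in for a TYP_SIMD value.
struct Vec4 { std::uint32_t lane[4]; };

// Mask -> vector: expand each bit to a Zero or AllBitsSet lane.
constexpr Vec4 MaskToVector(std::uint8_t m) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        r.lane[i] = ((m >> i) & 1u) ? 0xFFFFFFFFu : 0u;
    return r;
}

// Select with a vector mask: plain bitwise blend, (mask & a) | (~mask & b).
constexpr Vec4 SelectByVector(Vec4 mask, Vec4 a, Vec4 b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        r.lane[i] = (mask.lane[i] & a.lane[i]) | (~mask.lane[i] & b.lane[i]);
    return r;
}

// Select with a compressed mask: one bit per element picks a or b.
constexpr Vec4 SelectByMask(std::uint8_t m, Vec4 a, Vec4 b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        r.lane[i] = ((m >> i) & 1u) ? a.lane[i] : b.lane[i];
    return r;
}

// Lane-wise equality helper for checking results.
constexpr bool Equal(Vec4 x, Vec4 y) {
    for (int i = 0; i < 4; ++i)
        if (x.lane[i] != y.lane[i]) return false;
    return true;
}
```

In this model, `SelectByVector(MaskToVector(m), a, b)` and `SelectByMask(m, a, b)` produce the same lanes for any mask `m`, so expanding the mask to a vector first adds nothing when the consumer can take the mask form.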