-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIE2P] Instruction select G_SELECT. #291
Conversation
a922878
to
a917d57
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few suggestions to refactor some code, overall this look ok
def : Pat<(vec512Ty (select (i32 eRS8:$rs1), VEC512:$rs2, VEC512:$rs3)), | ||
(vec512Ty (VSEL_32 VEC512:$rs2, VEC512:$rs3, (ADD_add_r_ri eR:$rs1, (i32 -1))))>; | ||
} | ||
foreach vec1024Ty = [v128i8, v64i16, v32i32] in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we adapt the legalization rules for G_SELECT
and clamp to 512-bit
vector types?
Then we only need to support the 512-bit patterns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a nice idea.
May be I have one more suggestion, since we only allow 512-bit vector types. Can we get rid of explicitly converting accumulator to vector by bitcasts. And constrain G_SELECT to vector register banks in the regbank selection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we are just delaying this action from legalizer to regbankselect?
I think nothing should pass legalizer, if we don't intend it to be legal.
a917d57
to
f8c5009
Compare
aaa8426
to
63d335f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good. Few nits and a suggestion.
def : Pat<(vec512Ty (select (i32 eRS8:$rs1), VEC512:$rs2, VEC512:$rs3)), | ||
(vec512Ty (VSEL_32 VEC512:$rs2, VEC512:$rs3, (ADD_add_r_ri eR:$rs1, (i32 -1))))>; | ||
} | ||
foreach vec1024Ty = [v128i8, v64i16, v32i32] in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a nice idea.
May be I have one more suggestion, since we only allow 512-bit vector types. Can we get rid of explicitly converting accumulator to vector by bitcasts. And constrain G_SELECT to vector register banks in the regbank selection.
|
||
const Register NewDstReg = MRI.createGenericVirtualRegister(NewVecTy); | ||
MIRBuilder.buildInstr(MI.getOpcode(), {NewDstReg}, | ||
{SrcReg0, NewSrcReg1, NewSrcReg2}, MI.getFlags()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check: SrcReg0 is false or true, which correctly defines a vector condition for element 0, the only one we're interested in. (Perhaps give an outline of this strategy in a comment at the start of the function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
G_SELECT only gets a 0/1 value as condition, and selects either SrcReg1 if condition is true, i.e. 1, or SrcReg2 if condition is false, i.e. 0.
AIE's instruction works slightly differently:
Each bit in sel selects either from xsrc0 (value zero) or xsrc1 (value one).
Hence we need to translate the 0/1 bit to a bitmask in instruction selection.
We do this using cond - 1
. Which will result in a zero mask if cond is true
, or an all 1 mask (-1) if cond is false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my confusion. We just legalize to the vector size by padding, we don't map to VSEL here.
But note that you could leave the condition as it is for vector sizes <= 32, because we have 8, 16 and 32 bit vsel.
f1b8457
to
68c7849
Compare
|
||
const Register NewDstReg = MRI.createGenericVirtualRegister(NewVecTy); | ||
MIRBuilder.buildInstr(MI.getOpcode(), {NewDstReg}, | ||
{SrcReg0, NewSrcReg1, NewSrcReg2}, MI.getFlags()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
G_SELECT only gets a 0/1 value as condition, and selects either SrcReg1 if condition is true, i.e. 1, or SrcReg2 if condition is false, i.e. 0.
AIE's instruction works slightly differently:
Each bit in sel selects either from xsrc0 (value zero) or xsrc1 (value one).
Hence we need to translate the 0/1 bit to a bitmask in instruction selection.
We do this using cond - 1
. Which will result in a zero mask if cond is true
, or an all 1 mask (-1) if cond is false
.
# RUN: llc -mtriple aie2p -run-pass=regbankselect -regbankselect-greedy %s -verify-machineinstrs -o - | FileCheck --check-prefix=FAST %s | ||
# (c) Copyright 2024-2025 Advanced Micro Devices, Inc. or its affiliates | ||
|
||
# RUN: llc -mtriple aie2p -run-pass=legalizer,regbankselect -regbankselect-fast %s -verify-machineinstrs -o - | FileCheck --check-prefix=GREEDY %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to not run the legalizer in a regbankselect test.
We can simply remove the illegal cases from this test, as long as we cover them in the legalizer test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok removed.
68c7849
to
65a1092
Compare
return QueryTy.isVector() && QueryTy.getSizeInBits() < Size; | ||
}; | ||
} | ||
|
||
LegalityPredicate | ||
negatePredicate(const std::function<bool(const LegalityQuery &)> &Func) { | ||
return [=](const LegalityQuery &Query) { return !Func(Query); }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I wouldn't dare writing this. That's a reference to a function handle abstraction captured by value into a lambda somehow bound to a LegailityPredicate return value.
@@ -236,10 +243,24 @@ AIE2LegalizerInfo::AIE2LegalizerInfo(const AIE2Subtarget &ST) : AIEHelper(ST) { | |||
|
|||
getActionDefinitionsBuilder(G_SELECT) | |||
.legalFor({{S32, S32}, {P0, S32}}) | |||
.clampScalar(1, S32, S32) | |||
// AIE ISA supports only 512-bit vector select |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is AIE2/AIE2P I guess.
.bitcastIf( | ||
[=](const LegalityQuery &Query) { | ||
const LLT &ResTy = Query.Types[0]; | ||
return ResTy.isScalar() && ResTy.getSizeInBits() >= 256; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do we see scalars >= 256?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For AIE2, we probably don't.
const LLT DstTy = MRI.getType(DstReg); | ||
|
||
if (DstTy.isVector() && DstTy.getSizeInBits() < 512) | ||
return legalizeG_SELECTLessThan512Bit(Helper, MI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should parameterize this with the size limit and not have 512 in the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept it coz we only had 512bit "max" bit size possible, but I changed it now.
65a1092
to
9a2c768
Compare
|
||
if (DstTy.isVector() && DstTy.getSizeInBits() < 512) | ||
return legalizeG_SELECTWithSizeLimit(Helper, MI, 512); | ||
|
||
assert(DstTy.isVector() && DstTy.getSizeInBits() == 2048 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling that you can drop all this 2048 handling if:
- Wait for this change to be integrated: https://github.com/Xilinx/llvm-aie/blob/c01a09a08e6c1295a19373f7871da16a394c9e03/llvm/lib/Target/AIE/aie2p/AIE2PLegalizerInfo.cpp#L522C1-L558C13 (it will apply fewer elements if for unmerge with more than 2 dest regs.)
- Remove your rule
.customFor({{AccV64S32, S32}})
.
As result, .clampMaxNumElements(0, S32, 16)
that is already there will do the job for us.
This will change your test in the following way:
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<64 x s32>) = COPY $dm1
; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(<64 x s32>) = COPY $dm2
; CHECK-NEXT: [[UV:%[0-9]+]]:_(<32 x s32>), [[UV1:%[0-9]+]]:_(<32 x s32>) = G_UNMERGE_VALUES [[COPY1]](<64 x s32>)
- ; CHECK-NEXT: [[UV2:%[0-9]+]]:_(<32 x s32>), [[UV3:%[0-9]+]]:_(<32 x s32>) = G_UNMERGE_VALUES [[COPY2]](<64 x s32>)
- ; CHECK-NEXT: [[UV4:%[0-9]+]]:_(<16 x s32>), [[UV5:%[0-9]+]]:_(<16 x s32>) = G_UNMERGE_VALUES [[UV]](<32 x s32>)
- ; CHECK-NEXT: [[UV6:%[0-9]+]]:_(<16 x s32>), [[UV7:%[0-9]+]]:_(<16 x s32>) = G_UNMERGE_VALUES [[UV2]](<32 x s32>)
- ; CHECK-NEXT: [[SELECT:%[0-9]+]]:_(<16 x s32>) = G_SELECT [[ASSERT_ZEXT]](s32), [[UV4]], [[UV6]]
- ; CHECK-NEXT: [[SELECT1:%[0-9]+]]:_(<16 x s32>) = G_SELECT [[ASSERT_ZEXT]](s32), [[UV5]], [[UV7]]
+ ; CHECK-NEXT: [[UV2:%[0-9]+]]:_(<16 x s32>), [[UV3:%[0-9]+]]:_(<16 x s32>) = G_UNMERGE_VALUES [[UV]](<32 x s32>)
+ ; CHECK-NEXT: [[UV4:%[0-9]+]]:_(<16 x s32>), [[UV5:%[0-9]+]]:_(<16 x s32>) = G_UNMERGE_VALUES [[UV1]](<32 x s32>)
+ ; CHECK-NEXT: [[UV6:%[0-9]+]]:_(<32 x s32>), [[UV7:%[0-9]+]]:_(<32 x s32>) = G_UNMERGE_VALUES [[COPY2]](<64 x s32>)
+ ; CHECK-NEXT: [[UV8:%[0-9]+]]:_(<16 x s32>), [[UV9:%[0-9]+]]:_(<16 x s32>) = G_UNMERGE_VALUES [[UV6]](<32 x s32>)
+ ; CHECK-NEXT: [[UV10:%[0-9]+]]:_(<16 x s32>), [[UV11:%[0-9]+]]:_(<16 x s32>) = G_UNMERGE_VALUES [[UV7]](<32 x s32>)
+ ; CHECK-NEXT: [[SELECT:%[0-9]+]]:_(<16 x s32>) = G_SELECT [[ASSERT_ZEXT]](s32), [[UV2]], [[UV8]]
+ ; CHECK-NEXT: [[SELECT1:%[0-9]+]]:_(<16 x s32>) = G_SELECT [[ASSERT_ZEXT]](s32), [[UV3]], [[UV9]]
+ ; CHECK-NEXT: [[SELECT2:%[0-9]+]]:_(<16 x s32>) = G_SELECT [[ASSERT_ZEXT]](s32), [[UV4]], [[UV10]]
+ ; CHECK-NEXT: [[SELECT3:%[0-9]+]]:_(<16 x s32>) = G_SELECT [[ASSERT_ZEXT]](s32), [[UV5]], [[UV11]]
; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<32 x s32>) = G_CONCAT_VECTORS [[SELECT]](<16 x s32>), [[SELECT1]](<16 x s32>)
- ; CHECK-NEXT: [[UV8:%[0-9]+]]:_(<16 x s32>), [[UV9:%[0-9]+]]:_(<16 x s32>) = G_UNMERGE_VALUES [[UV1]](<32 x s32>)
- ; CHECK-NEXT: [[UV10:%[0-9]+]]:_(<16 x s32>), [[UV11:%[0-9]+]]:_(<16 x s32>) = G_UNMERGE_VALUES [[UV3]](<32 x s32>)
- ; CHECK-NEXT: [[SELECT2:%[0-9]+]]:_(<16 x s32>) = G_SELECT [[ASSERT_ZEXT]](s32), [[UV8]], [[UV10]]
- ; CHECK-NEXT: [[SELECT3:%[0-9]+]]:_(<16 x s32>) = G_SELECT [[ASSERT_ZEXT]](s32), [[UV9]], [[UV11]]
; CHECK-NEXT: [[CONCAT_VECTORS1:%[0-9]+]]:_(<32 x s32>) = G_CONCAT_VECTORS [[SELECT2]](<16 x s32>), [[SELECT3]](<16 x s32>)
; CHECK-NEXT: [[CONCAT_VECTORS2:%[0-9]+]]:_(<64 x s32>) = G_CONCAT_VECTORS [[CONCAT_VECTORS]](<32 x s32>), [[CONCAT_VECTORS1]](<32 x s32>)
; CHECK-NEXT: $dm0 = COPY [[CONCAT_VECTORS2]](<64 x s32>)
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the changes.
9a2c768
to
3307a20
Compare
3307a20
to
e5ca8e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
e5ca8e3
to
0f293f5
Compare
No description provided.