-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIE2P] Enable S20 Narrowing for FIFO Loads and Stores #333
base: aie-public
Are you sure you want to change the base?
Conversation
1e3c4ac
to
54f625d
Compare
case Intrinsic::aie2p_fifo_ld_pop_576_1d_bfp16: | ||
case Intrinsic::aie2p_fifo_ld_pop_576_2d_bfp16: | ||
case Intrinsic::aie2p_fifo_ld_pop_576_3d_bfp16: | ||
case Intrinsic::aie2p_fifo_st_flush_2d: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now have more intrinsics available, maybe double check they are all present
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering whether all scalar operands of these instructions are accurately qualified as 'S20'. It at least seems likely that they don't consume more than 20 bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See:
%4 = tail call { ptr, <32 x i32>, i32, <64 x i8>, <8 x i8> } @llvm.aie2p.fifo.ld.pop.576.1d.bfp16(ptr %0, <32 x i32> %1, i32 %2, i20 %3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering whether all scalar operands of these instructions are accurately qualified as 'S20
The availability register is indeed an s32
, and we should leave it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-opening: I think @martien-de-jong 's comment needs to be addressed.
This function should only return true
for actual s20 operands. FIFO ld/st have an s32 availability operand, isNativeS20ConsumerIntrinsic
should return false
for this one. Maybe change the prototype to something like isNativeS20ConsumerIntrinsic(unsigned IntrinsicID, unsigned OperandIdx)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the actual Operand extraction happens in getOperandsToNarrow
, where @llvm.aie2p.fifo.ld.pop.576.1d.bfp16
would not provide the i32
operand as a i20
operand to handle. Here we just check if the MI has Operands that are i20 and could profit from the combine.
default: | ||
case TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS: { | ||
const unsigned IntrinsicID = cast<GIntrinsic>(Use).getIntrinsicID(); | ||
if (!isNativeS20ConsumerIntrinsic(IntrinsicID)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice you factored that out!
llvm/test/CodeGen/AIE/aie2p/GlobalIsel/prelegalizercombiner-s20-narrowing.mir
Outdated
Show resolved
Hide resolved
llvm/test/CodeGen/AIE/aie2p/GlobalIsel/prelegalizercombiner-s20-narrowing.mir
Show resolved
Hide resolved
f526965
to
cf0e8ab
Compare
llvm/test/CodeGen/AIE/aie2p/GlobalIsel/prelegalizercombiner-s20-narrowing.mir
Show resolved
Hide resolved
cf0e8ab
to
287837f
Compare
; CHECK-NEXT: mov m0, r0 | ||
; CHECK-NEXT: mov p3, p2 | ||
; CHECK-NEXT: mov dn0, r1 | ||
; CHECK-NEXT: mov dj0, r2 | ||
; CHECK-NEXT: mov p2, p4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to have more pointer movs now. Do you understand why?
Added FIFO load and store intrinsics as S20 consumers in S20Narrowing
Added and updated the tests for the same