
[AIE2P] Combine G_SHUFFLE_VECTOR to G_AIE_EXTRACT_SUBVECTOR #302

Merged
merged 3 commits on Jan 31, 2025

Conversation

katerynamuts
Collaborator

No description provided.

const unsigned NumDstElems = DstTy.getNumElements();
const unsigned NumSrcElems = Src1Ty.getNumElements();
if (NumDstElems < NumSrcElems) {
Collaborator:

Couldn't this prevent broadcasting an element of a bigger vector into a smaller one?

Collaborator Author:

Yes, but as far as I can see, the matchShuffleToBroadcast combine doesn't cover this case. And without this check, it fails for the following test:
%0:_(<4 x s8>) = G_SHUFFLE_VECTOR %1(<8 x s8>), %2(<8 x s8>), shufflemask(-1, 5, -1, 3)
because createDuplicatePatternMask returns a mask of size NumDstElems / NumSrcElems = 0.

In general, this PR only combines G_SHUFFLE_VECTOR to G_AIE_EXTRACT_SUBVECTOR, but there will be a follow-up PR that extracts a subvector and broadcasts it, and I think I can try to cover this case (matchShuffleToBroadcast) there.

@@ -227,3 +227,129 @@ body: |
$x0 = COPY %0(<16 x s32>)
PseudoRET implicit $lr, implicit $x0
...
---
Collaborator:

What about this case:

---
name: shuffle_vector_to_extract_subvec_4x8Dst
tracksRegLiveness: true
body:             |
  bb.1:
    liveins: $l0, $l1
    %1:_(<8 x s8>) = COPY $l0
    %2:_(<8 x s8>) = COPY $l1
    %0:_(<4 x s8>) = G_SHUFFLE_VECTOR %1(<8 x s8>), %2(<8 x s8>), shufflemask(8, 9, 10, 11)
    PseudoRET implicit $lr, implicit %0

Currently we just discard this case, but we could return the first subreg of the second source register.

What do you think?

Collaborator Author:

I also thought about it, but @konstantinschwarz said there is an upstream PR for canonicalization: if a mask has all its indices in the second source vector, the source vectors are swapped and the indices are remapped so that they refer to the new first source vector (previously source vector 2). This means we only have to implement all combines for source vector 1.

BuildFnTy &MatchInfo) {
assert(MI.getOpcode() == TargetOpcode::G_SHUFFLE_VECTOR);
const Register DstReg = MI.getOperand(0).getReg();
const Register Src1Reg = MI.getOperand(1).getReg();
Collaborator:

I see a nice opportunity here by considering extracts from the second input operand.

@@ -126,3 +126,10 @@ def G_AIE_VSHIFT_RIGHT : AIEGenericInstruction {
let InOperandList = (ins type0:$src1, type0:$src2, type1:$shift_amt);
let hasSideEffects = false;
}

// Extract 32-bit or 64-bit subvector.
def G_AIE_EXTRACT_SUBVECTOR : AIEGenericInstruction {
Collaborator:

Since we are only extracting 32/64 bits, is it really a subvector extract? Can we use G_AIE_SEXT_EXTRACT_VECTOR_ELT?

Collaborator (@andcarminati, Jan 24, 2025):

Maybe we could reuse it and extend it to include immediate usage. G_AIE_SEXT_EXTRACT_VECTOR_ELT also expands the index to a register, so as-is it is not interesting.

Collaborator Author:

The comment must be changed. It extracts 32/64-bit vectors: <4 x s8> or <2 x s16> for 32-bit, and <8 x s8>, <4 x s16>, or <2 x s32> for 64-bit. I already discussed with @konstantinschwarz extending G_AIE_SEXT_EXTRACT_VECTOR_ELT to also cover output vectors, not only elements, but we came to the conclusion that it would be confusing for G_AIE_SEXT_EXTRACT_VECTOR_ELT to extract a vector rather than an element, so we decided to introduce G_AIE_EXTRACT_SUBVECTOR for extracting vectors.

auto ExtractMask = createSequentialMask(Start, NumElems, 0);

for (unsigned I = 0; I < NumDstElems; I++) {
if (Mask[I] == -1)
Collaborator:

nit: add a comment about ignoring undefs.

if (Mask[I] == -1)
continue;

if (Mask[I] != ExtractMask[I])
Collaborator (@andcarminati, Jan 24, 2025):

What about a more compact implementation:

  auto CheckExtractMask = [&](unsigned Start, unsigned NumElems) -> bool {
    auto ExtractMask = createSequentialMask(Start, NumElems, 0);

    return std::equal(Mask.begin(), Mask.end(), ExtractMask.begin(),
                      ExtractMask.end(), [&](const int LHS, const int RHS) {
                        return (LHS == -1 || LHS == RHS);
                      });
  };

@@ -719,6 +719,28 @@ def : Pat<(int_aie2p_vinsert64_bf512 VEC512:$s1, eR29:$idx, eL:$s0),
def : Pat<(int_aie2p_vinsert32_accfloat ACC512:$s1, eR29:$idx, eR:$s0),
(COPY_TO_REGCLASS (VINSERT_32_mR29_insert (COPY_TO_REGCLASS ACC512:$s1, VEC512), eR29:$idx, eR:$s0), ACC512)>;

// VEXTRACT
Collaborator:

We are selecting this in the prelegalizer: (<4 x s8>) = G_AIE_EXTRACT_SUBVECTOR [[COPY]](<8 x s8>), [[C]](s32). The output type differs from the selection patterns below.

@katerynamuts (Collaborator Author):

The only native source vector type for G_AIE_EXTRACT_SUBVECTOR is a 512-bit vector; the output of G_AIE_EXTRACT_SUBVECTOR is always a 32- or 64-bit vector. I updated the combiner method matchShuffleToExtractSubvec so that it also supports source vector types other than 512 bits, and I updated the instruction selection for G_AIE_EXTRACT_SUBVECTOR.

@@ -126,3 +126,10 @@ def G_AIE_VSHIFT_RIGHT : AIEGenericInstruction {
let InOperandList = (ins type0:$src1, type0:$src2, type1:$shift_amt);
let hasSideEffects = false;
}

// Extract 32-bit or 64-bit subvector.
Collaborator:

Can we include 16-bit as well, which would extract <2 x s8>?

Collaborator Author:

I discussed it with @konstantinschwarz, and we decided to introduce G_AIE_EXTRACT_SUBVECTOR to cover cases like the following: shufflemask <0,1,0,1,0,1> -> bcst (extract (src <0,1>)). Since the input for bcst is a 32- or 64-bit register, at least for now we only allow G_AIE_EXTRACT_SUBVECTOR to extract 32- and 64-bit vectors, because this combine is currently the only place where we use G_AIE_EXTRACT_SUBVECTOR.

Comment on lines 1979 to 1993
if (Src1TySize == 32) {
B.buildConcatVectors(
{NewSrcReg}, {Src1Reg, ImplicitDef, ImplicitDef, ImplicitDef,
ImplicitDef, ImplicitDef, ImplicitDef, ImplicitDef,
ImplicitDef, ImplicitDef, ImplicitDef, ImplicitDef,
ImplicitDef, ImplicitDef, ImplicitDef, ImplicitDef});
} else if (Src1TySize == 64) {
B.buildConcatVectors(
{NewSrcReg}, {Src1Reg, ImplicitDef, ImplicitDef, ImplicitDef,
ImplicitDef, ImplicitDef, ImplicitDef, ImplicitDef});
} else if (Src1TySize == 128) {
B.buildConcatVectors({NewSrcReg},
{Src1Reg, ImplicitDef, ImplicitDef, ImplicitDef});
} else { // Src1TySize == 256
B.buildConcatVectors({NewSrcReg}, {Src1Reg, ImplicitDef});
Collaborator:

Can we do better to avoid code duplication? Maybe use a for loop to generate the concat vector operands.

}

if (Src1TySize == 2048) {
NewSubIdx = SubIdx % (NumSubVectors / 4);
Collaborator:

Nit: maybe using a separate variable for NumSubVectors / 4 would increase readability.

const Register NewSrcReg = MRI.createGenericVirtualRegister(NewSrc1Ty);

// 32, 64, 128, 256 bit source vectors
if (Src1TySize < 512) {
Collaborator:

Why do we need to support 32- and 64-bit source vectors?

Collaborator Author:

Why not?

Collaborator:

The minimum extracted subvector size is 32 bits, so what does extracting 32 bits from a 32-bit vector mean?

Collaborator Author:

This covers this case: %0:_(<4 x s8>) = G_SHUFFLE_VECTOR %1(<4 x s8>), %2(<4 x s8>), shufflemask(0, 1, 2, 3)

Collaborator:

Isn't that equivalent to %0 = COPY %1?

Collaborator Author:

; CHECK-NEXT: PseudoRET implicit $lr, implicit [[AIE_EXTRACT_SUBVECTOR]](<4 x s8>)
%1:_(<4 x s8>) = COPY $r0
%2:_(<4 x s8>) = COPY $r1
%0:_(<4 x s8>) = G_SHUFFLE_VECTOR %1(<4 x s8>), %2(<4 x s8>), shufflemask(0, 1, 2, 3)
Collaborator:

It seems %0:_(<4 x s8>) = COPY %1 is more optimal.
Not sure whether this is a phase-ordering problem or we simply don't support that combine yet.

Collaborator Author:

It is a good point, but we don't support this combine (G_SHUFFLE_VECTOR -> COPY) yet.

Collaborator Author:

But maybe I can cover this case in my function; then it will be more optimal, and I won't need support for 32-bit vectors.

Collaborator Author:

I added this case to my implementation.

}

if (Src1TySize == 2048) {
const unsigned NumSubVectors512Bits = NumSubVectors / 4;
Collaborator:

I think this code can be simplified. Just one unmerge in the end.

Collaborator:

Yes, put the UnusedSubregs in an array and build the operand vector from it.

@@ -145,6 +145,10 @@ struct AIEBaseInstrInfo : public TargetInstrInfo {
virtual unsigned getGenericVShiftOpcode() const {
llvm_unreachable("Target didn't implement getGenericVShiftOpcode!");
}
/// Return the opcode to be used for subvector extraction.
virtual unsigned getGenericExtractSubvectorOpcode() const {
llvm_unreachable("Target didn't implement getGenericVSelOpcode!");
Collaborator:

nit: incorrect function name in the message

%0:_(<4 x s64>) = COPY $wl0
%1:_(<4 x s64>) = COPY $wl1
%2:_(<8 x s64>) = G_SHUFFLE_VECTOR %0(<4 x s64>), %1(<4 x s64>), shufflemask(3, 3, -1, -1, 3, 3, 3, 3)
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<8 x s32>) = COPY $wl0
Collaborator:

This delivers a nice piece of code:

	nopa	;		nopb	;		nops	;		ret	lr;		nopm	;		nopv	
	mova	r0, #2                          //  Delay Slot 5
	vextract.32	 r0, x0, r0, vaddsign1  //  Delay Slot 4
	nop	                                //  Delay Slot 3
	vbcst.32	 x0, r0                         //  Delay Slot 2
	nop	                                //  Delay Slot 1

if (NumSrc1Elems % NumDstElems != 0)
return false;

auto GetSubIdx = [=]() -> std::optional<int> {
Collaborator:

nit: you can capture by reference here to avoid a mask copy.

MachineRegisterInfo &MRI,
const AIEBaseInstrInfo &TII,
BuildFnTy &MatchInfo) {
// To parametrize this function we define some generic constants. Ideally,
Collaborator:

I think we should go with this generic approach. This is the only non-generic part of the code.

if (NumSrc1Elems % NumDstElems != 0)
return false;

auto GetSubIdx = [=]() -> std::optional<int> {
Collaborator (@andcarminati, Jan 30, 2025):

super-nit: since we use std::optional, we can use unsigned here.


// Source vectors with the size greater than the native source vector size
MatchInfo = [=, &MRI](MachineIRBuilder &B) {
int NewSubIdx = -1;
Collaborator:

nit: declare and initialize in the same line below, perhaps using unsigned.

NewSubIdx = SubIdx.value() % NumSubVectorsNativeSize;

SmallVector<Register, 4> SubRegs;
for (unsigned I = 0; I < SizeCoefficient - 1; ++I) {
Collaborator (@andcarminati, Jan 30, 2025):

nit: Instead of shifting the vector to insert, we can insert in place:

    unsigned NewSrcRegPosition =  SubIdx.value() / NumSubVectorsNativeSize;
    for (unsigned I = 0; I < SizeCoefficient; ++I) {
      if (I == NewSrcRegPosition)
        SubRegs.push_back(NewSrcReg);
      else 
        SubRegs.push_back(MRI.createGenericVirtualRegister(NewSrc1Ty));
    }

// to EXTRACT_VECTOR_ELT
if (!isVectorRegisterSized(DstTy))
return false;
if (SrcTy.isVector() && DstTy.isScalar()) {
Collaborator:

Check: this is part of another PR, but it is here now for completeness, right?

Collaborator Author:

The part inside this if was already there; I just added the if to extend fewerElementsIf to the SrcTy.isVector() && DstTy.isVector() case.

Collaborator:

In that case, it would be nice to have a test for this change.

@katerynamuts katerynamuts force-pushed the kmuts.subvector branch 2 times, most recently from 1bf87f9 to c01a09a Compare January 30, 2025 12:43
return SrcTy.getNumElements() > 2;
}

if (SrcTy.isVector() && DstTy.isVector()) {
Collaborator:

nit:

            if (SrcTy.isVector() && DstTy.isVector()) {
              // Unmerges into 2 subvectors are legal
              return !(2 * DstTy.getSizeInBits() == SrcTy.getSizeInBits());
            }

@katerynamuts katerynamuts force-pushed the kmuts.subvector branch 2 times, most recently from 5bfbc25 to 37f1017 Compare January 31, 2025 08:18
@@ -1620,3 +1620,9 @@ Register AIE2InstrInfo::getMSStatusReg() const { return AIE2::srMS0; }
Register AIE2InstrInfo::getPackSignCReg() const { return AIE2::crPackSign; }

Register AIE2InstrInfo::getUnpackSignCReg() const { return AIE2::crUnpackSign; }

unsigned AIE2InstrInfo::getGPRSize() const { return 32; }
Collaborator:

nit: maybe getScalarRegSize.

@@ -145,6 +145,11 @@ struct AIEBaseInstrInfo : public TargetInstrInfo {
virtual unsigned getGenericVShiftOpcode() const {
llvm_unreachable("Target didn't implement getGenericVShiftOpcode!");
}
/// Return the opcode to be used for subvector extraction.
virtual unsigned getGenericExtractSubvectorOpcode() const {
Collaborator:

I know this is probably the wrong place and time to note this, but it would make sense to have shared opcodes across all AIE targets and to specify only the opcodes that differ. That way, all generic optimizations would be enabled by default on every architecture, instead of us manually adding support for each architecture through target-specific hooks.

Collaborator Author:

Absolutely agree, but it should be done in a separate PR. :)

Collaborator:

I think that the name can be the same, but internally they can have different numbers. So we need AIE2:: and AIE2p::.

Collaborator (@F-Stuckmann, Jan 31, 2025):

Yeah, but there should be an AIEBASE:: opcode collection.

Then all combiners etc. work on AIEBASE:: as much as possible, and only target-specific opcodes get their own hooks (like FIFO loads/stores).

That's why this is not the point in time to do such a large overhaul, but it may help us in the future.

@katerynamuts katerynamuts force-pushed the kmuts.subvector branch 3 times, most recently from 772f213 to 289d58d Compare January 31, 2025 08:56
Comment on lines 1941 to 1943
// Currently, we cannot concat vectors of the size less than vector register
// size.
assert(Src1TySize >= VecRegSize && "Unsupported source register size");
Collaborator Author:

@andcarminati, what do you think about this? Is it better to have an assert here, or to just return so that we don't crash in this case?

Collaborator:

As this is a match function, it is better not to crash! You are right.

Collaborator @andcarminati left a comment:

LGTM. Nice work!

@katerynamuts katerynamuts merged commit ab6aab4 into aie-public Jan 31, 2025
8 checks passed
@katerynamuts katerynamuts deleted the kmuts.subvector branch January 31, 2025 10:40