[AIE2P] Support wide vector postinc 2D/3D, and offset load/store #323

abnikant · 2025-01-31T07:20:20Z

Enable combine_load_store_increment, combine_offset_load_store_ptradd and combine_offset_load_store_share_ptradd
Support wide vector POSTINC, POSTINC_2D, POSTINC_3D load and store.
Fix up the offset opcode.
Enable fifo combined load/store incr and offset load/store for 512-bits.
Add tests for combining and load/store.

andcarminati · 2025-01-31T08:08:52Z

llvm/test/CodeGen/AIE/aie2p/combine-loads-stores.mir

+# See https://llvm.org/LICENSE.txt for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+#
+# (c) Copyright 2023-2024 Advanced Micro Devices, Inc. or its affiliates


andcarminati · 2025-01-31T08:09:10Z

llvm/lib/Target/AIE/AIECombinerHelper.cpp

@@ -58,7 +58,10 @@ MachineInstr *findPreIncMatch(MachineInstr &MemI, MachineRegisterInfo &MRI,
                              const AIEBaseInstrInfo &TII) {
  // This is currently done with patterns in instruction selection.
  // No need to do it here.
-  if (MRI.getType(MemI.getOperand(0).getReg()).getSizeInBits() >= 1024)
+  MachineFunction &MF = *MemI.getMF();
+  bool isAIE2 = MF.getTarget().getTargetTriple().isAIE2();


nit: const

andcarminati · 2025-01-31T08:09:25Z

llvm/lib/Target/AIE/AIECombinerHelper.cpp

@@ -320,9 +323,11 @@ MachineInstr *findPostIncMatch(MachineInstr &MemI, MachineRegisterInfo &MRI,
                               const AIEBaseInstrInfo &TII) {
  if (!EnablePostIncCombine)
    return nullptr;
-  if (MRI.getType(MemI.getOperand(0).getReg()).getSizeInBits() >= 1024)
+  MachineFunction &MF = *MemI.getMF();
+  bool isAIE2 = MF.getTarget().getTargetTriple().isAIE2();


nit: const

andcarminati · 2025-01-31T08:19:41Z

llvm/test/CodeGen/AIE/aie2p/combine-loads-stores.mir

+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(p0) = G_PHI [[COPY]](p0), %bb.0, %2(p0), %bb.1


Interesting, here we have just %2 instead of the regex.

khallouh · 2025-01-31T10:49:18Z

llvm/lib/Target/AIE/aie2p/AIE2PInstructionSelector.cpp

+                                        : AIE2P::VLDA_dmx_lda_fifohl_idx;
+        return {ISelOpcode, FitsImmediateRange,
+                /*OffsetOpcode=*/AIE2P::VLDA_dmx_lda_fifohl_idx_imm};
+      }
      llvm_unreachable("512-bit vector type must be in AccRegBank or VRegBank");


Nit: ...or FifoRegBank

khallouh · 2025-01-31T10:49:40Z

llvm/lib/Target/AIE/aie2p/AIE2PInstrInfo.cpp

@@ -252,6 +252,11 @@ unsigned AIE2PInstrInfo::getOffsetMemOpcode(unsigned BaseMemOpcode) const {
  llvm_unreachable("not a generic load/store");
 }

+bool AIE2PInstrInfo::isOffsetMemOpcode(unsigned Opcode) const {


Nit: isGenericMemOffsetOpcode

updated the name.

martien-de-jong · 2025-01-31T12:58:51Z

llvm/lib/Target/AIE/AIECombinerHelper.cpp

+  MachineFunction &MF = *MemI.getMF();
+  const bool isAIE2 = MF.getTarget().getTargetTriple().isAIE2();
+  if (isAIE2 &&
+      MRI.getType(MemI.getOperand(0).getReg()).getSizeInBits() >= 1024)


Shouldn't there be a corresponding size check for AIE2P?

Yes, I have added a check, but I don't expect the vector size to be greater than 2048 bits.

martien-de-jong · 2025-01-31T13:09:11Z

llvm/lib/Target/AIE/aie2p/AIE2PInstrInfo.cpp

@@ -253,6 +253,11 @@ unsigned AIE2PInstrInfo::getOffsetMemOpcode(unsigned BaseMemOpcode) const {
  llvm_unreachable("not a generic load/store");
 }

+bool AIE2PInstrInfo::isGenericOffsetMemOpcode(unsigned Opcode) const {
+  return ((Opcode == AIE2P::G_AIE_OFFSET_STORE) ||
+          (Opcode == AIE2P::G_AIE_OFFSET_LOAD));


What about G_AIE_OFFSET_SEXTLOAD / G_AIE_OFFSET_ZEXTLOAD ?
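
If the extending offset loads were also meant to count as generic offset memory opcodes, the predicate could be written as a switch so that further opcodes are easy to add. The following is only a sketch: the `Opcode` enum is a hypothetical stand-in for the `AIE2P::G_AIE_OFFSET_*` opcodes named in this thread, and whether the two extending variants belong in the set is the open question above, not something this PR confirms.

```cpp
#include <cassert>

// Hypothetical stand-ins for the AIE2P generic opcodes named in the review.
enum class Opcode {
  G_AIE_OFFSET_LOAD,
  G_AIE_OFFSET_STORE,
  G_AIE_OFFSET_SEXTLOAD,
  G_AIE_OFFSET_ZEXTLOAD,
  G_AIE_POSTINC_STORE, // example of an opcode that is not offset-addressed
};

// Returns true for offset-addressed generic loads and stores.
// Assumption: the extending loads are treated like plain offset loads.
constexpr bool isGenericOffsetMemOpcode(Opcode Opc) {
  switch (Opc) {
  case Opcode::G_AIE_OFFSET_LOAD:
  case Opcode::G_AIE_OFFSET_STORE:
  case Opcode::G_AIE_OFFSET_SEXTLOAD:
  case Opcode::G_AIE_OFFSET_ZEXTLOAD:
    return true;
  default:
    return false;
  }
}
```

A switch keeps each opcode on its own line, which makes a later addition a one-line diff.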

F-Stuckmann · 2025-01-31T15:41:44Z

llvm/test/CodeGen/AIE/aie2p/fifo-loads.ll

-; CHECK-NEXT:    vst lfl0, [p1, #0] // Delay Slot 4
-; CHECK-NEXT:    vst lfh0, [p1, #64] // Delay Slot 3
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    nop


why can the vst not work on the fifo registers as before?

Not sure I understood the question, but if you are asking why the test was updated: the MIR changes once the load/store combiner is enabled, so it now uses G_AIE_POSTINC_STORE.

Do we have a vst post-increment that works on fifo regs? That would be preferable in this situation.

andcarminati · 2025-01-31T16:44:52Z

llvm/test/CodeGen/AIE/aie2p/ldst-fifo-stores.ll

-; CHECK-NEXT:    vst sfl, [p1, #0] // Delay Slot 4
-; CHECK-NEXT:    vst sfh, [p1, #64] // Delay Slot 3
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    vmov x1, sfh


I guess @F-Stuckmann is referring to this case. Now we are storing using vector registers.

We do have post-increment for fifo regs; see the test case inst-select-vector-pre-post-increment.mir. The problem in the case above is that register bank assignment does not assign fiforegbank.

%7:ptrregbank(p0), %8:fiforegbank(<32 x s32>), %9:gprregbank(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.aie2p.fifo.ld.fill), %4(p0), %5(<32 x s32>), %6(s32)
%10:vregbank(<32 x s32>) = COPY %8(<32 x s32>)
%3:ptrregbank(p0) = G_AIE_POSTINC_STORE %10(<32 x s32>), %1, %2(s20) :: (store (<32 x s32>) into %ir.s)

If I use %8 in G_AIE_POSTINC_STORE, it generates a post-increment store with fifo. We are missing the G_AIE_POSTINC_STORE opcode in regbank-select.

After handling G_AIE_POSTINC_STORE in RegisterBank assignment, both tests, ldst-fifo-stores.ll and fifo-loads.ll, work as expected. I will have a separate commit for this.

Hi @abnikant, I suspected that we were missing this proper register bank assignment. Thank you for the explanation.

@andcarminati I will open a separate PR for RegisterBank assignment; adding a commit to this PR does not make much sense.

Agreed! This PR is complex enough.

andcarminati · 2025-02-04T12:22:49Z

llvm/lib/Target/AIE/AIECombinerHelper.cpp

+  const Triple &TT = MF.getTarget().getTargetTriple();
+  const unsigned VecSize =
+      MRI.getType(MemI.getOperand(0).getReg()).getSizeInBits();
+  if ((TT.isAIE2() && VecSize >= 1024) || (TT.isAIE2P() && VecSize > 2048))


nit: can we refactor this check? Future targets can just extend it.

I can refactor, and probably add a target hook to check the maximum supported size for a subtarget. But here we still need a check for VecSize == 1024 on AIE2, since we don't have instruction selection support for 1024-bit combined load/store (postinc/offset)?

Sure, any refactoring is welcome here ;-)

@andcarminati - Could you please review the refactored code? I had to merge two commits to prevent test failures during instruction selection after enabling combines, as support for instruction selection was introduced in the subsequent commit.

Hi @abnikant, I had a second round of review and included some suggestions to try to simplify some parts. The implementation looks correct to me.

andcarminati · 2025-02-10T10:55:34Z

llvm/lib/Target/AIE/AIEBaseInstrInfo.h

@@ -567,6 +569,11 @@ struct AIEBaseInstrInfo : public TargetInstrInfo {
    llvm_unreachable("Target didn't implement getVecRegSize!");
  }

+  /// Return the maximum supported vector size for this target.
+  virtual unsigned getMaxVectorBitSize() const {
+    llvm_unreachable("Target didn't implement getMaxVectorSize!");


nit: getMaxVectorBitSize.

andcarminati · 2025-02-10T11:16:38Z

llvm/lib/Target/AIE/AIECombinerHelper.cpp

+  const unsigned VecSize =
+      MRI.getType(MemI.getOperand(0).getReg()).getSizeInBits();
+  const unsigned MaxVecSize = TII.getMaxVectorBitSize();
+  // TODO: Remove the following check once 1024-bit load/store


nit: outdated comment.

I feel that we could have a common logic here, based on TII. Can we harmonize this asymmetric difference in some way? What do you think?

We can't skip this check if we use getMaxVectorBitSize. I have added another target hook, getMaxSupportedLdStIncSize(); is this okay?
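
To make the discussion concrete, here is a minimal sketch of how a hook like `getMaxSupportedLdStIncSize()` could replace the triple-based size checks in the combiner. The struct names and `canCombineLdStInc` are hypothetical stand-ins, not the real `AIEBaseInstrInfo` interface. The returned limits are assumptions derived from the inline check quoted earlier in this review (AIE2 rejected sizes >= 1024, AIE2P rejected sizes > 2048), taking 512 as the largest power-of-two width below the AIE2 cutoff.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for AIEBaseInstrInfo.
struct BaseInstrInfo {
  virtual ~BaseInstrInfo() = default;
  // Largest vector width (in bits) for which combined post-increment and
  // offset loads/stores can be selected on this target.
  virtual unsigned getMaxSupportedLdStIncSize() const = 0;
};

// Assumed limit: AIE2 has no selection support for 1024-bit combined
// load/store, so the largest accepted power-of-two width is 512.
struct AIE2Info : BaseInstrInfo {
  unsigned getMaxSupportedLdStIncSize() const override { return 512; }
};

// Assumed limit: AIE2P accepts vectors up to and including 2048 bits.
struct AIE2PInfo : BaseInstrInfo {
  unsigned getMaxSupportedLdStIncSize() const override { return 2048; }
};

// Target-independent guard the combiner could call instead of testing
// the target triple directly.
bool canCombineLdStInc(const BaseInstrInfo &TII, unsigned VecSizeInBits) {
  return VecSizeInBits <= TII.getMaxSupportedLdStIncSize();
}
```

With this shape, a future target only overrides the hook and the combiner code stays unchanged.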

andcarminati · 2025-02-10T11:23:50Z

llvm/lib/Target/AIE/aie2p/AIE2PInstrInfo.cpp

@@ -259,6 +259,13 @@ unsigned AIE2PInstrInfo::getOffsetMemOpcode(unsigned BaseMemOpcode) const {
  llvm_unreachable("not a generic load/store");
 }

+bool AIE2PInstrInfo::isGenericOffsetMemOpcode(unsigned Opcode) const {


Maybe isAIEOffsetMemOpcode?

I am not in favor of selectively updating the name. The Generic keyword is used in a few other places for AIE opcodes, so I think updating all the other names along with this one would make more sense (in a separate small PR). What do you think?

andcarminati · 2025-02-10T11:58:13Z

llvm/test/CodeGen/AIE/aie2p/ldst-fifo-stores.ll

-; CHECK-NEXT:    st r26, [p1, dj0] // Delay Slot 5
-; CHECK-NEXT:    vst sfl, [p1, #0] // Delay Slot 4
-; CHECK-NEXT:    vst sfh, [p1, #64] // Delay Slot 3
+; CHECK-NEXT:    nop


Maybe we could prioritize merging your RegBank PR first, as it represents an earlier compilation step. In that case we would have these tests in their final shape. But it is just an idea.

That makes sense, but we need to merge this PR first; otherwise, ld-fifo.ll will break. After the Regbank PR, the regbank assignment for one of the operands in the load/store combine instruction is updated to fiforegbank, which isn't supported for instruction selection until this PR is merged.

In this case, it makes sense!

andcarminati

LGTM. Nice work! Thank you for addressing the comments.

abnikant · 2025-02-11T14:45:23Z

LGTM. Nice work! Thank you for addressing the comments.

Thanks @andcarminati for looking into this. I just rebased this PR to resolve conflicts in the fifo-loads.ll test. Can you please check?

andcarminati · 2025-02-12T13:13:09Z

llvm/lib/Target/AIE/aie2p/AIE2PInstructionSelector.cpp

      if (RBID == AIE2P::VRegBankID)
        return {/*ISelOpcode=*/AIE2P::VLDA_2D_dmx_lda_x, NoImmediate,
-                /*OffsetOpcode=*/{}};
-      llvm_unreachable("512-bit vector type must be in AccRegBank or VRegBank");
+                /*OffsetOpcode=*/{AIE2P::VLDA_dmw_lda_w_idx_imm}};


Late observation: is this VLDA_dmw_lda_w_idx_imm correct for size >= 512?

Oh, yes! This is the wrong opcode and will result in a wrong copy. It should be VLDA_dmx_lda_x_idx_imm; I see it is used in two places. I will open a fixup PR shortly. Thanks for catching this.
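
The mismatch can be illustrated with a small width-to-opcode table. This is a sketch only: the `Opcode` enum and `getOffsetLoadOpcode` are hypothetical, and the mapping assumes the `dmw`/`w` variant is the 256-bit form while the `dmx`/`x` variant is the 512-bit form, which is what the correction above implies but is not spelled out in this thread.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two offset-load opcodes discussed above.
enum class Opcode {
  VLDA_dmw_lda_w_idx_imm, // assumed: 256-bit vector offset load
  VLDA_dmx_lda_x_idx_imm, // assumed: 512-bit vector offset load
  Unsupported,
};

// Picks the offset-load opcode from the vector width so that the
// 512-bit case cannot silently reuse the narrower opcode.
Opcode getOffsetLoadOpcode(unsigned SizeInBits) {
  switch (SizeInBits) {
  case 256:
    return Opcode::VLDA_dmw_lda_w_idx_imm;
  case 512:
    return Opcode::VLDA_dmx_lda_x_idx_imm;
  default:
    return Opcode::Unsupported;
  }
}
```

Keying the choice on the width in one place would make this kind of copy-paste slip easier to spot in review.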

andcarminati · 2025-02-12T13:16:27Z

llvm/test/CodeGen/AIE/aie2p/GlobalIsel/inst-select-vector-pre-post-increment.mir

+    ; CHECK-NEXT: [[MOV_PD_imm11_pseudo3:%[0-9]+]]:edc = MOV_PD_imm11_pseudo 4
+    ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:ed = REG_SEQUENCE [[MOV_PD_imm11_pseudo]], %subreg.sub_mod, [[MOV_PD_imm11_pseudo2]], %subreg.sub_dim_size, [[MOV_PD_imm11_pseudo1]], %subreg.sub_dim_stride, [[MOV_PD_imm11_pseudo3]], %subreg.sub_dim_count
+    ; CHECK-NEXT: [[VLDA_2D_dmx_lda_x:%[0-9]+]]:vec512, [[VLDA_2D_dmx_lda_x1:%[0-9]+]]:ep, [[VLDA_2D_dmx_lda_x2:%[0-9]+]]:edc = VLDA_2D_dmx_lda_x [[COPY]], [[REG_SEQUENCE]] :: (load (<16 x s32>), align 128)
+    ; CHECK-NEXT: [[VLDA_dmw_lda_w_idx_imm:%[0-9]+]]:mwa = VLDA_dmw_lda_w_idx_imm [[COPY]], 64 :: (load (<16 x s32>) from unknown-address + 64)


The same observation applies here.

abnikant requested review from abhinay-anubola, andcarminati, F-Stuckmann, gbossu, katerynamuts, khallouh, konstantinschwarz, martien-de-jong, niwinanto, SagarMaheshwari99 and stephenneuendorffer as code owners January 31, 2025 07:20

andcarminati reviewed Jan 31, 2025

khallouh reviewed Jan 31, 2025

abnikant force-pushed the aie2p.wide.ld.st.incr.offset branch 2 times, most recently from 9d70b1d to a9baa70 Compare January 31, 2025 11:36

martien-de-jong reviewed Jan 31, 2025

abnikant force-pushed the aie2p.wide.ld.st.incr.offset branch from a9baa70 to 48a5ce8 Compare January 31, 2025 14:46

F-Stuckmann reviewed Jan 31, 2025

abnikant force-pushed the aie2p.wide.ld.st.incr.offset branch from 48a5ce8 to c82ccd3 Compare January 31, 2025 16:21

andcarminati reviewed Jan 31, 2025

abnikant force-pushed the aie2p.wide.ld.st.incr.offset branch from c82ccd3 to 81c7263 Compare February 3, 2025 06:36

andcarminati reviewed Feb 4, 2025

abnikant force-pushed the aie2p.wide.ld.st.incr.offset branch 3 times, most recently from 488b7c5 to d415c8c Compare February 10, 2025 05:25

abnikant force-pushed the aie2p.wide.ld.st.incr.offset branch from d415c8c to b247267 Compare February 10, 2025 06:10

khallouh mentioned this pull request Feb 10, 2025

[AIE2P] Improve RegbankSelect handling for load/store offset and post-increment addressing modes #337

Merged

andcarminati reviewed Feb 10, 2025

andcarminati reviewed Feb 10, 2025

andcarminati reviewed Feb 10, 2025

abnikant force-pushed the aie2p.wide.ld.st.incr.offset branch 2 times, most recently from 99756e3 to 49ab730 Compare February 11, 2025 11:33

andcarminati previously approved these changes Feb 11, 2025

1) [AIE2P] Enable post-pre incr and offset load combine

1b8f97b

2) [AIE2P] Support postinc 2D/3D, and offset load/store

abnikant dismissed andcarminati’s stale review via 1b8f97b February 11, 2025 14:43

abnikant force-pushed the aie2p.wide.ld.st.incr.offset branch from 49ab730 to 1b8f97b Compare February 11, 2025 14:43

andcarminati approved these changes Feb 11, 2025

abnikant merged commit 333cd37 into aie-public Feb 11, 2025
8 checks passed

SagarMaheshwari99 deleted the aie2p.wide.ld.st.incr.offset branch February 12, 2025 13:01

andcarminati reviewed Feb 12, 2025
