[AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. #127129

rampitec · 2025-02-13T21:21:51Z

This does not change the estimate, because getInstSizeInBytes() already
returns 0 for meta instructions, but it adds a test and an early bail.

rampitec · 2025-02-13T21:22:14Z

[AMDGPU] Respect MBB alignment in the getFunctionCodeSize() #127142
[AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. #127129 👈
[AMDGPU] Move into SIProgramInfo and cache getFunctionCodeSize. NFCI. #127111: 2 other dependent PRs (#126981, #127246)
main

This stack of pull requests is managed by Graphite.

llvmbot · 2025-02-13T21:24:09Z

@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)

Changes

This does not change the estimate, because getInstSizeInBytes() already
returns 0 for meta instructions, but it adds a test and an early bail.

Full diff: https://github.com/llvm/llvm-project/pull/127129.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIProgramInfo.cpp (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/code-size-estimate.mir (+13)

diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
index 5179288084010..b995687e71780 100644
--- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
@@ -216,7 +216,7 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) {
       // TODO: CodeSize should account for multiple functions.
 
       // TODO: Should we count size of debug info?
-      if (MI.isDebugInstr())
+      if (MI.isDebugInstr() || MI.isMetaInstruction())
         continue;
 
       CodeSize += TII->getInstSizeInBytes(MI);
diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
index 9e46c58b6b5a9..76eaf350301e4 100644
--- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
+++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
@@ -18,3 +18,16 @@ body:             |
   $vgpr16 = V_MOV_B32_indirect_read undef $vgpr1, implicit $exec, implicit $m0, implicit $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
   V_MOV_B32_indirect_write undef $vgpr0, undef $vgpr3, implicit $exec, implicit $m0, implicit-def $vgpr0_vgpr1_vgpr2_vgpr3, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3(tied-def 4)
 ...
+
+# CHECK: meta:                                   ; @meta
+# CHECK: ; wave barrier
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: ; codeLenInByte = 4
+---
+name:            meta
+tracksRegLiveness: true
+body:             |
+  bb.0:
+
+  WAVE_BARRIER
+...

arsenm · 2025-02-14T00:30:36Z

There is also MF.estimateFunctionSizeInBytes(), probably should use that as a stop gap until MC computes this

rampitec · 2025-02-14T19:22:00Z

> There is also MF.estimateFunctionSizeInBytes(), probably should use that as a stop gap until MC computes this

#127246

For some reason it is not const, and it can also overestimate the code size.

llvm-ci · 2025-02-18T10:18:00Z

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/15977

Here is the relevant piece of the build log for reference:

Step 7 (Add check check-offload) failure: test (failure)
...
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug50022.cpp (999 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/test_libc.cpp (1000 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49779.cpp (1001 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/wtime.c (1002 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/bug49021.cpp (1003 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/std_complex_arithmetic.cpp (1004 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/complex_reduction.cpp (1005 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49021.cpp (1006 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/std_complex_arithmetic.cpp (1007 of 1008)
TIMEOUT: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_offloading_map.cpp (1008 of 1008)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_offloading_map.cpp' FAILED ********************
Exit Code: -9
Timeout: Reached timeout of 100 seconds

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/parallel_offloading_map.cpp -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/parallel_offloading_map.cpp.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a && /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/parallel_offloading_map.cpp.tmp | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/parallel_offloading_map.cpp
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/parallel_offloading_map.cpp -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/parallel_offloading_map.cpp.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/parallel_offloading_map.cpp.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -9
# error: command reached timeout: True
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/parallel_offloading_map.cpp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -9
# error: command reached timeout: True

--

********************
Slowest Tests:
--------------------------------------------------------------------------
100.06s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_offloading_map.cpp
16.38s: libomptarget :: amdgcn-amd-amdhsa :: offloading/bug49021.cpp
12.49s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_target_teams_reduction_min.cpp
12.45s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_target_teams_reduction_max.cpp
10.78s: libomptarget :: amdgcn-amd-amdhsa :: offloading/complex_reduction.cpp
9.38s: libomptarget :: amdgcn-amd-amdhsa :: jit/empty_kernel_lvl2.c
8.90s: libomptarget :: x86_64-unknown-linux-gnu :: offloading/bug49021.cpp
7.70s: libomptarget :: amdgcn-amd-amdhsa :: offloading/ompx_saxpy_mixed.c
7.56s: libomptarget :: amdgcn-amd-amdhsa :: offloading/barrier_fence.c
7.35s: libomptarget :: x86_64-unknown-linux-gnu :: offloading/std_complex_arithmetic.cpp
7.31s: libomptarget :: x86_64-unknown-linux-gnu :: offloading/complex_reduction.cpp
6.55s: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49021.cpp
5.94s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_target_teams_reduction.cpp
5.10s: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/std_complex_arithmetic.cpp
5.03s: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/complex_reduction.cpp

This was referenced Feb 13, 2025

[AMDGPU] Set inst_pref_size to maximum #126981

Open

[AMDGPU] Move into SIProgramInfo and cache getFunctionCodeSize. NFCI. #127111

Merged

rampitec requested a review from arsenm February 13, 2025 21:23

rampitec marked this pull request as ready for review February 13, 2025 21:23

llvmbot added the backend:AMDGPU label Feb 13, 2025

rampitec mentioned this pull request Feb 13, 2025

[AMDGPU] Respect MBB alignment in the getFunctionCodeSize() #127142

Merged

rampitec mentioned this pull request Feb 14, 2025

[AMDGPU] Switch to MF.estimateFunctionSizeInBytes() #127246

Open

arsenm reviewed Feb 17, 2025

llvm/lib/Target/AMDGPU/SIProgramInfo.cpp (outdated)

Base automatically changed from users/rampitec/02-13-_amdgpu_move_into_siprograminfo_and_cache_getfunctioncodesize._nfci to main February 18, 2025 02:22

[AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC.

faf1cf6

rampitec force-pushed the users/rampitec/02-13-_amdgpu_early_bail_in_getfunctioncodesize_for_meta_inst._nfc branch from c048954 to faf1cf6 February 18, 2025 08:45

arsenm approved these changes Feb 18, 2025

rampitec merged commit bc4f05d into main Feb 18, 2025
8 checks passed

rampitec deleted the users/rampitec/02-13-_amdgpu_early_bail_in_getfunctioncodesize_for_meta_inst._nfc branch February 18, 2025 10:08
