Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. #127129

Conversation

rampitec
Copy link
Collaborator

It does not change the estimate because getInstSizeInBytes() already
returns 0 for meta instructions, but added a test and early bail.

Copy link
Collaborator Author

rampitec commented Feb 13, 2025

@rampitec rampitec requested a review from arsenm February 13, 2025 21:23
@rampitec rampitec marked this pull request as ready for review February 13, 2025 21:23
@llvmbot
Copy link
Member

llvmbot commented Feb 13, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)

Changes

It does not change the estimate because getInstSizeInBytes() already
returns 0 for meta instructions, but added a test and early bail.


Full diff: https://github.com/llvm/llvm-project/pull/127129.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIProgramInfo.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/code-size-estimate.mir (+13)
diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
index 5179288084010..b995687e71780 100644
--- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
@@ -216,7 +216,7 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) {
       // TODO: CodeSize should account for multiple functions.
 
       // TODO: Should we count size of debug info?
-      if (MI.isDebugInstr())
+      if (MI.isDebugInstr() || MI.isMetaInstruction())
         continue;
 
       CodeSize += TII->getInstSizeInBytes(MI);
diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
index 9e46c58b6b5a9..76eaf350301e4 100644
--- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
+++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
@@ -18,3 +18,16 @@ body:             |
   $vgpr16 = V_MOV_B32_indirect_read undef $vgpr1, implicit $exec, implicit $m0, implicit $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
   V_MOV_B32_indirect_write undef $vgpr0, undef $vgpr3, implicit $exec, implicit $m0, implicit-def $vgpr0_vgpr1_vgpr2_vgpr3, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3(tied-def 4)
 ...
+
+# CHECK: meta:                                   ; @meta
+# CHECK: ; wave barrier
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: ; codeLenInByte = 4
+---
+name:            meta
+tracksRegLiveness: true
+body:             |
+  bb.0:
+
+  WAVE_BARRIER
+...

@arsenm
Copy link
Contributor

arsenm commented Feb 14, 2025

There is also MF.estimateFunctionSizeInBytes(), probably should use that as a stop gap until MC computes this

@rampitec
Copy link
Collaborator Author

There is also MF.estimateFunctionSizeInBytes(), probably should use that as a stop gap until MC computes this

#127246

For some reason it is not const and also can overestimate code size.

Base automatically changed from users/rampitec/02-13-_amdgpu_move_into_siprograminfo_and_cache_getfunctioncodesize._nfci to main February 18, 2025 02:22
It does not change the estimate because getInstSizeInBytes() already
returns 0 for meta instructions, but added a test and early bail.
@rampitec rampitec force-pushed the users/rampitec/02-13-_amdgpu_early_bail_in_getfunctioncodesize_for_meta_inst._nfc branch from c048954 to faf1cf6 Compare February 18, 2025 08:45
@rampitec rampitec merged commit bc4f05d into main Feb 18, 2025
8 checks passed
@rampitec rampitec deleted the users/rampitec/02-13-_amdgpu_early_bail_in_getfunctioncodesize_for_meta_inst._nfc branch February 18, 2025 10:08
@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 18, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/15977

Here is the relevant piece of the build log for the reference
Step 7 (Add check check-offload) failure: test (failure)
...
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug50022.cpp (999 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/test_libc.cpp (1000 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49779.cpp (1001 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/wtime.c (1002 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/bug49021.cpp (1003 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/std_complex_arithmetic.cpp (1004 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/complex_reduction.cpp (1005 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49021.cpp (1006 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/std_complex_arithmetic.cpp (1007 of 1008)
TIMEOUT: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_offloading_map.cpp (1008 of 1008)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_offloading_map.cpp' FAILED ********************
Exit Code: -9
Timeout: Reached timeout of 100 seconds

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/parallel_offloading_map.cpp -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/parallel_offloading_map.cpp.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a && /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/parallel_offloading_map.cpp.tmp | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/parallel_offloading_map.cpp
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/parallel_offloading_map.cpp -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/parallel_offloading_map.cpp.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/parallel_offloading_map.cpp.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -9
# error: command reached timeout: True
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/parallel_offloading_map.cpp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -9
# error: command reached timeout: True

--

********************
Slowest Tests:
--------------------------------------------------------------------------
100.06s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_offloading_map.cpp
16.38s: libomptarget :: amdgcn-amd-amdhsa :: offloading/bug49021.cpp
12.49s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_target_teams_reduction_min.cpp
12.45s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_target_teams_reduction_max.cpp
10.78s: libomptarget :: amdgcn-amd-amdhsa :: offloading/complex_reduction.cpp
9.38s: libomptarget :: amdgcn-amd-amdhsa :: jit/empty_kernel_lvl2.c
8.90s: libomptarget :: x86_64-unknown-linux-gnu :: offloading/bug49021.cpp
7.70s: libomptarget :: amdgcn-amd-amdhsa :: offloading/ompx_saxpy_mixed.c
7.56s: libomptarget :: amdgcn-amd-amdhsa :: offloading/barrier_fence.c
7.35s: libomptarget :: x86_64-unknown-linux-gnu :: offloading/std_complex_arithmetic.cpp
7.31s: libomptarget :: x86_64-unknown-linux-gnu :: offloading/complex_reduction.cpp
6.55s: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49021.cpp
5.94s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_target_teams_reduction.cpp
5.10s: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/std_complex_arithmetic.cpp
5.03s: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/complex_reduction.cpp

wldfngrs pushed a commit to wldfngrs/llvm-project that referenced this pull request Feb 19, 2025
…127129)

It does not change the estimate because getInstSizeInBytes() already
returns 0 for meta instructions, but added a test and early bail.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants