[pull] main from llvm:main #5546

pull · 2025-01-16T01:14:23Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)


…114737) The scan directive allows specifying scan reductions within a worksharing loop, worksharing loop simd, or simd directive, which should have an `InScan` modifier associated with it. This change adds MLIR support for it. Related PR: [Parsing and Semantic Support for scan](#102792)

…gument (#123971) Another case where the descriptor must be allocated with the CUF runtime and not a simple alloca instruction.

In EmitCXXNewAllocSize, when handling a constant array size, we were calling tryEmitAbstract with the type of the object being allocated rather than the expected type of the array size. This worked out because the allocated type was always a pointer and tryEmitAbstract only ends up using the size of the type to extend or truncate the constant, and in this case the destination type should be size_t, which is usually the same width as the pointer. This change fixes the type, but it makes no functional difference with the current constant emitter implementation.

…18771)" This reverts commit d7fb4a2. Buildbots failing: https://lab.llvm.org/buildbot/#/builders/169/builds/7671 https://lab.llvm.org/buildbot/#/builders/65/builds/11046

…verloaded 'operator->' in the current instantiation (#104458)" (#109422) Reapplies #104458, fixing a bug that occurs when a class member access expression calls an `operator->` operator function that returns a non-dependent class type.

…essor. (#123662) Attempting to collect loop guards for loops without a predecessor can lead to non-terminating recursion trying to construct a SCEV. Fixes #122913.

We had two WriteRes for WriteJalr with different latencies. Drop the duplicate and change the latency of Jal to 1, based on review feedback.

rdar://138554797

This implements a suggestion by Craig in PR #123878. We can move the worklist management out of the per-instruction work and do it once at the end of scanning all the instructions. This should reduce repeat visitation of the same instruction when no changes can be made. Note that this does not remove the inherent O(N^2) in the algorithm: we're still potentially visiting every user of every def. I also included a guard for unreachable blocks, since that had been mentioned as a possible cause. It seems we've ruled that out, but guarding for this case is still a good idea.
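The restructuring can be illustrated with a generic sketch on a toy def-use graph; this is not the RISC-V VLOPT pass, and `Inst`, `tryReduce`, and `runToFixpoint` are hypothetical names. The per-instruction step only reports whether it changed something, and the worklist is rebuilt once per full scan, with a set removing duplicate entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

// Hypothetical toy IR node: a "VL" value plus def-use links by index.
struct Inst {
  unsigned vl;                        // current vector length
  std::vector<std::size_t> users;     // instructions that read this result
  std::vector<std::size_t> operands;  // instructions whose results this reads
};

// Per-instruction work only: shrink VL to the largest VL any user demands.
// It reports whether something changed but never touches the worklist.
bool tryReduce(std::vector<Inst>& insts, std::size_t i) {
  if (insts[i].users.empty())
    return false;  // no users to shrink against: keep the current VL
  unsigned demanded = 0;
  for (std::size_t u : insts[i].users)  // still visits every user of the def
    demanded = std::max(demanded, insts[u].vl);
  if (demanded >= insts[i].vl)
    return false;
  insts[i].vl = demanded;
  return true;
}

void runToFixpoint(std::vector<Inst>& insts) {
  std::deque<std::size_t> worklist;
  for (std::size_t i = 0; i < insts.size(); ++i)
    worklist.push_back(i);
  while (!worklist.empty()) {
    // Scan every queued instruction, only recording which ones changed.
    std::vector<std::size_t> changed;
    while (!worklist.empty()) {
      std::size_t i = worklist.front();
      worklist.pop_front();
      if (tryReduce(insts, i))
        changed.push_back(i);
    }
    // Update the worklist once per scan; the set drops duplicate entries.
    std::set<std::size_t> requeue;
    for (std::size_t i : changed)
      for (std::size_t op : insts[i].operands)
        requeue.insert(op);
    worklist.assign(requeue.begin(), requeue.end());
  }
}
```

Each change strictly lowers some VL, so the loop terminates; as the message notes, the quadratic worst case of scanning users of every def remains.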

An optimization was added that tries to move the uses of the mflr instruction away from the instruction itself. However, this doesn't work when we are using the hashst instruction, because that instruction needs to be run before the stack frame is obtained. This patch disables moving instructions away from the mflr in the case where ROP protection is being used. Co-authored-by: Lei Huang

This fixes the slowdown in #123862.

…ilding overloaded 'operator->' in the current instantiation (#104458)"" (#123982) Reverts #109422

Remove edge iterator parameters from the various helpers that move edges onto other nodes, and their associated iterator update code, and instead iterate over copies of the edge lists in the caller loops. This also avoids the need to increment these iterators at every early loop continue. This simplifies the code, makes it less error prone when updating, and in particular, facilitates adding handling of recursive contexts. There were no measurable compile time and memory overhead effects for a large target.
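As a generic illustration of the pattern (not the actual LLVM code; `Node`, `Edge`, `moveEdgeTo`, and `moveHeavyEdges` are hypothetical), the caller loop iterates over a copy of the edge list, so the helper can erase from the original list without taking or updating an iterator, and early `continue`s need no manual increment.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct Node;
struct Edge {
  Node* callee;
  int weight;
};
struct Node {
  std::vector<std::shared_ptr<Edge>> calleeEdges;
};

// Moves an edge from one node to another. There is no iterator parameter
// and no iterator fix-up: the edge is simply erased from the source list.
void moveEdgeTo(const std::shared_ptr<Edge>& e, Node& from, Node& to) {
  auto& edges = from.calleeEdges;
  edges.erase(std::remove(edges.begin(), edges.end(), e), edges.end());
  to.calleeEdges.push_back(e);
}

// Moves every edge heavier than the threshold from one node to another.
void moveHeavyEdges(Node& from, Node& to, int threshold) {
  // Iterate over a copy, because moveEdgeTo mutates from.calleeEdges and
  // would invalidate iterators into the original vector.
  auto edges = from.calleeEdges;
  for (const auto& e : edges) {
    if (e->weight <= threshold)
      continue;  // early continue: no iterator bookkeeping needed
    moveEdgeTo(e, from, to);
  }
}
```

The cost is one vector copy per caller loop, which the message reports as having no measurable compile-time or memory impact in their measurements.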


This patch changes PadOp's padding input to type !tosa.shape<2 * rank> (where rank is the rank of the PadOp's input), instead of a <rank x 2> tensor. This patch is also part of the TOSA v1.0 effort: https://discourse.llvm.org/t/rfc-tosa-dialect-increment-to-v1-0/83708 This patch updates PadOp to fully match the TOSA v1.0 form. Original authors include: @Tai78641 @wonjeon Co-authored-by: Tai Ly

Create separate resource initialization function for each resource and add them to CodeGenModule's `CXXGlobalInits` list. Fixes #120636 and addresses this [comment](https://github.com/llvm/llvm-project/pull/119755/files#r1894093603).

True16 format for v_cmpx_class_f16. Update VOPCX_CLASS t16 and fake16 pseudo.

A bulk commit of true16 support for v_cmp_xx_i/u16 instructions including: v_cmpx_lt_i16 v_cmpx_eq_i16 v_cmpx_le_i16 v_cmpx_gt_i16 v_cmpx_ne_i16 v_cmpx_ge_i16 v_cmpx_lt_u16 v_cmpx_eq_u16 v_cmpx_le_u16 v_cmpx_gt_u16 v_cmpx_ne_u16 v_cmpx_ge_u16

…rs (#112860) Fixes #112677

This PR relands [#122992](#122992). Some machines were failing to run the `reflect-error.ll` test due to the RUN lines

```llvm
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
```

which failed when `spirv-tools` was not present on the machine, because `not` was then run without any arguments. These RUN lines have been removed since, given the expected error during instruction selection, they don't test anything beyond the other two RUN lines:

```llvm
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
```

A SYCL kernel entry point function is a non-member function or a static member function declared with the `sycl_kernel_entry_point` attribute. Such functions define a pattern for an offload kernel entry point function to be generated to enable execution of a SYCL kernel on a device. A SYCL library implementation orchestrates the invocation of these functions with corresponding SYCL kernel arguments in response to calls to SYCL kernel invocation functions specified by the SYCL 2020 specification. The offload kernel entry point function (sometimes referred to as the SYCL kernel caller function) is generated from the SYCL kernel entry point function by a transformation of the function parameters followed by a transformation of the function body to replace references to the original parameters with references to the transformed ones. Exactly how parameters are transformed will be explained in a future change that implements non-trivial transformations. For now, it suffices to state that a given parameter of the SYCL kernel entry point function may be transformed to multiple parameters of the offload kernel entry point as needed to satisfy offload kernel argument passing requirements. Parameters that are decomposed in this way are reconstituted as local variables in the body of the generated offload kernel entry point function. 
For example, given the following SYCL kernel entry point function definition:

```
template<typename KernelNameType, typename KernelType>
[[clang::sycl_kernel_entry_point(KernelNameType)]]
void sycl_kernel_entry_point(KernelType kernel) {
  kernel();
}
```

and the following call:

```
struct Kernel {
  int dm1;
  int dm2;
  void operator()() const;
};
Kernel k;
sycl_kernel_entry_point<class kernel_name>(k);
```

the corresponding offload kernel entry point function that is generated might look as follows (assuming `Kernel` is a type that requires decomposition):

```
void offload_kernel_entry_point_for_kernel_name(int dm1, int dm2) {
  Kernel kernel{dm1, dm2};
  kernel();
}
```

Other details of the generated offload kernel entry point function, such as its name and calling convention, are implementation details that need not be reflected in the AST and may differ across target devices. For that reason, only the transformation described above is represented in the AST; other details will be filled in during code generation. These transformations are represented using new AST nodes introduced with this change. `OutlinedFunctionDecl` holds a sequence of `ImplicitParamDecl` nodes and a sequence of statement nodes that correspond to the transformed parameters and function body. `SYCLKernelCallStmt` wraps the original function body and associates it with an `OutlinedFunctionDecl` instance.
For the example above, the AST generated for the `sycl_kernel_entry_point<kernel_name>` specialization would look as follows:

```
FunctionDecl 'sycl_kernel_entry_point<kernel_name>(Kernel)'
  TemplateArgument type 'kernel_name'
  TemplateArgument type 'Kernel'
  ParmVarDecl kernel 'Kernel'
  SYCLKernelCallStmt
    CompoundStmt
      <original statements>
    OutlinedFunctionDecl
      ImplicitParamDecl 'dm1' 'int'
      ImplicitParamDecl 'dm2' 'int'
      CompoundStmt
        VarDecl 'kernel' 'Kernel'
          <initialization of 'kernel' with 'dm1' and 'dm2'>
        <transformed statements with redirected references of 'kernel'>
```

Any ODR-use of the SYCL kernel entry point function will (with future changes) suffice for the offload kernel entry point to be emitted. An actual call to the SYCL kernel entry point function will result in a call to the function. However, evaluation of a `SYCLKernelCallStmt` statement is a no-op, so such calls will have no effect other than to trigger emission of the offload kernel entry point.

Additionally, as a related change inspired by code review feedback, these changes disallow use of the `sycl_kernel_entry_point` attribute with functions defined with a _function-try-block_. The SYCL 2020 specification prohibits the use of C++ exceptions in device functions. Even if exceptions were not prohibited, it is unclear what the semantics would be for an exception that escapes the SYCL kernel entry point function; the boundary between host and device code could be an implicit noexcept boundary that results in program termination if violated, or the exception could perhaps be propagated to host code via the SYCL library. Pending support for C++ exceptions in device code and clear semantics for handling them at the host-device boundary, this change makes use of the `sycl_kernel_entry_point` attribute with a function defined with a _function-try-block_ an error.

…vance. NFC (#123876) Use this to improve performance of SubtargetEmitter::findWriteResources and SubtargetEmitter::findReadAdvance. Now we can do a map lookup instead of a linear search through all WriteRes/ReadAdvance records. This reduces the build time of RISCVGenSubtargetInfo.inc on my machine from 43 seconds to 10 seconds.
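A generic sketch of the lookup change, assuming records are identified by a name string (the real TableGen record types and the `SubtargetEmitter` code differ; `WriteRes`, `findLinear`, and `WriteResIndex` here are hypothetical): the index is built once, so each query becomes an average O(1) hash lookup instead of an O(N) scan.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct WriteRes {  // hypothetical stand-in for a TableGen record
  std::string writeType;
  unsigned latency;
};

// Before: a linear search through all records for every query.
const WriteRes* findLinear(const std::vector<WriteRes>& all,
                           const std::string& name) {
  for (const WriteRes& w : all)
    if (w.writeType == name)
      return &w;
  return nullptr;
}

// After: build a name -> record index once, then look names up directly.
// The pointers stay valid only while the underlying vector is not modified.
class WriteResIndex {
 public:
  explicit WriteResIndex(const std::vector<WriteRes>& all) {
    // emplace keeps the first record for a duplicate name, matching
    // what findLinear would return.
    for (const WriteRes& w : all)
      byName_.emplace(w.writeType, &w);
  }
  const WriteRes* find(const std::string& name) const {
    auto it = byName_.find(name);
    return it == byName_.end() ? nullptr : it->second;
  }

 private:
  std::unordered_map<std::string, const WriteRes*> byName_;
};
```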

…d mtriple used when passing options into the translate API call (#123975) Rename internal command line flags for optimization level and mtriple used when passing options into the translate API call.

…r` in `TypeLocTypeMatcher` (#123450) There are no templates in `TypeLocTypeMatcher`, so we do not need to use `DynTypedMatcher`; avoiding it can improve performance.

…ion files (#123849)

MSVC ignores the `/defArm64Native` argument on non-ARM64X targets. It is also ignored if the `/def` option is not specified.

Plumbs through creating file ranges to C and Python.

This changes the implementation of `__copy_cvref_t` to only template the implementation class on the `_From` parameter, avoiding instantiations for every combination of `_From` and `_To`.
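A minimal sketch of the technique in standard C++, using hypothetical names rather than libc++'s internal ones: the helper class is a template only on `From`, and `To` is applied through a nested alias template. Each distinct `From` instantiates the helper class once; alias template applications do not create new class specializations for each `To`.

```cpp
#include <type_traits>

// cv-qualifier part: one class specialization per From, To applied via alias.
template <class From>
struct copy_cv_impl {
  template <class To> using apply = To;
};
template <class From>
struct copy_cv_impl<const From> {
  template <class To> using apply = const To;
};
template <class From>
struct copy_cv_impl<volatile From> {
  template <class To> using apply = volatile To;
};
template <class From>
struct copy_cv_impl<const volatile From> {
  template <class To> using apply = const volatile To;
};

// Reference part: strip the reference from From, copy cv, re-add the reference.
template <class From>
struct copy_cvref_impl : copy_cv_impl<From> {};
template <class From>
struct copy_cvref_impl<From&> {
  template <class To>
  using apply = typename copy_cv_impl<From>::template apply<To>&;
};
template <class From>
struct copy_cvref_impl<From&&> {
  template <class To>
  using apply = typename copy_cv_impl<From>::template apply<To>&&;
};

template <class From, class To>
using copy_cvref_t = typename copy_cvref_impl<From>::template apply<To>;
```

By contrast, a helper written as `template <class From, class To> struct impl` would instantiate a separate class for every `(From, To)` pair.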

Before this patch we might have emitted pack instructions in between PHI nodes. This patch fixes it by fixing the insert point of the new packs.
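The constraint being restored is that PHIs must stay grouped at the top of their block, so a new instruction may not be placed before the last PHI. A self-contained sketch, assuming a toy block modeled as a list of opcode names (this is not the SandboxIR API; `Block`, `legalInsertPos`, and `insertPack` are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy basic block: a sequence of opcode names.
using Block = std::vector<std::string>;

// Returns a legal position for a new non-PHI instruction, keeping the
// wanted position when possible. A position inside the leading PHI group
// is moved down to just past the last PHI.
std::size_t legalInsertPos(const Block& bb, std::size_t wantedPos) {
  std::size_t firstNonPhi = 0;
  while (firstNonPhi < bb.size() && bb[firstNonPhi] == "phi")
    ++firstNonPhi;
  return wantedPos < firstNonPhi ? firstNonPhi : wantedPos;
}

void insertPack(Block& bb, std::size_t wantedPos) {
  std::size_t pos = legalInsertPos(bb, wantedPos);
  bb.insert(bb.begin() + static_cast<std::ptrdiff_t>(pos), "pack");
}
```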

The stated return type was incorrect; this patch corrects it. More generally, it explains how the Offset and its components fit into the overall shadow mapping calculation.

…23307)

- Add `I` to intrinsics and instructions
- Add `_` before sbf16 in intrinsics

Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965

The current MIR cycle sinking capabilities are rather limited. They only support sinking copies into a single successor block while obeying limits. This opt-in feature adds a more aggressive option that is not limited by the above concerns. The feature will try to "sink" by duplicating any top-level preheader instruction (that we are sure is safe to sink) into any user block, and then does some dead code cleanup. In particular, this is useful for high register pressure (RP) situations when loop bodies have control flow.

…124080) Fixes #124079

…124193) This resolves the `-Wignored-qualifiers` warning introduced by the new warning in #121419. First caught in buildbot `ppc64le-lld-multistage-test` https://lab.llvm.org/buildbot/#/builders/168/builds/7756 Co-authored-by: Henry Jiang

…ructions. (#124200) Fixes #123387

This was left over from 408659c.

There are no subregister defs in SSA.

…123931) Closes #123815. See the comments for details. We can't get the primary context arbitrarily, since the redecl may have a different context and different information. There is a TODO for the modules-specific case, which I'd like to address after this PR.

- Widen v2i8, v2i16 and v2i32 vectors so they don't cast back and forth, and make sure that instructions with the correct data unit are used.
- Handle undef indices for VSHF when lowering VECTOR_SHUFFLE (it crashes if such an index is present).

This patch fixes: llvm/lib/CodeGen/MachineSink.cpp:1667:22: error: unused variable 'Preheader' [-Werror,-Wunused-variable]

MaterializationUnits may contain arbitrary resources that need cleanup. We want to do this outside the JIT's session lock. This should fix a lock-order-inversion warning in clang-repl (for details see #124215).
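The general pattern can be sketched as follows, with hypothetical names (this is not ORC's `ExecutionSession` code): defunct units are moved out of the shared container while the lock is held, and their destructors run only after the lock has been released, so any locks they take cannot be ordered inside the session lock.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in; a real unit's destructor may release arbitrary
// resources and take other locks.
struct MaterializationUnit {
  virtual ~MaterializationUnit() = default;
};

class Session {
 public:
  void retire(std::unique_ptr<MaterializationUnit> mu) {
    std::lock_guard<std::mutex> lock(sessionMutex_);
    defunct_.push_back(std::move(mu));
  }

  void discardDefunct() {
    std::vector<std::unique_ptr<MaterializationUnit>> toDestroy;
    {
      std::lock_guard<std::mutex> lock(sessionMutex_);
      toDestroy.swap(defunct_);  // only pointers move while locked
    }
    // toDestroy is destroyed at the end of this function, after the lock
    // guard's scope has ended, so no unit destructor runs under the lock.
  }

  std::size_t pending() {
    std::lock_guard<std::mutex> lock(sessionMutex_);
    return defunct_.size();
  }

 private:
  std::mutex sessionMutex_;
  std::vector<std::unique_ptr<MaterializationUnit>> defunct_;
};
```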

This commit adds support for the griddepcontrol PTX instruction, with tests in griddepcontrol.ll.

…arwin. See discussion in 4f0325873faccfbe1.

This patch fixes the scheduler's clear() function to also clear the ReadyList. Not doing so is a bug: because the ReadyList was never cleared, it could contain stale instructions, which results in crashes.
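A sketch of the invariant the fix restores, using hypothetical members (the real SandboxVec scheduler holds more state than this): `clear()` must reset every container, including the ready list, so that no pointer to a stale instruction survives into the next scheduling attempt.

```cpp
#include <cassert>
#include <deque>
#include <unordered_map>

struct Instr {  // hypothetical instruction handle
  int id;
};

class Scheduler {
 public:
  void addReady(Instr* i) { readyList_.push_back(i); }
  void track(Instr* i, int bundle) { bundleOf_[i] = bundle; }
  bool empty() const { return readyList_.empty() && bundleOf_.empty(); }

  // Reset *all* scheduler state. Leaving readyList_ untouched (the bug
  // described above) would keep pointers to instructions that may since
  // have been erased.
  void clear() {
    bundleOf_.clear();
    readyList_.clear();
  }

 private:
  std::deque<Instr*> readyList_;
  std::unordered_map<Instr*, int> bundleOf_;
};
```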

This adds handling for raw and structured buffers when lowering resource access via `llvm.dx.resource.getpointer`. Fixes #121714

Don't insert a space between a type declaration r_paren and &/&&. Fixes #124073.

…24102) This PR adds amdgpu-sw-lower-lds pass to AMDGPUCodeGenPassBuilder::addIRPasses()

Linker relaxation is not currently implemented in JITLink, but if relaxation is enabled by clang, R_LARCH_RELAX and R_LARCH_ALIGN relocations will be emitted. This commit adapts lld's algorithm to JITLink. Currently, only relaxing R_LARCH_ALIGN is implemented; other relaxable relocations can be implemented in the future. Without this, interpreting C++ code using clang-repl or running IR using lli with relaxation enabled fails with the error: `JIT session error: Unsupported loongarch relocation:102: R_LARCH_ALIGN`. Similar to 310473c, but only implements align.
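The arithmetic at the heart of align relaxation can be sketched as below. It assumes, as a simplification of our reading of the LoongArch psABI, that the assembler reserved `alignment - 4` bytes of NOPs at the `R_LARCH_ALIGN` site and that instructions are 4-byte aligned. Once the start address of the NOP run is known (after any earlier deletions), the linker keeps only the padding needed to reach the boundary and deletes the rest. The function names are hypothetical; this is not the JITLink or lld code.

```cpp
#include <cstdint>

// Padding bytes needed so that code after the NOP run starting at addr
// lands on an alignment boundary. alignment must be a power of two.
constexpr std::uint64_t neededPadding(std::uint64_t addr,
                                      std::uint64_t alignment) {
  return (alignment - (addr & (alignment - 1))) & (alignment - 1);
}

// Number of reserved NOP bytes the linker can delete. With addr 4-byte
// aligned, neededPadding is at most alignment - 4, so this never underflows.
constexpr std::uint64_t bytesToRemove(std::uint64_t addr,
                                      std::uint64_t alignment) {
  // Assumption: the assembler reserved alignment - 4 bytes of NOPs.
  std::uint64_t reserved = alignment - 4;
  return reserved - neededPadding(addr, alignment);
}
```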

…rmat..." This reverts 4f03258 and follow-up patches (see below) while I investigate some ongoing failures on the buildbots.

- Revert "[clang-repl] Try to XFAIL testcase on arm32 without affecting arm64 darwin." This reverts commit fd174f0.
- Revert "[clang-repl] The simple-exception test now passes on arm64-darwin." This reverts commit c9bc242.
- Revert "[ORC] Destroy defunct MaterializationUnits outside the session lock." This reverts commit a001cc0.
- Revert "[ORC] Add explicit narrowing casts to fix build errors." This reverts commit 26fc07d.
- Revert "[ORC] Enable JIT support for the compact-unwind frame info format on Darwin." This reverts commit 4f03258.

The convertFloorOp pattern incurs precision loss when floating-point numbers exceed the representable range of int64. This pattern should be removed. Fixes #119836
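Why a floor lowering that round-trips through int64 is unsafe: every double with magnitude at least 2^52 is already an integer, while values at or beyond 2^63 do not fit in int64 at all (and converting them is undefined behavior in C++). The guarded scalar version below is only an illustration of what such a lowering would have to check; it is not the MLIR pattern that was removed, and `floorViaInt64` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// floor(x) via truncation through int64, used only where it is exact.
double floorViaInt64(double x) {
  // |x| >= 2^52, NaN, and Inf have no fractional part; return them as-is
  // instead of converting, which would overflow int64 for |x| >= 2^63.
  if (!(std::fabs(x) < 4503599627370496.0))  // 2^52
    return x;
  // Truncates toward zero; exact because |x| < 2^52 fits in int64.
  double t = static_cast<double>(static_cast<std::int64_t>(x));
  return (t > x) ? t - 1.0 : t;  // negative non-integers need one step down
}
```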

…#124159) Horizontal add (hadd) and subtract (hsub) are currently heuristically handled by `maybeHandleSimpleNomemIntrinsic()` (via `handleUnknownIntrinsic()`), which computes the shadow by bitwise OR'ing the two operands. This has false positives for hadd/hsub shadows. For example, suppose the shadows for the two operands are 00000000 and 11111111 respectively. The expected shadow for the result is 00001111, but `maybeHandleSimpleNomemIntrinsic` would compute it as 11111111. This patch handles horizontal add using `handleIntrinsicByApplyingToShadow` (from #114490), which has no false positives for hadd/hsub: if each pair of adjacent shadow values is zero (fully initialized), the result will be zero (fully initialized). More generally, it is precise for hadd/hsub if at least one of the two adjacent shadow values in each pair is zero. It does have some false negatives for hadd/hsub: if we add/subtract two adjacent non-zero shadow values, some bits of the result may incorrectly be zero. We consider this an acceptable tradeoff for performance. To make shadow propagation precise, we want the equivalent of "horizontal OR", but this is not available. Reducing horizontal OR to (permutation plus bitwise OR) is left as an exercise for the reader.
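The shadow arithmetic above can be reproduced with a small model. This assumes 8-lane vectors with one shadow byte per lane (nonzero means uninitialized) and SSE-style `hadd` semantics, where result lanes 0-3 come from adjacent pairs of the first operand and lanes 4-7 from the second. It is not MSan code. It only compares three approaches: the old OR heuristic, applying `hadd` to the shadows, and the ideal "horizontal OR".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Shadow = std::array<std::uint8_t, 8>;  // one shadow byte per lane

// Old heuristic: bitwise OR of the two operand shadows.
Shadow shadowByOr(const Shadow& a, const Shadow& b) {
  Shadow r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = a[i] | b[i];
  return r;
}

// Combine adjacent lane pairs: lanes 0-3 from a, lanes 4-7 from b.
template <class Op>
Shadow horizontal(const Shadow& a, const Shadow& b, Op op) {
  Shadow r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = op(a[2 * i], a[2 * i + 1]);
    r[4 + i] = op(b[2 * i], b[2 * i + 1]);
  }
  return r;
}

// New approach: run hadd itself on the shadows (wrapping 8-bit add).
Shadow shadowByHadd(const Shadow& a, const Shadow& b) {
  return horizontal(a, b, [](std::uint8_t x, std::uint8_t y) {
    return static_cast<std::uint8_t>(x + y);
  });
}

// Ideal "horizontal OR": precise, never drops uninitialized bits.
Shadow shadowByHorizontalOr(const Shadow& a, const Shadow& b) {
  return horizontal(a, b, [](std::uint8_t x, std::uint8_t y) {
    return static_cast<std::uint8_t>(x | y);
  });
}
```

With the first operand fully initialized (all zero) and the second fully uninitialized (all `0xFF`), the OR heuristic poisons every result lane, while both horizontal versions poison only lanes 4-7, matching the `00001111` pattern in the message. A pair such as `{0x01, 0xFF}` wraps to a zero shadow under the hadd version, which is the kind of false negative the message accepts.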

…mations for funnel shifts (#124175) We only had handling for cases where we had argument data.

…122262)

…PACKUS lowering (#123956) If the NSW/NUW flags are present, then we can assume the source value is within bounds and saturation will not occur with the PACKSS/PACKUS instructions. Fixes #87485
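Per-lane semantics show why the flags make saturation irrelevant. `PACKSSWB` narrows each signed 16-bit lane to 8 bits with signed saturation, and `PACKUSWB` narrows to unsigned 8 bits with unsigned saturation. When the value is known to fit the narrow type, which is what `nsw` (signed range) or `nuw` (high bits zero) on the truncate guarantees, the clamp never fires and the pack computes exactly the truncation. The scalar functions below are an illustrative model of one lane, not X86 backend code.

```cpp
#include <cstdint>

// One lane of PACKSSWB: signed 16-bit -> signed 8-bit with saturation.
constexpr std::int8_t packssLane(std::int16_t v) {
  return static_cast<std::int8_t>(v > 127 ? 127 : (v < -128 ? -128 : v));
}

// One lane of PACKUSWB: signed 16-bit -> unsigned 8-bit with saturation.
constexpr std::uint8_t packusLane(std::int16_t v) {
  return static_cast<std::uint8_t>(v > 255 ? 255 : (v < 0 ? 0 : v));
}

// Plain truncation keeps only the low 8 bits.
constexpr std::int8_t truncS8(std::int16_t v) {
  return static_cast<std::int8_t>(v);
}
constexpr std::uint8_t truncU8(std::int16_t v) {
  return static_cast<std::uint8_t>(v);
}
```

For in-range inputs the pack and truncate agree; for an out-of-range input such as 300, `packusLane` saturates to 255 while `truncU8` wraps to 44.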

pull bot added the ⤵️ pull label Jan 16, 2025

anchuraj and others added 29 commits January 22, 2025 09:53

[flang][cuda] Allocate descriptor in managed memory on rebox block ar…

9f83c4e

…gument (#123971) Another case where the descriptor must be allocated with the CUF runtime and not a simple alloca instruction.

Revert "[LLVM][Clang][AArch64] Implement AArch64 build attributes (#1…

b40739a

…18771)" This reverts commit d7fb4a2. Buildbots failing: https://lab.llvm.org/buildbot/#/builders/169/builds/7671 https://lab.llvm.org/buildbot/#/builders/65/builds/11046

[SCEV] Do not attempt to collect loop guards for loops without predec…

137d706

…essor. (#123662) Attempting to collect loop guards for loops without a predecessor can lead to non-terminating recursion trying to construct a SCEV. Fixes #122913.

[RISCV] Remove duplicate WriteRes<WriteJalr for MIPSP8700. (#123865)

146ee98

We had two WriteRes for WriteJalr with different latencies. Drop the duplicate and change the latency of Jal to 1, based on review feedback.

[llvm][Support] Only enable backtrace test when it's enabled (#123852)

ec15b24

rdar://138554797

[RISCV][VLOPT] Don't reduce the VL if it is the same as CommonVL (#123878)

1687aa2

This fixes the slowdown in #123862.

Revert "Reapply "[Clang][Sema] Use the correct lookup context when bu…

5ede7b6

…ilding overloaded 'operator->' in the current instantiation (#104458)"" (#123982) Reverts #109422

Android defaults to pic (#123955)

3057d0f

[HLSL] Fix global resource initialization (#123394)

719f0d9

Create separate resource initialization function for each resource and add them to CodeGenModule's `CXXGlobalInits` list. Fixes #120636 and addresses this [comment](https://github.com/llvm/llvm-project/pull/119755/files#r1894093603).

[AMDGPU][True16][MC] true16 for v_cmpx_class_f16 (#123251)

1cf0af3

True16 format for v_cmpx_class_f16. Update VOPCX_CLASS t16 and fake16 pseudo.

[Clang] Fix handling of immediate escalation for inherited constructo…

213e03c

…rs (#112860) Fixes #112677

[flang][cuda][NFC] Add kernel name in translation error (#123987)

c6e7b4a

[SPIR-V] Rename internal command line flags for optimization level an…

ac94fad

…d mtriple used when passing options into the translate API call (#123975) Rename internal command line flags for optimization level and mtriple used when passing options into the translate API call.

[ASTMatchers][NFC] use Matcher<QualType> instead of `DynTypedMatche…

68c6b2e

…r` in `TypeLocTypeMatcher` (#123450) There are no templates in `TypeLocTypeMatcher`, so we do not need to use `DynTypedMatcher`; avoiding it can improve performance.

[LLD][COFF] Use EC symbol table for exports defined in module definit…

a2c683b

…ion files (#123849)

[LLD][COFF] Add support for the -defArm64Native argument (#123850)

4e9d5a3

MSVC ignores the `/defArm64Native` argument on non-ARM64X targets. It is also ignored if the `/def` option is not specified.

[mlir] Add C and Python interface for file range (#123276)

a77250f

Plumbs through creating file ranges to C and Python.

[libc++] Avoid unnecessary instantiations for __copy_cvref_t (#123718)

223bd0c

This changes the implementation of `__copy_cvref_t` to only template the implementation class on the `_From` parameter, avoiding instantiations for every combination of `_From` and `_To`.

vporpo and others added 30 commits January 23, 2025 16:29

[SandboxVec][BottomUpVec] Fix packing when PHIs are present (#124206)

d2234ca

Before this patch we might have emitted pack instructions in between PHI nodes. This patch fixes it by fixing the insert point of the new packs.

[msan][NFC] Correct and clarify comment for getShadowPtrOffset()

969eb4e

The stated return type was incorrect; this patch corrects it. More generally, it explains how the Offset and its components fit into the overall shadow mapping calculation.

[X86][AVX10.2-BF16] Update VCOMISBF16 intrinsics and instructions (#1…

24f177d

…23307)

- Add `I` to intrinsics and instructions
- Add `_` before sbf16 in intrinsics

Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965

[clang][test] Add .cuh as a recognized extension for lit test files (#…

0013264

…124080) Fixes #124079

[libc] Use -fno-math-errno for __builtin_fma* to generate fma inst…

b11529b

…ructions. (#124200) Fixes #123387

[RISCV][NFC] Remove Redundant Inline Asm Logic (#124202)

e06b703

This was left over from 408659c.

MachineCSE: Remove check for subreg on a def operand (#124095)

0ef39a8

There are no subregister defs in SSA.

[CodeGen] Fix a warning

9fecb4f

This patch fixes: llvm/lib/CodeGen/MachineSink.cpp:1667:22: error: unused variable 'Preheader' [-Werror,-Wunused-variable]

[ORC] Destroy defunct MaterializationUnits outside the session lock.

a001cc0

MaterializationUnits may contain arbitrary resources that need cleanup. We want to do this outside the JIT's session lock. This should fix a lock-order-inversion warning in clang-repl (for details see #124215).

[clang-repl] The simple-exception test now passes on arm64-darwin.

c9bc242

[LLVM][NVPTX] Add support for griddepcontrol instruction (#123511)

435609b

This commit adds support for the griddepcontrol PTX instruction, with tests in griddepcontrol.ll.

[clang-repl] Try to XFAIL testcase on arm32 without affecting arm64 d…

fd174f0

…arwin. See discussion in 4f0325873faccfbe1.

[compiler-rt][rtsan] preadv(64)/pwritev(64) interception. (#124115)

02a3004

[compiler-rt][rtsan] inotify api for Linux interception. (#124177)

f3d2e75

[SandboxVec][Scheduler] Fix clear() to clear all state (#124214)

6db73fa

This patch fixes the scheduler's clear() function to also clear the ReadyList. Not doing so is a bug: because the ReadyList was never cleared, it could contain stale instructions, which results in crashes.

[DirectX] Handle dx.RawBuffer in DXILResourceAccess (#121725)

2f39d13

This adds handling for raw and structured buffers when lowering resource access via `llvm.dx.resource.getpointer`. Fixes #121714

[clang-format] Fix a regression in PointerAlignment: Left (#124085)

6330f1e

Don't insert a space between a type declaration r_paren and &/&&. Fixes #124073.

[AMDGPU] Add amdgpu-sw-lower-lds pass to NPM codegen addIRPasses. (#1…

3c79a04

…24102) This PR adds amdgpu-sw-lower-lds pass to AMDGPUCodeGenPassBuilder::addIRPasses()

[NewPM] LiveIntervals: Check dependencies for invalidation (#123563)

a9c61e0

[mlir] [math] Fix the precision issue of expand math (#120865)

45d83ae

The convertFloorOp pattern incurs precision loss when floating-point numbers exceed the representable range of int64. This pattern should be removed. Fixes #119836

[CostModel] getTypeBasedIntrinsicInstrCost - add default cost approxi…

b84b717

…mations for funnel shifts (#124175) We only had handling for cases where we had argument data.

[JITLink][LoongArch] Add label addition and subtraction relocations (#…

f6253f8

…122262)

[X86] Use NSW/NUW flags on ISD::TRUNCATE nodes to improve X86 PACKSS/…

ddd2f57

…PACKUS lowering (#123956) If the NSW/NUW flags are present, then we can assume the source value is within bounds and saturation will not occur with the PACKSS/PACKUS instructions. Fixes #87485


pull bot commented Jan 16, 2025
