Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] main from llvm:main #5546

Open
wants to merge 1,034 commits into
base: main
Choose a base branch
from
Open

[pull] main from llvm:main #5546

wants to merge 1,034 commits into from

Conversation

pull[bot]
Copy link

@pull pull bot commented Jan 16, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Jan 16, 2025
anchuraj and others added 29 commits January 22, 2025 09:53
…114737)

Scan directive allows to specify scan reductions within an worksharing
loop, worksharing loop simd or simd directive which should have an
`InScan` modifier associated with it. This change adds the mlir support
for the same.

Related PR: [Parsing and Semantic Support for
scan](#102792)
…gument (#123971)

Another case where the descriptor must be allocated with the CUF runtime
and not a simple alloca instruction.
In EmitCXXNewAllocSize, when handling a constant array size, we were
calling tryEmitAbstract with the type of the object being allocated rather
than the expected type of the array size. This worked out because the
allocated type was always a pointer and tryEmitAbstract only ends up
using the size of the type to extend or truncate the constant, and in this
case the destination type should be size_t, which is usually the same
width as the pointer. This change fixes the type, but it makes no
functional difference with the current constant emitter implementation.
…verloaded 'operator->' in the current instantiation (#104458)" (#109422)

Reapplies #104458, fixing a bug that occurs when a class member access expression calls an `operator->` operator function that returns a non-dependent class type.
…essor. (#123662)

Attempting to collect loop guards for loops without a predecessor can
lead to non-terminating recursion trying to construct a SCEV.

Fixes #122913.
We had two WriteRes for WriteJalr with difference latencies. Drop the
duplicate and change the latency of Jal to 1 based on review feedback
This implements a suggestion by Craig in PR #123878. We can move the
worklist management out of the per-instruction work and do it once at
the end of scanning all the instructions. This should reduce repeat
visitation of the same instruction when no changes can be made.

Note that this does not remove the inherent O(N^2) in the algorithm.
We're still potentially visiiting every user of every def.

I also included a guard for unreachable blocks since that had been
mentioned as a possible cause. It seems we've rulled that out, but
guarding for this case is still a good idea.
An optimization was added that tries to move the uses of the mflr
instruction away from the instruction itself. However, this doesn't work
when we are using the hashst instruction because that instruction needs
to be run before the stack frame is obtained.

This patch disables moving instructions away from the mflr in the case
where ROP protection is being used.

---------

Co-authored-by: Lei Huang <[email protected]>
…ilding overloaded 'operator->' in the current instantiation (#104458)"" (#123982)

Reverts #109422
Remove edge iterator parameters from the various helpers that move edges
onto other nodes, and their associated iterator update code, and instead
iterate over copies of the edge lists in the caller loops. This also
avoids the need to increment these iterators at every early loop
continue.

This simplifies the code, makes it less error prone when updating, and
in particular, facilitates adding handling of recursive contexts.

There were no measurable compile time and memory overhead effects for a
large target.
This patch changes PadOp's padding input to type !tosa.shape<2 * rank>,
(where rank is the rank of the PadOp's input), instead of a <rank x 2>
tensor.

This patch is also a part of TOSA v1.0 effort:
https://discourse.llvm.org/t/rfc-tosa-dialect-increment-to-v1-0/83708

This patch updates the PadOp to match all against the TOSA v1.0 form. 

Original Authors include: 
@Tai78641 
@wonjeon

Co-authored-by: Tai Ly <[email protected]>
Create separate resource initialization function for each resource and
add them to CodeGenModule's `CXXGlobalInits` list.
Fixes #120636 and addresses this [comment
](https://github.com/llvm/llvm-project/pull/119755/files#r1894093603).
True16 format for v_cmpx_class_f16. Update VOPCX_CLASS t16 and fake16
pseudo.
A bulk commit of true16 support for v_cmp_xx_i/u16 instructions
including:

v_cmpx_lt_i16
v_cmpx_eq_i16
v_cmpx_le_i16
v_cmpx_gt_i16
v_cmpx_ne_i16
v_cmpx_ge_i16
v_cmpx_lt_u16
v_cmpx_eq_u16
v_cmpx_le_u16
v_cmpx_gt_u16
v_cmpx_ne_u16
v_cmpx_ge_u16
This PR relands
[#122992](#122992).

Some machines were failing to run the `reflect-error.ll` test due to the
RUN lines
```llvm
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
```
which failed when `spirv-tools` was not present on the machine due to
running the command `not` without any arguments.

These RUN lines have been removed since they don't actually test
anything new compared to the other two RUN lines due to the expected
error during instruction selection.
```llvm
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
```
A SYCL kernel entry point function is a non-member function or a static
member function declared with the `sycl_kernel_entry_point` attribute.
Such functions define a pattern for an offload kernel entry point
function to be generated to enable execution of a SYCL kernel on a
device. A SYCL library implementation orchestrates the invocation of
these functions with corresponding SYCL kernel arguments in response to
calls to SYCL kernel invocation functions specified by the SYCL 2020
specification.

The offload kernel entry point function (sometimes referred to as the
SYCL kernel caller function) is generated from the SYCL kernel entry
point function by a transformation of the function parameters followed
by a transformation of the function body to replace references to the
original parameters with references to the transformed ones. Exactly how
parameters are transformed will be explained in a future change that
implements non-trivial transformations. For now, it suffices to state
that a given parameter of the SYCL kernel entry point function may be
transformed to multiple parameters of the offload kernel entry point as
needed to satisfy offload kernel argument passing requirements.
Parameters that are decomposed in this way are reconstituted as local
variables in the body of the generated offload kernel entry point
function.

For example, given the following SYCL kernel entry point function
definition:
```
template<typename KernelNameType, typename KernelType>
[[clang::sycl_kernel_entry_point(KernelNameType)]]
void sycl_kernel_entry_point(KernelType kernel) {
  kernel();
}
```

and the following call:
```
struct Kernel {
  int dm1;
  int dm2;
  void operator()() const;
};
Kernel k;
sycl_kernel_entry_point<class kernel_name>(k);
```

the corresponding offload kernel entry point function that is generated
might look as follows (assuming `Kernel` is a type that requires
decomposition):
```
void offload_kernel_entry_point_for_kernel_name(int dm1, int dm2) {
  Kernel kernel{dm1, dm2};
  kernel();
}
```

Other details of the generated offload kernel entry point function, such
as its name and calling convention, are implementation details that need
not be reflected in the AST and may differ across target devices. For
that reason, only the transformation described above is represented in
the AST; other details will be filled in during code generation.

These transformations are represented using new AST nodes introduced
with this change. `OutlinedFunctionDecl` holds a sequence of
`ImplicitParamDecl` nodes and a sequence of statement nodes that
correspond to the transformed parameters and function body.
`SYCLKernelCallStmt` wraps the original function body and associates it
with an `OutlinedFunctionDecl` instance. For the example above, the AST
generated for the `sycl_kernel_entry_point<kernel_name>` specialization
would look as follows:
```
FunctionDecl 'sycl_kernel_entry_point<kernel_name>(Kernel)'
  TemplateArgument type 'kernel_name'
  TemplateArgument type 'Kernel'
  ParmVarDecl kernel 'Kernel'
  SYCLKernelCallStmt
    CompoundStmt
      <original statements>
    OutlinedFunctionDecl
      ImplicitParamDecl 'dm1' 'int'
      ImplicitParamDecl 'dm2' 'int'
      CompoundStmt
        VarDecl 'kernel' 'Kernel'
          <initialization of 'kernel' with 'dm1' and 'dm2'>
        <transformed statements with redirected references of 'kernel'>
```

Any ODR-use of the SYCL kernel entry point function will (with future
changes) suffice for the offload kernel entry point to be emitted. An
actual call to the SYCL kernel entry point function will result in a
call to the function. However, evaluation of a `SYCLKernelCallStmt`
statement is a no-op, so such calls will have no effect other than to
trigger emission of the offload kernel entry point.

Additionally, as a related change inspired by code review feedback,
these changes disallow use of the `sycl_kernel_entry_point` attribute
with functions defined with a _function-try-block_. The SYCL 2020
specification prohibits the use of C++ exceptions in device functions.
Even if exceptions were not prohibited, it is unclear what the semantics
would be for an exception that escapes the SYCL kernel entry point
function; the boundary between host and device code could be an implicit
noexcept boundary that results in program termination if violated, or
the exception could perhaps be propagated to host code via the SYCL
library. Pending support for C++ exceptions in device code and clear
semantics for handling them at the host-device boundary, this change
makes use of the `sycl_kernel_entry_point` attribute with a function
defined with a _function-try-block_ an error.
…vance. NFC (#123876)

Use this to improve performance of SubtargetEmitter::findWriteResources
and SubtargetEmitter::findReadAdvance. Now we can do a map lookup
instead of a linear search through all WriteRes/ReadAdvance records.
    
This reduces the build time of RISCVGenSubtargetInfo.inc on my
machine from 43 seconds to 10 seconds.
…d mtriple used when passing options into the translate API call (#123975)

Rename internal command line flags for optimization level and mtriple
used when passing options into the translate API call.
…r` in `TypeLocTypeMatcher` (#123450)

There are no template in `TypeLocTypeMatcher`. So we do not need to use
`DynTypedMatcher` which can improve performance
MSVC ignores the `/defArm64Native` argument on non-ARM64X targets.
It is also ignored if the `/def` option is not specified.
Plumbs through creating file ranges to C and Python.
This changes the implementation of `__copy_cvref_t` to only template the
implementation class on the `_From` parameter, avoiding instantiations
for every combination of `_From` and `_To`.
vporpo and others added 30 commits January 23, 2025 16:29
Before this patch we might have emitted pack instructions in between PHI
nodes. This patch fixes it by fixing the insert point of the new packs.
The stated return type was incorrect; this patch corrects it. More generally, it explains how the Offset and its components fits into the overall shadow mapping calculation.
…23307)

- Add `I` to intrinsics and instructions
- Add `_` before sbf16 in intrinsics

Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
The current MIR cycle sinking capabilities are rather limited. It only
support sinking copies into a single successor block while obeying
limits.

This opt-in feature adds a more aggressive option, that is not limited
to the above concerns. The feature will try to "sink" by duplicating any
top-level preheader instruction (that we are sure is safe to sink) into
any user block, then does some dead code cleanup. In particular, this is
useful for high RP situations when loop bodies have control flow.
…124193)

This resolves the `-Wignored-qualifiers` warning introduced by the new
warnign in #121419. First
caught in buildbot `ppc64le-lld-multistage-test`

https://lab.llvm.org/buildbot/#/builders/168/builds/7756

---------

Co-authored-by: Henry Jiang <[email protected]>
…123931)

Close #123815

See the comments for details. We can't get primary context arbitrarily
since the redecl may have different context and information.

There is a TODO for modules specific case, I'd like to make it after
this PR.
- Widen v2i8, v2i16 and v2i32 vectors so they don't cast back and forth,
and make sure that instructions with correct data unit is being used.
- Handle undef indices for VSHF when lowering VECTOR_SHUFFLE (it crashes
if such index is present).
This patch fixes:

  llvm/lib/CodeGen/MachineSink.cpp:1667:22: error: unused variable
  'Preheader' [-Werror,-Wunused-variable]
MaterializationUnits may contain arbitrary resources that need cleanup. We want
to do this outside the JIT's session lock.

This should fix a lock-order-inversion warning in clang-repl (for details see
#124215).
This commit adds support for griddepcontrol PTX instruction with tests
under griddepcontrol.ll
This patch fixes the scheduler's clear() function to also clear the
ReadyList. Not doing so is a bug and results in crashes when the
ReadyList contains stale instructions, because it was never clered.
This adds handling for raw and structured buffers when lowering resource
access via `llvm.dx.resource.getpointer`.

Fixes #121714
Don't insert a space between a type declaration r_paren and &/&&.

Fixes #124073.
…24102)

This PR adds amdgpu-sw-lower-lds pass to
AMDGPUCodeGenPassBuilder::addIRPasses()
Linker relaxation is not implemented for jitlink now. But if
relaxation is enabled by clang, R_LARCH_RELAX and
R_LARCH_ALIGN relocations will be emitted.

This commit adapts lld's algorithm to jitlink. Currently, only
relaxing R_LARCH_ALIGN is implemented. Other relaxable
relocs can be implemented in the future.

Without this, interpreting C++ code using clang-repl or running
ir using lli when relaxation is enabled will occur error: `JIT
session error: Unsupported loongarch relocation:102: R_LARCH_ALIGN`.

Similar to 310473c but only implement align.
…rmat..."

This reverts 4f03258 and follow-up patches
(see below) while I investigate some ongoing failures on the buildbots.

---

Revert "[clang-repl] Try to XFAIL testcase on arm32 without affecting arm64
darwin."

This reverts commit fd174f0.

Revert "[clang-repl] The simple-exception test now passes on arm64-darwin."

This reverts commit c9bc242.

Revert "[ORC] Destroy defunct MaterializationUnits outside the session lock."

This reverts commit a001cc0.

Revert "[ORC] Add explicit narrowing casts to fix build errors."

This reverts commit 26fc07d.

Revert "[ORC] Enable JIT support for the compact-unwind frame info format on
Darwin."

This reverts commit 4f03258.
The convertFloorOp pattern incurs precision loss when floating-point
numbers exceed the representable range of int64. This pattern should be
removed.

Fixes #119836
…#124159)

Horizontal add (hadd) and subtract (hsub) are currently heuristically
handled by `maybeHandleSimpleNomemIntrinsic()` (via
`handleUnknownIntrinsic()`), which computes the shadow by bitwise OR'ing
the two operands. This has false positives for hadd/hsub shadows. For
example, suppose the shadows for the two operands are 00000000 and
11111111 respectively. The expected shadow for the result is 00001111,
but `maybeHandleSimpleNomemIntrinsic` would compute it as 11111111.

This patch handles horizontal add using
`handleIntrinsicByApplyingToShadow` (from
#114490), which has no false
positives for hadd/hsub: if each pair of adjacent shadow values is zero
(fully initialized), the result will be zero (fully initialized). More
generally, it is precise for hadd/hsub if at least one of the two
adjacent shadow values in each pair is zero.

It does have some false negatives for hadd/hsub: if we add/subtract two
adjacent non-zero shadow values, some bits of the result may incorrectly
be zero. We consider this an acceptable tradeoff for performance. To
make shadow propagation precise, we want the equivalent of "horizontal
OR", but this is not available. Reducing horizontal OR to (permutation
plus bitwise OR) is left as an exercise for the reader.
…mations for funnel shifts (#124175)

We only had handling for cases where we had argument data.
…PACKUS lowering (#123956)

If the NSW/NUW flags are present, then we can assume the source value is within bounds and saturation will not occur with the PACKSS/PACKUS instructions.

Fixes #87485
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment