forked from llvm/llvm-project
[pull] main from llvm:main #5546
Open: pull wants to merge 1,034 commits into Ericsson:main from llvm:main
+1,079,138
−98,015
…gument (#123971) Another case where the descriptor must be allocated with the CUF runtime and not a simple alloca instruction.
In EmitCXXNewAllocSize, when handling a constant array size, we were calling tryEmitAbstract with the type of the object being allocated rather than the expected type of the array size. This worked out because the allocated type was always a pointer and tryEmitAbstract only ends up using the size of the type to extend or truncate the constant, and in this case the destination type should be size_t, which is usually the same width as the pointer. This change fixes the type, but it makes no functional difference with the current constant emitter implementation.
We had two WriteRes for WriteJalr with different latencies. Drop the duplicate and change the latency of Jal to 1 based on review feedback.
This implements a suggestion by Craig in PR #123878. We can move the worklist management out of the per-instruction work and do it once at the end of scanning all the instructions. This should reduce repeat visitation of the same instruction when no changes can be made. Note that this does not remove the inherent O(N^2) in the algorithm. We're still potentially visiting every user of every def. I also included a guard for unreachable blocks since that had been mentioned as a possible cause. It seems we've ruled that out, but guarding for this case is still a good idea.
An optimization was added that tries to move the uses of the mflr instruction away from the instruction itself. However, this doesn't work when we are using the hashst instruction because that instruction needs to be run before the stack frame is obtained. This patch disables moving instructions away from the mflr in the case where ROP protection is being used. --------- Co-authored-by: Lei Huang <[email protected]>
This fixes the slowdown in #123862.
Remove edge iterator parameters from the various helpers that move edges onto other nodes, and their associated iterator update code, and instead iterate over copies of the edge lists in the caller loops. This also avoids the need to increment these iterators at every early loop continue. This simplifies the code, makes it less error prone when updating, and in particular, facilitates adding handling of recursive contexts. There were no measurable compile time and memory overhead effects for a large target.
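As a rough illustration of the "iterate over copies of the edge lists" idea, here is a minimal Python sketch. The `Node` class, the `(target, weight)` edge tuples, and `move_matching_edges` are hypothetical names invented for this example; they are not the data structures used in the actual patch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical graph node; edges are (target_name, weight) tuples.
    name: str
    edges: list = field(default_factory=list)


def move_matching_edges(src, dst, should_move):
    """Move every edge of src accepted by should_move onto dst; return the count."""
    moved = 0
    # Iterate over a snapshot of the list, so removing from src.edges cannot
    # skip elements and no iterator needs fixing up on an early `continue`.
    for edge in list(src.edges):
        if not should_move(edge):
            continue
        src.edges.remove(edge)
        dst.edges.append(edge)
        moved += 1
    return moved
```

The snapshot costs one extra list copy per loop, which is the trade the commit message describes: simpler helper signatures in exchange for a copy.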
This patch changes PadOp's padding input to type !tosa.shape<2 * rank> (where rank is the rank of the PadOp's input), instead of a <rank x 2> tensor. This patch is also part of the TOSA v1.0 effort: https://discourse.llvm.org/t/rfc-tosa-dialect-increment-to-v1-0/83708 This patch updates the PadOp to match the TOSA v1.0 form. Original Authors include: @Tai78641 @wonjeon Co-authored-by: Tai Ly <[email protected]>
Create separate resource initialization function for each resource and add them to CodeGenModule's `CXXGlobalInits` list. Fixes #120636 and addresses this [comment](https://github.com/llvm/llvm-project/pull/119755/files#r1894093603).
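To make the change of padding layout concrete, the following Python sketch converts a `<rank x 2>` table of padding amounts into a flat list of length `2 * rank`. The ordering used here (low and high padding for dimension 0, then dimension 1, and so on) is an assumption made for illustration, and `flatten_padding` and `padded_shape` are hypothetical helpers, not part of the TOSA dialect.

```python
def flatten_padding(padding_pairs):
    """Convert a <rank x 2> padding table into a flat list of length 2 * rank."""
    flat = []
    for low, high in padding_pairs:
        flat.extend((low, high))
    return flat


def padded_shape(input_shape, flat_padding):
    """Compute the output shape from a flat padding list (assumed low/high order)."""
    if len(flat_padding) != 2 * len(input_shape):
        raise ValueError("padding must have 2 * rank entries")
    return [
        dim + flat_padding[2 * i] + flat_padding[2 * i + 1]
        for i, dim in enumerate(input_shape)
    ]
```

For instance, `flatten_padding([[1, 2], [3, 4]])` yields a four-element list for a rank-2 input.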
True16 format for v_cmpx_class_f16. Update VOPCX_CLASS t16 and fake16 pseudo.
A bulk commit of true16 support for v_cmp_xx_i/u16 instructions including: v_cmpx_lt_i16 v_cmpx_eq_i16 v_cmpx_le_i16 v_cmpx_gt_i16 v_cmpx_ne_i16 v_cmpx_ge_i16 v_cmpx_lt_u16 v_cmpx_eq_u16 v_cmpx_le_u16 v_cmpx_gt_u16 v_cmpx_ne_u16 v_cmpx_ge_u16
This PR relands [#122992](#122992).

Some machines were failing to run the `reflect-error.ll` test due to the RUN lines
```llvm
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
```
which failed when `spirv-tools` was not present on the machine due to running the command `not` without any arguments.

These RUN lines have been removed since they don't actually test anything new compared to the other two RUN lines due to the expected error during instruction selection.
```llvm
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
```
A SYCL kernel entry point function is a non-member function or a static member function declared with the `sycl_kernel_entry_point` attribute. Such functions define a pattern for an offload kernel entry point function to be generated to enable execution of a SYCL kernel on a device. A SYCL library implementation orchestrates the invocation of these functions with corresponding SYCL kernel arguments in response to calls to SYCL kernel invocation functions specified by the SYCL 2020 specification. The offload kernel entry point function (sometimes referred to as the SYCL kernel caller function) is generated from the SYCL kernel entry point function by a transformation of the function parameters followed by a transformation of the function body to replace references to the original parameters with references to the transformed ones. Exactly how parameters are transformed will be explained in a future change that implements non-trivial transformations. For now, it suffices to state that a given parameter of the SYCL kernel entry point function may be transformed to multiple parameters of the offload kernel entry point as needed to satisfy offload kernel argument passing requirements. Parameters that are decomposed in this way are reconstituted as local variables in the body of the generated offload kernel entry point function. 
For example, given the following SYCL kernel entry point function definition:
```
template<typename KernelNameType, typename KernelType>
[[clang::sycl_kernel_entry_point(KernelNameType)]]
void sycl_kernel_entry_point(KernelType kernel) {
  kernel();
}
```
and the following call:
```
struct Kernel {
  int dm1;
  int dm2;
  void operator()() const;
};
Kernel k;
sycl_kernel_entry_point<class kernel_name>(k);
```
the corresponding offload kernel entry point function that is generated might look as follows (assuming `Kernel` is a type that requires decomposition):
```
void offload_kernel_entry_point_for_kernel_name(int dm1, int dm2) {
  Kernel kernel{dm1, dm2};
  kernel();
}
```

Other details of the generated offload kernel entry point function, such as its name and calling convention, are implementation details that need not be reflected in the AST and may differ across target devices. For that reason, only the transformation described above is represented in the AST; other details will be filled in during code generation.

These transformations are represented using new AST nodes introduced with this change. `OutlinedFunctionDecl` holds a sequence of `ImplicitParamDecl` nodes and a sequence of statement nodes that correspond to the transformed parameters and function body. `SYCLKernelCallStmt` wraps the original function body and associates it with an `OutlinedFunctionDecl` instance.
For the example above, the AST generated for the `sycl_kernel_entry_point<kernel_name>` specialization would look as follows:
```
FunctionDecl 'sycl_kernel_entry_point<kernel_name>(Kernel)'
  TemplateArgument type 'kernel_name'
  TemplateArgument type 'Kernel'
  ParmVarDecl kernel 'Kernel'
  SYCLKernelCallStmt
    CompoundStmt
      <original statements>
    OutlinedFunctionDecl
      ImplicitParamDecl 'dm1' 'int'
      ImplicitParamDecl 'dm2' 'int'
      CompoundStmt
        VarDecl 'kernel' 'Kernel'
          <initialization of 'kernel' with 'dm1' and 'dm2'>
        <transformed statements with redirected references of 'kernel'>
```

Any ODR-use of the SYCL kernel entry point function will (with future changes) suffice for the offload kernel entry point to be emitted. An actual call to the SYCL kernel entry point function will result in a call to the function. However, evaluation of a `SYCLKernelCallStmt` statement is a no-op, so such calls will have no effect other than to trigger emission of the offload kernel entry point.

Additionally, as a related change inspired by code review feedback, these changes disallow use of the `sycl_kernel_entry_point` attribute with functions defined with a _function-try-block_. The SYCL 2020 specification prohibits the use of C++ exceptions in device functions. Even if exceptions were not prohibited, it is unclear what the semantics would be for an exception that escapes the SYCL kernel entry point function; the boundary between host and device code could be an implicit noexcept boundary that results in program termination if violated, or the exception could perhaps be propagated to host code via the SYCL library. Pending support for C++ exceptions in device code and clear semantics for handling them at the host-device boundary, this change makes use of the `sycl_kernel_entry_point` attribute with a function defined with a _function-try-block_ an error.
…vance. NFC (#123876) Use this to improve performance of SubtargetEmitter::findWriteResources and SubtargetEmitter::findReadAdvance. Now we can do a map lookup instead of a linear search through all WriteRes/ReadAdvance records. This reduces the build time of RISCVGenSubtargetInfo.inc on my machine from 43 seconds to 10 seconds.
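The speedup comes from replacing a scan over all records with a keyed lookup. A minimal Python sketch of that general technique follows; the record layout (dictionaries with `"write"` and `"proc"` keys) and the function names are assumptions made for illustration and do not mirror the TableGen record API.

```python
def find_linear(records, sched_write, processor):
    """Linear search: cost grows with the number of records on every query."""
    for rec in records:
        if rec["write"] == sched_write and rec["proc"] == processor:
            return rec
    return None


def build_index(records):
    """Build the lookup table once; later queries are single dictionary lookups."""
    index = {}
    for rec in records:
        # setdefault keeps the first record per key, matching the linear scan.
        index.setdefault((rec["write"], rec["proc"]), rec)
    return index


def find_indexed(index, sched_write, processor):
    return index.get((sched_write, processor))
```

Building the index is a one-time pass over the records, so it pays off whenever the same record list is queried many times.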
…d mtriple used when passing options into the translate API call (#123975) Rename internal command line flags for optimization level and mtriple used when passing options into the translate API call.
…r` in `TypeLocTypeMatcher` (#123450) There are no templates in `TypeLocTypeMatcher`, so we do not need to use `DynTypedMatcher`; avoiding it can improve performance.
MSVC ignores the `/defArm64Native` argument on non-ARM64X targets. It is also ignored if the `/def` option is not specified.
Plumbs through creating file ranges to C and Python.
This changes the implementation of `__copy_cvref_t` to only template the implementation class on the `_From` parameter, avoiding instantiations for every combination of `_From` and `_To`.
Before this patch we might have emitted pack instructions in between PHI nodes. This patch fixes it by fixing the insert point of the new packs.
The stated return type was incorrect; this patch corrects it. More generally, it explains how the Offset and its components fit into the overall shadow mapping calculation.
…23307)
- Add `I` to intrinsics and instructions
- Add `_` before sbf16 in intrinsics

Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
The current MIR cycle sinking capabilities are rather limited. They only support sinking copies into a single successor block while obeying limits. This opt-in feature adds a more aggressive option that is not limited to the above concerns. The feature will try to "sink" by duplicating any top-level preheader instruction (that we are sure is safe to sink) into any user block, and then does some dead code cleanup. In particular, this is useful for high RP situations when loop bodies have control flow.
…124193) This resolves the `-Wignored-qualifiers` warning introduced by the new warning in #121419. First caught in buildbot `ppc64le-lld-multistage-test` https://lab.llvm.org/buildbot/#/builders/168/builds/7756 --------- Co-authored-by: Henry Jiang <[email protected]>
This was left over from 408659c.
There are no subregister defs in SSA.
- Widen v2i8, v2i16 and v2i32 vectors so they don't cast back and forth, and make sure that instructions with the correct data unit are used.
- Handle undef indices for VSHF when lowering VECTOR_SHUFFLE (it crashes if such an index is present).
This patch fixes: llvm/lib/CodeGen/MachineSink.cpp:1667:22: error: unused variable 'Preheader' [-Werror,-Wunused-variable]
MaterializationUnits may contain arbitrary resources that need cleanup. We want to do this outside the JIT's session lock. This should fix a lock-order-inversion warning in clang-repl (for details see #124215).
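The general pattern here is to detach objects while holding the lock and run their cleanup only after releasing it. The following Python sketch shows that pattern under stated assumptions: `Session`, `Unit`, and `remove_defunct` are hypothetical stand-ins written for this example and are not the ORC API.

```python
import threading


class Unit:
    """Hypothetical unit that records whether the session lock was held at cleanup."""

    def __init__(self):
        self.lock_held_during_cleanup = None

    def cleanup(self, session):
        self.lock_held_during_cleanup = session.lock.locked()


class Session:
    def __init__(self):
        self.lock = threading.Lock()
        self._units = {}

    def add(self, name, unit):
        with self.lock:
            self._units[name] = unit

    def remove_defunct(self, names):
        """Detach units under the lock, then clean them up outside it."""
        defunct = []
        with self.lock:
            for name in names:
                unit = self._units.pop(name, None)
                if unit is not None:
                    defunct.append(unit)
        # The lock is released here, so cleanup code may take other locks
        # without creating a lock-order inversion with the session lock.
        for unit in defunct:
            unit.cleanup(self)
        return len(defunct)
```

Keeping only the bookkeeping inside the critical section also shortens the time the lock is held.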
This commit adds support for griddepcontrol PTX instruction with tests under griddepcontrol.ll
…arwin. See discussion in 4f0325873faccfbe1.
This patch fixes the scheduler's clear() function to also clear the ReadyList. Not doing so is a bug and results in crashes when the ReadyList contains stale instructions, because it was never cleared.
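The bug class is a reset method that forgets one piece of state. A toy Python sketch of the corrected behavior is shown below; this `Scheduler` class and its `bundles` and `ready_list` fields are hypothetical and only loosely modeled on the description above.

```python
class Scheduler:
    """Toy scheduler with two pieces of per-region state."""

    def __init__(self):
        self.bundles = {}
        self.ready_list = []

    def add_ready(self, instr):
        self.ready_list.append(instr)

    def clear(self):
        self.bundles.clear()
        # Without this line, instructions from a previous region would stay
        # in the ready list and be scheduled again after they are stale.
        self.ready_list.clear()
```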
This adds handling for raw and structured buffers when lowering resource access via `llvm.dx.resource.getpointer`. Fixes #121714
Don't insert a space between a type declaration r_paren and &/&&. Fixes #124073.
…24102) This PR adds amdgpu-sw-lower-lds pass to AMDGPUCodeGenPassBuilder::addIRPasses()
Linker relaxation is not currently implemented for jitlink. But if relaxation is enabled by clang, R_LARCH_RELAX and R_LARCH_ALIGN relocations will be emitted. This commit adapts lld's algorithm to jitlink. Currently, only relaxing R_LARCH_ALIGN is implemented. Other relaxable relocs can be implemented in the future. Without this, interpreting C++ code using clang-repl or running IR using lli when relaxation is enabled fails with the error: `JIT session error: Unsupported loongarch relocation:102: R_LARCH_ALIGN`. Similar to 310473c but only implements align.
…rmat..." This reverts 4f03258 and follow-up patches (see below) while I investigate some ongoing failures on the buildbots.

---

Revert "[clang-repl] Try to XFAIL testcase on arm32 without affecting arm64 darwin." This reverts commit fd174f0.

Revert "[clang-repl] The simple-exception test now passes on arm64-darwin." This reverts commit c9bc242.

Revert "[ORC] Destroy defunct MaterializationUnits outside the session lock." This reverts commit a001cc0.

Revert "[ORC] Add explicit narrowing casts to fix build errors." This reverts commit 26fc07d.

Revert "[ORC] Enable JIT support for the compact-unwind frame info format on Darwin." This reverts commit 4f03258.
The convertFloorOp pattern incurs precision loss when floating-point numbers exceed the representable range of int64. This pattern should be removed. Fixes #119836
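To see why computing floor through a 64-bit integer is lossy, consider the Python sketch below. It models the float-to-int64 conversion as truncation plus clamping to the int64 range, which is an assumption made for illustration (the real out-of-range behavior depends on the lowering and target); `floor_via_int64` is a hypothetical stand-in for the removed pattern and handles finite inputs only.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def to_int64_saturating(x):
    """Truncate toward zero and clamp to int64 (assumed model, finite x only)."""
    return max(INT64_MIN, min(INT64_MAX, int(x)))


def floor_via_int64(x):
    """Floor computed by a round trip through int64."""
    truncated = float(to_int64_saturating(x))
    # Truncation rounds toward zero, so negative non-integers need a -1 step.
    return truncated - 1.0 if truncated > x else truncated
```

In range, the round trip agrees with a true floor, for example on `2.5` and `-2.5`. For a value such as `1e30`, which is already an integer-valued float, the round trip returns a number near `2**63` instead of the input.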
…#124159) Horizontal add (hadd) and subtract (hsub) are currently heuristically handled by `maybeHandleSimpleNomemIntrinsic()` (via `handleUnknownIntrinsic()`), which computes the shadow by bitwise OR'ing the two operands. This has false positives for hadd/hsub shadows. For example, suppose the shadows for the two operands are 00000000 and 11111111 respectively. The expected shadow for the result is 00001111, but `maybeHandleSimpleNomemIntrinsic` would compute it as 11111111. This patch handles horizontal add using `handleIntrinsicByApplyingToShadow` (from #114490), which has no false positives for hadd/hsub: if each pair of adjacent shadow values is zero (fully initialized), the result will be zero (fully initialized). More generally, it is precise for hadd/hsub if at least one of the two adjacent shadow values in each pair is zero. It does have some false negatives for hadd/hsub: if we add/subtract two adjacent non-zero shadow values, some bits of the result may incorrectly be zero. We consider this an acceptable tradeoff for performance. To make shadow propagation precise, we want the equivalent of "horizontal OR", but this is not available. Reducing horizontal OR to (permutation plus bitwise OR) is left as an exercise for the reader.
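The three shadow strategies discussed above can be compared with a small Python model. It assumes 8-bit lanes and a simplified hadd layout (sums of adjacent pairs of the first operand, followed by those of the second); real hadd instructions may interleave lanes differently, and all function names here are hypothetical.

```python
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def _pairs(v):
    return [(v[i], v[i + 1]) for i in range(0, len(v), 2)]


def hadd(a, b):
    """Simplified horizontal add: adjacent-pair sums of a, then of b."""
    return [(x + y) & LANE_MASK for x, y in _pairs(a) + _pairs(b)]


def shadow_bitwise_or(sa, sb):
    """Old heuristic: OR the operand shadows lane by lane."""
    return [x | y for x, y in zip(sa, sb)]


def shadow_by_applying_hadd(sa, sb):
    """Approach in this patch: apply the same operation to the shadows."""
    return hadd(sa, sb)


def shadow_horizontal_or(sa, sb):
    """Precise reference: OR each adjacent pair of shadow lanes."""
    return [x | y for x, y in _pairs(sa) + _pairs(sb)]
```

With a fully initialized first operand and a fully uninitialized second operand, the bitwise OR marks every result lane as uninitialized, while applying hadd to the shadows keeps the first half clean. Two adjacent shadows of `0x80` sum to zero in 8 bits, which is the false-negative case the commit message accepts.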
…mations for funnel shifts (#124175) We only had handling for cases where we had argument data.
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )