[SYCLomatic] Fix sycl to SYCLomatic pull down failed. #2148

daiyaan-ahmed6 · 2024-07-10T02:21:04Z

[SYCLomatic] Fix sycl to SYCLomatic pull down failed

Signed-off-by: Daiyaan Ahmed [email protected]

…#96514) This PR fixes llvm/llvm-project#96513. Array-type constants were created incorrectly: instead of [1, 1, 1] or [1, 1, 1, 1, 1, ...], the same [1] constant was always created and substituted for the original composite constants. As a result, only one such constant could exist in the code without invalid code being emitted; a second constant would eventually be rewritten to the first, because the key used to look up both was a single-element array (like [1]). This PR fixes the issue and removes an unneeded copy-pasted clone of the function that creates an array constant.
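
The collision described above can be illustrated with a small model. This is a hypothetical sketch, not the backend's code: `ConstantCache`, `get_array_buggy` and `get_array_fixed` are names invented here, and a Python dict stands in for whatever map the backend uses to track constants.

```python
class ConstantCache:
    """Toy constant pool: maps a lookup key to a numeric constant ID."""

    def __init__(self):
        self._ids = {}

    def _intern(self, key):
        # Return the existing ID for this key, or hand out the next free one.
        return self._ids.setdefault(key, len(self._ids))

    def get_array_buggy(self, element, count):
        # Bug (assumed shape): the key ignores `count`, so every array of
        # `element` collapses onto the same single-element entry.
        return self._intern((element,))

    def get_array_fixed(self, element, count):
        # Fix: key on the full composite, so [1, 1, 1] and [1, 1, 1, 1, 1]
        # get distinct entries.
        return self._intern((element,) * count)
```

With the buggy key, a request for a 3-element array and a request for a 5-element array return the same ID, which mirrors the "second constant rewritten to the first" symptom; with the full key they stay distinct.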

The handling of `PointerType` is similar to `HeapType`. The only difference is that an allocated flag is generated for `HeapType` and an associated flag for `PointerType`. The tests for pointers to allocatable strings are disabled for now. I will enable them once #95906 is merged. The debugging in GDB looks like this:

    integer, pointer :: par2(:)
    integer, target, allocatable :: ar2(:)
    integer, target :: sc
    integer, pointer :: psc
    allocate(ar2(4))
    par2 => ar2
    psc => sc

    19 par2 => ar2
    (gdb) p par2
    $3 = <not associated>
    (gdb) n
    20 do i=1,5
    (gdb) p par2
    $4 = (0, 0, 0, 0)
    (gdb) ptype par2
    type = integer (4)
    (gdb) p sc
    $5 = 3
    (gdb) p psc
    $6 = (PTR TO -> ( integer )) 0x7fffffffda24
    (gdb) p *psc
    $7 = 3

…ing for generic types (#89217) This patch is intended to be the first of a series whose end goal is to adapt the atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). It legalizes 64-bit readlane, writelane and readfirstlane ops pre-ISel. --------- Co-authored-by: vikramRH <[email protected]>

Instead of iterating over all VFs when computing costs, simply iterate over the VFs available in the created VPlans. Split off from llvm/llvm-project#92555. This also prepares for basing the check for whether any vector instructions will be generated on VPlan, to unblock recommitting llvm/llvm-project#92555.

Since we mark the pseudos as mayLoad but do not provide any MMOs, isSafeToMove conservatively returns false, stopping MachineLICM from hoisting the instructions. PseudoLA_TLS_{LD,GD} does not actually expand to a load, so stop marking that as mayLoad to allow it to be hoisted, and for the others make sure to add MMOs during lowering to indicate they're GOT loads and thus can be freely moved.

… (#95061) This patch augments the HIPAMD driver to allow it to target AMDGCN flavoured SPIR-V compilation. It's mostly straightforward, as we re-use some of the existing SPIRV infra; however, there are a few notable additions:

- we introduce an `amdgcnspirv` offload arch, rather than relying on using `generic` (this is already fairly overloaded) or simply using `spirv` or `spirv64` (we'll want to use these to denote unflavoured SPIRV, once we bring up that capability)
- initially it won't be possible to mix in SPIR-V and concrete AMDGPU targets, as it would require some relatively intrusive surgery in the HIPAMD Toolchain and the Driver to deal with two triples (`spirv64-amd-amdhsa` and `amdgcn-amd-amdhsa`, respectively)
- in order to retain user provided compiler flags and have them available at JIT time, we rely on embedding the command line via `-fembed-bitcode=marker`, which the bitcode writer had previously not implemented for SPIRV; we only allow it conditionally for AMDGCN flavoured SPIRV, and it is handled correctly by the Translator (it ends up as a string literal)

Once the SPIRV BE is no longer experimental we'll switch to using that rather than the translator. There's some additional work that'll come via a separate PR around correctly piping through AMDGCN's implementation of `printf`; for now we merely handle its flags correctly.

Extends llvm/llvm-project#95403 to handle non-constant cases: we can avoid unpacks/extensions from vXi8 to vXi16 by using PMADDUBSW instead and truncating the vXi16 results back together. Most targets benefit from performing this for non-constant cases; it's just Intel Core/SandyBridge era CPUs that might experience additional Port0/15 contention (but lower instruction count). Fixes llvm/llvm-project#90748
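
A scalar model can show why PMADDUBSW can stand in for widening both operands. This is a sketch under assumptions, not the lowering code from the patch: `pmaddubsw` below emulates the instruction's documented behaviour (unsigned byte times signed byte, adjacent products added with signed 16-bit saturation), while `to_signed8` and `mul_bytes` are hypothetical helper names. Zeroing every other byte of one operand means each 16-bit lane holds a single product, so no saturation occurs and the low 8 bits are the wanted result.

```python
def to_signed8(x):
    """Reinterpret an unsigned byte (0..255) as a signed 8-bit value."""
    return x - 256 if x >= 128 else x


def pmaddubsw(a, b):
    """Scalar model of PMADDUBSW: unsigned a[i] * signed b[i], with adjacent
    products summed into signed 16-bit lanes using saturation."""
    out = []
    for i in range(0, len(a), 2):
        total = a[i] * to_signed8(b[i]) + a[i + 1] * to_signed8(b[i + 1])
        out.append(max(-32768, min(32767, total)))
    return out


def mul_bytes(a, b):
    """Lane-wise 8-bit multiply (low 8 bits kept) built from two PMADDUBSW
    calls instead of extending both inputs to 16 bits."""
    b_even = [x if i % 2 == 0 else 0 for i, x in enumerate(b)]
    b_odd = [x if i % 2 == 1 else 0 for i, x in enumerate(b)]
    lo = pmaddubsw(a, b_even)  # a[2k] * b[2k] in each 16-bit lane
    hi = pmaddubsw(a, b_odd)   # a[2k+1] * b[2k+1] in each 16-bit lane
    out = []
    for lo_lane, hi_lane in zip(lo, hi):
        out.append(lo_lane & 0xFF)  # truncate each lane back to a byte
        out.append(hi_lane & 0xFF)
    return out
```

Treating one operand as signed does not change the low 8 bits of the product, because the signed and unsigned readings of a byte differ by a multiple of 256.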

This adds a new pattern that can legalize a multi-tile transfer_write as a single store loop. This is done as part of type decomposition as at this level we know each tile write is disjoint, but that information is lost after decomposition (without analysis to reconstruct it).

Example (pseudo-MLIR):

```
vector.transfer_write %vector, %dest[%y, %x], %mask
  : vector<[16]x[8]xi16>, memref<?x?xi16>
```

Is rewritten to:

```
scf.for %slice_idx = %c0 to %c8_vscale step %c1 {
  %upper_slice_mask = vector.extract %mask[%slice_idx] ─┐
    : vector<[8]xi1> from vector<[16]x[8]xi1>           |
  %upper_slice = vector.extract %upper_tile[%slice_idx] |- Store upper tile
    : vector<[8]xi16> from vector<[8]x[8]xi16>          |
  vector.transfer_write %upper_slice,                   |
    %dest[%slice_idx + %y, %x], %upper_slice_mask       |
    : vector<[8]xi16>, memref<?x?xi16>                  ┘
  %lower_slice_idx = %slice_idx + %c8_vscale                 ─┐
  %lower_slice_mask = vector.extract %mask[%lower_slice_idx]  |
    : vector<[8]xi1> from vector<[16]x[8]xi1>                 |
  %lower_slice = vector.extract %lower_tile[%slice_idx]       |- Store lower
    : vector<[8]xi16> from vector<[8]x[8]xi16>                |  tile
  vector.transfer_write %lower_slice,                         |
    %dest[%lower_slice_idx + %y, %x], %lower_slice_mask       |
    : vector<[8]xi16>, memref<?x?xi16>                        ┘
}
```
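
The loop structure of the rewritten form can be modelled outside MLIR. The following is a rough sketch, assuming plain nested lists for the tiles, mask and destination; `store_two_tiles` is a hypothetical name, and `rows` plays the role of `%c8_vscale`.

```python
def store_two_tiles(dest, y, x, upper_tile, lower_tile, mask):
    """Write a tall vector, already split into an upper and a lower tile,
    using one loop that stores one row (slice) of each tile per iteration."""
    rows = len(upper_tile)  # number of slices per tile
    for slice_idx in range(rows):
        lower_slice_idx = slice_idx + rows
        # Each iteration handles the same slice of both tiles; the lower
        # tile's destination row and mask row are offset by `rows`.
        for idx, tile in ((slice_idx, upper_tile), (lower_slice_idx, lower_tile)):
            row_mask = mask[idx]
            for col, value in enumerate(tile[slice_idx]):
                if row_mask[col]:  # masked-off lanes leave dest untouched
                    dest[y + idx][x + col] = value
```

Because the two tiles cover disjoint rows of the destination, interleaving their stores in one loop gives the same result as writing the whole vector at once.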

…(#95424) Summary: This patch adds `CMAKE_REQUIRED_FLAGS` for the GPU build so checks like `check_cxx_compiler_flags` work as expected. This is required because we need to hack around the potential lack of `nvlink` and `ptxas` for NVPTX targets and the fact that the AMDGPU target needs `-nogpulib` to avoid errors on lack of ROCm. This makes a few of the checks pass and also allows us to just check `-mcpu=native` for architecture detection instead of finding the tools manually.

Introduce a Loop::getLocStr stolen from LoopVectorize's static function getDebugLocString in order to have uniform debug output headers across LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation for this change is to have UpdateTestChecks recognize the headers and automatically generate CHECK lines for debug output, with minimal special-casing.

The lowering of copyprivate clauses with allocatable or pointer variables was incorrect. This happened because the values passed to copyVar() are always wrapped in SymbolBox::Intrinsic, which resulted in allocatable/pointer variables being handled as regular ones. This is fixed by providing to copyVar() the attributes of the variables being copied, to make it possible to detect and handle allocatable/pointer variables correctly. Fixes #95801

…ster_group (#14113) This PR is a partial implementation of [`sycl_ext_oneapi_cuda_cluster_group`](intel/llvm#13594), introducing the `cluster_size` property to launch a kernel with CUDA's thread block clusters.

Only a small part of the extension specification described in intel/llvm#13594 is used in this implementation. To be specific, everything after the section "Launching a kernel with a `cluster_group`" is not included in this PR. A very important point to note is that this PR still fully represents a functional use case of Nvidia's CUDA driver cluster launch feature for its primary purpose, which is to accelerate cross-work-group collective operations (particularly for GEMM), leveraging cross-work-group asynchronous multi-casting of distributed shared memory across work-groups.

This is a high priority feature that is targeted for the next release. The other parts of the extension specification described in intel/llvm#13594, primarily related to the "cluster_group" abstraction, are (user-facing) convenience abstractions that are not required to be exposed in libraries that optimize such collective operations (GEMM). Please therefore focus reviews of this PR on the aspects of the extension that are required for the implementation in this PR and the library-based application of it as described in this message. --------- Signed-off-by: JackAKirk <[email protected]>

We cannot currently update the kernel binary of a node in Whole Graph Update. Rather than silently accepting inconsistent kernel functions, which indicate that the graphs aren't topologically identical, throw an error when the kernel types of two nodes are mismatched. This change requires removing the unit test for barrier nodes in Whole Graph Update, as the mock infrastructure does not set up the internal `CG` class to the depth required to test working functionality. This functionality is already covered by `test-e2e/Graph/Update/whole_update_barrier_node.cpp`
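
The check described above amounts to validating node pairs before applying any update. A minimal sketch, assuming a toy `Node` type and a `ValueError` in place of the runtime's real exception; `Node`, `kernel_name` and `whole_graph_update` are hypothetical names, not the SYCL runtime's API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kernel_name: str  # stands in for the node's kernel type
    args: tuple


def whole_graph_update(target, source):
    """Copy arguments from `source` nodes into `target` nodes, refusing to
    proceed when a pair of nodes refers to different kernels."""
    if len(target) != len(source):
        raise ValueError("graphs have different node counts")
    # Validate first so a failed update leaves `target` unchanged.
    for i, (dst, src) in enumerate(zip(target, source)):
        if dst.kernel_name != src.kernel_name:
            raise ValueError(f"node {i}: kernel mismatch")
    for dst, src in zip(target, source):
        dst.args = src.args
```

Validating every pair before copying anything is a design choice of this sketch; it keeps the target graph untouched when the error is raised.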

This PR:

1. Adds a new Intel FPGA experimental SYCL extension that introduces support for `task_sequences`, which provide a sub-kernel task-level parallelism interface.
2. Updates the fpga_kernel_interface SYCL extension to add the `stall_enable_clusters` and `stall_free_clusters` properties. These are used with task sequences as well.

--------- Co-authored-by: Steffen Larsen <[email protected]> Co-authored-by: GarveyJoe <[email protected]> Co-authored-by: Abhishek Tiwari <[email protected]> Co-authored-by: Xue, Bowen <[email protected]> Co-authored-by: Adel Ejjeh <[email protected]> Co-authored-by: Ejjeh, Adel <[email protected]>

Bump certifi to 2024.07.04 Certifi 2024.07.04 removes root certificates from "GLOBALTRUST" from the root store. These are in the process of being removed from Mozilla's trust store. GLOBALTRUST's root certificates are being removed pursuant to an investigation which identified "long-running and unresolved compliance issues". Conclusions of Mozilla's investigation can be found [here](https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/XpknYMPO8dI). Upgrading will resolve this issue.

@aelovikov-intel

…(#14425) I verified manually this option fixes the issue that was being tracked internally. Thanks to @aelovikov-intel for help finding the option that was required. Signed-off-by: Sarnie, Nick <[email protected]>

One of the SYCL 1.2 exception subclasses is being removed during the ABI-breaking window, since these subclasses were removed in SYCL 2020. We still want to be able to carry information about the backend error code, so the `MPIErr` data member in `sycl::exception` remains. However, since none of the standard ctors can accept it (per the specification), I've introduced a new helper, `detail::set_pi_error`, to set it. It should only be used when propagating an error coming from the PI/UR level and *NOT* for error conditions originating in the SYCL RT (which is a change from the previous implementation, but a logical and justifiable one).

VyacheslavLevytskyy and others added 30 commits June 25, 2024 10:56

[AArch64][SVE] optimisation for SVE load intrinsics with no active la…

0bd9c49

…nes (#95269) This patch extends #73964 and adds optimisation of load SVE intrinsics when predicate is zero.

[GlobalISel] Add build methods for FP environment intrinsics (#96607)

f9795f3

This change adds methods like buildGetFPEnv and similar for opcodes that represent manipulation on floating-point state.

[libc++] Use __is_nothrow_destructible (#95766)

16d02cd

This changes the behaviour in C++03 mode because we'll now use the builtin on Clang, but I don't think that's much of a problem.

[SetOperations] clang-format header (NFC)

29f4a05

This header used three-space indentation in a number of places. Reformat it completely.

[clang] Remove a stale FIXME

f09b024

This FIXME has already been addressed in #89358

[LoopUnroll] Use poison instead of undef for preheader value

eeb0884

[LV] Make create-induction-resume.ll more robust by adding store.

a2e9157

Without the store, the vector loop body is empty. Add a store to avoid that, while not impacting the induction resume values that are created.

[LoopUnroll] Use poison instead of undef for another preheader value

37c736e

[Xtensa] Lower GlobalAddress/BlockAddress/JumpTable (#95256)

cc8fdd6

This patch implements lowering of GlobalAddress, BlockAddress, JumpTable and BR_JT. It also adds legal support for the BR_CC operation for the i32 type.

[SCCP] Generate test checks (NFC)

4acc8ee

Some of these are just old, while others previously did not use UTC due to missing features that have since been implemented (such as signature matching).

[SCCP] Use poison instead of undef when zapping returns

16bb8c1

[Reassociate] Use poison instead of undef for dummy operands (NFCI)

35eef9f

These will be replaced later.

[NFC][lld][ELF] Remove unused sec param of `ObjFile<ELFT>::getReloc…

65f9601

…Target` (#96500)

[VPlanTest] Use poison instead of undef for dummy values (NFC)

9952e00

[VectorBuilderTest] Use poison instead of undef for dummy values (NFC)

68efc50

[SLP] Use poison instead of undef in reorderScalars() (#96619)

8263bec

-1 mask elements are specified to return poison rather than undef nowadays, so update the reorderScalars() implementation to match.

[AArch64] Remove all instances of the 'hasSVEorSME' interfaces. (#96543)

c436649

I've not added any new tests for these, because the original conditions were wrong (they did not consider streaming mode) and we have tests for the positive cases.

[InstSimplify] Fold all poison phi to poison instead of undef

c5aa983

[InstCombine] Add poison variant to phi test (NFC)

4780dc3

And rename an argument to avoid an upper/lowercase clash.

AD2605 and others added 25 commits July 8, 2024 08:33

[SYCL] Drop sycl/ext/oneapi/functional.hpp include from `sycl/sub_g…

4ab1e4b

…roup.hpp` (#13760) Not needed anymore after deprecated shuffles/collectives were removed.

[UR] Bump UR to use new definition of ur_native_handle_t (#13705)

af2221f

Co-authored-by: Kenneth Benzie (Benie) <[email protected]>

[UR][L0] Fix Device Info return code to report unsupported enumeratio…

eb03091

…n (#14407) pre-commit PR for oneapi-src/unified-runtime#1809 --------- Signed-off-by: Neil R. Spruit <[email protected]> Co-authored-by: Kenneth Benzie (Benie) <[email protected]>

[SYCL][ESIMD] [ABI Break] Remove deprecated ESIMD API (#14415)

b30e6f0

[UR] Update main tag for L0 CommandBuffer refactor (#14240)

ac556f9

Co-authored-by: Kenneth Benzie (Benie) <[email protected]>

[ABI-Break][SYCL] Remove deprecated property use_primary_context (#13…

dbddb82

…496)

[SYCL] Move more non device code checks out of check_device_code (#14…

bd31be7

…424) It would appear that I missed a file in #14175. I looked over the files again and did not catch anything else, but do let me know if there is more.

[SYCL][ESIMD] Move fmax to SPIR-V intrinsic (#14020)

1f1be9c

Couldn't do this earlier because of a driver issue. Signed-off-by: Sarnie, Nick <[email protected]>

[CI] Fix nightly extra lit options (#14482)

47f3dee

I missed this in intel/llvm@4a87b2c so the change didn't actually do anything. Signed-off-by: Sarnie, Nick <[email protected]>

[CI] Fix nightly CTS build (#14477)

2555800

Needed until KhronosGroup/SYCL-CTS#907 is merged Signed-off-by: Sarnie, Nick <[email protected]>

[SYCL] suppression of error dialogs in Win sycl-ls. (#14479)

be34c68

This is mostly relevant when sycl-ls might be run in the context of really old drivers/lzloader. It is a workaround for some Win test systems.

[SYCL][E2E] Fix deprecated warnings in DeviceLib tests (#14428)

f40fd84

`assert.cpp`: replace use of deprecated `get_info<info::device::extensions>` with `opencl::has_extension` `string_test.cpp`: replace use of deprecated `get_pointer` with `get_multi_ptr`

[SYCL] Remove uses of nd_range_error exception (#14488)

6224978

Not removing the exception subclass itself as that would result in merge conflicts with our PRs.

[clang][NFC] Remove dead code (#14445)

356ac30

[UR] Uplift UR tag to the fix in the L0 adapter regarding event times…

78ae397

…tamps (#14360) UR PR: oneapi-src/unified-runtime#1806

[SYCL][NFC] Add missing ocloc requirements to E2E tests (#14443)

1fcc1cf

[SYCL] Implement SYCLConditionalCallOnDevicePass pass (#14228)

19e471f

The design of SYCLConditionalCallOnDevicePass pass is described in https://github.com/intel/llvm/blob/01fa4f8a439ea9b7d932d8099e1e1a17c630c1ea/sycl/doc/design/DeviceIf.md#new-ir-pass

Weekly pulldown from origin/sycl to origin/SYCLomatic

b33b58c

[SYCLomatic] Fix sycl to SYCLomatic pull down failed

8300c79

Signed-off-by: Ahmed, Daiyaan <[email protected]>

daiyaan-ahmed6 requested a review from a team as a code owner July 10, 2024 02:21

daiyaan-ahmed6 requested review from the-slow-one and TejaX-Alaghari July 10, 2024 02:21

daiyaan-ahmed6 added 2 commits July 11, 2024 11:28

Merge branch 'SYCLomatic' into pulldown-fix-base-20240710

1cc178e

Fix breaking changes in llvm/sycl pulldown in DPCT

bb359b1

Signed-off-by: Ahmed, Daiyaan <[email protected]>
