-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCLomatic] Fix sycl to SYCLomatic pull down failed. #2148
Open
daiyaan-ahmed6
wants to merge
3,158
commits into
oneapi-src:SYCLomatic
Choose a base branch
from
daiyaan-ahmed6:pulldown-fix-base-20240710
base: SYCLomatic
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
[SYCLomatic] Fix sycl to SYCLomatic pull down failed. #2148
daiyaan-ahmed6
wants to merge
3,158
commits into
oneapi-src:SYCLomatic
from
daiyaan-ahmed6:pulldown-fix-base-20240710
+608,383
−191,409
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…#96514) This PR fixes llvm/llvm-project#96513. The way of creation of array type constant was incorrect: instead of creating [1, 1, 1] or [1, 1, 1, 1, 1, ....] constants, the same [1] constant was always created, substituting original composite constants. This in its turn led to a situation when only one of constants might exist in the code without emitting invalid code, the second constant would be eventually rewritten to the first constant, because a key to address both was an array of a single element (like [1]). This PR fixes the issue and purges from the code unneeded copy/pasted clone of the function that creates an array constant.
…nes (#95269) This patch extends #73964 and adds optimisation of load SVE intrinsics when predicate is zero.
The handling of `PointerType` is similar to `HeapType`. The only difference is that allocated flag is generated for `HeapType` and associated flag for `PointerType`. The tests for pointer to allocatable strings are disabled for now. I will enable them once #95906 is merged. The debugging in GDB looks like this: integer, pointer :: par2(:) integer, target, allocatable :: ar2(:) integer, target :: sc integer, pointer :: psc allocate(ar2(4)) par2 => ar2 psc => sc 19 par2 => ar2 (gdb) p par2 $3 = <not associated> (gdb) n 20 do i=1,5 (gdb) p par2 $4 = (0, 0, 0, 0) (gdb) ptype par2 type = integer (4) (gdb) p sc $5 = 3 (gdb) p psc $6 = (PTR TO -> ( integer )) 0x7fffffffda24 (gdb) p *psc $7 = 3
…ing for generic types (#89217) This patch is intended to be the first of a series with end goal to adapt atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). This legalizes 64 bit readlane, writelane and readfirstlane ops pre-ISel --------- Co-authored-by: vikramRH <[email protected]>
This change adds methods like buildGetFPEnv and similar for opcodes that represent manipulation on floating-point state.
This changes the behaviour in C++03 mode because we'll now use the builtin on Clang, but I don't think that's much of a problem.
This header used three-space indentation in a number of places. Reformat it completely.
This FIXME has already been addressed in #89358
Instead for iterating over all VFs when computing costs, simply iterate over the VFs available in the created VPlans. Split off from llvm/llvm-project#92555. This also prepares for moving the check if any vector instructions will be generated to be based on VPlan, to unblock recommitting llvm/llvm-project#92555.
Without the store, the vector loop body is empty. Add a store to avoid that, while not impacting the induction resume values that are created.
This patch implements lowering of the GlobalAddress, BlockAddress, JumpTable and BR_JT. Also patch adds legal support of the BR_CC operation for i32 type.
Some of these are just old, while others previously did not use UTC due to missing features that have since been implemented (such as signature matching).
These will be replaced later.
…Target` (#96500)
Since we mark the pseudos as mayLoad but do not provide any MMOs, isSafeToMove conservatively returns false, stopping MachineLICM from hoisting the instructions. PseudoLA_TLS_{LD,GD} does not actually expand to a load, so stop marking that as mayLoad to allow it to be hoisted, and for the others make sure to add MMOs during lowering to indicate they're GOT loads and thus can be freely moved.
… (#95061) This patch augments the HIPAMD driver to allow it to target AMDGCN flavoured SPIR-V compilation. It's mostly straightforward, as we re-use some of the existing SPIRV infra, however there are a few notable additions: - we introduce an `amdgcnspirv` offload arch, rather than relying on using `generic` (this is already fairly overloaded) or simply using `spirv` or `spirv64` (we'll want to use these to denote unflavoured SPIRV, once we bring up that capability) - initially it is won't be possible to mix-in SPIR-V and concrete AMDGPU targets, as it would require some relatively intrusive surgery in the HIPAMD Toolchain and the Driver to deal with two triples (`spirv64-amd-amdhsa` and `amdgcn-amd-amdhsa`, respectively) - in order to retain user provided compiler flags and have them available at JIT time, we rely on embedding the command line via `-fembed-bitcode=marker`, which the bitcode writer had previously not implemented for SPIRV; we only allow it conditionally for AMDGCN flavoured SPIRV, and it is handled correctly by the Translator (it ends up as a string literal) Once the SPIRV BE is no longer experimental we'll switch to using that rather than the translator. There's some additional work that'll come via a separate PR around correctly piping through AMDGCN's implementation of `printf`, for now we merely handle its flags correctly.
Extends llvm/llvm-project#95403 to handle non-constant cases - we can avoid unpacks/extensions from vXi8 to vXi16 by using PMADDUBSW instead and truncating the vXi16 results back together. Most targets benefit from performing this for non-constant cases - its just Intel Core/SandyBridge era CPUs that might experience additional Port0/15 contention (but lower instruction count). Fixes llvm/llvm-project#90748
This adds a new pattern that can legalize a multi-tile transfer_write as a single store loop. This is done as part of type decomposition as at this level we know each tile write is disjoint, but that information is lost after decomposition (without analysis to reconstruct it). Example (pseudo-MLIR): ``` vector.transfer_write %vector, %dest[%y, %x], %mask : vector<[16]x[8]xi16>, memref<?x?xi16> ``` Is rewritten to: ``` scf.for %slice_idx = %c0 to %c8_vscale step %c1 { %upper_slice_mask = vector.extract %mask[%slice_idx] ─┐ : vector<[8]xi1> from vector<[16]x[8]xi1> | %upper_slice = vector.extract %upper_tile[%slice_idx] |- Store upper tile : vector<[8]xi16> from vector<[8]x[8]xi16> | vector.transfer_write %upper_slice, | %dest[%slice_idx + %y, %x], %upper_slice_mask | : vector<[8]xi16>, memref<?x?xi16> ┘ %lower_slice_idx = %slice_idx + %c8_vscale ─┐ %lower_slice_mask = vector.extract %mask[%lower_slice_idx] | : vector<[8]xi1> from vector<[16]x[8]xi1> | %lower_slice = vector.extract %lower_tile[%slice_idx] |- Store lower : vector<[8]xi16> from vector<[8]x[8]xi16> | tile vector.transfer_write %lower_slice, | %dest[%lower_slice_idx + %y, %x], %lower_slice_mask | : vector<[8]xi16>, memref<?x?xi16> ┘ } ```
…(#95424) Summary: This patch adds `CMAKE_REQUIRED_FLAGS` for the GPU build so checks like `check_cxx_compiler_flags` work as expected. This is required because we need to hack around the potential lack of `nvlink` and `ptxas` for NVPTX targets and the fact that the AMDGPU target needs `-nogpulib` to avoid errors on lack of ROCm. This makes a few of the checks pass and also allows us to just check `-mcpu=native` for architecture detection instead of finding the tools manually.
Introduce a Loop::getLocStr stolen from LoopVectorize's static function getDebugLocString in order to have uniform debug output headers across LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation for this change is to have UpdateTestChecks recognize the headers and automatically generate CHECK lines for debug output, with minimal special-casing.
-1 mask elements are specified to return poison rather than undef nowadays , so update the reorderScalars() implementation to match.
The lowering of copyprivate clauses with allocatable or pointer variables was incorrect. This happened because the values passed to copyVar() are always wrapped in SymbolBox::Intrinsic, which resulted in allocatable/pointer variables being handled as regular ones. This is fixed by providing to copyVar() the attributes of the variables being copied, to make it possible to detect and handle allocatable/pointer variables correctly. Fixes #95801
I've not added any new tests for these, because the original conditions were wrong (they did not consider streaming mode) and we have tests for the positive cases.
And rename an argument to avoid an upper/lowercase clash.
…ster_group (#14113) This PR is a partial implementation of [`sycl_ext_oneapi_cuda_cluster_group`](intel/llvm#13594), introducing the `cluster_size` property to launch a kernel with CUDA's thread block clusters Only a small part of the extension specification described in intel/llvm#13594 is used in this implementation. To be specific everything after the section "Launching a kernel with a `cluster_group`" is not included in this PR. A very important point to note is that this PR still fully represents a functional use case of using Nvidia's cuda driver cluster launch feature for its primary purpose which is to accelerate cross-work-group collective operations (particularly for GEMM), leveraging cross-work group asynchronous multi-casting of distributed shared memory across work-groups. This is a high priority feature that is targeted for the next release. The other parts of the extension specification described in intel/llvm#13594, primarily related to the "cluster_group" abstraction is a (user-facing) convenience abstraction that is not required to be exposed in libraries that optimize such library collective operations (GEMM). Please therefore focus reviews of this PR on the relevant aspects of the extension that are required for the implementation in this PR and the library based application of it as described in this message. --------- Signed-off-by: JackAKirk <[email protected]>
…roup.hpp` (#13760) Not needed anymore after deprecated shuffles/collectives were removed.
Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
We cannot currently update the kernel binary of a node in Whole Graph Update. Rather than silently accepting inconsistent kernel functions, which indicates the graphs aren't topologically identical, throw an error when the kernel types of two nodes are mismatched. This change requires removing the unittest for barrier nodes in Whole Graph Update as the mock infrastructure does not setup the internal `CG` class to the depth required to test working functionality. This functionality is already covered by `test-e2e/Graph/Update/whole_update_barrier_node.cpp`
…n (#14407) pre-commit PR for oneapi-src/unified-runtime#1809 --------- Signed-off-by: Neil R. Spruit <[email protected]> Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
This PR: 1. Adds a new Intel FPGA experimental SYCL extension that introduces support for `task_sequences` which provides a sub kernel task level parallelism interface. 2. Updates the fpga_kernel_interface SYCL extension to add the `stall_enable_clusters` and `stall_free_clusters` properties. These are used with task sequences as well. --------- Co-authored-by: Steffen Larsen <[email protected]> Co-authored-by: GarveyJoe <[email protected]> Co-authored-by: Abhishek Tiwari <[email protected]> Co-authored-by: Xue, Bowen <[email protected]> Co-authored-by: Adel Ejjeh <[email protected]> Co-authored-by: Ejjeh, Adel <[email protected]>
Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
…424) It would appear that I missed a file in #14175. I looked over the files again and did not catch anything else, but do let me know if there is more.
Couldn't do this earlier because of a driver issue. Signed-off-by: Sarnie, Nick <[email protected]>
Bump certifi to 2024.07.04 Certifi 2024.07.04 removes root certificates from "GLOBALTRUST" from the root store. These are in the process of being removed from Mozilla's trust store. GLOBALTRUST's root certificates are being removed pursuant to an investigation which identified "long-running and unresolved compliance issues". Conclusions of Mozilla's investigation can be found [here](https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/XpknYMPO8dI). Upgrading will resolve this issue.
I missed this in intel/llvm@4a87b2c so the change didn't actually do anything. Signed-off-by: Sarnie, Nick <[email protected]>
…(#14425) I verified manually this option fixes the issue that was being tracked internally. Thanks to @aelovikov-intel for help finding the option that was required. Signed-off-by: Sarnie, Nick <[email protected]>
Needed until KhronosGroup/SYCL-CTS#907 is merged Signed-off-by: Sarnie, Nick <[email protected]>
This is mostly relevant when sycl-ls might be run in the context of really old drivers/lzloader. It is a workaround for some Win test systems.
One of the SYCL 1.2 exception subclasses being removed during the ABI breaking window since these subclasses have been removed in SYCL 2020. We still want to be able to carry information about backend error code, so `MPIErr` data member in `sycl::exception` remains. However, since none of the standard ctors can accept it (per the specification), I've introduced a new helper `detail::set_pi_error` to set it. It should only be used when propagating an error comming from the PI/UR level and *NOT* for the error conditions originated in the SYCL RT (which is a change from previous implementation but a logical/justifiable one).
`assert.cpp`: replace use of deprecated `get_info<info::device::extensions>` with `opencl::has_extension` `string_test.cpp`: replace use of deprecated `get_pointer` with `get_multi_ptr`
Not removing the exception subclass itself as that would result in merge conflicts with our PRs.
The design of SYCLConditionalCallOnDevicePass pass is described in https://github.com/intel/llvm/blob/01fa4f8a439ea9b7d932d8099e1e1a17c630c1ea/sycl/doc/design/DeviceIf.md#new-ir-pass
Signed-off-by: Ahmed, Daiyaan <[email protected]>
Signed-off-by: Ahmed, Daiyaan <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
[SYCLomatic] Fix sycl to SYCLomatic pull down failed
Signed-off-by: Daiyaan Ahmed [email protected]