[sycl][cuda][hip] Expose const addrsp via device_global<T, decltype(properties{device_constant})> #16001

JackAKirk · 2024-11-06T15:32:36Z

Map CUDA/HIP constant address space device global variables (__constant__) to device_global<T, decltype(properties{device_constant})>.

Nvidia GPUs have a dedicated constant memory cache, which in some cases can be considerably faster for constant global device variables ("constant CUDA symbols"). CUDA programmers access this cache via global variables marked __constant__.
AMD GPUs do not have a dedicated constant memory cache (as far as I am aware). However, the HIP programming model does support __constant__. As well as mapping to the constant cache when Nvidia GPUs are the target, on AMD GPUs the macro can be used as a compiler hint for other optimizations, such as using SGPRs (scalar registers) instead of VGPRs (vector registers).

This patch switches on these optimizations for the CUDA/HIP backends of DPC++.

This is a natural translation that supports the complete set of device_global features, under the constraint that programmers cannot update the device_global<T, decltype(properties{device_constant})> in kernel code (matching __constant__ semantics in CUDA/HIP). Programmers can still update this constant global variable from the host via queue::memcpy(const device_global), which maps naturally onto how the CUDA APIs allow __constant__ device symbols to be updated from the host.

Key applications that have been identified as benefiting from this:

Kokkos (general)
Blender
NWCHEMEX aka Exachem

Fixes #5827
Fixes #4278

Signed-off-by: JackAKirk <[email protected]>

Naghasan · 2024-12-10T17:25:51Z

sycl/include/sycl/ext/oneapi/device_global/device_global.hpp

+  template <typename propertyT> static constexpr auto get_property() {         \
+    return property_list_t::template get_property<propertyT>();                \
+  }
+
 template <typename T, typename... Props>
 class
 #ifdef __SYCL_DEVICE_ONLY__
    [[__sycl_detail__::global_variable_allowed, __sycl_detail__::device_global,


I was wondering if device_global should take an optional argument to select the address space instead of a new device_constant. If we do, you could use some meta-programming to select the address space here, that could limit the impact on the headers.

We could even do that with the current property. It should be as simple as doing a std::conditional_t<property_list_t::template has_property<device_constant_key>(), __OPENCL_CONSTANT_AS__ T *, T *>, inheriting the definition of __OPENCL_CONSTANT_AS__ from sycl/include/sycl/access/access.hpp. Assuming the PTX and AMDGCN know how to handle __attribute__((opencl_constant)), that should hopefully avoid the need for the new clang attribute.

As for the member availability, this could be done through either conditionally picking base classes or SFINAE.
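For illustration, here is a minimal host-only sketch of the std::conditional_t idea and of picking a base class conditionally. Every name in namespace sketch is a hypothetical stand-in rather than the DPC++ headers. In particular, constant_as<T> is a plain placeholder for "__OPENCL_CONSTANT_AS__ T", because a real address-space qualifier needs a device compiler.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for the property key discussed above.
struct device_constant_key {};

// Hypothetical, simplified property list: has_property<Key>() is true when
// Key appears among the template arguments.
template <typename... Keys> struct property_list {
  template <typename Key> static constexpr bool has_property() {
    return (std::is_same_v<Key, Keys> || ...);
  }
};

// Placeholder for "T in the constant address space" (assumption: a real
// implementation would use an address-space qualifier instead).
template <typename T> struct constant_as {};

// Select the pointer type from the property list.
template <typename T, typename Props>
using storage_ptr_t =
    std::conditional_t<Props::template has_property<device_constant_key>(),
                       const constant_as<T> *, T *>;

// Member availability via conditionally picked base classes.
struct read_write_members {
  static constexpr bool writable = true;
};
struct read_only_members {
  static constexpr bool writable = false;
};

template <typename T, typename Props>
struct global_sketch
    : std::conditional_t<Props::template has_property<device_constant_key>(),
                         read_only_members, read_write_members> {
  storage_ptr_t<T, Props> ptr = nullptr;
};

} // namespace sketch
```

With this shape, the property check lives in one alias and one base-class selection, so the rest of the class body does not need to be duplicated.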

> We could even do that with the current property. It should be as simple as doing a std::conditional_t<property_list_t::template has_property<device_constant_key>(), __OPENCL_CONSTANT_AS__ T *, T *>, inheriting the definition of __OPENCL_CONSTANT_AS__ from sycl/include/sycl/access/access.hpp. Assuming the PTX and AMDGCN know how to handle __attribute__((opencl_constant)), that should hopefully avoid the need for the new clang attribute.
>
> As for the member availability, this could be done through either conditionally picking base classes or SFINAE.

Declaring address spaces directly in source code is currently rejected by Sema. For example, I get the following (the same happens if the field is a pointer type):

error: field may not be qualified with an address space
  100 |   T __attribute__((opencl_constant)) val{};

I looked into changing this behaviour, but I didn't think there was a simple solution.
My current idea is to partially specialize as

device_global<
    T, detail::properties_t<Props...>,
    typename std::enable_if_t<(
        detail::properties_t<Props...>::template has_property<device_constant_key>())>>
    : public detail::device_global_base<T, detail::properties_t<Props...>>

such that when the device_constant property is used we add a clang attribute to the class:

__sycl_detail__::device_constant

The compiler then uses this attribute to set the address space to .const, only for the CUDA/HIP backends.

In theory, I think this should be compatible with the partial specializations of the device_global_base class, allowing simultaneous specialization for the case when the device_image_scope property is in the property list (via device_global_base). I think the only property that can be combined with the device_constant property should be device_image_scope (I will update the specification doc once the implementation is finalised).
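The partial-specialization idea can be sketched in plain C++ as follows. This is a simplified host-only model under stated assumptions: the types in namespace sketch only mimic the names used in the thread, the clang attribute is not modelled at all, and the in_constant_space and image_scope flags are hypothetical markers added to make the selected specialization observable.

```cpp
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical stand-ins for the property keys.
struct device_constant_key {};
struct device_image_scope_key {};

template <typename... Keys> struct properties_t {
  template <typename Key> static constexpr bool has_property() {
    return (std::is_same_v<Key, Keys> || ...);
  }
};

// Base class, itself specialized on device_image_scope.
template <typename T, typename Props, typename = void>
struct device_global_base {
  static constexpr bool image_scope = false;
};
template <typename T, typename Props>
struct device_global_base<
    T, Props,
    std::enable_if_t<Props::template has_property<device_image_scope_key>()>> {
  static constexpr bool image_scope = true;
};

// Primary template: readable and writable.
template <typename T, typename Props, typename = void>
struct device_global : device_global_base<T, Props> {
  static constexpr bool in_constant_space = false;
  T value{};
  T &get() { return value; }
  const T &get() const { return value; }
};

// Specialization chosen when device_constant is present: read-only access.
template <typename T, typename Props>
struct device_global<
    T, Props,
    std::enable_if_t<Props::template has_property<device_constant_key>()>>
    : device_global_base<T, Props> {
  static constexpr bool in_constant_space = true;
  T value{};
  const T &get() const { return value; }
};

} // namespace sketch
```

Because the two specializations key on different properties, a property list containing both device_constant and device_image_scope selects the read-only class and the image-scope base at the same time.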

@steffenlarsen @Naghasan Maybe there is a better solution though?

Once this is done, a few more tests should probably be added to check all valid combined functionality of the device_constant property with the device_image_scope property. There should also be a test checking that the compiler does not allow writing to a device_global with the device_constant property (apart from via sycl::handler::/queue::memcpy etc.).

If we need the attribute, could we maybe make it apply to the field instead of the device-global class then? I.e. that way we can use the conditional solution we previously discussed, but with the new attribute.

As a thought experiment, can we think of a case where a const global variable (including const fields of non-const global variables) would not want the variables to be in the .const namespace for NVPTX and AMDGCN? If not, could we maybe make the address-space decision based on that?

> If we need the attribute, could we maybe make it apply to the field instead of the device-global class then? I.e. that way we can use the conditional solution we previously discussed, but with the new attribute.

Yeah this sounds like it might be a better solution. I'll look into doing this. Thanks for the input

> As a thought experiment, can we think of a case where a const global variable (including const fields of non-const global variables) would not want the variables to be in the .const namespace for NVPTX and AMDGCN? If not, could we maybe make the address-space decision based on that?

I am not sure whether it can be an issue for the NVPTX and AMDGCN backends, but in theory you can run out of .const memory space (it is normally limited to 4kb for NVPTX), so a user may want to be careful about which variables to put in .const space. Even if it is not a real issue in these backends, it could be an issue in other, as yet unsupported, backends. This is why we do not want device_global<const T> to imply the .const address space, and instead have the explicit device_global<T, decltype(properties{device_constant})> solution, which will be functionally identical to device_global<const T, decltype(properties{device_constant})>:

device_constant implies const T but const T shouldn't imply .const address space.
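The one-way implication can be written down as two small traits. This is only a sketch: uses_constant_space_v and element_t are hypothetical names, and property_list is a simplified stand-in for the real property machinery.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for the property key.
struct device_constant_key {};

template <typename... Keys> struct property_list {
  template <typename Key> static constexpr bool has_property() {
    return (std::is_same_v<Key, Keys> || ...);
  }
};

// Only the explicit property selects the constant address space; the
// constness of T is deliberately ignored here.
template <typename T, typename Props>
inline constexpr bool uses_constant_space_v =
    Props::template has_property<device_constant_key>();

// device_constant implies a const element type, whether or not T was
// already const.
template <typename T, typename Props>
using element_t = std::conditional_t<uses_constant_space_v<T, Props>,
                                     std::add_const_t<T>, T>;

} // namespace sketch
```

Under this model, int and const int give the same element type once device_constant is present, while const int without the property stays out of the constant address space.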

steffenlarsen · 2024-12-11T06:06:41Z

sycl/include/sycl/ext/oneapi/device_global/device_global.hpp

+  template <typename propertyT> static constexpr auto get_property() {         \
+    return property_list_t::template get_property<propertyT>();                \
+  }
+
 template <typename T, typename... Props>
 class
 #ifdef __SYCL_DEVICE_ONLY__
    [[__sycl_detail__::global_variable_allowed, __sycl_detail__::device_global,


We could even do that with the current property. It should be as simple as doing a std::conditional_t<property_list_t::template has_property<device_constant_key>(), __OPENCL_CONSTANT_AS__ T *, T *>, inheriting the definition of __OPENCL_CONSTANT_AS__ from sycl/include/sycl/access/access.hpp. Assuming the PTX and AMDGCN know how to handle __attribute__((opencl_constant)), that should hopefully avoid the need for the new clang attribute.

As for the member availability, this could be done through either conditionally picking base classes or SFINAE.

steffenlarsen · 2024-12-11T06:09:03Z

sycl/include/sycl/ext/oneapi/device_global/properties.hpp

+struct device_constant_key
+    : detail::compile_time_property_key<detail::PropKind::DeviceConstant> {
+  using value_t = property_value<device_constant_key>;
+};


Do we have this new property and its effects on the device_global class documented anywhere?

steffenlarsen · 2024-12-11T06:14:02Z

sycl/doc/extensions/experimental/sycl_ext_oneapi_device_global.asciidoc

+[NOTE]
+====
+If _T_ is `const` then implementations may choose to allocate the `device_global` in a dedicated constant address space as an optimization. When using the {dpcpp} compiler with the CUDA or HIP backend, declaring a `device_global<const T>` is equivalent to declaring a `$$__constant__$$` variable.
+====


This doesn't seem to correspond to what is implemented. It could be done in tandem by making all behavior that is conditional on the new property instead conditional on the disjunction of that property and std::is_const_v<T>. That said, if that is the case, we also need to document how it affects the members of the device_global class.
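The "in tandem" variant mentioned here could look roughly like the following trait. It is a sketch only: treat_as_constant_v is a hypothetical name, and property_list is a simplified stand-in for the real property list.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for the property key.
struct device_constant_key {};

template <typename... Keys> struct property_list {
  template <typename Key> static constexpr bool has_property() {
    return (std::is_same_v<Key, Keys> || ...);
  }
};

// True when either the property is present or T is const.
template <typename T, typename Props>
inline constexpr bool treat_as_constant_v = std::disjunction_v<
    std::bool_constant<Props::template has_property<device_constant_key>()>,
    std::is_const<T>>;

} // namespace sketch
```

Every place that branches on the property would then branch on treat_as_constant_v instead, which is why the member documentation would need to cover the const T case as well.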

I haven't finalised the implementation yet (I'm just testing the draft requested changes at the moment), so I haven't updated the documentation, which you are right is completely out of date. I described this here: #16001 (comment)

Can you mention me with a comment like "spec ready for review" when it is ready?

> Can you mention me with a comment like "spec ready for review" when it is ready?

Sure, no problem.


JackAKirk added 2 commits October 15, 2024 07:19

[NVPTX] Use "const" cache w/ "device_global"

8c38eda

Signed-off-by: JackAKirk <[email protected]>

Make compat with full device_global impl.

b0a8698

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock November 6, 2024 15:33 — with GitHub Actions Error

fix format

af10296

Signed-off-by: JackAKirk <[email protected]>

JackAKirk temporarily deployed to WindowsCILock November 6, 2024 15:39 — with GitHub Actions Inactive

JackAKirk temporarily deployed to WindowsCILock November 6, 2024 16:14 — with GitHub Actions Inactive

Merge branch 'sycl' into cuda-const-addr-expose

7682c97

JackAKirk temporarily deployed to WindowsCILock November 7, 2024 16:30 — with GitHub Actions Inactive

JackAKirk temporarily deployed to WindowsCILock November 7, 2024 17:07 — with GitHub Actions Inactive

Add device code test/Update ext spec with note

d64e9f1

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock November 14, 2024 15:14 — with GitHub Actions Error

Try to render ascii correctly

7668ca5

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock November 14, 2024 15:18 — with GitHub Actions Error

Remove unwanted italic

002c1ba

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock November 14, 2024 15:32 — with GitHub Actions Error

Try to remove \

b9dcbbd

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock November 14, 2024 15:40 — with GitHub Actions Error

JackAKirk added 2 commits November 14, 2024 07:44

Another attempt to render __constant__

6263635

Try to render __constant__

b629213

JackAKirk had a problem deploying to WindowsCILock November 14, 2024 15:46 — with GitHub Actions Error

JackAKirk had a problem deploying to WindowsCILock November 14, 2024 15:48 — with GitHub Actions Error

JackAKirk added 2 commits November 14, 2024 07:50

Try to render __constant__

7f4cf57

Render __constant__ correctly

9c15eed

JackAKirk had a problem deploying to WindowsCILock November 14, 2024 15:53 — with GitHub Actions Failure

JackAKirk had a problem deploying to WindowsCILock November 14, 2024 16:57 — with GitHub Actions Error

JackAKirk added 2 commits November 14, 2024 09:36

Check default addrspace used for non cuda/hip

d8aceb1

Signed-off-by: JackAKirk <[email protected]>

Improve comment

84f0f11

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock November 14, 2024 18:08 — with GitHub Actions Failure

JackAKirk had a problem deploying to WindowsCILock November 14, 2024 20:34 — with GitHub Actions Error

Simplify test

3f70ded

Add missing typename

1da56bb

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock December 10, 2024 13:09 — with GitHub Actions Failure

Fix separate types device_global specializations

52d9e6f

JackAKirk had a problem deploying to WindowsCILock December 10, 2024 15:22 — with GitHub Actions Failure

JackAKirk had a problem deploying to WindowsCILock December 10, 2024 16:36 — with GitHub Actions Failure

Naghasan reviewed Dec 10, 2024


steffenlarsen reviewed Dec 11, 2024


JackAKirk marked this pull request as draft December 11, 2024 10:54

JackAKirk added 3 commits January 22, 2025 12:06

Fix impl for testing.

79bb6b6

Signed-off-by: JackAKirk <[email protected]>

Fix format

4c8f724

Signed-off-by: JackAKirk <[email protected]>

Merge branch 'sycl' into cuda-const-addr-expose

4eb8375

JackAKirk had a problem deploying to WindowsCILock January 22, 2025 12:14 — with GitHub Actions Error

Fix merge/format

6e2f772

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock January 22, 2025 12:39 — with GitHub Actions Failure

device_constant -> const T

6ca1274

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock January 22, 2025 13:49 — with GitHub Actions Failure

Fix template deduction.

0e36f21

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock January 22, 2025 14:17 — with GitHub Actions Error

Fix device_global_copy test

9e53646

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock January 22, 2025 14:55 — with GitHub Actions Failure

Fix failures

9b4de8b

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock January 22, 2025 16:29 — with GitHub Actions Error

fix

c14a604

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock January 22, 2025 17:40 — with GitHub Actions Failure

JackAKirk had a problem deploying to WindowsCILock January 22, 2025 19:56 — with GitHub Actions Failure

Windows fix

0a66d92

Signed-off-by: JackAKirk <[email protected]>

JackAKirk had a problem deploying to WindowsCILock January 23, 2025 15:28 — with GitHub Actions Failure

JackAKirk added 2 commits January 27, 2025 18:00

typedef workaround attempt

27c1351

Signed-off-by: JackAKirk <[email protected]>

Merge branch 'sycl' into cuda-const-addr-expose

3ef0864

JackAKirk had a problem deploying to WindowsCILock January 27, 2025 18:04 — with GitHub Actions Failure


Review comment authors, in page order:

Naghasan Dec 10, 2024
steffenlarsen Dec 11, 2024 (edited)
JackAKirk Dec 11, 2024
steffenlarsen Dec 12, 2024
JackAKirk Dec 13, 2024 (edited)
steffenlarsen Dec 11, 2024 (edited)
steffenlarsen Dec 11, 2024
steffenlarsen Dec 11, 2024
JackAKirk Dec 11, 2024
gmlueck Dec 11, 2024
JackAKirk Dec 13, 2024
