-
Notifications
You must be signed in to change notification settings - Fork 753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sycl][cuda][hip] Expose const addrsp via device_global<T, decltype(properties{device_constant})> #16001
base: sycl
Are you sure you want to change the base?
Conversation
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
template <typename propertyT> static constexpr auto get_property() { \ | ||
return property_list_t::template get_property<propertyT>(); \ | ||
} | ||
|
||
template <typename T, typename... Props> | ||
class | ||
#ifdef __SYCL_DEVICE_ONLY__ | ||
[[__sycl_detail__::global_variable_allowed, __sycl_detail__::device_global, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if device_global
should take an optional argument to select the address space instead of a new device_constant
. If we do, you could use some meta-programming to select the address space here, that could limit the impact on the headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could even do that with the current property. It should be as simple as doing a std::conditional_t<property_list_t::template has_property<device_constant_key>(), __OPENCL_CONSTANT_AS__ T *, T *>
, inheriting the definition of __OPENCL_CONSTANT_AS__
from sycl/include/sycl/access/access.hpp. Assuming the PTX and AMDGCN know how to handle __attribute__((opencl_constant))
, that should hopefully avoid the need for the new clang attribute.
As for the member availability, this could be done through either conditionally picking base classes or SFINAE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could even do that with the current property. It should be as simple as doing a
std::conditional_t<property_list_t::template has_property<device_constant_key>(), __OPENCL_CONSTANT_AS__ T *, T *>
, inheriting the definition of__OPENCL_CONSTANT_AS__
from sycl/include/sycl/access/access.hpp. Assuming the PTX and AMDGCN know how to handle__attribute__((opencl_constant))
, that should hopefully avoid the need for the new clang attribute.As for the member availability, this could be done through either conditionally picking base classes or SFINAE.
Doing address space declarations directly in source code is currently not allowed by SEMA: I get e.g. (same if field is pointer type):
error: field may not be qualified with an address space
100 | T __attribute__((opencl_constant)) val{};
I looked into changing this behaviour, but I didn't think there was a simple solution.
My current idea is to partially specialize as
device_global<
T, detail::properties_t<Props...>,
typename std::enable_if_t<(detail::properties_t<Props...>::
template has_property<device_constant_key>())>>
: public detail::device_global_base<T, detail::properties_t<Props...>>
such that when the property device_constant is used we add a clang attribute to the class :
__sycl_detail__::device_constant
That the compiler then uses to manually set the address space to .const only for cuda/hip backends.
This I think in theory should be compatible with the partial specializations of the device_global_base
class; to allow simultaneous specializations for the case when device_image_scope
property is in the property list (via device_global_base): The only property that can be used in combination with the device_constant
property should be device_image_scope
I think (I will update the specification doc once the implementation is finalised).
@steffenlarsen @Naghasan Maybe there is a better solution though?
Once this is done there should also probably be a few more tests added to check all valid combined functionality of the device_constant property with the device_image_scope property. There should also be a test checking that the compiler does not allow writing to a device_global with the device_constant property (apart from via sycl::hander::/queue::memcpy etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need the attribute, could we maybe make it apply to the field instead of the device-global class then? I.e. that way we can use the conditional solution we previously discussed, but with the new attribute.
As a thought experiment, can we think of a case where a const global variable (including const fields of non-const global variables) would not want the variables to be in the .const
namespace for NVPTX and AMDGCN? If not, could we maybe make the address-space decision based on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need the attribute, could we maybe make it apply to the field instead of the device-global class then? I.e. that way we can use the conditional solution we previously discussed, but with the new attribute.
Yeah this sounds like it might be a better solution. I'll look into doing this. Thanks for the input
As a thought experiment, can we think of a case where a const global variable (including const fields of non-const global variables) would not want the variables to be in the
.const
namespace for NVPTX and AMDGCN? If not, could we maybe make the address-space decision based on that?
I am not sure if it can be an issue for NVPTX and AMDGCN backends, but in theory you can run out of .const memory space (It is normally limited to 4kb for NVPTX), and hence a user may want to be careful about which variables to put in .const space. If it is not a real issue in these backends, it could be an issue in other as yet unsupported backends. This is why we do not wan to have device_global<const T>
imply .const address space, and instead have the explicit device_global<T, decltype(properties{device_constant})>
solution, which will be functionally identical to device_global<const T, decltype(properties{device_constant})>
:
device_constant
implies const T
but const T
shouldn't imply .const address space.
template <typename propertyT> static constexpr auto get_property() { \ | ||
return property_list_t::template get_property<propertyT>(); \ | ||
} | ||
|
||
template <typename T, typename... Props> | ||
class | ||
#ifdef __SYCL_DEVICE_ONLY__ | ||
[[__sycl_detail__::global_variable_allowed, __sycl_detail__::device_global, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could even do that with the current property. It should be as simple as doing a std::conditional_t<property_list_t::template has_property<device_constant_key>(), __OPENCL_CONSTANT_AS__ T *, T *>
, inheriting the definition of __OPENCL_CONSTANT_AS__
from sycl/include/sycl/access/access.hpp. Assuming the PTX and AMDGCN know how to handle __attribute__((opencl_constant))
, that should hopefully avoid the need for the new clang attribute.
As for the member availability, this could be done through either conditionally picking base classes or SFINAE.
struct device_constant_key | ||
: detail::compile_time_property_key<detail::PropKind::DeviceConstant> { | ||
using value_t = property_value<device_constant_key>; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have this new property and its effects on the device_global
class documented anywhere?
[NOTE] | ||
==== | ||
If _T_ is `const` then implementations may choose to allocate the `device_global` in a dedicated constant address space as an optimization. When using the {dpcpp} compiler with the CUDA or HIP backend, declaring a `device_global<const T>` is equivalent to declaring a `$$__constant__$$` variable. | ||
==== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem like it corresponds to what is implemented. It could be done in tandem by making all conditional behavior dependent on the new property dependent on the disjunction of that and std::is_const_v<T>
. That said, if that is the case we also need to document how that affects the members of the device_global
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't finalised the implementation yet (I'm just testing the draft requested changes atm), so I haven't updated the documentation which you are right is completely out of date. I described this here: #16001 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you mention me with a comment like "spec ready for review" when it is ready?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you mention me with a comment like "spec ready for review" when it is ready?
Sure, no problem.
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Map cuda/hip const addrspace device global variable (
__constant__
) todevice_global<T, decltype(properties{device_constant})>
Nvidia GPUs have a dedicated constant memory cache which can be a lot faster in some cases for constant global device variables ("constant cuda symbols"). CUDA programmers access this cache via global variables marked
__constant__
AMD GPUs do not have a dedicated constant memory cache (as far as I am aware). However the HIP programming model does support
__constant__
. As well as supporting the constant cache in the Nvidia case, when AMD GPUs are the target the macro can be used as a compiler hint for other optimizations such as using SGPRs (scalar registers) instead of VGPRs (vector registers).This patch switches on these optimizations for cuda/hip backends of dpc++.
This is a natural translation that allows the complete support of
device_global
features under the constraint that programmers cannot update thedevice_global<T, decltype(properties{device_constant})>
in kernel code (matching__constant__
semantics in cuda/hip), whilst still allowing them to update this constant global variable viaqueue::memcpy(const device_global)
, which maps naturally to how CUDA APIs allows programmers to update__constant__
device symbols via the host.Key applications that have been identified will benefit from this:
Kokkos (general)
Blender
NWCHEMEX aka Exachem
Fixes #5827
Fixes #4278