Dispatch via std::variant, and type-safe conversion of Tensor::_impl->_storage._impl->Mem to a pointer

Type-safe conversions
=====================

Tensor::ptr() returns this->_impl->_storage._impl->Mem, wrapped in a std::variant of pointers to the supported types (excluding void).  If the dtype is void, it throws an error.

template <typename T>
Tensor::ptr_as<T>() returns this->_impl->_storage._impl->Mem cast to T*.  If the dtype does not match T, it throws an error.

There are also versions for GPU storage, enabled if UNI_GPU is defined:

Tensor::gpu_ptr() returns a std::variant of pointers to the supported types, as required for CUDA (i.e. using cuComplex etc.).

template <typename T>
Tensor::gpu_ptr_as<T>() returns this->_impl->_storage._impl->Mem cast to T*.  If the dtype does not match the CUDA type T, it throws an error.

There is also a compile-time mechanism to promote types, Type_class::type_promote_t<T, U>, which uses the type_promote() function to determine the
type corresponding to type_promote(dtype_T, dtype_U) (yay for constexpr functions!).  Calculating the promoted type via pointers is likely to be a
common operation, so there is also Type_class::type_promote_from_pointer_t<T,U>, which requires that T and U are pointer types.  This is almost
equivalent to type_promote_t<std::remove_pointer_t<T>, std::remove_pointer_t<U>>, except that if T is not actually a pointer type then
std::remove_pointer_t<T> is just T, and hence gives the same result as type_promote_t<T,U>; by contrast, type_promote_from_pointer_t<T,U> is void if T or U is
not a pointer type (with the intent that this will eventually result in a compile error if it was not intended).  Note that
type_promote_from_pointer_t<T,U> is NOT itself a pointer type.

Similarly, for GPU types, there are type_promote_gpu_t and type_promote_from_gpu_pointer_t.

Dispatch via std::variant
=========================

src/linalg/Kron.cpp demonstrates dispatch to a two-parameter function via std::variant and the ptr() function. The key code is

        std::visit(
          [&](auto tl, auto tr) {
            // tl and tr are pointer types here.
            using out_type = Type_class::type_promote_from_pointer_t<decltype(tl), decltype(tr)>;
            static_assert(!std::is_same_v<out_type, void>);
            cytnx::linalg_internal::Kron_general(out.ptr_as<out_type>(), tl, tr, pad_shape1,
                                                 pad_shape2);
          },
          Tl.ptr(), Tr.ptr());

Here Tl and Tr are Tensor objects, and this performs a double dispatch on the dtypes of Tl and Tr, via

std::visit(function, Tl.ptr(), Tr.ptr())

with the function being a lambda function to forward the pointers to the template kernel linalg_internal::Kron_general().

out_type is computed from the promotion of the pointer types of tl and tr, and the pointer to the output tensor is obtained
from out.ptr_as<out_type>().  This will throw an error if out_type does not match the previously assigned dtype, although
that would represent a logic error in the code, rather than a user error.  In this case we can be confident it should never trigger, since
the initialization of Tensor out immediately before dispatching to the kernel sets the dtype from type_promote(), which
uses the same code to compute the promoted type as type_promote_from_pointer_t.

A slight disadvantage of this approach is that it requires including the template kernel into Kron.cpp, which increases compile time marginally,
but it means that linalg_internal_cpu/Kron_internal.cpp is no longer used.  Since that is a very long file that must be kept in sync with any changes
to the dtypes, the type promotion algorithm, and the Kron kernel itself, this is a substantial reduction in source code and maintenance.

Similarly, the static jump tables for the Kron function in backend/linalg_internal_interface.cpp have been removed (these were actually missing some entries).

The corresponding changes have also been done for the CUDA version, now using Tl.gpu_ptr() and Tr.gpu_ptr(), and dispatching to the CUDA kernel cuKron_general,
and using type_promote_from_gpu_pointer_t to get the correct CUDA type of the output tensor.

A complication in the CUDA case is that the CUDA template kernel now needs to be included into Kron.cpp, so cuKron_internal.cu has been renamed to cuKron_internal.cuh
(a CUDA header file), and if UNI_GPU is defined then Kron.cpp must be compiled with the CUDA compiler.  This is managed by the local CMakeLists.txt file:

# This must be before the target_sources_local(), so that the target is added as CUDA
if(USE_CUDA)
  set_source_files_properties(Kron.cpp DIRECTORY ${CMAKE_SOURCE_DIR} PROPERTIES LANGUAGE CUDA)
endif()

set_source_files_properties() is directory-specific and by default only applies to targets defined in the current directory (and subdirectories?), hence
it is necessary to explicitly set the DIRECTORY where the targets (cytnx) are defined.

Possible future directions
==========================

Apply the above techniques to the other linear algebra functions, and eventually drop src/backend/linalg_internal_interface.[hpp|cpp].  This should be
simple in some cases; other cases would need more substantial refactoring to merge type-specific kernels into template functions.

The Tensor::ptr(), gpu_ptr(), ptr_as<T>(), and gpu_ptr_as<T>() functions do not really belong in class Tensor; it would be nice to move them into the Storage.  Ideally,
ptr() and ptr_as<T>() would only exist for some CpuStorage class, and similarly gpu_ptr() and gpu_ptr_as<T>() would only exist for the GPU storage class.
The dispatch to a back-end kernel could then go via two stages: firstly dispatch based on the device to a back-end storage (i.e. either cpu, or gpu), and then
that back-end handles the device-specific code.  Banish gpu_ptr() and gpu_ptr_as<T>() to backend/gpu/storage/....

The appearance of 'void' as one of the types in dtype was a complication that needed a hack to remove 'void' when constructing the variant of pointers.  If
void* is a possible type in the std::variant, then the dispatch via std::visit() expects that void* is a viable option to consider, and hence the first version of
the Kron dispatch was attempting to compile the Kron kernel for tensors of type void...  Removing void from consideration in this way works, but it has a disadvantage:
the index of the type in the variant (i.e. T.ptr().index()) is no longer aligned with the dtype variable.  That is not really a problem, since there should not
be any need to do that, but there is potential for it to go badly wrong if someone unwittingly expects them to be the same.
Perhaps removing void completely from Type_class would be feasible, instead treating any tensor with zero dimension as the equivalent of 'void'.
ianmccul committed Nov 28, 2024
1 parent 83dfac6 commit ed3e259
Showing 15 changed files with 449 additions and 3,930 deletions.
52 changes: 52 additions & 0 deletions include/Tensor.hpp
@@ -493,6 +493,58 @@ namespace cytnx {
// }
//@}

// This mechanism is to remove the 'void' type from Type_list. Taking advantage of it
// appearing first ...
struct internal {
template <typename Variant>
struct exclude_first;

template <typename First, typename... Rest>
struct exclude_first<std::variant<First, Rest...>> {
using type = std::variant<Rest...>;
};
}; // internal

// std::variant of pointers to Type_list, without void ....
using pointer_types =
make_variant_from_transform_t<typename internal::exclude_first<Type_list>::type,
std::add_pointer>;

// convert this->_impl->_storage._impl->Mem to a typed variant of pointers, excluding void*
pointer_types ptr() const;

// convert this->_impl->_storage._impl->Mem to the given pointer type.
// Throws an exception if T does not match this->dtype
template <typename T>
T *ptr_as() const {
cytnx_error_msg(this->dtype() != Type_class::cy_typeid_v<std::remove_cv_t<T>>,
"[ERROR] Attempt to convert dtype %d (%s) to pointer of type %s",
this->dtype(), Type_class::getname(this->dtype()),
Type_class::getname(Type_class::cy_typeid_v<std::remove_cv_t<T>>));
return static_cast<T *>(this->_impl->_storage._impl->Mem);
}

#ifdef UNI_GPU
// std::variant of pointers to Type_list_gpu, without void ....
using gpu_pointer_types =
make_variant_from_transform_t<typename internal::exclude_first<Type_list_gpu>::type,
std::add_pointer>;

// convert this->_impl->_storage._impl->Mem to a typed variant of pointers, excluding void*
gpu_pointer_types gpu_ptr() const;

// convert this->_impl->_storage._impl->Mem to the given pointer type.
// Throws an exception if T does not match this->dtype
template <typename T>
T *gpu_ptr_as() const {
cytnx_error_msg(this->dtype() != Type_class::cy_typeid_gpu_v<std::remove_cv_t<T>>,
"[ERROR] Attempt to convert dtype %d (%s) to GPU pointer of type %s",
this->dtype(), Type_class::getname(this->dtype()),
Type_class::getname(Type_class::cy_typeid_gpu_v<std::remove_cv_t<T>>));
return static_cast<T *>(this->_impl->_storage._impl->Mem);
}
#endif

/**
@brief Convert a Storage to Tensor
@param[in] in the Storage to be converted
188 changes: 148 additions & 40 deletions include/Type.hpp
@@ -9,8 +9,9 @@
#include <array>
#include <utility>
#include <vector>
#include <variant>

#include "cytnx_error.hpp"
#include "cytnx_error.hpp" // also brings in cuComplex.h

#define MKL_Complex8 std::complex<float>
#define MKL_Complex16 std::complex<double>
@@ -75,6 +76,20 @@ namespace cytnx {

} // namespace internal

// helper metafunction to transform a variant into another variant via a
// transform template alias
template <typename V, template <typename> class Transform>
struct make_variant_from_transform;

template <template <typename> class Transform, typename... Args>
struct make_variant_from_transform<std::variant<Args...>, Transform> {
using type = std::variant<typename Transform<Args>::type...>;
};

// helper type alias for make_variant_from_transform
template <typename V, template <typename> class Transform>
using make_variant_from_transform_t = typename make_variant_from_transform<V, Transform>::type;

template <typename T>
using is_complex = internal::is_complex_impl<std::remove_cv_t<T>>;

@@ -91,14 +106,29 @@ namespace cytnx {
template <typename T>
constexpr bool is_complex_floating_point_v = is_complex_floating_point<T>::value;

// tuple_element_index<T, Tuple> returns the index of type T in the Tuple, or compile error if not
// variant_index<T, Variant> returns the index of type T in the Variant, or compile error if not
// found
template <typename T, typename Tuple>
struct tuple_element_index
: std::integral_constant<std::size_t, internal::index_in_tuple_helper<0, T, Tuple>()> {};
template <typename T, typename Variant>
struct variant_index;

template <typename T, typename... Types>
struct variant_index<T, std::variant<Types...>> {
static constexpr size_t value = std::variant_size_v<std::variant<Types...>>;
};

template <typename T, typename... Types>
struct variant_index<T, std::variant<T, Types...>> {
static constexpr size_t value = 0;
};

template <typename T, typename U, typename... Types>
struct variant_index<T, std::variant<U, Types...>> {
static constexpr size_t value = 1 + variant_index<T, std::variant<Types...>>::value;
};

template <typename T, typename Tuple>
constexpr int tuple_element_index_v = tuple_element_index<T, Tuple>::value;
// helper template variable
template <typename T, typename Variant>
static constexpr size_t variant_index_v = variant_index<T, Variant>::value;

namespace internal {
// type_size returns the sizeof(T) for the supported types. This is the same as
@@ -110,18 +140,27 @@ namespace cytnx {
} // namespace internal

// the list of supported types. The dtype() of an object is an index into this list.
// This **MUST** match the ordering of __type::__pybind_type
// std::variant works better than std::tuple here since a variant is constrained to only
// hold each type once, and we have std::variant_alternative_t<n> to get the n'th type,
// as well as the variant_index_v helper to get the index of a given type
using Type_list =
std::tuple<void, cytnx_complex128, cytnx_complex64, cytnx_double, cytnx_float, cytnx_int64,
cytnx_uint64, cytnx_int32, cytnx_uint32, cytnx_int16, cytnx_uint16, cytnx_bool>;
std::variant<void, cytnx_complex128, cytnx_complex64, cytnx_double, cytnx_float, cytnx_int64,
cytnx_uint64, cytnx_int32, cytnx_uint32, cytnx_int16, cytnx_uint16, cytnx_bool>;

// For GPU storage, the types are slightly different because CUDA uses their own complex type
#ifdef UNI_GPU
using Type_list_gpu =
std::variant<void, cuDoubleComplex, cuComplex, cytnx_double, cytnx_float, cytnx_int64,
cytnx_uint64, cytnx_int32, cytnx_uint32, cytnx_int16, cytnx_uint16, cytnx_bool>;
#endif

// The number of supported types
constexpr int N_Type = std::tuple_size_v<Type_list>;
constexpr int N_Type = std::variant_size_v<Type_list>;
constexpr int N_fType = 5;

// The friendly name of each type
template <typename T>
constexpr char* Type_names;
constexpr char* Type_names = nullptr;
template <>
constexpr const char* Type_names<void> = "Void";
template <>
@@ -149,7 +188,7 @@ namespace cytnx {

// The corresponding Python enumeration name
template <typename T>
constexpr char* Type_enum_name;
constexpr char* Type_enum_name = nullptr;
template <>
constexpr const char* Type_enum_name<void> = "Void";
template <>
@@ -187,7 +226,10 @@ namespace cytnx {

template <typename T>
struct Type_struct_t {
static constexpr unsigned int cy_typeid = tuple_element_index_v<T, Type_list>;
static constexpr unsigned int cy_typeid = variant_index_v<T, Type_list>;
#ifdef UNI_GPU
static constexpr unsigned int cy_typeid_gpu = variant_index_v<T, Type_list_gpu>;
#endif
static constexpr const char* name = Type_names<T>;
static constexpr const char* enum_name = Type_enum_name<T>;
static constexpr bool is_complex = is_complex_v<T>;
@@ -202,39 +244,45 @@ namespace cytnx {
};

namespace internal {
template <typename Tuple, std::size_t... Indices>
template <typename Variant, std::size_t... Indices>
constexpr auto make_type_array_helper(std::index_sequence<Indices...>) {
return std::array<Type_struct, sizeof...(Indices)>{
Type_struct_t<std::tuple_element_t<Indices, Tuple>>::construct()...};
Type_struct_t<std::variant_alternative_t<Indices, Variant>>::construct()...};
}
template <typename Tuple>
template <typename Variant>
constexpr auto make_type_array() {
return make_type_array_helper<Tuple>(std::make_index_sequence<std::tuple_size_v<Tuple>>());
return make_type_array_helper<Variant>(
std::make_index_sequence<std::variant_size_v<Variant>>());
}
} // namespace internal

// Typeinfos is a std::array<Type_struct> for each type in Type_list
constexpr auto Typeinfos = internal::make_type_array<Type_list>();

template <typename T>
constexpr unsigned int cy_typeid = tuple_element_index_v<T, Type_list>;

class Type_class {
private:
public:
// Typeinfos is a std::array<Type_struct> for each type in Type_list
static constexpr auto Typeinfos = internal::make_type_array<Type_list>();

template <typename T>
static constexpr unsigned int cy_typeid_v = variant_index_v<T, Type_list>;

#ifdef UNI_GPU
template <typename T>
static constexpr unsigned int cy_typeid_gpu_v = variant_index_v<T, Type_list_gpu>;
#endif

enum Type : unsigned int {
Void = cy_typeid<void>,
ComplexDouble = cy_typeid<cytnx_complex128>,
ComplexFloat = cy_typeid<cytnx_complex64>,
Double = cy_typeid<cytnx_double>,
Float = cy_typeid<cytnx_float>,
Int64 = cy_typeid<cytnx_int64>,
Uint64 = cy_typeid<cytnx_uint64>,
Int32 = cy_typeid<cytnx_int32>,
Uint32 = cy_typeid<cytnx_uint32>,
Int16 = cy_typeid<cytnx_int16>,
Uint16 = cy_typeid<cytnx_uint16>,
Bool = cy_typeid<cytnx_bool>
Void = cy_typeid_v<void>,
ComplexDouble = cy_typeid_v<cytnx_complex128>,
ComplexFloat = cy_typeid_v<cytnx_complex64>,
Double = cy_typeid_v<cytnx_double>,
Float = cy_typeid_v<cytnx_float>,
Int64 = cy_typeid_v<cytnx_int64>,
Uint64 = cy_typeid_v<cytnx_uint64>,
Int32 = cy_typeid_v<cytnx_int32>,
Uint32 = cy_typeid_v<cytnx_uint32>,
Int16 = cy_typeid_v<cytnx_int16>,
Uint16 = cy_typeid_v<cytnx_uint16>,
Bool = cy_typeid_v<cytnx_bool>
};

static constexpr void check_type(unsigned int type_id) {
Expand Down Expand Up @@ -274,13 +322,73 @@ namespace cytnx {

template <class T>
static constexpr unsigned int cy_typeid(const T& rc) {
return Type_struct_t<T>::cy_typeid;
return cy_typeid_v<T>;
}

template <typename T>
static constexpr unsigned int cy_typeid_v = typeid(T{});
// Find a common type for typeL and typeR
static constexpr unsigned int type_promote(unsigned int typeL, unsigned int typeR) {
if (typeL < typeR) {
if (typeL == 0) return 0;

static unsigned int type_promote(unsigned int typeL, unsigned int typeR);
if (!is_unsigned(typeR) && is_unsigned(typeL)) {
return typeL - 1;
} else {
return typeL;
}
} else {
if (typeR == 0) return 0;
if (!is_unsigned(typeL) && is_unsigned(typeR)) {
return typeR - 1;
} else {
return typeR;
}
}
}

// type metafunction for type promotion
template <typename TL, typename TR>
using type_promote_t =
std::variant_alternative_t<Type_class::type_promote(variant_index_v<TL, Type_list>,
variant_index_v<TR, Type_list>),
Type_list>;

// Helper to promote two pointer types (note does _not_ return another pointer type)
template <typename TL, typename TR>
struct type_promote_from_pointer {
using type = void;
};

template <typename TL, typename TR>
struct type_promote_from_pointer<TL*, TR*> {
using type = type_promote_t<std::decay_t<TL>, std::decay_t<TR>>;
};

// helper typedef
template <typename TL, typename TR>
using type_promote_from_pointer_t = typename type_promote_from_pointer<TL, TR>::type;

#ifdef UNI_GPU
// .. and we need a version where TL and TR are GPU device pointers
template <typename TL, typename TR>
using type_promote_gpu_t =
std::variant_alternative_t<Type_class::type_promote(variant_index_v<TL, Type_list_gpu>,
variant_index_v<TR, Type_list_gpu>),
Type_list_gpu>;

template <typename TL, typename TR>
struct type_promote_from_gpu_pointer {
using type = void;
};

template <typename TL, typename TR>
struct type_promote_from_gpu_pointer<TL*, TR*> {
using type = type_promote_gpu_t<std::decay_t<TL>, std::decay_t<TR>>;
};

// helper typedef
template <typename TL, typename TR>
using type_promote_from_gpu_pointer_t = typename type_promote_from_gpu_pointer<TL, TR>::type;
#endif

}; // Type_class
/// @endcond
26 changes: 26 additions & 0 deletions src/Tensor.cpp
@@ -54,6 +54,32 @@ namespace cytnx {
return self;
}

template <std::size_t... Is>
Tensor::pointer_types void_ptr_to_variant_impl(void *p, unsigned int dtype,
std::index_sequence<Is...>) {
// Lambda to select the correct type based on dtype
Tensor::pointer_types result;
(
[&]() {
if (dtype == Is) {
using TargetType =
std::variant_alternative_t<Is,
typename Tensor::internal::exclude_first<Type_list>::type>;
result = static_cast<TargetType *>(p);
}
}(),
...); // Fold expression
return result;
}

Tensor::pointer_types Tensor::ptr() const {
cytnx_error_msg(this->dtype() == 0, "[ERROR] operation not allowed for empty (void) Tensor.%s",
"\n");
// dtype()-1 here because we have removed void from the variant
return void_ptr_to_variant_impl(this->_impl->_storage._impl->Mem, this->dtype() - 1,
std::make_index_sequence<std::variant_size_v<pointer_types>>{});
}

// ADD
Tensor Tensor::Tproxy::operator+(
const cytnx_complex128 &rc) const { //{return this->_operatorADD(rc);};