Dispatch via std::variant, and type-safe conversion of Tensor::_impl->_storage._impl->Mem to a pointer

Type-safe conversions
=====================

Tensor::ptr() returns this->_impl->_storage._impl->Mem, contained in a
std::variant of pointers to the supported types (excluding void). If the
dtype is void, then it throws an error.

template <typename T> Tensor::ptr_as<T>() returns
this->_impl->_storage._impl->Mem cast to T*. If the dtype does not match T,
then it throws an error.

There are also versions for GPU storage, enabled if UNI_GPU is defined:

Tensor::gpu_ptr() returns a std::variant of pointers to the supported
types, as required for CUDA (i.e. using cuComplex etc).

template <typename T> Tensor::gpu_ptr_as<T>() returns
this->_impl->_storage._impl->Mem cast to T*. If the dtype does not match
the CUDA type T, then it throws an error.

There is also a compile-time mechanism to promote types, as
Type_class::type_promote_t<T, U>, which uses the type_promote() function to
determine the type corresponding to type_promote(dtype_T, dtype_U) (yay for
constexpr functions!).

Calculating the promoted type via pointers is likely to be a common
operation, so there is also Type_class::type_promote_from_pointer_t<T, U>,
which requires that T and U are pointer types. This is almost equivalent to
type_promote_t<std::remove_pointer_t<T>, std::remove_pointer_t<U>>, but
they differ on non-pointer arguments: if T is not actually a pointer type,
then std::remove_pointer_t<T> is just T, and hence gives the same result as
type_promote_t<T, U>, whereas type_promote_from_pointer_t<T, U> is void if
T or U is not a pointer type (with the intent that this will eventually
produce a compile error if it was not intended). Note that
type_promote_from_pointer_t<T, U> is NOT itself a pointer type. Similarly,
for GPU types, there are type_promote_gpu_t and
type_promote_from_gpu_pointer_t.

Dispatch via std::variant
=========================

src/linalg/Kron.cpp demonstrates dispatch to a two-parameter function via
std::variant and the ptr() function. The key code is

  std::visit(
    [&](auto tl, auto tr) {
      // tl and tr are pointer types here.
      using out_type =
        Type_class::type_promote_from_pointer_t<decltype(tl), decltype(tr)>;
      static_assert(!std::is_same_v<out_type, void>);
      cytnx::linalg_internal::Kron_general(out.ptr_as<out_type>(), tl, tr,
                                           pad_shape1, pad_shape2);
    },
    Tl.ptr(), Tr.ptr());

Here Tl and Tr are Tensor objects, and this does a double dispatch on the
dtypes of Tl and Tr, via std::visit(function, Tl.ptr(), Tr.ptr()), with the
function being a lambda that forwards the pointers to the template kernel
linalg_internal::Kron_general(). out_type is computed from the promotion of
the pointer types of tl and tr, and the pointer to the output tensor is
obtained from out.ptr_as<out_type>(). This will throw an error if out_type
does not match the previously assigned dtype, although that would be a
logic error in the code rather than a user error. In this case we can be
confident it should never trigger, since the initialization of Tensor out
immediately before dispatching to the kernel sets the dtype from
type_promote(), which uses the same code to compute the promoted type as
type_promote_from_pointer_t.

A slight disadvantage of this approach is that it requires the template
kernel to be included into Kron.hpp, which increases compile time
marginally, but it also means that linalg_internal_cpu/Kron_internal.cpp is
no longer used. Since that is a very long file that must be kept in sync
with any changes to the dtypes, the type promotion algorithm, and the Kron
kernel itself, this is a substantial reduction in source code and
maintenance burden.
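For illustration, here is a minimal, self-contained sketch of the same
pattern. The names are hypothetical stand-ins (promote_from_pointer_t for
Type_class::type_promote_from_pointer_t, kron_kernel for Kron_general), and
the promotion rule here is just operator+, not cytnx's type_promote(); this
is not the actual cytnx code.

  #include <cstdint>
  #include <type_traits>
  #include <utility>
  #include <variant>

  // Hypothetical stand-in for Type_class::type_promote_from_pointer_t:
  // yields void unless both arguments are pointer types.
  template <typename T, typename U>
  struct promote_from_pointer { using type = void; };

  template <typename T, typename U>
  struct promote_from_pointer<T*, U*> {
    // Illustrative promotion rule only; cytnx computes this from
    // type_promote(), not from operator+.
    using type = decltype(std::declval<T>() + std::declval<U>());
  };

  template <typename T, typename U>
  using promote_from_pointer_t = typename promote_from_pointer<T, U>::type;

  // Variant of pointers to supported types, as returned by ptr().
  // (Shortened list; complex types are omitted so the operator+-based
  // promotion rule above stays valid for every pair.)
  using ptr_variant = std::variant<std::int64_t*, float*, double*>;

  // Stand-in for the template kernel linalg_internal::Kron_general().
  template <typename Out, typename L, typename R>
  void kron_kernel(Out* /*out*/, L* /*lhs*/, R* /*rhs*/) { /* ... */ }

  void dispatch(ptr_variant lhs, ptr_variant rhs, void* out_mem) {
    std::visit(
      [&](auto tl, auto tr) {
        using out_type = promote_from_pointer_t<decltype(tl), decltype(tr)>;
        static_assert(!std::is_same_v<out_type, void>);
        kron_kernel(static_cast<out_type*>(out_mem), tl, tr);
      },
      lhs, rhs);
  }

Note that std::visit instantiates the lambda body for every combination of
alternatives, so each of the nine (tl, tr) pairs above gets its own kernel
instantiation at compile time; this is what replaces the hand-written jump
tables.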
Similarly, the static jump tables for the Kron function in
backend/linalg_internal_interface.cpp have been removed (these were
actually missing some entries).

The corresponding changes have also been made for the CUDA version, now
using Tl.gpu_ptr() and Tr.gpu_ptr(), dispatching to the CUDA kernel
cuKron_general, and using type_promote_from_gpu_pointer_t to get the
correct CUDA type of the output tensor. A complication in the CUDA case is
that the CUDA template kernel now needs to be included into Kron.cpp, so
cuKron_internal.cu has been renamed to cuKron_internal.cuh (a CUDA header
file), and if UNI_GPU is defined then Kron.cpp must be compiled with the
CUDA compiler. This is managed by the local CMakeLists.txt file:

  # This must be before the target_sources_local(), so that the target is
  # added as CUDA
  if(USE_CUDA)
    set_source_files_properties(Kron.cpp DIRECTORY ${CMAKE_SOURCE_DIR}
                                PROPERTIES LANGUAGE CUDA)
  endif()

set_source_files_properties() is directory-specific and by default only
applies to targets defined in the current directory (and subdirectories?),
hence it is necessary to explicitly set the DIRECTORY where the targets
(cytnx) are defined.

Possible future directions
==========================

Apply the above techniques to the other linear algebra functions, and
eventually drop src/backend/linalg_internal_interface.[hpp|cpp]. This
should be simple in some cases; other cases would need more substantial
refactoring to merge type-specific kernels into template functions.

The Tensor::ptr(), gpu_ptr(), ptr_as<T>(), and gpu_ptr_as<T>() functions do
not really belong in class Tensor; it would be nice to move them into the
Storage. Ideally ptr() and ptr_as<T>() would only exist for some CpuStorage
class, and similarly gpu_ptr() and gpu_ptr_as<T>() would only exist for the
GPU storage class. The dispatch to a back-end kernel could then go via two
stages: first dispatch based on the device to a back-end storage (i.e.
either cpu or gpu), and then that back-end handles the device-specific
code. Banish gpu_ptr() and gpu_ptr_as<T>() to backend/gpu/storage/....

The appearance of 'void' as one of the types in dtype was a complication
that needed a hack to remove 'void' when constructing the variant of
pointers. If void* is a possible type in the std::variant, then the
dispatch via std::visit() expects that void* is a viable option to
consider, and hence the first version of the Kron dispatch was attempting
to compile the Kron kernel for tensors of type void... Removing void from
consideration in this way works (see the sketch below), but it has the
disadvantage that the index of the type in the variant (i.e.
T.ptr().index()) is no longer aligned with the dtype variable. This is not
really a problem, since there should not be any need to rely on that, but
there is potential for it to go badly wrong if someone unwittingly expects
them to be the same. Perhaps removing void completely from Type_class would
be feasible, and instead treat any tensor with zero dimension as the
equivalent of 'void'.
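To make the 'remove void' hack concrete, here is one possible sketch of
building the variant of pointers while filtering out void. The helper names
(typelist, ptrs_without_void, to_variant) are hypothetical, and the actual
implementation in this commit may differ.

  #include <complex>
  #include <cstdint>
  #include <variant>

  template <typename... Ts>
  struct typelist {};

  // Prepend a type to an existing typelist.
  template <typename T, typename List>
  struct prepend;

  template <typename T, typename... Ts>
  struct prepend<T, typelist<Ts...>> { using type = typelist<T, Ts...>; };

  // Map each supported dtype T to T*, dropping void entirely.
  template <typename... Ts>
  struct ptrs_without_void { using type = typelist<>; };

  template <typename T, typename... Ts>
  struct ptrs_without_void<T, Ts...> {
    using type =
      typename prepend<T*, typename ptrs_without_void<Ts...>::type>::type;
  };

  template <typename... Ts>
  struct ptrs_without_void<void, Ts...> : ptrs_without_void<Ts...> {};

  // Convert the resulting typelist into a std::variant.
  template <typename List>
  struct to_variant;

  template <typename... Ts>
  struct to_variant<typelist<Ts...>> { using type = std::variant<Ts...>; };

  // Shortened dtype list for illustration, with void listed first.
  using ptr_variant = typename to_variant<
    typename ptrs_without_void<void, std::complex<double>, double,
                               std::int64_t>::type>::type;
  // ptr_variant == std::variant<std::complex<double>*, double*,
  //                             std::int64_t*>,
  // so in this sketch ptr_variant::index() is offset by one relative to
  // the position in the dtype list -- the misalignment described above.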