Dispatch via std::variant, and type-safe conversion of Tensor::_impl->_storage._impl->Mem to a pointer

Type-safe conversions
=====================

Tensor::ptr() returns this->_impl->_storage._impl->Mem, contained in a
std::variant of pointers to the supported types (excluding void). If the
dtype is void, then it throws an error.

template <typename T> Tensor::ptr_as<T>() returns
this->_impl->_storage._impl->Mem cast to T*. If the dtype does not match T,
then it throws an error.

There are also versions for GPU storage, enabled if UNI_GPU is defined:

Tensor::gpu_ptr() returns a std::variant of pointers to the supported
types, as required for CUDA (i.e. using cuComplex etc).

template <typename T> Tensor::gpu_ptr_as<T>() returns
this->_impl->_storage._impl->Mem cast to T*. If the dtype does not match
the CUDA type T, then it throws an error.

There is also a compile-time mechanism to promote types, as
Type_class::type_promote_t<T, U>, which uses the type_promote() function to
determine the type corresponding to type_promote(dtype_T, dtype_U) (yay for
constexpr functions!).

Calculating the promoted type via pointers is likely to be a common
operation, so there is also Type_class::type_promote_from_pointer_t<T, U>,
which requires that T and U are pointer types. This is almost equivalent to
type_promote_t<std::remove_pointer_t<T>, std::remove_pointer_t<U>>, but
they differ on non-pointer arguments: if T is not actually a pointer type,
then std::remove_pointer_t<T> is just T, and hence gives the same result as
type_promote_t<T, U>, whereas type_promote_from_pointer_t<T, U> is void if
T or U is not a pointer type (with the intent that this will eventually
produce a compile error if it was not intended). Note that
type_promote_from_pointer_t<T, U> is NOT itself a pointer type. Similarly,
for GPU types, there are type_promote_gpu_t and
type_promote_from_gpu_pointer_t.

Dispatch via std::variant
=========================

src/linalg/Kron.cpp demonstrates dispatch to a two-parameter function via
std::variant and the ptr() function. The key code is

  std::visit(
    [&](auto tl, auto tr) {
      // tl and tr are pointer types here.
      using out_type =
        Type_class::type_promote_from_pointer_t<decltype(tl), decltype(tr)>;
      static_assert(!std::is_same_v<out_type, void>);
      cytnx::linalg_internal::Kron_general(out.ptr_as<out_type>(), tl, tr,
                                           pad_shape1, pad_shape2);
    },
    Tl.ptr(), Tr.ptr());

Here Tl and Tr are Tensor objects, and this does a double dispatch on the
dtypes of Tl and Tr, via std::visit(function, Tl.ptr(), Tr.ptr()), with the
function being a lambda that forwards the pointers to the template kernel
linalg_internal::Kron_general(). out_type is computed from the promotion of
the pointer types of tl and tr, and the pointer to the output tensor is
obtained from out.ptr_as<out_type>(). This will throw an error if out_type
does not match the previously assigned dtype, although that would be a
logic error in the code rather than a user error. In this case we can be
confident it should never trigger, since the initialization of Tensor out
immediately before dispatching to the kernel sets the dtype from
type_promote(), which uses the same code to compute the promoted type as
type_promote_from_pointer_t.

A slight disadvantage of this approach is that it requires the template
kernel to be included into Kron.hpp, which increases compile time
marginally, but it also means that linalg_internal_cpu/Kron_internal.cpp is
no longer used. Since that is a very long file that must be kept in sync
with any changes to the dtypes, the type promotion algorithm, and the Kron
kernel itself, this is a substantial reduction in source code and
maintenance burden.
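For illustration, here is a minimal, self-contained sketch of the same
pattern. The names are hypothetical stand-ins (promote_from_pointer_t for
Type_class::type_promote_from_pointer_t, kron_kernel for Kron_general), and
the promotion rule here is just operator+, not cytnx's type_promote(); this
is not the actual cytnx code.

  #include <cstdint>
  #include <type_traits>
  #include <utility>
  #include <variant>

  // Hypothetical stand-in for Type_class::type_promote_from_pointer_t:
  // yields void unless both arguments are pointer types.
  template <typename T, typename U>
  struct promote_from_pointer { using type = void; };

  template <typename T, typename U>
  struct promote_from_pointer<T*, U*> {
    // Illustrative promotion rule only; cytnx computes this from
    // type_promote(), not from operator+.
    using type = decltype(std::declval<T>() + std::declval<U>());
  };

  template <typename T, typename U>
  using promote_from_pointer_t = typename promote_from_pointer<T, U>::type;

  // Variant of pointers to supported types, as returned by ptr().
  // (Shortened list; complex types are omitted so the operator+-based
  // promotion rule above stays valid for every pair.)
  using ptr_variant = std::variant<std::int64_t*, float*, double*>;

  // Stand-in for the template kernel linalg_internal::Kron_general().
  template <typename Out, typename L, typename R>
  void kron_kernel(Out* /*out*/, L* /*lhs*/, R* /*rhs*/) { /* ... */ }

  void dispatch(ptr_variant lhs, ptr_variant rhs, void* out_mem) {
    std::visit(
      [&](auto tl, auto tr) {
        using out_type = promote_from_pointer_t<decltype(tl), decltype(tr)>;
        static_assert(!std::is_same_v<out_type, void>);
        kron_kernel(static_cast<out_type*>(out_mem), tl, tr);
      },
      lhs, rhs);
  }

Note that std::visit instantiates the lambda body for every combination of
alternatives, so each of the nine (tl, tr) pairs above gets its own kernel
instantiation at compile time; this is what replaces the hand-written jump
tables.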
Similarly, the static jump tables for the Kron function in
backend/linalg_internal_interface.cpp have been removed (these were
actually missing some entries).

The corresponding changes have also been made for the CUDA version, now
using Tl.gpu_ptr() and Tr.gpu_ptr(), dispatching to the CUDA kernel
cuKron_general, and using type_promote_from_gpu_pointer_t to get the
correct CUDA type of the output tensor. A complication in the CUDA case is
that the CUDA template kernel now needs to be included into Kron.cpp, so
cuKron_internal.cu has been renamed to cuKron_internal.cuh (a CUDA header
file), and if UNI_GPU is defined then Kron.cpp must be compiled with the
CUDA compiler. This is managed by the local CMakeLists.txt file:

  # This must be before the target_sources_local(), so that the target is
  # added as CUDA
  if(USE_CUDA)
    set_source_files_properties(Kron.cpp DIRECTORY ${CMAKE_SOURCE_DIR}
                                PROPERTIES LANGUAGE CUDA)
  endif()

set_source_files_properties() is directory-specific and by default only
applies to targets defined in the current directory (and subdirectories?),
hence it is necessary to explicitly set the DIRECTORY where the targets
(cytnx) are defined.

Possible future directions
==========================

Apply the above techniques to the other linear algebra functions, and
eventually drop src/backend/linalg_internal_interface.[hpp|cpp]. This
should be simple in some cases; other cases would need more substantial
refactoring to merge type-specific kernels into template functions.

The Tensor::ptr(), gpu_ptr(), ptr_as<T>(), and gpu_ptr_as<T>() functions do
not really belong in class Tensor; it would be nice to move them into the
Storage. Ideally ptr() and ptr_as<T>() would only exist for some CpuStorage
class, and similarly gpu_ptr() and gpu_ptr_as<T>() would only exist for the
GPU storage class. The dispatch to a back-end kernel could then go via two
stages: first dispatch based on the device to a back-end storage (i.e.
either cpu or gpu), and then that back-end handles the device-specific
code. Banish gpu_ptr() and gpu_ptr_as<T>() to backend/gpu/storage/....

The appearance of 'void' as one of the types in dtype was a complication
that needed a hack to remove 'void' when constructing the variant of
pointers. If void* is a possible type in the std::variant, then the
dispatch via std::visit() expects that void* is a viable option to
consider, and hence the first version of the Kron dispatch was attempting
to compile the Kron kernel for tensors of type void... Removing void from
consideration in this way works (see the sketch below), but it has the
disadvantage that the index of the type in the variant (i.e.
T.ptr().index()) is no longer aligned with the dtype variable. This is not
really a problem, since there should not be any need to rely on that, but
there is potential for it to go badly wrong if someone unwittingly expects
them to be the same. Perhaps removing void completely from Type_class would
be feasible, and instead treat any tensor with zero dimension as the
equivalent of 'void'.
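To make the 'remove void' hack concrete, here is one possible sketch of
building the variant of pointers while filtering out void. The helper names
(typelist, ptrs_without_void, to_variant) are hypothetical, and the actual
implementation in this commit may differ.

  #include <complex>
  #include <cstdint>
  #include <variant>

  template <typename... Ts>
  struct typelist {};

  // Prepend a type to an existing typelist.
  template <typename T, typename List>
  struct prepend;

  template <typename T, typename... Ts>
  struct prepend<T, typelist<Ts...>> { using type = typelist<T, Ts...>; };

  // Map each supported dtype T to T*, dropping void entirely.
  template <typename... Ts>
  struct ptrs_without_void { using type = typelist<>; };

  template <typename T, typename... Ts>
  struct ptrs_without_void<T, Ts...> {
    using type =
      typename prepend<T*, typename ptrs_without_void<Ts...>::type>::type;
  };

  template <typename... Ts>
  struct ptrs_without_void<void, Ts...> : ptrs_without_void<Ts...> {};

  // Convert the resulting typelist into a std::variant.
  template <typename List>
  struct to_variant;

  template <typename... Ts>
  struct to_variant<typelist<Ts...>> { using type = std::variant<Ts...>; };

  // Shortened dtype list for illustration, with void listed first.
  using ptr_variant = typename to_variant<
    typename ptrs_without_void<void, std::complex<double>, double,
                               std::int64_t>::type>::type;
  // ptr_variant == std::variant<std::complex<double>*, double*,
  //                             std::int64_t*>,
  // so in this sketch ptr_variant::index() is offset by one relative to
  // the position in the dtype list -- the misalignment described above.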