Cytnx Storage Refactor Proposal #500
Comments
Looks good to me -- some minor points:
|
Thanks @ianmccul for commenting on this proposal.
|
I was having another look at this. I don't understand why these `fill` overloads need to be virtual functions; I assumed there was some reason they had to be, but that isn't the case -- the code on the dev-master branch for every overload just forwards to the templated `Fill()`. For example:

```cpp
template <typename DType>
void StorageImplementation<DType>::fill(const cytnx_complex128 &val) {
  Fill(val);
}

template <typename DType>
void StorageImplementation<DType>::fill(const cytnx_complex64 &val) {
  Fill(val);
}

template <typename DType>
void StorageImplementation<DType>::fill(const cytnx_double &val) {
  Fill(val);
}
```

(and so on)

There is no reason for this to be a virtual function. Any caller of `fill()` already knows the static type of the value it is passing, so it can call the templated `Fill()` directly. If you still need an untyped `fill()`, it can live in the untyped interface.
|
Related to the above, but a separate issue: the templated `Fill()` reports a type mismatch at runtime, even though the condition is known at compile time:

```cpp
template <typename DType>
template <typename OtherDType>
void StorageImplementation<DType>::Fill(const OtherDType &value) {
  if constexpr (!std::is_constructible_v<DType, OtherDType>) {
    cytnx_error_msg(true, "[ERROR] cannot fill %s value into %s container",
                    Type.getname(Type.cy_typeid(OtherDType())).c_str(),
                    Type.getname(Type.cy_typeid(DType())).c_str());
    return;
  } else {
    if (this->device == Device.cpu) {
      utils_internal::FillCpu(this->Mem, static_cast<DType>(value), this->len);
    } else {
#ifdef UNI_GPU
      checkCudaErrors(cudaSetDevice(this->device));
      utils_internal::FillGpu(this->Mem, static_cast<DType>(value), this->len);
#else
      cytnx_error_msg(true, "[ERROR][fill] fatal internal, %s",
                      "storage is on gpu without CUDA support\n");
#endif
    }
  }
}
```

I can't see any reason to defer the error here to runtime. If you call `Fill()` with a type that isn't convertible, that could be a compile-time error instead. A design along these lines keeps the typed paths checked at compile time and funnels the unavoidable runtime check into one place:

```cpp
template <typename T>
class StorageImplementation {
template <typename U> void Fill(U value); // existing Fill() function
};
template <typename T>
class TensorImplementation {
StorageImplementation<T> storage_;
public:
template <typename U>
void Fill(U x) { storage_.Fill(x); }
};
class Scalar {
std::variant<int, float, double /* other types */> data_;
};
class Tensor {
std::variant<
std::shared_ptr<TensorImplementation<int>>,
std::shared_ptr<TensorImplementation<float>>,
std::shared_ptr<TensorImplementation<double>>
/* other types */
> data_;
public:
void Fill(Scalar s);
};
// Helper for Tensor::Fill (simplified version)
template <typename T, typename U>
void FillImpl(std::shared_ptr<TensorImplementation<T>>& tensor_impl, U scalar_val) {
if constexpr (!std::is_constructible_v<T, U>) {
cytnx_error_msg(true, "Cannot fill type %s into tensor of type %s.",
cy_type_str<U>, cy_type_str<T>); // cy_type_str does not yet exist
} else {
tensor_impl->Fill(scalar_val);
}
}
// Tensor::Fill implementation, forwards to FillImpl via the visitor pattern
void Tensor::Fill(Scalar s) {
std::visit(
[](auto&& tensor_impl, auto&& scalar_val) {
FillImpl(tensor_impl, scalar_val);
},
this->data_, s.data_);
}
```
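(To spell out the benefit of this arrangement: the only runtime dispatch left is the `std::visit` over the two variants. Every path below it is fully typed, so an impossible fill is rejected by the compiler wherever the static types are known, and the untyped `Tensor::Fill(Scalar)` boundary is the single place where a mismatch can surface as a runtime error.)
|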
We still hold the untyped `Storage` and `Tensor` in the API, so the mismatch still has to surface as a runtime error somewhere.
|
OK, let me think about that. I agree that at some point it needs to be a runtime error, but that should be in a function/method that belongs to the untyped interface, not a member of the typed interface.
|
Hi @IvanaGyro, I had a look to see if I could find a nice way to implement this. However, there is a problem when compiling the code without `UNI_GPU`:

```cpp
if constexpr (std::is_same_v<GpuAllocator<T>, Allocator>) {
  checkCudaErrors(cudaSetDevice(device));
  if constexpr (std::is_same_v<cytnx_complex128, T>) {
    utils_internal::cuComplexmem_gpu_cdtd(out->data(), storage_.data(), storage_.size(),
                                          /* get_real */ false);
  } else {
    utils_internal::cuComplexmem_gpu_cftf(out->data(), storage_.data(), storage_.size(),
                                          /* get_real */ false);
  }
}
```

Unfortunately, using `if constexpr` doesn't help here: the CUDA helpers still have to be declared for this template to compile, even when the branch is discarded.

I think there is a design question here, in the sense that it is the storage that is in charge of managing these operations, but the way you have it now, it is the Allocator that determines whether you want the CPU or the GPU version. So the storage needs to know the details of the allocator, which is not a great design. But because it is a template, it all comes down to spelling: in effect, you have two different classes, one for CPU storage and one for GPU storage, implemented by alternative partial specializations of `StorageImplementation`:

```cpp
template <typename T>
using CpuStorage = StorageImplementation<T, std::allocator<T>>;

template <typename T>
using GpuStorage = StorageImplementation<T, GpuAllocator<T>>;
```

So the solution, to get the template to compile both with and without `UNI_GPU`, is to keep these as two separate specializations, with the GPU one only compiled when `UNI_GPU` is defined.

They could both inherit from a common base class, so functions that are the same in both the GPU and CPU cases can be implemented just once. That intermediate base class could be a template of T, or the Allocator, or both; whatever is easier.
|
Suppose we only support GPUs that have the UMA feature -- is there any negative impact if we only ever use one allocator, no matter whether the data is accessed from the CPU or the GPU?
|
That could work. In this model, it isn't really a GPU allocator any more -- it is a unified-memory allocator that serves both sides. The allocator can be selected at compile time:

```cpp
#ifdef UNI_GPU
template <typename T>
using StorageAllocator = GpuAllocator<T>;
#else
template <typename T>
using StorageAllocator = std::allocator<T>;
#endif
```

Alternatively, have two versions of the storage header and select between them:

```cpp
#ifdef UNI_GPU
  #include "StorageImplementation_gpu.hpp"
#else
  #include "StorageImplementation_cpu.hpp"
#endif
```

as a cleaner approach than sprinkling `#ifdef UNI_GPU` throughout the code.

In any of these approaches, you could use an intermediate base class for common code between the CPU and GPU cases.
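A usage sketch of the single-alias approach (illustrative; `DefaultStorage` is not an existing name):

```cpp
// Downstream code depends only on this alias, so the CPU/GPU decision is
// made in exactly one place and never leaks into the rest of the codebase.
template <typename T>
using DefaultStorage = StorageImplementation<T, StorageAllocator<T>>;
```
|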
By the way, if you use |
Introduction
Per off-line (not on GitHub) discussions on 2024.10.03 and 2024.11.06, some chapters of this proposal were removed from the original unpublished draft. Only the chapters that reached consensus remain.
This proposal outlines a plan for refactoring the `Storage` and `Tensor` classes, along with their respective implementation classes. The goal of this refactoring is to enhance the efficiency of the program and minimize human error. Please note that this proposal is not finalized and does not constitute a definitive list of items to be implemented. We encourage discussion of any details of the proposed refactoring plan.

Importantly, this proposal DOES NOT alter the APIs for Python.
Advantages
Refactoring takes time, and there is no need to refactor code that offers no advantage. Here are some advantages we gain by applying the plan described in the following sections.

- Removing Boost dependencies: Currently, we utilize two components from Boost: `intrusive_ptr` and `unordered_map`. By replacing `boost::unordered_map` with `std::unordered_map`, we can eliminate our dependency on Boost.
- Encapsulation: By encapsulating the `Storage` class, we can lower the likelihood of contributors inadvertently accessing memory incorrectly.
- Decoupling: The current design couples the `UniTensor` class to PyTorch's API. This proposal aims to decouple them.

Class Diagrams
Below are class diagrams illustrating the current and proposed designs. Both diagrams omit specifiers such as `const`, `virtual`, etc., for clarity.

And here is the class diagram of the new design. The diagram only shows what has changed; the levels above `class UniTensor` are not affected. Every interface shows only the methods that must be implemented for bridging other backends.

Proposed Changes
Templatization of Implementations of Storage and Tensor
The current design repeatedly defines implementations for `Storage`, resulting in numerous duplicated functions. Additionally, `Tensor_impl` lacks type information, preventing compile-time type checking and leading to runtime checks that can degrade performance.

The new design templatizes the implementations of `Storage` and `Tensor` to enable type checking at compile time. The compiler can also help determine whether one type is convertible to another. The new design also makes the containers for CPU and the containers for GPU different types. By separating them, we can easily use an STL container, like `std::vector`, to maintain the memory, and it opens the door to using `thrust::vector` to maintain the memory both on CPU and GPU.
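As an illustration of the idea, here is a minimal sketch. The name `StorageImplementation` and the allocator parameter follow the discussion in the comments above; the member set shown is an assumption, not the real class:

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Sketch: one template covers every dtype, and the allocator parameter makes
// the CPU container and the GPU container distinct C++ types.
template <typename T, typename Allocator = std::allocator<T>>
class StorageImplementation {
 public:
  explicit StorageImplementation(std::size_t size) : data_(size) {}

  std::size_t size() const { return data_.size(); }

  // The compiler verifies the element-type conversion at compile time.
  template <typename U>
  void Fill(const U& value) {
    static_assert(std::is_constructible_v<T, U>,
                  "cannot fill this value type into this container");
    data_.assign(data_.size(), static_cast<T>(value));
  }

 private:
  std::vector<T, Allocator> data_;  // an STL container maintains the memory
};

template <typename T>
using CpuStorage = StorageImplementation<T>;  // std::allocator<T>
// With UNI_GPU, a GpuStorage<T> alias would pick a GPU-aware allocator instead.
```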
Encapsulation of All Classes

Using `class Storage_base` as an example: making its member variables public increases the risk of illegally accessing the memory. Besides, both the Google Style Guide and the C++ Core Guidelines prefer private member variables.

Here is a change for `class Storage_base`. The access control of the member variables changed or removed in the class diagram of the new design will be updated.
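A minimal sketch of the intended kind of change. The member names `Mem` and `len` appear in the current code; the accessor names are assumptions:

```cpp
// Before: public members can be read or written from anywhere.
class Storage_base {
 public:
  void* Mem;
  unsigned long long len;
};

// After (renamed here only so both versions compile side by side): members
// are private, so every access goes through the class and invariants can be
// enforced in one place.
class StorageBaseEncapsulated {
 public:
  void* data() { return mem_; }
  const void* data() const { return mem_; }
  unsigned long long size() const { return len_; }

 private:
  void* mem_ = nullptr;
  unsigned long long len_ = 0;
};
```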
Creation of Interfaces: class StorageBase and class TensorBase

The introduction of these interfaces allows for a cleaner separation between interface definitions and their implementations, enhancing flexibility for future integrations with other libraries. There are two options for implementation.

Added on 2024.11.08: This proposal asks developers to implement the type-less `StorageBase` and `TensorBase` when integrating with other libraries. However, following #496, it may also be possible to ask developers to implement the typed `StorageBase<typename>` and `TensorBase<typename>`.

Option 1 (Not Selected)
This option proposed making the existing classes base classes for implementations from other libraries, but it presented several issues, including forward-declaration requirements and mixed responsibilities. There are three problems with this option; among them, a derived class such as `class CustomUint64Storage` contains the unused variable `implementation_`, which wastes memory.

Option 2 (Selected)
The selected option introduces abstract interfaces (`class StorageBase` and `class TensorBase`) that must be implemented by any concrete storage or tensor class. Except for the public member functions that this proposal proposes to remove, the public member functions of `class StorageBase` and `class TensorBase` will be copied from the current `class Storage` and `class Tensor`, respectively. Below are the pure virtual functions of `class StorageBase` and `class TensorBase`; these pure virtual functions must be overridden by the implementation.
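The exact member list comes from the class diagram; as a rough sketch of the shape these interfaces could take (everything here except `clone()` and `HasContiguousIterator()`, which this proposal discusses, is an assumption):

```cpp
#include <cstddef>
#include <memory>

// Sketch of the type-less storage interface.
class StorageBase {
 public:
  virtual ~StorageBase() = default;
  virtual std::size_t size() const = 0;
  virtual std::unique_ptr<StorageBase> clone() const = 0;
};

// Sketch of the type-less tensor interface.
class TensorBase {
 public:
  virtual ~TensorBase() = default;
  virtual std::unique_ptr<TensorBase> clone() const = 0;
  // Reports whether the data can be traversed as one contiguous 1-D range;
  // implementations backed by multi-dimensional structures return false.
  virtual bool HasContiguousIterator() const = 0;
};
```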
There is no function for getting `Storage` from `TensorBase`, because we don't expect other libraries to have a component equivalent to `Storage`. With this assumption, we have to change the arguments of the functions consuming `class Storage` to iterators and raw pointers.

The reason for adding `HasContiguousIterator()` is to support tensor implementations that store their data in a multi-dimensional data structure. We may consider providing alternative, less efficient algorithms, like `Exp()`, for that kind of tensor implementation. If we only want to support tensor implementations that store data in a one-dimensional array, we should consider replacing the whole `Storage` component with `vector<T>`, because the functionality of `Storage` is almost equal to the functionality of `vector`.
Discussion on Removal of Types

To facilitate type elimination in the C++ API, it is crucial to retain the `clone()` functions, as well as `class StorageBase`, `class TensorBase`, `class Storage`, and `class Tensor`. If we only remove types during the Python binding process, these components will no longer be necessary, resulting in a cleaner and more straightforward codebase.

The usage of `clone()` will be further discussed in the next section.

Blind Storage from Shapes
Another area for improvement is how `Storage` interacts with shapes. The `Storage` class currently relies on shape information that is passed to its member functions by the caller, which can lead to implicit coupling between storage management and shape handling.

To foster better separation of concerns, we propose moving the functions below into `Tensor` from `Storage`.

With the new design, we can consider eliminating the entire `Storage` component, as it essentially functions as a one-dimensional array. This functionality can be effectively replaced by using `std::vector` or `thrust::vector` as a member variable within the `Tensor` component. By making this change, we can streamline the implementation and reduce complexity in the codebase.
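A sketch of the end state described above (names are illustrative): the shape lives in the tensor, and the storage is a plain one-dimensional vector that never sees shape information:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative only: the tensor owns both the shape and a flat 1-D buffer,
// so no shape information has to be passed into a storage layer.
template <typename T>
class TensorImplementation {
 public:
  explicit TensorImplementation(std::vector<std::size_t> shape)
      : shape_(std::move(shape)), storage_(NumElements(shape_)) {}

  const std::vector<std::size_t>& shape() const { return shape_; }
  std::size_t size() const { return storage_.size(); }

 private:
  static std::size_t NumElements(const std::vector<std::size_t>& shape) {
    std::size_t n = 1;
    for (std::size_t d : shape) n *= d;
    return n;
  }

  std::vector<std::size_t> shape_;
  std::vector<T> storage_;  // replaces the Storage component
};
```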
Steps

Following these steps will enable the compiler to detect more mistakes made during the refactoring process.

1. Turn `class Storage_base` into an abstract class.
2. Replace `USIInit`, `Storage::Init()`, `Storage_base::Init()`, and `Storage_base::_Init_byptr()` with constructors and factory classes (a minimal factory sketch follows this list).
3. … `utils_internal::Movemem_cpu_x`, `utils_internal::Fill_gpu`, and their CUDA counterparts.
4. … `class Storage_base`.
5. … `class Tensor_Impl`.
6. … `Storage`.
7. … `class StorageBase` and `class TensorBase`.
8. Update the functions consuming `class Storage` or `class Tensor` to instead consume `class StorageBase`, `class TensorBase`, or raw pointers.
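As a sketch of what step 2 could look like (the factory name and signature are assumptions, not part of the proposal):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Placeholder for the real typed storage (see the sketch earlier in this
// proposal); only here to make the example self-contained.
template <typename T>
class StorageImplementation {
 public:
  explicit StorageImplementation(std::size_t size) : data_(size) {}
 private:
  std::vector<T> data_;
};

// Illustrative factory replacing USIInit / Storage::Init(): the object is
// fully constructed in one step, so a half-initialized storage can never
// be observed, and the element type is fixed at compile time.
template <typename T>
std::unique_ptr<StorageImplementation<T>> MakeStorage(std::size_t size) {
  return std::make_unique<StorageImplementation<T>>(size);
}
```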