diff --git a/docs/img/computational_graph.png b/docs/img/computational_graph.png deleted file mode 100644 index ab8e2715..00000000 Binary files a/docs/img/computational_graph.png and /dev/null differ diff --git a/docs/installation.md b/docs/installation.md deleted file mode 100644 index 2cc67824..00000000 --- a/docs/installation.md +++ /dev/null @@ -1,73 +0,0 @@ -# Installation - -## System Support - -This library supports both x86_64/amd64 and arm64/aarch64. Check if your system is supported out of the box in the table below. The library requires very few dependencies, so as long as your machine supports a C++ compiler and python, you should be able to get it working by fiddling with the CMake and setuptools files. - -| OS + Arch | Python | Latest Release Directly Tested | -|-|-|-| -|Ubuntu 24.04 AMD64 | Python 3.9+ || -|Ubuntu 22.04 AMD64 | Python 3.9+ || -|Ubuntu 20.04 AMD64 | Python 3.9+ || -|Ubuntu 24.04 ARM64 | TBD | | -|Ubuntu 22.04 ARM64 | TBD | | -|Ubuntu 20.04 ARM64 | TBD | | -|ArchLinux 6.6.68 LTS | Python 3.9+ || -|MacOS 15 ARM64 | Python 3.9+ || -|MacOS 14 ARM64 | Python 3.9+ | v0.0.16 | -|MacOS 13 ARM64 | Python 3.9+ || -|MacOS 12 ARM64 | Python 3.9+ || -|MacOS 11 ARM64 | Python 3.9+ || -|Windows 11 | Python 3.9+ | v0.0.17 | -|Windows 10 | Python 3.9+ || -|Debian 13 | Python 3.9+ || -|Debian 12 | Python 3.9+ || -|LinuxMint 22 | Python 3.9+ || -|LinuxMint 21 | Python 3.9+ || - -## Compiling the `aten` Library - -Your machine will need system dependencies such as CMake, a C++ compiler, and pybind11. The library uses C++17. Preferably you will have git and conda installed already. For more specific instructions on installing these on your system, refer to the more detailed installation guide. - -Git clone the repo, then pip install, which will run `setup.py`. - -``` -git clone git@github.com:mbahng/pyember.git -cd pyember -pip install . -``` - -This runs `cmake` on `aten/CMakeLists.txt`, which calls the following. -1. 
It always calls `aten/src/CMakeLists.txt`, which compiles and links the source files of the C++ tensor library.
-2. If `BUILD_PYTHON_BINDINGS=ON` (the default), it also calls `aten/bindings/CMakeLists.txt` to generate a `.so` file that can be imported into `ember`.
-3. If `BUILD_DEV=ON`, it also calls `aten/test/CMakeLists.txt` to compile the C++ unit testing suite.
-
-If there are problems with building, check, in order:
-1. Whether `build/` has been created. This is the first step in `setup.py`.
-2. Whether `main.cpp` and, if `BUILD_DEV=ON`, the C++ unit test files have been compiled, i.e. whether the `build/src/main` and `build/test/tests` executables exist.
-3. Whether `build/*/aten.cpython-3**-darwin.so` exists (its exact location within the build directory depends on the machine). The Makefile generated by `aten/bindings/CMakeLists.txt` produces this file.
-4. Whether the `setup()` function copied this `.so` file to `ember/aten.cpython-3**-darwin.so`. You should see either a success message saying that it has been moved, or an error. The `.so` file must live within `ember`, the actual library, since `ember/__init__.py` must access it at the same directory level.
-
-## Testing and Development
-
-The pip install accepts two more environment variables. Note that the following command is whitespace-sensitive.
-```
-CMAKE_DEBUG=1 CMAKE_DEV=1 pip install .
-```
-1. Setting `CMAKE_DEBUG=1` compiles the `aten` library in debug mode (`-g`), which I use when running gdb/lldb on the compiled code.
-2. Setting `CMAKE_DEV=1` compiles the C++ testing suite as well. If you want to do this, you will also need to install GoogleTest. A code snippet for Ubuntu and Debian is shown below.
-```
-sudo apt-get install libgtest-dev
-cd /usr/src/gtest
-sudo cmake CMakeLists.txt
-sudo make
-sudo cp lib/*.a /usr/lib
-```
-
-If you would like to run tests and/or develop the package yourself, you can run the script `./run_tests.sh all` (pass `python` to run just the Python tests or `cpp` to run just the C++ tests), which will
-1. Run all C++ unit tests for `aten`, ensuring that all functions work correctly.
-2. Run all Python unit tests for `ember`, ensuring that the additional functions work correctly and that the C++ functions are bound correctly.
-
-The stub (`.pyi`) files for `aten` are located in `ember/aten`.
-
diff --git a/docs/progress.md b/docs/progress.md
deleted file mode 100644
index 79c2f720..00000000
--- a/docs/progress.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# Progress
-
- To do:
- 1. Add a template argument for the Tensor dtype.
- 2. Store all tensors on the heap to preserve them after the stack frame is destroyed.
-
- ✅ - Done
- ❌ - Not implemented
- 🪧 - Don't need; either should not be accessed or is not necessary (e.g.
due to inheritance) - 🚧 - In progress - - ## Aten BaseTensor - - | C++ Method | PyBind Method | Status | C++ Tests | Python Tests | Stubs | - |----------------------------------------------------------------------|----------------------|--------|-----------|--------------|--------| - | `std::string type() const` | `type()` | ✅ | 🪧 | 🪧 | ✅ | - | `std::string dtype() const` | `dtype()` | ✅ | 🪧 | 🪧 | ✅ | - | `bool operator==(BaseTensor&)` | `__eq__()` | ✅ | 🪧 | 🪧 | ✅ | - | `bool operator!=(BaseTensor&)` | `__ne__()` | ✅ | 🪧 | 🪧 | ✅ | - | `double at(const std::vector&) const` | `__getitem__()` | ✅ | 🪧 | 🪧 | ✅ | - | `double at(const std::vector&)` | `__setitem__()` | ✅ | 🪧 | 🪧 | ✅ | - | `std::unique_ptr slice(const std::vector&) const` | `__getitem__()` | ✅ | 🪧 | 🪧 | ✅ | - | `operator std::string() const` | `__str__()` | ✅ | 🪧 | 🪧 | ✅ | - | `operator std::string() const` | `__repr__()` | ✅ | 🪧 | 🪧 | ✅ | - | `BaseTensor& reshape(std::vector)` | `reshape(List[int])` | ✅ | 🪧 | 🪧 | ✅ | - - ## Aten GradTensor - - | C++ Method | PyBind Method | Status | C++ Tests | Python Tests | Stubs | - |----------------------------------------------------------------------|--------------------------------------------|--------|-----------|--------------|--------| - | `std::string type() const` | `type()` | ✅ | ✅ | ✅ | ✅ | - | `std::string dtype() const` | `dtype()` | ✅ | ✅ | ✅ | ✅ | - | `bool operator==(GradTensor&)` | `__eq__()` | ✅ | ✅ | ✅ | 🪧 | - | `bool operator!=(GradTensor&)` | `__ne__()` | ✅ | ✅ | ✅ | 🪧 | - | `double at(const std::vector&) const` | `__getitem__()` | ✅ | ✅ | ✅ | ✅ | - | `double at(const std::vector&)` | `__setitem__()` | ✅ | ✅ | ✅ | ✅ | - | `std::unique_ptr slice(const std::vector&) const` | `__getitem__()` | ✅ | ✅ | ✅ | ✅ | - | `BaseTensor::operator std::string() const` | `__str__()` | ✅ | ❌ | ✅ | ✅ | - | `BaseTensor::operator std::string() const` | `__repr__()` | ✅ | ❌ | ✅ | ✅ | - | `size_t pivot() const` | `pivot()` | ✅ | ✅ | ✅ | ✅ | - | `GradTensor()` | 
`GradTensor()` | ✅ | ✅ | ✅ | ✅ | - | `GradTensor(std::vector, std::vector, size_t)` | `GradTensor(List[double], List[int], int)` | ✅ | ✅ | ✅ | ✅ | - | `GradTensor(std::vector, size_t)` | `GradTensor(List[int], int)` | ✅ | ✅ | ✅ | ✅ | - | `GradTensor::eye(size_t, size_t)` | | ✅ | ✅ | ✅ | ✅ | - | `transpose()` | `transpose()` | ✅ | ✅ | ✅ | ✅ | - | `GradTensor copy() const` | `copy()` | ✅ | ✅ | ✅ | ✅ | - | | `__neg__()` | ✅ | 🪧 | ✅ | ✅ | - | `Tensor add(Tensor&)` | `__add__(Tensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__radd__(Tensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor add(GradTensor&)` | `__add__(GradTensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__radd__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor add(double&)` | `__add__(float)` | ✅ | ✅ | ✅ | ✅ | - | | `__radd__(float)` | ✅ | 🪧 | ✅ | | - | `Tensor sub(Tensor&)` | `__sub__(Tensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__rsub__(Tensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor sub(GradTensor&)` | `__sub__(GradTensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__rsub__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor sub(double&)` | `__sub__(float)` | ✅ | ✅ | ✅ | ✅ | - | | `__rsub__(float)` | ✅ | 🪧 | ✅ | | - | `Tensor mul(Tensor&)` | `__mul__(Tensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__rmul__(Tensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor mul(GradTensor&)` | `__mul__(GradTensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__rmul__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor mul(double&)` | `__mul__(float)` | ✅ | ✅ | ✅ | ✅ | - | | `__rmul__(float)` | ✅ | 🪧 | ✅ | | - | `GradTensor matmul(GradTensor&)` | `__matmul__(GradTensor)` | ✅ | ✅ | ✅ | ✅ | - - ## Aten Tensor - - | C++ Method | PyBind Method | Status | C++ Tests | Python Tests | Stubs | - |-------------------------------------------------------------------------|-----------------------------------------------|--------|-----------|--------------|--------| - | `std::string type() const` | `type()` | ✅ | ✅ | ✅ | ✅ | - | `std::string dtype() const` | `dtype()` | ✅ | ✅ | ✅ | ✅ | - | `bool operator==(Tensor&)` | `__eq__()` | ✅ | ✅ | ✅ | 🪧 | - | `bool 
operator!=(Tensor&)` | `__ne__()` | ✅ | ✅ | ✅ | 🪧 | - | `double at(const std::vector&) const` | `__getitem__()` | ✅ | ✅ | ✅ | ✅ | - | `double at(const std::vector&)` | `__setitem__()` | ✅ | ✅ | ✅ | ✅ | - | `std::unique_ptr slice(const std::vector&) const` | `__getitem__()` | ✅ | ✅ | ✅ | ✅ | - | `BaseTensor::operator std::string() const` | `__str__()` | ✅ | ✅ | ✅ | ✅ | - | `BaseTensor::operator std::string() const` | `__repr__()` | ✅ | ✅ | ✅ | ✅ | - | `Tensor(std::vector, std::vector)` | `Tensor(List[float], List[int])` | ✅ | ✅ | ✅ | ✅ | - | `Tensor(std::vector)` | `Tensor(List[float])` | ✅ | ✅ | ✅ | ✅ | - | `Tensor(std::vector>)` | `Tensor(List[List[float]])` | ✅ | ✅ | ✅ | ✅ | - | `Tensor(std::vector>>)` | `Tensor(List[List[List[float]]])` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor arange(int, int, int)` | `Tensor.arange(int, int, int)` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor linspace(double, double, int)` | `Tensor.linspace(float, float, int)` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor gaussian(std::vector, double, double)` | `Tensor.gaussian(List[int], float, float)` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor uniform(std::vector, double, double)` | `Tensor.uniform(List[int], int, int)` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor ones(std::vector)` | `Tensor.ones(List[int])` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor zeros(std::vector)` | `Tensor.zeros(List[int])` | ✅ | ✅ | ✅ | ✅ | - | `void build_topo(Tensor* v, std::set&, std::vector&)` | 🪧 | ✅ | ❌ | 🪧 | 🪧 | - | `prev_` | `prev` | ✅ | | | | - | `std::vector backprop(bool)` | `backprop(bool)` | ✅ | ✅ | ✅ | ✅ | - | `Tensor& reshape(std::vector)` | `reshape(List[int])` | ✅ | ✅ | ✅ | ✅ | - | `Tensor copy() const` | `copy()` | ✅ | ❌ | ✅ | ✅ | - | `Tensor neg()` | `__neg__()` | ✅ | 🪧 | ✅ | ✅ | - | `Tensor add(Tensor&)` | `__add__(Tensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__radd__(Tensor)` | ✅ | 🪧 | ✅ | | - | `Tensor add(GradTensor&)` | `__add__(GradTensor)` | ✅ | ❌ | ✅ | ✅ | - | | `__radd__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `Tensor add(double&)` | 
`__add__(float)` | ✅ | ❌ | ✅ | ✅ | - | | `__radd__(float)` | ✅ | 🪧 | ✅ | | - | `Tensor sub(Tensor&)` | `__sub__(Tensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__rsub__(Tensor)` | ✅ | 🪧 | ✅ | | - | `Tensor sub(GradTensor&)` | `__sub__(GradTensor)` | ✅ | ❌ | ✅ | ✅ | - | | `__rsub__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `Tensor sub(double&)` | `__sub__(float)` | ✅ | ❌ | ✅ | ✅ | - | | `__rsub__(float)` | ✅ | 🪧 | ✅ | | - | `Tensor mul(Tensor&)` | `__mul__(Tensor)` | ✅ | ❌ | ✅ | ✅ | - | | `__rmul__(Tensor)` | ✅ | 🪧 | ✅ | | - | `Tensor mul(GradTensor&)` | `__mul__(GradTensor)` | ✅ | ❌ | ✅ | ✅ | - | | `__rmul__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `Tensor mul(double&)` | `__mul__(float)` | ✅ | ❌ | ✅ | ✅ | - | | `__rmul__(float)` | ✅ | 🪧 | ✅ | | - | `Tensor exp(double&)` | `__pow__(float)` | ❌ | ❌ | ❌ | ❌ | - | `Tensor exp(double&)` | `exp(float)` | ❌ | ❌ | ❌ | ❌ | - | `Tensor log(double&)` | `log(float)` | ❌ | ❌ | ❌ | ❌ | - | `Tensor matmul(Tensor&)` | `matmul(Tensor)` | ✅ | ❌ | ❌ | ✅ | - | `Tensor matmul(Tensor&)` | `__matmul__(Tensor)` | ✅ | ❌ | ✅ | ✅ | - | `Tensor tranpose(const std::vector&) const` | `transpose(List[int])` | ✅ | ❌ | ✅ | ✅ | - | `Tensor concat(Tensor&, size_t)` | `concat(Tensor)` | ❌ | ❌ | ❌ | ❌ | - | `Tensor sin()` | `sin()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor cos()` | `cos()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor tan()` | `tan()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor arcsin()` | `arcsin()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor arccos()` | `arccos()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor arctan()` | `arctan()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor relu()` | `relu()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor sigmoid()` | `sigmoid()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor leaky_relu()` | `leaky_relu()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor sum()` | `sum()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor mean()` | `mean()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor norm()` | `norm()` | ❌ | ❌ | ❌ | ❌ | - - ## Models - diff --git a/docs/structure.md b/docs/structure.md deleted file mode 100644 index 597f96c5..00000000 --- a/docs/structure.md +++ /dev/null @@ -1,38 +0,0 @@ -# 
Repository
-
-I've thought for a few weeks about how to structure this whole library, taking inspiration from the PyTorch and tinygrad repositories. At a high level, the actual package is in `pyember/ember`, which uses functions bound via pybind11 from `pyember/aten` for fast computations. Very briefly,
-
-1. `aten/` contains the header and source files for the C++ low-level tensor library, such as basic operations and an autograd engine.
-    1. `aten/src` contains all the source files and definitions.
-    2. `aten/bindings` contains the pybindings.
-    3. `aten/test` contains all the C++ testing modules for aten.
-2. `ember/` contains the actual library, supporting high-level models, objectives, optimizers, dataloaders, and samplers.
-    1. `ember/aten` contains the stub files.
-    2. `ember/datasets` contains all preprocessing tools, such as datasets/loaders, standardization, and cross-validation checks.
-    3. `ember/models` contains all machine learning models.
-    4. `ember/objectives` contains all loss functions and regularizers.
-    5. `ember/optimizers` contains all the optimizers/solvers, such as iterative (e.g. SGD), greedy (e.g. decision tree splitting), and one-shot (e.g. least-squares solution).
-    6. `ember/samplers` contains all samplers (e.g. MCMC, SGLD).
-3. `docs/` contains detailed documentation about each function.
-4. `examples/` contains example Python scripts for training models.
-5. `tests/` contains the Python testing modules for the `ember` library.
-6. `docker/` contains Docker images of all the operating systems and architectures I tested ember on. General workflows for setting up the environment on supported machines can be found there.
-7. `setup.py` allows you to pip install this as a package.
-8. `run_tests.sh` is the main test-running script.
-
-The sections below go into more detail.
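The layout above can also be sanity-checked mechanically. Below is a small stdlib-only sketch (the `EXPECTED` list and the function name are my own illustration, not part of the repo) that reports which of the documented paths are missing from a clone:

```python
from pathlib import Path

# Paths taken from the repository layout described above (illustrative list).
EXPECTED = [
    "aten/src", "aten/bindings", "aten/test",
    "ember/aten", "ember/datasets", "ember/models",
    "ember/objectives", "ember/optimizers", "ember/samplers",
    "docs", "examples", "tests",
    "setup.py", "run_tests.sh",
]

def missing_paths(root):
    """Return the documented paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]
```

Running `missing_paths(".")` from the repository root should return an empty list on a healthy clone.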
-
-
-## ATen
-
-Aten, short for "a tensor" library (a name borrowed from PyTorch), is a C++ library that provides low-level functionality for Tensors. This includes the basic vector and matrix operations like addition, scalar/matrix multiplication, dot products, and transposes, which are used everywhere in model training and inference and must be fast.
-
-### Compiling and PyBinding
-
-Let's look at `aten/CMakeLists.txt` and `aten/bindings/CMakeLists.txt`.
-
-- `aten/CMakeLists.txt` contains the instructions to generate a Makefile for compiling and linking the `aten` library. It has an optional argument `BUILD_PYTHON_BINDINGS` which, when set to `ON`, generates the `.so` file through `aten/bindings/CMakeLists.txt`. The executables compiled from `aten/main.cpp` are placed at `aten/build/main`. The same goes for the test files, which are compiled to `aten/build/tests`.
-
-- `aten/bindings/CMakeLists.txt` contains the instructions to generate the `.so` file and save it to `pyember/ember/_C.cpython-312-darwin.so`. It must be contained within the Python package directory, since `ember` cannot access libraries outside of its base directory.
diff --git a/docs/tensors.md b/docs/tensors.md
deleted file mode 100644
index 050ecb91..00000000
--- a/docs/tensors.md
+++ /dev/null
@@ -1,234 +0,0 @@
-Tensors are $N$-dimensional arrays that are used to represent tabular data or model parameters. All tensor classes derive from the `BaseTensor` abstract class, which supports the minimal functionality that all tensors should have.
-
-The hierarchy is
-
-```
-BaseTensor
-  Tensor
-    ScalarTensor (TBI)
-    DenseTensor (TBI)
-    SparseTensor (TBI)
-  GradTensor
-```
-
-All the attributes and methods that are supported by all classes can be found in `aten/src/Tensor.h`.
-
-# BaseTensor
-
-### Attributes
-
-`std::vector<double> storage_`
-- A contiguous vector of doubles that stores the state of the tensor.
-
-`std::vector<size_t> shape_`
-- The shape of the tensor, where the product of the shape elements should match the length of `storage_`.
-
-### Methods
-
-`virtual std::string type() const { return "BaseTensor"; }`
-- outputs the string representing the instance of the class.
-- It is a virtual function, which is overridden by each subclass. The `const` indicates that it doesn't modify the object.
-
-`virtual std::string dtype() const { return "double"; }`
-- outputs the type of the elements in the tensor
-- not sure if this needs to be virtual
-
-`virtual ~BaseTensor() = default;`
-- a destructor. Not sure if this is needed
-
-`const std::vector<size_t>& shape() const { return shape_; }`
-- getter function for the shape
-
-`const std::vector<double>& data() const { return storage_; }`
-- getter function for the data, or storage
-
-`BaseTensor& reshape(std::vector<size_t> new_shape);`
-- simply reshapes by changing the `shape_` attribute and does nothing else.
-
-`virtual bool operator==(BaseTensor& other) const;`
-- equality operator that minimally checks the attributes `storage_` and `shape_`.
-- It is a virtual function since it must be overridden by `GradTensor`s, which have an additional `pivot` attribute.
-
-`virtual bool operator!=(BaseTensor& other) const;`
-- Just the negation of the equality operator (see above).
-
-`operator std::string() const;`
-- Returns a string so we can actually print tensors. Prints the type, plus the `storage_` and `shape_`, so that we can see the array structure.
-
-`virtual double at(const std::vector<size_t>& indices) const;`
-- Similar to `__getitem__`, where you return a copy of an element by its index.
-
-`virtual double& at(const std::vector<size_t>& indices);`
-- Similar to `__setitem__`, where you return a reference to the element for modification.
-
-`virtual std::unique_ptr<BaseTensor> slice(const std::vector<Slice>& slices) const;`
-- Used to get a slice of a `Tensor` with the strides stored in the `BaseTensor::Slice` struct.
-- Returns a copy, not a reference/view of the Tensor!
-
-```
-struct Slice {
-  size_t start;
-  size_t stop;
-  size_t step;
-
-  Slice(size_t start_ = 0,
-        size_t stop_ = std::numeric_limits<size_t>::max(),
-        size_t step_ = 1)
-    : start(start_), stop(stop_), step(step_) {}
-};
-```
-
-### Notes
-
-- I'm not sure whether to include strides as PyTorch does, since this information is implicit in the `shape`. Strides would certainly make viewing easier, but would require a lot of modification in the `std::string` function of `BaseTensor` to use the strides when pushing into a stringstream.
-
-- I've tried virtualizing the transpose function in `BaseTensor`, but I wanted it to return a reference for `GradTensor` and a copy for `Tensor`, so I made two separate implementations in the two subclasses.
-
-
-## GradTensors
-
-Gradient Tensors, or `GradTensor`s, are tensors that store the total derivative of an elementary operation (not precisely the gradient, but in $\mathbb{R}^n$ one can be transposed to get the other). These operations can have 1 or more arguments. It is represented as an $N$-tensor of size
-
-$$
-  (D_1, D_2, \ldots, D_N)
-$$
-
-Consider the gradient of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$, which is an $m \times n$ matrix. However, if we have another function $g: \mathbb{R}^{m \times n} \rightarrow \mathbb{R}$, then its gradient is also a matrix of shape $m \times n$. Clearly there is some ambiguity here, so we must store another attribute, which I call the *pivot dimension*, to capture this information. Say that we have $f: \mathbb{R}^{\mathbf{n}} \rightarrow \mathbb{R}^{\mathbf{m}}$, where the superscripts are now vectors of length $d_n, d_m$. Then the total derivative has shape
-$$
-  \mathbf{m} \times \mathbf{n}
-$$
-with the pivot being $d_m + 1$, the first dimension index of the input.
-
-Essentially, we are approaching matrix multiplication in a more general way by [contracting tensors](https://en.wikipedia.org/wiki/Tensor_contraction). 
Note that by including this pivot parameter, we can support both batching and higher-dimensional multiplication.
-
-### Attributes
-
-`std::vector<double> storage_`
-- A contiguous vector of doubles that stores the state of the tensor.
-
-`std::vector<size_t> shape_`
-- The shape of the tensor, where the product of the shape elements should match the length of `storage_`.
-
-`size_t pivot_`
-- The pivot index that marks the first dimension of the input within `shape_`.
-
-### Constructors
-
-`GradTensor();`
-- Default constructor that is called when initializing a tensor without any gradients. Stores empty vectors and `pivot_ = 0`.
-
-`GradTensor(std::vector<double> data, std::vector<size_t> shape, size_t pivot);`
-- Full constructor that sets all attributes.
-
-`GradTensor(std::vector<size_t> shape, size_t pivot);`
-- Initializes a GradTensor of shape `shape` with the given pivot and all entries $0$.
-
-`static GradTensor eye(size_t n, size_t pivot = 1);`
-- Creates an identity matrix gradient, which is a good default initialization when calling `backprop()` on a tensor.
-
-
-### Methods
-
-`std::string type() const override { return "GradTensor"; }`
-- overrides `type`
-
-`size_t pivot() const { return pivot_; }`
-- getter method for the pivot index
-
-`bool operator==(GradTensor& other) const;`
-- equality operator that also checks equality of the pivot
-
-`bool operator!=(GradTensor& other) const;`
-- negation of the equality operator.
-
-`GradTensor copy() const;`
-- returns a new GradTensor copy
-
-`GradTensor add(GradTensor& other);`
-- For adding gradients of the same shape and pivot.
-
-`Tensor add(Tensor& other);`
-- For adding gradients to parameters when updating them.
-
-`GradTensor sub(GradTensor& other);`
-- For subtracting gradients of the same shape and pivot.
-
-`Tensor sub(Tensor& other);`
-- For subtracting gradients from parameters when updating them.
-
-`GradTensor mul(GradTensor& other);`
-- Elementwise multiplication; rarely meaningful for gradients, but included for completeness.
-`Tensor mul(Tensor& other);`
-- Elementwise multiplication; rarely meaningful for gradients, but included for completeness.
-`GradTensor matmul(GradTensor& other);`
-- Right matrix multiplication or tensor contraction, used for the chain rule.
-
-`GradTensor& transpose(const std::vector<size_t>& axes = {});`
-- Modifies the gradient tensor in place and returns itself.
-- It doesn't return a copy like in `Tensor`, since I don't think it makes sense to reuse the old one. If you must, you can just copy it and then transpose it.
-
-### Notes
-
-- The most natural operations on gradients/Jacobians are addition, subtraction, and multiplication (composition). Operations like the dot product don't make sense here, so I did not implement them on purpose; elementwise multiplication is included only for convenience.
-
-- Might need to check whether addition verifies that the pivots are the same.
-
-## Tensor
-
-Regular tensors, or `Tensor`s, store either tabular data or the state of a parameter.
-
-### Attributes
-
-`std::vector<double> storage_`
-- A contiguous vector of doubles that stores the state of the tensor.
-
-`std::vector<size_t> shape_`
-- The shape of the tensor, where the product of the shape elements should match the length of `storage_`.
-
-`GradTensor grad = GradTensor();`
-- the Jacobian (if being precise, rather than the gradient) of some tensor further down the computation graph with respect to this tensor.
-
-`std::vector<Tensor*> prev = std::vector<Tensor*>();`
-- previous nodes used to compute this tensor, if any
-
-`std::function backward;`
-- function for filling in the gradients of this tensor
-
-### Constructors
-
-`Tensor(std::vector<double> data, std::vector<size_t> shape);`
-- The full constructor, which sets the storage and shape, whilst setting the gradients to null.
-
-`Tensor(std::vector<double> data);`
-- Constructor for 1D arrays.
-
-`Tensor(std::vector<std::vector<double>> data);`
-- Constructor for 2D arrays.
-
-`Tensor(std::vector<std::vector<std::vector<double>>> data);`
-- Constructor for 3D arrays.
-
-`static Tensor arange(int start, int stop, int step = 1);`
-- Arange constructor, returning a 1D array.
-
-`static Tensor linspace(double start, double stop, int numsteps);`
-- Linspace constructor (like in NumPy), returning a 1D array.
-
-`static Tensor gaussian(std::vector<size_t> shape, double mean = 0.0, double stddev = 1.0);`
-- Returns a Tensor of shape `shape` of Gaussian random variables.
-
-`static Tensor uniform(std::vector<size_t> shape, double min = 0.0, double max = 1.0);`
-- Returns a Tensor of shape `shape` of uniform random variables.
-
-`static Tensor ones(std::vector<size_t> shape);`
-- Returns a Tensor of shape `shape` of all $1$s.
-
-`static Tensor zeros(std::vector<size_t> shape);`
-- Returns a Tensor of shape `shape` of all $0$s.
-
-
-### Methods
-
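To make the pivot-based contraction from the `GradTensor` section concrete, here is a pure-Python sketch (my own illustration, not the actual `aten` implementation; all names are hypothetical) that contracts two flattened, row-major gradients along the dimensions after the first pivot and before the second. Ordinary matrix multiplication is the special case where both pivots are 1:

```python
from itertools import product
from math import prod

def flat(idx, shape):
    # Row-major (C-order) flattening of a multi-index.
    f = 0
    for i, d in zip(idx, shape):
        f = f * d + i
    return f

def contract(a, a_shape, a_pivot, b, b_shape, b_pivot):
    """Contract a's dims after its pivot with b's dims before its pivot."""
    out_dims = tuple(a_shape[:a_pivot])   # output dims of the outer map
    mid_dims = tuple(a_shape[a_pivot:])   # dims summed over (chain rule)
    in_dims = tuple(b_shape[b_pivot:])    # input dims of the inner map
    assert mid_dims == tuple(b_shape[:b_pivot]), "contracted dims must match"
    res_shape = out_dims + in_dims
    res = [0.0] * prod(res_shape)
    for o in product(*(range(d) for d in out_dims)):
        for i in product(*(range(d) for d in in_dims)):
            s = 0.0
            for m in product(*(range(d) for d in mid_dims)):
                s += a[flat(o + m, a_shape)] * b[flat(m + i, b_shape)]
            res[flat(o + i, res_shape)] = s
    return res, res_shape
```

For example, `contract([1, 2, 3, 4], (2, 2), 1, [5, 6, 7, 8], (2, 2), 1)` reproduces the usual 2×2 matrix product, and contracting with an identity gradient (as `GradTensor::eye` would produce) leaves the other operand unchanged.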