diff --git a/docs/img/computational_graph.png b/docs/img/computational_graph.png deleted file mode 100644 index ab8e2715..00000000 Binary files a/docs/img/computational_graph.png and /dev/null differ diff --git a/docs/installation.md b/docs/installation.md deleted file mode 100644 index 2cc67824..00000000 --- a/docs/installation.md +++ /dev/null @@ -1,73 +0,0 @@ -# Installation - -## System Support - -This library supports both x86_64/amd64 and arm64/aarch64. Check if your system is supported out of the box in the table below. The library requires very few dependencies, so as long as your machine supports a C++ compiler and python, you should be able to get it working by fiddling with the CMake and setuptools files. - -| OS + Arch | Python | Latest Release Directly Tested | -|-|-|-| -|Ubuntu 24.04 AMD64 | Python 3.9+ || -|Ubuntu 22.04 AMD64 | Python 3.9+ || -|Ubuntu 20.04 AMD64 | Python 3.9+ || -|Ubuntu 24.04 ARM64 | TBD | | -|Ubuntu 22.04 ARM64 | TBD | | -|Ubuntu 20.04 ARM64 | TBD | | -|ArchLinux 6.6.68 LTS | Python 3.9+ || -|MacOS 15 ARM64 | Python 3.9+ || -|MacOS 14 ARM64 | Python 3.9+ | v0.0.16 | -|MacOS 13 ARM64 | Python 3.9+ || -|MacOS 12 ARM64 | Python 3.9+ || -|MacOS 11 ARM64 | Python 3.9+ || -|Windows 11 | Python 3.9+ | v0.0.17 | -|Windows 10 | Python 3.9+ || -|Debian 13 | Python 3.9+ || -|Debian 12 | Python 3.9+ || -|LinuxMint 22 | Python 3.9+ || -|LinuxMint 21 | Python 3.9+ || - -## Compiling the `aten` Library - -Your machine will need system dependencies such as CMake, a C++ compiler, and pybind11. The library uses C++17. Preferably you will have git and conda installed already. For more specific instructions on installing these on your system, refer to the more detailed installation guide. - -Git clone the repo, then pip install, which will run `setup.py`. - -``` -git clone git@github.com:mbahng/pyember.git -cd pyember -pip install . -``` - -This runs `cmake` on `aten/CMakeLists.txt`, which calls the following. -1. 
It always calls `aten/src/CMakeLists.txt`, which compiles and links the source files of the C++ tensor library.
-2. If `BUILD_PYTHON_BINDINGS=ON` (the default), it also calls `aten/bindings/CMakeLists.txt` to generate a `.so` file that can be imported into `ember`.
-3. If `BUILD_DEV=ON`, it also calls `aten/test/CMakeLists.txt` to compile the C++ unit testing suite.
-
-If there are problems with building, check, in order:
-1. Whether `build/` has been created. This is the first step in `setup.py`.
-2. Whether `main.cpp` and, if `BUILD_DEV=ON`, the C++ unit test files have been compiled, i.e. whether the `build/src/main` and `build/test/tests` executables exist.
-3. Whether `build/*/aten.cpython-3**-darwin.so` exists (its exact location within the build directory depends on the machine). The Makefile generated by `aten/bindings/CMakeLists.txt` produces this file.
-4. Whether the `setup()` function copied this `.so` file to `ember/aten.cpython-3**-darwin.so`. You should see either a success message saying that it has been moved, or an error. The `.so` file must live within `ember`, the actual library, since `ember/__init__.py` must access it at the same directory level.
-
-## Testing and Development
-
-The pip install accepts two more environment variables. Note that the following command is whitespace-sensitive.
-```
-CMAKE_DEBUG=1 CMAKE_DEV=1 pip install .
-```
-1. Setting `CMAKE_DEBUG=1` compiles the `aten` library in debug mode (`-g`), which I use when running gdb/lldb on the compiled code.
-2. Setting `CMAKE_DEV=1` compiles the C++ testing suite as well. If you want to do this, you will also need to install GoogleTest. A code snippet for Ubuntu and Debian is shown below.
-```
-sudo apt-get install libgtest-dev
-cd /usr/src/gtest
-sudo cmake CMakeLists.txt
-sudo make
-sudo cp lib/*.a /usr/lib
-```
-
-If you would like to run tests and/or develop the package yourself, you can run the script `./run_tests.sh all` (pass `python` to run just the Python tests or `cpp` to run just the C++ tests), which will
-1. Run all C++ unit tests for `aten`, ensuring that all functions work correctly.
-2. Run all Python unit tests for `ember`, ensuring that the additional functions work correctly and that the C++ functions are bound correctly.
-
-The stub (`.pyi`) files for `aten` are located in `ember/aten`.
-
diff --git a/docs/progress.md b/docs/progress.md
deleted file mode 100644
index 79c2f720..00000000
--- a/docs/progress.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# Progress
-
- To do:
- 1. Add a template argument for the Tensor dtype.
- 2. Store all tensors on the heap to preserve them after the stack frame is destroyed.
-
- ✅ - Done
- ❌ - Not implemented
- 🪧 - Don't need; either should not be accessed or is not necessary (e.g.
due to inheritance) - 🚧 - In progress - - ## Aten BaseTensor - - | C++ Method | PyBind Method | Status | C++ Tests | Python Tests | Stubs | - |----------------------------------------------------------------------|----------------------|--------|-----------|--------------|--------| - | `std::string type() const` | `type()` | ✅ | 🪧 | 🪧 | ✅ | - | `std::string dtype() const` | `dtype()` | ✅ | 🪧 | 🪧 | ✅ | - | `bool operator==(BaseTensor&)` | `__eq__()` | ✅ | 🪧 | 🪧 | ✅ | - | `bool operator!=(BaseTensor&)` | `__ne__()` | ✅ | 🪧 | 🪧 | ✅ | - | `double at(const std::vector&) const` | `__getitem__()` | ✅ | 🪧 | 🪧 | ✅ | - | `double at(const std::vector&)` | `__setitem__()` | ✅ | 🪧 | 🪧 | ✅ | - | `std::unique_ptr slice(const std::vector&) const` | `__getitem__()` | ✅ | 🪧 | 🪧 | ✅ | - | `operator std::string() const` | `__str__()` | ✅ | 🪧 | 🪧 | ✅ | - | `operator std::string() const` | `__repr__()` | ✅ | 🪧 | 🪧 | ✅ | - | `BaseTensor& reshape(std::vector)` | `reshape(List[int])` | ✅ | 🪧 | 🪧 | ✅ | - - ## Aten GradTensor - - | C++ Method | PyBind Method | Status | C++ Tests | Python Tests | Stubs | - |----------------------------------------------------------------------|--------------------------------------------|--------|-----------|--------------|--------| - | `std::string type() const` | `type()` | ✅ | ✅ | ✅ | ✅ | - | `std::string dtype() const` | `dtype()` | ✅ | ✅ | ✅ | ✅ | - | `bool operator==(GradTensor&)` | `__eq__()` | ✅ | ✅ | ✅ | 🪧 | - | `bool operator!=(GradTensor&)` | `__ne__()` | ✅ | ✅ | ✅ | 🪧 | - | `double at(const std::vector&) const` | `__getitem__()` | ✅ | ✅ | ✅ | ✅ | - | `double at(const std::vector&)` | `__setitem__()` | ✅ | ✅ | ✅ | ✅ | - | `std::unique_ptr slice(const std::vector&) const` | `__getitem__()` | ✅ | ✅ | ✅ | ✅ | - | `BaseTensor::operator std::string() const` | `__str__()` | ✅ | ❌ | ✅ | ✅ | - | `BaseTensor::operator std::string() const` | `__repr__()` | ✅ | ❌ | ✅ | ✅ | - | `size_t pivot() const` | `pivot()` | ✅ | ✅ | ✅ | ✅ | - | `GradTensor()` | 
`GradTensor()` | ✅ | ✅ | ✅ | ✅ | - | `GradTensor(std::vector, std::vector, size_t)` | `GradTensor(List[double], List[int], int)` | ✅ | ✅ | ✅ | ✅ | - | `GradTensor(std::vector, size_t)` | `GradTensor(List[int], int)` | ✅ | ✅ | ✅ | ✅ | - | `GradTensor::eye(size_t, size_t)` | | ✅ | ✅ | ✅ | ✅ | - | `transpose()` | `transpose()` | ✅ | ✅ | ✅ | ✅ | - | `GradTensor copy() const` | `copy()` | ✅ | ✅ | ✅ | ✅ | - | | `__neg__()` | ✅ | 🪧 | ✅ | ✅ | - | `Tensor add(Tensor&)` | `__add__(Tensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__radd__(Tensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor add(GradTensor&)` | `__add__(GradTensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__radd__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor add(double&)` | `__add__(float)` | ✅ | ✅ | ✅ | ✅ | - | | `__radd__(float)` | ✅ | 🪧 | ✅ | | - | `Tensor sub(Tensor&)` | `__sub__(Tensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__rsub__(Tensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor sub(GradTensor&)` | `__sub__(GradTensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__rsub__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor sub(double&)` | `__sub__(float)` | ✅ | ✅ | ✅ | ✅ | - | | `__rsub__(float)` | ✅ | 🪧 | ✅ | | - | `Tensor mul(Tensor&)` | `__mul__(Tensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__rmul__(Tensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor mul(GradTensor&)` | `__mul__(GradTensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__rmul__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `GradTensor mul(double&)` | `__mul__(float)` | ✅ | ✅ | ✅ | ✅ | - | | `__rmul__(float)` | ✅ | 🪧 | ✅ | | - | `GradTensor matmul(GradTensor&)` | `__matmul__(GradTensor)` | ✅ | ✅ | ✅ | ✅ | - - ## Aten Tensor - - | C++ Method | PyBind Method | Status | C++ Tests | Python Tests | Stubs | - |-------------------------------------------------------------------------|-----------------------------------------------|--------|-----------|--------------|--------| - | `std::string type() const` | `type()` | ✅ | ✅ | ✅ | ✅ | - | `std::string dtype() const` | `dtype()` | ✅ | ✅ | ✅ | ✅ | - | `bool operator==(Tensor&)` | `__eq__()` | ✅ | ✅ | ✅ | 🪧 | - | `bool 
operator!=(Tensor&)` | `__ne__()` | ✅ | ✅ | ✅ | 🪧 | - | `double at(const std::vector&) const` | `__getitem__()` | ✅ | ✅ | ✅ | ✅ | - | `double at(const std::vector&)` | `__setitem__()` | ✅ | ✅ | ✅ | ✅ | - | `std::unique_ptr slice(const std::vector&) const` | `__getitem__()` | ✅ | ✅ | ✅ | ✅ | - | `BaseTensor::operator std::string() const` | `__str__()` | ✅ | ✅ | ✅ | ✅ | - | `BaseTensor::operator std::string() const` | `__repr__()` | ✅ | ✅ | ✅ | ✅ | - | `Tensor(std::vector, std::vector)` | `Tensor(List[float], List[int])` | ✅ | ✅ | ✅ | ✅ | - | `Tensor(std::vector)` | `Tensor(List[float])` | ✅ | ✅ | ✅ | ✅ | - | `Tensor(std::vector>)` | `Tensor(List[List[float]])` | ✅ | ✅ | ✅ | ✅ | - | `Tensor(std::vector>>)` | `Tensor(List[List[List[float]]])` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor arange(int, int, int)` | `Tensor.arange(int, int, int)` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor linspace(double, double, int)` | `Tensor.linspace(float, float, int)` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor gaussian(std::vector, double, double)` | `Tensor.gaussian(List[int], float, float)` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor uniform(std::vector, double, double)` | `Tensor.uniform(List[int], int, int)` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor ones(std::vector)` | `Tensor.ones(List[int])` | ✅ | ✅ | ✅ | ✅ | - | `static Tensor zeros(std::vector)` | `Tensor.zeros(List[int])` | ✅ | ✅ | ✅ | ✅ | - | `void build_topo(Tensor* v, std::set&, std::vector&)` | 🪧 | ✅ | ❌ | 🪧 | 🪧 | - | `prev_` | `prev` | ✅ | | | | - | `std::vector backprop(bool)` | `backprop(bool)` | ✅ | ✅ | ✅ | ✅ | - | `Tensor& reshape(std::vector)` | `reshape(List[int])` | ✅ | ✅ | ✅ | ✅ | - | `Tensor copy() const` | `copy()` | ✅ | ❌ | ✅ | ✅ | - | `Tensor neg()` | `__neg__()` | ✅ | 🪧 | ✅ | ✅ | - | `Tensor add(Tensor&)` | `__add__(Tensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__radd__(Tensor)` | ✅ | 🪧 | ✅ | | - | `Tensor add(GradTensor&)` | `__add__(GradTensor)` | ✅ | ❌ | ✅ | ✅ | - | | `__radd__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `Tensor add(double&)` | 
`__add__(float)` | ✅ | ❌ | ✅ | ✅ | - | | `__radd__(float)` | ✅ | 🪧 | ✅ | | - | `Tensor sub(Tensor&)` | `__sub__(Tensor)` | ✅ | ✅ | ✅ | ✅ | - | | `__rsub__(Tensor)` | ✅ | 🪧 | ✅ | | - | `Tensor sub(GradTensor&)` | `__sub__(GradTensor)` | ✅ | ❌ | ✅ | ✅ | - | | `__rsub__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `Tensor sub(double&)` | `__sub__(float)` | ✅ | ❌ | ✅ | ✅ | - | | `__rsub__(float)` | ✅ | 🪧 | ✅ | | - | `Tensor mul(Tensor&)` | `__mul__(Tensor)` | ✅ | ❌ | ✅ | ✅ | - | | `__rmul__(Tensor)` | ✅ | 🪧 | ✅ | | - | `Tensor mul(GradTensor&)` | `__mul__(GradTensor)` | ✅ | ❌ | ✅ | ✅ | - | | `__rmul__(GradTensor)` | ✅ | 🪧 | ✅ | | - | `Tensor mul(double&)` | `__mul__(float)` | ✅ | ❌ | ✅ | ✅ | - | | `__rmul__(float)` | ✅ | 🪧 | ✅ | | - | `Tensor exp(double&)` | `__pow__(float)` | ❌ | ❌ | ❌ | ❌ | - | `Tensor exp(double&)` | `exp(float)` | ❌ | ❌ | ❌ | ❌ | - | `Tensor log(double&)` | `log(float)` | ❌ | ❌ | ❌ | ❌ | - | `Tensor matmul(Tensor&)` | `matmul(Tensor)` | ✅ | ❌ | ❌ | ✅ | - | `Tensor matmul(Tensor&)` | `__matmul__(Tensor)` | ✅ | ❌ | ✅ | ✅ | - | `Tensor tranpose(const std::vector&) const` | `transpose(List[int])` | ✅ | ❌ | ✅ | ✅ | - | `Tensor concat(Tensor&, size_t)` | `concat(Tensor)` | ❌ | ❌ | ❌ | ❌ | - | `Tensor sin()` | `sin()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor cos()` | `cos()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor tan()` | `tan()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor arcsin()` | `arcsin()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor arccos()` | `arccos()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor arctan()` | `arctan()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor relu()` | `relu()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor sigmoid()` | `sigmoid()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor leaky_relu()` | `leaky_relu()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor sum()` | `sum()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor mean()` | `mean()` | ❌ | ❌ | ❌ | ❌ | - | `Tensor norm()` | `norm()` | ❌ | ❌ | ❌ | ❌ | - - ## Models - diff --git a/docs/structure.md b/docs/structure.md deleted file mode 100644 index 597f96c5..00000000 --- a/docs/structure.md +++ /dev/null @@ -1,38 +0,0 @@ -# 
Repository
-
-I've thought for a few weeks about how to structure this whole library, taking inspiration from the PyTorch and tinygrad repositories. At a high level, the actual package is in `pyember/ember`, which uses functions bound via pybind11 from `pyember/aten` for fast computations. Very briefly,
-
-1. `aten/` contains the header and source files for the C++ low-level tensor library, such as basic operations and an autograd engine.
-    1. `aten/src` contains all the source files and definitions.
-    2. `aten/bindings` contains the pybindings.
-    3. `aten/test` contains all the C++ testing modules for aten.
-2. `ember/` contains the actual library, supporting high-level models, objectives, optimizers, dataloaders, and samplers.
-    1. `ember/aten` contains the stub files.
-    2. `ember/datasets` contains all preprocessing tools, such as datasets/loaders, standardization, and cross-validation checks.
-    3. `ember/models` contains all machine learning models.
-    4. `ember/objectives` contains all loss functions and regularizers.
-    5. `ember/optimizers` contains all the optimizers/solvers, such as iterative (e.g. SGD), greedy (e.g. decision tree splitting), and one-shot (e.g. least-squares solution).
-    6. `ember/samplers` contains all samplers (e.g. MCMC, SGLD).
-3. `docs/` contains detailed documentation about each function.
-4. `examples/` contains example Python scripts for training models.
-5. `tests/` contains the Python testing modules for the `ember` library.
-6. `docker/` contains Docker images of all the operating systems and architectures I tested ember on. General workflows for setting up the environment on supported machines can be found there.
-7. `setup.py` allows you to pip install this as a package.
-8. `run_tests.sh` is the main test-running script.
-
-The sections below go into more detail.
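The layout above can also be sanity-checked mechanically. Below is a small stdlib-only sketch (the `EXPECTED` list and the function name are my own illustration, not part of the repo) that reports which of the documented paths are missing from a clone:

```python
from pathlib import Path

# Paths taken from the repository layout described above (illustrative list).
EXPECTED = [
    "aten/src", "aten/bindings", "aten/test",
    "ember/aten", "ember/datasets", "ember/models",
    "ember/objectives", "ember/optimizers", "ember/samplers",
    "docs", "examples", "tests",
    "setup.py", "run_tests.sh",
]

def missing_paths(root):
    """Return the documented paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]
```

Running `missing_paths(".")` from the repository root should return an empty list on a healthy clone.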
-
-
-## ATen
-
-Aten, short for "a tensor" library (a name borrowed from PyTorch), is a C++ library that provides low-level functionality for Tensors. This includes the basic vector and matrix operations like addition, scalar/matrix multiplication, dot products, and transposes, which are used everywhere in model training and inference and must be fast.
-
-### Compiling and PyBinding
-
-Let's look at `aten/CMakeLists.txt` and `aten/bindings/CMakeLists.txt`.
-
-- `aten/CMakeLists.txt` contains the instructions to generate a Makefile for compiling and linking the `aten` library. It has an optional argument `BUILD_PYTHON_BINDINGS` which, when set to `ON`, generates the `.so` file through `aten/bindings/CMakeLists.txt`. The executables compiled from `aten/main.cpp` are placed at `aten/build/main`. The same goes for the test files, which are compiled to `aten/build/tests`.
-
-- `aten/bindings/CMakeLists.txt` contains the instructions to generate the `.so` file and save it to `pyember/ember/_C.cpython-312-darwin.so`. It must be contained within the Python package directory, since `ember` cannot access libraries outside of its base directory.
diff --git a/docs/tensors.md b/docs/tensors.md
deleted file mode 100644
index 050ecb91..00000000
--- a/docs/tensors.md
+++ /dev/null
@@ -1,234 +0,0 @@
-Tensors are $N$-dimensional arrays that are used to represent tabular data or model parameters. All tensor classes derive from the `BaseTensor` abstract class, which supports the minimal functionality that all tensors should have.
-
-The hierarchy is
-
-```
-BaseTensor
-  Tensor
-    ScalarTensor (TBI)
-    DenseTensor (TBI)
-    SparseTensor (TBI)
-  GradTensor
-```
-
-All the attributes and methods that are supported by all classes can be found in `aten/src/Tensor.h`.
-
-# BaseTensor
-
-### Attributes
-
-`std::vector<double> storage_`
-- A contiguous vector of doubles that stores the state of the tensor.
-
-`std::vector<size_t> shape_`
-- The shape of the tensor, where the product of the shape elements should match the length of `storage_`.
-
-### Methods
-
-`virtual std::string type() const { return "BaseTensor"; }`
-- outputs the string representing the instance of the class.
-- It is a virtual function, which is overridden by each subclass. The `const` indicates that it doesn't modify the object.
-
-`virtual std::string dtype() const { return "double"; }`
-- outputs the type of the elements in the tensor
-- not sure if this needs to be virtual
-
-`virtual ~BaseTensor() = default;`
-- a destructor. Not sure if this is needed
-
-`const std::vector<size_t>& shape() const { return shape_; }`
-- getter function for the shape
-
-`const std::vector<double>& data() const { return storage_; }`
-- getter function for the data, or storage
-
-`BaseTensor& reshape(std::vector<size_t> new_shape);`
-- simply reshapes by changing the `shape_` attribute and does nothing else.
-
-`virtual bool operator==(BaseTensor& other) const;`
-- equality operator that minimally checks the attributes `storage_` and `shape_`.
-- It is a virtual function since it must be overridden by `GradTensor`s, which have an additional `pivot` attribute.
-
-`virtual bool operator!=(BaseTensor& other) const;`
-- Just the negation of the equality operator (see above).
-
-`operator std::string() const;`
-- Returns a string so we can actually print tensors. Prints the type, plus the `storage_` and `shape_`, so that we can see the array structure.
-
-`virtual double at(const std::vector<size_t>& indices) const;`
-- Similar to `__getitem__`, where you return a copy of an element by its index.
-
-`virtual double& at(const std::vector<size_t>& indices);`
-- Similar to `__setitem__`, where you return a reference to the element for modification.
-
-`virtual std::unique_ptr<BaseTensor> slice(const std::vector<Slice>& slices) const;`
-- Used to get a slice of a `Tensor` with the strides stored in the `BaseTensor::Slice` struct.
-- Returns a copy, not a reference/view of the Tensor!
-
-```
-struct Slice {
-  size_t start;
-  size_t stop;
-  size_t step;
-
-  Slice(size_t start_ = 0,
-        size_t stop_ = std::numeric_limits<size_t>::max(),
-        size_t step_ = 1)
-    : start(start_), stop(stop_), step(step_) {}
-};
-```
-
-### Notes
-
-- I'm not sure whether to include strides as PyTorch does, since this information is implicit in the `shape`. Strides would certainly make viewing easier, but would require a lot of modification in the `std::string` function of `BaseTensor` to use the strides when pushing into a stringstream.
-
-- I've tried virtualizing the transpose function in `BaseTensor`, but I wanted it to return a reference for `GradTensor` and a copy for `Tensor`, so I made two separate implementations in the two subclasses.
-
-
-## GradTensors
-
-Gradient Tensors, or `GradTensor`s, are tensors that store the total derivative of an elementary operation (not precisely the gradient, but in $\mathbb{R}^n$ one can be transposed to get the other). These operations can have 1 or more arguments. It is represented as an $N$-tensor of size
-
-$$
-  (D_1, D_2, \ldots, D_N)
-$$
-
-Consider the gradient of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$, which is an $m \times n$ matrix. However, if we have another function $g: \mathbb{R}^{m \times n} \rightarrow \mathbb{R}$, then its gradient is also a matrix of shape $m \times n$. Clearly there is some ambiguity here, so we must store another attribute, which I call the *pivot dimension*, to capture this information. Say that we have $f: \mathbb{R}^{\mathbf{n}} \rightarrow \mathbb{R}^{\mathbf{m}}$, where the superscripts are now vectors of length $d_n, d_m$. Then the total derivative has shape
-$$
-  \mathbf{m} \times \mathbf{n}
-$$
-with the pivot being $d_m + 1$, the first dimension index of the input.
-
-Essentially, we are approaching matrix multiplication in a more general way by [contracting tensors](https://en.wikipedia.org/wiki/Tensor_contraction). 
Note that by including this pivot parameter, we can support both batching and higher-dimensional multiplication.
-
-### Attributes
-
-`std::vector<double> storage_`
-- A contiguous vector of doubles that stores the state of the tensor.
-
-`std::vector<size_t> shape_`
-- The shape of the tensor, where the product of the shape elements should match the length of `storage_`.
-
-`size_t pivot_`
-- The pivot index that marks the first dimension of the input within `shape_`.
-
-### Constructors
-
-`GradTensor();`
-- Default constructor that is called when initializing a tensor without any gradients. Stores empty vectors and `pivot_ = 0`.
-
-`GradTensor(std::vector<double> data, std::vector<size_t> shape, size_t pivot);`
-- Full constructor that sets all attributes.
-
-`GradTensor(std::vector<size_t> shape, size_t pivot);`
-- Initializes a GradTensor of shape `shape` with the given pivot and all entries $0$.
-
-`static GradTensor eye(size_t n, size_t pivot = 1);`
-- Creates an identity matrix gradient, which is a good default initialization when calling `backprop()` on a tensor.
-
-
-### Methods
-
-`std::string type() const override { return "GradTensor"; }`
-- overrides `type`
-
-`size_t pivot() const { return pivot_; }`
-- getter method for the pivot index
-
-`bool operator==(GradTensor& other) const;`
-- equality operator that also checks equality of the pivot
-
-`bool operator!=(GradTensor& other) const;`
-- negation of the equality operator.
-
-`GradTensor copy() const;`
-- returns a new GradTensor copy
-
-`GradTensor add(GradTensor& other);`
-- For adding gradients of the same shape and pivot.
-
-`Tensor add(Tensor& other);`
-- For adding gradients to parameters when updating them.
-
-`GradTensor sub(GradTensor& other);`
-- For subtracting gradients of the same shape and pivot.
-
-`Tensor sub(Tensor& other);`
-- For subtracting gradients from parameters when updating them.
-
-`GradTensor mul(GradTensor& other);`
-- Elementwise multiplication; rarely meaningful for gradients, but included for completeness.
-`Tensor mul(Tensor& other);`
-- Elementwise multiplication; rarely meaningful for gradients, but included for completeness.
-`GradTensor matmul(GradTensor& other);`
-- Right matrix multiplication or tensor contraction, used for the chain rule.
-
-`GradTensor& transpose(const std::vector<size_t>& axes = {});`
-- Modifies the gradient tensor in place and returns itself.
-- It doesn't return a copy like in `Tensor`, since I don't think it makes sense to reuse the old one. If you must, you can just copy it and then transpose it.
-
-### Notes
-
-- The most natural operations on gradients/Jacobians are addition, subtraction, and multiplication (composition). Operations like the dot product don't make sense here, so I did not implement them on purpose; elementwise multiplication is included only for convenience.
-
-- Might need to check whether addition verifies that the pivots are the same.
-
-## Tensor
-
-Regular tensors, or `Tensor`s, store either tabular data or the state of a parameter.
-
-### Attributes
-
-`std::vector<double> storage_`
-- A contiguous vector of doubles that stores the state of the tensor.
-
-`std::vector<size_t> shape_`
-- The shape of the tensor, where the product of the shape elements should match the length of `storage_`.
-
-`GradTensor grad = GradTensor();`
-- the Jacobian (if being precise, rather than the gradient) of some tensor further down the computation graph with respect to this tensor.
-
-`std::vector<Tensor*> prev = std::vector<Tensor*>();`
-- previous nodes used to compute this tensor, if any
-
-`std::function backward;`
-- function for filling in the gradients of this tensor
-
-### Constructors
-
-`Tensor(std::vector<double> data, std::vector<size_t> shape);`
-- The full constructor, which sets the storage and shape, whilst setting the gradients to null.
-
-`Tensor(std::vector<double> data);`
-- Constructor for 1D arrays.
-
-`Tensor(std::vector<std::vector<double>> data);`
-- Constructor for 2D arrays.
-
-`Tensor(std::vector<std::vector<std::vector<double>>> data);`
-- Constructor for 3D arrays.
-
-`static Tensor arange(int start, int stop, int step = 1);`
-- Arange constructor, returning a 1D array.
-
-`static Tensor linspace(double start, double stop, int numsteps);`
-- Linspace constructor (like in NumPy), returning a 1D array.
-
-`static Tensor gaussian(std::vector<size_t> shape, double mean = 0.0, double stddev = 1.0);`
-- Returns a Tensor of shape `shape` of Gaussian random variables.
-
-`static Tensor uniform(std::vector<size_t> shape, double min = 0.0, double max = 1.0);`
-- Returns a Tensor of shape `shape` of uniform random variables.
-
-`static Tensor ones(std::vector<size_t> shape);`
-- Returns a Tensor of shape `shape` of all $1$s.
-
-`static Tensor zeros(std::vector<size_t> shape);`
-- Returns a Tensor of shape `shape` of all $0$s.
-
-
-### Methods
-
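To make the pivot-based contraction from the `GradTensor` section concrete, here is a pure-Python sketch (my own illustration, not the actual `aten` implementation; all names are hypothetical) that contracts two flattened, row-major gradients along the dimensions after the first pivot and before the second. Ordinary matrix multiplication is the special case where both pivots are 1:

```python
from itertools import product
from math import prod

def flat(idx, shape):
    # Row-major (C-order) flattening of a multi-index.
    f = 0
    for i, d in zip(idx, shape):
        f = f * d + i
    return f

def contract(a, a_shape, a_pivot, b, b_shape, b_pivot):
    """Contract a's dims after its pivot with b's dims before its pivot."""
    out_dims = tuple(a_shape[:a_pivot])   # output dims of the outer map
    mid_dims = tuple(a_shape[a_pivot:])   # dims summed over (chain rule)
    in_dims = tuple(b_shape[b_pivot:])    # input dims of the inner map
    assert mid_dims == tuple(b_shape[:b_pivot]), "contracted dims must match"
    res_shape = out_dims + in_dims
    res = [0.0] * prod(res_shape)
    for o in product(*(range(d) for d in out_dims)):
        for i in product(*(range(d) for d in in_dims)):
            s = 0.0
            for m in product(*(range(d) for d in mid_dims)):
                s += a[flat(o + m, a_shape)] * b[flat(m + i, b_shape)]
            res[flat(o + i, res_shape)] = s
    return res, res_shape
```

For example, `contract([1, 2, 3, 4], (2, 2), 1, [5, 6, 7, 8], (2, 2), 1)` reproduces the usual 2×2 matrix product, and contracting with an identity gradient (as `GradTensor::eye` would produce) leaves the other operand unchanged.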