From 7280c0fa500fc5cd6518d1867fe7290d9b52e1b0 Mon Sep 17 00:00:00 2001
From: Riley Murray
Date: Tue, 10 Sep 2024 10:22:11 -0400
Subject: [PATCH] New memory management policies, new SketchingOperator concept, redesign of DenseDist and SparseDist, lots of docs (#108)

This branch was originally created to expand source and web documentation. However, as I tried to write those docs, I found myself questioning many previous design decisions. (It turns out this is a trend.) Anyway, the PR is now gigantic in scope. I've tried to break down the changes into categories.

### Changes to source documentation

* Introduced macros for opaque matrices and vectors in rst-compatible source code comments (``\mtxA,\mtxB,\mtxX,\mtxS,\mtxx,\mtxy``). Right now those macros render their arguments in bold, but we could easily switch to a different style.
* Removed most bracket-based indexing notation like ``\mat(A)[i, j]`` in favor of mathematically typeset subscripts like ``\mat(A)_{ij}``.
* Removed (the vast majority of) rst macro directive definitions from source code comments.
* Complete rewrites of DenseDist, DenseSkOp, SparseDist, and SparseSkOp docs.
* Complete rewrites of COOMatrix, CSRMatrix, and CSCMatrix source docs to match the style of DenseSkOp and SparseSkOp.
* Added source docs for the Axis enum (renamed from MajorAxis).
* Added source docs for the SketchingOperator concept.

### New features and identifiers

* Changed the definition of LASOs so they actually implement LESS-Uniform (with Rademachers as the scalar sub-gaussian distribution).
* Made the SketchingDistribution and SketchingOperator concepts apply meaningful constraints.
* Extended sample_indices_iid_uniform to support generating Rademachers in addition to the indices themselves, much like how repeated_fisher_yates works. Exposed an overload that matches the original signature.
* Added a fill_sparse overload that's analogous to the many-argument fill_dense overload.
* De-overloaded the verbose fill_dense and fill_sparse into new functions: fill_dense_unpacked and fill_sparse_unpacked_nosub. The latter has the nosub suffix because it doesn't have all the parameters needed to select a submatrix, and I'll eventually want a fill_sparse_unpacked that qualitatively matches fill_dense_unpacked.

### API changes: types

Adopted a consistent policy on the semantics of memory ownership. This included making sure all sketching operator and sparse matrix classes have non-const own_memory members.

Added an isometry_scale member to SketchingDistribution, DenseDist, and SparseDist.

Added dim_major and dim_minor members to DenseDist and SparseDist. Deliberately keeping these out of the SketchingDistribution concept.

RNGState

* Defined a DefaultRNG type, equal to r123::Philox4x32. RNGState's default RNG template parameter is now RNG = DefaultRNG.
* Added a static_assert that the counter length for RNGs is >= 2. This is needed for Box-Muller transforms and in many other places.

SparseDist

* Changed initialization order.
* Added a full_nnz member.
* Added a constructor for SparseDist, and replaced many instances of defining SparseDists with initializer lists. The constructor has vec_nnz=4 and major_axis=Short by default.

SparseSkOp

* Added an nnz member.
* Removed the known_filled member from SparseSkOp; the old check "!S.known_filled" is equivalent to "S.nnz <= 0".

SparseMatrix classes

* Removed the reserve instance-methods.
* Added reserve_xyz free-functions to replace the reserve instance-methods.
* Removed some (mostly) redundant constructors.

Important changes that don't affect the public API:

* Renamed the MajorAxis enum to just "Axis".
* Removed Axis::Undefined and ScalarDist::BlackBox.
* Made a (private!) helper BLASFriendlyOperator datatype, so that I don't have to coerce DenseSkOp into supporting submatrices directly.

### API changes: free functions

* LSKGES and RSKGES no longer call ``fill_sparse`` on a user-provided unfilled SparseSkOp.
Instead, we make a temporary shallow copy, fill that copy, and delete it before returning. This is in preparation for when LSKGES/RSKGES only generate the submatrices that are needed for the operation at hand, and it follows the same memory-allocation/ownership semantics as dense sketching ([L/R]SK[GE/SP]3).
* Templated sample_indices_iid and sample_indices_iid_uniform to allow general signed integers, rather than only int64_t.
* Removed the ability of spmm and sketch_sparse to operate on submatrices of sparse data matrices. This basic functionality is still preserved by ``left_spmm`` and ``right_spmm`` in the former case and by ``lsksp3`` and ``rsksp3`` in the latter case.
* Removed the "val" argument from the ``overwrite_triangle`` utility function. I was only using this function with val=0.0.
* Removed has_fixed_nnz_per_col(S), since it wasn't used.
* Moved several utility functions into the main ``RandBLAS`` namespace.
* Significantly modernized ``print_colmaj``.

### Templating conventions

* Changed templating in ``sketch_sparse`` to avoid nested templating. By nested templating I mean something like ``template <typename T, typename RNG>`` and having an argument of type ``DenseSkOp<T, RNG>``, while more "flat" templating would use ``template <typename T, typename DenseSkOp>`` and then have arguments of type ``T`` and ``DenseSkOp``.
* Changed the templating of functions that only accepted sparse sketching operators but were templated as ``SKOP``. They now template as ``SparseSkOp`` (which in a technical sense makes no difference, but does make clear that it's supposed to be a ``SparseSkOp`` for some parameters ``T`` and ``RNG``).
* Started templating RNGState in function calls as ``typename state_t = RNGState``.

### Web docs

* Actually wrote the tutorial on selecting parameters for DenseDist and SparseDist!
* Added a "FAQ and Limitations" page to the web docs. Includes discussion of C++ idioms we do/don't use.

### Other

Added a utility script for converting my awkward reStructuredText source code docs of function arguments into doxygen format.
I thought the doxygen format might be viable since I tweaked the CSS, but it wasn't. I'm keeping the script in case I want to revisit this later. Increased the size of allowed error messages in ``randblas_error_if_msg``. Added tests for LSKSP3 and RSKSP3. --- .cloc_exlude | 11 + .gitignore | 1 + RandBLAS/base.hh | 421 +++++++---- RandBLAS/dense_skops.hh | 555 ++++++++------ RandBLAS/exceptions.hh | 6 +- RandBLAS/skge.hh | 373 +++++---- RandBLAS/sksy.hh | 91 ++- RandBLAS/skve.hh | 52 +- RandBLAS/sparse_data/base.hh | 80 +- RandBLAS/sparse_data/conversions.hh | 52 +- RandBLAS/sparse_data/coo_matrix.hh | 270 ++++--- RandBLAS/sparse_data/coo_spmm_impl.hh | 7 +- RandBLAS/sparse_data/csc_matrix.hh | 243 +++--- RandBLAS/sparse_data/csc_spmm_impl.hh | 10 +- RandBLAS/sparse_data/csr_matrix.hh | 250 +++--- RandBLAS/sparse_data/csr_spmm_impl.hh | 8 +- RandBLAS/sparse_data/sksp.hh | 333 ++++---- RandBLAS/sparse_data/spmm_dispatch.hh | 83 +- RandBLAS/sparse_skops.hh | 714 +++++++++++------- RandBLAS/util.hh | 384 ++++++---- docstring_transformers.py | 119 +++ .../qrcp_matrixmarket.cc | 9 +- .../svd_matrixmarket.cc | 3 +- .../svd_rank1_plus_noise.cc | 21 +- .../total-least-squares/tls_sparse_skop.cc | 12 +- rtd/DevNotes.md | 6 +- rtd/source/Doxyfile | 4 +- rtd/source/FAQ.rst | 181 +++++ rtd/source/api_reference/index.rst | 15 +- .../api_reference/index_sampling_utils.rst | 13 - rtd/source/api_reference/other_sparse.rst | 21 - rtd/source/api_reference/sketch_dense.rst | 71 +- rtd/source/api_reference/sketch_sparse.rst | 104 ++- rtd/source/api_reference/skops_and_dists.rst | 157 ++-- rtd/source/api_reference/sparse_matrices.rst | 32 - rtd/source/api_reference/utilities.rst | 36 + .../sparse_vs_dense_diagram_no_header.html | 2 - rtd/source/conf.py | 49 +- rtd/source/index.rst | 3 +- rtd/source/installation/index.rst | 9 +- rtd/source/tutorial/_incomplete_sketching.rst | 19 - rtd/source/tutorial/distributions.rst | 198 ++++- rtd/source/tutorial/gemm.rst | 13 +- 
rtd/source/tutorial/index.rst | 45 +- rtd/source/tutorial/memory.rst | 77 ++ rtd/source/tutorial/sampling_skops.rst | 3 +- rtd/source/tutorial/sketch_updates.rst | 278 +++++++ rtd/source/tutorial/submatrices.rst | 4 +- rtd/source/tutorial/temp.rst | 68 -- rtd/source/tutorial/updates.rst | 128 ---- rtd/source/updates/index.rst | 2 +- rtd/themes/randblas_rtd/static/custom.js | 16 + .../randblas_rtd/static/theme_overrides.css | 62 +- test/CMakeLists.txt | 2 + test/DevNotes.md | 4 +- test/handrolled_lapack.hh | 10 +- test/test_basic_rng/benchmark_speed.cc | 2 +- test/test_basic_rng/test_continuous.cc | 53 +- test/test_basic_rng/test_discrete.cc | 22 +- test/test_basic_rng/test_distortion.cc | 12 +- test/test_datastructures/test_denseskop.cc | 76 +- test/test_datastructures/test_sparseskop.cc | 167 ++-- .../test_datastructures/test_spmats/common.hh | 6 +- .../test_spmats/test_coo.cc | 101 +-- .../test_spmats/test_csc.cc | 2 +- .../test_spmats/test_csr.cc | 6 +- test/test_handrolled_lapack.cc | 24 +- test/test_matmul_cores/linop_common.hh | 27 +- test/test_matmul_cores/test_lskges.cc | 109 ++- test/test_matmul_cores/test_rskge3.cc | 2 +- test/test_matmul_cores/test_rskges.cc | 67 +- .../test_spmm/spmm_test_helpers.hh | 4 +- test/test_matmul_wrappers/DevNotes.md | 19 +- .../test_sketch_sparse.cc | 558 ++++++++++++++ .../test_sketch_symmetric.cc | 272 +++---- .../test_sketch_vector.cc | 16 +- 76 files changed, 4573 insertions(+), 2712 deletions(-) create mode 100644 .cloc_exlude create mode 100644 docstring_transformers.py create mode 100644 rtd/source/FAQ.rst delete mode 100644 rtd/source/api_reference/index_sampling_utils.rst delete mode 100644 rtd/source/api_reference/other_sparse.rst delete mode 100644 rtd/source/api_reference/sparse_matrices.rst create mode 100644 rtd/source/api_reference/utilities.rst delete mode 100644 rtd/source/assets/sparse_vs_dense_diagram_no_header.html delete mode 100644 rtd/source/tutorial/_incomplete_sketching.rst create mode 100644 
rtd/source/tutorial/memory.rst create mode 100644 rtd/source/tutorial/sketch_updates.rst delete mode 100644 rtd/source/tutorial/temp.rst delete mode 100644 rtd/source/tutorial/updates.rst create mode 100644 rtd/themes/randblas_rtd/static/custom.js diff --git a/.cloc_exlude b/.cloc_exlude new file mode 100644 index 00000000..32093b8d --- /dev/null +++ b/.cloc_exlude @@ -0,0 +1,11 @@ +bin/ +CMake/ +RandBLAS/CMakeFiles/ +examples/build/ +examples/sparse-data-matrices/ +rtd/build/ +rtd/sphinxext/ +test/lib/googletest/ +.github +.gitignore +.gitmodules diff --git a/.gitignore b/.gitignore index 2138c923..2ced0696 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,5 @@ **/__pycache__/* +*_ignore* # vim *.sw* diff --git a/RandBLAS/base.hh b/RandBLAS/base.hh index 9e37b57f..41ba86f0 100644 --- a/RandBLAS/base.hh +++ b/RandBLAS/base.hh @@ -53,12 +53,125 @@ /// code common across the project namespace RandBLAS { +typedef r123::Philox4x32 DefaultRNG; + +/** A representation of the state of a counter-based random number generator + * (CBRNG) defined in Random123. The representation consists of two arrays: + * the counter and the key. The arrays' types are statically sized, small + * (typically of length 2 or 4), and can be distinct from one another. + * + * The template parameter RNG is a CBRNG type defined in Random123. We've found + * that Philox-based CBRNGs work best for our purposes, but we also support Threefry-based CBRNGs. + */ +template <typename RNG = DefaultRNG> +struct RNGState { + + /// ------------------------------------------------------------------- + /// Type of the underlying Random123 CBRNG. Must be based on + /// Philox or Threefry. We've found that Philox works best for our + /// purposes, and we default to Philox4x32. + using generator = RNG; + + using ctr_type = typename RNG::ctr_type; + // ^ An array type defined in Random123. + using key_type = typename RNG::key_type; + // ^ An array type defined in Random123.
+ using ctr_uint = typename RNG::ctr_type::value_type; + // ^ The unsigned integer type used in this RNGState's counter array. + + /// ------------------------------------------------------------------- + /// The unsigned integer type used in this RNGState's key array. + /// This is typically std::uint32_t, but it can be std::uint64_t. + using key_uint = typename RNG::key_type::value_type; + + const static int len_c = RNG::ctr_type::static_size; + static_assert(len_c >= 2); + const static int len_k = RNG::key_type::static_size; + + typename RNG::ctr_type counter; + // ^ This RNGState's counter array. Exclude from doxygen comments + // since end-users shouldn't touch it. + + /// ------------------------------------------------------------------ + /// This RNGState's key array. If you want to manually advance the key + /// by an integer increment of size "step," then you do so by calling + /// this->key.incr(step). + typename RNG::key_type key; + + + /// Initialize the counter and key arrays to all zeros. + RNGState() : counter{{0}}, key(key_type{{}}) {} + + // construct from a key + RNGState(key_type const &k) : counter{{0}}, key(k) {} + + // Initialize counter and key arrays at the given values. + RNGState(ctr_type const &c, key_type const &k) : counter(c), key(k) {} + + // move construct from an initial counter and key + RNGState(ctr_type &&c, key_type &&k) : counter(std::move(c)), key(std::move(k)) {} + + /// Initialize the counter array to all zeros. Initialize the key array to have first + /// element equal to k and all other elements equal to zero. + RNGState(key_uint k) : counter{{0}}, key{{k}} {} + + ~RNGState() {}; + + /// A copy constructor. + RNGState(const RNGState &s) : RNGState(s.counter, s.key) {}; + + // A copy-assignment operator. 
+ RNGState &operator=(const RNGState &s) { + std::memcpy(this->counter.v, s.counter.v, this->len_c * sizeof(ctr_uint)); + std::memcpy(this->key.v, s.key.v, this->len_k * sizeof(key_uint)); + return *this; + }; + +}; + + +// template <typename RNG> +// RNGState<RNG>::RNGState( +// const RNGState<RNG> &s +// ) { +// std::memcpy(this->counter.v, s.counter.v, this->len_c * sizeof(ctr_uint)); +// std::memcpy(this->key.v, s.key.v, this->len_k * sizeof(key_uint)); +// } + +// template <typename RNG> +// RNGState<RNG> &RNGState<RNG>::operator=( +// const RNGState<RNG> &s +// ) { +// std::memcpy(this->counter.v, s.counter.v, this->len_c * sizeof(ctr_uint)); +// std::memcpy(this->key.v, s.key.v, this->len_k * sizeof(key_uint)); +// return *this; +// } + +template <typename RNG> +std::ostream &operator<<( + std::ostream &out, + const RNGState<RNG> &s +) { + int i; + out << "counter : {"; + for (i = 0; i < s.len_c - 1; ++i) { + out << s.counter[i] << ", "; + } + out << s.counter[i] << "}\n"; + out << "key : {"; + for (i = 0; i < s.len_k - 1; ++i) { + out << s.key[i] << ", "; + } + out << s.key[i] << "}"; + return out; +} + /** * Stores stride information for a matrix represented as a buffer. * The intended semantics for a buffer "A" and the conceptualized * matrix "mat(A)" are * - * mat(A)[i, j] == A[i * inter_row_stride + j * inter_col_stride]. + * mat(A)_{ij} == A[i * inter_row_stride + j * inter_col_stride]. * * for all (i, j) within the bounds of mat(A). */ @@ -114,8 +227,12 @@ inline submat_spec_64t offset_and_ldim( } +#ifdef __cpp_concepts template <typename T> concept SignedInteger = (std::numeric_limits<T>::is_signed && std::numeric_limits<T>::is_integer); +#else +#define SignedInteger typename +#endif template <SignedInteger TO, SignedInteger TI> @@ -135,160 +252,208 @@ inline TO safe_int_product(TI a, TI b) { } -enum class MajorAxis : char { +// --------------------------------------------------------------------------- +/// Sketching operators are only "useful" for dimension reduction if they're +/// non-square.
+/// +/// The larger dimension of a sketching operator has a different +/// semantic role than the small dimension. This enum provides a way for us +/// to refer to the larger or smaller dimension in a way that's agnostic to +/// whether the sketching operator is wide or tall. +/// +/// For a wide matrix, its *short-axis vectors* are its columns, and its +/// *long-axis vectors* are its rows. +/// +/// For a tall matrix, its short-axis vectors are its rows, and its +/// long-axis vectors are its columns. +/// +enum class Axis : char { // --------------------------------------------------------------------------- - /// short-axis vectors (cols of a wide matrix, rows of a tall matrix) Short = 'S', // --------------------------------------------------------------------------- - /// long-axis vectors (rows of a wide matrix, cols of a tall matrix) - Long = 'L', - - // --------------------------------------------------------------------------- - /// Undefined (used when row-major vs column-major must be explicit) - Undefined = 'U' -}; - - -/** A representation of the state of a counter-based random number generator - * (CBRNG) defined in Random123. The representation consists of two arrays: - * the counter and the key. The arrays' types are statically sized, small - * (typically of length 2 or 4), and can be distinct from one another. - * - * The template parameter RNG is a CBRNG type in defined in Random123. We've found - * that Philox-based CBRNGs work best for our purposes, but we also support Threefry-based CBRNGS. - */ -template -struct RNGState -{ - using generator = RNG; - - using ctr_type = typename RNG::ctr_type; - // ^ An array type defined in Random123. - using key_type = typename RNG::key_type; - // ^ An array type defined in Random123. - using ctr_uint = typename RNG::ctr_type::value_type; - // ^ The unsigned integer type used in this RNGState's counter array. 
- - /// ------------------------------------------------------------------- - /// @brief The unsigned integer type used in this RNGState's key array. - /// This is typically std::uint32_t, but it can be std::uint64_t. - using key_uint = typename RNG::key_type::value_type; - - - const static int len_c = RNG::ctr_type::static_size; - const static int len_k = RNG::key_type::static_size; - typename RNG::ctr_type counter; - // ^ This RNGState's counter array. - - /// ------------------------------------------------------------------ - /// This RNGState's key array. If you want to manually advance the key - /// by an integer increment of size "step," then you do so by calling - /// this->key.incr(step). - typename RNG::key_type key; - - - /// Initialize the counter and key arrays to all zeros. - RNGState() : counter{{0}}, key(key_type{{}}) {} - - // construct from a key - RNGState(key_type const &k) : counter{{0}}, key(k) {} - - // Initialize counter and key arrays at the given values. - RNGState(ctr_type const &c, key_type const &k) : counter(c), key(k) {} - - // move construct from an initial counter and key - RNGState(ctr_type &&c, key_type &&k) : counter(std::move(c)), key(std::move(k)) {} - - /// Initialize the counter array to all zeros. Initialize the key array to have first - /// element equal to k and all other elements equal to zero. - RNGState(key_uint k) : counter{{0}}, key{{k}} {} - - ~RNGState() {}; - - /// A copy constructor. 
- RNGState(const RNGState &s); - - RNGState &operator=(const RNGState &s); - + Long = 'L' }; - -template -RNGState::RNGState( - const RNGState &s -) { - std::memcpy(this->counter.v, s.counter.v, this->len_c * sizeof(ctr_uint)); - std::memcpy(this->key.v, s.key.v, this->len_k * sizeof(key_uint)); -} - -template -RNGState &RNGState::operator=( - const RNGState &s -) { - std::memcpy(this->counter.v, s.counter.v, this->len_c * sizeof(ctr_uint)); - std::memcpy(this->key.v, s.key.v, this->len_k * sizeof(key_uint)); - return *this; -} - -template -std::ostream &operator<<( - std::ostream &out, - const RNGState &s -) { - int i; - out << "counter : {"; - for (i = 0; i < s.len_c - 1; ++i) { - out << s.counter[i] << ", "; - } - out << s.counter[i] << "}\n"; - out << "key : {"; - for (i = 0; i < s.len_k - 1; ++i) { - out << s.key[i] << ", "; +// --------------------------------------------------------------------------- +/// Returns max(n_rows, n_cols) if major_axis == Axis::Long, and returns +/// min(n_rows, n_cols) otherwise. +/// +inline int64_t get_dim_major(Axis major_axis, int64_t n_rows, int64_t n_cols) { + if (major_axis == Axis::Long) { + return std::max(n_rows, n_cols); + } else { + return std::min(n_rows, n_cols); } - out << s.key[i] << "}"; - return out; } + +#ifdef __cpp_concepts // ============================================================================= /// @verbatim embed:rst:leading-slashes /// -/// .. NOTE: \ttt expands to \texttt (its definition is given in an rst file) +/// **Mathematical description** +/// +/// Matrices sampled from sketching distributions in RandBLAS are mean-zero +/// and have covariance matrices that are proportional to the identity. +/// +/// Formally, +/// if :math:`\D` is a distribution over :math:`r \times c` matrices and +/// :math:`\mtxS` is a sample from :math:`\D`, then +/// :math:`\mathbb{E}\mtxS = \mathbf{0}_{r \times c}` and +/// +/// .. 
math:: +/// :nowrap: +/// +/// \begin{gather} +/// \theta^2 \cdot \mathbb{E}\left[ \mtxS^T\mtxS \right]=\mathbf{I}_{c \times c}& \nonumber \\ +/// \,\phi^2 \cdot \mathbb{E}\left[ \mtxS{\mtxS}^T\, \right]=\mathbf{I}_{r \times r}& \nonumber +/// \end{gather} +/// +/// hold for some :math:`\theta > 0` and :math:`\phi > 0`. +/// +/// The *isometry scale* of the distribution +/// is :math:`\alpha := \theta` if :math:`c \geq r` and :math:`\alpha := \phi` otherwise. If you want to +/// sketch in a way that preserves squared norms in expectation, then you should sketch with +/// a scaled sample :math:`\alpha \mtxS` rather than the sample itself. /// -/// Words. Hello! +/// **Programmatic description** /// +/// A variable :math:`\ttt{D}` of a type that conforms to the +/// :math:`\ttt{SketchingDistribution}` concept has the following attributes. +/// +/// .. list-table:: +/// :widths: 25 30 40 +/// :header-rows: 1 +/// +/// * - +/// - type +/// - description +/// * - :math:`\ttt{D.n_rows}` +/// - :math:`\ttt{const int64_t}` +/// - samples from :math:`\ttt{D}` have this many rows +/// * - :math:`\ttt{D.n_cols}` +/// - :math:`\ttt{const int64_t}` +/// - samples from :math:`\ttt{D}` have this many columns +/// * - :math:`\ttt{D.isometry_scale}` +/// - :math:`\ttt{const double}` +/// - See above. +/// +/// Note that the isometry scale is always stored in double precision; this has no bearing +/// on the precision of sketching operators that are sampled from a :math:`\ttt{SketchingDistribution}`. +/// +/// **Notes** +/// +/// RandBLAS has two SketchingDistribution types: DenseDist and SparseDist. +/// These types have members called "major_axis," +/// "dim_major," and "dim_minor." These members have similar semantic roles across +/// the two classes, but their precise meanings differ significantly.
/// @endverbatim template <typename SkDist> concept SketchingDistribution = requires(SkDist D) { - { D.n_rows } -> std::convertible_to; - { D.n_cols } -> std::convertible_to; - { D.major_axis } -> std::convertible_to; + { D.n_rows } -> std::same_as<const int64_t&>; + { D.n_cols } -> std::same_as<const int64_t&>; + { D.isometry_scale } -> std::same_as<const double&>; }; - -// ============================================================================= -/// \fn isometry_scale_factor(SkDist D) -/// @verbatim embed:rst:leading-slashes -/// Words here ... -/// @endverbatim -template -inline T isometry_scale_factor(SkDist D); +#else +#define SketchingDistribution typename +#endif +#ifdef __cpp_concepts // ============================================================================= +/// A type \math{\ttt{SKOP}} that conforms to the SketchingOperator concept +/// has three member types. /// @verbatim embed:rst:leading-slashes /// -/// .. NOTE: \ttt expands to \texttt (its definition is given in an rst file) +/// .. list-table:: +/// :widths: 25 65 +/// :header-rows: 0 +/// +/// * - :math:`\ttt{SKOP::distribution_t}` +/// - A type conforming to the SketchingDistribution concept. +/// * - :math:`\ttt{SKOP::state_t}` +/// - A template instantiation of RNGState. +/// * - :math:`\ttt{SKOP::scalar_t}` +/// - Real scalar type used in matrix representations of :math:`\ttt{SKOP}\text{s}.` +/// +/// And an object :math:`\ttt{S}` of type :math:`\ttt{SKOP}` has the following +/// instance members. +/// +/// .. list-table:: +/// :widths: 20 25 45 +/// :header-rows: 0 +/// +/// * - :math:`\ttt{S.dist}` +/// - :math:`\ttt{const distribution_t}` +/// - Distribution from which this operator is sampled.
+/// * - :math:`\ttt{S.n_rows}` +/// - :math:`\ttt{const int64_t}` +/// - An alias for :math:`\ttt{S.dist.n_rows}.` +/// * - :math:`\ttt{S.n_cols}` +/// - :math:`\ttt{const int64_t}` +/// - An alias for :math:`\ttt{S.dist.n_cols}.` +/// * - :math:`\ttt{S.seed_state}` +/// - :math:`\ttt{const state_t}` +/// - RNGState used to construct +/// an explicit representation of :math:`\ttt{S}`. +/// * - :math:`\ttt{S.next_state}` +/// - :math:`\ttt{const state_t}` +/// - An RNGState that can be used in a call to a random sampling routine +/// whose output should be statistically independent from :math:`\ttt{S}.` +/// * - :math:`\ttt{S.own_memory}` +/// - :math:`\ttt{bool}` +/// - A flag used to indicate whether internal functions +/// have permission to attach memory to :math:`\ttt{S},` +/// *and* whether the destructor of :math:`\ttt{S}` has the +/// responsibility to delete any memory that's attached to +/// :math:`\ttt{S}.` /// -/// Words. Hello! +/// +/// RandBLAS only has two SketchingOperator types: DenseSkOp and SparseSkOp. These types +/// have several things in common +/// that aren't enforced by the SketchingOperator concept. Most notably, they have +/// constructors of the following form. +/// +/// .. 
code:: c++ +/// +/// SKOP(distribution_t dist, state_t seed_state) +/// : dist(dist), +/// seed_state(seed_state), +/// next_state(/* type-specific function of state and dist */), +/// n_rows(dist.n_rows), +/// n_cols(dist.n_cols), +/// own_memory(true) +/// /* type-specific initializers */ { }; /// /// @endverbatim -template -concept SketchingOperator = requires(SkOp S) { - { S.n_rows } -> std::convertible_to; - { S.n_cols } -> std::convertible_to; - { S.seed_state } -> std::convertible_to; - { S.seed_state } -> std::convertible_to; +template <typename SKOP> +concept SketchingOperator = requires { + typename SKOP::distribution_t; + typename SKOP::state_t; + typename SKOP::scalar_t; +} && SketchingDistribution<typename SKOP::distribution_t> && requires( + SKOP S, typename SKOP::distribution_t dist, typename SKOP::state_t state +) { + { S.dist } -> std::same_as<const typename SKOP::distribution_t&>; + { S.n_rows } -> std::same_as<const int64_t&>; + { S.n_cols } -> std::same_as<const int64_t&>; + { S.seed_state } -> std::same_as<const typename SKOP::state_t&>; + { S.next_state } -> std::same_as<const typename SKOP::state_t&>; + { S.own_memory } -> std::same_as<bool&>; }; +#else +#define SketchingOperator typename +#endif + +// I want to add a constraint to SketchingOperator so that conformant types SKOP have two-argument constructors of the form SKOP(typename SKOP:: + +/// * - :math:`\ttt{S.own_memory}` +/// - :math:`\ttt{bool}` +/// - RandBLAS has permission to attach memory to :math:`\ttt{S}` +/// if and only if this is true. If true at destruction time, RandBLAS +/// must delete any memory attached to :math:`\ttt{S}.` RandBLAS is +/// forbidden from deleting attached memory under any other circumstances.
} // end namespace RandBLAS::base diff --git a/RandBLAS/dense_skops.hh b/RandBLAS/dense_skops.hh index 5cef11f1..5bd01ab4 100644 --- a/RandBLAS/dense_skops.hh +++ b/RandBLAS/dense_skops.hh @@ -171,14 +171,8 @@ static RNGState fill_dense_submat_impl(int64_t n_cols, T* smat, int64_t n_s template RNGState compute_next_state(DD dist, RNGState state) { - if (dist.major_axis == MajorAxis::Undefined) { - // implies dist.family = DenseDistName::BlackBox - return state; - } - // ^ This is the only place where MajorAxis is actually used to some - // productive end. - int64_t major_len = major_axis_length(dist); - int64_t minor_len = dist.n_rows + (dist.n_cols - major_len); + int64_t major_len = dist.dim_major; + int64_t minor_len = dist.dim_minor; int64_t ctr_size = RNG::ctr_type::static_size; int64_t pad = 0; if (major_len % ctr_size != 0) { @@ -190,6 +184,20 @@ RNGState compute_next_state(DD dist, RNGState state) { return state; } +inline blas::Layout natural_layout(Axis major_axis, int64_t n_rows, int64_t n_cols) { + bool is_wide = n_rows < n_cols; + bool fa_long = major_axis == Axis::Long; + if (is_wide && fa_long) { + return blas::Layout::RowMajor; + } else if (is_wide) { + return blas::Layout::ColMajor; + } else if (fa_long) { + return blas::Layout::ColMajor; + } else { + return blas::Layout::RowMajor; + } +} + } // end namespace RandBLAS::dense @@ -198,10 +206,8 @@ namespace RandBLAS { // ============================================================================= /// We support two distributions for dense sketching operators: those whose /// entries are iid Gaussians or iid uniform over a symmetric interval. -/// For implementation reasons, we also expose an option to indicate that an -/// operator's distribution is unknown but it is still represented by a buffer -/// that can be used in GEMM. 
-enum class DenseDistName : char { +/// +enum class ScalarDist : char { // --------------------------------------------------------------------------- /// Indicates the Gaussian distribution with mean 0 and variance 1. Gaussian = 'G', @@ -209,16 +215,13 @@ enum class DenseDistName : char { // --------------------------------------------------------------------------- /// Indicates the uniform distribution over [-r, r] where r := sqrt(3) /// is the radius that provides for a variance of 1. - Uniform = 'U', - - // --------------------------------------------------------------------------- - /// Indicates that the sketching operator's entries will only be specified by - /// a user-provided buffer. - BlackBox = 'B' + Uniform = 'U' }; // ============================================================================= -/// A distribution over dense sketching operators. +/// A distribution over matrices whose entries are iid mean-zero variance-one +/// random variables. +/// This type conforms to the SketchingDistribution concept. struct DenseDist { // --------------------------------------------------------------------------- /// Matrices drawn from this distribution have this many rows. @@ -228,29 +231,52 @@ struct DenseDist { /// Matrices drawn from this distribution have this many columns. const int64_t n_cols; - // --------------------------------------------------------------------------- - /// The distribution used for the entries of the sketching operator. - const DenseDistName family; - // --------------------------------------------------------------------------- /// This member affects whether samples from this distribution have their /// entries filled row-wise or column-wise. While there is no statistical /// difference between these two filling orders, there are situations /// where one order or the other might be preferred. - /// - /// @verbatim embed:rst:leading-slashes - /// .. 
dropdown:: *Notes for experts* - /// :animate: fade-in-slide-down /// - /// Deciding the value of this member is only needed - /// in algorithms where (1) there's a need to iteratively generate panels of - /// a larger sketching operator and (2) one of larger operator's dimensions - /// cannot be known before the iterative process starts. + /// For more information, see the DenseDist::natural_layout and the section of the + /// RandBLAS tutorial on + /// @verbatim embed:rst:inline :ref:`updating sketches `. @endverbatim /// - /// Essentially, a column-wise fill order lets us stack operators horizontally - /// in a consistent way, while row-wise fill order lets us stack vertically - /// in a consistent way. The mapping from major_axis to fill order is given below. - /// + const Axis major_axis; + + // --------------------------------------------------------------------------- + /// Defined as + /// @verbatim embed:rst:leading-slashes + /// + /// .. math:: + /// + /// \ttt{dim_major} = \begin{cases} \,\min\{ \ttt{n_rows},\, \ttt{n_cols} \} &\text{ if }~~ \ttt{major_axis} = \ttt{Short} \\ \max\{ \ttt{n_rows},\,\ttt{n_cols} \} & \text{ if } ~~\ttt{major_axis} = \ttt{Long} \end{cases}. + /// + /// @endverbatim + /// + const int64_t dim_major; + + // --------------------------------------------------------------------------- + /// Defined as \math{\ttt{n_rows} + \ttt{n_cols} - \ttt{dim_major}.} This is + /// just whichever of \math{(\ttt{n_rows}, \ttt{n_cols})} wasn't identified + /// as \math{\ttt{dim_major}.} + /// + const int64_t dim_minor; + + // --------------------------------------------------------------------------- + /// A sketching operator sampled from this distribution should be multiplied + /// by this constant in order for sketching to preserve norms in expectation. 
+ const double isometry_scale; + + // --------------------------------------------------------------------------- + /// @verbatim embed:rst:leading-slashes + /// The distribution on :math:`\mathbb{R}` for entries of operators sampled from this distribution. + /// @endverbatim + const ScalarDist family; + + // --------------------------------------------------------------------------- + /// @verbatim embed:rst:leading-slashes + /// The fill order (row major or column major) implied by major_axis, + /// n_rows, and n_cols, according to the following table. /// .. list-table:: /// :widths: 34 33 33 /// :header-rows: 1 @@ -259,105 +285,116 @@ struct DenseDist { /// - :math:`\texttt{major_axis} = \texttt{Long}` /// - :math:`\texttt{major_axis} = \texttt{Short}` /// * - :math:`\texttt{n_rows} > \texttt{n_cols}` - /// - column-wise - /// - row-wise + /// - column major + /// - row major /// * - :math:`\texttt{n_rows} \leq \texttt{n_cols}` - /// - row-wise - /// - column-wise - /// @endverbatim - const MajorAxis major_axis; + /// - row major + /// - column major + /// + /// If you want to sample a dense sketching operator represented as + /// a buffer in a layout different than the one given here, then a + /// change-of-layout has to be performed explicitly. + /// @endverbatim + /// + const blas::Layout natural_layout; // --------------------------------------------------------------------------- - /// A distribution over matrices of shape (n_rows, n_cols) with entries drawn - /// iid from either the default choice of standard normal distribution, or from - /// the uniform distribution over [-r, r], where r := sqrt(3) provides for - /// unit variance. DenseDist( int64_t n_rows, int64_t n_cols, DenseDistName dn = DenseDistName::Gaussian ) : n_rows(n_rows), n_cols(n_cols), family(dn), major_axis( (dn == DenseDistName::BlackBox) ? MajorAxis::Undefined : MajorAxis::Long) { } - + /// Arguments passed to this function are used to initialize members of the same names. 
+ /// The members \math{\ttt{dim_major},} \math{\ttt{dim_minor},} \math{\ttt{isometry_scale},} + /// and \math{\ttt{natural_layout}} are automatically initialized to be consistent with these arguments. + /// + /// This constructor will raise an error if \math{\min\\{\ttt{n_rows}, \ttt{n_cols}\\} \leq 0.} DenseDist( int64_t n_rows, int64_t n_cols, - DenseDistName dn, - MajorAxis ma - ) : n_rows(n_rows), n_cols(n_cols), family(dn), major_axis(ma) { - if (dn == DenseDistName::BlackBox) { - randblas_require(ma == MajorAxis::Undefined); - } else { - randblas_require(ma != MajorAxis::Undefined); - } + ScalarDist family = ScalarDist::Gaussian, + Axis major_axis = Axis::Long + ) : // variable definitions + n_rows(n_rows), n_cols(n_cols), + major_axis(major_axis), + dim_major((major_axis == Axis::Long) ? std::max(n_rows, n_cols) : std::min(n_rows, n_cols)), + dim_minor((major_axis == Axis::Long) ? std::min(n_rows, n_cols) : std::max(n_rows, n_cols)), + isometry_scale(std::pow(dim_minor, -0.5)), + family(family), + natural_layout(dense::natural_layout(major_axis, n_rows, n_cols)) + { // argument validation + randblas_require(n_rows > 0); + randblas_require(n_cols > 0); } }; - -inline blas::Layout dist_to_layout(const DenseDist &D) { - randblas_require(D.major_axis != MajorAxis::Undefined); - bool is_wide = D.n_rows < D.n_cols; - bool fa_long = D.major_axis == MajorAxis::Long; - if (is_wide && fa_long) { - return blas::Layout::RowMajor; - } else if (is_wide) { - return blas::Layout::ColMajor; - } else if (fa_long) { - return blas::Layout::ColMajor; - } else { - return blas::Layout::RowMajor; - } -} - -inline int64_t major_axis_length(const DenseDist &D) { - randblas_require(D.major_axis != MajorAxis::Undefined); - return (D.major_axis == MajorAxis::Long) ? 
- std::max(D.n_rows, D.n_cols) : std::min(D.n_rows, D.n_cols); -} - -template -inline T isometry_scale_factor(DenseDist D) { - if (D.family == DenseDistName::BlackBox) { - throw std::runtime_error("Unrecognized distribution."); - } - // When we sample from the scalar distributions we always - // scale things so they're variance-1. - return std::pow((T) std::min(D.n_rows, D.n_cols), -0.5); -} +#ifdef __cpp_concepts +static_assert(SketchingDistribution); +#endif // ============================================================================= -/// A sample from a distribution over dense sketching operators. -/// +/// A sample from a distribution over matrices whose entries are iid +/// mean-zero variance-one random variables. +/// This type conforms to the SketchingOperator concept. template struct DenseSkOp { - using state_t = RNGState; - using scalar_t = T; + // --------------------------------------------------------------------------- + /// Type alias. + using distribution_t = DenseDist; - const int64_t n_rows; - const int64_t n_cols; + // --------------------------------------------------------------------------- + /// Type alias. + using state_t = RNGState; // --------------------------------------------------------------------------- - /// The distribution from which this sketching operator is sampled. - /// This member specifies the number of rows and columns of the sketching - /// operator. + /// Real scalar type used in matrix representations of this operator. + using scalar_t = T; + + // --------------------------------------------------------------------------- + /// The distribution from which this operator is sampled; + /// this member specifies the number of rows and columns of this operator. 
const DenseDist dist; // --------------------------------------------------------------------------- - /// The state that should be passed to the RNG when the full sketching + /// The state that should be passed to the RNG when the full /// operator needs to be sampled from scratch. - const RNGState seed_state; + const state_t seed_state; // --------------------------------------------------------------------------- - /// The state that should be used by the next call to an RNG *after* the - /// full sketching operator has been sampled. - const RNGState next_state; + /// The state that should be used in the next call to a random sampling function + /// whose output should be statistically independent from properties of this + /// operator. + const state_t next_state; + // --------------------------------------------------------------------------- + /// Alias for dist.n_rows. + const int64_t n_rows; - T *buff = nullptr; // memory - blas::Layout layout; // matrix storage order - bool del_buff_on_destruct = false; // only applies if fill_dense(S) has been called. + // --------------------------------------------------------------------------- + /// Alias for dist.n_cols. + const int64_t n_cols; + + // ---------------------------------------------------------------------------- + /// If true, then RandBLAS has permission to allocate and attach memory to this operator's + /// \math{\ttt{buff}} member. If true *at destruction time*, then delete [] + /// will be called on \math{\ttt{buff}} if it is non-null. + /// + /// RandBLAS only writes to this member at construction time. + /// + bool own_memory; + + // --------------------------------------------------------------------------- + /// Reference to an array that holds the explicit representation of this operator + /// as a dense matrix. 
+ /// + /// If non-null this must point to an array of length at least + /// \math{\ttt{dist.n_cols * dist.n_rows},} and this array must contain the + /// random samples from \math{\ttt{dist}} implied by \math{\ttt{seed_state}.} See DenseSkOp::layout for more information. T *buff = nullptr; + + // --------------------------------------------------------------------------- + /// The storage order for \math{\ttt{buff}.} The leading dimension of + /// \math{\mat(\ttt{buff})} when reading from \math{\ttt{buff}} is + /// \math{\ttt{dist.dim_major}.} const blas::Layout layout; ///////////////////////////////////////////////////////////////////// @@ -366,130 +403,147 @@ struct DenseSkOp { // ///////////////////////////////////////////////////////////////////// - DenseSkOp( - int64_t n_rows, - int64_t n_cols, - DenseDist dist, - RNGState const &seed_state, - RNGState const &next_state, - T *buff, - blas::Layout layout, - bool del_buff_on_destruct - ) : - n_rows(n_rows), n_cols(n_cols), dist(dist), - seed_state(seed_state), next_state(next_state), - buff(buff), layout(layout), del_buff_on_destruct(del_buff_on_destruct) { } - - ///--------------------------------------------------------------------------- - /// Construct a DenseSkOp object, \math{S}. - /// - /// @param[in] dist - /// DenseDist. - /// - Specifies the dimensions of \math{S}. - /// - Specifies the (scalar) distribution of \math{S}'s entries. + // --------------------------------------------------------------------------- + /// Arguments passed to this function are + /// used to initialize members of the same names. \math{\ttt{own_memory}} is initialized to true, + /// \math{\ttt{buff}} is initialized to nullptr, and \math{\ttt{layout}} is initialized + /// to \math{\ttt{dist.natural_layout}.} The \math{\ttt{next_state}} member is + /// computed automatically from \math{\ttt{dist}} and \math{\ttt{seed_state}.} /// - /// @param[in] state - /// RNGState. 
- /// - The RNG will use this as the starting point to generate all - /// random numbers needed for \math{S}. + /// Although \math{\ttt{own_memory}} is initialized to true, RandBLAS will not attach + /// memory to \math{\ttt{buff}} unless fill_dense(DenseSkOp &S) is called. /// + /// If a RandBLAS function needs an explicit representation of this operator and + /// yet \math{\ttt{buff}} is null, then RandBLAS will construct a temporary + /// explicit representation of this operator and delete that representation before returning. + /// DenseSkOp( DenseDist dist, - RNGState const &state + const state_t &seed_state ) : // variable definitions + dist(dist), + seed_state(seed_state), + next_state(dense::compute_next_state(dist, seed_state)), n_rows(dist.n_rows), n_cols(dist.n_cols), - dist(dist), - seed_state(state), - next_state(dense::compute_next_state(dist, state)), + own_memory(true), + // ^ We won't take advantage of own_memory unless this is passed to fill_dense(S). buff(nullptr), - layout(dist_to_layout(dist)) - { // sanity checks - randblas_require(this->dist.n_rows > 0); - randblas_require(this->dist.n_cols > 0); - if (dist.family == DenseDistName::BlackBox) - randblas_require(this->buff != nullptr); - }; - - // Destructor + layout(dist.natural_layout) { } + + // Move constructor + DenseSkOp( + DenseSkOp &&S + ) : // Initializations + dist(S.dist), + seed_state(S.seed_state), + next_state(S.next_state), + n_rows(dist.n_rows), n_cols(dist.n_cols), + own_memory(S.own_memory), buff(S.buff), layout(S.layout) + { // Body + S.buff = nullptr; + // ^ Our memory management policy prohibits us from changing + // S.own_memory after S was constructed. But overwriting + // S.buff with the null pointer is allowed since we + // can guarantee that won't cause a memory leak. 
+ } + + // Destructor ~DenseSkOp() { - if (this->del_buff_on_destruct) - delete [] this->buff; + if (own_memory && buff != nullptr) { + delete [] buff; + } } }; +#ifdef __cpp_concepts +static_assert(SketchingOperator>); +static_assert(SketchingOperator>); +#endif // ============================================================================= /// @verbatim embed:rst:leading-slashes /// -/// .. |mat| mathmacro:: \operatorname{mat} /// .. |buff| mathmacro:: \mathtt{buff} -/// .. |D| mathmacro:: \mathcal{D} /// .. |nrows| mathmacro:: \mathtt{n\_rows} /// .. |ncols| mathmacro:: \mathtt{n\_cols} -/// .. |ioff| mathmacro:: \mathtt{i\_off} -/// .. |joff| mathmacro:: \mathtt{j\_off} +/// .. |ioff| mathmacro:: \mathtt{ro\_s} +/// .. |joff| mathmacro:: \mathtt{co\_s} /// .. |layout| mathmacro:: \mathtt{layout} /// /// @endverbatim -/// Fill \math{\buff} so that (1) \math{\mat(\buff)} is a submatrix of -/// an _implicit_ random sample from \math{\D}, and (2) \math{\mat(\buff)} -/// is determined by reading from \math{\buff} in \math{\layout} order. -/// -/// If we denote the implicit sample from \math{\D} by \math{S}, then we have +/// This function provides the underlying implementation of fill_dense(DenseSkOp &S). +/// Unlike fill_dense(DenseSkOp &S), this function can be used to generate explicit representations +/// of *submatrices* of dense sketching operators. +/// +/// Formally, if \math{\mtxS} were sampled from \math{\D} with \math{\ttt{seed_state=seed}}, +/// then on exit we'd have +/// /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(\buff) = S[\ioff:(\ioff + \nrows),\, \joff:(\joff + \ncols)] +/// \mat(\buff)_{ij} = \mtxS_{(i+\ioff)(\joff + j)} +/// +/// where :math:`\mat(\cdot)` reads from :math:`\buff` in :math:`\layout` order. /// @endverbatim -/// on exit. 
+/// If \math{\ttt{layout != D.natural_layout}} +/// then this function internally allocates \math{\ttt{n_rows * n_cols}} words of workspace, +/// and deletes this workspace before returning. /// -/// This function is for generating low-level representations of matrices -/// that are equivalent to a submatrix of a RandBLAS DenseSkOp, but -/// without using the DenseSkOp abstraction. This can be useful if you want -/// to sketch a structured matrix that RandBLAS doesn't support (like a symmetric -/// matrix whose values are only stored in the upper or lower triangle). +/// @verbatim embed:rst:leading-slashes +/// .. dropdown:: Full parameter descriptions +/// :animate: fade-in-slide-down /// -/// Note that since the entries of \math{\buff} are sampled iid from a common -/// distribution, the value of \math{\layout} is unlikely to have mathematical significance. -/// However, the value of \math{\layout} can affect this function's efficiency. -/// For best efficiency we recommend \math{\layout = \mathtt{dist\_to\_layout}(\D).} -/// If a different value of \math{\layout} is used, then this function will internally -/// allocate extra memory for an out-of-place change of storage order. +/// layout +/// - blas::Layout::RowMajor or blas::Layout::ColMajor +/// - The storage order for :math:`\mat(\buff)` on exit. The leading dimension +/// is the smallest value permitted for a matrix of dimensions (n_rows, n_cols) +/// in the given layout. I.e., it's n_rows if layout == ColMajor and +/// n_cols if layout == RowMajor. +/// - Note that since the entries of :math:`\buff` are sampled iid from a common +/// distribution, the value of :math:`\layout` is unlikely to have mathematical significance. +/// However, the value of :math:`\layout` can affect this function's efficiency. 
+/// For best efficiency we recommend :math:`\ttt{layout=}\D{}\ttt{.natural_layout}.` +/// If a different value of :math:`\layout` is used, then this function will internally +/// allocate extra memory for an out-of-place layout change. /// -/// @param[in] layout -/// blas::Layout::RowMajor or blas::Layout::ColMajor -/// - The storage order for \math{\mat(\buff)} on exit. -/// @param[in] D -/// A DenseDist object. +/// D +/// - A DenseDist object. /// - A distribution over random matrices of shape (D.n_rows, D.n_cols). -/// @param[in] n_rows -/// A positive integer. -/// - The number of rows in \math{\mat(\buff)}. -/// @param[in] n_cols -/// A positive integer. -/// - The number of columns in \math{\mat(\buff)}. -/// @param[in] ro_s -/// A nonnegative integer. -/// - The row offset for \math{\mat(\buff)} as a submatrix of \math{S}. -/// - We require that \math{\ioff + \nrows} is at most D.n_rows. -/// @param[in] co_s -/// A nonnegative integer. -/// - The column offset for \math{\mat(\buff)} as a submatrix of \math{S}. -/// - We require that \math{\joff + \ncols} is at most D.n_cols. -/// @param[in] buff -/// Buffer of type T. -/// - Length must be at least \math{\nrows \cdot \ncols}. -/// @param[in] seed -/// A CBRNG state -/// - Used to define \math{S} as a sample from \math{\D}. -/// +/// +/// n_rows +/// - A positive integer. +/// - The number of rows in :math:`\mat(\buff).` +/// +/// n_cols +/// - A positive integer. +/// - The number of columns in :math:`\mat(\buff).` +/// +/// ro_s +/// - A nonnegative integer. +/// - The row offset for :math:`\mat(\buff)` as a submatrix of :math:`\mtxS.` +/// - We require that :math:`\ioff + \nrows` is at most D.n_rows. +/// +/// co_s +/// - A nonnegative integer. +/// - The column offset for :math:`\mat(\buff)` as a submatrix of :math:`\mtxS.` +/// - We require that :math:`\joff + \ncols` is at most D.n_cols. +/// +/// buff +/// - Buffer of type T. 
+/// - Length must be at least :math:`\nrows \cdot \ncols.` +/// +/// seed +/// - A CBRNG state +/// - Used to define :math:`\mtxS` as a sample from :math:`\D.` +/// +/// @endverbatim template -RNGState fill_dense(blas::Layout layout, const DenseDist &D, int64_t n_rows, int64_t n_cols, int64_t ro_s, int64_t co_s, T* buff, const RNGState &seed) { +RNGState fill_dense_unpacked(blas::Layout layout, const DenseDist &D, int64_t n_rows, int64_t n_cols, int64_t ro_s, int64_t co_s, T* buff, const RNGState &seed) { using RandBLAS::dense::fill_dense_submat_impl; randblas_require(D.n_rows >= n_rows + ro_s); randblas_require(D.n_cols >= n_cols + co_s); - blas::Layout natural_layout = dist_to_layout(D); - int64_t ma_len = major_axis_length(D); + blas::Layout natural_layout = D.natural_layout; + int64_t ma_len = D.dim_major; int64_t n_rows_, n_cols_, ptr; if (natural_layout == blas::Layout::ColMajor) { // operate on the transpose in row-major @@ -503,18 +557,15 @@ RNGState fill_dense(blas::Layout layout, const DenseDist &D, int64_t n_rows } RNGState next_state{}; switch (D.family) { - case DenseDistName::Gaussian: { + case ScalarDist::Gaussian: { next_state = fill_dense_submat_impl(ma_len, buff, n_rows_, n_cols_, ptr, seed); break; } - case DenseDistName::Uniform: { + case ScalarDist::Uniform: { next_state = fill_dense_submat_impl(ma_len, buff, n_rows_, n_cols_, ptr, seed); blas::scal(n_rows_ * n_cols_, (T)std::sqrt(3), buff, 1); break; } - case DenseDistName::BlackBox: { - throw std::invalid_argument(std::string("fill_dense cannot be called with the BlackBox distribution.")); - } default: { throw std::runtime_error(std::string("Unrecognized distribution.")); } @@ -532,15 +583,8 @@ RNGState fill_dense(blas::Layout layout, const DenseDist &D, int64_t n_rows } // ============================================================================= -/// @verbatim embed:rst:leading-slashes -/// -/// .. |mat| mathmacro:: \operatorname{mat} -/// .. |buff| mathmacro:: \mathtt{buff} -/// .. 
|D| mathmacro:: \mathcal{D} -/// -/// @endverbatim /// Fill \math{\buff} so that \math{\mat(\buff)} is a sample from \math{\D} using -/// seed \math{\mathtt{seed}}. +/// seed \math{\mathtt{seed}.} The buffer's layout is \math{\D\ttt{.natural_layout}.} /// /// @param[in] D /// A DenseDist object. @@ -548,56 +592,75 @@ RNGState fill_dense(blas::Layout layout, const DenseDist &D, int64_t n_rows /// Buffer of type T. /// - Length must be at least D.n_rows * D.n_cols. /// - The leading dimension of \math{\mat(\buff)} when reading from \math{\buff} -/// is either D.n_rows or D.n_cols, depending on the return value of this function -/// that indicates row-major or column-major layout. +/// is either D.n_rows or D.n_cols, depending on \math{\D\ttt{.natural_layout}.} /// @param[in] seed /// A CBRNG state /// - Used to define \math{\mat(\buff)} as a sample from \math{\D}. /// -/// @returns -/// A std::pair consisting of "layout" and "next_state". -/// - \math{\buff} must be read in "layout" order -/// to recover \math{\mat(\buff)}. This layout is determined -/// from \math{\D} and cannot be controlled directly. -/// - If this function returns a layout that is undesirable then it is -/// the caller's responsibility to perform a transpose as needed. -/// template RNGState fill_dense(const DenseDist &D, T *buff, const RNGState &seed) { - return fill_dense(dist_to_layout(D), D, D.n_rows, D.n_cols, 0, 0, buff, seed); + return fill_dense_unpacked(D.natural_layout, D, D.n_rows, D.n_cols, 0, 0, buff, seed); } -// ============================================================================= -/// Performs the work in sampling S from its underlying distribution. This entails -/// allocating a buffer of size S.dist.n_rows * S.dist.n_cols, attaching that -/// buffer to S as S.buff, and finally sampling iid random variables to populate -/// S.buff. A flag is set on S so its destructor will deallocate S.buff. 
-/// -/// By default, RandBLAS allocates and populates buffers for dense sketching operators -/// just before they are needed in some operation, and then it deletes these buffers -/// once the operation is complete. Calling this function bypasses that policy. +// ============================================================================= +/// If \math{\ttt{S.own_memory}} is true then we enter an allocation stage. If +/// \math{\ttt{S.buff}} is equal to \math{\ttt{nullptr}} then it is redirected to the +/// start of a new array (allocated with ``new []``) +/// of length \math{\ttt{S.n_rows * S.n_cols}.} /// -/// @param[in] S -/// A DenseSkOp object. -/// +/// After the allocation stage, we check \math{\ttt{S.buff}} and we raise +/// an error if it's null. +/// +/// If \math{\ttt{S.buff}} is non-null, then we'll assume it has length at least +/// \math{\ttt{S.n_rows * S.n_cols}.} We'll proceed to populate \math{\ttt{S.buff}} +/// with the data for the explicit representation of \math{\ttt{S}.} +/// On exit, one can encode a BLAS-style representation of \math{\ttt{S}} with the tuple +/// @verbatim embed:rst:leading-slashes +/// .. 
math:: +/// +/// (\ttt{S.layout},~\ttt{S.n_rows},~\ttt{S.n_cols},~\ttt{S.buff},~\ttt{S.dist.dim_major}) +/// +/// In BLAS parlance, the last element of this tuple would be called the "leading dimension" +/// of :math:`\ttt{S}.` +/// @endverbatim template void fill_dense(DenseSkOp &S) { - using T = typename DenseSkOp::scalar_t; - randblas_require(S.buff == nullptr); - randblas_require(S.dist.family != DenseDistName::BlackBox); - S.buff = new T[S.dist.n_rows * S.dist.n_cols]; - fill_dense(S.dist, S.buff, S.seed_state); - S.del_buff_on_destruct = true; + if (S.own_memory && S.buff == nullptr) { + using T = typename DenseSkOp::scalar_t; + S.buff = new T[S.n_rows * S.n_cols]; + } + randblas_require(S.buff != nullptr); + fill_dense_unpacked(S.layout, S.dist, S.n_rows, S.n_cols, 0, 0, S.buff, S.seed_state); return; } -template -DenseSkOp submatrix_as_blackbox(const DenseSkOp &S, int64_t n_rows, int64_t n_cols, int64_t ro_s, int64_t co_s) { +template +struct BLASFriendlyOperator { + using scalar_t = T; + const blas::Layout layout; + const int64_t n_rows; + const int64_t n_cols; + T* buff; + const int64_t ldim; + const bool own_memory; + + ~BLASFriendlyOperator() { + if (own_memory && buff != nullptr) { + delete [] buff; + } + } +}; + +template +BFO submatrix_as_blackbox(const DenseSkOp &S, int64_t n_rows, int64_t n_cols, int64_t ro_s, int64_t co_s) { + randblas_require(ro_s + n_rows <= S.n_rows); + randblas_require(co_s + n_cols <= S.n_cols); + using T = typename DenseSkOp::scalar_t; T *buff = new T[n_rows * n_cols]; - auto dl = dist_to_layout(S.dist); - fill_dense(dl, S.dist, n_rows, n_cols, ro_s, co_s, buff, S.seed_state); - DenseDist submatrix_dist{n_rows, n_cols, DenseDistName::BlackBox, MajorAxis::Undefined}; - DenseSkOp submatrix{n_rows, n_cols, submatrix_dist, S.seed_state, S.next_state, buff, dl, true}; + auto layout = S.layout; + fill_dense_unpacked(layout, S.dist, n_rows, n_cols, ro_s, co_s, buff, S.seed_state); + int64_t dim_major = S.dist.dim_major; + BFO 
submatrix{layout, n_rows, n_cols, buff, dim_major, true}; return submatrix; } diff --git a/RandBLAS/exceptions.hh b/RandBLAS/exceptions.hh index 898f0bbe..f3105779 100644 --- a/RandBLAS/exceptions.hh +++ b/RandBLAS/exceptions.hh @@ -87,6 +87,8 @@ inline void throw_if( bool cond, const char* condstr, const char* func ) #define RandBLAS_ATTR_FORMAT(I, F) __attribute__((format( printf, I, F ))) #endif +#define RandBLAS_ERROR_MESSAGE_SIZE 256 + // ----------------------------------------------------------------------------- // internal helper function; throws Error if cond is true // uses printf-style format for error message @@ -98,7 +100,7 @@ inline void throw_if( bool cond, const char* condstr, const char* func, const ch inline void throw_if( bool cond, const char* condstr, const char* func, const char* format, ... ) { UNUSED(condstr); if (cond) { - char buf[80]; + char buf[RandBLAS_ERROR_MESSAGE_SIZE]; va_list va; va_start( va, format ); vsnprintf( buf, sizeof(buf), format, va ); @@ -116,7 +118,7 @@ inline void abort_if( bool cond, const char* func, const char* format, ... ) inline void abort_if( bool cond, const char* func, const char* format, ... ) { if (cond) { - char buf[80]; + char buf[RandBLAS_ERROR_MESSAGE_SIZE]; va_list va; va_start( va, format ); vsnprintf( buf, sizeof(buf), format, va ); diff --git a/RandBLAS/skge.hh b/RandBLAS/skge.hh index 89ff455d..b13c86c6 100644 --- a/RandBLAS/skge.hh +++ b/RandBLAS/skge.hh @@ -42,16 +42,6 @@ #include #include -/* Intended macro definitions. - - .. |op| mathmacro:: \operatorname{op} - .. |mat| mathmacro:: \operatorname{mat} - .. |submat| mathmacro:: \operatorname{submat} - .. |lda| mathmacro:: \texttt{lda} - .. |ldb| mathmacro:: \texttt{ldb} - .. |opA| mathmacro:: \texttt{opA} - .. 
|opS| mathmacro:: \texttt{opS} -*/ namespace RandBLAS::dense { @@ -61,23 +51,12 @@ using RandBLAS::fill_dense; // MARK: LSKGE3 // ============================================================================= -/// @verbatim embed:rst:leading-slashes -/// -/// .. |op| mathmacro:: \operatorname{op} -/// .. |mat| mathmacro:: \operatorname{mat} -/// .. |submat| mathmacro:: \operatorname{submat} -/// .. |lda| mathmacro:: \mathrm{lda} -/// .. |ldb| mathmacro:: \mathrm{ldb} -/// .. |opA| mathmacro:: \mathrm{opA} -/// .. |opS| mathmacro:: \mathrm{opS} -/// -/// @endverbatim /// LSKGE3: Perform a GEMM-like operation /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(S))}_{d \times m} \cdot \underbrace{\op(\mat(A))}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(\mtxS))}_{d \times m} \cdot \underbrace{\op(\mat(A))}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} /// @endverbatim -/// where \math{\alpha} and \math{\beta} are real scalars, \math{\op(X)} either returns a matrix \math{X} +/// where \math{\alpha} and \math{\beta} are real scalars, \math{\op(\mtxX)} either returns a matrix \math{\mtxX} /// or its transpose, and \math{S} is a sketching operator that takes Level 3 BLAS effort to apply. /// /// @verbatim embed:rst:leading-slashes @@ -86,19 +65,19 @@ using RandBLAS::fill_dense; /// Their precise contents are determined by :math:`(A, \lda)`, :math:`(B, \ldb)`, /// and "layout", following the same convention as BLAS. /// -/// What is :math:`\submat(S)`? +/// What is :math:`\submat(\mtxS)`? /// Its shape is defined implicitly by :math:`(\opS, d, m)`. -/// If :math:`{\submat(S)}` is of shape :math:`r \times c`, -/// then it is the :math:`r \times c` submatrix of :math:`{S}` whose upper-left corner -/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{S}`. 
+/// If :math:`{\submat(\mtxS)}` is of shape :math:`r \times c`, +/// then it is the :math:`r \times c` submatrix of :math:`{\mtxS}` whose upper-left corner +/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{\mtxS}`. /// @endverbatim /// @param[in] layout /// Layout::ColMajor or Layout::RowMajor /// - Matrix storage for \math{\mat(A)} and \math{\mat(B)}. /// /// @param[in] opS -/// - If \math{\opS} = NoTrans, then \math{ \op(\submat(S)) = \submat(S)}. -/// - If \math{\opS} = Trans, then \math{\op(\submat(S)) = \submat(S)^T }. +/// - If \math{\opS} = NoTrans, then \math{ \op(\submat(\mtxS)) = \submat(\mtxS)}. +/// - If \math{\opS} = Trans, then \math{\op(\submat(\mtxS)) = \submat(\mtxS)^T }. /// @param[in] opA /// - If \math{\opA} == NoTrans, then \math{\op(\mat(A)) = \mat(A)}. /// - If \math{\opA} == Trans, then \math{\op(\mat(A)) = \mat(A)^T}. @@ -114,7 +93,7 @@ using RandBLAS::fill_dense; /// /// @param[in] m /// A nonnegative integer. -/// - The number of columns in \math{\op(\submat(S))} +/// - The number of columns in \math{\op(\submat(\mtxS))} /// - The number of rows in \math{\op(\mat(A))}. /// /// @param[in] alpha @@ -123,17 +102,17 @@ using RandBLAS::fill_dense; /// /// @param[in] S /// A DenseSkOp object. -/// - Defines \math{\submat(S)}. +/// - Defines \math{\submat(\mtxS)}. /// /// @param[in] ro_s /// A nonnegative integer. -/// - The rows of \math{\submat(S)} are a contiguous subset of rows of \math{S}. -/// - The rows of \math{\submat(S)} start at \math{S[\texttt{ro_s}, :]}. +/// - The rows of \math{\submat(\mtxS)} are a contiguous subset of rows of \math{\mtxS}. +/// - The rows of \math{\submat(\mtxS)} start at \math{\mtxS[\texttt{ro_s}, :]}. /// /// @param[in] co_s /// A nonnnegative integer. -/// - The columns of \math{\submat(S)} are a contiguous subset of columns of \math{S}. -/// - The columns \math{\submat(S)} start at \math{S[:,\texttt{co_s}]}. 
+/// - The columns of \math{\submat(\mtxS)} are a contiguous subset of columns of \math{\mtxS}. +/// - The columns of \math{\submat(\mtxS)} start at \math{\mtxS[:,\texttt{co_s}]}. /// /// @param[in] A /// Pointer to a 1D array of real scalars. @@ -145,13 +124,13 @@ using RandBLAS::fill_dense; /// * If layout == ColMajor, then /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(A)[i, j] = A[i + j \cdot \lda]. +/// \mat(A)_{ij} = A[i + j \cdot \lda]. /// @endverbatim /// In this case, \math{\lda} must be \math{\geq} the length of a column in \math{\mat(A)}. /// * If layout == RowMajor, then /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(A)[i, j] = A[i \cdot \lda + j]. +/// \mat(A)_{ij} = A[i \cdot \lda + j]. /// @endverbatim /// In this case, \math{\lda} must be \math{\geq} the length of a row in \math{\mat(A)}. /// @@ -170,7 +149,7 @@ using RandBLAS::fill_dense; /// - Leading dimension of \math{\mat(B)} when reading from \math{B}. /// - Refer to documentation for \math{\lda} for details. /// -template +template void lskge3( blas::Layout layout, blas::Op opS, @@ -179,7 +158,7 @@ void lskge3( int64_t n, // op(A) is m-by-n int64_t m, // op(S) is d-by-m T alpha, - DenseSkOp &S, + DenseSkOp &S, int64_t ro_s, int64_t co_s, const T *A, @@ -189,13 +168,19 @@ void lskge3( int64_t ldb ){ auto [rows_submat_S, cols_submat_S] = dims_before_op(d, m, opS); - if (!S.buff) { - auto submat_S = submatrix_as_blackbox(S, rows_submat_S, cols_submat_S, ro_s, co_s); - lskge3(layout, opS, opA, d, n, m, alpha, submat_S, 0, 0, A, lda, beta, B, ldb); - return; + constexpr bool maybe_denseskop = !std::is_same_v, BLASFriendlyOperator>; + if constexpr (maybe_denseskop) { + if (!S.buff) { + // DenseSkOp doesn't permit defining a "black box" distribution, so we have to pack the submatrix + // into an equivalent data structure ourselves. 
+ auto submat_S = submatrix_as_blackbox>(S, rows_submat_S, cols_submat_S, ro_s, co_s); + lskge3(layout, opS, opA, d, n, m, alpha, submat_S, 0, 0, A, lda, beta, B, ldb); + return; + } // else, continue with the function as usual. } - randblas_require( S.dist.n_rows >= rows_submat_S + ro_s ); - randblas_require( S.dist.n_cols >= cols_submat_S + co_s ); + randblas_require( S.buff != nullptr ); + randblas_require( S.n_rows >= rows_submat_S + ro_s ); + randblas_require( S.n_cols >= cols_submat_S + co_s ); auto [rows_A, cols_A] = dims_before_op(m, n, opA); if (layout == blas::Layout::ColMajor) { randblas_require(lda >= rows_A); @@ -205,7 +190,7 @@ void lskge3( randblas_require(ldb >= n); } - auto [pos, lds] = offset_and_ldim(S.layout, S.dist.n_rows, S.dist.n_cols, ro_s, co_s); + auto [pos, lds] = offset_and_ldim(S.layout, S.n_rows, S.n_cols, ro_s, co_s); T* S_ptr = &S.buff[pos]; if (S.layout != layout) opS = (opS == blas::Op::NoTrans) ? blas::Op::Trans : blas::Op::NoTrans; @@ -220,9 +205,9 @@ void lskge3( /// RSKGE3: Perform a GEMM-like operation /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(\mat(A))}_{m \times n} \cdot \underbrace{\op(\submat(S))}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\op(\mat(A))}_{m \times n} \cdot \underbrace{\op(\submat(\mtxS))}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} /// @endverbatim -/// where \math{\alpha} and \math{\beta} are real scalars, \math{\op(X)} either returns a matrix \math{X} +/// where \math{\alpha} and \math{\beta} are real scalars, \math{\op(\mtxX)} either returns a matrix \math{\mtxX} /// or its transpose, and \math{S} is a sketching operator that takes Level 3 BLAS effort to apply. 
/// /// @verbatim embed:rst:leading-slashes @@ -231,11 +216,11 @@ void lskge3( /// Their precise contents are determined by :math:`(A, \lda)`, :math:`(B, \ldb)`, /// and "layout", following the same convention as BLAS. /// -/// What is :math:`\submat(S)`? +/// What is :math:`\submat(\mtxS)`? /// Its shape is defined implicitly by :math:`(\opS, n, d)`. -/// If :math:`{\submat(S)}` is of shape :math:`r \times c`, -/// then it is the :math:`r \times c` submatrix of :math:`{S}` whose upper-left corner -/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{S}`. +/// If :math:`{\submat(\mtxS)}` is of shape :math:`r \times c`, +/// then it is the :math:`r \times c` submatrix of :math:`{\mtxS}` whose upper-left corner +/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{\mtxS}`. /// @endverbatim /// @param[in] layout /// Layout::ColMajor or Layout::RowMajor @@ -246,8 +231,8 @@ void lskge3( /// - If \math{\opA} == Trans, then \math{\op(\mat(A)) = \mat(A)^T}. /// /// @param[in] opS -/// - If \math{\opS} = NoTrans, then \math{ \op(\submat(S)) = \submat(S)}. -/// - If \math{\opS} = Trans, then \math{\op(\submat(S)) = \submat(S)^T }. +/// - If \math{\opS} = NoTrans, then \math{ \op(\submat(\mtxS)) = \submat(\mtxS)}. +/// - If \math{\opS} = Trans, then \math{\op(\submat(\mtxS)) = \submat(\mtxS)^T }. /// /// @param[in] m /// A nonnegative integer. @@ -262,7 +247,7 @@ void lskge3( /// @param[in] n /// A nonnegative integer. /// - The number of columns in \math{\op(\mat(A))} -/// - The number of rows in \math{\op(\submat(S))}. +/// - The number of rows in \math{\op(\submat(\mtxS))}. /// /// @param[in] alpha /// A real scalar. @@ -278,29 +263,29 @@ void lskge3( /// * If layout == ColMajor, then /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(A)[i, j] = A[i + j \cdot \lda]. +/// \mat(A)_{ij} = A[i + j \cdot \lda]. /// @endverbatim /// In this case, \math{\lda} must be \math{\geq} the length of a column in \math{\mat(A)}. 
/// * If layout == RowMajor, then /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(A)[i, j] = A[i \cdot \lda + j]. +/// \mat(A)_{ij} = A[i \cdot \lda + j]. /// @endverbatim /// In this case, \math{\lda} must be \math{\geq} the length of a row in \math{\mat(A)}. /// /// @param[in] S /// A DenseSkOp object. -/// - Defines \math{\submat(S)}. +/// - Defines \math{\submat(\mtxS)}. /// /// @param[in] ro_s /// A nonnegative integer. -/// - The rows of \math{\submat(S)} are a contiguous subset of rows of \math{S}. -/// - The rows of \math{\submat(S)} start at \math{S[\texttt{ro_s}, :]}. +/// - The rows of \math{\submat(\mtxS)} are a contiguous subset of rows of \math{\mtxS}. +/// - The rows of \math{\submat(\mtxS)} start at \math{\mtxS[\texttt{ro_s}, :]}. /// /// @param[in] co_s /// A nonnegative integer. -/// - The columns of \math{\submat(S)} are a contiguous subset of columns of \math{S}. -/// - The columns \math{\submat(S)} start at \math{S[:,\texttt{co_s}]}. +/// - The columns of \math{\submat(\mtxS)} are a contiguous subset of columns of \math{\mtxS}. +/// - The columns of \math{\submat(\mtxS)} start at \math{\mtxS[:,\texttt{co_s}]}. /// /// @param[in] beta /// A real scalar. @@ -317,7 +302,7 @@ void lskge3( /// - Leading dimension of \math{\mat(B)} when reading from \math{B}. /// - Refer to documentation for \math{\lda} for details. /// -template +template void rskge3( blas::Layout layout, blas::Op opA, @@ -328,7 +313,7 @@ void rskge3( T alpha, const T *A, int64_t lda, - DenseSkOp &S, + DenseSkOp &S, int64_t ro_s, int64_t co_s, T beta, @@ -336,15 +321,19 @@ void rskge3( int64_t ldb ){ auto [rows_submat_S, cols_submat_S] = dims_before_op(n, d, opS); - if (!S.buff) { - // We'll make a shallow copy of the sketching operator, take responsibility for filling the memory - // of that sketching operator, and then call RSKGE3 with that new object.
- auto submat_S = submatrix_as_blackbox(S, rows_submat_S, cols_submat_S, ro_s, co_s); - rskge3(layout, opA, opS, m, d, n, alpha, A, lda, submat_S, 0, 0, beta, B, ldb); - return; + constexpr bool maybe_denseskop = !std::is_same_v, BLASFriendlyOperator>; + if constexpr (maybe_denseskop) { + if (!S.buff) { + // DenseSkOp doesn't permit defining a "black box" distribution, so we have to pack the submatrix + // into an equivalent data structure ourselves. + auto submat_S = submatrix_as_blackbox>(S, rows_submat_S, cols_submat_S, ro_s, co_s); + rskge3(layout, opA, opS, m, d, n, alpha, A, lda, submat_S, 0, 0, beta, B, ldb); + return; + } } - randblas_require( S.dist.n_rows >= rows_submat_S + ro_s ); - randblas_require( S.dist.n_cols >= cols_submat_S + co_s ); + randblas_require( S.buff != nullptr ); + randblas_require( S.n_rows >= rows_submat_S + ro_s ); + randblas_require( S.n_cols >= cols_submat_S + co_s ); auto [rows_A, cols_A] = dims_before_op(m, n, opA); if (layout == blas::Layout::ColMajor) { randblas_require(lda >= rows_A); @@ -354,7 +343,7 @@ void rskge3( randblas_require(ldb >= d); } - auto [pos, lds] = offset_and_ldim(S.layout, S.dist.n_rows, S.dist.n_cols, ro_s, co_s); + auto [pos, lds] = offset_and_ldim(S.layout, S.n_rows, S.n_cols, ro_s, co_s); T* S_ptr = &S.buff[pos]; if (S.layout != layout) opS = (opS == blas::Op::NoTrans) ? blas::Op::Trans : blas::Op::NoTrans; @@ -371,23 +360,12 @@ namespace RandBLAS::sparse { // MARK: LSKGES // ============================================================================= -/// @verbatim embed:rst:leading-slashes -/// -/// .. |op| mathmacro:: \operatorname{op} -/// .. |mat| mathmacro:: \operatorname{mat} -/// .. |submat| mathmacro:: \operatorname{submat} -/// .. |lda| mathmacro:: \mathrm{lda} -/// .. |ldb| mathmacro:: \mathrm{ldb} -/// .. |opA| mathmacro:: \mathrm{opA} -/// .. |opS| mathmacro:: \mathrm{opS} -/// -/// @endverbatim /// LSKGES: Perform a GEMM-like operation /// @verbatim embed:rst:leading-slashes /// ..
math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(S))}_{d \times m} \cdot \underbrace{\op(\mat(A))}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(\mtxS))}_{d \times m} \cdot \underbrace{\op(\mat(A))}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} /// @endverbatim -/// where \math{\alpha} and \math{\beta} are real scalars, \math{\op(X)} either returns a matrix \math{X} +/// where \math{\alpha} and \math{\beta} are real scalars, \math{\op(\mtxX)} either returns a matrix \math{\mtxX} /// or its transpose, and \math{S} is a sparse sketching operator. /// /// @verbatim embed:rst:leading-slashes @@ -396,19 +374,19 @@ namespace RandBLAS::sparse { /// Their precise contents are determined by :math:`(A, \lda)`, :math:`(B, \ldb)`, /// and "layout", following the same convention as BLAS. /// -/// What is :math:`\submat(S)`? +/// What is :math:`\submat(\mtxS)`? /// Its shape is defined implicitly by :math:`(\opS, d, m)`. -/// If :math:`{\submat(S)}` is of shape :math:`r \times c`, -/// then it is the :math:`r \times c` submatrix of :math:`{S}` whose upper-left corner -/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{S}`. +/// If :math:`{\submat(\mtxS)}` is of shape :math:`r \times c`, +/// then it is the :math:`r \times c` submatrix of :math:`{\mtxS}` whose upper-left corner +/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{\mtxS}`. /// @endverbatim /// @param[in] layout /// Layout::ColMajor or Layout::RowMajor /// - Matrix storage for \math{\mat(A)} and \math{\mat(B)}. /// /// @param[in] opS -/// - If \math{\opS} = NoTrans, then \math{ \op(\submat(S)) = \submat(S)}. -/// - If \math{\opS} = Trans, then \math{\op(\submat(S)) = \submat(S)^T }. +/// - If \math{\opS} = NoTrans, then \math{ \op(\submat(\mtxS)) = \submat(\mtxS)}. +/// - If \math{\opS} = Trans, then \math{\op(\submat(\mtxS)) = \submat(\mtxS)^T }.
/// /// @param[in] opA /// - If \math{\opA} == NoTrans, then \math{\op(\mat(A)) = \mat(A)}. @@ -426,7 +404,7 @@ namespace RandBLAS::sparse { /// /// @param[in] m /// A nonnegative integer. -/// - The number of columns in \math{\op(\submat(S))} +/// - The number of columns in \math{\op(\submat(\mtxS))} /// - The number of rows in \math{\op(\mat(A))}. /// /// @param[in] alpha @@ -435,17 +413,17 @@ namespace RandBLAS::sparse { /// /// @param[in] S /// A SparseSkOp object. -/// - Defines \math{\submat(S)}. +/// - Defines \math{\submat(\mtxS)}. /// /// @param[in] ro_s /// A nonnegative integer. -/// - The rows of \math{\submat(S)} are a contiguous subset of rows of \math{S}. -/// - The rows of \math{\submat(S)} start at \math{S[\texttt{ro_s}, :]}. +/// - The rows of \math{\submat(\mtxS)} are a contiguous subset of rows of \math{\mtxS}. +/// - The rows of \math{\submat(\mtxS)} start at \math{\mtxS[\texttt{ro_s}, :]}. /// /// @param[in] co_s /// A nonnegative integer. -/// - The columns of \math{\submat(S)} are a contiguous subset of columns of \math{S}. -/// - The columns \math{\submat(S)} start at \math{S[:,\texttt{co_s}]}. +/// - The columns of \math{\submat(\mtxS)} are a contiguous subset of columns of \math{\mtxS}. +/// - The columns of \math{\submat(\mtxS)} start at \math{\mtxS[:,\texttt{co_s}]}. /// /// @param[in] A /// Pointer to a 1D array of real scalars. @@ -457,13 +435,13 @@ namespace RandBLAS::sparse { /// * If layout == ColMajor, then /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(A)[i, j] = A[i + j \cdot \lda]. +/// \mat(A)_{ij} = A[i + j \cdot \lda]. /// @endverbatim /// In this case, \math{\lda} must be \math{\geq} the length of a column in \math{\mat(A)}. /// * If layout == RowMajor, then /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(A)[i, j] = A[i \cdot \lda + j]. +/// \mat(A)_{ij} = A[i \cdot \lda + j]. /// @endverbatim /// In this case, \math{\lda} must be \math{\geq} the length of a row in \math{\mat(A)}.
/// @@ -482,16 +460,16 @@ namespace RandBLAS::sparse { /// - Leading dimension of \math{\mat(B)} when reading from \math{B}. /// - Refer to documentation for \math{\lda} for details. /// -template -inline void lskges( +template +void lskges( blas::Layout layout, blas::Op opS, blas::Op opA, int64_t d, // B is d-by-n int64_t n, // \op(A) is m-by-n - int64_t m, // \op(S) is d-by-m + int64_t m, // \op(\mtxS) is d-by-m T alpha, - SKOP &S, + SparseSkOp &S, int64_t ro_s, int64_t co_s, const T *A, @@ -500,12 +478,14 @@ inline void lskges( T *B, int64_t ldb ) { - if (!S.known_filled) - fill_sparse(S); + if (S.nnz < 0) { + SparseSkOp shallowcopy(S.dist, S.seed_state); // shallowcopy.own_memory = true. + fill_sparse(shallowcopy); + lskges(layout, opS, opA, d, n, m, alpha, shallowcopy, ro_s, co_s, A, lda, beta, B, ldb); + return; + } auto Scoo = coo_view_of_skop(S); - left_spmm( - layout, opS, opA, d, n, m, alpha, Scoo, ro_s, co_s, A, lda, beta, B, ldb - ); + left_spmm(layout, opS, opA, d, n, m, alpha, Scoo, ro_s, co_s, A, lda, beta, B, ldb); return; } @@ -516,9 +496,9 @@ inline void lskges( /// RSKGES: Perform a GEMM-like operation /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(\mat(A))}_{m \times n} \cdot \underbrace{\op(\submat(S))}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\op(\mat(A))}_{m \times n} \cdot \underbrace{\op(\submat(\mtxS))}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} /// @endverbatim -/// where \math{\alpha} and \math{\beta} are real scalars, \math{\op(X)} either returns a matrix \math{X} +/// where \math{\alpha} and \math{\beta} are real scalars, \math{\op(\mtxX)} either returns a matrix \math{\mtxX} /// or its transpose, and \math{S} is a sparse sketching operator.
/// /// @verbatim embed:rst:leading-slashes @@ -527,11 +507,11 @@ inline void lskges( /// Their precise contents are determined by :math:`(A, \lda)`, :math:`(B, \ldb)`, /// and "layout", following the same convention as BLAS. /// -/// What is :math:`\submat(S)`? +/// What is :math:`\submat(\mtxS)`? /// Its shape is defined implicitly by :math:`(\opS, n, d)`. -/// If :math:`{\submat(S)}` is of shape :math:`r \times c`, -/// then it is the :math:`r \times c` submatrix of :math:`{S}` whose upper-left corner -/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{S}`. +/// If :math:`{\submat(\mtxS)}` is of shape :math:`r \times c`, +/// then it is the :math:`r \times c` submatrix of :math:`{\mtxS}` whose upper-left corner +/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{\mtxS}`. /// @endverbatim /// @param[in] layout /// Layout::ColMajor or Layout::RowMajor @@ -542,8 +522,8 @@ inline void lskges( /// - If \math{\opA} == Trans, then \math{\op(\mat(A)) = \mat(A)^T}. /// /// @param[in] opS -/// - If \math{\opS} = NoTrans, then \math{ \op(\submat(S)) = \submat(S)}. -/// - If \math{\opS} = Trans, then \math{\op(\submat(S)) = \submat(S)^T }. +/// - If \math{\opS} = NoTrans, then \math{ \op(\submat(\mtxS)) = \submat(\mtxS)}. +/// - If \math{\opS} = Trans, then \math{\op(\submat(\mtxS)) = \submat(\mtxS)^T }. /// /// @param[in] m /// A nonnegative integer. @@ -558,7 +538,7 @@ inline void lskges( /// @param[in] n /// A nonnegative integer. /// - The number of columns in \math{\op(\mat(A))} -/// - The number of rows in \math{\op(\submat(S))}. +/// - The number of rows in \math{\op(\submat(\mtxS))}. /// /// @param[in] alpha /// A real scalar. @@ -574,29 +554,29 @@ inline void lskges( /// * If layout == ColMajor, then /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(A)[i, j] = A[i + j \cdot \lda]. +/// \mat(A)_{ij} = A[i + j \cdot \lda]. 
/// @endverbatim /// In this case, \math{\lda} must be \math{\geq} the length of a column in \math{\mat(A)}. /// * If layout == RowMajor, then /// @verbatim embed:rst:leading-slashes /// .. math:: -/// \mat(A)[i, j] = A[i \cdot \lda + j]. +/// \mat(A)_{ij} = A[i \cdot \lda + j]. /// @endverbatim /// In this case, \math{\lda} must be \math{\geq} the length of a row in \math{\mat(A)}. /// /// @param[in] S /// A SparseSkOp object. -/// - Defines \math{\submat(S)}. +/// - Defines \math{\submat(\mtxS)}. /// /// @param[in] ro_s /// A nonnegative integer. -/// - The rows of \math{\submat(S)} are a contiguous subset of rows of \math{S}. -/// - The rows of \math{\submat(S)} start at \math{S[\texttt{ro_s}, :]}. +/// - The rows of \math{\submat(\mtxS)} are a contiguous subset of rows of \math{\mtxS}. +/// - The rows of \math{\submat(\mtxS)} start at \math{\mtxS[\texttt{ro_s}, :]}. /// /// @param[in] co_s /// A nonnegative integer. -/// - The columns of \math{\submat(S)} are a contiguous subset of columns of \math{S}. -/// - The columns \math{\submat(S)} start at \math{S[:,\texttt{co_s}]}. +/// - The columns of \math{\submat(\mtxS)} are a contiguous subset of columns of \math{\mtxS}. +/// - The columns of \math{\submat(\mtxS)} start at \math{\mtxS[:,\texttt{co_s}]}. /// /// @param[in] beta /// A real scalar. @@ -613,7 +593,7 @@ inline void lskges( /// - Leading dimension of \math{\mat(B)} when reading from \math{B}. /// - Refer to documentation for \math{\lda} for details. /// -template +template inline void rskges( blas::Layout layout, blas::Op opA, @@ -624,15 +604,18 @@ inline void rskges( T alpha, const T *A, int64_t lda, - SKOP &S, + SparseSkOp &S, int64_t ro_s, int64_t co_s, T beta, T *B, int64_t ldb ) { - if (!S.known_filled) - fill_sparse(S); + if (S.nnz < 0) { + SparseSkOp shallowcopy(S.dist, S.seed_state); // shallowcopy.own_memory = true.
+ fill_sparse(shallowcopy); + rskges(layout, opA, opS, m, d, n, alpha, A, lda, shallowcopy, ro_s, co_s, beta, B, ldb); + return; + } auto Scoo = coo_view_of_skop(S); right_spmm( layout, opA, opS, m, d, n, alpha, A, lda, Scoo, ro_s, co_s, beta, B, ldb @@ -659,10 +642,10 @@ using namespace RandBLAS::sparse; /// Sketch from the left in a GEMM-like operation /// /// .. math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(S))}_{d \times m} \cdot \underbrace{\op(\mat(A))}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(\mtxS))}_{d \times m} \cdot \underbrace{\op(\mat(A))}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(X)` either returns a matrix :math:`X` -/// or its transpose, and :math:`S` is a sketching operator. +/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(\mtxX)` either returns a matrix :math:`\mtxX` +/// or its transpose, and :math:`\mtxS` is a sketching operator. /// /// .. dropdown:: FAQ /// :animate: fade-in-slide-down @@ -676,24 +659,24 @@ using namespace RandBLAS::sparse; /// If layout == ColMajor, then /// /// .. math:: -/// \mat(A)[i, j] = A[i + j \cdot \lda]. +/// \mat(A)_{ij} = A[i + j \cdot \lda]. /// /// In this case, :math:`\lda` must be :math:`\geq` the length of a column in :math:`\mat(A).` /// /// If layout == RowMajor, then /// /// .. math:: -/// \mat(A)[i, j] = A[i \cdot \lda + j]. +/// \mat(A)_{ij} = A[i \cdot \lda + j].
/// /// In this case, :math:`\lda` must be :math:`\geq` the length of a row in :math:`\mat(A).` /// -/// **What is** :math:`\submat(S)` **?** +/// **What is** :math:`\submat(\mtxS)` **?** /// /// Its shape is defined implicitly by :math:`(\opS, d, m).` /// -/// If :math:`{\submat(S)}` is of shape :math:`r \times c,` -/// then it is the :math:`r \times c` submatrix of :math:`{S}` whose upper-left corner -/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{S}.` +/// If :math:`{\submat(\mtxS)}` is of shape :math:`r \times c,` +/// then it is the :math:`r \times c` submatrix of :math:`{\mtxS}` whose upper-left corner +/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{\mtxS}.` /// /// .. dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -704,8 +687,8 @@ using namespace RandBLAS::sparse; /// /// opS - [in] /// * Either Op::Trans or Op::NoTrans. -/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(S)) = \submat(S).` -/// * If :math:`\opS` = Trans, then :math:`\op(\submat(S)) = \submat(S)^T.` +/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS).` +/// * If :math:`\opS` = Trans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS)^T.` /// /// opA - [in] /// * If :math:`\opA` == NoTrans, then :math:`\op(\mat(A)) = \mat(A).` @@ -714,7 +697,7 @@ using namespace RandBLAS::sparse; /// d - [in] /// * A nonnegative integer. /// * The number of rows in :math:`\mat(B)` -/// * The number of rows in :math:`\op(\submat(S)).` +/// * The number of rows in :math:`\op(\submat(\mtxS)).` /// /// n - [in] /// * A nonnegative integer. @@ -723,7 +706,7 @@ using namespace RandBLAS::sparse; /// /// m - [in] /// * A nonnegative integer. 
-/// * The number of columns in :math:`\op(\submat(S))` +/// * The number of columns in :math:`\op(\submat(\mtxS))` /// * The number of rows in :math:`\op(\mat(A)).` /// /// alpha - [in] @@ -732,17 +715,17 @@ using namespace RandBLAS::sparse; /// /// S - [in] /// * A DenseSkOp or SparseSkOp object. -/// * Defines :math:`\submat(S).` +/// * Defines :math:`\submat(\mtxS).` /// /// ro_s - [in] /// * A nonnegative integer. -/// * The rows of :math:`\submat(S)` are a contiguous subset of rows of :math:`S.` -/// * The rows of :math:`\submat(S)` start at :math:`S[\texttt{ro_s}, :].` +/// * The rows of :math:`\submat(\mtxS)` are a contiguous subset of rows of :math:`\mtxS.` +/// * The rows of :math:`\submat(\mtxS)` start at :math:`\mtxS[\texttt{ro_s}, :].` /// /// co_s - [in] /// * A nonnegative integer. -/// * The columns of :math:`\submat(S)` are a contiguous subset of columns of :math:`S.` -/// * The columns of :math:`\submat(S)` start at :math:`S[:,\texttt{co_s}].` +/// * The columns of :math:`\submat(\mtxS)` are a contiguous subset of columns of :math:`\mtxS.` +/// * The columns of :math:`\submat(\mtxS)` start at :math:`\mtxS[:,\texttt{co_s}].` /// /// A - [in] /// * Pointer to a 1D array of real scalars.
@@ -768,14 +751,14 @@ using namespace RandBLAS::sparse; /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B.` /// /// @endverbatim -template +template inline void sketch_general( blas::Layout layout, blas::Op opS, blas::Op opA, int64_t d, // B is d-by-n int64_t n, // op(A) is m-by-n - int64_t m, // op(submat(S)) is d-by-m + int64_t m, // op(submat(\mtxS)) is d-by-m T alpha, SKOP &S, int64_t ro_s, @@ -794,7 +777,7 @@ inline void sketch_general( blas::Op opA, int64_t d, // B is d-by-n int64_t n, // op(A) is m-by-n - int64_t m, // op(submat(S)) is d-by-m + int64_t m, // op(submat(\mtxS)) is d-by-m T alpha, SparseSkOp &S, int64_t ro_s, @@ -818,7 +801,7 @@ inline void sketch_general( blas::Op opA, int64_t d, // B is d-by-n int64_t n, // op(A) is m-by-n - int64_t m, // op(submat(S)) is d-by-m + int64_t m, // op(submat(\mtxS)) is d-by-m T alpha, DenseSkOp &S, int64_t ro_s, @@ -845,10 +828,10 @@ inline void sketch_general( /// Sketch from the right in a GEMM-like operation /// /// .. math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(\mat(A))}_{m \times n} \cdot \underbrace{\op(\submat(S))}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\op(\mat(A))}_{m \times n} \cdot \underbrace{\op(\submat(\mtxS))}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(X)` either returns a matrix :math:`X` -/// or its transpose, and :math:`S` is a sketching operator. +/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(\mtxX)` either returns a matrix :math:`\mtxX` +/// or its transpose, and :math:`\mtxS` is a sketching operator. /// /// .. 
dropdown:: FAQ /// :animate: fade-in-slide-down @@ -859,12 +842,12 @@ inline void sketch_general( /// Their precise contents are determined by :math:`(A, \lda),` :math:`(B, \ldb),` /// and "layout", following the same convention as the Level 3 BLAS function "GEMM." /// -/// **What is** :math:`\submat(S)` **?** +/// **What is** :math:`\submat(\mtxS)` **?** /// /// Its shape is defined implicitly by :math:`(\opS, n, d).` -/// If :math:`{\submat(S)}` is of shape :math:`r \times c,` -/// then it is the :math:`r \times c` submatrix of :math:`{S}` whose upper-left corner -/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{S}.` +/// If :math:`{\submat(\mtxS)}` is of shape :math:`r \times c,` +/// then it is the :math:`r \times c` submatrix of :math:`{\mtxS}` whose upper-left corner +/// appears at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{\mtxS}.` /// /// .. dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -879,8 +862,8 @@ inline void sketch_general( /// /// opS - [in] /// * Either Op::Trans or Op::NoTrans. -/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(S)) = \submat(S).` -/// * If :math:`\opS` = Trans, then :math:`\op(\submat(S)) = \submat(S)^T.` +/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS).` +/// * If :math:`\opS` = Trans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS)^T.` /// /// m - [in] /// * A nonnegative integer. @@ -890,12 +873,12 @@ inline void sketch_general( /// d - [in] /// * A nonnegative integer. /// * The number of columns in :math:`\mat(B)` -/// * The number of columns in :math:`\op(\submat(S)).` +/// * The number of columns in :math:`\op(\submat(\mtxS)).` /// /// n - [in] /// * A nonnegative integer. /// * The number of columns in :math:`\op(\mat(A)).` -/// * The number of rows in :math:`\op(\submat(S)).` +/// * The number of rows in :math:`\op(\submat(\mtxS)).` /// /// alpha - [in] /// * A real scalar. 
@@ -911,18 +894,17 @@ inline void sketch_general( /// /// S - [in] /// * A DenseSkOp or SparseSkOp object. -/// * Defines :math:`\submat(S).` -/// * Defines :math:`\submat(S).` +/// * Defines :math:`\submat(\mtxS).` /// /// ro_s - [in] /// * A nonnegative integer. -/// * The rows of :math:`\submat(S)` are a contiguous subset of rows of :math:`S.` -/// * The rows of :math:`\submat(S)` start at :math:`S[\texttt{ro_s}, :].` +/// * The rows of :math:`\submat(\mtxS)` are a contiguous subset of rows of :math:`\mtxS.` +/// * The rows of :math:`\submat(\mtxS)` start at :math:`\mtxS[\texttt{ro_s}, :].` /// /// co_s - [in] /// * A nonnegative integer. -/// * The columns of :math:`\submat(S)` are a contiguous subset of columns of :math:`S.` -/// * The columns :math:`\submat(S)` start at :math:`S[:,\texttt{co_s}].` +/// * The columns of :math:`\submat(\mtxS)` are a contiguous subset of columns of :math:`\mtxS.` +/// * The columns of :math:`\submat(\mtxS)` start at :math:`\mtxS[:,\texttt{co_s}].` /// /// beta - [in] /// * A real scalar.
@@ -940,13 +923,13 @@ inline void sketch_general( /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B.` /// /// @endverbatim -template +template inline void sketch_general( blas::Layout layout, blas::Op opA, blas::Op opS, int64_t m, // B is m-by-d - int64_t d, // op(submat(S)) is n-by-d + int64_t d, // op(submat(\mtxS)) is n-by-d int64_t n, // op(A) is m-by-n T alpha, const T *A, @@ -965,7 +948,7 @@ inline void sketch_general( blas::Op opA, blas::Op opS, int64_t m, // B is m-by-d - int64_t d, // op(submat(S)) is n-by-d + int64_t d, // op(submat(\mtxS)) is n-by-d int64_t n, // op(A) is m-by-n T alpha, const T *A, @@ -989,7 +972,7 @@ inline void sketch_general( blas::Op opA, blas::Op opS, int64_t m, // B is m-by-d - int64_t d, // op(submat(S)) is n-by-d + int64_t d, // op(submat(\mtxS)) is n-by-d int64_t n, // op(A) is m-by-n T alpha, const T *A, @@ -1017,10 +1000,10 @@ inline void sketch_general( /// Sketch from the left in a GEMM-like operation /// /// .. math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(S)}_{d \times m} \cdot \underbrace{\op(\mat(A))}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\op(\mtxS)}_{d \times m} \cdot \underbrace{\op(\mat(A))}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(X)` either returns a matrix :math:`X` -/// or its transpose, and :math:`S` is a sketching operator. +/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(\mtxX)` either returns a matrix :math:`\mtxX` +/// or its transpose, and :math:`\mtxS` is a sketching operator. /// /// .. dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -1031,8 +1014,8 @@ inline void sketch_general( /// /// opS - [in] /// * Either Op::Trans or Op::NoTrans. 
-/// * If :math:`\opS` = NoTrans, then :math:`\op(S) = S.` -/// * If :math:`\opS` = Trans, then :math:`\op(S) = S^T.` +/// * If :math:`\opS` = NoTrans, then :math:`\op(\mtxS) = \mtxS.` +/// * If :math:`\opS` = Trans, then :math:`\op(\mtxS) = \mtxS^T.` /// /// opA - [in] /// * If :math:`\opA` == NoTrans, then :math:`\op(\mat(A)) = \mat(A).` @@ -1050,7 +1033,7 @@ inline void sketch_general( /// /// m - [in] /// * A nonnegative integer. -/// * The number of columns in :math:`\op(S).` +/// * The number of columns in :math:`\op(\mtxS).` /// * The number of rows in :math:`\op(\mat(A)).` /// /// alpha - [in] @@ -1059,7 +1042,7 @@ inline void sketch_general( /// /// S - [in] /// * A DenseSkOp or SparseSkOp object. -/// * Defines :math:`\submat(S).` +/// * Defines :math:`\submat(\mtxS).` /// /// A - [in] /// * Pointer to a 1D array of real scalars. @@ -1085,7 +1068,7 @@ inline void sketch_general( /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B.` /// /// @endverbatim -template +template inline void sketch_general( blas::Layout layout, blas::Op opS, @@ -1102,11 +1085,11 @@ inline void sketch_general( int64_t ldb ) { if (opS == blas::Op::NoTrans) { - randblas_require(S.dist.n_rows == d); - randblas_require(S.dist.n_cols == m); + randblas_require(S.n_rows == d); + randblas_require(S.n_cols == m); } else { - randblas_require(S.dist.n_rows == m); - randblas_require(S.dist.n_cols == d); + randblas_require(S.n_rows == m); + randblas_require(S.n_cols == d); } return sketch_general(layout, opS, opA, d, n, m, alpha, S, 0, 0, A, lda, beta, B, ldb); }; @@ -1120,10 +1103,10 @@ inline void sketch_general( /// Sketch from the right in a GEMM-like operation /// /// .. 
math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(\mat(A))}_{m \times n} \cdot \underbrace{\op(S)}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\op(\mat(A))}_{m \times n} \cdot \underbrace{\op(\mtxS)}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(X)` either returns a matrix :math:`X` -/// or its transpose, and :math:`S` is a sketching operator. +/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(\mtxX)` either returns a matrix :math:`\mtxX` +/// or its transpose, and :math:`\mtxS` is a sketching operator. /// /// .. dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -1138,8 +1121,8 @@ inline void sketch_general( /// /// opS - [in] /// * Either Op::Trans or Op::NoTrans. -/// * If :math:`\opS` = NoTrans, then :math:`\op(S) = S.` -/// * If :math:`\opS` = Trans, then :math:`\op(S) = S^T.` +/// * If :math:`\opS` = NoTrans, then :math:`\op(\mtxS) = \mtxS.` +/// * If :math:`\opS` = Trans, then :math:`\op(\mtxS) = \mtxS^T.` /// /// m - [in] /// * A nonnegative integer. @@ -1154,7 +1137,7 @@ inline void sketch_general( /// n - [in] /// * A nonnegative integer. /// * The number of columns in :math:`\op(\mat(A)).` -/// * The number of rows in :math:`\op(S).` +/// * The number of rows in :math:`\op(\mtxS).` /// /// alpha - [in] /// * A real scalar. 
@@ -1187,7 +1170,7 @@ inline void sketch_general( /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B.` /// /// @endverbatim -template +template inline void sketch_general( blas::Layout layout, blas::Op opA, @@ -1204,11 +1187,11 @@ inline void sketch_general( int64_t ldb ) { if (opS == blas::Op::NoTrans) { - randblas_require(S.dist.n_rows == n); - randblas_require(S.dist.n_cols == d); + randblas_require(S.n_rows == n); + randblas_require(S.n_cols == d); } else { - randblas_require(S.dist.n_rows == d); - randblas_require(S.dist.n_cols == n); + randblas_require(S.n_rows == d); + randblas_require(S.n_cols == n); } return sketch_general(layout, opA, opS, m, d, n, alpha, A, lda, S, 0, 0, beta, B, ldb); }; diff --git a/RandBLAS/sksy.hh b/RandBLAS/sksy.hh index 63c0954d..7fa49191 100644 --- a/RandBLAS/sksy.hh +++ b/RandBLAS/sksy.hh @@ -38,13 +38,6 @@ namespace RandBLAS { using namespace RandBLAS::dense; using namespace RandBLAS::sparse; -/* Intended macro definitions. - - .. |mat| mathmacro:: \operatorname{mat} - .. |submat| mathmacro:: \operatorname{submat} - .. |lda| mathmacro:: \texttt{lda} - .. |ldb| mathmacro:: \texttt{ldb} -*/ // MARK: SUBMAT(S) @@ -58,9 +51,9 @@ using namespace RandBLAS::sparse; /// Check that :math:`\mat(A)` is symmetric up to tolerance :math:`\texttt{sym_check_tol}`, then sketch from the right in a SYMM-like operation /// /// .. math:: -/// \mat(B) = \alpha \cdot \underbrace{\mat(A)}_{n \times n} \cdot \underbrace{\submat(S)}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{n \times d}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\mat(A)}_{n \times n} \cdot \underbrace{\submat(\mtxS)}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{n \times d}, \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`S` is a sketching operator. +/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`\mtxS` is a sketching operator. /// /// .. 
dropdown:: FAQ /// :animate: fade-in-slide-down @@ -71,7 +64,7 @@ using namespace RandBLAS::sparse; /// according to /// /// .. math:: -/// \mat(A)[i, j] = A[i + j \cdot \lda] = A[i \cdot \lda + j]. +/// \mat(A)_{ij} = A[i + j \cdot \lda] = A[i \cdot \lda + j]. /// /// Note that the "layout" parameter passed to this function is not used here. /// That's because this function requires :math:`\mat(A)` to be stored in the format @@ -88,21 +81,21 @@ using namespace RandBLAS::sparse; /// If layout == ColMajor, then /// /// .. math:: -/// \mat(B)[i, j] = B[i + j \cdot \ldb]. +/// \mat(B)_{ij} = B[i + j \cdot \ldb]. /// /// In this case, :math:`\ldb` must be :math:`\geq n.` /// /// If layout == RowMajor, then /// /// .. math:: -/// \mat(B)[i, j] = B[i \cdot \ldb + j]. +/// \mat(B)_{ij} = B[i \cdot \ldb + j]. /// /// In this case, :math:`\ldb` must be :math:`\geq d.` /// -/// **What is** :math:`\submat(S)` **?** +/// **What is** :math:`\submat(\mtxS)` **?** /// -/// It's the :math:`n \times d` submatrix of :math:`{S}` whose upper-left corner appears -/// at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{S}.` +/// It's the :math:`n \times d` submatrix of :math:`{\mtxS}` whose upper-left corner appears +/// at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{\mtxS}.` /// /// .. dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -118,7 +111,7 @@ using namespace RandBLAS::sparse; /// /// d - [in] /// * A nonnegative integer. -/// * The number of columns in :math:`\mat(B)` and :math:`\submat(S).` +/// * The number of columns in :math:`\mat(B)` and :math:`\submat(\mtxS).` /// /// alpha - [in] /// * A real scalar. @@ -134,17 +127,17 @@ using namespace RandBLAS::sparse; /// /// S - [in] /// * A DenseSkOp or SparseSkOp object. -/// * Defines :math:`\submat(S).` +/// * Defines :math:`\submat(\mtxS).` /// /// ro_s - [in] /// * A nonnegative integer.
-/// * The rows of :math:`\submat(S)` are a contiguous subset of rows of :math:`S.` -/// * The rows of :math:`\submat(S)` start at :math:`S[\texttt{ro_s}, :].` +/// * The rows of :math:`\submat(\mtxS)` are a contiguous subset of rows of :math:`S.` +/// * The rows of :math:`\submat(\mtxS)` start at :math:`S[\texttt{ro_s}, :].` /// /// co_s - [in] /// * A nonnnegative integer. -/// * The columns of :math:`\submat(S)` are a contiguous subset of columns of :math:`S.` -/// * The columns of :math:`\submat(S)` start at :math:`S[:,\texttt{co_s}].` +/// * The columns of :math:`\submat(\mtxS)` are a contiguous subset of columns of :math:`S.` +/// * The columns of :math:`\submat(\mtxS)` start at :math:`S[:,\texttt{co_s}].` /// /// beta - [in] /// * A real scalar. @@ -162,7 +155,7 @@ using namespace RandBLAS::sparse; /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B.` /// /// @endverbatim -template +template inline void sketch_symmetric( // B = alpha*A*S + beta*B, where A is a symmetric matrix stored in the format of a general matrix. blas::Layout layout, @@ -193,9 +186,9 @@ inline void sketch_symmetric( /// Check that :math:`\mat(A)` is symmetric up to tolerance :math:`\texttt{sym_check_tol}`, then sketch from the left in a SYMM-like operation /// /// .. math:: -/// \mat(B) = \alpha \cdot \underbrace{\submat(S)}_{d \times n} \cdot \underbrace{\mat(A)}_{n \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\submat(\mtxS)}_{d \times n} \cdot \underbrace{\mat(A)}_{n \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`S` is a sketching operator. +/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`\mtxS` is a sketching operator. /// /// .. dropdown:: FAQ /// :animate: fade-in-slide-down @@ -206,7 +199,7 @@ inline void sketch_symmetric( /// according to /// /// .. 
math:: -/// \mat(A)[i, j] = A[i + j \cdot \lda] = A[i \cdot \lda + j]. +/// \mat(A)_{ij} = A[i + j \cdot \lda] = A[i \cdot \lda + j]. /// /// Note that the the "layout" parameter passed to this function is not used here. /// That's because this function requires :math:`\mat(A)` to be stored in the format @@ -223,21 +216,21 @@ inline void sketch_symmetric( /// If layout == ColMajor, then /// /// .. math:: -/// \mat(B)[i, j] = B[i + j \cdot \ldb]. +/// \mat(B)_{ij} = B[i + j \cdot \ldb]. /// /// In this case, :math:`\ldb` must be :math:`\geq d.` /// /// If layout == RowMajor, then /// /// .. math:: -/// \mat(B)[i, j] = B[i \cdot \ldb + j]. +/// \mat(B)_{ij} = B[i \cdot \ldb + j]. /// /// In this case, :math:`\ldb` must be :math:`\geq n.` /// -/// **What is** :math:`\submat(S)` **?** +/// **What is** :math:`\submat(\mtxS)` **?** /// -/// It's the :math:`d \times n` submatrix of :math:`{S}` whose upper-left corner appears -/// at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{S}.` +/// It's the :math:`d \times n` submatrix of :math:`{\mtxS}` whose upper-left corner appears +/// at index :math:`(\texttt{ro_s}, \texttt{co_s})` of :math:`{\mtxS}.` /// /// .. dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -248,7 +241,7 @@ inline void sketch_symmetric( /// /// d - [in] /// * A nonnegative integer. -/// * The number of rows in :math:`\mat(B)` and :math:`\submat(S).` +/// * The number of rows in :math:`\mat(B)` and :math:`\submat(\mtxS).` /// /// n - [in] /// * A nonnegative integer. @@ -261,17 +254,17 @@ inline void sketch_symmetric( /// /// S - [in] /// * A DenseSkOp or SparseSkOp object. -/// * Defines :math:`\submat(S).` +/// * Defines :math:`\submat(\mtxS).` /// /// ro_s - [in] /// * A nonnegative integer. 
-/// * The rows of :math:`\submat(S)` are a contiguous subset of rows of :math:`S.` -/// * The rows of :math:`\submat(S)` start at :math:`S[\texttt{ro_s}, :].` +/// * The rows of :math:`\submat(\mtxS)` are a contiguous subset of rows of :math:`S.` +/// * The rows of :math:`\submat(\mtxS)` start at :math:`S[\texttt{ro_s}, :].` /// /// co_s - [in] /// * A nonnnegative integer. -/// * The columns of :math:`\submat(S)` are a contiguous subset of columns of :math:`S.` -/// * The columns of :math:`\submat(S)` start at :math:`S[:,\texttt{co_s}].` +/// * The columns of :math:`\submat(\mtxS)` are a contiguous subset of columns of :math:`S.` +/// * The columns of :math:`\submat(\mtxS)` start at :math:`S[:,\texttt{co_s}].` /// /// A - [in] /// * Pointer to a 1D array of real scalars. @@ -297,7 +290,7 @@ inline void sketch_symmetric( /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B.` /// /// @endverbatim -template +template inline void sketch_symmetric( // B = alpha*S*A + beta*B blas::Layout layout, @@ -329,9 +322,9 @@ inline void sketch_symmetric( /// Check that :math:`\mat(A)` is symmetric up to tolerance :math:`\texttt{sym_check_tol}`, then sketch from the right in a SYMM-like operation /// /// .. math:: -/// \mat(B) = \alpha \cdot \underbrace{\mat(A)}_{n \times n} \cdot S + \beta \cdot \underbrace{\mat(B)}_{n \times d}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\mat(A)}_{n \times n} \cdot \mtxS + \beta \cdot \underbrace{\mat(B)}_{n \times d}, \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`S` is an :math:`n \times d` sketching operator. +/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`\mtxS` is an :math:`n \times d` sketching operator. /// /// .. dropdown:: FAQ /// :animate: fade-in-slide-down @@ -342,7 +335,7 @@ inline void sketch_symmetric( /// Its precise contents depend on :math:`(A, \lda)`, according to /// /// .. 
math:: -/// \mat(A)[i, j] = A[i + j \cdot \lda] = A[i \cdot \lda + j]. +/// \mat(A)_{ij} = A[i + j \cdot \lda] = A[i \cdot \lda + j]. /// /// Note that the the "layout" parameter passed to this function is not used here. /// That's because this function requires :math:`\mat(A)` to be stored in the format @@ -361,14 +354,14 @@ inline void sketch_symmetric( /// If layout == ColMajor, then /// /// .. math:: -/// \mat(B)[i, j] = B[i + j \cdot \ldb]. +/// \mat(B)_{ij} = B[i + j \cdot \ldb]. /// /// In this case, :math:`\ldb` must be :math:`\geq n.` /// /// If layout == RowMajor, then /// /// .. math:: -/// \mat(B)[i, j] = B[i \cdot \ldb + j]. +/// \mat(B)_{ij} = B[i \cdot \ldb + j]. /// /// In this case, :math:`\ldb` must be :math:`\geq d.` /// @@ -410,7 +403,7 @@ inline void sketch_symmetric( /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B.` /// /// @endverbatim -template +template inline void sketch_symmetric( // B = alpha*A*S + beta*B, where A is a symmetric matrix stored in the format of a general matrix. blas::Layout layout, @@ -438,9 +431,9 @@ inline void sketch_symmetric( /// Check that :math:`\mat(A)` is symmetric up to tolerance :math:`\texttt{sym_check_tol}`, then sketch from the left in a SYMM-like operation /// /// .. math:: -/// \mat(B) = \alpha \cdot S \cdot \underbrace{\mat(A)}_{n \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \mtxS \cdot \underbrace{\mat(A)}_{n \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`S` is a :math:`d \times n` sketching operator. +/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`\mtxS` is a :math:`d \times n` sketching operator. /// /// .. dropdown:: FAQ /// :animate: fade-in-slide-down @@ -451,7 +444,7 @@ inline void sketch_symmetric( /// according to /// /// .. math:: -/// \mat(A)[i, j] = A[i + j \cdot \lda] = A[i \cdot \lda + j]. 
+/// \mat(A)_{ij} = A[i + j \cdot \lda] = A[i \cdot \lda + j]. /// /// Note that the the "layout" parameter passed to this function is not used here. /// That's because this function requires :math:`\mat(A)` to be stored in the format @@ -468,14 +461,14 @@ inline void sketch_symmetric( /// If layout == ColMajor, then /// /// .. math:: -/// \mat(B)[i, j] = B[i + j \cdot \ldb]. +/// \mat(B)_{ij} = B[i + j \cdot \ldb]. /// /// In this case, :math:`\ldb` must be :math:`\geq d.` /// /// If layout == RowMajor, then /// /// .. math:: -/// \mat(B)[i, j] = B[i \cdot \ldb + j]. +/// \mat(B)_{ij} = B[i \cdot \ldb + j]. /// /// In this case, :math:`\ldb` must be :math:`\geq n.` /// @@ -517,7 +510,7 @@ inline void sketch_symmetric( /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B.` /// /// @endverbatim -template +template inline void sketch_symmetric( // B = alpha*S*A + beta*B blas::Layout layout, diff --git a/RandBLAS/skve.hh b/RandBLAS/skve.hh index 9657c23f..fcdcb806 100644 --- a/RandBLAS/skve.hh +++ b/RandBLAS/skve.hh @@ -48,18 +48,6 @@ using namespace RandBLAS::dense; using namespace RandBLAS::sparse; -/* Intended macro definitions. - - .. |op| mathmacro:: \operatorname{op} - .. |mat| mathmacro:: \operatorname{mat} - .. |submat| mathmacro:: \operatorname{submat} - .. |lda| mathmacro:: \texttt{lda} - .. |ldb| mathmacro:: \texttt{ldb} - .. |opA| mathmacro:: \texttt{opA} - .. |opS| mathmacro:: \texttt{opS} -*/ - - // MARK: SUBMAT(S) // ============================================================================= @@ -70,14 +58,14 @@ using namespace RandBLAS::sparse; /// Perform a GEMV-like operation. If :math:`{\opS} = \texttt{NoTrans},` then we perform /// /// .. 
math:: -/// \mat(y) = \alpha \cdot \underbrace{\submat(S)}_{d \times m} \cdot \underbrace{\mat(x)}_{m \times 1} + \beta \cdot \underbrace{\mat(y)}_{d \times 1}, \tag{$\star$} +/// \mat(y) = \alpha \cdot \underbrace{\submat(\mtxS)}_{d \times m} \cdot \underbrace{\mat(x)}_{m \times 1} + \beta \cdot \underbrace{\mat(y)}_{d \times 1}, \tag{$\star$} /// /// otherwise, we perform /// /// .. math:: -/// \mat(y) = \alpha \cdot \underbrace{\submat(S)^T}_{m \times d} \cdot \underbrace{\mat(x)}_{d \times 1} + \beta \cdot \underbrace{\mat(y)}_{m \times 1}, \tag{$\diamond$} +/// \mat(y) = \alpha \cdot \underbrace{\submat(\mtxS)^T}_{m \times d} \cdot \underbrace{\mat(x)}_{d \times 1} + \beta \cdot \underbrace{\mat(y)}_{m \times 1}, \tag{$\diamond$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`S` is a sketching operator. +/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`\mtxS` is a sketching operator. /// /// .. dropdown:: FAQ /// :animate: fade-in-slide-down @@ -85,7 +73,7 @@ using namespace RandBLAS::sparse; /// **What are** :math:`\mat(x)` **and** :math:`\mat(y)` **?** /// /// They are vectors of shapes :math:`(\mat(x), L_x \times 1)` and :math:`(\mat(y), L_y \times 1),` -/// where :math:`(L_x, L_y)` are lengths so that :math:`\opS(\submat(S)) \mat(x)` is well-defined and the same shape as :math:`\mat(y).` +/// where :math:`(L_x, L_y)` are lengths so that :math:`\opS(\submat(\mtxS)) \mat(x)` is well-defined and the same shape as :math:`\mat(y).` /// Their precise contents are determined in a way that is identical to GEMV from BLAS. /// /// **Why no "layout" argument?** @@ -98,16 +86,16 @@ using namespace RandBLAS::sparse; /// /// opS - [in] /// * Either Op::Trans or Op::NoTrans. 
-/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(S)) = \submat(S).` -/// * If :math:`\opS` = Trans, then :math:`\op(\submat(S)) = \submat(S)^T.` +/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS).` +/// * If :math:`\opS` = Trans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS)^T.` /// /// d - [in] /// * A nonnegative integer. -/// * The number of rows in :math:`\submat(S).` +/// * The number of rows in :math:`\submat(\mtxS).` /// /// m - [in] /// * A nonnegative integer. -/// * The number of columns in :math:`\submat(S).` +/// * The number of columns in :math:`\submat(\mtxS).` /// /// alpha - [in] /// * A real scalar. @@ -115,15 +103,15 @@ using namespace RandBLAS::sparse; /// /// S - [in] /// * A DenseSkOp or SparseSkOp object. -/// * Defines :math:`\submat(S).` +/// * Defines :math:`\submat(\mtxS).` /// /// ro_s - [in] /// * A nonnegative integer. -/// * :math:`\submat(S)` is a contiguous submatrix of :math:`S[\texttt{ro_s}:(\texttt{ro_s} + d), :].` +/// * :math:`\submat(\mtxS)` is a contiguous submatrix of :math:`S[\texttt{ro_s}:(\texttt{ro_s} + d), :].` /// /// co_s - [in] /// * A nonnegative integer. -/// * :math:`\submat(S)` is a contiguous submatrix of :math:`S[:,\texttt{co_s}:(\texttt{co_s} + m)].` +/// * :math:`\submat(\mtxS)` is a contiguous submatrix of :math:`S[:,\texttt{co_s}:(\texttt{co_s} + m)].` /// /// x - [in] /// * Pointer to a 1D array of real scalars. @@ -149,11 +137,11 @@ using namespace RandBLAS::sparse; /// * Stride between elements of y. /// /// @endverbatim -template +template inline void sketch_vector( blas::Op opS, - int64_t d, // rows in submat(S) - int64_t m, // cols in submat(S) + int64_t d, // rows in submat(\mtxS) + int64_t m, // cols in submat(\mtxS) T alpha, SKOP &S, int64_t ro_s, @@ -185,9 +173,9 @@ inline void sketch_vector( /// Perform a GEMV-like operation: /// /// .. 
math:: -/// \mat(y) = \alpha \cdot \op(S) \cdot \mat(x) + \beta \cdot \mat(y), \tag{$\star$} +/// \mat(y) = \alpha \cdot \op(\mtxS) \cdot \mat(x) + \beta \cdot \mat(y), \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`S` is a sketching operator. +/// where :math:`\alpha` and :math:`\beta` are real scalars and :math:`\mtxS` is a sketching operator. /// /// .. dropdown:: FAQ /// :animate: fade-in-slide-down @@ -195,7 +183,7 @@ inline void sketch_vector( /// **What are** :math:`\mat(x)` **and** :math:`\mat(y)` **?** /// /// They are vectors of shapes :math:`(\mat(x), L_x \times 1)` and :math:`(\mat(y), L_y \times 1),` -/// where :math:`(L_x, L_y)` are lengths so that :math:`\opS(S) \mat(x)` is well-defined and the same shape as :math:`\mat(y).` +/// where :math:`(L_x, L_y)` are lengths so that :math:`\opS(\mtxS) \mat(x)` is well-defined and the same shape as :math:`\mat(y).` /// Their precise contents are determined in a way that is identical to GEMV from BLAS. /// /// **Why no "layout" argument?** @@ -208,8 +196,8 @@ inline void sketch_vector( /// /// opS - [in] /// * Either Op::Trans or Op::NoTrans. -/// * If :math:`\opS` = NoTrans, then :math:`\op(S) = S.` -/// * If :math:`\opS` = Trans, then :math:`\op(S) = S^T.` +/// * If :math:`\opS` = NoTrans, then :math:`\op(\mtxS) = \mtxS.` +/// * If :math:`\opS` = Trans, then :math:`\op(\mtxS) = \mtxS^T.` /// /// alpha - [in] /// * A real scalar. @@ -241,7 +229,7 @@ inline void sketch_vector( /// * Stride between elements of y. 
/// /// @endverbatim -template +template inline void sketch_vector( blas::Op opS, T alpha, diff --git a/RandBLAS/sparse_data/base.hh b/RandBLAS/sparse_data/base.hh index fa9abafe..e2fad4a4 100644 --- a/RandBLAS/sparse_data/base.hh +++ b/RandBLAS/sparse_data/base.hh @@ -36,13 +36,13 @@ namespace RandBLAS::sparse_data { -enum class IndexBase : char { +enum class IndexBase : int { // --------------------------------------------------------------- // zero-based indexing - Zero = 'Z', + Zero = 0, // --------------------------------------------------------------- // one-based indexing - One = 'O' + One = 1 }; template @@ -65,7 +65,7 @@ int64_t nnz_in_dense( return nnz; } -template +template static inline void sorted_nonzero_locations_to_pointer_array( int64_t nnz, sint_t *sorted, // length at least max(nnz, last_ptr_index + 1) @@ -96,36 +96,35 @@ static inline void sorted_nonzero_locations_to_pointer_array( // nomincally public members like A._n_rows and A._n_cols, which the user will only change // at their own peril. +#ifdef __cpp_concepts // ============================================================================= /// @verbatim embed:rst:leading-slashes /// -/// .. NOTE: \ttt expands to \texttt (its definition is given in an rst file) +/// An object :math:`\ttt{M}` of type :math:`\ttt{SpMat}` has the following attributes. /// -/// Any object :math:`\ttt{M}` of type :math:`\ttt{SpMat}` has the following attributes. -/// -/// .. 
list-table:: -/// :widths: 25 30 40 -/// :header-rows: 1 -/// -/// * - -/// - type -/// - description -/// * - :math:`\ttt{M.n_rows}` -/// - :math:`\ttt{const int64_t}` -/// - number of rows -/// * - :math:`\ttt{M.n_cols}` -/// - :math:`\ttt{const int64_t}` -/// - number of columns -/// * - :math:`\ttt{M.nnz}` -/// - :math:`\ttt{int64_t}` -/// - number of structural nonzeros -/// * - :math:`\ttt{M.vals}` -/// - :math:`\ttt{SpMat::scalar_t *}` -/// - pointer to values of structural nonzeros -/// * - :math:`\ttt{M.own_memory}` -/// - :math:`\ttt{const bool}` -/// - A flag indicating if memory attached to :math:`\ttt{M}` should be deallocated when :math:`\ttt{M}` is deleted. -/// This flag is set automatically based on the type of constructor used for :math:`\ttt{M}.` +/// .. list-table:: +/// :widths: 25 30 40 +/// :header-rows: 1 +/// +/// * - +/// - type +/// - description +/// * - :math:`\ttt{M.n_rows}` +/// - :math:`\ttt{const int64_t}` +/// - number of rows +/// * - :math:`\ttt{M.n_cols}` +/// - :math:`\ttt{const int64_t}` +/// - number of columns +/// * - :math:`\ttt{M.nnz}` +/// - :math:`\ttt{int64_t}` +/// - number of structural nonzeros +/// * - :math:`\ttt{M.vals}` +/// - :math:`\ttt{SpMat::scalar_t *}` +/// - pointer to values of structural nonzeros +/// * - :math:`\ttt{M.own_memory}` +/// - :math:`\ttt{bool}` +/// - A flag indicating if memory attached to :math:`\ttt{M}` should be deallocated when :math:`\ttt{M}` is deleted. +/// This flag is set automatically based on the type of constructor used for :math:`\ttt{M}.` /// /// /// **Memory-owning constructors** @@ -164,26 +163,25 @@ static inline void sorted_nonzero_locations_to_pointer_array( /// } /// } /// -/// **Non-owning constructors** +/// **View constructors** /// -/// This concept doesn't place specific requirements constructors for non-owning sparse matrix views of existing data. +/// This concept doesn't place specific requirements on constructors for sparse matrix views of existing data. 
/// However, all of RandBLAS' sparse matrix classes offer such constructors. See individual classes' /// documentation for details. /// /// @endverbatim template concept SparseMatrix = requires(SpMat A) { - // TODO: figure out why I need to use convertible_to rather than is_same. - { A.n_rows } -> std::convertible_to; - { A.n_cols } -> std::convertible_to; - { A.nnz } -> std::convertible_to; - { *(A.vals) } -> std::convertible_to; + { A.n_rows } -> std::same_as; + { A.n_cols } -> std::same_as; + { A.nnz } -> std::same_as; + { *(A.vals) } -> std::same_as; + { A.own_memory } -> std::same_as; { SpMat(A.n_rows, A.n_cols) }; - // ^ Is there better way to require a two-argument constructor? - { A.own_memory } -> std::convertible_to; - // { A.reserve((int64_t) 10) }; - // ^ Problem: const SpMat objects fail that check. }; +#else +#define SparseMatrix typename +#endif } // end namespace RandBLAS::sparse_data diff --git a/RandBLAS/sparse_data/conversions.hh b/RandBLAS/sparse_data/conversions.hh index 1db914b6..cb4958be 100644 --- a/RandBLAS/sparse_data/conversions.hh +++ b/RandBLAS/sparse_data/conversions.hh @@ -47,7 +47,7 @@ void coo_to_csc(COOMatrix &coo, CSCMatrix &csc) { randblas_require(csc.index_base == IndexBase::Zero); randblas_require(coo.index_base == IndexBase::Zero); sort_coo_data(NonzeroSort::CSC, coo); - csc.reserve(coo.nnz); + reserve_csc(coo.nnz, csc); csc.colptr[0] = 0; int64_t ell = 0; for (int64_t j = 0; j < coo.n_cols; ++j) { @@ -67,7 +67,7 @@ void csc_to_coo(CSCMatrix &csc, COOMatrix &coo) { randblas_require(csc.n_cols == coo.n_cols); randblas_require(csc.index_base == IndexBase::Zero); randblas_require(coo.index_base == IndexBase::Zero); - coo.reserve(csc.nnz); + reserve_coo(csc.nnz, coo); int64_t ell = 0; for (int64_t j = 0; j < csc.n_cols; ++j) { for (int64_t i = csc.colptr[j]; i < csc.colptr[j+1]; ++i) { @@ -88,7 +88,7 @@ void coo_to_csr(COOMatrix &coo, CSRMatrix &csr) { randblas_require(csr.index_base == IndexBase::Zero); 
randblas_require(coo.index_base == IndexBase::Zero); sort_coo_data(NonzeroSort::CSR, coo); - csr.reserve(coo.nnz); + reserve_csr(coo.nnz, csr); csr.rowptr[0] = (sint_t2) 0; int64_t ell = 0; for (int64_t i = 0; i < coo.n_rows; ++i) { @@ -108,7 +108,7 @@ void csr_to_coo(CSRMatrix &csr, COOMatrix &coo) { randblas_require(csr.n_cols == coo.n_cols); randblas_require(csr.index_base == IndexBase::Zero); randblas_require(coo.index_base == IndexBase::Zero); - coo.reserve(csr.nnz); + reserve_coo(csr.nnz, coo); int64_t ell = 0; for (int64_t i = 0; i < csr.n_rows; ++i) { for (int64_t j = csr.rowptr[i]; j < csr.rowptr[i+1]; ++j) { @@ -122,14 +122,15 @@ void csr_to_coo(CSRMatrix &csr, COOMatrix &coo) { return; } -template -CSRMatrix transpose_as_csr(CSCMatrix &A, bool share_memory = true) { +template +CSRMatrix transpose_as_csr(CSCMatrix &A, bool share_memory = true) { if (share_memory) { - CSRMatrix At(A.n_cols, A.n_rows, A.nnz, A.vals, A.colptr, A.rowidxs, A.index_base); + CSRMatrix At(A.n_cols, A.n_rows, A.nnz, A.vals, A.colptr, A.rowidxs, A.index_base); return At; } else { - CSRMatrix At(A.n_cols, A.n_rows, A.index_base); - At.reserve(A.nnz); + CSRMatrix At(A.n_cols, A.n_rows); + At.index_base = A.index_base; + reserve_csr(A.nnz, At); for (int64_t i = 0; i < A.nnz; ++i) { At.colidxs[i] = A.rowidxs[i]; At.vals[i] = A.vals[i]; @@ -140,14 +141,15 @@ CSRMatrix transpose_as_csr(CSCMatrix &A, bool share_memory = true) { } } -template -CSCMatrix transpose_as_csc(CSRMatrix &A, bool share_memory = true) { +template +CSCMatrix transpose_as_csc(CSRMatrix &A, bool share_memory = true) { if (share_memory) { - CSCMatrix At(A.n_cols, A.n_rows, A.nnz, A.vals, A.colidxs, A.rowptr, A.index_base); + CSCMatrix At(A.n_cols, A.n_rows, A.nnz, A.vals, A.colidxs, A.rowptr, A.index_base); return At; } else { - CSCMatrix At(A.n_cols, A.n_rows, A.index_base); - At.reserve(A.nnz); + CSCMatrix At(A.n_cols, A.n_rows); + At.index_base = A.index_base; + reserve_csc(A.nnz, At); for (int64_t i = 0; i < A.nnz; 
++i) { At.rowidxs[i] = A.colidxs[i]; At.vals[i] = A.vals[i]; @@ -158,8 +160,18 @@ CSCMatrix transpose_as_csc(CSRMatrix &A, bool share_memory = true) { } } -template -void reindex_inplace(CSCMatrix &A, IndexBase desired) { +/// ----------------------------------------------------------------------- +/// Given a RandBLAS SparseMatrix "A" (CSCMatrix, CSRMatrix, or COOMatrix), +/// modify its underlying data structures as necessary so that its matrix +/// elements are labeled in the "desired" IndexBase. +/// +/// Use this to convert between one-based indexing and zero-based indexing. +/// This function returns immediately if desired == A.index_base. +template +void reindex_inplace(SpMat &A, IndexBase desired); + +template +void reindex_inplace(CSCMatrix &A, IndexBase desired) { if (A.index_base == desired) return; if (A.index_base == IndexBase::One) { @@ -173,8 +185,8 @@ void reindex_inplace(CSCMatrix &A, IndexBase desired) { return; } -template -void reindex_inplace(CSRMatrix &A, IndexBase desired) { +template +void reindex_inplace(CSRMatrix &A, IndexBase desired) { if (A.index_base == desired) return; if (A.index_base == IndexBase::One) { @@ -188,8 +200,8 @@ void reindex_inplace(CSRMatrix &A, IndexBase desired) { return; } -template -void reindex_inplace(COOMatrix &A, IndexBase desired) { +template +void reindex_inplace(COOMatrix &A, IndexBase desired) { if (A.index_base == desired) return; if (A.index_base == IndexBase::One) { diff --git a/RandBLAS/sparse_data/coo_matrix.hh b/RandBLAS/sparse_data/coo_matrix.hh index 76f587f6..8adbf746 100644 --- a/RandBLAS/sparse_data/coo_matrix.hh +++ b/RandBLAS/sparse_data/coo_matrix.hh @@ -42,12 +42,19 @@ namespace RandBLAS::sparse_data { using RandBLAS::SignedInteger; // ============================================================================= -/// Indicates whether the (rows, cols, vals) data of a COO-format sparse matrix +/// Indicates whether the (vals, rows, cols) +/// data of a COO-format sparse matrix /// are known to be sorted 
in CSC order, CSR order, or neither of those orders. /// enum class NonzeroSort : char { + // --------------------------------------------------- + /// CSC = 'C', + // --------------------------------------------------- + /// CSR = 'R', + // --------------------------------------------------- + /// None = 'N' }; @@ -101,113 +108,112 @@ static inline NonzeroSort coo_sort_type(int64_t nnz, sint_t *rows, sint_t *cols) } // ============================================================================= -/// A COO-format sparse matrix that complies with the SparseMatrix concept. +/// Let \math{\mtxA} denote a sparse matrix with \math{\ttt{nnz}} structural nonzeros. +/// Its COO representation consists of declared dimensions, \math{\ttt{n_rows}} +/// and \math{\ttt{n_cols}}, as well as a triplet of arrays +/// \math{(\ttt{vals},\ttt{rows},\ttt{cols})} where +/// @verbatim embed:rst:leading-slashes +/// +/// .. math:: +/// +/// \mtxA_{\ttt{rows}[\ell],\ttt{cols}[\ell]} = \ttt{vals}[\ell] \quad\text{for all}\quad \ell \in \{0,\ldots,\ttt{nnz}-1\}. /// +/// @endverbatim +/// This type conforms to the SparseMatrix concept. template struct COOMatrix { + + // --------------------------------------------------------------------------- + /// Real scalar type used for structural nonzeros in this matrix. using scalar_t = T; + + // --------------------------------------------------------------------------- + /// Signed integer type used in the rows and cols array members. using index_t = sint_t; + + // ---------------------------------------------------------------------------- + /// The number of rows in this sparse matrix. const int64_t n_rows; + + // ---------------------------------------------------------------------------- + /// The number of columns in this sparse matrix. 
const int64_t n_cols; - const bool own_memory; - int64_t nnz = 0; + + // ---------------------------------------------------------------------------- + /// If true, then RandBLAS has permission to allocate and attach memory to the reference + /// members of this matrix (vals, rows, and cols). If true *at destruction time*, then delete [] + /// will be called on each non-null reference member of this matrix. + /// + /// RandBLAS only writes to this member at construction time. + /// + bool own_memory; + + // --------------------------------------------------------------------------- + /// The number of structural nonzeros in this matrix. + int64_t nnz; + + // --------------------------------------------------------------------------- + /// A flag to indicate whether (rows, cols) are interpreted + /// with zero-based or one-based indexing. IndexBase index_base; - T *vals = nullptr; + // --------------------------------------------------------------------------- - /// Row indicies for nonzeros. - sint_t *rows = nullptr; + /// Reference to an array that holds values for the structural nonzeros of this matrix. + /// + /// If non-null, this must have length at least nnz. + /// \internal + /// **Memory management note.** Because this length requirement is not a function of + /// only const variables, calling reserve_coo(nnz, A) on a COOMatrix "A" will raise an error + /// if A.vals is non-null. + /// \endinternal + T *vals; + // --------------------------------------------------------------------------- - /// Column indicies for nonzeros - sint_t *cols = nullptr; + /// Reference to an array that holds row indices for the structural nonzeros of this matrix. + /// + /// If non-null, this must have length at least nnz. + /// \internal + /// **Memory management note.** Because this length requirement is not a function of + /// only const variables, calling reserve_coo(nnz, A) on a COOMatrix "A" will raise an error + /// if A.rows is non-null. 
+ /// \endinternal + sint_t *rows; + // --------------------------------------------------------------------------- - /// A flag to indicate if the data in (rows, cols, vals) is sorted in a - /// CSC-like order, a CSR-like order, or neither order. - NonzeroSort sort = NonzeroSort::None; - - bool _can_reserve = true; - // ^ A flag to indicate if we're allowed to allocate new memory for - // (rows, cols, vals). Set to false after COOMatrix.reserve(...) is called. - - COOMatrix( - int64_t n_rows, - int64_t n_cols, - int64_t nnz, - T *vals, - sint_t *rows, - sint_t *cols, - bool compute_sort_type, - IndexBase index_base - ) : n_rows(n_rows), n_cols(n_cols), own_memory(false), index_base(index_base) { - this->nnz = nnz; - this->vals = vals; - this->rows = rows; - this->cols = cols; - if (compute_sort_type) { - this->sort = coo_sort_type(nnz, rows, cols); - } else { - this->sort = NonzeroSort::None; - } - }; + /// Reference to an array that holds column indices for the structural nonzeros of this matrix. + /// + /// If non-null, this must have length at least nnz. + /// \internal + /// **Memory management note.** Because this length requirement is not a function of + /// only const variables, calling reserve_coo(nnz, A) on a COOMatrix "A" will raise an error + /// if A.cols is non-null. + /// \endinternal + sint_t *cols; - COOMatrix( - int64_t n_rows, - int64_t n_cols, - IndexBase index_base - ) : n_rows(n_rows), n_cols(n_cols), own_memory(true), index_base(index_base) {}; + // --------------------------------------------------------------------------- + /// A flag to indicate if the data in (vals, rows, cols) is sorted in a + /// CSC-like order, a CSR-like order, or neither order. + NonzeroSort sort; - // Constructs an empty sparse matrix of given dimensions. - // Data can't stored in this object until a subsequent call to reserve(int64_t nnz). 
- // This constructor initializes \math{\ttt{own_memory(true)},} and so - // all data stored in this object is deleted once its destructor is invoked. - // + // --------------------------------------------------------------------------- + /// **Standard constructor.** Initializes n_rows and n_cols at the provided values. + /// The vals, rows, and cols members are set to null pointers; + /// nnz is set to zero, index_base is set to + /// IndexBase::Zero, and COOMatrix::own_memory is set to true. + /// + /// This constructor is intended for use with reserve_coo(int64_t nnz, COOMatrix &A). + /// COOMatrix( int64_t n_rows, int64_t n_cols - ) : COOMatrix(n_rows, n_cols, IndexBase::Zero) {}; - + ) : n_rows(n_rows), n_cols(n_cols), own_memory(true), nnz(0), index_base(IndexBase::Zero), + vals(nullptr), rows(nullptr), cols(nullptr), sort(NonzeroSort::None) {}; // --------------------------------------------------------------------------- - /// @verbatim embed:rst:leading-slashes - /// Constructs a sparse matrix based on declared dimensions and the data in three buffers - /// (vals, rows, cols). - /// This constructor initializes :math:`\ttt{own_memory(false)}`, and - /// so the provided buffers are unaffected when this object's destructor - /// is invoked. - /// - /// .. dropdown:: Full parameter descriptions - /// :animate: fade-in-slide-down - /// - /// n_rows - [in] - /// * The number of rows in this sparse matrix. - /// - /// n_cols - [in] - /// * The number of columns in this sparse matrix. - /// - /// nnz - [in] - /// * The number of structural nonzeros in the matrix. - /// - /// vals - [in] - /// * Pointer to array of real numerical type T. - /// * The first nnz entries store values of structural nonzeros - /// as part of the COO format. + /// **Expert constructor.** Arguments passed to this function are used to initialize members of the same name; + /// COOMatrix::own_memory is set to false. 
+ /// If compute_sort_type is true, then the sort member will be computed by inspecting + /// the contents of (rows, cols). If compute_sort_type is false, then the sort member is set to None. /// - /// rows - [in] - /// * Pointer to array of sint_t. - /// * The first nnz entries store row indices as part of the COO format. - /// - /// cols - [in] - /// * Pointer to array of sint_t. - /// * The first nnz entries store column indices as part of the COO format. - /// - /// compute_sort_type - [in] - /// * Indicates if we should parse data in (rows, cols) - /// to see if it's already in CSC-like order or CSR-like order. - /// If you happen to know the sort order ahead of time then - /// you should set this parameter to false and then manually - /// assign M.sort = ```` once you - /// have a handle on M. - /// - /// @endverbatim COOMatrix( int64_t n_rows, int64_t n_cols, @@ -215,35 +221,24 @@ struct COOMatrix { T *vals, sint_t *rows, sint_t *cols, - bool compute_sort_type = true - ) : COOMatrix(n_rows, n_cols, nnz, vals, rows, cols, compute_sort_type, IndexBase::Zero) {}; - - ~COOMatrix() { - if (this->own_memory) { - delete [] this->vals; - delete [] this->rows; - delete [] this->cols; - + bool compute_sort_type = true, + IndexBase index_base = IndexBase::Zero + ) : n_rows(n_rows), n_cols(n_cols), own_memory(false), nnz(nnz), index_base(index_base), + vals(vals), rows(rows), cols(cols) { + if (compute_sort_type) { + sort = coo_sort_type(nnz, rows, cols); + } else { + sort = NonzeroSort::None; } }; - // @verbatim embed:rst:leading-slashes - // Attach three buffers of length nnz for (vals, rows, cols). - // This function can only be called if :math:`\ttt{own_memory == true},`` and - // it can only be called once. 
- // - // @endverbatim - void reserve(int64_t nnz) { - randblas_require(this->_can_reserve); - randblas_require(this->own_memory); - this->nnz = nnz; - if (this->nnz > 0) { - this->vals = new T[nnz]; - this->rows = new sint_t[nnz]; - this->cols = new sint_t[nnz]; + ~COOMatrix() { + if (own_memory) { + if (vals != nullptr) delete [] vals; + if (rows != nullptr) delete [] rows; + if (cols != nullptr) delete [] cols; } - this->_can_reserve = false; - } + }; ///////////////////////////////////////////////////////////////////// // @@ -253,17 +248,47 @@ struct COOMatrix { // move constructor COOMatrix(COOMatrix &&other) - : n_rows(other.n_rows), n_cols(other.n_cols), own_memory(other.own_memory), index_base(other.index_base) { - this->nnz = other.nnz; - std::swap(this->rows, other.rows); - std::swap(this->cols, other.cols); - std::swap(this->vals, other.vals); - this->_can_reserve = other._can_reserve; + : n_rows(other.n_rows), n_cols(other.n_cols), own_memory(other.own_memory), nnz(other.nnz), index_base(other.index_base), + vals(nullptr), rows(nullptr), cols(nullptr), sort(other.sort) { + std::swap(rows, other.rows); + std::swap(cols, other.cols); + std::swap(vals, other.vals); other.nnz = 0; - } + other.sort = NonzeroSort::None; + } }; +#ifdef __cpp_concepts +static_assert(SparseMatrix>); +static_assert(SparseMatrix>); +#endif + +// ----------------------------------------------------- +/// +/// This function requires that M.own_memory is true and that +/// M.vals, M.rows, and M.cols are all null. If any of these +/// conditions aren't satisfied then RandBLAS will raise an error. +/// +/// If no error is raised, then M.nnz is overwritten by nnz, +/// M.vals is redirected to a new length-nnz array of type T, +/// and (M.rows, M.cols) are redirected to new length-nnz arrays of type sint_t. 
+/// +template +void reserve_coo(int64_t nnz, COOMatrix &M) { + randblas_require(M.own_memory); + randblas_require(M.vals == nullptr); + randblas_require(M.rows == nullptr); + randblas_require(M.cols == nullptr); + M.nnz = nnz; + if (M.nnz > 0) { + M.vals = new T[nnz]; + M.rows = new sint_t[nnz]; + M.cols = new sint_t[nnz]; + } + return; +} + template void sort_coo_data(NonzeroSort s, int64_t nnz, T *vals, sint_t *rows, sint_t *cols) { if (s == NonzeroSort::None) @@ -351,7 +376,7 @@ void dense_to_coo(int64_t stride_row, int64_t stride_col, T *mat, T abs_tol, COO int64_t n_rows = spmat.n_rows; int64_t n_cols = spmat.n_cols; int64_t nnz = nnz_in_dense(n_rows, n_cols, stride_row, stride_col, mat, abs_tol); - spmat.reserve(nnz); + reserve_coo(nnz, spmat); nnz = 0; #define MAT(_i, _j) mat[(_i) * stride_row + (_j) * stride_col] for (int64_t i = 0; i < n_rows; ++i) { @@ -379,7 +404,6 @@ void dense_to_coo(Layout layout, T* mat, T abs_tol, COOMatrix &spmat) { template void coo_to_dense(const COOMatrix &spmat, int64_t stride_row, int64_t stride_col, T *mat) { - randblas_require(spmat.index_base == IndexBase::Zero); #define MAT(_i, _j) mat[(_i) * stride_row + (_j) * stride_col] for (int64_t i = 0; i < spmat.n_rows; ++i) { for (int64_t j = 0; j < spmat.n_cols; ++j) { diff --git a/RandBLAS/sparse_data/coo_spmm_impl.hh b/RandBLAS/sparse_data/coo_spmm_impl.hh index 2f798eae..a32bdb04 100644 --- a/RandBLAS/sparse_data/coo_spmm_impl.hh +++ b/RandBLAS/sparse_data/coo_spmm_impl.hh @@ -42,7 +42,9 @@ namespace RandBLAS::sparse_data::coo { -template +using RandBLAS::SignedInteger; + +template static int64_t set_filtered_coo( // COO-format matrix data const T *vals, @@ -75,8 +77,7 @@ static int64_t set_filtered_coo( } - -template +template static void apply_coo_left_jki_p11( T alpha, blas::Layout layout_B, diff --git a/RandBLAS/sparse_data/csc_matrix.hh b/RandBLAS/sparse_data/csc_matrix.hh index eb0fa6bd..b9c59e22 100644 --- a/RandBLAS/sparse_data/csc_matrix.hh +++ 
b/RandBLAS/sparse_data/csc_matrix.hh @@ -36,143 +36,176 @@ namespace RandBLAS::sparse_data { +using RandBLAS::SignedInteger; + // ============================================================================= -/// A CSC-format sparse matrix that complies with the SparseMatrix concept. /// -template +/// Let \math{\mtxA} denote a sparse matrix with \math{\ttt{nnz}} structural nonzeros. +/// Its CSC representation consists of declared dimensions, \math{\ttt{n_rows}} +/// and \math{\ttt{n_cols}}, and a triplet of arrays +/// \math{(\ttt{vals},\ttt{rowidxs},\ttt{colptr}).} +/// +/// The \math{\ttt{j}^{\text{th}}} column of \math{\mtxA} has +/// \math{\ttt{colptr[j+1] - colptr[j]}} structural nonzeros. +/// The \math{\ttt{k}^{\text{th}}} structural nonzero in column \math{\ttt{j}} appears in +/// row \math{\ttt{rowidxs[colptr[j] + k]}} and is equal to \math{\ttt{vals[colptr[j] + k]}.} +/// +/// This type conforms to the SparseMatrix concept. +template struct CSCMatrix { + // ------------------------------------------------------------------------ + /// Real scalar type used for structural nonzeros in this matrix. using scalar_t = T; + + // ------------------------------------------------------------------------ + /// Signed integer type used in the rowidxs and colptr array members. using index_t = sint_t; + + // ------------------------------------------------------------------------ + /// The number of rows in this sparse matrix. const int64_t n_rows; + + // ------------------------------------------------------------------------ + /// The number of columns in this sparse matrix. const int64_t n_cols; - const bool own_memory; - int64_t nnz = 0; - IndexBase index_base; - T *vals = nullptr; - // --------------------------------------------------------------------------- - /// Row index array in the CSC format.
- /// - sint_t *rowidxs = nullptr; + // ------------------------------------------------------------------------ + /// If true, then RandBLAS has permission to allocate and attach memory to the reference + /// members of this matrix (vals, rowidxs, colptr). If true *at destruction time*, then delete [] + /// will be called on each non-null reference member of this matrix. + /// + /// RandBLAS only writes to this member at construction time. + /// + bool own_memory; - // --------------------------------------------------------------------------- - /// Pointer offset array for the CSC format. The number of nonzeros in column j - /// is given by colptr[j+1] - colptr[j]. The row index of the k-th nonzero - /// in column j is rowidxs[colptr[j] + k]. - /// - sint_t *colptr = nullptr; - bool _can_reserve = true; - - CSCMatrix( - int64_t n_rows, - int64_t n_cols, - IndexBase index_base - ) : n_rows(n_rows), n_cols(n_cols), own_memory(true), index_base(index_base) { }; - - CSCMatrix( - int64_t n_rows, - int64_t n_cols, - int64_t nnz, - T *vals, - sint_t *rowidxs, - sint_t *colptr, - IndexBase index_base - ) : n_rows(n_rows), n_cols(n_cols), own_memory(false), index_base(index_base) { - this->nnz = nnz; - this->vals = vals; - this->rowidxs = rowidxs; - this->colptr = colptr; - }; - - // Constructs an empty sparse matrix of given dimensions. - // Data can't stored in this object until a subsequent call to reserve(int64_t nnz). - // This constructor initializes \math{\ttt{own_memory(true)},} and so - // all data stored in this object is deleted once its destructor is invoked. - // - CSCMatrix( - int64_t n_rows, - int64_t n_cols - ) : CSCMatrix(n_rows, n_cols, IndexBase::Zero) { }; - - // --------------------------------------------------------------------------- - /// @verbatim embed:rst:leading-slashes - /// Constructs a sparse matrix based on declared dimensions and the data in three buffers - /// (vals, rowidxs, colptr). 
- /// This constructor initializes :math:`\ttt{own_memory(false)}`, and - /// so the provided buffers are unaffected when this object's destructor - /// is invoked. + // ------------------------------------------------------------------------ + /// The number of structural nonzeros in this matrix. /// - /// .. dropdown:: Full parameter descriptions - /// :animate: fade-in-slide-down + int64_t nnz; + + // ------------------------------------------------------------------------ + /// A flag to indicate whether rowidxs is interpreted + /// with zero-based or one-based indexing. /// - /// n_rows - [in] - /// * The number of rows in this sparse matrix. + IndexBase index_base; + + // ------------------------------------------------------------------------ + /// Reference to an array that holds values for the structural nonzeros of this matrix. /// - /// n_cols - [in] - /// * The number of columns in this sparse matrix. + /// If non-null, this must have length at least nnz. + /// \internal + /// **Memory management note.** Because this length requirement is not a function of + /// only const variables, calling reserve_csc(nnz, A) on a CSCMatrix "A" will raise an error + /// if A.vals is non-null. + /// \endinternal + T *vals; + + // ------------------------------------------------------------------------ + /// Reference to a row index array in the CSC format, interpreted in \math{\ttt{index_base}}. /// - /// nnz - [in] - /// * The number of structural nonzeros in the matrix. + /// If non-null, then must have length at least nnz. + /// \internal + /// **Memory management note.** Because this length requirement is not a function of + /// only const variables, calling reserve_csc(nnz, A) on a CSCMatrix "A" will raise an error + /// if A.rowidxs is non-null. + /// \endinternal + sint_t *rowidxs; + + // ------------------------------------------------------------------------ + /// Reference to a pointer offset array for the CSC format.
/// - /// vals - [in] - /// * Pointer to array of real numerical type T, of length at least nnz. - /// * Stores values of structural nonzeros as part of the CSC format. + /// If non-null, then must have length at least \math{\ttt{n_cols + 1}}. /// - /// rowidxs - [in] - /// * Pointer to array of sint_t, of length at least nnz. + sint_t *colptr; + + // --------------------------------------------------------------------------- + /// **Standard constructor.** Initializes n_rows and n_cols at the provided values. + /// The vals, rowidxs, and colptr members are null-initialized; + /// \math{\ttt{nnz}} is set to zero, \math{\ttt{index_base}} is set to + /// Zero, and CSCMatrix::own_memory is set to true. + /// + /// This constructor is intended for use with reserve_csc(int64_t nnz, CSCMatrix &A). /// - /// colptr - [in] - /// * Pointer to array of sint_t, of length at least n_cols + 1. + CSCMatrix( + int64_t n_rows, + int64_t n_cols + ) : n_rows(n_rows), n_cols(n_cols), own_memory(true), nnz(0), index_base(IndexBase::Zero), + vals(nullptr), rowidxs(nullptr), colptr(nullptr) { }; + + // ------------------------------------------------------------------------ + /// **Expert constructor.** Arguments passed to this function are used to initialize members of the same names; + /// CSCMatrix::own_memory is set to false. 
/// - /// @endverbatim CSCMatrix( int64_t n_rows, int64_t n_cols, int64_t nnz, T *vals, sint_t *rowidxs, - sint_t *colptr - ) : CSCMatrix(n_rows, n_cols, nnz, vals, rowidxs, colptr, IndexBase::Zero) {}; + sint_t *colptr, + IndexBase index_base = IndexBase::Zero + ) : n_rows(n_rows), n_cols(n_cols), own_memory(false), nnz(nnz), index_base(index_base), + vals(vals), rowidxs(rowidxs), colptr(colptr) { }; ~CSCMatrix() { - if (this->own_memory) { - delete [] this->rowidxs; - delete [] this->colptr; - delete [] this->vals; + if (own_memory) { + if (rowidxs != nullptr) delete [] rowidxs; + if (colptr != nullptr) delete [] colptr; + if (vals != nullptr) delete [] vals; } }; - - // @verbatim embed:rst:leading-slashes - // Attach three buffers to this CSCMatrix, (vals, rowidxs, colptr), of sufficient - // size for this matrix to hold nnz structural nonzeros. - // This function can only be called if :math:`\ttt{own_memory == true},`` and - // it can only be called once. - // - // @endverbatim - void reserve(int64_t nnz) { - randblas_require(this->_can_reserve); - randblas_require(this->own_memory); - this->colptr = new sint_t[this->n_cols + 1]{0}; - this->nnz = nnz; - if (this->nnz > 0) { - this->rowidxs = new sint_t[nnz]{0}; - this->vals = new T[nnz]{0.0}; - } - this->_can_reserve = false; - }; - CSCMatrix(CSCMatrix &&other) - : n_rows(other.n_rows), n_cols(other.n_cols), own_memory(other.own_memory), index_base(other.index_base) { - this->nnz = other.nnz; - std::swap(this->rowidxs, other.rowidxs); - std::swap(this->colptr , other.colptr ); - std::swap(this->vals , other.vals ); - this->_can_reserve = other._can_reserve; + : n_rows(other.n_rows), n_cols(other.n_cols), own_memory(other.own_memory), nnz(other.nnz), index_base(other.index_base), + vals(nullptr), rowidxs(nullptr), colptr(nullptr) { + std::swap(rowidxs, other.rowidxs); + std::swap(colptr , other.colptr ); + std::swap(vals , other.vals ); other.nnz = 0; }; }; +#ifdef __cpp_concepts +static_assert(SparseMatrix>); 
+static_assert(SparseMatrix>); +#endif + +// ----------------------------------------------------- +/// +/// This function requires that M.own_memory is true, that +/// M.rowidxs is null, and that M.vals is null. If any of +/// these conditions are not met then this function will +/// raise an error. +/// +/// Special logic applies to M.colptr because its documented length +/// requirement is determined by the const variable M.n_cols. +/// +/// - If M.colptr is non-null then it is left unchanged, +/// and it is presumed to point to an array of length +/// at least M.n_cols + 1. +/// +/// - If M.colptr is null, then it will be redirected to +/// a new array of type sint_t and length (M.n_cols + 1). +/// +/// From there, M.nnz is overwritten by nnz, and the reference +/// members M.rowidxs and M.vals are redirected to new +/// arrays of length nnz (of types sint_t and T, respectively). +/// +template +void reserve_csc(int64_t nnz, CSCMatrix &M) { + randblas_require(M.own_memory); + randblas_require(M.rowidxs == nullptr); + randblas_require(M.vals == nullptr); + if (M.colptr == nullptr) + M.colptr = new sint_t[M.n_cols + 1]{0}; + M.nnz = nnz; + if (nnz > 0) { + M.rowidxs = new sint_t[nnz]{0}; + M.vals = new T[nnz]{0.0}; + } + return; +} + } // end namespace RandBLAS::sparse_data namespace RandBLAS::sparse_data::csc { @@ -218,7 +251,7 @@ void dense_to_csc(int64_t stride_row, int64_t stride_col, T *mat, T abs_tol, CSC // Step 1: count the number of entries with absolute value at least abstol int64_t nnz = nnz_in_dense(n_rows, n_cols, stride_row, stride_col, mat, abs_tol); // Step 2: allocate memory needed by the sparse matrix - spmat.reserve(nnz); + reserve_csc(nnz, spmat); // Step 3: traverse the dense matrix again, populating the sparse matrix as we go nnz = 0; spmat.colptr[0] = 0; diff --git a/RandBLAS/sparse_data/csc_spmm_impl.hh b/RandBLAS/sparse_data/csc_spmm_impl.hh index 77d63654..46cbaf53 100644 --- a/RandBLAS/sparse_data/csc_spmm_impl.hh +++ 
b/RandBLAS/sparse_data/csc_spmm_impl.hh @@ -40,7 +40,9 @@ namespace RandBLAS::sparse_data::csc { -template +using RandBLAS::SignedInteger; + +template static void apply_csc_to_vector_from_left_ki( // CSC-format data const T *vals, @@ -64,7 +66,7 @@ static void apply_csc_to_vector_from_left_ki( } } -template +template static void apply_regular_csc_to_vector_from_left_ki( // data for "regular CSC": CSC with fixed nnz per col, // which obviates the requirement for colptr. @@ -87,7 +89,7 @@ static void apply_regular_csc_to_vector_from_left_ki( } } -template +template static void apply_csc_left_jki_p11( T alpha, blas::Layout layout_B, @@ -152,7 +154,7 @@ static void apply_csc_left_jki_p11( return; } -template +template static void apply_csc_left_kib_rowmajor_1p1( T alpha, int64_t d, diff --git a/RandBLAS/sparse_data/csr_matrix.hh b/RandBLAS/sparse_data/csr_matrix.hh index 9335b3b0..1393e6d0 100644 --- a/RandBLAS/sparse_data/csr_matrix.hh +++ b/RandBLAS/sparse_data/csr_matrix.hh @@ -40,150 +40,178 @@ using RandBLAS::SignedInteger; // ^ only used once, but I don't want the RandBLAS prefix // in the doxygen. + // ============================================================================= -/// A CSR-format sparse matrix that complies with the SparseMatrix concept. /// +/// Let \math{\mtxA} denote a sparse matrix with \math{\ttt{nnz}} structural nonzeros. +/// Its CSR representation consists of declared dimensions, \math{\ttt{n_rows}} +/// and \math{\ttt{n_cols}}, and a triplet of arrays +/// \math{(\ttt{vals},\ttt{rowptr},\ttt{colidxs}).} +/// +/// The \math{\ttt{i}^{\text{th}}} row of \math{\mtxA} has +/// \math{\ttt{rowptr[i+1] - rowptr[i]}} structural nonzeros. +/// The \math{\ttt{k}^{\text{th}}} structural nonzero in row \math{\ttt{i}} appears in +/// column \math{\ttt{colidxs[rowptr[i] + k]}} and is equal to \math{\ttt{vals[rowptr[i] + k]}.} +/// +/// This type conforms to the SparseMatrix concept. 
template struct CSRMatrix { + + // ------------------------------------------------------------------------ + /// Real scalar type used for structural nonzeros in this matrix. using scalar_t = T; - using index_t = sint_t; - const int64_t n_rows; - const int64_t n_cols; - const bool own_memory; - int64_t nnz = 0; - IndexBase index_base; - T *vals = nullptr; - - // --------------------------------------------------------------------------- - /// Pointer offset array for the CSR format. The number of nonzeros in row i - /// is given by rowptr[i+1] - rowptr[i]. The column index of the k-th nonzero - /// in row i is colidxs[rowptr[i] + k]. - /// - sint_t *rowptr = nullptr; - - // --------------------------------------------------------------------------- - /// Column index array in the CSR format. - /// - sint_t *colidxs = nullptr; - bool _can_reserve = true; - CSRMatrix( - int64_t n_rows, - int64_t n_cols, - IndexBase index_base - ) : n_rows(n_rows), n_cols(n_cols), own_memory(true), index_base(index_base) { }; + // ------------------------------------------------------------------------ + /// Signed integer type used in the rowptr and colidxs array members. + using index_t = sint_t; - CSRMatrix( - int64_t n_rows, - int64_t n_cols, - int64_t nnz, - T *vals, - sint_t *rowptr, - sint_t *colidxs, - IndexBase index_base - ) : n_rows(n_rows), n_cols(n_cols), own_memory(false), index_base(index_base) { - this->nnz = nnz; - this->vals = vals; - this->rowptr = rowptr; - this->colidxs = colidxs; - }; + // ------------------------------------------------------------------------ + /// The number of rows in this sparse matrix. + const int64_t n_rows; - // Constructs an empty sparse matrix of given dimensions. - // Data can't stored in this object until a subsequent call to reserve(int64_t nnz). - // This constructor initializes \math{\ttt{own_memory(true)},} and so - // all data stored in this object is deleted once its destructor is invoked. 
- // - CSRMatrix( - int64_t n_rows, - int64_t n_cols - ) : CSRMatrix(n_rows, n_cols, IndexBase::Zero) {}; + // ------------------------------------------------------------------------ + /// The number of columns in this sparse matrix. + const int64_t n_cols; - // --------------------------------------------------------------------------- - /// @verbatim embed:rst:leading-slashes - /// Constructs a sparse matrix based on declared dimensions and the data in three buffers - /// (vals, rowptr, colidxs). - /// This constructor initializes :math:`\ttt{own_memory(false)}`, and - /// so the provided buffers are unaffected when this object's destructor - /// is invoked. + // ------------------------------------------------------------------------ + /// If true, then RandBLAS has permission to allocate and attach memory to the reference + /// members of this matrix (vals, rowptr, colidxs). If true *at destruction time*, then delete [] + /// will be called on each non-null reference member of this matrix. /// - /// .. dropdown:: Full parameter descriptions - /// :animate: fade-in-slide-down + /// RandBLAS only writes to this member at construction time. /// - /// n_rows - [in] - /// * The number of rows in this sparse matrix. + bool own_memory; + + // ------------------------------------------------------------------------ + /// The number of structural nonzeros in this matrix. + /// + int64_t nnz; + + // ------------------------------------------------------------------------ + /// A flag to indicate whether colidxs is interpreted + /// with zero-based or one-based indexing. /// - /// n_cols - [in] - /// * The number of columns in this sparse matrix. + IndexBase index_base; + + // ------------------------------------------------------------------------ + /// Reference to an array that holds values for the structural nonzeros of this matrix. + /// + /// If non-null, this must have length at least nnz.
+ /// \internal + /// **Memory management note.** Because this length requirement is not a function of + /// only const variables, calling reserve_csr(nnz, A) on a CSRMatrix "A" will raise an error + /// if A.vals is non-null. + /// \endinternal + T *vals; + + // ------------------------------------------------------------------------ + /// Reference to a pointer offset array for the CSR format. /// - /// nnz - [in] - /// * The number of structural nonzeros in the matrix. + /// If non-null, then must have length at least \math{\ttt{n_rows + 1}}. /// - /// vals - [in] - /// * Pointer to array of real numerical type T, of length at least nnz. - /// * Stores values of structural nonzeros as part of the CSR format. + sint_t *rowptr; + + // ------------------------------------------------------------------------ + /// Reference to a column index array in the CSR format, interpreted in \math{\ttt{index_base}}. /// - /// rowptr - [in] - /// * Pointer to array of sint_t, of length at least n_rows + 1. + /// If non-null, then must have length at least nnz. + /// \internal + /// **Memory management note.** Because this length requirement is not a function of + /// only const variables, calling reserve_csr(nnz, A) on a CSRMatrix "A" will raise an error + /// if A.colidxs is non-null. + /// \endinternal + sint_t *colidxs; + + // ------------------------------------------------------------------------ + /// **Standard constructor.** Initializes n_rows and n_cols at the provided values. + /// The vals, rowptr, and colidxs members are set to null pointers; + /// nnz is set to zero, index_base is set to + /// Zero, and CSRMatrix::own_memory is set to true. + /// + /// This constructor is intended for use with reserve_csr(int64_t nnz, CSRMatrix &A). /// - /// colidxs - [in] - /// * Pointer to array of sint_t, of length at least nnz. 
+ CSRMatrix( + int64_t n_rows, + int64_t n_cols + ) : n_rows(n_rows), n_cols(n_cols), own_memory(true), nnz(0), index_base(IndexBase::Zero), + vals(nullptr), rowptr(nullptr), colidxs(nullptr) { }; + + // ------------------------------------------------------------------------ + /// **Expert constructor.** Arguments passed to this function are used to initialize members of the same names; + /// CSRMatrix::own_memory is set to false. /// - /// @endverbatim CSRMatrix( int64_t n_rows, int64_t n_cols, int64_t nnz, T *vals, sint_t *rowptr, - sint_t *colidxs - ) : CSRMatrix(n_rows, n_cols, nnz, vals, rowptr, colidxs, IndexBase::Zero) {}; + sint_t *colidxs, + IndexBase index_base = IndexBase::Zero + ) : n_rows(n_rows), n_cols(n_cols), own_memory(false), nnz(nnz), index_base(index_base), + vals(vals), rowptr(rowptr), colidxs(colidxs) { }; ~CSRMatrix() { - if (this->own_memory) { - delete [] this->rowptr; - delete [] this->colidxs; - delete [] this->vals; - } - }; - - // @verbatim embed:rst:leading-slashes - // Attach three buffers to this CSRMatrix, (vals, rowptr, colidxs), of sufficient - // size for this matrix to hold nnz structural nonzeros. - // This function can only be called if :math:`\ttt{own_memory == true},`` and - // it can only be called once. - // - // @endverbatim - void reserve(int64_t nnz) { - randblas_require(this->_can_reserve); - randblas_require(this->own_memory); - this->rowptr = new sint_t[this->n_rows + 1]{0}; - this->nnz = nnz; - if (this->nnz > 0) { - this->colidxs = new sint_t[nnz]{0}; - this->vals = new T[nnz]{0.0}; + if (own_memory) { + if (rowptr != nullptr) delete [] rowptr; + if (colidxs != nullptr) delete [] colidxs; + if (vals != nullptr) delete [] vals; } - this->_can_reserve = false; }; - ///////////////////////////////////////////////////////////////////// - // - // Functions that we don't want to appear in doxygen. 
- // - ///////////////////////////////////////////////////////////////////// - // move constructor CSRMatrix(CSRMatrix &&other) - : n_rows(other.n_rows), n_cols(other.n_cols), own_memory(other.own_memory), index_base(other.index_base) { - this->nnz = other.nnz; - std::swap(this->colidxs, other.colidxs); - std::swap(this->rowptr , other.rowptr ); - std::swap(this->vals , other.vals ); - this->_can_reserve = other._can_reserve; + : n_rows(other.n_rows), n_cols(other.n_cols), own_memory(other.own_memory), nnz(other.nnz), index_base(other.index_base), + vals(nullptr), rowptr(nullptr), colidxs(nullptr) { + std::swap(colidxs, other.colidxs); + std::swap(rowptr , other.rowptr ); + std::swap(vals , other.vals ); other.nnz = 0; }; }; +#ifdef __cpp_concepts +static_assert(SparseMatrix>); +static_assert(SparseMatrix>); +#endif + +// ----------------------------------------------------- +/// +/// This function requires that M.own_memory is true, that +/// M.colidxs is null, and that M.vals is null. If any of +/// these conditions are not met then this function will +/// raise an error. +/// +/// Special logic applies to M.rowptr because its documented length +/// requirement is determined by the const variable M.n_rows. +/// +/// - If M.rowptr is non-null then it is left unchanged, +/// and it is presumed to point to an array of length +/// at least M.n_rows + 1. +/// +/// - If M.rowptr is null, then it will be redirected to +/// a new array of type sint_t and length (M.n_rows + 1). +/// +/// From there, M.nnz is overwritten by nnz, and the reference +/// members M.colidxs and M.vals are redirected to new +/// arrays of length nnz (of types sint_t and T, respectively). 
+/// +template +void reserve_csr(int64_t nnz, CSRMatrix &M) { + randblas_require(M.own_memory); + randblas_require(M.colidxs == nullptr); + randblas_require(M.vals == nullptr); + if (M.rowptr == nullptr) + M.rowptr = new sint_t[M.n_rows + 1]{0}; + M.nnz = nnz; + if (nnz > 0) { + M.colidxs = new sint_t[nnz]{0}; + M.vals = new T[nnz]{0.0}; + } + return; +} + } // end namespace RandBLAS::sparse_data namespace RandBLAS::sparse_data::csr { @@ -194,7 +222,7 @@ using blas::Layout; template void csr_to_dense(const CSRMatrix &spmat, int64_t stride_row, int64_t stride_col, T *mat) { randblas_require(spmat.index_base == IndexBase::Zero); - auto rowptr = spmat.rowptr; + auto rowptr = spmat.rowptr; auto colidxs = spmat.colidxs; auto vals = spmat.vals; #define MAT(_i, _j) mat[(_i) * stride_row + (_j) * stride_col] @@ -232,7 +260,7 @@ void dense_to_csr(int64_t stride_row, int64_t stride_col, T *mat, T abs_tol, CSR // Step 1: count the number of entries with absolute value at least abstol int64_t nnz = nnz_in_dense(n_rows, n_cols, stride_row, stride_col, mat, abs_tol); // Step 2: allocate memory needed by the sparse matrix - spmat.reserve(nnz); + reserve_csr(nnz, spmat); // Step 3: traverse the dense matrix again, populating the sparse matrix as we go nnz = 0; spmat.rowptr[0] = 0; diff --git a/RandBLAS/sparse_data/csr_spmm_impl.hh b/RandBLAS/sparse_data/csr_spmm_impl.hh index 1665c6ec..2958b283 100644 --- a/RandBLAS/sparse_data/csr_spmm_impl.hh +++ b/RandBLAS/sparse_data/csr_spmm_impl.hh @@ -42,7 +42,9 @@ namespace RandBLAS::sparse_data::csr { -template +using RandBLAS::SignedInteger; + +template static void apply_csr_to_vector_from_left_ik( // CSR-format data const T *vals, @@ -66,7 +68,7 @@ static void apply_csr_to_vector_from_left_ik( } } -template +template static void apply_csr_left_jik_p11( T alpha, blas::Layout layout_B, @@ -119,7 +121,7 @@ static void apply_csr_left_jik_p11( return; } -template +template static void apply_csr_left_ikb_rowmajor( T alpha, int64_t d, diff 
--git a/RandBLAS/sparse_data/sksp.hh b/RandBLAS/sparse_data/sksp.hh index 8045b4d8..41e88898 100644 --- a/RandBLAS/sparse_data/sksp.hh +++ b/RandBLAS/sparse_data/sksp.hh @@ -47,25 +47,10 @@ namespace RandBLAS::sparse_data { /// Sketch from the left in an SpMM-like operation /// /// .. math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(S))}_{d \times m} \cdot \underbrace{\op(\submat(A))}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} +/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(\mtxS))}_{d \times m} \cdot \underbrace{\op(\submat(\mtxA))}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(X)` either returns a matrix :math:`X` -/// or its transpose, :math:`A` is a sparse matrix, and :math:`S` is a dense sketching operator. -/// -/// .. dropdown:: FAQ -/// :animate: fade-in-slide-down -/// -/// **What's** :math:`\mat(B)` **?** -/// -/// It's matrix of shape :math:`d \times n`. Its contents are determined by :math:`(B, \ldb)` -/// and "layout", following the same convention as the Level 3 BLAS function "GEMM." -/// -/// **What are** :math:`\submat(S)` **and** :math:`\submat(A)` **?** -/// -/// Their shapes are determined implicitly by :math:`(\opS, d, m)` and :math:`(\opA, n, m)`. -/// If :math:`{\submat(X)}` is of shape :math:`r \times c`, -/// then it is the :math:`r \times c` submatrix of :math:`{X}` whose upper-left corner -/// appears at index :math:`(\texttt{ro_x}, \texttt{co_x})` of :math:`{X}`. +/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(\mtxX)` either returns a matrix :math:`\mtxX` +/// or its transpose, :math:`\mtxA` is a sparse matrix, and :math:`\mtxS` is a dense sketching operator. /// /// .. dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -75,58 +60,58 @@ namespace RandBLAS::sparse_data { /// * Matrix storage for :math:`\mat(B)`. 
/// /// opS - [in] -/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(S)) = \submat(S)`. -/// * If :math:`\opS` = Trans, then :math:`\op(\submat(S)) = \submat(S)^T`. +/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS)`. +/// * If :math:`\opS` = Trans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS)^T`. /// /// opA - [in] -/// * If :math:`\opA` = NoTrans, then :math:`\op(\submat(A)) = \submat(A)`. -/// * If :math:`\opA` = Trans, then :math:`\op(\submat(A)) = \submat(A)^T`. +/// * If :math:`\opA` = NoTrans, then :math:`\op(\submat(\mtxA)) = \submat(\mtxA)`. +/// * If :math:`\opA` = Trans, then :math:`\op(\submat(\mtxA)) = \submat(\mtxA)^T`. /// /// d - [in] /// * A nonnegative integer. /// * The number of rows in :math:`\mat(B)`. -/// * The number of rows in :math:`\op(\submat(S))`. +/// * The number of rows in :math:`\op(\submat(\mtxS))`. /// /// n - [in] /// * A nonnegative integer. /// * The number of columns in :math:`\mat(B)`. -/// * The number of columns in :math:`\op(\mat(A))`. +/// * The number of columns in :math:`\op(\mat(\mtxA))`. /// /// m - [in] /// * A nonnegative integer. -/// * The number of columns in :math:`\op(\submat(S))` -/// * The number of rows in :math:`\op(\mat(A))`. +/// * The number of columns in :math:`\op(\submat(\mtxS))` +/// * The number of rows in :math:`\op(\mat(\mtxA))`. /// /// alpha - [in] /// * A real scalar. /// /// S - [in] /// * A DenseSkOp object. -/// * Defines :math:`\submat(S)`. +/// * Defines :math:`\submat(\mtxS)`. /// /// ro_s - [in] /// * A nonnegative integer. -/// * The rows of :math:`\submat(S)` are a contiguous subset of rows of :math:`S`. -/// * The rows of :math:`\submat(S)` start at :math:`S[\texttt{ro_s}, :]`. +/// * The rows of :math:`\submat(\mtxS)` are a contiguous subset of rows of :math:`\mtxS`. +/// * The rows of :math:`\submat(\mtxS)` start at :math:`S[\texttt{ro_s}, :]`. /// /// co_s - [in] /// * A nonnegative integer. 
-/// * The columns of :math:`\submat(S)` are a contiguous subset of columns of :math:`S`. -/// * The columns :math:`\submat(S)` start at :math:`S[:,\texttt{co_s}]`. +/// * The columns of :math:`\submat(\mtxS)` are a contiguous subset of columns of :math:`\mtxS`. +/// * The columns :math:`\submat(\mtxS)` start at :math:`S[:,\texttt{co_s}]`. /// /// A - [in] /// * A RandBLAS sparse matrix object. -/// * Defines :math:`\submat(A)`. +/// * Defines :math:`\submat(\mtxA)`. /// /// ro_a - [in] /// * A nonnegative integer. -/// * The rows of :math:`\submat(A)` are a contiguous subset of rows of :math:`A`. -/// * The rows of :math:`\submat(A)` start at :math:`A[\texttt{ro_a}, :]`. +/// * The rows of :math:`\submat(\mtxA)` are a contiguous subset of rows of :math:`\mtxA`. +/// * The rows of :math:`\submat(\mtxA)` start at :math:`\mtxA[\texttt{ro_a}, :]`. /// /// co_a - [in] /// * A nonnegative integer. -/// * The columns of :math:`\submat(A)` are a contiguous subset of columns of :math:`A`. -/// * The columns :math:`\submat(A)` start at :math:`A[:,\texttt{co_a}]`. +/// * The columns of :math:`\submat(\mtxA)` are a contiguous subset of columns of :math:`\mtxA`. +/// * The columns :math:`\submat(\mtxA)` start at :math:`\mtxA[:,\texttt{co_a}]`. /// /// beta - [in] /// * A real scalar. @@ -144,16 +129,16 @@ namespace RandBLAS::sparse_data { /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B`. 
/// /// @endverbatim -template +template void lsksp3( blas::Layout layout, blas::Op opS, blas::Op opA, int64_t d, // B is d-by-n - int64_t n, // op(submat(A)) is m-by-n - int64_t m, // op(submat(S)) is d-by-m + int64_t n, // op(submat(\mtxA)) is m-by-n + int64_t m, // op(submat(\mtxS)) is d-by-m T alpha, - DenseSkOp &S, + DenseSkOp &S, int64_t ro_s, int64_t co_s, SpMat &A, @@ -163,26 +148,31 @@ void lsksp3( T *B, int64_t ldb ) { - // B = op(submat(S)) @ op(submat(A)) + // B = op(submat(\mtxS)) @ op(submat(\mtxA)) auto [rows_submat_S, cols_submat_S] = dims_before_op(d, m, opS); - if (!S.buff) { - auto submat_S = submatrix_as_blackbox(S, rows_submat_S, cols_submat_S, ro_s, co_s); - lsksp3(layout, opS, opA, d, n, m, alpha, submat_S, 0, 0, A, ro_a, co_a, beta, B, ldb); - return; - } - + constexpr bool maybe_denseskop = !std::is_same_v, BLASFriendlyOperator>; + if constexpr (maybe_denseskop) { + if (!S.buff) { + // DenseSkOp doesn't permit defining a "black box" distribution, so we have to pack the submatrix + // into an equivalent datastructure ourselves. + auto submat_S = submatrix_as_blackbox>(S, rows_submat_S, cols_submat_S, ro_s, co_s); + lsksp3(layout, opS, opA, d, n, m, alpha, submat_S, 0, 0, A, ro_a, co_a, beta, B, ldb); + return; + } // else, proceed with the rest of the function call. 
+ } + randblas_require( S.buff != nullptr ); auto [rows_submat_A, cols_submat_A] = dims_before_op(m, n, opA); - randblas_require( A.n_rows >= rows_submat_A + ro_a ); - randblas_require( A.n_cols >= cols_submat_A + co_a ); - randblas_require( S.dist.n_rows >= rows_submat_S + ro_s ); - randblas_require( S.dist.n_cols >= cols_submat_S + co_s ); + randblas_require( A.n_rows >= rows_submat_A + ro_a ); + randblas_require( A.n_cols >= cols_submat_A + co_a ); + randblas_require( S.n_rows >= rows_submat_S + ro_s ); + randblas_require( S.n_cols >= cols_submat_S + co_s ); if (layout == blas::Layout::ColMajor) { randblas_require(ldb >= d); } else { randblas_require(ldb >= n); } - auto [pos, lds] = offset_and_ldim(S.layout, S.dist.n_rows, S.dist.n_cols, ro_s, co_s); + auto [pos, lds] = offset_and_ldim(S.layout, S.n_rows, S.n_cols, ro_s, co_s); T* S_ptr = &S.buff[pos]; if (S.layout != layout) opS = (opS == blas::Op::NoTrans) ? blas::Op::Trans : blas::Op::NoTrans; @@ -202,25 +192,10 @@ void lsksp3( /// Sketch from the right in an SpMM-like operation /// /// .. math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(A))}_{m \times n} \cdot \underbrace{\op(\submat(S))}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} -/// -/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(X)` either returns a matrix :math:`X` -/// or its transpose, :math:`A` is a sparse matrix, and :math:`S` is a dense sketching operator. -/// -/// .. dropdown:: FAQ -/// :animate: fade-in-slide-down -/// -/// **What's** :math:`\mat(B)` **?** -/// -/// It's matrix of shape :math:`m \times d`. Its contents are determined by :math:`(B, \ldb)` -/// and "layout", following the same convention as the Level 3 BLAS function "GEMM." 
-/// -/// **What are** :math:`\submat(S)` **and** :math:`\submat(A)` **?** +/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(\mtxA))}_{m \times n} \cdot \underbrace{\op(\submat(\mtxS))}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} /// -/// Their shapes are determined implicitly by :math:`(\opS, n, d)` and :math:`(\opA, m, n)`. -/// If :math:`{\submat(X)}` is of shape :math:`r \times c`, -/// then it is the :math:`r \times c` submatrix of :math:`{X}` whose upper-left corner -/// appears at index :math:`(\texttt{ro_x}, \texttt{co_x})` of :math:`{X}`. +/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(\mtxX)` either returns a matrix :math:`\mtxX` +/// or its transpose, :math:`\mtxA` is a sparse matrix, and :math:`\mtxS` is a dense sketching operator. /// /// .. dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -230,58 +205,58 @@ void lsksp3( /// * Matrix storage for :math:`\mat(B)`. /// /// opA - [in] -/// * If :math:`\opA` == NoTrans, then :math:`\op(\submat(A)) = \submat(A)`. -/// * If :math:`\opA` == Trans, then :math:`\op(\submat(A)) = \submat(A)^T`. +/// * If :math:`\opA` == NoTrans, then :math:`\op(\submat(\mtxA)) = \submat(\mtxA)`. +/// * If :math:`\opA` == Trans, then :math:`\op(\submat(\mtxA)) = \submat(\mtxA)^T`. /// /// opS - [in] -/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(S)) = \submat(S)`. -/// * If :math:`\opS` = Trans, then :math:`\op(\submat(S)) = \submat(S)^T`. +/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS)`. +/// * If :math:`\opS` = Trans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS)^T`. /// /// m - [in] /// * A nonnegative integer. /// * The number of rows in :math:`\mat(B)`. -/// * The number of rows in :math:`\op(\submat(A))`. +/// * The number of rows in :math:`\op(\submat(\mtxA))`. /// /// d - [in] /// * A nonnegative integer. 
/// * The number of columns in :math:`\mat(B)` -/// * The number of columns in :math:`\op(\submat(S))`. +/// * The number of columns in :math:`\op(\submat(\mtxS))`. /// /// n - [in] /// * A nonnegative integer. -/// * The number of columns in :math:`\op(\submat(A))` -/// * The number of rows in :math:`\op(\submat(S))`. +/// * The number of columns in :math:`\op(\submat(\mtxA))` +/// * The number of rows in :math:`\op(\submat(\mtxS))`. /// /// alpha - [in] /// * A real scalar. /// /// A - [in] /// * A RandBLAS sparse matrix object. -/// * Defines :math:`\submat(A)`. +/// * Defines :math:`\submat(\mtxA)`. /// /// ro_a - [in] /// * A nonnegative integer. -/// * The rows of :math:`\submat(A)` are a contiguous subset of rows of :math:`A`. -/// * The rows of :math:`\submat(A)` start at :math:`A[\texttt{ro_a}, :]`. +/// * The rows of :math:`\submat(\mtxA)` are a contiguous subset of rows of :math:`\mtxA`. +/// * The rows of :math:`\submat(\mtxA)` start at :math:`\mtxA[\texttt{ro_a}, :]`. /// /// co_a - [in] /// * A nonnegative integer. -/// * The columns of :math:`\submat(A)` are a contiguous subset of columns of :math:`A`. -/// * The columns :math:`\submat(A)` start at :math:`A[:,\texttt{co_a}]`. +/// * The columns of :math:`\submat(\mtxA)` are a contiguous subset of columns of :math:`\mtxA`. +/// * The columns :math:`\submat(\mtxA)` start at :math:`\mtxA[:,\texttt{co_a}]`. /// /// S - [in] /// * A DenseSkOp object. -/// * Defines :math:`\submat(S)`. +/// * Defines :math:`\submat(\mtxS)`. /// /// ro_s - [in] /// * A nonnegative integer. -/// * The rows of :math:`\submat(S)` are a contiguous subset of rows of :math:`S`. -/// * The rows of :math:`\submat(S)` start at :math:`S[\texttt{ro_s}, :]`. +/// * The rows of :math:`\submat(\mtxS)` are a contiguous subset of rows of :math:`\mtxS`. +/// * The rows of :math:`\submat(\mtxS)` start at :math:`\mtxS[\texttt{ro_s}, :]`. /// /// co_s - [in] /// * A nonnegative integer. 
-/// * The columns of :math:`\submat(S)` are a contiguous subset of columns of :math:`S`. -/// * The columns :math:`\submat(S)` start at :math:`S[:,\texttt{co_s}]`. +/// * The columns of :math:`\submat(\mtxS)` are a contiguous subset of columns of :math:`\mtxS`. +/// * The columns :math:`\submat(\mtxS)` start at :math:`\mtxS[:,\texttt{co_s}]`. /// /// beta - [in] /// * A real scalar. @@ -299,19 +274,19 @@ void lsksp3( /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B`. /// /// @endverbatim -template +template void rsksp3( blas::Layout layout, blas::Op opA, blas::Op opS, int64_t m, // B is m-by-d - int64_t d, // op(submat(A)) is m-by-n - int64_t n, // op(submat(S)) is n-by-d + int64_t d, // op(submat(\mtxA)) is m-by-n + int64_t n, // op(submat(\mtxS)) is n-by-d T alpha, SpMat &A, int64_t ro_a, int64_t co_a, - DenseSkOp &S, + DenseSkOp &S, int64_t ro_s, int64_t co_s, T beta, @@ -319,23 +294,29 @@ void rsksp3( int64_t ldb ) { auto [rows_submat_S, cols_submat_S] = dims_before_op(n, d, opS); - if (!S.buff) { - auto submat_S = submatrix_as_blackbox(S, rows_submat_S, cols_submat_S, ro_s, co_s); - rsksp3(layout, opA, opS, m, d, n, alpha, A, ro_a, co_a, submat_S, 0, 0, beta, B, ldb); - return; + constexpr bool maybe_denseskop = !std::is_same_v, BLASFriendlyOperator>; + if constexpr (maybe_denseskop) { + if (!S.buff) { + // DenseSkOp doesn't permit defining a "black box" distribution, so we have to pack the submatrix + // into an equivalent datastructure ourselves. 
+ auto submat_S = submatrix_as_blackbox>(S, rows_submat_S, cols_submat_S, ro_s, co_s); + rsksp3(layout, opA, opS, m, d, n, alpha, A, ro_a, co_a, submat_S, 0, 0, beta, B, ldb); + return; + } } + randblas_require( S.buff != nullptr ); auto [rows_submat_A, cols_submat_A] = dims_before_op(m, n, opA); - randblas_require( A.n_rows >= rows_submat_A + ro_a ); - randblas_require( A.n_cols >= cols_submat_A + co_a ); - randblas_require( S.dist.n_rows >= rows_submat_S + ro_s ); - randblas_require( S.dist.n_cols >= cols_submat_S + co_s ); + randblas_require( A.n_rows >= rows_submat_A + ro_a ); + randblas_require( A.n_cols >= cols_submat_A + co_a ); + randblas_require( S.n_rows >= rows_submat_S + ro_s ); + randblas_require( S.n_cols >= cols_submat_S + co_s ); if (layout == blas::Layout::ColMajor) { randblas_require(ldb >= m); } else { randblas_require(ldb >= d); } - auto [pos, lds] = offset_and_ldim(S.layout, S.dist.n_rows, S.dist.n_cols, ro_s, co_s); + auto [pos, lds] = offset_and_ldim(S.layout, S.n_rows, S.n_cols, ro_s, co_s); T* S_ptr = &S.buff[pos]; if (S.layout != layout) opS = (opS == blas::Op::NoTrans) ? blas::Op::Trans : blas::Op::NoTrans; @@ -352,36 +333,20 @@ namespace RandBLAS { using namespace RandBLAS::dense; using namespace RandBLAS::sparse_data; -// MARK: SKSP overloads, sub +// MARK: SKSP overloads, full // ============================================================================= -/// \fn sketch_sparse(blas::Layout layout, blas::Op opS, blas::Op opA, int64_t d, -/// int64_t n, int64_t m, T alpha, DenseSkOp &S, int64_t ro_s, int64_t co_s, -/// SpMat &A, int64_t ro_a, int64_t co_a, T beta, T *B, int64_t ldb +/// \fn sketch_sparse(blas::Layout layout, blas::Op opS, blas::Op opA, int64_t d, int64_t n, int64_t m, +/// T alpha, DenseSkOp &S, int64_t ro_s, int64_t co_s, SpMat &A, T beta, T *B, int64_t ldb /// ) /// @verbatim embed:rst:leading-slashes /// Sketch from the left in an SpMM-like operation /// /// .. 
math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(S))}_{d \times m} \cdot \underbrace{\op(\submat(A))}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} -/// -/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(X)` either returns a matrix :math:`X` -/// or its transpose, :math:`A` is a sparse matrix, and :math:`S` is a dense sketching operator. -/// -/// .. dropdown:: FAQ -/// :animate: fade-in-slide-down -/// -/// **What's** :math:`\mat(B)` **?** +/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(\mtxS))}_{d \times m} \cdot \underbrace{\op(\mtxA)}_{m \times n} + \beta \cdot \underbrace{\mat(B)}_{d \times n}, \tag{$\star$} /// -/// It's matrix of shape :math:`d \times n`. Its contents are determined by :math:`(B, \ldb)` -/// and "layout", following the same convention as the Level 3 BLAS function "GEMM." -/// -/// **What are** :math:`\submat(S)` **and** :math:`\submat(A)` **?** -/// -/// Their shapes are determined implicitly by :math:`(\opS, d, m)` and :math:`(\opA, n, m)`. -/// If :math:`{\submat(X)}` is of shape :math:`r \times c`, -/// then it is the :math:`r \times c` submatrix of :math:`{X}` whose upper-left corner -/// appears at index :math:`(\texttt{ro_x}, \texttt{co_x})` of :math:`{X}`. +/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(\mtxX)` either returns a matrix :math:`\mtxX` +/// or its transpose, :math:`\mtxA` is a sparse matrix, and :math:`\mtxS` is a dense sketching operator. /// /// .. dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -391,27 +356,27 @@ using namespace RandBLAS::sparse_data; /// * Matrix storage for :math:`\mat(B)`. /// /// opS - [in] -/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(S)) = \submat(S)`. -/// * If :math:`\opS` = Trans, then :math:`\op(\submat(S)) = \submat(S)^T`. +/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS)`. 
+/// * If :math:`\opS` = Trans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS)^T`. /// /// opA - [in] -/// * If :math:`\opA` = NoTrans, then :math:`\op(\submat(A)) = \submat(A)`. -/// * If :math:`\opA` = Trans, then :math:`\op(\submat(A)) = \submat(A)^T`. +/// * If :math:`\opA` = NoTrans, then :math:`\op(\mtxA) = \mtxA`. +/// * If :math:`\opA` = Trans, then :math:`\op(\mtxA) = \mtxA^T`. /// /// d - [in] /// * A nonnegative integer. /// * The number of rows in :math:`\mat(B)`. -/// * The number of rows in :math:`\op(\submat(S))`. +/// * The number of rows in :math:`\op(\submat(\mtxS))`. /// /// n - [in] /// * A nonnegative integer. /// * The number of columns in :math:`\mat(B)`. -/// * The number of columns in :math:`\op(\mat(A))`. +/// * The number of columns in :math:`\op(\mtxA)`. /// /// m - [in] /// * A nonnegative integer. -/// * The number of columns in :math:`\op(\submat(S))` -/// * The number of rows in :math:`\op(\mat(A))`. +/// * The number of columns in :math:`\op(\submat(\mtxS))` +/// * The number of rows in :math:`\op(\mtxA)`. /// /// alpha - [in] /// * A real scalar. @@ -419,31 +384,20 @@ using namespace RandBLAS::sparse_data; /// /// S - [in] /// * A DenseSkOp object. -/// * Defines :math:`\submat(S)`. +/// * Defines :math:`\submat(\mtxS)`. /// /// ro_s - [in] /// * A nonnegative integer. -/// * The rows of :math:`\submat(S)` are a contiguous subset of rows of :math:`S`. -/// * The rows of :math:`\submat(S)` start at :math:`S[\texttt{ro_s}, :]`. +/// * The rows of :math:`\submat(\mtxS)` are a contiguous subset of rows of :math:`\mtxS`. +/// * The rows of :math:`\submat(\mtxS)` start at :math:`\mtxS[\texttt{ro_s}, :]`. /// /// co_s - [in] /// * A nonnegative integer. -/// * The columns of :math:`\submat(S)` are a contiguous subset of columns of :math:`S`. -/// * The columns :math:`\submat(S)` start at :math:`S[:,\texttt{co_s}]`. +/// * The columns of :math:`\submat(\mtxS)` are a contiguous subset of columns of :math:`\mtxS`. 
+/// * The columns :math:`\submat(\mtxS)` start at :math:`\mtxS[:,\texttt{co_s}]`. /// /// A - [in] /// * A RandBLAS sparse matrix object. -/// * Defines :math:`\submat(A)`. -/// -/// ro_a - [in] -/// * A nonnegative integer. -/// * The rows of :math:`\submat(A)` are a contiguous subset of rows of :math:`A`. -/// * The rows of :math:`\submat(A)` start at :math:`A[\texttt{ro_a}, :]`. -/// -/// co_a - [in] -/// * A nonnegative integer. -/// * The columns of :math:`\submat(A)` are a contiguous subset of columns of :math:`A`. -/// * The columns :math:`\submat(A)` start at :math:`A[:,\texttt{co_a}]`. /// /// beta - [in] /// * A real scalar. @@ -461,58 +415,40 @@ using namespace RandBLAS::sparse_data; /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B`. /// /// @endverbatim -template +template inline void sketch_sparse( blas::Layout layout, blas::Op opS, blas::Op opA, int64_t d, // B is d-by-n - int64_t n, // op(submat(A)) is m-by-n - int64_t m, // op(submat(S)) is d-by-m + int64_t n, // op(submat(\mtxA)) is m-by-n + int64_t m, // op(submat(\mtxS)) is d-by-m T alpha, - DenseSkOp &S, + DenseSkOp &S, int64_t ro_s, int64_t co_s, SpMat &A, - int64_t ro_a, - int64_t co_a, T beta, T *B, int64_t ldb ) { - sparse_data::lsksp3(layout, opS, opA, d, n, m, alpha, S, ro_s, co_s, A, ro_a, co_a, beta, B, ldb); + sparse_data::lsksp3(layout, opS, opA, d, n, m, alpha, S, ro_s, co_s, A, 0, 0, beta, B, ldb); return; } // ============================================================================= /// \fn sketch_sparse(blas::Layout layout, blas::Op opS, blas::Op opA, int64_t d, -/// int64_t n, int64_t m, T alpha, SpMat &A, int64_t ro_a, int64_t co_a, -/// DenseSkOp &S, int64_t ro_s, int64_t co_s, T beta, T *B, int64_t ldb +/// int64_t n, int64_t m, T alpha, SpMat &A, DenseSkOp &S, int64_t ro_s, int64_t co_s, T beta, T *B, int64_t ldb /// ) /// @verbatim embed:rst:leading-slashes /// Sketch from the right in an SpMM-like operation /// /// .. 
math:: -/// \mat(B) = \alpha \cdot \underbrace{\op(\submat(A))}_{m \times n} \cdot \underbrace{\op(\submat(S))}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} -/// -/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(X)` either returns a matrix :math:`X` -/// or its transpose, :math:`A` is a sparse matrix, and :math:`S` is a dense sketching operator. -/// -/// .. dropdown:: FAQ -/// :animate: fade-in-slide-down -/// -/// **What's** :math:`\mat(B)` **?** -/// -/// It's matrix of shape :math:`m \times d`. Its contents are determined by :math:`(B, \ldb)` -/// and "layout", following the same convention as the Level 3 BLAS function "GEMM." +/// \mat(B) = \alpha \cdot \underbrace{\op(\mtxA)}_{m \times n} \cdot \underbrace{\op(\submat(\mtxS))}_{n \times d} + \beta \cdot \underbrace{\mat(B)}_{m \times d}, \tag{$\star$} /// -/// **What are** :math:`\submat(S)` **and** :math:`\submat(A)` **?** -/// -/// Their shapes are determined implicitly by :math:`(\opS, n, d)` and :math:`(\opA, m, n)`. -/// If :math:`{\submat(X)}` is of shape :math:`r \times c`, -/// then it is the :math:`r \times c` submatrix of :math:`{X}` whose upper-left corner -/// appears at index :math:`(\texttt{ro_x}, \texttt{co_x})` of :math:`{X}`. +/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(\mtxX)` either returns a matrix :math:`\mtxX` +/// or its transpose, :math:`\mtxA` is a sparse matrix, and :math:`\mtxS` is a dense sketching operator. /// /// .. dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -522,27 +458,27 @@ inline void sketch_sparse( /// * Matrix storage for :math:`\mat(B)`. /// /// opA - [in] -/// * If :math:`\opA` == NoTrans, then :math:`\op(\submat(A)) = \submat(A)`. -/// * If :math:`\opA` == Trans, then :math:`\op(\submat(A)) = \submat(A)^T`. +/// * If :math:`\opA` == NoTrans, then :math:`\op(\mtxA) = \mtxA`. +/// * If :math:`\opA` == Trans, then :math:`\op(\mtxA) = \mtxA^T`. 
/// /// opS - [in] -/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(S)) = \submat(S)`. -/// * If :math:`\opS` = Trans, then :math:`\op(\submat(S)) = \submat(S)^T`. +/// * If :math:`\opS` = NoTrans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS)`. +/// * If :math:`\opS` = Trans, then :math:`\op(\submat(\mtxS)) = \submat(\mtxS)^T`. /// /// m - [in] /// * A nonnegative integer. /// * The number of rows in :math:`\mat(B)`. -/// * The number of rows in :math:`\op(\submat(A))`. +/// * The number of rows in :math:`\op(\mtxA)`. /// /// d - [in] /// * A nonnegative integer. /// * The number of columns in :math:`\mat(B)` -/// * The number of columns in :math:`\op(\submat(S))`. +/// * The number of columns in :math:`\op(\submat(\mtxS))`. /// /// n - [in] /// * A nonnegative integer. -/// * The number of columns in :math:`\op(\submat(A))` -/// * The number of rows in :math:`\op(\submat(S))`. +/// * The number of columns in :math:`\op(\mtxA)` +/// * The number of rows in :math:`\op(\submat(\mtxS))`. /// /// alpha - [in] /// * A real scalar. @@ -550,31 +486,20 @@ inline void sketch_sparse( /// /// S - [in] /// * A DenseSkOp object. -/// * Defines :math:`\submat(S)`. +/// * Defines :math:`\submat(\mtxS)`. /// /// ro_s - [in] /// * A nonnegative integer. -/// * The rows of :math:`\submat(S)` are a contiguous subset of rows of :math:`S`. -/// * The rows of :math:`\submat(S)` start at :math:`S[\texttt{ro_s}, :]`. +/// * The rows of :math:`\submat(\mtxS)` are a contiguous subset of rows of :math:`\mtxS`. +/// * The rows of :math:`\submat(\mtxS)` start at :math:`\mtxS[\texttt{ro_s}, :]`. /// /// co_s - [in] /// * A nonnegative integer. -/// * The columns of :math:`\submat(S)` are a contiguous subset of columns of :math:`S`. -/// * The columns :math:`\submat(S)` start at :math:`S[:,\texttt{co_s}]`. +/// * The columns of :math:`\submat(\mtxS)` are a contiguous subset of columns of :math:`\mtxS`. 
+/// * The columns :math:`\submat(\mtxS)` start at :math:`\mtxS[:,\texttt{co_s}]`. /// /// A - [in] /// * A RandBLAS sparse matrix object. -/// * Defines :math:`\submat(A)`. -/// -/// ro_a - [in] -/// * A nonnegative integer. -/// * The rows of :math:`\submat(A)` are a contiguous subset of rows of :math:`A`. -/// * The rows of :math:`\submat(A)` start at :math:`A[\texttt{ro_a}, :]`. -/// -/// co_a - [in] -/// * A nonnegative integer. -/// * The columns of :math:`\submat(A)` are a contiguous subset of columns of :math:`A`. -/// * The columns :math:`\submat(A)` start at :math:`A[:,\texttt{co_a}]`. /// /// beta - [in] /// * A real scalar. @@ -592,26 +517,24 @@ inline void sketch_sparse( /// * Leading dimension of :math:`\mat(B)` when reading from :math:`B`. /// /// @endverbatim -template +template inline void sketch_sparse( blas::Layout layout, blas::Op opA, blas::Op opS, int64_t m, // B is m-by-d - int64_t d, // op(submat(A)) is m-by-n - int64_t n, // op(submat(S)) is n-by-d + int64_t d, // op(submat(\mtxA)) is m-by-n + int64_t n, // op(submat(\mtxS)) is n-by-d T alpha, SpMat &A, - int64_t ro_a, - int64_t co_a, - DenseSkOp &S, + DenseSkOp &S, int64_t ro_s, int64_t co_s, T beta, T *B, int64_t ldb ) { - sparse_data::rsksp3(layout, opA, opS, m, d, n, alpha, A, ro_a, co_a, S, ro_s, co_s, beta, B, ldb); + sparse_data::rsksp3(layout, opA, opS, m, d, n, alpha, A, 0, 0, S, ro_s, co_s, beta, B, ldb); return; } diff --git a/RandBLAS/sparse_data/spmm_dispatch.hh b/RandBLAS/sparse_data/spmm_dispatch.hh index ef6aa826..f6bd5c82 100644 --- a/RandBLAS/sparse_data/spmm_dispatch.hh +++ b/RandBLAS/sparse_data/spmm_dispatch.hh @@ -45,7 +45,7 @@ namespace RandBLAS::sparse_data { -template +template void left_spmm( blas::Layout layout, blas::Op opA, @@ -159,7 +159,7 @@ void left_spmm( return; } -template +template inline void right_spmm( blas::Layout layout, blas::Op opA, @@ -178,13 +178,13 @@ inline void right_spmm( int64_t ldc ) { // - // Compute C = op(mat(A)) @ op(submat(B)) by 
reduction to left_spmm. We start with + // Compute C = op(mat(A)) @ op(submat(\mtxB)) by reduction to left_spmm. We start with // - // C^T = op(submat(B))^T @ op(mat(A))^T. + // C^T = op(submat(\mtxB))^T @ op(mat(A))^T. // - // Then we interchange the operator "op(*)" in op(submat(A)) and (*)^T: + // Then we interchange the operator "op(*)" in op(mat(A)) and (*)^T: // - // C^T = op(submat(B))^T @ op(mat(A)^T). + // C^T = op(submat(\mtxB))^T @ op(mat(A)^T). // // We tell left_spmm to process (C^T) and (B^T) in the opposite memory layout // compared to the layout for (B, C). @@ -205,17 +205,16 @@ namespace RandBLAS { // ============================================================================= /// \fn spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, -/// int64_t n, int64_t k, T alpha, SpMat &A, int64_t ro_a, int64_t co_a, -/// const T *B, int64_t ldb, T beta, T *C, int64_t ldc +/// int64_t n, int64_t k, T alpha, SpMat &A, const T *B, int64_t ldb, T beta, T *C, int64_t ldc /// ) /// @verbatim embed:rst:leading-slashes -/// Perform an SPMM-like operation, multiplying a dense matrix on the left with a (submatrix of a) sparse matrix: +/// Perform an SPMM-like operation, multiplying a dense matrix on the left with a sparse matrix: /// /// .. math:: -/// \mat(C) = \alpha \cdot \underbrace{\op(\submat(A))}_{m \times k} \cdot \underbrace{\op(\mat(B))}_{k \times n} + \beta \cdot \underbrace{\mat(C)}_{m \times n}, \tag{$\star$} +/// \mat(C) = \alpha \cdot \underbrace{\op(\mtxA)}_{m \times k} \cdot \underbrace{\op(\mat(B))}_{k \times n} + \beta \cdot \underbrace{\mat(C)}_{m \times n}, \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(X)` either returns a matrix :math:`X` -/// or its transpose, and :math:`A` is a sparse matrix. +/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(\mtxX)` either returns a matrix :math:`\mtxX` +/// or its transpose, and :math:`\mtxA` is a sparse matrix. 
/// /// .. dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -225,8 +224,8 @@ namespace RandBLAS { /// * Matrix storage for :math:`\mat(B)` and :math:`\mat(C)`. /// /// opA - [in] -/// * If :math:`\opA` == NoTrans, then :math:`\op(\submat(A)) = \submat(A)`. -/// * If :math:`\opA` == Trans, then :math:`\op(\submat(A)) = \submat(A)^T`. +/// * If :math:`\opA` == NoTrans, then :math:`\op(\mtxA) = \mtxA`. +/// * If :math:`\opA` == Trans, then :math:`\op(\mtxA) = \mtxA^T`. /// /// opB - [in] /// * If :math:`\opB` = NoTrans, then :math:`\op(\mat(B)) = \mat(B)`. @@ -235,7 +234,7 @@ namespace RandBLAS { /// m - [in] /// * A nonnegative integer. /// * The number of rows in :math:`\mat(C)`. -/// * The number of rows in :math:`\op(\submat(A))`. +/// * The number of rows in :math:`\op(\mtxA)`. /// /// n - [in] /// * A nonnegative integer. @@ -244,7 +243,7 @@ namespace RandBLAS { /// /// k - [in] /// * A nonnegative integer. -/// * The number of columns in :math:`\op(\submat(A))` +/// * The number of columns in :math:`\op(\mtxA)` /// * The number of rows in :math:`\op(\mat(B))`. /// /// alpha - [in] @@ -252,17 +251,7 @@ namespace RandBLAS { /// /// A - [in] /// * A RandBLAS sparse matrix object. -/// * Defines :math:`\submat(A)`. -/// -/// ro_a - [in] -/// * A nonnegative integer. -/// * The rows of :math:`\submat(A)` are a contiguous subset of rows of :math:`A.` -/// * The rows of :math:`\submat(A)` start at :math:`A[\texttt{ro_a}, :].` -/// -/// co_a - [in] -/// * A nonnegative integer. -/// * The columns of :math:`\submat(A)` are a contiguous subset of columns of :math:`A`. -/// * The columns :math:`\submat(A)` start at :math:`A[:,\texttt{co_a}]`. +/// * Defines :math:`\mtxA`. /// /// B - [in] /// * Pointer to 1D array of real scalars that define :math:`\mat(B)`. @@ -287,25 +276,24 @@ namespace RandBLAS { /// * Leading dimension of :math:`\mat(C)` when reading from :math:`C`. 
/// /// @endverbatim -template < typename T, SparseMatrix SpMat> -inline void spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int64_t n, int64_t k, T alpha, SpMat &A, int64_t ro_a, int64_t co_a, const T *B, int64_t ldb, T beta, T *C, int64_t ldc) { - RandBLAS::sparse_data::left_spmm(layout, opA, opB, m, n, k, alpha, A, ro_a, co_a, B, ldb, beta, C, ldc); +template +inline void spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int64_t n, int64_t k, T alpha, SpMat &A, const T *B, int64_t ldb, T beta, T *C, int64_t ldc) { + RandBLAS::sparse_data::left_spmm(layout, opA, opB, m, n, k, alpha, A, 0, 0, B, ldb, beta, C, ldc); return; }; // ============================================================================= /// \fn spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, -/// int64_t n, int64_t k, T alpha, const T* A, int64_t lda, -/// SpMat &B, int64_t ro_b, int64_t co_b, T beta, T *C, int64_t ldc +/// int64_t n, int64_t k, T alpha, const T* A, int64_t lda, SpMat &B, T beta, T *C, int64_t ldc /// ) /// @verbatim embed:rst:leading-slashes -/// Perform an SPMM-like operation, multiplying a dense matrix on the right with a (submatrix of a) sparse matrix: +/// Perform an SPMM-like operation, multiplying a dense matrix on the right with a sparse matrix: /// /// .. math:: -/// \mat(C) = \alpha \cdot \underbrace{\op(\mat(A))}_{m \times k} \cdot \underbrace{\op(\submat(B))}_{k \times n} + \beta \cdot \underbrace{\mat(C)}_{m \times n}, \tag{$\star$} +/// \mat(C) = \alpha \cdot \underbrace{\op(\mat(A))}_{m \times k} \cdot \underbrace{\op(\mtxB)}_{k \times n} + \beta \cdot \underbrace{\mat(C)}_{m \times n}, \tag{$\star$} /// -/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(X)` either returns a matrix :math:`X` -/// or its transpose, and :math:`B` is a sparse matrix. +/// where :math:`\alpha` and :math:`\beta` are real scalars, :math:`\op(\mtxX)` either returns a matrix :math:`\mtxX` +/// or its transpose, and :math:`\mtxB` is a sparse matrix. /// /// .. 
dropdown:: Full parameter descriptions /// :animate: fade-in-slide-down @@ -319,8 +307,8 @@ inline void spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int /// * If :math:`\opA` = Trans, then :math:`\op(\mat(A)) = \mat(A)^T`. /// /// opB - [in] -/// * If :math:`\opB` = NoTrans, then :math:`\op(\submat(B)) = \submat(B)`. -/// * If :math:`\opB` = Trans, then :math:`\op(\submat(B)) = \submat(B)^T`. +/// * If :math:`\opB` = NoTrans, then :math:`\op(\mtxB) = \mtxB`. +/// * If :math:`\opB` = Trans, then :math:`\op(\mtxB) = \mtxB^T`. /// /// m - [in] /// * A nonnegative integer. @@ -330,12 +318,12 @@ inline void spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int /// n - [in] /// * A nonnegative integer. /// * The number of columns in :math:`\mat(C)`. -/// * The number of columns in :math:`\op(\submat(B))`. +/// * The number of columns in :math:`\op(\mtxB)`. /// /// k - [in] /// * A nonnegative integer. /// * The number of columns in :math:`\op(\mat(A))` -/// * The number of rows in :math:`\op(\submat(B))`. +/// * The number of rows in :math:`\op(\mtxB)`. /// /// alpha - [in] /// * A real scalar. @@ -349,17 +337,6 @@ inline void spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int /// /// B - [in] /// * A RandBLAS sparse matrix object. -/// * Defines :math:`\submat(B)`. -/// -/// ro_b - [in] -/// * A nonnegative integer. -/// * The rows of :math:`\submat(B)` are a contiguous subset of rows of :math:`B`. -/// * The rows of :math:`\submat(B)` start at :math:`B[\texttt{ro_b}, :]`. -/// -/// co_b - [in] -/// * A nonnegative integer. -/// * The columns of :math:`\submat(B)` are a contiguous subset of columns of :math:`B`. -/// * The columns :math:`\submat(B)` start at :math:`B[:,\texttt{co_a}]`. /// /// beta - [in] /// * A real scalar. @@ -377,9 +354,9 @@ inline void spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int /// * Leading dimension of :math:`\mat(C)` when reading from :math:`C`. 
/// /// @endverbatim -template -inline void spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int64_t n, int64_t k, T alpha, const T *A, int64_t lda, SpMat &B, int64_t ro_b, int64_t co_b, T beta, T *C, int64_t ldc) { - RandBLAS::sparse_data::right_spmm(layout, opA, opB, m, n, k, alpha, A, lda, B, ro_b, co_b, beta, C, ldc); +template +inline void spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int64_t n, int64_t k, T alpha, const T *A, int64_t lda, SpMat &B, T beta, T *C, int64_t ldc) { + RandBLAS::sparse_data::right_spmm(layout, opA, opB, m, n, k, alpha, A, lda, B, 0, 0, beta, C, ldc); return; } diff --git a/RandBLAS/sparse_skops.hh b/RandBLAS/sparse_skops.hh index b6c85fcc..a946ec08 100644 --- a/RandBLAS/sparse_skops.hh +++ b/RandBLAS/sparse_skops.hh @@ -47,12 +47,9 @@ namespace RandBLAS::sparse { -// ============================================================================= -/// WARNING: this function is not part of the public API. -/// -template -static RNGState repeated_fisher_yates( - const RNGState &state, +template > +static state_t repeated_fisher_yates( + const state_t &state, int64_t vec_nnz, int64_t dim_major, int64_t dim_minor, @@ -67,6 +64,7 @@ static RNGState repeated_fisher_yates( for (sint_t j = 0; j < dim_major; ++j) vec_work[j] = j; std::vector pivots(vec_nnz); + using RNG = typename state_t::generator; RNG gen; auto [ctr, key] = state; for (sint_t i = 0; i < dim_minor; ++i) { @@ -102,122 +100,220 @@ static RNGState repeated_fisher_yates( vec_work[ell] = swap; } } - return RNGState {ctr, key}; -} - -template -inline RNGState repeated_fisher_yates( - const RNGState &state, int64_t k, int64_t n, int64_t r, sint_t *indices -) { - return repeated_fisher_yates(state, k, n, r, indices, (sint_t*) nullptr, (double*) nullptr); + return state_t {ctr, key}; } -template -RNGState compute_next_state(SD dist, RNGState state) { - int64_t minor_len; - if (dist.major_axis == MajorAxis::Short) { - minor_len = 
std::min(dist.n_rows, dist.n_cols); +inline double isometry_scale(Axis major_axis, int64_t vec_nnz, int64_t dim_major, int64_t dim_minor) { + if (major_axis == Axis::Short) { + return std::pow(vec_nnz, -0.5); + } else if (major_axis == Axis::Long) { + return std::sqrt( ((double) dim_major) / (vec_nnz * ((double) dim_minor)) ); } else { - minor_len = std::max(dist.n_rows, dist.n_cols); + throw std::invalid_argument("Cannot compute the isometry scale for a sparse operator with unspecified major axis."); } - int64_t full_incr = minor_len * dist.vec_nnz; - state.counter.incr(full_incr); - return state; } } namespace RandBLAS { // ============================================================================= -/// A distribution over sparse matrices. -/// +/// A distribution over matrices with structured sparsity. Depending on parameter +/// choices, one can obtain distributions described in the literature as +/// SJLTs, OSNAPs, hashing embeddings, CountSketch, row or column sampling, or +/// LESS-Uniform distributions. All members of a SparseDist are const. +/// struct SparseDist { // --------------------------------------------------------------------------- - /// Matrices drawn from this distribution have this many rows. + /// Matrices drawn from this distribution have this many rows; + /// must be greater than zero. const int64_t n_rows; // --------------------------------------------------------------------------- - /// Matrices drawn from this distribution have this many columns. + /// Matrices drawn from this distribution have this many columns; + /// must be greater than zero. const int64_t n_cols; // --------------------------------------------------------------------------- - /// If this distribution is short-axis major, then matrices sampled from - /// it will have exactly \math{\texttt{vec_nnz}} nonzeros per short-axis - /// vector (i.e., per column of a wide matrix or row of a tall matrix). 
- //// One would be paranoid to set this higher than, say, eight, even when - /// sketching *very* high-dimensional data. + /// Operators sampled from this distribution are constructed by taking independent + /// samples from a suitable distribution \math{\mathcal{V}} over sparse vectors. + /// This distribution is always over \math{\mathbb{R}^k,} + /// where \math{k = \ttt{dim_major}.} + /// The structural properties of \math{\mathcal{V}} depend heavily on whether we're + /// short-axis major or long-axis major. /// - /// If this distribution is long-axis major, then matrices sampled from it - /// will have *at most* \math{\texttt{vec_nnz}} nonzeros per long-axis - /// vector (i.e., per row of a wide matrix or per column of a tall matrix). + /// To be explicit, let's say that \math{\mtxx} is a sample from \math{\mathcal{V}.} + /// + /// If \math{\ttt{major_axis} = \ttt{Short}}, then \math{\mtxx} has exactly \math{\vecnnz} nonzeros, + /// and the locations of those nonzeros are chosen uniformly + /// without replacement from \math{\\{0,\ldots,k-1\\}.} The values of the nonzeros are + /// sampled independently and uniformly from +/- 1. /// - const int64_t vec_nnz; + /// If \math{\ttt{major_axis} = \ttt{Long}}, then \math{\mtxx} has *at most* \math{\vecnnz} nonzero + /// entries. The locations of the nonzeros are determined by sampling uniformly + /// with replacement from \math{\\{0,\ldots,k-1\\}.} + /// If index \math{j} occurs in the sample \math{\ell} times, then + /// \math{\mtxx_j} will equal \math{\sqrt{\ell}} with probability 1/2 and + /// \math{-\sqrt{\ell}} with probability 1/2. + /// + const Axis major_axis; // --------------------------------------------------------------------------- - /// Constrains the sparsity pattern of matrices drawn from this distribution. + /// Defined as + /// @verbatim embed:rst:leading-slashes + /// + /// .. 
math:: /// - /// Having major_axis==Short results in sketches are more likely to contain - /// useful geometric information, without making assumptions about the data - /// being sketched. + /// \ttt{dim_major} = \begin{cases} \,\min\{ \ttt{n_rows},\, \ttt{n_cols} \} &\text{ if }~~ \ttt{major_axis} = \ttt{Short} \\ \max\{ \ttt{n_rows},\,\ttt{n_cols} \} & \text{ if } ~~\ttt{major_axis} = \ttt{Long} \end{cases}. /// - const MajorAxis major_axis = MajorAxis::Short; + /// @endverbatim + const int64_t dim_major; + + // --------------------------------------------------------------------------- + /// Defined as \math{\ttt{n_rows} + \ttt{n_cols} - \ttt{dim_major}.} This is + /// just whichever of \math{(\ttt{n_rows},\, \ttt{n_cols})} wasn't identified + /// as \math{\ttt{dim_major}.} + const int64_t dim_minor; + + // --------------------------------------------------------------------------- + /// An operator sampled from this distribution should be multiplied + /// by this constant in order for sketching to preserve norms in expectation. + const double isometry_scale; + + // --------------------------------------------------------------------------- + /// This constrains the number of nonzeros in each major-axis vector. + /// It's subject to the bounds \math{1 \leq \vecnnz \leq \ttt{dim_major}.} + /// See @verbatim embed:rst:inline :ref:`this tutorial page ` for advice on how to set this member. @endverbatim + const int64_t vec_nnz; + + // --------------------------------------------------------------------------- + /// An upper bound on the number of structural nonzeros that can appear in an + /// operator sampled from this distribution. Computed automatically as + /// \math{\ttt{full_nnz} = \vecnnz * \ttt{dim_minor}.} + const int64_t full_nnz; + + // --------------------------------------------------------------------------- + /// Arguments passed to this function are used to initialize members of the same names. 
+ /// The members \math{\ttt{dim_major},} \math{\ttt{dim_minor},} \math{\ttt{isometry_scale},} and \math{\ttt{full_nnz}} + /// are automatically initialized to be consistent with these arguments. + /// + /// This constructor will raise an error if \math{\min\\{\ttt{n_rows}, \ttt{n_cols}\\} \leq 0} or if + /// \math{\vecnnz} does not respect the bounds documented for the \math{\vecnnz} member. + SparseDist( + int64_t n_rows, + int64_t n_cols, + int64_t vec_nnz = 4, + Axis major_axis = Axis::Short + ) : n_rows(n_rows), n_cols(n_cols), + major_axis(major_axis), + dim_major((major_axis == Axis::Short) ? std::min(n_rows, n_cols) : std::max(n_rows, n_cols)), + dim_minor(n_rows + n_cols - dim_major), + isometry_scale(sparse::isometry_scale(major_axis, vec_nnz, dim_major, dim_minor)), + vec_nnz(vec_nnz), full_nnz(vec_nnz * dim_minor) + { // argument validation + randblas_require(n_rows > 0); + randblas_require(n_cols > 0); + randblas_require(vec_nnz > 0); + randblas_require(vec_nnz <= dim_major); + } }; -template -inline T isometry_scale_factor(SparseDist D) { - T vec_nnz = (T) D.vec_nnz; - if (D.major_axis == MajorAxis::Short) { - return std::pow(vec_nnz, -0.5); +#ifdef __cpp_concepts +static_assert(SketchingDistribution); +#endif + + +// ============================================================================= +/// This function is used for sampling a sequence of \math{k} elements uniformly +/// without replacement from the index set \math{\\{0,\ldots,n-1\\}.} It uses a special +/// implementation of Fisher-Yates shuffling to produce \math{r} such samples in \math{O(n + rk)} time. 
+/// These samples are stored by writing to \math{\ttt{samples}} in \math{r} blocks of length \math{k.} +/// +/// The returned RNGState should +/// be used for the next call to a random sampling function whose output should be statistically +/// independent from \math{\ttt{samples}.} +/// +template > +inline state_t repeated_fisher_yates( + int64_t k, int64_t n, int64_t r, sint_t *samples, const state_t &state +) { + return sparse::repeated_fisher_yates(state, k, n, r, samples, (sint_t*) nullptr, (double*) nullptr); +} + +template +RNGState compute_next_state(SparseDist dist, RNGState state) { + int64_t num_mavec, incrs_per_mavec; + if (dist.major_axis == Axis::Short) { + num_mavec = std::max(dist.n_rows, dist.n_cols); + incrs_per_mavec = dist.vec_nnz; + // ^ SASOs don't try to be frugal with CBRNG increments. + // See repeated_fisher_yates. } else { - T minor_ax_len = (T) std::min(D.n_rows, D.n_cols); - T major_ax_len = (T) std::max(D.n_rows, D.n_cols); - return std::sqrt( major_ax_len / (vec_nnz * minor_ax_len) ); + num_mavec = std::min(dist.n_rows, dist.n_cols); + incrs_per_mavec = dist.vec_nnz * ((int64_t) state.len_c/2); + // ^ LASOs do try to be frugal with CBRNG increments. + // See sample_indices_iid_uniform. } + int64_t full_incr = num_mavec * incrs_per_mavec; + state.counter.incr(full_incr); + return state; } - // ============================================================================= -/// A sample from a prescribed distribution over sparse matrices. -/// +/// A sample from a distribution over structured sparse matrices with either +/// independent rows or independent columns. This type conforms to the +/// SketchingOperator concept. template struct SparseSkOp { - using index_t = sint_t; + // --------------------------------------------------------------------------- + /// Type alias. + using distribution_t = SparseDist; + + // --------------------------------------------------------------------------- + /// Type alias. 
using state_t = RNGState; + + // --------------------------------------------------------------------------- + /// Real scalar type used for nonzeros in matrix representations of this operator. using scalar_t = T; - const int64_t n_rows; - const int64_t n_cols; + // --------------------------------------------------------------------------- + /// Signed integer type used in index arrays for sparse matrix representations + /// of this operator. + using index_t = sint_t; // --------------------------------------------------------------------------- - /// The distribution from which this sketching operator is sampled. - /// This member specifies the number of rows and columns of the sketching - /// operator. + /// The distribution from which this operator is sampled. const SparseDist dist; // --------------------------------------------------------------------------- - /// The state that should be passed to the RNG when the full sketching + /// The state passed to random sampling functions when the full /// operator needs to be sampled from scratch. - const RNGState seed_state; + const state_t seed_state; // --------------------------------------------------------------------------- - /// The state that should be used by the next call to an RNG *after* the - /// full sketching operator has been sampled. - const RNGState next_state; + /// The state that should be used in the next call to a random sampling function + /// whose output should be statistically independent from properties of this + /// operator. + const state_t next_state; // --------------------------------------------------------------------------- - /// We need workspace to store a representation of the sampled sketching - /// operator. This member indicates who is responsible for allocating and - /// deallocating this workspace. If own_memory is true, then - /// RandBLAS is responsible. - const bool own_memory = true; + /// Alias for dist.n_rows. Automatically initialized in all constructors. 
+    const int64_t n_rows;
 
     // ---------------------------------------------------------------------------
-    /// A flag (indicating a sufficient condition) that the data underlying the
-    /// sparse matrix has already been sampled.
-    bool known_filled = false;
-
+    /// Alias for dist.n_cols. Automatically initialized in all constructors.
+    const int64_t n_cols;
+
+    // ---------------------------------------------------------------------------
+    /// If true, then RandBLAS has permission to allocate and attach memory to this operator's reference
+    /// members (S.rows, S.cols, and S.vals). If true *at destruction time*, then delete []
+    /// will be called on each of this operator's non-null reference members.
+    ///
+    /// RandBLAS only writes to this member at construction time.
+    ///
+    bool own_memory;
 
     /////////////////////////////////////////////////////////////////////
     //
@@ -225,9 +321,38 @@ struct SparseSkOp {
     //
     /////////////////////////////////////////////////////////////////////
 
-    sint_t *rows = nullptr;
-    sint_t *cols = nullptr;
-    T *vals = nullptr;
+    // ---------------------------------------------------------------------------
+    /// The number of structural nonzeros in this operator.
+    /// Negative values are a flag that the operator's explicit representation
+    /// hasn't been sampled yet.
+    ///
+    /// \internal
+    /// If dist.major_axis
+    /// is Short then we know ahead of time that nnz=dist.full_nnz.
+    /// Otherwise, the precise value of nnz can't be known until the operator's
+    /// explicit representation is sampled (although it's always subject to the
+    /// bounds 1 <= nnz <= dist.full_nnz).
+    /// \endinternal
+    ///
+    int64_t nnz;
+
+    // ---------------------------------------------------------------------------
+    /// Reference to an array that holds the values of this operator's structural nonzeros.
+    ///
+    /// If non-null, this must point to an array of length at least dist.full_nnz.
+ T *vals; + + // --------------------------------------------------------------------------- + /// Reference to an array that holds the row indices for this operator's structural nonzeros. + /// + /// If non-null, this must point to an array of length at least dist.full_nnz. + sint_t *rows; + + // --------------------------------------------------------------------------- + /// Reference to an array that holds the column indices for this operator's structural nonzeros. + /// + /// If non-null, this must point to an array of length at least dist.full_nnz. + sint_t *cols; ///////////////////////////////////////////////////////////////////// // @@ -235,188 +360,231 @@ struct SparseSkOp { // ///////////////////////////////////////////////////////////////////// - // --------------------------------------------------------------------------- - // - // @param[in] dist - // A SparseDist object. - // - Defines the number of rows and columns in this sketching operator. - // - Indirectly controls sparsity pattern. - // - Directly controls sparsity level. - // - // @param[in] state - // An RNGState object. - // - The RNG will use this as the starting point to generate all - // random numbers needed for this sketching operator. - // - // @param[in] rows - // Pointer to int64_t array. - // - stores row indices as part of the COO format. - // - // @param[in] cols - // Pointer to int64_t array. - // - stores column indices as part of the COO format. - // - // @param[in] vals - // Pointer to array of real numerical type T. - // - stores nonzeros as part of the COO format. - // - // @param[in] known_filled - // A boolean. If true, then the arrays pointed to by - // (rows, cols, vals) already contain the randomly sampled - // data defining this sketching operator. - // + /// --------------------------------------------------------------------------- + /// **Standard constructor**. Arguments passed to this function are + /// used to initialize members of the same names. 
own_memory is initialized to true, + /// nnz is initialized to -1, and (vals, rows, cols) are each initialized + /// to nullptr. next_state is computed automatically from dist and seed_state. + /// + /// Although own_memory is initialized to true, RandBLAS will not attach + /// memory to (vals, rows, cols) unless fill_sparse(SparseSkOp &S) is called. + /// + /// If a RandBLAS function needs an explicit representation of this operator and + /// yet nnz < 0, then RandBLAS will construct a temporary + /// explicit representation of this operator and delete that representation before returning. + /// SparseSkOp( SparseDist dist, - const RNGState &state, - sint_t *rows, - sint_t *cols, + const state_t &seed_state + ): // variable definitions + dist(dist), + seed_state(seed_state), + next_state(compute_next_state(dist, seed_state)), + n_rows(dist.n_rows), + n_cols(dist.n_cols), own_memory(true), nnz(-1), vals(nullptr), rows(nullptr), cols(nullptr) { } + + /// -------------------------------------------------------------------------------- + /// **Expert constructor**. Arguments passed to this function are + /// used to initialize members of the same names. own_memory is initialized to false. 
+ /// + SparseSkOp( + SparseDist dist, + const state_t &seed_state, + const state_t &next_state, + int64_t nnz, T *vals, - bool known_filled = true + sint_t *rows, + sint_t *cols ) : // variable definitions + dist(dist), + seed_state(seed_state), + next_state(next_state), n_rows(dist.n_rows), n_cols(dist.n_cols), - dist(dist), - seed_state(state), own_memory(false), - next_state(sparse::compute_next_state(dist, seed_state)) - { // sanity checks - randblas_require(this->dist.n_rows > 0); - randblas_require(this->dist.n_cols > 0); - randblas_require(this->dist.vec_nnz > 0); - // actual work - this->rows = rows; - this->cols = cols; - this->vals = vals; - this->known_filled = known_filled; - }; - - // Useful for shallow copies (possibly with transposition) - SparseSkOp( - SparseDist dist, - const RNGState &seed_state, - sint_t *rows, - sint_t *cols, - T *vals, - const RNGState &next_state - ) : n_rows(dist.n_rows), n_cols(dist.n_cols), dist(dist), seed_state(seed_state), next_state(next_state), own_memory(false) { - randblas_require(this->dist.n_rows > 0); - randblas_require(this->dist.n_cols > 0); - randblas_require(this->dist.vec_nnz > 0); - // actual work - this->rows = rows; - this->cols = cols; - this->vals = vals; - this->known_filled = known_filled; + nnz(nnz), vals(vals), rows(rows), cols(cols){ }; + + // Move constructor + SparseSkOp(SparseSkOp &&S + ) : dist(S.dist), seed_state(S.seed_state), next_state(S.next_state), + n_rows(dist.n_rows), n_cols(dist.n_cols), own_memory(S.own_memory), + nnz(S.nnz), rows(S.rows), cols(S.cols), vals(S.vals) + { + S.rows = nullptr; + S.cols = nullptr; + S.vals = nullptr; + S.nnz = -1; } - SparseSkOp( - SparseDist dist, - uint32_t key, - sint_t *rows, - sint_t *cols, - T *vals - ) : SparseSkOp(dist, RNGState(key), rows, cols, vals) {}; + // Destructor + ~SparseSkOp() { + if (own_memory) { + if (rows != nullptr) delete [] rows; + if (cols != nullptr) delete [] cols; + if (vals != nullptr) delete [] vals; + } + } +}; - 
///--------------------------------------------------------------------------- - /// The preferred constructor for SparseSkOp objects. There are other - /// constructors, but they don't appear in the web documentation. - /// - /// @param[in] dist - /// A SparseDist object. - /// - Defines the number of rows and columns in this sketching operator. - /// - Indirectly controls sparsity pattern. - /// - Directly controls sparsity level. - /// - /// @param[in] state - /// An RNGState object. - /// - The RNG will use this as the starting point to generate all - /// random numbers needed for this sketching operator. - /// - SparseSkOp( - SparseDist dist, - const RNGState &state - ) : // variable definitions - n_rows(dist.n_rows), - n_cols(dist.n_cols), - dist(dist), - seed_state(state), - next_state(sparse::compute_next_state(dist, seed_state)), - own_memory(true) - { // sanity checks - randblas_require(this->dist.n_rows > 0); - randblas_require(this->dist.n_cols > 0); - randblas_require(this->dist.vec_nnz > 0); - // actual work - int64_t minor_ax_len; - if (this->dist.major_axis == MajorAxis::Short) { - minor_ax_len = MAX(this->dist.n_rows, this->dist.n_cols); - } else { - minor_ax_len = MIN(this->dist.n_rows, this->dist.n_cols); +template +void laso_merge_long_axis_vector_coo_data( + int64_t vec_nnz, T* vals, sint_t* idxs_lax, sint_t *idxs_sax, int64_t i, + std::unordered_map &loc2count, + std::unordered_map &loc2scale +) { + loc2count.clear(); + // ^ Used to count the number of times each long-axis index + // appears in a given long-axis vector. Indices that don't + // appear are not stored explicitly. + loc2scale.clear(); + // ^ Stores a mean-zero variance-one subgaussian random variable for + // each index appearing in the long-axis vector. Current + // long-axis-sparse sampling uses Rademachers, but the literature + // technically prefers Gaussians. 
+ for (int64_t j = 0; j < vec_nnz; ++j) { + idxs_sax[j] = i; + sint_t ell = idxs_lax[j]; + T val = vals[j]; + if (loc2scale.count(ell)) { + loc2count[ell] = loc2count[ell] + 1; + } else { + loc2scale[ell] = val; + loc2count[ell] = 1.0; } - int64_t nnz = this->dist.vec_nnz * minor_ax_len; - this->rows = new sint_t[nnz]; - this->cols = new sint_t[nnz]; - this->vals = new T[nnz]; } + if ((int64_t) loc2scale.size() < vec_nnz) { + // Then we have duplicates. We need to overwrite some of the values + // of (idxs_lax, vals, idxs_sax) and implicitly + // shift them backward to remove duplicates; + int64_t count = 0; + for (const auto& [ell,c] : loc2count) { + idxs_lax[count] = ell; + vals[count] = std::sqrt(c) * loc2scale[ell]; + count += 1; + } + } + return; +} - SparseSkOp( - SparseDist dist, - uint32_t key - ) : SparseSkOp(dist, RNGState(key)) {}; +// ============================================================================= +/// @verbatim embed:rst:leading-slashes +/// +/// .. |vals| mathmacro:: \mathtt{vals} +/// .. |rows| mathmacro:: \mathtt{rows} +/// .. |cols| mathmacro:: \mathtt{cols} +/// .. |Dfullnnz| mathmacro:: {\mathcal{D}\mathtt{.full\_nnz}} +/// +/// @endverbatim +/// This function is the underlying implementation of fill_sparse(SparseSkOp &S). +/// It has no allocation stage and it skips checks for null pointers. +/// +/// On entry, \math{(\vals,\rows,\cols)} are arrays of length at least \math{\Dfullnnz.} +/// On exit, the first \math{\ttt{nnz}} entries of these arrays contain the data for +/// a COO sparse matrix representation of the SparseSkOp +/// defined by \math{(\D,\ttt{seed_state)}.} +/// +/// +template +state_t fill_sparse_unpacked_nosub( + const SparseDist &D, + int64_t &nnz, T* vals, sint_t* rows, sint_t *cols, + const state_t &seed_state +) { + int64_t dim_major = D.dim_major; + int64_t dim_minor = D.dim_minor; + sint_t *idxs_short = (D.n_rows <= D.n_cols) ? rows : cols; + sint_t *idxs_long = (D.n_rows <= D.n_cols) ? 
cols : rows;
+    int64_t vec_nnz = D.vec_nnz;
 
-    // Destructor
-    ~SparseSkOp() {
-        if (this->own_memory) {
-            delete [] this->rows;
-            delete [] this->cols;
-            delete [] this->vals;
+    if (D.major_axis == Axis::Short) {
+        auto state = sparse::repeated_fisher_yates(
+            seed_state, vec_nnz, dim_major, dim_minor, idxs_short, idxs_long, vals
+        );
+        nnz = vec_nnz * dim_minor;
+        return state;
+    } else if (D.major_axis == Axis::Long) {
+        // We don't sample all at once since we might need to merge duplicate entries
+        // in each long-axis vector. The way we do this is different from the
+        // standard COOMatrix convention of just adding entries together.
+
+        // We begin by defining some data structures that we repeatedly pass to a helper function.
+        // See the comments in the helper function for info on what they mean.
+        std::unordered_map loc2count{};
+        std::unordered_map loc2scale{};
+        int64_t total_nnz = 0;
+        auto state = seed_state;
+        for (int64_t i = 0; i < dim_minor; ++i) {
+            state = sample_indices_iid_uniform(dim_major, vec_nnz, idxs_long, vals, state);
+            // ^ That writes directly to S.vals and either S.rows or S.cols.
+            //   The new values might need to be changed if there are duplicates in idxs_long.
+            //   We have a helper function for this since it's a tedious process.
+            //   The helper function also sets whichever of S.rows or S.cols wasn't populated.
+            laso_merge_long_axis_vector_coo_data(
+                vec_nnz, vals, idxs_long, idxs_short, i, loc2count, loc2scale
+            );
+            int64_t count = loc2count.size();
+            vals += count;
+            idxs_long += count;
+            idxs_short += count;
+            total_nnz += count;
         }
+        nnz = total_nnz;
+        return state;
+    } else {
+        throw std::invalid_argument("D.major_axis must be Axis::Short or Axis::Long.");
     }
-};
+}
+
 
 // =============================================================================
-/// Performs the work in sampling S from its underlying distribution. This
-/// entails populating S.rows, S.cols, and S.vals with COO-format sparse matrix
-/// data.
+/// If \math{\ttt{S.own_memory}} is true then we enter an allocation stage. This stage
+/// inspects the reference members of \math{\ttt{S}}.
+/// Any reference member that's equal to \math{\ttt{nullptr}} is redirected to
+/// the start of a new array (allocated with ``new []``) of length \math{\ttt{S.dist.full_nnz}.}
+///
+/// After the allocation stage, we inspect the reference members of \math{\ttt{S}}
+/// and we raise an error if any of them are null.
+///
+/// If all reference members are non-null, then we'll assume each of them has length
+/// at least \math{\ttt{S.dist.full_nnz}.} We'll proceed to populate those members
+/// (and \math{\ttt{S.nnz}}) with the data for the explicit representation of \math{\ttt{S}.}
+/// On exit, \math{\ttt{S}} can be equivalently represented by
+/// @verbatim embed:rst:leading-slashes
+/// .. code:: c++
 ///
-/// RandBLAS will automatically call this function if and when it is needed.
+///    RandBLAS::COOMatrix mat(S.n_rows, S.n_cols, S.nnz, S.vals, S.rows, S.cols);
 ///
-/// @param[in] S
-/// SparseSkOp object.
-///
+/// @endverbatim
 template
 void fill_sparse(SparseSkOp &S) {
-
-    int64_t long_ax_len = MAX(S.dist.n_rows, S.dist.n_cols);
-    int64_t short_ax_len = MIN(S.dist.n_rows, S.dist.n_cols);
-    bool is_wide = S.dist.n_rows == short_ax_len;
-    using sint_t = typename SparseSkOp::index_t;
-    sint_t *short_ax_idxs = (is_wide) ? S.rows : S.cols;
-    sint_t *long_ax_idxs = (is_wide) ?
S.cols : S.rows; - - if (S.dist.major_axis == MajorAxis::Short) { - sparse::repeated_fisher_yates( - S.seed_state, S.dist.vec_nnz, short_ax_len, long_ax_len, - short_ax_idxs, long_ax_idxs, S.vals - ); - } else { - sparse::repeated_fisher_yates( - S.seed_state, S.dist.vec_nnz, long_ax_len, short_ax_len, - long_ax_idxs, short_ax_idxs, S.vals - ); + using T = typename SparseSkOp::scalar_t; + int64_t full_nnz = S.dist.full_nnz; + if (S.own_memory) { + if (S.rows == nullptr) S.rows = new sint_t[full_nnz]; + if (S.cols == nullptr) S.cols = new sint_t[full_nnz]; + if (S.vals == nullptr) S.vals = new T[full_nnz]; } - S.known_filled = true; + randblas_require(S.rows != nullptr); + randblas_require(S.cols != nullptr); + randblas_require(S.vals != nullptr); + fill_sparse_unpacked_nosub(S.dist, S.nnz, S.vals, S.rows, S.cols, S.seed_state); return; } -template -void print_sparse(SKOP const &S0) { +#ifdef __cpp_concepts +static_assert(SketchingOperator>); +static_assert(SketchingOperator>); +#endif + +template +void print_sparse(SparseSkOp const &S0) { + // TODO: clean up this function. 
std::cout << "SparseSkOp information" << std::endl; int64_t nnz; - if (S0.dist.major_axis == MajorAxis::Short) { + if (S0.dist.major_axis == Axis::Short) { nnz = S0.dist.vec_nnz * MAX(S0.dist.n_rows, S0.dist.n_cols); std::cout << "\tSASO: short-axis-sparse operator" << std::endl; } else { @@ -425,91 +593,51 @@ void print_sparse(SKOP const &S0) { } std::cout << "\tn_rows = " << S0.dist.n_rows << std::endl; std::cout << "\tn_cols = " << S0.dist.n_cols << std::endl; - std::cout << "\tvector of row indices\n\t\t"; - for (int64_t i = 0; i < nnz; ++i) { - std::cout << S0.rows[i] << ", "; + if (S0.rows != nullptr) { + std::cout << "\tvector of row indices\n\t\t"; + for (int64_t i = 0; i < nnz; ++i) { + std::cout << S0.rows[i] << ", "; + } + } else { + std::cout << "\trows is the null pointer.\n\t\t"; } std::cout << std::endl; - std::cout << "\tvector of column indices\n\t\t"; - for (int64_t i = 0; i < nnz; ++i) { - std::cout << S0.cols[i] << ", "; + if (S0.cols != nullptr) { + std::cout << "\tvector of column indices\n\t\t"; + for (int64_t i = 0; i < nnz; ++i) { + std::cout << S0.cols[i] << ", "; + } + } else { + std::cout << "\tcols is the null pointer.\n\t\t"; } std::cout << std::endl; - std::cout << "\tvector of values\n\t\t"; - for (int64_t i = 0; i < nnz; ++i) { - std::cout << S0.vals[i] << ", "; + if (S0.vals != nullptr) { + std::cout << "\tvector of values\n\t\t"; + for (int64_t i = 0; i < nnz; ++i) { + std::cout << S0.vals[i] << ", "; + } + } else { + std::cout << "\tvals is the null pointer.\n\t\t"; } std::cout << std::endl; + return; } - } // end namespace RandBLAS namespace RandBLAS::sparse { using RandBLAS::SparseSkOp; -using RandBLAS::MajorAxis; +using RandBLAS::Axis; using RandBLAS::sparse_data::COOMatrix; -template -static bool has_fixed_nnz_per_col( - SKOP const &S0 -) { - if (S0.dist.major_axis == MajorAxis::Short) { - return S0.dist.n_rows < S0.dist.n_cols; - } else { - return S0.dist.n_cols < S0.dist.n_rows; - } -} - -template -static int64_t nnz( - 
SKOP const &S0 -) { - bool saso = S0.dist.major_axis == MajorAxis::Short; - bool wide = S0.dist.n_rows < S0.dist.n_cols; - if (saso & wide) { - return S0.dist.vec_nnz * S0.dist.n_cols; - } else if (saso & (!wide)) { - return S0.dist.vec_nnz * S0.dist.n_rows; - } else if (wide & (!saso)) { - return S0.dist.vec_nnz * S0.dist.n_rows; - } else { - // tall LASO - return S0.dist.vec_nnz * S0.dist.n_cols; - } -} - -template -COOMatrix coo_view_of_skop(SkOp &S) { - if (!S.known_filled) +template +COOMatrix coo_view_of_skop(SparseSkOp &S) { + if (S.nnz <= 0) fill_sparse(S); - int64_t nnz = RandBLAS::sparse::nnz(S); - COOMatrix A(S.dist.n_rows, S.dist.n_cols, nnz, S.vals, S.rows, S.cols); + COOMatrix A(S.n_rows, S.n_cols, S.nnz, S.vals, S.rows, S.cols); return A; } -// ============================================================================= -/// Return a SparseSkOp object representing the transpose of S. -/// -/// @param[in] S -/// SparseSkOp object. -/// @return -/// A new SparseSkOp object that depends on the memory underlying S. -/// (In particular, it depends on S.rows, S.cols, and S.vals.) 
-/// -template -static auto transpose(SKOP const &S) { - randblas_require(S.known_filled); - SparseDist dist = { - .n_rows = S.dist.n_cols, - .n_cols = S.dist.n_rows, - .vec_nnz = S.dist.vec_nnz, - .major_axis = S.dist.major_axis - }; - SKOP St(dist, S.seed_state, S.cols, S.rows, S.vals); - St.next_state = S.next_state; - return St; -} } // end namespace RandBLAS::sparse diff --git a/RandBLAS/util.hh b/RandBLAS/util.hh index 22242449..3cd6b7a1 100644 --- a/RandBLAS/util.hh +++ b/RandBLAS/util.hh @@ -32,10 +32,10 @@ #include #include #include -#include #include #include +#include #include #include #ifndef _MSC_VER @@ -59,40 +59,152 @@ void safe_scal(int64_t n, T a, T* x, int64_t inc_x) { } template -void print_colmaj(int64_t n_rows, int64_t n_cols, T *a, const char label[]) { +void omatcopy(int64_t m, int64_t n, const T* A, int64_t irs_a, int64_t ics_a, T* B, int64_t irs_b, int64_t ics_b) { + // TODO: + // 1. Order the loops with consideration to cache efficiency. + // 2. Vectorize one of the loops with blas::copy or std::memcpy. 
+ #define MAT_A(_i, _j) A[(_i)*irs_a + (_j)*ics_a] + #define MAT_B(_i, _j) B[(_i)*irs_b + (_j)*ics_b] + for (int64_t i = 0; i < m; ++i) { + for (int64_t j = 0; j < n; ++j) { + MAT_B(i,j) = MAT_A(i,j); + } + } + #undef MAT_A + #undef MAT_B + return; +} + +template +void flip_layout(blas::Layout layout_in, int64_t m, int64_t n, std::vector &A, int64_t lda_in, int64_t lda_out) { + using blas::Layout; + Layout layout_out; + int64_t len_buff_A_out; + if (layout_in == Layout::ColMajor) { + layout_out = Layout::RowMajor; + randblas_require(lda_in >= m); + randblas_require(lda_out >= n); + len_buff_A_out = lda_out * m; + } else { + layout_out = Layout::ColMajor; + randblas_require(lda_in >= n); + randblas_require(lda_out >= m); + len_buff_A_out = lda_out * n; + } + // irs = inter row stride (stepping down a column) + // ics = inter column stride (stepping across a row) + auto [irs_in, ics_in] = layout_to_strides(layout_in, lda_in); + auto [irs_out, ics_out] = layout_to_strides(layout_out, lda_out); + + if (len_buff_A_out >= (int64_t) A.size()) { + A.resize(len_buff_A_out); + } + std::vector A_in(A); + T* A_buff_in = A_in.data(); + T* A_buff_out = A.data(); + omatcopy(m, n, A_buff_in, irs_in, ics_in, A_buff_out, irs_out, ics_out); + A.erase(A.begin() + len_buff_A_out, A.end()); + A.resize(len_buff_A_out); + return; +} + +// ============================================================================= +/// \fn require_symmetric(blas::Layout layout, const T* A, int64_t n, int64_t lda, T tol) +/// @verbatim embed:rst:leading-slashes +/// If :math:`\ttt{tol} \geq 0`, this function checks if +/// +/// .. math:: +/// \frac{|A[i + j \cdot \lda] - A[i \cdot \lda + j]|}{|A[i + j \cdot \lda]| + |A[i \cdot \lda + j]| + 1} \leq \ttt{tol} +/// +/// for all :math:`i,j \in \\{0,\ldots,n-1\\}.` An error is raised if any such check fails. 
+/// This function returns immediately without performing any checks if :math:`\ttt{tol} < 0.` +/// @endverbatim +/// sketch_symmetric calls this function with \math{\ttt{tol} = 0} by default. +/// +template +void require_symmetric(blas::Layout layout, const T* A, int64_t n, int64_t lda, T tol) { + if (tol < 0) + return; + auto [inter_row_stride, inter_col_stride] = layout_to_strides(layout, lda); + #define matA(_i, _j) A[(_i)*inter_row_stride + (_j)*inter_col_stride] + for (int64_t i = 0; i < n; ++i) { + for (int64_t j = i+1; j < n; ++j) { + T Aij = matA(i,j); + T Aji = matA(j,i); + T viol = abs(Aij - Aji); + T rel_tol = (abs(Aij) + abs(Aji) + 1)*tol; + if (viol > rel_tol) { + std::string message = "Symmetry check failed. |A(%i,%i) - A(%i,%i)| was %e, which exceeds tolerance of %e."; + auto _message = message.c_str(); + randblas_error_if_msg(viol > rel_tol, _message, i, j, j, i, viol, rel_tol); + } + } + } + #undef matA + return; +} + +} // end namespace RandBLAS::util + +namespace RandBLAS { + +enum ArrayStyle : char { + MATLAB = 'M', + Python = 'P' +}; + +// ============================================================================= +/// \fn print_colmaj(int64_t n_rows, int64_t n_cols, T *a, cout_able &label, +/// ArrayStyle style = ArrayStyle::MATLAB, +/// std::ios_base::fmtflags &flags = std::cout.flags() +/// ) +/// @verbatim embed:rst:leading-slashes +/// Notes: see https://cplusplus.com/reference/ios/ios_base/fmtflags/ for info on the optional flags argument. +/// @endverbatim +template +void print_colmaj( + int64_t n_rows, int64_t n_cols, T *a, cout_able &label, + ArrayStyle style = ArrayStyle::MATLAB, + const std::ios_base::fmtflags flags = std::cout.flags() +) { + std::string abs_start {(style == ArrayStyle::MATLAB) ? "\n\t[ " : "\nnp.array([\n\t[ " }; + std::string mid_start {(style == ArrayStyle::MATLAB) ? "\t " : "\t[ " }; + std::string mid_end {(style == ArrayStyle::MATLAB) ? 
"; ...\n" : "],\n" }; + std::string abs_end {(style == ArrayStyle::MATLAB) ? "];\n" : "]\n])\n" }; + int64_t i, j; T val; - std::cout << "\n" << label << std::endl; + auto old_flags = std::cout.flags(); + std::cout.flags(flags); + std::cout << std::endl << label << abs_start << std::endl; for (i = 0; i < n_rows; ++i) { - std::cout << "\t"; + std::cout << mid_start; for (j = 0; j < n_cols - 1; ++j) { val = a[i + n_rows * j]; - if (val < 0) { - //std::cout << string_format(" %2.4f,", val); - printf(" %2.20f,", val); - } else { - //std::cout << string_format(" %2.4f", val); - printf(" %2.20f,", val); - } + std::cout << " " << val << ","; } // j = n_cols - 1 val = a[i + n_rows * j]; - if (val < 0) { - //std::cout << string_format(" %2.4f,", val); - printf(" %2.20f,", val); - } else { - //std::cout << string_format(" %2.4f,", val); - printf(" %2.20f,", val); - } - printf("\n"); + std::cout << " " << val; + if (i < n_rows - 1) { + std::cout << mid_end; + } else { + std::cout << abs_end; + } } - printf("\n"); + std::cout.flags(old_flags); return; } - +// ============================================================================= +/// \fn typeinfo_as_string() +/// @verbatim embed:rst:leading-slashes +/// When called as ``typeinfo_as_string()``, this function returns a string +/// giving all available type information for ``your_variable``. This can be useful +/// for inspecting types in the heretical practice of *print statement debugging*. 
+/// @endverbatim
 template <typename T>
-std::string type_name() { // call as type_name()
+std::string typeinfo_as_string() {
     typedef typename std::remove_reference<T>::type TR;
     std::unique_ptr<char, void(*)(void*)> own (
@@ -116,6 +228,18 @@ std::string type_name() { // call as type_name()
     return r;
 }
 
+// =============================================================================
+/// \fn symmetrize(blas::Layout layout, blas::Uplo uplo, int64_t n, T* A, int64_t lda)
+/// @verbatim embed:rst:leading-slashes
+/// Use this function to convert a matrix that BLAS can *interpret* as symmetric into a matrix
+/// that's explicitly symmetric.
+///
+/// Formally, :math:`A` points to the start of a buffer for an :math:`n \times n` matrix :math:`\mat(A)`
+/// stored in :math:`\ttt{layout}` order with leading dimension :math:`\ttt{lda}.`
+/// This function copies the strict part of the :math:`\ttt{uplo}` triangle of :math:`\mat(A)`
+/// into the strict part of the opposing triangle.
+///
+/// @endverbatim
 template <typename T>
 void symmetrize(blas::Layout layout, blas::Uplo uplo, int64_t n, T* A, int64_t lda) {
     auto [inter_row_stride, inter_col_stride] = layout_to_strides(layout, lda);
@@ -139,20 +263,42 @@ void symmetrize(blas::Layout layout, blas::Uplo uplo, int64_t n, T* A, int64_t l
     return;
 }
 
+// =============================================================================
+/// \fn overwrite_triangle(blas::Layout layout, blas::Uplo to_overwrite,
+///     int64_t n, int64_t strict_offset, T* A, int64_t lda
+/// )
+/// @verbatim embed:rst:leading-slashes
+/// Use this function to convert a matrix which BLAS can *interpret* as triangular into a matrix that's
+/// explicitly triangular.
+///
+/// Formally, :math:`A` points to the start of a buffer for an :math:`n \times n` matrix :math:`\mat(A)`
+/// stored in :math:`\ttt{layout}` order with leading dimension :math:`\ttt{lda},`
+/// and :math:`\ttt{strict_offset}` is a nonnegative integer.
+///
+/// This function overwrites :math:`A` so that ...
+/// * If :math:`\ttt{to_overwrite} = \ttt{Uplo::Lower},` then elements of :math:`\mat(A)` on or +/// below its :math:`\ttt{strict_offset}^{\text{th}}` subdiagonal are overwritten with zero. +/// * If :math:`\ttt{to_overwrite} = \ttt{Uplo::Upper},` then elements of :math:`\mat(A)` on or +/// above its :math:`\ttt{strict_offset}^{\text{th}}` superdiagonal are overwritten with zero. +/// +/// This function raises an error if :math:`\ttt{strict_offset}` is negative or if +/// :math:`\ttt{to_overwrite}` is neither Upper nor Lower. +/// +/// @endverbatim template -void overwrite_triangle(blas::Layout layout, blas::Uplo to_overwrite, int64_t n, int64_t strict_offset, T val, T* A, int64_t lda) { +void overwrite_triangle(blas::Layout layout, blas::Uplo to_overwrite, int64_t n, int64_t strict_offset, T* A, int64_t lda) { auto [inter_row_stride, inter_col_stride] = layout_to_strides(layout, lda); #define matA(_i, _j) A[(_i)*inter_row_stride + (_j)*inter_col_stride] if (to_overwrite == blas::Uplo::Upper) { for (int64_t i = 0; i < n; ++i) { for (int64_t j = i + strict_offset; j < n; ++j) { - matA(i,j) = val; + matA(i,j) = 0.0; } } } else if (to_overwrite == blas::Uplo::Lower) { for (int64_t i = 0; i < n; ++i) { for (int64_t j = i + strict_offset; j < n; ++j) { - matA(j,i) = val; + matA(j,i) = 0.0; } } } else { @@ -162,35 +308,14 @@ void overwrite_triangle(blas::Layout layout, blas::Uplo to_overwrite, int64_t n, return; } -template -void require_symmetric(blas::Layout layout, const T* A, int64_t n, int64_t lda, T tol) { - if (tol < 0) - return; - auto [inter_row_stride, inter_col_stride] = layout_to_strides(layout, lda); - #define matA(_i, _j) A[(_i)*inter_row_stride + (_j)*inter_col_stride] - for (int64_t i = 0; i < n; ++i) { - for (int64_t j = i+1; j < n; ++j) { - T Aij = matA(i,j); - T Aji = matA(j,i); - T viol = abs(Aij - Aji); - T rel_tol = (abs(Aij) + abs(Aji) + 1)*tol; - if (viol > rel_tol) { - std::string message = "Symmetry check failed. 
|A(%i,%i) - A(%i,%i)| was %d, which exceeds tolerance of %d."; - auto _message = message.c_str(); - randblas_error_if_msg(viol > rel_tol, _message, i, j, j, i, viol, rel_tol); - // ^ TODO: fix this macro. Apparently it doesn't print out all that I'd like. Example I just got: - // "Symmetry check failed. |A(0,1) - A(1,0)| was 1610612736, which exceeds toleranc, in function require_symmetric" thrown in the test body. - } - } - } - #undef matA - return; -} - -/** - * In-place transpose of square matrix of order n, with leading dimension lda. - * Turns out that "layout" doesn't matter here. -*/ +// ============================================================================= +/// \fn transpose_square(T* A, int64_t n, int64_t lda) +/// @verbatim embed:rst:leading-slashes +/// In-place transpose of square matrix of order :math:`n`, with leading dimension :math:`\ttt{lda}.` +/// +/// It turns out that there's no implementation difference between row-major +/// or column-major data, so we don't accept a layout parameter. +/// @endverbatim template void transpose_square(T* A, int64_t n, int64_t lda) { #define matA(_i, _j) A[(_i) + lda*(_j)] @@ -203,57 +328,21 @@ void transpose_square(T* A, int64_t n, int64_t lda) { return; } -template -void omatcopy(int64_t m, int64_t n, const T* A, int64_t irs_a, int64_t ics_a, T* B, int64_t irs_b, int64_t ics_b) { - // TODO: - // 1. Order the loops with consideration to cache efficiency. - // 2. Vectorize one of the loops with blas::copy or std::memcpy. 
-    #define MAT_A(_i, _j) A[(_i)*irs_a + (_j)*ics_a]
-    #define MAT_B(_i, _j) B[(_i)*irs_b + (_j)*ics_b]
-    for (int64_t i = 0; i < m; ++i) {
-        for (int64_t j = 0; j < n; ++j) {
-            MAT_B(i,j) = MAT_A(i,j);
-        }
-    }
-    #undef MAT_A
-    #undef MAT_B
-    return;
-}
-
-template <typename T>
-void flip_layout(blas::Layout layout_in, int64_t m, int64_t n, std::vector<T> &A, int64_t lda_in, int64_t lda_out) {
-    using blas::Layout;
-    Layout layout_out;
-    int64_t len_buff_A_out;
-    if (layout_in == Layout::ColMajor) {
-        layout_out = Layout::RowMajor;
-        randblas_require(lda_in >= m);
-        randblas_require(lda_out >= n);
-        len_buff_A_out = lda_out * m;
-    } else {
-        layout_out = Layout::ColMajor;
-        randblas_require(lda_in >= n);
-        randblas_require(lda_out >= m);
-        len_buff_A_out = lda_out * n;
-    }
-    // irs = inter row stride (stepping down a column)
-    // ics = inter column stride (stepping across a row)
-    auto [irs_in, ics_in] = layout_to_strides(layout_in, lda_in);
-    auto [irs_out, ics_out] = layout_to_strides(layout_out, lda_out);
-
-    if (len_buff_A_out >= (int64_t) A.size()) {
-        A.resize(len_buff_A_out);
-    }
-    std::vector<T> A_in(A);
-    T* A_buff_in = A_in.data();
-    T* A_buff_out = A.data();
-    omatcopy(m, n, A_buff_in, irs_in, ics_in, A_buff_out, irs_out, ics_out);
-    A.erase(A.begin() + len_buff_A_out, A.end());
-    A.resize(len_buff_A_out);
-    return;
-}
-
+// =============================================================================
+/// \fn weights_to_cdf(int64_t n, T* w, T error_if_below = -std::numeric_limits<T>::epsilon())
+/// @verbatim embed:rst:leading-slashes
+/// Checks if all elements of the length-:math:`n` array :math:`w` are no smaller than
+/// :math:`\ttt{error_if_below}.` If this check passes, then we (implicitly) initialize :math:`v := w`
+/// and overwrite :math:`w` by
+///
+/// .. math::
+///
+///     w_i = \frac{\textstyle\sum_{\ell=1}^{i}\max\{0, v_{\ell}\}}{\textstyle\sum_{j=1}^n \max\{0, v_j\}}.
+///
+/// @endverbatim
+/// On exit, \math{w} is a CDF suitable for use with sample_indices_iid.
+///
 template <typename T>
 void weights_to_cdf(int64_t n, T* w, T error_if_below = -std::numeric_limits<T>::epsilon()) {
     T sum = 0.0;
@@ -274,19 +363,24 @@ static inline TO uneg11_to_uneg01(TI in) {
     return ((TO) in + (TO) 1.0)/ ((TO) 2.0);
 }
 
-/***
- * cdf represents a cumulative distribution function over {0, ..., n - 1}.
- *
- * TF is a template parameter for a real floating point type.
- *
- * We overwrite the "samples" buffer with k (independent) samples from the
- * distribution specified by cdf.
- */
-template <typename TF, typename RNG>
-RNGState<RNG> sample_indices_iid(
-    int64_t n, TF* cdf, int64_t k, int64_t* samples, RNGState<RNG> state
-) {
+// =============================================================================
+/// @verbatim embed:rst:leading-slashes
+/// :math:`\ttt{cdf}` encodes a cumulative distribution function over
+/// :math:`\{0, \ldots, n - 1\}.` For :math:`0 \leq i < n-1,` it satisfies
+///
+/// .. math::
+///
+///     0 \leq \ttt{cdf}[i] \leq \ttt{cdf}[i+1] \leq \ttt{cdf}[n-1] = 1.
+/// +/// On exit, :math:`\ttt{samples}` is overwritten by :math:`k` independent samples +/// from :math:`\ttt{cdf}.` The returned RNGState should +/// be used for the next call to a random sampling function whose output should be statistically +/// independent from :math:`\ttt{samples}.` +/// @endverbatim +template > +state_t sample_indices_iid(int64_t n, const T* cdf, int64_t k, sint_t* samples, const state_t &state) { auto [ctr, key] = state; + using RNG = typename state_t::generator; RNG gen; auto rv_array = r123ext::uneg11::generate(gen, ctr, key); int64_t len_c = (int64_t) state.len_c; @@ -297,41 +391,59 @@ RNGState sample_indices_iid( rv_array = r123ext::uneg11::generate(gen, ctr, key); rv_index = 0; } - auto random_unif01 = uneg11_to_uneg01(rv_array[rv_index]); - int64_t sample_index = std::lower_bound(cdf, cdf + n, random_unif01) - cdf; + auto random_unif01 = uneg11_to_uneg01(rv_array[rv_index]); + sint_t sample_index = std::lower_bound(cdf, cdf + n, random_unif01) - cdf; samples[i] = sample_index; rv_index += 1; } - return RNGState(ctr, key); + return state_t(ctr, key); } - -/*** - * Overwrite the "samples" buffer with k (independent) samples from the - * uniform distribution over {0, ..., n - 1}. - */ -template -RNGState sample_indices_iid_uniform( - int64_t n, int64_t k, int64_t* samples, RNGState state -) { + +template > +state_t sample_indices_iid_uniform(int64_t n, int64_t k, sint_t* samples, T* rademachers, const state_t &state) { + using RNG = typename state_t::generator; auto [ctr, key] = state; RNG gen; auto rv_array = r123ext::uneg11::generate(gen, ctr, key); - int64_t len_c = (int64_t) state.len_c; + int64_t len_c = static_cast(state.len_c); + if constexpr (WriteRademachers) { + len_c = 2*(len_c/2); + // ^ round down to the nearest multiple of two. 
+ } int64_t rv_index = 0; double dN = (double) n; for (int64_t i = 0; i < k; ++i) { - if ((i+1) % len_c == 1) { + auto random_unif01 = uneg11_to_uneg01(rv_array[rv_index]); + sint_t sample_index = (sint_t) dN * random_unif01; + samples[i] = sample_index; + rv_index += 1; + if constexpr (WriteRademachers) { + rademachers[i] = (rv_array[rv_index] >= 0) ? (T) 1 : (T) -1; + rv_index += 1; + } + if (rv_index == len_c) { ctr.incr(1); rv_array = r123ext::uneg11::generate(gen, ctr, key); rv_index = 0; } - auto random_unif01 = uneg11_to_uneg01(rv_array[rv_index]); - int64_t sample_index = (int64_t) dN * random_unif01; - samples[i] = sample_index; - rv_index += 1; } - return RNGState(ctr, key); + return state_t(ctr, key); } -} // end namespace RandBLAS::util +// ============================================================================= +/// @verbatim embed:rst:leading-slashes +/// This function overwrites :math:`\ttt{samples}` with :math:`k` (independent) samples from the +/// uniform distribution over :math:`\{0, \ldots, n - 1\}.` The returned RNGState should +/// be used for the next call to a random sampling function whose output should be statistically +/// independent from :math:`\ttt{samples}.` +/// +/// @endverbatim +template > +state_t sample_indices_iid_uniform(int64_t n, int64_t k, sint_t* samples, const state_t &state) { + return sample_indices_iid_uniform(n, k, samples, (float*) nullptr, state); +} + + +} // end namespace RandBLAS + diff --git a/docstring_transformers.py b/docstring_transformers.py new file mode 100644 index 00000000..1824e8f5 --- /dev/null +++ b/docstring_transformers.py @@ -0,0 +1,119 @@ +import re + +def transform_param_line(line): + # Match the parameter line with the format "/// param_name - [direction]" + match = re.match(r'///\s+(\w+)\s+-\s+\[(.+)\]', line) + if match: + param_name = match.group(1) + direction = match.group(2) + return f"/// @param[{direction}] {param_name}" + return line + +def transform_math_expressions(text): + # Use 
a regex to find and replace :math:`...` spans with \math{...}
+    return re.sub(r':math:`([^`]+)`', r'\\math{\1}', text)
+
+def transform_bullet_points(text, param_name):
+    # Calculate the offset for bullet points
+    offset = len(param_name) + 2
+    bullet_point_pattern = re.compile(r'(///\s+)[*-]\s+')
+
+    def replace_bullet(match):
+        leading_slashes = match.group(1)
+        return leading_slashes + ' ' * (offset - len(leading_slashes)) + '- '
+
+    return bullet_point_pattern.sub(replace_bullet, text)
+
+def transform_documentation(doc):
+    # Remove leading @verbatim and trailing @endverbatim lines
+    doc = re.sub(r'///\s*@verbatim.*\n', '', doc)
+    doc = re.sub(r'///\s*@endverbatim.*\n', '', doc)
+
+    lines = doc.split('\n')
+    transformed_lines = []
+    param_name = None
+
+    for line in lines:
+        if re.match(r'///\s+\w+\s+-\s+\[.+\]', line):
+            param_name = re.match(r'///\s+(\w+)\s+-\s+\[.+\]', line).group(1)
+            transformed_lines.append(transform_param_line(line))
+        elif param_name and re.match(r'///\s+[*-]\s+', line):
+            # Transform bullet points and math expressions within the bullet points
+            transformed_line = transform_bullet_points(line, param_name)
+            transformed_line = transform_math_expressions(transformed_line)
+            transformed_lines.append(transformed_line)
+        else:
+            transformed_lines.append(transform_math_expressions(line))
+
+    return '\n'.join(transformed_lines)
+
+
+def example_rstfullparams():
+    params = \
+r"""
+/// @verbatim embed:rst:leading-slashes
+///
+/// layout - [in]
+///     * Layout::ColMajor or Layout::RowMajor.
+///     * Matrix storage for :math:`\mat(A)` and :math:`\mat(C)`.
+///
+/// opA - [in]
+///     * If :math:`\opA` = NoTrans, then :math:`\op(\mat(A)) = \mat(A)`.
+///     * If :math:`\opA` = Trans, then :math:`\op(\mat(A)) = \mat(A)^T`.
+///
+/// opB - [in]
+///     * If :math:`\opB` = NoTrans, then :math:`\op(\mtxB) = \mtxB`.
+///     * If :math:`\opB` = Trans, then :math:`\op(\mtxB) = \mtxB^T`.
+///
+/// m - [in]
+///     * A nonnegative integer.
+///     * The number of rows in :math:`\mat(C)`.
+/// * The number of rows in :math:`\op(\mat(A))`. +/// +/// n - [in] +/// * A nonnegative integer. +/// * The number of columns in :math:`\mat(C)`. +/// * The number of columns in :math:`\op(\mtxB)`. +/// +/// k - [in] +/// * A nonnegative integer. +/// * The number of columns in :math:`\op(\mat(A))` +/// * The number of rows in :math:`\op(\mtxB)`. +/// +/// alpha - [in] +/// * A real scalar. +/// +/// A - [in] +/// * Pointer to a 1D array of real scalars. +/// +/// lda - [in] +/// * A nonnegative integer. +/// * Leading dimension of :math:`\mat(A)` when reading from :math:`A`. +/// +/// B - [in] +/// * A RandBLAS sparse matrix object. +/// +/// beta - [in] +/// * A real scalar. +/// * If zero, then :math:`C` need not be set on input. +/// +/// C - [in, out] +/// * Pointer to 1D array of real scalars. +/// * On entry, defines :math:`\mat(C)` +/// on the RIGHT-hand side of :math:`(\star)`. +/// * On exit, defines :math:`\mat(C)` +/// on the LEFT-hand side of :math:`(\star)`. +/// +/// ldc - [in] +/// * A nonnegative integer. +/// * Leading dimension of :math:`\mat(C)` when reading from :math:`C`. 
+/// +/// @endverbatim +""" + return params + + +if __name__ == '__main__': + out = transform_documentation(example_rstfullparams()) + print(out) + print() diff --git a/examples/sparse-low-rank-approx/qrcp_matrixmarket.cc b/examples/sparse-low-rank-approx/qrcp_matrixmarket.cc index 252a348f..15ef89e8 100644 --- a/examples/sparse-low-rank-approx/qrcp_matrixmarket.cc +++ b/examples/sparse-low-rank-approx/qrcp_matrixmarket.cc @@ -47,6 +47,7 @@ #include +using RandBLAS::sparse_data::reserve_coo; using RandBLAS::sparse_data::COOMatrix; using RandBLAS::sparse_data::CSCMatrix; using std_clock = std::chrono::high_resolution_clock; @@ -80,7 +81,7 @@ COOMatrix from_matrix_market(std::string fn) { ); COOMatrix out(n_rows, n_cols); - out.reserve(vals.size()); + reserve_coo(vals.size(),out); for (int i = 0; i < out.nnz; ++i) { out.rows[i] = rows[i]; out.cols[i] = cols[i]; @@ -132,7 +133,7 @@ int sketch_orthogonalize_rows(int64_t m, int64_t n, T* A, T* work, int64_t d, in randblas_require(d >= m); std::vector tau(d, 0.0); int64_t vec_nnz = std::min(d/2, (int64_t) 4); - RandBLAS::SparseDist D{n, d, vec_nnz}; + RandBLAS::SparseDist D(n, d, vec_nnz); RandBLAS::SparseSkOp S(D, state); // Simple option (shown here): // Sketch A in column-major format, then do LQ on the sketch. 
@@ -261,11 +262,11 @@ void power_iter_col_sketch(SpMat &A, int64_t k, T* Y, int64_t p_data_aware, STAT int64_t p_done = 0; if (p_data_aware % 2 == 0) { - RandBLAS::DenseDist D(k, m, RandBLAS::DenseDistName::Gaussian); + RandBLAS::DenseDist D(k, m, RandBLAS::ScalarDist::Gaussian); TIMED_LINE( RandBLAS::fill_dense(D, mat_work2, state), "sampling : ") } else { - RandBLAS::DenseDist D(k, n, RandBLAS::DenseDistName::Gaussian); + RandBLAS::DenseDist D(k, n, RandBLAS::ScalarDist::Gaussian); TIMED_LINE( RandBLAS::fill_dense(D, mat_work1, state), "sampling : ") TIMED_LINE( diff --git a/examples/sparse-low-rank-approx/svd_matrixmarket.cc b/examples/sparse-low-rank-approx/svd_matrixmarket.cc index e441a188..71eaea90 100644 --- a/examples/sparse-low-rank-approx/svd_matrixmarket.cc +++ b/examples/sparse-low-rank-approx/svd_matrixmarket.cc @@ -46,6 +46,7 @@ #include #include +using RandBLAS::sparse_data::reserve_coo; using RandBLAS::sparse_data::COOMatrix; using std_clock = std::chrono::high_resolution_clock; using timepoint_t = std::chrono::time_point; @@ -77,7 +78,7 @@ COOMatrix from_matrix_market(std::string fn) { ); COOMatrix out(n_rows, n_cols); - out.reserve(vals.size()); + reserve_coo(vals.size(), out); for (int i = 0; i < out.nnz; ++i) { out.rows[i] = rows[i]; out.cols[i] = cols[i]; diff --git a/examples/sparse-low-rank-approx/svd_rank1_plus_noise.cc b/examples/sparse-low-rank-approx/svd_rank1_plus_noise.cc index 4bbf7580..c0670679 100644 --- a/examples/sparse-low-rank-approx/svd_rank1_plus_noise.cc +++ b/examples/sparse-low-rank-approx/svd_rank1_plus_noise.cc @@ -44,6 +44,7 @@ #include #include +using RandBLAS::sparse_data::reserve_coo; using RandBLAS::sparse_data::COOMatrix; #define DOUT(_d) std::setprecision(std::numeric_limits::max_digits10) << _d @@ -73,11 +74,11 @@ void iid_sparsify_random_dense( int64_t n_rows, int64_t n_cols, int64_t stride_row, int64_t stride_col, T* mat, T prob_of_zero, RandBLAS::RNGState state ) { auto spar = new T[n_rows * n_cols]; - auto 
dist = RandBLAS::DenseDist(n_rows, n_cols, RandBLAS::DenseDistName::Uniform); + auto dist = RandBLAS::DenseDist(n_rows, n_cols, RandBLAS::ScalarDist::Uniform); auto next_state = RandBLAS::fill_dense(dist, spar, state); auto temp = new T[n_rows * n_cols]; - auto D_mat = RandBLAS::DenseDist(n_rows, n_cols, RandBLAS::DenseDistName::Uniform); + auto D_mat = RandBLAS::DenseDist(n_rows, n_cols, RandBLAS::ScalarDist::Uniform); RandBLAS::fill_dense(D_mat, temp, next_state); #define SPAR(_i, _j) spar[(_i) + (_j) * n_rows] @@ -131,7 +132,7 @@ SpMat sum_of_coo_matrices(SpMat &A, SpMat &B) { } SpMat C(A.n_rows, A.n_cols); - C.reserve(c_dict.size()); + reserve_coo(c_dict.size(), C); int64_t ell = 0; for (auto iter : c_dict) { Tuple t = iter.first; @@ -145,13 +146,9 @@ SpMat sum_of_coo_matrices(SpMat &A, SpMat &B) { } -template -void make_signal_matrix(double signal_scale, double* u, int64_t m, double* v, int64_t n, int64_t vec_nnz, double* signal_dense, SpMat &signal_sparse) { - using T = typename SpMat::scalar_t; - using sint_t = typename SpMat::index_t; - constexpr bool valid_type = std::is_same_v>; - randblas_require(valid_type); - signal_sparse.reserve(vec_nnz * vec_nnz); +template +void make_signal_matrix(double signal_scale, double* u, int64_t m, double* v, int64_t n, int64_t vec_nnz, double* signal_dense, COOMatrix &signal_sparse) { + reserve_coo(vec_nnz * vec_nnz, signal_sparse); // populate signal_dense and signal_sparse. 
 RandBLAS::RNGState u_state(0);
@@ -162,8 +159,8 @@ void make_signal_matrix(double signal_scale, double* u, int64_t m, double* v, in
     double uv_scale = 1.0 / std::sqrt((double) vec_nnz);
-    auto v_state = RandBLAS::repeated_fisher_yates(u_state, vec_nnz, m, 1, work_idxs, trash, work_vals);
-    auto next_state = RandBLAS::repeated_fisher_yates(v_state, vec_nnz, n, 1, work_idxs+vec_nnz, trash, work_vals+vec_nnz);
+    auto v_state = RandBLAS::sparse::repeated_fisher_yates(u_state, vec_nnz, m, 1, work_idxs, trash, work_vals);
+    auto next_state = RandBLAS::sparse::repeated_fisher_yates(v_state, vec_nnz, n, 1, work_idxs+vec_nnz, trash, work_vals+vec_nnz);
     for (int j = 0; j < vec_nnz; ++j) {
         for (int i = 0; i < vec_nnz; ++i) {
             int temp = i + j*vec_nnz;
diff --git a/examples/total-least-squares/tls_sparse_skop.cc b/examples/total-least-squares/tls_sparse_skop.cc
index 44c5c3e6..64aa44ae 100644
--- a/examples/total-least-squares/tls_sparse_skop.cc
+++ b/examples/total-least-squares/tls_sparse_skop.cc
@@ -139,12 +139,12 @@ int main(int argc, char* argv[]){
     // Sample the sketching operator
     auto time_constructsketch1 = high_resolution_clock::now();
-    RandBLAS::SparseDist Dist = {
-        .n_rows = sk_dim,   // Number of rows of the sketching operator
-        .n_cols = m,        // Number of columns of the sketching operator
-        .vec_nnz = 8,       // Number of non-zero entires per major-axis vector
-        .major_axis = RandBLAS::MajorAxis::Short  // A "SASO" (aka SJLT, aka OSNAP, aka generalized CountSketch)
-    };
+    RandBLAS::SparseDist Dist(
+        sk_dim,                 // Number of rows of the sketching operator
+        m,                      // Number of columns of the sketching operator
+        8,                      // Number of non-zero entries per column
+        RandBLAS::Axis::Short   // A "SASO" (aka SJLT, aka OSNAP, aka generalized CountSketch)
+    );
     uint32_t seed = 1997;
     RandBLAS::SparseSkOp S(Dist, seed);
     RandBLAS::fill_sparse(S);
diff --git a/rtd/DevNotes.md b/rtd/DevNotes.md
index 32854185..e706ef62 100644
--- a/rtd/DevNotes.md
+++ b/rtd/DevNotes.md
@@ -19,7 +19,7 @@ within the
``rtd`` folder. (The role of this folder is explained below.) ## 3. Project file and folder structure -We have one directory that contains all files needed for web documentation. We call this diretory "``rtd``", as an abbreviation for "read the docs". Here is the project structure relative to that directory. +We have one directory that contains all files needed for web documentation. We call this directory "``rtd``", as an abbreviation for "read the docs". Here is the project structure relative to that directory. ``` rtd/ ├── source/ @@ -43,9 +43,9 @@ The files ``conf.py``, ``Doxyfile``, ``theme.conf``, and ``theme_overrides.css`` The file ``mathmacros.py`` contains custom, nontrivial code. It's needed so that we can define LaTeX macros and use them in the C++ source code documentation. We clarify its role and explain how it's accessed below. -The file ``index.rst`` defines the landing page of the sphinx website. This file is also used together with ``rtd/source/`` to define *all* pages on the sphinx website. Because there's tons of information about general sphinx websites out there we won't go into much more detail about this file. +The file ``index.rst`` defines the landing page of the sphinx website. This file is also used together with ``rtd/source/`` to define *all* pages on the sphinx website. Because there's a ton of information about general sphinx websites out there we won't go into much more detail about this file. -## 4. Where did these files come from and what goes into them? +## 4. Where did these files come from, and what goes into them? ### 4.1. 
What goes into conf.py
diff --git a/rtd/source/Doxyfile b/rtd/source/Doxyfile
index 1a369dde..8a3a3e4e 100644
--- a/rtd/source/Doxyfile
+++ b/rtd/source/Doxyfile
@@ -877,7 +877,7 @@ WARN_LOGFILE           =
 INPUT                  = ../../README.md \
                          ../../ \
-                         ../../RandBLAS/ \
+                         ../../RandBLAS/*.hh \
                          ../../RandBLAS/sparse_data
 # This tag can be used to specify the character encoding of the source files
@@ -2200,7 +2200,7 @@ INCLUDE_FILE_PATTERNS  =
 # recursively expanded use the := operator instead of the = operator.
 # This tag requires that the tag ENABLE_PREPROCESSING is set to YES.
-PREDEFINED             = DOXYGEN=1
+PREDEFINED             = DOXYGEN=1 __cpp_concepts=202002L
 # If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then this
 # tag can be used to specify a list of macro names that should be expanded. The
diff --git a/rtd/source/FAQ.rst b/rtd/source/FAQ.rst
new file mode 100644
index 00000000..bb786959
--- /dev/null
+++ b/rtd/source/FAQ.rst
@@ -0,0 +1,181 @@
+FAQ and Limitations
+==============================
+
+
+
+How do I do this and that?
+--------------------------
+
+How do I sketch a const symmetric matrix that's only stored in an upper or lower triangle?
+    You can only do this with dense sketching operators.
+    You'll have to prepare the plain buffer representation yourself with
+    :cpp:any:`RandBLAS::fill_dense_unpacked`
+    and then you'll have to use that buffer in your own SYMM function.
+
+How do I sketch a submatrix of a sparse matrix?
+    You can only do this if the sparse matrix is in COO format.
+    Take a look at the ``lsksp3`` and ``rsksp3`` functions in the source code (they aren't documented on this website).
+
+
+Why did you ... ?
+-----------------
+
+Why the name RandBLAS?
+    RandBLAS derives its name from BLAS: the basic linear algebra subprograms. Its name evokes the *purpose* of BLAS rather
+    than the acronym.
Specifically, the purpose of RandBLAS is to provide high-performance and reliable functionality that
+    can be used to build sophisticated randomized algorithms for matrix computations.
+
+    The RandBLAS API is also as close as possible to the BLAS API while remaining polymorphic. Some may find this
+    decision dubious, since the BLAS API is notoriously cumbersome for new users. We believe that these downsides
+    are made up for by the flexibility and portability of a BLAS-like API. It is also our hope that popular high-level
+    libraries that already wrap BLAS might find it straightforward to define similar wrappers for RandBLAS.
+
+DenseDist and SparseDist are simple structs. Why bother having constructors for these classes, when you could just use initialization lists?
+    Both of these types only have four user-decidable parameters.
+    We tried to implement and document these structs only using four members each.
+    This was doable, but very cumbersome.
+    In order to write clearer documentation we introduced several additional members whose values are semantically meaningful
+    but ultimately dependent on the others.
+    Using constructors makes it possible for us to ensure all members are initialized consistently.
+
+Why don't you automatically scale your sketching operators to give partial isometries in expectation?
+    There are a few factors that led us to this decision. None of these factors is a huge deal alone, but they become significant when considered together.
+
+    1. Many randomized algorithms have a property where their output is invariant under rescaling of the sketching operators that they use internally.
+    2. Sketching operators are easier to describe if we think in terms of their "natural" scales before considering their use as tools for dimension reduction.
+       For example, the natural scale for DenseSkOps is to have entries that are sampled iid from a mean-zero and variance-one distribution.
The natural scale for SparseSkOps (with major_axis==Short) is for nonzero entries to have absolute value equal to one.
+       Describing these operators in their isometric scales would require that we specify dimensions, which dimension is larger than the other,
+       and possibly additional tuning parameters (like vec_nnz for SparseSkOps).
+    3. It's easier for us to describe how implicit concatenation of sketching operators works (see :ref:`this part ` of our tutorial)
+       if using the isometry scale is optional, and off by default.
+
+Why are all dimensions 64-bit integers?
+    RandNLA is interesting for large matrices. It would be too easy to have an index overflow if we allowed 32-bit indexing.
+    We do allow 32-bit indexing for buffers underlying sparse matrix datastructures, but we recommend sticking with 64-bit.
+
+I looked at the source code and found weird function names like "lskge3," "rskges," and "lsksp3." What's that about?
+    There are two reasons for these kinds of names.
+    First, having these names makes it easier to call RandBLAS from languages that don't support function overloading.
+    Second, these short and specific names make it possible to communicate efficiently and precisely (useful for test code and developer documentation).
+
+Why does sketch_symmetric not use a "side" argument, like symm in BLAS?
+    There are many BLAS functions that include a "side" argument. This argument always refers to the argument with the most notable structure.
+    In symm, the more structured argument is the symmetric matrix.
+    In RandBLAS, the more structured argument is always the sketching operator. Given this, we saw three options to move forward.
+
+    1. Keep "side," and have it refer to the position of the symmetric matrix. This is superficially similar to the underlying BLAS convention.
+    2. Keep "side," and have it refer to the position of the sketching operator. This is similar to BLAS at a deeper level, but people could
+       easily use it incorrectly.
+    3.
Dispense with "side" altogether.
+
+    We chose the third option since that's more in line with modern APIs for BLAS-like functionality (namely, std::linalg).
+
+
+Limitations
+-----------
+
+No complex domain support:
+    BLAS' support for this is incomplete. You can't mix real and complex, you can't conjugate without transposing, etc…
+    We plan on revisiting the question of complex data in RandBLAS a few years from now.
+
+No support for sparse-times-sparse (aka SpGEMM):
+    This will probably "always" be the case, since we think it's valuable to keep RandBLAS' scope limited.
+
+No support for subsampled randomized trig transforms (SRFT, SRHT, SRCT, etc...):
+    We'd happily accept a contribution of a randomized Hadamard transform (without subsampling)
+    that implicitly zero-pads inputs when needed. Given such a function we could figure out
+    how we'd like to build sketching operators on top of it.
+
+No support for DenseSkOps with Rademachers:
+    We'd probably need support for mixed-precision arithmetic to fully realize the advantage of
+    Rademacher over uniform [-1,1]. It's not clear to me how we'd go about doing that. There
+    *is* the possibility of generating Rademachers far faster than uniform [-1, 1]. The implementation
+    of this method might be a little complicated.
+
+No support for negative values of "incx" or "incy" in sketch_vector:
+    The BLAS function GEMV supports negative strides between input and output vector elements.
+    It would be easy to extend sketch_vector to support this if we had a proper
+    SPMV implementation that supported negative increments. If someone wants to volunteer
+    to extend our SPMV kernels to support that, then we'd happily accept such a contribution.
+    (It shouldn't be hard! We just haven't gotten around to this.)
+
+Symmetric matrices have to be stored as general matrices:
+    This stems partly from a desire to have sketch_symmetric work equally well with DenseSkOp and SparseSkOp.
+ Another reason is that BLAS' SYMM function doesn't allow transposes, which is a key tool we use + in sketch_general to resolve layout discrepancies between the various arguments. + + +Language interoperability +------------------------- + +C++ idioms and features we do use +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Things that affect our API: + * Templates. We template for floating point precision just about everywhere. + We also template for stateful random number generators (see :cpp:any:`RandBLAS::RNGState`) + and arrays of 32-bit versus 64-bit signed integers. + * Standard constructors. We use these for any nontrivial struct type in RandBLAS. They're important + because many of our datatypes have const members that need to be initialized as functions (albeit + simple functions) of other members. + * Move constructors. We use these to return nontrivial datastructures from a few undocumented functions. + We mostly added them because we figured users would really want them. + * C++20 Concepts. These make our assumptions around template parameters more explicit. + In the cases of :ref:`SketchingDistribution ` and + :ref:`SketchingOperator ` this is also a way + for us to declare a common interface for future functionality. + * Default values for trailing function arguments. + +Things that are purely internal: + * C++17 ``if constexpr`` branching. + * Structured bindings. + + +C++ idioms and features we don't use +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * The span or mdspan datastructures. + * Inheritance. + * Private or protected members of structs. + * Shared pointers. + * Instance methods for structs (with the exception of constructors and destructors). + + +Naming conventions to resolve function overloading +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We routinely use function overloading, and that reduces portability across languages. +See below for details on where we stand and where we plan to go to resolve this shortcoming. 
+ +We have a consistent naming convention for functions that involve sketching operators: + * [L/R] are prefixes used when we need to consider left- and right-multiplication. + * The characters "sk" appearing at the start of a name or after [L/R] indicate that a function involves taking a product with a sketching operator. + * Two characters are used to indicate the structure of the data in the sketching operation. + The options for the characters are {ge, sy, ve, sp}, which stand for general, *explicitly* symmetric, vector, and sparse (respectively). + * A single-character [X] suffix is used to indicate the structure of the sketching operator. The characters are "3" (for dense sketching + operators, which would traditionally be applied with a BLAS 3 function) and "s" (for sparse sketching operators). + +Functions that implement the overload-free conventions + * [L/R]skge[X] for sketching a general matrix from the left (L) or right (R) with a matrix whose structure is indicated by [X]. + C++ code should prefer overloaded sketch_general. + * [L/R]sksp3 for sketching a sparse matrix from the left (L) or right (R) with a DenseSkOp. + C++ code should prefer overloaded sketch_sparse, unless operating on a submatrix of a COO-format sparse data matrix is needed. + +Functions that are missing implementations of this convention + * [L/R]skve[X] for sketching vectors. This functionality is available in C++ with sketch_vector. + * [L/R]sksy[X] for sketching a matrix with *explicit symmetry*. This functionality is available in C++ with sketch_symmetric. + +Some discussion + + Our templating for numerical precision should be resolved by prepending "d" for double precision or "s" for single precision. + + RandBLAS requires a consistent naming convention across an API that supports multiple structured operands (e.g., sketching sparse data), + while conventions in the BLAS API only need to work when one operand is structured. 
+ This is why our consistent naming convention might appear "less BLAS-like" than it could be. + + All of these overload-free function names have explicit row and column offset parameters to handle submatrices of linear operators. + However, the overloaded versions of these functions have *additional* overloads based on setting the offset parameters to zero. + +We have no plans for consistent naming of overload-free sparse BLAS functions. The most we do in this regard is offer functions +called [left/right]_spmm for SpMM where the sparse matrix operand appears on the left or on the right. + diff --git a/rtd/source/api_reference/index.rst b/rtd/source/api_reference/index.rst index f0a3da6d..ce4b563c 100644 --- a/rtd/source/api_reference/index.rst +++ b/rtd/source/api_reference/index.rst @@ -1,5 +1,3 @@ -.. :sd_hide_title: -.. ^ uncomment that if you want to prevent the header from rendering. ############# API Reference @@ -9,12 +7,9 @@ API Reference .. TODO. Explain that we always say data matrices are :math:`m \times n`. Sketching involves matrices (S, A, B). .. toctree:: - :maxdepth: 4 - - Distributions, random states, and sketching operators - Computing a sketch: dense data - Representing sparse data - Computing a sketch: sparse data - Sparse BLAS operations - Utilities for index sampling + :maxdepth: 2 + Fundamentals + Working with dense data + Working with sparse data + Utilities diff --git a/rtd/source/api_reference/index_sampling_utils.rst b/rtd/source/api_reference/index_sampling_utils.rst deleted file mode 100644 index d690ac86..00000000 --- a/rtd/source/api_reference/index_sampling_utils.rst +++ /dev/null @@ -1,13 +0,0 @@ - -############################################################ -Utilities for coordinate and index-set sampling -############################################################ - - .. doxygenfunction:: RandBLAS::util::sample_indices_iid_uniform(int64_t n, int64_t k, int64_t* samples, RNGState state) - :project: RandBLAS - - .. 
doxygenfunction:: RandBLAS::util::sample_indices_iid(int64_t n, TF* cdf, int64_t k, int64_t* samples, RNGState state) - :project: RandBLAS - - .. doxygenfunction:: RandBLAS::util::weights_to_cdf(int64_t n, T* w, T error_if_below = -std::numeric_limits::epsilon()) - :project: RandBLAS diff --git a/rtd/source/api_reference/other_sparse.rst b/rtd/source/api_reference/other_sparse.rst deleted file mode 100644 index 40dc5a34..00000000 --- a/rtd/source/api_reference/other_sparse.rst +++ /dev/null @@ -1,21 +0,0 @@ - .. |op| mathmacro:: \operatorname{op} - .. |mat| mathmacro:: \operatorname{mat} - .. |submat| mathmacro:: \operatorname{submat} - .. |lda| mathmacro:: \texttt{lda} - .. |ldb| mathmacro:: \texttt{ldb} - .. |ldc| mathmacro:: \texttt{ldc} - .. |opA| mathmacro:: \texttt{opA} - .. |opB| mathmacro:: \texttt{opB} - -############################################################ -Other sparse matrix operations -############################################################ - - -.. doxygenfunction:: RandBLAS::spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int64_t n, int64_t k, T alpha, SpMat &A, int64_t ro_a, int64_t co_a, const T *B, int64_t ldb, T beta, T *C, int64_t ldc) - :project: RandBLAS - - -.. doxygenfunction:: RandBLAS::spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int64_t n, int64_t k, T alpha, const T* A, int64_t lda, SpMat &B, int64_t ro_b, int64_t co_b, T beta, T *C, int64_t ldc) - :project: RandBLAS - diff --git a/rtd/source/api_reference/sketch_dense.rst b/rtd/source/api_reference/sketch_dense.rst index f8763266..7e1b9c3f 100644 --- a/rtd/source/api_reference/sketch_dense.rst +++ b/rtd/source/api_reference/sketch_dense.rst @@ -1,42 +1,59 @@ - - .. |op| mathmacro:: \operatorname{op} - .. |mat| mathmacro:: \operatorname{mat} - .. |submat| mathmacro:: \operatorname{submat} - .. |lda| mathmacro:: \texttt{lda} - .. |ldb| mathmacro:: \texttt{ldb} - .. |opA| mathmacro:: \texttt{opA} - .. |opS| mathmacro:: \texttt{opS} + .. 
|op| mathmacro:: \operatorname{op} + .. |mat| mathmacro:: \operatorname{mat} + .. |submat| mathmacro:: \operatorname{submat} + .. |lda| mathmacro:: \texttt{lda} + .. |ldb| mathmacro:: \texttt{ldb} + .. |ldc| mathmacro:: \texttt{ldc} + .. |opA| mathmacro:: \texttt{opA} + .. |opB| mathmacro:: \texttt{opB} + .. |opS| mathmacro:: \texttt{opS} + .. |mtxA| mathmacro:: \mathbf{A} + .. |mtxB| mathmacro:: \mathbf{B} + .. |mtxC| mathmacro:: \mathbf{C} + .. |mtxS| mathmacro:: \mathbf{S} + .. |mtxX| mathmacro:: \mathbf{X} + .. |mtxx| mathmacro:: \mathbf{x} + .. |mtxy| mathmacro:: \mathbf{y} + .. |ttt| mathmacro:: \texttt ****************************************** -Computing a sketch: dense data +Working with dense data in RandBLAS ****************************************** +.. TODO: add a few words about the data model. + + +Sketching dense matrices and vectors +==================================== + RandBLAS has adaptions of GEMM, GEMV, and SYMM when one of their matrix operands is a sketching operator. These adaptations are provided through overloaded functions named sketch_general, sketch_vector, and sketch_symmetric. Out of the functions presented here, only sketch_general has low-level implementations; -sketch_vector and sketch_symmetric are basic wrappers around sketch_general, and are provided to make +sketch_vector and sketch_symmetric are basic wrappers around sketch_general, and are provided to make implementations less error-prone when porting code that currently uses BLAS or a BLAS-like interface. + + Analogs to GEMM -=============== +--------------- -.. dropdown:: :math:`B = \alpha \cdot \op(S)\cdot \op(A) + \beta \cdot B` +.. dropdown:: :math:`\mtxB = \alpha \cdot \op(\mtxS)\cdot \op(\mtxA) + \beta \cdot \mtxB` :animate: fade-in-slide-down :color: light .. 
doxygenfunction:: RandBLAS::sketch_general(blas::Layout layout, blas::Op opS, blas::Op opA, int64_t d, int64_t n, int64_t m, T alpha, SKOP &S, const T *A, int64_t lda, T beta, T *B, int64_t ldb) :project: RandBLAS -.. dropdown:: :math:`B = \alpha \cdot \op(A)\cdot \op(S) + \beta \cdot B` +.. dropdown:: :math:`\mtxB = \alpha \cdot \op(\mtxA)\cdot \op(\mtxS) + \beta \cdot \mtxB` :animate: fade-in-slide-down :color: light .. doxygenfunction:: RandBLAS::sketch_general(blas::Layout layout, blas::Op opA, blas::Op opS, int64_t m, int64_t d, int64_t n, T alpha, const T *A, int64_t lda, SKOP &S, T beta, T *B, int64_t ldb) :project: RandBLAS -.. dropdown:: Variants using :math:`\op(\submat(S))` +.. dropdown:: Variants using :math:`\op(\submat(\mtxS))` :animate: fade-in-slide-down :color: light @@ -48,16 +65,16 @@ Analogs to GEMM Analogs to SYMM -=============== +--------------- -.. dropdown:: :math:`B = \alpha \cdot S \cdot A + \beta \cdot B` +.. dropdown:: :math:`\mtxB = \alpha \cdot \mtxS \cdot \mtxA + \beta \cdot \mtxB` :animate: fade-in-slide-down :color: light .. doxygenfunction:: RandBLAS::sketch_symmetric(blas::Layout layout, T alpha, SKOP &S, const T *A, int64_t lda, T beta, T *B, int64_t ldb, T sym_check_tol = 0) :project: RandBLAS -.. dropdown:: :math:`B = \alpha \cdot A \cdot S + \beta \cdot B` +.. dropdown:: :math:`\mtxB = \alpha \cdot \mtxA \cdot \mtxS + \beta \cdot \mtxB` :animate: fade-in-slide-down :color: light @@ -65,7 +82,7 @@ Analogs to SYMM :project: RandBLAS -.. dropdown:: Variants using :math:`\submat(S)` +.. dropdown:: Variants using :math:`\submat(\mtxS)` :animate: fade-in-slide-down :color: light @@ -78,19 +95,31 @@ Analogs to SYMM Analogs to GEMV -=============== +--------------- -.. dropdown:: :math:`y = \alpha \cdot \op(S) \cdot x + \beta \cdot y` +.. dropdown:: :math:`\mtxy = \alpha \cdot \op(\mtxS) \cdot \mtxx + \beta \cdot \mtxy` :animate: fade-in-slide-down :color: light .. 
doxygenfunction:: sketch_vector(blas::Op opS, T alpha, SKOP &S, const T *x, int64_t incx, T beta, T *y, int64_t incy) :project: RandBLAS -.. dropdown:: Variants using :math:`\op(\submat(S))` +.. dropdown:: Variants using :math:`\op(\submat(\mtxS))` :animate: fade-in-slide-down :color: light .. doxygenfunction:: sketch_vector(blas::Op opS, int64_t d, int64_t m, T alpha, SKOP &S, int64_t ro_s, int64_t co_s, const T *x, int64_t incx, T beta, T *y, int64_t incy) :project: RandBLAS + +Matrix format utility functions +=============================== + +.. doxygenfunction:: RandBLAS::symmetrize(blas::Layout layout, blas::Uplo uplo, int64_t n, T* A, int64_t lda) + :project: RandBLAS + +.. doxygenfunction:: RandBLAS::transpose_square(T* A, int64_t n, int64_t lda) + :project: RandBLAS + +.. doxygenfunction:: RandBLAS::overwrite_triangle(blas::Layout layout, blas::Uplo to_overwrite, int64_t n, int64_t strict_offset, T* A, int64_t lda) + :project: RandBLAS diff --git a/rtd/source/api_reference/sketch_sparse.rst b/rtd/source/api_reference/sketch_sparse.rst index 7d9fa1c7..0d805681 100644 --- a/rtd/source/api_reference/sketch_sparse.rst +++ b/rtd/source/api_reference/sketch_sparse.rst @@ -1,20 +1,108 @@ - .. |op| mathmacro:: \operatorname{op} .. |mat| mathmacro:: \operatorname{mat} .. |submat| mathmacro:: \operatorname{submat} + .. |lda| mathmacro:: \texttt{lda} .. |ldb| mathmacro:: \texttt{ldb} + .. |ldc| mathmacro:: \texttt{ldc} .. |opA| mathmacro:: \texttt{opA} + .. |opB| mathmacro:: \texttt{opB} .. |opS| mathmacro:: \texttt{opS} + .. |mtxA| mathmacro:: \mathbf{A} + .. |mtxB| mathmacro:: \mathbf{B} + .. |mtxC| mathmacro:: \mathbf{C} + .. |mtxS| mathmacro:: \mathbf{S} + .. |mtxX| mathmacro:: \mathbf{X} + .. |ttt| mathmacro:: \texttt + +************************************ +Working with sparse data in RandBLAS +************************************ + +Sparse matrix data structures +============================== + + +.. 
dropdown:: The common interface for our sparse matrix types + :animate: fade-in-slide-down + :color: light + + .. doxygenenum:: RandBLAS::sparse_data::IndexBase + :project: RandBLAS + + .. doxygenconcept:: RandBLAS::sparse_data::SparseMatrix + :project: RandBLAS + +.. dropdown:: COOMatrix + :animate: fade-in-slide-down + :color: light + + .. doxygenstruct:: RandBLAS::sparse_data::COOMatrix + :project: RandBLAS + :members: + + .. doxygenfunction:: RandBLAS::sparse_data::reserve_coo + :project: RandBLAS + + .. doxygenenum:: RandBLAS::sparse_data::NonzeroSort + :project: RandBLAS + +.. dropdown:: CSRMatrix + :animate: fade-in-slide-down + :color: light + + .. doxygenstruct:: RandBLAS::sparse_data::CSRMatrix + :project: RandBLAS + :members: + + .. doxygenfunction:: RandBLAS::sparse_data::reserve_csr + :project: RandBLAS + +.. dropdown:: CSCMatrix + :animate: fade-in-slide-down + :color: light + + .. doxygenstruct:: RandBLAS::sparse_data::CSCMatrix + :project: RandBLAS + :members: + + .. doxygenfunction:: RandBLAS::sparse_data::reserve_csc + :project: RandBLAS + + +Operations with sparse matrices +=============================== + +Sketching +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. dropdown:: :math:`\mtxB = \alpha \cdot \op(\submat(\mtxS))\cdot \op(\mtxA) + \beta \cdot \mtxB` + :animate: fade-in-slide-down + :color: light + + .. doxygenfunction:: RandBLAS::sketch_sparse(blas::Layout layout, blas::Op opS, blas::Op opA, int64_t d, int64_t n, int64_t m, T alpha, DenseSkOp &S, int64_t S_ro, int64_t S_co, SpMat &A, T beta, T *B, int64_t ldb) + :project: RandBLAS + +.. dropdown:: :math:`\mtxB = \alpha \cdot \op(\mtxA)\cdot \op(\submat(\mtxS)) + \beta \cdot \mtxB` + :animate: fade-in-slide-down + :color: light + + .. 
doxygenfunction:: RandBLAS::sketch_sparse(blas::Layout layout, blas::Op opA, blas::Op opS, int64_t m, int64_t d, int64_t n, T alpha, SpMat &A, DenseSkOp &S, int64_t S_ro, int64_t S_co, T beta, T *B, int64_t ldb) + :project: RandBLAS -******************************** -Computing a sketch : sparse data -******************************** +Deterministic operations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. doxygenfunction:: RandBLAS::sketch_sparse(blas::Layout layout, blas::Op opS, blas::Op opA, int64_t d, int64_t n, int64_t m, T alpha, DenseSkOp &S, int64_t S_ro, int64_t S_co, SpMat &A, int64_t A_ro, int64_t A_co, T beta, T *B, int64_t ldb) - :project: RandBLAS +.. dropdown:: :math:`\mtxC = \alpha \cdot \op(\mtxA)\cdot \op(\mtxB) + \beta \cdot \mtxC,` with sparse :math:`\mtxA` + :animate: fade-in-slide-down + :color: light + .. doxygenfunction:: RandBLAS::spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int64_t n, int64_t k, T alpha, SpMat &A, const T *B, int64_t ldb, T beta, T *C, int64_t ldc) + :project: RandBLAS -.. doxygenfunction:: RandBLAS::sketch_sparse(blas::Layout layout, blas::Op opA, blas::Op opS, int64_t m, int64_t d, int64_t n, T alpha, SpMat &A, int64_t A_ro, int64_t A_co, DenseSkOp &S, int64_t S_ro, int64_t S_co, T beta, T *B, int64_t ldb) - :project: RandBLAS +.. dropdown:: :math:`\mtxC = \alpha \cdot \op(\mtxA)\cdot \op(\mtxB) + \beta \cdot \mtxC,` with sparse :math:`\mtxB` + :animate: fade-in-slide-down + :color: light + .. 
doxygenfunction:: RandBLAS::spmm(blas::Layout layout, blas::Op opA, blas::Op opB, int64_t m, int64_t n, int64_t k, T alpha, const T* A, int64_t lda, SpMat &B, T beta, T *C, int64_t ldc) + :project: RandBLAS diff --git a/rtd/source/api_reference/skops_and_dists.rst b/rtd/source/api_reference/skops_and_dists.rst index 910f7805..053cc364 100644 --- a/rtd/source/api_reference/skops_and_dists.rst +++ b/rtd/source/api_reference/skops_and_dists.rst @@ -1,81 +1,136 @@ -*************************************************** -Distributions and sketching operators -*************************************************** + .. |op| mathmacro:: \operatorname{op} + .. |mat| mathmacro:: \operatorname{mat} + .. |submat| mathmacro:: \operatorname{submat} + .. |D| mathmacro:: \mathcal{D} + .. |lda| mathmacro:: \texttt{lda} + .. |ldb| mathmacro:: \texttt{ldb} + .. |ldc| mathmacro:: \texttt{ldc} + .. |opA| mathmacro:: \texttt{opA} + .. |opB| mathmacro:: \texttt{opB} + .. |opS| mathmacro:: \texttt{opS} + .. |mtxA| mathmacro:: \mathbf{A} + .. |mtxB| mathmacro:: \mathbf{B} + .. |mtxC| mathmacro:: \mathbf{C} + .. |mtxS| mathmacro:: \mathbf{S} + .. |mtxX| mathmacro:: \mathbf{X} + .. |mtxx| mathmacro:: \mathbf{x} + .. |mtxy| mathmacro:: \mathbf{y} + .. |vecnnz| mathmacro:: \texttt{vec_nnz} + .. |ttt| mathmacro:: \texttt + +******************************************************************** +Fundamentals +******************************************************************** + + .. very similar effects can be achieved in C, Fortran, or Julia. While there are + .. certainly some popular programming languages that don't support this kind of API + .. (e.g., MATLAB, Python, and R), accessing RandBLAS from these languages should + .. be mediated by operator-overloaded objects in a way that's analogous to how one + .. would access BLAS. + +RandBLAS has a polymorphic free-function API. 
We have spent a significant amount of +effort on minimizing the number of RandBLAS–specific datastructures needed in order +to achieve that polymorphism. + +We have a bunch of functions that aren't documented on this website. +If such a function looks useful, you should feel free to use it. If you +end up doing that, and you care about your code's compatibility with future +versions of RandBLAS, then please let us know by filing a quick GitHub issue. + +Preliminaries +============= + +.. dropdown:: The Axis enum + :animate: fade-in-slide-down + :color: light + + .. doxygenenum:: RandBLAS::Axis + :project: RandBLAS -.. _rngstate_api: +.. dropdown:: RNGState + :animate: fade-in-slide-down + :color: light + .. doxygenstruct:: RandBLAS::RNGState + :project: RandBLAS + :members: -Sketching distributions -======================= -.. doxygenconcept:: RandBLAS::SketchingDistribution - :project: RandBLAS +.. _densedist_and_denseskop_api: -.. doxygenfunction:: RandBLAS::isometry_scale_factor(SkDist D) - :project: RandBLAS +Dense sketching, with Gaussians *et al.* +======================================== +.. dropdown:: DenseDist : a distribution over matrices with i.i.d., mean-zero, variance-one entries + :animate: fade-in-slide-down + :color: light -Sketching operators and random states -===================================== + .. doxygenenum:: RandBLAS::ScalarDist + :project: RandBLAS -.. dropdown:: Sketching operators - :animate: fade-in-slide-down - :color: light - - .. doxygenconcept:: RandBLAS::SketchingOperator - :project: RandBLAS + .. doxygenstruct:: RandBLAS::DenseDist + :project: RandBLAS + :members: +.. dropdown:: DenseSkOp : a sample from a DenseDist + :animate: fade-in-slide-down + :color: light -.. dropdown:: The state of a random number generator - :animate: fade-in-slide-down - :color: light + .. doxygenstruct:: RandBLAS::DenseSkOp + :project: RandBLAS + :members: - .. doxygenstruct:: RandBLAS::RNGState + .. 
doxygenfunction:: RandBLAS::fill_dense(DenseSkOp &S) :project: RandBLAS - :members: + .. doxygenfunction:: RandBLAS::fill_dense_unpacked(blas::Layout layout, const DenseDist &D, int64_t n_rows, int64_t n_cols, int64_t S_ro, int64_t S_co, T *buff, const RNGState &seed) + :project: RandBLAS -.. _densedist_and_denseskop_api: +.. _sparsedist_and_sparseskop_api: -DenseDist and DenseSkOp -============================================ +Sparse sketching, with CountSketch *et al.* +=========================================== -.. doxygenenum:: RandBLAS::DenseDistName - :project: RandBLAS +.. dropdown:: SparseDist : a distribution over structured sparse matrices + :animate: fade-in-slide-down + :color: light -.. doxygenstruct:: RandBLAS::DenseDist - :project: RandBLAS - :members: + .. doxygenstruct:: RandBLAS::SparseDist + :project: RandBLAS + :members: -.. doxygenstruct:: RandBLAS::DenseSkOp - :project: RandBLAS - :members: +.. dropdown:: SparseSkOp : a sample from a SparseDist + :animate: fade-in-slide-down + :color: light -.. doxygenfunction:: RandBLAS::fill_dense(DenseSkOp &S) - :project: RandBLAS + .. doxygenstruct:: RandBLAS::SparseSkOp + :project: RandBLAS + :members: + .. doxygenfunction:: RandBLAS::fill_sparse(SparseSkOp &S) + :project: RandBLAS -.. _sparsedist_and_sparseskop_api: + .. doxygenfunction:: RandBLAS::fill_sparse_unpacked_nosub(const SparseDist &D, int64_t &nnz, T* vals, sint_t* rows, sint_t* cols, const state_t &seed_state) + :project: RandBLAS -SparseDist and SparseSkOp -============================== -.. doxygenstruct:: RandBLAS::SparseDist - :project: RandBLAS - :members: -.. doxygenstruct:: RandBLAS::SparseSkOp - :project: RandBLAS - :members: +The unifying (C++20) concepts +============================= -.. doxygenfunction:: RandBLAS::fill_sparse(SparseSkOp &S) - :project: RandBLAS +.. dropdown:: SketchingDistribution + :animate: fade-in-slide-down + :color: light + .. 
doxygenconcept:: RandBLAS::SketchingDistribution + :project: RandBLAS -Advanced material -================= -.. doxygenfunction:: RandBLAS::fill_dense(blas::Layout layout, const DenseDist &D, int64_t n_rows, int64_t n_cols, int64_t S_ro, int64_t S_co, T *buff, const RNGState &seed) - :project: RandBLAS +.. dropdown:: SketchingOperator + :animate: fade-in-slide-down + :color: light + + .. doxygenconcept:: RandBLAS::SketchingOperator + :project: RandBLAS diff --git a/rtd/source/api_reference/sparse_matrices.rst b/rtd/source/api_reference/sparse_matrices.rst deleted file mode 100644 index 79e939ae..00000000 --- a/rtd/source/api_reference/sparse_matrices.rst +++ /dev/null @@ -1,32 +0,0 @@ - - .. |op| mathmacro:: \operatorname{op} - .. |mat| mathmacro:: \operatorname{mat} - .. |submat| mathmacro:: \operatorname{submat} - .. |ldb| mathmacro:: \texttt{ldb} - .. |opA| mathmacro:: \texttt{opA} - .. |opS| mathmacro:: \texttt{opS} - .. |ttt| mathmacro:: \texttt - -******************************** -Representing sparse matrices -******************************** - - -.. doxygenconcept:: RandBLAS::sparse_data::SparseMatrix - :project: RandBLAS - - -Built-in sparse matrix classes -============================== - -.. doxygenstruct:: RandBLAS::sparse_data::COOMatrix - :project: RandBLAS - :members: - -.. doxygenstruct:: RandBLAS::sparse_data::CSRMatrix - :project: RandBLAS - :members: - -.. doxygenstruct:: RandBLAS::sparse_data::CSCMatrix - :project: RandBLAS - :members: \ No newline at end of file diff --git a/rtd/source/api_reference/utilities.rst b/rtd/source/api_reference/utilities.rst new file mode 100644 index 00000000..5436338d --- /dev/null +++ b/rtd/source/api_reference/utilities.rst @@ -0,0 +1,36 @@ + .. |op| mathmacro:: \operatorname{op} + .. |mat| mathmacro:: \operatorname{mat} + .. |lda| mathmacro:: \texttt{lda} + .. |mtxA| mathmacro:: \mathbf{A} + .. 
|ttt| mathmacro:: \texttt + + + +############################################################ +Utilities +############################################################ + +Random sampling from index sets +=============================== + +.. doxygenfunction:: RandBLAS::weights_to_cdf(int64_t n, T* w, T error_if_below = -std::numeric_limits::epsilon()) + :project: RandBLAS + +.. doxygenfunction:: RandBLAS::sample_indices_iid(int64_t n, const T* cdf, int64_t k, sint_t* samples, const state_t &state) + :project: RandBLAS + +.. doxygenfunction:: RandBLAS::sample_indices_iid_uniform(int64_t n, int64_t k, sint_t* samples, const state_t &state) + :project: RandBLAS + +.. doxygenfunction:: RandBLAS::repeated_fisher_yates(int64_t k, int64_t n, int64_t r, sint_t *samples, const state_t &state) + :project: RandBLAS + +Debugging +========= + +.. doxygenfunction:: RandBLAS::print_colmaj + :project: RandBLAS + +.. doxygenfunction:: RandBLAS::typeinfo_as_string() + :project: RandBLAS + diff --git a/rtd/source/assets/sparse_vs_dense_diagram_no_header.html b/rtd/source/assets/sparse_vs_dense_diagram_no_header.html deleted file mode 100644 index 1001d73d..00000000 --- a/rtd/source/assets/sparse_vs_dense_diagram_no_header.html +++ /dev/null @@ -1,2 +0,0 @@ -
- diff --git a/rtd/source/conf.py b/rtd/source/conf.py index 4b35eeb1..d9af1ff7 100644 --- a/rtd/source/conf.py +++ b/rtd/source/conf.py @@ -89,7 +89,54 @@ # 'theme_overrides.css' # overrides for wide tables in RTD theme # ] +# Add custom JavaScript file +html_js_files = [ + 'custom.js', +] + # numfig = True +# cpp_maximum_signature_line_length = 120 math_numfig = True math_eqref_format = "Eq. {number}" # use a non-breaking-space unicode character. -numfig_secnum_depth = 1 \ No newline at end of file +numfig_secnum_depth = 1 + +""" +I'm struggling with MathJax styling. + +I'd like to change the default global scaling of all math expressions to 0.9. + +Here's what I've got at time of giving up: + + I'm able to get the scaling of all display environments by setting custom CSS, + but it isn't working for in-line text. + + I wasn't able to get consistent behavior by setting options like in the + dictionaries below, but that might have been because of changes not propagating + to the same browser session even if CTRL+Shift+R has been invoked. + + I think the CSS files only get updated when we build from scratch. + +Things I could try in the future: + + Use custom Javascript to select the 0.9x scaling as though I were doing that + via the GUI. 
+ +""" + +# mathjax_options = {'scale': '0.5'} +# mathjax3_config = { +# 'chtml': { +# 'scale': '0.5', #// global scaling factor for all expressions +# 'minScale': '.5', #// smallest scaling factor to use +# # ' mtextInheritFont': 'false', #// true to make mtext elements use surrounding font +# # 'merrorInheritFont': 'true', #// true to make merror text use surrounding font +# # 'mathmlSpacing': 'false', #// true for MathML spacing rules, false for TeX rules +# # 'skipAttributes': r'{}', # // RFDa and other attributes NOT to copy to the output +# # 'exFactor': '.5', #// default size of ex in em units +# 'displayAlign': 'center', #// default for indentalign when set to 'auto' +# # 'displayIndent': '0', # // default for indentshift when set to 'auto' +# # 'matchFontHeight': 'false', #// true to match ex-height of surrounding font +# # 'fontURL': '[mathjax]/components/output/chtml/fonts/woff-v2', #// The URL where the fonts are found +# # 'adaptiveCSS': 'true' #// true means only produce CSS that is used in the processed equations +# } +# }; diff --git a/rtd/source/index.rst b/rtd/source/index.rst index 0efd91d9..69fa6fa4 100644 --- a/rtd/source/index.rst +++ b/rtd/source/index.rst @@ -6,6 +6,7 @@ Tutorial API Reference Changelog + FAQ and Limitations .. default-domain:: cpp @@ -17,7 +18,7 @@ RandBLAS is a C++ library for randomized linear dimension reduction -- an operat We built RandBLAS to make it easier to write, debug, and deploy high-performance implementations of sketching-based algorithms. RandBLAS is efficient, flexible, and reliable. -It uses CPU-based OpenMP acceleration to apply its sketching operators to dense or sparse data matrices stored in main memory. +It uses CPU-based OpenMP acceleration to apply its sketching operators to matrices stored in main memory. 
It includes dense and sparse sketching operators (e.g., Gaussian operators, CountSketch, OSNAPs, etc.), which can be applied to dense or sparse data in any combination that leads to a dense sketch. diff --git a/rtd/source/installation/index.rst b/rtd/source/installation/index.rst index 736d3111..a4fdef8a 100644 --- a/rtd/source/installation/index.rst +++ b/rtd/source/installation/index.rst @@ -15,8 +15,6 @@ RandBLAS is most useful when called from programs that can access LAPACK, or an equivalent library for dense matrix computations. However, we don't require that such a library is available. -RandBLAS uses `C++20 concepts `_. -Make sure your compiler supports these! CMake users ----------- @@ -35,6 +33,11 @@ Check out our `examples `_ to implement high-level randomized algorithms. +.. warning:: + + Make sure to use the flag ``-Dblas_int=int64`` in the CMake configuration line for BLAS++. + If you don't do that then you might get int32, which can lead to issues for large matrices. + Everyone else ------------- Strictly speaking, we only need three things to use RandBLAS in other projects. @@ -44,7 +47,7 @@ Strictly speaking, we only need three things to use RandBLAS in other projects. 2. The locations of Random123 header files. 3. The locations of the header files and compiled binary for BLAS++ (which will - referred to as blaspp when installed on your system). + be referred to as "blaspp" when installed on your system). If you have these things at hand, then compiling a RandBLAS-dependent program is just a matter of specifying standard compiler flags. diff --git a/rtd/source/tutorial/_incomplete_sketching.rst b/rtd/source/tutorial/_incomplete_sketching.rst deleted file mode 100644 index 79bafdcb..00000000 --- a/rtd/source/tutorial/_incomplete_sketching.rst +++ /dev/null @@ -1,19 +0,0 @@ -:orphan: - -.. ********************************************** -.. Computing a sketch of a data matrix -.. ********************************************** - -.. 
RandBLAS has two main functions for sketching: - -.. * :math:`\texttt{sketch_general}`, which is used for dense data matrices, and -.. * :math:`\texttt{sketch_sparse}`, which is used for sparse data matrices. - -.. These functions are overloaded and templated to allow for different numerical -.. precisions and different types of sketching operators. It's possible to apply -.. dense or sparse sketching operators to dense matrices, and to apply dense sketching -.. operators to sparse matrices. The common thread in both -.. cases is that the final sketch is always dense. - -.. From a mathematical perspective, :math:`\texttt{sketch_general}` and :math:`\texttt{sketch_sparse}` -.. have the same capabilities as GEMM. \ No newline at end of file diff --git a/rtd/source/tutorial/distributions.rst b/rtd/source/tutorial/distributions.rst index 4ec7cd44..1511039d 100644 --- a/rtd/source/tutorial/distributions.rst +++ b/rtd/source/tutorial/distributions.rst @@ -1,4 +1,17 @@ -.. :sd_hide_title: + .. |op| mathmacro:: \operatorname{op} + .. |mat| mathmacro:: \operatorname{mat} + .. |submat| mathmacro:: \operatorname{submat} + .. |D| mathmacro:: \mathcal{D} + .. |mtxA| mathmacro:: \mathbf{A} + .. |mtxB| mathmacro:: \mathbf{B} + .. |mtxS| mathmacro:: \mathbf{S} + .. |mtxx| mathmacro:: \mathbf{x} + .. |ttt| mathmacro:: \texttt + .. |vecnnz| mathmacro:: \texttt{vec_nnz} + .. |majoraxis| mathmacro:: \texttt{major_axis} + .. |nrows| mathmacro:: \texttt{n_rows} + .. |ncols| mathmacro:: \texttt{n_cols} + .. |vals| mathmacro:: \texttt{vals} .. toctree:: :maxdepth: 3 @@ -12,15 +25,15 @@ RandBLAS' sketching operators can be divided into two categories. *Dense* sketching operators have entries that are sampled iid from a mean-zero distribution over the reals. Distributions over these operators are represented with the - :ref:`DenseDist ` class. + :cpp:struct:`RandBLAS::DenseDist` type. *Sparse* sketching operators have random (but highly structured) sparsity patterns. 
Their nonzero entries are sampled iid and uniformly from :math:`\{-1,1\}.` Distributions over these operators are represented with the - :ref:`SparseDist ` class. + :cpp:struct:`RandBLAS::SparseDist` type. -The first order of business in correctly using RandBLAS is to decide which type of sketching -operator is appropriate in your situation. +The first order of business in implementing sketching-based algorithms is to decide which type of +distribution is appropriate in your situation. From there, you need to instantiate a specific distribution by setting some parameters. This part of the tutorial gives tips on both of these points. @@ -28,14 +41,18 @@ This part of the tutorial gives tips on both of these points. How to choose between dense and sparse sketching ===================================================================== -Let's say you have an :math:`m \times n` matrix :math:`A` and an integer :math:`d,` -and you want to compute a sketch of :math:`A` that has rank :math:`\min\{d, \operatorname{rank}(A)\}.` -Here's a chart to help decide whether to use a dense or a sparse sketching operator. +Let's say you have an :math:`m \times n` matrix :math:`\mtxA` and an integer :math:`d,` +and you want to compute a sketch of :math:`\mtxA` that has rank :math:`\min\{d, \operatorname{rank}(\mtxA)\}.` +Here's a two-question heuristic for choosing the best sketching operator RandBLAS can offer. - .. raw:: html - :file: ../assets/sparse_vs_dense_diagram_no_header.html + Q1. Is :math:`\mtxA` sparse? + If so, use a dense operator. -Discussion of the chart's first yes/no branch. + Q2. Supposing that :math:`\mtxA` is dense -- can you afford :math:`\Theta(dmn)` flops to compute the sketch? + If so, use a dense operator. If not, use a sparse operator. + + +Discussion of Q1. RandBLAS doesn't allow applying sparse sketching operators to sparse data. This is because RandBLAS is only intended to produce sketches that are dense. 
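The discussion of Q1 notes that RandBLAS only produces dense sketches, even from sparse data. A standalone toy kernel makes it clear why applying a dense operator to sparse (COO-format) data costs time proportional to nnz(A) while still yielding a dense sketch. This is an illustration written against plain buffers under our own naming; it is not RandBLAS's sketching API.

```cpp
#include <cassert>
#include <vector>

// One COO-format nonzero of a sparse m-by-n data matrix A.
struct Triple { int i; int j; double v; };

// Compute the dense sketch B = S * A, where S is a dense d-by-m operator
// stored row-major. The loop touches each nonzero of A exactly d times,
// so the cost is O(d * nnz(A)), and B is dense regardless of A's sparsity.
std::vector<double> sketch_sparse_data(const std::vector<double>& S, int d, int m,
                                       const std::vector<Triple>& A, int n) {
    std::vector<double> B(d * n, 0.0);
    for (const Triple& t : A)
        for (int r = 0; r < d; ++r)  // entry A(i, j) scales column i of S into column j of B
            B[r * n + t.j] += S[r * m + t.i] * t.v;
    return B;
}
```

For instance, with a 1-by-2 operator S = [1, 2] and a sparse 2-by-2 matrix holding 5 at (0,0) and 8 at (1,1), the sketch is the dense row [5, 16].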
@@ -45,45 +62,152 @@ Discussion of the chart's first yes/no branch. algorithms that would benefit from this functionality. -Discussion of the chart's second yes/no branch. +Discussion of Q2. This gets at whether adding :math:`O(dmn)` flops to a randomized algorithm can decisively impact that algorithm's performance. Some randomized algorithms for dense matrix computations make it easy to answer this question. Consider, for example ... *Subspace iteration methods for low-rank approximation.* These methods have complexity :math:`\Omega(dmn)` - regardless of whether the complexity of computing the initial sketch is :math:`o(dmn)`. + regardless of whether the complexity of computing the initial sketch is :math:`o(dmn).` - *Sketch-and-precondition methods for least squares.* These methods need to set :math:`d \geq \min\{m,n\}`. + *Sketch-and-precondition methods for least squares.* These methods need to set :math:`d \geq \min\{m,n\}.` As a result, they can't tolerate :math:`O(dmn)` operations for sketching while still providing asymptotically faster runtime than a direct least squares solver. - With this in mind, notice that the chart indicates a preference dense sketching over - sparse sketching when dense sketching can be afforded. - This preference stems from how if the sketching dimension is fixed, then the statistical properties of dense sketching - operators will generally be preferable to those of sparse - sketching operators. +With this in mind, note how our heuristic indicates a preference for dense sketching over +sparse sketching when dense sketching can be afforded. +This preference stems from the fact that, when the sketching dimension is fixed, the statistical properties of dense sketching +operators are generally preferable to those of sparse +sketching operators. - .. note:: - See Wikipedia for the meanings of - `big-omega notation `_ and - `little-o notation `_. +.. note:: + See Wikipedia for the meanings of + `big-omega notation `_ and + `little-o notation `_. 
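To put rough numbers on the Q2 tradeoff, here is a back-of-the-envelope flop model in standalone C++. The function names and the 2x-multiply-add convention are ours, chosen for illustration; this is not RandBLAS API.

```cpp
#include <cassert>
#include <cstdint>

// Approximate flop counts for a d-by-n sketch B = S * A, where A is m-by-n.
// A dense d-by-m operator costs about 2*d*m*n flops.
int64_t dense_sketch_flops(int64_t d, int64_t m, int64_t n) {
    return 2 * d * m * n;
}

// A short-axis-major sparse operator with vec_nnz nonzeros per major-axis
// vector has about vec_nnz*m nonzeros total, costing about 2*vec_nnz*m*n flops.
int64_t sparse_sketch_flops(int64_t vec_nnz, int64_t m, int64_t n) {
    return 2 * vec_nnz * m * n;
}
```

With d = 500, m = 10000, n = 2000, and vec_nnz = 4, the dense operator needs about 2e10 flops while the sparse operator needs about 1.6e8, a factor of d/vec_nnz = 125 difference.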
Distribution parameters ======================= -This part of the web docs is coming soon! - - - -The semantics of MajorAxis -========================== - -Sketching operators in RandBLAS have a "MajorAxis" member. -The semantics of this member can be complicated. -We only expect advanced users to benefit from chosing this member -differently from the defaults we set. - -A proper explanation of MajorAxis' semantics is coming soon! -Bear with us until then. +Here are example constructor invocations for DenseDist and SparseDist if we include all optional arguments. + +.. code:: c++ + + int64_t d = 500; int64_t m = 10000; // We don't require that d < m. This is just for concreteness. + + ScalarDist family = ScalarDist::Gaussian; + DenseDist D_dense( d, m, family, Axis::Long); + + int64_t vec_nnz = 4; + SparseDist D_sparse(d, m, vec_nnz, Axis::Short); + +The first two arguments have *identical* meanings for DenseDist and +SparseDist; they give the number of rows and columns in the operator. +The meanings of the remaining arguments differ +*significantly* depending on which case you're in. + + +DenseDist: family and major_axis +-------------------------------- +A DenseDist represents a distribution over matrices with fixed dimensions, where +the entries are i.i.d. mean-zero variance-one random variables. +Its trailing constructor arguments are called :math:`\ttt{family}` and :math:`\majoraxis.` + +The family argument indicates whether the entries follow the standard normal distribution +(:math:`\ttt{ScalarDist::Gaussian}`) or the uniform distribution over :math:`[-\sqrt{3},\sqrt{3}]` +(:math:`\ttt{ScalarDist::Uniform}`). +These "Gaussian operators" and "uniform operators" (as we'll call them) are *very similar* in theory +and *extremely similar* in practice. 
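The scaling of the uniform family is what makes the two families interchangeable in practice: a uniform variable on :math:`[-\sqrt{3},\sqrt{3}]` has mean zero and variance :math:`(2\sqrt{3})^2/12 = 1,` matching the standard normal. A quick standalone numerical check of this fact (``std::mt19937`` stands in for the counter-based generators RandBLAS actually uses; this snippet is illustrative, not RandBLAS code):

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Estimate E[x^2] for x uniform on [-sqrt(3), sqrt(3)]. Since the mean is
// zero by symmetry, this estimates the variance; the exact value is 1.
double sample_second_moment(int n, unsigned seed) {
    std::mt19937 gen(seed);
    const double r = std::sqrt(3.0);
    std::uniform_real_distribution<double> unif(-r, r);
    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        double x = unif(gen);
        acc += x * x;
    }
    return acc / n;
}
```

With a couple hundred thousand samples the estimate lands well within a few percent of 1.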
+The DenseDist constructor uses Gaussian by default since this is only marginally more expensive +than uniform in most cases, and Gaussian operators are far more common in theoretical analysis +of randomized algorithms. + +Then there's :math:`\majoraxis.` From a statistical perspective there is absolutely +no difference between :math:`\ttt{major_axis = Short}` or :math:`\ttt{major_axis = Long}.` +That said, there are +narrow circumstances where one of these might be preferred in practice. We'll explain with an example. + +.. code:: c++ + + // Assume previous code defined integers (d1, d2, n) where 0 < d1 < d2 < n, + // and "family" variable equal to ScalarDist::Gaussian or ScalarDist::Uniform, + // and a "state" variable of type RNGState. + DenseDist D1(n, d1, family, Axis::Long); + DenseDist D2(n, d2, family, Axis::Long); + DenseSkOp S1(D1, state); + DenseSkOp S2(D2, state); + // If S1 and S2 are represented explicitly as dense matrices, then S1 is the + // n-by-d1 submatrix of S2 obtained by selecting its first d1 columns. + +In this example, long-axis-major DenseDists provide a reproducible stream of random column vectors +for tall sketching operators. If the row and column dimensions were swapped, then we'd have a mechanism +for reproducibly sampling from streams of random row vectors for wide sketching operators. +See :ref:`this page ` of the tutorial for more information on the role of +:math:`\majoraxis` for dense sketching operators. + +.. _sparsedist_params: + +SparseDist: vec_nnz and major_axis +---------------------------------- + +A SparseDist represents a distribution over sparse matrices with fixed dimensions, where +either the rows or the columns are sampled independently from a certain distribution over +sparse vectors. 
+The distribution is determined by the trailing constructor arguments: :math:`\vecnnz` and :math:`\majoraxis.` + +Let :math:`k = \ttt{dim_major}.` +If :math:`\majoraxis = \ttt{Short},` this is :math:`\min\{\nrows,\ncols\}.` +If :math:`\majoraxis = \ttt{Long},` this is :math:`\max\{\nrows,\ncols\}.` +The major-axis vectors of a SparseSkOp follow a distribution :math:`\mathcal{V}` over :math:`\mathbb{R}^k.` +The number of nonzeros in each major-axis vector is at most :math:`\vecnnz,` a parameter satisfying :math:`1 \leq \vecnnz \leq k.` + +All else equal, larger values of :math:`\vecnnz` result in distributions +that are "better" at preserving Euclidean geometry when sketching. +The value of :math:`\vecnnz` that suffices for a given context will +also depend on the sketch size, :math:`d := \min\{\nrows,\ncols\}.` +Larger sketch sizes make it possible to "get away with" smaller values of +:math:`\vecnnz.` + +SparseDist when :math:`\majoraxis = \ttt{Short}.` + + A sample from :math:`\mathcal{V}` has exactly :math:`\vecnnz` nonzeros. + The locations of those nonzeros are chosen uniformly + without replacement from :math:`\{0,\ldots,k-1\}.` The values of the nonzeros are + sampled independently and uniformly from :math:`\pm 1.` + + Many sketching distributions from the literature fall into this category. + :math:`\vecnnz = 1` corresponds to the distribution over CountSketch operators. + :math:`\vecnnz > 1` corresponds to distributions which have been studied under + many names, including OSNAPs, SJLTs, and hashing embeddings. + + The community has come to a consensus that very small values of :math:`\vecnnz` can suffice for good performance. 
+ For example, suppose we seek a constant-distortion embedding + of an unknown subspace of dimension :math:`n,` where :math:`1{,}000 \leq n \leq 10{,}000.` + If :math:`d = 2n`, then many practitioners + would restrict their attention to :math:`\vecnnz \leq 8.` + There are no special performance benefits in RandBLAS to setting :math:`\vecnnz = 1.` + Additionally, using :math:`\vecnnz > 1` makes it far more likely for a sketch to retain + useful geometric information from the data. + Therefore, we recommend using :math:`\vecnnz \geq 2` in practice. + +SparseDist when :math:`\majoraxis = \ttt{Long}.` + + A sample :math:`\mtxx` from :math:`\mathcal{V}` has *at most* :math:`\vecnnz` nonzero + entries. The locations of the nonzeros are determined by sampling uniformly + with replacement from :math:`\{0,\ldots,k-1\}.` + If index :math:`j` occurs in the sample :math:`\ell` times, then + :math:`\mtxx_j` will equal :math:`\sqrt{\ell}` with probability 1/2 and + :math:`-\sqrt{\ell}` with probability 1/2. + + In the literature, + :math:`\vecnnz = 1` corresponds to operators for sampling uniformly with replacement + from the rows or columns of a data matrix (although the signs on the rows or + columns may be flipped). Taking :math:`\vecnnz > 1` gives a special case of LESS-uniform + distributions, where the underlying scalar sub-gaussian distribution is the uniform + distribution over :math:`\pm 1.` + + It is important to use (much) larger values of :math:`\vecnnz` here compared to the + short-axis-major case, at least for the same sketch size :math:`d.` + There is less consensus in the community on what constitutes "big enough in practice," + so we make no prescriptions on this front. diff --git a/rtd/source/tutorial/gemm.rst b/rtd/source/tutorial/gemm.rst index 20e4ffc1..b0389870 100644 --- a/rtd/source/tutorial/gemm.rst +++ b/rtd/source/tutorial/gemm.rst @@ -12,6 +12,7 @@ .. |roa| mathmacro:: \texttt{ro_a} .. |coa| mathmacro:: \texttt{co_a} .. 
|mtx| mathmacro:: \mathbf + .. |ttt| mathmacro:: \texttt .. _gemm_tutorial: @@ -39,7 +40,7 @@ information in the form of flags. The flag for :math:`\mtx{A}` is traditionally called :math:`\text{“}\opA\text{”}` and is interpreted as .. math:: - \op(\mtx{A}) = \begin{cases} \mtx{A} & \text{ if } \opA \texttt{ == NoTrans} \\ \mtx{A}^T & \text{ if } \opA \texttt{ == Trans} \end{cases}. + \op(\mtx{A}) = \begin{cases} \mtx{A} & \text{ if } \opA \ttt{ == NoTrans} \\ \mtx{A}^T & \text{ if } \opA \ttt{ == Trans} \end{cases}. The flag for :math:`\mtx{B}` is traditionally named :math:`\text{“}\opB\text{”}` and is interpreted similarly. @@ -82,16 +83,16 @@ The semantics of :math:`\mat` can be understood by focusing on :math:`\mtx{A} = First, there is the matter of the dimensions. These are inferred from :math:`(m, k)` and from :math:`\opA` in the way indicated by :eq:`eq_realisticgemm`. -* If :math:`\opA \texttt{ == NoTrans}`, then :math:`\mtx{A}` is :math:`m \times k`. -* If :math:`\opA \texttt{ == Trans }`, then :math:`\mtx{A}` is :math:`k \times m`. +* If :math:`\opA \ttt{ == NoTrans}`, then :math:`\mtx{A}` is :math:`m \times k`. +* If :math:`\opA \ttt{ == Trans }`, then :math:`\mtx{A}` is :math:`k \times m`. Moving forward let us say that :math:`\mtx{A}` is :math:`r \times c`. The actual contents of :math:`\mtx{A}` are determined by the pointer, :math:`A\text{,}` an explicitly declared stride parameter, :math:`\lda\text{,}` -and a layout parameter, :math:`\texttt{ell}\text{,}` according to the rule +and a layout parameter, :math:`\ttt{ell}\text{,}` according to the rule .. 
math:: - \mtx{A}_{i,j} = \begin{cases} A[\,i + j \cdot \lda\,] & \text{ if } \texttt{ell == ColMajor} \\ A[\,i \cdot \lda + j\,] & \text{ if } \texttt{ell == RowMajor} \end{cases} + \mtx{A}_{i,j} = \begin{cases} A[\,i + j \cdot \lda\,] & \text{ if } \ttt{ell == ColMajor} \\ A[\,i \cdot \lda + j\,] & \text{ if } \ttt{ell == RowMajor} \end{cases} where we zero-index :math:`\mtx{A}` for consistency with indexing into buffers in C/C++. @@ -99,7 +100,7 @@ Only the leading :math:`r \times c` submatrix of :math:`\mat(A)` will be accesse Note that in order for this submatrix to be well-defined it's necessary that .. math:: - \lda \geq \begin{cases} r & \text{ if } \texttt{ell == ColMajor} \\ c & \text{ if } \texttt{ell == RowMajor} \end{cases}. + \lda \geq \begin{cases} r & \text{ if } \ttt{ell == ColMajor} \\ c & \text{ if } \ttt{ell == RowMajor} \end{cases}. Most performance libraries check that this is the case on entry to GEMM and will raise an error if this condition isn't satisfied. diff --git a/rtd/source/tutorial/index.rst b/rtd/source/tutorial/index.rst index aa0662fb..dd939695 100644 --- a/rtd/source/tutorial/index.rst +++ b/rtd/source/tutorial/index.rst @@ -11,36 +11,39 @@ Once a sketching operator is sampled it is applied to a user provided *data matr Abstractly, a sketch is supposed to summarize some geometric information that underlies its data matrix. The RandNLA literature documents a huge array of possibilities for how to compute and process sketches to obtain various desired outcomes. -It also documents sketching operators of many different "flavors;" some are sparse matrices, some are subsampled FFT-like operations, and others still are dense matrices. +It also documents sketching operators of different "flavors;" some are sparse matrices, some are subsampled FFT-like operations, and others still are dense matrices. +.. note:: + + If we call something an "operator," we mean it's a *sketching operator* unless otherwise stated. 
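The GEMM indexing rule from the previous section, :math:`\mtx{A}_{i,j} = A[i + j \cdot \lda]` for column-major layout and :math:`A[i \cdot \lda + j]` for row-major layout, can be captured in a one-line helper. The following is an illustrative standalone snippet, not a RandBLAS function:

```cpp
#include <cassert>

// Return entry (i, j) of a matrix viewed in buffer A with stride lda.
// Column-major: entry (i, j) lives at A[i + j*lda].
// Row-major:    entry (i, j) lives at A[i*lda + j].
double entry(const double* A, int i, int j, int lda, bool col_major) {
    return col_major ? A[i + j * lda] : A[i * lda + j];
}
```

For the buffer {1, 2, 3, 4} viewed as a 2-by-2 matrix with lda = 2, entry (0, 1) is 3 under column-major interpretation but 2 under row-major interpretation.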
RandBLAS, at a glance It's useful to think of RandBLAS' sketching workflow in three steps. 1. Get your hands on a random state. - 2. Define a sketching distribution, and use the random state to sample a sketching operator from that distribution. - 3. Apply the sketching operator with a function that's *almost* identical to GEMM. + 2. Define a sketching distribution, and use the random state to sample an operator from that distribution. + 3. Apply the operator with a function that's *almost* identical to GEMM. To illustrate this workflow, suppose we have a 20,000-by-10,000 double-precision matrix :math:`A` stored in column-major layout. Suppose also that we want to compute a sketch of the form :math:`B = AS`, where :math:`S` is a Gaussian matrix of size 10,000-by-50. This can be done as follows. - .. code:: c++ - - // step 1 - RandBLAS::RNGState state(); - // step 2 - RandBLAS::DenseDist D(10000, 50); - RandBLAS::DenseSkOp S(D, state); - // step 3 - double B* = new double[20000 * 50]; - RandBLAS::sketch_general( - blas::Layout::ColMajor, blas::Op::NoTrans, blas::Op::NoTrans, - 20000, 50, 10000, - 1.0, A, 20000, S, 0.0, B, 20000 - ); // B = AS - -RandBLAS has a wealth of capabilities that are not reflected in that code sippet. + .. code:: c++ + + // step 1 + RandBLAS::RNGState state; + // step 2 + RandBLAS::DenseDist D(10000, 50); + RandBLAS::DenseSkOp S(D, state); + // step 3 + double *B = new double[20000 * 50]; + RandBLAS::sketch_general( + blas::Layout::ColMajor, blas::Op::NoTrans, blas::Op::NoTrans, + 20000, 50, 10000, + 1.0, A, 20000, S, 0.0, B, 20000 + ); // B = AS + +RandBLAS has a wealth of capabilities that are not reflected in that code snippet. For example, it lets you set an integer-valued seed when defining :math:`\texttt{state}`, and it provides a wide range of both dense and sparse sketching operators. It even lets you compute products against *submatrices* of sketching operators without ever forming the full operator in memory. 
@@ -51,6 +54,6 @@ It even lets you compute products against *submatrices* of sketching operators w Background on GEMM Defining a sketching distribution Sampling a sketching operator - Updating a sketch + Updating a sketch The meaning of "submat(・)" in RandBLAS documentation - + Memory management diff --git a/rtd/source/tutorial/memory.rst b/rtd/source/tutorial/memory.rst new file mode 100644 index 00000000..73b74174 --- /dev/null +++ b/rtd/source/tutorial/memory.rst @@ -0,0 +1,77 @@ +.. _memory_tutorial: + +Memory management +================= + +Decades ago, the designers of the classic BLAS made the wise decision to not internally allocate dynamically-sized +arrays. +Such an approach was (and is) viable because BLAS only operates on very simple datatypes: scalars and references +to dense arrays of scalars. + +RandBLAS, by contrast, needs to provide a polymorphic API with far more sophisticated datatypes. +This has led us to adopt a policy where we can internally allocate and deallocate dynamically-sized arrays +with the ``new []`` and ``delete []`` keywords, subject to the restrictions below. +Users are not bound to follow these rules, but deviations from them should be made with care. + +Allocation and writing to reference members +------------------------------------------- + +1. We allocate memory with ``new []`` only when necessary or explicitly requested. + + If a region of memory is allocated, it must either be deallocated before the function returns + or attached to a RandBLAS-defined object that persists beyond the function's scope. + +2. We can only attach memory to objects by overwriting a null-valued reference member, + and only when the object has an ``own_memory`` member that evaluates to true. + +3. We cannot overwrite an object's reference member if there is a chance that doing so may cause a memory leak. + + This restriction is in place regardless of whether ``obj.own_memory`` is true. 
+ This leaves very few cases in which RandBLAS is allowed to overwrite a non-null reference member. + +Deallocation +------------ + +1. We deallocate memory only in destructors. + + In particular, we never "reallocate" memory. If reallocation is needed, the user must manage the deletion of old memory + and then put the object in a state where RandBLAS can write to it. + +2. A destructor attempts deallocation only if ``own_memory`` is true. + + The destructor calls ``delete []`` on a specific reference member if and only if that member is non-null. + +What we do instead of overwriting non-null references +----------------------------------------------------- + +Let ``obj`` denote an instance of a RandBLAS-defined type where ``obj.member`` is a reference. +Suppose we find ourselves in a situation where ``obj.member`` is *non-null*, +but we're at a point in RandBLAS' code that would have written to ``obj.member`` if it were null. +There are two possibilities for what happens next. + +1. If the documentation for ``obj.member`` states an array length requirement purely in terms of ``const`` members, + then we silently skip memory allocations that would overwrite ``obj.member``. We'll simply + assume that ``obj.member`` has the correct size. + +2. Under any other circumstances, RandBLAS raises an error. + +In essence, the first situation has enough structure that the user could plausibly understand RandBLAS' behavior, +while the latter situation is too error-prone for RandBLAS to play a role in it. + + +Discussion +---------- + +Clarifications: + * No RandBLAS datatypes use ``new`` or ``new []`` in their constructors. + Reference members of such datatypes are either null-initialized or initialized at user-provided values. + * Move-constructors are allowed to overwrite an object's reference members with ``nullptr`` if those references have been copied to + a newly-created object. 
+ * Users retain the ability to overwrite any RandBLAS object's ``own_memory`` member and its reference members at any time; + no such members are declared as const. + +We're not totally satisfied with this document writ large. +It would probably be better if we removed the commentary from the enumerations above and added lots of examples that refer to actual RandBLAS code. +Alas, this will have to do for now. +Questions about what specific parts of this policy mean or proposed revisions are welcome! +Please get in touch with us on GitHub. diff --git a/rtd/source/tutorial/sampling_skops.rst b/rtd/source/tutorial/sampling_skops.rst index ee549bdb..1b21a64d 100644 --- a/rtd/source/tutorial/sampling_skops.rst +++ b/rtd/source/tutorial/sampling_skops.rst @@ -1,7 +1,6 @@ .. toctree:: :maxdepth: 3 -.. Note to self: I can first describe CBRNGs mathematically. Then I get to implementation details. ****************************************************************************** Sampling a sketching operator ****************************************************************************** @@ -16,7 +15,7 @@ Sequential calls to the CBRNG with a fixed key should use different values for t RandBLAS doesn't expose CBRNGs directly. Instead, it exposes an abstraction of -a CBRNG's state as defined in the :ref:`RNGState ` class. +a CBRNG's state as defined in the :cpp:struct:`RandBLAS::RNGState` type. RNGState objects are needed to construct sketching operators. .. _constructing_rng_states_tut: diff --git a/rtd/source/tutorial/sketch_updates.rst b/rtd/source/tutorial/sketch_updates.rst new file mode 100644 index 00000000..84b9765b --- /dev/null +++ b/rtd/source/tutorial/sketch_updates.rst @@ -0,0 +1,278 @@ + + + .. |seedstate| mathmacro:: \texttt{seed_state} + .. |nextstate| mathmacro:: \texttt{next_state} + .. |majoraxis| mathmacro:: \texttt{major_axis} + .. |ttt| mathmacro:: \texttt + .. |D| mathmacro:: \mathcal{D} + .. |mS| mathmacro:: {S} + .. |mA| mathmacro:: {A} + .. |mB| mathmacro:: {B} + .. |mX| mathmacro:: {X} + .. |mY| mathmacro:: {Y} + .. 
|R| mathmacro:: \mathbb{R} + .. |rank| mathmacro:: \operatorname{rank} + +.. _sketch_updates: + + +********************************************************************************************* +Updating and downdating sketches +********************************************************************************************* + +This page presents four ways of updating a sketch. +We use MATLAB notation for in-line concatenation of matrices. + + +Increasing sketch size +----------------------- + +The scenarios here use sketching operators :math:`S_1` and :math:`S_2` +that are sampled independently from distributions :math:`\D_1` and :math:`\D_2.` +We denote the isometry scales of :math:`\D_1` and :math:`\D_2` by :math:`\alpha_1` and :math:`\alpha_2,` respectively. + +Increasing the size of a sketch is glorified concatenation. +The only subtlety is how to perform the update in a way that preserves isometric scaling +(which can be useful in contexts like norm estimation). + +Our main purpose in explaining these updates is to highlight the effects of +setting a distribution's :math:`\majoraxis` to Long. +If you're working with DenseDists and DenseSkOps (where there's no statistical difference +between short-axis-major and long-axis-major), then this choice of major axis (which is +the default) can provide an additional measure of reproducibility in experiments that require +tuning sketch size. + + +Sketching from the left +~~~~~~~~~~~~~~~~~~~~~~~ +Here the :math:`\D_i` are distributions over wide matrices, and we set :math:`\mB_i = \mS_i \mA.` +To increase sketch size is to combine these individual sketches via concatenation: + +.. math:: + \mB = \begin{bmatrix} \mB_1 \\ \mB_2 \end{bmatrix} = \begin{bmatrix} \mS_1 \mA \\ \mS_2 \mA \end{bmatrix}. + +It only makes sense to do this if :math:`\mB` ends up having fewer rows than :math:`\mA.` +Put another way, this type of update only makes sense if the block operator +defined by :math:`\mS = [\mS_1;~\mS_2]` is also wide. 
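The identity behind this update, that stacking the sketches equals sketching with the stacked operator, is ordinary block matrix multiplication. A self-contained numeric check with plain row-major buffers (no RandBLAS types; the helper is ours, for illustration):

```cpp
#include <cassert>
#include <vector>

// Multiply a d-by-m matrix S by an m-by-n matrix A, both row-major.
std::vector<double> matmul(const std::vector<double>& S, int d,
                           const std::vector<double>& A, int m, int n) {
    std::vector<double> B(d * n, 0.0);
    for (int i = 0; i < d; ++i)
        for (int l = 0; l < m; ++l)
            for (int j = 0; j < n; ++j)
                B[i * n + j] += S[i * m + l] * A[l * n + j];
    return B;
}
// Stacking row-major matrices vertically is just buffer concatenation, so
// [S1; S2] * A is the concatenation of S1*A and S2*A in the same layout.
```

For example, with S1 = [1, 2], S2 = [3, 4], and A = [[5, 6], [7, 8]], the stacked operator [S1; S2] applied to A reproduces the rows S1*A = [19, 22] and S2*A = [43, 50].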
+ +It is important to be aware of the basic statistical properties of this block operator, +so we'll give its distribution the name :math:`\D.` +The isometry scale of :math:`\D` is :math:`\alpha = (\alpha_1^{-2} + \alpha_2^{-2})^{-1/2}.` +If :math:`\mB_1` was computed with isometric scaling (that is, if :math:`\mB_1 = \alpha_1 \mS_1 \mA`), +then the isometrically-scaled +updated sketch would be :math:`\mB = \alpha [ \mB_1/\alpha_1;~ \mB_2].` + +RandBLAS can explicitly represent :math:`\D` under certain conditions, which we express with a code snippet. + +.. code:: c++ + + // SkDist is either DenseDist or SparseDist. + // (d1, d2, and m) are positive integers where d1 + d2 < m. + // arg3 is any variable allowed in the third argument of the SkDist constructor. + SkDist D1( d1, m, arg3, Axis::Long); + SkDist D2( d2, m, arg3, Axis::Long); + SkDist D(d1 + d2, m, arg3, Axis::Long); + + +Furthermore, if :math:`\mS_1.\nextstate` is the seed state for :math:`\mS_2`, then +the resulting block operator :math:`\mS = [\mS_1;~\mS_2]` equals the operator obtained by sampling from +:math:`\D` with :math:`\mS_1.\seedstate.` + +This presents another option for how sketch updates might be performed. +Rather than working with two sketching operators explicitly, +one can work with a single operator :math:`\mS` sampled from the larger +distribution :math:`\D,` and compute :math:`\mB_1` and :math:`\mB_2` by working with +appropriate submatrices of :math:`\mS.` + +Sketching from the right +~~~~~~~~~~~~~~~~~~~~~~~~ +Here the :math:`\D_i` are distributions over tall matrices, and :math:`\mB_i = \mA \mS_i.` +The combined sketch is + +.. 
math:: + + \mB = \begin{bmatrix} \mB_1 & \mB_2 \end{bmatrix} = \begin{bmatrix} \mA \mS_1 & \mA \mS_2 \end{bmatrix}, + +and it can be obtained by right-multiplying :math:`\mA` with the block operator +:math:`\mS = [\mS_1,~\mS_2].` + +The isometry scale of :math:`\mS`'s distribution is the same as before: :math:`\alpha = (\alpha_1^{-2} + \alpha_2^{-2})^{-1/2},` +and RandBLAS can explicitly represent this distribution under the following conditions +(:math:`\ttt{SkDist}` and :math:`\ttt{arg3}` are as before). + +.. code:: c++ + + // (d1, d2, n) are positive integers where d1 + d2 < n. + SkDist D1(n, d1, arg3, Axis::Long); + SkDist D2(n, d2, arg3, Axis::Long); + SkDist D(n, d1 + d2, arg3, Axis::Long); + +If :math:`\mS_1` is sampled from :math:`\D_1` with seed state :math:`r` and +:math:`\mS_2` is sampled from :math:`\D_2` with seed state equal to :math:`\mS_1.\nextstate,` +then the block operator :math:`\mS` is the same as the matrix sampled from +:math:`\D` with seed state :math:`r.` +As with sketching from the left, this shows there are situations where +it can suffice to define a single operator and sketch with appropriate submatrices +of that operator. + + +Rank-:math:`k` updates +---------------------- + +A *rank-k update* is a multiply-accumulate operation involving matrices. +It involves a pair of matrices :math:`\mX` and :math:`\mY` that have k columns and k rows, respectively. +It also involves a real scalar :math:`\alpha` and a matrix :math:`\mB` of the same shape as :math:`\mX \mY.` +The operation itself is + +.. math:: + \mB \leftarrow \mB + \alpha \mX \mY. + +Here we describe some rank-k updates that arise in sketching algorithms. + +This framework can be used to describe incorporating new data into a sketch, +or removing the contributions of old data from a sketch. +We've focused our documentation efforts on the cases that add data. 
+More specifically, we focus on when we're performing a rank-k update to add new +data into an existing sketch, but k was not known when the original sketch was +formed. +This case has more complications than if k was known in advance, but it can still be handled +with RandBLAS when using distributions with :math:`\majoraxis = \ttt{Short}.` + +.. note:: + Future *updates* (pun intended) to these web docs will explain how the major-axis requirement + can be dropped if an upper bound on k is known in advance. That really just amounts to explaining + in detail how you operate with submatrices in RandBLAS. Incidentally, operating with submatrices + is really all you need to perform rank-k updates that "remove" data. + + +Adding data: left-sketching +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Problem statement + We start with an :math:`m \times n` data matrix :math:`\mA_1.` + When presented with this matrix, we sample a wide operator :math:`\mS_1` from a distribution :math:`\D_1` + (defined as follows) + + .. code:: c++ + + // Assumptions + // * SkDist is DenseDist or SparseDist + // * arg3 is any variable that makes sense for the SkDist constructor. + // * This example requires d < m. + SkDist D1(d, m, arg3, Axis::Short); + + and we compute a sketch :math:`\mB = \mS_1 \mA_1.` + + Sometime later, a :math:`k \times n` matrix :math:`\mA_2` arrives. + We want to independently sample a :math:`d \times k` operator :math:`\mS_2` from *some* :math:`\D_2` and perform a rank-k update :math:`\mB \leftarrow \mB + \mS_2 \mA_2.` + In essence, we want to redefine + + .. 
math:: + + \mB = \begin{bmatrix} \mS_1 & \mS_2 \end{bmatrix} \begin{bmatrix} \mA_1 \\ \mA_2 \end{bmatrix} + + without having to revisit :math:`\mA_1.` + +Conceptual solution + Since :math:`\majoraxis` is Short and :math:`d < m,` the columns of :math:`d \times m` matrices sampled from :math:`\D_1` + will be sampled independently from a shared distribution on :math:`\mathbb{R}^d.` + This suggests we could define :math:`\D_2` as a distribution over :math:`d \times k` matrices whose columns follow the same + distribution used in :math:`\D_1.` + +Implementation + If :math:`d < k,` then short-axis vectors of a :math:`d \times k` matrix still refer to columns. + This makes it possible to express :math:`\D_2` explicitly: + + .. code:: c++ + + SkDist D2(d, k, arg3, Axis::Short); + + If :math:`d \geq k,` then we have to define a distribution :math:`\D` over :math:`d \times (m + k)` matrices + + .. code:: c++ + + SkDist D(d, m + k, arg3, Axis::Short); + + and think of :math:`\D_2` as the distribution obtained by selecting the trailing :math:`k` columns of a sample from :math:`\D.` + + This second approach may look wasteful, but that's not really the case. + If a DenseSkOp is used in one of RandBLAS' functions for sketching with a specified submatrix, + only the submatrix that's necessary for the operation will be generated. + The following code snippet provides more insight on the situation. + + .. code:: c++ + + SkDist D1( d, m, arg3, Axis::Short ); + SkDist D( d, m + k, arg3, Axis::Short ); + // Since d < m and we're short-axis major, the columns of matrices sampled from + // D1 or D will be sampled i.i.d. from some distribution on R^d. + using SkOp = typename SkDist::distribution_t; + SkOp S1( D1, seed_state ); // seed_state is some RNGState. + SkOp S( D, seed_state ); + // With these definitions, S1 is *always* equal to the first m columns of S. + // We recover S2 by working implicitly with the trailing k columns of S. 
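The left-sketching update can be sanity-checked numerically: computing :math:`\mB = \mS_1 \mA_1` and then folding in the new data via :math:`\mB \leftarrow \mB + \mS_2 \mA_2` matches sketching the stacked data with the concatenated operator, without revisiting :math:`\mA_1.` A toy check with plain row-major buffers (illustrative helpers of our own naming, not RandBLAS functions):

```cpp
#include <cassert>
#include <vector>

// d-by-m times m-by-n, all row-major.
std::vector<double> mm(const std::vector<double>& S, int d,
                       const std::vector<double>& A, int m, int n) {
    std::vector<double> B(d * n, 0.0);
    for (int i = 0; i < d; ++i)
        for (int l = 0; l < m; ++l)
            for (int j = 0; j < n; ++j)
                B[i * n + j] += S[i * m + l] * A[l * n + j];
    return B;
}

// B <- B + S2 * A2: the rank-k update that folds new rows A2 (k-by-n)
// into an existing sketch B = S1 * A1 without touching the old data A1.
void add_data(std::vector<double>& B, const std::vector<double>& S2, int d,
              const std::vector<double>& A2, int k, int n) {
    for (int i = 0; i < d; ++i)
        for (int l = 0; l < k; ++l)
            for (int j = 0; j < n; ++j)
                B[i * n + j] += S2[i * k + l] * A2[l * n + j];
}
```

With d = 1, old data A1 of size 2-by-2 and one new row A2, the incrementally updated sketch matches the sketch of the stacked data computed from scratch.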
+ + +Adding data: right-sketching +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Problem statement + We start with an :math:`m \times n` data matrix :math:`\mA_1.` + When presented with this matrix, we sample a tall operator :math:`\mS_1` from a distribution :math:`\D_1` of the form + + .. code:: c++ + + // Assumptions + // * SkDist is DenseDist or SparseDist + // * arg3 is any variable that makes sense for the SkDist constructor. + // * This example requires n > d. + SkDist D1(n, d, arg3, Axis::Short); + + and we compute a sketch :math:`\mB = \mA_1 \mS_1.` + + Sometime later, an :math:`m \times k` matrix :math:`\mA_2` arrives. + We want to independently sample a :math:`k \times d` operator :math:`\mS_2` from *some* :math:`\D_2` + and perform a rank-k update :math:`\mB \leftarrow \mB + \mA_2 \mS_2.` + Essentially, we want to redefine + + .. math:: + + \mB = \begin{bmatrix} \mA_1 & \mA_2 \end{bmatrix}\begin{bmatrix} \mS_1 \\ \mS_2 \end{bmatrix} + + without having to revisit :math:`\mA_1.` + +Conceptual solution + The idea is the same as with left-sketching. The difference is that since we're sketching from the + right with a tall :math:`n \times d` operator, the short-axis vectors are rows instead of columns. + This means the rows of :math:`\mS_1` are sampled independently from some distribution on :math:`\R^d,` + and we can define :math:`\mS_2` by sampling its rows from that same distribution. + +Implementation + If :math:`k > d,` then we can represent :math:`\D_2` explicitly, constructing it as follows. + + .. code:: c++ + + SkDist D2(k, d, arg3, Axis::Short); + + If :math:`k \leq d,` then we have to define a distribution :math:`\D` over :math:`(n + k) \times d` matrices + + .. code:: c++ + + SkDist D(n + k, d, arg3, Axis::Short); + + and think of :math:`\D_2` as the distribution obtained by selecting the bottom :math:`k` rows of a sample from :math:`\D.` + + As with the left-sketching case, we provide a code snippet to summarize the situation. + + .. 
code:: c++ + + SkDist D1( n, d, arg3, Axis::Short ); + SkDist D( n + k, d, arg3, Axis::Short ); + // Since n > d and we're short-axis major, the rows of matrices sampled from + // D1 or D will be sampled i.i.d. from some distribution on R^d. + using SkOp = typename SkDist::distribution_t; + SkOp S1( D1, seed_state ); // seed_state is some RNGState. + SkOp S( D, seed_state ); + // With these definitions, S1 is *always* equal to the first n rows of S. + // We recover S2 by working implicitly with the last k rows of S. + diff --git a/rtd/source/tutorial/submatrices.rst b/rtd/source/tutorial/submatrices.rst index b99deaa2..a9b6919b 100644 --- a/rtd/source/tutorial/submatrices.rst +++ b/rtd/source/tutorial/submatrices.rst @@ -63,10 +63,10 @@ The corresponding GEMM-like function signature is as follows. template abstract_gemm( blas::Layout ell, blas::Op opA, blas::Op opB, int m, int n, int k, - T alpha, LinOp A, int ro_a, int co_a, const T* B, int ldb, T beta, T* C, int ldc + T alpha, LinOp &A, int ro_a, int co_a, const T* B, int ldb, T beta, T* C, int ldc ) -Analgous changes apply just as well in two other cases: when :math:`\mtx{B}` is abstract +Analogous changes apply just as well in two other cases: when :math:`\mtx{B}` is abstract rather than :math:`\mtx{A}`, or when both :math:`\mtx{A}` and :math:`\mtx{B}` are abstract. RandBLAS doesn't actually have a function called :math:`\texttt{abstract_gemm}.` diff --git a/rtd/source/tutorial/temp.rst b/rtd/source/tutorial/temp.rst deleted file mode 100644 index 20cf453f..00000000 --- a/rtd/source/tutorial/temp.rst +++ /dev/null @@ -1,68 +0,0 @@ -:orphan: - - -.. \mtx{A} = \begin{bmatrix} \submat(\mtx{A}) & * \\ -.. * & * -.. \end{bmatrix}. - -.. Alternatively, one can view the submatrix as the middle block in a :math:`3 \times 3` partition of :math:`\mtx{A}`: - -.. .. math:: - -.. \mtx{A} = \begin{bmatrix} (\roa \times \coa) & * & * \\ -.. * & \submat(\mtx{A}) & * \\ -.. * & * & * -.. \end{bmatrix}. - -.. 
\begin{eqnarray} -.. \mat(C) &= \alpha \cdot\, \underbrace{\op(\submat(\mtx{A}))}_{m \times k}\, \cdot \,\underbrace{\op(\mat(B))}_{k \times n} + \,\beta \cdot \underbrace{\mat(C)}_{m \times n} \\ -.. \text{ and } \qquad \qquad & \text{ } \\ -.. \mat(C) &= \alpha \cdot\, \underbrace{\op(\mat(A))}_{m \times k}\, \cdot \,\underbrace{\op(\submat(\mtx{B}))}_{k \times n} + \,\beta \cdot \underbrace{\mat(C)}_{m \times n} -.. \end{eqnarray} - - -.. These functions have the same capabilities as GEMM, in the sense that they permit operating on arbitrary contiguous submatrices. -.. However, RandBLAS uses a more abstract data model than BLAS, the way that one specifies submatrices needs to change. -.. Therefore rather than exposing a function for performing :eq:`eq_realisticgemm`, it exposes functions for performing - -.. The philosophy of RandBLAS' sketching APIs -.. ========================================== - -.. RandBLAS has two main functions for sketching: - -.. * :math:`\texttt{sketch_general}`, which is used for dense data matrices, and -.. * :math:`\texttt{sketch_sparse}`, which is used for sparse data matrices. - -.. These functions are overloaded and templated to allow for different numerical -.. precisions and different types of sketching operators. It's possible to apply -.. dense or sparse sketching operators to dense matrices, and to apply dense sketching -.. operators to sparse matrices. The common thread in both -.. cases is that the final sketch is always dense. - -.. From a mathematical perspective, :math:`\texttt{sketch_general}` and :math:`\texttt{sketch_sparse}` -.. have the same capabilities as GEMM. - - - -.. ****************************************************************************** -.. Implementation notes -.. ****************************************************************************** - -.. RNGState -.. ======== - -.. TODO: Implementation details. Things like the counter and key being arrays. You shouldn't need to interact with the APIs -.. 
of these arrays. But, to address any curiosity, one of their nice features is an ``ctr.incr(val)`` method that effectively -.. encodes addition in a way that correctly handles overflow from one entry of ``ctr`` to the next. - -.. Every RNGState has an associated template parameter, RNG. -.. The default value of the RNG template parameter is :math:`\texttt{Philox4x32}`. -.. An RNG template parameter with name :math:`\texttt{GeneratorNxW}` will represent -.. the counter and key by an array of (at most) :math:`\texttt{N}` unsiged :math:`\texttt{W}`-bit integers. - -.. DenseSkOp -.. =================================== - - -.. SparseSkOp -.. =================================== \ No newline at end of file diff --git a/rtd/source/tutorial/updates.rst b/rtd/source/tutorial/updates.rst deleted file mode 100644 index a553d475..00000000 --- a/rtd/source/tutorial/updates.rst +++ /dev/null @@ -1,128 +0,0 @@ -********************************************************************************************* -Updating sketches with dense sketching operators -********************************************************************************************* - - .. |denseskop| mathmacro:: \texttt{DenseSkOp} - .. |seedstate| mathmacro:: \texttt{seed_state} - .. |nextstate| mathmacro:: \texttt{next_state} - .. |mtx| mathmacro:: {} - - -RandBLAS makes it easy to *implicitly* extend an initial sketching -operator :math:`\mtx{S}_1` into an augmented operator :math:`\mtx{S} = [\mtx{S}_1; \mtx{S}_2]` or :math:`\mtx{S} = [\mtx{S}_1, \mtx{S}_2]`. -There are four scenarios that you can find yourself in where -this can be done without generating :math:`S` from scratch. -In all four scenarios, the idea is to -use the :math:`\nextstate` of :math:`\mtx{S}_1` as the -:math:`\seedstate` of :math:`\mtx{S}_2`. 
- -There are two reasons why you'd want to -extend a sketching operator; you might be trying to improve statistical -quality by increasing sketch size (Scenarios 1 and 4), or you might be -incorporating new data into a sketch of fixed size (Scenarios 2 and 3). -The unifying perspective on Scenarios 1 and 4 is that they both add -*long-axis vectors* to the sketching operator. -The unifying perspective on -Scenarios 2 and 3 is that they both add *short-axis vectors* to the -sketching operator. - -:math:`\texttt{DenseDist}` objects have a :math:`\texttt{major_axis}` member, which states -whether operators sampled from that distribution are short-axis or -long-axis major. So when you specify the major axis for a sketching -operator, you're basically saying whether you want to keep open the possibility of -improving the statistical quality of a sketch or updating a sketch to -incorporate more data. - - -Scenario 1 -========== - - Suppose :math:`\mtx{S}_1` is a *wide* :math:`d_1 \times m` row-wise - :math:`\denseskop` with seed :math:`c`. It's easy to generate a - :math:`d_2\times m` row-wise :math:`\denseskop` :math:`\mtx{S}_2` in such a way that - :math:`\mtx{S} = [\mtx{S}_1; \mtx{S}_2]` is the same as the :math:`(d_1 + d_2) \times m` row-wise - :math:`\denseskop` with seed :math:`c`. - -This scenario arises if we have a fixed data matrix :math:`\mtx{A}`, an initial -sketch :math:`\mtx{B}_1 = \mtx{S}_1 \mtx{A}`, and we decide we want a larger sketch for -statistical reasons. The updated sketch :math:`\mtx{B} = \mtx{S} \mtx{A}` can be expressed as - -.. math:: - - \mtx{B} = \begin{bmatrix} \mtx{S}_1 \mtx{A} \\ \mtx{S}_2 \mtx{A} \end{bmatrix}. - -If :math:`(\mtx{S}_1, \mtx{S}_2, \mtx{S})` satisfy the assumptions above, then the final sketch -:math:`\mtx{B} = \mtx{S}\mtx{A}` will be the same regardless of whether we computed the sketch -in one step or two steps. 
This is desirable for benchmarking and -debugging RandNLA algorithms where the sketch size is a tuning parameter. - -Scenario 2 -========== - - Suppose :math:`\mtx{S}_1` is a *wide* :math:`d \times m_1` column-wise - :math:`\denseskop` with seed :math:`c`. It's easy to generate a - :math:`d \times m_2` column-wise :math:`\denseskop` :math:`\mtx{S}_2` so that - :math:`\mtx{S} = [\mtx{S}_1, \mtx{S}_2]` is the same as the :math:`d \times (m_1 + m_2)` column-wise - :math:`\denseskop` with seed :math:`c`. - -This situation arises if we have an initial data matrix :math:`\mtx{A}_1`, an -initial sketch :math:`\mtx{B}_1 = \mtx{S}_1 \mtx{A}_1`, and then a new matrix :math:`\mtx{A}_2` arrives in -such a way that we want a sketch of :math:`A = [\mtx{A}_1; \mtx{A}_2]`. To compute :math:`\mtx{B} = \mtx{S}\mtx{A}`, -we update :math:`\mtx{B}_1` according to the formula - -.. math:: - - \mtx{B} = \begin{bmatrix} \mtx{S}_1 & \mtx{S}_2 \end{bmatrix} \begin{bmatrix} \mtx{A}_1 \\ \mtx{A}_2 \end{bmatrix} = \mtx{B}_1 + \mtx{S}_2 \mtx{A}_2. - -If :math:`(\mtx{S}_1, \mtx{S}_2, \mtx{S})` satisfy the assumptions above, then :math:`\mtx{B}` will be the -same as though we started with all of :math:`\mtx{A}` from the very beginning. This -is useful for benchmarking and debugging RandNLA algorithms that involve -operating on data matrices that increase in size over some number of iterations. - -.. _scenario-3: - -Scenario 3 -========== - - Let :math:`\mtx{S}_1` be a *tall* :math:`n \times d_1` column-wise :math:`\denseskop` - with seed :math:`c`. We can easily generate an :math:`n\times d_2` column-wise - :math:`\denseskop` :math:`\mtx{S}_2` so that :math:`\mtx{S} = [\mtx{S}_1, \mtx{S}_2]` is the same - as the :math:`d \times (n_1 + n_2)` column-wise :math:`\denseskop` with seed :math:`c`. - -This arises we have a fixed data matrix :math:`\mtx{A}`, an initial sketch :math:`\mtx{B}_1 = \mtx{A} \mtx{S}_1`, -and we decide we want a larger sketch for statistical reasons. 
The -updated sketch :math:`\mtx{B} = \mtx{A}\mtx{S}` can be expressed as - -.. math:: - - \mtx{B} = \begin{bmatrix} \mtx{A} \mtx{S}_1 & \mtx{A} \mtx{S}_2 \end{bmatrix}. - -If :math:`(\mtx{S}_1, \mtx{S}_2, \mtx{S})` satisfy the assumptions above, then the final sketch -:math:`B` will be the same regardless of whether we computed the sketch in one -step or two steps. This is desirable for benchmarking and debugging -RandNLA algorithms where the sketch size is a tuning parameter. - -.. _scenario-4: - -Scenario 4 -========== - - Suppose :math:`\mtx{S}_1` is a *tall* :math:`n_1 \times d` row-wise - :math:`\denseskop` with seed :math:`c`. It's easy to generate an :math:`n_2\times d` - row-wise :math:`\denseskop` :math:`\mtx{S}_2` in such a way that - :math:`\mtx{S} = [\mtx{S}_1; \mtx{S}_2]` is the same as the :math:`(n_1 + n_2) \times d` row-wise - :math:`\denseskop` with seed :math:`c`. - -This situation arises if we have an initial data matrix :math:`\mtx{A}_1`, an initial sketch -:math:`\mtx{B}_1 = \mtx{A}_1 \mtx{S}_1`, and then a new matrix :math:`\mtx{A}_2` arrives in such a way that we -want a sketch of :math:`\mtx{A} = [\mtx{A}_1, \mtx{A}_2]`. To compute :math:`\mtx{B} = \mtx{A}\mtx{S}`, we update :math:`\mtx{B}_1` -according to the formula - -.. math:: - - \mtx{B} = \begin{bmatrix} \mtx{A}_1 & \mtx{A}_2 \end{bmatrix} \begin{bmatrix} \mtx{S}_1 \\ \mtx{S}_2 \end{bmatrix} = \mtx{B}_1 + \mtx{A}_2 \mtx{S}_2. - -If :math:`(\mtx{S}_1, \mtx{S}_2, \mtx{S})` satisfy the assumptions above, then :math:`\mtx{B}` will be the same as though -we started with all of :math:`\mtx{A}` from the very beginning. This is useful for benchmarking and -debugging RandNLA algorithms that involve operating on data matrices that increase in size over -some number of iterations. 
diff --git a/rtd/source/updates/index.rst b/rtd/source/updates/index.rst index d8a4667a..236685ea 100644 --- a/rtd/source/updates/index.rst +++ b/rtd/source/updates/index.rst @@ -8,7 +8,7 @@ RandBLAS upon request, no matter how old. With any luck, RandBLAS will grow enou in the future that we will change this policy to support a handful of versions at a time. -RandBLAS follows `Semantic Versioning `_. +RandBLAS follows `Semantic Versioning `_. RandBLAS 0.2 diff --git a/rtd/themes/randblas_rtd/static/custom.js b/rtd/themes/randblas_rtd/static/custom.js new file mode 100644 index 00000000..f353d17f --- /dev/null +++ b/rtd/themes/randblas_rtd/static/custom.js @@ -0,0 +1,16 @@ +// custom.js +document.addEventListener("DOMContentLoaded", function() { + var sidebar = document.querySelector(".wy-nav-side"); + var content = document.querySelector(".wy-nav-content"); + var contentWrap = document.querySelector(".wy-nav-content-wrap"); + var toggleButton = document.createElement("button"); + toggleButton.className = "wy-nav-side-toggle"; + toggleButton.innerHTML = "☰"; + document.body.appendChild(toggleButton); + + toggleButton.addEventListener("click", function() { + sidebar.classList.toggle("collapsed"); + content.classList.toggle("collapsed"); + contentWrap.classList.toggle("collapsed"); + }); +}); diff --git a/rtd/themes/randblas_rtd/static/theme_overrides.css b/rtd/themes/randblas_rtd/static/theme_overrides.css index 47138aca..94eac829 100644 --- a/rtd/themes/randblas_rtd/static/theme_overrides.css +++ b/rtd/themes/randblas_rtd/static/theme_overrides.css @@ -1,5 +1,9 @@ @import 'css/theme.css'; +.MathJax { +scale: 1.0; /* This definitely has an effect on display environments. 
No effect on in-line text.*/ +} + /* override table width restrictions */ .wy-table-responsive table td, .wy-table-responsive table th { white-space: normal; @@ -8,12 +12,66 @@ .wy-table-responsive { margin-bottom: 24px; max-width: 100%; - overflow: visible; + overflow: auto; +} + +.wy-nav-side { + transition: transform 0.3s ease; +} + +.wy-nav-side.collapsed { + transform: translateX(-100%); +} + +.wy-nav-side-toggle { + display: block; + position: fixed; + top: 10px; + left: 10px; + z-index: 1000; + cursor: pointer; +} + +.wy-nav-content { + transition: margin-left 0.3s ease; + padding: 1.618em 3.236em; + height: 100%; + max-width: 860px; + margin-left: 0; + text-align: justify; +} + +.wy-nav-content-wrap.collapsed { + transition: margin-left 0.3s ease; + margin-left: 0; + width: 100%; +} + +.wy-nav-content.collapsed { + margin-left: 0; + max-width: 860px; + margin: 0; + width: 100%; } .math { text-align: left; + overflow-y: hidden; + overflow-x: auto; } .eqno { float: right; -} \ No newline at end of file +} + + +/* +The standard rendering of c++ doxygen comments involves large margins for many elements. +The styles below override the margin choice for all (?) relevant HTML elements. 
+*/ +.cpp.function, .cpp.function dl, .cpp.function dd, .cpp.function dt, +.cpp.struct, .cpp.struct dl, .cpp.struct dd, .cpp.struct dt, +.cpp.concept, .cpp.concept dl, .cpp.concept dd, .cpp.concept dt, +.cpp.var, .cpp.var dl, .cpp.var dd, .cpp.var dt, +.cpp.enum-class, .cpp.enum-class dl, .cpp.enum-class dd, .cpp.enum-class dt { + margin-left: 0.5em; +} diff --git a/test/CMakeLists.txt b/test/CMakeLists.txt index e48cd660..dbcc885a 100644 --- a/test/CMakeLists.txt +++ b/test/CMakeLists.txt @@ -41,6 +41,8 @@ if (GTest_FOUND) test_matmul_cores/test_spmm/test_spmm_csc.cc test_matmul_cores/test_spmm/test_spmm_csr.cc test_matmul_cores/test_spmm/test_spmm_coo.cc + + test_matmul_wrappers/test_sketch_sparse.cc ) target_link_libraries(SparseRandBLAS_tests RandBLAS GTest::GTest GTest::Main) gtest_discover_tests(SparseRandBLAS_tests) diff --git a/test/DevNotes.md b/test/DevNotes.md index 5e5b1dd2..9f34d500 100644 --- a/test/DevNotes.md +++ b/test/DevNotes.md @@ -1,7 +1,7 @@ # Developer notes for RandBLAS' testing infrastructure -This document doesn't don't defend previous design decisions. +This document doesn't defend previous design decisions. It just explains how things work right now. That's easier for me (Riley) to write, and it's more useful to others. (Plus, it helps make the pros and cons of the current approach self-evident.) @@ -30,7 +30,7 @@ but I haven't actually verified this. ### test_basic_rng - * test_r123.cc has deterministic tests for Random123. The tests comapre generated values + * test_r123.cc has deterministic tests for Random123. The tests compare generated values to reference values computed ahead of time. The tests are __extremely__ messy, since they're adapted from tests in the official Random123 repository, and Random123 needs to handle a far wider range of compilers and languages than we assume for RandBLAS. 
diff --git a/test/handrolled_lapack.hh b/test/handrolled_lapack.hh index 9b2fbd7e..5e5977db 100644 --- a/test/handrolled_lapack.hh +++ b/test/handrolled_lapack.hh @@ -79,7 +79,7 @@ void chol_qr(int64_t m, int64_t n, T* A, T* R, int64_t chol_block_size = 32, boo if (twice) { T* R2 = R + n*n; chol_qr(m, n, A, R2, chol_block_size, false); - RandBLAS::util::overwrite_triangle(layout, blas::Uplo::Lower, n, 1, (T) 0.0, R, n); + RandBLAS::overwrite_triangle(layout, blas::Uplo::Lower, n, 1, R, n); // now overwrite R = R2 R with TRMM (saying R2 is the triangular matrix) blas::trmm(layout, blas::Side::Left, uplo, blas::Op::NoTrans, blas::Diag::NonUnit, n, n, (T) 1.0, R2, n, R, n); } @@ -134,7 +134,7 @@ void qr_block_cgs2(int64_t m, int64_t n, T* A, T* R, std::vector &bigwork, in T* littlework = R2 + n*n; std::fill(R, R + n * n, (T) 0.0); qr_block_cgs(m, n, A, R, n, littlework, b); - RandBLAS::util::overwrite_triangle(blas::Layout::ColMajor, blas::Uplo::Lower, n, 1, (T) 0.0, R, n); + RandBLAS::overwrite_triangle(blas::Layout::ColMajor, blas::Uplo::Lower, n, 1, R, n); qr_block_cgs(m, n, A, R2, n, littlework, b); blas::trmm( blas::Layout::ColMajor, blas::Side::Left, blas::Uplo::Upper, blas::Op::NoTrans, blas::Diag::NonUnit, @@ -197,9 +197,9 @@ int64_t posdef_eig_chol_iteration(int64_t n, T* A, T* eigvals, T reltol, int64_t std::vector pivots(n, 0); for (; iter < max_iters; ++iter) { potrf_upper(n, A, n, b); - RandBLAS::util::overwrite_triangle(Layout::ColMajor, Uplo::Lower, n, 1, (T) 0.0, A, n); + RandBLAS::overwrite_triangle(Layout::ColMajor, Uplo::Lower, n, 1, A, n); blas::syrk(Layout::ColMajor, Uplo::Upper, Op::NoTrans, n, n, (T) 1.0, A, n, (T) 0.0, G, n); - RandBLAS::util::symmetrize(Layout::ColMajor, Uplo::Upper, n, G, n); + RandBLAS::symmetrize(Layout::ColMajor, Uplo::Upper, n, G, n); for (int64_t i = 0; i < n; ++i) eigvals[i] = G[i * n + i]; converged = extremal_eigvals_converged_gershgorin(n, G, reltol); @@ -231,7 +231,7 @@ inline int64_t 
required_powermethod_iters(int64_t n, T p_fail, T tol) { template std::pair> power_method(int64_t n, FUNC &A, T* v, T tol, T failure_prob, const RNGState &state) { - auto next_state = RandBLAS::fill_dense(blas::Layout::ColMajor, {n, 1}, n, 1, 0, 0, v, state); + auto next_state = RandBLAS::fill_dense_unpacked(blas::Layout::ColMajor, {n, 1}, n, 1, 0, 0, v, state); std::vector work(n, 0.0); T* u = work.data(); T norm = blas::nrm2(n, v, 1); diff --git a/test/test_basic_rng/benchmark_speed.cc b/test/test_basic_rng/benchmark_speed.cc index 1cd66e32..5f08396d 100644 --- a/test/test_basic_rng/benchmark_speed.cc +++ b/test/test_basic_rng/benchmark_speed.cc @@ -80,7 +80,7 @@ int main(int argc, char **argv) int64_t m = atoi(argv[1]); int64_t n = atoi(argv[2]); int64_t d = m*n; - RandBLAS::DenseDist dist{m, n, RandBLAS::DenseDistName::Uniform}; + RandBLAS::DenseDist dist{m, n, RandBLAS::ScalarDist::Uniform}; std::vector mat(d); diff --git a/test/test_basic_rng/test_continuous.cc b/test/test_basic_rng/test_continuous.cc index 03b5d87d..15c2e493 100644 --- a/test/test_basic_rng/test_continuous.cc +++ b/test/test_basic_rng/test_continuous.cc @@ -32,7 +32,7 @@ #include "RandBLAS/util.hh" #include "RandBLAS/dense_skops.hh" using RandBLAS::RNGState; -using RandBLAS::DenseDistName; +using RandBLAS::ScalarDist; #include "rng_common.hh" #include @@ -54,12 +54,12 @@ class TestScalarDistributions : public ::testing::Test { template static void kolmogorov_smirnov_tester( - std::vector &samples, double critical_value, DenseDistName dn + std::vector &samples, double critical_value, ScalarDist sd ) { - auto F_true = [dn](T x) { - if (dn == DenseDistName::Gaussian) { + auto F_true = [sd](T x) { + if (sd == ScalarDist::Gaussian) { return RandBLAS_StatTests::standard_normal_cdf(x); - } else if (dn == DenseDistName::Uniform) { + } else if (sd == ScalarDist::Uniform) { return RandBLAS_StatTests::uniform_syminterval_cdf(x, (T) std::sqrt(3)); } else { std::string msg = "Unrecognized distributions 
name"; @@ -108,13 +108,14 @@ class TestScalarDistributions : public ::testing::Test { } template - static void run(double significance, int64_t num_samples, DenseDistName dn, uint32_t seed) { + static void run(double significance, int64_t num_samples, ScalarDist sd, uint32_t seed) { using RandBLAS_StatTests::KolmogorovSmirnovConstants::critical_value_rep_mutator; auto critical_value = critical_value_rep_mutator(num_samples, significance); RNGState state(seed); std::vector samples(num_samples, -1); - RandBLAS::fill_dense({num_samples, 1, dn, RandBLAS::MajorAxis::Long}, samples.data(), state); - kolmogorov_smirnov_tester(samples, critical_value, dn); + RandBLAS::DenseDist D(num_samples, 1, sd, RandBLAS::Axis::Long); + RandBLAS::fill_dense(D, samples.data(), state); + kolmogorov_smirnov_tester(samples, critical_value, sd); return; } }; @@ -122,45 +123,45 @@ class TestScalarDistributions : public ::testing::Test { TEST_F(TestScalarDistributions, uniform_ks_generous) { double s = 1e-6; for (uint32_t i = 999; i < 1011; ++i) { - run(s, 100000, DenseDistName::Uniform, i); - run(s, 10000, DenseDistName::Uniform, i*i); - run(s, 1000, DenseDistName::Uniform, i*i*i); + run(s, 100000, ScalarDist::Uniform, i); + run(s, 10000, ScalarDist::Uniform, i*i); + run(s, 1000, ScalarDist::Uniform, i*i*i); } } TEST_F(TestScalarDistributions, uniform_ks_moderate) { double s = 1e-4; - run(s, 100000, DenseDistName::Uniform, 0); - run(s, 10000, DenseDistName::Uniform, 0); - run(s, 1000, DenseDistName::Uniform, 0); + run(s, 100000, ScalarDist::Uniform, 0); + run(s, 10000, ScalarDist::Uniform, 0); + run(s, 1000, ScalarDist::Uniform, 0); } TEST_F(TestScalarDistributions, uniform_ks_skeptical) { double s = 1e-2; - run(s, 100000, DenseDistName::Uniform, 0); - run(s, 10000, DenseDistName::Uniform, 0); - run(s, 1000, DenseDistName::Uniform, 0); + run(s, 100000, ScalarDist::Uniform, 0); + run(s, 10000, ScalarDist::Uniform, 0); + run(s, 1000, ScalarDist::Uniform, 0); } TEST_F(TestScalarDistributions, 
guassian_ks_generous) { double s = 1e-6; for (uint32_t i = 99; i < 103; ++i) { - run(s, 100000, DenseDistName::Gaussian, i); - run(s, 10000, DenseDistName::Gaussian, i*i); - run(s, 1000, DenseDistName::Gaussian, i*i*i); + run(s, 100000, ScalarDist::Gaussian, i); + run(s, 10000, ScalarDist::Gaussian, i*i); + run(s, 1000, ScalarDist::Gaussian, i*i*i); } } TEST_F(TestScalarDistributions, guassian_ks_moderate) { double s = 1e-4; - run(s, 100000, DenseDistName::Gaussian, 0); - run(s, 10000, DenseDistName::Gaussian, 0); - run(s, 1000, DenseDistName::Gaussian, 0); + run(s, 100000, ScalarDist::Gaussian, 0); + run(s, 10000, ScalarDist::Gaussian, 0); + run(s, 1000, ScalarDist::Gaussian, 0); } TEST_F(TestScalarDistributions, guassian_ks_skeptical) { double s = 1e-2; - run(s, 100000, DenseDistName::Gaussian, 0); - run(s, 10000, DenseDistName::Gaussian, 0); - run(s, 1000, DenseDistName::Gaussian, 0); + run(s, 100000, ScalarDist::Gaussian, 0); + run(s, 10000, ScalarDist::Gaussian, 0); + run(s, 1000, ScalarDist::Gaussian, 0); } diff --git a/test/test_basic_rng/test_discrete.cc b/test/test_basic_rng/test_discrete.cc index 92014d29..6b485dca 100644 --- a/test/test_basic_rng/test_discrete.cc +++ b/test/test_basic_rng/test_discrete.cc @@ -65,7 +65,7 @@ class TestSampleIndices : public ::testing::Test static void test_iid_uniform_smoke(int64_t N, int64_t k, uint32_t seed) { RNGState state(seed); std::vector samples(k, -1); - RandBLAS::util::sample_indices_iid_uniform(N, k, samples.data(), state); + RandBLAS::sample_indices_iid_uniform(N, k, samples.data(), state); int64_t* data = samples.data(); for (int64_t i = 0; i < k; ++i) { ASSERT_LT(data[i], N); @@ -81,7 +81,7 @@ class TestSampleIndices : public ::testing::Test std::vector sample_cdf(N, 0.0); for (int64_t s : samples) sample_cdf[s] += 1; - RandBLAS::util::weights_to_cdf(N, sample_cdf.data()); + RandBLAS::weights_to_cdf(N, sample_cdf.data()); for (int i = 0; i < N; ++i) { float F_empirical = sample_cdf[i]; @@ -97,11 +97,11 @@ 
class TestSampleIndices : public ::testing::Test auto critical_value = critical_value_rep_mutator(num_samples, significance); std::vector true_cdf(N, 1.0); - RandBLAS::util::weights_to_cdf(N, true_cdf.data()); + RandBLAS::weights_to_cdf(N, true_cdf.data()); RNGState state(seed); std::vector samples(num_samples, -1); - RandBLAS::util::sample_indices_iid_uniform(N, num_samples, samples.data(), state); + RandBLAS::sample_indices_iid_uniform(N, num_samples, samples.data(), state); index_set_kolmogorov_smirnov_tester(samples, true_cdf, critical_value); return; @@ -115,11 +115,11 @@ class TestSampleIndices : public ::testing::Test std::vector true_cdf{}; for (int i = 0; i < N; ++i) true_cdf.push_back(std::pow(1.0/((float)i + 1.0), exponent)); - RandBLAS::util::weights_to_cdf(N, true_cdf.data()); + RandBLAS::weights_to_cdf(N, true_cdf.data()); RNGState state(seed); std::vector samples(num_samples, -1); - RandBLAS::util::sample_indices_iid(N, true_cdf.data(), num_samples, samples.data(), state); + RandBLAS::sample_indices_iid(N, true_cdf.data(), num_samples, samples.data(), state); index_set_kolmogorov_smirnov_tester(samples, true_cdf, critical_value); return; @@ -131,8 +131,8 @@ class TestSampleIndices : public ::testing::Test std::vector samples(num_samples, -1); RNGState state(seed); - using RandBLAS::util::weights_to_cdf; - using RandBLAS::util::sample_indices_iid; + using RandBLAS::weights_to_cdf; + using RandBLAS::sample_indices_iid; // Test case 1: distribution is nonuniform, with mass only on even elements != 10. 
std::vector true_cdf(N, 0.0); @@ -166,7 +166,7 @@ class TestSampleIndices : public ::testing::Test // static std::vector fisher_yates_cdf(const std::vector &idxs_major, int64_t K, int64_t num_samples) { - using RandBLAS::util::weights_to_cdf; + using RandBLAS::weights_to_cdf; std::vector empirical_cdf; // If K is 0, then there's nothing to count over and we should just return 1 @@ -214,7 +214,7 @@ class TestSampleIndices : public ::testing::Test static void single_test_fisher_yates_kolmogorov_smirnov(int64_t N, int64_t K, double significance, int64_t num_samples, uint32_t seed) { using RandBLAS::sparse::repeated_fisher_yates; using RandBLAS_StatTests::hypergeometric_pmf_arr; - using RandBLAS::util::weights_to_cdf; + using RandBLAS::weights_to_cdf; using RandBLAS_StatTests::KolmogorovSmirnovConstants::critical_value_rep_mutator; auto critical_value = critical_value_rep_mutator(num_samples, significance); @@ -224,7 +224,7 @@ class TestSampleIndices : public ::testing::Test RNGState state(seed); // Generate repeated Fisher-Yates in idxs_major - state = repeated_fisher_yates(state, K, N, num_samples, indices.data()); + state = repeated_fisher_yates(K, N, num_samples, indices.data(), state); // Generate the true hypergeometric cdf (get the pdf first) std::vector true_cdf = hypergeometric_pmf_arr(N, K, K); diff --git a/test/test_basic_rng/test_distortion.cc b/test/test_basic_rng/test_distortion.cc index dfbabedf..a2ad0ec7 100644 --- a/test/test_basic_rng/test_distortion.cc +++ b/test/test_basic_rng/test_distortion.cc @@ -32,7 +32,7 @@ #include "RandBLAS/util.hh" #include "RandBLAS/dense_skops.hh" using RandBLAS::DenseDist; -using RandBLAS::DenseDistName; +using RandBLAS::ScalarDist; using RandBLAS::RNGState; #include "rng_common.hh" @@ -47,18 +47,18 @@ class TestSubspaceDistortion : public ::testing::Test { protected: template - void run_general(DenseDistName name, T distortion, int64_t d, int64_t N, uint32_t key) { + void run_general(ScalarDist name, T distortion, 
int64_t d, int64_t N, uint32_t key) { auto layout = blas::Layout::ColMajor; DenseDist D(d, N, name); std::vector S(d*N); std::cout << "(d, N) = ( " << d << ", " << N << " )\n"; RandBLAS::RNGState state(key); auto next_state = RandBLAS::fill_dense(D, S.data(), state); - T inv_stddev = (name == DenseDistName::Gaussian) ? (T) 1.0 : (T) 1.0; + T inv_stddev = (name == ScalarDist::Gaussian) ? (T) 1.0 : (T) 1.0; blas::scal(d*N, inv_stddev / std::sqrt(d), S.data(), 1); std::vector G(N*N, 0.0); blas::syrk(layout, blas::Uplo::Upper, blas::Op::Trans, N, d, (T)1.0, S.data(), d, (T)0.0, G.data(), N); - RandBLAS::util::symmetrize(layout, blas::Uplo::Upper, N, G.data(), N); + RandBLAS::symmetrize(layout, blas::Uplo::Upper, N, G.data(), N); std::vector eigvecs(2*N, 0.0); std::vector subwork{}; @@ -100,7 +100,7 @@ class TestSubspaceDistortion : public ::testing::Test { val *= val; int64_t N = (int64_t) std::ceil(val); int64_t d = std::ceil( std::pow((1 + tau) / distortion, 2) * N ); - run_general(DenseDistName::Gaussian, distortion, d, N, key); + run_general(ScalarDist::Gaussian, distortion, d, N, key); return; } @@ -111,7 +111,7 @@ class TestSubspaceDistortion : public ::testing::Test { T epsnet_spectralnorm_factor = 1.0; // should be 4.0 T theta = epsnet_spectralnorm_factor * c6 * (rate + std::log(9)); int64_t d = std::ceil(N * theta * std::pow(distortion, -2)); - run_general(DenseDistName::Uniform, distortion, d, N, key); + run_general(ScalarDist::Uniform, distortion, d, N, key); return; } }; diff --git a/test/test_datastructures/test_denseskop.cc b/test/test_datastructures/test_denseskop.cc index 884243be..d2088097 100644 --- a/test/test_datastructures/test_denseskop.cc +++ b/test/test_datastructures/test_denseskop.cc @@ -106,7 +106,7 @@ class TestDenseMoments : public ::testing::Test { uint32_t key, int64_t n_rows, int64_t n_cols, - RandBLAS::DenseDistName dn, + RandBLAS::ScalarDist sd, T expect_stddev ) { // Allocate workspace @@ -114,7 +114,7 @@ class TestDenseMoments : 
public ::testing::Test { std::vector A(size, 0.0); // Construct the sketching operator - RandBLAS::DenseDist D(n_rows, n_cols, dn); + RandBLAS::DenseDist D(n_rows, n_cols, sd); auto state = RandBLAS::RNGState(key); auto next_state = RandBLAS::fill_dense(D, A.data(), state); @@ -136,25 +136,25 @@ class TestDenseMoments : public ::testing::Test { // For small matrix sizes, mean and stddev are not very close to desired vals. TEST_F(TestDenseMoments, Gaussian) { - auto dn = RandBLAS::DenseDistName::Gaussian; + auto sd = RandBLAS::ScalarDist::Gaussian; for (uint32_t key : {0, 1, 2}) { - test_mean_stddev(key, 500, 500, dn, 1.0); - test_mean_stddev(key, 203, 203, dn, 1.0); - test_mean_stddev(key, 203, 503, dn, 1.0); + test_mean_stddev(key, 500, 500, sd, 1.0); + test_mean_stddev(key, 203, 203, sd, 1.0); + test_mean_stddev(key, 203, 503, sd, 1.0); } } // For small matrix sizes, mean and stddev are not very close to desired vals. TEST_F(TestDenseMoments, Uniform) { - auto dn = RandBLAS::DenseDistName::Uniform; + auto sd = RandBLAS::ScalarDist::Uniform; double expect_stddev = 1.0; for (uint32_t key : {0, 1, 2}) { - test_mean_stddev(key, 500, 500, dn, (float) expect_stddev); - test_mean_stddev(key, 203, 203, dn, expect_stddev); - test_mean_stddev(key, 203, 503, dn, expect_stddev); + test_mean_stddev(key, 500, 500, sd, (float) expect_stddev); + test_mean_stddev(key, 203, 203, sd, expect_stddev); + test_mean_stddev(key, 203, 503, sd, expect_stddev); } } @@ -344,19 +344,19 @@ TEST(TestDenseThreading, GaussianPhilox) { class TestFillAxis : public::testing::Test { protected: - static inline auto distname = RandBLAS::DenseDistName::Uniform; + static inline auto distname = RandBLAS::ScalarDist::Uniform; template - static void auto_transpose(int64_t short_dim, int64_t long_dim, RandBLAS::MajorAxis ma) { + static void auto_transpose(int64_t short_dim, int64_t long_dim, RandBLAS::Axis major_axis) { uint32_t seed = 99; // make the wide sketching operator - RandBLAS::DenseDist D_wide 
{short_dim, long_dim, distname, ma}; + RandBLAS::DenseDist D_wide(short_dim, long_dim, distname, major_axis); RandBLAS::DenseSkOp S_wide(D_wide, seed); RandBLAS::fill_dense(S_wide); // make the tall sketching operator - RandBLAS::DenseDist D_tall {long_dim, short_dim, distname, ma}; + RandBLAS::DenseDist D_tall(long_dim, short_dim, distname, major_axis); RandBLAS::DenseSkOp S_tall(D_tall, seed); RandBLAS::fill_dense(S_tall); @@ -379,27 +379,27 @@ class TestFillAxis : public::testing::Test }; TEST_F(TestFillAxis, autotranspose_long_axis_3x5) { - auto_transpose(3, 5, RandBLAS::MajorAxis::Long); + auto_transpose(3, 5, RandBLAS::Axis::Long); } TEST_F(TestFillAxis, autotranspose_short_axis_3x5) { - auto_transpose(3, 5, RandBLAS::MajorAxis::Short); + auto_transpose(3, 5, RandBLAS::Axis::Short); } TEST_F(TestFillAxis, autotranspose_long_axis_4x8) { - auto_transpose(4, 8, RandBLAS::MajorAxis::Long); + auto_transpose(4, 8, RandBLAS::Axis::Long); } TEST_F(TestFillAxis, autotranspose_short_axis_4x8) { - auto_transpose(4, 8, RandBLAS::MajorAxis::Short); + auto_transpose(4, 8, RandBLAS::Axis::Short); } TEST_F(TestFillAxis, autotranspose_long_axis_2x4) { - auto_transpose(2, 4, RandBLAS::MajorAxis::Long); + auto_transpose(2, 4, RandBLAS::Axis::Long); } TEST_F(TestFillAxis, autotranspose_short_axis_2x4) { - auto_transpose(2, 4, RandBLAS::MajorAxis::Short); + auto_transpose(2, 4, RandBLAS::Axis::Short); } class TestDenseSkOpStates : public ::testing::Test @@ -411,12 +411,12 @@ class TestDenseSkOpStates : public ::testing::Test uint32_t key, int64_t n_rows, int64_t n_cols, - RandBLAS::DenseDistName dn + RandBLAS::ScalarDist sd ) { randblas_require(n_rows > n_cols); - RandBLAS::DenseDist D1( n_rows, n_cols/2, dn, RandBLAS::MajorAxis::Long); - RandBLAS::DenseDist D2( n_rows, n_cols - n_cols/2, dn, RandBLAS::MajorAxis::Long); - RandBLAS::DenseDist Dfull( n_rows, n_cols, dn, RandBLAS::MajorAxis::Long); + RandBLAS::DenseDist D1( n_rows, n_cols/2, sd, RandBLAS::Axis::Long); + 
RandBLAS::DenseDist D2( n_rows, n_cols - n_cols/2, sd, RandBLAS::Axis::Long); + RandBLAS::DenseDist Dfull( n_rows, n_cols, sd, RandBLAS::Axis::Long); RandBLAS::RNGState state(key); int64_t size = n_rows * n_cols; @@ -444,12 +444,12 @@ class TestDenseSkOpStates : public ::testing::Test uint32_t key, int64_t n_rows, int64_t n_cols, - RandBLAS::DenseDistName dn + RandBLAS::ScalarDist sd ) { float *buff = new float[n_rows*n_cols]; RandBLAS::RNGState state(key); - RandBLAS::DenseDist D(n_rows, n_cols, dn); + RandBLAS::DenseDist D(n_rows, n_cols, sd); auto actual_final_state = RandBLAS::fill_dense(D, buff, state); auto actual_c = actual_final_state.counter; @@ -468,22 +468,22 @@ class TestDenseSkOpStates : public ::testing::Test TEST_F(TestDenseSkOpStates, concat_tall_with_long_major_axis) { for (uint32_t key : {0, 1, 2}) { - auto dn = RandBLAS::DenseDistName::Gaussian; - test_concatenate_along_columns(key, 13, 7, dn); - test_concatenate_along_columns(key, 80, 40, dn); - test_concatenate_along_columns(key, 83, 41, dn); - test_concatenate_along_columns(key, 91, 43, dn); - test_concatenate_along_columns(key, 97, 47, dn); + auto sd = RandBLAS::ScalarDist::Gaussian; + test_concatenate_along_columns(key, 13, 7, sd); + test_concatenate_along_columns(key, 80, 40, sd); + test_concatenate_along_columns(key, 83, 41, sd); + test_concatenate_along_columns(key, 91, 43, sd); + test_concatenate_along_columns(key, 97, 47, sd); } } TEST_F(TestDenseSkOpStates, compare_skopless_fill_dense_to_compute_next_state) { for (uint32_t key : {0, 1, 2}) { - auto dn = RandBLAS::DenseDistName::Gaussian; - test_compute_next_state(key, 13, 7, dn); - test_compute_next_state(key, 11, 5, dn); - test_compute_next_state(key, 131, 71, dn); - test_compute_next_state(key, 80, 40, dn); - test_compute_next_state(key, 91, 43, dn); + auto sd = RandBLAS::ScalarDist::Gaussian; + test_compute_next_state(key, 13, 7, sd); + test_compute_next_state(key, 11, 5, sd); + test_compute_next_state(key, 131, 71, sd); + 
test_compute_next_state(key, 80, 40, sd); + test_compute_next_state(key, 91, 43, sd); } } diff --git a/test/test_datastructures/test_sparseskop.cc b/test/test_datastructures/test_sparseskop.cc index 7709de52..c94d32d9 100644 --- a/test/test_datastructures/test_sparseskop.cc +++ b/test/test_datastructures/test_sparseskop.cc @@ -34,6 +34,13 @@ #include #include +using RandBLAS::RNGState; +using RandBLAS::SignedInteger; +using RandBLAS::SparseDist; +using RandBLAS::SparseSkOp; +using RandBLAS::Axis; +using RandBLAS::fill_sparse; + class TestSparseSkOpConstruction : public ::testing::Test { @@ -75,35 +82,79 @@ class TestSparseSkOpConstruction : public ::testing::Test } } - template + template void proper_saso_construction(int64_t d, int64_t m, int64_t key_index, int64_t nnz_index) { - using RNG = RandBLAS::SparseSkOp::state_t::generator; - RandBLAS::SparseSkOp S0( - {d, m, vec_nnzs[nnz_index], RandBLAS::MajorAxis::Short}, keys[key_index] - ); - RandBLAS::fill_sparse(S0); + using RNG = SparseSkOp::state_t::generator; + SparseDist D0(d, m, vec_nnzs[nnz_index], Axis::Short); + SparseSkOp S0(D0, keys[key_index]); + fill_sparse(S0); if (d < m) { - check_fixed_nnz_per_col(S0); + check_fixed_nnz_per_col(S0); } else { - check_fixed_nnz_per_row(S0); + check_fixed_nnz_per_row(S0); } } - template + template void proper_laso_construction(int64_t d, int64_t m, int64_t key_index, int64_t nnz_index) { - using RNG = RandBLAS::SparseSkOp::state_t::generator; - RandBLAS::SparseSkOp S0( - {d, m, vec_nnzs[nnz_index], RandBLAS::MajorAxis::Long}, keys[key_index] - ); - RandBLAS::fill_sparse(S0); + using RNG = SparseSkOp::state_t::generator; + SparseDist D0(d, m, vec_nnzs[nnz_index], Axis::Long); + SparseSkOp S0(D0, keys[key_index]); + fill_sparse(S0); if (d < m) { - check_fixed_nnz_per_row(S0); + check_fixed_nnz_per_row(S0); } else { - check_fixed_nnz_per_col(S0); + check_fixed_nnz_per_col(S0); } } + + template + void respect_ownership(int64_t d, int64_t m) { + RNGState state(0); + 
SparseDist sd(d, m, 2, Axis::Short); + + std::vector rows(sd.full_nnz, -1); + std::vector cols(sd.full_nnz, -1); + std::vector vals(sd.full_nnz, -0.5); + auto rows_copy = rows; + auto cols_copy = cols; + auto vals_copy = vals; + + auto next_state = state; // it's safe to pass in a nonsense value, since we aren't going to reference it again. + auto S = new SparseSkOp(sd, state, next_state, -1, vals.data(), rows.data(), cols.data()); + // check that nothing has changed + test::comparison::buffs_approx_equal(rows.data(), rows_copy.data(), sd.full_nnz, __PRETTY_FUNCTION__, __FILE__, __LINE__, (sint_t) 0, (sint_t) 0); + test::comparison::buffs_approx_equal(cols.data(), cols_copy.data(), sd.full_nnz, __PRETTY_FUNCTION__, __FILE__, __LINE__, (sint_t) 0, (sint_t) 0); + test::comparison::buffs_approx_equal(vals.data(), vals_copy.data(), sd.full_nnz, __PRETTY_FUNCTION__, __FILE__, __LINE__, (T) 0, (T) 0); + fill_sparse(*S); + rows_copy = rows; + cols_copy = cols; + vals_copy = vals; + // check that everything has been overwritten + for (int i = 0; i < sd.full_nnz; ++i) { + EXPECT_GE(rows[i], 0); + EXPECT_GE(cols[i], 0); + EXPECT_NE(vals[i], -0.5); + } + // delete S, and make sure that rows, cols, vals are unchanged from before the deletion.
+ delete S; + test::comparison::buffs_approx_equal(rows.data(), rows_copy.data(), sd.full_nnz, __PRETTY_FUNCTION__, __FILE__, __LINE__, (sint_t) 0, (sint_t) 0); + test::comparison::buffs_approx_equal(cols.data(), cols_copy.data(), sd.full_nnz, __PRETTY_FUNCTION__, __FILE__, __LINE__, (sint_t) 0, (sint_t) 0); + test::comparison::buffs_approx_equal(vals.data(), vals_copy.data(), sd.full_nnz, __PRETTY_FUNCTION__, __FILE__, __LINE__, (T) 0, (T) 0); + return; + } }; +TEST_F(TestSparseSkOpConstruction, respect_ownership) { + respect_ownership(7, 20); + respect_ownership(7, 20); + respect_ownership(7, 20); + + respect_ownership(7, 20); + respect_ownership(7, 20); + respect_ownership(7, 20); +} + //////////////////////////////////////////////////////////////////////// // @@ -180,18 +231,18 @@ TEST_F(TestSparseSkOpConstruction, LASO_Dim_7by20) { proper_laso_construction(7, 20, 0, 0); proper_laso_construction(7, 20, 1, 0); proper_laso_construction(7, 20, 2, 0); - // vec_nnz=2 - proper_laso_construction(7, 20, 0, 1); - proper_laso_construction(7, 20, 1, 1); - proper_laso_construction(7, 20, 2, 1); - // vec_nnz=3 - proper_laso_construction(7, 20, 0, 2); - proper_laso_construction(7, 20, 1, 2); - proper_laso_construction(7, 20, 2, 2); - // vec_nnz=7 - proper_laso_construction(7, 20, 0, 3); - proper_laso_construction(7, 20, 1, 3); - proper_laso_construction(7, 20, 2, 3); + // // vec_nnz=2 + // proper_laso_construction(7, 20, 0, 1); + // proper_laso_construction(7, 20, 1, 1); + // proper_laso_construction(7, 20, 2, 1); + // // vec_nnz=3 + // proper_laso_construction(7, 20, 0, 2); + // proper_laso_construction(7, 20, 1, 2); + // proper_laso_construction(7, 20, 2, 2); + // // vec_nnz=7 + // proper_laso_construction(7, 20, 0, 3); + // proper_laso_construction(7, 20, 1, 3); + // proper_laso_construction(7, 20, 2, 3); } @@ -199,15 +250,15 @@ TEST_F(TestSparseSkOpConstruction, LASO_Dim_15by7) { // vec_nnz=1 proper_laso_construction(15, 7, 0, 0); proper_laso_construction(15, 7, 1, 0); - 
// vec_nnz=2 - proper_laso_construction(15, 7, 0, 1); - proper_laso_construction(15, 7, 1, 1); - // vec_nnz=3 - proper_laso_construction(15, 7, 0, 2); - proper_laso_construction(15, 7, 1, 2); - // vec_nnz=7 - proper_laso_construction(15, 7, 0, 3); - proper_laso_construction(15, 7, 1, 3); + // // vec_nnz=2 + // proper_laso_construction(15, 7, 0, 1); + // proper_laso_construction(15, 7, 1, 1); + // // vec_nnz=3 + // proper_laso_construction(15, 7, 0, 2); + // proper_laso_construction(15, 7, 1, 2); + // // vec_nnz=7 + // proper_laso_construction(15, 7, 0, 3); + // proper_laso_construction(15, 7, 1, 3); } @@ -216,18 +267,18 @@ TEST_F(TestSparseSkOpConstruction, LASO_Dim_7by20_int32) { proper_laso_construction(7, 20, 0, 0); proper_laso_construction(7, 20, 1, 0); proper_laso_construction(7, 20, 2, 0); - // vec_nnz=2 - proper_laso_construction(7, 20, 0, 1); - proper_laso_construction(7, 20, 1, 1); - proper_laso_construction(7, 20, 2, 1); - // vec_nnz=3 - proper_laso_construction(7, 20, 0, 2); - proper_laso_construction(7, 20, 1, 2); - proper_laso_construction(7, 20, 2, 2); - // vec_nnz=7 - proper_laso_construction(7, 20, 0, 3); - proper_laso_construction(7, 20, 1, 3); - proper_laso_construction(7, 20, 2, 3); + // // vec_nnz=2 + // proper_laso_construction(7, 20, 0, 1); + // proper_laso_construction(7, 20, 1, 1); + // proper_laso_construction(7, 20, 2, 1); + // // vec_nnz=3 + // proper_laso_construction(7, 20, 0, 2); + // proper_laso_construction(7, 20, 1, 2); + // proper_laso_construction(7, 20, 2, 2); + // // vec_nnz=7 + // proper_laso_construction(7, 20, 0, 3); + // proper_laso_construction(7, 20, 1, 3); + // proper_laso_construction(7, 20, 2, 3); } @@ -235,13 +286,13 @@ TEST_F(TestSparseSkOpConstruction, LASO_Dim_15by7_int32) { // vec_nnz=1 proper_laso_construction(15, 7, 0, 0); proper_laso_construction(15, 7, 1, 0); - // vec_nnz=2 - proper_laso_construction(15, 7, 0, 1); - proper_laso_construction(15, 7, 1, 1); - // vec_nnz=3 - proper_laso_construction(15, 7, 0, 2); - 
proper_laso_construction(15, 7, 1, 2); - // vec_nnz=7 - proper_laso_construction(15, 7, 0, 3); - proper_laso_construction(15, 7, 1, 3); + // // vec_nnz=2 + // proper_laso_construction(15, 7, 0, 1); + // proper_laso_construction(15, 7, 1, 1); + // // vec_nnz=3 + // proper_laso_construction(15, 7, 0, 2); + // proper_laso_construction(15, 7, 1, 2); + // // vec_nnz=7 + // proper_laso_construction(15, 7, 0, 3); + // proper_laso_construction(15, 7, 1, 3); } diff --git a/test/test_datastructures/test_spmats/common.hh b/test/test_datastructures/test_spmats/common.hh index c262822e..b59a9abe 100644 --- a/test/test_datastructures/test_spmats/common.hh +++ b/test/test_datastructures/test_spmats/common.hh @@ -56,11 +56,11 @@ void iid_sparsify_random_dense( RandBLAS::RNGState state ) { auto spar = new T[n_rows * n_cols]; - auto dist = RandBLAS::DenseDist(n_rows, n_cols, RandBLAS::DenseDistName::Uniform); + auto dist = RandBLAS::DenseDist(n_rows, n_cols, RandBLAS::ScalarDist::Uniform); auto next_state = RandBLAS::fill_dense(dist, spar, state); auto temp = new T[n_rows * n_cols]; - auto D_mat = RandBLAS::DenseDist(n_rows, n_cols, RandBLAS::DenseDistName::Uniform); + auto D_mat = RandBLAS::DenseDist(n_rows, n_cols, RandBLAS::ScalarDist::Uniform); RandBLAS::fill_dense(D_mat, temp, next_state); // We'll pretend both of those matrices are column-major, regardless of the layout @@ -108,7 +108,7 @@ void coo_from_diag( int64_t offset, RandBLAS::sparse_data::COOMatrix &spmat ) { - spmat.reserve(nnz); + reserve_coo(nnz, spmat); int64_t ell = 0; if (offset >= 0) { randblas_require(nnz <= spmat.n_rows); diff --git a/test/test_datastructures/test_spmats/test_coo.cc b/test/test_datastructures/test_spmats/test_coo.cc index a9bb0bbd..07539923 100644 --- a/test/test_datastructures/test_spmats/test_coo.cc +++ b/test/test_datastructures/test_spmats/test_coo.cc @@ -33,6 +33,7 @@ #include #include +using RandBLAS::Axis; using namespace RandBLAS::sparse_data; using namespace 
RandBLAS::sparse_data::coo; using namespace test::test_datastructures::test_spmats; @@ -52,7 +53,7 @@ void sparseskop_to_dense( auto idx = [D, layout](int64_t i, int64_t j) { return (layout == Layout::ColMajor) ? (i + j*D.n_rows) : (j + i*D.n_cols); }; - int64_t nnz = RandBLAS::sparse::nnz(S0); + int64_t nnz = S0.nnz; for (int64_t i = 0; i < nnz; ++i) { sint_t row = S0.rows[i]; sint_t col = S0.cols[i]; @@ -138,15 +139,19 @@ class Test_SkOp_to_COO : public ::testing::Test { virtual void TearDown(){}; template - void sparse_skop_to_coo(int64_t d, int64_t m, int64_t key_index, int64_t nnz_index, RandBLAS::MajorAxis ma) { - RandBLAS::SparseSkOp S( - {d, m, vec_nnzs[nnz_index], ma}, keys[key_index] - ); + void sparse_skop_to_coo(int64_t d, int64_t m, int64_t key_index, int64_t nnz_index, Axis major_axis) { + RandBLAS::SparseDist D(d, m, vec_nnzs[nnz_index], major_axis); + RandBLAS::SparseSkOp S(D, keys[key_index]); auto A = RandBLAS::sparse::coo_view_of_skop(S); - EXPECT_EQ(S.dist.n_rows, A.n_rows); - EXPECT_EQ(S.dist.n_cols, A.n_cols); - EXPECT_EQ(RandBLAS::sparse::nnz(S), A.nnz); + EXPECT_EQ(S.dist.n_rows, A.n_rows); + EXPECT_EQ(S.dist.n_cols, A.n_cols); + if (major_axis == Axis::Short) { + EXPECT_EQ(S.dist.full_nnz, A.nnz); + } else { + EXPECT_GE(S.dist.full_nnz, A.nnz); + EXPECT_EQ(S.nnz, A.nnz); + } std::vector S_dense(d * m); sparseskop_to_dense(S, S_dense.data(), Layout::ColMajor); @@ -161,77 +166,77 @@ class Test_SkOp_to_COO : public ::testing::Test { }; TEST_F(Test_SkOp_to_COO, SASO_Dim_7by20_nnz_1) { - sparse_skop_to_coo(7, 20, 0, 0, RandBLAS::MajorAxis::Short); - sparse_skop_to_coo(7, 20, 1, 0, RandBLAS::MajorAxis::Short); - sparse_skop_to_coo(7, 20, 2, 0, RandBLAS::MajorAxis::Short); + sparse_skop_to_coo(7, 20, 0, 0, RandBLAS::Axis::Short); + sparse_skop_to_coo(7, 20, 1, 0, RandBLAS::Axis::Short); + sparse_skop_to_coo(7, 20, 2, 0, RandBLAS::Axis::Short); } TEST_F(Test_SkOp_to_COO, SASO_Dim_7by20_nnz_2) { - sparse_skop_to_coo(7, 20, 0, 1,
RandBLAS::MajorAxis::Short); - sparse_skop_to_coo(7, 20, 1, 1, RandBLAS::MajorAxis::Short); - sparse_skop_to_coo(7, 20, 2, 1, RandBLAS::MajorAxis::Short); + sparse_skop_to_coo(7, 20, 0, 1, RandBLAS::Axis::Short); + sparse_skop_to_coo(7, 20, 1, 1, RandBLAS::Axis::Short); + sparse_skop_to_coo(7, 20, 2, 1, RandBLAS::Axis::Short); } TEST_F(Test_SkOp_to_COO, SASO_Dim_7by20_nnz_3) { - sparse_skop_to_coo(7, 20, 0, 2, RandBLAS::MajorAxis::Short); - sparse_skop_to_coo(7, 20, 1, 2, RandBLAS::MajorAxis::Short); - sparse_skop_to_coo(7, 20, 2, 2, RandBLAS::MajorAxis::Short); + sparse_skop_to_coo(7, 20, 0, 2, RandBLAS::Axis::Short); + sparse_skop_to_coo(7, 20, 1, 2, RandBLAS::Axis::Short); + sparse_skop_to_coo(7, 20, 2, 2, RandBLAS::Axis::Short); } TEST_F(Test_SkOp_to_COO, SASO_Dim_7by20_nnz_7) { - sparse_skop_to_coo(7, 20, 0, 3, RandBLAS::MajorAxis::Short); - sparse_skop_to_coo(7, 20, 1, 3, RandBLAS::MajorAxis::Short); - sparse_skop_to_coo(7, 20, 2, 3, RandBLAS::MajorAxis::Short); + sparse_skop_to_coo(7, 20, 0, 3, RandBLAS::Axis::Short); + sparse_skop_to_coo(7, 20, 1, 3, RandBLAS::Axis::Short); + sparse_skop_to_coo(7, 20, 2, 3, RandBLAS::Axis::Short); } TEST_F(Test_SkOp_to_COO, SASO_Dim_15by7) { - sparse_skop_to_coo(15, 7, 0, 0, RandBLAS::MajorAxis::Short); - sparse_skop_to_coo(15, 7, 1, 0, RandBLAS::MajorAxis::Short); + sparse_skop_to_coo(15, 7, 0, 0, RandBLAS::Axis::Short); + sparse_skop_to_coo(15, 7, 1, 0, RandBLAS::Axis::Short); - sparse_skop_to_coo(15, 7, 0, 1, RandBLAS::MajorAxis::Short); - sparse_skop_to_coo(15, 7, 1, 1, RandBLAS::MajorAxis::Short); + sparse_skop_to_coo(15, 7, 0, 1, RandBLAS::Axis::Short); + sparse_skop_to_coo(15, 7, 1, 1, RandBLAS::Axis::Short); - sparse_skop_to_coo(15, 7, 0, 2, RandBLAS::MajorAxis::Short); - sparse_skop_to_coo(15, 7, 1, 2, RandBLAS::MajorAxis::Short); + sparse_skop_to_coo(15, 7, 0, 2, RandBLAS::Axis::Short); + sparse_skop_to_coo(15, 7, 1, 2, RandBLAS::Axis::Short); - sparse_skop_to_coo(15, 7, 0, 3, RandBLAS::MajorAxis::Short); - 
sparse_skop_to_coo(15, 7, 1, 3, RandBLAS::MajorAxis::Short); + sparse_skop_to_coo(15, 7, 0, 3, RandBLAS::Axis::Short); + sparse_skop_to_coo(15, 7, 1, 3, RandBLAS::Axis::Short); } TEST_F(Test_SkOp_to_COO, LASO_Dim_7by20_nnz_1) { - sparse_skop_to_coo(7, 20, 0, 0, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(7, 20, 1, 0, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(7, 20, 2, 0, RandBLAS::MajorAxis::Long); + sparse_skop_to_coo(7, 20, 0, 0, RandBLAS::Axis::Long); + sparse_skop_to_coo(7, 20, 1, 0, RandBLAS::Axis::Long); + sparse_skop_to_coo(7, 20, 2, 0, RandBLAS::Axis::Long); } TEST_F(Test_SkOp_to_COO, LASO_Dim_7by20_nnz_2) { - sparse_skop_to_coo(7, 20, 0, 1, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(7, 20, 1, 1, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(7, 20, 2, 1, RandBLAS::MajorAxis::Long); + sparse_skop_to_coo(7, 20, 0, 1, RandBLAS::Axis::Long); + sparse_skop_to_coo(7, 20, 1, 1, RandBLAS::Axis::Long); + sparse_skop_to_coo(7, 20, 2, 1, RandBLAS::Axis::Long); } TEST_F(Test_SkOp_to_COO, LASO_Dim_7by20_nnz_3) { - sparse_skop_to_coo(7, 20, 0, 2, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(7, 20, 1, 2, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(7, 20, 2, 2, RandBLAS::MajorAxis::Long); + sparse_skop_to_coo(7, 20, 0, 2, RandBLAS::Axis::Long); + sparse_skop_to_coo(7, 20, 1, 2, RandBLAS::Axis::Long); + sparse_skop_to_coo(7, 20, 2, 2, RandBLAS::Axis::Long); } TEST_F(Test_SkOp_to_COO, LASO_Dim_7by20_nnz_7) { - sparse_skop_to_coo(7, 20, 0, 3, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(7, 20, 1, 3, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(7, 20, 2, 3, RandBLAS::MajorAxis::Long); + sparse_skop_to_coo(7, 20, 0, 3, RandBLAS::Axis::Long); + sparse_skop_to_coo(7, 20, 1, 3, RandBLAS::Axis::Long); + sparse_skop_to_coo(7, 20, 2, 3, RandBLAS::Axis::Long); } TEST_F(Test_SkOp_to_COO, LASO_Dim_15by7) { - sparse_skop_to_coo(15, 7, 0, 0, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(15, 7, 1, 0, RandBLAS::MajorAxis::Long); + 
sparse_skop_to_coo(15, 7, 0, 0, RandBLAS::Axis::Long); + sparse_skop_to_coo(15, 7, 1, 0, RandBLAS::Axis::Long); - sparse_skop_to_coo(15, 7, 0, 1, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(15, 7, 1, 1, RandBLAS::MajorAxis::Long); + sparse_skop_to_coo(15, 7, 0, 1, RandBLAS::Axis::Long); + sparse_skop_to_coo(15, 7, 1, 1, RandBLAS::Axis::Long); - sparse_skop_to_coo(15, 7, 0, 2, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(15, 7, 1, 2, RandBLAS::MajorAxis::Long); + sparse_skop_to_coo(15, 7, 0, 2, RandBLAS::Axis::Long); + sparse_skop_to_coo(15, 7, 1, 2, RandBLAS::Axis::Long); - sparse_skop_to_coo(15, 7, 0, 3, RandBLAS::MajorAxis::Long); - sparse_skop_to_coo(15, 7, 1, 3, RandBLAS::MajorAxis::Long); + sparse_skop_to_coo(15, 7, 0, 3, RandBLAS::Axis::Long); + sparse_skop_to_coo(15, 7, 1, 3, RandBLAS::Axis::Long); } diff --git a/test/test_datastructures/test_spmats/test_csc.cc b/test/test_datastructures/test_spmats/test_csc.cc index 7c081fc6..aea06e9a 100644 --- a/test/test_datastructures/test_spmats/test_csc.cc +++ b/test/test_datastructures/test_spmats/test_csc.cc @@ -55,7 +55,7 @@ class TestCSC_Conversions : public ::testing::Test { iid_sparsify_random_dense(m, n, layout, dn_mat, p, s); // Step 2. convert the dense representation into a CSC matrix - CSCMatrix spmat(m, n, IndexBase::Zero); + CSCMatrix spmat(m, n); dense_to_csc(layout, dn_mat, 0.0, spmat); // Step 3. reconstruct the dense representation of dn_mat from the CSC matrix. 
diff --git a/test/test_datastructures/test_spmats/test_csr.cc b/test/test_datastructures/test_spmats/test_csr.cc index 294cb0f2..501cde57 100644 --- a/test/test_datastructures/test_spmats/test_csr.cc +++ b/test/test_datastructures/test_spmats/test_csr.cc @@ -50,8 +50,8 @@ class TestCSR_Conversions : public ::testing::Test template static void test_csr_to_dense_diagonal(int64_t n) { - CSRMatrix A(n, n, IndexBase::Zero); - A.reserve(n); + CSRMatrix A(n, n); + reserve_csr(n, A); for (int i = 0; i < n; ++i) { A.vals[i] = 1.0 + (T) i; A.rowptr[i] = i; @@ -80,7 +80,7 @@ class TestCSR_Conversions : public ::testing::Test iid_sparsify_random_dense(m, n, layout, dn_mat, p, s); // Step 2. convert the dense representation into a CSR matrix - CSRMatrix spmat(m, n, IndexBase::Zero); + CSRMatrix spmat(m, n); dense_to_csr(layout, dn_mat, 0.0, spmat); // Step 3. reconstruct the dense representation of dn_mat from the CSR matrix. diff --git a/test/test_handrolled_lapack.cc b/test/test_handrolled_lapack.cc index 3e1aa2eb..2c7335ca 100644 --- a/test/test_handrolled_lapack.cc +++ b/test/test_handrolled_lapack.cc @@ -12,7 +12,7 @@ #include "RandBLAS/util.hh" #include "RandBLAS/dense_skops.hh" using RandBLAS::DenseDist; -using RandBLAS::DenseDistName; +using RandBLAS::ScalarDist; using RandBLAS::RNGState; #include @@ -32,24 +32,24 @@ class TestHandrolledCholesky : public ::testing::Test { DenseDist D(m, n); std::vector A(n*n); std::vector B(m*n); - T iso_scale = std::pow(RandBLAS::isometry_scale_factor(D), 2); + T iso_scale = std::pow(D.isometry_scale, 2); RNGState state(key); RandBLAS::fill_dense(D, B.data(), state); std::vector C(B); // define positive definite A blas::syrk(layout, blas::Uplo::Upper, blas::Op::Trans, n, m, iso_scale, B.data(), m, 0.0, A.data(), n); - RandBLAS::util::symmetrize(layout, blas::Uplo::Upper, n, A.data(), n); + RandBLAS::symmetrize(layout, blas::Uplo::Upper, n, A.data(), n); // overwrite A by its upper-triangular cholesky factor cholfunc(n, A.data()); - 
RandBLAS::util::overwrite_triangle(layout, blas::Uplo::Lower, n, 1, (T) 0.0, A.data(), n); + RandBLAS::overwrite_triangle(layout, blas::Uplo::Lower, n, 1, A.data(), n); // compute the gram matrix of A's cholesky factor blas::syrk(layout, blas::Uplo::Upper, blas::Op::Trans, n, n, 1.0, A.data(), n, 0.0, B.data(), n); - RandBLAS::util::symmetrize(layout, blas::Uplo::Upper, n, B.data(), n); + RandBLAS::symmetrize(layout, blas::Uplo::Upper, n, B.data(), n); // recompute A blas::syrk(layout, blas::Uplo::Upper, blas::Op::Trans, n, m, iso_scale, C.data(), m, 0.0, A.data(), n); - RandBLAS::util::symmetrize(layout, blas::Uplo::Upper, n, A.data(), n); + RandBLAS::symmetrize(layout, blas::Uplo::Upper, n, A.data(), n); test::comparison::matrices_approx_equal(layout, blas::Op::NoTrans, n, n, B.data(), n, A.data(), n, __PRETTY_FUNCTION__, __FILE__, __LINE__ @@ -175,9 +175,9 @@ class TestHandrolledQR : public ::testing::Test { template void run_cholqr_gaussian(int m, int n, int b, uint32_t key) { - DenseDist D(m, n, DenseDistName::Gaussian); + DenseDist D(m, n, ScalarDist::Gaussian); std::vector A(m*n); - T iso_scale = RandBLAS::isometry_scale_factor(D); + T iso_scale = D.isometry_scale; RNGState state(key); RandBLAS::fill_dense(D, A.data(), state); blas::scal(m*n, iso_scale, A.data(), 1); @@ -191,9 +191,9 @@ class TestHandrolledQR : public ::testing::Test { template void run_qr_blocked_cgs(int m, int n, int b, uint32_t key) { - DenseDist D(m, n, DenseDistName::Gaussian); + DenseDist D(m, n, ScalarDist::Gaussian); std::vector A(m*n); - T iso_scale = RandBLAS::isometry_scale_factor(D); + T iso_scale = D.isometry_scale; RNGState state(key); RandBLAS::fill_dense(D, A.data(), state); blas::scal(m*n, iso_scale, A.data(), 1); @@ -251,7 +251,7 @@ std::vector posdef_with_random_eigvecs(std::vector &eigvals, uint32_t key) randblas_require(ev > 0); std::vector work0(n*n, 0.0); T* work0_buff = work0.data(); - DenseDist distn(n, n, DenseDistName::Gaussian); + DenseDist distn(n, n, 
ScalarDist::Gaussian); RNGState state(key); RandBLAS::fill_dense(distn, work0_buff, state); std::vector work1(n*n, 0.0); @@ -261,7 +261,7 @@ std::vector posdef_with_random_eigvecs(std::vector &eigvals, uint32_t key) blas::scal(n, std::sqrt(eigvals[i]), work0_buff + i*n, 1); std::vector out(n*n, 0.0); blas::syrk(blas::Layout::ColMajor, blas::Uplo::Upper, blas::Op::NoTrans, n, n, (T)1.0, work0_buff, n, (T)0.0, out.data(), n); - RandBLAS::util::symmetrize(blas::Layout::ColMajor, blas::Uplo::Upper, n, out.data(), n); + RandBLAS::symmetrize(blas::Layout::ColMajor, blas::Uplo::Upper, n, out.data(), n); return out; } diff --git a/test/test_matmul_cores/linop_common.hh b/test/test_matmul_cores/linop_common.hh index 82def271..264a644e 100644 --- a/test/test_matmul_cores/linop_common.hh +++ b/test/test_matmul_cores/linop_common.hh @@ -73,19 +73,12 @@ auto random_matrix(int64_t m, int64_t n, RNGState s) { std::vector A(m * n); DenseDist DA(m, n); auto next_state = RandBLAS::fill_dense(DA, A.data(), s); - std::tuple, Layout, RNGState> t{A, RandBLAS::dist_to_layout(DA), next_state}; + std::tuple, Layout, RNGState> t{A, DA.natural_layout, next_state}; return t; } -template -dims64_t dimensions(SparseSkOp &S) {return {S.dist.n_rows, S.dist.n_cols}; } - -template -dims64_t dimensions(DenseSkOp &S) {return {S.dist.n_rows, S.dist.n_cols};} - -template -dims64_t dimensions(SpMat &S) {return {S.n_rows, S.n_cols};} - +template +dims64_t dimensions(LINOP &S) {return {S.n_rows, S.n_cols};} template void to_explicit_buffer(SpMat &a, T *mat_a, Layout layout) { @@ -134,12 +127,8 @@ void to_explicit_buffer(DenseSkOp &s, T *mat_s, Layout layout) { } +// MARK: Multiply from the LEFT //////////////////////////////////////////////////////////////////////// -// -// -// Multiply from the LEFT -// -// //////////////////////////////////////////////////////////////////////// @@ -465,12 +454,8 @@ void test_left_apply_to_transposed( } +// MARK: Multiply from the RIGHT 
//////////////////////////////////////////////////////////////////////// -// -// -// Multiply from the RIGHT -// -// //////////////////////////////////////////////////////////////////////// template @@ -627,7 +612,7 @@ void test_right_apply_submatrix_to_eye( } template -void test_right_apply_tranpose_to_eye( +void test_right_apply_transpose_to_eye( // B = eye * S^T, where S is d-by-n, so eye is order n and B is n-by-d LinOp &S, Layout layout, int threads = 0 ) { diff --git a/test/test_matmul_cores/test_lskges.cc b/test/test_matmul_cores/test_lskges.cc index 1bbd0d03..87104022 100644 --- a/test/test_matmul_cores/test_lskges.cc +++ b/test/test_matmul_cores/test_lskges.cc @@ -32,7 +32,7 @@ using RandBLAS::SparseDist; using RandBLAS::SparseSkOp; -using RandBLAS::MajorAxis; +using RandBLAS::Axis; using namespace test::linop_common; @@ -48,7 +48,7 @@ class TestLSKGES : public ::testing::Test template static void apply( - MajorAxis major_axis, + Axis major_axis, int64_t d, int64_t m, int64_t n, @@ -57,7 +57,8 @@ class TestLSKGES : public ::testing::Test int64_t nnz_index, int threads ) { - SparseSkOp S0({d, m, vec_nnzs[nnz_index], major_axis}, keys[key_index]); + SparseDist D0(d, m, vec_nnzs[nnz_index], major_axis); + SparseSkOp S0(D0, keys[key_index]); test_left_apply_to_random(1.0, S0, n, 0.0, layout, threads); } @@ -73,7 +74,8 @@ class TestLSKGES : public ::testing::Test blas::Layout layout ) { int64_t vec_nnz = d0 / 3; // this is actually quite dense. 
- SparseSkOp S0({d0, m0, vec_nnz, MajorAxis::Short}, seed); + SparseDist D0(d0, m0, vec_nnz, Axis::Short); + SparseSkOp S0(D0, seed); test_left_apply_submatrix_to_eye(1.0, S0, d1, m1, S_ro, S_co, layout, 0.0); } @@ -87,35 +89,30 @@ class TestLSKGES : public ::testing::Test blas::Layout layout ) { int64_t vec_nnz = d / 2; - SparseDist DS = {d, m, vec_nnz, MajorAxis::Short}; + SparseDist DS(d, m, vec_nnz, Axis::Short); SparseSkOp S(DS, key); test_left_apply_submatrix_to_eye(alpha, S, d, m, 0, 0, layout, beta); } template static void transpose_S( - MajorAxis major_axis, + Axis major_axis, uint32_t key, int64_t m, int64_t d, blas::Layout layout ) { randblas_require(m > d); - bool is_saso = (major_axis == MajorAxis::Short); + bool is_saso = (major_axis == Axis::Short); int64_t vec_nnz = (is_saso) ? d/2 : m/2; - SparseDist Dt = { - .n_rows = m, - .n_cols = d, - .vec_nnz = vec_nnz, - .major_axis = major_axis - }; + SparseDist Dt(m, d, vec_nnz, major_axis); SparseSkOp S0(Dt, key); test_left_apply_transpose_to_eye(S0, layout); } template static void submatrix_A( - MajorAxis major_axis, + Axis major_axis, uint32_t seed_S0, // seed for S0 int64_t d, // rows in S0 int64_t m, // cols in S0, and rows in A. @@ -127,21 +124,16 @@ class TestLSKGES : public ::testing::Test blas::Layout layout ) { // Define the distribution for S0. - bool is_saso = (major_axis == MajorAxis::Short); + bool is_saso = (major_axis == Axis::Short); int64_t vec_nnz = (is_saso) ? d/2 : m/2; - SparseDist D = { - .n_rows = d, - .n_cols = m, - .vec_nnz = vec_nnz, - .major_axis = major_axis - }; + SparseDist D(d, m, vec_nnz, major_axis); SparseSkOp S0(D, seed_S0); test_left_apply_to_submatrix(S0, n, m0, n0, A_ro, A_co, layout); } template static void transpose_A( - MajorAxis major_axis, + Axis major_axis, uint32_t seed_S0, // seed for S0 int64_t d, // rows in S0 int64_t m, // cols in S0, and rows in A. 
@@ -149,14 +141,9 @@ class TestLSKGES : public ::testing::Test blas::Layout layout ) { // Define the distribution for S0. - bool is_saso = (major_axis == MajorAxis::Short); + bool is_saso = (major_axis == Axis::Short); int64_t vec_nnz = (is_saso) ? d/2 : m/2; - SparseDist D = { - .n_rows = d, - .n_cols = m, - .vec_nnz = vec_nnz, - .major_axis = major_axis - }; + SparseDist D(d, m, vec_nnz, major_axis); SparseSkOp S0(D, seed_S0); test_left_apply_to_transposed(S0, n, layout); } @@ -175,10 +162,10 @@ TEST_F(TestLSKGES, sketch_saso_rowMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {4, 1, 2, 3, 0}) { - apply(MajorAxis::Short, + apply(Axis::Short, 19, 201, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1 ); - apply(MajorAxis::Short, + apply(Axis::Short, 19, 201, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1 ); } @@ -189,8 +176,8 @@ TEST_F(TestLSKGES, sketch_laso_rowMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {4, 1, 2, 3, 0}) { - apply(MajorAxis::Long, 19, 201, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1); - apply(MajorAxis::Long, 19, 201, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1); + apply(Axis::Long, 19, 201, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1); + apply(Axis::Long, 19, 201, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1); } } } @@ -200,10 +187,10 @@ TEST_F(TestLSKGES, sketch_saso_rowMajor_FourThreads) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {4, 1, 2, 3, 0}) { - apply(MajorAxis::Short, + apply(Axis::Short, 19, 201, 12, blas::Layout::RowMajor, k_idx, nz_idx, 4 ); - apply(MajorAxis::Short, + apply(Axis::Short, 19, 201, 12, blas::Layout::RowMajor, k_idx, nz_idx, 4 ); } @@ -215,8 +202,8 @@ TEST_F(TestLSKGES, sketch_saso_colMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {4, 1, 2, 3, 0}) { - apply(MajorAxis::Short, 19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); - apply(MajorAxis::Short, 19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); + apply(Axis::Short, 
19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); + apply(Axis::Short, 19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); } } } @@ -225,8 +212,8 @@ TEST_F(TestLSKGES, sketch_laso_colMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {4, 1, 2, 3, 0}) { - apply(MajorAxis::Long, 19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); - apply(MajorAxis::Long, 19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); + apply(Axis::Long, 19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); + apply(Axis::Long, 19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); } } } @@ -236,8 +223,8 @@ TEST_F(TestLSKGES, sketch_saso_colMajor_fourThreads) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {4, 1, 2, 3, 0}) { - apply(MajorAxis::Short, 19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 4); - apply(MajorAxis::Short, 19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 4); + apply(Axis::Short, 19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 4); + apply(Axis::Short, 19, 201, 12, blas::Layout::ColMajor, k_idx, nz_idx, 4); } } } @@ -257,10 +244,10 @@ TEST_F(TestLSKGES, lift_saso_rowMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {4, 1, 2, 3, 0}) { - apply(MajorAxis::Short, + apply(Axis::Short, 201, 19, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1 ); - apply(MajorAxis::Short, + apply(Axis::Short, 201, 19, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1 ); } @@ -271,8 +258,8 @@ TEST_F(TestLSKGES, lift_laso_rowMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {4, 1, 2, 3, 0}) { - apply(MajorAxis::Long, 201, 19, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1); - apply(MajorAxis::Long, 201, 19, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1); + apply(Axis::Long, 201, 19, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1); + apply(Axis::Long, 201, 19, 12, blas::Layout::RowMajor, k_idx, nz_idx, 1); } } } @@ -281,8 +268,8 @@ TEST_F(TestLSKGES, lift_saso_colMajor_oneThread) { for (int64_t k_idx : {0, 1, 
2}) { for (int64_t nz_idx: {4, 1, 2, 3, 0}) { - apply(MajorAxis::Short, 201, 19, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); - apply(MajorAxis::Short, 201, 19, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); + apply(Axis::Short, 201, 19, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); + apply(Axis::Short, 201, 19, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); } } } @@ -291,8 +278,8 @@ TEST_F(TestLSKGES, lift_laso_colMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {4, 1, 2, 3, 0}) { - apply(MajorAxis::Long, 201, 19, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); - apply(MajorAxis::Long, 201, 19, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); + apply(Axis::Long, 201, 19, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); + apply(Axis::Long, 201, 19, 12, blas::Layout::ColMajor, k_idx, nz_idx, 1); } } } @@ -426,25 +413,25 @@ TEST_F(TestLSKGES, subset_cols_s_rowmajor2) TEST_F(TestLSKGES, transpose_saso_double_colmajor) { uint32_t seed = 0; - transpose_S(MajorAxis::Short, seed, 21, 4, blas::Layout::ColMajor); + transpose_S(Axis::Short, seed, 21, 4, blas::Layout::ColMajor); } TEST_F(TestLSKGES, transpose_laso_double_colmajor) { uint32_t seed = 0; - transpose_S(MajorAxis::Long, seed, 21, 4, blas::Layout::ColMajor); + transpose_S(Axis::Long, seed, 21, 4, blas::Layout::ColMajor); } TEST_F(TestLSKGES, transpose_saso_double_rowmajor) { uint32_t seed = 0; - transpose_S(MajorAxis::Short, seed, 21, 4, blas::Layout::RowMajor); + transpose_S(Axis::Short, seed, 21, 4, blas::Layout::RowMajor); } TEST_F(TestLSKGES, transpose_laso_double_rowmajor) { uint32_t seed = 0; - transpose_S(MajorAxis::Long, seed, 21, 4, blas::Layout::RowMajor); + transpose_S(Axis::Long, seed, 21, 4, blas::Layout::RowMajor); } @@ -461,7 +448,7 @@ TEST_F(TestLSKGES, saso_submatrix_a_colmajor) { for (uint32_t seed : {0}) submatrix_A( - MajorAxis::Short, + Axis::Short, seed, 3, // number of rows in sketch 10, 5, // (rows, cols) in A. 
@@ -476,7 +463,7 @@ TEST_F(TestLSKGES, saso_submatrix_a_rowmajor) { for (uint32_t seed : {0}) submatrix_A( - MajorAxis::Short, + Axis::Short, seed, 3, // number of rows in sketch 10, 5, // (rows, cols) in A. @@ -491,7 +478,7 @@ TEST_F(TestLSKGES, laso_submatrix_a_colmajor) { for (uint32_t seed : {0}) submatrix_A( - MajorAxis::Long, + Axis::Long, seed, 3, // number of rows in sketch 10, 5, // (rows, cols) in A. @@ -506,7 +493,7 @@ TEST_F(TestLSKGES, laso_submatrix_a_rowmajor) { for (uint32_t seed : {0}) submatrix_A( - MajorAxis::Long, + Axis::Long, seed, 3, // number of rows in sketch 10, 5, // (rows, cols) in A. @@ -530,25 +517,25 @@ TEST_F(TestLSKGES, laso_submatrix_a_rowmajor) TEST_F(TestLSKGES, saso_times_trans_A_colmajor) { uint32_t seed = 0; - transpose_A(MajorAxis::Short, seed, 7, 22, 5, blas::Layout::ColMajor); + transpose_A(Axis::Short, seed, 7, 22, 5, blas::Layout::ColMajor); } TEST_F(TestLSKGES, laso_times_trans_A_colmajor) { uint32_t seed = 0; - transpose_A(MajorAxis::Long, seed, 7, 22, 5, blas::Layout::ColMajor); + transpose_A(Axis::Long, seed, 7, 22, 5, blas::Layout::ColMajor); } TEST_F(TestLSKGES, saso_times_trans_A_rowmajor) { uint32_t seed = 0; - transpose_A(MajorAxis::Short, seed, 7, 22, 5, blas::Layout::RowMajor); + transpose_A(Axis::Short, seed, 7, 22, 5, blas::Layout::RowMajor); } TEST_F(TestLSKGES, laso_times_trans_A_rowmajor) { uint32_t seed = 0; - transpose_A(MajorAxis::Long, seed, 7, 22, 5, blas::Layout::RowMajor); + transpose_A(Axis::Long, seed, 7, 22, 5, blas::Layout::RowMajor); } diff --git a/test/test_matmul_cores/test_rskge3.cc b/test/test_matmul_cores/test_rskge3.cc index 118a0276..eeb90050 100644 --- a/test/test_matmul_cores/test_rskge3.cc +++ b/test/test_matmul_cores/test_rskge3.cc @@ -67,7 +67,7 @@ class TestRSKGE3 : public ::testing::Test ) { DenseDist Dt(d, m); DenseSkOp S0(Dt, seed); - test_right_apply_tranpose_to_eye(S0, layout); + test_right_apply_transpose_to_eye(S0, layout); } template diff --git 
a/test/test_matmul_cores/test_rskges.cc b/test/test_matmul_cores/test_rskges.cc index 35b62f61..86db0bed 100644 --- a/test/test_matmul_cores/test_rskges.cc +++ b/test/test_matmul_cores/test_rskges.cc @@ -51,13 +51,11 @@ class TestRSKGES : public ::testing::Test uint32_t seed, int64_t m, int64_t d, - RandBLAS::MajorAxis major_axis, + RandBLAS::Axis major_axis, int64_t vec_nnz, Layout layout ) { - SparseDist D = { - .n_rows = m, .n_cols = d, .vec_nnz = vec_nnz, .major_axis = major_axis - }; + SparseDist D(m, d, vec_nnz, major_axis); SparseSkOp S0(D, seed); RandBLAS::fill_sparse(S0); test_right_apply_submatrix_to_eye(1.0, S0, m, d, 0, 0, layout, 0.0, 0); @@ -65,7 +63,7 @@ class TestRSKGES : public ::testing::Test template static void apply( - RandBLAS::MajorAxis major_axis, + RandBLAS::Axis major_axis, int64_t d, int64_t m, int64_t n, @@ -74,14 +72,10 @@ class TestRSKGES : public ::testing::Test int64_t nnz_index, int threads ) { - SparseDist D = { - .n_rows=n, .n_cols=d, .vec_nnz=vec_nnzs[nnz_index], .major_axis=major_axis - }; + SparseDist D(n, d, vec_nnzs[nnz_index], major_axis); SparseSkOp S0(D, keys[key_index]); RandBLAS::fill_sparse(S0); test_right_apply_to_random(1.0, S0, m, layout, 0.0, threads); - - } template @@ -98,9 +92,8 @@ class TestRSKGES : public ::testing::Test randblas_require(d0 >= d1); randblas_require(n0 >= n1); int64_t vec_nnz = d0 / 3; // this is actually quite dense. 
- SparseSkOp S0( - {n0, d0, vec_nnz, RandBLAS::MajorAxis::Short}, seed - ); + SparseDist D0(n0, d0, vec_nnz, RandBLAS::Axis::Short); + SparseSkOp S0(D0, seed); RandBLAS::fill_sparse(S0); test_right_apply_submatrix_to_eye(1.0, S0, n1, d1, S_ro, S_co, layout, 0.0, 0); } @@ -117,25 +110,25 @@ class TestRSKGES : public ::testing::Test TEST_F(TestRSKGES, right_sketch_eye_saso_colmajor) { for (uint32_t seed : {0}) - sketch_eye(seed, 10, 3, RandBLAS::MajorAxis::Short, 1, Layout::ColMajor); + sketch_eye(seed, 10, 3, RandBLAS::Axis::Short, 1, Layout::ColMajor); } TEST_F(TestRSKGES, right_sketch_eye_saso_rowmajor) { for (uint32_t seed : {0}) - sketch_eye(seed, 10, 3, RandBLAS::MajorAxis::Short, 1, Layout::RowMajor); + sketch_eye(seed, 10, 3, RandBLAS::Axis::Short, 1, Layout::RowMajor); } TEST_F(TestRSKGES, right_sketch_eye_laso_colmajor) { for (uint32_t seed : {0}) - sketch_eye(seed, 10, 3, RandBLAS::MajorAxis::Long, 1, Layout::ColMajor); + sketch_eye(seed, 10, 3, RandBLAS::Axis::Long, 1, Layout::ColMajor); } TEST_F(TestRSKGES, right_sketch_eye_laso_rowmajor) { for (uint32_t seed : {0}) - sketch_eye(seed, 10, 3, RandBLAS::MajorAxis::Long, 1, Layout::RowMajor); + sketch_eye(seed, 10, 3, RandBLAS::Axis::Long, 1, Layout::RowMajor); } @@ -150,25 +143,25 @@ TEST_F(TestRSKGES, right_sketch_eye_laso_rowmajor) TEST_F(TestRSKGES, right_lift_eye_saso_colmajor) { for (uint32_t seed : {0}) - sketch_eye(seed, 22, 51, RandBLAS::MajorAxis::Short, 5, Layout::ColMajor); + sketch_eye(seed, 22, 51, RandBLAS::Axis::Short, 5, Layout::ColMajor); } TEST_F(TestRSKGES, right_lift_eye_saso_rowmajor) { for (uint32_t seed : {0}) - sketch_eye(seed, 22, 51, RandBLAS::MajorAxis::Short, 5, Layout::RowMajor); + sketch_eye(seed, 22, 51, RandBLAS::Axis::Short, 5, Layout::RowMajor); } TEST_F(TestRSKGES, right_lift_eye_laso_colmajor) { for (uint32_t seed : {0}) - sketch_eye(seed, 22, 51, RandBLAS::MajorAxis::Long, 13, Layout::ColMajor); + sketch_eye(seed, 22, 51, RandBLAS::Axis::Long, 13, Layout::ColMajor); } 
TEST_F(TestRSKGES, right_lift_eye_laso_rowmajor) { for (uint32_t seed : {0}) - sketch_eye(seed, 22, 51, RandBLAS::MajorAxis::Long, 13, Layout::RowMajor); + sketch_eye(seed, 22, 51, RandBLAS::Axis::Long, 13, Layout::RowMajor); } //////////////////////////////////////////////////////////////////////// @@ -183,10 +176,10 @@ TEST_F(TestRSKGES, sketch_saso_rowMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {1, 2, 3, 0}) { - apply(RandBLAS::MajorAxis::Short, + apply(RandBLAS::Axis::Short, 12, 19, 201, Layout::RowMajor, k_idx, nz_idx, 1 ); - apply(RandBLAS::MajorAxis::Short, + apply(RandBLAS::Axis::Short, 12, 19, 201, Layout::RowMajor, k_idx, nz_idx, 1 ); } @@ -197,8 +190,8 @@ TEST_F(TestRSKGES, sketch_laso_rowMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {1, 2, 3, 0}) { - apply(RandBLAS::MajorAxis::Long, 12, 19, 201, Layout::RowMajor, k_idx, nz_idx, 1); - apply(RandBLAS::MajorAxis::Long, 12, 19, 201, Layout::RowMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Long, 12, 19, 201, Layout::RowMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Long, 12, 19, 201, Layout::RowMajor, k_idx, nz_idx, 1); } } } @@ -207,8 +200,8 @@ TEST_F(TestRSKGES, sketch_saso_colMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {1, 2, 3, 0}) { - apply(RandBLAS::MajorAxis::Short, 12, 19, 201, Layout::ColMajor, k_idx, nz_idx, 1); - apply(RandBLAS::MajorAxis::Short, 12, 19, 201, Layout::ColMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Short, 12, 19, 201, Layout::ColMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Short, 12, 19, 201, Layout::ColMajor, k_idx, nz_idx, 1); } } } @@ -217,8 +210,8 @@ TEST_F(TestRSKGES, sketch_laso_colMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {1, 2, 3, 0}) { - apply(RandBLAS::MajorAxis::Long, 12, 19, 201, Layout::ColMajor, k_idx, nz_idx, 1); - apply(RandBLAS::MajorAxis::Long, 12, 19, 201, Layout::ColMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Long, 12, 
19, 201, Layout::ColMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Long, 12, 19, 201, Layout::ColMajor, k_idx, nz_idx, 1); } } } @@ -237,10 +230,10 @@ TEST_F(TestRSKGES, lift_saso_rowMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {1, 2, 3, 0}) { - apply(RandBLAS::MajorAxis::Short, + apply(RandBLAS::Axis::Short, 201, 19, 12, Layout::RowMajor, k_idx, nz_idx, 1 ); - apply(RandBLAS::MajorAxis::Short, + apply(RandBLAS::Axis::Short, 201, 19, 12, Layout::RowMajor, k_idx, nz_idx, 1 ); } @@ -251,8 +244,8 @@ TEST_F(TestRSKGES, lift_laso_rowMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {1, 2, 3, 0}) { - apply(RandBLAS::MajorAxis::Long, 201, 19, 12, Layout::RowMajor, k_idx, nz_idx, 1); - apply(RandBLAS::MajorAxis::Long, 201, 19, 12, Layout::RowMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Long, 201, 19, 12, Layout::RowMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Long, 201, 19, 12, Layout::RowMajor, k_idx, nz_idx, 1); } } } @@ -261,8 +254,8 @@ TEST_F(TestRSKGES, lift_saso_colMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {1, 2, 3, 0}) { - apply(RandBLAS::MajorAxis::Short, 201, 19, 12, Layout::ColMajor, k_idx, nz_idx, 1); - apply(RandBLAS::MajorAxis::Short, 201, 19, 12, Layout::ColMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Short, 201, 19, 12, Layout::ColMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Short, 201, 19, 12, Layout::ColMajor, k_idx, nz_idx, 1); } } } @@ -271,8 +264,8 @@ TEST_F(TestRSKGES, lift_laso_colMajor_oneThread) { for (int64_t k_idx : {0, 1, 2}) { for (int64_t nz_idx: {1, 2, 3, 0}) { - apply(RandBLAS::MajorAxis::Long, 201, 19, 12, Layout::ColMajor, k_idx, nz_idx, 1); - apply(RandBLAS::MajorAxis::Long, 201, 19, 12, Layout::ColMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Long, 201, 19, 12, Layout::ColMajor, k_idx, nz_idx, 1); + apply(RandBLAS::Axis::Long, 201, 19, 12, Layout::ColMajor, k_idx, nz_idx, 1); } } } diff --git 
a/test/test_matmul_cores/test_spmm/spmm_test_helpers.hh b/test/test_matmul_cores/test_spmm/spmm_test_helpers.hh index c07f16a4..0077c8c4 100644 --- a/test/test_matmul_cores/test_spmm/spmm_test_helpers.hh +++ b/test/test_matmul_cores/test_spmm/spmm_test_helpers.hh @@ -143,12 +143,12 @@ class TestRightMultiply_Sparse : public ::testing::Test { void alpha_beta(uint32_t key, T alpha, T beta, int64_t m, int64_t n, Layout layout, T p) { auto A = this->make_test_matrix(m, n, p, key); - test_right_apply_submatrix_to_eye(alpha, A, m, n, 0, 0, layout, beta, 0); + test_right_apply_submatrix_to_eye(alpha, A, m, n, 0, 0, layout, beta, 0); } void transpose_self(uint32_t key, int64_t m, int64_t n, Layout layout, T p) { auto A = this->make_test_matrix(m, n, p, key); - test_right_apply_tranpose_to_eye(A, layout, 0); + test_right_apply_transpose_to_eye(A, layout, 0); } void submatrix_self( diff --git a/test/test_matmul_wrappers/DevNotes.md b/test/test_matmul_wrappers/DevNotes.md index d339d2c5..ce790d0f 100644 --- a/test/test_matmul_wrappers/DevNotes.md +++ b/test/test_matmul_wrappers/DevNotes.md @@ -3,16 +3,23 @@ ## Tests for sketch_sparse +Our tests for [L/R]SKSP3 are adaptations of the tests in ``test_matmul_cores/test_lskge3.cc`` +and ``test_matmul_cores/test_rskge3.cc`` where the data matrix was the identity. + + * We only test COOMatrix for the sparse matrix datatype, but that's reasonable since the implementations + of [L/R]SKSP3 are fully templated over the sparse matrix type. + * These tests don't consider operating on submatrices of the data matrix. + It's possible to do that in principle (at least for COOMatrix), but it isn't necessary, since the logic for handling + submatrices of the sparse data matrix lives in left_spmm and right_spmm. ## Tests for sketch_symmetric -Right now this falls back on sketch_general, which means it suffices to -test with only DenseSkOp. +sketch_symmetric currently falls back on sketch_general, so it suffices to test with only DenseSkOp. 
## Tests for sketch_vector -Right now sketch_vector falls back on sketch_general, which means it suffices -to test with only DenseSkOp. -There's an argument to be made for it to directly handle -dispatching of GEMV (for DenseSkOp) and SPMV (for SparseSkOp). +sketch_vector currently falls back on sketch_general, so it suffices to test with only DenseSkOp. + +There's an argument to be made for sketch_vector to directly handle +dispatching of GEMV (for DenseSkOp) and an SPMV kernel (for SparseSkOp). Additional tests would be warranted if we made that change. *Note.* We have some infrastructure in place for SPMV, diff --git a/test/test_matmul_wrappers/test_sketch_sparse.cc b/test/test_matmul_wrappers/test_sketch_sparse.cc index e69de29b..3f1dbf92 100644 --- a/test/test_matmul_wrappers/test_sketch_sparse.cc +++ b/test/test_matmul_wrappers/test_sketch_sparse.cc @@ -0,0 +1,558 @@ +#include "test/test_matmul_cores/linop_common.hh" +// ^ That includes a ton of stuff. + +using blas::Layout; +using blas::Op; + +using RandBLAS::RNGState; +using RandBLAS::DenseDist; +using RandBLAS::DenseSkOp; +using RandBLAS::dims_before_op; +using RandBLAS::offset_and_ldim; +using RandBLAS::layout_to_strides; +using RandBLAS::sketch_sparse; +using namespace RandBLAS::sparse_data; + +using test::linop_common::dimensions; +using test::linop_common::random_matrix; +using test::linop_common::to_explicit_buffer; +// ^ Call as to_explicit_buffer(denseskop, mat_s, layout). +// That populates mat_s with data from the denseskop in layout +// order with the smallest possible leading dimension. 
+ + +template +SpMat eye(int64_t m) { + using T = SpMat::scalar_t; + using sint_t = SpMat::index_t; + COOMatrix coo(m, m); + reserve_coo(m, coo); + for (int i = 0; i < m; ++i) { + coo.rows[i] = i; + coo.cols[i] = i; + coo.vals[i] = (T) 1.0; + } + constexpr bool is_coo = std::is_same_v>; + constexpr bool is_csc = std::is_same_v>; + constexpr bool is_csr = std::is_same_v>; + if constexpr (is_coo) { + return coo; + } else if constexpr (is_csc) { + CSCMatrix csc(m, m); + coo_to_csc(coo, csc); + return csc; + } else if constexpr (is_csr) { + CSRMatrix csr(m, m); + coo_to_csr(coo, csr); + return csr; + } else { + randblas_require(false); + } +} + +// Adapted from test::linop_common::test_left_apply_transpose_to_eye. +template > +void test_left_transposed_sketch_of_eye( + // B = S^T * eye, where S is m-by-d, B is d-by-m + DenseSkOp &S, Layout layout +) { + auto [m, d] = dimensions(S); + auto I = eye(m); + std::vector B(d * m, 0.0); + bool is_colmajor = (Layout::ColMajor == layout); + int64_t ldb = (is_colmajor) ? d : m; + int64_t lds = (is_colmajor) ? m : d; + + lsksp3( + layout, Op::Trans, Op::NoTrans, d, m, m, + (T) 1.0, S, 0, 0, I, 0, 0, (T) 0.0, B.data(), ldb + ); + + std::vector S_dense(m * d, 0.0); + to_explicit_buffer(S, S_dense.data(), layout); + test::comparison::matrices_approx_equal( + layout, Op::Trans, d, m, + B.data(), ldb, S_dense.data(), lds, + __PRETTY_FUNCTION__, __FILE__, __LINE__ + ); +} + +// Adapted from test::linop_common::test_left_apply_submatrix_to_eye. +template > +void test_left_submat_sketch_of_eye( + // B = alpha * submat(S0) * eye + beta*B, where S = submat(S) is d1-by-m1 offset by (S_ro, S_co) in S0, and B is random. + T alpha, DenseSkOp &S0, int64_t d1, int64_t m1, int64_t S_ro, int64_t S_co, Layout layout, T beta = 0.0 +) { + auto [d0, m0] = dimensions(S0); + randblas_require(d0 >= d1); + randblas_require(m0 >= m1); + bool is_colmajor = layout == Layout::ColMajor; + int64_t ldb = (is_colmajor) ? 
d1 : m1; + + // define a matrix to be sketched, and create workspace for sketch. + auto I = eye(m1); + auto B = std::get<0>(random_matrix(d1, m1, RNGState(42))); + std::vector B_backup(B); + + // Perform the sketch + lsksp3( + layout, Op::NoTrans, Op::NoTrans, d1, m1, m1, + alpha, S0, S_ro, S_co, I, 0, 0, beta, B.data(), ldb + ); + + // Check the result + T *expect = new T[d0 * m0]; + to_explicit_buffer(S0, expect, layout); + int64_t ld_expect = (is_colmajor) ? d0 : m0; + auto [row_stride_s, col_stride_s] = layout_to_strides(layout, ld_expect); + auto [row_stride_b, col_stride_b] = layout_to_strides(layout, ldb); + int64_t offset = row_stride_s * S_ro + col_stride_s * S_co; + #define MAT_E(_i, _j) expect[offset + (_i)*row_stride_s + (_j)*col_stride_s] + #define MAT_B(_i, _j) B_backup[ (_i)*row_stride_b + (_j)*col_stride_b] + for (int i = 0; i < d1; ++i) { + for (int j = 0; j < m1; ++j) { + MAT_E(i,j) = alpha * MAT_E(i,j) + beta * MAT_B(i, j); + } + } + + test::comparison::matrices_approx_equal( + layout, Op::NoTrans, + d1, m1, + B.data(), ldb, + &expect[offset], ld_expect, + __PRETTY_FUNCTION__, __FILE__, __LINE__ + ); + + delete [] expect; +} + +// Adapted from test::linop_common::test_right_apply_transpose_to_eye. +template > +void test_right_transposed_sketch_of_eye( + // B = eye * S^T, where S is d-by-n, so eye is order n and B is n-by-d + DenseSkOp &S, Layout layout +) { + auto [d, n] = dimensions(S); + auto I = eye(n); + std::vector B(n * d, 0.0); + bool is_colmajor = Layout::ColMajor == layout; + int64_t ldb = (is_colmajor) ? n : d; + int64_t lds = (is_colmajor) ? 
d : n; + + rsksp3(layout, Op::NoTrans, Op::Trans, n, d, n, (T) 1.0, I, 0, 0, S, 0, 0, (T) 0.0, B.data(), ldb); + + std::vector S_dense(n * d, 0.0); + to_explicit_buffer(S, S_dense.data(), layout); + test::comparison::matrices_approx_equal( + layout, Op::Trans, n, d, + B.data(), ldb, S_dense.data(), lds, + __PRETTY_FUNCTION__, __FILE__, __LINE__ + ); +} + +// Adapted from test::linop_common::test_right_apply_submatrix_to_eye. +template > +void test_right_submat_sketch_of_eye( + // B = alpha * eye * submat(S) + beta*B : submat(S) is n-by-d, eye is n-by-n, B is n-by-d and random + T alpha, DenseSkOp &S0, int64_t n, int64_t d, int64_t S_ro, int64_t S_co, Layout layout, T beta = 0.0 +) { + auto [n0, d0] = dimensions(S0); + randblas_require(n0 >= n); + randblas_require(d0 >= d); + bool is_colmajor = layout == Layout::ColMajor; + int64_t ldb = (is_colmajor) ? n : d; + + auto I = eye(n); + auto B = std::get<0>(random_matrix(n, d, RNGState(11))); + std::vector B_backup(B); + rsksp3(layout, Op::NoTrans, Op::NoTrans, n, d, n, alpha, I, 0, 0, S0, S_ro, S_co, beta, B.data(), ldb); + + T *expect = new T[n0 * d0]; + to_explicit_buffer(S0, expect, layout); + int64_t ld_expect = (is_colmajor)? 
n0 : d0; + auto [row_stride_s, col_stride_s] = layout_to_strides(layout, ld_expect); + auto [row_stride_b, col_stride_b] = layout_to_strides(layout, ldb); + int64_t offset = row_stride_s * S_ro + col_stride_s * S_co; + #define MAT_E(_i, _j) expect[offset + (_i)*row_stride_s + (_j)*col_stride_s] + #define MAT_B(_i, _j) B_backup[ (_i)*row_stride_b + (_j)*col_stride_b] + for (int i = 0; i < n; ++i) { + for (int j = 0; j < d; ++j) { + MAT_E(i,j) = alpha * MAT_E(i,j) + beta * MAT_B(i, j); + } + } + + test::comparison::matrices_approx_equal( + layout, Op::NoTrans, n, d, B.data(), ldb, &expect[offset], ld_expect, + __PRETTY_FUNCTION__, __FILE__, __LINE__ + ); + + delete [] expect; +} + + +class TestLSKSP3 : public ::testing::Test +{ + protected: + + virtual void SetUp(){}; + + virtual void TearDown(){}; + + template + static void sketch_eye(uint32_t seed, int64_t m, int64_t d, bool preallocate, Layout layout) { + DenseDist D(d, m); + DenseSkOp S0(D, seed); + if (preallocate) + RandBLAS::fill_dense(S0); + test_left_submat_sketch_of_eye(1.0, S0, d, m, 0, 0, layout, 0.0); + } + + template + static void transpose_S(uint32_t seed, int64_t m, int64_t d, Layout layout) { + DenseDist Dt(m, d); + DenseSkOp S0(Dt, seed); + RandBLAS::fill_dense(S0); + test_left_transposed_sketch_of_eye(S0, layout); + } + + template + static void submatrix_S( + uint32_t seed, + int64_t d, // rows in sketch + int64_t m, // size of identity matrix + int64_t d0, // rows in S0 + int64_t m0, // cols in S0 + int64_t S_ro, // row offset for S in S0 + int64_t S_co, // column offset for S in S0 + Layout layout + ) { + randblas_require(d0 > d); + randblas_require(m0 > m); + DenseDist D(d0, m0); + DenseSkOp S0(D, seed); + test_left_submat_sketch_of_eye(1.0, S0, d, m, S_ro, S_co, layout, 0.0); + } + +}; + +//////////////////////////////////////////////////////////////////////// +// +// +// Basic sketching (vary preallocation, row vs col major) +// +// 
+//////////////////////////////////////////////////////////////////////// + +TEST_F(TestLSKSP3, sketch_eye_double_preallocate_colmajor) { + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, true, blas::Layout::ColMajor); +} + +TEST_F(TestLSKSP3, sketch_eye_double_preallocate_rowmajor) { + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, true, blas::Layout::RowMajor); +} + +TEST_F(TestLSKSP3, sketch_eye_double_null_colmajor) { + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, false, blas::Layout::ColMajor); +} + +TEST_F(TestLSKSP3, sketch_eye_double_null_rowmajor) { + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, false, blas::Layout::RowMajor); +} + +TEST_F(TestLSKSP3, sketch_eye_single_preallocate) { + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, true, blas::Layout::ColMajor); +} + +TEST_F(TestLSKSP3, sketch_eye_single_null) { + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, false, blas::Layout::ColMajor); +} + +//////////////////////////////////////////////////////////////////////// +// +// +// Lifting +// +// +//////////////////////////////////////////////////////////////////////// + +TEST_F(TestLSKSP3, lift_eye_double_preallocate_colmajor) { + for (uint32_t seed : {0}) + sketch_eye(seed, 10, 51, true, blas::Layout::ColMajor); +} + +TEST_F(TestLSKSP3, lift_eye_double_preallocate_rowmajor) { + for (uint32_t seed : {0}) + sketch_eye(seed, 10, 51, true, blas::Layout::RowMajor); +} + +TEST_F(TestLSKSP3, lift_eye_double_null_colmajor) { + for (uint32_t seed : {0}) + sketch_eye(seed, 10, 51, false, blas::Layout::ColMajor); +} + +TEST_F(TestLSKSP3, lift_eye_double_null_rowmajor) { + for (uint32_t seed : {0}) + sketch_eye(seed, 10, 51, false, blas::Layout::RowMajor); +} + +//////////////////////////////////////////////////////////////////////// +// +// +// transpose of S +// +// +//////////////////////////////////////////////////////////////////////// + +TEST_F(TestLSKSP3, transpose_double_colmajor) { + for (uint32_t seed : {0}) + 
transpose_S(seed, 200, 30, blas::Layout::ColMajor); +} + +TEST_F(TestLSKSP3, transpose_double_rowmajor) { + for (uint32_t seed : {0}) + transpose_S(seed, 200, 30, blas::Layout::RowMajor); +} + +TEST_F(TestLSKSP3, transpose_single) { + for (uint32_t seed : {0}) + transpose_S(seed, 200, 30, blas::Layout::ColMajor); +} + +//////////////////////////////////////////////////////////////////////// +// +// +// Submatrices of S +// +// +//////////////////////////////////////////////////////////////////////// + +TEST_F(TestLSKSP3, submatrix_s_double_colmajor) { + for (uint32_t seed : {0}) + submatrix_S(seed, + 3, 10, // (rows, cols) in S. + 8, 12, // (rows, cols) in S0. + 3, // The first row of S is in the fourth row of S0 + 1, // The first col of S is in the second col of S0 + blas::Layout::ColMajor + ); +} + +TEST_F(TestLSKSP3, submatrix_s_double_rowmajor) { + for (uint32_t seed : {0}) + submatrix_S(seed, + 3, 10, // (rows, cols) in S. + 8, 12, // (rows, cols) in S0. + 3, // The first row of S is in the fourth row of S0 + 1, // The first col of S is in the second col of S0 + blas::Layout::RowMajor + ); +} + +TEST_F(TestLSKSP3, submatrix_s_single) { + for (uint32_t seed : {0}) + submatrix_S(seed, + 3, 10, // (rows, cols) in S. + 8, 12, // (rows, cols) in S0. 
+ 3, // The first row of S is in the fourth row of S0 + 1, // The first col of S is in the second col of S0 + blas::Layout::ColMajor + ); +} + + +class TestRSKSP3 : public ::testing::Test +{ + protected: + + virtual void SetUp(){}; + + virtual void TearDown(){}; + + template + static void sketch_eye(uint32_t seed, int64_t m, int64_t d, bool preallocate, Layout layout) { + DenseDist D(m, d); + DenseSkOp S0(D, seed); + if (preallocate) + RandBLAS::fill_dense(S0); + test_right_submat_sketch_of_eye(1.0, S0, m, d, 0, 0, layout, 0.0); + } + + template + static void transpose_S(uint32_t seed, int64_t m, int64_t d, Layout layout) { + DenseDist Dt(d, m); + DenseSkOp S0(Dt, seed); + test_right_transposed_sketch_of_eye(S0, layout); + } + + template + static void submatrix_S( + uint32_t seed, + int64_t d, // columns in sketch + int64_t m, // size of identity matrix + int64_t d0, // cols in S0 + int64_t m0, // rows in S0 + int64_t S_ro, // row offset for S in S0 + int64_t S_co, // column offset for S in S0 + Layout layout + ) { + DenseDist D(m0, d0); + DenseSkOp S0(D, seed); + test_right_submat_sketch_of_eye(1.0, S0, m, d, S_ro, S_co, layout, 0.0); + } + +}; + + +//////////////////////////////////////////////////////////////////////// +// +// +// RSKSP3: Basic sketching (vary preallocation, row vs col major) +// +// +//////////////////////////////////////////////////////////////////////// + +TEST_F(TestRSKSP3, right_sketch_eye_double_preallocate_colmajor) +{ + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, true, Layout::ColMajor); +} + +TEST_F(TestRSKSP3, right_sketch_eye_double_preallocate_rowmajor) +{ + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, true, Layout::RowMajor); +} + +TEST_F(TestRSKSP3, right_sketch_eye_double_null_colmajor) +{ + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, false, Layout::ColMajor); +} + +TEST_F(TestRSKSP3, right_sketch_eye_double_null_rowmajor) +{ + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, false, 
Layout::RowMajor); +} + +TEST_F(TestRSKSP3, right_sketch_eye_single_preallocate) +{ + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, true, Layout::ColMajor); +} + +TEST_F(TestRSKSP3, right_sketch_eye_single_null) +{ + for (uint32_t seed : {0}) + sketch_eye(seed, 200, 30, false, Layout::ColMajor); +} + + +//////////////////////////////////////////////////////////////////////// +// +// +// RSKSP3: Lifting +// +// +//////////////////////////////////////////////////////////////////////// + +TEST_F(TestRSKSP3, right_lift_eye_double_preallocate_colmajor) +{ + for (uint32_t seed : {0}) + sketch_eye(seed, 10, 51, true, Layout::ColMajor); +} + +TEST_F(TestRSKSP3, right_lift_eye_double_preallocate_rowmajor) +{ + for (uint32_t seed : {0}) + sketch_eye(seed, 10, 51, true, Layout::RowMajor); +} + +TEST_F(TestRSKSP3, right_lift_eye_double_null_colmajor) +{ + for (uint32_t seed : {0}) + sketch_eye(seed, 10, 51, false, Layout::ColMajor); +} + +TEST_F(TestRSKSP3, right_lift_eye_double_null_rowmajor) +{ + for (uint32_t seed : {0}) + sketch_eye(seed, 10, 51, false, Layout::RowMajor); +} + + +//////////////////////////////////////////////////////////////////////// +// +// +// RSKSP3: transpose of S +// +// +//////////////////////////////////////////////////////////////////////// + +TEST_F(TestRSKSP3, transpose_double_colmajor) +{ + for (uint32_t seed : {0}) + transpose_S(seed, 200, 30, Layout::ColMajor); +} + +TEST_F(TestRSKSP3, transpose_double_rowmajor) +{ + for (uint32_t seed : {0}) + transpose_S(seed, 200, 30, Layout::RowMajor); +} + +TEST_F(TestRSKSP3, transpose_single) +{ + for (uint32_t seed : {0}) + transpose_S(seed, 200, 30, Layout::ColMajor); +} + +//////////////////////////////////////////////////////////////////////// +// +// +// RSKSP3: Submatrices of S +// +// +//////////////////////////////////////////////////////////////////////// + +TEST_F(TestRSKSP3, submatrix_s_double_colmajor) +{ + for (uint32_t seed : {0}) + submatrix_S(seed, + 3, 10, // (cols, rows) in S. 
+ 8, 12, // (cols, rows) in S0. + 2, // The first row of S is in the third row of S0 + 1, // The first col of S is in the second col of S0 + Layout::ColMajor + ); +} + +TEST_F(TestRSKSP3, submatrix_s_double_rowmajor) +{ + for (uint32_t seed : {0}) + submatrix_S(seed, + 3, 10, // (cols, rows) in S. + 8, 12, // (cols, rows) in S0. + 2, // The first row of S is in the third row of S0 + 1, // The first col of S is in the second col of S0 + Layout::RowMajor + ); +} + +TEST_F(TestRSKSP3, submatrix_s_single) +{ + for (uint32_t seed : {0}) + submatrix_S(seed, + 3, 10, // (cols, rows) in S. + 8, 12, // (cols, rows) in S0. + 2, // The first row of S is in the third row of S0 + 1, // The first col of S is in the second col of S0 + Layout::ColMajor + ); +} diff --git a/test/test_matmul_wrappers/test_sketch_symmetric.cc b/test/test_matmul_wrappers/test_sketch_symmetric.cc index 47594f2a..7807b92c 100644 --- a/test/test_matmul_wrappers/test_sketch_symmetric.cc +++ b/test/test_matmul_wrappers/test_sketch_symmetric.cc @@ -36,11 +36,11 @@ using blas::Layout; using blas::Uplo; -using RandBLAS::DenseDistName; +using RandBLAS::ScalarDist; using RandBLAS::DenseDist; using RandBLAS::DenseSkOp; using RandBLAS::RNGState; -using RandBLAS::MajorAxis; +using RandBLAS::Axis; #include "test/comparison.hh" @@ -53,8 +53,8 @@ void random_symmetric_mat(int64_t n, T* A, int64_t lda, STATE s) { // This function can be interpreted as first generating a random lda-by-lda symmetric matrix // whose entries in the upper triangle are iid, then symmetrizing that matrix, then // zeroing out all entries outside the leading principal submatrix of order n. 
- RandBLAS::fill_dense(Layout::ColMajor, {lda, lda}, n, n, 0, 0, A, s); - RandBLAS::util::symmetrize(Layout::ColMajor, Uplo::Upper, n, A, lda); + RandBLAS::fill_dense_unpacked(Layout::ColMajor, {lda, lda}, n, n, 0, 0, A, s); + RandBLAS::symmetrize(Layout::ColMajor, Uplo::Upper, n, A, lda); return; } @@ -88,12 +88,12 @@ class TestSketchSymmetric : public ::testing::Test { template static void test_same_layouts( - uint32_t seed_a, uint32_t seed_skop, MajorAxis ma, T alpha, int64_t d, int64_t n, int64_t lda, T beta, blas::Side side_skop + uint32_t seed_a, uint32_t seed_skop, Axis major_axis, T alpha, int64_t d, int64_t n, int64_t lda, T beta, blas::Side side_skop ) { auto [rows_out, cols_out] = dims_of_sketch_symmetric_output(d, n, side_skop); std::vector A(lda*lda, 0.0); random_symmetric_mat(n, A.data(), lda, RNGState(seed_a)); - DenseDist D(rows_out, cols_out, DenseDistName::Uniform, ma); + DenseDist D(rows_out, cols_out, ScalarDist::Uniform, major_axis); DenseSkOp S(D, seed_skop); RandBLAS::fill_dense(S); int64_t lds = (S.layout == Layout::RowMajor) ? 
cols_out : rows_out; @@ -117,12 +117,12 @@ class TestSketchSymmetric : public ::testing::Test { template static void test_opposing_layouts( - uint32_t seed_a, uint32_t seed_skop, MajorAxis ma, T alpha, int64_t d, int64_t n, int64_t lda, T beta, blas::Side side_skop + uint32_t seed_a, uint32_t seed_skop, Axis major_axis, T alpha, int64_t d, int64_t n, int64_t lda, T beta, blas::Side side_skop ) { auto [rows_out, cols_out] = dims_of_sketch_symmetric_output(d, n, side_skop); std::vector A(lda*lda, 0.0); random_symmetric_mat(n, A.data(), lda, RNGState(seed_a)); - DenseDist D(rows_out, cols_out, DenseDistName::Uniform, ma); + DenseDist D(rows_out, cols_out, ScalarDist::Uniform, major_axis); DenseSkOp S(D, seed_skop); RandBLAS::fill_dense(S); int64_t lds_init, ldb; @@ -160,94 +160,94 @@ class TestSketchSymmetric : public ::testing::Test { TEST_F(TestSketchSymmetric, left_sketch_10_to_3_same_layouts) { // LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0 - test_same_layouts( 0, 1, MajorAxis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Left); - test_same_layouts( 0, 1, MajorAxis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Left); - test_same_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Left); - test_same_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Left); + test_same_layouts( 0, 1, Axis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Left); + test_same_layouts( 0, 1, Axis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Left); + test_same_layouts(31, 33, Axis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Left); + test_same_layouts(31, 33, Axis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Left); // LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0 - test_same_layouts(0, 1, MajorAxis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Left); - test_same_layouts(0, 1, MajorAxis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Left); - test_same_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Left); - test_same_layouts(31, 33, MajorAxis::Long, 
0.5, 3, 10, 19, 0.0, blas::Side::Left);
+ test_same_layouts(0, 1, Axis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
+ test_same_layouts(0, 1, Axis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_same_layouts( 0, 1, MajorAxis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
- test_same_layouts( 0, 1, MajorAxis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
+ test_same_layouts( 0, 1, Axis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
+ test_same_layouts( 0, 1, Axis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_same_layouts(0, 1, MajorAxis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
- test_same_layouts(0, 1, MajorAxis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
+ test_same_layouts(0, 1, Axis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
+ test_same_layouts(0, 1, Axis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
}

TEST_F(TestSketchSymmetric, left_lift_same_layouts) {
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_same_layouts( 0, 1, MajorAxis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
- test_same_layouts( 0, 1, MajorAxis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
+ test_same_layouts( 0, 1, Axis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
+ test_same_layouts( 0, 1, Axis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_same_layouts(0, 1, MajorAxis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
- test_same_layouts(0, 1, MajorAxis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
+ test_same_layouts(0, 1, Axis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
+ test_same_layouts(0, 1, Axis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_same_layouts( 0, 1, MajorAxis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
- test_same_layouts( 0, 1, MajorAxis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
+ test_same_layouts( 0, 1, Axis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
+ test_same_layouts( 0, 1, Axis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_same_layouts(0, 1, MajorAxis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
- test_same_layouts(0, 1, MajorAxis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
+ test_same_layouts(0, 1, Axis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
+ test_same_layouts(0, 1, Axis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
}

TEST_F(TestSketchSymmetric, right_sketch_10_to_3_same_layouts) {
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_same_layouts( 0, 1, MajorAxis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
- test_same_layouts( 0, 1, MajorAxis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
+ test_same_layouts( 0, 1, Axis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
+ test_same_layouts( 0, 1, Axis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_same_layouts(0, 1, MajorAxis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
- test_same_layouts(0, 1, MajorAxis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
+ test_same_layouts(0, 1, Axis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
+ test_same_layouts(0, 1, Axis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_same_layouts( 0, 1, MajorAxis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
- test_same_layouts( 0, 1, MajorAxis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
+ test_same_layouts( 0, 1, Axis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
+ test_same_layouts( 0, 1, Axis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_same_layouts(0, 1, MajorAxis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
- test_same_layouts(0, 1, MajorAxis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
+ test_same_layouts(0, 1, Axis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
+ test_same_layouts(0, 1, Axis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
}

TEST_F(TestSketchSymmetric, right_lift_same_layouts) {
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_same_layouts( 0, 1, MajorAxis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
- test_same_layouts( 0, 1, MajorAxis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
+ test_same_layouts( 0, 1, Axis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
+ test_same_layouts( 0, 1, Axis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_same_layouts(0, 1, MajorAxis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
- test_same_layouts(0, 1, MajorAxis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
+ test_same_layouts(0, 1, Axis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
+ test_same_layouts(0, 1, Axis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_same_layouts( 0, 1, MajorAxis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
- test_same_layouts( 0, 1, MajorAxis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
+ test_same_layouts( 0, 1, Axis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
+ test_same_layouts( 0, 1, Axis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_same_layouts(0, 1, MajorAxis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
- test_same_layouts(0, 1, MajorAxis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
- test_same_layouts(31, 33, MajorAxis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
+ test_same_layouts(0, 1, Axis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
+ test_same_layouts(0, 1, Axis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
+ test_same_layouts(31, 33, Axis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
}
@@ -255,93 +255,93 @@ TEST_F(TestSketchSymmetric, right_lift_same_layouts) {

TEST_F(TestSketchSymmetric, left_sketch_10_to_3_opposing_layouts) {
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_opposing_layouts( 0, 1, MajorAxis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Left);
- test_opposing_layouts( 0, 1, MajorAxis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Left);
+ test_opposing_layouts( 0, 1, Axis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Left);
+ test_opposing_layouts( 0, 1, Axis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Left);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_opposing_layouts(0, 1, MajorAxis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
- test_opposing_layouts(0, 1, MajorAxis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
+ test_opposing_layouts(0, 1, Axis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
+ test_opposing_layouts(0, 1, Axis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Left);
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_opposing_layouts( 0, 1, MajorAxis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
- test_opposing_layouts( 0, 1, MajorAxis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
+ test_opposing_layouts( 0, 1, Axis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
+ test_opposing_layouts( 0, 1, Axis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Left);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_opposing_layouts(0, 1, MajorAxis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
- test_opposing_layouts(0, 1, MajorAxis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
+ test_opposing_layouts(0, 1, Axis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
+ test_opposing_layouts(0, 1, Axis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Left);
}

TEST_F(TestSketchSymmetric, left_lift_opposing_layouts) {
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_opposing_layouts( 0, 1, MajorAxis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
- test_opposing_layouts( 0, 1, MajorAxis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
+ test_opposing_layouts( 0, 1, Axis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
+ test_opposing_layouts( 0, 1, Axis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Left);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_opposing_layouts(0, 1, MajorAxis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
- test_opposing_layouts(0, 1, MajorAxis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
+ test_opposing_layouts(0, 1, Axis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
+ test_opposing_layouts(0, 1, Axis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Left);
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_opposing_layouts( 0, 1, MajorAxis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
- test_opposing_layouts( 0, 1, MajorAxis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
+ test_opposing_layouts( 0, 1, Axis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
+ test_opposing_layouts( 0, 1, Axis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Left);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_opposing_layouts(0, 1, MajorAxis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
- test_opposing_layouts(0, 1, MajorAxis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
+ test_opposing_layouts(0, 1, Axis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
+ test_opposing_layouts(0, 1, Axis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Left);
}

TEST_F(TestSketchSymmetric, right_sketch_10_to_3_opposing_layouts) {
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_opposing_layouts( 0, 1, MajorAxis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
- test_opposing_layouts( 0, 1, MajorAxis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
+ test_opposing_layouts( 0, 1, Axis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
+ test_opposing_layouts( 0, 1, Axis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 3, 10, 10, 0.0, blas::Side::Right);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_opposing_layouts(0, 1, MajorAxis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
- test_opposing_layouts(0, 1, MajorAxis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
+ test_opposing_layouts(0, 1, Axis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
+ test_opposing_layouts(0, 1, Axis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 3, 10, 19, 0.0, blas::Side::Right);
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_opposing_layouts( 0, 1, MajorAxis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
- test_opposing_layouts( 0, 1, MajorAxis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
+ test_opposing_layouts( 0, 1, Axis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
+ test_opposing_layouts( 0, 1, Axis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 3, 10, 10, -1.0, blas::Side::Right);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_opposing_layouts(0, 1, MajorAxis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
- test_opposing_layouts(0, 1, MajorAxis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
+ test_opposing_layouts(0, 1, Axis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
+ test_opposing_layouts(0, 1, Axis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 3, 10, 19, -1.0, blas::Side::Right);
}

TEST_F(TestSketchSymmetric, right_lift_opposing_layouts) {
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_opposing_layouts( 0, 1, MajorAxis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
- test_opposing_layouts( 0, 1, MajorAxis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
+ test_opposing_layouts( 0, 1, Axis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
+ test_opposing_layouts( 0, 1, Axis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 13, 10, 10, 0.0, blas::Side::Right);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = 0.0
- test_opposing_layouts(0, 1, MajorAxis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
- test_opposing_layouts(0, 1, MajorAxis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
+ test_opposing_layouts(0, 1, Axis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
+ test_opposing_layouts(0, 1, Axis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 50, 10, 19, 0.0, blas::Side::Right);
// LDA=10, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_opposing_layouts( 0, 1, MajorAxis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
- test_opposing_layouts( 0, 1, MajorAxis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
+ test_opposing_layouts( 0, 1, Axis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
+ test_opposing_layouts( 0, 1, Axis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 13, 10, 10, -1.0, blas::Side::Right);
// LDA=19, (seed_a, seed_skop) = (0, 1) then (31, 33), beta = -1.0
- test_opposing_layouts(0, 1, MajorAxis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
- test_opposing_layouts(0, 1, MajorAxis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
- test_opposing_layouts(31, 33, MajorAxis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
+ test_opposing_layouts(0, 1, Axis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
+ test_opposing_layouts(0, 1, Axis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Short, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
+ test_opposing_layouts(31, 33, Axis::Long, 0.5, 50, 10, 19, -1.0, blas::Side::Right);
}
diff --git a/test/test_matmul_wrappers/test_sketch_vector.cc b/test/test_matmul_wrappers/test_sketch_vector.cc
index 279c4a2f..abbd0630 100644
--- a/test/test_matmul_wrappers/test_sketch_vector.cc
+++ b/test/test_matmul_wrappers/test_sketch_vector.cc
@@ -70,8 +70,8 @@ class TestSketchVector : public ::testing::Test
RandBLAS::fill_dense(S);
int64_t lds = (S.layout == blas::Layout::RowMajor) ? m : d;
- RandBLAS::sketch_vector(blas::Op::NoTrans, d, m, 1.0, S, 0, 0, x, incx, 0.0, y_actual, incy);
- blas::gemv(S.layout, blas::Op::NoTrans, d, m, 1.0, S.buff, lds, x, incx, 0.0, y_expect, incy);
+ RandBLAS::sketch_vector(blas::Op::NoTrans, d, m, (T)1.0, S, 0, 0, x, incx, (T)0.0, y_actual, incy);
+ blas::gemv(S.layout, blas::Op::NoTrans, d, m, (T)1.0, S.buff, lds, x, incx, (T)0.0, y_expect, incy);
test::comparison::buffs_approx_equal(d, y_actual, incy, y_expect, incy,
__PRETTY_FUNCTION__, __FILE__, __LINE__
@@ -101,8 +101,8 @@ class TestSketchVector : public ::testing::Test
int64_t lds = (S.layout == blas::Layout::RowMajor) ? d : m;
// Perform tall sketch with Op::Trans
- RandBLAS::sketch_vector(blas::Op::Trans, m, d, 1.0, S, 0, 0, x, incx, 0, y_actual, incy);
- blas::gemv(S.layout, blas::Op::Trans, m, d, 1.0, S.buff, lds, x, incx, 0, y_expect, incy);
+ RandBLAS::sketch_vector(blas::Op::Trans, m, d, (T)1.0, S, 0, 0, x, incx, (T)0.0, y_actual, incy);
+ blas::gemv(S.layout, blas::Op::Trans, m, d, (T)1.0, S.buff, lds, x, incx, (T)0.0, y_expect, incy);
// Compare entrywise results of sketching with sketch_vector and using gemv
test::comparison::buffs_approx_equal(d, y_actual, incy, y_expect, incy,
@@ -136,8 +136,8 @@ class TestSketchVector : public ::testing::Test
RandBLAS::fill_dense(S_tall);
// Perform wide sketch with Op::NoTrans and tall sketch with Op::Trans. Should be the same operation
- RandBLAS::sketch_vector(blas::Op::NoTrans, d, m, 1.0, S_wide, 0, 0, x, incx, 0.0, y_wide, incy);
- RandBLAS::sketch_vector(blas::Op::Trans, m, d, 1.0, S_tall, 0, 0, x, incx, 0.0, y_tall, incy);
+ RandBLAS::sketch_vector(blas::Op::NoTrans, d, m, (T)1.0, S_wide, 0, 0, x, incx, (T)0.0, y_wide, incy);
+ RandBLAS::sketch_vector(blas::Op::Trans, m, d, (T)1.0, S_tall, 0, 0, x, incx, (T)0.0, y_tall, incy);
test::comparison::buffs_approx_equal(d, y_wide, incy, y_tall, incy,
__PRETTY_FUNCTION__, __FILE__, __LINE__
@@ -168,8 +168,8 @@ class TestSketchVector : public ::testing::Test
int64_t lds = (S.layout == blas::Layout::RowMajor) ? m : d;
// Perform tall sketch
- RandBLAS::sketch_vector(blas::Op::NoTrans, d, m, 1, S, 0, 0, x, incx, 0, y_actual, incy);
- blas::gemv(S.layout, blas::Op::NoTrans, d, m, 1, S.buff, lds, x, incx, 0, y_expect, incy);
+ RandBLAS::sketch_vector(blas::Op::NoTrans, d, m, (T)1, S, 0, 0, x, incx, (T)0, y_actual, incy);
+ blas::gemv(S.layout, blas::Op::NoTrans, d, m, (T)1, S.buff, lds, x, incx, (T)0, y_expect, incy);
// Compare entrywise results of sketching with sketch_vector and using gemv
test::comparison::buffs_approx_equal(d, y_actual, incy, y_expect, incy,