set next_state on construction, web documentation, more stat tests an…

…d testing infrastructre, API changes for DenseSkOp and SparseSkOp, more C++20 concepts (#104) This PR is huge. I'm going to merge it even though there are outstanding documentation TODOs, as indicated in Issue #107. ## Changes in RandBLAS proper Major new identifiers * SketchingDistribution and SketchingOperator C++20 concepts. * isometry_scale_factor (overloaded for DenseSkOp and SparseSkOp), returns the scale that should canonically be applied to the full sketching operator assuming its entries are sampled in RandBLAS' standard way (uniform $\sqrt{3}[-1,1]$ or Gaussian(0, 1) for dense, or length-vec_nnz vectors of iid Rademachers for sparse). * dense::compute_next_state and sparse::compute_next_state, for computing the appropriate value of ``S.next_state`` in O(1) time. This enables defining sequences of sketching operators with the same key without having to explicitly generate the operators. Minor new identifiers * For dense sketching operators, submatrix_as_blackbox accepts a sketching operator, row and column counts, and row and column offsets, and returns a DenseSkOp with a BlackBox distribution whose entries match those of the appropriate submatrix of the given sketching operator. * safe_int_product (in RandBLAS/base.hh) checks for overflows. This is useful when computing potentially-large index offsets for implicitly defined matrices. * util::omatcopy for general stride changes * util::overwrite_triangle, for preparing a triangular matrix to be treated as a general matrix in gemm or trmm. * A new value MajorAxis::Unknown in the MajorAxis enum, for use with BlackBox distribution families. * An overload of ``repeated_fisher_yates`` that skips writes to ``vals`` and ``idxs_minor`` and that has more intuitive argument names. Other changes * The default scalar distribution for DenseSkOp objects is now Gaussian instead of uniform. * The default scaling of uniform sketching operators is $[-\sqrt{3},\sqrt{3}]$, since this choice provides for unit variance. * rewrite of dense::fill_dense_submat_impl, needed in order to efficiently implement dense::compute_next_state. * In SparseSkOp, the typename ``RNG_t`` has been removed, a typename ``state_t`` has been added (equal to ``RNGState<RNG>``) and the typename ``T_t`` typename has been renamed to ``scalar_t``. Similar changes have been made to ``DenseSkOp`` for consistency across the two APIs. * Changed the verbose overload of``fill_dense`` for writing an implicit submatrix to a given buffer. This function now requires a layout parameter as its first argument. It performs an out-of-place change of layout if the requested layout is different from the natural layout implied by the distribution object. * Moved several function definitions for DenseSkOp and SparseSkOp into the struct definitions, rather than only declaring signatures in the struct definition and implementing the functions outside the struct. ## Changes in documentation * Added ``sample_indices_iid_uniform``, ``sample_indicies_iid``, and ``weights_to_cdf`` to a new page in the API reference. * Added a section to the tutorial on updating sketches. ## Changes in tests New / significantly changed tests * test_continuous.cc: Kolmogorov-Smirnov tests for distributional correctness of uniform [-sqrt(3),sqrt(3)] and Gaussian sampling. This includes a helper function for exactly computing the KS test statistic for distributions whose CDF is continuous, which has some surprising subtleties. * test_r123.cc: a significant clean up of (and slight expansion of) test_r123_kat.cc. * test_distortion.cc: statistical tests for subspace embedding distortion of dense sketching operators. This required a bunch of new infrastructure, described below. ``handrolled_lapack.hh`` implements algorithms that would generally be considered "LAPACK-level," although the algorithms won't be found in LAPACK itself. * cholesky (sequential and blocked) * cholesky qr (with a flag to run a second pass) * QR based on block classic Gram-Schmidt, where blocks are handled by cholesky qr2. * QR with a two-pass version of blocked classic Gram-Schmidt. (Not equivalent to modified Gram-Schmidt.) * A function to check if the outer-most Gershgorin discs for a positive definite matrix have converged to within some specified relative radius. * cholesky iteration to compute all eigenvalues of a positive definite matrix (with termination criteria based only on extremal eigenvalues). This is actually prohibitively slow except when the matrix is very small. But it works, and it's tested, so might as well keep it around. * power method to compute the extremal eigenvalues of a positive definite matrix. The number of iterations is chosen based on a well-known convergence result for the power method in the absence of a spectral gap. The smallest eigenvalue is obtained by Cholesky decomposing the matrix, explicitly forming its inverse, and then running the standard power method. This leads to faster performance for matrix dimensions and tolerances that we need in statistical tests.
BallisticLA · Aug 22, 2024 · 614c1c2 · 614c1c2
1 parent 826c9bb
commit 614c1c2
Show file tree

Hide file tree

Showing 36 changed files with 2,296 additions and 1,002 deletions.
diff --git a/RandBLAS/base.hh b/RandBLAS/base.hh
@@ -118,14 +118,35 @@ template<typename T>
 concept SignedInteger = (std::numeric_limits<T>::is_signed && std::numeric_limits<T>::is_integer);
 
 
+template <SignedInteger TI, SignedInteger TO = int64_t>
+inline TO safe_int_product(TI a, TI b) {
+    if (a == 0 || b == 0) {
+        return 0;
+    }
+    TO c = a * b;
+    TO b_check = c / a;
+    TO a_check = c / b;
+    if ((a_check != a) || (b_check != b)) {
+        std::stringstream s;
+        s << "Overflow when multiplying a (=" << a << ") and b(=" << b << "), which resulted in " << c << ".\n";
+        throw std::overflow_error(s.str());
+    }
+    return c;
+}
+
+
 enum class MajorAxis : char {
     // ---------------------------------------------------------------------------
     ///  short-axis vectors (cols of a wide matrix, rows of a tall matrix)
     Short = 'S',
 
     // ---------------------------------------------------------------------------
     ///  long-axis vectors (rows of a wide matrix, cols of a tall matrix)
-    Long = 'L'
+    Long = 'L',
+
+    // ---------------------------------------------------------------------------
+    ///  Undefined (used when row-major vs column-major must be explicit)
+    Undefined = 'U'
 };
 
 
@@ -134,28 +155,38 @@ enum class MajorAxis : char {
  * the counter and the key. The arrays' types are statically sized, small
  * (typically of length 2 or 4), and can be distinct from one another.
  * 
- * @tparam RNG A CBRNG type in defined in Random123. We've found that Philox-based
- * CBRNGs work best for our purposes. Strictly speaking, we allow all Random123 CBRNGs
- * besides those based on AES.
+ * The template parameter RNG is a CBRNG type in defined in Random123. We've found
+ * that Philox-based CBRNGs work best for our purposes, but we also support Threefry-based CBRNGS.
  */
 template <typename RNG = r123::Philox4x32>
 struct RNGState
 {
     using generator = RNG;
-    // The unsigned integer type used in this RNGState's counter array.
-    using ctr_uint_type = typename RNG::ctr_type::value_type;
-    /// @brief The unsigned integer type used in this RNGState's key array.
-    ///        This is typically std::uint32_t, but it can be std::uint64_t.
-    using key_uint_type = typename RNG::key_type::value_type;
-    // An array type defined in Random123.
+
     using ctr_type = typename RNG::ctr_type;
-    // An array type defined in Random123.
+    // ^ An array type defined in Random123.
     using key_type = typename RNG::key_type;
+    // ^ An array type defined in Random123.
+    using ctr_uint = typename RNG::ctr_type::value_type;
+    // ^ The unsigned integer type used in this RNGState's counter array.
+
+    /// -------------------------------------------------------------------
+    /// @brief The unsigned integer type used in this RNGState's key array.
+    ///        This is typically std::uint32_t, but it can be std::uint64_t.
+    using key_uint = typename RNG::key_type::value_type;
+
 
     const static int len_c = RNG::ctr_type::static_size;
     const static int len_k = RNG::key_type::static_size;
-    typename RNG::ctr_type counter; ///< This RNGState's counter array.
-    typename RNG::key_type key;     ///< This RNGState's key array.
+    typename RNG::ctr_type counter;
+    // ^ This RNGState's counter array.
+
+    /// ------------------------------------------------------------------
+    /// This RNGState's key array. If you want to manually advance the key
+    /// by an integer increment of size "step," then you do so by calling 
+    /// this->key.incr(step).
+    typename RNG::key_type key;
+
 
     /// Initialize the counter and key arrays to all zeros.
     RNGState() : counter{{0}}, key(key_type{{}}) {}
@@ -171,7 +202,7 @@ struct RNGState
 
     /// Initialize the counter array to all zeros. Initialize the key array to have first
     /// element equal to k and all other elements equal to zero.
-    RNGState(key_uint_type k) : counter{{0}}, key{{k}} {}
+    RNGState(key_uint k) : counter{{0}}, key{{k}} {}
 
     ~RNGState() {};
 
@@ -187,16 +218,16 @@ template <typename RNG>
 RNGState<RNG>::RNGState(
     const RNGState<RNG> &s
 ) {
-    std::memcpy(this->counter.v, s.counter.v, this->len_c * sizeof(ctr_uint_type));
-    std::memcpy(this->key.v,     s.key.v,     this->len_k * sizeof(key_uint_type));
+    std::memcpy(this->counter.v, s.counter.v, this->len_c * sizeof(ctr_uint));
+    std::memcpy(this->key.v,     s.key.v,     this->len_k * sizeof(key_uint));
 }
 
 template <typename RNG>
 RNGState<RNG> &RNGState<RNG>::operator=(
     const RNGState &s
 ) {
-    std::memcpy(this->counter.v, s.counter.v, this->len_c * sizeof(ctr_uint_type));
-    std::memcpy(this->key.v,     s.key.v,     this->len_k * sizeof(key_uint_type));
+    std::memcpy(this->counter.v, s.counter.v, this->len_c * sizeof(ctr_uint));
+    std::memcpy(this->key.v,     s.key.v,     this->len_k * sizeof(key_uint));
     return *this;
 }
 
@@ -219,4 +250,45 @@ std::ostream &operator<<(
     return out;
 }
 
+// =============================================================================
+/// @verbatim embed:rst:leading-slashes
+///
+/// .. NOTE: \ttt expands to \texttt (its definition is given in an rst file)
+///
+/// Words. Hello!
+///
+/// @endverbatim
+template<typename SkDist>
+concept SketchingDistribution = requires(SkDist D) {
+    { D.n_rows }     -> std::convertible_to<const int64_t>;
+    { D.n_cols }     -> std::convertible_to<const int64_t>;
+    { D.major_axis } -> std::convertible_to<const MajorAxis>;
+};
+
+// =============================================================================
+/// \fn isometry_scale_factor(SkDist D) 
+/// @verbatim embed:rst:leading-slashes
+/// Words here ...
+/// @endverbatim
+template <typename T, SketchingDistribution SkDist>
+inline T isometry_scale_factor(SkDist D);
+
+
+// =============================================================================
+/// @verbatim embed:rst:leading-slashes
+///
+/// .. NOTE: \ttt expands to \texttt (its definition is given in an rst file)
+///
+/// Words. Hello!
+///
+/// @endverbatim
+template<typename SkOp>
+concept SketchingOperator = requires(SkOp S) {
+    { S.n_rows }     -> std::convertible_to<const int64_t>;
+    { S.n_cols }     -> std::convertible_to<const int64_t>;
+    { S.seed_state } -> std::convertible_to<const typename SkOp::state_t>;
+    { S.seed_state } -> std::convertible_to<const typename SkOp::state_t>;
+};
+
+
 } // end namespace RandBLAS::base