Skip to content

Commit

Permalink
Release of RandBLAS 1.0.1: bugfixes for sampling from index sets (#121)
Browse files Browse the repository at this point in the history
Changes since 1.0.0:
 * Resolved #118.
* Resolved minor bugs in values of RNGState objects returned from
sample_indices_iid and sample_indices_iid_uniform.
* Changed certain template defaults from ``r123::Philox4x32`` to
``DefaultRNG``.
* Replaced the RNGState constructor which previously accepted
``RNG::key_type::value_type`` to now accept``std::uint64_t``. This has
the benefit of allowing any initial key for RandBLAS' default RNG, as
opposed to only allowing keys of the form ``{k, 0}`` for some uint32
``k``.
 * Added ``sqrt_epsilon<T>()`` to the web documentation.
* Clarify that any functions which appear in the web documentation are
part of our public API, for purposes of semantic versioning.
  • Loading branch information
rileyjmurray authored Sep 29, 2024
1 parent 5ca3b3e commit 1489103
Show file tree
Hide file tree
Showing 10 changed files with 310 additions and 97 deletions.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,9 @@

## Local

RandBLAS/config.h
temp_vscode

.vscode/*
.idea/*.xml

Expand Down
96 changes: 65 additions & 31 deletions RandBLAS/base.hh
Original file line number Diff line number Diff line change
Expand Up @@ -35,9 +35,7 @@
#include "RandBLAS/random_gen.hh"

#include <blas.hh>
#include <tuple>
#include <utility>
#include <type_traits>
#include <cstring>
#include <cstdint>
#include <iostream>
Expand All @@ -47,22 +45,22 @@
#endif

#include<iostream>
#include<numeric>


/// code common across the project
namespace RandBLAS {

typedef r123::Philox4x32 DefaultRNG;
using std::uint64_t;

/** A representation of the state of a counter-based random number generator
* (CBRNG) defined in Random123. The representation consists of two arrays:
* the counter and the key. The arrays' types are statically sized, small
* (typically of length 2 or 4), and can be distinct from one another.
*
* The template parameter RNG is a CBRNG type in defined in Random123. We've found
* that Philox-based CBRNGs work best for our purposes, but we also support Threefry-based CBRNGS.
*/
/// -------------------------------------------------------------------
/// This is a stateful version of a
/// *counter-based random number generator* (CBRNG) from Random123.
/// It packages a CBRNG together with two arrays, called "counter" and "key,"
/// which are interpreted as extended-width unsigned integers.
///
/// RNGStates are used in every RandBLAS function that involves random sampling.
///
template <typename RNG = DefaultRNG>
struct RNGState {

Expand All @@ -78,46 +76,63 @@ struct RNGState {
// ^ An array type defined in Random123.
using ctr_uint = typename RNG::ctr_type::value_type;
// ^ The unsigned integer type used in this RNGState's counter array.

/// -------------------------------------------------------------------
/// The unsigned integer type used in this RNGState's key array.
/// This is typically std::uint32_t, but it can be std::uint64_t.
using key_uint = typename RNG::key_type::value_type;
// ^ The unsigned integer type used in this RNGState's key array.

const static int len_c = RNG::ctr_type::static_size;
static_assert(len_c >= 2);
const static int len_k = RNG::key_type::static_size;

/// ------------------------------------------------------------------
/// This is a Random123-defined statically-sized array of unsigned integers.
/// With RandBLAS' default, it contains four 32-bit unsigned ints
/// and is interpreted as one 128-bit unsigned int.
///
/// This member specifies a "location" in the random stream
/// defined by RNGState::generator and RNGState::key.
/// Random sampling functions in RandBLAS effectively consume elements
/// of the random stream starting from this location.
///
/// **RandBLAS functions do not mutate input RNGStates.** Free-functions
/// return new RNGStates with suitably updated counters. Constructors
/// for SketchingOperator objects store updated RNGStates in the
/// object's next_state member.
typename RNG::ctr_type counter;
// ^ This RNGState's counter array. Exclude from doxygen comments
// since end-users shouldn't touch it.

/// ------------------------------------------------------------------
/// This RNGState's key array. If you want to manually advance the key
/// by an integer increment of size "step," then you do so by calling
/// this->key.incr(step).
/// This is a Random123-defined statically-sized array of unsigned integers.
/// With RandBLAS' default, it contains two 32-bit unsigned ints
/// and is interpreted as one 64-bit unsigned int.
///
/// This member specifices a sequece of pseudo-random numbers
/// that RNGState::generator can produce. Any fixed sequence has
/// fairly large period (\math{2^{132},} with RandBLAS' default) and
/// is statistically independent from sequences induced by different keys.
///
/// To increment the key by "step," call \math{\ttt{key.incr(step)}}.
typename RNG::key_type key;

const static int len_c = RNG::ctr_type::static_size;
static_assert(len_c >= 2);
const static int len_k = RNG::key_type::static_size;

/// Initialize the counter and key to zero.
RNGState() : counter{}, key{} {}

/// Initialize the counter and key arrays to all zeros.
RNGState() : counter{{0}}, key(key_type{{}}) {}
/// Initialize the counter and key to zero, then increment the key by k.
RNGState(uint64_t k) : counter{}, key{} { key.incr(k); }

// construct from a key
RNGState(key_type const &k) : counter{{0}}, key(k) {}
RNGState(key_type const &k) : counter{}, key(k) {}

// Initialize counter and key arrays at the given values.
RNGState(ctr_type const &c, key_type const &k) : counter(c), key(k) {}

// move construct from an initial counter and key
RNGState(ctr_type &&c, key_type &&k) : counter(std::move(c)), key(std::move(k)) {}

/// Initialize the counter array to all zeros. Initialize the key array to have first
/// element equal to k and all other elements equal to zero.
RNGState(key_uint k) : counter{{0}}, key{{k}} {}
// move constructor.
RNGState(RNGState<RNG> &&s) : RNGState(std::move(s.counter), std::move(s.key)) {};

~RNGState() {};

/// A copy constructor.
/// Copy constructor.
RNGState(const RNGState<RNG> &s) : RNGState(s.counter, s.key) {};

// A copy-assignment operator.
Expand All @@ -127,6 +142,25 @@ struct RNGState {
return *this;
};

//
// Comparators (for now, these are just for testing and debugging)
//

bool operator==(const RNGState<RNG> &s) const {
// the compiler should only allow comparisons between RNGStates of the same type.
for (int i = 0; i < len_c; ++i) {
if (counter.v[i] != s.counter.v[i]) { return false; }
}
for (int i = 0; i < len_k; ++i) {
if (key.v[i] != s.key.v[i]) { return false; }
}
return true;
};

bool operator!=(const RNGState<RNG> &s) const {
return !(this == s);
};

};

template <typename RNG>
Expand Down
16 changes: 7 additions & 9 deletions RandBLAS/dense_skops.hh
Original file line number Diff line number Diff line change
Expand Up @@ -204,17 +204,15 @@ inline blas::Layout natural_layout(Axis major_axis, int64_t n_rows, int64_t n_co
namespace RandBLAS {

// =============================================================================
/// We support two distributions for dense sketching operators: those whose
/// entries are iid Gaussians or iid uniform over a symmetric interval.
///
/// Names for mean-zero variance-one distributions on the reals.
enum class ScalarDist : char {
// ---------------------------------------------------------------------------
/// Indicates the Gaussian distribution with mean 0 and variance 1.
/// The Gaussian distribution with mean 0 and variance 1.
Gaussian = 'G',

// ---------------------------------------------------------------------------
/// Indicates the uniform distribution over [-r, r] where r := sqrt(3)
/// is the radius that provides for a variance of 1.
/// The uniform distribution over \math{[-r, r],} where
/// \math{r := \sqrt{3}} provides for a variance of 1.
Uniform = 'U'
};

Expand Down Expand Up @@ -333,7 +331,7 @@ static_assert(SketchingDistribution<DenseDist>);
/// A sample from a distribution over matrices whose entries are iid
/// mean-zero variance-one random variables.
/// This type conforms to the SketchingOperator concept.
template <typename T, typename RNG = r123::Philox4x32>
template <typename T, typename RNG = DefaultRNG>
struct DenseSkOp {

// ---------------------------------------------------------------------------
Expand Down Expand Up @@ -537,7 +535,7 @@ static_assert(SketchingOperator<DenseSkOp<double>>);
/// - Used to define :math:`\mtxS` as a sample from :math:`\D.`
///
/// @endverbatim
template<typename T, typename RNG = r123::Philox4x32>
template<typename T, typename RNG = DefaultRNG>
RNGState<RNG> fill_dense_unpacked(blas::Layout layout, const DenseDist &D, int64_t n_rows, int64_t n_cols, int64_t ro_s, int64_t co_s, T* buff, const RNGState<RNG> &seed) {
using RandBLAS::dense::fill_dense_submat_impl;
randblas_require(D.n_rows >= n_rows + ro_s);
Expand Down Expand Up @@ -597,7 +595,7 @@ RNGState<RNG> fill_dense_unpacked(blas::Layout layout, const DenseDist &D, int64
/// A CBRNG state
/// - Used to define \math{\mat(\buff)} as a sample from \math{\D}.
///
template <typename T, typename RNG = r123::Philox4x32>
template <typename T, typename RNG = DefaultRNG>
RNGState<RNG> fill_dense(const DenseDist &D, T *buff, const RNGState<RNG> &seed) {
return fill_dense_unpacked(D.natural_layout, D, D.n_rows, D.n_cols, 0, 0, buff, seed);
}
Expand Down
12 changes: 9 additions & 3 deletions RandBLAS/sparse_skops.hh
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,7 @@ static state_t repeated_fisher_yates(
vec_work[ell] = swap;
}
}
ctr.incr(dim_minor * vec_nnz);
return state_t {ctr, key};
}

Expand Down Expand Up @@ -239,7 +240,7 @@ inline state_t repeated_fisher_yates(
return sparse::repeated_fisher_yates(state, k, n, r, samples, (sint_t*) nullptr, (double*) nullptr);
}

template <typename RNG>
template <typename RNG = DefaultRNG>
RNGState<RNG> compute_next_state(SparseDist dist, RNGState<RNG> state) {
int64_t num_mavec, incrs_per_mavec;
if (dist.major_axis == Axis::Short) {
Expand All @@ -249,7 +250,7 @@ RNGState<RNG> compute_next_state(SparseDist dist, RNGState<RNG> state) {
// See repeated_fisher_yates.
} else {
num_mavec = std::min(dist.n_rows, dist.n_cols);
incrs_per_mavec = dist.vec_nnz * ((int64_t) state.len_c/2);
incrs_per_mavec = (int64_t) std::ceil((double) dist.vec_nnz / ((double) state.len_c/2));
// ^ LASOs do try to be frugal with CBRNG increments.
// See sample_indices_iid_uniform.
}
Expand All @@ -262,7 +263,7 @@ RNGState<RNG> compute_next_state(SparseDist dist, RNGState<RNG> state) {
/// A sample from a distribution over structured sparse matrices with either
/// independent rows or independent columns. This type conforms to the
/// SketchingOperator concept.
template <typename T, typename RNG = r123::Philox4x32, SignedInteger sint_t = int64_t>
template <typename T, typename RNG = DefaultRNG, SignedInteger sint_t = int64_t>
struct SparseSkOp {

// ---------------------------------------------------------------------------
Expand Down Expand Up @@ -483,6 +484,10 @@ void laso_merge_long_axis_vector_coo_data(
/// a COO sparse matrix representation of the SparseSkOp
/// defined by \math{(\D,\ttt{seed_state)}.}
///
/// Note: the "nosub" suffix in this function name reflects how it has no ability
/// to sample submatrices of sparse sketching operators. A future release of
/// RandBLAS will add a function called "fill_sparse_unpacked()" with capabilities
/// that are completely analogous to fill_dense_unpacked().
///
template <typename T, typename sint_t, typename state_t>
state_t fill_sparse_unpacked_nosub(
Expand Down Expand Up @@ -570,6 +575,7 @@ void fill_sparse(SparseSkOp &S) {
randblas_require(S.cols != nullptr);
randblas_require(S.vals != nullptr);
fill_sparse_unpacked_nosub(S.dist, S.nnz, S.vals, S.rows, S.cols, S.seed_state);
// ^ We ignore the return value from that function call.
return;
}

Expand Down
24 changes: 16 additions & 8 deletions RandBLAS/util.hh
Original file line number Diff line number Diff line change
Expand Up @@ -391,13 +391,19 @@ void transpose_square(T* A, int64_t n, int64_t lda) {
return;
}

// =============================================================================
/// \fn sqrt_epsilon()
/// Alias for sqrt(numeric_limits<T>::epsilon()). For example,
/// \math{\ttt{sqrt_epsilon<float>()} \approx 0.0003452}, and
/// \math{\ttt{sqrt_epsilon<double>()} \approx 1.4901\text{e-}8}.
///
template <typename T>
T sqrt_epsilon() {
return std::sqrt(std::numeric_limits<T>::epsilon());
}

// =============================================================================
/// \fn weights_to_cdf(int64_t n, T* w, T error_if_below = -std::numeric_limits<T>::epsilon())
/// \fn weights_to_cdf(int64_t n, T* w, T error_if_below = -sqrt_epsilon<T>())
/// @verbatim embed:rst:leading-slashes
/// Checks if all elements of length-:math:`n` array ":math:`w`" are at no smaller than
/// :math:`\ttt{error_if_below}.` If this check passes, then we (implicitly) initialize :math:`v := w``
Expand Down Expand Up @@ -426,7 +432,7 @@ void weights_to_cdf(int64_t n, T* w, T error_if_below = -sqrt_epsilon<T>()) {
}

template <typename TO, typename TI>
static inline TO uneg11_to_uneg01(TI in) {
static inline TO uneg11_to_u01(TI in) {
return ((TO) in + (TO) 1.0)/ ((TO) 2.0);
}

Expand All @@ -453,16 +459,17 @@ state_t sample_indices_iid(int64_t n, const T* cdf, int64_t k, sint_t* samples,
int64_t len_c = (int64_t) state.len_c;
int64_t rv_index = 0;
for (int64_t i = 0; i < k; ++i) {
if ((i+1) % len_c == 1) {
auto random_unif01 = uneg11_to_u01<T>(rv_array[rv_index]);
sint_t sample_index = std::lower_bound(cdf, cdf + n, random_unif01) - cdf;
samples[i] = sample_index;
rv_index += 1;
if (rv_index == len_c) {
ctr.incr(1);
rv_array = r123ext::uneg11::generate(gen, ctr, key);
rv_index = 0;
}
auto random_unif01 = uneg11_to_uneg01<T>(rv_array[rv_index]);
sint_t sample_index = std::lower_bound(cdf, cdf + n, random_unif01) - cdf;
samples[i] = sample_index;
rv_index += 1;
}
if (0 < rv_index) ctr.incr(1);
return state_t(ctr, key);
}

Expand All @@ -480,7 +487,7 @@ state_t sample_indices_iid_uniform(int64_t n, int64_t k, sint_t* samples, T* rad
int64_t rv_index = 0;
double dN = (double) n;
for (int64_t i = 0; i < k; ++i) {
auto random_unif01 = uneg11_to_uneg01<double>(rv_array[rv_index]);
auto random_unif01 = uneg11_to_u01<double>(rv_array[rv_index]);
sint_t sample_index = (sint_t) dN * random_unif01;
samples[i] = sample_index;
rv_index += 1;
Expand All @@ -494,6 +501,7 @@ state_t sample_indices_iid_uniform(int64_t n, int64_t k, sint_t* samples, T* rad
rv_index = 0;
}
}
if (0 < rv_index) ctr.incr(1);
return state_t(ctr, key);
}

Expand Down
7 changes: 6 additions & 1 deletion rtd/source/api_reference/utilities.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,10 +10,15 @@
Utilities
############################################################

Numerical tolerances
====================
.. doxygenfunction:: RandBLAS::sqrt_epsilon
:project: RandBLAS

Random sampling from index sets
===============================

.. doxygenfunction:: RandBLAS::weights_to_cdf(int64_t n, T* w, T error_if_below = -std::numeric_limits<T>::epsilon())
.. doxygenfunction:: RandBLAS::weights_to_cdf(int64_t n, T* w, T error_if_below = -sqrt_epsilon<T>())
:project: RandBLAS

.. doxygenfunction:: RandBLAS::sample_indices_iid(int64_t n, const T* cdf, int64_t k, sint_t* samples, const state_t &state)
Expand Down
16 changes: 13 additions & 3 deletions rtd/source/updates/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,17 +2,20 @@
Changes to RandBLAS
===================

This page details changes made to RandBLAS over time, in reverse chronological order.
This page reviews changes made to RandBLAS over time, in reverse chronological order.
We have a tentative policy of providing bugfix support for any release of
RandBLAS upon request, no matter how old. With any luck, this project will grow enough
that we'll have to change this policy.

RandBLAS follows `Semantic Versioning <https://semver.org>`_.
RandBLAS follows `Semantic Versioning <https://semver.org>`_. Any function documented
on this website is part of the public API. There are many functions which are not
part of our public API, but could be added to it if there is user interest.

RandBLAS' latest version is :ref:`1.0.1 <v10x_patches>`.

RandBLAS 1.0
------------
*Release date: September 12, 2024. Release manager: Riley Murray.*
*Original release date: September 12, 2024. Release manager: Riley Murray.*

Today marks RandBLAS' second-ever release, its first *stable* release,
and its first release featuring the contributions of someone who showed
Expand Down Expand Up @@ -94,6 +97,13 @@ National Technology and Engineering Solutions of Sandia, LLC., a wholly owned su
of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear
Security Administration under contract DE-NA-0003525.

.. _v10x_patches:

Patch releases in series 1.0.x
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Version 1.0.1 (September 29, 2024). This patches bugs in values of RNGStates
returned from functions for sampling from index sets. See GitHub for more details.


RandBLAS 0.2
Expand Down
Loading

0 comments on commit 1489103

Please sign in to comment.