Official Release for RandBLAS 1.0

@rileyjmurray rileyjmurray released this 13 Sep 01:39

 
Today marks RandBLAS' second-ever release, its first stable release, and its first release featuring the contributions of someone who showed up entirely out of the blue (shoutout to Rylie Weaver)!

New features for core functionality

The semantics of SparseDist::major_axis have changed in RandBLAS 1.0. As a result of this change, SparseSkOps can represent LESS-Uniform operators and operators for plain row or column sampling with replacement. (This is in addition to hashing-style operators like CountSketch, which we've supported since version 0.2.)
 
We have four new functions for sampling from index sets.

  • weights_to_cdf
  • sample_indices_iid
  • sample_indices_iid_uniform
  • repeated_fisher_yates
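
For readers new to these names, here is a minimal standalone sketch of the techniques they refer to: building a CDF from weights, drawing iid indices by inverting that CDF, and drawing distinct indices with a partial Fisher–Yates shuffle. This is not RandBLAS code. The `toy_*` functions, their signatures, and the use of `std::mt19937_64` are assumptions made for illustration; consult the RandBLAS docs for the real signatures and RNG handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not the RandBLAS signature): convert nonnegative
// weights into a normalized cumulative distribution function.
std::vector<double> toy_weights_to_cdf(const std::vector<double>& w) {
    assert(!w.empty());
    std::vector<double> cdf(w.size());
    std::partial_sum(w.begin(), w.end(), cdf.begin());
    const double total = cdf.back();
    assert(total > 0.0);
    for (double& c : cdf) c /= total;
    cdf.back() = 1.0;  // guard against rounding in the last entry
    return cdf;
}

// Hypothetical helper: draw k iid indices (with replacement) by inverting the CDF.
std::vector<int64_t> toy_sample_iid(const std::vector<double>& cdf, int64_t k,
                                    std::mt19937_64& gen) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const int64_t n = static_cast<int64_t>(cdf.size());
    std::vector<int64_t> out(k);
    for (auto& idx : out) {
        const double u = unif(gen);
        // First position whose CDF value exceeds u; zero-weight indices are skipped.
        idx = static_cast<int64_t>(std::upper_bound(cdf.begin(), cdf.end(), u) - cdf.begin());
        idx = std::min(idx, n - 1);  // defensive clamp in case u == 1.0
    }
    return out;
}

// Hypothetical helper: draw k distinct indices from {0, ..., n-1} using the
// first k steps of a Fisher-Yates shuffle.
std::vector<int64_t> toy_fisher_yates(int64_t n, int64_t k, std::mt19937_64& gen) {
    assert(0 <= k && k <= n);
    std::vector<int64_t> idx(n);
    std::iota(idx.begin(), idx.end(), int64_t{0});
    for (int64_t i = 0; i < k; ++i) {
        std::uniform_int_distribution<int64_t> pick(i, n - 1);
        std::swap(idx[i], idx[pick(gen)]);
    }
    idx.resize(k);
    return idx;
}
```

With weights `{1.0, 0.0, 3.0}` the CDF is `{0.25, 0.25, 1.0}`, so index 1 is never drawn by `toy_sample_iid`.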

We have two new functions for getting low-level data for a sketching operator's explicit representation: fill_dense_unpacked and fill_sparse_unpacked_nosub. These are useful if you want to incorporate RandBLAS' sketching functionality into other frameworks, like Kokkos, cuBLAS, or MKL.
 
Finally, there's sketch_symmetric, overloaded for sketching from the left or right.

Quality-of-life improvements

  • We've significantly expanded the tutorial part of our web docs. It now has details on updating sketches and some advice on choosing parameters for sketching distributions.
  • Error is now in the public API.
  • print_buff_to_stream writes MATLAB-style or NumPy-style string representations of matrices to a provided stream, like std::cout.
  • We settled on a unified memory-management / memory-ownership policy. The same policy applies to DenseSkOp, SparseSkOp, and all of the sparse matrix types. The abstract policy is described in our web documentation. The consequences of the policy for each of the aforementioned types are documented in source code and on our website.
  • We added a few utility functions for working with dense matrices: symmetrize, overwrite_triangle, and transpose_square.
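
As a rough picture of what utilities like the dense-matrix ones above do, here is a standalone sketch of an in-place transpose and an upper-to-lower symmetrization for a square column-major buffer. The `toy_*` names, the argument lists, and the choice of copying the upper triangle into the lower one are assumptions for illustration, not the RandBLAS signatures or semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: transpose an n-by-n column-major matrix in place.
// Entry (i, j) lives at A[i + j * lda].
void toy_transpose_square(double* A, int64_t n, int64_t lda) {
    for (int64_t j = 0; j < n; ++j)
        for (int64_t i = j + 1; i < n; ++i)
            std::swap(A[i + j * lda], A[j + i * lda]);
}

// Hypothetical helper: make A symmetric by copying its strict upper triangle
// into its strict lower triangle.
void toy_symmetrize_from_upper(double* A, int64_t n, int64_t lda) {
    for (int64_t j = 0; j < n; ++j)
        for (int64_t i = j + 1; i < n; ++i)
            A[i + j * lda] = A[j + i * lda];  // (i, j) below diagonal <- (j, i) above
}
```

For the 2-by-2 column-major buffer `{1, 2, 3, 4}`, the transpose yields `{1, 3, 2, 4}` and the symmetrization yields `{1, 3, 3, 4}`.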

Significantly revised APIs for sketching distributions and operators

  • Added new SketchingDistribution and SketchingOperator C++20 concepts.
  • API revisions to DenseDist/DenseSkOp and SparseDist/SparseSkOp were mostly about taking quantities that we used to compute from an object's const members with free functions, and instead making those quantities const members themselves. Good examples of this are DenseDist::isometry_scale and SparseDist::isometry_scale, whose meanings are explained in the SketchingDistribution docs.
  • DenseSkOp::next_state and SparseSkOp::next_state are computed at construction time, without actually performing any random sampling. This means that one can define a sequence of independent sketching operators without changing an RNGState's "key" and without realizing any of them explicitly.
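
The next_state idea can be illustrated with a toy counter-based generator. The sketch below is not RandBLAS code: `ToyState`, `ToyDenseOp`, and `toy_mix` are hypothetical, and a SplitMix64-style mixing function stands in for a real counter-based RNG. The point it tries to show is that the state for the next operator is just the current state with its counter advanced past the entries this operator would consume, so it can be computed without generating anything.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an RNG state: a key plus a counter.
struct ToyState {
    uint64_t key;
    uint64_t counter;
};

// Stateless mixing function (SplitMix64 finalizer). An assumption for
// illustration only; it stands in for a proper counter-based RNG.
inline uint64_t toy_mix(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Hypothetical dense operator whose entries are never stored.
struct ToyDenseOp {
    int64_t n_rows;
    int64_t n_cols;
    ToyState seed_state;
    ToyState next_state;  // known at construction, no sampling required

    ToyDenseOp(int64_t r, int64_t c, ToyState s)
        : n_rows(r), n_cols(c), seed_state(s),
          next_state{s.key, s.counter + static_cast<uint64_t>(r * c)} {}

    // Entry (i, j) is a pure function of (key, counter offset), so it can be
    // realized lazily. Values are in [-1, 1).
    double entry(int64_t i, int64_t j) const {
        const uint64_t ctr = seed_state.counter + static_cast<uint64_t>(i + j * n_rows);
        const uint64_t bits = toy_mix(toy_mix(seed_state.key) ^ ctr);
        return static_cast<double>(bits >> 11) * (2.0 / 9007199254740992.0) - 1.0;
    }
};
```

Constructing `ToyDenseOp S1(3, 5, {42, 0})` and then `ToyDenseOp S2(2, 4, S1.next_state)` gives two operators that share a key but read from disjoint counter ranges (0 to 14 and 15 to 22), and neither has been materialized.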

Statistical tests

  • Kolmogorov–Smirnov tests for distributional correctness of sample_indices_iid, sample_indices_iid_uniform, repeated_fisher_yates, and the scalar distributions that can be used with DenseSkOp (standard-normal and uniform over [-1,1]).
  • Tests for subspace embedding properties of DenseSkOp. A forthcoming paper will describe how these tests cover a wide range of relevant parameter values at very mild computational cost.
  • We've incorporated select tests from Random123 into our testing framework.
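
For context on the first kind of test, the one-sample Kolmogorov–Smirnov statistic is the largest gap between the empirical CDF of the samples and the target CDF. Below is a minimal standalone sketch for the uniform distribution over [-1,1]. It is not taken from the RandBLAS test suite: the function names are hypothetical, and the threshold uses the Dvoretzky–Kiefer–Wolfowitz bound, which may differ from whatever critical values the actual tests use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CDF of the uniform distribution over [-1, 1].
double uniform_pm1_cdf(double x) {
    return std::min(1.0, std::max(0.0, (x + 1.0) / 2.0));
}

// One-sample KS statistic: sup over x of |F_empirical(x) - F(x)|.
// The supremum is attained just before or at one of the sorted samples.
template <typename CDF>
double ks_statistic(std::vector<double> samples, CDF cdf) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const double n = static_cast<double>(samples.size());
    double d = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const double f = cdf(samples[i]);
        const double above = static_cast<double>(i + 1) / n - f;  // ECDF at the sample
        const double below = f - static_cast<double>(i) / n;      // ECDF just before it
        d = std::max({d, above, below});
    }
    return d;
}

// Threshold eps with P(D > eps) <= alpha, from the DKW inequality
// P(D > eps) <= 2 * exp(-2 * n * eps^2).
double dkw_threshold(std::size_t n, double alpha) {
    return std::sqrt(std::log(2.0 / alpha) / (2.0 * static_cast<double>(n)));
}
```

A test would reject the hypothesis that the samples follow the target distribution when `ks_statistic(samples, uniform_pm1_cdf)` exceeds `dkw_threshold(samples.size(), alpha)`.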

Contributors

I'd like to start by acknowledging the contributions of Parth Nobel (@PTNobel) to RandBLAS' development. Parth and I have worked on-and-off on several projects involving RandNLA algorithms. None of these projects has been published yet, but they've had a significant role in uncovering bugs and setting development priorities for RandBLAS. (As a recent example in the latter category, I probably wouldn't have added the "sample_indices_iid" function were it not for its relevance to one of our projects.) This led me to be quite surprised when I noticed that Parth technically hasn't made a commit to the RandBLAS repository! Let this statement set the record straight: Parth has made very real contributions to RandBLAS, even if the commit history doesn't currently reflect that.
 
Rylie Weaver (@RylieWeaver), the aforementioned out-of-the-blue contributor, helped write our Kolmogorov–Smirnov tests for repeated Fisher–Yates.
 
I wrote a lot of code (as one might imagine).

Funding acknowledgements

This work was wholly supported by LDRD funding from Sandia National Laboratories.
 
Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525.