Expand and update examples, improve spmm performance (#89)
Changes to examples

* The total least squares examples now run the classical algorithm in addition to the randomized algorithm. We compare runtime and solution quality for the two methods.
* I've added three examples for low-rank approximation of sparse matrices.
  * QB-based SVD of randomly generated synthetic test matrices (rank one plus noise).
  * QB-based SVD of any sparse matrix in Matrix Market format.
  * Low-rank QRCP of a sparse matrix in Matrix Market format.

Changes to sparse matrix functionality

* I've added two kernel implementations for SPMM of row-major data. This was needed to address cache inefficiencies revealed in the existing SPMM kernels during low-rank QRCP benchmarking.
* I've removed the requirement for ``A.reserve((int64_t) 10)`` being able to execute for a matrix ``A`` that's compatible with the ``SpMatrix`` concept. Apparently this didn't work with sparse matrices marked as ``const``.

I've also added two files of developer notes: ``RandBLAS/DevNotes.md`` and ``RandBLAS/sparse_data/DevNotes.md``.
1 parent 21700fd, commit 9d0a03f
Showing 18 changed files with 1,695 additions and 370 deletions.
@@ -0,0 +1,39 @@
# Developer Notes for RandBLAS

This file has discussions of RandBLAS' implementation that aren't (currently) suitable
for the RandBLAS User Guide.

## What's where?

* Our basic random number generation is handled by [Random123](https://github.com/DEShawResearch/random123).
  We have small wrappers around Random123 code in ``RandBLAS/base.hh`` and ``RandBLAS/random_gen.hh``.

* ``RandBLAS/dense_skops.hh`` has code for representing and sampling dense sketching operators.
  The sampling code is complicated because it supports multi-threaded (yet threading invariant!)
  random (sub)matrix generation.

* ``RandBLAS/sparse_skops.hh`` has code for representing and sampling sparse sketching operators.
  The sampling code has a customized method for repeatedly sampling from an index set without
  replacement, which is needed to quickly generate the structures used in statistically reliable
  sparse sketching operators.

* [BLAS++ (aka blaspp)](https://github.com/icl-utk-edu/blaspp) is our portability layer for BLAS.
  We actually use very few functions in BLAS at time of writing (GEMM, GEMV, SCAL, COPY, and
  AXPY) but we use its enumerations _everywhere_. Fast GEMM is important for sketching dense
  data with dense operators.

* The ``sketch_general`` functions in ``RandBLAS/skge.hh`` are the main entry point for sketching dense data.
  These functions are small wrappers around functions with more BLAS-like names:
  * ``lskge3`` and ``rskge3`` in ``RandBLAS/skge3_to_gemm.hh``.
  * ``lskges`` and ``rskges`` in ``RandBLAS/skges_to_spmm.hh``.

  The former pair of functions are just fancy wrappers around GEMM.
  The latter pair of functions trigger a far more opaque call sequence, since they rely on sparse
  matrix operations.

* There is no widely accepted standard for sparse BLAS operations. This is a bummer because
  sparse matrices are super important in data science and scientific computing. In view of this,
  RandBLAS provides its own abstractions for sparse matrices (CSC, CSR, and COO formats).
  The abstractions can either own their associated data or just wrap existing data (say, data
  attached to a sparse matrix in Eigen). RandBLAS has reasonably flexible and high-performance code
  for multiplying a sparse matrix and a dense matrix. All code related to sparse matrices is in
  ``RandBLAS/sparse_data``. See that folder's ``DevNotes.md`` file for details.
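
The "threading invariant" property mentioned for ``dense_skops.hh`` is what counter-based random number generation makes possible: each matrix entry can be computed as a pure function of a key and that entry's position, so the result does not depend on how the work is split across threads. The sketch below only illustrates that idea. It assumes a SplitMix64-style mixing function as a stand-in for Random123, and the names ``mix``, ``entry``, and ``fill_submatrix`` are hypothetical rather than part of RandBLAS.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a counter-based RNG (an assumption; RandBLAS itself wraps Random123).
// Maps a (key, counter) pair to 64 pseudo-random bits with SplitMix64-style mixing.
inline uint64_t mix(uint64_t key, uint64_t counter) {
    uint64_t z = key + 0x9E3779B97F4A7C15ULL * (counter + 1);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Entry (i, j) of a virtual matrix with n_cols columns, uniform in [-1, 1).
// The value depends only on (key, i, j, n_cols), never on thread count or loop order.
inline double entry(uint64_t key, int64_t i, int64_t j, int64_t n_cols) {
    uint64_t bits = mix(key, (uint64_t)(i * n_cols + j));
    // Keep the top 53 bits, scale to [0, 2), then shift to [-1, 1).
    return (double)(bits >> 11) * (2.0 / 9007199254740992.0) - 1.0;
}

// Fill a row-major (rows x cols) block whose top-left corner is entry (i0, j0)
// of the virtual parent matrix.
inline std::vector<double> fill_submatrix(
    uint64_t key, int64_t n_cols_parent,
    int64_t i0, int64_t j0, int64_t rows, int64_t cols
) {
    assert(j0 + cols <= n_cols_parent);
    std::vector<double> out(rows * cols);
    // Every iteration is independent, so this loop nest could be divided among
    // threads in any way without changing the output.
    for (int64_t i = 0; i < rows; ++i)
        for (int64_t j = 0; j < cols; ++j)
            out[i * cols + j] = entry(key, i0 + i, j0 + j, n_cols_parent);
    return out;
}
```

With this design, generating a submatrix directly gives the same numbers as generating the full matrix and slicing it, which is the property a "(sub)matrix generation" routine needs.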
@@ -0,0 +1,76 @@
# Developer Notes for RandBLAS' sparse matrix functionality

RandBLAS provides abstractions for CSC, CSR, and COO-format sparse matrices.
The following functions use these abstractions:

* ``left_spmm``, which computes a product of a sparse matrix and a dense matrix when the sparse matrix
  is the left operand. This function is GEMM-like, in that it allows offsets and transposition flags
  for either argument.
* ``right_spmm``, which is analogous to ``left_spmm`` when the sparse matrix is the right operand.
* ``sketch_general``, when called with a SparseSkOp object.
* ``sketch_sparse``, when called with a DenseSkOp object.

Each of those functions is merely a _dispatcher_ of other (lower level) functions. See below for details on
how the dispatching works.

## Left_spmm and right_spmm

These functions are implemented in ``RandBLAS/sparse_data/spmm_dispatch.hh``.

``right_spmm`` is implemented by falling back on ``left_spmm`` with transformed
values for ``opA``, ``opB``, and ``layout``.
Here's what happens if ``left_spmm`` is called with a sparse matrix ``A``, a dense input matrix ``B``, and a dense output matrix ``C``.

1. If needed, transposition of ``A`` is resolved by creating a lightweight object for the transpose
   called ``At``. This object is just a tool for us to change how we interpret the buffers that underlie ``A``.
   * If ``A`` is COO, then ``At`` will also be COO.
   * If ``A`` is CSR, then ``At`` will be CSC.
   * If ``A`` is CSC, then ``At`` will be CSR.

   We make a recursive call to ``left_spmm`` once we have our hands on ``At``, so
   the rest of ``left_spmm``'s logic only needs to handle un-transposed ``A``.

2. A memory layout is determined for how we'll read ``B`` in the low-level
   sparse matrix multiplication kernels.
   * If ``B`` is un-transposed then we'll use the same layout as ``C``.
   * If ``B`` is transposed then we'll swap its declared dimensions
     (i.e., we'll swap its reported numbers of rows and columns)
     and we'll tell the kernel to read it in the opposite layout from ``C``.

3. We dispatch a kernel from ``coo_spmm_impl.hh``, ``csc_spmm_impl.hh``,
   or ``csr_spmm_impl.hh``. The precise kernel depends on the type of ``A``, the inferred layout for ``B``, and the declared layout for ``C``.
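
Steps 1 and 2 above can be sketched as follows. This is a minimal illustration under stated assumptions, not RandBLAS' actual code: ``CSRView``, ``CSCView``, ``transpose_view``, ``plan_B_read``, and ``at`` are hypothetical names, and the real sparse matrix types may carry more state than is shown here.

```cpp
#include <cassert>
#include <cstdint>

enum class Layout { RowMajor, ColMajor };
enum class Op { NoTrans, Trans };

// Hypothetical non-owning views of sparse matrix buffers.
struct CSRView {
    int64_t n_rows, n_cols;
    const int64_t* rowptr;   // length n_rows + 1
    const int64_t* colidxs;  // length nnz
    const double*  vals;     // length nnz
};
struct CSCView {
    int64_t n_rows, n_cols;
    const int64_t* colptr;   // length n_cols + 1
    const int64_t* rowidxs;  // length nnz
    const double*  vals;     // length nnz
};

// Step 1: the CSR buffers of A are exactly the CSC buffers of A^T.
// Only the interpretation changes; no data is copied or moved.
inline CSCView transpose_view(const CSRView& A) {
    return CSCView{A.n_cols, A.n_rows, A.rowptr, A.colidxs, A.vals};
}

// Step 2: decide how a kernel should read op(B), given the layout of C.
struct BReadPlan { int64_t rows, cols; Layout layout; };
inline BReadPlan plan_B_read(Op opB, Layout layout_C, int64_t rows_B, int64_t cols_B) {
    if (opB == Op::NoTrans)
        return {rows_B, cols_B, layout_C};
    // A buffer holding B in one layout holds B^T in the opposite layout,
    // so swap the dimensions and flip the layout.
    Layout flipped = (layout_C == Layout::RowMajor) ? Layout::ColMajor : Layout::RowMajor;
    return {cols_B, rows_B, flipped};
}

// Helper for checking results: look up entry (i, j) through a CSC view.
inline double at(const CSCView& M, int64_t i, int64_t j) {
    for (int64_t p = M.colptr[j]; p < M.colptr[j + 1]; ++p)
        if (M.rowidxs[p] == i) return M.vals[p];
    return 0.0;
}
```

Because both steps only relabel existing buffers, the transposition flags are resolved without allocating or copying, and the kernels in step 3 never need to handle a transposed operand themselves.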

## Sketching dense data with sparse operators

Sketching dense data with a sparse operator is typically handled with ``sketch_general``,
which is defined in ``skge.hh``.

If we call this function with a SparseSkOp object, ``S``, we'd immediately get routed to
a function in ``skges_to_spmm.hh``: either ``lskges`` or ``rskges``. Here's what would happen
after we entered one of those functions:

1. If necessary, we'd sample the defining data of ``S`` with ``RandBLAS::fill_sparse(S)``.

2. We'd obtain a lightweight view of ``S`` as a COOMatrix, and we'd pass that matrix to ``left_spmm``
   (if inside ``lskges``) or ``right_spmm`` (if inside ``rskges``).
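
Once ``S`` is viewed as a COO matrix, applying it from the left is a sparse-times-dense product. The following is a minimal, unoptimized sketch of that kind of product, assuming column-major dense matrices. ``COOView`` and ``coo_left_spmm`` are hypothetical names for illustration, and the kernels in ``coo_spmm_impl.hh`` may differ substantially.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical non-owning COO view: entry k is vals[k] at position (rows[k], cols[k]).
struct COOView {
    int64_t n_rows, n_cols, nnz;
    const int64_t* rows;
    const int64_t* cols;
    const double*  vals;
};

// Computes C = alpha * S * A + beta * C, where
//   S is (S.n_rows x S.n_cols) in COO format,
//   A is (S.n_cols x n) column-major with leading dimension lda,
//   C is (S.n_rows x n) column-major with leading dimension ldc.
inline void coo_left_spmm(
    double alpha, const COOView& S, int64_t n,
    const double* A, int64_t lda,
    double beta, double* C, int64_t ldc
) {
    assert(lda >= S.n_cols && ldc >= S.n_rows);
    for (int64_t j = 0; j < n; ++j) {
        // Scale column j of C. As in BLAS, beta == 0 overwrites C rather than
        // multiplying, so stale values (even NaN) do not propagate.
        for (int64_t i = 0; i < S.n_rows; ++i)
            C[i + j * ldc] = (beta == 0.0) ? 0.0 : beta * C[i + j * ldc];
        // Each nonzero of S adds one scaled entry of A's column j to C's column j.
        for (int64_t k = 0; k < S.nnz; ++k)
            C[S.rows[k] + j * ldc] += alpha * S.vals[k] * A[S.cols[k] + j * lda];
    }
}
```

The cost is proportional to ``nnz * n``, which is why sparse sketching operators can be much cheaper to apply than dense ones of the same dimensions.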

## Sketching sparse data with dense operators

If we call ``sketch_sparse`` with a DenseSkOp, ``S``, and a sparse matrix, ``A``, then we'll get routed to either
``lsksp3`` or ``rsksp3`` in ``sparse_data/sksp3_to_spmm.hh``.

From there, we'll do the following.

1. If necessary, we sample the defining data of ``S``. The way that we do this is a
   little more complicated than using ``RandBLAS::fill_dense(S)``, but it's similar
   in spirit.

2. We get our hands on the simple buffer representation of ``S``. From there:
   * We call ``right_spmm`` if we're inside ``lsksp3``.
   * We call ``left_spmm`` if we're inside ``rsksp3``.

Note that the ``l`` and ``r`` in the ``[l/r]sksp3`` function names
get matched to opposite sides for ``[left/right]_spmm``! This is because all the fancy abstractions in ``S``
have been stripped away by this point in the call sequence, so the "side" in the function name
switches from referring to ``S`` to referring to ``A``.
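
The side swap can be made concrete with a small sketch. When the dense operator ``S`` is on the left, the sparse matrix ``A`` is on the right, so the product ``S * A`` is a right-sided sparse product. That product can in turn be computed by a left-sided kernel through the identity ``(S * A)^T = A^T * S^T``, with no data movement, because a column-major buffer for a matrix is a row-major buffer for its transpose. All names below (``CSRView``, ``CSCView``, ``left_spmm_csc_rowmajor``, ``right_spmm_csr_colmajor``, ``sketch_sparse_from_left``) are hypothetical, and this is an illustration of the idea rather than RandBLAS' implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct CSRView {
    int64_t n_rows, n_cols;
    const int64_t* rowptr;   // length n_rows + 1
    const int64_t* colidxs;  // length nnz
    const double*  vals;     // length nnz
};
struct CSCView {
    int64_t n_rows, n_cols;
    const int64_t* colptr;   // length n_cols + 1
    const int64_t* rowidxs;  // length nnz
    const double*  vals;     // length nnz
};

// Left-sided kernel: C += M * B, where M is CSC (M.n_rows x M.n_cols),
// B is (M.n_cols x k) row-major, and C is (M.n_rows x k) row-major.
inline void left_spmm_csc_rowmajor(const CSCView& M, int64_t k, const double* B, double* C) {
    for (int64_t j = 0; j < M.n_cols; ++j)
        for (int64_t p = M.colptr[j]; p < M.colptr[j + 1]; ++p) {
            int64_t i = M.rowidxs[p];
            // Row i of C accumulates a multiple of row j of B; both rows are contiguous.
            for (int64_t t = 0; t < k; ++t)
                C[i * k + t] += M.vals[p] * B[j * k + t];
        }
}

// Right-sided product: C += S * A, where S is (d x A.n_rows) column-major with
// leading dimension d, A is CSR, and C is (d x A.n_cols) column-major with
// leading dimension d.
inline void right_spmm_csr_colmajor(int64_t d, const double* S, const CSRView& A, double* C) {
    // C = S * A is equivalent to C^T = A^T * S^T.
    // The CSR buffers of A are the CSC buffers of A^T.
    CSCView At{A.n_cols, A.n_rows, A.rowptr, A.colidxs, A.vals};
    // The column-major buffers of S and C are row-major buffers of S^T and C^T.
    left_spmm_csc_rowmajor(At, d, S, C);
}

// Hypothetical analogue of an "l"-prefixed sketching routine: the operator is on
// the left, so from the sparse matrix's point of view this is a right-sided product.
inline void sketch_sparse_from_left(int64_t d, const double* S, const CSRView& A, double* C) {
    right_spmm_csr_colmajor(d, S, A, C);
}
```

The same relabeling trick is the reason a right-sided dispatcher can fall back on a left-sided one with transformed transposition flags and layout.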
@@ -0,0 +1,3 @@
sparse-data-matrices/*
sparse-low-rank-approx/data-matrices/*
sparse-low-rank-approx/fast-matrix-market/*