Expand and update examples, improve spmm performance #89

Merged · 44 commits · May 30, 2024

Changes from 1 commit
8a5e294
update examples
rileyjmurray Apr 17, 2024
8744efa
have data set up for low-rank approx of sparse matrix
rileyjmurray Apr 19, 2024
37087b8
svd example compiles
rileyjmurray Apr 23, 2024
b481b5b
accuracy check and runtime
rileyjmurray Apr 26, 2024
a7b4ab0
directory
rileyjmurray Apr 28, 2024
8e6c035
ignore directory contents
rileyjmurray Apr 28, 2024
74122e1
fast matrix market folder
rileyjmurray Apr 28, 2024
c8b8378
include examples that use fast-matrix-market
rileyjmurray Apr 28, 2024
cb8f8d2
interesting demo using test data from fast-matrix-market
rileyjmurray Apr 28, 2024
8d5ac2d
tweak
rileyjmurray Apr 28, 2024
e19a266
clean up svd example (using a new macro for timing)
rileyjmurray Apr 28, 2024
299cbc3
ternary assignment
rileyjmurray Apr 29, 2024
f66d209
proper QRCP demo
rileyjmurray May 1, 2024
698d272
standalone Eigen example (currently very messy)
rileyjmurray May 1, 2024
2dff390
update eigen example
rileyjmurray May 2, 2024
5676def
remove Eigen example (its sparse QR doesnt do pivoting to reveal nume…
rileyjmurray May 2, 2024
669acbd
restructure examples
rileyjmurray May 7, 2024
4af1221
fix bug in examples (those using lu_stabilize). Create special implem…
rileyjmurray May 8, 2024
1eb7b80
properly check for OpenMP before calling omp_get_thread_num()
rileyjmurray May 9, 2024
db4adcb
fix thread number
rileyjmurray May 9, 2024
6e70336
graphs examples
rileyjmurray May 9, 2024
2408585
check in
rileyjmurray May 10, 2024
18f91eb
check in
rileyjmurray May 10, 2024
cb929a2
fix logging bug
rileyjmurray May 10, 2024
09244d3
debugging
rileyjmurray May 24, 2024
8966842
fixed sparse QRCP example
rileyjmurray May 24, 2024
9a13962
Update lu_row_stabilize so that it errors when necessary. Modulo read…
rileyjmurray May 24, 2024
a86d2c5
clean up notes in SpMatrix concept and remove requirement that preclu…
rileyjmurray May 29, 2024
862121e
remove messy graphs example. Rename fmm_svd.cc
rileyjmurray May 29, 2024
daac3fd
rename qrcp.cc example, update CMakeLists.txt
rileyjmurray May 29, 2024
bb16d37
remove added whitespace
rileyjmurray May 29, 2024
fc25e2d
clean up svd_matrixmarket example
rileyjmurray May 29, 2024
4274ad3
Polish example for low-rank QRCP of sparse matrices
rileyjmurray May 29, 2024
90d6678
blank line at end of file
rileyjmurray May 29, 2024
703ccef
specify μs (microseconds) instead of ms (milliseconds)
rileyjmurray May 29, 2024
a226f21
revise TLS examples
rileyjmurray May 29, 2024
96588fa
change fetcher to a README
rileyjmurray May 29, 2024
a7ab7bb
README and CMakeLists.txt
rileyjmurray May 29, 2024
a91397c
make sure to deallocate before returning
rileyjmurray May 29, 2024
d9eacf9
dev notes
rileyjmurray May 29, 2024
1f8ae46
better devnotes
rileyjmurray May 29, 2024
d0125c1
devnotes
rileyjmurray May 29, 2024
5754ce1
better devnotes
rileyjmurray May 30, 2024
fd5a987
tweak
rileyjmurray May 30, 2024
have data set up for low-rank approx of sparse matrix
rileyjmurray committed May 29, 2024
commit 8744efa99accbde9058ffbbade534b6cfb682ad9
10 changes: 10 additions & 0 deletions examples/CMakeLists.txt
@@ -48,3 +48,13 @@ target_link_libraries(
tls_sparse_skop PUBLIC RandBLAS blaspp lapackpp
)


add_executable(
slra_rank1_plus_noise sparse-low-rank-approx/svd_rank1_plus_noise.cc
)
target_include_directories(
slra_rank1_plus_noise PUBLIC ${Random123_DIR}
)
target_link_libraries(
slra_rank1_plus_noise PUBLIC RandBLAS blaspp lapackpp
)
5 changes: 4 additions & 1 deletion examples/README.md
@@ -1,3 +1,6 @@
# RandBLAS examples

Files in this folder show how RandBLAS can be used to implement high-level RandNLA algorithms.
Files in this directory show how RandBLAS can be used to implement high-level RandNLA algorithms.
Right now we have two such examples.
1. A sketch-and-solve approach to the total least squares problem.
2. Basic methods for finding factored representations of low-rank sparse matrices.
208 changes: 208 additions & 0 deletions examples/sparse-low-rank-approx/svd_rank1_plus_noise.cc
@@ -0,0 +1,208 @@
#include <blas.hh>
#include <RandBLAS.hh>
#include <lapack.hh>
#include <omp.h>
#include <stdio.h>
#include <unistd.h>
#include <iostream>
#include <math.h>
#include <time.h>
#include <stdlib.h>
#include <chrono>
#include <unordered_map>

using RandBLAS::sparse_data::COOMatrix;

auto parse_dimension_args(int argc, char** argv) {
int64_t m;
int64_t n;
int64_t vec_nnz;

if (argc == 1) {
m = 10000;
n = 500;
vec_nnz = 4;
} else if (argc == 4) {
m = atoi(argv[1]);
n = atoi(argv[2]);
vec_nnz = atoi(argv[3]);
} else {
std::cout << "Invalid parameters; must be called with no parameters, or with three positive integers." << '\n';
exit(1);
}
return std::make_tuple(m, n, vec_nnz);
}

template <typename T, typename RNG = r123::Philox4x32>
void iid_sparsify_random_dense(
int64_t n_rows, int64_t n_cols, int64_t stride_row, int64_t stride_col, T* mat, T prob_of_zero, RandBLAS::RNGState<RNG> state
) {
auto spar = new T[n_rows * n_cols];
auto dist = RandBLAS::DenseDist(n_rows, n_cols, RandBLAS::DenseDistName::Uniform);
auto [unused, next_state] = RandBLAS::fill_dense(dist, spar, state);

auto temp = new T[n_rows * n_cols];
auto D_mat = RandBLAS::DenseDist(n_rows, n_cols, RandBLAS::DenseDistName::Uniform);
RandBLAS::fill_dense(D_mat, temp, next_state);

#define SPAR(_i, _j) spar[(_i) + (_j) * n_rows]
#define TEMP(_i, _j) temp[(_i) + (_j) * n_rows]
#define MAT(_i, _j) mat[(_i) * stride_row + (_j) * stride_col]
for (int64_t i = 0; i < n_rows; ++i) {
for (int64_t j = 0; j < n_cols; ++j) {
T v = (SPAR(i, j) + 1.0) / 2.0;
if (v < prob_of_zero) {
MAT(i, j) = 0.0;
} else {
MAT(i, j) = TEMP(i, j);
}
}
}

delete [] spar;
delete [] temp;
}

template <typename SpMat>
SpMat sum_of_coo_matrices(SpMat &A, SpMat &B) {
randblas_require(A.n_rows == B.n_rows);
randblas_require(A.n_cols == B.n_cols);

using T = typename SpMat::scalar_t;
using sint_t = typename SpMat::index_t;
constexpr bool valid_type = std::is_same_v<SpMat, COOMatrix<T, sint_t>>;
randblas_require(valid_type);

using Tuple = std::pair<int64_t, int64_t>;
struct TupleHasher {
size_t operator()(const Tuple &coordinate_pair) const {
// an implementation suggested by my robot friend.
size_t hash1 = std::hash<int64_t>{}(coordinate_pair.first);
size_t hash2 = std::hash<int64_t>{}(coordinate_pair.second);
size_t hash3 = hash1;
hash3 ^= hash2 + 0x9e3779b9 + (hash1 << 6) + (hash1 >> 2);
return hash3;
}
};
std::unordered_map<Tuple, T, TupleHasher> c_dict{};

for (int64_t ell = 0; ell < A.nnz; ++ell) {
Tuple curr_idx(A.rows[ell], A.cols[ell]);
c_dict[curr_idx] = A.vals[ell];
}
for (int64_t ell = 0; ell < B.nnz; ++ell) {
Tuple curr_idx(B.rows[ell], B.cols[ell]);
c_dict[curr_idx] += B.vals[ell];
}

SpMat C(A.n_rows, A.n_cols);
C.reserve(c_dict.size());
int64_t ell = 0;
for (const auto &iter : c_dict) {
Tuple t = iter.first;
auto [i, j] = t;
C.rows[ell] = i;
C.cols[ell] = j;
C.vals[ell] = iter.second;
++ell;
}
return C;
}


template <typename SpMat>
void make_signal_matrix(double signal_scale, int64_t m, int64_t n, int64_t vec_nnz, double* signal_dense, SpMat &signal_sparse) {
using T = typename SpMat::scalar_t;
using sint_t = typename SpMat::index_t;
constexpr bool valid_type = std::is_same_v<SpMat, COOMatrix<T, sint_t>>;
randblas_require(valid_type);
signal_sparse.reserve(vec_nnz * vec_nnz);

// populate signal_dense and signal_sparse.
RandBLAS::RNGState u_state(0);
double *work_vals = new double[2*vec_nnz]{};
int64_t *work_idxs = new int64_t[2*vec_nnz]{};
int64_t *trash = new int64_t[vec_nnz]{};

auto v_state = RandBLAS::repeated_fisher_yates(u_state, vec_nnz, m, 1, work_idxs, trash, work_vals);
auto next_state = RandBLAS::repeated_fisher_yates(v_state, vec_nnz, n, 1, work_idxs+vec_nnz, trash, work_vals+vec_nnz);
double *u = new double[m]{};
double *v = new double[n]{};
for (int j = 0; j < vec_nnz; ++j) {
for (int i = 0; i < vec_nnz; ++i) {
int temp = i + j*vec_nnz;
signal_sparse.rows[temp] = work_idxs[i];
signal_sparse.cols[temp] = work_idxs[j+vec_nnz];
signal_sparse.vals[temp] = work_vals[i] * work_vals[j+vec_nnz];
}
u[work_idxs[j]] = work_vals[j];
v[work_idxs[j + vec_nnz]] = work_vals[j + vec_nnz];
}
blas::ger(blas::Layout::ColMajor, m, n, signal_scale, u, 1, v, 1, signal_dense, m);

delete [] work_vals;
delete [] work_idxs;
delete [] trash;
delete [] u;
delete [] v;
return;
}


template <typename SpMat>
void make_noise_matrix(double noise_scale, int64_t m, int64_t n, double prob_of_nonzero, double* noise_dense, SpMat &noise_sparse) {
// populate noise_dense and noise_sparse.
//
// NOTE: it would be more efficient to sample vec_nnz*vec_nnz elements without replacement from the index set
// {0, ..., m*n - 1}, then de-vectorize those indices (in either row-major or col-major interpretation) and
// only sample values for these pre-determined structural nonzeros. The current implementation has to
// generate two dense m-by-n matrices whose entries are iid uniform over [-1, 1].
//
using T = typename SpMat::scalar_t;
using sint_t = typename SpMat::index_t;
constexpr bool valid_type = std::is_same_v<SpMat, COOMatrix<T, sint_t>>;
randblas_require(valid_type);

RandBLAS::RNGState noise_state(1);
double prob_of_zero = 1 - prob_of_nonzero;
iid_sparsify_random_dense(m, n, 1, m, noise_dense, prob_of_zero, noise_state);
blas::scal(m * n, noise_scale, noise_dense, 1);
RandBLAS::sparse_data::coo::dense_to_coo(blas::Layout::ColMajor, noise_dense, 0.0, noise_sparse);
return;
}

/* TODO: add a utility for basic QB-based SVD with power iteration and HouseholderQR stabilization. */

int main(int argc, char** argv) {
auto [m, n, vec_nnz] = parse_dimension_args(argc, argv);
// First we set up problem data: a sparse matrix of low numerical rank
// given by a sum of a sparse "signal" matrix of rank 1 and a sparse
// "noise" matrix that has very small norm.
double signal_scale = 1e+2;
double noise_scale = 1e-6;
double prob_nonzero = 1e-4;
RandBLAS::sparse_data::COOMatrix<double> signal_sparse(m, n);
RandBLAS::sparse_data::COOMatrix<double> noise_sparse(m, n);
auto mn = m * n;
double *signal_dense = new double[mn]{};
double *noise_dense = new double[mn];

make_signal_matrix(signal_scale, m, n, vec_nnz, signal_dense, signal_sparse);
make_noise_matrix(noise_scale, m, n, prob_nonzero, noise_dense, noise_sparse);

// Add the two matrices together.
auto mat_sparse = sum_of_coo_matrices(noise_sparse, signal_sparse);
std::cout << signal_sparse.nnz << std::endl;
std::cout << noise_sparse.nnz << std::endl;
std::cout << mat_sparse.nnz << std::endl;
double *mat_dense = new double[mn]{};
blas::copy(mn, noise_dense, 1, mat_dense, 1);
blas::axpy(mn, 1.0, signal_dense, 1, mat_dense, 1);

delete [] signal_dense;
delete [] noise_dense;
delete [] mat_dense;
return 0;
}
8 changes: 4 additions & 4 deletions examples/total-least-squares/tls_dense_skop.cc
@@ -29,10 +29,10 @@ void init_noisy_data(int64_t m, int64_t n, int64_t d, double* AB){
RandBLAS::RNGState state(0);
RandBLAS::RNGState state1(1);

RandBLAS::fill_dense<double>(Dist_A, AB, state); //Fill A to be a random gaussian
RandBLAS::fill_dense<double>(Dist_eps, eps, state1); //Fill A to be a random gaussian
RandBLAS::fill_dense(Dist_A, AB, state); //Fill A to be a random gaussian
RandBLAS::fill_dense(Dist_eps, eps, state1); //Fill eps with Gaussian noise

blas::gemm(blas::Layout::ColMajor, blas::Op::NoTrans, blas::Op::NoTrans, m, d, n, 1, AB, m, target_x, n, 0, &AB[m*n], m);
blas::gemm(blas::Layout::ColMajor, blas::Op::NoTrans, blas::Op::NoTrans, m, d, n, 1.0, AB, m, target_x, n, 0.0, &AB[m*n], m);

for (int i = 0; i < m*d; i++){
AB[m*n + i] += eps[i]; // Add Gaussian Noise to right hand side
@@ -113,7 +113,7 @@ int main(int argc, char* argv[]){
// Sketch AB
// SAB = alpha * \op(S) * \op(AB) + beta * SAB
auto time_sketch1 = high_resolution_clock::now();
RandBLAS::sketch_general<double>(
RandBLAS::sketch_general(
blas::Layout::ColMajor, // Matrix storage layout of AB and SAB
blas::Op::NoTrans, // NoTrans => \op(S) = S, Trans => \op(S) = S^T
blas::Op::NoTrans, // NoTrans => \op(AB) = AB, Trans => \op(AB) = AB^T
6 changes: 3 additions & 3 deletions examples/total-least-squares/tls_sparse_skop.cc
@@ -32,8 +32,8 @@ void init_noisy_data(int64_t m, int64_t n, int64_t d, double* AB){
RandBLAS::RNGState state(0);
RandBLAS::RNGState state1(1);

RandBLAS::fill_dense<double>(Dist_A, AB, state); //Fill A to be a random gaussian
RandBLAS::fill_dense<double>(Dist_eps, eps, state1); //Fill A to be a random gaussian
RandBLAS::fill_dense(Dist_A, AB, state); //Fill A to be a random gaussian
RandBLAS::fill_dense(Dist_eps, eps, state1); //Fill eps with Gaussian noise

blas::gemm(blas::Layout::ColMajor, blas::Op::NoTrans, blas::Op::NoTrans, m, d, n, 1, AB, m, target_x, n, 0, &AB[m*n], m);

@@ -109,7 +109,7 @@ int main(int argc, char* argv[]){
// Sketch AB
// SAB = alpha * \op(S) * \op(AB) + beta * SAB
auto time_sketch1 = high_resolution_clock::now();
RandBLAS::sketch_general<double>(
RandBLAS::sketch_general(
blas::Layout::ColMajor, // Matrix storage layout of AB and SAB
blas::Op::NoTrans, // NoTrans => \op(S) = S, Trans => \op(S) = S^T
blas::Op::NoTrans, // NoTrans => \op(AB) = AB, Trans => \op(AB) = AB^T