Simple MATLAB code for iterative computing an SVD or PCA via the randomized block Krylov method analyzed in Randomized Block Krylov Methods for Stronger and Faster Approximate Singular Value Decomposition
Download bksvd.m
, add to MATLAB path, or include directly in project directory.
bksvd
can be used as a drop-in replacement for MATLAB's svds
function for computing the Singular Value Decomposition (SVD). It can also be used for Principal Component Analysis (PCA) which performs an SVD on the after mean-centering the data matrix.
Input:
bksvd(A, k, iter, bsize, center)
A
: matrix to decomposek
: number of singular vectors to compute, default = 6iter
: number of iterations, default = 3, increase for higher accuracybsize
: block size, must be >= k, default = kcenter
: set totrue
if A's rows should be mean-centered before the singular value decomposition (e.g. when performing PCA), default =false
Output: k singular vector/value pairs
U
: an orthogonal matrix whose columns are approximate top k left singular vectors forA
S
: a diagonal matrix whose entries areA
's approximate top k singular valuesV
: an orthogonal matrix whose columns are approximate top k right singular vectors forA
U*S*V'
is a near optimal low-rank approximation for A
Standard Singular Values Decomposition:
% generate test matrix
s = 1.5.^[-40:.5:40];
A = randn(10000,161)*diag(s)*randn(161,161);
% compute SVD
[U,S,V] = bksvd(A,10);
bksvd
is typically as accurate as svds
and often faster:
tic; [U,S,V] = svds(A,30); toc;
Elapsed time is 1.380471 seconds.
norm(A- U*S*V')/norm(A)
0.0018
tic; [U,S,V] = bksvd(A,30); toc;
Elapsed time is 0.062798 seconds.
norm(A- U*S*V')/norm(A)
0.0018
Principal Component Analysis:
For PCA, A
's rows (data points) should be mean-centered before computing the SVD. If the center
flag is set to true
, bksvd
can do this implicitly, without densifying A
:
[U,S,V] = bksvd(A,10,4,10,true);
Here V
contains loading vector for the top 10 principal components. U*S
can be taken as a dimensionality reduction for the data to 10 components.
For higher accuracy (at the cost of slower runtime), the number of iterations can be increased from the default of 3
, although for many matrices this is unecessary. Increasing the block size, bsize
, so that it is > k also increases accuracy. For matrices with quickly decaying singular values, increasing block size can be more effective than increasing iterations. For details, see the NIPS paper.
An implementation of bksvd
is available through MLPACK. We plan to upload a Python implementation soon.