arxiv: v1 [math.na] 29 Dec 2014


A CUR Factorization Algorithm based on the Interpolative Decomposition

Sergey Voronin and Per-Gunnar Martinsson

December 3, 2014

Abstract

An algorithm for the efficient computation of the CUR decomposition is presented. The method is based on simple modifications to the classical truncated pivoted QR decomposition, which means that highly optimized library codes can be utilized for its implementation. Numerical experiments demonstrate advantageous performance compared to existing techniques for computing CUR factorizations.

1 Introduction

In many applications, it is useful to approximate a matrix $A \in \mathbb{C}^{m \times n}$ by a factorization of rank $k < \min(m, n)$. When the singular values of $A$ decay sufficiently fast so that an accurate approximation can be obtained for a $k$ that is substantially smaller than either $m$ or $n$, great savings can be obtained both in terms of storage requirements and in terms of the speed of any computations involving $A$. The most accurate low rank approximation is the truncated SVD of rank $k$, which approximates $A$ via the product

\[
\underset{m \times n}{A} \;\approx\; \underset{m \times k}{U_k}\;\underset{k \times k}{\Sigma_k}\;\underset{k \times n}{V_k^*}, \tag{1.1}
\]

where $U_k$ and $V_k$ have orthonormal columns, and where $\Sigma_k$ is diagonal. The factorization (1.1) can be obtained from the full SVD of the matrix, but this is expensive. In contrast, fast randomized algorithms are available to construct such factorizations approximately; see e.g. the survey [8] and the references therein. One advantage of such factorizations is that they are, in many standard matrix norms, of optimal accuracy for any given $k$.

Recently a different factorization, the CUR, has been introduced [11]. This factorization approximately factors $A$ into a product

\[
\underset{m \times n}{A} \;\approx\; \underset{m \times k}{C}\;\underset{k \times k}{U}\;\underset{k \times n}{R}, \tag{1.2}
\]

where $C$ contains a subset of the columns of $A$ and $R$ contains a subset of the rows of $A$. The key advantage of the CUR is that the long factors $C$ and $R$ inherit properties such as sparsity or non-negativity from $A$. Also, the index sets that point out which columns and rows of $A$ to include in $C$ and $R$ often assist in data interpretation. There have been numerous discussions on how to obtain a CUR factorization (see e.g. [4, 14]), with some of the most recent and popular approaches relying on a method known as leverage scores [4, 11], a notion originating from statistics [9].

In this paper, we discuss a new algorithm for computing an approximate CUR factorization, which we call CUR ID. This algorithm is obtained by slight variations on classical rank-revealing QR factorizations [2] and is very easy to implement: the most expensive parts of the computation can all be done using standard packages. In addition, CUR ID lets the user leverage the power of carefully optimized multicore canned packages. This algorithm is shown to compare favorably in terms of both speed and accuracy with many proposed CUR algorithms. The manuscript also investigates the effect of accelerating existing schemes for constructing the CUR factorization by using the randomized algorithm of [8] to compute the singular vectors used to compute leverage scores.

Remark. The CUR factorization is closely related to the so called interpolative decomposition (ID), which decomposes $A$ as a product

\[
\underset{m \times n}{A} \;\approx\; \underset{m \times k}{X}\;\underset{k \times k}{A_{\rm skel}}\;\underset{k \times n}{Y}, \tag{1.3}
\]

where $A_{\rm skel}$ consists of a $k \times k$ submatrix of $A$. The ID allows for data interpretation in a manner entirely analogous to the CUR, but has a great advantage over the CUR in that it is inherently better conditioned. However, while the factors $X$ and $Y$ (which tend to require much more storage than $A_{\rm skel}$) are well-conditioned, they do not inherit properties such as sparsity or non-negativity.

2 Preliminaries

In this section we review some existing matrix decompositions, notably the interpolative decomposition (ID) [8], which is closely related to a pivoted QR decomposition. These decomposition techniques will be the building blocks for the CUR ID decomposition that we introduce in Section 3. We follow the notation of [6] (so called Matlab style notation): given any matrix $A$ and (ordered) subindex sets $J_r$ and $J_c$, $A(J_r, J_c)$ denotes the submatrix of $A$ obtained by extracting the rows and columns of $A$ indexed by $J_r$ and $J_c$, respectively; and $A(:, J_c)$ denotes the submatrix of $A$ obtained by extracting the columns of $A$ indexed by $J_c$. For any positive integer $k$, $1 : k$ denotes the ordered index set $(1, \dots, k)$. We take $\|\cdot\|$ to be the spectral or operator norm and $\|\cdot\|_p$ the $\ell_p$ norm.

2.1 Pivoted QR factorizations

Let $A$ be an $m \times n$ matrix with real or complex entries, and set $r = \min(m, n)$. The (compact) QR-factorization of $A$ then takes the form

\[
\underset{m \times n}{A}\;\underset{n \times n}{P} \;=\; \underset{m \times r}{Q}\;\underset{r \times n}{S}, \tag{2.1}
\]

where $P$ is a permutation matrix, $Q$ has orthonormal columns, and $S$ is upper triangular (the matrix we call $S$ is customarily labeled $R$, but we use that letter for one of the factors in the CUR decomposition). The permutation matrix $P$ can more efficiently be represented via a vector $J_c \in \mathbb{Z}_+^n$ of indices such that $P = I(:, J_c)$, where $I$ is the $n \times n$ identity matrix. The factorization (2.1) can then be written

\[
\underset{m \times n}{A(:, J_c)} \;=\; \underset{m \times r}{Q}\;\underset{r \times n}{S}. \tag{2.2}
\]

The QR-factorization is often computed via column pivoting combined with either the Gram-Schmidt process, Householder reflectors [6], or Givens rotations [3]. The resulting factor $S$ then satisfies various decay conditions [6], such as

\[
|S(j, j)| \;\ge\; \|S(j : m,\, l)\|_2 \quad \text{for all } j < l.
\]

The QR-factorization (2.2) expresses $A$ as a sum of $r$ rank-one matrices,

\[
A(:, J_c) \;=\; \sum_{j=1}^{r} Q(:, j)\, S(j, :).
\]

The QR-factorization is often built incrementally via a greedy algorithm such as column pivoted Gram-Schmidt. This opens up the possibility of stopping after the first $k$ terms have been computed and settling for a partial QR-factorization of $A$. We can express the error term by splitting the factors in (2.2) as follows:

\[
\underset{m \times n}{A(:, J_c)} \;=\; \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} S_1 \\ S_2 \end{bmatrix} \;=\; Q_1 S_1 + Q_2 S_2, \tag{2.3}
\]

where $Q_1$ is $m \times k$, $Q_2$ is $m \times (r-k)$, $S_1$ is $k \times n$, and $S_2$ is $(r-k) \times n$. Observe that since the SVD is optimal, it is always the case that $\sigma_{k+1}(A) \le \|Q_2 S_2\|$. We say that a factorization is a rank-revealing QR-factorization (RRQR) if the stronger statement $\|Q_2 S_2\| \le C\, \sigma_{k+1}(A)$ holds for some modest number $C$. (Some authors require additionally that $\sigma_j(Q_1 S_1) \approx \sigma_j(A)$ for $1 \le j \le k$.) Classical column pivoted Gram-Schmidt typically results in an RRQR, but there are counter-examples. More sophisticated versions such as [7] provably compute an RRQR in modest time, but are substantially harder to code, and the gain compared to standard methods is typically modest.

2.2 Low rank interpolative decomposition

An approximate interpolative decomposition (ID) of a matrix $A \in \mathbb{C}^{m \times n}$ is the approximate factorization

\[
\underset{m \times n}{A} \;\approx\; \underset{m \times k}{C}\;\underset{k \times n}{V^*}, \tag{2.4}
\]

where the partial column skeleton $C \in \mathbb{C}^{m \times k}$ is given by a subset of the columns of $A$ and $V$ is well-conditioned in a sense that we will make precise shortly. The interpolative decomposition approximates $A$ using only some of its columns, and one of the advantages of doing so is that the more compact description of the range of $A$ given by its skeleton preserves some of the properties of the original matrix $A$, such as sparsity and non-negativity. In this section we show one way of obtaining a low rank interpolative decomposition, via the truncated QR with column pivoting.

From (2.3), we see that as long as $\|S_2\|_2$ is small, we can approximate $A(:, J_c)$ by $Q_1 S_1$. We show that the approximation term $Q_1 S_1$ provides an ID of the matrix $A$. In fact, the approximation term $Q_1 S_1$ is the image of a skeleton of $A$, i.e., the range of $Q_1 S_1$ is contained in the span of $k$ columns of $A$. Splitting the columns of $S_1$ and $S_2$ as follows:

\[
S_1 = \begin{bmatrix} S_{11} & S_{12} \end{bmatrix}, \qquad
S_2 = \begin{bmatrix} 0 & S_{22} \end{bmatrix}, \qquad
\Big(\text{i.e., } S = \begin{bmatrix} S_{11} & S_{12} \\ 0 & S_{22} \end{bmatrix}\Big), \tag{2.5}
\]

where $S_{11}$ is $k \times k$, $S_{12}$ is $k \times (n-k)$, and $S_{22}$ is $(r-k) \times (n-k)$, it is immediate that

\[
A(:, J_c) \;=\; Q_1 \begin{bmatrix} S_{11} & S_{12} \end{bmatrix} + Q_2 \begin{bmatrix} 0 & S_{22} \end{bmatrix}
\;=\; \begin{bmatrix} Q_1 S_{11} & Q_1 S_{12} + Q_2 S_{22} \end{bmatrix}.
\]

In other words, the partial column skeleton of $A$ given by the first $k$ columns of $A(:, J_c)$ equals $Q_1 S_{11}$:

\[
C := A(:, J_c(1 : k)) = Q_1 S_{11}.
\]

Then we have

\[
Q_1 S_1 = \begin{bmatrix} Q_1 S_{11} & Q_1 S_{12} \end{bmatrix} = Q_1 S_{11} \begin{bmatrix} I_k & S_{11}^{-1} S_{12} \end{bmatrix} = C \begin{bmatrix} I_k & T_l \end{bmatrix},
\]

where $T_l$ is the solution to the matrix equation $S_{11} T_l = S_{12}$. Thus, we have the approximation

\[
A \approx C V^*, \quad \text{where } V^* = \begin{bmatrix} I_k & T_l \end{bmatrix} P^T. \tag{2.6}
\]

This approximate decomposition is known as an interpolative decomposition (ID) of the matrix $A$. The approximation error of the ID obtained via truncated QR with pivoting is the same as that of the truncated QR: $\|A - C V^*\| = \|Q_2 S_{22}\|$.

Notice also that the approximation $A \approx C V^*$, where $C$ contains a subset of the columns of $A$, can be replaced by a similar approximation using a subset of the rows of $A$. Such a factorization can be obtained by performing pivoted QR on $A^*$ instead of $A$. That is, we can obtain $A^* \approx A^*(:, J_r(1 : k))\, Z^*$ from an ID of $A^*$. This implies

\[
A \approx Z\, A(J_r(1 : k), :).
\]

In this approximation, we are representing $A$ via a partial row skeleton.

2.3 Two sided interpolative decomposition

A two sided ID approximation for matrices is constructed via two successive one sided IDs. Assume that we have performed the one sided decomposition to obtain (2.6). Then we can perform an ID on $C^*$: that is, there exist a row index vector $J_r$ and some $W \in \mathbb{C}^{m \times k}$ such that $C^* = C^*(:, J_r(1 : k))\, W^*$ (where equality holds because $\operatorname{rank}(C) \le k$). This essentially amounts to doing a full pivoted QR decomposition on $C^*$, which can often be done efficiently via canned routines in numerical libraries. Hence

\[
C = W\, C(J_r(1 : k), :) = W\, A(J_r(1 : k), J_c(1 : k)).
\]

Thus, we obtain the approximation to the original matrix $A$:

\[
A \approx C V^* = W\, A(J_r(1 : k), J_c(1 : k))\, V^*. \tag{2.7}
\]

Note that the approximation error in (2.7) is the same as for the one sided ID in (2.6): it is the product term $Q_2 S_2$ from the truncated QR decomposition we performed.

Notice also that the two sided ID approach provides another way to obtain a row skeleton decomposition. Since $A \approx C V^*$, if we take a subset of the rows of $A$, via the index set $J_r$ obtained from doing an ID on $C^*$, we get

\[
C(J_r(1 : k), :)\, V^* \approx A(J_r(1 : k), :).
\]

If we set $R := A(J_r(1 : k), :)$ and left-multiply both sides by $W$, we obtain

\[
W\, C(J_r(1 : k), :)\, V^* \approx W\, A(J_r(1 : k), :) = W R,
\]

but the left-hand side satisfies $W\, C(J_r(1 : k), :)\, V^* = C V^* \approx A$, so that

\[
A \approx W R, \tag{2.8}
\]

with $R$ a partial row skeleton of $A$. (See (3.5) in Lemma 3.1, which quantifies the approximation error in (2.8).)

2.4 The randomized interpolative decomposition

We note that performing $k$ steps of the pivoted QR algorithm on a big matrix is costly. Instead of performing QR on the matrix $A$, we can perform the decomposition on the smaller matrix product $Y = \Omega A$, where $\Omega$ is a Gaussian random matrix of size $(k + p) \times m$. Hence $Y$ is of size $(k + p) \times n$, substantially smaller than $m \times n$. Here, $p$ is an oversampling parameter which is typically a small number like 1 [8]. Having performed the pivoted QR on $Y$, we can then remove $p$ columns of the resulting matrix $Q$ and $p$ rows of the resulting matrix $S$, and then proceed as before to construct the matrix $V$. We note that the randomized algorithm can also be applied to the two sided ID decomposition, where we perform the first QR on a smaller projected matrix followed by a full QR on $C^*$. The resulting algorithm tends to be significantly more efficient, without adverse effects on accuracy.

2.5 The CUR Decomposition

A CUR factorization of a matrix $A \in \mathbb{C}^{m \times n}$ is given by

\[
\underset{m \times n}{A} \;\approx\; \underset{m \times k}{C}\;\underset{k \times k}{U}\;\underset{k \times n}{R},
\]

where $C$ consists of $k$ columns of $A$, and $R$ consists of $k$ rows of $A$. The decomposition is typically obtained in three steps [12]. First, some scheme is used to assign a weight, or so called leverage score (of importance), to each column and row in the matrix. This is typically done either using the $\ell_2$ norms of the columns and rows or by using the leading singular vectors of $A$ [5]. Next, the matrices $C$ and $R$ are constructed via a randomized sampling procedure using the weights computed in the prior step. Finally, the $U$ matrix is computed via

\[
U \approx C^{\dagger} A R^{\dagger}, \tag{2.9}
\]

with $C^{\dagger}$ and $R^{\dagger}$ being the pseudoinverses of $C$ and $R$. Note that $C$ and $R$, containing spanning columns and spanning rows of $A$, are in the ideal case expected to have a singular value distribution similar to that of $A$. Since low-rank factorizations are most useful when applied to matrices whose singular values decay reasonably rapidly, we would typically expect $C$ and $R$ to be highly ill-conditioned, with condition numbers roughly on the order of $\sigma_1(A)/\sigma_k(A)$. Hence, in the typical case, evaluation of the formula (2.9) can be expected to result in substantial loss of accuracy due to accumulation of round-off errors.

Many techniques for computing CUR factorizations have been proposed. In particular, we mention the recent work of Sorensen and Embree [13] on the DEIM-CUR method. A number of standard CUR algorithms are implemented in the software package rcur [1], which we use for our numerical comparisons. The methods in the rcur package utilize singular vectors to assign weights to columns and rows of $A$. Computing the singular vectors exactly amounts to doing the SVD, which is very expensive. However, when a CUR of rank $k$ is required, we can utilize the randomized SVD algorithm [8] instead of the full SVD to compute an approximate SVD of rank $k$ at substantially lower cost.

3 The CUR ID algorithm

We now present an algorithm for computing the CUR decomposition based on the two sided ID. The difference between recently popularized algorithms for CUR computation and CUR ID is in the choice of columns and rows of $A$ for forming $C$ and $R$. In the CUR ID algorithm, the columns and rows are chosen via the two sided ID. The idea behind the use of the ID for obtaining the CUR factorization is that the matrix $C$ in the CUR factorization is immediately available from the ID (see (2.6)), and the matrix $V \in \mathbb{C}^{n \times k}$ not only captures a rough row space description of $A$ but also is of rank at most $k$. An ID on $C^*$, being an exact factorization of $C^*$ (which is of rank at most $k$), can indicate the relevant rows of $A$ that approximate the entire row space of $A$ itself. Specifically, similar to (2.6), where approximating $\operatorname{range}(A)$ using $C$ incurs an error term $[\,0 \;\; Q_2 S_{22}\,]$, we can estimate the error of approximating $\operatorname{range}(A^*)$ using $A(J_r(1 : k), :)$; see Lemma 3.1 below.

The CUR ID algorithm we propose here is based on the two sided ID factorization from (2.7). We start by setting

\[
C = A(:, J_c(1 : k)) \quad \text{and} \quad R = A(J_r(1 : k), :), \tag{3.1}
\]

i.e., $C$ and $R$ are respectively subsets of the columns and of the rows of $A$, with $J_c$ and $J_r$ determined by the pivoted QR factorizations. Next we construct a matrix $U \in \mathbb{C}^{k \times k}$ such that $A \approx CUR$ by appealing to the one and two sided ID decompositions. From (2.6), we have that $A \approx C V^*$. We proceed by replacing the right-hand side $C V^* = A(:, J_c(1 : k))\, V^*$ by the matrix product $CUR$, recognizing that we previously chose $C$ and $R$ as in (3.1). This results in the equations

\[
A(:, J_c(1 : k))\, V^* = C U R = A(:, J_c(1 : k))\; U\; A(J_r(1 : k), :),
\]

satisfied by some matrix $U \in \mathbb{C}^{k \times k}$.
In particular, the above leads to the matrix equation

\[
U R = V^*, \tag{3.2}
\]

which holds for some $U$ if and only if the row space of $R = A(J_r(1 : k), :)$ contains the row space of the factor $V^*$ from the ID. If we can solve for the matrix $U$ from the least squares system $(R R^*)\, U^* = R V$, or equivalently $U = V^* R^{\dagger}$, then the matrix product $CUR$ will approximate the original matrix $A$.

Let us now consider the error in approximating the matrix $A$ by $CUR$. Recall that the error $E$ in the ID is the same as for the truncated QR. That is, $A = C V^* + E$. Let us also introduce an error term $\tilde{E}$ which relates $A$ and $W R$ (from (2.8)). That is,

\[
A = C V^* + E = W R + \tilde{E}.
\]

We also know that $U = V^* R^{\dagger}$, from above. Thus, we have that

\[
A - CUR = A - C V^* R^{\dagger} R = A - (A - E) R^{\dagger} R = A - A R^{\dagger} R + E R^{\dagger} R.
\]

Since $R^{\dagger} R$ is an orthogonal projection, we have that $\|R^{\dagger} R\| \le 1$ and $\|I - R^{\dagger} R\| \le 1$. Since we expect $\|E\|$ to be small, the term $\|E R^{\dagger} R\| \le \|E\|\, \|R^{\dagger} R\|$ would also be expected to be small. Thus, it remains to look at the term $A - A R^{\dagger} R$. Notice that

\[
A R^{\dagger} R = (W R + \tilde{E}) R^{\dagger} R = W R + \tilde{E} R^{\dagger} R = A - \tilde{E} + \tilde{E} R^{\dagger} R
\;\Longrightarrow\;
A - A R^{\dagger} R = \tilde{E} - \tilde{E} R^{\dagger} R.
\]

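This implies that

\[
A - CUR = \tilde{E} - \tilde{E} R^{\dagger} R + E R^{\dagger} R = \tilde{E}\,(I - R^{\dagger} R) + E R^{\dagger} R
\]

and hence

\[
\|A - CUR\| \;\le\; \|\tilde{E}\|\, \|I - R^{\dagger} R\| + \|E\|\, \|R^{\dagger} R\| \;\le\; \|\tilde{E}\| + \|E\|.
\]

To assure ourselves that the norm of $A - CUR$ can be made small for sufficiently large $k$, we must show that $\|\tilde{E}\|$ becomes small. We do so by relating $\tilde{E}$ to $E$ below.

Lemma 3.1 Let $A \in \mathbb{C}^{m \times n}$. Suppose that

\[
A = C V^* + E = W R + \tilde{E}, \tag{3.3}
\]

where $C = A(:, J_c(1 : k))$ for some permutation $J_c$ of $\{1, \dots, n\}$, $V \in \mathbb{C}^{n \times k}$ and $E \in \mathbb{C}^{m \times n}$, and where, similarly, $R = A(J_r(1 : k), :)$ for some permutation $J_r$ of $\{1, \dots, m\}$, $W \in \mathbb{C}^{m \times k}$ and $\tilde{E} \in \mathbb{C}^{m \times n}$. If (via the ID decomposition of $C^*$) we have that $J_r$ and $W$ satisfy

\[
C = W\, C(J_r(1 : k), :), \quad \text{and} \quad W(J_r, :) = \begin{bmatrix} I_k \\ T_r^* \end{bmatrix} \tag{3.4}
\]

for some $T_r \in \mathbb{C}^{k \times (m - k)}$, then the error term $\tilde{E}$ is related to $E$ (in the truncated QR) via

\[
\tilde{E} = A - W R = A - W A(J_r(1 : k), :) = P \begin{bmatrix} 0 \\ E(J_r(k + 1 : m), :) - T_r^*\, E(J_r(1 : k), :) \end{bmatrix} \tag{3.5}
\]

for some permutation matrix $P \in \mathbb{R}^{m \times m}$.

Proof. Let $P \in \mathbb{R}^{m \times m}$ be the permutation matrix defined by $P = I(:, J_r)$. Then

\[
W = P \begin{bmatrix} I_k \\ T_r^* \end{bmatrix}.
\]

Then (3.4) implies that

\[
\begin{bmatrix} I_k \\ T_r^* \end{bmatrix} C(J_r(1 : k), :) = P^T C = C(J_r, :) = \begin{bmatrix} C(J_r(1 : k), :) \\ C(J_r(k + 1 : m), :) \end{bmatrix}
\;\Longrightarrow\;
C(J_r(k + 1 : m), :) = T_r^*\, C(J_r(1 : k), :).
\]

Hence, using (3.3), we get that

\[
\begin{aligned}
T_r^*\, A(J_r(1 : k), :) &= T_r^*\, C(J_r(1 : k), :)\, V^* + T_r^*\, E(J_r(1 : k), :) \\
&= C(J_r(k + 1 : m), :)\, V^* + T_r^*\, E(J_r(1 : k), :) \\
&= A(J_r(k + 1 : m), :) - E(J_r(k + 1 : m), :) + T_r^*\, E(J_r(1 : k), :).
\end{aligned}
\]

It follows that

\[
\begin{aligned}
W R = W A(J_r(1 : k), :) &= P \begin{bmatrix} I_k \\ T_r^* \end{bmatrix} A(J_r(1 : k), :) = P \begin{bmatrix} A(J_r(1 : k), :) \\ T_r^*\, A(J_r(1 : k), :) \end{bmatrix} \\
&= P \begin{bmatrix} A(J_r(1 : k), :) \\ A(J_r(k + 1 : m), :) - E(J_r(k + 1 : m), :) + T_r^*\, E(J_r(1 : k), :) \end{bmatrix} \\
&= P \begin{bmatrix} A(J_r(1 : k), :) \\ A(J_r(k + 1 : m), :) \end{bmatrix} - P \begin{bmatrix} 0 \\ E(J_r(k + 1 : m), :) - T_r^*\, E(J_r(1 : k), :) \end{bmatrix} \\
&= P A(J_r, :) - P \begin{bmatrix} 0 \\ E(J_r(k + 1 : m), :) - T_r^*\, E(J_r(1 : k), :) \end{bmatrix} \\
&= A - P \begin{bmatrix} 0 \\ E(J_r(k + 1 : m), :) - T_r^*\, E(J_r(1 : k), :) \end{bmatrix}.
\end{aligned}
\]

This proves (3.5).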
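The bound $\|A - CUR\| \le \|\tilde{E}\| + \|E\|$ and the identity (3.5) can be checked numerically on a small example. The following is a minimal sketch, assuming NumPy and SciPy are available; the helper name `column_id`, the matrix sizes, and the test spectrum are illustrative choices and not part of the paper. The code stores $V^*$ (as `Vh`) rather than $V$.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular


def column_id(A, k):
    """Pivots J and Vh (k x n) with A ~ A[:, J[:k]] @ Vh, via column-pivoted QR."""
    _, S, J = qr(A, mode="economic", pivoting=True)    # A[:, J] = Q @ S
    Vh = np.empty((k, A.shape[1]), dtype=S.dtype)
    Vh[:, J[:k]] = np.eye(k)                           # skeleton columns are reproduced exactly
    Vh[:, J[k:]] = solve_triangular(S[:k, :k], S[:k, k:])  # solves S11 @ T = S12
    return J, Vh


def norm2(M):
    return np.linalg.norm(M, 2)                        # spectral norm


# Illustrative test matrix (hypothetical sizes) with decaying singular values.
rng = np.random.default_rng(0)
m, n, k = 60, 40, 10
X, _ = np.linalg.qr(rng.standard_normal((m, n)))
Y, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (X * np.logspace(0, -12, n)) @ Y.T

Jc, Vh = column_id(A, k)                  # A = C V^* + E
C = A[:, Jc[:k]]
Jr, Wh = column_id(C.conj().T, k)         # C^* = C^*[:, Jr[:k]] W^*  (exact up to rounding)
W = Wh.conj().T
R = A[Jr[:k], :]
U = Vh @ np.linalg.pinv(R)                # U = V^* R^dagger

E = A - C @ Vh
E_tilde = A - W @ R
Tr_h = W[Jr[k:], :]                       # T_r^*, because W[Jr, :] = [I_k; T_r^*]

# Right-hand side of (3.5): zero in rows Jr[:k], E2 - T_r^* E1 in rows Jr[k:].
predicted = np.zeros_like(A)
predicted[Jr[k:], :] = E[Jr[k:], :] - Tr_h @ E[Jr[:k], :]

lhs = norm2(A - C @ U @ R)
rhs = norm2(E_tilde) + norm2(E)
print(f"||A - CUR|| = {lhs:.3e}   ||E~|| + ||E|| = {rhs:.3e}")
print("max deviation from (3.5):", np.abs(E_tilde - predicted).max())
```

The check relies on $R$ having full row rank, so that `np.linalg.pinv(R) @ R` is the orthogonal projection onto the row space of $R$.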

Notice that the entries of $T_r$ are expected to be small in absolute magnitude [1], which implies that $\|\tilde{E}\|$ can be made small when $k$ is large enough so that $\|E\|$ is sufficiently small.

4 Numerics

In this section, we present numerical comparisons between the proposed CUR ID algorithm and previously proposed schemes, specifically those implemented in the rcur package [1]. We compare the proposed CUR ID algorithm against three versions of CUR algorithms, as executed by the rcur package:

CUR-1 The full SVD is computed and provided to rcur, and then the orthogonal top scores option is chosen. This is an expensive scheme that we believe represents the best possible performance in rcur.

CUR-2 The full SVD is computed and provided to rcur, and then the top scores option is chosen. This is a fairly expensive procedure (since it uses the full SVD) that reflects a common way that leverage scores are used.

CUR-3 This is the same as CUR-2, but instead of providing the full SVD, we provide approximations to the top singular values and singular vectors, as computed using the randomized SVD in [8].

Our first set of test matrices (Set 1) involves matrices $A$ of size 1 3, of the form $A = U D V^*$ where $U$ and $V$ are random orthonormal matrices, and $D$ is a diagonal matrix with entries that are logspaced between 1 and 1 b, for b = 2, 4, 6. The second set (Set 2) are simply the transposes of the matrices in Set 1 (so these are matrices of size 3 1). Figure 1 plots the median relative errors between the matrix $A$ and the corresponding factorization, taken over 5 trials. We observe from the plots that the performance of the CUR ID algorithm is comparable with the scheme CUR-1, which uses the full SVD of $A$.

From Figure 1, we also see that using the randomized SVD in CUR for weights instead of the full SVD does not significantly increase the error. Hence, in Figure 2, we compare the performance and runtimes of the top scores CUR algorithm with the randomized SVD and the CUR ID algorithm using the randomized ID, as described in this text. This comparison allows us to test two algorithms which can be used in practice on large matrices, since they both involve randomization. We again use random matrices constructed as above whose singular values are logspaced, ranging from 1 to 1 4, but of larger size: 5 8. We notice that the accuracy of both schemes is similar but the runtime of the randomized CUR ID algorithm is lower. The plotted quantities are again medians over 5 trials.

In Figure 3, we repeat the experiment with the two matrices $A_1$ and $A_2$ defined in the preprint [13]. The matrices $A_1, A_2 \in \mathbb{R}^{300{,}000 \times 300}$ are constructed as follows:

\[
A_1 = \sum_{j=1}^{10} \frac{2}{j}\, x_j y_j^T + \sum_{j=11}^{300} \frac{1}{j}\, x_j y_j^T
\quad \text{and} \quad
A_2 = \sum_{j=1}^{10} \frac{1000}{j}\, x_j y_j^T + \sum_{j=11}^{300} \frac{1}{j}\, x_j y_j^T,
\]

where $x_j$ and $y_j$ are sparse vectors with random non-negative entries. One problem with using traditional CUR algorithms for these matrices stems from the fact that their singular values decay fairly rapidly. Because of this, the factorizations are often computed for a small $k$, but if we are to use a randomized scheme to compute the approximate low rank SVD of the matrices, we must typically take a higher $k$ than we need in order to get more accurate singular vectors. Also, the ill-conditioning of the matrices requires the inversion of ill-conditioned matrices in traditional CUR algorithms, which adversely affects accuracy. In Figure 3, we show the medians of relative errors versus $k$ over 5 trials.
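Test matrices of the form $A = U D V^*$ with logspaced singular values, and the median relative error over several trials, could be set up along the following lines. This is a minimal NumPy sketch under stated assumptions: the diagonal entries are taken to be logspaced between 1 and $10^{-b}$, the sizes are small hypothetical values chosen for illustration, the orthonormal factors come from QR factorizations of Gaussian matrices, and the rank-$k$ truncated SVD stands in for whichever factorization is being evaluated. The function names are not from the paper.

```python
import numpy as np


def logspaced_test_matrix(m, n, b, rng):
    """A = U D V^* with orthonormal U, V and diag(D) logspaced between 1 and 10**(-b)."""
    r = min(m, n)
    U, _ = np.linalg.qr(rng.standard_normal((m, r)))   # m x r, orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))   # n x r, orthonormal columns
    d = np.logspace(0, -b, r)
    return (U * d) @ V.T, d


def relative_error(A, A_approx):
    """Spectral-norm relative error ||A - A_approx|| / ||A||."""
    return np.linalg.norm(A - A_approx, 2) / np.linalg.norm(A, 2)


def truncated_svd(A, k):
    """Best rank-k approximation, used here as the reference factorization."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vh[:k, :]


rng = np.random.default_rng(0)
m, n, b, k, trials = 50, 150, 4, 10, 5                 # hypothetical sizes
errors = []
for _ in range(trials):
    A, d = logspaced_test_matrix(m, n, b, rng)
    errors.append(relative_error(A, truncated_svd(A, k)))
median_error = np.median(errors)
print(f"median relative error of the rank-{k} SVD: {median_error:.3e}")
```

For the truncated SVD the relative error equals $\sigma_{k+1}/\sigma_1$ in every trial, which gives a simple sanity check before substituting a CUR or ID routine for `truncated_svd`.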

In Figure 4, we have an image compression experiment, using CUR ID and CUR with the full SVD and ortho top score scheme in rcur. We take a 6 44 black and white image, and transform the matrix using a 2D CDF 9/7 wavelet transform. We then threshold the result, leaving a sparse matrix $M$ with about 3% nonzeros. We then construct a low rank CUR approximation of this wavelet thresholded matrix (with k = 44, 44, 22) to further compress the image data. Obviously, when k = 44 is used, we must store more data than with the wavelet thresholded matrix, since we must store the three matrices $C$, $U$, and $R$, which corresponds to about 5 times more nonzeros. However, for k = 44 and k = 22 we obtain compression ratios of a factor of 4 and 8, respectively, when comparing the total number of nonzeros in the matrices $C$, $U$, and $R$ against the wavelet transformed thresholded representation $M$. To reconstruct the image from compressed samples, we perform the inverse CDF 9/7 wavelet transform on the matrix product $CUR$, which approximates the wavelet thresholded matrix. From the plots, we see that CUR ID often produces a $U$ which is less ill-conditioned than the $U$ matrix obtained with the regular CUR algorithm, leading to significantly better reconstructions.

Thus, in each case, we observe comparable or even better accuracy with CUR ID than with existing CUR algorithms. For large matrices, the only plausible options are to use an existing CUR algorithm with a randomized SVD or to use CUR ID with the randomized ID. We find that for random matrices the accuracy is similar, but CUR ID is easier to implement and is generally more efficient. Also, as in the case of the imaging example we present, existing CUR algorithms suffer from a badly conditioned $U$ matrix when the original matrix is not well conditioned. The $U$ matrix returned by the CUR ID algorithm tends to be better conditioned.

[Figure 1 plots; vertical axis: log relative error; curves: CUR-1, CUR-2, CUR-3, CUR ID, ID2S, RSVD, SVD.]

Figure 1: Relative errors for differently conditioned matrices approximated with various algorithms. Left: fat matrices (1 3), right: thin matrices (3 1). Top to bottom: faster drop off of logspaced singular values.

[Figure 2 plots; left panel: log relative error for CUR RSVD, CUR R ID, RSVD, SVD; right panel: elapsed times in seconds for CUR RSVD, CUR R ID, RSVD.]

Figure 2: Relative errors and elapsed times for the CUR top scores scheme with randomized SVD and CUR ID with the randomized ID, using larger matrices.

[Figure 3 plots; vertical axis: log relative error; curves: CUR SVD, CUR RSVD, CUR R ID.]

Figure 3: Relative errors versus k for matrices $A_1$ and $A_2$ from [13] approximated using the CUR with RSVD and CUR ID with randomized ID schemes.


[Figure 4 images and plots; column 3 panel title: SVDs of U mats; curves: U svd, U id, A.]

Figure 4: Set IV: reconstructed images with increasing compression ratios ( 51, 4, 8). Images resulting from applying the inverse wavelet transform to the matrix product $CUR$ obtained with rcur with SVD in column 1 and with CUR ID in column 2. Column 3 plots: singular value distributions of the $U$ matrices for the two algorithms compared.

5 Conclusions

In this paper, we present a new algorithm for computing the CUR decomposition. The algorithm relies on the ID decomposition, which follows directly from the truncated pivoted QR factorization. The new algorithm provides a more direct and more efficient way to compute the CUR factorization. Our numerical tests support this statement, while revealing that our CUR ID algorithm often leads to better results than the CUR decomposition obtained with existing methods. We demonstrate that while the CUR factorization is inherently ill-conditioned, the loss of accuracy is smaller in the proposed CUR ID scheme than in factorizations produced by existing methods (this difference can be appreciable, cf. Figure 4). For larger matrices, the CUR ID algorithm can be used with a randomized ID decomposition, which corresponds to a pivoted QR factorization on a significantly smaller matrix than the original. Hence, CUR ID can be used in place of existing algorithms for computing the CUR decomposition for both small and large matrices.

Work is currently under way to compare the proposed scheme to the recently published DEIM-CUR of Sorensen and Embree [13], and the comparison will be published shortly.

Acknowledgement

The research reported was supported by the Defense Advanced Research Projects Agency under the contract N , and by the National Science Foundation under contracts and

References

[1] András Bodor, István Csabai, Michael Mahoney, and Norbert Solymosi. rcur: an R package for CUR matrix decomposition. BMC Bioinformatics, 13(1).

[2] Tony F. Chan. Rank revealing QR factorizations. Linear Algebra and its Applications, 88-89:67-82.

[3] Tony F. Chan. Rank revealing QR factorizations. Linear Algebra Appl., 88/89:67-82.

[4] Petros Drineas, Michael W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM J. Matrix Anal. Appl., 30(2).

[5] Petros Drineas, Michael W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM J. Matrix Anal. Appl., 30(2), September.

[6] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, fourth edition.

[7] Ming Gu and Stanley C. Eisenstat. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Sci. Comput., 17(4), July.

[8] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2).

[9] David C. Hoaglin and Roy E. Welsch. The hat matrix in regression and ANOVA. The American Statistician, 32(1):17-22.

[10] Edo Liberty, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert. Randomized algorithms for the low-rank approximation of matrices. Proceedings of the National Academy of Sciences, 104(51).

[11] Michael W. Mahoney and Petros Drineas. CUR matrix decompositions for improved data analysis. Proc. Natl. Acad. Sci. USA, 106(3):697-702, 2009. With supplementary material available online.

[12] Nikola Mitrovic, Muhammad Tayyab Asif, Umer Rasheed, Justin Dauwels, and Patrick Jaillet. CUR decomposition for compression and compressed sensing of large-scale traffic data. Proceedings of the 16th International IEEE Annual Conference on Intelligent Transportation Systems.

[13] D. C. Sorensen and M. Embree. A DEIM Induced CUR Factorization. ArXiv e-prints, July.

[14] Shusen Wang and Zhihua Zhang. Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. J. Mach. Learn. Res., 14.

Appendix: Pseudocodes

In this section, we give the pseudocode for the ID and CUR algorithms mentioned in this paper.

Algorithm 1: An ID decomposition
Input: $A \in \mathbb{C}^{m \times n}$ and parameter $k < \min(m, n)$.
Output: A column index set $J_c$ and a matrix $V \in \mathbb{C}^{n \times k}$ (such that $A \approx A(:, J_c(1 : k))\, V^*$).
1  Perform a pivoted QR factorization to get $AP \approx Q_1 S_1$; % i.e., $P \in \mathbb{R}^{n \times n}$ is a permutation matrix, $Q_1 \in \mathbb{C}^{m \times k}$ has orthonormal columns and $S_1 \in \mathbb{C}^{k \times n}$ is upper triangular
2  Define the ordered index set $J_c$ via $I(:, J_c) = P$;
3  Partition $S_1$: $S_{11} \leftarrow S_1(:, 1 : k)$, $S_{12} \leftarrow S_1(:, k + 1 : n)$;
4  $V \leftarrow P \begin{bmatrix} I_k & S_{11}^{-1} S_{12} \end{bmatrix}^*$;

Algorithm 2: A randomized ID decomposition
Input: $A \in \mathbb{C}^{m \times n}$, parameter $k < \min(m, n)$, and oversampling parameter $p < k$.
Output: A column index set $J_c$ and a matrix $V \in \mathbb{C}^{n \times k}$ (such that $A \approx A(:, J_c(1 : k))\, V^*$).
1  Construct a random matrix $\Omega \in \mathbb{R}^{(k+p) \times m}$;
2  Construct the sample matrix $Y = \Omega A$;
3  Perform a full pivoted QR factorization on $Y$ to get $YP = QS$;
4  Remove $p$ columns of $Q$ and $p$ rows of $S$ to construct $Q_1$ and $S_1$;
5  Define the ordered index set $J_c$ via $I(:, J_c) = P$;
6  Partition $S_1$: $S_{11} \leftarrow S_1(:, 1 : k)$, $S_{12} \leftarrow S_1(:, k + 1 : n)$;
7  $V \leftarrow P \begin{bmatrix} I_k & S_{11}^{-1} S_{12} \end{bmatrix}^*$;
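Algorithms 1 and 2 can be sketched in a few lines on top of a library pivoted QR. The following is a minimal illustration, assuming SciPy's `scipy.linalg.qr` with `pivoting=True`; the function names, the default oversampling `p=5`, and the test matrix are hypothetical choices made for this example and are not the authors' implementation. The deterministic variant computes a full pivoted QR and truncates it, rather than stopping after $k$ steps.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular


def _interp_matrix(S1, Jc, k):
    """Return V (n x k) with V^* = [I_k, T] P^T, where T solves S11 @ T = S12."""
    T = solve_triangular(S1[:, :k], S1[:, k:])         # upper triangular solve
    Vh = np.empty((k, S1.shape[1]), dtype=S1.dtype)
    Vh[:, Jc[:k]] = np.eye(k)                          # skeleton columns
    Vh[:, Jc[k:]] = T                                  # interpolation coefficients
    return Vh.conj().T


def interp_decomp_qr(A, k):
    """Sketch of Algorithm 1: ID from a (truncated) column-pivoted QR of A."""
    _, S, Jc = qr(A, mode="economic", pivoting=True)   # A[:, Jc] = Q @ S
    return Jc, _interp_matrix(S[:k, :], Jc, k)


def interp_decomp_rand(A, k, p=5, rng=None):
    """Sketch of Algorithm 2: ID computed from the sample matrix Y = Omega @ A."""
    rng = np.random.default_rng() if rng is None else rng
    Omega = rng.standard_normal((k + p, A.shape[0]))   # Gaussian test matrix
    _, S, Jc = qr(Omega @ A, mode="economic", pivoting=True)
    return Jc, _interp_matrix(S[:k, :], Jc, k)         # keep the first k rows of S


# Illustrative test matrix (hypothetical sizes) with decaying singular values.
rng = np.random.default_rng(0)
m, n, k = 60, 40, 10
sigma = np.logspace(0, -12, n)
X, _ = np.linalg.qr(rng.standard_normal((m, n)))
Y, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (X * sigma) @ Y.T

Jc_d, V_d = interp_decomp_qr(A, k)
Jc_r, V_r = interp_decomp_rand(A, k, rng=rng)
err_d = np.linalg.norm(A - A[:, Jc_d[:k]] @ V_d.conj().T, 2)
err_r = np.linalg.norm(A - A[:, Jc_r[:k]] @ V_r.conj().T, 2)
print(f"sigma_(k+1) = {sigma[k]:.2e}, QR-based ID error = {err_d:.2e}, "
      f"randomized ID error = {err_r:.2e}")
```

Both routines return the pivot vector $J_c$ and $V$ as an $n \times k$ array, so the approximation is formed as `A[:, Jc[:k]] @ V.conj().T`.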

Algorithm 3: A two sided ID decomposition
Input: $A \in \mathbb{C}^{m \times n}$ and parameter $k < \min(m, n)$.
Output: A column index set $J_c$, a row index set $J_r$, and matrices $V \in \mathbb{C}^{n \times k}$ and $W \in \mathbb{C}^{m \times k}$ (such that $A \approx W\, A(J_r(1 : k), J_c(1 : k))\, V^*$).
1  Perform a one sided ID of $A$ so that $A \approx C V^*$, where $C = A(:, J_c(1 : k))$;
2  Perform a full rank ID on $C^*$ so that $C^* = C^*(:, J_r(1 : k))\, W^*$;

Algorithm 4: A CUR ID algorithm
Input: $A \in \mathbb{C}^{m \times n}$ and parameter $k < \min(m, n)$.
Output: Matrices $C \in \mathbb{C}^{m \times k}$, $R \in \mathbb{C}^{k \times n}$, and $U \in \mathbb{C}^{k \times k}$ (such that $A \approx CUR$).
1  Construct a two sided ID of $A$ so that $A \approx W\, A(J_r(1 : k), J_c(1 : k))\, V^*$;
2  Construct matrices $C = A(:, J_c(1 : k))$ and $R = A(J_r(1 : k), :)$;
3  Construct matrix $U$ via $U = V^* R^{\dagger}$;
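Algorithms 3 and 4 then amount to two calls to a one sided ID followed by a pseudoinverse. The sketch below assumes SciPy's pivoted QR and NumPy's `pinv`; the function names, sizes, and test matrix are illustrative assumptions, and the deterministic (non-randomized) one sided ID is used in both steps.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular


def column_id(A, k):
    """One sided ID: pivots J and V (n x k) with A ~ A[:, J[:k]] @ V^*."""
    _, S, J = qr(A, mode="economic", pivoting=True)    # A[:, J] = Q @ S
    Vh = np.empty((k, A.shape[1]), dtype=S.dtype)
    Vh[:, J[:k]] = np.eye(k)
    Vh[:, J[k:]] = solve_triangular(S[:k, :k], S[:k, k:])
    return J, Vh.conj().T


def two_sided_id(A, k):
    """Sketch of Algorithm 3: A ~ W @ A[Jr[:k]][:, Jc[:k]] @ V^*."""
    Jc, V = column_id(A, k)
    C = A[:, Jc[:k]]
    Jr, W = column_id(C.conj().T, k)                   # full rank ID of C^*
    return Jc, Jr, V, W


def cur_id(A, k):
    """Sketch of Algorithm 4: C and R from the two sided ID, U = V^* R^dagger."""
    Jc, Jr, V, _ = two_sided_id(A, k)
    C = A[:, Jc[:k]]
    R = A[Jr[:k], :]
    U = V.conj().T @ np.linalg.pinv(R)
    return C, U, R


# Illustrative test matrix (hypothetical sizes) with decaying singular values.
rng = np.random.default_rng(0)
m, n, k = 60, 40, 10
sigma = np.logspace(0, -12, n)
X, _ = np.linalg.qr(rng.standard_normal((m, n)))
Y, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (X * sigma) @ Y.T

Jc, Jr, V, W = two_sided_id(A, k)
C, U, R = cur_id(A, k)
err = np.linalg.norm(A - C @ U @ R, 2)
print(f"sigma_(k+1) = {sigma[k]:.2e}, ||A - CUR||_2 = {err:.2e}")
```

Replacing the first `column_id` call in `two_sided_id` by a randomized ID would give the randomized variant described in Section 2.4, while the second call operates on the small matrix $C^*$ and can stay deterministic.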
