Randomized Numerical Linear Algebra: Review and Progresses


1 Randomized Numerical Linear Algebra: Review and Progresses. Zhihua Zhang, Department of Computer Science and Engineering, Shanghai Jiao Tong University. The 12th China Workshop on Machine Learning and Applications, Xi'an, November 2014.

2 Randomized Numerical Linear Algebra is an interdisciplinary area among Theoretical Computer Science (TCS), Numerical Linear Algebra (NLA), and Modern Data Analysis. Many data mining and machine learning algorithms involve matrix decompositions, matrix inverses, and matrix determinants, and some methods are based on low-rank matrix approximation. The Big Data phenomenon brings new challenges and opportunities to machine learning and data mining.

3 Singular Value Decomposition (SVD). Input: an m × n data matrix A of rank r and an integer k less than r. The (condensed) SVD: A = UΣV^T, where U^T U = I_r, V^T V = I_r, and Σ = diag(σ_1, ..., σ_r) with σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0. Time complexity: O(mn min(m, n)). The truncated SVD: A_k = U_k Σ_k V_k^T, where U_k and V_k are the first k columns of U and V, and Σ_k is the top k × k sub-block of Σ. A_k is the closest rank-k approximation to A; that is, A_k = argmin_{rank(X) ≤ k} ‖A − X‖_ξ, where ξ = 2 is the matrix spectral norm and ξ = F is the matrix Frobenius norm. Time complexity: O(mnk).
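For concreteness, the truncated SVD described above can be computed with standard dense routines; the following numpy sketch (the matrix sizes and noise level are illustrative assumptions) forms A_k and reports its Frobenius error.

```python
import numpy as np

# Illustrative sizes: a 1000 x 300 matrix that is approximately rank 10.
rng = np.random.default_rng(0)
m, n, k = 1000, 300, 10
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n)) \
    + 0.01 * rng.standard_normal((m, n))

# Condensed SVD: an O(mn min(m, n)) dense computation.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD: keep only the top-k singular triplets.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius-norm error of the best rank-k approximation.
print(np.linalg.norm(A - A_k, 'fro'))
```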


6 The CUR Decomposition. A CUR decomposition algorithm seeks to find a subset of c columns of A to form a matrix C ∈ R^{m×c}, a subset of r rows to form a matrix R ∈ R^{r×n}, and an intersection matrix U ∈ R^{c×r} such that ‖A − CUR‖_ξ is minimized. The CUR decomposition results in an interpretable matrix approximation to A. There are (n choose c) possible choices for constructing C and (m choose r) possible choices for constructing R, so selecting the best subsets is a hard problem.

7 Kernel Methods. K: an n × n kernel matrix. Matrix inverse: b = (K + αI_n)^{-1} y; time complexity O(n^3); performed by Gaussian process regression, least squares SVM, and kernel ridge regression. Partial eigenvalue decomposition of K: time complexity O(n^2 k); performed by kernel PCA and some manifold learning methods. Space complexity: O(n^2); the iterative algorithms make many passes through the data, so one had better keep the entire kernel matrix in RAM; if the data does not fit in RAM, there is one swap between RAM and disk in each pass.
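As an illustration of the O(n^3) kernel solve mentioned above, here is a minimal numpy sketch of kernel ridge regression; the RBF kernel, the bandwidth, and the synthetic data are assumptions made only for the example.

```python
import numpy as np

# Illustrative synthetic data; the RBF kernel and bandwidth are assumptions.
rng = np.random.default_rng(0)
n, d, alpha = 1000, 5, 1e-2
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# n x n kernel matrix: O(n^2) storage is already the bottleneck for large n.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / (2.0 * d))

# b = (K + alpha * I_n)^{-1} y: an O(n^3) dense solve.
b = np.linalg.solve(K + alpha * np.eye(n), y)
print(b[:5])
```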

8 Approaches for Large-Scale Matrix Computations. Two typical approaches: incremental and distributed. Randomized algorithms have also been used.

9 Outline: 1 Randomized SVD; 2 Subspace Embedding; 3 Column Selection, CUR, and the Nyström Approximation; 4 References.


11 The Johnson–Lindenstrauss lemma was given by Johnson and Lindenstrauss (1984), but the proof was not constructive. Indyk and Motwani (1998) and Dasgupta and Gupta (2003) constructed a result based on a Gaussian random projection matrix R = [r_ij], where the r_ij are i.i.d. N(0, 1). Matoušek (2008) generalized the result to the case where the r_ij are any subgaussian random variables; that is, r_ij are i.i.d. G(ν^2) for ν ≥ 1.

12 Definition (ɛ-isometry). Given ɛ ∈ (0, 1), a map f : R^p → R^q, where p > q, is called an ɛ-isometry on a set X ⊂ R^p if for every pair x, y ∈ X we have (1 − ɛ)‖x − y‖_2^2 ≤ ‖f(x) − f(y)‖_2^2 ≤ (1 + ɛ)‖x − y‖_2^2. We consider the case where f is a linear map R ∈ R^{q×p}. The basic idea is to construct a random projection R ∈ R^{q×p} that is an exact isometry "in expectation"; that is, for every x ∈ R^p, E[‖Rx‖_2^2] = ‖x‖_2^2.

13 Theorem. Let X = {x_1, ..., x_n} ⊂ R^p, and let ɛ, δ ∈ (0, 1). Assume that R ∈ R^{q×p} (p > q) with r_ij ∼ G(ν^2) for some ν ≥ 1. If q ≥ 100 ν^2 ɛ^{-2} log(n/δ), then with probability at least 1 − δ, R is an ɛ-isometry on X; that is,
Pr{ sup_{y ∈ Y} | ‖Ry‖_2^2 − 1 | ≥ ɛ } ≤ δ,
where Y = { (x_i − x_j)/‖x_i − x_j‖_2 : x_i, x_j ∈ X, x_i ≠ x_j }.
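A minimal numerical check of this kind of statement: project a point set with a Gaussian matrix whose entries have variance 1/q (so that E‖Rx‖_2^2 = ‖x‖_2^2) and inspect the pairwise distance distortion. The choice q = 8 log(n)/ɛ^2 below is an illustrative constant, not the 100ν^2 of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, eps = 500, 10000, 0.2

# n points in R^p (illustrative data).
X = rng.standard_normal((n, p))

# Gaussian projection with N(0, 1/q) entries, so that E||Rx||^2 = ||x||^2.
q = int(np.ceil(8 * np.log(n) / eps ** 2))     # illustrative target dimension
R = rng.standard_normal((q, p)) / np.sqrt(q)
Y = X @ R.T                                     # projected points in R^q

# Check pairwise squared distances on a few random pairs.
i, j = rng.integers(0, n, 200), rng.integers(0, n, 200)
orig = ((X[i] - X[j]) ** 2).sum(1)
proj = ((Y[i] - Y[j]) ** 2).sum(1)
mask = orig > 0
print(np.max(np.abs(proj[mask] / orig[mask] - 1.0)))   # typically below eps
```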

14 Prototype for Randomized SVD. Given an m × n matrix A, a target number k of singular vectors, and an integer c such that k < c < min(m, n), a proto-algorithm based on random projection for the singular value decomposition (SVD) of A is as follows.
1. Construct an m × c column-orthonormal matrix Q and form B = Q^T A;
2. Compute the SVD of the small matrix: B = U_B Σ_B V_B^T;
3. Set Ũ = Q U_B;
4. Return Ũ Σ_B V_B^T as an approximate SVD of A, and Ũ_k Σ_{B,k} V_{B,k}^T as a truncated rank-k SVD of A.

15 A Proto-Algorithm for the Construction of Q. Let A be an m × n matrix and k a target number of singular vectors.
1. Generate an n × 2k Gaussian test matrix Ω;
2. Form Y = (AA^T)^γ AΩ, where γ = 1 or γ = 2;
3. Construct a matrix Q whose columns form an orthonormal basis for the range of Y.
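Putting the two proto-algorithms together, a short numpy sketch of the randomized SVD with γ power iterations follows; the oversampling to 2k columns matches the slides, while the test matrix used for the comparison at the end is an illustrative assumption.

```python
import numpy as np

def randomized_svd(A, k, gamma=1, seed=None):
    """Proto-algorithm sketch: Gaussian range finder with gamma power
    iterations, then an exact SVD of the small matrix B = Q^T A.
    Oversampling to 2k columns follows the slides; the rest is illustrative."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, 2 * k))     # Gaussian test matrix
    Y = A @ Omega
    for _ in range(gamma):                      # Y = (A A^T)^gamma A Omega
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                      # orthonormal basis for range(Y)
    B = Q.T @ A                                 # small 2k x n matrix
    U_B, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U_B, s, Vt                       # approximate SVD factors of A

# Usage on a matrix with decaying singular values (illustrative).
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 400)) @ np.diag(0.9 ** np.arange(400))
U, s, Vt = randomized_svd(A, k=10, gamma=1, seed=0)
A_10 = U[:, :10] @ np.diag(s[:10]) @ Vt[:10, :]
print(np.linalg.norm(A - A_10, 2))              # spectral-norm error
```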

16 Computational Complexity of the Randomized SVD. The randomized SVD procedure requires only 2(γ + 1) passes over the matrix. The flop count is (2γ + 2) k T_mult + O(k^2 (m + n)), where T_mult is the flop count of a matrix–vector multiply with A or A^T.

17 Theoretical Analysis of the Randomized SVD. Theorem (Halko et al., 2011). Let A ∈ R^{m×n}. Given an exponent γ and a target number k of singular vectors, where 2 ≤ k ≤ (1/2) min(m, n), the randomized SVD algorithm yields a rank-2k factorization Ũ_{2k} Σ̃_{2k} Ṽ_{2k}^T such that
E ‖A − Ũ_{2k} Σ̃_{2k} Ṽ_{2k}^T‖_2 ≤ [1 + 4√(2 min(m, n)/(k − 1))]^{1/(2γ+1)} σ_{k+1},
where the expectation is taken w.r.t. the random test matrix and σ_{k+1} is the (k+1)-th largest singular value of A.

18 Outline: 1 Randomized SVD; 2 Subspace Embedding; 3 Column Selection, CUR, and the Nyström Approximation; 4 References.

19 The Subspace Embedding Problem. For a fixed m × n matrix A of rank r and an error parameter ɛ ∈ (0, 1), we call S : R^m → R^k a subspace embedding matrix for A if (1 − ɛ)‖Ax‖_2 ≤ ‖SAx‖_2 ≤ (1 + ɛ)‖Ax‖_2 for all x ∈ R^n. The problem is to find such an embedding matrix obliviously. More specifically, one designs a distribution π over linear maps from R^m to R^k such that for any fixed m × n matrix A, if we choose S ∼ π, then with high probability S is a subspace embedding matrix for A.


21 Sparse Embedding Matrices. For a fixed m × n matrix A with m > n, let nnz(A) denote the number of non-zero entries of A. Assume that nnz(A) ≥ m and that there are no all-zero rows or columns in A. Let [m] = {1, 2, ..., m}. For a parameter k, define a random linear map ΦD : R^m → R^k as follows:
h : [m] → [k] is a random map such that for each i ∈ [m], h(i) = t for t ∈ [k] with probability 1/k;
Φ ∈ {0, 1}^{k×m} is a k × m binary matrix with φ_{h(i),i} = 1 and all remaining entries 0;
D is an m × m random diagonal matrix, with each diagonal entry independently chosen to be +1 or −1 with equal probability.
A matrix of the form S = ΦD is referred to as a sparse embedding matrix (Dasgupta et al., 2010; Clarkson and Woodruff, 2013).
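A sketch of this construction in scipy: the map h and the signs D are drawn at random, S = ΦD is stored as a sparse matrix with one nonzero per column, and applying it to a sparse A touches each nonzero of A once. The matrix sizes and the choice of k below are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def sparse_embedding(m, k, seed=None):
    """Sketch of S = Phi * D: one +/-1 entry per column of S, placed in the
    random row h(i). Stored here as a scipy CSR matrix (an implementation
    choice for the illustration)."""
    rng = np.random.default_rng(seed)
    h = rng.integers(0, k, size=m)                 # random map h: [m] -> [k]
    d = rng.choice([-1.0, 1.0], size=m)            # random signs (the matrix D)
    # S has entry d[i] at position (h[i], i) and zeros elsewhere.
    return sp.csr_matrix((d, (h, np.arange(m))), shape=(k, m))

# Applying S to a sparse A touches each nonzero once: O(nnz(A)) time.
rng = np.random.default_rng(0)
m, n = 100_000, 50
A = sp.random(m, n, density=1e-3, format='csr', random_state=0)
S = sparse_embedding(m, k=20_000, seed=1)          # illustrative sketch size
x = rng.standard_normal(n)
print(np.linalg.norm(A @ x), np.linalg.norm((S @ A) @ x))   # usually close
```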


26 Subspace Embedding in Input-Sparsity Time. Theorem (Meng and Mahoney, 2013). Let S = ΦD ∈ R^{k×m} with k = (n^2 + n)/(ɛ^2 δ). Then with probability at least 1 − δ,
(1 − ɛ)‖Ax‖_2 ≤ ‖SAx‖_2 ≤ (1 + ɛ)‖Ax‖_2 for all x ∈ R^n.
In addition, SA can be computed in O(nnz(A)) time.

27 Spectral Sparsifiers. Theorem (Batson, Spielman, and Srivastava, 2014). Suppose ρ > 1 and {v_1, v_2, ..., v_m} ⊂ R^n with Σ_{i≤m} v_i v_i^T = I_n. Then there exist scalars d_i ≥ 0, with |{i : d_i ≠ 0}| ≤ ρn, such that
(1 − 1/√ρ)^2 I_n ⪯ Σ_{i≤m} d_i v_i v_i^T ⪯ (1 + 1/√ρ)^2 I_n.
This theorem shows that λ_1(Σ_{i≤m} d_i v_i v_i^T) / λ_n(Σ_{i≤m} d_i v_i v_i^T) ≤ (√ρ + 1)^2 / (√ρ − 1)^2.

28 Outline: 1 Randomized SVD; 2 Subspace Embedding; 3 Column Selection, CUR, and the Nyström Approximation; 4 References.

29 Column Selection and the CX Decomposition. Given an m × n matrix A, column selection algorithms aim to find a matrix C consisting of c columns of A such that ‖A − CC^+ A‖_ξ = ‖(I_m − CC^+)A‖_ξ achieves the minimum. Here ξ = 2, ξ = F, and ξ = ∗ respectively denote the matrix spectral norm, the matrix Frobenius norm, and the matrix nuclear norm, and C^+ is the Moore–Penrose inverse of C. Let X be the best rank-k approximation to A in the column span of C; then CX is called the CX decomposition of A. Since there are (n choose c) possible choices for constructing C, selecting the best subset is a hard problem.

30 A Randomized Algorithm for Column Selection. Given an m × n matrix A and a rank parameter k, a random sampling algorithm based on statistical leverage scores is:
Compute the importance sampling probabilities {π_i}_{i=1}^n, where π_i = (1/k)‖v^{(i)}_k‖_2^2, V_k is the n × k orthonormal matrix spanning the top-k right singular subspace of A, and v^{(i)}_k is its i-th row;
Randomly select c = O(k log k / ɛ^2) columns of A according to these probabilities to form the matrix C.
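A minimal numpy sketch of leverage-score column sampling as just described; sampling with replacement, the absence of column rescaling, and the matrix sizes are simplifying assumptions for the illustration.

```python
import numpy as np

def leverage_score_columns(A, k, c, seed=None):
    """Leverage-score column sampling sketch: pi_i = ||i-th row of V_k||^2 / k,
    then c columns drawn i.i.d. from this distribution. Sampling with
    replacement and no column rescaling are simplifications."""
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :].T                              # n x k top-k right singular subspace
    pi = (Vk ** 2).sum(axis=1) / k                # probabilities, they sum to 1
    idx = rng.choice(A.shape[1], size=c, replace=True, p=pi)
    return A[:, idx], idx

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 30)) @ rng.standard_normal((30, 300)) \
    + 0.1 * rng.standard_normal((500, 300))
C, idx = leverage_score_columns(A, k=10, c=60, seed=1)
# Relative Frobenius error of projecting A onto the span of the sampled columns.
resid = A - C @ (np.linalg.pinv(C) @ A)
print(np.linalg.norm(resid, 'fro') / np.linalg.norm(A, 'fro'))
```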

31 Theoretical Result for Column Selection (Drineas et al., 2008). Let C_k be the best rank-k approximation to the matrix C, and define the projection matrix P_{C_k} = C_k C_k^+. Then ‖A − P_{C_k} A‖_F ≤ (1 + ɛ)‖A − A_k‖_F, where A_k = U_k Σ_k V_k^T is the best rank-k approximation to A.

32 The Adaptive Sampling Algorithm (Deshpande et al., 2006). Given a matrix A ∈ R^{m×n}, let C_1 ∈ R^{m×c_1} consist of c_1 columns of A, and define the residual B = A − C_1 C_1^+ A. For i = 1, ..., n, define π_i = ‖b_i‖_2^2 / ‖B‖_F^2. Sample a further c_2 columns from A i.i.d., in each trial choosing the i-th column with probability π_i. Let C_2 ∈ R^{m×c_2} contain the c_2 sampled columns and let C = [C_1, C_2] ∈ R^{m×(c_1+c_2)}. Then, for any integer k > 0, the following inequality holds:
E ‖A − CC^+ A‖_F^2 ≤ ‖A − A_k‖_F^2 + (k/c_2)‖A − C_1 C_1^+ A‖_F^2.
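One round of this adaptive sampling scheme can be sketched in a few lines of numpy; the initial column set C_1 and the sizes below are arbitrary illustrative choices.

```python
import numpy as np

def adaptive_sample_columns(A, C1, c2, seed=None):
    """One adaptive sampling round: sample c2 further columns with probability
    proportional to the squared column norms of the residual B = A - C1 C1^+ A."""
    rng = np.random.default_rng(seed)
    B = A - C1 @ (np.linalg.pinv(C1) @ A)      # residual after projecting onto span(C1)
    p = (B ** 2).sum(axis=0)
    p = p / p.sum()                            # pi_i = ||b_i||_2^2 / ||B||_F^2
    idx = rng.choice(A.shape[1], size=c2, replace=True, p=p)
    return np.hstack([C1, A[:, idx]])

rng = np.random.default_rng(0)
A = rng.standard_normal((400, 40)) @ rng.standard_normal((40, 250)) \
    + 0.05 * rng.standard_normal((400, 250))
C1 = A[:, rng.choice(250, size=20, replace=False)]     # arbitrary initial columns
C = adaptive_sample_columns(A, C1, c2=20, seed=1)
print(np.linalg.norm(A - C @ (np.linalg.pinv(C) @ A), 'fro'))
```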

33 The Near-Optimal Column Selection Algorithm. Boutsidis et al. (2013) derived a near-optimal algorithm, which consists of three steps: the approximate SVD via random projection (Halko et al., 2011); a dual-set sparsification algorithm, which extends the BSS spectral sparsifier; and the adaptive sampling algorithm (Deshpande et al., 2006).

34 The Near-Optimal Column Selection Algorithm. Theorem (Boutsidis et al., 2013). Given a matrix A ∈ R^{m×n} of rank ρ, a target rank k (2 ≤ k < ρ), and 0 < ɛ < 1, the algorithm selects c = (2k/ɛ)(1 + o(1)) columns of A to form a matrix C ∈ R^{m×c}. Then the following inequality holds:
E ‖A − CC^+ A‖_F^2 ≤ (1 + ɛ)‖A − A_k‖_F^2,
where the expectation is taken w.r.t. C. Furthermore, the matrix C can be obtained in time O(mk^2 ɛ^{-4/3} + nk^3 ɛ^{-2/3}) + T_Multiply(mnk ɛ^{-2/3}).

35 The CUR Decomposition (Drineas et al., 2008; Mahoney and Drineas, 2009). Given an m × n matrix A and integers c < n and r < m, the CUR decomposition of A finds C ∈ R^{m×c} with c columns from A, R ∈ R^{r×n} with r rows from A, and U ∈ R^{c×r} such that A = CUR + E. Here E = A − CUR is the residual error matrix.

36 The CUR Problem. Definition (the CUR problem). Given an m × n matrix A of rank ρ, a rank parameter k, and an accuracy parameter ɛ ∈ (0, 1), construct a matrix C ∈ R^{m×c} with c columns from A, a matrix R ∈ R^{r×n} with r rows from A, and a matrix U ∈ R^{c×r}, with c, r, and rank(U) as small as possible, such that ‖A − CUR‖_F^2 ≤ (1 + ɛ)‖A − A_k‖_F^2. Here A_k = U_k Σ_k V_k^T ∈ R^{m×n} is the best rank-k matrix obtained via the SVD of A: A = UΣV^T.

37 The Subspace Sampling CUR Algorithm. Drineas et al. (2008) proposed a two-stage randomized CUR algorithm called subspace sampling. The first stage samples c columns of A to construct C, with sampling probabilities proportional to the squared ℓ2-norms of the rows of V_k. The second stage samples r rows from A and C simultaneously to construct R and W, and sets U = W^+. The sampling probabilities in this stage are proportional to the leverage scores of A and C, respectively.

38 The Subspace Sampling CUR Algorithm (Drineas et al., 2008). Given an m × n matrix A and a target rank k ≪ min{m, n}, the subspace sampling algorithm selects c = O(k ɛ^{-2} log k log(1/δ)) columns and r = O(c ɛ^{-2} log c log(1/δ)) rows without replacement. Then
‖A − CUR‖_F = ‖A − CW^+ R‖_F ≤ (1 + ɛ)‖A − A_k‖_F
holds with probability at least 1 − δ, where W consists of the sampled rows of C with scaling. The running time is dominated by the truncated SVD of A, that is, O(mnk).

39 The Adaptive Sampling CUR Algorithm. Wang and Zhang (2013) proposed an adaptive sampling CUR algorithm:
Select c = (2k/ɛ)(1 + o(1)) columns of A to construct C ∈ R^{m×c} using the near-optimal algorithm of Boutsidis et al. (2013);
Select r_1 = c rows of A to construct R_1 ∈ R^{r_1×n} using the same algorithm of Boutsidis et al. (2013);
Adaptively sample r_2 = c/ɛ rows from A according to the residual A − AR_1^+ R_1;
Return C, R = [R_1^T, R_2^T]^T, and U = C^+ A R^+.
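The sketch below keeps the overall structure C, R, U = C^+ A R^+ of the algorithm, but, as a simplifying assumption, replaces the near-optimal and adaptive selection steps with plain leverage-score sampling of columns and rows; it is an illustration, not the algorithm of Wang and Zhang.

```python
import numpy as np

def simple_cur(A, k, c, r, seed=None):
    """Simplified CUR sketch: plain leverage-score sampling of columns and rows
    stands in for the near-optimal/adaptive selection of the actual algorithm;
    the intersection matrix U = C^+ A R^+ follows the slides."""
    rng = np.random.default_rng(seed)
    U_, _, Vt = np.linalg.svd(A, full_matrices=False)
    p_col = (Vt[:k, :] ** 2).sum(axis=0) / k       # column leverage scores
    p_row = (U_[:, :k] ** 2).sum(axis=1) / k       # row leverage scores
    cols = rng.choice(A.shape[1], size=c, replace=True, p=p_col)
    rows = rng.choice(A.shape[0], size=r, replace=True, p=p_row)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)  # intersection matrix
    return C, U, R

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 400)) @ np.diag(0.8 ** np.arange(400))
C, U, R = simple_cur(A, k=10, c=40, r=40, seed=1)
print(np.linalg.norm(A - C @ U @ R, 'fro') / np.linalg.norm(A, 'fro'))
```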

40 The Adaptive Sampling CUR Algorithm (Wang and Zhang, 2013). Given a matrix A ∈ R^{m×n} and a matrix C ∈ R^{m×c} such that rank(C) = rank(CC^+ A) = ρ (ρ ≤ c ≤ n), let R_1 ∈ R^{r_1×n} consist of r_1 rows of A and define the residual B = A − AR_1^+ R_1. For i = 1, ..., m, define p_i = ‖b^{(i)}‖_2^2 / ‖B‖_F^2. Sample a further r_2 rows from A i.i.d., in each trial choosing the i-th row with probability p_i. Let R_2 ∈ R^{r_2×n} contain the r_2 sampled rows and let R = [R_1^T, R_2^T]^T ∈ R^{(r_1+r_2)×n}. Then we have
E ‖A − CC^+ A R^+ R‖_F^2 ≤ ‖A − CC^+ A‖_F^2 + (ρ/r_2)‖A − AR_1^+ R_1‖_F^2.

41 The Adaptive Sampling CUR Algorithm. Theorem (Wang and Zhang, 2013). Given a matrix A ∈ R^{m×n} and a positive integer k ≪ min{m, n}, the adaptive sampling CUR algorithm randomly selects c = (2k/ɛ)(1 + o(1)) columns of A to construct C ∈ R^{m×c}, and then selects r = (c/ɛ)(1 + ɛ) rows of A to construct R ∈ R^{r×n}. Then we have
E ‖A − CUR‖_F = E ‖A − C(C^+ A R^+)R‖_F ≤ (1 + ɛ)‖A − A_k‖_F.
The algorithm costs time O((m + n)k^3 ɛ^{-2/3} + mk^2 ɛ^{-2} + nk^2 ɛ^{-4}) + T_Multiply(mnk ɛ^{-1}) to compute the matrices C, U, and R.

42 The Optimal CUR Algorithm. Boutsidis and Woodruff (2014) proposed an optimal CUR algorithm.
Construct C with O(k + k/ɛ) columns:
compute the top-k singular vectors of A, denoted Z_1;
sample O(k log k) columns from Z_1^T using the leverage scores;
down-sample to c_1 = O(k) columns with the sampling algorithm of Boutsidis et al. (2013);
adaptively sample c_2 = O(k/ɛ) columns of A.
Construct R with O(k + k/ɛ) rows:
find Z_2 in the span of C such that ‖A − Z_2 Z_2^T A‖_F^2 ≤ (1 + ɛ)‖A − A_k‖_F^2;
sample O(k log k) rows from Z_2 using the leverage scores;
down-sample to r_1 = O(k) rows with the sampling algorithm of Boutsidis et al. (2013);
sample r_2 = O(k/ɛ) rows with adaptive sampling.

43 The Optimal CUR Algorithm. Lemma (Boutsidis and Woodruff, 2014). Given a matrix A ∈ R^{m×n}, V ∈ R^{m×c}, and an integer k, let V = YΨ be a QR decomposition of V, let Γ = Y^T A, and let Γ_k = Δ Σ_k V_k^T be a rank-k SVD of Γ with Δ ∈ R^{c×k}. Then YΔΔ^T Y^T satisfies
‖A − YΔΔ^T Y^T A‖_F^2 ≤ ‖A − YΔ Σ_k V_k^T‖_F^2 = ‖A − Π^F_{V,k}(A)‖_F^2.

44 The Optimal CUR Algorithm. Theorem (Boutsidis and Woodruff, 2014). Given a matrix A ∈ R^{m×n} of rank ρ, a target rank 1 ≤ k < ρ, and 0 < ɛ < 1, the optimal CUR algorithm selects at most c = O(k/ɛ) columns and at most r = O(k/ɛ) rows from A to form matrices C ∈ R^{m×c}, R ∈ R^{r×n}, and U ∈ R^{c×r} with rank(U) = k such that, with constant probability,
‖A − CUR‖_F^2 ≤ (1 + O(ɛ))‖A − A_k‖_F^2.
The matrices C, U, and R can be computed in time O(nnz(A) log n + (m + n) poly(log n, k, 1/ɛ)).

45 The Nyström Method: select c (≪ n) columns of K to construct C using some randomized algorithm. After permutation we have
K = [ W, K_21^T ; K_21, K_22 ] and C = [ W ; K_21 ].
The Nyström approximation of K is
K_c^nys = C W^+ C^T,
where K_c^nys is n × n, C is n × c, W^+ is c × c, and C^T is c × n.
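A minimal numpy sketch of the standard Nyström approximation; uniform column sampling and the RBF kernel used to build K are illustrative assumptions.

```python
import numpy as np

def nystrom(K, c, seed=None):
    """Standard Nystrom sketch with uniform column sampling (an illustrative
    choice; any randomized column selection can be plugged in)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(K.shape[0], size=c, replace=False)
    C = K[:, idx]                       # n x c block of sampled columns
    W = K[np.ix_(idx, idx)]             # c x c intersection block
    return C @ np.linalg.pinv(W) @ C.T  # n x n low-rank approximation

# Illustrative RBF kernel matrix on synthetic points.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
K_nys = nystrom(K, c=100, seed=1)
print(np.linalg.norm(K - K_nys, 'fro') / np.linalg.norm(K, 'fro'))
```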


47 The Nyström Approximation. The Nyström approximation K ≈ K_c^nys = C W^+ C^T is a low-rank factorization: the n × n matrix K is approximated by the product of an n × c matrix, a c × c matrix, and a c × n matrix.

48 Problem Formulation. Problem: how do we select informative columns of K ∈ R^{n×n} to construct C ∈ R^{n×c}? The approximation error ‖K − CUC^T‖_F or ‖K − CUC^T‖_2 should be as small as possible.

49 Criterion: Upper Error Bounds. Use approximation algorithms to find c good columns (not necessarily the best). Hope that ‖K − CUC^T‖_F / ‖K − K_k‖_F has an upper bound; the smaller, the better.

50 Uniform Sampling: The Simplest Algorithm. Sample c columns of K uniformly at random to construct C. It is the simplest approach, but the most widely used.

51 Adaptive Sampling. The adaptive sampling algorithm [Deshpande et al., 2006]:
1. Sample c_1 columns of K to construct C_1 using some algorithm;
2. Compute the residual B = K − C_1 C_1^+ K;
3. Compute the sampling probabilities p_i = ‖b_i‖_2^2 / ‖B‖_F^2 for i = 1, ..., n;
4. Sample a further c_2 columns of K in c_2 i.i.d. trials, in each trial choosing the i-th column with probability p_i; denote the selected columns by C_2;
5. Return C = [C_1, C_2].

52 Adaptive Sampling. The error term ‖K − CC^+ K‖_F is bounded theoretically, but ‖K − CW^+ C^T‖_F is not. Empirically, the adaptive sampling algorithm works very well.

53 Better Sampling Algorithms? We hope that ‖K − CW^+ C^T‖_F / ‖K − K_k‖_F will be very small if the column sampling algorithm is good enough. But it cannot be arbitrarily small. Lower error bound. Theorem (Wang & Zhang, JMLR 2013). Whatever column sampling algorithm is used to select c columns, there exists a bad case K such that
‖K − CW^+ C^T‖_F^2 / ‖K − K_k‖_F^2 ≥ Ω(1 + nk/c^2).

54 Different Types of Low-Rank Approximation? The ensemble Nyström method [Kumar et al., JMLR 2012]:
K ≈ Σ_{i=1}^t (1/t) C^{(i)} W^{(i)+} C^{(i)T}.
It does not improve the lower error bound. Lower error bound. Theorem (Wang & Zhang, JMLR 2013). Whatever column sampling algorithm is used to select c columns, there exists a bad case K such that
‖K − Σ_{i=1}^t (1/t) C^{(i)} W^{(i)+} C^{(i)T}‖_F^2 / ‖K − K_k‖_F^2 ≥ Ω(1 + nk/c^2).

55 The Modified Nyström Method. The modified Nyström method [Wang & Zhang, JMLR 2013]:
K ≈ C (C^+ K (C^+)^T) C^T,
where the intersection matrix C^+ K (C^+)^T is c × c. Theorem (Wang & Zhang, JMLR 2013). Using a suitable column sampling algorithm, the error incurred by the modified Nyström method satisfies
E ‖K − C (C^+ K (C^+)^T) C^T‖_F^2 ≤ (1 + √(k/c))^2 ‖K − K_k‖_F^2.
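For comparison, a sketch of the modified intersection matrix U_mod = C^+ K (C^+)^T on the same sampled columns; the kernel matrix and the uniform column choice are illustrative. Since U_mod minimizes ‖K − CUC^T‖_F over U, its Frobenius error is never larger than that of the standard method on the same C.

```python
import numpy as np

def modified_nystrom(K, idx):
    """Modified Nystrom sketch: same sampled columns C as the standard method,
    but the intersection matrix is C^+ K (C^+)^T, which costs an extra
    O(n^2 c) multiplication to form."""
    C = K[:, idx]
    Cp = np.linalg.pinv(C)              # c x n pseudo-inverse of C
    U_mod = Cp @ K @ Cp.T               # c x c intersection matrix
    return C @ U_mod @ C.T

# Compare the two intersection matrices on the same uniformly sampled columns.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 4))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
idx = rng.choice(1000, size=80, replace=False)
W = K[np.ix_(idx, idx)]
err_std = np.linalg.norm(K - K[:, idx] @ np.linalg.pinv(W) @ K[:, idx].T, 'fro')
err_mod = np.linalg.norm(K - modified_nystrom(K, idx), 'fro')
print(err_std, err_mod)   # err_mod is never larger: U_mod minimizes the error over U
```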

56 Comparisons between the Two Methods. The standard Nyström method: fast. It costs only T_SVD(c^3) time to compute the intersection matrix U_nys = W^+. The modified Nyström method: slow. It costs T_SVD(nc^2) + T_Multiply(n^2 c) time to compute the intersection matrix U_mod = C^+ K (C^+)^T naively.

57 Comparisons between the Two Methods. The standard Nyström method: inaccurate. It cannot attain a 1 + ɛ Frobenius relative-error bound unless c ≥ √(nk/ɛ) columns are selected, whatever column selection algorithm is used (due to its lower error bound). The modified Nyström method: accurate. Some adaptive-sampling-based algorithms attain a 1 + ɛ Frobenius relative-error bound with c = O(k/ɛ^2) columns (the smaller c, the better).

58 Comparisons between the Two Methods. Theorem (Exact Recovery). For the symmetric matrix K defined previously, the following three statements are equivalent:
1. rank(W) = rank(K);
2. K = CW^+ C^T (i.e., the standard Nyström method is exact);
3. K = C (C^+ K (C^+)^T) C^T (i.e., the modified Nyström method is exact).

59 Outline: 1 Randomized SVD; 2 Subspace Embedding; 3 Column Selection, CUR, and the Nyström Approximation; 4 References.

60 References.
Santosh S. Vempala. The Random Projection Method. American Mathematical Society, 2004.
Michael W. Mahoney. Randomized Algorithms for Matrices and Data. Foundations and Trends in Machine Learning, 3(2), 2011.
N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2), 2011.
W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 1984.
S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 2003.
J. Matoušek. On variants of the Johnson–Lindenstrauss lemma. Random Structures & Algorithms, 2008.

61 References.
A. Dasgupta, R. Kumar, and T. Sarlós. A sparse Johnson–Lindenstrauss transform. In STOC, 2010.
K. L. Clarkson and D. P. Woodruff. Low rank approximation and regression in input sparsity time. In STOC, 2013.
X. Meng and M. W. Mahoney. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In STOC, 2013.
J. Nelson and H. L. Nguyễn. OSNAP: faster numerical linear algebra algorithms via sparser subspace embeddings. In FOCS, 2013.
J. Batson, D. Spielman, and N. Srivastava. Twice-Ramanujan sparsifiers. SIAM Review, 2014.

62 References.
A. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. In FOCS, 1998; Journal of the ACM, 2004.
A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2006.
C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix reconstruction. SIAM Journal on Computing, 2014.
V. Guruswami and A. K. Sinop. Optimal column-based low-rank matrix reconstruction. In SODA, 2012.

63 References.
P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 2008.
M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 2009.
Shusen Wang and Zhihua Zhang. Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. JMLR, 2013.
C. Boutsidis and D. P. Woodruff. Optimal CUR matrix decompositions. In STOC, 2014.
S. Kumar, M. Mohri, and A. Talwalkar. Sampling methods for the Nyström method. JMLR, 2012.
