Randomized Numerical Linear Algebra: Review and Progresses
1 Randomized Numerical Linear Algebra: Review and Progresses
Zhihua Zhang
Department of Computer Science and Engineering, Shanghai Jiao Tong University
The 12th China Workshop on Machine Learning and Applications, Xi'an, November 2014
2 Randomized Numerical Linear Algebra
An interdisciplinary field among Theoretical Computer Science (TCS), Numerical Linear Algebra (NLA), and Modern Data Analysis.
Many data mining and machine learning algorithms involve matrix decomposition, matrix inversion, and matrix determinants; some methods are based on low-rank matrix approximation.
The Big Data phenomenon brings new challenges and opportunities to machine learning and data mining.
3 Singular Value Decomposition (SVD)
Input: an m × n data matrix A of rank r and an integer k less than r.
The (condensed) SVD: A = U Σ V^T, where U^T U = I_r, V^T V = I_r, and Σ = diag(σ_1, ..., σ_r) with σ_1 ≥ σ_2 ≥ ··· ≥ σ_r > 0. Time complexity: O(mn min(m, n)).
The truncated SVD: A_k = U_k Σ_k V_k^T, where U_k and V_k are the first k columns of U and V, and Σ_k is the top k × k sub-block of Σ. A_k is the closest rank-k approximation to A; that is,
A_k = argmin_{rank(X) ≤ k} ‖A − X‖_ξ,
where ξ = 2 denotes the matrix spectral norm and ξ = F the matrix Frobenius norm. Time complexity: O(mnk).
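As a quick numerical illustration (a NumPy sketch, not part of the slides), the condensed and truncated SVD and the Eckart–Young optimality can be checked directly; the matrix and the rank parameter here are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))   # an m x n data matrix (example sizes)
k = 5

# Condensed SVD: A = U diag(s) V^T with singular values sorted decreasingly
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD: keep the first k columns of U, V and the top k x k block
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# By the Eckart-Young theorem, the spectral-norm error is sigma_{k+1}
err = np.linalg.norm(A - A_k, 2)
```

Here `err` should coincide with `s[k]` (the (k+1)-th singular value), which is exactly the optimality statement above.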
6 The CUR Decomposition
A CUR decomposition algorithm seeks a subset of c columns of A to form a matrix C ∈ R^{m×c}, a subset of r rows to form a matrix R ∈ R^{r×n}, and an intersection matrix U ∈ R^{c×r} such that ‖A − CUR‖_ξ is minimized.
The CUR decomposition yields an interpretable matrix approximation to A.
There are (n choose c) possible ways of constructing C and (m choose r) possible ways of constructing R, so selecting the best subsets is a hard problem.
7 Kernel Methods
K: an n × n kernel matrix.
Matrix inversion, b = (K + αI_n)^{−1} y; time complexity O(n^3). Performed in Gaussian process regression, least squares SVM, and kernel ridge regression.
Partial eigenvalue decomposition of K; time complexity O(n^2 k). Performed in kernel PCA and some manifold learning methods.
Space complexity: O(n^2). The iterative algorithms make many passes through the data, so one had better keep the entire kernel matrix in RAM; if the data does not fit in RAM, each pass incurs a swap between RAM and disk.
8 Approaches for Large-Scale Matrix Computations
Two typical approaches: incremental and distributed. Randomized algorithms have also been used.
9 Outline
1 Randomized SVD
2 Subspace Embedding
3 Column Selection, the CUR Decomposition, and the Nyström Method
4 References
11 The Johnson–Lindenstrauss Lemma
The lemma was given by Johnson and Lindenstrauss (1984), but the proof was not constructive. Indyk and Motwani (1998) and Dasgupta and Gupta (2003) constructed a result based on a Gaussian random projection matrix R = [r_ij] with r_ij ~iid N(0, 1). Matoušek (2008) generalized the result to the case where the r_ij are any subgaussian random variables; that is, r_ij ~iid G(ν^2) for ν ≥ 1.
12 Random Projection
Definition (ɛ-isometry). Given ɛ ∈ (0, 1), a map f : R^p → R^q with p > q is called an ɛ-isometry on a set X ⊂ R^p if for every pair x, y ∈ X,
(1 − ɛ) ‖x − y‖_2^2 ≤ ‖f(x) − f(y)‖_2^2 ≤ (1 + ɛ) ‖x − y‖_2^2.
We consider the case where f is a linear map R ∈ R^{q×p}. The basic idea is to construct a random projection R that is an "exact isometry in expectation"; that is, for every x ∈ R^p, E[‖Rx‖_2^2] = ‖x‖_2^2.
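A small numerical check (an illustration, not from the slides): scaling a Gaussian matrix by 1/sqrt(q) gives entries of variance 1/q, so the map is an exact isometry in expectation, and the empirical mean of ‖Rx‖_2^2 over independent draws of R should concentrate around ‖x‖_2^2. The dimensions and trial count below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 1000, 50
x = rng.standard_normal(p)          # a fixed vector in R^p

trials = 500
sq_norms = np.empty(trials)
for t in range(trials):
    # R in R^{q x p} with entries N(0, 1/q), so E[||Rx||^2] = ||x||^2
    R = rng.standard_normal((q, p)) / np.sqrt(q)
    sq_norms[t] = np.linalg.norm(R @ x) ** 2

ratio = sq_norms.mean() / np.linalg.norm(x) ** 2   # concentrates around 1
```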
13 Random Projection
Theorem. Let X = {x_1, ..., x_n} ⊂ R^p, and let ɛ, δ ∈ (0, 1). Assume R ∈ R^{q×p} (p > q) with r_ij ~ G(ν^2) for some ν ≥ 1. If q ≥ 100 ν^2 ɛ^{−2} log(n/δ), then with probability at least 1 − δ, R is an ɛ-isometry on X; that is,
Pr{ sup_{y ∈ Y} | ‖Ry‖_2^2 − 1 | ≥ ɛ } ≤ δ,
where Y = { (x_i − x_j)/‖x_i − x_j‖_2 : x_i, x_j ∈ X, x_i ≠ x_j }.
14 Prototype for Randomized SVD
Given an m × n matrix A, a target number k of singular vectors, and an integer c such that k < c < min(m, n), a proto-algorithm based on random projection for the Singular Value Decomposition (SVD) of A is as follows.
1 Construct an m × c column-orthonormal matrix Q and form B = Q^T A;
2 Compute the SVD of the small matrix: B = U_B Σ_B V_B^T;
3 Set Ũ = Q U_B;
4 Return Ũ Σ_B V_B^T as an approximate SVD of A, and Ũ_k Σ_{B,k} V_{B,k}^T as a truncated SVD of A.
15 A Proto-Algorithm for Construction of the Matrix Q
Let A be an m × n matrix, and let k be a target number of singular vectors.
1 Generate an n × 2k Gaussian test matrix Ω.
2 Form Y = (A A^T)^γ A Ω, where γ = 1 or γ = 2.
3 Construct a matrix Q whose columns form an orthonormal basis for the range of Y.
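The two pieces above (the range finder and the SVD of the small matrix) can be sketched in a few lines of NumPy. This is an illustrative implementation under the slides' description, not reference code; the oversampled sketch width 2k and the power exponent γ follow the proto-algorithm:

```python
import numpy as np

def randomized_svd(A, k, gamma=1, seed=0):
    """Proto-algorithm sketch: Gaussian range finder with gamma power
    iterations, then an exact SVD of the small matrix B = Q^T A."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, 2 * k))   # Gaussian test matrix
    Y = A @ Omega                             # Y = (A A^T)^gamma A Omega
    for _ in range(gamma):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                    # orthonormal basis of range(Y)
    B = Q.T @ A                               # small 2k x n matrix
    U_B, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U_B, s, Vt                     # approximate SVD factors of A
```

When A has exact rank k, the sketch captures the whole range of A and the reconstruction is (numerically) exact; for general A the error is controlled by the theorem quoted later.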
16 Computational Complexity of the Randomized SVD
The randomized SVD procedure requires only 2(γ + 1) passes over the matrix. The flop count is (2γ + 2) k T_mult + O(k^2 (m + n)), where T_mult is the flop count of a matrix–vector multiply with A or A^T.
17 Theoretical Analysis of the Randomized SVD
Theorem (Halko et al., 2011). Let A ∈ R^{m×n}. Given an exponent γ and a target number k of singular vectors, where 2 ≤ k ≤ (1/2) min(m, n), the randomized SVD algorithm yields a rank-2k factorization Ũ_{2k} Σ̃_{2k} Ṽ_{2k}^T with
E ‖A − Ũ_{2k} Σ̃_{2k} Ṽ_{2k}^T‖_2 ≤ [1 + 4 sqrt(2 min(m, n)/(k − 1))]^{1/(2γ+1)} σ_{k+1},
where the expectation is taken w.r.t. the random test matrix and σ_{k+1} is the (k + 1)-th largest singular value of A.
18 Outline
1 Randomized SVD
2 Subspace Embedding
3 Column Selection, the CUR Decomposition, and the Nyström Method
4 References
19 The Subspace Embedding Problem
For a fixed m × n matrix A of rank r and an error parameter ɛ ∈ (0, 1), we call S : R^m → R^k a subspace embedding matrix for A if
(1 − ɛ) ‖Ax‖_2 ≤ ‖SAx‖_2 ≤ (1 + ɛ) ‖Ax‖_2 for all x ∈ R^n.
The problem is to find such an embedding matrix obliviously. More specifically, one designs a distribution π over linear maps from R^m to R^k such that for any fixed m × n matrix A, if we choose S ~ π, then with high probability S is an embedding matrix for A.
21 Sparse Embedding Matrices
For a fixed m × n matrix A with m > n, let nnz(A) denote the number of non-zero entries of A. Assume that nnz(A) ≥ m and that there are no all-zero rows or columns in A. Let [m] = {1, 2, ..., m}. For a parameter k, define a random linear map ΦD : R^m → R^k as follows:
h : [m] → [k] is a random map such that for each i ∈ [m], h(i) = t for t ∈ [k] with probability 1/k;
Φ ∈ {0, 1}^{k×m} is a k × m binary matrix with φ_{h(i),i} = 1 and all remaining entries 0;
D is an m × m random diagonal matrix, with each diagonal entry independently chosen to be +1 or −1 with equal probability.
A matrix of the form S = ΦD is referred to as a sparse embedding matrix (Dasgupta et al., 2010; Clarkson and Woodruff, 2013).
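Because each column of S = ΦD has exactly one non-zero entry, SA can be formed in a single pass over the rows of A, i.e. in O(nnz(A)) work for sparse A. A dense-matrix sketch of this map (illustrative code under the definition above, not the papers' implementation):

```python
import numpy as np

def sparse_embed(A, k, seed=0):
    """Apply a sparse embedding S = Phi D to A in one pass over the rows:
    row i of A is sign-flipped by D and added to bucket h(i)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    h = rng.integers(0, k, size=m)          # random map h : [m] -> [k]
    d = rng.choice([-1.0, 1.0], size=m)     # diagonal of D
    SA = np.zeros((k, n))
    for i in range(m):
        SA[h[i]] += d[i] * A[i]
    return SA
```

As with the Gaussian map, E[‖S y‖_2^2] = ‖y‖_2^2 for any fixed y, which can be checked by averaging over independent draws of S.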
26 Subspace Embedding in Input-Sparsity Time
Theorem (Meng and Mahoney, 2013). Let S = ΦD ∈ R^{k×m} with k = (n^2 + n)/(ɛ^2 δ). Then with probability at least 1 − δ,
(1 − ɛ) ‖Ax‖_2 ≤ ‖SAx‖_2 ≤ (1 + ɛ) ‖Ax‖_2 for all x ∈ R^n.
In addition, SA can be computed in O(nnz(A)) time.
27 Spectral Sparsifiers
Theorem (Batson, Spielman and Srivastava, 2014). Suppose ρ > 1 and v_1, v_2, ..., v_m ∈ R^n satisfy Σ_{i≤m} v_i v_i^T = I_n. Then there exist scalars d_i ≥ 0, with |{i : d_i ≠ 0}| ≤ ρn, such that
(1 − 1/sqrt(ρ))^2 I_n ≼ Σ_{i≤m} d_i v_i v_i^T ≼ (1 + 1/sqrt(ρ))^2 I_n.
This theorem shows that
λ_1(Σ_{i≤m} d_i v_i v_i^T) / λ_n(Σ_{i≤m} d_i v_i v_i^T) ≤ (sqrt(ρ) + 1)^2 / (sqrt(ρ) − 1)^2.
28 Outline
1 Randomized SVD
2 Subspace Embedding
3 Column Selection, the CUR Decomposition, and the Nyström Method
4 References
29 Column Selection and the CX Decomposition
Given an m × n matrix A, column selection algorithms aim to find a matrix C with c columns of A such that ‖A − CC^+ A‖_ξ = ‖(I_m − CC^+)A‖_ξ achieves the minimum. Here ξ = 2, ξ = F, and ξ = ∗ respectively denote the matrix spectral norm, the matrix Frobenius norm, and the matrix nuclear norm, and C^+ is the Moore–Penrose inverse of C.
Let X be the best rank-k approximation to A in the column span of C. Then CX is called the CX decomposition of A.
Since there are (n choose c) possible ways of constructing C, selecting the best subset is a hard problem.
30 A Randomized Algorithm for Column Selection
Given an m × n matrix A and a rank parameter k, random sampling based on the statistical leverage scores proceeds as follows:
Compute the importance sampling probabilities {π_i}_{i=1}^n, where π_i = ‖(V_k)_{(i)}‖_2^2 / k, V_k is an n × k orthonormal matrix spanning the top-k right singular subspace of A, and (V_k)_{(i)} denotes the i-th row of V_k;
Randomly select c = O(k log k / ɛ^2) columns of A according to these probabilities to form the matrix C.
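A minimal NumPy sketch of this sampling step (illustrative, with an exact SVD standing in for the fast approximate computation of V_k). Since V_k has orthonormal columns, its squared row norms sum to k, so the π_i form a probability distribution:

```python
import numpy as np

def leverage_column_sample(A, k, c, seed=0):
    """Sample c columns of A with rank-k leverage-score probabilities
    pi_i = ||i-th row of V_k||^2 / k, which sum to one."""
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V_k = Vt[:k].T                               # n x k, top-k right singular vectors
    pi = np.sum(V_k ** 2, axis=1) / k            # leverage-score probabilities
    idx = rng.choice(A.shape[1], size=c, p=pi)   # c i.i.d. column draws
    return A[:, idx], pi
```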
31 Theoretical Result for Column Selection (Drineas et al., 2008)
Let C_k be the best rank-k approximation to the matrix C, and define the projection matrix P_{C_k} = C_k C_k^+. Then
‖A − P_{C_k} A‖_F ≤ (1 + ɛ) ‖A − A_k‖_F,
where A_k = U_k Σ_k V_k^T is the best rank-k approximation to A.
32 The Adaptive Sampling Algorithm (Deshpande et al., 2006)
Given a matrix A ∈ R^{m×n}, let C_1 ∈ R^{m×c_1} consist of c_1 columns of A, and define the residual B = A − C_1 C_1^+ A. Additionally, for i = 1, ..., n, define π_i = ‖b_i‖_2^2 / ‖B‖_F^2. We further sample c_2 columns i.i.d. from A, where in each trial the i-th column is chosen with probability π_i. Let C_2 ∈ R^{m×c_2} contain the c_2 sampled columns and let C = [C_1, C_2] ∈ R^{m×(c_1+c_2)}. Then, for any integer k > 0, the following inequality holds:
E ‖A − CC^+ A‖_F^2 ≤ ‖A − A_k‖_F^2 + (k/c_2) ‖A − C_1 C_1^+ A‖_F^2.
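One round of this adaptive step can be sketched as follows (a NumPy illustration of the residual-norm sampling above, not the paper's code). Appending columns can only shrink the projection residual, which the bound quantifies in expectation:

```python
import numpy as np

def adaptive_sample(A, C1, c2, seed=0):
    """Pick c2 further columns of A with probability proportional to the
    squared column norms of the residual B = A - C1 C1^+ A."""
    rng = np.random.default_rng(seed)
    B = A - C1 @ (np.linalg.pinv(C1) @ A)            # residual matrix
    p = np.sum(B ** 2, axis=0) / np.linalg.norm(B, 'fro') ** 2
    idx = rng.choice(A.shape[1], size=c2, p=p)       # c2 i.i.d. draws
    return np.hstack([C1, A[:, idx]])                # C = [C1, C2]
```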
33 The Near-Optimal Algorithm
Boutsidis et al. (2013) derived a near-optimal algorithm, which consists of three steps:
the approximate SVD via random projection (Halko et al., 2011);
a dual-set sparsification algorithm, an extension of the spectral sparsifier of Batson, Spielman and Srivastava (BSS);
the adaptive sampling algorithm (Deshpande et al., 2006).
34 The Near-Optimal Algorithm
Theorem (Boutsidis et al., 2013). Given a matrix A ∈ R^{m×n} of rank ρ, a target rank k (2 ≤ k < ρ), and 0 < ɛ < 1, the algorithm selects c = (2k/ɛ)(1 + o(1)) columns of A to form a matrix C ∈ R^{m×c}. Then the following inequality holds:
E ‖A − CC^+ A‖_F^2 ≤ (1 + ɛ) ‖A − A_k‖_F^2,
where the expectation is taken w.r.t. C. Furthermore, the matrix C can be obtained in time O(mk^2 ɛ^{−4/3} + nk^3 ɛ^{−2/3}) + T_Multiply(mnk ɛ^{−2/3}).
35 The CUR Decomposition (Drineas et al., 2008; Mahoney and Drineas, 2009)
Given an m × n matrix A and integers c < n and r < m, the CUR decomposition of A finds C ∈ R^{m×c} with c columns from A, R ∈ R^{r×n} with r rows from A, and U ∈ R^{c×r} such that A = CUR + E. Here E = A − CUR is the residual error matrix.
36 The CUR Problem
Definition (the CUR problem). Given an m × n matrix A of rank ρ, a rank parameter k, and an accuracy parameter ɛ ∈ (0, 1), construct a matrix C ∈ R^{m×c} with c columns from A, a matrix R ∈ R^{r×n} with r rows from A, and a matrix U ∈ R^{c×r}, with c, r, and rank(U) as small as possible, such that
‖A − CUR‖_F^2 ≤ (1 + ɛ) ‖A − A_k‖_F^2.
Here A_k = U_k Σ_k V_k^T ∈ R^{m×n} is the best rank-k matrix obtained via the SVD of A: A = UΣV^T.
37 The Subspace Sampling CUR Algorithm
Drineas et al. (2008) proposed a two-stage randomized CUR algorithm called subspace sampling.
The first stage samples c columns of A to construct C, with sampling probabilities proportional to the squared ℓ_2-norms of the rows of V_k.
The second stage samples r rows from A and C simultaneously to construct R and W, and sets U = W^+. The sampling probabilities in this stage are proportional to the leverage scores of A and C, respectively.
38 The Subspace Sampling CUR Algorithm
Theorem (Drineas et al., 2008). Given an m × n matrix A and a target rank k ≪ min{m, n}, the subspace sampling algorithm selects c = O(k ɛ^{−2} log k log(1/δ)) columns and r = O(c ɛ^{−2} log c log(1/δ)) rows without replacement. Then
‖A − CUR‖_F = ‖A − CW^+ R‖_F ≤ (1 + ɛ) ‖A − A_k‖_F
holds with probability at least 1 − δ, where W contains the sampled rows of C with scaling. The running time is dominated by the truncated SVD of A, that is, O(mnk).
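The leverage-score CUR idea can be sketched as follows. This is a simplified illustrative variant, not the algorithm of the theorem: rows are sampled by the row leverage scores of A only, and U is set to C^+ A R^+ (the Frobenius-optimal intersection matrix for fixed C and R) rather than the paper's W^+:

```python
import numpy as np

def leverage_cur(A, k, c, r, seed=0):
    """Simplified CUR sketch: columns by column leverage scores, rows by
    row leverage scores, then U = C^+ A R^+."""
    rng = np.random.default_rng(seed)
    U_A, _, Vt_A = np.linalg.svd(A, full_matrices=False)
    p_col = np.sum(Vt_A[:k].T ** 2, axis=1) / k      # column leverage scores
    C = A[:, rng.choice(A.shape[1], size=c, p=p_col)]
    p_row = np.sum(U_A[:, :k] ** 2, axis=1) / k      # row leverage scores
    R = A[rng.choice(A.shape[0], size=r, p=p_row), :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)    # optimal U given C, R
    return C, U, R
```

When A has exact rank k and enough columns/rows are drawn, C and R capture the column and row spaces and the decomposition is (numerically) exact.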
39 The Adaptive Sampling CUR Algorithm
Wang and Zhang (2013) proposed an adaptive sampling CUR algorithm:
Select c = (2k/ɛ)(1 + o(1)) columns of A to construct C ∈ R^{m×c} using the near-optimal algorithm of Boutsidis et al. (2013);
Select r_1 = c rows of A to construct R_1 ∈ R^{r_1×n} using the near-optimal algorithm of Boutsidis et al. (2013);
Adaptively sample r_2 = c/ɛ rows from A according to the residual A − A R_1^+ R_1;
Return C, R = [R_1^T, R_2^T]^T, and U = C^+ A R^+.
40 The Adaptive Sampling CUR Algorithm
Theorem (Wang and Zhang, 2013). Given a matrix A ∈ R^{m×n} and a matrix C ∈ R^{m×c} such that rank(C) = rank(CC^+ A) = ρ (ρ ≤ c ≤ n), let R_1 ∈ R^{r_1×n} consist of r_1 rows of A and define the residual B = A − A R_1^+ R_1. Additionally, for i = 1, ..., m, define π_i = ‖b^{(i)}‖_2^2 / ‖B‖_F^2. We further sample r_2 rows i.i.d. from A, where in each trial the i-th row is chosen with probability π_i. Let R_2 ∈ R^{r_2×n} contain the r_2 sampled rows and let R = [R_1^T, R_2^T]^T ∈ R^{(r_1+r_2)×n}. Then we have
E ‖A − CC^+ A R^+ R‖_F^2 ≤ ‖A − CC^+ A‖_F^2 + (ρ/r_2) ‖A − A R_1^+ R_1‖_F^2.
41 The Adaptive Sampling CUR Algorithm
Theorem (Wang and Zhang, 2013). Given a matrix A ∈ R^{m×n} and a positive integer k ≪ min{m, n}, the adaptive sampling CUR algorithm randomly selects c = (2k/ɛ)(1 + o(1)) columns of A to construct C ∈ R^{m×c}, and then selects r = (c/ɛ)(1 + ɛ) rows of A to construct R ∈ R^{r×n}. Then we have
E ‖A − CUR‖_F = E ‖A − C(C^+ A R^+)R‖_F ≤ (1 + ɛ) ‖A − A_k‖_F.
The algorithm costs time O((m + n) k^3 ɛ^{−2/3} + m k^2 ɛ^{−2} + n k^2 ɛ^{−4}) + T_Multiply(mnk ɛ^{−1}) to compute the matrices C, U, and R.
42 The Optimal CUR Algorithm
Boutsidis and Woodruff (2014) proposed an optimal CUR algorithm.
Constructing C with O(k + k/ɛ) columns:
Compute the top k singular vectors of A: Z_1;
Sample O(k log k) columns from Z_1^T with the leverage scores;
Down-sample to c_1 = O(k) columns with the sampling algorithm of Boutsidis et al. (2013);
Adaptively sample c_2 = O(k/ɛ) columns of A.
Constructing R with O(k + k/ɛ) rows:
Find Z_2 in the span of C such that ‖A − Z_2 Z_2^T A‖_F^2 ≤ (1 + ɛ) ‖A − A_k‖_F^2;
Sample O(k log k) rows from Z_2 with the leverage scores;
Down-sample to r_1 = O(k) rows with the sampling algorithm of Boutsidis et al. (2013);
Sample r_2 = O(k/ɛ) rows with adaptive sampling.
43 The Optimal CUR Algorithm
Lemma (Boutsidis and Woodruff, 2014). Given a matrix A ∈ R^{m×n}, a matrix V ∈ R^{m×c}, and an integer k, let V = YΨ be a QR decomposition of V, let Γ = Y^T A, and let Γ_k = Ũ_k Σ_k V_k^T be a rank-k SVD of Γ, with Ũ_k ∈ R^{c×k}. Then Y Ũ_k satisfies
‖A − Y Ũ_k Ũ_k^T Y^T A‖_F^2 ≤ ‖A − Y Ũ_k Σ_k V_k^T‖_F^2 = ‖A − Π^F_{V,k}(A)‖_F^2.
44 The Optimal CUR Algorithm
Theorem (Boutsidis and Woodruff, 2014). Given a matrix A ∈ R^{m×n} of rank ρ, a target rank 1 ≤ k < ρ, and 0 < ɛ < 1, the optimal CUR algorithm selects at most c = O(k/ɛ) columns and at most r = O(k/ɛ) rows from A to form matrices C ∈ R^{m×c}, R ∈ R^{r×n}, and U ∈ R^{c×r} with rank(U) = k such that, with constant probability,
‖A − CUR‖_F^2 ≤ (1 + O(ɛ)) ‖A − A_k‖_F^2.
The matrices C, U, and R can be computed in time O(nnz(A) log n + (m + n) · poly(log n, k, 1/ɛ)).
45 The Nyström Method
The Nyström method selects c (≪ n) columns of K to construct C, using some randomized algorithm. After permutation we have
K = [ W, K_21^T ; K_21, K_22 ], C = [ W ; K_21 ].
The Nyström approximation:
K ≈ K_c^nys = C W^+ C^T,
where K_c^nys is n × n, C is n × c, W^+ is c × c, and C^T is c × n.
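A minimal sketch of the standard Nyström approximation with uniform column sampling, the simplest selection scheme (illustrative NumPy code under the block structure above):

```python
import numpy as np

def nystrom(K, c, seed=0):
    """Standard Nystrom: uniformly sample c columns, take C = K[:, idx]
    and W = K[idx, idx], and approximate K by C W^+ C^T."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(K.shape[0], size=c, replace=False)
    C = K[:, idx]                         # n x c
    W = K[np.ix_(idx, idx)]               # c x c intersection block
    return C @ np.linalg.pinv(W) @ C.T    # rank-(<= c) approximation of K
```

When rank(W) equals rank(K), as happens here for a low-rank kernel with enough sampled columns, the approximation is exact; this matches the exact-recovery theorem quoted later in the slides.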
47 The Nyström Approximation
The Nyström approximation K ≈ K_c^nys = C W^+ C^T is a low-rank factorization: the n × n matrix K is approximated by the product of an n × c matrix, a c × c matrix, and a c × n matrix.
48 Problem Formulation
Problem: how to select informative columns of K ∈ R^{n×n} to construct C ∈ R^{n×c}? The approximation error ‖K − CUC^T‖_F or ‖K − CUC^T‖_2 should be as small as possible.
49 Criterion: Upper Error Bounds
Use approximation algorithms to find c good columns (not necessarily the best), and hope that the ratio ‖K − CUC^T‖_F / ‖K − K_k‖_F has an upper bound; the smaller, the better.
50 Uniform Sampling: The Simplest Algorithm
Sample c columns of K uniformly at random to construct C. The simplest, but the most widely used.
51 Adaptive Sampling
The adaptive sampling algorithm (Deshpande et al., 2006):
1 Sample c_1 columns of K to construct C_1, using some algorithm;
2 Compute the residual B = K − C_1 C_1^+ K;
3 Compute the sampling probabilities p_i = ‖b_i‖_2^2 / ‖B‖_F^2 for i = 1 to n;
4 Sample a further c_2 columns of K in c_2 i.i.d. trials, where in each trial the i-th column is chosen with probability p_i; denote the selected columns by C_2;
5 Return C = [C_1, C_2].
52 Adaptive Sampling
The error term ‖K − CC^+ K‖_F is bounded theoretically, but ‖K − CW^+ C^T‖_F is not. Empirically, the adaptive sampling algorithm works very well.
53 Better Sampling Algorithms?
We hope ‖K − CW^+ C^T‖_F / ‖K − K_k‖_F will be very small if the column sampling algorithm is good enough. But it cannot be arbitrarily small.
Lower Error Bound. Theorem (Wang & Zhang, JMLR 2013). Whatever column sampling algorithm is used to select c columns, there exists a bad case K such that
‖K − CW^+ C^T‖_F^2 / ‖K − K_k‖_F^2 ≥ Ω(1 + nk/c^2).
54 Different Types of Low-Rank Approximation?
The ensemble Nyström method (Kumar et al., JMLR 2012):
K ≈ Σ_{i=1}^t (1/t) C^(i) W^(i)+ C^(i)T.
It does not improve the lower error bound.
Lower Error Bound. Theorem (Wang & Zhang, JMLR 2013). Whatever column sampling algorithm is used to select c columns, there exists a bad case K such that
‖K − Σ_{i=1}^t (1/t) C^(i) W^(i)+ C^(i)T‖_F^2 / ‖K − K_k‖_F^2 ≥ Ω(1 + nk/c^2).
55 The Modified Nyström Method
The modified Nyström method (Wang & Zhang, JMLR 2013):
K ≈ C (C^+ K (C^+)^T) C^T,
where the intersection matrix C^+ K (C^+)^T is c × c.
Theorem (Wang & Zhang, JMLR 2013). Using an adaptive column sampling algorithm, the error incurred by the modified Nyström method satisfies
E ‖K − C (C^+ K (C^+)^T) C^T‖_F^2 / ‖K − K_k‖_F^2 ≤ 1 + sqrt(k/c).
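The modified intersection matrix is the Frobenius-norm minimizer of ‖K − CUC^T‖_F over all c × c matrices U, so for the same columns its error can never exceed that of the standard choice W^+. A naive NumPy sketch of both (illustrative, not the paper's code):

```python
import numpy as np

def modified_nystrom(K, idx):
    """Modified Nystrom: U = C^+ K (C^+)^T, the minimizer of
    ||K - C U C^T||_F over all c x c matrices U. Costs O(n^2 c) naively."""
    C = K[:, idx]
    Cp = np.linalg.pinv(C)                 # c x n
    return C @ (Cp @ K @ Cp.T) @ C.T

def standard_nystrom(K, idx):
    """Standard Nystrom with the same columns, for comparison."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T
```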
56 Comparisons between the Two Methods
The standard Nyström method: fast. It costs only T_SVD(c^3) time to compute the intersection matrix U^nys = W^+.
The modified Nyström method: slow. It costs T_SVD(nc^2) + T_Multiply(n^2 c) time to compute the intersection matrix U^mod = C^+ K (C^+)^T naively.
57 Comparisons between the Two Methods
The standard Nyström method: inaccurate. It cannot attain a 1 + ɛ Frobenius relative-error bound unless c ≥ sqrt(nk/ɛ) columns are selected, whatever column selection algorithm is used (due to its lower error bound).
The modified Nyström method: accurate. Some adaptive sampling based algorithms attain a 1 + ɛ Frobenius relative-error bound when c = O(k/ɛ^2). (The smaller c is, the better.)
58 Comparisons between the Two Methods
Theorem (exact recovery). For the symmetric matrix K defined previously, the following three statements are equivalent:
1 rank(W) = rank(K);
2 K = CW^+ C^T (i.e., the standard Nyström method is exact);
3 K = C (C^+ K (C^+)^T) C^T (i.e., the modified Nyström method is exact).
59 Outline
1 Randomized SVD
2 Subspace Embedding
3 Column Selection, the CUR Decomposition, and the Nyström Method
4 References
60 References
Santosh S. Vempala. The Random Projection Method. American Mathematical Society, 2004.
Michael W. Mahoney. Randomized Algorithms for Matrices and Data. Foundations and Trends in Machine Learning, 3(2), 2011.
N. Halko, P. G. Martinsson, and J. A. Tropp. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review, 53(2), 2011.
W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 1984.
S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 2003.
J. Matoušek. On variants of the Johnson–Lindenstrauss lemma. Random Structures & Algorithms, 2008.
61 References
A. Dasgupta, R. Kumar, and T. Sarlós. A sparse Johnson–Lindenstrauss Transform. In STOC, 2010.
K. L. Clarkson and D. P. Woodruff. Low Rank Approximation and Regression in Input Sparsity Time. In STOC, 2013.
X. Meng and M. W. Mahoney. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In STOC, 2013.
J. Nelson and H. L. Nguyên. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In FOCS, 2013.
J. Batson, D. Spielman, and N. Srivastava. Twice-Ramanujan Sparsifiers. SIAM Review, 2014.
62 References
A. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. In FOCS, 1998; Journal of the ACM, 2004.
A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2006.
C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix reconstruction. SIAM Journal on Computing, 2014.
V. Guruswami and A. K. Sinop. Optimal column-based low-rank matrix reconstruction. In SODA, 2012.
63 References
P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 2008.
M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 2009.
Shusen Wang and Zhihua Zhang. Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. JMLR, 2013.
C. Boutsidis and D. P. Woodruff. Optimal CUR matrix decompositions. In STOC, 2014.
S. Kumar, M. Mohri, and A. Talwalkar. Sampling methods for the Nyström method. JMLR, 2012.
Low-distortion Subspace Embeddings in Input-sparsity Time and Applications to Robust Linear Regression Xiangrui Meng Michael W. Mahoney arxiv:1210.3135v3 [cs.ds] 21 Mar 2013 Abstract Low-distortion subspace
More informationA Fast Algorithm For Computing The A-optimal Sampling Distributions In A Big Data Linear Regression
A Fast Algorithm For Computing The A-optimal Sampling Distributions In A Big Data Linear Regression Hanxiang Peng and Fei Tan Indiana University Purdue University Indianapolis Department of Mathematical
More informationApplications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices
Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.
More informationA fast randomized algorithm for overdetermined linear least-squares regression
A fast randomized algorithm for overdetermined linear least-squares regression Vladimir Rokhlin and Mark Tygert Technical Report YALEU/DCS/TR-1403 April 28, 2008 Abstract We introduce a randomized algorithm
More informationRecovering any low-rank matrix, provably
Recovering any low-rank matrix, provably Rachel Ward University of Texas at Austin October, 2014 Joint work with Yudong Chen (U.C. Berkeley), Srinadh Bhojanapalli and Sujay Sanghavi (U.T. Austin) Matrix
More informationLecture 9: Matrix approximation continued
0368-348-01-Algorithms in Data Mining Fall 013 Lecturer: Edo Liberty Lecture 9: Matrix approximation continued Warning: This note may contain typos and other inaccuracies which are usually discussed during
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationFast Approximate Matrix Multiplication by Solving Linear Systems
Electronic Colloquium on Computational Complexity, Report No. 117 (2014) Fast Approximate Matrix Multiplication by Solving Linear Systems Shiva Manne 1 and Manjish Pal 2 1 Birla Institute of Technology,
More informationSubset Selection. Ilse Ipsen. North Carolina State University, USA
Subset Selection Ilse Ipsen North Carolina State University, USA Subset Selection Given: real or complex matrix A integer k Determine permutation matrix P so that AP = ( A 1 }{{} k A 2 ) Important columns
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationLecture 24: Element-wise Sampling of Graphs and Linear Equation Solving. 22 Element-wise Sampling of Graphs and Linear Equation Solving
Stat260/CS294: Randomized Algorithms for Matrices and Data Lecture 24-12/02/2013 Lecture 24: Element-wise Sampling of Graphs and Linear Equation Solving Lecturer: Michael Mahoney Scribe: Michael Mahoney
More informationPrincipal Component Analysis
Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used
More informationRandomized methods for computing the Singular Value Decomposition (SVD) of very large matrices
Randomized methods for computing the Singular Value Decomposition (SVD) of very large matrices Gunnar Martinsson The University of Colorado at Boulder Students: Adrianna Gillman Nathan Halko Sijia Hao
More informationrandomized block krylov methods for stronger and faster approximate svd
randomized block krylov methods for stronger and faster approximate svd Cameron Musco and Christopher Musco December 2, 25 Massachusetts Institute of Technology, EECS singular value decomposition n d left
More informationSketching as a Tool for Numerical Linear Algebra
Foundations and Trends R in Theoretical Computer Science Vol. 10, No. 1-2 (2014) 1 157 c 2014 D. P. Woodruff DOI: 10.1561/0400000060 Sketching as a Tool for Numerical Linear Algebra David P. Woodruff IBM
More informationRandomized algorithms for matrix computations and analysis of high dimensional data
PCMI Summer Session, The Mathematics of Data Midway, Utah July, 2016 Randomized algorithms for matrix computations and analysis of high dimensional data Gunnar Martinsson The University of Colorado at
More informationA randomized algorithm for approximating the SVD of a matrix
A randomized algorithm for approximating the SVD of a matrix Joint work with Per-Gunnar Martinsson (U. of Colorado) and Vladimir Rokhlin (Yale) Mark Tygert Program in Applied Mathematics Yale University
More informationApproximating a Gram Matrix for Improved Kernel-Based Learning
Approximating a Gram Matrix for Improved Kernel-Based Learning (Extended Abstract) Petros Drineas 1 and Michael W. Mahoney 1 Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationFast Random Projections using Lean Walsh Transforms Yale University Technical report #1390
Fast Random Projections using Lean Walsh Transforms Yale University Technical report #1390 Edo Liberty Nir Ailon Amit Singer Abstract We present a k d random projection matrix that is applicable to vectors
More informationApproximate Principal Components Analysis of Large Data Sets
Approximate Principal Components Analysis of Large Data Sets Daniel J. McDonald Department of Statistics Indiana University mypage.iu.edu/ dajmcdon April 27, 2016 Approximation-Regularization for Analysis
More informationIntroduction to Numerical Linear Algebra II
Introduction to Numerical Linear Algebra II Petros Drineas These slides were prepared by Ilse Ipsen for the 2015 Gene Golub SIAM Summer School on RandNLA 1 / 49 Overview We will cover this material in
More informationTechnical Report No September 2009
FINDING STRUCTURE WITH RANDOMNESS: STOCHASTIC ALGORITHMS FOR CONSTRUCTING APPROXIMATE MATRIX DECOMPOSITIONS N. HALKO, P. G. MARTINSSON, AND J. A. TROPP Technical Report No. 2009-05 September 2009 APPLIED
More informationSubsampling for Ridge Regression via Regularized Volume Sampling
Subsampling for Ridge Regression via Regularized Volume Sampling Micha l Dereziński and Manfred Warmuth University of California at Santa Cruz AISTATS 18, 4-9-18 1 / 22 Linear regression y x 2 / 22 Optimal
More informationTechnical Report. Random projections for Bayesian regression. Leo Geppert, Katja Ickstadt, Alexander Munteanu and Christian Sohler 04/2014
Random projections for Bayesian regression Technical Report Leo Geppert, Katja Ickstadt, Alexander Munteanu and Christian Sohler 04/2014 technische universität dortmund Part of the work on this technical
More informationRandomized algorithms in numerical linear algebra
Acta Numerica (2017), pp. 95 135 c Cambridge University Press, 2017 doi:10.1017/s0962492917000058 Printed in the United Kingdom Randomized algorithms in numerical linear algebra Ravindran Kannan Microsoft
More informationMAT 585: Johnson-Lindenstrauss, Group testing, and Compressed Sensing
MAT 585: Johnson-Lindenstrauss, Group testing, and Compressed Sensing Afonso S. Bandeira April 9, 2015 1 The Johnson-Lindenstrauss Lemma Suppose one has n points, X = {x 1,..., x n }, in R d with d very
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.2 Low Rank Approximation Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Edgar
More informationColumn Subset Selection with Missing Data via Active Sampling
Yining Wang Machine Learning Department Carnegie Mellon University Aarti Singh Machine Learning Department Carnegie Mellon University Abstract Column subset selection of massive data matrices has found
More informationNORMS ON SPACE OF MATRICES
NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system
More informationEnabling very large-scale matrix computations via randomization
Enabling very large-scale matrix computations via randomization Gunnar Martinsson The University of Colorado at Boulder Students: Adrianna Gillman Nathan Halko Patrick Young Collaborators: Edo Liberty
More informationNear-Optimal Coresets for Least-Squares Regression
6880 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 59, NO 10, OCTOBER 2013 Near-Optimal Coresets for Least-Squares Regression Christos Boutsidis, Petros Drineas, Malik Magdon-Ismail Abstract We study the
More informationMachine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University
More informationCan matrix coherence be efficiently and accurately estimated?
Mehryar Mohri Courant Institute and Google Research New York, NY mohri@cs.nyu.edu Ameet Talwalkar Computer Science Division University of California, Berkeley ameet@eecs.berkeley.edu Abstract Matrix coherence
More informationDense Fast Random Projections and Lean Walsh Transforms
Dense Fast Random Projections and Lean Walsh Transforms Edo Liberty, Nir Ailon, and Amit Singer Abstract. Random projection methods give distributions over k d matrices such that if a matrix Ψ (chosen
More informationA Review of Nyström Methods for Large-Scale Machine Learning
A Review of Nyström Methods for Large-Scale Machine Learning Shiliang Sun, Jing Zhao, Jiang Zhu Shanghai Key Laboratory of Multidimensional Information Processing, Department of Computer Science and Technology,
More informationFast low rank approximations of matrices and tensors
Fast low rank approximations of matrices and tensors S. Friedland, V. Mehrmann, A. Miedlar and M. Nkengla Univ. Illinois at Chicago & Technische Universität Berlin Gene Golub memorial meeting, Berlin,
More informationTHE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR
THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},
More informationNormalized power iterations for the computation of SVD
Normalized power iterations for the computation of SVD Per-Gunnar Martinsson Department of Applied Mathematics University of Colorado Boulder, Co. Per-gunnar.Martinsson@Colorado.edu Arthur Szlam Courant
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)
AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical
More informationTheoretical and empirical aspects of SPSD sketches
53 Chapter 6 Theoretical and empirical aspects of SPSD sketches 6. Introduction In this chapter we consider the accuracy of randomized low-rank approximations of symmetric positive-semidefinite matrices
More informationSketching as a Tool for Numerical Linear Algebra All Lectures. David Woodruff IBM Almaden
Sketching as a Tool for Numerical Linear Algebra All Lectures David Woodruff IBM Almaden Massive data sets Examples Internet traffic logs Financial data etc. Algorithms Want nearly linear time or less
More informationdimensionality reduction for k-means and low rank approximation
dimensionality reduction for k-means and low rank approximation Michael Cohen, Sam Elder, Cameron Musco, Christopher Musco, Mădălina Persu Massachusetts Institute of Technology 0 overview Simple techniques
More informationThe University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.
The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational
More informationbe a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u
MATH 434/534 Theoretical Assignment 7 Solution Chapter 7 (71) Let H = I 2uuT Hu = u (ii) Hv = v if = 0 be a Householder matrix Then prove the followings H = I 2 uut Hu = (I 2 uu )u = u 2 uut u = u 2u =
More informationCS60021: Scalable Data Mining. Dimensionality Reduction
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Dimensionality Reduction Sourangshu Bhattacharya Assumption: Data lies on or near a
More informationFast Monte Carlo Algorithms for Matrix Operations & Massive Data Set Analysis
Fast Monte Carlo Algorithms for Matrix Operations & Massive Data Set Analysis Michael W. Mahoney Yale University Dept. of Mathematics http://cs-www.cs.yale.edu/homes/mmahoney Joint work with: P. Drineas
More informationEmpirical Performance of Approximate Algorithms for Low Rank Approximation
Empirical Performance of Approximate Algorithms for Low Rank Approximation Dimitris Konomis (dkonomis@cs.cmu.edu) Machine Learning Department (MLD) School of Computer Science (SCS) Carnegie Mellon University
More informationFast Dimension Reduction
Fast Dimension Reduction MMDS 2008 Nir Ailon Google Research NY Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes (with Edo Liberty) The Fast Johnson Lindenstrauss Transform (with Bernard
More informationMemory Efficient Kernel Approximation
Si Si Department of Computer Science University of Texas at Austin ICML Beijing, China June 23, 2014 Joint work with Cho-Jui Hsieh and Inderjit S. Dhillon Outline Background Motivation Low-Rank vs. Block
More informationLecture 12: Randomized Least-squares Approximation in Practice, Cont. 12 Randomized Least-squares Approximation in Practice, Cont.
Stat60/CS94: Randomized Algorithms for Matrices and Data Lecture 1-10/14/013 Lecture 1: Randomized Least-squares Approximation in Practice, Cont. Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning:
More informationOPERATIONS on large matrices are a cornerstone of
1 On Sparse Representations of Linear Operators and the Approximation of Matrix Products Mohamed-Ali Belabbas and Patrick J. Wolfe arxiv:0707.4448v2 [cs.ds] 26 Jun 2009 Abstract Thus far, sparse representations
More informationChapter XII: Data Pre and Post Processing
Chapter XII: Data Pre and Post Processing Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14 XII.1 4-1 Chapter XII: Data Pre and Post Processing 1. Data
More informationSeeing the Forest from the Trees in Two Looks: Matrix Sketching by Cascaded Bilateral Sampling
Seeing the orest from the Trees in Two Looks: Matrix Sketching by Cascaded Bilateral Sampling arxiv:67.7395v3 [cs.lg] 27 Jul 26 Kai Zhang, Chuanren Liu 2, Jie Zhang 3, Hui Xiong 4, Eric Xing 5, Jieping
More informationarxiv: v1 [math.na] 1 Sep 2018
On the perturbation of an L -orthogonal projection Xuefeng Xu arxiv:18090000v1 [mathna] 1 Sep 018 September 5 018 Abstract The L -orthogonal projection is an important mathematical tool in scientific computing
More informationFAST MONTE CARLO ALGORITHMS FOR MATRICES III: COMPUTING A COMPRESSED APPROXIMATE MATRIX DECOMPOSITION
SIAM J COMPUT Vol 36 No 1 pp 184 206 c 2006 Society for Industrial and Applied Mathematics AST MONTE CARLO ALGORITHMS OR MATRICES III: COMPUTING A COMPRESSED APPROXIMATE MATRIX DECOMPOSITION PETROS DRINEAS
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional
More informationMatrix Approximations
Matrix pproximations Sanjiv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 00 Sanjiv Kumar 9/4/00 EECS6898 Large Scale Machine Learning Latent Semantic Indexing (LSI) Given n documents
More informationImproved Distributed Principal Component Analysis
Improved Distributed Principal Component Analysis Maria-Florina Balcan School of Computer Science Carnegie Mellon University ninamf@cscmuedu Yingyu Liang Department of Computer Science Princeton University
More informationarxiv: v1 [math.na] 22 Mar 2019
CUR DECOMPOSITIONS, APPROXIMATIONS, AND PERTURBATIONS KEATON HAMM AND LONGXIU HUANG arxiv:1903.09698v1 [math.na] 22 Mar 2019 Abstract. This article discusses a useful tool in dimensionality reduction and
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional
More informationLatent Semantic Analysis. Hongning Wang
Latent Semantic Analysis Hongning Wang CS@UVa VS model in practice Document and query are represented by term vectors Terms are not necessarily orthogonal to each other Synonymy: car v.s. automobile Polysemy:
More informationarxiv: v1 [math.na] 29 Dec 2014
A CUR Factorization Algorithm based on the Interpolative Decomposition Sergey Voronin and Per-Gunnar Martinsson arxiv:1412.8447v1 [math.na] 29 Dec 214 December 3, 214 Abstract An algorithm for the efficient
More informationLecture notes: Applied linear algebra Part 1. Version 2
Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and
More informationEfficient Non-Oblivious Randomized Reduction for Risk Minimization with Improved Excess Risk Guarantee
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-7) Efficient Non-Oblivious Randomized Reduction for Risk Minimization with Improved Excess Risk Guarantee Yi Xu, Haiqin
More informationLecture 9: Numerical Linear Algebra Primer (February 11st)
10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template
More informationLow Rank Approximation Lecture 3. Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL
Low Rank Approximation Lecture 3 Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL daniel.kressner@epfl.ch 1 Sampling based approximation Aim: Obtain rank-r approximation
More informationInformation-Theoretic Methods in Data Science
Information-Theoretic Methods in Data Science Information-theoretic bounds on sketching Mert Pilanci Department of Electrical Engineering Stanford University Contents Information-theoretic bounds on sketching
More information