
CS 229r: Algorithms for Big Data, Fall 2015
Prof. Jelani Nelson
Lecture 18, Nov 3rd, 2015
Scribe: Jefferson Lee

1 Overview

Low-rank approximation, compressed sensing.

2 Last Time

We looked at three different regression methods. The first was based on $\varepsilon$-subspace embeddings. The second was an iterative approach, building a well-conditioned matrix suitable for stochastic gradient descent. The third was formulated as follows: for the least-squares problem $\min_x \|Sx - b\|_2$, which has optimal solution $x^* = S^+ b$ and approximate solution $\tilde{x} = \mathrm{argmin}_x \|\Pi S x - \Pi b\|_2$, we write $x^* = U\alpha$, $w = Sx^* - b$, and $U\beta = S\tilde{x} - Sx^*$, where $S = U\Sigma V^T$. We proved last time that $(\Pi U)^T(\Pi U)\beta = (\Pi U)^T \Pi w$. These results from regression will reappear in our work on low-rank approximation.

3 Low-rank approximation

The setting is a huge matrix $A \in \mathbb{R}^{n\times d}$ with $n, d$ both very large (say, $n$ users rating $d$ movies). We might believe that the users are linear combinations of a few ($k$) basic types, and we want to discover this low-rank structure. More formally: given a matrix $A \in \mathbb{R}^{n\times d}$, we want to compute

    $A_k := \mathrm{argmin}_{B :\ \mathrm{rank}(B) \le k} \|A - B\|_X.$

Some now argue that we should instead look for a non-negative matrix factorization; nevertheless, this formulation is still widely used.

Theorem 1 (Eckart-Young). Let $A = U\Sigma V^T$ be a singular value decomposition of $A$, where $\mathrm{rank}(A) = r$ and $\Sigma$ is diagonal with entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$. Then under $\|\cdot\|_X = \|\cdot\|_F$, the minimizer is $A_k = U_k \Sigma_k V_k^T$, where $U_k$ and $V_k$ are the first $k$ columns of $U$ and $V$, and $\Sigma_k = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$.

Our output is then $U_k, \Sigma_k, V_k$. We can compute $A_k$ in $O(nd^2)$ time by computing the SVD of $A$; we would like to do better. First, a few definitions:

Definition 2. $\mathrm{Proj}_A B$ is the projection of the columns of $B$ onto $\mathrm{colspace}(A)$.

Definition 3. Let $A = U\Sigma V^T$ be a singular value decomposition. Then $A^+ = V\Sigma^{-1}U^T$ is called the Moore-Penrose pseudoinverse of $A$.
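As a baseline for what follows, here is a minimal numpy sketch (not from the notes; the function name and test data are mine) of the exact computation that Theorem 1 licenses: form the SVD, keep the top $k$ singular triples, and check that the Frobenius error equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation of A in Frobenius norm, per Eckart-Young:
    keep the top-k singular triples of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) Vt
    return U[:, :k], s[:k], Vt[:k, :]

# Example: a low-rank matrix plus a little noise.
rng = np.random.default_rng(0)
n, d, k = 1000, 200, 5
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))

Uk, sk, Vkt = truncated_svd(A, k)
Ak = Uk @ np.diag(sk) @ Vkt

# ||A - A_k||_F^2 equals the sum of the squared discarded singular values.
_, s, _ = np.linalg.svd(A, full_matrices=False)
assert np.isclose(np.linalg.norm(A - Ak, "fro") ** 2, np.sum(s[k:] ** 2))
```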

3.1 Algorithm

Today we will use a sketch that serves both as a subspace embedding and for approximate matrix multiplication to compute $\tilde{A}_k$ of rank at most $k$ such that $\|A - \tilde{A}_k\|_F \le (1+\varepsilon)\|A - A_k\|_F$, following Sarlós' approach [8]. The first works to achieve a decent error guarantee (of the form $\varepsilon\|A\|_F$) were due to Papadimitriou et al. [7] and Frieze, Kannan and Vempala [5].

Theorem 4. Define $\tilde{A}_k = \mathrm{Proj}_{A\Pi^T, k}(A)$, where $\mathrm{Proj}_{V,k}(A)$ denotes the best rank-$k$ approximation to $\mathrm{Proj}_V(A)$, i.e., to the projection of the columns of $A$ onto $V$. As long as $\Pi \in \mathbb{R}^{m\times d}$ is a $1/2$-subspace embedding for a certain $k$-dimensional subspace $V_k$ and satisfies approximate matrix multiplication with error $\sqrt{\varepsilon/k}$, then

    $\|A - \tilde{A}_k\|_F \le (1 + O(\varepsilon)) \|A - A_k\|_F.$

Before we prove this theorem, let us first convince ourselves that the algorithm is fast, i.e., that we can compute $\mathrm{Proj}_{A\Pi^T,k}(A)$ quickly. To satisfy the conditions of the theorem, $\Pi \in \mathbb{R}^{m\times d}$ can be chosen with $m = O(k/\varepsilon)$, e.g., using a random sign matrix (or with slightly larger $m$ using a faster subspace embedding). We need to multiply $A\Pi^T$. We can use a fast subspace embedding to compute $A\Pi^T$ quickly, and then compute the SVD $A\Pi^T = U'\Sigma'V'^T$ in $O(nm^2)$ time. Let $[\,\cdot\,]_k$ denote the best rank-$k$ approximation under the Frobenius norm. We then want to compute $[U'U'^T A]_k = U'[U'^T A]_k$. Computing $U'^T A$ takes $O(mnd)$ time, and computing the SVD of $U'^T A$ takes $O(dm^2)$ time. Note that this is already better than the $O(nd^2)$ time needed for the SVD of $A$, but we can do better still by approximating. In particular, by using the right combination of subspace embeddings, for constant $\varepsilon$ the scheme described here can be made to run in $O(\mathrm{nnz}(A)) + \tilde{O}(ndk)$ time (where $\tilde{O}$ hides $\log n$ factors). We will shoot instead for $O(\mathrm{nnz}(A)) + \tilde{O}(nk^2)$. Consider the following:

- We want to compute $\tilde{A}_k = U'X^*$, where $X^* = \mathrm{argmin}_{X :\ \mathrm{rank}(X) \le k} \|U'X - A\|_F^2$.

- If $X^+$ is the argmin without the rank constraint (namely $X^+ = U'^T A$), then the argmin with the rank constraint is $X^* = [X^+]_k$, so that $\tilde{A}_k = [U'X^+]_k = U'[X^+]_k$.

- Rather than find $X^+$ exactly, we use approximate regression to find an approximately optimal $\tilde{X}$. That is, we compute $\tilde{X} = \mathrm{argmin}_X \|\Pi' U'X - \Pi' A\|_F^2$, where $\Pi'$ is an $\alpha$-subspace embedding for the column space of $U'$ (note that $U'$ has rank $m$). Our final output is then $U'[\tilde{X}]_k$.
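As a concrete illustration of the scheme up to this point (before the second sketch is introduced), here is a minimal numpy sketch, not from the notes, that computes $U'[U'^T A]_k = \mathrm{Proj}_{A\Pi^T,k}(A)$ with a random sign matrix for $\Pi$; the choice $m = \lceil k/\varepsilon \rceil$ and the scaling are illustrative, not the tuned constants from the analysis.

```python
import numpy as np

def sketch_low_rank(A, k, eps=0.5, seed=0):
    """Sketch-and-solve low-rank approximation (no second sketch yet):
    project A onto the column space of A @ Pi.T for a random sign sketch Pi,
    then take the best rank-k approximation within that subspace.
    Returns an n x d matrix of rank at most k."""
    n, d = A.shape
    m = int(np.ceil(k / eps))  # m = O(k / eps); constants are illustrative
    rng = np.random.default_rng(seed)
    Pi = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)  # random sign sketch
    # (the 1/sqrt(m) scaling does not change the column space of A @ Pi.T)

    S = A @ Pi.T                                      # n x m sketch
    U, _, _ = np.linalg.svd(S, full_matrices=False)   # orthonormal basis U' of colspace(A Pi^T)

    B = U.T @ A                                       # m x d; projection of A is U' (U'^T A)
    Ub, sb, Vbt = np.linalg.svd(B, full_matrices=False)
    Bk = Ub[:, :k] @ np.diag(sb[:k]) @ Vbt[:k, :]     # [U'^T A]_k
    return U @ Bk                                     # U' [U'^T A]_k = Proj_{A Pi^T, k}(A)
```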

Why does the above work? (Thanks to Michael Cohen for describing the following simple argument.) First note that

    $\frac{1+\alpha}{1-\alpha}\|U'X^+ - A\|_F^2 \ge \|U'\tilde{X} - A\|_F^2 = \|(U'X^+ - A) + U'(\tilde{X} - X^+)\|_F^2 = \|U'X^+ - A\|_F^2 + \|U'(\tilde{X} - X^+)\|_F^2 = \|U'X^+ - A\|_F^2 + \|\tilde{X} - X^+\|_F^2,$

and thus $\|\tilde{X} - X^+\|_F^2 \le O(\alpha)\|U'X^+ - A\|_F^2$. The last equality above holds since the matrix $U'$ preserves Frobenius norms, and the one before it since $U'X^+ - A$ has a column space orthogonal to the column space of $U'$.

Next, suppose $f, \tilde{f}$ are two functions mapping the same domain to $\mathbb{R}$ such that $|f(x) - \tilde{f}(x)| \le \eta$ for all $x$ in the domain. Then clearly $f(\mathrm{argmin}_x \tilde{f}(x)) \le \min_x f(x) + 2\eta$. Now let the domain be the set of all rank-$k$ matrices, and let $f(Z) = \|U'X^+ - Z\|_F$ and $\tilde{f}(Z) = \|U'\tilde{X} - Z\|_F$. Then we may take $\eta = \|U'X^+ - U'\tilde{X}\|_F = \|X^+ - \tilde{X}\|_F$. Thus

    $\|U'[\tilde{X}]_k - A\|_F^2 = \|U'[\tilde{X}]_k - U'X^+\|_F^2 + \|(I - U'U'^T)A\|_F^2$
    $\le \big(\|U'[X^+]_k - U'X^+\|_F + 2\|X^+ - \tilde{X}\|_F\big)^2 + \|(I - U'U'^T)A\|_F^2$
    $\le \big(\|U'[X^+]_k - U'X^+\|_F + O(\sqrt{\alpha})\|U'X^+ - A\|_F\big)^2 + \|(I - U'U'^T)A\|_F^2$
    $= \big(\|U'[X^+]_k - U'X^+\|_F + O(\sqrt{\alpha})\|U'X^+ - A\|_F\big)^2 + \|U'X^+ - A\|_F^2$
    $= \|U'[X^+]_k - U'X^+\|_F^2 + O(\sqrt{\alpha})\,\|U'[X^+]_k - U'X^+\|_F\,\|U'X^+ - A\|_F + O(\alpha)\|U'X^+ - A\|_F^2 + \|U'X^+ - A\|_F^2$
    $= \|U'[X^+]_k - A\|_F^2 + O(\sqrt{\alpha})\,\|U'[X^+]_k - U'X^+\|_F\,\|U'X^+ - A\|_F + O(\alpha)\|U'X^+ - A\|_F^2 \qquad (1)$
    $\le (1 + O(\alpha))\|U'[X^+]_k - A\|_F^2 + O(\sqrt{\alpha})\,\|U'[X^+]_k - U'X^+\|_F\,\|U'X^+ - A\|_F \qquad (2)$
    $\le (1 + O(\alpha))\|U'[X^+]_k - A\|_F^2 + O(\sqrt{\alpha})\,\|U'[X^+]_k - A\|_F^2 \qquad (3)$
    $= (1 + O(\sqrt{\alpha}))\,\|U'[X^+]_k - A\|_F^2,$

where (1) used that $\|U'[X^+]_k - U'X^+\|_F^2 + \|U'X^+ - A\|_F^2 = \|U'[X^+]_k - A\|_F^2$, since $U'X^+ - A$ has columns orthogonal to the column space of $U'$. Also, (2) used that $\|U'X^+ - A\|_F \le \|U'[X^+]_k - A\|_F$, since $U'X^+$ is the best Frobenius-norm approximation to $A$ in the column space of $U'$. Finally, (3) again used $\|U'X^+ - A\|_F \le \|U'[X^+]_k - A\|_F$, together with the triangle inequality $\|U'[X^+]_k - U'X^+\|_F \le \|U'[X^+]_k - A\|_F + \|U'X^+ - A\|_F \le 2\|U'[X^+]_k - A\|_F$.

Thus we have established the following theorem, which follows from the above calculations together with Theorem 4.

Theorem 5. Let $\Pi_1 \in \mathbb{R}^{m_1\times d}$ be a $1/2$-subspace embedding for a certain $k$-dimensional subspace $V_k$, and suppose $\Pi_1$ also satisfies approximate matrix multiplication with error $\sqrt{\varepsilon/k}$. Let $\Pi_2 \in \mathbb{R}^{m_2\times n}$ be an $\alpha$-subspace embedding for the column space of $U'$, where $A\Pi_1^T = U'\Sigma'V'^T$ is the SVD (and hence $U'$ has rank at most $m_1$). Let $\tilde{A}_k = U'[\tilde{X}]_k$, where

    $\tilde{X} = \mathrm{argmin}_X \|\Pi_2 U'X - \Pi_2 A\|_F^2.$

Then $\tilde{A}_k$ has rank at most $k$ and

    $\|A - \tilde{A}_k\|_F \le (1 + O(\varepsilon) + O(\sqrt{\alpha}))\,\|A - A_k\|_F.$

In particular, the error is $(1 + O(\varepsilon))\|A - A_k\|_F$ for $\alpha = \varepsilon^2$.
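To see how Theorem 5 translates into an implementation, here is a small numpy sketch, not from the notes: both $\Pi_1$ and $\Pi_2$ are taken to be random sign matrices, and the sketch dimensions $m_1, m_2$ are left as illustrative parameters rather than the values dictated by the analysis.

```python
import numpy as np

def sketch_low_rank_fast(A, k, m1, m2, seed=0):
    """Two-sketch variant in the spirit of Theorem 5 (dimensions m1, m2 and
    sketch distributions are illustrative choices, not the tuned ones).
    Pi1 sketches the rows so that A @ Pi1.T gives the candidate column space;
    Pi2 sketches the regression  min_X ||U' X - A||_F  down to m2 rows."""
    n, d = A.shape
    rng = np.random.default_rng(seed)
    Pi1 = rng.choice([-1.0, 1.0], size=(m1, d)) / np.sqrt(m1)
    Pi2 = rng.choice([-1.0, 1.0], size=(m2, n)) / np.sqrt(m2)

    U, _, _ = np.linalg.svd(A @ Pi1.T, full_matrices=False)     # U': n x m1

    # Sketched regression: X~ = argmin_X ||Pi2 U' X - Pi2 A||_F.
    X_tilde, *_ = np.linalg.lstsq(Pi2 @ U, Pi2 @ A, rcond=None)  # m1 x d

    # Best rank-k approximation of X~, then lift back with U'.
    Ux, sx, Vxt = np.linalg.svd(X_tilde, full_matrices=False)
    Xk = Ux[:, :k] @ np.diag(sx[:k]) @ Vxt[:k, :]
    return U @ Xk                                                # rank at most k
```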

In the remaining part of these lecture notes, we show that $\mathrm{Proj}_{A\Pi^T, k}(A)$ actually is a good rank-$k$ approximation to $A$, i.e., we prove Theorem 4. In the following proof, we denote the first $k$ columns of $U$ and $V$ by $U_k$ and $V_k$, and the remaining columns by $U_{\bar{k}}$ and $V_{\bar{k}}$.

Proof. Let $Y$ be the column span of $\mathrm{Proj}_{A\Pi^T}(A_k)$, and let $P$ be the orthogonal projection operator onto $Y$. Then

    $\|A - \mathrm{Proj}_{A\Pi^T, k}(A)\|_F^2 \le \|A - PA\|_F^2 = \|A_k - PA_k\|_F^2 + \|A_{\bar{k}} - PA_{\bar{k}}\|_F^2.$

We can bound the second term in that sum:

    $\|A_{\bar{k}} - PA_{\bar{k}}\|_F^2 = \|(I - P)A_{\bar{k}}\|_F^2 \le \|A_{\bar{k}}\|_F^2 = \|A - A_k\|_F^2.$

Now we just need to show that $\|A_k - PA_k\|_F^2 \le \varepsilon\|A - A_k\|_F^2$:

    $\|A_k - PA_k\|_F^2 = \|A_k - (A\Pi^T)(A\Pi^T)^+ A_k\|_F^2 \le \|A_k - (A\Pi^T)(A_k\Pi^T)^+ A_k\|_F^2 = \|A_k^T - A_k^T(\Pi A_k^T)^+(\Pi A^T)\|_F^2 = \sum_{i=1}^n \|(A_k^T)^{(i)} - A_k^T(\Pi A_k^T)^+(\Pi A^T)^{(i)}\|_2^2.$

Here the superscript $(i)$ denotes the $i$th column. We now have a collection of approximate regression problems of the form $\min_x \|\Pi A_k^T x - \Pi(A^T)^{(i)}\|_2$, whose optimal solution is $\tilde{x}_i = (\Pi A_k^T)^+ \Pi(A^T)^{(i)}$. Consider $\min_x \|A_k^T x - (A^T)^{(i)}\|_2$ as the corresponding original regression problem; its optimal solution $x_i^*$ satisfies $A_k^T x_i^* = \mathrm{Proj}_{A_k^T}((A^T)^{(i)}) = (A_k^T)^{(i)}$.

Now we can apply the analysis of approximate least squares from last week. In our problem we have vectors $w_i, \beta_i, \alpha_i$ with $S = A_k^T = V_k\Sigma_k U_k^T$ and $b_i = (A^T)^{(i)}$. Here $\|w_i\|_2 = \|S x_i^* - b_i\|_2 = \|(A_k^T)^{(i)} - (A^T)^{(i)}\|_2$, hence $\sum_i \|w_i\|^2 = \|A - A_k\|_F^2$. On the other hand, $\sum_i \|\beta_i\|^2 = \|A_k^T - A_k^T(\Pi A_k^T)^+(\Pi A^T)\|_F^2$. Since $(\Pi V_k)^T(\Pi V_k)\beta_i = (\Pi V_k)^T \Pi w_i$, if all singular values of $\Pi V_k$ are at least $1/2^{1/4}$, we have

    $\sum_i \|\beta_i\|^2 \le 2\sum_i \|(\Pi V_k)^T(\Pi V_k)\beta_i\|^2 = 2\sum_i \|(\Pi V_k)^T \Pi w_i\|^2 = 2\|(\Pi V_k)^T \Pi W\|_F^2,$

where $W$ has $w_i$ as its $i$th column. What does this look like? $(\Pi V_k)^T(\Pi W)$ is exactly an approximate matrix multiplication of $V_k$ and $W$. Since the columns of $W$ are orthogonal to those of $V_k$, we have $V_k^T W = 0$. Hence if $\Pi$ is a sketch for approximate matrix multiplication with error $\varepsilon' = \sqrt{\varepsilon/k}$, then

    $\Pr_\Pi\big(\|(\Pi V_k)^T(\Pi W)\|_F^2 > \varepsilon\|W\|_F^2\big) < \delta,$

since $\|V_k\|_F^2 = k$. As $\|W\|_F^2 = \sum_i \|w_i\|^2 = \|A - A_k\|_F^2$, we get the desired result.

3.2 Further results

What we just described gives a good low-rank approximation, but every column of $\tilde{A}_k$ is a linear combination of potentially all the columns of $A$. In applications (e.g. information retrieval), we would like a small number of columns of $A$ themselves to span our low-dimensional subspace. There has been work on finding a small set $C$ of columns of $A$ such that $\|A - (CC^+A)_k\|_F^2$ is small, but we will not discuss it in depth. Boutsidis et al. [1] showed that we can take $C$ with roughly $2k/\varepsilon$ columns and error $(1+\varepsilon)\|A - A_k\|_F$. Guruswami and Sinop [6] obtained $C$ with $k/\varepsilon + k - 1$ columns such that $\|A - CC^+A\|_F \le (1+\varepsilon)\|A - A_k\|_F$.

3.3 K-Means as a Low-Rank Approximation Problem

The k-means problem, which appeared on the problem set, involves a set of points $x_1, \ldots, x_n \in \mathbb{R}^d$. Let $A$ be the matrix whose $i$th row is $x_i^T$. Given a partition $P = (P_1, \ldots, P_k)$ of the points into $k$ clusters, the best centroids are the averages of the clusters. Define the matrix $X_P \in \mathbb{R}^{n\times k}$ by

    $(X_P)_{i,j} = 1/\sqrt{|P_j|}$ if $i \in P_j$, and $0$ otherwise.

Note that $X_P^T X_P = I$. It can be shown that the $i$th row of $X_P X_P^T A$ is the centroid of the cluster that $x_i$ belongs to. Thus, solving k-means is equivalent to finding

    $P^* = \mathrm{argmin}_P \|A - X_P X_P^T A\|_F^2,$

which is a constrained rank-$k$ approximation problem. Cohen et al. [3] show that $\Pi$ can have $m = O(k/\varepsilon^2)$ rows for a $(1+\varepsilon)$-approximation, or $m = O(\lg k/\varepsilon^2)$ rows for a $(9+\varepsilon)$-approximation (the second bound is specific to the k-means problem). It is an open problem whether this second bound can be improved to a better approximation factor.
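To make the reformulation concrete, here is a small numerical sketch (not from the notes; the data and partition are arbitrary) checking that $X_P X_P^T A$ replaces each row of $A$ by its cluster centroid, so that $\|A - X_P X_P^T A\|_F^2$ is exactly the k-means cost of the partition.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 12, 3, 4
A = rng.standard_normal((n, d))            # rows of A are the points x_i
labels = np.arange(n) % k                  # an arbitrary partition with all clusters nonempty

# Indicator matrix X_P with (X_P)_{i,j} = 1/sqrt(|P_j|) if i is in P_j, else 0.
X = np.zeros((n, k))
for j in range(k):
    members = np.flatnonzero(labels == j)
    X[members, j] = 1.0 / np.sqrt(len(members))

assert np.allclose(X.T @ X, np.eye(k))     # columns of X_P are orthonormal

centroids = np.vstack([A[labels == j].mean(axis=0) for j in range(k)])
assert np.allclose(X @ X.T @ A, centroids[labels])  # row i of X_P X_P^T A = centroid of i's cluster

# The k-means cost of the partition equals the constrained low-rank error ||A - X_P X_P^T A||_F^2.
cost = sum(np.sum((A[labels == j] - centroids[j]) ** 2) for j in range(k))
assert np.isclose(cost, np.linalg.norm(A - X @ X.T @ A, "fro") ** 2)
```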

4 Compressed Sensing

4.1 Basic Idea

Consider $x \in \mathbb{R}^n$. If $x$ is a $k$-sparse vector, we could represent it in a far more compressed manner. Thus we define a measure of how compressible a vector is in terms of how close it is to being $k$-sparse.

Definition 6. Let $x_{\mathrm{head}(k)}$ denote the $k$ entries of $x$ of largest magnitude, and let $x_{\mathrm{tail}(k)}$ denote the rest of $x$. We call $x$ compressible if $\|x_{\mathrm{tail}(k)}\|$ is small.

The goal here is to approximately recover $x$ from few linear measurements. Suppose we have a matrix $\Pi \in \mathbb{R}^{m\times n}$ whose $i$th row is $\alpha_i$, so that $(\Pi x)_i = \langle \alpha_i, x\rangle$ for some $\alpha_1, \ldots, \alpha_m \in \mathbb{R}^n$. We want to recover an $\tilde{x}$ from $\Pi x$ such that

    $\|x - \tilde{x}\|_p \le C_{\varepsilon,p,q} \|x_{\mathrm{tail}(k)}\|_q,$

where $C_{\varepsilon,p,q}$ is some constant depending on $\varepsilon$, $p$ and $q$. Depending on the problem formulation, we may or may not get to choose the matrix $\Pi$.

4.2 Approximate Sparsity

There are many practical applications in which approximately sparse vectors appear. Pixelated images, for example, are usually approximately sparse in some basis $U$: for an $n$-by-$n$ image $x \in \mathbb{R}^{n^2}$ we have $x = Uy$ for some basis $U$, where $y$ is approximately sparse, so we can take measurements of the form $\Pi U y$. Images are typically sparse in a wavelet basis. We describe here how to transform to the Haar wavelet basis (one level of this transform is sketched in code at the end of this section). Assume that $n$ is a power of two. Then:

1. Break the image $x$ into blocks of four pixels (2 by 2).

2. Initialize a new image with four regions $R_1, R_2, R_3, R_4$.

3. Each block $b$ of four pixels in $x$ has a corresponding single pixel in each of $R_{1b}$, $R_{2b}$, $R_{3b}$ and $R_{4b}$, based on its location. For each block $b$ with pixel values $p_1, p_2, p_3, p_4$:
   $R_{1b} \leftarrow \frac{1}{4}(p_1 + p_2 + p_3 + p_4)$
   $R_{2b} \leftarrow \frac{1}{4}(p_1 - p_2 + p_3 - p_4)$
   $R_{3b} \leftarrow \frac{1}{4}(p_1 - p_2 - p_3 + p_4)$
   $R_{4b} \leftarrow \frac{1}{4}(p_1 + p_2 - p_3 - p_4)$

4. Recurse on $R_1$, $R_2$, $R_3$ and $R_4$.

The general idea is this: pixels are usually relatively constant within local regions, so the values in all regions except the first are usually small. If you view images after this transform, the upper left hand regions will often be closer to white, while the rest will be relatively sparse.

Theorem 7 (Candès, Romberg, Tao [2]; Donoho [4]). There exists a $\Pi \in \mathbb{R}^{m\times n}$ with $m = O(k\lg(n/k))$ and a poly-time algorithm $\mathrm{Alg}$ such that if $\tilde{x} = \mathrm{Alg}(\Pi x)$, then

    $\|x - \tilde{x}\|_2 \le O(k^{-1/2}) \|x_{\mathrm{tail}(k)}\|_1.$

If $x$ is actually $k$-sparse, $2k$ measurements are necessary and sufficient; we will see this by examining Prony's method on one of our problem sets, and we will investigate compressed sensing further next class.
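The following is a minimal sketch (not part of the notes) of one level of the 2-by-2 block transform described above, applied to a piecewise-constant test image; which pixel of a block plays the role of which $p_i$ is my own labeling. The block averages land in $R_1$ and the difference regions come out nearly zero, which is what makes the representation approximately sparse.

```python
import numpy as np

def haar_level(x):
    """One level of the 2x2 block transform from Section 4.2.
    x is an n x n image with n even; returns an n x n image whose four
    quadrants hold the averages (R1) and the three difference regions."""
    # Labeling assumption: p1, p2 = top-left, top-right; p3, p4 = bottom-left, bottom-right.
    p1, p2 = x[0::2, 0::2], x[0::2, 1::2]
    p3, p4 = x[1::2, 0::2], x[1::2, 1::2]
    R1 = (p1 + p2 + p3 + p4) / 4
    R2 = (p1 - p2 + p3 - p4) / 4
    R3 = (p1 - p2 - p3 + p4) / 4
    R4 = (p1 + p2 - p3 - p4) / 4
    return np.block([[R1, R2], [R3, R4]])

# A blocky test image: constant regions, so the difference coefficients vanish.
n = 8
x = np.zeros((n, n))
x[:4, :] = 1.0
x[4:, 4:] = 5.0

y = haar_level(x)
print(np.round(y, 2))
# Only the averages region (the top-left quadrant) is nonzero here:
# the transformed image is much sparser than the original.
```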

References

[1] Christos Boutsidis, Petros Drineas, Malik Magdon-Ismail. Near-Optimal Column-Based Matrix Reconstruction. FOCS, 2011.

[2] Emmanuel J. Candès, Justin K. Romberg, Terence Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 2006.

[3] Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, Madalina Persu. Dimensionality Reduction for k-means Clustering and Low Rank Approximation. STOC, 2015.

[4] David L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4), 2006.

[5] Alan M. Frieze, Ravi Kannan, Santosh Vempala. Fast Monte-Carlo Algorithms for Finding Low-Rank Approximations. J. ACM, 51(6), 2004.

[6] Venkatesan Guruswami, Ali Kemal Sinop. Optimal Column-Based Low-Rank Matrix Reconstruction. SODA, 2012.

[7] Christos H. Papadimitriou, Prabhakar Raghavan, Hisao Tamaki, Santosh Vempala. Latent Semantic Indexing: A Probabilistic Analysis. J. Comput. Syst. Sci., 61(2), 2000.

[8] Tamás Sarlós. Improved Approximation Algorithms for Large Matrices via Random Projections. FOCS, 2006.
