CS 229r: Algorithms for Big Data Fall 2015
Prof. Jelani Nelson
Lecture 18 Nov 3rd, 2015
Scribe: Jefferson Lee

1 Overview

Low-rank approximation, compressed sensing.

2 Last Time

We looked at three different regression methods. The first was based on $\varepsilon$-subspace embeddings. The second was an iterative approach, building a well-conditioned matrix good for stochastic gradient descent. The third was formulated as follows: for the least squares problem $\min_x \|Sx - b\|_2$, which has optimal solution $x^* = S^+ b$ and approximate solution $\tilde{x} = \operatorname{argmin}_x \|\Pi S x - \Pi b\|_2$, we let $Sx^* = U\alpha$, $w = Sx^* - b$, and $U\beta = S\tilde{x} - Sx^*$, where $S = U\Sigma V^T$. We proved last time that $(\Pi U)^T (\Pi U)\beta = (\Pi U)^T \Pi w$. These results from regression will reappear in our work on low-rank approximation.

3 Low-rank approximation

The basic setting: we have a huge matrix $A \in \mathbb{R}^{n \times d}$ with $n, d$ both very large, say $n$ users rating $d$ movies. We might believe that the users are linear combinations of a few ($k$) basic types, and we want to discover this low-rank structure. More formally: given a matrix $A \in \mathbb{R}^{n \times d}$, we want to compute

$A_k := \operatorname{argmin}_{\operatorname{rank}(B) \le k} \|A - B\|_X.$

Some now argue that we should instead look for a non-negative matrix factorization; nevertheless, this version is still widely used.

Theorem 1 (Eckart-Young). Let $A = U\Sigma V^T$ be a singular value decomposition of $A$, where $\operatorname{rank}(A) = r$ and $\Sigma$ is diagonal with entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$. Then under $\|\cdot\|_X = \|\cdot\|_F$, the minimizer is $A_k = U_k \Sigma_k V_k^T$, where $U_k$ and $V_k$ are the first $k$ columns of $U$ and $V$, and $\Sigma_k = \operatorname{diag}(\sigma_1, \ldots, \sigma_k)$.

Our output is then $U_k, \Sigma_k, V_k$. We can calculate $A_k$ in $O(nd^2)$ time by computing the SVD of $A$; we would like to do better. First, a few definitions:

Definition 2. $\operatorname{Proj}_A B$ is the projection of the columns of $B$ onto $\operatorname{colspace}(A)$.

Definition 3. Let $A = U\Sigma V^T$ be a singular value decomposition. $A^+ = V\Sigma^{-1}U^T$ is called the Moore-Penrose pseudoinverse of $A$.
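As a quick numerical sanity check of Theorem 1 and Definition 3, here is a toy sketch in numpy (the matrix and all sizes are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 30, 5
A = rng.standard_normal((n, d))

# Truncated SVD: A_k = U_k Sigma_k V_k^T, the best rank-k
# approximation to A in Frobenius norm (Eckart-Young).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Any other rank-k matrix B should do no better.
B = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
assert np.linalg.norm(A - A_k, 'fro') <= np.linalg.norm(A - B, 'fro')

# Moore-Penrose pseudoinverse A^+ = V Sigma^{-1} U^T (Definition 3).
A_pinv = (Vt.T * (1.0 / s)) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))
```

Computing the full SVD here is exactly the $O(nd^2)$ baseline that the rest of this section tries to beat.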
3.1 Algorithm

Today we are going to use a sketch that serves both as a subspace embedding and for approximate matrix multiplication to compute $\tilde{A}_k$ with rank at most $k$ such that $\|A - \tilde{A}_k\|_F \le (1+\varepsilon)\|A - A_k\|_F$, following Sarlós' approach [8]. The first works obtaining some decent error (like $\varepsilon\|A\|_F$) were due to Papadimitriou et al. [7] and Frieze, Kannan and Vempala [5].

Theorem 4. Define $\tilde{A}_k = \operatorname{Proj}_{A\Pi^T,k}(A)$, where $\operatorname{Proj}_{V,k}(A)$ is the best rank-$k$ approximation to $\operatorname{Proj}_V(A)$, i.e., to the projection of the columns of $A$ onto $V$. As long as $\Pi \in \mathbb{R}^{m \times d}$ is a $1/2$ subspace embedding for a certain $k$-dimensional subspace $V_k$ and satisfies approximate matrix multiplication with error $\sqrt{\varepsilon/k}$, then

$\|A - \tilde{A}_k\|_F \le (1 + O(\varepsilon))\|A - A_k\|_F.$

Before we prove this theorem, let us first convince ourselves that this algorithm is fast, and that we can compute $\operatorname{Proj}_{A\Pi^T,k}(A)$ quickly. To satisfy the conditions in the above theorem, we know that $\Pi \in \mathbb{R}^{m \times d}$ can be chosen with $m = O(k/\varepsilon)$, e.g., using a random sign matrix (or slightly larger $m$ using a faster subspace embedding). We need to multiply $A\Pi^T$. We can use a fast subspace embedding to compute $A\Pi^T$ quickly, then compute the SVD of $A\Pi^T = U'\Sigma'V'^T$ in $O(nm^2)$ time. Let $[\cdot]_k$ denote the best rank-$k$ approximation under the Frobenius norm. We then want to compute $[U'U'^TA]_k = U'[U'^TA]_k$. Computing $U'^TA$ takes $O(mnd)$ time, and computing the SVD of $U'^TA$ takes $O(dm^2)$ time. Note that this is already better than the $O(nd^2)$ time to compute the SVD of $A$, but we can do better if we approximate. In particular, by using the right combination of subspace embeddings, for constant $\varepsilon$ the scheme described here can be made to take $O(\operatorname{nnz}(A)) + \tilde{O}(ndk)$ time (where $\tilde{O}$ hides $\log n$ factors). We will shoot instead for $O(\operatorname{nnz}(A)) + \tilde{O}(nk^2)$. Consider that we want to compute

$\tilde{A}_k = \operatorname{argmin}_{X : \operatorname{rank}(X) \le k} \|U'X - A\|_F^2.$

If $X^*$ is the argmin without the rank constraint, then the argmin with the rank constraint is $[U'X^*]_k = U'[X^*]_k$.
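A minimal numpy sketch of this pipeline (a toy illustration: $\Pi$ is a dense random sign matrix, a QR factorization stands in for the SVD of $A\Pi^T$ since only an orthonormal basis $U'$ for its column space is needed, and the test matrix is synthetic near-rank-$k$ data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k, m = 200, 100, 5, 40  # m is the sketch size, on the order of k/eps

# Synthetic data: rank-k signal plus small noise.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
A += 0.01 * rng.standard_normal((n, d))

# Random sign sketch Pi in R^{m x d}, then A Pi^T in R^{n x m}.
Pi = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)
Uprime, _ = np.linalg.qr(A @ Pi.T)  # orthonormal basis for colspace(A Pi^T)

# Proj_{A Pi^T, k}(A) = U' [U'^T A]_k via a small (m x d) SVD.
G = Uprime.T @ A
Ug, sg, Vgt = np.linalg.svd(G, full_matrices=False)
A_k_tilde = Uprime @ (Ug[:, :k] * sg[:k]) @ Vgt[:k, :]

# Compare against the true best rank-k approximation.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
err_best = np.linalg.norm(A - (U[:, :k] * s[:k]) @ Vt[:k, :], 'fro')
err_sketch = np.linalg.norm(A - A_k_tilde, 'fro')
assert err_sketch <= 1.5 * err_best  # (1 + eps)-type error in practice
```

Only small SVDs (of $m \times d$ matrices) are ever taken; the full $n \times d$ SVD at the end is just for comparison.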
Rather than find $X^*$ exactly, we use approximate regression to find an approximately optimal $\tilde{X}$. That is, we compute

$\tilde{X} = \operatorname{argmin}_X \|\Pi' U' X - \Pi' A\|_F^2,$

where $\Pi'$ is an $\alpha$-subspace embedding for the column space of $U'$ (note that $U'$ has rank at most $m$). Then our final output is $U'[\tilde{X}]_k$.

Why does the above work? (Thanks to Michael Cohen for describing the following simple argument.) First note that

$\frac{1+\alpha}{1-\alpha}\|U'X^* - A\|_F^2 \ge \|U'\tilde{X} - A\|_F^2 = \|(U'X^* - A) + U'(\tilde{X} - X^*)\|_F^2 = \|U'X^* - A\|_F^2 + \|U'(\tilde{X} - X^*)\|_F^2 = \|U'X^* - A\|_F^2 + \|\tilde{X} - X^*\|_F^2$
and thus $\|\tilde{X} - X^*\|_F^2 \le O(\alpha)\|U'X^* - A\|_F^2$. The second equality above holds since $U'$ has orthonormal columns and hence preserves Frobenius norms, and the first equality holds since $U'X^* - A$ has a column space orthogonal to the column space of $U'$.

Next, suppose $f, \tilde{f}$ are two functions mapping the same domain to $\mathbb{R}$ such that $|f(x) - \tilde{f}(x)| \le \eta$ for all $x$ in the domain. Then clearly $f(\operatorname{argmin}_x \tilde{f}(x)) \le \min_x f(x) + 2\eta$. Now, let the domain be the set of all rank-$k$ matrices, and let $f(Z) = \|U'X^* - Z\|_F$ and $\tilde{f}(Z) = \|U'\tilde{X} - Z\|_F$. Then $\eta = \|U'X^* - U'\tilde{X}\|_F = \|X^* - \tilde{X}\|_F$. Thus

$\|U'[\tilde{X}]_k - A\|_F^2 = \|U'[\tilde{X}]_k - U'X^*\|_F^2 + \|(I - U'U'^T)A\|_F^2$
$\le (\|U'[X^*]_k - U'X^*\|_F + 2\|X^* - \tilde{X}\|_F)^2 + \|(I - U'U'^T)A\|_F^2$
$\le (\|U'[X^*]_k - U'X^*\|_F + O(\sqrt{\alpha})\|U'X^* - A\|_F)^2 + \|(I - U'U'^T)A\|_F^2$
$= (\|U'[X^*]_k - U'X^*\|_F + O(\sqrt{\alpha})\|U'X^* - A\|_F)^2 + \|U'X^* - A\|_F^2$
$= \|U'[X^*]_k - U'X^*\|_F^2 + O(\sqrt{\alpha})\|U'[X^*]_k - U'X^*\|_F \|U'X^* - A\|_F + O(\alpha)\|U'X^* - A\|_F^2 + \|U'X^* - A\|_F^2$
$= \|U'[X^*]_k - A\|_F^2 + O(\sqrt{\alpha})\|U'[X^*]_k - U'X^*\|_F \|U'X^* - A\|_F + O(\alpha)\|U'X^* - A\|_F^2 \quad (1)$
$\le (1 + O(\alpha))\|U'[X^*]_k - A\|_F^2 + O(\sqrt{\alpha})\|U'[X^*]_k - U'X^*\|_F \|U'X^* - A\|_F \quad (2)$
$\le (1 + O(\alpha))\|U'[X^*]_k - A\|_F^2 + O(\sqrt{\alpha})\|U'[X^*]_k - A\|_F^2 \quad (3)$
$= (1 + O(\sqrt{\alpha}))\|U'[X^*]_k - A\|_F^2,$

where (1) used that $\|U'[X^*]_k - U'X^*\|_F^2 + \|U'X^* - A\|_F^2 = \|U'[X^*]_k - A\|_F^2$, since $U'X^* - A$ has columns orthogonal to the column space of $U'$. Also, (2) used that $\|U'X^* - A\|_F \le \|U'[X^*]_k - A\|_F$, since $U'X^*$ is the best Frobenius-norm approximation to $A$ in the column space of $U'$. Finally, (3) again used $\|U'X^* - A\|_F \le \|U'[X^*]_k - A\|_F$, and also used the triangle inequality $\|U'[X^*]_k - U'X^*\|_F \le \|U'[X^*]_k - A\|_F + \|U'X^* - A\|_F \le 2\|U'[X^*]_k - A\|_F$.

Thus we have established the following theorem, which follows from the above calculations and Theorem 4.

Theorem 5. Let $\Pi_1 \in \mathbb{R}^{m_1 \times d}$ be a $1/2$ subspace embedding for a certain $k$-dimensional subspace $V_k$, and suppose $\Pi_1$ also satisfies approximate matrix multiplication with error $\sqrt{\varepsilon/k}$. Let $\Pi_2 \in \mathbb{R}^{m_2 \times n}$ be an $\alpha$-subspace embedding for the column space of $U'$, where $A\Pi_1^T = U'\Sigma'V'^T$ is the SVD (and hence $U'$ has rank at most $m_1$). Let $\tilde{A}_k = U'[\tilde{X}]_k$, where $\tilde{X} = \operatorname{argmin}_X \|\Pi_2 U' X - \Pi_2 A\|_F^2$. Then $\tilde{A}_k$ has rank at most $k$ and

$\|A - \tilde{A}_k\|_F \le (1 + O(\varepsilon) + O(\sqrt{\alpha}))\|A - A_k\|_F.$

In particular, the error is $(1 + O(\varepsilon))\|A - A_k\|_F$ for $\alpha = \varepsilon^2$.
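The whole Theorem 5 pipeline can be prototyped as follows (a toy sketch: dense sign matrices stand in for the fast embeddings $\Pi_1$ and $\Pi_2$, numpy's least squares solves the small sketched regression, and the sizes are arbitrary; a serious implementation would use sparse or structured embeddings to get the running times discussed above):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 300, 120, 4
m1, m2 = 40, 200  # sketch sizes for Pi_1 and Pi_2

# Synthetic data: rank-k signal plus small noise.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
A += 0.01 * rng.standard_normal((n, d))

# Pi_1 sketches on the right: orthonormal basis U' for colspace(A Pi_1^T).
Pi1 = rng.choice([-1.0, 1.0], size=(m1, d)) / np.sqrt(m1)
Uprime, _ = np.linalg.qr(A @ Pi1.T)  # n x m1

# Pi_2 sketches on the left; solve X~ = argmin ||Pi_2 U' X - Pi_2 A||_F.
Pi2 = rng.choice([-1.0, 1.0], size=(m2, n)) / np.sqrt(m2)
Xt, *_ = np.linalg.lstsq(Pi2 @ Uprime, Pi2 @ A, rcond=None)

# Output A_k~ = U' [X~]_k.
Ux, sx, Vxt = np.linalg.svd(Xt, full_matrices=False)
A_k_tilde = Uprime @ (Ux[:, :k] * sx[:k]) @ Vxt[:k, :]

U, s, Vt = np.linalg.svd(A, full_matrices=False)
err_best = np.linalg.norm(A - (U[:, :k] * s[:k]) @ Vt[:k, :], 'fro')
err = np.linalg.norm(A - A_k_tilde, 'fro')
assert err <= 2.0 * err_best
```

Note that the regression is over an $m_2 \times m_1$ design matrix and the final SVD is of the $m_1 \times d$ matrix $\tilde{X}$; nothing of size $n \times d$ is ever factored.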
In the remaining part of these lecture notes, we show that $\operatorname{Proj}_{A\Pi^T,k}(A)$ actually is a good rank-$k$ approximation to $A$ (i.e., we prove Theorem 4). In the following proof, we denote the first $k$ columns of $U$ and $V$ by $U_k$ and $V_k$, and the remaining columns by $U_{\bar{k}}$ and $V_{\bar{k}}$, so that $A_{\bar{k}} = U_{\bar{k}}\Sigma_{\bar{k}}V_{\bar{k}}^T = A - A_k$.

Proof. Let $Y$ be the column span of $\operatorname{Proj}_{A\Pi^T}(A_k)$, and let $P$ be the orthogonal projection operator onto $Y$. Then

$\|A - \operatorname{Proj}_{A\Pi^T,k}(A)\|_F^2 \le \|A - PA\|_F^2 = \|A_k - PA_k\|_F^2 + \|A_{\bar{k}} - PA_{\bar{k}}\|_F^2,$

where the equality holds since $A_k$ and $A_{\bar{k}}$ have orthogonal row spaces. We can bound the second term in that sum:

$\|A_{\bar{k}} - PA_{\bar{k}}\|_F^2 = \|(I - P)A_{\bar{k}}\|_F^2 \le \|A_{\bar{k}}\|_F^2 = \|A - A_k\|_F^2.$

Now we just need to show that $\|A_k - PA_k\|_F^2 \le \varepsilon\|A - A_k\|_F^2$:

$\|A_k - PA_k\|_F^2 \le \|A_k - (A\Pi^T)(A_k\Pi^T)^+ A_k\|_F^2 = \|A_k^T - A_k^T(\Pi A_k^T)^+(\Pi A^T)\|_F^2 = \sum_{i=1}^n \|(A_k^T)^{(i)} - A_k^T(\Pi A_k^T)^+(\Pi A^T)^{(i)}\|_2^2.$

Here superscript $(i)$ means the $i$th column. Now we have a bunch of different approximate regression problems, of the form $\min_x \|\Pi A_k^T x - \Pi(A^T)^{(i)}\|_2$, which has optimal solution $\tilde{x}_i = (\Pi A_k^T)^+(\Pi A^T)^{(i)}$. Consider $\min_x \|A_k^T x - (A^T)^{(i)}\|_2$ as the original regression problem. In this case the optimal $x$ gives $A_k^T x = \operatorname{Proj}_{A_k^T}((A^T)^{(i)}) = (A_k^T)^{(i)}$. Now we can use the analysis of approximate least squares from last week. In our problem, we have a collection of vectors $w_i, \beta_i, \alpha_i$ with $S = A_k^T = V_k \Sigma_k U_k^T$ and $b_i = (A^T)^{(i)}$. Here, $\|w_i\|_2 = \|Sx - b_i\|_2 = \|(A_k^T)^{(i)} - (A^T)^{(i)}\|_2$. Hence $\sum_i \|w_i\|_2^2 = \|A - A_k\|_F^2$. On the other hand, $\sum_i \|\beta_i\|_2^2 = \|A_k^T - A_k^T(\Pi A_k^T)^+(\Pi A^T)\|_F^2$, which is exactly the quantity we want to bound. Since $(\Pi V_k)^T(\Pi V_k)\beta_i = (\Pi V_k)^T \Pi w_i$, and all singular values of $\Pi V_k$ are at least $1/2$ ($\Pi$ is a $1/2$ subspace embedding for the column space of $V_k$), the eigenvalues of $(\Pi V_k)^T(\Pi V_k)$ are at least $1/4$, so

$\sum_i \|\beta_i\|_2^2 \le 16 \sum_i \|(\Pi V_k)^T(\Pi V_k)\beta_i\|_2^2 = 16 \sum_i \|(\Pi V_k)^T \Pi w_i\|_2^2 = 16\|(\Pi V_k)^T \Pi W\|_F^2,$

where $W$ has $w_i$ as its $i$th column. What does this look like? $(\Pi V_k)^T \Pi W$ looks exactly like approximate matrix multiplication of $V_k$ and $W$. Since the columns of $W$ and the columns of $V_k$ are orthogonal to each other, we have $V_k^T W = 0$. Hence, if $\Pi$ is a sketch for approximate matrix multiplication with error $\varepsilon' = \sqrt{\varepsilon/k}$, then

$\mathbb{P}_\Pi\left(\|(\Pi V_k)^T(\Pi W)\|_F^2 > \varepsilon'^2 \|V_k\|_F^2 \|W\|_F^2\right) < \delta,$

and $\varepsilon'^2 \|V_k\|_F^2 = \varepsilon$ since $\|V_k\|_F^2 = k$.
Since clearly $\|W\|_F^2 = \sum_i \|w_i\|_2^2 = \|A - A_k\|_F^2$, we get the desired result (the constant factors are absorbed into the $O(\varepsilon)$ of Theorem 4). $\square$
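The approximate-matrix-multiplication ingredient at the end of the proof is easy to observe numerically (a toy check with a random sign sketch; all dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, c, m = 1000, 5, 8, 200

# V has orthonormal columns and W is orthogonal to it, so V^T W = 0,
# mirroring V_k and W in the proof.
Q, _ = np.linalg.qr(rng.standard_normal((n, k + c)))
V, W = Q[:, :k], Q[:, k:]
assert np.allclose(V.T @ W, 0.0)

# Sign sketch Pi: (Pi V)^T (Pi W) approximates V^T W = 0, with
# Frobenius error on the order of ||V||_F ||W||_F / sqrt(m).
Pi = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
err = np.linalg.norm((Pi @ V).T @ (Pi @ W), 'fro')
bound = 3 * np.linalg.norm(V, 'fro') * np.linalg.norm(W, 'fro') / np.sqrt(m)
assert err < bound
```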
3.2 Further results

What we just described gives a good low-rank approximation, but every column of $\tilde{A}_k$ is a linear combination of potentially all the columns of $A$. In applications (e.g., information retrieval), we want a small number of actual columns of $A$ to span our low-dimensional subspace. There has been work on finding a few columns of $A$ (call them $C$) such that $\|A - (CC^+A)_k\|_F^2$ is small, but we will not talk about it deeply. Boutsidis et al. [1] showed that we can take $C$ with roughly $2k/\varepsilon$ columns and error $(1+\varepsilon)\|A - A_k\|_F$. Guruswami and Sinop [6] got $C$ with $k/\varepsilon + k - 1$ columns such that $\|A - CC^+A\|_F \le (1+\varepsilon)\|A - A_k\|_F$.

3.3 K-Means as a Low-Rank Approximation Problem

The k-means problem, which was stated on the problem set, involves a set of points $x_1, \ldots, x_n \in \mathbb{R}^d$. Let $A$ be the matrix whose $i$th row is $x_i^T$. Given a partition $P = (P_1, \ldots, P_k)$ of the points into $k$ clusters, the best centroids are the averages of the clusters. Define the matrix $X_P \in \mathbb{R}^{n \times k}$ such that

$(X_P)_{i,j} = 1/\sqrt{|P_j|}$ if $i \in P_j$, and $0$ otherwise.

Note that $X_P^T X_P = I$. It can be shown that the $i$th row of $X_P X_P^T A$ is the centroid of the cluster that $x_i$ belongs to. Thus, solving k-means is equivalent to finding $P^* = \operatorname{argmin}_P \|A - X_P X_P^T A\|_F^2$; this is a constrained rank-$k$ approximation problem. Cohen et al. [3] show that $\Pi$ can have $m = O(k/\varepsilon^2)$ for a $(1+\varepsilon)$-approximation, or $m = O((\lg k)/\varepsilon^2)$ for a $(9+\varepsilon)$-approximation (the second bound is specifically for the k-means problem). It is an open problem whether this second bound can be improved to a better approximation factor.

4 Compressed Sensing

4.1 Basic Idea

Consider $x \in \mathbb{R}^n$. If $x$ is a $k$-sparse vector, we could represent it in a far more compressed manner. Thus, we define a measure of how compressible a vector is as a measure of how close it is to being $k$-sparse.

Definition 6. Let $x_{\operatorname{head}(k)}$ be the $k$ elements of largest magnitude in $x$, and let $x_{\operatorname{tail}(k)}$ be the rest of $x$. We call $x$ compressible if $\|x_{\operatorname{tail}(k)}\|$ is small.

The goal here is to approximately recover $x$ from few linear measurements.
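Definition 6 in code (a small illustration; the vector and $k$ are arbitrary):

```python
import numpy as np

x = np.array([5.0, -0.1, 0.02, 4.0, -3.0, 0.05, 0.01, -0.2])
k = 3

# Head: the k entries of largest magnitude; tail: everything else.
head_idx = np.argsort(-np.abs(x))[:k]
x_head = np.zeros_like(x)
x_head[head_idx] = x[head_idx]
x_tail = x - x_head

# This x is compressible: its tail carries little of the l1 mass.
assert np.linalg.norm(x_tail, 1) < 0.1 * np.linalg.norm(x, 1)
```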
Suppose we have a matrix $\Pi$ such that the $i$th entry of $\Pi x$ equals $\langle \alpha_i, x \rangle$ for some $\alpha_1, \ldots, \alpha_m \in \mathbb{R}^n$. We want to recover an $\tilde{x}$ from $\Pi x$ such that

$\|x - \tilde{x}\|_p \le C_{\varepsilon,p,q} \|x_{\operatorname{tail}(k)}\|_q,$

where $C_{\varepsilon,p,q}$ is some constant depending on $\varepsilon$, $p$ and $q$. Depending on the problem formulation, we may or may not get to choose the matrix $\Pi$.
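A toy end-to-end illustration of this measurement model, for the special case of an exactly $k$-sparse $x$. The recovery algorithm used here, $\ell_1$ minimization (basis pursuit) written as a linear program, is the classical choice from the compressed sensing literature; the use of scipy's linprog and all the sizes are illustrative assumptions, not part of these notes:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n, m, k = 60, 30, 3

# A k-sparse signal and m random Gaussian linear measurements y = Pi x.
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)
Pi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Pi @ x

# Basis pursuit: minimize ||z||_1 subject to Pi z = y. Writing
# z = p - q with p, q >= 0 turns this into a linear program.
c = np.ones(2 * n)
A_eq = np.hstack([Pi, -Pi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]

assert res.status == 0
assert np.linalg.norm(x_hat - x) < 1e-4  # recovery of the sparse x
```

With $m$ comfortably above $k \lg(n/k)$, Gaussian measurements allow this LP to recover the sparse signal with high probability.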
4.2 Approximate Sparsity

There are many practical applications in which approximately sparse vectors appear. Pixelated images, for example, are usually approximately sparse in some basis $U$: for an $n \times n$ image $x \in \mathbb{R}^{n^2}$, we can write $x = Uy$ for some basis $U$ in which $y$ is approximately sparse, and then take measurements $\Pi x = (\Pi U)y$. Images are typically sparse in the wavelet basis. We will describe how to transform to the Haar wavelet basis here. Assume that $n$ is a power of two. Then:

1. Break the image $x$ into blocks of four pixels ($2 \times 2$ squares).
2. Initialize a new image with four regions $R_1, R_2, R_3, R_4$.
3. Each block $b$ of four pixels in $x$ has a corresponding single pixel $R_{1b}, R_{2b}, R_{3b}, R_{4b}$ in each of the four regions, based on its location. For each block $b$ with pixel values $p_1, p_2$ (top row) and $p_3, p_4$ (bottom row):
$R_{1b} \leftarrow \frac{1}{4}(p_1 + p_2 + p_3 + p_4)$
$R_{2b} \leftarrow \frac{1}{4}(p_1 - p_2 + p_3 - p_4)$
$R_{3b} \leftarrow \frac{1}{4}(p_1 - p_2 - p_3 + p_4)$
$R_{4b} \leftarrow \frac{1}{4}(p_1 + p_2 - p_3 - p_4)$
4. Recurse on $R_1, R_2, R_3$, and $R_4$.

The general idea is this: pixel values are usually relatively constant within regions of an image, so the values in all regions except the first are usually relatively small. If you view an image after this transform, the upper left-hand region will often be close to white, while the rest will be relatively sparse.

Theorem 7 (Candès, Romberg, Tao [2], Donoho [4]). There exists $\Pi \in \mathbb{R}^{m \times n}$ with $m = O(k \lg(n/k))$ and a poly-time algorithm Alg such that if $\tilde{x} = \operatorname{Alg}(\Pi x)$, then

$\|x - \tilde{x}\|_2 \le O(k^{-1/2})\|x_{\operatorname{tail}(k)}\|_1.$

If $x$ is actually $k$-sparse, $2k$ measurements are necessary and sufficient. We will see this by examining Prony's method on one of our problem sets, and we will investigate compressed sensing further next class.

References

[1] Christos Boutsidis, Petros Drineas, Malik Magdon-Ismail. Near-Optimal Column-based Matrix Reconstruction. FOCS, 2011.

[2] Emmanuel J. Candès, Justin K. Romberg, Terence Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information.
IEEE Transactions on Information Theory, 52(2), 2006.

[3] Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, Madalina Persu. Dimensionality Reduction for k-means Clustering and Low Rank Approximation. STOC, 2015.
[4] David L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4), 2006.

[5] Alan M. Frieze, Ravi Kannan, Santosh Vempala. Fast Monte-Carlo Algorithms for Finding Low-rank Approximations. J. ACM, 51(6), 2004.

[6] Venkatesan Guruswami, Ali Kemal Sinop. Optimal Column-based Low-rank Matrix Reconstruction. SODA, 2012.

[7] Christos H. Papadimitriou, Prabhakar Raghavan, Hisao Tamaki, Santosh Vempala. Latent Semantic Indexing: A Probabilistic Analysis. J. Comput. Syst. Sci., 61(2), 2000.

[8] Tamás Sarlós. Improved Approximation Algorithms for Large Matrices via Random Projections. FOCS, 2006.
More informationPseudoinverse & Moore-Penrose Conditions
ECE 275AB Lecture 7 Fall 2008 V1.0 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 7 ECE 275A Pseudoinverse & Moore-Penrose Conditions ECE 275AB Lecture 7 Fall 2008 V1.0 c K. Kreutz-Delgado, UC San Diego
More informationTHE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR
THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},
More informationLinear Algebra for Machine Learning. Sargur N. Srihari
Linear Algebra for Machine Learning Sargur N. srihari@cedar.buffalo.edu 1 Overview Linear Algebra is based on continuous math rather than discrete math Computer scientists have little experience with it
More informationThe University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.
The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational
More informationarxiv: v2 [cs.ds] 1 May 2013
Dimension Independent Matrix Square using MapReduce arxiv:1304.1467v2 [cs.ds] 1 May 2013 Reza Bosagh Zadeh Institute for Computational and Mathematical Engineering rezab@stanford.edu Gunnar Carlsson Mathematics
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2013 PROBLEM SET 2
STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2013 PROBLEM SET 2 1. You are not allowed to use the svd for this problem, i.e. no arguments should depend on the svd of A or A. Let W be a subspace of C n. The
More informationSingular Value Decomposition
Singular Value Decomposition CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Doug James (and Justin Solomon) CS 205A: Mathematical Methods Singular Value Decomposition 1 / 35 Understanding
More informationSignal Recovery from Permuted Observations
EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,
More informationData Mining Lecture 4: Covariance, EVD, PCA & SVD
Data Mining Lecture 4: Covariance, EVD, PCA & SVD Jo Houghton ECS Southampton February 25, 2019 1 / 28 Variance and Covariance - Expectation A random variable takes on different values due to chance The
More informationFast Monte Carlo Algorithms for Matrix Operations & Massive Data Set Analysis
Fast Monte Carlo Algorithms for Matrix Operations & Massive Data Set Analysis Michael W. Mahoney Yale University Dept. of Mathematics http://cs-www.cs.yale.edu/homes/mmahoney Joint work with: P. Drineas
More informationTighter Low-rank Approximation via Sampling the Leveraged Element
Tighter Low-rank Approximation via Sampling the Leveraged Element Srinadh Bhojanapalli The University of Texas at Austin bsrinadh@utexas.edu Prateek Jain Microsoft Research, India prajain@microsoft.com
More informationDimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas
Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx
More information9 Searching the Internet with the SVD
9 Searching the Internet with the SVD 9.1 Information retrieval Over the last 20 years the number of internet users has grown exponentially with time; see Figure 1. Trying to extract information from this
More informationEE731 Lecture Notes: Matrix Computations for Signal Processing
EE731 Lecture Notes: Matrix Computations for Signal Processing James P. Reilly c Department of Electrical and Computer Engineering McMaster University September 22, 2005 0 Preface This collection of ten
More informationFast low rank approximations of matrices and tensors
Fast low rank approximations of matrices and tensors S. Friedland, V. Mehrmann, A. Miedlar and M. Nkengla Univ. Illinois at Chicago & Technische Universität Berlin Gene Golub memorial meeting, Berlin,
More informationSensing systems limited by constraints: physical size, time, cost, energy
Rebecca Willett Sensing systems limited by constraints: physical size, time, cost, energy Reduce the number of measurements needed for reconstruction Higher accuracy data subject to constraints Original
More informationConditions for Robust Principal Component Analysis
Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and
More informationCollaborative Filtering: A Machine Learning Perspective
Collaborative Filtering: A Machine Learning Perspective Chapter 6: Dimensionality Reduction Benjamin Marlin Presenter: Chaitanya Desai Collaborative Filtering: A Machine Learning Perspective p.1/18 Topics
More informationQuick Introduction to Nonnegative Matrix Factorization
Quick Introduction to Nonnegative Matrix Factorization Norm Matloff University of California at Davis 1 The Goal Given an u v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices
More informationOSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings
OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings Jelani Nelson Huy L. Nguy ên Abstract An oblivious subspace embedding (OSE) given some parameters ε, d is a distribution
More informationRandomized Algorithms in Linear Algebra and Applications in Data Analysis
Randomized Algorithms in Linear Algebra and Applications in Data Analysis Petros Drineas Rensselaer Polytechnic Institute Computer Science Department To access my web page: drineas Why linear algebra?
More informationRandom Methods for Linear Algebra
Gittens gittens@acm.caltech.edu Applied and Computational Mathematics California Institue of Technology October 2, 2009 Outline The Johnson-Lindenstrauss Transform 1 The Johnson-Lindenstrauss Transform
More informationLecture 16 Oct. 26, 2017
Sketching Algorithms for Big Data Fall 2017 Prof. Piotr Indyk Lecture 16 Oct. 26, 2017 Scribe: Chi-Ning Chou 1 Overview In the last lecture we constructed sparse RIP 1 matrix via expander and showed that
More informationarxiv: v2 [stat.ml] 29 Nov 2018
Randomized Iterative Algorithms for Fisher Discriminant Analysis Agniva Chowdhury Jiasen Yang Petros Drineas arxiv:1809.03045v2 [stat.ml] 29 Nov 2018 Abstract Fisher discriminant analysis FDA is a widely
More informationStructured matrix factorizations. Example: Eigenfaces
Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix
More information