Lecture 9: SVD, Low Rank Approximation

CSE 521: Design and Analysis of Algorithms I                                Spring 2016

Lecture 9: SVD, Low Rank Approximation

Lecturer: Shayan Oveis Gharan        April 25th        Scribe: Koosha Khalvati

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

9.1 Singular Value Decomposition (SVD)

Claim 9.1. For any matrix M ∈ R^{m×n} (m ≥ n), there exist two sets of orthonormal columns v_1, ..., v_k and u_1, ..., u_k such that

    M = Σ_{i=1}^k σ_i u_i v_i^T,

where σ_i > 0 for each i and k ≤ min{m, n}.

In the above representation, σ_i is called a singular value, u_i a left singular vector, and v_i a right singular vector.

Proof. To prove the existence of these two sets of columns, we construct them. First, note that M^T M ∈ R^{n×n} is a symmetric and positive semidefinite matrix. To see the latter, observe that for any vector x ∈ R^n,

    x^T (M^T M) x = ⟨Mx, Mx⟩ = ‖Mx‖^2 ≥ 0.

Secondly, by the spectral theorem, there exist eigenvalues λ_1, ..., λ_n with corresponding orthonormal eigenvectors v_1, ..., v_n such that

    M^T M = Σ_{i=1}^n λ_i v_i v_i^T.

We use these v_i's, the eigenvectors of M^T M, as the right singular vectors in the SVD of M. Note that in the above representation λ_i ≥ 0 for all i; we throw away every pair v_i, λ_i with λ_i = 0.

It remains to construct the u_i's. We let each u_i be proportional to M v_i. Firstly, observe that for any i ≠ j, M v_i and M v_j are orthogonal:

    ⟨M v_i, M v_j⟩ = ⟨M^T M v_i, v_j⟩ = λ_i ⟨v_i, v_j⟩ = { λ_i if i = j,  0 otherwise. }        (9.1)

So, we just need to normalize. Define

    u_i = M v_i / ‖M v_i‖ = M v_i / √λ_i = M v_i / σ_i.

Note that the singular values σ_i are equal to √λ_i; since M^T M is PSD, λ_i ≥ 0 and σ_i is well defined. In particular, observe that if M is a symmetric matrix, σ_i is the absolute value of the i-th eigenvalue of M.
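
The construction in this proof is easy to carry out numerically. The following is a minimal sketch (not from the original notes) using numpy on a random matrix with hypothetical dimensions: it diagonalizes M^T M, keeps the eigenpairs with λ_i > 0, sets σ_i = √λ_i and u_i = M v_i / σ_i, and checks that Σ_i σ_i u_i v_i^T reproduces M.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 8, 5                              # hypothetical dimensions with m >= n
    M = rng.standard_normal((m, n))

    # Spectral decomposition of the PSD matrix M^T M gives the right singular vectors.
    lam, V = np.linalg.eigh(M.T @ M)         # eigenvalues returned in ascending order
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    keep = lam > 1e-12                       # discard eigenpairs with lambda_i = 0
    lam, V = lam[keep], V[:, keep]

    sigma = np.sqrt(lam)                     # sigma_i = sqrt(lambda_i)
    U = (M @ V) / sigma                      # u_i = M v_i / sigma_i, orthonormal by (9.1)

    # Check the claim M = sum_i sigma_i u_i v_i^T.
    print(np.allclose(M, U @ np.diag(sigma) @ V.T))   # expected: True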

Now, we want to show that these v_i's and u_i's satisfy the conditions of the SVD. Recall that the v_i's are orthonormal because they are eigenvectors of M^T M, and the u_i's are orthonormal by (9.1). We claim that

    M = Σ_{i=1}^k σ_i u_i v_i^T.

One way to prove that two operators are the same is to show that they act identically on a set of basis vectors w_1, ..., w_n. In particular, we show that for any j,

    M v_j = ( Σ_{i=1}^k σ_i u_i v_i^T ) v_j.        (9.2)

If the above holds for all j, then since v_1, ..., v_n form a basis, any vector v can be written as v = Σ_{j=1}^n ⟨v, v_j⟩ v_j, and one can write

    M v = Σ_{j=1}^n ⟨v, v_j⟩ M v_j = Σ_{j=1}^n ⟨v, v_j⟩ ( Σ_{i=1}^k σ_i u_i v_i^T ) v_j = ( Σ_{i=1}^k σ_i u_i v_i^T ) v.

So, the two matrices are indeed equal. It remains to prove (9.2) for all j. For j ≤ k,

    ( Σ_{i=1}^k σ_i u_i v_i^T ) v_j = Σ_{i=1}^k σ_i u_i ⟨v_i, v_j⟩ = σ_j u_j = σ_j (M v_j / σ_j) = M v_j.

This is because ⟨v_i, v_j⟩ is equal to 0 when i and j are different and equal to 1 when i = j; in the second-to-last equality we used the definition of u_j. For j > k we have λ_j = 0, so M v_j = 0 (since ‖M v_j‖^2 = λ_j) and the right hand side is also zero. This proves (9.2), as desired.

9.2 Low Rank Approximation

In the rest of this lecture and part of the next one we study low rank approximation of matrices. First, let us define the rank of a matrix. There are many ways one can define it: the rank of a matrix M, rank(M), is the number of linearly independent columns in M. It is also equal to the number of linearly independent rows in M. In addition, it is equal to the number of nonzero singular values of M (and, for symmetric M, to the number of nonzero eigenvalues). It is possible to prove that all of these definitions describe the same quantity.

9.2.1 Motivation

In low rank approximation, the goal is to approximate a given matrix M with a low rank matrix M_k, i.e., to find M_k such that M − M_k is approximately zero. We have not specified the norm; in general one should choose the norm based on the specific application. This area is also known as principal component analysis. It has a similar flavor to the dimension reduction technique that we studied a few lectures ago: we have high dimensional data represented as a matrix and we want to approximate it with a much lower dimensional object.

Note that a rank k approximation of M can be expressed using only k singular values and the corresponding k pairs of singular vectors. So, in principle, a rank k approximation needs only O(k(m + n)) numbers of memory. So, at a very high level, one can use low rank approximation to approximately store an m × n matrix with only O(k(m + n)) numbers instead of mn. In the next lecture we will see some of its applications in optimization. Besides these applications, one can use low rank approximation to reveal hidden structures in a given data set. Let us give several examples of this phenomenon.
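
To make the storage point concrete, here is a small numpy sketch (an illustration on synthetic data, not part of the original notes): it builds a matrix that is close to rank k, forms the rank-k approximation M_k = Σ_{i=1}^k σ_i u_i v_i^T from the top k singular triples, and compares the number of stored values with the mn entries of M. That this particular M_k is the best rank-k choice is the content of Theorems 9.4 and 9.6 below.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n, k = 1000, 300, 10                  # hypothetical sizes; M is nearly rank k
    M = (rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
         + 0.01 * rng.standard_normal((m, n)))

    U, S, Vt = np.linalg.svd(M, full_matrices=False)

    # Rank-k approximation: keep only the top k singular values and singular vectors.
    M_k = (U[:, :k] * S[:k]) @ Vt[:k, :]

    storage_full = m * n                     # entries of M
    storage_lowrank = k * (m + n) + k        # k left/right singular vectors + k singular values
    rel_err = np.linalg.norm(M - M_k) / np.linalg.norm(M)   # relative Frobenius error
    print(storage_full, storage_lowrank, round(rel_err, 4))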

9.2.2 Application 1: Consumer-Product Matrix

A consumer-product matrix is a matrix M ∈ R^{n×d} where each row corresponds to a consumer and each column corresponds to a product. Entry (i, j) of the matrix represents the probability that consumer i purchases product j. We can hypothesize that there are k hidden features of consumers, such as age, gender, annual income, etc., and that the decision of each consumer is only a function of these hidden features. With this hypothesis we can write M = AB as a product of two matrices: a feature weight matrix A ∈ R^{n×k} and a purchase probability matrix B ∈ R^{k×d}. In particular, each row of A represents a consumer as a weighted combination of the k underlying features, and each row of B gives the purchase probabilities of a consumer who has exactly one of the features.

Ideally, all entries of the consumer-product matrix are available and we can use a low rank approximation of M to find these k hidden features for a small value of k. In the real world, however, we usually have only some of the entries of the matrix, so our goal is to predict the unknown entries. There are several examples of such a question. In the Netflix challenge, we were given partial ratings of the Netflix users and our goal was to predict the rating of each user for the rest of the movies. In online advertisement, the goal is to find the best ad to show to a given user, given that they have purchased one or two items in the past.

9.2.3 Application 2: Hidden Partitioning

Given a graph, assume that there are two hidden communities such that the probability p of an edge between two nodes of the same community is much larger than the probability q of an edge connecting nodes in different communities. Given a sample of such a graph, our goal is to recover the hidden communities with high probability. If we reorder the vertices such that the first community consists of nodes 1, ..., n/2 and the second one consists of nodes n/2 + 1, ..., n, the expected matrix looks like the following, where each entry denotes an (n/2) × (n/2) block with that constant value:

    M_2 = [ p  q ]
          [ q  p ]

Observe that the above matrix is a rank 2 matrix. So, one may expect that by using a rank 2 approximation of the adjacency matrix of the given graph, one can recover the hidden partition. This is indeed possible assuming p is sufficiently larger than q [McS01]. (A small numerical sketch of this recovery idea appears after the image compression example below.)

9.2.4 Application 3: Image Compression

As each image is represented by a matrix, it is possible to compress an image by approximating it with a lower-rank matrix. To see an example of image compression by lower-rank matrix approximation in MATLAB, please check the course homepage.
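
Returning to the hidden partitioning application above, here is a minimal sketch (not from the original notes, and not the algorithm analyzed in [McS01]) with hypothetical parameters n = 200, p = 0.7, q = 0.1: it samples a graph from the two-community model and recovers the partition from the signs of the second left singular vector of the adjacency matrix, which is exactly where the rank-2 structure of the expected matrix lives.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, q = 200, 0.7, 0.1          # hypothetical community sizes and edge probabilities
    half = n // 2

    # Sample a symmetric adjacency matrix: edges inside a community appear with
    # probability p, edges across communities with probability q.
    probs = np.full((n, n), q)
    probs[:half, :half] = p
    probs[half:, half:] = p
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)

    # The top two singular vectors of the expected matrix span the community indicators,
    # so the sign pattern of the second singular vector of A should separate the communities.
    U, S, Vt = np.linalg.svd(A)
    labels = np.sign(U[:, 1])

    # Check agreement with the planted partition (up to a global sign flip).
    truth = np.concatenate([np.ones(half), -np.ones(half)])
    accuracy = max(np.mean(labels == truth), np.mean(labels == -truth))
    print(f"recovered {accuracy:.0%} of vertices correctly")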

9.2.5 Principal Component Analysis

In this section we study low rank approximation with respect to the operator norm.

Definition 9.2 (Norm of a Matrix). The operator norm of a matrix M is defined as follows:

    ‖M‖_2 = max_{x ≠ 0} ‖Mx‖_2 / ‖x‖_2.

In words, the operator norm shows how large a vector can get once it is multiplied by M. It follows from the Rayleigh quotient that the operator norm of M is equal to its largest singular value.

Claim 9.3. ‖M‖_2 = σ_max(M).

Proof. We can write

    ‖M‖_2^2 = max_{x ≠ 0} ‖Mx‖_2^2 / ‖x‖_2^2 = max_{x ≠ 0} (x^T M^T M x) / (x^T x) = λ_max(M^T M) = σ_max(M)^2,

where the second-to-last equality follows from the Rayleigh quotient. Taking square roots completes the proof.

In the following theorem we show that the best rank k approximation with respect to the operator norm satisfies ‖M − M_k‖_2 = σ_{k+1}. Note that this approximation does not directly imply that the entries of M − M_k are close to zero; it only upper bounds the action of M − M_k on vectors.

Theorem 9.4. Let M ∈ R^{m×n} with singular values σ_1 ≥ σ_2 ≥ ... ≥ σ_n. For any integer k ≥ 1,

    min_{M̃_k : rank(M̃_k) ≤ k} ‖M − M̃_k‖_2 = σ_{k+1}.        (9.3)

Proof. We need to prove two statements: (i) there exists a rank-k matrix M_k such that ‖M − M_k‖_2 = σ_{k+1}; (ii) for any rank-k matrix M̃_k, ‖M − M̃_k‖_2 ≥ σ_{k+1}.

We start with part (i). We let

    M_k = Σ_{i=1}^k σ_i u_i v_i^T.

Then, by the SVD of M,

    ‖M − M_k‖_2 = σ_max( Σ_{i=k+1}^n σ_i u_i v_i^T ) = σ_{k+1},

where the last equality uses Claim 9.3.
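
Before moving on to the matching lower bound in part (ii), here is a quick numerical check of part (i) on a random matrix (a sketch, not from the original notes): the operator norm of M − M_k should equal σ_{k+1}.

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((40, 25))
    U, S, Vt = np.linalg.svd(M, full_matrices=False)

    k = 5
    M_k = (U[:, :k] * S[:k]) @ Vt[:k, :]     # the truncated SVD M_k from part (i)

    # ||M - M_k||_2, the largest singular value of the residual, should be sigma_{k+1}.
    op_norm = np.linalg.norm(M - M_k, ord=2)
    print(np.isclose(op_norm, S[k]))         # expected: True (S[k] is sigma_{k+1}, 0-indexed)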

Now, we prove part (ii). So, assume M̃_k is an arbitrary rank k matrix. For a matrix M, the null space of M,

    NULL(M) = { x : Mx = 0 },

is the set of vectors that M maps to zero. Let null(M) be the dimension of the null space of M. It is a well known fact that for any M ∈ R^{m×n},

    rank(M) + null(M) = n.

This is because any vector which is orthogonal to the right singular vectors of M is in NULL(M); so the above equality follows from the fact that M has rank(M) right singular vectors (with positive singular values). As an application, since M̃_k has rank k, we must have

    null(M̃_k) = n − rank(M̃_k) = n − k.        (9.4)

By the Rayleigh quotient (more precisely, the Courant-Fischer characterization of eigenvalues), we can write

    σ_{k+1}(M)^2 = λ_{k+1}(M^T M)
                 = min_{S : dim(S) = n−k} max_{x ∈ S, x ≠ 0} (x^T M^T M x) / (x^T x)
                 ≤ max_{x ∈ NULL(M̃_k), x ≠ 0} (x^T M^T M x) / (x^T x)
                 = max_{x ∈ NULL(M̃_k), x ≠ 0} (x^T (M − M̃_k)^T (M − M̃_k) x) / (x^T x)
                 ≤ max_{x ≠ 0} (x^T (M − M̃_k)^T (M − M̃_k) x) / (x^T x).

The first inequality uses the fact that NULL(M̃_k) is an (n − k)-dimensional linear space; so a special case of S being an (n − k)-dimensional linear space is S = NULL(M̃_k). The second equality uses that M̃_k x = 0 for any x ∈ NULL(M̃_k). Now, we are done using another application of the Rayleigh quotient:

    max_{x ≠ 0} (x^T (M − M̃_k)^T (M − M̃_k) x) / (x^T x) = λ_max((M − M̃_k)^T (M − M̃_k)) = σ_max(M − M̃_k)^2.

Hence ‖M − M̃_k‖_2 = σ_max(M − M̃_k) ≥ σ_{k+1}. This completes the proof of Theorem 9.4.

9.2.6 Approximation in the Frobenius Norm

In this section we study low rank approximation with respect to a different norm, known as the Frobenius norm.

Definition 9.5 (Frobenius Norm). For any matrix M, the Frobenius norm of M is defined as follows:

    ‖M‖_F = sqrt( Σ_{i,j} M_{i,j}^2 ).

Again, we are looking for the best rank-k approximation, now with respect to the Frobenius norm.

Theorem 9.6. Given a matrix M ∈ R^{m×n} with singular values σ_1 ≥ σ_2 ≥ ... ≥ σ_n, we have

    min_{M̃_k : rank(M̃_k) ≤ k} ‖M − M̃_k‖_F^2 = Σ_{i=k+1}^n σ_i^2.        (9.5)
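
As with the operator norm, the Frobenius-norm statement can be checked numerically. The sketch below (an illustration on a random matrix, not part of the original notes) verifies both the identity ‖M‖_F^2 = Σ_i σ_i^2, which is proved as Claim 9.7 below, and equation (9.5) for the truncated SVD.

    import numpy as np

    rng = np.random.default_rng(4)
    M = rng.standard_normal((30, 20))
    U, S, Vt = np.linalg.svd(M, full_matrices=False)

    # Claim 9.7: ||M||_F^2 equals the sum of the squared singular values.
    print(np.isclose(np.linalg.norm(M, 'fro') ** 2, np.sum(S ** 2)))

    # Equation (9.5): for M_k from the truncated SVD, ||M - M_k||_F^2 = sum_{i>k} sigma_i^2.
    k = 4
    M_k = (U[:, :k] * S[:k]) @ Vt[:k, :]
    print(np.isclose(np.linalg.norm(M - M_k, 'fro') ** 2, np.sum(S[k:] ** 2)))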

Before proving the theorem, let us see how we can relate the Frobenius norm to the singular values of a matrix.

Claim 9.7. For any matrix M ∈ R^{m×n},

    ‖M‖_F^2 = Σ_i σ_i^2.

Proof. First, observe that the (i, i)-th entry of the matrix M^T M is the inner product of the i-th column of M with itself:

    (M^T M)_{i,i} = Σ_j M_{j,i}^2.

Summing up over all i we have

    Σ_i (M^T M)_{i,i} = Σ_{i,j} M_{i,j}^2 = ‖M‖_F^2.

Now, observe that the left-hand side is indeed Tr(M^T M). In the last lecture we proved that the trace of a symmetric matrix is equal to the sum of its eigenvalues. So, the left-hand side is equal to Σ_i λ_i(M^T M) = Σ_i σ_i^2.

To prove Theorem 9.6 we need to prove two statements: (i) there is a rank-k matrix M_k such that ‖M − M_k‖_F^2 = Σ_{i=k+1}^n σ_i^2; (ii) for any rank-k matrix M̃_k, ‖M − M̃_k‖_F^2 ≥ Σ_{i=k+1}^n σ_i^2. Here, we only prove (i). Similar to Theorem 9.4, let M_k = Σ_{i=1}^k σ_i u_i v_i^T. Then,

    ‖M − M_k‖_F^2 = ‖Σ_{i=k+1}^n σ_i u_i v_i^T‖_F^2 = Σ_{i=k+1}^n σ_i^2,

as desired, where the last equality follows from Claim 9.7 applied to Σ_{i=k+1}^n σ_i u_i v_i^T, whose nonzero singular values are σ_{k+1}, ..., σ_n.

References

[McS01] F. McSherry. Spectral Partitioning of Random Graphs. In: FOCS 2001, pp. 529-537.