Lecture 9: Low Rank Approximation

CSE 521: Design and Analysis of Algorithms I, Fall 2018
Lecture 9: Low Rank Approximation
Lecturer: Shayan Oveis Gharan        February 8th        Scribe: Jun Qi

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

In this lecture we discuss the theory and applications of the singular value decomposition (SVD).

1.1 A Geometric Intuition of a Matrix/Operator

Let $M = \sum_{i=1}^{n} \lambda_i v_i v_i^T$ be a symmetric matrix (assume $M$ is invertible, so that $M^{-2}$ is well defined). We can represent $M$ geometrically by the ellipsoid
\[
\{ x \in \mathbb{R}^n : x^T M^{-2} x = 1 \}.
\]
The axes of this ellipsoid correspond to the eigenvectors of $M$, and the length of the $i$-th axis equals the $i$-th largest eigenvalue in absolute value. To see this, let $x = \lambda_i v_i$; then
\[
x^T M^{-2} x = (\lambda_i v_i)^T M^{-2} (\lambda_i v_i) = (\lambda_i v_i)^T \frac{1}{\lambda_i} v_i = 1,
\]
where we used that $v_i$ is an eigenvector of $M^{-2}$ with corresponding eigenvalue $1/\lambda_i^2$. So along the direction $v_i$ the farthest point of the ellipsoid is exactly $|\lambda_i|$ away from the origin. In particular, the farthest point of the ellipsoid is $\max_i |\lambda_i|$ away from the origin.

We can also understand this ellipsoid differently: it is the image of the unit sphere around the origin under the operator $M$. Recall that the unit sphere is the set of all $x \in \mathbb{R}^n$ with $\|x\| = 1$, and the image of such an $x$ under $M$ is $Mx$. We claim that for any $x$ with $\|x\| = 1$, the point $Mx$ lies on the ellipsoid. It is enough to see that
\[
(Mx)^T M^{-2} (Mx) = x^T M M^{-2} M x = x^T x = \|x\|^2 = 1.
\]
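As a quick numerical sanity check of this picture, the following short numpy sketch (hypothetical illustrative code, not part of the original notes) draws a random symmetric $M$ and random unit vectors $x$, and verifies that $Mx$ satisfies the ellipsoid equation $y^T M^{-2} y = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric (and almost surely invertible) matrix M.
n = 5
A = rng.standard_normal((n, n))
M = (A + A.T) / 2

M_inv2 = np.linalg.inv(M @ M)    # M^{-2}

# Push random unit vectors through M and evaluate the ellipsoid equation.
for _ in range(3):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    y = M @ x
    print(y @ M_inv2 @ y)        # prints 1.0 up to rounding error
```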

1.2 Best Low Rank Approximation

Theorem 1.1. For any matrix $M \in \mathbb{R}^{m \times n}$,
\[
\inf_{\hat M : \operatorname{rank}(\hat M) = k} \|M - \hat M\|_2 = \sigma_{k+1}, \tag{1.1}
\]
where the infimum is over all rank $k$ matrices $\hat M$.

We did not discuss the proof of this theorem in class. We are including it here for the sake of completeness.

Proof. To prove (1.1), we need to prove two statements: (i) there exists a rank $k$ matrix $\hat M_k$ such that $\|M - \hat M_k\|_2 = \sigma_{k+1}$; (ii) for any rank $k$ matrix $\hat M_k$, $\|M - \hat M_k\|_2 \geq \sigma_{k+1}$.

We start with part (i). We let
\[
\hat M_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T. \tag{1.2}
\]
By the SVD of $M$ and the definition of $\hat M_k$,
\[
M - \hat M_k = \sum_{i=k+1}^{n} \sigma_i u_i v_i^T,
\qquad \text{so} \qquad
\|M - \hat M_k\|_2 = \sigma_{\max}\Big( \sum_{i=k+1}^{n} \sigma_i u_i v_i^T \Big) = \sigma_{k+1}. \tag{1.3}
\]

Now, we prove part (ii). Let $\hat M_k$ be an arbitrary rank $k$ matrix. The null space of a matrix $M$,
\[
\mathrm{NULL}(M) = \{ x : Mx = 0 \}, \tag{1.4}
\]
is the set of vectors that $M$ maps to zero. Let $\mathrm{null}(M)$ denote the dimension of the linear space $\mathrm{NULL}(M)$. It is a well-known fact that for any $M \in \mathbb{R}^{m \times n}$,
\[
\operatorname{rank}(M) + \mathrm{null}(M) = n. \tag{1.5}
\]
It is easy to see this from the SVD: $\mathrm{null}(M)$ is just $n$ minus the number of nonzero singular values of $M$. Put differently, any vector orthogonal to the right singular vectors of $M$ (those with positive singular values) lies in $\mathrm{NULL}(M)$, so the above equality follows from the fact that $M$ has $\operatorname{rank}(M)$ such right singular vectors. As an application, since $\hat M_k$ has rank $k$, we have
\[
\mathrm{null}(\hat M_k) = n - \operatorname{rank}(\hat M_k) = n - k. \tag{1.6}
\]
By the Rayleigh quotient characterization of eigenvalues, we obtain
\[
\sigma_{k+1}(M)^2 = \lambda_{k+1}(M^T M)
  = \min_{S:\, \dim(S) = n-k} \ \max_{x \in S} \frac{x^T M^T M x}{x^T x}
  \leq \max_{x \in \mathrm{NULL}(\hat M_k)} \frac{x^T M^T M x}{x^T x}
  = \max_{x \in \mathrm{NULL}(\hat M_k)} \frac{x^T (M - \hat M_k)^T (M - \hat M_k) x}{x^T x}
  \leq \max_{x} \frac{x^T (M - \hat M_k)^T (M - \hat M_k) x}{x^T x}. \tag{1.7}
\]
The first inequality uses the fact that $\mathrm{NULL}(\hat M_k)$ is an $(n-k)$-dimensional linear space, so $S = \mathrm{NULL}(\hat M_k)$ is one particular choice of an $(n-k)$-dimensional subspace $S$. The second equality uses that $\hat M_k x = 0$ for any $x \in \mathrm{NULL}(\hat M_k)$. Now, we are done by another application of the Rayleigh quotient:
\[
\max_{x} \frac{x^T (M - \hat M_k)^T (M - \hat M_k) x}{x^T x}
  = \lambda_{\max}\big( (M - \hat M_k)^T (M - \hat M_k) \big)
  = \sigma_{\max}(M - \hat M_k)^2. \tag{1.8}
\]
This completes the proof of Theorem 1.1.

Theorem 1.2. For any matrix $M \in \mathbb{R}^{m \times n}$ with singular values $\sigma_1 \geq \sigma_2 \geq \dots$,
\[
\inf_{\hat M_k :\, \operatorname{rank}(\hat M_k) = k} \|M - \hat M_k\|_F^2 = \sum_{i > k} \sigma_i^2. \tag{1.9}
\]
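Before proving Theorem 1.2, here is a small numerical illustration (a hypothetical numpy sketch, not from the original notes) of both error formulas: truncating the SVD after $k$ terms gives spectral error $\sigma_{k+1}$ (Theorem 1.1) and squared Frobenius error $\sum_{i>k} \sigma_i^2$ (Theorem 1.2).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 8, 6, 2

M = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(M, full_matrices=False)   # s is sorted in decreasing order

# Best rank-k approximation: keep only the top k singular triplets.
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Theorem 1.1: spectral-norm error equals sigma_{k+1}.
print(np.linalg.norm(M - M_k, 2), s[k])

# Theorem 1.2: squared Frobenius error equals the sum of the remaining sigma_i^2.
print(np.linalg.norm(M - M_k, 'fro')**2, np.sum(s[k:]**2))
```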

Proof of Theorem 1.2. Since $\hat M_k$ has rank $k$, we can assume that the columns of $\hat M_k$ lie in $\operatorname{span}\{w_1, w_2, \dots, w_k\}$, where $\{w_1, \dots, w_k\}$ is a set of orthonormal vectors spanning the column space of $\hat M_k$. First, observe that
\[
\|M - \hat M\|_F^2 = \sum_i \|M_i - \hat M_i\|^2,
\]
where $M_i$ and $\hat M_i$ denote the $i$-th columns. Now, let us answer the following question: which vector $v \in \operatorname{span}\{w_1, \dots, w_k\}$ minimizes $\|M_i - v\|^2$? It is not hard to see that the optimum is the projection of $M_i$ onto the span of the $w_j$'s, i.e., we must have
\[
\hat M_i = \sum_{j=1}^{k} \langle M_i, w_j \rangle w_j. \tag{1.10}
\]
To write this equation in matrix form we use the notion of a projection matrix. Given a set of orthonormal vectors $w_1, \dots, w_k$, the corresponding projection matrix is $\Pi_k = \sum_{i=1}^{k} w_i w_i^T$. A projection matrix maps any vector in $\mathbb{R}^n$ to the linear span of its defining vectors; in the above case we have
\[
\Pi_k M_i = \sum_{j=1}^{k} \langle w_j, M_i \rangle w_j.
\]
Projection matrices have many nice properties that come in handy when we need to use them. For example, all their singular values are either 0 or 1; for any projection matrix $\Pi$ we have $\Pi^2 = \Pi$; and there is a unique projection matrix of rank $n$, namely the identity matrix.

Having this in hand, we can write $\hat M = \Pi_k M$. Therefore,
\[
\|M - \hat M\|_F^2 = \|M - \Pi_k M\|_F^2 = \|(I - \Pi_k) M\|_F^2 = \|\Pi_k^{\perp} M\|_F^2,
\]
where $\Pi_k^{\perp} = I - \Pi_k$ is the projection matrix onto the space orthogonal to $\{w_1, \dots, w_k\}$. More precisely, let us add $w_{k+1}, \dots, w_n$ such that $w_1, \dots, w_n$ form an orthonormal basis of $\mathbb{R}^n$. Then $\Pi_k^{\perp} = \sum_{i=k+1}^{n} w_i w_i^T$.

Now, we need to show that in order to minimize $\|\Pi_k^{\perp} M\|_F^2$, the vectors $w_1, \dots, w_k$ must lie entirely in the span of the top singular vectors of $M$. Firstly, using $\|A\|_F^2 = \operatorname{Tr}(A^T A)$,
\[
\|\Pi_k^{\perp} M\|_F^2 = \operatorname{Tr}(M^T (\Pi_k^{\perp})^T \Pi_k^{\perp} M) = \operatorname{Tr}(M^T \Pi_k^{\perp} M) = \operatorname{Tr}(M M^T \Pi_k^{\perp}),
\]
where the last identity follows from the invariance of the trace under cyclic permutations. Now, we can rewrite the above as follows:
\[
\operatorname{Tr}(M M^T \Pi_k^{\perp})
  = \operatorname{Tr}\Big( \sum_{i} \sigma_i^2 u_i u_i^T \sum_{j=k+1}^{n} w_j w_j^T \Big)
  = \sum_{i} \sum_{j=k+1}^{n} \operatorname{Tr}\big( \sigma_i^2 u_i u_i^T w_j w_j^T \big)
  = \sum_{i} \sum_{j=k+1}^{n} \sigma_i^2 \langle u_i, w_j \rangle^2
  \geq \sum_{i=k+1}^{n} \sigma_i^2. \tag{1.11}
\]
The second identity follows from the linearity of the trace and the third from the invariance of the trace under cyclic permutations. To see the last inequality, observe that for any $k+1 \leq j \leq n$ we have $\sum_{i=1}^{n} \langle u_i, w_j \rangle^2 = 1$. So $\sum_{i,j} \sigma_i^2 \langle u_i, w_j \rangle^2$ is minimized when all of the $w_j$'s with $j > k$ lie in the linear space $\operatorname{span}\{u_{k+1}, \dots, u_n\}$, as desired.

The above theorems may not be directly useful in practice because computing the SVD exactly is costly. However, there are numerous algorithms that efficiently (in almost linear time) approximate the SVD of a given matrix. The following result, from a famous paper that received the best paper award at STOC 2013, resolves this question.

Theorem 1.3 (Clarkson-Woodruff 2013 [1]). There is an algorithm that, for any matrix $M \in \mathbb{R}^{n \times n}$ and an input parameter $k$, finds a rank $k$ matrix $\hat M$ such that
\[
\|M - \hat M\|_F^2 \leq (1 + \epsilon) \inf_{N :\, \operatorname{rank}(N) = k} \|M - N\|_F^2.
\]
The algorithm runs in time $O\big(\mathrm{nnz}(M) \cdot (\tfrac{k}{\epsilon} + k \log k) + n \cdot \mathrm{poly}(\tfrac{k}{\epsilon})\big)$, where $\mathrm{nnz}(M)$ denotes the number of nonzero entries of $M$, i.e., the input length of $M$.
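The Clarkson-Woodruff algorithm itself is beyond the scope of these notes, but the following toy numpy sketch illustrates the general "sketch-and-solve" idea behind fast approximate low rank approximation. It uses a plain Gaussian random projection (a randomized range finder), which is not the Clarkson-Woodruff sketch; the oversampling amount and the test matrix are arbitrary illustrative choices.

```python
import numpy as np

def randomized_low_rank(M, k, oversample=5, rng=None):
    """Toy sketch of randomized low rank approximation (not Clarkson-Woodruff).

    Compresses the column space of M with a random Gaussian sketch, then
    projects M onto the resulting low-dimensional subspace.
    """
    rng = np.random.default_rng(rng)
    n = M.shape[1]
    S = rng.standard_normal((n, k + oversample))   # random sketching matrix
    Q, _ = np.linalg.qr(M @ S)                     # orthonormal basis of a subspace
                                                   # close to the top-k column space
    B = Q.T @ M                                    # small (k + oversample) x n problem
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U[:, :k]) * s[:k] @ Vt[:k, :]      # rank-k approximation of M

# Compare against the optimal rank-k error on a random test matrix.
rng = np.random.default_rng(2)
M = rng.standard_normal((200, 100)) @ rng.standard_normal((100, 100))
k = 10
M_hat = randomized_low_rank(M, k, rng=3)
s = np.linalg.svd(M, compute_uv=False)
print(np.linalg.norm(M - M_hat, 'fro')**2, np.sum(s[k:]**2))
```

The printed approximation error should be close to (and never below) the optimal value $\sum_{i>k} \sigma_i^2$ from Theorem 1.2, while only a small number of passes over $M$ are needed.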

1.3 Applications

We conclude this lecture by giving several other applications of low rank approximation.

Hidden Partition. Given a graph, assume that there are two hidden communities $A, B$, each of size $n/2$, such that for any pair of individuals $i, j \in A$ there is an edge between them with probability $p$; similarly, for any pair $i, j \in B$ there is an edge with probability $p$. For any pair of individuals on different sides, the probability of an edge is $q$. Given a sample of such a graph, and assuming $p > q$, we want to approximately recover $A$ and $B$. If we reorder the vertices so that the first community consists of vertices $1, \dots, n/2$ and the second of $n/2 + 1, \dots, n$, the expected adjacency matrix is the block matrix
\[
\begin{bmatrix} p \, J & q \, J \\ q \, J & p \, J \end{bmatrix},
\]
where $J$ is the $(n/2) \times (n/2)$ all-ones matrix. Observe that this matrix has rank 2. So one may expect that by applying a rank 2 approximation to the adjacency matrix of the given graph, one can recover the hidden partition. This is indeed possible assuming $p$ is sufficiently larger than $q$ [2].
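To make this concrete, here is a small simulation (a hypothetical numpy sketch; the values of $n$, $p$, $q$ and the recovery rule are illustrative choices, not from the notes). It samples a graph from the planted two-community model and reads the partition off the rank 2 spectral structure, here via the sign pattern of the eigenvector of the second largest eigenvalue of the adjacency matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 400, 0.6, 0.1

# Plant the partition: first n/2 vertices in community 0, last n/2 in community 1.
labels = np.repeat([0, 1], n // 2)
probs = np.where(labels[:, None] == labels[None, :], p, q)

# Sample an undirected graph with these edge probabilities (no self-loops).
upper = np.triu(rng.random((n, n)) < probs, k=1)
adj = (upper | upper.T).astype(float)

# Rank-2 structure: the sign of the second eigenvector separates the communities.
eigvals, eigvecs = np.linalg.eigh(adj)   # eigenvalues in increasing order
second = eigvecs[:, -2]                  # eigenvector of the 2nd largest eigenvalue
guess = (second > 0).astype(int)

agreement = max(np.mean(guess == labels), np.mean(guess != labels))
print(f"fraction of correctly classified vertices: {agreement:.3f}")
```

With $p$ well separated from $q$, the printed agreement should be close to 1.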

Max-Cut. This is our most formal application of low rank approximation: we use it to design an approximation algorithm for the maximum cut problem. Given a graph $G = (V, E)$, we want to find a set $S \subseteq V$ which maximizes $|E(S, \overline{S})|$, the number of edges crossing the cut. Although the min-cut problem can be solved optimally in polynomial time, the max-cut problem is NP-hard. The best known approximation algorithm for max-cut, a seminal work of Goemans and Williamson, has an approximation factor of 0.878: they showed that there is a polynomial time algorithm which always returns a cut $(T, \overline{T})$ such that
\[
|E(T, \overline{T})| \geq 0.878 \cdot \max_{S} |E(S, \overline{S})|. \tag{1.12}
\]

First, we formulate the max-cut problem algebraically. For a set $S \subseteq V$, let
\[
(\mathbf{1}_S)_i = \begin{cases} 1 & \text{if } i \in S, \\ 0 & \text{otherwise} \end{cases} \tag{1.13}
\]
be the indicator vector of the set $S$. We claim that for any set $S \subseteq V$,
\[
|E(S, \overline{S})| = \mathbf{1}_S^T A \, \mathbf{1}_{\overline{S}}, \tag{1.14}
\]
where $A$ is the adjacency matrix of $G$. In fact, for any two vectors $x$ and $y$,
\[
x^T A y = \sum_{i,j} x_i A_{i,j} y_j. \tag{1.15}
\]
So we can rewrite the max-cut problem as the following algebraic problem:
\[
\max_{S} |E(S, \overline{S})| = \max_{x \in \{0,1\}^n} x^T A (\mathbf{1} - x). \tag{1.16}
\]

The rest of the argument is fairly general and uses no combinatorial structure of the underlying problem: we give an algorithm to approximately solve the quadratic optimization problem $\max_{x \in \{0,1\}^n} x^T A (\mathbf{1} - x)$ for a given matrix $A$. We do this in two steps. First, we approximate $A$ by a low rank matrix and show that the optimum of the optimization problem for the low rank matrix is close to the optimum for $A$. Then, we design an algorithm to approximately solve the above quadratic problem for a low rank matrix. Roughly speaking, using the low rank property we reduce our $n$-dimensional problem to a $k$-dimensional one, and then we solve the $k$-dimensional problem using an $\epsilon$-net.

1.4 References

[1] Clarkson, Kenneth L., and David P. Woodruff. Low rank approximation and regression in input sparsity time. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing (STOC), pp. 81-90. ACM, 2013.

[2] McSherry, Frank. Spectral partitioning of random graphs. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS), pp. 529-537. IEEE, 2001.