Lecture 8: Principal Component Analysis. Luigi Freda, ALCOR Lab, DIAG, University of Rome "La Sapienza". December 13, 2016



Outline

1 Eigen Decomposition: Eigenvalues and Eigenvectors; Some Properties
2 Singular Value Decomposition: Singular Values; Connection with Eigenvalues; Singular Value Decomposition
3 Principal Component Analysis: Discovering Latent Factors


Eigenvalues

given a matrix A ∈ C^{n×n}, a nonzero vector v ∈ C^n is said to be its (right) eigenvector if Av = λv for some scalar λ ∈ C; λ is called an eigenvalue of A

the set of all eigenvalues of A is called its spectrum and is denoted by σ(A)

the Matlab command [V,D] = eig(A) produces a diagonal matrix D of eigenvalues and a matrix V such that AV = VD, with D = diag(λ_i)

if A is diagonalizable then V is full rank (i.e. rank(V) = n) and the columns of V are n linearly independent eigenvectors, V = [v_1, ..., v_n]; in this case

    A = V D V^{-1} = \sum_{i=1}^{n} λ_i v_i w_i^T

where w_i^T denotes the i-th row of V^{-1} (so w_i = v_i when V is orthonormal)

if Ã = T A T^{-1}, where T is a nonsingular matrix, then Ã and A are called similar matrices; in particular, A and D above are similar
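As a quick numerical check of the eigendecomposition above, here is a minimal NumPy sketch (the slide quotes the analogous Matlab command eig); the matrix A is an arbitrary diagonalizable example chosen only for illustration.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])                        # arbitrary diagonalizable example

lam, V = np.linalg.eig(A)                         # eigenvalues lam, eigenvectors as columns of V
D = np.diag(lam)

print(np.allclose(A @ V, V @ D))                  # A V = V D
print(np.allclose(A, V @ D @ np.linalg.inv(V)))   # A = V D V^{-1}
```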


Eigenvalues Real Matrix

given a real matrix A ∈ R^{n×n}

all its eigenvalues σ(A) are the roots of the characteristic polynomial equation det(λI − A) = 0

since A is real, if λ ∈ C is an eigenvalue then its conjugate λ̄ ∈ C is also an eigenvalue, i.e. σ(A) = \overline{σ(A)}; indeed

    det(λ̄ I − A) = det( \overline{λ I − A} ) = \overline{det(λ I − A)} = 0

σ(A) = σ(A^T), since det(λI − A) = det((λI − A)^T) = det(λI − A^T)

if Ã = T A T^{-1}, where T is a nonsingular matrix, then σ(Ã) = σ(A), since

    det(λI − Ã) = det(λ T T^{-1} − T A T^{-1}) = det( T (λI − A) T^{-1} ) = det(T) det(λI − A) det(T^{-1}) = det(λI − A)

(using det(T) det(T^{-1}) = det(T T^{-1}) = det(I) = 1)
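A small NumPy sketch of these facts, using an arbitrary real 2×2 example whose eigenvalues are a complex conjugate pair, and an arbitrary nonsingular T for the similarity check:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])                       # real matrix with eigenvalues +i, -i

lam = np.linalg.eigvals(A)                        # conjugate pair
print(lam)

lam_T = np.linalg.eigvals(A.T)                    # sigma(A) = sigma(A^T)
print(np.allclose(np.sort_complex(lam), np.sort_complex(lam_T)))

T = np.array([[2.0, 1.0],
              [0.0, 1.0]])                        # nonsingular, arbitrary
A_sim = T @ A @ np.linalg.inv(T)                  # similar matrix
print(np.allclose(np.sort_complex(np.linalg.eigvals(A_sim)), np.sort_complex(lam)))
```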

Eigenvalues Real Matrix

given a real matrix A ∈ R^{n×n}

eigenvectors of A associated to different eigenvalues are linearly independent; indeed, let A v_i = λ_i v_i with λ_1 ≠ λ_2 and v_1, v_2 ≠ 0, and suppose α_1 v_1 + α_2 v_2 = 0; then

    A(α_1 v_1 + α_2 v_2) = 0  ⟹  α_1 λ_1 v_1 + α_2 λ_2 v_2 = 0

and subtracting λ_1 (α_1 v_1 + α_2 v_2) = 0 gives α_2 (λ_2 − λ_1) v_2 = 0, hence α_2 = 0 and in turn α_1 = 0

if A has n distinct eigenvalues, i.e. σ(A) = {λ_1, λ_2, ..., λ_n} with λ_i ≠ λ_j for i ≠ j, then A is diagonalizable

Eigenvalues Real Matrix

given a real matrix A ∈ R^{n×n}

in general σ(A) = {λ_1, ..., λ_m} with m ≤ n and λ_i ≠ λ_j for i ≠ j, and one has

    det(λI − A) = (λ − λ_1)^{μ_a(λ_1)} (λ − λ_2)^{μ_a(λ_2)} ··· (λ − λ_m)^{μ_a(λ_m)},    \sum_{i=1}^{m} μ_a(λ_i) = n

where μ_a(λ_i) is the algebraic multiplicity of λ_i

in the general case one can always find the Jordan canonical form A = V J V^{-1}, where J = blkdiag(J_1, J_2, ..., J_p) and each Jordan block J_i has a single eigenvalue λ_i on its main diagonal and 1s on the superdiagonal:

    J_i = \begin{bmatrix} λ_i & 1 & & \\ & λ_i & \ddots & \\ & & \ddots & 1 \\ & & & λ_i \end{bmatrix}
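NumPy does not compute Jordan forms, but SymPy does; a minimal sketch, assuming SymPy is available and using an arbitrary defective 2×2 example (a single eigenvalue 2 with only one eigenvector):

```python
from sympy import Matrix

A = Matrix([[3, 1],
            [-1, 1]])          # characteristic polynomial (lambda - 2)^2, defective

P, J = A.jordan_form()         # A = P J P^{-1}
print(J)                       # one 2x2 Jordan block with eigenvalue 2
print(P * J * P.inv())         # reproduces A
```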

Eigenvalues Real Matrix

given a real matrix A ∈ R^{n×n}

let E(λ_i) ≜ ker(A − λ_i I) = {v ∈ R^n : (A − λ_i I)v = 0} be the eigenspace of λ_i

let μ_g(λ_i) ≜ dim(E(λ_i)) be the geometric multiplicity of λ_i (μ_g(λ_i) ≤ μ_a(λ_i))

if E(λ_i) = span{v_{i1}, ..., v_{iμ_i}}, with μ_i ≜ μ_g(λ_i), then A V_i = V_i D_i with V_i = [v_{i1}, ..., v_{iμ_i}] ∈ R^{n×μ_i} and D_i = λ_i I_{μ_i}

if μ_g(λ_i) = μ_a(λ_i) for each i ∈ {1, ..., m} then

    R^n = E(λ_1) ⊕ ... ⊕ E(λ_m),    \sum_{i=1}^{m} μ_g(λ_i) = n

and A is diagonalizable
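Since μ_g(λ_i) = dim ker(A − λ_i I) = n − rank(A − λ_i I), the geometric multiplicity is easy to compute numerically; a small sketch reusing the defective example above (the matrix and eigenvalue are illustrative):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [-1.0, 1.0]])                                   # eigenvalue 2 with mu_a = 2

lam = 2.0
mu_g = A.shape[0] - np.linalg.matrix_rank(A - lam * np.eye(2))
print(mu_g)                                                   # 1 < mu_a = 2, so A is not diagonalizable
```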

Eigenvalues Real Symmetric Matrix

given a real symmetric matrix S ∈ R^{n×n} (i.e. S = S^T)

all the eigenvalues of S are real, i.e. σ(S) ⊂ R

eigenvectors v_1, v_2 of S associated to different eigenvalues are linearly independent and orthogonal; indeed, with S v_i = λ_i v_i and λ_1 ≠ λ_2,

    v_2^T (S v_1 − λ_1 v_1) = 0  ⟹  (S v_2)^T v_1 − λ_1 v_2^T v_1 = 0  ⟹  (λ_2 − λ_1) v_2^T v_1 = 0  ⟹  v_2^T v_1 = 0

S is diagonalizable and one can find an orthonormal basis V, with V^T = V^{-1}, such that

    S = V D V^T = \sum_{i=1}^{n} λ_i v_i v_i^T

if S ≻ 0 (S ⪰ 0) then λ_i > 0 (λ_i ≥ 0) for all i

N.B.: the above results apply for instance to covariance matrices Σ, which are symmetric and positive semidefinite (positive definite when nondegenerate)
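A minimal NumPy sketch of the spectral decomposition applied to a covariance matrix built from arbitrary toy data (the data and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))                 # toy data, arbitrary
Sigma = np.cov(X, rowvar=False)                   # symmetric positive (semi)definite

lam, V = np.linalg.eigh(Sigma)                    # real eigenvalues, orthonormal eigenvectors
print(np.allclose(V.T @ V, np.eye(3)))            # V^T V = I

# spectral decomposition: Sigma = sum_i lam_i v_i v_i^T
Sigma_rec = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(3))
print(np.allclose(Sigma, Sigma_rec))
```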


Singular Value Decomposition

idea: the singular values generalize the notion of eigenvalues to any kind of matrix

a matrix X ∈ R^{N×D} can be decomposed as

    X = U S V^T = \sum_{i=1}^{r} σ_i u_i v_i^T

where
U ∈ R^{N×N} is orthonormal, U^T U = U U^T = I_N
V ∈ R^{D×D} is orthonormal, V^T V = V V^T = I_D
σ_1 ≥ σ_2 ≥ ... ≥ σ_r ≥ 0 are the singular values, with r = min(N, D)
S ∈ R^{N×D} contains the singular values on its main diagonal and 0s elsewhere:

    S = \begin{bmatrix} diag(σ_1, ..., σ_D) \\ 0_{(N−D)×D} \end{bmatrix} if N > D,    S = \begin{bmatrix} diag(σ_1, ..., σ_N) & 0_{N×(D−N)} \end{bmatrix} if N ≤ D
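A minimal NumPy sketch of the full SVD for an arbitrary N > D example, checking the factorization and the orthonormality of U and V:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))                   # N = 5, D = 3, arbitrary example

U, s, Vt = np.linalg.svd(X)                       # full SVD: U is 5x5, Vt is 3x3, s holds r = 3 values
S = np.zeros((5, 3))
S[:3, :3] = np.diag(s)                            # embed the singular values into the N x D matrix S

print(np.allclose(X, U @ S @ Vt))                 # X = U S V^T
print(np.allclose(U.T @ U, np.eye(5)))
print(np.allclose(Vt @ Vt.T, np.eye(3)))
```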


Singular Value Decomposition Connection with Eigenvalues

given a matrix X ∈ R^{N×D}, one has X = U S V^T = \sum_{i=1}^{r} σ_i u_i v_i^T, where r = min(N, D)

U ∈ R^{N×N} is orthonormal, U^T U = U U^T = I_N
V ∈ R^{D×D} is orthonormal, V^T V = V V^T = I_D
S ∈ R^{N×D} contains the singular values σ_i on the main diagonal and 0s elsewhere

we have X^T X = (V S^T U^T)(U S V^T) = V S^T S V^T = V D V^T, where D ≜ S^T S is a diagonal matrix containing the squared singular values σ_i^2 (and possibly some additional zeros) on the main diagonal

since (X^T X) V = V D, V contains the eigenvectors of X^T X, and the eigenvalues of X^T X are the squared singular values σ_i^2 contained in D

the columns of V are called the right singular vectors of X

Singular Value Decomposition Connection with Eigenvalues

given a matrix X ∈ R^{N×D}, one has X = U S V^T = \sum_{i=1}^{r} σ_i u_i v_i^T, where r = min(N, D)

U ∈ R^{N×N} is orthonormal, U^T U = U U^T = I_N
V ∈ R^{D×D} is orthonormal, V^T V = V V^T = I_D
S ∈ R^{N×D} contains the singular values σ_i on the main diagonal and 0s elsewhere

we have X X^T = (U S V^T)(V S^T U^T) = U S S^T U^T = U D̂ U^T, where D̂ ≜ S S^T is a diagonal matrix containing the squared singular values σ_i^2 (and possibly some additional zeros) on the main diagonal

since (X X^T) U = U D̂, U contains the eigenvectors of X X^T, and the eigenvalues of X X^T are the squared singular values σ_i^2 contained in D̂

the columns of U are called the left singular vectors of X
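A short NumPy sketch verifying both of the previous connections on an arbitrary example: the right singular vectors diagonalize X^T X and the left singular vectors diagonalize X X^T, with eigenvalues σ_i^2 (plus extra zeros on the larger side).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))                        # arbitrary example, N = 5, D = 3

U, s, Vt = np.linalg.svd(X)
V = Vt.T

# right singular vectors: (X^T X) V = V D with D = diag(sigma_i^2)
print(np.allclose((X.T @ X) @ V, V @ np.diag(s**2)))

# left singular vectors: (X X^T) U = U D_hat, with two additional zero eigenvalues here
d_hat = np.zeros(5)
d_hat[:3] = s**2
print(np.allclose((X @ X.T) @ U, U @ np.diag(d_hat)))
```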


Singular Value Decomposition

given a matrix X ∈ R^{N×D} with N > D, it can be decomposed as

    X = U S V^T = \sum_{i=1}^{D} σ_i u_i v_i^T,    S = \begin{bmatrix} diag(σ_1, ..., σ_D) \\ 0_{(N−D)×D} \end{bmatrix}

the last N − D columns of U are irrelevant, since they are multiplied by the zero rows of S

Singular Value Decomposition

given a matrix X ∈ R^{N×D} with N > D, it can be decomposed as X = U S V^T = \sum_{i=1}^{D} σ_i u_i v_i^T

since the last N − D columns of U are irrelevant, we can neglect them and consider the economy-sized SVD

    X = Û Ŝ V^T = \sum_{i=1}^{D} σ_i u_i v_i^T

where Û ∈ R^{N×D}, Ŝ ∈ R^{D×D}, Û^T Û = I_D, V^T V = V V^T = I_D and Ŝ = diag(σ_1, ..., σ_D)
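In NumPy the economy-sized SVD corresponds to full_matrices=False; a minimal sketch with an arbitrary N > D example:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 3))                           # N > D, arbitrary example

U_hat, s, Vt = np.linalg.svd(X, full_matrices=False)      # economy-sized SVD
print(U_hat.shape, s.shape, Vt.shape)                     # (6, 3) (3,) (3, 3)
print(np.allclose(X, U_hat @ np.diag(s) @ Vt))            # X = U_hat S_hat V^T
print(np.allclose(U_hat.T @ U_hat, np.eye(3)))            # orthonormal columns
```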

Singular Value Decomposition

given a matrix X ∈ R^{N×D} with N > D, it can be decomposed as X = U S V^T = \sum_{i=1}^{D} σ_i u_i v_i^T

a rank-L approximation of X considers only the first L < D columns of U and V and the first L singular values; we can then consider the truncated SVD

    X_L = U_L S_L V_L^T = \sum_{i=1}^{L} σ_i u_i v_i^T

where U_L ∈ R^{N×L}, V_L ∈ R^{D×L}, U_L^T U_L = I_L, V_L^T V_L = I_L and S_L = diag(σ_1, ..., σ_L)

Singular Value Decomposition

given a matrix X ∈ R^{N×D} with N > D, it is possible to show that X_L = U_L S_L V_L^T is the best rank-L approximation of X, i.e. it solves

    X_L = argmin_{X̂} ||X − X̂||_F    subject to rank(X̂) = L

where ||A||_F denotes the Frobenius norm

    ||A||_F = ( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 )^{1/2} = ( trace(A^T A) )^{1/2}

by replacing X̂ = X_L one obtains

    ||X − X_L||_F = || U ( S − \begin{bmatrix} S_L & 0 \\ 0 & 0 \end{bmatrix} ) V^T ||_F = ( \sum_{i=L+1}^{r} σ_i^2 )^{1/2}
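A short NumPy sketch (on an arbitrary random matrix) that builds the truncated SVD and checks the Frobenius-norm error formula above:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((8, 5))                         # arbitrary example, N > D
L = 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_L = U[:, :L] @ np.diag(s[:L]) @ Vt[:L, :]             # truncated (rank-L) SVD

err = np.linalg.norm(X - X_L, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[L:]**2))))       # ||X - X_L||_F = sqrt(sum_{i>L} sigma_i^2)
print(np.linalg.matrix_rank(X_L))                       # L
```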


Discovering Latent Factors An example

dimensionality reduction: it is often useful to reduce the dimensionality by projecting the data onto a lower dimensional subspace which captures the essence of the data

for example, in the figure shown on the slide: the 2D approximation is quite good, since most points lie close to this subspace, while projecting the points onto the red line (a 1D approximation) gives a rather poor approximation

Discovering Latent Factors Motivations

although data may appear high dimensional, there may only be a small number of degrees of variability, corresponding to latent factors

low dimensional representations, when used as input to statistical models, often result in better predictive accuracy (they focus on the essence of the data)

low dimensional representations are useful for enabling fast nearest neighbor searches

two dimensional projections are very useful for visualizing high dimensional data

Principal Component Analysis PCA

the most common approach to dimensionality reduction is called Principal Components Analysis or PCA

given x_i ∈ R^D, we compute an approximation x̂_i = W z_i with
z_i ∈ R^L, where L < D (dimensionality reduction)
W ∈ R^{D×L} with W^T W = I (orthonormal columns)
so as to minimize the reconstruction error

    J(W, Z) = (1/N) \sum_{i=1}^{N} ||x_i − x̂_i||^2 = (1/N) \sum_{i=1}^{N} ||x_i − W z_i||^2

since W^T W = I one has ||W z||_2 = (z^T W^T W z)^{1/2} = (z^T z)^{1/2} = ||z||_2; in 3D or 2D, W can represent part of a rotation matrix

this can be thought of as an unsupervised version of (multi-output) linear regression, where we observe the high-dimensional response y = x, but not the low-dimensional cause z

Principal Component Analysis PCA

the objective function (reconstruction error)

    J(W, Z) = (1/N) \sum_{i=1}^{N} ||x_i − x̂_i||^2 = (1/N) \sum_{i=1}^{N} ||x_i − W z_i||^2

is equivalent (up to the constant factor 1/N, which does not affect the minimizer) to

    J(W, Z) = ||X^T − W Z^T||_F^2

where ||A||_F denotes the Frobenius norm

    ||A||_F = ( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 )^{1/2} = ( trace(A^T A) )^{1/2}

Principal Component Analysis PCA

we have to minimize the objective function J(W, Z) = (1/N) \sum_{i=1}^{N} ||x_i − W z_i||^2, or equivalently J(W, Z) = ||X^T − W Z^T||_F^2, subject to W ∈ R^{D×L} and W^T W = I

it is possible to prove that the optimal solution is obtained by setting

    Ŵ = V_L,    Ẑ = X Ŵ    (i.e. z_i = Ŵ^T x_i = V_L^T x_i)

where X_L = U_L S_L V_L^T is the rank-L truncated SVD of X

we have X^T X = V S^T S V^T, hence (X^T X) V_L = V_L S_L^2, since V_L consists of the first L columns of V

therefore V_L contains the L eigenvectors (the principal directions) with the largest eigenvalues of the empirical covariance matrix Σ = (1/N) X^T X = (1/N) \sum_{i=1}^{N} x_i x_i^T (assuming the data have been centered)

z_i = V_L^T x_i is the orthogonal projection of x_i onto the eigenvectors in V_L
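A minimal NumPy sketch of this recipe on arbitrary toy data (centering is assumed, and the seed and dimensions are illustrative): center X, take the SVD, set W = V_L, project, and compare V_L with the top eigenvectors of the empirical covariance.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))   # arbitrary correlated data
L = 2

Xc = X - X.mean(axis=0)                          # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

W = Vt[:L, :].T                                  # W_hat = V_L (principal directions)
Z = Xc @ W                                       # rows are z_i = V_L^T x_i (projected data)
X_rec = Z @ W.T + X.mean(axis=0)                 # rank-L reconstruction of the data

# V_L holds the top-L eigenvectors of the empirical covariance (1/N) Xc^T Xc (up to sign)
evals, evecs = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
top = evecs[:, ::-1][:, :L]                      # eigenvectors sorted by decreasing eigenvalue
print(np.allclose(np.abs(top.T @ W), np.eye(L), atol=1e-6))
```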

Principal Component Analysis PCA

by replacing Z = XW in the objective function and recalling that W^T W = I:

    J(W) = ||X^T − W Z^T||_F^2 = trace( (X^T − W Z^T)^T (X^T − W Z^T) )
         = trace(X X^T) − 2 trace(X W Z^T) + trace(Z Z^T)
         = trace(X^T X) − 2 trace(W^T X^T X W) + trace(Z^T Z)
         = trace(X^T X) − trace(Z^T Z) = trace(N Σ_X) − trace(N Σ_Z)

where we used the facts trace(AB) = trace(BA) and trace(A^T) = trace(A)

hence minimizing the reconstruction error is equivalent to maximizing the variance of the projected data z_i = W^T x_i
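A short NumPy sketch (on arbitrary centered toy data) checking the identity J = trace(X^T X) − trace(Z^T Z), and illustrating that the PCA projection captures at least as much variance as a random orthonormal projection:

```python
import numpy as np

rng = np.random.default_rng(6)
Xc = rng.standard_normal((100, 5))
Xc -= Xc.mean(axis=0)                                        # centered toy data
L = 2

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:L, :].T                                              # PCA projection
Z = Xc @ W

J = np.linalg.norm(Xc.T - W @ Z.T, 'fro')**2                 # reconstruction error
print(np.isclose(J, np.trace(Xc.T @ Xc) - np.trace(Z.T @ Z)))

W_rand, _ = np.linalg.qr(rng.standard_normal((5, L)))        # random orthonormal projection
print(np.trace((Xc @ W_rand).T @ (Xc @ W_rand)) <= np.trace(Z.T @ Z))
```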

Principal Component Analysis PCA

the principal directions are the ones along which the data shows maximal variance

this means that PCA can be misled by directions in which the variance is high merely because of the measurement scale

in the figure on this slide, the vertical axis (weight) uses a larger range than the horizontal axis (height), resulting in a line (first principal direction) that looks somewhat unnatural

it is therefore standard practice to standardize the data first:

    x_ij ← (x_ij − μ_j) / σ_j,    where μ_j = (1/N) \sum_{i=1}^{N} x_ij and σ_j^2 = (1/(N−1)) \sum_{i=1}^{N} (x_ij − μ_j)^2
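A minimal sketch of the scale effect, with made-up height/weight data whose units are deliberately mismatched (the numbers are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
height = rng.normal(170, 10, size=100)                  # centimetres
weight = rng.normal(70000, 9000, size=100)              # grams: much larger numeric range
X = np.column_stack([height, weight])

X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardize each column (N-1 in the variance)

_, _, Vt_raw = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
_, _, Vt_std = np.linalg.svd(X_std, full_matrices=False)
print(Vt_raw[0])    # first principal direction dominated by the large-scale (weight) axis
print(Vt_std[0])    # more balanced after standardization
```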

Principal Component Analysis PCA

left: the mean and the first three PC basis vectors (eigendigits) based on 25 images of the digit 3 (from the MNIST dataset)

right: reconstruction of an image based on 2, 10, 100 and all the basis vectors

Principal Component Analysis PCA

low rank approximations to an image

top left: the original image is of size 200 × 320, so it has rank 200; the subsequent images have ranks 2, 5, and 20

Credits

Kevin Murphy's book, Machine Learning: A Probabilistic Perspective