Lecture 10: Principal Component Analysis
Ha Helen Zhang
Fall 2017
Motivations

Principal component analysis (PCA) is concerned with explaining the variance-covariance structure of $X = (X_1, \ldots, X_p)^\top$ through a few linear combinations of these variables.

Main purposes:
- data (dimension) reduction
- interpretation
- easy to visualize
Variance-Covariance Matrix of a Random Vector

Define the random vector and its mean vector
$$X = (X_1, \ldots, X_p)^\top, \qquad \mu = E(X) = (\mu_1, \ldots, \mu_p)^\top.$$

The variance-covariance matrix of $X$ is
$$\Sigma = \mathrm{Cov}(X) = E\big[(X - \mu)(X - \mu)^\top\big],$$
whose $ij$-th entry is
$$\sigma_{ij} = E\big[(X_i - \mu_i)(X_j - \mu_j)\big] \quad \text{for any } 1 \le i, j \le p.$$

- $\mu$ is the population mean; $\Sigma$ is the population variance-covariance matrix.
- In practice, $\mu$ and $\Sigma$ are unknown and estimated from the data.
Sample Variance-Covariance Matrix

Sample mean:
$$\bar{X} = \frac{1}{n} X^\top 1_n,$$
where $X$ is the design matrix and $1_n$ is the vector of 1's of length $n$.

(Unbiased) sample variance-covariance matrix:
$$S_n = \frac{1}{n-1} X_c^\top X_c = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^\top,$$
where $X_c$ is the centered design matrix and $X_i = (X_{i1}, \ldots, X_{ip})^\top$ for $i = 1, \ldots, n$.

It is easy to show that
$$S_n = \frac{1}{n-1}\, X^\top \Big( I_n - \frac{1}{n} 1_n 1_n^\top \Big) X.$$
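To make the estimator concrete, here is a minimal numpy sketch (not from the original slides; the simulated data and variable names are illustrative). It computes $S_n$ both by explicit centering and by the centering-matrix identity above, and checks both against numpy's built-in unbiased estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))          # design matrix, one row per observation

# Center the columns, then form S_n = X_c' X_c / (n - 1)
Xc = X - X.mean(axis=0)
S_direct = Xc.T @ Xc / (n - 1)

# Equivalent form using the centering matrix I_n - (1/n) 1_n 1_n'
H = np.eye(n) - np.ones((n, n)) / n
S_centering = X.T @ H @ X / (n - 1)

assert np.allclose(S_direct, S_centering)
assert np.allclose(S_direct, np.cov(X, rowvar=False))  # numpy's unbiased estimator
```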
Linear Combinations of Inputs

Consider the linear combinations
$$Z_1 = v_1^\top X = v_{11} X_1 + v_{12} X_2 + \cdots + v_{1p} X_p,$$
$$Z_2 = v_2^\top X = v_{21} X_1 + v_{22} X_2 + \cdots + v_{2p} X_p,$$
$$\vdots$$
$$Z_p = v_p^\top X = v_{p1} X_1 + v_{p2} X_2 + \cdots + v_{pp} X_p.$$

Then
- $\mathrm{Var}(Z_j) = v_j^\top \Sigma v_j$, for $j = 1, \ldots, p$;
- $\mathrm{Cov}(Z_j, Z_k) = v_j^\top \Sigma v_k$, for $j \ne k$.
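As a quick numerical sanity check (an illustrative addition, not part of the slide), one can simulate from a known $\Sigma$ and compare the empirical variance of $Z = v^\top X$ with the population value $v^\top \Sigma v$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
A = rng.normal(size=(p, p))
Sigma = A @ A.T                       # an arbitrary positive-definite covariance

v = rng.normal(size=p)                # coefficients of one linear combination
X = rng.multivariate_normal(np.zeros(p), Sigma, size=200_000)
Z = X @ v                             # Z_i = v' X_i for each observation

# The Monte Carlo variance of Z should approach the population value v' Sigma v
print(np.var(Z, ddof=1), v @ Sigma @ v)
```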
What is PCA?

Principal component analysis (PCA; Pearson, 1901) is a statistical procedure that
- uses an orthogonal transformation to convert a set of observations of correlated variables into a set of linearly uncorrelated variables (called principal components);
- finds directions with maximum variability.

Principal components (PCs):
- PCs are uncorrelated, orthogonal linear combinations $Z_1, \ldots, Z_p$ whose variances are as large as possible.
- PCs form a new coordinate system, obtained by rotating the original system constructed by $X_1, \ldots, X_p$.
[Figure (Hastie, Tibshirani & Friedman, 2001, The Elements of Statistical Learning): Principal components of some input data points. The largest principal component is the direction that maximizes the variance of the projected data, and the smallest principal component minimizes that variance. Ridge regression projects y onto these components, and then shrinks the coefficients of the low-variance components more than the high-variance components.]
Mathematical Formulation

The procedure seeks the directions of highest variance:
- The first PC is the linear combination $Z_1 = v_1^\top X$ that maximizes $\mathrm{Var}(v_1^\top X)$ subject to $\|v_1\| = 1$.
- The second PC is the linear combination $Z_2 = v_2^\top X$ that maximizes $\mathrm{Var}(v_2^\top X)$ subject to $\|v_2\| = 1$ and $\mathrm{Cov}(v_1^\top X, v_2^\top X) = 0$.
- For $j = 2, \ldots, p$, the $j$th PC solves
$$\max_{v_j} \mathrm{Var}(v_j^\top X) \quad \text{subject to} \quad \|v_j\| = 1, \quad \mathrm{Cov}(v_l^\top X, v_j^\top X) = v_l^\top \Sigma v_j = 0 \ \text{ for } l = 1, \ldots, j-1.$$
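The slide does not show why the solutions turn out to be eigenvectors of $\Sigma$; a standard Lagrange-multiplier sketch for the first PC fills that step (anticipating the eigen-decomposition below):

```latex
% Sketch: the first PC direction is an eigenvector of \Sigma.
% Maximize v' \Sigma v subject to v'v = 1 via a Lagrange multiplier \lambda:
\mathcal{L}(v, \lambda) = v^\top \Sigma v - \lambda\,(v^\top v - 1).
% Setting the gradient with respect to v to zero gives
\frac{\partial \mathcal{L}}{\partial v} = 2\Sigma v - 2\lambda v = 0
\;\Longrightarrow\; \Sigma v = \lambda v,
% so v must be an eigenvector of \Sigma, and the attained variance is
\mathrm{Var}(v^\top X) = v^\top \Sigma v = \lambda\, v^\top v = \lambda.
% Hence the maximizer is the eigenvector with the largest eigenvalue \lambda_1.
```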
Interpretation of PCA

- $Z_1 = v_1^\top X$ has the largest sample variance among all normalized linear combinations of the columns of $X$.
- $Z_2 = v_2^\top X$ has the highest variance among all normalized linear combinations of the columns of $X$ with $v_2$ orthogonal to $v_1$.
- ...
- The last PC, $Z_p = v_p^\top X$, has the minimum variance among all normalized linear combinations of the columns of $X$, subject to $v_p$ being orthogonal to all the earlier ones.

If $\Sigma$ is unknown, we use $S_n$ as its estimator.
How to Solve for PCs

There are two ways:
- eigen-decomposition of $\Sigma$
- singular value decomposition (SVD) of $X_c$

Comments:
- Efficient algorithms exist to calculate the SVD of $X$ without computing $X^\top X$.
- Computing the SVD is now the standard way to calculate PCA from a data matrix, as sketched below.
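Here is a minimal Python sketch of the SVD route (an illustrative addition, not the lecture's own code). Writing the centered design matrix as $X_c = U D V^\top$, the columns of $V$ are the PC directions, $Z = X_c V$ are the PC scores, and $d_j^2/(n-1)$ are the component variances:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # correlated columns

# PCA via SVD of the centered design matrix: X_c = U D V'
Xc = X - X.mean(axis=0)
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt.T                      # columns v_1, ..., v_p (PC directions)
scores = Xc @ loadings               # Z = X_c V, the PC scores; equals U * d
variances = d**2 / (n - 1)           # Var(Z_j) = d_j^2 / (n - 1) = lambda_j

assert np.allclose(scores, U * d)
```

This route never forms $X^\top X$ explicitly, which is what the comment on the slide about numerical efficiency refers to.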
Eigen-Decomposition of $\Sigma$

Assume $\Sigma$ has $p$ eigenvalue-eigenvector pairs $(\lambda_j, e_j)$ satisfying
$$\Sigma e_j = \lambda_j e_j, \quad j = 1, \ldots, p,$$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ and $\|e_j\| = 1$ for all $j$.

This gives the following spectral decomposition:
$$\Sigma = \sum_{j=1}^p \lambda_j e_j e_j^\top.$$

- The $j$th PC is given by $Z_j = e_j^\top X$, and its variance is $\mathrm{Var}(Z_j) = e_j^\top \Sigma e_j = \lambda_j$.
- The magnitude of $e_{jk}$ measures the importance of the $k$th variable to the $j$th PC, irrespective of the other variables.
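A companion sketch (again illustrative, with simulated data) computes the same quantities by eigen-decomposing $S_n$ and confirms that the eigenvalues match $d_j^2/(n-1)$ from the SVD route. Only eigenvalues are compared, since each eigenvector is determined only up to a sign flip:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (n - 1)              # sample covariance S_n

# eigh returns eigenvalues in ascending order; flip so lambda_1 >= ... >= lambda_p
lam, E = np.linalg.eigh(S)
lam, E = lam[::-1], E[:, ::-1]       # columns of E are e_1, ..., e_p

# The two routes agree: lambda_j = d_j^2 / (n - 1) from the SVD of X_c
d = np.linalg.svd(Xc, compute_uv=False)
assert np.allclose(lam, d**2 / (n - 1))
```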
Number of PCs

The total (population) variance of the inputs is
$$\sum_{j=1}^p \mathrm{Var}(X_j) = \sum_{j=1}^p \sigma_{jj} = \sum_{j=1}^p \lambda_j = \sum_{j=1}^p \mathrm{Var}(Z_j).$$

Proportion of total variance due to the $j$th PC:
$$\frac{\lambda_j}{\sum_{k=1}^p \lambda_k}.$$

The number of PCs is decided based on
- the amount of total sample variance explained,
- the variances of the sample PCs and their subject-matter interpretations,
- the scree plot: plot the ordered eigenvalues $\lambda_1, \ldots, \lambda_p$ and look for the elbow (bend) in the plot. The chosen number of PCs is the point where the remaining eigenvalues are relatively small and all about the same size. A sketch of both criteria follows.
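Both criteria are easy to compute; the following sketch (illustrative names, simulated data) prints the cumulative proportion of variance explained and draws the scree plot described above:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
n, p = 300, 6
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
Xc = X - X.mean(axis=0)

d = np.linalg.svd(Xc, compute_uv=False)
lam = d**2 / (n - 1)                          # ordered eigenvalues of S_n

explained = lam / lam.sum()                   # proportion of total variance per PC
print(np.cumsum(explained))                   # cumulative proportion explained

# Scree plot: ordered eigenvalues against component index; look for the elbow
plt.plot(np.arange(1, p + 1), lam, "o-")
plt.xlabel("Component index j")
plt.ylabel(r"Eigenvalue $\lambda_j$")
plt.title("Scree plot")
plt.show()
```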
Wide Applications

PCA is very useful in exploratory data analysis:
- provides a simpler and more parsimonious description of the covariance structure
- dimension reduction
- visualization for high-dimensional data

Applications under other names:
- in signal processing, called the discrete Karhunen-Loeve transform (KLT);
- in linear algebra, called the eigenvalue decomposition (EVD) of $X^\top X$ and, following Golub and Van Loan (1983), the singular value decomposition (SVD) of $X$;
- in noise and vibration analysis, called spectral decomposition.
Further Remarks

Remarks:
- PCs are solely determined by the covariance matrix $\Sigma$.
- PCA does not require a multivariate normal distribution.

Concerns:
- PCA is unsupervised learning: it ignores the response.