Principal Component Analysis
Jing Gao, SUNY Buffalo
Why Dimensionality Reduction?
We have too many dimensions: to reason about or obtain insights from, and to visualize. There is also too much noise in the data.
Need to reduce them to a smaller set of factors.
Better representation of data without losing much information.
Can build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition.
Component Analysis
Discover a new set of factors/dimensions/axes against which to represent, describe or evaluate the data.
Factors are combinations of observed variables.
May be more effective bases for insights.
Observed data are described in terms of these factors rather than in terms of original variables/dimensions.
Basic Concept
Areas of variance in data are where items can be best discriminated and key underlying phenomena observed.
These are the areas of greatest signal in the data.
If two items or dimensions are highly correlated or dependent, they are likely to represent highly related phenomena.
If they tell us about the same underlying variance in the data, combining them to form a single measure is reasonable.
Basic Concept
So we want to combine related variables, and focus on uncorrelated or independent ones, especially those along which the observations have high variance.
We want a smaller set of variables that explain most of the variance in the original data, in a more compact and insightful form.
These variables are called factors or principal components.
Principal Component Analysis
Most common form of factor analysis.
The new variables/dimensions:
Are linear combinations of the original ones.
Are uncorrelated with one another (orthogonal in dimension space).
Capture as much of the original variance in the data as possible.
Are called Principal Components.
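What are the new axes?
[Figure: scatter plot of Original Variable A (horizontal) against Original Variable B (vertical), with the axes PC 1 and PC 2 overlaid.]
Orthogonal directions of greatest variance in data.
Projections along PC 1 discriminate the data most along any one axis.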
Principal Components
First principal component is the direction of greatest variability (covariance) in the data.
Second is the next orthogonal (uncorrelated) direction of greatest variability.
So first remove all the variability along the first component, and then find the next direction of greatest variability.
And so on.
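One way to make "remove the variability along the first component, then find the next direction" concrete is power iteration followed by deflation. This is only an illustrative sketch, not a method the slides prescribe: the matrix S, the helper names and the iteration count are all assumptions chosen for the example.

```python
import math

def matvec(S, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]

def power_iteration(S, iters=500):
    """Approximate the dominant (eigenvalue, unit eigenvector) of a symmetric matrix."""
    n = len(S)
    # Assumption: this start vector is not orthogonal to the dominant eigenvector.
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = matvec(S, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(S, v)))  # Rayleigh quotient v^T S v
    return lam, v

def deflate(S, lam, v):
    """Remove the variability along v: S - lam * v v^T."""
    n = len(S)
    return [[S[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]

# Hypothetical 2x2 covariance matrix, used only for illustration.
S = [[4.0, 2.0], [2.0, 3.0]]
lam1, a1 = power_iteration(S)                    # first direction of greatest variability
lam2, a2 = power_iteration(deflate(S, lam1, a1)) # next direction after removing the first
print(lam1, a1)
print(lam2, a2)
```

The second direction comes out orthogonal to the first because deflation leaves no variance along a1 for the iteration to pick up.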
Principal Components Analysis (PCA)
Principle:
Linear projection method to reduce the number of parameters.
Transfer a set of correlated variables into a new set of uncorrelated variables.
Map the data into a space of lower dimensionality.
Properties:
It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables.
New axes are orthogonal and represent the directions with maximum variability.
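Algebraic definition of PCs
Given a sample of n observations on a vector of p variables, $\{x_1, x_2, \dots, x_n\} \subset \mathbb{R}^p$, define the first principal component of the sample by the linear transformation
$$z_1 = a_1^T x_j = \sum_{i=1}^{p} a_{i1} x_{ij}, \quad j = 1, 2, \dots, n,$$
where the vector $a_1 = (a_{11}, a_{21}, \dots, a_{p1})^T$, with $x_j = (x_{1j}, x_{2j}, \dots, x_{pj})^T$, is chosen such that $\operatorname{var}[z_1]$ is maximum.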
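Algebraic derivation of PCs
To find $a_1$, first note that
$$\operatorname{var}[z_1] = E\big[(z_1 - \bar{z}_1)^2\big] = \frac{1}{n}\sum_{i=1}^{n}\big(a_1^T x_i - a_1^T \bar{x}\big)^2 = \frac{1}{n}\sum_{i=1}^{n} a_1^T (x_i - \bar{x})(x_i - \bar{x})^T a_1 = a_1^T S a_1,$$
where $S = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^T$ is the covariance matrix and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the mean.
In the following, we assume the data is centered: $\bar{x} = 0$.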
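The identity $\operatorname{var}[z_1] = a_1^T S a_1$ can be checked numerically. The sketch below uses a made-up sample of four observations and an arbitrary unit vector a; both are assumptions for illustration, and the covariance uses the 1/n normalization of the slide.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical toy sample: n = 4 observations of p = 2 variables.
xs = [[2.0, 1.0], [0.0, 3.0], [4.0, 5.0], [6.0, 3.0]]
a = [0.6, 0.8]  # an arbitrary unit-length coefficient vector

n, p = len(xs), len(xs[0])
xbar = [mean([x[i] for x in xs]) for i in range(p)]

# S = (1/n) * sum_i (x_i - xbar)(x_i - xbar)^T
S = [[sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in xs) / n
      for j in range(p)] for i in range(p)]

# Variance of z = a^T x computed directly from the projected values.
z = [sum(ai * xi for ai, xi in zip(a, x)) for x in xs]
zbar = mean(z)
var_direct = sum((zi - zbar) ** 2 for zi in z) / n

# The same variance computed as the quadratic form a^T S a.
var_quadratic = sum(a[i] * S[i][j] * a[j] for i in range(p) for j in range(p))

print(var_direct, var_quadratic)
```

Both numbers agree up to floating-point rounding, which is why maximizing the variance of z reduces to maximizing a quadratic form in a.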
Algebraic derivation of PCs
Assume $\bar{x} = 0$.
Form the matrix $X = [x_1, x_2, \dots, x_n] \in \mathbb{R}^{p \times n}$; then
$$S = \frac{1}{n} X X^T.$$
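Algebraic derivation of PCs
To find $a_1$ that maximizes $\operatorname{var}[z_1]$ subject to $a_1^T a_1 = 1$, let $\lambda$ be a Lagrange multiplier:
$$L = a_1^T S a_1 - \lambda\,(a_1^T a_1 - 1).$$
Setting $\partial L / \partial a_1 = 0$ gives
$$S a_1 - \lambda a_1 = 0 \;\Rightarrow\; (S - \lambda I_p)\, a_1 = 0,$$
therefore $a_1$ is an eigenvector of $S$. Since $\operatorname{var}[z_1] = a_1^T S a_1 = \lambda$, the maximum is attained for the eigenvector corresponding to the largest eigenvalue $\lambda = \lambda_1$.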
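Algebraic derivation of PCs
To find the next coefficient vector $a_2$, maximizing $\operatorname{var}[z_2]$ subject to $\operatorname{cov}[z_2, z_1] = 0$ (uncorrelated) and to $a_2^T a_2 = 1$, first note that
$$\operatorname{cov}[z_2, z_1] = a_1^T S a_2 = \lambda_1 a_1^T a_2,$$
then let $\lambda$ and $\phi$ be Lagrange multipliers, and maximize
$$L = a_2^T S a_2 - \lambda\,(a_2^T a_2 - 1) - \phi\, a_2^T a_1.$$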
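The stationarity step for the second component is left implicit here. One way to fill it in is sketched below, assuming the Lagrangian $L = a_2^T S a_2 - \lambda(a_2^T a_2 - 1) - \phi\, a_2^T a_1$ (the sign convention is an assumption, since the slide's formula is only partly legible) and using $a_1^T S a_2 = \lambda_1 a_1^T a_2 = 0$ and $a_1^T a_1 = 1$:

```latex
\begin{aligned}
L &= a_2^T S a_2 - \lambda\,(a_2^T a_2 - 1) - \phi\, a_2^T a_1 \\
\frac{\partial L}{\partial a_2} &= 2 S a_2 - 2\lambda a_2 - \phi\, a_1 = 0 \\
a_1^T \frac{\partial L}{\partial a_2} &= 2\, a_1^T S a_2 - 2\lambda\, a_1^T a_2 - \phi\, a_1^T a_1 = -\phi = 0 \\
\Rightarrow \quad S a_2 &= \lambda a_2, \qquad \operatorname{var}[z_2] = a_2^T S a_2 = \lambda
\end{aligned}
```

So $a_2$ is again an eigenvector of $S$, and among the eigenvectors orthogonal to $a_1$ the variance is largest for the second largest eigenvalue.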
Algebraic derivation of PCs
We find that $a_2$ is also an eigenvector of $S$, whose eigenvalue $\lambda = \lambda_2$ is the second largest.
In general,
$$\operatorname{var}[z_k] = a_k^T S a_k = \lambda_k.$$
The $k$-th largest eigenvalue of $S$ is the variance of the $k$-th PC.
The $k$-th PC $z_k$ retains the $k$-th greatest fraction of the variation in the sample.
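Algebraic derivation of PCs
Main steps for computing PCs:
Form the covariance matrix $S$.
Compute its eigenvectors $\{a_i\}_{i=1}^{p}$.
Use the first $d$ eigenvectors $\{a_i\}_{i=1}^{d}$ to form the $d$ PCs.
The transformation $G$ is given by $G = [a_1, a_2, \dots, a_d]$.
A test point $x \in \mathbb{R}^p$ is mapped to $G^T x \in \mathbb{R}^d$.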
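Dimensionality Reduction
Original data $X \in \mathbb{R}^p$, reduced data $Y \in \mathbb{R}^d$.
Linear transformation $G^T \in \mathbb{R}^{d \times p}$, with $G \in \mathbb{R}^{p \times d}$:
$$X \to Y = G^T X \in \mathbb{R}^d.$$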
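The mapping $Y = G^T X$ amounts to one dot product per retained direction. A minimal sketch, assuming the columns of G are already available as unit vectors; the function name `project` and the sample vectors are hypothetical:

```python
import math

def project(G_columns, x):
    """Compute y = G^T x, where G_columns lists the d column vectors a_1..a_d of G."""
    return [sum(ai * xi for ai, xi in zip(a, x)) for a in G_columns]

# Hypothetical orthonormal directions in R^3 (p = 3), keeping d = 2 of them.
a1 = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0), 0.0]
a2 = [0.0, 0.0, 1.0]

x = [3.0, 1.0, 2.0]       # a (centered) test point in R^p
y = project([a1, a2], x)  # its reduced representation in R^d
print(y)
```

Each coordinate of y is the length of the projection of x onto one retained direction, so the output has d entries regardless of p.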
Steps of PCA
Let $\bar{X}$ be the mean vector (taking the mean of all rows).
Adjust the original data by the mean: $X' = X - \bar{X}$.
Compute the covariance matrix $S$ of the adjusted data $X'$.
Find the eigenvectors and eigenvalues of $S$.
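The four steps can be sketched for the two-variable case, where the eigenvalues of a symmetric 2x2 matrix have a closed form. The data, the function names and the 1/n normalization are assumptions made for illustration; for more than two variables a numerical eigen-solver would be used instead.

```python
import math

def eig_sym_2x2(S):
    """Eigen-pairs (value, unit vector) of a symmetric 2x2 matrix, largest first."""
    a, b, c = S[0][0], S[0][1], S[1][1]
    if b == 0.0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
        return sorted(pairs, key=lambda t: t[0], reverse=True)
    mid = (a + c) / 2.0
    rad = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    pairs = []
    for lam in (mid + rad, mid - rad):
        v = (b, lam - a)  # solves the first row of (S - lam I) v = 0
        norm = math.hypot(v[0], v[1])
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs

def pca_2d(rows):
    """Mean vector, covariance matrix and eigen-pairs for rows of (x1, x2) observations."""
    n = len(rows)
    mean = [sum(r[k] for r in rows) / n for k in range(2)]        # step 1: mean of all rows
    centered = [[r[0] - mean[0], r[1] - mean[1]] for r in rows]   # step 2: subtract the mean
    S = [[sum(c[i] * c[j] for c in centered) / n for j in range(2)]
         for i in range(2)]                                        # step 3: covariance (1/n)
    return mean, S, eig_sym_2x2(S)                                 # step 4: eigen-pairs

# Hypothetical observations, one per row.
rows = [[2.0, 1.0], [0.0, 3.0], [4.0, 5.0], [6.0, 3.0]]
mean, S, pairs = pca_2d(rows)
for lam, vec in pairs:
    print(lam, vec)
```

The eigen-pairs come back sorted by eigenvalue, so the first pair is the first principal component.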
Principal components - Variance
[Figure: bar chart of Variance (%) for PC1 through PC10.]
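Transformed Data
Eigenvalue $\lambda_j$ corresponds to the variance on each component $j$.
Thus, sort by $\lambda_j$.
Take the first $d$ eigenvectors $a_i$, where $d$ is the number of top eigenvalues.
These are the directions with the largest variances.
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_d \end{pmatrix} = \begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_d^T \end{pmatrix} \begin{pmatrix} x_1 - \bar{x}_1 \\ x_2 - \bar{x}_2 \\ \vdots \\ x_p - \bar{x}_p \end{pmatrix}$$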
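Sorting the eigenvalues and picking d can be sketched as follows. The slides do not say how d is chosen; the cumulative-variance threshold used here is a common heuristic and is an assumption, as are the function name and the sample eigenvalues.

```python
def choose_d(eigenvalues, threshold=0.85):
    """Smallest d whose top-d eigenvalues cover at least `threshold` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    running = 0.0
    for d, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= threshold:
            return d, running / total
    return len(ordered), 1.0

# Hypothetical eigenvalues, deliberately unsorted.
d, fraction = choose_d([1.0, 6.0, 3.0])
print(d, fraction)
```

With these values the two largest eigenvalues account for nine tenths of the total variance, so two components are kept.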
An Example
Mean1 = 24.1, Mean2 = 53.8

X1   X2   X1'    X2'
19   63   -5.1    9.25
39   74   14.9   20.25
30   87    5.9   33.25
30   23    5.9  -30.75
15   35   -9.1  -18.75
15   43   -9.1  -10.75
15   32   -9.1  -21.75
30   73    5.9   19.25

[Figure: two scatter plots, one of the original data (X1, X2) and one of the mean-adjusted data (X1', X2').]
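Covariance Matrix
$$C = \begin{pmatrix} 75 & 106 \\ 106 & 482 \end{pmatrix}$$
We find out the eigenvectors and eigenvalues:
$(-0.98, -0.21)$ with $\lambda = 51.8$
$(0.21, -0.98)$ with $\lambda = 560.2$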
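The entries of C can be reproduced from the eight (X1, X2) pairs of the example. The sketch assumes the 1/n normalization and the exact column means (24.125 and 53.75) rather than the rounded ones shown in the table; it covers only the covariance matrix, not the eigen-decomposition.

```python
# The eight observations of the example (values as reconstructed from the slide).
x1 = [19, 39, 30, 30, 15, 15, 15, 30]
x2 = [63, 74, 87, 23, 35, 43, 32, 73]
n = len(x1)

mean1 = sum(x1) / n
mean2 = sum(x2) / n

# Mean-adjusted columns X1' and X2'.
d1 = [v - mean1 for v in x1]
d2 = [v - mean2 for v in x2]

# Covariance entries with 1/n normalization (an assumption that matches the slide's C).
c11 = sum(u * u for u in d1) / n
c12 = sum(u * v for u, v in zip(d1, d2)) / n
c22 = sum(v * v for v in d2) / n

print(round(c11), round(c12), round(c22))
```

Rounded to integers, the three entries match the matrix above.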
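Transform to One-dimension
We keep the dimension of the eigenvector $(0.21, -0.98)$, which has the larger eigenvalue.
We can obtain the final data as
$$y_i = \begin{pmatrix} 0.21 & -0.98 \end{pmatrix} \begin{pmatrix} x'_{i1} \\ x'_{i2} \end{pmatrix} = 0.21\, x'_{i1} - 0.98\, x'_{i2}$$

y_i
-10.14
-16.72
-31.35
31.374
16.464
8.624
19.404
-17.63

[Figure: the transformed data plotted along a single axis.]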
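The listed y values can be recomputed from the mean-adjusted columns. In this sketch the direction (0.21, -0.98) is taken as given from the slide, not recomputed, and the variable names are illustrative.

```python
# Mean-adjusted data X1' and X2' from the example table.
x1c = [-5.1, 14.9, 5.9, 5.9, -9.1, -9.1, -9.1, 5.9]
x2c = [9.25, 20.25, 33.25, -30.75, -18.75, -10.75, -21.75, 19.25]

# Retained direction, copied from the slide as stated there.
direction = (0.21, -0.98)

# y_i = 0.21 * x'_i1 - 0.98 * x'_i2
y = [direction[0] * u + direction[1] * v for u, v in zip(x1c, x2c)]

for value in y:
    print(round(value, 3))
```

Each two-dimensional observation is reduced to a single coordinate along the retained direction.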