LECTURE 9: Principal Components Analysis


Outline:
- The curse of dimensionality
- Dimensionality reduction
- Feature selection vs. feature extraction
- Signal representation vs. signal classification
- Principal Components Analysis

Introduction to Pattern Analysis, Ricardo Gutierrez-Osuna, Texas A&M University

The curse of dimensionality (1)

The curse of dimensionality, a term coined by Bellman in 1961, refers to the problems associated with multivariate data analysis as the dimensionality increases. We will illustrate these problems with a simple example.

Consider a 3-class pattern recognition problem. A simple approach would be to:
- Divide the feature space into uniform bins
- Compute the ratio of examples for each class at each bin and, for a new example, find its bin and choose the predominant class in that bin

In our toy problem we decide to start with one single feature and divide the real line into 3 segments. After doing this, we notice that there is too much overlap among the classes, so we decide to incorporate a second feature to try and improve separability.

The curse of dimensionality (2)

We decide to preserve the granularity of each axis, which raises the number of bins from 3 (in 1D) to 3^2 = 9 (in 2D). At this point we need to make a decision: do we maintain the density of examples per bin, or do we keep the number of examples we had for the one-dimensional case?
- Choosing to maintain the density increases the number of examples from 9 (in 1D) to 27 (in 2D)
- Choosing to maintain the number of examples results in a 2D scatter plot that is very sparse

[Figure: 2D scatter plots, constant density vs. constant number of examples]

Moving to three features makes the problem worse:
- The number of bins grows to 3^3 = 27
- For the same density of examples, the number of needed examples becomes 81
- For the same number of examples, the 3D scatter plot is almost empty
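The growth above can be checked with a few lines of arithmetic, using the slide's numbers (3 bins per axis, a density of 3 examples per bin):

```python
# Bin and example counts as dimensionality grows: 3 bins per axis,
# and a constant density of 3 examples per bin.
bins_per_axis = 3
examples_per_bin = 3

for d in (1, 2, 3):
    n_bins = bins_per_axis ** d              # 3, 9, 27 bins
    n_examples = n_bins * examples_per_bin   # 9, 27, 81 examples at constant density
    print(f"D={d}: {n_bins} bins, {n_examples} examples")
```

The exponential factor `bins_per_axis ** d` is the whole story: the sample size must track the bin count to keep the density constant.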

The curse of dimensionality (3)

Obviously, our approach of dividing the sample space into equally spaced bins was quite inefficient. There are other approaches that are much less susceptible to the curse of dimensionality, but the problem still exists.

How do we beat the curse of dimensionality?
- By incorporating prior knowledge
- By providing increasing smoothness of the target function
- By reducing the dimensionality

In practice, the curse of dimensionality means that, for a given sample size, there is a maximum number of features above which the performance of our classifier will degrade rather than improve. In most cases, the information that is lost by discarding some features is (more than) compensated by a more accurate mapping in the lower-dimensional space.

[Figure: classifier performance vs. dimensionality, rising to a peak and then degrading]

The curse of dimensionality (4)

There are many implications of the curse of dimensionality:
- Exponential growth in the number of examples required to maintain a given sampling density. For a density of n examples/bin and D dimensions, the total number of examples is n^D.
- Exponential growth in the complexity of the target function (a density estimate) with increasing dimensionality. "A function defined in high-dimensional space is likely to be much more complex than a function defined in a lower-dimensional space, and those complications are harder to discern" [Friedman]. This means that, in order to learn it well, a more complex target function requires denser sample points!
- What to do if it ain't Gaussian? For one dimension a large number of density functions can be found in textbooks, but for high dimensions only the multivariate Gaussian density is available. Moreover, for larger values of D the Gaussian density can only be handled in a simplified form!
- Humans have an extraordinary capacity to discern patterns and clusters in 1, 2 and 3 dimensions, but these capabilities degrade drastically for 4 or higher dimensions.

Dimensionality reduction (1)

Two approaches are available to perform dimensionality reduction:
- Feature extraction: creating a subset of new features by combinations of the existing features
- Feature selection: choosing a subset of all the features (the most informative ones)

The problem of feature extraction can be stated as follows. Given a feature space x ∈ R^N, find a mapping y = f(x): R^N → R^M with M < N, such that the transformed feature vector y ∈ R^M preserves (most of) the information or structure in R^N. An optimal mapping y = f(x) will be one that results in no increase in the minimum probability of error; that is, a Bayes decision rule applied to the initial space R^N and to the reduced space R^M yields the same classification rate.

[Figure: feature selection picks a subset of the original components x_i; feature extraction computes y = f(x)]

Dimensionality reduction (2)

In general, the optimal mapping y = f(x) will be a non-linear function. However, there is no systematic way to generate non-linear transforms; the selection of a particular subset of transforms is problem dependent. For this reason, feature extraction is commonly limited to linear transforms, y = Wx. That is, y is a linear projection of x:

  [y_1]   [w_11 w_12 ... w_1N] [x_1]
  [ . ] = [ .    .  .     .  ] [ . ]
  [y_M]   [w_M1 w_M2 ... w_MN] [x_N]

NOTE: when the mapping is a non-linear function, the reduced space is called a manifold.

We will focus on linear feature extraction for now, and revisit non-linear techniques when we cover multi-layer perceptrons.
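A minimal numeric sketch of such a linear projection y = Wx; the weight values below are arbitrary illustrations, not from the lecture:

```python
import numpy as np

# Linear feature extraction y = W x: W is M x N, projecting N=3 inputs
# down to M=2 outputs. The weights are arbitrary, for illustration only.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])    # shape (M, N) = (2, 3)
x = np.array([2.0, -1.0, 3.0])     # N-dimensional feature vector
y = W @ x                          # M-dimensional projection
print(y)                           # [2. 2.]
```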

Signal representation versus classification

The selection of the feature extraction mapping y = f(x) is guided by an objective function that we seek to maximize (or minimize). Depending on the criterion used in the objective function, feature extraction techniques are grouped into two categories:
- Signal representation: the goal of the feature extraction mapping is to represent the samples accurately in a lower-dimensional space
- Classification: the goal of the feature extraction mapping is to enhance the class-discriminatory information in the lower-dimensional space

Within the realm of linear feature extraction, two techniques are commonly used:
- Principal Components Analysis (PCA) uses a signal representation criterion
- Linear Discriminant Analysis (LDA) uses a signal classification criterion

[Figure: two-class scatter plot on axes Feature 1 / Feature 2, contrasting the signal-representation direction with the classification direction]

Principal Components Analysis, PCA (1)

The objective of PCA is to perform dimensionality reduction while preserving as much of the randomness (variance) in the high-dimensional space as possible.

Let x be an N-dimensional random vector, represented as a linear combination of orthonormal basis vectors [φ_1, φ_2, ..., φ_N]:

  x = Σ_{i=1}^{N} y_i φ_i,   where φ_i^T φ_j = δ_ij

Suppose we choose to represent x with only M (M < N) of the basis vectors. We can do this by replacing the components [y_{M+1}, ..., y_N]^T with some pre-selected constants b_i:

  x̂(M) = Σ_{i=1}^{M} y_i φ_i + Σ_{i=M+1}^{N} b_i φ_i

The representation error is then

  Δx(M) = x − x̂(M) = Σ_{i=M+1}^{N} (y_i − b_i) φ_i

We can measure this representation error by the mean-squared magnitude of Δx. Our goal is to find the basis vectors φ_i and constants b_i that minimize this mean-square error:

  ε²(M) = E[|Δx(M)|²] = E[Σ_{i=M+1}^{N} Σ_{j=M+1}^{N} (y_i − b_i)(y_j − b_j) φ_i^T φ_j] = Σ_{i=M+1}^{N} E[(y_i − b_i)²]
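In the last identity the cross terms vanish by orthonormality; this holds pointwise (not just in expectation) and is easy to verify numerically. The basis, vector, and constants below are arbitrary illustrations:

```python
import numpy as np

# Verify that, for an orthonormal basis, the squared representation error
# |x - x_hat(M)|^2 equals the sum over the discarded components of (y_i - b_i)^2.
rng = np.random.default_rng(0)
N, M = 4, 2
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))   # columns = orthonormal basis phi_i
x = rng.normal(size=N)
y = Q.T @ x                                    # expansion coefficients y_i = phi_i^T x
b = rng.normal(size=N)                         # arbitrary pre-selected constants

x_hat = Q[:, :M] @ y[:M] + Q[:, M:] @ b[M:]    # keep M coefficients, fix the rest
lhs = np.sum((x - x_hat) ** 2)                 # squared representation error
rhs = np.sum((y[M:] - b[M:]) ** 2)             # sum over discarded terms
assert np.isclose(lhs, rhs)
```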

Principal Components Analysis, PCA (2)

As we have done earlier in the course, the optimal values of b_i can be found by computing the partial derivative of the objective function and equating it to zero:

  ∂/∂b_i E[(y_i − b_i)²] = −2 (E[y_i] − b_i) = 0  ⇒  b_i = E[y_i]

Therefore, we will replace the discarded dimensions y_i by their expected value (an intuitive solution). Since y_i = φ_i^T x, the mean-square error can then be written as

  ε²(M) = Σ_{i=M+1}^{N} E[(y_i − E[y_i])²]
        = Σ_{i=M+1}^{N} φ_i^T E[(x − E[x])(x − E[x])^T] φ_i
        = Σ_{i=M+1}^{N} φ_i^T Σ_x φ_i

where Σ_x is the covariance matrix of x. We seek the solution that minimizes this expression subject to the orthonormality constraint, which we incorporate into the expression using a set of Lagrange multipliers λ_i:

  ε²(M) = Σ_{i=M+1}^{N} φ_i^T Σ_x φ_i + Σ_{i=M+1}^{N} λ_i (1 − φ_i^T φ_i)

Computing the partial derivative with respect to the basis vectors:

  ∂ε²(M)/∂φ_i = 2 (Σ_x φ_i − λ_i φ_i) = 0  ⇒  Σ_x φ_i = λ_i φ_i

So φ_i and λ_i are the eigenvectors and eigenvalues of the covariance matrix Σ_x.

NOTE: d(x^T A x)/dx = (A + A^T) x = 2Ax if A is symmetric.

Principal Components Analysis, PCA (3)

We can then express the sum-square error as

  ε²(M) = Σ_{i=M+1}^{N} φ_i^T Σ_x φ_i = Σ_{i=M+1}^{N} λ_i

In order to minimize this measure, the λ_i will have to be the smallest eigenvalues. Therefore, to represent x with minimum sum-square error, we will choose the eigenvectors φ_i corresponding to the largest eigenvalues λ_i.

PCA dimensionality reduction: the optimal* approximation of a random vector x ∈ R^N by a linear combination of M (M < N) independent vectors is obtained by projecting the random vector x onto the eigenvectors φ_i corresponding to the largest eigenvalues λ_i of the covariance matrix Σ_x.

*Optimality is defined as the minimum of the sum-square magnitude of the approximation error.
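The result above can be turned into a small NumPy sketch; the function name and the data are illustrative, not from the lecture:

```python
import numpy as np

def pca(X, M):
    """Project the rows of X (n_samples x N) onto the M eigenvectors of the
    covariance matrix with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # N x N covariance estimate
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-order: largest first
    return Xc @ eigvecs[:, order[:M]], eigvals[order]

# Illustrative data: 3 features with very different variances
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
Y, eigvals = pca(X, M=2)
print(Y.shape)                                # (200, 2)
print(eigvals[0] > eigvals[1] > eigvals[2])   # True
```

`np.linalg.eigh` is used (rather than `eig`) because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.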

Principal Components Analysis, PCA (4)

NOTES:
- Since PCA uses the eigenvectors of the covariance matrix Σ_x, it is able to find the independent axes of the data under the unimodal Gaussian assumption. For non-Gaussian or multi-modal Gaussian data, PCA simply de-correlates the axes.
- The main limitation of PCA is that it does not consider class separability, since it does not take into account the class label of the feature vector. PCA simply performs a coordinate rotation that aligns the transformed axes with the directions of maximum variance. There is no guarantee that the directions of maximum variance will contain good features for discrimination.

Historical remarks:
- Principal Components Analysis is the oldest technique in multivariate analysis
- PCA is also known as the Karhunen-Loève transform (communication theory)
- PCA was first introduced by Pearson in 1901, and it experienced several modifications until it was generalized by Loève in 1963

PCA example (1)

In this example we have a three-dimensional Gaussian distribution with the following parameters:

  µ = [0 5 2]^T   and   Σ = [25 −1 7; −1 4 −4; 7 −4 10]

The three pairs of principal component projections are shown below. Notice that the first projection has the largest variance, followed by the second projection. Also notice that the PCA projections de-correlate the axes (we knew this since Lecture 3, though).

[Figure: scatter plots of the three pairs of principal component projections]
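A quick NumPy check of the de-correlation claim. The 3×3 covariance below is the example's matrix as reconstructed here (some digits were lost in transcription, so treat the entries as illustrative); the identity holds for any covariance matrix:

```python
import numpy as np

# After projecting onto the eigenvectors Phi of Sigma, the covariance of the
# projections y = Phi^T x is Phi^T Sigma Phi = diag(lambda_i): the projections
# are de-correlated, with the eigenvalues as their variances.
Sigma = np.array([[25.0, -1.0,  7.0],
                  [-1.0,  4.0, -4.0],
                  [ 7.0, -4.0, 10.0]])
eigvals, Phi = np.linalg.eigh(Sigma)           # ascending eigenvalues
projected_cov = Phi.T @ Sigma @ Phi
assert np.allclose(projected_cov, np.diag(eigvals))
print(np.round(eigvals[::-1], 2))              # projection variances, largest first
```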

PCA example (2)

This example shows a projection of a three-dimensional data set onto two dimensions. Initially, except for the elongation of the cloud, there is no apparent structure in the set of points. Choosing an appropriate rotation allows us to unveil the underlying structure. (You can think of this rotation as "walking around" the three-dimensional set, looking for the best viewpoint.)

PCA can help find such underlying structure: it selects a rotation such that most of the variability within the data set is represented in the first few dimensions of the rotated data. In our three-dimensional case, this may seem of little use. However, when the data is highly multidimensional (10s of dimensions), this analysis is quite powerful.

PCA example (3)

Compute the principal components for the following two-dimensional dataset:

  X = (x_1, x_2) = {(1,2), (3,3), (3,5), (5,4), (5,6), (6,5), (8,7), (9,8)}

Let's first plot the data to get an idea of which solution we should expect.

SOLUTION (by hand). The (biased) covariance estimate of the data is:

  Σ_x = [6.25 4.25; 4.25 3.5]

The eigenvalues are the zeros of the characteristic equation:

  Σ_x v = λv  ⇒  |Σ_x − λI| = |6.25−λ 4.25; 4.25 3.5−λ| = 0  ⇒  λ_1 = 9.34, λ_2 = 0.41

The eigenvectors are the solutions of the systems:

  [6.25 4.25; 4.25 3.5] [v_11; v_12] = λ_1 [v_11; v_12]  ⇒  v_1 = [0.81; 0.59]
  [6.25 4.25; 4.25 3.5] [v_21; v_22] = λ_2 [v_21; v_22]  ⇒  v_2 = [−0.59; 0.81]

HINT: to solve each system manually, first assume that one of the variables is equal to one (i.e. v_11 = 1), then find the other one and finally normalize the vector to make it unit-length.
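The hand computation can be cross-checked with NumPy. One caveat: the first data point's digits were lost in transcription; (1,2) is assumed here because it reproduces the slide's covariance entries 6.25, 4.25, 3.5 exactly. `np.cov` with `ddof=0` gives the biased (divide-by-n) estimate used above:

```python
import numpy as np

# Cross-check of the worked example: biased covariance, eigenvalues, eigenvectors.
# First point (1, 2) is reconstructed; it matches the stated covariance exactly.
X = np.array([(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)],
             dtype=float)
cov = np.cov(X, rowvar=False, ddof=0)          # biased estimate (divide by n)
print(cov)                                     # [[6.25 4.25], [4.25 3.5]]

eigvals, eigvecs = np.linalg.eigh(cov)         # ascending order
print(np.round(eigvals[::-1], 2))              # [9.34 0.41]
print(np.round(np.abs(eigvecs[:, -1]), 2))     # first principal axis, up to sign: [0.81 0.59]
```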