Principal Component Analysis
Nuno Vasconcelos
ECE Department, UCSD
Curse of dimensionality
typical observation in Bayes decision theory: error increases when the number of features is large
problem: even for simple models (e.g. Gaussian) we need a large number of examples to have good estimates
Q: what does "large" mean? This depends on the dimension of the space
the best way to see this is to think of a histogram
suppose you have 100 points and you need at least 10 bins per axis in order to get a reasonable quantization
for uniform data you get, on average, a decent estimate in 1D, a bad one in 2D, and a terrible one in 3D (9 out of each 10 bins empty)

dimension     1     2     3
points/bin    10    1     0.1
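The histogram argument can be checked numerically. The sketch below is only an illustration: the function name `histogram_occupancy`, the seed, and the use of the unit cube are assumptions, not part of the slides. It draws 100 uniform points, bins them with 10 bins per axis, and reports the average number of points per bin and the fraction of empty bins.

```python
import random

def histogram_occupancy(num_points=100, bins_per_axis=10, dim=1, seed=0):
    """Return (average points per bin, fraction of empty bins) for uniform data."""
    rng = random.Random(seed)
    total_bins = bins_per_axis ** dim
    occupied = set()
    for _ in range(num_points):
        # Bin index along each axis for a point drawn uniformly in [0, 1)^dim.
        cell = tuple(int(rng.random() * bins_per_axis) for _ in range(dim))
        occupied.add(cell)
    avg_per_bin = num_points / total_bins
    empty_fraction = 1.0 - len(occupied) / total_bins
    return avg_per_bin, empty_fraction

results = {d: histogram_occupancy(dim=d) for d in (1, 2, 3)}
for d, (avg, empty) in results.items():
    print(f"{d}D: {avg:g} points/bin on average, {empty:.0%} of bins empty")
```

In 3D at most 100 of the 1000 bins can be occupied, so at least 9 out of every 10 bins are empty no matter how the points fall.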
Curse of dimensionality
this is the curse of dimensionality: for a given classifier, the number of examples required to maintain classification accuracy increases exponentially with the dimension of the space
in higher dimensions the classifier has more parameters: higher complexity, harder to learn
Dimensionality reduction
what do we do about this? we avoid unnecessary dimensions
"unnecessary" can be measured in two ways:
1. features are not discriminant
2. features are not independent
non-discriminant means that they do not separate the classes well
[Figure: a discriminant feature vs. a non-discriminant feature]
Dimensionality reduction
Q: how do we detect the presence of feature correlations?
A: the data lives in a low dimensional subspace (up to some amount of noise), e.g.
[Figure: scatter plot of salary vs. car loan with the points concentrated along a line; projection onto a 1D subspace gives the new feature $y = a^T x$]
in the example above we have a 3D hyper-plane in 5D
if we can find this hyper-plane we can project the data onto it and get rid of half of the dimensions without introducing significant error
Principal component analysis
basic idea: if the data lives in a subspace, it is going to look very flat when viewed from the full space, e.g.
[Figure: a 1D subspace in 2D and a 2D subspace in 3D]
this means that if we fit a Gaussian to the data, the iso-probability contours are going to be highly skewed ellipsoids
Principal component analysis
how do we find these ellipsoids?
when we talked about metrics we said that the Mahalanobis distance
$d(x, y) = (x - y)^T \Sigma^{-1} (x - y)$
measures the natural units for the problem because it is adapted to the covariance of the data
we also know that what is special about it is that it uses $\Sigma^{-1}$
hence, the information must be in $\Sigma$
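A small numerical sketch of why this distance is "adapted to the covariance". The helper name `mahalanobis_sq` and the example covariance are assumptions chosen for illustration. It evaluates the quadratic form $(x - y)^T \Sigma^{-1} (x - y)$ for two points that are equally far from the origin in Euclidean terms.

```python
import numpy as np

def mahalanobis_sq(x, y, cov):
    """Quadratic form (x - y)^T cov^{-1} (x - y)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov * v = diff instead of forming the inverse explicitly.
    return float(diff @ np.linalg.solve(cov, diff))

# Hypothetical covariance: variance 9 along the first axis, 1 along the second.
cov = np.array([[9.0, 0.0],
                [0.0, 1.0]])
origin = np.zeros(2)
a = np.array([3.0, 0.0])  # three units along the high-variance axis
b = np.array([0.0, 3.0])  # three units along the low-variance axis

d_a = mahalanobis_sq(a, origin, cov)
d_b = mahalanobis_sq(b, origin, cov)
print(d_a, d_b)
```

Both points have Euclidean norm 3, but the point along the low-variance axis is much farther away in Mahalanobis terms, which is the sense in which the shape information sits in $\Sigma$.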
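Gaussian review
the equiprobability contours of a Gaussian are the points such that
$(x - \mu)^T \Sigma^{-1} (x - \mu) = c$
let's consider the change of variable $z = x - \mu$, which only moves the origin by $\mu$. The equation
$z^T \Sigma^{-1} z = c$
is the equation of an ellipse
this is easy to see when $\Sigma$ is diagonal, $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$:
$z^T \Sigma^{-1} z = \sum_i \frac{z_i^2}{\sigma_i^2} = c$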
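Gaussian review
this is the equation of an ellipse with principal lengths $\sigma_i$
e.g. when $d = 2$,
$\frac{z_1^2}{\sigma_1^2} + \frac{z_2^2}{\sigma_2^2} = 1$
is the ellipse
[Figure: axis-aligned ellipse in the $(z_1, z_2)$ plane with semi-axes $\sigma_1$ and $\sigma_2$]
introduce the transformation $y = \Phi z$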
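Gaussian review
introduce the transformation $y = \Phi z$
then $y$ has covariance
$\Sigma_y = \Phi \Sigma_z \Phi^T = \Phi \Lambda \Phi^T$, with $\Lambda = \Sigma_z = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$
if $\Phi$ is orthonormal this is just a rotation and we have
[Figure: the axis-aligned ellipse in $(z_1, z_2)$, with semi-axes $\sigma_1, \sigma_2$, is mapped to a rotated ellipse in $(y_1, y_2)$ with axes along $\phi_1, \phi_2$, labelled $\lambda_1, \lambda_2$]
we obtain a rotated ellipse with principal components $\phi_1$ and $\phi_2$, which are the columns of $\Phi$
note that $\Sigma_y = \Phi \Lambda \Phi^T$ is the eigen-decomposition of $\Sigma_y$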
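The rotation argument can be verified on a 2D example. This is a sketch under stated assumptions: the transformation is taken to be $y = \Phi z$ with $\Phi$ a rotation matrix, and the angle and the variances 4 and 1 are arbitrary illustrative values. It builds $\Sigma_y = \Phi \Lambda \Phi^T$ and recovers $\Lambda$ and the columns of $\Phi$ from its eigen-decomposition.

```python
import numpy as np

theta = np.pi / 6  # hypothetical rotation angle
Phi = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])  # orthonormal: Phi.T @ Phi = I
Lam = np.diag([4.0, 1.0])                          # diagonal covariance of z

# Covariance of y = Phi z.
Sigma_y = Phi @ Lam @ Phi.T

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma_y)
print(eigvals)        # the diagonal of Lam, in ascending order
print(eigvecs[:, 1])  # eigenvector of the largest eigenvalue: +/- first column of Phi
```

Eigenvectors are only defined up to sign, so the comparison with the columns of `Phi` has to ignore the sign.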
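Principal component analysis
if $y$ is Gaussian with covariance $\Sigma$, the equiprobability contours are the ellipses whose
principal components $\phi_i$ are the eigenvectors of $\Sigma$
principal lengths are set by the eigenvalues $\lambda_i$ of $\Sigma$ (the semi-axis along $\phi_i$ is proportional to $\sqrt{\lambda_i}$)
[Figure: rotated ellipse in $(y_1, y_2)$ with axes along $\phi_1, \phi_2$, labelled $\lambda_1, \lambda_2$]
by computing the eigenvalues we know if the data is flat
$\lambda_1 \gg \lambda_2$: flat
$\lambda_1 = \lambda_2$: not flat
[Figure: an elongated ellipse ($\lambda_1 \gg \lambda_2$) and a circular contour ($\lambda_1 = \lambda_2$)]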
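A minimal sketch of this flatness test, assuming synthetic 2D data that lies close to a line. The direction, the noise level, and the factor of 100 used as a "much larger" threshold are all hypothetical choices. It estimates the covariance, takes its eigen-decomposition, and compares the two eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
direction = np.array([3.0, 4.0]) / 5.0  # hypothetical unit vector spanning the 1D subspace
t = rng.normal(size=n)

# One example per column: points along `direction` plus a little isotropic noise.
X = np.outer(direction, 2.0 * t) + 0.05 * rng.normal(size=(2, n))

mu = X.mean(axis=1, keepdims=True)
Xc = X - mu
Sigma = Xc @ Xc.T / n  # sample covariance

eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]  # sort in decreasing order of eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

flat = eigvals[0] > 100 * eigvals[1]  # illustrative threshold for "lambda_1 >> lambda_2"
print(eigvals, flat)
```

The first eigenvector should line up with `direction` (up to sign), and the second eigenvalue should be on the order of the noise variance.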
Principal component analysis (learning)
Principal component analysis
Principal component analysis
how do I determine the number of eigenvectors to keep?
one possibility is to plot the eigenvalue magnitudes: this is a scree plot
usually there is a fast decrease in the eigenvalue magnitude followed by a flat area
one good choice is the knee of this curve
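Principal component analysis
another possibility is the percentage of explained variance
remember that the eigenvalues of $\Sigma$ are a measure of variance: $\lambda_i$ is the variance of the data along $\phi_i$
[Figure: the axis-aligned ellipse in $z$ and the rotated ellipse in $y$, as in the Gaussian review]
the ratio
$r_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$
measures the % of total variance contained in the top $k$ eigenvalues
it is a measure of the fraction of data variability along the associated eigenvectors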
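Principal component analysis
a natural measure is to pick the eigenvectors that explain p % of the data variability
this can be done by plotting the ratio $r_k$ as a function of $k$, where
$r_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$
and $\lambda_1 \geq \ldots \geq \lambda_d$ are the eigenvalues of $\Sigma$
[Figure: plot of $r_k$ versus $k$]
e.g. we need 3 eigenvectors to cover 70% of the variability of this dataset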
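The selection rule can be sketched in a few lines. The eigenvalues below are made up for illustration (they are not the dataset of the figure), the function names are hypothetical, and the ratio is computed on the eigenvalues of $\Sigma$ directly, on the assumption that they are the variances along the principal components.

```python
from itertools import accumulate

def explained_variance_ratio(eigenvalues):
    """r_k for k = 1..d: cumulative share of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [partial / total for partial in accumulate(ordered)]

def components_needed(eigenvalues, p):
    """Smallest k such that r_k >= p, with p given as a fraction in (0, 1]."""
    for k, r_k in enumerate(explained_variance_ratio(eigenvalues), start=1):
        if r_k >= p:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues of a 6-dimensional covariance matrix.
eigs = [4.0, 2.5, 1.5, 1.0, 0.6, 0.4]
ratios = explained_variance_ratio(eigs)
k70 = components_needed(eigs, 0.70)
print([round(r, 3) for r in ratios], k70)
```

For these values the first two eigenvalues explain 65% of the variance and the first three explain 80%, so three components are needed to reach 70%.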
Principal component analysis
there is an alternative manner to compute the principal components, based on the singular value decomposition
SVD: any real $n \times m$ matrix ($n > m$) can be decomposed as
$A = M \Pi N^T$
where
$M$ is an $n \times m$ column-orthonormal matrix of left singular vectors (columns of $M$)
$\Pi$ is an $m \times m$ diagonal matrix of singular values
$N^T$ is an $m \times m$ row-orthonormal matrix of right singular vectors (columns of $N$)
$M^T M = I$, $N^T N = I$
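PCA by SVD
to relate this to PCA, we consider the data matrix, with one example per column,
$X = [x_1 \; \cdots \; x_n]$
the sample mean is
$\mu = \frac{1}{n} \sum_i x_i = \frac{1}{n} X \mathbf{1}$
where $\mathbf{1}$ is the $n$-dimensional vector of ones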
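PCA by SVD
and we can center the data by subtracting the mean from each column of $X$
this is the centered data matrix
$X_c = [x_1 \; \cdots \; x_n] - [\mu \; \cdots \; \mu] = X - \mu \mathbf{1}^T = X - \frac{1}{n} X \mathbf{1}\mathbf{1}^T = X \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right)$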
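PCA by SVD
the sample covariance is
$\Sigma = \frac{1}{n} \sum_i (x_i - \mu)(x_i - \mu)^T = \frac{1}{n} \sum_i x_i^c (x_i^c)^T$
where $x_i^c$ is the $i$-th column of $X_c$
this can be written as
$\Sigma = \frac{1}{n} X_c X_c^T$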
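The matrix form of centering and of the sample covariance can be checked against NumPy's own routines. This is a sketch on random data: the sizes `d`, `n` and the seed are arbitrary, and the name `C` for the centering matrix $I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 50
X = rng.normal(size=(d, n))  # one example per column

ones = np.ones((n, 1))
C = np.eye(n) - ones @ ones.T / n  # centering matrix I - (1/n) 1 1^T
Xc = X @ C                         # centered data matrix
Sigma = Xc @ Xc.T / n              # sample covariance (normalized by n)

# Same quantities computed the direct way.
Xc_direct = X - X.mean(axis=1, keepdims=True)
Sigma_direct = np.cov(X, bias=True)  # bias=True normalizes by n rather than n - 1
print(np.allclose(Xc, Xc_direct), np.allclose(Sigma, Sigma_direct))
```

Forming the $n \times n$ centering matrix is only done here to mirror the formula; subtracting the mean directly is cheaper in practice.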
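PCA by SVD
the matrix $X_c^T$ is real $n \times d$. Assuming $n > d$, it has the SVD
$X_c^T = M \Pi N^T$, with $M^T M = I$ and $N^T N = I$
and
$\Sigma = \frac{1}{n} X_c X_c^T = \frac{1}{n} N \Pi M^T M \Pi N^T = \frac{1}{n} N \Pi^2 N^T$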
PCA by SVD
$\Sigma = N \left(\frac{1}{n} \Pi^2\right) N^T$
noting that $N$ is $d \times d$ and orthonormal, and $\Pi^2$ diagonal, shows that this is just the eigenvalue decomposition of $\Sigma$
it follows that
the eigenvectors of $\Sigma$ are the columns of $N$
the eigenvalues of $\Sigma$ are $\lambda_i = \frac{1}{n} \pi_i^2$
this gives an alternative algorithm for PCA
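PCA by SVD
computation of PCA by SVD: given $X$ with one example per column
1) create the centered data-matrix $X_c^T = \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right) X^T$
2) compute its SVD $X_c^T = M \Pi N^T$
3) principal components are the columns of $N$, eigenvalues are $\lambda_i = \frac{1}{n} \pi_i^2$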
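The three steps above can be sketched with `numpy.linalg.svd`. The function name `pca_by_svd` and the synthetic data are assumptions, and the eigenvalues are scaled by $1/n$ to match a covariance normalized by $n$. The result is cross-checked against the eigen-decomposition of the sample covariance.

```python
import numpy as np

def pca_by_svd(X):
    """PCA of a d x n data matrix (one example per column) via the SVD of X_c^T.

    Returns (components, eigenvalues): components has one principal component
    per column, ordered by decreasing eigenvalue.
    """
    n = X.shape[1]
    Xc_T = (X - X.mean(axis=1, keepdims=True)).T  # step 1: centered data, n x d
    # Step 2: thin SVD; singular values come back in decreasing order.
    M, pi, N_T = np.linalg.svd(Xc_T, full_matrices=False)
    # Step 3: columns of N are the components, pi^2 / n the eigenvalues.
    return N_T.T, pi ** 2 / n

# Hypothetical correlated data: a fixed mixing matrix applied to Gaussian noise.
rng = np.random.default_rng(2)
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.5, 0.2]])
X = A @ rng.normal(size=(3, 200))

components, eigenvalues = pca_by_svd(X)

# Cross-check with the covariance route.
Sigma = np.cov(X, bias=True)
reference = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
print(eigenvalues, reference)
```

The SVD route never forms $\Sigma$ explicitly, which tends to be better conditioned numerically than squaring the data first.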
Principal component analysis
principal components are usually quite informative about the structure of the data
example: the principal components for the space of images of faces
the figure only shows the first 16 eigenvectors
note lighting, structure, etc.
[Figure: eigenvectors of the space of face images]
Principal component analysis
PCA has been applied to virtually all learning problems
e.g. eigenshapes for face morphing
[Figure: morphed faces]
Principal component analysis
sound
[Figure: average sound images; eigenobjects corresponding to the three highest eigenvalues]
Principal component analysis
turbulence
[Figure: flames; eigenflames]
Principal component analysis
video
[Figure: eigenrings; reconstruction]
Principal component analysis
text: latent semantic indexing
represent each document by a word histogram
perform SVD on the document × word matrix
[Figure: the documents × terms matrix factored into a documents × concepts matrix and a concepts × terms matrix]
principal components as the directions of semantic concepts
Latent semantic analysis
applications: document classification, information retrieval
goal: solve two fundamental problems in language
synonymy: different writers use different words to describe the same idea
polysemy: the same word can have multiple meanings
reasons:
the original term-document matrix is too large for the computing resources
the original term-document matrix is noisy: for instance, anecdotal instances of terms are to be eliminated
the original term-document matrix is overly sparse relative to the "true" term-document matrix. E.g. it lists only the words actually in each document, whereas we might be interested in all words related to each document, a much larger set due to synonymy
Latent semantic analysis
after PCA some dimensions get "merged":
{(car), (truck), (flower)} --> {(1.3452 * car + 0.2828 * truck), (flower)}
this mitigates synonymy: it merges the dimensions associated with terms that have similar meanings
and mitigates polysemy: components of polysemous words that point in the "right" direction are added to the components of words that share this sense
conversely, components that point in other directions tend to either simply cancel out or, at worst, to be smaller than the components in the directions corresponding to the intended sense
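A toy illustration of the merging effect, using a made-up term-document count matrix. The counts, the choice of two concepts, and the terms × documents layout are assumptions for this sketch, and the SVD is applied to the raw counts without centering. Two documents that use "car" and "truck" in different proportions become identical in direction once projected onto the top two concepts.

```python
import numpy as np

terms = ["car", "truck", "flower"]
# Hypothetical term-document counts: rows are terms, columns are 5 documents.
A = np.array([[2, 1, 1, 0, 0],
              [1, 2, 1, 0, 0],
              [0, 0, 0, 3, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                     # number of concepts kept
concepts = U[:, :k]       # terms x concepts
docs_k = concepts.T @ A   # documents expressed in concept coordinates

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

raw_sim = cosine(A[:, 0], A[:, 1])                 # documents 1 and 2, term space
concept_sim = cosine(docs_k[:, 0], docs_k[:, 1])   # same documents, concept space

for name, weights in zip(("concept 1", "concept 2"), concepts.T):
    print(name, dict(zip(terms, np.round(weights, 3))))
print(raw_sim, concept_sim)
```

In this example one concept is "flower" alone and the other weights "car" and "truck" equally, so the car-heavy and truck-heavy documents end up pointing in the same direction.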
Extensions
in a few lectures we will talk about kernels
it turns out that any algorithm which depends on the data through dot-products only, i.e. through the matrix of elements $x_i^T x_j$, can be kernelized
this is usually beneficial, we will see why later
for now we look at the question of whether PCA can be written in the form above
recall that the data matrix is
$X = [x_1 \; \cdots \; x_n]$
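Extensions
the centered-data matrix is
$X_c = X \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right)$
and the covariance
$\Sigma = \frac{1}{n} X_c X_c^T = \frac{1}{n} \sum_j x_j^c (x_j^c)^T$
the eigenvector $\phi_i$ of eigenvalue $\lambda_i$ is
$\phi_i = \frac{1}{\lambda_i} \Sigma \phi_i = \frac{1}{n \lambda_i} \sum_j x_j^c (x_j^c)^T \phi_i = \frac{1}{\lambda_i} X_c \alpha_i$, with $\alpha_i = \frac{1}{n} X_c^T \phi_i$
hence, the eigenvector matrix is
$\Phi = X_c \Gamma$, with $\Gamma = \left[\frac{\alpha_1}{\lambda_1} \; \cdots \; \frac{\alpha_d}{\lambda_d}\right]$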
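Extensions
we next note that, from the eigenvector decomposition
$\Sigma = \Phi \Lambda \Phi^T$, i.e. $\Sigma \Phi = \Phi \Lambda$
and, using $\Sigma = \frac{1}{n} X_c X_c^T$ and $\Phi = X_c \Gamma$,
$\frac{1}{n} X_c (X_c^T X_c) \Gamma = X_c \Gamma \Lambda$
i.e., multiplying on the left by $X_c^T$,
$\frac{1}{n} (X_c^T X_c)(X_c^T X_c) \Gamma = (X_c^T X_c) \Gamma \Lambda$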
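Extensions
in summary, we have
$\Sigma = \Phi \Lambda \Phi^T$, $\Phi = X_c \Gamma$, $\frac{1}{n} (X_c^T X_c)(X_c^T X_c) \Gamma = (X_c^T X_c) \Gamma \Lambda$
this means that we can obtain PCA by
1) assembling $\frac{1}{n} (X_c^T X_c)(X_c^T X_c)$
2) computing its eigen-decomposition, which gives $(\Lambda, \Gamma)$: since $X_c^T X_c \Gamma = X_c^T \Phi = n \Gamma \Lambda$, the columns of $\Gamma$ are eigenvectors of this matrix, with eigenvalues $n \lambda_i^2$
PCA
the principal components are then given by $\Phi = X_c \Gamma$, with the columns of $\Gamma$ scaled so that each $\phi_i$ has unit norm
the eigenvalues are given by $\Lambda$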
Extensions
the matrix
$K_c = X_c^T X_c$, with entries $(K_c)_{ij} = (x_i^c)^T x_j^c$
is the matrix of dot-products of the centered data-points
it is symmetric
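Extensions
hence
$\frac{1}{n} (X_c^T X_c)(X_c^T X_c) = \frac{1}{n} K_c K_c^T$
which, using $K_c = [k_1 \; \cdots \; k_n]$, where $k_i$ is the $i$-th column of $K_c$, can be written as
$\frac{1}{n} K_c K_c^T = \frac{1}{n} \sum_i k_i k_i^T$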
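Extensions
this is just the covariance of the columns of the matrix
$K_c = \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right) K \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right)$
where $K = X^T X$, i.e., the dot-product matrix for the data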
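Extensions
in summary, to get PCA
1) compute the dot-product matrix $K$ and center it, $K_c = \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right) K \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right)$
2) compute the eigen-decomposition of $\frac{1}{n} K_c K_c^T$, which gives $(\Lambda, \Gamma)$: its unit-norm eigenvectors $v_i$ have eigenvalues $n \lambda_i^2$, and $\gamma_i = v_i / \sqrt{n \lambda_i}$
PCA
the principal components are then given by $\Phi = X_c \Gamma$
the eigenvalues are given by $\Lambda$
the projection of the data-points on the principal components is given by
$X_c^T \Phi = X_c^T X_c \Gamma = K_c \Gamma$
this allows the computation of the eigenvalues and PCA coefficients when we only have access to the dot-product matrix $K$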
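A sketch of PCA computed from dot-products only, checked against ordinary covariance-based PCA. The function name `pca_from_dot_products`, the synthetic data, and the scaling conventions are assumptions: the eigenvalues of $\frac{1}{n} K_c K_c^T$ are taken to be $n \lambda_i^2$, and each eigenvector is divided by $\sqrt{n \lambda_i}$ so that $\Phi = X_c \Gamma$ has unit-norm columns.

```python
import numpy as np

def pca_from_dot_products(K, num_components):
    """PCA eigenvalues and coefficients from the n x n dot-product matrix K = X^T X."""
    n = K.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc = C @ K @ C                       # dot-products of the centered points
    mu, V = np.linalg.eigh(Kc @ Kc.T / n)
    order = np.argsort(mu)[::-1][:num_components]
    lam = np.sqrt(mu[order] / n)            # eigenvalues of the covariance
    Gamma = V[:, order] / np.sqrt(n * lam)  # rescale so X_c Gamma has unit-norm columns
    coeffs = Kc @ Gamma                     # projections of the data points
    return lam, Gamma, coeffs

# Hypothetical data with different variances per dimension, one example per column.
rng = np.random.default_rng(3)
X = rng.normal(size=(4, 30)) * np.array([[3.0], [2.0], [1.0], [0.5]])
K = X.T @ X  # the function above only ever sees K

lam, Gamma, coeffs = pca_from_dot_products(K, num_components=2)

# Reference: PCA from the covariance matrix, which needs X itself.
Xc = X - X.mean(axis=1, keepdims=True)
Sigma = Xc @ Xc.T / X.shape[1]
ref_vals, ref_vecs = np.linalg.eigh(Sigma)
ref_vals, ref_vecs = ref_vals[::-1][:2], ref_vecs[:, ::-1][:, :2]
Phi = Xc @ Gamma
print(lam, ref_vals)
```

Decomposing $\frac{1}{n} K_c$ directly would give the same eigenvectors with eigenvalues $\lambda_i$ and avoids the square root. The version above keeps the product form used in the derivation.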