Principal Component Analysis
Nuno Vasconcelos (Ken Kreutz-Delgado), UCSD
Curse of dimensionality
Typical observation in Bayes decision theory: error increases when the number of features is large.
Even for simple models (e.g. Gaussian) we need a large number of examples to have good estimates.
Q: what does "large" mean? This depends on the dimension of the space.
The best way to see this is to think of a histogram: suppose you have 100 points and you need at least 10 bins per axis in order to get a reasonable quantization. For uniform data you get, on average,

dimension     1     2     3
points/bin    10    1     0.1

which is decent in 1D, bad in 2D, terrible in 3D (9 out of each 10 bins are empty!)
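The table follows from dividing 100 points by $10^d$ bins. The sketch below computes that average and also measures, by simulation, how many bins uniform data actually leaves empty. It assumes NumPy; the function names are hypothetical.

```python
import numpy as np

def points_per_bin(n_points, bins_per_axis, dim):
    """Average number of points per histogram cell for uniformly spread data."""
    return n_points / bins_per_axis ** dim

def empty_fraction(n_points, bins_per_axis, dim, seed=0):
    """Fraction of cells left empty by n_points uniform samples in [0, 1)^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(n_points, dim))
    # Integer cell index along each axis, in 0 .. bins_per_axis - 1
    idx = np.minimum((x * bins_per_axis).astype(int), bins_per_axis - 1)
    occupied = {tuple(row) for row in idx}   # distinct non-empty cells
    return 1.0 - len(occupied) / bins_per_axis ** dim

for d in (1, 2, 3):
    print(d, points_per_bin(100, 10, d), round(empty_fraction(100, 10, d), 3))
```

In 3D at most 100 of the 1000 cells can be occupied, so at least 90% are always empty, matching the remark above.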
Curse of Dimensionality
This is the curse of dimensionality: for a given classifier, the number of examples required to maintain classification accuracy increases exponentially with the dimension of the feature space.
In higher dimensions the classifier has more parameters. Therefore: higher complexity and harder to learn.
Dimensionality Reduction
What do we do about this? Avoid unnecessary dimensions.
Unnecessary features arise in two ways:
1. features are not discriminant
2. features are not independent (are highly correlated)
Non-discriminant means that they do not separate the classes well.
[Figure: a discriminant feature vs. a non-discriminant feature]
Dimensionality Reduction
Q: How do we detect the presence of feature correlations?
A: The data lives in a low dimensional subspace (up to some amount of noise). E.g.
[Figure: scatter plot of salary vs. car loan; projecting onto a 1D subspace gives a new feature $y = a^T x$]
In the example above we have a 3D hyper-plane in 5D. If we can find this hyper-plane we can:
project the data onto it;
get rid of two dimensions without introducing significant error.
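To make the "3D hyper-plane in 5D" claim concrete, here is a minimal sketch that assumes the hyper-plane is already known (finding it is exactly what PCA does). It uses NumPy and hypothetical synthetic data: 5D points lying close to a random 3D subspace. Projecting onto the subspace keeps 3 coordinates per point, and mapping back into 5D loses only the small off-plane noise.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: n points in 5D that lie near a 3D subspace
n, D, k = 500, 5, 3
B, _ = np.linalg.qr(rng.normal(size=(D, k)))    # orthonormal basis (5 x 3) of the hyper-plane
Z = 3.0 * rng.normal(size=(k, n))               # coordinates inside the hyper-plane
X = B @ Z + 0.05 * rng.normal(size=(D, n))      # plus small noise in all 5 directions

Y = B.T @ X        # project: 3 numbers per point instead of 5
X_hat = B @ Y      # map the projections back into 5D

rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(Y.shape, round(rel_err, 4))
```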
Principal Components
Basic idea: if the data lives in a (lower dimensional) subspace, it is going to look very flat when viewed from the full space, e.g. a 1D subspace in 2D, or a 2D subspace in 3D.
This means that if we fit a Gaussian to the data, the iso-probability contours are going to be highly skewed ellipsoids.
The directions that explain most of the variance in the fitted data give the Principal Components of the data.
Principal Components
How do we find these ellipsoids?
When we talked about metrics we said that the Mahalanobis distance
$$d^2(x, \mu) = (x - \mu)^T \Sigma^{-1} (x - \mu)$$
measures the natural units for the problem because it is adapted to the covariance of the data.
We also know that what is special about it is that it uses $\Sigma^{-1}$.
Hence, information about possible subspace structure must be in the covariance matrix $\Sigma$.
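A small sketch of the Mahalanobis distance $d^2(x, \mu) = (x-\mu)^T \Sigma^{-1} (x-\mu)$, assuming NumPy and a hypothetical axis-aligned covariance. Two points at the same Euclidean distance from the mean get very different distances, because $\Sigma^{-1}$ rescales each direction by its variance.

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    z = x - mu
    # Solve Sigma w = z rather than forming Sigma^{-1} explicitly
    return float(z @ np.linalg.solve(Sigma, z))

mu = np.zeros(2)
Sigma = np.array([[4.0, 0.0],
                  [0.0, 0.25]])   # std 2 along axis 1, std 0.5 along axis 2

# Both points are at Euclidean distance 1 from the mean
print(mahalanobis_sq(np.array([1.0, 0.0]), mu, Sigma))  # 0.25
print(mahalanobis_sq(np.array([0.0, 1.0]), mu, Sigma))  # 4.0
```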
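Multivariate Gaussian Review
The equiprobability contours (level sets) of a Gaussian are the points such that
$$(x - \mu)^T \Sigma^{-1} (x - \mu) = K$$
for some constant $K$. Let's consider the change of variable $z = x - \mu$, which only moves the origin by $\mu$. The equation
$$z^T \Sigma^{-1} z = K$$
is the equation of an ellipse (a hyperellipse). This is easy to see when $\Sigma$ is diagonal, $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$:
$$\sum_{i=1}^{d} \frac{z_i^2}{\sigma_i^2} = K$$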
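Gaussian Review
This is the equation of an ellipse with principal lengths $\sigma_i$ (taking the constant on the right-hand side equal to 1). E.g. when $d = 2$,
$$\frac{z_1^2}{\sigma_1^2} + \frac{z_2^2}{\sigma_2^2} = 1$$
is the ellipse
[Figure: axis-aligned ellipse in the $(z_1, z_2)$ plane with semi-axes $\sigma_1$ and $\sigma_2$]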
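Gaussian Review
Introduce a transformation $y = \Phi z$. Then $y$ has covariance
$$\Sigma_y = \Phi \Sigma_z \Phi^T$$
If $\Phi$ is proper orthogonal, this is just a rotation and we have
[Figure: the axis-aligned ellipse in $(z_1, z_2)$ with semi-axes $\sigma_1, \sigma_2$ is mapped by $y = \Phi z$ to a rotated ellipse in $(y_1, y_2)$ with axes along $\phi_1, \phi_2$]
We obtain a rotated ellipse with principal components $\phi_1$ and $\phi_2$, which are the columns of $\Phi$. Note that $\Sigma_y = \Phi \Sigma_z \Phi^T$, with $\Sigma_z$ diagonal, is the eigendecomposition of $\Sigma_y$.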
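A numerical check of $\Sigma_y = \Phi \Sigma_z \Phi^T$, sketched with NumPy under hypothetical choices of $\sigma_1, \sigma_2$ and rotation angle: the eigendecomposition of $\Sigma_y$ gives back the principal lengths and, up to sign, the columns of the rotation $\Phi$.

```python
import numpy as np

sigma = np.array([3.0, 0.5])                 # principal lengths sigma_1, sigma_2
Sigma_z = np.diag(sigma ** 2)                # axis-aligned covariance of z

theta = np.pi / 6                            # rotation angle (arbitrary choice)
Phi = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])   # proper orthogonal: det = +1

Sigma_y = Phi @ Sigma_z @ Phi.T              # covariance of y = Phi z

# eigh returns eigenvalues in ascending order; flip to descending
lam, V = np.linalg.eigh(Sigma_y)
lam, V = lam[::-1], V[:, ::-1]
print(np.sqrt(lam))                          # principal lengths sigma_1, sigma_2
print(np.abs(np.sum(V * Phi, axis=0)))       # |<v_i, phi_i>| for each column i
```

The absolute value is needed because eigenvectors are only defined up to sign.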
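Principal Component Analysis (PCA)
If $y$ is Gaussian with covariance $\Sigma$, the equiprobability contours are the ellipses whose
Principal Components $\phi_i$ are the eigenvectors of $\Sigma$;
Principal Values (lengths) $\sigma_i$ are the square roots of the eigenvalues $\lambda_i$ of $\Sigma$.
[Figure: rotated ellipse in $(y_1, y_2)$ with axes $\phi_1, \phi_2$ and lengths $\sigma_1, \sigma_2$]
By computing the eigenvalues we know if the data is flat:
$\sigma_1 \gg \sigma_2$: flat
$\sigma_1 = \sigma_2$: not flat
[Figure: an elongated ellipse (flat) and a circle (not flat) in the $(y_1, y_2)$ plane]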
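The flatness test can be sketched from samples: estimate the covariance, take the square roots of its eigenvalues, and compare them. This assumes NumPy and two hypothetical synthetic datasets, one stretched along an axis and one isotropic; `principal_values` is an illustrative name.

```python
import numpy as np

def principal_values(Y):
    """Principal values sqrt(lambda_i) of the sample covariance, largest first.
    Y holds one sample per column (d x n)."""
    Sigma = np.cov(Y, bias=True)             # d x d sample covariance (divides by n)
    lam = np.linalg.eigvalsh(Sigma)[::-1]    # eigenvalues, descending
    return np.sqrt(np.clip(lam, 0.0, None))

rng = np.random.default_rng(1)
n = 2000
flat = np.diag([5.0, 0.2]) @ rng.normal(size=(2, n))   # sigma_1 >> sigma_2
isotropic = rng.normal(size=(2, n))                     # sigma_1 == sigma_2

for name, Y in (("flat", flat), ("not flat", isotropic)):
    s = principal_values(Y)
    print(name, s, "ratio sigma_1/sigma_2 =", s[0] / s[1])
```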
Learning-based PCA
Principal Component Analysis
How do we determine the number of eigenvectors to keep?
One possibility is to plot the eigenvalue magnitudes. This is called a Scree Plot.
Usually there is a fast decrease in the eigenvalue magnitude followed by a flat area. One good choice is the knee of this curve.
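Principal Component Analysis
Another possibility: Percentage of Explained Variance.
Remember that eigenvalues are a measure of variance along the principal directions (eigenvectors).
[Figure: ellipse with principal directions $\phi_1, \phi_2$ and eigenvalues $\lambda_1, \lambda_2$, obtained from the axis-aligned ellipse by $y = \Phi z$]
The ratio $r_k$ measures the % of total variance contained in the top $k$ eigenvalues:
$$r_k = \frac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{d} \sigma_i^2}$$
It is a measure of the fraction of data variability along the associated eigenvectors.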
Principal Component Analysis
Given $r_k$, a natural criterion is to pick the eigenvectors that explain $p$% of the data variability.
This can be done by plotting the ratio $r_k$ as a function of $k$.
E.g. we need 3 eigenvectors to cover 70% of the variability of this dataset.
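The explained-variance rule can be sketched in a few lines with NumPy. The eigenvalues here are hypothetical, and the helper names are illustrative. The sorted eigenvalues are what a scree plot shows; `explained_variance_ratio` returns $r_1, \ldots, r_d$ and `smallest_k` returns the smallest $k$ with $r_k \ge p$, for a fraction $p$ in $(0, 1]$.

```python
import numpy as np

def explained_variance_ratio(eigvals):
    """r_k for k = 1..d: fraction of total variance in the top-k eigenvalues."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # largest first (scree order)
    return np.cumsum(lam) / lam.sum()

def smallest_k(eigvals, p):
    """Smallest k whose top-k eigenvalues explain at least a fraction p of the variance."""
    r = explained_variance_ratio(eigvals)
    # searchsorted finds the first index i with r[i] >= p; k is one more than that index
    return int(np.searchsorted(r, p) + 1)

lam = [4.0, 3.0, 2.0, 1.0]          # hypothetical eigenvalues of a sample covariance
print(explained_variance_ratio(lam), smallest_k(lam, 0.75))
```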
PCA by SVD
There is an alternative way to compute the principal components, based on the singular value decomposition.
(Condensed) Singular Value Decomposition (SVD): any full-rank $n \times m$ matrix ($n > m$) can be decomposed as
$$A = M \Pi N^T$$
$M$ is an $n \times m$ (nonsquare) column-orthogonal matrix of left singular vectors (columns of $M$): $M^T M = I_{m \times m}$.
$\Pi$ is an $m \times m$ (square) diagonal matrix containing the $m$ singular values (which are nonzero and strictly positive).
$N$ is an $m \times m$ row-orthogonal matrix of right singular vectors (columns of $N$ = rows of $N^T$): $N N^T = I_{m \times m}$.
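In NumPy the condensed SVD is `np.linalg.svd(A, full_matrices=False)`, which returns $M$, the singular values as a vector, and $N^T$. The sketch below, on a hypothetical random matrix, checks the shapes and the orthogonality relations stated above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3
A = rng.normal(size=(n, m))                 # n x m with n > m; full rank almost surely

# full_matrices=False gives the condensed SVD: M is n x m, N^T is m x m
M, pi, Nt = np.linalg.svd(A, full_matrices=False)
Pi = np.diag(pi)                            # m x m diagonal matrix of singular values

print(M.shape, Pi.shape, Nt.shape)          # (6, 3) (3, 3) (3, 3)
print(np.allclose(A, M @ Pi @ Nt))          # A = M Pi N^T
print(np.allclose(M.T @ M, np.eye(m)))      # M^T M = I_{m x m}
print(np.allclose(Nt.T @ Nt, np.eye(m)))    # N N^T = I_{m x m}
```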
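PCA by SVD
To relate this to PCA, we construct the $d \times n$ Data Matrix
$$X = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}$$
The sample mean is
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} X \mathbf{1}$$
where $\mathbf{1}$ is the $n$-vector of ones.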
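PCA by SVD
We center the data by subtracting the mean from each column of $X$. This yields the $d \times n$ Centered Data Matrix
$$X_c = \begin{bmatrix} x_1 - \mu & \cdots & x_n - \mu \end{bmatrix} = X - \mu \mathbf{1}^T = X - \frac{1}{n} X \mathbf{1}\mathbf{1}^T = X \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right)$$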
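A quick NumPy check, on hypothetical random data, that right-multiplying by the centering matrix $I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ is the same as subtracting $\mu = \frac{1}{n} X \mathbf{1}$ from every column.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5
X = rng.normal(size=(d, n))                 # d x n data matrix, one example per column

ones = np.ones((n, 1))
mu = X @ ones / n                           # sample mean (1/n) X 1, a d x 1 column
C = np.eye(n) - ones @ ones.T / n           # centering matrix I - (1/n) 1 1^T
Xc = X @ C                                  # centered data matrix

print(np.allclose(Xc, X - mu))              # same as subtracting the mean from each column
print(np.allclose(Xc @ ones, 0.0))          # every row of X_c sums to zero
```

Forming the $n \times n$ matrix is only for illustration; in practice `X - X.mean(axis=1, keepdims=True)` gives the same result without the $O(n^2)$ memory.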
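PCA by SVD
The Sample Covariance is the $d \times d$ matrix
$$\Sigma = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T = \frac{1}{n} \sum_{i=1}^{n} x_i^c (x_i^c)^T$$
where $x_i^c$ is the $i$-th column of $X_c$. This can be written as
$$\Sigma = \frac{1}{n} \begin{bmatrix} x_1^c & \cdots & x_n^c \end{bmatrix} \begin{bmatrix} (x_1^c)^T \\ \vdots \\ (x_n^c)^T \end{bmatrix} = \frac{1}{n} X_c X_c^T$$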
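The two forms of the sample covariance agree, as this NumPy sketch on hypothetical data shows. Note the $1/n$ normalization: `np.cov` divides by $n - 1$ by default, so `bias=True` is needed to match the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200
X = rng.normal(size=(d, n))
Xc = X - X.mean(axis=1, keepdims=True)      # centered data, one example per column

S_matrix = Xc @ Xc.T / n                                          # (1/n) X_c X_c^T
S_sum = sum(np.outer(Xc[:, i], Xc[:, i]) for i in range(n)) / n   # (1/n) sum_i x_i^c (x_i^c)^T
S_numpy = np.cov(X, bias=True)              # rows are variables; bias=True divides by n

print(np.allclose(S_matrix, S_sum), np.allclose(S_matrix, S_numpy))
```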
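PCA by SVD
The transposed centered data matrix $X_c^T$ is $n \times d$. Assuming it has rank $d$, it has the SVD
$$X_c^T = M \Pi N^T, \qquad M^T M = I, \qquad N^T N = I$$
This yields
$$\Sigma = \frac{1}{n} X_c X_c^T = \frac{1}{n} N \Pi M^T M \Pi N^T = \frac{1}{n} N \Pi^2 N^T$$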
PCA by SVD
Noting that $N$ is $d \times d$ and orthonormal, and $\Pi^2$ is diagonal, shows that this is just the eigenvalue decomposition of $\Sigma$. It follows that
the eigenvectors of $\Sigma$ are the columns of $N$;
the eigenvalues of $\Sigma$ are $\lambda_i = \frac{1}{n} \pi_i^2 = \sigma_i^2$.
This gives an alternative algorithm for PCA.
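PCA by SVD
Summary of the computation of PCA by SVD, given $X$ with one example per column:
1) Create the (transposed) Centered Data Matrix: $X_c^T = \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right) X^T$
2) Compute its SVD: $X_c^T = M \Pi N^T$
3) The Principal Components are the columns of $N$; the Principal Values are $\sigma_i = \sqrt{\lambda_i} = \pi_i / \sqrt{n}$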
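The three steps above can be sketched with NumPy as follows; `pca_svd` is an illustrative name and the data are hypothetical (deliberately offset from the origin so that centering matters). The last line cross-checks the result against the eigendecomposition of $\Sigma = \frac{1}{n} X_c X_c^T$, assuming $X_c^T$ has rank $d$.

```python
import numpy as np

def pca_svd(X):
    """PCA of a d x n data matrix X (one example per column) via the SVD.

    Returns (N, sigma): the columns of N are the principal components and
    sigma[i] = pi_i / sqrt(n) are the principal values, largest first.
    """
    d, n = X.shape
    ones = np.ones((n, 1))
    # 1) transposed centered data matrix (I - (1/n) 1 1^T) X^T, shape n x d
    XcT = (np.eye(n) - ones @ ones.T / n) @ X.T
    # 2) condensed SVD: XcT = M Pi N^T (singular values come out descending)
    M, pi, Nt = np.linalg.svd(XcT, full_matrices=False)
    # 3) principal components = columns of N; principal values = pi_i / sqrt(n)
    return Nt.T, pi / np.sqrt(n)

rng = np.random.default_rng(0)
X = np.diag([3.0, 1.0, 0.3]) @ rng.normal(size=(3, 500)) + 2.0   # hypothetical data
N, sigma = pca_svd(X)

# Cross-check: Sigma phi_i = lambda_i phi_i with lambda_i = sigma_i^2
Xc = X - X.mean(axis=1, keepdims=True)
Sigma = Xc @ Xc.T / X.shape[1]
print(np.allclose(Sigma @ N, N * sigma ** 2))
```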
Principal Component Analysis
Principal components are often quite informative about the structure of the data.
Example: Eigenfaces, the principal components for the space of images of faces.
The figure only shows the first 16 eigenvectors (eigenfaces). Note lighting, structure, etc.
Principal Components Analysis
PCA has been applied to virtually all learning problems.
E.g. eigenshapes for face morphing
[Figure: morphed faces]
Principal Component Analysis
Sound
[Figure panels: average sound images; eigensounds corresponding to the three highest eigenvalues]
Principal Component Analysis
Turbulence
[Figure panels: flames; eigenflames]
Principal Component Analysis
Video
[Figure panels: eigenrings; reconstruction]
Principal Component Analysis
Text: Latent Semantic Indexing
Represent each document by a word histogram.
Perform SVD on the document x word matrix.
[Figure: the terms x documents matrix factored as (terms x concepts) x (concepts x concepts) x (concepts x documents)]
Principal components as the directions of semantic concepts.
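A toy sketch of the factorization in the figure, assuming NumPy and a hypothetical term x document count matrix in which "car" and "truck" co-occur while "flower" appears in separate documents. As is common for LSI, the counts are not mean-centered here. Keeping the top $k = 2$ singular vectors yields one "concept" dominated by "flower" and one that weights "car" and "truck" equally.

```python
import numpy as np

terms = ["car", "truck", "flower"]          # hypothetical vocabulary
# Hypothetical term x document counts (rows: terms, columns: documents)
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                       # number of "concepts" kept
concepts = U[:, :k]                         # term loadings of each concept direction
doc_coords = np.diag(s[:k]) @ Vt[:k]        # documents in concept space (k x n_docs)

for j in range(k):
    print("concept", j, dict(zip(terms, np.round(concepts[:, j], 3))))
print(np.round(doc_coords, 3))
```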
Latent Semantic Analysis
Applications: document classification, information
Goal: solve two fundamental problems in language:
Synonymy: different writers use different words to describe the same idea.
Polysemy: the same word can have multiple meanings.
Reasons:
The original term-document matrix is too large for the computing resources.
The original term-document matrix is noisy: for instance, anecdotal instances of terms are to be eliminated.
The original term-document matrix is overly sparse relative to the "true" term-document matrix. E.g. it lists only words actually in each document, whereas we might be interested in all words related to each document, a much larger set due to synonymy.
Latent Semantic Analysis
After PCA some dimensions get "merged":
{(car), (truck), (flower)} --> {(1.3452 * car + 0.2828 * truck), (flower)}
This mitigates synonymy: it merges the dimensions associated with terms that have similar meanings.
It also mitigates polysemy: components of polysemous words that point in the "right" direction are added to the components of words that share this sense. Conversely, components that point in other directions tend either to simply cancel out or, at worst, to be smaller than components in the directions corresponding to the intended sense.
Extensions
Soon we will talk about kernels. It turns out that any algorithm which depends on the data through dot-products only, i.e. through the matrix of elements $x_i^T x_j$, can be kernelized.
This is usually beneficial; we will see why later.
For now we look at the question of whether PCA can be written in the inner product form mentioned above.
Recall that the data matrix is
$$X = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}$$
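Extensions
Recall the centered data matrix, covariance, and SVD:
$$X_c = X \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right), \qquad \Sigma = \frac{1}{n} X_c X_c^T, \qquad X_c^T = M \Pi N^T$$
This yields
$$X_c^T X_c = M \Pi N^T N \Pi M^T = M \Pi^2 M^T, \qquad \Phi = N = X_c M \Pi^{-1}$$
Hence, solving for the $d$ positive (nonzero) eigenvalues of the inner product matrix $X_c^T X_c$, and for their associated eigenvectors, provides an alternative to the SVD for computing the eigendecomposition of the sample covariance matrix needed to perform PCA.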
Extensions
In summary, we have
$$\Sigma = \Phi \Lambda \Phi^T, \qquad \Phi = X_c M \Pi^{-1}$$
This means that we can obtain PCA by
1) Assembling the inner-product matrix $X_c^T X_c$
2) Computing its eigendecomposition $(M, \Pi^2)$: $X_c^T X_c = M \Pi^2 M^T$, $M^T M = I$
PCA: the principal components are then given by $\Phi = X_c M \Pi^{-1}$, and the eigenvalues by $\Lambda = \frac{1}{n} \Pi^2$.
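Extensions
What is interesting here is that we only need the matrix
$$K = X_c^T X_c = \begin{bmatrix} (x_1^c)^T \\ \vdots \\ (x_n^c)^T \end{bmatrix} \begin{bmatrix} x_1^c & \cdots & x_n^c \end{bmatrix} = \begin{bmatrix} (x_1^c)^T x_1^c & \cdots & (x_1^c)^T x_n^c \\ \vdots & \ddots & \vdots \\ (x_n^c)^T x_1^c & \cdots & (x_n^c)^T x_n^c \end{bmatrix}$$
This is the matrix of dot-products of the centered data-points. Notice that you don't need the points themselves, only their dot-products (similarities).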
Extensions
In summary, to get PCA:
1) Compute the dot-product matrix $K = X_c^T X_c$
2) Compute its eigendecomposition $(M, \Pi^2)$
PCA: for the covariance matrix $\Sigma = \Phi \Lambda \Phi^T$,
the Principal Components are given by $\Phi = X_c M \Pi^{-1}$;
the Eigenvalues are given by $\Lambda = \frac{1}{n} \Pi^2$;
the Projection of the centered data-points onto the principal components is given by
$$X_c^T \Phi = X_c^T X_c M \Pi^{-1} = K M \Pi^{-1}$$
This allows the computation of the eigenvalues and PCA coefficients when we only have access to the dot-product (inner product) matrix $K$.
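The dot-product route can be sketched with NumPy as below; `pca_from_gram` is an illustrative name and the data are hypothetical. The function only sees $K$: it keeps the $d$ largest eigenpairs of $K$ (assumed strictly positive, i.e. $X_c$ has rank $d$), and returns the projections $K M \Pi^{-1}$ and eigenvalues $\Pi^2 / n$. The checks afterwards use $X_c$ directly, which would not be available in a pure dot-product setting, to confirm that $\Phi = X_c M \Pi^{-1}$ matches the covariance-based quantities.

```python
import numpy as np

def pca_from_gram(K, d):
    """PCA quantities from the n x n centered dot-product matrix K = X_c^T X_c.

    Keeps the d largest eigenpairs of K (assumed strictly positive).
    Returns M (n x d), pi (singular values), projections K M Pi^{-1} (n x d)
    and the covariance eigenvalues Lambda = Pi^2 / n.
    """
    n = K.shape[0]
    w, V = np.linalg.eigh(K)                 # eigenvalues of symmetric K, ascending
    w, M = w[::-1][:d], V[:, ::-1][:, :d]    # keep the top d, largest first
    pi = np.sqrt(w)                          # K = M Pi^2 M^T  =>  pi_i = sqrt(w_i)
    coeffs = (K @ M) / pi                    # K M Pi^{-1}: divide column i by pi_i
    lam = w / n                              # Lambda = Pi^2 / n
    return M, pi, coeffs, lam

rng = np.random.default_rng(0)
d, n = 3, 50
X = np.diag([2.0, 1.0, 0.5]) @ rng.normal(size=(d, n))   # hypothetical d x n data
Xc = X - X.mean(axis=1, keepdims=True)

K = Xc.T @ Xc                                # only dot-products of centered points
M, pi, coeffs, lam = pca_from_gram(K, d)

# Sanity checks that use X_c directly
Phi = (Xc @ M) / pi                          # Phi = X_c M Pi^{-1}
Sigma = Xc @ Xc.T / n
print(np.allclose(Sigma @ Phi, Phi * lam))   # Sigma phi_i = lambda_i phi_i
print(np.allclose(coeffs, Xc.T @ Phi))       # projections from K match X_c^T Phi
```

Replacing `Xc.T @ Xc` by any other valid matrix of similarities is what the kernel extension mentioned above builds on.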
END