Principal Component Analysis

Nuno Vasconcelos (Ken Kreutz-Delgado), UCSD

Curse of dimensionality

A typical observation in Bayes decision theory: the error increases when the number of features is large. Even for simple models (e.g. Gaussian) we need a large number of examples to have good estimates.

Q: what does "large" mean? This depends on the dimension of the space. The best way to see this is to think of a histogram: suppose you have 100 points and you need at least 10 bins per axis in order to get a reasonable quantization. For uniform data you get, on average:

dimension:   1    2    3
points/bin:  10   1    0.1

which is decent in 1D, bad in 2D, and terrible in 3D (9 out of each 10 bins are empty!)
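A minimal sketch of the arithmetic behind this table, using the slide's illustrative numbers (100 points, 10 bins per axis); these counts are the example's assumptions, not a general rule:

```python
# Average points per bin as a function of dimension, for the slide's example.
n_points = 100
bins_per_axis = 10

for d in (1, 2, 3):
    total_bins = bins_per_axis ** d          # number of bins grows exponentially with d
    print(f"dimension {d}: {n_points / total_bins:.1f} points/bin on average")
```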

Curse of Dimensionality

This is the curse of dimensionality: for a given classifier, the number of examples required to maintain classification accuracy increases exponentially with the dimension of the feature space. In higher dimensions the classifier has more parameters, and is therefore of higher complexity and harder to learn.

Dimensionality Reduction

What do we do about this? Avoid unnecessary dimensions. Unnecessary features arise in two ways:
1. features are not discriminant
2. features are not independent (are highly correlated)

Non-discriminant means that they do not separate the classes well.

[Figure: discriminant vs. non-discriminant features]

Dimensionality Reduction

Q: How do we detect the presence of feature correlations?
A: The data lives in a low-dimensional subspace (up to some amount of noise). E.g. with features salary and car loan, the points cluster along a line, and a new feature y is obtained by projection onto a 1D subspace: y = a^T x.

[Figure: scatter plot of salary vs. car loan, and its projection onto a 1D subspace y = a^T x]

Suppose, as in the example above, the data lies on a 3D hyper-plane in 5D. If we can find this hyper-plane we can:
- project the data onto it
- get rid of two dimensions without introducing significant error
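A minimal numpy sketch of this projection on made-up "salary" and "car loan" data (all numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up, highly correlated 2D data: "salary" and "car loan".
salary = rng.normal(50.0, 10.0, size=200)
car_loan = 0.3 * salary + rng.normal(0.0, 1.0, size=200)   # nearly a function of salary
X = np.column_stack([salary, car_loan])                    # shape (200, 2)

# Project onto a 1D subspace spanned by a unit vector a: y = a^T x
a = np.array([1.0, 0.3])
a = a / np.linalg.norm(a)
y = X @ a                                                   # one number per point

print("original dimension:", X.shape[1], "-> reduced dimension: 1")
```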

Principal Components

Basic idea: if the data lives in a (lower dimensional) subspace, it is going to look very flat when viewed from the full space, e.g.
- a 1D subspace in 2D
- a 2D subspace in 3D

This means that if we fit a Gaussian to the data, the iso-probability contours are going to be highly skewed ellipsoids. The directions that explain most of the variance in the fitted data give the Principal Components of the data.

Principal Components

How do we find these ellipsoids? When we talked about metrics, we said that the Mahalanobis distance measures the natural units for the problem, because it is adapted to the covariance of the data. We also know that

d^2(x, y) = (x - y)^T S^{-1} (x - y).

What is special about it is that it uses S^{-1}. Hence, information about possible subspace structure must be in the covariance matrix S.
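A minimal numpy sketch of the Mahalanobis distance on made-up 2D data (the covariance values are assumptions for illustration):

```python
import numpy as np

# Squared Mahalanobis distance d^2(x, y) = (x - y)^T S^{-1} (x - y)
# using the sample covariance S of some synthetic data.
rng = np.random.default_rng(1)
X = rng.multivariate_normal(mean=[0, 0], cov=[[4.0, 1.5], [1.5, 1.0]], size=500)

S = np.cov(X, rowvar=False)        # 2 x 2 sample covariance
S_inv = np.linalg.inv(S)

x, y = X[0], X.mean(axis=0)
d2 = (x - y) @ S_inv @ (x - y)     # squared Mahalanobis distance of X[0] to the mean
print("squared Mahalanobis distance:", d2)
```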

Multivariate Gaussian Review

The equiprobability contours (level sets) of a Gaussian are the points such that

(x - μ)^T S^{-1} (x - μ) = const.

Let's consider the change of variable z = x - μ, which only moves the origin by μ. The equation

z^T S^{-1} z = const

is the equation of an ellipse (a hyperellipse). This is easy to see when S is diagonal, S = diag(σ_1^2, ..., σ_d^2):

z_1^2 / σ_1^2 + ... + z_d^2 / σ_d^2 = const.

Gaussian Review

This is the equation of an ellipse with principal lengths σ_i. E.g. when d = 2,

z_1^2 / σ_1^2 + z_2^2 / σ_2^2 = 1

is the ellipse with axis lengths σ_1 and σ_2 along the z_1 and z_2 directions.

[Figure: ellipse in the (z_1, z_2) plane with principal lengths σ_1 and σ_2]

Gaussian Review

Introduce a transformation y = Φ z. Then y has covariance

S_y = Φ S_z Φ^T.

If Φ is proper orthogonal this is just a rotation, and we obtain a rotated ellipse with principal components φ_1 and φ_2, which are the columns of Φ.

[Figure: the ellipse in z-coordinates, with lengths σ_1 and σ_2, is rotated by y = Φ z into an ellipse whose axes lie along φ_1 and φ_2]

Note that S_y = Φ S_z Φ^T, with S_z diagonal, is the eigendecomposition of S_y.

Principal Component Analysis (PCA)

If y is Gaussian with covariance S, the equiprobability contours are the ellipses whose
- Principal Components φ_i are the eigenvectors of S
- Principal Values (lengths) σ_i are the square roots of the eigenvalues λ_i of S

[Figure: ellipse with axes φ_1, φ_2 and lengths σ_1, σ_2]

By computing the eigenvalues we know if the data is flat:
- σ_1 >> σ_2: flat
- σ_1 ≈ σ_2: not flat

[Figure: an elongated (flat) ellipse vs. a nearly circular (not flat) one]
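A minimal numpy sketch of this recipe on made-up 2D Gaussian data: eigenvectors of the sample covariance as principal components, square roots of the eigenvalues as principal values, and a crude flatness check (the 3x threshold is an arbitrary choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[0, 0], cov=[[9.0, 4.0], [4.0, 3.0]], size=1000)

S = np.cov(X, rowvar=False)
lam, Phi = np.linalg.eigh(S)             # eigenvalues (ascending) and eigenvectors
lam, Phi = lam[::-1], Phi[:, ::-1]       # sort in decreasing order

sigma = np.sqrt(lam)                     # principal values = sqrt of eigenvalues
print("principal components (columns):\n", Phi)
print("principal values:", sigma)
print("flat?", sigma[0] > 3 * sigma[1])  # crude flatness check
```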

Learning-based PCA

Learning-based PCA

Principal Component Analysis

How do we determine the number of eigenvectors to keep? One possibility is to plot the eigenvalue magnitudes. This is called a Scree Plot. Usually there is a fast decrease in the eigenvalue magnitude followed by a flat area; one good choice is the knee of this curve.
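A minimal sketch of a scree plot on synthetic data (the eigenvalue profile is invented so that the knee is visible); assumes matplotlib is available:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10)) @ np.diag([5, 4, 3, 1, 1, 0.5, 0.5, 0.2, 0.2, 0.1])

S = np.cov(X, rowvar=False)
lam = np.linalg.eigvalsh(S)[::-1]          # eigenvalues in decreasing order

plt.plot(range(1, len(lam) + 1), lam, marker="o")
plt.xlabel("component index k")
plt.ylabel("eigenvalue magnitude")
plt.title("Scree plot: look for the knee")
plt.show()
```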

Principal Component Analysis

Another possibility: Percentage of Explained Variance. Remember that eigenvalues are a measure of variance along the principal directions (eigenvectors).

[Figure: rotated ellipse with eigenvalues λ_1, λ_2 along the principal directions φ_1, φ_2]

The ratio

r_k = ( Σ_{i=1}^{k} σ_i^2 ) / ( Σ_{i=1}^{d} σ_i^2 )

measures the % of total variance contained in the top k eigenvalues, i.e. the fraction of data variability along the associated eigenvectors.
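A minimal sketch of the explained-variance ratio r_k for an assumed set of eigenvalues (the values are invented for illustration):

```python
import numpy as np

# r_k = sum_{i<=k} sigma_i^2 / sum_i sigma_i^2, with sigma_i^2 = lambda_i.
lam = np.array([5.0, 2.5, 1.0, 0.3, 0.2])      # eigenvalues of S (variances)

r = np.cumsum(lam) / np.sum(lam)               # r_k for k = 1, ..., d
for k, rk in enumerate(r, start=1):
    print(f"top {k} components explain {100 * rk:.1f}% of the variance")
```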

Principal Component Analysis

Given r_k, a natural criterion is to pick the eigenvectors that explain p% of the data variability. This can be done by plotting the ratio r_k as a function of k. E.g. we need 3 eigenvectors to cover 70% of the variability of this dataset.

PCA by SVD

There is an alternative way to compute the principal components, based on the singular value decomposition.

(Condensed) Singular Value Decomposition (SVD): any full-rank n x m matrix (n > m) can be decomposed as

A = M Π N^T,

where
- M is an n x m (non-square) column-orthogonal matrix of left singular vectors (the columns of M), with M^T M = I_{m x m}
- Π is an m x m (square) diagonal matrix containing the m singular values (which are nonzero and strictly positive)
- N is an m x m row-orthogonal matrix of right singular vectors (columns of N = rows of N^T), with N^T N = N N^T = I_{m x m}
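As a sanity check on the decomposition above, a minimal numpy sketch of the condensed SVD on a random matrix (the sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(50, 5))                     # n = 50, m = 5

M, s, Nt = np.linalg.svd(A, full_matrices=False) # M: 50x5, s: 5 singular values, Nt: 5x5
Pi = np.diag(s)

print(np.allclose(A, M @ Pi @ Nt))               # A = M Pi N^T
print(np.allclose(M.T @ M, np.eye(5)))           # M is column-orthogonal
print(np.allclose(Nt @ Nt.T, np.eye(5)))         # N is orthogonal
```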

PCA by SVD

To relate this to PCA, we construct the d x n Data Matrix

X = [x_1 ... x_n].

The sample mean is

μ = (1/n) Σ_{i=1}^{n} x_i = (1/n) (x_1 + ... + x_n) = (1/n) X 1,

where 1 is the n x 1 vector of ones.

PCA by SVD

We center the data by subtracting the mean from each column of X. This yields the d x n Centered Data Matrix

X_c = [x_1 - μ  ...  x_n - μ] = X - μ 1^T = X - (1/n) X 1 1^T = X (I - (1/n) 1 1^T).

PCA by SVD

The Sample Covariance is the d x d matrix

S = (1/n) Σ_i (x_i - μ)(x_i - μ)^T = (1/n) Σ_i x_i^c (x_i^c)^T,

where x_i^c = x_i - μ is the i-th column of X_c. This can be written compactly as

S = (1/n) X_c X_c^T.

PCA by SVD

The transposed centered data matrix X_c^T is n x d. Assuming it has rank d, it has the SVD

X_c^T = M Π N^T,    M^T M = I,    N^T N = N N^T = I.

This yields

S = (1/n) X_c X_c^T = (1/n) N Π M^T M Π N^T = (1/n) N Π^2 N^T.

PCA by SVD

Noting that N is d x d and orthonormal, and Π^2 is diagonal, shows that

S = (1/n) N Π^2 N^T

is just the eigenvalue decomposition of S. It follows that
- the eigenvectors of S are the columns of N
- the eigenvalues of S are λ_i = π_i^2 / n, where π_i is the i-th singular value

This gives an alternative algorithm for PCA.

PCA by SVD

Summary of the computation of PCA by SVD. Given X with one example per column:

1) Create the (transposed) Centered Data Matrix:  X_c^T = (I - (1/n) 1 1^T) X^T
2) Compute its SVD:  X_c^T = M Π N^T
3) The Principal Components are the columns of N; the Principal Values are σ_i = sqrt(λ_i) = π_i / sqrt(n).
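A minimal numpy sketch of these three steps on made-up data (the dimensions and covariance are assumptions for illustration), with a consistency check against the direct eigendecomposition of the sample covariance:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 3, 200
X = rng.multivariate_normal([0, 0, 0], [[6, 2, 0], [2, 2, 0], [0, 0, 0.1]], size=n).T

# 1) transposed centered data matrix: Xc^T = (I - (1/n) 1 1^T) X^T
ones = np.ones((n, 1))
XcT = (np.eye(n) - ones @ ones.T / n) @ X.T            # n x d

# 2) condensed SVD: Xc^T = M Pi N^T
M, pi, Nt = np.linalg.svd(XcT, full_matrices=False)

# 3) principal components = columns of N; eigenvalues of S are pi^2 / n
N = Nt.T
lam = pi ** 2 / n

# sanity check: same eigenvalues as the sample covariance S = (1/n) Xc Xc^T
S = np.cov(X, rowvar=True, bias=True)
print(np.allclose(np.sort(np.linalg.eigvalsh(S)), np.sort(lam)))
```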

Principal Component Analysis

Principal components are often quite informative about the structure of the data.

Example: Eigenfaces, the principal components for the space of images of faces. The figure only shows the first 16 eigenvectors (eigenfaces). Note how they capture lighting, facial structure, etc.

[Figure: the first 16 eigenfaces]

Principal Components Analysis

PCA has been applied to virtually all learning problems. E.g. eigenshapes for face morphing.

[Figure: morphed faces]

Principal Component Analysis

Sound: the average sound images and the eigensounds corresponding to the three highest eigenvalues.

[Figure: average sound images and the top three eigensounds]

Principal Component Analysis

Turbulence: flames and their eigenflames.

[Figure: flames and eigenflames]

Principal Component Analysis

Video: eigenrings and reconstruction.

[Figure: video frames, eigenrings, and their reconstruction]

Principal Component Analysis

Text: Latent Semantic Indexing
- Represent each document by a word histogram.
- Perform SVD on the documents x words matrix.
- The principal components are interpreted as the directions of semantic concepts.

[Figure: the documents x terms matrix factored into (documents x concepts) x (concepts x concepts) x (concepts x terms)]
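A minimal sketch of the idea on a made-up 4-document, 4-word count matrix (the words and counts are invented for illustration):

```python
import numpy as np

#              word:  car  truck  flower  petal
counts = np.array([[2,   1,     0,      0],     # doc about vehicles
                   [1,   2,     0,      0],     # doc about vehicles
                   [0,   0,     3,      2],     # doc about plants
                   [0,   0,     1,      2]])    # doc about plants

U, s, Vt = np.linalg.svd(counts.astype(float), full_matrices=False)

k = 2                                            # keep the top-k semantic concepts
doc_coords = U[:, :k] * s[:k]                    # documents in concept space
term_coords = Vt[:k, :].T                        # words in concept space
print("document coordinates in concept space:\n", np.round(doc_coords, 2))
```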

Latent Semantic Analysis

Applications: document classification, information retrieval.

Goal: solve two fundamental problems in language:
- Synonymy: different writers use different words to describe the same idea.
- Polysemy: the same word can have multiple meanings.

Reasons for working in the reduced space:
- The original term-document matrix is too large for the computing resources.
- The original term-document matrix is noisy: for instance, anecdotal instances of terms are to be eliminated.
- The original term-document matrix is overly sparse relative to the "true" term-document matrix. E.g. it lists only the words actually in each document, whereas we might be interested in all words related to each document, a much larger set due to synonymy.

Latent Semantic Analysis

After PCA some dimensions get "merged":

{(car), (truck), (flower)} --> {(1.3452 * car + 0.2828 * truck), (flower)}

This mitigates synonymy:
- it merges the dimensions associated with terms that have similar meanings.

And it mitigates polysemy:
- components of polysemous words that point in the "right" direction are added to the components of words that share this sense.
- Conversely, components that point in other directions tend to either simply cancel out or, at worst, to be smaller than the components in the directions corresponding to the intended sense.

Extensions

Soon we will talk about kernels. It turns out that any algorithm which depends on the data through dot-products only, i.e. through the matrix of elements x_i^T x_j, can be kernelized. This is usually beneficial; we will see why later.

For now we look at the question of whether PCA can be written in the inner-product form mentioned above. Recall that the data matrix is

X = [x_1 ... x_n].

Extensions

Recall the centered data matrix, covariance, and SVD:

X_c = X (I - (1/n) 1 1^T),    S = (1/n) X_c X_c^T,    X_c^T = M Π N^T.

This yields

X_c^T X_c = M Π N^T N Π M^T = M Π^2 M^T,    Φ = N = X_c M Π^{-1}.

Hence, solving for the d positive (nonzero) eigenvalues of the inner-product matrix X_c^T X_c, and for their associated eigenvectors, provides an alternative way to compute the eigendecomposition of the sample covariance matrix needed to perform PCA.

Extensions

In summary, we have

S = (1/n) Φ Π^2 Φ^T,    Φ = X_c M Π^{-1}.

This means that we can obtain PCA by:
1) Assembling the inner-product matrix K = X_c^T X_c
2) Computing its eigendecomposition (M, Π^2):  X_c^T X_c = M Π^2 M^T
3) PCA:
   - the principal components are then given by Φ = X_c M Π^{-1}
   - the eigenvalues are given by Λ = (1/n) Π^2

Extensions

What is interesting here is that we only need the matrix

K = X_c^T X_c,    K_{ij} = (x_i - μ)^T (x_j - μ),

the inner-product matrix of dot-products of the centered data points. Notice that you don't need the points themselves, only their dot-products (similarities).

Extensions

In summary, to get PCA:
1) Compute the dot-product matrix K = X_c^T X_c
2) Compute its eigendecomposition (M, Π^2)
3) PCA: for the covariance matrix S = Φ Λ Φ^T,
   - the Principal Components are given by Φ = X_c M Π^{-1}
   - the Eigenvalues are given by Λ = (1/n) Π^2
   - the projection of the centered data points onto the principal components is given by

     X_c^T Φ = X_c^T X_c M Π^{-1} = K M Π^{-1}

This allows the computation of the eigenvalues and PCA coefficients when we only have access to the dot-product (inner-product) matrix K.
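A minimal numpy sketch of this dual form on made-up data: once the data has been centered, everything below uses only the dot-product matrix K (plus X_c itself to map back to the principal directions):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 3, 100
X = rng.multivariate_normal([0, 0, 0], [[4, 1, 0], [1, 2, 0], [0, 0, 0.05]], size=n).T

Xc = X - X.mean(axis=1, keepdims=True)      # centered data (d x n)
K = Xc.T @ Xc                               # 1) n x n dot-product matrix

w, M = np.linalg.eigh(K)                    # 2) eigendecomposition of K
idx = np.argsort(w)[::-1][:d]               #    keep the d positive eigenvalues
w, M = w[idx], M[:, idx]
Pi = np.sqrt(w)                             #    singular values: Pi^2 = w

Phi = Xc @ M / Pi                           # 3) principal components Phi = Xc M Pi^{-1}
lam = w / n                                 #    eigenvalues Lambda = (1/n) Pi^2
coeffs = K @ M / Pi                         #    projections K M Pi^{-1} (= Xc^T Phi)

print("eigenvalues:", np.round(lam, 3))
print(np.allclose(coeffs, Xc.T @ Phi))      # consistency check
```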

END