Principal Component Analysis. Nuno Vasconcelos ECE Department, UCSD


Curse of dimensionality
typical observation in Bayes decision theory: error increases when the number of features is large
problem: even for simple models (e.g. Gaussian) we need a large number of examples to have good estimates
Q: what does "large" mean? This depends on the dimension of the space
the best way to see this is to think of a histogram
suppose you have 100 points and you need at least 10 bins per axis in order to get a reasonable quantization
for uniform data you get, on average,

dimension    1     2     3
points/bin   10    1     0.1

decent in 1D, bad in 2D, terrible in 3D (9 out of each 10 bins empty)
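The histogram argument can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name `points_per_bin` is made up for this example, and it assumes uniformly spread data so that the average occupancy is simply points divided by bins.

```python
def points_per_bin(n_points: int, bins_per_axis: int, dim: int) -> float:
    """Average number of points per histogram bin for uniformly spread data."""
    # The number of bins grows exponentially with the dimension.
    n_bins = bins_per_axis ** dim
    return n_points / n_bins


# 100 points, 10 bins per axis, as in the slide.
for dim in (1, 2, 3):
    print(dim, points_per_bin(100, 10, dim))
```

Keeping 10 points per bin in 3D would require 10,000 points instead of 100, which is the exponential growth the next slide refers to.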

Curse of dimensionality
this is the curse of dimensionality: for a given classifier, the number of examples required to maintain classification accuracy increases exponentially with the dimension of the space
in higher dimensions the classifier has more parameters: higher complexity, harder to learn

Dimensionality reduction
what do we do about this? we avoid unnecessary dimensions
"unnecessary" can be measured in two ways:
  1. features are not discriminant
  2. features are not independent
non-discriminant means that they do not separate the classes well
[Figure: discriminant vs. non-discriminant features]

Dimensionality reduction
Q: how do we detect the presence of feature correlations?
A: the data "lives" in a low dimensional subspace (up to some amount of noise). E.g.
[Figure: scatter plot of salary vs. car loan; a new feature $y$ is obtained by projection onto a 1D subspace, $y = a^T x$]
in the example above we have a 3D hyper-plane in 5D
if we can find this hyper-plane we can project the data onto it
get rid of half of the dimensions without introducing significant error

Principal component analysis
basic idea: if the data lives in a subspace, it is going to look very flat when viewed from the full space, e.g.
[Figure: 1D subspace in 2D; 2D subspace in 3D]
this means that if we fit a Gaussian to the data the iso-probability contours are going to be highly skewed ellipsoids

Principal component analysis
how do we find these ellipsoids?
when we talked about metrics we said that the Mahalanobis distance
$$d(x, y) = (x - y)^T \Sigma^{-1} (x - y)$$
measures the natural units for the problem because it is adapted to the covariance of the data
we also know that what is special about it is that it uses $\Sigma^{-1}$
hence, the information must be in $\Sigma$
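As a small illustration of how the distance adapts to the covariance, the sketch below evaluates $d(x, y) = (x - y)^T \Sigma^{-1} (x - y)$ with NumPy. The function name `mahalanobis` and the example covariance are assumptions made for this example; they do not come from the slides.

```python
import numpy as np


def mahalanobis(x, y, cov):
    """d(x, y) = (x - y)^T cov^{-1} (x - y), the quadratic form used in the slide."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ v = diff rather than forming the explicit inverse.
    return float(diff @ np.linalg.solve(cov, diff))


# Hypothetical covariance: variance 4 along the first axis, 1 along the second.
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])
origin = [0.0, 0.0]
d_long = mahalanobis([2.0, 0.0], origin, cov)   # step along the high-variance axis
d_short = mahalanobis([0.0, 2.0], origin, cov)  # same step along the low-variance axis
print(d_long, d_short)
```

The same Euclidean step counts as a smaller distance along the direction in which the data varies more, which is why the shape information must be contained in $\Sigma$.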

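Gaussian review
the equiprobability contours of a Gaussian are the points such that
$$(x - \mu)^T \Sigma^{-1} (x - \mu) = c$$
for some constant $c$
let's consider the change of variable $z = x - \mu$, which only moves the origin by $\mu$. The equation
$$z^T \Sigma^{-1} z = c$$
is the equation of an ellipse.
this is easy to see when $\Sigma$ is diagonal, $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$:
$$z^T \Sigma^{-1} z = \sum_{i=1}^{d} \frac{z_i^2}{\sigma_i^2} = c$$

Gaussian review
this is the equation of an ellipse with principal lengths proportional to $\sigma_i$ (equal to $\sigma_i$ when $c = 1$)
e.g. when $d = 2$ and $c = 1$
$$\frac{z_1^2}{\sigma_1^2} + \frac{z_2^2}{\sigma_2^2} = 1$$
is the ellipse
[Figure: axis-aligned ellipse in the $(z_1, z_2)$ plane with semi-axes $\sigma_1$ and $\sigma_2$]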

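Gaussian review
introduce the transformation $y = \Phi z$
then $y$ has covariance
$$\Sigma_y = \Phi \Sigma_z \Phi^T = \Phi \Lambda \Phi^T$$
where $\Lambda = \Sigma_z$ is the diagonal covariance of $z$
if $\Phi$ is orthonormal this is just a rotation and we have
[Figure: the axis-aligned ellipse in $(z_1, z_2)$, with principal lengths $\sigma_1, \sigma_2$, is mapped by $y = \Phi z$ to a rotated ellipse in $(y_1, y_2)$ with principal directions $\phi_1, \phi_2$ and principal lengths $\lambda_1, \lambda_2$]
we obtain a rotated ellipse with principal components $\phi_1$ and $\phi_2$, which are the columns of $\Phi$
note that $\Sigma_y = \Phi \Lambda \Phi^T$ is the eigen-decomposition of $\Sigma_y$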

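Principal component analysis
If $y$ is Gaussian with covariance $\Sigma$, the equiprobability contours are the ellipses whose
  principal components $\phi_i$ are the eigenvectors of $\Sigma$
  principal lengths $\lambda_i$ are the eigenvalues of $\Sigma$
[Figure: rotated ellipse in $(y_1, y_2)$ with principal directions $\phi_1, \phi_2$ and principal lengths $\lambda_1, \lambda_2$]
by computing the eigenvalues we know if the data is flat
  $\lambda_1 \gg \lambda_2$: flat
  $\lambda_1 = \lambda_2$: not flat
[Figure: an elongated ellipse ($\lambda_1 \gg \lambda_2$) next to a circle ($\lambda_1 = \lambda_2$)]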
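The flatness test can be tried numerically. The following sketch assumes a rotation $y = \Phi z$ of axis-aligned Gaussian samples and recovers the principal directions from the sample covariance; the sample size, standard deviations, rotation angle, and variable names are all made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Axis-aligned Gaussian samples z (one per column): std 3 along z1, 0.3 along z2.
z = rng.normal(size=(2, 5000)) * np.array([[3.0], [0.3]])

# Orthonormal Phi: a rotation by 30 degrees.
theta = np.pi / 6
Phi = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
y = Phi @ z

# Sample covariance (np.cov treats rows as variables, columns as samples).
Sigma_y = np.cov(y)

# eigh returns eigenvalues in ascending order; reorder to descending.
eigvals, eigvecs = np.linalg.eigh(Sigma_y)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

flatness = eigvals[0] / eigvals[1]
print("eigenvalues:", eigvals)
print("leading eigenvector:", eigvecs[:, 0], "vs first column of Phi:", Phi[:, 0])
print("flatness ratio:", flatness)
```

The leading eigenvector matches the first column of $\Phi$ up to sign, and a large eigenvalue ratio signals data that is close to a 1D subspace.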

Principal component analysis (learning)


Principal component analysis
how do I determine the number of eigenvectors to keep?
one possibility is to plot eigenvalue magnitudes
this is a scree plot
usually there is a fast decrease in the eigenvalue magnitude followed by a flat area
one good choice is the knee of this curve

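Principal component analysis
another possibility is the percentage of explained variance
remember that eigenvalues are a measure of variance
[Figure: rotated ellipse in $(y_1, y_2)$ with principal directions $\phi_1, \phi_2$ and principal lengths $\lambda_1, \lambda_2$]
ratio $r_k$ measures % of total variance contained in the top $k$ eigenvalues
$$r_k = \frac{\sum_{i=1}^{k} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2}$$
measure of the fraction of data variability along the associated eigenvectors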

Principal component analysis
a natural measure is to pick the eigenvectors that explain $p$% of the data variability
can be done by plotting the ratio $r_k$ as a function of $k$
$$r_k = \frac{\sum_{i=1}^{k} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2}$$
e.g. we need 3 eigenvectors to cover 70% of the variability of this dataset
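Choosing $k$ from the ratio $r_k$ amounts to a cumulative sum. The sketch below is a hedged illustration: the helper names are hypothetical, the numbers are made up, and the input list is assumed to hold the per-component variance terms that enter the sums (for instance $\lambda_i^2$ under the formula above), sorted in decreasing order.

```python
from itertools import accumulate


def explained_ratio(variances):
    """Return [r_1, ..., r_n]: cumulative fraction of the total, for decreasing inputs."""
    total = sum(variances)
    return [partial / total for partial in accumulate(variances)]


def components_needed(variances, p):
    """Smallest k such that r_k >= p, with p given as a fraction (e.g. 0.7)."""
    for k, r_k in enumerate(explained_ratio(variances), start=1):
        if r_k >= p:
            return k
    return len(variances)


# Made-up variance terms, sorted in decreasing order.
variances = [5.0, 3.0, 1.0, 0.5, 0.5]
r = explained_ratio(variances)
k = components_needed(variances, 0.7)
print(r, k)
```

Plotting `r` against the component index gives the curve described in the slide; `components_needed` reads off where it first crosses the chosen threshold.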

Principal component analysis
there is an alternative manner to compute the principal components, based on singular value decomposition
SVD: any real $n \times m$ matrix ($n > m$) can be decomposed as
$$A = M \Pi N^T$$
where
  $M$ is an $n \times m$ column-orthonormal matrix of left singular vectors (columns of $M$)
  $\Pi$ is an $m \times m$ diagonal matrix of singular values
  $N^T$ is an $m \times m$ row-orthonormal matrix of right singular vectors (columns of $N$)
$$M^T M = I, \qquad N^T N = I$$
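The decomposition and its orthonormality properties can be verified numerically. This is a minimal sketch using `numpy.linalg.svd` on a random matrix; the matrix sizes and variable names are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))  # n = 6 rows, m = 3 columns, n > m

# full_matrices=False gives the "thin" SVD: M is n x m, Nt (= N^T) is m x m.
M, pi, Nt = np.linalg.svd(A, full_matrices=False)
Pi = np.diag(pi)  # singular values on the diagonal

reconstruction_ok = np.allclose(A, M @ Pi @ Nt)   # A = M Pi N^T
M_orthonormal = np.allclose(M.T @ M, np.eye(3))   # M^T M = I
N_orthonormal = np.allclose(Nt @ Nt.T, np.eye(3)) # N^T N = I
print(reconstruction_ok, M_orthonormal, N_orthonormal)
```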

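PCA by SVD
to relate this to PCA, we consider the data matrix, with one example per column,
$$X = [x_1 \; \cdots \; x_n]$$
the sample mean is
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} X \mathbf{1}$$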

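PCA by SVD
and we can center the data by subtracting the mean from each column of $X$
this is the centered data matrix
$$X_c = [x_1 \; \cdots \; x_n] - [\mu \; \cdots \; \mu] = X - \mu \mathbf{1}^T = X - \frac{1}{n} X \mathbf{1}\mathbf{1}^T = X \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right)$$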

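PCA by SVD
the sample covariance is
$$\Sigma = \frac{1}{n} \sum_i (x_i - \mu)(x_i - \mu)^T = \frac{1}{n} \sum_i x_i^c (x_i^c)^T$$
where $x_i^c$ is the $i$-th column of the centered data matrix $X_c$
this can be written as
$$\Sigma = \frac{1}{n} X_c X_c^T$$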

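PCA by SVD
the matrix $X_c^T$ (the transposed centered data matrix) is real $n \times d$. Assuming $n > d$ it has SVD decomposition
$$X_c^T = M \Pi N^T, \qquad M^T M = I, \quad N^T N = I$$
and
$$\Sigma = \frac{1}{n} X_c X_c^T = \frac{1}{n} N \Pi M^T M \Pi N^T = \frac{1}{n} N \Pi^2 N^T$$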

PCA by SVD
$$\Sigma = N \left(\frac{1}{n} \Pi^2\right) N^T$$
noting that $N$ is $d \times d$ and orthonormal, and $\Pi^2$ diagonal, shows that this is just the eigenvalue decomposition of $\Sigma$
it follows that
  the eigenvectors of $\Sigma$ are the columns of $N$
  the eigenvalues of $\Sigma$ are $\lambda_i = \frac{1}{n} \pi_i^2$
this gives an alternative algorithm for PCA

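PCA by SVD
computation of PCA by SVD
given $X$ with one example per column
  1) create the centered data-matrix $X_c^T = \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right) X^T$
  2) compute its SVD $X_c^T = M \Pi N^T$
  3) principal components are columns of $N$, eigenvalues are $\lambda_i = \frac{1}{n} \pi_i^2$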
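The three steps translate almost line by line into NumPy. The sketch below is one possible rendering under the slide's conventions (one example per column, covariance normalized by $n$); the function name `pca_svd` and the synthetic data are assumptions for the illustration.

```python
import numpy as np


def pca_svd(X):
    """PCA of a d x n data matrix X (one example per column) via SVD.

    Returns (N, lam): principal components as columns of N, and the
    eigenvalues lam_i = pi_i^2 / n of the sample covariance.
    """
    n = X.shape[1]
    # 1) center the data: subtract the sample mean from every column
    Xc = X - X.mean(axis=1, keepdims=True)
    # 2) thin SVD of the n x d matrix Xc^T = M Pi N^T
    M, pi, Nt = np.linalg.svd(Xc.T, full_matrices=False)
    # 3) principal components are the columns of N
    return Nt.T, pi ** 2 / n


# Synthetic data: 3 features with very different spreads, 200 examples.
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 200)) * np.array([[5.0], [2.0], [0.5]])
N, lam = pca_svd(X)

# Cross-check against the sample covariance (1/n) Xc Xc^T.
Xc = X - X.mean(axis=1, keepdims=True)
Sigma = Xc @ Xc.T / X.shape[1]
print(np.allclose(Sigma @ N, N * lam))
```

Working on the centered data matrix directly avoids forming $\Sigma$ explicitly, which is the practical appeal of the SVD route.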

Principal component analysis
principal components are usually quite informative about the structure of the data
example
  the principal components for the space of images of faces
  the figure only shows the first 6 eigenvectors
  note lighting, structure, etc.
[Figure: leading eigenvectors of a set of face images]

Principal component analysis
PCA has been applied to virtually all learning problems
e.g. eigenshapes for face morphing
[Figure: morphed faces]

Principal component analysis
sound
[Figure: average sound; images; eigenobjects corresponding to the three highest eigenvalues]

Principal component analysis
turbulence
[Figure: flames; eigenflames]

Principal component analysis
video
[Figure: eigenrings; reconstruction]

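Principal component analysis
text: latent semantic indexing
represent each document by a word histogram
perform SVD on the document $\times$ word matrix
[Figure: the documents $\times$ terms matrix factored into a documents $\times$ concepts matrix, a concepts $\times$ concepts matrix, and a concepts $\times$ terms matrix]
principal components as the directions of semantic concepts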

Latent semantic analysis
applications: document classification, information retrieval
goal: solve two fundamental problems in language
  synonymy: different writers use different words to describe the same idea
  polysemy: the same word can have multiple meanings
reasons:
  original term-document matrix is too large for the computing resources
  original term-document matrix is noisy: for instance, anecdotal instances of terms are to be eliminated
  original term-document matrix is overly sparse relative to the "true" term-document matrix. E.g. it lists only words actually in each document, whereas we might be interested in all words related to each document, a much larger set due to synonymy

Latent semantic analysis
after PCA some dimensions get "merged":
  {(car), (truck), (flower)} --> {(1.3452 * car + 0.2828 * truck), (flower)}
this mitigates synonymy: it merges the dimensions associated with terms that have similar meanings
and mitigates polysemy: components of polysemous words that point in the "right" direction are added to the components of words that share this sense
conversely, components that point in other directions tend to either simply cancel out, or, at worst, to be smaller than components in the directions corresponding to the intended sense
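The merging of dimensions can be reproduced on a toy document $\times$ word matrix. The counts below are invented for the illustration and are not the ones behind the numbers in the slide; the sketch only shows that a rank-2 SVD puts "car" and "truck" into one concept and leaves "flower" on its own.

```python
import numpy as np

terms = ["car", "truck", "flower"]

# Made-up word counts: one row per document, one column per term.
A = np.array([[2.0, 1.0, 0.0],   # document about cars and trucks
              [1.0, 2.0, 0.0],   # document about trucks and cars
              [0.0, 0.0, 3.0],   # document about flowers
              [1.0, 1.0, 0.0]])  # short document mentioning both vehicles

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                        # number of concepts to keep
concepts = Vt[:k]            # each row is a direction in term space
doc_coords = U[:, :k] * s[:k]  # documents expressed in concept coordinates

for row in concepts:
    print(" + ".join(f"{w:.3f} * {t}" for w, t in zip(row, terms)))
```

Documents that use "car" and documents that use "truck" end up with similar coordinates along the first concept, which is how the reduced representation mitigates synonymy.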

Extensions
in a few lectures we will talk about kernels
turns out that any algorithm which depends on the data through dot-products only, i.e. the matrix of elements $x_i^T x_j$, can be kernelized
this is usually beneficial, we will see why later
for now we look at the question of whether PCA can be written in the form above
recall the data matrix is
$$X = [x_1 \; \cdots \; x_n]$$

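Extensions
the centered-data matrix is
$$X_c = X \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right)$$
and the covariance is
$$\Sigma = \frac{1}{n} X_c X_c^T$$
the eigenvector $\phi_i$ of eigenvalue $\lambda_i$ is
$$\frac{1}{n} X_c X_c^T \phi_i = \lambda_i \phi_i \quad \Rightarrow \quad \phi_i = X_c \alpha_i, \qquad \alpha_i = \frac{1}{n \lambda_i} X_c^T \phi_i$$
hence, the eigenvector matrix is
$$\Phi = X_c \Gamma, \qquad \Gamma = [\alpha_1 \; \cdots \; \alpha_d]$$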

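Extensions
we next note that, from the eigenvector decomposition
$$\Sigma \Phi = \Phi \Lambda, \quad \text{i.e.} \quad \Sigma = \Phi \Lambda \Phi^T$$
and $\Sigma = \frac{1}{n} X_c X_c^T$, $\Phi = X_c \Gamma$, we have
$$\frac{1}{n} X_c X_c^T X_c \Gamma = X_c \Gamma \Lambda$$
multiplying on the left by $X_c^T$
$$\frac{1}{n} (X_c^T X_c)(X_c^T X_c) \Gamma = (X_c^T X_c) \Gamma \Lambda$$
which holds when
$$\frac{1}{n} (X_c^T X_c) \Gamma = \Gamma \Lambda$$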

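Extensions
in summary, we have
$$\Sigma = \Phi \Lambda \Phi^T, \qquad \Phi = X_c \Gamma, \qquad \frac{1}{n} (X_c^T X_c) \Gamma = \Gamma \Lambda$$
this means that we can obtain PCA by
  1) assembling $\frac{1}{n} X_c^T X_c$
  2) computing its eigen-decomposition $(\Lambda, \Gamma)$
PCA
  the principal components are then given by $\Phi = X_c \Gamma$ (with the columns of $\Gamma$ scaled so that the columns of $\Phi$ have unit norm)
  the eigenvalues are given by $\Lambda$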

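Extensions
the matrix $K_c = X_c^T X_c$ is the matrix of dot-products of the centered data-points
$$(K_c)_{ij} = (x_i - \mu)^T (x_j - \mu)$$
it is symmetric

Extensions
hence, since $X_c = X \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right)$,
$$K_c = \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right) X^T X \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right)$$
which, using $K = X^T X$ with entries $K_{ij} = x_i^T x_j$, is
$$K_c = \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right) K \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right)$$

Extensions
$K_c$ is just the row- and column-centered version of the matrix $K$, i.e., of the dot-product matrix for the data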

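Extensions
in summary, to get PCA
  1) compute the dot-product matrix $K$ and center it, $K_c = \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right) K \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right)$
  2) compute the eigen-decomposition $(\Lambda, \Gamma)$ of $\frac{1}{n} K_c$
PCA
  the principal components are then given by $\Phi = X_c \Gamma$
  the eigenvalues are given by $\Lambda$
  the projection of the data-points on the principal components is given by
  $$X_c^T \Phi = X_c^T X_c \Gamma = K_c \Gamma$$
this allows the computation of the eigenvalues and PCA coefficients when we only have access to the dot-product matrix $K$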
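The dot-product route can be sketched as follows. This is an illustration under stated assumptions rather than the lecture's own code: the function name `pca_from_gram` is hypothetical, the eigen-decomposition is taken of the centered matrix $\frac{1}{n} K_c$ with $K_c = (I - \frac{1}{n}\mathbf{1}\mathbf{1}^T) K (I - \frac{1}{n}\mathbf{1}\mathbf{1}^T)$, and the columns of $\Gamma$ are rescaled by $1/\sqrt{n \lambda_i}$ so that $\Phi = X_c \Gamma$ has unit-norm columns.

```python
import numpy as np


def pca_from_gram(K, n_components):
    """PCA eigenvalues and projections from the n x n dot-product matrix K = X^T X.

    n_components must not exceed the number of non-zero eigenvalues.
    """
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix I - (1/n) 1 1^T
    Kc = J @ K @ J                        # dot-products of the centered points
    evals, evecs = np.linalg.eigh(Kc / n) # ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_components]
    lam, U = evals[order], evecs[:, order]
    Gamma = U / np.sqrt(n * lam)          # rescale so Phi = Xc Gamma is orthonormal
    projections = Kc @ Gamma              # equals Xc^T Phi, computed without Xc
    return lam, Gamma, projections


# Synthetic data, used only to check the result against ordinary PCA.
rng = np.random.default_rng(2)
X = rng.normal(size=(3, 50)) * np.array([[4.0], [2.0], [1.0]])
K = X.T @ X
lam, Gamma, proj = pca_from_gram(K, 3)

Xc = X - X.mean(axis=1, keepdims=True)
Sigma = Xc @ Xc.T / X.shape[1]
Phi = Xc @ Gamma
print(np.allclose(Sigma @ Phi, Phi * lam), np.allclose(proj, Xc.T @ Phi))
```

Only `K` enters `pca_from_gram`; the data matrix is used afterwards purely to confirm that the eigenvalues and projections agree with those of the covariance.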
