Multi-dimensional Signal Analysis, Lecture 2E: Principal Component Analysis (Klas Nordberg, LiU)

Subspace representation. Note! Given a vector space V of dimension N, a scalar product defined by G₀, a subspace U of dimension M < N, an N × M basis matrix B of the subspace U, and a vector v ∈ V, we can determine the v₁ ∈ U that is closest to v. This v₁ ∈ U is a subspace representation of v ∈ V. v₁ is independent of B; it depends only on U.

Subspace representation. [Figure: the vector v ∈ V and its closest point v₁ in the subspace U.]

Subspace representation. More precisely, the coordinates of v₁ relative to the basis B are given by c = G⁻¹ Bᵀ G₀ v, where G = Bᵀ G₀ B.
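
As a hedged illustration (not part of the original slides), the formula above can be evaluated directly with numpy; all names and the example data below are made up for the sketch.

```python
# Sketch: subspace representation of v under a scalar product defined by G0.
# Illustrative data; in the lecture G0 and B are given by the application.
import numpy as np

N, M = 4, 2
rng = np.random.default_rng(0)

G0 = np.eye(N)                         # scalar product matrix (here the standard one)
B = rng.standard_normal((N, M))        # N x M basis matrix of the subspace U
v = rng.standard_normal(N)             # the vector to be represented

G = B.T @ G0 @ B                       # G = B^T G0 B
c = np.linalg.solve(G, B.T @ G0 @ v)   # c = G^-1 B^T G0 v, coordinates of v1 relative to B
v1 = B @ c                             # v1 in U, the point closest to v
```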

Stochastic signals. In the previous applications, normalised convolution and filter optimisation, the basis was fixed. This means that U is fixed. An alternative approach is to let the signal vector v be a stochastic variable and to determine an M-dimensional subspace U such that v₁ is as close as possible to v on average.

Initial problem formulation. We want to minimise ε = E‖v − v₁‖², where E denotes the expectation value (mean) taken over all observations of the signal v. ε is minimised over all M-dimensional subspaces U.

Problem formulation. To simplify things, we assume that V is real, V = ℝᴺ, and that G₀ = I, which gives ⟨u, v⟩ = uᵀ v. We also assume that B is an ON basis of the subspace U, so that the dual basis coincides with B and Bᵀ B = I.

Problem formulation. This simplification also means that v₁ = B c = B Bᵀ v. We want to determine an M-dimensional subspace U (represented by an ON basis B) such that we minimise ε = E‖v − B Bᵀ v‖².

Solving the problem. We assume that v has known statistical properties. We want to determine the B that minimises ε. This is the same as maximising ε₁ = E[vᵀ B Bᵀ v] (why?) over ON bases B. This is an optimisation problem with a set of constraints (Bᵀ B = I).

Practical solution. Once M (the dimension of U, i.e. the number of columns in B) has been chosen, we optimise ε₁ over the N·M elements of B subject to the constraints Bᵀ B = I. This can be solved by standard techniques for constrained optimisation, such as Lagrange's method.

A simple example. Simple example: M = 1, so B has only one single column b₁. We want to maximise ε₁ = E[vᵀ b₁ b₁ᵀ v] = E[b₁ᵀ v vᵀ b₁] = b₁ᵀ E[v vᵀ] b₁ with the constraint c = b₁ᵀ b₁ = 1.

A simple example. Use Lagrange's method: ∇ε₁ = λ ∇c, where the gradients are taken with respect to the elements of b₁. This leads to E[v vᵀ] b₁ = λ b₁.

A simple example. Consequently, b₁ is an eigenvector of E[v vᵀ] corresponding to the eigenvalue λ. Remember: ‖b₁‖ = 1, since B is ON.

The correlation matrix. It is clear that the matrix C = E[v vᵀ] is an important thing to know in order to solve the problem. C is called the correlation matrix of the signal. C is symmetric and N × N, and C is positive definite (why?). Approximation of C from P samples: C ≈ (1/P) Σ_{k=1..P} v_k v_kᵀ.
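
A minimal numpy sketch of the sample estimate above, assuming the P observations are stored as the columns of an N × P array; the random data is only for illustration.

```python
# Sketch: estimating the correlation matrix C = E[v v^T] from P observations.
import numpy as np

rng = np.random.default_rng(0)
N, P = 8, 1000
A = rng.standard_normal((N, P))   # data matrix, one observation v_k per column

C = (A @ A.T) / P                 # C ~ (1/P) * sum_k v_k v_k^T
assert np.allclose(C, C.T)        # C is symmetric (and positive semidefinite)
```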

A simple example. With this choice of b₁, ε₁ becomes ε₁ = b₁ᵀ C b₁ = λ b₁ᵀ b₁ = λ. Since we want to maximise ε₁, we should choose λ as large as possible: b₁ is a normalised eigenvector corresponding to the largest eigenvalue of C.

A simple example. From ε₁ = λ₁ = the largest eigenvalue of C, it follows that ε = the sum of all eigenvalues of C except λ₁ (why?).
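
A small numerical sanity check of the statement above (not from the slides; the anisotropic test signal is made up): for M = 1 the residual error equals the sum of all eigenvalues of the sample correlation matrix except the largest one.

```python
# Sanity check: E||v - b1 b1^T v||^2 = sum of eigenvalues of C except the largest.
import numpy as np

rng = np.random.default_rng(2)
N, P = 5, 100_000
V = rng.standard_normal((N, P)) * np.arange(1, N + 1)[:, None]  # samples as columns

C = (V @ V.T) / P                           # sample correlation matrix
eigvals, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
b1 = eigvecs[:, -1]                         # eigenvector of the largest eigenvalue

residual = V - np.outer(b1, b1 @ V)         # v - b1 b1^T v for every sample
eps = np.mean(np.sum(residual**2, axis=0))  # E||v - b1 b1^T v||^2
print(eps, eigvals[:-1].sum())              # the two numbers agree
```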

A slightly more complicated example. Let U be 2-dimensional: B has two columns, b₁ and b₂. We want to maximise ε₁ = E[vᵀ B Bᵀ v] with the constraints c₁ = b₁ᵀ b₁ = 1, c₂ = b₁ᵀ b₂ = 0, c₃ = b₂ᵀ b₂ = 1.

A slightly more complicated example. In the general case, when M > 1, the subspace ON basis matrix B is not unique: B′ = B Q with Q ∈ O(M) is also a solution (why?). We need additional constraints on B in order to make B unique. We choose: Bᵀ C B is diagonal, i.e. the subspace basis vectors are uncorrelated.

A slightly more complicated example. This additional constraint leads to C b₁ = λ₁ b₁ and C b₂ = λ₂ b₂: both b₁ and b₂ must be normalised and mutually orthogonal eigenvectors of C, with eigenvalues λ₁ and λ₂.

A slightly more complicated example. We want to maximise ε₁ = E[vᵀ B Bᵀ v] = b₁ᵀ C b₁ + b₂ᵀ C b₂ = λ₁ + λ₂ (why?), and therefore we should choose b₁ and b₂ as normalised eigenvectors corresponding to the two largest eigenvalues of C. Remember the multiplicity of eigenvalues: since C is symmetric, b₁ and b₂ can always be chosen to be orthogonal!

Generalisation. Based on these examples we present a general result. We want to determine an M-dimensional subspace U, described by an ON basis matrix B: 1. Form the correlation matrix C. 2. Compute the eigenvalues and eigenvectors of C. 3. The basis B consists of the M eigenvectors corresponding to the M largest eigenvalues of C.
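
A sketch of the three-step recipe, assuming the observations are available as the columns of an N × P data matrix A (the function name is illustrative):

```python
# Sketch: PCA basis from the correlation matrix, following steps 1-3 above.
import numpy as np

def pca_basis(A, M):
    """Return an N x M ON basis B of the M largest principal components,
    together with all eigenvalues of C in descending order."""
    P = A.shape[1]
    C = (A @ A.T) / P                     # 1. form the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # 2. eigenvalues/eigenvectors (ascending)
    order = np.argsort(eigvals)[::-1]     #    sort in descending order
    B = eigvecs[:, order[:M]]             # 3. keep the M largest principal components
    return B, eigvals[order]
```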

Generalisation. This gives ε₁ = E[vᵀ B Bᵀ v] = λ₁ + ... + λ_M and ε = λ_{M+1} + ... + λ_N, where λ₁ ≥ λ₂ ≥ ... ≥ λ_N are the eigenvalues of C.

Principal components. The eigenvectors of C are in this context referred to as principal components. The magnitude of a principal component is given by the corresponding eigenvalue. The M-dimensional subspace U is spanned by the M largest principal components of C. Determining the basis B in this way is called principal component analysis, or PCA.

Analysis and reconstruction. In PCA we analyse the signal v to get its coordinates relative to the subspace basis B: c = Bᵀ v. If necessary, we can then reconstruct v₁ as v₁ = B c.
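
A compact sketch of the analysis and reconstruction steps on illustrative random data (the basis B is computed as in the recipe above):

```python
# Sketch: analysis c = B^T v and reconstruction v1 = B c (illustrative data).
import numpy as np

rng = np.random.default_rng(1)
N, P, M = 8, 500, 3
A = rng.standard_normal((N, P))

C = (A @ A.T) / P
eigvals, eigvecs = np.linalg.eigh(C)
B = eigvecs[:, np.argsort(eigvals)[::-1][:M]]   # M largest principal components

v = A[:, 0]     # one observation
c = B.T @ v     # analysis: coordinates relative to the subspace basis B
v1 = B @ c      # reconstruction: v1 = B c = B B^T v
```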

Analysis and reconstruction. [Block diagram: a stochastic process produces v ∈ ℝᴺ; PCA, using C (N × N), gives c ∈ ℝᴹ; reconstruction gives v₁ ∈ ℝᴺ.] In some applications, for example clustering, the reconstruction step is not relevant.

Applications. In signal processing, PCA can be used for general data analysis: for an unknown data set/signal, determine whether it can be reduced in dimension. It can also be used for data/signal compression: an N-dimensional signal can be represented with an M-dimensional basis (M < N), which reduces the amount of data needed to store/transmit the signal. This is data-dependent compression: (+) effective, since the compression is adapted to the data; (−) overhead, since B must be stored/transmitted as well.

Signal model. PCA can typically be applied to signals with a statistical model (mean and C known). PCA can also be applied to a finite data set in the form of high-dimensional vectors (dimensionality reduction); C is then estimated from the finite data set.

How large subspace? The idea behind PCA is to estimate C from a large set of observations of the data/signal v and then choose the basis B as the M largest principal components. How do we know a suitable value for M? There is no optimal strategy; it is application dependent.

How large subspace? In some applications M may be fixed, i.e. already given. In other applications it makes sense to analyse the eigenvalues λ₁, ..., λ_N in order to choose a suitable M, for example such that ε / ε₁ is small (why?). Usually each dimension of U corresponds to a cost, since it represents a coordinate of v₁ that needs to be stored or transmitted. We want to keep M as small as possible: a balance between cost and the decrease in ε.
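
One possible (illustrative, not prescribed by the lecture) way to turn the "ε / ε₁ small" criterion into code, given the eigenvalues in descending order; the threshold value is an assumption:

```python
# Sketch: choose the smallest M such that eps / eps1 <= ratio,
# where eps1 = lambda_1 + ... + lambda_M and eps = lambda_{M+1} + ... + lambda_N.
import numpy as np

def choose_M(eigvals_desc, ratio=0.05):
    lam = np.asarray(eigvals_desc, dtype=float)
    for M in range(1, lam.size + 1):
        eps1 = lam[:M].sum()       # retained part
        eps = lam[M:].sum()        # residual error
        if eps <= ratio * eps1:
            return M
    return lam.size
```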

An example. [Figure: a 512 × 512 pixel image.]

An example. Let us divide this image into blocks of 8 × 8 pixels. This gives in total 64 × 64 = 4096 blocks. The pixels of each block constitute our signal v, a vector in ℝ⁶⁴ (the 8 × 8 block reshaped to a 64-dimensional column vector). From the 4096 observations of v we form the 64 × 64 correlation matrix C. We also compute the eigenvalues and eigenvectors of C.
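
A sketch of how the blocks and the correlation matrix might be formed with numpy, assuming `image` is a 512 × 512 array (the actual image from the lecture is not reproduced here):

```python
# Sketch: reshape the image into 8x8 blocks, form the 64x64 correlation matrix,
# and compute its eigenvalues and eigenvectors.
import numpy as np

def block_pca(image, block=8):
    h, w = image.shape
    blocks = (image.reshape(h // block, block, w // block, block)
                   .transpose(0, 2, 1, 3)
                   .reshape(-1, block * block))   # one row per 8x8 block (4096 x 64)
    A = blocks.T                                  # data matrix, one block vector per column
    C = (A @ A.T) / A.shape[1]                    # 64 x 64 correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]             # descending eigenvalues
    return A, eigvals[order], eigvecs[:, order]
```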

The eigenvalues of C. [Plot: the 64 eigenvalues of C.] There is one large eigenvalue; the rest are relatively small.

The eigenvalues of C. [Plot: the remaining eigenvalues, with the largest eigenvalue removed.]

The eigenvectors of C. Each eigenvector of C is a 64-dimensional vector that can be reshaped into an 8 × 8 block. [Figure: the first six eigenvectors b₁, ..., b₆ displayed as 8 × 8 blocks.]

The eigenvectors of C. We notice that b₁ is approximately flat or constant (DC), b₂ and b₃ are approximately shaped like planes with a slope in two perpendicular directions (linear), and b₄, b₅ and b₆ are approximately shaped like quadratic surfaces.

An example. The distribution of the eigenvalues implies that already by going from 0 to 1 dimension of U the error ε is reduced quite a lot. By adding a few more dimensions it should be possible to represent the signal quite well. Let us try some different values of M and look at the result.

An example. Each block, as a vector v ∈ ℝ⁶⁴, is projected onto v₁ ∈ U. v₁ can be represented with only M coordinates. v₁ is a reasonably good approximation of v; at least the mean error should be low. How good is determined by the distribution of the eigenvalues relative to M.

M = 1. Here v is projected onto the first principal component b₁. Since b₁ is flat or constant, each block becomes flat or constant.

M = 2. Here v is projected onto the first two principal components, b₁ and b₂. Since b₂ is a slope in one direction, we can now represent that change in each block, but not a slope in the orthogonal direction!

M = 3. Here v is projected onto the first three principal components, b₁, ..., b₃. Since b₃ is a slope in the other direction, we can now have slopes in any direction within each block.

M = 6. Here v is projected onto the first six principal components, b₁, ..., b₆. With 3 more dimensions, each block can contain more details. Here, the data has been reduced by a factor 64/6 ≈ 11.
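
For completeness, a sketch of the reconstruction used for these images: every block vector is projected onto the first M principal components and the blocks are reassembled (continuing the block_pca sketch above; names are illustrative):

```python
# Sketch: reconstruct the image from its first M principal components.
import numpy as np

def reconstruct(A, eigvecs, M, h=512, w=512, block=8):
    B = eigvecs[:, :M]                  # 64 x M basis, the M largest principal components
    A1 = B @ (B.T @ A)                  # v1 = B B^T v for every block vector
    blocks = A1.T.reshape(h // block, w // block, block, block)
    return blocks.transpose(0, 2, 1, 3).reshape(h, w)
```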

General observation. A general observation: in areas of low spatial frequency the approximation is good, and in areas of high spatial frequency the approximation is worse. The more details, or the higher the spatial frequency, the more dimensions are needed for an accurate representation.

Karhunen-Loève transformation. Introducing the new basis B in this way and computing the coordinates of v relative to B is sometimes also referred to as the Karhunen-Loève transformation. Bᵀ v gives the transformed signal (= the coordinates of v relative to the basis B), and B Bᵀ v reconstructs the projected signal v₁.

Covariance and mean. In some applications PCA is used to make a statistical analysis of a signal v. It is then very common to describe v in terms of its mean m_v and its covariance matrix Cov_v: m_v = E[v], Cov_v = E[(v − m_v)(v − m_v)ᵀ].

Covariance and mean. The PCA is then based on the covariance matrix Cov_v rather than the correlation matrix C. This corresponds to translating the origin of the subspace U to the mean m_v and doing correlation-based PCA there. Both approaches are called PCA: statistical data analysis typically uses Cov_v, and data compression typically uses C.
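
A sketch of the covariance-based variant: identical to the correlation-based recipe except that the mean is subtracted first (the function name is illustrative):

```python
# Sketch: covariance-based PCA -- translate the origin to the mean m_v,
# then proceed exactly as with the correlation matrix.
import numpy as np

def pca_covariance(A, M):
    m_v = A.mean(axis=1, keepdims=True)     # mean vector m_v
    A0 = A - m_v                            # centred data
    Cov = (A0 @ A0.T) / A.shape[1]          # covariance matrix Cov_v
    eigvals, eigvecs = np.linalg.eigh(Cov)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:M]], m_v, eigvals[order]
```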

The data matrix. In the case that we approximate the statistics of v from a finite number P of observations, these can be described by a data matrix A that holds all these v as its columns. An estimate of C = E[v vᵀ] is then given by C = (1/P) A Aᵀ.

SVD vs EVD. The principal components are the eigenvectors of A Aᵀ. These are also the left singular vectors of A. An alternative way to compute the PCA, which in some cases may have better numerical properties: 1. Form the data matrix A. 2. Compute its SVD. 3. Form B from the left singular vectors that correspond to the M largest singular values.
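
A sketch of the SVD route, assuming the same data-matrix convention as above (one observation per column); numpy returns the singular values in descending order, so no extra sorting is needed:

```python
# Sketch: PCA via the SVD of the data matrix A.
import numpy as np

def pca_via_svd(A, M):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) V^T
    B = U[:, :M]                 # left singular vectors of the M largest singular values
    eigvals = s**2 / A.shape[1]  # corresponding eigenvalues of C = (1/P) A A^T
    return B, eigvals
```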

What you should know includes: the formulation of the problem that PCA solves, namely to find a subspace U that minimises ε = E‖v − B Bᵀ v‖², where the subspace is represented by an ON basis B; the solution, namely that B consists of the M eigenvectors of C = E[v vᵀ] with the largest eigenvalues (called the principal components), alternatively computed by means of the SVD; that ε = the sum of the residual eigenvalues of C; and the applications: dimensionality reduction and signal compression.