Karhunen-Loève Transform (KLT). JanKees van der Poel, D.Sc. Student, Mechanical Engineering


Karhunen-Loève Transform. It has many names in the literature: Karhunen-Loève Transform (KLT); Karhunen-Loève Decomposition (or Expansion); Principal (or Principle) Component Analysis (PCA); Principal (or Principle) Factor Analysis (PFA); Singular Value Decomposition (SVD); Proper Orthogonal Decomposition (POD);

Karhunen-Loève Transform. It has many names in the literature (continued): Galerkin Method (this variant is used to find solutions to certain types of Partial Differential Equations, PDEs, especially in the field of Mechanical Engineering and in electromechanically coupled systems); Hotelling Transform; and Collective Coordinates.

Karhunen-Loève Transform. The Karhunen-Loève Transform (KLT) takes a given collection of data (an input collection) and creates an orthogonal basis (the KLT basis) for the data. An orthogonal basis for a space V is a set of mutually orthogonal vectors {b_i} (and hence linearly independent vectors) that span the space V. This presentation provides an overview of the KLT for some specific types of input collections.

Karhunen-Loève Transform. Pearson (1901), Hotelling (1933), Kosambi (1943), Loève (1945), Karhunen (1946), Pougachev (1953) and Obukhov (1954) have each been independently credited with the discovery of the KLT under one of its many titles. The KLT has applications in almost every scientific field.

Karhunen-Loève Transform. The KLT has been widely used in: studies of turbulence; thermal/chemical reactions; feed-forward and feedback control design applications (the KLT is used to obtain a reduced-order model for simulation or control design); and data analysis or compression (characterization of human faces, map generation by robots, and freight traffic prediction).

Karhunen-Loève Transform. One of the most important matrix factorizations is the Singular Value Decomposition (SVD). The SVD has many properties that are desirable in a wide range of applications. Principal Component Analysis (PCA) is an application of the SVD: it identifies patterns in data, expressing the data in such a way as to highlight their similarities and differences.

Karhunen-Loève Transform. To keep things simple, the name Principal Component Analysis (PCA) will be used from now on, instead of KLT or SVD; in our field of signal/image processing this is the usual name for the Karhunen-Loève Transform. What is Principal Component Analysis? Patterns can be hard to find in high-dimensional data, where the luxury of graphical representation is not available.

Principal Component Analysis. So, use PCA to analyze the data. Once the data patterns are found, reduce the number of data dimensions (without much loss of information) by compressing the data; this makes it easier to visualize the hidden data patterns. PCA basically analyzes the data in order to reduce its dimensionality, eliminate superpositions (redundancy), and represent it better using linear combinations obtained from the original variables.

Data Presentation. Example: 53 blood and urine measurements from 65 people (33 alcoholics, 32 non-alcoholics). A partial view of the data for the first nine subjects:

        H-WBC    H-RBC    H-Hgb     H-Hct     H-MCV     H-MCH    H-MCHC
A1     8.0000   4.8200  14.1000   41.0000   85.0000   29.0000   34.0000
A2     7.3000   5.0200  14.7000   43.0000   86.0000   29.0000   34.0000
A3     4.3000   4.4800  14.1000   41.0000   91.0000   32.0000   35.0000
A4     7.5000   4.4700  14.9000   45.0000  101.0000   33.0000   33.0000
A5     7.3000   5.5200  15.4000   46.0000   84.0000   28.0000   33.0000
A6     6.9000   4.8600  16.0000   47.0000   97.0000   33.0000   34.0000
A7     7.8000   4.6800  14.7000   43.0000   92.0000   31.0000   34.0000
A8     8.6000   4.8200  15.8000   42.0000   88.0000   33.0000   37.0000
A9     5.1000   4.7100  14.0000   43.0000   92.0000   30.0000   32.0000

[Figures: the same data displayed in matrix format and in spectral format, measurement value against measurement index.]

Data Presentation. [Figures: univariate plots of single measurements (M-EPI, H-Bands, C-LDH) against person index; a bivariate plot of C-LDH against C-Triglycerides; and a trivariate plot of C-LDH and C-Triglycerides against a third measurement.]

Data Presentation. Is there a better presentation than the common Cartesian axes? That is, do we really need a space with 53 dimensions to view the data? This raises the question of how to find the best low-dimensional space that conveys the maximum useful information. The answer is: find the Principal Components!

Principal Components. All of the Principal Components (PCs) start at the origin of the coordinate axes. The first PC is the direction of maximum variance from the origin. Each subsequent PC is orthogonal to the previous ones and describes the maximum residual variance.
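Stated a bit more formally (a standard formulation, not spelled out on the slide): with C the covariance matrix of the mean-centered data, the principal component directions solve

\[
\mathbf{w}_1 = \arg\max_{\lVert \mathbf{w}\rVert = 1} \mathbf{w}^{\top} C\, \mathbf{w},
\qquad
\mathbf{w}_k = \arg\max_{\substack{\lVert \mathbf{w}\rVert = 1 \\ \mathbf{w}\,\perp\, \mathbf{w}_1,\dots,\mathbf{w}_{k-1}}} \mathbf{w}^{\top} C\, \mathbf{w} ,
\]

so each component is the unit direction of largest remaining variance, orthogonal to the ones already found.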

Algebraic Interpretation. Let's say that m points in a space with n dimensions (n large) are given. Now, how does one project these m points onto a low-dimensional space while preserving broad trends in the data and also allowing it to be visualized?

Algebraic Interpretation, 1D Case. Given m points in an n-dimensional space (n large), how does one project these m points onto a one-dimensional space? Simply choose a line that fits the data so that the points are spread out well along the line.

Algebraic Interpretation, 1D Case. Formally, minimize the sum of squares of the distances to the line. Why the sum of squares? Because it allows fast minimization, assuming the line passes through the origin.

Algebraic Interpretation, 1D Case. Minimizing the sum of squares of the distances to the line is the same as maximizing the sum of squares of the projections onto that line. Many thanks to Pythagoras!
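As a one-line check of this equivalence (assuming, as above, that the line passes through the origin): for each point x_i, with projection p_i onto the line and distance d_i from the line, Pythagoras gives

\[
\lVert x_i \rVert^2 = \lVert p_i \rVert^2 + d_i^2
\quad\Longrightarrow\quad
\sum_i d_i^2 \;=\; \sum_i \lVert x_i \rVert^2 \;-\; \sum_i \lVert p_i \rVert^2 ,
\]

so minimizing the summed squared distances is the same as maximizing the summed squared projections, because the term \(\sum_i \lVert x_i \rVert^2\) is fixed by the data.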

Basic Mathematical Concepts. Before getting to a description of PCA, this tutorial first introduces the mathematical concepts that will be used in PCA: standard deviation, covariance, and eigenvectors and eigenvalues. This background is meant to make the PCA section very easy, but it can be skipped if the concepts are already familiar.

Standard Deviation. The Standard Deviation (SD) of a data set is a measure of how spread out the data is: roughly, the average distance from the mean of the data set to a point. The data sets [0, 8, 12, 20] and [8, 9, 11, 12] have the same mean (which is 10) but are quite different.

Standard Deviation. By means of the Standard Deviation it is possible to differentiate these two sets. As expected, the first set ([0, 8, 12, 20]) has a much larger standard deviation than the second set ([8, 9, 11, 12]), because its data are much more spread out from the mean.
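For reference, the usual sample formulas behind these statements are (for n values X_i with mean \(\bar{X}\)):

\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i ,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\bigl(X_i - \bar{X}\bigr)^2 } .
\]

For the two sets above, both means are 10, while the standard deviations are about 8.3 and 1.8 respectively.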

Variance. Variance is another measure of the spread of data in a data set. In fact it is almost identical to the standard deviation: the only difference is that the variance is simply the standard deviation squared. Variance is introduced, in addition to the standard deviation, to provide a solid platform from which the next topic, covariance, can be launched.

Covariance. Both standard deviation and variance are purely one-dimensional measures. However, many data sets have more than one dimension. The aim of the statistical analysis of these kinds of data sets is usually to see whether there is any relationship between their dimensions.

Covariance. Standard deviation and variance only operate on one-dimensional data, so it is only possible to calculate the standard deviation for each dimension of the data set independently of the other dimensions. However, it is useful to have a similar measure that tells how much the dimensions vary from the mean with respect to each other.

Covariance. Covariance is always calculated between two dimensions. With 3D data (X, Y, Z), covariance is calculated between (X, Y), (X, Z) and (Y, Z). With an n-dimensional data set, n!/(2*(n-2)!) different covariance values can be calculated. The covariance calculated between a dimension and itself gives the variance: the covariances of (X, X), (Y, Y) and (Z, Z) give the variances of the X, Y and Z dimensions.
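The sample covariance between two dimensions X and Y that these statements rely on is the standard formula

\[
\operatorname{cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}\bigl(X_i-\bar{X}\bigr)\bigl(Y_i-\bar{Y}\bigr),
\]

so cov(X, X) reduces to the variance of X, as stated above.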

Covariance Matrix. As an example, let us write down the covariance matrix for an imaginary three-dimensional data set with the usual dimensions x, y and z. In this case, the covariance matrix has three rows and three columns, with these values:

\[
C =
\begin{pmatrix}
\operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\
\operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\
\operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z)
\end{pmatrix}
\]

Covariance Matrix. Down the main diagonal, the covariance is computed between one of the dimensions and itself; these entries are the variances of the dimensions. Since cov(a,b) = cov(b,a), the covariance matrix is symmetric about the main diagonal.

Eigenvectors and Eigenvalues. A vector v is an eigenvector of a square (m by m) matrix M if M*v (the multiplication of the matrix M by the vector v) gives a multiple of v, i.e., λ*v (the multiplication of a scalar λ by the vector v). In this case, λ is called the eigenvalue of M associated with the eigenvector v.
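A small numeric example (not from the slides) may help:

\[
M = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
\qquad
v = \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad
M v = \begin{pmatrix} 3 \\ 3 \end{pmatrix} = 3\,v ,
\]

so v is an eigenvector of M with eigenvalue λ = 3; the other eigenvector, (1, -1)ᵀ, has eigenvalue 1 and is perpendicular to v.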

Eigenvector Properties. Eigenvectors can only be found for square matrices, and not every square matrix has (real) eigenvectors. An m by m matrix has at most m linearly independent eigenvectors; for example, a 3 by 3 matrix with a full set of eigenvectors has three of them.

Eigenvector Properties. Even if an eigenvector is scaled by some amount before being multiplied, one still gets the same multiple of it as a result. This is because scaling a vector only changes its length, not its direction.

Eigenvector Properties. All the eigenvectors of a symmetric matrix (such as a covariance matrix) are perpendicular (orthogonal), i.e., at right angles to each other, no matter how many dimensions the matrix has. This is important because it means that the data can be expressed in terms of these perpendicular eigenvectors, instead of in terms of the original axes.

The PCA Method. Step 1: Get some data to use in a simple example. I am going to use my own two-dimensional data set. I have chosen a two-dimensional data set because I can provide plots of the data to show what PCA is doing at each step. The data I have used is shown in the next slide.

The PCA Method. The data used in this example is shown here (heights in cm, weights in kg):

Data =
  heights   weights
    183       79
    173       69
    120       45
    168       70
    188       81
    158       61
    201       98
    163       63
    193       79
    167       71
    178       73

The PCA Method. Step 2: Subtract the mean. For PCA to work properly, you have to subtract the mean from each of the data dimensions. The mean subtracted is the average across each dimension: all the x values have the x mean subtracted from them, and all the y values have the y mean subtracted from them. This produces a data set whose mean is zero.

The PCA Method. The data with its mean subtracted (the adjusted data) is shown here. Both the data and the adjusted data are plotted in the next slide.

DataAdjusted =
  heights   weights
     11       7.27
      1      -2.72
    -52     -26.72
     -4      -1.72
     16       9.27
    -14     -10.72
     29      26.27
     -9      -8.72
     21       7.27
     -5      -0.72
      6       1.27
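A minimal Matlab sketch of Steps 1 and 2, using the height/weight table above (the variable names are illustrative, not taken from the original .m files):

Data = [183 79; 173 69; 120 45; 168 70; 188 81; 158 61; ...
        201 98; 163 63; 193 79; 167 71; 178 73];   % heights (cm), weights (kg)
DataMean = mean(Data, 1);                  % mean of each dimension (column)
DataAdjusted = Data - DataMean;            % subtract the mean from every row
% In Matlab releases before R2016b, implicit expansion is not available; use
% DataAdjusted = Data - repmat(DataMean, size(Data,1), 1);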

The PCA Method. [Figure: plots of the original data and of the mean-adjusted data.]

The PCA Method. Step 3: Calculate the covariance matrix. Since the data is two-dimensional, the covariance matrix will have two rows and two columns:

\[
C =
\begin{pmatrix}
471.80 & 277.70 \\
277.70 & 180.02
\end{pmatrix}
\]

One should notice that heights and weights normally increase together. Since the off-diagonal elements of this covariance matrix are positive, we should expect that the x and y variables increase together.
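Continuing the sketch, Step 3 computes the covariance matrix; Matlab's cov normalizes by n-1 and subtracts the mean itself, so either call below gives (up to rounding) the matrix shown above:

C  = cov(Data);                                             % 2x2 covariance matrix of the data
C2 = (DataAdjusted' * DataAdjusted) / (size(Data,1) - 1);   % same result, computed by hand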

The PCA Method. Step 4: Calculate the eigenvectors and eigenvalues. In Matlab this step can be performed with the eig command (square matrices only, so it applies to the covariance matrix) or with the svd command (matrices of any shape, so it can be applied directly to the mean-adjusted data matrix). Since the data matrix itself is not square, svd is used here. The eigenvectors and eigenvalues are rather important, as they give useful information about the data.

The PCA Method. Step 4 (continued). Here are the eigenvalues, which come from the diagonal of the matrix S returned by svd (diag(S) in Matlab), and the corresponding eigenvectors (one per column):

eigenvalues:             623.1194   16.0392
eigenvectors (columns):    0.9220    0.3871
                           0.3871    0.9220
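A hedged sketch of both routes mentioned above: eig on the (square) covariance matrix, or svd on the mean-adjusted data matrix. The squared singular values, divided by n-1, equal the eigenvalues of the covariance matrix, and the right singular vectors match the eigenvectors up to sign.

% Route 1: eigen-decomposition of the covariance matrix
[V, D] = eig(C);                          % columns of V are eigenvectors, diag(D) the eigenvalues
[eigVals, order] = sort(diag(D), 'descend');
eigVecs = V(:, order);                    % eigenvectors ordered by decreasing eigenvalue

% Route 2: SVD of the mean-adjusted data (works for non-square matrices)
[U, S, Vsvd] = svd(DataAdjusted, 'econ');
singVals = diag(S);                       % singular values
eigValsFromSvd = singVals.^2 / (size(Data,1) - 1);   % eigenvalues of C
% the columns of Vsvd point along the same directions as eigVecs (up to sign)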

The PCA Method. Looking at the plot of the adjusted data, one can see that it has quite a strong pattern. As expected from the covariance matrix (and from common sense), the two variables increase together.

The PCA Method. On top of the adjusted data I have also plotted both eigenvectors (appearing as a red and a green line). As stated earlier, they are perpendicular to each other. More importantly, they provide information about the patterns in the data: one of the eigenvectors goes right through the middle of the points, drawing a line of best fit.

The PCA Method. The first eigenvector (the one plotted in green) shows us that the two variables are strongly related along that line. The second eigenvector (the one plotted in red) gives the other, less important, pattern in the data: all the points follow the main line but are off to its side by some amount.

The PCA Method. By taking the eigenvectors of the covariance matrix, we have been able to extract lines that characterize the data. The remaining steps involve transforming the data so that it is expressed in terms of these lines.

The PCA Method. Recalling the important aspects from the previous figure: the two lines are perpendicular (orthogonal) to each other; the eigenvectors provide us with a way to see hidden patterns in the data; and one of the eigenvectors draws the line that best fits the data.

The PCA Method. Step 5: Choosing components and forming a feature vector. Here comes the notion of data compression and reduced dimensionality. The eigenvalues have different values: the highest one corresponds to the eigenvector that is the principal component of the data set (the most significant relationship between the data dimensions).

The PCA Method. Once the eigenvectors are found, they are ordered by their eigenvalues, from highest to lowest. This gives the components in order of significance. The less significant components can be ignored: some information is lost but, if their eigenvalues are small, the amount lost is not too much.

The PCA Method. If some components are left out, the final data set will have fewer dimensions than the original. If the original data set has n dimensions, n eigenvectors (together with their eigenvalues) are calculated, and only the first p eigenvectors are kept, then the final data set will have only p dimensions.

The PCA Method. Now, what needs to be done is to form a feature vector (a fancy name for a matrix of vectors). This feature vector is constructed by taking the eigenvectors that are to be kept and forming a matrix with them in its columns:

Feature_Vector = ( eigenvector_1, eigenvector_2, ..., eigenvector_n )

The PCA Method. Using the data set seen before, and the fact that there are two eigenvectors, there are two choices. One is to form a feature vector with both of the eigenvectors:

eigenvectors (columns):  0.9220   0.3871
                         0.3871   0.9220

The PCA Method. The other is to form a feature vector that leaves out the smaller, less significant component and has only a single column. Of the two eigenvalues, 623.1194 is the most significant and 16.0392 the less significant; correspondingly, the first column of the eigenvector matrix above is the most significant eigenvector and the second column is the less significant one.

The PCA Method. In other words, the result is a feature vector with p vectors selected from the n eigenvectors (where p < n). This is the most common option. In the example, the kept eigenvalue is 623.1194 (the most significant one) and the kept eigenvector is (0.9220, 0.3871), the most significant eigenvector; a sketch of this step follows.
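Under the same assumptions as the earlier snippets, Step 5 amounts to keeping the p most significant eigenvectors as the columns of the feature vector:

p = 1;                               % number of components to keep (p < n)
FeatureVector = eigVecs(:, 1:p);     % columns = kept eigenvectors, ordered by eigenvalue
% With p = 2 both eigenvectors are kept and no information is lost.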

The PCA Method. Step 6: Deriving the new data set. This is the final step in PCA (and the easiest one). Choose the components (eigenvectors) to be kept and form a feature vector, remembering that the eigenvector with the highest eigenvalue is the principal component of the data set. Then take the transpose of the feature vector and multiply it on the left of the transposed, mean-adjusted data set.

The PCA Method.

Final_Data = RowFeatureVector * RowDataAdjusted

The matrix called RowFeatureVector is the transposed feature vector: the eigenvectors are now in its rows, with the most significant one at the top. The matrix called RowDataAdjusted is the transposed mean-adjusted data: the data items are in its columns, with each row holding a separate dimension.

The PCA Method. This sudden transposing of all the data is confusing, but the equations from here on are easier if the transposes of the feature vector and of the data are taken first. Better than having to always carry a little T symbol above their names! Final_Data is the final data set, with data items in columns and dimensions along rows; a short sketch of this step follows.
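Step 6 as a Matlab sketch, following the row-oriented convention described above (data items in columns) and reusing the variables from the earlier snippets:

RowFeatureVector = FeatureVector';        % eigenvectors in rows, most significant on top
RowDataAdjusted  = DataAdjusted';         % one data item per column, one dimension per row
FinalData = RowFeatureVector * RowDataAdjusted;   % data expressed in the eigenvector axes

% To recover (an approximation of) the original data:
RowDataApprox = RowFeatureVector' * FinalData;    % transpose undoes the projection (orthonormal columns)
DataApprox = RowDataApprox' + DataMean;           % undo the transpose and re-add the mean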

The PCA Method. The original data is now given solely in terms of the chosen vectors. The original data set was written in terms of the x and y axes. The data can be expressed in terms of any axes, but the expression is most efficient if these axes are perpendicular. This is why it was important that the eigenvectors are perpendicular to each other.

The PCA Method. So, the original data (expressed in terms of the x and y axes) is now expressed in terms of the eigenvectors that were found. If reduced dimensionality is needed (throwing some of the eigenvectors out), the new data will be expressed in terms of the vectors that were kept.

The PCA Method. [Figure: plot of the transformed data.]

The PCA Method. Among all possible orthogonal transforms, PCA is optimal in the following sense: the KLT completely decorrelates the signal, and the KLT maximally compacts the energy (in other words, the information) contained in the signal. But PCA is computationally expensive and should not be used carelessly. Instead, one can use the Discrete Cosine Transform (DCT), which approaches the KLT in this sense.

The PCA Method: Examples. Here we switch to Matlab in order to run some examples that (I sincerely hope) may clarify things: project the data onto the principal component axes, show the rank-one approximation, and compress an image by reducing the number of its coefficients (PCA.m), much as is done with the DCT; show the difference between least squares and PCA, and align 3D models using the PCA properties (SVD.m).

The PCA Method: Examples. Some things should be noticed about the power of PCA to compress an image (as seen in the PCA.m example). The amount of memory required to store an uncompressed image of size m x n is M_image = m*n. So the amount of memory needed to store an image grows with the product of its dimensions (quadratically, for roughly square images).

The PCA Method: Examples. But the amount of memory required to store a rank-k SVD approximation of the same m x n image is M_approx = k(m + n + 1). So the memory required grows only linearly with the dimensions, as opposed to quadratically. Thus, as the image gets larger, more memory is saved by using the SVD.
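A quick worked example of the two memory counts: for a 512 x 512 image, full storage needs m*n = 262,144 values, while a rank k = 20 approximation needs k(m + n + 1) = 20*(512 + 512 + 1) = 20,500 values, roughly a 13x saving. The Matlab sketch below (with a hypothetical image file name; PCA.m itself is not reproduced here) shows the kind of rank-k reconstruction being described:

A = double(imread('some_image.png'));      % hypothetical grayscale image, size m x n
[m, n] = size(A);
k = 20;                                    % rank of the approximation
[U, S, V] = svd(A, 'econ');
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';    % best rank-k approximation in the least-squares sense
fullMemory   = m * n;                      % values needed for the original image
approxMemory = k * (m + n + 1);            % values needed for U(:,1:k), V(:,1:k) and k singular values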

The PCA Method: Examples. Perform face recognition using the Principal Component Analysis approach! This is accomplished using a technique known in the literature as the Eigenface technique. We will see an example of how to do it using a well-known face database, the AT&T Database of Faces, with two Matlab functions: facerecognitionexample.m and loadfacedatabase.m.

What is the Eigenface Technique? The idea is that face images can be economically represented by their projections onto a small number of basis images, derived by finding the most significant eigenvectors of the pixel-wise covariance matrix of a set of training images. A lot of people like to play with this technique, but in this tutorial I will simply show how to get some eigenfaces and play with them in Matlab.

AT&T Database of Faces. The AT&T Database of Faces contains a set of face images collected in the context of a face recognition project: ten different images of each of 40 distinct subjects, taken at different times (varying lighting, facial details and expressions) against a dark homogeneous background, with the subjects in an upright, frontal position (some side movement was tolerated).

AT&T Database of Faces. The images have a size of 92x112 pixels (in other words, 10304 pixels) with 256 grey levels per pixel, and are organized in 40 directories (one per subject), each containing the ten images of that subject. Matlab can read PNG files and other image formats without extra tools, so it is relatively easy to load the whole face database into Matlab's workspace and process it.

Getting the Faces Into One Big Matrix. First of all, we need to put all the faces of the database into one huge matrix with 112*92 = 10304 rows and 400 columns. This step is done by the function loadfacedatabase.m: it reads the images, makes a column vector out of each of them, puts them all together, and returns the result.
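A minimal sketch of what loadfacedatabase.m is described as doing; the original function is not transcribed here, and the directory layout and file names ('att_faces/s1/1.pgm' through 's40/10.pgm') are assumptions based on the standard distribution of the AT&T database:

BigMatrix = zeros(112*92, 400);            % 10304 pixels per face, 400 faces
col = 1;
for subject = 1:40
    for img = 1:10
        face = imread(fullfile('att_faces', sprintf('s%d', subject), sprintf('%d.pgm', img)));
        BigMatrix(:, col) = double(face(:));   % stack the image into one column vector
        col = col + 1;
    end
end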

Getting the Recognition to Work. Here we switch to Matlab directly, because the steps needed to perform the face recognition task are best explained by looking at the function facerecognitionexample.m. All the steps necessary for this task are done in that function, and it is ready to be executed and commented on.
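Since facerecognitionexample.m is not transcribed here, the following is only an assumed outline of the usual eigenface pipeline (mean face, eigenvectors of the training set, projection, nearest-neighbour match), continuing from BigMatrix above:

meanFace = mean(BigMatrix, 2);                        % average face (10304 x 1)
Centered = BigMatrix - meanFace;                      % subtract the mean face from every column
[U, S, ~] = svd(Centered, 'econ');                    % columns of U are the eigenfaces
k = 50;                                               % keep the k most significant eigenfaces
Eigenfaces = U(:, 1:k);
TrainWeights = Eigenfaces' * Centered;                % each face becomes a k-dimensional weight vector

probe = BigMatrix(:, 1);                              % as a sanity check, reuse the first training face
probeWeights = Eigenfaces' * (probe - meanFace);      % project the probe onto the eigenfaces
[~, bestMatch] = min(sum((TrainWeights - probeWeights).^2, 1));   % nearest neighbour in weight space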

Cases Where PCA Fails (1). PCA projects data onto a set of orthogonal vectors (principal components). This restricts the new input components to being linear combinations of the old ones. However, there are cases where the intrinsic degrees of freedom of the data cannot be expressed as linear combinations of the input components; in such cases PCA will overestimate the input dimensionality.

Cases Where PCA Fails (1). So, PCA is not capable of finding a nonlinear intrinsic dimension of the data (like the angle between the two vectors in the example above). Instead, it will find two components of equal importance.

Cases Where PCA Fails (2). In cases where components with small variability really matter, PCA will make mistakes due to its unsupervised nature. In such cases, if only the projections onto the leading components of two classes of data are kept as input, the classes may become indistinguishable.

Any (Reasonable) Doubts?