Karhunen-Loève Transform (KLT). JanKees van der Poel, D.Sc. Student, Mechanical Engineering

1 Karhunen-Loève Transform (KLT). JanKees van der Poel, D.Sc. Student, Mechanical Engineering

2 Karhunen-Loève Transform. Has many names cited in the literature: Karhunen-Loève Transform (KLT); Karhunen-Loève Decomposition (or Expansion); Principal (or Principle) Component Analysis (PCA); Principal (or Principle) Factor Analysis (PFA); Singular Value Decomposition (SVD); Proper Orthogonal Decomposition (POD);

3 Karhunen-Loève Transform. Has many names cited in the literature: Galerkin Method (this variation is used to find solutions to certain types of Partial Differential Equations, PDEs, especially in the field of Mechanical Engineering and electromechanically coupled systems); Hotelling Transform; and Collective Coordinates.

4 Karhunen-Loève Transform. The Karhunen-Loève Transform (KLT) takes a given collection of data (an input collection) and creates an orthogonal basis (the KLT basis) for the data. An orthogonal basis for a space V is a set of mutually orthogonal vectors {b_i} (which are therefore linearly independent) that span the space V. This presentation provides an overview of the KLT for some specific types of input collections.

5 Karhunen-Loève Transform. Pearson (1901), Hotelling (1933), Kosambi (1943), Loève (1945), Karhunen (1946), Pougachev (1953) and Obukhov (1954) have each been credited with independently discovering the KLT under one of its many titles. The KLT has applications in almost any scientific field.

6 Karhunen-Loève Transform. The KLT has been widely used in: studies of turbulence; thermal/chemical reactions; feed-forward and feedback control design applications (the KLT is used to obtain a reduced-order model for simulations or control design); and data analysis or compression (characterization of human faces, map generation by robots and freight traffic prediction).

7 Karhunen-Loève Transform. One of the most important matrix factorizations is the Singular Value Decomposition (SVD). The SVD has many properties that are desirable in many applications. Principal Component Analysis (PCA) is an application of the SVD: it identifies patterns in data, expressing the data in such a way as to highlight its similarities and differences.

8 Karhunen-Loève Transform. To make things easy, the name Principal Component Analysis (PCA) will be used from now on instead of KLT or SVD. In our field of signal/image processing, this is the name by which the Karhunen-Loève Transform is usually known. What is Principal Component Analysis? Patterns can be hard to find in high-dimensional data, where the luxury of graphical representation is not available.

9 Principal Component Analysis. So, use PCA to analyze the data. Once the data patterns have been found, reduce the number of data dimensions (without much loss of information) by compressing the data; this makes it easier to visualize the hidden data pattern. PCA basically analyzes the data in order to reduce its dimensionality, eliminate redundancy (superposition) and describe the data better using linear combinations of the original variables.

10 Data Presentation. Example: 53 blood and urine measurements from 65 people (33 alcoholics, 32 non-alcoholics). (The slide shows the measurements in two ways: a matrix format, with one row per person and one column per measurement such as H-WBC, H-RBC, H-Hgb, H-Hct, H-MCV, H-MCH and H-MCHC, and a spectral format, plotting each measurement value against the measurement index; the numeric values are not reproduced here.)

11 Data Presentation. (The slide shows the same data as univariate plots of single measurements such as M-EPI, H-Bands and C-LDH against person, a bivariate scatter plot of C-LDH against C-Triglycerides, and a trivariate plot adding a third measurement.)

12 Data Presentation. Is there a better presentation than the common Cartesian axes? That is, do we really need a space with 53 dimensions to view the data? This raises the question of how to find the best low-dimensional space that conveys the maximum useful information. The answer is: find the Principal Components!

13 Principal Components. All of the Principal Components (PCs) start at the origin of the coordinate axes. The first PC is the direction of maximum variance from the origin. Each subsequent PC is orthogonal to the previous PCs and describes the maximum residual variance.

14 Algebraic Interpretation, n-D Case. Let's say that m points in a space with n dimensions (n large) are given. Now, how does one project these m points onto a low-dimensional space while preserving the broad trends in the data and allowing it to be visualized?

15 Algebraic Interpretation, 1-D Case. Given m points in an n-dimensional space (n large), how does one project these m points onto a one-dimensional space? Simply choose a line that fits the data so that the points are spread out well along the line.

16 Algebraic Interpretation, 1-D Case. Formally, minimize the sum of squares of the distances to the line. Why the sum of squares? Because it allows fast minimization, assuming the line passes through the origin!

17 Algebraic Interpretation, 1-D Case. Minimizing the sum of squares of the distances to the line is the same as maximizing the sum of squares of the projections onto that line. Many thanks to Pythagoras!
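To make the equivalence explicit (a short derivation; u denotes a unit vector along the candidate line and x_i a data point): by the Pythagorean theorem, ||x_i||^2 = (u·x_i)^2 + d_i^2, where u·x_i is the length of the projection of x_i onto the line and d_i is the distance from x_i to the line. Since ||x_i||^2 is fixed by the data, minimizing the sum of the d_i^2 over all points is the same as maximizing the sum of the (u·x_i)^2.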

18 Basic Mathematical Concepts. Before getting to a description of PCA, this tutorial first introduces the mathematical concepts that will be used in PCA: standard deviation, covariance, and eigenvectors and eigenvalues. This background knowledge is meant to make the PCA section very easy, but it can be skipped if the concepts are already familiar.

19 Standard Deviation. The Standard Deviation (SD) of a data set is a measure of how spread out the data is: roughly speaking, the average distance from the mean of the data set to a point. The data sets [0, 8, 12, 20] and [8, 9, 11, 12] have the same mean (which is 10) but are quite different.

20 Standard Deviation. By means of the standard deviation it is possible to differentiate these two sets. As expected, the first set ([0, 8, 12, 20]) has a much larger standard deviation than the second set ([8, 9, 11, 12]), because its data is much more spread out from the mean.
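A quick numerical check of this claim (a minimal Matlab sketch using the two data sets from the slides; std returns the sample standard deviation, normalized by n-1):

a = [0 8 12 20];
b = [8 9 11 12];
mean(a), mean(b)    % both means are 10
std(a)              % approximately 8.33: widely spread data
std(b)              % approximately 1.83: tightly clustered data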

21 Variance. Variance is another measure of the spread of data in a data set. In fact it is almost identical to the standard deviation: the variance is simply the standard deviation squared. Variance is introduced, in addition to the standard deviation, to provide a solid platform from which the next topic, covariance, can be launched.

22 Covariance. Both standard deviation and variance are purely one-dimensional measures. However, many data sets have more than one dimension. The aim of the statistical analysis of these kinds of data sets is usually to see whether there is any relationship between their dimensions.

23 Covariance. Standard deviation and variance only operate on one-dimensional data, so it is only possible to calculate them for each dimension of the data set independently of the other dimensions. However, it is useful to have a similar measure that finds out how much the dimensions vary from the mean with respect to each other.

24 Covariance. Covariance is always calculated between two dimensions. With 3-D data (X, Y, Z), the covariance is calculated between (X, Y), (X, Z) and (Y, Z). With an n-dimensional data set, n!/(2(n-2)!) = n(n-1)/2 different covariance values can be calculated. The covariance calculated between a dimension and itself gives the variance: the covariances between (X, X), (Y, Y) and (Z, Z) give the variances of the X, Y and Z dimensions.

25 Covariance Matrix. As an example, let's write down the covariance matrix for an imaginary 3-dimensional data set, with the usual dimensions x, y and z. In this case the covariance matrix has three rows and three columns, with these values:

C = [ cov(x,x)  cov(x,y)  cov(x,z)
      cov(y,x)  cov(y,y)  cov(y,z)
      cov(z,x)  cov(z,y)  cov(z,z) ]

26 Covariance Matrix. Down the main diagonal, the covariance is computed between a dimension and itself, which is just the variance of that dimension. Since cov(a,b) = cov(b,a), the covariance matrix is symmetric about the main diagonal.
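As a small illustration of these properties (a sketch with made-up 3-dimensional data; Matlab's cov expects observations in rows and dimensions in columns):

X = randn(100, 3);        % 100 made-up observations of three dimensions x, y, z
C = cov(X);               % the 3-by-3 covariance matrix
norm(C - C', 'fro')       % essentially zero: C is symmetric about the main diagonal
diag(C)'                  % the variances of x, y and z ...
var(X)                    % ... which match the variances computed dimension by dimension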

27 Eigenvectors and Eigenvalues. A vector v is an eigenvector of a square (m by m) matrix M if M*v (multiplication of the matrix M by the vector v) gives a multiple of v, i.e., λ*v (multiplication of the scalar λ by the vector v). In this case, λ is called the eigenvalue of M associated with the eigenvector v.
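In Matlab this definition can be checked directly (a minimal sketch with an arbitrary small symmetric matrix):

M = [2 1; 1 3];           % a small symmetric matrix
[V, D] = eig(M);          % columns of V are eigenvectors, diag(D) are the eigenvalues
v = V(:, 1);
lambda = D(1, 1);
M*v - lambda*v            % essentially the zero vector: M*v equals lambda*v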

28 Eigenvector Properties. Eigenvectors can only be found for square matrices, and not every square matrix has (real) eigenvectors. An m by m matrix has at most m linearly independent eigenvectors; when a full set exists (as it does for the symmetric matrices used here), there are exactly m of them. For example, a 3 by 3 matrix with a full set of eigenvectors has three of them.

29 Eigenvector Properties. Even if an eigenvector is scaled by some amount before being multiplied, one still gets the same multiple of it as a result. This is because scaling a vector only changes its length, not its direction.

30 Eigenvector Properties. All the eigenvectors of a symmetric matrix (such as a covariance matrix) are perpendicular (orthogonal), i.e., at right angles to each other, no matter how many dimensions the matrix has. This is important because it means that the data can be expressed in terms of these perpendicular eigenvectors, instead of in terms of the original axes.

31 The PCA Method. Step 1: Get some data to use in a simple example. I am going to use my own two-dimensional data set. I have chosen a two-dimensional data set because I can provide plots of the data to show what PCA is doing at each step. The data I have used is shown on the next slide.

32 The PCA Method. The data used in this example is shown here: a small table with two columns, heights (alturas) and weights (pesos). (The numeric values are shown on the slide and are not reproduced here.)

33 The PCA Method. Step 2: Subtract the mean. For PCA to work properly, you have to subtract the mean from each of the data dimensions. The mean subtracted is the average of each dimension: all the x values have the mean of x subtracted from them, and all the y values have the mean of y subtracted from them. This produces a data set whose mean is zero.

34 The PCA Method. The data with its mean subtracted (the adjusted data) is shown here as a table of heights and weights; both the data and the adjusted data are plotted on the next slide. (The numeric values are shown on the slide and are not reproduced here.)
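As a sketch of this step in Matlab (the variable names and numeric values below are mine, since the original table is not reproduced here; the data is assumed to have one observation per row, with heights in the first column and weights in the second):

Data = [1.65 60; 1.70 68; 1.75 72; 1.80 80; 1.68 62];   % made-up heights (m) and weights (kg)
DataAdjust = bsxfun(@minus, Data, mean(Data));          % subtract each column's mean
mean(DataAdjust)                                        % both column means are now (numerically) zero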

35 The PCA Method. (Plots of the original data and of the mean-adjusted data.)

36 The PCA Method. Step 3: Calculate the covariance matrix. Since the data is two-dimensional, the covariance matrix will have two rows and two columns (the numeric 2x2 matrix C is shown on the slide). One should notice that heights and weights normally increase together. As the off-diagonal elements of this covariance matrix are positive, we should expect that the x and y variables increase together.
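Continuing the sketch from Step 2 (again with made-up numbers rather than the slide's values):

C = cov(DataAdjust);      % 2-by-2 covariance matrix; cov subtracts the mean itself,
                          % so cov(Data) would give the same result
C(1, 2)                   % positive off-diagonal entry: height and weight increase together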

37 The PCA Method. Step 4: Calculate the eigenvectors and eigenvalues. In Matlab this step is performed with the eig command (which works only on square matrices, such as the covariance matrix) or the svd command (which works on matrices of any shape, such as the data matrix). As the data matrix is not square, only svd can be applied to it directly. The eigenvectors and eigenvalues are rather important, giving useful information about the data.

38 The PCA Method. Step 4 (continued). The eigenvalues are obtained from the singular values found along the diagonal of the matrix S (diag(S) in Matlab), and the eigenvectors are the columns of the matrix V returned by svd. (The numeric eigenvalues and eigenvectors for this data set are shown on the slide.)
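A sketch of this step, continuing with the matrices from the previous snippets. The singular values s_i on the diagonal of S are related to the eigenvalues of the covariance matrix by lambda_i = s_i^2/(m-1), where m is the number of observations:

[V, D] = eig(C);                         % eigen-decomposition of the 2-by-2 covariance matrix
[U, S, W] = svd(DataAdjust, 'econ');     % SVD of the (non-square) mean-adjusted data matrix
diag(S).^2 / (size(DataAdjust,1) - 1)    % eigenvalues recovered from the singular values ...
sort(diag(D), 'descend')                 % ... match the eigenvalues of C (up to round-off)
% The columns of W (right singular vectors) match the columns of V up to sign and ordering.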

39 The PCA Method. Looking at the plot of the adjusted data shown here, one can see that it has quite a strong pattern. As expected from the covariance matrix (and from common sense), both of the variables increase together.

40 The PCA Method. On top of the adjusted data I have plotted both eigenvectors as well (appearing as a red and a green line). As stated earlier, they are perpendicular to each other. More important than this is that they provide information about the data patterns. One of the eigenvectors goes right through the middle of the points, drawing a line of best fit.

41 The PCA Method. The first eigenvector (the one plotted in green) shows us that the two variables are strongly related along that line. The second eigenvector (the one plotted in red) gives the other, less important, pattern in the data: all the points follow the main line, but are off to its side by some amount.

42 The PCA Method. By the process of taking the eigenvectors of the covariance matrix, we have been able to extract lines that characterize the data. The rest of the steps involve transforming the data so that this data is expressed in terms of these lines.

43 The PCA Method. Recalling the important aspects of the previous figure: the two lines are perpendicular (orthogonal) to each other; the eigenvectors provide us with a way to see hidden patterns in the data; and one of the eigenvectors draws the line that best fits the data.

44 The PCA Method. Step 5: Choosing components and forming a feature vector. Here comes the notion of data compression and reduced dimensionality. Eigenvalues have different values: the highest one corresponds to the eigenvector that is the principal component of the data set (the most significant relationship between the data dimensions).

45 The PCA Method. Once the eigenvectors are found, they are ordered by their eigenvalues, from highest to lowest. This gives the components in order of significance. The less significant components can be ignored. Some information is lost, but if the discarded eigenvalues are small, the amount lost is not too much.

46 The PCA Method. If some components are left out, the final data set will have fewer dimensions than the original. If the original data set has n dimensions, n eigenvectors are calculated (together with their eigenvalues), and only the first p eigenvectors are kept, then the final data set will have only p dimensions.

47 The PCA Method. Now, what needs to be done is to form a feature vector (a fancy name for a matrix of vectors). This feature vector is constructed by taking the eigenvectors that are to be kept from the list of eigenvectors and forming a matrix with them in its columns:

Feature_Vector = [ eigenvector_1, eigenvector_2, ..., eigenvector_n ]
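A minimal Matlab sketch of this step (the names are mine; note that eig does not guarantee any particular ordering of the eigenvalues, so they are sorted explicitly):

[V, D] = eig(C);                          % eigenvectors (columns of V) and eigenvalues
[eigenvalues, order] = sort(diag(D), 'descend');
V = V(:, order);                          % most significant eigenvector first
p = 1;                                    % number of components to keep (p <= n)
FeatureVector = V(:, 1:p);                % n-by-p matrix with the kept eigenvectors as columns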

48 The PCA Method. Using the data set seen before, and the fact that there are two eigenvectors, there are two choices. One is to form a feature vector with both of the eigenvectors (the numeric 2x2 matrix of eigenvectors is shown on the slide).

49 The PCA Method. The other is to form a feature vector that leaves out the smaller, less significant component and has only a single column. (The slide shows the eigenvalues and eigenvectors with the most significant and less significant ones labelled.)

50 The PCA Method. In other words, the result is a feature vector with p vectors selected from the n eigenvectors (where p < n). This is the most common option. (The slide shows the most significant eigenvalue and its eigenvector, which are the ones kept.)

51 The PCA Method. Step 6: Deriving the new data set. This is the final step in PCA (and the easiest one). Choose the components (eigenvectors) to be kept and form a feature vector; just remember that the eigenvector with the highest eigenvalue is the principal component of the data set. Then take the transpose of the feature vector and multiply it on the left of the transposed mean-adjusted data set.

52 The PCA Method.

Final_Data = RowFeatureVector * RowDataAdjusted

The matrix called RowFeatureVector is the feature vector transposed, so that the eigenvectors are now in its rows, with the most significant one at the top. The matrix called RowDataAdjusted is the mean-adjusted data transposed, so that the data items are in its columns, with each row holding a separate dimension.
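Continuing the sketch, this step is a single matrix product (RowFeatureVector and RowDataAdjusted are just the transposes of the matrices built earlier):

RowFeatureVector = FeatureVector';                 % kept eigenvectors in rows, most significant on top
RowDataAdjusted  = DataAdjust';                    % dimensions in rows, one data item per column
FinalData = RowFeatureVector * RowDataAdjusted;    % p-by-m: the data expressed in the new basis
% Approximate reconstruction in the original axes (exact if all eigenvectors were kept):
Reconstructed = bsxfun(@plus, (RowFeatureVector' * FinalData)', mean(Data));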

53 The PCA Method. This sudden transposing of all the data is confusing, but the equations from now on are easier if the transposes of the feature vector and of the data are taken first. Better than having to always carry a little T symbol above their names! Final_Data is the final data set, with data items in columns and dimensions along the rows.

54 The PCA Method. The original data is now given only in terms of the chosen vectors. The original data set was written in terms of the x and y axes. The data can be expressed in terms of any axes, but the expression is most efficient if these axes are perpendicular. This is why it was important that the eigenvectors are perpendicular to each other.

55 The PCA Method. So, the original data (expressed in terms of the x and y axes) is now expressed in terms of the eigenvectors found. If reduced dimensionality is needed (throwing some of the eigenvectors out), the new data will be expressed in terms of only the vectors that were kept.

56 The PCA Method. (Figure only; no text on this slide.)

57 The PCA Method. Among all possible orthogonal transforms, PCA is optimal in the following sense: the KLT completely decorrelates the signal, and the KLT maximally compacts the energy (in other words, the information) contained in the signal. But PCA is computationally expensive and should not be used carelessly. Instead, one can use the Discrete Cosine Transform (DCT), which approaches the KLT in this sense.
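The decorrelation property is easy to verify numerically with the small example above (a sketch): projecting the mean-adjusted data onto all of the eigenvectors gives transformed data whose covariance matrix is diagonal, with the eigenvalues on the diagonal.

Transformed = DataAdjust * V;   % project every data item onto every eigenvector
cov(Transformed)                % (numerically) diagonal: the transformed variables are uncorrelated,
                                % and the diagonal entries are the eigenvalues of C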

58 The PCA Method: Examples. Here we switch to Matlab in order to run some examples that (I sincerely hope) may clarify things: project the data onto the principal component axes, show the rank-one approximation, and compress an image by reducing the number of its coefficients (PCA.m), much as one would with the DCT; and show the difference between least squares and PCA, and align 3D models using PCA properties (SVD.m).

59 The PCA Method: Examples. Some things should be noticed about the power of PCA to compress an image (as seen in the PCA.m example). The amount of memory required to store an uncompressed image of size m x n is M_image = m*n. So the amount of memory needed to store an image grows with the product of its dimensions (quadratically, if both dimensions grow together).

60 The PCA Method: Examples. But the amount of memory required to store a rank-k SVD approximation of the same m x n image is M_approx = k(m + n + 1). So the memory required grows only linearly as the dimensions get larger, as opposed to quadratically. Thus, the larger the image, the more memory is saved by using the SVD.
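To put numbers on this (a small sketch; the image size and the rank k are arbitrary choices, not values from the slides):

m = 1024; n = 768; k = 50;      % a hypothetical image size and approximation rank
M_image  = m * n                % 786432 values for the uncompressed image
M_approx = k * (m + n + 1)      % 89650 values for the rank-k SVD approximation
M_approx / M_image              % about 0.11, i.e. roughly a 9x saving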

61 The PCA Method: Examples. Perform face recognition using the Principal Component Analysis approach! This is accomplished using a technique known in the literature as the Eigenface Technique. We will see an example of how to do it using a well-known face database, the AT&T Database of Faces, and two Matlab functions: facerecognitionexample.m and loadfacedatabase.m.

62 What is the Eigenface Technique? The idea is that face images can be economically represented by their projections onto a small number of basis images, derived by finding the most significant eigenvectors of the pixel-wise covariance matrix of a set of training images. A lot of people like to play with this technique, but in this tutorial I will simply show how to obtain some eigenfaces and play with them in Matlab.

63 AT&T Database of Faces. The AT&T Database of Faces contains a set of face images used in the context of a face recognition project: ten different images of each of 40 distinct subjects, taken at different times (varying lighting, facial details and expressions) against a dark homogeneous background, with the subjects in an upright, frontal position (some side movement was tolerated).

64 AT&T Database of Faces. The images have a size of 92x112 pixels (in other words, 10,304 pixels) with 256 grey levels per pixel, and are organized in 40 directories (one for each subject), each containing the ten images of that subject. Matlab can read PNG files and other image formats without help, so it is relatively easy to load the whole face database into Matlab's workspace and process it.

65 Getting the Faces Into One Big Matrix. First of all, we need to put all the faces of the database into one huge matrix with 112*92 = 10,304 rows and 400 columns. This step is done by the function called loadfacedatabase.m: it reads all the images, makes a column vector out of each one of them, puts them all together and returns the result.
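A minimal sketch of what such a loader might look like. This is not the author's loadfacedatabase.m; it assumes the usual layout of the AT&T database (directories s1 ... s40, each containing images 1.pgm ... 10.pgm, under a root folder here called att_faces) and that the files are readable by imread:

faces = zeros(112*92, 400);                 % 10,304 rows, one column per face image
col = 0;
for subject = 1:40
    for img = 1:10
        col = col + 1;
        I = imread(fullfile('att_faces', sprintf('s%d', subject), sprintf('%d.pgm', img)));
        faces(:, col) = double(I(:));       % make a column vector out of the image
    end
end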

66 Getting the Recognition to Work. Here we switch to Matlab directly, because the steps needed to perform the face recognition task are better explained by looking at the function called facerecognitionexample.m. All the steps necessary for this task are carried out in that function, and it is ready to be executed and commented on.
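As a rough outline of what such a function does (this is not the author's facerecognitionexample.m, only a sketch under the assumption that the faces matrix from the previous snippet is available; an economy-size SVD is used instead of forming the huge pixel-wise covariance matrix explicitly):

meanFace = mean(faces, 2);
A = bsxfun(@minus, faces, meanFace);     % 10,304-by-400 matrix of mean-centred faces
[U, S, V] = svd(A, 'econ');              % columns of U are the eigenfaces
k = 50;                                  % number of eigenfaces to keep (an arbitrary choice)
Eigenfaces  = U(:, 1:k);
trainCoeffs = Eigenfaces' * A;           % k-by-400: every training face projected onto the eigenfaces
% Recognize a probe image (hypothetical file name) by nearest neighbour in eigenface space:
probe = double(imread('probe.pgm'));     % assumed to be a 112-by-92 greyscale image
w = Eigenfaces' * (probe(:) - meanFace);
[~, best] = min(sum(bsxfun(@minus, trainCoeffs, w).^2, 1));   % index of the closest training face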

67 Cases Where PCA Fails (1). PCA projects data onto a set of orthogonal vectors (the principal components). This restricts the new components to be linear combinations of the old ones. However, there are cases where the intrinsic degrees of freedom of the data cannot be expressed as linear combinations of the input components. In such cases PCA will overestimate the input dimensionality.

68 Cases Where PCA Fails (1). So, PCA is not capable of finding the nonlinear intrinsic dimension of the data (like the angle between the two vectors in the example above). Instead, it will find two components of equal importance.

69 Cases Where PCA Fails (2). In cases where components with small variability really matter, PCA will make mistakes due to its unsupervised nature. In such cases, if we keep only the projections onto the leading principal components as input, the two classes of data can become indistinguishable.

70 Any (Reasonable) Doubts?
