The Principal Component Analysis


1 The Principal Component Analysis

Philippe B. Laval, KSU, Fall 2017

2 Introduction

Every 80 minutes, the two Landsat satellites go around the world, recording images of our planet along a 185 km wide path. Every 16 days, each satellite covers the entire surface of the planet. Every 8 days, every single location on the planet can be monitored. These images can be used by urban planners to study the rate and direction of population growth. They can be used to analyze soil moisture, vegetation growth, lakes and rivers. Governments can use them to detect and assess damage from natural disasters, and so on. Sensors on board the satellites acquire seven simultaneous images of every region by recording energy from separate wavelength bands. Each image is digitized and stored as a matrix, each number indicating the signal intensity of the corresponding pixel. Each of the seven images is one channel of a multichannel or multispectral image. The seven images of one fixed region typically contain redundant information. Yet certain features will appear in only one or two of the images, because their color or temperature is captured by only one or two of the seven sensors.

3 Introduction

Principal Component Analysis (PCA) is an effective way to suppress redundant information and provide in only one or two composite images most of the information from the initial data. Roughly speaking, the goal is to find a special linear combination of the images, one that combines the seven values of each pixel into one single new value. PCA can be applied to any data that consists of lists of measurements made on a collection of objects or individuals. For example, consider a chemical process that produces a plastic material. To monitor the process, 300 samples of the material produced are taken and each sample is subjected to a battery of eight tests. The lab report for each sample is a vector in $\mathbb{R}^8$ (one entry per test). Since there are 300 samples, the set of these vectors forms an $8 \times 300$ matrix. Such a matrix is called the matrix of observations. Loosely speaking, we say that the process control data is eight-dimensional.

4 Introduction

Example. Suppose we have $N$ college students for whom we record their weights and heights. This is an example of two-dimensional data. Let $X_j$ be the observation vector in $\mathbb{R}^2$ for the $j$th student, that is,
$$X_j = \begin{bmatrix} w_j \\ h_j \end{bmatrix},$$
where $w_j$ is the weight (in kg) and $h_j$ is the height (in m) of the $j$th student. Then the matrix of observations has the form
$$\begin{bmatrix} w_1 & w_2 & \cdots & w_N \\ h_1 & h_2 & \cdots & h_N \end{bmatrix} = [X_1 \; X_2 \; \cdots \; X_N].$$
Here is a possible matrix of $N$ observations (randomly generated, with height in meters and weight in kilograms) for $N = 25$, visualized on the scatter plot shown on the next slide.

5 Introduction

Figure: Scatter Plot of Observations

6 Introduction

We will use $[X_1 \; X_2 \; \cdots \; X_N]$ to denote the matrix of observations. It will be a $p \times N$ matrix, meaning that we are observing (measuring) $p$ different variables and we collect $N$ samples of the measurements of these variables. We will let
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix}$$
be the generic vector of all the variables we are observing. Each $X_i$ will denote one measurement of these variables, the $i$th measurement. We will denote its components by
$$X_i = \begin{bmatrix} x_1^i \\ x_2^i \\ \vdots \\ x_p^i \end{bmatrix}.$$
When the number of variables being observed is small, we can use different names for the variables, as we did in the example above.

7 Mean and Covariance

Definition (Sample Mean). Let $[X_1 \; X_2 \; \cdots \; X_N]$ be a $p \times N$ matrix of observations (meaning that for each object we have $p$ observations; each $X_i$ is a $p \times 1$ vector). The sample mean $M$ of the observation vectors $X_1, X_2, \ldots, X_N$ is given by
$$M = \frac{1}{N} \sum_{i=1}^{N} X_i.$$
The reader will note that $M$ is also a $p \times 1$ vector. Geometrically, the sample mean is the point at the "center" of the scatter plot. For $i = 1, 2, \ldots, N$, we define $\hat{X}_i = X_i - M$ and $B = [\hat{X}_1 \; \hat{X}_2 \; \cdots \; \hat{X}_N]$. The columns of $B$ have a zero sample mean.
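To make the definitions concrete, here is a minimal NumPy sketch (not from the original slides) that computes the sample mean $M$ and the matrix $B$ for a small, hypothetical $2 \times N$ matrix of weights and heights like the one in the earlier example:

```python
import numpy as np

# Hypothetical p x N matrix of observations: row 0 = weight (kg), row 1 = height (m).
A = np.array([[62.0, 75.0, 58.0, 80.0, 69.0],
              [1.60, 1.78, 1.55, 1.83, 1.70]])

N = A.shape[1]
M = A.mean(axis=1, keepdims=True)   # sample mean M = (1/N) * sum of the X_i, a p x 1 vector
B = A - M                           # subtract M from every column: B = [X_1 - M, ..., X_N - M]

print(M)
print(B.mean(axis=1))               # each row of B now averages to (numerically) zero
```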

8 Mean and Covariance

Definition. If the columns of a matrix $B$ have a zero sample mean, then $B$ is said to be in mean-deviation form.

Example. The scatter plot of the observation matrix of the previous example, put in mean-deviation form, is shown in the picture on the next slide. You will note that the "center" of the data is now at the origin.

9 Mean and Covariance

Figure: Observation Data in Mean-Deviation Form

10 Mean and Covariance

Definition. The sample covariance matrix, or simply the covariance matrix, is the $p \times p$ matrix defined by
$$S = \frac{1}{N-1} B B^T.$$
Since any matrix of the form $B B^T$ is symmetric and positive semidefinite, so is $S$. Recall that a matrix $A$ is positive definite (respectively, positive semidefinite) if for every nonzero vector $x$, $x^T A x > 0$ (respectively, $x^T A x \geq 0$).
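Continuing the sketch above (same hypothetical data), the covariance matrix follows directly from the definition, and we can check the two properties just stated:

```python
import numpy as np

# Mean-deviation form B of the hypothetical weight/height data from the previous sketch.
A = np.array([[62.0, 75.0, 58.0, 80.0, 69.0],
              [1.60, 1.78, 1.55, 1.83, 1.70]])
N = A.shape[1]
B = A - A.mean(axis=1, keepdims=True)

S = (B @ B.T) / (N - 1)                 # sample covariance matrix, p x p

print(S)
print(np.allclose(S, S.T))              # S is symmetric
print(np.all(np.linalg.eigvalsh(S) >= -1e-12))  # and positive semidefinite
```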

11 Mean and Covariance

Example. Suppose that measurements are made on 4 individuals and the observation vectors are
$$X_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad X_2 = \begin{bmatrix} 4 \\ 2 \\ 13 \end{bmatrix}, \quad X_3 = \begin{bmatrix} 7 \\ 8 \\ 1 \end{bmatrix}, \quad X_4 = \begin{bmatrix} 8 \\ 4 \\ 5 \end{bmatrix}.$$
Compute the sample mean and the covariance matrix.

12 Mean and Covariance

We should have found that the sample mean is
$$M = \begin{bmatrix} 5 \\ 4 \\ 5 \end{bmatrix}.$$
The corresponding matrix in mean-deviation form is
$$B = \begin{bmatrix} -4 & -1 & 2 & 3 \\ -2 & -2 & 4 & 0 \\ -4 & 8 & -4 & 0 \end{bmatrix}.$$
The covariance matrix is
$$S = \begin{bmatrix} 10 & 6 & 0 \\ 6 & 8 & -8 \\ 0 & -8 & 32 \end{bmatrix}.$$
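As a quick check, this short NumPy sketch reproduces the computation for the four observation vectors above:

```python
import numpy as np

X = np.array([[1.0, 4.0, 7.0, 8.0],     # columns are X_1, ..., X_4
              [2.0, 2.0, 8.0, 4.0],
              [1.0, 13.0, 1.0, 5.0]])
N = X.shape[1]

M = X.mean(axis=1, keepdims=True)       # sample mean: [5, 4, 5]^T
B = X - M                               # mean-deviation form
S = (B @ B.T) / (N - 1)                 # covariance matrix

print(M.ravel())                        # [5. 4. 5.]
print(S)                                # [[10, 6, 0], [6, 8, -8], [0, -8, 32]]
```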

13 Mean and Covariance

We now discuss the meaning of the entries of the covariance matrix $S$, writing $S = [s_{ij}]$. We continue using the notation developed in this section.

14 Mean and Covariance

Definition (variance). For $i = 1, 2, \ldots, p$, the diagonal entry $s_{ii}$ is called the variance of $x_i$. The variance of $x_j$ measures the spread of the values of $x_j$. In the example above, from our computations, the variance of $x_1$ is 10, the variance of $x_2$ is 8, and the variance of $x_3$ is 32. What is important here is the relative size of these numbers. The fact that $32 > 10$ indicates that the set of third entries contains a wider spread than the set of first entries.

Definition (total variance). The total variance of the data is the sum of the diagonal entries of $S$. For a square matrix $S$ (recall $S$ is $p \times p$), the sum of the diagonal entries is called the trace of $S$ and is denoted $\operatorname{tr}(S)$. Thus
$$\text{total variance} = \operatorname{tr}(S).$$
In the example above, the total variance is $10 + 8 + 32 = 50$.
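In code, the variances and the total variance are just the diagonal and the trace of $S$; a tiny sketch using the $S$ computed above:

```python
import numpy as np

S = np.array([[10.0, 6.0, 0.0],
              [6.0, 8.0, -8.0],
              [0.0, -8.0, 32.0]])

variances = np.diag(S)            # variances of x_1, x_2, x_3: 10, 8, 32
total_variance = np.trace(S)      # total variance = tr(S) = 50
print(variances, total_variance)
```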

15 Mean and Covariance

Definition (covariance). The entry $s_{ij}$ of $S$, for $i \neq j$, is called the covariance of $x_i$ and $x_j$. In the example above, looking at $S$, we see that the covariance between entries 1 and 3 is 0. This means that $x_1$ and $x_3$ are uncorrelated.

16 PCA Using Eigenvalues

As above, let us assume that our original variables are represented by the $p \times 1$ vector $X$ and that we perform $N$ measurements, which we denote $X_1, X_2, \ldots, X_N$. For simplicity, we will assume that the $p \times N$ matrix of observations $[X_1 \; X_2 \; \cdots \; X_N]$ is already in mean-deviation form. If this were not the case, the first step would be to put it in that form. The goal of PCA is to find an orthogonal $p \times p$ matrix $P = [u_1 \; u_2 \; \cdots \; u_p]$ that determines a change of variables $X = PY$, that is,
$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} = [u_1 \; u_2 \; \cdots \; u_p] \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix},$$
with the property that the new variables $y_1, y_2, \ldots, y_p$ are uncorrelated and are arranged in order of decreasing variance.

17 PCA Using Eigenvalues

A question: From the statements on the previous slide, what can we say about the covariance matrix corresponding to the new variables? Another question: How can we achieve the answer to the previous question?

18 PCA Using Eigenvalues

The orthogonal change of variables $X = PY$ means that each observation vector $X_i$ is transformed into a new vector $Y_i$ such that $X_i = P Y_i$. Therefore $Y_i = P^{-1} X_i = P^T X_i$, since $P$ is assumed to be orthogonal. This is true for $i = 1, 2, \ldots, N$. It is not hard to see (see homework) that for each orthogonal matrix $P$, the covariance matrix for $Y_1, Y_2, \ldots, Y_N$ is $P^T S P$, where $S$ is the covariance matrix for $X_1, X_2, \ldots, X_N$. So the desired orthogonal matrix $P$ is the one that makes $P^T S P$ diagonal (why?). Let $D$ be a diagonal matrix with the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ of $S$ on the diagonal, arranged so that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$, and let $P$ be an orthogonal matrix whose columns are the unit eigenvectors $u_1, u_2, \ldots, u_p$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$. Then $S = P D P^T$ and $D = P^T S P$.
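A minimal NumPy sketch of this diagonalization step, using the covariance matrix $S$ of the worked example; note that np.linalg.eigh returns eigenvalues in increasing order, so we reverse them to get $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$:

```python
import numpy as np

S = np.array([[10.0, 6.0, 0.0],
              [6.0, 8.0, -8.0],
              [0.0, -8.0, 32.0]])

lam, P = np.linalg.eigh(S)              # eigh: eigenpairs of a symmetric matrix, ascending
lam, P = lam[::-1], P[:, ::-1]          # reorder so lambda_1 >= lambda_2 >= ... >= lambda_p

D = np.diag(lam)
print(np.allclose(P.T @ S @ P, D))      # P^T S P = D: the new variables are uncorrelated
print(np.allclose(P @ P.T, np.eye(3)))  # P is orthogonal
```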

19 PCA Using Eigenvalues

Definition. The unit eigenvectors $u_1, u_2, \ldots, u_p$ of the covariance matrix $S$ are called the principal components of the data in the matrix of observations. The first principal component is the eigenvector corresponding to the largest eigenvalue of $S$, the second principal component is the eigenvector corresponding to the second largest eigenvalue of $S$, and so on.

The first principal component $u_1$ determines the new variable $y_1$ in the following way. Let $c_1, c_2, \ldots, c_p$ be the entries of $u_1$. Since $u_1^T$ is the first row of $P^T$, the equation $Y = P^T X$ shows that
$$y_1 = u_1^T X = \sum_{i=1}^{p} c_i x_i.$$
Thus $y_1$ is a linear combination of the original variables using the entries of $u_1$ as weights. In a similar way, $u_2$ determines the variable $y_2$, and so on.

20 PCA Using Eigenvalues: Summary

Given a matrix of observations $B = [X_1 \; X_2 \; \cdots \; X_N]$, which is assumed to be in mean-deviation form, to perform a principal component analysis we do the following:

1. Find the covariance matrix $S = \frac{1}{N-1} B B^T$.
2. Diagonalize $S$ using eigenvalues. We let $\lambda_1, \lambda_2, \ldots, \lambda_p$ be the eigenvalues of $S$, arranged so that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$, and let $P$ be an orthogonal matrix whose columns are the unit eigenvectors $u_1, u_2, \ldots, u_p$ corresponding to these eigenvalues.
3. The unit eigenvectors $u_1, u_2, \ldots, u_p$ of the covariance matrix $S$ are the principal components of the data in the matrix of observations.
4. The new uncorrelated variables are defined by $Y = P^T X$, where $Y$, like $X$, is $p \times 1$. The covariance matrix of $Y$ is $P^T S P$.
5. The components $y_1, y_2, \ldots, y_p$ of $Y$ can be expressed in terms of the components $x_1, x_2, \ldots, x_p$ of $X$ by $y_i = u_i^T X$.
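Putting the steps together, here is a sketch of the whole eigenvalue-based procedure as one function (the name pca_eig is mine, not from the slides); it takes a matrix of observations already in mean-deviation form and returns the principal components, their variances, and the new variables:

```python
import numpy as np

def pca_eig(B):
    """PCA of a p x N matrix of observations B in mean-deviation form."""
    N = B.shape[1]
    S = (B @ B.T) / (N - 1)             # step 1: covariance matrix
    lam, P = np.linalg.eigh(S)          # step 2: diagonalize S
    lam, P = lam[::-1], P[:, ::-1]      # eigenvalues in decreasing order
    Y = P.T @ B                         # steps 4-5: new uncorrelated variables y_i = u_i^T X
    return P, lam, Y                    # columns of P are the principal components (step 3)

# Example: the four observation vectors from the worked example, centered first.
X = np.array([[1.0, 4.0, 7.0, 8.0],
              [2.0, 2.0, 8.0, 4.0],
              [1.0, 13.0, 1.0, 5.0]])
B = X - X.mean(axis=1, keepdims=True)
P, lam, Y = pca_eig(B)
print(lam)                              # variances of y_1, y_2, y_3 in decreasing order
```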

21 PCA Using Eigenvalues

Example. Suppose we are given the covariance matrix $S$ of some data. Find the principal components of the data, list the new variable determined by the first principal component, and give the diagonal form of $S$.

22 Dimension Reduction

PCA is valuable when most of the variation in the data is due to variations in only a few of the new variables $y_1, y_2, \ldots, y_p$. It can be shown that an orthogonal change of variables $X = PY$ does not change the total variance of the data. This means that if $S = P D P^T$, then
$$\{\text{total variance of } x_1, x_2, \ldots, x_p\} = \{\text{total variance of } y_1, y_2, \ldots, y_p\} = \operatorname{tr}(D) = \lambda_1 + \lambda_2 + \cdots + \lambda_p.$$
The variance of $y_i$ is $\lambda_i$, and the quotient $\lambda_i / \operatorname{tr}(S)$ measures the fraction of the total variance captured by $y_i$. We can eliminate the variables for which $\lambda_i / \operatorname{tr}(S)$ is small.
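As a sketch, the fractions $\lambda_i / \operatorname{tr}(S)$ and their running totals can be computed in a couple of lines, here for the worked example's $S$:

```python
import numpy as np

S = np.array([[10.0, 6.0, 0.0],
              [6.0, 8.0, -8.0],
              [0.0, -8.0, 32.0]])

lam = np.linalg.eigvalsh(S)[::-1]       # eigenvalues in decreasing order
fractions = lam / np.trace(S)           # lambda_i / tr(S): fraction of total variance per y_i

print(fractions)                        # drop the y_i whose fraction is small
print(fractions.cumsum())               # cumulative variance captured by y_1, ..., y_k
```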

23 Dimension Reduction

Example. In the example above, compute the percentage of the total variance captured by each new variable.

Example. Consider the following data, recording the weight (in lb) and height (in in) of five boys:

Boy          #1   #2   #3   #4   #5
Weight (lb)
Height (in)

1. Find the covariance matrix for the data.
2. Make a principal component analysis of the data to find a single size index that explains most of the variation in the data, as sketched below with hypothetical values.
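The numerical entries of this exercise did not survive transcription, so here is a sketch with made-up weights and heights (hypothetical data, not the original exercise values) showing how the first principal component yields a single size index:

```python
import numpy as np

# Hypothetical data for five boys: row 0 = weight (lb), row 1 = height (in).
A = np.array([[120.0, 125.0, 125.0, 135.0, 145.0],
              [61.0,  60.0,  64.0,  68.0,  72.0]])

N = A.shape[1]
B = A - A.mean(axis=1, keepdims=True)   # mean-deviation form
S = (B @ B.T) / (N - 1)                 # part 1: covariance matrix

lam, P = np.linalg.eigh(S)
lam, P = lam[::-1], P[:, ::-1]
u1 = P[:, 0]                            # part 2: first principal component

size_index = u1 @ B                     # one "size" number per boy
print(size_index)
print(u1, lam[0] / lam.sum())           # index weights and the fraction of variance explained
```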

24 PCA Using SVD

The goal is the same as above; what changes is how we achieve it. As above, we will assume that the $p \times N$ matrix of observations $[X_1 \; X_2 \; \cdots \; X_N]$ is already in mean-deviation form. If this were not the case, the first step would be to put it in that form. Assume the SVD of the matrix of observations $[X_1 \; X_2 \; \cdots \; X_N]$ is $U \Sigma V^T$. Here $U$ is $p \times p$, $\Sigma$ is $p \times N$, and $V^T$ is $N \times N$, hence $V$ is $N \times N$. We define the new variable $Y$ by $X = UY$, or $Y = U^T X$. It can be shown that the covariance matrix corresponding to $Y$ is $\frac{1}{N-1} \Sigma \Sigma^T$, which is already in diagonal form; this is what we wanted. In this case, the principal components are the columns of $U$. If we let $U = [u_1 \; u_2 \; \cdots \; u_p]$, then the equation $Y = U^T X$ shows that
$$y_1 = u_1^T X = \sum_{i=1}^{p} c_i x_i,$$
where $c_1, c_2, \ldots, c_p$ are the entries of $u_1$. Thus $y_1$ is a linear combination of the original variables using the entries of $u_1$ as weights. In a similar way, $u_2$ determines $y_2$, and so on.

25 PCA Using SVD: Summary

Given a matrix of observations $B = [X_1 \; X_2 \; \cdots \; X_N]$, which is assumed to be in mean-deviation form, to perform a principal component analysis we do the following:

1. Find an SVD of $B$, say $B = U \Sigma V^T$.
2. If we let $U = [u_1 \; u_2 \; \cdots \; u_p]$, then we define $Y = U^T X$. The covariance matrix of $Y$ is $\frac{1}{N-1} \Sigma \Sigma^T$. $Y$ is the new vector of uncorrelated variables.
3. We let $\lambda_1, \lambda_2, \ldots, \lambda_p$ be the diagonal entries of $\frac{1}{N-1} \Sigma \Sigma^T$ (automatically arranged so that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$, since singular values are listed in decreasing order). Then $\lambda_i$ is the variance of $y_i$; these are the same eigenvalues of $S$ obtained with the eigenvalue method.
4. The unit vectors $u_1, u_2, \ldots, u_p$ are the principal components of the data in the matrix of observations.

We get all the components we were able to get with the eigenvalue diagonalization; we simply obtain them differently. MATLAB uses this technique, as it is a little more robust than the eigenvalue method.
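A sketch of the SVD route on the same centered data as before, checking that $\sigma_i^2/(N-1)$ reproduces the variances found by the eigenvalue route:

```python
import numpy as np

X = np.array([[1.0, 4.0, 7.0, 8.0],
              [2.0, 2.0, 8.0, 4.0],
              [1.0, 13.0, 1.0, 5.0]])
N = X.shape[1]
B = X - X.mean(axis=1, keepdims=True)   # mean-deviation form

U, sigma, Vt = np.linalg.svd(B)         # B = U Sigma V^T; sigma is in decreasing order
lam_svd = sigma**2 / (N - 1)            # variances of the new variables y_i

lam_eig = np.linalg.eigvalsh((B @ B.T) / (N - 1))[::-1]
print(np.allclose(lam_svd, lam_eig))    # same variances as the eigenvalue method

Y = U.T @ B                             # new uncorrelated variables
print(np.allclose(np.cov(Y), np.diag(lam_svd)))  # covariance of Y is diagonal
```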

26 PCA and MATLAB

Newer versions of MATLAB have a function to perform this, called pca. You can get help for it in MATLAB. Two important things to remember about MATLAB's pca:

1. It uses the SVD approach to find the principal components and related information.
2. It works differently than what is explained in these notes, in the following sense: in these notes, the columns of the matrix of observations are the observations and the rows correspond to the variables. MATLAB's matrix of observations is the transpose of ours: each row is an observation and each column is a variable. So if A is our matrix of observations, to use MATLAB's pca, call pca(A'), where A' is the transpose of A.
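The same row-versus-column convention appears outside MATLAB; for instance, scikit-learn's PCA also expects observations as rows, so our $p \times N$ matrix must be transposed first. A sketch (scikit-learn is my addition here, not something the slides use):

```python
import numpy as np
from sklearn.decomposition import PCA

A = np.array([[1.0, 4.0, 7.0, 8.0],     # our convention: rows = variables,
              [2.0, 2.0, 8.0, 4.0],     # columns = observations
              [1.0, 13.0, 1.0, 5.0]])

pca = PCA()                             # scikit-learn centers the data itself
scores = pca.fit_transform(A.T)         # pass the transpose: rows = observations

print(pca.components_)                  # principal components, one per row
print(pca.explained_variance_)          # the eigenvalues lambda_i of S
```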

27 Exercises

See the problems at the end of the notes on Principal Component Analysis.
