Principal Components Analysis

Slide 1: Principal Components Analysis
Nathaniel E. Helwig
Assistant Professor of Psychology and Statistics
University of Minnesota (Twin Cities)
Updated 16-Mar-2017

Slide 2: Copyright
Copyright (c) 2017 by Nathaniel E. Helwig

Slide 3: Outline of Notes
1) Background: Overview; Data, Cov, Cor; Orthogonal Rotation
2) Population PCs: Definition; Calculation; Properties
3) Sample PCs: Definition; Calculation; Properties
4) PCA in Practice: Covariance or Correlation?; Decathlon Example; Number of Components

Slide 4: Background

Slide 5: Definition and Purposes of PCA
Principal Components Analysis (PCA) finds linear combinations of variables that best explain the covariation structure of the variables. There are two typical purposes of PCA:
1. Data reduction: explain covariation between p variables using r < p linear combinations
2. Data interpretation: find features (i.e., components) that are important for explaining covariation

Slide 6: Data Matrix
The data matrix refers to the array of numbers
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ x_{31} & x_{32} & \cdots & x_{3p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}$$
where $x_{ij}$ is the j-th variable collected from the i-th item (e.g., subject).
- items/subjects are rows
- variables are columns
X is a data matrix of order n x p (# items by # variables).

Slide 7: Population Covariance Matrix
The population covariance matrix refers to the symmetric array
$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \cdots & \sigma_{2p} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & \cdots & \sigma_{3p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \sigma_{p3} & \cdots & \sigma_{pp} \end{bmatrix}$$
where
- $\sigma_{jj} = E([X_j - \mu_j]^2)$ is the population variance of the j-th variable
- $\sigma_{jk} = E([X_j - \mu_j][X_k - \mu_k])$ is the population covariance between the j-th and k-th variables
- $\mu_j = E(X_j)$ is the population mean of the j-th variable

Slide 8: Sample Covariance Matrix
The sample covariance matrix refers to the symmetric array
$$S = \begin{bmatrix} s_1^2 & s_{12} & s_{13} & \cdots & s_{1p} \\ s_{21} & s_2^2 & s_{23} & \cdots & s_{2p} \\ s_{31} & s_{32} & s_3^2 & \cdots & s_{3p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & s_{p3} & \cdots & s_p^2 \end{bmatrix}$$
where
- $s_j^2 = \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - \bar{x}_j)^2$ is the sample variance of the j-th variable
- $s_{jk} = \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$ is the sample covariance between the j-th and k-th variables
- $\bar{x}_j = (1/n)\sum_{i=1}^n x_{ij}$ is the sample mean of the j-th variable

Slide 9: Covariance Matrix from Data Matrix
We can calculate the (sample) covariance matrix as
$$S = \tfrac{1}{n-1} X_c' X_c$$
where $X_c = X - 1_n \bar{x}' = CX$ with
- $\bar{x} = (\bar{x}_1, \ldots, \bar{x}_p)'$ denoting the vector of variable means
- $C = I_n - n^{-1} 1_n 1_n'$ denoting a centering matrix
Note that the centered matrix $X_c$ has the form
$$X_c = \begin{bmatrix} x_{11}-\bar{x}_1 & x_{12}-\bar{x}_2 & \cdots & x_{1p}-\bar{x}_p \\ x_{21}-\bar{x}_1 & x_{22}-\bar{x}_2 & \cdots & x_{2p}-\bar{x}_p \\ x_{31}-\bar{x}_1 & x_{32}-\bar{x}_2 & \cdots & x_{3p}-\bar{x}_p \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1}-\bar{x}_1 & x_{n2}-\bar{x}_2 & \cdots & x_{np}-\bar{x}_p \end{bmatrix}$$
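
As a quick numerical check (not part of the original slides), the centering-matrix formula can be verified in R; the data below are simulated purely for illustration.

# Sketch: verify S = (1/(n-1)) Xc' Xc using a centering matrix (simulated data)
set.seed(1)
n <- 10; p <- 3
X <- matrix(rnorm(n * p), n, p)                  # arbitrary example data
C <- diag(n) - matrix(1, n, n) / n               # centering matrix C = I_n - (1/n) 1 1'
Xc <- C %*% X                                    # column-centered data matrix
S <- t(Xc) %*% Xc / (n - 1)                      # covariance via centered cross-products
all.equal(S, cov(X), check.attributes = FALSE)   # should be TRUE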

Slide 10: Population Correlation Matrix
The population correlation matrix refers to the symmetric array
$$P = \begin{bmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2p} \\ \rho_{31} & \rho_{32} & 1 & \cdots & \rho_{3p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \rho_{p3} & \cdots & 1 \end{bmatrix}$$
where $\rho_{jk} = \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}}\sqrt{\sigma_{kk}}}$ is the Pearson correlation coefficient between variables $X_j$ and $X_k$.

Slide 11: Sample Correlation Matrix
The sample correlation matrix refers to the symmetric array
$$R = \begin{bmatrix} 1 & r_{12} & r_{13} & \cdots & r_{1p} \\ r_{21} & 1 & r_{23} & \cdots & r_{2p} \\ r_{31} & r_{32} & 1 & \cdots & r_{3p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & r_{p3} & \cdots & 1 \end{bmatrix}$$
where
$$r_{jk} = \frac{s_{jk}}{s_j s_k} = \frac{\sum_{i=1}^n (x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k)}{\sqrt{\sum_{i=1}^n (x_{ij}-\bar{x}_j)^2}\sqrt{\sum_{i=1}^n (x_{ik}-\bar{x}_k)^2}}$$
is the Pearson correlation coefficient between variables $x_j$ and $x_k$.

Slide 12: Correlation Matrix from Data Matrix
We can calculate the (sample) correlation matrix as
$$R = \tfrac{1}{n-1} X_s' X_s$$
where $X_s = CXD^{-1}$ with
- $C = I_n - n^{-1} 1_n 1_n'$ denoting a centering matrix
- $D = \mathrm{diag}(s_1, \ldots, s_p)$ denoting a diagonal scaling matrix
Note that the standardized matrix $X_s$ has the form
$$X_s = \begin{bmatrix} (x_{11}-\bar{x}_1)/s_1 & (x_{12}-\bar{x}_2)/s_2 & \cdots & (x_{1p}-\bar{x}_p)/s_p \\ (x_{21}-\bar{x}_1)/s_1 & (x_{22}-\bar{x}_2)/s_2 & \cdots & (x_{2p}-\bar{x}_p)/s_p \\ (x_{31}-\bar{x}_1)/s_1 & (x_{32}-\bar{x}_2)/s_2 & \cdots & (x_{3p}-\bar{x}_p)/s_p \\ \vdots & \vdots & \ddots & \vdots \\ (x_{n1}-\bar{x}_1)/s_1 & (x_{n2}-\bar{x}_2)/s_2 & \cdots & (x_{np}-\bar{x}_p)/s_p \end{bmatrix}$$
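
The analogous check for the correlation matrix (again an illustrative sketch with simulated data, not from the original slides):

# Sketch: verify R = (1/(n-1)) Xs' Xs using centering plus scaling (simulated data)
set.seed(1)
n <- 10; p <- 3
X <- matrix(rnorm(n * p), n, p)
C <- diag(n) - matrix(1, n, n) / n               # centering matrix
D <- diag(apply(X, 2, sd))                       # diagonal matrix of sample standard deviations
Xs <- C %*% X %*% solve(D)                       # centered and scaled data matrix
R <- t(Xs) %*% Xs / (n - 1)                      # correlation via standardized cross-products
all.equal(R, cor(X), check.attributes = FALSE)   # should be TRUE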

Slide 13: Rotating Points in Two Dimensions
Suppose we have $z = (x, y) \in \mathbb{R}^2$, i.e., points in 2D Euclidean space. A $2 \times 2$ orthogonal rotation of $(x, y)$ of the form
$$\begin{pmatrix} \tilde{x} \\ \tilde{y} \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$
rotates $(x, y)$ counter-clockwise around the origin by an angle of $\theta$, and
$$\begin{pmatrix} \tilde{x} \\ \tilde{y} \end{pmatrix} = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$
rotates $(x, y)$ clockwise around the origin by an angle of $\theta$.

Slide 14: Visualization of 2D Clockwise Rotation
[Figure: the example points labeled a-k plotted with no rotation and after clockwise rotations of 30, 45, 60, 90, and 180 degrees]

Slide 15: Visualization of 2D Clockwise Rotation (R Code)

rotmat2d <- function(theta){
  matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
}
x <- seq(-2, 2, length=11)
y <- 4*exp(-x^2) - 2
xy <- cbind(x, y)
rang <- c(30, 45, 60, 90, 180)
dev.new(width=12, height=8, noRStudioGD=TRUE)
par(mfrow=c(2,3))
plot(x, y, xlim=c(-3,3), ylim=c(-3,3), main="No Rotation")
text(x, y, labels=letters[1:11], cex=1.5)
for(j in 1:5){
  rmat <- rotmat2d(rang[j]*2*pi/360)
  xyrot <- xy %*% rmat
  plot(xyrot, xlim=c(-3,3), ylim=c(-3,3))
  text(xyrot, labels=letters[1:11], cex=1.5)
  title(paste(rang[j], " degrees"))
}

Slide 16: Orthogonal Rotation in Two Dimensions
Note that the $2 \times 2$ rotation matrix
$$R = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix}$$
is an orthogonal matrix for all $\theta$:
$$R'R = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} = \begin{pmatrix} \cos^2(\theta)+\sin^2(\theta) & \cos(\theta)\sin(\theta)-\cos(\theta)\sin(\theta) \\ \cos(\theta)\sin(\theta)-\cos(\theta)\sin(\theta) & \cos^2(\theta)+\sin^2(\theta) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

Slide 17: Orthogonal Rotation in Higher Dimensions
Suppose we have a data matrix X with p columns.
- Rows of X are coordinates of points in p-dimensional space
- Note: when p = 2 we have the situation on the previous slides
A p x p orthogonal rotation is an orthogonal linear transformation.
- $R'R = RR' = I_p$ where $I_p$ is the p x p identity matrix
- If $\tilde{X} = XR$ is the rotated data matrix, then $\tilde{X}\tilde{X}' = XX'$
- Orthogonal rotation preserves relationships between points
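
As an illustration (not from the original slides), the structure-preserving property of an orthogonal rotation is easy to check in R with simulated 2D data:

# Sketch: an orthogonal rotation leaves the inner-product and distance structure unchanged
set.seed(2)
X <- matrix(rnorm(20), 10, 2)                              # illustrative 10 x 2 data matrix
theta <- pi/6
R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)  # 2 x 2 rotation matrix
Xrot <- X %*% R                                            # rotated coordinates
all.equal(X %*% t(X), Xrot %*% t(Xrot))                    # TRUE: X X' is preserved
all.equal(dist(X), dist(Xrot))                             # TRUE: pairwise distances preserved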

Slide 18: Population Principal Components

Slide 19: Linear Combinations of Random Variables
$X = (X_1, \ldots, X_p)'$ is a random vector with covariance matrix $\Sigma$, where $\lambda_1 \geq \cdots \geq \lambda_p \geq 0$ are the eigenvalues of $\Sigma$.
Consider forming new variables $Y_1, \ldots, Y_p$ by taking p different linear combinations of the $X_j$ variables:
$$\begin{aligned} Y_1 &= b_1'X = b_{11}X_1 + b_{21}X_2 + \cdots + b_{p1}X_p \\ Y_2 &= b_2'X = b_{12}X_1 + b_{22}X_2 + \cdots + b_{p2}X_p \\ &\;\;\vdots \\ Y_p &= b_p'X = b_{1p}X_1 + b_{2p}X_2 + \cdots + b_{pp}X_p \end{aligned}$$
where $b_k = (b_{1k}, \ldots, b_{pk})'$ is the k-th linear combination vector.
- $b_k$ are called the loadings for the k-th principal component

Slide 20: Defining Principal Components in the Population
Note that the random variable $Y_k = b_k'X$ has the properties:
- $\mathrm{Var}(Y_k) = b_k'\Sigma b_k$
- $\mathrm{Cov}(Y_k, Y_l) = b_k'\Sigma b_l$
The principal components are the uncorrelated linear combinations $Y_1, \ldots, Y_p$ whose variances are as large as possible:
$$\begin{aligned} b_1 &= \operatorname*{argmax}_{\|b_1\|=1}\{\mathrm{Var}(b_1'X)\} \\ b_2 &= \operatorname*{argmax}_{\|b_2\|=1}\{\mathrm{Var}(b_2'X)\} \text{ subject to } \mathrm{Cov}(b_1'X, b_2'X) = 0 \\ &\;\;\vdots \\ b_l &= \operatorname*{argmax}_{\|b_l\|=1}\{\mathrm{Var}(b_l'X)\} \text{ subject to } \mathrm{Cov}(b_k'X, b_l'X) = 0 \;\; \forall\, k < l \end{aligned}$$

Slide 21: Visualizing Principal Components for Bivariate Normal
[Figure: Figure 8.1 from Applied Multivariate Statistical Analysis, 6th Ed (Johnson & Wichern). Note that $e_1$ and $e_2$ denote the eigenvectors of $\Sigma$.]

Slide 22: PCA Solution via the Eigenvalue Decomposition
We can write the population covariance matrix $\Sigma$ as
$$\Sigma = V\Lambda V' = \sum_{k=1}^p \lambda_k v_k v_k'$$
where
- $V = [v_1, \ldots, v_p]$ contains the eigenvectors ($V'V = VV' = I_p$)
- $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ contains the (non-negative) eigenvalues
The PCA solution is obtained by setting $b_k = v_k$ for $k = 1, \ldots, p$:
- $\mathrm{Var}(Y_k) = \mathrm{Var}(v_k'X) = v_k'\Sigma v_k = v_k'V\Lambda V'v_k = \lambda_k$
- $\mathrm{Cov}(Y_k, Y_l) = \mathrm{Cov}(v_k'X, v_l'X) = v_k'\Sigma v_l = v_k'V\Lambda V'v_l = 0$ if $k \neq l$
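
A small numerical sketch of the eigendecomposition solution (not from the original slides), using an arbitrary 2 x 2 covariance matrix:

# Sketch: PCA loadings and variances from the eigendecomposition of Sigma
Sigma <- matrix(c(4, 2, 2, 3), 2, 2)        # illustrative covariance matrix
ed <- eigen(Sigma)
V <- ed$vectors                             # columns are eigenvectors v_k (PC loadings)
Lambda <- diag(ed$values)                   # diagonal matrix of eigenvalues lambda_k
all.equal(Sigma, V %*% Lambda %*% t(V))     # TRUE: Sigma = V Lambda V'
round(t(V) %*% Sigma %*% V, 10)             # diagonal lambda_k = Var(Y_k); off-diagonals 0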

Slide 23: Variance Explained by Principal Components
$Y_1, \ldots, Y_p$ have the same total variance as $X_1, \ldots, X_p$:
$$\sum_{j=1}^p \mathrm{Var}(X_j) = \mathrm{tr}(\Sigma) = \mathrm{tr}(V\Lambda V') = \mathrm{tr}(\Lambda) = \sum_{j=1}^p \mathrm{Var}(Y_j)$$
The proportion of the total variance accounted for by the k-th PC is
$$R_k^2 = \frac{\lambda_k}{\sum_{l=1}^p \lambda_l}$$
If $\sum_{k=1}^r R_k^2 \approx 1$ for some $r < p$, we do not lose much transforming the original variables into fewer new (principal component) variables.
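
Continuing the illustrative covariance matrix above (again not from the original slides), the variance-explained proportions are simple to compute:

# Sketch: proportion of total variance explained by each PC
Sigma <- matrix(c(4, 2, 2, 3), 2, 2)        # same illustrative covariance matrix
lambda <- eigen(Sigma)$values
R2 <- lambda / sum(lambda)                  # R2_k = lambda_k / sum_l lambda_l
cbind(PC = seq_along(lambda), R2 = round(R2, 3), cumulative = round(cumsum(R2), 3))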

Slide 24: Covariance of Variables and Principal Components
The covariance between $X_j$ and $Y_k$ has the form
$$\mathrm{Cov}(X_j, Y_k) = \mathrm{Cov}(e_j'X, v_k'X) = e_j'\Sigma v_k = e_j'(V\Lambda V')v_k = e_j'v_k\lambda_k = v_{jk}\lambda_k$$
where
- $e_j$ is a vector of zeros with a one in the j-th position
- $v_k = (v_{1k}, \ldots, v_{pk})'$ is the k-th eigenvector of $\Sigma$
This implies that the correlation between $X_j$ and $Y_k$ has the form
$$\mathrm{Cor}(X_j, Y_k) = \frac{\mathrm{Cov}(X_j, Y_k)}{\sqrt{\mathrm{Var}(X_j)}\sqrt{\mathrm{Var}(Y_k)}} = \frac{v_{jk}\lambda_k}{\sqrt{\sigma_{jj}}\sqrt{\lambda_k}} = \frac{v_{jk}\sqrt{\lambda_k}}{\sqrt{\sigma_{jj}}}$$

Slide 25: Sample Principal Components

Slide 26: Linear Combinations of Observed Random Variables
$x_i = (x_{i1}, \ldots, x_{ip})'$ is an observed random vector with $x_i \overset{iid}{\sim} (\mu, \Sigma)$.
Consider forming new variables $y_{i1}, \ldots, y_{ip}$ by taking p different linear combinations of the $x_{ij}$ variables:
$$\begin{aligned} y_{i1} &= b_1'x_i = b_{11}x_{i1} + b_{21}x_{i2} + \cdots + b_{p1}x_{ip} \\ y_{i2} &= b_2'x_i = b_{12}x_{i1} + b_{22}x_{i2} + \cdots + b_{p2}x_{ip} \\ &\;\;\vdots \\ y_{ip} &= b_p'x_i = b_{1p}x_{i1} + b_{2p}x_{i2} + \cdots + b_{pp}x_{ip} \end{aligned}$$
where $b_k = (b_{1k}, \ldots, b_{pk})'$ is the k-th linear combination vector.

Slide 27: Sample Properties of Linear Combinations
Note that the sample mean and variance of the $y_{ik}$ variables are:
$$\bar{y}_k = \frac{1}{n}\sum_{i=1}^n y_{ik} = b_k'\bar{x}$$
$$s_{y_k}^2 = \frac{1}{n-1}\sum_{i=1}^n (y_{ik} - \bar{y}_k)^2 = \frac{1}{n-1}\sum_{i=1}^n (b_k'x_i - b_k'\bar{x})^2 = b_k'\left[\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})'\right]b_k = b_k'Sb_k$$
and the sample covariance between $y_{ik}$ and $y_{il}$ is given by
$$s_{y_k y_l} = \frac{1}{n-1}\sum_{i=1}^n (y_{ik} - \bar{y}_k)(y_{il} - \bar{y}_l) = b_k'Sb_l$$

Slide 28: Defining Principal Components in the Sample
The principal components are the uncorrelated linear combinations $y_{i1}, \ldots, y_{ip}$ whose sample variances are as large as possible:
$$\begin{aligned} b_1 &= \operatorname*{argmax}_{\|b_1\|=1}\{b_1'Sb_1\} \\ b_2 &= \operatorname*{argmax}_{\|b_2\|=1}\{b_2'Sb_2\} \text{ subject to } b_1'Sb_2 = 0 \\ &\;\;\vdots \\ b_l &= \operatorname*{argmax}_{\|b_l\|=1}\{b_l'Sb_l\} \text{ subject to } b_k'Sb_l = 0 \;\; \forall\, k < l \end{aligned}$$

Slide 29: PCA Solution via the Eigenvalue Decomposition
We can write the sample covariance matrix S as
$$S = \hat{V}\hat{\Lambda}\hat{V}' = \sum_{k=1}^p \hat{\lambda}_k \hat{v}_k \hat{v}_k'$$
where
- $\hat{V} = [\hat{v}_1, \ldots, \hat{v}_p]$ contains the eigenvectors ($\hat{V}'\hat{V} = \hat{V}\hat{V}' = I_p$)
- $\hat{\Lambda} = \mathrm{diag}(\hat{\lambda}_1, \ldots, \hat{\lambda}_p)$ contains the (non-negative) eigenvalues
The PCA solution is obtained by setting $b_k = \hat{v}_k$ for $k = 1, \ldots, p$:
- $s_{y_k}^2 = \hat{v}_k'S\hat{v}_k = \hat{v}_k'\hat{V}\hat{\Lambda}\hat{V}'\hat{v}_k = \hat{\lambda}_k$
- $s_{y_k y_l} = \hat{v}_k'S\hat{v}_l = \hat{v}_k'\hat{V}\hat{\Lambda}\hat{V}'\hat{v}_l = 0$ if $k \neq l$
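
As a sanity check (not from the original slides), the eigendecomposition of the sample covariance matrix can be compared against R's built-in prcomp() on simulated data:

# Sketch: sample PCA via eigen(cov(X)), checked against prcomp()
set.seed(3)
X <- matrix(rnorm(100 * 4), 100, 4)         # illustrative data (n = 100, p = 4)
ed <- eigen(cov(X))
Vhat <- ed$vectors                          # sample eigenvectors (PC loadings)
lamhat <- ed$values                         # sample eigenvalues = PC variances
pca <- prcomp(X)                            # prcomp uses the SVD of the centered data
all.equal(lamhat, pca$sdev^2)               # TRUE: eigenvalues equal the PC variances
max(abs(abs(Vhat) - abs(pca$rotation)))     # ~0: loadings agree up to column signs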

Slide 30: Variance Explained by Principal Components
$\{y_{i1}, \ldots, y_{ip}\}_{i=1}^n$ has the same total variance as $\{x_{i1}, \ldots, x_{ip}\}_{i=1}^n$:
$$\sum_{k=1}^p s_k^2 = \mathrm{tr}(S) = \mathrm{tr}(\hat{V}\hat{\Lambda}\hat{V}') = \mathrm{tr}(\hat{\Lambda}) = \sum_{k=1}^p s_{y_k}^2$$
The proportion of the total variance accounted for by the k-th PC is
$$\hat{R}_k^2 = \frac{\hat{\lambda}_k}{\sum_{l=1}^p \hat{\lambda}_l}$$
If $\sum_{k=1}^r \hat{R}_k^2 \approx 1$ for some $r < p$, we do not lose much transforming the original variables into fewer new (principal component) variables.

Slide 31: Covariance of Variables and Principal Components
The sample covariance between $x_{ij}$ and $y_{ik}$ has the form
$$\widehat{\mathrm{Cov}}(x_{ij}, y_{ik}) = \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - \bar{x}_j)(y_{ik} - \bar{y}_k) = \frac{1}{n-1}\sum_{i=1}^n e_j'(x_i - \bar{x})(x_i - \bar{x})'\hat{v}_k = e_j'S\hat{v}_k = e_j'\hat{v}_k\hat{\lambda}_k = \hat{v}_{jk}\hat{\lambda}_k$$
where
- $e_j$ is a vector of zeros with a one in the j-th position
- $\hat{v}_k = (\hat{v}_{1k}, \ldots, \hat{v}_{pk})'$ is the k-th eigenvector of S
This implies that the (sample) correlation between $x_{ij}$ and $y_{ik}$ is
$$\widehat{\mathrm{Cor}}(x_{ij}, y_{ik}) = \frac{\widehat{\mathrm{Cov}}(x_{ij}, y_{ik})}{\sqrt{\widehat{\mathrm{Var}}(x_{ij})}\sqrt{\widehat{\mathrm{Var}}(y_{ik})}} = \frac{\hat{v}_{jk}\hat{\lambda}_k}{s_j\hat{\lambda}_k^{1/2}} = \frac{\hat{v}_{jk}\hat{\lambda}_k^{1/2}}{s_j}$$
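
The variable-PC correlation formula can be verified numerically (a sketch with simulated data, not part of the original slides):

# Sketch: Cor(x_j, y_k) equals vhat_jk * sqrt(lambdahat_k) / s_j
set.seed(4)
X <- matrix(rnorm(100 * 3), 100, 3)                           # illustrative data
ed <- eigen(cov(X))
Y <- scale(X, center = TRUE, scale = FALSE) %*% ed$vectors    # sample PC scores
j <- 2; k <- 1
cor(X[, j], Y[, k])                                           # empirical correlation
ed$vectors[j, k] * sqrt(ed$values[k]) / sd(X[, j])            # formula-based value (matches)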

Slide 32: Large Sample Properties
Assume that $x_i \overset{iid}{\sim} N(\mu, \Sigma)$ and that the eigenvalues of $\Sigma$ are strictly positive and unique: $\lambda_1 > \cdots > \lambda_p > 0$.
As $n \to \infty$, we have that
$$\sqrt{n}(\hat{\lambda} - \lambda) \to N(0, 2\Lambda^2)$$
$$\sqrt{n}(\hat{v}_k - v_k) \to N(0, V_k) \quad \text{where} \quad V_k = \lambda_k \sum_{l \neq k} \frac{\lambda_l}{(\lambda_l - \lambda_k)^2} v_l v_l'$$
Furthermore, as $n \to \infty$, we have that $\hat{\lambda}_k$ and $\hat{v}_k$ are independent.
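
One way this result can be used (a rough sketch, not from the original slides) is a plug-in Wald-type interval for an eigenvalue, since the asymptotic variance of $\hat{\lambda}_k$ is approximately $2\lambda_k^2/n$; the data below are simulated for illustration.

# Sketch: approximate large-sample 95% interval for lambda_k
set.seed(5)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)             # illustrative (normal) data
lamhat <- eigen(cov(X))$values
k <- 1
se <- lamhat[k] * sqrt(2 / n)               # plug-in standard error from Var ~ 2*lambda_k^2/n
lamhat[k] + c(-1, 1) * qnorm(0.975) * se    # approximate Wald interval for lambda_k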

Slide 33: Principal Components Analysis in Practice

Slide 34: PCA Solution is Related to SVD (Covariance Matrix)
Let $\hat{U}\hat{D}\hat{V}'$ denote the SVD of $X_c = CX$. The PCA (covariance) solution is directly related to the SVD of $X_c$:
$$Y = \hat{U}\hat{D} \qquad \text{and} \qquad B = \hat{V}$$
Note that the columns of $\hat{V}$ are the...
- Right singular vectors of the mean-centered data matrix $X_c$
- Eigenvectors of the covariance matrix $S = \frac{1}{n-1}X_c'X_c = \hat{V}\hat{\Lambda}\hat{V}'$
where $\hat{\Lambda} = \mathrm{diag}(\hat{\lambda}_1, \ldots, \hat{\lambda}_p)$ with $\hat{\lambda}_k = \frac{\hat{d}_{kk}^2}{n-1}$ and $\hat{D} = \mathrm{diag}(\hat{d}_{11}, \ldots, \hat{d}_{pp})$.
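
A brief sketch of the SVD route to the covariance PCA (not from the original slides), using simulated data:

# Sketch: covariance PCA from the SVD of the centered data matrix
set.seed(6)
X <- matrix(rnorm(50 * 4), 50, 4)           # illustrative data
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)
Y <- sv$u %*% diag(sv$d)                    # PC scores Y = U D
B <- sv$v                                   # PC loadings B = V (right singular vectors)
lamhat <- sv$d^2 / (nrow(X) - 1)            # eigenvalues of S from the singular values
all.equal(lamhat, eigen(cov(X))$values)     # TRUE
max(abs(abs(Y) - abs(prcomp(X)$x)))         # ~0: scores agree with prcomp up to sign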

Slide 35: PCA Solution is Related to SVD (Correlation Matrix)
Let $\tilde{U}\tilde{D}\tilde{V}'$ denote the SVD of $X_s = CXD^{-1}$ with $D = \mathrm{diag}(s_1, \ldots, s_p)$. The PCA (correlation) solution is directly related to the SVD of $X_s$:
$$Y = \tilde{U}\tilde{D} \qquad \text{and} \qquad B = \tilde{V}$$
Note that the columns of $\tilde{V}$ are the...
- Right singular vectors of the centered and scaled data matrix $X_s$
- Eigenvectors of the correlation matrix $R = \frac{1}{n-1}X_s'X_s = \tilde{V}\tilde{\Lambda}\tilde{V}'$
where $\tilde{\Lambda} = \mathrm{diag}(\tilde{\lambda}_1, \ldots, \tilde{\lambda}_p)$ with $\tilde{\lambda}_k = \frac{\tilde{d}_{kk}^2}{n-1}$ and $\tilde{D} = \mathrm{diag}(\tilde{d}_{11}, \ldots, \tilde{d}_{pp})$.

Slide 36: Covariance versus Correlation Considerations
Problem: there is no simple relationship between the SVDs of $X_c$ and $X_s$.
- No simple relationship between PCs obtained from S and R
- Rescaling variables can fundamentally change our results
Note that PCA is trying to explain the variation in S or R:
- If the units of the p variables are comparable, covariance PCA may be more informative (because units of measurement are retained)
- If the units of the p variables are incomparable, correlation PCA may be more appropriate (because units of measurement are removed)
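
The sensitivity to scaling is easy to demonstrate (a sketch with simulated data, not from the original slides): expressing one variable in different units changes the covariance PCA but leaves the correlation PCA untouched.

# Sketch: rescaling a variable changes covariance PCA but not correlation PCA
set.seed(7)
X <- matrix(rnorm(50 * 3), 50, 3)
X2 <- X
X2[, 1] <- 100 * X[, 1]                     # variable 1 expressed in different units
round(eigen(cov(X))$vectors[, 1], 3)        # leading covariance-PCA loadings (original units)
round(eigen(cov(X2))$vectors[, 1], 3)       # changes substantially after rescaling
all.equal(abs(eigen(cor(X))$vectors), abs(eigen(cor(X2))$vectors))  # TRUE: correlation PCA unchanged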

Slide 37: Men's Olympic Decathlon Data from 1988
Data from the men's 1988 Olympic decathlon:
- Total of n = 34 athletes
- Have p = 10 variables giving the score for each decathlon event
- Have the overall decathlon score also (score)

> decathlon[1:9,]
[output omitted: first 9 rows (Schenk, Voss, Steen, Thompson, Blondel, Plaziat, Bright, De.Wit, Johnson) of the variables run100, long.jump, shot, high.jump, run400, hurdle, discus, pole.vault, javelin, run1500, score]

Slide 38: Resigning Running Events
For the running events (run100, run400, run1500, and hurdle), lower scores correspond to better performance, whereas higher scores represent better performance for the other events. To make interpretation simpler, we will resign (change the sign of) the running events:

> decathlon[,c(1,5,6,10)] <- (-1)*decathlon[,c(1,5,6,10)]
> decathlon[1:9,]
[output omitted: the same 9 rows with the running-event scores now negated]

Slide 39: PCA on Covariance Matrix

# PCA on covariance matrix (default)
> pcacov <- princomp(x=decathlon[,1:10])
> names(pcacov)
[1] "sdev"     "loadings" "center"   "scale"    "n.obs"
[6] "scores"   "call"

# resign PCA solution
> pcsign <- sign(colSums(pcacov$loadings^3))
> pcacov$loadings <- pcacov$loadings %*% diag(pcsign)
> pcacov$scores <- pcacov$scores %*% diag(pcsign)

# Note: R uses the MLE of the covariance matrix
> n <- nrow(decathlon)
> sum((pcacov$sdev - sqrt(eigen((n-1)/n*cov(decathlon[,1:10]))$values))^2)
[1] e-28

Slide 40: PCA on Correlation Matrix

# PCA on correlation matrix
> pcacor <- princomp(x=decathlon[,1:10], cor=TRUE)
> names(pcacor)
[1] "sdev"     "loadings" "center"   "scale"    "n.obs"
[6] "scores"   "call"

# resign PCA solution
> pcsign <- sign(colSums(pcacor$loadings^3))
> pcacor$loadings <- pcacor$loadings %*% diag(pcsign)
> pcacor$scores <- pcacor$scores %*% diag(pcsign)

# Note: PC standard deviations are square roots of the correlation matrix eigenvalues
> sum((pcacor$sdev - sqrt(eigen(cor(decathlon[,1:10]))$values))^2)
[1] e-30

Slide 41: Plot Covariance and Correlation PCA Results
[Figure: side-by-side plots of the first two PC loadings, "PCA of Covariance Matrix" (left) and "PCA of Correlation Matrix" (right), with the decathlon events labeled]

> dev.new(width=10, height=5, noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> plot(pcacov$loadings[,1:2], xlab="PC1 Loadings", ylab="PC2 Loadings",
+      type="n", main="PCA of Covariance Matrix", xlim=c(-0.15, 1.15), ylim=c(-0.15, 1.15))
> text(pcacov$loadings[,1:2], labels=colnames(decathlon))
> plot(pcacor$loadings[,1:2], xlab="PC1 Loadings", ylab="PC2 Loadings",
+      type="n", main="PCA of Correlation Matrix", xlim=c(0.05, 0.45), ylim=c(-0.5, 0.6))
> text(pcacor$loadings[,1:2], labels=colnames(decathlon))

Slide 42: Correlation of PC Scores and Overall Decathlon Score
[Figure: decathlon score plotted against the PC1 and PC2 scores for the covariance PCA (left) and the correlation PCA (right)]

> round(cor(decathlon$score, pcacov$scores), 3)
[output omitted: correlations of the overall score with the 10 covariance PCs]
> round(cor(decathlon$score, pcacor$scores), 3)
[output omitted: correlations of the overall score with the 10 correlation PCs]

Slide 43: Choosing the Number of Components
In practice, the optimal number of components is often unknown.
- In some cases, possible/feasible values may be known a priori from theory and/or past research.
- In other cases, we need to use some data-driven approach to select a reasonable number of components.

Slide 44: Scree Plots
A scree plot displays the variance explained by each component.
[Figure: example scree plot of variance accounted for (vaf) versus number of components (q)]
- We look for the "elbow" of the plot, i.e., the point where the line bends.
- Could do a formal test on the derivative of the scree line, but the common sense approach often works fine.

Slide 45: Scree Plots for Decathlon PCA
[Figure: scree plots of PC variance versus number of PCs for the covariance PCA (left) and the correlation PCA (right)]

> dev.new(width=10, height=5, noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> plot(1:10, pcacov$sdev^2, type="b", xlab="# PCs", ylab="Variance of PC",
+      main="PCA of Covariance Matrix")
> plot(1:10, pcacor$sdev^2, type="b", xlab="# PCs", ylab="Variance of PC",
+      main="PCA of Correlation Matrix")
