TMA4267 Linear Statistical Models V2017 [L3]

Part 1: Multivariate RVs and normal distribution
(L3) Covariance and positive definiteness [H:2.2,2.3,3.3], Principal components [H ]
Quiz with Kahoot!

Mette Langaas
Department of Mathematical Sciences, NTNU
To be lectured: January 17

Previously

$X_{(p \times 1)}$: random vector, described by the joint probability density function (pdf) $f(x)$ and the cumulative distribution function (cdf) $F(x)$, or, as we will see (in L4), by the (multivariate) moment generating function (MGF) $M_X(t) = E(e^{t^T X})$.

Important aspects: moments.
- $E(X) = \mu$: the mean of a random vector (or matrix) is found as the mean of each element.
- $\mathrm{Cov}(X) = \Sigma = E((X - \mu)(X - \mu)^T)$: the $p \times p$ variance-covariance matrix, with variances on the diagonal and covariances off the diagonal; real and symmetric.
- Rules for a vector of linear combinations $CX$: $E(CX) = C\mu$ and $\mathrm{Cov}(CX) = C \Sigma C^T$.
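The rules for $CX$ can be checked numerically. The sketch below is only an illustration on simulated data (the matrix `C`, the seed and the sample size are arbitrary choices, not taken from the lecture); it relies on the fact that the same identities hold exactly for the sample mean and sample covariance.

```r
# Simulated data: n observations (rows) of a p-dimensional vector.
set.seed(1)
n <- 200
p <- 3
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# An arbitrary 2 x 3 matrix of linear combinations (illustrative choice).
C <- matrix(c(1, 0, -1,
              2, 1,  0), nrow = 2, byrow = TRUE)

# Each row of Y is C x_j, i.e. Y = X C^T.
Y <- X %*% t(C)

# Sample analogue of E(CX) = C mu.
mean_lhs <- colMeans(Y)
mean_rhs <- as.vector(C %*% colMeans(X))

# Sample analogue of Cov(CX) = C Sigma C^T.
cov_lhs <- cov(Y)
cov_rhs <- C %*% cov(X) %*% t(C)

# Largest absolute deviation between the two sides (floating point noise only).
print(max(abs(cov_lhs - cov_rhs)))
```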

Today!

- Requirements and properties of $\Sigma = \mathrm{Cov}(X)$: symmetric, positive definite (SPD), via spectral decomposition (eigenvalues/eigenvectors).
- The square root matrix.
- Linear combinations with maximal variability: principal components are linear combinations made from eigenvectors.
- PCA-plots.
- Kahoot! on what we have worked with so far.

Next lecture: move on to multivariate normal data!

Drinking habits data set

[Data table; numeric values not recoverable.]
Columns: Coffee, Tea, Cocoa, Liquer, Wine, Beer.
Rows: Norway, Danmark, Finland, Iceland, Sweden, France, Ireland, Italy, Jugoslavia, The Netherlands, Poland, Portugal, Soviet Union, Spain, Schweitz, Great Britain, Chech Repl, Germany, Hungary, Austria, New Zealand.

The variance-covariance matrix and positive definiteness

$X_{(p \times 1)}$ with symmetric $\Sigma = \mathrm{Cov}(X)$.

- We want the variance of any linear combination to be positive: $\mathrm{Var}(c^T X) = c^T \Sigma c > 0$ for all $c \neq 0$, which means that $\Sigma$ needs to be positive definite. Write $\Sigma > 0$.
- This is true if all eigenvalues of $\Sigma$ are positive (the eigenvalues of a symmetric matrix are real).
- Spectral decomposition (diagonalization): $\Sigma = P \Lambda P^T$.
- Square root matrix defined from the spectral decomposition: $\Sigma^{1/2} = P \Lambda^{1/2} P^T$.
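A minimal sketch of these steps in R, using a small made-up covariance matrix (the entries of `Sigma` are an assumption chosen for illustration, not data from the lecture):

```r
# A hypothetical symmetric 3 x 3 covariance matrix.
Sigma <- matrix(c(4, 2, 0,
                  2, 3, 1,
                  0, 1, 2), nrow = 3, byrow = TRUE)

# Spectral decomposition: columns of P are eigenvectors, lambda the eigenvalues.
eig <- eigen(Sigma, symmetric = TRUE)
P <- eig$vectors
lambda <- eig$values

# Positive definite if and only if all eigenvalues are positive.
is_pd <- all(lambda > 0)

# Rebuild Sigma = P Lambda P^T and form Sigma^{1/2} = P Lambda^{1/2} P^T.
Sigma_rebuilt <- P %*% diag(lambda) %*% t(P)
Sigma_sqrt <- P %*% diag(sqrt(lambda)) %*% t(P)

# Sigma^{1/2} Sigma^{1/2} should reproduce Sigma up to rounding.
print(max(abs(Sigma_sqrt %*% Sigma_sqrt - Sigma)))
```

Because $P$ is orthogonal, squaring $P \Lambda^{1/2} P^T$ collapses the inner $P^T P$ to the identity, which is why the square root matrix is itself symmetric and squares back to $\Sigma$.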

Drinking habits

[The drinking habits data table shown again: columns Coffee, Tea, Cocoa, Liquer, Wine, Beer, one row per country; numeric values not recoverable.]

Estimators for $\mu$ and $\Sigma$ [H3.3]

$X_1, X_2, \ldots, X_n$ i.i.d. with $E(X) = \mu$ and $\mathrm{Cov}(X) = \Sigma$. Then

$\bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j$

$S^2 = \frac{1}{n-1} \sum_{j=1}^{n} (X_j - \bar{X})(X_j - \bar{X})^T$

are two commonly used estimators for the mean and the covariance matrix.

RecEx1.P7: we may write $S^2$ using a centering matrix.
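One way the centering-matrix form might look is sketched below. It assumes the usual definition $H = I - \frac{1}{n} 1 1^T$ and a data matrix with observations as rows; the notation used in RecEx1.P7 itself may differ, and the data here are simulated.

```r
# Simulated n x p data matrix, rows are observations.
set.seed(2)
n <- 10
p <- 3
X <- matrix(rnorm(n * p), nrow = n)

ones <- rep(1, n)

# Sample mean vector: (1/n) X^T 1.
xbar <- as.vector(t(X) %*% ones / n)

# Centering matrix H = I - (1/n) 1 1^T (assumed definition).
H <- diag(n) - ones %*% t(ones) / n

# Sample covariance matrix written with the centering matrix.
S <- t(X) %*% H %*% X / (n - 1)

# Compare with the built-in estimators.
print(max(abs(xbar - colMeans(X))))
print(max(abs(S - cov(X))))
```

$H$ is symmetric and idempotent, so $X^T H X = (HX)^T (HX)$, which is the sum of outer products of the centered observations.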

Drinking habits data set

    > drinkcov

[Output: 6 x 6 covariance matrix with rows and columns Coffee, Tea, Cocoa, Liquer, Wine, Beer; numeric values not recoverable.]

Correlation plot

[Figure: correlation plot of Coffee, Tea, Cocoa, Liquer, Wine and Beer, with a colour scale for the correlation values.]

prcomp

    drink <- read.csv("drikke.txt", sep = ",", header = TRUE)
    drink <- na.omit(drink)  # remove missing data
    # now for PCA
    pca <- prcomp(drink, scale = TRUE)
    # scale: variables are scaled; center = TRUE is the default
    > names(pca)
    [1] "sdev"     "rotation" "center"   "scale"    "x"
    > pca$rotation  # the loadings
    # 6 x 6 matrix, rows Coffee..Beer, columns PC1..PC6 (values not recoverable)
    > s <- cor(drink)  # cor, not cov, since the variables are scaled
    > eigen(s)  # $vectors is the same as pca$rotation, up to the sign of some vectors
    # $values and $vectors (values not recoverable)
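Since `drikke.txt` is not available here, the same comparison can be tried on a data set that ships with R. The sketch below uses the built-in `USArrests` data purely as a stand-in (an assumption for illustration, not the lecture's data) and checks that `prcomp` on scaled variables agrees with `eigen` on the correlation matrix.

```r
# Stand-in data set bundled with R (not the drinking habits data).
dat <- USArrests

# PCA on standardized variables; the full argument name is `scale.`.
pca <- prcomp(dat, scale. = TRUE)

# Eigen decomposition of the correlation matrix.
eig <- eigen(cor(dat))

# Loadings equal the eigenvectors up to the sign of each column,
# so compare absolute values.
max_loading_diff <- max(abs(abs(unname(pca$rotation)) - abs(eig$vectors)))

# Squared standard deviations of the PCs equal the eigenvalues.
max_eigen_diff <- max(abs(pca$sdev^2 - eig$values))

print(max_loading_diff)
print(max_eigen_diff)
```

The sign ambiguity is expected: if $e$ is a unit eigenvector then so is $-e$, and different routines may return either one.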

Principal components

Let $\Sigma$ be the covariance matrix associated with the random vector $X_{(p \times 1)}$. The covariance matrix has the eigenvalue-eigenvector pairs $(\lambda_j, e_j)$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$.

The $m$th principal component is given by

$Y_m = e_m^T X = e_{m1} X_1 + e_{m2} X_2 + \cdots + e_{mp} X_p$

and has

$\mathrm{Var}(Y_m) = e_m^T \Sigma e_m = \lambda_m, \quad m = 1, 2, \ldots, p$

$\mathrm{Cov}(Y_i, Y_m) = e_i^T \Sigma e_m = 0, \quad i \neq m$
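Both identities follow in one step from the eigenvector equation. The short derivation below assumes the eigenvectors are chosen orthonormal ($e_m^T e_m = 1$ and $e_i^T e_m = 0$ for $i \neq m$), which is always possible for a symmetric matrix but is not stated explicitly on the slide:

```latex
\begin{align*}
\Sigma e_m &= \lambda_m e_m, \\
\mathrm{Var}(Y_m) &= \mathrm{Var}(e_m^T X) = e_m^T \Sigma e_m
                   = \lambda_m \, e_m^T e_m = \lambda_m, \\
\mathrm{Cov}(Y_i, Y_m) &= \mathrm{Cov}(e_i^T X, e_m^T X) = e_i^T \Sigma e_m
                        = \lambda_m \, e_i^T e_m = 0 \qquad (i \neq m).
\end{align*}
```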

Principal components: idea

1. We choose principal component 1, $PC_1 = c_1^T X$, to have maximal variance:
   $\max_{c_1 \neq 0,\; c_1^T c_1 = 1} \mathrm{Var}(c_1^T X)$
2. We choose principal component 2, $PC_2 = c_2^T X$, to have maximal variance and to be uncorrelated with $PC_1$:
   $\max_{c_2 \neq 0,\; c_2^T c_2 = 1} \mathrm{Var}(c_2^T X)$ and $c_1^T \Sigma c_2 = 0$

Principal components: idea

3. We choose principal component 3, $PC_3 = c_3^T X$, to have maximal variance and to be uncorrelated with $PC_1$ and $PC_2$:
   $\max_{c_3 \neq 0,\; c_3^T c_3 = 1} \mathrm{Var}(c_3^T X)$ and $c_i^T \Sigma c_3 = 0$ for $i = 1, 2$,
   and so on.

It can be shown that choosing $c_i = e_i$ (the $i$th eigenvector of $\Sigma$) fulfills these requirements.
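The claim is not proved here, but it can be made plausible numerically. The sketch below uses a made-up covariance matrix and random unit vectors (both assumptions for illustration) and checks that no random direction gives a larger variance than the first eigenvector, and that the first two eigenvector directions are uncorrelated.

```r
set.seed(3)

# A hypothetical symmetric positive definite covariance matrix.
Sigma <- matrix(c(4, 2, 0,
                  2, 3, 1,
                  0, 1, 2), nrow = 3, byrow = TRUE)

eig <- eigen(Sigma, symmetric = TRUE)
e1 <- eig$vectors[, 1]
e2 <- eig$vectors[, 2]

# Variance of e1^T X, which should equal the largest eigenvalue.
var_e1 <- as.numeric(t(e1) %*% Sigma %*% e1)

# 1000 random directions, each column rescaled to unit length.
C <- matrix(rnorm(3 * 1000), nrow = 3)
C <- sweep(C, 2, sqrt(colSums(C^2)), "/")

# c^T Sigma c for every column c of C.
var_random <- colSums(C * (Sigma %*% C))

# Covariance between e1^T X and e2^T X.
cov_e1_e2 <- as.numeric(t(e1) %*% Sigma %*% e2)

print(c(var_e1 = var_e1, best_random = max(var_random)))
print(cov_e1_e2)
```

A random search like this is only a sanity check; the actual argument uses the spectral decomposition to bound $c^T \Sigma c$ by $\lambda_1 c^T c$.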

Principal component scores

    > pca$x

[Output: matrix of scores with one row per country (Norway, Danmark, ..., New Zealand) and columns PC1 to PC6; numeric values not recoverable.]

Scores PCA1 vs PCA2

[Figure: scatter plot of the scores, axes pca$x[, 1] and pca$x[, 2], points labelled by country.]

Scores PCA1 vs PCA3

[Figure: scatter plot of the scores, axes pca$x[, 1] and pca$x[, 3], points labelled by country.]

Scores PCA2 vs PCA3

[Figure: scatter plot of the scores, axes pca$x[, 2] and pca$x[, 3], points labelled by country.]

Scores PCA1 vs PCA2 with biplot

[Figure: biplot with axes PC1 and PC2; countries shown as points and the variables Coffee, Tea, Cocoa, Liquer, Wine and Beer shown as loading directions.]

Biplots

[Figure: four biplot panels of PC1 against other components (PC3, PC4 and PC5 are legible as axis labels); countries shown as points and the six drink variables as loading directions.]

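Proportion of total population variance

Total population variance:

$\sum_{j=1}^{p} \mathrm{Var}(X_j) = \mathrm{tr}\,\Sigma = \sum_{j=1}^{p} \lambda_j = \sum_{j=1}^{p} \mathrm{Var}(Z_j)$,

where $Z_j$ denotes the $j$th principal component.

Proportion of total population variance explained by PC $m$:

$\frac{\lambda_m}{\sum_{j=1}^{p} \lambda_j}$

Proportion of total population variance explained by the first $m$ PCs:

$\frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{p} \lambda_j}$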
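The middle equality, trace equals sum of eigenvalues, is a consequence of the spectral decomposition $\Sigma = P \Lambda P^T$ with $P^T P = I$ and the cyclic property of the trace. A short derivation, stated here as a reminder rather than as part of the original slide:

```latex
\begin{align*}
\sum_{j=1}^{p} \mathrm{Var}(X_j)
  &= \mathrm{tr}\,\Sigma
   = \mathrm{tr}\,(P \Lambda P^T)
   = \mathrm{tr}\,(\Lambda P^T P)
   = \mathrm{tr}\,\Lambda
   = \sum_{j=1}^{p} \lambda_j .
\end{align*}
```

Since each principal component has variance $\lambda_j$, the principal components carry exactly the same total variance as the original variables, only redistributed.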

How many PCs are needed?

Dependent on:
- The proportion of the total sample variance that we would like to explain. 80%? More?
- Look at the eigenvalues; small eigenvalues may be evidence of collinearity problems.

Importance of components

    > summary(pca)
    Importance of components:
    # rows: Standard deviation, Proportion of Variance, Cumulative Proportion
    # columns: PC1 to PC6 (values not recoverable)
    > eigen(s)
    $values
    # (values not recoverable)
    > sqrt(eigen(s)$values)
    # (values not recoverable)
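The rows of the `summary(pca)` table can be reproduced from the eigenvalues directly. As the numbers for the drinking habits data are not available, the sketch below uses R's built-in `USArrests` data as a stand-in, and the 80% threshold is just the example value from the question of how many PCs are needed.

```r
# Stand-in data set bundled with R (not the drinking habits data).
pca <- prcomp(USArrests, scale. = TRUE)

# Eigenvalues of the correlation matrix are the squared PC standard deviations.
lambda <- pca$sdev^2

# Proportion of variance per PC and cumulative proportion.
prop <- lambda / sum(lambda)
cum_prop <- cumsum(prop)

# Smallest number of PCs explaining at least 80% of the total variance.
k <- which(cum_prop >= 0.80)[1]

# The same quantities as reported by summary() (rounded to 5 decimals there).
imp <- summary(pca)$importance

print(round(rbind(prop, cum_prop), 4))
print(k)
```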

Screeplot

[Figure: screeplot of pca, showing the variances of the principal components.]

PC from standardized variables

$X$ can be standardized to have mean 0 and unit variances:

$X^* = V^{-1/2} (X - \mu)$,

where $V$ is the diagonal matrix holding the variances of $X_1, \ldots, X_p$.

Principal components made from standardized variables will be based on the eigenvalues and eigenvectors of the correlation matrix $\rho = V^{-1/2} \Sigma V^{-1/2}$.

Achilles heel: since $\Sigma$ and $\rho$ do not have the same eigenvectors/eigenvalues, the principal components made from $\Sigma$ and $\rho$ will not be the same. Unless we have a good reason to compare the variances of the different $X_j$s, we should make PCs from the standardized variables.

For standardized variables $\sum_{j=1}^{p} \mathrm{Var}(X_j^*) = p$, and the proportion of total population variance explained by PC $m$ is $\lambda_m / p$.
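To see how much the choice matters in practice, the sketch below compares the two decompositions on R's built-in `USArrests` data (a stand-in chosen because its variables have very different variances; it is not the lecture's data set).

```r
# Stand-in data set bundled with R.
Sigma <- cov(USArrests)
p <- ncol(USArrests)

# rho = V^{-1/2} Sigma V^{-1/2}, with V the diagonal matrix of variances.
V_inv_sqrt <- diag(1 / sqrt(diag(Sigma)))
rho <- V_inv_sqrt %*% Sigma %*% V_inv_sqrt

eig_cov <- eigen(Sigma, symmetric = TRUE)
eig_cor <- eigen(rho, symmetric = TRUE)

# For standardized variables the eigenvalues sum to p.
sum_lambda_cor <- sum(eig_cor$values)

# First loading vector under each choice (absolute values, sign is arbitrary).
first_cov <- abs(eig_cov$vectors[, 1])
first_cor <- abs(eig_cor$vectors[, 1])

print(round(cbind(first_cov, first_cor), 3))
print(sum_lambda_cor)
```

On the covariance scale the first component is dominated by whichever variable has the largest variance, which is the practical reason for preferring standardized variables when the units are not comparable.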

Principal components from singular value decomposition

The singular value decomposition of a (data) matrix $X_{n \times p}$ is given by

$X_{n \times p} = U_{n \times p} D_{p \times p} V^T_{p \times p}$

where
- the columns of $U$ are eigenvectors of $X X^T$,
- $D$ is a diagonal matrix with the singular values on the diagonal, i.e. the square roots of the eigenvalues of $X X^T$ and $X^T X$ (they have the same nonzero eigenvalues),
- the columns of $V$ are the eigenvectors of $X^T X$.

The principal components (scores) of the data are defined as the columns of

$Z = X V = U D$
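A brief numerical check of $Z = XV = UD$, assuming the data matrix has been centered and scaled first (the slide does not say so explicitly, but this is what `prcomp(..., scale. = TRUE)` does). The built-in `USArrests` data again serves only as a stand-in.

```r
# Centered and scaled data matrix (stand-in data bundled with R).
X <- scale(as.matrix(USArrests), center = TRUE, scale = TRUE)
n <- nrow(X)

# Singular value decomposition X = U D V^T.
sv <- svd(X)

# Scores computed in the two equivalent ways.
Z_xv <- X %*% sv$v
Z_ud <- sv$u %*% diag(sv$d)

# Reference: scores and variances from prcomp.
pca <- prcomp(USArrests, scale. = TRUE)

# Scores agree with prcomp up to the sign of each column.
max_score_diff <- max(abs(abs(Z_ud) - abs(unname(pca$x))))

# Squared singular values divided by (n - 1) are the PC variances.
pc_var_from_svd <- sv$d^2 / (n - 1)

print(max_score_diff)
print(round(pc_var_from_svd, 4))
```

Working from the SVD of the data matrix avoids forming $X^T X$ explicitly, which is generally the numerically preferable route.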

PCR: summary

- PCA finds linear combinations $Y$ that best represent the $X$.
- The PCs are found in an unsupervised way. The "truth" is not known.
- A plot of PC1 vs PC2 is often used to see if there is separation (subgroups in the data).
- The principal component loadings are often given an interpretation (overall consumption, ...).
- PCA can be combined with linear regression.
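Combining PCA with linear regression might look like the sketch below: the response is regressed on the first few PC scores instead of on the original covariates. The data set (`mtcars`), the chosen covariates and the decision to keep two components are all assumptions made for illustration, not taken from the lecture.

```r
# Hypothetical setup: predict mpg from a few correlated covariates in mtcars.
covariates <- mtcars[, c("disp", "hp", "wt", "qsec")]

# PCA on the standardized covariates; the response is not used here.
pca <- prcomp(covariates, scale. = TRUE)

# Keep the scores of the first two PCs as new regressors.
pcr_data <- data.frame(mpg = mtcars$mpg, pca$x[, 1:2])

# Ordinary linear regression on the PC scores.
fit <- lm(mpg ~ PC1 + PC2, data = pcr_data)

# The PC scores are uncorrelated, so collinearity among regressors is removed.
score_cor <- cor(pcr_data$PC1, pcr_data$PC2)

print(coef(fit))
print(score_cor)
```

Because the components are chosen without looking at the response, the leading PCs are not guaranteed to be the most predictive ones; how many to keep is a modelling choice.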

Quiz with Kahoot! at kahoot.it, based on what we have gone through so far!
