Matrices and Multivariate Statistics - II

Richard Mott, November 2011

Multivariate Random Variables

Consider a set of dependent random variables $z = (z_1, \ldots, z_n)$ with

$E(z_i) = \mu_i$
$\mathrm{cov}(z_i, z_j) = \sigma_{ij} = \sigma_{ji}$; $\mathrm{var}(z_i) = \sigma_{ii}$

In vector form, $E(z) = \mu$ and $\mathrm{var}(z) = \Sigma$, the variance-covariance matrix.

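Variance Covariance Matrices

$\Sigma$ is real symmetric.
It is also positive semidefinite: $x^\top \Sigma x \ge 0$ for all linear combinations $x$ of the $z_i$.
It is diagonalizable with eigenvalues $\ge 0$:

$$\Sigma = U \Lambda U^\top = (U \Lambda^{1/2} U^\top)(U \Lambda^{1/2} U^\top) = A^2$$

$A = U \Lambda^{1/2} U^\top$ is the square root of $\Sigma$.

Also

$$\Sigma = U \Lambda U^\top = (U \Lambda^{1/2})(\Lambda^{1/2} U^\top) = B^\top B,$$

for example with $B = \Lambda^{1/2} U^\top$.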
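The square-root construction above can be checked numerically. The following is a minimal NumPy sketch, not part of the original slides; the helper name `sqrt_psd` and the 3 x 3 matrix are made up for illustration.

```python
import numpy as np

def sqrt_psd(sigma):
    """Symmetric square root A = U diag(sqrt(lam)) U^T of a PSD matrix (hypothetical helper)."""
    lam, U = np.linalg.eigh(sigma)      # eigenvalues in ascending order, U orthogonal
    lam = np.clip(lam, 0.0, None)       # clear tiny negative values caused by round-off
    return (U * np.sqrt(lam)) @ U.T     # scales column j of U by sqrt(lam[j])

# Illustrative covariance matrix (made-up numbers)
sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.4],
                  [0.6, 0.4, 1.0]])
A = sqrt_psd(sigma)
```

Here `A @ A` should reproduce `sigma` up to floating-point error, and `A` is itself symmetric.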

The Linear Regression Model

$y = X\beta + e$

$y$: $n$-vector of observations
$X$: $n \times p$ matrix of explanatory variables
$e$: $n$-vector of errors, uncorrelated and normally distributed
$\beta$: $p$-vector of parameters to be estimated

Least squares equation

The least squares estimator $\hat\beta$ minimises the residual sum of squares

$$S = (y - X\beta)^\top (y - X\beta)$$

It satisfies the equation

$$(X^\top X)\hat\beta = X^\top y$$

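Least squares equation

$(X^\top X)\hat\beta = X^\top y$

Proof by completing the square:

$$\begin{aligned}
S(\beta) &= (y - X\beta + X\hat\beta - X\hat\beta)^\top (y - X\beta + X\hat\beta - X\hat\beta) \\
&= (y - X\hat\beta - X\beta + X\hat\beta)^\top (y - X\hat\beta - X\beta + X\hat\beta) \\
&= (y - X\hat\beta)^\top (y - X\hat\beta) - 2 (y - X\hat\beta)^\top X (\beta - \hat\beta) + (\beta - \hat\beta)^\top X^\top X (\beta - \hat\beta)
\end{aligned}$$

But $(y - X\hat\beta)^\top X (\beta - \hat\beta) = (y^\top X - \hat\beta^\top X^\top X)(\beta - \hat\beta) = 0$, because $X^\top X \hat\beta = X^\top y$, and $(\beta - \hat\beta)^\top X^\top X (\beta - \hat\beta) \ge 0$.

Hence $S(\beta) \ge (y - X\hat\beta)^\top (y - X\hat\beta)$, with the minimum attained at $\beta = \hat\beta$.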

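Analysis of Variance Identity

$$\begin{aligned}
\text{Residual SS} &= (y - X\hat\beta)^\top (y - X\hat\beta) \\
&= y^\top y - 2 y^\top X \hat\beta + \hat\beta^\top X^\top X \hat\beta \\
&= y^\top y - \hat\beta^\top X^\top X \hat\beta \\
&= \text{Total SS} - \text{Fitting SS}
\end{aligned}$$

The third line uses $X^\top X \hat\beta = X^\top y$, so that $y^\top X \hat\beta = \hat\beta^\top X^\top X \hat\beta$.

$\hat{y} = X\hat\beta$ are the predicted values.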
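As a quick numerical check of the identity, here is a small NumPy sketch on simulated data. The data, seed, and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))            # simulated design matrix
y = rng.normal(size=40)                 # simulated response

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # least squares fit
y_hat = X @ beta_hat                               # predicted values

total_ss = y @ y                                   # y^T y
fitting_ss = beta_hat @ (X.T @ X) @ beta_hat       # beta_hat^T X^T X beta_hat
residual_ss = (y - y_hat) @ (y - y_hat)
```

Up to round-off, `residual_ss` should equal `total_ss - fitting_ss`, and `fitting_ss` should equal `y_hat @ y_hat`.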

Least squares equation

$(X^\top X)\hat\beta = X^\top y$

$\hat\beta = (X^\top X)^{-1} X^\top y$ if $X^\top X$ is invertible.

If $X^\top X$ is not of full rank, the equation has no unique solution unless constraints are added.
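A minimal sketch of solving the normal equations in the full-rank case, using simulated data (the true coefficients and noise level are assumptions). It solves the linear system directly rather than forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))                       # simulated full-rank design
beta_true = np.array([1.0, -2.0, 0.5])            # assumed true parameters
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Solve (X^T X) beta = X^T y without computing the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# At the solution the residuals are orthogonal to every column of X
resid = y - X @ beta_hat
```

The product `X.T @ resid` should be numerically zero, which is just the normal equations rearranged.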

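Least Squares with constraints

$X^\top X$ is symmetric, so it can be diagonalised:

$X^\top X = U \Lambda U^\top$ ($U$ is $p \times p$ and orthogonal)

Rotate: let $W = XU$, so $W^\top W = \Lambda$.

$y = X\beta + e$ becomes $y = X U U^\top \beta + e$, i.e. $y = W\alpha + e$, where $\alpha = U^\top \beta$.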

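Least Squares with constraints

Then $X^\top X \hat\beta = X^\top y$ becomes $\Lambda \hat\alpha = W^\top y$.

Any $\alpha_i$ for which $\lambda_i = 0$ can take any value, so set them to 0.

Define the generalised inverse $\Lambda^+$ to be $\Lambda^{-1}$, except that $1/\lambda_i$ is replaced by 0 if $\lambda_i = 0$.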

Least Squares with constraints

Then $\hat\alpha = \Lambda^+ W^\top y$, and since $\hat\beta = U\hat\alpha$ and $W = XU$,

Predicted values: $\hat{y} = X\hat\beta = X (U \Lambda^+ U^\top) X^\top y$
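The rotation and generalised-inverse steps can be sketched in NumPy as follows. The function name `ls_generalised_inverse`, the tolerance `tol`, and the simulated rank-deficient design are assumptions for illustration; in floating point an exact zero eigenvalue shows up as a tiny number, hence the threshold.

```python
import numpy as np

def ls_generalised_inverse(X, y, tol=1e-10):
    """Least squares via X^T X = U Lambda U^T, zeroing 1/lambda_i for null eigenvalues."""
    lam, U = np.linalg.eigh(X.T @ X)          # diagonalise X^T X
    W = X @ U                                 # rotated design, W^T W = Lambda
    keep = lam > tol * lam.max()              # eigenvalues treated as non-zero
    lam_plus = np.zeros_like(lam)
    lam_plus[keep] = 1.0 / lam[keep]          # diagonal of Lambda^+
    alpha_hat = lam_plus * (W.T @ y)          # alpha_hat = Lambda^+ W^T y
    return U @ alpha_hat                      # rotate back: beta_hat = U alpha_hat

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
X = np.column_stack([X, X[:, 0] + X[:, 1]])   # third column is redundant, so rank 2
y = rng.normal(size=30)
beta_hat = ls_generalised_inverse(X, y)
```

Setting the free components of $\alpha$ to 0 picks out the minimum-norm solution of the normal equations, which is also what a pseudoinverse-based solver returns.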

Least Squares with Ridge Regression

Another way of dealing with non-invertible least squares problems:

Replace $X^\top X$ by $X^\top X + kI$, where $k$ is a small positive constant.

The eigenvectors are unchanged.
The eigenvalues become $\Lambda(k) = \Lambda + kI > 0$.

Ridge Regression

The least squares estimator becomes $\hat\beta(k) = (U \Lambda(k)^{-1} U^\top) X^\top y$.

$\hat\beta(0)$ is the usual least-squares estimator.

It is a shrinkage estimator: $\|\hat\beta(k)\| \le \|\hat\beta(0)\|$.

It is equivalent to a Bayesian analysis with a Normal prior on $\beta$.
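A minimal sketch of the ridge estimator computed through the eigendecomposition, on a simulated design whose $X^\top X$ is singular. The function name `ridge`, the data, and the grid of `k` values are assumptions for illustration.

```python
import numpy as np

def ridge(X, y, k):
    """beta_hat(k) = U (Lambda + k I)^{-1} U^T X^T y, for k > 0."""
    lam, U = np.linalg.eigh(X.T @ X)              # eigenvectors do not depend on k
    return U @ ((U.T @ (X.T @ y)) / (lam + k))    # divide by the shifted eigenvalues

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 3))
X = np.column_stack([X, X[:, 0] - X[:, 2]])       # redundant column: X^T X is singular
y = rng.normal(size=25)

ks = [0.01, 0.1, 1.0, 10.0]
norms = [np.linalg.norm(ridge(X, y, k)) for k in ks]   # shrinks as k grows
```

The list `norms` should decrease as `k` increases, which is the shrinkage property in numerical form.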

Multivariate Normal Distribution

First consider $n$ independent $N(0, 1)$ random variables $y = (y_1, \ldots, y_n)^\top$.

Their joint probability density is

$$f(y) = \prod_i \frac{\exp(-y_i^2/2)}{(2\pi)^{1/2}} = \frac{\exp(-y^\top y/2)}{(2\pi)^{n/2}}$$

Now consider $z = \mu + Ay$, where $\mu$ is a fixed $n$-vector and $A$ is an invertible $n \times n$ matrix.

$E(z) = \mu$; $\mathrm{var}(z) = A A^\top = \Sigma$

Multivariate Normal Distribution

$y = A^{-1}(z - \mu)$

$y^\top y = (z - \mu)^\top (A A^\top)^{-1} (z - \mu)$

$$f(y) = \frac{\exp\left(-(z - \mu)^\top (A A^\top)^{-1} (z - \mu)/2\right)}{(2\pi)^{n/2}}$$

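Multivariate Normal Distribution

Change variable: $y \to z$. The unit hypercube in $y$ space is mapped to a parallelepiped in $z$ space with volume $|A|$ (the absolute value of the determinant of $A$), so $dy = dz/|A|$.

$$f(z) = \frac{\exp\left(-(z - \mu)^\top (A A^\top)^{-1} (z - \mu)/2\right)}{|A| \, (2\pi)^{n/2}}$$

$$f(z) = \frac{\exp\left(-(z - \mu)^\top \Sigma^{-1} (z - \mu)/2\right)}{|\Sigma|^{1/2} \, (2\pi)^{n/2}}$$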
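The construction $z = \mu + Ay$ and the resulting density can be sketched in NumPy as below. The function name `mvn_logpdf`, the particular $\mu$ and $A$, and the sample size are assumptions for illustration. The sketch works with the log density and takes $\Sigma = AA^\top$ for column vectors $y$.

```python
import numpy as np

def mvn_logpdf(z, mu, sigma):
    """Log of the multivariate normal density with mean mu and covariance sigma."""
    n = len(mu)
    d = z - mu
    _, logdet = np.linalg.slogdet(sigma)          # log |Sigma|
    quad = d @ np.linalg.solve(sigma, d)          # (z - mu)^T Sigma^{-1} (z - mu)
    return -0.5 * (quad + logdet + n * np.log(2 * np.pi))

rng = np.random.default_rng(4)
mu = np.array([1.0, -1.0])                        # illustrative mean
A = np.array([[2.0, 0.0],
              [0.5, 1.0]])                        # illustrative invertible matrix
sigma = A @ A.T                                   # var(z) = A A^T

y = rng.normal(size=(100_000, 2))                 # rows are independent N(0, I) draws
z = mu + y @ A.T                                  # row i is mu + A y_i
```

The sample covariance `np.cov(z, rowvar=False)` should be close to `sigma`, and for any `y0` the value `mvn_logpdf(mu + A @ y0, mu, sigma)` should equal the standard normal log density at `y0` minus `log|det A|`.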

Linear Models with Correlated Errors

$y = X\beta + e$

The standard least squares solution is not applicable when the errors are correlated, with variance matrix $\Sigma = A A^\top \sigma^2$.

The transformation $w = A^{-1} y$ creates a new model with uncorrelated errors:

$w = (A^{-1} X)\beta + A^{-1} e$

Linear Models with Correlated Errors

$$\mathrm{var}(A^{-1} e) = A^{-1} \Sigma (A^{-1})^\top = A^{-1} (A A^\top) (A^{-1})^\top \sigma^2 = I \sigma^2$$

This is the idea behind several methods for genetic mapping with related individuals.
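A minimal sketch of the whitening transformation, assuming the error covariance is known. The AR(1)-style correlation matrix, the true coefficients, and the use of a Cholesky factor as $A$ are assumptions for illustration; any $A$ with $\Sigma = AA^\top$ would serve.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])    # intercept plus one covariate

# Assumed error covariance (sigma^2 = 1): correlation decays with distance
idx = np.arange(n)
sigma = 0.7 ** np.abs(idx[:, None] - idx[None, :])
A = np.linalg.cholesky(sigma)                            # lower triangular, sigma = A A^T

e = A @ rng.normal(size=n)                               # correlated errors
y = X @ np.array([1.0, 2.0]) + e

w = np.linalg.solve(A, y)                                # w = A^{-1} y
Xw = np.linalg.solve(A, X)                               # A^{-1} X
beta_gls = np.linalg.lstsq(Xw, w, rcond=None)[0]         # ordinary LS on the new model
```

Ordinary least squares on the transformed model gives the same estimate as the generalised least squares formula $(X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1} y$.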

Principal Components Analysis

Let $A$ be an $n \times p$ data matrix of $p$ different measurements made on each of $n$ subjects.

Each column of the matrix is a sample of values for a particular measurement, e.g. a battery of the same phenotypic measurements is made on multiple individuals.

For simplicity, assume $A$ has been centred so the mean of each column is 0.

Principal Components Analysis

Is there a simplification of the data that summarises most of the variation?

The covariance matrix of the $p$ measurements is, up to a constant factor $1/(n-1)$, $A^\top A = U \Lambda U^\top$.

The first $k$ principal components are the $k$ eigenvectors with the $k$ largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_k$.

Principal Components Analysis

The total variance is $v = \sum_i \lambda_i$.

Variance explained by the first $k$ PCs $= \sum_{i=1}^{k} \lambda_i / v$

Often a few PCs explain a large fraction (e.g. 80%) of the variance.

Scatter plots of the $n$ individuals projected onto the first few PCs often reveal important structure.
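The eigenvalue bookkeeping for PCA can be sketched in NumPy as follows. The simulated data (two dominant directions plus noise) and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 5
# Simulated data with two dominant directions plus small noise (illustrative only)
latent = rng.normal(size=(n, 2)) @ rng.normal(size=(2, p))
A = latent + 0.1 * rng.normal(size=(n, p))
A = A - A.mean(axis=0)                       # centre each column

lam, U = np.linalg.eigh(A.T @ A)             # A^T A = U Lambda U^T, ascending order
order = np.argsort(lam)[::-1]                # reorder so lambda_1 >= lambda_2 >= ...
lam, U = lam[order], U[:, order]

explained = np.cumsum(lam) / lam.sum()       # fraction of variance from the first k PCs
scores = A @ U[:, :2]                        # individuals projected onto the first two PCs
```

`explained[k - 1]` is the fraction of variance explained by the first `k` components, and the two columns of `scores` are what one would put on a scatter plot.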

Principal Components Analysis

Principal components analysis has a close relationship with multiple linear models:

$\Lambda \hat\alpha = W^\top y$

Set $\Lambda^+$ to be $\Lambda^{-1}$, except that $1/\lambda_i$ is replaced by 0 if $\lambda_i < \epsilon$.

Reducing the number of principal components in the regression tends to improve the performance of ill-conditioned linear models.
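A minimal sketch of regression on a reduced set of principal components, using a simulated design with two nearly collinear columns. The function name `pcr`, the threshold values, and the data are assumptions for illustration.

```python
import numpy as np

def pcr(X, y, eps):
    """Regress on principal components of X, dropping those with eigenvalue < eps."""
    lam, U = np.linalg.eigh(X.T @ X)
    W = X @ U                                   # rotated design
    keep = lam >= eps
    lam_plus = np.zeros_like(lam)
    lam_plus[keep] = 1.0 / lam[keep]            # thresholded generalised inverse
    return U @ (lam_plus * (W.T @ y))           # beta_hat = U Lambda^+ W^T y

rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
X = np.column_stack([x1,
                     x1 + 1e-3 * rng.normal(size=n),   # nearly a copy of the first column
                     rng.normal(size=n)])
y = x1 + rng.normal(size=n)

beta_full = pcr(X, y, eps=0.0)        # keeps every component
beta_reduced = pcr(X, y, eps=1e-2)    # drops the near-null direction
```

Dropping a component can only reduce the norm of the coefficient vector, so `beta_reduced` is never longer than `beta_full`. With nearly collinear columns the full fit can have large, unstable coefficients along the near-null direction.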
