MLES & Multivariate Normal Theory

Merlise Clyde, September 6, 2016

Outline
- Expectations of Quadratic Forms
- Distribution
- Linear Transformations
- Distribution of estimates under normality

Properties of MLEs: Recap
- $\hat{Y} = \hat{\mu}$ is an unbiased estimate of $\mu = X\beta$
- $E[e] = 0$ if $\mu \in C(X)$, since $E[e] = E[(I - P_X)Y] = (I - P_X)\mu$
- MLE of $\sigma^2$: $\hat{\sigma}^2 = \frac{e^Te}{n} = \frac{Y^T(I - P_X)Y}{n}$
- Is this an unbiased estimate of $\sigma^2$?
- Need expectations of quadratic forms $Y^TAY$ for $A$ an $n \times n$ matrix and $Y$ a random vector in $\mathbb{R}^n$

Quadratic Forms
Without loss of generality we can assume that $A = A^T$:
- $Y^TAY$ is a scalar, so $Y^TAY = (Y^TAY)^T = Y^TA^TY$
- Hence $\frac{Y^TAY + Y^TA^TY}{2} = Y^T\frac{(A + A^T)}{2}Y = Y^TAY$
- So we may replace $A$ by its symmetric part $(A + A^T)/2$ and take $A = A^T$
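A quick numerical check of the symmetrization argument, written as a minimal NumPy sketch; the random matrix and vector below are arbitrary illustrations, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(19)
A = rng.normal(size=(4, 4))       # generic, non-symmetric matrix
A_sym = (A + A.T) / 2             # symmetric part of A
y = rng.normal(size=4)

q = y @ A @ y                     # y^T A y
q_sym = y @ A_sym @ y             # y^T ((A + A^T)/2) y
print(q, q_sym)                   # equal up to floating-point rounding
```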

Expectations of Quadratic Forms
Theorem: Let $Y$ be a random vector in $\mathbb{R}^n$ with $E[Y] = \mu$ and $\mathrm{Cov}(Y) = \Sigma$. Then
$$E[Y^TAY] = \mathrm{tr}(A\Sigma) + \mu^TA\mu.$$
This result is useful for finding expected values of mean squares; no normality is required!

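Proof
Start with $(Y - \mu)^TA(Y - \mu)$, expand, and take expectations:
$$E[(Y - \mu)^TA(Y - \mu)] = E[Y^TAY + \mu^TA\mu - \mu^TAY - Y^TA\mu] = E[Y^TAY] + \mu^TA\mu - \mu^TA\mu - \mu^TA\mu = E[Y^TAY] - \mu^TA\mu$$
Rearrange, using that a scalar equals its own trace, that the trace is invariant under cyclic permutations, and that trace and expectation are both linear:
$$\begin{aligned}
E[Y^TAY] &= E[(Y - \mu)^TA(Y - \mu)] + \mu^TA\mu \\
&= E[\mathrm{tr}((Y - \mu)^TA(Y - \mu))] + \mu^TA\mu \\
&= E[\mathrm{tr}(A(Y - \mu)(Y - \mu)^T)] + \mu^TA\mu \\
&= \mathrm{tr}(E[A(Y - \mu)(Y - \mu)^T]) + \mu^TA\mu \\
&= \mathrm{tr}(A\,E[(Y - \mu)(Y - \mu)^T]) + \mu^TA\mu \\
&= \mathrm{tr}(A\Sigma) + \mu^TA\mu
\end{aligned}$$
where $\mathrm{tr}(A) \equiv \sum_{i=1}^n a_{ii}$.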
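Because the theorem needs only the first two moments, a Monte Carlo check with deliberately non-normal $Y$ should still match $\mathrm{tr}(A\Sigma) + \mu^TA\mu$. This is a sketch assuming NumPy; $\mu$, $L$, and $A$ are illustrative choices, and $Y = \mu + Lu$ uses centered Exp(1) entries in $u$ (mean 0, variance 1), so $\mathrm{Cov}(Y) = LL^T = \Sigma$.

```python
import numpy as np

rng = np.random.default_rng(721)
n = 3
mu = np.array([1.0, -2.0, 0.5])
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.2, 0.0],
              [-0.3, 0.4, 0.8]])
Sigma = L @ L.T                          # Cov(Y)
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 0.5]])          # need not be symmetric

# Theorem: E[Y^T A Y] = tr(A Sigma) + mu^T A mu
theory = np.trace(A @ Sigma) + mu @ A @ mu

# Non-normal Y: centered Exp(1) entries have mean 0 and variance 1
N = 200_000
U = rng.exponential(1.0, size=(N, n)) - 1.0
Y = mu + U @ L.T                          # each row is mu + L u
quad = np.einsum("ij,jk,ik->i", Y, A, Y)  # Y_i^T A Y_i for each row i
estimate = quad.mean()
print(theory, estimate)
```

With these inputs `theory` is $3.405 + 6.625 = 10.03$; the simulated average should land close to it.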

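Expectation of $\hat{\sigma}^2$
Use the theorem with $\mathrm{Cov}(Y) = \sigma^2 I$ and $\mu \in C(X)$, so that $(I - P_X)\mu = 0$:
$$E[Y^T(I - P_X)Y] = \mathrm{tr}((I - P_X)\sigma^2 I) + \mu^T(I - P_X)\mu = \sigma^2\,\mathrm{tr}(I - P_X) = \sigma^2\,r(I - P_X) = \sigma^2(n - r(X))$$
Therefore an unbiased estimate of $\sigma^2$ is $\frac{e^Te}{n - r(X)}$.
If $X$ is full rank ($r(X) = p$) and $P_X = X(X^TX)^{-1}X^T$, then
$$\mathrm{tr}(P_X) = \mathrm{tr}(X(X^TX)^{-1}X^T) = \mathrm{tr}(X^TX(X^TX)^{-1}) = \mathrm{tr}(I_p) = p$$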
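A small simulation makes the bias of the MLE concrete. This sketch assumes NumPy, a hypothetical full-rank design with an intercept, and made-up values for $\beta$ and $\sigma$; it averages $e^Te/n$ and $e^Te/(n - p)$ over many simulated data sets and also checks $\mathrm{tr}(P_X) = p$.

```python
import numpy as np

rng = np.random.default_rng(2016)
n, p, sigma = 20, 4, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # full rank (a.s.)
beta = np.array([1.0, 2.0, -1.0, 0.5])       # hypothetical true coefficients
P = X @ np.linalg.inv(X.T @ X) @ X.T         # P_X = X (X^T X)^{-1} X^T
M = np.eye(n) - P                            # I - P_X

trace_P = np.trace(P)                        # should equal p

reps = 50_000
Y = X @ beta + sigma * rng.normal(size=(reps, n))  # each row is one data set
E = Y @ M.T                                  # residuals e = (I - P_X) y, row-wise
rss = np.sum(E**2, axis=1)                   # e^T e for each data set
mle = rss / n                                # biased: E = sigma^2 (n - p) / n
unbiased = rss / (n - p)                     # unbiased: E = sigma^2
print(trace_P, mle.mean(), unbiased.mean(), sigma**2)
```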

Spectral Theorem
Theorem: If $A$ ($n \times n$) is a symmetric real matrix, then there exists a $U$ ($n \times n$) such that $U^TU = UU^T = I_n$ and a diagonal matrix $\Lambda$ with elements $\lambda_i$ such that $A = U\Lambda U^T$.
- $U$ is an orthogonal matrix; $U^{-1} = U^T$
- The columns of $U$ form an orthonormal basis (ONB) for $\mathbb{R}^n$
- The rank of $A$ equals the number of non-zero eigenvalues $\lambda_i$
- The columns of $U$ associated with non-zero eigenvalues (eigenvectors of $A$) form an ONB for $C(A)$
- $A^p = U\Lambda^pU^T$ (matrix powers)
- A square root of $A > 0$ is $U\Lambda^{1/2}U^T$
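The bullet points can be checked numerically with `numpy.linalg.eigh`, which returns the eigendecomposition of a symmetric matrix. This is a sketch; the rank-2 matrix $A = BB^T$ and the positive definite $S = A + I$ are arbitrary examples, and the zero-eigenvalue threshold is an assumed numerical tolerance.

```python
import numpy as np

rng = np.random.default_rng(27)
B = rng.normal(size=(4, 2))
A = B @ B.T                              # 4 x 4, symmetric, rank 2

lam, U = np.linalg.eigh(A)               # A = U diag(lam) U^T
Lam = np.diag(lam)

nonzero = np.abs(lam) > 1e-10 * np.abs(lam).max()
rank = int(nonzero.sum())                # number of non-zero eigenvalues
U_r = U[:, nonzero]                      # ONB for C(A)

A_cubed = U @ np.diag(lam**3) @ U.T      # A^3 = U Lambda^3 U^T

S = A + np.eye(4)                        # S > 0 (positive definite)
mu_s, V = np.linalg.eigh(S)
S_half = V @ np.diag(np.sqrt(mu_s)) @ V.T   # a square root of S
print(rank, np.round(lam, 6))
```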

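Projections
Projection Matrix: If $P$ is an orthogonal projection matrix, then its eigenvalues $\lambda_i$ are either zero or one, with $\mathrm{tr}(P) = \sum_i \lambda_i = r(P)$.
- $P$ is symmetric, so by the spectral theorem $P = U\Lambda U^T$
- $P = P^2 \Rightarrow U\Lambda U^T = U\Lambda U^TU\Lambda U^T = U\Lambda^2U^T \Rightarrow \Lambda = \Lambda^2$, which is true only for $\lambda_i = 1$ or $\lambda_i = 0$
- Since $r(P)$ is the number of non-zero eigenvalues, $r(P) = \sum_i \lambda_i = \mathrm{tr}(P)$
$$P = [U_P \; U_{P^\perp}]\begin{bmatrix} I_r & 0 \\ 0 & 0_{n-r}\end{bmatrix}\begin{bmatrix} U_P^T \\ U_{P^\perp}^T\end{bmatrix} = U_PU_P^T$$
$$P = \sum_{i=1}^r u_iu_i^T,$$
a sum of $r$ rank-1 projections.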
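For a concrete projection, take $P_X$ for a random full-rank $X$ (an illustrative assumption) and confirm that its eigenvalues are 0 or 1, that the trace equals the rank, and that $P$ is rebuilt from $r$ rank-1 pieces $u_iu_i^T$. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(29)
n, r = 6, 3
X = rng.normal(size=(n, r))                   # full column rank (almost surely)
P = X @ np.linalg.solve(X.T @ X, X.T)         # P_X = X (X^T X)^{-1} X^T

lam, U = np.linalg.eigh(P)                    # P is symmetric, so eigh applies
is_one = lam > 0.5                            # eigenvalues are 0 or 1 up to rounding
U_P = U[:, is_one]                            # ONB for C(P) = C(X)

# P as a sum of r rank-one projections u_i u_i^T
P_rebuilt = sum(np.outer(u, u) for u in U_P.T)
print(np.round(lam, 8), np.trace(P))
```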

Distributions
- Distribution of $\hat{\beta}$
- Distribution of $P_XY$
- Distribution of $e$
- Distribution of $\hat{\sigma}^2$

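Univariate Normal
Definition: We say that $Z$ has a standard normal distribution, $Z \sim N(0, 1)$, with mean 0 and variance 1, if it has density
$$f_Z(z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}$$
If $Y = \mu + \sigma Z$, then $Y \sim N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$:
$$f_Y(y) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2}$$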
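The formula for $f_Y$ is $f_Z$ evaluated at $z = (y - \mu)/\sigma$ and multiplied by $dz/dy = 1/\sigma$. This is the one-dimensional version of the Jacobian argument used later for the multivariate density. The NumPy sketch below checks that identity for illustrative values $\mu = 3$ and $\sigma = 2$.

```python
import numpy as np

def f_Z(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

def f_Y(y, mu, sigma):
    """N(mu, sigma^2) density, written out directly."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma**2)

mu, sigma = 3.0, 2.0
y = np.linspace(-5, 11, 9)
# Change of variables: z = (y - mu) / sigma, with |dz/dy| = 1 / sigma
via_transform = f_Z((y - mu) / sigma) / sigma
print(np.allclose(via_transform, f_Y(y, mu, sigma)))
```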

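Standard iid
Let $Z_i \sim N(0, 1)$ independently for $i = 1, \ldots, d$ and define $Z = (Z_1, Z_2, \ldots, Z_d)^T$.
Density of $Z$:
$$f_Z(z) = \prod_{j=1}^d \frac{1}{\sqrt{2\pi}}e^{-z_j^2/2} = (2\pi)^{-d/2}e^{-\frac{1}{2}z^Tz}$$
$E[Z] = 0$ and $\mathrm{Cov}[Z] = I_d$, so $Z \sim N(0_d, I_d)$.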
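A short NumPy sketch of the two claims on this slide. First, the product of $d$ univariate densities equals the vector form $(2\pi)^{-d/2}e^{-z^Tz/2}$. Second, draws of $Z$ have mean near $0_d$ and covariance near $I_d$. The dimension $d = 4$ and the sample size are arbitrary choices.

```python
import numpy as np

d = 4
rng = np.random.default_rng(35)
z = rng.normal(size=d)                      # one point z in R^d

# Product of d univariate N(0, 1) densities vs. the vector form
product_form = np.prod(np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi))
vector_form = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * z @ z)

# Empirical mean and covariance of many draws of Z ~ N(0_d, I_d)
Z = rng.normal(size=(100_000, d))
mean_hat = Z.mean(axis=0)
cov_hat = np.cov(Z, rowvar=False)
print(product_form, vector_form)
```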

For a $d$-dimensional multivariate normal random vector, we write $Y \sim N_d(\mu, \Sigma)$:
- $E[Y] = \mu$: $d$-dimensional vector with means $E[Y_j]$
- $\mathrm{Cov}[Y] = \Sigma$: $d \times d$ matrix whose diagonal elements are the variances of $Y_j$ and whose off-diagonal elements are the covariances $E[(Y_j - \mu_j)(Y_k - \mu_k)]$
- If $\Sigma$ is positive definite ($x^T\Sigma x > 0$ for any $x \neq 0$ in $\mathbb{R}^d$), then $Y$ has a density (with respect to Lebesgue measure on $\mathbb{R}^d$)
$$p(y) = (2\pi)^{-d/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}(y - \mu)^T\Sigma^{-1}(y - \mu)\right)$$
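One way to evaluate this density in code, sketched with NumPy: factor $\Sigma = LL^T$ by Cholesky. This avoids forming $\Sigma^{-1}$ explicitly and fails loudly when $\Sigma$ is not positive definite, which is exactly the case where no density exists. The function name `mvn_density` and the example inputs are hypothetical.

```python
import numpy as np

def mvn_density(y, mu, Sigma):
    """Density of N_d(mu, Sigma) at y; requires Sigma positive definite."""
    d = len(mu)
    L = np.linalg.cholesky(Sigma)             # raises LinAlgError if Sigma is not PD
    diff = np.linalg.solve(L, y - mu)         # L^{-1}(y - mu)
    quad = diff @ diff                        # = (y - mu)^T Sigma^{-1} (y - mu)
    log_det = 2 * np.sum(np.log(np.diag(L)))  # log |Sigma|
    return np.exp(-0.5 * d * np.log(2 * np.pi) - 0.5 * log_det - 0.5 * quad)

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
print(mvn_density(np.array([0.3, 0.2]), mu, Sigma))
```

The quadratic-form step uses $\Sigma^{-1} = L^{-T}L^{-1}$, so $(y - \mu)^T\Sigma^{-1}(y - \mu) = \|L^{-1}(y - \mu)\|^2$.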

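Density
Density of $Z \sim N(0, I_d)$: $f_Z(z) = \prod_{j=1}^d \frac{1}{\sqrt{2\pi}}e^{-z_j^2/2} = (2\pi)^{-d/2}e^{-\frac{1}{2}z^Tz}$
- Write $Y = \mu + AZ$
- Solve for $Z = g(Y)$
- Jacobian of the transformation: $J(Z \to Y) = \left|\frac{\partial g}{\partial Y}\right|$ (absolute value of the determinant)
- Substitute $g(y)$ for $z$ in the density and multiply by the Jacobian: $f_Y(y) = f_Z(g(y))\,J(Z \to Y)$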

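Density
$$Y = \mu + AZ \quad \text{for } Z \sim N(0, I_d) \qquad (1)$$
Proof.
- Since $\Sigma > 0$, there exists an $A$ ($d \times d$) such that $A > 0$ and $AA^T = \Sigma$ (for example, the square root $U\Lambda^{1/2}U^T$ from the spectral theorem)
- $A > 0 \Rightarrow A^{-1}$ exists
- Multiply both sides of (1) by $A^{-1}$: $A^{-1}Y = A^{-1}\mu + A^{-1}AZ$
- Rearrange: $A^{-1}(Y - \mu) = Z$
- Jacobian of the transformation: $dZ = |A^{-1}|\,dY$, where $|A^{-1}| = |A|^{-1} = |\Sigma|^{-1/2}$ because $|\Sigma| = |A||A^T| = |A|^2$
- Substitute, using $z^Tz = (y - \mu)^TA^{-T}A^{-1}(y - \mu) = (y - \mu)^T\Sigma^{-1}(y - \mu)$, and simplify:
$$f(y) = (2\pi)^{-d/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}(y - \mu)^T\Sigma^{-1}(y - \mu)\right)$$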
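The construction can be followed step by step in NumPy. The sketch builds the symmetric square root $A = U\Lambda^{1/2}U^T$ of an illustrative positive definite $\Sigma$ and simulates $Y = \mu + AZ$. It then evaluates the density at one point via the change of variables $f_Z(A^{-1}(y - \mu))\,|A^{-1}|$ and compares the result with the closed form. The particular $\mu$, $\Sigma$, and test point are assumptions.

```python
import numpy as np

rng = np.random.default_rng(41)
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.0, -0.3],
                  [0.2, -0.3, 1.5]])        # illustrative positive definite Sigma

# Symmetric square root A = U Lambda^{1/2} U^T, so A > 0 and A A^T = Sigma
lam, U = np.linalg.eigh(Sigma)
A = U @ np.diag(np.sqrt(lam)) @ U.T

# Simulate Y = mu + A Z with Z ~ N(0, I_3); each row of Y is one draw
Z = rng.normal(size=(200_000, 3))
Y = mu + Z @ A.T
mean_hat = Y.mean(axis=0)
cov_hat = np.cov(Y, rowvar=False)

def f_Z(z):
    """Density of N(0, I_d) at z."""
    return (2 * np.pi) ** (-len(z) / 2) * np.exp(-0.5 * z @ z)

# Change of variables at a test point: z = A^{-1}(y - mu), Jacobian |det A^{-1}|
y0 = np.array([0.5, 0.0, 1.0])
z0 = np.linalg.solve(A, y0 - mu)
via_jacobian = f_Z(z0) / abs(np.linalg.det(A))      # |det A^{-1}| = 1 / |det A|
closed_form = ((2 * np.pi) ** (-1.5) * np.linalg.det(Sigma) ** (-0.5)
               * np.exp(-0.5 * (y0 - mu) @ np.linalg.solve(Sigma, y0 - mu)))
print(via_jacobian, closed_form)
```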

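Singular Case
$Y = \mu + AZ$ with $Z \in \mathbb{R}^d$ and $A$ an $n \times d$ matrix
- $E[Y] = \mu$
- $\mathrm{Cov}(Y) = AA^T \geq 0$ (positive semidefinite)
- $Y \sim N(\mu, \Sigma)$ where $\Sigma = AA^T$
- If $\Sigma$ is singular, then there is no density (on $\mathbb{R}^n$), but we claim that $Y$ still has a multivariate normal distribution!
Definition: $Y \in \mathbb{R}^n$ has a multivariate normal distribution $N(\mu, \Sigma)$ if for any $v \in \mathbb{R}^n$, $v^TY$ has a normal distribution with mean $v^T\mu$ and variance $v^T\Sigma v$ (a point mass at $v^T\mu$ when $v^T\Sigma v = 0$).
See Lessons in Sakai for videos using characteristic functions.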
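To see the singular case in action, take an illustrative $3 \times 2$ matrix $A$, so that $\Sigma = AA^T$ is $3 \times 3$ with rank 2 and no density on $\mathbb{R}^3$ exists. Projections $v^TY$ still behave as the definition requires. For the direction $w$ with $A^Tw = 0$, $w^TY$ is the constant $w^T\mu$, a degenerate normal. A NumPy sketch; $\mu$, $A$, $v$, and $w$ are assumptions chosen for easy arithmetic.

```python
import numpy as np

rng = np.random.default_rng(43)
mu = np.array([0.0, 1.0, 2.0])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])           # n = 3, d = 2
Sigma = A @ A.T                       # 3 x 3 but rank 2: singular

Z = rng.normal(size=(100_000, 2))
Y = mu + Z @ A.T                      # Y = mu + A Z lives on a 2-d affine plane

v = np.array([1.0, 2.0, -1.0])        # arbitrary direction
proj = Y @ v                          # v^T Y ~ N(v^T mu, v^T Sigma v)

w = np.array([1.0, 1.0, -1.0])        # A^T w = 0, so w^T Sigma w = 0
degenerate = Y @ w                    # w^T Y equals w^T mu for every draw
print(np.linalg.matrix_rank(Sigma), proj.mean(), proj.var())
```

Here $v^T\mu = 0$ and $v^T\Sigma v = 1$, so `proj` should look like a sample from $N(0, 1)$.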

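Linear Transformations are Normal
If $Y \sim N_n(\mu, \Sigma)$, then for $A$ ($m \times n$), $AY \sim N_m(A\mu, A\Sigma A^T)$.
This follows from the definition: for any $v \in \mathbb{R}^m$, $v^T(AY) = (A^Tv)^TY$ is normal with mean $v^TA\mu$ and variance $v^TA\Sigma A^Tv$.
$A\Sigma A^T$ does not have to be positive definite!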
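A NumPy sketch of the result, assuming an illustrative $\mu$, $\Sigma$, and $2 \times 3$ matrix $A$. It compares the empirical mean and covariance of $AY$ with $A\mu$ and $A\Sigma A^T$. It then stacks a third row equal to the sum of the first two, so the resulting $A\Sigma A^T$ is singular: still a valid normal covariance, but not positive definite.

```python
import numpy as np

rng = np.random.default_rng(45)
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 2.0, 0.3],
                  [0.0, 0.3, 1.0]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5, 0.5, 1.0]])                 # m = 2, n = 3

Y = rng.multivariate_normal(mu, Sigma, size=200_000)
W = Y @ A.T                                     # each row is A y
mean_theory = A @ mu                            # A mu
cov_theory = A @ Sigma @ A.T                    # A Sigma A^T

# Linearly dependent rows make A Sigma A^T singular (not positive definite)
A_dep = np.vstack([A, A[0] + A[1]])             # third row = sum of the first two
cov_dep = A_dep @ Sigma @ A_dep.T
print(np.linalg.eigvalsh(cov_dep))              # smallest eigenvalue is numerically 0
```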

Equal in Distribution
Multiple ways to define the same normal:
- $Z_1 \sim N(0, I_n)$, $Z_1 \in \mathbb{R}^n$, and take $A$ ($d \times n$); define $Y = \mu + AZ_1$
- $Z_2 \sim N(0, I_p)$, $Z_2 \in \mathbb{R}^p$, and take $B$ ($d \times p$); define $W = \mu + BZ_2$
Theorem: If $Y = \mu + AZ_1$ and $W = \mu + BZ_2$, then $Y \stackrel{D}{=} W$ if and only if $AA^T = BB^T = \Sigma$.
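As an illustration of the theorem, the sketch below uses a $2 \times 3$ matrix $A$ (so $n = 3$). It pairs $A$ with the $2 \times 2$ Cholesky factor $B$ of $AA^T$ (so $p = 2$). The two constructions use noise vectors of different dimensions, yet both produce draws whose sample mean and covariance match $\mu$ and $\Sigma = AA^T$. This assumes NumPy, and the specific $\mu$ and $A$ are made up. A moment comparison is only a sanity check, not a proof of equality in distribution; for normals, though, the first two moments determine the distribution.

```python
import numpy as np

rng = np.random.default_rng(47)
mu = np.array([0.0, 1.0])
A = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 1.0]])       # d x n with n = 3
Sigma = A @ A.T
B = np.linalg.cholesky(Sigma)          # d x p with p = 2, and B B^T = Sigma

N = 200_000
Y = mu + rng.normal(size=(N, 3)) @ A.T   # Y = mu + A Z_1
W = mu + rng.normal(size=(N, 2)) @ B.T   # W = mu + B Z_2
print(np.cov(Y, rowvar=False))
print(np.cov(W, rowvar=False))
```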

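Zero Correlation and Independence
Theorem: For a random vector $Y \sim N(\mu, \Sigma)$ partitioned as
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right),$$
$\mathrm{Cov}(Y_1, Y_2) = \Sigma_{12} = \Sigma_{21}^T = 0$ if and only if $Y_1$ and $Y_2$ are independent.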

Independence Implies Zero Covariance
Proof. $\mathrm{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)^T]$. If $Y_1$ and $Y_2$ are independent, then
$$E[(Y_1 - \mu_1)(Y_2 - \mu_2)^T] = E[Y_1 - \mu_1]\,E[Y_2 - \mu_2]^T = 0\,0^T = 0,$$
therefore $\Sigma_{12} = 0$.

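Zero Covariance Implies Independence
Proof. Assume $\Sigma_{12} = 0$. Choose
$$A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}$$
such that $A_1A_1^T = \Sigma_{11}$ and $A_2A_2^T = \Sigma_{22}$, so that $AA^T = \Sigma$. Partition
$$Z = \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} \sim N\left(\begin{bmatrix} 0_1 \\ 0_2 \end{bmatrix}, \begin{bmatrix} I_1 & 0 \\ 0 & I_2 \end{bmatrix}\right) \quad \text{and} \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix};$$
then $Y \stackrel{D}{=} AZ + \mu \sim N(\mu, \Sigma)$.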

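Continued
Proof (continued).
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \stackrel{D}{=} \begin{bmatrix} A_1Z_1 + \mu_1 \\ A_2Z_2 + \mu_2 \end{bmatrix}$$
- But $Z_1$ and $Z_2$ are independent
- Functions of $Z_1$ and $Z_2$ are independent
- Therefore $Y_1$ and $Y_2$ are independent
So, for jointly normal $Y$, zero covariance implies independence.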
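The proof's block-diagonal construction can be simulated directly, and a contrasting case shows why joint normality matters. In the sketch, $Y_1$ is built from $Z_1$ only and $Y_2$ from $Z_2$ only, so even nonlinear functions such as $Y_{11}^2$ and $Y_2^2$ come out uncorrelated. The counterexample uses $X \sim N(0, 1)$ and an independent random sign $S$, and sets $V = SX$. Then $V$ is also $N(0, 1)$ and $\mathrm{Cov}(X, V) = 0$, but $(X, V)$ is not jointly normal and $X^2 = V^2$, so the two are dependent. This assumes NumPy; the blocks $A_1$, $A_2$ and the means are illustrative.

```python
import numpy as np

rng = np.random.default_rng(55)
N = 200_000

# Jointly normal with Sigma_12 = 0: Y = A Z + mu with block-diagonal A
A1 = np.array([[1.0, 0.0],
               [0.7, 0.5]])                    # A1 A1^T = Sigma_11
A2 = np.array([[1.5]])                         # A2 A2^T = Sigma_22
Z1 = rng.normal(size=(N, 2))
Z2 = rng.normal(size=(N, 1))
Y1 = Z1 @ A1.T + np.array([1.0, 0.0])          # depends on Z1 only
Y2 = Z2 @ A2.T + np.array([-2.0])              # depends on Z2 only

# Independence: even nonlinear functions are uncorrelated
r_normal = np.corrcoef(Y1[:, 0] ** 2, Y2[:, 0] ** 2)[0, 1]

# Counterexample without joint normality: V = S X with a random sign S
X = rng.normal(size=N)
S = rng.choice([-1.0, 1.0], size=N)
V = S * X                                      # V ~ N(0, 1), Cov(X, V) = 0
r_linear = np.corrcoef(X, V)[0, 1]             # near 0
r_squares = np.corrcoef(X**2, V**2)[0, 1]      # exactly 1: X^2 = V^2, so dependent
print(r_normal, r_linear, r_squares)
```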

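Another Useful Result
Corollary: If $Y \sim N(\mu, \sigma^2I_n)$ and $AB^T = 0$, then $AY$ and $BY$ are independent.
Proof.
$$\begin{bmatrix} W_1 \\ W_2 \end{bmatrix} = \begin{bmatrix} A \\ B \end{bmatrix}Y = \begin{bmatrix} AY \\ BY \end{bmatrix}$$
is normal, as a linear transformation of $Y$, and $\mathrm{Cov}(W_1, W_2) = \mathrm{Cov}(AY, BY) = \sigma^2AB^T$. By the zero-covariance theorem, $AY$ and $BY$ are independent if $AB^T = 0$.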
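A natural instance is $A = P_X$ and $B = I - P_X$, where $P_X(I - P_X)^T = 0$. The corollary then says the fitted values $P_XY$ and the residuals $(I - P_X)Y$ are independent under normality. The NumPy sketch below checks $AB^T = 0$ and shows a sample correlation near zero between one fitted value and the residual sum of squares, a nonlinear function of the residuals. The simple-regression design, $\beta$, and $\sigma$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(57)
n, sigma = 10, 2.0
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
P = X @ np.linalg.solve(X.T @ X, X.T)       # A = P_X
M = np.eye(n) - P                            # B = I - P_X
ABt = P @ M.T                                # should be the zero matrix

mu = X @ np.array([1.0, 0.5])
Y = mu + sigma * rng.normal(size=(100_000, n))   # rows: Y ~ N(mu, sigma^2 I_n)
fitted = Y @ P.T                             # rows are P_X y
resid = Y @ M.T                              # rows are (I - P_X) y
rss = np.sum(resid**2, axis=1)

# Independence => even nonlinear functions of AY and BY are uncorrelated
r = np.corrcoef(fitted[:, 0], rss)[0, 1]
print(np.abs(ABt).max(), r)
```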
