MATH5745 Multivariate Methods Lecture 07


1. MATH5745 Multivariate Methods Lecture 07: Tests of hypothesis on covariance matrix. March 16, 2018.

2. Test on covariance matrices: Introduction example I

Frets (1921) [1] gave the head length and head breadth (in mm) for the first and second sons in a number of families. The variables are $y_1$ and $y_2$ (head length and head breadth of the first son) and $y_3$ and $y_4$ (head length and head breadth of the second son). [Data table not recovered.]

[1] Frets, G.P. (1921) Heredity of head form in man. Genetica, 3.

3. Test on covariance matrices: Introduction example II

The sample covariance matrix for the Frets (1921) data is $S$ [entries not recovered]. To test the hypothesis that all four variables are independent, we would test whether the covariance matrix $\Sigma$ satisfies

\[
\Sigma = \begin{pmatrix}
\sigma_{11} & 0 & 0 & 0 \\
0 & \sigma_{22} & 0 & 0 \\
0 & 0 & \sigma_{33} & 0 \\
0 & 0 & 0 & \sigma_{44}
\end{pmatrix}.
\]

4. Test on covariance matrices: Introduction example III

The sample covariance matrix for the Frets (1921) data is $S$ [entries not recovered]. To test the hypothesis that the measurements on the two sons are independent, we would test whether the covariance matrix $\Sigma$ satisfies

\[
\Sigma = \begin{pmatrix}
\sigma_{11} & \sigma_{12} & 0 & 0 \\
\sigma_{21} & \sigma_{22} & 0 & 0 \\
0 & 0 & \sigma_{33} & \sigma_{34} \\
0 & 0 & \sigma_{43} & \sigma_{44}
\end{pmatrix}.
\]

5. Test on covariance matrices: Introduction example IV

The sample covariance matrix for the Frets (1921) data is $S$ [entries not recovered]. To test the hypothesis that the two sons have common variances and covariances, we might test whether the covariance matrix $\Sigma$ satisfies

\[
\Sigma = \begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
\sigma_{21} & \sigma_{22} & \sigma_{23} & \sigma_{24} \\
\sigma_{31} & \sigma_{32} & \sigma_{11} & \sigma_{12} \\
\sigma_{41} & \sigma_{42} & \sigma_{21} & \sigma_{22}
\end{pmatrix}.
\]

6. Test on covariance matrices: Introduction example V

The sample covariance matrix for the Frets (1921) data is $S$ [entries not recovered]. We might want to test a specific proposal about the covariance matrix, that is, $\Sigma$ equal to a fully specified matrix [entries not recovered].

7. Test on covariance matrices

Just as we test hypotheses about mean vectors, we can test hypotheses about covariance matrices. The null hypotheses we are interested in are:

$H_0 : \Sigma = \Sigma_0$ (where $\Sigma_0$ is a given known matrix)
$H_0 : \Sigma = \sigma^2 I$ (special case)
$H_0 : \Sigma = \sigma^2\{(1-\rho)I + \rho J\}$ (special case)
$H_0 : \Sigma = \mathrm{diag}(\sigma_{11}, \sigma_{22}, \ldots, \sigma_{pp})$ (special case)
$H_0 : \Sigma_1 = \Sigma_2$ (do two populations have the same covariance structure?)

8. Likelihood ratio tests: Introduction I

We have observations $y_i$ and wish to test hypotheses about parameters $\theta$. The likelihood function is $L(\theta; Y)$. We test a null hypothesis $H_0$ against an alternative hypothesis $H_1$. The likelihood ratio test statistic is

\[
\lambda = \frac{\sup_{H_0} L(\theta; Y)}{\sup_{H_0 \cup H_1} L(\theta; Y)}.
\]

Clearly $0 < \lambda \le 1$, because the supremum in the denominator is taken over a larger set. Small values of $\lambda$ are evidence against $H_0$; values of $\lambda$ close to 1 are consistent with $H_0$.

It can be shown that, under $H_0$ and for large samples, $-2\log\lambda \sim \chi^2_r$ approximately, where $r = \dim(H_0 \cup H_1) - \dim H_0$.

9. Likelihood ratio tests: Introduction II

The likelihood function is $L(\theta; Y)$ and the log-likelihood function is $l(\theta; Y) = \log L(\theta; Y)$.

Recall that the likelihood ratio statistic is

\[
\lambda = \frac{\sup_{H_0} L(\theta; Y)}{\sup_{H_0 \cup H_1} L(\theta; Y)}.
\]

We can write $-2\log\lambda = -2l_0 + 2l_1$, where $l_0$ is the maximised log-likelihood under $H_0$ and $l_1$ is the maximised log-likelihood under $H_0 \cup H_1$.

Reject $H_0$ for large values of $U = -2\log\lambda$. Recall that $-2\log\lambda \sim \chi^2_r$ approximately, where $r = \dim(H_0 \cup H_1) - \dim H_0$.

10. Likelihood ratio tests: Multivariate normal distribution I

Suppose we have independent observations $y_1, y_2, \ldots, y_n \sim N_p(\mu, \Sigma)$. The likelihood is

\[
L(\mu, \Sigma) = |2\pi\Sigma|^{-n/2} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} (y_i - \mu)^\top \Sigma^{-1} (y_i - \mu) \right\}.
\]

The log-likelihood is

\[
l(\mu, \Sigma) = -\frac{1}{2} n \log|2\pi\Sigma| - \frac{1}{2} \sum_{i=1}^{n} (y_i - \mu)^\top \Sigma^{-1} (y_i - \mu).
\]

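11. Likelihood ratio tests: Multivariate normal distribution II

Recall from the last slide that for independent $y_1, y_2, \ldots, y_n \sim N_p(\mu, \Sigma)$ the log-likelihood is

\[
l(\mu, \Sigma) = -\frac{1}{2} n \log|2\pi\Sigma| - \frac{1}{2} \sum_{i=1}^{n} (y_i - \mu)^\top \Sigma^{-1} (y_i - \mu).
\]

Recall from Lecture 05 that we can write the log-likelihood as

\[
l(\mu, \Sigma) = -\frac{1}{2} n \log|2\pi\Sigma| - \frac{1}{2} n \,\mathrm{trace}(\Sigma^{-1} V) - \frac{1}{2} n (\bar{y} - \mu)^\top \Sigma^{-1} (\bar{y} - \mu),
\]

where

\[
V = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})^\top.
\]

It can be shown that the maximum likelihood estimators are $\hat\mu = \bar{y}$ and $\hat\Sigma = V$. The maximised log-likelihood is then

\[
l(\hat\mu, \hat\Sigma) = -\frac{1}{2} n \log|2\pi V| - \frac{1}{2} n p.
\]

[QUESTION] Why is $\mathrm{trace}(\hat\Sigma^{-1} V) = p$?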
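The estimators on this slide can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `mle_mean_cov` and `max_loglik` and the simulated data are illustrative and do not come from the lecture.

```python
import numpy as np

def mle_mean_cov(Y):
    """Return (ybar, V, S) for an n x p data matrix Y."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    ybar = Y.mean(axis=0)
    centred = Y - ybar
    V = centred.T @ centred / n        # maximum likelihood estimate (divisor n)
    S = centred.T @ centred / (n - 1)  # unbiased estimate (divisor n - 1)
    return ybar, V, S

def max_loglik(V, n):
    """Maximised log-likelihood -(n/2) log|2 pi V| - n p / 2."""
    p = V.shape[0]
    _, logdet = np.linalg.slogdet(2 * np.pi * V)
    return -0.5 * n * logdet - 0.5 * n * p

# Simulated data, used only to exercise the functions (an assumption).
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 3))
ybar, V, S = mle_mean_cov(Y)

# Sigma-hat equals V, so Sigma-hat^{-1} V is the p x p identity matrix.
print(np.trace(np.linalg.solve(V, V)))
print(max_loglik(V, Y.shape[0]))
```

The first printed value illustrates the question on the slide: with $\hat\Sigma = V$ the product $\hat\Sigma^{-1}V$ is the identity, whose trace is $p$.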

12. Testing $H_0 : \Sigma = \Sigma_0$

Suppose we observe $n$ observation vectors $y_1, y_2, \ldots, y_n$, assumed to form a random sample from $N_p(\mu, \Sigma)$. We want to test $H_0 : \Sigma = \Sigma_0$. Notice that $\mu$ is not specified in $H_0$.

Suppose we observe the sample covariance matrix $S$ (and, equivalently, $V$). Idea: how far is $V$ (an estimate of $\Sigma$) from $\Sigma_0$?

13. Testing $H_0 : \Sigma = \Sigma_0$: Likelihood ratio test

We want to test $H_0 : \Sigma = \Sigma_0$. The maximised log-likelihood under $H_0$ is, with $\hat\mu = \bar{y}$,

\[
l_0 = -\frac{1}{2} n \log|2\pi\Sigma_0| - \frac{1}{2} n \,\mathrm{trace}(\Sigma_0^{-1} V).
\]

The maximised log-likelihood under $H_1$ is, with $\hat\mu = \bar{y}$ and $\hat\Sigma = V$,

\[
l_1 = -\frac{1}{2} n \log|2\pi V| - \frac{1}{2} n p.
\]

The likelihood ratio test statistic is

\[
\begin{aligned}
U = -2\log\lambda &= -2l_0 + 2l_1 \\
&= n \log|2\pi\Sigma_0| + n \,\mathrm{trace}(\Sigma_0^{-1} V) - n \log|2\pi V| - np \\
&= n\left(\mathrm{trace}(\Sigma_0^{-1} V) - \log|\Sigma_0^{-1} V| - p\right).
\end{aligned}
\]

14. Testing $H_0 : \Sigma = \Sigma_0$

The test statistic is given by

\[
U = n\left(\mathrm{trace}(\Sigma_0^{-1} V) - \log|\Sigma_0^{-1} V| - p\right).
\]

Alternative representations are possible, since $\mathrm{trace}(\Sigma_0^{-1} V) = \mathrm{trace}(V\Sigma_0^{-1})$ and $\log|\Sigma_0^{-1} V| = -\log|\Sigma_0| + \log|V|$.

If $V = \Sigma_0$, then $U = 0$.

$\dim H_0 = p$ and $\dim(H_0 \cup H_1) = p + \frac{1}{2}p(p+1)$, so $r = \frac{1}{2}p(p+1)$.

Under $H_0$, the test statistic satisfies $U \sim \chi^2_{\frac{1}{2}p(p+1)}$ approximately. Reject $H_0$ at the 5% level if $U > \chi^2_{\frac{1}{2}p(p+1)}(5\%)$. The $\chi^2$ approximation is adequate for large $n$.
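The statistic $U$ and its degrees of freedom can be computed directly from $V$, $\Sigma_0$ and $n$. The sketch below assumes NumPy; the function name `lrt_sigma0` and the toy matrices are hypothetical and are not the lecture's data.

```python
import numpy as np

def lrt_sigma0(V, Sigma0, n):
    """Return (U, df) for testing H0: Sigma = Sigma0.

    U  = n * (trace(Sigma0^{-1} V) - log|Sigma0^{-1} V| - p)
    df = p (p + 1) / 2
    """
    V = np.asarray(V, dtype=float)
    Sigma0 = np.asarray(Sigma0, dtype=float)
    p = V.shape[0]
    A = np.linalg.solve(Sigma0, V)     # Sigma0^{-1} V without an explicit inverse
    _, logdet = np.linalg.slogdet(A)   # log-determinant, numerically stable
    U = n * (np.trace(A) - logdet - p)
    df = p * (p + 1) // 2
    return U, df

# Toy check: when V equals Sigma0 the statistic is zero.
print(lrt_sigma0(np.eye(3), np.eye(3), n=20))
# Toy case: V = 2I against Sigma0 = I with p = 2.
print(lrt_sigma0(2 * np.eye(2), np.eye(2), n=10))
```

Using `solve` and `slogdet` rather than forming the inverse and the determinant explicitly is a design choice for numerical stability; the result is the same quantity as in the formula above.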

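15. Testing $H_0 : \Sigma = \Sigma_0$: Re-expression of $U$

Suppose $\lambda_1, \lambda_2, \ldots, \lambda_p$ are the eigenvalues of $\Sigma_0^{-1} V$. Then

\[
\mathrm{trace}(\Sigma_0^{-1} V) = \sum_{i=1}^{p} \lambda_i .
\]

Also $|\Sigma_0^{-1} V| = \prod_{i=1}^{p} \lambda_i$, so

\[
\log|\Sigma_0^{-1} V| = \log\left(\prod_{i=1}^{p} \lambda_i\right) = \sum_{i=1}^{p} \log\lambda_i .
\]

The test statistic is $U = n\left(\mathrm{trace}(\Sigma_0^{-1} V) - \log|\Sigma_0^{-1} V| - p\right)$. This can be written as

\[
U = n\left[\sum_{i=1}^{p} \lambda_i - \sum_{i=1}^{p} \log\lambda_i - p\right] = n\left[\sum_{i=1}^{p} (\lambda_i - \log\lambda_i) - p\right].
\]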

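16. Testing $H_0 : \Sigma = \Sigma_0$: Re-expression of $U$

Alternatively, suppose $\lambda_1, \lambda_2, \ldots, \lambda_p$ are the eigenvalues of $\Sigma_0^{-1} V$. Write their arithmetic and geometric means as

\[
a = \frac{1}{p} \sum_{i=1}^{p} \lambda_i \quad\text{and}\quad g = \left(\prod_{i=1}^{p} \lambda_i\right)^{1/p}.
\]

Then

\[
\mathrm{trace}(\Sigma_0^{-1} V) = \sum_{i=1}^{p} \lambda_i = ap \quad\text{and}\quad |\Sigma_0^{-1} V| = \prod_{i=1}^{p} \lambda_i = g^p .
\]

The test statistic is $U = n\left(\mathrm{trace}(\Sigma_0^{-1} V) - \log|\Sigma_0^{-1} V| - p\right)$. This gives

\[
U = np(a - \log g - 1).
\]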
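The two eigenvalue forms of $U$ can be compared with a few lines of standard-library Python. This is an illustrative sketch; the function names and the eigenvalues used are made up for the check and are not taken from the lecture's examples.

```python
import math

def u_from_eigenvalues(lams, n):
    """U = n * (sum(lambda_i - log lambda_i) - p)."""
    p = len(lams)
    return n * (sum(lam - math.log(lam) for lam in lams) - p)

def u_from_means(lams, n):
    """U = n * p * (a - log g - 1), with a and g the arithmetic and geometric means."""
    p = len(lams)
    a = sum(lams) / p
    g = math.exp(sum(math.log(lam) for lam in lams) / p)
    return n * p * (a - math.log(g) - 1)

lams = [0.5, 1.0, 2.0]   # hypothetical eigenvalues of Sigma0^{-1} V
print(u_from_eigenvalues(lams, n=10))
print(u_from_means(lams, n=10))
```

Because $a \ge g$ for positive eigenvalues and $\log g \le g - 1$, both forms make it visible that $U \ge 0$, with equality only when every $\lambda_i = 1$.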

17. Testing $H_0 : \Sigma = \Sigma_0$: Rencher results

Rencher gives the formula

\[
U_R = (n-1)\left(\mathrm{trace}(\Sigma_0^{-1} S) - \log|\Sigma_0^{-1} S| - p\right),
\]

using the unbiased covariance matrix $S$. This is NOT the same as $U$.

For moderate $n$, Rencher gives the approximation

\[
U_R^{*} = \left[1 - \frac{1}{6n - 7}\left(2p + 1 - \frac{2}{p+1}\right)\right] U_R
\]

as giving a better fit to $\chi^2_{\frac{1}{2}p(p+1)}$ under $H_0$.

I would use $U$ based upon the maximum likelihood estimate $V$.
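The small-sample correction is a simple multiplicative factor. The sketch below assumes the correction has the form $\left[1 - \frac{1}{6n-7}\left(2p + 1 - \frac{2}{p+1}\right)\right]$, which is how the garbled slide formula reads; the function names are hypothetical.

```python
def rencher_factor(n, p):
    """Multiplier applied to U_R (assumed form of Rencher's correction)."""
    return 1 - (2 * p + 1 - 2 / (p + 1)) / (6 * n - 7)

def rencher_corrected(u_r, n, p):
    """Corrected statistic U_R* = factor * U_R."""
    return rencher_factor(n, p) * u_r

# With n = 11 and p = 5, as in the lecture's examples, 6n - 7 = 59.
print(rencher_factor(11, 5))
```

The factor is below 1, so the corrected statistic is always smaller than $U_R$, and it approaches 1 as $n$ grows.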

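18. Testing $H_0 : \Sigma = \Sigma_0$. Special cases

Test whether the variables are independent with common variance $\sigma^2$:

\[
H_0 : \Sigma = \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix} = \sigma^2 I .
\]

Test whether the variables have common correlation (dependencies) and common variance:

\[
H_0 : \Sigma = \begin{pmatrix}
\sigma^2 & \sigma^2\rho & \cdots & \sigma^2\rho \\
\sigma^2\rho & \sigma^2 & \cdots & \sigma^2\rho \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2\rho & \sigma^2\rho & \cdots & \sigma^2
\end{pmatrix} = \sigma^2\{(1-\rho)I + \rho J\}.
\]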
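Both special-case structures are easy to build for a given $p$, which can help when checking a proposed $\Sigma_0$. This is a minimal sketch assuming NumPy, with $J$ taken to be the $p \times p$ matrix of ones; the function names and parameter values are illustrative only.

```python
import numpy as np

def spherical(p, sigma2):
    """sigma^2 * I: independent variables with common variance."""
    return sigma2 * np.eye(p)

def equicorrelation(p, sigma2, rho):
    """sigma^2 * {(1 - rho) I + rho J}, with J the p x p matrix of ones."""
    I = np.eye(p)
    J = np.ones((p, p))
    return sigma2 * ((1 - rho) * I + rho * J)

print(spherical(3, 2.0))
print(equicorrelation(3, 2.0, 0.5))
```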

19. Testing $H_0 : \Sigma = \Sigma_0$. Example

Consider the probe word example in Lecture 04. Timm (1975) reported response times (in ms) to probe words in five positions in a sentence for 11 individuals, giving the variables $y_1, y_2, y_3, y_4, y_5$. [Data table not recovered.]

The five variables look commensurate, so we are interested in testing $H_0 : \Sigma = \sigma^2 I$.

20. Testing $H_0 : \Sigma = \Sigma_0$. Example

The sample covariance matrix is $S$ [entries not recovered]. The estimate $\hat\Sigma = V$ is given by

\[
V = \frac{n-1}{n} S
\]

[entries not recovered].

21. Testing $H_0 : \Sigma = \Sigma_0$. Example 1

Suppose we are interested in the null hypothesis $H_0 : \Sigma = \Sigma_0$ for a specified matrix $\Sigma_0$ [entries not recovered].

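22. Testing $H_0 : \Sigma = \Sigma_0$. Example 1

We have $n = 11$ and $p = 5$. The statistic is computed from $\mathrm{trace}(\Sigma_0^{-1} V)$, $\log|\Sigma_0|$ and $\log|V|$ as $U = 11\left(\mathrm{trace}(\Sigma_0^{-1} V) + \log|\Sigma_0| - \log|V| - 5\right)$ [numerical values not recovered].

Degrees of freedom: $\frac{1}{2}p(p+1) = 15$. The 5% critical value is $\chi^2_{15}(5\%) \approx 25.00$, and $U$ exceeds it, so we reject $H_0$ at the 5% level.

[Figure: $\chi^2_{15}$ density with the 5% critical value and the observed $U$ marked; $U$ lies in the rejection region.]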

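23. Testing $H_0 : \Sigma = \Sigma_0$. Example 1 (Rencher statistic)

We have $n = 11$ and $p = 5$. The statistic is computed from $\mathrm{trace}(\Sigma_0^{-1} S)$, $\log|\Sigma_0|$ and $\log|S|$ as $U_R = 10\left(\mathrm{trace}(\Sigma_0^{-1} S) + \log|\Sigma_0| - \log|S| - 5\right)$ [numerical values not recovered].

Degrees of freedom: $\frac{1}{2}p(p+1) = 15$. The 5% critical value is $\chi^2_{15}(5\%) \approx 25.00$, so we reject $H_0$ at the 5% level.

Because $n$ is only moderate, consider the approximation

\[
U_R^{*} = \left[1 - \frac{1}{6(10) - 1}\left(2(5) + 1 - \frac{2}{5+1}\right)\right] U_R ,
\]

where $6(10) - 1 = 6n - 7 = 59$. Comparing this with $\chi^2_{15}(5\%) \approx 25.00$, we again reject $H_0$ at the 5% level.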

24. Testing $H_0 : \Sigma = \Sigma_0$. Example 2

Suppose we are interested in the null hypothesis $H_0 : \Sigma = \Sigma_0$ for a second specified matrix $\Sigma_0$ [entries not recovered].

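25. Testing $H_0 : \Sigma = \Sigma_0$. Example 2

We have $n = 11$, $p = 5$, and $\log|V|$ is the same as in Example 1. With the new $\mathrm{trace}(\Sigma_0^{-1} V)$ and $\log|\Sigma_0|$, the statistic is $U = 11\left(\mathrm{trace}(\Sigma_0^{-1} V) + \log|\Sigma_0| - \log|V| - 5\right)$ [numerical values not recovered].

Degrees of freedom: $\frac{1}{2}p(p+1) = 15$. The 5% critical value is $\chi^2_{15}(5\%) \approx 25.00$, and $U$ falls below it, so we do not reject $H_0$ at the 5% level.

[Figure: $\chi^2_{15}$ density with the 5% critical value and the observed $U$ marked; $U$ lies outside the rejection region.]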

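26. Testing $H_0 : \Sigma = \Sigma_0$. Example 2 (Rencher statistic)

We have $n = 11$, $p = 5$, and $\log|S|$ is the same as in Example 1. With the new $\mathrm{trace}(\Sigma_0^{-1} S)$ and $\log|\Sigma_0|$, the statistic is $U_R = 10\left(\mathrm{trace}(\Sigma_0^{-1} S) + \log|\Sigma_0| - \log|S| - 5\right)$ [numerical values not recovered].

Degrees of freedom: $\frac{1}{2}p(p+1) = 15$. The 5% critical value is $\chi^2_{15}(5\%) \approx 25.00$, so we do not reject $H_0$ at the 5% level.

Because $n$ is only moderate, we consider the approximation

\[
U_R^{*} = \left[1 - \frac{1}{6(10) - 1}\left(2(5) + 1 - \frac{2}{5+1}\right)\right] U_R .
\]

Comparing this with $\chi^2_{15}(5\%) \approx 25.00$, we again do not reject $H_0$ at the 5% level.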

27. Testing $H_0 : \Sigma = \Sigma_0$. Example 3

Suppose we are interested in the null hypothesis $H_0 : \Sigma = \Sigma_0$ for a third specified matrix $\Sigma_0$ [entries not recovered].

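28. Testing $H_0 : \Sigma = \Sigma_0$. Example 3

We have $n = 11$, $p = 5$, and $\log|V|$ is the same as in Example 1. With the new $\mathrm{trace}(\Sigma_0^{-1} V)$ and $\log|\Sigma_0|$, the statistic is $U = 11\left(\mathrm{trace}(\Sigma_0^{-1} V) + \log|\Sigma_0| - \log|V| - 5\right)$ [numerical values not recovered].

Degrees of freedom: $\frac{1}{2}p(p+1) = 15$. The 5% critical value is $\chi^2_{15}(5\%) \approx 25.00$, and $U$ exceeds it, so we reject $H_0$ at the 5% level.

[Figure: $\chi^2_{15}$ density with the 5% critical value and the observed $U$ marked; $U$ lies in the rejection region.]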

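29. Testing $H_0 : \Sigma = \Sigma_0$. Example 3 (Rencher statistic)

We have $n = 11$, $p = 5$, and $\log|S|$ is the same as in Example 1. With the new $\mathrm{trace}(\Sigma_0^{-1} S)$ and $\log|\Sigma_0|$, the statistic is $U_R = 10\left(\mathrm{trace}(\Sigma_0^{-1} S) + \log|\Sigma_0| - \log|S| - 5\right)$ [numerical values not recovered].

Degrees of freedom: $\frac{1}{2}p(p+1) = 15$. The 5% critical value is $\chi^2_{15}(5\%) \approx 25.00$, so we reject $H_0$ at the 5% level.

Because $n$ is only moderate, we consider the approximation

\[
U_R^{*} = \left[1 - \frac{1}{6(10) - 1}\left(2(5) + 1 - \frac{2}{5+1}\right)\right] U_R .
\]

Comparing this with $\chi^2_{15}(5\%) \approx 25.00$, we again (just) reject $H_0$ at the 5% level.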

30. Testing $H_0 : \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k$

Imagine that at time 1 (say) we have $n_1$ observations on $p$ variables. Suppose that at time 2 (say) we obtain a different sample with $n_2$ observations on the same $p$ variables. Notice that the units of observation differ, but the same variables are measured.

We are interested in whether the two samples come from populations with the same covariance matrix, so here we want to test $H_0 : \Sigma_1 = \Sigma_2$. More generally, we have $k$ samples and wish to test the null hypothesis $H_0 : \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k$.

31. Testing $H_0 : \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k$

Suppose we observe $n_i$ observation vectors $y_{ij}$ taken from $N_p(\mu_i, \Sigma_i)$, for $j = 1, 2, \ldots, n_i$ and $i = 1, 2, \ldots, k$. We want to test $H_0 : \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k$. Notice that the $\mu_i$ are not specified in $H_0$.

Sample $i$, with $n_i$ observations, has sample mean $\bar{y}_i$ and sample covariance matrix $S_i$ (and, equivalently, $V_i$), where

\[
V_i = \frac{n_i - 1}{n_i} S_i .
\]

Let

\[
n = \sum_{i=1}^{k} n_i, \qquad V = \frac{1}{n} \sum_{i=1}^{k} n_i V_i, \qquad S = \frac{1}{n-k} \sum_{i=1}^{k} (n_i - 1) S_i .
\]

$S$ is the pooled unbiased estimator of the common $\Sigma$.
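The pooled estimators can be computed from a list of samples as follows. This is a sketch assuming NumPy, with each sample stored as an $n_i \times p$ array; the function name `pooled_covariances` and the simulated samples are illustrative.

```python
import numpy as np

def pooled_covariances(samples):
    """Return (V, S): pooled ML and pooled unbiased covariance estimates."""
    samples = [np.asarray(Y, dtype=float) for Y in samples]
    k = len(samples)
    ns = [Y.shape[0] for Y in samples]
    n = sum(ns)
    S_list = [np.cov(Y, rowvar=False) for Y in samples]            # divisor n_i - 1
    V_list = [(ni - 1) / ni * Si for ni, Si in zip(ns, S_list)]    # divisor n_i
    V = sum(ni * Vi for ni, Vi in zip(ns, V_list)) / n
    S = sum((ni - 1) * Si for ni, Si in zip(ns, S_list)) / (n - k)
    return V, S

# Simulated samples of different sizes (an assumption, for illustration).
rng = np.random.default_rng(1)
samples = [rng.normal(size=(20, 3)), rng.normal(size=(35, 3))]
V, S = pooled_covariances(samples)
print(V)
print(S)
```

Since $n_i V_i = (n_i - 1) S_i$ for every sample, the two pooled matrices differ only by a scalar: $nV = (n - k)S$.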

32. Testing $H_0 : \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k$: Likelihood ratio test

We want to test $H_0 : \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k$. The maximised log-likelihood under $H_0$ is, with $\hat\mu_i = \bar{y}_i$ and $\hat\Sigma_i = V$,

\[
l_0 = -\frac{1}{2} n \log|2\pi V| - \frac{1}{2} n p .
\]

The maximised log-likelihood under $H_1$ is, with $\hat\mu_i = \bar{y}_i$ and $\hat\Sigma_i = V_i$,

\[
l_1 = -\frac{1}{2} \sum_{i=1}^{k} n_i \log|2\pi V_i| - \frac{1}{2} n p .
\]

The likelihood ratio test statistic is

\[
\begin{aligned}
U = -2\log\lambda &= -2l_0 + 2l_1 \\
&= n \log|2\pi V| - \sum_{i=1}^{k} n_i \log|2\pi V_i| \\
&= \sum_{i=1}^{k} n_i \log|V_i^{-1} V| .
\end{aligned}
\]

33. Testing $H_0 : \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k$: Likelihood ratio test

The likelihood ratio test statistic is given by

\[
U = \sum_{i=1}^{k} n_i \log|V_i^{-1} V| .
\]

Notice that this can be written as

\[
U = n \log|V| - \sum_{i=1}^{k} n_i \log|V_i| \quad\text{or}\quad U = \log\left\{ \frac{|V|^{n}}{|V_1|^{n_1} |V_2|^{n_2} \cdots |V_k|^{n_k}} \right\}.
\]

$\dim H_0 = pk + \frac{1}{2}p(p+1)$ and $\dim(H_0 \cup H_1) = pk + \frac{1}{2}p(p+1)k$, so the degrees of freedom of the likelihood ratio test are $r = \frac{1}{2}(k-1)p(p+1)$.

Under $H_0$, the test statistic satisfies $U \sim \chi^2_{\frac{1}{2}(k-1)p(p+1)}$ approximately. Reject $H_0$ at the 5% level if $U > \chi^2_{\frac{1}{2}(k-1)p(p+1)}(5\%)$. The $\chi^2$ approximation is adequate for large $n$.
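The $k$-sample statistic can be computed from the raw samples using the form $U = n\log|V| - \sum_i n_i \log|V_i|$. The following is a minimal sketch assuming NumPy; `lrt_equal_cov` is a hypothetical name and the data are simulated.

```python
import numpy as np

def lrt_equal_cov(samples):
    """Return (U, df) for H0: Sigma_1 = ... = Sigma_k from a list of n_i x p arrays."""
    samples = [np.asarray(Y, dtype=float) for Y in samples]
    k = len(samples)
    p = samples[0].shape[1]
    ns = [Y.shape[0] for Y in samples]
    n = sum(ns)
    # bias=True gives the ML estimate V_i (divisor n_i).
    V_list = [np.cov(Y, rowvar=False, bias=True) for Y in samples]
    V = sum(ni * Vi for ni, Vi in zip(ns, V_list)) / n
    logdet = lambda A: np.linalg.slogdet(A)[1]
    U = n * logdet(V) - sum(ni * logdet(Vi) for ni, Vi in zip(ns, V_list))
    df = (k - 1) * p * (p + 1) // 2
    return U, df

rng = np.random.default_rng(2)
Y1 = rng.normal(size=(30, 3))
Y2 = 2.0 * rng.normal(size=(40, 3))   # deliberately larger spread
print(lrt_equal_cov([Y1, Y2]))
print(lrt_equal_cov([Y1, Y1]))        # identical samples give U = 0
```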

34. Testing $H_0 : \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k$. Box M statistic

The likelihood ratio test statistic is

\[
U = \sum_{i=1}^{k} n_i \log|V_i^{-1} V| .
\]

If $n_i$ is small, this gives too much weight to the contribution of $V$. Box (1949) proposed an alternative test statistic $M$ using the unbiased estimators $S_i$ and $S$. Here $S_i$ is the $i$-th sample covariance matrix, and $S$ is the pooled sample covariance matrix

\[
S = \frac{1}{n-k} \sum_{i=1}^{k} (n_i - 1) S_i, \quad\text{where } n = \sum_{i=1}^{k} n_i .
\]

35. Testing $H_0 : \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k$. Box M statistic

The Box M statistic is

\[
M = c \sum_{i=1}^{k} (n_i - 1) \log|S_i^{-1} S| ,
\]

where $c$ is a constant chosen so that $M \sim \chi^2_{\frac{1}{2}(k-1)p(p+1)}$ approximately if $H_0$ is true. The Box M statistic can alternatively be written as

\[
M = c \left\{ (n-k) \log|S| - \sum_{i=1}^{k} (n_i - 1) \log|S_i| \right\}.
\]
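The second form of $M$ translates directly into code. The sketch below assumes NumPy and takes the constant $c$ as an argument rather than computing it; the function name `box_m` and the toy matrices are illustrative and are not the lecture's data.

```python
import numpy as np

def box_m(S_list, ns, c):
    """M = c * {(n - k) log|S| - sum_i (n_i - 1) log|S_i|}, with S the pooled matrix."""
    S_list = [np.asarray(S, dtype=float) for S in S_list]
    k = len(ns)
    n = sum(ns)
    S_pooled = sum((ni - 1) * Si for ni, Si in zip(ns, S_list)) / (n - k)
    logdet = lambda A: np.linalg.slogdet(A)[1]
    bracket = (n - k) * logdet(S_pooled) - sum(
        (ni - 1) * logdet(Si) for ni, Si in zip(ns, S_list)
    )
    return c * bracket

# Toy covariance matrices (assumptions): S_1 = I and S_2 = 4I with p = 2.
print(box_m([np.eye(2), 4 * np.eye(2)], ns=[11, 11], c=1.0))
# Equal sample covariance matrices give M = 0.
print(box_m([np.eye(2), np.eye(2)], ns=[11, 11], c=1.0))
```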

36. Testing $H_0 : \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k$. Box M statistic

Box's constant $c$ is given by

\[
c = 1 - \frac{2p^2 + 3p - 1}{6(k-1)(p+1)} \left\{ \sum_{i=1}^{k} \frac{1}{n_i - 1} - \frac{1}{n-k} \right\}.
\]

If $H_0$ is true, then $M \sim \chi^2_{\frac{1}{2}(k-1)p(p+1)}$ approximately. Reject $H_0$ at the 5% level if $M > \chi^2_{\frac{1}{2}(k-1)p(p+1)}(5\%)$.

Notice that if $n_1 = n_2 = \cdots = n_k = n/k$, then

\[
c = 1 - \frac{(k+1)(2p^2 + 3p - 1)}{6(n-k)(p+1)} .
\]

The $\chi^2$ approximation for Box's test is adequate when each $n_i$ is at least 20 and neither $k$ nor $p$ exceeds 5.
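The general formula for $c$ and its equal-sample-size shortcut can be compared numerically. This is a small standard-library sketch; the function names are hypothetical, and the values $n_1 = n_2 = 32$, $p = 4$ match the example that follows in the lecture.

```python
def box_c(ns, p):
    """Box's constant for group sizes ns and p variables (general formula)."""
    k = len(ns)
    n = sum(ns)
    bracket = sum(1 / (ni - 1) for ni in ns) - 1 / (n - k)
    return 1 - (2 * p**2 + 3 * p - 1) / (6 * (k - 1) * (p + 1)) * bracket

def box_c_equal(n, k, p):
    """Shortcut when every group has n / k observations."""
    return 1 - (k + 1) * (2 * p**2 + 3 * p - 1) / (6 * (n - k) * (p + 1))

print(box_c([32, 32], p=4))
print(box_c_equal(64, k=2, p=4))
```

Both calls evaluate $1 - 129/1860$, which confirms that the shortcut agrees with the general formula for equal group sizes.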

37. Testing $H_0 : \Sigma_1 = \Sigma_2$. Example

Consider the data on four psychological test scores for 32 males and 32 females from Lecture 06. We treat them as two samples, (1) males and (2) females, and test $H_0 : \Sigma_1 = \Sigma_2$.

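38. Testing $H_0 : \Sigma_1 = \Sigma_2$. Example

Recall the sample means $\bar{y}_1$, $\bar{y}_2$ and the sample covariance matrices $S_1$, $S_2$ [entries not recovered]. The pooled sample covariance matrix is

\[
S = \frac{(32-1)S_1 + (32-1)S_2}{32 + 32 - 2}
\]

[entries not recovered].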

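39. Testing $H_0 : \Sigma_1 = \Sigma_2$. Example

Here $n_1 = 32$, $n_2 = 32$, $n = 64$, $k = 2$, $p = 4$. We also need $\log|S_1|$, $\log|S_2|$ and $\log|S|$ [numerical values not recovered].

Since the sample sizes are equal,

\[
c = 1 - \frac{(k+1)(2p^2 + 3p - 1)}{6(n-k)(p+1)} = 1 - \frac{(3)(43)}{6(62)(5)} \approx 0.9306 .
\]

The Box M statistic is

\[
M = c \left\{ 62 \log|S| - 31 \log|S_1| - 31 \log|S_2| \right\}
\]

[numerical value not recovered].

$M$ has $r = \frac{1}{2}(k-1)p(p+1) = 10$ degrees of freedom. The 5% critical value is $\chi^2_{10}(5\%) \approx 18.31$, and $M$ falls below it, so we do not reject $H_0$ at the 5% level.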
