2.5.3 Generalized likelihood ratio tests

When a UMP test does not exist, we usually use a generalized likelihood ratio test to verify $H_0: \vartheta \in \Theta_0$ against $H_1: \vartheta \in \Theta \setminus \Theta_0$. It can be used when $H_0$ is composite, which none of the above methods can handle.

The generalized likelihood ratio test has rejection region $R = \{y : \lambda(y) \le a\}$, where
$$\lambda(y) = \frac{\max_{\vartheta \in \Theta_0} L(\vartheta; y)}{\max_{\vartheta \in \Theta} L(\vartheta; y)}$$
is the generalized likelihood ratio and $a$ is a constant chosen to give significance level $\alpha$, that is, such that $P(\lambda(Y) \le a \mid H_0) = \alpha$.

If we let $\hat\vartheta$ denote the maximum likelihood estimate of $\vartheta$ and let $\hat\vartheta_0$ denote the value of $\vartheta$ which maximises the likelihood over all values of $\vartheta$ in $\Theta_0$, then we may write
$$\lambda(y) = \frac{L(\hat\vartheta_0; y)}{L(\hat\vartheta; y)}.$$
The quantity $\hat\vartheta_0$ is called the restricted maximum likelihood estimate of $\vartheta$ under $H_0$.

Example 2.37. Suppose that $Y_i \overset{iid}{\sim} N(\mu, \sigma^2)$ and consider testing $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$. Then the likelihood is
$$L(\mu, \sigma^2; y) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \right\}.$$
The maximum likelihood estimate of $\vartheta = (\mu, \sigma^2)^T$ is $\hat\vartheta = (\hat\mu, \hat\sigma^2)^T$, where $\hat\mu = \bar y$ and $\hat\sigma^2 = \sum_{i=1}^n (y_i - \bar y)^2 / n$. Similarly, the restricted maximum likelihood estimate of $\vartheta = (\mu, \sigma^2)^T$ under $H_0$ is $\hat\vartheta_0 = (\hat\mu_0, \hat\sigma_0^2)^T$, where $\hat\mu_0 = \mu_0$ and $\hat\sigma_0^2 = \sum_{i=1}^n (y_i - \mu_0)^2 / n$.

Thus, the generalized likelihood ratio is
$$\lambda(y) = \frac{L(\mu_0, \hat\sigma_0^2; y)}{L(\hat\mu, \hat\sigma^2; y)}
= \frac{(2\pi\hat\sigma_0^2)^{-n/2} \exp\left\{ -\frac{1}{2\hat\sigma_0^2} \sum_{i=1}^n (y_i - \mu_0)^2 \right\}}{(2\pi\hat\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\hat\sigma^2} \sum_{i=1}^n (y_i - \bar y)^2 \right\}}
= \left( \frac{\hat\sigma_0^2}{\hat\sigma^2} \right)^{-n/2}
= \left\{ \frac{\sum_{i=1}^n (y_i - \bar y)^2}{\sum_{i=1}^n (y_i - \mu_0)^2} \right\}^{n/2},$$
since both exponents equal $-n/2$ and cancel. Since the rejection region is $R = \{y : \lambda(y) \le a\}$, we reject $H_0$ if
$$\left\{ \frac{\sum_{i=1}^n (y_i - \bar y)^2}{\sum_{i=1}^n (y_i - \mu_0)^2} \right\}^{n/2} \le a
\quad\Longleftrightarrow\quad
\frac{\sum_{i=1}^n (y_i - \bar y)^2}{\sum_{i=1}^n (y_i - \mu_0)^2} \le b,$$
where $a$ and $b$ are constants chosen to give significance level $\alpha$. Now, we may write
$$\sum_{i=1}^n (y_i - \mu_0)^2 = \sum_{i=1}^n \{(y_i - \bar y) + (\bar y - \mu_0)\}^2
= \sum_{i=1}^n (y_i - \bar y)^2 + 2(\bar y - \mu_0)\sum_{i=1}^n (y_i - \bar y) + n(\bar y - \mu_0)^2
= \sum_{i=1}^n (y_i - \bar y)^2 + n(\bar y - \mu_0)^2.$$
So we reject $H_0$ if
$$\frac{\sum_{i=1}^n (y_i - \bar y)^2}{\sum_{i=1}^n (y_i - \bar y)^2 + n(\bar y - \mu_0)^2} \le b.$$
Thus, rearranging, we reject $H_0$ if
$$1 + \frac{n(\bar y - \mu_0)^2}{\sum_{i=1}^n (y_i - \bar y)^2} \ge c
\quad\Longleftrightarrow\quad
\frac{n(\bar y - \mu_0)^2}{\frac{1}{n-1}\sum_{i=1}^n (y_i - \bar y)^2} \ge d,$$
where $c$ and $d$ are constants chosen to give significance level $\alpha$, that is, we can write
$$\alpha = P(\lambda(Y) \le a \mid H_0) = P\left( \frac{n(\bar Y - \mu_0)^2}{S^2} \ge d \,\Big|\, H_0 \right),
\quad \text{where } S^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar Y)^2.$$

To get $d$ we need to work out the distribution of $n(\bar Y - \mu_0)^2 / S^2$ under the null hypothesis. For $Y_i \overset{iid}{\sim} N(\mu, \sigma^2)$ we have
$$\bar Y \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
and so, under $H_0$,
$$\bar Y \sim N\!\left(\mu_0, \frac{\sigma^2}{n}\right) \quad\text{and}\quad \frac{\sqrt{n}(\bar Y - \mu_0)}{\sigma} \sim N(0, 1).$$
This gives
$$\frac{n(\bar Y - \mu_0)^2}{\sigma^2} \sim \chi^2_1.$$
Also
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$
and, for a normal sample, $\bar Y$ and $S^2$ are independent. Now we may use the fact that if $U$ and $V$ are independent rvs such that $U \sim \chi^2_{\nu_1}$ and $V \sim \chi^2_{\nu_2}$, then
$$\frac{U/\nu_1}{V/\nu_2} \sim F_{\nu_1, \nu_2}.$$
Here, $U = n(\bar Y - \mu_0)^2/\sigma^2$ and $V = (n-1)S^2/\sigma^2$. Hence, if $H_0$ is true, we have
$$F = \frac{U/1}{V/(n-1)} = \frac{n(\bar Y - \mu_0)^2}{S^2} \sim F_{1, n-1}.$$
Therefore, we reject $H_0$ at a significance level $\alpha$ if
$$\frac{n(\bar y - \mu_0)^2}{s^2} \ge F_{1, n-1, \alpha},$$
where $F_{1, n-1, \alpha}$ is such that $P(F \ge F_{1, n-1, \alpha}) = \alpha$. Equivalently, we reject $H_0$ if
$$\sqrt{\frac{n(\bar y - \mu_0)^2}{s^2}} \ge t_{n-1, \alpha/2}, \quad\text{that is, if}\quad \frac{|\bar y - \mu_0|}{\sqrt{s^2/n}} \ge t_{n-1, \alpha/2}.$$
Of course, this is the usual two-sided $t$ test. In fact, many of the standard tests in situations with normal distributions are generalized likelihood ratio tests.
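As a quick numerical illustration, the following Python sketch (with simulated data and arbitrarily chosen $n$, $\mu_0$ and $\alpha$) computes $\lambda(y)$, the statistic $n(\bar y - \mu_0)^2/s^2$ and the $t$ statistic, and checks that the $F$-based and $t$-based rejection decisions coincide.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, mu0, alpha = 30, 5.0, 0.05
y = rng.normal(loc=5.4, scale=2.0, size=n)   # simulated sample, chosen for illustration

ybar = y.mean()
s2 = y.var(ddof=1)                           # s^2 = sum (y_i - ybar)^2 / (n - 1)

# generalized likelihood ratio: {sum(y - ybar)^2 / sum(y - mu0)^2}^(n/2)
lam = (np.sum((y - ybar) ** 2) / np.sum((y - mu0) ** 2)) ** (n / 2)

F = n * (ybar - mu0) ** 2 / s2               # F statistic, F_{1, n-1} under H0
t = (ybar - mu0) / np.sqrt(s2 / n)           # t statistic, t_{n-1} under H0

F_crit = stats.f.ppf(1 - alpha, 1, n - 1)
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)

print(f"lambda(y) = {lam:.4f}, F = {F:.4f}, t^2 = {t ** 2:.4f}")
print("reject via F:", F >= F_crit)          # same decision, since F = t^2
print("reject via t:", abs(t) >= t_crit)     # and F_{1,n-1,alpha} = t_{n-1,alpha/2}^2
```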

2.5.4 Wilks' theorem

In more complex cases, we have to use the following approximation to find the rejection region. The result is stated without proof.

Theorem 2.9 (Wilks' theorem). Assume that the joint distribution of $Y_1, \ldots, Y_n$ depends on $p$ unknown parameters and that, under $H_0$, the joint distribution depends on $p_0$ unknown parameters. Let $\nu = p - p_0$. Then, under some regularity conditions, when the null hypothesis is true, the distribution of the statistic $-2\log\{\lambda(Y)\}$ converges to a $\chi^2_\nu$ distribution as the sample size $n \to \infty$, i.e., when $H_0$ is true and $n$ is large,
$$-2\log\{\lambda(Y)\} \overset{\text{approx}}{\sim} \chi^2_\nu.$$
Thus, for large $n$, the rejection region for a test with approximate significance level $\alpha$ is
$$R = \{ y : -2\log\{\lambda(y)\} \ge \chi^2_{\nu,\alpha} \}.$$

Example 2.38. Suppose that $Y_i \overset{iid}{\sim} \text{Poisson}(\lambda)$ and consider testing $H_0: \lambda = \lambda_0$ against $H_1: \lambda \ne \lambda_0$. We have seen that no UMP test exists in this case. Now, the likelihood is
$$L(\lambda; y) = \frac{\lambda^{\sum_{i=1}^n y_i} e^{-n\lambda}}{\prod_{i=1}^n y_i!}.$$
The maximum likelihood estimate of $\lambda$ is $\hat\lambda = \bar y$ and the restricted maximum likelihood estimate of $\lambda$ under $H_0$ is $\hat\lambda_0 = \lambda_0$. Thus, the generalized likelihood ratio is
$$\lambda(y) = \frac{L(\lambda_0; y)}{L(\hat\lambda; y)}
= \frac{\lambda_0^{\sum y_i} e^{-n\lambda_0} / \prod y_i!}{\bar y^{\sum y_i} e^{-n\bar y} / \prod y_i!}
= \left( \frac{\lambda_0}{\bar y} \right)^{n\bar y} e^{n(\bar y - \lambda_0)}.$$
It follows that
$$-2\log\{\lambda(y)\} = -2\left\{ n\bar y \log\!\left( \frac{\lambda_0}{\bar y} \right) + n(\bar y - \lambda_0) \right\}
= 2n\left\{ \bar y \log\!\left( \frac{\bar y}{\lambda_0} \right) + \lambda_0 - \bar y \right\}.$$

Here, $p = 1$ and $p_0 = 0$, and so $\nu = 1$. Therefore, by Wilks' theorem, when $H_0$ is true and $n$ is large,
$$2n\left\{ \bar Y \log\!\left( \frac{\bar Y}{\lambda_0} \right) + \lambda_0 - \bar Y \right\} \overset{\text{approx}}{\sim} \chi^2_1.$$
Hence, for a test with approximate significance level $\alpha$, we reject $H_0$ if and only if
$$2n\left\{ \bar y \log\!\left( \frac{\bar y}{\lambda_0} \right) + \lambda_0 - \bar y \right\} \ge \chi^2_{1,\alpha}.$$

Example 2.39. Suppose that $Y_i$, $i = 1, \ldots, n$, are iid random variables with the probability mass function given by
$$P(Y = y) = \begin{cases} \vartheta_j, & \text{if } y = j,\ j = 1, 2, 3; \\ 0, & \text{otherwise}, \end{cases}$$
where the $\vartheta_j$ are unknown parameters such that $\vartheta_1 + \vartheta_2 + \vartheta_3 = 1$ and $\vartheta_j \ge 0$. Consider testing
$$H_0: \vartheta_1 = \vartheta_2 = \vartheta_3 \quad\text{against}\quad H_1: H_0 \text{ is not true}.$$
We will use Wilks' theorem to derive the critical region for testing this hypothesis at an approximate significance level $\alpha$.

Here the full parameter space $\Theta$ is two-dimensional, because there are only two free parameters, i.e., $\vartheta_3 = 1 - \vartheta_1 - \vartheta_2$ and
$$\Theta = \{ \vartheta = (\vartheta_1, \vartheta_2, \vartheta_3)^T : \vartheta_3 = 1 - \vartheta_1 - \vartheta_2,\ \vartheta_j \ge 0 \}.$$
Hence $p = 2$. The restricted parameter space is zero-dimensional, because under the null hypothesis all the parameters are equal and, as they sum to 1, they all must be equal to $1/3$. Hence
$$\Theta_0 = \left\{ \vartheta = \left( \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3} \right)^T \right\}$$
and so $p_0 = 0$ (zero unknown parameters). That is, the number of degrees of freedom of the $\chi^2$ distribution is $\nu = p - p_0 = 2$.

To calculate $\lambda(y)$ we need to find the MLE of $\vartheta$ in $\Theta$ and in the restricted space $\Theta_0$.

MLE of $\vartheta$ in $\Theta$: The likelihood function is
$$L(\vartheta; y) = \vartheta_1^{n_1} \vartheta_2^{n_2} (1 - \vartheta_1 - \vartheta_2)^{n_3},$$
where $n_j$ is the number of responses equal to $j$, $j = 1, 2, 3$. Then the log-likelihood is
$$\ell(\vartheta; y) = n_1 \log(\vartheta_1) + n_2 \log(\vartheta_2) + n_3 \log(1 - \vartheta_1 - \vartheta_2).$$
For $j = 1, 2$ we have
$$\frac{\partial \ell}{\partial \vartheta_j} = \frac{n_j}{\vartheta_j} - \frac{n_3}{1 - \vartheta_1 - \vartheta_2} = \frac{n_j}{\vartheta_j} - \frac{n_3}{\vartheta_3}.$$
When compared to zero, this gives
$$\frac{\vartheta_j}{n_j} = \frac{\vartheta_3}{n_3} = \gamma, \text{ say}.$$
Now, since $\vartheta_1 + \vartheta_2 + \vartheta_3 = 1$ we obtain $\gamma = 1/n$, where $n = n_1 + n_2 + n_3$. Then the estimates of the parameters are $\hat\vartheta_j = n_j/n$, $j = 1, 2, 3$.

The second derivatives are
$$\frac{\partial^2 \ell}{\partial \vartheta_j^2} = -\frac{n_j}{\vartheta_j^2} - \frac{n_3}{\vartheta_3^2}, \qquad \frac{\partial^2 \ell}{\partial \vartheta_1 \partial \vartheta_2} = -\frac{n_3}{\vartheta_3^2}.$$
The determinant of the matrix of second derivatives is positive for all $\vartheta$ and the element $\partial^2 \ell / \partial \vartheta_1^2$ is negative for all $\vartheta$, so also for $\hat\vartheta$. Hence, there is a maximum at $\hat\vartheta$ and
$$\text{MLE}(\vartheta_j) = \hat\vartheta_j = \frac{n_j}{n}.$$

MLE of $\vartheta$ in $\Theta_0$:
$$L(\vartheta; y) = \left( \tfrac{1}{3} \right)^{n_1} \left( \tfrac{1}{3} \right)^{n_2} \left( \tfrac{1}{3} \right)^{n_3}.$$
That is, $\hat\vartheta_0 = \left( \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3} \right)^T$.

Now, we can calculate $\lambda(y)$:
$$\lambda(y) = \frac{L(\hat\vartheta_0; y)}{L(\hat\vartheta; y)}
= \frac{\left( \tfrac{1}{3} \right)^{n_1} \left( \tfrac{1}{3} \right)^{n_2} \left( \tfrac{1}{3} \right)^{n_3}}{\left( \tfrac{n_1}{n} \right)^{n_1} \left( \tfrac{n_2}{n} \right)^{n_2} \left( \tfrac{n_3}{n} \right)^{n_3}}
= \left( \frac{n}{3n_1} \right)^{n_1} \left( \frac{n}{3n_2} \right)^{n_2} \left( \frac{n}{3n_3} \right)^{n_3}.$$

That is,
$$-2\log\{\lambda(y)\} = 2\sum_{j=1}^{3} n_j \log\!\left( \frac{3n_j}{n} \right).$$
We reject $H_0$ at an approximate significance level $\alpha$ if the observed sample belongs to the critical region $R$, where
$$R = \left\{ y : 2\sum_{j=1}^{3} n_j \log\!\left( \frac{3n_j}{n} \right) \ge \chi^2_{2,\alpha} \right\}.$$
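As a numerical illustration of Wilks' theorem, the following Python sketch applies the two rejection rules just derived, once for a Poisson sample as in Example 2.38 and once for category counts as in Example 2.39. The data (the simulated Poisson sample and the counts $n_1, n_2, n_3$) are hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy import stats

alpha = 0.05

# Example 2.38: Poisson, H0: lambda = lambda0 (hypothetical data)
rng = np.random.default_rng(1)
lambda0, n = 2.0, 50
y = rng.poisson(lam=2.5, size=n)                  # simulated sample, true rate 2.5
ybar = y.mean()
W_pois = 2 * n * (ybar * np.log(ybar / lambda0) + lambda0 - ybar)   # -2 log lambda(y)
print("Poisson: W =", round(W_pois, 3),
      "reject H0:", W_pois >= stats.chi2.ppf(1 - alpha, df=1))

# Example 2.39: three equally likely categories (hypothetical counts n1, n2, n3)
counts = np.array([28, 35, 37])
n = counts.sum()
W_mult = 2 * np.sum(counts * np.log(3 * counts / n))                # 2 sum n_j log(3 n_j / n)
print("Multinomial: W =", round(W_mult, 3),
      "reject H0:", W_mult >= stats.chi2.ppf(1 - alpha, df=2))
```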

2.5.5 Contingency tables

We now show that the usual test for association in contingency tables is a generalized likelihood ratio test and that its asymptotic distribution is an example of Wilks' theorem.

Suppose that we have an $r \times c$ contingency table in which there are $Y_{ij}$ individuals classified in row $i$ and column $j$, when $N$ individuals are classified independently. Let $\vartheta_{ij}$ be the corresponding probability that an individual is classified in row $i$ and column $j$, so that $\vartheta_{ij} \ge 0$ and $\sum_{i=1}^r \sum_{j=1}^c \vartheta_{ij} = 1$. Then the variables $Y_{ij}$ have a multinomial distribution with parameters $N$ and $\vartheta_{ij}$ for $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, c$.

Example 2.40. In an experiment 150 patients were allocated to three groups of 45, 45 and 60 patients each. Two groups were given a new drug at different dose levels and the third group received placebo. The responses form a $3 \times 3$ table with rows Placebo, Half dose and Full dose, columns Improved, No difference and Worse, cell counts $y_{ij}$, row totals $r_i = \sum_j y_{ij}$, column totals $c_j = \sum_i y_{ij}$, and grand total $N = 150$.

We are interested in testing the hypothesis that the response to the drug does not depend on the dose level. The null hypothesis is that the row and column classifications are independent, and the alternative is that they are dependent. More precisely, the null hypothesis is
$$H_0: \vartheta_{ij} = a_i b_j \ \text{ for some } a_i > 0 \text{ and } b_j > 0, \text{ with } \sum_{i=1}^r a_i = 1 \text{ and } \sum_{j=1}^c b_j = 1,$$
and the alternative is
$$H_1: \vartheta_{ij} \ne a_i b_j \ \text{ for at least one pair } (i, j).$$
The usual test statistic is given by
$$X^2 = \sum_{i=1}^r \sum_{j=1}^c \frac{(Y_{ij} - E_{ij})^2}{E_{ij}}, \quad \text{where } E_{ij} = \frac{R_i C_j}{N},$$

where $R_i = \sum_{j=1}^c Y_{ij}$ and $C_j = \sum_{i=1}^r Y_{ij}$. For large $N$, $X^2 \sim \chi^2_{(r-1)(c-1)}$ approximately when $H_0$ is true, and so we reject $H_0$ at the level of significance $\alpha$ if $X^2 \ge \chi^2_{(r-1)(c-1), \alpha}$.

Now consider the generalized likelihood ratio test. The likelihood is
$$L(\vartheta; y) = \frac{N!}{\prod_{i=1}^r \prod_{j=1}^c y_{ij}!} \prod_{i=1}^r \prod_{j=1}^c \vartheta_{ij}^{y_{ij}} = A \prod_{i=1}^r \prod_{j=1}^c \vartheta_{ij}^{y_{ij}},$$
where the coefficient $A$ is the number of ways that $N$ subjects can be divided into $rc$ groups with $y_{ij}$ in the $(i,j)$-th group. The log-likelihood is
$$\ell(\vartheta; y) = \log(A) + \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log(\vartheta_{ij}).$$
We have to maximize this, subject to the constraint $\sum_{i=1}^r \sum_{j=1}^c \vartheta_{ij} = 1$. Let $\Sigma'$ represent the sum over all pairs $(i, j)$ except $(r, c)$. Then we may write
$$\ell(\vartheta; y) = \log(A) + \Sigma'\, y_{ij} \log(\vartheta_{ij}) + y_{rc} \log\left( 1 - \Sigma'\, \vartheta_{ij} \right).$$
Thus, for $(i, j) \ne (r, c)$, solving the equation
$$\frac{\partial \ell}{\partial \vartheta_{ij}} = \frac{y_{ij}}{\vartheta_{ij}} - \frac{y_{rc}}{1 - \Sigma'\, \vartheta_{ij}} = \frac{y_{ij}}{\vartheta_{ij}} - \frac{y_{rc}}{\vartheta_{rc}} = 0$$
yields $\hat\vartheta_{ij} / y_{ij} = \hat\vartheta_{rc} / y_{rc} = \gamma$, say. Since $\sum_{i=1}^r \sum_{j=1}^c \hat\vartheta_{ij} = \sum_{i=1}^r \sum_{j=1}^c y_{ij} \gamma = 1$, we have $\gamma = 1/N$, so that the maximum likelihood estimate of $\vartheta_{ij}$ is $\hat\vartheta_{ij} = y_{ij}/N$ for $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, c$. It follows that
$$\ell(\hat\vartheta; y) = \log(A) + \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\!\left( \frac{y_{ij}}{N} \right).$$

Now, under $H_0$, we have $\vartheta_{ij} = a_i b_j$ and so
$$\ell(\vartheta; y) = \log(A) + \sum_{i=1}^r \sum_{j=1}^c y_{ij} \{ \log(a_i) + \log(b_j) \} = \log(A) + \sum_{i=1}^r r_i \log(a_i) + \sum_{j=1}^c c_j \log(b_j),$$
where $r_i = \sum_{j=1}^c y_{ij}$ and $c_j = \sum_{i=1}^r y_{ij}$ are the observed row and column totals. Now, we maximize this subject to the constraints $\sum_{i=1}^r a_i = 1$ and $\sum_{j=1}^c b_j = 1$. That is, we maximize
$$\ell(\vartheta; y) = \log(A) + \sum_{i=1}^{r-1} r_i \log(a_i) + r_r \log\left( 1 - \sum_{i=1}^{r-1} a_i \right) + \sum_{j=1}^{c-1} c_j \log(b_j) + c_c \log\left( 1 - \sum_{j=1}^{c-1} b_j \right).$$
Then, for $i = 1, \ldots, r-1$,
$$\frac{\partial \ell}{\partial a_i} = \frac{r_i}{a_i} - \frac{r_r}{a_r},$$
which, when compared to zero, gives
$$\frac{\hat a_i}{r_i} = \frac{\hat a_r}{r_r} = \gamma, \text{ say}.$$
However,
$$1 = \sum_{i=1}^r \hat a_i = \sum_{i=1}^r r_i \gamma = N\gamma.$$
Hence, $\gamma = 1/N$ and so
$$\hat a_i = \frac{r_i}{N}.$$
Similarly, we obtain the ML estimates for the $b_j$ as $\hat b_j = c_j / N$. Thus, the restricted maximum likelihood estimate of $\vartheta_{ij}$ under $H_0$ is
$$\hat\vartheta^0_{ij} = \hat a_i \hat b_j = \frac{r_i}{N} \frac{c_j}{N} = \frac{e_{ij}}{N}, \quad \text{where } e_{ij} = \frac{r_i c_j}{N},$$
for $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, c$. It follows that
$$\ell(\hat\vartheta_0; y) = \log(A) + \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\!\left( \frac{e_{ij}}{N} \right).$$

Hence, we obtain
$$-2\log\{\lambda(y)\} = -2\log\left\{ \frac{L(\hat\vartheta_0; y)}{L(\hat\vartheta; y)} \right\} = -2\{ \ell(\hat\vartheta_0; y) - \ell(\hat\vartheta; y) \} = 2\sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\!\left( \frac{y_{ij}}{e_{ij}} \right).$$
Here, $p = rc - 1$ and $p_0 = (r - 1) + (c - 1) = r + c - 2$, and so $\nu = p - p_0 = (r - 1)(c - 1)$. Therefore, by Wilks' theorem, when $H_0$ is true and $N$ is large,
$$2\sum_{i=1}^r \sum_{j=1}^c Y_{ij} \log\!\left( \frac{Y_{ij}}{E_{ij}} \right) \overset{\text{approx}}{\sim} \chi^2_{(r-1)(c-1)}.$$
The test statistics $X^2$ and $-2\log\{\lambda(Y)\}$ are asymptotically equivalent, that is, they differ by quantities that tend to zero as $N \to \infty$. To see this, first note that $y_{ij} - e_{ij}$ is small relative to $y_{ij}$ and $e_{ij}$ when $N$ is large. Thus, since $y_{ij} = e_{ij} + (y_{ij} - e_{ij})$, we may write (by Taylor series expansion of $\log(1 + x)$ around $x = 0$, cut after the second-order term)
$$\log\!\left( \frac{y_{ij}}{e_{ij}} \right) = \log\!\left( 1 + \frac{y_{ij} - e_{ij}}{e_{ij}} \right) \approx \frac{y_{ij} - e_{ij}}{e_{ij}} - \frac{1}{2} \frac{(y_{ij} - e_{ij})^2}{e_{ij}^2}.$$
Hence, we have
$$\sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\!\left( \frac{y_{ij}}{e_{ij}} \right) \approx \sum_{i=1}^r \sum_{j=1}^c \{ e_{ij} + (y_{ij} - e_{ij}) \} \left\{ \frac{y_{ij} - e_{ij}}{e_{ij}} - \frac{1}{2} \frac{(y_{ij} - e_{ij})^2}{e_{ij}^2} \right\} \approx \sum_{i=1}^r \sum_{j=1}^c \left\{ (y_{ij} - e_{ij}) + \frac{1}{2} \frac{(y_{ij} - e_{ij})^2}{e_{ij}} \right\},$$
neglecting terms of third order in $y_{ij} - e_{ij}$. Since $\sum_{i=1}^r \sum_{j=1}^c (y_{ij} - e_{ij}) = 0$, it follows that
$$-2\log\{\lambda(y)\} \approx \sum_{i=1}^r \sum_{j=1}^c \frac{(y_{ij} - e_{ij})^2}{e_{ij}} = X^2.$$
So we now see why we use the $X^2$ test statistic.
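A quick numerical check of this approximation is given below; the table of counts is hypothetical and chosen only for illustration.

```python
import numpy as np

# hypothetical 2x3 table of counts, for illustration only
y = np.array([[120.0, 90.0, 40.0],
              [110.0, 95.0, 45.0]])
N = y.sum()
e = np.outer(y.sum(axis=1), y.sum(axis=0)) / N   # e_ij = r_i c_j / N

deviance = 2 * np.sum(y * np.log(y / e))         # -2 log lambda(y)
pearson = np.sum((y - e) ** 2 / e)               # X^2
print(round(deviance, 4), round(pearson, 4))     # the two statistics are close
```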

Example 2.41 (Example 2.40 continued). Now we will test the hypothesis from the previous example. The table of expected counts $e_{ij} = r_i c_j / N$ has the same layout as the table of observed counts (rows Placebo, Half dose, Full dose; columns Improved, No difference, Worse). This gives
$$X^2_{\text{obs}} = \sum_{i=1}^3 \sum_{j=1}^3 \frac{(y_{ij} - e_{ij})^2}{e_{ij}} = 1.417.$$
The critical value is $\chi^2_{4; 0.05} = 9.488$, hence there is no evidence to reject the null hypothesis that the responses are independent of the levels of the new drug and placebo.

We obtain the same conclusion using the critical region derived from Wilks' theorem, which is
$$R = \left\{ y : 2\sum_{i=1}^3 \sum_{j=1}^3 y_{ij} \log\!\left( \frac{y_{ij}}{e_{ij}} \right) \ge \chi^2_{\nu; \alpha} \right\}.$$
Here
$$2\sum_{i=1}^3 \sum_{j=1}^3 y_{ij} \log\!\left( \frac{y_{ij}}{e_{ij}} \right) = 1.42,$$
which again is well below the critical value $9.488$.
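As a worked sketch of the whole procedure, the following Python code carries out both tests for a hypothetical $3 \times 3$ table with the same row totals (45, 45 and 60) as in Example 2.40; the individual cell counts are invented for illustration, so the resulting values of $X^2$ and $-2\log\{\lambda(y)\}$ are not those quoted above.

```python
import numpy as np
from scipy import stats

# hypothetical counts; rows: Placebo, Half dose, Full dose
# columns: Improved, No difference, Worse; row totals 45, 45, 60 and N = 150
y = np.array([[14.0, 18.0, 13.0],
              [16.0, 17.0, 12.0],
              [23.0, 21.0, 16.0]])
r, c = y.shape
N = y.sum()
e = np.outer(y.sum(axis=1), y.sum(axis=0)) / N   # expected counts e_ij = r_i c_j / N

X2 = np.sum((y - e) ** 2 / e)                    # Pearson statistic
G2 = 2 * np.sum(y * np.log(y / e))               # -2 log lambda(y)
df = (r - 1) * (c - 1)
crit = stats.chi2.ppf(0.95, df)                  # chi^2_{4, 0.05} = 9.488

print(f"X^2 = {X2:.3f}, -2 log lambda = {G2:.3f}, critical value = {crit:.3f}")
print("reject H0:", X2 >= crit)

# cross-check with SciPy (correction=False gives the plain Pearson X^2)
chi2, p, dof, expected = stats.chi2_contingency(y, correction=False)
print(f"scipy: X^2 = {chi2:.3f}, df = {dof}, p-value = {p:.3f}")
```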
