Hypothesis Testing: The Generalized Likelihood Ratio Test

Consider testing the hypotheses
\[
H_0: \theta \in \Theta_0 \qquad \text{versus} \qquad H_1: \theta \in \Theta \setminus \Theta_0.
\]

Definition: The Generalized Likelihood Ratio (GLR)

Let $L(\theta)$ be a likelihood for a random sample having joint pdf $f(\vec{x};\theta)$ for $\theta \in \Theta$. The generalized likelihood ratio (GLR) is defined to be
\[
\lambda = \lambda(\vec{x}) = \frac{\max_{\theta \in \Theta_0} L(\theta)}{\max_{\theta \in \Theta} L(\theta)} = \frac{L(\hat{\theta}_0)}{L(\hat{\theta})}
\]
where $\hat{\theta}$ denotes the usual unrestricted MLE and $\hat{\theta}_0$ denotes the MLE when $H_0$ is true. (If those maximums don't exist, use supremums!)

Example: Suppose we wish to test $H_0: \theta \le 3$ versus $H_1: \theta > 3$, and we consider the likelihood function $L(\theta)$. We maximize $L(\theta)$ by setting $\frac{d}{d\theta}L(\theta) = 0$. It is usually the case that there is only one solution to $\frac{d}{d\theta}L(\theta) = 0$ (and that it is indeed a max and not a min!). Call this solution $\hat{\theta}$. If $\hat{\theta}$ is the unique solution, there will be no other turning points on the graph of $L(\theta)$. Suppose the graph of $L(\theta)$ is as shown in Figure 1.

Figure 1: A Unimodal Likelihood

From Figure 1, we see the location of the standard unrestricted MLE $\hat{\theta}$, and we see that the maximum of the likelihood over the restricted set where $\theta \le 3$ occurs at $\hat{\theta}_0 = 3$.
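The restricted-maximization picture in Figure 1 is easy to check numerically. The sketch below is purely illustrative (the data, $\sigma = 1$, and the grid are made up): with a normal likelihood whose sample mean exceeds 3, a grid search finds the unrestricted maximizer near $\overline{x}$, while the maximum over $\theta \le 3$ lands on the boundary.

```python
import numpy as np

# Hypothetical data with sample mean > 3, so the unimodal likelihood
# peaks to the right of the H0 boundary (as in Figure 1).
x = np.array([3.8, 4.2, 3.5, 4.9, 4.1])
sigma = 1.0  # assumed known

def log_lik(theta):
    # log L(theta) = -(1/(2 sigma^2)) * sum (x_i - theta)^2, constants dropped
    return -np.sum((x - theta) ** 2) / (2 * sigma ** 2)

grid = np.linspace(0.0, 10.0, 10001)
ll = np.array([log_lik(t) for t in grid])

theta_hat = grid[np.argmax(ll)]          # unrestricted MLE: near x-bar
mask = grid <= 3.0
theta_hat_0 = grid[mask][np.argmax(ll[mask])]  # restricted max: at the boundary

print(theta_hat)    # close to x.mean() = 4.1
print(theta_hat_0)  # close to 3.0
```

Because the likelihood is increasing on the whole restricted set $\theta \le 3$, the restricted maximizer is pushed to the boundary point, exactly as the figure suggests.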
Figure 2: A Bimodal Likelihood

Of course, it's possible (though not probable) that the likelihood function looks like that shown in Figure 2. (This behavior would be reflected in multiple solutions to $\frac{d}{d\theta}L(\theta) = 0$.) In this case, the standard unrestricted MLE ($\hat{\theta}$) and the restricted MLE ($\hat{\theta}_0$) are shown.

***************************************************
The Generalized Likelihood Ratio Test

Reject $H_0$ if $\lambda \le k$, where $k$ is chosen to give a size $\alpha$ test. (i.e.,
\[
\alpha = \max_{\theta \in \Theta_0} P(\lambda \le k; \theta).)
\]

Remarks:
***************************************************

1. This is just like the Neyman-Pearson test in the simple versus simple case; the only change here is the addition of the maxes.

2. The original Neyman-Pearson test can be thought of as a simple likelihood ratio test (or SLRT, pronounced "slirt") since $\max_{\theta \in \Theta_0} L(\theta) = L(\theta_0)$ when $\Theta_0 = \{\theta_0\}$.

Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from the $N(\mu, \sigma^2)$ distribution where $\sigma^2$ is known. Test $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$. (Note: There is no UMP test for this problem, as is often the case for a two-sided alternative hypothesis.)
Derive a GLRT of size $\alpha$.

The joint pdf is
\[
f(\vec{x}; \mu) = (2\pi\sigma^2)^{-n/2} e^{-\frac{1}{2\sigma^2}\sum(x_i-\mu)^2}.
\]
Since a likelihood is any function proportional to the joint pdf, let's take
\[
L(\mu) = e^{-\frac{1}{2\sigma^2}\sum(x_i-\mu)^2}.
\]
We already know the usual (unrestricted) MLE for $\mu$: $\hat{\mu} = \overline{X}$.

Question: Now what maximizes $L(\mu)$ when $H_0$ is true?

Answer: That's easy, since $H_0$ contains only one point! ($\mu_0$)

(exciting, not factorial...)
\[
\max_{\mu = \mu_0} L(\mu) = L(\mu_0)!
\]
So
\[
\lambda = \frac{e^{-\frac{1}{2\sigma^2}\sum(x_i-\mu_0)^2}}{e^{-\frac{1}{2\sigma^2}\sum(x_i-\overline{x})^2}} = e^{-\frac{1}{2\sigma^2}\left[\sum(x_i-\mu_0)^2 - \sum(x_i-\overline{x})^2\right]}
\]
Since we're going to have to compute a probability $P(\lambda(\vec{X}) \le k; H_0)$, let's simplify $\lambda$:
\[
\sum(x_i-\mu_0)^2 - \sum(x_i-\overline{x})^2 = \sum x_i^2 - 2\mu_0 \sum x_i + n\mu_0^2 - \sum x_i^2 + 2\overline{x}\sum x_i - n\overline{x}^2 = -2\mu_0 \sum x_i + n\mu_0^2 + 2\overline{x}\sum x_i - n\overline{x}^2
\]
Hey! This sort of looks like something squared... In fact, if we pull the $n$ out:
\[
n\left(-2\mu_0 \tfrac{1}{n}\textstyle\sum x_i + \mu_0^2 + 2\overline{x}\,\tfrac{1}{n}\sum x_i - \overline{x}^2\right) = n\left(-2\mu_0\overline{x} + \mu_0^2 + 2\overline{x}^2 - \overline{x}^2\right) = n\left(-2\mu_0\overline{x} + \mu_0^2 + \overline{x}^2\right) = n(\overline{x}-\mu_0)^2
\]
(Cool!) So
\[
\lambda = \exp\left[-\frac{n(\overline{x}-\mu_0)^2}{2\sigma^2}\right]
\]
Recall that we will reject $H_0$ if $\lambda \le k$ where $k$ is such that $P(\lambda(\vec{X}) \le k; H_0) = \alpha$:
\[
P\left(\exp\left[-\frac{n(\overline{X}-\mu_0)^2}{2\sigma^2}\right] \le k; H_0\right) = P\left(-\frac{n(\overline{X}-\mu_0)^2}{2\sigma^2} \le \ln k; H_0\right) = P\left(\frac{n(\overline{X}-\mu_0)^2}{\sigma^2} \ge -2\ln k; H_0\right) = P\left(\left(\frac{\overline{X}-\mu_0}{\sigma/\sqrt{n}}\right)^2 \ge k_1; H_0\right)
\]
where $k_1$ is such that this probability is $\alpha$. Now if $H_0$ is true and $\mu$ is indeed $\mu_0$, then
\[
\frac{\overline{X}-\mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \quad \text{and so} \quad \left(\frac{\overline{X}-\mu_0}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2(1).
\]
So
\[
P\left(\left(\frac{\overline{X}-\mu_0}{\sigma/\sqrt{n}}\right)^2 \ge k_1; H_0\right) = P(W \ge k_1)
\]
where $W \sim \chi^2(1)$. Hence, $k_1 = \chi^2_\alpha(1)$. So, we will reject $H_0$ if
\[
\left(\frac{\overline{X}-\mu_0}{\sigma/\sqrt{n}}\right)^2 \ge \chi^2_\alpha(1).
\]
This is the GLRT of size $\alpha$!

Example: (Note: I could only think of one example of a composite versus composite test that isn't a computational nightmare, so I'll save this easy problem for you!)

Let $X_1, X_2, \ldots, X_n$ be a random sample from the unif$(0, \theta]$ distribution. (Note that I closed the right side of the interval. I only did this so that we won't have a problem with a max, but this wouldn't matter at all if we were using the more general definition of the GLR that uses supremums.) Find the GLRT of size $\alpha$ for $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$.
\[
f(x;\theta) = \frac{1}{\theta}\, I_{(0,\theta]}(x)
\]
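The normal-mean GLRT just derived can be sanity-checked by simulation. Below is a minimal sketch (the values of $\mu_0$, $\sigma$, $n$, and the number of replications are made up): it computes the statistic $\left((\overline{x}-\mu_0)/(\sigma/\sqrt{n})\right)^2$, compares it against the $\chi^2_\alpha(1)$ critical value, and checks by Monte Carlo that the rejection rate under $H_0$ is close to $\alpha$.

```python
import numpy as np
from scipy.stats import chi2

def glrt_normal_mean(x, mu0, sigma, alpha=0.05):
    """GLRT for H0: mu = mu0 vs H1: mu != mu0, sigma known.
    Reject when ((xbar - mu0)/(sigma/sqrt(n)))^2 >= chi2_alpha(1)."""
    n = len(x)
    stat = ((x.mean() - mu0) / (sigma / np.sqrt(n))) ** 2
    crit = chi2.ppf(1 - alpha, df=1)   # upper-alpha chi-square(1) critical value
    return stat, crit, stat >= crit

# Monte Carlo size check: simulate under H0 and count rejections.
rng = np.random.default_rng(0)
mu0, sigma, n, alpha, reps = 5.0, 2.0, 25, 0.05, 200_000
xs = rng.normal(mu0, sigma, size=(reps, n))
stats = ((xs.mean(axis=1) - mu0) / (sigma / np.sqrt(n))) ** 2
rate = np.mean(stats >= chi2.ppf(1 - alpha, df=1))
print(rate)   # should be close to alpha = 0.05
```

The observed rejection rate hovering around $\alpha$ is exactly what "size $\alpha$" promises, since the null here is a single point.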
\[
L(\theta) = \theta^{-n} \prod_{i=1}^{n} I_{(0,\theta]}(x_i) = \theta^{-n} I_{(0,\theta]}(x_{(n)})
\]
We already know that the plain old unrestricted MLE is $\hat{\theta} = X_{(n)}$. (This is because the derivative of $\theta^{-n}$ with respect to $\theta$ set equal to zero gives no information; so, since $0 < x_i \le \theta$ for $i = 1, 2, \ldots, n$, the smallest $\theta$ can be is $x_{(n)}$, which then maximizes the decreasing $L(\theta) = \theta^{-n}$.)

Since $H_0$ consists of only one point, $\theta = \theta_0$, the maximum of $L(\theta)$ restricted to this one-point set is simply $L(\theta_0)$. So
\[
\lambda = \frac{\theta_0^{-n}\, I_{(0,\theta_0]}(x_{(n)})}{x_{(n)}^{-n}\, I_{(0,x_{(n)}]}(x_{(n)})} = \left(\frac{x_{(n)}}{\theta_0}\right)^n I_{(0,\theta_0]}(x_{(n)})
\]
As usual, we will reject $H_0$ if
\[
\left(\frac{x_{(n)}}{\theta_0}\right)^n I_{(0,\theta_0]}(x_{(n)}) \le k
\]
where $k$ is such that
\[
P\left(\left(\frac{X_{(n)}}{\theta_0}\right)^n I_{(0,\theta_0]}(X_{(n)}) \le k; H_0\right) = \alpha
\]
Well, under $H_0$, that indicator is always 1, so we can drop it:
\[
\alpha = P\left(\left(\frac{X_{(n)}}{\theta_0}\right)^n \le k; H_0\right) = P\left(X_{(n)} \le \theta_0 k^{1/n}; H_0\right) = P\left(X_{(n)} \le k_1; H_0\right)
\]
Finally, we solve for $k_1$:
\[
\alpha = P(X_{(n)} \le k_1; H_0) = \left[P(X_1 \le k_1; H_0)\right]^n = \left(\frac{k_1}{\theta_0}\right)^n \quad \Rightarrow \quad k_1 = \theta_0\, \alpha^{1/n}.
\]
So, the GLRT of size $\alpha$, for $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$ for a random sample of size $n$ from the unif$(0,\theta]$ distribution, is to reject $H_0$ in favor of $H_1$ if $X_{(n)} \le \theta_0\, \alpha^{1/n}$.

Wait a minute...
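This one can even be checked in closed form, since $P(X_{(n)} \le \theta_0\alpha^{1/n}; H_0) = (\alpha^{1/n})^n = \alpha$ exactly, but a quick simulation is still reassuring. The sketch below uses made-up values of $\theta_0$ and $n$ (and ignores the open/closed endpoint, which has probability zero either way):

```python
import numpy as np

def glrt_uniform(x, theta0, alpha=0.05):
    """GLRT for H0: theta = theta0 vs H1: theta != theta0,
    sample from unif(0, theta]. Reject if X_(n) <= theta0 * alpha**(1/n)."""
    n = len(x)
    return x.max() <= theta0 * alpha ** (1 / n)

# Monte Carlo size check under H0: theta = theta0.
rng = np.random.default_rng(1)
theta0, n, alpha, reps = 4.0, 10, 0.05, 200_000
xs = rng.uniform(0.0, theta0, size=(reps, n))          # simulate under H0
rate = np.mean(xs.max(axis=1) <= theta0 * alpha ** (1 / n))
print(rate)   # should be close to alpha = 0.05
```

Note that the simulated size matches $\alpha$ for any $n$, not just asymptotically, because the distribution of $X_{(n)}/\theta_0$ under $H_0$ is known exactly.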
Whoa, what's going on in that last problem? Does that rejection rule make sense? One would certainly think that it is not true that $\theta$, the upper limit of the support set for the sample, is equal to $\theta_0$ if we happened to observe $x_{(n)} > \theta_0$. Shouldn't we be rejecting for some values of $X_{(n)}$ that are too large? We don't just want to say "Well, of course we would automatically reject if $X_{(n)} > \theta_0$," because making up rules to suit our needs will mess with the size of the test that we worked so hard to obtain.

The answer is yes, but I didn't mention it as we were going through the example because I didn't want to muck up the steps of a standard GLRT procedure with this special case of having the indicators mixed with the parameters. Normally:

We set something less than or equal to $k$. We simplify this to something else less than (or greater than) or equal to some $k_1$. We say that the original $k$ doesn't matter anymore.

With indicators mixed with parameters, it does matter... this is a weird, sticky little point that I will not hold you accountable for in this course, but, for the record, I will expound upon it here.

Going back to the original rejection rule, we reject $H_0$ if
\[
\left(\frac{X_{(n)}}{\theta_0}\right)^n I_{(0,\theta_0]}(X_{(n)}) \le k.
\]
Since $k = (k_1/\theta_0)^n$, this becomes: reject $H_0$ if
\[
\left(\frac{X_{(n)}}{\theta_0}\right)^n I_{(0,\theta_0]}(X_{(n)}) \le \left(\frac{k_1}{\theta_0}\right)^n = \left(\frac{\theta_0\, \alpha^{1/n}}{\theta_0}\right)^n = \alpha.
\]
For a non-trivial $\alpha$ (i.e., $\alpha > 0$), if $X_{(n)}$ is greater than $\theta_0$, the left-hand side of this inequality will be zero, hence less than $\alpha$, hence we will reject, as desired.
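The indicator subtlety can be made concrete with a few lines of code. This is a tiny sketch with made-up numbers ($\theta_0 = 4$, $n = 10$, $\alpha = 0.05$): keeping the indicator inside $\lambda$, the rule $\lambda \le \alpha$ rejects both when $X_{(n)}$ is suspiciously small and when $X_{(n)}$ exceeds $\theta_0$, where the indicator zeroes $\lambda$ out.

```python
def glr_uniform_full(x_max, theta0, n, alpha):
    # lambda = (x_(n)/theta0)^n * I(0 < x_(n) <= theta0); reject H0 if lambda <= alpha
    lam = (x_max / theta0) ** n * (1 if 0 < x_max <= theta0 else 0)
    return lam <= alpha

theta0, n, alpha = 4.0, 10, 0.05
# Boundary for the "too small" rejections: theta0 * alpha**(1/n) ~ 2.96 here.
print(glr_uniform_full(2.9, theta0, n, alpha))  # reject: x_(n) is too small under H0
print(glr_uniform_full(3.9, theta0, n, alpha))  # do not reject: consistent with H0
print(glr_uniform_full(4.5, theta0, n, alpha))  # reject: x_(n) > theta0 makes lambda = 0
```

So the full rule with the indicator kept in is a two-sided rejection region in $X_{(n)}$, even though the simplified statement "reject if $X_{(n)} \le \theta_0\alpha^{1/n}$" only shows the lower side.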