STAT 801: Mathematical Statistics

Hypothesis Testing

Hypothesis testing: a statistical problem where you must choose, on the basis of data $X$, between two alternatives. We formalize this as the problem of choosing between two hypotheses, $H_0: \theta \in \Theta_0$ or $H_1: \theta \in \Theta_1$, where $\Theta_0$ and $\Theta_1$ are a partition of the model $P_\theta;\ \theta \in \Theta$. That is, $\Theta_0 \cup \Theta_1 = \Theta$ and $\Theta_0 \cap \Theta_1 = \emptyset$.

A rule for making the required choice can be described in two ways:

1. In terms of the set
$$R = \{X : \text{we choose } \Theta_1 \text{ if we observe } X\},$$
called the rejection or critical region of the test.

2. In terms of a function $\phi(x)$ which is equal to 1 for those $x$ for which we choose $\Theta_1$ and 0 for those $x$ for which we choose $\Theta_0$.

For technical reasons which will come up soon I prefer to use the second description. However, each $\phi$ corresponds to a unique rejection region $R_\phi = \{x : \phi(x) = 1\}$.

The Neyman–Pearson approach treats the two hypotheses asymmetrically. The hypothesis $H_0$ is referred to as the null hypothesis (traditionally the hypothesis that some treatment has no effect).

Definition: The power function of a test $\phi$ (or of the corresponding critical region $R_\phi$) is
$$\pi(\theta) = P_\theta(X \in R_\phi) = E_\theta(\phi(X)).$$

We are interested in optimality theory, that is, in the problem of finding the best $\phi$. A good $\phi$ will evidently have $\pi(\theta)$ small for $\theta \in \Theta_0$ and large for $\theta \in \Theta_1$. There is generally a trade-off, however, which can be made in many ways.

Simple versus Simple testing

Finding a best test is easiest when the hypotheses are very precise.

Definition: A hypothesis $H_i$ is simple if $\Theta_i$ contains only a single value $\theta_i$.

The simple versus simple testing problem arises when we test $\theta = \theta_0$ against $\theta = \theta_1$, so that $\Theta$ has only two points in it. This problem is of importance as a technical tool, not because it is a realistic situation.

Suppose that the model specifies that if $\theta = \theta_0$ then the density of $X$ is $f_0(x)$, and if $\theta = \theta_1$ then the density of $X$ is $f_1(x)$. How should we choose $\phi$? To answer the question we begin by studying the problem of minimizing the total error probability.

Type I error: the error made when $\theta = \theta_0$ but we choose $H_1$, that is, $X \in R_\phi$.

Type II error: the error made when $\theta = \theta_1$ but we choose $H_0$.

The level of a simple versus simple test is
$$\alpha = P_{\theta_0}(\text{we make a Type I error}) = P_{\theta_0}(X \in R_\phi) = E_{\theta_0}(\phi(X)).$$
The other error probability, denoted $\beta$, is
$$\beta = P_{\theta_1}(X \notin R_\phi) = E_{\theta_1}(1 - \phi(X)).$$

Minimize $\alpha + \beta$, the total error probability, given by
$$\alpha + \beta = E_{\theta_0}(\phi(X)) + E_{\theta_1}(1 - \phi(X)) = \int \left[\phi(x) f_0(x) + (1 - \phi(x)) f_1(x)\right] dx.$$

Problem: choose, for each $x$, either the value 0 or the value 1, in such a way as to minimize the integral. But for each $x$ the quantity $\phi(x) f_0(x) + (1 - \phi(x)) f_1(x)$ is between $f_0(x)$ and $f_1(x)$. To make it small we take $\phi(x) = 1$ if $f_1(x) > f_0(x)$ and $\phi(x) = 0$ if $f_1(x) < f_0(x)$. It makes no difference what we do for those $x$ for which $f_1(x) = f_0(x)$. Notice: divide both sides of these inequalities by $f_0(x)$ to state the condition in terms of the likelihood ratio $f_1(x)/f_0(x)$.
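The pointwise argument above is easy to check numerically. Below is a minimal sketch (not part of the notes; the $N(0,1)$ versus $N(1,1)$ pair and the grid are my own choices for illustration) that approximates $\alpha + \beta$ for the likelihood-ratio rule and for an arbitrary competing rule.

```python
import numpy as np
from scipy.stats import norm

# Approximate alpha + beta = integral of [phi*f0 + (1 - phi)*f1] dx
# on a grid, for testing f0 = N(0,1) against f1 = N(1,1).
x = np.linspace(-8.0, 9.0, 200001)
dx = x[1] - x[0]
f0 = norm.pdf(x, loc=0.0)
f1 = norm.pdf(x, loc=1.0)

def total_error(phi):
    return np.sum(phi * f0 + (1.0 - phi) * f1) * dx

phi_lr = (f1 > f0).astype(float)   # likelihood-ratio rule: choose H1 iff f1(x) > f0(x)
phi_alt = (x > 1.5).astype(float)  # an arbitrary competing rule

print(total_error(phi_lr))   # ~0.617, the minimum: the integral of min(f0, f1)
print(total_error(phi_alt))  # strictly larger
```

Whatever rule you substitute for `phi_alt`, its total error cannot fall below the integral of $\min(f_0, f_1)$, which is exactly what the likelihood-ratio rule achieves.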
Theorem: For each fixed $\lambda$ the quantity $\beta + \lambda\alpha$ is minimized by any $\phi$ which has
$$\phi(x) = \begin{cases} 1 & f_1(x)/f_0(x) > \lambda \\ 0 & f_1(x)/f_0(x) < \lambda. \end{cases}$$

Neyman and Pearson suggested that in practice the two kinds of errors might well have unequal consequences. They suggested that rather than minimize any quantity of the form above, you pick the more serious kind of error, label it Type I, and require your rule to hold the probability $\alpha$ of a Type I error to be no more than some prespecified level $\alpha_0$. (This value $\alpha_0$ is typically 0.05 these days, chiefly for historical reasons.) The Neyman–Pearson approach is then to minimize $\beta$ subject to the constraint $\alpha \le \alpha_0$. Usually this is really equivalent to the constraint $\alpha = \alpha_0$ (because if you use $\alpha < \alpha_0$ you could make $R$ larger, keep $\alpha \le \alpha_0$, and make $\beta$ smaller). For discrete models, however, this may not be possible.

Example: Suppose $X$ is Binomial$(n, p)$ and either $p = p_0 = 1/2$ or $p = p_1 = 3/4$. If $R$ is any critical region (so $R$ is a subset of $\{0, 1, \ldots, n\}$) then $P_{1/2}(X \in R) = k/2^n$ for some integer $k$. If we want $\alpha_0 = 0.05$ with, say, $n = 5$, we have to recognize that the possible values of $\alpha$ are 0, $1/32 = 0.03125$, $2/32 = 0.0625$ and so on.

Possible rejection regions for $\alpha_0 = 0.05$:

Region          α         β
R1 = ∅          0         1
R2 = {x = 0}    0.03125   1 − (1/4)^5
R3 = {x = 5}    0.03125   1 − (3/4)^5

So $R_3$ minimizes $\beta$ subject to $\alpha < 0.05$.

Raise $\alpha_0$ slightly to 0.0625: the possible rejection regions are $R_1$, $R_2$, $R_3$ and $R_4 = R_2 \cup R_3$. The first three have the same $\alpha$ and $\beta$ as before, while $R_4$ has $\alpha = \alpha_0 = 0.0625$ and $\beta = 1 - (3/4)^5 - (1/4)^5$. Thus $R_4$ is optimal!

Problem: if all trials are failures, the optimal $R$ chooses $p = 3/4$ rather than $p = 1/2$. But $p = 1/2$ makes 5 failures much more likely than does $p = 3/4$.

Problem: discreteness. Here's how we get around the problem. First we expand the set of possible values of $\phi$ to include numbers between 0 and 1. Values of $\phi(x)$ between 0 and 1 represent the chance that we choose $H_1$ given that we observe $x$; the idea is that we actually toss a (biased) coin to decide! This tactic will show us the kinds of rejection regions which are sensible. In practice we then restrict our attention to levels $\alpha_0$ for which the best $\phi$ is always either 0 or 1. In the binomial example we will insist that the value of $\alpha_0$ be either 0 or $P_{\theta_0}(X \ge 5)$ or $P_{\theta_0}(X \ge 4)$ or ...

Smaller example: take $n = 3$, so there are 4 possible values of $X$ and $2^4$ possible rejection regions. Here is a table of the levels for each possible rejection region $R$ (computed under $p_0 = 1/2$):

R                                  α
∅                                  0
{0}, {3}                           1/8
{0, 3}                             2/8
{1}, {2}                           3/8
{0, 1}, {0, 2}, {1, 3}, {2, 3}     4/8
{0, 1, 3}, {0, 2, 3}               5/8
{1, 2}                             6/8
{0, 1, 2}, {1, 2, 3}               7/8
{0, 1, 2, 3}                       1
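The table can be reproduced by brute force. A small sketch (my own illustration, using scipy's binomial pmf) that enumerates every rejection region for the $n = 3$ example and prints its level and its $\beta$:

```python
from itertools import combinations
from scipy.stats import binom

# Enumerate all 2^4 rejection regions for the n = 3 example,
# computing each region's level (under p0 = 1/2) and beta (under p1 = 3/4).
n, p0, p1 = 3, 0.5, 0.75
rows = []
for size in range(n + 2):
    for region in combinations(range(n + 1), size):
        alpha = sum(binom.pmf(x, n, p0) for x in region)
        beta = 1.0 - sum(binom.pmf(x, n, p1) for x in region)
        rows.append((alpha, region, beta))

for alpha, region, beta in sorted(rows):
    label = "{" + ",".join(map(str, region)) + "}"
    print(f"R = {label:<10} alpha = {alpha:.3f}  beta = {beta:.3f}")
```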
The best level 2/8 test has rejection region $\{0, 3\}$, with $\beta = 1 - [(3/4)^3 + (1/4)^3] = 36/64$.

The best level 2/8 test using randomization rejects when $X = 3$ and, when $X = 2$, tosses a coin with $P(H) = 1/3$, rejecting if you get $H$. Its level is $1/8 + (1/3)(3/8) = 2/8$; its probability of a Type II error is $\beta = 1 - [(3/4)^3 + (1/3)(3)(3/4)^2(1/4)] = 28/64$, which is smaller.

Definition: A hypothesis test is a function $\phi(x)$ whose values are always in $[0, 1]$. If we observe $X = x$ then we choose $H_1$ with conditional probability $\phi(x)$. In this case we have
$$\pi(\theta) = E_\theta(\phi(X))$$
and
$$\alpha = E_0(\phi(X)), \qquad \beta = 1 - E_1(\phi(X)).$$

Note that a test using a rejection region $C$ is equivalent to $\phi(x) = 1(x \in C)$.

The Neyman–Pearson Lemma: In testing $f_0$ against $f_1$, the probability $\beta$ of a Type II error is minimized, subject to $\alpha \le \alpha_0$, by the test function
$$\phi(x) = \begin{cases} 1 & f_1(x)/f_0(x) > \lambda \\ \gamma & f_1(x)/f_0(x) = \lambda \\ 0 & f_1(x)/f_0(x) < \lambda \end{cases}$$
where $\lambda$ is the largest constant such that
$$P_0(f_1(X)/f_0(X) \ge \lambda) \ge \alpha_0 \quad\text{and}\quad P_0(f_1(X)/f_0(X) \le \lambda) \ge 1 - \alpha_0$$
and where $\gamma$ is any number chosen so that
$$E_0(\phi(X)) = P_0(f_1(X)/f_0(X) > \lambda) + \gamma P_0(f_1(X)/f_0(X) = \lambda) = \alpha_0.$$
The value of $\gamma$ is unique if $P_0(f_1(X)/f_0(X) = \lambda) > 0$.
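Here is a sketch of how the lemma's recipe plays out in a discrete family whose likelihood ratio is increasing in $x$, so that the test rejects for large $X$. The helper `np_test_binomial` and its interface are my own, for illustration only; applied to the $n = 3$ example it reproduces the randomized level-2/8 test found earlier.

```python
from scipy.stats import binom

def np_test_binomial(n, p0, p1, alpha0):
    """Neyman-Pearson test of p = p0 vs p = p1 (p1 > p0) at level alpha0.

    The likelihood ratio is increasing in x, so the test rejects when
    X > k and flips a gamma-coin when X = k.  Returns (k, gamma, lambda).
    """
    # Smallest k with P_{p0}(X > k) <= alpha0; binom.sf(k, ...) = P(X > k).
    k = next(j for j in range(n + 1) if binom.sf(j, n, p0) <= alpha0)
    gamma = (alpha0 - binom.sf(k, n, p0)) / binom.pmf(k, n, p0)
    lam = (p1 / p0) ** k * ((1 - p1) / (1 - p0)) ** (n - k)  # f1/f0 at x = k
    return k, gamma, lam

# Reproduces the randomized level-2/8 test from the n = 3 example:
print(np_test_binomial(3, 0.5, 0.75, 2 / 8))  # (2, 1/3, 9/8): coin with P(H) = 1/3 at X = 2
```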
Example: Binomial$(n, p)$ with $p_0 = 1/2$ and $p_1 = 3/4$: the ratio $f_1/f_0$ is
$$3^x 2^{-n}.$$
If $n = 5$ this ratio is one of 1, 3, 9, 27, 81, 243 divided by 32. Suppose $\alpha_0 = 0.05$. $\lambda$ must be one of the possible values of $f_1/f_0$. If we try $\lambda = 243/32$ then
$$P_0(3^X 2^{-5} \ge 243/32) = P_0(X = 5) = 1/32 < 0.05$$
while
$$P_0(3^X 2^{-5} \ge 81/32) = P_0(X \ge 4) = 6/32 > 0.05.$$
So $\lambda = 81/32$. Since
$$P_0(3^X 2^{-5} > 81/32) = P_0(X = 5) = 1/32,$$
we must solve
$$P_0(X = 5) + \gamma P_0(X = 4) = 0.05$$
for $\gamma$ and find
$$\gamma = \frac{0.05 - 1/32}{5/32} = 0.12.$$

NOTE: No one ever uses this procedure. Instead, the value of $\alpha_0$ used in discrete problems is chosen to be a possible value of the rejection probability when $\gamma = 0$ (or $\gamma = 1$). When the sample size is large you can come very close to any desired $\alpha_0$ with a non-randomized test.

If $\alpha_0 = 6/32$ then we can either take $\lambda$ to be 243/32 and $\gamma = 1$, or $\lambda = 81/32$ and $\gamma = 0$. However, our definition of $\lambda$ in the theorem makes $\lambda = 81/32$ and $\gamma = 0$.

When the theorem is used for continuous distributions it can be the case that the cdf of $f_1(X)/f_0(X)$ has a flat spot where it is equal to $1 - \alpha_0$. This is the point of the word "largest" in the theorem.

Example: If $X_1, \ldots, X_n$ are iid $N(\mu, 1)$ and we have $\mu_0 = 0$ and $\mu_1 > 0$ then
$$\frac{f_1(X_1, \ldots, X_n)}{f_0(X_1, \ldots, X_n)} = \exp\left\{\mu_1 \sum X_i - n\mu_1^2/2 - \mu_0 \sum X_i + n\mu_0^2/2\right\}$$
which simplifies to
$$\exp\left\{\mu_1 \sum X_i - n\mu_1^2/2\right\}.$$
Now choose $\lambda$ so that
$$P_0\left(\exp\left\{\mu_1 \sum X_i - n\mu_1^2/2\right\} > \lambda\right) = \alpha_0.$$
We can make this exactly equal because $f_1(X)/f_0(X)$ has a continuous distribution. Rewrite the probability as
$$P_0\left(\sum X_i > [\log(\lambda) + n\mu_1^2/2]/\mu_1\right) = 1 - \Phi\left(\frac{\log(\lambda) + n\mu_1^2/2}{n^{1/2}\mu_1}\right).$$
Let $z_\alpha$ be the upper $\alpha$ critical point of $N(0, 1)$; then
$$z_{\alpha_0} = [\log(\lambda) + n\mu_1^2/2]/[n^{1/2}\mu_1].$$
Solve to get a formula for $\lambda$ in terms of $z_{\alpha_0}$, $n$ and $\mu_1$.

The rejection region looks complicated: reject if a complicated statistic is larger than $\lambda$, where $\lambda$ has a complicated formula. But in calculating $\lambda$ we re-expressed the rejection region in terms of
$$\sum X_i > z_{\alpha_0}\sqrt{n}.$$
The key feature is that this rejection region is the same for any $\mu_1 > 0$. [WARNING: in the algebra above I used $\mu_1 > 0$.] This is why the Neyman–Pearson lemma is a lemma!
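A quick Monte Carlo check of the calculation above (a sketch; $n$, $\mu_1$ and the seed are arbitrary choices of mine): the complicated region $\{f_1/f_0 > \lambda\}$ is exactly the simple region $\{\sum X_i > z_{\alpha_0}\sqrt{n}\}$, and both have probability $\alpha_0$ under $\mu = 0$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, mu1, alpha0 = 25, 0.8, 0.05
z = norm.ppf(1 - alpha0)                             # upper alpha0 point of N(0,1)
lam = np.exp(z * np.sqrt(n) * mu1 - n * mu1**2 / 2)  # from z = [log(lam) + n*mu1^2/2]/(sqrt(n)*mu1)

X = rng.normal(0.0, 1.0, size=(200_000, n))          # samples under mu = 0
S = X.sum(axis=1)
lr = np.exp(mu1 * S - n * mu1**2 / 2)                # likelihood ratio f1/f0

print(np.mean(lr > lam), np.mean(S > z * np.sqrt(n)))  # both close to 0.05
print(np.array_equal(lr > lam, S > z * np.sqrt(n)))    # True: the same rejection region
```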
Definition: In the general problem of testing $\Theta_0$ against $\Theta_1$, the level of a test function $\phi$ is
$$\alpha = \sup_{\theta \in \Theta_0} E_\theta(\phi(X)).$$
The power function is
$$\pi(\theta) = E_\theta(\phi(X)).$$

A test $\phi$ is a Uniformly Most Powerful level $\alpha_0$ test if

1. $\phi$ has level $\alpha \le \alpha_0$;

2. whenever $\phi'$ has level $\alpha \le \alpha_0$, for every $\theta \in \Theta_1$ we have
$$E_\theta(\phi'(X)) \le E_\theta(\phi(X)).$$

Proof of the Neyman–Pearson lemma: Given a test $\phi$ with level strictly less than $\alpha_0$, we can define the test
$$\phi'(x) = \frac{1 - \alpha_0}{1 - \alpha}\phi(x) + \frac{\alpha_0 - \alpha}{1 - \alpha},$$
which has level exactly $\alpha_0$ and $\beta$ smaller than that of $\phi$. Hence we may assume without loss that $\alpha = \alpha_0$ and minimize $\beta$ subject to $\alpha = \alpha_0$. However, the argument which follows doesn't actually need this.

Lagrange Multipliers

Suppose you want to minimize $f(x)$ subject to $g(x) = 0$. Consider first the function
$$h_\lambda(x) = f(x) + \lambda g(x).$$
If $x_\lambda$ minimizes $h_\lambda$, then for any other $x$
$$f(x_\lambda) \le f(x) + \lambda[g(x) - g(x_\lambda)].$$
Now suppose you can find a value of $\lambda$ such that the solution $x_\lambda$ has $g(x_\lambda) = 0$. Then for any $x$ we have
$$f(x_\lambda) \le f(x) + \lambda g(x),$$
and for any $x$ satisfying the constraint $g(x) = 0$ we have
$$f(x_\lambda) \le f(x).$$
This proves that for this special value of $\lambda$ the quantity $x_\lambda$ minimizes $f(x)$ subject to $g(x) = 0$. Notice that to find $x_\lambda$ you set the usual partial derivatives equal to 0; then to find the special $x_\lambda$ you add in the condition $g(x_\lambda) = 0$.

Return to the proof of the NP lemma

For each $\lambda > 0$ we have seen that the quantity $\lambda\alpha + \beta$ is minimized by $\phi_\lambda = 1(f_1(x)/f_0(x) \ge \lambda)$. As $\lambda$ increases, the level of $\phi_\lambda$ decreases from 1 when $\lambda = 0$ to 0 when $\lambda = \infty$. There is thus a value $\lambda_0$ such that for $\lambda < \lambda_0$ the level is more than $\alpha_0$, while for $\lambda > \lambda_0$ the level is no more than $\alpha_0$.

Temporarily let $\delta = P_0(f_1(X)/f_0(X) = \lambda_0)$. If $\delta = 0$, define $\phi = \phi_{\lambda_0}$. If $\delta > 0$, define
$$\phi(x) = \begin{cases} 1 & f_1(x)/f_0(x) > \lambda_0 \\ \gamma & f_1(x)/f_0(x) = \lambda_0 \\ 0 & f_1(x)/f_0(x) < \lambda_0 \end{cases}$$
where $P_0(f_1(X)/f_0(X) > \lambda_0) + \gamma\delta = \alpha_0$. You can check that $\gamma \in [0, 1]$.

Now $\phi$ has level $\alpha_0$ and, according to the theorem above, minimizes $\lambda_0\alpha + \beta$. Suppose $\phi'$ is some other test with level $\alpha' \le \alpha_0$. Then
$$\lambda_0\alpha_\phi + \beta_\phi \le \lambda_0\alpha_{\phi'} + \beta_{\phi'}.$$
We can rearrange this as
$$\beta_{\phi'} \ge \beta_\phi + (\alpha_\phi - \alpha_{\phi'})\lambda_0.$$
Since
$$\alpha_{\phi'} \le \alpha_0 = \alpha_\phi,$$
the second term is non-negative, and
$$\beta_{\phi'} \ge \beta_\phi,$$
which proves the Neyman–Pearson Lemma.
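As a sanity check of the rescaling step at the start of the proof, the following sketch (reusing the Binomial(5) setting from earlier; my own illustration) verifies that $\phi'$ has level exactly $\alpha_0$ and a smaller $\beta$ than the $\phi$ it is built from.

```python
from scipy.stats import binom

# Binomial(5): start from phi with level alpha = 1/32 < alpha0 = 0.05 and
# rescale to phi2 = (1-a0)/(1-a)*phi + (a0-a)/(1-a); check level and beta.
n, p0, p1, alpha0 = 5, 0.5, 0.75, 0.05
phi = [1.0 if x == 5 else 0.0 for x in range(n + 1)]

def level(t):
    return sum(t[x] * binom.pmf(x, n, p0) for x in range(n + 1))

def beta(t):
    return 1.0 - sum(t[x] * binom.pmf(x, n, p1) for x in range(n + 1))

a = level(phi)
phi2 = [(1 - alpha0) / (1 - a) * t + (alpha0 - a) / (1 - a) for t in phi]

print(level(phi), beta(phi))    # 0.03125 and 1 - (3/4)^5
print(level(phi2), beta(phi2))  # 0.05 exactly, and a smaller beta
```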
Example application of NP: For Binomial$(n, p)$, to test $p = p_0$ against $p = p_1$ for a $p_1 > p_0$, the NP test is of the form
$$\phi(x) = 1(X > k) + \gamma 1(X = k)$$
where we choose $k$ so that
$$P_{p_0}(X > k) \le \alpha_0 < P_{p_0}(X \ge k)$$
and $\gamma \in [0, 1)$ so that
$$\alpha_0 = P_{p_0}(X > k) + \gamma P_{p_0}(X = k).$$
This rejection region depends only on $p_0$ and not on $p_1$, so this test is UMP for $p = p_0$ against $p > p_0$. Since this test has level $\alpha_0$ even for the larger null hypothesis, it is also UMP for $p \le p_0$ against $p > p_0$.

Application of the NP lemma: In the $N(\mu, 1)$ model consider $\Theta_1 = \{\mu > 0\}$ and $\Theta_0 = \{0\}$ or $\Theta_0 = \{\mu \le 0\}$. The UMP level $\alpha_0$ test of $H_0: \mu \in \Theta_0$ against $H_1: \mu \in \Theta_1$ is
$$\phi(X_1, \ldots, X_n) = 1(n^{1/2}\bar{X} > z_{\alpha_0}).$$

Proof: For either choice of $\Theta_0$ this test has level $\alpha_0$ because for $\mu \le 0$ we have
$$P_\mu(n^{1/2}\bar{X} > z_{\alpha_0}) = P_\mu(n^{1/2}(\bar{X} - \mu) > z_{\alpha_0} - n^{1/2}\mu) = P(N(0, 1) > z_{\alpha_0} - n^{1/2}\mu) \le P(N(0, 1) > z_{\alpha_0}) = \alpha_0.$$
(Notice the use of $\mu \le 0$. The central point is that the critical point is determined by the behaviour on the edge of the null hypothesis.)

Now if $\phi'$ is any other level $\alpha_0$ test, then we have
$$E_0(\phi'(X_1, \ldots, X_n)) \le \alpha_0.$$
Fix a $\mu > 0$. According to the NP lemma,
$$E_\mu(\phi'(X_1, \ldots, X_n)) \le E_\mu(\phi_\mu(X_1, \ldots, X_n))$$
where $\phi_\mu$ rejects if
$$f_\mu(X_1, \ldots, X_n)/f_0(X_1, \ldots, X_n) > \lambda$$
for a suitable $\lambda$. But we just checked that this test has a rejection region of the form
$$n^{1/2}\bar{X} > z_{\alpha_0},$$
which is the rejection region of $\phi$. The NP lemma produces the same test for every $\mu > 0$ chosen as an alternative. So we have shown that $\phi_\mu = \phi$ for any $\mu > 0$.

This phenomenon is somewhat general. What happened was this. For any $\mu > \mu_0$ the likelihood ratio $f_\mu/f_{\mu_0}$ is an increasing function of $\sum X_i$. The rejection region of the NP test is thus always a region of the form $\sum X_i > k$. The value of the constant $k$ is determined by the requirement that the test have level $\alpha_0$, and this depends on $\mu_0$, not on $\mu_1$.

Definition: The family $\{f_\theta;\ \theta \in \Theta \subset \mathbb{R}\}$ has monotone likelihood ratio with respect to a statistic $T(X)$ if for each $\theta_1 > \theta_0$ the likelihood ratio $f_{\theta_1}(X)/f_{\theta_0}(X)$ is a monotone increasing function of $T(X)$.

Theorem: For a monotone likelihood ratio family, the Uniformly Most Powerful level $\alpha_0$ test of $\theta \le \theta_0$ (or of $\theta = \theta_0$) against the alternative $\theta > \theta_0$ is
$$\phi(x) = \begin{cases} 1 & T(x) > t_{\alpha_0} \\ \gamma & T(x) = t_{\alpha_0} \\ 0 & T(x) < t_{\alpha_0} \end{cases}$$
where
$$P_{\theta_0}(T(X) > t_{\alpha_0}) + \gamma P_{\theta_0}(T(X) = t_{\alpha_0}) = \alpha_0.$$
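To see the word "uniformly" at work, here is a small simulation (my own illustration; the competing test, which looks only at $X_1$, is an arbitrary level-$\alpha_0$ rival) comparing power across several alternatives in the $N(\mu, 1)$ model.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, alpha0, reps = 16, 0.05, 200_000
z = norm.ppf(1 - alpha0)

def power_ump(mu):
    xbar = rng.normal(mu, 1 / np.sqrt(n), size=reps)  # distribution of the sample mean
    return np.mean(np.sqrt(n) * xbar > z)

def power_rival(mu):
    x1 = rng.normal(mu, 1.0, size=reps)               # rival test looks only at X1
    return np.mean(x1 > z)

for mu in [0.0, 0.25, 0.5, 1.0]:
    print(f"mu = {mu:.2f}: UMP {power_ump(mu):.3f}  rival {power_rival(mu):.3f}")
# Both have level ~0.05 at mu = 0; the UMP test wins at every mu > 0.
```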
A typical family where the monotone likelihood ratio theorem applies is a one-parameter exponential family.

Usually there is no UMP test.

Example: test $\mu = \mu_0$ against the two-sided alternative $\mu \ne \mu_0$. There is no UMP level $\alpha_0$ test. If there were, its power at $\mu > \mu_0$ would have to be as high as that of the one-sided level $\alpha_0$ test, and so its rejection region would have to be the same as that test's, rejecting for large positive values of $\bar{X} - \mu_0$. But it would also have to have power as good as that of the one-sided test for the alternative $\mu < \mu_0$, and so would have to reject for large negative values of $\bar{X} - \mu_0$. This would make its level too large.

Favourite test: the usual two-sided test, which rejects for large values of $|\bar{X} - \mu_0|$. This test maximizes power subject to two constraints: first, it has level $\alpha_0$; second, its power is minimized at $\mu = \mu_0$. The second condition means that the power on the alternative is larger than on the null.
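The argument above can be made concrete with exact power calculations (a sketch; $n$ and $\mu_0$ are arbitrary choices of mine): each one-sided test beats the two-sided test on its own side of $\mu_0$ but has power below $\alpha_0$ on the other side, so no single test is uniformly best.

```python
import numpy as np
from scipy.stats import norm

n, mu0, alpha0 = 16, 0.0, 0.05
z = norm.ppf(1 - alpha0)       # one-sided critical point
z2 = norm.ppf(1 - alpha0 / 2)  # two-sided critical point

def power_right(mu):           # rejects when sqrt(n)*(xbar - mu0) > z
    return 1 - norm.cdf(z - np.sqrt(n) * (mu - mu0))

def power_two_sided(mu):       # rejects when |sqrt(n)*(xbar - mu0)| > z2
    d = np.sqrt(n) * (mu - mu0)
    return (1 - norm.cdf(z2 - d)) + norm.cdf(-z2 - d)

for mu in [-0.5, -0.25, 0.0, 0.25, 0.5]:
    print(f"mu = {mu:+.2f}  right-sided: {power_right(mu):.3f}  two-sided: {power_two_sided(mu):.3f}")
# The right-sided test dominates for mu > mu0, but its power falls
# below alpha0 for mu < mu0; the two-sided test gives up some power
# on each side in exchange for power above alpha0 on both sides.
```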