40.530: Statistics. Professor Chen Zehua. Singapore University of Design and Technology
Lecture 9: Hypothesis testing, uniformly most powerful tests

The Neyman-Pearson framework

Let $\mathcal{P}$ be the family of distributions of concern. The Neyman-Pearson framework of hypothesis testing on $\mathcal{P}$ is as follows. Two hypotheses, a null hypothesis $H_0$ and an alternative hypothesis $H_1$, are formulated as
$$H_0: P \in \mathcal{P}_0 \quad \text{versus} \quad H_1: P \in \mathcal{P}_1,$$
where $\mathcal{P}_0 \cap \mathcal{P}_1 = \emptyset$ and $\mathcal{P}_0 \cup \mathcal{P}_1 = \mathcal{P}$. Let $S(X)$ be a test statistic based on the sample $X$ and let $c$ be a constant. The decision is made by the rule: if $S(X) \ge c$, reject $H_0$; otherwise, do not reject $H_0$. The constant $c$ is chosen such that, for a specified small $\alpha$,
$$\sup_{P \in \mathcal{P}_0} P(S(X) \ge c) \le \alpha.$$
Type I and Type II errors

When $H_0$ is rejected but $H_0$ is indeed true, the error is called the type I error. When $H_0$ is accepted but $H_0$ is in fact wrong, the error is called the type II error.

Probabilities of errors and power function

Type I error rate: $\beta_S(P) = P(S(X) \ge c)$, $P \in \mathcal{P}_0$.
Type II error rate: $1 - \beta_S(P) = 1 - P(S(X) \ge c)$, $P \in \mathcal{P}_1$.
$\beta_S(P)$, as a function of $P$, is called the power function of $S$.
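As a numerical illustration (not part of the original slides), consider a hypothetical test with $S(X) = \bar{X}$ for an $N(\mu, 1)$ sample of size $n = 9$ and $H_0: \mu \le 0$. The sketch below evaluates the power function $\beta_S(\mu) = P_\mu(\bar{X} \ge c)$, which gives the type I error rate on the null side and one minus the type II error rate on the alternative side; the cutoff value 0.5484 is chosen to make the size roughly 0.05.

```python
import math

def phi(x):
    """Standard normal c.d.f. Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power(mu, c, n=9, sigma=1.0):
    """beta_S(mu) = P_mu(Xbar >= c), using Xbar ~ N(mu, sigma^2/n)."""
    return 1.0 - phi((c - mu) * math.sqrt(n) / sigma)

c = 0.5484                           # roughly z_{0.95}/sqrt(9): size about 0.05 at mu = 0
alpha_attained = power(0.0, c)       # type I error rate at the boundary mu = 0
type2_at_1 = 1.0 - power(1.0, c)     # type II error rate at mu = 1 in P_1
```

Note that shrinking the type I rate (raising $c$) inflates the type II rate, which is the trade-off the remarks below describe.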
Remarks

Type I and type II error rates cannot be minimized simultaneously. Under the Neyman-Pearson framework, the type I error rate is controlled by $\alpha$, but the type II error rate is not controlled. Under the Neyman-Pearson framework, the two hypotheses should therefore be formulated so that the type I error is the more serious one from a practical point of view.

Significance level and size of the test

If $\sup_{P \in \mathcal{P}_0} P(S(X) \ge c) \le \alpha$, the number $\alpha$, which controls the type I error rate, is called the significance level of the test. The supremum $\sup_{P \in \mathcal{P}_0} P(S(X) \ge c)$ is called the size of the test.
Choice of significance level

The choice of $\alpha$ is somewhat subjective. The standard values 0.10, 0.05, and 0.01 are often used for convenience.

p-value

The smallest possible level of significance at which $H_0$ would be rejected for the computed $S(x)$ is
$$\hat\alpha(x) = \sup_{P \in \mathcal{P}_0} P(S(X) > S(x)) = \sup_{P \in \mathcal{P}_0} \{1 - \Psi(S(x) \mid P)\},$$
where $\Psi(\cdot \mid P)$ is the c.d.f. of $S(X)$ under distribution $P$. Such an $\hat\alpha(x)$ is called the p-value of the test $S$.

Remark

Because the p-value provides additional information about a test, using p-values is often more appropriate than using fixed-level tests in a scientific problem. Note that $\hat\alpha(X)$ is a random variable; the p-value of the test is a realization of this random variable.
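For the hypothetical one-sided normal test above ($S(X) = \bar{X}$, $H_0: \mu \le \mu_0$, known variance), the supremum in the definition is attained at the boundary $\mu = \mu_0$, so the p-value reduces to one tail probability. A sketch (the parameter values are illustrative, not from the slides):

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_value(xbar, mu0=0.0, n=9, sigma=1.0):
    """hat-alpha(x) = sup_{mu <= mu0} P_mu(Xbar > xbar); the sup is at mu = mu0."""
    return 1.0 - phi((xbar - mu0) * math.sqrt(n) / sigma)
```

Rejecting $H_0$ at level $\alpha$ is then equivalent to $\hat\alpha(x) \le \alpha$, which is why the p-value is the smallest level at which the observed data lead to rejection.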
Critical (acceptance) region and randomized test

The region $C_S = \{x : S(x) \ge c\}$ is called the critical region of the test. Its complement, $A_S = \{x : S(x) < c\}$, is called the acceptance region of the test. A test can also be defined by the indicator function of the critical region, i.e., $T(X) = I_{C_S}(X)$. This indicator function is simply referred to as a test. The test $T(X)$ is a statistic taking values 0 or 1. In general, we can define a test as a statistic $T(X)$ taking values in $[0, 1]$, where $T(x)$ gives the probability of rejecting $H_0$ when $X = x$ is observed. Such a test is called a randomized test. The power function of a randomized test $T$ is given by $\beta_T(P) = E[T(X) \mid P]$.
Comparison of tests

If there are many tests, we wish to find one that is optimal in a certain sense. The ideal would be a test that minimizes both the type I and type II error rates simultaneously; however, this is impossible. The strategy is to confine ourselves to the class of tests satisfying
$$\sup_{P \in \mathcal{P}_0} \beta_T(P) \le \alpha,$$
where $\alpha \in [0, 1]$ is a given level of significance, and then select the test in this class that maximizes the power $\beta_T(P)$ over all $P \in \mathcal{P}_1$.

Definition (Uniformly most powerful test)

A test $T_*$ of size $\alpha$ is a uniformly most powerful (UMP) test if and only if $\beta_{T_*}(P) \ge \beta_T(P)$ for all $P \in \mathcal{P}_1$ and all tests $T$ of level $\alpha$.
A remark

If $U(X)$ is a sufficient statistic for $P \in \mathcal{P}$, then for any test $T(X)$, $E(T \mid U)$ has the same power function as $T$; therefore, to find a UMP test we may consider only tests that are functions of $U$.

Neyman-Pearson lemma

Let $\mathcal{P}_0 = \{P_0\}$ and $\mathcal{P}_1 = \{P_1\}$. Let $f_j$ be the p.d.f. of $P_j$, $j = 0, 1$.
(i) (Existence) For every $\alpha$, there exists a UMP test of size $\alpha$, given by
$$T_*(X) = \begin{cases} 1 & f_1(X) > c f_0(X) \\ \gamma & f_1(X) = c f_0(X) \\ 0 & f_1(X) < c f_0(X), \end{cases}$$
where $\gamma \in [0, 1]$ and $c \ge 0$ are constants chosen so that $E[T_*(X)] = \alpha$ when $P = P_0$ ($c = \infty$ is allowed).
(ii) (Uniqueness) If $T$ is a UMP test of size $\alpha$, then
$$T(X) = \begin{cases} 1 & f_1(X) > c f_0(X) \\ 0 & f_1(X) < c f_0(X) \end{cases} \quad \text{a.s. } \mathcal{P}.$$
Remarks

The Neyman-Pearson lemma shows that if both $H_0$ and $H_1$ are simple (a hypothesis is simple if and only if the corresponding set of populations contains exactly one element), there exists a UMP test that is determined uniquely (a.s. $\mathcal{P}$) except on the set $B = \{x : f_1(x) = c f_0(x)\}$. If $\nu(B) = 0$, there is a unique nonrandomized UMP test; otherwise, UMP tests are randomized on the set $B$, and the randomization is necessary for UMP tests to have the given size $\alpha$. We can always choose a UMP test that is constant on $B$.

Example 9.1

Suppose that $X$ is a sample of size 1, $\mathcal{P}_0 = \{P_0\}$, and $\mathcal{P}_1 = \{P_1\}$, where $P_0$ is $N(0, 1)$ and $P_1$ is the double exponential distribution $DE(0, 2)$ with the p.d.f. $\frac{1}{4} e^{-|x|/2}$. Since $P(f_1(X) = c f_0(X)) = 0$, there is a unique nonrandomized UMP test.
Example 9.1 (cont.)

By the Neyman-Pearson lemma, the UMP test has $T_*(x) = 1$ if and only if $\sqrt{\pi/8}\, e^{(x^2 - |x|)/2} > c$ for some $c > 0$, which is equivalent to $|x| > t$ or $|x| < 1 - t$ for some $t > \frac{1}{2}$. It is not necessary to find $c$ itself. Suppose that $\alpha < \frac{1}{3}$. To determine $t$, we use
$$\alpha = E_0[T_*(X)] = P_0(|X| > t) + P_0(|X| < 1 - t).$$
If $t \le 1$, then $P_0(|X| > t) \ge P_0(|X| > 1) = 0.3174 > \alpha$. Hence $t$ should be larger than 1 and
$$\alpha = P_0(|X| > t) = \Phi(-t) + 1 - \Phi(t).$$
Thus, $t = \Phi^{-1}(1 - \alpha/2)$ and $T_*(X) = I_{(t, \infty)}(|X|)$. Intuitively, the reason why the UMP test in this example rejects $H_0$ when $|X|$ is large is that the probability of getting a large $|X|$ is much higher under $H_1$ (i.e., when $P$ is the double exponential distribution $DE(0, 2)$). The power of $T_*$ when $P \in \mathcal{P}_1$ is
$$E_1[T_*(X)] = P_1(|X| > t) = \int_{\{|x| > t\}} \tfrac{1}{4} e^{-|x|/2}\, dx = e^{-t/2}.$$
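The closed forms in Example 9.1 can be checked numerically. The sketch below (not in the slides; $\alpha = 0.05$ is an illustrative choice) computes $t = \Phi^{-1}(1 - \alpha/2)$ with a simple bisection inverse of $\Phi$, then verifies that the size $2(1 - \Phi(t))$ equals $\alpha$ and evaluates the power $e^{-t/2}$.

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_inv(p):
    """Inverse normal c.d.f. by bisection (avoids needing SciPy)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha = 0.05
t = phi_inv(1.0 - alpha / 2.0)      # rejection cutoff for |X|; about 1.96
size = 2.0 * (1.0 - phi(t))         # P0(|X| > t); should equal alpha
power = math.exp(-t / 2.0)          # E1[T*(X)] = e^{-t/2} under DE(0, 2)
```

With $\alpha = 0.05$ the power is only about 0.375, illustrating that UMP optimality does not mean high power in absolute terms.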
Example 9.2

Let $X_1, \ldots, X_n$ be i.i.d. binary random variables with $p = P(X_1 = 1)$. Suppose that $H_0: p = p_0$ and $H_1: p = p_1$, where $0 < p_0 < p_1 < 1$. By the Neyman-Pearson lemma, a UMP test of size $\alpha$ is
$$T_*(Y) = \begin{cases} 1 & \lambda(Y) > c \\ \gamma & \lambda(Y) = c \\ 0 & \lambda(Y) < c, \end{cases}$$
where $Y = \sum_{i=1}^n X_i$ and $\lambda(Y) = \left(\frac{p_1}{p_0}\right)^Y \left(\frac{1 - p_1}{1 - p_0}\right)^{n - Y}$. Since $\lambda(Y)$ is increasing in $Y$, there is an integer $m > 0$ such that
$$T_*(Y) = \begin{cases} 1 & Y > m \\ \gamma & Y = m \\ 0 & Y < m, \end{cases}$$
where $m$ and $\gamma$ satisfy $\alpha = E_0[T_*(Y)] = P_0(Y > m) + \gamma P_0(Y = m)$.
Example 9.2 (cont.)

Since $Y$ has the binomial distribution $Bi(p, n)$, we can determine $m$ and $\gamma$ from
$$\alpha = \sum_{j=m+1}^{n} \binom{n}{j} p_0^j (1 - p_0)^{n-j} + \gamma \binom{n}{m} p_0^m (1 - p_0)^{n-m}.$$
Unless
$$\alpha = \sum_{j=m+1}^{n} \binom{n}{j} p_0^j (1 - p_0)^{n-j}$$
for some integer $m$, in which case we can choose $\gamma = 0$, the UMP test $T_*$ is a randomized test.

Remark

An interesting phenomenon in Example 9.2 is that the UMP test $T_*$ does not depend on $p_1$. In such a case, $T_*$ is in fact a UMP test for testing $H_0: p = p_0$ versus $H_1: p > p_0$.

Lemma 9.1

Suppose that there is a test $T_*$ of size $\alpha$ such that for every $P_1 \in \mathcal{P}_1$, $T_*$ is UMP for testing $H_0$ versus the hypothesis $P = P_1$. Then $T_*$ is UMP for testing $H_0$ versus $H_1$.
Monotone likelihood ratio family

Suppose that the distribution of $X$ is in $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$, a parametric family indexed by a real-valued $\theta$, and that $\mathcal{P}$ is dominated by a $\sigma$-finite measure $\nu$. Let $f_\theta = dP_\theta/d\nu$ be the p.d.f. of $P_\theta$. The family $\mathcal{P}$ is said to have monotone likelihood ratio in $Y(X)$ (a real-valued statistic) if and only if, for any $\theta_1 < \theta_2$, $f_{\theta_2}(x)/f_{\theta_1}(x)$ is a nondecreasing function of $Y(x)$ for values $x$ at which at least one of $f_{\theta_1}(x)$ and $f_{\theta_2}(x)$ is positive.

Lemma 9.2

Suppose that the distribution of $X$ is in a parametric family $\mathcal{P}$ indexed by a real-valued $\theta$ and that $\mathcal{P}$ has monotone likelihood ratio in $Y(X)$. If $\psi$ is a nondecreasing function of $Y$, then $g(\theta) = E[\psi(Y)]$ is a nondecreasing function of $\theta$.
Example 9.3

Let $\theta$ be real-valued and $\eta(\theta)$ a nondecreasing function of $\theta$. Then the one-parameter exponential family with
$$f_\theta(x) = \exp\{\eta(\theta) Y(x) - \xi(\theta)\} h(x)$$
has monotone likelihood ratio in $Y(X)$.

Example 9.4

Let $X_1, \ldots, X_n$ be i.i.d. from the uniform distribution on $(0, \theta)$, where $\theta > 0$. The Lebesgue p.d.f. of $X = (X_1, \ldots, X_n)$ is $f_\theta(x) = \theta^{-n} I_{(0,\theta)}(x_{(n)})$, where $x_{(n)}$ is the value of the largest order statistic $X_{(n)}$. For $\theta_1 < \theta_2$,
$$\frac{f_{\theta_2}(x)}{f_{\theta_1}(x)} = \frac{\theta_1^n\, I_{(0,\theta_2)}(x_{(n)})}{\theta_2^n\, I_{(0,\theta_1)}(x_{(n)})},$$
which is a nondecreasing function of $x_{(n)}$ for those $x$ at which at least one of $f_{\theta_1}(x)$ and $f_{\theta_2}(x)$ is positive, i.e., $x_{(n)} < \theta_2$. Hence the family of distributions of $X$ has monotone likelihood ratio in $X_{(n)}$.
Example 9.5

The following families have monotone likelihood ratio: the double exponential distribution family $\{DE(\theta, c)\}$ with a known $c$; the exponential distribution family $\{E(\theta, c)\}$ with a known $c$; the logistic distribution family $\{LG(\theta, c)\}$ with a known $c$; the hypergeometric distribution family $\{HG(r, \theta, N - \theta)\}$ with known $r$ and $N$. An example of a family that does not have monotone likelihood ratio is the Cauchy distribution family $\{C(\theta, c)\}$ with a known $c$.

Testing one-sided hypotheses

Hypotheses of the form $H_0: \theta \le \theta_0$ (or $H_0: \theta \ge \theta_0$) versus $H_1: \theta > \theta_0$ (or $H_1: \theta < \theta_0$) are called one-sided hypotheses, for any fixed constant $\theta_0$.
Theorem 9.1

Suppose that $X$ has a distribution in $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ ($\Theta \subset \mathbb{R}$) that has monotone likelihood ratio in $Y(X)$. Consider the problem of testing $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$, where $\theta_0$ is a given constant.
(i) There exists a UMP test of size $\alpha$, which is given by
$$T_*(X) = \begin{cases} 1 & Y(X) > c \\ \gamma & Y(X) = c \\ 0 & Y(X) < c, \end{cases}$$
where $c$ and $\gamma$ are determined by $\beta_{T_*}(\theta_0) = \alpha$, and $\beta_T(\theta) = E[T(X)]$ denotes the power function of a test $T$.
(ii) $\beta_{T_*}(\theta)$ is strictly increasing for all $\theta$'s for which $0 < \beta_{T_*}(\theta) < 1$.
(iii) For any $\theta < \theta_0$, $T_*$ minimizes $\beta_T(\theta)$ (the type I error probability of $T$) among all tests $T$ satisfying $\beta_T(\theta_0) = \alpha$.
(iv) Assume that $P_\theta(f_\theta(X) = c f_{\theta_0}(X)) = 0$ for any $\theta > \theta_0$ and $c \ge 0$, where $f_\theta$ is the p.d.f. of $P_\theta$. If $T$ is a test with $\beta_T(\theta_0) = \beta_{T_*}(\theta_0)$, then for any $\theta > \theta_0$, either $\beta_T(\theta) < \beta_{T_*}(\theta)$ or $T = T_*$ a.s. $P_\theta$.
(v) For any fixed $\theta_1$, $T_*$ is UMP for testing $H_0: \theta \le \theta_1$ versus $H_1: \theta > \theta_1$, with size $\beta_{T_*}(\theta_1)$.

Corollary 9.1 (one-parameter exponential families)

Suppose that $X$ has a p.d.f. in a one-parameter exponential family with $\eta$ a strictly monotone function of $\theta$. If $\eta$ is increasing, then $T_*$ given by Theorem 9.1 is UMP for testing $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$, where $\gamma$ and $c$ are determined by $\beta_{T_*}(\theta_0) = \alpha$. If $\eta$ is decreasing or the hypotheses are $H_0: \theta \ge \theta_0$ versus $H_1: \theta < \theta_0$, the result remains valid after reversing the inequalities in the definition of $T_*$.
Example 9.6

Let $X_1, \ldots, X_n$ be i.i.d. from the $N(\mu, \sigma^2)$ distribution with an unknown $\mu \in \mathbb{R}$ and a known $\sigma^2$. Consider $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$, where $\mu_0$ is a fixed constant. The p.d.f. of $X = (X_1, \ldots, X_n)$ is from a one-parameter exponential family with $Y(X) = \bar{X}$ and $\eta(\mu) = n\mu/\sigma^2$. By Corollary 9.1 and the fact that $\bar{X}$ is $N(\mu, \sigma^2/n)$, the UMP test is $T_*(X) = I_{(c_\alpha, \infty)}(\bar{X})$, where $c_\alpha = \sigma z_{1-\alpha}/\sqrt{n} + \mu_0$ and $z_a = \Phi^{-1}(a)$.

Discussion

To derive a UMP test for testing $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$ when $X$ has a p.d.f. in a one-parameter exponential family, it is essential to know the distribution of $Y(X)$. Typically, a nonrandomized test can be obtained if the distribution of $Y$ is continuous; otherwise UMP tests are randomized.
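The cutoff $c_\alpha$ in Example 9.6 is a one-line formula once $\Phi^{-1}$ is available. A sketch (the values $\mu_0 = 2$, $\sigma = 1.5$, $n = 25$, $\alpha = 0.05$ are hypothetical), which also confirms that the size at the boundary $\mu = \mu_0$ is exactly $\alpha$:

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_inv(p):
    """Inverse normal c.d.f. by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cutoff(mu0, sigma, n, alpha):
    """c_alpha = sigma * z_{1-alpha} / sqrt(n) + mu0 from Example 9.6."""
    return sigma * phi_inv(1.0 - alpha) / math.sqrt(n) + mu0

c = cutoff(mu0=2.0, sigma=1.5, n=25, alpha=0.05)
size = 1.0 - phi((c - 2.0) * math.sqrt(25) / 1.5)  # P_{mu0}(Xbar > c)
```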
Example 9.7

Let $X_1, \ldots, X_n$ be i.i.d. random variables from the Poisson distribution $P(\theta)$ with an unknown $\theta > 0$. The p.d.f. of $X = (X_1, \ldots, X_n)$ is from a one-parameter exponential family with $Y(X) = \sum_{i=1}^n X_i$ and $\eta(\theta) = \log\theta$. Note that $Y$ has the Poisson distribution $P(n\theta)$. By Corollary 9.1, a UMP test for $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$ is given by Theorem 9.1 with $c$ and $\gamma$ satisfying
$$\alpha = \sum_{j=c+1}^{\infty} \frac{e^{-n\theta_0} (n\theta_0)^j}{j!} + \gamma\, \frac{e^{-n\theta_0} (n\theta_0)^c}{c!}.$$

Example 9.8

Let $X_1, \ldots, X_n$ be i.i.d. random variables from the uniform distribution $U(0, \theta)$, $\theta > 0$. Consider the hypotheses $H_0: \theta \le \theta_0$ and $H_1: \theta > \theta_0$. The p.d.f. of $X = (X_1, \ldots, X_n)$ is in a family with monotone likelihood ratio in $Y(X) = X_{(n)}$ (Example 9.4).
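As with the binomial case, $c$ and $\gamma$ in Example 9.7 can be found by walking the Poisson c.d.f.: take the smallest $c$ with $P(Y > c) \le \alpha$ and let $\gamma$ absorb the remainder. A sketch (the values $n = 10$, $\theta_0 = 0.1$, $\alpha = 0.05$ are hypothetical, giving $n\theta_0 = 1$):

```python
import math

def pois_pmf(j, lam):
    """Poisson(lam) probability mass at j, computed on the log scale."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def ump_poisson(n, theta0, alpha):
    """Find c and gamma with alpha = P(Y > c) + gamma * P(Y = c), Y ~ Poisson(n*theta0)."""
    lam = n * theta0
    cdf, c = 0.0, 0                               # cdf holds P(Y < c)
    while cdf + pois_pmf(c, lam) < 1.0 - alpha:   # smallest c with P(Y > c) <= alpha
        cdf += pois_pmf(c, lam)
        c += 1
    tail = 1.0 - cdf - pois_pmf(c, lam)           # P(Y > c)
    gamma = (alpha - tail) / pois_pmf(c, lam)
    return c, gamma

c, gamma = ump_poisson(10, 0.1, 0.05)
```

Since $Y$ is discrete, $\gamma$ is generally nonzero, matching the discussion above: continuous $Y$ gives nonrandomized UMP tests, discrete $Y$ gives randomized ones.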
Example 9.8 (continued)

By Theorem 9.1, a UMP test is $T_*$. Since $X_{(n)}$ has the Lebesgue p.d.f. $n\theta^{-n} x^{n-1} I_{(0,\theta)}(x)$, the UMP test $T_*$ is nonrandomized and
$$\alpha = \beta_{T_*}(\theta_0) = \frac{n}{\theta_0^n} \int_c^{\theta_0} x^{n-1}\, dx = 1 - \frac{c^n}{\theta_0^n}.$$
Hence $c = \theta_0 (1 - \alpha)^{1/n}$. The power function of $T_*$ when $\theta > \theta_0$ is
$$\beta_{T_*}(\theta) = \frac{n}{\theta^n} \int_c^{\theta} x^{n-1}\, dx = 1 - \frac{\theta_0^n (1 - \alpha)}{\theta^n}.$$
In this problem, however, UMP tests are not unique. (Note that the condition $P_\theta(f_\theta(X) = c f_{\theta_0}(X)) = 0$ in Theorem 9.1(iv) is not satisfied.) It can be shown (exercise) that the following test is also UMP with size $\alpha$:
$$T(X) = \begin{cases} 1 & X_{(n)} > \theta_0 \\ \alpha & X_{(n)} \le \theta_0. \end{cases}$$
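The closed forms in Example 9.8 are easy to sanity-check, since $P_\theta(X_{(n)} > c) = 1 - (c/\theta)^n$ for $c < \theta$. The sketch below (hypothetical values $n = 5$, $\theta_0 = 2$, $\alpha = 0.05$) confirms that the size is $\alpha$ and that the two expressions for the power agree:

```python
n, theta0, alpha = 5, 2.0, 0.05
c = theta0 * (1.0 - alpha) ** (1.0 / n)   # cutoff for X_(n) from Example 9.8

def power(theta):
    """beta_{T*}(theta) = P_theta(X_(n) > c) = 1 - (c/theta)^n for theta > c."""
    return 1.0 - (c / theta) ** n

size = power(theta0)                       # should equal alpha exactly
```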
Generalized Neyman-Pearson lemma

Let $f_1, \ldots, f_{m+1}$ be functions on $\mathbb{R}^p$ integrable w.r.t. a $\sigma$-finite measure $\nu$. For given constants $t_1, \ldots, t_m$, let $\mathcal{T}$ be the class of functions $\phi$ (from $\mathbb{R}^p$ to $[0, 1]$) satisfying
$$\int \phi f_i\, d\nu \le t_i, \quad i = 1, \ldots, m, \qquad (1)$$
and let $\mathcal{T}_0$ be the set of $\phi$'s in $\mathcal{T}$ satisfying (1) with all inequalities replaced by equalities.
(i) If there are constants $c_1, \ldots, c_m$ such that
$$\phi_*(x) = \begin{cases} 1 & f_{m+1}(x) > c_1 f_1(x) + \cdots + c_m f_m(x) \\ 0 & f_{m+1}(x) < c_1 f_1(x) + \cdots + c_m f_m(x) \end{cases}$$
is a member of $\mathcal{T}_0$, then $\phi_*$ maximizes $\int \phi f_{m+1}\, d\nu$ over $\phi \in \mathcal{T}_0$.
(ii) If $c_i \ge 0$ for all $i$, then $\phi_*$ maximizes $\int \phi f_{m+1}\, d\nu$ over $\phi \in \mathcal{T}$.
(iii) The set $M = \left\{ \left( \int \phi f_1\, d\nu, \ldots, \int \phi f_m\, d\nu \right) : \phi \text{ is from } \mathbb{R}^p \text{ to } [0, 1] \right\}$ is convex and closed. If $(t_1, \ldots, t_m)$ is an interior point of $M$, then the constants $c_1, \ldots, c_m$ in (i) exist.
Two-sided hypotheses

The following hypotheses are called two-sided hypotheses:
$$H_0: \theta \le \theta_1 \text{ or } \theta \ge \theta_2 \quad \text{versus} \quad H_1: \theta_1 < \theta < \theta_2, \qquad (2)$$
$$H_0: \theta_1 \le \theta \le \theta_2 \quad \text{versus} \quad H_1: \theta < \theta_1 \text{ or } \theta > \theta_2, \qquad (3)$$
$$H_0: \theta = \theta_0 \quad \text{versus} \quad H_1: \theta \ne \theta_0, \qquad (4)$$
where $\theta_0$, $\theta_1$, and $\theta_2$ are given constants and $\theta_1 < \theta_2$.

Remark

UMP tests for hypotheses (2) exist for one-parameter exponential families and some non-exponential families. A UMP test does not exist in general for testing hypotheses (3) or (4). The reason is that UMP tests for testing one-sided hypotheses do not have level $\alpha$ for testing (2), but they are of level $\alpha$ for testing (3) or (4), and there does not exist a single test more powerful than all tests that are UMP for testing one-sided hypotheses.
Theorem 9.2 (UMP tests for two-sided hypotheses)

Suppose that $X$ has a p.d.f. w.r.t. a $\sigma$-finite measure in a one-parameter exponential family: $f_\theta(x) = \exp\{\eta(\theta) Y(x) - \xi(\theta)\} h(x)$.
(i) For testing hypotheses (2), a UMP test of size $\alpha$ is
$$T_*(X) = \begin{cases} 1 & c_1 < Y(X) < c_2 \\ \gamma_i & Y(X) = c_i,\ i = 1, 2 \\ 0 & Y(X) < c_1 \text{ or } Y(X) > c_2, \end{cases} \qquad (5)$$
where the $c_i$'s and $\gamma_i$'s are determined by
$$\beta_{T_*}(\theta_1) = \beta_{T_*}(\theta_2) = \alpha. \qquad (6)$$
(ii) $T_*$ minimizes $\beta_T(\theta)$ over $\theta < \theta_1$, $\theta > \theta_2$ and $T$ satisfying (6).
(iii) If $T$ and $T_*$ are two tests satisfying (5) and $\beta_T(\theta_1) = \beta_{T_*}(\theta_1)$, and if the region $\{T = 1\}$ is to the right of $\{T_* = 1\}$, then $\beta_T(\theta) < \beta_{T_*}(\theta)$ for $\theta > \theta_1$ and $\beta_T(\theta) > \beta_{T_*}(\theta)$ for $\theta < \theta_1$. If both $T$ and $T_*$ satisfy (5) and (6), then $T = T_*$ a.s. $\mathcal{P}$.
Example 9.9

Let $X_1, \ldots, X_n$ be i.i.d. from $N(\theta, 1)$. By Theorem 9.2, a UMP test for testing (2) is $T_*(X) = I_{(c_1, c_2)}(\bar{X})$, where the $c_i$'s are determined by
$$\Phi\!\left(\sqrt{n}(c_2 - \theta_1)\right) - \Phi\!\left(\sqrt{n}(c_1 - \theta_1)\right) = \alpha$$
and
$$\Phi\!\left(\sqrt{n}(c_2 - \theta_2)\right) - \Phi\!\left(\sqrt{n}(c_1 - \theta_2)\right) = \alpha.$$
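The two equations in Example 9.9 generally require a numerical solution. The sketch below (not from the slides; the choices $\theta_1 = -1$, $\theta_2 = 1$, $n = 4$, $\alpha = 0.05$ are hypothetical) uses nested bisection: the first equation pins $c_2$ as a function of $c_1$, and shifting the interval $(c_1, c_2)$ to the right while holding $\beta(\theta_1) = \alpha$ monotonically increases $\beta(\theta_2)$, so $c_1$ can be found by a second bisection.

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_inv(p):
    """Inverse normal c.d.f. by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_cutoffs(theta1, theta2, n, alpha):
    """Solve the two size-alpha equations of Example 9.9 for (c1, c2)."""
    rn = math.sqrt(n)

    def c2_of_c1(c1):
        # first equation: Phi(rn(c2 - theta1)) - Phi(rn(c1 - theta1)) = alpha
        return theta1 + phi_inv(alpha + phi(rn * (c1 - theta1))) / rn

    def g(c1):
        # second equation's left-hand side minus alpha; increasing in c1
        c2 = c2_of_c1(c1)
        return phi(rn * (c2 - theta2)) - phi(rn * (c1 - theta2)) - alpha

    # c1 must satisfy Phi(rn(c1 - theta1)) < 1 - alpha for a solution to exist
    hi = theta1 + phi_inv(1.0 - alpha) / rn - 1e-6
    lo = hi - 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    c1 = 0.5 * (lo + hi)
    return c1, c2_of_c1(c1)

c1, c2 = solve_cutoffs(-1.0, 1.0, 4, 0.05)  # symmetric case, so c1 = -c2
```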
Hypothesis Testing BS2 Statistical Inference, Lecture 11 Michaelmas Term 2004 Steffen Lauritzen, University of Oxford; November 15, 2004 Hypothesis testing We consider a family of densities F = {f(x; θ),
More informationTopic 10: Hypothesis Testing
Topic 10: Hypothesis Testing Course 003, 2016 Page 0 The Problem of Hypothesis Testing A statistical hypothesis is an assertion or conjecture about the probability distribution of one or more random variables.
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More informationChapter 8.8.1: A factorization theorem
LECTURE 14 Chapter 8.8.1: A factorization theorem The characterization of a sufficient statistic in terms of the conditional distribution of the data given the statistic can be difficult to work with.
More informationSummary of Chapters 7-9
Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More informationStatistical Inference
Statistical Inference Classical and Bayesian Methods Revision Class for Midterm Exam AMS-UCSC Th Feb 9, 2012 Winter 2012. Session 1 (Revision Class) AMS-132/206 Th Feb 9, 2012 1 / 23 Topics Topics We will
More informationMethods of evaluating tests
Methods of evaluating tests Let X,, 1 Xn be i.i.d. Bernoulli( p ). Then 5 j= 1 j ( 5, ) T = X Binomial p. We test 1 H : p vs. 1 1 H : p>. We saw that a LRT is 1 if t k* φ ( x ) =. otherwise (t is the observed
More informationDA Freedman Notes on the MLE Fall 2003
DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar
More informationReview Quiz. 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the
Review Quiz 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the Cramér Rao lower bound (CRLB). That is, if where { } and are scalars, then
More informationTwo hours. To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER. 21 June :45 11:45
Two hours MATH20802 To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER STATISTICAL METHODS 21 June 2010 9:45 11:45 Answer any FOUR of the questions. University-approved
More information3 Integration and Expectation
3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ
More information2014/2015 Smester II ST5224 Final Exam Solution
014/015 Smester II ST54 Final Exam Solution 1 Suppose that (X 1,, X n ) is a random sample from a distribution with probability density function f(x; θ) = e (x θ) I [θ, ) (x) (i) Show that the family of
More informationImportance Sampling and. Radon-Nikodym Derivatives. Steven R. Dunbar. Sampling with respect to 2 distributions. Rare Event Simulation
1 / 33 Outline 1 2 3 4 5 2 / 33 More than one way to evaluate a statistic A statistic for X with pdf u(x) is A = E u [F (X)] = F (x)u(x) dx 3 / 33 Suppose v(x) is another probability density such that
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationMaster s Written Examination - Solution
Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2
More informationhttp://www.math.uah.edu/stat/hypothesis/.xhtml 1 of 5 7/29/2009 3:14 PM Virtual Laboratories > 9. Hy pothesis Testing > 1 2 3 4 5 6 7 1. The Basic Statistical Model As usual, our starting point is a random
More informationLecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary
ECE 830 Spring 207 Instructor: R. Willett Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we saw that the likelihood
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More informationChapter 8 Hypothesis Testing
Leture 5 for BST 63: Statistial Theory II Kui Zhang, Spring Chapter 8 Hypothesis Testing Setion 8 Introdution Definition 8 A hypothesis is a statement about a population parameter Definition 8 The two
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationL p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by
L p Functions Given a measure space (, µ) and a real number p [, ), recall that the L p -norm of a measurable function f : R is defined by f p = ( ) /p f p dµ Note that the L p -norm of a function f may
More informationTesting Statistical Hypotheses
E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More informationOn the GLR and UMP tests in the family with support dependent on the parameter
STATISTICS, OPTIMIZATION AND INFORMATION COMPUTING Stat., Optim. Inf. Comput., Vol. 3, September 2015, pp 221 228. Published online in International Academic Press (www.iapress.org On the GLR and UMP tests
More informationMathematical Statistics
Mathematical Statistics Chapter Three. Point Estimation 3.4 Uniformly Minimum Variance Unbiased Estimator(UMVUE) Criteria for Best Estimators MSE Criterion Let F = {p(x; θ) : θ Θ} be a parametric distribution
More informationNotes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed
18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,
More informationHypothesis Testing: The Generalized Likelihood Ratio Test
Hypothesis Testing: The Generalized Likelihood Ratio Test Consider testing the hypotheses H 0 : θ Θ 0 H 1 : θ Θ \ Θ 0 Definition: The Generalized Likelihood Ratio (GLR Let L(θ be a likelihood for a random
More informationOn likelihood ratio tests
IMS Lecture Notes-Monograph Series 2nd Lehmann Symposium - Optimality Vol. 49 (2006) 1-8 Institute of Mathematical Statistics, 2006 DOl: 10.1214/074921706000000356 On likelihood ratio tests Erich L. Lehmann
More informationECE531 Lecture 4b: Composite Hypothesis Testing
ECE531 Lecture 4b: Composite Hypothesis Testing D. Richard Brown III Worcester Polytechnic Institute 16-February-2011 Worcester Polytechnic Institute D. Richard Brown III 16-February-2011 1 / 44 Introduction
More informationECE 275B Homework # 1 Solutions Version Winter 2015
ECE 275B Homework # 1 Solutions Version Winter 2015 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2
More informationChapters 10. Hypothesis Testing
Chapters 10. Hypothesis Testing Some examples of hypothesis testing 1. Toss a coin 100 times and get 62 heads. Is this coin a fair coin? 2. Is the new treatment more effective than the old one? 3. Quality
More information