Chapter 4: Theory of Tests

4.1 Introduction

Parametric model: (X, B_X, P_θ), P_θ ∈ P = {P_θ : θ ∈ Θ}, where Θ = H0 + H1 (a disjoint union).

Partition the sample space as X = K + A, where K is the critical region (rejection region) and A is the acceptance region. A decision rule d with

    d(x) = d_K   if x ∈ K,
           d_A   if x ∈ A

is called a nonrandomized test. One tries to choose K in such a way that the number of wrong decisions becomes as small as possible. We distinguish:

Type I error: H0 is correct, but is rejected (decision d_K).
Type II error: H1 is correct, but the decision is for H0 (decision d_A).

                      H0 is correct    H1 is correct
    Decision for H0   correct          type II error
    Decision for H1   type I error     correct

Given a bound α (the significance level) for the probability of committing a type I error, one tries to find a test which minimizes the probability of a type II error.
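Example: the two error probabilities can be computed explicitly in a simple Gaussian location model. The following sketch is our own illustration (the model, numbers and function names are not from the text): X1, ..., Xn i.i.d. N(θ, 1), H0: θ = 0, H1: θ = 1, and the nonrandomized rule "decide d_K iff the sample mean exceeds c".

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def error_probs(c, n, theta0=0.0, theta1=1.0, sigma=1.0):
    """Type I and type II error probabilities of the rule
    'reject H0 iff the sample mean exceeds c' when
    X_1, ..., X_n are i.i.d. N(theta, sigma^2)."""
    se = sigma / sqrt(n)
    alpha = 1.0 - norm_cdf((c - theta0) / se)  # P_{theta0}(decision d_K)
    beta = norm_cdf((c - theta1) / se)         # P_{theta1}(decision d_A)
    return alpha, beta

alpha, beta = error_probs(c=0.5, n=16)
```

Raising the cut point c lowers the type I error probability but raises the type II error probability; the two goals conflict, which is why one fixes a level α and then minimizes the probability of a type II error.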
Def. 4.1.1:
(a) A measurable function ϕ : X → [0, 1] is called a test function (a test). ϕ(x) is the probability of the decision d_K if x is the sample outcome.
(b) ϕ is called an α-level test if

    sup_{θ∈H0} E_θ[ϕ(X)] ≤ α.                                        (4.1)

(c) The probability of rejecting H0 if P_θ is the underlying distribution,

    β_ϕ(θ) := P_θ(d_K) = E_θ[ϕ(X)],

is called the power of the test; β_ϕ : Θ → [0, 1] is the power function of the test ϕ. The left-hand side of (4.1) is called the size of the test ϕ.
(d) If Φ_α is the set of all α-level tests for the test problem (H0, H1), then ϕ0 ∈ Φ_α is a most powerful test (MP test) for an alternative θ ∈ H1 if

    β_{ϕ0}(θ) ≥ β_ϕ(θ)  for all ϕ ∈ Φ_α,

and ϕ* ∈ Φ_α is a uniformly most powerful test (UMP test) for H0 against H1 of level α if

    β_{ϕ*}(θ) = sup_{ϕ∈Φ_α} β_ϕ(θ)  for all θ ∈ H1.                  (4.2)

(e) A test ϕ ∈ Φ_α is called unbiased if

    β_ϕ(θ) ≥ α  for all θ ∈ H1.                                      (4.3)

(f) A solution of (4.1), (4.2) and (4.3) is called a uniformly most powerful unbiased (UMPU) α-level test.
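Example: size, power function and unbiasedness from Def. 4.1.1 can be made concrete in a Gaussian model (our own illustration; names and numbers are not from the text). For X1, ..., Xn i.i.d. N(θ, 1) and the test "reject H0: θ ≤ 0 iff √n · X̄ > z_{0.95}", the power function is β_ϕ(θ) = 1 − Φ(z_{0.95} − √n θ).

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

Z95 = 1.6448536  # 0.95 quantile of N(0, 1)

def power(theta, n=16):
    """Power function beta_phi(theta) of the test 'reject H0: theta <= 0
    iff sqrt(n) * (sample mean) > z_{0.95}' for X_i i.i.d. N(theta, 1)."""
    return 1.0 - norm_cdf(Z95 - sqrt(n) * theta)

size = power(0.0)  # the supremum over H0 is attained at theta = 0
```

The size equals α = 0.05, and β_ϕ is increasing in θ, so β_ϕ(θ) ≥ α on H1: the test is unbiased in the sense of (4.3).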
4.2 Test of a Simple Hypothesis against a Simple Alternative

In this section Θ = {θ0, θ1}, H0 = {θ0}, H1 = {θ1}. In this case there always exists a dominating measure, e.g. the measure µ = P_{θ0} + P_{θ1}. The densities are denoted by f(·; θ0) = f0, f(·; θ1) = f1.

Theorem 4.2.1 (Fundamental Lemma of Neyman and Pearson):
(a) Any test of the form

    ϕ(x) = 1      if f1(x) > k f0(x),
           γ(x)   if f1(x) = k f0(x),                                (4.4)
           0      if f1(x) < k f0(x),

with k ≥ 0 and 0 ≤ γ(x) ≤ 1, is a most powerful test of its size α (0 ≤ α ≤ 1) for H0: θ = θ0 against H1: θ = θ1. For k = ∞, the test

    ϕ(x) = 1   if f0(x) = 0,
           0   if f0(x) > 0                                          (4.5)

is an MP test of its size α = 0 for H0 against H1.
(b) For each level α with 0 ≤ α ≤ 1 there exists a test of the form (4.4) or (4.5) with E_{θ0}[ϕ(X)] = α; here γ(x) = γ (a constant). The constants k and γ, 0 ≤ γ ≤ 1, are determined by

    α = E_{θ0}[ϕ(X)] = P_{θ0}(f1(X) > k f0(X)) + γ P_{θ0}(f1(X) = k f0(X)).   (4.6)

Remark: A test of type (4.4) is called a Neyman-Pearson test with accompanying number k.
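Example: part (b) of the Neyman-Pearson lemma can be carried out by hand in a binomial model (our own illustration; the numbers are not from the text). For X ~ Bin(n, p), H0: p = 1/2 against H1: p = p1 > 1/2, the ratio f1/f0 is increasing in x, so (4.4) rejects for large x, and (4.6) fixes the cut point c and the randomization constant γ.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial point probability P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def np_test_binomial(n, p0, alpha):
    """Cut point c and randomization constant gamma of the Neyman-Pearson
    test for H0: p = p0 against any p1 > p0: reject if X > c, reject with
    probability gamma if X = c, so the size is exactly alpha (eq. (4.6))."""
    tail = 0.0  # accumulates P_{p0}(X > c) while c walks down from n
    for c in range(n, -1, -1):
        pc = binom_pmf(c, n, p0)
        if tail + pc > alpha:
            gamma = (alpha - tail) / pc
            return c, gamma
        tail += pc
    return -1, 0.0  # alpha = 1: reject always

c, gamma = np_test_binomial(n=10, p0=0.5, alpha=0.05)
```

Note that c and γ do not depend on the particular alternative p1 > p0; this observation is the key to the UMP results of Section 4.3.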
Further remarks:
- As the reasoning in the proof shows, the case µ{f1 = k f0} = 0 leads to a nonrandomized test.
- Since the trivial α-level test ϕ* ≡ α with E_{θ0}[ϕ*(X)] = E_{θ1}[ϕ*(X)] = α does not have the form (4.4), it follows that E_{θ1}[ϕ(X)] ≥ α, which means that a Neyman-Pearson test is unbiased.
- If there exists a sufficient statistic S for the family {f0, f1}, then the NP test is a function of S.

4.3 Families with Monotone Likelihood Ratio

In this section we consider the problem of testing one-sided hypotheses for Θ ⊆ ℝ an interval. In the sequel let P = {P_θ : θ ∈ Θ} ≪ µ and assume that the µ-densities satisfy f(·; θ) > 0 µ-a.e. for all θ ∈ Θ.

Def. 4.3.1: We say that the family P has a monotone likelihood ratio (MLR) in the statistic T(X) if, for θ1 < θ2 with f(·; θ1) ≠ f(·; θ2), the ratio f(x; θ2)/f(x; θ1) is a nondecreasing (nonincreasing) function of T(x) on {x ∈ X : f(x; θ1) > 0 or f(x; θ2) > 0}.

Theorem 4.3.1: The one-parameter exponential family E1 with density

    f(x; θ) = C(θ) exp{Q(θ) T(x)} h(x)

has, for Q nondecreasing (nonincreasing), a monotone likelihood ratio in T(X).

Remark: With the reparametrization λ := Q(θ) this property can always be achieved.

Theorem 4.3.2: Let the family F = {f(·; θ) : θ ∈ Θ} have a monotone likelihood ratio in T(x). For testing H0: θ ≤ θ0 against H1: θ > θ0, any test of the form
    ϕ(x) = 1   if T(x) > c,
           γ   if T(x) = c,                                          (4.7)
           0   if T(x) < c

has a nondecreasing power function and is UMP of its size E_{θ0}[ϕ(X)] = α (provided the size α > 0).

Remark: By symmetry, Theorem 4.3.2 also yields a UMP test for the test problem H0: θ ≥ θ0 against H1: θ < θ0.

In general the results of the above theorem cannot be extended to two-sided problems. One exception is the family E1:

Theorem 4.3.3: For the family E1 there exists a UMP test of the hypothesis H0: θ ≤ θ1 or θ ≥ θ2 (θ1 < θ2) against H1: θ1 < θ < θ2 that is of the form

    ϕ(x) = 1    if c1 < T(x) < c2,
           γi   if T(x) = ci, i = 1, 2,
           0    if T(x) < c1 or T(x) > c2,

where the c's and the γ's are determined by E_{θ1}[ϕ(X)] = E_{θ2}[ϕ(X)] = α.

Remark: UMP tests for H0: θ1 ≤ θ ≤ θ2 or H0: θ = θ0 do not exist, even in the family E1.

4.4 Unbiased Tests

Unbiased tests were already encountered in Def. 4.1.1. They have the property that β_ϕ(θ) ≤ α for θ ∈ Θ0 and β_ϕ(θ) ≥ α for θ ∈ Θ1.

4.4.1 α-Similar Tests

Def. 4.4.1:
(1) Let U_α ⊆ Φ_α be the class of all unbiased size-α tests of H0.
(2) A test ϕ is said to be α-similar on a subset Θ* ⊆ Θ if β_ϕ(θ) = E_θ[ϕ(X)] = α for θ ∈ Θ*.
(3) A test is said to be similar on a set Θ* ⊆ Θ if it is α-similar on Θ* for some α ∈ [0, 1].

Theorem 4.4.1: Let β_ϕ(θ) be continuous in θ for every test ϕ. If ϕ ∈ U_α for H0 against H1, then ϕ is α-similar on the boundary Λ = Θ̄0 ∩ Θ̄1 (the common boundary of Θ0 and Θ1).

Def. 4.4.2: A test ϕ that is UMP among all α-similar tests on the boundary Λ is said to be a UMP α-similar test.

Theorem 4.4.2: Let the power function β_ϕ of every test ϕ of H0 against H1 be continuous in θ. Then a UMP α-similar test is UMP unbiased, provided its size is α.

Remark: The continuity of β_ϕ is not always easy to show.

4.4.2 Locally MP Unbiased Tests

To test the hypothesis H0: θ = θ0 we try to find a locally optimal unbiased test which, in a neighbourhood of θ0, fulfils the following conditions:
(0) β_ϕ is twice continuously differentiable with respect to θ;
(1) β_ϕ(θ0) = α;
(2) β'_ϕ(θ0) = 0;
(3) β''_ϕ(θ0) = max.

Theorem 4.4.3 (Locally MP Unbiased Tests): Let f_θ ∈ F = {f_θ : θ ∈ Θ} be twice continuously differentiable in θ. If the power function of a test ϕ, given by

    ϕ(x; k0, k1, γ) = 1   if f̈(x; θ0) > k0 f(x; θ0) + k1 ḟ(x; θ0),
                      γ   if f̈(x; θ0) = k0 f(x; θ0) + k1 ḟ(x; θ0),
                      0   if f̈(x; θ0) < k0 f(x; θ0) + k1 ḟ(x; θ0)

(here ḟ and f̈ denote the first and second partial derivatives of f with respect to θ),
fulfils the conditions (0), (1) and (2), then (3) is also fulfilled.

The question whether one can always find constants k0, k1 and γ such that (1) and (2) hold remains open, except for exponential families.

Theorem 4.4.4: Let P = E1 with µ-density f(x; θ) = C(θ) e^{θT(x)} h(x). If the power function of the test

    ϕ(x; T1, T2, γ) = 1   if T(x) ∉ [T1, T2],
                      γ   if T(x) = T1 or T(x) = T2,                 (4.8)
                      0   if T(x) ∈ (T1, T2)

fulfils (1) and (2), then also (3).

4.4.3 UMP Unbiased Tests in One-Parameter Exponential Families

Ref.: Lehmann, Testing Statistical Hypotheses (1997), pp. 134 ff.

In 4.3 we have seen that UMP tests exist for the hypotheses
(i) H0: θ ≤ θ0 against H1: θ > θ0, and
(ii) H0: θ ≤ θ1 or θ ≥ θ2 against H1: θ1 < θ < θ2,
but not for
(iii) H0: θ1 ≤ θ ≤ θ2 against H1: θ < θ1 or θ > θ2.

Theorem 4.4.5: Let P = E1 with µ-density f(x; θ) = C(θ) e^{θT(x)} h(x). Then there exists a UMP unbiased test, which is given by (4.8), where the constants T1, T2 and γ are determined by

    E_{θ1}[ϕ(X)] = E_{θ2}[ϕ(X)] = α.                                 (4.9)

4.4.4 Invariant Tests

Def. 4.4.3:
(1) A group G of transformations on X leaves the hypothesis testing problem invariant if it leaves both {P_θ : θ ∈ Θ0} and {P_θ : θ ∈ Θ1} invariant.
(2) We say that ϕ is invariant under G if ϕ(g(x)) = ϕ(x) for all x ∈ X and g ∈ G.
(3) A statistic T is
    (a) invariant under G, if T(g(x)) = T(x) for all x ∈ X and g ∈ G;
    (b) maximal invariant, if T(x1) = T(x2) implies x1 = g(x2) for some g ∈ G.

Def. 4.4.4: Let ΦI_α denote the set of all invariant tests of size α with respect to G for H0: θ ∈ Θ0 against H1: θ ∈ Θ1. If there exists a UMP test in ΦI_α, then we call it a UMP invariant test of H0 against H1.

Theorem 4.4.6: Let T be maximal invariant with respect to G. Then ϕ is invariant under G if and only if ϕ is a function of T.

Remark: If a hypothesis testing problem is invariant under a group G, it suffices to restrict attention to functions of a maximal invariant statistic T.

4.5 Likelihood Ratio Tests

Let P = {P_θ : θ ∈ Θ} ≪ µ and Θ = Θ0 + Θ1. In many cases UMP tests do not exist, and where they do exist the approach can only be applied to particular families of distributions. The likelihood ratio (LR) test is an intuitive and plausible procedure which often leads to UMPU tests.

Def. 4.5.1: For testing H0 against H1, a test of the form "reject H0 if and only if λ(x) > c", where c is some constant and

    λ(x) = sup_{θ∈Θ} f(x1, ..., xn; θ) / sup_{θ∈Θ0} f(x1, ..., xn; θ)
         = f(x1, ..., xn; θ̂_ML) / f(x1, ..., xn; θ̃),
is called a likelihood ratio test. Here θ̂_ML is the unrestricted maximum likelihood estimator and θ̃ is the ML estimator under the restriction θ ∈ Θ0. The constant c is determined from the size restriction

    sup_{θ∈Θ0} P_θ({x : λ(x) > c}) = α.

It is easily seen that, for testing a simple hypothesis against a simple alternative at a given size α (0 ≤ α ≤ 1), nonrandomized Neyman-Pearson tests and LR tests are equivalent whenever they exist; the LR test for θ ∈ Θ0 against θ ∈ Θ1 is a function of every sufficient statistic S for θ (see Theorem 2.2.1 resp. 2.2.2).

Theorem 4.5.1: Let the regularity conditions of Theorem 3.2.5 (Cramér-Rao inequality) hold. Then under H0 the statistic 2 ln λ(X) is asymptotically distributed as a χ² random variable with degrees of freedom equal to the difference between the number of independent parameters in Θ and the number in Θ0.

4.6 Asymptotic Tests

For 4.6.1-4.6.3 see Buse, The American Statistician, 1982, 36, pp. 153-157.

Let Θ ⊆ ℝ^k and consider H0: h(θ) = 0, where h : ℝ^k → ℝ^r (r ≤ k).

4.6.1 Wald Test

See: Wald, Transactions of the American Mathematical Society, 1943, pp. 426-482.

Let R_θ := ∂h(θ)/∂θ^T with rank(R_θ) = r, and

    W = h(θ̂_ML)^T [R_{θ̂_ML} [I(θ̂_ML)]^{-1} R_{θ̂_ML}^T]^{-1} h(θ̂_ML),
where θ̂_ML is the unrestricted ML estimator. Under H0, W is asymptotically χ²(r)-distributed, and the test is of the form

    ϕ(x) = 1   if W > c,
           0   if W ≤ c

for a certain constant c.

4.6.2 Lagrange Multiplier Test

It is based on the Lagrange multiplier approach:

    Φ(θ; η) = l(θ) + η^T h(θ),

where l is the log-likelihood and η is the vector of Lagrange multipliers. Let θ̂^(r) = arg sup_{θ∈H0} L(θ; X1, ..., Xn). The test statistic is

    LM = l̇(θ̂^(r))^T [I(θ̂^(r))]^{-1} l̇(θ̂^(r)) = Ψ(θ̂^(r))^T [I(θ̂^(r))]^{-1} Ψ(θ̂^(r)),

where l̇(θ̂^(r)) = Ψ(θ̂^(r)) is the score function ∂ log f_θ/∂θ at θ = θ̂^(r). Under H0, LM has an asymptotic χ²(r) distribution, and the test has the same form as in 4.6.1.

4.6.3 Likelihood Ratio Test

It works as described in 4.5, where for determining the constant c the asymptotic χ²(r) distribution is used.

4.7 Goodness-of-Fit Tests

We consider the general testing problem H0: P ∈ P0 against H1: P ∈ P1, where P = P0 + P1.

Def. 4.7.1: A sequence of tests (ϕn(X))_{n∈ℕ} is called consistent for the testing problem H0: P ∈ P0 against H1: P ∈ P1 if

    lim_{n→∞} β_{ϕn}(P) = 1   if P^X ∈ P1,
                          0   if P^X ∈ P0.
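Example: before turning to concrete goodness-of-fit statistics, the three asymptotic statistics of 4.6 can be compared in a Bernoulli model (our own numerical sketch; the model, data and function names are not from the text). For s successes in n trials and H0: p = p0 (so r = 1), all three statistics are asymptotically χ²(1) under H0.

```python
from math import log

def wald_lm_lr(n, s, p0):
    """Wald, Lagrange-multiplier (score) and likelihood-ratio statistics
    for H0: p = p0 in a Bernoulli sample with s successes out of n.
    Fisher information of the sample: I(p) = n / (p (1 - p))."""
    phat = s / n  # unrestricted MLE

    def loglik(p):
        return s * log(p) + (n - s) * log(1.0 - p)

    W = (phat - p0) ** 2 * n / (phat * (1.0 - phat))  # information at the MLE
    score = s / p0 - (n - s) / (1.0 - p0)             # l'(p) at the restricted estimate
    LM = score ** 2 * p0 * (1.0 - p0) / n             # score^2 / I(p0)
    LR = 2.0 * (loglik(phat) - loglik(p0))
    return W, LM, LR

W, LM, LR = wald_lm_lr(n=100, s=60, p0=0.5)
```

In this example LM ≤ LR ≤ W; all three exceed the 0.95 quantile of χ²(1) (about 3.84), so each test rejects H0 at level 0.05.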
For consistent tests the power function converges to the ideal power function, which for the one-sided problem H0: θ ≤ θ0 against H1: θ > θ0 is given by the Heaviside function

    H_{θ0}(θ) = 1   if θ > θ0,
                0   if θ ≤ θ0.

First we look at the (two-sided) testing problem H0: P^X = P0 with c.d.f. F0 against H1: P^X ≠ P0. Here the class P is the class of all distributions with densities with respect to Lebesgue measure. In H0 the c.d.f. F0 has to be specified completely. According to the Glivenko-Cantelli lemma (Theorem 1.2) the empirical c.d.f. F_n converges almost surely uniformly to F0. For the maximal difference

    Δ_n := sup_{x∈ℝ} |F_n(x) - F0(x)|

the following result holds:

Theorem 4.7.1 (Kolmogorov): Let the c.d.f. F0 be continuous. Then

    lim_{n→∞} P(√n Δ_n ≤ z) = H(z),

where

    H(z) = Σ_{k=-∞}^{∞} (-1)^k e^{-2k²z²}   for z > 0,
    H(z) = 0                                for z ≤ 0.

The limit distribution H obviously does not depend on F0. Hence the asymptotic test

    ϕ(x) = 1   if √n Δ_n > k,
           0   if √n Δ_n ≤ k,

with k equal to the (1-α) quantile of H, is distribution-free. The Glivenko-Cantelli lemma ensures consistency.
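Example: both the statistic Δn and the limit c.d.f. H are easy to evaluate numerically (our own sketch; the sample values are hypothetical). For continuous F0 the supremum defining Δn is attained at the order statistics.

```python
from math import exp

def kolmogorov_H(z, terms=100):
    """Kolmogorov limit c.d.f. H(z) = sum_{k=-inf}^{inf} (-1)^k exp(-2 k^2 z^2),
    truncated to a finite number of terms (the series converges very fast)."""
    if z <= 0:
        return 0.0
    return 1.0 + 2.0 * sum((-1) ** k * exp(-2.0 * k * k * z * z)
                           for k in range(1, terms + 1))

def kolmogorov_delta(sample, F0):
    """Delta_n = sup_x |F_n(x) - F_0(x)| for a continuous F_0, evaluated
    at the order statistics, where the supremum is attained."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        d = max(d, i / n - F0(x), F0(x) - (i - 1) / n)
    return d

# hypothetical sample checked against F_0(x) = x on [0, 1]
dn = kolmogorov_delta([0.10, 0.24, 0.43, 0.57, 0.71, 0.92], lambda x: x)
```

The asymptotic test then compares √n · Δn with the (1-α) quantile of H, e.g. approximately 1.358 for α = 0.05.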
A further asymptotic goodness-of-fit test is the so-called χ² goodness-of-fit test. It is based on a comparison between the observed frequencies and the frequencies expected under H0. The starting point is the following asymptotic result.

Theorem 4.7.2: Let the random vector (X1, ..., Xk) have a multinomial M(n; p1, ..., pk) distribution with 0 < p_i < 1, Σ_{i=1}^k p_i = 1. Then the statistic

    X² = Σ_{i=1}^k (X_i - n p_i)² / (n p_i),   X1 + ... + Xk = n,

is asymptotically χ²(k-1) distributed.

The χ²-test is

    ϕ(x) = 1   if X² > c,
           0   if X² ≤ c,

where c is the (1-α) quantile of the χ²(k-1) distribution.

Some rules of prudence are appropriate when this test is applied in practice. For continuous F0 an appropriate division of ℝ into k classes is necessary, with p_i = F0(x_i) - F0(x_{i-1}). The p_i should be approximately equal, and as a rule of thumb n p_i ≥ 5 for all i = 1, ..., k is recommended. The test is also applicable if the parameters of some F_θ are estimated by ML (with a corresponding reduction of the degrees of freedom).
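Example: the statistic of Theorem 4.7.2 in a die-fairness check (our own hypothetical data; the counts are not from the text). With n = 120 throws and H0: all six faces equally likely, every class satisfies the rule of thumb n·p_i = 20 ≥ 5.

```python
def chi2_statistic(observed, probs):
    """Pearson statistic X^2 = sum_i (O_i - n p_i)^2 / (n p_i),
    asymptotically chi^2(k - 1) under H0 (Theorem 4.7.2)."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

# hypothetical observed face counts from n = 120 throws of a die
X2 = chi2_statistic([25, 17, 15, 23, 24, 16], [1.0 / 6.0] * 6)
```

Here X² = 5.0 stays below the 0.95 quantile of χ²(5), approximately 11.07, so H0 would not be rejected at level 0.05.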