Chapter 6: Testing


1. Neyman Pearson Tests
2. Unbiased Tests; Conditional Tests; Permutation Tests
   2.1 Unbiased tests
   2.2 Application to 1-parameter exponential families
   2.3 UMPU tests for families with nuisance parameters via conditioning
   2.4 Application to general exponential families
   2.5 Permutation tests
3. Invariance in Testing; Rank Methods
   3.1 Notation and basic results
   3.2 Orbits and maximal invariants
   3.3 Examples
   3.4 UMP G-invariant tests
   3.5 Rank tests
4. Local Asymptotic Theory of Tests: (more) contiguity theory
5. Confidence Sets and p-values
6. Global / Omnibus tests: tests based on empirical distributions and measures


1 Neyman Pearson Tests

Basic Notation. Consider the hypothesis testing problem as in Examples 5.1.4 and 5.5.4, but with Θ_0 = {0}, Θ_1 = {1} (simple hypotheses). Let φ be a critical function (or decision rule); let α = size or level ≡ E_0 φ(X); and let β = power ≡ E_1 φ(X).

Theorem 1.1 (Neyman-Pearson lemma). Let P_0 and P_1 have densities p_0 and p_1 with respect to some dominating measure μ (recall that μ = P_0 + P_1 always works). Let 0 ≤ α ≤ 1. Then:
(i) There exists a constant k and a critical function φ of the form

(1)    φ(x) = 1 if p_1(x) > k p_0(x),   φ(x) = 0 if p_1(x) < k p_0(x),

such that

(2)    E_0 φ(X) = α.

(ii) The test of (1) and (2) is a most powerful level α test of P_0 versus P_1.
(iii) If φ' is a most powerful level α test of P_0 versus P_1, then it must be of the form (1) a.e. μ. It also satisfies (2) unless there is a test of size < α with power = 1.

Corollary 1 If 0 < α < 1 and β is the power of the most powerful level α test, then α < β unless P_0 = P_1.

Proof. Let 0 < α < 1.
(i) Now P_0(p_1(X) > c p_0(X)) = P_0(Y ≡ p_1(X)/p_0(X) > c) ≡ 1 − F_Y(c). Let k ≡ inf{c : 1 − F_Y(c) < α}, and if P_0(Y = k) > 0, let γ ≡ (α − P_0(Y > k))/P_0(Y = k). Thus with

    φ(x) = 1 if p_1(x) > k p_0(x),  γ if p_1(x) = k p_0(x),  0 if p_1(x) < k p_0(x),

we have E_0 φ(X) = P_0(Y > k) + γ P_0(Y = k) = α.

(ii) Let φ' be another test with E_0 φ' ≤ α. Now

    ∫_X (φ − φ')(p_1 − k p_0) dμ = ∫_{[φ−φ'>0]} (φ − φ')(p_1 − k p_0) dμ + ∫_{[φ−φ'<0]} (φ − φ')(p_1 − k p_0) dμ ≥ 0,

since φ − φ' > 0 only where p_1 ≥ k p_0 and φ − φ' < 0 only where p_1 ≤ k p_0; and this implies that

    β_φ − β_{φ'} = ∫_X (φ − φ') p_1 dμ ≥ k ∫_X (φ − φ') p_0 dμ = k(α − E_0 φ') ≥ 0.

Thus φ is most powerful.
(iii) Let φ' be most powerful of level α. Define φ as in (i). Then

    ∫_X (φ − φ')(p_1 − k p_0) dμ = ∫_{[φ≠φ'] ∩ [p_1 − k p_0 ≠ 0]} (φ − φ')(p_1 − k p_0) dμ,

which is ≥ 0 as in (ii), is > 0 if μ([φ ≠ φ'] ∩ [p_1 ≠ k p_0]) > 0, and equals 0 since > 0 would contradict φ' being most powerful. Thus μ([φ ≠ φ'] ∩ [p_1 ≠ k p_0]) = 0; that is, φ = φ' on the set where p_1 ≠ k p_0. If φ' were of size < α and power < 1, then it would be possible to include more points (or parts of points) in the rejection region, and thereby increase the power, until either the power is 1 or the size is α. Thus either E_0 φ'(X) = α or E_1 φ'(X) = 1.

Corollary 1 proof. φ#(x) ≡ α has power α, so β ≥ α. If β = α, then φ# ≡ α is in fact most powerful; and hence (iii) shows that φ(x) ≡ α satisfies (1); that is, p_1(x) = k p_0(x) a.e. μ. Thus k = 1 and P_1 = P_0.

If α = 0, let k = ∞ and φ(x) = 1 whenever p_1(x)/p_0(x) = ∞; this is (1) with γ = 1. If α = 1, let k = 0 and γ = 1, so that we reject for all x with p_1(x) > 0 or p_0(x) > 0.

Definition 1.1 If the family of densities {p_θ : θ ∈ [θ_0, θ_1] ⊂ R} is such that p_{θ'}(x)/p_θ(x) is nondecreasing in T(x) for each θ < θ', then the family is said to have monotone likelihood ratio (MLR) in T.

Definition 1.2 A test is of size α if

    sup_{θ ∈ Θ_0} E_θ φ(X) = α.

Let C_α ≡ {φ : φ is of size ≤ α}. A test φ_0 is uniformly most powerful of size α (UMP of size α) if it has size α and

    E_θ φ_0(X) ≥ E_θ φ(X) for all θ ∈ Θ_1 and all φ ∈ C_α.
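The (k, γ) construction in the proof of Theorem 1.1(i) is easily carried out explicitly on a finite sample space. A minimal sketch in Python, assuming the illustrative pair Binomial(10, 0.5) versus Binomial(10, 0.7) (my choice, not from the text):

```python
import numpy as np
from scipy.stats import binom

# Randomized Neyman-Pearson construction from the proof of (i),
# on a finite sample space.  The two distributions are illustrative.
alpha = 0.05
x = np.arange(11)
p0, p1 = binom.pmf(x, 10, 0.5), binom.pmf(x, 10, 0.7)
ratio = p1 / p0                               # Y = p1(X)/p0(X)

# k = inf{c : P0(Y > c) < alpha}; scan the finitely many ratio values.
k = next(c for c in np.sort(np.unique(ratio)) if p0[ratio > c].sum() <= alpha)
gamma = (alpha - p0[ratio > k].sum()) / p0[ratio == k].sum()

size = p0[ratio > k].sum() + gamma * p0[ratio == k].sum()    # equals alpha
power = p1[ratio > k].sum() + gamma * p1[ratio == k].sum()   # beta > alpha
print(k, gamma, size, power)
```

Since this likelihood ratio is monotone in x, the resulting test simply rejects for large x, randomizing at the boundary point, in agreement with the MLR theory of Theorem 1.2 below.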

Theorem 1.2 (Karlin-Rubin). Suppose that X has density p_θ with MLR in T(x).
(i) There exists a UMP level α test of H : θ ≤ θ_0 versus K : θ > θ_0 which is of the form

    φ(x) = 1 if T(x) > c,  γ if T(x) = c,  0 if T(x) < c,

with E_{θ_0} φ(X) = α.
(ii) β(θ) ≡ E_θ φ(X) is increasing in θ for β(θ) < 1.
(iii) For all θ_1, this same test is the UMP level β(θ_1) test of H : θ ≤ θ_1 versus K : θ > θ_1.
(iv) For all θ < θ_0, the test of (i) minimizes β(θ) among all tests satisfying α = E_{θ_0} φ.

Proof. (i) and (ii): The most powerful level α test of θ_0 versus a fixed θ_1 > θ_0 is the φ above, by the Neyman-Pearson lemma, which guarantees the existence of c and γ; since this φ does not depend on θ_1, it is UMP of θ_0 versus θ > θ_0. According to the N-P lemma, the same test is most powerful of θ_1 versus θ_2 for any θ_1 < θ_2; thus (ii) follows from the N-P corollary. Thus φ is also of level α in the smaller class of tests of H versus K, and hence is UMP there also: note that with C_α ≡ {φ : sup_{θ ≤ θ_0} E_θ φ = α} and C^{θ_0}_α ≡ {φ : E_{θ_0} φ ≤ α}, we have C_α ⊂ C^{θ_0}_α.
(iii) The same argument works.
(iv) To minimize power, just apply the N-P lemma with the inequalities reversed.

Example 1.1 (Hypergeometric). Suppose that we sample without replacement n items from a population of N items of which θ = D are defective. Let X ≡ number of defective items in the sample. Then

    P_D(X = x) ≡ p_D(x) = C(D, x) C(N−D, n−x) / C(N, n),  for 0 ∨ (n − (N − D)) ≤ x ≤ D ∧ n.

Since

    p_{D+1}(x) / p_D(x) = (D + 1)(N − D − n + x) / ( (N − D)(D + 1 − x) )

is increasing in x, this family of distributions has MLR in T(X) = X. Thus the UMP test of H : D ≤ D_0 versus K : D > D_0 rejects H if X is "too big": φ(X) = 1{X > c} + γ 1{X = c} where P_{D_0}(X > c) + γ P_{D_0}(X = c) = α. Reminder: E(X) = nD/N and Var(X) = n(D/N)(1 − D/N)(1 − (n−1)/(N−1)).

Example 1.2 (One-parameter exponential families). Suppose that p_θ(x) = c(θ) exp(Q(θ) T(x)) h(x) with respect to the dominating measure μ, where Q(θ) is increasing in θ. Then

    φ(x) = 1 if T(x) > c,  γ if T(x) = c,  0 if T(x) < c,

with E_{θ_0} φ(X) = α, is UMP level α for testing H : θ ≤ θ_0 versus K : θ > θ_0. [See pages 70-71 in TSH for binomial, negative binomial, Poisson, and exponential examples.]
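A short sketch of the randomized UMP test of Example 1.1; the population size, sample size, and D_0 below are illustrative values, not from the text.

```python
import numpy as np
from scipy.stats import hypergeom

# UMP one-sided test for the hypergeometric: find c and gamma with
# P_{D0}(X > c) + gamma * P_{D0}(X = c) = alpha.
Npop, n, D0, alpha = 50, 10, 5, 0.05
X0 = hypergeom(Npop, D0, n)               # X under D = D0
c = 0
while X0.sf(c) > alpha:                   # sf(c) = P(X > c)
    c += 1
gamma = (alpha - X0.sf(c)) / X0.pmf(c)    # randomize on {X = c}
print(c, gamma)

# Power at alternatives D > D0 (increasing in D, by MLR):
for D in (10, 15, 20):
    XD = hypergeom(Npop, D, n)
    print(D, XD.sf(c) + gamma * XD.pmf(c))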

Example 1.3 (Noncentral t, χ², and F distributions). The noncentral t, χ², and F distributions have MLR in their noncentrality parameters. See Lehmann and Romano, page 224, for the t distribution; see Lehmann and Romano, problem 7.4, page 307, for the χ² and F distributions.

Example 1.4 (Counterexample: Cauchy location family). The Cauchy location family

    p_θ(x) = 1 / ( π(1 + (x − θ)²) )

does not have MLR.

Theorem 1.3 (Generalized Neyman-Pearson lemma). Let f_0, f_1, ..., f_m be real-valued, μ-integrable functions defined on a Euclidean space X. Let φ_0 be any function of the form

    φ_0(x) = 1 if f_0(x) > k_1 f_1(x) + ··· + k_m f_m(x),
             γ(x) if f_0(x) = k_1 f_1(x) + ··· + k_m f_m(x),
             0 if f_0(x) < k_1 f_1(x) + ··· + k_m f_m(x),

where 0 ≤ γ(x) ≤ 1. Then φ_0 maximizes ∫ φ f_0 dμ over all φ, 0 ≤ φ ≤ 1, such that ∫ φ f_i dμ = ∫ φ_0 f_i dμ, i = 1, ..., m. If k_j ≥ 0 for j = 1, ..., m, then φ_0 maximizes ∫ φ f_0 dμ over all functions φ, 0 ≤ φ ≤ 1, such that ∫ φ f_i dμ ≤ ∫ φ_0 f_i dμ, i = 1, ..., m.

Proof. Note that

    ∫ (φ_0 − φ)( f_0 − Σ_{j=1}^m k_j f_j ) dμ ≥ 0

since the integrand is ≥ 0 by the definition of φ_0. Hence

    ∫ (φ_0 − φ) f_0 dμ ≥ Σ_{j=1}^m k_j ∫ (φ_0 − φ) f_j dμ ≥ 0

in either of the above cases, and hence ∫ φ_0 f_0 dμ ≥ ∫ φ f_0 dμ.

This is a short form of the generalized Neyman-Pearson lemma; for a long form with more details and existence results, see Lehmann and Romano, TSH, page 77.

Example 1.5 Suppose that X_1, ..., X_n are i.i.d. from the Cauchy location family

    p(x; θ) = 1 / ( π(1 + (x − θ)²) ),

and consider testing H : θ = θ_0 versus K : θ > θ_0. Can we find a test φ of size α such that φ maximizes

    (d/dθ) β_φ(θ)|_{θ=θ_0} = (d/dθ) E_θ φ(X)|_{θ=θ_0} ?

For any test φ the power is given by

    β_φ(θ) = E_θ φ(X) = ∫ φ(x) p(x; θ) dx,

so, if the interchange of d/dθ and ∫ is justifiable, then

    β'_φ(θ) = ∫ φ(x) (∂/∂θ) p(x; θ) dx.

Thus, by the generalized N-P lemma, any test of the form

    φ(x) = 1 if (∂/∂θ) p(x; θ_0) > k p(x; θ_0),
           γ(x) if (∂/∂θ) p(x; θ_0) = k p(x; θ_0),
           0 if (∂/∂θ) p(x; θ_0) < k p(x; θ_0),

maximizes β'_φ(θ_0) among all φ with E_{θ_0} φ(X) ≤ α. This test is said to be locally most powerful of size α; cf. Ferguson, section 5.5, page 235. But (∂/∂θ) p(x; θ_0) > k p(x; θ_0) is equivalent to

    (∂/∂θ) p(x; θ_0) / p(x; θ_0) > k,  or  (∂/∂θ) log p(x; θ_0) > k,  or
    S_n(θ_0) ≡ Σ_{i=1}^n l̇_θ(X_i; θ_0) > k.

Hence for the Cauchy family (with θ_0 = 0 without loss of generality), since

    (∂/∂θ) log p(x; θ) = 2(x − θ) / (1 + (x − θ)²),

the locally most powerful test is given by

    φ(x) = 1 if n^{−1/2} Σ_{i=1}^n 2X_i/(1 + X_i²) > k,  0 if n^{−1/2} Σ_{i=1}^n 2X_i/(1 + X_i²) < k,

where k is such that E_0 φ(X) = α. Under θ = θ_0 = 0, with Y_i ≡ 2X_i/(1 + X_i²), E_0 Y_i = 0 and Var_0(Y_i) = 1/2. Hence, by the CLT, k may be approximated by 2^{−1/2} z_α where P(Z > z_α) = α. (It would be possible to refine this first-order approximation to k by way of an Edgeworth expansion; see e.g. Shorack (2000), page 392.)

Note that x/(1 + x²) → 0 as |x| → ∞. Thus, if α < 1/2 so that k > 0, the rejection set of φ is a bounded set in R^n; and since the probability that X = (X_1, ..., X_n) is in any given bounded set tends to 0 as θ → ∞, β_φ(θ) → 0 as θ → ∞.
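A quick Monte Carlo sketch of this locally most powerful test; the sample size n = 50, level α = 0.05, and alternative θ = 0.5 are illustrative choices.

```python
import numpy as np
from scipy.stats import norm, cauchy

# Locally most powerful test of Example 1.5: reject when
# n^{-1/2} * sum 2 X_i/(1 + X_i^2) > k, with k ~ 2^{-1/2} z_alpha.
rng = np.random.default_rng(0)
n, alpha = 50, 0.05
k = norm.isf(alpha) / np.sqrt(2.0)        # first-order approximation to k

def lmp_stat(x):
    return np.sum(2.0 * x / (1.0 + x**2)) / np.sqrt(len(x))

# Size under theta = 0 and power at theta = 0.5 (illustrative values):
for theta in (0.0, 0.5):
    rej = [lmp_stat(cauchy.rvs(loc=theta, size=n, random_state=rng)) > k
           for _ in range(10000)]
    print(theta, np.mean(rej))
```

The Monte Carlo size should be close to α; the power at moderate θ is well above α, but (as noted above) it decays back to 0 for very large θ.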

Consistency of Neyman-Pearson tests

Let P and Q be probability measures, and suppose that p and q are their densities with respect to a common σ-finite measure μ on (X, A). Recall that the Hellinger distance H(P, Q) between P and Q is given by

    H²(P, Q) = (1/2) ∫ ( √p − √q )² dμ = 1 − ∫ √(pq) dμ = 1 − ρ(P, Q),

where ρ(P, Q) ≡ ∫ √(pq) dμ is the Hellinger affinity between P and Q.

Proposition 1.1 H(P, Q) = 0 if and only if p = q a.e. μ, if and only if ρ(P, Q) = 1. Furthermore ρ(P, Q) = 0 if and only if √p ⊥ √q in the Hilbert space L_2(μ).

Recall that if X_1, ..., X_n are i.i.d. P or Q with joint densities

    p_n(x) = Π_{i=1}^n p(x_i)  or  q_n(x) = Π_{i=1}^n q(x_i),

then ρ(P^n, Q^n) = ρ(P, Q)^n → 0 unless p = q a.e. μ (which implies ρ(P, Q) = 1).

Theorem 1.4 (Size and power consistency of Neyman-Pearson type tests). For testing p versus q, the test

    φ_n(x) = 1 if q_n(x) > k_n p_n(x),  0 if q_n(x) < k_n p_n(x),

with 0 < a_1 ≤ k_n ≤ a_2 < ∞ for all n, is size and power consistent if P ≠ Q: both probabilities of error converge to zero as n → ∞. In fact,

    E_P φ_n(X) ≤ k_n^{−1/2} ρ(P, Q)^n ≤ a_1^{−1/2} ρ(P, Q)^n,
    E_Q(1 − φ_n(X)) ≤ k_n^{1/2} ρ(P, Q)^n ≤ a_2^{1/2} ρ(P, Q)^n.

Proof. For the type I error probability, since p_n ≤ k_n^{−1/2} p_n^{1/2} q_n^{1/2} on the set where φ_n = 1, we have

    E_P φ_n(X) = ∫ φ_n(x) p_n(x) dμ(x)
               ≤ k_n^{−1/2} ∫ φ_n(x) p_n^{1/2}(x) q_n^{1/2}(x) dμ(x)
               ≤ k_n^{−1/2} ∫ p_n^{1/2}(x) q_n^{1/2}(x) dμ(x)
               = k_n^{−1/2} ρ(P^n, Q^n) = k_n^{−1/2} ρ(P, Q)^n.

The argument for type II errors is similar:

    E_Q(1 − φ_n(X)) = ∫ (1 − φ_n(x)) q_n(x) dμ(x)
                    ≤ k_n^{1/2} ∫ (1 − φ_n(x)) p_n^{1/2}(x) q_n^{1/2}(x) dμ(x)
                    ≤ k_n^{1/2} ∫ p_n^{1/2}(x) q_n^{1/2}(x) dμ(x)
                    = k_n^{1/2} ρ(P^n, Q^n) = k_n^{1/2} ρ(P, Q)^n.
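To see the geometric decay of the error bounds quantitatively, the following sketch computes ρ(P, Q) numerically for two normal densities (an illustrative choice; for N(0,1) versus N(1,1) the affinity is exp(−1/8)) and evaluates the bound of Theorem 1.4 with k_n ≡ 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hellinger affinity rho(P, Q) and the bound of Theorem 1.4
# for P = N(0,1), Q = N(1,1) (illustrative distributions).
p = norm(0, 1).pdf
q = norm(1, 1).pdf
rho, _ = quad(lambda x: np.sqrt(p(x) * q(x)), -np.inf, np.inf)
print("rho =", rho)                        # = exp(-1/8) ~ 0.8825 here

k = 1.0                                    # take k_n = 1 for all n
for n in (10, 50, 100):
    print(n, k**(-0.5) * rho**n)           # bound on the type I error
```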

Now suppose that P = P_{θ_0} and Q = P_{θ_n}, where P_θ ∈ P ≡ {P_θ : θ ∈ Θ} is Hellinger differentiable at θ_0 and θ_n = θ_0 + n^{−1/2} h. Thus

    n H²(P_{θ_0}, P_{θ_n}) = (n/2) ∫ { √p_{θ_n} − √p_{θ_0} }² dμ → (1/8) h^T I(θ_0) h,

and consequently

    ρ(P_{θ_0}, P_{θ_n})^n = ( 1 − n H²(P_{θ_0}, P_{θ_n}) / n )^n → exp( −(1/8) h^T I(θ_0) h ).

Hence, from the same argument used to prove Theorem 1.4,

(a)    lim sup_n E_{θ_0} φ_n(X) ≤ a_1^{−1/2} exp( −(1/8) h^T I(θ_0) h ),

while

(b)    lim sup_n E_{θ_n}(1 − φ_n(X)) ≤ a_2^{1/2} exp( −(1/8) h^T I(θ_0) h ).

If we choose k_n = a ≡ a_1 = a_2 for all n and fix h and a so that a^{−1/2} exp(−h^T I(θ_0)h/8) = α, then a^{1/2} = α^{−1} exp(−h^T I(θ_0)h/8), and hence the RHS of (b) is given by α^{−1} exp(−h^T I(θ_0)h/4).

2 Unbiased Tests; Conditional Tests; Permutation Tests

2.1 Unbiased Tests

Notation: Consider testing H : θ ∈ Θ_0 versus K : θ ∈ Θ_1 where X ~ P_θ, for some θ ∈ Θ = Θ_0 + Θ_1, is observed. Let φ denote a critical (or test) function.

Definition 2.1 φ is unbiased if β_φ(θ) ≥ α for all θ ∈ Θ_1 and β_φ(θ) ≤ α for all θ ∈ Θ_0. φ is similar on the boundary (SOB) if β_φ(θ) = α for all θ ∈ Θ̄_0 ∩ Θ̄_1 ≡ Θ_B.

Remark 2.1 If φ is a UMP level α test, then φ is unbiased. Proof: compare φ with the trivial test function φ_0 ≡ α.

Remark 2.2 If φ is unbiased and β_φ(θ) is continuous for θ ∈ Θ, then φ is SOB. Proof: Let θ_n's in Θ_0 converge to θ_0 ∈ Θ_B. Then β_φ(θ_0) = lim_n β_φ(θ_n) ≤ α. Similarly β_φ(θ_0) ≥ α by considering θ_n's in Θ_1 converging to θ_0. Hence β_φ(θ_0) = α.

Definition 2.2 A uniformly most powerful unbiased level α test is a test φ_0 for which E_θ φ_0 ≥ E_θ φ for all θ ∈ Θ_1 and for all unbiased level α tests φ.

Lemma 2.1 Suppose P = {P_θ : θ ∈ Θ} is such that β_φ(θ) is continuous for all test functions φ. If φ_0 is UMP SOB for H versus K and φ_0 is of level α for H versus K, then φ_0 is UMP unbiased (UMPU) for H versus K.

Proof. The unbiased tests are a subset of the SOB tests by Remark 2.2. Since φ_0 is UMP SOB, it is thus at least as powerful as any unbiased test. But φ_0 is unbiased since its power is greater than or equal to that of the SOB test φ ≡ α, and since it is of level α. Thus φ_0 is UMPU.

Remark 2.3 For a multiparameter exponential family with densities

    (dP_θ/dμ)(x) = c(θ) exp( Σ_{j=1}^d θ_j T_j(x) ),

with θ = (θ_1, ..., θ_d) ∈ R^d, the power function β_φ(θ) is continuous in θ for all φ.

Proof. Apply Theorem 2.7.1 of Chapter 2 of Lehmann and Romano (2005) with φ ≡ 1 to find that c(θ) is continuous; then apply it again with φ denoting an arbitrary critical function.

2.2 Application to one-parameter exponential families

Suppose that p_θ(x) = c(θ) exp(θ T(x)) h(x) for θ ∈ Θ ⊂ R with respect to a σ-finite measure μ on some subset of R^n.

Problems: Test
(1) H_1 : θ ≤ θ_0 versus K_1 : θ > θ_0;
(2) H_2 : θ ≤ θ_1 or θ ≥ θ_2 versus K_2 : θ_1 < θ < θ_2;
(3) H_3 : θ_1 ≤ θ ≤ θ_2 versus K_3 : θ < θ_1 or θ_2 < θ;
(4) H_4 : θ = θ_0 versus K_4 : θ ≠ θ_0.

Theorem 2.1
(1) The test φ_1 with E_{θ_0} φ_1(T) = α given by

    φ_1(T(X)) = 1 if T(X) > c,  γ if T(X) = c,  0 if T(X) < c,

is UMP for H_1 versus K_1.
(2) The test φ_2 with E_{θ_i} φ_2(T) = α, i = 1, 2, given by

    φ_2(T(X)) = 1 if c_1 < T(X) < c_2,  γ_i if T(X) = c_i,  0 otherwise,

is UMP for H_2 versus K_2.
(3) The test φ_3 with E_{θ_i} φ_3(T) = α, i = 1, 2, given by

    φ_3(T(X)) = 1 if T(X) < c_1 or T(X) > c_2,  γ_i if T(X) = c_i,  0 otherwise,

is UMPU for H_3 versus K_3.
(4) The test φ_4 with E_{θ_0} φ_4(T) = α and E_{θ_0} T φ_4(T) = α E_{θ_0} T given by

    φ_4(T(X)) = 1 if T(X) < c_1 or T(X) > c_2,  γ_i if T(X) = c_i,  0 otherwise,

is UMPU for H_4 versus K_4. Furthermore, if T is symmetrically distributed about a under θ_0, then E_{θ_0} φ_4(T) = α, c_2 = 2a − c_1, and γ_1 = γ_2 determine the constants.

The characteristic behavior of the power of these four tests: β_1 is monotone increasing; β_2 equals α at θ_1 and θ_2 with a maximum in between; β_3 equals α at θ_1 and θ_2, dips below α in between, and increases to 1 outside; β_4 has its minimum value α at θ_0. [Figure omitted.]

Proof. (1) and (2) were proved earlier using the N-P lemma (via MLR) and its generalized version respectively. For (2), see pages 81-82 in Lehmann and Romano (2005). For (3), see Lehmann and Romano (2005).

(4) We need only consider tests φ(x) = ψ(T(x)) based on the sufficient statistic T, whose distribution is of the form p_θ(t) = c(θ) e^{θt} with respect to some σ-finite measure ν. Since all power functions are continuous in the case of an exponential family, it follows that any unbiased test ψ satisfies α = β_ψ(θ_0) = E_{θ_0} ψ(T), and β_ψ has a minimum at θ_0.

But by Theorem 2.7.1, Chapter 2, TSH, β_ψ is differentiable, and can be differentiated under the integral sign; hence

    β'_ψ(θ) = (d/dθ) ∫ ψ(t) c(θ) exp(θt) dν(t)
            = (c'(θ)/c(θ)) E_θ ψ(T) + E_θ(T ψ(T))
            = (−E_θ T) E_θ ψ(T) + E_θ(T ψ(T)),

since, with ψ_0 ≡ α, 0 = β'_{ψ_0}(θ) = α{ c'(θ)/c(θ) + E_θ(T) }. Thus

    0 = β'_ψ(θ_0) = E_{θ_0}(T ψ(T)) − α E_{θ_0} T.

Thus any unbiased test ψ(T) satisfies the two conditions of the statement of our theorem. We will apply the generalized N-P lemma to show that φ_4 as given is UMPU. Let

    M ≡ {(E_{θ_0} ψ(T), E_{θ_0} T ψ(T)) : ψ(T) is a critical function}.

Then M is convex and contains {(u, u E_{θ_0} T) : 0 < u < 1}. Also M contains points (α, v) with v > α E_{θ_0} T, since, by problem 8 of Chapter 3, Lehmann TSH, there exist tests (UMP one-sided ones) having β'(θ_0) > 0. Likewise M contains points (α, v) with v < α E_{θ_0} T. Hence (α, α E_{θ_0} T) is an interior point of M. Thus, by the generalized N-P lemma (iv), there exist k_1, k_2 such that

(a)    ψ(t) = 1 when c(θ_0)(k_1 + k_2 t) e^{θ_0 t} < c(θ_1) e^{θ_1 t},  0 when c(θ_0)(k_1 + k_2 t) e^{θ_0 t} > c(θ_1) e^{θ_1 t};
       that is, ψ(t) = 1 when a_1 + a_2 t < e^{bt},  0 when a_1 + a_2 t > e^{bt},

having the property that it maximizes E_{θ_1} ψ(T). But the region described in (a) is either one-sided or else the complement of an interval. It cannot be one-sided, since one-sided tests have strictly monotone power functions (Theorem 1.2(ii)), violating β'_ψ(θ_0) = 0. Thus

(b)    ψ(T) = 1 if T < c_1 or T > c_2,  0 if c_1 < T < c_2.

Since this test does not depend on θ_1 ≠ θ_0, it is the UMP test of H_4 versus K_4 within the class of level α tests having β'(θ_0) = 0. Since ψ_0 ≡ α is in this class, ψ is unbiased. And this class of tests includes the unbiased tests. Hence ψ is UMPU.

If T is distributed symmetrically about some point a under θ_0, then any test ψ symmetric about a that satisfies E_{θ_0} ψ(T) = α will also satisfy

    E_{θ_0} T ψ(T) = E_{θ_0}(T − a) ψ(T) + a E_{θ_0} ψ(T) = 0 + aα = α E_{θ_0} T

automatically.
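Solving the two side conditions of part (4) numerically is straightforward. The sketch below does this for a Poisson mean (an illustrative choice, not from the text: T ~ Poisson(λ_0) under H with λ_0 = 4, α = 0.05), scanning integer cutoffs and solving the two linear equations for (γ_1, γ_2).

```python
import numpy as np
from scipy.stats import poisson

# Two-sided UMPU test (Theorem 2.1(4)) for a Poisson mean:
# E_{H} phi(T) = alpha and E_{H} T phi(T) = alpha * E_{H} T.
lam0, alpha = 4.0, 0.05
t = np.arange(0, 60)
p = poisson.pmf(t, lam0)
m = lam0                                   # E T under H

def side_conditions(c1, c2):
    lo, hi = t < c1, t > c2
    A = np.array([[p[c1], p[c2]],
                  [c1 * p[c1], c2 * p[c2]]])
    b = np.array([alpha - p[lo].sum() - p[hi].sum(),
                  alpha * m - (t * p)[lo].sum() - (t * p)[hi].sum()])
    return np.linalg.solve(A, b)           # (gamma1, gamma2)

for c1 in range(0, 10):
    for c2 in range(c1 + 1, 30):
        g1, g2 = side_conditions(c1, c2)
        if 0 <= g1 <= 1 and 0 <= g2 <= 1:
            print(c1, c2, g1, g2)          # the feasible UMPU constants
```

For these illustrative values the scan returns a single feasible quadruple (c_1 = 1, c_2 = 9, γ_1 ≈ 0.17, γ_2 ≈ 0.86); note that the resulting test is not equal-tailed.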

2.3 UMPU tests for families with nuisance parameters via conditioning

Definition 2.3 Let T be sufficient for P_B ≡ {P_θ : θ ∈ Θ_B}, and let P^T ≡ {P^T_θ : θ ∈ Θ_B}. A test function φ is said to have Neyman structure with respect to T if

    E(φ(X) | T) = α a.s. P^T.

Remark 2.4 If φ has Neyman structure with respect to T, then φ is SOB.
Proof. E_θ φ(X) = E_θ E(φ(X) | T) = E_θ α = α for all θ ∈ Θ_B.

Theorem 2.2 Let X be a random variable with distribution P_θ ∈ P = {P_θ : θ ∈ Θ}, and let T be sufficient for P_B = {P_θ : θ ∈ Θ_B}. Then all SOB tests have Neyman structure with respect to T if and only if the family of distributions P^T ≡ {P^T_θ : θ ∈ Θ_B} is boundedly complete: i.e. if E_P h(T) = 0 for all P ∈ P^T with h bounded, then h = 0 a.e. P^T.

Proof. Suppose that P^T is boundedly complete. Let φ be a SOB level α test, and define ψ(T) ≡ E(φ(X) | T). Now

    E_θ(ψ(T) − α) = E_θ( E(φ(X) | T) ) − α = E_θ φ(X) − α = 0

for all θ ∈ Θ_B, and since ψ(T) − α is bounded, the bounded completeness of P^T implies ψ(T) = α a.e. P^T. Hence α = ψ(T) = E(φ(X) | T) a.e. P^T, and φ has Neyman structure with respect to T.

Now suppose that all SOB tests have Neyman structure, and assume P^T is not boundedly complete. Then there exists h with |h| ≤ M for some M, with E_θ h(T) = 0 for all θ ∈ Θ_B and h(T) ≠ 0 with probability > 0 for some θ_0 ∈ Θ_B. Define φ(T) ≡ c h(T) + α where c ≡ {α ∧ (1 − α)}/M. Then 0 ≤ φ(T) ≤ 1, so φ is a critical function, and E_θ φ(T) = α for all θ ∈ Θ_B, so that φ is SOB. But E(φ(T) | T) = φ(T) ≠ α with probability > 0 for the above θ_0, so φ does not have Neyman structure. This is a contradiction, and hence it follows that indeed P^T is boundedly complete.

Remark 2.5 Suppose that:
(i) All critical functions φ have continuous power functions β_φ.
(ii) T is sufficient for P_B = {P_θ : θ ∈ Θ_B} and P^T ≡ {P^T_θ : θ ∈ Θ_B} is boundedly complete.
(Remark 2.3 says that (i) is always true for exponential families p_θ(x) = c(θ) exp(Σ θ_j T_j(x)); and Theorem 4.3.1, TSH, allows us to check (ii) for these same families.) Then all unbiased tests are SOB and all SOB tests have Neyman structure. Thus if we can find a UMP Neyman structure test φ_0 and we can show that φ_0 is unbiased, then φ_0 is UMPU.

Why is it easier to find UMP Neyman structure tests? Neyman structure tests are characterized by having conditional probability of rejection equal to α on each surface T = t. But the distribution on each such surface is independent of θ ∈ Θ_B because T is sufficient for P_B. Thus the problem has been reduced to testing a one-parameter hypothesis for each fixed value of t; and in many problems we can easily find the most powerful test of this simple hypothesis.

Example 2.1 (Comparing two Poisson distributions).

Example 2.2 (Comparing two Binomial distributions).
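For Example 2.2 the conditioning is concrete: with X ~ Binomial(m, p_1) and Y ~ Binomial(n, p_2), under H : p_1 = p_2 the conditional law of Y given T = X + Y = t is hypergeometric and free of the common p, so the one-sided randomized construction is applied on each slice T = t. A sketch with illustrative numbers (m = 12, n = 15, observed data (3, 9), all my choices):

```python
import numpy as np
from scipy.stats import hypergeom

# Conditional (Neyman-structure) one-sided test for two binomials:
# reject for large Y given T = X + Y = t.
m, n, alpha = 12, 15, 0.05
x_obs, y_obs = 3, 9
t_obs = x_obs + y_obs
cond = hypergeom(m + n, t_obs, n)          # law of Y | T = t under H
c = 0
while cond.sf(c) > alpha:                  # sf(c) = P(Y > c | T = t)
    c += 1
gamma = (alpha - cond.sf(c)) / cond.pmf(c)
phi = 1.0 if y_obs > c else (gamma if y_obs == c else 0.0)
print(c, gamma, phi)
```

This conditional construction is the familiar (randomized) Fisher-type exact comparison of two binomials.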

Example 2.3 (Comparing two normal means when variances are equal).

Example 2.4 (Paired normals with nuisance shifts).

2.4 Application to general exponential families; k + 1 parameters

Consider the exponential family P = {P_{θ,ξ}} given by

    p_{θ,ξ}(x) = c(θ, ξ) exp( θ U(x) + Σ_{i=1}^k ξ_i T_i(x) )

with respect to a σ-finite dominating measure μ on some subset of R^n, where Θ is convex, has dimension k + 1, and contains interior points θ_i, i = 1, 2.

Problems: Test
(1) H_1 : θ ≤ θ_0 versus K_1 : θ > θ_0;
(2) H_2 : θ ≤ θ_1 or θ ≥ θ_2 versus K_2 : θ_1 < θ < θ_2;
(3) H_3 : θ_1 ≤ θ ≤ θ_2 versus K_3 : θ < θ_1 or θ_2 < θ;
(4) H_4 : θ = θ_0 versus K_4 : θ ≠ θ_0.

Theorem 2.3 The following are UMPU tests for the hypothesis testing problems (1)-(4) respectively:
(1) The test φ_1 given by

    φ_1(x) = 1 if U > c(t),  γ(t) if U = c(t),  0 if U < c(t),

where E_{θ_0}(φ_1(U) | T = t) = α, is UMPU for H_1 versus K_1.
(2) The test φ_2 given by

    φ_2(x) = 1 if c_1(t) < U < c_2(t),  γ_i(t) if U = c_i(t),  0 otherwise,

where E_{θ_i}(φ_2(U) | T = t) = α, i = 1, 2, is UMPU for H_2 versus K_2.
(3) The test φ_3 given by

    φ_3(x) = 1 if U < c_1(t) or U > c_2(t),  γ_i(t) if U = c_i(t),  0 otherwise,

where E_{θ_i}(φ_3(U) | T = t) = α, i = 1, 2, is UMPU for H_3 versus K_3.
(4) The test φ_4 given by

    φ_4(x) = 1 if U < c_1(t) or U > c_2(t),  γ_i(t) if U = c_i(t),  0 otherwise,

where E_{θ_0}(φ_4(U) | T = t) = α and E_{θ_0}{ U φ_4(U) | T = t } = α E_{θ_0}{ U | T = t }, is UMPU for H_4 versus K_4.

Remark 2.6 If V = h(U, T) is increasing in U for each fixed t and is independent of T on Θ_B, then

    φ(x) = 1 if V > c,  γ if V = c,  0 if V < c,

is UMPU in (1).

Remark 2.7 If V ≡ h(U, T) = a(T) U + b(T) with a(t) > 0, then the second constraint in (4) becomes

    E_{θ_0}{ [(V − b(t))/a(t)] φ | T = t } = α E_{θ_0}{ (V − b(t))/a(t) | T = t },

or E_{θ_0}(V φ | T = t) = α E_{θ_0}(V | T = t); and if this V is independent of T on the boundary, then the test is unconditional.

2.5 Permutation Tests

Consider testing

    H_c : X_1, ..., X_m, Y_1, ..., Y_n are i.i.d. with df F ∈ F_c,

where F_c is the collection of all continuous distribution functions on R, versus K : X_1, ..., X_m, Y_1, ..., Y_n have joint density function h. We seek a most powerful similar test: φ is similar if

(1)    E_{(F,F)} φ(X, Y) = α for all F ∈ F_c.

But if Z ≡ (Z_1, ..., Z_N), with N ≡ m + n, denotes the ordered values of the combined sample X_1, ..., X_m, Y_1, ..., Y_n, then when H_c is true, Z is sufficient and complete; see e.g. Lehmann and Romano, TSH. Hence (1) holds if and only if (by Theorem 2.2)

(2)    E(φ(X, Y) | Z = z) = (1/N!) Σ_{π ∈ Π} φ(πz) = (1/N!) Σ_{z'} φ(z') = α for a.e. z = (z_1, ..., z_N),

where the sum is over all N! permutations z' of z. Thus if α = I/N!, then any test which is performed conditionally on Z = z and rejects for exactly I of the N! permutations z' of z is a level α similar test; moreover (2) says that any level α similar test is of this form.

Definition 2.4 Tests satisfying (2) are called permutation tests. (Thus a test of H_c versus K is similar if and only if it is a permutation test.)

We now need to find a most powerful permutation test by maximizing the conditional power. But

    E_h(φ(X, Y) | Z = z) = Σ_{z'} φ(z') h(z') / Σ_{z'} h(z').

Since the conditional densities under the composite null hypothesis and under the simple alternative h are

    p_0(z' | z) = 1/N!  and  p_1(z' | z) = h(z') / Σ_{z''} h(z''),  z' ∈ {πz : π ∈ Π},

the conditional power is maximized by rejecting for large values of

    p_1(z' | z) / p_0(z' | z) = K_z h(z')  with  K_z = N! / Σ_{z''} h(z'').

Thus, at level α = I/N!, we reject if

    h(z') > c(z),

where c(z) is chosen so that we reject for exactly I of the N! permutations z' of z; or else we use a randomized version of such a test.

Example 2.5 Suppose now that we specify a particular alternative:

    K_1 : X_1, ..., X_m are i.i.d. N(θ_1, σ²) and Y_1, ..., Y_n are i.i.d. N(θ_2, σ²),

where θ_1 < θ_2 and σ² are fixed constants. Then the similar test of H_c that is most powerful against this simple K_1 rejects H for those permutations z' of z which lead to large values of

    (2πσ²)^{−N/2} exp( −(1/(2σ²)) { Σ_{i=1}^m (X_i − θ_1)² + Σ_{j=1}^n (Y_j − θ_2)² } ),

or small values of

    Σ_{i=1}^m (X_i − θ_1)² + Σ_{j=1}^n (Y_j − θ_2)²
        = Σ_{i=1}^m X_i² + Σ_{j=1}^n Y_j² + mθ_1² + nθ_2² − 2θ_1 Σ_{i=1}^m X_i − 2θ_2 Σ_{j=1}^n Y_j,

or (since Σ_1^m X_i² + Σ_1^n Y_j² = Σ_1^N z_i², mθ_1², and nθ_2² are the same for all permutations) large values of

    θ_1 Σ_{i=1}^m X_i + θ_2 Σ_{j=1}^n Y_j,

or large values of

    θ_1 Σ_{i=1}^m X_i + θ_2 Σ_{j=1}^n Y_j − θ_1 ( Σ_{i=1}^m X_i + Σ_{j=1}^n Y_j ) = (θ_2 − θ_1) Σ_{j=1}^n Y_j,

or, equivalently, large values of Ȳ − X̄, since

    θ_1 Σ_{i=1}^m X_i + θ_2 Σ_{j=1}^n Y_j − ((mθ_1 + nθ_2)/N) ( Σ_{i=1}^m X_i + Σ_{j=1}^n Y_j ) = (mn/N)(θ_2 − θ_1)(Ȳ − X̄),

or large values of

    τ ≡ √(mn/N) (Ȳ − X̄) / [ { Σ_1^N Z_i² − (Σ_1^N Z_i)²/N − (mn/N)(Ȳ − X̄)² } / (N − 2) ]^{1/2}
      = √(mn/N) (Ȳ − X̄) / [ { Σ_1^m (X_i − X̄)² + Σ_1^n (Y_j − Ȳ)² } / (N − 2) ]^{1/2}.

Thus the most powerful similar test of H_c versus K_1 is

    φ(z') = 1 if τ > c_α(z),  0 if τ < c_α(z),

where c_α(z) is chosen so that exactly α N! of the permutations z' lead to rejection (if this is possible; if not we can use a randomized test). But we know that τ takes on at most C(N, m) distinct values, one for each of the C(N, m) assignments z'_c of m of the z_i's to be X_i's. Thus

(3)    φ(z'_c) = 1 if τ(z'_c) > c_α(z),  0 if τ(z'_c) < c_α(z),

where c_α(z) is chosen so that exactly α C(N, m) of the assignments z'_c of m of the z_i's to be X_i's lead to rejection. Since the test (3) does not depend on which θ_1 < θ_2 or σ² we started with, it is actually a UMP similar test of H_c versus K ≡ ∪_{θ_1 < θ_2, σ²} K_1; i.e. different normal distributions with θ_1 < θ_2 and σ² unknown.

Example 2.6 Suppose that (X_1, X_2) = (56, 72), (Y_1, Y_2, Y_3) = (68, 47, 86). Thus X̄ = 64, Ȳ = 67, Ȳ − X̄ = 3. Here Z = (47, 56, 68, 72, 86), and C(5, 2) = 5!/(2! 3!) = 10. (Note that 5! = 120.) The 10 possible assignments are enumerated in Table 6.1 (and regenerated programmatically in the sketch below).

Table 6.1: The C(5,2) = 10 possible values of τ, N = 5, m = 2.

    combination     Ȳ − X̄     Σ Y_j       τ
    Y Y Y X X      −22.00      171     −2.33
    Y Y X Y X      −18.67      175     −1.61
    Y X Y Y X       −8.67      187     −0.58
    Y Y X X Y       −7.00      189     −0.46
    X Y Y Y X       −1.17      196     −0.07
    Y X Y X Y        3.00      201      0.19
    Y X X Y Y        6.33      205      0.41
    X Y Y X Y       10.50      210      0.72
    X Y X Y Y       13.83      214      1.01
    X X Y Y Y       23.83      226      3.05

Note that C(20, 10) = 184,756, and, by Stirling's formula (m! ~ √(2πm)(m/e)^m),

    C(2m, m) ~ 2^{2m} / √(πm) as m → ∞,

so the exact permutation test is computationally difficult for all but small sample sizes. But sampling from the permutation distribution is always possible.
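The following plain-enumeration sketch (using the data of Example 2.6) regenerates Table 6.1 and shows how c_α(z) is read off the ordered τ values.

```python
import numpy as np
from itertools import combinations

# Enumerate all C(5,2) = 10 assignments of Z = (47, 56, 68, 72, 86)
# into an X-sample of size m = 2 and a Y-sample of size n = 3, and
# compute tau (the two-sample t statistic) for each.
z = np.array([47.0, 56.0, 68.0, 72.0, 86.0])
m, n = 2, 3
N = m + n
rows = []
for xs in combinations(range(N), m):
    x = z[list(xs)]
    y = np.delete(z, list(xs))
    ss = ((x - x.mean())**2).sum() + ((y - y.mean())**2).sum()
    tau = np.sqrt(m * n / N) * (y.mean() - x.mean()) / np.sqrt(ss / (N - 2))
    rows.append((xs, y.mean() - x.mean(), y.sum(), tau))
for r in sorted(rows, key=lambda r: r[3]):     # reproduces Table 6.1
    print(r)
# With alpha = 1/10 we reject only for the single largest tau; the
# observed assignment X = (56, 72), Y = (47, 68, 86) has tau ~ 0.19
# and is not rejected.
```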

Remark 2.8 We will call the present test "reject if τ > c_α(z)" the permutation t-test; it is the UMP similar test of H_c versus K specified above. If we consider the smaller null hypothesis

    H_G : X_1, ..., X_m, Y_1, ..., Y_n i.i.d. N(θ, σ²) with θ, σ² unknown,

then we recall that the classical t-test "reject if τ > t_{m+n−2,α}" is the UMPU test of H_G versus K. The classical t-test has greater power than the permutation t-test for H_G, but is not a similar test of H_c. If we could show that for a.e. z the numbers c_α(z) and t_{m+n−2,α} were just about equal, then the classical t-test and the permutation t-test would be almost identical.

Theorem 2.4 If F ∈ F_c has E_F X² < ∞ and if 0 < lim inf(m/N) ≤ lim sup(m/N) < 1, then c_α(z) → z_α where P(N(0,1) > z_α) = α. Since we also know that t_{m+n−2,α} → z_α, it follows that c_α(z) − t_{m+n−2,α} → 0.

Proof. Let an urn contain N balls numbered z_1, ..., z_N. Let Y_1, ..., Y_n denote the numbers on n balls drawn without replacement. Let z̄ = N^{−1} Σ_1^N z_i, σ_z² = N^{−1} Σ_1^N (z_i − z̄)², and m = N − n. Then

    E Ȳ = z̄,  and  σ² ≡ Var(Ȳ) = ( 1 − (n−1)/(N−1) ) σ_z² / n.

Moreover, by the Wald-Wolfowitz-Noether-Hájek finite sampling CLT,

    (Ȳ − z̄)/σ →_d N(0, 1)

as long as the Noether condition

(a)    η_N ≡ max_{i≤N} |z_i − z̄|² / Σ_{i=1}^N |z_i − z̄|² → 0

holds. Now rewrite the permutation t statistic τ: note that

    Ȳ − X̄ = Ȳ − (1/m)( Σ_1^N z_i − Σ_1^n Y_i ) = (N/m)(Ȳ − z̄),

and hence

    τ = √(mn/N) (N/m)(Ȳ − z̄) / [ { N σ_z² − (nN/m)(Ȳ − z̄)² } / (N − 2) ]^{1/2}
      = √((N−2) n/m) (Ȳ − z̄) / { σ_z² − (n/m)(Ȳ − z̄)² }^{1/2}
      = [ (Ȳ − z̄)/σ ] √((N−2)/(N−1)) / { 1 − ((Ȳ − z̄)/σ)² / (N−1) }^{1/2}
      →_d Z / √(1 − 0) = Z ~ N(0, 1)  if  (Ȳ − z̄)/σ →_d Z ~ N(0, 1),

in probability or, better yet, almost surely; i.e. if

    P( (Ȳ − z̄)/σ ≤ t | Z = z ) → Φ(t)

in probability or almost surely. But this holds under the present hypotheses in view of the finite-sampling CLT (Theorem 2.5) which follows, if we can show that

(b)    η_N →_a.s. 0,

where η_N is the key quantity in the Noether condition (a). To accomplish this, note that even under the alternative hypothesis F ≠ G, with E_F X² < ∞ and E_G Y² < ∞,

    (1/N) Σ_{i=1}^N (Z_i − Z̄)² ≥ (1/N) { Σ_1^m (X_i − X̄)² + Σ_1^n (Y_j − Ȳ)² }
        = ((m−1)/N) S²_X + ((n−1)/N) S²_Y
        →_a.s. λ σ²_X + (1 − λ) σ²_Y ≥ min{σ²_X, σ²_Y} > 0

for any subsequence for which λ_N ≡ m/N → λ, and hence the denominator of η_N (divided by N) has a positive limit inferior almost surely. To see that the numerator converges almost surely to zero, first recall that max_{i≤n} |X_i|/n →_a.s. 0 if and only if E_F |X| < ∞. Hence max_{i≤n} |X_i|²/n →_a.s. 0 if and only if E_F X² < ∞. Thus we bound the numerator divided by N:

    (1/N) max_{i≤N} |Z_i − Z̄|²
        ≤ (2/N) { max_{i≤N} |Z_i|² + |Z̄|² }
        ≤ 2 max{ (m/N)(1/m) max_{i≤m} |X_i|², (n/N)(1/n) max_{j≤n} |Y_j|² } + (2/N)( (m/N)|X̄| + (n/N)|Ȳ| )²
        →_a.s. 0.

Hence (b) holds (even under the alternative, if E_F X² < ∞ and E_G Y² < ∞).

Theorem 2.5 (Wald-Wolfowitz-Noether-Hájek finite-sampling central limit theorem). If 0 < lim inf(m/N) ≤ lim sup(m/N) < 1, then

(4)    (Ȳ − z̄)/σ →_d Z ~ N(0, 1) as N → ∞

if and only if

    η_N ≡ max_{i≤N} |z_i − z̄|² / Σ_{i=1}^N |z_i − z̄|² → 0 as N → ∞.

Moreover,

    sup_t | P( (Ȳ − z̄)/σ ≤ t ) − Φ(t) | ≤ 5 ( (N/(m ∧ n)) η_N )^{1/4}

for all N.

Proof. See Hájek, Ann. Math. Statist. 32 (1961). For still better rates under stronger conditions, see Bolthausen (1984).
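The convergence c_α(z) → z_α in Theorem 2.4 is easy to watch numerically. The sketch below fixes one combined sample z (drawn, as an illustrative choice, from N(0,1)), samples from its permutation distribution, and compares the resulting permutation critical value with t_{N−2,α} and z_α.

```python
import numpy as np
from scipy.stats import norm, t

# Permutation critical value c_alpha(z) versus t and normal quantiles.
rng = np.random.default_rng(1)
alpha = 0.05
for m, n in ((10, 10), (50, 50), (200, 200)):
    N = m + n
    z = rng.standard_normal(N)            # one fixed combined sample
    taus = []
    for _ in range(10000):                # sample the permutation law
        perm = rng.permutation(z)
        x, y = perm[:m], perm[m:]
        ss = ((x - x.mean())**2).sum() + ((y - y.mean())**2).sum()
        taus.append(np.sqrt(m*n/N) * (y.mean() - x.mean())
                    / np.sqrt(ss / (N - 2)))
    c = np.quantile(taus, 1 - alpha)      # ~ c_alpha(z)
    print(m, n, c, t.isf(alpha, N - 2), norm.isf(alpha))
```

As m and n grow, the three printed quantities should agree to within Monte Carlo error, which is the content of Theorem 2.4.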

3 Invariance in Testing; Rank Methods

3.1 Notation and Basic Results

Let (X, A, P_θ) be a probability space for all θ ∈ Θ, and suppose θ ≠ θ' implies P_θ ≠ P_{θ'}. We observe X ~ P_θ. Suppose that g : X → X is one-to-one, onto X, and measurable, and suppose that the distribution of gX when X ~ P_θ is some P_{θ'} ≡ P_{ḡθ}; that is,

(1)    P_θ(gX ∈ A) = P_{ḡθ}(X ∈ A) for all A ∈ A,

or equivalently P_θ(g^{−1}A) = P_{ḡθ}(A) for all A ∈ A; or, equivalently, P_θ(A) = P_{ḡθ}(gA) for all A ∈ A. Hence

(2)    E_θ h(g(X)) = E_{ḡθ} h(X).

Suppose that ḡΘ = Θ. Let G denote a group of such transformations g. We want to test H : θ ∈ Θ_H versus K : θ ∈ Θ_K.

Proposition 3.1 Ḡ ≡ {ḡ : g ∈ G} is a group of one-to-one transformations of Θ onto Θ and is homomorphic to G.

Proof. Suppose that ḡθ_1 = ḡθ_2. Then P_{θ_1} = P_{θ_2} by (1). Thus θ_1 = θ_2 by assumption. Thus ḡ ∈ Ḡ is one-to-one. Closure, associativity, and identity are easy. If X ~ P_θ, then g_1 X ~ P_{ḡ_1 θ}, and (g_2 g_1)X = g_2(g_1 X) ~ P_{ḡ_2 ḡ_1 θ}, while (g_2 g_1)X ~ P_{(g_2 g_1)‾ θ}; so (g_2 g_1)‾ = ḡ_2 ḡ_1. If X ~ P_θ, then g^{−1} X ~ P_{(g^{−1})‾ θ}, so g g^{−1} X ~ P_{ḡ (g^{−1})‾ θ}, while g g^{−1} X = X ~ P_θ; so ḡ (g^{−1})‾ = ē, and thus (g^{−1})‾ = ḡ^{−1}, and Ḡ is a group.

Definition 3.1 A group G of one-to-one transformations of X onto X is said to leave the testing problem H versus K invariant provided ḡΘ = Θ and ḡΘ_H = Θ_H for all g ∈ G.

3.2 Orbits and maximal invariants

Definition 3.2 x_1 ~ x_2 mod(G) if x_2 = g(x_1) for some g ∈ G.

Proposition 3.2 ~ is an equivalence relation.

Proof. Reflexive: x_1 ~ x_1 since x_1 = e(x_1). Symmetric: g(x_1) = x_2 implies g^{−1}(x_2) = x_1. Transitive: x_1 ~ x_2 and x_2 ~ x_3 implies x_1 ~ x_3, since g_1(x_1) = x_2 and g_2(x_2) = x_3 imply (g_2 g_1)(x_1) = x_3.

Definition 3.3 The equivalence classes of ~ are called the orbits of G. Thus orbit(x) = {g(x) : g ∈ G}. A function φ defined on the sample space X is invariant if φ(g(x)) = φ(x) for all x ∈ X and all g ∈ G.

Proposition 3.3 A test function φ is invariant if and only if φ is constant on each orbit of G.

Proof. This follows immediately from the definitions.

Definition 3.4 A measurable function T : X → R^k for some k is a maximal invariant for G (or GMI) if T is invariant and T(x_1) = T(x_2) implies x_1 ~ x_2. That is, T is constant on the orbits of G and takes on distinct values on distinct orbits.

Theorem 3.1 Let T be a GMI. Then φ is invariant if and only if there exists a function h such that φ(x) = h(T(x)) for all x ∈ X.

Proof. Suppose that φ(x) = h(T(x)). Then φ(gx) = h(T(gx)) = h(T(x)) = φ(x), so φ is invariant. On the other hand, suppose that φ is G-invariant. Then T(x_1) = T(x_2) implies x_1 ~ x_2, which implies g(x_1) = x_2 for some g ∈ G. Thus φ(x_2) = φ(gx_1) = φ(x_1); that is, φ is constant on the orbits. It follows that φ is a function of T.

3.3 Examples

Example 3.1 (Translation group). Suppose that X = R^n and G = {g : gx = x + c, c ∈ R}. Then T(x) = (x_1 − x_n, ..., x_{n−1} − x_n) is a GMI. Proof: Clearly T is invariant. Suppose that T(x) = T(x'). Then x'_i = x_i + (x'_n − x_n) for i = 1, ..., n−1, and this holds trivially for i = n. Thus x' = g(x) = x + c with c = (x'_n − x_n).

Example 3.2 (Scale group). Suppose that X = {x ∈ R^n : x_n ≠ 0}, and G = {g : gx = cx, c ∈ R \ {0}}. Then T(x) = (x_1/x_n, ..., x_{n−1}/x_n) is a GMI. Proof: Clearly T is invariant. Suppose that T(x) = T(x'). Then x'_i = (x'_n/x_n) x_i for i = 1, ..., n−1, and this holds trivially for i = n. Thus x' = g(x) = cx with c = (x'_n/x_n).

Example 3.3 (Orthogonal group). Suppose that X = R^n and G = {g : gx = Γx, Γ an n × n orthogonal matrix}. Then T(x) = x^T x = Σ_1^n x_i² is a GMI. Proof: T(gx) = x^T Γ^T Γ x = x^T x, so T is invariant. Suppose that T(x) = T(x'). Then there exists Γ = Γ_{x,x'} such that x' = Γx.

Example 3.4 (Permutation group). Suppose that X = R^n \ {ties}, and G = {g : g(x) = πx = (x_{π(1)}, ..., x_{π(n)}) for some permutation π = (π(1), ..., π(n)) of {1, ..., n}}. Note that #(G) = n!. Then T(x) = (x_{(1)}, ..., x_{(n)}) ≡ x_{(·)}, the vector of ordered x's, is a GMI. Proof: T(gx) = T(πx) = x_{(·)} = T(x), so T is invariant. Moreover, if T(x') = T(x), then x' = πx for some π ∈ Π, so T is maximal.

Example 3.5 (Rank transformation group). Suppose that X = {x ∈ R^n : x_i ≠ x_j for all i ≠ j} = R^n \ {ties}, and G = {g : g(x) = (f(x_1), ..., f(x_n)), f continuous and strictly increasing}. Then T(x) ≡ r ≡ (r_1, ..., r_n), where r_i ≡ #{j ≤ n : x_j ≤ x_i} denotes the rank of x_i (among x_1, ..., x_n), is a GMI. Proof: T is clearly invariant. If T(x') = T(x), then x and x' have their coordinates in the same relative order, so (interpolating linearly, say) we can construct a continuous, strictly increasing f with f(x_i) = x'_i for each i; thus x' = g(x).

Example 3.6 (Sign group). Suppose that X = R^n and that G = {g, e}^n acting coordinatewise, where g(x) = −x and e(x) = x. Then T(x) = (|x_1|, ..., |x_n|) is a GMI.

Example 3.7 (Affine group). Suppose that X = {x ∈ R^n : x_{n−1} ≠ x_n} and that G = {g : g(x) = ax + b with a ≠ 0, b ∈ R}. Then

    T(x) = ( (x_1 − x_n)/(x_{n−1} − x_n), ..., (x_{n−2} − x_n)/(x_{n−1} − x_n) )

is a GMI. Note that

    T'(x) = ( (x_1 − x̄)/s, ..., (x_n − x̄)/s )

is also a GMI (on X ∩ {x ∈ R^n : s > 0}, where s² ≡ n^{−1} Σ_1^n (x_i − x̄)²).

Remark 3.1 In the previous example G = G_2 G_1 = scale ∘ translation = {g_2 g_1 : g_1 ∈ G_1, g_2 ∈ G_2}. Then Y = (x_1 − x_n, x_2 − x_n, ..., x_{n−1} − x_n) is a G_1 MI. In the space of the G_1 MI we have that Z = (y_1/y_{n−1}, ..., y_{n−2}/y_{n−1}) is a G_2 MI. Thus Z is the GMI. If this stepwise reduction works, it is OK; see Lehmann and Romano, TSH. But it doesn't always work. When G = G_2 G_1, it does work if G_1 is a normal subgroup of G. [Recall that G_1 is a normal subgroup of G if and only if g G_1 g^{−1} = G_1 for all g ∈ G.]

Example 3.8 (Signed rank transformation group). Suppose that X = R^N \ {ties} as in Example 3.5 (but with N instead of n), but now let

    G = {g : g(x) = (f(x_1), ..., f(x_N)), f odd, continuous, and strictly increasing}.

Then T(x) = (r, s) = (r_1, ..., r_m, s_1, ..., s_n), where r_1, ..., r_m denote the ranks of |x_{i_1}|, ..., |x_{i_m}| among |x_1|, ..., |x_N|, and s_1, ..., s_n denote the ranks of x_{j_1}, ..., x_{j_n} among |x_1|, ..., |x_N|, and where x_{i_1}, ..., x_{i_m} < 0 < x_{j_1}, ..., x_{j_n}, is a GMI. Proof: T is clearly invariant. To show maximal invariance, the construction is much as in Example 3.5, but with the interpolating function f taken to be odd; see Lehmann, TSH.

Example 3.9 Suppose that X = {(x_1, x_2) : x_2 > 0} and that G = {g : g(x) = (x_1 + b, x_2), b ∈ R}. Then T(x) = x_2 is a GMI.

Example 3.10 Suppose that X = {(x_1, x_2) : x_2 > 0} as in Example 3.9, but now suppose that the group is G = {g : g(x) = (c x_1, c x_2), c > 0} or G' = {g : g(x) = (c x_1, |c| x_2), c ≠ 0}. Then T(x) = x_1/x_2 is a GMI in the first case (c > 0), and T(x) = |x_1|/x_2 is a GMI in the second case (c ≠ 0).

Example 3.11 Suppose that X = {(x_1, x_2, x_3, x_4) : x_3, x_4 > 0} and that G = {g : g(x) = (c x_1 + a, c x_2 + b, c x_3, c x_4), a, b ∈ R, c > 0}. Then T(x) = x_3/x_4 is a GMI.

3.4 UMP G-invariant tests

Theorem 3.2 If T(X) is any G-invariant function and if ν(θ) is a Ḡ MI, then the distribution of T(X) depends on θ only through ν(θ).

Proof. Suppose that ν(θ_1) = ν(θ_2). Then there exists ḡ ∈ Ḡ such that ḡθ_1 = θ_2. Let g be the element of G corresponding to ḡ ∈ Ḡ. Then by (1),

    P_{θ_1}(T(X) ∈ A) = P_{θ_1}(T(gX) ∈ A) = P_{ḡθ_1}(T(X) ∈ A) = P_{θ_2}(T(X) ∈ A)

for all A ∈ A. Thus the distribution of T is a function of ν(θ).

Theorem 3.3 Suppose that H versus K is invariant under G. Let T(X) and δ ≡ ν(θ) denote the GMI and the Ḡ MI, and suppose both are real-valued. Suppose that the densities p_δ(t) = (dP^T_δ/dμ)(t) with respect to some σ-finite measure μ have MLR in t; and suppose that H versus K is equivalent to H : δ ≤ δ_0 versus K : δ > δ_0. Then there exists a UMP G-invariant level α test of H versus K given by

    ψ(T) = 1 if T > c,  γ if T = c,  0 if T < c,

with E_{δ_0} ψ(T) = α.

Proof. By Theorem 3.1 any G-invariant test φ is of the form φ = ψ(T). By Theorem 3.2, the distribution of T depends only on δ. Thus our theorem for UMP tests when there is MLR (Theorem 1.2) completes the proof.

Example 3.12 (Tests of σ² for N(μ, σ²)). Let X_1, ..., X_n be i.i.d. N(μ, σ²), and consider testing H : σ ≤ σ_0 versus K : σ > σ_0. Then G = {g : g(x) = x + c, c ∈ R} leaves H versus K invariant. By sufficiency we can restrict attention to tests based on X̄ and S ≡ Σ_1^n (X_i − X̄)². Let G' denote the induced group, g'(X̄, S) = (X̄ + c, S). Thus S is a G' MI by Example 3.9. Now S ~ σ² χ²_{n−1}, which has MLR in S. Thus by Theorem 3.3, the UMP G-invariant test of H versus K rejects H if S > σ_0² χ²_{n−1,α}. By Theorem 6.5.3, Lehmann and Romano, TSH (2005), page 229, it is also the UMP G-invariant test; also see Ferguson, page 57. (Recall from Chapter 2 that this optimal normal theory test has undesirable robustness-of-level problems when the data fail to be normally distributed.)
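A two-line computation gives the cutoff and the exact power function of this UMP invariant test; the sample size and σ_0 below are illustrative values.

```python
import numpy as np
from scipy.stats import chi2

# UMP invariant test of Example 3.12: reject H: sigma <= sigma0 when
# S = sum (X_i - Xbar)^2 > sigma0^2 * chi2_{n-1, alpha}.
n, sigma0, alpha = 20, 1.0, 0.05
cutoff = sigma0**2 * chi2.isf(alpha, n - 1)

def power(sigma):
    # S / sigma^2 ~ chi2_{n-1}, so P(S > cutoff) = P(chi2 > cutoff/sigma^2).
    return chi2.sf(cutoff / sigma**2, n - 1)

for s in (1.0, 1.2, 1.5, 2.0):
    print(s, power(s))
```

The printed power function is increasing in σ, as Theorem 1.2(ii) (via MLR of the χ² scale family) guarantees.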

Example 3.13 (Two-sample t-test). Let X_1, ..., X_m be i.i.d. N(μ, σ²) and Y_1, ..., Y_n be i.i.d. N(ν, σ²), and consider testing H : ν ≤ μ versus K : ν > μ. By sufficiency we can restrict attention to tests based on (X̄, Ȳ, S) with S ≡ Σ_1^m (X_i − X̄)² + Σ_1^n (Y_i − Ȳ)². Then the group G = {g : g(x) = ax + b, a > 0, b ∈ R} leaves H versus K invariant, and if G' denotes the induced group, g'(X̄, Ȳ, S) = (aX̄ + b, aȲ + b, a²S), then T(X̄, Ȳ, S) = (Ȳ − X̄)/√S is a G' MI. Note that

    t ≡ √(mn/N) (Ȳ − X̄) / √( S/(N−2) ) = √( mn(N−2)/N ) T ~ t_{m+n−2}(δ)

with δ ≡ √(mn/N) (ν − μ)/σ, and that H versus K is equivalent to H : δ ≤ 0 versus K : δ > 0. Since the noncentral t distributions have MLR, the UMP G-invariant test of H versus K is the two-sample t-test: reject H if t > t_{m+n−2,α}.

Example 3.14 (Sampling inspection by variables). Let Y, Y_1, ..., Y_n be i.i.d. N(μ, σ²). Let p ≡ P(Y ≤ y_0) ≡ P(good) for some fixed number y_0. Consider testing H : p ≥ p_0 versus K : p < p_0. Now, with X_i ≡ Y_i − y_0 ~ N(θ, σ²) where θ ≡ μ − y_0,

    p = P(Y ≤ y_0) = P( (Y − μ)/σ ≤ (y_0 − μ)/σ ) = P( (X − θ)/σ ≤ −θ/σ ) = Φ(−θ/σ) = 1 − Φ(θ/σ),

or θ/σ = Φ^{−1}(1 − p). Thus, on the basis of X_1, ..., X_n, we wish to test

    H : θ/σ ≤ c_0 ≡ Φ^{−1}(1 − p_0) versus K : θ/σ > c_0.

Now X̄ and S, with S² ≡ Σ_1^n (X_i − X̄)², are sufficient. Also, H versus K is invariant under the group of Example 3.10 with c > 0; and a GMI in the space of the sufficient statistic is T ≡ √n X̄/S. Now T ~ t_{n−1}(δ) where δ ≡ √n θ/σ, and this family of distributions has MLR in T. Also, H versus K is equivalent to H : δ ≤ δ_0 ≡ √n Φ^{−1}(1 − p_0) versus K : δ > δ_0. Thus the UMP G-invariant level α test of H versus K rejects H if T > t_{n−1,α}(δ_0). [Note the use of the noncentral t-distribution as a null distribution here!]

Example 3.15 (ANOVA - General Linear Model). The canonical form of ANOVA is as follows:

    Θ : Z ~ N_n(η, σ² I), η ∈ V_k ⊂ R^n, where V_k is a subspace of R^n with dimension k < n, so that η_i = 0 for i = k+1, ..., n;
    Θ_0 : Z ~ N_n(η, σ² I), η ∈ V_{k−r} ⊂ R^n, where V_{k−r} is a subspace of V_k with dimension k − r (r < k), and η_i = 0 for i = 1, ..., r, k+1, ..., n.

We let

    G_1 ≡ {g_1 : g_1 z = (z_1, ..., z_r, z_{r+1} + c_{r+1}, ..., z_k + c_k, z_{k+1}, ..., z_n), c_i ∈ R},
    G_2 ≡ {g_2 : g_2 z = (z'_1, ..., z'_r, z_{r+1}, ..., z_k, z_{k+1}, ..., z_n), (z'_1, ..., z'_r) an orthogonal transformation of (z_1, ..., z_r)},
    G_3 ≡ {g_3 : g_3 z = (z_1, ..., z_r, z_{r+1}, ..., z_k, z'_{k+1}, ..., z'_n), (z'_{k+1}, ..., z'_n) an orthogonal transformation of (z_{k+1}, ..., z_n)},
    G_4 ≡ {g_4 : g_4 z = cz, where c ≠ 0};

and, finally,

    G ≡ G_4 G_3 G_2 G_1 ≡ {g_4 g_3 g_2 g_1 : g_i ∈ G_i, i = 1, ..., 4}.

Then H versus K is invariant under G. Now T_1(z) = (z_1, ..., z_r, z_{k+1}, ..., z_n) is a G_1 MI. In the space of the G_1 MI, a G_2 MI is T_2(z) = (Σ_1^r z_i², z_{k+1}, ..., z_n). In the space of the G_2 G_1 MI, a G_3 MI is T_3(z) = (Σ_1^r z_i², Σ_{i=k+1}^n z_i²). In the space of the G_3 G_2 G_1 MI, a G_4 MI is

    T(z) = ((n−k)/r) ( Σ_1^r z_i² / Σ_{k+1}^n z_i² ).

Now T(z) is a GMI; thus any G-invariant test function for H versus K is a function of T(z) by Theorem 3.1. Similarly,

    (σ², η_1, ..., η_r) is a Ḡ_1 MI;  (σ², Σ_1^r η_i²) is a Ḡ_2 Ḡ_1 MI, and also a Ḡ_3 Ḡ_2 Ḡ_1 MI;  and
    δ² ≡ λ² ≡ Σ_1^r η_i² / σ² is a Ḡ MI.

Thus the distribution of any invariant test depends only on δ². Now T ~ F_{r,n−k}(δ²), which has MLR in T. Also, H versus K is equivalent to H : δ = 0 versus K : δ > 0. Thus the UMP G-invariant test of H versus K rejects H when T > F_{r,n−k,α}.

Reduction to canonical form

The above analysis has been developed for the linear model in canonical form. Now the question is: how do we reduce a model stated in the usual way to the canonical form? Suppose that X ~ N_n(ξ, σ² I) where ξ ≡ EX = Aθ ∈ L, where A is a (known) n × k matrix of rank k, θ is a k-vector of (unknown) parameters, and L is the k-dimensional subspace of R^n spanned by the columns of the matrix A. Let B be a given r × k matrix, and consider testing H : Bθ = 0, or ξ ∈ L_0 where L_0 is a (k − r)-dimensional subspace of L. To transform this form of the testing problem to canonical form, let T be an n × n orthogonal matrix such that:
(i) the last n − k rows of T are orthogonal to L, i.e. orthogonal to the columns of A;

(ii) the rows r+1, ..., k of T span L_0.

Then set Z = TX. We compute

    η ≡ EZ = TAθ = Tξ,

and note that:
(a) η_{k+1} = ··· = η_n = 0 always, by (i);
(b) η_1 = ··· = η_r = 0 under H, by (ii), since then the first r rows of T are orthogonal to L_0 ∋ ξ.

Now we re-express the F statistic we have derived in terms of the X's:

(3)    S²(η) ≡ Σ_{i=1}^n (Z_i − η_i)² = Σ_{i=k+1}^n Z_i² + Σ_{i=1}^k (Z_i − η_i)² ≥ Σ_{i=k+1}^n Z_i²,

with equality when η_i = η̂_i ≡ Z_i, i = 1, ..., k. But since T is orthogonal, η = Tξ, and Z = TX,

(4)    S²(η) = |Z − η|² = (Z − η)^T (Z − η) = (X − ξ)^T (X − ξ) = Σ_{i=1}^n (X_i − ξ_i)²,

so that

(5)    min_{ξ ∈ L} Σ_{i=1}^n (X_i − ξ_i)² = Σ_{i=1}^n (X_i − ξ̂_i)² = Σ_{i=k+1}^n Z_i²,

where ξ̂ is the least squares (LS) estimator of ξ under ξ = Aθ ∈ L. Similarly, under H : ξ ∈ L_0 (or η_1 = ··· = η_r = 0 in the canonical form),

    S²(η) = Σ_{i=1}^r Z_i² + Σ_{i=r+1}^k (Z_i − η_i)² + Σ_{i=k+1}^n Z_i² ≥ Σ_{i=1}^r Z_i² + Σ_{i=k+1}^n Z_i²,

with equality when η_i = η̂_i = Z_i for i = r+1, ..., k; and hence, by (4),

(6)    min_{ξ ∈ L_0} Σ_{i=1}^n (X_i − ξ_i)² = Σ_{i=1}^n (X_i − ξ̂_{0,i})² = Σ_{i=1}^r Z_i² + Σ_{i=k+1}^n Z_i²,

where ξ̂_0 is the least squares estimator of ξ under the hypothesis ξ ∈ L_0. Combining (5) and (6) yields

    Σ_{i=1}^r Z_i² = Σ_{i=1}^n (X_i − ξ̂_{0,i})² − Σ_{i=1}^n (X_i − ξ̂_i)²;

here L_0 is a subspace of dimension k − r contained in L, which is a subspace of dimension k contained in R^n. Now since

(7)    X − ξ̂ ⊥ L,

in particular X − ξ̂ ⊥ ξ̂ − ξ̂_0 ∈ L. Hence

    |X − ξ̂_0|² = |X − ξ̂|² + |ξ̂ − ξ̂_0|²

by (7), and we have

    Σ_{i=1}^r Z_i² = |ξ̂ − ξ̂_0|² = Σ_{i=1}^n (ξ̂_i − ξ̂_{0,i})²,

and the F statistic which yields the UMP G-invariant test of H : ξ ∈ L_0 versus K : ξ ∉ L_0 may be written as

    F = [ Σ_1^n (ξ̂_i − ξ̂_{0,i})² / r ] / [ Σ_1^n (X_i − ξ̂_i)² / (n − k) ]
      = [ { Σ_1^n (X_i − ξ̂_{0,i})² − Σ_1^n (X_i − ξ̂_i)² } / r ] / [ Σ_1^n (X_i − ξ̂_i)² / (n − k) ].

To re-express the noncentrality parameter of the distribution of F under the alternative hypothesis in terms of ξ (instead of η), let ξ ∈ L, and let ξ_0 denote the projection of ξ onto L_0: thus ξ = ξ_0 + (ξ − ξ_0) where ξ_0 ∈ L_0 and ξ − ξ_0 ⊥ L_0. Then

    δ² = Σ_{i=1}^r η_i² / σ² = Σ_{i=1}^n { ξ̂_i(ξ) − ξ̂_{0,i}(ξ) }² / σ² = |ξ − ξ_0|² / σ².

3.5 Rank tests

First we need to be able to compute probabilities for rank vectors. Our first job here is to develop a fundamental formula due to Hoeffding which allows us to do this. Let Z_1, ..., Z_N be independent real-valued random variables with densities f_1, ..., f_N respectively. Let

    R_i ≡ rank of Z_i in Z_1, ..., Z_N = #{j ≤ N : Z_j ≤ Z_i} = N F_N(Z_i)

for i = 1, ..., N, where F_N is the empirical distribution of the Z_i's. Thus

    P(R = r) = ∫_S f_1(z_1) ··· f_N(z_N) dz_1 ··· dz_N

where

    S ≡ {z : R_i(z) = r_i, i = 1, ..., N} = {z : z_{d_1} < ··· < z_{d_N}},

with d = r^{−1} the inverse permutation: r ∘ d = d ∘ r = e. (Example: N = 3; z = (10, 5, 8). Then r = (3, 1, 2) and d = (2, 3, 1).) Hence, letting z_{d_i} ≡ v_i, S = {v : v_1 < ··· < v_N},

and

    P(R = r) = ∫_{v_1 < ··· < v_N} [ f_1(v_{r_1}) ··· f_N(v_{r_N}) / ( N! h(v_{r_1}) ··· h(v_{r_N}) ) ] N! h(v_1) ··· h(v_N) dv
             = (1/N!) E [ f_1(V_{(r_1)}) ··· f_N(V_{(r_N)}) / ( h(V_{(r_1)}) ··· h(V_{(r_N)}) ) ],

where V_{(1)} < ··· < V_{(N)} are the order statistics of a sample of size N from h. This formula is one version of Hoeffding's formula.

Of course, sometimes direct calculation succeeds immediately. Here are two simple, but important, examples:

Example 3.16 Suppose that F_i = F^{Δ_i} with Δ_i > 0, i = 1, ..., N, and F continuous. Then

    P(R = e) = P(X_1 < ··· < X_N) = ∫_{x_1 < ··· < x_N} Π_{i=1}^N dF^{Δ_i}(x_i)
             = ∫_0^1 ∫_0^{u_N} ··· ∫_0^{u_2} Π_{i=1}^N Δ_i u_i^{Δ_i − 1} du_1 ··· du_N
             = Π_{i=1}^N ( Δ_i / Σ_{j=1}^i Δ_j ).

This yields any probability P(R = r), r ∈ Π, by relabeling:

    P(R = r) = Π_{i=1}^N ( Δ_{d_i} / Σ_{j=1}^i Δ_{d_j} ).

Example 3.17 (Proportional hazards alternative). Similarly, suppose that (1 − F_i) = (1 − F)^{Δ_i} with Δ_i > 0, i = 1, ..., N, and F continuous; this is equivalent to Λ_i ≡ −log(1 − F_i) = Δ_i { −log(1 − F) } = Δ_i Λ, the proportional hazards model. Then

    P(R = e) = P(X_1 < ··· < X_N) = ∫_{x_1 < ··· < x_N} Π_{i=1}^N d{ 1 − (1 − F(x_i))^{Δ_i} }
             = Π_{i=1}^N ( Δ_i / Σ_{j=i}^N Δ_j ).

This yields any probability P(R = r), r ∈ Π, by relabeling:

    P(R = r) = Π_{i=1}^N ( Δ_{d_i} / Σ_{j=i}^N Δ_{d_j} ).
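The product formula of Example 3.16 is easy to verify by simulation. A sketch (the Δ's below are illustrative values): if U ~ Uniform(0, 1), then X = U^{1/Δ} has df x^Δ on (0, 1), i.e. F^Δ with F the uniform df.

```python
import numpy as np

# Monte Carlo check of Example 3.16:
# P(R = e) = prod_i Delta_i / (Delta_1 + ... + Delta_i).
rng = np.random.default_rng(2)
delta = np.array([1.0, 2.0, 0.5])
exact = np.prod(delta / np.cumsum(delta))
u = rng.uniform(size=(200000, delta.size))
x = u ** (1.0 / delta)                    # X_i has df F^{Delta_i}
mc = np.mean(np.all(np.diff(x, axis=1) > 0, axis=1))
print(exact, mc)                          # ~ 0.0952 for these Delta's
```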

Now suppose that X_1, ..., X_m are i.i.d. F and Y_1, ..., Y_n are i.i.d. G, with F, G ∈ F_c; and let G denote the group of all strictly increasing continuous transformations of the real line onto itself, as in Example 3.5.

Proposition 3.4
A. The two-sample problem of testing H : F = G versus K : F <_s G, F, G ∈ F_c, is invariant under G.
B. The rank vector R is a GMI.
C. ψ(u) ≡ G(F^{−1}(u)) is a Ḡ MI.
D. The ordered Y-ranks Q_1 < ··· < Q_n are sufficient for R; Q_i = N H_N(G_n^{−1}(i/n)), i = 1, ..., n.
E. Hoeffding's formula: suppose that F and G have densities f and g respectively, and that f(x) = 0 implies g(x) = 0. Then

    P(Q = q) = ( 1 / C(N, n) ) E Π_{j=1}^n [ g(V_{(q_j)}) / f(V_{(q_j)}) ],

where V_{(1)} < ··· < V_{(N)} are the order statistics of a sample of size N from F. Furthermore, this probability may be rewritten as

    P(Q = q) = ( 1 / C(N, n) ) E Π_{j=1}^n ψ'(U_{(q_j)}),

where U_{(1)} < ··· < U_{(N)} are the order statistics of a sample of N Uniform(0, 1) random variables.

Proof. Statements A-C follow easily from the preceding development. To prove E, we specialize Hoeffding's formula by taking f_i = f for i = 1, ..., m, f_i = g for i = m+1, ..., N, and h = f. Then

    P(R = r) = (1/N!) E_F Π_{j=1}^n (g/f)(V_{(r_{m+j})}) = (1/N!) E_F Π_{j=1}^n (g/f)(V_{(q_j)}).

Hence

    P(Q = q) = Σ_{r : q(r) = q} P(R = r) = (m! n! / N!) E_F Π_{j=1}^n (g/f)(V_{(q_j)}).

Note that the claimed sufficiency of Q for R in D follows from these computations. To see the second formula, note that ψ'(u) = (g/f)(F^{−1}(u)), and that (F^{−1}(U_{(1)}), ..., F^{−1}(U_{(N)})) =_d (V_{(1)}, ..., V_{(N)}).

Here are several applications of Hoeffding's formula: we use the preceding results to find locally most powerful rank tests in several different two-sample testing problems: location, scale, and Lehmann alternatives of the proportional hazards type.

Proposition 3.5 (Locally most powerful rank test for location). Suppose that F has an absolutely continuous density f for which ∫ |f'(x)| dx < ∞. Then the locally most powerful rank test of H : F = G versus K : G = F(· − θ) with θ > 0 is of the form

    φ(q) = 1 if S ≡ Σ_{j=1}^n E_F [ −(f'/f)(V_{(q_j)}) ] > k_α,  γ if S = k_α,  0 if S < k_α,

where V_{(1)} < ··· < V_{(N)} are the order statistics in a sample of size N from F.

Proof. For a rank test, φ = φ(Q), we want to maximize the slope of the power function at θ = 0; i.e. to maximize the slope at θ = 0 of

    β_φ(θ) = E_θ φ(Q) = Σ_q φ(q) P_θ(Q = q).

To do this we clearly want to find those q for which (d/dθ) P_θ(Q = q)|_{θ=0} is maximum. But, by using Proposition 3.4 and differentiation under the expectation (which can be justified by the assumption that ∫ |f'(x)| dx < ∞),

    (d/dθ) P_θ(Q = q)|_{θ=0}
        = (d/dθ) { (1/C(N,n)) E_F Π_{j=1}^n [ f(V_{(q_j)} − θ) / f(V_{(q_j)}) ] }|_{θ=0}
        = (1/C(N,n)) E_F { (d/dθ) Π_{j=1}^n [ f(V_{(q_j)} − θ) / f(V_{(q_j)}) ]|_{θ=0} }
        = (1/C(N,n)) E_F Σ_{j=1}^n [ −(f'/f)(V_{(q_j)}) ],

since

    (d/dθ) Π_{j=1}^n [ f(x_j − θ)/f(x_j) ] = Σ_{k=1}^n { Π_{j≠k} [ f(x_j − θ)/f(x_j) ] } { −f'(x_k − θ)/f(x_k) },

which at θ = 0 equals Σ_{k=1}^n −(f'/f)(x_k).

Example 3.18 If F is N(μ, σ²), then without loss of generality (by the monotone transformation g(x) = (x − μ)/σ) we may take F = Φ, the standard N(0, 1) distribution function. Then −(f'/f)(x) = x, so E{ −(f'/f)(V_{(i)}) } = E(Z_{(i)}), where Z_{(1)} < ··· < Z_{(N)} are the order statistics of N standard normal (N(0, 1)) random variables, and S = Σ_{j=1}^n E(Z_{(q_j)}). Note that E(Z_{(i)}) may be approximated by Φ^{−1}(i/(N+1)), or by Φ^{−1}((3i − 1)/(3N + 1)).
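The exact normal scores E(Z_{(i)}) are one-dimensional integrals against the order-statistic density, so they are easy to compare with the two approximations of Example 3.18. A sketch (N = 10 is an illustrative choice):

```python
import numpy as np
from math import comb
from scipy.stats import norm
from scipy.integrate import quad

# Exact normal scores E(Z_(i)) versus Phi^{-1}(i/(N+1)) and
# Phi^{-1}((3i-1)/(3N+1)).
N = 10

def ez(i):
    # density of the i-th order statistic of N standard normals
    dens = lambda z: (N * comb(N - 1, i - 1) * norm.cdf(z)**(i - 1)
                      * norm.sf(z)**(N - i) * norm.pdf(z))
    val, _ = quad(lambda z: z * dens(z), -np.inf, np.inf)
    return val

for i in range(1, N + 1):
    print(i, ez(i), norm.ppf(i / (N + 1)), norm.ppf((3*i - 1) / (3*N + 1)))
```

Already for N = 10 the second approximation tracks the exact scores closely, especially in the tails.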

Example 3.19 If F is logistic, f(x) = e^{−x}/(1 + e^{−x})², then f = F(1 − F), and −f'/f = 2F − 1. Since F(V_{(i)}) =_d U_{(i)}, where U_1, ..., U_N are Uniform(0, 1) random variables with E U_{(i)} = i/(N + 1), the LMPRT of F versus G = F(· − θ) rejects H for large values of S = Σ_{j=1}^n Q_j; this is the Wilcoxon statistic.

Proposition 3.6 (Locally most powerful rank test for scale). Suppose that F has an absolutely continuous density f for which ∫ |x f'(x)| dx < ∞. Then the locally most powerful rank test of H : F = G versus K : G = F(·/θ) with θ > 1 is of the form

    φ(q) = 1 if S ≡ Σ_{j=1}^n a_N(q_j) > k_α,  γ if S = k_α,  0 if S < k_α,

where a_N(i) ≡ E_F { −1 − V_{(i)} (f'/f)(V_{(i)}) } and V_{(1)} < ··· < V_{(N)} are the order statistics in a sample of size N from F.

Example 3.20 If f(x) = e^{−x} 1_{[0,∞)}(x), then (f'/f)(x) = −1, and hence a_N(i) = E_F{−1 + V_{(i)}} = E_F{V_{(i)}} − 1, where the V_{(i)} are the order statistics of a sample of size N from F. But V_{(i)} =_d Σ_{j=1}^i Z_j/(N − j + 1), where the Z_j are i.i.d. exponential(1), and hence

    E(V_{(i)}) = Σ_{j=1}^i 1/(N − j + 1) = Σ_{k=N−i+1}^N 1/k = E{ −log(1 − U_{(i)}) },

since F^{−1}(t) = −log(1 − t). These are the Savage scores for testing an exponential scale change; the approximate scores are

    a_N(i) = −log( 1 − i/(N + 1) ),  i = 1, ..., N,

and the resulting test is sometimes called the log-rank test. Its modern derivation in survival analysis is via different considerations which allow for the introduction of censoring, rewritten in a martingale framework. [Recall that Σ_{k=1}^N 1/k − log N → γ ≈ 0.5772, Euler's constant, as N → ∞, so

    Σ_{k=N−i+1}^N 1/k = Σ_{k=1}^N 1/k − Σ_{k=1}^{N−i} 1/k ≈ log N − log(N − i) = −log(1 − i/N)

for large N.]
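The exact Savage scores and their logarithmic approximation are a one-liner to compare; N = 10 below is an illustrative choice.

```python
import numpy as np

# Exact Savage scores E(V_(i)) = sum_{k=N-i+1}^N 1/k (expected
# exponential order statistics) versus -log(1 - i/(N+1)).
N = 10
for i in range(1, N + 1):
    exact = sum(1.0 / k for k in range(N - i + 1, N + 1))
    approx = -np.log(1.0 - i / (N + 1.0))
    print(i, exact, approx)
```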

Note that when F is exponential(1), then

    1 − G(x) = 1 − F(x/θ) = exp(−x/θ) = (1 − F(x))^{1/θ},

or Λ_G = (1/θ)Λ_F = Δ Λ_F with Δ = 1/θ. Hence ψ(u) = 1 − (1 − u)^{1/θ} = 1 − (1 − u)^Δ, and ψ'(u) = Δ(1 − u)^{Δ−1}. Since the distribution of the ranks is the same for all (F, G) pairs with the same ψ, it follows that in fact the Savage test is the locally most powerful rank test of H : F = G versus the Lehmann alternative K : (1 − G) = (1 − F)^Δ, Δ < 1.

Example 3.21 If F is N(0, σ²), the LMPRT of H : F = G versus K : G = F(·/θ), θ > 1, rejects for large values of S ≡ Σ_{j=1}^n a_N(Q_j), where a_N(i) ≡ E(Z²_{(i)}) and Z_{(1)} < ··· < Z_{(N)} is an ordered sample from N(0, 1). The approximate scores are (Φ^{−1}(i/(N + 1)))².

Remark 3.2 Note that any rank statistic of the form S can be rewritten in terms of empirical distributions as follows:

    S = Σ_{j=1}^n a_N(Q_j) = Σ_{j=1}^n a_N(R_{m+j}) = Σ_{i=1}^N a_N(i) Z_i,

where Z_i = 0 or 1 according as the i-th smallest of the combined sample is an X or a Y. Let H_N(x) ≡ empirical df of the combined sample. Then H_N^{−1}(i/N) = i-th smallest of the combined sample, n G_n(H_N^{−1}(i/N)) = the number of Y_j's ≤ H_N^{−1}(i/N), and Z_i = { n Δ G_n(H_N^{−1}) }(i/N), where Δh(y) ≡ h(y) − h(y−). Therefore we can write

    S = Σ_{i=1}^N a_N(i) Z_i = Σ_{i=1}^N a_N(i) { n Δ G_n(H_N^{−1}) }(i/N) = n ∫_0^1 φ_N(u) dG_n(H_N^{−1}(u)),

where φ_N(u) ≡ a_N(i) for (i−1)/N < u ≤ i/N, 0 < u < 1. If φ_N → φ and λ_N ≡ m/N → λ, then it is often true that under alternatives F ≠ G,

    (1/n) S = ∫_0^1 φ_N(u) dG_n(H_N^{−1}(u)) →_a.s. ∫_0^1 φ(u) dG(H^{−1}(u)),

where H = λF + (1 − λ)G.

4 Efficiency of Tests

4.1 The Power of two tests

Example 4.1 (Power of the one-sample t test). Let X_1, ..., X_n be i.i.d. N(θ, σ²). We wish to test H : θ ≤ θ_0 versus K : θ > θ_0. The classical test of H versus K rejects H when t_n ≡ √n(X̄ − θ_0)/S > t_{n−1,α}.
(i) This test has asymptotically correct level of significance (assuming E(X²) < ∞, as we have by hypothesis) since, with Z ~ N(0, 1),

    P_{θ_0}(t_n > t_{n−1,α}) → P(Z > z_α) = α.

(ii) This test is consistent since, when a fixed θ > θ_0 is true,

    t_n = √n(X̄ − θ)/S + √n(θ − θ_0)/S →_d Z + ∞ = ∞,

and t_{n−1,α} → z_α, so that P_θ(t_n > t_{n−1,α}) → 1.
(iii) If X_1, ..., X_n are i.i.d. N(θ_n, σ²) = N(θ_0 + n^{−1/2} c_n, σ²) where c_n → c, then

    t_n = √n(X̄ − θ_n)/S + c_n/S →_d Z + c/σ ~ N(c/σ, 1).

Let β^t_n(θ) denote the power of the t test based on X_1, ..., X_n against the alternative θ. Then

(1)    β^t_n(θ_n) = β^t_n(θ_0 + n^{−1/2} c_n) = P_{θ_0 + c_n/√n}(t_n > t_{n−1,α})
(2)        → P( N(c/σ, 1) > z_α ).

Example 4.2 Let X_1, ..., X_n be i.i.d. with df F = F_0(· − θ) where F_0 has unique median 0 (so that F_0(0) = 1/2). We wish to test H : θ ≤ θ_0 versus K : θ > θ_0. Let Y_i ≡ 1{X_i ≥ θ_0} = 1_{[θ_0,∞)}(X_i) for i = 1, ..., n. The sign test of H versus K rejects H when S_n ≡ √n(Ȳ_n − 1/2) exceeds the upper α percentage point s_{n,α} of its distribution when θ_0 is true.
(i) When θ_0 is true, Y_i ~ Bernoulli(1/2), so that S_n →_d Z/2 ~ N(0, 1/4). Since the exact distribution of n Ȳ_n = Σ_1^n Y_i is Binomial(n, 1/2) for all df's F as above, the test has exact level of significance α for all such F.
(ii) This test is consistent, since when a θ exceeding θ_0 is true,

    S_n = √n(Ȳ_n − P_θ(X ≥ θ_0)) + √n{ P_θ(X ≥ θ_0) − 1/2 } →_d N(0, p(1−p)) + ∞ = ∞

with p ≡ 1 − F_0(θ_0 − θ) > 1/2, so that P_θ(S_n > s_{n,α}) → 1.
(iii) Suppose that X_1, ..., X_n are i.i.d. F_0(· − (θ_0 + n^{−1/2} d_n)) where d_n → d as n → ∞, and where we now assume that F_0 has a strictly positive derivative f_0 at 0. Then, using F_0(0) = 1/2, we have

    S_n = √n(Ȳ_n − P_{θ_0 + d_n/√n}(X ≥ θ_0)) + √n{ P_{θ_0 + d_n/√n}(X ≥ θ_0) − 1/2 }
        = n^{−1/2}{ Binomial(n, 1 − F_0(−d_n/√n)) − n(1 − F_0(−d_n/√n)) } + √n( F_0(0) − F_0(−d_n/√n) )
        →_d Z/2 + d f_0(0) ~ N(d f_0(0), 1/4).
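A Monte Carlo sketch comparing the two local power approximations under normal data; the sample size n = 100, level α = 0.05, and local drift c = 2 are illustrative values (with σ = 1 and F_0 = Φ, so f_0(0) = φ(0), and the sign test's asymptotic relative efficiency against the t test is 2/π).

```python
import numpy as np
from scipy.stats import norm, t

# Power of the one-sample t test and the sign test at the local
# alternative theta = c/sqrt(n), theta_0 = 0, versus the limits
# Phi(c/sigma - z_alpha) and Phi(2 f0(0) d - z_alpha) with d = c.
rng = np.random.default_rng(3)
n, alpha, c = 100, 0.05, 2.0
theta = c / np.sqrt(n)
B, rej_t, rej_s = 20000, 0, 0
t_crit = t.isf(alpha, n - 1)
s_crit = norm.isf(alpha) / 2.0        # asymptotic cutoff: S_n ~ N(0, 1/4)
for _ in range(B):
    x = rng.standard_normal(n) + theta
    if np.sqrt(n) * x.mean() / x.std(ddof=1) > t_crit:
        rej_t += 1
    if np.sqrt(n) * ((x >= 0).mean() - 0.5) > s_crit:
        rej_s += 1
print("t test:   ", rej_t / B, "limit", norm.cdf(c - norm.isf(alpha)))
print("sign test:", rej_s / B, "limit",
      norm.cdf(2 * norm.pdf(0) * c - norm.isf(alpha)))
```

The simulated powers should land near the two limiting values, with the sign test visibly less powerful than the t test under normal data, as the 2/π efficiency predicts.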


More information

Lecture 17: Likelihood ratio and asymptotic tests

Lecture 17: Likelihood ratio and asymptotic tests Lecture 17: Likelihood ratio and asymptotic tests Likelihood ratio When both H 0 and H 1 are simple (i.e., Θ 0 = {θ 0 } and Θ 1 = {θ 1 }), Theorem 6.1 applies and a UMP test rejects H 0 when f θ1 (X) f

More information

Economics 520. Lecture Note 19: Hypothesis Testing via the Neyman-Pearson Lemma CB 8.1,

Economics 520. Lecture Note 19: Hypothesis Testing via the Neyman-Pearson Lemma CB 8.1, Economics 520 Lecture Note 9: Hypothesis Testing via the Neyman-Pearson Lemma CB 8., 8.3.-8.3.3 Uniformly Most Powerful Tests and the Neyman-Pearson Lemma Let s return to the hypothesis testing problem

More information

2014/2015 Smester II ST5224 Final Exam Solution

2014/2015 Smester II ST5224 Final Exam Solution 014/015 Smester II ST54 Final Exam Solution 1 Suppose that (X 1,, X n ) is a random sample from a distribution with probability density function f(x; θ) = e (x θ) I [θ, ) (x) (i) Show that the family of

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

ST5215: Advanced Statistical Theory

ST5215: Advanced Statistical Theory Department of Statistics & Applied Probability Wednesday, October 19, 2011 Lecture 17: UMVUE and the first method of derivation Estimable parameters Let ϑ be a parameter in the family P. If there exists

More information

Statistical hypothesis testing The parametric and nonparametric cases. Madalina Olteanu, Université Paris 1

Statistical hypothesis testing The parametric and nonparametric cases. Madalina Olteanu, Université Paris 1 Statistical hypothesis testing The parametric and nonparametric cases Madalina Olteanu, Université Paris 1 2016-2017 Contents 1 Parametric hypothesis testing 3 1.1 An introduction on statistical hypothesis

More information

Chapter 3: Unbiased Estimation Lecture 22: UMVUE and the method of using a sufficient and complete statistic

Chapter 3: Unbiased Estimation Lecture 22: UMVUE and the method of using a sufficient and complete statistic Chapter 3: Unbiased Estimation Lecture 22: UMVUE and the method of using a sufficient and complete statistic Unbiased estimation Unbiased or asymptotically unbiased estimation plays an important role in

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

Z-estimators (generalized method of moments)

Z-estimators (generalized method of moments) Z-estimators (generalized method of moments) Consider the estimation of an unknown parameter θ in a set, based on data x = (x,...,x n ) R n. Each function h(x, ) on defines a Z-estimator θ n = θ n (x,...,x

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 7 Maximum Likelihood Estimation 7. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012 Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM

More information

DA Freedman Notes on the MLE Fall 2003

DA Freedman Notes on the MLE Fall 2003 DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar

More information

Chapter 7. Hypothesis Testing

Chapter 7. Hypothesis Testing Chapter 7. Hypothesis Testing Joonpyo Kim June 24, 2017 Joonpyo Kim Ch7 June 24, 2017 1 / 63 Basic Concepts of Testing Suppose that our interest centers on a random variable X which has density function

More information

Hypothesis Testing. A rule for making the required choice can be described in two ways: called the rejection or critical region of the test.

Hypothesis Testing. A rule for making the required choice can be described in two ways: called the rejection or critical region of the test. Hypothesis Testing Hypothesis testing is a statistical problem where you must choose, on the basis of data X, between two alternatives. We formalize this as the problem of choosing between two hypotheses:

More information

Exercises and Answers to Chapter 1

Exercises and Answers to Chapter 1 Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean

More information

ECE 275B Homework # 1 Solutions Version Winter 2015

ECE 275B Homework # 1 Solutions Version Winter 2015 ECE 275B Homework # 1 Solutions Version Winter 2015 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2

More information

Chapter 4: Asymptotic Properties of the MLE

Chapter 4: Asymptotic Properties of the MLE Chapter 4: Asymptotic Properties of the MLE Daniel O. Scharfstein 09/19/13 1 / 1 Maximum Likelihood Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider

More information

ECE 275B Homework # 1 Solutions Winter 2018

ECE 275B Homework # 1 Solutions Winter 2018 ECE 275B Homework # 1 Solutions Winter 2018 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2 < < x n Thus,

More information

Chapter 3. Point Estimation. 3.1 Introduction

Chapter 3. Point Estimation. 3.1 Introduction Chapter 3 Point Estimation Let (Ω, A, P θ ), P θ P = {P θ θ Θ}be probability space, X 1, X 2,..., X n : (Ω, A) (IR k, B k ) random variables (X, B X ) sample space γ : Θ IR k measurable function, i.e.

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

Course 212: Academic Year Section 1: Metric Spaces

Course 212: Academic Year Section 1: Metric Spaces Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........

More information

Limiting Distributions

Limiting Distributions Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the

More information

Statistics 135 Fall 2008 Final Exam

Statistics 135 Fall 2008 Final Exam Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations

More information

Concentration of Measures by Bounded Couplings

Concentration of Measures by Bounded Couplings Concentration of Measures by Bounded Couplings Subhankar Ghosh, Larry Goldstein and Ümit Işlak University of Southern California [arxiv:0906.3886] [arxiv:1304.5001] May 2013 Concentration of Measure Distributional

More information

Section 10: Role of influence functions in characterizing large sample efficiency

Section 10: Role of influence functions in characterizing large sample efficiency Section 0: Role of influence functions in characterizing large sample efficiency. Recall that large sample efficiency (of the MLE) can refer only to a class of regular estimators. 2. To show this here,

More information

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. 1 2 3 4 5 6 7 8 9 10 11 12 2. Write your answer

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

Concentration of Measures by Bounded Size Bias Couplings

Concentration of Measures by Bounded Size Bias Couplings Concentration of Measures by Bounded Size Bias Couplings Subhankar Ghosh, Larry Goldstein University of Southern California [arxiv:0906.3886] January 10 th, 2013 Concentration of Measure Distributional

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

5 Introduction to the Theory of Order Statistics and Rank Statistics

5 Introduction to the Theory of Order Statistics and Rank Statistics 5 Introduction to the Theory of Order Statistics and Rank Statistics This section will contain a summary of important definitions and theorems that will be useful for understanding the theory of order

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

Chapter 1. Statistical Spaces

Chapter 1. Statistical Spaces Chapter 1 Statistical Spaces Mathematical statistics is a science that studies the statistical regularity of random phenomena, essentially by some observation values of random variable (r.v.) X. Sometimes

More information

Theory of Statistical Tests

Theory of Statistical Tests Ch 9. Theory of Statistical Tests 9.1 Certain Best Tests How to construct good testing. For simple hypothesis H 0 : θ = θ, H 1 : θ = θ, Page 1 of 100 where Θ = {θ, θ } 1. Define the best test for H 0 H

More information

Final Exam. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given.

Final Exam. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given. (a) If X and Y are independent, Corr(X, Y ) = 0. (b) (c) (d) (e) A consistent estimator must be asymptotically

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

Probability and Measure

Probability and Measure Probability and Measure Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Convergence of Random Variables 1. Convergence Concepts 1.1. Convergence of Real

More information

L p Spaces and Convexity

L p Spaces and Convexity L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function

More information

Hypothesis Testing - Frequentist

Hypothesis Testing - Frequentist Frequentist Hypothesis Testing - Frequentist Compare two hypotheses to see which one better explains the data. Or, alternatively, what is the best way to separate events into two classes, those originating

More information

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1 Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in

More information

Riemann integral and volume are generalized to unbounded functions and sets. is an admissible set, and its volume is a Riemann integral, 1l E,

Riemann integral and volume are generalized to unbounded functions and sets. is an admissible set, and its volume is a Riemann integral, 1l E, Tel Aviv University, 26 Analysis-III 9 9 Improper integral 9a Introduction....................... 9 9b Positive integrands................... 9c Special functions gamma and beta......... 4 9d Change of

More information

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989), Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

1 Complete Statistics

1 Complete Statistics Complete Statistics February 4, 2016 Debdeep Pati 1 Complete Statistics Suppose X P θ, θ Θ. Let (X (1),..., X (n) ) denote the order statistics. Definition 1. A statistic T = T (X) is complete if E θ g(t

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Hypothesis testing is a statistical problem where you must choose, on the basis of data X, between two alternatives. We formalize this as the problem of choosing between two

More information

LECTURE NOTES 57. Lecture 9

LECTURE NOTES 57. Lecture 9 LECTURE NOTES 57 Lecture 9 17. Hypothesis testing A special type of decision problem is hypothesis testing. We partition the parameter space into H [ A with H \ A = ;. Wewrite H 2 H A 2 A. A decision problem

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Statistics 581, Problem Set 8 Solutions Wellner; 11/22/2018

Statistics 581, Problem Set 8 Solutions Wellner; 11/22/2018 Statistics 581, Problem Set 8 Solutions Wellner; 11//018 1. (a) Show that if θ n = cn 1/ and T n is the Hodges super-efficient estimator discussed in class, then the sequence n(t n θ n )} is uniformly

More information

Limiting Distributions

Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the two fundamental results

More information

Fundamentals of Statistics

Fundamentals of Statistics Chapter 2 Fundamentals of Statistics This chapter discusses some fundamental concepts of mathematical statistics. These concepts are essential for the material in later chapters. 2.1 Populations, Samples,

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

Measurable functions are approximately nice, even if look terrible.

Measurable functions are approximately nice, even if look terrible. Tel Aviv University, 2015 Functions of real variables 74 7 Approximation 7a A terrible integrable function........... 74 7b Approximation of sets................ 76 7c Approximation of functions............

More information

Asymptotic statistics using the Functional Delta Method

Asymptotic statistics using the Functional Delta Method Quantiles, Order Statistics and L-Statsitics TU Kaiserslautern 15. Februar 2015 Motivation Functional The delta method introduced in chapter 3 is an useful technique to turn the weak convergence of random

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

Ph.D. Qualifying Exam: Algebra I

Ph.D. Qualifying Exam: Algebra I Ph.D. Qualifying Exam: Algebra I 1. Let F q be the finite field of order q. Let G = GL n (F q ), which is the group of n n invertible matrices with the entries in F q. Compute the order of the group G

More information

June 21, Peking University. Dual Connections. Zhengchao Wan. Overview. Duality of connections. Divergence: general contrast functions

June 21, Peking University. Dual Connections. Zhengchao Wan. Overview. Duality of connections. Divergence: general contrast functions Dual Peking University June 21, 2016 Divergences: Riemannian connection Let M be a manifold on which there is given a Riemannian metric g =,. A connection satisfying Z X, Y = Z X, Y + X, Z Y (1) for all

More information

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018 Mathematics Ph.D. Qualifying Examination Stat 52800 Probability, January 2018 NOTE: Answers all questions completely. Justify every step. Time allowed: 3 hours. 1. Let X 1,..., X n be a random sample from

More information

Stochastic Convergence, Delta Method & Moment Estimators

Stochastic Convergence, Delta Method & Moment Estimators Stochastic Convergence, Delta Method & Moment Estimators Seminar on Asymptotic Statistics Daniel Hoffmann University of Kaiserslautern Department of Mathematics February 13, 2015 Daniel Hoffmann (TU KL)

More information

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j. Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

CHAPTER 3: LARGE SAMPLE THEORY

CHAPTER 3: LARGE SAMPLE THEORY CHAPTER 3 LARGE SAMPLE THEORY 1 CHAPTER 3: LARGE SAMPLE THEORY CHAPTER 3 LARGE SAMPLE THEORY 2 Introduction CHAPTER 3 LARGE SAMPLE THEORY 3 Why large sample theory studying small sample property is usually

More information

Ch. 5 Hypothesis Testing

Ch. 5 Hypothesis Testing Ch. 5 Hypothesis Testing The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s, early 30s, complementing Fisher s work on estimation. As in estimation,

More information

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution.

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Hypothesis Testing Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Suppose the family of population distributions is indexed

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

λ(x + 1)f g (x) > θ 0

λ(x + 1)f g (x) > θ 0 Stat 8111 Final Exam December 16 Eleven students took the exam, the scores were 92, 78, 4 in the 5 s, 1 in the 4 s, 1 in the 3 s and 3 in the 2 s. 1. i) Let X 1, X 2,..., X n be iid each Bernoulli(θ) where

More information

the convolution of f and g) given by

the convolution of f and g) given by 09:53 /5/2000 TOPIC Characteristic functions, cont d This lecture develops an inversion formula for recovering the density of a smooth random variable X from its characteristic function, and uses that

More information

7 Convergence in R d and in Metric Spaces

7 Convergence in R d and in Metric Spaces STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a

More information

Convex Analysis and Optimization Chapter 2 Solutions

Convex Analysis and Optimization Chapter 2 Solutions Convex Analysis and Optimization Chapter 2 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

ANALYSIS QUALIFYING EXAM FALL 2017: SOLUTIONS. 1 cos(nx) lim. n 2 x 2. g n (x) = 1 cos(nx) n 2 x 2. x 2.

ANALYSIS QUALIFYING EXAM FALL 2017: SOLUTIONS. 1 cos(nx) lim. n 2 x 2. g n (x) = 1 cos(nx) n 2 x 2. x 2. ANALYSIS QUALIFYING EXAM FALL 27: SOLUTIONS Problem. Determine, with justification, the it cos(nx) n 2 x 2 dx. Solution. For an integer n >, define g n : (, ) R by Also define g : (, ) R by g(x) = g n

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information