Nonparametric one-sided testing for the mean and related extremum problems

Norbert Gaffke
University of Magdeburg, Faculty of Mathematics
D-39016 Magdeburg, PF 4120, Germany
E-mail: norbert.gaffke@mathematik.uni-magdeburg.de

Abstract

We consider the nonparametric model of n i.i.d. nonnegative real random variables whose distribution is unknown. An interesting parameter of that distribution is its expectation µ. Wang & Zhao (2003) studied the problem of testing the one-sided hypotheses H_0: µ ≤ µ_0 vs. H_1: µ > µ_0 (with a given µ_0 > 0, where w.l.g. one may take µ_0 = 1). For n = 1 there is a UMP nonrandomized level α test. Somewhat surprisingly, for n = 2 Wang & Zhao obtained a UMP nonrandomized monotone symmetric level α test. However, they conjectured that the result will not carry over to larger sample sizes n ≥ 3. Unfortunately, their conjecture is true, as we will show. Also, we present an alternative proof of their (positive) result for n = 2. Our derivations are based on a study of related classes of extremum problems on products of probability measures.

Key words: Monotone symmetric test, UMP test, order statistics, probability measures, weak topology, semi-continuity.

2000 Mathematics Subject Classification: Primary 62G10, secondary 28C15.

1 Introduction

Let X_1, ..., X_n be nonnegative real i.i.d. random variables whose distribution P^{X_i} is unknown. We are interested in the expectation parameter µ = E(X_i). Note that µ ∈ [0, ∞]. Consider the one-sided testing problem, H_0: µ ≤ µ_0 vs. H_1: µ > µ_0, for a given µ_0 > 0. W.l.g. we may assume that µ_0 = 1, since for any given value µ_0 > 0 we can transform to that case via X_i' = X_i/µ_0 (1 ≤ i ≤ n). So in the following we will restrict to the testing problem

H_0: µ ≤ 1  vs.  H_1: µ > 1.   (1)

The recent paper [1] deals with testing the one-sided hypotheses (1). For n = 1 the UMP nonrandomized level α test is given by φ = 1_{[1/α, ∞)}. For n = 2 there was derived in [1] a UMP test within a class of tests slightly larger than the class of all nonrandomized
monotone (nondecreasing) symmetric level α tests (see Section 3). This is a highly nontrivial and somewhat surprising result. Note that a test φ is said to be symmetric iff it is a symmetric function of the observations x_1, ..., x_n, and φ is said to be monotone (nondecreasing) iff x, z ∈ [0, ∞)^n and x ≤ z (w.r.t. the componentwise semi-ordering) imply φ(x) ≤ φ(z). Unfortunately, that result cannot be extended to larger sample sizes n ≥ 3, as we will show in Section 5 and as it was conjectured in [1], p. 76. Also, we will give a proof of the positive result for n = 2 which, as we think, is more transparent than that presented in [1]. Our derivations are based on a study of related classes of extremum problems. E.g., an extremum problem arises when the true level of a given test for (1) has to be determined. The auxiliary results on such extremum problems presented in Section 2 might also be of some independent interest.

2 Related extremum problems

Let S be a nonempty compact subset of R^n, and let Prob(S) denote the set of all Borel probability measures on S. Endowed with the weak topology (inducing weak convergence) Prob(S) is a compact space (cf. [2], Satz 31.2 and Bemerkung 1 on p. 237). Clearly, Prob(S) is a convex set, i.e., if Q_1, Q_2 ∈ Prob(S) and 0 ≤ λ ≤ 1 then λQ_1 + (1−λ)Q_2 ∈ Prob(S). A real function g: S → R is said to be upper semi-continuous (u.s.c.) iff for any given u ∈ R the set {z ∈ S : g(z) ≥ u} is closed. Equivalently, g is u.s.c. iff for any sequence z_k ∈ S (k ∈ N) converging to some z_0 ∈ S one has lim sup_{k→∞} g(z_k) ≤ g(z_0). Similarly, we have the notion of upper semi-continuity of a real function on Prob(S). A function G: Prob(S) → R is said to be u.s.c. iff for any given u ∈ R the set {Q ∈ Prob(S) : G(Q) ≥ u} is a closed set w.r.t. the weak topology. Equivalently, the u.s.c. property of G means that whenever Q_k ∈ Prob(S) (k ∈ N) is a sequence converging weakly to some Q_0 ∈ Prob(S) then lim sup_{k→∞} G(Q_k) ≤ G(Q_0).

Lemma 2.1 Let S be a nonempty compact subset of R^n and g: S → R be a nonnegative u.s.c. function. Then, by G(Q) = E_Q(g) (the expectation of g w.r.t. Q), for all Q ∈ Prob(S), we have a nonnegative real u.s.c. function on Prob(S).

Proof. Firstly, we consider the special case that g is the indicator function of a closed subset C of S, i.e., g = 1_C on S. Then G(Q) = Q(C) for all Q ∈ Prob(S). If Q_k (k ∈ N) is a sequence in Prob(S) which converges weakly to some Q_0 ∈ Prob(S), then by [3], Theorem 2.1, we have lim sup_{k→∞} Q_k(C) ≤ Q_0(C), i.e., the function G is u.s.c. Now let g be an arbitrary nonnegative u.s.c. function on S. As an u.s.c. function on the compact set S the function g is bounded above, and hence 0 ≤ g(z) ≤ v for all z ∈ S with some finite v > 0. In particular, G(Q) = E_Q(g) is nonnegative and finite for all Q ∈ Prob(S). Also, together with a well-known formula for expectations of nonnegative functions, we have for all Q ∈ Prob(S), abbreviating C_g(u) = {z ∈ S : g(z) ≥ u},

E_Q(g) = ∫_0^∞ Q(C_g(u)) du = ∫_0^v Q(C_g(u)) du.

Let Q_k be a sequence in Prob(S) which converges weakly to Q_0 ∈ Prob(S). Since for any 0 ≤ u ≤ v the set C_g(u) is closed, we have by the above, lim sup_{k→∞} Q_k(C_g(u)) ≤ Q_0(C_g(u)) for all 0 ≤ u ≤ v,
and hence (by a result from integration theory, cf. [2], p. 100, Ex. 1),

lim sup_{k→∞} ∫_0^v Q_k(C_g(u)) du ≤ ∫_0^v lim sup_{k→∞} Q_k(C_g(u)) du ≤ ∫_0^v Q_0(C_g(u)) du,

i.e., lim sup_{k→∞} G(Q_k) ≤ G(Q_0), and so G is u.s.c. □

The support of a probability measure Q ∈ Prob(S), denoted by supp(Q), is the smallest closed subset of S which has Q-probability equal to one. That is, supp(Q) is a closed subset of S with Q(supp(Q)) = 1, and if A is any closed subset of S with Q(A) = 1 then supp(Q) ⊆ A. More explicitly, supp(Q) consists of all points z ∈ S such that for each open neighborhood U_z of z in R^n one has Q(U_z ∩ S) > 0. Clearly, if Q is an r-point distribution (where r ∈ N) giving probabilities q_1, ..., q_r > 0 to distinct points z_1, ..., z_r ∈ S, resp. (where Σ_{i=1}^r q_i = 1), for short

Q = ( z_1 ... z_r ; q_1 ... q_r ),

then supp(Q) = {z_1, ..., z_r}.

Theorem 2.2 For a given real number c > 1 consider the compact interval I = [0, c] of the real line, and let g: I → R be a nonnegative u.s.c. function. Consider the extremum problem

(EP1) maximize G(Q) = E_Q(g) s.t. Q ∈ Prob(I), E[Q] = 1,

where E[Q] = ∫_I z dQ(z) denotes the expectation of Q. Then:

(i) There exists an optimal solution Q_0 to problem (EP1).

(ii) Let Q_0 ∈ Prob(I) with E[Q_0] = 1 be given. Q_0 is an optimal solution to (EP1) if and only if there exist ρ ∈ [0, ∞) and τ ∈ R such that

g(z) ≤ ρ + τz for all z ∈ I and g(z) = ρ + τz for all z ∈ supp(Q_0).

(iii) There exists a one- or two-point distribution Q_0 which is an optimal solution to (EP1).

Proof. The feasible region of (EP1) is compact w.r.t. the weak topology and, by Lemma 2.1, the objective function G is u.s.c. w.r.t. the weak topology. This proves (i). To prove the "only if" part of (ii), let Q_0 be an optimal solution to (EP1). Consider the subset of R^2,

C = { (E[Q], G(Q)) : Q ∈ Prob(I) }.

Because of E[λQ_1 + (1−λ)Q_2] = λE[Q_1] + (1−λ)E[Q_2] and G(λQ_1 + (1−λ)Q_2) = λG(Q_1) + (1−λ)G(Q_2) for all Q_1, Q_2 ∈ Prob(I) and all 0 ≤ λ ≤ 1, the set C is convex. Denote by γ the optimum value of (EP1), i.e., γ = G(Q_0). So the point (1, γ) belongs to C and, clearly, it must be a boundary point of C. So there is a supporting straight line for C at (1, γ), i.e., there exist a, b ∈ R, not both equal to zero, such that

aE[Q] + bG(Q) ≤ a + bγ for all Q ∈ Prob(I).   (2)
Specializing to one-point distributions, (2) yields

az + bg(z) ≤ a + bγ for all z ∈ I.   (3)

The expectation w.r.t. Q_0 of the l.h.s. of (3) equals a + bγ, and hence

az + bg(z) = a + bγ Q_0-a.s.   (4)

Case 1: b < 0. Then (3) rewrites as

g(z) ≥ a/b + γ − (a/b) z for all z ∈ I.

Suppose that there is some z_0 ∈ I such that the strict inequality holds, i.e., g(z_0) > a/b + γ − (a/b) z_0. Then we can construct a one- or a two-point distribution Q_1 ∈ Prob(I) which has z_0 as a support point and with E[Q_1] = 1. So Q_1 is feasible for (EP1) and G(Q_1) = E_{Q_1}(g) > γ, which is a contradiction. Hence it follows that

g(z) = a/b + γ − (a/b) z for all z ∈ I,

and thus, taking ρ = a/b + γ and τ = −a/b, we have g(z) = ρ + τz for all z ∈ I. In particular, ρ = g(0) ≥ 0 and Q_0 satisfies the necessary condition of (ii).

Case 2: b = 0. Then a ≠ 0 and (3) rewrites as az ≤ a for all z ∈ I. This yields either z ≤ 1 for all z ∈ I or z ≥ 1 for all z ∈ I, which is a contradiction (I = [0, c] and c > 1). So actually, case 2 cannot occur.

Case 3: b > 0. Then, with ρ = a/b + γ and τ = −a/b, (3) and (4) rewrite as

g(z) ≤ ρ + τz for all z ∈ I, and g(z) = ρ + τz Q_0-a.s.

The latter can be stated as Q_0({z ∈ I : g(z) ≥ ρ + τz}) = 1, and since g is u.s.c. the set of all z ∈ I with g(z) ≥ ρ + τz is closed. So the support of Q_0 must be a subset of that set, and we have g(z) = ρ + τz for all z ∈ supp(Q_0). Clearly, ρ ≥ 0 follows from 0 ≤ g(0) ≤ ρ.

To prove the "if" part of (ii), let Q_0 ∈ Prob(I) with E[Q_0] = 1, and ρ ≥ 0, τ ∈ R such that g(z) ≤ ρ + τz for all z ∈ I and g(z) = ρ + τz for all z ∈ supp(Q_0). Then, for any Q ∈ Prob(I) with E[Q] = 1, we obtain E_Q(g) ≤ ρ + τ and E_{Q_0}(g) = ρ + τ, hence G(Q) ≤ G(Q_0), and so Q_0 is an optimal solution to (EP1).

It remains to prove (iii). Choose any optimal solution Q_0 of (EP1) according to (i), and choose ρ and τ according to (ii). If 1 ∈ supp(Q_0) then the one-point distribution at 1, δ_1 say, is trivially feasible for (EP1), and G(δ_1) = g(1) = ρ + τ = E_{Q_0}(g) = G(Q_0), i.e., δ_1 is also an optimal solution to (EP1). Now let 1 ∉ supp(Q_0). Because of E[Q_0] = 1, the support of Q_0 is neither a subset of the subinterval [0, 1] nor a subset of the subinterval [1, c]. Hence there exist z_1, z_2 ∈ supp(Q_0) with z_1 < 1 < z_2. Choose 0 < λ < 1 with λz_1 + (1−λ)z_2 = 1. So the two-point distribution

Q_1 = ( z_1 z_2 ; λ 1−λ )

is feasible for (EP1) and g(z) = ρ + τz on the support {z_1, z_2} of Q_1. By (ii), Q_1 is also an optimal solution to (EP1). □
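To illustrate Theorem 2.2 (iii) numerically: after discretizing I, problem (EP1) becomes a linear program with two equality constraints (total mass and expectation), so an optimal basic solution puts mass on at most two grid points. The following Python sketch is our own illustration and not part of the original argument; the grid, the particular choice of g, and the use of SciPy's linprog are assumptions made only for this example.

import numpy as np
from scipy.optimize import linprog

# Discretized (EP1): maximize sum(g*q) over probability vectors q on a grid of
# I = [0, c], subject to sum(q) = 1 and sum(z*q) = 1.  With two equality
# constraints, an optimal basic (vertex) solution has at most two nonzero
# entries, mirroring Theorem 2.2 (iii).
c = 3.0
z = np.linspace(0.0, c, 601)                   # grid of support points in I
g = np.minimum(1.0, 0.25 + 0.3 * np.sqrt(z))   # some nonnegative (u.s.c.) example function

A_eq = np.vstack([np.ones_like(z), z])
b_eq = np.array([1.0, 1.0])                    # total mass 1, expectation 1
res = linprog(-g, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, None)] * len(z), method="highs")

print("optimal value:", -res.fun)
print("support of the optimal Q:", z[res.x > 1e-9])   # typically one or two points

With a simplex-type solver the reported support typically consists of one or two grid points, in line with part (iii); the multipliers of the two equality constraints play the role of ρ and τ in part (ii).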
Now, for I = [0, c] with a given finite c > 1 as above, we consider the n-dimensional cube I^n, where n ∈ N, n ≥ 2. For Q_1, ..., Q_n ∈ Prob(I) we denote by ⊗_{i=1}^n Q_i the product of Q_1, ..., Q_n, i.e., the joint distribution of n stochastically independent random variables X_1, ..., X_n with distributions Q_1, ..., Q_n, resp., which is a member of Prob(I^n). If Q_1 = ... = Q_n = Q, say, then we write Q^n instead of ⊗_{i=1}^n Q. Let g: I^n → R be a given nonnegative u.s.c. function on the cube I^n. Firstly, we will restrict to the case that g is symmetric, i.e., permutationally invariant,

g(z_{π(1)}, ..., z_{π(n)}) = g(z_1, ..., z_n) for all (z_1, ..., z_n) ∈ I^n and all permutations π of 1, ..., n.

We consider the following extremum problem.

(EP2) maximize G_n(Q) = E_{Q^n}(g) s.t. Q ∈ Prob(I), E[Q] = 1.

A necessary condition that a given feasible (for (EP2)) Q_0 is an optimal solution to (EP2) is that the directional derivatives of the objective function at Q_0 for all feasible directions are nonpositive, i.e.,

lim_{ε→0+} (1/ε) ( G_n((1−ε)Q_0 + εQ) − G_n(Q_0) ) ≤ 0 for all Q ∈ Prob(I) with E[Q] = 1.   (5)

In fact, the directional derivative (the limit on the l.h.s. of (5)) exists and is given by

n ( E_{Q_0^{n−1} ⊗ Q}(g) − G_n(Q_0) ),   (6)

which can be seen as follows. By the symmetry of g we get for any 0 < ε < 1,

G_n((1−ε)Q_0 + εQ) = E_{((1−ε)Q_0+εQ)^n}(g)
= (1−ε)^n E_{Q_0^n}(g) + Σ_{j=1}^{n−1} C(n, j) (1−ε)^j ε^{n−j} E_{Q_0^j ⊗ Q^{n−j}}(g) + ε^n E_{Q^n}(g),

where C(n, j) denotes the binomial coefficient. Subtracting G_n(Q_0) = E_{Q_0^n}(g), dividing by ε, and letting ε → 0+ the limit becomes

lim_{ε→0+} ( ((1−ε)^n − 1)/ε · G_n(Q_0) + C(n, n−1) (1−ε)^{n−1} E_{Q_0^{n−1} ⊗ Q}(g) ),
and formula (6) follows from lim_{ε→0+} ((1−ε)^n − 1)/ε = −n and C(n, n−1) = n. We note that the expectation in (6) can be written as

E_{Q_0^{n−1} ⊗ Q}(g) = E_Q(g_{Q_0}), where   (7)

g_{Q_0}(t) = ∫_{I^{n−1}} g(z_1, ..., z_{n−1}, t) dQ_0^{n−1}(z_1, ..., z_{n−1}), t ∈ I.   (8)

Theorem 2.3 Let I = [0, c] with a real c > 1, let n ∈ N, n ≥ 2, and g: I^n → R be nonnegative, symmetric, and u.s.c. Then, for the extremum problem (EP2) above one has:

(i) There exists an optimal solution to (EP2).

(ii) If Q_0 is an optimal solution to (EP2) then there exist ρ ∈ [0, ∞) and τ ∈ R such that the function g_{Q_0} from (8) satisfies

g_{Q_0}(t) ≤ ρ + τt for all t ∈ I and g_{Q_0}(t) = ρ + τt for all t ∈ supp(Q_0),

and for the optimum value of (EP2) we have G_n(Q_0) = ρ + τ.

Proof. By Lemma 2.1 the function G: Prob(I^n) → R, G(R) = E_R(g), is u.s.c. w.r.t. the weak topology on Prob(I^n). Since Q ↦ Q^n is a continuous mapping from Prob(I) to Prob(I^n) (w.r.t. the weak topologies), we see from G_n(Q) = G(Q^n), Q ∈ Prob(I), that the objective function G_n of (EP2) is u.s.c. By the compactness of the feasible region of (EP2) statement (i) follows. To prove (ii), let Q_0 be an optimal solution to (EP2). Then, for any Q ∈ Prob(I) with E[Q] = 1 and any 0 < ε < 1 the convex combination (1−ε)Q_0 + εQ is again feasible for (EP2) and hence G_n((1−ε)Q_0 + εQ) ≤ G_n(Q_0). So the directional derivatives from (5) must be nonpositive and together with (6), (7), (8) we get

E_Q(g_{Q_0}) ≤ E_{Q_0}(g_{Q_0}).

In other words, Q_0 is also an optimal solution to the extremum problem

maximize G_{Q_0}(Q) = E_Q(g_{Q_0}) s.t. Q ∈ Prob(I), E[Q] = 1,

which is of type (EP1) considered in Theorem 2.2. To apply Theorem 2.2 we have to verify the conditions that g_{Q_0} is nonnegative real and u.s.c. As a nonnegative u.s.c. function g satisfies 0 ≤ g(z) ≤ v for all z ∈ I^n for some finite v > 0, from which we immediately get 0 ≤ g_{Q_0}(t) ≤ v for all t ∈ I. Let t_k ∈ I (k ∈ N) be a sequence converging to some t_0 ∈ I. The u.s.c. property of g implies lim sup_{k→∞} g(z_1, ..., z_{n−1}, t_k) ≤ g(z_1, ..., z_{n−1}, t_0) for all (z_1, ..., z_{n−1}) ∈ I^{n−1}, and thus

lim sup_{k→∞} g_{Q_0}(t_k) = lim sup_{k→∞} ∫_{I^{n−1}} g(z_1, ..., z_{n−1}, t_k) dQ_0^{n−1}(z_1, ..., z_{n−1})
≤ ∫_{I^{n−1}} lim sup_{k→∞} g(z_1, ..., z_{n−1}, t_k) dQ_0^{n−1}(z_1, ..., z_{n−1})
≤ ∫_{I^{n−1}} g(z_1, ..., z_{n−1}, t_0) dQ_0^{n−1}(z_1, ..., z_{n−1}) = g_{Q_0}(t_0).
This proves the u.s.c. property of g_{Q_0}. Now we can apply Theorem 2.2, part (ii), which shows that Q_0 satisfies the condition in (ii) of Theorem 2.3, where G_n(Q_0) = ρ + τ follows from G_n(Q_0) = E_{Q_0}(g_{Q_0}) = ρ + τE[Q_0] = ρ + τ. □

We consider still another extremum problem,

(EP3) maximize G̃_n(Q_1, ..., Q_n) = E_{⊗_{i=1}^n Q_i}(g) s.t. Q_1, ..., Q_n ∈ Prob(I), E[Q_i] = 1, 1 ≤ i ≤ n.

Again, the given function g: I^n → R is assumed to be nonnegative and u.s.c., but we do not impose here a symmetry condition on g.

Theorem 2.4 Let I = [0, c] with a real c > 1, let n ∈ N, n ≥ 2, and g: I^n → R be nonnegative and u.s.c. Then for the extremum problem (EP3) from above there exists an optimal solution (Q_1*, ..., Q_n*) such that each Q_i* (1 ≤ i ≤ n) is a one- or two-point distribution.

Proof. By Lemma 2.1 the function G: Prob(I^n) → R, G(R) = E_R(g), is u.s.c. w.r.t. the weak topology on Prob(I^n). Since (Q_1, ..., Q_n) ↦ ⊗_{i=1}^n Q_i is a continuous mapping from (Prob(I))^n to Prob(I^n) (w.r.t. the weak topologies), we see from G̃_n(Q_1, ..., Q_n) = G(⊗_{i=1}^n Q_i) that the objective function G̃_n of (EP3) is u.s.c. By the compactness of the feasible region of (EP3) it follows that there exists an optimal solution (Q_1*, ..., Q_n*) to (EP3). It remains to show that each Q_i* can be chosen as a one- or two-point distribution. To this end let (Q_1*, ..., Q_n*) be any optimal solution to (EP3). For a given i ∈ {1, ..., n} we can write

G̃_n(Q_1*, ..., Q_n*) = ∫_I g_i* dQ_i*, where

g_i*(t) = ∫_{I^{n−1}} g(z_1, ..., z_{i−1}, t, z_{i+1}, ..., z_n) d(⊗_{j=1, j≠i}^n Q_j*)(z_1, ..., z_{i−1}, z_{i+1}, ..., z_n) for all t ∈ I.

Along similar lines as in the proof of Theorem 2.3 one can see that g_i* is a nonnegative real u.s.c. function on I. Considering the (EP1),

maximize G_i(Q) = E_Q(g_i*) s.t. Q ∈ Prob(I), E[Q] = 1,

we get from Theorem 2.2, part (iii), the existence of a one- or two-point distribution Q_i' which is an optimal solution to that (EP1). Observing that

E_{Q_i'}(g_i*) = G̃_n(Q_1*, ..., Q_{i−1}*, Q_i', Q_{i+1}*, ..., Q_n*), E_{Q_i*}(g_i*) = G̃_n(Q_1*, ..., Q_n*),

we have

G̃_n(Q_1*, ..., Q_{i−1}*, Q_i', Q_{i+1}*, ..., Q_n*) ≥ G̃_n(Q_1*, ..., Q_n*),
and, since (Q_1*, ..., Q_n*) is an optimal solution to our original (EP3), the inequality is actually an equality. So we have shown that if (Q_1*, ..., Q_n*) is an optimal solution to (EP3) then for any given i ∈ {1, ..., n} there is a one- or two-point distribution Q_i' such that (Q_1*, ..., Q_{i−1}*, Q_i', Q_{i+1}*, ..., Q_n*) is also an optimal solution to (EP3). Applying this successively for i = 1, ..., n the result follows. □

We close this section by adding a technical lemma, which will enable us for certain extremum problems below to cut down probability distributions on the real half line [0, ∞) to a compact interval I = [0, c] (with a c > 1) as considered in the above extremum problems. By Prob([0, ∞)) we denote the set of all Borel probability measures on [0, ∞). A function g: [0, ∞)^n → R is said to be nondecreasing iff x, y ∈ [0, ∞)^n and x ≤ y (componentwise) imply g(x) ≤ g(y).

Lemma 2.5 Let g: [0, ∞)^n → R be nonnegative and nondecreasing (and Borel-measurable). Assume that there is a real constant c > 1 such that

g(z_1, ..., z_n) = g(h_c(z_1), ..., h_c(z_n)) for all z_1, ..., z_n ∈ [0, ∞),

where we denote h_c: [0, ∞) → [0, c], h_c(t) = min{t, c}. Let Q_1, ..., Q_n ∈ Prob([0, ∞)) with E[Q_i] ≤ 1 (1 ≤ i ≤ n) be given. Denote by Q_i^{h_c} ∈ Prob([0, c]) the distribution of h_c under Q_i (1 ≤ i ≤ n), by δ_c ∈ Prob([0, c]) the one-point distribution at c, and define Q_{i,c} ∈ Prob([0, c]) (1 ≤ i ≤ n) by

Q_{i,c} = (1−λ_i)Q_i^{h_c} + λ_i δ_c, where λ_i = (1 − E[Q_i^{h_c}])/(c − E[Q_i^{h_c}]), (1 ≤ i ≤ n).

Then:

E[Q_{i,c}] = 1 (1 ≤ i ≤ n) and ∫_{[0,∞)^n} g d(⊗_{i=1}^n Q_i) ≤ ∫_{[0,c]^n} g d(⊗_{i=1}^n Q_{i,c}).

Proof. Obviously, E[Q_i^{h_c}] ≤ E[Q_i] ≤ 1, hence 0 ≤ λ_i < 1, and the Q_{i,c} are in fact probability distributions on [0, c]. Also, by the definition of λ_i, E[Q_{i,c}] = (1−λ_i)E[Q_i^{h_c}] + λ_i c = 1. Clearly, for each i = 1, ..., n, the distribution Q_i^{h_c} is stochastically smaller than or equal to Q_{i,c}, and since g is nondecreasing we have (cf. [4], Theorem 4.B.10, part (b)),

∫_{[0,c]^n} g d(⊗_{i=1}^n Q_i^{h_c}) ≤ ∫_{[0,c]^n} g d(⊗_{i=1}^n Q_{i,c}).

The integral on the l.h.s. equals

∫_{[0,∞)^n} g(h_c(z_1), ..., h_c(z_n)) d(⊗_{i=1}^n Q_i),

which is the same as ∫_{[0,∞)^n} g d(⊗_{i=1}^n Q_i) because of g(h_c(z_1), ..., h_c(z_n)) = g(z_1, ..., z_n) by the assumed condition on g. □

Remark. If in the lemma the Q_i are all equal, Q_1 = ... = Q_n = Q, say, then the Q_{i,c} are all equal as well, Q_{1,c} = ... = Q_{n,c} = Q_c, say.
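The truncation construction of Lemma 2.5 is easy to mirror numerically. The following sketch is our own illustration (the example distribution, the function g, and all names are ours): starting from a discrete Q on [0, ∞) with E[Q] ≤ 1, it builds Q_c = (1−λ)Q^{h_c} + λδ_c and checks that E[Q_c] = 1 and that the expectation of a nondecreasing g depending only on the truncated values does not decrease.

import itertools
import numpy as np

c = 2.5
points = np.array([0.0, 0.8, 1.5, 4.0])      # support of Q
probs  = np.array([0.4, 0.3, 0.2, 0.1])      # E[Q] = 0.94 <= 1

# image distribution Q^{h_c} of h_c(t) = min(t, c) under Q
h_points = np.minimum(points, c)
m = probs @ h_points                          # E[Q^{h_c}]
lam = (1.0 - m) / (c - m)
supp_c = np.append(h_points, c)               # Q_c = (1-lam) Q^{h_c} + lam * delta_c
mass_c = np.append((1.0 - lam) * probs, lam)
print("E[Q_c] =", mass_c @ supp_c)            # equals 1 by construction

def g(z1, z2):                                # nondecreasing, depends only on min(z_i, c)
    return min(z1, c) * min(z2, c)

def Eg(supp, mass):                           # E(g) under the product measure Q x Q
    return sum(p1 * p2 * g(z1, z2)
               for (z1, p1), (z2, p2) in itertools.product(list(zip(supp, mass)), repeat=2))

print("E under Q x Q:    ", Eg(points, probs))
print("E under Q_c x Q_c:", Eg(supp_c, mass_c))   # at least as large, as in Lemma 2.5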
3 An important statistic

We return to the nonparametric statistical model of Section 1, i.e., we suppose that the values x_1, ..., x_n of n i.i.d. nonnegative real random variables X_1, ..., X_n are observed whose distribution is unknown, and we consider the one-sided hypotheses (1) about the expectation µ = E(X_i). Let 0 < α < 1 be a given level. For n = 1, it was shown by Wang & Zhao in [1], pp. 82-83, that φ = 1_{[1/α, ∞)} is UMP among all nonrandomized level α tests. Also they showed that this is no longer true if one admits randomized tests. For n = 2, they constructed a UMP test within the class of all strongly monotone and symmetric level α tests (p. 84 of [1]), which is a highly nontrivial result. We employ the notion of a strongly monotone test from [1] for arbitrary n. A test φ: [0, ∞)^n → [0, 1] is said to be strongly monotone (nondecreasing) iff, firstly, φ(x) > 0, x ≤ z, x ≠ z imply φ(z) = 1 and, secondly, φ(x) < 1, x ≥ z, x ≠ z imply φ(z) = 0. Clearly, strong monotonicity of φ implies monotonicity of φ, i.e., φ is nondecreasing (x ≤ z implies φ(x) ≤ φ(z)). For a nonrandomized test φ strong monotonicity of φ is the same as monotonicity of φ (φ is nondecreasing). So, the class of all strongly monotone symmetric level α tests for H_0 is slightly larger than the class of all nonrandomized monotone symmetric level α tests for H_0. The following statistic introduced by Wang & Zhao in [1] for n = 2 is meaningful for the testing problem (1). We introduce that statistic for arbitrary n,

W(x) = sup_{H_0} P( X_(i) ≥ x_(i) for all i = 1, ..., n ), x = (x_1, ..., x_n) ∈ [0, ∞)^n,   (9)

where X_(1) ≤ X_(2) ≤ ... ≤ X_(n) denote the order statistics of the random variables X_1, ..., X_n and x_(1) ≤ x_(2) ≤ ... ≤ x_(n) denote the nondecreasingly rearranged components of x = (x_1, ..., x_n). Our notation in (9) is somewhat loose; the sup on the r.h.s. of (9) is to mean that one takes into account all possible distributions Q = P^{X_i} of n i.i.d. nonnegative real random variables X_1, ..., X_n with E(X_i) ≤ 1 and takes the supremum of all resulting probabilities P(X_(i) ≥ x_(i) for all i = 1, ..., n), for fixed x = (x_1, ..., x_n) ∈ [0, ∞)^n. That is, the value W(x) is the optimum value of the extremum problem,

maximize G_{x,n}(Q) = Q^n(C_x) s.t. Q ∈ Prob([0, ∞)), E[Q] ≤ 1,   (10)

where x = (x_1, ..., x_n) ∈ [0, ∞)^n is given and C_x denotes the set

C_x = { z = (z_1, ..., z_n) ∈ [0, ∞)^n : z_(i) ≥ x_(i) for all i = 1, ..., n }.   (11)

Obviously, the function W on [0, ∞)^n is symmetric and nonincreasing. Below we will see that W is continuous (see Lemma 3.5). So the following result is obtained by analogous arguments as those in [1], Corollary A.1 and its proof on pp. 91-92.

Lemma 3.1 Consider the (nonrandomized) test ψ = 1_{W ≤ α}. Then, for any strongly monotone symmetric level α test φ for H_0 we have φ ≤ ψ. In particular, if ψ keeps level α on H_0 then ψ is UMP within the class of all strongly monotone symmetric level α tests for H_0 vs. H_1.
As it will turn out, the test ψ keeps level α on H_0 only for n ≤ 2; moreover, for n ≥ 3 no UMP strongly monotone symmetric level α test for H_0 vs. H_1 exists. To see this we need a more explicit description of the statistic W than that from the definition (9) or (10), resp. For n = 1 we have C_x = [x, ∞), i.e.,

W(x) = sup { Q([x, ∞)) : Q ∈ Prob([0, ∞)), E[Q] ≤ 1 },

from which one easily obtains (using the Tchebychev-Markov inequality in case x > 1),

W(x) = 1, if x ≤ 1, and W(x) = 1/x, if x > 1.

So for n = 1 we have ψ = 1_{[1/α, ∞)}, which keeps level α on H_0, again by the Tchebychev-Markov inequality.

Now let n ≥ 2. From the definition (9) or (10) we have trivially W(x) = 1 for all x = (x_1, ..., x_n) ∈ [0, ∞)^n with x_(n) ≤ 1. In the nontrivial case that x_(n) > 1 the extremum problem (10) can be brought into an (EP2) problem from Section 2 by using Lemma 2.5. The function g = 1_{C_x} satisfies the assumptions of Lemma 2.5 with c = x_(n). By that lemma the optimum value W(x) of the extremum problem (10) remains the same when we restrict in (10) to probability distributions Q on [0, x_(n)] with E[Q] = 1. So we can apply Theorem 2.3 on an (EP2), which yields the following (cp. [1], Lemma A.1 for the case n = 2).

Lemma 3.2 Let n ≥ 2. Then, for a given x = (x_1, ..., x_n) ∈ [0, ∞)^n with x_(n) > 1, the maximum value W(x) of (10) is attained by some Q_x for which E[Q_x] = 1 and

supp(Q_x) = {0, x_(1), ..., x_(n)} or supp(Q_x) = {x_(1), ..., x_(n)}.

Proof. Theorem 2.3 with c = x_(n), g = 1_{C_x}, and C_x from (11), yields the existence of an optimal solution Q_x ∈ Prob([0, x_(n)]) to (10), and the existence of ρ ≥ 0, τ ∈ R such that

g_{Q_x}(t) ≤ ρ + τt for all t ∈ [0, x_(n)] and g_{Q_x}(t) = ρ + τt for all t ∈ supp(Q_x),   (12)

where

g_{Q_x}(t) = Q_x^{n−1}(C_x(t)), C_x(t) = { s ∈ [0, x_(n)]^{n−1} : (s, t) ∈ C_x }.   (13)

The distribution Q_x must assign positive probability to the value x_(n), since otherwise we would have Q_x^n(C_x) = 0, which cannot be true (any probability distribution Q ∈ Prob([0, x_(n)]) with E[Q] = 1 which assigns positive probability to x_(n) has Q^n(C_x) > 0). Clearly, the sets C_x(t) are nondecreasing in t ∈ [0, x_(n)], and hence the function g_{Q_x} is nondecreasing. Also, g_{Q_x} is constant on the subinterval 0 ≤ t < x_(1) (if this is nonempty), and on the subinterval x_(k) ≤ t < x_(k+1) (if this is nonempty) for each k = 1, ..., n−1. More explicitly, it is easily seen that

C_x(t) = ∅, if 0 ≤ t < x_(1);   (14)

C_x(t) = { s ∈ [0, x_(n)]^{n−1} : s_(i) ≥ x_(i) (1 ≤ i ≤ k−1), s_(i) ≥ x_(i+1) (k ≤ i ≤ n−1) }, if x_(k) ≤ t < x_(k+1) and 1 ≤ k ≤ n−1;   (15)

C_x(x_(n)) = { s ∈ [0, x_(n)]^{n−1} : s_(i) ≥ x_(i) (1 ≤ i ≤ n−1) }.   (16)
Since g_{Q_x} is nondecreasing we must have τ ≥ 0 in (12) (otherwise x_(n) would be the only support point of Q_x, contradicting E[Q_x] = 1 and x_(n) > 1).

Suppose τ = 0. Then g_{Q_x}(t) ≤ ρ for all t ∈ [0, x_(n)] and the equality g_{Q_x}(t) = ρ holds for all t ∈ supp(Q_x). The point s in R^{n−1} having all components equal to x_(n) belongs to C_x(x_(n)) by (16) and hence ρ = g_{Q_x}(x_(n)) ≥ (Q_x({x_(n)}))^{n−1} > 0. So we have ρ > 0 and, because of g_{Q_x}(t) = 0 if 0 ≤ t < x_(1), we have supp(Q_x) ⊆ [x_(1), x_(n)], in particular x_(1) < x_(n). Let t_0 be the smallest support point of Q_x. There is a k ∈ {1, ..., n−1} with x_(k) ≤ t_0 < x_(k+1). The interval I_k = [x_(k), x_(k+1)) has positive Q_x-probability, since it contains the smallest support point t_0. Consider the set

D = { s = (s_1, ..., s_{n−1}) : s_i ∈ I_k (1 ≤ i ≤ k), s_i = x_(n) (k+1 ≤ i ≤ n−1) }.

Then Q_x^{n−1}(D) = (Q_x(I_k))^k (Q_x({x_(n)}))^{n−1−k} > 0. For each s ∈ D we see from (15) that s ∈ C_x(x_(k+1)), but s ∉ C_x(x_(k)). Hence D ⊆ C_x(x_(k+1)) \ C_x(x_(k)) and

Q_x^{n−1}(D) ≤ g_{Q_x}(x_(k+1)) − g_{Q_x}(x_(k)),

in particular g_{Q_x}(x_(k)) < g_{Q_x}(x_(k+1)). On the other hand we have ρ = g_{Q_x}(t_0) = g_{Q_x}(x_(k)) and ρ = g_{Q_x}(x_(n)), hence (g_{Q_x} being nondecreasing) g_{Q_x}(x_(k)) = g_{Q_x}(x_(k+1)), and thus a contradiction, i.e., τ = 0 cannot hold.

So we have τ > 0. By (12) the increasing straight line ρ + τt can touch the graph of the nondecreasing step-function g_{Q_x}(t) only at the very left points of its plateaus, the abscissae of which are 0, x_(1), ..., x_(n). Hence

supp(Q_x) ⊆ {0, x_(1), ..., x_(n)}.   (17)

It remains to show that each x_(k) (k = 1, ..., n) is in fact a support point of Q_x. We know that this is true for k = n. Suppose that for some k ∈ {2, ..., n} we have x_(k) ∈ supp(Q_x), but x_(k−1) ∉ supp(Q_x), and thus, in particular, x_(k−1) < x_(k). Clearly, the Q_x^{n−1}-probability of a set C_x(t) remains the same when we restrict to the subset of those points s in C_x(t) with components in supp(Q_x). By (15) we have

C_x(x_(k)) = { s : s_(i) ≥ x_(i) (1 ≤ i ≤ k−1), s_(i) ≥ x_(i+1) (k ≤ i ≤ n−1) },
C_x(x_(k−1)) = { s : s_(i) ≥ x_(i) (1 ≤ i ≤ k−2), s_(i) ≥ x_(i+1) (k−1 ≤ i ≤ n−1) }.

But for s with components in supp(Q_x) we get from (17) and x_(k−1) ∉ supp(Q_x) that the inequality s_(k−1) ≥ x_(k−1) is equivalent to s_(k−1) ≥ x_(k), which shows that the sets C_x(x_(k)) and C_x(x_(k−1)) contain the same set of points s with components in supp(Q_x). Hence it follows that

g_{Q_x}(x_(k)) = Q_x^{n−1}(C_x(x_(k))) = Q_x^{n−1}(C_x(x_(k−1))) = g_{Q_x}(x_(k−1)),

and by (12),

ρ + τx_(k) = g_{Q_x}(x_(k)) = g_{Q_x}(x_(k−1)) ≤ ρ + τx_(k−1),

which is a contradiction to τ > 0 and x_(k) > x_(k−1). Hence we must have x_(k−1) ∈ supp(Q_x) as well. This completes the proof of our lemma. □

From Lemma 3.2 it follows that the value W(x) (in the nontrivial case x_(n) > 1) equals the maximum of Q^n(C_x) taken over all discrete probability distributions Q with {x_(1), ..., x_(n)} ⊆ supp(Q) ⊆ {0, x_(1), ..., x_(n)} and E[Q] = 1.
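The finite reduction just stated is convenient for numerical work. The following sketch (our own illustration; all names, the solver choice, and the example point are assumptions, not part of the paper) approximates W(x) in the nontrivial case x_(n) > 1 by maximizing Q^n(C_x) over probability vectors on {0, x_(1), ..., x_(n)} with expectation 1, using SciPy's SLSQP with random restarts.

import itertools
import numpy as np
from scipy.optimize import minimize

def order_stat_prob(q, support, x_sorted):
    # P(Z_(i) >= x_(i) for all i) for n i.i.d. draws from the discrete
    # distribution putting mass q[k] on support[k]
    n = len(x_sorted)
    total = 0.0
    for combo in itertools.product(range(len(support)), repeat=n):
        z = np.sort([support[k] for k in combo])
        if np.all(z >= x_sorted):
            total += np.prod([q[k] for k in combo])
    return total

def W_numeric(x, restarts=20, seed=0):
    # approximates W(x) via the reduction of Lemma 3.2; assumes x_(n) > 1
    x_sorted = np.sort(np.asarray(x, dtype=float))
    support = np.concatenate(([0.0], x_sorted))
    cons = [{'type': 'eq', 'fun': lambda q: q.sum() - 1.0},
            {'type': 'eq', 'fun': lambda q: q @ support - 1.0}]
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(restarts):
        q0 = rng.dirichlet(np.ones(len(support)))
        res = minimize(lambda q: -order_stat_prob(q, support, x_sorted), q0,
                       method='SLSQP', bounds=[(0.0, 1.0)] * len(support), constraints=cons)
        if res.success:
            best = max(best, -res.fun)
    return best

print(W_numeric([0.5, 1.2, 2.0]))   # an n = 3 example

For n = 2 the output can be compared with the closed form given in Lemma 3.3 below.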
For n = 2 that maximum was explicitly determined in [1], Lemma A.1, giving the following.

Lemma 3.3 (Wang & Zhao) Let n = 2. Then, for all x = (x_1, x_2) ∈ [0, ∞)^2, we have

W(x) = 1, if x_(2) ≤ 1;
W(x) = 1 / ( x_(1) (2x_(2) − x_(1)) ), if x_(2) > 1 and x_(1) > x_(2) − √(x_(2)^2 − x_(2));
W(x) = 1 − (x_(2) − 1)^2 / (x_(2) − x_(1))^2, if x_(2) > 1 and x_(1) ≤ x_(2) − √(x_(2)^2 − x_(2)).

For n ≥ 3 no explicit description of W seems to be possible. However, an alternative ("implicit") representation of the statistic will be useful; in particular, it will help to prove continuity of W (for arbitrary n) below.

Lemma 3.4 Let U_(1) ≤ U_(2) ≤ ... ≤ U_(n) be the order statistics of n i.i.d. standard uniformly distributed (on the interval (0, 1)) random variables U_1, ..., U_n. Then, for any x = (x_1, ..., x_n) ∈ [0, ∞)^n with x_(n) > 1, we have

W(x) = max { P( U_(1) ≥ u_1, ..., U_(n) ≥ u_n ) : 0 ≤ u_1 ≤ ... ≤ u_n ≤ 1, Σ_{i=1}^n u_i (x_(i) − x_(i−1)) = x_(n) − 1 },

where x_(0) = 0. Also, as we already know, W(x) = 1 whenever x_(n) ≤ 1.

Proof. Firstly, we note that if X_1, ..., X_n are i.i.d. real random variables, F denoting the (right continuous) c.d.f. of X_i, and if a_1, ..., a_n ∈ R, then

P( X_(i) ≥ a_i for all i = 1, ..., n ) = P( U_(i) ≥ F_l(a_i) for all i = 1, ..., n ),   (18)

where U_1, ..., U_n are i.i.d. standard uniform random variables and F_l(t) = lim_{v→t, v<t} F(v) for t ∈ R (the left continuous c.d.f. of X_i). Eq. (18) can be seen as follows. Consider the pseudo-inverse of F,

F^←(u) = min{ t ∈ R : F(t) ≥ u }, u ∈ (0, 1).

Then, as it is well-known, we may write X_i = F^←(U_i) (1 ≤ i ≤ n), with some i.i.d. standard uniform random variables U_1, ..., U_n. It is easily seen that for any a ∈ R and any u ∈ (0, 1) the following implications are true,

u > F_l(a) ⟹ F^←(u) ≥ a ⟹ u ≥ F_l(a).

This gives, observing that X_(i) = F^←(U_(i)) (1 ≤ i ≤ n),

{ U_(i) > F_l(a_i) for all i } ⊆ { X_(i) ≥ a_i for all i } ⊆ { U_(i) ≥ F_l(a_i) for all i },
and since the event U_(i) = F_l(a_i) has probability zero for all i, we conclude (18). Now let x = (x_1, ..., x_n) ∈ [0, ∞)^n with x_(n) > 1 be given. Denote by m(x) the maximum on the r.h.s. of the asserted formula of the lemma. Note that the term "maximum" (instead of "supremum") is correct, since the probability P(U_(i) ≥ u_i for all i = 1, ..., n) is a continuous function of u = (u_1, ..., u_n) and the set of u's over which the supremum is taken is a compact (and nonempty) set. We have to show that m(x) = W(x). By definition of W(x) through (9) or (10) and by Lemma 3.2, W(x) is the maximum of all probabilities

P( X_(i) ≥ x_(i) for all i = 1, ..., n ),

taken over all i.i.d. random variables X_1, ..., X_n with values in {0, x_(1), ..., x_(n)} and with E(X_i) = 1. For any such random variables X_1, ..., X_n denote by F the (right continuous) c.d.f. of X_i and by F_l its left continuous version. A well-known formula for the expectation of a nonnegative random variable gives 1 = E(X_1) = ∫_0^∞ (1 − F_l(t)) dt. Since X_1 has its values in {0, x_(1), ..., x_(n)} we can write

1 − F_l(t) = Σ_{i=1}^n (1 − F_l(x_(i))) 1_{(x_(i−1), x_(i)]}(t), t > 0,

and hence

1 = Σ_{i=1}^n (1 − F_l(x_(i))) (x_(i) − x_(i−1)), i.e., Σ_{i=1}^n F_l(x_(i)) (x_(i) − x_(i−1)) = x_(n) − 1.

So, choosing u_i = F_l(x_(i)) (1 ≤ i ≤ n), we have 0 ≤ u_1 ≤ ... ≤ u_n ≤ 1, Σ_{i=1}^n u_i (x_(i) − x_(i−1)) = x_(n) − 1, and, together with (18),

P( X_(i) ≥ x_(i) for all i = 1, ..., n ) = P( U_(i) ≥ u_i for all i = 1, ..., n ),

which proves W(x) ≤ m(x). To prove the reverse inequality let 0 ≤ u_1 ≤ ... ≤ u_n ≤ 1 with Σ_{i=1}^n u_i (x_(i) − x_(i−1)) = x_(n) − 1 be given. Then, obviously, the function

F(t) = Σ_{i=1}^n u_i 1_{[x_(i−1), x_(i))}(t) + 1_{[x_(n), ∞)}(t), t ∈ R,

is the (right continuous) c.d.f. of some probability distribution Q with support in the set {0, x_(1), ..., x_(n)}. The left continuous version F_l satisfies F_l(x_(i)) ≤ u_i (1 ≤ i ≤ n), and for the expectation of Q we have

E[Q] = ∫_0^∞ (1 − F(t)) dt = Σ_{i=1}^n (1 − u_i)(x_(i) − x_(i−1)) = 1.

Let X_1, ..., X_n be i.i.d. Q-distributed random variables. From (18) together with F_l(x_(i)) ≤ u_i (1 ≤ i ≤ n) we get

P( X_(i) ≥ x_(i) for all i = 1, ..., n ) ≥ P( U_(i) ≥ u_i for all i = 1, ..., n ),

which shows W(x) ≥ m(x). □
Lemma 3.5 The function W is continuous on [0, ∞)^n.

Proof. Since W is constantly equal to 1 on the cube C = [0, 1]^n, it suffices to prove (i) continuity of W on the set C^c = [0, ∞)^n \ C, and (ii) W(x) → 1 whenever x ∈ C^c approaches the boundary of C. In view of Lemma 3.4, the following abbreviations will be convenient.

f(u) = P( U_(i) ≥ u_i for all i = 1, ..., n ), for u = (u_1, ..., u_n) ∈ R^n,

which is a continuous function on R^n,

D = { u ∈ R^n : 0 ≤ u_1 ≤ ... ≤ u_n ≤ 1 },
D(x) = { u ∈ D : Σ_{i=1}^n u_i (x_(i) − x_(i−1)) = x_(n) − 1 } for x ∈ C^c.

Clearly, D and D(x) (for x ∈ C^c) are nonempty compact and convex sets. It is not hard to see that the vertices of D(x) are given by

(ξ_i, ..., ξ_i, 1, ..., 1), with ξ_i repeated i times and 1 repeated n−i times, for 1 ≤ i ≤ n with x_(i) ≥ 1, where ξ_i = (x_(i) − 1)/x_(i);

(0, ..., 0, ξ_{ij}, ..., ξ_{ij}, 1, ..., 1), with 0 repeated i times, ξ_{ij} repeated j−i times, and 1 repeated n−j times, for 1 ≤ i < j ≤ n with x_(i) < 1 < x_(j), where ξ_{ij} = (x_(j) − 1)/(x_(j) − x_(i)).

By Lemma 3.4 we have

W(x) = max_{u ∈ D(x)} f(u) for x ∈ C^c.   (19)

Ad (i): Let x^(0) ∈ C^c and x^(k) ∈ C^c (k ∈ N) be a sequence with lim_{k→∞} x^(k) = x^(0). For each k ∈ N let u^(k) ∈ D(x^(k)) with W(x^(k)) = f(u^(k)). Denote a = lim sup_{k→∞} f(u^(k)). Choose an infinite subsequence u^(k), k ∈ K ⊆ N, such that f(u^(k)) (k → ∞, k ∈ K) converges to a, and due to the compactness of D choose a further convergent subsequence u^(k), k ∈ K' ⊆ K, with some limit u^(0) ∈ D. From Σ_{i=1}^n u_i^(k) (x_(i)^(k) − x_(i−1)^(k)) = x_(n)^(k) − 1 for all k ∈ K' we get by taking the limits, Σ_{i=1}^n u_i^(0) (x_(i)^(0) − x_(i−1)^(0)) = x_(n)^(0) − 1, i.e., u^(0) ∈ D(x^(0)). Hence by continuity of f and by (19),

a = lim_{k→∞, k∈K'} f(u^(k)) = f(u^(0)) ≤ W(x^(0)),

which shows that lim sup_{k→∞} W(x^(k)) ≤ W(x^(0)). It remains to prove the inequality lim inf_{k→∞} W(x^(k)) ≥ W(x^(0)). To this end it suffices to show that for any given point u^(0) ∈ D(x^(0)) one can find a sequence u^(k) ∈ D(x^(k)) (k ∈ N) converging to u^(0). Then, taking u^(0) ∈ D(x^(0)) with f(u^(0)) = W(x^(0)), we get lim_{k→∞} f(u^(k)) = f(u^(0)) = W(x^(0)) and f(u^(k)) ≤ W(x^(k)) for all k, from which W(x^(0)) ≤ lim inf_{k→∞} W(x^(k)) follows. Now, since D(x^(k)) is a convex set for each k, the set of all limits of convergent sequences u^(k) ∈ D(x^(k)) (k ∈ N) is again a convex set, and it is a subset of D(x^(0)) (see our arguments above). Thus, that set of limits coincides with D(x^(0)) iff each vertex of D(x^(0)) is a limit of some sequence u^(k) ∈ D(x^(k)). But this is readily verified in view of our description of the vertices of a set D(x) from above.
In fact, the vertices of D(x^(0)) are the limits of the corresponding vertices of D(x^(k)), with one exception: If x_(i)^(0) = 1 for some i ∈ {1, ..., n−1} and x_(i)^(k) < 1 for infinitely many k, then the vertex (0, ..., 0, 1, ..., 1) (where 0 appears i times) of D(x^(0)) is the limit of u^(k) ∈ D(x^(k)) for k → ∞, where

u^(k) = (ξ_i^(k), ..., ξ_i^(k), 1, ..., 1), with ξ_i^(k) repeated i times and ξ_i^(k) = (x_(i)^(k) − 1)/x_(i)^(k), if x_(i)^(k) ≥ 1;

u^(k) = (0, ..., 0, ξ_{in}^(k), ..., ξ_{in}^(k)), with 0 repeated i times and ξ_{in}^(k) = (x_(n)^(k) − 1)/(x_(n)^(k) − x_(i)^(k)), if x_(i)^(k) < 1.

Ad (ii): Let x^(k) ∈ C^c (k ∈ N) converge to some boundary point x^(0) of C, hence in particular lim_{k→∞} x_(n)^(k) = 1. For each k consider the point u^(k) with components all equal to (x_(n)^(k) − 1)/x_(n)^(k), which belongs to D(x^(k)). Obviously, lim_{k→∞} u^(k) = (0, ..., 0). Again by (19), W(x^(k)) ≥ f(u^(k)) for all k and thus, by continuity of f,

lim inf_{k→∞} W(x^(k)) ≥ f(0, ..., 0) = 1, i.e., lim_{k→∞} W(x^(k)) = 1. □
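Before turning to the n = 2 optimality result, here is a small numerical cross-check (our own, not taken from [1]) of the closed form of Lemma 3.3 against a direct maximization over the discrete distributions described after Lemma 3.2; all names and test points are illustrative only, and the case x_1 = 0 is excluded in the direct maximization for simplicity.

import numpy as np

def W2_closed_form(x1, x2):
    # Lemma 3.3 (n = 2); assumes 0 <= x1 <= x2
    if x2 <= 1.0:
        return 1.0
    if x1 > x2 - np.sqrt(x2**2 - x2):
        return 1.0 / (x1 * (2.0 * x2 - x1))
    return 1.0 - (x2 - 1.0)**2 / (x2 - x1)**2

def W2_direct(x1, x2, grid=100001):
    # maximize Q^2(C_x) over Q on {0, x1, x2} with E[Q] = 1; assumes 0 < x1 <= x2, x2 > 1
    best = 0.0
    for q2 in np.linspace(0.0, min(1.0, 1.0 / x2), grid):
        q1 = (1.0 - q2 * x2) / x1
        if q1 < 0.0 or q1 + q2 > 1.0:
            continue
        best = max(best, q2 * (q2 + 2.0 * q1))    # P(Z_(1) >= x1, Z_(2) >= x2)
    return best

for x1, x2 in [(0.3, 1.5), (0.9, 1.2), (1.1, 2.5), (0.05, 4.0)]:
    print(x1, x2, W2_closed_form(x1, x2), W2_direct(x1, x2))

The two columns agree up to the grid resolution, which also illustrates that the maximizing Q of Lemma 3.2 needs mass at 0 exactly when the third branch of Lemma 3.3 applies.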
4 The Wang-Zhao result for n = 2

Here we present an alternative proof of [1], Theorem 5.2.

Theorem 4.1 (Wang & Zhao) Let n = 2. Then, for any given α ∈ (0, 1), we have

sup_{Q ∈ H_0} Q^2( W ≤ α ) = α,

where Q ∈ H_0 means that Q ∈ Prob([0, ∞)) with E[Q] ≤ 1. So, by Lemma 3.1, the test ψ = 1_{W ≤ α} is UMP among all strongly monotone symmetric level α tests for H_0 vs. H_1.

Proof. In what follows we denote

c_α = 1 / (1 − √(1−α)) and I_α = [0, c_α].

By the formula of Lemma 3.3, W(0, c_α) = α. Since W is symmetric and nonincreasing, we see that the indicator function g = 1_{W ≤ α} satisfies the conditions of Lemma 2.5 (with c = c_α). By that lemma the sup in the assertion equals the maximum value of the (EP2),

maximize Q^2( W ≤ α ) s.t. Q ∈ Prob(I_α), E[Q] = 1.   (20)

By Lemma 3.2, resolving the inequality W(x_1, x_2) ≤ α for x_1, x_2 ∈ I_α yields (after some lengthy but elementary calculations, cp. [1], p. 84),

W(x_1, x_2) ≤ α ⟺ 1/√α ≤ x_(2) ≤ c_α and x_(1) ≥ w_α(x_(2)), (x_1, x_2 ∈ I_α),   (21)

where we have introduced the function w_α: [1/√α, c_α] → [0, 1/√α],

w_α(t) = t − √(t^2 − 1/α), if 1/√α ≤ t ≤ 1/α;
w_α(t) = t − (t − 1)/√(1−α), if 1/α < t ≤ c_α.   (22)

Also we get

W(x_1, x_2) = α ⟺ 1/√α ≤ x_(2) ≤ c_α and x_(1) = w_α(x_(2)), (x_1, x_2 ∈ I_α).   (23)

It is easily seen that the function w_α given by (22) is a continuous and decreasing one-to-one function from [1/√α, c_α] onto [0, 1/√α], and so there is the inverse function w_α^{−1}. We firstly show that feasible probability distributions Q for the (EP2) from (20) with Q^2(W ≤ α) = α are the following. Let x* = (x_1*, x_2*) ∈ I_α^2 be such that W(x*) = α and choose Q_{x*} according to Lemma 3.2, i.e.,

{x_(1)*, x_(2)*} ⊆ supp(Q_{x*}) ⊆ {0, x_(1)*, x_(2)*}, E[Q_{x*}] = 1,   (24)

and

W(x*) = Q_{x*}^2( {z ∈ I_α^2 : z_(1) ≥ x_(1)*, z_(2) ≥ x_(2)*} ).   (25)

From (21), (23), (24) we see that an x = (x_1, x_2) ∈ supp(Q_{x*}^2) satisfies W(x) ≤ α (= W(x*)) if and only if x_(1) = x_(1)*, x_(2) = x_(2)* or x_(1) = x_(2) = x_(2)*, i.e., if and only if x_(1) ≥ x_(1)*, x_(2) ≥ x_(2)*. Hence, together with (25), we obtain

Q_{x*}^2( W ≤ α ) = Q_{x*}^2( {x ∈ supp(Q_{x*}^2) : x_(1) ≥ x_(1)*, x_(2) ≥ x_(2)*} ) = W(x*) = α.

So we have proved that for any x* ∈ I_α^2 with W(x*) = α the probability distribution Q_{x*} from Lemma 3.2 satisfies Q_{x*}^2(W ≤ α) = α. The rest of our proof will show, in particular, that those distributions Q_{x*} are the optimal solutions to the (EP2) from (20). Let Q_0 be any optimal solution to (20) (which exists by Theorem 2.3, part (i)). By part (ii) of Theorem 2.3 there exist real numbers ρ ≥ 0 and τ such that

g_{Q_0}(t) ≤ ρ + τt for all t ∈ I_α and g_{Q_0}(t) = ρ + τt for all t ∈ supp(Q_0),   (26)

where

g_{Q_0}(t) = Q_0( {s ∈ I_α : W(s, t) ≤ α} ), t ∈ I_α,   (27)

and then Q_0^2(W ≤ α) = ρ + τ. So we have to conclude that ρ + τ ≤ α. From (27) and (21) we obtain

g_{Q_0}(t) = Q_0( [w_α(t), c_α] ), if 1/√α ≤ t ≤ c_α;
g_{Q_0}(t) = Q_0( [w_α^{−1}(t), c_α] ), if 0 ≤ t < 1/√α.   (28)

The function g_{Q_0} is nondecreasing and thus (26) forces τ ≥ 0. Suppose τ = 0. Since 0 ≤ g_{Q_0}(t) ≤ 1 for all t ∈ [0, c_α] and g_{Q_0}(c_α) = 1, (26) forces ρ ≥ 1 and hence ρ = 1.
The minimum support point s_0 of Q_0 must be less than or equal to 1 (because of E[Q_0] = 1), in particular s_0 < 1/√α, hence by (26) and (28) Q_0([w_α^{−1}(s_0), c_α]) = 1. But w_α^{−1}(s_0) ≥ 1/√α > s_0, and hence s_0 ∉ supp(Q_0), which is a contradiction. So we have τ > 0. For the support of Q_0 we conclude further:

If t ∈ supp(Q_0) and 1/√α ≤ t ≤ c_α, then w_α(t) ∈ supp(Q_0);   (29)

if s ∈ supp(Q_0) and 0 < s < 1/√α, then w_α^{−1}(s) ∈ supp(Q_0).   (30)

This can be seen as follows. Let t ∈ supp(Q_0) and 1/√α ≤ t ≤ c_α, but suppose that w_α(t) ∉ supp(Q_0) (hence t > 1/√α since w_α(1/√α) = 1/√α). So there is an ε > 0 such that w_α(t) + ε ≤ 1/√α and Q_0([w_α(t), w_α(t) + ε]) = 0. Since w_α is continuous and decreasing, there is a δ > 0 such that t − δ > 1/√α and w_α(t) < w_α(t−δ) ≤ w_α(t) + ε, hence Q_0([w_α(t), w_α(t−δ)]) = 0, and we get from (26), (28),

ρ + τt = g_{Q_0}(t) = Q_0([w_α(t), c_α]) = Q_0([w_α(t−δ), c_α]) = g_{Q_0}(t−δ) ≤ ρ + τ(t−δ),

which is a contradiction in view of τ > 0. The arguments are similar for the case that s ∈ supp(Q_0) and 0 < s < 1/√α; suppose that w_α^{−1}(s) ∉ supp(Q_0). Then there is an ε > 0 such that w_α^{−1}(s) + ε ≤ c_α and Q_0([w_α^{−1}(s), w_α^{−1}(s) + ε]) = 0. Since w_α^{−1} is continuous and decreasing, there is a δ > 0 such that s − δ ≥ 0 and w_α^{−1}(s) < w_α^{−1}(s−δ) ≤ w_α^{−1}(s) + ε, hence Q_0([w_α^{−1}(s), w_α^{−1}(s−δ)]) = 0, and we get from (26), (28),

ρ + τs = g_{Q_0}(s) = Q_0([w_α^{−1}(s), c_α]) = Q_0([w_α^{−1}(s−δ), c_α]) = g_{Q_0}(s−δ) ≤ ρ + τ(s−δ),

which is a contradiction in view of τ > 0.

Let t_0 denote the maximum support point of Q_0; by (29), (30), 1/√α ≤ t_0 ≤ c_α. Denote s_0 = w_α(t_0). By (26), (28), and (29),

ρ + τs_0 = Q_0(t_0) and ρ + τt_0 = Q_0([s_0, t_0]),   (31)

where for short we write Q_0(t_0) instead of Q_0({t_0}). We have to distinguish three cases to prove ρ + τ ≤ α.

Case 1: t_0 = c_α. Case 2: t_0 < c_α and 0 ∈ supp(Q_0). Case 3: t_0 < c_α and 0 ∉ supp(Q_0).

Ad case 1: Then s_0 = 0 and (31) rewrites as

ρ = Q_0(c_α) and ρ + τc_α = 1,

hence c_α ρ = c_α Q_0(c_α) ≤ E[Q_0] = 1 and

ρ + τ = ρ + (1 − ρ)/c_α = (1 − 1/c_α)ρ + 1/c_α ≤ (1 − 1/c_α)(1/c_α) + 1/c_α = α,

the last equality because 1/c_α = 1 − √(1−α).
Ad case 2: Then, by (26) and (28), ρ = g_{Q_0}(0) = Q_0(c_α) = 0, and together with (31), τt_0 = Q_0([s_0, t_0]) < 1. If t_0 ≥ 1/α this yields ρ + τ = τ < 1/t_0 ≤ α. Now let t_0 < 1/α. From

1 = E[Q_0] ≥ s_0 Q_0([s_0, t_0)) + t_0 Q_0(t_0) = s_0 Q_0([s_0, t_0]) + (t_0 − s_0) Q_0(t_0)   (32)

we obtain, observing (31) and ρ = 0,

1 ≥ s_0 τt_0 + (t_0 − s_0) τs_0 = τ s_0 (2t_0 − s_0).

Because of 1/√α ≤ t_0 < 1/α and s_0 = w_α(t_0) we see from (22) that s_0(2t_0 − s_0) = 1/α, and we conclude ρ + τ = τ ≤ α.

Ad case 3: Then, by (29), (30), s_0 is the minimum support point of Q_0 and hence Q_0([s_0, t_0]) = 1. For short let us write λ_0 = Q_0(t_0). So (31) rewrites as

ρ + τs_0 = λ_0 and ρ + τt_0 = 1.

Resolving for ρ and τ yields

ρ = (t_0 λ_0 − s_0)/(t_0 − s_0), τ = (1 − λ_0)/(t_0 − s_0).

Since ρ ≥ 0, we have λ_0 ≥ s_0/t_0; since (32) is generally true, we have s_0 + (t_0 − s_0)λ_0 ≤ 1, hence λ_0 ≤ (1 − s_0)/(t_0 − s_0). Thus, s_0/t_0 ≤ (1 − s_0)/(t_0 − s_0), which after straightforward calculations gives s_0 ≤ t_0 − √(t_0^2 − t_0). Since s_0 = w_α(t_0) we see from (22) that t_0 ≥ 1/α (for 1/√α ≤ t_0 < 1/α the first branch of (22) would give s_0 = t_0 − √(t_0^2 − 1/α) > t_0 − √(t_0^2 − t_0)), and hence s_0 = w_α(t_0) = t_0 − (t_0 − 1)/√(1−α). So we obtain,

ρ + τ = (t_0λ_0 − s_0)/(t_0 − s_0) + (1 − λ_0)/(t_0 − s_0) = ((t_0 − 1)λ_0 + 1 − s_0)/(t_0 − s_0)
≤ ((t_0 − 1)(1 − s_0)/(t_0 − s_0) + 1 − s_0)/(t_0 − s_0) = (1 − s_0)/(t_0 − s_0) · ( (t_0 − 1)/(t_0 − s_0) + 1 ).

Now, (1 − s_0)/(t_0 − s_0) = 1 − √(1−α) and (t_0 − 1)/(t_0 − s_0) = √(1−α), and thus

ρ + τ ≤ (1 − √(1−α)) (1 + √(1−α)) = α. □
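The constant c_α and the threshold function w_α from (21)-(23) are easy to sanity-check against the closed form of Lemma 3.3. The following sketch is our own illustration (the choice α = 0.05 and all names are ours); the last printed column should be approximately equal to α.

import numpy as np

alpha = 0.05
c_alpha = 1.0 / (1.0 - np.sqrt(1.0 - alpha))

def w_alpha(t):
    if t <= 1.0 / alpha:
        return t - np.sqrt(max(t * t - 1.0 / alpha, 0.0))
    return t - (t - 1.0) / np.sqrt(1.0 - alpha)

def W2(x1, x2):
    # Lemma 3.3 for n = 2, 0 <= x1 <= x2
    if x2 <= 1.0:
        return 1.0
    if x1 > x2 - np.sqrt(x2 * x2 - x2):
        return 1.0 / (x1 * (2.0 * x2 - x1))
    return 1.0 - (x2 - 1.0)**2 / (x2 - x1)**2

for t in np.linspace(1.0 / np.sqrt(alpha), c_alpha, 7):
    print(t, w_alpha(t), W2(w_alpha(t), t))    # last column: approximately alpha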
5 The negative result for n ≥ 3

As announced earlier, we will prove that if n ≥ 3 then the test ψ from Lemma 3.1 does not keep the level α on H_0, i.e.,

sup_{Q ∈ H_0} Q^n( W ≤ α ) > α,

and, moreover, there does not exist a UMP strongly monotone symmetric level α test for H_0 vs. H_1. To this end we need the following lemma.

Lemma 5.1 Let n ≥ 3 and α ∈ (0, 1) be given. There exist real numbers b > a > 1 such that

W(0, ..., 0, a, a, a) = W(0, ..., 0, a, b) = α,

where zero appears n−3 times in the first and n−2 times in the second argument vector.

Proof. For any β > 1 and 1 ≤ j ≤ n we get from Lemma 3.2,

W(0, ..., 0, β, ..., β) = Σ_{i=j}^n C(n, i) (1/β)^i (1 − 1/β)^{n−i},   (33)

where zero appears n−j times and β appears j times. This shows, in particular, that for any sequence x^(k) ∈ [0, ∞)^n (k ∈ N) with x_(n)^(k) → ∞ for k → ∞ we have lim_{k→∞} W(x^(k)) = 0, since with β_k = x_(n)^(k) and j = 1 in (33),

W(x^(k)) ≤ W(0, ..., 0, β_k) = 1 − (1 − 1/β_k)^n → 0 (k → ∞),

where zero appears n−1 times. Consider W(0, ..., 0, β, β, β) (zero appearing n−3 times) for 1 ≤ β < ∞. For β = 1, W(0, ..., 0, 1, 1, 1) = 1, and for β → ∞, W(0, ..., 0, β, β, β) → 0. By the continuity of W there exists an a > 1 such that W(0, ..., 0, a, a, a) = α. The r.h.s. of (33) decreases when j increases, and in particular

γ = W(0, ..., 0, a, a) > W(0, ..., 0, a, a, a) = α,

where zero appears n−2 times on the l.h.s. and n−3 times on the r.h.s. Consider W(0, ..., 0, a, β) (zero appearing n−2 times) for a ≤ β < ∞. For β = a, W(0, ..., 0, a, a) = γ > α, and for β → ∞, W(0, ..., 0, a, β) → 0. By the continuity of W there exists a b > a such that W(0, ..., 0, a, b) = α. □
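For concreteness, the numbers a and b of Lemma 5.1 can also be computed numerically. The sketch below is our own illustration (the choice n = 4, α = 0.05, all names, and the use of SciPy's brentq are assumptions): a is obtained from (33) by root finding, and b by root finding on a direct evaluation of W(0, ..., 0, a, b) through the finite reduction of Lemma 3.2.

import numpy as np
from math import comb
from scipy.optimize import brentq

n, alpha = 4, 0.05

def W_aaa(beta):
    # formula (33) with j = 3: W(0,...,0,beta,beta,beta)
    p = 1.0 / beta
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(3, n + 1))

a = brentq(lambda beta: W_aaa(beta) - alpha, 1.0 + 1e-9, 1e6)

def W_ab(b, grid=4001):
    # W(0,...,0,a,b): maximize P(at least one draw = b and at least two draws >= a)
    # over Q on {0, a, b} with E[Q] = 1 (cf. Lemma 3.2)
    best = 0.0
    for qb in np.linspace(0.0, 1.0 / b, grid):
        qa = (1.0 - qb * b) / a
        q0 = 1.0 - qa - qb
        if qa < -1e-12 or q0 < -1e-12:
            continue
        qa, q0 = max(qa, 0.0), max(q0, 0.0)
        prob = 0.0
        for kb in range(1, n + 1):
            for ka in range(0, n - kb + 1):
                if ka + kb < 2:
                    continue
                k0 = n - ka - kb
                prob += comb(n, kb) * comb(n - kb, ka) * qb**kb * qa**ka * q0**k0
        best = max(best, prob)
    return best

b = brentq(lambda beta: W_ab(beta) - alpha, a * (1.0 + 1e-9), 1e6)
print("alpha =", alpha, " a =", a, " b =", b)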
Theorem 5.2 Let n ≥ 3 and α ∈ (0, 1). Choose b > a > 1 as in Lemma 5.1. Then:

(i) For x* = (0, ..., 0, a, b) (with n−2 zeros) and Q_{x*} from Lemma 3.2 we have

Q_{x*}^n( W ≤ α ) > α.

In particular, the test ψ does not keep the level α on H_0.

(ii) The tests φ_1 and φ_2, defined by

φ_1(x) = 1 if x_(n−2) ≥ a, and φ_1(x) = 0 otherwise;
φ_2(x) = 1 if x_(n−1) ≥ a and x_(n) ≥ b, and φ_2(x) = 0 otherwise,

are nonrandomized monotone symmetric level α tests for H_0, and no strongly monotone symmetric level α test for H_0 can be at least as powerful (uniformly on H_1) as both φ_1 and φ_2. In particular, there does not exist a UMP strongly monotone symmetric level α test for H_0 vs. H_1.

Proof. (i) Because of E[Q_{x*}] = 1 and b > a > 1 we have supp(Q_{x*}) = {0, a, b}. The set {W ≤ α} has as a subset

C_{x*} = { x ∈ [0, ∞)^n : x_(i) ≥ x_(i)* for all i = 1, ..., n },

since W is symmetric and nonincreasing and W(x*) = α. Another subset of {W ≤ α} is given by

D* = { x ∈ [0, ∞)^n : x_(i) = 0, 1 ≤ i ≤ n−3, and x_(n−2) = x_(n−1) = x_(n) = a },

again by the symmetry of W and by W(0, ..., 0, a, a, a) = α. Clearly, C_{x*} and D* are disjoint, and hence Q_{x*}^n(W ≤ α) ≥ Q_{x*}^n(C_{x*}) + Q_{x*}^n(D*), and the assertion follows from

Q_{x*}^n(C_{x*}) = W(x*) = α and Q_{x*}^n(D*) = C(n, 3) (Q_{x*}(0))^{n−3} (Q_{x*}(a))^3 > 0.

(ii) The tests φ_1 and φ_2 are clearly nonrandomized, monotone, and symmetric. By definition of W we have, with x* = (0, ..., 0, a, b) (zero appearing n−2 times) as above, y* = (0, ..., 0, a, a, a) (zero appearing n−3 times), and notation C_{x*}, C_{y*} from (11),

sup_{Q ∈ H_0} E_{Q^n}(φ_1) = sup_{Q ∈ H_0} Q^n(C_{y*}) = W(y*) = α,
sup_{Q ∈ H_0} E_{Q^n}(φ_2) = sup_{Q ∈ H_0} Q^n(C_{x*}) = W(x*) = α.

So φ_1 and φ_2 are nonrandomized monotone symmetric level α tests for H_0. Now let φ be any strongly monotone symmetric level α test for H_0 which is uniformly at least as powerful on H_1 as φ_1, i.e.,

E_{Q^n}(φ) ≥ E_{Q^n}(φ_1) for all Q ∈ Prob([0, ∞)) with E[Q] > 1.   (34)

We have to show that φ is not uniformly at least as powerful on H_1 as φ_2. To this end we specialize (34) to the two-point distributions Q = ( 0 a ; 1−λ λ ) with 1/a < λ < 1, which yields

E_{Q^n}(φ) = Σ_{i=0}^n C(n, i) ϕ_i λ^i (1−λ)^{n−i} ≥ E_{Q^n}(φ_1) = Σ_{i=3}^n C(n, i) λ^i (1−λ)^{n−i},   (35)

where we have denoted

ϕ_i = φ(0, ..., 0, a, ..., a) (zero appearing n−i times, a appearing i times), 0 ≤ i ≤ n.

Hence it follows that ϕ_3 = 1, i.e.,

φ(0, ..., 0, a, a, a) = 1.   (36)

For, suppose that ϕ_3 < 1. Then, by strong monotonicity of φ, ϕ_i = 0 for i = 0, 1, 2, which contradicts (35). As a next step we show that

θ = φ(0, ..., 0, a, b) < 1.   (37)

Suppose that θ = 1. Then φ ≥ φ_2 by the symmetry and monotonicity of φ.
By Lemma 3.2 there is a probability distribution Q_0 with support equal to {0, a, b} and with E[Q_0] = 1 such that

α = W(0, ..., 0, a, b) = E_{Q_0^n}(φ_2).

But E_{Q_0^n}(φ) ≤ α, and thus (because of φ ≥ φ_2) we must have φ(x) = φ_2(x) for all x from the support of Q_0^n, i.e., for all x with components in {0, a, b}, and in particular φ(0, ..., 0, a, a, a) = φ_2(0, ..., 0, a, a, a) = 0, contradicting (36). Hence (37) follows.

Consider the sets

C = { x = (x_1, ..., x_n) : x_i ∈ {0, a, b} (1 ≤ i ≤ n), x_(n−2) = 0, x_(n−1) = a, x_(n) = b },
D = { x = (x_1, ..., x_n) : x_i ∈ {0, a, b} (1 ≤ i ≤ n), x_(n) ≤ a }.

Then we have

φ(x) + (1−θ) 1_C(x) ≤ φ_2(x) + 1_D(x)   (38)

for all x = (x_1, ..., x_n) with x_i ∈ {0, a, b} (1 ≤ i ≤ n), which can be seen as follows. The l.h.s. of (38) does not exceed 1 because of φ(x) = θ on C, while the r.h.s. of (38) yields either 0, 1, or 2. So it remains to verify inequality (38) for all x with components in {0, a, b} and φ_2(x) = 1_D(x) = 0. The latter means that x_(n−1) = 0 and x_(n) = b, hence φ(x) = φ(0, ..., 0, b), which equals zero by (37) and the strong monotonicity of φ, and clearly 1_C(x) = 0. This shows (38). Hence it follows that for any probability distribution Q with supp(Q) = {0, a, b} we have

E_{Q^n}(φ_2) − E_{Q^n}(φ) ≥ (1−θ) Q^n(C) − Q^n(D) = n(n−1)(1−θ) Q(b)Q(a)(Q(0))^{n−2} − (1 − Q(b))^n.

Taking Q(0) = Q(a) = ε, Q(b) = 1−2ε with 0 < ε < 1/2 and E[Q] > 1, i.e., ε < (b−1)/(2b−a), one gets

E_{Q^n}(φ_2) − E_{Q^n}(φ) ≥ n(n−1)(1−θ)(1−2ε)ε^{n−1} − (2ε)^n
= ε^{n−1} ( n(n−1)(1−θ) − [2n(n−1)(1−θ) + 2^n] ε ).

The latter expression is positive for some small positive ε. So we have shown that there exists a probability distribution Q on [0, ∞) with E[Q] > 1 and E_{Q^n}(φ) < E_{Q^n}(φ_2), i.e., φ is not uniformly at least as powerful on H_1 as φ_2. □

References

[1] Wang, W., Zhao, L.H., Nonparametric tests for the mean of a non-negative population. J. Statist. Plann. Inference 110 (2003), pp. 75-96.

[2] Bauer, H., Maß- und Integrationstheorie. [In German]. De Gruyter, Berlin, 1992.

[3] Billingsley, P., Weak Convergence of Measures: Applications in Probability. Soc. for Industrial and Applied Mathematics, Philadelphia, 1971.

[4] Shaked, M., Shanthikumar, J.G., Stochastic Orders and Their Applications. Academic Press, San Diego, 1994.