Normal Approximation for Non-linear Statistics Using a Concentration Inequality Approach

Normal Approximation for Non-linear Statistics Using a Concentration Inequality Approach Louis H.Y. Chen 1 National University of Singapore and Qi-Man Shao 2 Hong Kong University of Science and Technology, University of Oregon,and Zhejiang University Abstract. Let T be a general sampling statistic that can be written as a linear statistic plus an error term. Uniform and non-uniform Berry-Esseen type bounds for T are obtained. The bounds are best possible for many known statistics. Applications to U-statistic, multi-sample U-statistic, L-statistic, random sums, and functions of non-linear statistics are discussed. Nov. 15, 2005 AMS 2000 subject classification: Primary 62E20, 60F05; secondary 60G50. Key words and phrases. Normal approximation, uniform Berry-Esseen bound, non-uniform Berry- Esseen bound, concentration inequality approach, nonlinear statistics, U-statistics, multi-sample U- statistics, L-statistic, random sums, functions of non-linear statistics. 1 Research is partially supported by grant R-146-000-013-112 at the National University of Singapore. 2 Research is partially supported by NSF grant DMS-0103487 and grant R-146-000-038-101 at the National University of Singapore. 1

1 Introduction Let X 1, X 2,..., X n be independent random variables and let T : T (X 1,..., X n be a general sampling statistic. In many cases T can be written as a linear statistic plus an error term, say T W +, where W g i (X i, : (X 1,..., X n T W and g i : g n,i are Borel measurable functions. Typical cases include U-statistics, multi-sample U- statistics, L-statistics, and random sums. Assume that (1.1 E(g i (X i 0 for i 1, 2,..., n, and E(gi 2 (X i 1. It is clear that if 0 in probability as n, then we have the following central limit theorem (1.2 sup P (T z Φ(z 0 z where Φ denotes the standard normal distribution function, provided that the Lindeberg condition holds: ε > 0, Egi 2 (X i I( g i (X i > ε 0. If in addition, E p < for some p > 0, then by the Chebyshev inequality, one can obtain the following rate of convergence: (1.3 sup z P (T z Φ(z sup P (W z Φ(z + 2(E p 1/(1+p. z The first term on the right hand side of (1.3 is well-understood via the Berry-Esseen inequality. For example, using Stein s method, Chen and Shao (2001 obtained ( n (1.4 sup P (W z Φ(z 4.1 Egi 2 (X i I( g i (X i > 1 + z E g i (X i 3 I( g i (X i 1. However, the bound (E p 1/(1+p is in general not sharp for many commonly used statistics. Many authors have worked towards obtaining better Berry-Esseen bounds. For example, sharp Berry-Esseen bounds have been obtained for general symmetric statistics in van Zwet (1984 and Friedrich (1989. An Edgeworth expansion with remainder O(n 1 for symmetric statistics was proved by Bentkus, Götze and Zwet (1997. The main purpose of this paper is to establish uniform and non-uniform Berry-Esseen bounds for general nonlinear statistics. The bounds are best possible for many known statistics. Our proof is 2

based on a randomized concentration inequality approach to bounding P (W + z P (W z. Since proofs of uniform and non-uniform bounds for sums of independent random variables can be proved via Stein s method [8], which is much neater and simpler than the traditional Fourier analysis approach, this paper provides a direct and unifying treatment towards the Berry-Esseen bounds for general non-linear statistics. This paper is organized as follows. The main results are stated in next section, five applications are presented in Section 3 and an example is given in Section 4 to show the sharpness of the main results. Proofs of the main results are given in Section 5, while proofs of other results including Example 4.1 are postponed to Section 6. Throughout this paper, C will denote an absolute constant whose value may change at each appearance. The L p norm of a random variable X is denoted by X p, i.e., X p (E X p 1/p for p 1. 2 Main results Let {X i, 1 i n, T, W, be defined as in Section 1. In the following theorems, we assume that (1.1 is satisfied. Put (2.1 β and let δ > 0 satisfy (2.2 E g i (X i 2 I( g i (X i > 1 + E g i (X i 3 I( g i (X i 1 E g i (X i min(δ, g i (X i 1/2. Theorem 2.1 For each 1 i n, let i be a random variable such that X i and ( i, W g i (X i are independent. Then (2.3 sup P (T z P (W z 4δ + E W + z for δ satisfying (2.2. In particular, we have E g i (X i ( i (2.4 and (2.5 sup P (T z P (W z 2β + E W + z sup P (T z Φ(z 6.1β + E W + z E g i (X i ( i E g i (X i ( i. 3

Next theorem provides a non-uniform bound. Theorem 2.2 For each 1 i n, let i be a random variable such that X i and ( i, {X j, j i are independent. Then for δ satisfying (2.2 and for z R 1, (2.6 P (T z P (W z γ z + e z /3 τ where (2.7 (2.8 γ z P ( > ( z + 1/3 + P ( g i (X i > ( z + 1/3 + P ( W g i (X i > ( z 2/3P ( g i (X i > 1, τ 21δ + 8.1 2 + 3.5 g i (X i 2 i 2. In particular, if E g i (X i p < for 2 < p 3, then (2.9 P (T z Φ(z P ( > ( z + 1/3 + C( z + 1 p( 2 + g i (X i 2 i 2 + E g i (X i p. A result similar to (2.5 has been obtained by Friedrich (1989 for g i E(T X i using the method of characteristic function. Our proof is direct and simpler and the bounds are easier to calculate. The non-uniform bounds in (2.6 and (2.9 for general non-linear statistics are new. Remark 2.1 Assume E g i (X i p < for p > 2. Let (2.10 ( 2(p 2 p 2 δ (p 1 p 1 E g i (X i p 1/(p 2. Then (2.2 is satisfied. This follows from the inequality (2.11 min(a, b a (p 2p 2 a p 1 (p 1 p 1 b p 2 for a 0 and b > 0. Remark 2.2 If β 1/2, then (2.2 is satisfied with δ β/2. 4

Remark 2.3 Let δ > 0 be such that Egi 2 (X i I( g i (X i > δ 1/2. Then (2.2 holds. In particular, if X 1, X 2,, X n are independent and identically distributed (i.i.d. random variables and g i g 1, then (2.2 is satisfied with δ c 0 / n, where c 0 is a constant such that E( ng 1 (X 1 2 I( ng 1 (X 1 > c 0 1/2. Remark 2.4 In Theorems 2.1 and 2.2, the choice of i is flexible. For example, one can choose i (X 1,..., X i 1, 0, X i+1,..., X n or i (X 1,..., X i 1, ˆX i, X i+1,..., X n, where { ˆX i, 1 i n is an independent copy of {X i, 1 i n. The choice of g i is also flexible. It can be more general than g i (x E(T X i x, which is commonly used by others in the literature. Remark 2.5 Let X 1,..., X n be independent normally distributed random variables with mean zero and variance 1/n, and let W, T and be as in Example 4.1. Then (2.12 E W + E X i 3 + E X i ( (X 1,..., X i,..., X n (X 1,..., 0,..., X n Cε 2/3 for (1/ε 4/3 n 16(1/ε 4/3. This together with (4.5 shows that the bound in (2.4 is achievable. Moreover, the term E g i (X i ( i in (2.4 can not be dropped off. 3 Applications Theorems 2.1 and 2.2 can be applied to a wide range of different statistics and provide bounds of the best possible order in many instances. To illustrate the usefulness and the generality of these results, we give five applications in this section. The uniform bounds refine many existing results with specifying absolute constants, while the non-uniform bounds are new for many cases. 3.1 U-statistics Let X 1, X 2,..., X n be a sequence of independent and identically distributed (i.i.d. random variables, and let h(x 1,..., x m be a real-valued Borel measurable symmetric function of m variables, where m (2 m < n may depend on n. Consider the Hoeffding (1948 U-statistic ( n 1 U n h(x m i1,..., X im. 1i 1 <...<i mn 5

The U-statistic elegantly and usefully generalizes the notion of a sample mean. Numerous investigations on the limiting properties of the U-statistics have been done during the last few decades. A systematic presentation of the theory of U-statistics was given in Koroljuk and Borovskikh (1994. We refer the study on uniform Berry-Esseen bound for U-statistics to Filippova (1962, Grams and Serfling (1973, Bickel (1974, Chan and Wierman (1977, Callaert and Janseen (1978, Serfling (1980,Van Zwet (1984, and Friedrich (1989. One can also refer to Wang, Jing and Zhao (2000 on uniform Berry-Esseen bound for studentized U-statistics. Applying Theorems 2.1 and 2.2 to the U-statistic, we have Theorem 3.1 Assume that Eh(X 1,..., X m 0 and σ 2 Eh 2 (X 1,..., X m <. Let g(x E(h(X 1, X 2,..., X m X 1 x and σ1 2 Eg2 (X 1. Assume that σ 1 > 0. Then ( n ( 1 n sup P U n z P g(x i z (1 + 2(m 1σ z mσ 1 nσ1 (m(n m + 1 1/2 + c 0 (3.1, σ 1 n where c 0 is a constant such that Eg 2 (X 1 I( g(x 1 > c 0 σ 1 σ 2 1 /2. If in addition E g(x 1 p < for 2 < p 3, then (3.2 sup P z ( n U n z Φ(z (1 + 2(m 1σ mσ 1 (m(n m + 1 1/2 + 6.1E g(x 1 p σ 1 n (p 2/2 σ p 1 and for z R 1, ( n (3.3 P U n z Φ(z mσ 1 9mσ 2 (1 + z 2 (n m + 1σ1 2 CE g(x 1 p + (1 + z p n (p 2/2 σ p. 1 + 13.5e z /3 m 1/2 σ (n m + 1 1/2 σ 1 Moreover, if E h(x 1,..., X m p < for 2 < p 3, then for z R 1, ( n (3.4 P U n z Φ(z mσ 1 Cm1/2 E h(x 1,..., X m p (1 + z p (n m + 1 1/2 σ p 1 CE g(x 1 p + (1 + z p n (p 2/2 σ p. 1 Note that the error in (3.1 is of order O(n 1/2 only under the assumption of finite second moment of h. The result appears not known before. The uniform bound given in (3.2 is not new, however, the specifying constant for general m is new. Finite second moment of h is not the weakest assumption for the uniform bound. Friedrich (1989 obtained an order of O(n 1/2 when E h 5/3 < which is necessary for the bound as shown by Bentkus, Götze and Zitikis (1994. 6

then (3.5 For the non-uniform bound, Zhao and Chen (1983 proved that if m 2, E h(x 1, X 2 3 <, P ( n U n z Φ(z An 1/2 (1 + z 3 mσ 1 for z R 1, where the constant A does not depend on n and z but the moment of h. Clearly, (3.4 refines Zhao and Chen s result specifying the relationship of the constant A with the moment condition. After we finished proving Theorem 3.1, Wang (2001 informed the second author that he also obtained (3.4 for m 2 and p 3. Remark 3.1 (3.3 implies that (3.6 P ( n U n z Φ(z mσ 1 Cm 1/2 σ 2 (1 + z 3 (n m + 1 1/2 σ 2 1 CE g(x 1 p + (1 + z p n (p 2/2 σ p 1 for z ((n m + 1/m 1/2. For z > ((n m + 1/m 1/2, the bound like (3.6 can be easily obtained by using the Chebyshev inequality. On the other hand, if (3.6 holds for any z R 1, then it appears necessary to assume E h(x 1,..., X m p <. 3.2 Multi-sample U-statistics Consider k independent sequences {X j1,..., X jnj of i.i.d. random variables, j 1,..., k. Let h(x jl, l 1,..., m j, j 1,..., k be a measurable function symmetric with respect to m j arguments of the j-th set, m j 1, j 1,..., k. Let θ Eh(X jl, l 1,..., m j, j 1,..., k. The multi-sample U-statistic is defined as { k U n ( nj m j 1 h(xjl, l i j1,..., i jmj, j 1,..., k where n (n 1,..., n k and the summation is carried out over all 1 i j1 <... < i jmj n j, n j 2m j, j 1,..., k. Clearly, U n is an unbiased estimate of θ. The two-sample Wilcoxon statistic and the two-sample ω 2 -statistic are two typical examples of the multi-sample U-statistics. Without loss of generality, assume θ 0. For j 1,..., k, define ( h j (x E h(x 11,..., X 1m1 ;... ; X k1,..., X kmk X j1 x 7

and let σ 2 j Eh2 j (X j1 and σ 2 n k m 2 j n j σ 2 j. A uniform Berry-Esseen bound with order O((min 1jk n j 1/2 for the multi-sample U-statistics was obtained by Helmers and Janssen (1982 and Borovskich (1983 (see, [Koroljuk and Borovskich (1994, pp. 304-311.]. Next theorem refines their results. Theorem 3.2 Assume that θ 0, σ 2 : Eh 2 (X 11,..., X 1m1 ;... ; X k1,..., X kmk < and max 1jk σ j > 0. Then for 2 < p 3 (3.7 ( sup P z σ 1 n U n z Φ(z (1 + 2σ σ n k m 2 j n j + 6.6 σ p n k m p j n p 1 j E h j (X j1 p and for z R 1 (3.8 ( P σ 1 n U n z Φ(z 9σ 2 (1 + z 2 σ 2 n C + (1 + z p σ p n ( k k m 2 j n j 2 + 13.5e z /3 σ σ n m p j E h j(x j1 p. n p 1 j k m 2 j n j 3.3 L-statistics Let X 1,..., X n be i.i.d. random variables with a common distribution function F, and let F n be the empirical distribution function defined by F n (x n 1 n Let J(t be a real-valued function on [0, 1] and define T (G for non-decreasing measurable function G. Put σ 2 I(X i x for x R 1. xj(g(xdg(x J(F (sj(f (tf (min(s, t(1 F (max(s, tdsdt and g(x (I(x s F (sj(f (sds. 8

The statistic T (F n is called an L-statistic (see [Serfling (1980, Chapter 8]. Uniform Berry-Esseen bounds for L-statistic for smoothing J were given by Helmers (1977, and Helmers, Janssen and Serfling (1990. Applying Theorems 2.1 and 2.2 yields the following uniform and non-uniform bounds for L-statistic. Theorem 3.3 Let n 4. Assume that EX 2 1 < and E g(x 1 p < for 2 < p 3. If the weight function J(t is Lipschitz of order 1 on [0, 1], that is, there exists a constant c 0 such that (3.9 then (3.10 and (3.11 J(t J(s c 0 t s for 0 s, t 1 sup P ( nσ 1 (T (F n T (F z Φ(z (1 + 2c 0 X 1 2 + 6.1E g(x 1 p z nσ n (p 2/2 σ p P ( nσ 1 (T (F n T (F z Φ(z 9c 2 0 EX2 1 (1 + z 2 nσ 2 + C ( c0 X 1 2 (1 + z p + E g(x 1 p nσ n (p 2/2 σ p 3.4 Random sums of independent random variables with non-random centering Let {X i, i 1 be i.i.d. random variables with EX i µ and Var(X i σ 2, and let {N n, n 1 be a sequence of non-negative integer-valued random variables that are independent of {X i, i 1. Assume that EN 2 n < and N n EN n Var(Nn d. N(0, 1. Then by Robbins (1948, Nn X i (EN n µ σ 2 EN n + µ 2 Var(N n d. N(0, 1. This is a special case of limit theorems for random sums with non-random centering. This kind of problems arises in the study, for example, of Galton-Watson branching processes. We refer to Finkelstein, Kruglov and Tucker (1994 and references therein for recent developments in this area. As another application of our general result, we give a uniform Berry-Esseen bound for the random sum. 9

Theorem 3.4 Let {Y i, i 1 be i.i.d. non-negative integer-valued random variables with EY i ν and Var(Y i τ 2. Put N n n Y i. Assume that E X i 3 < and that {Y i, i 1 and {X i, i 1 are independent. Then (3.12 ( N n sup P X i nµν x n(νσ 2 + τ 2 µ 2 x Φ(x Cn 1/2( τ 2 ν 2 + E X 1 3 σ 3 + σ µ. ν 3.5 Functions of non-linear statistics Let X 1, X 2,..., X n be a random sample and ˆΘ n ˆΘ n (X 1,..., X n be a weak consistent estimator of an unknown parameter θ. Assume that ˆΘ n can be written as ˆΘ n θ + 1 ( n g i (X i + n where g i are Borel measurable functions with Eg i (X i 0 and n Eg 2 i (X i 1, and : n (X 1,..., X n 0 in probability. Let h be a real-valued function differentiable in a neighborhood of θ with g (θ 0. Then, it is known that n(h( ˆΘn h(θ h (θ d. N(0, 1 under some regularity conditions. When ˆΘ n is the sample mean, the Berry-Esseen bound and Edgeworth expansion have been well studied (see Bhattacharya and Ghosh (1978. The next theorem shows that the results in Section 3 can be extended to functions of non-linear statistics. Theorem 3.5 Assume that h (θ 0 and δ(c 0 sup x θ c0 h (x < for some c 0 > 0. Then for 2 < p 3, (3.13 where W n g i (X i. ( n(h( sup P ˆΘ n h(θ z h (θ (1 + c 0δ(c 0 h (θ ( E W + z Φ(z E g i (X i ( i +6.1 E g i (X i p + 4 c 2 0 n + 2E 4.4c3 p 0 δ(c 0 + c 0 n1/2 h (θ n (p 2/2, 10

4 An example In this section we give an example to show that the bound of (2.4 in Theorem 2.1 is achievable. Moreover, the term E g i (X i ( i in (2.4 can not be dropped off. The example also provides a counter-example to a result of Shorack (2000 and of Bolthausen and Götze (1993. Example 4.1 Let X 1,..., X n be independent normally distributed random variables with mean zero and variance 1/n. Define W X i, T : T ε W ε W 1/2 + ε c 0 and T W ε W 1/2 + ε c 0, where c 0 E( W 1/2 2/π 0 x 1/2 e x2 /2 dx. Let { ˆX i, 1 i n be an independent copy of {X i, 1 i n and define (4.1 α 1 E (X 1,..., X i,..., X n (X 1,..., n ˆX i,..., X n. Then ET 0 and for 0 < ε < 1/64 and n (1/ε 4 (4.2 (4.3 (4.4 where C is an absolute constant. P (T ε c 0 Φ(ε c 0 ε 2/3 /6, E W + E 7ε, E + E X i 3 + α Cε (4.5 Clearly, (4.2 implies that sup P (T ε z Φ(z ε 2/3 /6. z A result of Shorack (2000 (see Lemma 11.1.3, p. 261, [22] states that for any random variables W and, (4.6 sup z P (W + z Φ(z sup P (W z Φ(z + 4E W + 4E. z Another result which is in Theorem 2 of Bolthausen and Götze (1993 states that if ET 0, then (4.7 ( sup P (T z Φ(z C E + z where C is an absolute constant and α is defined in (4.1. E g i (X i 3 + α, In view of (4.3, (4.4 and (4.5, the result of Shrock and of Bolthausen and Götze can be shown to lead to a contradiction. 11

5 Proof of Main Theorems In this section we prove Theorems 2.1 and 2.2 and Remarks 2.1 and 2.2. Proof of Theorem 2.1. (2.5 follows from (2.4 and (1.4. When β > 1/2, (2.4 is trivial. For β 1/2, (2.4 is a consequence of (2.3 and Remark 2.2. Thus, we only need to prove (2.3. Note that (5.1 P (z W z P (T z P (W z P (z W z +. It suffices to show that (5.2 P (z W z + 4δ + E W + E g i (X i ( i and (5.3 where δ satisfies (2.2. Let (5.4 Let P (z W z 4δ + E W + E g i (X i ( i /2 δ for w z δ, f (w w 1 2 (2z + for z δ w z + + δ, /2 + δ for w > z + + δ. ξ i g i (X i, ˆMi (t ξ i {I( ξ i t 0 I(0 < t ξ i, M i (t E ˆM i (t, ˆM(t n ˆM i (t, M(t E ˆM(t. Since ξ i and f i (W ξ i are independent for 1 i n and Eξ i 0, we have (5.5 E{W f (W E{ξ i (f (W f (W ξ i 1in + E{ξ i (f (W ξ i f i (W ξ i 1in : H 1 + H 2. Using the fact that ˆM(t 0 and f (w 0, we have (5.6 H 1 1in 1in 0 E {ξ i f (W + tdt ξ i { E f (W + t ˆM i (tdt 12

{ E { E f (W + t ˆM(tdt f (W + t ˆM(tdt t δ { E I(z W z + 1in H 1,1 H 1,2, t δ ˆM(tdt { E I(z W z + ξ i min(δ, ξ i where H 1,1 P (z W z + Eη i, H 1,2 E η i Eη i, η i ξ i min(δ, ξ i. 1in 1in By (2.2, Hence (5.7 Eη i 1/2. 1in H 1,1 (1/2P (z W z +. By the Cauchy-Schwarz inequality, (5.8 H 1,2 (E( η i Eη i 2 1/2 1in ( 1in Eη 2 i 1/2 δ. As to H 2, it is easy to see that Hence (5.9 Combining (5.5, (5.7, (5.8 and (5.9 yields f (w f i (w i /2 i /2. H 2 (1/2 E ξ i ( i. { P (z W z + 2 E W f (W + δ + (1/2 E ξ i ( i E W + 2δE W + 2δ + E ξ i ( i 4δ + E W + E ξ i ( i. 13

This proves (5.2. Similarly, one can prove (5.3 and hence Theorem 2.1. Proof of Theorem 2.2. First, we prove (2.9. For z 4, (2.9 holds by (2.5. For z > 4, consider two cases. Case 1. n E g i (X i p > 1/2. By the Rosenthal (1970 inequality, we have (5.10 Hence P ( W > ( z 2/3 P ( W > z /6 ( z /6 p E W p C( z + 1 p{( n p/2 Egi 2 n (X i + E g i (X i p n C( z + 1 p E g i (X i p. P (T z Φ(z P ( > ( z + 1/3 + P ( W > ( z 2/3 + P ( N(0, 1 > z n P ( > ( z + 1/3 + C( z + 1 p E g i (X i p, which shows that (2.9 holds. Case 2. n E g i (X i p 1/2. Similar to (5.10, we have P ( W g i (X i > ( z 2/3 C( z + 1 p{( n p/2 Egj 2 n (X j + E g j (X j p C( z + 1 p and hence γ z P ( > ( z + 1/3 + (( z + 1/3 p E g i (X i p + C( z + 1 p E g i (X i p n P ( > ( z + 1/3 + C( z + 1 p E g i (X i p. By Remark 2.1, we can choose δ ( 2(p 2 p 2 (p 1 p 1 E g i (X i p 1/(p 2 2(p 2p 2 (p 1 p 1 E g i (X i p. Combining the above inequalities with (2.6 and the non-uniform Berry-Essee bound for independent random variables yields (2.9. 14

Next we prove (2.6. The main idea of the proof is first to truncate g i (X i and then adopt the proof of Theorem 2.1 to the truncated sum. Without loss of generality, assume z 0 as we can simply apply the result to T. By (5.1, it suffices to show that (5.11 and (5.12 P (z W z γ z + e z/3 τ P (z W z + γ z + e z/3 τ. Since the proof of (5.12 is similar to that of (5.11, we only prove (5.11. It is easy to see that P (z W z P ( > (z + 1/3 + P (z W z, (z + 1/3. Now (5.11 follows directly by Lemmas 5.1 and 5.2 below. This completes the proof of Theorem 2.2. Lemma 5.1 Let Then ξ i g i (X i, ξi ξ i I(ξ i 1, W n ξ i. (5.13 P (z W z, (z + 1/3 P (z W z, (z + 1/3 + P (ξ i > (z + 1/3 + P (W ξ i > (z 2/3P ( ξ i > 1. Proof. We have P (z W z, (z + 1/3 P (z W z, (z + 1/3, max 1in ξ i 1 +P (z W z, (z + 1/3, max ξ i > 1 1in P (z W z, (z + 1/3 + P (W > (2z 1/3, ξ i > 1 and P (W > (2z 1/3, ξ i > 1 15

as desired. P (ξ i > (z + 1/3 + P (W > (2z 1/3, ξ i (z + 1/3, ξ i > 1 P (ξ i > (z + 1/3 + P (W ξ i > (z 2/3, ξ i > 1 P (ξ i > (z + 1/3 + P (W ξ i > (z 2/3P ( ξ i > 1, Lemma 5.2 We have (5.14 P (z W z, (z + 1/3 e z/3 τ. Proof. Noting that E ξ i 0, e s 1 + s + s 2 (e a 1 aa 2 for s a and a > 0 and that a ξ i a, we have for a > 0 (5.15 Ee a W n Ee a ξ i n ( 1 + a E ξ i + (e a 1 ae ξ i 2 ( exp (e a 1 a E ξ i 2 ( exp (e a 1 a Eξi 2 exp(e a 1 a. In particular, we have Ee W /2 exp(e 1/2 1.5. If δ 0.07, then This proves (5.14 when δ 0.07. P (z W z, (z + 1/3 P ( W > (2z 1/3 e z/3+1/6 Ee W /2 e z/3 exp(e.5 4/3 1.38e z/3 20δ e z/3. For δ < 0.07, let 0 for w z δ, (5.16 f (w e w/2 (w z + + δ for z δ w z + δ, e w/2 ( + 2δ for w > z + δ. Put M i (t ξ i {I( ξ i t 0 I(0 < t ξ i, 16 M(t n M i (t.

By (5.5 and similar to (5.6, we have (5.17 E{W f ( W { E f ( W + t M(tdt { + E ξ i (f ( W ξ i f i ( W ξ i : G 1 + G 2, It follows from the fact that M(t 0, f (w ew/2 for z δ w z + δ and f (w 0 for all w, (5.18 { G 1 E t δ f ( W + t M(tdt { E e W /2 I(z W z, (z + 1/3 { E e W /2 I(z W z, (z + 1/3 { +E e W /2 I(z W z, (z + 1/3 G 1,1 G 1,2, t δ t δ t δ M(tdt E M(tdt ( M(t E M(tdt where G 1,1 e z/3 1/6 P (z W z, (z + 1/3 { G 1,2 E e W /2 M(t E M(t dt. t δ t δ E M(tdt, By (2.2 and the assumption that δ 0.07, Hence (5.19 t δ E M(tdt E ξ i min(δ, ξ i E ξ i min(δ, ξ i 1/2. ( G 1,1 (1/2e z/3 1/6 P z W z, (z + 1/3. By (5.15, we have Ee W exp(e 2 < 2.06. It follows from the Cauchy-Schwarz inequality that (5.20 G 1,2.5 t δ ( 0.5Ee W + 2E ˆM(t M(t 2 dt { 0.5 2.06δ + 2 Eξi 2 (I( ξ i t 0 + I(0 < t ξ i dt t δ 17

{ 0.5 2.06δ + 2 { 0.5 2.06δ + 2δ Eξi 2 Eξi 2 min(δ, ξ i 2.03δ. As to G 2, it is easy to see that f (w f i (w e w/2 i e w/2 i. Hence, by the Hölder inequality, (5.15 and the assumption that ξ i and W ξ i are independent (5.21 G 2 E ξ i e ( W ξ i /2 ( i (Eξ i 2 e W 1/2 ξ i (E( i 2 1/2 ( Eξi 2 Ee W ξ i 1/2 i 2 1.44 ξ i 2 i 2. Following the proof of (5.15 and by using e s 1 s (e a 1/a for s a and a > 0, we have EW 2 e W Eξi 2 e ξ i Ee W ξ i + Eξ i (e ξ i 1Eξ j (e ξ j 1Ee W ξ i ξ j 1i jn 2.06 e Eξi 2 + 2.06(e 1 2 Eξi 2 Eξj 2 1i jn Thus, we obtain 2.06 e + 2.06(e 1 2 < 3.42 2. (5.22 E{W f ( W E W e W /2 ( + 2δ { ( 2 + 2δ E(W 2 e W 1/2 3.42( 2 + 2δ. Combining (5.17, (5.19, (5.20, (5.21 and (5.22 yields P (z W z, (z + 1/3 2e z/3+1/6{ 3.42( 2 + 2δ + 2.03δ + 1.44 ξ i 2 i 2 e z/3{ 21δ + 8.1 2 + 3.5 ξ i 2 i 2 e z/3 τ. 18

This proves (5.14. Proof of Remark 2.1. It is known that for x 0, y 0, α > 0, γ > 0 with α + γ 1 x α y γ αx + γy, ( which yields with α (p 2/(p 1, γ 1/(p 1, x b(p 1/(p 2 and y a x α y γ αx + γy b + (p 2p 2 a p 1 (p 1 p 1 b p 2 ( p 2 p 1 p 2 ap 1 b p 2 1/(p 1, or On the other hand, it is clear that b a (p 2p 2 a p 1 (p 1 p 1 b p 2. a a (p 2p 2 a p 1 (p 1 p 1 b p 2. This proves (2.11. Now (2.2 follows directly from (2.11, (2.10 and the assumption (1.1. Proof of Remark 2.2. Note that δ β/2 1/4. Applying (2.11 with p 3 yields E g i (X i min(δ, g i (X i E g i (X i I( g i (X i 1 min(δ, g i (X i { Egi 2 (X i I( g i (X i 1 E g i (X i 3 I( g i (X i 1/(4δ ( 1 4δ Egi 2 (X i I( g i (X i > 1 + E g i (X i 3 I( g i (X i 1 /(4δ 1 β/(4δ 1/2. This proves Remark 2.2. 6 Proofs of Other Theorems In this section, we prove Theorems 3.1-3.5. 19

6.1 Proof of Theorem 3.1 For 1 k m, let h k (x 1,..., x k E(h(X 1,..., X m X 1 x 1,..., X k x k and h k (x 1,..., x k h k (x 1,..., x k k g(x i. Observing that U n n 1 m ( n 1 g(x i + h m m (X i1,..., X im, 1i 1 <...<i mn we have where n mσ 1 U n W +, W 1 nσ1 n n mσ 1 ( n m g(x i, 1 1i 1 <...<i mn h m (X i1,..., X im. Let l n ( n 1 h mσ 1 m m (X i1,..., X im. 1i 1 <...<i m,i j l for all j By Theorems 2.1 and 2.2 (with Remark 2.3 for proof of (3.1, it suffices to show that (6.1 E 2 (m 12 σ 2 m(n m + 1σ 2 1 and (6.2 E l 2 It is known that (see, e.g., [18], p.271 2(m 1 2 σ 2 nm(n m + 1σ1 2. (6.3 ( E 1i 1 <...<i mn 2 h m (X i1,..., X im ( n m ( m ( n m E h 2 m j m j j(x 1,..., X j. j2 Note that (6.4 E h 2 j(x 1,..., X j j j Eh 2 j(x 1,..., X j 2 E[g(X i h k (X 1,..., X j ] + E( g(x i 2 Eh 2 j(x 1,..., X j 2jE[g(X 1 E(h(X 1,..., X m X 1,..., X j ] + keg 2 (X 1 20

Eh 2 j(x 1,..., X j 2jE[g(X 1 h(x 1,..., X m ] + jeg 2 (X 1 Eh 2 j(x 1,..., X j 2jEg 2 (X 1 + jeg 2 (X 1 Eh 2 j(x 1,..., X j jeg1(x 2 1. We next prove that for 2 j m (6.5 Eh 2 j 1(X 1..., X j 1 j 1 Eh 2 j j(x 1,..., X j Since E h 2 2 (X 1, X 2 0, (6.5 holds for j 2 by (6.4. Assume that (6.5 is true for j. Then (6.6 E(h j+1 (X 1,..., X j+1 h j (X 1,..., X j h j (X 2,..., X j+1 2 On the other hand, we have Eh 2 j+1(x 1,..., X j+1 4E[h j+1 (X 1,..., X j+1 h j (X 1,..., X j ] +2Eh 2 j(x 1,..., X j + 2Eh j (X 1,..., X j h j (X 2,..., X j+1 Eh 2 j+1(x 1,..., X j+1 2Eh 2 j(x 1,..., X j ( +2E E(h j (X 1,..., X j h j (X 2,..., X j+1 X 2,..., X j Eh 2 j+1(x 1,..., X j+1 2Eh 2 j(x 1,..., X j + 2Eh 2 j 1(X 1,..., X j 1. (6.7 E(h j+1 (X 1,..., X j+1 h j (X 1,..., X j h j (X 2,..., X j+1 2 ( E E(h j+1 (X 1,..., X j+1 h j (X 1,..., X j h j (X 2,..., X j+1 X 1,..., X j 2 Eh 2 j 1(X 1,..., X j 1. Combining (6.5 and (6.6 yields 2Eh 2 j(x 1,..., X j Eh 2 j+1(x 1,..., X j+1 + Eh 2 j 1(X 1,..., X j 1 Eh 2 j+1(x 1,..., X j+1 + j 1 Eh 2 j j(x 1,..., X j by the induction hypothesis, which in turn reduces to (6.4 for j + 1. This proves (6.4. It follows from (6.4 that (6.8 Eh 2 j(x 1,..., X j j m Eh2 m(x 1,..., X m j m σ2. (6.9 To complete the proof of (6.1, we need the following two inequalities: m ( m j j2 ( n m j m(m 12 n m j m (n m + 1n( m 21

and (6.10 m 1 for n > m 2. In fact, we have and m 1 ( m 1 ( n m j + 1 2(m 12 n j m 1 j m (n m + 1n( m m ( m j j2 ( n m j m j m m j2 m 1 ( m 1 j 1 ( n m m j ( m 1 ( n m j m 1 j ( n 1 m 1 ( n 1 {1 m 1 ( n 1 { 1 m 1 ( n m m 1 (n m!/(n m m + 1! (n 1!/(n m! n 1 jn m+1 ( n 1 n 1 m 1 m 1 jn m+1 j ( n 1 (m 1 2 m 1 n m + 1 (m 1 2 m (n m + 1n( n m ( m 1 ( n m j + 1 j m 1 j m m 1 m m 1 m m 1 m 2 j0 m 1 ( n 2 m m 2 ( n 1 ( (m 1 2 m 1 m(n 1. 2(m 1 2 (n m + 1n( n m (1 m 1 j ( m 1 ( n m j j m 1 j m 1 + 1 m ( m 2 ( n m + 1 j m 2 j m + 1 ( n 1 m { m 1 + (m 12 (n m + 1m m 1 ( n m m 1 m 1 ( m 1 ( n m j m 1 j ( m 1 ( n m j m 1 j 22

(6.11 From (6.8 and (6.9 we obtain that This proves (6.1. Similarly, by (6.10 (6.12 E( l 2 E 2 n ( n 2E { 2 h m 2 σ1 2 m m (X 1,..., X m 1i 1 <...<i mn m 1 n m 2 σ 2 1 ( n m nσ2 m 2 σ 2 1 ( n m j2 m 1 j2 (m 1 2 σ 2 (n m + 1 m σ1 2. ( m j ( m j ( n m m j ( n m j m j m Eh 2 j(x 1,..., X j n ( n 2E ({ 2 m 2 σ1 2 hm (X m i1,..., X im 1i 1 <...<i mn 1i 1 <...<i mn, all i j l n ( n 2E ( 2 h m 2 σ1 2 m m (X i1,..., X im 1, X m 1i 1 <...<i m 1 n 1 n ( n 2 ( n 1 m 2 σ1 2 m m 1 n ( n 2 ( n 1 m 2 σ1 2 m m 1 σ2 mσ 2 1 ( n m 1 m 1 2(m 1 2 σ 2 m n(n m + 1 σ1 2. m 1 m 1 ( m 1 ( n m E h 2 j m 1 j j+1(x 1,..., X j ( m 1 ( n m Eh 2 j m 1 j j+1(x 1,..., X j ( m 1 ( n m j + 1 j m 1 j m This proves (6.2 and hence completes the proof of Theorem 3.1. 6.2 Proof of Theorem 3.2 We follow a similar argument as that in the proof of Theorem 3.1. For 1 j k, let X j (X j1,..., X jmj and x j (x j1,..., x jmj and define k m j h(x 1,..., x k h(x 1,..., x k h j (x ji. For the given U-statistic U n, we define its projection k n j Û n l1 23 E(U n X jl.

Since we have m j /n j k n j Û n l1 ( nj 1 ( nj /, m j 1 m j m j n j h j (X jl. The difference U n Û n can be rewritten as { k U n Û n ( nj m j 1 h(x1i 1,..., X ki k, where X ji j (X jij1,..., X jijmj and the summation is carried out over all indices 1 i j1 < i j2 <... < i jmj n j, j 1, 2,..., k. Thus, we have with Let W σ 1 n n k j l1 σ 1 n U n W + m j n j h j (X jl, { σ 1 k ( nj 1 n h(x1i m j 1,..., X ki k. { jl σ 1 k ( nv 1 n (jl h(x1i m v1 v 1,..., X ki k, where the summation is carried out over all indices 1 i v1 < i v2 <... < i vmv n v, 1 v k, v j and 1 i j1 < i j2 <... < i jmj n j with i js l for 1 s m j. By Theorems 2.1 and 2.2, it suffices to show that (6.13 E 2 σ2 ( k m 2 j 2 σ 2 n n j and (6.14 For 0 d j m j, 1 j k let E jl 2 2σ2 m 2 j n 2 j σ2 n k v1 m 2 v n v Y d1,...,d k (x ji, 1 i d j, 1 j k E h(x j1,..., x jd1, X jdj +1,..., X jmj, 1 j k 24

and Noting that y d1,...,d k EY 2 d 1,...,d k (X ji, 1 i d j, 1 j k. E( h(x 1i 1,..., X ki k X jl 0 for every 1 l m j, 1 j k, we have (see (4.5.8 in [18] (6.15 ( 2 E U n Û n { k σ 2{ k ( nj m j 1 σ 2( k ( nj m j 1 m 2 j n j 2, d 1 +... + d k 2 0 d j m j, 1 j k d 1 +... + d k 2 0 d j m j, 1 j k k {( mj d j ( nj m j m j d j y d1,...,d k k {( mj ( nj m j d j m j d j where in the last inequality we used the fact that (6.16 d 1 +... + d k 2 0 d j m j, 1 j k k {( mj ( nj m ( j k m 2 j 2 d j m j d j n j k ( nj m j. (See below for proof. This proves (6.13. As to (6.14, consider j 1 only. Similar to (6.12 and (6.15, we have (with X 1i 1 (X 1i1,1,..., X 1i1,m1 1, X 1,m 1 σ 2 ne 1l 2 { k v1 σ 2{ k ( nv m v 2E ( v1 1 i v1 < i v2 <... < i vmv n v, 2 v k 1 i 1,1 < i 1,2 <... < i 1,mj 1 n j 1 ( nv 2 ( n1 1 k ( nv m v m 1 1 m v2 v d 1 +... + d k 1 0 d 1 m 1 1, 0 d v m v, 2 v k ( m1 1 d 1 ( n1 m 1 m 1 1 d 1 k 2 h(x 1i, X 1 2i 2,..., X ki k v2 ( mv ( nv m v d v m v d v 25

σ2 m { 1 k ( nv 1 { n 1 m v1 v 1d 1 m 1 1 + 2vk d 1 0,1d vm v σ2 m { 1 k ( nv 1 { (m1 1 2 ( n1 1 k ( nj n 1 m v1 v n 1 m 1 + 1 m 1 1 m j2 j σ2 m 2 1 n 2 1 1vk for n 1 2m 1. This proves (6.14. ( n1 m + 1 m 2 v m 1 1 n 2vk v m v + 1 m 2 v n v m v + 1 2σ2 m 2 1 n 2 1 1vk m 2 v n v Now we prove (6.16. Consider two cases in the summation: Case 1: At least one of d j 2, say d 1 2. In this case, by (6.9 for n 1 2m 1. d 1 2 1 d j m j, 1 j k { k j2 { k m4 1 n 2 1 j2 ( nj m j ( nj m j m 2 k 2d 1 m 1 m 2 1 (m 1 1 2 2n 1 (n 1 m 1 + 1 k ( nj m j k j2 {( mj ( nj m j d j m j d j 2d 1 m 1 k ( m1 ( n1 m 1 d 1 m 1 d 1 ( nj m j ( m1 d 1 ( n1 m 1 m 1 d 1 d1 m ( nj m j Case 2. At least two of {d j are equal to 1, say d 1 d 2 1. Then k {( mj ( nj m j d d 1 d 2 1 j m j d j m 1 m 2 ( n1 m 1 m 1 1 m2 1 m2 2 n 1 n 2 k ( nj m j. ( n2 m 2 m 2 1 k j3 ( nj m j 26

Thus, we have d 1 +... + d k 2 1 d j m j, 1 j k ( k m 4 j n 2 + j ( k m 2 j 2 k n j k 1i jk ( nj m j. {( mj ( nj m j d j m j d j m 2 i m2 j n i n j k This proves (6.16. Now the proof of Theorem 3.2 is complete. 6.3 Proof of Theorem 3.3 Let ψ(t t 0 and hence J(sds. As in [Serfling (1980, p.265], we have ( nj m j T (F n T (F [ψ(f n (x ψ(f (x]dx nσ 1 (T (F n T (F W +, where Let where F n,i (x 1 n W 1 n nσ (I(X i x F (xj(f (xdx nσ 1 [ψ(f n (x ψ(f (x (F n (x F (xj(f (x]dx. η i (x I(X i x F (x, g i (X i 1 (I(X i x F (xj(f (xdx, nσ i nσ 1 [ψ(f n,i (x ψ(f (x (F n,i (x F (xj(f (x]dx, {F (x + 1jn,j i I(X j x. We only need to prove (6.17 and (6.18 σ 2 E 2 c 2 0 n 1 EX 2 1 σ 2 E i 2 2c 2 0n 2 EX 2 1 27

Observe that the Lipschitz condition (3.9 implies (6.19 t s ψ(t ψ(s (t sj(s (J(u + s J(sdu 0.5 c 0 (t s 2 0 for 0 s, t 1. Hence Observe that and ( σ 2 E 2 0.25c 2 0 ne 2 2 2 (F n (x F (x 2 dx ( 0.25c 2 0 n 3 n 2dxdy E η i (xη j (y ( 0.25c 2 0 n 3 3n 2 Eη1(xEη 2 1(y 2 + ne{η1(xη 2 1(y 2 dxdy. ( Eη1(xEη 2 1(ydxdy 2 2 F (x(1 F (xdx (E X1 2 EX1 2 E{η 2 1(xη 2 1(ydxdy xy xy E{η 2 1(xη 2 1(ydxdy { (1 F (x 2 (1 F (y 2 F (x +F 2 (x(1 F (y 2 (F (y F (x + F 2 (xf 2 (y(1 F (y dxdy 2 F (x(1 F (ydxdy 2 { 2 xy { x0 xy0 + 0<xy x F (xdx + y 0 + x0,y>0 { 2 E(X1 2 + E(X 1 + 2 + EX1 EX+ 1 F (x(1 F (ydxdy y(1 F (ydy + x0 F (xdx (1 F (ydy y>0 4EX 2 1 This proves (6.17. Next we prove (6.18. Observe that σ i n [ψ(f n (x ψ(f n,i (x (F n (x F n,i (xj(f n,i (xdx + (F n (x F n,i (x[j(f n,i (x J(F (x]dx 28

( 2 E ηi (xdx 2 0.5c 0 (F n (x F n,i (x 2 dx +c 0 F n (x F n,i (x F n,i (x F (x dx 0.5c 0 n 2 (I(X i x F (x 2 dx +c 0 n 2 I(X i x F (x j x F (xdx j i{i(x 0.5c 0 n 2 ηi 2 (xdx + c 0 n 2 η i (x η j (x dx, j i Eη 2 1(xη 2 1(ydxdy 4EX 2 1 and ( E η i (x 2 η j (x dx j i (n 1 E{ η i (x j i η j (x η i (y j i E η i (xη i (y E{ j i η j (x j i η j (y dxdy η j (y dxdy η i (x 2 η i (y 2 η j (x 2 η j (y 2 dxdy j i j i η i (x 2 2 η i (y 2 2dxdy (n 1(E X 1 2 (n 1EX 2 1. Therefore ( σ 2 E i 2 n 3 c 2 0E 0.5 ηi 2 (xdx + η i (x 2 η j (x dx j i { ( 2 ( ηi 2 (xdx + 1.5E n 3 c 2 0 0.75E { n 3 c 2 0 3EX1 2 + 1.5(n 1EX1 2 2n 2 EX 2 1. η i (x 2 η j (x dx j i This proves (6.18 and hence the theorem. 29

6.4 Proof of Theorem 3.4 Let Z 1 and Z 2 be independent standard normal random variables that are independent of {X i and {Y i. Put and write b Nn νσ 2 + τ 2 µ 2, T n X i nµν Nn, H n X i N n µ nb Nn σ T n Nn σ H n + (N n nνµ Nn σ, T n (Z 1 Z 1 + (N n nνµ. nb nb nb nb Applying the Berry-Esseen bound to H n for given N n yields (6.20 sup P (T n x P (T n (Z 1 x x ( X1 3 P ( N n nν > nν/2 + CE Nn σ 3 I{ N n nν nν/2 4n 1 ν 2 τ 2 + Cn 1/2 ν 1/2 σ 3 E X 1 3. Let x.5nν x 1.5nν W n N n nν nτ, for x <.5nν for.5nν x 1.5nν for x > 1.5nν T n (Z 1 : N n σ nb Z 1 + (N n nνµ nb τµ b (W n + σ ν τµ Z 1 +, where Let ( N n nνσz 1 nτµ. i ( (N n Y i + ν nνσz 1 nτµ. Then E( W n Z 1 Z ( 1 σ Wn (N n nν E Z 1 σ nτµ nν nµ ν and 1 nτ E( (Y i ν( i Z 1 Z 1 n 3/2 τ 2 µ ν E(Y i ν 2 Z 1 σ n 3/2 µ ν. 30

Now letting T n (Z 1, Z 2 τµ b and applying Theorem 2.1 for given Z 1 yields (Z 2 + σ ν τµ Z 1 (6.21 sup P (T n (Z 1 x P (T n (Z 1, Z 2 x x P ( N n nν > 0.5nν + sup P ( T n (Z 1 x P (T n (Z 1, Z 2 x x 4τ 2 ( nν 2 + C E X1 3 n 1/2 σ 3 + E Z 1 σ n 1/2 µ ν Cn 1/2( τ 2 ν 2 + E X 1 3 σ 3 + σ µ ν It is clear that T n (Z 1, Z 2 has a standard normal distribution. This proves (3.12 by (6.20 and (6.21. 6.5 Proof of Theorem 3.5 Since (3.13 is trivial if n E g i (X i p > 1/6, we assume (6.22 E g i (X i p 1/6. Let W n g i (X i. It is known that for 2 < p 3 (6.23 Observe that n(h( ˆΘn h(θ (6.24 h (θ where E W p 2(EW 2 p/2 + E g i (X i p 2.2. n ( ˆΘn θ h h (θ( (θ ˆΘ n θ + W + + : W + Λ + R, n h (θ 0 n 1/2 (W + 0 [h (θ + t h (θ]dt [h (θ + t h (θ]dt n (n 1/2 W +(n 1/2 Λ + h [h (θ + t h (θ]dt, (θ 0 n n 1/2 (W + R h [h (θ + t h (θ]dt, (θ (n 1/2 W +(n 1/2 c 0 /2 for x < c 0 /2, x x for c 0 /2 x c 0 /2, c 0 /2 for x > c 0 /2. 31

Clearly, n 1/2 W c 0 /2 and n 1/2 c 0 /2 imply R 0. Hence (6.25 P ( R > 0 P ( W > c 0 n 1/2 /2 + P ( > c 0 n 1/2 /2 4/(c 2 0n + 2E /(c 0 n 1/2. To apply Theorem 2.1, let W i W g(x i and Noting that Λ i i + n (n 1/2 W i +(n 1/2 i h [h (θ + t h (θ]dt. (θ 0 (6.26 (n 1/2 W +(n 1/2 [h (θ + t h (θ]dt 0 0.5δ(c 0 ((n 1/2 W + (n 1/2 2 δ(c 0 ((n 1/2 W 2 + (n 1/2 2 ( δ(c 0 (c 0 /2 3 p (n 1/2 W p 1 + (c 0 /2n 1/2, we have (6.27 E W Λ E W + (c 0/2 3 p δ(c 0 h (θ n (p 2/2 E W p + c 0δ(c 0 h E W (θ (1 + c 0δ(c 0 E W h + 2.2c3 p 0 δ(c 0 (θ h (θ n (p 2/2. Similar to (6.26, From this we obtain (n 1/2 W +(n 1/2 [h (θ + t h (θ]dt (n 1/2 W i +(n 1/2 i ( δ(c 0 (c 0 3 p (n 1/2 W (n 1/2 W i (n 1/2 W + (n 1/2 W i p 2 +c 0 (n 1/2 (n 1/2 i ( δ(c 0 c 3 p 0 n (p 1/2 g(x i (2 W i p 2 + g(x i p 2 + c 0 n 1/2 i. (6.28 E g(x i (Λ Λ i E g(x i ( i nδ(c0 + {(c h 0 3 p n (p 1/2 n ( E g(x i 2 (2 W i p 2 + g(x i p 2 (θ 32

n +c 0 n 1/2 E g(x i ( i (1 + c 0δ(c 0 n h E g(x i ( i (θ + c3 p 0 δ(c 0 h (θ n (p 2/2 (2Eg 2 (X i + E g(x i p (1 + c 0δ(c 0 n h E g(x i ( i + 2.2c3 p 0 δ(c 0 (θ h (θ n (p 2/2. This proves (3.13 by (2.5, (6.25, (6.27 and (6.28. 6.6 Proofs of Example 4.1 and Remark 2.5 First we prove Example 4.1. Let Z denote a standard normally distributed random variable and φ(x be the standard normal density function. Observe that 0 < c 0 < 2 and P (T ε c 0 Φ(ε c 0 P (Z ε/ Z 1/2 0 Φ(ε c 0 which proves (4.2. Clearly, we have P (Z 0 + P (Z 3/2 ε, Z > 0 Φ(ε c 0 ε 2/3 0 ε 2/3 2ε φ(tdt εc0 0 φ(tdt φ(tdt (ε 2/3 2ε/3 ε 2/3 /6, E W + E εe c 0 Z Z Z 1/2 c0 + εe Z 1/2 ε(c 0 + 1 + 2c 0 7ε by the fact that c 0 < 2. This proves (4.3. As to (4.4, observe that (6.29 E + E X i 3 2cε + 4n 1/2 8ε provided n > ε 2. Below we bound α. Since {X i are i.i.d., we have α εe W 1/2 ˆX 1 + X 2 +... X n 1/2. 33

Let Y and Z be independent standard normal random variables, and let r (n 1/n and s 1 r 2. Noting that EW ( ˆX 1 + X 2 +... X n r and EZ(sY + rz r, we see that (W, ˆX 1 + X 2 +... X n and (Z, sy + rz have the same distribution. Hence (6.30 Write E W 1/2 ˆX 1 + X 2 +... X n 1/2 E Z 1/2 sy + rz 1/2 sy + rz Z E Z 1/2 sy + rz 1/2 ( Z 1/2 + sy + rz 1/2 { Y se Z 1/2 sy + rz 1/2 ( Z 1/2 + sy + rz 1/2 { Z +(1 re Z 1/2 sy + rz 1/2 ( Z 1/2 + sy + rz 1/2 : sr 1 + (1 rr 2. R 1 E{ I( rz s Y /2 + E{ I(s Y /2 < rz 2s Y + E{ I( rz > 2s Y : R 1,1 + R 1,2 + R 1,3. Let C denote an absolute constant. Then { Y I( rz s Y /2 R 1,1 2E Z 1/2 sy (4/sE(s Y /(2r 1/2 4s 1/2, R 1,2 { Y I(s Y /2 < rz 2s Y 4E sy sy + rz 1/2 Cs 1/2 and Thus, we have R 1,3 { Y I( rz > 2s Y 2E Z 3/2 { Y CE (s Y 1/2 Cs 1/2. R 1 Cs 1/2. As to R 2, we have R 2 E(1/ sy + rz 1/2 c 0 34

Note that r 1 and s 2/n as n. Combining the above inequalities yields (6.31 α Cεn 1/4. By (6.29 and (6.31, we have E + E X i 3 + α 8ε + Cε 1/2 n 1/8 (8 + Cε provided that n > (1/ε 4. This proves (4.4. Finally, we prove (2.12 in Remark 2.5. Let θ (n 1/n, ρ (1/n 1/2, and let Y and Z be independent standard normal random variables. By (6.29, (6.32 E + E X i 3 4ε + 4n 1/2 8ε 2/3 for n (1/ε 4/3. Following the proof of (6.30, we have (6.33 E X i ( (X 1,..., X i,..., X n (X 1,..., 0,..., X n nεe X 1 ( X 1 + X 2 +... X n 1/2 X 2 +... X n 1/2 nεe ρy ( ρy + θz 1/2 θz 1/2 { nερ 2 Y 2 E ρy + θz 1/2 θz 1/2 ( ρy + θz 1/2 + θz 1/2 { Y 2 I( θz ρ Y /2 εe ρy + θz θz 1/2 { Y 2 I(ρ Y /2 < θz 2ρ Y +εe ρy + θz 1/2 θz { Y 2 I( θz > 2ρ Y +εe ρy + θz 1/2 θz { 2ερ 1 E Y θz 1/2 I( θz ρ Y /2 { Y I(ρ Y /2 < θz 2ρ Y +2ερ 1 E αy + θz 1/2 { Y 2 I( θz > 2ρ Y +2εE θz 3/2 Cερ 1/2 Cεn 1/4 2Cε 2/3 for an absolute constant C, provided that n 16(1/ε 4/3. This proves (2.12, by (6.32 and (6.33. 35

Acknowledgments. The authors are thankful to Xuming He for his contribution to the construction of the example in Section 4. References [1] Bentickus, V., Götze, F. and Zitikis, M. (1994. Lower estimates of the convergence rate for U-statistics. Ann. Probab. 22, 1701-1714. [2] Bhattacharya, R.N. and Ghosh, J.K. (1978. On the validity of the formal Edgeworth expansion. Ann. Statist. 6, 434-485. [3] Bickel,P. (1974. Edgeworth expansion in nonparametric statistics. Ann., Statist. 2, 1-20. [4] Bolthausen, E. and Götze,F. (1993. The rate of convergence for multivariate sampling statistics. Ann. Statist. 21 1692-1710. [5] Borovskich, Yu.V. (1983. Asymptotics of U-statistics and von Mises functionals. Sov. Math. Dokl. 27, 303-308. [6] Callaert, H. and Janssen, P. (1978. The Berry-Esseen theorem for U-statistics. Ann., Statist. 6, 417-421. [7] Chan, Y.K. and Wierman, J. (1977. On the Berry-Esseen theorem for U-statistics. Ann. Probab. 5, 136-139. [8] Chen, L.H.Y. and Shao, Q.M. (2001. A non-uniform Berry-Esseen bound via Stein s method. Probab. Theory Related Fields 120, 236-254. [9] Figiel, T., Hitczenko, P., Johnson, W.B., Schechtman, G., and Zinn, J. (1997. Extremal properties of Rademacher functions with applications to the Khintchine and Rosenthal inequalities. Trans. Amer. Math. Soc. 349, 997-1027. [10] Filippova, A.A. (1962. Mises s theorem of the asympototic behavior of functionals of empirical distribution functions and its statistical applications. Theory Probab. Appl. 7, 24-57. 36

[11] Finelstein, M., Kruglov, V.M., and Tucker, H.G. (1994. Convergence in law of random sums with non-random centering. J. Theoret. Probab. 3, 565-598. [12] Friedrich, K. O. (1989. A Berry-Esseen bound for functions of independent random variables. Ann. Statist. 17, 170 183. [13] Grams, W.F. and Serfling, R.J. (1973. Convergence rates for U-statistics and related statistics. Ann., Statist. 1, 153-160. [14] Helmers, R. (1977. The order of the normal approximation for linear combinations of order statistics with smooth weight functions. Ann. Probab. 5, 940-953. [15] Helmers, R. and Janssen, P. (1982. On the Berry-Esseen theorem for multivariate U-statistics. Math. Cent. Rep. SW 90/82, Mathematisch Centrum, Amsterdam, pp. 1-22. [16] Helmers, R., Janssen, P. and Serfling, R. J. (1990. Berry-Esseen bound and bootstrap results for genalized L-statistics. Scand. J. Statist. bf 17, 65-77. [17] Hoeffding, W. (1948. A class of statististics with asymptotically normal distribution. Ann. Math. Statist. 19, 293-325. [18] Koroljuk, V.S. and Borovskich, Yu. V. (1994. Theory of U-statistics. Kluwer Academic Publishers, Boston. [19] Robbins, H. (1948. The asymptotic distribution of the sum of a random numbers of random random variables. Bull. A.M.S. 54, 1151-1161. [20] Rosenthal, H.P. (1970. On the subspaces of L p (p > 2 spanned by sequences of independent random variables. Israel J. Math. 8, 273-303. [21] Serfling, R.J. (1980. Approximation Theorems of Mathematical Statisitics. Wiley, New York. [22] Shorack, G.R. (2000. Probability for Statisticians. Springer, New York. [23] Wang, Q. (2001. Non-uniform Berry-Esséen Bound for U-Statistics. Statist. Sinica (to appear [24] Wang, Q., Jing, B.Y. and Zhao, L. (2000. The Berry-Esseen bound for studentized statistics. Ann. Probab. 28, 511 535. 37

[25] Zhao, L.C. and Chen, X.R. (1983. Non-uniform convergence rates for distributions of U- statistics. Sci. Sinica (Ser. A 26, 795-810. [26] van Zwet, W.R. (1984. A Berry-Esseen bound for symmetric statisics. Z. Wahrsch. Verw. Gebiete 66, 425-440. Louis H.Y. Chen Qi-Man Shao Institute for Mathematical Sciences Department of Mathematics National University of Singapore Hong Kong University of Science and Technology Singapore 118402 Clear Water Bay, Kowloon Republic of Singapore Hong Kong lhychen@ims.nus.edu.sg maqmshao@ust.hk and Department of Mathematics Department of Mathematics Department of Statistics & Applied Probability University of Oregon National University of Singapore Eugene, OR 97403 Singapore 117543 USA Republic of Singapore and Department of Mathematics Zhejiang University Hangzhou, Zhejiang 310027 China 38