Normal Approximation for Non-linear Statistics Using a Concentration Inequality Approach

Similar documents
EXACT CONVERGENCE RATE AND LEADING TERM IN THE CENTRAL LIMIT THEOREM FOR U-STATISTICS

Introduction to Self-normalized Limit Theory

Non-uniform Berry Esseen Bounds for Weighted U-Statistics and Generalized L-Statistics

Bounds of the normal approximation to random-sum Wilcoxon statistics

Edgeworth expansions for a sample sum from a finite set of independent random variables

Self-normalized Cramér-Type Large Deviations for Independent Random Variables

EMPIRICAL EDGEWORTH EXPANSION FOR FINITE POPULATION STATISTICS. I. M. Bloznelis. April Introduction

Bahadur representations for bootstrap quantiles 1

MOMENT CONVERGENCE RATES OF LIL FOR NEGATIVELY ASSOCIATED SEQUENCES

AN EDGEWORTH EXPANSION FOR SYMMETRIC FINITE POPULATION STATISTICS. M. Bloznelis 1, F. Götze

Cramér type moderate deviations for trimmed L-statistics

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes

RELATIVE ERRORS IN CENTRAL LIMIT THEOREMS FOR STUDENT S t STATISTIC, WITH APPLICATIONS

RIESZ BASES AND UNCONDITIONAL BASES

On large deviations of sums of independent random variables

A Gentle Introduction to Stein s Method for Normal Approximation I

MODERATE DEVIATIONS IN POISSON APPROXIMATION: A FIRST ATTEMPT

On large deviations for combinatorial sums

A BERRY ESSEEN BOUND FOR STUDENT S STATISTIC IN THE NON-I.I.D. CASE. 1. Introduction and Results

Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics. Wen-Xin Zhou

THE LINDEBERG-FELLER CENTRAL LIMIT THEOREM VIA ZERO BIAS TRANSFORMATION

Almost Sure Central Limit Theorem for Self-Normalized Partial Sums of Negatively Associated Random Variables

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

A NOTE ON THE COMPLETE MOMENT CONVERGENCE FOR ARRAYS OF B-VALUED RANDOM VARIABLES

PRECISE ASYMPTOTIC IN THE LAW OF THE ITERATED LOGARITHM AND COMPLETE CONVERGENCE FOR U-STATISTICS

Submitted to the Brazilian Journal of Probability and Statistics

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Self-normalized Cramér type moderate deviations for the maximum of sums

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

ORTHOGONAL DECOMPOSITION OF FINITE POPULATION STATISTICS AND ITS APPLICATIONS TO DISTRIBUTIONAL ASYMPTOTICS. and Informatics, 2 Bielefeld University

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

On probabilities of large and moderate deviations for L-statistics: a survey of some recent developments

AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek

ENTROPY-BASED GOODNESS OF FIT TEST FOR A COMPOSITE HYPOTHESIS

LIFE SPAN OF BLOW-UP SOLUTIONS FOR HIGHER-ORDER SEMILINEAR PARABOLIC EQUATIONS

Almost sure limit theorems for U-statistics

Spectral Gap and Concentration for Some Spherically Symmetric Probability Measures

Stein s Method and the Zero Bias Transformation with Application to Simple Random Sampling

Wasserstein-2 bounds in normal approximation under local dependence

Gaussian vectors and central limit theorem

Bounds on the Constant in the Mean Central Limit Theorem

Continuous Random Variables

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535

Journal of Inequalities in Pure and Applied Mathematics

Strong log-concavity is preserved by convolution

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Selected Exercises on Expectations and Some Probability Inequalities

CHAPTER 3: LARGE SAMPLE THEORY

Chp 4. Expectation and Variance

Arithmetic progressions in sumsets

Tail bound inequalities and empirical likelihood for the mean

A generalization of Cramér large deviations for martingales

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf

FOURIER TRANSFORMS OF STATIONARY PROCESSES 1. k=1

The Central Limit Theorem: More of the Story

A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE

ALMOST SURE CONVERGENCE OF THE BARTLETT ESTIMATOR

SHARP BOUNDARY TRACE INEQUALITIES. 1. Introduction

Asymptotic Statistics-III. Changliang Zou

Limit Theorems for Exchangeable Random Variables via Martingales

ECE 4400:693 - Information Theory

Formulas for probability theory and linear models SF2941

A regeneration proof of the central limit theorem for uniformly ergodic Markov chains

A generalization of Strassen s functional LIL

Kolmogorov Berry-Esseen bounds for binomial functionals

Optimal global rates of convergence for interpolation problems with random design

On the Error Bound in the Normal Approximation for Jack Measures (Joint work with Le Van Thanh)

A CLT FOR MULTI-DIMENSIONAL MARTINGALE DIFFERENCES IN A LEXICOGRAPHIC ORDER GUY COHEN. Dedicated to the memory of Mikhail Gordin

Concentration inequalities and the entropy method

Qualifying Exam in Probability and Statistics.

arxiv: v1 [math.pr] 6 Jan 2014

THE SMALLEST SINGULAR VALUE OF A RANDOM RECTANGULAR MATRIX

The Convergence Rate for the Normal Approximation of Extreme Sums

Normal Approximation for Hierarchical Structures

Properties of Random Variables

HERMITE MULTIPLIERS AND PSEUDO-MULTIPLIERS

ON THE COMPLETE CONVERGENCE FOR WEIGHTED SUMS OF DEPENDENT RANDOM VARIABLES UNDER CONDITION OF WEIGHTED INTEGRABILITY

S. Lototsky and B.L. Rozovskii Center for Applied Mathematical Sciences University of Southern California, Los Angeles, CA

A NOTE ON A DISTRIBUTION OF WEIGHTED SUMS OF I.I.D. RAYLEIGH RANDOM VARIABLES

Wittmann Type Strong Laws of Large Numbers for Blockwise m-negatively Associated Random Variables

large number of i.i.d. observations from P. For concreteness, suppose

INEQUALITIES FOR SUMS OF INDEPENDENT RANDOM VARIABLES IN LORENTZ SPACES

Laplace s Equation. Chapter Mean Value Formulas

The strong law of large numbers for arrays of NA random variables

Advanced Probability Theory (Math541)

Piecewise Smooth Solutions to the Burgers-Hilbert Equation

EXISTENCE OF MOMENTS IN THE HSU ROBBINS ERDŐS THEOREM

STAT 200C: High-dimensional Statistics

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE

5 Operations on Multiple Random Variables

Asymptotically Efficient Nonparametric Estimation of Nonlinear Spectral Functionals

ARCS IN FINITE PROJECTIVE SPACES. Basic objects and definitions

1 Exercises for lecture 1

Testing Goodness-of-Fit for Exponential Distribution Based on Cumulative Residual Entropy

On lower and upper bounds for probabilities of unions and the Borel Cantelli lemma

Zeros of lacunary random polynomials

Pointwise convergence rates and central limit theorems for kernel density estimators in linear processes

ONE DIMENSIONAL MARGINALS OF OPERATOR STABLE LAWS AND THEIR DOMAINS OF ATTRACTION

Krzysztof Burdzy University of Washington. = X(Y (t)), t 0}

Transcription:

Normal Approximation for Non-linear Statistics Using a Concentration Inequality Approach Louis H.Y. Chen 1 National University of Singapore and Qi-Man Shao 2 Hong Kong University of Science and Technology, University of Oregon,and Zhejiang University Abstract. Let T be a general sampling statistic that can be written as a linear statistic plus an error term. Uniform and non-uniform Berry-Esseen type bounds for T are obtained. The bounds are best possible for many known statistics. Applications to U-statistic, multi-sample U-statistic, L-statistic, random sums, and functions of non-linear statistics are discussed. Nov. 15, 2005 AMS 2000 subject classification: Primary 62E20, 60F05; secondary 60G50. Key words and phrases. Normal approximation, uniform Berry-Esseen bound, non-uniform Berry- Esseen bound, concentration inequality approach, nonlinear statistics, U-statistics, multi-sample U- statistics, L-statistic, random sums, functions of non-linear statistics. 1 Research is partially supported by grant R-146-000-013-112 at the National University of Singapore. 2 Research is partially supported by NSF grant DMS-0103487 and grant R-146-000-038-101 at the National University of Singapore. 1

1 Introduction Let X 1, X 2,..., X n be independent random variables and let T : T (X 1,..., X n be a general sampling statistic. In many cases T can be written as a linear statistic plus an error term, say T W +, where W g i (X i, : (X 1,..., X n T W and g i : g n,i are Borel measurable functions. Typical cases include U-statistics, multi-sample U- statistics, L-statistics, and random sums. Assume that (1.1 E(g i (X i 0 for i 1, 2,..., n, and E(gi 2 (X i 1. It is clear that if 0 in probability as n, then we have the following central limit theorem (1.2 sup P (T z Φ(z 0 z where Φ denotes the standard normal distribution function, provided that the Lindeberg condition holds: ε > 0, Egi 2 (X i I( g i (X i > ε 0. If in addition, E p < for some p > 0, then by the Chebyshev inequality, one can obtain the following rate of convergence: (1.3 sup z P (T z Φ(z sup P (W z Φ(z + 2(E p 1/(1+p. z The first term on the right hand side of (1.3 is well-understood via the Berry-Esseen inequality. For example, using Stein s method, Chen and Shao (2001 obtained ( n (1.4 sup P (W z Φ(z 4.1 Egi 2 (X i I( g i (X i > 1 + z E g i (X i 3 I( g i (X i 1. However, the bound (E p 1/(1+p is in general not sharp for many commonly used statistics. Many authors have worked towards obtaining better Berry-Esseen bounds. For example, sharp Berry-Esseen bounds have been obtained for general symmetric statistics in van Zwet (1984 and Friedrich (1989. An Edgeworth expansion with remainder O(n 1 for symmetric statistics was proved by Bentkus, Götze and Zwet (1997. The main purpose of this paper is to establish uniform and non-uniform Berry-Esseen bounds for general nonlinear statistics. The bounds are best possible for many known statistics. Our proof is 2

based on a randomized concentration inequality approach to bounding P (W + z P (W z. Since proofs of uniform and non-uniform bounds for sums of independent random variables can be proved via Stein s method [8], which is much neater and simpler than the traditional Fourier analysis approach, this paper provides a direct and unifying treatment towards the Berry-Esseen bounds for general non-linear statistics. This paper is organized as follows. The main results are stated in next section, five applications are presented in Section 3 and an example is given in Section 4 to show the sharpness of the main results. Proofs of the main results are given in Section 5, while proofs of other results including Example 4.1 are postponed to Section 6. Throughout this paper, C will denote an absolute constant whose value may change at each appearance. The L p norm of a random variable X is denoted by X p, i.e., X p (E X p 1/p for p 1. 2 Main results Let {X i, 1 i n, T, W, be defined as in Section 1. In the following theorems, we assume that (1.1 is satisfied. Put (2.1 β and let δ > 0 satisfy (2.2 E g i (X i 2 I( g i (X i > 1 + E g i (X i 3 I( g i (X i 1 E g i (X i min(δ, g i (X i 1/2. Theorem 2.1 For each 1 i n, let i be a random variable such that X i and ( i, W g i (X i are independent. Then (2.3 sup P (T z P (W z 4δ + E W + z for δ satisfying (2.2. In particular, we have E g i (X i ( i (2.4 and (2.5 sup P (T z P (W z 2β + E W + z sup P (T z Φ(z 6.1β + E W + z E g i (X i ( i E g i (X i ( i. 3

Next theorem provides a non-uniform bound. Theorem 2.2 For each 1 i n, let i be a random variable such that X i and ( i, {X j, j i are independent. Then for δ satisfying (2.2 and for z R 1, (2.6 P (T z P (W z γ z + e z /3 τ where (2.7 (2.8 γ z P ( > ( z + 1/3 + P ( g i (X i > ( z + 1/3 + P ( W g i (X i > ( z 2/3P ( g i (X i > 1, τ 21δ + 8.1 2 + 3.5 g i (X i 2 i 2. In particular, if E g i (X i p < for 2 < p 3, then (2.9 P (T z Φ(z P ( > ( z + 1/3 + C( z + 1 p( 2 + g i (X i 2 i 2 + E g i (X i p. A result similar to (2.5 has been obtained by Friedrich (1989 for g i E(T X i using the method of characteristic function. Our proof is direct and simpler and the bounds are easier to calculate. The non-uniform bounds in (2.6 and (2.9 for general non-linear statistics are new. Remark 2.1 Assume E g i (X i p < for p > 2. Let (2.10 ( 2(p 2 p 2 δ (p 1 p 1 E g i (X i p 1/(p 2. Then (2.2 is satisfied. This follows from the inequality (2.11 min(a, b a (p 2p 2 a p 1 (p 1 p 1 b p 2 for a 0 and b > 0. Remark 2.2 If β 1/2, then (2.2 is satisfied with δ β/2. 4

Remark 2.3 Let δ > 0 be such that Egi 2 (X i I( g i (X i > δ 1/2. Then (2.2 holds. In particular, if X 1, X 2,, X n are independent and identically distributed (i.i.d. random variables and g i g 1, then (2.2 is satisfied with δ c 0 / n, where c 0 is a constant such that E( ng 1 (X 1 2 I( ng 1 (X 1 > c 0 1/2. Remark 2.4 In Theorems 2.1 and 2.2, the choice of i is flexible. For example, one can choose i (X 1,..., X i 1, 0, X i+1,..., X n or i (X 1,..., X i 1, ˆX i, X i+1,..., X n, where { ˆX i, 1 i n is an independent copy of {X i, 1 i n. The choice of g i is also flexible. It can be more general than g i (x E(T X i x, which is commonly used by others in the literature. Remark 2.5 Let X 1,..., X n be independent normally distributed random variables with mean zero and variance 1/n, and let W, T and be as in Example 4.1. Then (2.12 E W + E X i 3 + E X i ( (X 1,..., X i,..., X n (X 1,..., 0,..., X n Cε 2/3 for (1/ε 4/3 n 16(1/ε 4/3. This together with (4.5 shows that the bound in (2.4 is achievable. Moreover, the term E g i (X i ( i in (2.4 can not be dropped off. 3 Applications Theorems 2.1 and 2.2 can be applied to a wide range of different statistics and provide bounds of the best possible order in many instances. To illustrate the usefulness and the generality of these results, we give five applications in this section. The uniform bounds refine many existing results with specifying absolute constants, while the non-uniform bounds are new for many cases. 3.1 U-statistics Let X 1, X 2,..., X n be a sequence of independent and identically distributed (i.i.d. random variables, and let h(x 1,..., x m be a real-valued Borel measurable symmetric function of m variables, where m (2 m < n may depend on n. Consider the Hoeffding (1948 U-statistic ( n 1 U n h(x m i1,..., X im. 1i 1 <...<i mn 5

The U-statistic elegantly and usefully generalizes the notion of a sample mean. Numerous investigations on the limiting properties of the U-statistics have been done during the last few decades. A systematic presentation of the theory of U-statistics was given in Koroljuk and Borovskikh (1994. We refer the study on uniform Berry-Esseen bound for U-statistics to Filippova (1962, Grams and Serfling (1973, Bickel (1974, Chan and Wierman (1977, Callaert and Janseen (1978, Serfling (1980,Van Zwet (1984, and Friedrich (1989. One can also refer to Wang, Jing and Zhao (2000 on uniform Berry-Esseen bound for studentized U-statistics. Applying Theorems 2.1 and 2.2 to the U-statistic, we have Theorem 3.1 Assume that Eh(X 1,..., X m 0 and σ 2 Eh 2 (X 1,..., X m <. Let g(x E(h(X 1, X 2,..., X m X 1 x and σ1 2 Eg2 (X 1. Assume that σ 1 > 0. Then ( n ( 1 n sup P U n z P g(x i z (1 + 2(m 1σ z mσ 1 nσ1 (m(n m + 1 1/2 + c 0 (3.1, σ 1 n where c 0 is a constant such that Eg 2 (X 1 I( g(x 1 > c 0 σ 1 σ 2 1 /2. If in addition E g(x 1 p < for 2 < p 3, then (3.2 sup P z ( n U n z Φ(z (1 + 2(m 1σ mσ 1 (m(n m + 1 1/2 + 6.1E g(x 1 p σ 1 n (p 2/2 σ p 1 and for z R 1, ( n (3.3 P U n z Φ(z mσ 1 9mσ 2 (1 + z 2 (n m + 1σ1 2 CE g(x 1 p + (1 + z p n (p 2/2 σ p. 1 + 13.5e z /3 m 1/2 σ (n m + 1 1/2 σ 1 Moreover, if E h(x 1,..., X m p < for 2 < p 3, then for z R 1, ( n (3.4 P U n z Φ(z mσ 1 Cm1/2 E h(x 1,..., X m p (1 + z p (n m + 1 1/2 σ p 1 CE g(x 1 p + (1 + z p n (p 2/2 σ p. 1 Note that the error in (3.1 is of order O(n 1/2 only under the assumption of finite second moment of h. The result appears not known before. The uniform bound given in (3.2 is not new, however, the specifying constant for general m is new. Finite second moment of h is not the weakest assumption for the uniform bound. Friedrich (1989 obtained an order of O(n 1/2 when E h 5/3 < which is necessary for the bound as shown by Bentkus, Götze and Zitikis (1994. 6

then (3.5 For the non-uniform bound, Zhao and Chen (1983 proved that if m 2, E h(x 1, X 2 3 <, P ( n U n z Φ(z An 1/2 (1 + z 3 mσ 1 for z R 1, where the constant A does not depend on n and z but the moment of h. Clearly, (3.4 refines Zhao and Chen s result specifying the relationship of the constant A with the moment condition. After we finished proving Theorem 3.1, Wang (2001 informed the second author that he also obtained (3.4 for m 2 and p 3. Remark 3.1 (3.3 implies that (3.6 P ( n U n z Φ(z mσ 1 Cm 1/2 σ 2 (1 + z 3 (n m + 1 1/2 σ 2 1 CE g(x 1 p + (1 + z p n (p 2/2 σ p 1 for z ((n m + 1/m 1/2. For z > ((n m + 1/m 1/2, the bound like (3.6 can be easily obtained by using the Chebyshev inequality. On the other hand, if (3.6 holds for any z R 1, then it appears necessary to assume E h(x 1,..., X m p <. 3.2 Multi-sample U-statistics Consider k independent sequences {X j1,..., X jnj of i.i.d. random variables, j 1,..., k. Let h(x jl, l 1,..., m j, j 1,..., k be a measurable function symmetric with respect to m j arguments of the j-th set, m j 1, j 1,..., k. Let θ Eh(X jl, l 1,..., m j, j 1,..., k. The multi-sample U-statistic is defined as { k U n ( nj m j 1 h(xjl, l i j1,..., i jmj, j 1,..., k where n (n 1,..., n k and the summation is carried out over all 1 i j1 <... < i jmj n j, n j 2m j, j 1,..., k. Clearly, U n is an unbiased estimate of θ. The two-sample Wilcoxon statistic and the two-sample ω 2 -statistic are two typical examples of the multi-sample U-statistics. Without loss of generality, assume θ 0. For j 1,..., k, define ( h j (x E h(x 11,..., X 1m1 ;... ; X k1,..., X kmk X j1 x 7

and let σ 2 j Eh2 j (X j1 and σ 2 n k m 2 j n j σ 2 j. A uniform Berry-Esseen bound with order O((min 1jk n j 1/2 for the multi-sample U-statistics was obtained by Helmers and Janssen (1982 and Borovskich (1983 (see, [Koroljuk and Borovskich (1994, pp. 304-311.]. Next theorem refines their results. Theorem 3.2 Assume that θ 0, σ 2 : Eh 2 (X 11,..., X 1m1 ;... ; X k1,..., X kmk < and max 1jk σ j > 0. Then for 2 < p 3 (3.7 ( sup P z σ 1 n U n z Φ(z (1 + 2σ σ n k m 2 j n j + 6.6 σ p n k m p j n p 1 j E h j (X j1 p and for z R 1 (3.8 ( P σ 1 n U n z Φ(z 9σ 2 (1 + z 2 σ 2 n C + (1 + z p σ p n ( k k m 2 j n j 2 + 13.5e z /3 σ σ n m p j E h j(x j1 p. n p 1 j k m 2 j n j 3.3 L-statistics Let X 1,..., X n be i.i.d. random variables with a common distribution function F, and let F n be the empirical distribution function defined by F n (x n 1 n Let J(t be a real-valued function on [0, 1] and define T (G for non-decreasing measurable function G. Put σ 2 I(X i x for x R 1. xj(g(xdg(x J(F (sj(f (tf (min(s, t(1 F (max(s, tdsdt and g(x (I(x s F (sj(f (sds. 8

The statistic T (F n is called an L-statistic (see [Serfling (1980, Chapter 8]. Uniform Berry-Esseen bounds for L-statistic for smoothing J were given by Helmers (1977, and Helmers, Janssen and Serfling (1990. Applying Theorems 2.1 and 2.2 yields the following uniform and non-uniform bounds for L-statistic. Theorem 3.3 Let n 4. Assume that EX 2 1 < and E g(x 1 p < for 2 < p 3. If the weight function J(t is Lipschitz of order 1 on [0, 1], that is, there exists a constant c 0 such that (3.9 then (3.10 and (3.11 J(t J(s c 0 t s for 0 s, t 1 sup P ( nσ 1 (T (F n T (F z Φ(z (1 + 2c 0 X 1 2 + 6.1E g(x 1 p z nσ n (p 2/2 σ p P ( nσ 1 (T (F n T (F z Φ(z 9c 2 0 EX2 1 (1 + z 2 nσ 2 + C ( c0 X 1 2 (1 + z p + E g(x 1 p nσ n (p 2/2 σ p 3.4 Random sums of independent random variables with non-random centering Let {X i, i 1 be i.i.d. random variables with EX i µ and Var(X i σ 2, and let {N n, n 1 be a sequence of non-negative integer-valued random variables that are independent of {X i, i 1. Assume that EN 2 n < and N n EN n Var(Nn d. N(0, 1. Then by Robbins (1948, Nn X i (EN n µ σ 2 EN n + µ 2 Var(N n d. N(0, 1. This is a special case of limit theorems for random sums with non-random centering. This kind of problems arises in the study, for example, of Galton-Watson branching processes. We refer to Finkelstein, Kruglov and Tucker (1994 and references therein for recent developments in this area. As another application of our general result, we give a uniform Berry-Esseen bound for the random sum. 9

Theorem 3.4 Let {Y i, i 1 be i.i.d. non-negative integer-valued random variables with EY i ν and Var(Y i τ 2. Put N n n Y i. Assume that E X i 3 < and that {Y i, i 1 and {X i, i 1 are independent. Then (3.12 ( N n sup P X i nµν x n(νσ 2 + τ 2 µ 2 x Φ(x Cn 1/2( τ 2 ν 2 + E X 1 3 σ 3 + σ µ. ν 3.5 Functions of non-linear statistics Let X 1, X 2,..., X n be a random sample and ˆΘ n ˆΘ n (X 1,..., X n be a weak consistent estimator of an unknown parameter θ. Assume that ˆΘ n can be written as ˆΘ n θ + 1 ( n g i (X i + n where g i are Borel measurable functions with Eg i (X i 0 and n Eg 2 i (X i 1, and : n (X 1,..., X n 0 in probability. Let h be a real-valued function differentiable in a neighborhood of θ with g (θ 0. Then, it is known that n(h( ˆΘn h(θ h (θ d. N(0, 1 under some regularity conditions. When ˆΘ n is the sample mean, the Berry-Esseen bound and Edgeworth expansion have been well studied (see Bhattacharya and Ghosh (1978. The next theorem shows that the results in Section 3 can be extended to functions of non-linear statistics. Theorem 3.5 Assume that h (θ 0 and δ(c 0 sup x θ c0 h (x < for some c 0 > 0. Then for 2 < p 3, (3.13 where W n g i (X i. ( n(h( sup P ˆΘ n h(θ z h (θ (1 + c 0δ(c 0 h (θ ( E W + z Φ(z E g i (X i ( i +6.1 E g i (X i p + 4 c 2 0 n + 2E 4.4c3 p 0 δ(c 0 + c 0 n1/2 h (θ n (p 2/2, 10

4 An example In this section we give an example to show that the bound of (2.4 in Theorem 2.1 is achievable. Moreover, the term E g i (X i ( i in (2.4 can not be dropped off. The example also provides a counter-example to a result of Shorack (2000 and of Bolthausen and Götze (1993. Example 4.1 Let X 1,..., X n be independent normally distributed random variables with mean zero and variance 1/n. Define W X i, T : T ε W ε W 1/2 + ε c 0 and T W ε W 1/2 + ε c 0, where c 0 E( W 1/2 2/π 0 x 1/2 e x2 /2 dx. Let { ˆX i, 1 i n be an independent copy of {X i, 1 i n and define (4.1 α 1 E (X 1,..., X i,..., X n (X 1,..., n ˆX i,..., X n. Then ET 0 and for 0 < ε < 1/64 and n (1/ε 4 (4.2 (4.3 (4.4 where C is an absolute constant. P (T ε c 0 Φ(ε c 0 ε 2/3 /6, E W + E 7ε, E + E X i 3 + α Cε (4.5 Clearly, (4.2 implies that sup P (T ε z Φ(z ε 2/3 /6. z A result of Shorack (2000 (see Lemma 11.1.3, p. 261, [22] states that for any random variables W and, (4.6 sup z P (W + z Φ(z sup P (W z Φ(z + 4E W + 4E. z Another result which is in Theorem 2 of Bolthausen and Götze (1993 states that if ET 0, then (4.7 ( sup P (T z Φ(z C E + z where C is an absolute constant and α is defined in (4.1. E g i (X i 3 + α, In view of (4.3, (4.4 and (4.5, the result of Shrock and of Bolthausen and Götze can be shown to lead to a contradiction. 11

5 Proof of Main Theorems In this section we prove Theorems 2.1 and 2.2 and Remarks 2.1 and 2.2. Proof of Theorem 2.1. (2.5 follows from (2.4 and (1.4. When β > 1/2, (2.4 is trivial. For β 1/2, (2.4 is a consequence of (2.3 and Remark 2.2. Thus, we only need to prove (2.3. Note that (5.1 P (z W z P (T z P (W z P (z W z +. It suffices to show that (5.2 P (z W z + 4δ + E W + E g i (X i ( i and (5.3 where δ satisfies (2.2. Let (5.4 Let P (z W z 4δ + E W + E g i (X i ( i /2 δ for w z δ, f (w w 1 2 (2z + for z δ w z + + δ, /2 + δ for w > z + + δ. ξ i g i (X i, ˆMi (t ξ i {I( ξ i t 0 I(0 < t ξ i, M i (t E ˆM i (t, ˆM(t n ˆM i (t, M(t E ˆM(t. Since ξ i and f i (W ξ i are independent for 1 i n and Eξ i 0, we have (5.5 E{W f (W E{ξ i (f (W f (W ξ i 1in + E{ξ i (f (W ξ i f i (W ξ i 1in : H 1 + H 2. Using the fact that ˆM(t 0 and f (w 0, we have (5.6 H 1 1in 1in 0 E {ξ i f (W + tdt ξ i { E f (W + t ˆM i (tdt 12

{ E { E f (W + t ˆM(tdt f (W + t ˆM(tdt t δ { E I(z W z + 1in H 1,1 H 1,2, t δ ˆM(tdt { E I(z W z + ξ i min(δ, ξ i where H 1,1 P (z W z + Eη i, H 1,2 E η i Eη i, η i ξ i min(δ, ξ i. 1in 1in By (2.2, Hence (5.7 Eη i 1/2. 1in H 1,1 (1/2P (z W z +. By the Cauchy-Schwarz inequality, (5.8 H 1,2 (E( η i Eη i 2 1/2 1in ( 1in Eη 2 i 1/2 δ. As to H 2, it is easy to see that Hence (5.9 Combining (5.5, (5.7, (5.8 and (5.9 yields f (w f i (w i /2 i /2. H 2 (1/2 E ξ i ( i. { P (z W z + 2 E W f (W + δ + (1/2 E ξ i ( i E W + 2δE W + 2δ + E ξ i ( i 4δ + E W + E ξ i ( i. 13

This proves (5.2. Similarly, one can prove (5.3 and hence Theorem 2.1. Proof of Theorem 2.2. First, we prove (2.9. For z 4, (2.9 holds by (2.5. For z > 4, consider two cases. Case 1. n E g i (X i p > 1/2. By the Rosenthal (1970 inequality, we have (5.10 Hence P ( W > ( z 2/3 P ( W > z /6 ( z /6 p E W p C( z + 1 p{( n p/2 Egi 2 n (X i + E g i (X i p n C( z + 1 p E g i (X i p. P (T z Φ(z P ( > ( z + 1/3 + P ( W > ( z 2/3 + P ( N(0, 1 > z n P ( > ( z + 1/3 + C( z + 1 p E g i (X i p, which shows that (2.9 holds. Case 2. n E g i (X i p 1/2. Similar to (5.10, we have P ( W g i (X i > ( z 2/3 C( z + 1 p{( n p/2 Egj 2 n (X j + E g j (X j p C( z + 1 p and hence γ z P ( > ( z + 1/3 + (( z + 1/3 p E g i (X i p + C( z + 1 p E g i (X i p n P ( > ( z + 1/3 + C( z + 1 p E g i (X i p. By Remark 2.1, we can choose δ ( 2(p 2 p 2 (p 1 p 1 E g i (X i p 1/(p 2 2(p 2p 2 (p 1 p 1 E g i (X i p. Combining the above inequalities with (2.6 and the non-uniform Berry-Essee bound for independent random variables yields (2.9. 14

Next we prove (2.6. The main idea of the proof is first to truncate g i (X i and then adopt the proof of Theorem 2.1 to the truncated sum. Without loss of generality, assume z 0 as we can simply apply the result to T. By (5.1, it suffices to show that (5.11 and (5.12 P (z W z γ z + e z/3 τ P (z W z + γ z + e z/3 τ. Since the proof of (5.12 is similar to that of (5.11, we only prove (5.11. It is easy to see that P (z W z P ( > (z + 1/3 + P (z W z, (z + 1/3. Now (5.11 follows directly by Lemmas 5.1 and 5.2 below. This completes the proof of Theorem 2.2. Lemma 5.1 Let Then ξ i g i (X i, ξi ξ i I(ξ i 1, W n ξ i. (5.13 P (z W z, (z + 1/3 P (z W z, (z + 1/3 + P (ξ i > (z + 1/3 + P (W ξ i > (z 2/3P ( ξ i > 1. Proof. We have P (z W z, (z + 1/3 P (z W z, (z + 1/3, max 1in ξ i 1 +P (z W z, (z + 1/3, max ξ i > 1 1in P (z W z, (z + 1/3 + P (W > (2z 1/3, ξ i > 1 and P (W > (2z 1/3, ξ i > 1 15

as desired. P (ξ i > (z + 1/3 + P (W > (2z 1/3, ξ i (z + 1/3, ξ i > 1 P (ξ i > (z + 1/3 + P (W ξ i > (z 2/3, ξ i > 1 P (ξ i > (z + 1/3 + P (W ξ i > (z 2/3P ( ξ i > 1, Lemma 5.2 We have (5.14 P (z W z, (z + 1/3 e z/3 τ. Proof. Noting that E ξ i 0, e s 1 + s + s 2 (e a 1 aa 2 for s a and a > 0 and that a ξ i a, we have for a > 0 (5.15 Ee a W n Ee a ξ i n ( 1 + a E ξ i + (e a 1 ae ξ i 2 ( exp (e a 1 a E ξ i 2 ( exp (e a 1 a Eξi 2 exp(e a 1 a. In particular, we have Ee W /2 exp(e 1/2 1.5. If δ 0.07, then This proves (5.14 when δ 0.07. P (z W z, (z + 1/3 P ( W > (2z 1/3 e z/3+1/6 Ee W /2 e z/3 exp(e.5 4/3 1.38e z/3 20δ e z/3. For δ < 0.07, let 0 for w z δ, (5.16 f (w e w/2 (w z + + δ for z δ w z + δ, e w/2 ( + 2δ for w > z + δ. Put M i (t ξ i {I( ξ i t 0 I(0 < t ξ i, 16 M(t n M i (t.

By (5.5 and similar to (5.6, we have (5.17 E{W f ( W { E f ( W + t M(tdt { + E ξ i (f ( W ξ i f i ( W ξ i : G 1 + G 2, It follows from the fact that M(t 0, f (w ew/2 for z δ w z + δ and f (w 0 for all w, (5.18 { G 1 E t δ f ( W + t M(tdt { E e W /2 I(z W z, (z + 1/3 { E e W /2 I(z W z, (z + 1/3 { +E e W /2 I(z W z, (z + 1/3 G 1,1 G 1,2, t δ t δ t δ M(tdt E M(tdt ( M(t E M(tdt where G 1,1 e z/3 1/6 P (z W z, (z + 1/3 { G 1,2 E e W /2 M(t E M(t dt. t δ t δ E M(tdt, By (2.2 and the assumption that δ 0.07, Hence (5.19 t δ E M(tdt E ξ i min(δ, ξ i E ξ i min(δ, ξ i 1/2. ( G 1,1 (1/2e z/3 1/6 P z W z, (z + 1/3. By (5.15, we have Ee W exp(e 2 < 2.06. It follows from the Cauchy-Schwarz inequality that (5.20 G 1,2.5 t δ ( 0.5Ee W + 2E ˆM(t M(t 2 dt { 0.5 2.06δ + 2 Eξi 2 (I( ξ i t 0 + I(0 < t ξ i dt t δ 17

{ 0.5 2.06δ + 2 { 0.5 2.06δ + 2δ Eξi 2 Eξi 2 min(δ, ξ i 2.03δ. As to G 2, it is easy to see that f (w f i (w e w/2 i e w/2 i. Hence, by the Hölder inequality, (5.15 and the assumption that ξ i and W ξ i are independent (5.21 G 2 E ξ i e ( W ξ i /2 ( i (Eξ i 2 e W 1/2 ξ i (E( i 2 1/2 ( Eξi 2 Ee W ξ i 1/2 i 2 1.44 ξ i 2 i 2. Following the proof of (5.15 and by using e s 1 s (e a 1/a for s a and a > 0, we have EW 2 e W Eξi 2 e ξ i Ee W ξ i + Eξ i (e ξ i 1Eξ j (e ξ j 1Ee W ξ i ξ j 1i jn 2.06 e Eξi 2 + 2.06(e 1 2 Eξi 2 Eξj 2 1i jn Thus, we obtain 2.06 e + 2.06(e 1 2 < 3.42 2. (5.22 E{W f ( W E W e W /2 ( + 2δ { ( 2 + 2δ E(W 2 e W 1/2 3.42( 2 + 2δ. Combining (5.17, (5.19, (5.20, (5.21 and (5.22 yields P (z W z, (z + 1/3 2e z/3+1/6{ 3.42( 2 + 2δ + 2.03δ + 1.44 ξ i 2 i 2 e z/3{ 21δ + 8.1 2 + 3.5 ξ i 2 i 2 e z/3 τ. 18

This proves (5.14. Proof of Remark 2.1. It is known that for x 0, y 0, α > 0, γ > 0 with α + γ 1 x α y γ αx + γy, ( which yields with α (p 2/(p 1, γ 1/(p 1, x b(p 1/(p 2 and y a x α y γ αx + γy b + (p 2p 2 a p 1 (p 1 p 1 b p 2 ( p 2 p 1 p 2 ap 1 b p 2 1/(p 1, or On the other hand, it is clear that b a (p 2p 2 a p 1 (p 1 p 1 b p 2. a a (p 2p 2 a p 1 (p 1 p 1 b p 2. This proves (2.11. Now (2.2 follows directly from (2.11, (2.10 and the assumption (1.1. Proof of Remark 2.2. Note that δ β/2 1/4. Applying (2.11 with p 3 yields E g i (X i min(δ, g i (X i E g i (X i I( g i (X i 1 min(δ, g i (X i { Egi 2 (X i I( g i (X i 1 E g i (X i 3 I( g i (X i 1/(4δ ( 1 4δ Egi 2 (X i I( g i (X i > 1 + E g i (X i 3 I( g i (X i 1 /(4δ 1 β/(4δ 1/2. This proves Remark 2.2. 6 Proofs of Other Theorems In this section, we prove Theorems 3.1-3.5. 19

6.1 Proof of Theorem 3.1 For 1 k m, let h k (x 1,..., x k E(h(X 1,..., X m X 1 x 1,..., X k x k and h k (x 1,..., x k h k (x 1,..., x k k g(x i. Observing that U n n 1 m ( n 1 g(x i + h m m (X i1,..., X im, 1i 1 <...<i mn we have where n mσ 1 U n W +, W 1 nσ1 n n mσ 1 ( n m g(x i, 1 1i 1 <...<i mn h m (X i1,..., X im. Let l n ( n 1 h mσ 1 m m (X i1,..., X im. 1i 1 <...<i m,i j l for all j By Theorems 2.1 and 2.2 (with Remark 2.3 for proof of (3.1, it suffices to show that (6.1 E 2 (m 12 σ 2 m(n m + 1σ 2 1 and (6.2 E l 2 It is known that (see, e.g., [18], p.271 2(m 1 2 σ 2 nm(n m + 1σ1 2. (6.3 ( E 1i 1 <...<i mn 2 h m (X i1,..., X im ( n m ( m ( n m E h 2 m j m j j(x 1,..., X j. j2 Note that (6.4 E h 2 j(x 1,..., X j j j Eh 2 j(x 1,..., X j 2 E[g(X i h k (X 1,..., X j ] + E( g(x i 2 Eh 2 j(x 1,..., X j 2jE[g(X 1 E(h(X 1,..., X m X 1,..., X j ] + keg 2 (X 1 20

Eh 2 j(x 1,..., X j 2jE[g(X 1 h(x 1,..., X m ] + jeg 2 (X 1 Eh 2 j(x 1,..., X j 2jEg 2 (X 1 + jeg 2 (X 1 Eh 2 j(x 1,..., X j jeg1(x 2 1. We next prove that for 2 j m (6.5 Eh 2 j 1(X 1..., X j 1 j 1 Eh 2 j j(x 1,..., X j Since E h 2 2 (X 1, X 2 0, (6.5 holds for j 2 by (6.4. Assume that (6.5 is true for j. Then (6.6 E(h j+1 (X 1,..., X j+1 h j (X 1,..., X j h j (X 2,..., X j+1 2 On the other hand, we have Eh 2 j+1(x 1,..., X j+1 4E[h j+1 (X 1,..., X j+1 h j (X 1,..., X j ] +2Eh 2 j(x 1,..., X j + 2Eh j (X 1,..., X j h j (X 2,..., X j+1 Eh 2 j+1(x 1,..., X j+1 2Eh 2 j(x 1,..., X j ( +2E E(h j (X 1,..., X j h j (X 2,..., X j+1 X 2,..., X j Eh 2 j+1(x 1,..., X j+1 2Eh 2 j(x 1,..., X j + 2Eh 2 j 1(X 1,..., X j 1. (6.7 E(h j+1 (X 1,..., X j+1 h j (X 1,..., X j h j (X 2,..., X j+1 2 ( E E(h j+1 (X 1,..., X j+1 h j (X 1,..., X j h j (X 2,..., X j+1 X 1,..., X j 2 Eh 2 j 1(X 1,..., X j 1. Combining (6.5 and (6.6 yields 2Eh 2 j(x 1,..., X j Eh 2 j+1(x 1,..., X j+1 + Eh 2 j 1(X 1,..., X j 1 Eh 2 j+1(x 1,..., X j+1 + j 1 Eh 2 j j(x 1,..., X j by the induction hypothesis, which in turn reduces to (6.4 for j + 1. This proves (6.4. It follows from (6.4 that (6.8 Eh 2 j(x 1,..., X j j m Eh2 m(x 1,..., X m j m σ2. (6.9 To complete the proof of (6.1, we need the following two inequalities: m ( m j j2 ( n m j m(m 12 n m j m (n m + 1n( m 21

and (6.10 m 1 for n > m 2. In fact, we have and m 1 ( m 1 ( n m j + 1 2(m 12 n j m 1 j m (n m + 1n( m m ( m j j2 ( n m j m j m m j2 m 1 ( m 1 j 1 ( n m m j ( m 1 ( n m j m 1 j ( n 1 m 1 ( n 1 {1 m 1 ( n 1 { 1 m 1 ( n m m 1 (n m!/(n m m + 1! (n 1!/(n m! n 1 jn m+1 ( n 1 n 1 m 1 m 1 jn m+1 j ( n 1 (m 1 2 m 1 n m + 1 (m 1 2 m (n m + 1n( n m ( m 1 ( n m j + 1 j m 1 j m m 1 m m 1 m m 1 m 2 j0 m 1 ( n 2 m m 2 ( n 1 ( (m 1 2 m 1 m(n 1. 2(m 1 2 (n m + 1n( n m (1 m 1 j ( m 1 ( n m j j m 1 j m 1 + 1 m ( m 2 ( n m + 1 j m 2 j m + 1 ( n 1 m { m 1 + (m 12 (n m + 1m m 1 ( n m m 1 m 1 ( m 1 ( n m j m 1 j ( m 1 ( n m j m 1 j 22

(6.11 From (6.8 and (6.9 we obtain that This proves (6.1. Similarly, by (6.10 (6.12 E( l 2 E 2 n ( n 2E { 2 h m 2 σ1 2 m m (X 1,..., X m 1i 1 <...<i mn m 1 n m 2 σ 2 1 ( n m nσ2 m 2 σ 2 1 ( n m j2 m 1 j2 (m 1 2 σ 2 (n m + 1 m σ1 2. ( m j ( m j ( n m m j ( n m j m j m Eh 2 j(x 1,..., X j n ( n 2E ({ 2 m 2 σ1 2 hm (X m i1,..., X im 1i 1 <...<i mn 1i 1 <...<i mn, all i j l n ( n 2E ( 2 h m 2 σ1 2 m m (X i1,..., X im 1, X m 1i 1 <...<i m 1 n 1 n ( n 2 ( n 1 m 2 σ1 2 m m 1 n ( n 2 ( n 1 m 2 σ1 2 m m 1 σ2 mσ 2 1 ( n m 1 m 1 2(m 1 2 σ 2 m n(n m + 1 σ1 2. m 1 m 1 ( m 1 ( n m E h 2 j m 1 j j+1(x 1,..., X j ( m 1 ( n m Eh 2 j m 1 j j+1(x 1,..., X j ( m 1 ( n m j + 1 j m 1 j m This proves (6.2 and hence completes the proof of Theorem 3.1. 6.2 Proof of Theorem 3.2 We follow a similar argument as that in the proof of Theorem 3.1. For 1 j k, let X j (X j1,..., X jmj and x j (x j1,..., x jmj and define k m j h(x 1,..., x k h(x 1,..., x k h j (x ji. For the given U-statistic U n, we define its projection k n j Û n l1 23 E(U n X jl.

Since we have m j /n j k n j Û n l1 ( nj 1 ( nj /, m j 1 m j m j n j h j (X jl. The difference U n Û n can be rewritten as { k U n Û n ( nj m j 1 h(x1i 1,..., X ki k, where X ji j (X jij1,..., X jijmj and the summation is carried out over all indices 1 i j1 < i j2 <... < i jmj n j, j 1, 2,..., k. Thus, we have with Let W σ 1 n n k j l1 σ 1 n U n W + m j n j h j (X jl, { σ 1 k ( nj 1 n h(x1i m j 1,..., X ki k. { jl σ 1 k ( nv 1 n (jl h(x1i m v1 v 1,..., X ki k, where the summation is carried out over all indices 1 i v1 < i v2 <... < i vmv n v, 1 v k, v j and 1 i j1 < i j2 <... < i jmj n j with i js l for 1 s m j. By Theorems 2.1 and 2.2, it suffices to show that (6.13 E 2 σ2 ( k m 2 j 2 σ 2 n n j and (6.14 For 0 d j m j, 1 j k let E jl 2 2σ2 m 2 j n 2 j σ2 n k v1 m 2 v n v Y d1,...,d k (x ji, 1 i d j, 1 j k E h(x j1,..., x jd1, X jdj +1,..., X jmj, 1 j k 24

and Noting that y d1,...,d k EY 2 d 1,...,d k (X ji, 1 i d j, 1 j k. E( h(x 1i 1,..., X ki k X jl 0 for every 1 l m j, 1 j k, we have (see (4.5.8 in [18] (6.15 ( 2 E U n Û n { k σ 2{ k ( nj m j 1 σ 2( k ( nj m j 1 m 2 j n j 2, d 1 +... + d k 2 0 d j m j, 1 j k d 1 +... + d k 2 0 d j m j, 1 j k k {( mj d j ( nj m j m j d j y d1,...,d k k {( mj ( nj m j d j m j d j where in the last inequality we used the fact that (6.16 d 1 +... + d k 2 0 d j m j, 1 j k k {( mj ( nj m ( j k m 2 j 2 d j m j d j n j k ( nj m j. (See below for proof. This proves (6.13. As to (6.14, consider j 1 only. Similar to (6.12 and (6.15, we have (with X 1i 1 (X 1i1,1,..., X 1i1,m1 1, X 1,m 1 σ 2 ne 1l 2 { k v1 σ 2{ k ( nv m v 2E ( v1 1 i v1 < i v2 <... < i vmv n v, 2 v k 1 i 1,1 < i 1,2 <... < i 1,mj 1 n j 1 ( nv 2 ( n1 1 k ( nv m v m 1 1 m v2 v d 1 +... + d k 1 0 d 1 m 1 1, 0 d v m v, 2 v k ( m1 1 d 1 ( n1 m 1 m 1 1 d 1 k 2 h(x 1i, X 1 2i 2,..., X ki k v2 ( mv ( nv m v d v m v d v 25

σ2 m { 1 k ( nv 1 { n 1 m v1 v 1d 1 m 1 1 + 2vk d 1 0,1d vm v σ2 m { 1 k ( nv 1 { (m1 1 2 ( n1 1 k ( nj n 1 m v1 v n 1 m 1 + 1 m 1 1 m j2 j σ2 m 2 1 n 2 1 1vk for n 1 2m 1. This proves (6.14. ( n1 m + 1 m 2 v m 1 1 n 2vk v m v + 1 m 2 v n v m v + 1 2σ2 m 2 1 n 2 1 1vk m 2 v n v Now we prove (6.16. Consider two cases in the summation: Case 1: At least one of d j 2, say d 1 2. In this case, by (6.9 for n 1 2m 1. d 1 2 1 d j m j, 1 j k { k j2 { k m4 1 n 2 1 j2 ( nj m j ( nj m j m 2 k 2d 1 m 1 m 2 1 (m 1 1 2 2n 1 (n 1 m 1 + 1 k ( nj m j k j2 {( mj ( nj m j d j m j d j 2d 1 m 1 k ( m1 ( n1 m 1 d 1 m 1 d 1 ( nj m j ( m1 d 1 ( n1 m 1 m 1 d 1 d1 m ( nj m j Case 2. At least two of {d j are equal to 1, say d 1 d 2 1. Then k {( mj ( nj m j d d 1 d 2 1 j m j d j m 1 m 2 ( n1 m 1 m 1 1 m2 1 m2 2 n 1 n 2 k ( nj m j. ( n2 m 2 m 2 1 k j3 ( nj m j 26

Thus, we have d 1 +... + d k 2 1 d j m j, 1 j k ( k m 4 j n 2 + j ( k m 2 j 2 k n j k 1i jk ( nj m j. {( mj ( nj m j d j m j d j m 2 i m2 j n i n j k This proves (6.16. Now the proof of Theorem 3.2 is complete. 6.3 Proof of Theorem 3.3 Let ψ(t t 0 and hence J(sds. As in [Serfling (1980, p.265], we have ( nj m j T (F n T (F [ψ(f n (x ψ(f (x]dx nσ 1 (T (F n T (F W +, where Let where F n,i (x 1 n W 1 n nσ (I(X i x F (xj(f (xdx nσ 1 [ψ(f n (x ψ(f (x (F n (x F (xj(f (x]dx. η i (x I(X i x F (x, g i (X i 1 (I(X i x F (xj(f (xdx, nσ i nσ 1 [ψ(f n,i (x ψ(f (x (F n,i (x F (xj(f (x]dx, {F (x + 1jn,j i I(X j x. We only need to prove (6.17 and (6.18 σ 2 E 2 c 2 0 n 1 EX 2 1 σ 2 E i 2 2c 2 0n 2 EX 2 1 27

Observe that the Lipschitz condition (3.9 implies (6.19 t s ψ(t ψ(s (t sj(s (J(u + s J(sdu 0.5 c 0 (t s 2 0 for 0 s, t 1. Hence Observe that and ( σ 2 E 2 0.25c 2 0 ne 2 2 2 (F n (x F (x 2 dx ( 0.25c 2 0 n 3 n 2dxdy E η i (xη j (y ( 0.25c 2 0 n 3 3n 2 Eη1(xEη 2 1(y 2 + ne{η1(xη 2 1(y 2 dxdy. ( Eη1(xEη 2 1(ydxdy 2 2 F (x(1 F (xdx (E X1 2 EX1 2 E{η 2 1(xη 2 1(ydxdy xy xy E{η 2 1(xη 2 1(ydxdy { (1 F (x 2 (1 F (y 2 F (x +F 2 (x(1 F (y 2 (F (y F (x + F 2 (xf 2 (y(1 F (y dxdy 2 F (x(1 F (ydxdy 2 { 2 xy { x0 xy0 + 0<xy x F (xdx + y 0 + x0,y>0 { 2 E(X1 2 + E(X 1 + 2 + EX1 EX+ 1 F (x(1 F (ydxdy y(1 F (ydy + x0 F (xdx (1 F (ydy y>0 4EX 2 1 This proves (6.17. Next we prove (6.18. Observe that σ i n [ψ(f n (x ψ(f n,i (x (F n (x F n,i (xj(f n,i (xdx + (F n (x F n,i (x[j(f n,i (x J(F (x]dx 28

( 2 E ηi (xdx 2 0.5c 0 (F n (x F n,i (x 2 dx +c 0 F n (x F n,i (x F n,i (x F (x dx 0.5c 0 n 2 (I(X i x F (x 2 dx +c 0 n 2 I(X i x F (x j x F (xdx j i{i(x 0.5c 0 n 2 ηi 2 (xdx + c 0 n 2 η i (x η j (x dx, j i Eη 2 1(xη 2 1(ydxdy 4EX 2 1 and ( E η i (x 2 η j (x dx j i (n 1 E{ η i (x j i η j (x η i (y j i E η i (xη i (y E{ j i η j (x j i η j (y dxdy η j (y dxdy η i (x 2 η i (y 2 η j (x 2 η j (y 2 dxdy j i j i η i (x 2 2 η i (y 2 2dxdy (n 1(E X 1 2 (n 1EX 2 1. Therefore ( σ 2 E i 2 n 3 c 2 0E 0.5 ηi 2 (xdx + η i (x 2 η j (x dx j i { ( 2 ( ηi 2 (xdx + 1.5E n 3 c 2 0 0.75E { n 3 c 2 0 3EX1 2 + 1.5(n 1EX1 2 2n 2 EX 2 1. η i (x 2 η j (x dx j i This proves (6.18 and hence the theorem. 29

6.4 Proof of Theorem 3.4 Let Z 1 and Z 2 be independent standard normal random variables that are independent of {X i and {Y i. Put and write b Nn νσ 2 + τ 2 µ 2, T n X i nµν Nn, H n X i N n µ nb Nn σ T n Nn σ H n + (N n nνµ Nn σ, T n (Z 1 Z 1 + (N n nνµ. nb nb nb nb Applying the Berry-Esseen bound to H n for given N n yields (6.20 sup P (T n x P (T n (Z 1 x x ( X1 3 P ( N n nν > nν/2 + CE Nn σ 3 I{ N n nν nν/2 4n 1 ν 2 τ 2 + Cn 1/2 ν 1/2 σ 3 E X 1 3. Let x.5nν x 1.5nν W n N n nν nτ, for x <.5nν for.5nν x 1.5nν for x > 1.5nν T n (Z 1 : N n σ nb Z 1 + (N n nνµ nb τµ b (W n + σ ν τµ Z 1 +, where Let ( N n nνσz 1 nτµ. i ( (N n Y i + ν nνσz 1 nτµ. Then E( W n Z 1 Z ( 1 σ Wn (N n nν E Z 1 σ nτµ nν nµ ν and 1 nτ E( (Y i ν( i Z 1 Z 1 n 3/2 τ 2 µ ν E(Y i ν 2 Z 1 σ n 3/2 µ ν. 30

Now letting T n (Z 1, Z 2 τµ b and applying Theorem 2.1 for given Z 1 yields (Z 2 + σ ν τµ Z 1 (6.21 sup P (T n (Z 1 x P (T n (Z 1, Z 2 x x P ( N n nν > 0.5nν + sup P ( T n (Z 1 x P (T n (Z 1, Z 2 x x 4τ 2 ( nν 2 + C E X1 3 n 1/2 σ 3 + E Z 1 σ n 1/2 µ ν Cn 1/2( τ 2 ν 2 + E X 1 3 σ 3 + σ µ ν It is clear that T n (Z 1, Z 2 has a standard normal distribution. This proves (3.12 by (6.20 and (6.21. 6.5 Proof of Theorem 3.5 Since (3.13 is trivial if n E g i (X i p > 1/6, we assume (6.22 E g i (X i p 1/6. Let W n g i (X i. It is known that for 2 < p 3 (6.23 Observe that n(h( ˆΘn h(θ (6.24 h (θ where E W p 2(EW 2 p/2 + E g i (X i p 2.2. n ( ˆΘn θ h h (θ( (θ ˆΘ n θ + W + + : W + Λ + R, n h (θ 0 n 1/2 (W + 0 [h (θ + t h (θ]dt [h (θ + t h (θ]dt n (n 1/2 W +(n 1/2 Λ + h [h (θ + t h (θ]dt, (θ 0 n n 1/2 (W + R h [h (θ + t h (θ]dt, (θ (n 1/2 W +(n 1/2 c 0 /2 for x < c 0 /2, x x for c 0 /2 x c 0 /2, c 0 /2 for x > c 0 /2. 31

Clearly, n 1/2 W c 0 /2 and n 1/2 c 0 /2 imply R 0. Hence (6.25 P ( R > 0 P ( W > c 0 n 1/2 /2 + P ( > c 0 n 1/2 /2 4/(c 2 0n + 2E /(c 0 n 1/2. To apply Theorem 2.1, let W i W g(x i and Noting that Λ i i + n (n 1/2 W i +(n 1/2 i h [h (θ + t h (θ]dt. (θ 0 (6.26 (n 1/2 W +(n 1/2 [h (θ + t h (θ]dt 0 0.5δ(c 0 ((n 1/2 W + (n 1/2 2 δ(c 0 ((n 1/2 W 2 + (n 1/2 2 ( δ(c 0 (c 0 /2 3 p (n 1/2 W p 1 + (c 0 /2n 1/2, we have (6.27 E W Λ E W + (c 0/2 3 p δ(c 0 h (θ n (p 2/2 E W p + c 0δ(c 0 h E W (θ (1 + c 0δ(c 0 E W h + 2.2c3 p 0 δ(c 0 (θ h (θ n (p 2/2. Similar to (6.26, From this we obtain (n 1/2 W +(n 1/2 [h (θ + t h (θ]dt (n 1/2 W i +(n 1/2 i ( δ(c 0 (c 0 3 p (n 1/2 W (n 1/2 W i (n 1/2 W + (n 1/2 W i p 2 +c 0 (n 1/2 (n 1/2 i ( δ(c 0 c 3 p 0 n (p 1/2 g(x i (2 W i p 2 + g(x i p 2 + c 0 n 1/2 i. (6.28 E g(x i (Λ Λ i E g(x i ( i nδ(c0 + {(c h 0 3 p n (p 1/2 n ( E g(x i 2 (2 W i p 2 + g(x i p 2 (θ 32

n +c 0 n 1/2 E g(x i ( i (1 + c 0δ(c 0 n h E g(x i ( i (θ + c3 p 0 δ(c 0 h (θ n (p 2/2 (2Eg 2 (X i + E g(x i p (1 + c 0δ(c 0 n h E g(x i ( i + 2.2c3 p 0 δ(c 0 (θ h (θ n (p 2/2. This proves (3.13 by (2.5, (6.25, (6.27 and (6.28. 6.6 Proofs of Example 4.1 and Remark 2.5 First we prove Example 4.1. Let Z denote a standard normally distributed random variable and φ(x be the standard normal density function. Observe that 0 < c 0 < 2 and P (T ε c 0 Φ(ε c 0 P (Z ε/ Z 1/2 0 Φ(ε c 0 which proves (4.2. Clearly, we have P (Z 0 + P (Z 3/2 ε, Z > 0 Φ(ε c 0 ε 2/3 0 ε 2/3 2ε φ(tdt εc0 0 φ(tdt φ(tdt (ε 2/3 2ε/3 ε 2/3 /6, E W + E εe c 0 Z Z Z 1/2 c0 + εe Z 1/2 ε(c 0 + 1 + 2c 0 7ε by the fact that c 0 < 2. This proves (4.3. As to (4.4, observe that (6.29 E + E X i 3 2cε + 4n 1/2 8ε provided n > ε 2. Below we bound α. Since {X i are i.i.d., we have α εe W 1/2 ˆX 1 + X 2 +... X n 1/2. 33

Let Y and Z be independent standard normal random variables, and let r (n 1/n and s 1 r 2. Noting that EW ( ˆX 1 + X 2 +... X n r and EZ(sY + rz r, we see that (W, ˆX 1 + X 2 +... X n and (Z, sy + rz have the same distribution. Hence (6.30 Write E W 1/2 ˆX 1 + X 2 +... X n 1/2 E Z 1/2 sy + rz 1/2 sy + rz Z E Z 1/2 sy + rz 1/2 ( Z 1/2 + sy + rz 1/2 { Y se Z 1/2 sy + rz 1/2 ( Z 1/2 + sy + rz 1/2 { Z +(1 re Z 1/2 sy + rz 1/2 ( Z 1/2 + sy + rz 1/2 : sr 1 + (1 rr 2. R 1 E{ I( rz s Y /2 + E{ I(s Y /2 < rz 2s Y + E{ I( rz > 2s Y : R 1,1 + R 1,2 + R 1,3. Let C denote an absolute constant. Then { Y I( rz s Y /2 R 1,1 2E Z 1/2 sy (4/sE(s Y /(2r 1/2 4s 1/2, R 1,2 { Y I(s Y /2 < rz 2s Y 4E sy sy + rz 1/2 Cs 1/2 and Thus, we have R 1,3 { Y I( rz > 2s Y 2E Z 3/2 { Y CE (s Y 1/2 Cs 1/2. R 1 Cs 1/2. As to R 2, we have R 2 E(1/ sy + rz 1/2 c 0 34

Note that r 1 and s 2/n as n. Combining the above inequalities yields (6.31 α Cεn 1/4. By (6.29 and (6.31, we have E + E X i 3 + α 8ε + Cε 1/2 n 1/8 (8 + Cε provided that n > (1/ε 4. This proves (4.4. Finally, we prove (2.12 in Remark 2.5. Let θ (n 1/n, ρ (1/n 1/2, and let Y and Z be independent standard normal random variables. By (6.29, (6.32 E + E X i 3 4ε + 4n 1/2 8ε 2/3 for n (1/ε 4/3. Following the proof of (6.30, we have (6.33 E X i ( (X 1,..., X i,..., X n (X 1,..., 0,..., X n nεe X 1 ( X 1 + X 2 +... X n 1/2 X 2 +... X n 1/2 nεe ρy ( ρy + θz 1/2 θz 1/2 { nερ 2 Y 2 E ρy + θz 1/2 θz 1/2 ( ρy + θz 1/2 + θz 1/2 { Y 2 I( θz ρ Y /2 εe ρy + θz θz 1/2 { Y 2 I(ρ Y /2 < θz 2ρ Y +εe ρy + θz 1/2 θz { Y 2 I( θz > 2ρ Y +εe ρy + θz 1/2 θz { 2ερ 1 E Y θz 1/2 I( θz ρ Y /2 { Y I(ρ Y /2 < θz 2ρ Y +2ερ 1 E αy + θz 1/2 { Y 2 I( θz > 2ρ Y +2εE θz 3/2 Cερ 1/2 Cεn 1/4 2Cε 2/3 for an absolute constant C, provided that n 16(1/ε 4/3. This proves (2.12, by (6.32 and (6.33. 35

Acknowledgments. The authors are thankful to Xuming He for his contribution to the construction of the example in Section 4. References [1] Bentickus, V., Götze, F. and Zitikis, M. (1994. Lower estimates of the convergence rate for U-statistics. Ann. Probab. 22, 1701-1714. [2] Bhattacharya, R.N. and Ghosh, J.K. (1978. On the validity of the formal Edgeworth expansion. Ann. Statist. 6, 434-485. [3] Bickel,P. (1974. Edgeworth expansion in nonparametric statistics. Ann., Statist. 2, 1-20. [4] Bolthausen, E. and Götze,F. (1993. The rate of convergence for multivariate sampling statistics. Ann. Statist. 21 1692-1710. [5] Borovskich, Yu.V. (1983. Asymptotics of U-statistics and von Mises functionals. Sov. Math. Dokl. 27, 303-308. [6] Callaert, H. and Janssen, P. (1978. The Berry-Esseen theorem for U-statistics. Ann., Statist. 6, 417-421. [7] Chan, Y.K. and Wierman, J. (1977. On the Berry-Esseen theorem for U-statistics. Ann. Probab. 5, 136-139. [8] Chen, L.H.Y. and Shao, Q.M. (2001. A non-uniform Berry-Esseen bound via Stein s method. Probab. Theory Related Fields 120, 236-254. [9] Figiel, T., Hitczenko, P., Johnson, W.B., Schechtman, G., and Zinn, J. (1997. Extremal properties of Rademacher functions with applications to the Khintchine and Rosenthal inequalities. Trans. Amer. Math. Soc. 349, 997-1027. [10] Filippova, A.A. (1962. Mises s theorem of the asympototic behavior of functionals of empirical distribution functions and its statistical applications. Theory Probab. Appl. 7, 24-57. 36

[11] Finelstein, M., Kruglov, V.M., and Tucker, H.G. (1994. Convergence in law of random sums with non-random centering. J. Theoret. Probab. 3, 565-598. [12] Friedrich, K. O. (1989. A Berry-Esseen bound for functions of independent random variables. Ann. Statist. 17, 170 183. [13] Grams, W.F. and Serfling, R.J. (1973. Convergence rates for U-statistics and related statistics. Ann., Statist. 1, 153-160. [14] Helmers, R. (1977. The order of the normal approximation for linear combinations of order statistics with smooth weight functions. Ann. Probab. 5, 940-953. [15] Helmers, R. and Janssen, P. (1982. On the Berry-Esseen theorem for multivariate U-statistics. Math. Cent. Rep. SW 90/82, Mathematisch Centrum, Amsterdam, pp. 1-22. [16] Helmers, R., Janssen, P. and Serfling, R. J. (1990. Berry-Esseen bound and bootstrap results for genalized L-statistics. Scand. J. Statist. bf 17, 65-77. [17] Hoeffding, W. (1948. A class of statististics with asymptotically normal distribution. Ann. Math. Statist. 19, 293-325. [18] Koroljuk, V.S. and Borovskich, Yu. V. (1994. Theory of U-statistics. Kluwer Academic Publishers, Boston. [19] Robbins, H. (1948. The asymptotic distribution of the sum of a random numbers of random random variables. Bull. A.M.S. 54, 1151-1161. [20] Rosenthal, H.P. (1970. On the subspaces of L p (p > 2 spanned by sequences of independent random variables. Israel J. Math. 8, 273-303. [21] Serfling, R.J. (1980. Approximation Theorems of Mathematical Statisitics. Wiley, New York. [22] Shorack, G.R. (2000. Probability for Statisticians. Springer, New York. [23] Wang, Q. (2001. Non-uniform Berry-Esséen Bound for U-Statistics. Statist. Sinica (to appear [24] Wang, Q., Jing, B.Y. and Zhao, L. (2000. The Berry-Esseen bound for studentized statistics. Ann. Probab. 28, 511 535. 37

[25] Zhao, L.C. and Chen, X.R. (1983. Non-uniform convergence rates for distributions of U- statistics. Sci. Sinica (Ser. A 26, 795-810. [26] van Zwet, W.R. (1984. A Berry-Esseen bound for symmetric statisics. Z. Wahrsch. Verw. Gebiete 66, 425-440. Louis H.Y. Chen Qi-Man Shao Institute for Mathematical Sciences Department of Mathematics National University of Singapore Hong Kong University of Science and Technology Singapore 118402 Clear Water Bay, Kowloon Republic of Singapore Hong Kong lhychen@ims.nus.edu.sg maqmshao@ust.hk and Department of Mathematics Department of Mathematics Department of Statistics & Applied Probability University of Oregon National University of Singapore Eugene, OR 97403 Singapore 117543 USA Republic of Singapore and Department of Mathematics Zhejiang University Hangzhou, Zhejiang 310027 China 38