Bounds on the Constant in the Mean Central Limit Theorem

Larry Goldstein, University of Southern California. Ann. Prob. 2010.
Classical Berry-Esseen Theorem

Let $X, X_1, X_2, \ldots$ be i.i.d. with distribution $G$ having mean zero, variance $\sigma^2$, and finite third moment. Then there exists $C$ such that
$$\|F_n - \Phi\|_\infty \le \frac{C E|X|^3}{\sigma^3 \sqrt{n}} \quad \text{for } n \in \mathbb{N},$$
where $F_n$ is the distribution function of $S_n = \frac{1}{\sqrt{n}\,\sigma}\sum_{i=1}^n X_i$, and where for distribution functions $F$ and $G$,
$$\|F - G\|_\infty = \sup_{-\infty < x < \infty} |F(x) - G(x)|.$$
Different Metrics

$L_\infty$, a type of worst case error:
$$\|F - G\|_\infty = \sup_{-\infty < x < \infty} |F(x) - G(x)|.$$
$L_1$, a type of average case error:
$$\|F - G\|_1 = \int_{-\infty}^{\infty} |F(x) - G(x)|\,dx.$$
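As a numerical aside (not part of the original slides), the two metrics can be computed for a concrete pair of distributions, here the mean zero, variance one uniform distribution against $\Phi$; a minimal Python sketch:

```python
import math

def phi_cdf(x):
    # standard normal distribution function, via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def unif_cdf(x, a=math.sqrt(3.0)):
    # distribution function of the mean zero, variance one uniform law on [-a, a]
    if x < -a:
        return 0.0
    if x > a:
        return 1.0
    return (x + a) / (2.0 * a)

# grid evaluation of both metrics; the difference is negligible outside [-6, 6]
n = 200000
lo, hi = -6.0, 6.0
dx = (hi - lo) / n
diffs = [abs(unif_cdf(lo + (i + 0.5) * dx) - phi_cdf(lo + (i + 0.5) * dx))
         for i in range(n)]
sup_dist = max(diffs)       # L-infinity (worst case) distance
l1_dist = sum(diffs) * dx   # L1 (average case) distance, Riemann sum
```

Both numbers are small but nonzero, reflecting that a uniform law is already fairly close to normal in either metric.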
$L_p$ Berry-Esseen Theorem

For $p \ge 1$ there exists a constant $C$ such that
$$\|F_n - \Phi\|_p \le \frac{C E|X|^3}{\sigma^3 \sqrt{n}} \quad \text{for } n \in \mathbb{N}. \qquad (1)$$
Let $\mathcal{F}_\sigma$ be the collection of all distributions with mean zero, positive variance $\sigma^2$, and finite third moment. The $L_p$ Berry-Esseen constant $c_p$ is given by
$$c_p = \inf\{c : \sqrt{n}\,\sigma^3 \|F_n - \Phi\|_p \le c\,E|X|^3 \text{ for all } n \in \mathbb{N},\ G \in \mathcal{F}_\sigma\}.$$
Each $C$ in (1) is an upper bound on $c_p$.
Upper Bounds in the Classical Case

Classical case $p = \infty$:
1. 1.88/7.59 (Berry/Esseen, 1941/1942)
2. ...
3. 0.7975 (P. van Beeck, 1972)
4. 0.7655 (I. S. Shiganov, 1986)
5. 0.7056 (I. G. Shevtsova, 2006)
Asymptotic Refinements

Let
$$c_{p,m} = \inf\{c : \sqrt{n}\,\sigma^3 \|F_n - \Phi\|_p \le c\,E|X|^3 \text{ for all } n \ge m,\ G \in \mathcal{F}_\sigma\}.$$
Clearly $c_{p,m}$ decreases in $m$, so we have existence of the limit
$$\lim_{m \to \infty} c_{p,m} = c_{p,\infty}.$$
Asymptotically Correct Constant: $p = 1$

For $G \in \mathcal{F}_\sigma$ Esseen explicitly calculates the limit
$$\lim_{n \to \infty} n^{1/2}\,\|F_n - \Phi\|_1 = A_1(G).$$
Zolotarev (1964), using characteristic function techniques, shows that
$$\sup_{G \in \mathcal{F}_\sigma} \frac{\sigma^3 A_1(G)}{E|X|^3} = \frac{1}{2}, \quad \text{so} \quad c_{1,\infty} = \frac{1}{2}.$$
Stein Functional

A bound on the (non-asymptotic) $L_1$ constant can be obtained by considering the extremum of a Stein functional. Extrema of Stein functionals are considered by Utev and Lefèvre (2003), who computed some exact norms of Stein operators.
Bound using Zero Bias

Let $W$ be a mean zero random variable with finite positive variance $\sigma^2$. We say $W^*$ has the $W$-zero bias distribution if
$$E[W f(W)] = \sigma^2 E[f'(W^*)] \quad \text{for all smooth } f.$$
If the distribution $F$ of $W$ has variance 1, and $W$ and $W^*$ are on the same space with $W^*$ having the $W$-zero bias distribution, then
$$\|F - \Phi\|_1 \le 2E|W^* - W|.$$
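A quick numerical check of the defining identity (an illustration, not from the slides): for a mean zero two point distribution on $\{x, z\}$, the zero bias distribution is uniform on $[x, z]$, a standard fact used later in the $\mathcal{D}_3$ computation.

```python
# Two point distribution: P(W = x) = z/(z-x), P(W = z) = -x/(z-x),
# which has mean zero and variance sigma^2 = -x*z when x < 0 < z.
x, z = -2.0, 1.0
p, q = z / (z - x), -x / (z - x)
sigma2 = -x * z

f = lambda w: w ** 3        # a smooth test function
fp = lambda w: 3 * w ** 2   # its derivative

# Left side of the identity: E[W f(W)], an exact two-term sum
lhs = p * x * f(x) + q * z * f(z)

# Right side: sigma^2 E[f'(W*)] with W* uniform on [x, z]
# (the zero bias distribution of this W), by midpoint integration
n = 100000
h = (z - x) / n
rhs = sigma2 * sum(fp(x + (i + 0.5) * h) for i in range(n)) * h / (z - x)
```

For these values both sides equal $E W^4 = 6$ exactly, so the quadrature error is the only discrepancy.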
Exchange One Zero Bias Coupling

If $W = X_1 + \cdots + X_n$ is a sum of independent mean zero variables with variances $\sigma_1^2, \ldots, \sigma_n^2$, and $I$ is an independent index with distribution
$$P(I = i) = \frac{\sigma_i^2}{\sum_{j=1}^n \sigma_j^2},$$
then
$$W^* = \sum_{j \ne I} X_j + X_I^*$$
has the $W$-zero bias distribution, where for each $i$, $X_i^*$ has the $X_i$-zero bias distribution, independent of $X_j$, $j \ne i$.
Functional B(X)

For $X \in \mathcal{F}_\sigma$ let
$$B(X) = \frac{2\sigma^2\,\|\mathcal{L}(X^*) - \mathcal{L}(X)\|_1}{E|X|^3}.$$
For $\mathbb{R}$-valued (more generally, Polish space valued) random variables, given distributions $F$ and $G$, one can construct $X \sim F$ and $Y \sim G$ such that
$$E|X - Y| = \|F - G\|_1.$$
In fact, let $X = F^{-1}(U)$, $Y = G^{-1}(U)$ for $U \sim \mathcal{U}[0,1]$.
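The quantile construction can be checked numerically (an illustration, not from the slides); here $F = \mathcal{U}[0,1]$ and $G = \mathcal{U}[0,2]$, for which both sides equal $1/2$ in closed form:

```python
# Quantile (inverse CDF) coupling: X = F^{-1}(U), Y = G^{-1}(U) for a
# single U ~ U[0,1], illustrated for F = U[0,1] and G = U[0,2].
n = 200000
total = 0.0
for i in range(n):
    u = (i + 0.5) / n       # deterministic grid standing in for U
    x_val = u               # F^{-1}(u) for U[0,1]
    y_val = 2.0 * u         # G^{-1}(u) for U[0,2]
    total += abs(x_val - y_val)
coupling_mean = total / n   # approximates E|X - Y|

# Direct evaluation of ||F - G||_1 as the integral of |F(t) - G(t)|
F = lambda t: min(max(t, 0.0), 1.0)
G = lambda t: min(max(t / 2.0, 0.0), 1.0)
m, lo, hi = 200000, 0.0, 2.0
dt = (hi - lo) / m
l1 = sum(abs(F(lo + (j + 0.5) * dt) - G(lo + (j + 0.5) * dt))
         for j in range(m)) * dt
```

The two quantities agree, as the quantile coupling attains the $L_1$ distance on $\mathbb{R}$.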
Exchange One Zero Bias Coupling

Let $X_1, \ldots, X_n$ be independent random variables with distributions $G_i \in \mathcal{F}_{\sigma_i}$, $i = 1, \ldots, n$, and let $F_n$ be the distribution function of $W = (X_1 + \cdots + X_n)/\sigma$ with $\sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2$. Then, coupling so that $E|X_i^* - X_i| = \|G_i^* - G_i\|_1$,
$$E|W^* - W| = \frac{1}{\sigma}E|X_I^* - X_I| = \frac{1}{\sigma}\sum_{i=1}^n \frac{\sigma_i^2}{\sigma^2}E|X_i^* - X_i| = \frac{1}{\sigma^3}\sum_{i=1}^n \sigma_i^2\,E|X_i^* - X_i| = \frac{1}{2\sigma^3}\sum_{i=1}^n B(X_i)\,E|X_i|^3.$$
Exchange One Zero Bias Coupling

If $X_1, \ldots, X_n$ are independent mean zero random variables with distributions $G_1, \ldots, G_n$ having finite variances $\sigma_1^2, \ldots, \sigma_n^2$ and finite third moments, then the distribution function $F_n$ of $(X_1 + \cdots + X_n)/\sigma$ with $\sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2$ obeys
$$\|F_n - \Phi\|_1 \le \frac{1}{\sigma^3}\sum_{i=1}^n B(G_i)\,E|X_i|^3,$$
where the functional $B(G)$ is given by
$$B(G) = \frac{2\sigma^2\,\|G^* - G\|_1}{E|X|^3}$$
when $X$ has distribution $G \in \mathcal{F}_\sigma$.
Distribution Specific Constants

In the i.i.d. case,
$$\|F_n - \Phi\|_1 \le \frac{B(G)\,E|X|^3}{\sqrt{n}\,\sigma^3},$$
and, e.g.,
1. $B(G) = 1$ for mean zero two point distributions
2. $B(G) = 1/3$ for mean zero uniform distributions
3. $B(G) = 0$ for mean zero normal distributions
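The uniform value $B(G) = 1/3$ can be verified numerically (a sketch, not from the slides), using the standard zero bias density formula $p^*(t) = E[X\,\mathbf{1}\{X > t\}]/\sigma^2$, which is not derived on these slides:

```python
import math

# Mean zero uniform on [-a, a]: sigma^2 = a^2/3 and E|X|^3 = a^3/4.
a = math.sqrt(3.0)          # chosen so the variance is 1
sigma2 = a * a / 3.0
e_abs_x3 = a ** 3 / 4.0

def F(t):
    # distribution function of U[-a, a]
    return min(max((t + a) / (2.0 * a), 0.0), 1.0)

def F_star(t):
    # distribution function of the zero bias law, whose density is
    # p*(t) = E[X 1{X > t}]/sigma^2 = 3(a^2 - t^2)/(4 a^3) on [-a, a]
    if t <= -a:
        return 0.0
    if t >= a:
        return 1.0
    return (3 * a * a * (t + a) - (t ** 3 + a ** 3)) / (4.0 * a ** 3)

# ||F - F*||_1 by a Riemann sum; analytically this equals a/8
n = 200000
dt = 2.0 * a / n
l1 = sum(abs(F(-a + (i + 0.5) * dt) - F_star(-a + (i + 0.5) * dt))
         for i in range(n)) * dt

B = 2.0 * sigma2 * l1 / e_abs_x3   # should be close to 1/3
```

The same computation with a two point distribution in place of the uniform returns a value near 1, the extreme case.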
Universal Bound

Recall that for $G \in \mathcal{F}_\sigma$,
$$B(G) = \frac{2\sigma^2\,\|G^* - G\|_1}{E|X|^3}.$$
For a collection of distributions $\mathcal{F} \subset \bigcup_{\sigma > 0}\mathcal{F}_\sigma$, let
$$B(\mathcal{F}) = \sup_{G \in \mathcal{F}} B(G).$$
Then for $X_1, \ldots, X_n$ i.i.d. with distribution in $\mathcal{F}_\sigma$,
$$\|F_n - \Phi\|_1 \le \frac{B(\mathcal{F}_\sigma)\,E|X|^3}{\sqrt{n}\,\sigma^3}.$$
Bounds on $B(\mathcal{F}_\sigma)$

Mean zero two point distributions give $B(\mathcal{F}_\sigma) \ge 1$ for all $\sigma > 0$. Using essentially only $E|X^* - X| \le E|X^*| + E|X|$ gives $B(\mathcal{F}_\sigma) \le 3$ for all $\sigma > 0$. By coupling $X^*$ and $X$ together we improve on the value 3.
Value of Supremum

Theorem 1. For all $\sigma \in (0, \infty)$, $B(\mathcal{F}_\sigma) = 1$. Hence, when $X_1, \ldots, X_n$ are independent with distributions in $\mathcal{F}_{\sigma_i}$, $i = 1, \ldots, n$, and $\sum_{i=1}^n \sigma_i^2 = \sigma^2$,
$$\|F_n - \Phi\|_1 \le \frac{1}{\sigma^3}\sum_{i=1}^n E|X_i|^3,$$
and when these variables are identically distributed with variance $\sigma^2$,
$$\|F_n - \Phi\|_1 \le \frac{E|X|^3}{\sqrt{n}\,\sigma^3}.$$
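A small illustration of the i.i.d. bound (not from the slides): for $n = 4$ fair $\pm 1$ signs, $E|X|^3 = \sigma = 1$, so the theorem gives $\|F_4 - \Phi\|_1 \le 1/2$, which the exact distribution satisfies comfortably:

```python
import math

def Phi(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# W = (X1 + ... + X4)/2 for i.i.d. fair +/-1 signs has support
# {-2,-1,0,1,2} with binomial weights (1,4,6,4,1)/16
support = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]

def F4(t):
    # distribution function of the standardized sum
    return sum(w for s, w in zip(support, weights) if s <= t)

# ||F4 - Phi||_1 by a Riemann sum on a wide interval
lo, hi, n = -8.0, 8.0, 200000
h = (hi - lo) / n
dist = sum(abs(F4(lo + (i + 0.5) * h) - Phi(lo + (i + 0.5) * h))
           for i in range(n)) * h

bound = 1.0 / math.sqrt(4.0)   # E|X|^3 / (sqrt(n) sigma^3) with n = 4
```

The computed distance sits around half of the guaranteed bound, consistent with the asymptotic constant $1/2$ kicking in only as $n$ grows.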
Bounds on the Constant $c_1$

We can also prove the lower bound
$$c_1 \ge \frac{2\sqrt{2\pi}\,(2\Phi(1) - 1) - (\sqrt{2\pi} + 2) + 4e^{-1/2}}{\sqrt{2\pi}} = 0.535377\ldots$$
Supremum of $B(\mathcal{F}_\sigma)$

We want to compute $\sup_{G \in \mathcal{F}_\sigma} B(G)$ where
$$B(G) = \frac{2\sigma^2\,\|G^* - G\|_1}{E|X|^3}.$$
We successively reduce, in four steps, the computation of the supremum of $B(G)$ over $\mathcal{F}_\sigma$ to computations over smaller collections of distributions.
First Reduction: $\sigma = 1$

Recall
$$B(G) = \frac{2\sigma^2\,\|G^* - G\|_1}{E|X|^3}.$$
Since $B(aX) = B(X)$ for all $a \ne 0$, by this scaling property it suffices to consider $\mathcal{F}_1$.
Second Reduction: compact support

For $X \in \mathcal{F}_1$, show that there exist $X_n$, $n = 1, 2, \ldots$, each in $\mathcal{F}_1$ and having compact support, such that $B(X_n) \to B(X)$. Hence it suffices to consider the class of distributions $\mathcal{M} \subset \mathcal{F}_1$ with compact support.
Third Reduction: finite support

For $X \in \mathcal{M}$, show that there exist $X_n$, $n = 1, 2, \ldots$ in $\mathcal{M}$, finitely supported, such that $B(X_n) \to B(X)$. Hence it suffices to consider $\bigcup_{m \ge 3}\mathcal{D}_m$, where $\mathcal{D}_m$ is the collection of mean zero, variance one distributions supported on at most $m$ points.
Fourth Reduction: three point support

Use a convexity type property of $B(G)$, which depends on the behavior of the zero bias transformation on a mixture, to obtain $B(\mathcal{D}_3) = B(\bigcup_{m \ge 3}\mathcal{D}_m)$. Hence it suffices to consider $\mathcal{D}_3$.
Lastly

Show $B(\mathcal{D}_3) = 1$.
Finding Extremes of Expectations Arguments along these lines were first considered by Hoeffding for the calculation of the extremes of EK(X 1,..., X n ) where X 1,..., X n are independent. Though B(G) is not of this form, the reasoning of Hoeffding applies. In some cases the final result obtained is not in closed form.
Reduction to Compact Support and Finite Support

Continuity of the zero bias transformation: if
$$X_n \to_d X \quad \text{and} \quad \lim_{n \to \infty} EX_n^2 = EX^2,$$
then $X_n^* \to_d X^*$ as $n \to \infty$. This leads to continuity of $B(G)$: if
$$X_n \to_d X, \quad \lim_{n \to \infty} EX_n^2 = EX^2, \quad \text{and} \quad \lim_{n \to \infty} E|X_n|^3 = E|X|^3,$$
then $B(X_n) \to B(X)$ as $n \to \infty$.
From $\bigcup_{m \ge 3}\mathcal{D}_m$ to $\mathcal{D}_3$

Let $X_\mu$ be the $\mu$-mixture of a collection $\{X_s,\ s \in S\}$ of mean zero, variance 1 random variables satisfying $E|X_\mu|^3 < \infty$. Then
$$B(X_\mu) \le \sup_{s \in S} B(X_s).$$
In particular, if $\mathcal{C}$ is a collection of mean zero, variance 1 random variables with finite absolute third moments, and $\mathcal{D} \subset \mathcal{C}$ is such that every distribution in $\mathcal{C}$ can be represented as a mixture of distributions in $\mathcal{D}$, then $B(\mathcal{C}) = B(\mathcal{D})$.
Zero Biasing a Mixture

Theorem 2. Let $\{m_s,\ s \in S\}$ be a collection of mean zero distributions on $\mathbb{R}$ and $\mu$ a probability measure on $S$ such that the variance $\sigma_\mu^2$ of the mixture distribution $m_\mu$ is positive and finite. Then the $m_\mu$-zero bias distribution $m_\mu^*$ exists and is given by the mixture
$$m_\mu^* = \int_S m_s^*\,d\nu \quad \text{where} \quad \frac{d\nu}{d\mu} = \frac{\sigma_s^2}{\sigma_\mu^2}.$$
In particular, $\nu = \mu$ if and only if $\sigma_s^2$ is constant $\mu$-a.s.
Mixture of Constant Variance: $\nu = \mu$

$$\|\mathcal{L}(X_\mu^*) - \mathcal{L}(X_\mu)\|_1 = \sup_{f \in \mathcal{L}} |Ef(X_\mu^*) - Ef(X_\mu)| = \sup_{f \in \mathcal{L}} \left| \int_S Ef(X_s^*)\,d\mu - \int_S Ef(X_s)\,d\mu \right| \le \sup_{f \in \mathcal{L}} \int_S |Ef(X_s^*) - Ef(X_s)|\,d\mu \le \int_S \|\mathcal{L}(X_s^*) - \mathcal{L}(X_s)\|_1\,d\mu,$$
where $\mathcal{L}$ denotes the class of functions with Lipschitz constant at most 1.
$B(X_\mu) \le \sup_s B(X_s)$

The relation
$$\frac{d\tau}{d\mu} = \frac{E|X_s|^3}{E|X_\mu|^3} \qquad (2)$$
defines a probability measure, as $E|X_\mu|^3 = \int_S E|X_s|^3\,d\mu$.
$B(X_\mu) \le \sup_s B(X_s)$

Then
$$B(X_\mu) = \frac{2\|\mathcal{L}(X_\mu^*) - \mathcal{L}(X_\mu)\|_1}{E|X_\mu|^3} \le \frac{2\int_S \|\mathcal{L}(X_s^*) - \mathcal{L}(X_s)\|_1\,d\mu}{E|X_\mu|^3} = \frac{\int_S B(X_s)\,E|X_s|^3\,d\mu}{E|X_\mu|^3} = \int_S B(X_s)\,d\tau \le \sup_{s \in S} B(X_s).$$
Reduction of $\mathcal{D}_m$, $m > 3$

The distribution of any $X \in \mathcal{D}_m$ is determined by the values $a_1 < \cdots < a_m$ and probabilities $p = (p_1, \ldots, p_m)$, all positive. The vector $p$ must satisfy $Ap = c$, where
$$A = \begin{pmatrix} 1 & 1 & \ldots & 1 \\ a_1 & a_2 & \ldots & a_m \\ a_1^2 & a_2^2 & \ldots & a_m^2 \end{pmatrix} \quad \text{and} \quad c = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.$$
For $m > 3$ there exists a vector $v \ne 0$ satisfying $Av = 0$. Hence $X \in \mathcal{D}_m$ can be represented as a mixture of two distributions in $\mathcal{D}_{m-1}$.
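One explicit null vector is given by the divided difference weights $v_i = 1/\prod_{j \ne i}(a_i - a_j)$; a numerical sketch of the resulting mixture decomposition (the specific points and probabilities below are hypothetical illustration values, not from the slides):

```python
import math

# Divided difference weights satisfy sum_i v_i a_i^k = 0 for k = 0, 1, 2,
# i.e. A v = 0, for any m = 4 distinct support points.
a = [-2.0, -0.5, 1.0, 2.0]
m = len(a)
v = [1.0 / math.prod(a[i] - a[j] for j in range(m) if j != i)
     for i in range(m)]

# A mean zero, variance one distribution on these four points
p = [1.0 / 18.0, 28.0 / 45.0, 2.0 / 9.0, 1.0 / 10.0]
moments = [sum(p[i] * a[i] ** k for i in range(m)) for k in range(3)]  # (1, 0, 1)

# Move along the null direction until a coordinate hits zero in each
# direction; each endpoint is a distribution on fewer support points.
t_plus = min(p[i] / -v[i] for i in range(m) if v[i] < 0)
t_minus = min(p[i] / v[i] for i in range(m) if v[i] > 0)
hi_end = [p[i] + t_plus * v[i] for i in range(m)]
lo_end = [p[i] - t_minus * v[i] for i in range(m)]

# p is recovered as the convex combination of the two endpoints
lam = t_minus / (t_plus + t_minus)
mixture = [lam * hi_end[i] + (1 - lam) * lo_end[i] for i in range(m)]
```

Since $Av = 0$, moving along $v$ preserves total mass, mean, and variance, so both endpoints remain mean zero, variance one distributions.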
Reduction to $\mathcal{D}_3$

For every $m > 3$, every $G \in \mathcal{D}_m$ can be represented as a finite mixture of distributions in $\mathcal{D}_{m-1}$. Hence $B(\mathcal{D}_3) = B(\bigcup_{m \ge 3}\mathcal{D}_m)$. Every distribution in $\mathcal{D}_3$, with support points, say, $x < y < 0 < z$, can be written as
$$m_\alpha = \alpha m_1 + (1 - \alpha)m_0,$$
a mixture of the (unequal variance) mean zero distributions $m_1$ and $m_0$ supported on $\{x, z\}$ and $\{y, z\}$, respectively.
Mixture with Unequal Variance

For $\alpha \in [0,1]$, with
$$m_\alpha = \alpha m_1 + (1 - \alpha)m_0,$$
we have
$$m_\alpha^* = \beta m_1^* + (1 - \beta)m_0^* \quad \text{where} \quad \beta = \frac{\alpha x}{\alpha x + (1 - \alpha)y}.$$
Since $x < y < 0$,
$$\frac{\beta}{1 - \beta} = \frac{\alpha}{1 - \alpha}\cdot\frac{x}{y} > \frac{\alpha}{1 - \alpha},$$
and therefore $\beta > \alpha$.
Calculating $B(\mathcal{D}_3)$

Write $m \in \mathcal{D}_3$ as $m_\alpha = \alpha m_1 + (1 - \alpha)m_0$, where $m_1$ and $m_0$ are mean zero two point distributions on $\{x, z\}$ and $\{y, z\}$, respectively, with $x < y < 0 < z$. We need to bound
$$\|m_\alpha^* - m_\alpha\|_1. \qquad (3)$$
Any coupling of variables $Y_\alpha$ and $Y_\alpha^*$ with distributions $m_\alpha$ and $m_\alpha^*$, respectively, gives an upper bound on (3). Let $F_0, F_1, F_0^*, F_1^*$ be the distribution functions of $m_0, m_1, m_0^*$ and $m_1^*$, respectively.
Bound by Coupling

Set
$$(Y_1, Y_0, Y_1^*, Y_0^*) = (F_1^{-1}(U), F_0^{-1}(U), (F_1^*)^{-1}(U), (F_0^*)^{-1}(U))$$
and let $\mathcal{L}(Y_\alpha, Y_\alpha^*)$ be
$$\alpha\,\mathcal{L}(Y_1, Y_1^*) + (1 - \beta)\,\mathcal{L}(Y_0, Y_0^*) + (\beta - \alpha)\,\mathcal{L}(Y_0, Y_1^*).$$
Then $(Y_\alpha, Y_\alpha^*)$ has marginals $Y_\alpha =_d X_\alpha$ and $Y_\alpha^* =_d X_\alpha^*$, and therefore $\|m_\alpha - m_\alpha^*\|_1$ is upper bounded by
$$\alpha\|m_1 - m_1^*\|_1 + (1 - \beta)\|m_0 - m_0^*\|_1 + (\beta - \alpha)\|m_1^* - m_0\|_1.$$
Bound on $\mathcal{D}_3$

The goal is to have $\|m_\alpha - m_\alpha^*\|_1$, or its upper bound
$$\alpha\|m_1 - m_1^*\|_1 + (1 - \beta)\|m_0 - m_0^*\|_1 + (\beta - \alpha)\|m_1^* - m_0\|_1,$$
bounded by
$$E|X_\alpha|^3/(2EX_\alpha^2) = \beta\|m_1 - m_1^*\|_1 + (1 - \beta)\|m_0 - m_0^*\|_1,$$
the equality holding since $\|m_i - m_i^*\|_1 = E|X_i|^3/(2\sigma_i^2)$ for the two point distributions $m_i$. Hence it suffices to show
$$\|m_1^* - m_0\|_1 \le \|m_1 - m_1^*\|_1.$$
This reduces to the computation of $L_1$ distances between the uniform distribution on $[x, z]$ and the two point distributions on $\{y, z\}$ and $\{x, z\}$.
$\|m_1^* - m_0\|_1 \le \|m_1 - m_1^*\|_1$

Here $m_0$ is supported on $\{y, z\}$ and $m_1$ on $\{x, z\}$, with $x < y < 0 < z$. The right hand side is
$$\|m_1 - m_1^*\|_1 = \frac{z^2 + x^2}{2(z - x)}.$$
The left hand side, in the case $F_1^*(y) \le F_0(y)$, is
$$[2(z - x)(z - y)^2]^{-1}\big(z^4 - 2yz^3 + x^2z^2 - 2x^2yz + 5y^2z^2 + 3x^2y^2 - 4xy^3 + 4xy^2z - 4xyz^2 + 2y^4 - 4y^3z\big).$$
Using Mathematica

Taking the difference, after much cancellation, $\|m_1 - m_1^*\|_1 - \|m_1^* - m_0\|_1$ is seen to equal
$$\frac{-4y^2z^2 - 2x^2y^2 + 4xy^3 - 4xy^2z + 4xyz^2 - 2y^4 + 4y^3z}{2(z - x)(z - y)^2},$$
which factors as
$$\frac{-y(y - x)\big(y^2 + 2z^2 - y(x + 2z)\big)}{(z - x)(z - y)^2}$$
and is positive, due to being in the case $F_1^*(y) \le F_0(y)$.
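A numerical spot check of these closed forms (an illustration, not from the slides), with $x = -2$, $y = -1$, $z = 1$, values which satisfy the case condition $F_1^*(y) \le F_0(y)$:

```python
# m1 is the mean zero two point distribution on {x, z}, m1* its zero
# bias distribution (uniform on [x, z]), and m0 the mean zero two
# point distribution on {y, z}.
x, y, z = -2.0, -1.0, 1.0

def F1(t):       # CDF of m1: mass z/(z-x) at x, mass -x/(z-x) at z
    if t < x:
        return 0.0
    return z / (z - x) if t < z else 1.0

def F1_star(t):  # CDF of the uniform distribution on [x, z]
    return min(max((t - x) / (z - x), 0.0), 1.0)

def F0(t):       # CDF of m0: mass z/(z-y) at y, mass -y/(z-y) at z
    if t < y:
        return 0.0
    return z / (z - y) if t < z else 1.0

def l1(F, G, lo=-3.0, hi=2.0, n=200000):
    h = (hi - lo) / n
    return sum(abs(F(lo + (i + 0.5) * h) - G(lo + (i + 0.5) * h))
               for i in range(n)) * h

rhs = l1(F1, F1_star)   # closed form: (z^2 + x^2)/(2(z - x)) = 5/6
lhs = l1(F1_star, F0)   # equals 7/12 for these values
diff_closed = -y * (y - x) * (y * y + 2 * z * z - y * (x + 2 * z)) \
              / ((z - x) * (z - y) ** 2)
```

The quadrature difference `rhs - lhs` matches the factored expression, which is positive as claimed.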
Bound over $\mathcal{D}_3$

Since $\|m_1^* - m_0\|_1 \le \|m_1 - m_1^*\|_1$ we have
$$\|m_\alpha - m_\alpha^*\|_1 \le E|X_\alpha|^3/(2EX_\alpha^2),$$
and therefore $B(\mathcal{D}_3) \le 1$. Hence
$$1 \le B(\mathcal{D}_3) = B\Big(\bigcup_{m \ge 3}\mathcal{D}_m\Big) = B(\mathcal{M}) = B(\mathcal{F}_1) \le 1.$$
The Anti-Normal Distributions

$G \in \mathcal{F}_1$ is normal if and only if $B(G) = 0$; small $B(G)$ means close to normal. A $G$ that is a mean zero two point distribution on $x < 0 < y$ achieves $\sup_{G \in \mathcal{F}_1} B(G)$, the worst case for $B(G)$, and so is "anti-normal".
Lower Bound

For $\mathcal{L}(X) = G \in \mathcal{F}_1$,
$$\|F_n - \Phi\|_1 \le \frac{c_1 E|X|^3}{\sqrt{n}} \quad \text{for all } n \in \mathbb{N},$$
and in particular, for $n = 1$,
$$c_1 \ge \frac{\|F_1 - \Phi\|_1}{E|X|^3} = \frac{\|G - \Phi\|_1}{E|X|^3}.$$
Lower Bound: 0.535377...

For $B \sim \mathcal{B}(p)$ with $p \in (0,1)$ and $q = 1 - p$, let $G_p$ be the distribution function of $X = (B - p)/\sqrt{pq}$. Then $\|G_p - \Phi\|_1$ equals
$$\int_{-\infty}^{-\sqrt{p/q}} \Phi(x)\,dx + \int_{-\sqrt{p/q}}^{\sqrt{q/p}} |\Phi(x) - q|\,dx + \int_{\sqrt{q/p}}^{\infty} |\Phi(x) - 1|\,dx,$$
and, letting
$$\psi(p) = \frac{\sqrt{pq}}{p^2 + q^2}\,\|G_p - \Phi\|_1 \quad \text{for } p \in (0,1),$$
so that $c_1 \ge \psi(p)$,
$$\psi(1/2) = \frac{2\sqrt{2\pi}\,(2\Phi(1) - 1) - (\sqrt{2\pi} + 2) + 4e^{-1/2}}{\sqrt{2\pi}} = 0.535377\ldots$$
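$\psi(1/2)$ can be evaluated numerically (a sketch, not from the slides); the closed form is written here in one equivalent arrangement:

```python
import math

def Phi(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# G_{1/2} is the distribution function of X = (B - 1/2)/sqrt(1/4),
# i.e. of a fair +/-1 coin: 0 below -1, 1/2 on [-1, 1), 1 from 1 on.
def G_half(t):
    if t < -1.0:
        return 0.0
    return 0.5 if t < 1.0 else 1.0

# ||G_{1/2} - Phi||_1 by a Riemann sum on a wide interval; for p = 1/2
# we have E|X|^3 = 1, so this is directly the lower bound psi(1/2).
lo, hi, n = -8.0, 8.0, 400000
h = (hi - lo) / n
psi_half = sum(abs(G_half(lo + (i + 0.5) * h) - Phi(lo + (i + 0.5) * h))
               for i in range(n)) * h

# closed form, in one equivalent arrangement
closed = (2.0 * math.sqrt(2.0 * math.pi) * (2.0 * Phi(1.0) - 1.0)
          - (math.sqrt(2.0 * math.pi) + 2.0)
          + 4.0 * math.exp(-0.5)) / math.sqrt(2.0 * math.pi)
```

Both computations return the value 0.535377..., the lower bound on $c_1$.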
Upper Bounds in the Classical Case

Classical case $p = \infty$:
1. 1.88/7.59 (Berry/Esseen, 1941/1942)
2. ...
3. 0.7975 (P. van Beeck, 1972)
4. 0.7655 (I. S. Shiganov, 1986)
5. 0.7056 (I. G. Shevtsova, 2006)
6. 0.4785 (I. Tyurin, 2010)
Higher Order Hermite Functionals

Letting $H_k(x)$ be the $k$-th Hermite polynomial, if the moments of $X$ match those of the standard normal up to order $2k$, then there exists $X^{(k)}$ such that
$$E[H_k(X)f(X)] = E[f^{(k)}(X^{(k)})].$$
Can one compute extreme values of the natural generalizations of $B(G)$, such as
$$B_k(G) = \frac{\sigma^{2k}\,\|\mathcal{L}(X^{(k)}) - \mathcal{L}(X)\|_1}{E|X|^{2k+1}},$$
which might be the values of analogous constants when higher moments match the normal?