Bounds on the Constant in the Mean Central Limit Theorem

Bounds on the Constant in the Mean Central Limit Theorem
Larry Goldstein, University of Southern California (Ann. Probab. 2010)

Classical Berry-Esseen Theorem

Let $X, X_1, X_2, \ldots$ be i.i.d. with distribution $G$ having mean zero, variance $\sigma^2$ and finite third moment. Then there exists a constant $C$ such that
$$\|F_n - \Phi\|_\infty \le \frac{C E|X|^3}{\sigma^3 \sqrt{n}} \quad \text{for } n \in \mathbb{N},$$
where $F_n$ is the distribution function of $S_n = \frac{1}{\sigma\sqrt{n}} \sum_{i=1}^n X_i$, and where for distribution functions $F$ and $G$,
$$\|F - G\|_\infty = \sup_{-\infty < x < \infty} |F(x) - G(x)|.$$

Different Metrics

$L^\infty$, a worst-case type of error:
$$\|F - G\|_\infty = \sup_{-\infty < x < \infty} |F(x) - G(x)|.$$
$L^1$, an average-case type of error:
$$\|F - G\|_1 = \int_{-\infty}^{\infty} |F(x) - G(x)|\, dx.$$
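As a quick illustration (not from the talk), the following Python sketch, assuming NumPy and SciPy are available, computes both metrics for a concrete pair: the distribution function $G$ of a fair $\pm 1$ coin flip against $\Phi$.

```python
# Illustrative computation (assuming NumPy/SciPy) of the two metrics for
# G = the distribution function of a fair ±1 coin flip versus Φ.
import numpy as np
from scipy.stats import norm

def G(x):
    # CDF of X uniform on {-1, +1}: jumps of 1/2 at -1 and at +1.
    return np.where(x < -1, 0.0, np.where(x < 1, 0.5, 1.0))

xs = np.linspace(-8, 8, 2_000_001)
dx = xs[1] - xs[0]
diff = np.abs(G(xs) - norm.cdf(xs))

print(diff.max())        # L-infinity (worst case): ~0.3413 = Φ(1) - 1/2
print(diff.sum() * dx)   # L1 (average case): ~0.5354
```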

$L^p$ Berry-Esseen Theorem

For $p \ge 1$ there exists a constant $C$ such that
$$\|F_n - \Phi\|_p \le \frac{C E|X|^3}{\sigma^3 \sqrt{n}} \quad \text{for } n \in \mathbb{N}. \qquad (1)$$
Let $\mathcal{F}_\sigma$ be the collection of all distributions with mean zero, positive variance $\sigma^2$, and finite third moment. The $L^p$ Berry-Esseen constant $c_p$ is given by
$$c_p = \inf\{C : \sqrt{n}\,\sigma^3 \|F_n - \Phi\|_p \le C E|X|^3,\ n \in \mathbb{N},\ G \in \mathcal{F}_\sigma\}.$$
Each $C$ in (1) is an upper bound on $c_p$.

Upper Bounds in the Classical Case

Classical case $p = \infty$:
1. 1.88 / 7.59 (Berry / Esseen, 1941/1942)
2. ...
3. 0.7975 (P. van Beek, 1972)
4. 0.7655 (I. S. Shiganov, 1986)
5. 0.7056 (I. G. Shevtsova, 2006)

Asymptotic Refinements

Let
$$c_{p,m} = \inf\{C : \sqrt{n}\,\sigma^3 \|F_n - \Phi\|_p \le C E|X|^3,\ n \ge m,\ G \in \mathcal{F}_\sigma\}.$$
Clearly $c_{p,m}$ decreases in $m$, so the limit $\lim_{m \to \infty} c_{p,m} = c_{p,\infty}$ exists.

Asymptotically Correct Constant: $p = 1$

For $G \in \mathcal{F}_\sigma$ Esseen explicitly calculated the limit
$$\lim_{n \to \infty} n^{1/2} \|F_n - \Phi\|_1 = A_1(G).$$
Zolotarev (1964), using characteristic function techniques, showed that
$$\sup_{G \in \mathcal{F}_\sigma} \frac{\sigma^3 A_1(G)}{E|X|^3} = \frac{1}{2}, \quad \text{so} \quad c_{1,\infty} = \frac{1}{2}.$$

Stein Functional

A bound on the (non-asymptotic) $L^1$ constant can be obtained by considering the extremum of a Stein functional. Extrema of Stein functionals were considered by Utev and Lefèvre (2003), who computed some exact norms of Stein operators.

Bound using Zero Bias

Let $W$ be a mean zero random variable with finite, positive variance $\sigma^2$. We say $W^*$ has the $W$-zero biased distribution if
$$E[W f(W)] = \sigma^2 E[f'(W^*)] \quad \text{for all smooth } f.$$
If the distribution $F$ of $W$ has variance 1 and $W$ and $W^*$ are on the same space with $W^*$ having the $W$-zero biased distribution, then
$$\|F - \Phi\|_1 \le 2 E|W^* - W|.$$
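The zero bias identity can be checked exactly in small cases. The following Python sketch (an illustration, not part of the talk) uses the known fact, appearing later in these slides, that for a mean zero two point distribution on $\{x, z\}$, $x < 0 < z$, the zero biased distribution is uniform on $[x, z]$.

```python
# Exact check (assuming NumPy) of E[W f(W)] = σ² E[f'(W*)] for a mean zero
# two point W on {x, z}: here W* is uniform on [x, z], and with f = sin
# both sides have closed forms.
import numpy as np

x, z = -0.5, 2.0
px, pz = z / (z - x), -x / (z - x)     # P(W = x), P(W = z); mean is zero
sigma2 = px * x**2 + pz * z**2         # Var(W), which simplifies to -x*z

lhs = px * x * np.sin(x) + pz * z * np.sin(z)       # E[W f(W)]
rhs = sigma2 * (np.sin(z) - np.sin(x)) / (z - x)    # σ² E[f'(W*)], W*~U[x,z]
print(lhs, rhs)                                     # equal
```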

Exchange One Zero Bias Coupling

If $W = X_1 + \cdots + X_n$ is a sum of independent mean zero variables with variances $\sigma_1^2, \ldots, \sigma_n^2$, and $I$ is an independent index with distribution
$$P(I = i) = \frac{\sigma_i^2}{\sum_{j=1}^n \sigma_j^2},$$
then
$$W^* = \sum_{j \ne I} X_j + X_I^*$$
has the $W$-zero biased distribution, where for each $i$, $X_i^*$ has the $X_i$-zero biased distribution and is independent of $X_j$, $j \ne i$.
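A Monte Carlo sketch of this construction (assuming NumPy; the two point components and the sample size are illustrative choices): each $X_i^*$ is uniform on the support of $X_i$, as in the check above, and the identity $E[W f(W)] = \sigma^2 E[f'(W^*)]$ is verified by simulation for $f = \sin$.

```python
# Monte Carlo sketch (assuming NumPy) of the exchange one construction with
# three two point summands.
import numpy as np

rng = np.random.default_rng(0)
supports = [(-1.0, 1.0), (-0.5, 2.0), (-3.0, 1.5)]   # (x_i, z_i), x_i<0<z_i
lo = np.array([s[0] for s in supports])
hi = np.array([s[1] for s in supports])
sig2 = -lo * hi                                      # Var(X_i) = -x_i z_i
sigma2 = sig2.sum()

N = 10**6
X = np.column_stack([
    rng.choice([x, z], size=N, p=[z / (z - x), -x / (z - x)])
    for x, z in supports
])
W = X.sum(axis=1)

I = rng.choice(len(supports), size=N, p=sig2 / sigma2)  # P(I = i) ∝ σ_i²
Xstar = rng.uniform(lo[I], hi[I])                       # X_I* ~ U[x_I, z_I]
Wstar = W - X[np.arange(N), I] + Xstar                  # swap X_I for X_I*

print(np.mean(W * np.sin(W)))             # E[W f(W)]
print(sigma2 * np.mean(np.cos(Wstar)))    # σ² E[f'(W*)]; close for large N
```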

Functional B(X)

For $X \in \mathcal{F}_\sigma$ let
$$B(X) = \frac{2\sigma^2 \|\mathcal{L}(X^*) - \mathcal{L}(X)\|_1}{E|X|^3}.$$
For $\mathbb{R}$-valued (more generally, Polish space valued) random variables, given distributions $F$ and $G$ one can construct $X \sim F$ and $Y \sim G$ on the same space such that
$$E|X - Y| = \|F - G\|_1.$$
In fact, one may let $X = F^{-1}(U)$ and $Y = G^{-1}(U)$ for $U \sim \mathcal{U}[0,1]$.
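The quantile coupling can be demonstrated numerically; a sketch in Python (assuming NumPy/SciPy), with $F$ standard normal and $G$ the fair $\pm 1$ coin, both illustrative choices:

```python
# Sketch of the quantile coupling X = F^{-1}(U), Y = G^{-1}(U), which
# attains E|X - Y| = ||F - G||_1; here F = Φ and G is the fair ±1 coin.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
U = rng.uniform(size=10**6)
X = norm.ppf(U)                          # F^{-1}(U)
Y = np.where(U < 0.5, -1.0, 1.0)         # G^{-1}(U)

xs = np.linspace(-8, 8, 2_000_001)
Gx = np.where(xs < -1, 0.0, np.where(xs < 1, 0.5, 1.0))
l1 = np.abs(norm.cdf(xs) - Gx).sum() * (xs[1] - xs[0])

print(np.abs(X - Y).mean(), l1)          # both ≈ 0.5354
```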

Exchange One Zero Bias Coupling

Let $X_1, \ldots, X_n$ be independent random variables with distributions $G_i \in \mathcal{F}_{\sigma_i}$, $i = 1, \ldots, n$, and let $F_n$ be the distribution function of $W = (X_1 + \cdots + X_n)/\sigma$ with $\sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2$. Then, coupling so that $E|X_i^* - X_i| = \|G_i^* - G_i\|_1$,
$$E|W^* - W| = \frac{1}{\sigma} E|X_I^* - X_I| = \frac{1}{\sigma} \sum_{i=1}^n \frac{\sigma_i^2}{\sigma^2} E|X_i^* - X_i| = \frac{1}{\sigma^3} \sum_{i=1}^n \sigma_i^2 E|X_i^* - X_i| = \frac{1}{2\sigma^3} \sum_{i=1}^n B(X_i) E|X_i|^3.$$

Exchange One Zero Bias Coupling

If $X_1, \ldots, X_n$ are independent mean zero random variables with distributions $G_1, \ldots, G_n$ having finite variances $\sigma_1^2, \ldots, \sigma_n^2$ and finite third moments, then the distribution function $F_n$ of $(X_1 + \cdots + X_n)/\sigma$ with $\sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2$ obeys
$$\|F_n - \Phi\|_1 \le \frac{1}{\sigma^3} \sum_{i=1}^n B(G_i) E|X_i|^3,$$
where the functional $B(G)$ is given by
$$B(G) = \frac{2\sigma^2 \|G^* - G\|_1}{E|X|^3}$$
when $X$ has distribution $G \in \mathcal{F}_\sigma$.

Distribution Specific Constants

In the i.i.d. case,
$$\|F_n - \Phi\|_1 \le \frac{B(G) E|X|^3}{\sqrt{n}\,\sigma^3},$$
and, for example:
1. $B(G) = 1$ for mean zero two point distributions
2. $B(G) = 1/3$ for mean zero uniform distributions
3. $B(G) = 0$ for mean zero normal distributions
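These values can be recovered numerically from the definition of $B(G)$; a sketch assuming NumPy, using the standard formula $E[X \mathbf{1}(X > t)]/\sigma^2$ for the zero bias density:

```python
# Numeric recovery of B(G) = 2σ²||G* - G||_1 / E|X|³ in two of the listed
# cases.
import numpy as np

def B_two_point():
    # X = ±1 with probability 1/2: σ² = E|X|³ = 1 and G* = Uniform[-1, 1].
    t = np.linspace(-1, 1, 10**6)
    l1 = np.abs((t + 1) / 2 - 0.5).sum() * (t[1] - t[0])
    return 2 * l1

def B_uniform():
    # X ~ Uniform[-√3, √3]: σ² = 1, E|X|³ = 3√3/4, and the zero bias
    # density is (3 - t²)/(4√3) on [-√3, √3].
    r = np.sqrt(3)
    t = np.linspace(-r, r, 10**6)
    G = (t + r) / (2 * r)
    Gstar = (3 * t - t**3 / 3 + 2 * r) / (4 * r)   # integral of the density
    l1 = np.abs(Gstar - G).sum() * (t[1] - t[0])
    return 2 * l1 / (3 * r / 4)

print(B_two_point(), B_uniform())   # -> 1.0 and 1/3
```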

Universal Bound

Recall that for $G \in \mathcal{F}_\sigma$,
$$B(G) = \frac{2\sigma^2 \|G^* - G\|_1}{E|X|^3}.$$
For a collection of distributions $\mathcal{F} \subset \bigcup_{\sigma > 0} \mathcal{F}_\sigma$, let
$$B(\mathcal{F}) = \sup_{G \in \mathcal{F}} B(G).$$
Then for $X_1, \ldots, X_n$ i.i.d. with distribution in $\mathcal{F}_\sigma$,
$$\|F_n - \Phi\|_1 \le \frac{B(\mathcal{F}_\sigma) E|X|^3}{\sqrt{n}\,\sigma^3}.$$

Bounds on $B(\mathcal{F}_\sigma)$

Mean zero two point distributions give $B(\mathcal{F}_\sigma) \ge 1$ for all $\sigma > 0$. Using essentially only $E|X^* - X| \le E|X^*| + E|X|$ gives $B(\mathcal{F}_\sigma) \le 3$ for all $\sigma > 0$. By coupling $X^*$ and $X$ together we improve on the constant 3.

Value of Supremum

Theorem 1. For all $\sigma \in (0, \infty)$, $B(\mathcal{F}_\sigma) = 1$. Hence, when $X_1, \ldots, X_n$ are independent with distributions in $\mathcal{F}_{\sigma_i}$, $i = 1, \ldots, n$, and $\sum_{i=1}^n \sigma_i^2 = \sigma^2$,
$$\|F_n - \Phi\|_1 \le \frac{1}{\sigma^3} \sum_{i=1}^n E|X_i|^3,$$
and when these variables are identically distributed with variance $\sigma^2$,
$$\|F_n - \Phi\|_1 \le \frac{E|X_1|^3}{\sqrt{n}\,\sigma^3}.$$
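A numeric sanity check of the i.i.d. bound (illustrative, assuming NumPy/SciPy): for fair $\pm 1$ coin flips, $\sigma = 1$ and $E|X|^3 = 1$, and the exact binomial CDF of the standardized sum can be compared with $\Phi$.

```python
# Sanity check of the i.i.d. bound ||F_n - Φ||_1 ≤ E|X|³/(√n σ³) = 1/√n
# for fair ±1 coin flips.
import numpy as np
from scipy.stats import binom, norm

n = 30
xs = np.linspace(-8, 8, 2_000_001)
# Sum of n ±1 flips is 2B - n with B ~ Bin(n, 1/2), so the standardized
# sum W = (2B - n)/√n has CDF F_n(x) = P(B <= (x√n + n)/2).
Fn = binom.cdf((xs * np.sqrt(n) + n) / 2, n, 0.5)
l1 = np.abs(Fn - norm.cdf(xs)).sum() * (xs[1] - xs[0])

print(l1, 1 / np.sqrt(n))   # the computed distance sits below the bound
```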

Bounds on the Constant $c_1$

We can also prove the lower bound
$$c_1 \ge \frac{2\sqrt{\pi}(2\Phi(1) - 1) - (\sqrt{\pi} + \sqrt{2}) + 2\sqrt{2}\,e^{-1/2}}{\sqrt{\pi}} \approx 0.535377.$$

Supremum of $B(\mathcal{F}_\sigma)$

We want to compute $\sup_{G \in \mathcal{F}_\sigma} B(G)$ where
$$B(G) = \frac{2\sigma^2 \|G^* - G\|_1}{E|X|^3}.$$
We successively reduce, in four steps, the computation of the supremum of $B(G)$ over $\mathcal{F}_\sigma$ to computations over smaller collections of distributions.

First Reduction: $\sigma = 1$

Recall
$$B(G) = \frac{2\sigma^2 \|G^* - G\|_1}{E|X|^3}.$$
Since $B(aX) = B(X)$ for all $a \ne 0$, by this scaling property it suffices to consider $\mathcal{F}_1$.

Second Reduction: compact support

For $X \in \mathcal{F}_1$, show that there exist $X_n$, $n = 1, 2, \ldots$, each in $\mathcal{F}_1$ and having compact support, such that $B(X_n) \to B(X)$. Hence it suffices to consider the class $\mathcal{M} \subset \mathcal{F}_1$ of distributions with compact support.

Third Reduction: finite support

For $X \in \mathcal{M}$ show that there exist $X_n$, $n = 1, 2, \ldots$ in $\mathcal{M}$, finitely supported, such that $B(X_n) \to B(X)$. Hence it suffices to consider $\bigcup_{m \ge 3} \mathcal{D}_m$, where $\mathcal{D}_m$ is the collection of mean zero, variance one distributions supported on at most $m$ points.

Fourth Reduction: three point support

Use a convexity type property of $B(G)$, which depends on the behavior of the zero bias transformation on a mixture, to obtain $B(\mathcal{D}_3) = B(\bigcup_{m \ge 3} \mathcal{D}_m)$. Hence it suffices to consider $\mathcal{D}_3$.

Lastly, show $B(\mathcal{D}_3) = 1$.

Finding Extremes of Expectations

Arguments along these lines were first considered by Hoeffding for the calculation of the extremes of $E K(X_1, \ldots, X_n)$ where $X_1, \ldots, X_n$ are independent. Though $B(G)$ is not of this form, Hoeffding's reasoning applies. In some cases the final result obtained is not in closed form.

Reduction to Compact Support and Finite Support

Continuity of the zero bias transformation: if
$$X_n \to_d X \quad \text{and} \quad \lim_{n \to \infty} E X_n^2 = E X^2,$$
then $X_n^* \to_d X^*$ as $n \to \infty$. This leads to continuity of $B(G)$: if
$$X_n \to_d X, \quad \lim_{n \to \infty} E X_n^2 = E X^2 \quad \text{and} \quad \lim_{n \to \infty} E|X_n|^3 = E|X|^3,$$
then $B(X_n) \to B(X)$ as $n \to \infty$.

From $\bigcup_{m \ge 3} \mathcal{D}_m$ to $\mathcal{D}_3$

Let $X_\mu$ be the $\mu$-mixture of a collection $\{X_s, s \in S\}$ of mean zero, variance 1 random variables satisfying $E|X_\mu|^3 < \infty$. Then
$$B(X_\mu) \le \sup_{s \in S} B(X_s).$$
In particular, if $\mathcal{C}$ is a collection of mean zero, variance 1 random variables with finite absolute third moments and $\mathcal{D} \subset \mathcal{C}$ is such that every distribution in $\mathcal{C}$ can be represented as a mixture of distributions in $\mathcal{D}$, then $B(\mathcal{C}) = B(\mathcal{D})$.

Zero Biasing a Mixture

Theorem 2. Let $\{m_s, s \in S\}$ be a collection of mean zero distributions on $\mathbb{R}$ and $\mu$ a probability measure on $S$ such that the variance $\sigma_\mu^2$ of the mixture distribution $m_\mu$ is positive and finite. Then $m_\mu^*$, the $m_\mu$-zero biased distribution, exists and is given by the mixture
$$m_\mu^* = \int_S m_s^* \, d\nu(s) \quad \text{where} \quad \frac{d\nu}{d\mu} = \frac{\sigma_s^2}{\sigma_\mu^2}.$$
In particular, $\nu = \mu$ if and only if $\sigma_s^2$ is constant $\mu$-a.s.

Mixture of Constant Variance: $\nu = \mu$

Using the dual representation of the $L^1$ distance, with the supremum over $f$ of Lipschitz norm $\|f\|_L \le 1$,
$$\begin{aligned}
\|\mathcal{L}(X_\mu^*) - \mathcal{L}(X_\mu)\|_1 &= \sup_{\|f\|_L \le 1} |E f(X_\mu^*) - E f(X_\mu)| \\
&= \sup_{\|f\|_L \le 1} \Big| \int_S E f(X_s^*)\, d\mu - \int_S E f(X_s)\, d\mu \Big| \\
&\le \sup_{\|f\|_L \le 1} \int_S |E f(X_s^*) - E f(X_s)|\, d\mu \\
&\le \int_S \|\mathcal{L}(X_s^*) - \mathcal{L}(X_s)\|_1 \, d\mu.
\end{aligned}$$

$B(X_\mu) \le \sup_s B(X_s)$

The relation
$$\frac{d\tau}{d\mu} = \frac{E|X_s|^3}{E|X_\mu|^3} \qquad (2)$$
defines a probability measure $\tau$, as $E|X_\mu|^3 = \int_S E|X_s|^3 \, d\mu$.

$B(X_\mu) \le \sup_s B(X_s)$

Then
$$\begin{aligned}
B(X_\mu) &= \frac{2\|\mathcal{L}(X_\mu^*) - \mathcal{L}(X_\mu)\|_1}{E|X_\mu|^3} \le \frac{2\int_S \|\mathcal{L}(X_s^*) - \mathcal{L}(X_s)\|_1\, d\mu}{E|X_\mu|^3} \\
&= \frac{\int_S B(X_s) E|X_s|^3 \, d\mu}{E|X_\mu|^3} = \int_S B(X_s)\, d\tau \le \sup_{s \in S} B(X_s).
\end{aligned}$$

Reduction of D m, m > 3 The distribution of any X D m is determined by the values a 1 < < a m and probabilities p = (p 1,..., p m ), all positive. The vector p must satisfy Ap = c where A = 1 1... 1 a 1 a 2... a m a 2 1 a 2 2... a 2 m and c = 1 0 1 For m > 3 there exists vector v 0 satisfying Av = 0. Hence X D m can be represented of a mixture of two distributions in D m 1..

Reduction to $\mathcal{D}_3$

For every $m > 3$, every $G \in \mathcal{D}_m$ can be represented as a finite mixture of distributions in $\mathcal{D}_{m-1}$. Hence $B(\mathcal{D}_3) = B(\bigcup_{m \ge 3} \mathcal{D}_m)$. Every distribution in $\mathcal{D}_3$, with support points say $x < y < 0 < z$, can be written as
$$m_\alpha = \alpha m_1 + (1 - \alpha) m_0,$$
a mixture of the (unequal variance) mean zero distributions $m_1$ and $m_0$ supported on $\{x, z\}$ and $\{y, z\}$, respectively.

Mixture with Unequal Variance

For $\alpha \in [0, 1]$, with
$$m_\alpha = \alpha m_1 + (1 - \alpha) m_0$$
we have
$$m_\alpha^* = \beta m_1^* + (1 - \beta) m_0^* \quad \text{where} \quad \beta = \frac{\alpha x}{\alpha x + (1 - \alpha) y},$$
by Theorem 2, since the two point distributions have variances $\sigma_1^2 = -xz$ and $\sigma_0^2 = -yz$. Since $x < y < 0$,
$$\frac{\beta}{1 - \beta} = \frac{\alpha}{1 - \alpha} \cdot \frac{x}{y} > \frac{\alpha}{1 - \alpha},$$
and therefore $\beta > \alpha$.
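Theorem 2 and the value of $\beta$ can be checked numerically; a sketch assuming NumPy, comparing the zero bias density of $m_\alpha$ computed directly from $E[W \mathbf{1}(W > t)]/\mathrm{Var}(W)$ with the claimed mixture of uniforms (the specific $x, y, z, \alpha$ are illustrative):

```python
# Check that the zero bias density of m_α = α m_1 + (1-α) m_0 equals
# β·U[x,z] + (1-β)·U[y,z] with β = αx/(αx + (1-α)y).
import numpy as np

x, y, z, alpha = -2.0, -0.5, 1.0, 0.4
beta = alpha * x / (alpha * x + (1 - alpha) * y)

pts = np.array([x, y, z])
prob = np.array([alpha * z / (z - x),                       # m_α mass at x
                 (1 - alpha) * z / (z - y),                 # m_α mass at y
                 -alpha * x / (z - x) - (1 - alpha) * y / (z - y)])
var = prob @ pts**2

for t in [-1.5, -1.0, -0.25, 0.25, 0.75]:                   # avoid x, y, z
    direct = (prob * pts * (pts > t)).sum() / var           # E[W 1(W>t)]/σ²
    mix = beta / (z - x) * (x < t < z) + (1 - beta) / (z - y) * (y < t < z)
    print(round(direct, 6), round(mix, 6))                  # columns agree
```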

Calculating $B(\mathcal{D}_3)$

Write $m \in \mathcal{D}_3$ as $m_\alpha = \alpha m_1 + (1 - \alpha) m_0$ where $m_1$ and $m_0$ are mean zero two point distributions on $\{x, z\}$ and $\{y, z\}$, respectively, $x < y < 0 < z$. We need to bound
$$\|m_\alpha^* - m_\alpha\|_1. \qquad (3)$$
Any coupling of variables $Y_\alpha^*$ and $Y_\alpha$ with distributions $m_\alpha^*$ and $m_\alpha$, respectively, gives an upper bound on (3). Let $F_0, F_1, F_0^*, F_1^*$ be the distribution functions of $m_0, m_1, m_0^*$ and $m_1^*$, respectively.

Bound by Coupling

Set
$$(Y_1, Y_0, Y_1^*, Y_0^*) = \big(F_1^{-1}(U),\ F_0^{-1}(U),\ (F_1^*)^{-1}(U),\ (F_0^*)^{-1}(U)\big)$$
and let
$$\mathcal{L}(Y_\alpha, Y_\alpha^*) = \alpha \mathcal{L}(Y_1, Y_1^*) + (1 - \beta) \mathcal{L}(Y_0, Y_0^*) + (\beta - \alpha) \mathcal{L}(Y_0, Y_1^*).$$
Then $(Y_\alpha, Y_\alpha^*)$ has marginals $Y_\alpha =_d X_\alpha$ and $Y_\alpha^* =_d X_\alpha^*$, and therefore $\|m_\alpha^* - m_\alpha\|_1$ is upper bounded by
$$\alpha \|m_1^* - m_1\|_1 + (1 - \beta) \|m_0^* - m_0\|_1 + (\beta - \alpha) \|m_1^* - m_0\|_1.$$

Bound on $\mathcal{D}_3$

The goal is to have $\|m_\alpha^* - m_\alpha\|_1$, or its upper bound
$$\alpha \|m_1^* - m_1\|_1 + (1 - \beta) \|m_0^* - m_0\|_1 + (\beta - \alpha) \|m_1^* - m_0\|_1,$$
bounded by
$$E|X_\alpha|^3 / (2 E X_\alpha^2) = \beta \|m_1^* - m_1\|_1 + (1 - \beta) \|m_0^* - m_0\|_1,$$
the equality holding since $B = 1$ for two point distributions, that is, $\|m_i^* - m_i\|_1 = E|X_i|^3/(2\sigma_i^2)$, together with the definition of $\beta$. Hence it suffices to show
$$\|m_1^* - m_0\|_1 \le \|m_1^* - m_1\|_1.$$
This reduces to the computation of $L^1$ distances between the uniform distribution on $[x, z]$ (which is $m_1^*$) and the two point distributions on $\{y, z\}$ and $\{x, z\}$.

$\|m_1^* - m_0\|_1 \le \|m_1^* - m_1\|_1$

Here $m_0$ is supported on $\{y, z\}$ and $m_1$ on $\{x, z\}$, with $x < y < 0 < z$. The right hand side is
$$\|m_1^* - m_1\|_1 = \frac{z^2 + x^2}{2(z - x)}.$$
The left hand side, in the case where $F_1^*(y) \le F_0(y)$, is
$$\big[2(z - x)(z - y)^2\big]^{-1} \big( z^4 - 2yz^3 + x^2 z^2 - 2x^2 yz + 5y^2 z^2 + 3x^2 y^2 - 4xy^3 + 4xy^2 z - 4xyz^2 + 2y^4 - 4y^3 z \big).$$

Using Mathematica

Taking the difference, after much cancellation $\|m_1^* - m_1\|_1 - \|m_1^* - m_0\|_1$ is seen to equal
$$\frac{-4y^2 z^2 - 2x^2 y^2 + 4xy^3 - 4xy^2 z + 4xyz^2 - 2y^4 + 4y^3 z}{2(z - x)(z - y)^2},$$
which factors as
$$\frac{-y(y - x)\big(y^2 + 2z^2 - y(x + 2z)\big)}{(z - x)(z - y)^2}$$
and is positive, due to being in the case $F_1^*(y) \le F_0(y)$.
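The same computation can be reproduced in SymPy (an alternative to Mathematica; the polynomial below is the left hand side from the previous slide):

```python
# Subtract the two closed forms from the previous slide and factor the
# difference.
import sympy as sp

x, y, z = sp.symbols('x y z')
rhs = (z**2 + x**2) / (2 * (z - x))
lhs = (z**4 - 2*y*z**3 + x**2*z**2 - 2*x**2*y*z + 5*y**2*z**2 + 3*x**2*y**2
       - 4*x*y**3 + 4*x*y**2*z - 4*x*y*z**2 + 2*y**4 - 4*y**3*z) \
      / (2 * (z - x) * (z - y)**2)

diff = sp.factor(sp.simplify(rhs - lhs))
print(diff)   # equivalent to -y(y-x)(y² + 2z² - y(x+2z)) / ((z-x)(z-y)²)
```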

Bound over $\mathcal{D}_3$

Since $\|m_1^* - m_0\|_1 \le \|m_1^* - m_1\|_1$ we have
$$\|m_\alpha^* - m_\alpha\|_1 \le E|X_\alpha|^3 / (2 E X_\alpha^2),$$
and therefore $B(\mathcal{D}_3) \le 1$. Hence
$$1 \le B(\mathcal{D}_3) = B\Big(\bigcup_{m \ge 3} \mathcal{D}_m\Big) = B(\mathcal{M}) = B(\mathcal{F}_1) \le 1.$$

The Anti-Normal Distributions

$G \in \mathcal{F}_1$ is normal if and only if $B(G) = 0$; small $B(G)$ means close to normal. A mean zero two point distribution $G$ on $\{x, y\}$, $x < 0 < y$, achieves $\sup_{G \in \mathcal{F}_1} B(G)$, the worst case for $B(G)$, and so is "anti-normal".

Lower Bound

For $\mathcal{L}(X) = G \in \mathcal{F}_1$,
$$\|F_n - \Phi\|_1 \le \frac{c_1 E|X|^3}{\sqrt{n}} \quad \text{for all } n \in \mathbb{N},$$
and in particular for $n = 1$,
$$c_1 \ge \frac{\|F_1 - \Phi\|_1}{E|X|^3} = \frac{\|G - \Phi\|_1}{E|X|^3}.$$

Lower Bound: 0.535377...

For $B \sim \mathcal{B}(p)$ with $p \in (0, 1)$ and $q = 1 - p$, let $G_p$ be the distribution function of $X = (B - p)/\sqrt{pq}$. Then $\|G_p - \Phi\|_1$ equals
$$\int_{-\infty}^{-\sqrt{p/q}} \Phi(x)\, dx + \int_{-\sqrt{p/q}}^{\sqrt{q/p}} |\Phi(x) - q|\, dx + \int_{\sqrt{q/p}}^{\infty} |\Phi(x) - 1|\, dx,$$
and, since $E|X|^3 = (p^2 + q^2)/\sqrt{pq}$, letting
$$\psi(p) = \frac{\sqrt{pq}}{p^2 + q^2} \|G_p - \Phi\|_1 \quad \text{for } p \in (0, 1)$$
gives $c_1 \ge \psi(p)$, and in particular
$$c_1 \ge \psi(1/2) = \frac{2\sqrt{\pi}(2\Phi(1) - 1) - (\sqrt{\pi} + \sqrt{2}) + 2\sqrt{2}\, e^{-1/2}}{\sqrt{\pi}}.$$
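A numeric check of the closed form (assuming NumPy/SciPy): at $p = 1/2$ the prefactor $\sqrt{pq}/(p^2 + q^2)$ equals 1, so $\psi(1/2) = \|G_{1/2} - \Phi\|_1$, the $L^1$ distance between the fair $\pm 1$ coin and the normal.

```python
# Closed form for ψ(1/2) versus a direct numeric integration of
# ||G_{1/2} - Φ||_1, where G_{1/2} is the CDF of a fair ±1 coin.
import numpy as np
from scipy.stats import norm

closed = (2 * np.sqrt(np.pi) * (2 * norm.cdf(1) - 1)
          - (np.sqrt(np.pi) + np.sqrt(2))
          + 2 * np.sqrt(2) * np.exp(-0.5)) / np.sqrt(np.pi)

xs = np.linspace(-10, 10, 4_000_001)
G = np.where(xs < -1, 0.0, np.where(xs < 1, 0.5, 1.0))
numeric = np.abs(norm.cdf(xs) - G).sum() * (xs[1] - xs[0])

print(closed, numeric)   # both ≈ 0.535377
```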

Upper Bounds in the Classical Case

Classical case $p = \infty$:
1. 1.88 / 7.59 (Berry / Esseen, 1941/1942)
2. ...
3. 0.7975 (P. van Beek, 1972)
4. 0.7655 (I. S. Shiganov, 1986)
5. 0.7056 (I. G. Shevtsova, 2006)
6. 0.4785 (I. Tyurin, 2010)

Higher Order Hermite Functionals

Letting $H_k(x)$ be the $k$-th Hermite polynomial, if the moments of $X$ match those of the standard normal up to order $2k$, then there exists $X^{(k)}$ such that
$$E[H_k(X) f(X)] = E[f^{(k)}(X^{(k)})].$$
Can one compute extreme values of the natural generalizations of $B(G)$, such as
$$B_k(G) = \frac{\sigma^{2k} \|\mathcal{L}(X^{(k)}) - \mathcal{L}(X)\|_1}{E|X|^{2k+1}},$$
which might be the values of the analogous constants when higher moments match those of the normal?