A Gentle Introduction to Stein's Method for Normal Approximation I


A Gentle Introduction to Stein's Method for Normal Approximation I. Larry Goldstein, University of Southern California.

Introduction to Stein's Method for Normal Approximation

1. Much activity since Stein's 1972 paper.
2. Any introduction, and this one in particular, is necessarily somewhat biased: a selective tour of a larger area than the one presented.

Introduction to Stein's Method for Normal Approximation

I. Background, Stein Identity, Equation, Bounds
II. Size Bias Couplings
III. Exchangeable Pair, Zero Bias Couplings
IV. Local Dependence, Nonsmooth Functions

De Moivre/Laplace Theorem, 1733/1820. Let X_1, ..., X_n be independent and identically distributed (i.i.d.) random variables with distribution P(X_1 = 1) = P(X_1 = -1) = 1/2, and let S_n = X_1 + ... + X_n and W_n = S_n/√n. Then W_n →_d Z with Z ~ N(0, 1), that is, for all x ∈ R,

lim_{n→∞} P(W_n ≤ x) = Φ(x) := ∫_{-∞}^x (1/√(2π)) e^{-z²/2} dz.
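This convergence is easy to check by simulation. The sketch below is illustrative only (the helper names `demoivre_laplace` and `phi_cdf`, and the choices n = 100 and 10000 replications, are mine, not the lecture's); it estimates P(W_n ≤ x) for fair ±1 flips and compares with Φ(x).

```python
import math
import random

def phi_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def demoivre_laplace(n, reps, x, seed=0):
    """Estimate P(W_n <= x) for W_n = S_n/sqrt(n), S_n a sum of n fair +/-1 flips."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        if s / math.sqrt(n) <= x:
            hits += 1
    return hits / reps

# Empirical CDF of W_100 at x = 0.5 versus the normal limit Phi(0.5).
print(demoivre_laplace(n=100, reps=10000, x=0.5), phi_cdf(0.5))
```

The two printed values agree up to discretization and Monte Carlo error.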

The Central Limit Theorem. Let X_1, ..., X_n be i.i.d. with mean µ and variance σ². Then

W_n = (S_n - nµ)/(σ√n) →_d Z as n → ∞.

The identical distribution assumption may be relaxed by imposing the Lindeberg condition.

The Central Limit Theorem. Let X_1, ..., X_n be independent and identically distributed with mean µ and variance σ². Then W_n = (S_n - nµ)/(σ√n) →_d Z as n → ∞. If in addition E|X_1 - µ|³ < ∞, a bound on the distance between the distribution function F_n of W_n and Φ is provided by the Berry-Esseen Theorem:

sup_{-∞<x<∞} |F_n(x) - Φ(x)| ≤ c E|X_1 - µ|³/(σ³√n),

where c ≤ 0.7056 [Shevtsova (2006)].
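As a worked instance of the bound (helper name and numbers are my own, for illustration): Rademacher summands have µ = 0, σ = 1 and E|X_1|³ = 1, so the Berry-Esseen bound reduces to 0.7056/√n.

```python
import math

def berry_esseen_bound(n, third_abs_moment=1.0, sigma=1.0, c=0.7056):
    """Berry-Esseen bound c * E|X_1 - mu|^3 / (sigma^3 * sqrt(n)) for i.i.d. sums."""
    return c * third_abs_moment / (sigma ** 3 * math.sqrt(n))

# For fair +/-1 flips, n = 10000 summands give Kolmogorov distance
# at most about 0.007.
print(berry_esseen_bound(10000))
```

Note the 1/√n rate: quadrupling n halves the bound.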

Simple Random Sampling. Let A = {a_1, ..., a_N} be a collection of real numbers, and X_1, ..., X_n a simple random sample of A of size 0 < n < N. Prove that W_n = (S_n - ES_n)/√Var(S_n) satisfies W_n →_d Z as n → ∞. As the variables are dependent, the classical CLT does not apply.

Simple Random Sampling. Proved by Hájek in 1960, over 200 years after Laplace. Demonstrating that the sum of a simple random sample is asymptotically normal is, due to the dependence, somewhat harder than handling the independent case.

Approach by Stein's Method. Without loss of generality, suppose that ∑_{a∈A} a = 0. Let X, X', X_2, ..., X_n be a simple random sample of size n + 1 from A, and set

W = X + ∑_{i=2}^n X_i and W' = X' + ∑_{i=2}^n X_i.

The pair (W, W') is exchangeable, that is, (W, W') =_d (W', W).

Approach by Stein's Method. In addition to (W, W') being an exchangeable pair,

E[X | W] = (1/n)W and E[X' | W] = -(1/(N - n))W.

As W' - W = X' - X, with 2 < n < N - 1,

E[W' | W] = (1 - λ)W for λ = N/(n(N - n)) ∈ (0, 1).

We call such a (W, W') a Stein pair. We are handling the random variables directly.
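The linearity E[W'|W] = (1 - λ)W implies E[W'W] = (1 - λ)E[W²], which can be confirmed by exact enumeration over all ordered samples of a small population. The helper below is an illustrative sketch (its name and the ten-point population are my choices):

```python
import itertools

def stein_pair_ratio(A, n):
    """Compute E[W'W] / E[W^2] exactly over all ordered samples of size n + 1.
    For a Stein pair, E[W'|W] = (1 - lambda)W, so this ratio equals 1 - lambda."""
    num = den = 0.0
    for t in itertools.permutations(A, n + 1):
        x, xp, rest = t[0], t[1], sum(t[2:])
        w, wp = x + rest, xp + rest
        num += wp * w
        den += w * w
    return num / den

A = list(range(-5, 0)) + list(range(1, 6))  # sums to zero, N = 10
N, n = len(A), 4
lam = N / (n * (N - n))                     # lambda = N / (n(N - n)) = 5/12
print(stein_pair_ratio(A, n), 1 - lam)      # both equal 7/12 = 0.58333...
```

Enumeration is exact here, so the agreement is to floating-point precision, not a Monte Carlo approximation.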

Approach by Stein's Method. The construction of this Stein pair suggests that Stein's method might be used to prove Hájek's theorem, and additionally to provide a Berry-Esseen bound. Bounds to the normal using these types of approaches typically reduce to the computation of, or bounds on, low order moments, perhaps even only the variances of certain quantities. Under the principle that there is no such thing as a free lunch, such variance computations may be difficult.

Stein's Lemma. Characterization of the normal distribution: Z ~ N(0, 1) if and only if

E[Zf(Z)] = E[f'(Z)]

for all absolutely continuous functions f such that E|f'(Z)| < ∞.

Stein's Lemma. Characterization of the N(0, 1) distribution: E[Zf(Z)] = E[f'(Z)] if and only if Z ~ N(0, 1). Note that the N(0, 1) density function

φ(z) = (1/√(2π)) e^{-z²/2}

satisfies the dual equation -zφ(z) = φ'(z).
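The characterization invites a Monte Carlo sanity check (an illustrative sketch; the function name and sample size are mine). For f(z) = z³ both sides equal EZ⁴ = E[3Z²] = 3, so the estimated gap should be near zero.

```python
import random

def stein_identity_gap(f, fprime, reps=200000, seed=1):
    """Monte Carlo estimate of E[Z f(Z)] - E[f'(Z)] for Z ~ N(0, 1); by
    Stein's Lemma this is 0 for absolutely continuous f with E|f'(Z)| finite."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        total += z * f(z) - fprime(z)
    return total / reps

# f(z) = z^3, f'(z) = 3 z^2: the estimated gap should be small.
print(stein_identity_gap(lambda z: z ** 3, lambda z: 3 * z ** 2))
```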

Proof of Stein's Lemma. "If" direction: assume Z is standard normal. Break Ef'(Z) into two parts to handle separately:

Ef'(Z) = ∫_{-∞}^∞ f'(z)φ(z) dz = ∫_0^∞ f'(z)φ(z) dz + ∫_{-∞}^0 f'(z)φ(z) dz.

Proof of Stein's Lemma. Consider the first integral. Since -zφ(z) = φ'(z) gives φ(z) = ∫_z^∞ yφ(y) dy, using Fubini's theorem to change the order of integration we obtain

∫_0^∞ f'(z)φ(z) dz = ∫_0^∞ f'(z) ∫_z^∞ yφ(y) dy dz = ∫_0^∞ ∫_0^y f'(z) yφ(y) dz dy = ∫_0^∞ [f(y) - f(0)] yφ(y) dy.

Proof of Stein's Lemma. Similarly, for the second integral,

∫_{-∞}^0 f'(z)φ(z) dz = ∫_{-∞}^0 [f(y) - f(0)] yφ(y) dy.

Combining the two, and using EZ = 0, gives

Ef'(Z) = E Z[f(Z) - f(0)] = E Zf(Z).
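The half-line identity just derived can be spot-checked by quadrature; the sketch below (helper names and the test function f(z) = sin z are my own choices) compares the two sides on [0, ∞):

```python
import math

def trap(g, lo, hi, steps=20000):
    """Trapezoid rule for the integral of g over [lo, hi]."""
    dx = (hi - lo) / steps
    s = 0.5 * (g(lo) + g(hi)) + sum(g(lo + i * dx) for i in range(1, steps))
    return s * dx

phi = lambda u: math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)

# f(z) = sin z has f(0) = 0 and f'(z) = cos z; compare
# int_0^inf f'(z) phi(z) dz with int_0^inf [f(y) - f(0)] y phi(y) dy,
# truncating the tail at 10 where phi is negligible.
lhs = trap(lambda z: math.cos(z) * phi(z), 0.0, 10.0)
rhs = trap(lambda y: math.sin(y) * y * phi(y), 0.0, 10.0)
print(lhs, rhs)
```

Both sides evaluate to E[cos Z]/2 = e^{-1/2}/2 ≈ 0.3033 up to quadrature error.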

Stein's Lemma, N(µ, σ²). If X ~ N(µ, σ²), then (X - µ)/σ ~ N(0, 1), so

E[((X - µ)/σ) g((X - µ)/σ)] = E g'((X - µ)/σ).

Letting f(x) = g((x - µ)/σ), we have f'(x) = σ^{-1} g'((x - µ)/σ), or σf'(x) = g'((x - µ)/σ), so, in general,

X ~ N(µ, σ²) if and only if E(X - µ)f(X) = σ² Ef'(X).

Stein's Method: Basic Idea. If F is a sufficiently large class of functions, and

E(X - µ)f(X) = σ² Ef'(X) for all f ∈ F,

then X ~ N(µ, σ²). Hence, if

E(W - µ)f(W) ≈ σ² Ef'(W) for all f ∈ F,

then W should be approximately N(µ, σ²). Hence, we would like to show that

E[(W - µ)f(W) - σ² f'(W)] ≈ 0.

Distance to Normality. Let X and Y be random variables. Many distances between the distributions of X and Y can be given in the form

d(X, Y) = sup_{h∈H} |Eh(X) - Eh(Y)|

for some class of functions H. Distance may also be given in terms of functions of the distribution functions F and G of X and Y, respectively.

Kolmogorov, or L^∞, Distance. Letting X and Y have distribution functions F and G, respectively,

‖F - G‖_∞ = sup_{-∞<z<∞} |F(z) - G(z)| = sup_{h∈H} |Eh(X) - Eh(Y)|

for H = {1_{(-∞,z]}(x) : z ∈ R}.
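For a concrete instance (an illustrative computation, not from the lecture): for the De Moivre-Laplace sum W_n the Kolmogorov distance to Φ can be computed exactly, since the sup is attained at a jump of the step distribution function of W_n.

```python
import math

def phi_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_rademacher(n):
    """Exact sup_x |F_n(x) - Phi(x)| for W_n = S_n/sqrt(n), S_n a sum of
    n fair +/-1 flips; S_n = 2k - n with probability C(n, k)/2^n."""
    dist, cdf = 0.0, 0.0
    for k in range(n + 1):
        x = (2 * k - n) / math.sqrt(n)           # atom of W_n, increasing in k
        p = math.comb(n, k) / 2.0 ** n           # its probability
        dist = max(dist, abs(cdf - phi_cdf(x)))  # just below the jump
        cdf += p
        dist = max(dist, abs(cdf - phi_cdf(x)))  # at the jump
    return dist

# For n = 100 the exact distance (~0.04) sits below the
# Berry-Esseen bound 0.7056/sqrt(100) ~ 0.07.
print(kolmogorov_rademacher(100))
```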

Wasserstein, or L¹, Distance.

‖F - G‖_1 = ∫ |F(z) - G(z)| dz = sup_{h∈H} |Eh(X) - Eh(Y)|

for H = {h : |h(x) - h(y)| ≤ |x - y|}. We have also that

‖F - G‖_1 = inf E|X - Y|,

the infimum taken over all couplings of X and Y on a joint space with the given marginals.
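The coupling identity can be illustrated on samples (a sketch with my own names and sizes): for two empirical laws with equally many atoms, an optimal L¹ coupling pairs sorted order statistics, so for N(0, 1) versus N(0.5, 1) data the distance should land near the mean shift 0.5.

```python
import random

def wasserstein1_empirical(xs, ys):
    """L1 distance between two empirical laws with equally many atoms:
    the optimal coupling pairs sorted order statistics."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

rng = random.Random(3)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [rng.gauss(0.5, 1.0) for _ in range(5000)]
print(wasserstein1_empirical(xs, ys))  # ~ 0.5, the shift in mean
```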

Total Variation Distance.

‖F - G‖_TV = sup_{h∈H} |Eh(X) - Eh(Y)|,

where H = {h : 0 ≤ h ≤ 1}.

From the Stein Identity to the Stein Equation. Can measure the distance from W to a standard normal Z by

|Eh(W) - Nh| for Nh = Eh(Z), over h ∈ H.

The Stein identity says the discrepancy between W and Z is also reflected in E[f'(W) - Wf(W)].

From the Stein Identity to the Stein Equation. Can measure the distance from W to a normal Z by

|Eh(W) - Nh| for Nh = Eh(Z), over h ∈ H.

The Stein identity says the discrepancy between W and Z is also reflected in E[f'(W) - Wf(W)]. Equating these quantities at w yields the Stein Equation:

f'(w) - wf(w) = h(w) - Nh.

Stein Equation.

f'(w) - wf(w) = h(w) - Nh. (1)

Goal: given h ∈ H, compute (a bound on) Eh(W) - Nh. Approach: given h, solve (1) for f and instead compute E[f'(W) - Wf(W)]. A priori, this appears harder. It also requires bounds on f in terms of h.

Solution to the Stein Equation. For h such that Nh exists, write

f'(w) - wf(w) = h(w) - Nh (2)

as

(d/dw)[e^{-w²/2} f(w)] = e^{-w²/2}[h(w) - Nh],

and

f(w) = e^{w²/2} ∫_{-∞}^w [h(z) - Nh] e^{-z²/2} dz

is the unique bounded solution to (2). Note also that f has one more derivative than h.

Solution to the Stein Equation. Since Eh(Z) - Nh = 0,

f(w) = e^{w²/2} ∫_{-∞}^w [h(u) - Nh] e^{-u²/2} du = -e^{w²/2} ∫_w^∞ [h(u) - Nh] e^{-u²/2} du.
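This formula can be checked numerically. The sketch below (the quadrature helper and step sizes are my own choices) builds f for the indicator h = 1_{(-∞,0]}, for which Nh = Φ(0) = 1/2, and verifies the Stein equation at a test point by a central difference:

```python
import math

def stein_solution(w, h, Nh, lo=-10.0, steps=20000):
    """f(w) = e^{w^2/2} * int_{lo}^{w} [h(u) - Nh] e^{-u^2/2} du by the
    trapezoid rule; lo = -10 stands in for -infinity."""
    dx = (w - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (h(u) - Nh) * math.exp(-u * u / 2.0)
    return math.exp(w * w / 2.0) * total * dx

h = lambda u: 1.0 if u <= 0.0 else 0.0   # indicator of (-inf, 0], Nh = 1/2
w, eps = -1.0, 1e-4
fp = (stein_solution(w + eps, h, 0.5) - stein_solution(w - eps, h, 0.5)) / (2 * eps)
residual = fp - w * stein_solution(w, h, 0.5) - (h(w) - 0.5)
print(abs(residual))  # ~ 0, up to discretization error
```

At w = 0 the same routine returns √(2π)/4 ≈ 0.6267, the maximal value appearing later in the indicator bounds.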

Bounds on the Solution to the Stein Equation. Example: if f is the unique bounded solution to the Stein equation for a bounded function h, then

‖f‖ ≤ √(π/2) ‖h - Nh‖.

Bounds on the Solution to the Stein Equation. For w ≤ 0, we have

|f(w)| = |e^{w²/2} ∫_{-∞}^w [h(u) - Nh] e^{-u²/2} du| ≤ sup_{x≤0} |h(x) - Nh| e^{w²/2} ∫_{-∞}^w e^{-u²/2} du,

while

(d/dw)[e^{w²/2} ∫_{-∞}^w e^{-u²/2} du] = 1 + w e^{w²/2} ∫_{-∞}^w e^{-u²/2} du > 0,

so e^{w²/2} ∫_{-∞}^w e^{-u²/2} du is increasing over (-∞, 0], and its maximum there is attained at w = 0.

Bounds on the Solution to the Stein Equation. For w ≤ 0,

|f(w)| ≤ sup_{x≤0} |h(x) - Nh| e^{w²/2} ∫_{-∞}^w e^{-u²/2} du
≤ sup_{x≤0} |h(x) - Nh| ∫_{-∞}^0 e^{-u²/2} du
= sup_{x≤0} |h(x) - Nh| √(2π) ((1/√(2π)) ∫_{-∞}^0 e^{-u²/2} du)
= sup_{x≤0} |h(x) - Nh| √(π/2).

A similar bound holds for w ≥ 0.

Bounds on the Solution of the Stein Equation. If h is bounded and absolutely continuous with ‖h'‖ < ∞, then

‖f‖ ≤ √(π/2) ‖h - Nh‖, ‖f'‖ ≤ 2‖h - Nh‖, and ‖f''‖ ≤ 2‖h'‖.

Indicator Bounds. For h_z(w) = 1_{(-∞,z]}(w) [Chen and Shao (2005), Singapore lecture notes],

0 < f(w) ≤ √(2π)/4 and |f'(w)| ≤ 1.

We will also use one additional bound for smoothed indicators, which decrease from 1 to 0 over an interval of some small length λ > 0.

Proof of Stein's Lemma. "Only if" direction: suppose E[f'(Z)] = E[Zf(Z)] for all absolutely continuous functions f with E|f'(Z)| < ∞. Let f(w) be the solution to the Stein Equation for h(w) = 1_{(-∞,z]}(w). Then f is absolutely continuous with ‖f'‖ < ∞, so E|f'(Z)| < ∞. Hence

E[1_{(-∞,z]}(Z) - Φ(z)] = E[f'(Z) - Zf(Z)] = 0,

that is,

P(Z ≤ z) = E 1_{(-∞,z]}(Z) = Φ(z) for all z ∈ R.

Generator Approach to the Stein Equation [Barbour (1990), Götze (1991)]. Note that, with one more derivative,

(Af)(w) = σ² f''(w) - w f'(w)

is the generator of the Ornstein-Uhlenbeck process W_t,

dW_t = √2 σ dB_t - W_t dt,

which has the normal N(0, σ²) as its unique stationary distribution.
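The stationarity claim can be checked by simulating the SDE. The Euler-Maruyama sketch below is illustrative (step size, horizon, and path count are arbitrary choices of mine); started from 0, the time-T marginal should have moments near those of N(0, σ²):

```python
import math
import random

def ou_moments(sigma, T=10.0, dt=0.01, paths=1000, seed=7):
    """Euler-Maruyama simulation of dW_t = sqrt(2) sigma dB_t - W_t dt from
    W_0 = 0; returns the sample mean and variance of W_T over independent paths."""
    rng = random.Random(seed)
    steps = int(T / dt)
    vals = []
    for _ in range(paths):
        w = 0.0
        for _ in range(steps):
            w += -w * dt + math.sqrt(2.0 * dt) * sigma * rng.gauss(0.0, 1.0)
        vals.append(w)
    m = sum(vals) / paths
    v = sum((x - m) ** 2 for x in vals) / (paths - 1)
    return m, v

m, v = ou_moments(sigma=1.0)
print(m, v)  # mean near 0, variance near sigma^2 = 1
```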

Generator Approach. Letting T_t be the transition operator of the Ornstein-Uhlenbeck process W_t, t ≥ 0,

(T_t h)(x) = E[h(W_t) | W_0 = x],

a solution to Af = h - Nh is given by

f = -∫_0^∞ (T_t h - Nh) dt.

The technique works also in higher dimensions, and in more abstract spaces.

Coming Attractions

II. Size Bias Couplings
III. Exchangeable Pair, Zero Bias Couplings
IV. Local Dependence, Nonsmooth Functions