Lecture 3: Central Limit Theorem Scribe: Jacy Bird (Division of Engineering and Applied Sciences, Harvard) February 8, 003 The goal of today s lecture is to investigate the asymptotic behavior of P N ( x) for large N. We use Laplace s Method to show that a well-behaved random variable tends to a multivariate normal distribution. This lecture also introduces normalized cumulants and briefly describes how these affect the distribution shape. General Distribution As we saw in previous lecture, the probability density function for the random-walk position after N independent steps, X N, is given by ( N ) e ik x d d k pˆ n (k) (π) d. () n= In the case where p(x) is iid, Equation () simplifies to e ik x (ˆp(k)) N d d k (π) d. () Here we have defined p(x) to be the PDF for each displacement and P N (x) to be the PDF for the position, X N, after N (IID) steps. Thus it follows that there are two moment generating functions (MGF) of interest: ˆp(k) is the MGF for x n and ˆP N (k) is the MGF for X N. Likewise, we can define a cumulant generating function (CGF) for steps x n and X N. As we saw in Lecture #, the CGF for steps x n is defined as ψ(k) = log ˆp(k) (3) = ic k k c k +... Since cumulants for IID random variables are additive, the cumulant generating function for X N is denoted ψ N = (k) log ˆP N (k) = Nψ(k). As shown in Lecture #, the variance of the distribution, σ N, equals C,N = Nc which is equivalent to Nσ. Thus the width of the distribution, σ N, scales like N. Shape of the Distribution for Large N The exact expression for P N (x) becomes increasingly complex as N increases; however, for sufficiently large N the distribution tends to a general shape. In the last section (and last lecture),
M. Z. Bazant 8.366 Random Walks and Diffusion Lecture 3 p hat (k) p hat (k) N 0.8 Increasing N 0.6 0.4 0. 0.5 0.5 0 0.5.5 k ε Figure : The dominant contribution of the integral occurs within a region ε away from the maximum point as N. it was showed that the width of this distribution scales as N. In this section we are going to investigate the N scaling more formally and use Laplace s Method to derive the asymptotic distribution shape. For a review of Asymptotic Methods, the reader is referred to Bender and Orszag (978). Laplace s Method takes advantage of the fact that the dominant contribution of an integral f(k)e Nφ(x,k) dk for large N occurs a small distance, ε, from the max φ(k) (Fig. ). We run into potential problems if we let x and N both go to infinity, as will be discussed in a later lecture. Therefore for the time being, we will only look at the central region of the distribution. Let s begin by rewriting Equation () in terms of the cumulant generating function, e ik x+nψ(k) allk d d k (π) d (4) For large N, the dominant contribution of the integral is near the origin, so we can collapse the limits of integration to a small region around the origin (this approximation introduces exponentially small error). Thus for large N, P N (x) k <ɛ e ik x e Nψ(k) d d k (π) d (5) The next step is to Taylor expand ψ(k) about the origin which as was done in Equation (3). This leads to P N (x) e ik x e inc k N k c d k+... d k (π) d (6) k <ɛ See Bender and Orszag, Advanced Mathematical Methods for Scientists and Engineers, Sec. 6.4 (Springer, 978), for a detailed discussion on Laplace s Method.
M. Z. Bazant 8.366 Random Walks and Diffusion Lecture 3 3 For sufficiently large N we can truncate the series after the k c k term. Now we expand the limits of integration back to to (again, this only introduces exponentially small errors). Thus for sufficiently large N, P N (x) e ik x e inc k N k c d k d k allk (π) d (7) e ik (x Nc ) e k (Nc ) k d d k (π) d (8) The next step is to transform this expression into a Gaussian integral. More specifically we want to transform e k (Nc ) k to e w /. Provided that c is positive definite and symmetric, we define w Nc k (9) Note : It s worth reviewing what is meant by the square root of a matrix (or nd order tensor). From linear algebra we know that M can be decomposed into QAQ T, where Q is orthogonal (Q = Q T, Q = ) and A is diagonal (A ij = a i δ ij with a i > 0 being the eigenvalues of M). In general, we can define M raised to the α power as M α QA α Q T. Here A α is diagonal with (A α ) ij = a α i δ ij. Hence, the square root of M is defined above with α =. Returning to the distribution, we make the change of variable w = Nc k (0) d d w = Nc d d k () = N d/ c d d k () In these equations c is the determinant of c and equals d i= ai where a i are the eigenvalues of c. With the substitution, Equation (8) is rewritten as P N (x) ) e i(x Nc Nc w e w (π) d N d d d w c (3) Equation (3) can be further simplified by defining a new variable, z x Nc c P N (x) c (π) d N d e iz w e w d d w (4) The above equation [Equation (4)] is expressed in terms of a multivariate Gaussian distribution. However, this distribution can also be interpreted as a product of d -D Gaussian distributions taken separately. Thus Equation (4) can be rewritten as P N (x) N d ( c e iz e w dw π ) ( e iz e w ) dw... (5) π
M. Z. Bazant 8.366 Random Walks and Diffusion Lecture 3 4 Noting that integrals in the above equation are inverse Fourier Transforms of the Gaussian e w, we simplify the expression to z P N (x) c e z e... (6) π π N d P N (x) e z (πn) d c (7) The transform to z can also be viewed as the definition of a new random variable. In the last few steps, we have actually shown that the scaled random variable Z N = (X Nc )(Nc ) = (XN X N ) c /,N (8) has a PDF, φ N (z), that tends to a multivariate standard normal distribution φ N (z) e z /. (9) (π) d/ This result is known as the Multidimensional Central Limit Theorem (CLT). Note : P N (x) d d x = φ N (z) d d z Note 3: For d =, Z N = X Nc σ since c N = σ. The CLT only applies to the central region of P N (x) where z = O(). This implies that in multiple dimensions vcx N = Nc + O( Nc ) which reduces to X N = Nc + O(σ N) when d =. Note 4: Note: An explicit assumption throughout has been the existence of c and c. If c is not finite, the CLT does not hold. Note 5: For a more rigorous derivation of the CLT, the reader is referred to W. Feller, An Introduction to Theory of Probability. Vol. I,II (970). Gnedenko, Kolmogorov (968). Note 6: Equation (9) is an asymptotic solution and it appears that the mean, c, and variance, c, of the distribution are of importance. However as we will see in the next section and in subsequent lectures, higher cumulants become important when we either look at smaller N or if we investigate the tails of the distribution.
M. Z. Bazant 8.366 Random Walks and Diffusion Lecture 3 5 0.6 0.5 Skewness Standard Normal λ 3 > 0 λ 3 < 0 0.6 0.5 Kurtosis Standard Normal λ 4 > 0 λ 4 < 0 0.4 0.4 0.3 0.3 φ(z) φ(z) 0. 0. 0. 0. 0 0 0. 4 3 0 3 4 Z 0. 4 3 0 3 4 Z Figure : Effect of skewness on distribution shape. Figure 3: shape. Effect of kurtosis on distribution Convergence in the Central Region For simplicity, let s consider d=, though it is not difficult to generalize to higher dimensions. We begin again with the distribution e ikx Nψ(k) dk e π We apply Laplace s Method to higher orders of asymptotic expansion as N (Z N remains fixed). The first step is to note that the dominate contribution to the integral occurs near k = 0, so we taylor expand ψ(k) about the origin producing ɛ ɛ e ikx e N[ ic k c k + i 3! c 3k 3 + 4! c 4k 4 +...] dk π where the imaginary numbers result from the definition of the cumulants. Substituting w = σ Nk and z = x Nc σ, the above equation reduces to N φ N (z) ɛ ɛ e iωz e ω λ 3 (iω) 3 e 3! + λ 4 N (iω) 4 4! N (0) + dω π, () where λ 3 = c 3, λ σ 3 4 = c 4,... λ σ 4 m = cm σ are the normalized cumulants of p(x). The normalized m cumulants play a large role in determining the distribution shape (Fig. and Fig. 3). For sufficiently large N, we can truncate the Taylor series and then expand the limits of integration back to to. That is, as N [ φ N (z) e iωz e ω + λ 3 (iω) 3 + λ 4 (iω) 4 3! N 4! N + ( ) λ3 (iω) 6 ( ) ] ω 5 3! N + O N dω N π. () The topics in the last part of this lecture where presented more thoroughly in 003. readers benefit, I have included part of the 003 notes by David Vener. To the
M. Z. Bazant 8.366 Random Walks and Diffusion Lecture 3 6 Noticing that allows us to rewrite Equation () as φ N (z) e z [ + λ 3 π 3! (iω) n e iωz e ω = ( ) n d n dz n e iωz e ω { H 3 (z) + λ 4 N N 4! H 4 (z) + where H n (z) is the n th Hermite polynomial defined by d n dz n e z ( ) n H n (z) e z. ( ) ( ) λ3 H 6 (z)} ] z 5 + O 3! N, (3) N Note 7: The Hermite polynomials come up in many different places in mathematical physics, including, perhaps most prominently, in the eigenfunctions for the quantum harmonic oscillator. The first few Hermite polynomials are H = z, H = z, H 3 = z 3 3z, H 4 = z 4 6z + 3, H 5 = z 5 0z 3 + 5z, H 6 = z 6 5z 4 + 5z 5. where Therefore, as N we can write Equation (3) as φ N (z) e z [ + h 3 (z) + h ] 4 (z) π N N +, (4) h 3 (z) = λ 3 3! H 3 (z), (5) h 4 (z) = λ 4 4! H 4 (z) + ( ) λ3 H 6 (z). (6) 3! Note 8: In dimension, this general expansion related to the CLT is often called the Edgeworth expansion in the mathematical literature. (See, e.g. W. Feller, An Introduction to Probability, Vol..) In more dimensions when related to random flights, expansions of this type are sometimes called Gram-Charlier Expansions. (See, e.g. B. Hughes, Random Walks and Random Environments.)