Gaussian vectors and central limit theorem


Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86

Outline 1 Real Gaussian random variables 2 Random vectors 3 Gaussian random vectors 4 Central limit theorem 5 Empirical mean and variance Samy T. Gaussian vectors & CLT Probability Theory 2 / 86

Outline 1 Real Gaussian random variables 2 Random vectors 3 Gaussian random vectors 4 Central limit theorem 5 Empirical mean and variance Samy T. Gaussian vectors & CLT Probability Theory 3 / 86

Standard Gaussian random variable
Definition: Let X be a real valued random variable. X is called standard Gaussian if its probability law admits the density
f(x) = (1/√(2π)) exp(−x²/2), x ∈ R.
Notation: We denote by N_1(0, 1) or N(0, 1) this law.
Samy T. Gaussian vectors & CLT Probability Theory 4 / 86

Gaussian random variable and expectations
Reminder:
1. For all bounded measurable functions g, we have
E[g(X)] = (1/√(2π)) ∫_R g(x) exp(−x²/2) dx.
2. In particular,
∫_R exp(−x²/2) dx = √(2π).
Samy T. Gaussian vectors & CLT Probability Theory 5 / 86

Gaussian moments
Proposition 1. Let X ~ N(0, 1). Then
1. For all z ∈ C, we have
E[exp(zX)] = exp(z²/2).
As a particular case, we get
E[exp(ıtX)] = e^{−t²/2}, t ∈ R.
2. For all n ∈ N, we have
E[X^n] = 0 if n is odd, and E[X^n] = (2m)!/(m! 2^m) if n is even, n = 2m.
Samy T. Gaussian vectors & CLT Probability Theory 6 / 86
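A quick numerical illustration of Proposition 1 (a sketch added for this transcript, not part of the original slides; sample size and seed are arbitrary choices): Monte Carlo estimates of E[X^n] for X ~ N(0, 1), compared with the closed-form values 0 (n odd) and (2m)!/(m! 2^m) (n = 2m even).

    # Monte Carlo check of the Gaussian moment formula (illustration only)
    import math
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal(1_000_000)

    for n in range(1, 7):
        mc = np.mean(X**n)
        if n % 2 == 1:
            exact = 0.0
        else:
            m = n // 2
            exact = math.factorial(2 * m) / (math.factorial(m) * 2**m)
        print(f"n={n}: Monte Carlo {mc:+.4f}, exact {exact:+.4f}")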

Proof
(i) Definition of the transform: since ∫_R exp(zx − x²/2) dx is absolutely convergent for all z ∈ C, the quantity ϕ(z) = E[e^{zX}] is well defined and
ϕ(z) = (1/√(2π)) ∫_R exp(zx − x²/2) dx.
(ii) Real case: Let z ∈ R. The decomposition zx − x²/2 = −(x − z)²/2 + z²/2 and the change of variable y = x − z give
ϕ(z) = e^{z²/2}.
Samy T. Gaussian vectors & CLT Probability Theory 7 / 86

Proof (2)
(iii) Complex case: ϕ and z ↦ e^{z²/2} are two entire functions. Since those two functions coincide on R, they coincide on C.
(iv) Characteristic function: In particular, if z = ıt with t ∈ R, we have
E[exp(ıtX)] = e^{−t²/2}.
Samy T. Gaussian vectors & CLT Probability Theory 8 / 86

Proof (3)
(v) Moments: Let n ≥ 1.
Convergence of E[|X|^n]: easy argument.
In addition, we almost surely have
e^{ıtX} = lim_{n→∞} S_n, with S_n = Σ_{k=0}^n (ıt)^k X^k / k!.
However, |S_n| ≤ Y with
Y = Σ_{k=0}^∞ |t|^k |X|^k / k! = e^{|t||X|} ≤ e^{tX} + e^{−tX}.
Since E[exp(aX)] < ∞ for all a ∈ R, we obtain that Y is integrable. Applying dominated convergence, we end up with
E[exp(ıtX)] = Σ_{n≥0} E[(ıtX)^n / n!] = Σ_{n≥0} (ı^n t^n / n!) E[X^n]. (1)
Identifying lhs and rhs, we get our formula for moments.
Samy T. Gaussian vectors & CLT Probability Theory 9 / 86

Gaussian random variable
Corollary: Owing to the previous proposition, if X ~ N(0, 1) then
E[X] = 0 and Var(X) = 1.
Definition: A random variable Y is said to be Gaussian if there exist X ~ N(0, 1) and two constants a and b such that Y = aX + b.
Parameter identification: we have
E[Y] = b, and Var(Y) = a² Var(X) = a².
Notation: We denote by N(m, σ²) the law of a Gaussian random variable with mean m and variance σ².
Samy T. Gaussian vectors & CLT Probability Theory 10 / 86

Properties of Gaussian random variables
Density: the function
(1/(σ√(2π))) exp(−(x − m)²/(2σ²))
is the density of N(m, σ²).
Characteristic function: let Y ~ N(m, σ²). Then
E[exp(ıtY)] = exp(ıtm − t²σ²/2), t ∈ R.
The formula above also characterizes N(m, σ²).
Samy T. Gaussian vectors & CLT Probability Theory 11 / 86

Gaussian law: illustration
Figure: Distributions N(0, 1), N(1, 1), N(0, 9), N(0, 1/4).
Samy T. Gaussian vectors & CLT Probability Theory 12 / 86

Sum of independent Gaussian random variables
Proposition 2. Let Y_1 and Y_2 be two independent Gaussian random variables, with Y_1 ~ N(m_1, σ_1²) and Y_2 ~ N_1(m_2, σ_2²). Then
Y_1 + Y_2 ~ N_1(m_1 + m_2, σ_1² + σ_2²).
Proof: Via characteristic functions.
Remarks:
It is easy to identify the parameters of Y_1 + Y_2.
Possible generalization to Σ_{j=1}^n Y_j.
Samy T. Gaussian vectors & CLT Probability Theory 13 / 86
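A short simulation sketch of Proposition 2 (the parameter values below are arbitrary choices, not taken from the slides): the sum of two independent Gaussians has mean m_1 + m_2 and variance σ_1² + σ_2².

    # Check that the parameters of Y1 + Y2 add up as in Proposition 2
    import numpy as np

    rng = np.random.default_rng(1)
    m1, s1, m2, s2 = 1.0, 2.0, -0.5, 0.5
    Y1 = rng.normal(m1, s1, size=500_000)
    Y2 = rng.normal(m2, s2, size=500_000)
    S = Y1 + Y2

    print("empirical mean:", S.mean(), " expected:", m1 + m2)
    print("empirical var :", S.var(), " expected:", s1**2 + s2**2)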

Outline 1 Real Gaussian random variables 2 Random vectors 3 Gaussian random vectors 4 Central limit theorem 5 Empirical mean and variance Samy T. Gaussian vectors & CLT Probability Theory 14 / 86

Matrix notation
Transpose: If A is a matrix, A^T designates the transpose of A.
Particular case: Let x ∈ R^n. Then x is a column vector, i.e. an element of R^{n,1}, and x^T is a row matrix.
Inner product: If x and y are two vectors in R^n, their inner product is denoted by
⟨x, y⟩ = x^T y = y^T x = Σ_{i=1}^n x_i y_i, if x = (x_1,..., x_n), y = (y_1,..., y_n).
Samy T. Gaussian vectors & CLT Probability Theory 15 / 86

Vector valued random variable
Definition 3.
1. A random variable X with values in R^n is given by n real valued random variables X_1, X_2,..., X_n.
2. We denote by X the column matrix with coordinates X_1, X_2,..., X_n:
X = (X_1, X_2,..., X_n)^T.
Samy T. Gaussian vectors & CLT Probability Theory 16 / 86

Expected value and covariance
Expected value: Let X ∈ R^n. E[X] is the vector defined by
E[X] = (E[X_1], E[X_2],..., E[X_n])^T.
Note: here we assume that all the expectations are well-defined.
Covariance: Let X ∈ R^n and Y ∈ R^m. The covariance matrix K_{X,Y} ∈ R^{n,m} is defined by
K_{X,Y} = E[(X − E[X]) (Y − E[Y])^T].
Elements of the covariance matrix: for 1 ≤ i ≤ n and 1 ≤ j ≤ m,
K_{X,Y}(i, j) = Cov(X_i, Y_j) = E[(X_i − E[X_i]) (Y_j − E[Y_j])].
Samy T. Gaussian vectors & CLT Probability Theory 17 / 86

Simple properties
Linear transforms and expectation-covariance: Let X ∈ R^n, A ∈ R^{m,n}, u ∈ R^m. Then
E[u + AX] = u + A E[X], and K_{u+AX} = K_{AX} = A K_X A^T.
Another formula for the covariance:
K_{X,Y} = E[XY^T] − E[X] E[Y]^T.
As a particular case,
K_X = E[XX^T] − E[X] E[X]^T.
Samy T. Gaussian vectors & CLT Probability Theory 18 / 86
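The rule K_{u+AX} = A K_X A^T can be checked on simulated data; the sketch below uses an arbitrary choice of A, u and of the law of X (none of these come from the slides).

    # Empirical check of K_{u+AX} = A K_X A^T
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.multivariate_normal(
        mean=[0.0, 1.0, -1.0],
        cov=[[2.0, 0.5, 0.0],
             [0.5, 1.0, 0.3],
             [0.0, 0.3, 1.5]],
        size=200_000)                                  # X in R^3
    A = np.array([[1.0, -1.0, 0.0],
                  [0.5,  0.5, 2.0]])                   # A in R^{2,3}
    u = np.array([3.0, -2.0])

    Y = X @ A.T + u                                    # Y = u + A X, row by row
    K_X = np.cov(X, rowvar=False)
    print("A K_X A^T:\n", A @ K_X @ A.T)
    print("empirical K_Y:\n", np.cov(Y, rowvar=False))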

Outline 1 Real Gaussian random variables 2 Random vectors 3 Gaussian random vectors 4 Central limit theorem 5 Empirical mean and variance Samy T. Gaussian vectors & CLT Probability Theory 19 / 86

Definition
Definition: Let X ∈ R^n. X is a Gaussian random vector iff for all λ ∈ R^n,
⟨λ, X⟩ = λ^T X = Σ_{i=1}^n λ_i X_i
is a real valued Gaussian r.v.
Remarks:
(1) X Gaussian vector ⟹ each component X_i of X is a real Gaussian r.v.
(2) Key example of Gaussian vector: independent Gaussian components X_1,..., X_n.
(3) Easy construction of a random vector X ∈ R² such that (i) X_1, X_2 are real Gaussian and (ii) X is not a Gaussian vector.
Samy T. Gaussian vectors & CLT Probability Theory 20 / 86
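For Remark (3), one standard construction (sketched here in code; the slides do not spell out an example, so this is one possible choice) takes X_1 ~ N(0, 1) and X_2 = ε X_1 with ε an independent random sign: both coordinates are N(0, 1), but X_1 + X_2 equals 0 with probability 1/2, which is impossible for a Gaussian random variable, so (X_1, X_2) is not a Gaussian vector.

    # Gaussian marginals without being a Gaussian vector
    import numpy as np

    rng = np.random.default_rng(3)
    X1 = rng.standard_normal(100_000)
    eps = rng.choice([-1.0, 1.0], size=100_000)   # independent sign
    X2 = eps * X1                                  # X2 is again N(0, 1)

    S = X1 + X2
    print("P(X1 + X2 = 0) ~", np.mean(S == 0.0))   # about 0.5
    print("Var(X2) ~", X2.var())                   # about 1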

Characteristic function
Proposition 4. Let X be a Gaussian vector with mean m and covariance K. Then, for all u ∈ R^n,
E[exp(ı⟨u, X⟩)] = exp(ı⟨u, m⟩ − ½ u^T K u),
where we use the matrix representation for the vector u.
Samy T. Gaussian vectors & CLT Probability Theory 21 / 86

Proof
Identification of ⟨u, X⟩: ⟨u, X⟩ is a Gaussian r.v by assumption, with parameters
µ := E[⟨u, X⟩] = ⟨u, m⟩, and σ² := Var(⟨u, X⟩) = u^T K u. (2)
Characteristic function of a 1-d Gaussian r.v: Let Y ~ N(µ, σ²). Then recall that
E[exp(ıtY)] = exp(ıtµ − t²σ²/2), t ∈ R. (3)
Conclusion: Easily obtained by plugging (2) into (3).
Samy T. Gaussian vectors & CLT Probability Theory 22 / 86

Remark and notation Remark: According to Proposition 4 The law of a Gaussian vector X is characterized by its mean m and its covariance matrix K If X and Y are two Gaussian vectors with the same mean and covariance matrix, their law is the same Caution: This is only true for Gaussian vectors. In general, two random variables sharing the same mean and variance are not equal in law Notation: If X Gaussian vector with mean m and covariance K We write X N (m, K) Samy T. Gaussian vectors & CLT Probability Theory 23 / 86

Linear transformations
Proposition 5. Let X ~ N(m_X, K_X), A ∈ R^{p,n} and z ∈ R^p. Set
Y = AX + z.
Then Y ~ N(m_Y, K_Y), with
m_Y = z + A m_X, K_Y = A K_X A^T.
Samy T. Gaussian vectors & CLT Probability Theory 24 / 86

Proof
Aim: Let u ∈ R^p. We wish to prove that u^T Y is a Gaussian r.v.
Expression for u^T Y: We have
u^T Y = u^T z + u^T A X = u^T z + v^T X,
where we have set v = A^T u. This is a Gaussian r.v.
Conclusion: Y is a Gaussian vector. In addition,
m_Y = E[Y] = z + A E[X] = z + A m_X, and K_Y = A K_X A^T.
Samy T. Gaussian vectors & CLT Probability Theory 25 / 86

Positivity of the covariance matrix
Proposition 6. Let X be a random vector with covariance matrix K. Then K is a symmetric positive matrix.
Proof:
Symmetry: K(i, j) = Cov(X_i, X_j) = Cov(X_j, X_i) = K(j, i).
Positivity: Let u ∈ R^n and Y = u^T X. Then Var(Y) = u^T K u ≥ 0.
Samy T. Gaussian vectors & CLT Probability Theory 26 / 86

Linear algebra lemma
Lemma 7. Let Γ ∈ R^{n,n} be symmetric and positive. Then there exists a matrix A ∈ R^{n,n} such that
Γ = A A^T.
Samy T. Gaussian vectors & CLT Probability Theory 27 / 86

Proof
Diagonal form of Γ:
Γ symmetric ⟹ there exist an orthogonal matrix U and D_1 = Diag(λ_1,..., λ_n) such that D_1 = U^T Γ U.
Γ positive ⟹ λ_i ≥ 0 for all i ∈ {1, 2,..., n}.
Definition of the square root: Let D = Diag(λ_1^{1/2},..., λ_n^{1/2}). We set A = UD.
Conclusion: Recall that U^{−1} = U^T, therefore Γ = U D_1 U^T. Now D_1 = D² = D D^T, and thus
Γ = U D D^T U^T = UD (UD)^T = A A^T.
Samy T. Gaussian vectors & CLT Probability Theory 28 / 86
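The square root construction of Lemma 7 can be written out directly with an eigendecomposition; the sketch below uses an arbitrary symmetric positive matrix Γ as an example.

    # A = U D^{1/2} from the spectral decomposition, so that A A^T = Gamma
    import numpy as np

    Gamma = np.array([[4.0, 2.0, 0.0],
                      [2.0, 3.0, 1.0],
                      [0.0, 1.0, 2.0]])

    eigvals, U = np.linalg.eigh(Gamma)             # Gamma = U diag(eigvals) U^T
    D = np.diag(np.sqrt(np.clip(eigvals, 0.0, None)))
    A = U @ D

    print(np.allclose(A @ A.T, Gamma))             # True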

Construction of a Gaussian vector
Theorem 8. Let m ∈ R^n and Γ ∈ R^{n,n} symmetric and positive. Then there exists a Gaussian vector X ~ N(m, Γ).
Samy T. Gaussian vectors & CLT Probability Theory 29 / 86

Proof
Standard Gaussian vector in R^n: Let Y_1, Y_2,..., Y_n be i.i.d with common law N_1(0, 1). We set Y = (Y_1,..., Y_n)^T, and therefore Y ~ N(0, Id_n).
Definition of X: Let A ∈ R^{n,n} such that A A^T = Γ. We define X as:
X = m + AY.
Conclusion: According to Proposition 5 we have X ~ N(m, K_X), with
K_X = A K_Y A^T = A Id A^T = A A^T = Γ.
Samy T. Gaussian vectors & CLT Probability Theory 30 / 86
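The proof of Theorem 8 is also a sampling recipe: draw Y ~ N(0, Id_n) and set X = m + AY with A A^T = Γ. In the sketch below A is taken as a Cholesky factor, one possible choice of square root; m and Γ are arbitrary example values.

    # Sampling X ~ N(m, Gamma) via X = m + A Y, with A A^T = Gamma
    import numpy as np

    rng = np.random.default_rng(4)
    m = np.array([1.0, -2.0, 0.5])
    Gamma = np.array([[4.0, 2.0, 0.0],
                      [2.0, 3.0, 1.0],
                      [0.0, 1.0, 2.0]])

    A = np.linalg.cholesky(Gamma)                  # A A^T = Gamma
    Y = rng.standard_normal((100_000, 3))          # rows ~ N(0, Id_3)
    X = m + Y @ A.T                                # rows ~ N(m, Gamma)

    print("empirical mean      :", X.mean(axis=0))
    print("empirical covariance:\n", np.cov(X, rowvar=False))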

Decorrelation and independence
Theorem 9. Let X be a Gaussian vector, with X = (X_1,..., X_n). Then the random variables X_1,..., X_n are independent iff the covariance matrix K_X is diagonal.
Samy T. Gaussian vectors & CLT Probability Theory 31 / 86

Proof of ⟹
Decorrelation of coordinates: If X_1,..., X_n are independent, then
K(i, j) = Cov(X_i, X_j) = 0, whenever i ≠ j.
Therefore K_X is diagonal.
Samy T. Gaussian vectors & CLT Probability Theory 32 / 86

Proof of ⟸ (1)
Characteristic function of X: Set K = K_X. We have shown that
E[exp(ı⟨u, X⟩)] = exp(ı⟨u, E[X]⟩ − ½ u^T K u), u ∈ R^n. (4)
Since K is diagonal, we have:
u^T K u = Σ_{l=1}^n u_l² K(l, l) = Σ_{l=1}^n u_l² Var(X_l). (5)
Characteristic function of each coordinate: Let φ_{X_l} be the characteristic function of X_l, that is φ_{X_l}(s) = E[e^{ısX_l}] for all s ∈ R. Taking u such that u_i = 0 for all i ≠ l in (4) and (5), we get
φ_{X_l}(u_l) = E[exp(ıu_l X_l)] = exp(ıu_l E[X_l] − ½ u_l² Var(X_l)).
Samy T. Gaussian vectors & CLT Probability Theory 33 / 86

Proof of ⟸ (2)
Conclusion: We can recast (4) as follows: for all u = (u_1, u_2,..., u_n),
Π_{j=1}^n φ_{X_j}(u_j) = E[exp(ı Σ_{l=1}^n u_l X_l)] = E[exp(ı⟨u, X⟩)].
This means that the random variables X_1,..., X_n are independent.
Samy T. Gaussian vectors & CLT Probability Theory 34 / 86

Lemma about absolutely continuous r.v
Lemma 10. Let ξ be an R^n-valued random variable admitting a density, and H a subspace of R^n such that dim(H) < n. Then
P(ξ ∈ H) = 0.
Samy T. Gaussian vectors & CLT Probability Theory 35 / 86

Proof
Change of variables: Up to an orthogonal change of variables, we can assume H ⊆ H' with
H' = {(x_1, x_2,..., x_n); x_n = 0}.
Conclusion: Denote by ϕ the density of ξ. We have:
P(ξ ∈ H) ≤ P(ξ ∈ H') = ∫_{R^n} ϕ(x_1, x_2,..., x_n) 1_{x_n=0} dx_1 dx_2 ... dx_n = 0.
Samy T. Gaussian vectors & CLT Probability Theory 36 / 86

Gaussian density
Theorem 11. Let X ~ N(m, K). Then
1. X admits a density iff K is invertible.
2. If K is invertible, the density of X is given by
f(x) = (1/((2π)^{n/2} (det K)^{1/2})) exp(−½ (x − m)^T K^{−1} (x − m)).
Samy T. Gaussian vectors & CLT Probability Theory 37 / 86
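The density formula of Theorem 11 can be evaluated directly and compared with a library implementation; the sketch below assumes scipy is available and uses arbitrary example values for m, K and x.

    # Evaluate the N(m, K) density at a point x
    import numpy as np
    from scipy.stats import multivariate_normal

    m = np.array([1.0, -2.0])
    K = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
    x = np.array([0.5, -1.0])

    n = len(m)
    diff = x - m
    f = np.exp(-0.5 * diff @ np.linalg.solve(K, diff)) \
        / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(K)))

    print(f, multivariate_normal(mean=m, cov=K).pdf(x))   # the two values agree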

Proof (1)
Density and inversion of K: We have seen that X has the same law as m + AY, where A A^T = K and Y ~ N(0, Id_n).
(i) Assume A is not invertible.
A not invertible ⟹ Im(A) = H with dim(H) < n ⟹ P(AY ∈ H) = 1.
Contradiction: if X admits a density, then X − m admits a density, and thus P(X − m ∈ H) = 0. However, we have seen that P(X − m ∈ H) = P(AY ∈ H) = 1. Hence X doesn't admit a density.
Samy T. Gaussian vectors & CLT Probability Theory 38 / 86

Proof (2)
(ii) Assume A invertible.
A invertible ⟹ the map y ↦ m + Ay is a C¹ bijection ⟹ the random variable m + AY admits a density.
(iii) Conclusion. Since A A^T = K, we have
det(A) det(A^T) = (det(A))² = det(K),
and we get the equivalence: A invertible ⟺ K is invertible.
Samy T. Gaussian vectors & CLT Probability Theory 39 / 86

Proof (3)
(2) Expression of the density: Let Y ~ N(0, Id_n). Density of Y:
g(y) = (1/(2π)^{n/2}) exp(−½ ⟨y, y⟩).
Change of variable: Set X̃ = AY + m, that is Y = A^{−1}(X̃ − m).
Samy T. Gaussian vectors & CLT Probability Theory 40 / 86

Proof (4)
Jacobian of the transformation: for x ↦ A^{−1}(x − m) the Jacobian matrix is A^{−1}.
Determinant of the Jacobian: det(A^{−1}) = [det(A)]^{−1} = [det(K)]^{−1/2}.
Expression for the inner product: We have K^{−1} = (A A^T)^{−1} = (A^T)^{−1} A^{−1}, and
⟨y, y⟩ = ⟨A^{−1}(x − m), A^{−1}(x − m)⟩ = (x − m)^T (A^{−1})^T A^{−1} (x − m) = (x − m)^T K^{−1} (x − m).
Thus X̃ admits the density f. Since X and X̃ share the same law, X admits the density f.
Samy T. Gaussian vectors & CLT Probability Theory 41 / 86

Outline 1 Real Gaussian random variables 2 Random vectors 3 Gaussian random vectors 4 Central limit theorem 5 Empirical mean and variance Samy T. Gaussian vectors & CLT Probability Theory 42 / 86

Law of large numbers
Theorem 12. We consider the following situation:
(X_n; n ≥ 1) sequence of i.i.d R^k-valued r.v.
Hypothesis: E[|X_1|] < ∞, and we set E[X_1] = m ∈ R^k.
We define
X̄_n = (1/n) Σ_{j=1}^n X_j.
Then
lim_{n→∞} X̄_n = m, almost surely.
Samy T. Gaussian vectors & CLT Probability Theory 43 / 86

Central limit theorem
Theorem 13. We consider the following situation:
{X_n; n ≥ 1} sequence of i.i.d R^k-valued r.v.
Hypothesis: E[|X_1|²] < ∞.
We set E[X_1] = m ∈ R^k and Cov(X_1) = Γ ∈ R^{k,k}.
Then
√n (X̄_n − m) →(d) N_k(0, Γ), with X̄_n = (1/n) Σ_{j=1}^n X_j.
Interpretation: X̄_n converges to m with rate n^{−1/2}.
Samy T. Gaussian vectors & CLT Probability Theory 44 / 86
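A Monte Carlo sketch of Theorem 13 for R²-valued variables (the uniform example below is an arbitrary choice, not from the slides): the empirical covariance of √n (X̄_n − m) is close to Γ = Cov(X_1).

    # Multivariate CLT: covariance of sqrt(n) (X_bar_n - m) approaches Gamma
    import numpy as np

    rng = np.random.default_rng(5)
    n, n_rep = 1_000, 5_000

    # X = (U, (U + V)/2) with U, V independent uniform on [0, 1]
    U = rng.random((n_rep, n))
    V = rng.random((n_rep, n))
    m = np.array([0.5, 0.5])

    means = np.stack([U.mean(axis=1), ((U + V) / 2).mean(axis=1)], axis=1)
    Z = np.sqrt(n) * (means - m)

    print("empirical covariance:\n", np.cov(Z, rowvar=False))
    # Gamma = Cov(X_1): Var(U) = 1/12, Cov(U, (U+V)/2) = 1/24, Var((U+V)/2) = 1/24
    print("Gamma:\n", np.array([[1/12, 1/24], [1/24, 1/24]]))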

Convergence in law, first definition
Remark: For notational simplicity, the remainder of the section will focus on R-valued r.v.
Definition 14. Let
{X_n; n ≥ 1} sequence of r.v, X_0 another r.v,
F_n distribution function of X_n,
F_0 distribution function of X_0.
We set C(F_0) = {x ∈ R; F_0 continuous at point x}.
Definition 1: We have lim_{n→∞} X_n =(d) X_0 if
lim_{n→∞} F_n(x) = F_0(x) for all x ∈ C(F_0).
Samy T. Gaussian vectors & CLT Probability Theory 45 / 86

Convergence in law, equivalent definition
Proposition 15. Let {X_n; n ≥ 1} sequence of r.v, X_0 another r.v. We set
C_b(R) = {ϕ : R → R; ϕ continuous and bounded}.
Definition 2: We have lim_{n→∞} X_n =(d) X_0 iff
lim_{n→∞} E[ϕ(X_n)] = E[ϕ(X_0)] for all ϕ ∈ C_b(R).
Samy T. Gaussian vectors & CLT Probability Theory 46 / 86

Central limit theorem in R
Theorem 16. We consider the following situation:
{X_n; n ≥ 1} sequence of i.i.d R-valued r.v.
Hypothesis: E[|X_1|²] < ∞.
We set E[X_1] = µ and Var(X_1) = σ².
Then
√n (X̄_n − µ) →(d) N(0, σ²), with X̄_n = (1/n) Σ_{j=1}^n X_j.
Otherwise stated we have
(Σ_{i=1}^n X_i − nµ) / (σ n^{1/2}) →(d) N(0, 1).
Samy T. Gaussian vectors & CLT Probability Theory 47 / 86

Application: Bernoulli distribution
Proposition 17. Let (X_n; n ≥ 1) be a sequence of i.i.d B(p) r.v. Then
n^{1/2} (X̄_n − p) / [p(1 − p)]^{1/2} →(d) N_1(0, 1).
Remark: For practical purposes, as soon as np > 15 the law of
(X_1 + ⋯ + X_n − np) / √(np(1 − p))
is approached by N_1(0, 1). Notice that X_1 + ⋯ + X_n ~ Bin(n, p).
Samy T. Gaussian vectors & CLT Probability Theory 48 / 86
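A numerical sketch of the remark above (scipy assumed available; the values of n, p, k are arbitrary): for np > 15 the binomial cdf is close to the corresponding normal cdf; the continuity correction used below is a standard refinement not discussed in the slides.

    # Normal approximation of Bin(n, p) for np > 15
    import numpy as np
    from scipy.stats import binom, norm

    n, p = 60, 0.5                        # np = 30 > 15
    k = 35

    exact = binom.cdf(k, n, p)
    approx = norm.cdf((k + 0.5 - n * p) / np.sqrt(n * p * (1 - p)))
    print(exact, approx)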

Binomial distribution: plot (1)
Figure: Distribution Bin(6; 0.5). x-axis: k, y-axis: P(X = k)
Samy T. Gaussian vectors & CLT Probability Theory 49 / 86

Binomial distribution: plot (2)
Figure: Distribution Bin(30; 0.5). x-axis: k, y-axis: P(X = k)
Samy T. Gaussian vectors & CLT Probability Theory 50 / 86

Relation between pdf and chf
Theorem 18. Let F be a distribution function on R and φ the characteristic function of F. Then F is uniquely determined by φ.
Samy T. Gaussian vectors & CLT Probability Theory 51 / 86

Proof (1)
Setting: We consider
a r.v X with distribution F and chf φ,
a r.v Z with distribution G and chf γ.
Relation between chf: We have
∫_R e^{−ıθz} φ(z) G(dz) = ∫_R γ(x − θ) F(dx). (6)
Samy T. Gaussian vectors & CLT Probability Theory 52 / 86

Proof (2)
Proof of (6): Invoking Fubini, we get
E[e^{−ıθZ} φ(Z)] = ∫_R e^{−ıθz} φ(z) G(dz)
= ∫_R G(dz) [∫_R e^{−ıθz} e^{ızx} F(dx)]
= ∫_R F(dx) [∫_R e^{ız(x−θ)} G(dz)]
= E[γ(X − θ)].
Samy T. Gaussian vectors & CLT Probability Theory 53 / 86

Proof (3)
Particularizing to a Gaussian case: We now consider Z = σN with N ~ N(0, 1). In this case, if n is the density of N(0, 1), we have
G(dz) = σ^{−1} n(σ^{−1} z) dz.
With this setting, relation (6) becomes
∫_R e^{−ıθσz} φ(σz) n(z) dz = ∫_R e^{−½ σ²(z−θ)²} F(dz). (7)
Samy T. Gaussian vectors & CLT Probability Theory 54 / 86

Proof (4)
Integration with respect to θ: Integrating (7) wrt θ we get
∫_{−∞}^x dθ ∫_R e^{−ıθσz} φ(σz) n(z) dz = A_{σ,θ}(x), (8)
where
A_{σ,θ}(x) = ∫_{−∞}^x dθ ∫_R e^{−½ σ²(z−θ)²} F(dz).
Samy T. Gaussian vectors & CLT Probability Theory 55 / 86

Proof (5)
Expression for A_{σ,θ}: By Fubini and the change of variable s = θ − z, we have
A_{σ,θ}(x) = ∫_R F(dz) ∫_{−∞}^x e^{−½ σ²(z−θ)²} dθ = (2π σ^{−2})^{1/2} ∫_R F(dz) ∫_{−∞}^{x−z} n_{0,σ^{−2}}(s) ds.
Therefore, considering N independent of X with N ~ N(0, 1), we get
A_{σ,θ}(x) = (2π σ^{−2})^{1/2} P(σ^{−1} N + X ≤ x). (9)
Samy T. Gaussian vectors & CLT Probability Theory 56 / 86

Proof (6)
Summary: Putting together (8) and (9) we get
∫_{−∞}^x dθ ∫_R e^{−ıθσz} φ(σz) n(z) dz = (2π σ^{−2})^{1/2} P(σ^{−1} N + X ≤ x).
Divide the above relation by (2π σ^{−2})^{1/2}. We obtain
(σ/(2π)^{1/2}) ∫_{−∞}^x dθ ∫_R e^{−ıθσz} φ(σz) n(z) dz = P(σ^{−1} N + X ≤ x). (10)
Samy T. Gaussian vectors & CLT Probability Theory 57 / 86

Proof (7)
Convergence result: Recall that
X_{1,n} →(d) X_1 and X_{2,n} →(P) 0 ⟹ X_{1,n} + X_{2,n} →(d) X_1. (11)
Notation: We set C(F) = {x ∈ R; F continuous at point x}.
Samy T. Gaussian vectors & CLT Probability Theory 58 / 86

Proof (8)
Limit as σ → ∞: Thanks to our convergence result, one can take limits in (10):
lim_{σ→∞} (σ/(2π)^{1/2}) ∫_{−∞}^x dθ ∫_R e^{−ıθσz} φ(σz) n(z) dz = lim_{σ→∞} P(σ^{−1} N + X ≤ x) = P(X ≤ x) = F(x), (12)
for all x ∈ C(F).
Conclusion: F is determined by φ.
Samy T. Gaussian vectors & CLT Probability Theory 59 / 86

Fourier inversion
Proposition 19. Let F be a distribution function on R, X ~ F, and φ the characteristic function of F.
Hypothesis: φ ∈ L¹(R).
Conclusion: F admits a bounded continuous density f, given by
f(x) = (1/(2π)) ∫_R e^{−ıyx} φ(y) dy.
Samy T. Gaussian vectors & CLT Probability Theory 60 / 86
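A numerical sketch of the inversion formula (added for this transcript): for the standard Gaussian, whose chf φ(y) = e^{−y²/2} is in L¹(R), the quadrature below recovers the N(0, 1) density; the truncation level and grid size are arbitrary choices.

    # Recover f(x) = (1/2pi) * integral of exp(-i y x) phi(y) dy by a Riemann sum
    import numpy as np

    def f_from_chf(x, phi, y_max=40.0, n_pts=200_001):
        y = np.linspace(-y_max, y_max, n_pts)
        dy = y[1] - y[0]
        return (np.sum(np.exp(-1j * y * x) * phi(y)) * dy).real / (2 * np.pi)

    phi = lambda y: np.exp(-y**2 / 2.0)            # chf of N(0, 1)
    for x in (0.0, 1.0, 2.0):
        exact = np.exp(-x**2 / 2.0) / np.sqrt(2 * np.pi)
        print(x, f_from_chf(x, phi), exact)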

Proof (1)
Density of σ^{−1}N + X: We set
F_σ(x) = P(σ^{−1} N + X ≤ x).
Since σ^{−1}N admits a density, the sum σ^{−1}N + X admits a density, i.e. F_σ admits a density f_σ.
Expression for F_σ: Recall relation (10):
(σ/(2π)^{1/2}) ∫_{−∞}^x dθ ∫_R e^{−ıθσz} φ(σz) n(z) dz = F_σ(x). (13)
Samy T. Gaussian vectors & CLT Probability Theory 61 / 86

Proof (2)
Expression for f_σ: Differentiating the lhs of (13) we get
f_σ(θ) = (σ/(2π)^{1/2}) ∫_R e^{−ıθσz} φ(σz) n(z) dz
= (change of variable σz = y) (1/(2π)^{1/2}) ∫_R e^{−ıθy} φ(y) n(σ^{−1} y) dy
= (n is Gaussian) (1/(2π)) ∫_R e^{−ıθy} φ(y) e^{−y²/(2σ²)} dy.
Relation (10) on a finite interval: Let I = [a, b]. Using f_σ we have
P(σ^{−1} N + X ∈ [a, b]) = F_σ(b) − F_σ(a) = ∫_a^b f_σ(θ) dθ. (14)
Samy T. Gaussian vectors & CLT Probability Theory 62 / 86

Proof (3)
Limit of f_σ: By dominated convergence,
lim_{σ→∞} f_σ(θ) = (1/(2π)) ∫_R e^{−ıθy} φ(y) dy ≡ f(θ).
Domination of f_σ: We have
|f_σ(θ)| = |(1/(2π)) ∫_R e^{−ıθy} φ(y) e^{−y²/(2σ²)} dy| ≤ (1/(2π)) ∫_R |φ(y)| dy = (1/(2π)) ‖φ‖_{L¹(R)}.
Samy T. Gaussian vectors & CLT Probability Theory 63 / 86

Proof (4)
Limits in (14): We use
on the lhs of (14): the convergence result (11),
on the rhs of (14): dominated convergence (on the finite interval I).
We get
P(X ∈ [a, b]) = F(b) − F(a) = ∫_a^b f(θ) dθ.
Conclusion: X admits f (obtained by Fourier inversion) as a density.
Samy T. Gaussian vectors & CLT Probability Theory 64 / 86

Convergence in law and chf
Theorem 20. Let
{X_n; n ≥ 1} sequence of r.v, X_0 another r.v,
φ_n chf of X_n, φ_0 chf of X_0.
Then
(i) We have: lim_{n→∞} X_n =(d) X_0 ⟹ lim_{n→∞} φ_n(t) = φ_0(t) for all t ∈ R.
(ii) Assume that φ_0(0) = 1 and φ_0 is continuous at point 0. Then we have:
lim_{n→∞} φ_n(t) = φ_0(t) for all t ∈ R ⟹ lim_{n→∞} X_n =(d) X_0.
Samy T. Gaussian vectors & CLT Probability Theory 65 / 86

Central limit theorem in R (repeated)
Theorem 21. We consider the following situation:
{X_n; n ≥ 1} sequence of i.i.d R-valued r.v.
Hypothesis: E[|X_1|²] < ∞.
We set E[X_1] = µ and Var(X_1) = σ².
Then
√n (X̄_n − µ) →(d) N(0, σ²), with X̄_n = (1/n) Σ_{j=1}^n X_j.
Otherwise stated we have
(S_n − nµ) / (σ n^{1/2}) →(d) N(0, 1), with S_n = Σ_{i=1}^n X_i. (15)
Samy T. Gaussian vectors & CLT Probability Theory 66 / 86

Proof of CLT (1)
Reduction to µ = 0, σ = 1: Set
X̂_i = (X_i − µ)/σ, and Ŝ_n = Σ_{i=1}^n X̂_i.
Then
Ŝ_n = (S_n − nµ)/σ, with E[X̂_i] = 0 and Var(X̂_i) = 1,
and
(S_n − nµ) / (σ n^{1/2}) = Ŝ_n / n^{1/2}.
Thus it is enough to prove (15) when µ = 0 and σ = 1.
Samy T. Gaussian vectors & CLT Probability Theory 67 / 86

Proof of CLT (2)
Aim: For X_i such that E[X_i] = 0 and Var(X_i) = 1, set
φ_n(t) = E[exp(ıt S_n / n^{1/2})].
We wish to prove that
lim_{n→∞} φ_n(t) = e^{−t²/2}.
According to Theorem 20-(ii), this yields the desired result.
Samy T. Gaussian vectors & CLT Probability Theory 68 / 86

Taylor expansion of the chf
Lemma 22. Let Y be a r.v and ψ the chf of Y.
Hypothesis: for l ≥ 1, E[|Y|^l] < ∞.
Conclusion:
|ψ(s) − Σ_{k=0}^l ((ıs)^k / k!) E[Y^k]| ≤ E[ min( |sY|^{l+1} / (l + 1)!, 2 |sY|^l / l! ) ].
Proof: Similar to (1).
Samy T. Gaussian vectors & CLT Probability Theory 69 / 86

Proof of CLT (3)
Computation for φ_n: We have
φ_n(t) = (E[e^{ıt X_1 / n^{1/2}}])^n = [φ(t / n^{1/2})]^n, (16)
where φ is the characteristic function of X_1.
Samy T. Gaussian vectors & CLT Probability Theory 70 / 86

Proof of CLT (4)
Expansion of φ: According to Lemma 22, we have
φ(t / n^{1/2}) = 1 + ıt E[X_1] / n^{1/2} + ı² t² E[X_1²] / (2n) + R_n = 1 − t²/(2n) + R_n, (17)
and R_n satisfies
|R_n| ≤ E[ min( |t X_1|³ / (6 n^{3/2}), |t X_1|² / n ) ].
Behavior of R_n: By dominated convergence we have
lim_{n→∞} n R_n = 0. (18)
Samy T. Gaussian vectors & CLT Probability Theory 71 / 86

Products of complex numbers
Lemma 23. Let
{a_i; 1 ≤ i ≤ n}, such that a_i ∈ C and |a_i| ≤ 1,
{b_i; 1 ≤ i ≤ n}, such that b_i ∈ C and |b_i| ≤ 1.
Then we have
|Π_{i=1}^n a_i − Π_{i=1}^n b_i| ≤ Σ_{i=1}^n |a_i − b_i|.
Samy T. Gaussian vectors & CLT Probability Theory 72 / 86

Proof of Lemma 23
Case n = 2: Stems directly from the identity
a_1 a_2 − b_1 b_2 = a_1 (a_2 − b_2) + (a_1 − b_1) b_2.
General case: By induction.
Samy T. Gaussian vectors & CLT Probability Theory 73 / 86

Proof of CLT (5)
Summary: Thanks to (16) and (17) we have
φ_n(t) = [φ(t / n^{1/2})]^n, and φ(t / n^{1/2}) = 1 − t²/(2n) + R_n.
Application of Lemma 23: We get
|[φ(t / n^{1/2})]^n − (1 − t²/(2n))^n| ≤ n |φ(t / n^{1/2}) − (1 − t²/(2n))| = n |R_n|. (19)
Samy T. Gaussian vectors & CLT Probability Theory 74 / 86

Proof of CLT (6)
Limit for φ_n: Invoking (18) and (19) we get
lim_{n→∞} |φ_n(t) − (1 − t²/(2n))^n| = 0.
In addition,
lim_{n→∞} (1 − t²/(2n))^n = e^{−t²/2}.
Therefore
lim_{n→∞} |φ_n(t) − e^{−t²/2}| = 0.
Conclusion: CLT holds, since lim_{n→∞} φ_n(t) = e^{−t²/2}.
Samy T. Gaussian vectors & CLT Probability Theory 75 / 86
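The convergence lim φ_n(t) = e^{−t²/2} can also be observed numerically; the sketch below uses Rademacher variables (an arbitrary centered, unit-variance example) and estimates φ_n by an empirical average.

    # Empirical chf of S_n / sqrt(n) versus the Gaussian limit exp(-t^2/2)
    import numpy as np

    rng = np.random.default_rng(8)
    n, n_rep = 400, 20_000
    X = rng.choice([-1.0, 1.0], size=(n_rep, n))   # E[X] = 0, Var(X) = 1
    S = X.sum(axis=1) / np.sqrt(n)

    for t in (0.5, 1.0, 2.0):
        phi_n = np.mean(np.exp(1j * t * S))
        print(t, phi_n.real, np.exp(-t**2 / 2))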

Outline 1 Real Gaussian random variables 2 Random vectors 3 Gaussian random vectors 4 Central limit theorem 5 Empirical mean and variance Samy T. Gaussian vectors & CLT Probability Theory 76 / 86

Gamma and chi-square laws
Definition 1: For all λ > 0 and a > 0, we denote by γ(λ, a) the distribution on R defined by the density
(x^{λ−1} / (a^λ Γ(λ))) exp(−x/a) 1_{x>0}, where Γ(λ) = ∫_0^∞ x^{λ−1} e^{−x} dx.
This distribution is called gamma law with parameters λ, a.
Definition 2: Let X_1,..., X_n be i.i.d N(0, 1). We set Z = Σ_{i=1}^n X_i². The law of Z is called chi-square distribution with n degrees of freedom. We denote this distribution by χ²(n).
Samy T. Gaussian vectors & CLT Probability Theory 77 / 86

Gamma and chi-square laws (2)
Proposition 24. The distribution χ²(n) coincides with γ(n/2, 2). As a particular case, if X_1,..., X_n are i.i.d N(0, 1) and we set Z = Σ_{i=1}^n X_i², then we have Z ~ γ(n/2, 2).
Samy T. Gaussian vectors & CLT Probability Theory 78 / 86
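A quick sanity check of Proposition 24 by simulation (scipy assumed available; n and the sample size are arbitrary): a sum of n squared standard normals is compared with the γ(n/2, 2) distribution, which corresponds to scipy's gamma with shape n/2 and scale 2.

    # chi-square(n) sample versus the gamma(n/2, 2) distribution
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n = 5
    Z = (rng.standard_normal((100_000, n)) ** 2).sum(axis=1)

    # Kolmogorov-Smirnov test against gamma(shape = n/2, scale = 2)
    print(stats.kstest(Z, stats.gamma(a=n / 2, scale=2).cdf))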

Empirical mean and variance
Let X_1,..., X_n be n real r.v.
Definition: we set
X̄_n = (1/n) Σ_{k=1}^n X_k, and S_n² = (1/(n − 1)) Σ_{k=1}^n (X_k − X̄_n)².
X̄_n is called empirical mean. S_n² is called empirical variance.
Property: Let X_1,..., X_n be n i.i.d real r.v. Assume E[X_1] = m and Var(X_1) = σ². Then
E[X̄_n] = m, and E[S_n²] = σ².
Samy T. Gaussian vectors & CLT Probability Theory 79 / 86

Law of (X̄_n, S_n²) in a Gaussian situation
Theorem 25. Let X_1, X_2,..., X_n be i.i.d with common law N_1(m, σ²). Then
1. X̄_n and S_n² are independent.
2. X̄_n ~ N_1(m, σ²/n) and (n − 1) S_n² / σ² ~ χ²(n − 1).
Samy T. Gaussian vectors & CLT Probability Theory 80 / 86
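A simulation sketch of Theorem 25 (the values of m, σ, n are arbitrary; this checks the statement on simulated data, it is not a proof): the empirical mean and variance are uncorrelated, X̄_n has variance σ²/n, and (n − 1) S_n² / σ² has the first two moments of χ²(n − 1).

    # Empirical check of Theorem 25 for i.i.d. N(m, sigma^2) samples
    import numpy as np

    rng = np.random.default_rng(7)
    m, sigma, n, n_rep = 2.0, 3.0, 10, 200_000

    X = rng.normal(m, sigma, size=(n_rep, n))
    xbar = X.mean(axis=1)
    s2 = X.var(axis=1, ddof=1)                     # empirical variance S_n^2
    W = (n - 1) * s2 / sigma**2

    print("corr(xbar, s2)       :", np.corrcoef(xbar, s2)[0, 1])      # ~ 0
    print("Var(xbar), sigma^2/n :", xbar.var(), sigma**2 / n)
    print("E[W], Var(W)         :", W.mean(), W.var())                # ~ n-1, 2(n-1)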

Proof (1)
(1) Reduction to m = 0 and σ = 1: we set
X'_i = (X_i − m)/σ, that is X_i = σ X'_i + m, 1 ≤ i ≤ n.
The r.v X'_1,..., X'_n are i.i.d distributed as N_1(0, 1), with empirical mean X̄'_n and empirical variance S'_n².
Samy T. Gaussian vectors & CLT Probability Theory 81 / 86

Proof (2)
(1) Reduction to m = 0 and σ = 1 (ctd): It is easily seen (using X_i − X̄_n = σ(X'_i − X̄'_n)) that
X̄_n = σ X̄'_n + m, and S_n² = σ² S'_n².
Thus we are reduced to the case m = 0 and σ = 1.
(2) Reduced case: Consider X_1,..., X_n i.i.d N(0, 1).
Let u_1 = n^{−1/2} (1, 1,..., 1)^T.
We can construct u_2,..., u_n such that (u_1,..., u_n) is an orthonormal basis of R^n.
Let A ∈ R^{n,n} be the matrix whose columns are u_1,..., u_n. We set Y = A^T X.
Samy T. Gaussian vectors & CLT Probability Theory 82 / 86

Proof (3)
(i) Expression for the empirical mean:
A orthogonal matrix: A A^T = A^T A = Id.
Y ~ N(0, K_Y) with
K_Y = A^T K_X (A^T)^T = A^T Id A = A^T A = Id,
because the covariance matrix K_X of X is Id.
Due to the fact that the first row of A^T is
u_1^T = (1/√n, 1/√n,..., 1/√n),
we have:
Y_1 = (1/√n)(X_1 + X_2 + ... + X_n) = √n X̄_n,
or otherwise stated, X̄_n = Y_1 / √n.
Samy T. Gaussian vectors & CLT Probability Theory 83 / 86

Proof (4)
(ii) Expression for the empirical variance: Let us express S_n² in terms of Y:
(n − 1) S_n² = Σ_{k=1}^n (X_k − X̄_n)² = Σ_{k=1}^n (X_k² − 2 X_k X̄_n + X̄_n²) = Σ_{k=1}^n X_k² − 2 X̄_n Σ_{k=1}^n X_k + n X̄_n².
As a consequence,
(n − 1) S_n² = Σ_{k=1}^n X_k² − 2 X̄_n (n X̄_n) + n X̄_n² = Σ_{k=1}^n X_k² − n X̄_n².
Samy T. Gaussian vectors & CLT Probability Theory 84 / 86

Proof (5)
(ii) Expression for the empirical variance (ctd): We have
Y = A^T X with A orthogonal ⟹ Σ_{k=1}^n Y_k² = Σ_{k=1}^n X_k².
Hence
(n − 1) S_n² = Σ_{k=1}^n X_k² − n X̄_n² = Σ_{k=1}^n Y_k² − Y_1² = Σ_{k=2}^n Y_k².
Samy T. Gaussian vectors & CLT Probability Theory 85 / 86

Proof (6)
Summary: We have seen that
X̄_n = Y_1 / √n, and (n − 1) S_n² = Σ_{k=2}^n Y_k².
Conclusion:
1. Y ~ N(0, Id_n) ⟹ Y_1,..., Y_n i.i.d N(0, 1) ⟹ independence of X̄_n and S_n².
2. Furthermore, X̄_n = Y_1 / √n ⟹ X̄_n ~ N_1(0, 1/n).
3. We also have (n − 1) S_n² = Σ_{k=2}^n Y_k² ⟹ the law of (n − 1) S_n² is χ²(n − 1).
Samy T. Gaussian vectors & CLT Probability Theory 86 / 86