Gaussian vectors and central limit theorem
1 Gaussian vectors and central limit theorem

Samy Tindel, Purdue University
Probability Theory 2 - MA 539
2 Outline

1. Real Gaussian random variables
2. Random vectors
3. Gaussian random vectors
4. Central limit theorem
5. Empirical mean and variance
3 Outline (Section 1: Real Gaussian random variables)
4 Standard Gaussian random variable

Definition: Let $X$ be a real valued random variable. $X$ is called standard Gaussian if its probability law admits the density
$$ f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right), \quad x \in \mathbb{R}. $$

Notation: We denote by $\mathcal{N}_1(0,1)$ or $\mathcal{N}(0,1)$ this law.
5 Gaussian random variable and expectations

Reminder:
1. For all bounded measurable functions $g$, we have
$$ E[g(X)] = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} g(x) \exp\left(-\frac{x^2}{2}\right) dx. $$
2. In particular,
$$ \int_{\mathbb{R}} \exp\left(-\frac{x^2}{2}\right) dx = \sqrt{2\pi}. $$
6 Gaussian moments

Proposition 1. Let $X \sim \mathcal{N}(0,1)$. Then
1. For all $z \in \mathbb{C}$, we have $E[\exp(zX)] = \exp(z^2/2)$.
   As a particular case, we get $E[\exp(\imath t X)] = e^{-t^2/2}$ for $t \in \mathbb{R}$.
2. For all $n \in \mathbb{N}$, we have
$$ E[X^n] = \begin{cases} 0, & \text{if } n \text{ is odd}, \\ \dfrac{(2m)!}{m!\,2^m}, & \text{if } n \text{ is even, } n = 2m. \end{cases} $$
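The even-moment formula lends itself to a quick numerical sanity check. The sketch below, assuming NumPy is available, compares sample moments of simulated $\mathcal{N}(0,1)$ draws with $(2m)!/(m!\,2^m)$; the seed and sample size are arbitrary choices, not part of the lecture.

```python
# Monte Carlo check of Proposition 1 (illustrative; NumPy assumed).
import math
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed
x = rng.standard_normal(1_000_000)

for n in range(1, 7):
    if n % 2 == 1:
        exact = 0.0                       # odd moments vanish
    else:
        m = n // 2
        exact = math.factorial(2 * m) / (math.factorial(m) * 2**m)
    print(n, round(float(np.mean(x**n)), 3), exact)
```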
7 Proof

(i) Definition of the transform: $\int_{\mathbb{R}} \exp(zx - \frac{1}{2}x^2)\, dx$ is absolutely convergent for all $z \in \mathbb{C}$
$\Rightarrow$ the quantity $\varphi(z) = E[e^{zX}]$ is well defined and
$$ \varphi(z) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} \exp\left(zx - \frac{1}{2}x^2\right) dx. $$

(ii) Real case: Let $z \in \mathbb{R}$. The decomposition $zx - \frac{1}{2}x^2 = -\frac{1}{2}(x-z)^2 + \frac{z^2}{2}$ and the change of variable $y = x - z$ give $\varphi(z) = e^{z^2/2}$.
8 Proof (2)

(iii) Complex case: $\varphi$ and $z \mapsto e^{z^2/2}$ are two entire functions. Since those two functions coincide on $\mathbb{R}$, they coincide on $\mathbb{C}$.

(iv) Characteristic function: In particular, if $z = \imath t$ with $t \in \mathbb{R}$, we have $E[\exp(\imath t X)] = e^{-t^2/2}$.
9 Proof (3)

(v) Moments: Let $n \ge 1$. Convergence of $E[|X|^n]$: easy argument. In addition, we almost surely have
$$ e^{\imath t X} = \lim_{n \to \infty} S_n, \quad \text{with} \quad S_n = \sum_{k=0}^{n} \frac{(\imath t)^k X^k}{k!}. $$
However, $|S_n| \le Y$ with
$$ Y = \sum_{k \ge 0} \frac{|t|^k |X|^k}{k!} = e^{|t|\,|X|} \le e^{tX} + e^{-tX}. $$
Since $E[\exp(aX)] < \infty$ for all $a \in \mathbb{R}$, we obtain that $Y$ is integrable. Applying dominated convergence, we end up with
$$ E[\exp(\imath t X)] = \sum_{n \ge 0} E\left[\frac{(\imath t X)^n}{n!}\right] = \sum_{n \ge 0} \frac{\imath^n t^n}{n!} E[X^n]. \quad (1) $$
Identifying lhs and rhs, we get our formula for moments.
10 Gaussian random variable

Corollary: Owing to the previous proposition, if $X \sim \mathcal{N}(0,1)$,
$$ E[X] = 0 \quad \text{and} \quad \mathrm{Var}(X) = 1. $$

Definition: A random variable $Y$ is said to be Gaussian if there exist $X \sim \mathcal{N}(0,1)$ and two constants $a$ and $b$ such that $Y = aX + b$.

Parameter identification: we have $E[Y] = b$, and $\mathrm{Var}(Y) = a^2 \mathrm{Var}(X) = a^2$.

Notation: We denote by $\mathcal{N}(m, \sigma^2)$ the law of a Gaussian random variable with mean $m$ and variance $\sigma^2$.
11 Properties of Gaussian random variables

Density:
$$ \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-m)^2}{2\sigma^2}\right) $$
is the density of $\mathcal{N}(m, \sigma^2)$.

Characteristic function: let $Y \sim \mathcal{N}(m, \sigma^2)$. Then
$$ E[\exp(\imath t Y)] = \exp\left(\imath t m - \frac{t^2}{2}\sigma^2\right), \quad t \in \mathbb{R}. $$
The formula above also characterizes $\mathcal{N}(m, \sigma^2)$.
12 Gaussian law: illustration

[Figure: densities of the distributions $\mathcal{N}(0,1)$, $\mathcal{N}(1,1)$, $\mathcal{N}(0,9)$, $\mathcal{N}(0,1/4)$.]
13 Sum of independent Gaussian random variables

Proposition 2. Let $Y_1$ and $Y_2$ be two independent Gaussian random variables, with $Y_1 \sim \mathcal{N}(m_1, \sigma_1^2)$ and $Y_2 \sim \mathcal{N}_1(m_2, \sigma_2^2)$. Then
$$ Y_1 + Y_2 \sim \mathcal{N}_1(m_1 + m_2,\ \sigma_1^2 + \sigma_2^2). $$

Proof: Via characteristic functions.

Remarks:
- It is easy to identify the parameters of $Y_1 + Y_2$.
- Possible generalization to $\sum_{j=1}^{n} Y_j$.
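Proposition 2 is also easy to check by simulation. A small sketch, assuming NumPy; the parameters below are illustrative, not from the slides.

```python
# Empirical check of Proposition 2 (illustrative parameters; NumPy assumed).
import numpy as np

rng = np.random.default_rng(1)
m1, s1, m2, s2 = 1.0, 2.0, -0.5, 0.5
y = rng.normal(m1, s1, 500_000) + rng.normal(m2, s2, 500_000)

print(y.mean(), m1 + m2)          # means add
print(y.var(), s1**2 + s2**2)     # variances add, by independence
```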
14 Outline (Section 2: Random vectors)
15 Matrix notation

Transpose: If $A$ is a matrix, $A^*$ designates the transpose of $A$.

Particular case: Let $x \in \mathbb{R}^n$. Then $x$ is a column vector in $\mathbb{R}^{n,1}$, and $x^*$ is a row matrix.

Inner product: If $x$ and $y$ are two vectors in $\mathbb{R}^n$, their inner product is denoted by
$$ \langle x, y \rangle = x^* y = y^* x = \sum_{i=1}^{n} x_i y_i, \quad \text{if } x = (x_1, \ldots, x_n),\ y = (y_1, \ldots, y_n). $$
16 Vector valued random variable

Definition 3.
1. A random variable $X$ with values in $\mathbb{R}^n$ is given by $n$ real valued random variables $X_1, X_2, \ldots, X_n$.
2. We denote by $X$ the column matrix with coordinates $X_1, X_2, \ldots, X_n$: that is, $X = (X_1, X_2, \ldots, X_n)^*$.
17 Expected value and covariance

Expected value: Let $X \in \mathbb{R}^n$. $E[X]$ is the vector defined by
$$ E[X] = (E[X_1], E[X_2], \ldots, E[X_n])^*. $$
Note: here we assume that all the expectations are well-defined.

Covariance: Let $X \in \mathbb{R}^n$ and $Y \in \mathbb{R}^m$. The covariance matrix $K_{X,Y} \in \mathbb{R}^{n,m}$ is defined by
$$ K_{X,Y} = E\left[(X - E[X])\,(Y - E[Y])^*\right]. $$

Elements of the covariance matrix: for $1 \le i \le n$ and $1 \le j \le m$,
$$ K_{X,Y}(i,j) = \mathrm{Cov}(X_i, Y_j) = E\left[(X_i - E[X_i])\,(Y_j - E[Y_j])\right]. $$
18 Simple properties

Linear transforms and expectation-covariance: Let $X \in \mathbb{R}^n$, $A \in \mathbb{R}^{m,n}$, $u \in \mathbb{R}^m$. Then
$$ E[u + AX] = u + A\,E[X], \quad \text{and} \quad K_{u+AX} = K_{AX} = A K_X A^*. $$

Another formula for the covariance:
$$ K_{X,Y} = E[XY^*] - E[X]\,E[Y]^*. $$
As a particular case,
$$ K_X = E[XX^*] - E[X]\,E[X]^*. $$
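The transformation rule $K_{u+AX} = A K_X A^*$ can be verified on simulated data. A sketch assuming NumPy; the matrices $A$, $K_X$ and the vector $u$ are arbitrary examples.

```python
# Empirical check of K_{u+AX} = A K_X A^* (illustrative; NumPy assumed).
import numpy as np

rng = np.random.default_rng(2)
Kx = np.array([[2.0, 0.5],
               [0.5, 1.0]])                   # symmetric positive definite
L = np.linalg.cholesky(Kx)
X = L @ rng.standard_normal((2, 200_000))     # samples with covariance Kx

A = np.array([[1.0, -1.0],
              [0.0,  2.0],
              [3.0,  1.0]])                   # A in R^{3,2}
u = np.array([[1.0], [2.0], [3.0]])
Y = u + A @ X

print(np.cov(Y))        # empirical covariance of u + AX
print(A @ Kx @ A.T)     # A K_X A^*
```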
19 Outline (Section 3: Gaussian random vectors)
20 Definition

Definition: Let $X \in \mathbb{R}^n$. $X$ is a Gaussian random vector iff for all $\lambda \in \mathbb{R}^n$,
$$ \langle \lambda, X \rangle = \lambda^* X = \sum_{i=1}^{n} \lambda_i X_i $$
is a real valued Gaussian r.v.

Remarks:
(1) $X$ Gaussian vector $\Rightarrow$ each component $X_i$ of $X$ is a real Gaussian r.v.
(2) Key example of Gaussian vector: independent Gaussian components $X_1, \ldots, X_n$.
(3) Easy construction of a random vector $X \in \mathbb{R}^2$ such that (i) $X_1, X_2$ are real Gaussian and (ii) $X$ is not a Gaussian vector; see the sketch below.
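The classical instance of Remark (3), which the slide does not spell out, takes $X_2 = \varepsilon X_1$ with $\varepsilon$ a random sign independent of $X_1$. By symmetry $X_2 \sim \mathcal{N}(0,1)$, yet $X_1 + X_2$ has an atom at 0, so $\langle \lambda, X \rangle$ is not Gaussian for $\lambda = (1,1)$. A simulation sketch, assuming NumPy:

```python
# Gaussian marginals without joint Gaussianity (classical example; NumPy assumed).
import numpy as np

rng = np.random.default_rng(8)
x1 = rng.standard_normal(100_000)
eps = rng.choice([-1.0, 1.0], size=100_000)   # independent random sign
x2 = eps * x1                                 # still N(0,1) by symmetry

print(x2.mean(), x2.var())       # marginal looks standard Gaussian
print(np.mean(x1 + x2 == 0))     # approx 0.5: X1 + X2 has a point mass at 0
```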
21 Characteristic function

Proposition 4. Let $X$ be a Gaussian vector with mean $m$ and covariance $K$. Then, for all $u \in \mathbb{R}^n$,
$$ E[\exp(\imath \langle u, X \rangle)] = e^{\imath \langle u, m \rangle - \frac{1}{2} u^* K u}, $$
where we use the matrix representation for the vector $u$.
22 Proof

Identification of $\langle u, X \rangle$: $\langle u, X \rangle$ is a Gaussian r.v. by assumption, with parameters
$$ \mu := E[\langle u, X \rangle] = \langle u, m \rangle, \quad \text{and} \quad \sigma^2 := \mathrm{Var}(\langle u, X \rangle) = u^* K u. \quad (2) $$

Characteristic function of a 1-d Gaussian r.v.: Let $Y \sim \mathcal{N}(\mu, \sigma^2)$. Then recall that
$$ E[\exp(\imath t Y)] = \exp\left(\imath t \mu - \frac{t^2}{2}\sigma^2\right), \quad t \in \mathbb{R}. \quad (3) $$

Conclusion: Easily obtained by plugging (2) into (3).
23 Remark and notation

Remark: According to Proposition 4,
- The law of a Gaussian vector $X$ is characterized by its mean $m$ and its covariance matrix $K$.
- If $X$ and $Y$ are two Gaussian vectors with the same mean and covariance matrix, their law is the same.

Caution: This is only true for Gaussian vectors. In general, two random variables sharing the same mean and variance are not equal in law.

Notation: If $X$ is a Gaussian vector with mean $m$ and covariance $K$, we write $X \sim \mathcal{N}(m, K)$.
24 Linear transformations

Proposition 5. Let
- $X \sim \mathcal{N}(m_X, K_X)$,
- $A \in \mathbb{R}^{p,n}$ and $z \in \mathbb{R}^p$.
Set $Y = AX + z$. Then $Y \sim \mathcal{N}(m_Y, K_Y)$, with
$$ m_Y = z + A m_X, \qquad K_Y = A K_X A^*. $$
25 Proof

Aim: Let $u \in \mathbb{R}^p$. We wish to prove that $u^* Y$ is a Gaussian r.v.

Expression for $u^* Y$: We have
$$ u^* Y = u^* z + u^* A X = u^* z + v^* X, $$
where we have set $v = A^* u$. This is a Gaussian r.v.

Conclusion: $Y$ is a Gaussian vector. In addition,
$$ m_Y = E[Y] = z + A E[X] = z + A m_X, \quad \text{and} \quad K_Y = A K_X A^*. $$
26 Positivity of the covariance matrix

Proposition 6. Let $X$ be a random vector with covariance matrix $K$. Then $K$ is a symmetric positive matrix.

Proof:
- Symmetry: $K(i,j) = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = K(j,i)$.
- Positivity: Let $u \in \mathbb{R}^n$ and $Y = u^* X$. Then $\mathrm{Var}(Y) = u^* K u \ge 0$.
27 Linear algebra lemma

Lemma 7. Let $\Gamma \in \mathbb{R}^{n,n}$ be symmetric and positive. Then there exists a matrix $A \in \mathbb{R}^{n,n}$ such that $\Gamma = A A^*$.
28 Proof

Diagonal form of $\Gamma$:
- $\Gamma$ symmetric $\Rightarrow$ there exist an orthogonal matrix $U$ and $D_1 = \mathrm{Diag}(\lambda_1, \ldots, \lambda_n)$ such that $D_1 = U^* \Gamma U$.
- $\Gamma$ positive $\Rightarrow$ $\lambda_i \ge 0$ for all $i \in \{1, 2, \ldots, n\}$.

Definition of the square root: Let $D = \mathrm{Diag}(\lambda_1^{1/2}, \ldots, \lambda_n^{1/2})$. We set $A = UD$.

Conclusion: Recall that $U^{-1} = U^*$, therefore $\Gamma = U D_1 U^*$. Now $D_1 = D^2 = D D^*$, and thus
$$ \Gamma = U D D^* U^* = UD\,(UD)^* = A A^*. $$
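The construction in the proof translates directly into code via a symmetric eigendecomposition. A sketch assuming NumPy; the matrix $\Gamma$ below is an arbitrary symmetric positive example.

```python
# Square root of a symmetric positive matrix as in Lemma 7 (NumPy assumed).
import numpy as np

Gamma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])          # illustrative symmetric positive matrix
lam, U = np.linalg.eigh(Gamma)          # D1 = U* Gamma U, with lam >= 0
A = U @ np.diag(np.sqrt(lam))           # A = U D, D = Diag(lam^{1/2})

print(np.allclose(A @ A.T, Gamma))      # Gamma = A A*
```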
29 Construction of a Gaussian vector

Theorem 8. Let
- $m \in \mathbb{R}^n$,
- $\Gamma \in \mathbb{R}^{n,n}$ symmetric and positive.
Then there exists a Gaussian vector $X \sim \mathcal{N}(m, \Gamma)$.
30 Proof

Standard Gaussian vector in $\mathbb{R}^n$: Let $Y_1, Y_2, \ldots, Y_n$ be i.i.d. with common law $\mathcal{N}_1(0,1)$. We set $Y = (Y_1, \ldots, Y_n)^*$, and therefore $Y \sim \mathcal{N}(0, \mathrm{Id}_n)$.

Definition of $X$: Let $A \in \mathbb{R}^{n,n}$ such that $A A^* = \Gamma$. We define $X$ as $X = m + AY$.

Conclusion: According to Proposition 5 we have $X \sim \mathcal{N}(m, K_X)$, with
$$ K_X = A K_Y A^* = A\,\mathrm{Id}\,A^* = A A^* = \Gamma. $$
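This proof is also the standard sampling recipe: draw $Y \sim \mathcal{N}(0, \mathrm{Id})$ and set $X = m + AY$. The sketch below, assuming NumPy, uses the Cholesky factor as the matrix $A$; that is one valid choice when $\Gamma$ is positive definite, an assumption beyond the slide, which only needs $AA^* = \Gamma$. The parameters are illustrative.

```python
# Sampling N(m, Gamma) as in the proof of Theorem 8 (NumPy assumed).
import numpy as np

rng = np.random.default_rng(3)
m = np.array([1.0, -2.0])
Gamma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

A = np.linalg.cholesky(Gamma)             # one choice of A with A A* = Gamma
Y = rng.standard_normal((2, 100_000))     # Y ~ N(0, Id)
X = m[:, None] + A @ Y

print(X.mean(axis=1))     # approx m
print(np.cov(X))          # approx Gamma
```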
31 Decorrelation and independence

Theorem 9. Let $X$ be a Gaussian vector, with $X = (X_1, \ldots, X_n)$. Then the random variables $X_1, \ldots, X_n$ are independent iff the covariance matrix $K_X$ is diagonal.
32 Proof of $\Rightarrow$

Decorrelation of coordinates: If $X_1, \ldots, X_n$ are independent, then
$$ K(i,j) = \mathrm{Cov}(X_i, X_j) = 0, \quad \text{whenever } i \ne j. $$
Therefore $K_X$ is diagonal.
33 Proof of $\Leftarrow$ (1)

Characteristic function of $X$: Set $K = K_X$. We have shown that
$$ E[\exp(\imath \langle u, X \rangle)] = e^{\imath \langle u, E[X] \rangle - \frac{1}{2} u^* K u}, \quad u \in \mathbb{R}^n. \quad (4) $$
Since $K$ is diagonal, we have
$$ u^* K u = \sum_{l=1}^{n} u_l^2 K(l,l) = \sum_{l=1}^{n} u_l^2 \mathrm{Var}(X_l). \quad (5) $$

Characteristic function of each coordinate: Let $\varphi_{X_l}$ be the characteristic function of $X_l$, that is $\varphi_{X_l}(s) = E[e^{\imath s X_l}]$ for all $s \in \mathbb{R}$. Taking $u$ such that $u_i = 0$ for all $i \ne l$ in (4) and (5), we get
$$ \varphi_{X_l}(u_l) = E[\exp(\imath u_l X_l)] = e^{\imath u_l E[X_l] - \frac{1}{2} u_l^2 \mathrm{Var}(X_l)}. $$
34 Proof of $\Leftarrow$ (2)

Conclusion: We can recast (4) as follows: for all $u = (u_1, u_2, \ldots, u_n)$,
$$ \prod_{j=1}^{n} \varphi_{X_j}(u_j) = E\left[\exp\left(\imath \sum_{l=1}^{n} u_l X_l\right)\right] = E[\exp(\imath \langle u, X \rangle)]. $$
This means that the random variables $X_1, \ldots, X_n$ are independent.
35 Lemma about absolutely continuous r.v.

Lemma 10. Let
- $\xi \in \mathbb{R}^n$ be a random variable admitting a density,
- $H$ a subspace of $\mathbb{R}^n$ such that $\dim(H) < n$.
Then $P(\xi \in H) = 0$.
36 Proof

Change of variables: We can assume $H \subset H'$ with
$$ H' = \{(x_1, x_2, \ldots, x_n);\ x_n = 0\}. $$

Conclusion: Denote by $\varphi$ the density of $\xi$. We have
$$ P(\xi \in H) \le P(\xi \in H') = \int_{\mathbb{R}^n} \varphi(x_1, x_2, \ldots, x_n)\, 1_{\{x_n = 0\}}\, dx_1\, dx_2 \cdots dx_n = 0. $$
37 Gaussian density

Theorem 11. Let $X \sim \mathcal{N}(m, K)$. Then
1. $X$ admits a density iff $K$ is invertible.
2. If $K$ is invertible, the density of $X$ is given by
$$ f(x) = \frac{1}{(2\pi)^{n/2} (\det(K))^{1/2}} \exp\left(-\frac{1}{2}(x-m)^* K^{-1} (x-m)\right). $$
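The density formula of Theorem 11 can be transcribed directly. A sketch assuming NumPy; the point $x$, mean $m$ and covariance $K$ below are arbitrary examples.

```python
# Density of N(m, K) for invertible K, as in Theorem 11 (NumPy assumed).
import numpy as np

def gaussian_density(x, m, K):
    """Evaluate the N(m, K) density at x, for invertible K."""
    n = len(m)
    d = x - m
    quad = d @ np.linalg.solve(K, d)          # (x-m)* K^{-1} (x-m)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(K))
    return np.exp(-0.5 * quad) / norm

m = np.array([0.0, 0.0])
K = np.array([[2.0, 0.5],
              [0.5, 1.0]])
print(gaussian_density(np.array([1.0, -1.0]), m, K))
```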
38 Proof (1)

Density and inversion of $K$: We have seen $X \stackrel{(d)}{=} m + AY$, where $A A^* = K$ and $Y \sim \mathcal{N}(0, \mathrm{Id}_n)$.

(i) Assume $A$ non invertible.
- $A$ non invertible $\Rightarrow$ $\mathrm{Im}(A) = H$, with $\dim(H) < n$ $\Rightarrow$ $P(AY \in H) = 1$.
- Contradiction: $X$ admits a density $\Rightarrow$ $X - m$ admits a density $\Rightarrow$ $P(X - m \in H) = 0$.
- However, we have seen that $P(X - m \in H) = P(AY \in H) = 1$.
Hence $X$ doesn't admit a density.
39 Proof (2)

(ii) Assume $A$ invertible. $A$ invertible $\Rightarrow$ the application $y \mapsto m + Ay$ is a $C^1$ bijection $\Rightarrow$ the random variable $m + AY$ admits a density.

(iii) Conclusion. Since $A A^* = K$, we have
$$ \det(A)\det(A^*) = (\det(A))^2 = \det(K), $$
and we get the equivalence: $A$ invertible $\iff$ $K$ invertible.
40 Proof (3)

(2) Expression of the density: Let $Y \sim \mathcal{N}(0, \mathrm{Id}_n)$.

Density of $Y$:
$$ g(y) = \frac{1}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2}\langle y, y \rangle\right). $$

Change of variable: Set $X = AY + m$, that is $Y = A^{-1}(X - m)$.
41 Proof (4)

Jacobian of the transformation: for $x \mapsto A^{-1}(x - m)$, the Jacobian is $A^{-1}$.

Determinant of the Jacobian:
$$ \det(A^{-1}) = [\det(A)]^{-1} = [\det(K)]^{-1/2}. $$

Expression for the inner product: We have $K^{-1} = (A A^*)^{-1} = (A^*)^{-1} A^{-1}$, and
$$ \langle y, y \rangle = \langle A^{-1}(x-m), A^{-1}(x-m) \rangle = (x-m)^* (A^{-1})^* A^{-1} (x-m) = (x-m)^* K^{-1} (x-m). $$

Thus $m + AY$ admits the density $f$. Since $X$ and $m + AY$ share the same law, $X$ admits the density $f$.
42 Outline (Section 4: Central limit theorem)
43 Law of large numbers

Theorem 12. We consider the following situation:
- $(X_n;\ n \ge 1)$ a sequence of i.i.d. $\mathbb{R}^k$-valued r.v.
- Hypothesis: $E[|X_1|] < \infty$, and we set $E[X_1] = m \in \mathbb{R}^k$.
We define
$$ \bar{X}_n = \frac{1}{n} \sum_{j=1}^{n} X_j. $$
Then $\lim_{n \to \infty} \bar{X}_n = m$ almost surely.
44 Central limit theorem

Theorem 13. We consider the following situation:
- $\{X_n;\ n \ge 1\}$ a sequence of i.i.d. $\mathbb{R}^k$-valued r.v.
- Hypothesis: $E[|X_1|^2] < \infty$.
- We set $E[X_1] = m \in \mathbb{R}^k$ and $\mathrm{Cov}(X_1) = \Gamma \in \mathbb{R}^{k,k}$.
Then
$$ \sqrt{n}\left(\bar{X}_n - m\right) \xrightarrow{(d)} \mathcal{N}_k(0, \Gamma), \quad \text{with} \quad \bar{X}_n = \frac{1}{n} \sum_{j=1}^{n} X_j. $$

Interpretation: $\bar{X}_n$ converges to $m$ with rate $n^{-1/2}$.
45 Convergence in law, first definition

Remark: For notational sake the remainder of the section will focus on $\mathbb{R}$-valued r.v.

Definition 14. Let
- $\{X_n;\ n \ge 1\}$ a sequence of r.v., $X_0$ another r.v.
- $F_n$ the distribution function of $X_n$, $F_0$ the distribution function of $X_0$.
For a distribution function $F$ we set $C(F) \equiv \{x \in \mathbb{R};\ F \text{ continuous at point } x\}$.

Definition 1: We have $\lim_{n \to \infty} X_n \stackrel{(d)}{=} X_0$ if $\lim_{n \to \infty} F_n(x) = F_0(x)$ for all $x \in C(F_0)$.
46 Convergence in law, equivalent definition

Proposition 15. Let $\{X_n;\ n \ge 1\}$ a sequence of r.v., $X_0$ another r.v. We set $C_b(\mathbb{R}) \equiv \{\varphi : \mathbb{R} \to \mathbb{R};\ \varphi \text{ continuous and bounded}\}$.

Definition 2: We have $\lim_{n \to \infty} X_n \stackrel{(d)}{=} X_0$ iff $\lim_{n \to \infty} E[\varphi(X_n)] = E[\varphi(X_0)]$ for all $\varphi \in C_b(\mathbb{R})$.
47 Central limit theorem in R

Theorem 16. We consider the following situation:
- $\{X_n;\ n \ge 1\}$ a sequence of i.i.d. $\mathbb{R}$-valued r.v.
- Hypothesis: $E[|X_1|^2] < \infty$.
- We set $E[X_1] = \mu$ and $\mathrm{Var}(X_1) = \sigma^2$.
Then
$$ \sqrt{n}\left(\bar{X}_n - \mu\right) \xrightarrow{(d)} \mathcal{N}(0, \sigma^2), \quad \text{with} \quad \bar{X}_n = \frac{1}{n} \sum_{j=1}^{n} X_j. $$
Otherwise stated, we have
$$ \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma n^{1/2}} \xrightarrow{(d)} \mathcal{N}(0, 1). $$
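A simulation makes the statement concrete. The sketch below, assuming NumPy, standardizes means of i.i.d. uniforms ($\mu = 1/2$, $\sigma^2 = 1/12$); the sample sizes and seed are arbitrary choices.

```python
# Simulation of Theorem 16 with uniform summands (illustrative; NumPy assumed).
import numpy as np

rng = np.random.default_rng(4)
n, reps = 400, 10_000
mu, sigma = 0.5, (1 / 12) ** 0.5

xbar = rng.random((reps, n)).mean(axis=1)     # reps copies of the empirical mean
z = np.sqrt(n) * (xbar - mu) / sigma

print(z.mean(), z.std())      # approx 0 and 1
print(np.mean(z <= 1.96))     # approx Phi(1.96), about 0.975
```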
48 Application: Bernoulli distribution

Proposition 17. Let $(X_n;\ n \ge 1)$ be a sequence of i.i.d. $\mathcal{B}(p)$ r.v. Then
$$ \sqrt{n}\left(\frac{\bar{X}_n - p}{[p(1-p)]^{1/2}}\right) \xrightarrow{(d)} \mathcal{N}_1(0, 1). $$

Remark: For practical purposes, as soon as $np > 15$, the law of
$$ \frac{X_1 + \cdots + X_n - np}{\sqrt{np(1-p)}} $$
is approached by $\mathcal{N}_1(0,1)$. Notice that $X_1 + \cdots + X_n \sim \mathrm{Bin}(n, p)$.
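The normal approximation can be compared with exact binomial probabilities using only the standard library. The continuity correction (the $+0.5$ below) is a standard refinement, not stated on the slide; $n$, $p$ and the cutoff are illustrative.

```python
# Normal approximation to Bin(n, p), as in Proposition 17 (standard library only).
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p = 100, 0.3                               # np = 30 > 15
mean, sd = n * p, math.sqrt(n * p * (1 - p))

exact = sum(binom_pmf(n, p, k) for k in range(36))   # P(X <= 35)
approx = Phi((35 + 0.5 - mean) / sd)                 # CLT, continuity corrected
print(exact, approx)
```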
49 Binomial distribution: plot (1)

[Figure: distribution $\mathrm{Bin}(6, 0.5)$; x-axis: $k$, y-axis: $P(X = k)$.]

50 Binomial distribution: plot (2)

[Figure: distribution $\mathrm{Bin}(30, 0.5)$; x-axis: $k$, y-axis: $P(X = k)$.]
51 Relation between pdf and chf

Theorem 18. Let
- $F$ be a distribution function on $\mathbb{R}$,
- $\varphi$ the characteristic function of $F$.
Then $F$ is uniquely determined by $\varphi$.
52 Proof (1)

Setting: We consider
- a r.v. $X$ with distribution $F$ and chf $\varphi$,
- a r.v. $Z$ with distribution $G$ and chf $\gamma$.

Relation between chf's: We have
$$ \int_{\mathbb{R}} e^{-\imath \theta z} \varphi(z)\, G(dz) = \int_{\mathbb{R}} \gamma(x - \theta)\, F(dx). \quad (6) $$
53 Proof (2)

Proof of (6): Invoking Fubini, we get
$$ E\left[e^{-\imath \theta Z} \varphi(Z)\right] = \int_{\mathbb{R}} e^{-\imath \theta z} \varphi(z)\, G(dz) = \int_{\mathbb{R}} G(dz) \left[\int_{\mathbb{R}} e^{-\imath \theta z}\, e^{\imath z x}\, F(dx)\right] = \int_{\mathbb{R}} F(dx) \left[\int_{\mathbb{R}} e^{\imath z (x - \theta)}\, G(dz)\right] = E[\gamma(X - \theta)]. $$
54 Proof (3)

Particularizing to a Gaussian case: We now consider $Z = \sigma N$ with $N \sim \mathcal{N}(0, 1)$. In this case, if $n$ is the density of $\mathcal{N}(0,1)$, we have $G(dz) = \sigma^{-1} n(\sigma^{-1} z)\, dz$.

With this setting, relation (6) becomes
$$ \int_{\mathbb{R}} e^{-\imath \theta \sigma z} \varphi(\sigma z)\, n(z)\, dz = \int_{\mathbb{R}} e^{-\frac{1}{2}\sigma^2 (z - \theta)^2}\, F(dz). \quad (7) $$
55 Proof (4)

Integration with respect to $\theta$: Integrating (7) wrt $\theta$ we get
$$ \int_{-\infty}^{x} d\theta \int_{\mathbb{R}} e^{-\imath \theta \sigma z} \varphi(\sigma z)\, n(z)\, dz = A_{\sigma}(x), \quad (8) $$
where
$$ A_{\sigma}(x) = \int_{-\infty}^{x} d\theta \int_{\mathbb{R}} e^{-\frac{1}{2}\sigma^2 (z - \theta)^2}\, F(dz). $$
56 Proof (5)

Expression for $A_{\sigma}$: We have, by Fubini and the change of variable $s = \theta - z$,
$$ A_{\sigma}(x) = \int_{\mathbb{R}} F(dz) \int_{-\infty}^{x} e^{-\frac{1}{2}\sigma^2 (z - \theta)^2}\, d\theta = \left(2\pi\sigma^{-2}\right)^{1/2} \int_{\mathbb{R}} F(dz) \int_{-\infty}^{x - z} n_{0, \sigma^{-2}}(s)\, ds, $$
where $n_{0,\sigma^{-2}}$ denotes the density of $\mathcal{N}(0, \sigma^{-2})$. Therefore, considering $N$ independent of $X$ with $N \sim \mathcal{N}(0, 1)$, we get
$$ A_{\sigma}(x) = \left(2\pi\sigma^{-2}\right)^{1/2} P\left(\sigma^{-1} N + X \le x\right). \quad (9) $$
57 Proof (6)

Summary: Putting together (8) and (9) we get
$$ \int_{-\infty}^{x} d\theta \int_{\mathbb{R}} e^{-\imath \theta \sigma z} \varphi(\sigma z)\, n(z)\, dz = \left(2\pi\sigma^{-2}\right)^{1/2} P\left(\sigma^{-1} N + X \le x\right). $$
Divide the above relation by $(2\pi\sigma^{-2})^{1/2}$. We obtain
$$ \frac{\sigma}{(2\pi)^{1/2}} \int_{-\infty}^{x} d\theta \int_{\mathbb{R}} e^{-\imath \theta \sigma z} \varphi(\sigma z)\, n(z)\, dz = P\left(\sigma^{-1} N + X \le x\right). \quad (10) $$
58 Proof (7)

Convergence result: Recall that
$$ X_{1,n} \xrightarrow{(d)} X_1 \ \text{ and } \ X_{2,n} \xrightarrow{(P)} 0 \ \Longrightarrow \ X_{1,n} + X_{2,n} \xrightarrow{(d)} X_1. \quad (11) $$

Notation: We set $C(F) \equiv \{x \in \mathbb{R};\ F \text{ continuous at point } x\}$.
59 Proof (8)

Limit as $\sigma \to \infty$: Thanks to our convergence result, one can take limits in (10):
$$ \lim_{\sigma \to \infty} \frac{\sigma}{(2\pi)^{1/2}} \int_{-\infty}^{x} d\theta \int_{\mathbb{R}} e^{-\imath \theta \sigma z} \varphi(\sigma z)\, n(z)\, dz = \lim_{\sigma \to \infty} P\left(\sigma^{-1} N + X \le x\right) = P(X \le x) = F(x), \quad (12) $$
for all $x \in C(F)$.

Conclusion: $F$ is determined by $\varphi$.
60 Fourier inversion

Proposition 19. Let
- $F$ be a distribution function on $\mathbb{R}$, and $X \sim F$,
- $\varphi$ the characteristic function of $F$.
Hypothesis: $\varphi \in L^1(\mathbb{R})$.
Conclusion: $F$ admits a bounded continuous density $f$, given by
$$ f(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-\imath y x} \varphi(y)\, dy. $$
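The inversion formula can be tested numerically by discretizing the integral. The sketch below, assuming NumPy, recovers the $\mathcal{N}(0,1)$ density from its characteristic function $\varphi(y) = e^{-y^2/2}$; the truncation and grid are arbitrary numerical choices.

```python
# Numerical Fourier inversion as in Proposition 19 (illustrative; NumPy assumed).
import numpy as np

y = np.linspace(-40.0, 40.0, 20_001)     # truncated grid for the integral
dy = y[1] - y[0]
phi = np.exp(-y**2 / 2)                  # chf of N(0,1)

for x in (0.0, 1.0, 2.0):
    f = (np.exp(-1j * y * x) * phi).sum().real * dy / (2 * np.pi)
    exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(x, f, exact)
```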
61 Proof (1)

Density of $\sigma^{-1} N + X$: We set $F_\sigma(x) = P(\sigma^{-1} N + X \le x)$. Since $N$ admits a density, the sum $\sigma^{-1} N + X$ does too, so $F_\sigma$ admits a density $f_\sigma$.

Expression for $F_\sigma$: Recall relation (10):
$$ \frac{\sigma}{(2\pi)^{1/2}} \int_{-\infty}^{x} d\theta \int_{\mathbb{R}} e^{-\imath \theta \sigma z} \varphi(\sigma z)\, n(z)\, dz = F_\sigma(x). \quad (13) $$
62 Proof (2)

Expression for $f_\sigma$: Differentiating the lhs of (13) in $x$, then applying the change of variable $\sigma z = y$ and the Gaussian form of $n$, we get
$$ f_\sigma(\theta) = \frac{\sigma}{(2\pi)^{1/2}} \int_{\mathbb{R}} e^{-\imath \theta \sigma z} \varphi(\sigma z)\, n(z)\, dz = \frac{1}{(2\pi)^{1/2}} \int_{\mathbb{R}} e^{-\imath \theta y} \varphi(y)\, n(\sigma^{-1} y)\, dy = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-\imath \theta y} \varphi(y)\, e^{-\frac{y^2}{2\sigma^2}}\, dy. $$

Relation (10) on a finite interval: Let $I = [a, b]$. Using $f_\sigma$ we have
$$ P\left(\sigma^{-1} N + X \in [a, b]\right) = F_\sigma(b) - F_\sigma(a) = \int_{a}^{b} f_\sigma(\theta)\, d\theta. \quad (14) $$
63 Proof (3)

Limit of $f_\sigma$: By dominated convergence,
$$ \lim_{\sigma \to \infty} f_\sigma(\theta) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-\imath \theta y} \varphi(y)\, dy \equiv f(\theta). $$

Domination of $f_\sigma$: We have
$$ |f_\sigma(\theta)| = \left| \frac{1}{2\pi} \int_{\mathbb{R}} e^{-\imath \theta y} \varphi(y)\, e^{-\frac{y^2}{2\sigma^2}}\, dy \right| \le \frac{1}{2\pi} \int_{\mathbb{R}} |\varphi(y)|\, dy = \frac{1}{2\pi} \|\varphi\|_{L^1(\mathbb{R})}. $$
64 Proof (4)

Limits in (14): We use
- on the lhs of (14): the convergence result (11),
- on the rhs of (14): dominated convergence (on the finite interval $I$).
We get
$$ P(X \in [a, b]) = F(b) - F(a) = \int_{a}^{b} f(\theta)\, d\theta. $$

Conclusion: $X$ admits $f$ (obtained by Fourier inversion) as a density.
65 Convergence in law and chf

Theorem 20. Let
- $\{X_n;\ n \ge 1\}$ a sequence of r.v., $X_0$ another r.v.
- $\varphi_n$ the chf of $X_n$, $\varphi_0$ the chf of $X_0$.
Then
(i) We have
$$ \lim_{n \to \infty} X_n \stackrel{(d)}{=} X_0 \ \Longrightarrow \ \lim_{n \to \infty} \varphi_n(t) = \varphi_0(t) \text{ for all } t \in \mathbb{R}. $$
(ii) Assume that $\varphi_0(0) = 1$ and $\varphi_0$ is continuous at point 0. Then we have
$$ \lim_{n \to \infty} \varphi_n(t) = \varphi_0(t) \text{ for all } t \in \mathbb{R} \ \Longrightarrow \ \lim_{n \to \infty} X_n \stackrel{(d)}{=} X_0. $$
66 Central limit theorem in R (repeated)

Theorem 21. We consider the following situation:
- $\{X_n;\ n \ge 1\}$ a sequence of i.i.d. $\mathbb{R}$-valued r.v.
- Hypothesis: $E[|X_1|^2] < \infty$.
- We set $E[X_1] = \mu$ and $\mathrm{Var}(X_1) = \sigma^2$.
Then
$$ \sqrt{n}\left(\bar{X}_n - \mu\right) \xrightarrow{(d)} \mathcal{N}(0, \sigma^2), \quad \text{with} \quad \bar{X}_n = \frac{1}{n} \sum_{j=1}^{n} X_j. $$
Otherwise stated, we have
$$ \frac{S_n - n\mu}{\sigma n^{1/2}} \xrightarrow{(d)} \mathcal{N}(0, 1), \quad \text{with} \quad S_n = \sum_{i=1}^{n} X_i. \quad (15) $$
67 Proof of CLT (1)

Reduction to $\mu = 0$, $\sigma = 1$: Set
$$ \hat{X}_i = \frac{X_i - \mu}{\sigma}, \quad \text{and} \quad \hat{S}_n = \sum_{i=1}^{n} \hat{X}_i. $$
Then $E[\hat{X}_i] = 0$, $\mathrm{Var}(\hat{X}_i) = 1$, and
$$ \hat{S}_n = \frac{S_n - n\mu}{\sigma}, \quad \text{so that} \quad \frac{S_n - n\mu}{\sigma n^{1/2}} = \frac{\hat{S}_n}{n^{1/2}}. $$
Thus it is enough to prove (15) when $\mu = 0$ and $\sigma = 1$.
68 Proof of CLT (2)

Aim: For $X_i$ such that $E[X_i] = 0$ and $\mathrm{Var}(X_i) = 1$, set
$$ \varphi_n(t) = E\left[e^{\imath t \frac{S_n}{n^{1/2}}}\right]. $$
We wish to prove that
$$ \lim_{n \to \infty} \varphi_n(t) = e^{-\frac{1}{2}t^2}. $$
According to Theorem 20-(ii), this yields the desired result.
69 Taylor expansion of the chf

Lemma 22. Let
- $Y$ be a r.v.,
- $\psi$ the chf of $Y$.
Hypothesis: for $l \ge 1$, $E[|Y|^l] < \infty$.
Conclusion:
$$ \left| \psi(s) - \sum_{k=0}^{l} \frac{(\imath s)^k}{k!}\, E[Y^k] \right| \le E\left[ \min\left( \frac{|s|^{l+1} |Y|^{l+1}}{(l+1)!},\ \frac{2\, |s|^l |Y|^l}{l!} \right) \right]. $$

Proof: Similar to (1).
70 Proof of CLT (3)

Computation for $\varphi_n$: We have
$$ \varphi_n(t) = \left( E\left[e^{\imath \frac{t X_1}{n^{1/2}}}\right] \right)^n = \left[ \varphi\left(\frac{t}{n^{1/2}}\right) \right]^n, \quad (16) $$
where $\varphi$ is the characteristic function of $X_1$.
71 Proof of CLT (4)

Expansion of $\varphi$: According to Lemma 22, we have
$$ \varphi\left(\frac{t}{n^{1/2}}\right) = 1 + \frac{\imath t\, E[X_1]}{n^{1/2}} + \frac{\imath^2 t^2\, E[X_1^2]}{2n} + R_n = 1 - \frac{t^2}{2n} + R_n, \quad (17) $$
and $R_n$ satisfies
$$ |R_n| \le \frac{1}{n}\, E\left[ \min\left( \frac{|t|^3 |X_1|^3}{6\, n^{1/2}},\ |t|^2 |X_1|^2 \right) \right]. $$

Behavior of $R_n$: By dominated convergence we have
$$ \lim_{n \to \infty} n R_n = 0. \quad (18) $$
72 Products of complex numbers

Lemma 23. Let
- $\{a_i;\ 1 \le i \le n\}$, such that $a_i \in \mathbb{C}$ and $|a_i| \le 1$,
- $\{b_i;\ 1 \le i \le n\}$, such that $b_i \in \mathbb{C}$ and $|b_i| \le 1$.
Then we have
$$ \left| \prod_{i=1}^{n} a_i - \prod_{i=1}^{n} b_i \right| \le \sum_{i=1}^{n} |a_i - b_i|. $$
73 Proof of Lemma 23

Case $n = 2$: Stems directly from the identity
$$ a_1 a_2 - b_1 b_2 = a_1 (a_2 - b_2) + (a_1 - b_1) b_2. $$

General case: By induction.
74 Proof of CLT (5)

Summary: Thanks to (16) and (17) we have
$$ \varphi_n(t) = \left[ \varphi\left(\frac{t}{n^{1/2}}\right) \right]^n, \quad \text{and} \quad \varphi\left(\frac{t}{n^{1/2}}\right) = 1 - \frac{t^2}{2n} + R_n. $$

Application of Lemma 23: We get
$$ \left| \left[ \varphi\left(\frac{t}{n^{1/2}}\right) \right]^n - \left( 1 - \frac{t^2}{2n} \right)^n \right| \le n \left| \varphi\left(\frac{t}{n^{1/2}}\right) - \left( 1 - \frac{t^2}{2n} \right) \right| = n |R_n|. \quad (19) $$
75 Proof of CLT (6)

Limit for $\varphi_n$: Invoking (18) and (19) we get
$$ \lim_{n \to \infty} \left| \varphi_n(t) - \left( 1 - \frac{t^2}{2n} \right)^n \right| = 0. $$
In addition,
$$ \lim_{n \to \infty} \left( 1 - \frac{t^2}{2n} \right)^n = e^{-\frac{t^2}{2}}. $$
Therefore
$$ \lim_{n \to \infty} \left| \varphi_n(t) - e^{-\frac{t^2}{2}} \right| = 0. $$

Conclusion: The CLT holds, since $\lim_{n \to \infty} \varphi_n(t) = e^{-\frac{1}{2}t^2}$.
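The last limit is elementary but pleasant to see numerically; a tiny standard-library illustration, with $t = 1.5$ chosen arbitrarily:

```python
# (1 - t^2/(2n))^n approaches e^{-t^2/2} as n grows (illustrative).
import math

t = 1.5
for n in (10, 100, 10_000):
    print(n, (1 - t**2 / (2 * n)) ** n, math.exp(-t**2 / 2))
```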
76 Outline (Section 5: Empirical mean and variance)
77 Gamma and chi-square laws

Definition 1: For all $\lambda > 0$ and $a > 0$, we denote by $\gamma(\lambda, a)$ the distribution on $\mathbb{R}$ defined by the density
$$ \frac{x^{\lambda - 1}}{a^{\lambda} \Gamma(\lambda)} \exp\left(-\frac{x}{a}\right) 1_{\{x > 0\}}, \quad \text{where} \quad \Gamma(\lambda) = \int_{0}^{\infty} x^{\lambda - 1} e^{-x}\, dx. $$
This distribution is called gamma law with parameters $\lambda, a$.

Definition 2: Let $X_1, \ldots, X_n$ be i.i.d. $\mathcal{N}(0,1)$. We set $Z = \sum_{i=1}^{n} X_i^2$. The law of $Z$ is called chi-square distribution with $n$ degrees of freedom. We denote this distribution by $\chi^2(n)$.
78 Gamma and chi-square laws (2)

Proposition 24. The distribution $\chi^2(n)$ coincides with $\gamma(n/2, 2)$. As a particular case, if
- $X_1, \ldots, X_n$ are i.i.d. $\mathcal{N}(0,1)$,
- we set $Z = \sum_{i=1}^{n} X_i^2$,
then we have $Z \sim \gamma(n/2, 2)$.
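Proposition 24 pins down the moments of a chi-square variable: $\gamma(n/2, 2)$ has mean $n$ and variance $2n$. A quick empirical check, assuming NumPy; the sample size and seed are arbitrary.

```python
# Moment check consistent with chi^2(n) = gamma(n/2, 2) (NumPy assumed).
import numpy as np

rng = np.random.default_rng(7)
n = 5
z = (rng.standard_normal((200_000, n)) ** 2).sum(axis=1)   # chi^2(n) samples

print(z.mean(), n)       # gamma(n/2, 2) has mean n
print(z.var(), 2 * n)    # and variance 2n
```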
79 Empirical mean and variance

Let $X_1, \ldots, X_n$ be $n$ real r.v.

Definition: we set
$$ \bar{X}_n = \frac{1}{n} \sum_{k=1}^{n} X_k, \quad \text{and} \quad S_n^2 = \frac{1}{n-1} \sum_{k=1}^{n} (X_k - \bar{X}_n)^2. $$
$\bar{X}_n$ is called empirical mean. $S_n^2$ is called empirical variance.

Property: Let $X_1, \ldots, X_n$ be $n$ i.i.d. real r.v. Assume $E[X_1] = m$ and $\mathrm{Var}(X_1) = \sigma^2$. Then
$$ E[\bar{X}_n] = m, \quad \text{and} \quad E[S_n^2] = \sigma^2. $$
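The unbiasedness property $E[\bar{X}_n] = m$ and $E[S_n^2] = \sigma^2$ can be checked by averaging over many samples; note that the $1/(n-1)$ normalization corresponds to `ddof=1` in NumPy. The parameters below are illustrative.

```python
# Empirical check that the empirical mean and variance are unbiased (NumPy assumed).
import numpy as np

rng = np.random.default_rng(5)
m, sigma, n, reps = 2.0, 3.0, 10, 200_000

samples = rng.normal(m, sigma, (reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)        # 1/(n-1) normalization

print(xbar.mean(), m)           # approx m
print(s2.mean(), sigma**2)      # approx sigma^2
```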
80 Law of $(\bar{X}_n, S_n^2)$ in a Gaussian situation

Theorem 25. Let $X_1, X_2, \ldots, X_n$ be i.i.d. with common law $\mathcal{N}_1(m, \sigma^2)$. Then
1. $\bar{X}_n$ and $S_n^2$ are independent.
2. $\bar{X}_n \sim \mathcal{N}_1\left(m, \frac{\sigma^2}{n}\right)$ and $\frac{(n-1)\, S_n^2}{\sigma^2} \sim \chi^2(n-1)$.
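A simulation sketch of Theorem 25, assuming NumPy and illustrative parameters: the empirical mean and variance should be uncorrelated (in fact independent), $\bar{X}_n$ should have variance $\sigma^2/n$, and $(n-1)S_n^2/\sigma^2$ should have the $\chi^2(n-1)$ mean $n-1$.

```python
# Simulation consistent with Theorem 25 (illustrative; NumPy assumed).
import numpy as np

rng = np.random.default_rng(6)
m, sigma, n, reps = 1.0, 2.0, 8, 200_000

x = rng.normal(m, sigma, (reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)

print(np.corrcoef(xbar, s2)[0, 1])               # approx 0 (independence)
print(xbar.var(), sigma**2 / n)                  # approx sigma^2 / n
print(((n - 1) * s2 / sigma**2).mean(), n - 1)   # chi^2(n-1) has mean n-1
```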
81 Proof (1)

(1) Reduction to $m = 0$ and $\sigma = 1$: we set
$$ X_i' = \frac{X_i - m}{\sigma} \ \Longleftrightarrow \ X_i = \sigma X_i' + m, \quad 1 \le i \le n. $$
The r.v. $X_1', \ldots, X_n'$ are i.i.d. distributed as $\mathcal{N}_1(0, 1)$, with empirical mean $\bar{X}_n'$ and empirical variance $S_n'^2$.
82 Proof (2)

(1) Reduction to $m = 0$ and $\sigma = 1$ (ctd): It is easily seen (using $X_i - \bar{X}_n = \sigma(X_i' - \bar{X}_n')$) that
$$ \bar{X}_n = \sigma \bar{X}_n' + m, \quad \text{and} \quad S_n^2 = \sigma^2 S_n'^2. $$
Thus we are reduced to the case $m = 0$ and $\sigma = 1$.

(2) Reduced case: Consider $X_1, \ldots, X_n$ i.i.d. $\mathcal{N}(0,1)$.
- Let $u_1 = n^{-1/2} (1, 1, \ldots, 1)^*$.
- We can construct $u_2, \ldots, u_n$ such that $(u_1, \ldots, u_n)$ is an orthonormal basis of $\mathbb{R}^n$.
- Let $A \in \mathbb{R}^{n,n}$ whose columns are $u_1, \ldots, u_n$.
- We set $Y = A^* X$.
83 Proof (3)

(i) Expression for the empirical mean:
- $A$ is an orthogonal matrix: $A A^* = A^* A = \mathrm{Id}$.
- $Y \sim \mathcal{N}(0, K_Y)$ with
$$ K_Y = A^* K_X (A^*)^* = A^*\,\mathrm{Id}\,A = A^* A = \mathrm{Id}, $$
because the covariance matrix $K_X$ of $X$ is $\mathrm{Id}$.
Due to the fact that the first row of $A^*$ is
$$ u_1^* = \left( \frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}} \right), $$
we have
$$ Y_1 = \frac{1}{\sqrt{n}} (X_1 + X_2 + \cdots + X_n) = \sqrt{n}\, \bar{X}_n, \quad \text{or otherwise stated,} \quad \bar{X}_n = \frac{Y_1}{\sqrt{n}}. $$
84 Proof (4)

(ii) Expression for the empirical variance: Let us express $S_n^2$ in terms of $Y$:
$$ (n-1)\, S_n^2 = \sum_{k=1}^{n} (X_k - \bar{X}_n)^2 = \sum_{k=1}^{n} \left( X_k^2 - 2 X_k \bar{X}_n + \bar{X}_n^2 \right) = \left( \sum_{k=1}^{n} X_k^2 \right) - 2 \bar{X}_n \left( \sum_{k=1}^{n} X_k \right) + n \bar{X}_n^2. $$
As a consequence,
$$ (n-1)\, S_n^2 = \left( \sum_{k=1}^{n} X_k^2 \right) - 2 \bar{X}_n (n \bar{X}_n) + n \bar{X}_n^2 = \left( \sum_{k=1}^{n} X_k^2 \right) - n \bar{X}_n^2. $$
85 Proof (5)

(ii) Expression for the empirical variance (ctd): We have
$$ Y = A^* X, \ A \text{ orthogonal} \ \Longrightarrow \ \sum_{k=1}^{n} Y_k^2 = \sum_{k=1}^{n} X_k^2. $$
Hence
$$ (n-1)\, S_n^2 = \sum_{k=1}^{n} X_k^2 - n \bar{X}_n^2 = \sum_{k=1}^{n} Y_k^2 - Y_1^2 = \sum_{k=2}^{n} Y_k^2. $$
86 Proof (6)

Summary: We have seen that
$$ \bar{X}_n = \frac{Y_1}{\sqrt{n}}, \quad \text{and} \quad (n-1)\, S_n^2 = \sum_{k=2}^{n} Y_k^2. $$

Conclusion:
1. $Y \sim \mathcal{N}(0, \mathrm{Id}_n)$ $\Rightarrow$ $Y_1, \ldots, Y_n$ are i.i.d. $\mathcal{N}(0,1)$ $\Rightarrow$ independence of $\bar{X}_n$ and $S_n^2$.
2. Furthermore, $\bar{X}_n = \frac{Y_1}{\sqrt{n}}$ $\Rightarrow$ $\bar{X}_n \sim \mathcal{N}_1(0, 1/n)$.
3. We also have $(n-1)\, S_n^2 = \sum_{k=2}^{n} Y_k^2$ $\Rightarrow$ the law of $(n-1)\, S_n^2$ is $\chi^2(n-1)$.