
ECE5: Analysis of Random Signals, Fall 2016
Chapter 4: Expectation and Moments
Dr. Salim El Rouayheb
Scribes: Serge Kas Hanna, Lu Liu

1 Expected Value of a Random Variable

Definition 1. The expected (or average) value of a random variable X is defined by:
1. E[X] = µ_X = Σ_i x_i P_X(x_i), if X is discrete;
2. E[X] = ∫ x f_X(x) dx, if X is continuous.

Example 1. Let X ~ Poisson(λ). What is the expected value of X?
The PMF of X is given by
Pr(X = k) = e^{-λ} λ^k / k!,  k = 0, 1, 2, ...
Hence,
E[X] = Σ_{k≥0} k e^{-λ} λ^k / k!
     = Σ_{k≥1} k e^{-λ} λ^k / k!
     = λ e^{-λ} Σ_{k≥1} λ^{k-1} / (k-1)!
     = λ e^{-λ} e^{λ}
     = λ.

Theorem 1 (Linearity of Expectation). Let X and Y be any two random variables and let a and b be constants. Then
E[aX + bY] = aE[X] + bE[Y].

Example 2. Let X ~ Binomial(n, p). What is the expected value of X?
The PMF of X is given by
Pr(X = k) = C(n,k) p^k (1-p)^{n-k},  k = 0, 1, ..., n,  where C(n,k) = n!/(k!(n-k)!).
Hence,
E[X] = Σ_{k=0}^{n} k C(n,k) p^k (1-p)^{n-k}.

Rather than evaluating this sum, an easier way to calculate E[X] is to express X as the sum of n independent Bernoulli random variables and apply Theorem 1. In fact,
X = X_1 + X_2 + ... + X_n,  where X_i ~ Bernoulli(p) for all i = 1, ..., n, i.e.,
X_i = 1 with probability p, and X_i = 0 with probability 1-p.
Hence E[X_i] = 1·p + 0·(1-p) = p, and by linearity of expectation (Theorem 1),
E[X] = E[X_1 + ... + X_n] = E[X_1] + ... + E[X_n] = np.

Theorem 2 (Expected value of a function of a RV). Let X be a RV. For a function of a RV, that is, Y = g(X), the expected value of Y can be computed from
E[Y] = ∫ g(x) f_X(x) dx.

Example 3. Let X ~ N(µ, σ^2) and Y = X^2. What is the expected value of Y?
Rather than calculating the pdf of Y and afterwards computing E[Y], we apply Theorem 2:
E[Y] = ∫ x^2 (1/√(2πσ^2)) e^{-(x-µ)^2/(2σ^2)} dx = µ^2 + σ^2.

2 Conditional Expectations

Definition 2. The conditional expectation of X given that the event B was observed is given by:
1. E[X|B] = Σ_i x_i P_{X|B}(x_i|B), if X is discrete;
2. E[X|B] = ∫ x f_{X|B}(x|B) dx, if X is continuous.

Intuition: Think of E[X|Y = y] as the best estimate (guess) of X given that you observed Y = y.

Example 4. A fair die is tossed twice. Let X be the number observed after the first toss, and Y be the number observed after the second toss. Let Z = X + Y.
1. Calculate E[X].
E[X] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.

2. Calculate E[X|Y = 3].
Since X and Y are independent, E[X|Y = 3] = E[X] = 3.5.
3. Calculate E[X|Z = 5].
Given Z = 5, X is equally likely to be 1, 2, 3 or 4, so
E[X|Z = 5] = (1 + 2 + 3 + 4)/4 = 2.5.
Remark 1. E[X|Z] is a random variable.

Theorem 3 (Towering Property of Conditional Expectation). Let X and Y be two random variables. Then
E[E[X|Y]] = E[X].

Example 5. Let X and Y be two zero-mean jointly gaussian random variables, that is,
f_{X,Y}(x, y) = (1/(2πσ^2 √(1-ρ^2))) exp[ -(x^2 + y^2 - 2ρxy)/(2σ^2(1-ρ^2)) ],
where |ρ| ≤ 1. Calculate E[X|Y = y].
E[X|Y = y] = ∫ x f_{X|Y}(x|Y = y) dx,
f_{X|Y}(x|Y = y) = f_{X,Y}(x, y)/f_Y(y)
= { (1/(2πσ^2 √(1-ρ^2))) exp[ -(x^2 + y^2 - 2ρxy)/(2σ^2(1-ρ^2)) ] } / { (1/√(2πσ^2)) exp[ -y^2/(2σ^2) ] }
= (1/√(2πσ^2(1-ρ^2))) exp[ -(x^2 + y^2 - 2ρxy - (1-ρ^2)y^2)/(2σ^2(1-ρ^2)) ]
= (1/√(2πσ^2(1-ρ^2))) exp[ -(x - ρy)^2/(2σ^2(1-ρ^2)) ].
Hence,
E[X|Y = y] = ∫ x (1/√(2πσ^2(1-ρ^2))) exp[ -(x - ρy)^2/(2σ^2(1-ρ^2)) ] dx = ρy.

Remark 2. If ρ = 0, then X and Y are independent and E[X|Y = y] = E[X] = 0.
Remark 3. For these gaussian random variables, E[X|Y] = ρY and E[Y|X] = ρX.
Remark 4. Let X̂ = E[X|Y]. X̂ is the estimate of X given that Y was observed. The MSE (mean-square error) of this estimate is given by E[|X̂ - X|^2].
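As a quick numerical companion to Example 5 (not part of the original notes), the following MATLAB sketch generates zero-mean jointly gaussian samples with an arbitrarily chosen correlation ρ and common variance, and checks that the average of X over samples whose Y falls near a chosen value y0 is close to ρ·y0; all parameter values below are illustrative assumptions.

% Monte Carlo check of Example 5: E[X | Y = y] = rho*y for zero-mean jointly
% gaussian X, Y with common variance sigma^2 and correlation rho.
% All parameters below are arbitrary illustrative choices.
rng(1);                              % fix the seed for reproducibility
rho = 0.7; sigma = 2; y0 = 1.5;
n = 1e6;
Y = sigma*randn(n,1);                % Y ~ N(0, sigma^2)
W = sigma*randn(n,1);                % independent noise, same variance
X = rho*Y + sqrt(1 - rho^2)*W;       % (X, Y) jointly gaussian, corr(X,Y) = rho
idx = abs(Y - y0) < 0.05;            % samples with Y close to y0
fprintf('empirical E[X | Y ~ %.2f] = %.3f, rho*y0 = %.3f\n', ...
        y0, mean(X(idx)), rho*y0);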

Example 6. A movie of size N bits, where N ~ Poisson(λ), is downloaded through a binary erasure channel: each bit is received correctly with probability 1-ɛ and erased (mapped to the erasure symbol E) with probability ɛ. Let K be the number of received bits.

Figure 1: Binary Erasure Channel.

1. Calculate E[K].
Intuitively, E[K] = λ(1-ɛ). We now prove it mathematically by conditioning on N as a first step. For a given number of bits N = n, K is a binomial random variable, K ~ Binomial(n, 1-ɛ), so
E[K|N = n] = n(1-ɛ),  i.e.,  E[K|N] = N(1-ɛ).
Applying the towering property of conditional expectation,
E[K] = E[E[K|N]] = E[N(1-ɛ)] = (1-ɛ)E[N] = λ(1-ɛ).

2. Calculate E[N|K].
Intuitively, E[N|K = k] = k + λɛ. We now prove it mathematically:
E[N|K = k] = Σ_{n≥0} n Pr(N = n|K = k),
Pr(N = n|K = k) = Pr(N = n, K = k)/Pr(K = k) = Pr(N = n) Pr(K = k|N = n)/Pr(K = k),
Pr(N = n) = λ^n e^{-λ}/n!,
Pr(K = k|N = n) = C(n,k) (1-ɛ)^k ɛ^{n-k},
Pr(K = k) = Σ_{n≥k} Pr(N = n) Pr(K = k|N = n) = Σ_{n≥k} (e^{-λ} λ^n/n!) C(n,k) (1-ɛ)^k ɛ^{n-k}.

P r(n n K k) e λ λ n n! + nk e λ λ n n! λ n ɛ n (n k)! λɛ + (λɛ) n k nk (n k)! (λɛ)n (n k)! (λɛ) k e λɛ (λɛ)n k e λɛ. (n k)! n! k!(n k)! ( ɛ)k ɛ n k n! k!(n k)! ( ɛ)k ɛ n k E[N K k] nk nk n (λɛ)n k e λɛ (n k)! (n k + k) (λɛ)n k e λɛ (n k)! (n k)(λɛ) n k e λɛ +k (n k)! nk }{{} λɛ k + λɛ. nk (λɛ) n k e λɛ (n k)! } {{ } 3 Moments of Random Variables Definition 3. The r th moment, r 0,,..., of a RV X is defined by,. E[X r ] m r i xr i P X(x i ), if X is discrete.. E[X r ] m r xr f X (x)dx, if X is continuous. Remark 5. Note that m 0 for any X, and m E[X] µ (the mean). Definition 4. The r th central moment, r 0,,..., of a RV X is defined as,. E[(X µ) r ] c r i (x i µ) r P X (x i ), if X is discrete.. E[(X µ) r ] c r (x µ)r f X (x)dx, if X is continuous. Remark 6. Note that for any RV X,. c 0.. c E[X µ] E[X] µ µ µ 0. 5

3. c E[(X µ) ] σ V ar[x] (the variance). In fact, σ is called the standard deviation. σ E[(X µ) ] E[X ] µ. Example 7. Let X Binomial(n, p). What is the variance of X? The PMF of X is given by, ( ) n P r(x k) p k ( p) n k, k 0,,..., n. k E[X] np (from Example ). σ E[X ] E[X] n ( ) n k p k ( p) n k n p. k k0 Check the textbook for the calculation of this sum. Here we will calculate σ using the same idea of Example, i.e. expressing X as the sum of n independent Bernoulli random variables. In fact, X X + X +... + X n. Where X i Bernoulli(p), for all i,..., n. Hence, Where, E[X ] E [ (X +... + X n ) ] E[X +... + X n + i X i X j ] j j i ne[x i ] + n(n )E[X i X j ] ne[x i ] + n(n )E[X i ][X j ]. E[X i ] p. E[X i ] p + 0 ( p) p. Hence, E[X ] np + n(n )p n p np + np. σ E[X ] E[X] n p np + np n p np( p). Example 8. Let X Geometric(p). What is the variance of X? The PMF of X is given by, P r(x k) ( p) k p, k,,... The variance of X is given by, σ E[X ] E[X]. 6

E[X] = Σ_{k≥1} k (1-p)^{k-1} p,
E[X^2] = Σ_{k≥1} k^2 (1-p)^{k-1} p.
To calculate these sums we use the following facts. For |x| < 1,
Σ_{i≥0} x^i = 1/(1-x).
Differentiating both sides with respect to x,
Σ_{i≥1} i x^{i-1} = 1/(1-x)^2.
Differentiating both sides again,
Σ_{i≥2} i(i-1) x^{i-2} = 2/(1-x)^3.
Hence,
Σ_{i≥1} i^2 x^{i-1} = Σ_{i≥1} i(i-1) x^{i-1} + Σ_{i≥1} i x^{i-1}
= x Σ_{i≥2} i(i-1) x^{i-2} + 1/(1-x)^2
= 2x/(1-x)^3 + (1-x)/(1-x)^3
= (1+x)/(1-x)^3.
Therefore,
E[X] = p · 1/(1-(1-p))^2 = p/p^2 = 1/p,
E[X^2] = p · (1+(1-p))/(1-(1-p))^3 = p(2-p)/p^3 = (2-p)/p^2.

σ^2 = E[X^2] - E[X]^2 = (2-p)/p^2 - 1/p^2 = (1-p)/p^2.

Definition 5. The covariance of two random variables X and Y is defined by
cov(X, Y) = E[(X - µ_X)(Y - µ_Y)] = E[XY] - µ_X µ_Y.

Definition 6. The correlation coefficient of two random variables X and Y is defined by
ρ_{X,Y} = cov(X, Y)/(σ_X σ_Y).
If ρ_{X,Y} = 0, then cov(X, Y) = 0 and X and Y are said to be uncorrelated.

Lemma 1. If X and Y are independent, then X and Y are uncorrelated.

Remark 7. There can be two RVs which are uncorrelated but dependent.

Example 9 (discrete case: uncorrelated does not imply independent). Consider two random variables X and Y with joint PMF P_{X,Y}(x_i, y_j) as shown in Figure 2, which places probability 1/3 on each of the pairs (x, y) = (1, 1), (0, 0), (-1, 1).

Figure 2: Values of P_{X,Y}(x_i, y_j).

µ_X = 1·(1/3) + 0·(1/3) + (-1)·(1/3) = 0,
µ_Y = 0·(1/3) + 1·(2/3) = 2/3.
The product XY takes the values
XY = 1 with probability 1/3,  0 with probability 1/3,  -1 with probability 1/3,
so E[XY] = 1·(1/3) + 0·(1/3) + (-1)·(1/3) = 0. Hence,
ρ_{X,Y} = cov(X, Y)/(σ_X σ_Y) = (E[XY] - µ_X µ_Y)/(σ_X σ_Y) = 0/(σ_X σ_Y) = 0,
and X and Y are uncorrelated. However, X and Y are dependent: for example, P(X = 1|Y = 0) = 0 ≠ P(X = 1) = 1/3.
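The following short MATLAB check (not part of the original notes) evaluates Example 9 directly from the joint PMF, assuming the values read off Figure 2 above (probability 1/3 on each of (1,1), (0,0), (-1,1)): the covariance comes out to 0 while P(X = 1 | Y = 0) differs from P(X = 1).

% Numerical check of Example 9, assuming the joint PMF of Figure 2 places
% probability 1/3 on each of the pairs (x, y) = (1,1), (0,0), (-1,1).
xy = [1 1; 0 0; -1 1];                          % support points (x, y)
p  = [1/3; 1/3; 1/3];                           % their probabilities
mx = sum(p .* xy(:,1));                         % E[X]
my = sum(p .* xy(:,2));                         % E[Y]
cxy = sum(p .* xy(:,1) .* xy(:,2)) - mx*my;     % cov(X,Y) = E[XY] - E[X]E[Y]
fprintf('E[X] = %.3f, E[Y] = %.3f, cov(X,Y) = %.3f\n', mx, my, cxy);
pX1 = sum(p(xy(:,1) == 1));                     % P(X = 1)
pX1givenY0 = sum(p(xy(:,1) == 1 & xy(:,2) == 0)) / sum(p(xy(:,2) == 0));
fprintf('P(X=1) = %.3f, P(X=1 | Y=0) = %.3f\n', pX1, pX1givenY0);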

Example 10 (continuous case: uncorrelated does not imply independent). Consider a RV Θ uniformly distributed on [0, 2π]. Let X = cos Θ and Y = sin Θ. X and Y are obviously dependent; in fact,
X^2 + Y^2 = 1.
However,
E[X] = E[cos Θ] = (1/(2π)) ∫_0^{2π} cos θ dθ = 0,
E[Y] = E[sin Θ] = (1/(2π)) ∫_0^{2π} sin θ dθ = 0,
E[XY] = E[sin Θ cos Θ] = (1/(2π)) ∫_0^{2π} sin θ cos θ dθ = (1/(4π)) ∫_0^{2π} sin 2θ dθ = 0.
Hence ρ_{X,Y} = cov(X, Y) = 0, so X and Y are uncorrelated although they are dependent.

Theorem 4. Given two random variables X and Y, |ρ_{X,Y}| ≤ 1.

Proof. We will prove this theorem using the idea behind the Cauchy-Schwarz inequality, by showing that
|cov(X, Y)| ≤ σ_X σ_Y.
Recall the picture for two vectors u, v with angle θ between them: <u, v> = ||u||·||v|| cos θ, hence |<u, v>| ≤ ||u||·||v||.
Using the same idea on random variables, let Z = Y - aX where a ∈ R. Assume that X and Y have zero mean. Consider the following cases:
1. Z ≠ 0 for all a ∈ R. Hence,
E[Z^2] = E[(Y - aX)^2] > 0,  where
E[(Y - aX)^2] = E[Y^2 - 2aXY + a^2 X^2] = E[X^2] a^2 - 2E[XY] a + E[Y^2].
Since E[Z^2] > 0 for every a, this quadratic in a has no real root, so its discriminant is negative:
4E[XY]^2 - 4E[X^2]E[Y^2] < 0.
Hence E[XY]^2 < E[X^2]E[Y^2] = σ_X^2 σ_Y^2, i.e., |E[XY]| < σ_X σ_Y, and therefore
|cov(X, Y)| < σ_X σ_Y.

2. There exists a_0 ∈ R such that Z = Y - a_0 X = 0. Then Y = a_0 X, hence
E[XY] = E[a_0 X^2] = a_0 E[X^2] = a_0 σ_X^2,
Var[Y] = Var[a_0 X] = a_0^2 Var[X] = a_0^2 σ_X^2 = σ_Y^2,
|E[XY]| = |a_0| σ_X · σ_X = σ_X σ_Y.
Therefore there is equality in this case: |cov(X, Y)| = σ_X σ_Y.
Combining the results of the two cases, |cov(X, Y)| ≤ σ_X σ_Y, and therefore |ρ_{X,Y}| ≤ 1.

Lemma 2. Var[X + Y] = Var[X] + Var[Y] + 2 cov(X, Y).

Proof.
Var[X + Y] = E[(X + Y)^2] - E[X + Y]^2
= E[(X + Y)^2] - (E[X] + E[Y])^2
= E[X^2] + 2E[XY] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2
= Var[X] + Var[Y] + 2 cov(X, Y).

Lemma 3. If X and Y are uncorrelated, then Var[X + Y] = Var[X] + Var[Y].

Proof. X and Y uncorrelated means cov(X, Y) = 0. The result then follows from Lemma 2.
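Lemma 2 is easy to confirm numerically. The MATLAB sketch below (not from the original notes) builds an arbitrary correlated pair from two independent standard gaussians and compares the sample variance of X + Y with Var[X] + Var[Y] + 2 cov(X, Y).

% Monte Carlo check of Lemma 2: Var[X+Y] = Var[X] + Var[Y] + 2*cov(X,Y).
% The pair (X, Y) below is an arbitrary illustrative construction.
rng(2);
n = 1e6;
U = randn(n,1); V = randn(n,1);
X = U;                              % Var[X] = 1
Y = 0.8*U + 0.6*V;                  % correlated with X, Var[Y] = 1
C = cov(X, Y);                      % 2x2 sample covariance matrix
lhs = var(X + Y);
rhs = var(X) + var(Y) + 2*C(1,2);
fprintf('Var[X+Y] = %.4f, Var[X]+Var[Y]+2cov = %.4f\n', lhs, rhs);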

4 Estimation

An application of the previously mentioned definitions and theorems is estimation. Suppose we wish to estimate the value of a RV Y by observing the value of another random variable X. This estimate is denoted by Ŷ and the estimation error is given by ɛ = Ŷ - Y. A popular approach for determining Ŷ, the estimate of Y given X, is to minimize the MSE (mean squared error):
MSE = E[ɛ^2] = E[(Ŷ - Y)^2].

Theorem 5. The MMSE (minimum mean squared error) estimate of Y given X is Ŷ_MMSE = E[Y|X].

In other words, the theorem states that Ŷ_MMSE = E[Y|X] minimizes the MSE. Sometimes E[Y|X] is difficult to find and a linear MMSE estimate is used instead, i.e., Ŷ_LMMSE = αX + β.

Proof (sketch). Assume WLOG that the random variables X and Y are zero mean, and denote by Ŷ the estimate of Y given X. Picture Y, its estimate Ŷ and the error ɛ as the sides of a right triangle over the observation X: in order to minimize E[ɛ^2] = E[(Y - Ŷ)^2], the error ɛ should be orthogonal to the observation X. Since ɛ ⊥ X,
E[(Ŷ - Y)X] = 0.
For a linear estimate Ŷ = αX this gives E[(Y - αX)X] = 0, i.e., E[XY] - αE[X^2] = 0. Hence,
α = E[XY]/E[X^2] = cov(X, Y)/σ_X^2 = σ_Y ρ_{X,Y}/σ_X,
Ŷ_LMMSE = (σ_Y ρ_{X,Y}/σ_X) X.
The result above is for any two zero-mean random variables. The general result, i.e., when µ_X, µ_Y ≠ 0, can be obtained by the same reasoning and is given by the following theorem.

Theorem 6. The LMMSE (linear minimum mean squared error) estimate of Y given X that minimizes the MSE is given by
Ŷ_LMMSE = (σ_Y ρ_{X,Y}/σ_X)(X - µ_X) + µ_Y.

Remark (orthogonality principle). Let Ŷ be the estimate of Y given X, and ɛ the estimation error. Then the Ŷ that minimizes the MSE E[ɛ^2] = E[(Y - Ŷ)^2] satisfies Ŷ ⊥ ɛ, i.e., Ŷ ⊥ (Y - Ŷ).
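To make Theorem 6 concrete, the sketch below (an illustration added to these notes, with an arbitrary data model) estimates µ_X, µ_Y, σ_X, σ_Y and ρ_{X,Y} from samples, forms the LMMSE estimate of Y given X, and compares its empirical MSE with that of the trivial constant estimate Ŷ = µ_Y.

% Fit the LMMSE estimator of Theorem 6 from samples and compare its MSE
% with the constant estimator Yhat = mu_Y. The data model is arbitrary.
rng(3);
n = 1e5;
X = 2 + randn(n,1);                     % mu_X = 2, sigma_X = 1
Y = 1 + 3*X + 0.5*randn(n,1);           % Y correlated with X
muX = mean(X); muY = mean(Y);
C = cov(X, Y);                          % sample covariances
rho = C(1,2) / sqrt(C(1,1)*C(2,2));     % sample correlation coefficient
Yhat = rho*sqrt(C(2,2)/C(1,1))*(X - muX) + muY;   % LMMSE estimate of Y
fprintf('MSE of LMMSE estimate:        %.4f\n', mean((Yhat - Y).^2));
fprintf('MSE of constant estimate muY: %.4f\n', mean((muY - Y).^2));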

Example. Suppose that in a room the temperature is given by a RV Y N(µ, σ ). A sensor in this room observes X Y + W, where W is the additive noise given by N(0, σw ). Assume Y and W are independent.. Find the MMSE of Y given X. Ŷ MMSE E[Y X]. f Y X (Y X x) f X,Y (x, y) f X (x) f Y (y)f X Y (x, y). f X (x) Since X is the sum of two independent gaussian RVs Y and W, we know from homework 3 that, X N(µ, σ + σ W ). Furthermore, E[X Y y] E[Y + W Y y] y + E[W ] y. V ar[x Y y] E[X Y y] E[X Y y] E[Y + Y W + W Y y] y E[Y Y y] + E[Y W Y y] + E[W Y y] y y + 0 + σ W y σ W. f X Y (X Y y) πσw exp [ ] (x y) σw. f Y X (Y X x) π σ σ W σ +σ W ] (y µ) (x y) (x µ) exp [ σ σw + (σ + σw ) [ exp (y µ ) πσ σ ]. Where, We are interested in, σ σ W σ σ + σw. Ŷ MMSE E[Y X] µ.

To determine µ_1, equate the two expressions for the exponent and take y = 0:
µ_1^2/σ_1^2 = µ^2/σ^2 + x^2/σ_W^2 - (x - µ)^2/(σ^2 + σ_W^2).
Multiplying by σ_1^2 = σ^2 σ_W^2/(σ^2 + σ_W^2) and simplifying,
µ_1^2 = (σ^4 x^2 + 2µ σ^2 σ_W^2 x + σ_W^4 µ^2)/(σ^2 + σ_W^2)^2 = ( (σ^2 x + σ_W^2 µ)/(σ^2 + σ_W^2) )^2,
so
µ_1 = (σ^2 x + σ_W^2 µ)/(σ^2 + σ_W^2).
Therefore,
Ŷ_MMSE = E[Y|X] = µ_1 = (σ^2/(σ^2 + σ_W^2)) X + (σ_W^2/(σ^2 + σ_W^2)) µ.

2. Find the linear MMSE estimate of Y given X.
cov(X, Y) = E[XY] - E[X]E[Y] = E[(Y + W)Y] - E[Y + W]E[Y]
= E[Y^2] + E[YW] - µ^2 = σ^2 + µ^2 + 0 - µ^2 = σ^2.
Hence,
ρ_{X,Y} = σ^2/(σ √(σ^2 + σ_W^2)) = σ/√(σ^2 + σ_W^2).
Applying the general formula of the LMMSE (Theorem 6),
Ŷ_LMMSE = (σ_Y ρ_{X,Y}/σ_X)(X - µ_X) + µ_Y
= (σ^2/(σ^2 + σ_W^2))(X - µ) + µ
= (σ^2/(σ^2 + σ_W^2)) X + (σ_W^2/(σ^2 + σ_W^2)) µ.
Notice that Ŷ_LMMSE = Ŷ_MMSE; in fact this is always the case for jointly gaussian random variables X and Y.

3. Find the MSE.
E[ɛ^2] = E[(Ŷ - Y)^2] = E[ɛ(Ŷ - Y)] = E[ɛŶ] - E[ɛY] = -E[ɛY]
(since E[ɛŶ] = 0 by the orthogonality principle)
= -E[(Ŷ - Y)Y] = E[Y^2] - E[ŶY],
where
E[ŶY] = E[ (σ^2/(σ^2 + σ_W^2)) XY + (σ_W^2 µ/(σ^2 + σ_W^2)) Y ]
= (σ^2 (σ^2 + µ^2))/(σ^2 + σ_W^2) + (σ_W^2 µ^2)/(σ^2 + σ_W^2),
E[Y^2] = σ^2 + µ^2.
Hence,
E[ɛ^2] = σ^2 + µ^2 - (σ^2 (σ^2 + µ^2) + σ_W^2 µ^2)/(σ^2 + σ_W^2) = σ^2 σ_W^2/(σ^2 + σ_W^2).
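The closed-form answers of Example 11 can be checked by simulation. The MATLAB sketch below (not part of the original notes; the numerical values of µ, σ and σ_W are arbitrary) draws Y and W, forms X = Y + W, applies the MMSE/LMMSE formula, and compares the empirical MSE with σ^2 σ_W^2/(σ^2 + σ_W^2).

% Monte Carlo check of Example 11:
% Yhat = sigma^2/(sigma^2+sigmaW^2)*X + sigmaW^2/(sigma^2+sigmaW^2)*mu,
% with MSE = sigma^2*sigmaW^2/(sigma^2+sigmaW^2). Parameter values are arbitrary.
rng(4);
mu = 20; sigma = 2; sigmaW = 1;
n = 1e6;
Y = mu + sigma*randn(n,1);             % room temperature
W = sigmaW*randn(n,1);                 % sensor noise
X = Y + W;                             % sensor observation
Yhat = sigma^2/(sigma^2 + sigmaW^2)*X + sigmaW^2/(sigma^2 + sigmaW^2)*mu;
empMSE    = mean((Yhat - Y).^2);
theoryMSE = sigma^2*sigmaW^2/(sigma^2 + sigmaW^2);
fprintf('empirical MSE = %.4f, theoretical MSE = %.4f\n', empMSE, theoryMSE);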

5 Jointly Gaussian Random Variables

Definition 7. Two random variables X and Y are jointly gaussian if
f_{X,Y}(x, y) = (1/(2π σ_X σ_Y √(1-ρ^2))) exp{ -(1/(2(1-ρ^2))) [ (x - µ_X)^2/σ_X^2 + (y - µ_Y)^2/σ_Y^2 - 2ρ(x - µ_X)(y - µ_Y)/(σ_X σ_Y) ] }.

Properties:
1. If X and Y are jointly gaussian random variables, then their marginals are gaussian:
f_X(x) = ∫ f_{X,Y}(x, y) dy = (1/√(2πσ_X^2)) e^{-(x - µ_X)^2/(2σ_X^2)},
f_Y(y) = ∫ f_{X,Y}(x, y) dx = (1/√(2πσ_Y^2)) e^{-(y - µ_Y)^2/(2σ_Y^2)}.
2. If X and Y are jointly gaussian and uncorrelated, then X and Y are independent.
3. If X and Y are jointly gaussian random variables, then any linear combination Z = aX + bY is a gaussian random variable.

Figure 3: Two-variable joint gaussian distribution (from Wikipedia).
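Property 3 can be illustrated numerically. The sketch below (added for illustration; all parameter values are arbitrary) samples a jointly gaussian pair with a prescribed correlation by linearly transforming independent standard gaussians, and checks that Z = aX + bY has the mean and variance predicted for a gaussian linear combination.

% Sample jointly gaussian (X, Y) with chosen means, variances and correlation,
% then check property 3: Z = aX + bY has mean a*muX + b*muY and variance
% a^2*sX^2 + b^2*sY^2 + 2*a*b*rho*sX*sY. All parameter values are arbitrary.
rng(5);
muX = 1; muY = -2; sX = 2; sY = 3; rho = -0.4;
a = 2; b = 0.5;
n = 1e6;
U = randn(n,1); V = randn(n,1);
X = muX + sX*U;
Y = muY + sY*(rho*U + sqrt(1 - rho^2)*V);   % corr(X,Y) = rho by construction
Z = a*X + b*Y;
mZ = a*muX + b*muY;
vZ = a^2*sX^2 + b^2*sY^2 + 2*a*b*rho*sX*sY;
fprintf('sample mean/var of Z: %.3f / %.3f, theory: %.3f / %.3f\n', ...
        mean(Z), var(Z), mZ, vZ);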

6 Bounds

Theorem 7 (Chebyshev's Bound). Let X be an arbitrary random variable with mean µ and finite variance σ^2. Then for any δ > 0,
Pr(|X - µ| ≥ δ) ≤ σ^2/δ^2.

Proof.
σ^2 = ∫ (x - µ)^2 f_X(x) dx ≥ ∫_{|x - µ| ≥ δ} (x - µ)^2 f_X(x) dx ≥ δ^2 ∫_{|x - µ| ≥ δ} f_X(x) dx = δ^2 Pr(|X - µ| ≥ δ).

Corollary 1. Pr(|X - µ| < δ) ≥ 1 - σ^2/δ^2.

Corollary 2. Pr(|X - µ| ≥ kσ) ≤ 1/k^2.

Example 12. A fair coin is flipped n times; let X be the number of heads observed. Determine a bound for Pr(X ≥ 75% of n), i.e., Pr(X ≥ 3n/4).
E[X] = np = n/2,  Var[X] = np(1-p) = n/4.
By applying Chebyshev's inequality,
Pr(X ≥ (3/4)n) = Pr(X - n/2 ≥ n/4) ≤ Pr(|X - n/2| ≥ n/4) ≤ (n/4)/(n/4)^2 = 4/n.

Theorem 8 (Markov Inequality). Consider a RV X for which f_X(x) = 0 for x < 0. Then X is called a nonnegative RV and the Markov inequality applies:
Pr(X ≥ δ) ≤ E[X]/δ.

Proof.
E[X] = ∫_0^∞ x f_X(x) dx ≥ ∫_δ^∞ x f_X(x) dx ≥ δ ∫_δ^∞ f_X(x) dx = δ Pr(X ≥ δ).

Example 13. Same setting as Example 12. According to the Markov inequality,
Pr(X ≥ 3n/4) ≤ (n/2)/(3n/4) = 2/3,
which does not depend on n. The bound from Chebyshev's inequality is much tighter.
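The gap between the two bounds in Examples 12 and 13 is easy to see by simulation. The MATLAB sketch below (not from the original notes; n and the number of trials are arbitrary choices) estimates Pr(X ≥ 3n/4) for a fair coin and prints it next to the Chebyshev bound 4/n and the Markov bound 2/3.

% Compare the Markov and Chebyshev bounds of Examples 12-13 with the
% empirical probability P(X >= 3n/4) for a fair coin flipped n times.
% n and the number of trials are arbitrary choices.
rng(6);
n = 20; trials = 1e5;
X = sum(rand(n, trials) < 0.5, 1);     % each column: number of heads in n flips
pEmp = mean(X >= 3*n/4);
fprintf('n = %d\n', n);
fprintf('empirical P(X >= 3n/4) = %.4f\n', pEmp);
fprintf('Chebyshev bound 4/n    = %.4f\n', 4/n);
fprintf('Markov bound 2/3       = %.4f\n', 2/3);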

Theorem 9 (Law of Large Numbers). Let X_1, X_2, ..., X_n be n iid RVs with mean µ and variance σ^2. Consider the sample mean
µ̂ = (1/n) Σ_{i=1}^{n} X_i.
Then
lim_{n→∞} Pr(|µ̂ - µ| ≥ δ) = 0 for all δ > 0,
i.e., µ̂ → µ in probability.

Proof. Since the X_i's are iid,
E[µ̂] = E[(1/n) Σ_{i=1}^{n} X_i] = (1/n) Σ_{i=1}^{n} E[X_i] = E[X_i] = µ,
Var[µ̂] = Var[(1/n) Σ_{i=1}^{n} X_i] = (1/n^2) Σ_{i=1}^{n} Var[X_i] = (1/n) Var[X_i] = σ^2/n.
Applying Chebyshev's inequality,
Pr(|µ̂ - µ| ≥ δ) ≤ σ^2/(nδ^2) → 0 as n → ∞, for any δ > 0.

7 Moment Generating Functions (MGF)

Definition 8. The moment-generating function (MGF), if it exists, of a RV X is defined by
M(t) = E[e^{tX}] = ∫ e^{tx} f_X(x) dx,
where t is a complex variable. For discrete RVs, we can define M(t) using the PMF as
M(t) = E[e^{tX}] = Σ_i e^{t x_i} P_X(x_i).

Example: Let X ~ Poisson(λ), with P(X = k) = e^{-λ} λ^k/k!, k = 0, 1, 2, ...
1. Find M_X(t).
M_X(t) = E(e^{tX}) = Σ_{k≥0} e^{tk} P(X = k)
= Σ_{k≥0} e^{tk} e^{-λ} λ^k/k!
= e^{-λ} Σ_{k≥0} (λe^t)^k/k!
= e^{-λ} e^{λe^t} = e^{λ(e^t - 1)}.

2. Find E(X) from M_X(t).
E(X) = dM_X(t)/dt |_{t=0} = λe^t e^{λ(e^t - 1)} |_{t=0} = λ.

Example: Let X ~ N(µ, σ^2); find M_X(t).
M_X(t) = ∫ e^{xt} (1/√(2πσ^2)) e^{-(x - µ)^2/(2σ^2)} dx = e^{µt + σ^2 t^2/2}.

Lemma 4. If M(t) exists, the moments m_k = E[X^k] can be obtained by
m_k = M^{(k)}(0) = d^k M(t)/dt^k |_{t=0},  k = 0, 1, 2, ...

Proof.
M_X(t) = E[e^{tX}] = E[1 + tX + (tX)^2/2! + ... + (tX)^n/n! + ...]
= 1 + tE[X] + (t^2/2!)E[X^2] + ... + (t^n/n!)E[X^n] + ...,
so m_k = E[X^k] = d^k M(t)/dt^k |_{t=0}.

8 Chernoff Bound

In this section, we introduce the Chernoff bound. Recall that to use Markov's inequality X must be nonnegative.

Theorem 10 (Chernoff's bound). For any RV X,
P(X ≥ a) ≤ e^{-at} M_X(t) for all t > 0.
In particular,
P(X ≥ a) ≤ min_{t > 0} e^{-at} M_X(t).

Proof. Apply Markov's inequality to Y = e^{tX}. First note that, for t > 0,
P(X ≥ a) = P(e^{tX} ≥ e^{ta}) = P(Y ≥ e^{ta}).
By Markov's inequality,
P(Y ≥ e^{ta}) ≤ E(Y)/e^{ta} = e^{-ta} E(Y),
so P(X ≥ a) ≤ e^{-ta} M_X(t).
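The generic bound P(X ≥ a) ≤ e^{-at} M_X(t) just proved can be explored numerically. The MATLAB sketch below (added for illustration; the values of µ, σ, a and the grid of t are arbitrary) evaluates the bound for a gaussian X over a grid of t and compares the best value found with the exact tail probability computed via erfc (base MATLAB). The example that follows derives the minimizing t analytically.

% Numerical look at the Chernoff bound for X ~ N(mu, sigma^2):
% evaluate e^{-a t} M_X(t) = e^{(mu - a) t + sigma^2 t^2/2} over a grid of t
% and compare with the exact tail P(X >= a). Parameter values are arbitrary.
mu = 0; sigma = 1; a = 2;
t = 0.1:0.1:4;                               % grid of positive t values
bound = exp((mu - a)*t + 0.5*sigma^2*t.^2);  % Chernoff bound for each t
exact = 0.5*erfc((a - mu)/(sigma*sqrt(2)));  % exact gaussian tail probability
[bestBound, k] = min(bound);
fprintf('best Chernoff bound = %.4f at t = %.1f, exact P(X >= a) = %.4f\n', ...
        bestBound, t(k), exact);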

Example: Consider X ~ N(µ, σ^2) and try to bound P(X ≥ a) using the Chernoff bound. This is an artificial example because we know the distribution of X. From the last lecture, M_X(t) = e^{µt + σ^2 t^2/2}, hence
P(X ≥ a) ≤ min_{t > 0} e^{-at} e^{µt + σ^2 t^2/2} = min_{t > 0} e^{(µ - a)t + σ^2 t^2/2}.

Remark: You can check at home how the parameter t affects the bound. For example, pick µ = 0, σ = 1 and change t: for t → 0 you will get the trivial bound P ≤ 1, and for t = 1 you will get P ≤ e^{1/2 - a}. See how it varies.

Minimizing over t,
d/dt [ e^{(µ - a)t + σ^2 t^2/2} ] = 0
(σ^2 t + µ - a) e^{(µ - a)t + σ^2 t^2/2} = 0
t* = (a - µ)/σ^2,
which gives the following:
P(X ≥ a) ≤ e^{(µ - a)t* + σ^2 (t*)^2/2}
= e^{(a - µ)(µ - a)/σ^2 + σ^2 (a - µ)^2/(2σ^4)}
= e^{-(a - µ)^2/(2σ^2)}.
We can compare this result with the true value, which we know is
P(X ≥ a) = ∫_a^∞ (1/√(2πσ^2)) e^{-(x - µ)^2/(2σ^2)} dx.

9 Characteristic Function

In this section, we define the characteristic function and give some examples. The characteristic function of a RV is similar to the Fourier transform of its pdf, but without the minus sign in the exponent.

Definition 9. For a RV X,
Φ_X(w) = E(e^{jwX}) = ∫ f_X(x) e^{jwx} dx,   (1)
is called the characteristic function of X, where j is the complex number √(-1).

Example: Find the characteristic function of X ~ exp(λ). Recall that for λ > 0,
f_X(x) = λe^{-λx} if x ≥ 0, and f_X(x) = 0 otherwise.

Then
Φ_X(w) = ∫_0^∞ λ e^{-λx} e^{jwx} dx
= λ ∫_0^∞ e^{(jw - λ)x} dx
= λ [ e^{(jw - λ)x}/(jw - λ) ]_0^∞.
Since λ > 0 and jw is purely imaginary, Re(jw - λ) < 0, therefore lim_{x→∞} e^{(jw - λ)x} = 0, which results in
Φ_X(w) = (λ/(jw - λ)) (0 - 1) = λ/(λ - jw).

Lemma 5. If Φ_X(w) exists, the moments m_n = E[X^n] can be obtained by
m_n = j^{-n} Φ_X^{(n)}(0),  where Φ_X^{(n)}(0) = d^n Φ_X(w)/dw^n |_{w=0}.

Proof.
Φ_X(w) = E[e^{jwX}] = Σ_{n≥0} ((jw)^n/n!) m_n,
so j^n m_n = d^n Φ_X(w)/dw^n |_{w=0}.

Lemma 6. If X and Y are two independent RVs and Z = X + Y, then
Φ_Z(w) = Φ_X(w) Φ_Y(w) and M_Z(t) = M_X(t) M_Y(t).

Remark: To find the distribution of Z = X + Y it can be easier to find Φ_X(w) and Φ_Y(w), multiply them, and then invert back from the Fourier domain by integrating or by using tables of Fourier inverses.

Example: Consider the example of Problem 9 of Homework 3:

Question: Let X_1 and X_2 be two independent RVs such that X_1 ~ N(µ_1, σ_1^2) and X_2 ~ N(µ_2, σ_2^2), and let X = aX_1 + bX_2. Find the distribution of X.

Answer: Let X_1' = aX_1 and X_2' = bX_2. It is clear that X_1' ~ N(aµ_1, a^2σ_1^2) and X_2' ~ N(bµ_2, b^2σ_2^2), and that Φ_X(w) = Φ_{X_1'}(w) Φ_{X_2'}(w).
Φ_{X_1'}(w) = ... = e^{jaµ_1 w - a^2σ_1^2 w^2/2},
Φ_X(w) = e^{jaµ_1 w - a^2σ_1^2 w^2/2} · e^{jbµ_2 w - b^2σ_2^2 w^2/2} = e^{j(aµ_1 + bµ_2)w - (a^2σ_1^2 + b^2σ_2^2) w^2/2},
which implies that X ~ N(aµ_1 + bµ_2, a^2σ_1^2 + b^2σ_2^2).

Fact 1. A linear combination of two independent Gaussian RVs is a Gaussian RV.

10 Central Limit Theorem

In this section we state the central limit theorem and give a rigorous proof.

Theorem 11. Let X_1, X_2, ..., X_n be n independent RVs with µ_{X_i} = 0 and Var(X_i) = 1. Then
Z_n = (X_1 + X_2 + ... + X_n)/√n → N(0, 1) as n → ∞.
In other words,
lim_{n→∞} P(Z_n ≤ z) = ∫_{-∞}^{z} (1/√(2π)) e^{-x^2/2} dx.

This is, for example, a way to relate flipping a coin n times to a Gaussian RV (Fig. 4, Fig. 5). Let
X_i = 0 if a tail is observed (with probability 1/2), and X_i = 1 if a head is observed (with probability 1/2),
and set S_n = Σ_{i=1}^{n} X_i. Notice that S_n ∈ {0, 1, ..., n}, and according to the CLT the standardized sum (S_n - n/2)/(√n/2) is approximately N(0, 1) for large n.

Loosely, the CLT says that no matter how far you are from the mean, the probability that the sum deviates from its mean by more than z√n decreases like the gaussian tail in z.

Remark: The RVs X_i have to be independent, because if for example X_i = X_1 for all i ∈ {2, 3, ..., n}, then
S_n = nX_1 = n if X_1 = 1, and S_n = 0 if X_1 = 0,
which does not converge to a Gaussian distribution when n → ∞.
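The CLT claim can also be checked numerically. The sketch below (added for illustration; n, the number of trials and the test points are arbitrary) standardizes coin-flip sums and compares their empirical CDF with the standard normal CDF, written via erfc to stay within base MATLAB.

% CLT illustration: for a fair coin, Z_n = (S_n - n/2)/(sqrt(n)/2) is
% approximately N(0,1) for large n. Parameter choices are arbitrary.
rng(7);
n = 500; trials = 1e4;
S = sum(rand(n, trials) < 0.5, 1);     % S_n for each trial
Z = (S - n/2) / (sqrt(n)/2);           % standardized sums
for z = [-2 -1 0 1 2]
    empCdf = mean(Z <= z);
    gaussCdf = 0.5*erfc(-z/sqrt(2));   % standard normal CDF (base MATLAB)
    fprintf('z = %+d: empirical P(Z_n <= z) = %.4f, Phi(z) = %.4f\n', ...
            z, empCdf, gaussCdf);
end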

Figure 4: S_n/n as a function of n for the coin-flipping example above, with n up to 100; we can clearly see that as n grows S_n/n goes to 0.5. See Section 11 for the code that generates this figure.

Proof (of Theorem 11). We will show that
lim_{n→∞} Φ_{Z_n}(w) = e^{-w^2/2},
which is the characteristic function of an N(0, 1) RV, whose pdf is f_Z(z) = (1/√(2π)) e^{-z^2/2}.
Write
Z_n = X_1/√n + X_2/√n + ... + X_n/√n = W_1 + W_2 + ... + W_n,  where W_i = X_i/√n.
Then
Φ_{Z_n}(w) = Φ_{W_1}(w) Φ_{W_2}(w) Φ_{W_3}(w) ... Φ_{W_n}(w) = [Φ_{W_1}(w)]^n,
Φ_{W_1}(w) = E(e^{jwW_1}) = E(e^{jwX_1/√n}) = Φ_X(w/√n).

Figure 5: S_n/n as a function of n for the coin-flipping example, with n up to 300; again S_n/n goes to 0.5 as n grows. See Section 11 for the code that generates this figure.

Taylor expansion: Using the Taylor expansion of Φ_{W_1}(w) around 0 we get
Φ_{W_1}(w) = Φ_{W_1}(0) + Φ_{W_1}'(0) w + Φ_{W_1}''(0) w^2/2! + ...
1. Find the value of Φ_{W_1}(0).
Φ_{W_1}(0) = E(e^{jwX_1/√n}) |_{w=0} = ∫ e^{j0x/√n} f_X(x) dx = ∫ f_X(x) dx = 1.

2. Find the value of Φ_{W_1}'(0).
Φ_{W_1}'(0) = ∫ (jx/√n) e^{j0x/√n} f_X(x) dx = (j/√n) ∫ x f_X(x) dx = (j/√n) E(X_1) = 0.
3. Find the value of Φ_{W_1}''(0).
Φ_{W_1}''(0) = d^2 Φ_{W_1}(w)/dw^2 |_{w=0} = ∫ (jx/√n)^2 e^{j0x/√n} f_X(x) dx = -(1/n) ∫ x^2 f_X(x) dx
= -(1/n) (Var(X_1) + E^2(X_1)) = -(1/n)(1 + 0) = -1/n.
Hence, using these results and Taylor's expansion,
Φ_{W_1}(w) ≈ 1 - w^2/(2n).
Therefore
Φ_{Z_n}(w) = [1 - w^2/(2n)]^n.
Recall that log(1 - ɛ) ≈ -ɛ for small ɛ; then
log Φ_{Z_n}(w) = n log(1 - w^2/(2n)) ≈ n (-w^2/(2n)) = -w^2/2,
so Φ_{Z_n}(w) → e^{-w^2/2}.

11 MATLAB Code generating the figures

In this section we give the MATLAB code used to generate Fig. 4 and Fig. 5.

% Generates Figures 4 and 5: S_n/n for a fair coin as a function of n.
% (binornd requires the Statistics Toolbox.)
A = []; B = [];                      % two empty vectors
for i = 1:100                        % i is the number of times the coin is flipped
    A = [A, binornd(i,0.5)/i];       % draw a Binomial(i, 0.5) sample and divide by i
end                                  % to obtain S_n/n
for n = 1:300
    B = [B, binornd(n,0.5)/n];       % same as above, but for n up to 300
end
x1 = 1:i; x2 = 1:n;                  % x1 and x2 represent n in each figure
figure(1)
plot(x1, A); hold on
plot(x1, 0.5*ones(size(x1)), 'r', 'LineWidth', 2);
xlabel('n'); ylabel('S_n/n');
figure(2)
plot(x2, B); hold on
plot(x2, 0.5*ones(size(x2)), 'r', 'LineWidth', 2);
xlabel('n'); ylabel('S_n/n');