ECE5: Analysis of Random Signals                                         Fall 06

Chapter 4: Expectation and Moments

Dr. Salim El Rouayheb                        Scribe: Serge Kas Hanna, Lu Liu

1 Expected Value of a Random Variable

Definition 1. The expected or average value of a random variable $X$ is defined by:
1. $E[X] = \mu = \sum_i x_i p_X(x_i)$, if $X$ is discrete.
2. $E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$, if $X$ is continuous.

Example 1. Let $X \sim \mathrm{Poisson}(\lambda)$. What is the expected value of $X$? The PMF of $X$ is given by
\[ \Pr(X = k) = e^{-\lambda}\frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots \]
\[ E[X] = \sum_{k=0}^{\infty} k\, e^{-\lambda}\frac{\lambda^k}{k!} = \sum_{k=1}^{\infty} k\, e^{-\lambda}\frac{\lambda^k}{k!} = \lambda e^{-\lambda}\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda. \]
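A quick numerical sanity check of this result, in the spirit of the MATLAB code at the end of these notes (a sketch; it assumes poissrnd from the Statistics Toolbox, and the value of lambda is an arbitrary choice):

    lambda = 3;                        % arbitrary example rate
    X = poissrnd(lambda, 1, 1e6);      % 10^6 iid Poisson(lambda) samples
    mean(X)                            % sample mean, should be close to lambda = 3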
Theorem 1. (Linearity of Expectation) Let $X$ and $Y$ be any two random variables and let $a$ and $b$ be constants; then
\[ E[aX + bY] = aE[X] + bE[Y]. \]

Example 2. Let $X \sim \mathrm{Binomial}(n, p)$. What is the expected value of $X$? The PMF of $X$ is given by
\[ \Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n, \]
\[ E[X] = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}. \]
Rather than evaluating this sum, an easier way to calculate $E[X]$ is to express $X$ as the sum of $n$ independent Bernoulli random variables and apply Theorem 1. In fact,
\[ X = X_1 + X_2 + \cdots + X_n, \]
where $X_i \sim \mathrm{Bernoulli}(p)$ for all $i = 1, \ldots, n$:
\[ X_i = \begin{cases} 1 & \text{with probability } p, \\ 0 & \text{with probability } 1-p, \end{cases} \qquad E[X_i] = 1 \cdot p + 0 \cdot (1-p) = p. \]
By linearity of expectation (Theorem 1),
\[ E[X] = E[X_1] + \cdots + E[X_n] = np. \]

Theorem 2. (Expected value of a function of a RV) Let $X$ be a RV. For a function of a RV, that is, $Y = g(X)$, the expected value of $Y$ can be computed from
\[ E[Y] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx. \]

Example 3. Let $X \sim N(\mu, \sigma^2)$ and $Y = X^2$. What is the expected value of $Y$? Rather than calculating the pdf of $Y$ and afterwards computing $E[Y]$, we apply Theorem 2:
\[ E[Y] = \int_{-\infty}^{\infty} x^2 \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \mu^2 + \sigma^2. \]
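Example 3 is easy to check numerically with base MATLAB (a sketch; the values of mu and sigma are arbitrary choices):

    mu = 1.5; sigma = 2;
    X = mu + sigma*randn(1, 1e6);      % 10^6 iid N(mu, sigma^2) samples
    mean(X.^2)                         % should be close to mu^2 + sigma^2 = 6.25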
2 Conditional Expectations

Definition 2. The conditional expectation of $X$ given that the event $B$ was observed is given by:
1. $E[X|B] = \sum_i x_i p_{X|B}(x_i|B)$, if $X$ is discrete.
2. $E[X|B] = \int_{-\infty}^{\infty} x f_{X|B}(x|B)\,dx$, if $X$ is continuous.

Intuition: think of $E[X|Y=y]$ as the best estimate (guess) of $X$ given that you observed $Y = y$.

Example 4. A fair die is tossed twice. Let $X$ be the number observed after the first toss, and $Y$ the number observed after the second toss. Let $Z = X + Y$.
1. Calculate $E[X]$.
\[ E[X] = \frac{1+2+3+4+5+6}{6} = 3.5. \]
2. Calculate $E[X|Y=3]$. Since $X$ and $Y$ are independent,
\[ E[X|Y=3] = E[X] = 3.5. \]
3. Calculate $E[X|Z=5]$. Given $Z = 5$, $X$ is equally likely to be $1$, $2$, $3$ or $4$, so
\[ E[X|Z=5] = \frac{1+2+3+4}{4} = 2.5. \]
Remark 1. $E[X|Z]$ is a random variable.

Theorem 3. (Tower Property of Conditional Expectation) Let $X$ and $Y$ be two random variables; then
\[ E\big[E[X|Y]\big] = E[X]. \]

Example 5. Let $X$ and $Y$ be two zero-mean jointly Gaussian random variables, that is,
\[ f_{X,Y}(x,y) = \frac{1}{2\pi\sigma^2\sqrt{1-\rho^2}}\exp\left[-\frac{x^2 + y^2 - 2\rho xy}{2\sigma^2(1-\rho^2)}\right], \]
where $|\rho| < 1$. Calculate $E[X|Y=y]$.
\[ E[X|Y=y] = \int_{-\infty}^{\infty} x f_{X|Y}(x|y)\,dx. \]
\[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{\frac{1}{2\pi\sigma^2\sqrt{1-\rho^2}}\exp\left[-\frac{x^2+y^2-2\rho xy}{2\sigma^2(1-\rho^2)}\right]}{\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{y^2}{2\sigma^2}\right]} = \frac{1}{\sqrt{2\pi\sigma^2(1-\rho^2)}}\exp\left[-\frac{x^2 + y^2 - 2\rho xy - (1-\rho^2)y^2}{2\sigma^2(1-\rho^2)}\right] \]
\[ = \frac{1}{\sqrt{2\pi\sigma^2(1-\rho^2)}}\exp\left[-\frac{(x-\rho y)^2}{2\sigma^2(1-\rho^2)}\right]. \]
Hence,
\[ E[X|Y=y] = \int_{-\infty}^{\infty} x\,\frac{1}{\sqrt{2\pi\sigma^2(1-\rho^2)}}\exp\left[-\frac{(x-\rho y)^2}{2\sigma^2(1-\rho^2)}\right]dx = \rho y. \]

Remark 2. If $\rho = 0$, then $X$ and $Y$ are independent and $E[X|Y=y] = E[X] = 0$.
Remark 3. For these jointly Gaussian random variables, $E[X|Y] = \rho Y$ and $E[Y|X] = \rho X$.
Remark 4. Let $\hat{X} = E[X|Y]$. $\hat{X}$ is the estimate of $X$ given that $Y$ was observed. The MSE (mean-square error) of this estimate is given by $E[|\hat{X} - X|^2]$.
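The identity $E[X|Y=y] = \rho y$ of Example 5 can be checked empirically by conditioning on a thin window around $y$ (a sketch with base MATLAB; it assumes unit variances $\sigma = 1$, and rho, y0, and the window width are arbitrary choices):

    rho = 0.7; y0 = 1.0; N = 1e6;
    Y = randn(1, N); Z = randn(1, N);
    X = rho*Y + sqrt(1 - rho^2)*Z;     % (X,Y) zero-mean jointly Gaussian, corr rho
    idx = abs(Y - y0) < 0.05;          % condition on Y being close to y0
    mean(X(idx))                       % should be close to rho*y0 = 0.7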
Example 6. A movie of size $N$ bits, where $N \sim \mathrm{Poisson}(\lambda)$, is downloaded through a binary erasure channel. Let $K$ be the number of received bits.

[Figure 1: Binary erasure channel: each bit is received correctly with probability $1-\epsilon$ and erased (output $E$) with probability $\epsilon$.]

1. Calculate $E[K]$.
Intuitively, $E[K] = \lambda(1-\epsilon)$. We now prove it mathematically by conditioning on $N$ as a first step. For a given number of bits $N = n$, $K$ is a binomial random variable: $K \sim \mathrm{Binomial}(n, 1-\epsilon)$.
\[ E[K|N=n] = n(1-\epsilon) \implies E[K|N] = N(1-\epsilon). \]
Applying the tower property of conditional expectation,
\[ E[K] = E\big[E[K|N]\big] = E[N(1-\epsilon)] = (1-\epsilon)E[N] = \lambda(1-\epsilon). \]

2. Calculate $E[N|K]$.
Intuitively, $E[N|K=k] = k + \lambda\epsilon$. We now prove it mathematically:
\[ E[N|K=k] = \sum_{n=0}^{\infty} n \Pr(N=n|K=k), \qquad \Pr(N=n|K=k) = \frac{\Pr(N=n, K=k)}{\Pr(K=k)} = \frac{\Pr(N=n)\Pr(K=k|N=n)}{\Pr(K=k)}, \]
\[ \Pr(N=n) = \frac{\lambda^n e^{-\lambda}}{n!}, \qquad \Pr(K=k|N=n) = \binom{n}{k}(1-\epsilon)^k \epsilon^{n-k}, \]
\[ \Pr(K=k) = \sum_{n=k}^{\infty} \Pr(N=n)\Pr(K=k|N=n) = \sum_{n=k}^{\infty} \frac{e^{-\lambda}\lambda^n}{n!}\binom{n}{k}(1-\epsilon)^k\epsilon^{n-k} = \frac{(\lambda(1-\epsilon))^k}{k!} e^{-\lambda} \sum_{n=k}^{\infty}\frac{(\lambda\epsilon)^{n-k}}{(n-k)!} = \frac{(\lambda(1-\epsilon))^k}{k!} e^{-\lambda(1-\epsilon)}. \]
Hence,
\[ \Pr(N=n|K=k) = \frac{\frac{e^{-\lambda}\lambda^n}{n!}\cdot\frac{n!}{k!(n-k)!}(1-\epsilon)^k\epsilon^{n-k}}{\frac{(\lambda(1-\epsilon))^k}{k!}e^{-\lambda(1-\epsilon)}} = \frac{(\lambda\epsilon)^{n-k}}{(n-k)!} e^{-\lambda\epsilon}, \quad n \geq k, \]
\[ E[N|K=k] = \sum_{n=k}^{\infty} n\,\frac{(\lambda\epsilon)^{n-k} e^{-\lambda\epsilon}}{(n-k)!} = \sum_{n=k}^{\infty} (n-k+k)\frac{(\lambda\epsilon)^{n-k} e^{-\lambda\epsilon}}{(n-k)!} = \underbrace{\sum_{n=k}^{\infty}(n-k)\frac{(\lambda\epsilon)^{n-k} e^{-\lambda\epsilon}}{(n-k)!}}_{=\lambda\epsilon} + k\underbrace{\sum_{n=k}^{\infty}\frac{(\lambda\epsilon)^{n-k} e^{-\lambda\epsilon}}{(n-k)!}}_{=1} = k + \lambda\epsilon. \]
3. c E[(X µ) ] σ V ar[x] (the variance). In fact, σ is called the standard deviation. σ E[(X µ) ] E[X ] µ. Example 7. Let X Binomial(n, p). What is the variance of X? The PMF of X is given by, ( ) n P r(x k) p k ( p) n k, k 0,,..., n. k E[X] np (from Example ). σ E[X ] E[X] n ( ) n k p k ( p) n k n p. k k0 Check the textbook for the calculation of this sum. Here we will calculate σ using the same idea of Example, i.e. expressing X as the sum of n independent Bernoulli random variables. In fact, X X + X +... + X n. Where X i Bernoulli(p), for all i,..., n. Hence, Where, E[X ] E [ (X +... + X n ) ] E[X +... + X n + i X i X j ] j j i ne[x i ] + n(n )E[X i X j ] ne[x i ] + n(n )E[X i ][X j ]. E[X i ] p. E[X i ] p + 0 ( p) p. Hence, E[X ] np + n(n )p n p np + np. σ E[X ] E[X] n p np + np n p np( p). Example 8. Let X Geometric(p). What is the variance of X? The PMF of X is given by, P r(x k) ( p) k p, k,,... The variance of X is given by, σ E[X ] E[X]. 6
\[ E[X] = \sum_{k=1}^{\infty} k(1-p)^{k-1}p, \qquad E[X^2] = \sum_{k=1}^{\infty} k^2(1-p)^{k-1}p. \]
To calculate these sums we use the following facts. For $|x| < 1$,
\[ \sum_{i=0}^{\infty} x^i = \frac{1}{1-x}. \]
Differentiating both sides with respect to $x$,
\[ \sum_{i=1}^{\infty} i x^{i-1} = \frac{1}{(1-x)^2}. \]
Differentiating both sides with respect to $x$ once more,
\[ \sum_{i=2}^{\infty} i(i-1)x^{i-2} = \frac{2}{(1-x)^3}. \]
Hence,
\[ \sum_{i=1}^{\infty} i^2 x^{i-1} = \sum_{i=1}^{\infty} i(i-1)x^{i-1} + \sum_{i=1}^{\infty} i x^{i-1} = \frac{2x}{(1-x)^3} + \frac{1}{(1-x)^2} = \frac{2x + (1-x)}{(1-x)^3} = \frac{1+x}{(1-x)^3}. \]
Hence,
\[ E[X] = \frac{p}{(1-(1-p))^2} = \frac{1}{p}, \qquad E[X^2] = \frac{p\,(1+(1-p))}{(1-(1-p))^3} = \frac{2-p}{p^2}. \]
\[ \sigma^2 = E[X^2] - E[X]^2 = \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1-p}{p^2}. \]

Definition 5. The covariance of two random variables $X$ and $Y$ is defined by
\[ \mathrm{cov}(X,Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big] = E[XY] - \mu_X\mu_Y. \]

Definition 6. The correlation coefficient of two random variables $X$ and $Y$ is defined by
\[ \rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X\sigma_Y}. \]
If $\rho_{X,Y} = 0$, then $\mathrm{cov}(X,Y) = 0$ and $X$ and $Y$ are said to be uncorrelated.

Lemma 1. If $X$ and $Y$ are independent, then $X$ and $Y$ are uncorrelated.

Remark 7. There could be two RVs which are uncorrelated but dependent.

Example 9. (discrete case: uncorrelated but dependent) Consider two random variables $X$ and $Y$ with the joint PMF $P_{X,Y}(x_i, y_j)$ shown in Figure 2: consistently with the computations below, $X$ takes the values $-1$, $0$, $1$ with probability $1/3$ each and $Y = X^2$.

[Figure 2: Values of $P_{X,Y}(x_i, y_j)$.]

\[ \mu_X = 1\cdot\tfrac{1}{3} + 0\cdot\tfrac{1}{3} + (-1)\cdot\tfrac{1}{3} = 0, \qquad \mu_Y = 0\cdot\tfrac{1}{3} + 1\cdot\tfrac{2}{3} = \tfrac{2}{3}. \]
Hence,
\[ XY = \begin{cases} 1 & \text{with probability } 1/3, \\ 0 & \text{with probability } 1/3, \\ -1 & \text{with probability } 1/3, \end{cases} \qquad E[XY] = 1\cdot\tfrac{1}{3} + 0\cdot\tfrac{1}{3} + (-1)\cdot\tfrac{1}{3} = 0. \]
\[ \rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X\sigma_Y} = \frac{E[XY] - \mu_X\mu_Y}{\sigma_X\sigma_Y} = \frac{0}{\sigma_X\sigma_Y} = 0 \implies X \text{ and } Y \text{ are uncorrelated.} \]
However, $X$ and $Y$ are dependent. For example, $P(X=1|Y=0) = 0 \neq P(X=1) = 1/3$.
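Example 9 is easy to reproduce by sampling (a sketch with base MATLAB):

    N = 1e6;
    X = randi(3, 1, N) - 2;            % uniform on {-1, 0, 1}
    Y = X.^2;                          % Y is a deterministic function of X
    C = corrcoef(X, Y);
    C(1,2)                             % sample correlation, should be near 0
    mean(X(Y == 0) == 1)               % P(X=1 | Y=0) = 0, yet P(X=1) = 1/3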
Example 10. (continuous case: uncorrelated but dependent) Consider a RV $\Theta$ uniformly distributed on $[0, 2\pi]$. Let $X = \cos\Theta$ and $Y = \sin\Theta$. $X$ and $Y$ are obviously dependent; in fact,
\[ X^2 + Y^2 = 1. \]
Hence,
\[ E[X] = E[\cos\Theta] = \frac{1}{2\pi}\int_0^{2\pi}\cos\theta\,d\theta = 0, \qquad E[Y] = E[\sin\Theta] = \frac{1}{2\pi}\int_0^{2\pi}\sin\theta\,d\theta = 0, \]
\[ E[XY] = E[\sin\Theta\cos\Theta] = \frac{1}{2\pi}\int_0^{2\pi}\sin\theta\cos\theta\,d\theta = \frac{1}{4\pi}\int_0^{2\pi}\sin 2\theta\,d\theta = 0. \]
\[ \implies \mathrm{cov}(X,Y) = 0, \quad \rho_{X,Y} = 0: \]
$X$ and $Y$ are uncorrelated although they are dependent.

Theorem 4. Given two random variables $X$ and $Y$, $|\rho_{X,Y}| \leq 1$.

Proof. We will prove this theorem in the spirit of the Cauchy-Schwarz inequality by showing that
\[ |\mathrm{cov}(X,Y)| \leq \sigma_X\sigma_Y. \]
For vectors, $\langle u, v\rangle = \|u\|\,\|v\|\cos\theta$, hence $|\langle u, v\rangle| \leq \|u\|\,\|v\|$. Using the same idea on random variables, let $Z = Y - aX$ where $a \in \mathbb{R}$, and assume that $X$ and $Y$ have zero mean. Consider the following cases:

1. $Z \neq 0$ for all $a \in \mathbb{R}$. Hence, for every $a$,
\[ E[Z^2] = E\big[(Y - aX)^2\big] = E[X^2]a^2 - 2E[XY]a + E[Y^2] > 0. \]
Since this quadratic in $a$ is strictly positive, it has no real roots, so its discriminant is negative:
\[ 4E[XY]^2 - 4E[X^2]E[Y^2] < 0. \]
Hence $E[XY]^2 < E[X^2]E[Y^2] = \sigma_X^2\sigma_Y^2$, i.e., $|E[XY]| < \sigma_X\sigma_Y$, and
\[ |\mathrm{cov}(X,Y)| < \sigma_X\sigma_Y. \]
2. There exists $a_0 \in \mathbb{R}$ such that $Z = Y - a_0X = 0$. Then $Y = a_0X$; hence
\[ E[XY] = E[a_0X^2] = a_0E[X^2] = a_0\sigma_X^2, \]
\[ \mathrm{Var}[Y] = \mathrm{Var}[a_0X] = a_0^2\mathrm{Var}[X] = a_0^2\sigma_X^2 = \sigma_Y^2, \]
\[ |E[XY]| = |a_0|\sigma_X\cdot\sigma_X = \sigma_X\sigma_Y. \]
Therefore there is equality in this case: $|\mathrm{cov}(X,Y)| = \sigma_X\sigma_Y$.

Combining the results of the two cases, $|\mathrm{cov}(X,Y)| \leq \sigma_X\sigma_Y$, and therefore $|\rho_{X,Y}| \leq 1$.

Lemma 2.
\[ \mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{cov}(X,Y). \]
Proof.
\[ \mathrm{Var}[X+Y] = E[(X+Y)^2] - E[X+Y]^2 = E[(X+Y)^2] - (E[X]+E[Y])^2 \]
\[ = E[X^2] + 2E[XY] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2 = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{cov}(X,Y). \]

Lemma 3. If $X$ and $Y$ are uncorrelated, then $\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$.
Proof. $X$ and $Y$ uncorrelated $\implies \mathrm{cov}(X,Y) = 0$. The result then follows from Lemma 2.
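Lemma 2 can be checked on deliberately correlated samples (a sketch with base MATLAB; the linear relation between X and Y is an arbitrary choice):

    N = 1e6;
    X = randn(1, N);
    Y = 0.5*X + randn(1, N);           % Y deliberately correlated with X
    C = cov(X, Y);                     % 2x2 sample covariance matrix
    var(X + Y)                         % left-hand side of Lemma 2
    C(1,1) + C(2,2) + 2*C(1,2)         % right-hand side; should match closely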
4 Estimation

An application of the previously mentioned definitions and theorems is estimation. Suppose we wish to estimate the value of a RV $Y$ by observing the values of another random variable $X$. The estimate is denoted by $\hat{Y}$ and the estimation error is given by $\epsilon = \hat{Y} - Y$. A popular approach for determining $\hat{Y}$, the estimate of $Y$ given $X$, is to minimize the MSE (mean squared error):
\[ \mathrm{MSE} = E[\epsilon^2] = E[(\hat{Y} - Y)^2]. \]

Theorem 5. The MMSE (minimum mean squared error) estimate of $Y$ given $X$ is
\[ \hat{Y}_{\mathrm{MMSE}} = E[Y|X]. \]
In other words, the theorem states that $\hat{Y}_{\mathrm{MMSE}} = E[Y|X]$ minimizes the MSE. Sometimes $E[Y|X]$ is difficult to find and a linear MMSE estimate is used instead, i.e., $\hat{Y}_{\mathrm{LMMSE}} = \alpha X + \beta$.

Proof. (sketch) Assume WLOG that the random variables $X$ and $Y$ are zero mean, and denote by $\hat{Y}$ the estimate of $Y$ given $X$.

[Figure: $Y$, $\hat{Y}$, and $\epsilon$ drawn as a right triangle over the observation $X$.]

In order to minimize $E[\epsilon^2] = E[(Y - \hat{Y})^2]$, the error $\epsilon$ should be orthogonal to the observation $X$, as shown in the figure above. $\epsilon \perp X$; therefore,
\[ E[(\hat{Y} - Y)X] = 0. \]
With $\hat{Y} = \alpha X$,
\[ E[(\alpha X - Y)X] = 0 \implies \alpha E[X^2] - E[XY] = 0. \]
Hence,
\[ \alpha = \frac{E[XY]}{E[X^2]} = \frac{\mathrm{cov}(X,Y)}{\sigma_X^2} = \frac{\sigma_Y\rho_{X,Y}}{\sigma_X}, \qquad \hat{Y}_{\mathrm{LMMSE}} = \frac{\sigma_Y\rho_{X,Y}}{\sigma_X}X. \]

The result above is for any two zero-mean random variables. The general result, i.e., when $\mu_X, \mu_Y \neq 0$, can be obtained by the same reasoning and is given by:

Theorem 6. The LMMSE (linear minimum mean squared error) estimate of $Y$ given $X$ that minimizes the MSE is given by
\[ \hat{Y}_{\mathrm{LMMSE}} = \frac{\sigma_Y\rho_{X,Y}}{\sigma_X}(X - \mu_X) + \mu_Y. \]

Remark: (orthogonality principle) Let $\hat{Y}$ be the estimate of $Y$ given $X$, and $\epsilon$ the estimation error. Then the $\hat{Y}$ that minimizes the MSE $E[\epsilon^2] = E[(Y - \hat{Y})^2]$ satisfies $\hat{Y} \perp \epsilon$, i.e., $\hat{Y} \perp (Y - \hat{Y})$.
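Theorem 6 amounts to fitting a line by matching covariances, which can be seen from data (a sketch with base MATLAB; the linear-plus-noise model and its coefficients are arbitrary choices):

    N = 1e6;
    X = randn(1, N);
    Y = 2*X + 1 + 0.5*randn(1, N);     % linear-plus-noise relation (true slope 2, intercept 1)
    C = cov(X, Y);
    alpha = C(1,2)/C(1,1);             % = cov(X,Y)/var(X) = sigma_Y*rho/sigma_X
    beta  = mean(Y) - alpha*mean(X);   % Theorem 6 rearranged into intercept form
    [alpha, beta]                      % should be close to [2, 1]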
Example. Suppose that in a room the temperature is given by a RV Y N(µ, σ ). A sensor in this room observes X Y + W, where W is the additive noise given by N(0, σw ). Assume Y and W are independent.. Find the MMSE of Y given X. Ŷ MMSE E[Y X]. f Y X (Y X x) f X,Y (x, y) f X (x) f Y (y)f X Y (x, y). f X (x) Since X is the sum of two independent gaussian RVs Y and W, we know from homework 3 that, X N(µ, σ + σ W ). Furthermore, E[X Y y] E[Y + W Y y] y + E[W ] y. V ar[x Y y] E[X Y y] E[X Y y] E[Y + Y W + W Y y] y E[Y Y y] + E[Y W Y y] + E[W Y y] y y + 0 + σ W y σ W. f X Y (X Y y) πσw exp [ ] (x y) σw. f Y X (Y X x) π σ σ W σ +σ W ] (y µ) (x y) (x µ) exp [ σ σw + (σ + σw ) [ exp (y µ ) πσ σ ]. Where, We are interested in, σ σ W σ σ + σw. Ŷ MMSE E[Y X] µ.
To determine $\mu_1$, match the terms linear in $y$ in the exponent (equivalently, take $y = 0$ and compare): the exponent is a quadratic in $y$ centered at
\[ \mu_1 = \sigma_1^2\left(\frac{\mu}{\sigma^2} + \frac{x}{\sigma_W^2}\right) = \frac{\sigma^2\sigma_W^2}{\sigma^2+\sigma_W^2}\cdot\frac{\sigma_W^2\mu + \sigma^2 x}{\sigma^2\sigma_W^2} = \frac{\sigma^2 x + \sigma_W^2\mu}{\sigma^2+\sigma_W^2}. \]
Hence,
\[ \hat{Y}_{\mathrm{MMSE}} = E[Y|X] = \frac{\sigma^2}{\sigma^2+\sigma_W^2}X + \frac{\sigma_W^2}{\sigma^2+\sigma_W^2}\mu. \]

2. Find the linear MMSE estimate of $Y$ given $X$.
\[ \mathrm{cov}(X,Y) = E[XY] - E[X]E[Y] = E[(Y+W)Y] - E[Y+W]E[Y] = E[Y^2] + E[YW] - \mu^2 = \sigma^2 + \mu^2 + 0 - \mu^2 = \sigma^2. \]
Hence,
\[ \rho_{X,Y} = \frac{\sigma^2}{\sigma\sqrt{\sigma^2+\sigma_W^2}} = \frac{\sigma}{\sqrt{\sigma^2+\sigma_W^2}}. \]
Applying the general formula of the LMMSE (Theorem 6),
\[ \hat{Y}_{\mathrm{LMMSE}} = \frac{\sigma_Y\rho_{X,Y}}{\sigma_X}(X - \mu_X) + \mu_Y = \frac{\sigma^2}{\sigma^2+\sigma_W^2}(X - \mu) + \mu = \frac{\sigma^2}{\sigma^2+\sigma_W^2}X + \frac{\sigma_W^2}{\sigma^2+\sigma_W^2}\mu. \]
Notice that $\hat{Y}_{\mathrm{LMMSE}} = \hat{Y}_{\mathrm{MMSE}}$; in fact this is always the case for jointly Gaussian random variables $X$ and $Y$.
3. Find the MSE.
\[ E[\epsilon^2] = E[(\hat{Y}-Y)^2] = E[\epsilon(\hat{Y}-Y)] = \underbrace{E[\epsilon\hat{Y}]}_{=0} - E[\epsilon Y] = -E[\epsilon Y] = E[Y^2] - E[\hat{Y}Y], \]
using the orthogonality principle $E[\epsilon\hat{Y}] = 0$. Now, since $E[XY] = \mathrm{cov}(X,Y) + E[X]E[Y] = \sigma^2 + \mu^2$,
\[ E[\hat{Y}Y] = E\left[\frac{\sigma^2 XY + \sigma_W^2\mu Y}{\sigma^2+\sigma_W^2}\right] = \frac{\sigma^2(\sigma^2+\mu^2) + \sigma_W^2\mu^2}{\sigma^2+\sigma_W^2}. \]
Hence,
\[ E[\epsilon^2] = \sigma^2 + \mu^2 - \frac{\sigma^2(\sigma^2+\mu^2) + \sigma_W^2\mu^2}{\sigma^2+\sigma_W^2} = \frac{\sigma^2\sigma_W^2}{\sigma^2+\sigma_W^2}. \]
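The whole sensor example can be simulated end to end (a sketch with base MATLAB; the parameter values are arbitrary choices):

    mu = 20; sigma = 2; sigmaW = 1; N = 1e6;
    Y = mu + sigma*randn(1, N);        % room temperature
    X = Y + sigmaW*randn(1, N);        % noisy sensor reading
    Yhat = (sigma^2*X + sigmaW^2*mu)/(sigma^2 + sigmaW^2);
    mean((Yhat - Y).^2)                % should be close to sigma^2*sigmaW^2/(sigma^2+sigmaW^2) = 0.8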
5 Jointly Gaussian Random Variables

Definition 7. Two random variables $X$ and $Y$ are jointly Gaussian if
\[ f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}\right)\right]. \]

Properties:
1. If $X$ and $Y$ are jointly Gaussian random variables, the marginals are Gaussian:
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy = \frac{1}{\sqrt{2\pi\sigma_X^2}}e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}}, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = \frac{1}{\sqrt{2\pi\sigma_Y^2}}e^{-\frac{(y-\mu_Y)^2}{2\sigma_Y^2}}. \]
2. If $X$ and $Y$ are jointly Gaussian and uncorrelated, then $X$ and $Y$ are independent.
3. If $X$ and $Y$ are jointly Gaussian random variables, then any linear combination $Z = aX + bY$ is a Gaussian random variable.

[Figure 3: Two-variable joint Gaussian distribution (from Wikipedia).]
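Property 3 can be eyeballed by sampling (a sketch; histogram with the 'pdf' normalization is available in recent MATLAB releases, and the values of rho, a, and b are arbitrary choices):

    N = 1e6; rho = 0.6; a = 2; b = -1;
    Y = randn(1, N);
    X = rho*Y + sqrt(1 - rho^2)*randn(1, N);  % (X,Y) jointly Gaussian, corr rho
    Z = a*X + b*Y;
    [mean(Z), var(Z)]                  % compare with 0 and a^2 + b^2 + 2*a*b*rho = 2.6
    histogram(Z, 'Normalization', 'pdf')      % bell-shaped, as Property 3 predicts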
6 Bounds

Theorem 7. (Chebyshev's Bound) Let $X$ be an arbitrary random variable with mean $\mu$ and finite variance $\sigma^2$. Then for any $\delta > 0$,
\[ \Pr(|X - \mu| \geq \delta) \leq \frac{\sigma^2}{\delta^2}. \]
Proof.
\[ \sigma^2 = \int_{-\infty}^{\infty}(x-\mu)^2 f_X(x)\,dx \geq \int_{|x-\mu|\geq\delta}(x-\mu)^2 f_X(x)\,dx \geq \delta^2\int_{|x-\mu|\geq\delta} f_X(x)\,dx = \delta^2\Pr(|X-\mu| \geq \delta). \]

Corollary 1.
\[ \Pr(|X-\mu| < \delta) \geq 1 - \frac{\sigma^2}{\delta^2}. \]
Corollary 2.
\[ \Pr(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2}. \]

Example 12. A fair coin is flipped $n$ times; let $X$ be the number of heads observed. Determine a bound for $\Pr(X \geq 75\%\,n)$. We have
\[ E[X] = np = \frac{n}{2}, \qquad \mathrm{Var}[X] = np(1-p) = \frac{n}{4}. \]
By applying Chebyshev's inequality,
\[ \Pr\left(X \geq \frac{3}{4}n\right) = \Pr\left(X - \frac{n}{2} \geq \frac{n}{4}\right) \leq \Pr\left(\left|X - \frac{n}{2}\right| \geq \frac{n}{4}\right) \leq \frac{n/4}{n^2/16} = \frac{4}{n}. \]

Theorem 8. (Markov Inequality) Consider a RV $X$ for which $f_X(x) = 0$ for $x < 0$. Then $X$ is called a nonnegative RV and the Markov inequality applies:
\[ \Pr(X \geq \delta) \leq \frac{E[X]}{\delta}. \]
Proof.
\[ E[X] = \int_0^{\infty} x f_X(x)\,dx \geq \int_{\delta}^{\infty} x f_X(x)\,dx \geq \delta\int_{\delta}^{\infty} f_X(x)\,dx = \delta\Pr(X \geq \delta). \]

Example 13. Same setting as Example 12. According to the Markov inequality,
\[ \Pr\left(X \geq \frac{3}{4}n\right) \leq \frac{n/2}{3n/4} = \frac{2}{3}, \]
which does not depend on $n$. The bound from Chebyshev's inequality is much tighter.
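Comparing both bounds with the truth at, say, $n = 100$ shows how loose each is (a sketch; binornd is the same Statistics Toolbox function used in the code at the end of these notes):

    n = 100; trials = 1e6;
    X = binornd(n, 0.5, 1, trials);    % number of heads in n fair flips
    mean(X >= 3*n/4)                   % empirical Pr(X >= 3n/4): tiny, often 0 in 10^6 trials
    4/n                                % Chebyshev bound = 0.04
    2/3                                % Markov bound, independent of n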
Theorem 9. (Law of Large Numbers) Let $X_1, X_2, \ldots, X_n$ be $n$ iid RVs with mean $\mu$ and variance $\sigma^2$. Consider the sample mean
\[ \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i. \]
Then
\[ \lim_{n\to+\infty}\Pr(|\hat{\mu} - \mu| \geq \delta) = 0 \quad \forall\,\delta > 0, \]
i.e., $\hat{\mu} \to \mu$ in probability.
Proof. Since the $X_i$'s are iid,
\[ E[\hat{\mu}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = E[X_i] = \mu, \]
\[ \mathrm{Var}[\hat{\mu}] = \mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}[X_i] = \frac{1}{n}\mathrm{Var}[X_i] = \frac{\sigma^2}{n}. \]
Applying Chebyshev's inequality, for any fixed $\delta$,
\[ \Pr(|\hat{\mu} - \mu| \geq \delta) \leq \frac{\sigma^2}{n\delta^2} \to 0 \quad \text{as } n \to +\infty. \]

7 Moment Generating Functions (MGF)

Definition 8. The moment-generating function (MGF), if it exists, of a RV $X$ is defined by
\[ M(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx, \]
where $t$ is a complex variable. For discrete RVs, we can define $M(t)$ using the PMF as
\[ M(t) = E[e^{tX}] = \sum_i e^{tx_i} P_X(x_i). \]

Example: Let $X \sim \mathrm{Poisson}(\lambda)$, $P(X=k) = e^{-\lambda}\frac{\lambda^k}{k!}$, $k = 0, 1, 2, \ldots$.
1. Find $M_X(t)$.
\[ M_X(t) = E[e^{tX}] = \sum_{k=0}^{\infty} e^{tk} P(X=k) = \sum_{k=0}^{\infty} e^{tk} e^{-\lambda}\frac{\lambda^k}{k!} = e^{-\lambda}\sum_{k=0}^{\infty}\frac{(\lambda e^t)^k}{k!} = e^{-\lambda}e^{\lambda e^t} = e^{\lambda(e^t-1)}. \]
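Before extracting moments from it, the closed form just derived can be checked against a Monte Carlo estimate of $E[e^{tX}]$ (a sketch; it assumes poissrnd from the Statistics Toolbox, and lambda and t are arbitrary choices):

    lambda = 3; t = 0.5;
    X = poissrnd(lambda, 1, 1e6);
    mean(exp(t*X))                     % Monte Carlo estimate of E[e^{tX}]
    exp(lambda*(exp(t) - 1))           % closed form; the two should agree (about 7.0)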
2. Find $E(X)$ from $M_X(t)$.
\[ E(X) = \frac{\partial M_X(t)}{\partial t}\bigg|_{t=0} = \lambda e^t e^{\lambda(e^t-1)}\bigg|_{t=0} = \lambda. \]

Example: Let $X \sim N(\mu, \sigma^2)$; find $M_X(t)$.
\[ M_X(t) = \int_{-\infty}^{\infty} e^{xt}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = e^{\mu t + \frac{1}{2}\sigma^2 t^2}. \]

Lemma 4. If $M(t)$ exists, the moments $m_k = E[X^k]$ can be obtained by
\[ m_k = M^{(k)}(0) = \frac{d^k}{dt^k}M(t)\bigg|_{t=0}, \quad k = 0, 1, \ldots \]
Proof.
\[ M_X(t) = E[e^{tX}] = E\left[1 + tX + \frac{(tX)^2}{2!} + \cdots + \frac{(tX)^n}{n!} + \cdots\right] = 1 + tE[X] + \frac{t^2}{2!}E[X^2] + \cdots + \frac{t^n}{n!}E[X^n] + \cdots \]
\[ \implies m_k = E[X^k] = \frac{d^k}{dt^k}M(t)\bigg|_{t=0}. \]

8 Chernoff Bound

In this section, we introduce the Chernoff bound. Recall that to use Markov's inequality, $X$ must be nonnegative; the Chernoff bound applies to any RV.

Theorem 10. (Chernoff's bound) For any RV $X$,
\[ P(X \geq a) \leq e^{-at}M_X(t) \quad \forall\,t > 0. \]
In particular,
\[ P(X \geq a) \leq \min_{t>0} e^{-at}M_X(t). \]
Proof. Apply Markov to $Y = e^{tX}$; first recall that for $t > 0$,
\[ P(X \geq a) = P(e^{tX} \geq e^{ta}) = P(Y \geq e^{ta}). \]
By Markov we get
\[ P(Y \geq e^{ta}) \leq \frac{E(Y)}{e^{ta}} = e^{-ta}E(Y) \implies P(X \geq a) \leq e^{-ta}M_X(t). \]
Example: Consider $X \sim N(\mu, \sigma^2)$ and try to bound $P(X \geq a)$ using the Chernoff bound; this is an artificial example because we know the distribution of $X$. From the last lecture, $M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$; hence
\[ P(X \geq a) \leq \min_{t>0} e^{-at}e^{\mu t + \frac{1}{2}\sigma^2 t^2} = \min_{t>0} e^{(\mu-a)t + \frac{1}{2}\sigma^2 t^2}. \]
Remark: You can check at home how the parameter $t$ affects the bound. For example, pick $\mu = 0$, $\sigma = 1$ and change $t$: for $t = 0$ you get the trivial bound $P \leq 1$, while other choices of $t$ give tighter or looser bounds. See how it varies.
Minimizing the exponent $f(t) = (\mu-a)t + \frac{1}{2}\sigma^2 t^2$:
\[ \frac{\partial f(t)}{\partial t} = \sigma^2 t + \mu - a = 0 \implies t^* = \frac{a-\mu}{\sigma^2} \]
(note $t^* > 0$ requires $a > \mu$), which gives us the following:
\[ P(X \geq a) \leq e^{(\mu-a)t^* + \frac{1}{2}\sigma^2 t^{*2}} = e^{-\frac{(a-\mu)^2}{\sigma^2} + \frac{(a-\mu)^2}{2\sigma^2}} = e^{-\frac{(a-\mu)^2}{2\sigma^2}}. \]
We can compare this result with the reality, where we know that
\[ P(X \geq a) = \int_a^{\infty}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx. \]
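That comparison is a two-liner in base MATLAB (a sketch; mu, sigma, and a are arbitrary choices, and the exact tail uses the base-MATLAB erfc):

    mu = 0; sigma = 1; a = 2;
    exp(-(a - mu)^2/(2*sigma^2))       % Chernoff bound, about 0.135
    0.5*erfc((a - mu)/(sigma*sqrt(2))) % exact Pr(X >= a), about 0.0228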
9 Characteristic Function

In this section, we define the characteristic function and give some examples. The characteristic function of a RV is similar to the Fourier transform of its density, but without the minus sign in the exponent.

Definition 9. For a RV $X$,
\[ \Phi_X(w) = E(e^{jwX}) = \int_{-\infty}^{\infty} f_X(x)e^{jwx}\,dx \tag{1} \]
is called the characteristic function of $X$, where $j = \sqrt{-1}$.

Example: Find the characteristic function of $X \sim \exp(\lambda)$. Recall that for $\lambda > 0$,
\[ f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \geq 0, \\ 0 & \text{otherwise.} \end{cases} \]
Then
\[ \Phi_X(w) = \int_0^{\infty}\lambda e^{-\lambda x}e^{jwx}\,dx = \lambda\int_0^{\infty} e^{(jw-\lambda)x}\,dx = \frac{\lambda}{jw-\lambda}\left[e^{(jw-\lambda)x}\right]_0^{\infty}. \]
Since $\lambda > 0$ and $|e^{jwx}| = 1$, we have $\mathrm{Re}(jw-\lambda) = -\lambda < 0$, and therefore $\lim_{x\to\infty} e^{(jw-\lambda)x} = 0$, which results in
\[ \Phi_X(w) = \frac{\lambda}{jw-\lambda}(0 - 1) = \frac{\lambda}{\lambda - jw}. \]

Lemma 5. If $\Phi_X(w)$ exists, the moments $m_n = E[X^n]$ can be obtained by $m_n = j^{-n}\Phi_X^{(n)}(0)$, where
\[ \Phi_X^{(n)}(0) = \frac{d^n}{dw^n}\Phi_X(w)\bigg|_{w=0}. \]
Proof.
\[ \Phi_X(w) = E[e^{jwX}] = \sum_{n=0}^{\infty}\frac{(jw)^n}{n!}m_n \implies j^n m_n = \frac{d^n}{dw^n}\Phi_X(w)\bigg|_{w=0}. \]

Lemma 6. If $X$ and $Y$ are two independent RVs and $Z = X + Y$, then
\[ \Phi_Z(w) = \Phi_X(w)\Phi_Y(w) \quad \text{and} \quad M_Z(t) = M_X(t)M_Y(t). \]
Remark: To find the distribution of $Z = X + Y$, it could be easier to find $\Phi_X(w)$ and $\Phi_Y(w)$, multiply them, and then invert from the Fourier domain by integrating or by using tables of Fourier inverses.
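The exponential characteristic function derived above is easy to verify empirically (a sketch; it assumes exprnd from the Statistics Toolbox, which is parameterized by the mean $1/\lambda$, and lambda and w are arbitrary choices):

    lambda = 2; w = 1.5;
    X = exprnd(1/lambda, 1, 1e6);      % note: exprnd takes the mean 1/lambda
    mean(exp(1j*w*X))                  % empirical E[e^{jwX}]
    lambda/(lambda - 1j*w)             % closed form lambda/(lambda - jw)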
Example: Consider problem 9 of Homework 3.
Question: Let $X_1$ and $X_2$ be two independent RVs such that $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$, and let $X = aX_1 + bX_2$. Find the distribution of $X$.
Answer: Let $X_1' = aX_1$ and $X_2' = bX_2$. It is clear that $X_1' \sim N(a\mu_1, a^2\sigma_1^2)$ and $X_2' \sim N(b\mu_2, b^2\sigma_2^2)$, and that $\Phi_X(w) = \Phi_{X_1'}(w)\Phi_{X_2'}(w)$.
\[ \Phi_{X_1'}(w) = \cdots = e^{ja\mu_1 w - \frac{1}{2}a^2\sigma_1^2 w^2}, \]
\[ \Phi_X(w) = e^{ja\mu_1 w - \frac{1}{2}a^2\sigma_1^2 w^2}\cdot e^{jb\mu_2 w - \frac{1}{2}b^2\sigma_2^2 w^2} = e^{j(a\mu_1 + b\mu_2)w - \frac{1}{2}(a^2\sigma_1^2 + b^2\sigma_2^2)w^2}, \]
which implies that $X \sim N(a\mu_1 + b\mu_2,\; a^2\sigma_1^2 + b^2\sigma_2^2)$.

Fact 1. A linear combination of two independent Gaussian RVs is a Gaussian RV.

10 Central Limit Theorem

In this section we state the central limit theorem and give a rigorous proof.

Theorem 11. Let $X_1, X_2, \ldots, X_n$ be $n$ independent RVs with $\mu_{X_i} = 0$ and $V(X_i) = 1$; then
\[ Z_n = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}} \xrightarrow{n\to\infty} N(0, 1). \]
In other words,
\[ \lim_{n\to\infty} P(Z_n \leq z) = \int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx. \]
This is, for example, a way to relate flipping a coin $n$ times to a Gaussian RV (Fig. 4, Fig. 5). Let
\[ X_i = \begin{cases} 0 & \text{if a tail is observed (w.p. } 1/2\text{)}, \\ 1 & \text{if a head is observed (w.p. } 1/2\text{)}, \end{cases} \]
and set $S_n = \sum_{i=1}^{n} X_i$. Notice that $S_n \in \{0, 1, \ldots, n\}$ and, according to the CLT, $\frac{S_n - n/2}{\sqrt{n}/2} \to N(0, 1)$.
CLT: roughly speaking, fluctuations of $S_n$ around its mean are on the order of $\sqrt{n}$, and the probability of being outside $\mu \pm x\sqrt{n}$ decreases exponentially with $x^2$.
Remark: The RVs $X_i$ have to be independent, because if for example $X_i = X_1$ for all $i \in \{2, 3, \ldots, n\}$, then
\[ S_n = nX_1 = \begin{cases} n & \text{if } X_1 = 1, \\ 0 & \text{if } X_1 = 0, \end{cases} \]
which does not converge to a Gaussian distribution when $n \to \infty$.
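The standardized coin-flipping sum can be compared with the $N(0,1)$ CDF at one point (a sketch; binornd is the same Statistics Toolbox function used in Section 11, and n, the number of trials, and the evaluation point z = 1 are arbitrary choices):

    n = 1000; trials = 1e5;
    S = binornd(n, 0.5, 1, trials);    % S_n = number of heads in n flips
    Z = (S - n/2)./(sqrt(n)/2);        % standardized sum from the coin example
    mean(Z <= 1)                       % empirical P(Z_n <= 1)
    0.5*(1 + erf(1/sqrt(2)))           % N(0,1) CDF at z = 1, about 0.8413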
[Figure 4: $S_n/n$ as a function of $n$, for the coin-flipping example, with $n$ up to 100; we can clearly see that as $n$ grows, $S_n/n$ goes to 0.5. Refer to Section 11 for the detailed code.]

Proof. (of Theorem 11) We show that
\[ \lim_{n\to\infty}\Phi_{Z_n}(w) = e^{-\frac{w^2}{2}} \iff f_Z(z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}, \]
where this form of $\Phi_{Z_n}(w)$ is the characteristic function of a $N(0,1)$ RV. Write
\[ Z_n = \underbrace{\frac{X_1}{\sqrt{n}}}_{W_1} + \underbrace{\frac{X_2}{\sqrt{n}}}_{W_2} + \cdots + \underbrace{\frac{X_n}{\sqrt{n}}}_{W_n}. \]
\[ \Phi_{Z_n}(w) = \Phi_{W_1}(w)\Phi_{W_2}(w)\Phi_{W_3}(w)\cdots\Phi_{W_n}(w) = [\Phi_W(w)]^n, \]
\[ \Phi_W(w) = E(e^{jwW}) = E(e^{jwX/\sqrt{n}}) = \Phi_X\left(\frac{w}{\sqrt{n}}\right). \]
[Figure 5: $S_n/n$ as a function of $n$, for the coin-flipping example, with $n$ up to 300; we can clearly see that as $n$ grows, $S_n/n$ goes to 0.5. Refer to Section 11 for the detailed code.]

Taylor expansion: using the Taylor expansion of $\Phi_W(w)$ around $0$ we get
\[ \Phi_W(w) = \Phi_W(0) + \Phi_W'(0)\,w + \Phi_W''(0)\frac{w^2}{2!} + \cdots \]
1. Find the value of $\Phi_W(0)$:
\[ \Phi_W(0) = E(e^{jwX/\sqrt{n}})\Big|_{w=0} = \int_{-\infty}^{\infty} e^{j0x/\sqrt{n}} f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x)\,dx = 1. \]
2. Find the value of $\Phi_W'(0)$:
\[ \Phi_W'(0) = \int_{-\infty}^{\infty}\frac{jx}{\sqrt{n}}e^{j0x/\sqrt{n}}f_X(x)\,dx = \frac{j}{\sqrt{n}}\int_{-\infty}^{\infty} x f_X(x)\,dx = \frac{j}{\sqrt{n}}E(X) = 0. \]
3. Find the value of $\Phi_W''(0)$:
\[ \Phi_W''(0) = \frac{d^2\Phi_W(w)}{dw^2}\bigg|_{w=0} = \int_{-\infty}^{\infty}\left(\frac{jx}{\sqrt{n}}\right)^2 e^{j0x/\sqrt{n}} f_X(x)\,dx = -\frac{1}{n}\int_{-\infty}^{\infty} x^2 f_X(x)\,dx = -\frac{1}{n}\Big(\underbrace{V(X)}_{=1} + \underbrace{E^2(X)}_{=0}\Big) = -\frac{1}{n}. \]
Hence, using these results and Taylor's expansion,
\[ \Phi_W(w) \approx 1 - \frac{w^2}{2n}. \]
Therefore
\[ \Phi_{Z_n}(w) = \left[1 - \frac{w^2}{2n}\right]^n. \]
Recall that $\log(1-\epsilon) \approx -\epsilon$; then
\[ \log\Phi_{Z_n} = n\log\left(1 - \frac{w^2}{2n}\right) \approx n\left(-\frac{w^2}{2n}\right) = -\frac{w^2}{2} \implies \Phi_{Z_n} \to e^{-\frac{w^2}{2}}. \]

11 MATLAB Code generating the figures

In this section we give the MATLAB code used to generate Fig. 4 and Fig. 5.
    A = []; B = [];                 % generate two empty vectors
    for i = 1:100                   % i stands for the number of times the coin is flipped
        A = [A, binornd(i, 0.5)/i]; % at each iteration generate a binomial random number with
    end                             % parameters n = i, p = 0.5 and divide it by n to get S_n/n
    for n = 1:300
        B = [B, binornd(n, 0.5)/n]; % same as previous, but repeat it 300 times
    end
    x1 = 1:100; x2 = 1:300;         % x1 and x2 represent n in each figure
    figure(1)
    plot(x1, A); hold on
    plot(x1, 0.5*ones(size(x1)), 'r', 'LineWidth', 2);  % reference line at 0.5
    xlabel('n'); ylabel('S_n/n');
    figure(2)
    plot(x2, B); hold on
    plot(x2, 0.5*ones(size(x2)), 'r', 'LineWidth', 2);
    xlabel('n'); ylabel('S_n/n');