Lecture 5: Moment generating functions

Definition 2.3.6.
The moment generating function (mgf) of a random variable $X$ is
$$M_X(t) = E(e^{tX}) = \begin{cases} \sum_x e^{tx} f_X(x) & \text{if $X$ has a pmf} \\ \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx & \text{if $X$ has a pdf,} \end{cases}$$
provided that $E(e^{tX})$ exists. (Note that $M_X(0) = E(e^{0\cdot X}) = 1$ always exists.) Otherwise, we say that the mgf $M_X(t)$ does not exist at $t$.

Theorem 2.3.15.
For any constants $a$ and $b$, the mgf of the random variable $aX+b$ is
$$M_{aX+b}(t) = e^{bt} M_X(at).$$

Proof. By definition,
$$M_{aX+b}(t) = E(e^{t(aX+b)}) = E(e^{taX} e^{bt}) = e^{bt} E(e^{(ta)X}) = e^{bt} M_X(at).$$

UW-Madison (Statistics) Stat 69 Lecture 5 215 1 / 16
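The scaling rule of Theorem 2.3.15 is easy to check numerically. Below is a small sketch of ours (Python, standard library only; the Exponential(1) example and the constants $a$, $b$, $t$ are our own choices, not part of the lecture), comparing $e^{bt} M_X(at)$ against a direct trapezoidal approximation of $E(e^{t(aX+b)})$ for $X \sim$ Exponential(1), whose mgf is $1/(1-t)$ for $t < 1$:

```python
import math

def mgf_exp(t):
    # mgf of Exponential(1): E(e^{tX}) = 1/(1 - t) for t < 1
    return 1.0 / (1.0 - t)

def mgf_aXb_numeric(a, b, t, upper=60.0, steps=200_000):
    # E(e^{t(aX+b)}) for X ~ Exp(1), by trapezoidal integration of
    # e^{t(ax+b)} e^{-x} over [0, upper]; valid when a*t < 1
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.exp(t * (a * x + b)) * math.exp(-x)
    return total * h

a, b, t = 2.0, 3.0, 0.2                  # chosen so that a*t = 0.4 < 1
lhs = mgf_aXb_numeric(a, b, t)           # direct definition of the mgf of aX + b
rhs = math.exp(b * t) * mgf_exp(a * t)   # Theorem 2.3.15
print(lhs, rhs)
```

The two numbers agree to the accuracy of the quadrature, illustrating $M_{aX+b}(t) = e^{bt} M_X(at)$ on a concrete distribution.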
The main uses of the mgf:
- It can be used to generate moments.
- It helps to characterize a distribution.

Theorem 2.3.7.
If $M_X(t)$ exists at $\pm t_0$ for some $t_0 > 0$, then $E(X^n)$ exists for any positive integer $n$ and
$$E(X^n) = M_X^{(n)}(0) = \frac{d^n}{dt^n} M_X(t)\bigg|_{t=0},$$
i.e., the $n$th moment is the $n$th derivative of $M_X(t)$ evaluated at $t=0$.

Proof. Assuming that we can exchange the differentiation and integration (which will be justified later),
$$\frac{d^n}{dt^n} M_X(t) = \frac{d^n}{dt^n}\int e^{tx} f_X(x)\,dx = \int \frac{d^n}{dt^n}\, e^{tx} f_X(x)\,dx = \int x^n e^{tx} f_X(x)\,dx.$$
Hence
$$\frac{d^n}{dt^n} M_X(t)\bigg|_{t=0} = \int x^n e^{0\cdot x} f_X(x)\,dx = \int x^n f_X(x)\,dx = E(X^n).$$
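Theorem 2.3.7 can be illustrated numerically: replace the exact derivatives by finite differences of the mgf at $t = 0$. A sketch of ours (Python, standard library; the Exponential(1) example, whose $n$th moment is $n!$, is our own choice):

```python
def M(t):
    # mgf of Exponential(1): M(t) = 1/(1 - t) for t < 1, so E(X^n) = n!
    return 1.0 / (1.0 - t)

h = 1e-4
# central differences approximate the derivatives of M at t = 0
m1 = (M(h) - M(-h)) / (2 * h)             # ~ M'(0)  = E(X)   = 1
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2   # ~ M''(0) = E(X^2) = 2
print(m1, m2)
```

Both differences recover the first two moments $1! = 1$ and $2! = 2$ to several decimal places.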
The condition that $M_X(t)$ exists at $\pm t_0$ is in fact used to ensure the validity of the exchange of differentiation and integration. Besides this, we also need to argue that $E|X|^n < \infty$ for any positive integer $n$ under the condition that $M_X(t)$ exists at $\pm t_0$.

We first show that, if $M_X(t)$ exists at $\pm t_0$, then $M_X(s)$ exists for any $s \in (-t_0, t_0)$:
$$M_X(s) = E(e^{sX}) = E[e^{sX} I(X>0)] + E[e^{sX} I(X\le 0)] \le E[e^{t_0 X} I(X>0)] + E[e^{-t_0 X} I(X\le 0)] \le E(e^{t_0 X}) + E(e^{-t_0 X}) = M_X(t_0) + M_X(-t_0) < \infty,$$
where $I(X>0)$ is the indicator of $X>0$.

Next, we show that $E|X|^p < \infty$ for any $p > 0$ under the condition that $M_X(t)$ exists at $\pm t_0$. For a given $p > 0$, choose $s$ such that $0 < ps < t_0$. Because $s|X| \le e^{s|X|}$, we have
$$s^p |X|^p \le e^{ps|X|} \le e^{psX} + e^{-psX}$$
and
$$E|X|^p \le s^{-p}\, E(e^{psX} + e^{-psX}) = s^{-p}\,[M_X(ps) + M_X(-ps)] < \infty.$$
In fact, the condition that $M_X(t)$ exists at $\pm t_0$ ensures that $M_X(s)$ has the power series expansion
$$M_X(s) = \sum_{k=0}^{\infty} \frac{E(X^k)\, s^k}{k!}, \qquad -t_0 < s < t_0.$$
If the distribution of $X$ is symmetric (about 0), i.e., $X$ and $-X$ have the same distribution, then
$$M_X(t) = E(e^{tX}) = E(e^{t(-X)}) = E(e^{-tX}) = M_X(-t),$$
i.e., $M_X(t)$ is an even function, and "$M_X(t)$ exists at $\pm t_0$" is the same as "$M_X(t)$ exists at some $t_0 > 0$."

Example 2.3.8 (Gamma mgf).
Let $X$ have the gamma pdf
$$f_X(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}, \qquad x > 0,$$
where $\alpha > 0$ and $\beta > 0$ are two constants and
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx, \qquad \alpha > 0,$$
is the so-called gamma function.
If $t < 1/\beta$,
$$M_X(t) = E(e^{tX}) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} e^{tx} x^{\alpha-1} e^{-x/\beta}\,dx = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} x^{\alpha-1} e^{-x/\left(\frac{\beta}{1-\beta t}\right)}\,dx = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \left(\frac{\beta}{1-\beta t}\right)^{\!\alpha} \int_0^{\infty} s^{\alpha-1} e^{-s}\,ds = \frac{1}{(1-\beta t)^{\alpha}}.$$
If $t \ge 1/\beta$, then $E(e^{tX}) = \infty$.
We can obtain
$$E(X) = \frac{d}{dt} M_X(t)\bigg|_{t=0} = \frac{\alpha\beta}{(1-\beta t)^{\alpha+1}}\bigg|_{t=0} = \alpha\beta.$$
For any integer $n > 1$,
$$E(X^n) = \frac{d^n}{dt^n} M_X(t)\bigg|_{t=0} = \frac{\alpha(\alpha+1)\cdots(\alpha+n-1)\beta^n}{(1-\beta t)^{\alpha+n}}\bigg|_{t=0} = \alpha(\alpha+1)\cdots(\alpha+n-1)\beta^n.$$
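The closed-form Gamma moments can be verified against the density by direct numerical integration. A sketch of ours (Python, standard library; $\alpha = 2$, $\beta = 3$ are arbitrary choices), checking $E(X) = \alpha\beta$ and $E(X^2) = \alpha(\alpha+1)\beta^2$:

```python
import math

def gamma_moment(n, alpha, beta, upper=200.0, steps=200_000):
    # E(X^n) for X ~ Gamma(alpha, beta), by trapezoidal integration of
    # x^n * f_X(x), where f_X(x) = x^(alpha-1) e^(-x/beta) / (Gamma(alpha) beta^alpha)
    c = 1.0 / (math.gamma(alpha) * beta**alpha)
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * x**(n + alpha - 1) * math.exp(-x / beta)
    return c * total * h

alpha, beta = 2.0, 3.0
print(gamma_moment(1, alpha, beta))  # formula: alpha * beta = 6
print(gamma_moment(2, alpha, beta))  # formula: alpha * (alpha + 1) * beta^2 = 54
```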
Can the moments determine a distribution? Can two random variables with different distributions have the same moments of every order?

Example 2.3.10.
$X_1$ has pdf
$$f_1(x) = \frac{1}{\sqrt{2\pi}\,x}\, e^{-(\log x)^2/2}, \qquad x > 0.$$
$X_2$ has pdf
$$f_2(x) = f_1(x)\,[1 + \sin(2\pi \log x)], \qquad x > 0.$$
For any positive integer $n$,
$$E(X_1^n) = \frac{1}{\sqrt{2\pi}} \int_0^{\infty} x^{n-1} e^{-(\log x)^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ny} e^{-y^2/2}\,dy \qquad (y = \log x)$$
$$= \frac{e^{n^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(y-n)^2/2}\,dy = e^{n^2/2},$$
using the property of a normal distribution.
$$E(X_2^n) = \int_0^{\infty} x^n f_1(x)\,[1 + \sin(2\pi\log x)]\,dx = E(X_1^n) + \frac{1}{\sqrt{2\pi}} \int_0^{\infty} x^{n-1} e^{-(\log x)^2/2} \sin(2\pi\log x)\,dx$$
$$= E(X_1^n) + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ny} e^{-y^2/2} \sin(2\pi y)\,dy = E(X_1^n) + \frac{e^{n^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(y-n)^2/2} \sin(2\pi y)\,dy$$
$$= E(X_1^n) + \frac{e^{n^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-s^2/2} \sin(2\pi(s+n))\,ds = E(X_1^n) + \frac{e^{n^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-s^2/2} \sin(2\pi s)\,ds = E(X_1^n),$$
where $\sin(2\pi(s+n)) = \sin(2\pi s)$ because $n$ is an integer, and the last integral is 0 since $e^{-s^2/2}\sin(2\pi s)$ is an odd function.
This shows that $X_1$ and $X_2$ have the same moments of order $n = 1, 2, \ldots$, but they have different distributions.
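The moment formula $E(X_1^n) = e^{n^2/2}$ can be confirmed numerically. A sketch of ours (Python, standard library; the integration limits and step count are our own choices), integrating in the substituted variable $y = \log x$ exactly as in the derivation above:

```python
import math

def lognormal_moment(n, lo=-12.0, hi=14.0, steps=260_000):
    # E(X_1^n) = (1/sqrt(2 pi)) * integral of e^{ny - y^2/2} dy  (y = log x),
    # approximated by the trapezoidal rule on [lo, hi]
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.exp(n * y - y * y / 2)
    return total * h / math.sqrt(2 * math.pi)

for n in (1, 2, 3):
    print(n, lognormal_moment(n), math.exp(n * n / 2))  # the two columns agree
```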
In some cases, moments do determine the distribution. The mgf, if it exists, determines a distribution.

Theorem 2.3.11.
Let $X$ and $Y$ be random variables with cdfs $F_X$ and $F_Y$, respectively.
a. If $X$ and $Y$ are bounded, then $F_X(u) = F_Y(u)$ for all $u$ iff $E(X^r) = E(Y^r)$ for all $r = 1, 2, \ldots$.
b. If the mgf's exist in a neighborhood of 0 and $M_X(t) = M_Y(t)$ for all $t$ in this neighborhood, then $F_X(u) = F_Y(u)$ for all $u$.

The key idea of the proof can be explained as follows. Note that
$$M_X(t) = \int e^{tx} f_X(x)\,dx$$
is the Laplace transform of $f_X(x)$. From the uniqueness of the Laplace transform, there is a one-to-one correspondence between the mgf and the pdf.
We will give a proof of this result in Chapter 4 for the multivariate case, after we introduce characteristic functions.
From the power series result in the last lecture, if the mgf of $X$ exists in a neighborhood of 0, then it has a power series expansion determined by the moments $E(X^n)$, $n = 1, 2, \ldots$. Therefore, knowing the mgf and knowing the moments of all orders are the same, but only under the condition that the mgf exists in a neighborhood of 0.

Once we establish part (b), the proof of part (a) is easy: if $X$ and $Y$ are bounded, then their mgf's exist for all $t$, and thus their cdf's are the same iff their moments of every order are the same.

The condition that the mgf exists in a neighborhood of 0 is important. There are random variables with finite moments of every order whose mgf's do not exist.

Example.
The pdf
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,x}\, e^{-(\log x)^2/2}, \qquad x > 0,$$
is called the log-normal distribution or density, because if $X$ has pdf $f_X$, then $\log X$ has a normal pdf.
In Example 2.3.10, we have shown that the log-normal distribution has finite moments of every order. For $t > 0$,
$$M_X(t) = \int_0^{\infty} \frac{e^{tx}}{\sqrt{2\pi}\,x}\, e^{-(\log x)^2/2}\,dx = \infty,$$
because, when $t > 0$,
$$\lim_{x\to\infty} \frac{e^{tx}}{\sqrt{2\pi}\,x}\, e^{-(\log x)^2/2} = \infty.$$
When $t < 0$,
$$M_X(t) = \int_0^{\infty} \frac{e^{tx}}{\sqrt{2\pi}\,x}\, e^{-(\log x)^2/2}\,dx \le \int_0^{\infty} \frac{1}{\sqrt{2\pi}\,x}\, e^{-(\log x)^2/2}\,dx = 1,$$
and, hence, $M_X(t)$ exists for all $t < 0$.
How do we find a distribution for which all moments exist but the mgf does not exist at any $t \ne 0$? Consider the pdf
$$f_Y(x) = \begin{cases} f_X(x)/2 & x > 0 \\ f_X(-x)/2 & x < 0. \end{cases}$$
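The divergence for $t > 0$ can be seen numerically: the log of the integrand, $tx - (\log x)^2/2 - \log x - \tfrac{1}{2}\log(2\pi)$, dips at first but eventually increases without bound, because $tx$ grows faster than $(\log x)^2/2$. A small sketch of ours (Python, standard library; the value $t = 0.01$ and the sample points are arbitrary choices):

```python
import math

def log_integrand(t, x):
    # log of e^{tx} f_X(x) for the log-normal pdf, x > 0
    return t * x - math.log(x)**2 / 2 - math.log(x) - 0.5 * math.log(2 * math.pi)

t = 0.01  # even a tiny positive t eventually dominates (log x)^2 / 2
vals = [log_integrand(t, 10.0**k) for k in range(1, 7)]
print(vals)  # decreases at first, then blows up, so the integral is infinite
```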
For this pdf,
$$E(|Y|^n) = \int_0^{\infty} x^n\, \frac{f_X(x)}{2}\,dx + \int_{-\infty}^0 (-x)^n\, \frac{f_X(-x)}{2}\,dx = \int_0^{\infty} x^n f_X(x)\,dx = E(X^n),$$
which has been derived for any $n = 1, 2, \ldots$. On the other hand,
$$E(e^{tY}) = \int_0^{\infty} e^{tx}\, \frac{f_X(x)}{2}\,dx + \int_{-\infty}^0 e^{tx}\, \frac{f_X(-x)}{2}\,dx = \int_0^{\infty} e^{tx}\, \frac{f_X(x)}{2}\,dx + \int_0^{\infty} e^{-tx}\, \frac{f_X(x)}{2}\,dx,$$
and we have shown that one of these integrals is $\infty$ (depending on whether $t > 0$ or $t < 0$).

Theorem.
If a random variable $X$ has finite moments $a_n = E(X^n)$ for all $n = 1, 2, \ldots$, and the series
$$\sum_{n=0}^{\infty} \frac{a_n t^n}{n!} < \infty \qquad \text{for some } t > 0,$$
then the cdf of $X$ is determined by $a_n$, $n = 1, 2, \ldots$.
Example.
Suppose that $a_n = n!$ is the $n$th moment of a random variable $X$. Since
$$\sum_{n=0}^{\infty} \frac{a_n t^n}{n!} = \sum_{n=0}^{\infty} t^n = \frac{1}{1-t}, \qquad |t| < 1,$$
and this function is the mgf of Gamma(1,1) at $t$, we conclude that $X \sim$ Gamma(1,1).

Suppose that the $n$th moment of a random variable $Y$ is
$$a_n = \begin{cases} n!/(n/2)! & \text{if } n \text{ is even} \\ 0 & \text{if } n \text{ is odd.} \end{cases}$$
Then
$$\sum_{n=0}^{\infty} \frac{a_n t^n}{n!} = \sum_{n \text{ even}} \frac{n!\,(t^2)^{n/2}}{n!\,(n/2)!} = \sum_{k=0}^{\infty} \frac{t^{2k}}{k!} = e^{t^2}, \qquad t \in \mathbb{R}.$$
Later, we will show that this is the mgf of $N(0,2)$; hence $Y \sim N(0,2)$.

For the log-normal random variable $X$ discussed at the beginning of this lecture, $E(X^n) = e^{n^2/2}$ and $\sum_{n=0}^{\infty} e^{n^2/2}\, t^n/n! = \infty$ for any $t > 0$; hence, the theorem is not applicable.
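Both series can be checked with partial sums. A sketch of ours (Python, standard library; the evaluation point $t = 0.5$ and the truncation lengths are our own choices):

```python
import math

t = 0.5

# a_n = n!  =>  sum a_n t^n / n! = sum t^n = 1/(1 - t)  for |t| < 1
s1 = sum(t**n for n in range(200))
print(s1, 1 / (1 - t))

# a_n = n!/(n/2)! for even n, 0 for odd n  =>  the series equals e^{t^2}
s2 = sum(t**(2 * k) / math.factorial(k) for k in range(50))
print(s2, math.exp(t**2))
```

Both partial sums match their closed forms to full floating-point precision.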
In applications we often need to approximate a cdf by a sequence of cdf's. The next theorem gives a sufficient condition for the convergence of cdf's and moments of random variables in terms of the convergence of mgf's.

Theorem 2.3.12.
Suppose that $X_1, X_2, \ldots$ is a sequence of random variables with mgf's $M_{X_n}(t)$, and
$$\lim_{n\to\infty} M_{X_n}(t) = M_X(t) < \infty$$
for all $t$ in a neighborhood of 0, where $M_X(t)$ is the mgf of a random variable $X$. Then, for all $x$ at which $F_X(x)$ is continuous,
$$\lim_{n\to\infty} F_{X_n}(x) = F_X(x).$$
Furthermore, for any $p > 0$, we have
$$\lim_{n\to\infty} E|X_n|^p = E|X|^p \qquad \text{and} \qquad \lim_{n\to\infty} E|X_n - X|^p = 0.$$
Example 2.3.13 (Poisson approximation).
The cdf of the binomial pmf,
$$f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n,$$
where $n$ is a positive integer and $0 < p < 1$, may not be easy to calculate when $n$ is very large. It is often approximated by the cdf of the Poisson pmf,
$$f_Y(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots,$$
where $\lambda > 0$ is a constant.
For this purpose, we first compute the mgf's of the binomial and Poisson distributions.

Example 2.3.9 (binomial mgf).
Using the binomial formula
$$\sum_{x=0}^{n} \binom{n}{x} u^x v^{n-x} = (u+v)^n,$$
we obtain
$$M_X(t) = E(e^{tX}) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x (1-p)^{n-x} = (pe^t + 1 - p)^n.$$
Note that
$$E(X) = \frac{d}{dt} M_X(t)\bigg|_{t=0} = n(pe^t + 1 - p)^{n-1} pe^t \bigg|_{t=0} = np,$$
$$E(X^2) = \frac{d^2}{dt^2} M_X(t)\bigg|_{t=0} = \left[ n(n-1)(pe^t + 1 - p)^{n-2} (pe^t)^2 + n(pe^t + 1 - p)^{n-1} pe^t \right]\bigg|_{t=0} = n(n-1)p^2 + np.$$
We obtained the same results previously, but the calculation here is simpler.
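The closed-form binomial mgf and the two moment formulas can be verified directly from the pmf. A sketch of ours (Python, standard library; $n = 10$, $p = 0.3$, $t = 0.7$ are arbitrary choices):

```python
import math

n, p, t = 10, 0.3, 0.7
pmf = [math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# mgf: direct summation vs the closed form (p e^t + 1 - p)^n
direct = sum(math.exp(t * x) * pmf[x] for x in range(n + 1))
closed = (p * math.exp(t) + 1 - p)**n
print(direct, closed)

# moments from the pmf vs the derivative formulas
EX = sum(x * pmf[x] for x in range(n + 1))        # formula: np = 3
EX2 = sum(x * x * pmf[x] for x in range(n + 1))   # formula: n(n-1)p^2 + np = 11.1
print(EX, EX2)
```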
Example 2.3.13 (continued).
If $Y$ has the Poisson pmf, then
$$M_Y(t) = \sum_{x=0}^{\infty} e^{tx}\, \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(e^t \lambda)^x}{x!} = e^{-\lambda} e^{e^t \lambda} = e^{\lambda(e^t - 1)},$$
which is finite for any $t \in \mathbb{R}$.
Let $X_n$ have the binomial distribution with parameters $n$ and $p$. Suppose that $\lim_{n\to\infty} np = \lambda > 0$ (which means that $p$ also depends on $n$ and $p \to 0$ as $n \to \infty$). Then, for any $t$, as $n \to \infty$,
$$M_{X_n}(t) = (pe^t + 1 - p)^n = \left[ 1 + \frac{(np)(e^t - 1)}{n} \right]^n \to e^{\lambda(e^t - 1)} = M_Y(t),$$
using the fact that, for any sequence of numbers $a_n$ converging to $a$,
$$\lim_{n\to\infty} \left(1 + \frac{a_n}{n}\right)^n = e^a.$$
With this result and Theorem 2.3.12, we can approximate $P(X_n \le u)$ by $P(Y \le u)$ when $n$ is large and $np$ is approximately a constant.
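The quality of the approximation is easy to inspect numerically. A sketch of ours (Python, standard library; $n = 1000$, $p = 0.002$, so $\lambda = np = 2$, all our own choices), comparing $P(X_n \le u)$ with $P(Y \le u)$:

```python
import math

def binom_cdf(u, n, p):
    # P(X <= u) for X ~ Binomial(n, p), by summing the pmf
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(u + 1))

def poisson_cdf(u, lam):
    # P(Y <= u) for Y ~ Poisson(lam), by summing the pmf
    return sum(math.exp(-lam) * lam**x / math.factorial(x) for x in range(u + 1))

n, p = 1000, 0.002
lam = n * p  # lambda = 2 stays fixed as n grows and p shrinks
for u in (0, 1, 2, 3, 5):
    print(u, binom_cdf(u, n, p), poisson_cdf(u, lam))
```

With $n = 1000$ the two cdf's already agree to roughly three decimal places at every point.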