Chapter 7

Generating functions

Definition 7.1. Let X be a random variable. The moment generating function is given by

    M_X(t) = E[e^{tX}],

provided that the expectation exists for t in some neighborhood of 0. That is, ∃h > 0 : ∀t ∈ [−h, h], E[e^{tX}] exists. This is also called the Laplace transform.

Theorem 7.2. If X has mgf M_X(t), then

    E[X^n] = (d^n/dt^n) M_X(t) |_{t=0}.

Example 7.3. Let X ∼ Gamma(α, β). What is the mgf?

Solution. Found this the other day in the Kernel matching exercise:

    M_X(t) = (1/(1 − βt))^α,   t < 1/β.

What if I want the variance of Gamma(α, β)?

    E[X]  = (d/dt) M_X(t) |_{t=0} = (αβ)(1 − βt)^{−α−1} |_{t=0} = αβ
    E[X²] = (d²/dt²) M_X(t) |_{t=0} = (α)(α + 1)β²(1 − βt)^{−α−2} |_{t=0} = (α² + α)β²
    V[X]  = αβ² + α²β² − α²β² = αβ².
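The Gamma variance calculation above can be reproduced symbolically. A minimal sketch with sympy, assuming the scale parametrization M_X(t) = (1 − βt)^{−α} used in Example 7.3 (the symbol names are my own):

```python
import sympy as sp

a, b, t = sp.symbols("alpha beta t", positive=True)

# mgf of Gamma(alpha, beta) in the scale parametrization above
M = (1 - b * t) ** (-a)

# Theorem 7.2: moments are derivatives of the mgf at t = 0
m1 = sp.diff(M, t, 1).subs(t, 0)   # E[X]   -> alpha*beta
m2 = sp.diff(M, t, 2).subs(t, 0)   # E[X^2] -> alpha*(alpha + 1)*beta^2
var = sp.simplify(m2 - m1 ** 2)    # V[X]   -> alpha*beta^2

print(m1, var)
```

The same pattern (differentiate, then substitute t = 0) works for any mgf sympy can differentiate.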
Definition 7.4. X has Normal (Gaussian) distribution if

    f_X(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).

Example 7.5. Let X ∼ N(0, 1). What is the mgf?

Solution.

    M_X(t) = ∫ e^{tx} (1/√(2π)) e^{−x²/2} dx
           = ∫ (1/√(2π)) exp(−((x − t)² − t²)/2) dx
           = e^{t²/2} ∫ N(t, 1) dx
           = e^{t²/2}.

Matching moments does not uniquely determine a distribution except under some conditions.

Example 7.6. Consider the random variables X and Y. Let

    f_X(x) = (1/(√(2π) x)) e^{−log²(x)/2} I_{[0,∞)}(x),
    f_Y(y) = f_X(y)[1 + sin(2π log y)].

It can be shown that, for any r ∈ ℕ, E[X^r] = E[Y^r].

Solution.

    E[X^r] = ∫₀^∞ x^r (1/(√(2π) x)) e^{−log²x/2} dx
           = ∫ (1/√(2π)) e^{ry} e^{−y²/2} dy            (using y = log x, dy = dx/x)
           = e^{r²/2} ∫ (1/√(2π)) e^{−(y² − 2ry + r²)/2} dy
           = e^{r²/2}.

    E[Y^r] = E[X^r] + ∫₀^∞ y^r (1/(√(2π) y)) e^{−log²y/2} sin(2π log y) dy
           = E[X^r] + e^{r²/2} ∫ (1/√(2π)) e^{−z²/2} sin(2πz) dz   (using z = log y − r, and sin(2π(z + r)) = sin(2πz) for integer r)
           = E[X^r] + 0,

since the integrand is an odd function.

Theorem 7.7. Let F_X and F_Y be two cdfs, all of whose moments exist.
[Figure 7.1: Probability density functions of X and Y in Example 7.6.]

1. If X and Y have bounded support, then F_X = F_Y if and only if E[X^r] = E[Y^r] ∀ r ∈ ℕ₊.
2. If the mgfs exist and M_X(t) = M_Y(t) for all t in some neighborhood of 0, then F_X = F_Y.

This is a bit unpleasant. The following uniquely determines the distribution.

Definition 7.8. The characteristic function of X is φ_X(t) = E[e^{itX}].

This is also the Fourier transform of the density. It still exists even if there is no density. It has lots of nice properties.

1. The cf exists since e^{itx} = cos(tx) + i sin(tx), which are both bounded. Thus the integral always exists.
2. If the density is symmetric about zero (i.e. f(x) = f(−x) ∀x), then φ_X is real valued and symmetric around zero.
3. Every distribution has a unique cf.
4. φ_X(0) = 1.

With mgfs or cfs, we get other nice properties by virtue of their interpretations as Laplace or Fourier transforms respectively. This gives us another way to find the distribution of transformations of random variables. If Y = aX + b, then M_Y(t) = e^{bt} M_X(at) and φ_Y(t) = e^{ibt} φ_X(at).
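Looking back at Example 7.6, the moment equality can be checked numerically: after substituting u = log x, the r-th moment of X (a standard lognormal) becomes a Gaussian integral equal to e^{r²/2}, and the perturbation 1 + sin(2π log y) contributes nothing for integer r. A quick quadrature sketch (function names and tolerances are my own choices, not from the notes):

```python
import numpy as np
from scipy.integrate import quad

# r-th moment of X after the substitution u = log x:
#   E[X^r] = \int e^{ru} e^{-u^2/2} / sqrt(2 pi) du = e^{r^2/2}
def integrand_X(u, r):
    return np.exp(r * u - u ** 2 / 2) / np.sqrt(2 * np.pi)

# Y's density adds the factor 1 + sin(2 pi u); the extra term
# integrates to 0 for integer r by the odd-function argument
def integrand_Y(u, r):
    return integrand_X(u, r) * (1 + np.sin(2 * np.pi * u))

results = []
for r in (1, 2, 3):
    mX, _ = quad(integrand_X, -np.inf, np.inf, args=(r,), limit=200)
    mY, _ = quad(integrand_Y, -np.inf, np.inf, args=(r,), limit=200)
    results.append((r, mX, mY, np.exp(r ** 2 / 2)))
    print(results[-1])
```

Both columns should match e^{r²/2} for each integer r, even though f_X and f_Y are visibly different densities.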
Suppose you have two functions f and g. The convolution is

    (f ∗ g)(x) = ∫ f(y) g(x − y) dy.

Why might this matter? Suppose you have 2 independent random variables X and Y and you are interested in their sum. Then

    P(X + Y = z) = P({X = x} ∩ {Y = z − x}, x ≤ z) = Σ_{x ≤ z} P(X = x) P(Y = z − x).

This is generally a pain (the continuous analogue is f_Z(z) = ∫ f_X(x) f_Y(z − x) dx). But in Fourier space, convolution becomes multiplication. That is, φ_Z(t) = φ_X(t) φ_Y(t). The same with mgfs. This is extremely useful.

Example 7.9. Let X_1, ..., X_n be independent Bern(p) random variables. Then Y = Σ_{i=1}^n X_i ∼ Binom(n, p).

Solution.

    M_X(t) = E[e^{tX}] = e^t p + (1 − p) = (1 − p) + p e^t.

    M_Y(t) = E[e^{tY}] = Σ_{y=0}^n (n choose y) (p e^t)^y (1 − p)^{n−y}
           = [p e^t + (1 − p)]^n        (by the binomial formula)
           = ∏_{i=1}^n M_{X_i}(t).

Characteristic functions also give moments similarly to the mgf. Note that the nth derivative of φ(t) is

    (d^n/dt^n) φ(t) = ∫ i^n x^n e^{itx} f(x) dx.

So, if the nth moment exists, then µ′_n = φ^{(n)}(0)/i^n. Hence if we Taylor expand φ_X(t) around 0, we get

    φ_X(t) = 1 + it µ′_1 + ((it)²/2) µ′_2 + ⋯
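Both claims above, that a generating function of an independent sum is the product of the individual generating functions, and that a sum of Bernoullis is Binomial, can be checked by simulation. A small Monte Carlo sketch (sample size, seed, and tolerances are my own choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, trials = 10, 0.3, 200_000
t = 0.5

# n independent Bern(p) columns; their row sums should be Binom(n, p)
bern = rng.binomial(1, p, size=(trials, n))
y = bern.sum(axis=1)

# empirical pmf of the sum vs the Binom(n, p) pmf
emp = np.bincount(y, minlength=n + 1) / trials
exact = stats.binom.pmf(np.arange(n + 1), n, p)
pmf_err = np.max(np.abs(emp - exact))

# mgf of the sum vs the product form [p e^t + (1 - p)]^n
mgf_mc = np.exp(t * y).mean()
mgf_prod = (p * np.exp(t) + (1 - p)) ** n

# cf of the sum vs the product of the individual cfs
cf_mc = np.exp(1j * t * y).mean()
cf_prod = (p * np.exp(1j * t) + (1 - p)) ** n

print(pmf_err, abs(mgf_mc - mgf_prod), abs(cf_mc - cf_prod))
```

All three discrepancies should be small, shrinking at the usual 1/√trials Monte Carlo rate.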
Chapter 8

Common families of distributions

So far we have talked about a number of named distributions. Each of these is really a family of specific probability distributions. Take for instance the Normal distribution with parameters µ and σ², with pdf

    f_X(x | µ, σ²) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).                    (8.1)

Here I am making explicit the dependence of the pdf on the parameters µ and σ². By varying µ ∈ (−∞, ∞) and σ² > 0, we can obtain an entire collection of different individual distributions.

So far we have seen a number of different families: N(µ, σ²), Gamma(α, β), Bern(p), Binom(n, p), Exp(β), and Cauchy(µ, σ). All of these named distributions tend to arise from common probability ideas. I'll mention a few more common distributions that we haven't discussed yet, and then talk about how some of these distributions are related to each other. There are more in the text that you should read about.

8.1 Discrete distributions

Definition 8.1. A random variable X has discrete uniform distribution if

    P(X = x | N) = (1/N) I_{{1,2,...,N}}(x).

Of course, we could alter the support set.

Definition 8.2. A random variable X has Poisson distribution if

    P(X = x | λ) = (e^{−λ} λ^x / x!) I_ℕ(x).                              (8.2)

The Poisson distribution is often used to model count data, for instance, the number of customers arriving at a bank window, pings to a computer server, the firing of neurons, etc. Some calculations will show that both the mean and the variance of the Poisson are λ.

The Poisson also arises through the concept of rare events. Suppose X ∼ Binom(n, p) with n large and p small, maybe like X counts the wins of a blue-haired woman at the slot machine in Vegas. Let Y ∼ Pois(np). Then it turns out that P(X = x) ≈ P(Y = x). The easiest way to see this is via the moment generating functions. Then,

    M_X(t) = [p e^t + (1 − p)]^n = [1 + np(e^t − 1)/n]^n                  (8.3)
           → e^{np(e^t − 1)} = M_Y(t)   (as n → ∞),                        (8.4)
showing that as n gets large, Binom(n, p) converges to the Pois(np).

Another way of arriving at the Poisson distribution is its unique ability to describe a counting process.

Definition 8.3. A stochastic process (random process) {N(t), t ≥ 0} is said to be a counting process if N(t) represents the total number of events that have occurred up to time t. A counting process must satisfy the following properties:

1. N(t) ≥ 0.
2. N(t) is integer valued.
3. If s < t, then N(s) ≤ N(t).
4. For s < t, N(t) − N(s) is the number of events that have occurred in the interval (s, t].

If we impose a few more conditions, then we wind up with a Poisson process.

Definition 8.4. Recall that the notation f(h) = o(h) means lim_{h→0} f(h)/h = 0.

Definition 8.5. The counting process {N(t), t ≥ 0} is said to be a Poisson process with rate λ if

1. N(0) = 0.
2. {N(t), t ≥ 0} has stationary independent increments: the distribution of N(t) − N(s) depends only on t − s, and {N(t_i) − N(s_i) : i = 1, 2, ..., n} are independent if s_1 ≤ t_1 ≤ s_2 ≤ t_2 ≤ ⋯ ≤ t_n.
3. P(N(h) = 1) = λh + o(h).
4. P(N(h) > 1) = o(h).

The reason this N(t) is called a Poisson process is that under these conditions, one can show that

    P(N(t + s) − N(s) = x) = (e^{−λt} (λt)^x / x!) I_ℕ(x).                 (8.5)

In other words, the number of events that occur in an interval of length t has a Poisson distribution.

Suppose we have a Poisson process N(t). Let X_1 be the time that the first arrival occurs. Similarly, for n ≥ 1, let X_n be the time between the (n − 1)st arrival and the nth arrival. What is the distribution of X_n?

First, note that the event {X_1 > t} occurs if and only if N(t) = 0, that is, no arrivals occurred between 0 and t. Therefore, P(X_1 > t) = P(N(t) = 0) = e^{−λt}. So F_{X_1}(t) = 1 − e^{−λt}, which is that of Exp(1/λ). Similarly,

    P(X_2 > t | X_1 = s) = P(no events in (s, s + t] | X_1 = s)
                         = P(no events in (s, s + t])        (condition (2) of the definition)
                         = e^{−λt}.

So X_2 ∼ Exp(1/λ) and X_1 ⊥ X_2.

Theorem 8.6.
The interarrival times X_i are independent identically distributed exponential random variables, X_i ∼ Exp(1/λ), i = 1, ..., n.

So the Poisson distribution has a very nice interpretation and also a close connection to the exponential (and Gamma) distribution.
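Theorem 8.6 and equation (8.5) can be seen working together in simulation: draw iid Exp(1/λ) interarrival times, count the arrivals before time t, and the counts should look Pois(λt). A minimal sketch (the rate, horizon, and sample sizes are my own choices, not from the notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
lam, t, trials = 2.0, 3.0, 100_000

# iid exponential interarrival times with mean 1/lam;
# 60 gaps per path is far more than the lam*t = 6 expected arrivals
gaps = rng.exponential(1 / lam, size=(trials, 60))
arrival_times = np.cumsum(gaps, axis=1)

# N(t): number of arrivals in (0, t]
counts = (arrival_times <= t).sum(axis=1)

# compare the empirical distribution of N(t) with Pois(lam * t)
k = np.arange(0, 21)
emp = np.bincount(counts, minlength=60)[:21] / trials
exact = stats.poisson.pmf(k, lam * t)

print(counts.mean(), counts.var())   # both near lam * t
print(np.max(np.abs(emp - exact)))   # small
```

Note the mean and variance of the simulated counts should both be near λt = 6, reflecting the Poisson's equal mean and variance.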