Stat 134, Fall 2011: Notes on generating functions
Michael Lugo
October 2011

Definitions

Given a random variable $X$ which always takes on a nonnegative integer value, we define the probability generating function

$$f_X(z) = P(X=0) + P(X=1)z + P(X=2)z^2 + \cdots = \sum_{k \ge 0} P(X=k)\,z^k.$$

This infinite sum converges at least in some neighborhood of the origin $z = 0$; in particular

$$f_X(1) = \sum_{k \ge 0} P(X=k) = 1.$$

But we won't usually worry about questions of convergence. The nice thing about generating functions is that they let us take a whole sequence of numbers and consolidate it into a single function; as Herb Wilf puts it in his book generatingfunctionology, "A generating function is a clothesline on which we hang up a sequence of numbers for display." Then we can use the tools of calculus on that function. Since we know calculus, this is useful.

In particular, if we want to know $E(X), E(X^2), \ldots$, we can find them by taking successive derivatives of $f_X$. Differentiating term by term gives

$$\frac{d}{dz} f_X(z) = \frac{d}{dz}\left[\sum_k P(X=k)\,z^k\right] = \sum_k P(X=k)\,\frac{d}{dz} z^k = \sum_k P(X=k)\,k z^{k-1}$$

and so $f_X'(1) = \sum_k P(X=k)\,k = E(X)$.

To find $E(X^2)$ is a bit harder. The obvious thing to do is differentiate twice, but this gives

$$\frac{d^2}{dz^2} f_X(z) = \frac{d^2}{dz^2}\left[\sum_k P(X=k)\,z^k\right] = \sum_k P(X=k)\,\frac{d^2}{dz^2} z^k = \sum_k P(X=k)\,k(k-1) z^{k-2}.$$

Therefore letting $z = 1$ gives

$$f_X''(1) = \sum_k P(X=k)\,k(k-1) = E(X(X-1)).$$
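As a quick aside, the two facts just derived ($f_X'(1) = E(X)$ and $f_X''(1) = E(X(X-1))$) are easy to sanity-check numerically whenever $X$ has finite support, since the generating function is then just a polynomial. The sketch below is mine, not part of the notes; the toy distribution is arbitrary.

```python
# Check f'(1) = E(X) and f''(1) = E(X(X-1)) for a finite distribution,
# differentiating the polynomial PGF f(z) = sum_k p_k z^k term by term.

def pgf_derivative_at_1(probs, order):
    """order-th derivative of the PGF at z = 1: sum_k p_k * k(k-1)...(k-order+1)."""
    total = 0.0
    for k, p in enumerate(probs):
        term = p
        for j in range(order):
            term *= (k - j)
        total += term
    return total

# A toy distribution on {0, 1, 2, 3}: P(X = k) = probs[k].
probs = [0.1, 0.2, 0.3, 0.4]

mean = pgf_derivative_at_1(probs, 1)              # f'(1) = E(X)
second_factorial = pgf_derivative_at_1(probs, 2)  # f''(1) = E(X(X-1))

# Compare with the moments computed directly from the definition.
direct_mean = sum(k * p for k, p in enumerate(probs))
direct_second = sum(k * (k - 1) * p for k, p in enumerate(probs))
```

Here `mean` and `direct_mean` both come out to $2.0$, and both factorial moments to $3.0$, as they must.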
But we need not despair! We have

$$E(X^2) = E(X(X-1) + X) = E(X(X-1)) + E(X) = f_X''(1) + f_X'(1)$$

and if we recall that $\mathrm{Var}(X) = E(X^2) - E(X)^2$, we can derive the formula

$$\mathrm{Var}(X) = f_X''(1) + f_X'(1) - f_X'(1)^2.$$

We won't often have use for higher moments, but let's consider how we could find $E(X^3)$. Differentiating three times gives $f^{(3)}(1) = E(X(X-1)(X-2))$. It turns out that $x^3 = x(x-1)(x-2) + 3x(x-1) + x$, as you can easily verify. Thus we get the formula

$$E(X^3) = f^{(3)}(1) + 3f''(1) + f'(1).$$

This continues. In general

$$\sum_{k=0}^n \left\{ {n \atop k} \right\} (x)_k = x^n,$$

where $(x)_k = x(x-1)\cdots(x-k+1)$ and $\left\{ {n \atop k} \right\}$ denotes a Stirling number of the second kind; this is the number of ways to partition a set of $n$ objects into $k$ nonempty subsets, although for our purposes we can just think of these as the numbers that make this identity hold. Therefore

$$E(X^n) = \sum_{k=1}^n \left\{ {n \atop k} \right\} f^{(k)}(1).$$

Incidentally, you might also write $f_X(z) = E(z^X)$, where the exponent is a random variable; this means that a lot of what we say here about discrete random variables can carry over to continuous random variables.

Finding some generating functions

We know some distributions, so let's find their generating functions and then use these to derive moments.

The Bernoulli distribution. Recall that the Bernoulli distribution with parameter $p$ has $P(X=0) = 1-p$, $P(X=1) = p$. Therefore the generating function is $(1-p)z^0 + pz^1$, or $pz + q$. This gives $f_X'(z) = p$, $f_X''(z) = 0$, and so we can use the formulas $E(X) = f_X'(1)$, $\mathrm{Var}(X) = f_X''(1) + f_X'(1) - f_X'(1)^2$ to get $E(X) = p$, $\mathrm{Var}(X) = 0 + p - p^2 = p(1-p) = pq$. But we already knew these.

The geometric distribution. This is the time until the first head if we flip a coin with probability $p$ of heads. This distribution has $P(X=k) = q^{k-1}p$ for $k \ge 1$, and so

$$f_X(z) = \sum_{k \ge 1} q^{k-1} p\, z^k = \sum_{k \ge 1} (qz)^{k-1}\, pz.$$
By the usual formula for the sum of a geometric series, we get $f_X(z) = pz/(1-qz)$. Differentiating once gives

$$f_X'(z) = \frac{(1-qz)p - pz(-q)}{(1-qz)^2} = \frac{p}{(1-qz)^2}, \qquad f_X'(1) = \frac{p}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p}$$

and so $E(X) = 1/p$. Differentiating again gives

$$f_X''(z) = \frac{2pq}{(1-qz)^3}, \qquad f_X''(1) = \frac{2pq}{(1-q)^3} = \frac{2pq}{p^3} = \frac{2q}{p^2}.$$

We can put this all together to get

$$\mathrm{Var}(X) = f_X''(1) + f_X'(1) - f_X'(1)^2 = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{q}{p^2}.$$

The Poisson distribution. The Poisson random variable has $P(X=k) = e^{-\lambda}\lambda^k/k!$. Therefore we have

$$f_X(z) = \sum_{k \ge 0} e^{-\lambda}\,\frac{\lambda^k}{k!}\, z^k = e^{-\lambda} \sum_{k \ge 0} \frac{(\lambda z)^k}{k!}.$$

Recognizing the sum as the Taylor series for $e^{\lambda z}$, we get $f_X(z) = e^{-\lambda} e^{\lambda z} = e^{\lambda(z-1)}$. Therefore $f_X'(z) = \lambda e^{\lambda(z-1)}$, $f_X''(z) = \lambda^2 e^{\lambda(z-1)}$. Thus $E(X) = \lambda$, $E(X^2 - X) = \lambda^2$, $E(X^2) = \lambda^2 + \lambda$, $\mathrm{Var}(X) = E(X^2) - E(X)^2 = \lambda$. The mean and variance of the Poisson are both $\lambda$, which we already knew.

The uniform distribution. Somewhat surprisingly, it's difficult to use generating functions to get facts about the uniform distribution. The generating function of the uniform distribution on $0, 1, 2, \ldots, n-1$ is

$$f_X(z) = \sum_{k=0}^{n-1} \frac{1}{n} z^k = \frac{1}{n}\left(1 + z + \cdots + z^{n-1}\right) = \frac{1-z^n}{n(1-z)}.$$

If you differentiate this you get

$$f_X'(z) = \frac{1}{n} \cdot \frac{(n-1)z^n - nz^{n-1} + 1}{(1-z)^2}.$$

But how can we evaluate this at $z = 1$? It appears to be $0/0$. We take the limit as $z \to 1$:

$$E(X) = \frac{1}{n} \lim_{z \to 1} \frac{(n-1)z^n - nz^{n-1} + 1}{(1-z)^2}.$$

We then apply l'Hôpital's rule twice to get

$$E(X) = \frac{1}{n} \lim_{z \to 1} \frac{n(n-1)^2 z^{n-2} - n(n-1)(n-2)z^{n-3}}{2}.$$
Plug in $z = 1$ and simplify to get $E(X) = (n-1)/2$. The variance can be found in the same way: we have

$$f_X''(z) = \frac{(n-1)(n-2)z^n - 2n(n-2)z^{n-1} + n(n-1)z^{n-2} - 2}{n(z-1)^3}$$

and we take the limit as $z \to 1$. We have to apply l'Hôpital's rule three times to get

$$f_X''(1) = \frac{(n-1)(n-2)\,n(n-1)(n-2) - 2n(n-2)\,(n-1)(n-2)(n-3) + n(n-1)\,(n-2)(n-3)(n-4)}{6n}$$

and after much simplification this is $(n-1)(n-2)/3$. Finally

$$\mathrm{Var}(X) = \frac{(n-1)(n-2)}{3} + \frac{n-1}{2} - \left(\frac{n-1}{2}\right)^2 = \frac{n^2-1}{12}.$$

But there are easier ways to get this; see, for example, Pitman, exercise 330.

The convolution formula

One nice thing about generating functions is that they play well with respect to multiplication. In particular, if $X$ and $Y$ are independent, and we have $S = X + Y$, then $f_S(z) = f_X(z) f_Y(z)$. That is, the generating function of the sum is the product of the generating functions.

To prove this fact, we show that the coefficient of $z^k$ in $f_S(z)$ is the same as that in $f_X(z) f_Y(z)$. The coefficient of $z^k$ in $f_X(z) f_Y(z)$ is $\sum_{j=0}^k P(X=j)\,P(Y=k-j)$. But since $X$ and $Y$ are independent, this is $\sum_{j=0}^k P(X=j, Y=k-j)$. Finally, the event $S = k$ can be broken down into disjoint events:

$$\{S=k\} = \{X=0, Y=k\} \cup \{X=1, Y=k-1\} \cup \{X=2, Y=k-2\} \cup \cdots \cup \{X=k, Y=0\}$$

and so $P(S=k)$ is the sum of the probabilities of these subevents. So the coefficients of $z^k$ in $f_S(z)$ and $f_X(z) f_Y(z)$ are the same for all $k$; thus the functions are the same.

Of course this can be extended to a sum of any finite number of random variables. In particular, if $S = X_1 + \cdots + X_n$ where $X_1, \ldots, X_n$ each have the distribution of $X$ and the $X_i$ are independent, then $f_S(z) = f_X(z)^n$.

Binomial distribution. A binomial$(n, p)$ random variable is the sum of $n$ Bernoulli$(p)$ random variables; therefore its generating function is $f_X(z) = (pz+q)^n$, the $n$th power of that of the Bernoulli. So we have

$$f_X'(z) = np(pz+q)^{n-1}, \qquad f_X''(z) = n(n-1)p^2(pz+q)^{n-2}$$

and in particular $f_X'(1) = np$, $f_X''(1) = n(n-1)p^2$. Thus $E(X) = np$ and

$$\mathrm{Var}(X) = n(n-1)p^2 + np - (np)^2 = n^2p^2 - np^2 + np - n^2p^2 = np - np^2 = np(1-p) = npq.$$

Negative binomial distribution. Consider the waiting time until the $r$th success in a series of independent trials, where each trial succeeds with probability $p$. The overall waiting time is the sum of $r$ waiting times, each of which is geometric with parameter $p$. The generating function of this waiting time $T$ is therefore

$$f_T(z) = \left(\frac{pz}{1-qz}\right)^r.$$

We can find the mean and the variance of $T$ from this. A useful trick is to note that $\frac{d}{dz}\left(\log f_T(z)\right) = f_T'(z)/f_T(z)$; this is known as logarithmic differentiation. Therefore we have

$$\log f_T(z) = r\left(\log pz - \log(1-qz)\right)$$

and differentiating gives

$$\frac{f_T'(z)}{f_T(z)} = r\left(\frac{1}{z} + \frac{q}{1-qz}\right).$$

Letting $z = 1$ gives $f_T'(1)/f_T(1) = r\left(1 + \frac{q}{1-q}\right)$; simplifying gives $f_T'(1)/f_T(1) = r/p$. Since $f_T(1) = 1$ we have $f_T'(1) = r/p$, i.e. $E(T) = r/p$. Finding the variance of $T$ is left as an exercise. (That "left as an exercise" should be read as: I'm writing these notes on a Friday afternoon and have done enough calculus for one day.)

Alternative proofs of some facts

When is the sum of two binomials a binomial? I claimed in class that the sum of two independent binomials is only a binomial if they have the same success probability. We can prove this using generating functions. Let $X \sim \mathrm{Bin}(n_1, p_1)$ and $Y \sim \mathrm{Bin}(n_2, p_2)$. Then they have generating functions $(p_1 z + q_1)^{n_1}$ and $(p_2 z + q_2)^{n_2}$, respectively. The sum $S = X + Y$ has generating function

$$f_S(z) = (p_1 z + q_1)^{n_1} (p_2 z + q_2)^{n_2},$$

which is only of the form $(pz+q)^n$ (necessarily with $n = n_1 + n_2$) when $p_1 = p_2$. One way to see this is to note that $f_S(z) = (p_1 z + q_1)^{n_1} (p_2 z + q_2)^{n_2}$ has real zeros at $z = -q_1/p_1, -q_2/p_2$. If $p_1 \ne p_2$ then this means that $f_S$ has two distinct real zeros, while $(pz+q)^n$ only has one.

Means and variances add. Say $X$ and $Y$ are independent random variables with $S = X + Y$. If $X, Y$ have generating functions then we can show that $E(X+Y) = E(X) + E(Y)$, $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ purely by calculus. In particular, we have $f_S(z) = f_X(z) f_Y(z)$ by the convolution formula above. Differentiating once gives

$$f_S'(z) = f_X'(z) f_Y(z) + f_X(z) f_Y'(z)$$

and if we let $z = 1$ we get

$$f_S'(1) = f_X'(1) f_Y(1) + f_X(1) f_Y'(1).$$
But since f X, f Y are probability generating functions, f X () = f Y () = and so we have f S () = f X () + f Y () In terms of expectations this is just E(S) = E(X) + E(Y ) Similarly, we have and at z = this becomes S(z) = X(z)f Y (z) + f X(z)f Y (z) + f X (z) Y (z) S() = X() + f X()f Y () + Y () Combining this with the known expression for f S () we get V ar(s) = X() + f X()f Y () + Y () f X() f Y () (f X() + f Y ()) After some rearrangement this becomes X() + f X() f X() + f Y () + f Y () f Y () and this is clearly V ar(x) + V ar(y ) Another proof of the square root law If S = X + + X n, and all the X i are independent and have the distribution of X, then f S (z) = f X (z) n Differentiating both sides of this identity and substituting gives f S() = nf X () n f X() = nf X() which has the probabilistic interpretation E(S) = ne(x) Differentiating twice gives and letting z = gives Therefore the variance of S is S(z) = n(n )f X (z) n f X(z) + nf X (z) n X(z) S() = n(n )f X() + n X() V ar(s) = S() + f S() f S() = n(n )f X() + n X() + nf X() n f X() After simplifying we get V ar(s) = n( X() + f X() f X() ) = nv ar(x) and taking square roots gives SD(S) = nsd(x), the square root law 6
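The facts in this section can also be checked mechanically: represent each generating function with finite support as its list of coefficients, so that the convolution formula $f_S = f_X f_Y$ becomes ordinary polynomial multiplication. A minimal sketch (the helper names are mine, not from the notes): multiplying $n$ Bernoulli$(p)$ generating functions should reproduce the $\mathrm{Bin}(n,p)$ probabilities, with mean $np$ and variance $npq$.

```python
from math import comb

def multiply(f, g):
    """Coefficients of the product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

p, n = 0.3, 5
bernoulli = [1 - p, p]           # PGF coefficients of q + pz

s = [1.0]                        # the PGF of an empty sum is the constant 1
for _ in range(n):
    s = multiply(s, bernoulli)   # f_S(z) = f_X(z)^n by the convolution formula

# The coefficients of f_S should be the Binomial(n, p) probabilities...
binom = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# ...and the moments read off from f_S should satisfy E(S) = nE(X), Var(S) = nVar(X).
mean = sum(k * c for k, c in enumerate(s))
var = sum(k * k * c for k, c in enumerate(s)) - mean**2
```

Here `mean` works out to $np = 1.5$ and `var` to $npq = 1.05$, consistent with the binomial computation above.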
A couple frivolous results

A power series identity from the negative binomial. Recall that for the waiting time $T_r$ until the $r$th success in independent trials with success probability $p$, we have

$$P(T_r = t) = \binom{t-1}{r-1} p^r q^{t-r}.$$

But we also know that the generating function of $T_r$ is $(pz/(1-qz))^r$. Therefore, the coefficient of $z^t$ in the Taylor expansion of $(pz/(1-qz))^r$ is $\binom{t-1}{r-1} p^r q^{t-r}$. We write this fact as

$$[z^t] \left(\frac{pz}{1-qz}\right)^r = \binom{t-1}{r-1} p^r q^{t-r}$$

where we use $[z^k] f(z)$ to stand for the coefficient of $z^k$ in the Taylor expansion of $f(z)$. Some simple manipulation gives

$$[z^t] \frac{p^r z^r}{(1-qz)^r} = \binom{t-1}{r-1} p^r q^{t-r}, \qquad [z^t] \frac{z^r}{(1-qz)^r} = \binom{t-1}{r-1} q^{t-r}, \qquad [z^{t-r}] \frac{1}{(1-qz)^r} = \binom{t-1}{r-1} q^{t-r}$$

and if we let $t = r + k$, we get

$$[z^k] \frac{1}{(1-qz)^r} = \binom{k+r-1}{r-1} q^k.$$

Finally, summing over $k$, we get

$$\frac{1}{(1-qz)^r} = \sum_{k \ge 0} \binom{k+r-1}{r-1} q^k z^k.$$

For example, we have

$$\frac{1}{(1-qz)^3} = \binom{2}{2} + \binom{3}{2}(qz) + \binom{4}{2}(qz)^2 + \binom{5}{2}(qz)^3 + \cdots = 1 + 3(qz) + 6(qz)^2 + 10(qz)^3 + \cdots$$

Remainders when flipping coins. You flip a coin, which comes up heads with probability $p$, $n$ times. The probability that the number of heads obtained is even is $(f_X(1) + f_X(-1))/2$,
where $X$ is a binomial random variable (this was a homework problem). Since $f_X(z) = (pz+q)^n$, this works out to

$$\frac{(p+q)^n + (-p+q)^n}{2} = \frac{1 + (1-2p)^n}{2}.$$

So for a large number of coin flips (large $n$), this will be close to $1/2$; if $p = 1/2$ this will be exactly $1/2$.

But what if we want to know the probability that we obtain a number of heads which is a multiple of $4$? This is not, in general, $1/4$. For example, if we flip four fair coins the probability of getting a multiple of $4$ heads is $\left(\binom{4}{0} + \binom{4}{4}\right)/2^4 = 2/16$; if we flip five fair coins it's $\left(\binom{5}{0} + \binom{5}{4}\right)/2^5 = 6/32$. We can evaluate $f_X(z)$ at each of the fourth roots of unity to get

$$f_X(1) = p_0 + p_1 + p_2 + p_3 + p_4 + p_5 + \cdots$$
$$f_X(i) = p_0 + ip_1 - p_2 - ip_3 + p_4 + ip_5 - \cdots$$
$$f_X(-1) = p_0 - p_1 + p_2 - p_3 + p_4 - p_5 + \cdots$$
$$f_X(-i) = p_0 - ip_1 - p_2 + ip_3 + p_4 - ip_5 - \cdots$$

where $p_j = P(X = j)$. Adding all four of these together we get

$$f_X(1) + f_X(i) + f_X(-1) + f_X(-i) = 4(p_0 + p_4 + p_8 + \cdots)$$

and all the other $p_j$ cancel out. Therefore we have a formula good for any random variable,

$$P(X \text{ is divisible by } 4) = \frac{f_X(1) + f_X(i) + f_X(-1) + f_X(-i)}{4}$$

and in the case where $X \sim \mathrm{Bin}(n, 1/2)$ this is

$$P(X \text{ is divisible by } 4) = \frac{1 + \left(\frac{1+i}{2}\right)^n + 0^n + \left(\frac{1-i}{2}\right)^n}{4}.$$

Since $\left|\frac{1+i}{2}\right| < 1$ the second and fourth terms in the numerator go away; if we flip a large number of coins the probability that the number of heads is divisible by $4$ goes to $1/4$ as $n \to \infty$.
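The roots-of-unity filter is pleasant to check numerically as well. The sketch below (function names are mine) computes $P(X \text{ divisible by } 4)$ for $X \sim \mathrm{Bin}(n, 1/2)$ both from the four evaluations of $f_X(z) = ((1+z)/2)^n$ and by summing binomial probabilities directly.

```python
from math import comb

def prob_div4_filter(n):
    """P(X divisible by 4) for X ~ Bin(n, 1/2), via f_X at the fourth roots of unity."""
    roots = [1, 1j, -1, -1j]
    return sum(((1 + z) / 2) ** n for z in roots).real / 4

def prob_div4_direct(n):
    """The same probability as an explicit sum of binomial probabilities."""
    return sum(comb(n, k) for k in range(0, n + 1, 4)) / 2**n
```

For $n = 4$ both functions give $2/16 = 1/8$ and for $n = 5$ both give $6/32 = 3/16$, matching the worked examples above, and as $n$ grows the value approaches $1/4$.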