The Bin($n, p$) can be thought of as the distribution of a sum of independent indicator random variables $X_1 + \dots + X_n$, with $\{X_i = 1\}$ denoting a head on the $i$th toss of a coin. The normal approximation to the Binomial works best when the variance $np(1-p)$ is large, for then each of the standardized summands $(X_i - p)/\sqrt{np(1-p)}$ makes a relatively small contribution to the standardized sum. When $n$ is large but $p$ is small, in such a way that $np$ is not too large, a different type of approximation to the Binomial is better.

Definition. A random variable $Y$ is said to have a Poisson distribution with parameter $\lambda$ if it can take values in $\mathbb{N}_0$, the set of nonnegative integers, with probabilities

$$P\{Y = k\} = \frac{e^{-\lambda}\lambda^k}{k!} \qquad\text{for } k = 0, 1, 2, \dots$$

The parameter $\lambda$ must be positive. The distribution is denoted by Poisson($\lambda$).

Throughout this Chapter I will use $Q_\lambda$ to denote the Poisson($\lambda$) distribution. That is, $Q_\lambda$ is a probability distribution concentrated on $\mathbb{N}_0$ for which

$$Q_\lambda\{k\} = \frac{e^{-\lambda}\lambda^k}{k!} \qquad\text{for } k = 0, 1, 2, \dots$$

Example <8.1>: Poisson($np$) approximation to the Binomial($n, p$)

The Poisson inherits several properties from the Binomial. For example, the Bin($n, p$) has expected value $np$ and variance $np(1-p)$. One might suspect that the Poisson($\lambda$) should therefore have expected value $\lambda = n(\lambda/n)$ and variance $\lambda = \lim_{n\to\infty} n(\lambda/n)(1 - \lambda/n)$. Also, the coin-tossing origins of the Binomial show that if $X$ has a Bin($m, p$) distribution and $Y$ has a Bin($n, p$) distribution independent of $X$, then $X + Y$ has a Bin($n + m, p$) distribution. Putting $\lambda = mp$ and $\mu = np$ one would then suspect that the sum of independent Poisson($\lambda$) and Poisson($\mu$) distributed random variables is Poisson($\lambda + \mu$) distributed.

Example <8.2>: If $X$ has a Poisson($\lambda$) distribution, then $EX = \mathrm{var}(X) = \lambda$. If also $Y$ has a Poisson($\mu$) distribution, and $Y$ is independent of $X$, then $X + Y$ has a Poisson($\lambda + \mu$) distribution.

Counts of rare events (such as the number of atoms undergoing radioactive decay during a short period of time, or the number of aphids on a leaf) are often modelled by Poisson distributions, at least as a first approximation.
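The claim that the Poisson($np$) approximates the Bin($n, p$) when $n$ is large and $p$ is small can be checked numerically. The following is only a sketch: the function names and the values $n = 1000$, $p = 0.002$ are my own illustrative choices, not part of the notes.

```python
from math import comb, exp, factorial


def binomial_pmf(n, p, k):
    """P{X = k} for X distributed Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(lam, k):
    """P{Y = k} for Y distributed Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Illustrative values (an assumption): n large, p small, np = 2.
n, p = 1000, 0.002
for k in range(6):
    # Print k, the Binomial probability, and the Poisson probability.
    print(k, round(binomial_pmf(n, p, k), 5), round(poisson_pmf(n * p, k), 5))
```

The two columns of probabilities should agree to about three decimal places for these values.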
Statistics 241: 8 October 2005 C8-1 © David Pollard

In some situations it makes sense to think of the counts as the number of successes in a large number of independent trials, with the chance of a success on any particular trial being very small ("rare events"). In such a setting, the Poisson arises as an approximation for a sum of independent counts. In fact, modern probability methods can handle situations much more general than approximation to the Binomial. For example, suppose $S = X_1 + X_2 + \dots + X_n$, where $X_i$ has a Bin($1, p_i$) distribution, for constants $p_1, p_2, \dots, p_n$ that are not necessarily all the same. Suppose the $X_i$'s are independent. If the $p_i$'s are not all the same then $S$ does not have a Binomial distribution. Nevertheless, the Chen-Stein method (see Barbour, Holst & Janson 1992 for an extensive discussion of the method) can be used to show that

$$\max_A \bigl| P\{S \in A\} - Q_\lambda(A) \bigr| \le \frac{1 - e^{-\lambda}}{\lambda} \sum_{i=1}^n p_i^2 \qquad\text{where } \lambda = p_1 + \dots + p_n.$$

The method of proof is elementary, in the sense that it makes use of probabilistic techniques at the level of Statistics 241, but extremely subtle.

Remark. The maximum here runs over all subsets $A$ of $\mathbb{N}_0$. In fact the maximum is achieved when $A = \{k \in \mathbb{N}_0 : P\{S = k\} \ge Q_\lambda\{k\}\}$, in which case $P\{S \in A\} - Q_\lambda(A)$ equals

$$\frac12 \sum_{k=0}^\infty \bigl| P\{S = k\} - Q_\lambda\{k\} \bigr|.$$

This quantity is called the total variation distance between $Q_\lambda$ and the distribution of $S$; it gives a very strong control over the errors in the approximation.

Note also that $(1 - e^{-\lambda})/\lambda \le \min(1, 1/\lambda)$. Indeed, the left-hand side is close to 1 when $\lambda \approx 0$ and it behaves like $1/\lambda$ when $\lambda$ is large. When all the $p_i$ are equal to some small $p$, so that $\lambda = np$ and $\sum_i p_i^2 = np^2$, we get a bound on the total variation distance between the Binomial($n, p$) and the Poisson($np$) smaller than $\min(p, np^2)$. This bound makes precise the traditional advice that the Poisson approximation is good "when $p$ is small and $np$ is not too big". (In fact, the tradition was a bit conservative.)

The Poisson approximation also applies in many settings where the trials are "almost independent", but not quite. Again the Chen-Stein method delivers impressively good bounds on the errors of approximation. For example, the method works well in two cases where the dependence takes a simple form.

Once again suppose $S = X_1 + X_2 + \dots + X_n$, where $X_i$ has a Bin($1, p_i$) distribution, for constants $p_1, p_2, \dots, p_n$ that are not necessarily all the same. Define $S_{-i} = S - X_i = \sum_{j : j \ne i} X_j$.
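For independent indicators with unequal $p_i$, both the total variation distance and the Chen-Stein bound $\frac{1 - e^{-\lambda}}{\lambda}\sum_i p_i^2$ can be computed exactly for small $n$. The sketch below does so by direct convolution. The helper names and the particular $p_i$ values are hypothetical, chosen only for illustration.

```python
from math import exp


def sum_of_indicators_pmf(probs):
    """Exact pmf of S = X_1 + ... + X_n for independent X_i ~ Bin(1, p_i)."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)   # contribution from X_i = 0
            new[k + 1] += mass * p     # contribution from X_i = 1
        pmf = new
    return pmf


def poisson_pmf_list(lam, size):
    """First `size` Poisson(lam) probabilities, via Q{k} = Q{k-1} * lam / k."""
    out = [exp(-lam)]
    for k in range(1, size):
        out.append(out[-1] * lam / k)
    return out


def total_variation_to_poisson(probs):
    """(1/2) sum_k |P{S = k} - Q_lam{k}|, counting the Poisson mass beyond n."""
    lam = sum(probs)
    s_pmf = sum_of_indicators_pmf(probs)
    q = poisson_pmf_list(lam, len(s_pmf))
    tail = max(0.0, 1.0 - sum(q))      # Poisson mass on {n+1, n+2, ...}
    return 0.5 * (sum(abs(a - b) for a, b in zip(s_pmf, q)) + tail)


def chen_stein_bound(probs):
    """The bound (1 - e^{-lam}) / lam * sum p_i^2 for independent indicators."""
    lam = sum(probs)
    return (1 - exp(-lam)) / lam * sum(p * p for p in probs)


probs = [0.01 * (i % 5 + 1) for i in range(40)]   # hypothetical p_i values
print(total_variation_to_poisson(probs), chen_stein_bound(probs))
```

The first printed number (the exact distance) should not exceed the second (the bound).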
The random variables $X_1, \dots, X_n$ are said to be positively associated¹ if

$$P\{S_{-i} \ge k \mid X_i = 1\} \ge P\{S_{-i} \ge k \mid X_i = 0\} \qquad\text{for each $i$ and each } k \in \mathbb{N}_0;$$

they are said to be negatively associated² if

$$P\{S_{-i} \ge k \mid X_i = 1\} \le P\{S_{-i} \ge k \mid X_i = 0\} \qquad\text{for each $i$ and each } k \in \mathbb{N}_0.$$

With some work it can be shown that

$$\max_A \bigl| P\{S \in A\} - Q_\lambda(A) \bigr| \le \frac{1 - e^{-\lambda}}{\lambda} \times
\begin{cases}
2\sum_{i=1}^n p_i^2 + \bigl(\mathrm{var}(S) - \lambda\bigr) & \text{under positive association} \\
\lambda - \mathrm{var}(S) & \text{under negative association.}
\end{cases}$$

These bounds take advantage of the fact that $\mathrm{var}(S)$ would be exactly equal to $\lambda$ if $S$ had a Poisson($\lambda$) distribution.

The next Example illustrates both the classical approach and the Chen-Stein approach (via positive association) to deriving a Poisson approximation for a matching problem.

Example <8.3>: Poisson approximation for a matching problem: assignment of $n$ letters at random to $n$ envelopes, one per envelope.

The Appendix to this Chapter provides a more detailed introduction to the Chen-Stein method, as applied to another aspect of the matching problem. (I have taken advantage of a few special features of the matching problem to simplify the exposition.) You could safely skip this Appendix. For more details, see the monograph by Barbour et al. (1992).

¹ Not standard terminology.
² Not standard terminology.
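Examples for Chapter 8

<8.1> Example. The Poisson($\lambda$) appears as an approximation to the Bin($n, p$) when $n$ is large, $p$ is small, and $\lambda = np$:

$$
\begin{aligned}
\binom{n}{k} p^k (1-p)^{n-k}
&= \frac{n(n-1)\cdots(n-k+1)}{k!} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \\
&\approx \frac{\lambda^k}{k!} \left(1 - \frac{\lambda}{n}\right)^{n} \qquad\text{if $k$ is small relative to $n$} \\
&\approx \frac{\lambda^k}{k!}\, e^{-\lambda} \qquad\text{if $n$ is large.}
\end{aligned}
$$

The final $e^{-\lambda}$ comes from an approximation to the logarithm,

$$\log\left(1 - \frac{\lambda}{n}\right)^n = n \log\left(1 - \frac{\lambda}{n}\right) = n\left(-\frac{\lambda}{n} - \frac12\left(\frac{\lambda}{n}\right)^2 - \dots\right) \approx -\lambda \qquad\text{if } \lambda/n \approx 0.$$

<8.2> Example. Verify the properties of the Poisson distribution suggested by the Binomial analogy: If $X$ has a Poisson($\lambda$) distribution, show that
(i) $EX = \lambda$
(ii) $\mathrm{var}(X) = \lambda$
Also, if $Y$ has a Poisson($\mu$) distribution independent of $X$, show that
(iii) $X + Y$ has a Poisson($\lambda + \mu$) distribution

Solution: Assertion (i) comes from a routine application of the formula for the expectation of a random variable with a discrete distribution.

$$
\begin{aligned}
EX = \sum_{k=0}^\infty k\, P\{X = k\}
&= \sum_{k=1}^\infty k\, \frac{e^{-\lambda}\lambda^k}{k!} \qquad\text{What happens to $k = 0$?} \\
&= e^{-\lambda} \lambda \sum_{k-1=0}^\infty \frac{\lambda^{k-1}}{(k-1)!} \\
&= e^{-\lambda} \lambda e^{\lambda} \\
&= \lambda.
\end{aligned}
$$

Notice how the $k$ cancelled out one factor from the $k!$ in the denominator.

If we were to calculate $E(X^2)$ in the same way, one factor in the $k^2$ would cancel the leading $k$ from the $k!$, but would leave an unpleasant $k/(k-1)!$ in the sum. Too bad the $k^2$ cannot be replaced by $k(k-1)$. Well, why not?

$$
\begin{aligned}
E(X^2 - X) &= \sum_{k=0}^\infty k(k-1)\, P\{X = k\} \\
&= e^{-\lambda} \sum_{k=2}^\infty k(k-1)\, \frac{\lambda^k}{k!} \qquad\text{What happens to $k = 0$ and $k = 1$?} \\
&= e^{-\lambda} \lambda^2 \sum_{k-2=0}^\infty \frac{\lambda^{k-2}}{(k-2)!} \\
&= \lambda^2.
\end{aligned}
$$

Now calculate the variance.

$$\mathrm{var}(X) = E(X^2) - (EX)^2 = E(X^2 - X) + EX - (EX)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.$$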
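The three quantities in the calculation above ($EX$, $E(X^2 - X)$, and $\mathrm{var}(X)$) can also be checked by truncating the infinite sums. This is a rough numerical sketch, not a proof. The function name, the truncation point, and the value $\lambda = 3.5$ are assumptions made for illustration.

```python
from math import exp


def poisson_moments(lam, terms=200):
    """Return (EX, E(X^2 - X), var X) for Poisson(lam), from truncated sums."""
    prob = exp(-lam)            # P{X = 0}
    mean = 0.0
    factorial_moment = 0.0      # running value of E X(X - 1)
    for k in range(1, terms):
        prob *= lam / k         # P{X = k} obtained from P{X = k - 1}
        mean += k * prob
        factorial_moment += k * (k - 1) * prob
    # var X = E(X^2 - X) + EX - (EX)^2, as in the text.
    variance = factorial_moment + mean - mean**2
    return mean, factorial_moment, variance


print(poisson_moments(3.5))
```

Up to rounding error, the output should be close to $(\lambda, \lambda^2, \lambda)$.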
For assertion (iii), first note that $X + Y$ can take only values $0, 1, 2, \dots$ For a fixed $k$ in this range, decompose the event $\{X + Y = k\}$ into disjoint pieces whose probabilities can be simplified by means of the independence between $X$ and $Y$.

$$
\begin{aligned}
P\{X + Y = k\}
&= P\{X = 0, Y = k\} + P\{X = 1, Y = k-1\} + \dots + P\{X = k, Y = 0\} \\
&= P\{X = 0\}P\{Y = k\} + P\{X = 1\}P\{Y = k-1\} + \dots + P\{X = k\}P\{Y = 0\} \\
&= \frac{e^{-\lambda}\lambda^0}{0!}\,\frac{e^{-\mu}\mu^k}{k!} + \dots + \frac{e^{-\lambda}\lambda^k}{k!}\,\frac{e^{-\mu}\mu^0}{0!} \\
&= \frac{e^{-\lambda-\mu}}{k!} \left( \frac{k!}{0!\,k!}\lambda^0\mu^k + \frac{k!}{1!\,(k-1)!}\lambda^1\mu^{k-1} + \dots + \frac{k!}{k!\,0!}\lambda^k\mu^0 \right) \\
&= \frac{e^{-\lambda-\mu}}{k!} (\lambda + \mu)^k.
\end{aligned}
$$

The bracketed sum in the second last line is just the binomial expansion of $(\lambda + \mu)^k$.

Question: How do you interpret the notation in the last calculation when $k = 0$? I always feel slightly awkward about a contribution from $k - 1$ if $k = 0$.

<8.3> Example. Suppose $n$ letters are placed at random into $n$ envelopes, one letter per envelope. The total number of correct matches, $S$, can be written as a sum $X_1 + \dots + X_n$ of indicators,

$$X_i = \begin{cases} 1 & \text{if letter $i$ is placed in envelope $i$,} \\ 0 & \text{otherwise.} \end{cases}$$

The $X_i$ are dependent on each other. For example, symmetry implies that

$$p_i = P\{X_i = 1\} = 1/n \qquad\text{for each } i$$

and

$$P\{X_i = 1 \mid X_1 = X_2 = \dots = X_{i-1} = 1\} = \frac{1}{n - i + 1}.$$

We could eliminate the dependence by relaxing the requirement of only one letter per envelope. The number of letters placed in the correct envelope (possibly together with other, incorrect letters) would then have a Bin($n, 1/n$) distribution, which is approximated by Poisson(1) if $n$ is large.

We can get some supporting evidence for $S$ having something close to a Poisson(1) distribution under the original assumption (one letter per envelope) by calculating some moments.

$$ES = \sum_{i \le n} EX_i = n\, P\{X_i = 1\} = 1$$

and

$$
\begin{aligned}
ES^2 &= E\Bigl( X_1^2 + \dots + X_n^2 + 2\sum_{i<j} X_i X_j \Bigr) \\
&= n\, EX_1^2 + 2\binom{n}{2} EX_1 X_2 \qquad\text{by symmetry} \\
&= n\, P\{X_1 = 1\} + (n^2 - n)\, P\{X_1 = 1, X_2 = 1\} \\
&= n\left(\frac{1}{n}\right) + (n^2 - n)\left(\frac{1}{n(n-1)}\right) \\
&= 2.
\end{aligned}
$$

Thus $\mathrm{var}(S) = ES^2 - (ES)^2 = 1$. Compare with Example <8.2>, which gives $EY = 1$ and $\mathrm{var}(Y) = 1$ for a $Y$ distributed Poisson(1).
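For small $n$ the moment calculations for the matching problem can be confirmed by brute force, listing every way of putting the letters into the envelopes. The sketch below uses exact fractions. The function name and the choice $n = 6$ are mine, purely for illustration.

```python
from fractions import Fraction
from itertools import permutations


def match_count_distribution(n):
    """Exact pmf of S, the number of letters that land in their own envelope."""
    counts = [0] * (n + 1)
    total = 0
    for perm in permutations(range(n)):
        # perm[envelope] is the letter placed in that envelope.
        matches = sum(1 for envelope, letter in enumerate(perm) if envelope == letter)
        counts[matches] += 1
        total += 1
    return [Fraction(c, total) for c in counts]


pmf = match_count_distribution(6)
mean = sum(k * q for k, q in enumerate(pmf))
second_moment = sum(k * k * q for k, q in enumerate(pmf))
print(mean, second_moment, second_moment - mean**2)   # prints: 1 2 1
```

Enumeration costs $n!$ steps, so this is only practical for small $n$.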
Using the method of inclusion and exclusion, it is possible (Feller 1968, Chapter 4) to calculate the exact distribution of the number of correct matches,

<8.4>
$$P\{S = k\} = \frac{1}{k!}\left(1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \dots \pm \frac{1}{(n-k)!}\right) \qquad\text{for } k = 0, 1, \dots, n.$$

For fixed $k$, as $n \to \infty$ the probability converges to

$$\frac{1}{k!}\left(1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \dots\right) = \frac{e^{-1}}{k!} = Q_1\{k\},$$

which is the probability that $Y = k$ if $Y$ has a Poisson(1) distribution.

The Chen-Stein method is also effective in this problem. I claim that it is intuitively clear (although a rigorous proof might be tricky) that the $X_i$'s are positively associated:

$$P\{S_{-i} \ge k \mid X_i = 1\} \ge P\{S_{-i} \ge k \mid X_i = 0\} \qquad\text{for each $i$ and each } k \in \mathbb{N}_0.$$

I feel that if $X_i = 1$, then it is more likely for the other letters to find their matching envelopes than if $X_i = 0$, which makes things harder by filling one of the envelopes with the incorrect letter $i$. With $\lambda = 1$, $p_i = 1/n$, and $\mathrm{var}(S) = 1$, we therefore have

$$\max_A \bigl| P\{S \in A\} - Q_1(A) \bigr| \le 2\sum_{i=1}^n p_i^2 + \mathrm{var}(S) - 1 = 2/n.$$

As $n$ gets large, the distribution of $S$ does get close to the Poisson(1) in the strong, total variation sense. However, it is possible (see Barbour et al. 1992, page 73) to get a better bound by working directly from <8.4>.

References

Barbour, A. D., Holst, L. & Janson, S. (1992), Poisson Approximation, Oxford University Press.
Feller, W. (1968), An Introduction to Probability Theory and Its Applications, Vol. 1, third edn, Wiley, New York.

Appendix: The Chen-Stein method for the matching problem

You might actually find the argument leading to the final bound of Example <8.3> more enlightening than the condensed exposition that follows. In any case, you can safely stop reading this chapter right now without suffering major probabilistic deprivation.

You were warned.

Consider once more the matching problem described in Example <8.3>. Use the Chen-Stein method to establish the approximation

$$P\{S = k\} \approx \frac{e^{-1}}{k!} \qquad\text{for } k = 0, 1, 2, \dots$$

The starting point is a curious connection between the Poisson(1) and the function $g(\cdot)$ defined by $g(0) = 0$ and

$$g(j) = \int_0^1 e^{-t} t^{j-1}\, dt \qquad\text{for } j = 1, 2, \dots$$

Notice that $0 \le g(j) \le 1$ for all $j$. Also, integration by parts shows that

$$g(j+1) = j\,g(j) - e^{-1} \qquad\text{for } j = 1, 2, \dots$$

and direct calculation gives

$$g(1) = 1 - e^{-1}.$$

More succinctly,

<8.5>
$$g(j+1) - j\,g(j) = 1\{j = 0\} - e^{-1} \qquad\text{for } j = 0, 1, \dots$$
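Identity <8.5> is easy to test numerically. The sketch below approximates $g(j)$ by Simpson's rule and compares the two sides of <8.5> for a few values of $j$. The function name and the number of quadrature steps are assumptions of mine; any reasonable integration rule would serve.

```python
from math import exp


def g_by_quadrature(j, steps=2000):
    """Approximate g(j) = integral over [0, 1] of e^{-t} t^{j-1} dt (Simpson's rule)."""
    if j == 0:
        return 0.0                       # g(0) = 0 by definition

    def integrand(t):
        return exp(-t) * t ** (j - 1)

    h = 1.0 / steps                      # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


for j in range(6):
    lhs = g_by_quadrature(j + 1) - j * g_by_quadrature(j)
    rhs = (1.0 if j == 0 else 0.0) - exp(-1)
    print(j, lhs, rhs)
```

The last two columns should agree up to the quadrature error.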
Actually the definition of $g(0)$ has no effect on the validity of the assertion when $j = 0$; you could give $g(0)$ any value you liked.

Suppose $Y$ has a Poisson(1) distribution. Substitute $Y$ for $j$ in <8.5>, then take expectations to get

$$E\bigl(g(Y+1) - Y g(Y)\bigr) = E\,1\{Y = 0\} - e^{-1} = P\{Y = 0\} - e^{-1} = 0.$$

A similar calculation with $S$ in place of $Y$ gives

<8.6>
$$P\{S = 0\} - e^{-1} = E\bigl(g(S+1) - S g(S)\bigr).$$

If we can show that the right-hand side is close to zero then we will have

$$P\{S = 0\} \approx e^{-1},$$

which is the desired Poisson approximation for $P\{S = k\}$ when $k = 0$. A simple symmetry argument will then give the approximation for other $k$ values.

There is a beautiful probabilistic trick for approximating the right-hand side of <8.6>. Write the $E S g(S)$ contribution as

<8.7>
$$E S g(S) = E \sum_{i=1}^n X_i g(S) = \sum_{i=1}^n E X_i g(S) = n\, E X_1 g(S).$$

The trick consists of a special two-step method for allocating letters at random to envelopes, which initially gives letter 1 a special role.

(1) Put letter 1 in envelope 1, then allocate letters $2, \dots, n$ to envelopes $2, \dots, n$ in random order, one letter per envelope. Write $1 + Z$ for the total number of matches of letters to correct envelopes. (The 1 comes from the forced matching of letter 1 and envelope 1.) Write $Y_j$ for the letter that goes into envelope $j$. Notice that $EZ = 1$, as shown in Example <8.3>.

(2) Choose an envelope $R$ at random (probability $1/n$ for each envelope), then swap letter 1 with the letter in the chosen envelope.

Notice that $X_1$ is independent of $Z$, because of step 2. Indeed,

$$P\{X_1 = 1 \mid Z = k\} = P\{R = 1 \mid Z = k\} = 1/n \qquad\text{for each } k.$$

Notice also that

$$S = \begin{cases}
1 + Z & \text{if } R = 1 \\
Z - 1 & \text{if } R \ge 2 \text{ and } Y_R = R \\
Z & \text{if } R \ge 2 \text{ and } Y_R \ne R.
\end{cases}$$

Thus

$$
\begin{aligned}
P\{S \ne Z \mid Z = k\} &= P\{R = 1\} + \sum_{j \ge 2} P\{R = j,\ Y_j = j \mid Z = k\} \\
&= \frac{1}{n} + \frac{1}{n} \sum_{j \ge 2} P\{Y_j = j \mid Z = k\} \\
&= \frac{k + 1}{n}
\end{aligned}
$$

and

$$P\{S \ne Z\} = \sum_k \frac{k+1}{n}\, P\{Z = k\} = \frac{EZ + 1}{n} = \frac{2}{n}.$$

That is, the construction gives $S = Z$ with high probability.

From the fact that when $X_1 = 1$ (that is, $R = 1$) we have $S = Z + 1$, deduce that

<8.8>
$$X_1 g(S) = X_1 g(1 + Z).$$
The same equality holds trivially when $X_1 = 0$. Take expectations. Then argue that

$$
\begin{aligned}
E S g(S) &= n\, E X_1 g(S) && \text{by <8.7>} \\
&= n\, E X_1 g(1 + Z) && \text{by <8.8>} \\
&= n\, E X_1\, E g(1 + Z) && \text{by independence of $X_1$ and $Z$} \\
&= E g(1 + Z) && \text{because } EX_1 = 1/n.
\end{aligned}
$$

Thus the right-hand side of <8.6> equals $E\bigl(g(S+1) - g(Z+1)\bigr)$. On the event $\{S = Z\}$ the two terms cancel; on the event $\{S \ne Z\}$, the difference $g(S+1) - g(Z+1)$ lies between $\pm 1$ because $0 \le g(j) \le 1$ for $j = 1, 2, \dots$ Combining these two contributions, we get

$$\bigl| E\bigl(g(S+1) - g(Z+1)\bigr) \bigr| \le 1 \times P\{S \ne Z\} \le \frac{2}{n}$$

and

<8.9>
$$\bigl| P\{S = 0\} - e^{-1} \bigr| = \bigl| E\bigl(g(S+1) - S g(S)\bigr) \bigr| \le 2/n.$$

The exact expression for $P\{S = 0\}$ from <8.4> shows that $2/n$ greatly overestimates the error of approximation, but at least it tends to zero as $n$ gets large.

After all that work to justify the Poisson approximation to $P\{S = k\}$ for $k = 0$, you might be forgiven for shrinking from the prospect of extending the approximation to larger $k$. Fear not! The worst is over.

For $k = 1, 2, \dots$ the event $\{S = k\}$ specifies exactly $k$ matches. There are $\binom{n}{k}$ choices for the matching envelopes. By symmetry, the probability of matches only in a particular set of $k$ envelopes is the same for each specific choice of the set of $k$ envelopes. It follows that

$$P\{S = k\} = \binom{n}{k}\, P\{\text{envelopes } 1, \dots, k \text{ match; the rest don't}\}.$$

The probability of getting matches in envelopes $1, \dots, k$ equals

$$\frac{1}{n(n-1)\cdots(n-k+1)}.$$

The conditional probability

$$P\{\text{envelopes } k+1, \dots, n \text{ don't match} \mid \text{envelopes } 1, \dots, k \text{ match}\}$$

is equal to the probability of zero matches when $n - k$ letters are placed at random into their envelopes. If $n$ is much larger than $k$, this probability is close to $e^{-1}$, as shown above. Thus

$$P\{S = k\} \approx \frac{n!}{k!\,(n-k)!} \cdot \frac{1}{n(n-1)(n-2)\cdots(n-k+1)}\, e^{-1} = \frac{e^{-1}}{k!}.$$

More formally, for each fixed $k$,

$$P\{S = k\} \to \frac{e^{-1}}{k!} = P\{Y = k\} \qquad\text{as } n \to \infty,$$

where $Y$ has the Poisson(1) distribution.
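To see how conservative the $2/n$ bound in <8.9> is, one can evaluate the exact formula <8.4> and compare it with the Poisson(1) probabilities. The following is a sketch only. The function name and the values of $n$ are illustrative choices of mine.

```python
from fractions import Fraction
from math import exp, factorial


def matching_pmf(n, k):
    """P{S = k} from formula <8.4>, returned as an exact fraction."""
    alternating = sum(Fraction((-1) ** j, factorial(j)) for j in range(n - k + 1))
    return alternating / factorial(k)


for n in (5, 10, 20):
    error_at_zero = abs(float(matching_pmf(n, 0)) - exp(-1))
    # Print n, the actual error |P{S = 0} - e^{-1}|, and the bound 2/n.
    print(n, error_at_zero, 2 / n)
```

The actual error should shrink far faster than $2/n$, which is consistent with the remark that the bound greatly overestimates the error.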