Recursive Computations for Discrete Random Variables

Oftentimes, one sees a problem stated like this:

A machine is shut down for repairs if a random sample of 100 items selected from the daily output of the machine reveals at least 15% defectives. If on a given day the machine is producing only 10% defectives, what is the probability that it will be shut down?

Although there is at least some level of approximation involved, it would be appropriate to consider this an instance of the binomial distribution with $n = 100$ and $p = 0.1$, and we are asked for the probability that $Y \ge 15$, which is a cumulative (or reverse cumulative) probability question.

In fact, I found this problem in a later section of our textbook, a section devoted to approximating the binomial distribution by the normal distribution. We're expected to work out how many standard deviations away from the mean that is and then look up the approximate cumulative probability on a table of the normal distribution.

Another possible way to approach this is to find a table of cumulative probabilities for binomial distributions. As it happens, the endpapers of our textbook include a normal probability table and a number of tables of binomial distributions.

But that isn't really the best way to do this. We can't hope to have enough tables of binomial distributions to cover all of the situations that we might encounter, and we often don't want to tie ourselves down to a bulky collection of tables. And while the normal approximation can be quite useful, it is still an approximation, and it becomes increasingly troublesome as an approximation the further we move away from the mean.

For modest values of $n$, and with modest computational means available to us, we should be answering such questions directly by computing the probability function $p(y) = P(Y = y)$ for all of the values of $y$ from 0 to $n$, and then adding up the ones that we need. The most efficient computation will be recursive, and to compute the binomial probabilities, we don't ever have to directly compute a binomial coefficient.
Once we get started, we can write $P(Y = y)$ in terms of $P(Y = y - 1)$ by multiplying by two numbers and dividing by two numbers. Given that, we can quickly and efficiently compute the numbers we need.

The recursive procedures I'm describing can be used with a simple hand calculator and a piece of paper on which we write down our results. But a spreadsheet program (for instance, Excel) has a particular affinity for recursive computations, and this description has been written with a spreadsheet in mind, and is accompanied by an Excel file that executes these ideas. Of course, the same ideas can be executed in a programming language or in an environment such as MATLAB. I have not pursued those lines but the reader is certainly free to do so.

Also, once we have found a way to compute the binomial distribution, we see that we can use the exact same ideas for the other named distributions in the current chapter of our textbook: the Poisson distribution, the hypergeometric distribution, the geometric distribution, and the negative binomial distribution.

Postscript to the problem stated in the first paragraph: using the normal approximation to the binomial and a table of normal probabilities gives us a probability that the machine will be shut down of about 0.0668. The tables of binomial probabilities in our book don't include any $n = 100$ cases, so we can't use those. But we can use the recursive computation given in these notes (using the accompanying Excel file) to find that for $n = 100$, $p = 0.1$, we get $P(Y \le 14) \approx 0.927427$, which by complementation gives us $P(Y \ge 15) \approx 0.072573$. This is the accurate computation and the normal approximation is less accurate.

1. The binomial distribution

The binomial distribution with $n$ trials and with $p$ the probability of success on any one trial has probability function $P(Y = y) = \binom{n}{y} p^y (1 - p)^{n - y}$. We will write $q = 1 - p$ and shorten that to $P(Y = y) = \binom{n}{y} p^y q^{n - y}$.

We need a computation concerning binomial coefficients that we will use both here and in later sections:

$$\binom{n}{y} = \frac{n - y + 1}{y} \binom{n}{y - 1} \tag{1}$$

We can write $\binom{n}{y} = \dfrac{n(n - 1)(n - 2) \cdots (n - y + 1)}{1 \cdot 2 \cdots (y - 1) \cdot y}$. Leaving off the last factor in both numerator and denominator will give us $\binom{n}{y - 1}$, hence equation (1).

That gives us our recursive step. For $y \ge 1$,

$$\frac{P(Y = y)}{P(Y = y - 1)} = \frac{\binom{n}{y} p^y q^{n - y}}{\binom{n}{y - 1} p^{y - 1} q^{n - y + 1}} = \frac{n - y + 1}{y} \cdot \frac{p}{q}$$

That's enough for the recursive scheme:

Binomial distribution initiation:

$$P(Y = 0) = q^n \tag{2}$$

Binomial distribution recursion: For $y \ge 1$,

$$P(Y = y) = P(Y = y - 1) \cdot \frac{n - y + 1}{y} \cdot \frac{p}{q} \tag{3}$$

If we continue this scheme too far, which is to say for $y > n$, no harm comes of it. Note the numerator of the fraction in the recursive step, which gives $P(Y = n + 1) = 0$ and thus $P(Y = y) = 0$ by recursion for all $y > n$. Wasteful, perhaps, but it doesn't cause any trouble.

We can compute the cumulative distribution function $P(Y \le y)$ just by keeping a running sum of the probabilities we are computing.

Are there limitations to this? Yes, there are some practical limitations. One is simply imposed by storage space. If we try to do these computations in a spreadsheet in the particular way I set it up, then we need a spreadsheet file with at least $n$ rows. We can probably manage a few thousand rows, but beyond that we're straining practical file sizes.

A more serious limitation is numerical, and arises in the initiation step. For large $n$, $q^n$ can be a very small number. The danger is that $q^n$ could underflow, which means that it could become too small to be represented in the floating point number system implemented in the particular calculator, spreadsheet, or programming language that we are using. For the numbers as implemented in Excel (and in many other places: look up the IEEE 754 standard, binary64, often called double precision, if you want more information), if $p = q = \frac{1}{2}$, then $q^n$ won't underflow until $n$ gets to be a little greater than 1000. If $p > \frac{1}{2}$, then $q^n$ would underflow for smaller values of $n$. But we can get around that by interchanging the roles of $p$ and $q$, redefining success as failure and failure as success. So in most cases, we can make this work for $n < 1000$.
We could use larger $n$ with small enough $p$, but eventually, as $p$ gets very small, a different numerical problem arises: lack of accuracy (loss of significant digits) in computing $q = 1 - p$, leading to lack of accuracy in computing $q^n$. If that should happen to us, we would best be advised to approximate the resulting binomial random variable by a Poisson random variable with the same mean.
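
For readers who prefer a programming language to a spreadsheet, the binomial initiation and recursion can be sketched in a few lines of Python. This is only an illustration and not the accompanying Excel file: the function name `binomial_pmf_table` is my own invention, and the continuity-corrected $z = (14.5 - np)/\sqrt{npq}$ is an assumption about how the textbook's 0.0668 arises. The sketch applies the scheme to the machine shutdown problem from the introduction.

```python
import math


def binomial_pmf_table(n, p):
    """Return [P(Y=0), ..., P(Y=n)] for a binomial(n, p) variable.

    Hypothetical helper: start from q**n and repeatedly multiply by
    (n - y + 1) / y * p / q, so no binomial coefficient is ever formed.
    Assumes 0 < p < 1 so that q is nonzero.
    """
    q = 1.0 - p
    probs = [q ** n]  # initiation: P(Y = 0) = q^n
    for y in range(1, n + 1):
        # recursion: P(Y = y) = P(Y = y - 1) * (n - y + 1)/y * p/q
        probs.append(probs[-1] * (n - y + 1) / y * p / q)
    return probs


n, p = 100, 0.1
probs = binomial_pmf_table(n, p)

# Running sum of the probability function gives the cumulative distribution.
cdf_14 = sum(probs[:15])      # P(Y <= 14)
shutdown = 1.0 - cdf_14       # P(Y >= 15), by complementation

# Normal approximation for comparison (continuity correction is an assumption).
z = (14.5 - n * p) / math.sqrt(n * p * (1.0 - p))
normal_tail = 0.5 * math.erfc(z / math.sqrt(2.0))

print(f"P(Y <= 14) = {cdf_14:.6f}")
print(f"P(Y >= 15) = {shutdown:.6f}")
print(f"normal approximation = {normal_tail:.4f}")
```

Run as written, the two exact values should agree with the 0.927427 and 0.072573 quoted in the postscript, and the normal approximation should land near 0.0668.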

2. The Poisson distribution

A Poisson random variable with mean $\lambda$ has probability function $P(Y = y) = \dfrac{e^{-\lambda} \lambda^y}{y!}$. This leads to a particularly simple form for the recursion: For $y \ge 1$,

$$\frac{P(Y = y)}{P(Y = y - 1)} = \frac{e^{-\lambda} \lambda^y / y!}{e^{-\lambda} \lambda^{y - 1} / (y - 1)!} = \frac{\lambda}{y}$$

We take advantage of this as follows:

Poisson distribution initiation:

$$P(Y = 0) = e^{-\lambda} \tag{4}$$

Poisson distribution recursion: For $y \ge 1$,

$$P(Y = y) = P(Y = y - 1) \cdot \frac{\lambda}{y} \tag{5}$$

One of the paths to the Poisson distribution is as the limit of a family of binomial distributions as $n \to \infty$ and $p \to 0$ in such a way that $np = \lambda$ remains constant. For the binomial distribution, we can write the quotient of $P(Y = y)$ and $P(Y = y - 1)$ as follows:

$$\frac{n - y + 1}{y} \cdot \frac{p}{q} = \frac{np - p(y - 1)}{y(1 - p)}$$

But as $p \to 0$, that tends to $\dfrac{\lambda}{y}$.

This scheme can also suffer from underflow in the initiation step, but $e^{-\lambda}$ doesn't underflow until $\lambda > 700$, and for such a large $\lambda$, we'd need a few thousand rows of the spreadsheet anyway.

3. The hypergeometric distribution

The setup for the hypergeometric distribution is the most complicated of our common discrete distributions. We have a pool of $N$ objects, of which $r$ have a certain property and $N - r$ do not have that property. We choose, without replacement, a sample of $n$ of these objects, and $y$ is the count of those objects in the sample with the specified property. $P(Y = y)$ is positive when $y$ is an integer such that $\max(0, r + n - N) \le y \le \min(n, r)$.

We can draw parallels between the hypergeometric distribution and the binomial distribution. $n$ plays the same role in both places. $p$ corresponds to $\frac{r}{N}$, the proportion of the pool with the specified property, and $q$ corresponds to $\frac{N - r}{N}$, the proportion of the pool without the specified property. If we keep $n$ constant and let $N \to \infty$ while keeping $r = pN$, then the hypergeometric would tend to the binomial.

For $\max(0, r + n - N) \le y \le \min(n, r)$, the probability function is $P(Y = y) = \dfrac{\binom{r}{y} \binom{N - r}{n - y}}{\binom{N}{n}}$. We can work out the appropriate quotient with two uses of equation (1):

$$\frac{P(Y = y)}{P(Y = y - 1)} = \frac{\binom{r}{y} \binom{N - r}{n - y}}{\binom{r}{y - 1} \binom{N - r}{n - y + 1}} = \frac{r - y + 1}{y} \cdot \frac{n - y + 1}{N - r - n + y} = \frac{n - y + 1}{y} \cdot \frac{r - y + 1}{N - r - n + y}$$
We recognize the factor of $\dfrac{n - y + 1}{y}$ as being the same as one which appeared in the recursive formula for the binomial distribution. I will leave it to the reader to show that the factor of $\dfrac{r - y + 1}{N - r - n + y}$ tends to $\dfrac{p}{q}$ as $N \to \infty$ with $r = pN$.

Our recursive formula for the hypergeometric distribution has the most complicated initiation of all of these. It is also the only case in which, in building the spreadsheet, I cheated and used the built-in formula for binomial coefficients. The Excel syntax for $\binom{n}{k}$ is COMBIN(n, k).

Hypergeometric distribution initiation: If $r + n - N \le 0$, then

$$P(Y = 0) = \frac{\binom{N - r}{n}}{\binom{N}{n}} \tag{6}$$

else if $r + n - N > 0$, then

$$P(Y = r + n - N) = \frac{\binom{r}{N - n}}{\binom{N}{n}} \tag{7}$$

Hypergeometric distribution recursion: For $y > \max(0, r + n - N)$,

$$P(Y = y) = P(Y = y - 1) \cdot \frac{n - y + 1}{y} \cdot \frac{r - y + 1}{N - r - n + y} \tag{8}$$

There are limitations on this, in that the binomial coefficients used in the initiation will overflow for large enough values of the parameters. I assume that this is somewhat more sensitive to overflow or underflow than any of the other distributions in this note, but I have not systematically explored how hard we can push it.

4. The geometric distribution

Including the geometric distribution in this note is a little bit silly, as we can compute the probability function at a point and the cumulative probability at a point directly, with no need for recursion.

We have a trial for which the probability of success (defined the same way as in the binomial distribution) is $p$. We repeat independent trials until success occurs, and the random variable is defined as the number of turns it took for that to happen. We must have $y \ge 1$, since we must take at least one turn. The probability function is $P(Y = y) = p q^{y - 1}$, and as for cumulative probability, we have $P(Y > y) = q^y$, so that $P(Y \le y) = 1 - q^y$.

So we don't need recursive computations, but just for completeness, I'll include them anyway:

Geometric distribution initiation:

$$P(Y = 1) = p \tag{9}$$

Geometric distribution recursion: For $y \ge 2$,

$$P(Y = y) = P(Y = y - 1) \cdot q \tag{10}$$

5. The negative binomial distribution

The negative binomial distribution has the same setup as the geometric distribution, only this time, we count the number of trials needed to get $r$ successes, for some $r \ge 1$. If $r = 1$, then this is precisely the geometric distribution. Note that we must have $y \ge r$ to have a positive probability. The probability function for the negative binomial is

$$P(Y = y) = \binom{y - 1}{r - 1} p^r q^{y - r}.$$

I will leave the details of determining the recursive formula to the reader, and will quote the results:

Negative binomial distribution initiation:

$$P(Y = r) = p^r \tag{11}$$

Negative binomial distribution recursion: For $y \ge r + 1$,

$$P(Y = y) = P(Y = y - 1) \cdot \frac{y - 1}{y - r} \cdot q \tag{12}$$

6. Implementation in Excel

I have included an Excel spreadsheet implementing these ideas. It's very much a bare-bones file, with very little in the way of formatting and not many features.

There are five sheets to this file, one for each of our named distributions. You should be able to navigate via the tabs at the bottom. Each sheet has one to three cells set aside for you to enter the parameters of the particular distribution. Those cells have been outlined, and are the only cells outlined. Everything else is given by formulas that depend on those cells in some way. (I have pre-populated each sheet with parameters; they all happen to have mean 40. But the idea is that you should replace those values with other values that you happen to be interested in.)

Then, the probability function $P(Y = y)$ and the cumulative distribution function $P(Y \le y)$ are tabulated in a block of cells below. Possible values of $y$ are in column A, the probability function is in column B, and the cumulative distribution function is in column C. I've given you about 200 rows of that; if you need more rows, make a selection that includes the bottom of the table and extends downwards as far as you need it to, and then issue a Fill Down command.

The formulas in those cells are precisely the formulas in the numbered equations above. To understand them, you only need to know the distinction between a relative address and a fixed address in Excel.

I didn't include any graphs or charts. Feel free to add those yourself, as the means to do that are available in Excel. And you can make other changes. Do you want more digits of accuracy? Then increase the width of the appropriate columns and change the number format.
Do you want reversed cumulative probabilities? Then add a column subtracting the cumulative probabilities from 1. And so on.

Also, as I mentioned, all of this could be done in a programming language or other procedure-based environment (including Excel macros). If that is the world you are familiar with, you should be able to make the translation for yourself. For my own part, I want to show you how we can implement all of this in a spreadsheet without macros.
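
As one possible version of that translation, here is a minimal Python sketch of the initiation and recursion steps for the Poisson, hypergeometric, geometric, and negative binomial distributions. It is an illustration under my own assumptions, not a transcription of the spreadsheet: the function names are hypothetical, and Python's `math.comb` stands in for Excel's COMBIN in the hypergeometric initiation.

```python
import math


def poisson_pmf(lam, y_max):
    """List of P(Y=0), ..., P(Y=y_max) for a Poisson variable with mean lam."""
    probs = [math.exp(-lam)]                      # initiation: e^(-lam)
    for y in range(1, y_max + 1):
        probs.append(probs[-1] * lam / y)         # multiply by lam / y
    return probs


def hypergeometric_pmf(N, r, n):
    """Dict y -> P(Y=y) over the support max(0, r+n-N) <= y <= min(n, r)."""
    y0 = max(0, r + n - N)
    if y0 == 0:
        first = math.comb(N - r, n) / math.comb(N, n)   # P(Y = 0)
    else:
        first = math.comb(r, N - n) / math.comb(N, n)   # P(Y = r + n - N)
    probs = {y0: first}
    for y in range(y0 + 1, min(n, r) + 1):
        probs[y] = (probs[y - 1] * (n - y + 1) / y
                    * (r - y + 1) / (N - r - n + y))
    return probs


def geometric_pmf(p, y_max):
    """Dict y -> P(Y=y) for y = 1, ..., y_max."""
    q = 1.0 - p
    probs = {1: p}                                # initiation: P(Y = 1) = p
    for y in range(2, y_max + 1):
        probs[y] = probs[y - 1] * q               # multiply by q
    return probs


def negative_binomial_pmf(r, p, y_max):
    """Dict y -> P(Y=y) for y = r, ..., y_max (trials needed for r successes)."""
    q = 1.0 - p
    probs = {r: p ** r}                           # initiation: P(Y = r) = p^r
    for y in range(r + 1, y_max + 1):
        probs[y] = probs[y - 1] * (y - 1) / (y - r) * q
    return probs


# Example: a running sum of the probability function gives the cumulative
# distribution, just as in column C of the spreadsheet.
poisson_40 = poisson_pmf(40.0, 200)
print("Poisson(40): P(Y <= 40) =", sum(poisson_40[:41]))
print("Hypergeometric support:", sorted(hypergeometric_pmf(20, 8, 15)))
```

Because Python integers are exact and unbounded, `math.comb` does not overflow the way a floating point binomial coefficient can, so the overflow concern noted for the hypergeometric initiation should be less pressing in this setting. The final quotient must still be representable as a floating point number.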