Discrete probability distributions

L. K. Gaafar

In the chapter on probability we used the classical method to calculate the probability of various values of a random variable. In some cases, however, we may be able to develop a mathematical function for calculating such probabilities based on the definition of the random variable. In general, a random variable may be discrete or continuous. A random variable is discrete when it can assume countable values, and is continuous when it can take on values on a continuous scale. Usually, if you count it, it is discrete, and if you measure it, it is continuous.

In this chapter, we will develop mathematical functions to calculate probabilities associated with discrete random variables. Such a function is called a probability function, probability mass function, or probability distribution. These functions will be derived directly from the definition of the random variable. For convenience, we shall denote a random variable by a capital letter, while the corresponding small letter will be used to denote a specific value of the random variable. For example, X may represent the number of students passing a course. Then, x will denote a specific value of X. The probability function will be denoted by $p(x)$, where $p(x)$ assigns $P[X = x]$.

Probability functions must satisfy certain conditions that are directly related to the definition of a probability. These conditions are:
1. $p(x) \ge 0$ (probabilities cannot be negative).
2. $\sum_{\text{all } x} p(x) = 1$ (all possible outcomes, i.e., the sample space, must have a total probability of 1).

Associated with each random variable or distribution is a cumulative distribution function, $F(x)$, which assigns cumulative probabilities: $F(t) = P[X \le t] = \sum_{x \le t} p(x)$ for $-\infty < t < \infty$. In the following section, we will define some standard discrete random variables and develop their probability distributions.

Standard Discrete Random Variables

The standard discrete probability distributions presented here are all based on the Bernoulli trial. The Bernoulli trial is any experiment or activity with two possible outcomes. For convenience, we will call these outcomes success and failure, with success being the desired outcome. For example, when shooting at a target, one may hit or miss. In this case success represents the outcome of hitting the target, while failure represents missing it. We will assume that the probability of success in the Bernoulli trial is p, and the probability of failure is q. Obviously, $p + q = 1$, or $q = 1 - p$.

Now, we may define a random variable X as the number of successes in a Bernoulli trial. Obviously, X is a discrete random variable with two possible values, 0 and 1. We may define its probability function as:

$$p(x) = p^x q^{1-x}, \quad x = 0, 1.$$

$p(x)$ is called the Bernoulli distribution. It clearly assigns the right probabilities and satisfies the two conditions above (i.e., $p(x)$ is never negative and $\sum_{x=0}^{1} p(x) = q + p = 1$). The Bernoulli distribution is very simple, but it is the foundation for many powerful distributions.
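As a quick numerical illustration of the Bernoulli probability function and the two conditions it must satisfy, here is a minimal Python sketch. The function name `bernoulli_pmf` and the value p = 0.3 are assumptions chosen for illustration; they do not come from the notes.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Return P[X = x] = p^x * q^(1 - x) for x in {0, 1}, with q = 1 - p."""
    if x not in (0, 1):
        return 0.0  # values outside {0, 1} are impossible
    q = 1.0 - p
    return (p ** x) * (q ** (1 - x))


p = 0.3  # assumed probability of success, for illustration only
probabilities = [bernoulli_pmf(x, p) for x in (0, 1)]

# Condition 1: no probability is negative.
all_nonnegative = all(value >= 0 for value in probabilities)
# Condition 2: the probabilities over the whole sample space add up to 1.
total = sum(probabilities)

print(probabilities, all_nonnegative, total)
```

The same two checks (nonnegativity and a total of 1) apply unchanged to every distribution developed in the rest of these notes.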

The Binomial Distribution

If we assume that we have a series of n independent Bernoulli trials, we may be interested in the number of successes in these trials. The word independent means that the probability of success does not change from one trial to another. The number of successes is clearly a discrete random variable (X), and in a series of n independent Bernoulli trials it may assume any discrete value from 0 to n (i.e., 0, 1, 2, ..., n). For a given x, we must have x successes and, accordingly, n - x failures. In other words, we need success and success and success, etc., x times, and failure and failure and failure, etc., n - x times. In simpler terms, we are looking for the intersection of x successes and n - x failures. Since these are all independent events, they will happen with a probability $p^x q^{n-x}$. Still, the x successes may be located in $\binom{n}{x}$ ways within the n trials. Therefore,

$$p(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n.$$

The figure to the right explains how probabilities are calculated for 2 successes in 4 trials. 2 successes and 2 failures have a probability of $p^2 q^2$. This outcome may occur in 6 different ways, leading to $P(X = 2) = 6 p^2 q^2$. Using our probability function, $p(2) = \binom{4}{2} p^2 q^2 = 6 p^2 q^2$.

Because of the form of the probability function, it is called the binomial distribution. Notice that the Bernoulli distribution is a special case of the binomial when n = 1. The binomial distribution satisfies the two conditions above. It is never negative and

$$\sum_{x=0}^{n} p(x) = \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x} = (p + q)^n = (1)^n = 1.$$

An interesting question is: on average, how many successes are expected in such a situation? This is called the expected value of X and is denoted by E[X]. E[X] is the long-term average of X, or simply the mean of X, and may also be denoted by µ. If we made n independent Bernoulli trials, recorded the number of successes, and repeated this process many times, µ or E[X] would be the average of the recorded successes. µ or E[X], where X is any discrete random variable, may easily be calculated as

$$\mu = E[X] = \sum_{\text{all } x} x \, p(x),$$

which simply weighs each value of X by its probability to get the average.
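The binomial probability function and the expected-value sum can be evaluated directly. The sketch below is only an illustration: the helper names `binomial_pmf` and `expected_value` are hypothetical, and p = 0.5 is an assumed value. It uses the case of 2 successes in 4 trials discussed above.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return C(n, x) * p^x * q^(n - x), the probability of x successes in n trials."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def expected_value(values, pmf) -> float:
    """Return the sum of x * p(x) over the supplied values."""
    return sum(x * pmf(x) for x in values)


n, p = 4, 0.5  # 4 trials; p = 0.5 is an assumed value for illustration
ways = comb(4, 2)                      # number of ways to place 2 successes in 4 trials
p_two = binomial_pmf(2, n, p)          # 6 * p^2 * q^2
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
mean = expected_value(range(n + 1), lambda x: binomial_pmf(x, n, p))

print(ways, p_two, total, mean)
```

With these assumed values the count of arrangements is 6, the probabilities add up to 1, and the weighted average of the values of X gives the mean.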
In general,

$$E[g(X)] = \sum_{\text{all } x} g(x) \, p(x),$$

where $g(X)$ is any function of X and $E[g(X)]$ is the long-term average of that function. The following rules follow from the definition of E[X] and E[g(X)]:
1. E[c] = c, where c is a constant.
2. E[cX] = cE[X].
3. E[X ± Y] = E[X] ± E[Y], where X and Y are random variables.

For the Bernoulli trial,

$$\mu = E[X] = \sum_{x=0}^{1} x \, p(x) = 0(q) + 1(p) = p.$$

For the binomial distribution,

$$\mu = E[X] = \sum_{x=0}^{n} x \binom{n}{x} p^x q^{n-x} = np.$$

This result for the binomial distribution could easily be obtained by observing that the binomial is the summation of n independent Bernoulli distributions, and from the third rule above, its mean would be the summation of the n means, i.e., np.

µ or E[X] is a measure of center. Another important measure is the variance, which measures how much the random variable varies around its mean. The variance is the average squared distance from the mean and is denoted by $\sigma^2$ or V[X]. From its definition, $\sigma^2 = E[(X - \mu)^2] = E[(X - E[X])^2]$. From the expectation rules above, $\sigma^2 = E[X^2] - \mu^2$, or $\sigma^2 = E[X^2] - (E[X])^2$. The following rules follow from the definition of V[X]:
1. V[c] = 0, where c is a constant.
2. V[cX] = $c^2$ V[X].
3. V[X ± Y] = V[X] + V[Y], where X and Y are independent random variables.

For the Bernoulli distribution,

$$E[X^2] = \sum_{x=0}^{1} x^2 p(x) = 0^2(q) + 1^2(p) = p, \qquad V[X] = \sigma^2 = E[X^2] - (E[X])^2 = p - p^2 = p(1 - p) = pq.$$

Again, since the binomial is the summation of n independent Bernoulli distributions, for the binomial distribution $\sigma^2 = npq$.

Example: The probability that a patient survives a critical heart operation is 0.9. What is the probability that exactly 5 out of the next 7 patients having this operation survive?

Solution: We may consider each operation as a Bernoulli trial with two possible outcomes: survival (p = 0.9) and death (q = 0.1). The 7 (n) operations are then a series of independent Bernoulli trials where we are interested in having 5 (x) successes, leading to the binomial distribution. We are assuming that the trials are independent because they are performed on different patients. However, if these operations are performed by the same team, their experience may lead to changing p from operation (trial) to operation. Nevertheless, we will assume independence (constant p). Using the binomial distribution,

$$P[X = 5] = p(5) = \binom{7}{5} (0.9)^5 (0.1)^2 = (21)(0.5905)(0.01) = 0.124.$$
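The heart-operation example and the binomial mean and variance can be checked numerically. This is a minimal sketch: the function name `binomial_pmf` is hypothetical, and the mean and variance are computed from the defining sums rather than from the closed forms so that the two can be compared.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return C(n, x) * p^x * q^(n - x)."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


n, p = 7, 0.9  # 7 operations, survival probability 0.9 (from the example)
prob_five = binomial_pmf(5, n, p)  # approximately 0.124

# Mean and variance from the definitions E[X] and E[X^2] - (E[X])^2.
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
second_moment = sum(x ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
variance = second_moment - mean ** 2

print(prob_five)
print(mean, n * p)                   # both should agree with np
print(variance, n * p * (1.0 - p))   # both should agree with npq
```

Comparing the summed values with np and npq is a simple way to confirm the shortcut of adding up n independent Bernoulli means and variances.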

The Negative Binomial (Pascal) Distribution

With a series of independent Bernoulli trials, we may be interested in the number of trials (X) needed to obtain a specific number of successes (r). X is clearly a random variable that could be anywhere from r to ∞. In other words, we may be very lucky and succeed in every trial, and thus get the r successes in r trials. On the other hand, we may be unlucky and continue to try without getting the r successes. In general, however, our last trial will always be a success (the r-th success), otherwise we would have to continue, while the remaining r - 1 successes could be located anywhere in the remaining x - 1 trials in $\binom{x-1}{r-1}$ ways. Each one of these locations has a probability of $p^r q^{x-r}$, and thus the overall probability of needing x trials to obtain r successes is

$$p(x) = \binom{x-1}{r-1} p^r q^{x-r}, \quad x = r, r+1, r+2, \ldots$$

The figure to the right explains this development for the case of needing 4 trials to get 2 successes. Notice how the last trial is always the last success.

The definition of this random variable is the opposite of the binomial case. Here the random variable is the number of trials needed to get a specific number of successes, while in the binomial case the random variable is the number of successes in a specific number of trials. Therefore, the resulting distribution is called the negative binomial. It is also sometimes called the Pascal distribution after the person who developed it. The mean and variance of the Pascal distribution are given by:

$$\mu = \frac{r}{p}, \qquad \sigma^2 = \frac{rq}{p^2}.$$

A special case of the Pascal distribution results when we are only interested in obtaining one success, i.e., r = 1. In this case, $p(x)$ simplifies to

$$p(x) = \binom{x-1}{0} p \, q^{x-1} = p \, q^{x-1}, \quad x = 1, 2, 3, \ldots$$

This new random variable, the number of trials needed to obtain ONE success, is called the geometric random variable, and the resulting probability function is called the geometric distribution. The mean and variance of the geometric distribution are given by:

$$\mu = \frac{1}{p}, \qquad \sigma^2 = \frac{q}{p^2}.$$

Observe that the Pascal random variable may be viewed as the summation of r independent geometric random variables.
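The Pascal and geometric probability functions can be sketched in a few lines of Python. The names `pascal_pmf` and `geometric_pmf`, the values r = 2 and p = 0.5, and the truncation point of 200 trials are all assumptions made for illustration; the infinite sums are approximated by stopping once the remaining tail is negligible.

```python
from math import comb


def pascal_pmf(x: int, r: int, p: float) -> float:
    """Return C(x - 1, r - 1) * p^r * q^(x - r): the r-th success occurs on trial x."""
    if x < r:
        return 0.0  # at least r trials are needed for r successes
    return comb(x - 1, r - 1) * p ** r * (1.0 - p) ** (x - r)


def geometric_pmf(x: int, p: float) -> float:
    """Special case r = 1: the first success occurs on trial x."""
    return pascal_pmf(x, 1, p)


r, p = 2, 0.5   # assumed values, for illustration only
upper = 200     # assumed truncation point; the tail beyond it is negligible here
trials = range(r, upper + 1)

four_trials = pascal_pmf(4, r, p)  # 3 * p^2 * q^2, the "4 trials for 2 successes" case
total = sum(pascal_pmf(x, r, p) for x in trials)
mean = sum(x * pascal_pmf(x, r, p) for x in trials)
variance = sum(x ** 2 * pascal_pmf(x, r, p) for x in trials) - mean ** 2

print(four_trials, total)
print(mean, r / p)                        # compare with r / p
print(variance, r * (1.0 - p) / p ** 2)   # compare with r q / p^2
```

For a smaller p the tail decays more slowly, so the truncation point would need to be raised before the sums match the closed-form mean and variance.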
Example: A scientist inoculates several mice, one at a time, with a disease germ until she finds that 2 have contracted the disease. If the probability of contracting the disease is 0.15, what is the probability that:
a. 8 mice are required?
b. Less than 5 mice are required?

Solution:
a. We are looking to infect 2 mice (2 successes) and we are interested in needing to try 8 times to achieve this. This satisfies the Pascal distribution with r = 2, x = 8. Therefore,

$$P(8) = p(8) = \binom{7}{1} p^2 q^6 = (7)(0.15)^2 (0.85)^6 = (7)(0.0225)(0.37715) = 0.0594.$$

b. Less than 5 may be satisfied by 2, 3, or 4, i.e., we are looking for $P(X \le 4)$, or $F(4)$.

$$F(4) = \sum_{x=2}^{4} \binom{x-1}{1} p^2 q^{x-2} = p^2 + 2 p^2 q + 3 p^2 q^2 = 0.0225 + 0.03825 + 0.04877 = 0.1095.$$

Example: The probability that a student pilot passes the written test for his private pilot's license is 0.7. Find the probability that a person passes the test:
a. on the third try
b. before the fourth try

Solution: Since the person needs to pass the test only once, we are looking for the number of trials to obtain one success, i.e., the geometric distribution. Notice that, in using the geometric distribution, we are implicitly assuming that these trials are independent. In reality, however, one would expect a higher probability of passing the test on the second trial due to learning effects. Nevertheless, we will proceed with the independence assumption, assuming that these effects are minimal.
a. x = 3, $p(3) = p q^2 = (0.7)(0.3)^2 = 0.063$. Notice that the probability is low because the person is much more likely to pass on the first trial (0.7) or the second (0.21).
b. Before the fourth trial may be satisfied by the first or the second or the third, i.e., we are looking for $P(X \le 3)$, or $F(3)$.

$$F(3) = \sum_{x=1}^{3} p \, q^{x-1} = p + pq + pq^2 = 0.7 + 0.21 + 0.063 = 0.973.$$

The Poisson Distribution

A very useful random variable and probability distribution results from the Poisson process, in which we count discrete occurrences in a continuous interval. For example, we may be interested in counting the number of customers arriving at a service facility in a given time interval, or counting the number of white blood cells in a drop of blood, or counting the number of traffic accidents at a particular intersection over a week. We will assume that the rate of occurrence per unit time or measurement unit is a known constant, and will denote it by λ.
To develop the Poisson distribution, we make the following assumptions:
1. The number of outcomes in a given interval or region is independent of the number that occurs in any other disjoint interval or region. That is to say that the Poisson process has no memory.
2. The probability that a single outcome will occur during a very short time interval or a very small region (t) is proportional to t.
3. The probability that more than one outcome will occur in such a very short interval or very small region is negligible.

The Poisson distribution may be developed from the binomial distribution. For example, consider the case of counting the number of arrivals in a time interval (t). If the rate of arrival is λ per time unit, then the average number of arrivals in t is µ = λt. If we divide t into a huge number (n) of infinitesimal intervals such that only one arrival can occur in each infinitesimal interval with a very small probability (p), we may consider each such interval as a Bernoulli trial with only two possible outcomes: arrival (success) and no arrival (failure). Now, the series of infinitesimal intervals is a series of n Bernoulli trials and we are interested in the number of arrivals (successes) X; i.e., X is binomially distributed with $n \to \infty$, $p \to 0$, $\mu = np$, $x = 0, 1, 2, \ldots$ In the binomial distribution, if we take the limit as $n \to \infty$ and substitute $p = \mu/n$, we get:

$$\lim_{n \to \infty} p(x) = \lim_{n \to \infty} \binom{n}{x} \left(\frac{\mu}{n}\right)^x \left(1 - \frac{\mu}{n}\right)^{n-x} = \frac{\mu^x}{x!} \lim_{n \to \infty} \frac{n(n-1)\cdots(n-x+1)}{n^x} \left(1 - \frac{\mu}{n}\right)^{n} \left(1 - \frac{\mu}{n}\right)^{-x}.$$

However,

$$\lim_{n \to \infty} \frac{n(n-1)\cdots(n-x+1)}{n^x} = \lim_{n \to \infty} (1)\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right) = 1, \qquad \lim_{n \to \infty} \left(1 - \frac{\mu}{n}\right)^{-x} = 1,$$

and

$$\lim_{n \to \infty} \left(1 - \frac{\mu}{n}\right)^{n} = \lim_{n \to \infty} \left\{\left[1 + \frac{1}{(-n/\mu)}\right]^{(-n/\mu)}\right\}^{-\mu} = e^{-\mu}.$$

Therefore,

$$p(x) = \frac{e^{-\mu} \mu^x}{x!}, \quad x = 0, 1, 2, \ldots$$

The last equation is the probability function of the Poisson random variable that satisfies the three assumptions above. The mean of the Poisson distribution is µ = λt and the variance is also $\sigma^2 = \mu = \lambda t$.

Example: The average number of oil tankers arriving at a certain port is known to be 1 per hour. The port can handle at most 12 tankers per 8-hour day. What is the probability that on a given day tankers will have to be sent away?

Solution: Let X represent the number of arrivals per day. X is Poisson distributed with a mean µ = λt = 1 × 8 = 8 tankers/day.

$$P(X > 12) = 1 - P(X \le 12) = 1 - F(12) = 1 - \sum_{x=0}^{12} \frac{e^{-8} \, 8^x}{x!} = 1 - 0.9362 = 0.0638.$$

The cumulative results were obtained using Microsoft Excel, which has preprogrammed functions for all distributions covered here.
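The Poisson probability function, the tanker example, and the limiting argument can all be explored numerically. In the sketch below the function names are hypothetical, the example is read as a rate of 1 tanker per hour over an 8-hour day with a capacity of 12 tankers (an assumption about the intended figures), and the binomial with p = µ/n is evaluated for a few increasing n to show it approaching the Poisson value.

```python
from math import comb, exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Return e^(-mu) * mu^x / x!."""
    return exp(-mu) * mu ** x / factorial(x)


def poisson_cdf(t: int, mu: float) -> float:
    """Return F(t) = P[X <= t] by summing the probability function."""
    return sum(poisson_pmf(x, mu) for x in range(t + 1))


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return C(n, x) * p^x * q^(n - x)."""
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


mu = 1 * 8  # assumed reading: 1 tanker per hour over an 8-hour day
prob_sent_away = 1.0 - poisson_cdf(12, mu)  # P[X > 12], approximately 0.0638
print(prob_sent_away)

# Binomial with n trials and p = mu / n approaches the Poisson value as n grows.
x = 5  # an arbitrary value of X chosen for the comparison
for n in (10, 100, 10_000):
    print(n, binomial_pmf(x, n, mu / n), poisson_pmf(x, mu))
```

The printed binomial values drift toward the Poisson value as n increases, which mirrors the limit taken in the derivation above.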

The Hypergeometric Distribution

All distributions described above were based on a series of independent Bernoulli trials. A situation in which we obtain a series of dependent Bernoulli trials comes about in sampling without replacement. For example, assume that we are interested in accepting a lot of 100 parts if it contains 5 or fewer defectives. We decide to take a sample of n parts, one by one, without replacement, count the number of defectives in the sample, and use that number to judge the quality of the lot. This inspection is a series of Bernoulli trials, each with two possible outcomes: defective and not defective. If the lot contains 5 defectives, then the probability that the first part in the sample will be defective is 5/100 = 0.05. Now, the probability of the second part being defective depends on the result of the first part. If the first part is defective, we are left with 99 parts out of which only 4 are defective and, accordingly, the probability of the second part being defective is 4/99 = 0.0404. If, however, the first part is not defective, we are left with 99 parts out of which 5 are defective and, accordingly, the probability of the second part being defective is 5/99 = 0.0505. Hence, we have a series of Bernoulli trials in which the probability of success in each trial depends on the results of all preceding trials.

We will now develop the probability function for this random variable, X, defined as the number of items with a given characteristic in a sample of size n taken from a lot of size N that has k items possessing the desired characteristic. The diagram to the right depicts this situation. The x items in the sample must come from the k items in the lot, while the remaining items in the sample (n - x) will have to come from the remaining items in the lot (N - k). Using the classical approach,

$$p(x) = \frac{\text{number of ways of choosing } x \text{ out of } k \text{ and } n - x \text{ out of } N - k}{\text{number of ways of choosing } n \text{ out of } N} = \frac{\binom{k}{x} \binom{N-k}{n-x}}{\binom{N}{n}}, \quad x = \max(0, n - N + k), \ldots, \min(k, n).$$

The range of X starts from either 0 or n - (N - k), whichever is larger, and ends at either n or k, whichever is smaller.
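The hypergeometric probability function and its range can be sketched as follows. The function name `hypergeometric_pmf` is hypothetical, and the sample size n = 10 used with the lot of 100 parts and 5 defectives is an assumption made only so that the code has a concrete value to run with.

```python
from math import comb


def hypergeometric_pmf(x: int, N: int, k: int, n: int) -> float:
    """Return C(k, x) * C(N - k, n - x) / C(N, n) for a sample of n from a lot of N."""
    lowest = max(0, n - (N - k))   # the sample may be forced to contain some of the k items
    highest = min(k, n)            # cannot exceed the sample size or the k items available
    if x < lowest or x > highest:
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


N, k = 100, 5   # lot of 100 parts containing 5 defectives (from the text)
n = 10          # assumed sample size, for illustration only

support = range(max(0, n - (N - k)), min(k, n) + 1)
pmf_values = {x: hypergeometric_pmf(x, N, k, n) for x in support}
total = sum(pmf_values.values())

# With a sample of one part, the probability of a defective is simply 5/100.
first_part_defective = hypergeometric_pmf(1, N, k, 1)

print(pmf_values)
print(total, first_part_defective)
```

Setting the sample size to 1 recovers the 5/100 probability for the first part, and the probabilities over the whole range add up to 1 as required.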
X is called a hypergeometric random variable and $p(x)$ is called the hypergeometric distribution. The mean and variance of the hypergeometric distribution are given by

$$\mu = \frac{nk}{N}, \qquad \sigma^2 = n \cdot \frac{k}{N} \cdot \left(1 - \frac{k}{N}\right) \cdot \frac{N - n}{N - 1}.$$

Notice that if we replace k/N by p, we get $\mu = np$ and $\sigma^2 = npq \cdot \frac{N-n}{N-1}$, which are the mean and the variance of the binomial distribution if the term $\frac{N-n}{N-1}$ is close to one, which is the case when n is small relative to N. Consequently, when n is small relative to N, the binomial distribution with p = k/N may be used to approximate the hypergeometric distribution.

Example: A committee of 5 members is formed by randomly choosing from a class of 40 students in which only 3 are freshmen. What is the probability that the committee will include:
a. exactly one freshman?
b. at most one freshman?

Solution:
a. Using the hypergeometric distribution with n = 5, N = 40, k = 3, x = 1, we find the probability of one freshman to be

$$p(1) = \frac{\binom{3}{1} \binom{37}{4}}{\binom{40}{5}} = \frac{(3)(66045)}{658008} = 0.3011.$$

b. At most 1 is satisfied by 0 or 1.

$$p(0) = \frac{\binom{3}{0} \binom{37}{5}}{\binom{40}{5}} = \frac{(1)(435897)}{658008} = 0.6624.$$

$P(X \le 1) = P(X = 0) + P(X = 1) = 0.6624 + 0.3011 = 0.9635$, a very high probability, as we expect the majority of committee members to be non-freshmen.
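The committee example, the mean and variance formulas, and the binomial approximation can be checked together. This is a minimal sketch with hypothetical function names; the mean and variance are computed from the defining sums so they can be compared with the closed forms.

```python
from math import comb


def hypergeometric_pmf(x: int, N: int, k: int, n: int) -> float:
    """Return C(k, x) * C(N - k, n - x) / C(N, n)."""
    if x < max(0, n - (N - k)) or x > min(k, n):
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return C(n, x) * p^x * q^(n - x)."""
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


N, k, n = 40, 3, 5  # class of 40 students, 3 freshmen, committee of 5
p_one = hypergeometric_pmf(1, N, k, n)    # approximately 0.3011
p_zero = hypergeometric_pmf(0, N, k, n)   # approximately 0.6624
at_most_one = p_zero + p_one

support = range(max(0, n - (N - k)), min(k, n) + 1)
mean = sum(x * hypergeometric_pmf(x, N, k, n) for x in support)
variance = sum(x ** 2 * hypergeometric_pmf(x, N, k, n) for x in support) - mean ** 2

mean_formula = n * k / N
variance_formula = n * (k / N) * (1 - k / N) * (N - n) / (N - 1)

# Binomial approximation with p = k / N (n is fairly small relative to N here).
approx_one = binomial_pmf(1, n, k / N)

print(p_one, p_zero, at_most_one)
print(mean, mean_formula, variance, variance_formula)
print(approx_one)
```

The binomial value for exactly one freshman is close to, but not the same as, the hypergeometric value; the gap shrinks as the sample becomes a smaller fraction of the class.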