CS 70 Discrete Mathematics for CS, Spring 2005    Clancy/Wagner    Notes 21

Some Important Distributions

Question: A biased coin with Heads probability p is tossed repeatedly until the first Head appears. What is the expected number of tosses?

As always, our first step in answering the question must be to define the sample space Ω. A moment's thought tells us that

Ω = {H, TH, TTH, TTTH, ...},

i.e., Ω consists of all sequences over the alphabet {H, T} that end with H and contain no other H's. This is our first example of an infinite sample space (though it is still discrete).

What is the probability of a sample point, say ω = TTH? Since successive coin tosses are independent (this is implicit in the statement of the problem), we have

Pr[TTH] = (1 - p) × (1 - p) × p = (1 - p)^2 p.

And generally, for any sequence ω ∈ Ω of length i, we have Pr[ω] = (1 - p)^(i-1) p. To be sure everything is consistent, we should check that the probabilities of all the sample points add up to 1. Since there is exactly one sequence of each length i ≥ 1 in Ω, we have

Σ_{ω∈Ω} Pr[ω] = Σ_{i=1}^∞ (1 - p)^(i-1) p = p Σ_{i=0}^∞ (1 - p)^i = p × 1/(1 - (1 - p)) = 1,

as expected. [In the second-last step here, we used the formula for summing a geometric series.]

Now let the random variable X denote the number of tosses in our sequence (i.e., X(ω) is the length of ω). Our goal is to compute E(X). Despite the fact that X counts something, there's no obvious way to write it as a sum of simple r.v.'s as we did in many examples in the last lecture. (Try it!) Instead, let's just dive in and try a direct computation. Note that the distribution of X is quite simple:

Pr[X = i] = (1 - p)^(i-1) p    for i = 1, 2, 3, ...

So from the definition of expectation we have

E(X) = (1 × p) + (2 × (1 - p)p) + (3 × (1 - p)^2 p) + ··· = p Σ_{i=1}^∞ i (1 - p)^(i-1).

This series is a blend of an arithmetic series (the i part) and a geometric series (the (1 - p)^(i-1) part). There are several ways to sum it. Here is one way, using an auxiliary trick (given in the following Theorem) that is often very useful. [Ask your TA about other ways.]
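Before tackling the series, the normalization check above is easy to confirm numerically. Here is a minimal Python sketch (not part of the original notes; the value p = 0.3 and the truncation point are arbitrary choices — the neglected tail is geometrically small):

```python
# Sanity check: the probabilities (1 - p)^(i-1) * p of all sample points
# in the coin-tossing sample space sum to 1.  We truncate the infinite
# sum at i = 10000; the remaining tail is (1 - p)^10000, i.e. negligible.
p = 0.3  # any Heads probability in (0, 1]
total = sum((1 - p) ** (i - 1) * p for i in range(1, 10001))
print(total)  # very close to 1
```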
Theorem 21.1: Let X be a random variable that takes on only non-negative integer values. Then

E(X) = Σ_{i=1}^∞ Pr[X ≥ i].

CS 70, Spring 2005, Notes 21    1
Proof: For notational convenience, let's write p_i = Pr[X = i], for i = 0, 1, 2, ... From the definition of expectation, we have

E(X) = (0 × p_0) + (1 × p_1) + (2 × p_2) + (3 × p_3) + (4 × p_4) + ···
     = p_1 + (p_2 + p_2) + (p_3 + p_3 + p_3) + (p_4 + p_4 + p_4 + p_4) + ···
     = (p_1 + p_2 + p_3 + p_4 + ···) + (p_2 + p_3 + p_4 + ···) + (p_3 + p_4 + ···) + (p_4 + ···) + ···
     = Pr[X ≥ 1] + Pr[X ≥ 2] + Pr[X ≥ 3] + Pr[X ≥ 4] + ···.

In the third line, we have regrouped the terms into convenient infinite sums. You should check that you understand how the fourth line follows from the third. [Note that our "···" notation here is a little informal, but the meaning should be clear. We could give a more rigorous, but less clear, proof using induction.]

Using Theorem 21.1, it is easy to compute E(X). The key observation is that, for our coin-tossing r.v. X,

Pr[X ≥ i] = (1 - p)^(i-1).    (1)

Why is this? Well, the event "X ≥ i" means that at least i tosses are required. This is exactly equivalent to saying that the first i - 1 tosses are all Tails, and the probability of this event is precisely (1 - p)^(i-1). Now, plugging equation (1) into Theorem 21.1, we get

E(X) = Σ_{i=1}^∞ Pr[X ≥ i] = Σ_{i=1}^∞ (1 - p)^(i-1) = 1/(1 - (1 - p)) = 1/p.

So, the expected number of tosses of a biased coin until the first Head appears is 1/p. For a fair coin, the expected number of tosses is 2.

The geometric distribution

The distribution of the random variable X that counts the number of coin tosses until the first Head appears has a special name: it is called the geometric distribution with parameter p (where p is the probability that the coin comes up Heads on each toss).

Definition 21.1 (geometric distribution): A random variable X for which

Pr[X = i] = (1 - p)^(i-1) p    for i = 1, 2, 3, ...

is said to have the geometric distribution with parameter p.

If we plot the distribution of X (i.e., the values Pr[X = i] against i), we get a curve that decreases monotonically by a factor of 1 - p at each step.
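The tail-sum identity of Theorem 21.1 can be checked against the direct definition of expectation on this very distribution. A small Python sketch (truncated sums; the choice p = 0.25 is arbitrary and not from the notes):

```python
# Compare the direct sum  sum_i i * Pr[X = i]  with the tail sum
# sum_i Pr[X >= i]  for a geometric r.v.; both should equal 1/p.
# The sums are truncated at N; the neglected tails are geometrically small.
p = 0.25
N = 10000
direct = sum(i * (1 - p) ** (i - 1) * p for i in range(1, N + 1))
tail = sum((1 - p) ** (i - 1) for i in range(1, N + 1))
print(direct, tail, 1 / p)  # all three agree: 4.0
```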
For posterity, let's record two important facts we've learned about the geometric distribution:

Theorem 21.2: For a random variable X having the geometric distribution with parameter p,
1. E(X) = 1/p; and
2. Pr[X ≥ i] = (1 - p)^(i-1) for i = 1, 2, ...

The geometric distribution occurs very often in applications because frequently we are interested in how long we have to wait before a certain event happens: how many runs before the system fails, how many shots before one is on target, how many poll samples before we find a Democrat, etc. The next section discusses a rather more involved application, which is important in its own right.
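Theorem 21.2 also lends itself to a quick Monte Carlo check: simulate tossing the coin until the first Head and average the number of tosses. (A sketch, not part of the original notes; the sample size and seed are arbitrary choices.)

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility

def tosses_until_head(p):
    """Simulate a geometric r.v.: toss a p-biased coin until Heads."""
    count = 1
    while random.random() >= p:  # this toss is a Tail, with prob. 1 - p
        count += 1
    return count

p = 0.5
trials = 200_000
avg = sum(tosses_until_head(p) for _ in range(trials)) / trials
print(avg)  # close to 1/p = 2 for a fair coin
```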
The Coupon Collector's Problem

Question: We are trying to collect a set of n different baseball cards. We get the cards by buying boxes of cereal: each box contains exactly one card, and it is equally likely to be any of the n cards. How many boxes do we need to buy until we have collected at least one copy of every card?

The sample space here is similar in flavor to that for our previous coin-tossing example, though rather more complicated. It consists of all sequences ω over the alphabet {1, 2, ..., n}, such that

1. ω contains each symbol 1, 2, ..., n at least once; and
2. the final symbol in ω occurs only once.

[Check that you understand this!] For any such ω, the probability is just Pr[ω] = 1/n^i, where i is the length of ω (why?). However, it is very hard to figure out how many sample points ω are of length i (try it for the case n = 3). So we will have a hard time figuring out the distribution of the random variable X, which is the length of the sequence (i.e., the number of boxes bought). Fortunately, we can compute the expectation E(X) very easily, using (guess what?) linearity of expectation, plus the fact we have just learned about the expectation of the geometric distribution.

As usual, we would like to write

X = X_1 + X_2 + ··· + X_n    (2)

for suitable simple random variables X_i. But what should the X_i be? A natural thing to try is to make X_i equal to the number of boxes we buy while trying to get the ith new card (starting immediately after we've got the (i - 1)st new card). With this definition, make sure you believe equation (2) before proceeding.

What does the distribution of X_i look like? Well, X_1 is trivial: no matter what happens, we always get a new card in the first box (since we have none to start with). So Pr[X_1 = 1] = 1, and thus E(X_1) = 1.

How about X_2? Each time we buy a box, we'll get the same old card with probability 1/n, and a new card with probability (n - 1)/n. So we can think of buying boxes as flipping a biased coin with Heads probability p = (n - 1)/n; then X_2 is just the number of tosses until the first Head appears.
So X_2 has the geometric distribution with parameter p = (n - 1)/n, and E(X_2) = n/(n - 1).

How about X_3? This is very similar to X_2, except that now we only get a new card with probability (n - 2)/n (since there are now two old ones). So X_3 has the geometric distribution with parameter p = (n - 2)/n, and E(X_3) = n/(n - 2).

Arguing in the same way, we see that, for i = 1, 2, ..., n, X_i has the geometric distribution with parameter p = (n - i + 1)/n, and hence that E(X_i) = n/(n - i + 1). Finally, applying linearity of expectation to equation (2), we get

E(X) = Σ_{i=1}^n E(X_i) = n/n + n/(n - 1) + ··· + n/2 + n/1 = n Σ_{i=1}^n 1/i.    (3)
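Equation (3) can be checked by direct simulation: buy random boxes until all n cards have been seen, and compare the average number of boxes to n times the nth harmonic number. (A sketch, not from the original notes; n, the trial count, and the seed are arbitrary choices.)

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility

def boxes_needed(n):
    """Buy uniformly random cards until all n distinct cards are seen."""
    seen = set()
    boxes = 0
    while len(seen) < n:
        boxes += 1
        seen.add(random.randrange(n))
    return boxes

n = 50
trials = 5000
avg = sum(boxes_needed(n) for _ in range(trials)) / trials
exact = n * sum(1 / i for i in range(1, n + 1))  # equation (3)
print(avg, exact)  # both around 225 for n = 50
```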
This is an exact expression for E(X). We can obtain a tidier form by noting that the sum in it actually has a very good approximation¹, namely:

Σ_{i=1}^n 1/i ≈ ln n + γ,

where γ ≈ 0.5772 is Euler's constant. Thus the expected number of cereal boxes needed to collect n cards is about n(ln n + γ). This is an excellent approximation to the exact formula (3) even for quite small values of n. So for example, for n = 100, we expect to buy about 518 boxes.

The binomial distribution

While we are baptizing distributions, here is another important one. Let X be the number of Heads in n tosses of a biased coin with Heads probability p. Clearly X takes on the values 0, 1, ..., n. And, as we saw in an earlier lecture, its distribution is

Pr[X = i] = (n choose i) p^i (1 - p)^(n-i).    (4)

Definition 21.2 (binomial distribution): A random variable having the distribution (4) is said to have the binomial distribution with parameters n and p.

Recall from Lecture Notes 20 that the expectation of a binomial random variable is E(X) = np. A plot of the binomial distribution (when n is large enough) looks more-or-less bell-shaped, with a sharp peak around the expected value np.

The Poisson distribution

Throw λn balls into n bins (where λ is a constant). Let X be the number of balls that land in bin 1. Then X has the binomial distribution with parameters λn and p = 1/n, and its expectation is E(X) = λn × (1/n) = λ. (Why?)

Let's look in more detail at the distribution of X (which is a special case of the binomial distribution, in which the number of tosses is λn and the parameter p is of the form 1/n). For convenience, we'll write p_i = Pr[X = i] for i = 0, 1, 2, ... Beginning with p_0, we have

p_0 = Pr[all balls miss bin 1] = (1 - 1/n)^(λn) → e^(-λ)    as n → ∞.

So the probability of no balls landing in bin 1 will be very close to the constant value e^(-λ) when n is large. What about the other p_i? Well, we know from the binomial distribution that

p_i = (λn choose i) (1/n)^i (1 - 1/n)^(λn-i).

Since we know how p_0 behaves, let's look at the ratio p_1/p_0:

p_1/p_0 = [λn × (1/n) × (1 - 1/n)^(λn-1)] / (1 - 1/n)^(λn) = λ / (1 - 1/n) = λn/(n - 1) → λ    as n → ∞.

[Recall that we are assuming λ is a constant.] So, since p_0 → e^(-λ), we see that p_1 → λ e^(-λ) as n → ∞.
¹ This is another of the little tricks you might like to carry around in your toolbox.
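The convergence of p_0 and p_1 to their limits can be watched numerically as n grows. A Python sketch (not part of the original notes; λ = 2 and the values of n are arbitrary choices):

```python
from math import comb, exp

# For the binomial distribution with parameters lam*n and p = 1/n,
# p_0 = (1 - 1/n)^(lam*n) and p_1 = lam*n * (1/n) * (1 - 1/n)^(lam*n - 1)
# approach e^(-lam) and lam * e^(-lam) as n grows.
lam = 2
for n in (10, 100, 1000):
    m = lam * n                       # number of balls thrown
    p0 = (1 - 1 / n) ** m             # all m balls miss bin 1
    p1 = comb(m, 1) * (1 / n) * (1 - 1 / n) ** (m - 1)
    print(n, p0, p1)
print(exp(-lam), lam * exp(-lam))     # the limiting values
```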
Now let's look at the ratio p_2/p_1:

p_2/p_1 = [(λn choose 2) (1/n)^2 (1 - 1/n)^(λn-2)] / [(λn choose 1) (1/n) (1 - 1/n)^(λn-1)]
        = [(λn - 1)/2] × (1/n) × 1/(1 - 1/n)
        = (λn - 1)/(2(n - 1)) → λ/2    as n → ∞.

So p_2 → (λ^2/2) e^(-λ) as n → ∞. For each value of i, something very similar happens to the ratio p_i/p_{i-1}:

p_i/p_{i-1} = [(λn choose i) (1/n)^i (1 - 1/n)^(λn-i)] / [(λn choose i-1) (1/n)^(i-1) (1 - 1/n)^(λn-i+1)]
            = [(λn - i + 1)/i] × (1/n) × 1/(1 - 1/n)
            = (λn - i + 1)/(i(n - 1)) → λ/i    as n → ∞.

Putting this together, we see that, for each fixed value i,

p_i → (λ^i / i!) e^(-λ)    as n → ∞.

[You should check this!] I.e., when n is large compared to i, the probability that exactly i balls fall into bin 1 is very close to (λ^i/i!) e^(-λ). This motivates the following definition:

Definition 21.3 (Poisson distribution): A random variable X for which

Pr[X = i] = (λ^i / i!) e^(-λ)    for i = 0, 1, 2, ...    (5)

is said to have the Poisson distribution with parameter λ.

To make sure this definition is valid, we had better check that (5) is in fact a distribution, i.e., that the probabilities sum to 1. We have

Σ_{i=0}^∞ (λ^i/i!) e^(-λ) = e^(-λ) Σ_{i=0}^∞ λ^i/i! = e^(-λ) × e^λ = 1.

[In the second-last step here, we used the Taylor series expansion e^x = 1 + x + x^2/2! + x^3/3! + ···.]

What is the expectation of a Poisson random variable X? This is a simple hands-on calculation, starting from the definition of expectation:

E(X) = Σ_{i=0}^∞ i × Pr[X = i] = Σ_{i=0}^∞ i (λ^i/i!) e^(-λ) = λ e^(-λ) Σ_{i=1}^∞ λ^(i-1)/(i - 1)! = λ e^(-λ) × e^λ = λ.

So the expectation of a Poisson r.v. X with parameter λ is E(X) = λ.

A plot of the Poisson distribution reveals a curve that rises monotonically to a single peak and then decreases monotonically. The peak is as close as possible to the expected value, i.e., at i = ⌊λ⌋.

We have seen that the Poisson distribution arises as the limit of the number of balls in bin 1 when λn balls are thrown into n bins. In other words, it is the limit of the binomial distribution with parameters λn and
p = 1/n as n → ∞, with λ being a fixed constant. The Poisson distribution is also a very widely accepted model for so-called "rare events," such as misconnected phone calls, radioactive emissions, crossovers in chromosomes, etc. This model is appropriate whenever the events can be assumed to occur randomly with some constant density λ in a continuous region (of time or space), such that events in disjoint subregions are independent. One can then show that the number of events occurring in a region of unit size should obey the Poisson distribution with parameter λ.

Here is a slightly frivolous example. Suppose cookies are made out of a dough that contains (on average) three raisins per spoonful. Each cookie contains two spoonfuls of dough. Then we would expect that, to a good approximation, the number of raisins in a cookie has the Poisson distribution with parameter λ = 6. Here are the first few values:

i          0      1      2      3      4      5      6      7      8      9      10     11     12
Pr[X = i]  0.002  0.015  0.045  0.089  0.134  0.161  0.161  0.138  0.103  0.069  0.041  0.023  0.011

Notice that the Poisson distribution arises naturally in (at least) two distinct important contexts. Along with the binomial and the normal distributions (which we shall meet soon), the Poisson distribution is one of the three distributions you are most likely to find yourself working with.
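The raisin-cookie table above can be reproduced directly from definition (5). A minimal Python sketch (not part of the original notes):

```python
from math import exp, factorial

# Pr[X = i] for a Poisson r.v. with parameter lam = 6 (raisins per
# cookie), for i = 0, ..., 12, rounded to three places as in the table.
lam = 6
probs = [lam ** i / factorial(i) * exp(-lam) for i in range(13)]
for i, pr in enumerate(probs):
    print(i, round(pr, 3))
# Since lam = 6 is an integer, p_5 = p_6 exactly (both round to 0.161),
# so the peak sits at i = 5 and 6, consistent with the plot description.
```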