CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 19

Some Important Distributions

Recall our basic probabilistic experiment of tossing a biased coin $n$ times. This is a very simple model, yet surprisingly powerful. Many important probability distributions that are widely used to model real-world phenomena can be derived from looking at this basic coin-tossing model. The first example, which we have seen in Note 16, is the binomial distribution $\mathrm{Bin}(n, p)$. This is the distribution of the number of Heads, $S_n$, in $n$ tosses of a biased coin with probability $p$ of Heads. Recall that the distribution of $S_n$ is

$$\Pr[S_n = k] = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } k \in \{0, 1, \dots, n\}.$$

The expected value is $E(S_n) = np$ and the variance is $\mathrm{Var}(S_n) = np(1-p)$. The binomial distribution frequently appears as a model for the number of successes in a repeated experiment.

Geometric Distribution

Consider tossing a biased coin with Heads probability $p$ repeatedly. Let $X$ denote the number of tosses until the first Head appears. Then $X$ is a random variable that takes values in the set of positive integers $\{1, 2, 3, \dots\}$. The event that $X = i$ is equal to the event of observing Tails for the first $i-1$ tosses and getting Heads in the $i$-th toss, which occurs with probability $(1-p)^{i-1} p$. Such a random variable is called a geometric random variable.

The geometric distribution frequently occurs in applications because we are often interested in how long we have to wait before a certain event happens: how many runs before the system fails, how many shots before one is on target, how many poll samples before we find a Democrat, how many retransmissions of a packet before it successfully reaches the destination, etc.

Definition 19.1 (Geometric distribution). A random variable $X$ for which

$$\Pr[X = i] = (1-p)^{i-1} p \quad \text{for } i = 1, 2, 3, \dots$$

is said to have the geometric distribution with parameter $p$. This is abbreviated as $X \sim \mathrm{Geom}(p)$.

As a sanity check, we can verify that the total probability of $X$ is equal to 1:

$$\sum_{i=1}^{\infty} \Pr[X = i] = \sum_{i=1}^{\infty} (1-p)^{i-1} p = p \sum_{i=1}^{\infty} (1-p)^{i-1} = p \cdot \frac{1}{1-(1-p)} = 1,$$

where in the second-to-last step we have used the formula for geometric series.

If we plot the distribution of $X$ (i.e., the values $\Pr[X = i]$ against $i$) we get a curve that decreases monotonically by a factor of $1-p$ at each step, as shown in Figure 1.

Figure 1: The Geometric distribution.
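The geometric pmf is easy to explore empirically. The following short Python sketch (an illustration added here, not part of the original note; the helper sample_geometric is our own) simulates the coin-tossing experiment of Definition 19.1 and compares observed frequencies with $(1-p)^{i-1} p$:

```python
import random

def sample_geometric(p):
    """Toss a p-biased coin until the first Head; return the number of tosses."""
    tosses = 1
    while random.random() >= p:  # each loop iteration is a Tail
        tosses += 1
    return tosses

p, trials = 0.3, 100_000
counts = {}
for _ in range(trials):
    x = sample_geometric(p)
    counts[x] = counts.get(x, 0) + 1

for i in range(1, 6):
    empirical = counts.get(i, 0) / trials
    exact = (1 - p) ** (i - 1) * p
    print(f"Pr[X = {i}]: empirical {empirical:.4f}, exact {exact:.4f}")
```

Each successive probability should shrink by roughly a factor of $1-p = 0.7$, matching the monotonically decreasing curve in Figure 1.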

Let us now compute the expectation $E(X)$. Applying the definition of expected value directly gives us:

$$E(X) = \sum_{i=1}^{\infty} i \Pr[X = i] = p \sum_{i=1}^{\infty} i (1-p)^{i-1}.$$

However, the final summation is difficult to evaluate and requires some calculus trick. Instead, we will use the following alternative formula for expectation.

Theorem 19.1. Let $X$ be a random variable that takes values in $\{0, 1, 2, \dots\}$. Then

$$E(X) = \sum_{i=1}^{\infty} \Pr[X \ge i].$$

Proof. For notational convenience, let's write $p_i = \Pr[X = i]$, for $i = 0, 1, 2, \dots$ From the definition of expectation, we have

$$\begin{aligned}
E(X) &= (0 \cdot p_0) + (1 \cdot p_1) + (2 \cdot p_2) + (3 \cdot p_3) + (4 \cdot p_4) + \cdots \\
&= p_1 + (p_2 + p_2) + (p_3 + p_3 + p_3) + (p_4 + p_4 + p_4 + p_4) + \cdots \\
&= (p_1 + p_2 + p_3 + p_4 + \cdots) + (p_2 + p_3 + p_4 + \cdots) + (p_3 + p_4 + \cdots) + (p_4 + \cdots) + \cdots \\
&= \Pr[X \ge 1] + \Pr[X \ge 2] + \Pr[X \ge 3] + \Pr[X \ge 4] + \cdots.
\end{aligned}$$

In the third line, we have regrouped the terms into convenient infinite sums, and each infinite sum is exactly the probability that $X \ge i$ for each $i$. You should check that you understand how the fourth line follows from the third.

Let us repeat the proof more formally, this time using more compact mathematical notation:

$$E(X) = \sum_{j=1}^{\infty} j \Pr[X = j] = \sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr[X = j] = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr[X = j] = \sum_{i=1}^{\infty} \Pr[X \ge i].$$

We can now use Theorem 19.1 to compute $E(X)$ more easily.

Theorem 19.2. For a geometric random variable $X \sim \mathrm{Geom}(p)$, we have $E(X) = \frac{1}{p}$.

Proof. The key observation is that for a geometric random variable $X$,

$$\Pr[X \ge i] = (1-p)^{i-1} \quad \text{for } i = 1, 2, \dots \tag{1}$$

We can obtain this simply by summing $\Pr[X = j]$ for $j \ge i$. Another way of seeing this is to note that the event $X \ge i$ means at least $i$ tosses are required. This is equivalent to saying that the first $i-1$ tosses are all Tails, and the probability of this event is precisely $(1-p)^{i-1}$.
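Before applying Theorem 19.1, we can convince ourselves numerically that the two formulas for $E(X)$ agree. This added sketch truncates both infinite sums at a large cutoff (a safe approximation, since the terms decay geometrically):

```python
p, cutoff = 0.3, 10_000  # truncation point; the omitted tail is negligible

# Direct definition: E(X) = sum of i * Pr[X = i]
direct = sum(i * (1 - p) ** (i - 1) * p for i in range(1, cutoff))

# Tail-sum formula (Theorem 19.1): E(X) = sum of Pr[X >= i] = sum of (1-p)^(i-1)
tail = sum((1 - p) ** (i - 1) for i in range(1, cutoff))

print(direct, tail, 1 / p)  # all three print ~3.3333
```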

Now, plugging equation (1) into Theorem 19.1, we get

$$E(X) = \sum_{i=1}^{\infty} \Pr[X \ge i] = \sum_{i=1}^{\infty} (1-p)^{i-1} = \frac{1}{1-(1-p)} = \frac{1}{p},$$

where we have used the formula for geometric series.

So, the expected number of tosses of a biased coin until the first Head appears is $\frac{1}{p}$. Intuitively, if in each coin toss we expect to get $p$ Heads, then we need to toss the coin $\frac{1}{p}$ times to get 1 Head. So for a fair coin, the expected number of tosses is 2, but remember that the actual number of coin tosses that we need can be any positive integer.

Remark: Another way of deriving $E(X) = \frac{1}{p}$ is to use the interpretation of a geometric random variable $X$ as the number of coin tosses until we get a Head. Consider what happens in the first coin toss: If the first toss comes up Heads, then $X = 1$. Otherwise, we have used one toss, and we repeat the coin tossing process again; the number of coin tosses after the first toss is again a geometric random variable with parameter $p$. Therefore, we can calculate:

$$E(X) = \underbrace{p \cdot 1}_{\text{first toss is H}} + \underbrace{(1-p) \cdot (1 + E(X))}_{\text{first toss is T, then toss again}}.$$

Solving for $E(X)$ yields $E(X) = \frac{1}{p}$, as claimed.

Let us now compute the variance of $X$.

Theorem 19.3. For a geometric random variable $X \sim \mathrm{Geom}(p)$, we have $\mathrm{Var}(X) = \frac{1-p}{p^2}$.

Proof. We will show that $E(X(X-1)) = \frac{2(1-p)}{p^2}$. Since we already know $E(X) = \frac{1}{p}$, this will imply the desired result:

$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = E(X(X-1)) + E(X) - E(X)^2 = \frac{2(1-p)}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{2(1-p) + p - 1}{p^2} = \frac{1-p}{p^2}.$$

Now to show $E(X(X-1)) = \frac{2(1-p)}{p^2}$, we begin with the following identity of geometric series:

$$\sum_{i=0}^{\infty} (1-p)^i = \frac{1}{p}.$$

Differentiating the identity above with respect to $p$ yields (the $i = 0$ term is equal to 0 so we omit it):

$$-\sum_{i=1}^{\infty} i (1-p)^{i-1} = -\frac{1}{p^2}.$$

Differentiating both sides with respect to $p$ again gives us (the $i = 1$ term is equal to 0 so we omit it):

$$\sum_{i=2}^{\infty} i(i-1)(1-p)^{i-2} = \frac{2}{p^3}. \tag{2}$$

Now using the geometric distribution of $X$ and identity (2), we can calculate:

$$\begin{aligned}
E(X(X-1)) &= \sum_{i=1}^{\infty} i(i-1) \Pr[X = i] \\
&= \sum_{i=2}^{\infty} i(i-1)(1-p)^{i-1} p \quad \text{(the $i = 1$ term is equal to 0 so we omit it)} \\
&= p(1-p) \sum_{i=2}^{\infty} i(i-1)(1-p)^{i-2} \\
&= p(1-p) \cdot \frac{2}{p^3} \quad \text{(using identity (2))} \\
&= \frac{2(1-p)}{p^2},
\end{aligned}$$

as desired.
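To double-check Theorems 19.2 and 19.3, we can estimate the mean and variance of $\mathrm{Geom}(p)$ from samples. This added sketch assumes NumPy is available; its geometric sampler counts trials up to and including the first success, the same convention as Definition 19.1:

```python
import numpy as np

rng = np.random.default_rng(0)
p, trials = 0.25, 200_000
xs = rng.geometric(p, size=trials)  # samples of X ~ Geom(p), values in {1, 2, ...}

print(f"mean: {xs.mean():.3f} (theory 1/p = {1/p:.3f})")
print(f"variance: {xs.var():.3f} (theory (1-p)/p^2 = {(1-p)/p**2:.3f})")
```

For $p = 0.25$ the printed values should be close to 4 and 12, respectively.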

Application: The Coupon Collector's Problem

Suppose we are trying to collect a set of $n$ different baseball cards. We get the cards by buying boxes of cereal: each box contains exactly one card, and it is equally likely to be any of the $n$ cards. How many boxes do we need to buy until we have collected at least one copy of every card?

Let $X$ denote the number of boxes we need to buy in order to collect all $n$ cards. The distribution of $X$ is difficult to compute directly (try it for $n = 3$). But if we are only interested in its expected value $E(X)$, then we can evaluate it easily using linearity of expectation and what we have just learned about the geometric distribution.

As usual, we start by writing

$$X = X_1 + X_2 + \cdots + X_n \tag{3}$$

for suitable simple random variables $X_i$. What should the $X_i$ be? Naturally, $X_i$ is the number of boxes we buy while trying to get the $i$-th new card (starting immediately after we've got the $(i-1)$-st new card). With this definition, make sure you believe equation (3) before proceeding.

What does the distribution of $X_i$ look like? Well, $X_1$ is trivial: no matter what happens, we always get a new card in the first box (since we have none to start with). So $\Pr[X_1 = 1] = 1$, and thus $E(X_1) = 1$.

How about $X_2$? Each time we buy a box, we'll get the same old card with probability $\frac{1}{n}$, and a new card with probability $\frac{n-1}{n}$. So we can think of buying boxes as flipping a biased coin with Heads probability $p = \frac{n-1}{n}$; then $X_2$ is just the number of tosses until the first Head appears. So $X_2$ has the geometric distribution with parameter $p = \frac{n-1}{n}$, and $E(X_2) = \frac{n}{n-1}$.

How about $X_3$? This is very similar to $X_2$ except that now we only get a new card with probability $\frac{n-2}{n}$ (since there are now two old ones). So $X_3$ has the geometric distribution with parameter $p = \frac{n-2}{n}$, and $E(X_3) = \frac{n}{n-2}$.

Arguing in the same way, we see that, for $i = 1, 2, \dots, n$, $X_i$ has the geometric distribution with parameter $p = \frac{n-i+1}{n}$, and hence that $E(X_i) = \frac{n}{n-i+1}$.

Finally, applying linearity of expectation to equation (3), we get

$$E(X) = \sum_{i=1}^{n} E(X_i) = \frac{n}{n} + \frac{n}{n-1} + \cdots + \frac{n}{2} + \frac{n}{1} = n \sum_{i=1}^{n} \frac{1}{i}. \tag{4}$$

This is an exact expression for $E(X)$. We can obtain a tidier form by noting that the sum in it actually has a very good approximation,¹ namely:

$$\sum_{i=1}^{n} \frac{1}{i} \approx \ln n + \gamma,$$

where $\gamma \approx 0.5772$ is known as Euler's constant.

Thus, the expected number of cereal boxes needed to collect $n$ cards is about $n(\ln n + \gamma)$. This is an excellent approximation to the exact formula (4) even for quite small values of $n$. So for example, for $n = 100$, we expect to buy about 518 boxes.

¹This is another of the little tricks you might like to carry around in your toolbox.
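A short simulation makes the approximation concrete. The sketch below (added for illustration; boxes_to_collect_all is our own helper) runs the cereal-box experiment many times for $n = 100$ and compares the average with $n(\ln n + \gamma) \approx 518$:

```python
import math
import random

def boxes_to_collect_all(n):
    """Buy boxes of cereal until all n distinct cards have been collected."""
    seen, boxes = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))  # each box holds a uniformly random card
        boxes += 1
    return boxes

n, runs = 100, 2_000
avg = sum(boxes_to_collect_all(n) for _ in range(runs)) / runs
approx = n * (math.log(n) + 0.5772)  # n(ln n + gamma)
print(f"simulated E(X): {avg:.1f}, approximation n(ln n + gamma): {approx:.1f}")
```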

Poisson Distribution

Consider the number of clicks of a Geiger counter, which measures radioactive emissions. The average number of such clicks per unit time, $\lambda$, is a measure of radioactivity, but the actual number of clicks fluctuates according to a certain distribution called the Poisson distribution. What is remarkable is that the average value, $\lambda$, completely determines the probability distribution on the number of clicks $X$.

Definition 19.2 (Poisson distribution). A random variable $X$ for which

$$\Pr[X = i] = \frac{\lambda^i}{i!} e^{-\lambda} \quad \text{for } i = 0, 1, 2, \dots \tag{5}$$

is said to have the Poisson distribution with parameter $\lambda$. This is abbreviated as $X \sim \mathrm{Poiss}(\lambda)$.

To make sure this is a valid definition, let us check that (5) is in fact a distribution, i.e., that the probabilities sum to 1. We have

$$\sum_{i=0}^{\infty} \frac{\lambda^i}{i!} e^{-\lambda} = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = e^{-\lambda} \cdot e^{\lambda} = 1.$$

In the second-to-last step, we used the Taylor series expansion $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$.

The Poisson distribution is also a very widely accepted model for so-called "rare events," such as misconnected phone calls, radioactive emissions, crossovers in chromosomes, the number of cases of disease, the number of births per hour, etc. This model is appropriate whenever the occurrences can be assumed to happen randomly with some constant density in a continuous region (of time or space), such that occurrences in disjoint subregions are independent. One can then show that the number of occurrences in a region of unit size should obey the Poisson distribution with parameter $\lambda$.

Example: Suppose when we write an article, we make an average of 1 typo per page. We can model this with a Poisson random variable $X$ with $\lambda = 1$. So the probability that a page has 5 typos is

$$\Pr[X = 5] = \frac{1^5}{5!} e^{-1} = \frac{1}{120e} \approx \frac{1}{326}.$$

Now suppose the article has 200 pages. If we assume the number of typos in each page is independent, then the probability that there is at least one page with exactly 5 typos is

$$\begin{aligned}
\Pr[\exists \text{ a page with exactly 5 typos}] &= 1 - \Pr[\text{every page has} \ne 5 \text{ typos}] \\
&= 1 - \prod_{k=1}^{200} \Pr[\text{page } k \text{ has} \ne 5 \text{ typos}] \\
&= 1 - \prod_{k=1}^{200} \left(1 - \Pr[\text{page } k \text{ has exactly 5 typos}]\right) \\
&= 1 - \left(1 - \frac{1}{120e}\right)^{200},
\end{aligned}$$

where in the last step we have used our earlier calculation for $\Pr[X = 5]$.

Let us now calculate the expectation and variance of a Poisson random variable. As we noted before, the expected value is simply $\lambda$. Here we see that the variance is also equal to $\lambda$.

Theorem 19.4. For a Poisson random variable $X \sim \mathrm{Poiss}(\lambda)$, we have $E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$.

Proof. We can calculate $E(X)$ directly from the definition of expectation:

$$\begin{aligned}
E(X) &= \sum_{i=0}^{\infty} i \Pr[X = i] \\
&= \sum_{i=1}^{\infty} i \, \frac{\lambda^i}{i!} e^{-\lambda} \quad \text{(the $i = 0$ term is equal to 0 so we omit it)} \\
&= \lambda e^{-\lambda} \sum_{i=1}^{\infty} \frac{\lambda^{i-1}}{(i-1)!} \\
&= \lambda e^{-\lambda} e^{\lambda} \quad \text{(since $e^{\lambda} = \sum_{j=0}^{\infty} \frac{\lambda^j}{j!}$ with $j = i-1$)} \\
&= \lambda.
\end{aligned}$$

Similarly, we can calculate $E(X(X-1))$ as follows:

$$\begin{aligned}
E(X(X-1)) &= \sum_{i=0}^{\infty} i(i-1) \Pr[X = i] \\
&= \sum_{i=2}^{\infty} i(i-1) \, \frac{\lambda^i}{i!} e^{-\lambda} \quad \text{(the $i = 0$ and $i = 1$ terms are equal to 0 so we omit them)} \\
&= \lambda^2 e^{-\lambda} \sum_{i=2}^{\infty} \frac{\lambda^{i-2}}{(i-2)!} \\
&= \lambda^2 e^{-\lambda} e^{\lambda} \quad \text{(since $e^{\lambda} = \sum_{j=0}^{\infty} \frac{\lambda^j}{j!}$ with $j = i-2$)} \\
&= \lambda^2.
\end{aligned}$$

Therefore,

$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = E(X(X-1)) + E(X) - E(X)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda,$$

as desired.
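As a numeric check of Definition 19.2 and Theorem 19.4 (an added sketch; the infinite sums are truncated at a generous cutoff), we can verify that the pmf sums to 1 and that both the mean and variance equal $\lambda$:

```python
import math

def poisson_pmf(lam, i):
    return lam**i / math.factorial(i) * math.exp(-lam)

lam, cutoff = 5.0, 200  # terms beyond the cutoff are vanishingly small
total = sum(poisson_pmf(lam, i) for i in range(cutoff))
mean = sum(i * poisson_pmf(lam, i) for i in range(cutoff))
var = sum(i**2 * poisson_pmf(lam, i) for i in range(cutoff)) - mean**2

print(f"total probability: {total:.6f}")         # 1.000000
print(f"mean: {mean:.4f}, variance: {var:.4f}")  # both ~5.0 = lambda
```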

A plot of the Poisson distribution reveals a curve that rises monotonically to a single peak and then decreases monotonically. The peak is as close as possible to the expected value, i.e., at $i = \lfloor \lambda \rfloor$. Figure 2 shows an example for $\lambda = 5$.

Figure 2: The Poisson distribution with $\lambda = 5$.

Poisson and Coin Flips

To see a concrete example of how the Poisson distribution arises, suppose we want to model the number of cell phone users initiating calls in a network during a time period, of duration (say) 1 minute. There are many customers in the network, and all of them can potentially make a call during this time period. However, only a very small fraction of them actually will. Under this scenario, it seems reasonable to make two assumptions:

- The probability of having more than 1 customer initiating a call in any small time interval is negligible.
- The initiation of calls in disjoint time intervals are independent events.

Then if we divide the one-minute time period into $n$ disjoint intervals, the number of calls $X$ in that time period can be modeled as a binomial random variable with parameter $n$ and probability of success $p$, i.e., $p$ is the probability of having a call initiated in a time interval of length $1/n$. But what should $p$ be in terms of the relevant parameters of the problem? If calls are initiated at an average rate of $\lambda$ calls per minute, then $E(X) = \lambda$ and so $np = \lambda$, i.e., $p = \lambda/n$. So $X \sim \mathrm{Bin}(n, \frac{\lambda}{n})$. As we shall see below, as we let $n$ tend to infinity, this distribution tends to the Poisson distribution with parameter $\lambda$. We can also see why the Poisson distribution is a model for rare events: we are thinking of it as a sequence of a large number, $n$, of coin flips, where we expect only a finite number $\lambda$ of Heads.

Now we will prove that the Poisson distribution $\mathrm{Poiss}(\lambda)$ is the limit of the binomial distribution $\mathrm{Bin}(n, \frac{\lambda}{n})$, as $n$ tends to infinity.

Theorem 19.5. Let $X \sim \mathrm{Bin}(n, \frac{\lambda}{n})$ where $\lambda > 0$ is a fixed constant. Then for every $i = 0, 1, 2, \dots$,

$$\Pr[X = i] \to \frac{\lambda^i}{i!} e^{-\lambda} \quad \text{as } n \to \infty.$$

That is, the probability distribution of $X$ converges to the Poisson distribution with parameter $\lambda$.

Proof. Fix $i \in \{0, 1, 2, \dots\}$, and assume $n \ge i$ (because we will let $n \to \infty$). Then, because $X$ has binomial distribution with parameter $n$ and $p = \frac{\lambda}{n}$,

$$\Pr[X = i] = \binom{n}{i} p^i (1-p)^{n-i} = \frac{n!}{i!(n-i)!} \left(\frac{\lambda}{n}\right)^i \left(1 - \frac{\lambda}{n}\right)^{n-i}.$$

Let us collect the factors into

$$\Pr[X = i] = \frac{\lambda^i}{i!} \left(\frac{n!}{(n-i)!} \cdot \frac{1}{n^i}\right) \left(1 - \frac{\lambda}{n}\right)^n \left(1 - \frac{\lambda}{n}\right)^{-i}. \tag{6}$$

The first parenthesis above becomes, as $n \to \infty$,

$$\frac{n!}{(n-i)!} \cdot \frac{1}{n^i} = \frac{n(n-1)\cdots(n-i+1)}{n^i} = \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-i+1}{n} \to 1.$$

From calculus, the second parenthesis in (6) becomes, as $n \to \infty$,

$$\left(1 - \frac{\lambda}{n}\right)^n \to e^{-\lambda}.$$

And since $i$ is fixed, the third parenthesis in (6) becomes, as $n \to \infty$,

$$\left(1 - \frac{\lambda}{n}\right)^{-i} \to (1-0)^{-i} = 1.$$

Substituting these results back into (6) gives us

$$\Pr[X = i] \to \frac{\lambda^i}{i!} \cdot 1 \cdot e^{-\lambda} \cdot 1 = \frac{\lambda^i}{i!} e^{-\lambda},$$

as desired.
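To watch the convergence in Theorem 19.5 happen numerically, this added sketch evaluates the $\mathrm{Bin}(n, \lambda/n)$ pmf at a fixed $i$ for increasing $n$ and compares it with the $\mathrm{Poiss}(\lambda)$ limit:

```python
import math

def binomial_pmf(n, p, i):
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_pmf(lam, i):
    return lam**i / math.factorial(i) * math.exp(-lam)

lam, i = 5.0, 3
for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:6d}: Bin(n, lam/n) pmf at {i} is {binomial_pmf(n, lam/n, i):.6f}")
print(f"limit:      Poiss(lam) pmf at {i} is {poisson_pmf(lam, i):.6f}")
```

The binomial values approach $\frac{\lambda^i}{i!} e^{-\lambda} \approx 0.1404$ as $n$ grows.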

Proof. Fix i {0,1,2,...}, ad assume i (because we will let ). The, because X has biomial distributio with parameter ad p λ, Pr[X i] Let us collect the factors ito Pr[X i] λ i The first parethesis above becomes, as, ( ) ( ) p i (1 p) i! λ i ( 1 λ ) i. i i!( i)!! ( i)! 1 ( 1) ( i + 1) ( i)! i ( i)! i! (! ( i)! 1 ) ( i 1 λ ) ( 1 λ ) i. (6) 1 i From calculus, the secod parethesis i (6) becomes, as, ( 1 λ ) e λ. Ad sice i is fixed, the third parethesis i (6) becomes, as, Substitutig these results back to (6) gives us ( 1 λ ) i (1 0) i 1. ( 1) ( i + 1) 1. as desired. Pr[X i] λ i i! 1 e λ 1 λ i i! e λ, CS 70, Sprig 2016, Note 19 8