Approximations and more PMFs and PDFs

Saad Mneimeh

1 Approximation of binomial with Poisson

Consider the binomial distribution

b(k, n, p) = \binom{n}{k} p^k (1-p)^{n-k}, \quad 0 ≤ k ≤ n

Assume that n is large and p is small, but np → λ at the limit. For a fixed k:

b(0, n, p) = (1-p)^n = (1 - λ/n)^n → e^{-λ}

b(1, n, p) = np(1-p)^{n-1} = \frac{np}{1-p} b(0, n, p) = \frac{λ}{1 - λ/n} b(0, n, p) → λ e^{-λ}

b(2, n, p) = \binom{n}{2} p^2 (1-p)^{n-2} = \frac{n(n-1)p^2}{2(1-p)^2} b(0, n, p) ≈ \frac{(n-1)λ^2}{2n} b(0, n, p) ≈ \frac{λ^2}{2} (1 - λ/n)^n → \frac{λ^2}{2} e^{-λ}

Continuing this way, we find that (when n is large)

b(k, n, p) ≈ p(k, λ) = \frac{λ^k e^{-λ}}{k!}

where λ = np. While p(k, λ) represents an approximation for the binomial probability b(k, n, p), p(k, λ) is a probability mass function on its own, known as the Poisson mass function. Note that

p(k, λ) = \frac{λ^k e^{-λ}}{k!}, \quad k ≥ 0
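The notes contain no code, but the approximation above is easy to check numerically. The following minimal Python sketch (parameter values n = 1000, p = 0.002 chosen only for illustration) compares the exact binomial probabilities with the Poisson mass function for λ = np:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Exact binomial probability b(k, n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson mass function p(k, lambda) = lambda^k e^{-lambda} / k!."""
    return lam**k * exp(-lam) / factorial(k)

n, p = 1000, 0.002          # n large, p small (illustrative values)
lam = n * p                 # lambda = np = 2
for k in range(6):
    b = binom_pmf(k, n, p)
    po = poisson_pmf(k, lam)
    print(f"k={k}  binomial={b:.6f}  poisson={po:.6f}")
```

The two columns agree to about three decimal places, consistent with the derivation above.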
\sum_{k=0}^{∞} p(k, λ) = e^{-λ} \sum_{k=0}^{∞} \frac{λ^k}{k!} = e^{-λ} e^{λ} = 1

What is the probability that among 500 people, exactly k will have their birthday on new year's day? If the people are chosen at random, then this is n = 500 Bernoulli trials with p = 1/365. Then, λ = 500/365 = 1.3699. For different values of k, we have:

k    p(k, λ)    b(k, n, p)
0    .2541      .2537
1    .3481      .3484
2    .2385      .2388
3    .1089      .1089
4    .0373      .0372
5    .0102      .0101
6    .0023      .0023

2 The exponential distribution

Many statistical observations actually lead to a Poisson random variable. For instance, consider a sequence of random events occurring over time. If we divide time into small intervals of length 1/n (n is large), then:

- the probability of an event occurring within an interval becomes small, call this p_n
- the probability that more than one event occurs in an interval is negligible

Assuming that events are statistically independent, fix an interval of time t and observe that it contains approximately nt small intervals of length 1/n. Therefore, the probability of k events occurring during t is at the limit b(k, nt, p_n). Now we conceive that np_n → λ. (We can show that p_n < 2p_{2n}. The probability that no events occur in a small interval of length 1/n is the probability that no events occur in either of its two halves of length 1/(2n). Therefore, 1 - p_n = (1 - p_{2n})^2 = 1 - 2p_{2n} + p_{2n}^2. This means p_n = 2p_{2n} - p_{2n}^2 and thus p_n < 2p_{2n}. In addition, p_n not going to 0 is not a sensible situation, since this would imply infinitely many occurrences even in the smallest intervals.) Because the expected number of events during t is (nt)p_n → tλ, we have (assuming np_n → λ):

b(k, nt, p_n) = b(k, nt, λt/nt) → \frac{(λt)^k e^{-λt}}{k!}

where λ in the Poisson approximation is replaced with λt. We will see that λ can be interpreted as a rate. Consider the time T until the occurrence of the first event. Observe that P(T ≤ t) is the probability that at least one event occurs during time interval t. Therefore,
P(T ≤ t) = 1 - p(0, λt) = 1 - e^{-λt}

But

P(T ≤ t) = \int_0^t f_T(z) dz = 1 - e^{-λt}

where f_T(t) is the probability density function (continuous) for T. If we differentiate both sides we get:

\frac{d}{dt} \int_0^t f_T(z) dz = \frac{d}{dt} (1 - e^{-λt})

f_T(t) = λ e^{-λt}

This is known as the exponential distribution. We say that T is exponentially distributed. Now we can compute the expected value of T (recall this is the time until the occurrence of the first event):

E[T] = \int_0^{∞} t λ e^{-λt} dt = \frac{1}{λ}

Therefore, λ can be interpreted as the rate of occurrence.

Example: Suppose that bus arrival is exponentially distributed. Given that you have waited for some time τ, what is the probability that you will wait an additional time t before the bus arrives? Using Bayes' rule:

P(T - τ > t \mid T > τ) = P(T > t + τ \mid T > τ) = \frac{P(T > τ \mid T > t + τ) P(T > t + τ)}{P(T > τ)} = \frac{P(T > t + τ)}{P(T > τ)}

= \frac{e^{-λ(t+τ)}}{e^{-λτ}} = e^{-λt} = P(T > t)

This can be interpreted as follows: the fact that you have waited for some time does not mean that you will wait less for the bus to arrive. Of course, this is not true if the bus is running on a fixed schedule. In fact, this is a property only of the exponential distribution, known as the memoryless property:

P(T - τ > t \mid T > τ) = P(T > t)

The same result can be obtained if we work directly with the density f_T(t). For this, we need to use Bayes' rule with a mix of probabilities and densities. To see this, recall that for a given continuous random variable X and a small δ, P(x ≤ X ≤ x + δ) ≈ δ f_X(x). Therefore, given an event E,

P(x ≤ X ≤ x + δ \mid E) = \frac{P(E \mid x ≤ X ≤ x + δ) P(x ≤ X ≤ x + δ)}{P(E)}
≈ \frac{P(E \mid X = x) \, δ f_X(x)}{P(E)}

By taking the limit as δ → 0, we have

f_X(x \mid E) = \frac{P(E \mid X = x) f_X(x)}{P(E)} = \frac{P(E \mid X = x) f_X(x)}{\int P(E \mid X = x) f_X(x) dx}

With a simplified notation:

f(x \mid E) = \frac{P(E \mid X = x) f(x)}{\int P(E \mid X = x) f(x) dx}

and similarly,

P(E \mid X = x) = \frac{f(x \mid E) P(E)}{f(x \mid E) P(E) + f(x \mid E^c)[1 - P(E)]}

Going back to the memoryless property:

f(t \mid T > τ) = \frac{P(T > τ \mid T = t) f(t)}{P(T > τ)}

Since P(T > τ \mid T = t) is 0 when t ≤ τ and 1 otherwise, we get

f(t \mid T > τ) = \begin{cases} 0 & t ≤ τ \\ λ e^{-λ(t - τ)} & t > τ \end{cases}

Another example: Suppose y is the time before the first occurrence of a radioactive decay which is measured by an instrument, but that, because there is a delay built into the mechanism, the decay is recorded as having taken place at x > y. We actually have a value of x, but would like to say what we can about the value of y on the basis of this observation. Assume:

f(y) = e^{-y}, \quad y ≥ 0

f(x \mid y) = k e^{-k(x-y)}, \quad x ≥ y

Using Bayes' rule,

f(y \mid x) = \frac{f(x \mid y) f(y)}{f(x)} = \frac{k e^{-k(x-y)} e^{-y}}{f(x)} = \frac{k e^{-kx} e^{(k-1)y}}{f(x)}

Now f(x) can be computed as

f(x) = \int_0^x f(x, y) dy = \int_0^x k e^{-kx} e^{(k-1)y} dy = \frac{k e^{-kx}}{k-1} e^{(k-1)y} \Big|_0^x = \frac{k e^{-kx}}{k-1} (e^{(k-1)x} - 1)

Therefore,

f(y \mid x) = \frac{(k-1) e^{(k-1)y}}{e^{(k-1)x} - 1}, \quad 0 ≤ y ≤ x
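A quick sanity check on the posterior density just derived: it should integrate to 1 over [0, x]. The sketch below (plain Python, with arbitrary illustrative values k = 3 and x = 2, and a simple midpoint-rule integrator) verifies this numerically:

```python
from math import exp

def posterior_density(y, x, k):
    """f(y | x) = (k-1) e^{(k-1)y} / (e^{(k-1)x} - 1) for 0 <= y <= x, k > 1."""
    return (k - 1) * exp((k - 1) * y) / (exp((k - 1) * x) - 1)

def integrate(f, a, b, steps=100000):
    """Midpoint-rule numerical integration of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

k, x = 3.0, 2.0   # illustrative values with k > 1
total = integrate(lambda y: posterior_density(y, x, k), 0.0, x)
print(total)       # close to 1
```

The density is also increasing in y on [0, x], reflecting that, given a recorded time x, the actual decay is more likely to have occurred shortly before the recording.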
3 DeMoivre-Laplace approximation

Consider a binomial random variable with parameters n and p, and let q = 1 - p. The DeMoivre-Laplace result says that if (k - np)^3 / n^2 → 0 (intuitively, k does not deviate much from np) as n → ∞, then:

b(k, n, p) ≈ \frac{1}{\sqrt{2π npq}} e^{-\frac{(k-np)^2}{2npq}} = \frac{1}{\sqrt{npq}} φ\left(\frac{k - np}{\sqrt{npq}}\right)

where φ(x) = \frac{1}{\sqrt{2π}} e^{-x^2/2}. This result can be obtained using Stirling's approximation for the factorials in the binomial coefficients. With some additional work, it can also lead to (assuming both a and b satisfy the condition above):

\sum_{k=a}^{b} b(k, n, p) ≈ \sum_{k=a}^{b} \frac{1}{\sqrt{npq}} φ\left(\frac{k - np}{\sqrt{npq}}\right) ≈ Φ\left(\frac{b + .5 - np}{\sqrt{npq}}\right) - Φ\left(\frac{a - .5 - np}{\sqrt{npq}}\right)

where Φ(x) = \int_{-∞}^{x} φ(y) dy. Note that b(k, n, p) is the probability of k successes among n independent Bernoulli trials. Therefore, b(k, n, p) = P(S_n = k), where S_n = X_1 + X_2 + ... + X_n, and X_i is a Bernoulli random variable.

P(a ≤ S_n ≤ b) ≈ Φ\left(\frac{b + .5 - np}{\sqrt{npq}}\right) - Φ\left(\frac{a - .5 - np}{\sqrt{npq}}\right)

Example: Toss a fair coin 200 times. What is the probability that the number of heads will be between 95 and 105? That's b(95, 200, .5) + ... + b(105, 200, .5), but this is a cumbersome computation. Using the approximation we have a shortcut:

P(95 ≤ S_{200} ≤ 105) ≈ Φ\left(\frac{105 + .5 - 100}{\sqrt{50}}\right) - Φ\left(\frac{95 - .5 - 100}{\sqrt{50}}\right) = Φ\left(\frac{5.5}{\sqrt{50}}\right) - Φ\left(\frac{-5.5}{\sqrt{50}}\right) = .56331... \text{ (from tables)}

The actual answer is .56325...

If instead of considering S_n, we consider

S_n^* = \frac{S_n - np}{\sqrt{npq}}

then for arbitrary fixed a and b (why?), the DeMoivre-Laplace result yields:

P(a ≤ S_n^* ≤ b) → Φ(b) - Φ(a)

This is a special case of the more general Central Limit Theorem.
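The coin example above can be reproduced without tables. The following Python sketch computes both the exact binomial sum and the continuity-corrected approximation, using math.erf to evaluate Φ (via Φ(x) = (1 + erf(x/√2))/2):

```python
from math import comb, erf, sqrt

def Phi(x):
    """Standard normal CDF expressed via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p = 200, 0.5
q = 1 - p
a, b = 95, 105

# Exact: sum of binomial probabilities b(95, 200, .5) + ... + b(105, 200, .5)
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(a, b + 1))

# DeMoivre-Laplace approximation with the .5 continuity correction
mu, sd = n * p, sqrt(n * p * q)
approx = Phi((b + 0.5 - mu) / sd) - Phi((a - 0.5 - mu) / sd)

print(f"exact  = {exact:.5f}")
print(f"approx = {approx:.5f}")
```

Both values come out near .5633, matching the numbers quoted in the text to about four decimal places.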
4 The Gaussian (Normal) distribution

Note that np is the mean of the binomial distribution, and npq its variance. The DeMoivre-Laplace approximation can be generalized to the following expression:

\frac{1}{\sqrt{2π} σ} e^{-\frac{(x-µ)^2}{2σ^2}}

where µ is a mean and σ^2 is a variance. This expression then represents a density function on its own, known as the Gaussian or normal density. An important property of this density is that a linear combination of independent normal random variables gives a normal random variable. In other words, let X = a_1 X_1 + a_2 X_2 + ... + a_n X_n, where X_i is a normal random variable with mean µ_i and variance σ_i^2. Then X is a normal random variable with mean µ = \sum_i a_i µ_i and variance \sum_i a_i^2 σ_i^2. This can be shown by finding the transform of X, E[e^{sX}] (see below), which is equal to \prod_i E[e^{s a_i X_i}] because the X_i's are independent. Then confirm that this transform is the transform of a normal random variable with the desired mean and variance. More importantly, the Gaussian distribution is at the heart of the most celebrated result in probability theory, the Central Limit Theorem.

5 The Central Limit Theorem

Let X_1, X_2, ..., X_n be independent identically distributed random variables, with finite mean µ and finite variance σ^2. Define S_n = X_1 + X_2 + ... + X_n. Then, as n goes to infinity,

P\left(a ≤ \frac{S_n - nµ}{σ\sqrt{n}} ≤ b\right) → Φ(b) - Φ(a)

Example: Consider a coin with probability p of getting a head. We toss the coin n times. Let S_n be the number of heads. How large should n be to guarantee that:

P\left(\left|\frac{S_n}{n} - p\right| ≤ .1\right) ≥ .95

We would like |S_n - np| ≤ .1n with probability at least .95. Let σ^2 be the variance of the Bernoulli trial corresponding to a coin toss. We need

\left|\frac{S_n - np}{σ\sqrt{n}}\right| ≤ \frac{.1\sqrt{n}}{σ}

with probability at least .95. Since σ^2 = p(1-p), σ^2 has a maximum of 1/4, so σ ≤ 1/2. So it should be enough to guarantee

\left|\frac{S_n - np}{σ\sqrt{n}}\right| ≤ .2\sqrt{n}

with probability at least .95. Using the Central Limit Theorem, we need

Φ(.2\sqrt{n}) - Φ(-.2\sqrt{n}) ≥ .95

From the tables, Φ(.2\sqrt{n}) ≥ .975 requires .2\sqrt{n} ≥ 1.96, so n ≥ 96.04, i.e. n = 97 suffices. Note that this is an approximation, since n is not really infinity; moreover, the bound .2\sqrt{n} is not fixed and
depends on n. A better guarantee, but less practical, is to use Chebyshev's inequality (which can also prove the weak law of large numbers):

P\left(\left|\frac{S_n}{n} - p\right| ≥ ε\right) ≤ \frac{σ^2}{n ε^2}

Therefore, we need n ≥ 1/(4 × .1^2 × .05), or n ≥ 500.

Another example: Law of large numbers (weak form). Consider the following probability:

P\left(\left|\frac{S_n}{n} - p\right| ≤ ε\right) = P\left(\left|\frac{S_n - np}{σ\sqrt{n}}\right| ≤ \frac{ε\sqrt{n}}{σ}\right) ≈ Φ\left(\frac{ε\sqrt{n}}{σ}\right) - Φ\left(-\frac{ε\sqrt{n}}{σ}\right)

Therefore, for any fixed ε, as n → ∞,

P\left(\left|\frac{S_n}{n} - p\right| ≤ ε\right) → Φ(∞) - Φ(-∞) = 1

We say that S_n/n → p in probability. This is our intuitive notion that as n increases, the average number of successes is close to the probability of success (1/2 if the coin is fair, for instance).

Here's a sketch of a proof for the Central Limit Theorem: Given a random variable X, consider the transform E[e^{sX}] (a function of s). The transform uniquely defines the PDF. The transform of a Gaussian (normal) random variable with mean 0 and variance 1 can be easily computed to be e^{s^2/2}. Now consider the transform of \frac{S_n - nµ}{σ\sqrt{n}}:

E\left[e^{s\left(\frac{X_1 - µ}{σ\sqrt{n}} + ... + \frac{X_n - µ}{σ\sqrt{n}}\right)}\right] = E\left[e^{s \frac{X_1 - µ}{σ\sqrt{n}}} \cdots e^{s \frac{X_n - µ}{σ\sqrt{n}}}\right]

Since the X_i are independent and identically distributed, we get:

E\left[e^{s \frac{X - µ}{σ\sqrt{n}}}\right]^n

Note that for a given s, s/\sqrt{n} → 0; therefore,

e^{s \frac{X - µ}{σ\sqrt{n}}} = 1 + s \frac{X - µ}{σ\sqrt{n}} + \frac{s^2 (X - µ)^2}{2σ^2 n} + o(s^2/n)

where o(s^2/n) / (s^2/n) → 0. Therefore,

E\left[e^{s \frac{X - µ}{σ\sqrt{n}}}\right]^n = E\left[1 + s \frac{X - µ}{σ\sqrt{n}} + \frac{s^2 (X - µ)^2}{2σ^2 n} + o(s^2/n)\right]^n = \left(1 + 0 + \frac{s^2}{2n} + o(s^2/n)\right)^n

= \left(1 + s^2/2n + o(s^2/n)\right)^n → e^{s^2/2 + s^2 \cdot \frac{o(s^2/n)}{s^2/n}} → e^{s^2/2}

This is not a rigorous proof because it assumes that the transform exists (which is not necessarily true). But it captures the essence of a proof. A more rigorous proof based on a similar transform is possible.
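The theorem itself is easy to see empirically. The following Python sketch (illustrative choices: Uniform(0, 1) summands, n = 500, 10000 trials, fixed seed) standardizes S_n and compares the empirical probability of landing in [a, b] with Φ(b) - Φ(a):

```python
import random
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

random.seed(0)
n, trials = 500, 10000
mu, sigma = 0.5, sqrt(1 / 12)   # mean and std of Uniform(0, 1)
a, b = -1.0, 1.0

hits = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))     # S_n
    z = (s - n * mu) / (sigma * sqrt(n))           # standardized S_n
    if a <= z <= b:
        hits += 1

empirical = hits / trials
theoretical = Phi(b) - Phi(a)   # about 0.6827 for [-1, 1]
print(empirical, theoretical)
```

The empirical frequency lands within sampling error of Φ(1) - Φ(-1), even though the summands are uniform rather than normal, which is exactly the point of the theorem.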