Poisson approximations

Statistics 241: 8 October 2005. © David Pollard

The Bin(n, p) can be thought of as the distribution of a sum of independent indicator random variables $X_1 + \dots + X_n$, with $\{X_i = 1\}$ denoting a head on the $i$th toss of a coin. The normal approximation to the Binomial works best when the variance $np(1-p)$ is large, for then each of the standardized summands $(X_i - p)/\sqrt{np(1-p)}$ makes a relatively small contribution to the standardized sum. When $n$ is large but $p$ is small, in such a way that $np$ is not too large, a different type of approximation to the Binomial is better.

Definition. A random variable $Y$ is said to have a Poisson distribution with parameter $\lambda$ if it can take values in $\mathbb{N}_0$, the set of nonnegative integers, with probabilities

$$P\{Y = k\} = \frac{e^{-\lambda}\lambda^k}{k!} \qquad \text{for } k = 0, 1, 2, \dots$$

The parameter $\lambda$ must be positive. The distribution is denoted by Poisson($\lambda$).

Throughout this Chapter I will use $Q_\lambda$ to denote the Poisson($\lambda$) distribution. That is, $Q_\lambda$ is a probability distribution concentrated on $\mathbb{N}_0$ for which $Q_\lambda\{k\} = e^{-\lambda}\lambda^k/k!$ for $k = 0, 1, 2, \dots$

Example <8.1>: Poisson($np$) approximation to the Bin($n$, $p$).

The Poisson inherits several properties from the Binomial. For example, the Bin($n$, $p$) has expected value $np$ and variance $np(1-p)$. One might suspect that the Poisson($\lambda$) should therefore have expected value $\lambda = \lim_n n(\lambda/n)$ and variance $\lambda = \lim_n n(\lambda/n)(1 - \lambda/n)$. Also, the coin-tossing origins of the Binomial show that if $X$ has a Bin($m$, $p$) distribution and $Y$ has a Bin($n$, $p$) distribution independent of $X$, then $X + Y$ has a Bin($n + m$, $p$) distribution. Putting $\lambda = mp$ and $\mu = np$, one would then suspect that the sum of independent Poisson($\lambda$) and Poisson($\mu$) distributed random variables is Poisson($\lambda + \mu$) distributed.

Example <8.2>: If $X$ has a Poisson($\lambda$) distribution, then $EX = \operatorname{var}(X) = \lambda$. If also $Y$ has a Poisson($\mu$) distribution, and $Y$ is independent of $X$, then $X + Y$ has a Poisson($\lambda + \mu$) distribution.

Counts of rare events, such as the number of atoms undergoing radioactive decay during a short period of time, or the number of aphids on a leaf, are often modelled by Poisson distributions, at least as a first approximation. In some situations it makes sense to think of the counts as the number of successes in a large number of independent trials, with the chance of a success on any particular trial being very small ("rare events"). In such a setting, the Poisson arises as an approximation for a sum of independent counts.
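
The notes contain no code, but Example <8.1> is easy to see numerically. The following minimal Python sketch, added here for illustration (standard library only; the values n = 1000 and p = 0.005 are arbitrary choices, not from the notes), tabulates the Bin(n, p) pmf next to its Poisson(np) approximation.

```python
# Compare the Bin(n, p) pmf with the Poisson(np) pmf for large n, small p.
from math import comb, exp, factorial

def binom_pmf(k: int, n: int, p: float) -> float:
    """P{X = k} for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Q_lam{k} = e^(-lam) * lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.005            # large n, small p; lam = np = 5
for k in range(10):
    print(f"k={k}:  Bin = {binom_pmf(k, n, p):.6f}   Poisson = {poisson_pmf(k, n * p):.6f}")
```

The two columns agree to roughly three decimal places, consistent with the total variation bound discussed below.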

In fact, modern probability methods can handle situations much more general than approximation to the Binomial. For example, suppose $S = X_1 + X_2 + \dots + X_n$, where $X_i$ has a Bin(1, $p_i$) distribution, for constants $p_1, p_2, \dots, p_n$ that are not necessarily all the same. Suppose the $X_i$'s are independent. If the $p_i$'s are not all the same then $S$ does not have a Binomial distribution. Nevertheless, the Chen-Stein method (see Barbour, Holst & Janson 1992 for an extensive discussion of the method) can be used to show that

$$\max_A \big| P\{S \in A\} - Q_\lambda(A) \big| \le \frac{1 - e^{-\lambda}}{\lambda} \sum_{i=1}^n p_i^2, \qquad \text{where } \lambda = p_1 + \dots + p_n.$$

The method of proof is elementary, in the sense that it makes use of probabilistic techniques at the level of Statistics 241, but extremely subtle.

Remark. The maximum here runs over all subsets $A$ of $\mathbb{N}_0$. In fact the maximum is achieved when $A = \{k \in \mathbb{N}_0 : P\{S = k\} \ge Q_\lambda\{k\}\}$, in which case $P\{S \in A\} - Q_\lambda(A)$ equals $\frac{1}{2}\sum_{k=0}^{\infty} \big| P\{S = k\} - Q_\lambda\{k\} \big|$. This quantity is called the total variation distance between $Q_\lambda$ and the distribution of $S$; it gives a very strong control over the errors in the approximation.

Note also that $(1 - e^{-\lambda})/\lambda \le \min(1, 1/\lambda)$. Indeed, the left-hand side is close to 1 when $\lambda \approx 0$ and it behaves like $1/\lambda$ when $\lambda$ is large. When all the $p_i$ are equal to some small $p$, we get a bound on the total variation distance between the Bin($n$, $p$) and the Poisson($np$) smaller than $\min(np^2, p)$. This bound makes precise the traditional advice that the Poisson approximation is good when $p$ is small and $np$ is not too big. (In fact, the tradition was a bit conservative.)

The Poisson approximation also applies in many settings where the trials are almost independent, but not quite. Again the Chen-Stein method delivers impressively good bounds on the errors of approximation. For example, the method works well in two cases where the dependence takes a simple form. Once again suppose $S = X_1 + X_2 + \dots + X_n$, where $X_i$ has a Bin(1, $p_i$) distribution, for constants $p_1, p_2, \dots, p_n$ that are not necessarily all the same. Define $S_i = S - X_i = \sum_{j : j \ne i} X_j$. The random variables $X_1, \dots, X_n$ are said to be positively associated (not standard terminology) if

$$P\{S_i \ge k \mid X_i = 1\} \ge P\{S_i \ge k \mid X_i = 0\} \qquad \text{for each } i \text{ and each } k \in \mathbb{N}_0;$$

they are said to be negatively associated (not standard terminology) if

$$P\{S_i \ge k \mid X_i = 1\} \le P\{S_i \ge k \mid X_i = 0\} \qquad \text{for each } i \text{ and each } k \in \mathbb{N}_0.$$

With some work it can be shown that

$$\max_A \big| P\{S \in A\} - Q_\lambda(A) \big| \le \frac{1 - e^{-\lambda}}{\lambda} \times \begin{cases} 2\sum_{i=1}^n p_i^2 + \operatorname{var}(S) - \lambda & \text{under positive association,} \\ \lambda - \operatorname{var}(S) & \text{under negative association.} \end{cases}$$

These bounds take advantage of the fact that $\operatorname{var}(S)$ would be exactly equal to $\lambda$ if $S$ had a Poisson($\lambda$) distribution.

The next Example illustrates both the classical approach and the Chen-Stein approach (via positive association) to deriving a Poisson approximation for a matching problem.

Example <8.3>: Poisson approximation for a matching problem: assignment of $n$ letters at random to $n$ envelopes, one per envelope.

The Appendix to this Chapter provides a more detailed introduction to the Chen-Stein method, as applied to another aspect of the matching problem. (I have taken advantage of a few special features of the matching problem to simplify the exposition.) You could safely skip this Appendix. For more details, see the monograph by Barbour et al. (1992).
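
The independent-summands bound above can be checked directly, since both pmfs are computable. Here is a self-contained Python sketch, added for illustration (the function name tv_binomial_poisson and the parameter values are my own choices), that evaluates the total variation distance between Bin(n, p) and Poisson(np) and compares it with the Chen-Stein bound in the equal-$p_i$ case.

```python
# Total variation distance between Bin(n, p) and Poisson(np), computed from
# the pmfs via their ratio recursions (avoids overflow for large n), versus
# the Chen-Stein bound ((1 - e^(-lam))/lam) * n * p^2.
from math import exp

def tv_binomial_poisson(n: int, p: float) -> float:
    """(1/2) * sum_k |P{S = k} - Q_lam{k}|, summed over k = 0..n."""
    lam = n * p
    b_pmf, q_pmf = (1 - p) ** n, exp(-lam)         # both pmfs at k = 0
    total = 0.0
    for k in range(n + 1):
        total += abs(b_pmf - q_pmf)
        b_pmf *= (n - k) / (k + 1) * p / (1 - p)   # Bin pmf: k -> k + 1
        q_pmf *= lam / (k + 1)                     # Poisson pmf: k -> k + 1
    return 0.5 * total

n, p = 1000, 0.005
bound = (1 - exp(-n * p)) / (n * p) * n * p * p
print(f"TV distance      = {tv_binomial_poisson(n, p):.6f}")
print(f"Chen-Stein bound = {bound:.6f}; min(p, n p^2) = {min(p, n * p * p):.6f}")
```

(The Poisson mass above $k = n$ is omitted from the sum; for these parameter values it is negligible.)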

Examples for Chapter 8

<8.1> Example. The Poisson($\lambda$) appears as an approximation to the Bin($n$, $p$) when $n$ is large, $p$ is small, and $\lambda = np$:

$$\binom{n}{k} p^k (1-p)^{n-k} = \frac{n(n-1)\cdots(n-k+1)}{k!} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}$$
$$= \frac{n(n-1)\cdots(n-k+1)}{n^k} \, \frac{\lambda^k}{k!} \left(1 - \frac{\lambda}{n}\right)^{n-k}$$
$$\approx \frac{\lambda^k}{k!} \left(1 - \frac{\lambda}{n}\right)^{n} \qquad \text{if } k \text{ is small relative to } n$$
$$\approx \frac{\lambda^k}{k!} \, e^{-\lambda} \qquad \text{if } n \text{ is large.}$$

The final $e^{-\lambda}$ comes from an approximation to the logarithm,

$$n \log\left(1 - \frac{\lambda}{n}\right) = n \left(-\frac{\lambda}{n} - \frac{\lambda^2}{2n^2} - \dots\right) \approx -\lambda \qquad \text{if } \lambda/n \approx 0.$$

<8.2> Example. Verify the properties of the Poisson distribution suggested by the Binomial analogy: If $X$ has a Poisson($\lambda$) distribution, show that

(i) $EX = \lambda$
(ii) $\operatorname{var}(X) = \lambda$

Also, if $Y$ has a Poisson($\mu$) distribution independent of $X$, show that

(iii) $X + Y$ has a Poisson($\lambda + \mu$) distribution.

Solution: Assertion (i) comes from a routine application of the formula for the expectation of a random variable with a discrete distribution:

$$EX = \sum_{k=0}^{\infty} k \, P\{X = k\} = \sum_{k=1}^{\infty} k \, \frac{e^{-\lambda}\lambda^k}{k!} \qquad \text{(What happens to } k = 0\text{?)}$$
$$= e^{-\lambda} \lambda \sum_{k-1=0}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = e^{-\lambda} \lambda e^{\lambda} = \lambda.$$

Notice how the $k$ cancelled out one factor from the $k!$ in the denominator. If we were to calculate $E(X^2)$ in the same way, one factor in the $k^2$ would cancel the leading $k$ from the $k!$, but would leave an unpleasant $k/(k-1)!$ in the sum. Too bad the $k^2$ cannot be replaced by $k(k-1)$. Well, why not?

$$E(X^2 - X) = \sum_{k=0}^{\infty} k(k-1) P\{X = k\} = e^{-\lambda} \sum_{k=2}^{\infty} k(k-1) \frac{\lambda^k}{k!} \qquad \text{(What happens to } k = 0 \text{ and } k = 1\text{?)}$$
$$= e^{-\lambda} \lambda^2 \sum_{k-2=0}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} = \lambda^2.$$

Now calculate the variance:

$$\operatorname{var}(X) = E(X^2) - (EX)^2 = E(X^2 - X) + EX - (EX)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.$$
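
A quick numerical sanity check of (i) and (ii), added here as a sketch (the value lam = 2.5 and the truncation point are arbitrary): sum the series for $EX$ and $E(X^2 - X)$ and confirm they come out near $\lambda$ and $\lambda^2$.

```python
# Truncated-series check that EX = lam and var(X) = lam for X ~ Poisson(lam).
from math import exp

lam, kmax = 2.5, 100
pmf = exp(-lam)                    # P{X = 0}
EX = EXX = 0.0
for k in range(kmax + 1):
    EX += k * pmf
    EXX += k * (k - 1) * pmf       # accumulates E(X^2 - X)
    pmf *= lam / (k + 1)           # P{X = k+1} = P{X = k} * lam / (k+1)
var = EXX + EX - EX**2             # var(X) = E(X^2 - X) + EX - (EX)^2
print(f"EX = {EX:.6f}, var(X) = {var:.6f}")   # both approximately 2.5
```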

For assertion (iii), first note that $X + Y$ can take only values $0, 1, 2, \dots$ For a fixed $k$ in this range, decompose the event $\{X + Y = k\}$ into disjoint pieces whose probabilities can be simplified by means of the independence between $X$ and $Y$:

$$P\{X + Y = k\} = P\{X = 0, Y = k\} + P\{X = 1, Y = k-1\} + \dots + P\{X = k, Y = 0\}$$
$$= P\{X = 0\}P\{Y = k\} + P\{X = 1\}P\{Y = k-1\} + \dots + P\{X = k\}P\{Y = 0\}$$
$$= \frac{e^{-\lambda}\lambda^0}{0!} \frac{e^{-\mu}\mu^k}{k!} + \dots + \frac{e^{-\lambda}\lambda^k}{k!} \frac{e^{-\mu}\mu^0}{0!}$$
$$= \frac{e^{-\lambda-\mu}}{k!} \left( \frac{k!}{0!\,k!}\lambda^0\mu^k + \frac{k!}{1!\,(k-1)!}\lambda^1\mu^{k-1} + \dots + \frac{k!}{k!\,0!}\lambda^k\mu^0 \right)$$
$$= \frac{e^{-\lambda-\mu}(\lambda+\mu)^k}{k!}.$$

The bracketed sum in the second last line is just the binomial expansion of $(\lambda + \mu)^k$.

Question: How do you interpret the notation in the last calculation when $k = 0$? I always feel slightly awkward about a contribution from $k - 1$ if $k = 0$.

<8.3> Example. Suppose $n$ letters are placed at random into $n$ envelopes, one letter per envelope. The total number of correct matches, $S$, can be written as a sum $X_1 + \dots + X_n$ of indicators,

$$X_i = \begin{cases} 1 & \text{if letter } i \text{ is placed in envelope } i, \\ 0 & \text{otherwise.} \end{cases}$$

The $X_i$ are dependent on each other. For example, symmetry implies that $p_i = P\{X_i = 1\} = 1/n$ for each $i$ and

$$P\{X_i = 1 \mid X_1 = X_2 = \dots = X_{i-1} = 1\} = \frac{1}{n - i + 1}.$$

We could eliminate the dependence by relaxing the requirement of only one letter per envelope. The number of letters placed in the correct envelope (possibly together with other, incorrect letters) would then have a Bin($n$, $1/n$) distribution, which is approximated by Poisson(1) if $n$ is large. We can get some supporting evidence for $S$ having something close to a Poisson(1) distribution under the original assumption (one letter per envelope) by calculating some moments:

$$ES = \sum_i EX_i = \sum_i P\{X_i = 1\} = n \cdot \frac{1}{n} = 1$$

and

$$ES^2 = E\left( X_1^2 + \dots + X_n^2 + 2 \sum_{i<j} X_i X_j \right) = n\,EX_1^2 + 2\binom{n}{2} E(X_1 X_2) \qquad \text{by symmetry}$$
$$= n\,P\{X_1 = 1\} + n(n-1) P\{X_1 = 1, X_2 = 1\} = n \cdot \frac{1}{n} + n(n-1) \cdot \frac{1}{n} \cdot \frac{1}{n-1} = 2.$$

Thus $\operatorname{var}(S) = ES^2 - (ES)^2 = 1$. Compare with Example <8.2>, which gives $EY = 1$ and $\operatorname{var}(Y) = 1$ for a $Y$ distributed Poisson(1).
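
The moment calculation is easy to corroborate by simulation. A small Python sketch, added here (the sample sizes are arbitrary choices): shuffle $n$ letters into $n$ envelopes and count the correct matches.

```python
# Estimate ES and var(S) for the matching problem by Monte Carlo.
import random

def matches(n: int) -> int:
    """Number of fixed points of a uniform random permutation of 0..n-1."""
    perm = list(range(n))
    random.shuffle(perm)
    return sum(1 for i in range(n) if perm[i] == i)

n, trials = 100, 100_000
samples = [matches(n) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(f"sample ES = {mean:.3f}, sample var(S) = {var:.3f}")   # both near 1
```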

(DP: check result and Feller citation.) Using the method of inclusion and exclusion, it is possible (Feller 1968, Chapter 4) to calculate the exact distribution of the number of correct matches,

$$\text{<8.4>} \qquad P\{S = k\} = \frac{1}{k!} \left( 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} \dots \pm \frac{1}{(n-k)!} \right) \qquad \text{for } k = 0, 1, \dots, n.$$

For fixed $k$, as $n \to \infty$ the probability converges to

$$\frac{1}{k!} \left( 1 - 1 + \frac{1}{2!} - \frac{1}{3!} \dots \right) = \frac{e^{-1}}{k!} = Q_1\{k\},$$

which is the probability that $Y = k$ if $Y$ has a Poisson(1) distribution.

The Chen-Stein method is also effective in this problem. I claim that it is intuitively clear (although a rigorous proof might be tricky) that the $X_i$'s are positively associated:

$$P\{S_i \ge k \mid X_i = 1\} \ge P\{S_i \ge k \mid X_i = 0\} \qquad \text{for each } i \text{ and each } k \in \mathbb{N}_0.$$

I feel that if $X_i = 1$, then it is more likely for the other letters to find their matching envelopes than if $X_i = 0$, which makes things harder by filling one of the envelopes with the incorrect letter $i$. We therefore have

$$\max_A \big| P\{S \in A\} - Q_1(A) \big| \le 2 \sum_{i=1}^n p_i^2 + \operatorname{var}(S) - 1 = \frac{2}{n}.$$

As $n$ gets large, the distribution of $S$ does get close to the Poisson(1) in the strong, total variation sense. However, it is possible (see Barbour et al. 1992, page 73) to get a better bound by working directly from <8.4>.

References

Barbour, A. D., Holst, L. & Janson, S. (1992), Poisson Approximation, Oxford University Press.
Feller, W. (1968), An Introduction to Probability Theory and Its Applications, Vol. 1, third ed., Wiley, New York.

Appendix: The Chen-Stein method for the matching problem

You might actually find the argument leading to the final bound of Example <8.3> more enlightening than the condensed exposition that follows. In any case, you can safely stop reading this chapter right now without suffering major probabilistic deprivation. You were warned.

Consider once more the matching problem described in Example <8.3>. Use the Chen-Stein method to establish the approximation

$$P\{S = k\} \approx \frac{e^{-1}}{k!} \qquad \text{for } k = 0, 1, 2, \dots$$

The starting point is a curious connection between the Poisson(1) and the function $g(\cdot)$ defined by

$$g(0) = 0 \qquad \text{and} \qquad g(j) = \int_0^1 e^{-t} t^{j-1} \, dt \quad \text{for } j = 1, 2, \dots$$

Notice that $0 \le g(j) \le 1$ for all $j$. Also, integration by parts shows that

$$g(j+1) = j\,g(j) - e^{-1} \qquad \text{for } j = 1, 2, \dots$$

and direct calculation gives $g(1) = 1 - e^{-1}$. More succinctly,

$$\text{<8.5>} \qquad g(j+1) - j\,g(j) = \mathbf{1}\{j = 0\} - e^{-1} \qquad \text{for } j = 0, 1, \dots$$
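
The identity <8.5> is easy to verify numerically. The following Python sketch, added here (the midpoint-rule integrator and step count are my own choices), evaluates $g$ by numerical integration and checks <8.5> for small $j$.

```python
# Check the Stein identity <8.5>: g(j+1) - j*g(j) = 1{j=0} - exp(-1).
from math import exp

def g(j: int, steps: int = 10_000) -> float:
    """g(0) = 0; g(j) = integral over [0,1] of exp(-t) * t**(j-1) for j >= 1,
    approximated by the midpoint rule."""
    if j == 0:
        return 0.0
    h = 1.0 / steps
    return h * sum(exp(-(i + 0.5) * h) * ((i + 0.5) * h) ** (j - 1)
                   for i in range(steps))

for j in range(5):
    lhs = g(j + 1) - j * g(j)
    rhs = (1.0 if j == 0 else 0.0) - exp(-1)
    print(f"j={j}: lhs = {lhs:.6f}, rhs = {rhs:.6f}")
```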

Actually the definition of $g(0)$ has no effect on the validity of the assertion <8.5> when $j = 0$; you could give $g(0)$ any value you liked.

Suppose $Y$ has a Poisson(1) distribution. Substitute $Y$ for $j$ in <8.5>, then take expectations to get

$$E\big( g(Y+1) - Y g(Y) \big) = E\,\mathbf{1}\{Y = 0\} - e^{-1} = P\{Y = 0\} - e^{-1} = 0.$$

A similar calculation with $S$ in place of $Y$ gives

$$\text{<8.6>} \qquad P\{S = 0\} - e^{-1} = E\big( g(S+1) - S g(S) \big).$$

If we can show that the right-hand side is close to zero then we will have $P\{S = 0\} \approx e^{-1}$, which is the desired Poisson approximation for $P\{S = k\}$ when $k = 0$. A simple symmetry argument will then give the approximation for other $k$ values.

There is a beautiful probabilistic trick for approximating the right-hand side of <8.6>. Write the $S g(S)$ contribution as

$$\text{<8.7>} \qquad E\,S g(S) = E \sum_{i=1}^n X_i g(S) = \sum_{i=1}^n E\,X_i g(S) = n\,E\,X_1 g(S).$$

The trick consists of a special two-step method for allocating letters at random to envelopes, which initially gives letter 1 a special role.

(1) Put letter 1 in envelope 1, then allocate letters $2, \dots, n$ to envelopes $2, \dots, n$ in random order, one letter per envelope. Write $1 + Z$ for the total number of matches of letters to correct envelopes. (The 1 comes from the forced matching of letter 1 and envelope 1.) Write $Y_j$ for the letter that goes into envelope $j$. Notice that $EZ = 1$, as shown in Example <8.3>.

(2) Choose an envelope $R$ at random (probability $1/n$ for each envelope), then swap letter 1 with the letter in the chosen envelope.

Notice that $X_1$ is independent of $Z$, because of step (2). Indeed,

$$P\{X_1 = 1 \mid Z = k\} = P\{R = 1 \mid Z = k\} = 1/n \qquad \text{for each } k.$$

Notice also that

$$S = \begin{cases} 1 + Z & \text{if } R = 1, \\ Z - 1 & \text{if } R \ge 2 \text{ and } Y_R = R, \\ Z & \text{if } R \ge 2 \text{ and } Y_R \ne R. \end{cases}$$

Thus

$$P\{S \ne Z \mid Z = k\} = P\{R = 1\} + \sum_{j \ge 2} P\{R = j, Y_j = j \mid Z = k\} = \frac{1}{n} + \frac{1}{n} \sum_{j \ge 2} P\{Y_j = j \mid Z = k\} = \frac{k+1}{n}$$

and

$$P\{S \ne Z\} = \sum_k \frac{k+1}{n} P\{Z = k\} = \frac{EZ + 1}{n} = \frac{2}{n}.$$

That is, the construction gives $S = Z$ with high probability.
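
The two-step construction can itself be simulated; a Python sketch, added here (the function name two_step and the parameter values are my own), checks the claim that $P\{S \ne Z\} = (EZ + 1)/n = 2/n$.

```python
# Simulate the two-step allocation and estimate P{S != Z}.
import random

def two_step(n: int):
    """One draw of (S, Z).  env[j] = letter in envelope j; env[0] is unused."""
    # Step 1: letter 1 into envelope 1; letters 2..n permuted over envelopes 2..n.
    env = [0, 1] + random.sample(range(2, n + 1), n - 1)
    z = sum(1 for j in range(2, n + 1) if env[j] == j)   # matches, envelope 1 aside
    # Step 2: swap letter 1 with the letter in a uniformly chosen envelope R.
    r = random.randint(1, n)
    env[1], env[r] = env[r], env[1]
    s = sum(1 for j in range(1, n + 1) if env[j] == j)   # total matches
    return s, z

n, trials = 20, 200_000
mismatches = 0
for _ in range(trials):
    s, z = two_step(n)
    mismatches += (s != z)
print(f"P{{S != Z}} ~ {mismatches / trials:.4f}; predicted 2/n = {2 / n:.4f}")
```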

From the fact that when $X_1 = 1$ (that is, $R = 1$) we have $S = 1 + Z$, deduce that

$$\text{<8.8>} \qquad X_1 g(S) = X_1 g(1 + Z).$$

The same equality holds trivially when $X_1 = 0$. Take expectations, then argue that

$$E\,S g(S) = n\,E\,X_1 g(S) \qquad \text{by <8.7>}$$
$$= n\,E\,X_1 g(1 + Z) \qquad \text{by <8.8>}$$
$$= n\,(E X_1)\,E\,g(1 + Z) \qquad \text{by independence of } X_1 \text{ and } Z$$
$$= E\,g(1 + Z) \qquad \text{because } n\,E X_1 = n\,P\{X_1 = 1\} = 1.$$

Thus the right-hand side of <8.6> equals $E\big( g(S+1) - g(Z+1) \big)$. On the event $\{S = Z\}$ the two terms cancel; on the event $\{S \ne Z\}$, the difference $g(S+1) - g(Z+1)$ lies between $\pm 1$, because $0 \le g(j) \le 1$ for $j = 1, 2, \dots$ Combining these two contributions, we get

$$\big| E\big( g(S+1) - g(Z+1) \big) \big| \le P\{S \ne Z\} = \frac{2}{n}$$

and

$$\text{<8.9>} \qquad \big| P\{S = 0\} - e^{-1} \big| = \big| E\big( g(S+1) - S g(S) \big) \big| \le \frac{2}{n}.$$

The exact expression for $P\{S = 0\}$ from <8.4> shows that $2/n$ greatly overestimates the error of approximation, but at least it tends to zero as $n$ gets large.

After all that work to justify the Poisson approximation to $P\{S = k\}$ for $k = 0$, you might be forgiven for shrinking from the prospect of extending the approximation to larger $k$. Fear not! The worst is over. For $k = 1, 2, \dots$ the event $\{S = k\}$ specifies exactly $k$ matches. There are $\binom{n}{k}$ choices for the matching envelopes. By symmetry, the probability of matches only in a particular set of $k$ envelopes is the same for each specific choice of the set of $k$ envelopes. It follows that

$$P\{S = k\} = \binom{n}{k} P\{\text{envelopes } 1, \dots, k \text{ match; the rest don't}\}.$$

The probability of getting matches in envelopes $1, \dots, k$ equals $1/\big(n(n-1)\cdots(n-k+1)\big)$. The conditional probability

$$P\{\text{envelopes } k+1, \dots, n \text{ don't match} \mid \text{envelopes } 1, \dots, k \text{ match}\}$$

is equal to the probability of zero matches when $n - k$ letters are placed at random into their envelopes. If $n$ is much larger than $k$, this probability is close to $e^{-1}$, as shown above. Thus

$$P\{S = k\} \approx \frac{n!}{k!\,(n-k)!} \cdot \frac{(n-k)!}{n!} \cdot e^{-1} = \frac{e^{-1}}{k!}.$$

More formally, for each fixed $k$,

$$P\{S = k\} \to \frac{e^{-1}}{k!} = P\{Y = k\} \qquad \text{as } n \to \infty,$$

where $Y$ has the Poisson(1) distribution.
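
To close, a Python sketch added here (exact rational arithmetic via the standard library's fractions module; the values of n are arbitrary) compares the exact probabilities from <8.4> with the Poisson(1) limit $e^{-1}/k!$, and confirms that the actual error at $k = 0$ sits far below the $2/n$ bound of <8.9>.

```python
# Exact matching pmf from <8.4> versus the Poisson(1) limit and the 2/n bound.
from fractions import Fraction
from math import exp, factorial

def match_pmf(k: int, n: int) -> Fraction:
    """Exact P{S = k} for n letters, via inclusion-exclusion <8.4>."""
    return Fraction(1, factorial(k)) * sum(
        Fraction((-1) ** j, factorial(j)) for j in range(n - k + 1))

n = 10
for k in range(5):
    print(f"k={k}: exact = {float(match_pmf(k, n)):.8f}, "
          f"e^-1/k! = {exp(-1) / factorial(k):.8f}")
for n in (5, 10, 20):
    err = abs(float(match_pmf(0, n)) - exp(-1))
    print(f"n={n}: |P{{S=0}} - 1/e| = {err:.2e}  vs  2/n = {2 / n:.2e}")
```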