The Poisson Distribution Mary Lindstrom (Adapted from notes provided by Professor Bret Larget) February 5, 2004 Statistics 371 Last modified: February 4, 2004
The Poisson Distribution The Poisson distribution arises in many biological contexts. Examples of random variables for which a Poisson distribution might be reasonable include: the number of bacterial colonies in a Petri dish; the number of trees in an area of land; the number of offspring an individual has; the number of nucleotide base substitutions in a gene over a period of time; Statistics 371 1
Probability Mass Function The probability mass function of the Poisson distribution with mean µ is Pr{Y = k} = e µ µ k for k = 0, 1, 2,.... k! Where e is a special number in math (like π). It is approximately equal to 2.718282. The Poisson distribution is discrete, like the binomial distribution, but has only a single parameter µ and it has infinitely many possible outcomes. The mean and variance are the same for a Poisson random variable. µ Y = E(Y ) = µ σ 2 = V ar(y ) = µ Statistics 371 2
Computing Poisson Probabilities in R The function dpois will compute Poisson probabilities. If µ = 10, we can find Pr{Y = 12} = e 10 10 12 12! with the command > dpois(12, 10) [1] 0.09478033 > dpois(12, 1000) [1] 0 Or, if we wanted the probabilities that a Poisson random variable with mean 4 would take on the values 8 through 12 we would type: > dpois(8:12, 4) [1] 0.0297701813 0.0132311917 0.0052924767 0.0019245370 0.0006415123 Statistics 371 3
Plotting Poisson Probabilities in R > source("prob.r") > par(mfrow = c(2, 1)) > gpois(mu = 2) > gpois(mu = 10) Poisson Distribution mu = 2 Probability 0.00 0.25 0 2 4 6 8 Count Probability 0.00 0.10 Poisson Distribution mu = 10 0 5 10 15 20 Count Statistics 371 4
Poisson approximation to the binomial One way that the Poisson distribution can arise is as an approximation for the binomial distribution when p is small. The approximation is quite good for large enough n. If p is small and n is large then the probability that a binomial(n, p) R.V. is equal to k is approximately the same as the probability that a Poisson R.V. with µ = np is equal to k. Here is an example with p = 0.01 and n = 50. > dbinom(0:4, 50, 0.01) [1] 0.605006067 0.305558620 0.075618042 0.012221098 0.001450484 > dpois(0:4, 50 * 0.01) [1] 0.606530660 0.303265330 0.075816332 0.012636055 0.001579507 This approximation is most useful when n is large so that the binomial coefficients are very large. Statistics 371 5
The Poisson Process The Poisson Process arises naturally under assumptions that are often reasonable. For the following, think of occurrences as being exact times of events or random locations of something. The assumptions are: 1. The chance of two simultaneous occurrences (or occurrences at the same location) is negligible; 2. The expected value of the random number of occurrences in a time interval (or in a region) is proportional to the length of the interval (area of the region). 3. The random number of occurrences in non-overlapping time intervals (regions) are independent. Under these assumptions, the random variable that counts the number of occurrences has a Poisson distribution. Statistics 371 6
Generating one Poisson R.V. from another Sometimes we are given a Poisson R.V. for the number of occurrences for one unit of length (area, time) but are interested in another. We can create a second Poisson random variable from the first. If Y 1 is a Poisson random variable with mean µ 1 counting the number of occurrences per unit length (area, time). Then the random variable Y 2 which counts the number of occurrences in an interval of length (area, time) t is Poisson with mean µ 2 = tµ 1. Statistics 371 7
Example Suppose that we assume that at a location, a particular species of plant is distributed according to a Poisson process with expected density 0.2 individuals per square meter. In a nine square meter quadrant, what is the probability of no individuals? Solution: The number of individuals in 9 square meters has a Poisson distribution with mean µ = 9 0.2 = 1.8. The probability of no individuals in 9 meters is Pr{Y = 0} = e 1.8 (1.8) 0 0! In R, we can compute this as = e 1.8 = 0.165 > dpois(0, 1.8) [1] 0.1652989 Statistics 371 8
Example (cont.) Find the probability of three or more individuals in 9 square meters. Solution: Instead of summing the probabilities from 3 to infinity, we can use the complement rule. Pr{Y 3} = 1 Pr{Y 2} = 1 Pr{Y = 0} Pr{Y = 1} Pr{Y = 2} In R, this is found as > 1 - sum(dpois(0:2, 1.8)) [1] 0.2693789 Statistics 371 9
Consider an experiment to find the concentration of colony forming units (cells that will result in the growth of a colony) in a solution. Assume that it is difficult or impossible to count individual colonies. First create a series of dilutions at the following strengths: 1/2 the strength of the the original solution 1/4 the strength of the the original solution 1/8 the strength of the the original solution 1/16 the strength of the the original solution 1/32 the strength of the the original solution 1/64 the strength of the the original solution 1/128 the strength of the the original solution Statistics 371 10
Now take 80 plates and plate out 1 milliliter of each solution on to each of 10 growing plates. Wait an appropriate length of time and record the number of plates that have at least one colony growing on it. Our data might look like: Dilution Number of plates out of 10 with no colonies Full strength 0 1/2 strength 1 1/4 strength 3 1/8 strength 6 1/16 strength 8 1/32 strength 8 1/64 strength 9 1/128 strength 10 Statistics 371 11
How would we estimate the number of colony forming units (CFU) per ml in the original solution? First let s compute the proportion of the plates with no colonies: Plates with no colonies Dilution Number Proportion Full strength 0 0.00 1/2 strength 1 0.10 1/4 strength 3 0.30 1/8 strength 6 0.60 1/16 strength 8 0.80 1/32 strength 8 0.80 1/64 strength 9 0.90 1/128 strength 10 1.00 Statistics 371 12
Lets start with the Full strength solution. Let s assume: the original solution has µ CFUs per ml the number of CFU s in any individual ml of the original solution is a Poisson distributed random variable with mean µ. Is this reasonable? What are the assumptions required for a R.V. to be Poisson? 1. The chance of two simultaneous occurrences (or occurrences at the same location) is negligible; 2. The expected value of the random number of occurrences in a time interval (or in a region) is proportional to the length of the interval (area of the region). Statistics 371 13
3. The random number of occurrences in non-overlapping time intervals (regions) are independent. If Y is the number of CFU s in 1 ml of the original solution then we know that Pr{Y = k} = e µ µ k k! for k = 0, 1, 2,.... A plate having no colonies is equivalent to Y = 0. calculate the probability that Y = 0 as Pr{Y = 0} = e µ µ 0 = e µ 0! We can Statistics 371 13
What about our dilutions? If the number of CFU s in a ml of the original solution is Poisson(µ) then the number of CFU s in 1/d dilution of the original solution will be Poisson(µ/d). Let s expand our table to show the expected proportion of plates with no colonies Statistics 371 14
Plates with no colonies Expected Dilution Number Proportion Proportion Full strength 0 0.00 exp( µ) 1/2 strength 1 0.10 exp( µ/2) 1/4 strength 3 0.30 exp( µ/4) 1/8 strength 6 0.60 exp( µ/8) 1/16 strength 8 0.80 exp( µ/16) 1/32 strength 8 0.80 exp( µ/32) 1/64 strength 9 0.90 exp( µ/64) Notes exp(x) = e x We drop the dilutions which give 0 or 10 plates with zero colonies. This method does not use them. Statistics 371 15
Now take the log (base e) of the proportions and the expected proportions Plates with no colonies Log Log expected Dilution Number Proportion Proportion 1/2 strength 1 log e (0.10) µ/2 1/4 strength 3 log e (0.30) µ/4 1/8 strength 6 log e (0.60) µ/8 1/16 strength 8 log e (0.80) µ/16 1/32 strength 8 log e (0.80) µ/32 1/64 strength 9 log e (0.90) µ/64 Statistics 371 16
Multiply through by the dilutions to get Plates with no colonies Sample Theoretical Dilution Number Statistic Value 1/2 strength 1 2 log e (0.10) µ 1/4 strength 3 4 log e (0.30) µ 1/8 strength 6 8 log e (0.60) µ 1/16 strength 8 16 log e (0.80) µ 1/32 strength 8 32 log e (0.80) µ 1/64 strength 9 64 log e (0.90) µ Statistics 371 17
Doing the math we get Plates with no colonies Sample Theoretical Dilution Number Statistic Value 1/2 strength 1 4.8 µ 1/4 strength 3 4.8 µ 1/8 strength 6 4.1 µ 1/16 strength 8 3.6 µ 1/32 strength 8 7.1 µ 1/64 strength 9 6.7 µ mean 5.2 µ So we might use 5.2 as our estimate of the true but unknown value of µ. Note that there are other ways of analyzing this type of experiment. Statistics 371 18
We can use R to calculate the expected number of plates with zero colonies assuming µ = 5.2. > dilutions = 2^(0:7) > dilutions [1] 1 2 4 8 16 32 64 128 > means <- 5.2/dilutions > means [1] 5.200000 2.600000 1.300000 0.650000 0.325000 0.162500 0.081250 0.040625 > round(dpois(0, means), 2) [1] 0.01 0.07 0.27 0.52 0.72 0.85 0.92 0.96 Statistics 371 19
So assuming µ = 5.2 our observed and expected proportion of plates with no colonies are Proportion Dilution Observed Expected full strength 0.00 0.01 1/2 strength 0.10 0.07 1/4 strength 0.30 0.27 1/8 strength 0.60 0.52 1/16 strength 0.80 0.72 1/32 strength 0.80 0.85 1/64 strength 0.90 0.92 1/128 strength 1.00 0.96 Why don t the observed and expected match? Statistics 371 20