Lecture 5 : The Poisson Distribution

Similar documents
Lecture 5 : The Poisson Distribution

Lecture 6: The Normal distribution

Binomial and Poisson Probability Distributions

Poisson distributions Mixed exercise 2

Continuous Random Variables. What continuous random variables are and how to use them. I can give a definition of a continuous random variable.

Math/Stat 3850 Exam 1

Probability and Samples. Sampling. Point Estimates

Chapters 3.2 Discrete distributions

Lecture 2: Discrete Probability Distributions

Scientific Measurement

Lecture 1 : Basic Statistical Measures

Lecture 13. Poisson Distribution. Text: A Course in Probability by Weiss 5.5. STAT 225 Introduction to Probability Models February 16, 2014

Lecture 8 Continuous Random Variables

3.4. The Binomial Probability Distribution

Solutions to Problem Set 4

Sections 3.4 and 3.5

Discrete Distributions

The central limit theorem

POISSON RANDOM VARIABLES

Random Variables Example:

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Lecture 3. Discrete Random Variables

Chapter 1 - Lecture 3 Measures of Location

Chapter 6 Continuous Probability Distributions

18.175: Lecture 17 Poisson random variables

X = X X n, + X 2

Part 3: Parametric Models

Lecture 6. Probability events. Definition 1. The sample space, S, of a. probability experiment is the collection of all

Binomial random variable

4. Discrete Probability Distributions. Introduction & Binomial Distribution

Descriptive statistics

S2 QUESTIONS TAKEN FROM JANUARY 2006, JANUARY 2007, JANUARY 2008, JANUARY 2009

CS 361: Probability & Statistics

Quick review on Discrete Random Variables

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER / Probability

The Wright-Fisher Model and Genetic Drift

Lecture 7: Confidence interval and Normal approximation

MAS1302 Computational Probability and Statistics

Normal approximation to Binomial

We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p).

Expectations. Definition Let X be a discrete rv with set of possible values D and pmf p(x). The expected value or mean value of X, denoted by E(X ) or

We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p).

Topic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?!

Categorical Data Analysis 1

Tests for Population Proportion(s)

Lecture 12. Poisson random variables

Math/Stat 352 Lecture 8

Lecture 16. Lectures 1-15 Review

Chapter 15 Sampling Distribution Models

Random Variable. Discrete Random Variable. Continuous Random Variable. Discrete Random Variable. Discrete Probability Distribution

Part 2: One-parameter models

2.3 Estimating PDFs and PDF Parameters

Math 10 - Compilation of Sample Exam Questions + Answers

Math/Stat 352 Lecture 10. Section 4.11 The Central Limit Theorem

PubH 5450 Biostatistics I Prof. Carlin. Lecture 13

SDS 321: Introduction to Probability and Statistics

Part 3: Parametric Models

ACMS Statistics for Life Sciences. Chapter 13: Sampling Distributions

Mark Scheme (Results) January 2011

Measures of Central Tendency. Mean, Median, and Mode

18.175: Lecture 13 Infinite divisibility and Lévy processes

3. DISCRETE PROBABILITY DISTRIBUTIONS

Stat Lecture 20. Last class we introduced the covariance and correlation between two jointly distributed random variables.

Sections 3.4 and 3.5

Markov Chains. X(t) is a Markov Process if, for arbitrary times t 1 < t 2 <... < t k < t k+1. If X(t) is discrete-valued. If X(t) is continuous-valued

LAB 2 1. Measurement of 2. Binomial Distribution

ST 371 (V): Families of Discrete Distributions

Statistics 2. The Normal Distribution

More Discrete Distribu-ons. Keegan Korthauer Department of Sta-s-cs UW Madison

Discrete Distributions: Poisson Distribution 1

Lecture 10: Generalized likelihood ratio test

STAT 200 Chapter 1 Looking at Data - Distributions

σ. We further know that if the sample is from a normal distribution then the sampling STAT 2507 Assignment # 3 (Chapters 7 & 8)

Chapter 7: Theoretical Probability Distributions Variable - Measured/Categorized characteristic

CS 237: Probability in Computing

Solutions. Some of the problems that might be encountered in collecting data on check-in times are:

Introduction to Statistical Data Analysis Lecture 3: Probability Distributions

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E

Random Variables. Definition: A random variable (r.v.) X on the probability space (Ω, F, P) is a mapping

Topic 9 Examples of Mass Functions and Densities

Statistics 360/601 Modern Bayesian Theory

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67

The independent-means t-test:

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur

Since D has an exponential distribution, E[D] = 0.09 years. Since {A(t) : t 0} is a Poisson process with rate λ = 10, 000, A(0.

Sampling (Statistics)

DEFINITION: IF AN OUTCOME OF A RANDOM EXPERIMENT IS CONVERTED TO A SINGLE (RANDOM) NUMBER (E.G. THE TOTAL

Chapter 3. Measuring data

Chapter 18. Sampling Distribution Models. Bin Zou STAT 141 University of Alberta Winter / 10

Expected Value - Revisited

CSE 103 Homework 8: Solutions November 30, var(x) = np(1 p) = P r( X ) 0.95 P r( X ) 0.

Some Statistics. V. Lindberg. May 16, 2007

The Binomial distribution. Probability theory 2. Example. The Binomial distribution

AP Final Review II Exploring Data (20% 30%)

STAT/MA 416 Answers Homework 6 November 15, 2007 Solutions by Mark Daniel Ward PROBLEMS

Statistical Experiment A statistical experiment is any process by which measurements are obtained.

Chapter 18. Sampling Distribution Models /51

Statistics of Radioactive Decay

PhysicsAndMathsTutor.com

Transcription:

Random events in time and space Lecture 5 : The Poisson Distribution Jonathan Marchini Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume, length etc. For example, The number of cases of a disease in different towns The number of mutations in set sized regions of a chromosome In such situations we are often interested in whether the events occur randomly in time or space or not. Consider the birth times from the Babyboom dataset we saw in Lecture 2. We can plot a histogram of births per hour. 0 200 400 600 800 1000 1200 1440 Birth Time (minutes since midnight) Frequency 0 5 10 15 0 2 4 6 No. of births per hour

Consider the following sequence of birth times that are obviously not random. We observe a very different pattern in the histogram of these birth times per hour. 0 200 400 600 800 1000 1200 1440 Birth Time (minutes since midnight) Frequency 0 5 10 15 0 2 4 6 No. of births per hour This example illustrates that the distribution of counts is useful in uncovering whether the events might occur randomly or non-randomly in time (or space). Simply looking at the histogram isn t sufficient if we want to ask the question whether the events occur randomly or not. To answer this question we need a probability model for the distribution of counts of random events that dictates the type of distributions we should expect to see. The Poisson Distribution The Poisson distribution is a discrete probability distribution for the counts of events that occur randomly in a given interval of time (or space). = The number of events in a given interval, λ = mean number of events per interval The probability of observing x events in a given interval is given by P( = x) = e λλx x! x = 0, 1, 2, 3, 4,...

Note e is a mathematical constant. e 2.718282. There should be a button on your calculator e x that calculates powers of e. If the probabilities of are distributed in this way, we write Po(λ) λ is the parameter of the distribution. We say follows a Poisson distribution with parameter λ Note A Poisson random variable can take on any positive integer value. In contrast, the Binomial distribution always has a finite upper limit. Example 1 Births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the probability of observing 4 births in a given hour at the hospital? Let = No. of births in a given hour (i) Events occur randomly (ii) Mean rate λ = 1.8 Po(1.8) We can now use the formula to calculate the probability of observing exactly 4 births in a given hour P( = 4) = e 1.8( 1.8 4 ) = 0.0723 4! Example 2 What about the probability of observing more than or equal to 2 births in a given hour at the hospital? We want P( 2) = P( = 2) + P( = 3) +... i.e. an infinite number of probabilities to calculate

but P( 2) = P( = 2) + P( = 3) +... = 1 P( < 2) = 1 (P( = 0) + P( = 1)) = 1 (e 1.8( 1.8 0 ) + e 1.8( 1.8 1 ) ) 0! 1! = 1 (0.16529 + 0.29753) = 0.537 Example 3 Jinkinson and Slater (1981) observed the nature of queues at a London Underground station. They counted the number of women present in queues of length ten. The data for 100 such queues are presented below: No. women 0 1 2 3 4 5 6 7 8 9 10 Frequency 1 3 4 23 25 19 18 5 1 1 0 Which is model is appropriate for this data (a) Binomial, or (b) Poisson? P() 0.00 0.05 0.10 0.15 0.20 0.25 Po(3) 0 5 10 20 Shape of the Poisson P() 0.00 0.05 0.10 0.15 0.20 0.25 Po(5) 0 5 10 20 0.00 0.05 0.10 0.15 0.20 0.25 Po(10) 0 5 10 20 P() Poisson distributions are (i) unimodal (ii) exhibit positive skew (that decreases as λ increases) (iii) centered roughly on λ (iii) the variance (spread) increases as λ increases

Mean and Variance of the Poisson distribution In general, there is a formula for the mean of a Poisson distribution. There is also a formula for the standard deviation, σ, and variance, σ 2. If Po(λ) then µ = λ σ = λ σ 2 = λ Changing the size of the interval Suppose we know that births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the probability that we observe 5 births in a given 2 hour interval? 1.8 births per 1 hour interval 3.6 births per 2 hour interval Let Y = No. of births in a 2 hour period Then Y Po(3.6) P(Y = 5) = e 3.6 ( 3.6 5 ) = 0.13768 5! This example illustrates the following rule If Po(λ) on 1 unit interval, then Y Po(kλ) on k unit intervals. Sum of two Poisson variables Now suppose we know that in hospital A births occur randomly at an average rate of 2.3 births per hour and in hospital B births occur randomly at an average rate of 3.1 births per hour. What is the probability that we observe 7 births in total from the two hospitals in a given 1 hour period?

To answer this question we can use the following rule If Po(λ 1 ) on 1 unit interval, and Y Po(λ 2 ) on 1 unit interval, then + Y Po(λ 1 + λ 2 ) on 1 unit interval. = No. of births in a given hour at hospital A Y = No. of births in a given hour at hospital B Then Po(2.3), Y Po(3.1) and + Y Po(5.4) P( + Y = 7) = e 5.4 ( 5.4 7 ) = 0.11999 7! Using the Poisson to approximate the Binomial Bin(100, 0.02) Po(2) P() 0.00 0.10 0.20 P() 0.00 0.10 0.20 0 2 4 6 8 10 0 2 4 6 8 10 In general, If n is large (say > 50) and p (say < 0.1) is small then a Bin(n, p) can be approximated with a Po(λ) where λ = np The idea of using one distribution to approximate another is widespread throughout statistics and one we will meet again. In many situations it is extremely difficult to use the exact distribution and so approximations are very useful.

Example Given that 5% of a population are left-handed, use the Poisson distribution to estimate the probability that a random sample of 100 people contains 2 or more left-handed people. = No. of left handed people out of 100 Bin(100, 0.05) Poisson approximation Po(λ) with λ = 100 0.05 = 5 We want P( 2)? P( 2) = 1 P( < 2) ( ) = 1 P( = 0) + P( = 1) ( 1 e 5 ( 5 0 ) + e 5( 5 1 )) 0! 1! 1 0.040428 0.959572 If we use the exact Binomial distribution we get the answer 0.96292. Fitting a Poisson distribution Consider the two sequences of birth times we saw at the beginning of the lecture. Both of these examples consisted of a total of 44 births in 24 hour intervals. Therefore the mean birth rate for both sequences is 44 24 = 1.8333 What would be the expected counts if birth times were really random i.e. what is the expected histogram for a Poisson random variable with mean rate λ = 1.8333. Using the Poisson formula we can calculate the probabilities of obtaining each possible value. x 0 1 2 3 4 5 6 P( = x) 0.159 0.293 0.269 0.164 0.075 0.028 0.011 Note We group all of the values 6 together because they have very small probabilities of occurring.

Then if we observe 24 hour intervals we can calculate the expected frequencies as 24 P( = x) for each value of x. x 0 1 2 3 4 5 6 Expected Frequency 3.84 7.04 6.45 3.94 1.81 0.66 0.27 24 P( = x) We say we have fitted a Poisson distribution to the data. 3 steps Fitting discrete distributions (i) Estimating the parameters of the distribution from the data (ii) Calculating the probability distribution (iii) Multiplying the probability distribution by the number of observations Once we have fitted a distribution to the data we can compare the expected frequencies to those we actually observed from the real Babyboom dataset. x 0 1 2 3 4 5 6 Expected 3.84 7.04 6.45 3.94 1.81 0.66 0.27 Observed 3 8 6 4 3 0 0 The agreement is quite good. For the second sequence of birth times there is much less agreement. x 0 1 2 3 4 5 6 Expected 3.84 7.04 6.45 3.94 1.81 0.66 0.27 Observed 12 3 0 2 2 4 1 In Lecture 7 we will see how to formally test this discrepancy.