www.stat.ubc.ca/~bouchard/courses/stat302-sp2017-18/ Intro to Probability Instructor: Alexandre Bouchard
Info on midterm. CALCULATOR: only NON-programmable, NON-scientific, NON-graphing (and of course, no phone). Closed book, but we provide a formula sheet. Topics examined: up to and not including the Poisson process (but including the Poisson approximation!). See Midterm study resources under the Tools tab of the website: the formula sheet that will be provided is posted there, along with the previous year's (slightly shorter) midterm.
Logistics. What else is new/recent on the website: Written assignment 2 released, due after break (but start now if you can!). Webwork: next one will be posted today, due in one week. Clicker grading rule: participation-based (but might give a few extra clicker marks based on correctness).
Plan for today: Poisson process, continued; Normal approximations
Review
Def 16 The uniform distribution Density CDF
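Def 18 The exponential distribution (rate λ). Density: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$. CDF: $F(x) = 1 - e^{-\lambda x}$. Mean (i.e. E[X]): $1/\lambda$. Variance: $1/\lambda^2$.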
Before: a random variable X maps each outcome s in the space of outcomes (S) to a point in the space of labels (realizations). Each value X(s) is a number. Example: X(s) = 5
New r.v.: PP X, again mapping the space of outcomes (S) to the space of labels (realizations). Each value X(s) is a set of points. [Figure: two example realizations, each drawn as points (x) on a line.]
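[Figure: two realizations of X drawn as points (x) on a line, with the interval [0, 1] marked by brackets; N[0,1] is the number of points of X that fall in [0, 1].]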
Poisson process: two equivalent descriptions. Spacings: D1, D2, D3, D4, D5 (distances between consecutive points). Counts: number of points in a set A.
Def 19 Description based on counts. X is called a Poisson process with intensity i if: for all A ⊆ G, $N_A \sim \mathrm{Poi}(i \cdot \mathrm{length}(A))$; for all disjoint A, B ⊆ G, $N_A$ is independent of $N_B$. Notation: X ~ PP(i); $N_A$ = # points of X in A; $N_B$ = # points of X in B.
Def 19b Description based on spacings. Sets of points where the spacings D1, D2, D3, ... (distances between consecutive points) are given by: D1 ~ Exp(i), D2 ~ Exp(i), D3 ~ Exp(i), etc., all independent and identically distributed (iid).
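The equivalence of the two descriptions can be checked by simulation. The sketch below builds points from iid exponential spacings (Def 19b) and then looks at the count in an interval (Def 19). The intensity, the interval and the number of trials are made-up values for illustration, not taken from the course; under Def 19 the mean and variance of the count should both be close to intensity × length.

```python
import random


def simulate_points(intensity, horizon, rng):
    """Return the points of one simulated process on [0, horizon].

    Points are built from iid Exp(intensity) spacings, as in Def 19b.
    """
    points = []
    position = rng.expovariate(intensity)  # first spacing D1
    while position <= horizon:
        points.append(position)
        position += rng.expovariate(intensity)  # next spacing
    return points


def count_in(points, left, right):
    """N_A for the interval A = [left, right]."""
    return sum(1 for p in points if left <= p <= right)


rng = random.Random(0)
intensity, left, right = 2.0, 1.0, 2.5  # hypothetical values
counts = [
    count_in(simulate_points(intensity, 5.0, rng), left, right)
    for _ in range(20000)
]
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / len(counts)

# For a Poisson count, mean and variance are both intensity * length(A).
print(mean_count, var_count, intensity * (right - left))
```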
Application to earthquake distribution
Ex 56 Earthquake example. Data: 68 earthquakes around the Strait of Messina in the 1700-1980 period. Only magnitude > 4.5 kept. Foreshocks and aftershocks removed.
Ex 56 Data:... Available under Files in case you are curious: earthquakes.xls 1/average = intensity =
Useful facts
Exponential (rate λ): Density: $\lambda e^{-\lambda x}$, $x \in [0, \infty)$. Mean: $1/\lambda$. Variance: $1/\lambda^2$. CDF: $1 - e^{-\lambda x}$.
Poisson process (intensity λ): Let A and B be two disjoint intervals, with lengths $L_A$ and $L_B$. Then $N_A$ and $N_B$ are independent Poisson r.v.s, with rates $\lambda L_A$ and $\lambda L_B$.
Poisson distribution (mean λ): PMF: $\lambda^x e^{-\lambda} / x!$, $x \in \{0, 1, 2, \ldots\}$. Mean: λ. Variance: λ.
Ex 56a Clicker question. The probability that a recurrence interval is shorter than 10 months is given by the expression: A. P(T < 10) where T~Exp(λ = 0.2403) B. P(T < 0.83) where T~Exp(λ = 0.2403) C. P(T < 10) where T~Poisson(λ = 0.2403*10) D. P(T < 0.83) where T~Poisson(λ = 0.2403*10)
Ex 56b Clicker question What is the probability that a recurrence interval is shorter than 10 months? A. 0.1815 B. 0.8185 C. 0.0904 D. 0.9096
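A quick numerical check of Ex 56b, as a sketch: it assumes the intensity 0.2403 is measured per year (which is what the 10 months = 0.83 conversion in Ex 56a suggests) and uses the exponential CDF from the Useful facts slide. The function name `exp_cdf` is just an illustrative helper.

```python
import math

RATE = 0.2403  # intensity per year (assumed unit), from the slides


def exp_cdf(t, rate):
    """CDF of an Exponential(rate) random variable: 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0


# 10 months expressed in years
p_short = exp_cdf(10 / 12, RATE)
print(round(p_short, 4))  # 0.1815
```

Under these assumptions the value matches option A.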
Ex 56c Clicker question What proportion of the observed recurrence intervals are greater than 5 years? A. 0.10 B. 0.16 C. 0.28 D. 0.38
Ex 56c Clicker question a. What proportion of the observed recurrence intervals are greater than 5 years (hint: find the areas of the bars, where area = density × bar width)? How does it compare to the theoretical value assuming an Exponential distribution for the recurrence intervals? Proportion = 0.06 + 0.06 + 0.02 + 0.03 + 0.02 + 0.05 + 0.02 + 0.02 = 0.28. The observed proportion is close to the theoretical probability $P(T > 5) = 1 - F(5) = e^{-0.2403 \times 5} = 0.301$.
Ex 56d Clicker question What is the probability that a recurrence interval is greater than 5 years (theoretical value)? A. 0.095 B. 0.905 C. 0.301 D. 0.699
Ex 56e Clicker question. The probability that there are no earthquakes in 5 years is given by the expression: A. P(T < 5) where T~Exp(λ = 0.2403) B. P(T > 5) where T~Exp(λ = 0.2403) C. P(T = 0) where T~Exp(λ = 0.2403) D. P(X = 0) where X~Poisson(λ = 0.2403*5) E. P(X = 0) where X~Poisson(λ = 0.2403). Both B and D are correct.
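The spacing-based and count-based expressions for "no earthquakes in 5 years" give the same number, which the sketch below checks directly. It assumes the intensity 0.2403 is per year; the helper names are illustrative only.

```python
import math

RATE = 0.2403  # intensity per year (assumed unit), from the slides
YEARS = 5


def exp_survival(t, rate):
    """P(T > t) for T ~ Exponential(rate)."""
    return math.exp(-rate * t)


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


p_spacing = exp_survival(YEARS, RATE)      # first spacing longer than 5 years
p_count = poisson_pmf(0, RATE * YEARS)     # zero points in an interval of length 5
print(round(p_spacing, 3), round(p_count, 3))  # 0.301 0.301
```

This is also the theoretical value asked for in Ex 56d.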
Ex 56f Clicker question. An earthquake just occurred. What is the probability that one has to wait for more than 30 years to observe the 10th earthquake? A. P(T > 30) where T~Exp(λ = 0.2403) B. P(X = 9) where X~Poisson(λ = 0.2403*30) C. P(X = 10) where X~Poisson(λ = 0.2403*30) D. P(X ≤ 9) where X~Poisson(λ = 0.2403*30) E. P(G > 30) where G~Gamma(shape = 10, rate = 0.2403)
Another way of answering the last question: the Gamma distribution
Examples of Gamma random variables. 3-Spacings (distance spanned by 3 consecutive spacings): ~ Gamma(rate = i, shape = 3). 2-Spacings: ~ Gamma(rate = i, shape = 2). 1-Spacings: ~ Exponential(rate = i).
Def 20 The Gamma distribution. Density: $f(x) \propto x^{\text{shape} - 1} e^{-\text{rate} \cdot x}$ for x positive. CDF: no convenient form. Here $f(x) \propto g(x)$ means: $f(x) = C g(x)$ for some constant C.
Def 20 The Gamma distribution. In other words: the sum of n independent Exponential random variables with rate λ is a Gamma with shape n and rate λ.
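A simulation sketch of this statement: sums of 10 independent Exponential(0.2403) draws are compared with direct Gamma draws. Note that Python's `random.gammavariate(alpha, beta)` takes a shape and a scale, so the rate has to be inverted. The seed, the number of trials and the threshold of 30 are illustrative choices.

```python
import random

RATE = 0.2403
SHAPE = 10
TRIALS = 20000
rng = random.Random(2)

# Sum of SHAPE iid exponential spacings
sums = [sum(rng.expovariate(RATE) for _ in range(SHAPE))
        for _ in range(TRIALS)]

# Direct Gamma draws; gammavariate uses (shape, scale) with scale = 1 / rate
gammas = [rng.gammavariate(SHAPE, 1.0 / RATE) for _ in range(TRIALS)]


def fraction_above(samples, threshold):
    """Empirical estimate of P(sample > threshold)."""
    return sum(1 for s in samples if s > threshold) / len(samples)


frac_sum = fraction_above(sums, 30.0)
frac_gamma = fraction_above(gammas, 30.0)
mean_sum = sum(sums) / TRIALS

# The two fractions should be close; the mean should be near SHAPE / RATE.
print(frac_sum, frac_gamma, mean_sum, SHAPE / RATE)
```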
Def 20 The Gamma distribution. Density: $f(x) \propto x^{\text{shape} - 1} e^{-\text{rate} \cdot x}$ for x positive. CDF: no convenient form. Special case: if shape = 1, this is just an exponential. 1-Spacings: ~ Exponential(rate = i) ~ Gamma(rate = i, shape = 1).
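A sketch of the density with its normalizing constant written out. The slide only gives the density up to a constant; the value C = rate^shape / Γ(shape) used below is the standard one and is added here for illustration. The code checks that shape = 1 recovers the exponential density and that the density integrates to about 1 (midpoint rule, with an arbitrary grid).

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density: C * x^(shape - 1) * exp(-rate * x) for x > 0."""
    if x <= 0:
        return 0.0
    constant = rate ** shape / math.gamma(shape)  # normalizing constant C
    return constant * x ** (shape - 1) * math.exp(-rate * x)


def exp_pdf(x, rate):
    """Exponential density: rate * exp(-rate * x) for x > 0."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0


# Special case: shape = 1 gives the exponential density
print(gamma_pdf(2.0, 1, 0.5), exp_pdf(2.0, 0.5))

# Midpoint-rule check that the density integrates to (about) 1 on [0, 200]
step = 0.01
total = sum(gamma_pdf((k + 0.5) * step, 3, 0.2403) * step
            for k in range(20000))
print(total)
```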
A first glance at the Normal approximation
Situations where the Normal approximation is useful: sums of (pretty much*) any iid random variables, $S_n = X_1 + X_2 + \dots + X_n$. (* precise conditions will be described later)
Example: random walks. $S_n = S_{n-1} + X_n$. So far, we know how to deal with cases where $X_n$ = 0 or 1. [Figure: $S_n$ in dollars against n = year.] Suppose $X_i$ is more complicated: here, for example, $X_n$ can be 0, 1, or 2. [Figure: $S_n$ in dollars against n = year.]
Prop 15 Normal approximation: intuition. If: $S = X_1 + X_2 + \dots + X_n$. Then: $S \overset{d}{\approx} a + bZ$. Where (more detail in next slides): a and b are easy-to-compute numbers; Z has the standard Normal distribution; $\overset{d}{\approx}$ means approximately equal in distribution.
Normal approximation: intuition. If: $S = X_1 + X_2 + \dots + X_n$. Then: $S \overset{d}{\approx} a + bZ$. Where: $a = n\mu$, $b = \sqrt{n}\,\sigma$, $\mu = E[X_i]$, $\sigma = \sqrt{\mathrm{Var}(X_i)}$ = standard deviation (SD).
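A simulation sketch of the approximation for a random walk whose steps can be 0, 1, or 2. The steps are assumed equally likely, and n = 50, the threshold 55, the seed and the number of trials are all made-up values for illustration. The standard Normal CDF is computed from `math.erf`.

```python
import math
import random


def normal_cdf(z):
    """CDF of the standard Normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


values = [0, 1, 2]          # assumed equally likely step sizes
n = 50
mu = sum(values) / len(values)
sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

a = n * mu                  # a = n * mu
b = math.sqrt(n) * sigma    # b = sqrt(n) * sigma

rng = random.Random(1)
trials = 20000
t = 55
hits = sum(
    1 for _ in range(trials)
    if sum(rng.choice(values) for _ in range(n)) <= t
)
empirical = hits / trials               # simulated P(S <= t)
approx = normal_cdf((t - a) / b)        # P(a + b Z <= t)
print(empirical, approx)
```

The two numbers should be close but not identical: S only takes integer values, so a small discrepancy is expected at this n.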
Def 21 Standard normal distribution. Density: $f(z) \propto \exp\left(-\tfrac{1}{2} z^2\right)$
Def 22 $\overset{d}{=}$ and $\overset{d}{\approx}$. $X \overset{d}{=} Y$ means: $P(X \le t) = P(Y \le t)$ for all t. $X \overset{d}{\approx} Y$ means: $P(X \le t) \approx P(Y \le t)$ for all t.
Ex 57 Example: gambling when you can borrow as much money as you want. Suppose a roulette wheel has: 2 green squares, 18 black squares, 18 red squares. You bet $1 each round. You can borrow as much as you want and will not stop until you play 100 rounds. What is the probability you are ahead (do not owe money) after 100 rounds?
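A sketch of how Ex 57 could be worked out. The slide does not say which bet is placed; the code assumes an even-money bet on a single colour (win $1 with probability 18/38, lose $1 otherwise), and reads "ahead" as net winnings of at least 0. It computes the Normal approximation with a = nµ and b = √n σ, a continuity-corrected variant, and the exact Binomial answer for comparison.

```python
import math


def normal_cdf(z):
    """CDF of the standard Normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n = 100
p_win = 18 / 38                      # ASSUMPTION: even-money bet on one colour

mu = p_win * 1 + (1 - p_win) * (-1)  # mean of one round's net gain
sigma = math.sqrt(1 - mu ** 2)       # E[X^2] = 1 since X is +1 or -1

a = n * mu
b = math.sqrt(n) * sigma

# Normal approximation to P(S >= 0)
p_normal = 1 - normal_cdf((0 - a) / b)

# S moves in steps of 2, so S >= 0 is the same as S > -2; use the midpoint -1
p_normal_cc = 1 - normal_cdf((-1 - a) / b)

# Exact answer: S >= 0 exactly when the number of wins is at least 50
p_exact = sum(math.comb(n, w) * p_win ** w * (1 - p_win) ** (n - w)
              for w in range(50, n + 1))

print(p_normal, p_normal_cc, p_exact)
```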
Using standard normal CDF tables
Ex 58 Another example. Ideal class size: 150 students. 30% accept the admission offer. Proposed policy: send 450 offers. What is the probability that no more than 150 attend? A. 51% B. 79% C. 94% D. 99%
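A sketch of Ex 58, assuming each of the 450 offers is accepted independently with probability 0.3, so the number attending is a sum of 450 iid 0/1 variables. It applies the Normal approximation with a = nµ and b = √n σ and compares it with the exact Binomial probability.

```python
import math


def normal_cdf(z):
    """CDF of the standard Normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n = 450          # offers sent
p = 0.3          # acceptance probability (independence is an assumption)
limit = 150      # ideal class size

mu = p                           # mean of one 0/1 acceptance indicator
sigma = math.sqrt(p * (1 - p))   # SD of one indicator

a = n * mu
b = math.sqrt(n) * sigma

# Normal approximation to P(S <= 150)
p_normal = normal_cdf((limit - a) / b)

# Exact Binomial probability for comparison
p_exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
              for k in range(limit + 1))

print(round(p_normal, 2), p_exact)
```

Under these assumptions the approximation comes out at about 94%, which would point to option C.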