Approximations and more PMFs and PDFs

Saad Mneimneh

1 Approximation of binomial with Poisson

Consider the binomial distribution:

    b(k, n, p) = C(n, k) p^k (1 − p)^(n−k),   0 ≤ k ≤ n

where C(n, k) = n!/(k!(n − k)!). Assume that n is large and p is small, but np → λ in the limit. For a fixed n:

    b(0, n, p) = C(n, 0) p^0 (1 − p)^n = (1 − p)^n = (1 − λ/n)^n ≈ e^(−λ)

    b(1, n, p) = C(n, 1) p (1 − p)^(n−1) = [np/(1 − p)] b(0, n, p) ≈ λ b(0, n, p) ≈ λ e^(−λ)

    b(2, n, p) = C(n, 2) p^2 (1 − p)^(n−2) = [n(n − 1)p^2 / (2(1 − p)^2)] b(0, n, p) ≈ [(n − 1)λ^2 / (2n)] b(0, n, p) ≈ (λ^2/2) e^(−λ)

Continuing this way, we find that (when n is large)

    b(k, n, p) ≈ p(k, λ) = λ^k e^(−λ) / k!

where λ = np. While p(k, λ) represents an approximation for the binomial probability b(k, n, p), p(k, λ) is a probability mass function on its own, known as the Poisson mass function. Note that p(k, λ) = λ^k e^(−λ) / k! ≥ 0 for all k ≥ 0, and

    Σ_{k≥0} p(k, λ) = e^(−λ) Σ_{k≥0} λ^k / k! = e^(−λ) e^λ = 1

Example: what is the probability that among 500 people, exactly k will have their birthday on New Year's day? If the people are chosen at random, then this is n = 500 Bernoulli trials with p = 1/365, so λ = 500/365 = 1.3699. For different values of k, we have:

    k    p(k, λ)    b(k, n, p)
    0    0.2541     0.2537
    1    0.3481     0.3484
    2    0.2385     0.2388
    3    0.1089     0.1089
    4    0.0373     0.0372
    5    0.0102     0.0101
    6    0.0023     0.0023
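
To make the table concrete, here is a short numerical check (my addition, not part of the original notes; it uses only the Python standard library) that recomputes both columns for the birthday example:

    # Compare the binomial pmf b(k, n, p) with its Poisson approximation p(k, lambda)
    # for the birthday example: n = 500, p = 1/365, lambda = n*p = 1.3699.
    from math import comb, exp, factorial

    n, p = 500, 1 / 365
    lam = n * p

    def binom_pmf(k, n, p):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    def poisson_pmf(k, lam):
        return lam**k * exp(-lam) / factorial(k)

    for k in range(7):
        print(f"{k}  poisson={poisson_pmf(k, lam):.4f}  binomial={binom_pmf(k, n, p):.4f}")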

2 The exponential distribution

Many statistical observations actually lead to a Poisson random variable. For instance, consider a sequence of random events occurring over time. If we divide time into small intervals of length 1/n (n is large), then:

- the probability of an event occurring within an interval becomes small; call this p_n
- the probability that more than one event occurs in an interval is negligible

Assuming that events are statistically independent, fix an interval of time t and observe that it contains approximately nt small intervals of length 1/n. Therefore, the probability of k events occurring during t is, in the limit, b(k, nt, p_n). Now if we conceive that np_n → λ: we can show that p_n < 2p_{2n}, since the probability that no event occurs in a small interval of length 1/n is the probability that no event occurs in either of its two halves of length 1/(2n); therefore, 1 − p_n = (1 − p_{2n})^2 = 1 − 2p_{2n} + p_{2n}^2, which means p_n = 2p_{2n} − p_{2n}^2 and thus p_n < 2p_{2n}. In addition, np_n → ∞ is not a sensible situation, since the expected number of events during t is ntp_n, and this would imply infinitely many occurrences even in the smallest intervals. Then (assuming np_n → λ):

    b(k, nt, p_n) = b(k, nt, λt/(nt)) → (λt)^k e^(−λt) / k!

where λ in the Poisson approximation is replaced with λt.
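
This limit is easy to see numerically. The sketch below (my addition; λ = 2, t = 1.5, and k = 3 are arbitrary illustration values) evaluates b(k, nt, λ/n) for growing n and compares it with the Poisson limit:

    # Events over an interval of length t: nt slots, each a Bernoulli(lambda/n) trial.
    # As n grows, b(3, nt, p_n) approaches the Poisson(lambda*t) pmf at k = 3.
    from math import comb, exp, factorial

    lam, t, k = 2.0, 1.5, 3

    for n in (10, 100, 10000):
        pn = lam / n              # success probability per slot of length 1/n
        slots = round(n * t)      # roughly nt slots cover the interval
        bk = comb(slots, k) * pn**k * (1 - pn)**(slots - k)
        print(f"n={n:6d}  b({k}, nt, p_n) = {bk:.5f}")

    print("limit:", (lam * t)**k * exp(-lam * t) / factorial(k))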

We will see that λ can be interpreted as a rate. Consider the time T until the occurrence of the first event. Observe that P(T ≤ t) is the probability that at least one event occurs during a time interval of length t. Therefore,

    P(T ≤ t) = 1 − p(0, λt) = 1 − e^(−λt)

But

    P(T ≤ t) = ∫_0^t f_T(z) dz = 1 − e^(−λt)

where f_T(t) is the (continuous) probability density function of T. If we differentiate both sides we get

    d/dt ∫_0^t f_T(z) dz = d/dt (1 − e^(−λt))  ⟹  f_T(t) = λ e^(−λt)

This is known as the exponential distribution. We say that T is exponentially distributed. Now we can compute the expected value of T (recall this is the time until the occurrence of the first event):

    E[T] = ∫_0^∞ t λ e^(−λt) dt = 1/λ

Therefore, λ can be interpreted as the rate of occurrence.

Example: suppose that bus arrival is exponentially distributed. Given that you have waited for some time τ, what is the probability that you will wait an additional time t before the bus arrives? Using Bayes' rule, and noting that P(T > τ | T > t + τ) = 1:

    P(T − τ > t | T > τ) = P(T > t + τ | T > τ)
                         = P(T > τ | T > t + τ) P(T > t + τ) / P(T > τ)
                         = P(T > t + τ) / P(T > τ)
                         = e^(−λ(t+τ)) / e^(−λτ) = e^(−λt) = P(T > t)

This can be interpreted as follows: the fact that you have already waited for some time does not mean that you will wait less for the bus to arrive. Of course, this is not true if the bus is running on a fixed schedule. In fact, this is a property of the exponential distribution alone, known as the memoryless property:

    P(T − τ > t | T > τ) = P(T > t)
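
The memoryless property can be checked by simulation. The sketch below (my addition; λ = 0.5, τ = 1, and t = 2 are arbitrary values) estimates the conditional probability from exponential samples and compares it with e^(−λt):

    # Check P(T - tau > t | T > tau) = P(T > t) = e^(-lambda*t) by simulation.
    import random
    from math import exp

    random.seed(1)
    lam, tau, t = 0.5, 1.0, 2.0
    samples = [random.expovariate(lam) for _ in range(1_000_000)]

    waited = [x for x in samples if x > tau]        # condition on T > tau
    extra = sum(1 for x in waited if x > tau + t)   # of those, T - tau > t

    print("simulated:", extra / len(waited))
    print("e^(-lam*t):", exp(-lam * t))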

The same result can be obtained if we work directly with the density f_T(t). For this, we need to use Bayes' rule with a mix of probabilities and densities. To see this, recall that for a given continuous random variable X and a small δ, P(x ≤ X ≤ x + δ) ≈ δ f_X(x). Therefore, given an event E,

    P(x ≤ X ≤ x + δ | E) = P(E | x ≤ X ≤ x + δ) P(x ≤ X ≤ x + δ) / P(E)

    δ f_X(x | E) ≈ P(E | X = x) δ f_X(x) / P(E)

By taking the limit as δ → 0, we have

    f_X(x | E) = P(E | X = x) f_X(x) / P(E)

With a simplified notation,

    f(x | E) = P(E | X = x) f(x) / ∫ P(E | X = x) f(x) dx

and similarly,

    P(E | X = x) = f(x | E) P(E) / [f(x | E) P(E) + f(x | E^c)(1 − P(E))]

Going back to the memoryless property:

    f(t | T > τ) = P(T > τ | T = t) f(t) / P(T > τ)

Since P(T > τ | T = t) is 0 when t ≤ τ and 1 otherwise, we get

    f(t | T > τ) = 0                for t ≤ τ
    f(t | T > τ) = λ e^(−λ(t−τ))   for t > τ

Another example: suppose y is the time before the first occurrence of a radioactive decay, which is measured by an instrument; but, because there is a delay built into the mechanism, the decay is recorded as having taken place at x > y. We actually have a value of x, but would like to say what we can about the value of y on the basis of this observation. Assume:

    f(y) = e^(−y),             y ≥ 0
    f(x | y) = k e^(−k(x−y)),  x ≥ y

Using Bayes,

    f(y | x) = f(x | y) f(y) / f(x) = k e^(−kx) e^((k−1)y) / f(x)

Now f(x) can be computed as

    f(x) = ∫_0^x f(x, y) dy = ∫_0^x k e^(−kx) e^((k−1)y) dy = [k e^(−kx) / (k − 1)] e^((k−1)y) |_0^x = [k e^(−kx) / (k − 1)] (e^((k−1)x) − 1)

Therefore,

    f(y | x) = (k − 1) e^((k−1)y) / (e^((k−1)x) − 1),   0 ≤ y ≤ x
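
As a sanity check of this posterior, the Monte Carlo sketch below (my addition; k = 3 and the observation x0 = 1 are arbitrary) simulates the model y ~ Exp(1), x = y + Exp(k), conditions on x falling near x0, and compares the empirical conditional mean of y with the mean of the closed-form posterior:

    # Check f(y|x) = (k-1) e^((k-1)y) / (e^((k-1)x) - 1), 0 <= y <= x, by simulation.
    import random
    from math import exp

    random.seed(2)
    k, x0, delta = 3.0, 1.0, 0.01   # delay rate, observed x, conditioning window
    a = k - 1

    kept = []
    for _ in range(2_000_000):
        y = random.expovariate(1.0)      # true decay time
        x = y + random.expovariate(k)    # recorded (delayed) time
        if abs(x - x0) < delta:          # condition on x close to x0
            kept.append(y)

    # Mean of the closed-form posterior:
    # E[y|x0] = (x0 e^(a x0) - (e^(a x0) - 1)/a) / (e^(a x0) - 1)
    exact = (x0 * exp(a * x0) - (exp(a * x0) - 1) / a) / (exp(a * x0) - 1)
    print("simulated:  ", sum(kept) / len(kept))
    print("closed form:", exact)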

3 DeMoivre-Laplace approximation

Consider a binomial random variable with parameters n and p, and let q = 1 − p. The DeMoivre-Laplace result says that if (k − np)^3 / n^2 → 0 (intuitively, k does not deviate much from np) as n → ∞, then

    b(k, n, p) ≈ (1/√(2πnpq)) e^(−(k−np)^2 / (2npq)) = (1/√(npq)) φ((k − np)/√(npq))

where φ(x) = (1/√(2π)) e^(−x^2/2). This result can be obtained using Stirling's approximation for the factorials in the binomial coefficients. With some additional work, it can also lead to (assuming both a and b satisfy the condition above)

    Σ_{k=a}^{b} b(k, n, p) ≈ Σ_{k=a}^{b} (1/√(npq)) φ((k − np)/√(npq)) ≈ Φ((b + 0.5 − np)/√(npq)) − Φ((a − 0.5 − np)/√(npq))

where Φ(x) = ∫_{−∞}^{x} φ(y) dy. Note that b(k, n, p) is the probability of k successes among n independent Bernoulli trials. Therefore, Σ_{k=a}^{b} b(k, n, p) = P(a ≤ S_n ≤ b), where S_n = X_1 + X_2 + ... + X_n and each X_i is a Bernoulli random variable. Hence

    P(a ≤ S_n ≤ b) ≈ Φ((b + 0.5 − np)/√(npq)) − Φ((a − 0.5 − np)/√(npq))

Example: toss a fair coin 200 times. What is the probability that the number of heads will be between 95 and 105? That is b(95, 200, 0.5) + ... + b(105, 200, 0.5), but this is a cumbersome computation. Using the approximation, we have a shortcut:

    P(95 ≤ S_n ≤ 105) ≈ Φ((105 + 0.5 − 100)/√50) − Φ((95 − 0.5 − 100)/√50) = Φ(5.5/√50) − Φ(−5.5/√50) = 0.56331... (from tables)

The actual answer is 0.56325...

If instead of considering S_n we consider S_n* = (S_n − np)/√(npq), then for arbitrary fixed a and b (why?), the DeMoivre-Laplace result yields

    P(a ≤ S_n* ≤ b) → Φ(b) − Φ(a)

This is a special case of the more general Central Limit Theorem.
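
The coin example is easy to verify exactly. The sketch below (my addition) computes the exact binomial sum and the continuity-corrected normal approximation, using Φ(x) = (1 + erf(x/√2))/2:

    # Exact P(95 <= S <= 105) for S ~ Binomial(200, 0.5) versus the
    # DeMoivre-Laplace approximation with continuity correction.
    from math import comb, erf, sqrt

    n, p = 200, 0.5
    q = 1 - p
    mu, sd = n * p, sqrt(n * p * q)

    def Phi(x):  # standard normal CDF
        return 0.5 * (1 + erf(x / sqrt(2)))

    exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(95, 106))
    approx = Phi((105 + 0.5 - mu) / sd) - Phi((95 - 0.5 - mu) / sd)

    print("exact: ", exact)    # ~0.56325
    print("approx:", approx)   # ~0.56331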

4 The Gaussian (Normal) distribution

Note that np is the mean of the binomial distribution, and npq its variance. The DeMoivre-Laplace approximation can be generalized to the following expression:

    (1/(√(2π) σ)) e^(−(x−µ)^2 / (2σ^2))

where µ is a mean and σ^2 is a variance. This expression then represents a density function on its own, known as the Gaussian or normal density. An important property of this density is that a linear combination of independent normal random variables gives a normal random variable. In other words, let X = a_1 X_1 + a_2 X_2 + ... + a_n X_n, where X_i is a normal random variable with mean µ_i and variance σ_i^2. Then X is a normal random variable with mean µ = Σ_i a_i µ_i and variance Σ_i a_i^2 σ_i^2. This can be shown by finding the transform of X, E[e^(sX)] (see below), which is equal to Π_i E[e^(s a_i X_i)] because the X_i are independent, and then confirming that this transform is the transform of a normal random variable with the desired mean and variance. More importantly, the Gaussian distribution is at the heart of the most celebrated result in probability theory, the Central Limit Theorem.

5 The Central Limit Theorem

Let X_1, X_2, ..., X_n be independent identically distributed random variables with finite mean µ and finite variance σ^2. Define S_n = X_1 + X_2 + ... + X_n. Then, as n goes to infinity,

    P(a ≤ (S_n − nµ)/(σ√n) ≤ b) → Φ(b) − Φ(a)

Example: consider a coin with probability p of getting a head. We toss the coin n times. Let S_n be the number of heads. How large should n be to guarantee that

    P(|S_n/n − p| ≤ 0.1) ≥ 0.95 ?

We would like |S_n − np| ≤ 0.1n with probability at least 0.95. Let σ^2 be the variance of the Bernoulli trial corresponding to a coin toss. We need |S_n − np|/(σ√n) ≤ 0.1√n/σ with probability at least 0.95. Since σ^2 = p(1 − p), σ^2 has a maximum of 1/4, so σ ≤ 1/2. So it should be enough to guarantee |S_n − np|/(σ√n) ≤ 0.2√n with probability at least 0.95. Using the Central Limit Theorem, we need

    Φ(0.2√n) − Φ(−0.2√n) ≥ 0.95

From the tables, this requires 0.2√n ≥ 1.96, i.e., n ≥ 96.04, so n = 97 suffices. Note that this is an approximation, since n is not really infinity; moreover, the bound 0.2√n is not fixed and depends on n. A better guarantee, but less practical, is to use Chebyshev's inequality (which can also prove the weak law of large numbers):

    P(|S_n/n − p| ≥ ǫ) ≤ σ^2/(nǫ^2)

Therefore, we need n ≥ 1/(4 · 0.1^2 · 0.05), or n ≥ 500.

Another example: the law of large numbers (weak form). Consider the following probability:

    P(|S_n/n − p| ≤ ǫ) = P(|S_n − np|/(σ√n) ≤ ǫ√n/σ) ≈ Φ(ǫ√n/σ) − Φ(−ǫ√n/σ)

Therefore, for any fixed ǫ, as n → ∞,

    P(|S_n/n − p| ≤ ǫ) → Φ(∞) − Φ(−∞) = 1

We say that S_n/n → p in probability. This is our intuitive notion that as n increases, the average number of successes is close to the probability of success (1/2 if the coin is fair, for instance).

Here is a sketch of a proof of the Central Limit Theorem. Given a random variable X, consider the transform E[e^(sX)] (a function of s). The transform uniquely defines the PDF. The transform of a Gaussian (normal) random variable with mean 0 and variance 1 can be easily computed to be e^(s^2/2). Now consider the transform of (S_n − nµ)/(σ√n):

    E[e^(s((X_1−µ)/(σ√n) + ... + (X_n−µ)/(σ√n)))] = E[e^(s(X_1−µ)/(σ√n)) ... e^(s(X_n−µ)/(σ√n))]

Since the X_i are independent and identically distributed, we get

    E[e^(s(X−µ)/(σ√n))]^n

Note that for a given s, s/√n → 0; therefore,

    e^(s(X−µ)/(σ√n)) = 1 + s(X−µ)/(σ√n) + s^2(X−µ)^2/(2σ^2 n) + o(s^2/n)

where o(s^2/n)/(s^2/n) → 0. Therefore,

    E[e^(s(X−µ)/(σ√n))]^n = E[1 + s(X−µ)/(σ√n) + s^2(X−µ)^2/(2σ^2 n) + o(s^2/n)]^n
                          = (1 + 0 + (s^2/2)/n + o(s^2/n))^n
                          ≈ e^(s^2/2 + n·o(s^2/n)) → e^(s^2/2)

since n·o(s^2/n) = s^2 · o(s^2/n)/(s^2/n) → 0. This is not a rigorous proof because it assumes that the transform exists (which is not necessarily true), but it captures the essence of a proof. A more rigorous proof based on a similar transform (the characteristic function) is possible.
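
Finally, the sample-size calculation can be checked by simulation. The sketch below (my addition) estimates P(|S_n/n − p| ≤ 0.1) for a fair coin at the CLT-derived size n = 97 and at the Chebyshev-derived size n = 500:

    # Estimate P(|S_n/n - p| <= 0.1) for a fair coin by simulation.
    import random

    random.seed(3)
    p, eps, trials = 0.5, 0.1, 20_000

    def coverage(n):
        hits = 0
        for _ in range(trials):
            s = sum(1 for _ in range(n) if random.random() < p)
            if abs(s / n - p) <= eps:
                hits += 1
        return hits / trials

    print("n = 97: ", coverage(97))    # around 0.95 or slightly above
    print("n = 500:", coverage(500))   # essentially 1; Chebyshev is much looser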

depeds o. A better guaratee, but less practical, is to use Chebychev iequality (which ca also prove the weak law of large umbers: P( S p ǫ σ2 ǫ 2 Therefore, we eed 1/(4.1 2.5 or 5. Aother example: Law of large umbers (weak form. Cosider the followig probability: P( S ǫ p ǫ = P( S σ Φ( ǫ/σ Φ( ǫ/σ Therefore, for ay fixed ǫ, as, P( S p ǫ Φ( Φ( = 1 We say that S / p i probability. This is our ituitive otio that as icreases the average umber of successes is close to the probability of success (1/2 if coi is fair for istace. Here s a sketch of a proof for the Cetral Limit Theorem: Give a radom variable X, cosider the trasform E[e sx ] (a fuctio of s. The trasform uiquely defies the PDF. The trasform of a Gaussia (ormal radom variable with mea ad variace 1 ca be easily computed to be e s2 /2. Now cosider the trasform of S µ σ. E[e s( X 1 µ σ +...+ X µ σ ] = E[e s X 1 µ σ s...e X µ σ ] Sice X i are idepedet ad idetically distributed, we get: E[e s X µ σ ] Note that for a give s, s/ ; therefore, where o(s2 / s 2 / e s X µ σ s X µ = 1 + + s2 (X µ 2 σ 2σ 2 + o(s 2 /. Therefore, E[e s X µ σ ] = E[1 + s X µ + s2 (X µ 2 σ 2σ 2 + o(s 2 /] = (1 + + s2 2 + o(s2 / = (1 + s2 /2 + o(s 2 / e s2 /2+s 2 o(s 2 / s 2 / e s2 /2 This is ot a rigorous proof because it assumes that the trasform exists (which is ot ecessarily true. But it captures the essece of a proof. A more rigorous proof based o a similar trasform is possible.