Poisson approximation

Transcription:

[Figure 3: Poisson vs. normal approximations for large sample sizes. Vertical axis: $\hat{p}$ (0.06 to 0.17); horizontal axis: $n$ (logarithmic, up to 10,000); regions: "Poisson approximation" and "Normal approximation".]

[Figure 2: Approximation methods for confidence limits (maximum error: 0.04). Vertical axis: $\hat{p}$ (0.0 to 0.5); horizontal axis: $n$ (0 to 100); regions: "Normal approximation", "Do not approximate" and "Poisson approximation"; rule-of-thumb curves R1 to R5.]

[Figure 1: Approximation methods for confidence limits (maximum error: 0.01). Vertical axis: $\hat{p}$ (0.0 to 0.5); horizontal axis: $n$ (0 to 100); regions: "Normal approximation", "Do not approximate" and "Poisson approximation".]

[17] R.E. Walpole and R.H. Myers, Probability and Statistics for Engineers and Scientists, Fifth Edition, Macmillan, 1993.
[18] S. Wolfram, Mathematica: A System for Doing Mathematics by Computer, Second Edition, Addison-Wesley, 1991.

[5] J.E. Freund, Mathematical Statistics, Fifth Edition, Prentice-Hall, 1992.
[6] B.K. Ghosh, "A Comparison of Some Approximate Confidence Intervals for the Binomial Parameter", Journal of the American Statistical Association, Volume 74, Number 368, 1979, pp. 894-900.
[7] R.N. Goldman and J.S. Weinberg, Statistics: An Introduction, Prentice-Hall, 1985.
[8] R.V. Hogg and E.A. Tanis, Probability and Statistical Inference, Fourth Edition, 1993.
[9] L.L. Lapin, Probability and Statistics for Modern Engineering, Brooks/Cole, 1983.
[10] R.J. Larsen and M.L. Marx, An Introduction to Mathematical Statistics and Its Applications, Second Edition, Prentice-Hall, 1986.
[11] R.F. Ling, "Just Say No to Binomial (and other Discrete Distributions) Tables", American Statistician, Volume 46, Number 1, 1992, pp. 53-54.
[12] W. Mendenhall and T. Sincich, Statistics for Engineering and the Sciences, Third Edition, Macmillan, 1992.
[13] S. Ross, A First Course in Probability, Third Edition, Macmillan, 1988.
[14] M. Schader and F. Schmid, "Two Rules of Thumb for the Approximation of the Binomial Distribution by the Normal Distribution", American Statistician, Volume 43, Number 1, 1989, pp. 23-24.
[15] R.L. Scheaffer and J.T. McClave, Probability and Statistics for Engineers, Third Edition, PWS-Kent, 1990.
[16] K.S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, Prentice-Hall, 1982.

For sample sizes larger than 150, the absolute error of either the upper or the lower confidence limit is less than 0.01 if the appropriate approximation technique is used. Figure 3 should be consulted for specific guidance as to whether the normal or Poisson approximation is appropriate.

Introductory probability and statistics textbooks targeting statistics and mathematics majors would benefit from including the use of the F distribution to find $p_L$ and $p_U$. Also, more of these texts should include the use of the Poisson approximation to the binomial distribution for determining interval estimates for $p$. These confidence limits only require a table look-up associated with the chi-square distribution and are very accurate for large $n$ and small $p$.

Acknowledgment

This research was supported by the Institute for Computer Applications in Science and Engineering (ICASE). Their support is gratefully acknowledged. Helpful comments from Pam Burch, Herb Multhaup and Bruce Schmeiser are gratefully acknowledged.

References

[1] A.D. Aczel, Complete Business Statistics, Second Edition, Irwin, 1993.
[2] C. Blyth, "Approximate Binomial Confidence Limits", Journal of the American Statistical Association, Volume 81, Number 395, 1986, pp. 843-855.
[3] G. Casella and R. Berger, Statistical Inference, Wadsworth and Brooks/Cole, 1990.
[4] H. Chen, "The Accuracy of Approximate Intervals for a Binomial Parameter", Journal of the American Statistical Association, Volume 85, Number 410, 1990, pp. 514-518.

corresponding to the normal approximation rule $n\hat{p}(1-\hat{p}) \ge 10$; the rule labeled "R4" is a plot of $\hat{p} = \frac{1}{2} - \frac{\sqrt{n(n-36)}}{2n}$ on the range $[36, 100]$ corresponding to the normal approximation rule $n\hat{p}(1-\hat{p}) > 9$; the rule labeled "R5" is a plot of $n \ge 20$ and $\hat{p} \le 0.05$ or $n \ge 100$ and $n\hat{p} \le 10$ corresponding to the guideline for using the Poisson approximation. The $(n, \hat{p})$ combinations falling above the dotted curves for rules R1, R2, R3, and R4 correspond to those that would be used if the rules of thumb were followed. Clearly, rules R3 and R4 are significantly more conservative than R1 and R2.

Figure 3 is a continuation of Figure 2 for sample sizes larger than $n = 100$. Note that the vertical axis has been modified and the horizontal axis is logarithmic. The curve in the figure represents the largest value of $\hat{p}$ where the Poisson approximation to the binomial is superior to the normal approximation to the binomial. Since this relationship is linear, a rather unwieldy rule of thumb for $n$ between 100 and 10,000 is: use the normal approximation over the Poisson approximation if

$$\hat{p} > \frac{5.2 - \log_{10} n}{18.8}.$$

4 Conclusions

Although there are a number of different variations of the calculations that have been conducted here (e.g., one-sided confidence intervals, different significance levels, different definitions of error), there are three general conclusions:

The traditional advice from most textbooks of using the normal and Poisson approximations to the binomial for the purpose of computing confidence intervals for $p$ should be tempered with a statement such as: "the Poisson approximation should be used when $n \ge 20$ and $p \le 0.05$ if the analyst can tolerate an error that may be as large as 0.04" (see Figure 2).
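The rule of thumb for $n$ between 100 and 10,000 can be written as a short function. This is a minimal Python sketch; the function names `poisson_normal_threshold` and `preferred_approximation` are hypothetical, and the sketch assumes $0 < \hat{p} \le 1/2$ and simply encodes the stated inequality without checking whether either approximation is accurate enough for a given tolerance.

```python
import math


def poisson_normal_threshold(n):
    """Value of p_hat above which the normal approximation is preferred,
    using the rule p_hat > (5.2 - log10(n)) / 18.8 for 100 <= n <= 10,000."""
    if not 100 <= n <= 10_000:
        raise ValueError("the rule of thumb is stated for n between 100 and 10,000")
    return (5.2 - math.log10(n)) / 18.8


def preferred_approximation(n, p_hat):
    """Return 'normal' or 'Poisson' for a point estimate 0 < p_hat <= 1/2."""
    if not 0.0 < p_hat <= 0.5:
        raise ValueError("this sketch assumes 0 < p_hat <= 1/2")
    return "normal" if p_hat > poisson_normal_threshold(n) else "Poisson"


if __name__ == "__main__":
    for n in (100, 1000, 10_000):
        print(n, round(poisson_normal_threshold(n), 4))
```

The threshold falls from roughly 0.17 at $n = 100$ to roughly 0.06 at $n = 10{,}000$, which matches the vertical range of Figure 3.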

on each Bernoulli trial is arbitrary, we only consider the range $0 < \hat{p} \le \frac{1}{2}$. Figures 1, 2 and 3 have mirror images for the range $\frac{1}{2} \le \hat{p} < 1$.

Figure 1 contains a plot of $n$ versus $\hat{p}$ for $n = 2, 4, \ldots, 100$ and considers the range $0 < \hat{p} \le \frac{1}{2}$ for a maximum error of 0.01. Thus if the actual error for a particular $(n, \hat{p})$ pair is greater than 0.01, the point lands in the "Do not approximate" region. If one of the two approximations yields an error of less than 0.01, then the pair belongs to either the "Normal approximation" or "Poisson approximation" regions, depending on which yields a smaller error. Not surprisingly, the normal approximation performs better when the point estimate is closer to $\frac{1}{2}$ and the Poisson approximation performs better when the point estimate is closer to 0. Both approximations perform better as $n$ increases. In order to avoid any spurious discontinuities in the regions, the calculations were made for even values of $n$. The edges of the regions are not smooth because of the discrete natures of $n$ and $\hat{p}$. The boundaries of the approximation regions are those $(n, \hat{p})$ pairs where the error is less than 0.01. If the horizontal axis were extended, the normal and Poisson regions would meet at approximately $n = 150$. Mathematica [18] was used for the comparisons because of its ability to hold variables to arbitrary precision.

If the maximum error is relaxed to 0.04, then there are more cases where the approximations perform adequately. Figure 2 is analogous to Figure 1 but considers an error of 0.04. This figure also contains the rules of thumb associated with the normal and Poisson approximations to the binomial distribution. In particular, the rule labeled "R1" is a plot of $\hat{p} = 5/n$ on the range $[10, 100]$ corresponding to the normal approximation rule $n\hat{p} \ge 5$ and $n(1-\hat{p}) \ge 5$; the rule labeled "R2" is a plot of $\hat{p} = \frac{4}{4+n}$ on the range $[4, 100]$ corresponding to the normal approximation rule $\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$ falling in the interval $(0, 1)$; the rule labeled "R3" is a plot of $\hat{p} = \frac{1}{2} - \frac{\sqrt{n(n-40)}}{2n}$ on the range $[40, 100]$

or

$$1 - \sum_{k=0}^{y-1} \frac{(n p_{PL})^k}{k!} e^{-n p_{PL}} = \alpha/2.$$

The left-hand side of this equation is the cumulative distribution function for an Erlang random variable with parameters $n p_{PL}$ and $y$ (denoted by $E_{n p_{PL}, y}$) evaluated at one. Consequently,

$$P[E_{n p_{PL}, y} \le 1] = \alpha/2.$$

Since $2 n p_{PL} E_{n p_{PL}, y}$ is equivalent to a $\chi^2$ random variable with $2y$ degrees of freedom, this reduces to

$$P[\chi^2_{2y} \le 2 n p_{PL}] = \alpha/2$$

or

$$p_{PL} = \frac{1}{2n} \chi^2_{2y,\, 1-\alpha/2}.$$

By a similar line of reasoning, the upper limit based on the Poisson approximation to the binomial distribution is

$$p_{PU} = \frac{1}{2n} \chi^2_{2(y+1),\, \alpha/2}.$$

This approximation works best when $p$ is small (e.g., reliability applications where the probability of failure $p$ is small).

3 Comparison of the Approximate Methods

There are a multitude of different ways to compare the approximate confidence intervals with the exact values. We have decided to compute the error of an approximate two-sided confidence interval as the maximum error

$$\max\{|p_L - \tilde{p}_L|,\ |p_U - \tilde{p}_U|\}$$

where $\tilde{p}_L$ and $\tilde{p}_U$ are the approximate lower and upper bounds, respectively. This error is computed for all combinations of $n$ and $\hat{p}$. Since the definition of "success"

    fcrit = Quantile[FRatioDistribution[2 * y, 2 * (n - y + 1)], alpha/2]
    pl = 1 / (1 + (n - y + 1) / (y * fcrit))
    fcrit = Quantile[FRatioDistribution[2 * (y + 1), 2 * (n - y)], 1 - alpha/2]
    pu = 1 / (1 + (n - y) / ((y + 1) * fcrit))

This method is significantly faster than the approach using the binomial distribution, but encounters difficulty with determining the F ratio quantiles for some combinations of $n$ and $y$.

The first approximate confidence interval is based on the normal approximation to the binomial. The random variable

$$\frac{Y - np}{\sqrt{np(1-p)}}$$

is asymptotically standard normal. Thus an approximate confidence interval for $p$ is

$$\frac{Y}{n} - z_{\alpha/2} \sqrt{\frac{\frac{Y}{n}\left(1 - \frac{Y}{n}\right)}{n}} < p < \frac{Y}{n} + z_{\alpha/2} \sqrt{\frac{\frac{Y}{n}\left(1 - \frac{Y}{n}\right)}{n}}$$

where $z_{\alpha/2}$ is the $1 - \alpha/2$ fractile of the standard normal distribution. This approximation works best when $p = \frac{1}{2}$ (e.g., political polls). It allows confidence limits that fall outside of the interval $[0, 1]$. One should also be careful when $Y = 0$ or $Y = n$ since the confidence interval will have a width of 0.

The second approximate confidence interval is based on the Poisson approximation to the binomial (see, for example, Trivedi [16], page 498). This confidence interval does not appear as often in textbooks as the first approximate confidence interval. The random variable $Y$ is asymptotically Poisson with parameter $np$. Therefore, the exact lower bound $p_L$ satisfying

$$\sum_{k=y}^{n} \binom{n}{k} p_L^k (1 - p_L)^{n-k} = \alpha/2$$

can be approximated with a Poisson lower limit $p_{PL}$ which satisfies

$$\sum_{k=y}^{\infty} \frac{(n p_{PL})^k}{k!} e^{-n p_{PL}} = \alpha/2$$
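The two approximate intervals can be sketched with the Python standard library alone. The code below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Poisson limits are found by bisection on the Poisson tail equations rather than from chi-square tables (the two routes should agree up to numerical error), and the term-by-term Poisson sum is only suitable for moderate $y$.

```python
import math
from statistics import NormalDist


def normal_limits(y, n, alpha=0.05):
    """Normal-approximation limits p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = y / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2)  # the 1 - alpha/2 fractile
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width  # may fall outside [0, 1]


def poisson_cdf(y, lam):
    """P(N <= y) for N ~ Poisson(lam), summed term by term."""
    if y < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for k in range(1, y + 1):
        term *= lam / k
        total += term
    return min(total, 1.0)


def bisect_increasing(g, lo, hi, iterations=200):
    """Root of an increasing function g on [lo, hi] with g(lo) < 0 <= g(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_limits(y, n, alpha=0.05):
    """Poisson-approximation limits (p_PL, p_PU) for y >= 1."""
    if y < 1:
        raise ValueError("this sketch covers y >= 1 only")
    # Grow the bracket until P(N <= y) at lam = hi drops to alpha/2 or below.
    hi = 2.0 * y + 10.0
    while poisson_cdf(y, hi) > alpha / 2:
        hi *= 2.0
    # Lower limit: P(N >= y) = alpha/2, where the tail increases with lam.
    lam_l = bisect_increasing(
        lambda lam: (1.0 - poisson_cdf(y - 1, lam)) - alpha / 2, 0.0, hi)
    # Upper limit: P(N <= y) = alpha/2, where the cdf decreases with lam.
    lam_u = bisect_increasing(
        lambda lam: alpha / 2 - poisson_cdf(y, lam), 0.0, hi)
    return lam_l / n, lam_u / n


if __name__ == "__main__":
    print(normal_limits(50, 100))
    print(poisson_limits(1, 100))
```

Solving for the Poisson mean first and dividing by $n$ afterwards keeps the root-finding independent of the sample size.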

Since this probability is equal to $\alpha/2$ for a two-sided confidence interval,

$$F_{2y,\, 2(n-y+1),\, 1-\alpha/2} = \frac{(n-y+1)\, p_L}{y (1 - p_L)}$$

or

$$p_L = \left[ 1 + \frac{n-y+1}{y\, F_{2y,\, 2(n-y+1),\, 1-\alpha/2}} \right]^{-1}.$$

In a similar fashion,

$$p_U = \left[ 1 + \frac{n-y}{(y+1)\, F_{2(y+1),\, 2(n-y),\, \alpha/2}} \right]^{-1}.$$

The next paragraph discusses numerical issues associated with determining these bounds.

The Mathematica (see [18]) code for solving the binomial equations numerically is

    pl = FindRoot[ Sum[Binomial[n, k] * p ^ k * (1 - p) ^ (n - k), {k, y, n}] == alpha/2, {p, y / n} ]
    pu = FindRoot[ Sum[Binomial[n, k] * p ^ k * (1 - p) ^ (n - k), {k, 0, y}] == alpha/2, {p, y / n} ]

for a given $n$, $y$ and $\alpha$. This code works well for small and moderate sized values of $n$. Some numerical instability occurred for larger values of $n$, so the well known relationship (Larsen and Marx [10], page 101) between the successive values of the probability mass function $f(x)$ of the binomial distribution

$$f(x) = \frac{(n-x+1)\, p}{x (1-p)}\, f(x-1), \qquad x = 1, 2, \ldots, n$$

was used to calculate the binomial cumulative distribution function. The Mathematica code for determining $p_L$ and $p_U$ using the F distribution is

the lower limit $p_L$ satisfies

$$\sum_{k=y}^{n} \binom{n}{k} p_L^k (1 - p_L)^{n-k} = \alpha/2$$

where $y$ is the observed value of the random variable $Y$ and $1 - \alpha$ is the nominal coverage of the confidence interval (see, for example, [10], page 279). For $y = 1, 2, \ldots, n-1$, the upper limit $p_U$ satisfies

$$\sum_{k=0}^{y} \binom{n}{k} p_U^k (1 - p_U)^{n-k} = \alpha/2.$$

This confidence interval requires numerical methods to determine $p_L$ and $p_U$ and takes longer to calculate as $n$ increases. This interval will be used as a basis to check the approximate bounds reviewed later in this section. A figure showing the coverage probabilities for bounds of this type is shown in Blyth [2]. Following a derivation similar to his, a faster way to determine the lower and upper limits can be found.

Let $W_1, W_2, \ldots, W_n$ be iid U(0, 1) random variables. Let $Y$ be the number of the $W_i$'s that are less than $p$. Hence $Y$ is binomial with parameters $n$ and $p$. Using a result from page 233 of Casella and Berger [3], the order statistic $W = W_{(y)}$ has the beta distribution with parameters $y$ and $n - y + 1$. Since the events $Y \ge y$ and $W < p$ are equivalent, $P[Y \ge y]$ (which is necessary for determining $p_L$) can be calculated by

$$P(Y \ge y) = P(W < p) = \frac{\Gamma(n+1)}{\Gamma(y)\, \Gamma(n-y+1)} \int_0^p w^{y-1} (1-w)^{n-y}\, dw.$$

Using the substitution $t = \frac{(n-y+1)\, w}{y (1-w)}$ and simplifying yields

$$P(Y \ge y) = \frac{\Gamma(n+1)}{\Gamma(y)\, \Gamma(n-y+1)} \int_0^{\frac{(n-y+1) p}{y(1-p)}} \frac{\left(\frac{n-y+1}{y}\right)^{n-y+1} t^{y-1}}{\left(\frac{n-y+1}{y} + t\right)^{n+1}}\, dt = P\left[ F_{2y,\, 2(n-y+1)} < \frac{(n-y+1)\, p}{y (1-p)} \right].$$
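As a rough illustration of the numerical approach, the two defining equations for $p_L$ and $p_U$ can be solved by bisection. The sketch below is in Python rather than Mathematica, and the function names (`binom_cdf`, `bisect_increasing`, `exact_limits`) are assumptions made for illustration, not the authors' code. It builds the binomial cumulative distribution function from the successive-term recurrence for the probability mass function and is intended for small and moderate $n$ only, since $(1-p)^n$ underflows when $n$ is large.

```python
def binom_cdf(y, n, p):
    """P(Y <= y) for Y ~ Binomial(n, p), built from the pmf recurrence
    f(x) = (n - x + 1) * p / (x * (1 - p)) * f(x - 1)."""
    if y < 0:
        return 0.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if y >= n else 0.0
    f = (1.0 - p) ** n  # f(0)
    total = f
    for x in range(1, min(y, n) + 1):
        f *= (n - x + 1) * p / (x * (1.0 - p))
        total += f
    return min(total, 1.0)


def bisect_increasing(g, lo, hi, iterations=200):
    """Root of an increasing function g on [lo, hi] with g(lo) < 0 < g(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_limits(y, n, alpha=0.05):
    """Exact two-sided limits (p_L, p_U) for y = 1, 2, ..., n - 1."""
    if not 1 <= y <= n - 1:
        raise ValueError("this sketch covers y = 1, 2, ..., n - 1 only")
    # P(Y >= y) increases with p; solve P(Y >= y) = alpha/2 for p_L.
    p_l = bisect_increasing(
        lambda p: (1.0 - binom_cdf(y - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # P(Y <= y) decreases with p; solve P(Y <= y) = alpha/2 for p_U.
    p_u = bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(y, n, p), 0.0, 1.0)
    return p_l, p_u


if __name__ == "__main__":
    print(exact_limits(5, 20))
```

Bisection is slower than a Newton-type root finder but needs no starting guess, which sidesteps some of the sensitivity to the initial point.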

Poisson approximations to the binomial distribution. Determining a confidence interval for $p$ when the sample size is large using approximate methods is often needed in simulations with a large number of replications and in polling.

Computing probabilities using the normal and Poisson approximations is not considered here since work has been done on this problem. Ling [11] suggests using a relationship between the cumulative distribution functions of the binomial and F distributions to compute binomial probabilities. Ghosh [6] compares two confidence intervals for the Bernoulli parameter based on the normal approximation to the binomial distribution. Schader and Schmid [14] compare the maximum absolute error in computing the cumulative distribution function for the binomial distribution using the normal approximation with a continuity correction. They consider the two rules for determining whether the approximation should be used: $np$ and $n(1-p)$ are both greater than 5, and $np(1-p) > 9$. Their conclusion is that the relationship between the maximum absolute error and $p$ is approximately linear when considering the smallest possible sample sizes to satisfy the rules.

Concerning work done on confidence intervals for $p$, Blyth [2] has compared five approximate one-sided confidence intervals for $p$ based on the normal distribution. In addition, he uses the F distribution to reduce the amount of time necessary to compute an exact confidence interval. Using an arcsin transformation to improve the confidence limits is considered by Chen [4].

2 Confidence Interval Estimators for p

Two-sided confidence interval estimators for $p$ can be determined with the aid of numerical methods. One-sided confidence interval estimators are analogous. Let $p_L < p < p_U$ be an "exact" (see [2]) confidence interval for $p$. For $y = 1, 2, \ldots, n-1$,

1 Introduction

There is conflicting advice concerning the sample size necessary to use the normal approximation to the binomial distribution. For example, a sampling of textbooks recommend that the normal distribution be used to approximate the binomial distribution when:

$np$ and $n(1-p)$ are both greater than 5 (see [1], page 211, [5], page 245, [7], page 304, [9], page 148, [16], page 497, [17], page 161)
$p \pm 2\sqrt{p(1-p)/n}$ lies in the interval $(0, 1)$ (see [15], page 242, [12], page 299)
$np(1-p) \ge 10$ (see [13], page 171)
$np(1-p) > 9$ (see [1], page 158).

Many other textbook authors give no specific advice concerning when the normal approximation should be used. To complicate matters further, most of this advice concerns using these approximations to compute probabilities. Whether these same rules of thumb apply to confidence intervals is seldom addressed.

The Poisson approximation, while less popular than the normal approximation to the binomial, is useful for large values of $n$ and small values of $p$. The same sampling of textbooks recommend that the Poisson distribution be used to approximate the binomial distribution when $n \ge 20$ and $p \le 0.05$ or $n \ge 100$ and $np \le 10$ (see [8], page 177, [5], page 204).

Let $X_1, X_2, \ldots, X_n$ be iid Bernoulli random variables with unknown parameter $p$ and let $Y = \sum_{i=1}^{n} X_i$ be a binomial random variable with parameters $n$ and $p$. The maximum likelihood estimator for $p$ is $\hat{p} = Y/n$, which is unbiased and consistent. The interest here is in confidence interval estimators for $p$. In particular, we want to compare the approximate confidence interval estimators based on the normal and

A Comparison of Approximate Interval Estimators for the Bernoulli Parameter

Lawrence Leemis
Department of Mathematics, College of William and Mary, Williamsburg, VA 23187-8795

Kishor S. Trivedi
Department of Electrical Engineering, Duke University, Box 90291, Durham, NC 27708-0291

The goal of this paper is to compare the accuracy of two approximate confidence interval estimators for the Bernoulli parameter $p$. The approximate confidence intervals are based on the normal and Poisson approximations to the binomial distribution. Charts are given to indicate which approximation is appropriate for certain sample sizes and point estimators.

KEY WORDS: Confidence interval, Binomial distribution, Bernoulli distribution, Poisson distribution.