A LARGER SAMPLE SIZE IS NOT ALWAYS BETTER!!!


Nagaraj K. Neerchal
Department of Mathematics and Statistics, University of Maryland Baltimore County, Baltimore, MD 21250

Herbert Lacayo and Barry D. Nussbaum
United States Environmental Protection Agency, Washington, DC 20460

ABSTRACT

In a previous paper, Neerchal, Lacayo and Nussbaum (2007) explored the well-known problem of finding the optimal sample size for obtaining a confidence interval of a pre-assigned precision (or length) for the proportion parameter of a finite or infinite binary population. We illustrated some special problems that arise because the population distribution is discrete and because precision is measured by the length of the interval rather than by the variance. Specifically, the confidence level of an interval of fixed length does not necessarily increase as the sample size increases. However, when such confidence levels are computed using normal approximations, we see a monotonic behavior. In this paper, we consider the corresponding problem under the Poisson approximation and show that for this distribution monotonicity does not hold, and that one should beware of this seeming peculiarity in recommending sample sizes for studies involving estimation of means or proportions.

Keywords and Phrases: Confidence intervals, Poisson distribution, Binomial distribution, Hypergeometric distribution, Optimal sample size.

1. INTRODUCTION

In many sampling situations, the parent population size is relatively small (say N < 100). For example, the United States Environmental Protection Agency (USEPA) routinely audits certain small databases. Further, if sampling is expensive, a customer may request the smallest sample size that attains or exceeds a specified confidence level for a confidence interval (CI), where that CI has a specified precision denoted by τ, or d. [A more formal statement of this problem à la Cochran (1977) will follow later.] This question, as stated by one customer, is as follows: How large a sample do I need to estimate the error rate of a specific database?
This is a fairly straightforward simple random sampling without replacement (SRSWOR, i.e. the Hypergeometric Distribution) problem. We considered it in a previous paper, namely Neerchal, Lacayo and Nussbaum (2007). In that paper, we found, contrary to our expectations, that increasing the sample size did not always increase the magnitude of the confidence level. For example, as shown in Table 1 of the Appendix, for a population of N=90 and a desired precision of .02 (i.e. CI=.04), we see that the confidence level is NOT monotone, but rather goes up and down as the sample size increases. This unexpected non-monotonic up-down-up behavior is observed for the Hypergeometric and Binomial Distributions, both discrete. On the other hand, the confidence level for the normal distribution, which is often used to approximate binomial probabilities, is monotonically increasing. In this paper we investigate the monotonicity (or the lack thereof) of the Poisson distribution.

2. A FORMAL STATEMENT OF THE PROBLEM

We consider the statistical inference problem of obtaining an interval estimate of a pre-assigned length (also referred to as precision) for the mean of a population. The objective is to provide the optimal sample size that will achieve a desired confidence level. A preliminary estimate of µ, the population mean, is available a priori. Suppose $\bar{X}$ denotes the sample mean from a sample of size n; then we are looking for the smallest n such that

$$P\left(|\bar{X} - \mu| < \tau\right) \ge 1 - \alpha. \tag{1}$$

Thus, the confidence interval is of fixed length 2τ around the sample mean and has at least 100(1 − α)% confidence level. If the population is finite (size N) and the sample is obtained by simple random sampling without replacement (SRSWOR), the confidence level in (1) is given by summing the appropriate terms of a hypergeometric distribution. That is, for a binary population with proportion p (so that µ = p),

$$P\left(|\bar{X} - \mu| < \tau\right) = \sum_{j=\lfloor n(p-\tau)\rfloor + 1}^{\lceil n(p+\tau)\rceil - 1} \frac{\binom{[Np]}{j}\binom{N-[Np]}{n-j}}{\binom{N}{n}} \tag{2}$$

where n denotes the sample size, the ceiling function $\lceil x \rceil$ denotes the smallest integer greater than or equal to x, and the floor function $\lfloor x \rfloor$ denotes the largest integer less than or equal to x. Suppose we assume that either the population is infinite or the sampling is done with replacement; then we can use the binomial distribution to compute the confidence level given in (1). That is,

$$P\left(|\bar{X} - \mu| < \tau\right) = \sum_{j=\lfloor n(p-\tau)\rfloor + 1}^{\lceil n(p+\tau)\rceil - 1} \binom{n}{j} p^{j} (1-p)^{n-j} \tag{3}$$

where $\lceil \cdot \rceil$ and $\lfloor \cdot \rfloor$ in the upper and lower limits of the summation denote the ceiling and floor functions, respectively. Of course, for large n, it is also common to use the normal approximation. Even elementary textbooks meant for a first course in statistics contain elaborate descriptions of the normal approximation to the binomial distribution, with or without continuity correction. That is,

$$P\left(|\bar{X} - \mu| < \tau\right) \approx P\left(-\frac{\tau\sqrt{n}}{\sqrt{p(1-p)}} < Z < \frac{\tau\sqrt{n}}{\sqrt{p(1-p)}}\right) \tag{4}$$

where Z denotes a standard normal random variable. The advantage of the normal approximation is that we can obtain an explicit formula for the optimal sample size needed to achieve the desired confidence level. As shown in Cochran (1977),

$$n_{opt} = \frac{z_{\alpha/2}^{2}\, p(1-p)}{\tau^{2}}.$$

In Neerchal, Lacayo and Nussbaum (2007), we show that the normal approximation formula (4) for the confidence level is monotonically increasing as the sample size increases, while the exact formulas (2) and (3) follow an up-and-down (saw tooth) growth pattern. Consequently, one needs to use caution when rounding up sample size formulas.

In this paper, we consider the same inference problem under the Poisson distribution, another popular distribution used widely in practical applications. The Poisson distribution is also used to approximate the binomial when the sample size is large and the probability of success is small. We let $X_1, X_2, \ldots, X_n$ denote a simple random sample from a Poisson distribution with parameter λ, and consider $(\bar{X} - \tau, \bar{X} + \tau)$ as the fixed length confidence interval for λ. The confidence level of this interval is given by

$$P\left(\bar{X} - \tau < \lambda < \bar{X} + \tau\right) = P\left(n(\lambda - \tau) < \sum_{i=1}^{n} X_i < n(\lambda + \tau)\right) = \sum_{i=\lfloor n(\lambda-\tau)\rfloor + 1}^{\lceil n(\lambda+\tau)\rceil - 1} \frac{e^{-n\lambda}(n\lambda)^{i}}{i!} \tag{5}$$

where, once again, $\lceil \cdot \rceil$ and $\lfloor \cdot \rfloor$ denote the ceiling and floor functions as in (3). We have observed a behavior for the Poisson distribution similar to the one we observed for the binomial distribution. In other words, the confidence level given in equation (5) is non-monotonic and its graph has a saw tooth pattern. That is, if we let

$$\Delta_n = P\left(\bar{X}_{n+1} - \tau < \lambda < \bar{X}_{n+1} + \tau\right) - P\left(\bar{X}_{n} - \tau < \lambda < \bar{X}_{n} + \tau\right),$$

where $\bar{X}_n$ denotes the mean of a sample of size n, then $\Delta_n$ can actually be positive or negative as the sample size increases. This can be seen in Figures 1 through 3 of the Appendix.

3. RESULTS AND DISCUSSION

It is straightforward to compute the expression given in (5) using any software package that computes Poisson probabilities, and to plot the relationship between confidence level and sample size for different combinations of λ and τ [see Figures 1 to 3]. The saw tooth pattern is obvious. This has major consequences in determining the recommended sample size. The usual practice of rounding up the optimal sample size formula to a higher integer may lead to a lower confidence level than desired.

The main thrust of the authors' work is from the vantage point of applications, which focuses on the determination of optimal sample size. When the samples are expensive to obtain, as they were in the motivating example of USEPA's auditing case study mentioned in the introduction, it would be quite costly if the additional samples taken actually drove down the confidence. This would be like paying more and getting less!!

This preliminary investigation of the Poisson distribution and our previous work lead to interesting research questions. We list some of them below.

1. Is this peculiar behavior of the confidence level of fixed length confidence intervals true for all the common discrete distributions and false for all continuous distributions?

2. In our work so far, we focused on fixed length confidence intervals and the corresponding optimal "sample size determination problem" à la Cochran (1977). Another commonly used approach is based on looking at the coverage probabilities by specifying the Type I and Type II errors. In fact, for some of the commonly used discrete distributions, so-called exact confidence intervals are also available. See, for example, page 247 (for Binomial) and page 25 (for Poisson) of Millard and Neerchal (2001). An interesting research question would be to ask: would we see the saw tooth pattern in the confidence levels as a function of the sample size for such confidence intervals as well?

REFERENCES

Abramowitz, M. and Stegun, I. A. (1965). Handbook of Mathematical Functions. Dover Publications, Inc., New York.

Johnson, N. L. and Kotz, S. (1969). Distributions in Statistics: Discrete Distributions. Houghton Mifflin Company, New York.

Cochran, W. G. (1977). Sampling Techniques, 3rd ed. Wiley, New York.

Brown, L. D., Cai, T., and DasGupta, A. (2001). Interval Estimation for a Binomial Proportion. Statistical Science, Vol. 16, No. 2, 101-133.

Millard, S. P. and Neerchal, N. K. (2001). Environmental Statistics with S-PLUS. CRC/Chapman Hall, Boca Raton, FL.

Neerchal, N. K., Lacayo, H. and Nussbaum, B. D. (2007). Is a Large Sample Size Always Better? American Journal of Mathematical and Management Sciences. (In process).

APPENDIX: TABLES AND GRAPHS

TABLE 1. Exact Confidence Levels (probability that the mean will be in the confidence interval specified by the precision tau) for shortest intervals around the sample mean, and the sample sizes proposed by various commercial programs for precision 10%

Exact confidence level for the indicated sample size and precision [i.e. tau]

Confidence Interval
[i.e. 2*tau = 2*precision]    n = 47    n = 60    n = 63    n = 67
0.02                          0.7233    0.8799    0.834     0.7457
0.04                          0.9047    0.8799    0.9698    0.9454
0.06                          0.9047    0.98      0.9698    1.0000

Figure 1. Plot of confidence level [$P(|\bar{X} - \mu| < \tau)$] vs Sample Size. Length of CI: 2*tau, tau = .1, and Lambda = 1.
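The exact levels of equations (2) and (3), of the kind reported in Table 1, can be compared with the normal approximation (4) in a few lines. The following is only a minimal sketch under stated assumptions: the function names are illustrative, `[Np]` is read as Np rounded to an integer, and the proportion p = 0.10 is a hypothetical value chosen for the example (the paper does not state the p behind Table 1). Exact fractions are used for p and tau so that the floor and ceiling limits are not disturbed by floating-point rounding.

```python
from fractions import Fraction
from math import ceil, comb, erf, floor, sqrt


def count_limits(n, p, tau):
    """Smallest and largest success counts j with |j/n - p| < tau."""
    return floor(n * (p - tau)) + 1, ceil(n * (p + tau)) - 1


def binomial_level(n, p, tau):
    """Exact level from equation (3): sampling with replacement."""
    lo, hi = count_limits(n, p, tau)
    q = float(p)
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j)
               for j in range(max(lo, 0), min(hi, n) + 1))


def hypergeometric_level(n, N, p, tau):
    """Exact level from equation (2): SRSWOR from a population of size N."""
    K = round(N * p)  # assumption: [Np] read as Np rounded to an integer
    lo, hi = count_limits(n, p, tau)
    ways = sum(comb(K, j) * comb(N - K, n - j)
               for j in range(max(lo, 0), min(hi, n) + 1))
    return ways / comb(N, n)


def normal_level(n, p, tau):
    """Normal approximation from equation (4): P(-z < Z < z)."""
    z = float(tau) * sqrt(n) / sqrt(float(p) * (1 - float(p)))
    return erf(z / sqrt(2))


# Hypothetical inputs: N = 90 as in Table 1; p = 0.10 is an assumed value.
N, p, tau = 90, Fraction(1, 10), Fraction(1, 50)
for n in (10, 11, 20, 21):
    print(n,
          round(hypergeometric_level(n, N, p, tau), 4),
          round(binomial_level(n, p, tau), 4),
          round(normal_level(n, p, tau), 4))
```

With these assumed inputs, both exact levels fall when n goes from 10 to 11 (the only admissible count is j = 1 in both cases, and its probability is lower at n = 11), while the normal approximation rises with every additional observation.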

Table 2. Some raw data that may help explain jaggedness in Fig. 1

Here up = ⌈n(λ+τ)⌉ − 1 and lo = ⌊n(λ−τ)⌋, prob1 and prob2 are the Poisson(nλ) cumulative probabilities at up and lo, and conf = prob1 − prob2 is the confidence level in (5).

 n   tau   lambda   up   lo   n*lambda   prob1      prob2      conf
 1   0.1   1         1    0    1         0.735759   0.367879   0.367879
 2   0.1   1         2    1    2         0.676676   0.406006   0.270671
 3   0.1   1         3    2    3         0.647232   0.423190   0.224042
 4   0.1   1         4    3    4         0.628837   0.433470   0.195367
 5   0.1   1         5    4    5         0.615961   0.440493   0.175467
 6   0.1   1         6    5    6         0.606303   0.445680   0.160623
 7   0.1   1         7    6    7         0.598714   0.449711   0.149003
 8   0.1   1         8    7    8         0.592547   0.452961   0.139587
 9   0.1   1         9    8    9         0.587408   0.455653   0.131756
10   0.1   1        10    9   10         0.583040   0.457930   0.125110
11   0.1   1        12    9   11         0.688697   0.340511   0.348186
12   0.1   1        13   10   12         0.681536   0.347229   0.334306
13   0.1   1        14   11   13         0.675132   0.353165   0.321967
14   0.1   1        15   12   14         0.669360   0.358458   0.310902
15   0.1   1        16   13   15         0.664123   0.363218   0.300905
16   0.1   1        17   14   16         0.659344   0.367527   0.291816
17   0.1   1        18   15   17         0.654958   0.371454   0.283505
18   0.1   1        19   16   18         0.650916   0.375050   0.275866
19   0.1   1        20   17   19         0.647174   0.378361   0.268813
20   0.1   1        21   18   20         0.643698   0.381422   0.262276
21   0.1   1        23   18   21         0.716029   0.301680   0.414349
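The rows of Table 2 follow directly from equation (5). As a hedged illustration, the sketch below recomputes the summation bounds and the Poisson cumulative probabilities for a given n, λ and τ. The function names `poisson_cdf` and `table_row` are illustrative and are not taken from the paper, and λ and τ are passed as exact fractions so that products such as 10 × 1.1 land exactly on an integer when they should.

```python
from fractions import Fraction
import math


def poisson_cdf(k, mu):
    """P(S <= k) for S ~ Poisson(mu); zero when k is negative."""
    if k < 0:
        return 0.0
    term = total = math.exp(-mu)
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def table_row(n, lam, tau):
    """Return (up, lo, prob1, prob2, conf) for sample size n."""
    up = math.ceil(n * (lam + tau)) - 1   # largest admissible total
    lo = math.floor(n * (lam - tau))      # one below the smallest admissible total
    mu = float(n * lam)
    prob1, prob2 = poisson_cdf(up, mu), poisson_cdf(lo, mu)
    return up, lo, prob1, prob2, prob1 - prob2


# Setting of Figure 1 and Table 2: lambda = 1, tau = 0.1.
lam, tau = Fraction(1), Fraction(1, 10)
levels = [table_row(n, lam, tau)[4] for n in range(1, 36)]
# Sample sizes at which adding one more observation lowers the level.
drops = [n + 1 for n in range(1, 35) if levels[n] < levels[n - 1]]
print(drops)
```

Calling `table_row(n, Fraction(1, 10), Fraction(1, 10))` gives the corresponding quantities for the setting of Figure 2.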

Figure 2. Plot of confidence level [$P(|\bar{X} - \mu| < \tau)$] vs Sample Size. Length of CI: 2*tau, tau = .1, and Lambda = .1.

Table 3. Some raw data that may help explain jaggedness in Fig. 2

 n   tau   lambda   up   lo   n*lambda   prob1       prob2       conf
 1   0.1   0.1       0    0   0.1        0.9048374   0.9048374   0
 2   0.1   0.1       0    0   0.2        0.8187308   0.8187308   0
 3   0.1   0.1       0    0   0.3        0.7408182   0.7408182   0
 4   0.1   0.1       0    0   0.4        0.6703200   0.6703200   0
 5   0.1   0.1       0    0   0.5        0.6065307   0.6065307   0
 6   0.1   0.1       1    0   0.6        0.8780986   0.5488116   0.3292870
 7   0.1   0.1       1    0   0.7        0.8441950   0.4965853   0.3476097
 8   0.1   0.1       1    0   0.8        0.8087921   0.4493290   0.3594632
 9   0.1   0.1       1    0   0.9        0.7724824   0.4065697   0.3659127
10   0.1   0.1       1    0   1.0        0.7357589   0.3678794   0.3678794
11   0.1   0.1       2    0   1.1        0.9004163   0.3328711   0.5675452
12   0.1   0.1       2    0   1.2        0.8794871   0.3011942   0.5782929
13   0.1   0.1       2    0   1.3        0.8571125   0.2725318   0.5845807
14   0.1   0.1       2    0   1.4        0.8334977   0.2465970   0.5869008
15   0.1   0.1       2    0   1.5        0.8088468   0.2231302   0.5857167
16   0.1   0.1       3    0   1.6        0.9211865   0.2018965   0.7192900
17   0.1   0.1       3    0   1.7        0.9068106   0.1826835   0.7241270

Figure 3. Plot of confidence level [$P(|\bar{X} - \mu| < \tau)$] vs Sample Size. Length of CI: 2*tau, tau = .1, and Lambda = .2.

Discussion of Figures 1 and 2

In Figure 1, we note a peculiar pattern as the sample size increases. For instance, starting with a sample size of 11, we see [from Table 2] that the confidence level is .348. As the sample size increases to 12, 13, ..., 20, the confidence level actually decreases. However, when the sample size goes from 20 to 21, there is a marked increase in the confidence level [from .262 to .414]. This pattern appears to repeat in cycles of 10.

In addition, in Figure 2, we note a similar peculiar pattern as the sample size increases. For instance, starting with a sample size of 6, we see from Table 3 that the confidence level is .329. As the sample size increases to 7, 8, 9, 10, the confidence level increases only gradually. However, when the sample size goes from 10 to 11, there is a marked increase in the confidence level [from .367 to .567]. This pattern appears to repeat in cycles of 5.
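One way to see where the cycle lengths of 10 and 5 may come from is to count how many integer values of the sample total fall strictly inside the interval (n(λ−τ), n(λ+τ)) used in equation (5). The sketch below is an illustration rather than part of the original analysis: the names `lattice_count` and `jump_points` are hypothetical, and it assumes λ ≥ τ so that the lower limit is not negative. It lists the sample sizes at which the interval admits more totals than it did at n − 1.

```python
from fractions import Fraction
from math import ceil, floor


def lattice_count(n, lam, tau):
    """Number of integer totals s with n(lam - tau) < s < n(lam + tau)."""
    return ceil(n * (lam + tau)) - 1 - floor(n * (lam - tau))


def jump_points(lam, tau, n_max):
    """Sample sizes at which the interval admits more totals than at n - 1."""
    return [n for n in range(2, n_max + 1)
            if lattice_count(n, lam, tau) > lattice_count(n - 1, lam, tau)]


# Setting of Figure 1 (lambda = 1, tau = 0.1), sample sizes up to 35.
print(jump_points(Fraction(1), Fraction(1, 10), 35))      # [11, 21, 31]
# Setting of Figure 2 (lambda = 0.1, tau = 0.1), sample sizes up to 20.
print(jump_points(Fraction(1, 10), Fraction(1, 10), 20))  # [6, 11, 16]
```

The jumps at n = 11 and n = 21 in Table 2 and at n = 6, 11 and 16 in Table 3 coincide with these points. Between jumps the set of admissible totals is unchanged in size while nλ keeps growing, which is consistent with the slow drift of the confidence level seen in the tables.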