Social Studies 201 Notes for November 14, 2003


Estimation of a mean, small sample size

Section 8.4, p. 501. When a researcher has only a small sample available, the central limit theorem does not apply to the distribution of sample means. In this case, if certain assumptions are made, the t-distribution can be used to describe the distribution of sample means. From this, an interval estimate of the population mean µ can be constructed.

The t-distribution

The t-distribution is sometimes referred to as Student's t-distribution. A table of the t-distribution is contained in Appendix I, p. 911, of the text. This distribution has a shape that is very similar to that of the normal distribution, and has the same interpretation and use as the normal distribution: it is symmetrical about the centre, peaked in the centre, and trails off toward the horizontal axis in each direction from the centre. Like the standard normal distribution, the t-distribution has a mean of 0, and t-values are measured along the horizontal axis in the same way as Z-values. For example, a t-value of 1.50 is associated with a point on the horizontal axis 1.5 (estimated) standard errors to the right of centre.

One difference from the normal distribution is that the t-distribution is a little more spread out than the normal; one way of picturing it is to take the normal distribution and pull it out a bit at each end. Its standard deviation is therefore somewhat larger than 1: for more than two degrees of freedom it equals \(\sqrt{df/(df-2)}\). See the diagram on p. 503 of the text, where one distribution is superimposed on the other.

Another difference between the t and normal distributions is that there is a different t-distribution for each number of degrees of freedom (df), a new concept that is related to sample size. As a concept, degrees of freedom is a little difficult to explain at this stage of the course; it refers to how many sample values are free to vary and how many are constrained. In the case of estimation of a mean, there are n − 1 degrees of freedom, one less than the sample size of n.
When estimating a mean from n sample values, any n − 1 values are free to vary, but one value is fixed, or constrained, by the fact that a particular value of the mean must result. For example, if three values must have a mean of 10 and two of them are 8 and 9, the third must be 13. If you find this confusing, for now just accept that in estimating the mean the degrees of freedom equal the sample size minus one, that is, df = n − 1.

To return to the t-distribution: when there are few degrees of freedom, the distribution is very dispersed. For example, when there are only 4 degrees of freedom, the middle 95% of the t-distribution requires including the area from −2.776 to +2.776. This is in contrast to the corresponding Z-values of ±1.96 from the normal distribution. But as the number of degrees of freedom increases, the t-distribution approaches the normal distribution. Going down the column of the t-table (p. 911) associated with the 95% confidence level, if there is a sample size of 25, meaning df = n − 1 = 25 − 1 = 24 degrees of freedom, the t-value is 2.064, considerably less than the 2.776 for 4 degrees of freedom. To see that the t-distribution actually approaches the normal distribution as the sample size and degrees of freedom become larger, examine the last row of the t-table. For a very large number of degrees of freedom (labelled infinite), the t-value for 95% confidence is 1.96, exactly the same as the corresponding Z-value from the table of the normal distribution. For most purposes, when the sample size reaches 30, we use the normal distribution. For 29 degrees of freedom and 95% confidence, the t-value is 2.045, not much larger than the 1.96 associated with the normal distribution.

The table of the t-distribution on p. 911 lists various confidence levels across the top, so you are restricted to obtaining confidence intervals for the confidence levels listed there. But the table provides the t-values associated with common confidence levels such as 80%, 90%, 95%, and 99%. To use the table, pick the proper confidence level and the associated degrees of freedom (sample size minus 1); the t-value in the table is the value such that the area under the t-distribution between the negative of that t-value and the t-value itself equals the confidence level.

Distribution of the sample mean, small n

Under certain assumptions, the t-distribution can be used to obtain interval estimates for the mean of certain distributions.
This section outlines the conditions for this. Strictly speaking, the t-distribution can only be used if the sample is drawn from a normally distributed population. That is, if a researcher has some assurance that the characteristic of the population being examined is normally distributed, then small random samples from this population have a t-distribution. This result can be stated as follows. If a population has a mean of µ and a standard deviation of σ, and if small random samples of size n (less than 30 cases) are drawn from this population, then the sample means X̄ of these samples have a t-distribution with mean µ, standard deviation \(s/\sqrt{n}\), and n − 1 degrees of freedom, where s is the standard deviation obtained from the sample. (More precisely, it is the standardized value \(t = (\bar{X} - \mu)/(s/\sqrt{n})\) that has a t-distribution with n − 1 degrees of freedom.)

This can be stated symbolically. If X is Nor(µ, σ), where µ and σ are unknown, and if random samples of size n are drawn from this population, then

\[ \bar{X} \text{ is } t_d\left(\mu, \frac{s}{\sqrt{n}}\right) \]

where d = n − 1 is the degrees of freedom and X̄ and s are the mean and standard deviation, respectively, from the sample.

When the sample size n is small, say less than n = 30, the t-distribution should be used. If n > 30, then the t-values become so close to the standardized normal values Z that the central limit theorem can be used to describe the sampling distribution of X̄. That is, \(t \to Z\) as \(n \to \infty\). This means that the t-distribution is likely to be used only when the sample size is small. For larger sample sizes, X̄ may still have a t-distribution, but if the sample size is large enough, the normal values are so close to the t-values that the normal values are ordinarily used.

There are two assumptions associated with this result.

1. As is the case with larger sample sizes, the samples are to be random samples from the population. If the samples are not random samples, it is difficult to determine what the distribution of the sample mean might be.

2. Unlike the central limit theorem, which generally holds regardless of the nature of the distribution of the variable, the t-distribution requires sampling from a normally distributed population. This is quite a restrictive assumption, since few populations are likely to be exactly normally distributed. In practice, the t-distribution is often used even when there is no assurance that the sample is drawn from a normally distributed population. If a researcher considers the population to be very different from a normal distribution, perhaps the t-distribution should not be used. But if the population is not distributed all that differently from a normal distribution, little error may be introduced by using the t-distribution.

I generally argue that, for sample sizes less than 30, it is always better to use the t-distribution than the normal distribution when conducting interval estimates or hypothesis tests. The reason is that the t-distribution is more dispersed than the normal distribution, so it gives a better picture of the precision of the sample. If a normal distribution is used, interval estimates may be reported as narrower than they really are. By using the t-distribution, a researcher is less likely to make results look more precise than they really are.

Interval estimates for the mean, small sample size

The t-distribution for X̄ can be used to obtain interval estimates of a population mean µ. The method is the same for small samples as it is for large samples; that is, the same series of five steps can be used. If the population mean µ is to be estimated, the sample is a random sample of size n with sample mean X̄ and sample standard deviation s, and the population from which this sample is drawn is normally distributed, then

\[ \bar{X} \text{ is } t_d\left(\mu, \frac{s}{\sqrt{n}}\right) \]

where d = n − 1. In order to obtain an interval estimate, the researcher picks a confidence level C% and uses it to determine the appropriate t-value from the t-table on p. 911.
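The argument earlier in this section, that a normal-based interval for a small sample makes results look more precise than they are, can be checked with a small simulation. This is an illustration rather than a method from the text: it assumes a normal population with µ = 0 and σ = 1 (arbitrary choices), draws many random samples of size n = 5, and counts how often the interval \(\bar{X} \pm m\,s/\sqrt{n}\) contains µ, once with the normal multiplier m = 1.96 and once with the t-value 2.776 for 4 degrees of freedom. The function name `coverage`, the seed, and the number of repetitions are assumptions made for the sketch.

```python
import math
import random


def coverage(n: int, multiplier: float, reps: int = 20_000, seed: int = 1) -> float:
    """Share of intervals X-bar +/- multiplier * s / sqrt(n) that contain mu.

    Samples of size n are drawn from a normal population; mu = 0 and
    sigma = 1 are arbitrary illustrative choices.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    mu, sigma = 0.0, 1.0
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # sample standard deviation s, with divisor n - 1
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        half_width = multiplier * s / math.sqrt(n)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    n = 5  # df = n - 1 = 4
    print(f"normal value 1.96: {coverage(n, 1.96):.3f}")
    print(f"t-value 2.776:     {coverage(n, 2.776):.3f}")
```

With only 4 degrees of freedom, the Z-based intervals should contain µ noticeably less often than 95% of the time (roughly 88%), while the t-based intervals should come close to 95%. That shortfall is the sense in which a normal interval overstates the precision of a small sample.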
For d degrees of freedom, let \(t_d\) be the t-value such that C% of the area under the t curve lies between \(-t_d\) and \(t_d\). The C% confidence interval is then

\[ \bar{X} \pm t_d \frac{s}{\sqrt{n}} \]

or, in interval form,

\[ \left(\bar{X} - t_d \frac{s}{\sqrt{n}},\ \bar{X} + t_d \frac{s}{\sqrt{n}}\right). \]

Note that this is the same formula as for the confidence interval when the sample size is large; the only difference is that \(t_d\) replaces the Z-value. Note also that the sample standard deviation s is used in the formula, rather than σ. The latter was used when presenting the formula for the interval estimate in the case of a large sample size. But even with a large sample size, σ is almost always unknown, so in practice s is used as an estimate of σ in that formula.

The interpretation of the confidence interval estimate is also the same as earlier. That is, C% of the intervals \(\bar{X} \pm t_d\, s/\sqrt{n}\) contain µ if random samples of size n are drawn from the population, where d = n − 1. Any specific interval that is constructed will either contain µ or not contain µ, but the researcher can be confident that C% of these intervals will be wide enough that µ is in the interval.

Example: wages of workers after a plant shutdown

In "Bringing Globalization Down to Earth: Restructuring and Labour in Rural Communities," in the August 1995 issue of the Canadian Review of Sociology and Anthropology, the authors Belinda Leach and Anthony Winson examine changes in the wages of workers after a plant shutdown. Before the shutdown, mean male wages were $13.76 per hour and mean female wages were $11.80 per hour. After the shutdown, some of the workers found new jobs, and the data from small samples of such workers are contained in Table 1. Using data in this table, obtain 95% interval estimates for the true mean wage of (i) male workers after the plant shutdown, and (ii) female workers after the plant shutdown. From these interval estimates, comment on whether there is strong evidence that the wages of male and female workers have declined.

Table 1: Data on Hourly Wages of Workers with Jobs, After Plant Shutdown

Type of     Hourly wage in dollars     Sample
worker      Mean       St. dev.        size
Male        12.20      3.27            12
Female       8.11      3.53            12

Answer. For the first part, the parameter to be estimated is µ, the true mean wage for all male workers who lost jobs because of the plant shutdown. Organizing the answer in terms of the five steps involved in interval estimation (see notes of November 10), the answer is as follows.

1. The sample mean X̄, standard deviation s, and sample size n are given in Table 1. The sample size of n = 12 is small in this case, so it will be necessary to use the t-distribution.

2. Assuming the distribution of wages of all male workers who lost jobs in the shutdown is a normal distribution, the distribution of X̄ is a t-distribution with mean µ and standard deviation \(s/\sqrt{n}\), with n − 1 = 12 − 1 = 11 degrees of freedom. That is,

\[ \bar{X} \text{ is } t_d\left(\mu, \frac{s}{\sqrt{n}}\right) \]

where d = n − 1 = 11.

3. From the question, the confidence level is C = 95%.

4. For 11 degrees of freedom and a 95% confidence level, the t-value is 2.201.

5. The intervals are

\[ \bar{X} \pm t_d \frac{s}{\sqrt{n}} \]

and, using values from this sample, for X̄ = 12.20 the interval is

\[ \bar{X} \pm 2.201 \times \frac{3.27}{\sqrt{12}} = \bar{X} \pm 2.201 \times \frac{3.27}{3.464} = \bar{X} \pm (2.201 \times 0.944) = \bar{X} \pm 2.078 = 12.20 \pm 2.078 \]

Thus the 95% interval estimate for the true mean wage level for all males who have lost jobs because of the plant shutdown is ($10.12, $14.28).

For the female workers, the same steps yield, for X̄ = 8.11, the interval

\[ \bar{X} \pm 2.201 \times \frac{3.53}{\sqrt{12}} = \bar{X} \pm 2.201 \times \frac{3.53}{3.464} = \bar{X} \pm (2.201 \times 1.019) = \bar{X} \pm 2.243 = 8.11 \pm 2.243 \]

Thus the 95% interval estimate for the true mean wage level for all females who have lost jobs because of the plant shutdown is ($5.87, $10.35).

Comment on results. From the data in Table 1, the samples provide evidence that the hourly wages of both male and female workers have declined since the plant shutdown. The twelve males in the sample had a mean wage of $12.20 after the shutdown, $1.56 per hour less than the pre-shutdown mean of $13.76. On average, the twelve female workers suffered a decline of $3.69 per hour, from $11.80 prior to the shutdown to $8.11 after the shutdown.

The interval estimates provide fairly strong evidence that all female workers suffered a decline in hourly wages, while the evidence for a decline is not so clear in the case of males. Consider first the female interval estimate. There is 95% confidence that the mean of a sample of size twelve differs from the true mean by no more than about $2.24. In the case of this sample, the interval is from $5.87 to $10.35. While a researcher cannot be certain this interval contains the true mean hourly wage for females after the plant shutdown, it very likely does. But this interval lies well below the former mean hourly wage of $11.80 per hour. As a result, it seems fairly certain that female wages after the shutdown are lower than prior to the shutdown.

In contrast, the decline in male wages was smaller than that for females, and the 95% estimate for this sample yields an interval for the males that contains the previous mean pay of $13.76. Since the researcher is relatively certain that the true mean male hourly wage is in the interval from $10.12 to $14.28, it is possible that the true mean for all male workers is around the former mean hourly wage of $13.76. While the sample mean is less than the previous mean, it is not a lot less; there is thus weak evidence for a decline in male hourly wages, but the evidence is not as strong as in the female case.

The interval estimates do not provide direct tests of whether mean wages have changed; such tests will be provided later in the section on hypothesis testing. But the results of hypothesis tests will be shown to be consistent with the comments above; that is, there is evidence that female wages declined but insufficient evidence to prove that male wages declined.

Small and large sample sizes

Small samples generally result in fairly wide confidence intervals, thus providing less precise estimates of the mean than do larger samples.
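The five-step calculation for Table 1 can be reproduced in a few lines. In this sketch the helper name `t_interval` and the constant `T_95_DF11` are hypothetical; the t-value is passed in as read from the table, as the notes do, rather than computed. The sketch then checks whether each group's pre-shutdown mean falls inside its 95% interval, which is the comparison the comment on results relies on.

```python
import math


def t_interval(mean: float, sd: float, n: int, t_value: float) -> tuple[float, float]:
    """Interval X-bar +/- t_d * s / sqrt(n); t_value is the table value for d = n - 1."""
    half_width = t_value * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Table 1 (workers with jobs after the shutdown) plus the pre-shutdown means from the text
groups = {
    "male": {"mean": 12.20, "sd": 3.27, "n": 12, "before": 13.76},
    "female": {"mean": 8.11, "sd": 3.53, "n": 12, "before": 11.80},
}
T_95_DF11 = 2.201  # t-value for 11 degrees of freedom, 95% confidence (from the table)

intervals = {}
former_mean_inside = {}
for name, g in groups.items():
    low, high = t_interval(g["mean"], g["sd"], g["n"], T_95_DF11)
    intervals[name] = (round(low, 2), round(high, 2))
    former_mean_inside[name] = low <= g["before"] <= high
    print(f"{name}: ({low:.2f}, {high:.2f}); "
          f"former mean {g['before']:.2f} inside: {former_mean_inside[name]}")
```

The male interval contains the former mean of $13.76, while the female interval lies entirely below the former $11.80, matching the comments on the results.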
Comparing the intervals for small samples,

\[ \bar{X} \pm t_d \frac{s}{\sqrt{n}}, \]

with those from larger samples,

\[ \bar{X} \pm Z \frac{s}{\sqrt{n}}, \]

there are two differences that contribute to this.

1. If the sample size is small, the t-distribution must be used rather than the normal distribution. As noted earlier, and as can be seen by comparing the t-values with the corresponding normal values, t-values are always larger than the corresponding Z-values for any given confidence level. The ± part of the interval is thus larger for small n, since the larger t-value, rather than the smaller Z-value, must be used.

2. When the sample size is smaller, the square root of the sample size is also smaller. Since the denominator of \(s/\sqrt{n}\) is then smaller, the fraction as a whole is larger.

Table 2: Precision of estimates for 95% intervals, small sample of size 16 and large sample of size 256, with a sample standard deviation of s = 12

Size of    Sample      √n     s/√n             t(s/√n) or Z(s/√n)
sample     size (n)
Small      16          4      12/4 = 3         2.131 × 3 = 6.393
Large      256         16     12/16 = 0.75     1.96 × 0.75 = 1.47

Table 2 illustrates the difference between a small and a large sample size for samples from a hypothetical population. A sample standard deviation of s = 12 is hypothesized for each of these samples. For the sample of size 16, the intervals are X̄ ± 6.4, while for the sample of size 256, the intervals are X̄ ± 1.5. From the table, it can be seen that this large difference in interval widths emerges from the two factors mentioned above. For the small sample, the t-value of 2.131 is larger than the Z-value of 1.96 for the larger sample. In addition, the standard error, or standard deviation of the sample mean, is 3 in the case of the small sample and only 0.75 in the case of the large sample.
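Both factors can be checked numerically. This sketch is an illustration, not the text's method: instead of reading the table on p. 911, it finds the two-sided t-value by integrating the t density with Simpson's rule and bisecting, using only the Python standard library. The function names (`t_pdf`, `central_area`, `t_critical`, `half_width`) and the numerical settings (1000 Simpson steps, 40 bisection rounds, search interval 0 to 50) are assumptions chosen for the sketch. It should agree with the t-values quoted in these notes, show that they exceed 1.96 and shrink toward it as df grows, and reproduce the Table 2 half-widths.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution (log-gamma avoids overflow for large df)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def central_area(t: float, df: int, steps: int = 1000) -> float:
    """Area under the t curve between -t and +t (Simpson's rule on [0, t], doubled)."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_critical(confidence: float, df: int) -> float:
    """t-value with `confidence` of the area between -t and +t, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def half_width(critical: float, s: float, n: int) -> float:
    """The +/- part of an interval: critical value times the standard error s / sqrt(n)."""
    return critical * s / math.sqrt(n)


if __name__ == "__main__":
    z95 = NormalDist().inv_cdf(0.975)  # two-sided normal value for 95% confidence
    for df in (4, 11, 15, 24, 29):
        print(f"df = {df:2d}: t = {t_critical(0.95, df):.3f} (Z = {z95:.3f})")
    # Table 2: s = 12, small sample n = 16 (t, 15 df) vs large sample n = 256 (Z)
    print(f"n = 16:  +/- {half_width(2.131, 12, 16):.3f}")
    print(f"n = 256: +/- {half_width(1.96, 12, 256):.3f}")
```

In practice the table (or a statistics package) is the quicker route; the numerical search is shown only to make the meaning of "C% of the area between −t and +t" concrete.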

Given these different-sized intervals that can emerge from different samples, the next section is devoted to determining appropriate sample sizes before the sample is selected.

Sample size for estimating a population mean

By using the central limit theorem, it is possible, before obtaining the sample, to specify the sample size required to achieve a given degree of accuracy for an estimate of the mean. For this, the confidence level for the interval estimate must also be specified, and the researcher must have some knowledge of the variability of the population from which the sample is drawn. Since a larger sample generally takes more time and effort, costs more, and may disturb the population more, a researcher wants to select the smallest sample size consistent with obtaining the required accuracy and confidence level. But this may still be quite a large sample size, and this section provides the rationale for determining it. Such a sample must also be a random sample; other methods of sampling may be associated with different required sample sizes.

Since the required sample size is usually large, the central limit theorem can be used to describe the distribution of sample means. This gives the researcher a way to determine the variability of sample means before obtaining the sample. Because of this theorem, the normal distribution is generally used rather than the t-distribution, since the required sample size is usually larger than 30.

Notation. The required accuracy for the estimate is denoted by E, so that the interval constructed will be X̄ ± E after the sample data have been obtained. Note that this is an interval of ±E on either side of the sample mean X̄, so the interval width is W = 2E. The letter E stands for error, that is, the sampling error of the estimate. As before, the confidence level is C%, with the corresponding value from the normal table given the symbol \(Z_C\).
That is, \(\pm Z_C\) are the Z-values such that C% of the distribution lies between them.

Derivation of the formula for sample size

If the mean µ of a population is to be estimated, the central limit theorem describes the distribution of sample means X̄. The theorem states that when random samples are drawn from a population with mean µ and standard deviation σ,

\[ \bar{X} \text{ is } \mathrm{Nor}\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \]

when a random sample from the population has more than thirty cases. As noted earlier (Table 2), and as can be seen by examining the formula, a larger n produces a smaller standard error (standard deviation of the mean) than does a smaller sample size. That is, there is a different normal distribution for each different sample size. The trick is to select the normal distribution that will result in an interval estimate of required accuracy E, that is, an interval of X̄ ± E.

Since the sample means are normally distributed, for confidence level C the corresponding Z-value from the normal distribution is \(Z_C\), and C% of the normal distribution lies within \(Z_C\) standard deviations of the true mean µ. Since one standard deviation of the distribution of sample means is \(\sigma/\sqrt{n}\), \(Z_C\) standard deviations amount to \(Z_C\,\sigma/\sqrt{n}\). It follows that, for the distribution of sample means (with mean µ and standard deviation \(\sigma/\sqrt{n}\)), C% of the area under the distribution lies within the interval \(\mu \pm Z_C\,\sigma/\sqrt{n}\); that is, C% of the sample means X̄ lie within these limits. Now if the interval estimate is to be accurate to within ±E, the sample mean must be within E of the population mean µ. To match the required accuracy with these limits, we need a sample size such that

\[ \mu \pm Z_C \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \mu \pm E \]

match. This occurs when

\[ E = Z_C \frac{\sigma}{\sqrt{n}}. \]
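Solving \(E = Z_C\,\sigma/\sqrt{n}\) for n takes only a few algebraic steps: multiply both sides by \(\sqrt{n}\), divide by E, and square.

\[ E = Z_C \frac{\sigma}{\sqrt{n}} \;\Longrightarrow\; \sqrt{n}\,E = Z_C\,\sigma \;\Longrightarrow\; \sqrt{n} = \frac{Z_C\,\sigma}{E} \;\Longrightarrow\; n = \left(\frac{Z_C\,\sigma}{E}\right)^2 = \frac{Z_C^2\,\sigma^2}{E^2}. \]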

This expression can then be solved for n. Rearranging and solving it (see p. 533 of the text) gives

\[ n = \frac{Z^2 \sigma^2}{E^2} = \left(\frac{Z \sigma}{E}\right)^2 \]

where I have simply used Z, rather than \(Z_C\). If a random sample of the size specified by this formula is obtained, then the confidence interval estimate obtained by the researcher should be of accuracy E, that is, it should be approximately X̄ ± E.

This formula is practical to use because all the terms on the right side can be obtained before the sample is drawn. The desired accuracy of the estimate, E, can be specified by the researcher before conducting the sample. The Z-value can be determined from the table of the normal distribution once a confidence level is given. Finally, a researcher has to have some estimate of the variability of the population from which the sample is to be drawn, so that σ can be specified in the formula. Some guidelines concerning this are contained later in these notes, and on pp. 538-9 of the text.

Example: sample size to estimate mean wages

Using the data above concerning the wages of workers who lost their jobs because of a plant shutdown, the problem was that with samples of size 12 the intervals were fairly wide, just over ±$2 for each of male and female workers. In this example, the sample size required to determine the mean wage correct to within (a) one dollar, and (b) fifty cents will be determined, using the 95% confidence level.

Answer. As with any problem of this sort, the first step is to be clear about what is to be estimated. In this case, the parameter to be estimated is µ, the current mean wage of all workers who lost jobs because of the plant shutdown. Since this is an estimate of a mean, and since we expect the required sample size to be reasonably large, the central limit theorem can be used to describe the distribution of sample means. As shown above, the required sample size is then

\[ n = \left(\frac{Z \sigma}{E}\right)^2 \]

where E is the accuracy required of the estimate. For the first part of the example, the accuracy required is one dollar, or E = 1. If 95% confidence is to be used, this means Z = 1.96, since 95% of the area under a normal distribution lies between Z = −1.96 and Z = +1.96. While the true standard deviation of hourly wages for all workers is not known, the small samples from Table 1 provide an idea of the variability of hourly wages. Since female wages vary more than male wages, the female standard deviation of s = 3.53 will be used as an estimate of σ in the formula for sample size. That is, the standard deviation from the more variable group is used, to ensure that a large enough sample size is obtained. From these values the required sample size is

\[ n = \left(\frac{Z \sigma}{E}\right)^2 = \left(\frac{1.96 \times 3.53}{1.00}\right)^2 = 6.9188^2 = 47.87 \]

or a sample size of n = 48. A random sample of n = 48 workers should provide an estimate of mean hourly wages correct to within ±$1, with probability 0.95, or 95% confidence.

For an interval estimate accurate to within fifty cents, or E = $0.50, the same formula is used, but with E = 0.50 replacing E = 1. The sample size is

\[ n = \left(\frac{Z \sigma}{E}\right)^2 = \left(\frac{1.96 \times 3.53}{0.50}\right)^2 = 13.8376^2 = 191.48 \]

or a sample size of n = 192. A random sample of n = 192 workers should provide, with 95% confidence, a sample mean X̄ that differs from the population mean µ by no more than $0.50.
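The formula and the round-up rule can be wrapped in a short function. In this sketch the name `required_n` is hypothetical, and Z is computed from the confidence level with the standard library's `statistics.NormalDist` rather than read from the normal table, so it uses 1.95996 instead of 1.96; for this example the rounded-up answers are unaffected.

```python
import math
from statistics import NormalDist


def required_n(sigma: float, error: float, confidence: float = 0.95) -> int:
    """Sample size n = (Z * sigma / E) ** 2, rounded up to the next integer.

    sigma is a prior estimate of the population standard deviation, and
    error (E) must be in the same units as sigma.
    """
    # two-sided value: the given share of the normal curve lies between -Z and +Z
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / error) ** 2)


if __name__ == "__main__":
    sigma = 3.53  # female sample standard deviation from Table 1 (the more variable group)
    for e in (1.00, 0.50):
        print(f"E = ${e:.2f}: n = {required_n(sigma, e)}")
```

Calling it with a higher confidence level, a larger sigma, or a smaller error all return a larger n, which is the pattern the formula predicts.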

Additional notes on sample size

1. Round up. In the above example, where there were decimals in the sample size n, these were always rounded up to the next integer when reporting the required sample size. In order to specify a large enough sample size, the answer should always be rounded up to the next integer.

2. Units. When using the formula above, make sure that E and the estimate of σ are in the same units. In the example above, everything was expressed in dollars to ensure consistency.

3. Trade-off. There is often a trade-off between the budget for a survey and the accuracy of the results. A larger sample size produces greater accuracy, but it may cost much more and take much more time and effort. As a result, a researcher may not be able to obtain the sample size specified by the formula, and may have to live with the less accurate results from a sample smaller than desired.

4. Factors associated with larger required n. A careful look at the formula

\[ n = \left(\frac{Z \sigma}{E}\right)^2 \]

shows that the required sample size n increases as Z increases, as σ increases, and as E decreases. This can be summarized as follows:

(a) A larger confidence level, C%, produces a larger Z-value and results in a larger required sample size.

(b) A more variable population, with larger σ, means a larger sample size is required to achieve the given level of accuracy. In contrast, populations whose members are similar to each other in the characteristic being examined do not require such large sample sizes to achieve the required accuracy of estimates.

(c) The greater the accuracy required, the smaller the value of E, and the larger the required sample size.

5. Population size not important. The required sample size does not depend on the size of the population from which the sample is being drawn, unless the required sample size is a large proportion of the population. Suppose the above formula leads to a sample size of 200, but the population size is 10,000 people. Then a random sample of 200 from this population gives the accuracy required. If the size of the population is 100,000, or one million, the sample size is the same; a random sample of size n = 200 is required in each case. The only exception to this is when the population is relatively small. Say the population size is 1,000, so the recommended sample size is 200/1,000 = 0.2, or 20%, of the population. In this case, the required sample size may be reduced somewhat, since the sample is a considerable portion of the total population. But if the sample size is less than, say, 5% of the population size, the above formula holds. The reasons for this apparent paradox lie in probability theory; consult a text on the mathematical principles involved in sampling if you are interested in this issue.

6. Estimate of σ. In order to calculate the required sample size, some estimate of σ, the variability of the population, is required. Some methods of obtaining a prior estimate of σ are as follows (see pp. 538-9 for a fuller discussion).

(a) Small sample. As in the above example, a researcher may have a small sample and, from this, an initial idea of the variability of the population from which a larger sample is to be drawn.

(b) Other studies and other populations. Other researchers may have conducted surveys of the same or similar populations, and the sample standard deviations from these surveys may provide a reasonable estimate of σ.

(c) Range. Recall that the standard deviation may be close to one-quarter of the range of a variable. This is not very exact but, in the absence of much knowledge of the variability of a population, it may provide a quick and rough estimate of σ.

(d) Sampling method. It may be possible to develop a sampling method so that the sample size can be enlarged later.
That is, conduct an initial random sample and, if the results are not accurate enough, randomly select more cases.
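Note 5 says only that the required sample size "may be reduced somewhat" when it is a large share of the population. One standard way to make that concrete, which is not given in these notes and is included here as an assumption, is the finite population correction found in Cochran's sampling text: \(n = n_0 / (1 + (n_0 - 1)/N)\), where \(n_0\) is the size from the formula above and N is the population size. The sketch applies it to the n₀ = 200 example; the function name `adjust_for_population` is hypothetical.

```python
import math


def adjust_for_population(n0: float, population: int) -> int:
    """Finite population correction n0 / (1 + (n0 - 1) / N), rounded up.

    n0 is the sample size from n = (Z * sigma / E) ** 2; population is N.
    This adjustment is an assumption for illustration, not the notes' method.
    """
    return math.ceil(n0 / (1 + (n0 - 1) / population))


if __name__ == "__main__":
    n0 = 200
    for N in (1_000, 10_000, 100_000, 1_000_000):
        print(f"N = {N:>9,}: n0 is {n0 / N:.1%} of N, adjusted n = {adjust_for_population(n0, N)}")
```

For N = 1,000, where 200 is 20% of the population, the adjusted size drops to 167. Where 200 is well under 5% of the population, the adjustment changes n by at most a few cases, which is consistent with the rule of thumb in note 5.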