STATISTICS 200 Lecture #16, Thursday, October 13, 2016
Textbook: Sections 9.3, 9.4, 10.1, 10.2
Objectives:
- Define standard error, and relate it to both standard deviation and sampling distribution ideas.
- Describe the sampling distribution of a sample proportion.
- Reformulate the confidence interval formula using the general idea of estimate ± (multiplier × standard error).
- Interpret the confidence level as a relative frequency.
- Calculate new values of the multiplier for confidence levels other than 95%.
We now begin a strong focus on inference:
- Proportions: one population proportion (this week); two population proportions.
- Means: one population mean; difference between means; mean difference.
Motivation
Eventual goal: use statistical inference to answer the question, "What percentage of Creamery customers prefer chocolate ice cream over vanilla?"
Strategy: get a random sample of 90 individuals and ask them this question. Use the answers to perform a hypothesis test to answer the question.
Comparison of binomial-based statistics
Variable | Notation | Mean | St. Dev.
Count of successes (Chapter 8) | $X$ | $np$ | $\sqrt{np(1-p)}$
Proportion of successes (Chapter 9 and beyond) | $\hat{p} = X/n$ | $p$ | $\sqrt{\frac{p(1-p)}{n}}$
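Binomial distribution vs. approximate p-hat sampling distribution, with n = 100 and p = 0.70:
Count of successes X: $\mu = np = 70$, $\sigma = \sqrt{np(1-p)} = 4.58$
Proportion p-hat: $\mu = p = 0.70$, $\sigma = \sqrt{\frac{p(1-p)}{n}} \approx 0.046$ (about 0.05)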
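As a quick check on these numbers, here is a minimal Python sketch (Python is our choice for illustration; the course itself uses Minitab) that computes both sets of parameters from the formulas in the comparison table. The function name `count_and_proportion_summary` is hypothetical.

```python
import math

def count_and_proportion_summary(n, p):
    """Mean and SD of X ~ Binomial(n, p) and of the sample proportion p_hat = X / n."""
    mean_count = n * p                        # np
    sd_count = math.sqrt(n * p * (1 - p))     # sqrt(np(1 - p))
    mean_phat = p                             # p
    sd_phat = math.sqrt(p * (1 - p) / n)      # sqrt(p(1 - p) / n)
    return mean_count, sd_count, mean_phat, sd_phat

mc, sc, mp, sp = count_and_proportion_summary(100, 0.70)
print(f"count:      mean = {mc:.0f}, SD = {sc:.2f}")
print(f"proportion: mean = {mp:.2f}, SD = {sp:.4f}")
```

Dividing the count's SD by n gives the proportion's SD: 4.58 / 100 ≈ 0.046.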
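A better confidence interval
OLD: $\hat{p} - \frac{1}{\sqrt{n}}$ to $\hat{p} + \frac{1}{\sqrt{n}}$, using the conservative margin of error $\frac{1}{\sqrt{n}}$.
NEW: ME = (multiplier) × (standard error), giving the interval $\hat{p} - \text{ME}$ to $\hat{p} + \text{ME}$.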
New formula for margin of error
ME = (multiplier) × (standard error)
- The multiplier, z*, is related to the Empirical Rule. It expresses the level of confidence that the interval includes the parameter.
- The standard error, s.e.(p-hat), is an estimate of the standard deviation of the sampling distribution of p-hat: $\text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
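A minimal sketch contrasting the old and new margins of error, assuming a hypothetical sample with p-hat = 0.70 and n = 100 and the 95% multiplier 1.96. The helper names are illustrative only.

```python
import math

def conservative_me(n):
    # Old (conservative) margin of error: 1 / sqrt(n)
    return 1 / math.sqrt(n)

def standard_error(p_hat, n):
    # s.e.(p_hat) = sqrt(p_hat * (1 - p_hat) / n)
    return math.sqrt(p_hat * (1 - p_hat) / n)

def margin_of_error(p_hat, n, z_star=1.96):
    # New margin of error: multiplier times standard error
    return z_star * standard_error(p_hat, n)

p_hat, n = 0.70, 100  # hypothetical sample: 70 successes out of 100
print(f"conservative ME = {conservative_me(n):.3f}")
print(f"z* x s.e. ME    = {margin_of_error(p_hat, n):.3f}")
```

The conservative ME is never smaller than 2 × s.e.: $\sqrt{\hat{p}(1-\hat{p})}$ is at most 0.5, so $2 \times 0.5/\sqrt{n} = 1/\sqrt{n}$ is the largest value 2 × s.e. can take.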
z*-multiplier
Use the z*-multiplier when the normal approximation is appropriate, i.e., when $np > 10$ and $n(1-p) > 10$.
Confidence level | Multiplier (z*)
90% | 1.65
95% | 1.96 ≈ 2
98% | 2.33
99% | 2.58
The z*-multiplier for a 68% confidence level would be 1, because we must go 1 standard deviation on either side of the mean to capture 68% of the area under the normal curve.
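To find z* for a confidence level that is not in the table, leave half of the leftover area in each tail and look up the matching standard normal quantile. This sketch uses Python's standard-library `statistics.NormalDist`; `z_star` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_star(confidence):
    """Multiplier z* such that the area under N(0, 1) between -z* and z* equals `confidence`."""
    # (1 - confidence) / 2 is left in each tail, so we need the (1 + confidence) / 2 quantile.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.68, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
```

It should print roughly 0.994, 1.645, 1.960, 2.326 and 2.576; the table entries are rounded versions of these, and a multiplier of exactly 1 corresponds to about 68.3% confidence.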
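Three factors affect the width of a confidence interval (textbook page 382):
$\text{ME} = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
1. Level of confidence: a higher confidence level means a larger z*, and therefore a larger ME.
2. Sample size: a larger sample size means a smaller s.e.(p-hat), and therefore a smaller ME.
3. The value of p-hat, as explained on the next slide.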
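A sketch of factors 1 and 2, holding a hypothetical p-hat = 0.5 fixed and computing z* from the confidence level with `statistics.NormalDist`. The function name is illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    # ME = z* * sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.5  # hypothetical value, held fixed

# Factor 1: raise the confidence level, with n fixed at 100
for conf in (0.90, 0.95, 0.99):
    print(f"confidence {conf:.0%}, n = 100: ME = {margin_of_error(p_hat, 100, conf):.3f}")

# Factor 2: raise the sample size, with confidence fixed at 95%
for n in (100, 400, 1600):
    print(f"confidence 95%, n = {n}: ME = {margin_of_error(p_hat, n):.3f}")
```

Each fourfold increase in n halves the ME, because the s.e. shrinks like $1/\sqrt{n}$.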
3. The value of p-hat in $\sqrt{\hat{p}(1-\hat{p})}$
[Figure: plot of $\sqrt{\hat{p}(1-\hat{p})}$ (vertical axis, 0.0 to 0.5) against p-hat (horizontal axis, 0.0 to 1.0)]
The plot shows that the variation is:
A. largest when p-hat = 1.0
B. largest when p-hat = 0.5
C. largest when p-hat = 0.25
D. smallest when p-hat = 0.9
E. smallest when p-hat = 0.2
Factor 3: the value of p-hat affects the width of the C.I.
For a 95% CI, $\text{ME} = 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
At a given level of confidence and sample size, the confidence interval is widest when p-hat equals 0.5, and it becomes narrower as p-hat moves away from 0.5 in either direction.
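Factor 3 can be checked by tabulating $\sqrt{\hat{p}(1-\hat{p})}$, the only part of the ME that depends on p-hat, over a grid of values. The grid spacing of 0.1 is an arbitrary choice for illustration, and `spread_factor` is a hypothetical name.

```python
import math

def spread_factor(p_hat):
    # The part of s.e.(p_hat) that depends on p_hat: sqrt(p_hat * (1 - p_hat))
    return math.sqrt(p_hat * (1 - p_hat))

grid = [i / 10 for i in range(11)]  # p_hat = 0.0, 0.1, ..., 1.0
for p_hat in grid:
    print(f"p_hat = {p_hat:.1f}: sqrt(p_hat(1 - p_hat)) = {spread_factor(p_hat):.3f}")

widest = max(grid, key=spread_factor)  # p_hat giving the widest interval
print("widest interval at p_hat =", widest)
```

The values are symmetric around 0.5 and drop to 0 at p-hat = 0 and p-hat = 1.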
Confidence intervals: population proportion (textbook pages 389 & 390)
- Conservative method (Chapters 1 & 5): $\text{ME} = \frac{1}{\sqrt{n}}$
- Normal approximation (Chapter 10): $\text{ME} = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
- Exact (binomial): use this option when the normal conditions aren't met. You need a computer to calculate the interval, and it does not include a M.E.
Minitab provides both options (normal approximation and exact).
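The exact option needs a computer because its endpoints come from solving binomial probability equations rather than from a ± formula. This sketch implements one common exact method, the Clopper-Pearson interval, by bisection using only the standard library. We are assuming, not claiming, that it matches what Minitab reports as its exact interval, so compare against Minitab output before relying on it. The function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _solve(g, target, increasing, iters=100):
    # Bisection on [0, 1] for a monotone function g, looking for g(p) = target.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (g(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies below mid
    return (lo + hi) / 2

def exact_interval(x, n, confidence=0.95):
    """Clopper-Pearson interval for a proportion, given x successes in n trials."""
    alpha = 1 - confidence
    # Lower bound: the p at which P(X >= x) = alpha / 2 (this probability rises with p).
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: the p at which P(X <= x) = alpha / 2 (this probability falls as p grows).
    upper = 1.0 if x == n else _solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper

print(exact_interval(3, 10))
```

For 3 successes in 10 trials the interval runs from about 0.067 to 0.652, which is not centered at p-hat = 0.3. That lack of symmetry is why the exact interval has no single M.E.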
Binomial distributions
[Figure: two panels of binomial distributions, "n fixed at 10, p increasing" and "p fixed at 0.02, n increasing"]
The values of n and p determine whether the binomial distribution is approximately normal in shape.
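A sketch of the normal-approximation condition from the z*-multiplier slide, applied to settings like the two panels. The specific p and n values below are hypothetical stand-ins, since the figure's exact values aren't recoverable, and `normal_approx_ok` is an illustrative name.

```python
def normal_approx_ok(n, p, threshold=10):
    # Condition from the z*-multiplier slide: n*p > 10 and n*(1 - p) > 10
    return n * p > threshold and n * (1 - p) > threshold

# Like the left panel: n fixed at 10, p increasing (hypothetical p values)
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"n = 10, p = {p}: {normal_approx_ok(10, p)}")

# Like the right panel: p fixed at 0.02, n increasing (hypothetical n values)
for n in (50, 200, 1000, 2000):
    print(f"n = {n}, p = 0.02: {normal_approx_ok(n, 0.02)}")
```

With n = 10, np and n(1 − p) add up to 10, so they can never both exceed 10. With p = 0.02, the condition first holds once n is larger than 500.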
What does it mean to be 95% confident?
Before the sample is drawn, we can say that P(the confidence interval contains the true parameter) = 0.95.
After the sample is drawn, there is no more randomness: both the CI and the parameter are now fixed. So we can no longer talk about probability.
Interpreting 95% confidence: an example
Suppose we have a sample of 200 students in STAT 100 and find that 28 of them are left-handed. Our sample proportion is 28/200 = 0.14. We now find the ME and construct a 95% CI.
Find the standard error, that is, estimate the standard deviation of the sample proportion based on a sample of size 200:
$\text{s.e.}(\hat{p}) = \sqrt{\frac{0.14(1-0.14)}{200}} \approx 0.025$
Hence, z* times the standard error is $2 \times 0.025 = 0.05$, and the 95% CI is 0.14 ± 0.05, or 0.09 to 0.19.
On the following two slides, we'll pretend that the true population proportion is 0.12.
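The same arithmetic in Python, using z* ≈ 2 for 95% confidence; carrying more decimals shows what the rounding hides. The variable names are illustrative.

```python
import math

x, n = 28, 200                 # left-handed students in the sample, sample size
p_hat = x / n                  # sample proportion, 0.14
se = math.sqrt(p_hat * (1 - p_hat) / n)
me = 2 * se                    # z* rounded to 2 for 95% confidence
print(f"p_hat = {p_hat:.2f}, s.e. = {se:.4f}, ME = {me:.3f}")
print(f"95% CI: {p_hat - me:.3f} to {p_hat + me:.3f}")
```

Unrounded, the s.e. is about 0.0245 and the interval is about 0.091 to 0.189, which matches 0.09 to 0.19 after rounding. This particular interval happens to contain the pretend true value 0.12.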
Normal curve of sample proportions based on sample size 200
[Figure: normal curve of sample proportions (green) and a confidence interval around the 14% estimate (red); horizontal axis: sample percents, 0.08 to 0.18]
The green curve is the true distribution of p-hat. Ordinarily we don't know where it lies, but we do know its approximate standard deviation. Thus, we can build a confidence interval around our 14% estimate (in red). If we take another sample, the red line will move, but the green curve will not!
30 confidence intervals based on sample size 200
[Figure: 30 confidence intervals plotted against sample percents, 0.06 to 0.18]
If we repeat the sampling over and over, about 95% of our confidence intervals will contain the true proportion of 0.12. This is why we use the term "95% confidence interval".
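The relative-frequency meaning of 95% confidence can be checked by simulation. This sketch pretends, as the slides do, that the true proportion is 0.12, draws many samples of size 200, builds p-hat ± 2 × s.e. for each one, and counts how often the interval covers 0.12. The number of repetitions, the seed, and the name `coverage_rate` are arbitrary choices for illustration.

```python
import math
import random

def coverage_rate(p_true=0.12, n=200, reps=10_000, z_star=2, seed=1):
    """Fraction of simulated intervals p_hat +/- z* * s.e. that contain p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One simulated sample: count successes among n independent trials
        x = sum(rng.random() < p_true for _ in range(n))
        p_hat = x / n
        me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - me <= p_true <= p_hat + me:
            hits += 1
    return hits / reps

print(f"coverage about {coverage_rate():.3f}")
```

The printed coverage should land close to 0.95, though not exactly on it, because the interval relies on a normal approximation to a discrete, slightly skewed binomial distribution.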
Definition of "95% confidence interval for the true population proportion": an interval of values computed from a sample that will cover the true but unknown population proportion for 95% of the possible samples.
To find a 95% CI:
- The center is at p-hat.
- The margin of error is 2 times the S.E., where $\text{S.E.} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
What does it mean to be 95% confident?
A. There is a 95% probability that the one interval that I calculated contains the true value for the parameter.
B. If I get 100 such intervals, about 95 of them will contain the true value for the parameter.
C. The sample estimate has a 95% chance of being inside the calculated interval.
D. The p-value has a 95% chance of being inside the interval.
If you understand today's lecture, you should be able to work these exercises: 9.25, 9.33, 9.35, 9.37, 10.1, 10.3, 10.7, 10.9, 10.11, 10.13, 10.15, 10.19, 10.21, 10.23, 10.25, 10.27, 10.33, 10.45