Topic 6 - Confidence intervals based on a single sample

Similar documents
Topic 3 - Discrete distributions

Chapter 6. Estimates and Sample Sizes

Chapter 9 Inferences from Two Samples

Confidence intervals CE 311S

Business Statistics. Lecture 5: Confidence Intervals

Ch. 7 Statistical Intervals Based on a Single Sample

Statistic: a that can be from a sample without making use of any unknown. In practice we will use to establish unknown parameters.

Medical statistics part I, autumn 2010: One sample test of hypothesis

Confidence Intervals for Normal Data Spring 2014

LECTURE 12 CONFIDENCE INTERVAL AND HYPOTHESIS TESTING

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008

Confidence Intervals for Normal Data Spring 2018

Sections 7.1 and 7.2. This chapter presents the beginning of inferential statistics. The two major applications of inferential statistics

Topic 10 - Linear Regression

1 Statistical inference for a population mean

Chapter 15 Sampling Distribution Models

Chapter 12: Inference about One Population

Mock Exam - 2 hours - use of basic (non-programmable) calculator is allowed - all exercises carry the same marks - exam is strictly individual

Unit 9: Inferences for Proportions and Count Data

CS 5014: Research Methods in Computer Science. Bernoulli Distribution. Binomial Distribution. Poisson Distribution. Clifford A. Shaffer.

STAT 4385 Topic 01: Introduction & Review

STA Module 10 Comparing Two Proportions

Discrete Distributions

Point Estimation and Confidence Interval

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters

Mathematical Notation Math Introduction to Applied Statistics

Unit 9: Inferences for Proportions and Count Data

Chapter 6 Estimation and Sample Sizes

16.400/453J Human Factors Engineering. Design of Experiments II

Statistical Inference

Inference for Proportions, Variance and Standard Deviation

Math/Stat 352 Lecture 10. Section 4.11 The Central Limit Theorem

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples

Chapter 5: HYPOTHESIS TESTING

Introduction to Econometrics. Review of Probability & Statistics

Continuous random variables

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

Statistics for Business and Economics

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Ch. 7: Estimates and Sample Sizes

Hypothesis testing for µ:

Section 2: Estimation, Confidence Intervals and Testing Hypothesis

Harvard University. Rigorous Research in Engineering Education

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018

Chapters 3.2 Discrete distributions

This does not cover everything on the final. Look at the posted practice problems for other topics.

Data Analysis and Statistical Methods Statistics 651

Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences h, February 12, 2015

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series

QUIZ 4 (CHAPTER 7) - SOLUTIONS MATH 119 SPRING 2013 KUNIYUKI 105 POINTS TOTAL, BUT 100 POINTS = 100%

Special distributions

Lecture 10: Introduction to Logistic Regression

Week 2: Review of probability and statistics

Block 3. Introduction to Regression Analysis

Last few slides from last time

1 Hypothesis testing for a single mean

STA 584 Supplementary Examples (not to be graded) Fall, 2003

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER / Probability

Sample Problems for the Final Exam

Chapter 18. Sampling Distribution Models. Copyright 2010, 2007, 2004 Pearson Education, Inc.

II. The Normal Distribution

Econ 325: Introduction to Empirical Economics

Inferences for Proportions and Count Data

Mathematical Notation Math Introduction to Applied Statistics

Margin of Error for Proportions

Math101, Sections 2 and 3, Spring 2008 Review Sheet for Exam #2:

Purposes of Data Analysis. Variables and Samples. Parameters and Statistics. Part 1: Probability Distributions

Notes for Math 324, Part 19

CHAPTER 8. Test Procedures is a rule, based on sample data, for deciding whether to reject H 0 and contains:

Lecture Slides. Elementary Statistics. Tenth Edition. by Mario F. Triola. and the Triola Statistics Series

2011 Pearson Education, Inc

Chapter 6 Sampling Distributions

Relating Graph to Matlab

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

An interval estimator of a parameter θ is of the form θl < θ < θu at a

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 9.1-1

Statistical Inference, Populations and Samples

Chapter 7: Statistical Inference (Two Samples)

Confidence Intervals, Testing and ANOVA Summary

Single Sample Means. SOCY601 Alan Neustadtl

Lecture 11: Simple Linear Regression

Session 11. Large-sample intervals. Small-sample intervals. Bootstrap. Confidence Intervals via the bootstrap. Dr. Syring April 2, 2019

IV. The Normal Distribution

Math 494: Mathematical Statistics

Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials

Statistical Intervals (One sample) (Chs )

Design and Optimize Sample Plans at The Dow Chemical Company

MVE055/MSG Lecture 8

STATISTICAL INFERENCE PART II CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

Ling 289 Contingency Table Statistics

p = q ˆ = 1 -ˆp = sample proportion of failures in a sample size of n x n Chapter 7 Estimates and Sample Sizes

Continuous RVs. 1. Suppose a random variable X has the following probability density function: π, zero otherwise. f ( x ) = sin x, 0 < x < 2

example: An observation X comes from a normal distribution with

AP Statistics Review Ch. 7

STA2603/205/1/2014 /2014. ry II. Tutorial letter 205/1/

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

Area is the amount of space a shape covers. It is a 2D measurement. We measure area in square units. For small areas we use square centimetres.

Comparing Systems Using Sample Data

Applied Statistics I

Stat 100a: Introduction to Probability. Outline for the day:

Transcription:

Topic 6 - Confidence intervals based on a single sample Sampling distribution of the sample mean Sampling distribution of the sample variance Confidence interval for a population mean Confidence interval for a population variance Confidence interval for a population proportion 1

Confidence Intervals To use the CLT in our examples, we had to know the population mean,, and the population standard deviation,. It is okay to estimate the population parameters if we have a huge amount of sample data, using x and s respectively. In most cases, the primary goal of the analysis of our sample data is to estimate and to determine a range of likely values for these population values. 2

Confidence interval for with known Suppose for a moment we know but not The CLT says that X is approximately normal with a mean of and a variance of 2 /n. So, Z x will be standard normal. n For the standard normal distribution, let z be such that P(Z > z ) = 3

Confidence interval for with known Safety tip..when working with Confidence Intervals, alpha is the total area of the curve outside the upper and lower boundaries. For example, if you want a 95% confidence interval, the area between the extremes, limits or boundaries specified is 95% and the area outside, or alpha, is 5%. Px ( Z /2 xz /2 ) xz /2 n n n 4

Confidence interval for with unknown For large samples (n 30), we can replace with s, so that x z s n is a (1-)100% confidence interval for. /2 For small samples (n < 30) from a normal population, x t s n /2, n 1 is a (1-)100% confidence interval for. The value t,n-1 is the appropriate quantile from a t distribution with n-1 degrees of freedom. Demonstration of the t distribution calculator 5

Acid rain data The EPA states that any area where the average ph of rain is less than 5.6 on average has an acid rain problem. The ph values collected at Shenandoah National Park are listed below. Calculate 95% and 99% confidence intervals for the average ph of rain in the park. 6

Acid rain calculations by hand Summary statistics: Column n Mean Variance Std. Dev. ph_level 90 4.5779 0.0836 0.2892 s x Z /2 n 0.2892 4.5779 (1.96)( ) 4.5182; 4.6377 90 7

Light bulb data The lifetimes in days of 10 light bulbs of a certain variety are given below. Give a 95% confidence interval for the expected lifetime of a light bulb of this type. Do you trust the interval? Summary statistics: Column n Mean Variance Std. Dev. lifetime 10 0.5580 0.9767 0.9883 Z 1.96 T 2.2622.025.025,9 Using Z Using T 0.9883 0.9883 0.5580 1.96 0.5580 2.2622 10 10 ( 0.0545;1.1705) ( 0.1499;1.2650) 8

Interpreting confidence statements For any % CI, a valid interpretation is, If we perform this study a large number of times, we would expect the estimated parameter to fall within our confidence interval that % of the time. The confidence deals with the long-term view, not the percentage on any specific trial. Run the confidence interval applet in StatCrunch. 9

Sample size effect As sample size increases, the width of our confidence interval clearly decreases (of course, as long as you re dealing with sample means). If is known, the width of our interval for is Solving for n, L n 2z /2 n (2 z /2 ) L 2 If is known or we have a good estimate, we can use this formula to decide the sample size we need to obtain a certain interval length. 10

Paint example Suppose we know from past data that the standard deviation of the square footage covered by a one gallon can of paint is somewhere between 2 and 4 square feet. How many one gallon cans do we need to test so that the width of a 95% confidence interval for the mean square footage covered will be at most 1 square foot? n (2 z /2 ) L 2 11

Additional sample size problem We buy sacks of flour from a vendor that claims the average bag weighs 100 lbs. with a standard deviation of 2 lbs. How many bags do we need to sample to estimate the actual average weight of bags from this vendor to within 0.5 lb. with 90% confidence? n 2 2 2 (2 z/2 ) [2(1.645)( )] 43.2964 44 L 1 How about with 95% confidence? 2 n (2 z/2 ) 61.4656 62 L 12

Confidence interval for a variance In many cases, especially in manufacturing, understanding variability is very important. Often times, the goal is to reduce the variability in a system. For samples from a normal population, ( n 1) s ( n 1) s, 2 2 2 2 /2, n1 1 ( /2), n1 is a (1-)100% confidence interval for 2. The value,n-1 is the appropriate quantile from a distribution with n-1 degrees of freedom. 13

Acid rain data Calculate a 95% confidence interval for the variance of ph of rain in the park. Summary statistics: Column n Mean Variance Std. Dev. ph_level 90 4.57789 0.08363 0.28919 ( n 1) s ( n 1) s, 2 2 2 2 /2, n1 1 ( /2), n1 We have everything but the Chi-square values. 14

Summary statistics: Column n Mean Variance Std. Dev. ph_level 90 4.57789 0.08363 0.28919 95% Chi-square values; 64.79336, 116.98908 2 2 0.025,89 0.975,89 15

Summary statistics: Column n Mean Variance Std. Dev. ph_level 90 4.57789 0.08363 0.28919 95% Chi-square values; 64.79336, 116.98908 2 2 0.025,89 0.975,89 (90 1)0.08363 (90 1)0.08363, 116.98908 64.79336 0.06362;0.11487 16

Confidence interval for a proportion A binomial random variable, X, counts the number of successes in n Bernoulli trials where the probability of success on each trial is p. In sampling studies, we are often times interested in the proportion of items in the population that have a certain characteristic. We think of each sample, X i, from the population as a Bernoulli trial, the selected item either has the characteristic (X i = 1) or it does not (X i = 0). The total number with the characteristic in the sample is then X n X i 1 i 17

Confidence interval for a proportion The proportion in the sample with the characteristic is then n pˆ X/ n X / n i 1 i The CLT says that the distribution of this sample proportion is then normally distributed for large n with E( X / n) E( X) 0(1 p) 1p p Var X E X E X p p p p p Var( X / n) n n n n 2 2 2 2 2 ( ) ( ) ( ) 0 (1 ) 1 (1 ) For the CLT to work well here, we need X 5 and n-x 5 n to be much smaller than the population size 18

Confidence interval for a proportion Example: In a survey of 277 randomly selected shoppers, 69 stated that if the advertised item is unavailable, they request a rain check. Construct a 95% CI for µ. What is the underlying distribution? What is? What is? p x 69 pˆ 0.249 n 277 npˆ 277(.249) 69 5 nqˆ 277(.751) 208 5 ˆp pˆ Z /2 ( pq ˆˆ) n 0.249 1.96 (0.249*0.751) 277 0.249 0.051 or (0.198;0.300) 19

Murder case example Find a 95% confidence interval for the proportion of African Americans in the jury pool. (22 out of 295 African American in sample) 0.0745760.925424 pq ˆˆ 22 pˆ Z /2 1.96 n 295 295.074576 0.02998 0.0446;0.10456 20

Sample size considerations The width of the confidence interval is L 2 z ˆ ˆ /2 p(1 p)/ n To get the sample size required for a specific length, we have 2 n (2 z / L) pˆ(1 pˆ) /2 or sometimes written as... n = 4 Z pq ˆˆ/ L 2 2 /2 21

CI sample size with no clue for actual proportion Often in TV/Media polls, they talk about a margin of error of +/- 3%. This margin of error is actually a 95% confidence interval. How large of a sample do we need to have to result in a margin of error of 3%? 3% L.06 2 pq ˆˆ n 4Z /2 2 L pq ˆˆ n 2 4(1.96 ).06 2 How do we continue? What is? We have a catch-22 situation, because to find the sample size we need an estimate of the population proportion and vice-versa. ˆp 22

CI sample size no clue for p (cont) So, if you have a good supportable estimate for the population proportion, use it. If not, you 0.50. 23

Nurse employment case How many sample records should be considered if we want the 95% confidence interval for the proportion all her records handled in a timely fashion to be 0.02 wide? n 2 pq ˆˆ.9(.1) 4Z /2 4(1.96)( ) 3457.44 3458 2 2 L.02 Where did the.9 come from? That was the original percentage from the nurses contract in Topic 3. If that wasn t in place on average, there would be chaos in the hospital. The additional file for Topic 6 contains examples of confidence intervals, sample size calculations and prediction intervals. 24

Prediction intervals Sometimes we are not interested in a confidence interval for a population parameter but rather we are interested in a prediction interval for a new observation. For our light bulb example, we might want a 95% prediction interval for the time a new light bulb will last. For our paint example, we might want a 95% prediction interval for the amount of square footage a new can of paint will cover. 25

Prediction intervals Given a random sample of size n, our best guess at a new observation, x n+1, would be the sample mean, X. Now consider the difference,. Ex ( n 1 X) x n 1 X Var x ( n 1 X ) 26

Prediction intervals If is known and our population is normal, then using a pivoting procedure gives If is unknown and our population is normal, a (1- )100% prediction interval for a new observation is x t s /2, n 1 1 1 n 27

Acid rain example For the acid rain data, the sample mean ph was 4.577889 and the sample standard deviation was 0.2892. What is a 95% prediction interval for the ph of a new rainfall? Does this prediction interval apply for the light bulb data? 28

Additional Prediction Interval A 7th generation citrus farmer is worried about revenues from this year s crop, specifically because if he doesn t get at least an 82% yield, he won t be able to make the payments on the family farm and the bank will foreclose. It s late spring and the last cold snap is coming tonight; forecast low is 30-degrees. He pulls up the Department of Agriculture data for citrus crops for the area and finds 30 matches for impacts on crop yields due to a 30-degree night, with an average of 85% and a standard deviation of 3.5%. Right before bed, he mentions this all to his wife and the last thing his wife says is, Well, I m sure you ve run some sort of prediction interval to tell whether we re in trouble or not.zzzzzzz 29

Additional Prediction Interval (cont) Will the farmer sleep well tonight (ie., is there a statistical chance that he s not going to make 82% or more on a crop yield?), using a 95% Prediction Interval? x t s /2, n 1 1 1 n 1.85 (2.0452)* 035 1 30.85.0728 (0.7772;0.9228) The answer would be no. His disaster number falls within the realm of reasonable values, given the historical values He could expect to lose the farm 30