Confidence Intervals. Confidence interval for sample mean. Confidence interval for sample mean. Confidence interval for sample mean

Similar documents
Statistical Intervals (One sample) (Chs )

7.1 Basic Properties of Confidence Intervals

6 Central Limit Theorem. (Chs 6.4, 6.5)

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc.

Introduction to Statistical Data Analysis Lecture 5: Confidence Intervals

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

Chapter 8 - Statistical intervals for a single sample

Practice Problems Section Problems

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com

Confidence intervals CE 311S

Dr. Maddah ENMG 617 EM Statistics 10/12/12. Nonparametric Statistics (Chapter 16, Hines)

Probability Distributions for Continuous Variables. Probability Distributions for Continuous Variables

Inference for Single Proportions and Means T.Scofield

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Applied Statistics I

Interval estimation. October 3, Basic ideas CLT and CI CI for a population mean CI for a population proportion CI for a Normal mean

ACMS Statistics for Life Sciences. Chapter 13: Sampling Distributions

The Components of a Statistical Hypothesis Testing Problem

Chapter 5: HYPOTHESIS TESTING

Lecture 26: Chapter 10, Section 2 Inference for Quantitative Variable Confidence Interval with t

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

CH.8 Statistical Intervals for a Single Sample

Review of Statistics 101

Chapter 1 Statistical Inference

Chapter 6 Continuous Probability Distributions

Tables Table A Table B Table C Table D Table E 675

Section B. The Theoretical Sampling Distribution of the Sample Mean and Its Estimate Based on a Single Sample

Ch18 links / ch18 pdf links Ch18 image t-dist table

Math/Stats 425, Sec. 1, Fall 04: Introduction to Probability. Final Exam: Solutions

A General Overview of Parametric Estimation and Inference Techniques.

CHAPTER 3 ANALYSIS OF RELIABILITY AND PROBABILITY MEASURES

Chapter 18 Sampling Distribution Models

CS 5014: Research Methods in Computer Science. Bernoulli Distribution. Binomial Distribution. Poisson Distribution. Clifford A. Shaffer.

Lecture 10A: Chapter 8, Section 1 Sampling Distributions: Proportions

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

7 Random samples and sampling distributions

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators

The Secret Foundation of Statistical Inference. What you don t know can hurt you

ESTIMATION BY CONFIDENCE INTERVALS

Probability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!

3 Random Samples from Normal Distributions

Probability Density Functions

Confidence Intervals, Testing and ANOVA Summary

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

Business Statistics. Lecture 10: Course Review

Lecture 2 Linear Regression: A Model for the Mean. Sharyn O Halloran

One-sample categorical data: approximate inference

Statistical Inference with Small Samples

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

Business Statistics. Lecture 5: Confidence Intervals

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur

Statistical Inference

CENTRAL LIMIT THEOREM (CLT)

df=degrees of freedom = n - 1

Supporting Australian Mathematics Project. A guide for teachers Years 11 and 12. Probability and statistics: Module 25. Inference for means

appstats27.notebook April 06, 2017

Estimating a population mean

INTRODUCTION TO ANALYSIS OF VARIANCE

Simple linear regression

hp calculators HP 50g Probability distributions The MTH (MATH) menu Probability distributions

IV. The Normal Distribution

Ch. 7: Estimates and Sample Sizes

Background to Statistics

Chapter 27 Summary Inferences for Regression

Probability Distributions.

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series

Semiparametric Regression

An interval estimator of a parameter θ is of the form θl < θ < θu at a

Common ontinuous random variables

Continuous Random Variables. What continuous random variables are and how to use them. I can give a definition of a continuous random variable.

Practitioner's Guide To Statistical Sampling: Part 1

Chapter 23. Inference About Means

7 Estimation. 7.1 Population and Sample (P.91-92)

Chapter 3 Single Random Variables and Probability Distributions (Part 1)

Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018

Week 1 Quantitative Analysis of Financial Markets Distributions A

Chapters 4-6: Estimation

Solutions. Some of the problems that might be encountered in collecting data on check-in times are:

Harvard University. Rigorous Research in Engineering Education

4.12 Sampling Distributions 183

Stat 5421 Lecture Notes Fuzzy P-Values and Confidence Intervals Charles J. Geyer March 12, Discreteness versus Hypothesis Tests

Error Analysis in Experimental Physical Science Mini-Version

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs

Reference: Chapter 7 of Devore (8e)

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Probability Distributions Columns (a) through (d)

The variable θ is called the parameter of the model, and the set Ω is called the parameter space.

REVIEW: Midterm Exam. Spring 2012

Chapter 8: Confidence Intervals

Sampling Distribution Models. Chapter 17

PubH 5450 Biostatistics I Prof. Carlin. Lecture 13

Sampling Distributions of Statistics Corresponds to Chapter 5 of Tamhane and Dunlop

IV. The Normal Distribution

1 Motivation for Instrumental Variable (IV) Regression

2008 Winton. Statistical Testing of RNGs

Confidence Intervals for Normal Data Spring 2018

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Transcription:

Confidence Intervals Confidence interval for sample mean The CLT tells us: as the sample size n increases, the sample mean is approximately Normal with mean and standard deviation Thus, we have a standard normal variable Because the area under the standard normal curve between 1.96 and 1.96 is.95, we know: This is equivalent to: If the underlying population is Normally distributed, we don t need CLT or large sample size for the sample mean to be Normally distributed normality is guaranteed. which can be interpreted as the probability that the interval includes the true mean is 95%. 1 2 Confidence interval for sample mean Confidence interval for sample mean The interval The CI interval is centered at the sample mean and extends 1.96 to each side of the sample mean. Thus the interval s width is 2 (1.96) boundaries are random and is not random; only the interval is thus called the 95% confidence interval for the mean. This interval varies from sample to sample, as the sample mean varies. So the interval itself is a random interval: its bounds are random variables. 3 4

Basic Properties of Confidence Intervals Interpreting a Confidence Level For a given sample, the CI can be expressed either as We started with an event (that the random interval captures the true value ) whose probability was.95 It is tempting to say that lies within this fixed interval with probability 0.95. or as is a constant (unfortunately unknown to us). It is therefore incorrect to write the statement P( lies in (a, b)) = 0.95 -- since either is in (a,b) or isn t. A concise expression for the interval is x 1.96 where gives the left endpoint (lower limit) and + gives the right endpoint (upper limit). Basically, is not random (it s a constant), so it can t have a probability associated with its behavior. 5 6 Interpreting a Confidence Level Interpreting a Confidence Level Instead, a correct interpretation of 95% confidence relies on the long-run relative frequency interpretation of probability. Example: the vertical line cuts the measurement axis at the true (but unknown) value of. To say that an event A has probability.95 is to say that if the same experiment is performed over and over again, in the long run A will occur 95% of the time. So the right interpretation is to say that in repeated sampling, 95% of the confidence intervals obtained from all samples will actually contain. The other 5% of the intervals will not. One hundred 95% CIs (asterisks identify intervals that do not include ). 7 8

Interpreting a Confidence Level Other Levels of Confidence Notice that 7 of the 100 intervals shown fail to contain. Probability of 1 is achieved by using z /2 in place of 1.96 In the long run, only 5% of the intervals so constructed would fail to contain. According to this interpretation, the confidence level is not a statement about any particular interval, eg (79.3, 80.7). Instead it pertains to what would happen if a very large number of like intervals were to be constructed using the same CI formula. P( z /2 Z < z /2 ) = 1 9 10 Other Levels of Confidence Example A 100(1 )% confidence interval for the mean when the value of is known is given by A sample of 40 units is selected and diameter measured for each one. The sample mean diameter is 5.426 mm, and the standard deviation of measurements is 0.1mm. Let s calculate a confidence interval for true average diameter using a confidence level of 90%. This requires that 100(1 ) = 90, from which =.10. or, equivalently, by Using qnorm(0.05) z /2 = z.05 = 1.645 (corresponding to a cumulative z-curve area of.95). The formula for the CI can also be expressed in words as Point estimate (z critical value) (standard error). The desired interval is then 11 12

Interval width Sample size computation Since the 95% interval extends 1.96 x, the width of the interval is 2(1.96) = 3.92 to each side of For each desired confidence level and interval width, we can determine the necessary sample size. Similarly, the width of the 99% interval is (using qnorm(0.005)) 2(2.58) = 5.16 Example: A response time is Normally distributed with standard deviation 25 millisec. A new system has been installed, and we wish to estimate the true average response time for the new environment. We have more confidence that the 99% interval includes the true value precisely because it is wider. Assuming that response times are still normally distributed with = 25, what sample size is necessary to ensure that the resulting 95% CI has a width of (at most) 10? The higher the desired degree of confidence, the wider the resulting interval will be. 13 14 Example The sample size n must satisfy Rearranging this equation gives cont d Unknown mean and variance We know that - a CI for the mean of a normal distribution - a large-sample CI for for any distribution with a confidence level of 100(1 ) % is: = 2 (1.96)(25)/10 = 9.80 So n = (9.80) 2 = 96.04 A practical difficulty is the value of, which will rarely be known. Instead we work with the standardized variable Since n must be an integer, a sample size of 97 is required. Where the sample standard deviation S has replaced. 15 16

Unknown mean and variance A Large-Sample Interval for Previously, there was randomness only in the numerator of Z by virtue of estimator., the If n is sufficiently large, the standardized variable In the new standardized variable, both another. and S vary in value from one sample to has approximately a standard normal distribution. This implies that Thus the distribution of this new variable should be wider than the Normal to reflect the extra uncertainty. This is indeed true when n is small. However, for large n the substitution of S for adds little extra variability, so this variable also has approximately a standard normal distribution. is a large-sample confidence interval for with confidence level approximately 100(1 ) %. This formula is valid regardless of the population distribution. 17 18 A Large-Sample Interval for Small sample intervals for the mean In words, the CI is point estimate of (z critical value) (estimated standard error of the mean). Generally speaking, n > 40 will be sufficient to justify the use of this interval. This is somewhat more conservative than the rule of thumb for the CLT because of the additional variability introduced by using S in place of. The CI for presented in earlier section is valid provided that n is large Rule of thumb: n>40 The resulting interval can be used whatever the nature of the population distribution. The CLT cannot be invoked when n is small Need to do something else when n<40 When n<40, we have to make a specific assumption about the form of the population distribution and then derive a CI tailored to that assumption. For example, we could develop a CI for when the population is described by a Normal, or gamma distribution, or a Weibull distribution, and so on. 19 20

Small Sample Intervals Based on a Normal Population Distribution The result on which inference is based introduces a new family of probability distributions called t distributions. t Distributions When is the sample mean of a random sample of size n from a normal distribution with mean, the rv has a probability distribution called a t distribution with n 1 degrees of freedom (df). 21 22 Properties of t Distributions Properties of t Distributions Figure below illustrates some members of the t-family Properties of t Distributions Let t denote the t distribution with df. 1. Each t curve is bell-shaped and centered at 0. 2. Each t curve is more spread out than the standard normal (z) curve. 3. As increases, the spread of the corresponding t curve decreases. 4. As, the sequence of t curves approaches the standard normal curve (so the z curve is the t curve with df = ). 23 24

Properties of t Distributions The One-Sample t Confidence Interval Let t, = the number on the measurement axis for which the area under the t curve with df to the right of t, is ; t, is called a t critical value. Let and s be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean. Then a 100(1 )% confidence interval for is For example, t.05,6 is the t critical value that captures an upper-tail area of.05 under the t curve with 6 df or, more compactly Because t curves are symmetric about zero, t, captures lower-tail area. 25 26 Example cont d Example cont d A dataset on the modulus of material rupture (psi): 6807.99 7637.06 6663.28 6165.03 6991.41 6992.23 6981.46 7569.75 7437.88 6872.39 7663.18 6032.28 6906.04 6617.17 6984.12 7093.71 7659.50 7378.61 7295.54 6702.76 7440.17 8053.26 8284.75 7347.95 7422.69 7886.87 6316.67 7713.65 7503.33 7674.99 The histogram provides support for assuming that the population distribution is at least approximately normal. There are 30 observations. The sample mean is 7203.191 The sample standard deviation is 543.5400. 27 28

Example cont d Intervals Based on Nonnormal Population Distributions Recall the sample mean and sample standard deviation are 7203.191 and 543.5400, respectively. The 95% CI is based on n 1 = 29 degrees of freedom, so the necessary critical value is t.025,29 = 2.045. The interval estimate is now The one-sample t CI for is robust to small or even moderate departures from normality unless n is quite small. By this we mean that if a critical value for 95% confidence, for example, is used in calculating the interval, the actual confidence level will be reasonably close to the nominal 95% level. If, however, n is small and the population distribution is nonnormal, then the actual confidence level may be considerably different from the one you think you are using when you obtain a particular critical value from the t table. 29 30 A General Large-Sample Confidence Interval The large-sample intervals and are special cases of a general large-sample CI for a parameter. General parameter Confidence Interval Suppose that is an estimator such that: (1) It has approximately a normal distribution; (2) it is (at least approximately) unbiased; (3) an expression for, the standard deviation of, is available. Then, 31 And the CI for is: 32

A Confidence Interval for a Population Proportion Let p denote the proportion of successes in a population, where success identifies an individual or object that has a specified property (e.g., individuals who graduated from college, computers that do not need warranty service, etc.). A Confidence Interval for a Population Proportion A random sample of n individuals is to be selected, and X is the number of successes in the sample. X can be thought of as a sum of all X i s, where 1 is added for every success that occurs and a 0 for every failure, so X 1 +... + X n = X). Thus, X can be regarded as a binomial rv with mean np and. 33 Furthermore, if both np 10 and n(1-p) 10, X has approximately a normal distribution. 34 A Confidence Interval for a Population Proportion The natural estimator of p is = X / n, the sample fraction of successes. Since is the sample mean, (X 1 +... + X n )/ n It has approximately a normal distribution. As we know that, E( and ) = p (unbiasedness) The standard deviation involves the unknown parameter p. Standardizing then implies that One-Sided Confidence Intervals And the CI is 35 36

One-Sided Confidence Intervals (Confidence Bounds) One-Sided Confidence Intervals (Confidence Bounds) The confidence intervals discussed thus far give both a lower confidence bound and an upper confidence bound for the parameter being estimated. In some circumstances, an investigator will want only one of these two types of bounds. Because the cumulative area under the standard normal curve to the left of 1.645 is. 95, we have For example, a psychologist may wish to calculate a 95% upper confidence bound for true average reaction time to a particular stimulus, or a reliability engineer may want only a lower confidence bound for true average lifetime of components of a certain type. 37 38 One-Sided Confidence Intervals (Confidence Bounds) Starting with P( 1.645 < Z).95 and manipulating the inequality results in the upper confidence bound. A similar argument gives a one-sided bound associated with any other confidence level. Proposition A large-sample upper confidence bound for is and a large-sample lower confidence bound for is Confidence Intervals for Variance of a normal population 39 40

Confidence Intervals for the Variance of a Normal Population Confidence Intervals for the Variance of a Normal Population Let X 1, X 2,, X n be a random sample from a normal distribution with parameters and 2. Then The graphs of several 2 probability density functions are has a chi-squared ( 2 ) probability distribution with n 1 df. We know that the chi-squared distribution is a continuous probability distribution with a single parameter v, called the number of degrees of freedom, with possible values 1, 2, 3,.... 41 42 Confidence Intervals for the Variance of a Normal Population Confidence Intervals for the Variance of a Normal Population The chi-squared distribution is not symmetric, so need values of both for near 0 and 1 As a consequence Or equivalently Substituting the computed value s 2 into the limits gives a CI for 2 Taking square roots gives an interval for. 43 44

Confidence Intervals for the Variance of a Normal Population Example A 100(1 )% confidence interval for the variance 2 of a normal population has lower limit The data on breakdown voltage of electrically stressed circuits are: and upper limit A confidence interval for has lower and upper limits that are the square roots of the corresponding limits in the interval for 2. breakdown voltage is approximately normally distributed. 45 46 Example cont d Let 2 denote the variance of the breakdown voltage distribution. The computed value of the sample variance is s 2 = 137,324.3, the point estimate of 2. With df = n 1 = 16, a 95% CI requires = 6.908 and = 28.845. The interval is Taking the square root of each endpoint yields (276.0, 564.0) as the 95% CI for. 47