Chapter 5 Confidence Intervals

Confidence Intervals about a Population Mean, σ Known
Abbas Motamedi, Tennessee Tech University

A point estimate: a single number, calculated from a set of data, that is the best guess for the parameter. (Common notation: put a hat on the parameter.)

A point estimate is a single number. How much uncertainty is associated with a point estimate of a population parameter? An interval estimate provides more information about a population characteristic than a point estimate does, because it attaches a confidence level to the estimate. Such interval estimates are called confidence intervals.

[Figure: a confidence interval centered on the point estimate; the width of the confidence interval runs from the lower confidence limit to the upper confidence limit.]

A confidence interval estimate: A range of numbers around the point estimate within which the parameter is believed to fall. Also called a confidence interval.

An interval gives a range of values. It:
- takes into consideration variation in sample statistics from sample to sample
- gives information about closeness to unknown population parameters
- is stated in terms of a level of confidence (can never be 100% confident)

The general formula for all confidence intervals is:

Point Estimate ± (Critical Value)(Standard Error)
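The general formula can be sketched in a few lines of Python. This is only an illustration: the function name `confidence_interval` and the numbers (sample mean 50, known σ = 10, n = 25, critical value 1.96) are hypothetical and do not come from the slides.

```python
from math import sqrt

def confidence_interval(point_estimate, critical_value, standard_error):
    """Return (lower, upper) = point estimate -/+ critical value * standard error."""
    margin = critical_value * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical numbers, chosen only for illustration.
sigma, n = 10.0, 25                 # assumed known population sd and sample size
standard_error = sigma / sqrt(n)    # standard deviation of the sample mean
lower, upper = confidence_interval(50.0, 1.96, standard_error)
print(lower, upper)
```

The same function works for any interval in this chapter; only the point estimate, the critical value, and the standard error change.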

The level of confidence of a confidence interval is the expected percentage of intervals that will contain the parameter (here, the population mean µ) if a large number of repeated samples are obtained. The level of confidence is denoted (1 − α)·100%. Examples: 95% confidence, 99% confidence.

For example, a 95% level of confidence would mean that if 100 confidence intervals were constructed, each based on a different sample from the same population, we would expect 95 of the intervals to contain the population mean.
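The repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes a normal population with made-up parameters (µ = 100, σ = 15, n = 30) and a known σ; the function name `coverage` is hypothetical. It draws many samples, builds a 95% interval from each, and counts how often the interval contains µ.

```python
import random
from math import sqrt

def coverage(mu=100.0, sigma=15.0, n=30, trials=4000, z=1.96, seed=0):
    """Fraction of simulated 95% intervals (sigma known) that contain mu."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    half_width = z * sigma / sqrt(n)     # critical value * standard error
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials

rate = coverage()
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should land near 0.95, though any single run will differ slightly because the samples are random.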

The construction of a confidence interval for the population mean depends upon three factors:
- the point estimate of the population mean
- the level of confidence
- the standard deviation of the sample mean

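Suppose we obtain a simple random sample from a population. Provided that the population is normally distributed or the sample size is large, the distribution of the sample mean will be (approximately) normal with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.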

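95% of all sample means are in the interval

$$\mu - 1.96\,\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\,\frac{\sigma}{\sqrt{n}}$$

With a little algebraic manipulation, we can rewrite this inequality and obtain:

$$\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}$$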
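The algebraic manipulation can be written out step by step. The derivation below assumes the standard setup for this chapter (σ known, sample mean $\bar{x}$ with standard deviation $\sigma/\sqrt{n}$, and 1.96 as the 95% critical value); it is a sketch of the usual argument rather than a reproduction of the original slide.

```latex
\begin{align*}
\mu - 1.96\,\frac{\sigma}{\sqrt{n}} &< \bar{x} < \mu + 1.96\,\frac{\sigma}{\sqrt{n}}
  && \text{(95\% of sample means)} \\
-1.96\,\frac{\sigma}{\sqrt{n}} &< \bar{x} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}
  && \text{(subtract } \mu \text{)} \\
-\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} &< -\mu < -\bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}
  && \text{(subtract } \bar{x} \text{)} \\
\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} &< \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}
  && \text{(multiply by } -1 \text{, reversing the inequalities)}
\end{align*}
```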

- The level of significance, or α risk, is the chance we take that the true population parameter is not contained in the confidence interval.
- Therefore, a 95% confidence interval would have an α of 5%.

The Level of Significance (α)

If α = .05, then each tail has .025 area. The critical values of z that define the α areas are −1.96 and +1.96.

[Figure: 95% confidence interval on the standard normal curve, centered on the point estimate, with an area of .025 in each tail beyond z = −1.96 and z = +1.96. z-table lookups: row −1.9, column .06 gives .0250; row +1.9, column .06 gives .9750.]

α is the proportion in the tails of the sampling distribution that is outside the established confidence interval.

The z Table: a cumulative area of .0250 under the standardized normal distribution corresponds to z = −1.96.

The z Table: a cumulative area of .9750 under the standardized normal distribution corresponds to z = +1.96.
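The table lookups can be reproduced with Python's standard library, which provides the standard normal distribution as `statistics.NormalDist`. The helper name `z_critical` is hypothetical; the sketch only shows how a confidence level maps to a critical value.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical value z such that the middle area equals confidence_level."""
    alpha = 1.0 - confidence_level
    # inv_cdf takes a cumulative (left-tail) area, so use 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

z95 = z_critical(0.95)
left_area = NormalDist().cdf(-1.96)    # cumulative area below z = -1.96
right_area = NormalDist().cdf(1.96)    # cumulative area below z = +1.96
print(round(z95, 2), round(left_area, 4), round(right_area, 4))
```

Calling `z_critical(0.99)` gives the critical value for a 99% interval in the same way.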

Recall: Single population mean (large n)

Hypothesis test:

$$Z = \frac{\text{observed mean} - \text{null mean}}{s/\sqrt{n}}$$

Confidence interval:

$$\text{confidence interval} = \text{observed mean} \pm Z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
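As a rough illustration of the large-sample case, the sketch below computes both the test statistic and the confidence interval from summary statistics. The function name `z_test_and_interval` and the inputs (observed mean 52, null mean 50, s = 8, n = 64) are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_and_interval(observed_mean, null_mean, s, n, alpha=0.05):
    """Return the Z statistic and the (1 - alpha) confidence interval for the mean."""
    standard_error = s / sqrt(n)
    z_stat = (observed_mean - null_mean) / standard_error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    margin = z_crit * standard_error
    return z_stat, (observed_mean - margin, observed_mean + margin)

# Hypothetical summary statistics, for illustration only.
z_stat, (ci_low, ci_high) = z_test_and_interval(52.0, 50.0, s=8.0, n=64)
print(z_stat, ci_low, ci_high)
```

Both results use the same standard error, which is why a 95% interval that excludes the null mean goes with a Z statistic beyond ±1.96.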

Single population mean (small n, normally distributed trait)

Hypothesis test:

$$T_{n-1} = \frac{\text{observed mean} - \text{null mean}}{s/\sqrt{n}}$$

Confidence interval:

$$\text{confidence interval} = \text{observed mean} \pm T_{n-1,\,\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

What is a t-distribution? A t-distribution is like a Z distribution, except that it has slightly fatter tails to reflect the uncertainty added by estimating σ. The bigger the sample size (that is, the more data used to estimate σ), the closer t becomes to Z. For n > 100, t is nearly indistinguishable from Z.

T-distribution with only 1 degree of freedom.

T-distribution with 4 degrees of freedom.

T-distribution with 9 degrees of freedom.

T-distribution with 29 degrees of freedom.

T-distribution with 99 degrees of freedom. Looks a lot like Z!!

Student's t Distribution

Note: t → Z as n increases.

t-distributions are bell-shaped and symmetric, but have fatter tails than the normal.

[Figure: three curves centered at 0 on the t axis: Standard Normal (t with df = ∞), t (df = 13), and t (df = 5).]

from Statistics for Managers Using Microsoft Excel, 4th Edition, Prentice-Hall 2004

Student's t Table

Upper Tail Area
df     .25      .10      .05
1      1.000    3.078    6.314
2      0.817    1.886    2.920
3      0.765    1.638    2.353

The body of the table contains t values, not probabilities.

Let: n = 3, so df = n − 1 = 2; α = .10, so α/2 = .05. The table entry for df = 2 and upper tail area .05 is t = 2.920.

[Figure: t curve centered at 0 with upper-tail area α/2 = .05 to the right of t = 2.920.]

from Statistics for Managers Using Microsoft Excel, 4th Edition, Prentice-Hall 2004
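The table lookup and the resulting small-sample interval can be sketched as follows. The dictionary holds only the nine t values shown in the table excerpt above; the function name `t_interval` and the sample statistics (mean 10, s = 2, n = 3) are hypothetical.

```python
from math import sqrt

# Upper-tail critical values from the table excerpt: T_TABLE[df][upper_tail_area].
T_TABLE = {
    1: {0.25: 1.000, 0.10: 3.078, 0.05: 6.314},
    2: {0.25: 0.817, 0.10: 1.886, 0.05: 2.920},
    3: {0.25: 0.765, 0.10: 1.638, 0.05: 2.353},
}

def t_interval(sample_mean, s, n, upper_tail_area):
    """Interval: sample mean -/+ t(df = n - 1, alpha/2) * s / sqrt(n)."""
    t_crit = T_TABLE[n - 1][upper_tail_area]   # df = n - 1
    margin = t_crit * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin

# n = 3 and alpha = .10 as on the slide, so alpha/2 = .05 and t = 2.920.
# The sample mean and s below are made-up values.
low, high, margin = t_interval(10.0, 2.0, 3, upper_tail_area=0.05)
print(low, high)
```

With so few degrees of freedom the critical value 2.920 is much larger than the corresponding z value of about 1.645, so the interval is correspondingly wider.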

Confidence Intervals for the Difference between Two Population Means $\mu_1 - \mu_2$: Independent Samples

Two random samples are drawn from the two populations of interest. Because we compare two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.

                Population 1                        Population 2
Parameters:     µ1 and σ1² (values are unknown)     µ2 and σ2² (values are unknown)
Sample size:    n1                                  n2
Statistics:     x̄1 and s1²                          x̄2 and s2²

Estimate $\mu_1 - \mu_2$ with $\bar{x}_1 - \bar{x}_2$.

Confidence Interval for $\mu_1 - \mu_2$

Confidence interval:

$$(\bar{x}_1 - \bar{x}_2) \pm z^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

where $z^*$ is the value from the z-table that corresponds to the confidence level.

Note: when the values of $\sigma_1^2$ and $\sigma_2^2$ are unknown, the sample variances $s_1^2$ and $s_2^2$ computed from the data can be used.
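A minimal sketch of the two-sample interval, assuming independent samples and using variances (known, or sample variances as the note allows). The function name `two_mean_interval` and all input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_interval(xbar1, var1, n1, xbar2, var2, n2, confidence_level=0.95):
    """Interval for mu1 - mu2: (xbar1 - xbar2) -/+ z* sqrt(var1/n1 + var2/n2)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - z_star * standard_error, diff + z_star * standard_error

# Made-up summary statistics for two independent samples.
mean_low, mean_high = two_mean_interval(75.0, 64.0, 64, 72.0, 81.0, 81)
print(mean_low, mean_high)
```

If the resulting interval excludes 0, the data suggest that the two population means differ at the chosen confidence level.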

Inference about Two Populations

We are interested in confidence intervals for the difference between two proportions.

Point Estimator: $\hat{p}_1 - \hat{p}_2$

Two random samples are drawn from two populations. The number of successes in each sample is recorded. The sample proportions are computed.

Sample 1: sample size $n_1$, number of successes $x_1$, sample proportion $\hat{p}_1 = \dfrac{x_1}{n_1}$

Sample 2: sample size $n_2$, number of successes $x_2$, sample proportion $\hat{p}_2 = \dfrac{x_2}{n_2}$

Confidence Interval for $p_1 - p_2$

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

where $z^*$ is the appropriate value from the z-table that depends on the confidence level.
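The two-proportion interval can be sketched in the same way. The function name `two_proportion_interval` and the counts (60 successes out of 100 versus 50 out of 100) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, confidence_level=0.95):
    """Interval for p1 - p2 built from the two sample proportions."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * standard_error, diff + z_star * standard_error

# Made-up counts: 60 of 100 successes in sample 1, 50 of 100 in sample 2.
prop_low, prop_high = two_proportion_interval(60, 100, 50, 100)
print(prop_low, prop_high)
```

Here the point estimate is 0.10 and the standard error is 0.07, so the 95% interval straddles 0 and these hypothetical data would not establish a difference between the two proportions.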