Confidence Intervals and Hypothesis Tests
STA 281 Fall 2011

1 Background

The central limit theorem provides a very powerful tool for determining the distribution of sample means for large sample sizes. In particular, if X_1, …, X_n are independent and identically distributed (iid) with mean E[X_i] = µ and variance V[X_i] = σ², AND n is large, then

    X̄ ≈ N(µ, σ²/n)

For the remainder of this handout, all sample sizes should be assumed to be greater than 30 so this result holds. A special case of this result is that if X_1, …, X_n ~ Bern(p), then

    p̂ ≈ N(p, p(1−p)/n)

Results for two samples may be obtained using the formulas for linear combinations of normal distributions. Thus, if X_1, …, X_{n_X} are iid with mean µ_X and variance σ_X² while Y_1, …, Y_{n_Y} are iid with mean µ_Y and variance σ_Y² (and of course the X and Y samples are independent), then

    X̄ − Ȳ ≈ N(µ_X − µ_Y, σ_X²/n_X + σ_Y²/n_Y)

This also has a special case for proportions. If X_1, …, X_{n_X} ~ Bern(p_X) and Y_1, …, Y_{n_Y} ~ Bern(p_Y), then

    p̂_X − p̂_Y ≈ N(p_X − p_Y, p_X(1−p_X)/n_X + p_Y(1−p_Y)/n_Y)

These formulas are the four fundamental results that motivate all of the confidence interval and hypothesis testing theory we will investigate in this course.

2 What are Confidence Intervals and Hypothesis Tests?

Inference is the use of data to draw conclusions about population parameters. Probability theory assumes we have X_1, …, X_n ~ Bern(0.4) and then specifies the likelihood of generating 0 through n successes. Thus probability theory assumes we know the parameter p and specifies how our data should appear. Inference is concerned with the reverse problem. We already have X_1, …, X_n ~ Bern(p), but we don't know p. Our goal is to use the data to determine p. The first thing to note is that we will NEVER be able to determine p exactly with only a finite amount of data. Suppose n = 1000 and we observe that X_1, …, X_n ~ Bern(p) has 800 successes. What is p? Unfortunately, no value of p in (0,1) can be completely excluded based on this data. It is possible to see the observed data when p = 0.01 (not likely, but possible). For any value of p in (0,1), the observed value is possible. Thus, we are forced to make probabilistic statements about p.
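The first result above can be checked by simulation. The sketch below (not part of the handout; the seed and sample sizes are arbitrary choices) draws many Bernoulli samples of size n and confirms that the sample proportions cluster around p with standard deviation close to √(p(1−p)/n).

```python
# Simulation sketch of the CLT result: p-hat is approximately N(p, p(1-p)/n).
import random
import math

random.seed(281)          # arbitrary seed for reproducibility
p, n = 0.4, 1000          # true parameter and sample size
trials = 2000             # number of repeated samples

# Draw many samples of size n and record each sample proportion.
p_hats = []
for _ in range(trials):
    successes = sum(1 for _ in range(n) if random.random() < p)
    p_hats.append(successes / n)

mean_p_hat = sum(p_hats) / trials
sd_p_hat = math.sqrt(sum((x - mean_p_hat) ** 2 for x in p_hats) / trials)

print(round(mean_p_hat, 3))   # close to p = 0.4
print(round(sd_p_hat, 4))     # close to sqrt(0.4 * 0.6 / 1000) = 0.0155
```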
Intuitively, while p = 0.01 cannot be excluded, our observed data (800 successes in 1000 trials) is so unlikely when p = 0.01 that for all practical purposes we can exclude it. These are the kinds of inferences we will pursue.
We will focus on making two kinds of inferences, confidence intervals and hypothesis tests, in several scenarios. Not coincidentally, these scenarios correspond to the situations where we applied the central limit theorem in section 1. Specifically, we will make inferences on means for one or two samples, and on proportions for one or two samples. The two kinds of inferences correspond to two common questions asked in scientific experiments. The first, confidence intervals, answers the question "I have no idea what µ (or p) is; how do I use the data to estimate it?" The second, hypothesis tests, answers the question "I have a specific value of µ (or p) in mind. Is the data consistent with that particular value?"

3 Point Estimates (our best guess)

Fundamental to answering both these questions is the notion of a point estimate. A point estimate takes the observed data and produces a single value (or guess) of the parameter. Returning to our example where we had 1000 Bernoulli trials and observed 800 successes, we have already established we are not pleased with p = 0.01. If we had to guess a single number, what would we guess? The most common choice is p̂, the sample proportion of successes, which in this example is 800/1000 = 0.8. This guess is justified by the central limit theorem, which states that the expected value of p̂ is p. While p̂ may not be equal to p in any particular situation, p̂ has a distribution that is centered around the true value. Thus, if we are estimating a proportion p, we estimate it with p̂. For a mean µ, use X̄. These extend to the two sample case, so the difference of two proportions p_X − p_Y is estimated by p̂_X − p̂_Y, and the difference of two means µ_X − µ_Y is estimated by X̄ − Ȳ. Not coincidentally, the center of the distributions of all these guesses is the quantity we are trying to guess.

4 Confidence Intervals

Our best guess is a good start for inference, but it isn't ideal. Specifically, our best guess is basically guaranteed to be wrong. If X_1, …, X_n ~ N(0,1), then X̄ ~ N(0, 1/n), which is a continuous distribution.
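In code, the point estimates above are just sample averages. A minimal sketch, using the handout's example counts plus a second, purely hypothetical sample to illustrate the two-sample case:

```python
# Point estimates: sample proportions as best guesses of p_X and p_X - p_Y.
n_x, successes_x = 1000, 800           # the handout's example
p_hat_x = successes_x / n_x            # best guess of p_X
print(p_hat_x)                         # 0.8

n_y, successes_y = 500, 350            # hypothetical second sample
p_hat_y = successes_y / n_y            # best guess of p_Y
print(round(p_hat_x - p_hat_y, 3))     # best guess of p_X - p_Y: 0.1
```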
Although the distribution of X̄ is centered at µ = 0, the probability that X̄ will exactly equal 0 is 0. OK, that doesn't sound great, but it's not terrible. While X̄ might not be exactly correct, its key advantage is that it should be close to µ, and the larger the sample size, the closer to µ it should be (this can be observed by noting that the variance of X̄, σ²/n, tends to 0 as n increases). In fact, the central limit theorem allows us to quantify just how close our point estimate should be to the correct answer. In general, a confidence interval is

    (best guess) ± z_{α/2} × (standard deviation of the best guess)

Thus, for each situation, the only thing to do is find the best guess, and then use the central limit theorem to compute the standard deviation of that best guess.
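The general recipe can be sketched as a small helper (an illustration, not from the handout; the function name is my own):

```python
# Generic confidence interval: best guess +/- z_{alpha/2} * sd of the guess.
import math

def confidence_interval(best_guess, sd_of_guess, z=1.96):
    # z = 1.96 corresponds to a 95% interval (alpha = 0.05)
    return (best_guess - z * sd_of_guess, best_guess + z * sd_of_guess)

# Example: p-hat = 0.8 from n = 1000 trials, sd = sqrt(0.8 * 0.2 / 1000)
lo, hi = confidence_interval(0.8, math.sqrt(0.8 * 0.2 / 1000))
print(round(lo, 3), round(hi, 3))   # 0.775 0.825
```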
4.1 Formulas

4.1.1 Single Proportion

We have X_1, …, X_n ~ Bern(p). The best guess of p is p̂. Looking at the central limit theorem result, the variance of p̂ is p(1−p)/n. This is an obvious difficulty, since p is unknown (it is what we are trying to estimate!). However, it turns out that p̂ is a sufficiently good guess of p that we can replace p with p̂ in the variance, resulting in the confidence interval

    p̂ ± z_{α/2} √(p̂(1−p̂)/n)

4.1.2 Single Mean

We have X_1, …, X_n iid with mean µ = E[X_i] and variance σ² = V[X_i]. The best guess of µ is X̄, which has variance σ²/n, resulting in the interval

    X̄ ± z_{α/2} √(σ²/n)

If σ² is unknown, then it must be estimated from the data. It turns out that s², defined as

    s² = (1/(n−1)) Σ (X_i − X̄)²

is a reasonable guess of σ², and thus should be used in place of σ² when necessary.

4.1.3 Difference between two proportions

We have X_1, …, X_{n_X} ~ Bern(p_X) and Y_1, …, Y_{n_Y} ~ Bern(p_Y), and are interested in estimating p_X − p_Y. The best guess of p_X − p_Y is p̂_X − p̂_Y. The variance of this best guess depends on the unknown quantities p_X and p_Y, but as with a single proportion these can be replaced with p̂_X and p̂_Y in the variance, resulting in the interval

    (p̂_X − p̂_Y) ± z_{α/2} √(p̂_X(1−p̂_X)/n_X + p̂_Y(1−p̂_Y)/n_Y)

4.1.4 Difference between two means

We have X_1, …, X_{n_X} iid with mean µ_X and variance σ_X², and Y_1, …, Y_{n_Y} iid with mean µ_Y and variance σ_Y², and are interested in estimating µ_X − µ_Y. The best guess of µ_X − µ_Y is X̄ − Ȳ. Using the central limit theorem to find the variance of this best guess, we find the confidence interval is

    (X̄ − Ȳ) ± z_{α/2} √(σ_X²/n_X + σ_Y²/n_Y)

As with estimating a single mean, replace σ_X² with s_X² and σ_Y² with s_Y² as necessary.
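The single-mean interval with unknown σ² can be sketched as follows (an illustration under the handout's large-sample assumption; the function name and example data are mine):

```python
# Single-mean interval: X-bar +/- z_{alpha/2} * sqrt(s^2 / n),
# where s^2 uses the n-1 divisor as defined above.
import math

def mean_ci(data, z=1.96):
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)   # sample variance
    half = z * math.sqrt(s2 / n)
    return xbar - half, xbar + half

# Hypothetical sample of 40 observations (n > 30, so the CLT applies)
data = [i / 10 for i in range(40)]
lo, hi = mean_ci(data)
print(round(lo, 3), round(hi, 3))   # interval centered at the mean, 1.95
```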
5 Hypothesis Tests

When we have a specific value of the parameter in mind and want to verify whether that parameter value is reasonable for the data, we use a hypothesis test. The specific value of the parameter we have in mind is recorded in the null hypothesis, H0, which might state p = 0.2, or µ = 5, or µ_X − µ_Y = 3. The point is that a specific value of the parameter is chosen. A hypothesis test is conducted by observing the difference between our best guess of the parameter and the null value (the value specified in the null hypothesis). This difference must then be standardized. The standardization consists of finding the standard deviation of the best guess under the assumption H0 is true. This results in the test statistic

    z = (best guess − null value) / (standard deviation of the best guess, assuming H0 is true)

The test statistic merely measures how many standard deviations the best guess is from the null value. If the best guess is too far away, the null hypothesis is rejected; otherwise the null hypothesis is accepted. "Too far away" in this context is determined by α. We reject H0 if z ≥ z_{α/2} or if z ≤ −z_{α/2}. Otherwise we do not reject H0. Note that when H0 is true, we have constructed a procedure that rejects H0 with probability α. Thus, we can control the probability of falsely rejecting H0.

5.1 Formulas

5.1.1 Single Proportion

Suppose we have X_1, …, X_n ~ Bern(p) and are testing H0: p = p0. Our best guess of p is p̂. When H0 is true, p̂ ≈ N(p0, p0(1−p0)/n), thus the test statistic is

    z = (p̂ − p0) / √(p0(1−p0)/n)

5.1.2 Single Mean

Suppose we have X_1, …, X_n iid with mean µ and variance σ², and we are testing H0: µ = µ0. The best guess of µ is X̄, which under the null hypothesis is distributed N(µ0, σ²/n). Thus the test statistic is

    z = (X̄ − µ0) / √(σ²/n)

As with confidence intervals, replace σ² with s² if the variance is unknown.
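The one-sample proportion test can be sketched as below (an illustration; the function name and the null value p0 = 0.75 are my own choices, applied to the handout's 800-out-of-1000 example):

```python
# One-sample proportion test statistic under H0: p = p0.
import math

def z_test_proportion(successes, n, p0):
    p_hat = successes / n
    sd0 = math.sqrt(p0 * (1 - p0) / n)   # sd of p-hat assuming H0 is true
    return (p_hat - p0) / sd0

z = z_test_proportion(800, 1000, 0.75)
print(round(z, 2))   # 3.65: since |z| > 1.96, reject H0 at alpha = 0.05
```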
5.1.3 Difference between two proportions

We have X_1, …, X_{n_X} ~ Bern(p_X) and Y_1, …, Y_{n_Y} ~ Bern(p_Y), and are interested in testing H0: p_X − p_Y = 0. There is one trick to this. We have to compute the standard deviation of p̂_X − p̂_Y under the assumption that p_X − p_Y = 0. We cannot just plug p̂_X and p̂_Y into the usual variance formula, because it may not be true that p̂_X − p̂_Y = 0. We use

    z = (p̂_X − p̂_Y) / √(p̂(1−p̂)(1/n_X + 1/n_Y))

where

    p̂ = (total number of successes in both samples) / (n_X + n_Y)

5.1.4 Difference between two means

We have X_1, …, X_{n_X} iid with mean µ_X and variance σ_X², and Y_1, …, Y_{n_Y} iid with mean µ_Y and variance σ_Y², and are interested in testing H0: µ_X − µ_Y = d0 against H1: µ_X − µ_Y ≠ d0. The most common instance of this occurs when d0 = 0, when the null hypothesis simplifies to H0: µ_X = µ_Y. Our best guess of µ_X − µ_Y is X̄ − Ȳ, and thus the test statistic is

    z = (X̄ − Ȳ − d0) / √(σ_X²/n_X + σ_Y²/n_Y)

where σ_X² and σ_Y² should be replaced with s_X² and s_Y² as necessary.
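The pooled two-proportion test from 5.1.3 can be sketched as follows (an illustration; the counts are hypothetical and the function name is mine):

```python
# Pooled two-proportion test statistic under H0: p_X - p_Y = 0.
import math

def z_test_two_proportions(sx, nx, sy, ny):
    px_hat, py_hat = sx / nx, sy / ny
    pooled = (sx + sy) / (nx + ny)        # pooled p-hat, used because H0
                                          # assumes a single common p
    sd0 = math.sqrt(pooled * (1 - pooled) * (1 / nx + 1 / ny))
    return (px_hat - py_hat) / sd0

# Hypothetical samples: 80/200 successes vs 60/200 successes
z = z_test_two_proportions(80, 200, 60, 200)
print(round(z, 2))   # about 2.1, just past the 1.96 cutoff at alpha = 0.05
```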