Inference for the mean of a population. Testing hypotheses about a single mean (the one sample t-test). The sign test for matched pairs

Stat 528 (Autumn 2008)
Inference for the mean of a population (One sample t procedures)
Reading: Section 7.1.
- Inference for the mean of a population.
- The t distribution for a normal population.
- Small sample CI for µ in a normal population.
- Robustness of the t procedures.
- Testing hypotheses about a single mean (the one sample t-test).
- Methods for matched pairs:
  - The paired t-test.
  - The sign test for matched pairs.
- The power of the one sample t-test.

Inference for the mean of a population
So far we have based inference for the population mean on the Z statistic
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
For large n, Z is approximately N(0,1) (exactly N(0,1) when the population is normal). Problem: in practice we do not know the population standard deviation, σ. Instead we use the sample standard deviation, s, as an estimate of σ.

The distribution of t for a normal population
Let $X_1, X_2, \ldots, X_n$ be an SRS from a normal population with population mean µ. Then the standardized variable
$t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$
has a t distribution with n − 1 degrees of freedom (df). We say that t has a $t_{n-1}$ distribution.
Estimating σ by s adds uncertainty to the standardization: smaller n means fewer degrees of freedom and less certainty.
The quantity $s/\sqrt{n}$ is the (estimated) standard error of the sample mean. It is labeled SE Mean in MINITAB.
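A minimal Python sketch of the standardization above, using only the standard library (`statistics.stdev` uses the n − 1 divisor, so it returns the sample standard deviation s). The helper names `se_mean` and `t_statistic` are hypothetical, not MINITAB commands; the data are the polymerization observations from the example later in these notes.

```python
import math
import statistics

def se_mean(data):
    """Estimated standard error of the sample mean, s / sqrt(n) ("SE Mean" in MINITAB)."""
    return statistics.stdev(data) / math.sqrt(len(data))

def t_statistic(data, mu):
    """Standardized mean (xbar - mu) / (s / sqrt(n)); t with n - 1 df for a normal population with mean mu."""
    return (statistics.mean(data) - mu) / se_mean(data)

# Degree-of-polymerization observations (n = 17)
poly = [418, 421, 421, 422, 425, 427, 431, 434, 437,
        439, 446, 447, 448, 453, 454, 463, 465]

print(round(se_mean(poly), 3))            # estimated SE of the mean
print(round(t_statistic(poly, 440), 3))   # t for a hypothesized mean of 440, df = 16
```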

Properties of the t distribution
[Figure: density curves of the standard normal distribution and of t distributions with 5, 2, and 1 df; horizontal axis: value, vertical axis: probability density]
- The density curve is symmetric about zero and bell-shaped, like the normal distribution.
- The t distribution has heavier tails than the normal distribution (it is more spread out about zero).
- As the degrees of freedom increase, the tails become thinner and more of the density is concentrated in the center of the distribution.
- $t_\infty$ = standard normal distribution: as the df grow without bound, the t distribution approaches N(0,1).
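The heavier tails can be checked numerically. This standard-library sketch (`t_pdf` and `normal_pdf` are hypothetical helper names) evaluates the t density $f(x) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + x^2/\nu\right)^{-(\nu+1)/2}$ in the tail (x = 3) and at the center (x = 0) for several df, next to the standard normal density.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom (log-gamma avoids overflow)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Height of each curve out in the tail (x = 3) and at the center (x = 0)
for df in (1, 2, 5, 30, 1000):
    print(f"df={df:>4}: f(3)={t_pdf(3, df):.4f}  f(0)={t_pdf(0, df):.4f}")
print(f"normal : f(3)={normal_pdf(3):.4f}  f(0)={normal_pdf(0):.4f}")
```

Smaller df put more density at x = 3 and less at x = 0; by 1000 df the t curve is almost indistinguishable from N(0,1).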

A small sample CI for µ (the normal population case)
For one random sample of normal data, a C = 100(1 − α)% level confidence interval for µ is given by
$\bar{x} \pm t_{n-1,\alpha/2}\,\dfrac{s}{\sqrt{n}}$,
where $t_{n-1,\alpha/2}$ is the upper α/2 critical value of the t distribution with n − 1 degrees of freedom. The value of $t_{n-1,\alpha/2}$ is tabulated in Table D:
1. look at the bottom of the table for the confidence level C of the two-sided interval, OR
2. look up α/2 as the upper tail probability p.
Recall that the CI for µ comes from a family of hypothesis tests about µ: it contains exactly the values µ0 that a two-sided level α t-test would not reject.

Robustness of the t procedures
What if the population is not normal? Can we still use the t distribution? Practical guidelines from the textbook:
1. n < 15: use t procedures if the data are close to normal. If the data are clearly non-normal or if outliers are present, do not use the t procedures.
2. n ≥ 15: use t procedures except in the presence of strong skewness or outliers.
3. Roughly n ≥ 40: the t procedures are valid even for clearly skewed distributions.
Use plots of the data to help you decide!

Polymerization example
The article "Measuring and understanding the aging of kraft insulating paper in power transformers" contained the following observations on the degree of polymerization for paper specimens for which viscosity times concentration fell in a certain middle range:
418 421 421 422 425 427 431 434 437 439 446 447 448 453 454 463 465
Plots of the data show that a normality assumption is reasonable (note that $\bar{x} = 438.29$, s = 15.14, n = 17).
Form a 95% confidence interval for the true average degree of polymerization (as did the authors of the article). Does the interval suggest that 440 is a plausible value for the true average degree of polymerization? What about 450?
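A sketch of the interval arithmetic for the polymerization example, assuming the Table D value $t_{16,0.025} = 2.120$. It uses the rounded summary statistics from the slide, so the last digit may differ slightly from a calculation on the raw data.

```python
import math

# Summary statistics from the polymerization example
xbar, s, n = 438.29, 15.14, 17

# Upper 0.025 critical value of t with n - 1 = 16 df, from Table D (C = 95%)
t_star = 2.120

se = s / math.sqrt(n)          # estimated standard error of the mean
margin = t_star * se           # margin of error
lower, upper = xbar - margin, xbar + margin
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")

# A hypothesized value is plausible at the 5% level if it lies inside the interval
for mu0 in (440, 450):
    print(mu0, "inside" if lower <= mu0 <= upper else "outside")
```

With these numbers the interval is roughly (430.5, 446.1), which contains 440 but not 450.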

Testing hypotheses about a single mean: the one sample t-test
Data: we assume $x_1, x_2, \ldots, x_n$ is a random sample from a normal population with mean µ.
We state our hypotheses:
$H_0: \mu = \mu_0$, for some constant value $\mu_0$,
$H_a: \mu < \mu_0$, $\mu \ne \mu_0$, OR $\mu > \mu_0$
(remember to define in words what µ is for your problem).
We calculate the test statistic
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
Under $H_0$, the test statistic follows a $t_{n-1}$ distribution.
Decision: compare the observed t statistic to the critical value found in Table D.

Drawing conclusions in the one-sample t-test
For a test of significance at level α:
- if the observed t statistic is in the rejection tail, we reject $H_0$ (in favor of $H_a$);
- if the observed t statistic is not in the tail, we do not reject $H_0$.
Alternatives and tails (critical values with n − 1 df):
- two-tailed alternative ($\mu \ne \mu_0$): reject if $|t| \ge t_{\alpha/2}$;
- upper-tailed alternative ($\mu > \mu_0$): reject if $t \ge t_{\alpha}$;
- lower-tailed alternative ($\mu < \mu_0$): reject if $t \le -t_{\alpha}$.
As always, write your conclusion(s) in words. It is important to think about the assumptions you made to carry out the t-test; some of them can be checked using plots of the data.
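The rejection rules above can be written as one small function. This is a sketch: `t_test_decision` and its `alternative` labels are hypothetical names, and the critical values are assumed to be looked up in Table D. The example uses t ≈ −0.46, which is what the polymerization summary statistics give for $H_0: \mu = 440$ with 16 df.

```python
def t_test_decision(t, t_crit, alternative="two-sided"):
    """Return True (reject H0) or False (do not reject) for an observed t.

    t_crit is the positive Table D critical value with n - 1 df:
    t_{alpha/2} for a two-sided test, t_alpha for a one-sided test.
    """
    if alternative == "two-sided":    # H_a: mu != mu_0
        return abs(t) >= t_crit
    if alternative == "greater":      # H_a: mu > mu_0
        return t >= t_crit
    if alternative == "less":         # H_a: mu < mu_0
        return t <= -t_crit
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

# Polymerization: H0: mu = 440, t about -0.46, df = 16, alpha = 0.05
print(t_test_decision(-0.46, 2.120))           # two-sided, t_{0.025} = 2.120
print(t_test_decision(-0.46, 1.746, "less"))   # lower-tailed, t_{0.05} = 1.746
```

Neither test rejects $H_0$, which agrees with 440 lying inside the 95% interval.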

Example
The one-sample t statistic from a sample of n = 50 observations for the two-sided test of $H_0: \mu = 50$ versus $H_a: \mu \ne 50$ has the value t = 1.65.
- What are the degrees of freedom for the test statistic t?
- Is the value t = 1.65 statistically significant at the 10% level? At the 5% level?
- Locate the two critical values t* from Table D that bracket t. What are the right-tail probabilities for these two values?
- How would you report the P-value for this test?
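Software reports an exact P-value instead of a Table D bracket. As a sketch, assuming no statistics library is available, the upper-tail area of $t_{49}$ can be found by integrating the t density with Simpson's rule; `t_pdf` and `t_upper_tail` are hypothetical helper names. The two-sided P-value should fall between the two-sided tail areas 0.10 and 0.20 that bracket it in Table D.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule, even steps)."""
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

df = 50 - 1
t_obs = 1.65
p_two_sided = 2 * t_upper_tail(abs(t_obs), df)
print(f"df = {df}, one-tail area = {t_upper_tail(t_obs, df):.4f}, two-sided P = {p_two_sided:.4f}")
```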

Matched pairs (revision and analysis)
Suppose we have two treatments. In the matched pairs design we try to gain precision in the response by matching pairs of similar individuals. Either:
- we randomly assign the treatments to the subjects within each pair (each subject receives only one treatment), or
- an individual serves as his/her own partner, receiving both treatments.
Each pair of subjects (or each individual) forms its own block. To analyze the results of this type of experiment, we compare the responses within each pair (individual): we usually take differences and carry out the statistical inference using the paired t-test.

Football example
Two identical footballs, one air-filled and one helium-filled, were used outdoors on a windless day at The Ohio State University's athletic complex. The kicker was a novice punter and was not informed which football contained the helium. Each football was kicked 39 times. The kicker changed footballs after each kick so that his leg would play no favorites if he tired or improved with practice.
(Source: Lafferty, M. B. (1993), "OSU scientists get a kick out of sports controversy," The Columbus Dispatch (21 Nov 1993), B7.)

The data (all distances are in yards)

Trial  Air  Helium    Trial  Air  Helium    Trial  Air  Helium
    1   25      25       14   25      31       27   22      30
    2   23      16       15   34      22       28   31      27
    3   18      25       16   26      29       29   25      33
    4   16      14       17   20      23       30   20      11
    5   35      23       18   22      26       31   27      26
    6   15      29       19   33      35       32   26      32
    7   26      25       20   29      24       33   28      30
    8   24      26       21   31      31       34   32      29
    9   24      22       22   27      34       35   28      30
   10   28      26       23   22      39       36   25      29
   11   25      12       24   29      32       37   31      29
   12   19      28       25   28      14       38   28      30
   13   27      28       26   29      28       39   28      26

A scatterplot
[Figure: scatterplot of the football data; not reproduced]

The paired t procedure: the setup
Suppose we have pairs of data values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$; e.g., in our example the pairs are the (helium-filled, air-filled) distances for each kick. The x and y values within a pair are clearly not independent, so we do not analyze them as two independent samples. Instead, we calculate the differences $d_i = y_i - x_i$, for $i = 1, \ldots, n$ (here, air minus helium).
We assume $d_1, d_2, \ldots, d_n$ is a random sample from a normal population with mean $\mu_d$ and standard deviation $\sigma_d$:
- $\mu_d$ is the population mean of the differences y − x;
- $\sigma_d$ is the population standard deviation of the differences.

The paired t procedure
We want to test
$H_0: \mu_d = \mu_0$, for some constant value $\mu_0$,
$H_a: \mu_d < \mu_0$, $\mu_d \ne \mu_0$, OR $\mu_d > \mu_0$.
We compute the test statistic
$t = \dfrac{\bar{d} - \mu_0}{s_d/\sqrt{n}}$,
where $\bar{d}$ is the sample average of the differences and $s_d$ is the sample standard deviation of the differences. Under $H_0$, the test statistic follows a $t_{n-1}$ distribution.
We make our decision in the same way as for the one-sample t-test: if the observed t statistic is in the tail, we reject $H_0$; if it is not in the tail, we do not reject $H_0$.

Identifying the hypotheses
There is a belief that on average a helium-filled ball travels farther than an air-filled ball. State the appropriate $H_0$ and $H_a$. Be sure to identify the parameters appearing in the hypotheses.

Summary figures
[Figures: summary plots of the football data; not reproduced]

Performing the test
Carry out a test. Can you reject $H_0$ at the 5% significance level? At the 1% significance level? Write down your conclusion in words.

Variable     N  N*    Mean  SE Mean  StDev
Air-Helium  39   0  -0.462     1.10   6.87

Variable    Minimum     Q1  Median    Q3  Maximum
Air-Helium   -17.00  -4.00   -1.00  2.00    14.00

Provide a 90% confidence interval for the mean difference in the distances (air-filled minus helium-filled).
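A standard-library Python sketch of the paired t procedure for the football data. It takes differences as air minus helium, reproduces the MINITAB summary (Mean, StDev, SE Mean), and computes t and a 90% interval. The critical value 1.686 for 38 df is an assumed software value; Table D's conservative df = 30 row gives 1.697 and a slightly wider interval.

```python
import math
import statistics

# Distances in yards for trials 1-39, in trial order
air = [25, 23, 18, 16, 35, 15, 26, 24, 24, 28, 25, 19, 27,
       25, 34, 26, 20, 22, 33, 29, 31, 27, 22, 29, 28, 29,
       22, 31, 25, 20, 27, 26, 28, 32, 28, 25, 31, 28, 28]
helium = [25, 16, 25, 14, 23, 29, 25, 26, 22, 26, 12, 28, 28,
          31, 22, 29, 23, 26, 35, 24, 31, 34, 39, 32, 14, 28,
          30, 27, 33, 11, 26, 32, 30, 29, 30, 29, 29, 30, 26]

d = [a - h for a, h in zip(air, helium)]   # differences: air minus helium
n = len(d)
dbar = statistics.mean(d)
s_d = statistics.stdev(d)
se = s_d / math.sqrt(n)
t = (dbar - 0) / se                         # test of H0: mu_d = 0, df = n - 1 = 38
print(f"n={n}  Mean={dbar:.3f}  StDev={s_d:.2f}  SE Mean={se:.2f}  t={t:.2f}")

# 90% CI: upper 0.05 critical value of t with 38 df (assumed value 1.686)
t_star = 1.686
print(f"90% CI for mu_d: ({dbar - t_star * se:.2f}, {dbar + t_star * se:.2f})")
```

The observed t is about −0.42 with 38 df, and the 90% interval is roughly (−2.32, 1.39) yards.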

Inference for non-normal populations
If the data do not seem to be drawn from a normal population, then the t procedures may not be valid. Three possible strategies:
1. Learn about other probability distributions. For example, there are plenty of skewed distributions (e.g., exponential, gamma, Weibull). Use methods for these distributions instead of the methods for the normal distribution.
2. Transform your data to make them look as normal as possible (recall the ladder of power transformations). The results can be hard to interpret when a transformation is used.
3. Use distribution-free tests. These tests do not assume a particular distribution for the population. Often they are based on other parameters of the distribution, such as the median (rather than the mean). These tests can be less powerful in practice.

The sign test for matched pairs
An example of a distribution-free test. As before, consider pairs of data values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. We will test
$H_0$: population median of differences = 0, versus $H_a$: population median of differences ≠ 0.
Let $d_i = y_i - x_i$ (i = 1, ..., n) be the differences. Exclude the differences that are zero. Let X denote the count, out of the remaining m differences, of those that are positive. Then under $H_0$, X is Binomial(m, 0.5). (If the median is zero, each nonzero difference is equally likely to be above or below zero.) If x is the observed value of X, then the two-sided P-value is $2P(X \ge x)$ or $2P(X \le x)$, whichever is smaller.

The sign test for matched pairs (cont.)
For the football example: out of n = 39 differences, m = 37 are nonzero. Thus under $H_0$, X is Binomial(37, 0.5). Of the 37, we observe 17 that are above zero.
P-value = $2P(X \le 17) = 2 \times 0.3714 = 0.7428$.
No evidence to reject $H_0$. See the textbook for the one-sided test.
Note: if the population of differences is normally (or approximately normally) distributed, then this test is less powerful at detecting differences than the paired t-test.
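An exact sign test needs only binomial probabilities, which `math.comb` (Python 3.8+) provides. In this sketch `sign_test` is a hypothetical helper; the differences are air minus helium, computed from the data table, and the function returns m, the observed count x, and the two-sided P-value.

```python
import math

def sign_test(d):
    """Two-sided sign test of H0: median of differences = 0 (exact binomial P-value)."""
    nonzero = [v for v in d if v != 0]       # drop zero differences
    m = len(nonzero)
    x = sum(1 for v in nonzero if v > 0)     # number of positive differences

    def cdf(k):                              # P(X <= k) for X ~ Binomial(m, 0.5)
        return sum(math.comb(m, j) for j in range(k + 1)) / 2 ** m

    lower = cdf(x)                           # P(X <= x)
    upper = 1 - cdf(x - 1)                   # P(X >= x)
    return m, x, min(1.0, 2 * min(lower, upper))

# Football example: air minus helium differences for trials 1-39
d = [0, 7, -7, 2, 12, -14, 1, -2, 2, 2, 13, -9, -1,
     -6, 12, -3, -3, -4, -2, 5, 0, -7, -17, -3, 14, 1,
     -8, 4, -8, 9, 1, -6, -2, 3, -2, -4, 2, -2, 2]
print(sign_test(d))
```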

The power of the one sample t-test
The power calculation for the one sample t-test is similar to the power calculation for the z-test, but the math is much harder! Instead we use MINITAB: Stat > Power and Sample Size > 1-Sample t.
- Under Options, select the Alternative Hypothesis and Significance Level.
- Then enter any two of the following three items: 1. Sample sizes, 2. Differences, 3. Power values.
- Enter the Standard deviation (the sample stdev in this case) and click OK.

A value for σ
There are four main ways to obtain a value for σ:
- Literature search. Use historical data from similar studies.
- Pilot study. Use the results of a pilot study; the estimate of σ will often need to be adjusted.
- Elicit σ. Two useful methods are the Range/4 method and the Range/6 method.
- Construct a value for σ. Some probability models yield a value for σ (e.g., for a Bernoulli random variable, $\sigma = \sqrt{p(1-p)}$).
Be conservative: use several methods and consider a slightly larger value of σ than these methods suggest.
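A small sketch of the elicitation and construction rules, with hypothetical inputs. The Range/4 and Range/6 rules divide an elicited range of plausible values by 4 or 6, since roughly 95% of a normal population lies within 2σ of the mean and nearly all of it within 3σ. The names `sigma_from_range` and `sigma_bernoulli` are hypothetical, and the range 10 to 34 is made up for illustration.

```python
import math

def sigma_from_range(low, high):
    """Rough guesses for sigma from an elicited range: (Range/4, Range/6)."""
    r = high - low
    return r / 4, r / 6

def sigma_bernoulli(p):
    """Standard deviation of a Bernoulli(p) random variable: sqrt(p(1 - p))."""
    return math.sqrt(p * (1 - p))

# Hypothetical elicited range: responses expected to lie between 10 and 34
r4, r6 = sigma_from_range(10, 34)
print(r4, r6)                  # the two range-based guesses
print(sigma_bernoulli(0.5))    # largest possible Bernoulli sigma (p = 0.5)
# Being conservative: plan with a value at least as large as the biggest guess
print(max(r4, r6))
```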

An agricultural field trial example
An agricultural field trial compares the yield of two varieties of tomatoes for commercial use. The researchers divide each of 10 small plots of land in half and plant each tomato variety on one half of each plot. After harvest, they compare the yields in pounds per plant at each location. The ten differences (Variety A − Variety B) give the following statistics: $\bar{x} = 0.46$ and s = 0.92. Is there convincing evidence that Variety A has the higher mean yield?
Let $\mu_d$ denote the population mean of the difference in the yields. We test $H_0: \mu_d = 0$ versus $H_a: \mu_d > 0$. The MINITAB output for the paired t test is:

One-Sample T
Test of mu = 0 vs > 0
                                       95% Lower
 N      Mean     StDev   SE Mean      Bound     T      P
10  0.460000  0.920000  0.290930  -0.073307  1.58  0.074
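The MINITAB output can be reproduced from the summary statistics. This sketch assumes the Table D value $t_{9,0.05} = 1.833$; the one-sided 95% lower bound is $\bar{x} - t^*\,s/\sqrt{n}$, and it is negative exactly when the upper-tailed test does not reject at the 5% level.

```python
import math

n, dbar, s_d = 10, 0.46, 0.92      # summary statistics of the 10 differences (A - B)

se = s_d / math.sqrt(n)            # SE Mean
t = (dbar - 0) / se                # test of H0: mu_d = 0 vs H_a: mu_d > 0, df = 9
t_star = 1.833                     # upper 0.05 critical value with 9 df (Table D)
lower_bound = dbar - t_star * se   # one-sided 95% lower confidence bound

print(f"SE Mean = {se:.6f}, T = {t:.2f}, 95% lower bound = {lower_bound:.4f}")
print("reject H0 at 5%" if t >= t_star else "do not reject H0 at 5%")
```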

Agricultural trial (cont.)
The tomato experts who carried out the field trial suspect that the relative lack of significance is due to low power. They would like to detect a mean difference in yields of 0.6 pounds per plant at the 0.05 significance level. Based on the previous study, use 0.92 as an estimate of the population σ.
- What is the power of the test with n = 12 against the alternative $\mu_d = 0.6$?
- If the sample size is increased to n = 30 plots of land, what will be the power against the same alternative?
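MINITAB computes the power directly; as a rough hand check, a common normal approximation is $\text{power} \approx P\left(Z \ge t^* - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}}\right)$. This sketch assumes the upper-tailed test from the field trial, σ = 0.92, and the Table D critical values $t_{11,0.05} = 1.796$ and $t_{29,0.05} = 1.699$; `approx_power` is a hypothetical name. Exact calculations (such as MINITAB's, which use the noncentral t distribution) will differ somewhat from these approximations.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def approx_power(n, diff, sigma, t_star):
    """Approximate power of the upper-tailed one-sample t test.

    Normal approximation P(Z >= t* - diff / (sigma / sqrt(n))), where
    t_star is the upper-alpha critical value with n - 1 df.
    """
    shift = diff / (sigma / math.sqrt(n))
    return 1 - normal_cdf(t_star - shift)

# Tomato trial: detect mu_d = 0.6 with sigma = 0.92, alpha = 0.05 (upper-tailed)
for n, t_star in ((12, 1.796), (30, 1.699)):   # t_{0.05} with 11 and 29 df
    print(n, round(approx_power(n, 0.6, 0.92, t_star), 3))
```

The approximation gives power of roughly 0.68 for n = 12 and roughly 0.97 for n = 30.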