
Stat 529 (Winter 2011)
Analysis of Variance (ANOVA)

Reading: Sections 5.1-5.3.

- Introduction and notation
- Birthweight example
- Disadvantages of using many pooled t procedures
- The analysis of variance procedure
- Assumptions
- The hypothesis test
- The sampling distribution of the sample means
- The variability between the sample means
- Is there any difference in the means?
- Estimating $\sigma^2$
- The F test statistic and F distribution
- Performing the ANOVA for the birthweight data

Introduction

We move away from comparing two population means on the basis of data drawn from each population. Now we consider multiple populations. Here are some motivating examples:

1. There are three methods of assessing the concentration of a contaminant in water. Are all three methods equivalent?
2. How does the paper thickness vary for five different production lines?
3. How is the average crop yield affected by the use of four different fertilizers?

Notation

Suppose that we have $I$ samples drawn from $I$ populations. For each population $i = 1, \ldots, I$ we have:

- $\mu_i$: mean of population $i$.
- $\sigma_i^2$: variance of population $i$.

We draw $n_i$ observations from population $i$:

- $Y_{ij}$: the $j$th observation within population $i$.

In total we have $n_1 + \cdots + n_I = n$ observations. Estimates of $\mu_i$ and $\sigma_i^2$ are, respectively,

- $\bar{Y}_i$: sample mean for sample $i$, and
- $s_i^2$: sample variance for sample $i$.

Birthweights example

One measure of the overall health of a newborn baby is its birthweight. There are many factors which affect birthweight, including both genetic factors (such as mother's size or mother's birthweight) and environmental factors. One environmental factor which is believed to lower birthweight is maternal smoking.

The data below present birthweights of a small number of infants, with the mother's smoking status during pregnancy recorded as non-smoker (someone who has never smoked), former smoker, light smoker or heavy smoker. Birthweight is recorded in pounds, with the ounces part translated into a decimal.

    Non   Quit   Light   Heavy
    7.5   5.8    5.9     6.2
    6.2   7.3    6.2     6.8
    6.9   8.2    5.8     5.7
    7.4   7.1    4.7     4.9
    9.2   7.8    8.3     6.2
    8.3          7.2     7.1
    7.6          6.2     5.8
                         5.4

Birthweights: questions of interest

The question of primary concern is whether a mother's smoking reduces the mean birthweight of an infant. Two issues to consider when performing an analysis are:

1. We would like to make use of all of the information in our data.
2. We would like to avoid performing so many analyses on our data that we find significant differences where there are none.

Summaries of the data

Birthweight in pounds, by maternal smoking status:

    Status  N   Mean   SE Mean  StDev  Variance  Minimum  Q1     Median  Q3     Maximum  IQR
    Heavy   8   6.013  0.255    0.720  0.518     4.900    5.475  6.000   6.650  7.100    1.175
    Light   7   6.329  0.431    1.140  1.299     4.700    5.800  6.200   7.200  8.300    1.400
    Non     7   7.586  0.363    0.962  0.925     6.200    6.900  7.500   8.300  9.200    1.400
    Quit    5   7.240  0.408    0.913  0.833     5.800    6.450  7.300   8.000  8.200    1.550
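As a check, most of the summary statistics can be recomputed directly from the raw birthweights. The sketch below assumes the data are read column by column from the birthweight table; the dictionary name `birthweights` and the helper `summarize` are illustrative only. Quartiles are left out because software packages differ in how they interpolate them.

```python
import math
import statistics

# Birthweights in pounds, grouped by maternal smoking status.
birthweights = {
    "Heavy": [6.2, 6.8, 5.7, 4.9, 6.2, 7.1, 5.8, 5.4],
    "Light": [5.9, 6.2, 5.8, 4.7, 8.3, 7.2, 6.2],
    "Non": [7.5, 6.2, 6.9, 7.4, 9.2, 8.3, 7.6],
    "Quit": [5.8, 7.3, 8.2, 7.1, 7.8],
}


def summarize(values):
    """Return the basic summary statistics for one sample."""
    n = len(values)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    stdev = math.sqrt(variance)
    return {
        "N": n,
        "Mean": statistics.mean(values),
        "SE Mean": stdev / math.sqrt(n),
        "StDev": stdev,
        "Variance": variance,
        "Minimum": min(values),
        "Median": statistics.median(values),
        "Maximum": max(values),
    }


summaries = {status: summarize(ys) for status, ys in birthweights.items()}

for status, s in summaries.items():
    print(f"{status:<6} N={s['N']} mean={s['Mean']:.3f} "
          f"se={s['SE Mean']:.3f} sd={s['StDev']:.3f} var={s['Variance']:.3f}")
```

The printed values should agree with the N, Mean, SE Mean, StDev and Variance columns to three decimal places.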

Discussion of the summaries

Many pooled t procedures

We could make all 6 pairwise comparisons of the means: $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, $\mu_1 - \mu_4$, $\mu_2 - \mu_3$, $\mu_2 - \mu_4$, $\mu_3 - \mu_4$. Based on the results of many pooled t-tests, we might conclude that maternal smoking is related to lower birthweight.

Disadvantages:

1. If the additive model holds and all the variances are equal, then why do we use six different estimates of $\sigma^2$, with differing degrees of freedom? Should we not pool all the information about the variability from all the samples?
2. The chance of making at least one type I error across all the tests of $\mu_i - \mu_j$ is inflated. For example, with 6 tests each carried out at level $\alpha$, the chance of making at least one type I error lies between $\alpha$ and $6\alpha$ (in our case, between 0.05 and 0.30).
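The bounds in the second disadvantage can be worked out for any number of groups. The sketch below counts the pairwise comparisons and returns the lower and upper bounds on the chance of at least one type I error. It also reports the value that would hold if the tests were independent, which is an illustrative assumption only: the pairwise tests here share samples and so are not independent. The function name is hypothetical.

```python
from math import comb


def familywise_error_bounds(num_groups, alpha):
    """Bounds on P(at least one type I error) over all pairwise tests."""
    num_tests = comb(num_groups, 2)        # number of pairs of means
    lower = alpha                          # at least as large as one test
    upper = min(1.0, num_tests * alpha)    # Bonferroni-type upper bound
    # Reference value under the (hypothetical) assumption of independent tests.
    if_independent = 1 - (1 - alpha) ** num_tests
    return num_tests, lower, upper, if_independent


num_tests, lower, upper, if_independent = familywise_error_bounds(4, 0.05)
print(f"{num_tests} tests: between {lower:.2f} and {upper:.2f} "
      f"(about {if_independent:.3f} if the tests were independent)")
```

For the four smoking groups this gives 6 tests and the interval from 0.05 to 0.30 quoted above.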

The analysis of variance procedure

The ANalysis Of VAriance (ANOVA) procedure tries to remedy both of these problems (it certainly fixes 2). Assume that the additive model is appropriate for each of our $I$ samples. Our model is

$Y_{ij} = \mu_i + \epsilon_{ij}.$

Here $\mu_i$ is the mean of population $i$ and $\epsilon_{ij}$ is the error. We assume the errors are independent $N(0, \sigma^2)$ random variables. Then the $Y_{ij}$ are independent $N(\mu_i, \sigma^2)$ random variables.

Assumptions

- $Y_{11}, \ldots, Y_{1n_1}$ form a random sample from some population.
- $Y_{21}, \ldots, Y_{2n_2}$ form a random sample from a second population.
- Similarly, further samples are random samples from some population. With $I$ samples, the last sample is $Y_{I1}, \ldots, Y_{In_I}$.
- The $I$ samples are independent of one another.
- The population distributions are normal with unknown means $\mu_1, \mu_2, \ldots, \mu_I$ and with a common unknown standard deviation $\sigma$.

The hypothesis test

The null hypothesis in ANOVA is "no difference". That is, $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$. The alternative is "not $H_0$", or more specifically $H_a$: at least two of the means differ, which is the same as $H_a$: there is some difference in the means.

Our test is based on measuring how far apart the sample means are from one another. The further apart they are, the more likely we are to reject $H_0$.

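The sampling distribution of the sample means

For each population $i$, the distribution of each $\bar{Y}_i$ is $N(\mu_i, \sigma^2 / n_i)$. Since the observations are independent across populations, the sample means $\bar{Y}_1, \ldots, \bar{Y}_I$ are independent of one another.

Under $H_0: \mu_1 = \mu_2 = \cdots = \mu_I = \mu$, say, the distribution of $\bar{Y}_i$ is $N(\mu, \sigma^2 / n_i)$. Then under $H_0$ an estimate for $\mu$, the common or grand mean, is

$\bar{Y} = \frac{1}{n} \sum_{i=1}^{I} n_i \bar{Y}_i.$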
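A small simulation can illustrate the standard result that the mean of $n_i$ independent $N(\mu_i, \sigma^2)$ observations has variance $\sigma^2 / n_i$. The values of `mu`, `sigma`, `n` and the seed below are hypothetical, chosen only for illustration.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=529):
    """Draw `reps` samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


mu, sigma, n = 7.0, 1.0, 8          # hypothetical population and sample size
means = simulate_sample_means(mu, sigma, n, reps=20000)

observed_var = statistics.variance(means)
theoretical_var = sigma ** 2 / n    # sigma^2 / n_i

print(f"variance of simulated means: {observed_var:.4f}")
print(f"sigma^2 / n:                 {theoretical_var:.4f}")
```

The two printed variances should be close, and a histogram of `means` would look approximately normal and centered at `mu`.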

The variability between the sample means

How far away are the sample means from the grand mean? The variability of the sample means is calculated from

$MS(B) = \frac{\sum_{i=1}^{I} n_i (\bar{Y}_i - \bar{Y})^2}{I - 1}.$

MS(B) stands for the between group mean square, sometimes called the mean square for treatments. This variance is calculated by dividing the between group sum of squares, $SS(B) = \sum_{i=1}^{I} n_i (\bar{Y}_i - \bar{Y})^2$, by the between group degrees of freedom, $df(B) = I - 1$.

As MS(B) increases we are less likely / more likely (choose one) to reject $H_0$.

Birthweights example: calculating MS(B)

(Exercise: check this for yourself!)

In total there are $n = \sum_i n_i = 8 + 7 + 7 + 5 = 27$ observations. The grand mean, calculated from the summary statistics, is

$\bar{Y} = \frac{\sum_i \sum_j Y_{ij}}{n} = \frac{\sum_i n_i \bar{Y}_i}{n} = \frac{8 \times 6.013 + 7 \times 6.329 + 7 \times 7.586 + 5 \times 7.240}{27} = \frac{181.709}{27} = 6.73.$

Thus

$SS(B) = \sum_i n_i (\bar{Y}_i - \bar{Y})^2 = 8 (6.013 - 6.73)^2 + 7 (6.329 - 6.73)^2 + 7 (7.586 - 6.73)^2 + 5 (7.240 - 6.73)^2 = 11.667,$

$df(B) = I - 1 = 4 - 1 = 3,$

and

$MS(B) = \frac{SS(B)}{df(B)} = \frac{11.667}{3} = 3.889.$
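The exercise can be checked with a few lines of code. This is a minimal sketch using only the group sizes and rounded sample means from the summary table. The variable names are illustrative, and small differences in the last digit relative to MINITAB are expected because the inputs are rounded.

```python
# (n_i, sample mean) for each group, taken from the summary statistics.
groups = {
    "Heavy": (8, 6.013),
    "Light": (7, 6.329),
    "Non": (7, 7.586),
    "Quit": (5, 7.240),
}

n = sum(n_i for n_i, _ in groups.values())

# Grand mean as the n_i-weighted average of the sample means.
grand_mean = sum(n_i * mean_i for n_i, mean_i in groups.values()) / n

# Between group sum of squares, degrees of freedom and mean square.
ss_between = sum(n_i * (mean_i - grand_mean) ** 2
                 for n_i, mean_i in groups.values())
df_between = len(groups) - 1
ms_between = ss_between / df_between

print(f"n = {n}, grand mean = {grand_mean:.3f}")
print(f"SS(B) = {ss_between:.3f}, df(B) = {df_between}, MS(B) = {ms_between:.3f}")
```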

How large should MS(B) be?

We compare MS(B) to $\sigma^2$, the variance within the samples. We need to estimate $\sigma^2$. For each population $i$, we know that $s_i^2$ is an estimate of $\sigma^2$. Pooling across the populations, an estimate of $\sigma^2$ is

$MS(W) = s_p^2 = \frac{(n_1 - 1) s_1^2 + \cdots + (n_I - 1) s_I^2}{(n_1 - 1) + \cdots + (n_I - 1)} = \frac{\sum_{i=1}^{I} (n_i - 1) s_i^2}{n - I}.$

This is called the within group mean square, or the mean square for the error. This variance is calculated by dividing the within group sum of squares, $SS(W) = \sum_{i=1}^{I} (n_i - 1) s_i^2$, by the within group degrees of freedom, $df(W) = n - I$.

Birthweights example: calculating MS(W)

(Exercise: check this for yourself!)

We have that

$SS(W) = \sum_i (n_i - 1) s_i^2 = 7 \times 0.518 + 6 \times 1.299 + 6 \times 0.925 + 4 \times 0.833 = 20.302,$

$df(W) = n - I = 27 - 4 = 23,$

and

$MS(W) = \frac{SS(W)}{df(W)} = \frac{20.302}{23} = 0.883.$
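The within group calculation can be checked in the same way. This sketch uses the group sizes and rounded sample variances from the summary table; the variable names are illustrative.

```python
import math

# (n_i, sample variance s_i^2) for each group, from the summary statistics.
groups = {
    "Heavy": (8, 0.518),
    "Light": (7, 1.299),
    "Non": (7, 0.925),
    "Quit": (5, 0.833),
}

n = sum(n_i for n_i, _ in groups.values())

# Within group sum of squares pools (n_i - 1) * s_i^2 over the groups.
ss_within = sum((n_i - 1) * var_i for n_i, var_i in groups.values())
df_within = n - len(groups)
ms_within = ss_within / df_within      # pooled variance estimate s_p^2
pooled_sd = math.sqrt(ms_within)       # pooled standard deviation s_p

print(f"SS(W) = {ss_within:.3f}, df(W) = {df_within}, MS(W) = {ms_within:.3f}")
print(f"pooled SD = {pooled_sd:.4f}")
```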

The F test statistic

Under the additive model with normal populations and with $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$ true, the F test statistic,

$F = \frac{MS(B)}{MS(W)},$

follows an F distribution with $df(B) = I - 1$ numerator degrees of freedom and $df(W) = n - I$ denominator degrees of freedom.

We reject $H_0$ for large values of the observed F statistic, $F_{obs}$. The p-value is $P(F \geq F_{obs})$, where $F$ is an F distributed random variable on $I - 1$ and $n - I$ df.

Viewing the F distribution

The F distribution has two separate degrees of freedom. It takes only positive values and is right-skewed. Some of the critical values are tabulated in Table A.4 (pp. 720-727). We can also use MINITAB to calculate the p-value directly.

[Figure: density of the F distribution on 3 and 23 df.]

Finding the p-value for an F-test in MINITAB

Calc > Probability Distributions > F.

- Check Cumulative Probability, and use the default value of 0.0 for Noncentrality parameter.
- Specify the numerator and denominator degrees of freedom in the corresponding boxes.
- Highlight Input Constant and enter the value of the F-statistic, $F_{obs}$.
- Leave Optional Storage blank.

Minitab's output gives $P(F \leq F_{obs})$, but the p-value is

$P(F \geq F_{obs}) = 1 - P(F < F_{obs}) = 1 - P(F \leq F_{obs})$

(since the F distribution is continuous).
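The same "one minus the cumulative probability" step can be sketched outside MINITAB. The code below is not MINITAB's algorithm: it integrates the standard F density numerically with Simpson's rule, using only the Python standard library, and assumes the numerator degrees of freedom exceed 2 so that the density is zero at the origin. In practice a statistics library would normally be used instead (for example, SciPy's `scipy.stats.f.sf`). The value of `f_obs` uses the hand-calculated mean squares for the birthweight data.

```python
import math


def f_density(x, df_num, df_den):
    """Density of the F distribution at x (taken as 0 for x <= 0)."""
    if x <= 0:
        return 0.0
    a, b = df_num / 2, df_den / 2
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_density = (a * math.log(df_num / df_den)
                   + (a - 1) * math.log(x)
                   - (a + b) * math.log1p(df_num * x / df_den)
                   - log_beta)
    return math.exp(log_density)


def f_cdf(x, df_num, df_den, steps=20000):
    """P(F <= x) by Simpson's rule on [0, x]; steps must be even."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_density(0.0, df_num, df_den) + f_density(x, df_num, df_den)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * f_density(k * h, df_num, df_den)
    return total * h / 3


f_obs = 3.889 / 0.883                   # MS(B) / MS(W) for the birthweights
cumulative = f_cdf(f_obs, 3, 23)        # what the Cumulative Probability option reports
p_value = 1 - cumulative                # P(F >= F_obs)

print(f"F_obs = {f_obs:.2f}, P(F <= F_obs) = {cumulative:.4f}, p-value = {p_value:.4f}")
```

The resulting p-value should round to the 0.014 shown in the MINITAB ANOVA output for these data.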

Birthweights example: Performing the ANOVA test

Carrying out the test in MINITAB

Stat > ANOVA > One-Way
Response: Weight in pounds
Factor: Reduced Maternal Status
Click OK

    One-way ANOVA: Birthweight in pounds versus Maternal smoking status

    Source            DF      SS     MS     F      P
    Maternal smoking   3  11.673  3.891  4.41  0.014
    Error             23  20.304  0.883
    Total             26  31.976

    S = 0.9396   R-Sq = 36.50%   R-Sq(adj) = 28.22%

                              Individual 95% CIs For Mean Based on
                              Pooled StDev
    Level  N    Mean   StDev  ---+---------+---------+---------+------
    Heavy  8  6.0125  0.7200  (-------*--------)
    Light  7  6.3286  1.1398     (--------*--------)
    Non    7  7.5857  0.9616                     (--------*--------)
    Quit   5  7.2400  0.9127               (---------*----------)
                              ---+---------+---------+---------+------
                                5.60      6.40      7.20      8.00

    Pooled StDev = 0.9396
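For readers without MINITAB, the sums of squares, mean squares and F statistic in this output can be reproduced from the raw data. The sketch below is a plain-Python version of the calculation, not MINITAB's implementation. It assumes the birthweights are read column by column from the data table, and the function name `one_way_anova` is illustrative.

```python
# Birthweights in pounds, grouped by maternal smoking status.
birthweights = {
    "Heavy": [6.2, 6.8, 5.7, 4.9, 6.2, 7.1, 5.8, 5.4],
    "Light": [5.9, 6.2, 5.8, 4.7, 8.3, 7.2, 6.2],
    "Non": [7.5, 6.2, 6.9, 7.4, 9.2, 8.3, 7.6],
    "Quit": [5.8, 7.3, 8.2, 7.1, 7.8],
}


def one_way_anova(samples):
    """Return the entries of a one-way ANOVA table as a dictionary."""
    all_values = [y for ys in samples.values() for y in ys]
    n, num_groups = len(all_values), len(samples)
    grand_mean = sum(all_values) / n
    group_means = {g: sum(ys) / len(ys) for g, ys in samples.items()}

    # Between groups: group means around the grand mean, weighted by n_i.
    ss_between = sum(len(ys) * (group_means[g] - grand_mean) ** 2
                     for g, ys in samples.items())
    # Within groups: observations around their own group mean.
    ss_within = sum((y - group_means[g]) ** 2
                    for g, ys in samples.items() for y in ys)
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    df_between, df_within = num_groups - 1, n - num_groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between, "df_within": df_within, "df_total": n - 1,
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "ms_between": ms_between, "ms_within": ms_within,
        "f_obs": ms_between / ms_within,
    }


table = one_way_anova(birthweights)
print(f"Between  {table['df_between']:>2}  {table['ss_between']:.3f}  "
      f"{table['ms_between']:.3f}  F = {table['f_obs']:.2f}")
print(f"Within   {table['df_within']:>2}  {table['ss_within']:.3f}  "
      f"{table['ms_within']:.3f}")
print(f"Total    {table['df_total']:>2}  {table['ss_total']:.3f}")
```

Because this works from the unrounded data, the results should match the DF, SS, MS and F columns of the MINITAB table.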

The analysis of variance table

    One-way ANOVA: Birthweight in pounds versus Maternal smoking status

    Source            DF      SS     MS     F      P
    Maternal smoking   3  11.673  3.891  4.41  0.014
    Error             23  20.304  0.883
    Total             26  31.976

An analysis of variance table lists all the sources of variability that we account for in our data:

1. The variability between the groups (treatments): Maternal smoking in our example.
2. The variability within the groups (errors).
3. The total variability.

It is an easy way to lay out the F test of $H_0: \mu_1 = \cdots = \mu_I$ versus $H_a$: there is some difference in the means.

The layout of the analysis of variance table

    Source          d.f.   Sum of Squares  Mean Squares  F Statistic  p-value
    Between groups  I − 1  SS(B)           MS(B)         F_obs        P(F ≥ F_obs)
    Within groups   n − I  SS(W)           MS(W)
    Total           n − 1  SS(T)

More of the MINITAB output

    S = 0.9396   R-Sq = 36.50%   R-Sq(adj) = 28.22%

S is the pooled estimate of the S.D., $s_p = \sqrt{MS(W)}$. For the birthweights, $s_p = \sqrt{0.883} \approx 0.94$ (MINITAB reports 0.9396, computed from the unrounded mean square).

R-Sq, $R^2$, is the percentage of the total variability accounted for by the model:

$R^2 = \frac{SS(B)}{SS(T)} \times 100\%.$

For the birthweights it is

$R^2 = \frac{SS(B)}{SS(T)} \times 100\% = \frac{11.673}{31.976} \times 100\% = 36.50\%.$

For Stat 529, ignore R-Sq(adj).
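Both quantities follow directly from the ANOVA table entries, as this short sketch shows. The sums of squares and degrees of freedom are copied from the MINITAB output; the variable names are illustrative.

```python
import math

# Entries copied from the MINITAB ANOVA table.
ss_between, ss_within, ss_total = 11.673, 20.304, 31.976
df_within = 23

# S is the square root of the within group mean square.
s = math.sqrt(ss_within / df_within)

# R-Sq is the between group share of the total sum of squares, in percent.
r_squared = 100 * ss_between / ss_total

print(f"S = {s:.4f}, R-Sq = {r_squared:.2f}%")
```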

CIs for each population mean, differences of means

                              Individual 95% CIs For Mean Based on
                              Pooled StDev
    Level  N    Mean   StDev  ---+---------+---------+---------+------
    Heavy  8  6.0125  0.7200  (-------*--------)
    Light  7  6.3286  1.1398     (--------*--------)
    Non    7  7.5857  0.9616                     (--------*--------)
    Quit   5  7.2400  0.9127               (---------*----------)
                              ---+---------+---------+---------+------
                                5.60      6.40      7.20      8.00

    Pooled StDev = 0.9396

Using the pooled estimate of the S.D., $s_p$, a $100(1 - \alpha)\%$ CI for $\mu_i$ is given by

$\bar{Y}_i \pm t_{n-I}(1 - \alpha/2) \, \frac{s_p}{\sqrt{n_i}}.$

A $100(1 - \alpha)\%$ CI for $\mu_i - \mu_j$ $(i \neq j)$ can be calculated using

$\bar{Y}_i - \bar{Y}_j \pm t_{n-I}(1 - \alpha/2) \, s_p \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}.$

These intervals do not adjust for making multiple comparisons (we will correct for multiple comparisons later in the course).
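The two interval formulas can be applied to the birthweight data as a worked example. In this sketch the critical value 2.069 for $t_{23}(0.975)$ is an assumption taken from a standard t table rather than from the slides. The group sizes, means and pooled S.D. are copied from the MINITAB output, and the function names are illustrative.

```python
import math

T_CRIT = 2.069        # assumed t_{23}(0.975), read from a standard t table
POOLED_SD = 0.9396    # s_p from the MINITAB output

# (n_i, sample mean) for each group, from the MINITAB output.
groups = {
    "Heavy": (8, 6.0125),
    "Light": (7, 6.3286),
    "Non": (7, 7.5857),
    "Quit": (5, 7.2400),
}


def mean_ci(group):
    """95% CI for one population mean, using the pooled S.D."""
    n_i, mean_i = group
    half_width = T_CRIT * POOLED_SD / math.sqrt(n_i)
    return mean_i - half_width, mean_i + half_width


def difference_ci(group_i, group_j):
    """95% CI for mu_i - mu_j, using the pooled S.D."""
    (n_i, mean_i), (n_j, mean_j) = group_i, group_j
    half_width = T_CRIT * POOLED_SD * math.sqrt(1 / n_i + 1 / n_j)
    difference = mean_i - mean_j
    return difference - half_width, difference + half_width


heavy_ci = mean_ci(groups["Heavy"])
non_minus_heavy_ci = difference_ci(groups["Non"], groups["Heavy"])

print(f"Heavy: ({heavy_ci[0]:.3f}, {heavy_ci[1]:.3f})")
print(f"Non - Heavy: ({non_minus_heavy_ci[0]:.3f}, {non_minus_heavy_ci[1]:.3f})")
```

As with the intervals in the MINITAB output, neither calculation adjusts for multiple comparisons.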