One-way ANOVA (Single-Factor CRD)
STAT:5201 Week 3: Lecture 3
One-way ANOVA
We have already described a completely randomized design (CRD), where treatments are randomly assigned to experimental units (EUs). There is no blocking and no nesting in a CRD. We will now take a closer look at the model for a CRD when there is only one factor.
We will consider two models (or parameterizations) for describing the single-factor CRD here. The first is called the cell means model and the second is called the effects model. We will mostly use the latter.
Notation
Let $Y_{ij}$ be the $j$th response in treatment $i$. We have $i = 1, 2, \ldots, g$ groups and $j = 1, 2, \ldots, n_i$, where the number of observations $n_i$ does not have to be the same for each group.
- Let $\bar{Y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$ be the mean response in the $i$th treatment group (read as "Y-bar $i$", or a cell mean).
- Let $\bar{Y} = \frac{1}{N}\sum_{i=1}^{g}\sum_{j=1}^{n_i} Y_{ij}$ be the grand mean response, or the overall mean.
- $N = \sum_i n_i$ is the total number of observations in the study.
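A minimal sketch of this notation in Python, assuming the data are stored as a dict from the treatment label $i$ to the list of responses $Y_{ij}$ (the storage layout and the function names `cell_means` and `grand_mean` are illustrative, not from the course). The numbers are the circuit response times from the example later in these notes.

```python
# Notation sketch: groups stored as a dict mapping treatment label i -> list of Y_ij.
# Data: circuit response times (ms) from the example later in these notes.
data = {
    1: [9, 12, 10, 8, 15],
    2: [20, 21, 23, 17, 30],
    3: [6, 5, 8, 16, 7],
}

def cell_means(groups):
    """Return {i: Ybar_i}, the mean response in each treatment group."""
    return {i: sum(ys) / len(ys) for i, ys in groups.items()}

def grand_mean(groups):
    """Return (Ybar, N): the mean of all N observations pooled together."""
    total = sum(sum(ys) for ys in groups.values())
    n_total = sum(len(ys) for ys in groups.values())
    return total / n_total, n_total

means = cell_means(data)
ybar, N = grand_mean(data)
print(means, ybar, N)  # -> {1: 10.8, 2: 22.2, 3: 8.4} 13.8 15
```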
One-way ANOVA: Cell means model
Cell Means Model
$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, g, \; j = 1, \ldots, n_i$$
We have one mean parameter $\mu_i$ for each cell, or separate group. This is the same as independent $Y_{ij} \sim N(\mu_i, \sigma^2)$.
The estimates for the mean-structure parameters are simply the cell means, $\hat{\mu}_i = \bar{Y}_i$. The estimate for the noise is
$$\hat{\sigma}^2 = \frac{\sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2}{N - g}, \quad \text{where } N = \sum_i n_i.$$
Positive Characteristics
- We do not need any constraints or restrictions for estimation because we use $g$ parameters to describe $g$ means.
- $\hat{\mu}_i$ is the estimated group mean.
- Estimates are easy and very intuitive.
Negative Characteristic
- The estimated parameters don't directly tell us how far a treatment mean is from the overall mean, nor how far a treatment mean is from another treatment mean (but we can get this information from our $\hat{\mu}_i$ values).
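As a sketch of these two estimators (the function name `cell_means_fit` is hypothetical), applied to the circuit data from the example later in these notes, the pooled variance estimate is $SS_E/(N-g)$:

```python
# Cell means model estimates: mu_hat_i = Ybar_i and sigma2_hat = SSE / (N - g).
def cell_means_fit(groups):
    """Return (mu_hat, sigma2_hat) for a list of groups (each a list of Y_ij)."""
    mu_hat = [sum(ys) / len(ys) for ys in groups]
    # Pooled squared deviations of each observation from its own cell mean.
    sse = sum((y - m) ** 2 for ys, m in zip(groups, mu_hat) for y in ys)
    n_total = sum(len(ys) for ys in groups)
    g = len(groups)
    return mu_hat, sse / (n_total - g)

circuit = [[9, 12, 10, 8, 15], [20, 21, 23, 17, 30], [6, 5, 8, 16, 7]]
mu_hat, sigma2_hat = cell_means_fit(circuit)
print(mu_hat, round(sigma2_hat, 4))  # -> [10.8, 22.2, 8.4] 16.9
```

The value 16.9 matches $MS_E$ in the SAS output for this example, as it should: $\hat{\sigma}^2$ is the error mean square.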
The design matrix $X$ is of full rank for the cell means model. Again, this makes it very easy to work with and intuitive.
Example (1-way ANOVA with g = 3 and n = 2)
Suppose we have a one-way ANOVA framework with $g = 3$ and $n = 2$ for each group. In the cell means model, the design matrix has 6 rows and 3 columns (one column per $\mu_i$) and is of rank 3:
$$X = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \text{columns: } \mu_1, \mu_2, \mu_3.$$
Letting $Y = X\mu + \epsilon$, OLS gives
$$\hat{\mu} = \begin{pmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \hat{\mu}_3 \end{pmatrix} = (X^\top X)^{-1} X^\top Y = \begin{pmatrix} \bar{Y}_1 \\ \bar{Y}_2 \\ \bar{Y}_3 \end{pmatrix},$$
because $X^\top X = \operatorname{diag}(2, 2, 2)$ and $X^\top Y$ holds the three group totals.
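A small sketch that builds this $X$ and solves the normal equations $X^\top X \hat{\mu} = X^\top Y$ directly. The six response values are hypothetical, chosen only so the group means come out as whole numbers, and the helper names (`transpose`, `matmul`, `solve`) are illustrative.

```python
# Cell means design matrix for g = 3 groups with n = 2 each (rows ordered by group).
# The response values are hypothetical, chosen only to illustrate the algebra.
g, n = 3, 2
X = [[1 if col == row // n else 0 for col in range(g)] for row in range(g * n)]
Y = [4.0, 6.0, 7.0, 9.0, 1.0, 3.0]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(m):
        p = max(range(k, m), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(m):
            if r != k:
                f = M[r][k] / M[k][k]
                M[r] = [x - f * y for x, y in zip(M[r], M[k])]
    return [M[r][m] / M[r][r] for r in range(m)]

Xt = transpose(X)
XtX = matmul(Xt, X)                                        # diag(n_1, n_2, n_3)
XtY = [sum(x * y for x, y in zip(row, Y)) for row in Xt]   # group totals
mu_hat = solve(XtX, XtY)                                   # (X'X)^{-1} X'Y
print(XtX, mu_hat)  # -> [[2, 0, 0], [0, 2, 0], [0, 0, 2]] [5.0, 8.0, 2.0]
```

The solution is exactly the three cell means (5, 8, 2) of the hypothetical data.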
One-way ANOVA: Effects model
Effects Model
$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, g, \; j = 1, \ldots, n_i$$
In this model, we use $g + 1$ parameters to describe $g$ means. This is an overparameterization.
- One parameter $\mu$ for the overall mean.
- One parameter $\alpha_i$ for each group.
This is the same as $Y_{ij} \sim N(\mu + \alpha_i, \sigma^2)$.
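One way to see the overparameterization: in the effects model the intercept column of $X$ equals the sum of the $g$ indicator columns, so the $6 \times 4$ design matrix for $g = 3$, $n = 2$ has rank 3, not 4. A sketch using exact rational arithmetic (the `rank` helper is illustrative):

```python
from fractions import Fraction

# Effects-model design matrix for g = 3 groups, n = 2 each:
# an intercept column (for mu) followed by one indicator column per alpha_i.
g, n = 3, 2
X = [[1] + [1 if col == row // n else 0 for col in range(g)] for row in range(g * n)]

def rank(A):
    """Matrix rank via exact row reduction with Fractions (no rounding error)."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

print(len(X[0]), rank(X))  # -> 4 3
```

Because $X^\top X$ is then singular, $(X^\top X)^{-1}$ does not exist, which is why a constraint is needed before the parameters can be estimated.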
$\mu + \alpha_i$ represents the mean of group $i$. Because this is an overparameterization, we need a constraint to make the parameters identifiable (i.e., uniquely determined). One option is the sum-to-zero constraint, which provides an intuitive interpretation of the parameters. For balanced data, this is $\sum_{i=1}^{g} \hat{\alpha}_i = 0$, and we use the estimates
$$\hat{\mu} = \bar{Y}, \qquad \hat{\alpha}_i = \bar{Y}_i - \bar{Y}, \qquad \hat{\sigma}^2 = \frac{\sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2}{N - g}.$$
Here, $\hat{\mu}$ represents the overall mean and $\hat{\alpha}_i$ represents the distance that group $i$ is from the overall mean. Some $\hat{\alpha}_i$ values will be positive and some will be negative.
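A short sketch of the sum-to-zero estimates on the (balanced) circuit data from the example later in these notes; the function name `sum_to_zero_fit` is hypothetical:

```python
# Sum-to-zero (balanced) effects-model estimates:
#   mu_hat = grand mean, alpha_hat_i = Ybar_i - Ybar, so sum(alpha_hat) = 0.
circuit = [[9, 12, 10, 8, 15], [20, 21, 23, 17, 30], [6, 5, 8, 16, 7]]

def sum_to_zero_fit(groups):
    means = [sum(ys) / len(ys) for ys in groups]
    n_total = sum(len(ys) for ys in groups)
    mu_hat = sum(sum(ys) for ys in groups) / n_total
    alpha_hat = [m - mu_hat for m in means]
    return mu_hat, alpha_hat

mu_hat, alpha_hat = sum_to_zero_fit(circuit)
print(mu_hat, [round(a, 6) for a in alpha_hat])  # -> 13.8 [-3.0, 8.4, -5.4]
# sum(alpha_hat) is 0 up to floating-point rounding.
```

Type 2 sits 8.4 ms above the overall mean, while types 1 and 3 sit below it.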
Positive Characteristics
- Under the sum-to-zero constraint, the estimated parameters directly tell us how far a treatment mean is from the overall mean. The effects are just deviations from the grand mean.
Negative Characteristic
- We need a constraint or restriction on the parameters for estimation due to overparameterization.
* SAS and R do not use sum-to-zero constraints by default, but we will use these when calculating estimates by hand (it's easiest).
* SAS uses a constraint that sets the last $\hat{\alpha}_g = 0$. By default, R sets the first $\hat{\alpha}_1 = 0$. Under these constraints, $\hat{\mu}$ no longer represents the overall mean, but the mean of a specific reference group (the last group in SAS, the first group in R).
One-way ANOVA
The choice between these models (cell means model or effects model) or constraints does affect the interpretation of the parameters, but the important estimates are the same under any of these choices:
* Fitted $\hat{Y}$ values
* Differences between groups, $\hat{\mu}_i - \hat{\mu}_j$
* Residuals $\hat{\epsilon}_{ij}$
In a one-way ANOVA, we envision distinct group means, with normally distributed errors around those means, and we are interested in comparing the group means. This picture is the same regardless of the model and constraint choices above.
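A sketch comparing three effects-model parameterizations of the circuit data: sum-to-zero, first level set to zero (R's default treatment coding), and last level set to zero (the SAS constraint described above). The dictionary labels are illustrative. The $(\hat{\mu}, \hat{\alpha})$ pairs differ, but the fitted values, group differences, and residuals (summarized by $SS_E$) agree.

```python
# Three parameterizations of the same one-way fit, using the circuit data:
#   sum-to-zero, R-style (first alpha = 0), and SAS-style (last alpha = 0).
circuit = [[9, 12, 10, 8, 15], [20, 21, 23, 17, 30], [6, 5, 8, 16, 7]]
means = [sum(ys) / len(ys) for ys in circuit]
grand = sum(sum(ys) for ys in circuit) / sum(len(ys) for ys in circuit)

fits = {
    "sum-to-zero": (grand, [m - grand for m in means]),
    "first = 0 (R default)": (means[0], [m - means[0] for m in means]),
    "last = 0 (SAS default)": (means[-1], [m - means[-1] for m in means]),
}

for name, (mu, alpha) in fits.items():
    fitted = [mu + a for a in alpha]                  # Yhat for each group
    diff_12 = alpha[0] - alpha[1]                     # mu_1 - mu_2: mu cancels
    resid = [y - fitted[i] for i, ys in enumerate(circuit) for y in ys]
    print(f"{name:24s} mu={mu:6.2f} alpha={[round(a, 2) for a in alpha]} "
          f"fitted={[round(f, 2) for f in fitted]} diff_12={diff_12:.2f} "
          f"SSE={sum(r * r for r in resid):.2f}")
```

Only the first two printed columns change from row to row; everything a comparison of groups depends on stays fixed.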
Unbalanced data in One-way ANOVA
If you have unbalanced data ($n_l \neq n_k$ for some $l, k$) and you are using the effects model, then the grand mean
$$\hat{\mu} = \bar{Y} = \frac{\sum_i \sum_j Y_{ij}}{N} = \frac{\sum_i \sum_j Y_{ij}}{\sum_i n_i} = \frac{\sum_i n_i \hat{\mu}_i}{N}$$
is a weighted average of the group means, with weights $n_i / N$, so $\hat{\mu}$ will be pulled toward the means of the larger groups. The sum-to-zero constraint becomes
$$\sum_{i=1}^{g} n_i \hat{\alpha}_i = 0.$$
Estimates of the effects are given by the same formula, as deviations from the grand mean: $\hat{\alpha}_i = \hat{\mu}_i - \hat{\mu}$.
But most of the time we will have balanced data, so I will usually state the constraint on the board as $\sum_{i=1}^{g} \hat{\alpha}_i = 0$.
NOTE: if $n_i = n_j$ for all $i, j$, then $\sum_{i=1}^{g} n_i \hat{\alpha}_i = 0 \iff \sum_{i=1}^{g} \hat{\alpha}_i = 0$.
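A small illustration with hypothetical unbalanced data ($n_1 = 2$, $n_2 = 4$; values invented for the example). The grand mean equals the $n_i$-weighted average of the group means and is pulled toward the larger group, and the weighted constraint holds while the unweighted sum of effects does not vanish.

```python
# Hypothetical unbalanced data (illustration only): n_1 = 2, n_2 = 4.
groups = [[1, 3], [5, 7, 9, 11]]
n = [len(ys) for ys in groups]
means = [sum(ys) / len(ys) for ys in groups]                # [2.0, 8.0]
N = sum(n)

grand = sum(sum(ys) for ys in groups) / N                   # pooled mean of all Y_ij
weighted = sum(ni * m for ni, m in zip(n, means)) / N       # sum n_i mu_hat_i / N
unweighted = sum(means) / len(means)                        # plain average of means

alpha_hat = [m - grand for m in means]                      # deviations from grand mean
print(grand, weighted, unweighted)                          # -> 6.0 6.0 5.0
print(alpha_hat)                                            # -> [-4.0, 2.0]
print(sum(ni * a for ni, a in zip(n, alpha_hat)), sum(alpha_hat))  # -> 0.0 -2.0
```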
One-way ANOVA: Sums of Squares
ANOVA: the partitioning of the sums of squares is called Analysis of Variance, or ANOVA. In an ANOVA, we break down the total variability in the data into component parts, i.e., into the differing sources of variation.
Consider the one-factor experiment:
$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, g, \; j = 1, \ldots, n_i$$
We analyze such data as a 1-way ANOVA with only one factor, and the hypothesis test of interest is
$$H_0: \mu_1 = \cdots = \mu_g \quad \text{vs.} \quad H_1: \text{at least one group mean differs.}$$
If we reject this null hypothesis, we usually do follow-up comparisons to see which group means differ significantly from each other.
Total Variation: $SS_{TOT} = \sum_i \sum_j (Y_{ij} - \bar{Y})^2$
This is the total sum of squares (corrected for the mean).
Variation due to Treatment: $SS_{TRT} = \sum_i n_i (\bar{Y}_i - \bar{Y})^2$
This is the treatment sum of squares. It quantifies how far the group means are from the overall mean.
Unexplained Variation: $SS_E = \sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2$
This is the sum of squares for error. It quantifies how far the individual observations are from their group mean.
We know $SS_{TOT} = SS_{TRT} + SS_E$ (the fundamental ANOVA identity).
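A quick numerical check of the fundamental identity. The simulated data (seeded, with unequal group sizes and arbitrary means) are an assumption made only for illustration: the identity is algebraic, so it should hold for any data set, balanced or not.

```python
import random

def sums_of_squares(groups):
    """Return (SS_TOT, SS_TRT, SS_E) for a list of groups of responses."""
    all_y = [y for ys in groups for y in ys]
    grand = sum(all_y) / len(all_y)
    means = [sum(ys) / len(ys) for ys in groups]
    ss_tot = sum((y - grand) ** 2 for y in all_y)
    ss_trt = sum(len(ys) * (m - grand) ** 2 for ys, m in zip(groups, means))
    ss_e = sum((y - m) ** 2 for ys, m in zip(groups, means) for y in ys)
    return ss_tot, ss_trt, ss_e

# Arbitrary simulated data with unequal group sizes: (group mean, group size) pairs.
random.seed(1)
groups = [[random.gauss(mu, 2.0) for _ in range(n)] for mu, n in [(5, 3), (8, 6), (4, 4)]]
ss_tot, ss_trt, ss_e = sums_of_squares(groups)
print(abs(ss_tot - (ss_trt + ss_e)) < 1e-9)  # -> True
```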
At a minimum, an ANOVA table will list the sources of variation in the experiment and their degrees of freedom. We usually also include the sums of squares ($SS_x$) and the related mean squares ($MS_x$). Here is a general layout for a 1-way ANOVA:
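A sketch of the usual one-way layout, assuming the standard columns (source, d.f., SS, MS, F); the d.f. entries are the ones justified under "Correcting for the mean", and SAS labels the treatment row "Model":

```latex
\begin{array}{l c c c c}
\text{Source} & \text{d.f.} & SS & MS & F \\ \hline
\text{Treatment} & g - 1 & SS_{TRT} & MS_{TRT} = SS_{TRT}/(g-1) & F_0 = MS_{TRT}/MS_E \\
\text{Error} & N - g & SS_E & MS_E = SS_E/(N-g) & \\ \hline
\text{Total (corrected)} & N - 1 & SS_{TOT} & &
\end{array}
```

Both the d.f. column and the SS column add up: $(g-1) + (N-g) = N-1$ and $SS_{TRT} + SS_E = SS_{TOT}$.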
One-way ANOVA: Correcting for the mean
We almost always estimate an overall mean in our model, so we lose 1 degree of freedom (d.f.) right away. Thus, we essentially start with $N - 1$ d.f., and say the total sum of squares is corrected for the mean.
Once we have our overall mean estimated ($\hat{\mu}$), we only need $g - 1$ more parameters to describe the mean structure (i.e., to describe the $g$ cell means). Thus, we use $g - 1$ d.f. for Treatment.
The leftover $N - g$ d.f. are used to estimate the error.
One-way ANOVA: Example
Example (Response time for circuit types)
Three different types of circuit are investigated for response time in milliseconds. Fifteen observations are collected in a balanced CRD (five per type) with the single factor Type (1, 2, 3).

    Circuit Type    Response Time (ms)
    1                9   12   10    8   15
    2               20   21   23   17   30
    3                6    5    8   16    7

From D.C. Montgomery (2005). Design and Analysis of Experiments. Wiley: USA.
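A minimal sketch that computes the one-way ANOVA table for these data from the definitions, in pure Python (the function name `one_way_anova` is hypothetical; no statistics library is assumed). The printed values should agree with the SAS output for this example: SS of 543.6 and 202.8, MS of 271.8 and 16.9, and $F_0 = 16.08$.

```python
# One-way ANOVA table for the circuit data, computed from the definitions
# (pure Python; no statistics library assumed).
circuit = {1: [9, 12, 10, 8, 15], 2: [20, 21, 23, 17, 30], 3: [6, 5, 8, 16, 7]}

def one_way_anova(groups):
    """Return d.f., SS, MS and the F ratio for a one-way layout."""
    all_y = [y for ys in groups.values() for y in ys]
    N, g = len(all_y), len(groups)
    grand = sum(all_y) / N
    means = {i: sum(ys) / len(ys) for i, ys in groups.items()}
    ss_trt = sum(len(ys) * (means[i] - grand) ** 2 for i, ys in groups.items())
    ss_e = sum((y - means[i]) ** 2 for i, ys in groups.items() for y in ys)
    df_trt, df_e = g - 1, N - g
    ms_trt, ms_e = ss_trt / df_trt, ss_e / df_e
    return {
        "df": (df_trt, df_e, N - 1),
        "SS": (ss_trt, ss_e, ss_trt + ss_e),   # total via the ANOVA identity
        "MS": (ms_trt, ms_e),
        "F": ms_trt / ms_e,
    }

table = one_way_anova(circuit)
print(f"{'Source':<10}{'DF':>4}{'SS':>10}{'MS':>10}{'F':>8}")
print(f"{'Treatment':<10}{table['df'][0]:>4}{table['SS'][0]:>10.1f}{table['MS'][0]:>10.1f}{table['F']:>8.2f}")
print(f"{'Error':<10}{table['df'][1]:>4}{table['SS'][1]:>10.1f}{table['MS'][1]:>10.1f}")
print(f"{'Total':<10}{table['df'][2]:>4}{table['SS'][2]:>10.1f}")
```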
See handout for annotated output.
One-way ANOVA: $MS_{TRT}$ and $MS_E$
Example (Response time for circuit types)

    Class Level Information
    Class     Levels    Values
    type           3    1 2 3

    Number of Observations Read    15
    Number of Observations Used    15

    Dependent Variable: time

                                  Sum of
    Source             DF        Squares    Mean Square    F Value    Pr > F
    Model               2    543.6000000    271.8000000      16.08    0.0004
    Error              12    202.8000000     16.9000000
    Corrected Total    14    746.4000000

Why is the ANOVA table useful? The MS values will be used to perform statistical tests. We need to know what we EXPECT to get from $MS_{TRT}$ and $MS_E$:
$$E(MS_{TRT}) = \sigma^2 + \frac{\sum_{i=1}^{g} n_i \alpha_i^2}{g - 1}, \qquad E(MS_E) = \sigma^2.$$
- If $H_0: \mu_1 = \mu_2 = \mu_3$ (equivalently, $\alpha_i = 0$ for all $i$) is true, then $E(MS_{TRT}) = \sigma^2$, and $MS_{TRT}$ and $MS_E$ should be similar.
- If $H_A$: $\alpha_i \neq 0$ for at least one $i$ is true, then $E(MS_{TRT}) > \sigma^2 = E(MS_E)$, so $MS_{TRT}$ will tend to be larger than $MS_E$.
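A Monte Carlo sketch of these expectations. All settings are hypothetical, chosen only for illustration: $g = 3$, $n_i = 5$, $\mu = 10$, $\sigma = 1$, and effects $\alpha = (-1, 2, -1)$, which satisfy $\sum_i n_i \alpha_i = 0$. The formula then gives $E(MS_{TRT}) = 1 + 5(1 + 4 + 1)/2 = 16$, while under $H_0$ both mean squares should average near $\sigma^2 = 1$.

```python
import random

# Monte Carlo check of the expected mean squares. Hypothetical settings chosen
# only for illustration: g = 3, n_i = 5, mu = 10, sigma = 1, alpha = (-1, 2, -1).
def mean_squares(groups):
    """Return (MS_TRT, MS_E) for a list of groups of responses."""
    all_y = [y for ys in groups for y in ys]
    N, g = len(all_y), len(groups)
    grand = sum(all_y) / N
    means = [sum(ys) / len(ys) for ys in groups]
    ss_trt = sum(len(ys) * (m - grand) ** 2 for ys, m in zip(groups, means))
    ss_e = sum((y - m) ** 2 for ys, m in zip(groups, means) for y in ys)
    return ss_trt / (g - 1), ss_e / (N - g)

def simulate(alpha, n=5, mu=10.0, sigma=1.0, reps=10000, seed=2024):
    """Average MS_TRT and MS_E over many data sets drawn from the effects model."""
    rng = random.Random(seed)
    tot_trt = tot_e = 0.0
    for _ in range(reps):
        groups = [[mu + a + rng.gauss(0.0, sigma) for _ in range(n)] for a in alpha]
        ms_trt, ms_e = mean_squares(groups)
        tot_trt += ms_trt
        tot_e += ms_e
    return tot_trt / reps, tot_e / reps

alpha, n, sigma = [-1.0, 2.0, -1.0], 5, 1.0
theory_trt = sigma**2 + sum(n * a**2 for a in alpha) / (len(alpha) - 1)  # = 16.0
ha = simulate(alpha, n=n, sigma=sigma)             # H_A true: near (16, 1)
h0 = simulate([0.0, 0.0, 0.0], n=n, sigma=sigma)   # H_0 true: near (1, 1)
print(theory_trt, ha, h0)
```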
We base our statistical test on the ratio $F_0 = MS_{TRT} / MS_E$.
- Under $H_0$ true, $F_0 \sim F_{(g-1, N-g)}$, and we expect a value near 1 for $F_0$ (in general).
- Under $H_A$ true, $F_0$ has a stochastically greater distribution than $F_{(g-1, N-g)}$, and we reject the null at the 5% level if $F_0 > F_{(g-1, N-g, 0.95)}$, the 95th percentile of $F_{(g-1, N-g)}$.
Example (Response time for circuit types)
For the circuit data, $F_0 = 271.8 / 16.9 = 16.08$, which is compared to $F_{(2,12)}$ to get the p-value. The p-value is 0.0004.
This is only valid if the model assumptions are met (we'll return to checking the assumptions for this model soon).
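Because this example has $g = 3$, the numerator d.f. is 2, and the $F_{(2, d_2)}$ distribution has a closed-form upper tail, $P(F_{(2,d_2)} > f) = (1 + 2f/d_2)^{-d_2/2}$. The sketch below uses that special case (valid ONLY for numerator d.f. 2; general d.f. would need the regularized incomplete beta function or a statistics library) to reproduce the p-value and the critical value $F_{(2,12,0.95)}$. The function names are hypothetical.

```python
import math

# Closed-form F tail probability, valid ONLY when the numerator d.f. is 2
# (i.e. g = 3 groups, as in the circuit example):
#   P(F_{2, d2} > f) = (1 + 2 f / d2) ** (-d2 / 2)
def f_pvalue_df1_2(f, d2):
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)

def f_crit_df1_2(level, d2):
    """Invert the tail formula: the f with P(F_{2, d2} > f) = 1 - level."""
    return (d2 / 2.0) * ((1.0 - level) ** (-2.0 / d2) - 1.0)

f0 = 271.8 / 16.9                       # MS_TRT / MS_E from the circuit ANOVA
p = f_pvalue_df1_2(f0, d2=12)
crit = f_crit_df1_2(0.95, d2=12)
print(f"F0 = {f0:.2f}, p-value = {p:.4f}, F(2,12,0.95) = {crit:.3f}")
# -> F0 = 16.08, p-value = 0.0004, F(2,12,0.95) = 3.885
```

Since $16.08 > 3.885$, we reject $H_0$ at the 5% level, consistent with the SAS p-value.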
One-way ANOVA: Full vs. Reduced Models
The overall $F$-test in a 1-way ANOVA is actually a test comparing a full model with a reduced model that is nested in the full model.
- A reduced model is nested in a full model if it is a special case of the full model.
NOTE: we will use the design term "nested" in another way later, so be aware of this.
The ANOVA table in the 1-way ANOVA compares a full model (requiring $g$ parameters to describe the mean structure) with a reduced model (requiring only 1 parameter to describe the mean structure, i.e., the special case $\alpha_1 = \cdots = \alpha_g = 0$). Thus, we can think of the $F$-test as comparing a full and a reduced model.
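A sketch of the full-vs-reduced comparison on the circuit data, using the general nested-model statistic $F = \frac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F/df_F}$. For the one-way ANOVA, $SSE_R = SS_{TOT}$ with $N - 1$ d.f. and $SSE_F = SS_E$ with $N - g$ d.f., so the numerator reduces to $SS_{TRT}/(g-1)$ and the statistic equals $F_0$.

```python
# Full-vs-reduced comparison for the circuit data.
# Full model: g = 3 cell means (SSE = within-group SS, d.f. = N - g).
# Reduced model: a single common mean (SSE = SS_TOT, d.f. = N - 1).
circuit = [[9, 12, 10, 8, 15], [20, 21, 23, 17, 30], [6, 5, 8, 16, 7]]
all_y = [y for ys in circuit for y in ys]
N, g = len(all_y), len(circuit)

grand = sum(all_y) / N
sse_reduced = sum((y - grand) ** 2 for y in all_y)
sse_full = sum((y - sum(ys) / len(ys)) ** 2 for ys in circuit for y in ys)
df_reduced, df_full = N - 1, N - g

f_stat = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
print(round(sse_reduced, 1), round(sse_full, 1), round(f_stat, 2))  # -> 746.4 202.8 16.08
```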
SIDENOTE: SAS Settings
1. On the first line of all my SAS code files, I set the following options:

       options linesize = 79 nocenter nodate formchar = "|----|+|---+=|-/\<>*" ;

2. I set my preferences to have SAS output the results in both listing and HTML format. The HTML output is nice because you automatically get HTML graphics, but I've also found the HTML output difficult to deal with at times (like when I'm trying to save pieces of it). Therefore, I also generate all my output as a listing. If you are on the virtual desktop, you can choose this option by going to Tools > Options > Preferences..., clicking the Results tab, checking the box that says Create Listing, and then clicking OK.
This listing output is just text, and you can easily copy and paste pieces of it into LaTeX and use the verbatim environment to present it. If you copy and paste into Word, you might use a monospace font, such as Andale Mono or SAS Monospace. If you save your listing output, it will be saved as a .lst file.