


Analysis of variance
April 16, 2009

Marc Andersen, StatGroup ApS

Contents
- Comparison of several groups
- One-way ANOVA
- Two-way ANOVA
- Interaction
- Model checking

Acknowledgement for use of presentation: Julie Lyng Forman, Dept. of Biostatistics (2008); Lene Theil Skovgaard, Dept. of Biostatistics (2007, 2006)

Comparison of 2 or more groups

    number of groups   different individuals          same individual
    2                  unpaired t-test                paired t-test
    >2                 one-way analysis of variance   two-way analysis of variance

One-way analysis of variance:
- Do the distributions differ between the groups?
- Do the levels differ between the groups?

Example: 22 bypass patients, randomized to 3 different kinds of ventilation during anaesthesia

    Group I     50% N2O, 50% O2 for 24 hours
    Group II    50% N2O, 50% O2 during operation
    Group III   30-50% O2 (no N2O) for 24 hours

            Gr.I   Gr.II   Gr.III
    n
    Mean
    SD

One-way ANOVA
- one-way: because we only have one criterion for classification of the observations, here ventilation method
- ANalysis Of VAriance: because we compare the variance between groups with the variance within groups

Model:
$$Y_{ij} = \mu_i + \varepsilon_{ij}$$
where $Y_{ij}$ is the $j$th observation in group no. $i$, $\mu_i$ is the mean of group no. $i$, and $\varepsilon_{ij}$ is the individual deviation.

Observations are assumed to be independent and to follow a normal distribution (within each group) with the same variance:
$$\varepsilon_{ij} \sim N(0, \sigma^2) \quad \text{or equivalently} \quad Y_{ij} \sim N(\mu_i, \sigma^2)$$

Model assumptions must be checked!

Hypothesis testing
Usual approach:
- Null hypothesis: group means are equal, $H_0: \mu_i = \mu$ for all $i$
- Alternative hypothesis: group means are not equal
We show that the means are not equal by rejecting the null hypothesis of equality (ref DGA, 8.5 Hypothesis Testing)

ANOVA math: Sums of squares
Decomposition of the deviation from the grand mean:
$$y_{ij} - \bar{y}_{\cdot} = (y_{ij} - \bar{y}_{i}) + (\bar{y}_{i} - \bar{y}_{\cdot})$$
where $y_{ij}$ is the $j$th observation in the $i$th group, $\bar{y}_{i}$ is the average in the $i$th group, and $\bar{y}_{\cdot}$ is the total average.

Decomposition of variation (sums of squares):
$$\underbrace{\sum_{i,j} (y_{ij} - \bar{y}_{\cdot})^2}_{\text{total variation}} = \underbrace{\sum_{i,j} (y_{ij} - \bar{y}_{i})^2}_{\text{within groups}} + \underbrace{\sum_{i,j} (\bar{y}_{i} - \bar{y}_{\cdot})^2}_{\text{between groups}}$$

Decomposition of variation: total = between + within
$$SS_{total} = SS_{between} + SS_{within}$$
with degrees of freedom
$$(n - 1) = (k - 1) + (n - k)$$

F-test statistic:
$$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/(k-1)}{SS_{within}/(n-k)}$$

Reject the null hypothesis if $F$ is large, i.e. if the variation between groups is too large compared to the variation within groups.

Usually the analysis is summarized in an analysis of variance table:

    Variation   df    SS       MS           F           P
    Between     k-1   SS_b     SS_b/df_b    MS_b/MS_w   P(F(df_b, df_w) > F_obs)
    Within      n-k   SS_w     SS_w/df_w
    Total       n-1   SS_tot
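The decomposition and the F statistic can be checked by hand on a small example. The following is a minimal sketch in Python (not the SAS used in these slides); the function name `one_way_anova` and the toy numbers are assumptions made for illustration and are not the anaesthesia data.

```python
from statistics import mean

def one_way_anova(groups):
    """Sums of squares and F statistic for a list of groups (lists of numbers)."""
    n = sum(len(g) for g in groups)              # total number of observations
    k = len(groups)                              # number of groups
    grand = mean([x for g in groups for x in g]) # total average
    # Between groups: each group mean versus the total average
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within groups: each observation versus its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total: each observation versus the total average
    ss_total = sum((x - grand) ** 2 for g in groups for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return {"ss_between": ss_between, "ss_within": ss_within,
            "ss_total": ss_total, "df": (k - 1, n - k), "F": f}

# Hypothetical toy data, NOT the measurements from the example
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
result = one_way_anova(toy)
print(result)
```

For these toy numbers the between and within sums of squares add up to the total, which is the identity the ANOVA table is built on. The P-value would additionally require the F(k-1, n-k) distribution, which this sketch leaves out.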

Analysis of variance table - Anaesthesia example

               df   SS   MS   F   P
    Between
    Within
    Total

$F = 3.71 \sim F(2, 19)$, $P = 0.04$
Weak evidence of non-equality of the three means

Analysis of variance in SAS
To define the anaesthesia data in SAS, we write

    data ex_redcell;
    input grp redcell;
    cards;
    ;
    run;

The variable redcell contains all the measurements of the outcome and grp contains the method of ventilation for each individual.

Analysis of variance program:

    proc glm data=ex_redcell;
    class grp;
    model redcell=grp / solution;
    run;

The option solution outputs parameter estimates.

    General Linear Models Procedure
    Dependent Variable: REDCELL
                              Sum of      Mean
    Source             DF     Squares     Square     F Value   Pr > F
    Model
    Error
    Corrected Total

    R-Square   C.V.   Root MSE   REDCELL Mean

    Source   DF   Type I SS     Mean Square   F Value   Pr > F
    GRP

    Source   DF   Type III SS   Mean Square   F Value   Pr > F
    GRP

                              T for H0:                  Std Error of
    Parameter     Estimate    Parameter=0    Pr > |T|    Estimate
    INTERCEPT     B
    GRP           B
                  B
                  B
    ...

    NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter B are biased, and are not unique estimators of the parameters.

- Group 3 (the last group) is the reference group
- The estimates for the other groups are differences from this reference group

Some issues:
- Clinical significance
- Statistical significance
- Provide confidence interval
- Does it make sense? Interpreting the estimates

Multiple comparisons
The F-test shows that there is a difference, but where?
Pairwise t-tests are not suitable due to the risk of mass significance.
- Recall that a significance level of $\alpha = 0.05$ means a 5% chance of wrongly rejecting a true hypothesis (type I error)
- The chance of at least one type I error goes up with the number of tests: for $k$ groups we have $m = k(k-1)/2$ possible pairwise tests, and the actual significance level can be as bad as $1 - (1 - \alpha)^m$, e.g. for $k = 5$: 0.40

There is no completely satisfactory solution. Approximate solutions:
1. Select a (small) number of relevant comparisons in the planning stage.
2. Make a graph of the average ± 2 SEM and judge visually (!), perhaps supplemented with F-tests on subsets of groups.
3. Modify the t-tests by multiplying the P-values by the number of tests, the so-called Bonferroni correction (conservative).
4. Use a correction for multiple testing (Dunnett, Tukey) or a (prespecified) multiple testing procedure.

Tukey multiple comparisons in SAS:

    proc glm data=ex_redcell;
    class grp;
    model redcell=grp / solution;
    lsmeans grp / adjust=tukey pdiff cl;
    run;

    The GLM Procedure
    Least Squares Means
    Adjustment for Multiple Comparisons: Tukey-Kramer

    Least Squares Means for effect grp
    Pr > |t| for H0: LSMean(i)=LSMean(j)
    Dependent Variable: redcell
    i/j

    Least Squares Means for Effect grp
                Difference    Simultaneous 95%
                Between       Confidence Limits for
    i    j      Means         LSMean(i)-LSMean(j)
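The arithmetic behind the mass-significance warning and the Bonferroni correction is short enough to write out. This is a minimal Python sketch, not part of the original SAS material; the function names and the three P-values are hypothetical, and the bound $1 - (1-\alpha)^m$ is the one quoted above (it treats the tests as independent).

```python
def n_pairwise_tests(k):
    """Number of possible pairwise comparisons among k groups."""
    return k * (k - 1) // 2

def familywise_error(alpha, m):
    """Chance of at least one type I error in m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

m = n_pairwise_tests(5)            # 5 groups give 10 pairwise tests
fwe = familywise_error(0.05, m)    # about 0.40, as stated in the slide
print(m, round(fwe, 2))

# Hypothetical unadjusted P-values from three pairwise t-tests
adjusted = bonferroni([0.01, 0.04, 0.30])
print(adjusted)
```

Note how a comparison with an unadjusted P-value of 0.04 is no longer below 0.05 after correction, which is why the method is described as conservative.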

Visual assessment: the bars represent confidence intervals for the means.

    proc gplot data=ex_redcell;
    plot redcell*grp / haxis=axis1 vaxis=axis2 frame;
    axis1 order=(1 to 3 by 1) offset=(8,8) label=(h=3 'group no.') value=(h=2) minor=none;
    axis2 offset=(1,1) value=(h=2) minor=none label=(a=90 r=0 h=3 'red cell folate');
    symbol1 v=circle i=std2mjt l=1 h=2 w=2;
    run;

Model checking
Check if the assumptions are reasonable (if not, the analysis is unreliable!):
- Variance homogeneity may be checked by performing Levene's test (or Bartlett's test). In case of variance inhomogeneity, we may also perform a weighted analysis (Welch's test), just as in the t-test.
- Normality may be checked through probability plots (or histograms) of residuals, or by a numerical test on the residuals. In case of non-normality, we may use the nonparametric Kruskal-Wallis test.
- Transformation (often logarithms) may help to achieve variance homogeneity as well as normality.

Check of variance homogeneity and normality in SAS

    proc glm data=ex_redcell;
    class grp;
    model redcell=grp;
    means grp / hovtest=levene welch;
    output out=model p=predicted r=residual;
    run;

The output statement stores the residuals in a dataset for further model checking:

    proc univariate normal data=model;
    var residual;
    run;

Output from proc glm: Test for variance homogeneity

    Levene's Test for Homogeneity of redcell Variance
    ANOVA of Squared Deviations from Group Means
                      Sum of     Mean
    Source     DF     Squares    Square    F Value   Pr > F
    grp
    Error

and weighted ANOVA in case of variance heterogeneity:

    Welch's ANOVA for redcell
    Source     DF     F Value    Pr > F
    grp
    Error

So we are not too sure concerning the group differences...
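The header "ANOVA of Squared Deviations from Group Means" describes how this version of Levene's test works: each observation is replaced by its squared deviation from its own group mean, and an ordinary one-way ANOVA is run on those values. The following Python sketch illustrates the idea under that reading; the function names and both toy data sets are assumptions, and the P-value (which needs the F distribution) is left out.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic: MS between divided by MS within."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = mean([x for g in groups for x in g])
    ms_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n - k)
    return ms_between / ms_within

def levene_squared(groups):
    """Levene-type F: one-way ANOVA on squared deviations from the group means."""
    z = [[(x - mean(g)) ** 2 for x in g] for g in groups]
    return f_statistic(z)

# Hypothetical toy data: same spread in both groups, different levels
equal_spread = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
# Hypothetical toy data: second group is much more spread out
unequal_spread = [[1.0, 2.0, 3.0], [0.0, 5.0, 10.0]]

f_equal = levene_squared(equal_spread)
f_unequal = levene_squared(unequal_spread)
print(f_equal, f_unequal)
```

A difference in level alone gives an F of zero here, while a difference in spread gives a positive F, so the statistic responds to variance differences rather than mean differences.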

Output from proc univariate: Test for normality

    Tests for Normality
    Test                   --Statistic--    -----p Value-----
    Shapiro-Wilk           W                Pr < W
    Kolmogorov-Smirnov     D                Pr > D
    Cramer-von Mises       W-Sq             Pr > W-Sq
    Anderson-Darling       A-Sq             Pr > A-Sq

The 4 tests focus on different aspects of non-normality.
- For small data sets, we rarely get significance
- For large data sets, we almost always get significance
- Could look at a probability plot instead

Non-parametric ANOVA, the Kruskal-Wallis test:

    proc npar1way wilcoxon;
    exact;
    class grp;
    var redcell;
    run;

    Wilcoxon Scores (Rank Sums) for Variable redcell
    Classified by Variable grp
                  Sum of     Expected     Std Dev      Mean
    grp     N     Scores     Under H0     Under H0     Score

    Kruskal-Wallis Test
    Chi-Square
    DF                             2
    Asymptotic Pr > Chi-Square
    Exact Pr >= Chi-Square

Again, we have lost the significance...

Two-way analysis of variance
Two criteria for subdividing observations, A and B

Data in two-way layout (not for analysis!!):

               B
    A     1    2    ...   c
    1
    2
    ...
    r

- Effect of both factors
- Perhaps even interaction (effect modification)
- One factor may be individuals or experimental units (e.g. different treatments tried on the same person)
- Repeated measurements

Example: Short term effect of enalaprilate on heart rate
(Table of heart rate by subject and time, with subject averages and time averages.)
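The Kruskal-Wallis statistic reported as "Chi-Square" by proc npar1way is computed from the rank sums of the groups. Below is a minimal Python sketch of the textbook formula $H = \frac{12}{n(n+1)} \sum_i R_i^2/n_i - 3(n+1)$; it assigns midranks to ties but, as a simplifying assumption, does not apply the tie correction, and the function names and data are hypothetical.

```python
def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (no tie correction) and its degrees of freedom."""
    pooled = [x for g in groups for x in g]
    ranks = midranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1

# Hypothetical toy data with completely separated groups
h, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h, df)
```

The asymptotic P-value compares H with a chi-square distribution on k-1 degrees of freedom; the exact statement in the SAS program instead enumerates rank permutations, which this sketch does not attempt.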

Line plot ("Spaghettiogram")
[Figure: heart rate over time, one line per subject]
Ideally the time courses are parallel.

Additive model:
$$Y_{st} = \mu + \alpha_s + \beta_t + \varepsilon_{st}$$
The two effects ($s$ and $t$) work in an additive way.
The $\varepsilon_{st}$'s are assumed to be independent, normally distributed with mean 0, and identical variances, $\varepsilon_{st} \sim N(0, \sigma^2)$ (check this!)

Variational decomposition:
$$SS_{total} = SS_{subject} + SS_{time} + SS_{residual}$$

Analysis of variance table - enalaprilate example

                df   SS   MS   F   P
    Subjects
    Times
    Residual
    Total

- Highly significant difference between subjects (not very interesting)
- Significant time differences

Two-way ANOVA in SAS:

    proc glm data=ex_pulse;
    class subject times;
    model hrate=subject times / solution;
    run;

    General Linear Models Procedure
    Class Level Information
    Class      Levels    Values
    SUBJECT
    TIMES
    Number of observations in data set = 36
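With one observation per subject and time, the variational decomposition can be computed directly from the row means, column means and the total average. The sketch below is in Python rather than SAS; the function name and the 2 x 2 toy table are assumptions for illustration, not the heart rate data.

```python
from statistics import mean

def two_way_additive_ss(table):
    """table[s][t]: one observation per subject s (row) and time t (column)."""
    n_rows, n_cols = len(table), len(table[0])
    grand = mean([y for row in table for y in row])
    row_means = [mean(row) for row in table]
    col_means = [mean([row[t] for row in table]) for t in range(n_cols)]
    # Each subject mean counts once per time point, and vice versa
    ss_subject = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_time = n_rows * sum((m - grand) ** 2 for m in col_means)
    # What is left after removing both additive effects
    ss_residual = sum(
        (table[s][t] - row_means[s] - col_means[t] + grand) ** 2
        for s in range(n_rows) for t in range(n_cols)
    )
    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    return ss_subject, ss_time, ss_residual, ss_total

# Hypothetical toy table: 2 subjects (rows) by 2 times (columns)
toy = [[1.0, 2.0], [3.0, 6.0]]
ss_subject, ss_time, ss_residual, ss_total = two_way_additive_ss(toy)
print(ss_subject, ss_time, ss_residual, ss_total)
```

The three components add up to the total only because the layout is complete and balanced; with missing cells the simple mean-based formulas no longer apply.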

Analysis of variance table from output:

    General Linear Models Procedure
    Dependent Variable: HRATE
                              Sum of      Mean
    Source             DF     Squares     Square     F Value   Pr > F
    Model
    Error
    Corrected Total

    R-Square   C.V.   Root MSE   HRATE Mean

    Source     DF   Type I SS     Mean Square   F Value   Pr > F
    SUBJECT
    TIMES

    Source     DF   Type III SS   Mean Square   F Value   Pr > F
    SUBJECT
    TIMES

Parameter estimates from output:

                              T for H0:                  Std Error of
    Parameter     Estimate    Parameter=0    Pr > |T|    Estimate
    INTERCEPT     B
    SUBJECT       B
                  B
                  ...
    TIMES         B
                  B
                  ...

    NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter B are biased, and are not unique estimators of the parameters.

Subject 9 at time 120 minutes is the reference.

Expected values, e.g. for subject=3, times=30:
$$\hat{y}_{st} = \hat{\mu} + \hat{\alpha}_s + \hat{\beta}_t$$

Residuals:
$$r_{st} = \text{observed} - \text{expected} = y_{st} - \hat{y}_{st} \approx \varepsilon_{st}$$
Residual for subject 3, time 30: $r_{32} = 0.84$

Model checking
Look for:
- Differences in variances (systematic?)
- Non-normality
- Lack of additivity (interaction). Can only be tested if there is more than one observation for each combination
- Serial correlation? (Neighboring observations look more alike)
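To see how an intercept for the reference combination and the B-marked differences produce expected values and residuals, the following Python sketch fits the additive model to a complete table with one observation per cell. It uses the last row and last column as reference, mirroring the SAS convention described above; the closed-form estimates rely on the balanced layout, and the function name and the toy table are assumptions, not the enalaprilate data.

```python
from statistics import mean

def fit_additive(table):
    """Reference-coded fit of y[s][t] = mu + alpha[s] + beta[t] (last levels = reference)."""
    n_rows, n_cols = len(table), len(table[0])
    grand = mean([y for row in table for y in row])
    row_means = [mean(row) for row in table]
    col_means = [mean([row[t] for row in table]) for t in range(n_cols)]
    # Intercept: expected value for the reference subject at the reference time
    mu = row_means[-1] + col_means[-1] - grand
    # Other estimates are differences from the reference level
    alpha = [m - row_means[-1] for m in row_means]
    beta = [m - col_means[-1] for m in col_means]
    fitted = [[mu + alpha[s] + beta[t] for t in range(n_cols)]
              for s in range(n_rows)]
    residuals = [[table[s][t] - fitted[s][t] for t in range(n_cols)]
                 for s in range(n_rows)]
    return mu, alpha, beta, fitted, residuals

# Hypothetical toy table: 2 subjects (rows) by 2 times (columns)
toy = [[1.0, 2.0], [3.0, 6.0]]
mu, alpha, beta, fitted, residuals = fit_additive(toy)
print(mu, alpha, beta)
print(fitted)
print(residuals)
```

The residuals sum to zero within every row and every column, which is a quick sanity check on any additive fit of this kind.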

Enalaprilate example: Use the residuals for model checking:
- Probability plot of residuals
- Plot residuals vs expected values
- Plot residuals vs group
- Look for outliers (a large residual means that observed and expected values deviate a lot)
No systematic patterns should be present.

Interaction
Example of two criteria for subdividing individuals: sex and smoking habits
Outcome: FEV1

Here, we see an interaction between sex and smoking.

Possible explanations for interaction:
- biologically different effects of smoking on males and females
- perhaps the women do not smoke as much as the men
- perhaps the effect is relative (to be expressed in %)

Example: The effect of smoking on birth weight

Interaction:
- There is an effect of smoking, but only for those who have been smoking for a long time.
- There is an effect of duration, and this effect increases with the amount of smoking.

The effect of duration depends upon the amount of smoking, and the effect of amount depends upon the duration of smoking.

Example: Fibrinogen after spleen operation
34 rats are randomized in 2 ways:
- 17 have their spleen removed (splenectomy=yes/no)
- 8 of the 17 in each group are kept at high altitude (place=altitude/control)
Outcome: Fibrinogen level in mg at day 21

The usual additive model:
$$Y_{spr} = \mu + \alpha_s + \beta_p + \varepsilon_{spr}, \quad \varepsilon_{spr} \sim N(0, \sigma^2)$$
splenectomy ($s$ = yes/no) and place ($p$ = altitude/control) have an additive effect.

Model with interaction:
$$Y_{spr} = \mu + \alpha_s + \beta_p + \gamma_{sp} + \varepsilon_{spr}, \quad \varepsilon_{spr} \sim N(0, \sigma^2)$$
Here, we specify an interaction between splenectomy and place, i.e. the effect of living at high altitude may be thought to depend upon whether or not you have an intact spleen, and vice versa.

Two-way ANOVA with interaction in SAS:

    proc glm data=ex_fibrinogen;
    class splenectomy place;
    model fibrinogen=place splenectomy place*splenectomy / solution;
    output out=model p=predicted r=residual;
    run;

    The GLM Procedure
    Class Level Information
    Class          Levels    Values
    splenectomy    2         no yes
    place          2         altitude control
    Number of observations    34

    Dependent Variable: fibrinogen
                              Sum of
    Source             DF     Squares     Mean Square   F Value   Pr > F
    Model
    Error
    Corrected Total

    R-Square   Coeff Var   Root MSE   fibrinogen Mean

    Source               DF   Type I SS     Mean Square   F Value   Pr > F
    place
    splenectomy
    splenectomy*place

    Source               DF   Type III SS   Mean Square   F Value   Pr > F
    place
    splenectomy
    splenectomy*place

                                                         Standard
    Parameter                               Estimate     Error      t Value   Pr > |t|
    Intercept                               B                                 <.0001
    place altitude                          B
    place control                           B            .          .         .
    splenectomy no                          B
    splenectomy yes                         B            .          .         .
    splenectomy*place no altitude           B
    splenectomy*place no control            B            .          .         .
    splenectomy*place yes altitude          B            .          .         .
    splenectomy*place yes control           B            .          .         .

    NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter B are not uniquely estimable.

The reference levels are place=control, splenectomy=yes (they come last in the alphabet), so the expected fibrinogen level for these animals is the intercept. For all other groups, we have to add one or more extra estimates, as shown in the table below:

                   place
    splenectomy    control                        altitude
    yes            Intercept                      Intercept + place altitude
    no             Intercept + splenectomy no     Intercept + place altitude + splenectomy no
                                                  + splenectomy*place no altitude

Model checking
Variance homogeneity may be judged from a one-way ANOVA:

    The GLM Procedure
    Class Level Information
    Class    Levels    Values
    group    4         no_altitude no_control yes_altitude yes_control
    Number of observations    34

    Levene's Test for Homogeneity of fibrinogen Variance
    ANOVA of Squared Deviations from Group Means
                      Sum of     Mean
    Source     DF     Squares    Square    F Value   Pr > F
    group
    Error

No reason to suspect inhomogeneity
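The rule "add one or more extra estimates" can be made concrete with a small calculation. In this Python sketch the four non-zero estimates are hypothetical numbers chosen for illustration (the actual estimates are not reproduced here), and the dictionary keys and function name are assumptions; only the reference coding (place=control, splenectomy=yes) is taken from the text.

```python
# Hypothetical estimates, NOT the values from the SAS output
estimates = {
    "intercept": 300.0,               # expected level for splenectomy=yes, place=control
    "place_altitude": 50.0,           # altitude vs control, within splenectomy=yes
    "splenectomy_no": 40.0,           # no vs yes, within place=control
    "interaction_no_altitude": -10.0  # extra term for splenectomy=no AND place=altitude
}

def expected_level(splenectomy, place, est=estimates):
    """Expected outcome for one cell; reference levels contribute nothing."""
    value = est["intercept"]
    if place == "altitude":
        value += est["place_altitude"]
    if splenectomy == "no":
        value += est["splenectomy_no"]
    if splenectomy == "no" and place == "altitude":
        value += est["interaction_no_altitude"]
    return value

cells = {(s, p): expected_level(s, p)
         for s in ("yes", "no") for p in ("control", "altitude")}
for key, value in cells.items():
    print(key, value)

# The interaction estimate is the difference between the two altitude effects
altitude_effect_no = cells[("no", "altitude")] - cells[("no", "control")]
altitude_effect_yes = cells[("yes", "altitude")] - cells[("yes", "control")]
print(altitude_effect_no - altitude_effect_yes)
```

If the interaction term were zero, the altitude effect would be the same with and without a spleen, which is exactly what the additive model assumes.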

Normality assumption for residuals (proc univariate normal)

    Tests for Normality
    Test                   --Statistic--    -----p Value-----
    Shapiro-Wilk           W                Pr < W
    Kolmogorov-Smirnov     D                Pr > D
    Cramer-von Mises       W-Sq             Pr > W-Sq
    Anderson-Darling       A-Sq             Pr > A-Sq

No reason to suspect non-normality

In the two-way ANOVA, the interaction was not significant (P=0.77), so we omit it from the model:

    proc glm data=ex_fibrinogen;
    class splenectomy place;
    model fibrinogen=place splenectomy / solution clparm;
    run;

    Dependent Variable: fibrinogen
                              Sum of
    Source             DF     Squares     Mean Square   F Value   Pr > F
    Model
    Error
    Corrected Total

    R-Square   Coeff Var   Root MSE   fibrinogen Mean

    Source         DF   Type III SS   Mean Square   F Value   Pr > F
    place
    splenectomy

                                      Standard
    Parameter           Estimate      Error      t Value   Pr > |t|
    Intercept           B                                  <.0001
    place altitude      B
    place control       B             .          .         .
    splenectomy no      B
    splenectomy yes     B             .          .         .

Residual plots
[Figures: normality; variance homogeneity]

- Removal of the spleen leads to a decrease in fibrinogen of approx. ... mg at day 21
- Placing at altitude leads to an increase in fibrinogen of approx. ... mg at day 21

More complicated analyses of variance
- Three-way or higher-order analysis of variance
- Latin squares

      I     A B C
      II    B C A
      III   C A B

  (Cochran & Cox (1957): Experimental Designs, 2nd ed., Wiley)
- Cross-over designs
- Variance component models

Example of a Latin square: A rabbit experiment
- 6 rabbits
- Vaccination at 6 different spots on the back
- 6 different orders of vaccination
- Swelling is the area of the blister (cm²)

    spot   rabbit   order   swelling

Some illustrations:
[Figures not recovered]

Fit a 3-way analysis of variance, with additive effects

    proc glm;
    class rabbit spot order;
    model swelling=rabbit spot order;
    run;

    The GLM Procedure
    Class Level Information
    Class     Levels    Values
    rabbit
    spot      6         a b c d e f
    order
    Number of observations    36

    Dependent Variable: swelling
                              Sum of
    Source             DF     Squares     Mean Square   F Value   Pr > F
    Model
    Error
    Corrected Total

    R-Square   Coeff Var   Root MSE   swelling Mean

    Source     DF   Type III SS   Mean Square   F Value   Pr > F
    rabbit
    spot
    order

The design is balanced, so the test of the effect of one factor does not depend on which of the others are still in the model.

How about possible interactions?

    proc glm;
    class rabbit spot order;
    model swelling=rabbit spot order spot*order;
    run;

    Dependent Variable: swelling
                              Sum of
    Source             DF     Squares     Mean Square   F Value   Pr > F
    Model
    Error
    Corrected Total

    Source        DF   Type I SS   Mean Square   F Value   Pr > F
    rabbit
    spot
    order
    spot*order

There is no room for interaction, since there is only one observation for each combination of spot and order!
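The "no room for interaction" argument is a matter of counting. The Python sketch below builds a hypothetical 6 x 6 Latin square (a cyclic one, which is an assumption and not necessarily the layout used in the rabbit experiment), confirms that every spot and order combination occurs exactly once, and compares the degrees of freedom an interaction would need with what the additive model leaves over.

```python
from collections import Counter

n = 6
spots = "abcdef"

# Hypothetical cyclic Latin square: rabbit r receives spot s as vaccination no. (r + s) mod 6 + 1
design = [(rabbit + 1, spots[s], (rabbit + s) % n + 1)
          for rabbit in range(n) for s in range(n)]

# Count how often each (spot, order) combination appears
pair_counts = Counter((spot, order) for _, spot, order in design)

total_df = len(design) - 1                 # 36 observations give 35 df
main_effect_df = 3 * (n - 1)               # rabbit, spot and order: 5 df each
error_df = total_df - main_effect_df       # left over in the additive model
interaction_df = (n - 1) * (n - 1)         # what spot*order would need

print(len(pair_counts), set(pair_counts.values()))
print(total_df, main_effect_df, error_df, interaction_df)
```

Since spot*order would need more degrees of freedom than the additive model leaves for error, fitting it uses up everything and no residual variation remains to test against.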


