STAT 4 ANOVA - Contrasts and Multiple Comparisons /3/04

* Planned comparisons vs unplanned comparisons
* Contrasts
* Confidence intervals
* Multiple comparisons: HSD

Remark: Alternate form of Model I

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \qquad \sum_{i=1}^{a} n_i \alpha_i = 0 \quad \text{(identifiability constraint)}$$

Planned comparisons - single pairs of means, or contrasts specified in advance

Difference of means, e.g. $\mu_i - \mu_j$: like a two-sample test, but we have an ANOVA model and hence the pooled variance estimate $s^2$ for the common variance $\sigma^2$.

$100(1-\alpha)\%$ CI:

$$\bar{y}_i - \bar{y}_j \pm t_{1-\alpha/2\,[\nu]} \, SE_{\bar{y}_i - \bar{y}_j}, \qquad \nu = n - a, \qquad SE_{\bar{y}_i - \bar{y}_j} = s \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

Ex: (Pea section data) Length of pea sections grown in tissue cultures

    Control  Glucose  Fructose  Gluc+Fruc  Sucrose
       75       57       58        58         62
       67       58       61        59         66
       70       60       56        58         65
       75       59       58        61         63
       65       62       57        57         64
       71       60       56        56         62
       67       60       61        58         65
       67       57       60        57         65
       76       59       57        57         62
       68       61       58        59         67

    > peas=scan()
    > pea.df=data.frame(peas,culture=as.factor(rep(1:5,10)))
    > culture=as.factor(rep(1:5,10))
    > culture
     [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
    [46] 1 2 3 4 5
    Levels: 1 2 3 4 5
    > pea.lm=lm(peas~culture,data=pea.df)
    > anova(pea.lm)
    Response: peas
              Df  Sum Sq Mean Sq F value    Pr(>F)
    culture    4 1077.32  269.33  49.368 6.737e-16 ***
    Residuals 45  245.50    5.46
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    > pea.resid=residuals(pea.lm)
    > pea.fitted=fitted(pea.lm)
    > matrix(round(pea.resid,1),ncol=5,byrow=TRUE)
          [,1] [,2] [,3] [,4] [,5]
     [1,]  4.9 -2.3 -0.2    0 -2.1
     [2,] -3.1 -1.3  2.8    1  1.9
     [3,] -0.1  0.7 -2.2    0  0.9
     [4,]  4.9 -0.3 -0.2    3 -1.1
     [5,] -5.1  2.7 -1.2   -1 -0.1
     [6,]  0.9  0.7 -2.2   -2 -2.1
     [7,] -3.1  0.7  2.8    0  0.9
     [8,] -3.1 -2.3  1.8   -1  0.9
     [9,]  5.9 -0.3 -1.2   -1 -2.1
    [10,] -2.1  1.7 -0.2    1  2.9
    > sum(round(pea.resid,1)^2)
    [1] 245.5
    > matrix(round(pea.fitted,1),ncol=5,byrow=TRUE)
          [,1] [,2] [,3] [,4] [,5]
     [1,] 70.1 59.3 58.2   58 64.1
     [2,] 70.1 59.3 58.2   58 64.1
     [3,] 70.1 59.3 58.2   58 64.1
     [4,] 70.1 59.3 58.2   58 64.1
     [5,] 70.1 59.3 58.2   58 64.1
     [6,] 70.1 59.3 58.2   58 64.1
     [7,] 70.1 59.3 58.2   58 64.1
     [8,] 70.1 59.3 58.2   58 64.1
     [9,] 70.1 59.3 58.2   58 64.1
    [10,] 70.1 59.3 58.2   58 64.1

95% CI for $\mu_c - \mu_g$, the difference between the control group and the glucose group:

    $\bar{y}_c - \bar{y}_g = 70.1 - 59.3 = 10.8$
    $s = \sqrt{s^2} = \sqrt{MS_{within}} = \sqrt{5.46} = 2.34$
    $SE_{\bar{y}_c - \bar{y}_g} = 2.34 \sqrt{\frac{1}{10} + \frac{1}{10}} = 1.046$
    dof $\nu = n - a = 50 - 5 = 45$; $t_{.975[45]} = 2.015$
    C.I. $= 10.8 \pm (2.015)(1.046) = [8.69, 12.91]$

Contrast: A linear combination of means where the coefficients sum to zero:

    Population: $\gamma = \sum_{i=1}^{a} c_i \mu_i, \qquad \sum_{i=1}^{a} c_i = 0$
    Sample:     $C = \sum_{i=1}^{a} c_i \bar{y}_i$

Ex: sugars vs. control

$$\gamma = \mu_c - \tfrac{1}{4}(\mu_g + \mu_f + \mu_{g+f} + \mu_s), \qquad \text{coefficients } (c_i) = (1, -\tfrac14, -\tfrac14, -\tfrac14, -\tfrac14)$$

Typically used to compare groups of means or certain weighted combinations ("orthogonal contrasts") such as linear or quadratic effects.

Variance of a sample contrast (assuming the sample means are independent):

$$\mathrm{Var}\, C = \mathrm{Var}\Big(\sum_i c_i \bar{y}_i\Big) = \sum_i c_i^2 \, \mathrm{Var}(\bar{y}_i) = \sigma^2 \sum_i \frac{c_i^2}{n_i}$$

Estimated SE:

$$SE_c = s \sqrt{\sum_i \frac{c_i^2}{n_i}}$$

$100(1-\alpha)\%$ CI: $C \pm t_{1-\alpha/2\,[\nu]} \, SE_c$
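As a minimal sketch of the two interval formulas above, the R snippet below computes a planned-comparison interval from summary statistics alone. The helper name `contrast_ci` is hypothetical (it is not part of base R), and the group means and the residual sum of squares are typed in by hand from the pea example rather than recomputed from the raw data.

```r
# Summary statistics from the pea example (entered by hand, not recomputed).
ybar <- c(control = 70.1, glucose = 59.3, fructose = 58.2,
          gluc_fruc = 58.0, sucrose = 64.1)
n    <- rep(10, 5)               # balanced design: 10 sections per culture
nu   <- sum(n) - length(ybar)    # residual degrees of freedom, n - a = 45
s    <- sqrt(245.5 / nu)         # pooled SD = sqrt(MS within)

# Hypothetical helper: t-based CI for one planned contrast sum(c_i * mu_i).
contrast_ci <- function(coefs, ybar, n, s, nu, level = 0.95) {
  stopifnot(abs(sum(coefs)) < 1e-12)     # coefficients must sum to zero
  est <- sum(coefs * ybar)               # sample contrast C
  se  <- s * sqrt(sum(coefs^2 / n))      # estimated SE of C
  tq  <- qt(1 - (1 - level) / 2, nu)     # t quantile with nu d.f.
  c(estimate = est, se = se, lower = est - tq * se, upper = est + tq * se)
}

# Difference control - glucose, written as a contrast.
ci_diff  <- contrast_ci(c(1, -1, 0, 0, 0), ybar, n, s, nu)
# Control versus the average of the four sugar cultures.
ci_sugar <- contrast_ci(c(1, -1/4, -1/4, -1/4, -1/4), ybar, n, s, nu)
print(round(rbind(ci_diff, ci_sugar), 3))
```

Because the sketch keeps unrounded values of s and of the t quantile, its limits can differ from a hand calculation in the last reported digit.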
Ex:

    $C = 70.1 - \tfrac{1}{4}(59.3 + 58.2 + 58 + 64.1) = 10.2$
    $\sum_i \frac{c_i^2}{n_i} = \frac{1}{10}\left[1 + \tfrac{1}{16} + \tfrac{1}{16} + \tfrac{1}{16} + \tfrac{1}{16}\right] = \frac{5}{40} = \frac{1}{8}, \qquad \sqrt{\sum_i \frac{c_i^2}{n_i}} = \sqrt{\tfrac{1}{8}} = 0.3536$
    $SE_c = (2.34)(0.3536) = 0.8273$
    CI for $\gamma$ is $10.2 \pm (2.015)(0.8273) = [8.53, 11.87]$

Hypothesis test for a contrast: e.g. $H_0: \gamma = \gamma_0$. Form a t-statistic

$$t = \frac{C - \gamma_0}{SE_c} \sim t_{[\nu]} \quad \text{if } H_0 \text{ is true.}$$

Remark: differences between means are a special case of contrasts: e.g. $\mu_c - \mu_g = \sum_{i=1}^{a} c_i \mu_i$ with $(c_i) = (1, -1, 0, 0, 0)$.

These types of investigations should be done on combinations of factors that were determined in advance of observing the experimental results, or else the confidence levels are not as specified by the procedure. Also, doing several comparisons might change the overall confidence level. This can be avoided by carefully selecting contrasts to investigate in advance and making sure that:

* the number of such contrasts does not exceed the number of degrees of freedom between the treatments
* only orthogonal contrasts are chosen.

However, there are also several powerful multiple comparison procedures we can use after observing the experimental results.

Unplanned comparisons

After looking at the data, we may wish to assess the significance of, or give C.I.s for, certain differences, e.g. $\mu_g - \mu_f$ (in pea length), or contrasts, e.g. $\mu_s - \tfrac{1}{3}(\mu_g + \mu_f + \mu_{gf})$, that looked interesting a posteriori. Viewed a priori, however, there are many differences or contrasts that could potentially attract attention. We need to adjust our significance levels and P-values (larger) or our C.I.s (wider) to allow for this search over all possibilities. This is the subject of multiple comparisons - see books, e.g. Miller, R.G.,
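Simultaneous Statistical Inference.

All pairs of differences: with $a$ treatments, there are $\binom{a}{2} = \frac{a(a-1)}{2}$ possible comparisons of different means: $\mu_i - \mu_j$, $1 \le j < i \le a$.

If we used t-intervals, we would have many intervals of the form

$$I_{ij}: \quad \bar{y}_i - \bar{y}_j \pm t_{1-\alpha/2\,[\nu]} \, SE_{\bar{y}_i - \bar{y}_j}$$

But the chance that all intervals simultaneously cover all $\mu_i - \mu_j$ is too small:

$$P\{I_{ij} \text{ cover } \mu_i - \mu_j \text{ for all } i < j\} < 1 - \alpha$$

To obtain a simultaneous coverage property, make the intervals wider:

$$I^{TK}_{ij}: \quad \bar{y}_i - \bar{y}_j \pm \frac{Q_{1-\alpha\,[a,\nu]}}{\sqrt{2}} \, SE_{\bar{y}_i - \bar{y}_j} \qquad (TK = \text{Tukey-Kramer})$$

$Q_{1-\alpha\,[a,\nu]}$ are percentage points of the studentized range distribution. Formal definition:

$$Q_{[a,\nu]} = \frac{\max_{i,j} |Z_i - Z_j|}{s}, \quad \text{where } Z_1, Z_2, \ldots, Z_a \sim N(0,1); \ \nu s^2 \sim \chi^2_{(\nu)}, \text{ and all independent.}$$

This gives wider intervals: $Q_{1-\alpha\,[a,\nu]} / \sqrt{2} > t_{1-\alpha/2\,[\nu]}$ (unless $a = 2$!)

Ex. (pea lengths) $Q_{.95[5,45]} / \sqrt{2} = 4.02 / \sqrt{2} = 2.84 \ (> 2.01 = t_{.975[45]})$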
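The loss of simultaneous coverage can be checked numerically. The following is a small Monte Carlo sketch under assumed settings that are not from the notes (a = 5 groups, n = 10 per group, all true means equal, 5000 replications, an arbitrary seed). With all true differences equal to zero, every pairwise interval covers its target exactly when the largest standardized difference stays below the multiplier, so only the range of the sample means is needed.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
a <- 5; n <- 10                  # assumed design: 5 groups of 10
nu <- a * (n - 1)                # residual degrees of freedom (45)
alpha <- 0.05

mult_t  <- qt(1 - alpha / 2, nu)               # per-interval t multiplier
mult_tk <- qtukey(1 - alpha, a, nu) / sqrt(2)  # Tukey-Kramer multiplier

one_rep <- function() {
  y    <- matrix(rnorm(a * n), nrow = n)       # columns = groups, all means 0
  ybar <- colMeans(y)
  s    <- sqrt(sum(sweep(y, 2, ybar)^2) / nu)  # pooled SD
  se   <- s * sqrt(2 / n)                      # SE of any pairwise difference
  max_ratio <- (max(ybar) - min(ybar)) / se    # largest |ybar_i - ybar_j| / SE
  # TRUE when ALL pairwise intervals cover the true difference (zero)
  c(t = max_ratio <= mult_t, tk = max_ratio <= mult_tk)
}

cover <- rowMeans(replicate(5000, one_rep()))  # simultaneous coverage rates
print(round(c(mult_t = mult_t, mult_tk = mult_tk), 3))
print(cover)
```

In runs of this kind the unadjusted t-intervals cover all ten differences at once noticeably less often than 95% of the time, while the Tukey-Kramer rate sits near the nominal level.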
    > qtukey(0.95,5,45)
    [1] 4.02

The simultaneous interval for $\mu_c - \mu_g$ is $10.8 \pm (2.843)(1.046) = [7.83, 13.77]$.

Simultaneous coverage property: if Model I holds, and $n_1 = n_2 = \cdots = n_a$ ("balanced"), then

$$P(I^{TK}_{ij} \text{ covers } \mu_i - \mu_j \text{ for all } i < j) = 1 - \alpha$$

Remark: If the ANOVA is unbalanced (not all $n_i$ equal) then these Tukey-Kramer intervals are conservative (coverage prob $\ge 1 - \alpha$).

When comparing the means for the levels of a factor in an analysis of variance, a simple comparison using t-tests will inflate the probability of declaring a significant difference when it is not in fact present. This is because the intervals are calculated with a given coverage probability for each interval, but the interpretation of the coverage is usually with respect to the entire family of intervals. John Tukey introduced intervals based on the range of the sample means rather than the individual differences. The intervals returned by R's TukeyHSD function are based on this Studentized range statistic. Technically the intervals constructed in this way would only apply to balanced designs, where the same number of observations is made at each level of the factor. The function incorporates an adjustment for sample size that produces sensible intervals for mildly unbalanced designs.

    > peas.aov <- aov(peas~gr)
    > TukeyHSD(peas.aov)
      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = peas ~ gr)

    $gr
         diff       lwr       upr
    2-1 -10.8 -13.76807  -7.83193
    3-1 -11.9 -14.86807  -8.93193
    4-1 -12.1 -15.06807  -9.13193
    5-1  -6.0  -8.96807  -3.03193
    3-2  -1.1  -4.06807   1.86807
    4-2  -1.3  -4.26807   1.66807
    5-2   4.8   1.83193   7.76807
    4-3  -0.2  -3.16807   2.76807
    5-3   5.9   2.93193   8.86807
    5-4   6.1   3.13193   9.06807

    > peas.hsd <- TukeyHSD(peas.aov)
    > plot(peas.hsd)

[Figure: plot of the ten Tukey intervals, titled "95% family-wise confidence level"; x-axis "Differences in mean levels of gr"]

The term "experiment-wise error rate $\alpha$" arises because, if $H_0$ is true (all $\mu_i$ equal), then the chance of falsely declaring as significant any of the $\frac{a(a-1)}{2}$ pairwise differences is (at most) $\alpha$:

$$P_{H_0}\left\{ \max_{i,i'} \frac{|\bar{y}_i - \bar{y}_{i'}|}{SE_{\bar{y}_i - \bar{y}_{i'}}} > \frac{Q_{1-\alpha\,[a,\nu]}}{\sqrt{2}} \right\} \le \alpha \qquad (= \alpha \text{ if all } n_i \text{ equal})$$

Balanced case and Honestly Significant Difference (HSD): if all $n_i = n$, then all $SE_{\bar{y}_i - \bar{y}_{i'}} = s\sqrt{2/n}$, and a pair is declared significant when

$$|\bar{y}_i - \bar{y}_{i'}| > Q_{1-\alpha\,[a,\nu]} \, \frac{s}{\sqrt{n}} = HSD$$

so just find those pairs $(\bar{y}_i, \bar{y}_{i'})$ separated by more than HSD.
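The HSD rule for the balanced case can be sketched in a few lines of R. This is an illustrative calculation from summary statistics only: the group means and the residual sum of squares (245.5 on 45 d.f.) are typed in from the pea example, and the variable names (`hsd`, `sig_pairs`) are made up for this sketch.

```r
# Group means from the pea example, labelled by culture level 1..5.
ybar <- c(`1` = 70.1, `2` = 59.3, `3` = 58.2, `4` = 58.0, `5` = 64.1)
a  <- length(ybar)          # number of treatments
n  <- 10                    # common group size (balanced case)
nu <- a * (n - 1)           # residual degrees of freedom
s  <- sqrt(245.5 / nu)      # pooled SD from the residual sum of squares

# Honestly significant difference: HSD = Q * s / sqrt(n)
hsd <- qtukey(0.95, a, nu) * s / sqrt(n)

d   <- outer(ybar, ybar, "-")          # d[i, j] = ybar_i - ybar_j
sig <- abs(d) > hsd                    # pairs separated by more than HSD
sig_pairs <- which(sig & lower.tri(sig), arr.ind = TRUE)

print(round(hsd, 3))
print(data.frame(
  pair = paste(rownames(d)[sig_pairs[, 1]], colnames(d)[sig_pairs[, 2]], sep = "-"),
  diff = d[sig_pairs]
))
```

The value of `hsd` should agree with the half-width of the TukeyHSD intervals for the same data, since in the balanced case each of those intervals is simply the observed difference plus or minus HSD.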
Overlapping intervals picture - the $\bar{y}_i \pm \tfrac{1}{2}HSD$ intervals overlap if and only if $|\bar{y}_i - \bar{y}_{i'}| \le HSD$, so means $(\mu_i, \mu_{i'})$ whose $\pm \tfrac{1}{2}HSD$ intervals don't overlap are significantly different at experiment-wise error rate $\alpha$. (Warning! $\bar{y}_i \pm \tfrac{1}{2}HSD$ is NOT a $100(1-\alpha)\%$ confidence interval!)

All contrasts: The Scheffé intervals

$$I_s: \quad C \pm \sqrt{(a-1) F_{1-\alpha,[a-1,\nu]}} \; SE_c$$

have the simultaneous coverage property (for balanced or unbalanced cases)

$$P\{I_s \text{ cover } \gamma \text{ for all contrasts}\} = 1 - \alpha$$

Since contrasts are more general than differences, expect Scheffé intervals to be even wider than Tukey-Kramer.

Ex: $\gamma = \mu_s - \tfrac{1}{3}(\mu_g + \mu_f + \mu_{gf})$, $c = (0, -\tfrac13, -\tfrac13, -\tfrac13, 1)$

    $\sum_i \frac{c_i^2}{n_i} = \frac{1}{10}\left(1 + 3 \cdot \tfrac{1}{9}\right) = \frac{1}{10} \cdot \frac{4}{3} = \frac{4}{30}$
    $SE_c = s \sqrt{\sum_i \frac{c_i^2}{n_i}} = (2.34)(0.3651) = 0.8544$
    $\sqrt{(a-1) F_{.95[4,45]}} = \sqrt{4 \times 2.58} = 3.21$

The 95% Scheffé interval for $C = \bar{y}_s - \tfrac{1}{3}(\bar{y}_g + \bar{y}_f + \bar{y}_{gf}) = 64.1 - 58.5 = 5.6$ has margin of error $(3.21)(0.8544) = 2.743$:

    CI $= [5.6 - 2.74, \ 5.6 + 2.74] = [2.86, 8.34]$

(Note that the Scheffé multiplier $= 3.21 > 2.84 = Q_{1-\alpha\,[a,\nu]} / \sqrt{2} =$ Tukey-Kramer multiplier.)

Remark: There is a version of the T-K intervals for contrasts - these can be better (shorter) than the Scheffé method if $a$ is larger and relatively few $c_i$ are non-zero.

    > contr.peas=matrix(c(4,-1,-1,-1,-1,0,-1,-1,3,-1),ncol=2)
    > contr.peas
         [,1] [,2]
    [1,]    4    0
    [2,]   -1   -1
    [3,]   -1   -1
    [4,]   -1    3
    [5,]   -1   -1
    > contrasts(culture)=contr.peas
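A short sketch can confirm that the two columns of `contr.peas` are genuine contrasts, that they are mutually orthogonal, and what Scheffé intervals look like for them. The summary statistics are again typed in from the pea example, and `scheffe_ci` is a hypothetical helper name, not a base R function. With equal group sizes, orthogonality of two contrasts reduces to a zero dot product of their coefficient vectors, which is what `crossprod` checks here.

```r
# Summary statistics from the pea example (entered by hand).
ybar <- c(70.1, 59.3, 58.2, 58.0, 64.1)  # control, glucose, fructose, gluc+fruc, sucrose
a  <- length(ybar)
n  <- rep(10, a)
nu <- sum(n) - a
s  <- sqrt(245.5 / nu)                   # pooled SD

contr.peas <- matrix(c(4, -1, -1, -1, -1,
                       0, -1, -1,  3, -1), ncol = 2)
print(colSums(contr.peas))     # both columns sum to zero, so both are contrasts
print(crossprod(contr.peas))   # zero off-diagonal entries: orthogonal (balanced case)

# Scheffe multiplier sqrt((a - 1) * F quantile), valid for all contrasts at once.
scheffe_mult <- sqrt((a - 1) * qf(0.95, a - 1, nu))

# Hypothetical helper: Scheffe interval for one coefficient vector.
scheffe_ci <- function(coefs) {
  est <- sum(coefs * ybar)
  se  <- s * sqrt(sum(coefs^2 / n))
  c(estimate = est, lower = est - scheffe_mult * se, upper = est + scheffe_mult * se)
}

ci_sucrose <- scheffe_ci(c(0, -1/3, -1/3, -1/3, 1))  # sucrose vs the other sugars
ci_cols    <- apply(contr.peas, 2, scheffe_ci)       # one column per contrast
print(round(ci_sucrose, 2))
print(round(ci_cols, 2))
```

Note that the first column of `contr.peas` is four times the "sugars vs. control" contrast, so its estimate and interval are on a scale four times larger; conclusions about whether the interval excludes zero are unaffected by that scaling.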