In Box, Hunter, and Hunter, Statistics for Experimenters, there is a two-factor example of dying times for animals, let's say cockroaches, using 4 poisons and 3 pretreatments with n = 4 values for each combination of poison and pretreatment. Box, Hunter, and Hunter is a classic book used by many in industry.

a = 4 levels of treatment A = Poison
b = 3 levels of treatment B = Pretreatment
n = 4 replicates for each treatment combination

Poison \ Pretreat       1       2       3      Mean
A                    2.49    3.27    4.80      3.52   (n = 12)
B                    1.16    1.40    3.03      1.86   (n = 12)
C                    1.86    2.71    4.26      2.95   (n = 12)
D                    1.69    1.70    3.09      2.16   (n = 12)
Mean                 1.80    2.27    3.79      2.62   (n = 48)
n                      16      16      16

$$SS_{Poison} = 12(3.52 - 2.62)^2 + 12(1.86 - 2.62)^2 + 12(2.95 - 2.62)^2 + 12(2.16 - 2.62)^2 = 20.41$$

$$SS_{Pretreat} = 16(1.80 - 2.62)^2 + 16(2.27 - 2.62)^2 + 16(3.79 - 2.62)^2 = 34.88$$

$$SS_{Model} = 4(2.49 - 2.62)^2 + 4(3.27 - 2.62)^2 + 4(4.80 - 2.62)^2 + \cdots + 4(3.09 - 2.62)^2 = 56.86$$

$$SS_{AB} = 56.86 - (20.41 + 34.88) = 1.57$$

$$MS_{Error} = \frac{(4-1)0.50^2 + (4-1)0.82^2 + (4-1)0.53^2 + \cdots + (4-1)0.24^2}{(4-1) + (4-1) + (4-1) + \cdots + (4-1)} = 0.24$$

Source               df                  Formula
Poison               4 - 1 = 3           a - 1
Pretreat             3 - 1 = 2           b - 1
Poison * Pretreat    3 * 2 = 6           (a - 1)(b - 1)
Error                12(4 - 1) = 36      ab(n - 1)

Model df = 3 + 2 + 6 = 11
Check: 12 treatment combinations, so 12 - 1 = 11
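The sums of squares above can be checked from the table of cell means. The following is a minimal Python sketch, not part of the original SAS analysis; the variable names are illustrative. Because it starts from cell means rounded to two decimals, the results agree with the values quoted above only to within rounding.

```python
# Cell means copied from the table above:
# rows are poisons A-D, columns are pretreatments 1-3.
cell_means = {
    "A": [2.49, 3.27, 4.80],
    "B": [1.16, 1.40, 3.03],
    "C": [1.86, 2.71, 4.26],
    "D": [1.69, 1.70, 3.09],
}
n = 4                 # replicates per poison/pretreatment combination
a = len(cell_means)   # levels of poison
b = 3                 # levels of pretreatment

all_cells = [m for row in cell_means.values() for m in row]
grand_mean = sum(all_cells) / len(all_cells)

# Marginal means (balanced design, so simple averages of cell means).
row_means = [sum(row) / b for row in cell_means.values()]
col_means = [sum(row[j] for row in cell_means.values()) / a for j in range(b)]

# Each marginal mean is based on n*b (rows) or n*a (columns) observations.
ss_poison = n * b * sum((m - grand_mean) ** 2 for m in row_means)
ss_pretreat = n * a * sum((m - grand_mean) ** 2 for m in col_means)
ss_model = n * sum((m - grand_mean) ** 2 for m in all_cells)
ss_ab = ss_model - (ss_poison + ss_pretreat)

df_model = a * b - 1
df_error = a * b * (n - 1)

print(f"SS Poison   = {ss_poison:.2f}")
print(f"SS Pretreat = {ss_pretreat:.2f}")
print(f"SS Model    = {ss_model:.2f}")
print(f"SS AB       = {ss_ab:.2f}")
print(f"Model df = {df_model}, Error df = {df_error}")
```

The interaction sum of squares is obtained by subtraction, which works because the design is balanced.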
From the interaction plot with standard error bars on the next page and the plot of residuals versus predicted values, we can see that
The variances are not equal.
o Since the n's are the same, the error bars are proportional to standard deviations as well.
There is an apparent interaction.
Residuals versus Predicted: Clearly increasing variances with larger values.
Note: if we test for an interaction, the p-value is not less than 0.05, but the variances are not equal, so the ANOVA results are unreliable.
Given these interactions
o We cannot make general statements about how these poisons compare.
o We would have to make separate conclusions for each pretreatment.
o If we have results for another poison with a particular pretreatment, we could not generalize to how that poison behaves with other pretreatments.
It's nicer if we do not have interactions.
o Then we can make general statements comparing poisons.
In the analysis that the researchers used for these data, they did the analysis on rates rather than times.
Rate = 1/Time
Backtransformed means from the rate scale would be harmonic means in the time scale.
One way to check for a "best" transformation is with the Box-Cox procedure.
o Aside: Box married RA Fisher's daughter.
o The family of Box-Cox transformations is

$$y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda}$$

where $\lambda$ specifies the transformation.
For $\lambda \neq 0$: this is essentially $y^{\lambda}$, a power transformation.
For $\lambda = 0$:

$$\lim_{\lambda \to 0} \frac{y^{\lambda} - 1}{\lambda} = \ln(y)$$

This family of transformations includes all powers and a log transformation.
This does not include, for example, the $\arcsin\sqrt{y}$ transformation.
o This makes variances nearly equal when $Var(Y) = k\,\mu(1 - \mu)$, the case for a binomial distribution.
o This is found by solving

$$Var[g(Y)] \approx [g'(\mu)]^2 \, Var(Y) = [g'(\mu)]^2 \, k\,\mu(1 - \mu) = \text{constant}$$

The Transreg procedure in SAS handles regression with transformations, including the Box-Cox transformation.

    proc transreg data=anova.poison;
      model boxcox(time / lambda = -2 to 2 by 0.10) = class(poison pretreat poison*pretreat);
    run;

In this case the interaction was included, but we might have wanted to find the optimal transformation for a model without any interaction, to look for good no-interaction models.
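As a small numerical illustration of the Box-Cox family and of the statement that backtransformed rate means are harmonic means of the times, here is a Python sketch. It is separate from the SAS analysis; the function name `box_cox` and the example values are hypothetical and are not taken from the poison data.

```python
import math
import statistics

def box_cox(y, lam):
    """Box-Cox transform (y**lam - 1)/lam, with the log limit at lam = 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

y = 2.5  # hypothetical positive response

# As lambda approaches 0, the transform approaches ln(y).
near_zero = box_cox(y, 1e-6)
log_limit = box_cox(y, 0)
print(near_zero, log_limit)

# lambda = -1 gives 1 - 1/y, a linear function of the rate 1/y,
# so it carries the same information as analyzing rates.
reciprocal = box_cox(y, -1)
print(reciprocal, 1 - 1 / y)

# Averaging on the rate scale and transforming back to the time scale
# gives the harmonic mean of the times.
times = [2.0, 4.0, 8.0]  # hypothetical times
mean_rate = sum(1 / t for t in times) / len(times)
back_transformed = 1 / mean_rate
print(back_transformed, statistics.harmonic_mean(times))
```

Subtracting 1 and dividing by λ does not change an ANOVA beyond location and scale (and sign when λ is negative), which is why the λ ≠ 0 case is described as essentially a power transformation.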
Box-Cox Transformation Information for time

Lambda      R-Square    Log Like
 -1.8         0.87      108.8756
 -1.7         0.87      110.3243
 -1.6         0.87      111.6391
 -1.5         0.87      112.8140
 -1.4         0.87      113.8431
 -1.3         0.87      114.7210
 -1.2         0.87      115.4425 *
 -1.1         0.87      116.0033 *
 -1.0 +       0.87      116.3993 *
 -0.9         0.87      116.6277 *
 -0.8         0.86      116.6862 <
 -0.7         0.86      116.5736 *
 -0.6         0.86      116.2897 *
 -0.5         0.85      115.8353 *
 -0.4         0.85      115.2121 *
 -0.3         0.84      114.4228
 -0.2         0.84      113.4711
 -0.1         0.83      112.3612
  0.0         0.83      111.0980
  0.1         0.82      109.6871

< - Best Lambda
* - 95% Confidence Interval
+ - Convenient Lambda

In this example the maximum likelihood estimator (mle) of λ is -0.8.
o Transformed $y = y^{-0.8}$
o This λ maximizes the probability of having seen these data for a model with normal, equal variance errors.
A "convenient", almost as good transformation is λ = -1.0, the reciprocal transformation to rates.
o $y = y^{-1.0} = 1/y$.
The confidence interval for λ (-1.2 to -0.4, the starred rows in the table) is found using a likelihood ratio test as in Stat 5572.
o Include any model in the confidence interval such that the Log Likelihood, essentially Ln(Probability of Data given model), differs from the maximized Ln(Probability of Data given model) = Ln(Probability of Data given model with λ = -0.8) by no more than half of the 95th percentile of a Chi-Square distribution with 1 df. Since a Chi-Square distribution with 1 df is $Z^2$, this is $0.5 \times (1.96)^2 = 0.5 \times 3.84 = 1.92$.
The plots for Y = Rate show little, if any, interaction and similar variances. See plots below.
With a model with little if any interaction, we can make more general conclusions about how these poisons act and generalize our results to new situations.
Interactions do not violate any assumptions for the model. But they do complicate summarizing and generalizing results.
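The likelihood ratio interval described above can be reproduced from the tabulated log likelihoods. This is a minimal Python sketch under the assumption that only the grid values printed by Transreg are considered, so the endpoints are grid points and not interpolated values.

```python
# Log likelihoods copied from the Transreg Box-Cox output, keyed by lambda.
loglik = {
    -1.8: 108.8756, -1.7: 110.3243, -1.6: 111.6391, -1.5: 112.8140,
    -1.4: 113.8431, -1.3: 114.7210, -1.2: 115.4425, -1.1: 116.0033,
    -1.0: 116.3993, -0.9: 116.6277, -0.8: 116.6862, -0.7: 116.5736,
    -0.6: 116.2897, -0.5: 115.8353, -0.4: 115.2121, -0.3: 114.4228,
    -0.2: 113.4711, -0.1: 112.3612, 0.0: 111.0980, 0.1: 109.6871,
}

# Half of the 95th percentile of a chi-square with 1 df, using chi-square = Z**2.
cutoff = 0.5 * 1.96 ** 2

# The mle is the lambda with the largest log likelihood.
best_lambda = max(loglik, key=loglik.get)
max_loglik = loglik[best_lambda]

# Keep every lambda whose log likelihood is within the cutoff of the maximum.
in_interval = sorted(
    lam for lam, ll in loglik.items() if max_loglik - ll <= cutoff
)
ci = (in_interval[0], in_interval[-1])

print("Best lambda:", best_lambda)
print("Cutoff:", round(cutoff, 2))
print("95% CI on the grid:", ci)
print("Reciprocal (lambda = -1) inside CI:", -1.0 in in_interval)
```

Because λ = -1 falls inside the interval, the reciprocal transformation to rates is consistent with the data while being easier to interpret than a power of -0.8.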
Interaction Plot: There is not much interaction.
o Much less than on the time scale.
The standard errors, and thus the standard deviations, are fairly similar.
Residual Plots: The variances appear fairly similar with no outliers.
Residuals vs Predicted
Residuals vs Poison
Residuals vs Pretreatment
The residuals are reasonably normal. There was no need to look at the normal plot in the time scale, since the variances weren't constant anyway.

    libname anova 'C:\Documents and Settings\rregal\My Documents\5411\5411_2009';
    options nodate;
    ods pdf file='C:\Documents and Settings\rregal\My Documents\5411\5411_2009\poison.pdf';

    * Interaction plot for time with standard error bars;
    symbol1 interpol=stdm1tj value=none;
    proc gplot data=anova.poison;
      plot time*poison=pretreat;
    run;

    proc glm data=anova.poison;
      class poison pretreat;
      model time = poison pretreat poison*pretreat;
      output out=time_resid r=resid p=predicted;
      ods exclude classlevels NObs; * Cut down the output some;
    run;

    * Residuals versus predicted values for time;
    symbol1 interpol=none value=dot height=0.07in;
    proc gplot data=time_resid;
      plot resid*predicted;
    run;

    * Check for transformation;
    proc transreg data=anova.poison;
      model boxcox(time / lambda = -2 to 2 by 0.10) = class(poison pretreat poison*pretreat);
    run;

    * Now for rate analysis;
    symbol1 interpol=stdm1tj value=none;
    proc gplot data=anova.poison;
      plot rate*poison=pretreat;
    run;

    proc glm data=anova.poison;
      class poison pretreat;
      model rate = poison pretreat poison*pretreat;
      ods exclude classlevels NObs; * Cut down the output some;
      lsmeans poison / tdiff pdiff CL;
      output out=rate_resid r=resid p=predicted;
    run;

    * Residual plots for rate;
    symbol1 interpol=none value=dot height=0.07in;
    proc gplot data=rate_resid;
      plot resid*(predicted poison pretreat);
    run;

    * Normality check for the rate residuals;
    ods graphics on;
    proc univariate data=rate_resid plot normal;
      var resid;
      ods select testsfornormality probplot; * Select just some of the output;
      probplot;
    run;
    ods graphics off;

    ods pdf close;