eNote 6: Model Diagnostics

Evan Hardy
5 years ago

Contents

6 Model Diagnostics
    Introduction
    Linear model diagnostics
        Which model to use?
        The model assumptions
        Residuals
        Normality investigation
        Checking for variance homogeneity
        Checking for variance homogeneity in new model for transformed data
        Outliers
        Check for influential observations
    Data transformation and back transformation
        Specific alternative distributions
        Box-Cox transformations
        Back transformation
    Model diagnostics in the mixed linear model
        Check for random effects normality
    Drying of beech wood data - a case study, part II
        Factor structure and basic model revisited
    Explorative analysis on transformed data
    Test of overall effects/model reduction for transformed data
    Post hoc analysis and summarizing the results for the transformed data
        Estimates of the variance parameters
        Estimates of the fixed parameters
        Error bars in plots
        Comparisons of the fixed parameters
    Exercises

6.1 Introduction

As should be clear by now, mixed linear models are based on a number of assumptions about distributions etc. As for linear models without random effects, it is important to check these assumptions as far as possible.

For various reasons it is both convenient and useful to discuss model diagnostics in purely fixed effect models first. In mixed linear models with no further model structure on the residual part, the assumptions for the residual part are exactly the same as the assumptions for the residuals in a purely fixed model. Moreover, both theory and R software options are more easily accessible for the usual linear model than for mixed linear models.

So in many cases we can choose between two approaches. Either we partly embed the model control into linear models (ANOVA/regression) without random effects, by considering the model in which all random effects are treated as fixed. Or we subsequently extract the mixed model residuals and influence measures and produce the same diagnostic plots for these. So after a section on simple linear model diagnostics we turn to the specifics of the mixed linear models again.

6.2 Linear model diagnostics

This section contains only information that could be part of any course on basic statistical analysis.

Which model to use?

As was pointed out in the analysis of the beech wood data in Module 3, one step of the overall approach is to try to simplify a (possibly complex) starting model to a (simpler) final model. This leaves us with the decision of which of these models we want to run through the model control machine. In the beech wood example it corresponds to basing the model diagnostics on either the (fixed effect starting) model given by

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + \delta(\text{plank}_i) + \varepsilon_i,$$

or the (fixed effect final) model given by

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \delta(\text{plank}_i) + \varepsilon_i.$$

There is no clear answer to this question! Note that both of these models are purely fixed effect models, where the plank effect for now is modelled as a fixed effect. The purpose here is solely to do some model diagnostics.

Since the process of going from the starting model to the final model uses the model assumptions, one would be inclined to use the starting model: then no time is wasted on model reductions that would have to be discarded anyway if a model check in the reduced model shows that this model does not really hold. However, if large models (compared to the number of observations) are specified, the information in the data about the model assumptions can be rather weak. So as a general approach, we recommend carrying out the model control primarily in a (preliminary) reduced model, and then redoing the model reduction analysis if required.

The model assumptions

The classical assumptions for linear normal models (without random effects) are the following:

1. The model structure should capture the systematic effects in the data.
2. Normality of residuals
3. Variance homogeneity of residuals
4. Independence of residuals

It is recommended always to check whether these assumptions appear to be fulfilled for the situation in question. The independence assumption may not always be easily checked, although for some data situations methods are available, e.g. for repeated measures data. We will return to this in later modules. The assumption in 1. is particularly an issue when regression terms (quantitative factors) enter the model. For the classical (x, y) linear regression model this corresponds to the assumption of linearity between x and y.

Apart from the formal assumptions it is important to focus on the possibility of:

A. Outliers
B. Influential observations

Residuals

The assumptions may be investigated by constructing the predicted (expected) and residual values from the model. For the final main effects model for the beech wood data, it would amount to constructing:

$$\hat{y}_i = \hat{\mu} + \hat{\alpha}(\text{width}_i) + \hat{\beta}(\text{depth}_i) + \hat{\delta}(\text{plank}_i)$$

and

$$\hat{\varepsilon}_i = y_i - \hat{y}_i$$

In fact, it turns out that the (theoretical) variances of these residuals are generally not homogeneous (even under the model assumption of homogeneous variance)! This is because the residuals are not the real error terms $\varepsilon_i$ but only estimated versions of those. The variance becomes:

$$\mathrm{Var}(\hat{\varepsilon}_i) = \sigma^2 (1 - h_i)$$

where $\sigma^2$ is the model error variance and $h_i$ is the so-called leverage for observation $i$. We will not give the exact definition of the leverage here, but just point out that the leverage is a measure (between 0 and 1) of the distance from the $i$th observation to the typical (mean) observation, using only the X-information of the model. In a simple regression setting the leverage is, up to an additive constant and a scaling, equivalent to $(x_i - \bar{x})^2$. For pure ANOVA models the leverage has a less clear interpretation, and in fact for some cases, like the example here with balanced data, the leverage is actually the same for all observations. So the effect of constructing a nice experimental design, combined with the luck of avoiding missing values, induces a situation in which no observations are more atypical/strange than others. High leverage (atypical) observations are potentially highly influential on the results of the analysis, and we do not want our conclusions to be based on only one or very few observations.

To account for the difference in variances in the residuals, we use instead the standardized residuals defined by:

$$\tilde{\varepsilon}_i = \frac{y_i - \hat{y}_i}{\hat{\sigma}\sqrt{1 - h_i}}$$

and these are given directly by R for us to study in various ways:

1. Normality investigation (histogram, probability/quantile plots, significance tests)
2. Plot of residuals versus predicted values
3. Plot of residuals versus the values/levels of quantitative/qualitative factors in the model.

From now on, when we say residuals we mean the standardized residuals. R provides easy access to some of the core diagnostic plots through the plot function, as is illustrated in Figure 6.1.

Normality investigation

In figure 6.1 (upper right) we see that the residuals for the example seem to be symmetrically distributed and that the normal distribution seems to fit quite well, apart maybe from a few extremely small and large values. It is possible, and often easily done, to compute different significance tests for normality, e.g. the following:

    # Read the beech wood data and code the design variables as factors
    planks <- read.table("planks.txt", header = TRUE, sep = ",")
    planks$plank <- factor(planks$plank)
    planks$depth <- factor(planks$depth)
    planks$width <- factor(planks$width)

    # Main effects model with plank treated as a fixed effect
    lm1 <- lm(humidity ~ depth + plank + width, data = planks)

    # The four basic diagnostic plots
    par(mfrow = c(2, 2))
    plot(lm1, which = 1:4)
    par(mfrow = c(1, 1))

[Figure panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Cook's distance]

Figure 6.1: The four basic diagnostic plots by R

    Test                  Statistic   P value
    Shapiro-Wilk
    Kolmogorov-Smirnov
    Cramer-von Mises
    Anderson-Darling

We do not give the exact definitions of these tests here, nor discuss the features of each test. We note that they seem to reject the normality assumption, although not that clearly for some of the tests.

These tests and P-values should not be given too much weight in practical data analysis. For small data sets they tend to be very weak, that is, it is generally difficult to reject the normality assumption in small data sets. But this is not the same as having proved that normality holds! For large data sets, they become sensitive to even very small deviations from normality, including deviations that, due to the central limit theorem, have no effect on the tests and confidence intervals used in the analysis. Still, the significance tests may enter as a part of a complete model diagnostics: IF they become significant for rather small data sets, you definitely know that there is a problem, and IF they are non-significant for a large data set, you can feel reasonably confident that everything is OK.

The presence of some extreme observations is seen in the Normal Q-Q plot in figure 6.1 (upper right), where it is clear that 5 residuals are too small while 4 residuals are too large compared to what could be expected under the normality assumption. But other than that the distribution fits a normal distribution nicely. So we seem to have a number of outliers; see below for a discussion of outliers.

Checking for variance homogeneity

To check for variance homogeneity we plot the residuals versus the predicted values and versus the values/levels of quantitative/qualitative factors in the model.

The former is what the default diagnostic plot gives us: in the two left hand side plots of the residuals versus the predicted values, figure 6.1 (top left and bottom left), it is investigated whether the variance depends on the mean ("on the size of the observations"). We actually see a typical trumpet shape, indicating that the variance increases with increasing mean. In that case one would consider a transformation like the log or something similar. There also seems to be some systematic deviation from zero, where the residuals to the left and right are more typically positive and the middle ones more typically negative. This is highly disturbing, as such patterns indicate that important structures in the data were not accounted for, cf. assumption 1.
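The trumpet shape and the effect of a log transformation can be illustrated on simulated data. The following is only a sketch under stated assumptions: the data are artificial (a multiplicative error structure chosen for illustration, not the beech wood data), and the names `x`, `y` and `spread` are hypothetical.

```r
# Artificial data with multiplicative errors: the spread grows with the mean
set.seed(1)
x <- rep(1:10, each = 20)
y <- exp(0.3 * x + rnorm(length(x), sd = 0.3))

fit_raw <- lm(y ~ factor(x))        # model on the original scale
fit_log <- lm(log(y) ~ factor(x))   # same model on the log scale

# Residual standard deviation within each group
spread <- function(fit) tapply(resid(fit), x, sd)
s_raw <- spread(fit_raw)
s_log <- spread(fit_log)
round(cbind(raw = s_raw, log = s_log), 2)

# Ratio of largest to smallest group spread, before and after the transformation
c(raw = max(s_raw) / min(s_raw), log = max(s_log) / min(s_log))
```

On the raw scale the group spreads grow with the group means, which is what the trumpet shape in a residuals versus fitted plot shows; on the log scale the spreads are of comparable size.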

Note that since the root-absolute residuals in the bottom left plot are standardized, we can view the size of these residuals in the light of the standard normal distribution: approximately 5% of the residuals should be larger than $\sqrt{2} \approx 1.4$ and only about 0.3% should be larger than $\sqrt{3} \approx 1.7$. So for a data set with 300 observations you would expect at most about one residual beyond this number.

Plots of the residuals against other factors in the model we make ourselves, by extracting either the raw residuals:

    rawresiduals <- resid(lm1)

or the standardized ones:

    standresid <- rstandard(lm1)

or even the so-called studentized ones:

    studresid <- rstudent(lm1)

where the residual error variance $\hat{\sigma}^2_{(i)}$, estimated without using the $i$th observation, is used instead of the usual $\hat{\sigma}^2$ in the standardization of the residuals.

The three plots of residuals versus the factor levels investigate whether there is any group dependent variance heterogeneity. For the five depth groups, figure 6.2 (bottom right), the variances look similar; for the three width groups, figure 6.2 (bottom left), there may be a tendency that the middle width has lower variability; and for the 20 planks, figure 6.2 (top right), there do seem to be clear differences in variability. However, since the residuals versus predicted plot indicated some severe problems, we should not worry too much about these other potential variance heterogeneities, since these may very well change in the process of fixing the problem.

The same goes for the outlier and influence investigation. So apart from the exclusion of erroneous and explainable extreme observations, we should postpone the outlier/influence investigation until the investigation of normality and variance homogeneity is completed. Outlying observations in an inadequate model may turn out to be reasonable observations in a more proper model!
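The relation between raw, standardized and studentized residuals can be checked by hand. The sketch below is an illustration only: it assumes the built-in `warpbreaks` data instead of the planks data, and it uses the standard leave-one-out identity for the residual variance, which is not stated in the text above.

```r
# Illustration on the built-in warpbreaks data (balanced two-factor design)
fit <- lm(breaks ~ wool + tension, data = warpbreaks)

e <- resid(fit)             # raw residuals
h <- hatvalues(fit)         # leverages h_i
s <- summary(fit)$sigma     # residual standard deviation
n <- nrow(warpbreaks)
p <- length(coef(fit))

# Standardized residuals: e_i / (sigma_hat * sqrt(1 - h_i))
std_by_hand <- e / (s * sqrt(1 - h))

# Studentized residuals: sigma estimated without observation i, using
# s_(i)^2 = ((n - p) * s^2 - e_i^2 / (1 - h_i)) / (n - p - 1)
s_loo <- sqrt(((n - p) * s^2 - e^2 / (1 - h)) / (n - p - 1))
stud_by_hand <- e / (s_loo * sqrt(1 - h))

all.equal(std_by_hand, rstandard(fit))
all.equal(stud_by_hand, rstudent(fit))

# In this balanced design all leverages are equal
range(h)
```

Because the design is balanced, the leverage is the same for every observation, so here the standardized residuals are just a rescaled version of the raw ones.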

    par(mfrow = c(2, 2))
    # Studentized residuals versus predicted values and versus each factor
    plot(studresid ~ predict(lm1))
    with(planks, plot(studresid ~ plank, col = heat.colors(20)))
    with(planks, plot(studresid ~ width, col = rainbow(3)))
    with(planks, plot(studresid ~ depth, col = rainbow(5)))
    par(mfrow = c(1, 1))

[Figure panels: studresid versus predict(lm1); studresid versus plank; studresid versus width; studresid versus depth]

Figure 6.2: Residuals versus predicted and factor levels

Checking for variance homogeneity in new model for transformed data

In a section below the beech wood data is reconsidered by including some possible effects that were forgotten in the first place, together with a log-transformation of the data. In the remainder of this module we consider the (studentized) residuals from the model now obtained:

    planks$loghum <- log(planks$humidity)
    lm3 <- lm(loghum ~ depth * plank + depth * width + plank * width, data = planks)
    studresid <- rstudent(lm3)

Figure 6.3 is a reproduction of figure 6.1 on the new residuals. The significance tests become:

    Test                  Statistic   P value
    Shapiro-Wilk
    Kolmogorov-Smirnov                >
    Cramer-von Mises                  >
    Anderson-Darling                  >

In R such tests could be produced by e.g. the nortest package:

    par(mfrow = c(2, 2))
    plot(lm3, which = 1:4)

[Figure panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Cook's distance]

Figure 6.3: The four basic diagnostic plots for the extended model and log-transformed data by R

    par(mfrow = c(2, 2))
    plot(studresid ~ predict(lm3))
    with(planks, plot(studresid ~ plank, col = heat.colors(20)))
    with(planks, plot(studresid ~ width, col = rainbow(3)))
    with(planks, plot(studresid ~ depth, col = rainbow(5)))
    par(mfrow = c(1, 1))

[Figure panels: studresid versus predict(lm3); studresid versus plank; studresid versus width; studresid versus depth]

Figure 6.4: Residuals versus predicted and factor levels for the extended model and log-transformed data

    shapiro.test(rawresiduals)    # Shapiro-Wilk normality test

    library(nortest)
    lillie.test(rawresiduals)     # Lilliefors (Kolmogorov-Smirnov) normality test
    cvm.test(rawresiduals)        # Cramer-von Mises normality test
    ad.test(rawresiduals)         # Anderson-Darling normality test

Now histogram, probability plot and significance tests all support the assumption of normality. The residuals versus predicted plot, figure 6.4, now has a much nicer pattern.

The trumpet shape and the systematic structure have disappeared, and none of the root-absolute residuals are larger than $\sqrt{3} \approx 1.7$. The plots of residuals versus the factor levels of depth and width indicate no problems with heterogeneity, but there may still be slight differences in residual variability between the planks.

Outliers

An outlier is intuitively defined as: an observation that deviates unusually much from its expected value. What is unusual is determined by the (estimated) probability distribution of the model. The general approach for handling outliers is the following:

1. Identify the outlying observations.
2. Check whether some of these may be due to errors, or may be explained and excluded for some external/atypical reasons: maybe it turns out that a single plank was treated in some extreme way not representative of what is investigated.
3. Investigate the influence of non-explainable outliers.

In practice, we are often left with some extreme observations that we cannot exclude for any of the reasons given in 2. The only thing left is to investigate whether such extreme observations have an important influence on the results of the analysis. This is done by redoing the analysis leaving out the extreme observations and comparing with the original results. In fact, there are statistics, easily extracted, that can do this automatically for us.

There are no indications of outliers in the new model for the log-transformed data.

Check for influential observations

As a final step we investigate the influence of observations on the results of the analysis. A measure of influence is given by the change in the expected (predicted) value of the model when an observation is left out:

$$f_i = \frac{\hat{y}_i - \hat{y}_{(i)}}{\hat{\sigma}_{(i)}\sqrt{h_i}}$$

where $\hat{y}_{(i)}$ is the model predicted value without using the $i$th observation, and $\hat{\sigma}^2_{(i)}$ is the residual error variance estimated without using the $i$th observation. This is called the dffit value. Another measure is the Cook's distance:

$$D_i = \frac{\sum_{j=1}^{n} \left(\hat{y}_j - \hat{y}_{j(i)}\right)^2}{p\,\hat{\sigma}^2}$$

where $p$ is the number of parameters in the model. Similarly one could measure how much each individual parameter estimate in the model would change by leaving out an observation. All these values are directly extracted by the influence.measures function, illustrated below on a simple model to make the idea easier to grasp:

    lm0 <- lm(loghum ~ width, data = planks)
    infl.lm0 <- influence.measures(lm0)
    dim(infl.lm0$infmat)
    # columns: dfb.1_, dfb.wdt2, dfb.wdt3, dffit, cov.r, cook.d, hat
    head(infl.lm0$infmat)

The first columns give the so-called DFBETAS values, the measure of how much an observation has affected each parameter estimate. Next the DFFITS are given. Then comes the so-called COVRATIO, a measure of the impact of each observation on the variances (and standard errors) of the parameter estimates (= regression coefficients) and their covariances. Then the Cook's distance is given, and at the end the leverage values ($h_i$).

Such measures would usually be plotted versus the observation number, see the lower right corner of figure 6.3, where R plots the Cook's distance as one of its default plots. There are no clear rules for the size of the Cook's distance - although you can find a number of very different rules of thumb out there - but look out for one or a few extreme ones. If one or more observations were extreme in this sense, we would have to investigate in more detail which exact parts of the conclusion are influenced and in what way (by comparing model/test results with/without the observation). It could also be investigated whether such influential observations group in any particular way: it may be that the observations from an entire plank are influential, and it could then be relevant to study the effect of leaving out all the observations from this plank.

6.3 Data transformation and back transformation

If the assumptions of normality and/or constant variance are not fulfilled, based on an inspection of the standardized residuals, then the problem can often be solved by transforming the response variable and then considering a mixed linear model for the transformed variable.

How should one then go about choosing a transformation? Of course one could just try a given transformation and then see whether the assumptions behind the linear model look better fulfilled after the transformation. It would, however, be more satisfying to have a more constructive approach.

With some experience one can often see from the plot of the standardized residuals against the predicted values which transformation is needed. If the picture is a fan opening to the right ("trumpet-shaped"), then typically a log-transformation or a power transformation with an exponent below 1 (e.g. a square root) is what is called for. If the fan opens to the left, an exponential transformation or a power transformation with an exponent above 1 often helps.

There are some more systematic approaches to the choice of transformation, and in this section we will consider two such approaches.

Specific alternative distributions

If the observations are more naturally described by a distribution other than the normal, then the variance may vary with the mean. For example, in the Poisson distribution the variance equals the mean. In such cases one can often successfully transform the data and then describe the transformed data by the normal distribution, which has a constant variance. This is often preferable, as the normal distribution is well understood and many results concerning distributions of estimates and test statistics are exact. In the following table such transformations are given for some common distributions:

    Distribution     Variance           Scale            Transformation
    Binomial         $\mu(1 - \mu)$     interval (0,1)   $\arcsin(\sqrt{y})$
    Poisson          $\mu$              Positive         $\sqrt{y}$
    Gamma            $\mu^2$            Positive         $\log(y)$
    Inverse Gauss    $\mu^3$            Positive         $1/\sqrt{y}$

Box-Cox transformations

When another distribution for the observations is not obvious, which is usually the case, one could try and look for a power transformation. This only works for positive data, but can be applied to all data if a constant is added to all observations. The idea is, instead of using a linear model for the observations $Y_1, \ldots, Y_N$, to analyze $Z_1, \ldots, Z_N$, where

$$Z_i = \begin{cases} Y_i^{\lambda}, & \lambda \neq 0 \\ \log Y_i, & \lambda = 0, \end{cases} \qquad \text{(6-1)}$$

using a linear model. Here $\lambda = 1$ corresponds to no transformation, $\lambda = 1/2$ to a square root transformation, and so on.

So how should $\lambda$ be determined? The most well known way was proposed by Box and Cox (1964) and is therefore known as the Box-Cox transformation. In order for the transformation to be continuous in $\lambda$, which is convenient for technical reasons, the Box-Cox transformation is written in the form

$$Z_i = \begin{cases} (Y_i^{\lambda} - 1)/\lambda, & \lambda \neq 0 \\ \log Y_i, & \lambda = 0. \end{cases}$$

In this context, however, it suffices to think of the transformation as given by (6-1). The appealing feature of this approach is that $\lambda$ is considered as a parameter along with the rest of the parameters in the linear model, and is therefore determined from the data. This is done using the method of maximum likelihood. The maximum likelihood estimate for $\lambda$ is defined as the value of $\lambda$ that maximizes the likelihood function, or the log likelihood function, which in this case is given by

$$l(\lambda) = -\frac{N}{2} \log SS_e(\lambda) + (\lambda - 1) \sum_{i=1}^{N} \log Y_i,$$

where $SS_e(\lambda)$ is the residual sum of squares corresponding to the linear model for the observations transformed by $\lambda$. Let $\hat{\lambda}$ denote the maximum likelihood estimate of $\lambda$. The hypothesis $H_0: \lambda = \lambda_0$ can be tested using the test statistic $2(l(\hat{\lambda}) - l(\lambda_0))$, which is approximately $\chi^2(1)$-distributed.

Large values of the test statistic are critical for the hypothesis. An approximate $100(1 - \alpha)\%$ confidence interval is given by the set of $\lambda$-values satisfying $2(l(\hat{\lambda}) - l(\lambda)) \leq \chi^2_{\alpha}(1)$, where $\chi^2_{\alpha}(1)$ denotes the $(1 - \alpha)$-quantile of the $\chi^2(1)$-distribution; in particular $\chi^2_{0.05}(1) = 3.84$.

In the package MASS there is a function boxcox which takes an lm object and by default computes the values of the log likelihood function over the range -2 to 2 of the parameter $\lambda$ in the transformation:

    model1.5 <- lm(humidity ~ depth * plank + depth * width + plank * width, data = planks)
    library(MASS)
    par(mfrow = c(1, 2))
    plot(boxcox(model1.5))

[Figure: log-likelihood as a function of λ with the 95% limit marked (left); boxcox(model1.5)$y versus boxcox(model1.5)$x (right)]
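The maximum likelihood estimate and the approximate confidence interval for λ can also be extracted numerically from the values returned by boxcox. The following is a minimal sketch, assuming the built-in `trees` data rather than the planks data; the grid step of 0.01 and the names `bc`, `lambda_hat`, `cutoff` and `ci` are choices made for this illustration.

```r
library(MASS)

# Illustration on the built-in trees data (positive response)
fit <- lm(Volume ~ Height + Girth, data = trees)

# Profile log likelihood on a fine grid, without plotting
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Grid value maximizing the log likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# 95% interval: all lambda with 2 * (l(lambda_hat) - l(lambda)) <= qchisq(0.95, 1)
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y >= cutoff])

lambda_hat
ci
```

If the interval contains a simple value such as 0 (log) or 1/2 (square root), it is common practice to use that value rather than the exact estimate, since it is easier to interpret and to back transform.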

Back transformation

Working with transformed data has the disadvantage that one would often prefer to present the results on the original scale rather than on the transformed scale, which means that some kind of back transformation is required. However, not all quantities are easily back transformed with meaningful interpretations for any kind of transformation.

We suggest using simple back transformations of estimates/lsmeans. If $\hat{\mu}$ is an estimate computed for the log transformed data, then use the inverse log, the exponential $\exp(\hat{\mu})$, as an estimate on the original scale. It should be noted that this is a biased estimate of the expected value on the original scale. In fact it is an estimate of the median, but this also seems like a more natural quantity to estimate when taking into consideration that the distribution on the original scale is not symmetric.

A 95% confidence interval is easily obtained by calculating it on the transformed scale and then transforming the endpoints of the interval back. Note that such an interval is not symmetric, reflecting the asymmetric distribution on the original scale.

6.4 Model diagnostics in the mixed linear model

The residual errors $\varepsilon_i$ in the mixed models seen so far in this course are subject to the same assumptions as the residual errors in a systematic linear model. In later modules on repeated measures data the focus will be on more general modelling of the residual error covariance structure, and specific tools for the investigation of this will be given.

The estimated residuals can be defined within the mixed model using the predicted values (BLUPs) for the random effects, such that they are given by (using the vector-matrix notation of the theory module):

$$r = y - \left( X\hat{\beta} + Z\hat{u} \right)$$

In general the BLUPs of the mixed model and the parameter estimates of the corresponding fixed effects model will be different: the BLUPs are shrinkage versions of the fixed effects parameters (the most extreme values among the levels of a random factor become less extreme). However, the difference is often not pronounced, and because of the more complicated model structure the standardization of the residuals becomes more complicated. The raw residuals can easily be extracted also from lmer results:

    library(lmerTest)
    lmer3 <- lmer(loghum ~ depth * width + (1 | plank) + (1 | depth:plank) +
                    (1 | plank:width), data = planks)
    lmerresid <- resid(lmer3)

And now all the plots from above can be constructed manually, see Figure 6.5 and Figure 6.6. Only the basic residual versus fitted plot is produced automatically by the plot function. Influence measures for lmer results can be extracted by the influence function of the influence.ME package, see Figure 6.7 for how to extract and plot the Cook's distances.

Check for random effects normality

Until now we have based the model diagnostics on the residuals. This means that we have only investigated the normality assumption for the residual error: $\varepsilon_i \sim N(0, \sigma^2)$. It is the hope that a choice of transformation based on the structures in the residuals will also stabilize the random effects distributions, but this is not in any way guaranteed. However, in the mixed model used below, we assume that the effects due to planks and plank interactions are also normally distributed:

$$d(\text{plank}_i) \sim N(0, \sigma^2_{\text{Plank}}), \quad f(\text{width}_i, \text{plank}_i) \sim N(0, \sigma^2_{\text{Plank} \times \text{width}})$$

and

$$g(\text{depth}_i, \text{plank}_i) \sim N(0, \sigma^2_{\text{Plank} \times \text{depth}}), \quad \varepsilon_i \sim N(0, \sigma^2)$$

We investigate this by looking at the BLUPs for the random effects, cf. figure 6.8, where there is no indication of any severe lack of normality. This approach only makes sense if the number of levels for a factor is not too small. And the really big flaw: IF we have problems with normality of the random effects, we would not really know how to cope with this!
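The shrinkage property of the BLUPs, and a normal probability plot of them, can be illustrated without any mixed model software in the simplest possible case. The sketch below is an assumption-laden illustration, not the analysis of the beech wood data: it uses simulated data from a balanced one-way random effects model, for which the variance components can be estimated from the ANOVA mean squares and the BLUPs have a closed form. All names (`k`, `n`, `shrink`, `blup`, ...) are hypothetical.

```r
# Simulated balanced one-way layout: k planks with n observations each
set.seed(7)
k <- 20
n <- 5
plank <- factor(rep(1:k, each = n))
y <- 3 + rep(rnorm(k, sd = 0.5), each = n) + rnorm(k * n, sd = 0.3)

# ANOVA (moment) estimates of the two variance components
tab <- anova(lm(y ~ plank))
ms_plank <- tab["plank", "Mean Sq"]
ms_error <- tab["Residuals", "Mean Sq"]
s2_plank <- (ms_plank - ms_error) / n

# Fixed effects view: deviation of each plank mean from the overall mean
fixed_dev <- tapply(y, plank, mean) - mean(y)

# BLUP in the balanced one-way model: the same deviations, shrunk towards zero
shrink <- n * s2_plank / (n * s2_plank + ms_error)
blup <- shrink * fixed_dev

shrink
# Normal probability plot of the predicted random effects
qqnorm(blup)
qqline(blup)
```

The shrinkage factor lies between 0 and 1 and moves towards 1 when the plank variance dominates the residual variance, which is why the difference between BLUPs and fixed effect estimates is often not pronounced.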

    par(mfrow = c(1, 2))
    plot(sqrt(abs(lmerresid)) ~ predict(lmer3))
    qqnorm(lmerresid)

[Figure panels: sqrt(abs(lmerresid)) versus predict(lmer3); Normal Q-Q plot (sample quantiles versus theoretical quantiles)]

Figure 6.5: Residuals versus predicted and Normal QQ plot for the mixed model

    par(mfrow = c(2, 2))
    plot(lmerresid ~ predict(lmer3))
    with(planks, plot(lmerresid ~ plank, col = heat.colors(20)))
    with(planks, plot(lmerresid ~ width, col = rainbow(3)))
    with(planks, plot(lmerresid ~ depth, col = rainbow(5)))

[Figure panels: lmerresid versus predict(lmer3); lmerresid versus plank; lmerresid versus width; lmerresid versus depth]

Figure 6.6: Residuals versus predicted and factor levels for the mixed model

    library(influence.ME)
    lmer3.infl <- influence(lmer3, obs = TRUE)
    par(mfrow = c(1, 1))
    plot(cooks.distance(lmer3.infl))

[Figure: cooks.distance(lmer3.infl) versus observation index]

Figure 6.7: Cook's distance for the mixed model
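To see what these influence measures actually compute, the definitions of the dffit value and the Cook's distance given in the section on influential observations can be checked by brute force for an ordinary lm fit. This is a sketch only, assuming the built-in `warpbreaks` data and a plain linear model rather than the mixed model for the planks data; the object names are hypothetical.

```r
# Illustration on the built-in warpbreaks data
fit <- lm(breaks ~ wool + tension, data = warpbreaks)
n <- nrow(warpbreaks)
p <- length(coef(fit))
s2 <- summary(fit)$sigma^2
h <- hatvalues(fit)
yhat <- fitted(fit)

# Refit the model n times, each time leaving out one observation
loo <- t(sapply(seq_len(n), function(i) {
  fit_i <- lm(breaks ~ wool + tension, data = warpbreaks[-i, ])
  yhat_i <- predict(fit_i, newdata = warpbreaks)
  c(cook  = sum((yhat - yhat_i)^2) / (p * s2),
    dffit = (yhat[[i]] - yhat_i[[i]]) / (summary(fit_i)$sigma * sqrt(h[[i]])))
}))

# Compare with the values R computes without refitting
all.equal(loo[, "cook"], unname(cooks.distance(fit)))
all.equal(loo[, "dffit"], unname(dffits(fit)))
```

R does not refit the model for each observation; it uses closed-form expressions based on the residuals and leverages, which give the same numbers.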

    par(mfrow = c(1, 3))
    qqnorm(ranef(lmer3)$`depth:plank`[, 1])
    qqnorm(ranef(lmer3)$`plank:width`[, 1])
    qqnorm(ranef(lmer3)$plank[, 1])

[Figure panels: three Normal Q-Q plots (sample quantiles versus theoretical quantiles), one for each random effect]

Figure 6.8: Random effects normal probability plot

6.5 Drying of beech wood data - a case study, part II

In this example section we complete the analysis of this data set. In Module 3 we completed an analysis, but without checking the model assumptions. In the main part of this Module 6 we found, based on the plot of residuals versus predicted values, that something was clearly wrong.

Factor structure and basic model revisited

When there are possible problems with the model, as in this case, it is also important to consider whether we actually included all possible effects in the model. In fact, in the classical randomized block analysis carried out in Module 3, we ignored the possibility of an interaction effect between planks and widths or between planks and depths. The average profile plots of the log-transformed data are given here as figure 6.9. The patterns in the two top plots provide information about these two interactions. The depth patterns seem to be rather parallel, whereas some clear deviations from parallel patterns are seen for the width humidity structures. Only a statistical analysis can reveal whether these effects are significant. Since the plank effect is considered random, the interactions with plank should also be considered random. So including these two would correspond to the model given by the factor structure in figure 6.10, or expressed formally:

$$\log Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + d(\text{plank}_i) + f(\text{width}_i, \text{plank}_i) + g(\text{depth}_i, \text{plank}_i) + \varepsilon_i, \qquad \text{(6-2)}$$

where

$$d(\text{plank}_i) \sim N(0, \sigma^2_{\text{Plank}}), \quad f(\text{width}_i, \text{plank}_i) \sim N(0, \sigma^2_{\text{Plank} \times \text{width}})$$

and

$$g(\text{depth}_i, \text{plank}_i) \sim N(0, \sigma^2_{\text{Plank} \times \text{depth}}), \quad \varepsilon_i \sim N(0, \sigma^2)$$

6.6 Explorative analysis on transformed data

Since a log-transformation also affects the structure of the data, it will generally be a good idea to redo some of the explorative plots of the raw data. In this case it does not change much, so we do not give any further plots.

[Figure panels: four profile plots of mean of loghum versus width and depth]

Figure 6.9: Four average log-humidity profiles

[Figure: factor structure diagram with the factors [I], [width×plank], [depth×plank], [plank], width×depth, width and depth]

Figure 6.10: The factor structure diagram

6.7 Test of overall effects/model reduction for transformed data

Although we have moved on to use lmerTest or Anova for handling the mixed model, we have also worked with lm for model diagnostics. ANOVA on the lm results can also be used to provide certain F-tests for random effects (NOT given by lmerTest), for instance for those random effects belonging to the error stratum, i.e. those random effects with an arrow directly from [I] in the factor structure diagram. Furthermore, a comparison of the results of the fixed ANOVA with the results of lmerTest will hopefully support the subject matter understanding. The ANOVA table from the full fixed effects model (as given by both anova and Anova, since the data is balanced) is:

    Source of variation   DF   Sums of squares   Mean squares   F          P-value
    depth                                                       (217.14)   (<.0001)
    width                                                       (80.86)    (<.0001)
    depth*width
    plank                                                       (97.34)    (<.0001)
    depth*plank
    width*plank                                                            <.0001
    Error

The F-statistics (and P-values) NOT to be used are put in parentheses. To compare, the table of fixed effects from the mixed model analysis corresponding to the model given by (6-2) is:

    Source of variation   Numerator degrees of freedom   Denominator degrees of freedom   Mean squares   P-value
    depth                                                                                                <.0001
    width
    depth*width

From the full ANOVA table we see that the interaction between width and plank is clearly significant, whereas the depth*plank interaction is on the limit. Note that the test for the depth*width interaction is the same in both tables. As opposed to the preliminary analysis above, this interaction seems to be significant. Also note that the denominator degrees of freedom of the tests of main effects coincide with the degrees of freedom (DF) of the corresponding plank interaction terms in the fixed model, and in fact in this case:

$$F_{\text{depth}} = \frac{MS_{\text{depth}}}{MS_{\text{depth} \times \text{plank}}} \quad \text{and} \quad F_{\text{width}} = \frac{MS_{\text{width}}}{MS_{\text{width} \times \text{plank}}}$$

So the tests of fixed effects in the mixed model could easily be derived from the fixed effects ANOVA. This will not always be the case! In summary, we cannot reduce the model, since all effects appear significant.

6.8 Post hoc analysis and summarizing the results for the transformed data

The final model is given by model (6-2).

Estimates of the variance parameters

Estimates of the four variance parameters (on log-scale) are:

$\hat{\sigma}^2_{\text{Planks}} =$ , $\hat{\sigma}^2_{\text{Plank} \times \text{width}} =$ , $\hat{\sigma}^2_{\text{Plank} \times \text{depth}} =$ , $\hat{\sigma}^2 =$

It is not at all clear how one could (or should!) back transform these values to the original scale. The confidence bands for the standard deviations are obtained the usual way:

    confint(lmer3, 1:4)
    Computing profile confidence intervals ...
           2.5 %  97.5 %
    .sig01
    .sig02
    .sig03
    .sigma

Estimates of the fixed parameters

The important estimates of the systematic part of the model are the 15 values for each combination of width and depth (the LSMEANS for the interaction). Since we have a balanced model, these are the simple averages over planks of the log-humidity within each combination. Direct back transformation (using the exponential or antilog function) of the LSMEANS is used. In this case it has the effect that the average values (medians) presented for each combination are the so-called geometric averages (using the ijk-notation):

$$\exp\left(\frac{1}{20}\sum_{k=1}^{20}\log(y_{ijk})\right) = \left(\prod_{k=1}^{20} y_{ijk}\right)^{1/20}$$

These are depicted in figure 6.11.

    # LSMEANS (on log-scale) for each combination of depth and width
    mylsmeans <- lsmeans(lmer3, "depth:width")$lsmeans.table
    # Interaction plot of the back transformed estimates, one line per width
    with(mylsmeans, interaction.plot(depth, width, exp(estimate), col = 2:4))

Figure 6.11: Back transformed expected values (x-axis: depth; y-axis: mean of exp(estimate); one line per width)
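The geometric average interpretation of the back transformed LSMEANS can be illustrated with a few numbers. This is a minimal sketch in base R; the five values in `y` are hypothetical and only stand in for the humidity measurements of one width*depth combination.

```r
# Hypothetical humidity values for one width*depth combination
y <- c(5.2, 6.1, 4.8, 7.3, 5.9)

# Back transformed mean of the log-values
back_transformed <- exp(mean(log(y)))

# Geometric average: n-th root of the product
geometric_mean <- prod(y)^(1 / length(y))

# The two agree up to rounding error
all.equal(back_transformed, geometric_mean)

# The arithmetic average is never smaller than the geometric average
mean(y) >= geometric_mean
```

The last line is a reminder that the back transformed values are not estimates of the arithmetic mean on the original scale, which is why they are better read as medians.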

Error bars in plots

For such a plot, one would usually like to add some kind of uncertainty information. But this is not always easily done in a meaningful way. A common approach in the scientific literature is to add error bars to each point showing plus/minus one or two standard errors (SE) of estimation. The use of plus/minus one SE should be avoided, since this has no real interpretation (and there is a danger that people actually think of these intervals as confidence intervals). The use of 2 standard errors corresponds approximately to the 95% confidence band for each value, which may be reasonable information to convey. In general, the proper confidence interval using the critical value from the t-distribution could just as well be used.

But still, IF the aim of the plot is to spot the significant/important differences, then these confidence intervals are NOT of much use: The eye would tend to claim two points significantly different if the bars do not overlap, and non-significant if they do overlap. This corresponds to claiming a difference if (and only if) two points are more than 4 standard errors apart. However, this is NOT correct, in this case for two reasons.

First of all, if the standard errors of two estimated values are equal, and the estimates are independent, then

$$SE(\hat{\beta}_1 - \hat{\beta}_2) = \sqrt{Var(\hat{\beta}_1 - \hat{\beta}_2)} = \sqrt{Var(\hat{\beta}_1) + Var(\hat{\beta}_2)} = \sqrt{2\,Var(\hat{\beta}_1)} = \sqrt{2}\,SE(\hat{\beta}_1)$$

and the 95% confidence interval of the difference would be approximately plus/minus twice this value. So two estimates are significantly different if (and only if) they are $2\sqrt{2} = 2.83$ SEs apart, NOT 4!

Secondly, for a randomized block situation like this, the assumption of independence between estimates does NOT hold, since they are all based on the same 20 planks. For this reason the uncertainty (SE) of a difference is not easily derived from the SEs of the expected values themselves. This makes the use of the direct SE-bars approach even more questionable.
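Both reasons can be made concrete with a small calculation. The sketch below uses an arbitrary common standard error `se` and an assumed correlation `rho` between the two estimates; neither number comes from the beech wood analysis. The second part uses the general rule Var(b1 - b2) = Var(b1) + Var(b2) - 2 Cov(b1, b2).

```r
se <- 1  # common standard error of two estimates (arbitrary unit)

# Independent estimates: Var(b1 - b2) = Var(b1) + Var(b2)
se_diff_indep <- sqrt(se^2 + se^2)
threshold_indep <- 2 * se_diff_indep / se  # about 2.83 SEs, not 4

# Correlated estimates, e.g. because they share the same planks:
# Var(b1 - b2) = Var(b1) + Var(b2) - 2 * Cov(b1, b2)
rho <- 0.8  # assumed correlation, for illustration only
se_diff_corr <- sqrt(se^2 + se^2 - 2 * rho * se * se)

c(independent = se_diff_indep, correlated = se_diff_corr)
```

With a positive correlation the standard error of the difference drops below that of the independent case, which is why bars based on the SEs of the individual values can be very misleading about differences.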
In fact, in module 1 it was pointed out how the uncertainties of treatment differences are usually much smaller: The uncertainty of the humidity of a specific width*depth combination will include the extensive plank-to-plank variability, whereas the uncertainties of differences will not!

Working with transformed data adds further to the complexity in the construction of error information in the plot. In this situation we can read off from the R output the standard errors (SE) and/or the confidence limits for the 15 averages on log-scale. In general, one should not back transform standard errors. Instead it will be meaningful to back transform the limits of the confidence bands. For other transformations than the log this may not always be possible.

For the numbers in figure 6.11, error bars based on these values would overlap heavily all over, the plot would be messy, and it would give the wrong impression with respect to the significant differences between width and depth levels. And since it is not really possible to visualize all possible or relevant comparisons, we leave the plot without any further information and summarize the significances in tables in line with those presented in the preliminary results section given in module 3.

We could investigate the relevant errors for differences between levels of the interaction factor (or any other effect) by extracting these from the difflsmeans function of the lmerTest-package:

    mydifflsmeans <- difflsmeans(lmer3, "depth:width")$diffs.lsmeans.table
    head(mydifflsmeans, 10)
    hist(mydifflsmeans$"Standard Error")

[Figure: Histogram of mydifflsmeans$"Standard Error" (x-axis: mydifflsmeans$"Standard Error"; y-axis: Frequency)]

The histogram shows that there are two levels of errors: one for comparing widths within depths and another for comparing depths within widths. One could extract these numbers and use them for some kind of error bar on the plot. Since we back transform, it becomes a bit more complicated, but one could still investigate the back transformed confidence intervals and convey some kind of average confidence interval on the plot. The meaningfulness of this will depend on how much the widths of these confidence bands vary across all comparisons; the investigation will show that as well. We do not pursue this in more detail here.

Comparisons of the fixed parameters

Since the width*depth interaction is significant we should provide depth values of humidity for each level of width (and vice versa). We give the back transformed values, but use significance information from the analysis on the transformed data. If all comparisons across all levels of the interaction factor are explored and reported, one should use an overall correction method, like the Tukey-Kramer method. However, if we only compare, say, depth levels within each width level, we do not perform all possible (105) tests but only 30 tests in 3 groups of 10. (We need 10 tests to compare all 5 levels of depth.) A less restrictive correction should then probably be employed, e.g. a Bonferroni correction based on 10 tests within each set of tests. So carrying out standard t-tests, but only claiming significance if the P-value is less than 0.05/10 = 0.005, gives the following summary table:

Width 1      Width 2      Width 3
Depth   a    Depth   a    Depth   a
Depth   a    Depth   a    Depth   a
Depth   b    Depth   b    Depth   b
Depth   bc   Depth   b    Depth   b
Depth   bc   Depth   b    Depth   b

The conclusion is clearly the same as previously given, although there seems to be no clear statistical evidence of a difference between the middle depth (5) and the neighbor depths 3 and 7 - slightly so for width 1. The width effects within each depth are given by (using level 5%/3 = 1.67% in each test):

Depth 1      Depth 3      Depth 5      Depth 7      Depth 9
Width   a    Width   a    Width   a    Width   a    Width   a
Width   ab   Width   b    Width   b    Width   b    Width   ab
Width   b    Width   b    Width   b    Width   b    Width   b
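The counts of tests and the Bonferroni levels behind the two summary tables can be reproduced with a few lines of base R. This is only a sketch: the raw P-values in `p_raw` are hypothetical, and the overall level of 5% is taken from the width comparison (5%/3) stated in the text.

```r
n_comb <- 15                         # width*depth combinations
n_all <- choose(n_comb, 2)           # all pairwise comparisons: 105
n_depth <- choose(5, 2)              # 5 depths within one width: 10 tests
n_depth_total <- 3 * n_depth         # 3 widths: 30 tests in 3 groups of 10
n_width <- choose(3, 2)              # 3 widths within one depth: 3 tests

alpha <- 0.05
alpha_depth <- alpha / n_depth       # per-test level for depth comparisons
alpha_width <- alpha / n_width       # per-test level for width comparisons

# Equivalent formulation: adjust the P-values instead of the level
p_raw <- c(0.0004, 0.003, 0.02, 0.2) # hypothetical raw P-values
p_adj <- p.adjust(p_raw, method = "bonferroni", n = n_depth)
significant <- p_adj < alpha
```

Comparing raw P-values with `alpha_depth` and comparing Bonferroni adjusted P-values with `alpha` lead to the same decisions, so either form can be used when filling in the letters of the tables.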

The width*depth interaction effect is indicated by the difference in the results for the top and bottom depths compared to the rest. There seems to be a larger difference for the middle (high humidity) depths.

Confidence intervals for specific combinations or differences can be obtained by direct back transformation of the lower and upper limits from the analysis on the transformed data. For instance, consider the difference between width 2 and width 3 for depth 1. On log-scale, this difference is 0.1004 (read off directly from the difflsmeans output from the lmerTest-package) and its 95% confidence interval (without correction, also read off directly from the R output) is [ , 0.1725]. The back transformed value is then exp(0.1004) = 1.106, and the confidence interval is:

[exp( ), exp(0.1725)] = [1.029, 1.189]

Note that such a back transformed difference has a relative interpretation on the original scale: The humidity level for width 3 is estimated to be about 10% smaller than for width 2, with a 95% confidence interval from 3% to 19%.

6.9 Exercises

Exercise 1 Cookies data
Carry out model diagnostics for the analysis of the cookies data in exercise 1 in Module 2.

Exercise 2 Milk data
Carry out model diagnostics for the analysis of the milk data in exercise 2 (part c) in Module 2.

Exercise 3 Spinage data
Carry out model diagnostics for the analysis of the spinage color data in exercise 1 in Module 3.

Exercise 4 Spinage2 data
Carry out model diagnostics for the analysis of the spinage sensory data in exercise 2 in Module 3.

Exercise 5 blueberry data
Carry out model diagnostics for the analysis of the blueberry data in exercise 1 in Module 5.

Module 6: Model Diagnostics

St@tmaster 02429/MIXED LINEAR MODELS
PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE

6.1 Introduction
6.2 Linear model diagnostics