Solution pigs exercise

Course repeated measurements - R exercise class 2
November 24, 2017

Contents

1 Question 1: Import data
    1.1 Data management
    1.2 Inspection of the dataset
2 Question 2: Descriptive statistics
    2.1 Raw data
    2.2 Summary statistics
3 Question 3: Modeling the group effect
    3.1 Fitting a model using an unstructured covariance matrix
    3.2 Inference on the mean parameters
        3.2.1 Technical note: nlme does not provide very accurate results in small samples
    3.3 Inspection of the variance-covariance parameters
4 Question 5: Investigating the group effect in the first four weeks
5 Question 6: Modeling the treatment effect
    5.1 Definition of the new variables
        5.1.1 The treatment variable
        5.1.2 Number of weeks under treatment
        5.1.3 Interaction between time and treatment
    5.2 Model (a): non-parametric treatment effect
    5.3 Model (b): linear effect of the treatment
    5.4 Model (c): splitting the treatment effect into a linear effect and a non linear effect
6 Question 7: Predicted weight profiles
    6.1 Compute individual predictions
    6.2 Graphical display
    6.3 Note
7 Question 8: Estimate of the difference in weight between the groups at the end of week 7
    7.1 Model (a)
    7.2 Model (b)
    7.3 Model (c)
8 Question 10: Specification of the covariance matrix (compound symmetry vs. unstructured)
    8.1 Comparison of the model fit
    8.2 Comparison of the fitted values

NOTE: This document contains an example of R code and the related software output answering the questions of the pigs exercise. The focus here is on the implementation in R and not on the interpretation: we refer to the SAS solution for a more detailed discussion of the results.

Load the packages that will be necessary for the analysis:

    library(data.table) # data management
    library(nlme)       # implementation of models for repeated measurements (e.g. gls, lme)
    library(ggplot2)    # graphical display
    library(fields)     # graphical display: image.plot
    library(multcomp)   # test for linear hypotheses (glht function)
    library(AICcmodavg) # predictSE.gls

1 Question 1: Import data

1.1 Data management

We first specify the location of the data through a variable called path.data:

    path.data <- " jufo/courses/rm2017/vitamin.txt"

Then we use the function fread to import the dataset:

    dtl.vitamin <- fread(path.data, header = TRUE)
    str(dtl.vitamin)

    Classes 'data.table' and 'data.frame': 60 obs. of 4 variables:
     $ grp   : int
     $ animal: int
     $ week  : int
     $ weight: int
     - attr(*, ".internal.selfref")=<externalptr>

We relabel the group variable using the function factor:

    dtl.vitamin[, grp := factor(grp, levels = 1:2, labels = c("C","T"))]

convert the animal variable to a factor, and create a categorical version of the week variable (stored as character, which the modeling functions treat as a factor):

    dtl.vitamin[, animal := as.factor(animal)]
    dtl.vitamin[, week.factor := paste0("w", as.factor(week))]

1.2 Inspection of the dataset

The summary method provides useful information about the dataset:

    summary(dtl.vitamin, maxsum = 10)

     grp     animal       week           weight       week.factor
     C:30    1 :6    Min.   :1.000   Min.   :436.0   Length:60
     T:30    2 :6    1st Qu.:        1st Qu.:508.5   Class :character
             3 :6    Median :4.500   Median :565.0   Mode  :character
             4 :6    Mean   :4.333   Mean   :
             5 :6    3rd Qu.:        3rd Qu.:
             6 :6    Max.   :7.000   Max.   :
             7 :6
             8 :6
             9 :6
             10:6

We have a total of 60 observations divided into 2 groups of 30 observations each. Further insight can be obtained with the table function:

    dtl.vitamin[, table(grp, animal)]

       animal
    grp
      C
      T

Each group contains 5 animals and each animal has 6 measurements.

    dtl.vitamin[, table(animal, week.factor)]

          week.factor
    animal w1 w3 w4 w5 w6 w7

Each animal has been measured once at each of the 6 timepoints. Note that there are no missing values:

    colSums(is.na(dtl.vitamin))

            grp      animal        week      weight week.factor
              0           0           0           0           0

2 Question 2: Descriptive statistics

2.1 Raw data

We can visualize the weight variable using a spaghetti plot:

    gg.spaguetti <- ggplot(dtl.vitamin, aes(x = week.factor, y = weight,
                                            group = animal, color = animal))
    gg.spaguetti <- gg.spaguetti + geom_line() + geom_point()
    gg.spaguetti <- gg.spaguetti + facet_grid(~grp, labeller = label_both)
    gg.spaguetti <- gg.spaguetti + xlab("week")
    gg.spaguetti

Here we use ggplot2 instead of matplot since the data are in the long format. One could also convert dtl.vitamin to the wide format (e.g. using dcast) and use matplot. The ggplot2 syntax in the previous code chunk works as follows:

- first we specify the dataset to use and the variables corresponding to the x-axis and the y-axis. We also specify that the points should be grouped and colored according to the variable animal.
- second we specify how to display the data: with points and lines.
- finally we request to split the elements to be plotted into two panels according to the variable grp.

[Figure: spaghetti plot of weight (y-axis) against week (x-axis, w1 to w7), one colored line per animal, in two panels "grp: C" and "grp: T".]

2.2 Summary statistics

We can compute the mean and standard deviation of the weight for each group at each time:

    dt.descriptive <- dtl.vitamin[, .(n = .N, mean = mean(weight), sd = sd(weight)),
                                  by = c("grp","week.factor")]
    dt.descriptive

        grp week.factor n mean sd
     1:   C
     ...
    12:   T

and plot them:

    gg.mean <- ggplot(dt.descriptive, aes(x = week.factor, y = mean,
                                          group = grp, color = grp))
    gg.mean <- gg.mean + geom_line() + geom_point()
    gg.mean <- gg.mean + ylab("sample mean (weight)") + xlab("week")
    gg.mean

[Figure: sample mean of the weight (y-axis) against week (x-axis, w1 to w7), one line per group (C, T).]

    gg.sd <- ggplot(dt.descriptive, aes(x = week.factor, y = sd,
                                        group = grp, color = grp))
    gg.sd <- gg.sd + geom_line() + geom_point()
    gg.sd <- gg.sd + ylab("sample standard deviation (weight)") + xlab("week")
    gg.sd

[Figure: sample standard deviation of the weight (y-axis) against week (x-axis, w1 to w7), one line per group (C, T).]

Instead of first computing the mean and standard deviation and then plotting them, we could use ggplot to do both at the same time:

    gg.mean2 <- ggplot(dtl.vitamin, aes(x = week.factor, y = weight,
                                        group = grp, color = grp))
    # note: the argument fun.y is called fun in ggplot2 >= 3.3.0
    gg.mean2 <- gg.mean2 + stat_summary(geom = "line", fun.y = mean,
                                        size = 3, fun.data = NULL)
    gg.mean2

[Figure: mean weight (y-axis) against week.factor (x-axis, w1 to w7), one line per group (C, T).]

If we wanted to compute the correlation matrix, it would be easier to first move to the wide format:

    dtw.vitamin <- dcast(dtl.vitamin, value.var = "weight",
                         formula = grp + animal ~ week.factor)
    dtw.vitamin

        grp animal w1 w3 w4 w5 w6 w7
     1:   C
     ...
    10:   T

and then compute the correlation matrix within each group:

    list("grp=T" = cor(dtw.vitamin[grp=="T", .(w1,w3,w4,w5,w6,w7)]),
         "grp=C" = cor(dtw.vitamin[grp=="C", .(w1,w3,w4,w5,w6,w7)]))

    $`grp=T`
       w1 w3 w4 w5 w6 w7
    w1
    ...
    w7

    $`grp=C`
       w1 w3 w4 w5 w6 w7
    w1
    ...
    w7
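For readers without data.table at hand, the same two steps (long to wide, then one correlation matrix per group) can be sketched in base R. This is only an illustration: the data frame toy and its weights are made up (6 hypothetical animals, 3 weeks) and are not the pigs data.

```r
# Toy long-format data: 6 hypothetical animals, 3 weeks (made-up weights)
toy <- data.frame(
  grp    = rep(c("C", "T"), each = 9),
  animal = rep(1:6, each = 3),
  week   = rep(c("w1", "w3", "w4"), times = 6),
  weight = c(450, 470, 480,  455, 480, 495,  445, 462, 490,
             460, 475, 500,  440, 465, 470,  452, 481, 483)
)

# long -> wide: one row per animal, one column per week
toy.wide <- reshape(toy, idvar = c("grp", "animal"), timevar = "week",
                    direction = "wide")
names(toy.wide) <- sub("^weight\\.", "", names(toy.wide))

# one correlation matrix per group
week.cols <- c("w1", "w3", "w4")
cor.by.grp <- lapply(split(toy.wide[, week.cols], toy.wide$grp), cor)
cor.by.grp
```

With only a handful of animals per group, as here and in the exercise, the empirical correlations are very imprecise and should be read as descriptive only.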

3 Question 3: Modeling the group effect

3.1 Fitting a model using an unstructured covariance matrix

We use the gls function to fit the mixed model, specifying the correlation and weights arguments to model the within-individual variability in weight using an unstructured covariance matrix:

    gls.un <- gls(weight ~ week.factor + grp + grp:week.factor,
                  data = dtl.vitamin,
                  correlation = corSymm(form = ~1 | animal),
                  weights = varIdent(form = ~1 | week.factor)
                  )
    logLik(gls.un)

    'log Lik.'  (df=33)

3.2 Inference on the mean parameters

We can then extract the estimated coefficients:

    summary(gls.un)$tTable

                       Value Std.Error t-value p-value
    (Intercept)                                   e-39
    week.factorw3                                 e-04
    week.factorw4                                 e-09
    week.factorw5                                 e-05
    week.factorw6                                 e-03
    week.factorw7                                 e-05
    grpT                                          e-02
    week.factorw3:grpT                            e-01
    week.factorw4:grpT                            e-01
    week.factorw5:grpT                            e-01
    week.factorw6:grpT                            e-01
    week.factorw7:grpT                            e-01

their confidence intervals:

    intervals(gls.un)[["coef"]]

                       lower est. upper
    (Intercept)
    week.factorw3
    week.factorw4
    week.factorw5
    week.factorw6
    week.factorw7
    grpT

    week.factorw3:grpT
    week.factorw4:grpT
    week.factorw5:grpT
    week.factorw6:grpT
    week.factorw7:grpT
    attr(,"label")
    [1] "Coefficients:"

and the F-tests:

    anova(gls.un, type = "marginal")

    Denom. DF: 48
                    numDF F-value p-value
    (Intercept)         1          <.0001
    week.factor         5          <.0001
    grp                 1
    week.factor:grp     5  5.2803

3.2.1 Technical note: nlme does not provide very accurate results in small samples

The last p-value does not match the one in the SAS output. Indeed, gls performs the following test:

    1-pf(5.2803, df1 = 5, df2 = 48)

    [1]

Here the degrees of freedom are clearly wrong. We are comparing individuals, here the 10 pigs, so we do not really have 60 independent observations (minus 12 parameters) but something closer to 10 observations. The Satterthwaite approximation can be used to obtain a more sensible value for the degrees of freedom:

    library(lavaSearch2) ## not (yet!) available on CRAN, see github/bozenne/lavaSearch2
    system.time(
      df.satterthwaite <- dfVariance(gls.un, adjust.residuals = TRUE)
    )

    Loading required package: lava
    lava version
    Attaching package: 'lava'
    The following object is masked from 'package:fields':
        surface

    lavaSearch2 version
       user  system elapsed

    df.satterthwaite[names(coef(gls.un))]

    (Intercept) week.factorw3 week.factorw4 week.factorw5 week.factorw6 week.factorw7
    grpT week.factorw3:grpT week.factorw4:grpT week.factorw5:grpT week.factorw6:grpT week.factorw7:grpT

We obtain something close to 7 degrees of freedom, so the p-value for the F-test of the interaction should be:

    1-pf(5.2803, df1 = 5, df2 = 7)

    [1]

3.3 Inspection of the variance-covariance parameters

We can display the modeled variance-covariance matrix between the weight measurements within an individual using the getVarCov function:

    Sigma.UN <- getVarCov(gls.un, individuals = 1)
    Sigma.UN

    Marginal variance covariance matrix
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]
    ...
    [6,]
      Standard Deviations:

This matrix can be converted into a correlation matrix:

    Cor.UN <- cov2cor(Sigma.UN)

A graphical representation of the correlation matrix can be obtained with the following code:

    seqtime <- paste0("week", unique(dtl.vitamin$week))
    seqtime.num <- as.numeric(as.factor(seqtime))
    palette.z <- rev(heat.colors(12))

    par(mar = c(4,4,5,5))
    image(x = seqtime.num, y = seqtime.num, z = Cor.UN,
          main = "correlation matrix", axes = FALSE,
          col = palette.z, xlab = "", ylab = "")
    axis(1, at = seqtime.num, labels = seqtime)
    axis(2, at = seqtime.num, labels = seqtime, las = 2)
    image.plot(x = seqtime.num, y = seqtime.num, z = Cor.UN,
               legend.only = TRUE, col = palette.z)

[Figure: heat map titled "correlation matrix", with week1 to week7 on both axes and a color legend ranging from about 0.4 to 1.0.]
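The conversion performed by cov2cor amounts to dividing each covariance by the product of the two corresponding standard deviations. A minimal sketch, using a made-up 3x3 covariance matrix (the values are hypothetical and unrelated to the fitted model):

```r
# Hypothetical covariance matrix (made-up values, for illustration only)
Sigma <- matrix(c(400, 300, 250,
                  300, 625, 400,
                  250, 400, 900), nrow = 3, byrow = TRUE)

sd.vec <- sqrt(diag(Sigma))                  # standard deviations
Cor.manual  <- Sigma / outer(sd.vec, sd.vec) # cov(i,j) / (sd_i * sd_j)
Cor.builtin <- cov2cor(Sigma)

all.equal(Cor.manual, Cor.builtin)
```

The diagonal of the result is 1 by construction, so all the information about the variances is lost in the conversion; this is why the variances and the correlations are inspected separately.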

4 Question 5: Investigating the group effect in the first four weeks

We can create a new dataset containing the data of the first four weeks with:

    dt.tempo <- dtl.vitamin[week<=4]
    table(dt.tempo$week)

We can then use a syntax similar to Question 3 to fit the mixed model using only the first weeks:

    gls.un.w14 <- gls(weight ~ week.factor + grp:week.factor,
                      data = dtl.vitamin[week<=4],
                      correlation = corSymm(form = ~1 | animal),
                      weights = varIdent(form = ~1 | week.factor)
                      )
    logLik(gls.un.w14)

    'log Lik.'  (df=12)

We can then extract the estimated coefficients:

    summary(gls.un.w14)$tTable

                       Value Std.Error t-value p-value
    (Intercept)                                   e-23
    week.factorw3                                 e-04
    week.factorw4                                 e-08
    week.factorw1:grpT                            e-02
    week.factorw3:grpT                            e-01
    week.factorw4:grpT                            e-01

and the F-tests:

    anova(gls.un.w14, type = "marginal")

    Denom. DF: 24
                    numDF F-value p-value
    (Intercept)                    <.0001
    week.factor                    <.0001
    week.factor:grp

As before, the F-test should be computed with something close to 7 degrees of freedom instead of 24, e.g. for the interaction:

    1-pf(2.1888, df1 = 3, df2 = 7)

    [1]
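To see how much the choice of the denominator degrees of freedom matters, one can evaluate the same F statistic under both values. The sketch below reuses the F statistic and degrees of freedom quoted above (2.1888, 3 numerator degrees of freedom, 24 versus roughly 7 denominator degrees of freedom); the object names are only illustrative.

```r
F.stat <- 2.1888   # F statistic of the interaction reported above
df.num <- 3        # numerator degrees of freedom

# upper tail of the F distribution; same as 1 - pf(...)
p.gls  <- pf(F.stat, df1 = df.num, df2 = 24, lower.tail = FALSE)
p.satt <- pf(F.stat, df1 = df.num, df2 = 7,  lower.tail = FALSE)

c(gls = p.gls, corrected = p.satt)
```

For this statistic the p-value obtained with 7 degrees of freedom is larger than the one obtained with 24: the default gls test is anti-conservative when the number of independent units is small.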

5 Question 6: Modeling the treatment effect

5.1 Definition of the new variables

We first define the new variables suggested in the exercise.

5.1.1 The treatment variable

This variable takes the value:

- "No" in the control group.
- "No" in the treated group at week 4 and before.
- "Yes" in the treated group after week 4.

We can use the following syntax to obtain it:

    dtl.vitamin[, treat := as.character(NA)] # initialization to missing
    dtl.vitamin[grp == "C", treat := "No"]
    dtl.vitamin[week<=4 & grp == "T", treat := "No"]
    dtl.vitamin[week>4 & grp == "T", treat := "Yes"]

We can display the result for the first observation of each group at each time:

    dtl.vitamin[, .(treat = treat[1]), by = c("week","grp")]

        week grp treat
     1:    1   C    No
     2:    3   C    No
     3:    4   C    No
     4:    5   C    No
     5:    6   C    No
     6:    7   C    No
     7:    1   T    No
     8:    3   T    No
     9:    4   T    No
    10:    5   T   Yes
    11:    6   T   Yes
    12:    7   T   Yes

5.1.2 Number of weeks under treatment

This variable takes the value:

- 0 when no treatment is given.
- 1 at week 5 when a treatment is given.
- 2 at week 6 when a treatment is given.
- 3 at week 7 when a treatment is given.

    dtl.vitamin[, vitaweeks := as.integer(NA)] # initialization to missing
    dtl.vitamin[treat == "No", vitaweeks := 0]
    dtl.vitamin[treat == "Yes" & week == 5, vitaweeks := 1]
    dtl.vitamin[treat == "Yes" & week == 6, vitaweeks := 2]
    dtl.vitamin[treat == "Yes" & week == 7, vitaweeks := 3]

We can display the result for the first observation of each group at each time:

    dtl.vitamin[, .(vitaweeks = vitaweeks[1]), by = c("week","grp")]

        week grp vitaweeks
     1:    1   C         0
     2:    3   C         0
     3:    4   C         0
     4:    5   C         0
     5:    6   C         0
     6:    7   C         0
     7:    1   T         0
     8:    3   T         0
     9:    4   T         0
    10:    5   T         1
    11:    6   T         2
    12:    7   T         3

A more concise syntax is:

    setkeyv(dtl.vitamin, c("animal","week"))
    dtl.vitamin[, vitaweeks2 := cumsum(treat=="Yes"), by = "animal"]

Here we count the number of weeks under treatment using the cumsum function, after sorting the data by animal and week. We can check that both definitions coincide using:

    all(dtl.vitamin$vitaweeks == dtl.vitamin$vitaweeks2)

    [1] TRUE
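The cumulative-sum idea can be checked on a small example in base R. The sketch below uses two hypothetical animals (one control, one treated) measured at the same weeks as in the exercise; the data frame toy is made up for the illustration.

```r
# Two hypothetical animals measured at weeks 1, 3, 4, 5, 6, 7
toy <- data.frame(
  animal = rep(c(1, 6), each = 6),
  week   = rep(c(1, 3, 4, 5, 6, 7), times = 2),
  grp    = rep(c("C", "T"), each = 6)
)
toy$treat <- ifelse(toy$grp == "T" & toy$week > 4, "Yes", "No")

# sort by animal and week, then count the treated occasions within animal
toy <- toy[order(toy$animal, toy$week), ]
toy$vitaweeks <- ave(as.integer(toy$treat == "Yes"), toy$animal, FUN = cumsum)
toy
```

Note that cumsum counts treated measurement occasions, not calendar weeks. The two coincide here only because the treated occasions (weeks 5, 6 and 7) are consecutive weeks and the rows are sorted by week within animal.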

5.1.3 Interaction between time and treatment

To obtain interaction coefficients only at weeks 5, 6, and 7, we can define a new variable whose value is:

- baseline when the individual is not treated.
- the week number (e.g. w5, w6, w7) when the individual is treated.

    dtl.vitamin[treat == "No", I.treat_week := "baseline"]
    dtl.vitamin[treat == "Yes", I.treat_week := week.factor]

We can display the result for the first observation of each group at each time:

    dtl.vitamin[, .(I.treat_week = I.treat_week[1]), by = c("week","grp")]

        week grp I.treat_week
     1:    1   C     baseline
     2:    3   C     baseline
     3:    4   C     baseline
     4:    5   C     baseline
     5:    6   C     baseline
     6:    7   C     baseline
     7:    1   T     baseline
     8:    3   T     baseline
     9:    4   T     baseline
    10:    5   T           w5
    11:    6   T           w6
    12:    7   T           w7

We also define another interaction term with only 2 coefficients. Here we decided not to model an interaction at week 5:

    dtl.vitamin[, I.treat_week67 := I.treat_week]
    dtl.vitamin[week == 5, I.treat_week67 := "baseline"]

We can display the result for the first observation of each group at each time:

    dtl.vitamin[, .(I.treat_week67 = I.treat_week67[1]), by = c("week","grp")]

        week grp I.treat_week67
     1:    1   C       baseline
     2:    3   C       baseline
     3:    4   C       baseline
     4:    5   C       baseline
     5:    6   C       baseline
     6:    7   C       baseline
     7:    1   T       baseline
     8:    3   T       baseline
     9:    4   T       baseline
    10:    5   T       baseline
    11:    6   T             w6
    12:    7   T             w7

5.2 Model (a): non-parametric treatment effect

    gls.un.a0 <- try(gls(weight ~ week.factor + treat:week.factor,
                         data = dtl.vitamin,
                         correlation = corSymm(form = ~1 | animal),
                         weights = varIdent(form = ~1 | week.factor)
                         ))

    Error in glsEstimate(object, control = control) :
      computed "gls" fit is singular, rank 10

The gls function cannot fit the model because the formula does not define an identifiable model. To see that, let's look at how many coefficients gls is trying to estimate:

    X <- model.matrix(weight ~ week.factor + treat:week.factor, data = dtl.vitamin)
    summary(X)

     (Intercept) week.factorw3 week.factorw4 week.factorw5 week.factorw6 week.factorw7
     Min.   :1   Min.   :      Min.   :      Min.   :      Min.   :      Min.   :
     1st Qu.:1   1st Qu.:      1st Qu.:      1st Qu.:      1st Qu.:      1st Qu.:
     Median :1   Median :      Median :      Median :      Median :      Median :
     Mean   :1   Mean   :      Mean   :      Mean   :      Mean   :      Mean   :
     3rd Qu.:1   3rd Qu.:      3rd Qu.:      3rd Qu.:      3rd Qu.:      3rd Qu.:
     Max.   :1   Max.   :      Max.   :      Max.   :      Max.   :      Max.   :
     week.factorw1:treatYes week.factorw3:treatYes week.factorw4:treatYes week.factorw5:treatYes
     Min.   :0              Min.   :0              Min.   :0              Min.   :
     1st Qu.:0              1st Qu.:0              1st Qu.:0              1st Qu.:
     Median :0              Median :0              Median :0              Median :
     Mean   :0              Mean   :0              Mean   :0              Mean   :
     3rd Qu.:0              3rd Qu.:0              3rd Qu.:0              3rd Qu.:
     Max.   :0              Max.   :0              Max.   :0              Max.   :
     week.factorw6:treatYes week.factorw7:treatYes
     Min.   :               Min.   :
     1st Qu.:               1st Qu.:
     Median :               Median :
     Mean   :               Mean   :
     3rd Qu.:               3rd Qu.:
     Max.   :               Max.   :

So gls is trying to estimate interactions at weeks where nobody is treated yet (e.g. week.factorw1:treatYes), even though these effects do not exist. The corresponding columns of the design matrix (X) contain only 0, which makes the design matrix singular. We therefore need to define the interaction manually, using the variable I.treat_week that we have defined in the last subsection:

    gls.un.a <- gls(weight ~ week.factor + I.treat_week,
                    data = dtl.vitamin,
                    correlation = corSymm(form = ~1 | animal),
                    weights = varIdent(form = ~1 | week.factor)
                    )
    logLik(gls.un.a)

    'log Lik.'  (df=30)

5.3 Model (b): linear effect of the treatment

    gls.un.b <- gls(weight ~ week.factor + vitaweeks,
                    data = dtl.vitamin,
                    correlation = corSymm(form = ~1 | animal),
                    weights = varIdent(form = ~1 | week.factor)
                    )
    logLik(gls.un.b)

    'log Lik.'  (df=28)

5.4 Model (c): splitting the treatment effect into a linear effect and a non linear effect

Once again, if we try to fit the model with all the interactions, the model is overparametrized. We therefore use the interaction variable with only two coefficients (I.treat_week67), so that one degree of freedom is left for vitaweeks.

    gls.un.c <- gls(weight ~ week.factor + vitaweeks + I.treat_week67,
                    data = dtl.vitamin,
                    correlation = corSymm(form = ~1 | animal),
                    weights = varIdent(form = ~1 | week.factor)
                    )
    logLik(gls.un.c)

    'log Lik.'  (df=30)

As suggested by the log-likelihood, this is the same model as (a) but parametrized in another way:

    logLik(gls.un.a) - logLik(gls.un.c)

    'log Lik.'  e-10 (df=30)
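Both claims of this section (the naive interaction gives a rank-deficient design matrix, and models (a) and (c) span the same mean structure) can be checked without fitting anything, by looking at the rank of the design matrices. The sketch below is an illustration on a made-up design with one hypothetical animal per group; the variable names mirror those of the exercise, but the data frame d is not the pigs data.

```r
# Toy design: one control and one treated animal, weeks 1, 3, 4, 5, 6, 7
d <- expand.grid(week = c(1, 3, 4, 5, 6, 7), grp = c("C", "T"))
d$week.factor <- factor(paste0("w", d$week))
d$treat <- ifelse(d$grp == "T" & d$week > 4, "Yes", "No")
d$vitaweeks <- ifelse(d$treat == "Yes", d$week - 4, 0)
d$I.treat_week <- ifelse(d$treat == "Yes",
                         as.character(d$week.factor), "baseline")
d$I.treat_week67 <- ifelse(d$treat == "Yes" & d$week > 5,
                           as.character(d$week.factor), "baseline")

# naive interaction: 12 columns, 3 of them identically zero
X.naive <- model.matrix(~ week.factor + treat:week.factor, data = d)
zero.cols <- colnames(X.naive)[colSums(abs(X.naive)) == 0]
c(ncol = ncol(X.naive), rank = qr(X.naive)$rank)
zero.cols

# models (a) and (c): 9 columns each, full rank, same column space
X.a <- model.matrix(~ week.factor + I.treat_week, data = d)
X.c <- model.matrix(~ week.factor + vitaweeks + I.treat_week67, data = d)
c(rank.a = qr(X.a)$rank, rank.c = qr(X.c)$rank,
  rank.both = qr(cbind(X.a, X.c))$rank)
```

If stacking the columns of X.a and X.c does not increase the rank, every column of one matrix is a linear combination of the columns of the other, which is why the two fitted models have the same log-likelihood and the same predictions.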

6 Question 7: Predicted weight profiles

6.1 Compute individual predictions

To compute the predicted profiles for all individuals, we can use the predict function:

    dtl.vitamin[, weight.un.a := predict(gls.un.a, newdata = dtl.vitamin)]
    dtl.vitamin[, weight.un.b := predict(gls.un.b, newdata = dtl.vitamin)]
    dtl.vitamin[, weight.un.c := predict(gls.un.c, newdata = dtl.vitamin)]

6.2 Graphical display

We can directly display the predictions for a given model:

    gg.prediction <- ggplot(dtl.vitamin, aes(x = week, y = weight.un.a,
                                             group = grp, color = grp))
    gg.prediction <- gg.prediction + geom_point() + geom_line()
    gg.prediction <- gg.prediction + ylab("model (a): week grptweek")
    gg.prediction

[Figure: predicted weight under model (a) (y-axis) against week (x-axis), one line per group (C, T).]

To display the predictions of the three models on several panels, we need to stack them in the long format (one row per observation and model):

    vec.name <- paste0("weight.", c("un.a","un.b","un.c"))
    vec.name

    [1] "weight.un.a" "weight.un.b" "weight.un.c"

    dtl.prediction <- melt(dtl.vitamin, id.vars = c("grp","animal","week"),
                           value.name = "weight", variable.name = "model",
                           measure.vars = vec.name)
    dtl.prediction

         grp animal week       model weight
      1:   C      1    1 weight.un.a
      2:   C      1    3 weight.un.a
      3:   C      1    4 weight.un.a
      4:   C      1    5 weight.un.a
      5:   C      1    6 weight.un.a
     ---
    176:   T     10    3 weight.un.c
    177:   T     10    4 weight.un.c
    178:   T     10    5 weight.un.c
    179:   T     10    6 weight.un.c
    180:   T     10    7 weight.un.c

We can then use facet_wrap to divide the window into three panels, each displaying the result of a specific model:

    gg.prediction2 <- ggplot(dtl.prediction, aes(x = week, y = weight,
                                                 group = grp, color = grp))
    gg.prediction2 <- gg.prediction2 + geom_point() + geom_line()
    gg.prediction2 <- gg.prediction2 + facet_wrap(~model, labeller = label_both)
    gg.prediction2

[Figure: predicted weight (y-axis) against week (x-axis), one line per group (C, T), in three panels "model: weight.un.a", "model: weight.un.b" and "model: weight.un.c".]

6.3 Note

In the previous graph we have displayed the predicted values for all individuals. However, we can only distinguish two curves for each model. This is because, given a group and a week, the predictions are the same for all individuals:

- we do not model individual-specific covariates like age.
- we use the marginal predictions and not predictions conditional on the random effects.

In other words, if we had already observed an individual at weeks 1 and 3, we could use these values to obtain a more accurate prediction at week 4 (prediction conditional on the individual random effect). Here we display the predicted values as if we were to perform a prediction for a new individual (i.e. not already included in the study).

7 Question 8: Estimate of the difference in weight between the groups at the end of week 7

7.1 Model (a)

The estimated difference in weight is given by the interaction term:

    CI.UN.a <- intervals(gls.un.a, which = "coef")
    CI.UN.a[["coef"]]["I.treat_weekw7",]

    lower est. upper

This matches the difference in predicted profiles:

    dtl.vitamin[grp=="T" & week==7, unique(weight.un.a)] -
      dtl.vitamin[grp=="C" & week==7, unique(weight.un.a)]

    [1]

However, once again, the confidence intervals are computed using the wrong degrees of freedom:

    beta <- summary(gls.un.a)$tTable["I.treat_weekw7","Value"]
    sd.beta <- summary(gls.un.a)$tTable["I.treat_weekw7","Std.Error"]
    CI.default <- c("lower" = beta + qt(0.025, df = 60-9) * sd.beta,
                    "est." = beta,
                    "upper" = beta + qt(0.975, df = 60-9) * sd.beta)
    CI.default

    lower est. upper

    CI.corrected <- c("lower" = beta + qt(0.025, df = 7) * sd.beta,
                      "est." = beta,
                      "upper" = beta + qt(0.975, df = 7) * sd.beta)
    CI.corrected

    lower est. upper

7.2 Model (b)

In this model the difference is three times the linear term:

    coef(gls.un.b)["vitaweeks"]*3

    vitaweeks

One can check that this matches the difference in predicted profiles:

    dtl.vitamin[grp=="T" & week==7, unique(weight.un.b)] -
      dtl.vitamin[grp=="C" & week==7, unique(weight.un.b)]

    [1]

To obtain the p-value and the standard error (and deduce the confidence interval) one can use the glht function. We first need to indicate that we are interested in 3 times the coefficient vitaweeks:

    coef.un.b <- coef(gls.un.b)
    C <- matrix(0, nrow = 1, ncol = length(coef.un.b),
                dimnames = list(NULL, names(coef.un.b)))
    C[,"vitaweeks"] <- 3
    C

         (Intercept) week.factorw3 week.factorw4 week.factorw5 week.factorw6 week.factorw7 vitaweeks
    [1,]           0             0             0             0             0             0         3

and then call glht:

    glht.un.b <- summary(glht(gls.un.b, linfct = C))
    glht.un.b

         Simultaneous Tests for General Linear Hypotheses

    Fit: gls(model = weight ~ week.factor + vitaweeks, data = dtl.vitamin,
        correlation = corSymm(form = ~1 | animal), weights = varIdent(form = ~1 | week.factor))

    Linear Hypotheses:
           Estimate Std. Error z value Pr(>|z|)
    1 == 0
    (Adjusted p values reported -- single-step method)

We can obtain the corresponding confidence interval using confint:

    confint(glht(gls.un.b, linfct = C))

         Simultaneous Confidence Intervals

    Fit: gls(model = weight ~ week.factor + vitaweeks, data = dtl.vitamin,
        correlation = corSymm(form = ~1 | animal), weights = varIdent(form = ~1 | week.factor))

    Quantile =
    95% family-wise confidence level

    Linear Hypotheses:
           Estimate lwr upr
    1 == 0

This can be compared with three times the confidence interval of vitaweeks returned by intervals. The point estimate is identical; the bounds differ slightly because glht uses a normal quantile whereas intervals uses a Student's t quantile:

    3*intervals(gls.un.b, which = "coef")[["coef"]]["vitaweeks",]

    lower est. upper

7.3 Model (c)

The results are the same as for model (a), but obtaining them is a bit more complex since the difference is the interaction term at week 7 plus three times the linear term. In this case using glht greatly simplifies the implementation:

    coef.un.c <- coef(gls.un.c)
    C <- matrix(0, nrow = 1, ncol = length(coef.un.c),
                dimnames = list(NULL, names(coef.un.c)))
    C[,"vitaweeks"] <- 3
    C[,"I.treat_week67w7"] <- 1
    C

         (Intercept) week.factorw3 week.factorw4 week.factorw5 week.factorw6 week.factorw7 vitaweeks
    [1,]           0             0             0             0             0             0         3
         I.treat_week67w6 I.treat_week67w7
    [1,]                0                1

    glht.un.c <- summary(glht(gls.un.c, linfct = C))
    glht.un.c

         Simultaneous Tests for General Linear Hypotheses

    Fit: gls(model = weight ~ week.factor + vitaweeks + I.treat_week67,
        data = dtl.vitamin, correlation = corSymm(form = ~1 | animal),
        weights = varIdent(form = ~1 | week.factor))

    Linear Hypotheses:
           Estimate Std. Error z value Pr(>|z|)
    1 == 0                                      **
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Adjusted p values reported -- single-step method)

8 Question 10: Specification of the covariance matrix (compound symmetry vs. unstructured)

8.1 Comparison of the model fit

Specifying a compound symmetry correlation matrix:

    gls.cs <- gls(weight ~ week.factor + I.treat_week,
                  data = dtl.vitamin,
                  correlation = corCompSymm(form = ~1 | animal)
                  )
    logLik(gls.cs)

    'log Lik.'  (df=11)

is equivalent to a "standard" mixed model with a random intercept, fitted using lme:

    e.lme <- lme(weight ~ week.factor + I.treat_week,
                 data = dtl.vitamin,
                 random = ~1 | animal
                 )
    logLik(e.lme)

    'log Lik.'  (df=11)

or lmer from the lme4 package:

    library(lme4)
    e.lmer <- lmer(weight ~ week.factor + I.treat_week + (1 | animal),
                   data = dtl.vitamin)
    logLik(e.lmer)

    'log Lik.'  (df=11)

As we can expect, the variance-covariance structure is much simpler compared to the previous models:

    list("unstructured" = unclass(getVarCov(gls.un.a)),
         "compound symmetry" = unclass(getVarCov(gls.cs))
         )

    $unstructured
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]
    ...
    [6,]

    $`compound symmetry`
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]
    ...
    [6,]

We can compare the two models using a likelihood ratio test:

    anova(update(gls.cs, method = "REML"),
          update(gls.un.a, method = "REML")
          )

                                      Model df AIC BIC logLik   Test L.Ratio p-value
    update(gls.cs, method = "REML")       1 11
    update(gls.un.a, method = "REML")     2 30                1 vs 2          <.0001

So it seems that the unstructured model gives a better fit (p<0.0001).
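The likelihood ratio test reported by anova can be reproduced by hand from the two log-likelihoods and their numbers of parameters. In the sketch below the log-likelihood values are made up for the illustration (they are not the values of the fitted models); only the numbers of parameters, 11 and 30, are taken from the logLik output above.

```r
# Hypothetical REML log-likelihoods (made-up values, for illustration only)
ll.cs <- -250; df.cs <- 11   # compound symmetry
ll.un <- -220; df.un <- 30   # unstructured

lrt    <- 2 * (ll.un - ll.cs)   # likelihood ratio statistic
df.lrt <- df.un - df.cs         # number of additional covariance parameters
p.lrt  <- pchisq(lrt, df = df.lrt, lower.tail = FALSE)

c(L.Ratio = lrt, df = df.lrt, p.value = p.lrt)
```

Comparing REML log-likelihoods in this way is only meaningful because both models have exactly the same mean structure (week.factor + I.treat_week) and differ only in the covariance structure.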

8.2 Comparison of the fitted values

We compute the predicted values with their confidence intervals using predictSE.gls:

    rescs.tempo <- predictSE.gls(gls.cs, newdata = dtl.vitamin)
    dtl.vitamin[, weight.cs := rescs.tempo$fit]
    # 95% confidence limits; the normal quantile is an assumption
    dtl.vitamin[, weightinf.cs := rescs.tempo$fit - qnorm(0.975) * rescs.tempo$se.fit]
    dtl.vitamin[, weightsup.cs := rescs.tempo$fit + qnorm(0.975) * rescs.tempo$se.fit]

    resun.tempo <- predictSE.gls(gls.un.a, newdata = dtl.vitamin)
    dtl.vitamin[, weight.un.a := resun.tempo$fit]
    dtl.vitamin[, weightinf.un.a := resun.tempo$fit - qnorm(0.975) * resun.tempo$se.fit]
    dtl.vitamin[, weightsup.un.a := resun.tempo$fit + qnorm(0.975) * resun.tempo$se.fit]

With the current dataset we could create one graph for each model. But putting the results of both models side by side may help to visualize the discrepancies between them. To do so we first convert the data to the long format. Since this involves reshaping several variables simultaneously, it might be easier to do it manually:

    keep.colscs <- c("grp","animal","week","weight.cs","weightinf.cs","weightsup.cs")
    keep.colsun <- c("grp","animal","week","weight.un.a","weightinf.un.a","weightsup.un.a")

    dt.tempo1 <- dtl.vitamin[, .SD, .SDcols = keep.colscs]
    setnames(dt.tempo1, old = names(dt.tempo1),
             new = c("grp","animal","week","estimate","lower","upper"))
    dt.tempo1[, model := "CS"]

    dt.tempo2 <- dtl.vitamin[, .SD, .SDcols = keep.colsun]
    setnames(dt.tempo2, old = names(dt.tempo2),
             new = c("grp","animal","week","estimate","lower","upper"))
    dt.tempo2[, model := "UN"]

    dtl.prediction2 <- rbind(dt.tempo1, dt.tempo2)
    dtl.prediction2

         grp animal week estimate lower upper model
      1:   C                                     CS
      2:   C                                     CS
      3:   C                                     CS
      4:   C                                     CS
      5:   C                                     CS
     ---
    116:   T                                     UN
    117:   T                                     UN
    118:   T                                     UN
    119:   T                                     UN
    120:   T                                     UN

melt can also produce a similar result in one operation. It must be applied to the full dataset dtl.vitamin (not to dt.tempo, which only contains the first weeks and none of the prediction columns), and the columns to stack are listed explicitly so that the estimate and the bounds of the same model are paired:

    dtl.prediction2.bis <- melt(dtl.vitamin,
                                id.vars = c("grp","animal","week"),
                                measure.vars = list(c("weight.cs","weight.un.a"),
                                                    c("weightinf.cs","weightinf.un.a"),
                                                    c("weightsup.cs","weightsup.un.a")),
                                variable.name = "model",
                                value.name = c("estimate","lower","upper"))
    dtl.prediction2.bis

The result has the same columns as dtl.prediction2, except that the model variable is coded 1 (CS) and 2 (UN) instead of "CS" and "UN".

We can now use ggplot2 to display the predictions:

    gg.predictionic <- ggplot(dtl.prediction2, aes(x = week, y = estimate,
                                                   group = grp, color = grp))
    gg.predictionic <- gg.predictionic + geom_point() + geom_line()
    gg.predictionic <- gg.predictionic + geom_ribbon(aes(ymin = lower, ymax = upper,
                                                         fill = grp), alpha = 0.33)
    gg.predictionic <- gg.predictionic + facet_grid(grp ~ model, labeller = label_both)
    gg.predictionic <- gg.predictionic + ylab("weight")
    gg.predictionic

[Figure: predicted weight with confidence band (y-axis) against week (x-axis), in a grid of panels with columns "model: CS" and "model: UN" and rows "grp: C" and "grp: T".]
