Solution pigs exercise


Course repeated measurements - R exercise class 2
November 24, 2017

Contents

1 Question 1: Import data
    Data management
    Inspection of the dataset
2 Question 2: Descriptive statistics
    Raw data
    Summary statistics
3 Question 3: Modeling the group effect
    Fitting a model using an unstructured covariance matrix
    Inference on the mean parameters
    Technical note: nlme does not provide very accurate results in small samples
    Inspection of the variance-covariance parameters
4 Question 5: Investigating the group effect in the first four weeks
5 Question 6: Modeling the treatment effect
    Definition of the new variables
        the treatment variable
        number of weeks under treatment
        Interaction between time and treatment
    Model (a): non-parametric treatment effect
    Model (b): linear effect of the treatment
    Model (c): splitting the treatment effect into a linear effect and a non linear effect
6 Question 7: Predicted weight profiles
    Compute individual predictions
    Graphical display
    Note
7 Question 8: Estimate of the difference in weight between the groups at the end of week 7
    Model (a)
    Model (b)
    Model (c)
8 Question 10: Specification of the covariance matrix (compound symmetry vs. unstructured)
    Comparison of the model fit
    Comparison of the fitted values

NOTE: This document contains an example of R code and the related software output that answers the questions of the pigs exercise. The focus here is on the implementation using the R software and not on the interpretation. We refer to the SAS solution for a more detailed discussion of the results.

Load the packages that will be necessary for the analysis:

    library(data.table) # data management
    library(nlme)       # implementation of models for repeated measurements (e.g. gls, lme)
    library(ggplot2)    # graphical display
    library(fields)     # graphical display: image.plot
    library(multcomp)   # test for linear hypothesis (glht function)
    library(AICcmodavg) # predictSE.gls

1 Question 1: Import data

1.1 Data management

We first specify the location of the data through a variable called path.data:

    path.data <- " jufo/courses/rm2017/vitamin.txt"

Then we use the function fread to import the dataset:

    dtl.vitamin <- fread(path.data, header = TRUE)
    str(dtl.vitamin)

[Output: a data.table with 60 observations of 4 integer variables: grp, animal, week and weight.]

We relabel the group variable using the function factor:

    dtl.vitamin[, grp := factor(grp, levels = 1:2, labels = c("C","T"))]

and convert the animal variable to a factor and build a week label (week.factor) from the week variable:

    dtl.vitamin[, animal := as.factor(animal)]
    dtl.vitamin[, week.factor := paste0("w",as.factor(week))]

1.2 Inspection of the dataset

The summary method provides useful information about the dataset:

    summary(dtl.vitamin, maxsum = 10)

[Output: per-column summary. grp: C 30, T 30. animal: 6 rows for each of the animals 1 to 10. week: minimum 1, median 4.5, mean 4.333, maximum 7. weight: minimum 436.0, first quartile 508.5, median 565.0. week.factor: character, length 60.]

We have a total of 60 observations divided into 2 groups of 30 observations each. Further insight can be obtained with the table function:

    dtl.vitamin[, table(grp,animal)]

[Output: cross-table of grp by animal.]

Each group contains 5 animals and each animal has 6 measurements.

    dtl.vitamin[, table(animal,week.factor)]

[Output: cross-table of animal by week.factor (w1, w3, w4, w5, w6, w7).]

Each animal has been measured once at each of the 6 timepoints. Note that there are no missing values:

    colSums(is.na(dtl.vitamin))

[Output: number of missing values for grp, animal, week, weight and week.factor.]

2 Question 2: Descriptive statistics

2.1 Raw data

We can visualize the weight variable using a spaghetti plot:

    gg.spaguetti <- ggplot(dtl.vitamin, aes(x = week.factor, y = weight, group = animal, color = animal))
    gg.spaguetti <- gg.spaguetti + geom_line() + geom_point()
    gg.spaguetti <- gg.spaguetti + facet_grid(~grp, labeller = label_both)
    gg.spaguetti <- gg.spaguetti + xlab("week")
    gg.spaguetti

Here we use ggplot2 instead of matplot since the data is in the long format. But one could convert dtl.vitamin to the wide format (e.g. using dcast) and use matplot. The syntax of ggplot2 in the previous code chunk works as follows:

- first we specify the dataset to use and the variables corresponding to the x-axis and y-axis. We also specify that the points should be grouped and colored according to the variable animal.
- second we specify how to display the data: with points and lines.
- finally we request to split the elements to be plotted in two windows according to the variable grp.

[Figure: spaghetti plot of weight against week (w1, w3, w4, w5, w6, w7), one line per animal, in two panels (grp: C and grp: T).]

2.2 Summary statistics

We can compute the mean and standard deviation of the weight for each group at each time:

    dt.descriptive <- dtl.vitamin[,.(n = .N, mean = mean(weight), sd = sd(weight)),
                                  by = c("grp","week.factor")]
    dt.descriptive

[Output: 12 rows, one per combination of grp (C, T) and week.factor (w1, w3, w4, w5, w6, w7), with columns n, mean and sd.]

and plot it:

    gg.mean <- ggplot(dt.descriptive, aes(x = week.factor, y = mean, group = grp, color = grp))
    gg.mean <- gg.mean + geom_line() + geom_point()
    gg.mean <- gg.mean + ylab("sample mean (weight)") + xlab("week")
    gg.mean

[Figure: sample mean of weight against week, one line per group (C, T).]

    gg.sd <- ggplot(dt.descriptive, aes(x = week.factor, y = sd, group = grp, color = grp))
    gg.sd <- gg.sd + geom_line() + geom_point()
    gg.sd <- gg.sd + ylab("sample standard deviation (weight)") + xlab("week")
    gg.sd

[Figure: sample standard deviation of weight against week, one line per group (C, T).]

Instead of first computing the mean/standard deviation and then plotting it, we could use ggplot to do both at the same time:

    gg.mean2 <- ggplot(dtl.vitamin, aes(x = week.factor, y = weight, group = grp, color = grp))
    # in ggplot2 >= 3.3.0 the argument fun.y is called fun
    gg.mean2 <- gg.mean2 + stat_summary(geom = "line", fun.y = mean, size = 3, fun.data = NULL)
    gg.mean2

[Figure: mean weight against week.factor, one line per group (C, T).]

If we wanted to compute the correlation matrix, it would be easier to first move to the wide format:

    dtw.vitamin <- dcast(dtl.vitamin, value.var = "weight",
                         formula = grp+animal ~ week.factor)
    dtw.vitamin

[Output: 10 rows, one per animal (5 in group C, 5 in group T), with columns grp, animal, w1, w3, w4, w5, w6 and w7.]

and then compute the correlation matrices relative to each group:

    list("grp=T" = cor(dtw.vitamin[grp=="T",.(w1,w3,w4,w5,w6,w7)]),
         "grp=C" = cor(dtw.vitamin[grp=="C",.(w1,w3,w4,w5,w6,w7)]))

[Output: two 6 x 6 correlation matrices between w1, w3, w4, w5, w6 and w7, one for grp=T and one for grp=C.]
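The long-to-wide step can be tried on a small simulated dataset. The sketch below is only an illustration: it uses base R (reshape instead of dcast), and the objects toy.long, toy.wide and toy.cor as well as the simulated weights are assumptions made for the example, not the pigs data.

```r
# Toy long-format data: 4 animals measured at 3 weeks (simulated values)
set.seed(1)
toy.long <- data.frame(
  animal = rep(1:4, each = 3),
  week.factor = rep(c("w1", "w3", "w4"), times = 4),
  weight = round(rnorm(12, mean = 500, sd = 30))
)

# Long -> wide: one row per animal, one column per week
toy.wide <- reshape(toy.long, direction = "wide",
                    idvar = "animal", timevar = "week.factor",
                    v.names = "weight", sep = ".")

# Correlation between the repeated measurements
toy.cor <- cor(toy.wide[, c("weight.w1", "weight.w3", "weight.w4")])
toy.cor
```

Each row of toy.wide is one animal, so cor treats the animals as the independent units and the weeks as the variables, which is what is needed to look at the within-animal correlation.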

3 Question 3: Modeling the group effect

3.1 Fitting a model using an unstructured covariance matrix

We use the gls function to fit the mixed model, specifying the correlation and weights arguments to model the within-individual variability in weight using an unstructured covariance matrix:

    gls.un <- gls(weight ~ week.factor + grp + grp:week.factor,
                  data = dtl.vitamin,
                  correlation = corSymm(form = ~1 | animal),
                  weights = varIdent(form = ~1 | week.factor)
                  )
    logLik(gls.un)

[Output: log-likelihood of the model (df=33).]

3.2 Inference on the mean parameters

We can then extract the estimated coefficients:

    summary(gls.un)$tTable

[Output: table with columns Value, Std.Error, t-value and p-value for the 12 coefficients: (Intercept), week.factorw3 to week.factorw7, grpT, and week.factorw3:grpT to week.factorw7:grpT.]

their confidence intervals:

    intervals(gls.un)[["coef"]]

[Output: table with columns lower, est. and upper for the same 12 coefficients.]

the F-tests:

    anova(gls.un, type = "marginal")

[Output: marginal F-tests with denominator DF 48 for (Intercept) (p < .0001), week.factor (p < .0001), grp and week.factor:grp.]

Technical note: nlme does not provide very accurate results in small samples

The last p-value does not match the one in the SAS output. Indeed, according to gls we have the following test:

    1-pf(5.2803, df1 = 5, df2 = 48)

Here the degrees of freedom are clearly wrong. We perform a comparison between individuals, here the 10 pigs, so we do not really have 60 independent observations (minus 12 parameters) but something closer to 10 observations. The Satterthwaite approximation can be used to obtain a more sensible value for the degrees of freedom:

    library(lavaSearch2) ## not (yet!) available on CRAN, see github/bozenne/lavasearch2
    system.time(
      df.satterthwaite <- dfVariance(gls.un, adjust.residuals = TRUE)
    )

[Output: package start-up messages (loading of the required package lava, which masks the object surface from package fields).]

[Output, continued: lavaSearch2 start-up message and the timing (user, system, elapsed) of the call.]

    df.satterthwaite[names(coef(gls.un))]

[Output: estimated degrees of freedom for each of the 12 mean coefficients.]

We obtain something close to 7 degrees of freedom, so the p-value for the F-test of the interaction should be:

    1-pf(5.2803, df1 = 5, df2 = 7)

Inspection of the variance-covariance parameters

We can display the modeled variance-covariance matrix between the weight measurements within individuals using the getVarCov function:

    Sigma.UN <- getVarCov(gls.un, individual = 1)
    Sigma.UN

[Output: 6 x 6 marginal variance-covariance matrix, followed by the 6 standard deviations.]

This matrix can be converted into a correlation matrix:

    Cor.UN <- cov2cor(Sigma.UN)

A graphical representation of the correlation matrix can be obtained with the following code:

    seqtime <- paste0("week",unique(dtl.vitamin$week))
    seqtime.num <- as.numeric(as.factor(seqtime))
    palette.z <- rev(heat.colors(12))

    par(mar = c(4,4,5,5))
    image(x = seqtime.num, y = seqtime.num, z = Cor.UN,
          main = "correlation matrix", axes = FALSE,
          col = palette.z, xlab = "", ylab = "")
    axis(1, at = seqtime.num, labels = seqtime)
    axis(2, at = seqtime.num, labels = seqtime, las = 2)
    image.plot(x = seqtime.num, y = seqtime.num, z = Cor.UN,
               legend.only = TRUE, col = palette.z)

[Figure: heat map titled "correlation matrix", with week1, week3, week4, week5, week6 and week7 on both axes and a color legend.]
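To make the covariance-to-correlation conversion explicit, here is a minimal sketch on a made-up 3 x 3 covariance matrix. Sigma.toy and its entries are hypothetical and are not the fitted Sigma.UN. The sketch checks that cov2cor divides each covariance by the product of the two corresponding standard deviations.

```r
# Hypothetical 3 x 3 covariance matrix (not the fitted Sigma.UN)
Sigma.toy <- matrix(c(400, 300, 250,
                      300, 625, 400,
                      250, 400, 900), nrow = 3, byrow = TRUE)

# Manual conversion: covariance divided by the product of the standard deviations
sd.toy <- sqrt(diag(Sigma.toy))
Cor.manual <- Sigma.toy / outer(sd.toy, sd.toy)

# Built-in conversion
Cor.toy <- cov2cor(Sigma.toy)

# Both approaches should agree
all.equal(Cor.manual, Cor.toy)
```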

4 Question 5: Investigating the group effect in the first four weeks

We can create a new dataset containing the data of the first four weeks with:

    dt.tempo <- dtl.vitamin[week<=4]
    table(dt.tempo$week)

[Output: number of observations at each of the remaining weeks.]

So we can use a syntax similar to Question 3 to fit the mixed model using only the first four weeks:

    gls.un.w14 <- gls(weight ~ week.factor + grp:week.factor,
                      data = dtl.vitamin[week<=4],
                      correlation = corSymm(form = ~1 | animal),
                      weights = varIdent(form = ~1 | week.factor)
                      )
    logLik(gls.un.w14)

[Output: log-likelihood of the model (df=12).]

We can then extract the estimated coefficients:

    summary(gls.un.w14)$tTable

[Output: table with columns Value, Std.Error, t-value and p-value for (Intercept), week.factorw3, week.factorw4, week.factorw1:grpT, week.factorw3:grpT and week.factorw4:grpT.]

the F-tests:

    anova(gls.un.w14, type = "marginal")

[Output: marginal F-tests with denominator DF 24 for (Intercept) (p < .0001), week.factor (p < .0001) and week.factor:grp.]

As before, the F-test should be computed with something close to 7 degrees of freedom instead of 24, e.g. for the interaction:

    1-pf(2.1888, df1 = 3, df2 = 7)
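The effect of the denominator degrees of freedom on the p-value can be checked directly with pf. The sketch below reuses the two F statistics quoted in this document (5.2803 with 5 numerator degrees of freedom for Question 3, 2.1888 with 3 for Question 5). The names F.stat, df1, df2.default, p.default and p.corrected are introduced for the example only.

```r
# F statistics quoted in the text, with their numerator degrees of freedom
F.stat <- c(question3 = 5.2803, question5 = 2.1888)
df1 <- c(question3 = 5, question5 = 3)

# Denominator degrees of freedom used by gls (n minus number of mean parameters)
df2.default <- c(question3 = 48, question5 = 24)

# p-values with the default and with the Satterthwaite-type value (about 7)
p.default <- pf(F.stat, df1 = df1, df2 = df2.default, lower.tail = FALSE)
p.corrected <- pf(F.stat, df1 = df1, df2 = 7, lower.tail = FALSE)

cbind(p.default, p.corrected)
```

With fewer denominator degrees of freedom the F distribution has a heavier right tail, so for these statistics the corrected p-values are larger than the default ones.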

5 Question 6: Modeling the treatment effect

5.1 Definition of the new variables

We first define the new variables suggested in the exercise:

5.1.1 the treatment variable

This variable takes value:

- "No" in the control group.
- "No" in the treated group at week 4 and before.
- "Yes" in the treated group after week 4.

We can use the following syntax to obtain it:

    dtl.vitamin[, treat := as.character(NA)] # initialization to missing
    dtl.vitamin[grp == "C", treat := "No"]
    dtl.vitamin[week<=4 & grp == "T", treat := "No"]
    dtl.vitamin[week>4 & grp == "T", treat := "Yes"]

We can display the result for the first observation of each group at each time:

    dtl.vitamin[,.(treat = treat[1]), by = c("week","grp")]

        week grp treat
     1:    1   C    No
     2:    3   C    No
     3:    4   C    No
     4:    5   C    No
     5:    6   C    No
     6:    7   C    No
     7:    1   T    No
     8:    3   T    No
     9:    4   T    No
    10:    5   T   Yes
    11:    6   T   Yes
    12:    7   T   Yes

5.1.2 number of weeks under treatment

This variable takes value:

- 0 when no treatment is given.
- 1 at week 5 when a treatment is given.
- 2 at week 6 when a treatment is given.
- 3 at week 7 when a treatment is given.

    dtl.vitamin[, vitaweeks := as.integer(NA)] # initialization to missing
    dtl.vitamin[treat == "No", vitaweeks := 0]
    dtl.vitamin[treat == "Yes" & week == 5, vitaweeks := 1]
    dtl.vitamin[treat == "Yes" & week == 6, vitaweeks := 2]
    dtl.vitamin[treat == "Yes" & week == 7, vitaweeks := 3]

We can display the result for the first observation of each group at each time:

    dtl.vitamin[,.(vitaweeks = vitaweeks[1]), by = c("week","grp")]

        week grp vitaweeks
     1:    1   C         0
     2:    3   C         0
     3:    4   C         0
     4:    5   C         0
     5:    6   C         0
     6:    7   C         0
     7:    1   T         0
     8:    3   T         0
     9:    4   T         0
    10:    5   T         1
    11:    6   T         2
    12:    7   T         3

A more concise syntax is:

    setkeyv(dtl.vitamin, c("animal","week"))
    dtl.vitamin[, vitaweeks2 := cumsum(treat=="Yes"), by = "animal"]

Here we count the number of weeks under treatment using the cumsum function. We can check that both coincide using:

    all(dtl.vitamin$vitaweeks == dtl.vitamin$vitaweeks2)

    [1] TRUE
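The cumulative-sum idea can be illustrated on a toy dataset with one control and one treated animal. This is a sketch in base R (ave instead of the data.table by syntax), and the data frame toy with its two animals is an assumption made for the example.

```r
# Toy data: animal 1 is a control, animal 6 is treated (treatment starts after week 4)
toy <- data.frame(
  animal = rep(c(1, 6), each = 6),
  week = rep(c(1, 3, 4, 5, 6, 7), times = 2),
  grp = rep(c("C", "T"), each = 6)
)
toy$treat <- ifelse(toy$grp == "T" & toy$week > 4, "Yes", "No")

# The rows must be sorted by time within animal before accumulating
toy <- toy[order(toy$animal, toy$week), ]

# Running count of treated observations within each animal
toy$vitaweeks <- ave(as.integer(toy$treat == "Yes"), toy$animal, FUN = cumsum)
toy
```

Note that cumsum counts treated observations, not elapsed weeks. The two coincide here because the treated period (weeks 5, 6 and 7) is observed every week; with gaps in the measurement schedule the explicit definition would be the safer choice.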

5.1.3 Interaction between time and treatment

To obtain interaction coefficients only at weeks 5, 6 and 7, we can define a new variable whose value is:

- baseline when the individual is not treated.
- the week number (e.g. w5, w6, w7) when the individual is treated.

    dtl.vitamin[treat == "No", I.treat_week := "baseline"]
    dtl.vitamin[treat == "Yes", I.treat_week := week.factor]

We can display the result for the first observation of each group at each time:

    dtl.vitamin[,.(I.treat_week = I.treat_week[1]), by = c("week","grp")]

        week grp I.treat_week
     1:    1   C     baseline
     2:    3   C     baseline
     3:    4   C     baseline
     4:    5   C     baseline
     5:    6   C     baseline
     6:    7   C     baseline
     7:    1   T     baseline
     8:    3   T     baseline
     9:    4   T     baseline
    10:    5   T           w5
    11:    6   T           w6
    12:    7   T           w7

We also define another interaction term with only 2 coefficients. Here we decided not to model an interaction at week 5:

    dtl.vitamin[, I.treat_week67 := I.treat_week]
    dtl.vitamin[week == 5, I.treat_week67 := "baseline"]

We can display the result for the first observation of each group at each time:

    dtl.vitamin[,.(I.treat_week67 = I.treat_week67[1]), by = c("week","grp")]

        week grp I.treat_week67
     1:    1   C       baseline
     2:    3   C       baseline
     3:    4   C       baseline
     4:    5   C       baseline
     5:    6   C       baseline
     6:    7   C       baseline
     7:    1   T       baseline
     8:    3   T       baseline
     9:    4   T       baseline
    10:    5   T       baseline
    11:    6   T             w6
    12:    7   T             w7

5.2 Model (a): non-parametric treatment effect

    gls.un.a0 <- try(gls(weight ~ week.factor + treat:week.factor,
                         data = dtl.vitamin,
                         correlation = corSymm(form = ~1 | animal),
                         weights = varIdent(form = ~1 | week.factor)
                         ))

    Error in glsEstimate(object, control = control) :
      computed "gls" fit is singular, rank 10

The gls function cannot fit the model since the model is not properly defined by the formula. To see this, let's look at how many coefficients gls is trying to estimate:

    X <- model.matrix(weight ~ week.factor + treat:week.factor, data = dtl.vitamin)
    summary(X)

[Output: summary of the 12 columns of X: (Intercept), week.factorw3 to week.factorw7, and week.factorw1:treatYes to week.factorw7:treatYes. The columns week.factorw1:treatYes, week.factorw3:treatYes and week.factorw4:treatYes are equal to 0 for all observations.]

So gls is trying to estimate interactions before the start of the treatment (e.g. week.factorw1:treatYes) even though they do not exist. The corresponding columns in the design matrix (X) contain only 0, which makes the design matrix singular. We therefore need to manually define the interaction using the variable I.treat_week that we have defined in the last subsection:

    gls.un.a <- gls(weight ~ week.factor + I.treat_week,
                    data = dtl.vitamin,
                    correlation = corSymm(form = ~1 | animal),
                    weights = varIdent(form = ~1 | week.factor)
                    )
    logLik(gls.un.a)

[Output: log-likelihood of the model (df=30).]

5.3 Model (b): linear effect of the treatment

    gls.un.b <- gls(weight ~ week.factor + vitaweeks,
                    data = dtl.vitamin,
                    correlation = corSymm(form = ~1 | animal),
                    weights = varIdent(form = ~1 | week.factor)
                    )
    logLik(gls.un.b)

[Output: log-likelihood of the model (df=28).]

5.4 Model (c): splitting the treatment effect into a linear effect and a non linear effect

Once again, if we try to fit the model with interactions, we have an overparametrized model. We therefore redefine the interactions such that there is one degree of freedom left for vitaweeks.

    gls.un.c <- gls(weight ~ week.factor + vitaweeks + I.treat_week67,
                    data = dtl.vitamin,
                    correlation = corSymm(form = ~1 | animal),
                    weights = varIdent(form = ~1 | week.factor)
                    )
    logLik(gls.un.c)

[Output: log-likelihood of the model (df=30).]

As suggested by the log-likelihood, this is the same model as (a) but parametrized in another way:

    logLik(gls.un.a) - logLik(gls.un.c)

[Output: a difference in log-likelihood of the order of 1e-10 (df=30).]
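Both points of this section (the singular design of the naive formula, and the equivalence of parametrizations (a) and (c)) can be reproduced on simulated data. The sketch below is an illustration under explicit assumptions: the data frame toy and its weights are simulated, and ordinary least squares (lm) is used instead of gls, so the within-animal correlation is ignored. The argument about the column space of the design matrix does not depend on the covariance structure.

```r
# Simulated design mimicking the exercise: 10 animals, 6 weeks, treatment after week 4
set.seed(10)
toy <- expand.grid(week = c(1, 3, 4, 5, 6, 7), animal = 1:10)
toy$grp <- ifelse(toy$animal <= 5, "C", "T")
toy$week.factor <- factor(paste0("w", toy$week))
toy$treat <- ifelse(toy$grp == "T" & toy$week > 4, "Yes", "No")
toy$vitaweeks <- ifelse(toy$treat == "Yes", toy$week - 4, 0)
toy$I.treat_week <- ifelse(toy$treat == "Yes", as.character(toy$week.factor), "baseline")
toy$I.treat_week67 <- ifelse(toy$week == 5, "baseline", toy$I.treat_week)
toy$weight <- 450 + 20 * toy$week + 10 * toy$vitaweeks + rnorm(nrow(toy), sd = 15)

# Naive formula: the interaction columns before week 5 are identically zero,
# so the design matrix has fewer independent columns than coefficients
X.naive <- model.matrix(~ week.factor + treat:week.factor, data = toy)
rank.naive <- qr(X.naive)$rank
c(columns = ncol(X.naive), rank = rank.naive)

# Parametrizations (a) and (c) span the same column space:
# vitaweeks = 1 * (treated at w5) + 2 * (treated at w6) + 3 * (treated at w7)
lm.a <- lm(weight ~ week.factor + I.treat_week, data = toy)
lm.c <- lm(weight ~ week.factor + vitaweeks + I.treat_week67, data = toy)

# Same fitted values and same log-likelihood, up to numerical error
all.equal(fitted(lm.a), fitted(lm.c))
as.numeric(logLik(lm.a)) - as.numeric(logLik(lm.c))
```

The individual coefficients of the two fits differ, since they answer different questions, but any quantity that depends only on the fitted means (predictions, likelihood) is identical.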

6 Question 7: Predicted weight profiles

6.1 Compute individual predictions

To compute the predicted profiles for all individuals, you can use the predict function:

    dtl.vitamin[, weight.un.a := predict(gls.un.a, newdata = dtl.vitamin)]
    dtl.vitamin[, weight.un.b := predict(gls.un.b, newdata = dtl.vitamin)]
    dtl.vitamin[, weight.un.c := predict(gls.un.c, newdata = dtl.vitamin)]

6.2 Graphical display

We can directly display the prediction for a given model:

    gg.prediction <- ggplot(dtl.vitamin, aes(x = week, y = weight.un.a, group = grp, color = grp))
    gg.prediction <- gg.prediction + geom_point() + geom_line()
    gg.prediction <- gg.prediction + ylab("model (a): week grptweek")
    gg.prediction

[Figure: predicted weight under model (a) against week, one curve per group (C, T).]

To display the predictions of the three models on several panels, we need to stack the three prediction columns into a single column (long format):

    vec.name <- paste0("weight.",c("un.a","un.b","un.c"))
    vec.name

    [1] "weight.un.a" "weight.un.b" "weight.un.c"

    dtl.prediction <- melt(dtl.vitamin, id.vars = c("grp","animal","week"),
                           value.name = "weight", variable.name = "model",
                           measure.vars = vec.name)
    dtl.prediction

[Output: long-format table with columns grp, animal, week, model and weight; the column model takes the values weight.un.a, weight.un.b and weight.un.c.]

We can then use facet to divide the window into three sub-windows, each displaying the result of a specific model:

    gg.prediction2 <- ggplot(dtl.prediction, aes(x = week, y = weight, group = grp, color = grp))
    gg.prediction2 <- gg.prediction2 + geom_point() + geom_line()
    gg.prediction2 <- gg.prediction2 + facet_wrap(~model, labeller = label_both)
    gg.prediction2

[Figure: predicted weight against week, one curve per group (C, T), in three panels (model: weight.un.a, model: weight.un.b, model: weight.un.c).]

6.3 Note

In the previous graph we have displayed the predicted values for all individuals. However we could only distinguish two curves for each model. This is because, given a group and a week, the predictions are the same for all individuals:

- we do not model individual specific covariates like age.
- we use the marginal predictions and not predictions conditional on the random effects.

In other words, if we have already observed an individual at weeks 1 and 3, we could use these values to obtain a more accurate prediction at week 4 (predictions conditional on the individual random effect). Here we display the predicted values as if we were to perform prediction for a new individual (i.e. not already included in the study).

7 Question 8: Estimate of the difference in weight between the groups at the end of week 7

7.1 Model (a)

The estimated difference in weight is given by the interaction term:

    CI.UN.a <- intervals(gls.un.a, which = "coef")
    CI.UN.a[["coef"]]["I.treat_weekw7",]

[Output: lower, est. and upper for I.treat_weekw7.]

This matches the difference in predicted profiles:

    dtl.vitamin[grp=="T" & week==7,unique(weight.un.a)] - dtl.vitamin[grp=="C" & week==7,unique(weight.un.a)]

However, once again, the confidence intervals are computed using the wrong degrees of freedom:

    beta <- summary(gls.un.a)$tTable["I.treat_weekw7","Value"]
    sd.beta <- summary(gls.un.a)$tTable["I.treat_weekw7","Std.Error"]
    CI.default <- c("lower" = beta + qt(0.025, df = 60-9) * sd.beta,
                    "est." = beta,
                    "upper" = beta + qt(0.975, df = 60-9) * sd.beta)
    CI.default

[Output: lower, est. and upper computed with 60-9 degrees of freedom.]

    CI.corrected <- c("lower" = beta + qt(0.025, df = 7) * sd.beta,
                      "est." = beta,
                      "upper" = beta + qt(0.975, df = 7) * sd.beta)
    CI.corrected

[Output: lower, est. and upper computed with 7 degrees of freedom.]

7.2 Model (b)

In this model the difference is three times the linear term:

    coef(gls.un.b)["vitaweeks"]*3

One can check that this matches the difference in predicted profiles:

    dtl.vitamin[grp=="T" & week==7,unique(weight.un.b)] - dtl.vitamin[grp=="C" & week==7,unique(weight.un.b)]

To obtain the p-value and the standard error (and deduce the confidence interval) one can use the glht function. We first need to indicate that we are interested in 3 times the coefficient vitaweeks:

    coef.un.b <- coef(gls.un.b)
    C <- matrix(0, nrow = 1, ncol = length(coef.un.b),
                dimnames = list(NULL, names(coef.un.b)))
    C[,"vitaweeks"] <- 3
    C

         (Intercept) week.factorw3 week.factorw4 week.factorw5 week.factorw6 week.factorw7 vitaweeks
    [1,]           0             0             0             0             0             0         3

and then call glht:

    glht.un.b <- summary(glht(gls.un.b, linfct = C))
    glht.un.b

    Simultaneous Tests for General Linear Hypotheses

    Fit: gls(model = weight ~ week.factor + vitaweeks, data = dtl.vitamin,
        correlation = corSymm(form = ~1 | animal), weights = varIdent(form = ~1 | week.factor))

[Output, continued: linear hypothesis 1 == 0 with columns Estimate, Std. Error, z value and Pr(>|z|) (adjusted p values reported, single-step method).]

We can obtain the corresponding confidence interval using confint:

    confint(glht(gls.un.b, linfct = C))

[Output: simultaneous confidence interval (Estimate, lwr, upr) for the linear hypothesis 1 == 0, for the same fit.]

In this case this is simply three times the confidence interval of vitaweeks:

    3*intervals(gls.un.b, which = "coef")[["coef"]]["vitaweeks",]

[Output: lower, est. and upper.]

7.3 Model (c)

The results are the same as for model (a), but obtaining them is a bit more complex since the difference is the interaction term at week 7 plus three times the linear term. In this case using glht simplifies the implementation a lot:

    coef.un.c <- coef(gls.un.c)
    C <- matrix(0, nrow = 1, ncol = length(coef.un.c),
                dimnames = list(NULL, names(coef.un.c)))
    C[,"vitaweeks"] <- 3
    C[,"I.treat_week67w7"] <- 1
    C

         (Intercept) week.factorw3 week.factorw4 week.factorw5 week.factorw6 week.factorw7 vitaweeks
    [1,]           0             0             0             0             0             0         3
         I.treat_week67w6 I.treat_week67w7
    [1,]                0                1

    glht.un.c <- summary(glht(gls.un.c, linfct = C))
    glht.un.c

    Simultaneous Tests for General Linear Hypotheses

    Fit: gls(model = weight ~ week.factor + vitaweeks + I.treat_week67,
        data = dtl.vitamin, correlation = corSymm(form = ~1 | animal),
        weights = varIdent(form = ~1 | week.factor))

[Output, continued: linear hypothesis 1 == 0 with columns Estimate, Std. Error, z value and Pr(>|z|), flagged ** (adjusted p values reported, single-step method).]

8 Question 10: Specification of the covariance matrix (compound symmetry vs. unstructured)

8.1 Comparison of the model fit

Specifying a compound symmetry correlation matrix:

    gls.cs <- gls(weight ~ week.factor + I.treat_week,
                  data = dtl.vitamin,
                  correlation = corCompSymm(form = ~1 | animal)
                  )
    logLik(gls.cs)

[Output: log-likelihood of the model (df=11).]

is equivalent to a "standard" mixed model fitted using lme:

    e.lme <- lme(weight ~ week.factor + I.treat_week,
                 data = dtl.vitamin,
                 random = ~1 | animal
                 )
    logLik(e.lme)

[Output: log-likelihood of the model (df=11).]

or lmer from the lme4 package:

    library(lme4)
    e.lmer <- lmer(weight ~ week.factor + I.treat_week + (1 | animal),
                   data = dtl.vitamin)
    logLik(e.lmer)

[Output: log-likelihood of the model (df=11).]

As we can expect, the variance-covariance structure is much simpler compared to the previous models:

    list("unstructured" = unclass(getVarCov(gls.un.a)),
         "compound symmetry" = unclass(getVarCov(gls.cs))
         )

[Output: two 6 x 6 variance-covariance matrices, $unstructured and $`compound symmetry`.]

We can compare the two models using a likelihood ratio test:

    anova(update(gls.cs, method = "REML"),
          update(gls.un.a, method = "REML")
          )

[Output: table with columns Model, df, AIC, BIC, logLik, Test, L.Ratio and p-value; the test 1 vs 2 gives p < .0001.]

So it seems that the unstructured model gives a better fit (p<0.0001).
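The degrees of freedom reported by logLik and the mechanics of the likelihood ratio test can be sketched as follows. The parameter counts come from this document (9 mean parameters, 6 timepoints, df=30 and df=11). The two log-likelihood values ll.cs and ll.un are hypothetical placeholders chosen for the example and are NOT the values of the fitted models.

```r
# Number of parameters in each model
n.times <- 6                               # timepoints per animal
n.mean <- 9                                # intercept + 5 week effects + 3 treatment-by-week effects
n.cov.un <- n.times * (n.times + 1) / 2    # unstructured: 6 variances + 15 correlations
n.cov.cs <- 2                              # compound symmetry: 1 variance + 1 correlation
df.un <- n.mean + n.cov.un
df.cs <- n.mean + n.cov.cs
c(unstructured = df.un, compound.symmetry = df.cs)

# Likelihood ratio test with HYPOTHETICAL log-likelihoods (not the fitted values)
ll.cs <- -250
ll.un <- -220
lrt <- 2 * (ll.un - ll.cs)
p.value <- pchisq(lrt, df = df.un - df.cs, lower.tail = FALSE)
c(L.Ratio = lrt, df = df.un - df.cs, p.value = p.value)
```

The test has 19 degrees of freedom because the two models share the same mean structure and differ only in their covariance parameters (21 against 2), which is also why comparing the two REML fits is legitimate here.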

8.2 Comparison of the fitted values

Computation of the predicted values with confidence intervals using predictSE.gls:

    q <- qnorm(0.975) # assumed multiplier: 95% quantile of the normal distribution

    rescs.tempo <- predictSE.gls(gls.cs, newdata = dtl.vitamin)
    dtl.vitamin[, weight.cs := rescs.tempo$fit]
    dtl.vitamin[, weightinf.cs := rescs.tempo$fit - q * rescs.tempo$se.fit]
    dtl.vitamin[, weightsup.cs := rescs.tempo$fit + q * rescs.tempo$se.fit]

    resun.tempo <- predictSE.gls(gls.un.a, newdata = dtl.vitamin)
    dtl.vitamin[, weight.un.a := resun.tempo$fit]
    dtl.vitamin[, weightinf.un.a := resun.tempo$fit - q * resun.tempo$se.fit]
    dtl.vitamin[, weightsup.un.a := resun.tempo$fit + q * resun.tempo$se.fit]

With the current dataset we could create one graph for each model. But putting the results of both models side by side may help to visualize discrepancies between the models. To do so we first convert the data to the long format. Since this involves reshaping several variables simultaneously, it might be easier to do that manually:

    keep.colscs <- c("grp","animal","week","weight.cs","weightinf.cs","weightsup.cs")
    keep.colsun <- c("grp","animal","week","weight.un.a","weightinf.un.a","weightsup.un.a")

    dt.tempo1 <- dtl.vitamin[,.SD,.SDcols = keep.colscs]
    setnames(dt.tempo1, old = names(dt.tempo1),
             new = c("grp","animal","week","estimate","lower","upper"))
    dt.tempo1[, model := "CS"]

    dt.tempo2 <- dtl.vitamin[,.SD,.SDcols = keep.colsun]
    setnames(dt.tempo2, old = names(dt.tempo2),
             new = c("grp","animal","week","estimate","lower","upper"))
    dt.tempo2[, model := "UN"]

    dtl.prediction2 <- rbind(dt.tempo1, dt.tempo2)
    dtl.prediction2

[Output: 120 rows with columns grp, animal, week, estimate, lower, upper and model (CS or UN).]

melt can also produce a similar result in one operation, provided that the columns to reshape appear in the same order for each model:

    dt.tempo3 <- dtl.vitamin[,.SD,.SDcols = union(keep.colscs, keep.colsun)]
    dtl.prediction2.bis <- melt(dt.tempo3, id.vars = c("grp","animal","week"),
                                measure.vars = patterns("^weight\\.","^weightinf\\.","^weightsup\\."),
                                variable.name = "model",
                                value.name = c("estimate","lower","upper"))
    dtl.prediction2.bis

The column model then takes the value 1 for the compound symmetry model and 2 for the unstructured model.

We can now use ggplot2 to display the predictions:

    gg.predictionic <- ggplot(dtl.prediction2, aes(x = week, y = estimate, group = grp, color = grp))
    gg.predictionic <- gg.predictionic + geom_point() + geom_line()
    gg.predictionic <- gg.predictionic + geom_ribbon(aes(ymin = lower, ymax = upper, fill = grp), alpha = 0.33)
    gg.predictionic <- gg.predictionic + facet_grid(grp ~ model, labeller = label_both)
    gg.predictionic <- gg.predictionic + ylab("weight")
    gg.predictionic

[Figure: predicted weight against week with confidence bands, in a grid of panels with rows grp: C and grp: T and columns model: CS and model: UN.]
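The ribbons are pointwise bounds of the form estimate plus or minus a quantile times the standard error. The sketch below shows this computation on simulated data, with an ordinary linear model (lm) standing in for the gls fits. The data frame toy, the fitted object fit and the use of the normal quantile qnorm(0.975) are assumptions made for the example.

```r
# Simulated weights for 5 animals at the 6 measurement weeks
set.seed(2)
toy <- data.frame(week = rep(c(1, 3, 4, 5, 6, 7), times = 5))
toy$weight <- 450 + 20 * toy$week + rnorm(nrow(toy), sd = 15)

# Simple linear model used as a stand-in for the gls fits
fit <- lm(weight ~ week, data = toy)

# Predictions with their standard errors at each week
new <- data.frame(week = c(1, 3, 4, 5, 6, 7))
pred <- predict(fit, newdata = new, se.fit = TRUE)

# Pointwise 95% band based on the normal quantile
q <- qnorm(0.975)
band <- data.frame(week = new$week,
                   estimate = pred$fit,
                   lower = pred$fit - q * pred$se.fit,
                   upper = pred$fit + q * pred$se.fit)
band
```

As discussed in the technical note on degrees of freedom, a normal quantile is optimistic with only 10 animals; replacing q by a t quantile with about 7 degrees of freedom would give wider bands.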


More information

over Time line for the means). Specifically, & covariances) just a fixed variance instead. PROC MIXED: to 1000 is default) list models with TYPE=VC */

over Time line for the means). Specifically, & covariances) just a fixed variance instead. PROC MIXED: to 1000 is default) list models with TYPE=VC */ CLP 944 Example 4 page 1 Within-Personn Fluctuation in Symptom Severity over Time These data come from a study of weekly fluctuation in psoriasis severity. There was no intervention and no real reason

More information

Inferences on Linear Combinations of Coefficients

Inferences on Linear Combinations of Coefficients Inferences on Linear Combinations of Coefficients Note on required packages: The following code required the package multcomp to test hypotheses on linear combinations of regression coefficients. If you

More information

Exploring Hierarchical Linear Mixed Models

Exploring Hierarchical Linear Mixed Models Exploring Hierarchical Linear Mixed Models 1/49 Last time... A Greenhouse Experiment testing C:N Ratios Sam was testing how changing the C:N Ratio of soil affected plant leaf growth. He had 3 treatments.

More information

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010 MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010 Part 1 of this document can be found at http://www.uvm.edu/~dhowell/methods/supplements/mixed Models for Repeated Measures1.pdf

More information

SAS Syntax and Output for Data Manipulation: CLDP 944 Example 3a page 1

SAS Syntax and Output for Data Manipulation: CLDP 944 Example 3a page 1 CLDP 944 Example 3a page 1 From Between-Person to Within-Person Models for Longitudinal Data The models for this example come from Hoffman (2015) chapter 3 example 3a. We will be examining the extent to

More information

Regression Analysis in R

Regression Analysis in R Regression Analysis in R 1 Purpose The purpose of this activity is to provide you with an understanding of regression analysis and to both develop and apply that knowledge to the use of the R statistical

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Handout 4: Simple Linear Regression

Handout 4: Simple Linear Regression Handout 4: Simple Linear Regression By: Brandon Berman The following problem comes from Kokoska s Introductory Statistics: A Problem-Solving Approach. The data can be read in to R using the following code:

More information

Correlated Data: Linear Mixed Models with Random Intercepts

Correlated Data: Linear Mixed Models with Random Intercepts 1 Correlated Data: Linear Mixed Models with Random Intercepts Mixed Effects Models This lecture introduces linear mixed effects models. Linear mixed models are a type of regression model, which generalise

More information

Introduction to Statistics and R

Introduction to Statistics and R Introduction to Statistics and R Mayo-Illinois Computational Genomics Workshop (2018) Ruoqing Zhu, Ph.D. Department of Statistics, UIUC rqzhu@illinois.edu June 18, 2018 Abstract This document is a supplimentary

More information

STAT 3022 Spring 2007

STAT 3022 Spring 2007 Simple Linear Regression Example These commands reproduce what we did in class. You should enter these in R and see what they do. Start by typing > set.seed(42) to reset the random number generator so

More information

Temporal Learning: IS50 prior RT

Temporal Learning: IS50 prior RT Temporal Learning: IS50 prior RT Loading required package: Matrix Jihyun Suh 1/27/2016 This data.table install has not detected OpenMP support. It will work but slower in single threaded m Attaching package:

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 10 Analysing Longitudinal Data I: Computerised Delivery of Cognitive Behavioural Therapy Beat the Blues 10.1 Introduction

More information

Introduction to SAS proc mixed

Introduction to SAS proc mixed Faculty of Health Sciences Introduction to SAS proc mixed Analysis of repeated measurements, 2017 Julie Forman Department of Biostatistics, University of Copenhagen 2 / 28 Preparing data for analysis The

More information

A Handbook of Statistical Analyses Using R 3rd Edition. Torsten Hothorn and Brian S. Everitt

A Handbook of Statistical Analyses Using R 3rd Edition. Torsten Hothorn and Brian S. Everitt A Handbook of Statistical Analyses Using R 3rd Edition Torsten Hothorn and Brian S. Everitt CHAPTER 15 Simultaneous Inference and Multiple Comparisons: Genetic Components of Alcoholism, Deer Browsing

More information

SAS Syntax and Output for Data Manipulation:

SAS Syntax and Output for Data Manipulation: CLP 944 Example 5 page 1 Practice with Fixed and Random Effects of Time in Modeling Within-Person Change The models for this example come from Hoffman (2015) chapter 5. We will be examining the extent

More information

Introduction to SAS proc mixed

Introduction to SAS proc mixed Faculty of Health Sciences Introduction to SAS proc mixed Analysis of repeated measurements, 2017 Julie Forman Department of Biostatistics, University of Copenhagen Outline Data in wide and long format

More information

Introductory Statistics with R: Simple Inferences for continuous data

Introductory Statistics with R: Simple Inferences for continuous data Introductory Statistics with R: Simple Inferences for continuous data Statistical Packages STAT 1301 / 2300, Fall 2014 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail: sungkyu@pitt.edu

More information

lme4 Luke Chang Last Revised July 16, Fitting Linear Mixed Models with a Varying Intercept

lme4 Luke Chang Last Revised July 16, Fitting Linear Mixed Models with a Varying Intercept lme4 Luke Chang Last Revised July 16, 2010 1 Using lme4 1.1 Fitting Linear Mixed Models with a Varying Intercept We will now work through the same Ultimatum Game example from the regression section and

More information

22s:152 Applied Linear Regression. Returning to a continuous response variable Y...

22s:152 Applied Linear Regression. Returning to a continuous response variable Y... 22s:152 Applied Linear Regression Generalized Least Squares Returning to a continuous response variable Y... Ordinary Least Squares Estimation The classical models we have fit so far with a continuous

More information

Mixed Model: Split plot with two whole-plot factors, one split-plot factor, and CRD at the whole-plot level (e.g. fancier split-plot p.

Mixed Model: Split plot with two whole-plot factors, one split-plot factor, and CRD at the whole-plot level (e.g. fancier split-plot p. STAT:5201 Applied Statistic II Mixed Model: Split plot with two whole-plot factors, one split-plot factor, and CRD at the whole-plot level (e.g. fancier split-plot p.422 OLRT) Hamster example with three

More information

22s:152 Applied Linear Regression. In matrix notation, we can write this model: Generalized Least Squares. Y = Xβ + ɛ with ɛ N n (0, Σ)

22s:152 Applied Linear Regression. In matrix notation, we can write this model: Generalized Least Squares. Y = Xβ + ɛ with ɛ N n (0, Σ) 22s:152 Applied Linear Regression Generalized Least Squares Returning to a continuous response variable Y Ordinary Least Squares Estimation The classical models we have fit so far with a continuous response

More information

Week 8, Lectures 1 & 2: Fixed-, Random-, and Mixed-Effects models

Week 8, Lectures 1 & 2: Fixed-, Random-, and Mixed-Effects models Week 8, Lectures 1 & 2: Fixed-, Random-, and Mixed-Effects models 1. The repeated measures design, where each of n Ss is measured k times, is a popular one in Psych. We approach this design in 2 ways:

More information

STAT 572 Assignment 5 - Answers Due: March 2, 2007

STAT 572 Assignment 5 - Answers Due: March 2, 2007 1. The file glue.txt contains a data set with the results of an experiment on the dry sheer strength (in pounds per square inch) of birch plywood, bonded with 5 different resin glues A, B, C, D, and E.

More information

df=degrees of freedom = n - 1

df=degrees of freedom = n - 1 One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:

More information

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph. Regression, Part I I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would

More information

Analysis of 2x2 Cross-Over Designs using T-Tests

Analysis of 2x2 Cross-Over Designs using T-Tests Chapter 234 Analysis of 2x2 Cross-Over Designs using T-Tests Introduction This procedure analyzes data from a two-treatment, two-period (2x2) cross-over design. The response is assumed to be a continuous

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Package HGLMMM for Hierarchical Generalized Linear Models

Package HGLMMM for Hierarchical Generalized Linear Models Package HGLMMM for Hierarchical Generalized Linear Models Marek Molas Emmanuel Lesaffre Erasmus MC Erasmus Universiteit - Rotterdam The Netherlands ERASMUSMC - Biostatistics 20-04-2010 1 / 52 Outline General

More information

Statistical Prediction

Statistical Prediction Statistical Prediction P.R. Hahn Fall 2017 1 Some terminology The goal is to use data to find a pattern that we can exploit. y: response/outcome/dependent/left-hand-side x: predictor/covariate/feature/independent

More information

Homework 3 - Solution

Homework 3 - Solution STAT 526 - Spring 2011 Homework 3 - Solution Olga Vitek Each part of the problems 5 points 1. KNNL 25.17 (Note: you can choose either the restricted or the unrestricted version of the model. Please state

More information

Hypothesis Testing. Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true

Hypothesis Testing. Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true Hypothesis esting Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true Statistical Hypothesis: conjecture about a population parameter

More information

Using R in 200D Luke Sonnet

Using R in 200D Luke Sonnet Using R in 200D Luke Sonnet Contents Working with data frames 1 Working with variables........................................... 1 Analyzing data............................................... 3 Random

More information

Linear Probability Model

Linear Probability Model Linear Probability Model Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables. If

More information

Multivariate Analysis of Variance

Multivariate Analysis of Variance Chapter 15 Multivariate Analysis of Variance Jolicouer and Mosimann studied the relationship between the size and shape of painted turtles. The table below gives the length, width, and height (all in mm)

More information

Workshop 7.4a: Single factor ANOVA

Workshop 7.4a: Single factor ANOVA -1- Workshop 7.4a: Single factor ANOVA Murray Logan November 23, 2016 Table of contents 1 Revision 1 2 Anova Parameterization 2 3 Partitioning of variance (ANOVA) 10 4 Worked Examples 13 1. Revision 1.1.

More information

Non-independence due to Time Correlation (Chapter 14)

Non-independence due to Time Correlation (Chapter 14) Non-independence due to Time Correlation (Chapter 14) When we model the mean structure with ordinary least squares, the mean structure explains the general trends in the data with respect to our dependent

More information

R in Linguistic Analysis. Wassink 2012 University of Washington Week 6

R in Linguistic Analysis. Wassink 2012 University of Washington Week 6 R in Linguistic Analysis Wassink 2012 University of Washington Week 6 Overview R for phoneticians and lab phonologists Johnson 3 Reading Qs Equivalence of means (t-tests) Multiple Regression Principal

More information

Overview. 1. Independence. 2. Modeling Autocorrelation. 3. Temporal Autocorrelation Example. 4. Spatial Autocorrelation Example

Overview. 1. Independence. 2. Modeling Autocorrelation. 3. Temporal Autocorrelation Example. 4. Spatial Autocorrelation Example 6. Autocorrelation Overview 1. Independence 2. Modeling Autocorrelation 3. Temporal Autocorrelation Example 4. Spatial Autocorrelation Example 6.1 Independence 6.1 Independence. Model assumptions All linear

More information

Finite Mixture Model Diagnostics Using Resampling Methods

Finite Mixture Model Diagnostics Using Resampling Methods Finite Mixture Model Diagnostics Using Resampling Methods Bettina Grün Johannes Kepler Universität Linz Friedrich Leisch Universität für Bodenkultur Wien Abstract This paper illustrates the implementation

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Chapter 5 Exercises 1

Chapter 5 Exercises 1 Chapter 5 Exercises 1 Data Analysis & Graphics Using R, 2 nd edn Solutions to Exercises (December 13, 2006) Preliminaries > library(daag) Exercise 2 For each of the data sets elastic1 and elastic2, determine

More information

Stat 209 Lab: Linear Mixed Models in R This lab covers the Linear Mixed Models tutorial by John Fox. Lab prepared by Karen Kapur. ɛ i Normal(0, σ 2 )

Stat 209 Lab: Linear Mixed Models in R This lab covers the Linear Mixed Models tutorial by John Fox. Lab prepared by Karen Kapur. ɛ i Normal(0, σ 2 ) Lab 2 STAT209 1/31/13 A complication in doing all this is that the package nlme (lme) is supplanted by the new and improved lme4 (lmer); both are widely used so I try to do both tracks in separate Rogosa

More information

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Package r2glmm. August 5, 2017

Package r2glmm. August 5, 2017 Type Package Package r2glmm August 5, 2017 Title Computes R Squared for Mixed (Multilevel) Models Date 2017-08-04 Version 0.1.2 The model R squared and semi-partial R squared for the linear and generalized

More information

Booklet of Code and Output for STAD29/STA 1007 Midterm Exam

Booklet of Code and Output for STAD29/STA 1007 Midterm Exam Booklet of Code and Output for STAD29/STA 1007 Midterm Exam List of Figures in this document by page: List of Figures 1 Packages................................ 2 2 Hospital infection risk data (some).................

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Coping with Additional Sources of Variation: ANCOVA and Random Effects

Coping with Additional Sources of Variation: ANCOVA and Random Effects Coping with Additional Sources of Variation: ANCOVA and Random Effects 1/49 More Noise in Experiments & Observations Your fixed coefficients are not always so fixed Continuous variation between samples

More information

STAT 510 Final Exam Spring 2015

STAT 510 Final Exam Spring 2015 STAT 510 Final Exam Spring 2015 Instructions: The is a closed-notes, closed-book exam No calculator or electronic device of any kind may be used Use nothing but a pen or pencil Please write your name and

More information

Outline. Mixed models in R using the lme4 package Part 3: Longitudinal data. Sleep deprivation data. Simple longitudinal data

Outline. Mixed models in R using the lme4 package Part 3: Longitudinal data. Sleep deprivation data. Simple longitudinal data Outline Mixed models in R using the lme4 package Part 3: Longitudinal data Douglas Bates Longitudinal data: sleepstudy A model with random effects for intercept and slope University of Wisconsin - Madison

More information

Mixed Model Theory, Part I

Mixed Model Theory, Part I enote 4 1 enote 4 Mixed Model Theory, Part I enote 4 INDHOLD 2 Indhold 4 Mixed Model Theory, Part I 1 4.1 Design matrix for a systematic linear model.................. 2 4.2 The mixed model.................................

More information

Analysis of Covariance: Comparing Regression Lines

Analysis of Covariance: Comparing Regression Lines Chapter 7 nalysis of Covariance: Comparing Regression ines Suppose that you are interested in comparing the typical lifetime (hours) of two tool types ( and ). simple analysis of the data given below would

More information

CO2 Handout. t(cbind(co2$type,co2$treatment,model.matrix(~type*treatment,data=co2)))

CO2 Handout. t(cbind(co2$type,co2$treatment,model.matrix(~type*treatment,data=co2))) CO2 Handout CO2.R: library(nlme) CO2[1:5,] plot(co2,outer=~treatment*type,layout=c(4,1)) m1co2.lis

More information

SPH 247 Statistical Analysis of Laboratory Data

SPH 247 Statistical Analysis of Laboratory Data SPH 247 Statistical Analysis of Laboratory Data March 31, 2015 SPH 247 Statistical Analysis of Laboratory Data 1 ANOVA Fixed and Random Effects We will review the analysis of variance (ANOVA) and then

More information

Hierarchical Random Effects

Hierarchical Random Effects enote 5 1 enote 5 Hierarchical Random Effects enote 5 INDHOLD 2 Indhold 5 Hierarchical Random Effects 1 5.1 Introduction.................................... 2 5.2 Main example: Lactase measurements in

More information

Aedes egg laying behavior Erika Mudrak, CSCU November 7, 2018

Aedes egg laying behavior Erika Mudrak, CSCU November 7, 2018 Aedes egg laying behavior Erika Mudrak, CSCU November 7, 2018 Introduction The current study investivates whether the mosquito species Aedes albopictus preferentially lays it s eggs in water in containers

More information

Inference with Heteroskedasticity

Inference with Heteroskedasticity Inference with Heteroskedasticity Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables.

More information

Module 4: Regression Methods: Concepts and Applications

Module 4: Regression Methods: Concepts and Applications Module 4: Regression Methods: Concepts and Applications Example Analysis Code Rebecca Hubbard, Mary Lou Thompson July 11-13, 2018 Install R Go to http://cran.rstudio.com/ (http://cran.rstudio.com/) Click

More information

A Re-Introduction to General Linear Models (GLM)

A Re-Introduction to General Linear Models (GLM) A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing

More information

Univariate Analysis of Variance

Univariate Analysis of Variance Univariate Analysis of Variance Output Created Comments Input Missing Value Handling Syntax Resources Notes Data Active Dataset Filter Weight Split File N of Rows in Working Data File Definition of Missing

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

R-companion to: Estimation of the Thurstonian model for the 2-AC protocol

R-companion to: Estimation of the Thurstonian model for the 2-AC protocol R-companion to: Estimation of the Thurstonian model for the 2-AC protocol Rune Haubo Bojesen Christensen, Hye-Seong Lee & Per Bruun Brockhoff August 24, 2017 This document describes how the examples in

More information

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS In our work on hypothesis testing, we used the value of a sample statistic to challenge an accepted value of a population parameter. We focused only

More information

Hierarchical Linear Models (HLM) Using R Package nlme. Interpretation. 2 = ( x 2) u 0j. e ij

Hierarchical Linear Models (HLM) Using R Package nlme. Interpretation. 2 = ( x 2) u 0j. e ij Hierarchical Linear Models (HLM) Using R Package nlme Interpretation I. The Null Model Level 1 (student level) model is mathach ij = β 0j + e ij Level 2 (school level) model is β 0j = γ 00 + u 0j Combined

More information

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study 1.4 0.0-6 7 8 9 10 11 12 13 14 15 16 17 18 19 age Model 1: A simple broken stick model with knot at 14 fit with

More information

Lab #5 - Predictive Regression I Econ 224 September 11th, 2018

Lab #5 - Predictive Regression I Econ 224 September 11th, 2018 Lab #5 - Predictive Regression I Econ 224 September 11th, 2018 Introduction This lab provides a crash course on least squares regression in R. In the interest of time we ll work with a very simple, but

More information

STAT 215 Confidence and Prediction Intervals in Regression

STAT 215 Confidence and Prediction Intervals in Regression STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

Introduction to Mixed Models in R

Introduction to Mixed Models in R Introduction to Mixed Models in R Galin Jones School of Statistics University of Minnesota http://www.stat.umn.edu/ galin March 2011 Second in a Series Sponsored by Quantitative Methods Collaborative.

More information

The coxvc_1-1-1 package

The coxvc_1-1-1 package Appendix A The coxvc_1-1-1 package A.1 Introduction The coxvc_1-1-1 package is a set of functions for survival analysis that run under R2.1.1 [81]. This package contains a set of routines to fit Cox models

More information

STAT 526 Advanced Statistical Methodology

STAT 526 Advanced Statistical Methodology STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized

More information

Simple, Marginal, and Interaction Effects in General Linear Models

Simple, Marginal, and Interaction Effects in General Linear Models Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means

More information

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES BIOL 458 - Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES PART 1: INTRODUCTION TO ANOVA Purpose of ANOVA Analysis of Variance (ANOVA) is an extremely useful statistical method

More information

Mixed effects models

Mixed effects models Mixed effects models The basic theory and application in R Mitchel van Loon Research Paper Business Analytics Mixed effects models The basic theory and application in R Author: Mitchel van Loon Research

More information

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6. Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized

More information

Using the lsmeans Package

Using the lsmeans Package Using the lsmeans Package Russell V. Lenth The University of Iowa russell-lenth@uiowa.edu November, 0 Introduction Least-squares means (or LS means), popularized by SAS, are predictions from a linear model

More information