Outline. Mixed models in R using the lme4 package Part 3: Longitudinal data. Sleep deprivation data. Simple longitudinal data

Similar documents
Mixed models in R using the lme4 package Part 2: Longitudinal data, modeling interactions

A brief introduction to mixed models

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

lme4 Luke Chang Last Revised July 16, Fitting Linear Mixed Models with a Varying Intercept

Fitting Linear Mixed-Effects Models Using the lme4 Package in R

Mixed effects models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Value Added Modeling

Fitting Mixed-Effects Models Using the lme4 Package in R

Workshop 9.3a: Randomized block designs

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

Random and Mixed Effects Models - Part II

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Correlated Data: Linear Mixed Models with Random Intercepts

Introduction and Background to Multilevel Analysis

Mixed models in R using the lme4 package Part 3: Linear mixed models with simple, scalar random effects

36-463/663: Hierarchical Linear Models

Introduction to Linear Mixed Models: Modeling continuous longitudinal outcomes

Generalized linear models

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study

Longitudinal Data Analysis

Mixed models in R using the lme4 package Part 1: Linear mixed models with simple, scalar random

Hierarchical Linear Models (HLM) Using R Package nlme. Interpretation. 2 = ( x 2) u 0j. e ij

Aedes egg laying behavior Erika Mudrak, CSCU November 7, 2018

Introduction to Within-Person Analysis and RM ANOVA

Generalized Linear and Nonlinear Mixed-Effects Models

University of Minnesota Duluth

R Output for Linear Models using functions lm(), gls() & glm()

Outline. Mixed models in R using the lme4 package Part 2: Linear mixed models with simple, scalar random effects. R packages. Accessing documentation

20. REML Estimation of Variance Components. Copyright c 2018 (Iowa State University) 20. Statistics / 36

Mixed models in R using the lme4 package Part 4: Theory of linear mixed models

Mixed models in R using the lme4 package Part 1: Linear mixed models with simple, scalar random effects

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form

Model comparison and selection

School of Mathematical Sciences. Question 1

An Introduction to Path Analysis

One-stage dose-response meta-analysis

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Stat 579: Generalized Linear Models and Extensions

Random and Mixed Effects Models - Part III

Analysis of binary repeated measures data with R

A (Brief) Introduction to Crossed Random Effects Models for Repeated Measures Data

Outline for today. Two-way analysis of variance with random effects

An Introduction to Mplus and Path Analysis

Introduction to the Analysis of Hierarchical and Longitudinal Data

Package blme. August 29, 2016

Coping with Additional Sources of Variation: ANCOVA and Random Effects

Machine Learning for OR & FE

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017

These slides illustrate a few example R commands that can be useful for the analysis of repeated measures data.

Homework 3 - Solution

CS Homework 3. October 15, 2009

over Time line for the means). Specifically, & covariances) just a fixed variance instead. PROC MIXED: to 1000 is default) list models with TYPE=VC */

Regression, Ridge Regression, Lasso

Chapter 1 Statistical Inference

Linear Model Selection and Regularization

A. A Brief Introduction to Mixed-Effects Models

Lecture 9 STK3100/4100

Review of CLDP 944: Multilevel Models for Longitudinal Data

Exploring Hierarchical Linear Mixed Models

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.

Generalized, Linear, and Mixed Models

Journal of Statistical Software

STATISTICS 110/201 PRACTICE FINAL EXAM

Introductory Econometrics

UNIVERSITY OF TORONTO Faculty of Arts and Science

Multilevel Modeling Day 2 Intermediate and Advanced Issues: Multilevel Models as Mixed Models. Jian Wang September 18, 2012

Simple Linear Regression

Multivariate Statistics in Ecology and Quantitative Genetics Summary

Power analysis examples using R

Non-Gaussian Response Variables

Penalized least squares versus generalized least squares representations of linear mixed models

Mixed Model Theory, Part I

Mixed-effects Maximum Likelihood Difference Scaling

arxiv: v2 [stat.co] 17 Mar 2018

Model Estimation Example

Correlational Structure in the Random-Effect Structure of Mixed Models

Model selection and comparison

10. Alternative case influence statistics

Stat 5303 (Oehlert): Balanced Incomplete Block Designs 1

Investigation into the use of confidence indicators with calibration

Answer to exercise: Blood pressure lowering drugs

How the mean changes depends on the other variable. Plots can show what s happening...

Ch 2: Simple Linear Regression

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

I r j Binom(m j, p j ) I L(, ; y) / exp{ y j + (x j y j ) m j log(1 + e + x j. I (, y) / L(, ; y) (, )

Tables Table A Table B Table C Table D Table E 675

Case of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1

The linear mixed model: introduction and the basic model

Using R formulae to test for main effects in the presence of higher-order interactions

A Brief and Friendly Introduction to Mixed-Effects Models in Linguistics

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

Simple logistic regression

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Longitudinal Invariance CFA (using MLR) Example in Mplus v. 7.4 (N = 151; 6 items over 3 occasions)

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

A strategy for modelling count data which may have extra zeros

Transcription:

Outline Mixed models in R using the lme4 package Part 3: Longitudinal data Douglas Bates Longitudinal data: sleepstudy A model with random effects for intercept and slope University of Wisconsin - Madison and R Development Core Team <Douglas.Bates@R-project.org> Conditional means UseR!9, Rennes, France July 7, 9 Simple longitudinal data Repeated measures data consist of measurements of a response (and, perhaps, some covariates) on several experimental (or observational) units. Frequently the experimental (observational) unit is Subject and we will refer to these units as subjects. However, the methods described here are not restricted to data on human subjects. Longitudinal data are repeated measures data in which the observations are taken over time. We wish to characterize the response over time within subjects and the variation in the time trends between subjects. Frequently we are not as interested in comparing the particular subjects in the study as much as we are interested in modeling the variability in the population from which the subjects were chosen. Sleep deprivation data This laboratory experiment measured the effect of sleep deprivation on cognitive performance. There were 18 subjects, chosen from the population of interest (long-distance truck drivers), in the 10 day trial. These subjects were restricted to 3 hours sleep per night during the trial. On each day of the trial each subject s reaction time was measured. The reaction time shown here is the average of several measurements. These data are balanced in that each subject is measured the same number of times and on the same occasions.

Reaction time versus days by subject Comments on the sleep data plot Average reaction time (ms) of sleep deprivation The plot is a trellis or lattice plot where the data for each subject are presented in a separate panel. The axes are consistent across panels so we may compare patterns across subjects. A reference line fit by simple linear regression to the panel s data has been added to each panel. The aspect ratio of the panels has been adjusted so that a typical reference line lies about 45 on the page. We have the greatest sensitivity in checking for differences in slopes when the lines are near ±45 on the page. The panels have been ordered not by subject number (which is essentially a random order) but according to increasing intercept for the simple linear regression. If the slopes and the intercepts are highly correlated we should see a pattern across the panels in the slopes. Assessing the linear fits In most cases a simple linear regression provides an adequate fit to the within-subject data. Patterns for some subjects (e.g., and ) deviate from linearity but the deviations are neither widespread nor consistent in form. There is considerable variation in the intercept (estimated reaction time without sleep deprivation) across subjects ms. up to ms. and in the slope (increase in reaction time per day of sleep deprivation) 0 ms./day up to 20 ms./day. We can examine this variation further by plotting confidence intervals for these intercepts and slopes. Because we use a pooled variance estimate and have balanced data, the intervals have identical widths. We again order the subjects by increasing intercept so we can check for relationships between slopes and intercepts. 95% conf int on within-subject intercept and slope 180 220 240 260 280 10 0 10 20 These intervals reinforce our earlier impressions of considerable variability between subjects in both intercept and slope but little evidence of a relationship between intercept and slope.

A preliminary mixed-effects model We begin with a linear mixed model in which the fixed effects [β 1, β 2 ] are the representative intercept and slope for the population and the random effects b i = [b i1, b i2 ], i = 1,..., 18 are the deviations in intercept and slope associated with subject i. The random effects vector, b, consists of the 18 intercept effects followed by the 18 slope effects. 10 20 30 50 100 150 Fitting the model > (fm1 <- lmer(reaction ~ + ( Subject), + sleepstudy)) Linear mixed model fit by REML Formula: Reaction ~ + ( Subject) AIC BIC loglik deviance REMLdev 1756 1775-871.8 1752 1744 Random effects: Groups Name Variance Std.Dev. Corr Subject 612.092 24.7405 35.072 5.9221 0.066 Residual 654.941 25.5918 Number of obs: 180, groups: Subject, 18 Fixed effects: Estimate Std. Error t value 251.405 6.825 36.84 10.467 1.546 6.77 Correlation of Fixed Effects: (Intr) -0.138 Terms and matrices The term in the formula generates a model matrix X with two columns, the intercept column and the numeric column. (The intercept is included unless suppressed.) The term (Subject) generates a vector-valued random effect (intercept and slope) for each of the 18 levels of the Subject factor. A model with uncorrelated random effects The data plots gave little indication of a systematic relationship between a subject s random effect for slope and his/her random effect for the intercept. Also, the estimated correlation is quite small. We should consider a model with uncorrelated random effects. To express this we use two random-effects terms with the same grouping factor and different left-hand sides. In the formula for an lmer model, distinct random effects terms are modeled as being independent. Thus we specify the model with two distinct random effects terms, each of which has Subject as the grouping factor. The model matrix for one term is intercept only (1) and for the other term is the column for only, which can be written 0+. (The expression generates a column for and an intercept. To suppress the intercept we add 0+ to the expression; -1 also works.)

A mixed-effects model with independent random effects Linear mixed model fit by REML Formula: Reaction ~ + (1 Subject) + (0 + Subject) AIC BIC loglik deviance REMLdev 1754 1770-871.8 1752 1744 Random effects: Groups Name Variance Std.Dev. Subject 627.568 25.0513 Subject 35.858 5.9882 Residual 653.584 25.5653 Number of obs: 180, groups: Subject, 18 Fixed effects: Estimate Std. Error t value 251.405 6.885 36.51 10.467 1.559 6.71 Correlation of Fixed Effects: (Intr) -0.184 Comparing the models Model fm1 contains model fm2 in the sense that if the parameter values for model fm1 were constrained so as to force the correlation, and hence the covariance, to be zero, and the model were re-fit, we would get model fm2. The value 0, to which the correlation is constrained, is not on the boundary of the allowable parameter values. In these circumstances a likelihood ratio test and a reference distribution of a χ 2 on 1 degree of freedom is suitable. > anova(fm2, fm1) Models: fm2: Reaction ~ + (1 Subject) + (0 + Subject) fm1: Reaction ~ + ( Subject) Df AIC BIC loglik Chisq Chi Df Pr(>Chisq) fm2 5 1762.05 1778.01-876.02 fm1 6 1763.99 1783.14-875.99 0.0609 1 0.805 Conclusions from the likelihood ratio test Because the large p-value indicates that we would not reject fm2 in favor of fm1, we prefer the more parsimonious fm2. This conclusion is consistent with the AIC (Akaike s Information Criterion) and the BIC (Bayesian Information Criterion) values for which smaller is better. We can also use a Bayesian approach, where we regard the parameters as themselves being random variables, is assessing the values of such parameters. A currently popular Bayesian method is to use sequential sampling from the conditional distribution of subsets of the parameters, given the data and the values of the other parameters. The general technique is called Markov chain Monte Carlo sampling. The lme4 package has a function called mcmcsamp to evaluate such samples from a fitted model. At present, however, there seem to be a few infelicities, as Bill Venables calls them, in this function. Likelihood ratio tests on variance components As for the case of a covariance, we can fit the model with and without the variance component and compare the fit quality. As mentioned previously, the likelihood ratio is a reasonable test statistic for the comparison but the asymptotic reference distribution of a χ 2 does not apply because the parameter value being tested is on the boundary. The p-value computed using the χ 2 reference distribution should be conservative (i.e. greater than the p-value that would be obtained through simulation). > fm3 <- lmer(reaction ~ + (1 Subject), sleepstudy) > anova(fm3, fm2) Models: fm3: Reaction ~ + (1 Subject) fm2: Reaction ~ + (1 Subject) + (0 + Subject) Df AIC BIC loglik Chisq Chi Df Pr(>Chisq) fm3 4 1802.10 1814.87-897.05 fm2 5 1762.05 1778.01-876.02 42.053 1 8.885e-11

Conditional means of the random effects > (rr2 <- ranef(fm2)) $Subject 1.5138201 9.3241219-40.3749106-8.5997562-39.1816682-5.3881596 24.5182906-4.9689806 22.9140345-3.1941494 9.2219311-0.5136 17.1560765-0.2872253-7.4515945 1.1160651 0.5774092-10.9067061 34.7689483 8.6282046-25.7541540 1.2807723-13.8642119 6.7568576 4.9156063-3.0753411 20.9294539 3.5124498 3.2587508 0.8731102-26.4752098 4.9841221 0.9055257-1.0053610 12.4219020 1.2584893 Comparing within-subject coefficients Scatterplot of the conditional means 20 0 20 40 10 5 0 5 10 Estimated within-group coefficients and BLUPs For this model we can combine the conditional means of the random effects and the estimates of the fixed effects to get conditional means of the within-subject coefficients. Mixed model Within group These conditional means will be shrunken towards the fixed-effects estimates relative to the estimated coefficients from each subject s data. John Tukey called this borrowing strength between subjects. Plotting the shrinkage of the within-subject coefficients shows that some of the coefficients are considerably shrunken toward the fixed-effects estimates. However, comparing the within-group and mixed model fitted lines shows that large changes in coefficients occur in the noisy data. Precisely estimated within-group coefficients are not changed substantially. 280 260 240 220 0 5 10 15 20

Observed and fitted Plot of prediction intervals for the random effects Average reaction time (ms) Mixed model Within group 60 40 20 0 20 40 60 15 10 5 0 5 10 15 of sleep deprivation Each set of prediction intervals have constant width because of the balance in the experiment. Conclusions from the example Carefully plotting the data is enormously helpful in formulating the model. It is relatively easy to fit and evaluate models to data like these, from a balanced designed experiment. We consider two models with random effects for the slope and the intercept of the response w.r.t. time by subject. The models differ in whether the (marginal) correlation of the vector of random effects per subject is allowed to be nonzero. The estimates (actually, the conditional means) of the random effects can be considered as penalized estimates of these parameters in that they are shrunk towards the origin. Most of the prediction intervals for the random effects overlap zero.