Random Coefficients Model Examples


Random Coefficients Model Examples
STAT:5201 Week 15 - Lecture 2
1 / 26

Each subject (or experimental unit) has multiple measurements (this could be over time, or it could be multiple measurements on a continuous variable x). Random effects are included in the model to allow for a random intercept and a random slope, essentially allowing a separate line for each subject:

    Y_ij = β0 + β1 x_ij + b_0i + b_1i x_ij + ε_ij

with

    b_i = (b_0i, b_1i)' ~ N( (0, 0)', [ d11 d12 ; d21 d22 ] )   and   ε_ij ~ N(0, σ²),

b and ε independent of each other.

Individual: E[Y_ij | b_0i, b_1i] = (β0 + b_0i) + (β1 + b_1i) x_ij
Marginal:   E[Y_ij] = β0 + β1 x_ij

2 / 26
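The marginal (population-averaged) second moments implied by this model follow directly from the random-effects specification. A minimal Python sketch of that arithmetic, using made-up values for d11, d12, d22, and σ² (illustration values only, not the estimates from the example below):

```python
# Hypothetical variance components, for illustration only
# (not the Blackmore-data estimates appearing later).
d11, d12, d22, sigma2 = 2.0, -0.07, 0.03, 1.5

def marginal_var(x):
    # Var(Y_ij) = Var(b0i) + 2x Cov(b0i, b1i) + x^2 Var(b1i) + Var(eps)
    return d11 + 2 * x * d12 + x**2 * d22 + sigma2

def marginal_cov(x1, x2):
    # Cov(Y_ij, Y_ik) for two measurements on the SAME subject at x1, x2:
    # d11 + (x1 + x2) d12 + x1 x2 d22  (no residual term when j != k)
    return d11 + (x1 + x2) * d12 + x1 * x2 * d22

v0 = marginal_var(0.0)        # at x = 0: intercept variance + residual = 3.5
c04 = marginal_cov(0.0, 4.0)  # within-subject covariance, approx 1.72
```

Note that the marginal variance changes with x, so the random coefficients model induces heteroscedasticity over x even though ε has constant variance.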

Adapted from a John Fox example in the Linear Mixed Models appendix of An R and S-PLUS Companion to Applied Regression. Eating disorders can be difficult to treat because many patients do not feel the need for treatment, even when friends & family recognize the severity. Even after patients with eating disorders are hospitalized, they can continue behaviors that are detrimental to their health. Here, data were collected recording the amount of exercise of 138 teenage girls hospitalized for eating disorders, and of a group of 93 control subjects.

Variables:
subject: a factor with subject id codes
age: age in years
exercise: hours per week of exercise
group: factor indicating patient or control

3 / 26

We will consider the example in R and in SAS.

> library(car)
> data(Blackmore)
> head(Blackmore)
  subject   age exercise   group
1     100  8.00     2.71 patient
2     100 10.00     1.94 patient
3     100 12.00     2.36 patient
4     100 14.00     1.54 patient
5     100 15.92     8.63 patient
6     101  8.00     0.14 patient
> dim(Blackmore)
[1] 945   4
> length(unique(Blackmore$subject[Blackmore$group=="patient"]))
[1] 138
> length(unique(Blackmore$subject[Blackmore$group=="control"]))
[1] 93

4 / 26

Fox transformed the response variable for numerous reasons (described in the text) as log2(y + 5/60).

> Blackmore$log.exercise <- log(Blackmore$exercise + 5/60, 2)
> attach(Blackmore)

Investigating the data with plots (in R). Use a random sample of 20 girls from each group for trend plotting. The groupedData object from the nlme package is used to form the trellis plots.

> library(nlme)
> chosen.pat.ids <- sample(unique(subject[group=="patient"]), 20)
> chosen.pat.20 <- groupedData(log.exercise ~ age | subject,
+     data=Blackmore[is.element(subject, chosen.pat.ids),])
> chosen.con.ids <- sample(unique(subject[group=="control"]), 20)
> chosen.con.20 <- groupedData(log.exercise ~ age | subject,
+     data=Blackmore[is.element(subject, chosen.con.ids),])

5 / 26

> print(plot(chosen.con.20, main="Control Subjects",
+     xlab="Age", ylab="log2 Exercise",
+     ylim=1.2*range(chosen.con.20$log.exercise,
+                    chosen.pat.20$log.exercise),
+     layout=c(5,4), aspect=1),
+     position=c(0, 0, 0.5, 1), more=TRUE)
> print(plot(chosen.pat.20, main="Patients",
+     xlab="Age", ylab="log2 Exercise",
+     ylim=1.2*range(chosen.con.20$log.exercise,
+                    chosen.pat.20$log.exercise),
+     layout=c(5,4), aspect=1),
+     position=c(0.5, 0, 1, 1))

6 / 26

The groupedData object is automatically plotted in order by average exercise: the subjects with the highest exercise values are in the top row, and the subjects with the lowest exercise values are in the bottom row.

[Figure: side-by-side trellis plots, "Control Subjects" and "Patients", one panel per subject, log2 Exercise vs. Age]

7 / 26

Investigating the data with plots (in SAS). After I created subsetted data sets of patients from each group called control1 and patient1 (8 subjects per group), I used the PROC SGPANEL procedure to plot the individual trajectories. Here I've asked for a linear regression line for each subject, but you can simply connect the observed points using the VLINE statement instead of the REG statement.

proc sgpanel data=control1;
  title 'Control Subjects';
  panelby subject / columns=4 rows=2;
  reg x=age y=log_exercise;
  rowaxis min=-4 max=4;
  colaxis values=(8, 10, 12, 14, 16);
run;

proc sgpanel data=patient1;
  <similar coding for the patient group as control group>

8 / 26

[Figure: SGPANEL trajectories for the subsetted control subjects]

9 / 26

[Figure: SGPANEL trajectories for the subsetted patients]

10 / 26

You can also plot the overlay of these individual lines using PROC SGPLOT...

proc sgplot data=control1;
  title 'Subset of Control Subjects';
  reg x=age y=log_exercise / group=subject;
run;

11 / 26

Investigating the subject-specific parameter estimates (in R). Fox formally fits a linear regression to each subject (231 separately fit models) in order to investigate, from a graphical perspective, the variability and correlation in the slope and intercept estimates. The predictor age is transformed to represent age after the start of the study, age-8. He points out that the random coefficients model (fitted to all the data) is a unified model that treats the slopes and intercepts as random effects, and in that case the estimated random effects û are obtained as BLUPs (best linear unbiased predictors). 12 / 26

Investigating the subject-specific parameter estimates (in R). For a model with independent random subject effects (i.e. just a random intercept, as in the gene expression line example from earlier), the BLUPs are actually shrinkage estimators and fall between the individual observed values and the overall mean values. Formally, the BLUPs are estimated as

    û = Ĝ Z' Σ̂⁻¹ (y − X β̂),   where   Σ = var(y) = Z G Z' + R.

13 / 26
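For the random-intercept case mentioned above, the shrinkage form of the BLUP can be written explicitly as û_i = (n_i τ² / (n_i τ² + σ²)) (ȳ_i − μ̂), where τ² is the intercept variance and n_i the subject's number of observations. A small Python sketch under assumed (made-up) values of τ², σ², and μ̂ — illustration values, not estimates from the Blackmore data:

```python
# Hypothetical variance components and grand mean, for illustration only.
tau2, sigma2 = 2.0, 1.5   # intercept variance and residual variance
mu_hat = 0.5              # estimated overall mean

def blup_intercept(y_i):
    # Shrinkage form of the BLUP of a random intercept:
    # u_hat_i = (n_i tau2 / (n_i tau2 + sigma2)) * (ybar_i - mu_hat)
    n_i = len(y_i)
    ybar_i = sum(y_i) / n_i
    shrink = n_i * tau2 / (n_i * tau2 + sigma2)
    return shrink * (ybar_i - mu_hat)

u_hat = blup_intercept([1.0, 2.0, 3.0])  # subject mean 2.0, deviation 1.5
# |u_hat| < |ybar_i - mu_hat|: the BLUP is pulled toward zero, and the
# shrinkage weakens as n_i grows or as sigma2 shrinks.
```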

Before moving to a unified mixed model, we consider truly fitting a separate line to each subject (so, not a random coefficients model). Again, the nlme package is utilized, here through the lmList function:

> pat.list <- lmList(log.exercise ~ I(age - 8) | subject,
+     subset = group=="patient", data=Blackmore)
> con.list <- lmList(log.exercise ~ I(age - 8) | subject,
+     subset = group=="control", data=Blackmore)
> pat.coef <- coef(pat.list)
> con.coef <- coef(con.list)
> par(mfrow=c(1,2))
> boxplot(pat.coef[,1], con.coef[,1], main="Intercepts",
+     names=c("Patients","Controls"))
> boxplot(pat.coef[,2], con.coef[,2], main="Slopes",
+     names=c("Patients","Controls"))

14 / 26
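What lmList does for each subject is ordinary least squares on that subject's data alone; the closed-form slope and intercept can be sketched in plain Python (the x, y values below are hypothetical, not taken from the Blackmore data):

```python
def ols_line(x, y):
    # Closed-form simple linear regression for one subject:
    # slope = Sxy / Sxx, intercept = ybar - slope * xbar
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

# One hypothetical subject, with x already shifted so x = age - 8:
x = [0.0, 2.0, 4.0, 6.0]
y = [1.0, 1.5, 2.1, 2.4]
b0, b1 = ols_line(x, y)   # intercept approx 1.03, slope approx 0.24
```

Repeating this per subject and collecting (b0, b1) pairs gives exactly the coefficient matrices boxplotted above.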

[Figure: side-by-side boxplots of the per-subject intercept and slope estimates, for patients and controls]

The intercept represents the level of exercise at the start of the study. As expected, there is a great deal of variation in both the intercepts and the slopes. The median intercepts are fairly similar for patients and controls, but there is somewhat more variation among patients. The slopes are higher on average for patients than for controls, and the slopes tend to be positive (suggesting that exercise increases over time).

15 / 26

It makes sense to also plot the relationship between the estimated intercept and slope parameters. The dataEllipse function is in the car package.

> plot(c(-5,4), c(-1.2,1.2), xlab="intercept", ylab="slope", type="n",
+     main="(Individual) Estimates of slope and intercept")
> points(con.coef[,1], con.coef[,2], col=1)
> points(pat.coef[,1], pat.coef[,2], col=2)
> abline(v=0)
> abline(h=0)
> legend(-4.5, -.7, c("Controls","Patients"), col=c(1,2), pch=c(1,1))
> dataEllipse(con.coef[,1], con.coef[,2], levels=c(.5,.95), add=TRUE,
+     plot.points=FALSE, col=1)
> dataEllipse(pat.coef[,1], pat.coef[,2], levels=c(.5,.95), add=TRUE,
+     plot.points=FALSE, col=2)

16 / 26

[Figure: "(Individual) Estimates of slope and intercept" — scatterplot of per-subject slope vs. intercept estimates, with 50% and 95% data ellipses for controls and patients]

Recall that we are on the log scale base 2 for our response, so y = 0 coincides with 1 hour of exercise a week. It looks like the two groups have a reasonably similar correlation structure for the slope and intercept. It also looks like the patients have a shifted distribution such that they tend to have higher slopes.

17 / 26

Fitting the random coefficients model in SAS

This model allows a random slope and a random intercept for each subject (which are allowed to be correlated). The population-level mean structure allows separate lines for each treatment group (control and patient). The predictor age is transformed to represent age after the start of the study, age-8.

data Blackmore;
  set Blackmore;
  age_trans = age - 8;
run;

proc mixed data=Blackmore;
  class subject group;
  model log_exercise = group age_trans group*age_trans
        / solution ddfm=satterth;
  random intercept age_trans / subject=subject type=un gcorr;
run;

18 / 26

The Mixed Procedure

Dimensions
Covariance Parameters          4
Columns in X                   6
Columns in Z Per Subject       2
Subjects                     231
Max Obs Per Subject            5

Estimated G Correlation Matrix
Row  Effect     subject     Col1      Col2
  1  Intercept      100   1.0000   -0.2808
  2  age_trans      100  -0.2808    1.0000

We see that the correlation between b_0i and b_1i is estimated to be negative (ρ̂ = -0.2808).

19 / 26

The Mixed Procedure

Covariance Parameter Estimates
                             Standard       Z
Cov Parm  Subject  Estimate     Error   Value    Pr Z
UN(1,1)   subject    2.0839    0.2901    7.18  <.0001
UN(2,1)   subject  -0.06681   0.03698   -1.81  0.0708
UN(2,2)   subject   0.02716  0.007975    3.41  0.0003
Residual             1.5478   0.09743   15.89  <.0001

We see that the covariance between b_0i and b_1i is estimated to be negative (giving ρ̂ = -0.2808) and marginally significant with p = 0.0708.

20 / 26
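The GCORR correlation reported on the previous slide follows directly from these covariance-parameter estimates, ρ̂ = UN(2,1) / √(UN(1,1) × UN(2,2)). A quick Python check of that arithmetic:

```python
import math

# Covariance-parameter estimates from the PROC MIXED output above.
un11, un21, un22 = 2.0839, -0.06681, 0.02716

# Convert the intercept-slope covariance to a correlation.
rho = un21 / math.sqrt(un11 * un22)
print(round(rho, 4))   # -0.2808, matching the GCORR matrix
```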

Solution for Fixed Effects
                                  Standard
Effect           group   Estimate    Error   DF  t Value  Pr > |t|
Intercept                 -0.6300   0.1487  230    -4.24    <.0001
group            control   0.3540   0.2353  234     1.50    0.1338
group            patient        0        .    .        .         .
age_trans                  0.3039  0.02386  196    12.73    <.0001
age_trans*group  control  -0.2399  0.03941  221    -6.09    <.0001
age_trans*group  patient        0        .    .        .         .

Type 3 Tests of Fixed Effects
                  Num   Den
Effect             DF    DF  F Value  Pr > F
group               1   234     2.26  0.1338
age_trans           1   221    87.16  <.0001
age_trans*group     1   221    37.05  <.0001

The groups do not have significantly different intercepts (average exercise values at the start of the study, at age 8), but they do have significantly different slopes, with the patient group having a higher slope than the control group.

21 / 26
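Since SAS makes patient the reference level here, the group-specific lines come from adding the control contrasts to the reference estimates. A short Python sketch assembling the two fitted lines from the fixed-effects solution above:

```python
# Fixed-effects estimates from the Solution for Fixed Effects table.
b_int, b_grp_con = -0.6300, 0.3540    # intercept and control offset
b_age, b_age_con = 0.3039, -0.2399    # age_trans slope and control offset

# Patient group is the reference level, so its line is (b_int, b_age);
# the control line adds the control contrasts.
patient_line = (b_int, b_age)
control_line = (b_int + b_grp_con, b_age + b_age_con)
# controls: intercept approx -0.276, slope approx 0.064 --
# their exercise grows much more slowly with age than the patients'.
```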

I can capture the estimated BLUPs, û = Ĝ Z' Σ̂⁻¹ (y − X β̂) where Σ = var(y) = Z G Z' + R, using ODS output and the solution option in the random statement:

ods output SolutionR=blups;
proc mixed data=Blackmore covtest;
  class subject group;
  model log_exercise = group age_trans group*age_trans / ddfm=satterth;
  random intercept age_trans / subject=subject type=un gcorr solution;
run;
/* Solution for the random effects gives the BLUPs */
ods output close;

22 / 26

proc print data=blups (obs=10);
run;

                                      StdErr
Obs  Effect     subject  Estimate      Pred    DF  tValue   Probt
  1  Intercept      100    1.0095    0.7092   235    1.42  0.1560
  2  age_trans      100  -0.05272    0.1261  69.8   -0.42  0.6771
  3  Intercept      101   -2.1614    0.7094   256   -3.05  0.0026
  4  age_trans      101   0.01287    0.1221  79.5    0.11  0.9163
  5  Intercept      102    0.9339    0.7161   266    1.30  0.1933
  6  age_trans      102    0.1258    0.1353  53.1    0.93  0.3567
  7  Intercept      103    0.9283    0.7101   250    1.31  0.1923
  8  age_trans      103   0.02691    0.1413  44.5    0.19  0.8498
  9  Intercept      104    1.1407    0.7177   273    1.59  0.1131
 10  age_trans      104  -0.03742    0.1332  56.7   -0.28  0.7798

23 / 26

Below I've plotted the estimated BLUPs for the random slopes against the estimated slopes from the separately fit regression lines (in absolute values).

[Figure: |BLUP of slope| vs. |separately fit slope (individual regression)|, both axes 0.0 to 1.2]

24 / 26

Fitting the random coefficients model (in R)

Using the lme function in the nlme package, we see the same estimates for the covariance parameters as in SAS:

> lme.1 <- lme(log.exercise ~ I(age-8)*group,
+     random = ~ I(age-8) | subject, data=Blackmore)
> summary(lme.1)
Linear mixed-effects model fit by REML
  Data: Blackmore
Random effects:
 Formula: ~I(age - 8) | subject
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev     Corr
(Intercept) 1.4435580  (Intr)
I(age - 8)  0.1647954  -0.281
Residual    1.2440951

25 / 26

Fitting the random coefficients model (in R)

Square the standard deviations to match the SAS variance estimates:

Var(Intercept) = 1.4435580² = 2.084
Var(slope)     = 0.1647954² = 0.027
Corr(Intercept, slope) = -0.06682 / (0.1647954 × 1.4435580) = -0.281
Var(Residual)  = 1.2440951² = 1.548

26 / 26