Random Coefficients Model Examples STAT:5201 Week 15 - Lecture 2 1 / 26
Each subject (or experimental unit) has multiple measurements (this could be over time, or it could be multiple measurements on a continuous variable x). Random effects are included in the model to allow for a random intercept and a random slope, essentially allowing for a separate line for each subject.

$$Y_{ij} = \beta_0 + \beta_1 x_{ij} + b_{0i} + b_{1i} x_{ij} + \varepsilon_{ij}$$

with

$$b_i = \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{bmatrix} \right) \quad \text{and} \quad \varepsilon_{ij} \sim N(0, \sigma^2),$$

and with b and ε independent of each other.

Individual: $E[Y_{ij} \mid b_{0i}, b_{1i}] = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i}) x_{ij}$

Marginal: $E[Y_{ij}] = \beta_0 + \beta_1 x_{ij}$
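To see numerically why the marginal mean drops the random effects, here is a small simulation sketch in Python/NumPy (not R; all parameter values are made up for illustration, not estimates from any data set). Averaging the simulated Y over many subjects recovers the marginal line β0 + β1 x even though each subject follows its own line.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameter values for illustration only.
beta0, beta1 = -0.6, 0.3                  # fixed intercept and slope
D = np.array([[2.00, -0.07],              # var(b0),     cov(b0, b1)
              [-0.07, 0.03]])             # cov(b0, b1), var(b1)
sigma2 = 1.5                              # residual variance

n_subj = 5000
x = np.array([0.0, 2.0, 4.0, 6.0, 8.0])   # same measurement times for every subject

b = rng.multivariate_normal([0.0, 0.0], D, size=n_subj)        # (b0i, b1i) pairs
eps = rng.normal(0.0, np.sqrt(sigma2), size=(n_subj, x.size))  # epsilon_ij
Y = beta0 + beta1 * x + b[:, [0]] + b[:, [1]] * x + eps        # broadcast over subjects

# Averaging over subjects, the mean-zero random effects wash out:
# the empirical mean at each x is close to beta0 + beta1 * x.
print(Y.mean(axis=0))
print(beta0 + beta1 * x)
```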
Adapted from a John Fox example in the Linear Mixed Models appendix found in An R and S-PLUS Companion to Applied Regression. Eating disorders can be difficult to treat, as many patients do not feel the need for treatment even when friends and family recognize the severity. Even after patients with eating disorders are hospitalized, they can continue behaviors that are detrimental to their health. Here, data were collected recording the amount of exercise of 138 teenage girls hospitalized for eating disorders, and of a group of 93 control subjects.

Variables:
subject: a factor with subject id codes.
age: age in years.
exercise: hours per week of exercise.
group: factor indicating patient or control.
We will consider the example in R and in SAS.

> library(car)
> data(Blackmore)
> head(Blackmore)
  subject   age exercise   group
1     100  8.00     2.71 patient
2     100 10.00     1.94 patient
3     100 12.00     2.36 patient
4     100 14.00     1.54 patient
5     100 15.92     8.63 patient
6     101  8.00     0.14 patient
> dim(Blackmore)
[1] 945   4
> length(unique(Blackmore$subject[Blackmore$group=="patient"]))
[1] 138
> length(unique(Blackmore$subject[Blackmore$group=="control"]))
[1] 93
Fox transformed the response variable for numerous reasons (described in the text) as log2(y + 5/60).

> Blackmore$log.exercise <- log(Blackmore$exercise + 5/60, 2)
> attach(Blackmore)

Investigating the data with plots (in R). Use a random sample of 20 girls from each group for trend plotting. The groupedData object from the nlme package is used to form the trellis plots.

> library(nlme)
> chosen.pat.ids=sample(unique(subject[group=="patient"]), 20)
> chosen.pat.20=groupedData(log.exercise ~ age | subject,
    data=Blackmore[is.element(subject,chosen.pat.ids),])
> chosen.con.ids=sample(unique(subject[group=="control"]), 20)
> chosen.con.20=groupedData(log.exercise ~ age | subject,
    data=Blackmore[is.element(subject,chosen.con.ids),])
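As a quick numeric sketch (in Python rather than R, assuming only the transform formula above) of what the log2(y + 5/60) transform does: adding 5/60 of an hour (five minutes) keeps zero-exercise observations finite, and one hour per week lands near zero on the transformed scale.

```python
import math

# The transformed response: log base 2 of (hours + 5/60).
# Adding 5/60 (five minutes) keeps zero-exercise observations finite.
def log2_exercise(hours):
    return math.log2(hours + 5 / 60)

print(log2_exercise(0.0))   # about -3.58: the floor of the transformed scale
print(log2_exercise(1.0))   # about 0.12: so y = 0 corresponds to roughly 1 hr/week
```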
> print(plot(chosen.con.20, main="Control Subjects",
    xlab="Age", ylab="log2 Exercise",
    ylim=1.2*range(chosen.con.20$log.exercise, chosen.pat.20$log.exercise),
    layout=c(5,4), aspect=1),
    position=c(0, 0, 0.5, 1), more=TRUE)
> print(plot(chosen.pat.20, main="Patients",
    xlab="Age", ylab="log2 Exercise",
    ylim=1.2*range(chosen.con.20$log.exercise, chosen.pat.20$log.exercise),
    layout=c(5,4), aspect=1),
    position=c(0.5, 0, 1, 1))
The groupedData object is automatically plotted in order by average exercise. The subjects with the highest exercise values are in the top row; the subjects with the lowest exercise values are in the bottom row.

[Trellis plots of log2 Exercise versus Age, one panel per subject: the 20 sampled controls ("Control Subjects", left) and the 20 sampled patients ("Patients", right).]
Investigating the data with plots (in SAS). After I created subsetted data sets of 8 subjects from each group, called control1 and patient1, I used the PROC SGPANEL procedure to plot the individual trajectories. Here I've asked for a linear regression line for each subject, but you can instead simply connect the observed points by using the vline statement in place of the reg statement.

proc sgpanel data=control1;
title 'Control Subjects';
panelby subject/columns=4 rows=2;
reg x=age y=log_exercise;
rowaxis min=-4 max=4;
colaxis values=(8, 10, 12, 14, 16);
run;

proc sgpanel data=patient1;
<similar coding for the patient group as control group>
[SGPANEL trellis plots of the control1 and patient1 subsets, one fitted regression line per subject.]
You can also plot the overlay of these individual lines using PROC SGPLOT...

proc sgplot data=control1;
title 'Subset of Control Subjects';
reg x=age y=log_exercise/group=subject;
run;
Investigating the subject-specific parameter estimates (in R). Fox formally fits a linear regression to each subject (231 separately fit models) in order to investigate the variability and correlation in the slope and intercept estimates from a graphical perspective. The predictor age is transformed to represent age after the start of the study, or age-8. He points out that the random coefficients model (fitted to all the data) is a unified model that treats the slopes and intercepts as random effects, and in that case the random effects û are estimated using BLUPs (best linear unbiased predictors).
For a model with independent random subject effects (i.e. just a random intercept, as in the gene expression line example from earlier), the BLUPs are actually shrinkage estimators and fall between the individual observed values and the overall mean values. Formally, the BLUPs are estimated as

$$\hat{u} = \hat{G} Z' \hat{\Sigma}^{-1} (y - X\hat{\beta})$$

where $\Sigma = \operatorname{var}(y) = ZGZ' + R$.
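To make the shrinkage concrete, here is a small Python sketch (all numbers hypothetical, random-intercept case only) verifying that the general formula û = ĜZ'Σ̂⁻¹(y − Xβ̂) reduces, for one subject, to pulling the subject's raw mean deviation partway toward zero.

```python
import numpy as np

# Hypothetical random-intercept-only setting: n repeated measures on one subject,
# known variance components, known overall mean mu (all numbers made up).
sigma_b2, sigma2 = 4.0, 1.0          # var(b_i) and residual variance
mu = 10.0                            # fixed-effect (overall) mean
y_i = np.array([14.0, 15.0, 13.0])   # one subject's observations
n = y_i.size

# General BLUP formula u_hat = G Z' Sigma^{-1} (y - X beta), specialised here:
G = np.array([[sigma_b2]])
Z = np.ones((n, 1))                                       # subject indicator
Sigma = sigma2 * np.eye(n) + sigma_b2 * np.ones((n, n))   # Z G Z' + R
u_hat = (G @ Z.T @ np.linalg.inv(Sigma) @ (y_i - mu)).item()

# Closed-form equivalent: shrink the subject's raw mean deviation toward zero.
shrink = n * sigma_b2 / (n * sigma_b2 + sigma2)
print(u_hat, shrink * (y_i.mean() - mu))
```

The shrinkage factor grows toward 1 as n or the between-subject variance increases, so subjects with more data (or more heterogeneous populations) get BLUPs closer to their own observed means.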
Before moving to a unified mixed model, we consider truly fitting a separate line to each subject (so, not a random coefficients model). Again, the nlme package is utilized, here employing the lmList function:

> pat.list=lmList(log.exercise ~ I(age - 8) | subject,
    subset = group=="patient", data=Blackmore)
> con.list=lmList(log.exercise ~ I(age - 8) | subject,
    subset = group=="control", data=Blackmore)
> pat.coef=coef(pat.list)
> con.coef=coef(con.list)
> par(mfrow=c(1,2))
> boxplot(pat.coef[,1], con.coef[,1], main="Intercepts",
    names=c("Patients","Controls"))
> boxplot(pat.coef[,2], con.coef[,2], main="Slope",
    names=c("Patients","Controls"))
[Side-by-side boxplots of the individual intercept and slope estimates for patients versus controls.]

The intercept represents the level of exercise at the start of the study. As expected, there is a great deal of variation in both the intercepts and the slopes. The median intercepts are fairly similar for patients and controls, but there is somewhat more variation among patients. The slopes are higher on average for patients than for controls, and the slopes tend to be positive (suggesting that exercise increases over time).
It makes sense to also plot the relationship between the estimated intercept and slope parameters. The dataEllipse function is in the car library.

> plot(c(-5,4), c(-1.2,1.2), xlab="intercept", ylab="slope", type="n",
    main="(Individual) Estimates of slope and intercept")
> points(con.coef[,1], con.coef[,2], col=1)
> points(pat.coef[,1], pat.coef[,2], col=2)
> abline(v=0)
> abline(h=0)
> legend(-4.5, -.7, c("Controls","Patients"), col=c(1,2), pch=c(1,1))
> dataEllipse(con.coef[,1], con.coef[,2], levels=c(.5,.95), add=TRUE,
    plot.points=FALSE, col=1)
> dataEllipse(pat.coef[,1], pat.coef[,2], levels=c(.5,.95), add=TRUE,
    plot.points=FALSE, col=2)
[Scatterplot of the individual slope estimates versus intercept estimates, with 50% and 95% data ellipses for each group.]

Recall that we are on the log scale base 2 for our response, so y = 0 coincides with 1 hour of exercise a week. It looks like the two groups have a reasonably similar correlation structure for the slope and intercept. It also looks like the patients have a shifted distribution such that they tend to have higher slopes.
Fitting the random coefficients model (in SAS). This model allows for a random slope and random intercept for each subject (which are allowed to be correlated). The population-level mean structure allows for separate lines for each treatment group (control and patient). The predictor age is transformed to represent age after the start of the study, or age-8.

data Blackmore; set Blackmore;
  age_trans = age-8;
run;

proc mixed data=Blackmore;
class subject group;
model log_exercise = group age_trans group*age_trans/solution ddfm=satterth;
random intercept age_trans/subject=subject type=un gcorr;
run;
The Mixed Procedure

Dimensions
  Covariance Parameters            4
  Columns in X                     6
  Columns in Z Per Subject         2
  Subjects                       231
  Max Obs Per Subject              5

Estimated G Correlation Matrix
  Row  Effect      subject      Col1      Col2
  1    Intercept   100        1.0000   -0.2808
  2    age_trans   100       -0.2808    1.0000

We see that the correlation between $b_{0i}$ and $b_{1i}$ is estimated to be negative ($\hat{\rho} = -0.2808$).
Covariance Parameter Estimates
  Cov Parm   Subject    Estimate   Standard Error   Z Value      Pr Z
  UN(1,1)    subject      2.0839         0.2901        7.18    <.0001
  UN(2,1)    subject    -0.06681         0.03698      -1.81    0.0708
  UN(2,2)    subject     0.02716         0.007975      3.41    0.0003
  Residual                1.5478         0.09743      15.89    <.0001

We see that the covariance between $b_{0i}$ and $b_{1i}$ is estimated to be negative (corresponding to $\hat{\rho} = -0.2808$) and is marginally significant, with p = 0.0708.
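The reported G correlation is just the UN(2,1) covariance rescaled by the two estimated standard deviations. A quick Python check (not part of the SAS session; the three values are copied from the table above):

```python
import math

# Estimates copied from the covariance parameter table above.
var_int   = 2.0839     # UN(1,1): variance of the random intercepts
cov_is    = -0.06681   # UN(2,1): covariance of intercepts and slopes
var_slope = 0.02716    # UN(2,2): variance of the random slopes

# Correlation = covariance / (sd of intercepts * sd of slopes)
rho = cov_is / math.sqrt(var_int * var_slope)
print(round(rho, 4))   # matches the entry in the G correlation matrix
```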
Solution for Fixed Effects
  Effect            group     Estimate   Standard Error    DF   t Value   Pr > |t|
  Intercept                    -0.6300       0.1487       230    -4.24     <.0001
  group             control     0.3540       0.2353       234     1.50     0.1338
  group             patient     0             .             .      .         .
  age_trans                     0.3039       0.02386      196    12.73     <.0001
  age_trans*group   control    -0.2399       0.03941      221    -6.09     <.0001
  age_trans*group   patient     0             .             .      .         .

Type 3 Tests of Fixed Effects
  Effect            Num DF   Den DF   F Value   Pr > F
  group                1       234      2.26    0.1338
  age_trans            1       221     87.16    <.0001
  age_trans*group      1       221     37.05    <.0001

The groups do not have significantly different intercepts (average exercise values at the start of the study, at age 8), but they do have significantly different slopes, with the patient group having a higher slope than the control group.
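Since patient is the reference level in this SAS parameterization (its rows are set to zero), the two population-level lines are assembled from the solution table. A small Python check of that arithmetic (values copied from the table above):

```python
# Fixed-effect estimates copied from the Solution for Fixed Effects table.
# SAS treats 'patient' as the reference level, so the patient line uses the
# Intercept and age_trans rows directly; the control line adds the offsets.
b_int     = -0.6300    # Intercept (patient group, at age 8)
b_grp_con =  0.3540    # group: control offset to the intercept
b_age     =  0.3039    # age_trans slope (patient group)
b_age_con = -0.2399    # age_trans*group: control offset to the slope

patient_line = (b_int, b_age)
control_line = (b_int + b_grp_con, b_age + b_age_con)
print("patient:", patient_line)
print("control:", control_line)
```

The control slope works out to 0.3039 - 0.2399 = 0.0640, visibly flatter than the patient slope of 0.3039, consistent with the significant interaction test.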
I can capture the estimated BLUPs, $\hat{u} = \hat{G} Z' \hat{\Sigma}^{-1} (y - X\hat{\beta})$ where $\Sigma = \operatorname{var}(y) = ZGZ' + R$, using the ODS output and the solution option in the random statement:

ods output SolutionR=blups;
proc mixed data=Blackmore covtest;
class subject group;
model log_exercise = group age_trans group*age_trans/ddfm=satterth;
random intercept age_trans/subject=subject type=un gcorr solution;
run; /* Solution for the random effects are BLUPs */
ods output close;
proc print data=blups (obs=10); run;

  Obs   Effect      subject   Estimate   StdErr Pred     DF   tValue    Probt
   1    Intercept   100        1.0095       0.7092      235    1.42    0.1560
   2    age_trans   100       -0.05272      0.1261     69.8   -0.42    0.6771
   3    Intercept   101       -2.1614       0.7094      256   -3.05    0.0026
   4    age_trans   101        0.01287      0.1221     79.5    0.11    0.9163
   5    Intercept   102        0.9339       0.7161      266    1.30    0.1933
   6    age_trans   102        0.1258       0.1353     53.1    0.93    0.3567
   7    Intercept   103        0.9283       0.7101      250    1.31    0.1923
   8    age_trans   103        0.02691      0.1413     44.5    0.19    0.8498
   9    Intercept   104        1.1407       0.7177      273    1.59    0.1131
  10    age_trans   104       -0.03742      0.1332     56.7   -0.28    0.7798
Below I've plotted the estimated BLUPs for the random slopes against the estimated slopes from the separately fit regression lines (in absolute values).

[Scatterplot of |BLUP of slope| versus |separately fit slope (individual regression)|.]
Fitting the random coefficients model (in R). Using the lme function in the nlme package, we see the same estimates for the covariance parameters as in SAS:

> lme.1=lme(log.exercise ~ I(age-8)*group, random = ~ I(age-8) | subject,
    data=Blackmore)
> summary(lme.1)
Linear mixed-effects model fit by REML
  Data: Blackmore
Random effects:
 Formula: ~I(age - 8) | subject
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr
(Intercept) 1.4435580 (Intr)
I(age - 8)  0.1647954 -0.281
Residual    1.2440951
Square the estimates to match the SAS estimates:

Var(Intercept) = 1.4435580² = 2.083
Var(slope) = 0.1647954² = 0.027
Corr(Intercept, slope) = -0.06682 / (1.4435580 × 0.1647954) = -0.281
Var(Residual) = 1.2440951² = 1.548
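These conversions are easy to check numerically; a short Python snippet (values copied from the lme output on the previous slide) going in the other direction, from the R standard deviations and correlation back to the SAS variance and covariance estimates:

```python
# Values copied from the lme (nlme) output: standard deviations and correlation.
sd_int, sd_slope, sd_resid = 1.4435580, 0.1647954, 1.2440951
corr = -0.281

print(sd_int ** 2)               # intercept variance   (SAS UN(1,1), ~2.084)
print(sd_slope ** 2)             # slope variance       (SAS UN(2,2), ~0.0272)
print(sd_resid ** 2)             # residual variance    (SAS Residual, ~1.548)
print(corr * sd_int * sd_slope)  # intercept-slope cov. (SAS UN(2,1), ~-0.0668)
```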