36-463/663: Hierarchical Linear Models


Lmer model selection and residuals
Brian Junker, 132E Baker Hall

Outline
- The London Schools Data (again!)
  - A nice random-intercepts, random-slopes model
- Residuals in MLMs
  - Marginal residuals
  - Conditional residuals
  - Random effects residuals
- Variable selection
  - Overall fit statistics
  - Simulation-based methods (save for later!)
  - AIC, BIC, DIC and all that
- An improved model for the London Schools data

The London Schools Data
- Student
- Gender (0=Female, 1=Male), per student
- VR = verbal reasoning level (High/Med/Low)
- LRT = London Reading Test (at beginning of year)
- Y = end-of-year test
- School (1..38)
- School.gender (All.Boy, All.Girl, Mixed)
- School.denom (Other, CofE, RomCath, State)

London Schools Data
[Figure: regression lines for the schools; panels "Unpooled" and "Pooled"]
- The dotted line is the pooled regression (ignoring schools)
- The solid lines are unpooled regressions (separate for each school)
- The solid lines look like a random sample of lines, with mean the dotted line!
Note: these plots were made with an earlier version of ggplot2. An alternative approach with the help of gridExtra is shown in the R handout.


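The London Schools Data
This suggests a model like
  Level 1: $y_{ij} = \alpha_{0j} + \alpha_{1j} LRT_{ij} + \epsilon_{ij}$
  Level 2: $\alpha_{0j} = \beta_0 + \eta_{0j}$, $\alpha_{1j} = \beta_1 + \eta_{1j}$
or, as a variance components model,
  $y_{ij} = \beta_0 + \beta_1 LRT_{ij} + \eta_{0j} + \eta_{1j} LRT_{ij} + \epsilon_{ij}$
As an R model this would be
  Y ~ 1 + LRT + (1 + LRT | school)

The London Schools Data
  > display(lmer.1 <- lmer(Y ~ 1 + LRT +
  +     (1 + LRT | school), data=school.frame))
              coef.est coef.se
  (Intercept)
  LRT
  Error terms:
   Groups   Name   Std.Dev. Corr
   school   (Int)  0.23
            LRT
   Residual
  number of obs: 1978, groups: school, 38
  AIC = , DIC =
  deviance =
[Figure: panels "Unpooled" and "Pooled", with the MLM alphas and MLM betas]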

Residuals
In ordinary linear regression (and glm's) the residuals are easy to think about:
  $E[y_i] = g^{-1}(X_i \beta)$
  $r_i = y_i - E[y_i]$
Multi-level models pose a couple of challenges.

Residuals in Multi-Level Models
  Level 1: $y_{ij} = \alpha_{0j} + \alpha_{1j} LRT_{ij} + \epsilon_{ij}$
  Level 2: $\alpha_{0j} = \beta_0 + \eta_{0j}$, $\alpha_{1j} = \beta_1 + \eta_{1j}$
- Where are they? Level 1? Level 2? Some combination?
- What are they? The $\alpha$'s are random draws, so does the following make sense?
  $E[y_{ij}] = \alpha_{0j} + \alpha_{1j} LRT_{ij}$ ??
  $r_{ij} = y_{ij} - E[y_{ij}]$ ??

Residuals in Multi-Level Models
The variance components version of the model could be re-expressed in matrix form as
  $y = X\beta + Z\eta + \epsilon$
i.e. the form of Laird & Ware (1982, Biometrics).

Residuals in Multi-Level Models
Given the Laird-Ware form, we can formulate 3 different kinds of residuals:
- Marginal residuals: $y - X\beta$
- Conditional residuals: $y - X\beta - Z\eta$
- Random effects: $Z\eta$
In practice, estimate $\beta$ with $\hat\beta$, the MLE, and estimate $\eta$ with $\hat\eta$.
Although there is only one $\eta_j$ per group, there are as many elements of $Z\eta$ as there are observations.
Nobre & Singer (2007, Biometrical Journal)

Residuals in Multi-Level Models
- Marginal residuals: $y - X\hat\beta$
  - Should be mean 0, but may show grouping structure
  - May not be homoskedastic!
  - Good for checking fixed effects, just like linear regr.
- Conditional residuals: $y - X\hat\beta - Z\hat\eta$
  - Should be mean zero with no grouping structure
  - Should be homoskedastic!
  - Good for checking normality of $\epsilon$, outliers
- Random effects: $Z\hat\eta$
  - Will generally not be mean-zero
  - May not be homoskedastic!
  - Good for checking normality of $\eta$, outliers

Residuals in the London Schools Data
  > str(fixef(lmer.1))
  > beta0 <- fixef(lmer.1)[1]
  > beta1 <- fixef(lmer.1)[2]
  > str(ranef(lmer.1))
  > eta <- ranef(lmer.1)$school
  > attach(school.frame)
  > X <- cbind(1,LRT)
  > # one block of rows of X per school; the loop below fills Z block by
  > # block, so it assumes the rows of school.frame are sorted by school
  > blocks <- lapply(split(X,school),
  +     function(x){matrix(x,ncol=2)})
  > J <- length(blocks)
  > n <- dim(school.frame)[1]
  > Z <- matrix(0,nrow=n,ncol=J*2)
  > row <- 1
  > for (j in 1:J) {
  +   col <- 2*j
  +   nj <- dim(blocks[[j]])[1]
  +   Z[row:(row+nj-1),c(col-1,col)] <-
  +     blocks[[j]]
  +   row <- row + nj
  + }
  > beta <- rbind(beta0,beta1)
  > # so beta is a column vector
  > eta <- c(t(eta))
  > # so eta is a column vector
  > resid.marg <- Y - X%*%beta
  > resid.cond <- Y - X%*%beta - Z%*%eta
  > resid.reff <- Z%*%eta
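The hand-built X and Z above can be cross-checked against the design matrices that lme4 stores in the fitted object. The following is a minimal sketch, assuming lme4 is installed; `sim.frame` and all the parameter values used to simulate it are made up for illustration and stand in for `school.frame`, and `fit` stands in for `lmer.1`.

```r
library(lme4)

# Simulated stand-in for school.frame (hypothetical data, not the London Schools data)
set.seed(1)
J <- 20; n.per <- 25
school <- factor(rep(seq_len(J), each = n.per))
LRT <- rnorm(J * n.per)
a0 <- rnorm(J, 0, 0.5)   # school-level intercept deviations
a1 <- rnorm(J, 0, 0.3)   # school-level slope deviations
Y <- 0.1 + 0.6 * LRT + a0[school] + a1[school] * LRT + rnorm(J * n.per, 0, 0.7)
sim.frame <- data.frame(Y, LRT, school)

fit <- lmer(Y ~ 1 + LRT + (1 + LRT | school), data = sim.frame)

X <- getME(fit, "X")     # fixed-effects design matrix
Z <- getME(fit, "Z")     # random-effects design matrix (sparse)
beta <- fixef(fit)       # estimated fixed effects
eta <- getME(fit, "b")   # predicted random effects, ordered to match the columns of Z

resid.marg <- as.vector(sim.frame$Y - X %*% beta)   # y - X beta
resid.reff <- as.vector(Z %*% eta)                  # Z eta
resid.cond <- resid.marg - resid.reff               # y - X beta - Z eta

# residuals() on the fitted model should agree with the conditional residuals
all.equal(resid.cond, unname(residuals(fit)))
```

Because `getME()` returns Z in the row order of the data, this route does not require the data frame to be sorted by school.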


Residuals in the London Schools Data
[Figure: index plots of the marginal residuals, the conditional residuals, and the random effects]
- Marginal residuals look pretty good
- Conditional residuals look pretty good
- Rand Effect residuals look weird

Residuals in the London Schools Data
- Marginal residuals plotted by school
- No noticeable patterns
- Nice set of residuals


Residuals in the London Schools Data
- Conditional residuals plotted by school
- No noticeable patterns
- Nice set of residuals

Residuals in the London Schools Data
- Rand Effect residuals plotted by school
- The scale of these residuals is smaller!
- A few schools show noticeable deviation from zero
- We do not expect mean-zero, but the BLUP estimates should cluster around a mean

Standardized Residuals
- We were looking at raw residuals
- You could standardize by dividing the raw residuals by the square root of the variance components, for example
- Quick and dirty: divide by their sample SD!
  - If there are severe outliers, the sample SD will be inflated and this won't work well
- If we have standardized residuals, outlier detection is a bit easier.
- There are simulation-based methods for detecting outliers as well

Residuals: Practical Advice
- Looking at some residuals is better than looking at none.
- In many MLMs, marginal and conditional residuals can be used roughly as you would with ordinary linear regression
- It is worthwhile to plot residuals against the group/cluster indicators
- To identify and fix problems, plot residuals against other variables (within and/or across clusters), try transformations, etc.
- residuals(lmer.1) gives you the conditional residuals!
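The two ways of standardizing mentioned on the Standardized Residuals slide can be sketched as follows. This is an illustration under assumptions, not the course's own code: it requires lme4, uses a simulated data frame `sim.frame` with made-up parameter values, fits a random-intercepts model only to keep the example short, and uses a cutoff of 3 that is simply a common rule of thumb.

```r
library(lme4)

# Simulated stand-in data (hypothetical, not the London Schools data)
set.seed(2)
J <- 20; n.per <- 25
school <- factor(rep(seq_len(J), each = n.per))
LRT <- rnorm(J * n.per)
Y <- 0.6 * LRT + rnorm(J, 0, 0.4)[school] + rnorm(J * n.per, 0, 0.7)
sim.frame <- data.frame(Y, LRT, school)

fit <- lmer(Y ~ 1 + LRT + (1 | school), data = sim.frame)

r.cond <- residuals(fit)               # conditional residuals
r.std.model <- r.cond / sigma(fit)     # divide by the estimated residual SD
r.std.quick <- r.cond / sd(r.cond)     # quick and dirty: divide by the sample SD

# Candidate outliers under an assumed rule-of-thumb cutoff of 3
which(abs(r.std.model) > 3)

# Standardized conditional residuals by school
boxplot(r.std.model ~ sim.frame$school, xlab = "school",
        ylab = "standardized conditional residual")
```

With severe outliers the sample SD in `r.std.quick` is inflated, which is why the model-based scale `sigma(fit)` is shown alongside it.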

The London Schools Data: Variable Selection
- How can we improve the model?
- We have a bunch of other variables lying around:
  - Unit-level (student): Gender, VR
  - Group-level (school): School.denom, School.gender
- Which ones to include? Fixed effects or random effects? Interactions? Etc.

Comparing models
Two basic tools:
- Overall Fit Indices
  - Likelihood ratio test / Deviance statistics
  - AIC, BIC, DIC and all that
- Simulation-based Checks
  - Fake data methods
  - Posterior predictive checks

Overall fit indices
- Pros:
  - Quick-and-dirty (a drop of 2 is marginally interesting, a drop of 3 is interesting, a drop of 10 is a big deal)
  - Tells you whether one model fits better overall than another
- Cons:
  - Doesn't tell whether the best model really fits
  - If a model doesn't fit, doesn't tell you what went wrong
- We will discuss these today

Simulation-based checks
- Cons:
  - The method is not automatic
  - You have to choose a feature of the data or model that you want to check
  - You have to program the simulation
- Pros:
  - You can focus on particular features of the model or data that you are worried about
  - You can see what went wrong with the model, and sometimes fix it
- We will discuss these later in the semester

Overall Fit Indices
- Basic tool: deviance of the model M
    $D(M) = -2 \log L(\hat\theta_M)$
- Direct use: If $M_0$ is nested in $M_1$, then approx:
    $D(M_0) - D(M_1) \sim \chi^2_{df_1 - df_0}$
- Used to compare ordinary linear models, generalized linear models, etc.

What if models are not nested?
- The Chi-squared theory for D(M) is gone.
- Other considerations (prediction accuracy, Bayesian decision theory, ...) lead to penalized deviance measures:
    $D_P(M) = D(M) + P(\#\text{parameters})$
- Three common cases (let k = #parameters):
  - $AIC = D(M) + 2k$
  - $BIC = D(M) + k \log(n)$
  - $DIC = \text{mean}_{sim}(D(M)) + 2 k_D \approx D(M) + 2 k_D$
  - $k_D$ is the "estimated" #parameters
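The AIC and BIC formulas above can be checked by hand against what R reports. This is a minimal sketch, assuming lme4 is installed; `sim.frame` is simulated with made-up parameter values, and the model is fitted with `REML = FALSE` so that the log-likelihood is the true maximized one.

```r
library(lme4)

# Simulated stand-in data (hypothetical, not the London Schools data)
set.seed(3)
J <- 20; n.per <- 25
school <- factor(rep(seq_len(J), each = n.per))
LRT <- rnorm(J * n.per)
a0 <- rnorm(J, 0, 0.5)
a1 <- rnorm(J, 0, 0.3)
Y <- 0.1 + 0.6 * LRT + a0[school] + a1[school] * LRT + rnorm(J * n.per, 0, 0.7)
sim.frame <- data.frame(Y, LRT, school)

fit.ml <- lmer(Y ~ 1 + LRT + (1 + LRT | school), data = sim.frame, REML = FALSE)

ll <- logLik(fit.ml)
D <- -2 * as.numeric(ll)   # deviance D(M)
k <- attr(ll, "df")        # 2 fixed effects + 3 random-effect (co)variances + residual variance
n <- nobs(fit.ml)          # number of observations

aic.hand <- D + 2 * k
bic.hand <- D + k * log(n)

rbind(by.hand = c(AIC = aic.hand, BIC = bic.hand),
      from.R  = c(AIC = AIC(fit.ml), BIC = BIC(fit.ml)))
```

Note that k here counts variance parameters as well as fixed effects, which is one concrete version of the k versus k_D question raised in these slides.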

Why are we fudging about k vs $k_D$?
- In ordinary linear models (and glm's), k = #X's.
  - What about $\sigma^2$? What about the over-dispersion parameter? (It is in both $M_1$ and $M_0$, so $df_1 - df_0$ is the same either way.)
- In multilevel models it's not so clear:
  - one-way ANOVA with J cells (df=J)
  - fitting grand mean only (df=1)
  - $1 \le k_D \le J$, depending on size of $\tau$

Variable Selection: Practical Advice
- Start with a multilevel model that represents your initial guesses about group structure in the data
- Do variable selection on all the fixed effects first, using AIC or DIC (or if you prefer, BIC)
  - AIC will result in bigger models that predict better
  - BIC will result in smaller models that interpret better
  - DIC will result in models between AIC and BIC sizes
  - Deviance/LRT only valid if models have the same random effects, and are nested
- Then go back and use AIC or DIC (or BIC) to do selection on random effects
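To make the one-way ANOVA range $1 \le k_D \le J$ on the k versus $k_D$ slide concrete, here is a worked special case. It is a sketch under stated assumptions, not the definition of $k_D$ that DIC uses: a balanced layout with $n$ observations in each of $J$ groups, known within-group variance $\sigma^2$ and between-group variance $\tau^2$, and "number of parameters" measured as the trace of the map from data to fitted values. The symbols $n$, $\lambda$, and $k_{\mathrm{eff}}$ are introduced here for illustration only.

```latex
\begin{align*}
\hat{y}_{ij} &= (1-\lambda)\,\bar{y}_{\cdot\cdot} + \lambda\,\bar{y}_{j\cdot},
  \qquad \lambda = \frac{\tau^2}{\tau^2 + \sigma^2/n} \\
k_{\mathrm{eff}} &= \sum_{j=1}^{J}\sum_{i=1}^{n}
  \frac{\partial \hat{y}_{ij}}{\partial y_{ij}}
  = Jn\left[\frac{1-\lambda}{Jn} + \frac{\lambda}{n}\right]
  = 1 + \lambda\,(J-1)
\end{align*}
```

As $\tau^2 \to 0$ the group means are shrunk completely to the grand mean and $k_{\mathrm{eff}} \to 1$; as $\tau^2 \to \infty$ there is no shrinkage and $k_{\mathrm{eff}} \to J$, matching the two ends of the range on the slide.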

Digression: REML vs MLE
- In order to properly use AIC, BIC, or even LR tests, we must calculate the true maximum log-likelihood.
- By default, lmer() calculates estimates known as REML estimates.
  - Less biased estimates than true MLEs, especially for variance components
  - Essentially, calculates fixed effect estimates by least squares, and calculates variance components from marginal residuals
  - REML estimates do not maximize the true likelihood
- For AIC, BIC, LRT, etc., need to refit using MLEs!

Digression: REML vs MLE
- Can use the lmer() or update() function to get the MLE fit
    lmer.1 <- lmer(Y ~ 1 + LRT + (1 + LRT | school), data=school.frame, REML=FALSE)
    lmer.1 <- update(lmer.1, . ~ ., REML=FALSE)
- Can produce substantial differences in likelihood
- Use AIC(), BIC(), logLik() to extract these values directly from the fitted model
  - From a REML fit: not valid to compare models with these numbers
  - From an ML fit: OK to compare models with these numbers
- anova() always refits using MLEs so that comparisons are valid
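The REML versus ML distinction can be seen directly on a small example. This is a minimal sketch, assuming lme4 is installed; `sim.frame`, its variables, and the parameter values are simulated for illustration, and a random-intercepts model is used only to keep the fit quick.

```r
library(lme4)

# Simulated stand-in data (hypothetical, not the London Schools data)
set.seed(4)
J <- 20; n.per <- 25
school <- factor(rep(seq_len(J), each = n.per))
LRT <- rnorm(J * n.per)
Gender <- rbinom(J * n.per, 1, 0.5)
Y <- 0.6 * LRT + 0.2 * Gender + rnorm(J, 0, 0.4)[school] + rnorm(J * n.per, 0, 0.7)
sim.frame <- data.frame(Y, LRT, Gender, school)

fit.reml <- lmer(Y ~ 1 + LRT + (1 | school), data = sim.frame)  # REML by default
fit.ml <- update(fit.reml, REML = FALSE)                        # refit by maximum likelihood

c(REML = isREML(fit.reml), ML = isREML(fit.ml))
c(REML = as.numeric(logLik(fit.reml)), ML = as.numeric(logLik(fit.ml)))

# anova() refits REML fits with ML before comparing them
fit2.reml <- update(fit.reml, . ~ . + Gender)
cmp <- anova(fit.reml, fit2.reml)
cmp
```

The AIC that `anova()` reports for the smaller model should match `AIC(fit.ml)` up to optimizer tolerance, not `AIC(fit.reml)`.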

Back to London Schools Data
  > names(tmp)  # main variables in school.frame
  # [1] "Y" "LRT" "Gender" "School.gender"
  # [5] "School.denom" "VR"
  > lmer.2 <- update(lmer.1, . ~ . + Gender)
  > anova(lmer.1,lmer.2)
  # refitting model(s) with ML (instead of REML)
  #        Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
  # lmer.1
  # lmer.2
  # --> AIC, BIC prefer lmer.2
  > lmer.3 <- update(lmer.2, . ~ . + School.gender)
  > anova(lmer.2,lmer.3)
  # refitting model(s) with ML (instead of REML)
  #        Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
  # lmer.2
  # lmer.3                                              **
  # --> AIC prefers lmer.3, BIC prefers lmer.2
Etc!

London Schools Data
- Tried Gender, School.gender, School.denom, and VR stepwise as fixed effects, found only Gender and VR reduce AIC by more than 3, so keep them.
- Trying to convert Gender and VR to random effects does not improve AIC enough to keep them, so the final model we obtain is
  > lmer.5 <- lmer(Y ~ LRT + Gender + VR +
  +     (1 + LRT | school), data=school.frame)

London Schools Data: final model
  > display(lmer.5)
  lmer(formula = Y ~ LRT + Gender + VR + (1 + LRT | school),
      data = school.frame)
              coef.est coef.se
  (Intercept)
  LRT
  Gender
  VRLow
  VRMed
  Error terms:
   Groups   Name        Std.Dev. Corr
   school   (Intercept) 0.23
            LRT
   Residual
  number of obs: 1978, groups: school, 38
  AIC = , DIC =
  deviance =
  > anova(lmer.1,lmer.5)
  refitting model(s) with ML (instead of REML)
         Df AIC BIC logLik
  lmer.1
  lmer.5

Some Automatic & Exact Methods
There are a number of R packages that will do variable selection for lmer models, including:
- LMERConvenienceFunctions automates backwards selection of fixed effects and forward selection of random effects, using AIC, BIC, etc.
  - fitLMER.fnc() is a general-purpose function for this
- RLRsim provides simulation-based exact likelihood ratio tests for random effects
  - exactLRT() performs an exact LRT for true ML fits
  - exactRLRT() performs an exact LRT for REML fits

Automated Variable Selection
  > library(LMERConvenienceFunctions) # for fitLMER.fnc() function...
  > # start with a "big fixed effects" model
  > lmer.10 <- lmer(Y ~ LRT + VR + Gender + School.gender + School.denom +
  +     (1 + LRT | school), data=school.frame)
  > bic_best.11 <- fitLMER.fnc(lmer.10,
  +     ran.effects=c("(School.gender | school)",
  +       "(School.denom | school)"), method="BIC")

fitLMER.fnc:
  1. Backwards elimination of F.E.'s
  2. Forward selection of R.E.'s
  3. Backwards elimination of F.E.'s

  > anova(lmer.5,lmer.10,lmer.11)
  refitting model(s) with ML (instead of REML)
  Data: school.frame
  Models:
  lmer.5: Y ~ LRT + Gender + VR + (1 + LRT | school)
  lmer.11: Y ~ LRT + VR + Gender + (1 + LRT | school)
  lmer.10: Y ~ LRT + VR + Gender + School.gender + School.denom + (1 + LRT |
  lmer.10:     school)
          Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
  lmer.5
  lmer.11                                             <2e-16 ***
  lmer.10

Exact Test of Random Effect
  library(RLRsim)
  m0 <- lmer(Y ~ LRT + VR + Gender + (1 | school), data=school.frame)
  lmer.11a <- lmer(Y ~ LRT + VR + Gender + (1 | school) +
      (0 + LRT | school), data=school.frame)
  # need indep rand effects for RLRsim...
  lmer.lrt.only <- lmer(Y ~ LRT + VR + Gender + (0 + LRT | school),
      data=school.frame)
  formula(m0)             # formula under H0: no random slopes for LRT
  formula(lmer.11a)       # model under HA: yes random slopes for LRT
  formula(lmer.lrt.only)  # model with *only* random slopes for LRT
  exactRLRT(lmer.lrt.only,lmer.11a,m0)
  # simulated finite sample distribution of RLRT.
  #
  # (p-value based on simulated values)
  #
  # data:
  # RLRT = , p-value =

Summary
- The London Schools Data (again!)
  - A nice random-intercepts, random-slopes model
- Residuals in MLMs
  - Marginal residuals
  - Conditional residuals
  - Random effects residuals
- Variable selection
  - Overall fit statistics
  - Simulation-based methods (save for later!)
  - AIC, BIC, DIC and all that
- An improved model for the London Schools data
