Exploring Hierarchical Linear Mixed Models

Similar documents
Coping with Additional Sources of Variation: ANCOVA and Random Effects

Workshop 9.3a: Randomized block designs

13. October p. 1

Correlated Data: Linear Mixed Models with Random Intercepts

36-463/663: Hierarchical Linear Models

20. REML Estimation of Variance Components. Copyright c 2018 (Iowa State University) 20. Statistics / 36

A brief introduction to mixed models

These slides illustrate a few example R commands that can be useful for the analysis of repeated measures data.

Introduction and Background to Multilevel Analysis

lme4 Luke Chang Last Revised July 16, Fitting Linear Mixed Models with a Varying Intercept

Non-Gaussian Response Variables

Hierarchical Linear Models (HLM) Using R Package nlme. Interpretation. 2 = ( x 2) u 0j. e ij

Repeated measures, part 2, advanced methods

Outline. Mixed models in R using the lme4 package Part 3: Longitudinal data. Sleep deprivation data. Simple longitudinal data

CO2 Handout. t(cbind(co2$type,co2$treatment,model.matrix(~type*treatment,data=co2)))

Random and Mixed Effects Models - Part II

Lecture 9 STK3100/4100

Outline. Example and Model ANOVA table F tests Pairwise treatment comparisons with LSD Sample and subsample size determination

Workshop 9.1: Mixed effects models

STAT3401: Advanced data analysis Week 10: Models for Clustered Longitudinal Data

Generalized Linear Mixed-Effects Models. Copyright c 2015 Dan Nettleton (Iowa State University) Statistics / 58

Stat 209 Lab: Linear Mixed Models in R This lab covers the Linear Mixed Models tutorial by John Fox. Lab prepared by Karen Kapur. ɛ i Normal(0, σ 2 )

Value Added Modeling

Lecture 22 Mixed Effects Models III Nested designs

Intruction to General and Generalized Linear Models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

Stat 579: Generalized Linear Models and Extensions

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Outline for today. Two-way analysis of variance with random effects

R Output for Linear Models using functions lm(), gls() & glm()

Introduction to Within-Person Analysis and RM ANOVA

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Introduction to Hierarchical Data Theory Real Example. NLME package in R. Jiang Qi. Department of Statistics Renmin University of China.

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs

36-720: Linear Mixed Models

Mixed Model Theory, Part I

Solution pigs exercise

WU Weiterbildung. Linear Mixed Models

Mixed effects models

Review of CLDP 944: Multilevel Models for Longitudinal Data

Power analysis examples using R

Multivariate Statistics in Ecology and Quantitative Genetics Summary

Mixed models in R using the lme4 package Part 2: Longitudinal data, modeling interactions

Mixed Model: Split plot with two whole-plot factors, one split-plot factor, and CRD at the whole-plot level (e.g. fancier split-plot p.

Information. Hierarchical Models - Statistical Methods. References. Outline

A (Brief) Introduction to Crossed Random Effects Models for Repeated Measures Data

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion

Confidence Intervals, Testing and ANOVA Summary

Mixed models with correlated measurement errors

Package HGLMMM for Hierarchical Generalized Linear Models

The First Thing You Ever Do When Receive a Set of Data Is

Homework 3 - Solution

Introduction to Mixed Models in R

Introduction to the Analysis of Hierarchical and Longitudinal Data

Stat 5303 (Oehlert): Randomized Complete Blocks 1

Randomized Block Designs with Replicates

Week 8, Lectures 1 & 2: Fixed-, Random-, and Mixed-Effects models

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

#Alternatively we could fit a model where the rail values are levels of a factor with fixed effects

Mixed-effects Maximum Likelihood Difference Scaling

Random and Mixed Effects Models - Part III

Multilevel/Hierarchical Models. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy

The linear mixed model: introduction and the basic model

Stat 5303 (Oehlert): Balanced Incomplete Block Designs 1

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Introduction to Linear Mixed Models: Modeling continuous longitudinal outcomes

Part III: Linear mixed effects models

Time-Invariant Predictors in Longitudinal Models

Multiple Predictor Variables: ANOVA

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

Non-independence due to Time Correlation (Chapter 14)

Using R formulae to test for main effects in the presence of higher-order interactions

Overview. 1. Independence. 2. Modeling Autocorrelation. 3. Temporal Autocorrelation Example. 4. Spatial Autocorrelation Example

Introduction to General and Generalized Linear Models

STAT 740: Testing & Model Selection

Simulation and Analysis of Data from a Classic Split Plot Experimental Design

Statistical Inference: The Marginal Model

Bayesian analysis of logistic regression

Hierarchical Random Effects

Fitting mixed models in R

Latent random variables

I r j Binom(m j, p j ) I L(, ; y) / exp{ y j + (x j y j ) m j log(1 + e + x j. I (, y) / L(, ; y) (, )

Longitudinal Data Analysis. with Mixed Models

Lecture 18: Simple Linear Regression

Nested Designs & Random Effects

Additional Notes: Investigating a Random Slope. When we have fixed level-1 predictors at level 2 we show them like this:

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

36-402/608 Homework #10 Solutions 4/1

22s:152 Applied Linear Regression. Returning to a continuous response variable Y...

HW 2 due March 6 random effects and mixed effects models ELM Ch. 8 R Studio Cheatsheets In the News: homeopathic vaccines

A. A Brief Introduction to Mixed-Effects Models

22s:152 Applied Linear Regression. In matrix notation, we can write this model: Generalized Least Squares. Y = Xβ + ɛ with ɛ N n (0, Σ)

over Time line for the means). Specifically, & covariances) just a fixed variance instead. PROC MIXED: to 1000 is default) list models with TYPE=VC */

Outline for today. Maximum likelihood estimation. Computation with multivariate normal distributions. Multivariate normal distribution

Transcription:

Exploring Hierarchical Linear Mixed Models 1/49 Last time... A Greenhouse Experiment testing C:N Ratios Sam was testing how changing the C:N Ratio of soil affected plant leaf growth. He had 3 treatments. A control, a C addition, and a N addition. To ensure that any one measurement of one leaf wasn t a fluke, Sam measured 3 leaves per plant. The design is as follows: 3 Treatments (Control, C, N) 4 Pots of Plants per Treatment 3 Leaves Measured Per Pot 2/49

Our Greenhouse Experiment testing C:N Ratios 20 Growth Treatment Add C Add N Control 10 a b c d e f g h i j k l Pot Leaf Growth = Treatment Effect + Pot Variation + Error 3/49 Fitting a Varying Intercept Model for the Greenhouse Experiment library(nlme) plantlme <- lme(growth Treatment, random = 1 Pot, data=plants) 4/49

Today 1. Diagnostics in Mixed Models 2. Assessing hypotheses 3. Hierarchical Models 4. Variable Slope-Intercept models 5. A Procedure for Assessing Mixed Model Structure 6. A Taste of Generalized Linear Mixed Models 5/49 Diagnostics in a Mixed Model Framework 6/49

Diagnostics: 1. Is there a relationship between fitted and residual values? 2. Are the residuals normally distributed? 3. Is there a relationship between fitted and residual values at the group level? 4. Are the random effects normally distributed? 7/49 Fitted Values at the Group or Individual Level fitted(plantlme, level=0) a a a b b b 9.102532 9.102532 9.102532 9.102532 9.102532 9.102532 c c c d d d 9.102532 9.102532 9.102532 9.102532 9.102532 9.102532 e e e f f f 14.403830 14.403830 14.403830 14.403830 14.403830 14.403830 g g g h h h 14.403830 14.403830 14.403830 14.403830 14.403830 14.403830 i i i j j j 19.534654 19.534654 19.534654 19.534654 19.534654 19.534654 k k k l l l 19.534654 19.534654 19.534654 19.534654 19.534654 19.534654 attr(,"label") [1] "Fitted values" fitted(plantlme, level=1) a a a b b b 7.457340 7.457340 7.457340 12.425751 12.425751 12.425751 c c c d d d 10.310714 10.310714 10.310714 6.216325 6.216325 6.216325 e e e f f f 12.534204 12.534204 12.534204 15.552437 15.552437 15.552437 g g g h h h 17.099945 17.099945 17.099945 12.428732 12.428732 12.428732 i i i j j j 8/49

Residual Values at the Group or Individual Level residuals(plantlme, level=0) a a a b b -2.9707142-1.6822088-0.8998211 6.6253576 0.5521013 b c c c d 4.0388463 0.7860161 3.9622825-0.6705262-1.0759376 d d e e e -4.7815614-3.8838344-2.1943799 0.1592191-4.2750746 f f f g g 2.6840961 3.2900798-2.0974722 1.2180073 4.9705525 g h h h i 2.9111881-3.5337961-3.4706145 0.3381943 5.0448169 i i j j j 3.9780774 6.6445239-4.6891834-3.6370164-2.0014874 k k k l l 3.5273807 0.1951570 2.8855014-4.5607985-3.5464814 l -3.8404901 attr(,"label") [1] "Residuals" residuals(plantlme, level=1) a a a b -1.325521457-0.037016093 0.745371614 3.302138793 b b c c -2.771117508 0.715627450-0.422165179 2.754101256 c d d d -1.878707517 1.810269759-1.895354054-0.997627064 e e e f -0.324754172 2.028844867-2.405448831 1.535488405 f f g g 2.141472055-3.246079904-1.478108145 2.274437090 g h h h 0.215072669-1.558698732-1.495517046 2.313291744 i i i j 0.402801691-0.663937752 2.002508706-1.629248454 j j k k plot(fitted(plantlme, -0.577081416 1.058447570 1.569520279 level=1), -1.762703376 residuals(plantlme, level=1)) abline(a=0, k b=0,col="red") l l l 0.927640964-1.020857920-0.006540819-0.300549472 attr(,"label") [1] "Residuals" Residuals v. Fitted at Subsample Level 9/49 residuals(plantlme, level = 1) 3 2 1 0 1 2 3 10 15 20 fitted(plantlme, level = 1) 10/49

Normality of Residuals at Subsample Level qqnorm(residuals(plantlme, qqline(residuals(plantlme, type="pearson", level=1)) type="pearson", level=1)) Normal Q Q Plot Sample Quantiles 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2 1 0 1 2 Theoretical Quantiles 11/49 Residuals v. Fitted at Group Level plot(fitted(plantlme, level=0), residuals(plantlme, level=0)) abline(a=0, b=0,col="red") residuals(plantlme, level = 0) 4 2 0 2 4 6 10 12 14 16 18 fitted(plantlme, level = 0) 12/49

Normality of Random Effects r<-ranef(plantlme)[,1] qqnorm(r/sd(r)) qqline(r/sd(r)) Normal Q Q Plot Sample Quantiles 1.0 0.5 0.0 0.5 1.0 1.5 1.5 1.0 0.5 0.0 0.5 1.0 1.5 13/49 Theoretical Quantiles Evaluating Model Results 14/49

Evaluating Random Effects Do this first, as random effect structure alters fixed effects outcomes Use χ 2 tests for random effects - for a REML fit without any random effects, use gls OR If variance components small, need simulation approaches - see RLRsim 15/49 Evaluating the Greenhouse Experiment Random Effects plantgls <- gls(growth Treatment, data=plants) anova(plantlme, plantgls) Model df AIC BIC loglik Test plantlme 1 5 177.2003 184.6829-83.60017 plantgls 2 4 193.6161 199.6021-92.80804 1 vs 2 L.Ratio p-value plantlme plantgls 18.41574 <.0001 16/49

Evaluating the Greenhouse Experiment Random Effects library(rlrsim) exactrlrt(plantlme) simulated finite sample distribution of RLRT. (p-value based on 10000 simulated values) data: RLRT = 18.4157, p-value < 2.2e-16 More accurate, better if random effects small 17/49 Evaluating the Greenhouse Experiment Fixed Effects anova(plantlme, type="marginal") numdf dendf F-value p-value (Intercept) 1 24 27.149545 <.0001 Treatment 2 9 8.915836 0.0073 DF Denominator = Groups - DF Treatment - 1 Note type= marginal - type II 18/49

OR LRT using ML for Fixed Effects plantlme.ml <- lme(growth Treatment, random = 1 Pot, data=plants, method="ml") Anova(plantLME) Analysis of Deviance Table (Type II tests) Response: Growth Chisq Df Pr(>Chisq) Treatment 17.832 2 0.0001342 19/49 Why F-Tests or ML for Fixed Effects? F values calculated using differences in Residual Sums of Squares F tests with DF = Groups - DF Treatment - 1 are conservative χ 2 tests with REML for fixed effects are anti-conservative (type I prone) Use χ 2 tests for random effects - for a REML fit without any random effects, use gls If variance components small, need simulation approaches - see RLRsim 20/49

Contrasts with Fixed Effects library(contrast) contrast(plantlme, list(treatment = c("add C", "Add N")), list(treatment = "Control")) lme model parameter contrast Contrast S.E. Lower Upper t df -5.301297 2.470563-10.34004405-0.2625508-2.15 31 5.130825 2.470563 0.09207789 10.1695711 2.08 31 Pr(> t ) 0.0398 0.0462 21/49 Exercise: Random Effects on Richness Fit data from RIKZ survey Random Effect of Beach ONLY Compare to No Beach Effect Model (gls) Visualize Random Effects se.ranef <- function(obj) ranef(obj, standardized=t)/sapply(ranef(obj), sd) 22/49

Fitting Comparison Models rikzint <- lme(richness 1, random = 1 Beach, data=rikz) rikznobeach <- gls(richness 1, data=rikz) anova(rikzint, rikznobeach) Model df AIC BIC loglik Test rikzint 1 3 267.1142 272.4668-130.5571 rikznobeach 2 2 274.3695 277.9379-135.1848 1 vs 2 L.Ratio p-value rikzint rikznobeach 9.255284 0.0023 23/49 Fit Is Ok... 4 3 Standardized residuals 2 1 0 1 2 4 6 8 10 Fitted values 24/49

Fit Is Ok... Normal Q Q Plot Sample Quantiles 1.0 0.5 0.0 0.5 1.0 1.5 1.5 1.0 0.5 0.0 0.5 1.0 1.5 Theoretical Quantiles 25/49 Full Multilevel/Hierarchical Models 26/49

Types of Multilevel Models 0.5 0.0 0.5 1.0 1.5 2.0 y y y Variable Intercept Variable Slope Variable Slope &Intercept 1.0 1.2 1.4 1.6 1.8 2.0 0.0 0.5 1.0 1.0 1.2 1.4 1.6 1.8 2.0 1.0 0.5 0.0 0.5 1.0 1.5 2.0 2.5 1.0 1.2 1.4 1.6 1.8 2.0 x x x 27/49 Variable Intercept Models Useful with Group Level Predictors y i = α j[i] + β i x i + ɛ ij α j[i] N(µ α + x j, σ 2 α) ɛ ij N(0, σ 2 ) where i = individual sample, j = group 28/49

Each Site has a Unique Exposure - How does it Affect Species Richness? 29/49 Each Site has a Unique Exposure - How does it Affect Species Richness? 20 Richness 15 10 Beach 1 2 3 4 5 6 7 8 9 5 0 8 9 10 11 exposure Data from Zuur et al. 2009 30/49

A Variable Intercept Model for Wave Exposure varintexposure<- lme(richness exposure, random = 1 Beach, data=rikz) anova(varintexposure, type="m") numdf dendf F-value p-value (Intercept) 1 36 21.28013 <.0001 exposure 1 7 15.48873 0.0056 31/49 Plot Fit Using Extracted Components adf <- as.data.frame(t(fixef(varintexposure))) names(adf)[1] <- "intercept" p+ geom_abline(data=adf, mapping=aes(intercept=intercept, slope=exposure), lwd=2) 32/49

Plot Fit Using Extracted Components 20 Richness 15 10 Beach 1 2 3 4 5 6 7 8 9 5 0 8 9 10 11 exposure 33/49 Plot Fit Using Extracted Components adf2 <- as.data.frame(coef(varintexposure)) names(adf2)[1] <- "intercept" p+ geom_abline(data=adf2, mapping=aes(intercept=intercept, slope=exposure), color="grey") 34/49

Plotting is a Wee Bit Tricksy... 20 Richness 15 10 Beach 1 2 3 4 5 6 7 8 9 5 0 8 9 10 11 exposure 35/49 Variable Slope-Intercept Model with No Group Level Predictors y i = α j[i] + β [j]i X + ɛ ij ( ) (( ) ( )) α[i]j µα σ 2 N, α ρσ α σ β β [i]j µ β ρσ α σ β σβ 2 36/49

General Protocol for Model Fitting Variable slope? Intercept? Slope-Intercept? Why do I evaluate Fixed Effects? 1. Start with model with all fixed and random effects that may be important. Evaluate with diagnostics. 2. Evaluate random effects with full model of fixed effects (AIC, χ 2 ) 3. Evaluate fixed effects with reduced random effects (F Tests) 4. Model diagnostics again... 5. Draw inference from model 37/49 How Important is Tide Height? 20 Richness 15 10 Beach 1 2 3 4 5 6 7 8 9 5 0 1 0 1 2 NAP 38/49

Three Models with Different Random Effects varslope <- lme(richness NAP, random = 0 + NAP Beach, data=rikz) varint<- lme(richness NAP, random = 1 Beach, data=rikz) varslopeint <- lme(richness NAP, random = 1 + NAP Beach, data=rikz) 39/49 Does Slope Vary Randomly? ranef(varslope) NAP 1-4.920828e-10 2 2.658914e-09 3 2.901381e-09 4-3.135875e-10 5-7.360857e-09 6 5.445407e-09 7-4.259714e-09 8 2.708332e-09 9-1.287793e-09 40/49

SD in Variable Slope Model is Small summary(varslope)... Random effects: Formula: 0 + NAP Beach NAP Residual StdDev: 0.0001139427 4.159929... 41/49 Evaluation of Different Random Effects Models anova(varslope, varslopeint) Model df AIC BIC loglik Test varslope 1 4 260.2010 267.2458-126.1005 varslopeint 2 6 244.3839 254.9511-116.1919 1 vs 2 L.Ratio p-value varslope varslopeint 19.81713 <.0001 anova(varslopeint, varint) Model df AIC BIC loglik Test varslopeint 1 6 244.3839 254.9511-116.1919 varint 2 4 247.4802 254.5250-119.7401 1 vs 2 L.Ratio p-value varslopeint varint 7.096378 0.0288 42/49

A Simulation-Based Approach to Random Slope Assessment Assumes no covarnace between slope-intercept random effect varslopeint2 <- lme(richness NAP, random = list( 1 Beach, 0+NAP Beach), data=rikz) To test slope random effect m = slope only, ma = all random effect, m0 = model without slope exactrlrt(m=varslope, ma=varslopeint2, m0=varint) simulated finite sample distribution of RLRT. (p-value based on 10000 simulated values) data: RLRT = 1.4804, p-value = 0.0991 43/49 Evaluation of Fixed Effects anova(varslopeint, type="m") numdf dendf F-value p-value (Intercept) 1 35 27.13837 <.0001 NAP 1 35 15.32416 4e-04 44/49

Final Model 20 Richness 15 10 Beach 1 2 3 4 5 6 7 8 9 5 0 1 0 1 2 NAP 45/49 Exercise: RIKZ Tide Height and Shoreline Angle Evaluate the effect of angle1 (sample angle) & NAP on Richness Note: You already know the slope of the NAP relationship doesn t vary randomly Check for a NAP*angle1 interaction 46/49

nlme versus lme4 nlme - can work like nls for flexible nonlinear specification nlme - can accomodate specified correlation structures lmer - can fit more complex models glmer - can fit Generalized Linear Mixed Models (GLMM) 47/49 lmer for a GLMM library(lme4) angi_lmer <- glmer(richness angle1*nap + (1 Beach), data=rikz, family=poisson(link="log")) anova(angi_lmer, type="m") Analysis of Variance Table Df Sum Sq Mean Sq F value angle1 1 0.321 0.321 0.3207 NAP 1 41.359 41.359 41.3593 angle1:nap 1 4.948 4.948 4.9484 Testing GLMMs is trickier, but RLRsim and LRT fixed effects can (usually) get you there - see http://glmm.wikidot.com/faq for more! 48/49

Or we can always go Bayesian! library(mcmcglmm) angi_mcmcglmm <- MCMCglmm(Richness angle1*nap, random = Beach, data=rikz, family="poisson", verbose=f) summary(angi_mcmcglmm)$gcovariances post.mean l-95% CI u-95% CI eff.samp Beach 0.3892111 0.0588447 0.8634476 117.9257 summary(angi_mcmcglmm)$solutions post.mean l-95% CI u-95% CI (Intercept) 1.832249814 1.3732623849 2.271485074 angle1-0.004186804-0.0082018560-0.001161515 NAP -0.782624141-1.0237336673-0.558918778 angle1:nap 0.002903180 0.0009772816 0.006211612 eff.samp pmcmc (Intercept) 220.58814 0.001 angle1 54.22571 0.020 NAP 27.83021 0.001 angle1:nap 27.14478 0.024 49/49 Ideal because we can get credible intervals on all fixed random effects!