Modelling heterogeneous variance-covariance components in two-level multilevel models with application to school effects educational research

Research Methods Festival, Oxford, 9th July 2014

George Leckie, Centre for Multilevel Modelling, Graduate School of Education, University of Bristol

1. INTRODUCTION

The standard two-level random-slope multilevel model

The standard two-level (e.g. students within schools, or repeated measures within subjects) random-slope multilevel model can be written as

$$Y_{ij} = \underbrace{\beta_0 + \beta_1 X_{ij}}_{\text{fixed part}} + \underbrace{u_{0j} + u_{1j} X_{ij} + e_{ij}}_{\text{random part}}$$

where

$$\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N\left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{u0}^2 & \\ \sigma_{u01} & \sigma_{u1}^2 \end{pmatrix} \right\}, \qquad e_{ij} \sim N(0, \sigma_e^2)$$

Every school is modelled as having its own regression line with its own intercept, $\beta_0 + u_{0j}$, and its own slope, $\beta_1 + u_{1j}$, but all schools are constrained to have a common residual error variance, $\sigma_e^2$.

However, it will often be substantively interesting to model this residual error variance as heterogeneous across students and schools, $\sigma_{eij}^2$.

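What we do

We extend the standard random-slope model by modelling the level-1 variance as a log-linear function of the covariates and further random effects.

Mean function:

$$Y_{ij} = \underbrace{\beta_0 + \beta_1 X_{ij}}_{\text{fixed part}} + \underbrace{u_{0j} + u_{1j} X_{ij} + e_{ij}}_{\text{random part}}$$

Level-1 variance function:

$$\log(\sigma_{eij}^2) = \underbrace{\alpha_0 + \alpha_1 X_{ij}}_{\text{fixed part}} + \underbrace{v_{0j} + v_{1j} X_{ij}}_{\text{random part}}$$

where

$$\begin{pmatrix} u_{0j} \\ u_{1j} \\ v_{0j} \\ v_{1j} \end{pmatrix} \sim N\left\{ \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{u0}^2 & & & \\ \sigma_{u01} & \sigma_{u1}^2 & & \\ \sigma_{u0v0} & \sigma_{u1v0} & \sigma_{v0}^2 & \\ \sigma_{u0v1} & \sigma_{u1v1} & \sigma_{v01} & \sigma_{v1}^2 \end{pmatrix} \right\}, \qquad e_{ij} \sim N(0, \sigma_{eij}^2)$$

We won't discuss modelling the different variances and covariances of the level-2 covariance matrix as a function of the covariates, but this is possible.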
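To make the data-generating process of the extended model concrete, here is a minimal simulation sketch in Python. All parameter values, the diagonal level-2 covariance matrix, and the sample sizes are hypothetical choices for illustration only; they are not estimates from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1 = 0.0, 0.5        # mean-function fixed effects
alpha0, alpha1 = -0.5, -0.1    # variance-function fixed effects (log scale)
# Covariance matrix of (u0j, u1j, v0j, v1j); diagonal here for simplicity.
omega = np.diag([0.10, 0.02, 0.06, 0.01])

n_schools, n_students = 65, 60
school = np.repeat(np.arange(n_schools), n_students)
x = rng.standard_normal(n_schools * n_students)

# Draw the four school-level random effects jointly.
u0, u1, v0, v1 = rng.multivariate_normal(np.zeros(4), omega, size=n_schools).T

# Level-1 variance function: log-linear, so the variance is always positive.
sigma2_e = np.exp(alpha0 + alpha1 * x + v0[school] + v1[school] * x)

# Mean function plus a heteroscedastic level-1 residual.
e = rng.standard_normal(x.size) * np.sqrt(sigma2_e)
y = beta0 + beta1 * x + u0[school] + u1[school] * x + e
```

The log link is the design choice that matters here: whatever values the covariates and random effects take, the implied residual variance stays positive.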

2. SOFTWARE

Likelihood-based methods

It is not possible to fit these models using routine commands in general-purpose packages such as R, SAS, SPSS and Stata, nor is it possible to fit them in dedicated multilevel modelling packages such as MLwiN, HLM and SuperMix.

- ASReml and GenStat: assume independent random effects
- SAS PROC NLMIXED: two-level models only; slow; sensitive to starting values
- MIXREGLS: developed by Don Hedeker; two-level random-intercept models only; computationally faster and more stable than SAS; fiddly to use

We have written runmixregls, a command to call MIXREGLS from within Stata: http://www.bristol.ac.uk/cmm/software/runmixregls/

MCMC methods

- WinBUGS: highly flexible; fiddly to use; computationally fairly slow
- Stat-JR: easy to use; computationally faster than WinBUGS; developed by the MLwiN team!

We have developed a LevelRSCVGL template to fit this class of model (need a better name!): http://www.bristol.ac.uk/cmm/software/statjr/

3. ILLUSTRATIVE APPLICATION

Studies of school effects

Most studies of school effects focus on estimating mean differences in student achievement.

- Which schools score highest, having adjusted for intake differences?
- What school policies and practices make some schools more effective than others?

Rarely is anything said about whether there might be variance differences in student achievement. However, just as schools influence the mean achievement of their students, they are likely to influence the dispersion in their students' achievements.

- Which schools widen initial inequalities and which schools narrow them?
- What school policies and practices drive these differences?

MLwiN tutorial dataset

Inner-London schools exam scores dataset

- 4,059 students (level-1) nested within 65 schools (level-2)
- 2 to 198 students per school (mean = 62 students)
- Response is a standardised age 16 exam score
- Main covariates are a standardised age 11 exam score and student gender

Observed school means and within-school variances

There is substantial variability in both school means and within-school variances. There is a moderate positive association between the two (r = 0.29).

Specify a log-linear level-1 variance function

First we specify a log-linear level-1 variance function for the within-school variance and we include a new set of school random effects.

Mean function:

$$\text{SCORE16}_{ij} = \beta_0 + u_j + e_{ij}, \qquad u_j \sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_{ej}^2)$$

Level-1 variance function:

$$\log(\sigma_{ej}^2) = \alpha_0 + v_j, \qquad v_j \sim N(0, \sigma_v^2)$$

where $u_j$ and $v_j$ are allowed to covary with covariance $\sigma_{uv}$ (correlation $\rho_{uv}$).

Every school has its own mean, $\beta_{0j} = \beta_0 + u_j$, and its own variance, $\sigma_{ej}^2 = \exp(\alpha_0 + v_j)$.

Random within-school variances

Model 1 is simply a reparameterised variance-components model where $\log(\sigma_e^2) = \alpha_0$.

Model 2 includes the new school random effects: $\log(\sigma_{ej}^2) = \alpha_0 + v_j$, $v_j \sim N(0, \sigma_v^2)$.

| Parameter | Model 1 Mean | Model 1 SD | Model 2 Mean | Model 2 SD |
|---|---|---|---|---|
| Mean function | | | | |
| $\beta_0$ Intercept | -0.02 | 0.06 | -0.02 | 0.05 |
| $\sigma_u^2$ Intercept variance | 0.18 | 0.04 | 0.19 | 0.04 |
| Level-1 variance function | | | | |
| $\alpha_0$ Intercept | -0.17 | 0.02 | -0.22 | 0.05 |
| $\sigma_v^2$ Intercept variance | | | 0.11 | 0.03 |
| Cross-function | | | | |
| $\rho_{uv}$ Correlation | | | 0.36 | 0.14 |
| DIC | 10910 | | 10783 | |

Model 2 is preferred to Model 1, as shown by the drop in DIC of 127 points.

Note that the estimated intercept $\alpha_0$ has decreased from -0.17 to -0.22. Why?
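One plausible answer to the question above, sketched under the assumption that the Model 2 values are $\alpha_0 = -0.22$ and $\sigma_v^2 = 0.11$: once $v_j$ is normally distributed, $\sigma_{ej}^2$ is lognormal, so $\exp(\alpha_0)$ is the variance in the median school rather than the average variance. The population-average variance is $\exp(\alpha_0 + \sigma_v^2 / 2)$, which stays close to the common variance of Model 1. The variable names below are illustrative.

```python
import math

alpha0_m1 = -0.17                   # Model 1: log of the common level-1 variance
alpha0_m2, sigma2_v = -0.22, 0.11   # Model 2 posterior means (assumed values)

common_var_m1 = math.exp(alpha0_m1)                # single variance for all schools
median_var_m2 = math.exp(alpha0_m2)                # variance in the median school
mean_var_m2 = math.exp(alpha0_m2 + sigma2_v / 2)   # lognormal mean across schools

print(f"Model 1 common variance:  {common_var_m1:.3f}")
print(f"Model 2 median variance:  {median_var_m2:.3f}")
print(f"Model 2 average variance: {mean_var_m2:.3f}")
```

On this reading the intercept drops because its interpretation has changed, not because the schools have become less variable on average.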

Caterpillar plots of school means and within-school variances

School means: $\beta_{0j} = \beta_0 + u_j$. Within-school variances: $\sigma_{ej}^2 = \exp(\alpha_0 + v_j)$.

While 35 schools differ significantly from the population-average school mean, only 17 schools differ significantly from the population-average within-school variance.

Caterpillar plot of intraclass correlation coefficients

$$\text{corr}(y_{ij}, y_{i'j}) = \frac{\sigma_u^2}{\sigma_u^2 + \exp(\alpha_0 + v_j)}$$

The expected correlation between two students from the same school ranges from 0.11 to 0.29. Few schools differ significantly from the population-average correlation of 0.18.
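The school-specific intraclass correlation can be evaluated directly from the formula. The sketch below assumes the Model 2 posterior means ($\sigma_u^2 = 0.19$, $\alpha_0 = -0.22$, $\sigma_v^2 = 0.11$) and evaluates the correlation at hypothetical values of $v_j$ two standard deviations either side of zero; the function name `icc` is illustrative.

```python
import math

sigma2_u, alpha0, sigma2_v = 0.19, -0.22, 0.11   # assumed Model 2 posterior means

def icc(v_j):
    """Intraclass correlation for a school whose variance random effect is v_j."""
    return sigma2_u / (sigma2_u + math.exp(alpha0 + v_j))

sd_v = math.sqrt(sigma2_v)
for label, v in [("v_j = -2 SD", -2 * sd_v), ("v_j = 0", 0.0), ("v_j = +2 SD", 2 * sd_v)]:
    print(f"{label:>12}: ICC = {icc(v):.2f}")
```

Schools with a smaller within-school variance (negative $v_j$) have a higher intraclass correlation, because the common between-school variance makes up a larger share of their total variance.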

Add covariates to the mean function

| Parameter | Model 2 Mean | Model 2 SD | Model 3 Mean | Model 3 SD |
|---|---|---|---|---|
| Mean function | | | | |
| $\beta_0$ Intercept | -0.02 | 0.05 | -0.09 | 0.05 |
| $\beta_1$ Age 11 scores | | | 0.55 | 0.01 |
| $\beta_2$ Girl | | | 0.17 | 0.03 |
| $\sigma_u^2$ Intercept variance | 0.19 | 0.04 | 0.10 | 0.02 |
| Level-1 variance function | | | | |
| $\alpha_0$ Intercept | -0.22 | 0.05 | -0.60 | 0.04 |
| $\sigma_v^2$ Intercept variance | 0.11 | 0.03 | 0.06 | 0.02 |
| Cross-function | | | | |
| $\rho_{uv}$ Correlation | 0.36 | 0.14 | 0.03 | 0.01 |
| DIC | 10783 | | 9194 | |

- $\beta_1 = 0.55$ and so age 11 scores are strongly predictive of age 16 scores
- $\beta_2 = 0.17$ and so girls make more progress than boys with similar initial achievement
- Between-school variance $\sigma_u^2$ reduces by 47%
- Population-average of the within-school variances $E(\sigma_{ej}^2)$ reduces by 33%
- Population-variance of the within-school variances $\text{Var}(\sigma_{ej}^2)$ reduces by 78%
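The three percentage reductions can be approximately reproduced from the tabulated posterior means, using the lognormal moments $E(\sigma_{ej}^2) = \exp(\alpha_0 + \sigma_v^2/2)$ and $\text{Var}(\sigma_{ej}^2) = \{\exp(\sigma_v^2) - 1\}\exp(2\alpha_0 + \sigma_v^2)$. This is a sketch that assumes the Model 2 and Model 3 values are as written in the dictionaries below; because those inputs are rounded to two decimals, the last figure comes out a little below the reported 78%.

```python
import math

def lognormal_mean_var(alpha0, sigma2_v):
    """Mean and variance of exp(alpha0 + v) when v ~ N(0, sigma2_v)."""
    mean = math.exp(alpha0 + sigma2_v / 2)
    var = (math.exp(sigma2_v) - 1) * math.exp(2 * alpha0 + sigma2_v)
    return mean, var

def pct_reduction(before, after):
    return 100 * (1 - after / before)

# Posterior means as read from the table (assumed, rounded values).
m2 = {"sigma2_u": 0.19, "alpha0": -0.22, "sigma2_v": 0.11}
m3 = {"sigma2_u": 0.10, "alpha0": -0.60, "sigma2_v": 0.06}

mean2, var2 = lognormal_mean_var(m2["alpha0"], m2["sigma2_v"])
mean3, var3 = lognormal_mean_var(m3["alpha0"], m3["sigma2_v"])

red_between = pct_reduction(m2["sigma2_u"], m3["sigma2_u"])
red_mean = pct_reduction(mean2, mean3)
red_var = pct_reduction(var2, var3)

print(f"Between-school variance:       {red_between:.0f}% lower")
print(f"Average within-school variance: {red_mean:.0f}% lower")
print(f"Variance of within-school variances: {red_var:.0f}% lower")
```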

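Add a random slope to the mean function

Are schools differentially effective for different types of students?

- Are the schools that are best for high initial achievers different from the schools that are best for low initial achievers?
- Does the gender gap vary across schools? Are there some schools where boys actually outperform girls?

Model 4 allows the age 11 slope coefficient to vary across schools:

$$\text{SCORE16}_{ij} = \beta_0 + \beta_1 \text{SCORE11}_{ij} + \beta_2 \text{GIRL}_{ij} + u_{0j} + u_{1j} \text{SCORE11}_{ij} + e_{ij}$$

$$\log(\sigma_{ej}^2) = \alpha_0 + v_j$$

$$\begin{pmatrix} u_{0j} \\ u_{1j} \\ v_j \end{pmatrix} \sim N\left\{ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{u0}^2 & & \\ \sigma_{u01} & \sigma_{u1}^2 & \\ \sigma_{u0v} & \sigma_{u1v} & \sigma_v^2 \end{pmatrix} \right\}, \qquad e_{ij} \sim N(0, \sigma_{ej}^2)$$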

Predicted mean function school lines

Age 11 scores are more predictive of age 16 scores in some schools than in others. Schools with steeper slopes widen initial achievement differences. School choice matters more for high initial achievers?

Add covariates to the level-1 variance function

| Parameter | Model 4 Mean | Model 4 SD | Model 5 Mean | Model 5 SD |
|---|---|---|---|---|
| Level-1 variance function | | | | |
| $\alpha_0$ Intercept | -0.63 | 0.04 | -0.57 | 0.05 |
| $\alpha_1$ Age 11 scores | | | -0.07 | 0.02 |
| $\alpha_2$ Girl | | | -0.10 | 0.05 |
| $\sigma_v^2$ Intercept variance | 0.06 | 0.02 | 0.06 | 0.02 |
| Cross-function | | | | |
| $\rho_{u0v}$ Intercept-intercept correlation | 0.40 | 0.17 | 0.45 | 0.15 |
| $\rho_{u1v}$ Slope-intercept correlation | 0.76 | 0.14 | 0.77 | 0.13 |
| DIC | 9133 | | 911 | |

- The mean function parameters hardly change and are omitted from the table
- $\alpha_1 = -0.07$ and so, within schools, low initial achievers tend to score more variably than high initial achievers
- $\alpha_2 = -0.10$ and so, within schools, girls tend to score less variably than boys
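Because the variance function is log-linear, its coefficients act multiplicatively on the residual variance. The short sketch below exponentiates the two Model 5 coefficients, assuming $\alpha_1 = -0.07$ and $\alpha_2 = -0.10$ (negative, in line with the table); the variable names are illustrative.

```python
import math

alpha1, alpha2 = -0.07, -0.10   # assumed Model 5 posterior means

# Multiplicative effect on the level-1 variance.
ratio_score = math.exp(alpha1)  # per 1 SD increase in age 11 score
ratio_girl = math.exp(alpha2)   # girls relative to boys

print(f"Variance ratio per SD of age 11 score: {ratio_score:.3f}")
print(f"Variance ratio, girls vs boys:          {ratio_girl:.3f}")
```

Under these values, a student one standard deviation higher on the age 11 score has a residual variance about 7% lower, and girls have a residual variance about 10% lower than otherwise similar boys.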

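Add a random slope to the level-1 variance function

Do schools have differentially dispersed outcomes for different types of students?

- Are the schools that are least dispersed for high initial achievers different from the schools that are least dispersed for low initial achievers?
- Does the gender dispersion gap vary across schools?

Model 6 adds a random slope to the level-1 variance function:

$$\text{SCORE16}_{ij} = \beta_0 + \beta_1 \text{SCORE11}_{ij} + \beta_2 \text{GIRL}_{ij} + u_{0j} + u_{1j} \text{SCORE11}_{ij} + e_{ij}$$

$$\log(\sigma_{eij}^2) = \alpha_0 + \alpha_1 \text{SCORE11}_{ij} + \alpha_2 \text{GIRL}_{ij} + v_{0j} + v_{1j} \text{SCORE11}_{ij}$$

$$\begin{pmatrix} u_{0j} \\ u_{1j} \\ v_{0j} \\ v_{1j} \end{pmatrix} \sim N\left\{ \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{u0}^2 & & & \\ \sigma_{u01} & \sigma_{u1}^2 & & \\ \sigma_{u0v0} & \sigma_{u1v0} & \sigma_{v0}^2 & \\ \sigma_{u0v1} & \sigma_{u1v1} & \sigma_{v01} & \sigma_{v1}^2 \end{pmatrix} \right\}, \qquad e_{ij} \sim N(0, \sigma_{eij}^2)$$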

Predicted level-1 variance function school lines

Three schools actually go against the overall trend and should be examined further. What is it about these three schools which leads their highest initial achieving students to perform more erratically than their lowest initial achieving students?

Explaining the differences between schools

So far we have quantified differences in effectiveness and dispersion between schools and how the magnitude of these differences varies as a function of initial achievement.

The obvious next step is to seek to explain these differences in terms of school-level predictors $W_j$:

- Entering $W_j$ as a main effect into the mean function will explain away $\sigma_{u0}^2$
- Entering $W_j$ as a cross-level interaction with $X_{ij}$ into the mean function will explain away $\sigma_{u1}^2$
- Entering $W_j$ as a main effect into the level-1 variance function will explain away $\sigma_{v0}^2$
- Entering $W_j$ as a cross-level interaction with $X_{ij}$ into the level-1 variance function will explain away $\sigma_{v1}^2$

4. SIMULATION STUDY

Can we ignore the random effects?

Many packages (R, SAS, SPSS, Stata, HLM, MLwiN) allow you to fit limited level-1 variance functions with no random effects.

However, we have carried out simulations which show that ignoring level-2 variability in the level-1 variances leads the level-1 variance function regression coefficients to be estimated with spurious precision.

- This problem is particularly acute for the coefficients of level-2 covariates
- We run the risk of making Type I errors of inference about predictors of level-1 variance
- This problem is analogous to ignoring clustering in linear regression
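The analogy with ignored clustering can be illustrated with a deliberately simplified simulation. This is not the simulation design used for the talk: it is a sketch in which the true residuals are treated as known, the log of the squared residual serves as a crude proxy for the log variance, a hypothetical binary school-level covariate `w_school` has no true effect, and the naive analysis is a two-group comparison that treats all students as independent. All parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2014)

n_schools, n_students, n_reps = 65, 60, 500
alpha0, sigma2_v = -0.2, 0.11           # hypothetical values, loosely Model 2-like
w_school = np.arange(n_schools) % 2     # binary level-2 covariate, true effect = 0
w = np.repeat(w_school, n_students)

rejections = 0
for _ in range(n_reps):
    # School random effects in the level-1 variance function.
    v = rng.normal(0.0, np.sqrt(sigma2_v), n_schools)
    sd_e = np.exp(0.5 * (alpha0 + np.repeat(v, n_students)))
    e = rng.normal(0.0, sd_e)           # level-1 residuals
    z = np.log(e ** 2)                  # crude student-level log-variance proxy

    # Naive analysis: compare z between W = 1 and W = 0 as if all students
    # were independent, i.e. ignoring the school effects v_j.
    z1, z0 = z[w == 1], z[w == 0]
    diff = z1.mean() - z0.mean()
    pooled = ((z1.size - 1) * z1.var(ddof=1)
              + (z0.size - 1) * z0.var(ddof=1)) / (z.size - 2)
    se = np.sqrt(pooled * (1 / z1.size + 1 / z0.size))
    rejections += int(abs(diff / se) > 1.96)

type1_rate = rejections / n_reps
print(f"Naive Type I error rate: {type1_rate:.3f} (nominal 0.05)")
```

With these settings the naive standard error is too small because students in the same school share $v_j$, so the rejection rate for a covariate with no effect should sit well above the nominal 5%.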

5. CONCLUSION

Conclusion

We have extended the standard two-level random-slope model to model the residual error variance as a function of the covariates and additional random effects.

We are implementing this in runmixregls and the new Stat-JR software:

- http://www.bristol.ac.uk/cmm/software/runmixregls/
- http://www.bristol.ac.uk/cmm/software/statjr

The principle of modelling within-group variances as randomly varying across groups applies to multilevel models more generally, including those with additional levels, crossed random effects and discrete responses.

The methods discussed are relevant to any study where there is interest in estimating dispersion differences in outcome variables across groups.

References to our work

Goldstein, H., Leckie, G., Charlton, C., and Browne, W. Multilevel models with random effects for level 1 variance functions, with application to child growth data. Submitted.

Leckie, G. (2013). Modeling the residual error variance in Two-Level Random-Coefficient Multilevel Models. Bulletin of the International Statistical Institute, 68, 1-6.

Leckie, G. (2014). runmixregls - A Program to Run the MIXREGLS Mixed-effects Location Scale Software from within Stata. Journal of Statistical Software, Code Snippet, 1-41. Forthcoming.

Leckie, G., French, R., Charlton, C., and Browne, W. (2015). Modeling Heterogeneous Variance-Covariance Components in Two-Level Models. Journal of Educational and Behavioral Statistics. Forthcoming.

References to other work

Hedeker, D., Mermelstein, R. J., & Demirtas, H. (2008). An Application of a Mixed-Effects Location Scale Model for Analysis of Ecological Momentary Assessment (EMA) Data. Biometrics, 64, 627-634.

Lee, Y., & Nelder, J. A. (2006). Double hierarchical generalized linear models (with discussion). Applied Statistics, 55, 139-185.

Rast, P., Hofer, S. M., & Sparks, C. (2012). Modeling individual differences in within-person variation of negative and positive affect in a mixed effects location scale model using BUGS/JAGS. Multivariate Behavioral Research, 47, 177-200.

What about modelling the level-2 variance-covariance matrix?

It is relatively easy to model a 2 × 2 variance-covariance matrix as a function of the covariates:

$$\begin{pmatrix} u_j \\ v_j \end{pmatrix} \sim N\left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{uj}^2 & \\ \sigma_{uvj} & \sigma_{vj}^2 \end{pmatrix} \right\}$$

$$\log(\sigma_{uj}^2) = \kappa_0 + \kappa_1 W_j, \qquad \log(\sigma_{vj}^2) = \gamma_0 + \gamma_1 W_j, \qquad \tanh^{-1}(\rho_{uvj}) = \delta_0 + \delta_1 W_j$$

However, simply specifying appropriate link functions will no longer ensure positive definiteness in 3 × 3 and larger variance-covariance matrices. In the MCMC sampler, reject any proposed parameter values which give rise to variance-covariance matrices which are not positive definite.
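The following Python sketch illustrates both points: the link functions guarantee a valid 2 × 2 matrix, while a 3 × 3 matrix can fail to be positive definite even when every pairwise correlation lies inside (-1, 1). The coefficient values, the function names `level2_cov` and `is_positive_definite`, and the use of a Cholesky factorisation as the check are all illustrative assumptions, not the Stat-JR implementation.

```python
import numpy as np

def level2_cov(kappa, gamma, delta, w):
    """Covariance matrix of (u_j, v_j) for a school with covariate value w."""
    var_u = np.exp(kappa[0] + kappa[1] * w)   # log link keeps the variance > 0
    var_v = np.exp(gamma[0] + gamma[1] * w)
    rho = np.tanh(delta[0] + delta[1] * w)    # inverse of tanh^-1 keeps |rho| < 1
    cov = rho * np.sqrt(var_u * var_v)
    return np.array([[var_u, cov], [cov, var_v]])

def is_positive_definite(matrix):
    """Cholesky factorisation succeeds only for positive definite matrices."""
    try:
        np.linalg.cholesky(matrix)
        return True
    except np.linalg.LinAlgError:
        return False

# Hypothetical coefficients: any real values give a valid 2 x 2 matrix.
omega2 = level2_cov(kappa=(-1.7, 0.3), gamma=(-2.2, -0.2), delta=(0.4, 0.1), w=1.0)
print(is_positive_definite(omega2))

# 3 x 3 case: each correlation is admissible on its own, but jointly they are not.
omega3 = np.array([[1.0, 0.9, -0.9],
                   [0.9, 1.0, 0.9],
                   [-0.9, 0.9, 1.0]])
print(is_positive_definite(omega3))
```

In a sampler, a check of this kind would be applied to each proposed matrix, and proposals that fail it would be rejected.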