Multilevel Structural Equation Modeling with lavaan

VIII European Congress of Methodology, University of Jena, 24 July

Contents

1 Before we start
  1.1 From regression to structural equation modeling
  1.2 The essence of SEM
  1.3 Model estimation (covariance structure only)
  1.4 Model evaluation
  1.5 Introduction to lavaan
  1.6 Introduction multilevel regression
  1.7 The linear mixed model (optional)
  1.8 Example: the High School and Beyond data
2 Multilevel SEM
  2.1 Introduction
  2.2 Frameworks (and software) for multilevel SEM
  2.3 The two-level SEM model with random intercepts
  2.4 Two-level SEM in lavaan
3 Further topics
  3.1 The status of a latent variable in a two-level SEM
  The status of observed covariates in a two-level SEM
  Evaluating model fit
Examples
  Example: two-level CFA
  Example: two-level SEM
Alternative approaches to analyze multilevel data
  The wide data approach
  The survey (design-based) approach
  Muthén ML (MUML) estimator using multiple group SEM
Last slide
Appendix
  Short history
  Loglikelihood of a two-level SEM
  Further reading

1 Before we start

1.1 From regression to structural equation modeling

univariate linear regression

[Path diagram: x1, x2, x3 and x4 predict y, with intercept β0, slopes β1 to β4 and error term ε]

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \epsilon_i \qquad (i = 1, 2, \ldots, n)$$

multivariate regression

[Path diagram: x1, x2, x3 and x4 predict both y1 and y2]

path analysis

- testing models of causal relationships among observed variables
- all variables are observed (manifest)
- system of regression equations

[Path diagram: observed variables y1 to y7]

confirmatory factor analysis (CFA)

- factor analysis: representing the relationship between one or more latent variables and their (observed) indicators

[Path diagram: latent variables η1 and η2 with indicators y1 to y6]

structural equation modeling (SEM)

- path analysis with latent variables

[Path diagram: latent variables η1 to η4 with indicators y1 to y12]

1.2 The essence of SEM

- the goal of SEM is to test an a priori specified theory (which can often be depicted as a path diagram)
- each diagram has model-based implications: what should the main data characteristics (means, variances, covariances) look like if the model holds?
- in the standard CFA model, the model-implied summary statistics are:

$$\Sigma = \Lambda \Psi \Lambda^T + \Theta \qquad \mu = \nu + \Lambda \alpha$$

- full SEM model (including a structural part):

$$\Sigma = \Lambda (I - B)^{-1} \Psi (I - B)^{-T} \Lambda^T + \Theta \qquad \mu = \nu + \Lambda (I - B)^{-1} \alpha$$

- different diagrams lead to (potentially) different implications; some implications may not fit with your data
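To make the CFA formulas above concrete, here is a minimal base-R sketch that builds the model-implied covariance matrix and mean vector for a one-factor model with three indicators. All parameter values (`Lambda`, `Psi`, `Theta`, `nu`, `alpha`) are made up for illustration and do not come from the slides.

```r
# Toy one-factor CFA with three indicators (illustrative values only)
Lambda <- matrix(c(1.0, 0.8, 0.6), nrow = 3, ncol = 1)  # factor loadings
Psi    <- matrix(0.5, nrow = 1, ncol = 1)               # factor variance
Theta  <- diag(c(0.3, 0.4, 0.5))                        # residual variances
nu     <- c(1.0, 2.0, 3.0)                              # indicator intercepts
alpha  <- 0                                             # factor mean

# Model-implied moments: Sigma = Lambda Psi Lambda^T + Theta, mu = nu + Lambda alpha
Sigma <- Lambda %*% Psi %*% t(Lambda) + Theta
mu    <- nu + as.vector(Lambda %*% alpha)

print(Sigma)
print(mu)
```

Changing a loading or a variance changes `Sigma`, which is exactly what "different diagrams lead to different implications" means in practice.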

1.3 Model estimation (covariance structure only)

- we seek those values for θ that minimize the difference between what we observe in the data, S, and what the model implies, Σ(θ)
- there are many ways to quantify this difference, leading to different discrepancy measures
- the most used discrepancy measure is based on maximum likelihood:

$$F_{ML}(\theta) = \log|\Sigma| + \mathrm{tr}(S \Sigma^{-1}) - \log|S| - p$$

  where p is the number of observed variables; in practice, we replace Σ by $\hat{\Sigma} = \Sigma(\hat{\theta})$
- an alternative is (weighted) least squares, for some weight matrix W:

$$F_{WLS}(\theta) = (s - \sigma)^T W^{-1} (s - \sigma)$$

  where s and σ are the unique elements of S and Σ respectively
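The two discrepancy functions can be written down directly in base R. The sketch below is an illustration, not lavaan's internal code: the helper names `f_ml` and `f_wls` and the 2 x 2 matrix `S` are assumptions made for this example.

```r
# Maximum likelihood discrepancy between a sample covariance matrix S
# and a model-implied covariance matrix Sigma
f_ml <- function(S, Sigma) {
  p <- nrow(S)
  log(det(Sigma)) + sum(diag(S %*% solve(Sigma))) - log(det(S)) - p
}

# Weighted least squares discrepancy on the unique (lower-triangular) elements
f_wls <- function(S, Sigma, W) {
  s     <- S[lower.tri(S, diag = TRUE)]
  sigma <- Sigma[lower.tri(Sigma, diag = TRUE)]
  d     <- s - sigma
  as.numeric(t(d) %*% solve(W) %*% d)
}

S <- matrix(c(1.0, 0.5,
              0.5, 2.0), nrow = 2, byrow = TRUE)   # made-up sample covariances

f_ml(S, S)                    # a perfectly fitting model gives zero
f_ml(S, diag(diag(S)))        # a model that ignores the covariance fits worse
f_wls(S, diag(diag(S)), diag(3))   # W = identity gives unweighted least squares
```

Both functions are zero when the implied matrix reproduces S exactly and grow as the two matrices diverge; an optimizer would minimize them over θ.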

1.4 Model evaluation

evaluation of global fit: chi-square test statistic

- the chi-square test statistic is the primary test of our model
- if the chi-square test statistic is NOT significant, we have a good fit of the model
- this becomes increasingly difficult if the sample size grows

evaluation of global fit: fit indices

- (some) rules of thumb: CFI/TLI > 0.95, RMSEA < 0.05, SRMR < 0.06
- there is a lot of controversy about the use (and misuse) of these fit indices
- a good reference is still Hu & Bentler (1999)

1.5 Introduction to lavaan

installing lavaan, finding documentation

- lavaan depends on the R project for statistical computing
- to install lavaan, simply start up an R session and type:

    > install.packages("lavaan")

- more information about lavaan:
  - the lavaan paper: Rosseel (2012). lavaan: an R package for structural equation modeling. Journal of Statistical Software, 48(2)
  - lavaan discussion group (mailing list)

the lavaan model syntax: a simple regression

[Path diagram: x1, x2, x3 and x4 predict y]

    library(lavaan)
    mydata <- read.csv("c:/temp/mydata.csv")

    # regression of y on four predictors
    mymodel <- ' y ~ x1 + x2 + x3 + x4 '

    # fit model
    fit <- sem(model = mymodel, data = mydata)

    # show results
    summary(fit, nd = 4)

to see the intercept, use either

    fit <- sem(model = mymodel, data = mydata, meanstructure = TRUE)

or include it explicitly in the syntax:

    mymodel <- ' y ~ 1 + x1 + x2 + x3 + x4 '

lavaan output

    > summary(fit, nd = 4, header = FALSE)

[lavaan output: Regressions table (Estimate, Std.Err, z-value, P(>|z|)) for y on x1 to x4, followed by the residual variance .y; expected information, standard errors of type Standard; numeric values missing]

the lavaan model syntax: multivariate regression

[Path diagram: x1, x2, x3 and x4 predict both y1 and y2]

    mymodel <- ' y1 ~ x1 + x2 + x3 + x4
                 y2 ~ x1 + x2 + x3 + x4 '

the lavaan model syntax: path analysis

[Path diagram: x1, x2 and x3 predict x5; x4 and x5 predict x6; x6 predicts x7]

    mymodel <- ' x5 ~ x1 + x2 + x3
                 x6 ~ x4 + x5
                 x7 ~ x6 '

the lavaan model syntax: using cfa() or sem()

[Path diagram: latent variables visual (x1, x2, x3), textual (x4, x5, x6) and speed (x7, x8, x9)]

    HS.model <- ' visual  =~ x1 + x2 + x3
                  textual =~ x4 + x5 + x6
                  speed   =~ x7 + x8 + x9 '

    fit <- cfa(model = HS.model, data = HolzingerSwineford1939)
    summary(fit, fit.measures = TRUE, standardized = TRUE)

the lavaan model syntax: using lavaan()

[Path diagram: latent variables visual, textual and speed with indicators x1 to x9]

    HS.model <- '
        # latent variables
        visual  =~ 1*x1 + x2 + x3
        textual =~ 1*x4 + x5 + x6
        speed   =~ 1*x7 + x8 + x9

        # factor (co)variances
        visual  ~~ visual;  visual  ~~ textual
        visual  ~~ speed;   textual ~~ textual
        textual ~~ speed;   speed   ~~ speed

        # residual variances
        x1 ~~ x1; x2 ~~ x2; x3 ~~ x3
        x4 ~~ x4; x5 ~~ x5; x6 ~~ x6
        x7 ~~ x7; x8 ~~ x8; x9 ~~ x9
    '

    fit <- lavaan(model = HS.model, data = HolzingerSwineford1939)
    summary(fit, fit.measures = TRUE, standardized = TRUE)

output

    > summary(fit, fit.measures = TRUE)

[lavaan output: lavaan ended normally after 35 iterations (optimization method NLMINB, 21 free parameters, 301 observations, estimator ML); model fit test statistic with 24 degrees of freedom; baseline model test with 36 degrees of freedom; CFI and TLI; loglikelihood of the user (H0) and unrestricted (H1) model, AIC, BIC and sample-size adjusted BIC; RMSEA with confidence interval; SRMR; parameter estimates (expected information, standard errors of type Standard) for the latent variables visual, textual and speed, their covariances, and the residual and factor variances; numeric values missing]

1.6 Introduction multilevel regression

different types of data with non-independent observations

- clustered data (family members, teeth in a mouth)
- dyadic data (romantic couples)
- hierarchical data (students within schools within regions)
- matched data (case-control studies)
- survey data (nested sampling)
- longitudinal data (blood pressure of patients measured every week)
- repeated measures (within-subjects design)

balanced versus unbalanced data

- when the data is balanced, we have the same number of units within each cluster, for example:
  - dyadic data
  - longitudinal data with three timepoints (for everyone)
- when the data is unbalanced, we have different cluster sizes, for example:
  - this may be due to missing values
  - students in schools

wide versus long data

- when data is arranged in wide format, each row corresponds to a single cluster
- when data is arranged in long format, each row corresponds to a single unit

example wide format

[Data table with one row per cluster; columns: cluster.id, y1, m1, x1, y2, m2, x2, y3, m3, x3, schoolsize (large, medium or small); numeric values missing]

example long format

[Data table with one row per unit, three rows per cluster; columns: cluster.id, y, m, x, schoolsize; numeric values missing]
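As an illustration of the two layouts, the following base-R sketch converts a small wide data set into long format with `reshape()`. The data values are invented, and the column name `unit` for the within-cluster index is an assumption made for this example; only the variable names y, m, x, cluster.id and schoolsize follow the slides.

```r
# Wide format: one row per cluster, three units per cluster (made-up values)
wide <- data.frame(
  cluster.id = 1:2,
  y1 = c(10, 20), m1 = c(0.1, 0.4), x1 = c(1, 4),
  y2 = c(11, 21), m2 = c(0.2, 0.5), x2 = c(2, 5),
  y3 = c(12, 22), m3 = c(0.3, 0.6), x3 = c(3, 6),
  schoolsize = c("large", "small")
)

# Long format: one row per unit; cluster-level variables are repeated
long <- reshape(wide,
                direction = "long",
                idvar     = "cluster.id",
                varying   = list(c("y1", "y2", "y3"),
                                 c("m1", "m2", "m3"),
                                 c("x1", "x2", "x3")),
                v.names   = c("y", "m", "x"),
                timevar   = "unit",
                times     = 1:3)
long <- long[order(long$cluster.id, long$unit), ]
print(long)
```

The long layout is what both `lmer()` and two-level `sem()` in the examples of this deck expect: a cluster identifier column plus one row per unit.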

ignoring the dependency structure

- we could treat the sample as a simple random sample with N independent observations
- although (still) widely used, ignoring the clustering in the data may have severe consequences:
  - wrong standard errors
  - inflated type I error rates

solutions

- avoiding the clustering (only pick one individual per cluster)
- aggregating the data (may lead to ecological fallacies!)
- cluster-robust standard errors (clustering is just a nuisance)
- fixed-effects approach (e.g., school as a fixed factor)
- mixed-effects approach (e.g., school as a random factor)

1.7 The linear mixed model (optional)

the structure of the model

- the Laird-Ware form of the linear mixed model (two-level):

$$y_{ji} = \beta_1 x_{1ji} + \beta_2 x_{2ji} + \beta_3 x_{3ji} + \ldots + \beta_p x_{pji} + b_{1j} z_{1ji} + b_{2j} z_{2ji} + \ldots + b_{qj} z_{qji} + \epsilon_{ji}$$

- $y_{ji}$ is the value of the response variable for the ith of $n_j$ observations of cluster $j = 1, 2, \ldots, J$
- $x_{1ji}, \ldots, x_{pji}$ are the values of the p regressors for observation i in cluster j; they are fixed constants (with respect to repeated sampling), and can be anything (product terms, indicator variables, ...)
- in many regression models, a constant term $x_{0ji} = 1$ is added; $\beta_0$ is called the intercept
- the regression coefficients $\beta_1, \ldots, \beta_p$ are the fixed-effect coefficients, which are identical for all clusters
- $b_{1j}, b_{2j}, \ldots, b_{qj}$ are the random-effect coefficients for cluster j; the random-effect coefficients are thought of as random variables, not as parameters (similar to the errors $\epsilon_{ji}$)
- $z_{1ji}, z_{2ji}, \ldots, z_{qji}$ are the random-effect regressors; they are typically a subset of the fixed regressors; in many models, a random intercept term $z_{0ji} = 1$ is added and $b_{0j}$ is called a random intercept
- $\epsilon_{ji}$ is the random error for the ith observation of cluster j

the Laird-Ware model in matrix form

$$y_j = X_j \beta + Z_j b_j + \epsilon_j \qquad j = 1, 2, \ldots, J$$

the model for all clusters:

$$y = X \beta + Z b + \epsilon$$

stochastic assumptions in matrix notation

- the random-effect coefficients: $b_j \sim N_q(0, D)$; $b_j$ and $b_{j'}$ are independent for $j \neq j'$
- the error terms: $\epsilon_j \sim N_{n_j}(0, \Sigma_j)$; $\epsilon_j$ and $\epsilon_{j'}$ are independent for $j \neq j'$
- $\mathrm{Cov}[\epsilon_j, b_j] = 0$
- D contains $q(q + 1)/2$ parameters; $\Sigma_j$ contains $n_j(n_j + 1)/2$ elements
- the elements of D and the elements of $\Sigma_j$ (for each j) are collectively called the variance components
- the elements of D are often parameterized in terms of a smaller number of fundamental parameters

conditional and marginal distributions

- the conditional distribution: $y_j \mid b_j \sim N(X_j \beta + Z_j b_j, \Sigma_j)$
- the marginal distribution: $y_j \sim N(X_j \beta, V_j)$, where

$$\mathrm{Cov}[y_j] = V_j = Z_j D Z_j^T + \Sigma_j$$

- $Z_j D Z_j^T$ represents the random-effects structure: it implies a covariance structure of a very specific form
- combining the J clusters, we get

$$\mathrm{Cov}[y] = V = Z D Z^T + \Sigma$$

  where Cov[y] contains $N(N + 1)/2$ elements
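A worked special case may make the "very specific form" of the marginal covariance more tangible. The sketch below assumes a random-intercept-only model (so $Z_j$ is a column of ones and $D$ is the single variance $d^2$), independent errors with common variance $\sigma^2$, and a cluster of size $n_j = 3$; these simplifying assumptions are introduced here for illustration only.

```latex
V_j = Z_j D Z_j^T + \Sigma_j
    = d^2 \, 1_3 1_3^T + \sigma^2 I_3
    = \begin{pmatrix}
        d^2 + \sigma^2 & d^2            & d^2 \\
        d^2            & d^2 + \sigma^2 & d^2 \\
        d^2            & d^2            & d^2 + \sigma^2
      \end{pmatrix}
```

Under these assumptions every pair of observations from the same cluster shares the covariance $d^2$, while observations from different clusters are uncorrelated.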

1.8 Example: the High School and Beyond data

- the following example is borrowed from Raudenbush and Bryk (2001)
- the data are from the 1982 High School and Beyond survey, and pertain to 7185 U.S. high-school students from 160 schools (70 Catholic, 90 public)
- these are the variables that we will use:
  - school: an ordered factor designating the school that the student attends
  - mAch: a numeric vector of mathematics achievement scores
  - ses: a numeric vector of socio-economic scores
  - cses: a numeric vector of centered ses values, where the centering is with respect to the meanses for the school
  - meanses: a numeric vector of mean ses for the school
  - sector: a factor with levels Public and Catholic
- the aim of the analysis is to determine how students' math achievement scores are related to their family socioeconomic status

exploring the data

    > library(mlmRev)
    > summary(Hsb82)

[R output: summaries of school, minrty (No: 5211, Yes: 1974), sx (Male: 3390, Female: 3795), ses, mAch, meanses, sector (Public: 3642, Catholic: 3543) and cses; numeric summaries missing]

model 1: a random-effects one-way ANOVA

- this is often called the empty model, since it contains no predictors, but simply reflects the nested structure
- no level-1 predictors, no level-2 predictors
- in the multilevel notation we specify a model for each level
- model for the first (student) level: $y_{ji} = \alpha_{0j} + \epsilon_{ji}$
- model for the second (school) level: $\alpha_{0j} = \gamma_{00} + u_{0j}$
- the combined model and the Laird-Ware form:

$$y_{ji} = \gamma_{00} + u_{0j} + \epsilon_{ji} = \beta_0 + b_{0j} + \epsilon_{ji}$$

- this is an example of a random-effects one-way ANOVA model with one fixed effect (the intercept, $\beta_0$) representing the general population mean of math achievement, and two random effects:
  - $b_{0j}$ representing the deviation of math achievement in school j from the general mean
  - $\epsilon_{ji}$ representing the deviation of individual i's math achievement in school j from the school mean
- there are two variance components for this model:
  - $\mathrm{Var}(b_{0j}) = d^2$: the variance among school means
  - $\mathrm{Var}(\epsilon_{ji}) = \sigma^2$: the variance among individuals in the same school
- since $b_{0j}$ and $\epsilon_{ji}$ are assumed to be independent, the variation in math scores among individuals can be decomposed into these two variance components:

$$\mathrm{Var}(y_{ji}) = d^2 + \sigma^2$$

R code

    > library(lme4)
    > fit.model1 <- lmer(mAch ~ 1 + (1 | school), data = Hsb82, REML = FALSE)
    > summary(fit.model1)

[R output: linear mixed model fit by maximum likelihood; formula mAch ~ 1 + (1 | school); random-effect variances for school (Intercept) and Residual; number of obs: 7185, groups: school, 160; fixed effect (Intercept); numeric values missing]

intra-class correlation

- the intra-class correlation coefficient is the proportion of variation in individuals' scores due to differences among schools:

$$\rho = \frac{d^2}{\mathrm{Var}(y_{ji})} = \frac{d^2}{d^2 + \sigma^2}$$

- ρ may also be interpreted as the correlation between the math scores of two individuals i and i' from the same school:

$$\mathrm{Cor}(y_{ji}, y_{ji'}) = \frac{\mathrm{Cov}(y_{ji}, y_{ji'})}{\sqrt{\mathrm{Var}(y_{ji}) \, \mathrm{Var}(y_{ji'})}} = \frac{d^2}{d^2 + \sigma^2} = \rho$$

    > d2 <- as.numeric(VarCorr(fit.model1)$school)
    > s <- as.numeric(attr(VarCorr(fit.model1), "sc"))
    > rho <- d2/(d2 + s^2); rho

- about 18 percent of the variation in students' math-achievement scores is attributable to differences among schools
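The two readings of ρ (a variance proportion and a within-cluster correlation) can be checked numerically. This base-R sketch uses made-up variance components (`d2 = 2`, `sigma2 = 8`), not the estimates from the High School and Beyond fit, and the helper name `icc` is an assumption for this example.

```r
# Intra-class correlation from the two variance components
icc <- function(d2, sigma2) d2 / (d2 + sigma2)

d2     <- 2   # between-cluster variance (illustrative value)
sigma2 <- 8   # within-cluster variance (illustrative value)

# Marginal covariance matrix of three observations from the same cluster
# under a random-intercept model: d2 everywhere, plus sigma2 on the diagonal
V <- d2 * matrix(1, 3, 3) + sigma2 * diag(3)

icc(d2, sigma2)      # proportion of variance between clusters
cov2cor(V)[1, 2]     # correlation between two members of the same cluster
```

Both expressions return the same number, which is the equality stated on the slide.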

model 2: a random-effects one-way ANCOVA

- 1 level-1 predictor (SES, centered within school), no level-2 predictors
- random intercept, no random slopes
- model for the first (student) level: $y_{ji} = \alpha_{0j} + \alpha_{1j} \, \mathrm{cses}_{ji} + \epsilon_{ji}$
- model for the second (school) level:
  - $\alpha_{0j} = \gamma_{00} + u_{0j}$ (the random intercept)
  - $\alpha_{1j} = \gamma_{10}$ (the constant slope)
- the combined model and the Laird-Ware form:

$$y_{ji} = (\gamma_{00} + u_{0j}) + \gamma_{10} \, \mathrm{cses}_{ji} + \epsilon_{ji} = \gamma_{00} + \gamma_{10} \, \mathrm{cses}_{ji} + u_{0j} + \epsilon_{ji} = \beta_0 + \beta_1 x_{1ji} + b_{0j} + \epsilon_{ji}$$

- the fixed-effect coefficients $\beta_0$ and $\beta_1$ represent the average within-schools population intercept and slope respectively
- note: because SES is centered within schools, the intercept $\beta_0$ represents the average level of math achievement in the population
- the model has two variance-covariance components:
  - $\mathrm{Var}(b_{0j}) = d^2$: the variance among school intercepts
  - $\mathrm{Var}(\epsilon_{ji}) = \sigma^2$: the error variance around the within-school regressions

R code

    > fit.model2 <- lmer(mAch ~ 1 + cses + (1 | school), data = Hsb82, REML = FALSE)
    > summary(fit.model2, correlation = FALSE)

[R output: linear mixed model fit by maximum likelihood; formula mAch ~ 1 + cses + (1 | school); random-effect variances for school (Intercept) and Residual; number of obs: 7185, groups: school, 160; fixed effects (Intercept) and cses; numeric values missing]

model 3: a random-coefficients regression model

- 1 level-1 predictor (SES, centered within school), no level-2 predictors
- random intercept and random slopes
- model for the first (student) level: $y_{ji} = \alpha_{0j} + \alpha_{1j} \, \mathrm{cses}_{ji} + \epsilon_{ji}$
- model for the second (school) level:
  - $\alpha_{0j} = \gamma_{00} + u_{0j}$ (the random intercept)
  - $\alpha_{1j} = \gamma_{10} + u_{1j}$ (the random slope)
- the combined model and the Laird-Ware form:

$$y_{ji} = (\gamma_{00} + u_{0j}) + (\gamma_{10} + u_{1j}) \, \mathrm{cses}_{ji} + \epsilon_{ji} = \gamma_{00} + \gamma_{10} \, \mathrm{cses}_{ji} + u_{0j} + u_{1j} \, \mathrm{cses}_{ji} + \epsilon_{ji} = \beta_0 + \beta_1 x_{1ji} + b_{0j} + b_{1j} z_{1ji} + \epsilon_{ji}$$

- the fixed-effect coefficients $\beta_0$ and $\beta_1$ again represent the average within-schools population intercept and slope respectively
- the model has four variance-covariance components:
  - $\mathrm{Var}(b_{0j}) = d_0^2$: the variance among school intercepts
  - $\mathrm{Var}(b_{1j}) = d_1^2$: the variance among school slopes
  - $\mathrm{Cov}(b_{0j}, b_{1j}) = d_{01}$: the covariance between within-school intercepts and slopes
  - $\mathrm{Var}(\epsilon_{ji}) = \sigma^2$: the error variance around the within-school regressions
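One consequence of adding a random slope is that the marginal variance of the outcome is no longer constant. The short derivation below follows from the Laird-Ware form of model 3 and the independence of the random effects and the errors; it is added here as an illustration and treats $\mathrm{cses}_{ji}$ as a fixed constant.

```latex
\mathrm{Var}(y_{ji})
  = \mathrm{Var}(b_{0j} + b_{1j} \, \mathrm{cses}_{ji} + \epsilon_{ji})
  = d_0^2 + 2 \, d_{01} \, \mathrm{cses}_{ji} + d_1^2 \, \mathrm{cses}_{ji}^2 + \sigma^2
```

Setting $d_1^2 = d_{01} = 0$ recovers the constant variance $d^2 + \sigma^2$ of the random-intercept models.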

R code

    > fit.model3 <- lmer(mAch ~ 1 + cses + (1 + cses | school), data = Hsb82,
    +                    REML = FALSE)
    > summary(fit.model3, correlation = FALSE)

[R output: linear mixed model fit by maximum likelihood; formula mAch ~ 1 + cses + (1 + cses | school); random-effect variances and correlation for school (Intercept) and cses, plus Residual; number of obs: 7185, groups: school, 160; fixed effects (Intercept) and cses; numeric values missing]

model 4: intercepts-and-slopes-as-outcomes model

- we expand the model by including two level-2 predictors: meanses and sector; the slopes are allowed to vary randomly
- model for the first (student) level: $y_{ji} = \alpha_{0j} + \alpha_{1j} \, \mathrm{cses}_{ji} + \epsilon_{ji}$
- model for the second (school) level:
  - $\alpha_{0j} = \gamma_{00} + \gamma_{01} \, \mathrm{meanses}_j + \gamma_{02} \, \mathrm{sector}_j + u_{0j}$
  - $\alpha_{1j} = \gamma_{10} + \gamma_{11} \, \mathrm{meanses}_j + \gamma_{12} \, \mathrm{sector}_j + u_{1j}$
- the combined model and the Laird-Ware form:

$$y_{ji} = (\gamma_{00} + \gamma_{01} \, \mathrm{meanses}_j + \gamma_{02} \, \mathrm{sector}_j + u_{0j}) + (\gamma_{10} + \gamma_{11} \, \mathrm{meanses}_j + \gamma_{12} \, \mathrm{sector}_j + u_{1j}) \, \mathrm{cses}_{ji} + \epsilon_{ji}$$

$$= \gamma_{00} + \gamma_{01} \, \mathrm{meanses}_j + \gamma_{02} \, \mathrm{sector}_j + \gamma_{10} \, \mathrm{cses}_{ji} + \gamma_{11} \, \mathrm{meanses}_j \, \mathrm{cses}_{ji} + \gamma_{12} \, \mathrm{sector}_j \, \mathrm{cses}_{ji} + u_{0j} + u_{1j} \, \mathrm{cses}_{ji} + \epsilon_{ji}$$

$$= \beta_0 + \beta_1 x_{1ji} + \beta_2 x_{2ji} + \beta_3 x_{3ji} + \beta_4 (x_{1ji} x_{3ji}) + \beta_5 (x_{2ji} x_{3ji}) + b_{0j} + b_{1j} z_{1ji} + \epsilon_{ji}$$

R code

    > fit.model4 <- lmer(mAch ~ 1 + meanses*cses + sector*cses + (1 + cses | school),
    +                    data = Hsb82, REML = FALSE)
    > summary(fit.model4, correlation = FALSE)

[R output: linear mixed model fit by maximum likelihood; formula mAch ~ 1 + meanses * cses + sector * cses + (1 + cses | school); random-effect variances and correlation for school (Intercept) and cses, plus Residual; number of obs: 7185, groups: school, 160; fixed effects (Intercept), meanses, cses, sectorCatholic, meanses:cses and cses:sectorCatholic; numeric values missing]

do we need the random slopes?

    > fit.model4bis <- lmer(mAch ~ 1 + meanses*cses + sector*cses + (1 | school),
    +                       data = Hsb82, REML = FALSE)
    > anova(fit.model4bis, fit.model4)

[R output: likelihood ratio comparison of fit.model4bis (random intercept only) and fit.model4 (random intercept and slope), with columns Df, AIC, BIC, logLik, deviance, Chisq, Chi Df and Pr(>Chisq); numeric values missing]

- no: apparently, the level-2 predictors do a sufficiently good job of accounting for differences in slopes
- statistical note: using an LRT for comparing two models with a different random structure is conservative; better approaches exist (e.g., in the R package RLRsim)

summary of the models

    > # model 1:
    > lmer(mAch ~ 1 + (1 | school), data = Hsb82, REML = FALSE)
    > # model 2:
    > lmer(mAch ~ 1 + cses + (1 | school), data = Hsb82, REML = FALSE)
    > # model 3:
    > lmer(mAch ~ 1 + cses + (1 + cses | school), data = Hsb82, REML = FALSE)
    > # model 4:
    > lmer(mAch ~ 1 + meanses*cses + sector*cses + (1 + cses | school),
    +      data = Hsb82, REML = FALSE)
    > # model 4bis:
    > lmer(mAch ~ 1 + meanses*cses + sector*cses + (1 | school),
    +      data = Hsb82, REML = FALSE)

2 Multilevel SEM

2.1 Introduction

- limitations of the multilevel regression model:
  - (mostly) univariate perspective (multivariate is possible but awkward)
  - no measurement models (latent variables)
  - no mediators (only strictly dependent or independent variables)
  - no reciprocal effects, no goodness-of-fit measures, ...
- two evolutions since the late 1980s:
  - the multilevel regression framework was extended to include measurement errors and latent variables (cf. HLM and MLwiN software)
  - the traditional SEM framework started to incorporate random intercepts and random slopes
- the boundaries between SEM and multilevel regression have gradually disappeared

2.2 Frameworks (and software) for multilevel SEM

overview

- two-level SEM with random intercepts
  - Mplus (type = twolevel), LISREL, EQS, lavaan
- the gllamm framework
  - gllamm (related approach: Latent Gold)
- the Mplus framework
  - Mplus
- the case-wise likelihood based approach (e.g., Mehta & Neale, 2005)
  - Mplus (type = random), Mx, OpenMx (definition variables)
  - in principle: both continuous and categorical outcomes; random slopes
  - xxm?
- the Bayesian framework
  - Mplus
  - (Open)BUGS, JAGS, Stan

two-level SEM with random intercepts

- an extension of single-level SEM to incorporate random intercepts
- extensive technical literature, starting from the late 1980s (until about 2004)
- available in Mplus, EQS, LISREL, lavaan, ...
- this is by far the most widely used framework in the applied literature
- advantages:
  - fast, simple, well-understood, plenty of examples
  - well-documented
- disadvantages:
  - continuous outcomes only
  - no random slopes

the Mplus framework

- the Mplus framework has added many extensions to the two-level within/between approach in the last 17 years:
  - EM algorithm can handle random slopes and missing data
  - categorical outcomes (with numerical quadrature)
  - multilevel (robust) (D)WLS
  - combination of multilevel with complex survey data, mixture modeling, ...
- advantages:
  - superb implementation
  - user-friendly, familiar ("multivariate") approach
- disadvantages:
  - NO technical documentation (about the extensions)
  - black box software

the gllamm framework

- Sophia Rabe-Hesketh, Anders Skrondal and Andrew Pickles
- an extension of generalized linear mixed models to include (continuous and discrete) latent variables (including a structural part)
- advantages:
  - very well documented, open-source code (written in Stata)
  - handles a wide range of outcome types (normal, categorical, ...)
  - very general, very flexible
- disadvantages:
  - not easy to specify (complex) models, univariate perspective
  - needs Stata
  - very, very slow (even in the continuous case)

2.3 The two-level SEM model with random intercepts

- we assume two-level data with individuals (students) nested within clusters (schools)
- in this framework, we decompose the total score of each variable into two parts: a within part, and a between part (Cronbach & Webb, 1979):

$$y_{ji} = (y_{ji} - \bar{y}_j) + \bar{y}_j \qquad y_T = y_W + y_B$$

  where $j = 1, \ldots, J$ is an index for the clusters, and $i = 1, \ldots, n_j$ is an index for the units within a cluster; $\bar{y}_j$ is the cluster mean of cluster j
- both components are treated as unknown (latent) variables
- the two parts are orthogonal and additive; one of the parts can be zero
- the total covariance (at the population level) can be decomposed as

$$\mathrm{Cov}(y) = \Sigma_T = \Sigma_W + \Sigma_B$$
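The decomposition can be illustrated with observed cluster means in base R. Note the hedge: in the two-level SEM both parts are treated as latent, so the sketch below only shows the sample analogue of the idea, using simulated data and variable names (`cluster`, `y_b`, `y_w`) chosen for this example.

```r
set.seed(1)

# Simulated unbalanced two-level data: 4 clusters of different sizes
cluster <- rep(1:4, times = c(3, 5, 2, 4))
y       <- rnorm(length(cluster)) + cluster

# Between part: the cluster mean, repeated for every unit in the cluster
y_b <- ave(y, cluster)

# Within part: deviation of each unit from its own cluster mean
y_w <- y - y_b

# Additive: the two parts reproduce the total score
all.equal(y, y_w + y_b)

# Orthogonal: the within part has mean zero in every cluster,
# so it is uncorrelated with the between part
tapply(y_w, cluster, mean)
cov(y_w, y_b)
```

In this sample version the covariance between the two parts is zero up to floating-point error, which mirrors the orthogonality claimed for the population decomposition.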

two-level SEM: specifying a model for each level

- for a two-level CFA model, we can use

$$\Sigma_W = \Lambda_W \Psi_W \Lambda_W^T + \Theta_W \qquad \text{and} \qquad \Sigma_B = \Lambda_B \Psi_B \Lambda_B^T + \Theta_B$$

- if we add a structural (regression) part, we need to add the $(I - B)^{-1}$ term to the matrix formulation (as in regular SEM)
- meanstructure:
  - within: $\mu_W$ (usually all zero, as the level-1 variables are cluster-centered, except for within-only variables)
  - between: $\mu_B$
- in addition, we can add level-2 covariates ($z_j$) to the model

2.4 Two-level SEM in lavaan

- multilevel SEM development started around January 2017
- implemented in lavaan (0.6-2):
  - standard two-level within-and-between approach
  - continuous responses only, no missing data (for now)
  - no random slopes (for now)
  - using quasi-Newton optimization by default
  - EM algorithm available using the option optim.method = "em"
- future plans (many, but don't ask when they will be ready):
  - missing data, random slopes
  - gllamm framework (but more user-friendly)
  - case-wise likelihood approach
  - more levels

lavaan syntax setup for two-level SEM

[Diagram: a Between panel (Σ_B) above a Within panel (Σ_W)]

    model <- '
        level: 1
            # here comes the within level

        level: 2
            # here comes the between level
    '

    fit <- sem(model = model, data = mydata, cluster = "school")

example: Demo.twolevel (simulated data)

- data: 200 clusters, 2500 observations, cluster sizes: 5, 10, 15 and 20
- measures at the within level: y1, y2, y3, ...
- covariates at the within level: x1, x2, ...
- covariates at the between level: w1 and w2
- explore the data:

    > head(round(Demo.twolevel[,c(1:4,7:12)], 3), n = 10)

[R output: first 10 rows of the columns y1, y2, y3, y4, x1, x2, x3, w1, w2 and cluster; numeric values missing]

model 1: the empty (univariate) model

[Diagram: y1 in the Between panel and y1 in the Within panel]

    library(lavaan)

    model <- '
        level: 1
            y1 ~~ y1
        level: 2
            y1 ~~ y1
    '

    fit <- sem(model, data = Demo.twolevel, cluster = "cluster")
    summary(fit, nd = 4)

lavaan output (parameter estimates only)

[lavaan output: Level 1 [within] with the intercept and variance of y1; Level 2 [cluster] with the intercept and variance of y1; numeric values missing]

lmer version

    > library(lme4)
    > fit.lmer <- lmer(y1 ~ 1 + (1 | cluster), data = Demo.twolevel, REML = FALSE)
    > summary(fit.lmer)

[R output: linear mixed model fit by maximum likelihood; formula y1 ~ 1 + (1 | cluster); random-effect variances for cluster (Intercept) and Residual; number of obs: 2500, groups: cluster, 200; fixed effect (Intercept); numeric values missing]

model 2: simple two-level regression (predictor within)

[Diagram: y1 in the Between panel; x1 predicts y1 in the Within panel]

    model <- '
        level: 1
            y1 ~ x1
        level: 2
            y1 ~~ y1
    '

    fit <- sem(model, data = Demo.twolevel, cluster = "cluster")
    summary(fit, nd = 4)

lavaan output (parameter estimates only)

[lavaan output: Level 1 [within] with the regression of y1 on x1, the intercept .y1 and the residual variance .y1; Level 2 [cluster] with the intercept and variance of y1; numeric values missing]

lmer version

    > fit.lmer <- lmer(y1 ~ 1 + x1 + (1 | cluster), data = Demo.twolevel, REML = FALSE)
    > summary(fit.lmer, corr = FALSE)

[R output: linear mixed model fit by maximum likelihood; formula y1 ~ 1 + x1 + (1 | cluster); random-effect variances for cluster (Intercept) and Residual; number of obs: 2500, groups: cluster, 200; fixed effects (Intercept) and x1; numeric values missing]

model 3: simple two-level regression (within + between predictor)

[Diagram: w1 predicts y1 in the Between panel; x1 predicts y1 in the Within panel]

    model <- '
        level: 1
            y1 ~ x1
        level: 2
            y1 ~ w1
    '

    fit <- sem(model, data = Demo.twolevel, cluster = "cluster")
    summary(fit, nd = 4)

lavaan output

[lavaan output: lavaan ended normally after 21 iterations (optimization method NLMINB, 5 free parameters, 2500 observations, 200 clusters, estimator ML); model fit test statistic with 0 degrees of freedom; observed information based on the Hessian, standard errors of type Standard; Level 1 [within] with the regression of y1 on x1, the intercept .y1 and the residual variance .y1; Level 2 [cluster] with the regression of y1 on w1, the intercept .y1 and the residual variance .y1; numeric values missing]

lmer version

    > fit.lmer <- lmer(y1 ~ 1 + x1 + w1 + (1 | cluster), data = Demo.twolevel, REML = FALSE)
    > summary(fit.lmer, corr = FALSE)

[R output: linear mixed model fit by maximum likelihood; formula y1 ~ 1 + x1 + w1 + (1 | cluster); random-effect variances for cluster (Intercept) and Residual; number of obs: 2500, groups: cluster, 200; fixed effects (Intercept), x1 and w1; numeric values missing]

model 4: one-factor model at both levels

[Diagram: factor fb with indicators y1 to y4 in the Between panel; factor fw with indicators y1 to y4 in the Within panel]

    model <- '
        level: 1
            fw =~ y1 + y2 + y3 + y4
        level: 2
            fb =~ y1 + y2 + y3 + y4
    '

    fit <- sem(model, data = Demo.twolevel, cluster = "cluster")

lavaan output

    > summary(fit)

[lavaan output: lavaan ended normally after 44 iterations (optimization method NLMINB, 20 free parameters, 2500 observations, 200 clusters, estimator ML); model fit test statistic with 4 degrees of freedom; observed information based on the Hessian, standard errors of type Standard; Level 1 [within] with the loadings of fw on y1 to y4, the intercepts, and the residual and factor variances; Level 2 [cluster] with the loadings of fb on y1 to y4, the intercepts, and the residual and factor variances; numeric values missing]

more output

    > fitMeasures(fit)

[R output: npar, fmin, chisq, df, pvalue, baseline.chisq, baseline.df, baseline.pvalue, cfi, tli, nnfi, rfi, nfi, pnfi, ifi, rni, logl, unrestricted.logl, aic, bic, ntotal, bic2, rmsea, rmsea.ci.lower, rmsea.ci.upper, rmsea.pvalue, srmr, srmr_within, srmr_between; numeric values missing]

    > lavInspect(fit, "h1")

[R output: unrestricted (h1) covariance matrix and mean vector of y1 to y4, separately for $within and $cluster; numeric values missing]

    > lavInspect(fit, "implied")

[R output: model-implied covariance matrix and mean vector of y1 to y4, separately for $within and $cluster; numeric values missing]

    > lavInspect(fit, "icc")

[R output: intra-class correlations of y1, y2, y3 and y4; numeric values missing]

model 5: adding covariates (no output)

[Diagram: w1 predicts fb (indicators y1 to y4) in the Between panel; x1 and x2 predict fw (indicators y1 to y4) in the Within panel]

    model <- '
        level: 1
            fw =~ y1 + y2 + y3 + y4
            fw ~ x1 + x2
        level: 2
            fb =~ y1 + y2 + y3 + y4
            fb ~ w1
    '

    fit <- sem(model, data = Demo.twolevel, cluster = "cluster")

3 Further topics

3.1 The status of a latent variable in a two-level SEM

- when a latent variable, representing a hypothetical construct, is introduced in a two-level model, we need to carefully reflect on the status of this latent variable:
  - are the indicators measured at the within or the between level?
  - is the construct of (theoretical) interest at the within level, the between level, or both?
  - how can we interpret the meaning of the construct at the within/between level?
- based on the answers to these questions, we need to create the latent variable in a different way at the within and/or the between level
- this is (still today) a big source of confusion (and bad practices) in the literature

different types of latent variables

- we will discuss five different construct types:
  1. within-only construct (in this case, if we have no other level-2 variables, we may as well use a single-level SEM based on a pooled within-cluster covariance matrix)
  2. between-only construct
  3. shared (or climate, or reflective) between-level construct
  4. configural (or contextual) construct
  5. shared and configural construct
- reference: Stapleton, L.M., Yang, J.S., & Hancock, G.R. (2016). Construct meaning in multilevel settings. Journal of Educational and Behavioral Statistics, 41

79 within-only construct indicators of the latent variable are measured at the within level level at which construct is of interest: within level only interpretation at the within level: construct explains the covariances between its indicators measured at the within level interpretation at the between level: not relevant although the construct only exists at the within level, we may still observe spurious between-level variation in the sample example: construct represents lactose intolerance items inquire about the degree of severity of physical reactions after consuming products containing lactose construct can not be a school-level characteristic, although we may observe differences (on average) across schools 79 / 159

diagram and lavaan syntax

[Diagram: within level, factor fw with indicators y1 to y4; between level, y1 to y4 without a factor]

    model <- '
        level: 1
            fw =~ y1 + y2 + y3 + y4
        level: 2
            # saturated between level
            y1 ~~ y1 + y2 + y3 + y4
            y2 ~~ y2 + y3 + y4
            y3 ~~ y3 + y4
            y4 ~~ y4
    '

between-only construct

- indicators of the latent variable are measured at the between level
- level at which construct is of interest: between level only
- interpretation at the within level: not relevant (does not exist at the within level)
- interpretation at the between level: construct explains the covariances between its indicators measured at the between level
- example: construct reflects self-reported leadership style measured by a questionnaire filled in by the school principals

diagram and lavaan syntax

[Diagram: between level, factor fb with indicators y1 to y4; nothing at the within level]

    model <- '
        level: 1
            # perhaps other level-1 variables
        level: 2
            fb =~ y1 + y2 + y3 + y4
    '

shared (or reflective/climate) between-level construct

- indicators of the latent variable are measured at the within level
- level at which construct is of interest: between level only
- interpretation at the within level: none
- interpretation at the between level: construct represents a characteristic of the cluster
- example: construct reflects instructional quality (a classroom characteristic) as perceived by students
  - each student in each classroom was asked to judge the instructional quality of the teacher of that classroom
  - we are interested in the average responses of the individual students within each classroom
  - responses within each classroom should be highly correlated (high agreement) if indeed instructional quality is a shared construct
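One simple way to quantify the agreement within classrooms is the intraclass correlation. The base R sketch below computes ICC(1) from a one-way ANOVA for a single item; the data are simulated, the design is assumed to be balanced, and the names (`rating`, `classroom`, `icc1`) are hypothetical.

```r
# Simulated ratings of one item (hypothetical: 30 classrooms of 15 students)
set.seed(2)
n_classes  <- 30
class_size <- 15
classroom  <- factor(rep(seq_len(n_classes), each = class_size))

# rating = classroom effect (sd = 1) + student-level noise (sd = 1)
rating <- rnorm(n_classes)[as.integer(classroom)] +
          rnorm(n_classes * class_size)

# one-way ANOVA with classroom as the grouping factor
tab <- anova(aov(rating ~ classroom))
ms_between <- tab[["Mean Sq"]][1]   # mean square between classrooms
ms_within  <- tab[["Mean Sq"]][2]   # mean square within classrooms

# ICC(1) for a balanced design with class_size students per classroom
icc1 <- (ms_between - ms_within) /
        (ms_between + (class_size - 1) * ms_within)
icc1
```

A value close to zero would suggest that students in the same classroom hardly agree, which would make a shared-construct interpretation doubtful.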

diagram and lavaan syntax

[Diagram: between level, factor fs with indicators y1 to y4; within level, y1 to y4 without a factor]

    model <- '
        level: 1
            # saturated within level
            y1 ~~ y1 + y2 + y3 + y4
            y2 ~~ y2 + y3 + y4
            y3 ~~ y3 + y4
            y4 ~~ y4
        level: 2
            fs =~ y1 + y2 + y3 + y4
    '

configural (or formative) construct

- indicators of the latent variable are measured at the within level
- level at which construct is of interest: both within and between level
- interpretation at the within/between level: construct explains the covariances of the within/between part of its indicators
- the configural construct (at the between level) represents the aggregate of the measurements of individuals within a cluster
- example: reading motivation
  - at the individual level (within cluster)
  - at the school level (average student motivation within a school)
- the cluster itself is not seen as the source/reason for variability of an individual construct
- therefore, between-cluster loadings are fixed to be the same as within-cluster loadings (cross-level measurement invariance)
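Written in the matrix notation used earlier for the CFA model, cross-level measurement invariance can be sketched as follows. This is an illustrative formulation, not taken from the slides: the subscripts W and B and the indices j (cluster) and i (unit within cluster) are notation assumed here.

```latex
% the same loading matrix \Lambda appears at both levels
y_{ji} = \nu + \Lambda \left( \eta_{B,j} + \eta_{W,ji} \right)
         + \epsilon_{B,j} + \epsilon_{W,ji}

\Sigma_W = \Lambda \Psi_W \Lambda^T + \Theta_W
\qquad
\Sigma_B = \Lambda \Psi_B \Lambda^T + \Theta_B
```

Under this formulation the factor variances in \Psi_W and \Psi_B are on the same scale, so they can be compared across levels. Fixing the between-level residual variances to zero corresponds to \Theta_B = 0.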

diagram and lavaan syntax

[Diagram: between level, factor fb with loadings a, b, c, d on y1 to y4; within level, factor fw with the same loadings a, b, c, d on y1 to y4]

    model <- '
        level: 1
            fw =~ a*y1 + b*y2 + c*y3 + d*y4
        level: 2
            fb =~ a*y1 + b*y2 + c*y3 + d*y4
            # optional (cross-level invariance)
            # y1 ~~ 0*y1
            # y2 ~~ 0*y2
            # y3 ~~ 0*y3
            # y4 ~~ 0*y4
    '

shared + configural construct

- indicators of the latent variable are measured at the within level
- level at which construct is of interest: within and between level
- interpretation at the within level: construct explains the covariances of the within part of its indicators
- interpretation at the between level: both the configural construct and the shared construct explain the covariances of the between part of its indicators
- example: reading motivation for each child in a classroom is rated by the classroom teacher (using multiple items)
  - some teachers tend to rate more positively than others
  - the shared construct reflects the rater effect
  - the configural construct reflects the average reading motivation in a classroom

diagram and lavaan syntax

[Diagram: between level, factors fb (loadings a, b, c, d) and fs on y1 to y4; within level, factor fw (loadings a, b, c, d) on y1 to y4]

    model <- '
        level: 1
            fw =~ a*y1 + b*y2 + c*y3 + d*y4
        level: 2
            fb =~ a*y1 + b*y2 + c*y3 + d*y4
            fs =~ y1 + y2 + y3 + y4
            # fb and fs must be orthogonal
            fs ~~ 0*fb
    '

3.2 The status of observed covariates in a two-level SEM

- when observed covariates are added in a two-level model, we again need to carefully reflect on the status of these covariates:
  - are the covariates measured at the within or the between level?
  - if they are measured at the within level, does it make sense to split the covariate into a within and a between part?
- based on the answers to these questions, we can make a distinction between three types of covariates:
  1. within-only covariates
  2. between-only covariates
  3. level-1 covariates with a within and a between part

adding a within-only covariate

[Diagram: between level, factor fb (loadings a, b, c, d) on y1 to y4; within level, factor fw (loadings a, b, c, d) on y1 to y4, with x1 predicting fw]

    model <- '
        level: 1
            fw =~ a*y1 + b*y2 + c*y3 + d*y4
            fw ~ x1
        level: 2
            fb =~ a*y1 + b*y2 + c*y3 + d*y4
    '

adding a between-only covariate

[Diagram: between level, factor fb (loadings a, b, c, d) on y1 to y4, with z1 predicting fb; within level, factor fw (loadings a, b, c, d) on y1 to y4]

    model <- '
        level: 1
            fw =~ a*y1 + b*y2 + c*y3 + d*y4
        level: 2
            fb =~ a*y1 + b*y2 + c*y3 + d*y4
            fb ~ z1
    '

adding a level-1 covariate with a within and a between part

[Diagram: between level, factor fb (loadings a, b, c, d) on y1 to y4, with x1 predicting fb; within level, factor fw (loadings a, b, c, d) on y1 to y4, with x1 predicting fw]

    model <- '
        level: 1
            fw =~ a*y1 + b*y2 + c*y3 + d*y4
            fw ~ x1
        level: 2
            fb =~ a*y1 + b*y2 + c*y3 + d*y4
            fb ~ x1
    '

adding a level-1 covariate with a within and a between part (2)

- decomposition of a level-1 covariate (say, x1) into its within part and between part:
  - the level-1 covariate is centered using cluster/group-mean centering
  - the cluster/group means are treated as (unknown, latent) population parameters that need to be estimated
- this implies that we assume a random intercept for the level-1 covariate
- note that this is not the same as creating aggregated versions of the level-1 covariates manually, and adding them to the data file so they can be used in the between part of the model, see:

Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: a new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13.
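For contrast, the manual approach that the slide distinguishes from the latent decomposition can be sketched in a few lines of base R. The data are simulated and the names (`x1_between`, `x1_within`) are hypothetical. The point is that the cluster means computed here are observed sample means, which still contain sampling error, whereas the latent approach treats them as unknown quantities to be estimated.

```r
# Simulated level-1 covariate (hypothetical: 10 clusters of size 8)
set.seed(3)
cluster <- rep(1:10, each = 8)
x1 <- rnorm(10)[cluster] + rnorm(length(cluster))

# manual (observed) decomposition
x1_between <- ave(x1, cluster)     # observed cluster mean, repeated per row
x1_within  <- x1 - x1_between      # cluster-mean centered scores

# the within part has mean zero in every cluster ...
round(tapply(x1_within, cluster, mean), 10)
# ... and is uncorrelated with the observed cluster means
round(cor(x1_within, x1_between), 10)
```

With small clusters the observed means in `x1_between` are unreliable estimates of the true cluster means, which is the motivation for the latent covariate approach described by Lüdtke et al. (2008).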

3.3 Evaluating model fit

- if no random slopes are involved, we can fit an unrestricted (saturated) model: we estimate all the elements of Σ_W, Σ_B and µ_B
- then, we can compute the standard χ² goodness-of-fit test statistic as:

      T = -2 (L_0 - L_1)

  where L_0 and L_1 are the loglikelihoods of the restricted (user-specified) model (h0) and the unrestricted model (h1) respectively
- under various optimal conditions, this statistic follows a chi-square distribution
- the degrees of freedom are computed as in a two-group SEM model: the difference between the number of (non-redundant) sample statistics for each level, and the number of free model parameters
- in principle, fit measures like CFI/TLI, RMSEA, SRMR, ... can be computed in a similar way as in a single-level SEM
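The arithmetic behind the test statistic and its degrees of freedom can be sketched in base R. The helper functions `two_level_df()` and `lrt()` are hypothetical names introduced for this example, the loglikelihood values and the parameter count are invented, and the count of sample statistics assumes that every one of the p variables has both a within and a between part.

```r
# number of non-redundant sample statistics minus number of free parameters
two_level_df <- function(p, n_free) {
  n_stats <- p * (p + 1) / 2 +   # elements of the within covariance matrix
             p * (p + 1) / 2 +   # elements of the between covariance matrix
             p                   # between-level means
  n_stats - n_free
}

# likelihood ratio test of h0 (restricted) against h1 (unrestricted)
lrt <- function(logl_h0, logl_h1, df) {
  T_stat <- -2 * (logl_h0 - logl_h1)
  c(T = T_stat, df = df,
    pvalue = pchisq(T_stat, df = df, lower.tail = FALSE))
}

# illustrative values only: 6 variables, 40 free parameters
df <- two_level_df(p = 6, n_free = 40)   # 21 + 21 + 6 - 40 = 8
lrt(logl_h0 = -1005, logl_h1 = -1000, df = df)
```

In practice lavaan reports these quantities itself; the sketch is only meant to make the counting rule explicit.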

evaluating fit (2)

- unfortunately, a recent simulation study showed that CFI, TLI, and RMSEA were not sensitive to Level-2 model misspecification:

  Hsu, H.Y., Kwok, O.M., Lin, J.H., & Acosta, S. (2015). Detecting misspecified multilevel structural equation models with common fit indices: a Monte Carlo study. Multivariate Behavioral Research, 50.

- there seems to be a growing sentiment that global fit indices may not be very useful in a multilevel setting
- an alternative approach is to assess the fit per level:
  - we could compute the SRMR for each level
  - we could fit a model separately for each level, and leave the other level saturated
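A minimal sketch of the per-level strategy, assuming the `Demo.twolevel` example data that ships with lavaan (not the data used in these slides). The within level gets a one-factor model while the between level is left saturated, so any misfit reflected in the test statistic should come from the within part. The choice of indicators y1 to y4 is arbitrary and only for illustration.

```r
library(lavaan)

# within level: one-factor model; between level: saturated
model_within <- '
    level: 1
        fw =~ y1 + y2 + y3 + y4
    level: 2
        y1 ~~ y1 + y2 + y3 + y4
        y2 ~~ y2 + y3 + y4
        y3 ~~ y3 + y4
        y4 ~~ y4
'
fit_within <- sem(model_within, data = Demo.twolevel, cluster = "cluster")

# chi-square test plus the level-specific SRMR values
fm <- fitMeasures(fit_within,
                  c("chisq", "df", "pvalue", "srmr_within", "srmr_between"))
fm
```

The same idea applies in the other direction: specify the between-level model of interest and saturate the within level.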

4 Examples

4.1 Example: two-level CFA

- we use an example from this book (Chapter 14):

  Hox, J.J., Moerbeek, M., & van de Schoot, R. (2010). Multilevel analysis: Techniques and applications. Routledge.

- the (simulated) data are the scores on six intelligence measures of 399 children from 60 (large) families, patterned after a real dataset collected by Van Peet, A.A.J. (1992)
- the six intelligence measures are: word list, cards, matrices, figures, animals, and occupations
- the data have a two-level structure, with children nested within families
- if intelligence is strongly influenced by shared genetic and environmental influences in the families, we may expect strong between-family effects
- the ICCs of the 6 measures range from 0.36 to ...

exploring the data

> FamIQData <- read.table("famiqdata.dat")
> names(FamIQData) <- c("family", "child", "wordlist", "cards",
                        "matrices", "figures", "animals", "occupats")
> summary(FamIQData)
     family           child          wordlist          cards
 Min.   : 1.00   Min.   : 1.00   Min.   :12.00   Min.   : ....
 1st Qu.: ....   1st Qu.: ....   1st Qu.: ....   1st Qu.:26.50
 Median :33.00   Median : 4.00   Median :30.00   Median :30.00
 Mean   :31.78   Mean   : 4.04   Mean   :29.95   Mean   : ....
 3rd Qu.: ....   3rd Qu.: ....   3rd Qu.: ....   3rd Qu.:33.00
 Max.   :60.00   Max.   :12.00   Max.   :45.00   Max.   :44.00
    matrices         figures         animals         occupats
 Min.   :15.00   Min.   :17.00   Min.   :15.00   Min.   : ....
 1st Qu.: ....   1st Qu.: ....   1st Qu.: ....   1st Qu.:27.00
 Median :30.00   Median :30.00   Median :30.00   Median :30.00
 Mean   :29.73   Mean   :30.08   Mean   :30.11   Mean   : ....
 3rd Qu.: ....   3rd Qu.: ....   3rd Qu.: ....   3rd Qu.:33.00
 Max.   :46.00   Max.   :44.00   Max.   :46.00   Max.   :43.00
> # various cluster sizes
> table(table(FamIQData$family))

analytic procedure

- fitting a two-level model is often a stepwise procedure; below are the steps used by Joop Hox
- model 0: as a preliminary step, an EFA was carried out on the pooled within-clusters covariance matrix S_PW
  - it was concluded that a 2-factor model fitted well at the within level
  - not shown here
- model 1: a two-factor model at the within level + a null model at the between level
  - a null model implies: zero variances and covariances for all (6) variables
  - if this model fits well, we would conclude that there is no between-family structure at all: we may as well continue with a single-level analysis

- model 2: a two-factor model at the within level + an independence model at the between level
  - independence model implies: estimated variances but zero covariances
  - if this model holds, there is family-level variance, but no substantively interesting structural model
- model 3: a two-factor model at the within level + a saturated model at the between level
  - the factors at the within level in this model correspond to what we have called within-only constructs
- models 4a and 4b: in his book, Joop Hox goes on and fits a model with a one-factor model for the between part (4a), and a model with a two-factor model for the between part (4b)
  - the two-factor model seems no improvement over the one-factor model
  - model 4a (with a general factor at the between level) is kept as the final model (this factor can be considered 'between-only')
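The comparison between successive steps can be made formal with a likelihood ratio test, because the independence between-level model is nested in the saturated one. The sketch below assumes the `Demo.twolevel` example data shipped with lavaan rather than the family IQ data, and the object names (`fit_indep`, `fit_sat`, `cmp`) are hypothetical.

```r
library(lavaan)

# within level: one factor; between level: variances only (independence)
model_indep <- '
    level: 1
        fw =~ y1 + y2 + y3
    level: 2
        y1 ~~ y1
        y2 ~~ y2
        y3 ~~ y3
'
# within level: one factor; between level: saturated
model_sat <- '
    level: 1
        fw =~ y1 + y2 + y3
    level: 2
        y1 ~~ y1 + y2 + y3
        y2 ~~ y2 + y3
        y3 ~~ y3
'
fit_indep <- sem(model_indep, data = Demo.twolevel, cluster = "cluster")
fit_sat   <- sem(model_sat,   data = Demo.twolevel, cluster = "cluster")

# likelihood ratio (chi-square difference) test of the two nested models
cmp <- lavTestLRT(fit_indep, fit_sat)
cmp
```

A significant difference would indicate that the between-level covariances cannot be ignored, which is the reasoning behind moving from model 2 to model 3.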

model 1: a 2-factor within model + null between model

[Diagram: within level, factors numeric and perception with indicators wordlist, cards, matrices, figures, animals, occupats; between level, the same six variables without any structure]

lavaan syntax model 1

> model1 <- '
      level: 1
          numeric    =~ wordlist + cards + matrices
          perception =~ figures + animals + occupats
      level: 2
          # null model: all between-level variances fixed to zero
          wordlist ~~ 0*wordlist
          cards    ~~ 0*cards
          matrices ~~ 0*matrices
          figures  ~~ 0*figures
          animals  ~~ 0*animals
          occupats ~~ 0*occupats
  '
> fit1 <- sem(model1, data = FamIQData, cluster = "family",
              std.lv = TRUE, verbose = FALSE)
> # summary(fit1)
> # terrible fit

model 2: a 2-factor within model + independence between model

[Diagram: within level, factors numeric and perception with indicators wordlist, cards, matrices, figures, animals, occupats; between level, the same six variables with variances only]

lavaan syntax model 2

> model2 <- '
      level: 1
          numeric    =~ wordlist + cards + matrices
          perception =~ figures + animals + occupats
      level: 2
          # independence model: free variances, no covariances
          wordlist ~~ wordlist
          cards    ~~ cards
          matrices ~~ matrices
          figures  ~~ figures
          animals  ~~ animals
          occupats ~~ occupats
  '
> fit2 <- sem(model2, data = FamIQData, cluster = "family",
              std.lv = TRUE, verbose = FALSE)
> # summary(fit2)
> # still a bad fit

model 3: a 2-factor within model, with saturated between part

[Diagram: within level, factors numeric and perception with indicators wordlist, cards, matrices, figures, animals, occupats; between level, the same six variables, all freely correlated]

lavaan syntax model 3

> model3 <- '
      level: 1
          numeric    =~ wordlist + cards + matrices
          perception =~ figures + animals + occupats
      level: 2
          # saturated
          wordlist ~~ cards + matrices + figures + animals + occupats
          cards    ~~ matrices + figures + animals + occupats
          matrices ~~ figures + animals + occupats
          figures  ~~ animals + occupats
          animals  ~~ occupats
  '
> fit3 <- sem(model3, data = FamIQData, cluster = "family",
              std.lv = TRUE, verbose = FALSE)
> # summary(fit3)
> # much better fit
