British Journal of Mathematical and Statistical Psychology (2016), 69. The British Psychological Society


The relationship between multilevel models and non-parametric multilevel mixture models: Discrete approximation of intraclass correlation, random coefficient distributions, and residual heteroscedasticity

Jason D. Rights* and Sonya K. Sterba
Vanderbilt University, Nashville, Tennessee, USA

Multilevel data structures are common in the social sciences. Often, such nested data are analysed with multilevel models (MLMs) in which heterogeneity between clusters is modelled by continuously distributed random intercepts and/or slopes. Alternatively, the non-parametric multilevel regression mixture model (NPMM) can accommodate the same nested data structures through discrete latent class variation. The purpose of this article is to delineate analytic relationships between NPMM and MLM parameters that are useful for understanding the indirect interpretation of the NPMM as a non-parametric approximation of the MLM, with relaxed distributional assumptions. We define how seven standard and non-standard MLM specifications can be indirectly approximated by particular NPMM specifications. We provide formulas showing how the NPMM can serve as an approximation of the MLM in terms of intraclass correlation, random coefficient means and (co)variances, heteroscedasticity of residuals at level 1, and heteroscedasticity of residuals at level 2. Further, we discuss how these relationships can be useful in practice. The specific relationships are illustrated with simulated graphical demonstrations, and direct and indirect interpretations of NPMM classes are contrasted. We provide an R function to aid in implementing and visualizing an indirect interpretation of NPMM classes. An empirical example is presented and future directions are discussed.

1. Introduction

Multilevel data structures are common in social science research, as when students are nested within schools or patients are nested within clinicians. Such nested data are typically analysed with multilevel models (MLMs, also known as hierarchical linear models or mixed effects models) in which heterogeneity between clusters is modelled by continuous, normally distributed random intercepts and/or slopes (e.g., Goldstein, 2011; Hox, 2010; Raudenbush & Bryk, 2002). Multilevel mixture models are also used to accommodate nested data. One type of multilevel mixture that has been increasingly applied in practice has been termed the non-parametric multilevel mixture model (NPMM; e.g., Asparouhov & Muthén, 2008; Finch & Marchant, 2013; Henry & Muthén, 2010; Kaplan & Keller, 2011; Karakos, 2015;

*Correspondence should be addressed to Jason D. Rights, Department of Psychology and Human Development, Vanderbilt University, Peabody #552, 230 Appleton Place, Nashville, TN 37203, USA (jason.d.rights@vanderbilt.edu).

DOI: 10.1111/bmsp.12073

Park & Yu, 2015; Van Horn, Feng, Kim, Lamont, Feaster, & Jaki, 2016; Vermunt, 2003, 2004, 2008; Yu & Park, 2014). In this model, latent classes are extracted at multiple levels of a hierarchical data structure (e.g., level 1 and level 2) and, unlike the conventional MLM, there are no continuously distributed random effects. In this paper, we consider the regression¹ version of the NPMM, where intercepts and slopes of covariates can vary discretely across level-1 and level-2 classes (e.g., Vermunt, 2010; Vermunt & Magidson, 2005). Compared to the MLM, the NPMM provides the advantages of relaxed distributional assumptions (e.g., distributions of effects across clusters are not assumed normal) and the ability to interpret classes both directly (as representing literal, discrete subpopulations) and indirectly (e.g., as approximating an underlying continuous distribution of effects across clusters; Titterington, Smith, & Makov, 1985). Indeed, the NPMM is termed non-parametric because it has often been motivated for use with an indirect interpretation in which classes can represent a discrete approximation of continuous distributions, such as those assumed in the MLM.² However, precisely how this NPMM non-parametrically approximates MLM parameters has not been shown mathematically. Having only a conceptual or incomplete³ understanding of the relationship between the NPMM and MLM leaves us with several unknowns that limit our understanding of the indirect interpretation of the NPMM: (1) What are the analytic relationships through which the parameters of the NPMM can approximate the parameters of the MLM? (2) Which standard MLM specifications can be approximated by which NPMM specifications? (3) Can the NPMM also indirectly approximate non-standard MLM specifications?

In contrast, model parameters from other mixtures have been analytically related to the non-mixture-model parameters they approximate (e.g., Bauer, 2005, 2007; Vermunt & Van Dijk, 2001), which has aided the indirect interpretation of classes (e.g., Muthén & Asparouhov, 2008; Pek, Sterba, Kok, & Bauer, 2009). A recent review of psychology mixture applications noted that whereas a direct interpretation of classes is intuitive, an indirect semiparametric function of classes is less so, and called for methodologists to provide more concrete details on an indirect interpretation to facilitate its use in practice (Sterba, Baldasaro, & Bauer, 2012, pp. 593, 605).

Understanding the relationship between the NPMM and MLM (i.e., the indirect interpretation of the NPMM) would be useful to applied researchers for several reasons. First, understanding this relationship can aid both multilevel modelling and mixture modelling researchers in developing competing explanations of their data-generating mechanisms. This can be particularly useful in mixture applications given that many applied researchers currently only consider a direct interpretation of classes, which may be seductive in its simplicity, yet risks reifying classes as literal subpopulations (Bauer, 2007; Sampson & Laub, 2005).

¹ Note that Muthén and Asparouhov (2009) address a different kind of multilevel regression mixture that involves both random effects and classes, requires numerical integration, and is most often used for directly interpreting classes. They do not discuss the NPMM. The NPMM involves only classes, does not require numerical integration, and is often used for indirectly interpreting classes.

² When the normality assumptions of the MLM hold, the NPMM non-parametrically approximates MLM parameters, including the first moment(s) and second central moment(s) of the random effect(s) distribution. When these normality assumptions do not hold, the NPMM has the flexibility to not only more accurately recover lower-order moments of the random effects distribution (e.g., Asparouhov & Muthén, 2008; Brame, Nagin, & Wasserman, 2006), but also recover its higher-order moments. As a shorthand, throughout this paper we refer to the NPMM as non-parametrically approximating MLM parameters when referring to its discrete approximation of such lower-order moments, regardless of whether MLM assumptions are upheld or not.

³ Certain relationships have been delineated for simpler mixtures with classes exclusively at level 2 (Vermunt & Van Dijk, 2001), as described in detail later.

Second, understanding this relationship would illuminate how the NPMM can be used as a preliminary, exploratory tool for multilevel modelling researchers. It will be shown that fitting our general NPMM automatically and flexibly approximates a number of non-standard MLM specifications that would not typically be considered by applied researchers fitting MLMs, yet could diagnostically suggest novel future directions for multilevel model-building. Third, understanding this link can help synthesize or even meta-analyse results across already published studies that have used the MLM and/or NPMM with similar dependent variables and covariates. Without the linkages established in this paper, there would be no way to formally compare empirical results across methods, as is often desired (e.g., Bergman, 2015; Sterba & Bauer, 2014).

The purpose of this paper is to fill this gap by enumerating the relationships between the NPMM and MLM useful for indirectly interpreting the NPMM and, moreover, to highlight the practical importance of understanding these relationships. In doing so, we hope to facilitate the use of the NPMM as a non-parametric approximation of the MLM in practice. We first consider a series of NPMMs of increasing complexity. For each, we (a) define the corresponding MLM and (b) provide formulas underlying how parameters of each MLM are indirectly approximated. We begin by relating null NPMM parameters to the intraclass correlation (ICC) from a random-intercept-only MLM; subsequently, we discuss these relationships for MLM fixed effects, MLM random slope variances, and also non-standard features not previously known to be approximated by the NPMM, namely MLM heteroscedastic residual variances at level 1 and level 2. To ground these analytic developments, we contrast direct and indirect interpretations of latent classes, employ simulated graphical illustrations, and provide an empirical example involving predicting mathematics achievement for students nested within schools. In our discussion (Section 8), we consider additional novel uses for the indirect interpretation of the NPMM. Mplus (Muthén & Muthén) syntax to implement the NPMM and an R function to implement indirect approximation calculations are provided in our Appendix S1.

Regarding our scope, note that subsequent research can use our analytic formulas to investigate via simulation the degree of correspondence between MLM parameters and their NPMM approximations under different design conditions. Though we provide a single-sample empirical illustration, a full-scale simulation, as has been done for other mixtures (e.g., Brame et al., 2006; Muthén & Asparouhov, 2008; Sterba & Bauer, 2014),⁴ is beyond our scope here.

Before continuing, it is important to note the ways in which our analytic developments build on and differ from those provided for other non-parametric mixtures with classes only at level 2 (Vermunt & Van Dijk, 2001). First, the additional presence of level-1 classes in the NPMM not only changes computational aspects of these analytic relationships (see Sections 2 and 3), but also introduces new features (see Sections 4 and 5). Second, previous work has not related the variety of possible NPMM specifications each to particular MLMs, nor focused on approximating the ICC. Third, we show steps to derive each indirect approximation formula. This will allow our analytic approach to be generalized to non-parametric versions of other multilevel mixtures (e.g., multilevel latent class analysis) and to be used to clarify indirect relationships in other contexts.

⁴ In general, the NPMM would be expected to achieve a closer approximation of first moments and second central moments of the MLM's random effects distribution with greater numbers of classes. Also note that the NPMM can fit aspects of the data that the MLM cannot (i.e., higher-order moments). See, for example, Bauer and Curran (2003).

2. Relationship of the NPMM to the random-intercept-only MLM and ICC

To begin, we consider a null NPMM. In this model, there are no covariates; rather, the distribution of the outcome variable, y_ij, is allowed to differ across latent classes at level 1 and level 2. Let i denote a level-1 unit, or observation within cluster (i = 1, ..., N_j), and j denote a level-2 unit or cluster (j = 1, ..., J). Latent class membership at level 1 and level 2 is denoted c_ij and d_j, respectively, with particular level-1 classes denoted by k (k = 1, ..., K) and level-2 classes by h (h = 1, ..., H).⁵ For instance, suppose we were analysing scores on a mathematics achievement test for students (level-1 units) nested within schools (level-2 units). For the distribution of maths scores, we may specify a model with multiple school-level latent classes (e.g., level-2 classes of schools with primarily high-achieving students vs. level-2 classes of schools with primarily low-achieving students) and, within each school-level class, multiple student-level latent classes (e.g., level-1 classes of high-achieving students vs. low-achieving students).

More formally, this null NPMM is specified in equation (1) in Table 1⁶ as

$$y_{ij|c_{ij}=k,\,d_j=h} = \gamma_0^{kh} + e_{ij}. \qquad (1)$$

In equation (1), the outcome variable y_ij, where c_ij = k and d_j = h, is modelled by a kh class-combination intercept, γ₀^{kh}, and residual, e_ij. The latter is assumed normally distributed with class-combination-specific⁷ variance θ^{kh}, that is, e_ij ~ N(0, θ^{kh}). Thus, the γ₀^{kh} and θ^{kh} parameters are specific to a particular combination of level-1 class and level-2 class.

There are three types of probabilities important to the NPMM. The marginal probability of level-2 class membership, p(d_j = h) = π^h, represents the probability of a randomly selected observation belonging to level-2 class h. It is modelled by a multinomial regression,

$$p(d_j = h) = \pi^h = \exp(\varpi^h)\Big/\sum_{h'=1}^{H}\exp(\varpi^{h'}),$$

with multinomial intercept ϖ^h fixed at 0 in class H for identification. Additionally, the conditional probability of level-1 class membership given level-2 class membership, p(c_ij = k | d_j = h) = π^{k|h}, represents the probability of belonging to level-1 class k for a randomly selected observation within level-2 class h. It is modelled by another multinomial regression,

$$p(c_{ij} = k \mid d_j = h) = \pi^{k|h} = \exp(\omega^k + \delta^{kh})\Big/\sum_{k'=1}^{K}\exp(\omega^{k'} + \delta^{k'h}),$$

with multinomial intercept ω^k and slope (k on h) δ^{kh}. For identification, ω^K = δ^{Kh} = δ^{kH} = 0. Finally, the kh class-combination probability π^{kh} represents the joint probability of a randomly selected observation belonging to level-1 class k and level-2 class h. This is computed as the product of π^h and π^{k|h}.

Under a direct interpretation of equation (1), one could consider intercept and residual variance differences between classes to represent substantively meaningful differences between distinct subpopulations. For example, average maths achievement scores may differ across distinct school-level (i.e., level-2) classes, and we may be particularly interested in examining membership in kh class combinations such as a high-achieving student-level (i.e., level-1) class within a low-performing school-level class, or a low-achieving student-level class within a high-performing school-level class.

⁵ Though selecting K and H is not our focus, we revisit this topic later in the context of an empirical example.

⁶ Equations (1)-(14) are contained in Table 1, and equations (15) and (16) in Table 2.

⁷ In empirical research, it is common for some or all class-combination residual variances to be constrained equal across class combinations, that is, θ^{kh} = θ, for estimation stability (for discussion, see McLachlan & Peel, 2000).
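To make these three probability types concrete, the following minimal R sketch (hypothetical parameter values; not the authors' Appendix S1 code) computes π^h, π^{k|h}, and π^{kh} from multinomial intercepts and slopes for a K = 2, H = 2 specification.

```r
# Hypothetical multinomial parameters; reference classes K and H are fixed at 0.
varpi <- c(0.4, 0)                          # level-2 intercepts varpi^h
omega <- c(-0.2, 0)                         # level-1 intercepts omega^k
delta <- matrix(c(0.8, 0,
                  0,   0), nrow = 2, byrow = TRUE)  # slopes delta^kh (row k, column h)

pi_h <- exp(varpi) / sum(exp(varpi))        # marginal level-2 class probabilities
pi_k_given_h <- sapply(1:2, function(h) {   # conditional level-1 class probabilities
  num <- exp(omega + delta[, h])
  num / sum(num)
})                                          # column h holds pi(k | h)
pi_kh <- sweep(pi_k_given_h, 2, pi_h, `*`)  # joint kh class-combination probabilities
```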

Table 1. Multilevel model (MLM) specifications with corresponding non-parametric multilevel regression mixture model (NPMM) specificationsᵃ

Random-intercept-only model
MLM (2): y_ij = γ₀ + u₀j + e_ij, with u₀j ~ N(0, τ₀₀) and e_ij ~ N(0, σ²)
NPMM (1): y_ij|c_ij=k, d_j=h = γ₀^{kh} + e_ij, with e_ij ~ N(0, θ^{kh})

+ Fixed slope of x_ij
MLM (4): y_ij = γ₀ + γ₁x_ij + u₀j + e_ij, with u₀j ~ N(0, τ₀₀) and e_ij ~ N(0, σ²)
NPMM (3): y_ij|c_ij=k, d_j=h = γ₀^{kh} + γ₁x_ij + e_ij, with e_ij ~ N(0, θ^{kh})

+ Random slope of x_ij
MLM (6): y_ij = γ₀ + γ₁x_ij + u₀j + u₁jx_ij + e_ij, with (u₀j, u₁j)′ ~ N(0, T), where T has elements τ₀₀, τ₁₀, τ₁₁, and e_ij ~ N(0, σ²)
NPMM (5): y_ij|c_ij=k, d_j=h = γ₀^{kh} + γ₁^h x_ij + e_ij, with e_ij ~ N(0, θ^{kh})

+ Level-1 heteroscedasticity on x_ij
MLM (8): y_ij = γ₀ + γ₁x_ij + u₀j + u₁jx_ij + e₀ij + e₁ijx_ij, with (u₀j, u₁j)′ ~ N(0, T) as in (6) and (e₀ij, e₁ij)′ ~ N(0, Σ), where Σ has elements σ₀², σ₀,₁, σ₁²
NPMM (7): y_ij|c_ij=k, d_j=h = γ₀^{kh} + γ₁^{kh}x_ij + e_ij, with e_ij ~ N(0, θ^{kh})

+ Fixed slope of w_j
MLM (10): y_ij = γ₀ + γ₁x_ij + γ₂w_j + u₀j + u₁jx_ij + e₀ij + e₁ijx_ij, with random effects distributed as in (8)
NPMM (9): y_ij|c_ij=k, d_j=h = γ₀^{kh} + γ₁^{kh}x_ij + γ₂w_j + e_ij, with e_ij ~ N(0, θ^{kh})

(Continued)

Table 1. (Continued)

+ Level-2 heteroscedasticity on w_j
MLM (12): y_ij = γ₀ + γ₁x_ij + γ₂w_j + u₀j + u₃jw_j + u₁jx_ij + e₀ij + e₁ijx_ij, with (u₀j, u₃j, u₁j)′ ~ N(0, T), where T has elements τ₀₀, τ₀₃, τ₃₃, τ₀₁, τ₁₃, τ₁₁, and (e₀ij, e₁ij)′ ~ N(0, Σ) as in (8)
NPMM (11): y_ij|c_ij=k, d_j=h = γ₀^{kh} + γ₁^{kh}x_ij + γ₂^h w_j + e_ij, with e_ij ~ N(0, θ^{kh})

+ Level-1 heteroscedasticity on w_j
MLM (14): y_ij = γ₀ + γ₁x_ij + γ₂w_j + u₀j + u₃jw_j + u₁jx_ij + e₀ij + e₁ijx_ij + e₂ijw_j, with (u₀j, u₃j, u₁j)′ ~ N(0, T) as in (12) and (e₀ij, e₁ij, e₂ij)′ ~ N(0, Σ), where Σ has elements σ₀², σ₀,₁, σ₁², σ₀,₂, σ₁,₂, σ₂²
NPMM (13): y_ij|c_ij=k, d_j=h = γ₀^{kh} + γ₁^{kh}x_ij + γ₂^{kh}w_j + e_ij, with e_ij ~ N(0, θ^{kh})

Note. ᵃ For all NPMM models: p(d_j = h) = π^h = exp(ϖ^h)/Σ_{h′=1}^H exp(ϖ^{h′}) and p(c_ij = k | d_j = h) = π^{k|h} = exp(ω^k + δ^{kh})/Σ_{k′=1}^K exp(ω^{k′} + δ^{k′h}). For identification, ϖ^H = ω^K = δ^{Kh} = δ^{kH} = 0.
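To make the simplest specification in Table 1 concrete, the following minimal R sketch (hypothetical parameter values; not the authors' SAS code used for the later figures) generates data from the null NPMM in equation (1) with K = 2 and H = 2.

```r
set.seed(1)
J <- 200; n_j <- 30                              # number of clusters and cluster size
pi_h   <- c(.5, .5)                              # marginal level-2 class probabilities
pi_k_h <- cbind(c(.6, .4), c(.3, .7))            # conditional pi(k | h), column = h
gamma0 <- cbind(c(0, 4), c(8, 12))               # intercepts gamma0^kh, indexed [k, h]
theta  <- matrix(1, 2, 2)                        # residual variances theta^kh

d <- sample(1:2, J, replace = TRUE, prob = pi_h) # level-2 class for each cluster
dat <- do.call(rbind, lapply(1:J, function(j) {
  k <- sample(1:2, n_j, replace = TRUE, prob = pi_k_h[, d[j]])  # level-1 class per observation
  y <- gamma0[cbind(k, d[j])] + rnorm(n_j, 0, sqrt(theta[cbind(k, d[j])]))
  data.frame(school = j, y = y)
}))
```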

Table 2. Matrix specification of corresponding general NPMM (equation 15) and MLM (equation 16)

MLM general specification:

$$y_{ij} = \mathbf{x}_{ij}'\boldsymbol{\gamma} + \mathbf{w}_{ij}'\mathbf{u}_j + \mathbf{v}_{ij}'\mathbf{e}_{ij}, \quad \mathbf{u}_j \sim N(\mathbf{0}, \mathbf{T}), \quad \mathbf{e}_{ij} \sim N(\mathbf{0}, \boldsymbol{\Sigma}) \qquad (16)$$

where:
x_ij = vector of 1 and all L1 or L2 covariates with fixed components
γ = vector of corresponding fixed components
v_ij = vector of 1 and L1 or L2 covariates affecting L1 residuals
e_ij = vector of L1 random effects modelling L1 heteroscedasticity
Σ = covariance matrix of e_ij
w_ij = vector of 1 and all covariates with random components
u_j = vector of corresponding L2 random effects
T = covariance matrix of u_j
Φ = submatrix of T containing covariances of random effects modelling L2 heteroscedasticity (see equation 37)
w_j = subvector of w_ij containing 1 and L2 covariates affecting L2 residuals (see equation 37)

NPMM general specification:

$$y_{ij|c_{ij}=k,\,d_j=h} = \mathbf{z}_{ij}'\boldsymbol{\gamma}^{kh} + e_{ij}, \quad e_{ij} \sim N(0, \theta^{kh}), \quad p(d_j = h) = \pi^h, \quad p(c_{ij} = k \mid d_j = h) = \pi^{k|h} \qquad (15)$$

where:
z_ij = vector containing 1 and all L1 and L2 covariates
γ^{kh} = vector of coefficients for class combination kh
θ^{kh} = residual variance for class combination kh
π^h and π^{k|h}: see Table 1

Notes. L1 = level 1; L2 = level 2.

An indirect interpretation of equation (1), however, could consider this NPMM a discrete approximation of a corresponding MLM. For example, rather than the school-level and student-level latent classes representing distinct subpopulations, they may be approximating continuous distributions of deviations in maths scores at each level. To illustrate, we begin by defining the corresponding MLM in equation (2), the random-intercept-only (null) MLM. Specifically, in equation (2) we have y_ij = γ₀ + u₀j + e_ij, where outcome y_ij is modelled by a mean intercept, γ₀, cluster-specific deviation, u₀j ~ N(0, τ₀₀), and level-1 residual, e_ij ~ N(0, σ²). In contrast to the NPMM, all variance (within-cluster and between-cluster) is modelled by continuously distributed residuals, rather than through discrete latent class variation.

Researchers fitting MLMs typically use this null MLM exclusively to calculate and interpret the ICC (e.g., Snijders & Bosker, 2012), which involves partitioning outcome variance into within-cluster (σ²) and between-cluster (τ₀₀) components. Specifically, the ICC is the proportion of variance that is between clusters:

$$\mathrm{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2}. \qquad (17)$$

Because the ICC is typically the only quantity of interpretive interest from the null MLM, to mathematically relate the null NPMM and null MLM, thus aiding in an indirect interpretation of the NPMM, we first will focus on how the ICC would be computed from NPMM parameters.
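For reference, a minimal R sketch of equation (17) from a fitted null MLM is given below, assuming the lme4 package and a hypothetical data frame school_dat with outcome math and cluster identifier school (these names are illustrative, not from the article).

```r
library(lme4)

null_mlm <- lmer(math ~ 1 + (1 | school), data = school_dat)  # random-intercept-only MLM, equation (2)
vc <- as.data.frame(VarCorr(null_mlm))
tau00  <- vc$vcov[vc$grp == "school"]                          # between-cluster intercept variance
sigma2 <- vc$vcov[vc$grp == "Residual"]                        # within-cluster residual variance
icc <- tau00 / (tau00 + sigma2)                                # equation (17)
```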

Preliminarily, it is important to consider why one would want to compute the ICC from NPMM parameters. In an MLM context, researchers typically compute the ICC using equation (17) to determine if their degree of nesting is great enough to necessitate fitting MLMs and, if so, report their non-trivial ICC as a rationale for using the MLM. In an NPMM context, no such dependency measure currently exists that uses parameters of a fitted null NPMM model, and consequently there is often no rationale given for the necessity of fitting multilevel data with an NPMM as opposed to, say, fitting a single-level latent class regression model (see Park & Yu, 2015). Having such a measure of ICC for the NPMM would allow researchers to understand the degree of nesting in a familiar correlation metric, without having to fit a second model (i.e., a null MLM).

In MLMs, calculation of the ICC is straightforward. In NPMMs, however, there is no explicit (i.e., directly estimated) within-cluster or between-cluster variance component. Therefore, we must first partition variance analytically in the NPMM to obtain these quantities, and then use them in constructing the implied ICC. Because all clusters vary discretely in the NPMM (i.e., by level-2 class), partitioning variance into within-cluster and between-cluster components amounts to partitioning variance into within level-2 class and between level-2 class components, respectively.

Figure 1 illustrates the concept of partitioning within-cluster and between-cluster variance in a null NPMM. In this figure, both panels depict a mixture of four kh class-combination densities generated from a null NPMM with two level-1 classes (K = 2) and two level-2 classes (H = 2). As shown in Figure 1a, to examine within-cluster variance, we examine how parameters vary by level-1 class k within level-2 class h. Specifically, in Figure 1a, each vertical line represents a kh class-combination intercept, γ₀^{kh}. Comparing the two leftmost vertical lines (denoted γ₀^{11} and γ₀^{21}, respectively), we can see that there is class-combination intercept variance within level-2 class 1 (i.e., the intercepts γ₀^{11} and γ₀^{21}, both belonging to h = 1, differ). Assessing within level-2 class variance in class-combination intercepts (and/or residuals) parallels assessing within-cluster variance in the MLM. Though γ₀^{kh} is specific to class combination kh, we can marginalize across k within h (denoted E_{k|h}(·), where E represents expectation). We thus obtain marginal level-2 class parameters, denoted with a ·h superscript: E_{k|h}(γ₀^{kh}) = γ₀^{·h}. To examine between-cluster variance, we examine how marginal level-2 class parameters vary by h, that is, between level-2 classes, as shown in Figure 1b. There, each vertical line represents a marginal level-2 class intercept, γ₀^{·h}. Comparing the two vertical lines (denoted γ₀^{·1} and γ₀^{·2}), we can see how there is marginal level-2 class intercept variance between level-2 classes. Assessing this between level-2 class variance in marginal level-2 class intercepts parallels assessing between-cluster variance in MLMs.

[Figure 1 appears here: two kernel density plots of y_ij, each a mixture of four kh class-combination densities, with vertical lines marking (a) class-combination intercepts and (b) marginal level-2 class intercepts.]

Figure 1. Conceptualizing within-cluster and between-cluster variance in the null NPMM (equation 1) through weighted within and between level-2 class intercept differences. (a) Comparing class-combination intercepts within level-2 classes to assess within-cluster variance. (b) Comparing marginal level-2 class intercepts between level-2 classes to assess between-cluster variance. Notes. These plots show the kernel density of data generated (in SAS 9.4; SAS Institute Inc., Cary, NC, USA) from the null NPMM in equation (1). There are two level-1 classes (K = 2) and two level-2 classes (H = 2), making four kh class combinations. Each plot is a mixture of four normal densities; each density corresponds to a kh class combination. In (a), each vertical line represents a class-combination intercept, γ₀^{kh}. In (b), each vertical line represents a marginal level-2 class intercept, γ₀^{·h}. By comparing these two plots, we can see conceptually how class-combination parameters (here intercepts, but in other contexts also/instead level-1 class probabilities) can vary within level-2 classes (by level-1 class). We can also see how marginal level-2 class parameters can vary between level-2 classes. Comparing parameters across level-1 classes within level-2 class parallels assessing within-cluster variance in a fitted MLM; comparing parameters between level-2 classes parallels assessing between-cluster variance in a fitted MLM.

With this background, we proceed to first calculate within level-2 class variance from the null NPMM. Because we are relating it to σ² in the MLM, it is termed an implied σ², denoted σ²_NP. Specifically, we take the variance of the intercepts and residuals from equation (1) across level-1 classes within each level-2 class, denoted var_{k|h}(·), and pool across level-2 classes, denoted E_h(·), to obtain an overall variance,

$$\sigma^2_{NP} = E_h[\mathrm{var}_{k|h}(\gamma_0^{kh} + e_{ij})] = E_h[E_{k|h}[(\gamma_0^{kh} - \gamma_0^{\cdot h})^2 + \theta^{kh}]]. \qquad (18)$$

The population expression in equation (18) can be converted to a sample estimate by replacing expectations with weighted sums and replacing parameters with corresponding estimates:

$$\hat{\sigma}^2_{NP} = \sum_{h=1}^{H}\hat{\pi}^h\sum_{k=1}^{K}\hat{\pi}^{k|h}\left((\hat{\gamma}_0^{kh} - \hat{\gamma}_0^{\cdot h})^2 + \hat{\theta}^{kh}\right) = \sum_{h=1}^{H}\sum_{k=1}^{K}\hat{\pi}^{kh}\left((\hat{\gamma}_0^{kh} - \hat{\gamma}_0^{\cdot h})^2 + \hat{\theta}^{kh}\right). \qquad (19)$$

We now calculate the between level-2 class variance from the null NPMM, which we term the implied τ₀₀, denoted τ₀₀(NP). To do so, we take the variance of marginal level-2 class intercepts between level-2 classes,

$$\tau_{00(NP)} = \mathrm{var}_h(E_{k|h}[\gamma_0^{kh}]) = \mathrm{var}_h(\gamma_0^{\cdot h}) = E_h[(\gamma_0^{\cdot h})^2] - E_h[\gamma_0^{\cdot h}]^2. \qquad (20)$$

This population expression also can be converted to a sample estimate,

$$\hat{\tau}_{00(NP)} = \widehat{\mathrm{var}}_h\left(\sum_{k=1}^{K}\hat{\pi}^{k|h}\hat{\gamma}_0^{kh}\right) = \sum_{h=1}^{H}\hat{\pi}^h(\hat{\gamma}_0^{\cdot h})^2 - \left(\sum_{h=1}^{H}\hat{\pi}^h\hat{\gamma}_0^{\cdot h}\right)^2. \qquad (21)$$

Using σ²_NP and τ₀₀(NP) from the NPMM, we then compute the implied ICC:⁸

$$\mathrm{ICC}_{NP} = \frac{\tau_{00(NP)}}{\tau_{00(NP)} + \sigma^2_{NP}}. \qquad (22)$$

With these implied variance components now enumerated for the NPMM, we can better understand how NPMM parameters relate to the degree of nesting, that is, the magnitude of the ICC if an MLM were fitted. This can aid researchers in indirectly interpreting classes from a null NPMM. Specifically, examining equation (22), we see that greater spread of marginal level-2 class intercepts in the NPMM implies larger between-cluster variance in a fitted MLM (increasing ICC). In other words, with more level-2 class separation, there is more distinction between clusters and, thus, greater effects of nesting. This is illustrated by comparing Figure 2a,b. Furthermore, examining equation (18), we see that smaller kh class-combination residual variance in the NPMM implies smaller within-cluster variance in a fitted MLM (increasing ICC). This is illustrated by comparing Figure 2a,c. Also, from equation (18) we see that less spread of intercepts across level-1 classes within level-2 classes in the NPMM implies smaller within-cluster variance in the MLM (increasing ICC). This indicates less distinction across level-1 classes within level-2 class and, thus, less overall variability within clusters. This pattern is illustrated by comparing Figure 2a,d.⁹

In sum, in addition to directly interpreting kh class-combination parameter differences in the null NPMM, researchers can better understand how the NPMM accounts for nestedness by indirectly interpreting these parameters in relation to the familiar ICC from the MLM. That is, it is now possible for researchers to simultaneously consider two options: that nesting is driven by some continuous distribution of intercepts across clusters, or that nesting is driven by patterns of discrete class differences. We will revisit both in the empirical example in Section 7.

⁸ Note that each term in equation (22) is a variance and is thus bounded by 0. The implied ICC is thus also bounded by 0.

⁹ In the Figure 2 demonstration, between level-2 class differences are driven by intercepts that vary freely across class combination. For simplicity, generating kh class-combination probabilities are equal. Alternatively, even if parameters (e.g., intercepts) were equal across h for all k (i.e., γ₀^{11} = γ₀^{12} and γ₀^{21} = γ₀^{22}), between level-2 class differences could still be driven by level-1 class proportions varying by level-2 class. Similar ICC patterns could be found.
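The implied variance components and ICC above can be assembled directly from estimated null-NPMM parameters. The following minimal R sketch (hypothetical estimates; not the authors' Appendix S1 function) computes equations (19), (21), and (22) for a K = 2, H = 2 null NPMM.

```r
pi_kh  <- matrix(c(.30, .20, .15, .35), 2, 2)   # estimated joint probabilities pi^kh, [k, h]
gamma0 <- matrix(c(0, 4, 8, 12), 2, 2)          # estimated intercepts gamma0^kh
theta  <- matrix(1, 2, 2)                       # estimated residual variances theta^kh

pi_h <- colSums(pi_kh)                          # marginal level-2 class probabilities
g0_h <- colSums(sweep(pi_kh, 2, pi_h, `/`) * gamma0)            # marginal intercepts gamma0^(.h)
sigma2_np <- sum(pi_kh * (sweep(gamma0, 2, g0_h)^2 + theta))    # equation (19)
tau00_np  <- sum(pi_h * g0_h^2) - sum(pi_h * g0_h)^2            # equation (21)
icc_np    <- tau00_np / (tau00_np + sigma2_np)                  # equation (22)
```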

3. Relationship of NPMM to random slope MLM

Suppose now that a researcher is interested in fitting a more complex NPMM, involving a single level-1 covariate, x_ij, whose effect on y_ij is allowed to differ across level-2 classes, as in equation (5).¹⁰ In equation (5), γ₁^h denotes the level-2 class h slope of x_ij; this slope does not vary across k within h. In equation (5), the intercept γ₀^{kh} is again allowed to differ across all kh class combinations.

¹⁰ Covariates can also/instead predict class membership (e.g., Muthén & Asparouhov, 2009; Vermunt, 2003), but we do not include this here because our focus is on non-parametric multilevel mixture regression models, where covariates predict outcomes within class.

From a direct interpretation of classes, the marginal level-2 class slopes (γ₁^h) could be interpreted as substantively meaningful slopes corresponding to different subpopulations of clusters. For example, suppose the number of hours students spend studying at home (x_ij) has a strong effect on maths achievement in one school-level class but a weak effect in another school-level class, perhaps because these school-level classes differ in how effectively course material is taught in the classroom. Alternatively, we can indirectly interpret these slopes as approximating an underlying continuous distribution of slopes across clusters. We delineate the indirect interpretation by first defining the MLM corresponding to the NPMM in equation (5), namely the random slope MLM in equation (6). In equation (6), there are normally distributed random effects at level 2 for the intercept, u₀j ~ N(0, τ₀₀), and slope of x_ij, u₁j ~ N(0, τ₁₁), which are allowed to covary (τ₁₀). This yields a 2 × 2 covariance matrix of random effects, T. There is also a fixed component for the slope, denoted γ₁, and intercept, denoted γ₀. Let γ denote the vector of fixed effects. Here, γ = [γ₀ γ₁].

Next, to aid in an indirect interpretation, we compute the implied γ and implied T from equation (5) NPMM parameters. To approximate γ using NPMM parameters, we first define γ^{kh} as the vector of all coefficients in the NPMM and take its expectation across k and h:

$$\boldsymbol{\gamma}_{NP} = E_h[E_{k|h}[\boldsymbol{\gamma}^{kh}]] = \boldsymbol{\gamma}^{\cdot\cdot}. \qquad (23)$$

The corresponding sample estimate of equation (23) is

$$\hat{\boldsymbol{\gamma}}_{NP} = \sum_{h=1}^{H}\sum_{k=1}^{K}\hat{\pi}^{kh}\hat{\boldsymbol{\gamma}}^{kh} = \hat{\boldsymbol{\gamma}}^{\cdot\cdot}. \qquad (24)$$

To approximate T using NPMM parameters, we first define γ^{·h} as a vector of marginal level-2 class coefficients in the NPMM with implied random effects. Here, γ^{·h} = [γ₀^{·h} γ₁^{·h}]. Because the intercept (γ₀^{kh}) is allowed to differ across all kh in equation (5), the marginal level-2 class h intercept, γ₀^{·h}, is obtained as E_{k|h}[γ₀^{kh}] = γ₀^{·h}. Because the slope of x_ij differs only across h in equation (5), the marginal level-2 class h slope γ₁^{·h} is directly estimated.¹¹ We can then take the variance of γ^{·h} across level-2 classes to obtain the implied T, approximated using NPMM parameters:

$$\mathbf{T}_{NP} = \mathrm{var}_h(\boldsymbol{\gamma}^{\cdot h}) = E_h[\boldsymbol{\gamma}^{\cdot h}(\boldsymbol{\gamma}^{\cdot h})'] - E_h[\boldsymbol{\gamma}^{\cdot h}]E_h[\boldsymbol{\gamma}^{\cdot h}]'. \qquad (25)$$

Expanding this expression for our example yields a 2 × 2 implied T consisting of the implied intercept variance (as in equation 20), implied slope variance, and implied intercept-slope covariance:

$$\mathbf{T}_{NP} = \begin{bmatrix} E_h[(\gamma_0^{\cdot h})^2] - E_h[\gamma_0^{\cdot h}]^2 & \\ E_h[\gamma_0^{\cdot h}\gamma_1^{\cdot h}] - E_h[\gamma_0^{\cdot h}]E_h[\gamma_1^{\cdot h}] & E_h[(\gamma_1^{\cdot h})^2] - E_h[\gamma_1^{\cdot h}]^2 \end{bmatrix}. \qquad (26)$$

The matrix expression in equation (25) would apply generally, that is, to any number of level-1 covariates whose effects in γ^{·h} differ across level-2 classes.

¹¹ If the slope of x_ij does vary across kh, it would be computed as γ₁^{·h} = E_{k|h}[γ₁^{kh}], as done later.

[Figure 2 appears here: four kernel density plots of y_ij, panels (a)-(d), each with marginal level-2 class intercepts marked and annotated with the ICC, τ₀₀, and σ² from the fitted null MLM.]

Figure 2. Relationship between NPMM parameters and the ICC from a fitted MLM: generating model = null NPMM (equation 1) and fitted model = null MLM (equation 2). (a) versus (b): Increased marginal level-2 class intercept separation in the NPMM increases the ICC in the fitted MLM. (a) versus (c): Decreased class-combination residual variance in the NPMM increases the ICC in the fitted MLM. (a) versus (d): Decreased class-combination intercept separation within level-2 classes in the NPMM increases the ICC in the fitted MLM. Notes. Plots show kernel densities of data generated from a K = 2 (level-1 classes), H = 2 (level-2 classes) null NPMM and fitted with a random-intercept-only MLM. For illustrations in Figures 2-5, data were generated in SAS 9.4 and MLM estimates were computed using Proc MIXED. Note that the same pattern of results would hold with smaller ICC values; the conditions here were chosen for clarity of graphical illustration.

The corresponding sample estimate of equation (25) is

$$\hat{\mathbf{T}}_{NP} = \widehat{\mathrm{var}}_h(\hat{\boldsymbol{\gamma}}^{\cdot h}) = \sum_{h=1}^{H}\hat{\pi}^h\hat{\boldsymbol{\gamma}}^{\cdot h}(\hat{\boldsymbol{\gamma}}^{\cdot h})' - \left(\sum_{h=1}^{H}\hat{\pi}^h\hat{\boldsymbol{\gamma}}^{\cdot h}\right)\left(\sum_{h=1}^{H}\hat{\pi}^h\hat{\boldsymbol{\gamma}}^{\cdot h}\right)'. \qquad (27)$$

A key result from equations (25) to (27) is that the implied random effect variance from the NPMM will be non-zero only when there is variance in the marginal level-2 class parameters. This is illustrated in Figure 3a. On the other hand, if the marginal level-2 class parameters are equal, the implied random effect variance will be zero. This is illustrated in Figure 3b. In the context of fitting an NPMM, if marginal level-2 class parameters are explicitly constrained equal, this yields an approximation of a fixed slope, as shown in equations (23) and (24). In the discussion in Section 8, we consider the potential for utilizing this knowledge in significance testing of implied random effect variances using NPMM parameters.

4. Relationship of NPMM to MLM with heteroscedasticity of residuals at level 1

Suppose now that a researcher is interested in fitting an NPMM where the effect of a level-1 covariate, x_ij, is allowed to vary by level-1 class within level-2 class, as in equation (7). That is, in the equation (7) NPMM, the slope of x_ij is allowed to vary not only across h (as in equation 5), but also across k within each h. The slope of x_ij in class combination kh is denoted γ₁^{kh}.

From a direct interpretation standpoint, we could consider x_ij slope variation by level-1 class within level-2 class to be substantively meaningful. For instance, within the school-level class where the number of hours spent studying (x_ij) strongly affects maths achievement, the effect of studying may still vary across student-level classes (perhaps because student-level classes differ in ability levels and test anxiety). Alternatively, we could consider an indirect interpretation for this x_ij slope variation within level-2 class. However, no previous sources have addressed precisely what this slope variation in the NPMM would be approximating in the MLM. Here we show that, under an indirect interpretation, slope variation within level-2 class in the NPMM approximates heteroscedastic level-1 residual variance in the MLM, in which MLM residuals (e_ij) depend linearly on x_ij.

To explain this, we start by defining the MLM corresponding to the NPMM in equation (7). This corresponding MLM is provided in equation (8), where e_ij = e₀ij + e₁ij x_ij (e.g., Goldstein, 2011; Snijders & Berkhof, 2007). This non-standard MLM specification is conceptually equivalent to having a random effect of x_ij at level 1, meaning that the effect of x_ij varies across level-1 units (unlike a conventional random effect that varies across clusters). The equation (8) MLM,¹² where residuals depend linearly on x_ij, yields the following quadratic residual variance expression (Goldstein, 2011; Snijders & Bosker, 2012):

¹² Note that residuals can also depend linearly on a covariate in single-level contexts. Thus, the approximation outlined in this section is also relevant for relating single-level regression mixture models to conventional single-level regression models with residual heteroscedasticity.

[Figure 3 appears here: two panels, (a) and (b), plotting predicted ŷ_ij against x_ij with one regression line per cluster.]

Figure 3. Relationship between NPMM parameters and the random slope from a fitted MLM. (a) Implied random slope and (b) implied fixed slope. Notes. Both plots show predicted scores (ŷ_ij | x_ij, u_j) in a fitted MLM. Each line represents the regression line for a single cluster. Data were generated from a K = 4, H = 4 NPMM with a single cluster-mean-centred level-1 covariate. Analyses in (a) are of data generated from an NPMM with varying marginal level-2 class slopes of x_ij, whereas analyses in (b) are of data generated from an NPMM with constant marginal level-2 class slopes of x_ij. As such, the data for (a) are consistent with a random slope in a fitted MLM and the data for (b) are consistent with a fixed slope in a fitted MLM.

$$\sigma^2_{ij} = \mathrm{var}_{ij}(e_{0ij} + e_{1ij}x_{ij}) = \sigma^2_0 + x_{ij}^2\sigma^2_1 + 2x_{ij}\sigma_{0,1}. \qquad (28)$$

Variances σ₀² and σ₁² and covariance σ₀,₁ are defined in equation (8). For instance, this MLM allowing for level-1 heteroscedasticity can be useful when within-cluster variability in maths achievement is expected to be smaller for students who spend moderate hours studying, but larger for students who spend many or few hours studying. Within a cluster, achievement can be highly variable for students who study intensively (some perform well whereas others perform poorly due to test anxiety) or study infrequently (some perform poorly whereas others perform well due to high ability). Taken together, equation (28) clarifies that the residual variance in the equation (8) MLM is heteroscedastic: σ²_ij is specific to the x_ij score for observation i in cluster j.

To further clarify how the equation (7) NPMM indirectly approximates the MLM's level-1 heteroscedasticity in equation (28), we compute the corresponding quadratic variance expression with NPMM parameters. Specifically, we take the variance of y_ij|c_ij=k, d_j=h across level-1 classes within level-2 class, conditional on x_ij, and pool across level-2 classes:

$$\sigma^2_{ij(NP)} = E_h[\mathrm{var}_{k|h}(\gamma_0^{kh} + \gamma_1^{kh}x_{ij} + e_{ij})] = E_h[E_{k|h}[(\gamma_0^{kh} - \gamma_0^{\cdot h})^2 + x_{ij}^2(\gamma_1^{kh} - \gamma_1^{\cdot h})^2 + 2x_{ij}(\gamma_0^{kh} - \gamma_0^{\cdot h})(\gamma_1^{kh} - \gamma_1^{\cdot h}) + \theta^{kh}]]. \qquad (29)$$

The sample estimate of equation (29) is

$$\hat{\sigma}^2_{ij(NP)} = \sum_{h=1}^{H}\sum_{k=1}^{K}\hat{\pi}^{kh}\left((\hat{\gamma}_0^{kh} - \hat{\gamma}_0^{\cdot h})^2 + x_{ij}^2(\hat{\gamma}_1^{kh} - \hat{\gamma}_1^{\cdot h})^2 + 2x_{ij}(\hat{\gamma}_0^{kh} - \hat{\gamma}_0^{\cdot h})(\hat{\gamma}_1^{kh} - \hat{\gamma}_1^{\cdot h}) + \hat{\theta}^{kh}\right). \qquad (30)$$

Comparing equation (29) to (28), the expectation of (γ₀^{kh} − γ₀^{·h})² + θ^{kh} approximates σ₀², the expectation of (γ₁^{kh} − γ₁^{·h})² approximates σ₁², and the expectation of (γ₀^{kh} − γ₀^{·h})(γ₁^{kh} − γ₁^{·h}) approximates σ₀,₁.

More generally, with q covariates affecting level-1 residuals, the MLM level-1 residual variance in equation (28) is expressed as

$$\sigma^2_{ij} = \mathbf{v}_{ij}'\boldsymbol{\Sigma}\mathbf{v}_{ij}, \qquad (31)$$

where the vector v_ij contains 1 and q covariates affecting level-1 residuals and Σ is a (q + 1) × (q + 1) covariance matrix of level-1 random effects used in modelling σ²_ij. The corresponding expression implied by the NPMM generalizes to

$$\sigma^2_{ij(NP)} = E_h[\mathrm{var}_{k|h}(\mathbf{v}_{ij}'\boldsymbol{\upsilon}^{kh} + e_{ij})] = \mathbf{v}_{ij}'\,E_h[E_{k|h}[(\boldsymbol{\upsilon}^{kh} - \boldsymbol{\upsilon}^{\cdot h})(\boldsymbol{\upsilon}^{kh} - \boldsymbol{\upsilon}^{\cdot h})' + \mathbf{a}\theta^{kh}\mathbf{a}']]\,\mathbf{v}_{ij} = \mathbf{v}_{ij}'\boldsymbol{\Sigma}_{NP}\mathbf{v}_{ij}, \qquad (32)$$

where a = v_ij(v_ij′v_ij)⁻¹ and υ^{kh} is the (q + 1) × 1 subvector of γ^{kh} containing coefficients for v_ij. In the sample, this yields

$$\hat{\sigma}^2_{ij(NP)} = \mathbf{v}_{ij}'\left(\sum_{h=1}^{H}\sum_{k=1}^{K}\hat{\pi}^{kh}\left((\hat{\boldsymbol{\upsilon}}^{kh} - \hat{\boldsymbol{\upsilon}}^{\cdot h})(\hat{\boldsymbol{\upsilon}}^{kh} - \hat{\boldsymbol{\upsilon}}^{\cdot h})' + \mathbf{a}\hat{\theta}^{kh}\mathbf{a}'\right)\right)\mathbf{v}_{ij} = \mathbf{v}_{ij}'\hat{\boldsymbol{\Sigma}}_{NP}\mathbf{v}_{ij}. \qquad (33)$$

From equations (29) and (30), a key result is evident: if there is no slope variation across level-1 class within level-2 class (e.g., γ₁^{kh} = γ₁^{·h} for all h, as in equation 5), the linear and quadratic terms drop out. All that remains is a constant, implied homoscedastic residual variance, given by equation (19). This point is illustrated in Figure 4b, in which, under these conditions, the variance of residuals ê_ij from a fitted MLM is shown to be homoscedastic (constant across x_ij). Figure 4a depicts the alternative situation: when level-1 class slopes vary within level-2 class (γ₁^{kh} ≠ γ₁^{·h} for some h and k), the variance of residuals ê_ij from a fitted MLM is heteroscedastic (greater at the extremes of x_ij).

In sum, in the NPMM, slope differences by level-1 class within level-2 class afford not only a direct interpretation, but also an indirect interpretation in terms of accommodating level-1 heteroscedasticity in the MLM. Given that the latter MLM specification is underused, unfamiliar (Goldstein, 2011; Snijders & Bosker, 2012) and, moreover, unavailable in some MLM software, an indirect interpretation of the NPMM can be used to highlight or investigate level-1 heteroscedasticity.
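As a companion to equation (30), the following minimal R sketch (hypothetical estimates; not the authors' Appendix S1 function) evaluates the implied heteroscedastic level-1 residual variance at several values of x_ij for a K = 2, H = 2 NPMM with one level-1 covariate.

```r
pi_kh  <- matrix(c(.30, .20, .25, .25), 2, 2)    # joint class-combination probabilities, [k, h]
gamma0 <- matrix(c(1.0, 2.0, 3.5, 5.0), 2, 2)    # intercepts gamma0^kh
gamma1 <- matrix(c(0.2, 0.9, 0.4, 1.1), 2, 2)    # slopes gamma1^kh
theta  <- matrix(0.8, 2, 2)                      # residual variances theta^kh

pi_h   <- colSums(pi_kh)                         # marginal level-2 class probabilities
pi_k_h <- sweep(pi_kh, 2, pi_h, `/`)             # conditional pi(k | h)
g0_h   <- colSums(pi_k_h * gamma0)               # marginal level-2 class intercepts
g1_h   <- colSums(pi_k_h * gamma1)               # marginal level-2 class slopes

implied_sigma2 <- function(x) {                  # equation (30), pooled over k and h
  sum(pi_kh * (sweep(gamma0, 2, g0_h)^2 +
               x^2 * sweep(gamma1, 2, g1_h)^2 +
               2 * x * sweep(gamma0, 2, g0_h) * sweep(gamma1, 2, g1_h) +
               theta))
}
sapply(c(-2, 0, 2), implied_sigma2)              # larger at extreme x when slopes vary within h
```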

[Figure 4 appears here: two panels, (a) and (b), plotting estimated level-1 residuals ê_ij against x_ij.]

Figure 4. Relationship between NPMM parameters and heteroscedasticity of residuals at level 1 in the MLM. (a) Level-1 heteroscedasticity in fitted MLM and (b) level-1 homoscedasticity in fitted MLM. Notes. Both plots show estimated level-1 residuals from a fitted MLM, ê_ij, plotted by x_ij. Data were generated from a K = 4, H = 4 NPMM with a single group-mean-centred level-1 covariate. Analyses in (a) are of data generated from an NPMM with varying slopes across level-1 class within level-2 class, while analyses in (b) are of data generated from an NPMM with constant slopes across level-1 class within level-2 class. As such, the data for (a) are consistent with level-1 heteroscedasticity in a fitted MLM and the data for (b) are consistent with level-1 homoscedasticity in a fitted MLM.

5. Relationship of NPMM to MLM with heteroscedasticity of residuals at level 2

Now suppose a researcher is interested in fitting a more complex NPMM that also includes a level-2 covariate, w_j, such as student-teacher ratio. If we fit the NPMM in equation (9), where the slope of w_j (γ₂) does not vary across level-2 or level-1 classes, the corresponding MLM is defined as having a fixed slope of w_j, as shown in equation (10). Suppose instead we fit the NPMM in equation (11), where the slope of w_j (i.e., γ₂^h) varies across level-2 classes.

From a direct interpretation standpoint, one could consider variance in the marginal level-2 class slope of w_j between level-2 classes in equation (11) to represent substantively meaningful slope differences between cluster-level subpopulations.

For example, perhaps the effect of student-teacher ratio (w_j) on maths achievement is stronger for some school-level classes (e.g., those with more focus on class participation) compared to other school-level classes. If we instead desired an indirect interpretation of equation (11), previous sources have not addressed what the w_j slope variation across level-2 classes in the NPMM would be approximating in the MLM. Under an indirect interpretation, w_j slope variation across level-2 classes in the NPMM approximates heteroscedastic level-2 residual variance in the MLM, where MLM intercept residuals, u_j, depend linearly on w_j.

To illustrate this, we first define the MLM corresponding to the NPMM in equation (11). This non-standard MLM is provided in equation (12), where u_j = u₀j + u₃j w_j (see Goldstein, 2011; Snijders & Berkhof, 2007). Modelling level-2 heteroscedasticity in this way in the MLM essentially adds a random effect, u₃j, for a level-2 covariate at level 2, only u₃j is not interpreted as a conventional random effect, but as a means to model heteroscedasticity. The equation (12) MLM yields the following quadratic variance expression for intercept residuals (e.g., Goldstein, 2011; Snijders & Bosker, 2012):

$$\tau^2_j = \mathrm{var}(u_{0j} + u_{3j}w_j) = \tau_{00} + w_j^2\tau_{33} + 2w_j\tau_{03}. \qquad (34)$$

The variances τ₀₀ and τ₃₃ and covariance τ₀₃ were defined in equation (12). Such an expression can be useful when, for example, larger intercept residual variance is anticipated for more extreme student-teacher ratios. This heteroscedastic intercept variance, τ²_j, is specific to cluster j because it depends on the value of w_j.

To further illustrate this indirect interpretation of the equation (11) NPMM, we compute the quadratic variance expression corresponding to equation (34) that is implied by NPMM parameters, as shown in the following equation, which computes the variance of the sum of the marginal level-2 class intercept and slope of w_j, conditional on w_j:

$$\tau^2_{j(NP)} = \mathrm{var}_h(\gamma_0^{\cdot h} + \gamma_2^{\cdot h}w_j) = E_h[(\gamma_0^{\cdot h} - \gamma_0^{\cdot\cdot})^2 + w_j^2(\gamma_2^{\cdot h} - \gamma_2^{\cdot\cdot})^2 + 2w_j(\gamma_0^{\cdot h} - \gamma_0^{\cdot\cdot})(\gamma_2^{\cdot h} - \gamma_2^{\cdot\cdot})]. \qquad (35)$$

Comparing equation (35) to (34), the expectation of (γ₀^{·h} − γ₀^{··})² approximates the variance τ₀₀, the expectation of (γ₂^{·h} − γ₂^{··})² approximates the variance τ₃₃, and the expectation of (γ₀^{·h} − γ₀^{··})(γ₂^{·h} − γ₂^{··}) approximates the covariance τ₀₃. The sample counterpart to equation (35) is:

$$\hat{\tau}^2_{j(NP)} = \sum_{h=1}^{H}\hat{\pi}^h\left((\hat{\gamma}_0^{\cdot h} - \hat{\gamma}_0^{\cdot\cdot})^2 + w_j^2(\hat{\gamma}_2^{\cdot h} - \hat{\gamma}_2^{\cdot\cdot})^2 + 2w_j(\hat{\gamma}_0^{\cdot h} - \hat{\gamma}_0^{\cdot\cdot})(\hat{\gamma}_2^{\cdot h} - \hat{\gamma}_2^{\cdot\cdot})\right). \qquad (36)$$

We now consider a more general expression with p level-2 covariates affecting intercept residuals. The equation (34) MLM expression generalizes to

$$\tau^2_j = \mathbf{w}_j'\boldsymbol{\Phi}\mathbf{w}_j, \qquad (37)$$

where the vector w_j contains 1 and p level-2 covariates affecting intercept residuals and Φ is a (p + 1) × (p + 1) covariance matrix, a submatrix of T, used in modelling the heteroscedastic intercept variance, τ²_j.

The corresponding expression implied by the NPMM generalizes to

$$\tau^2_{j(NP)} = \mathrm{var}_h(\mathbf{w}_j'\boldsymbol{\alpha}^{\cdot h}) = \mathbf{w}_j'\,E_h[(\boldsymbol{\alpha}^{\cdot h} - \boldsymbol{\alpha}^{\cdot\cdot})(\boldsymbol{\alpha}^{\cdot h} - \boldsymbol{\alpha}^{\cdot\cdot})']\,\mathbf{w}_j = \mathbf{w}_j'\boldsymbol{\Phi}_{NP}\mathbf{w}_j, \qquad (38)$$

where α^{·h} is a (p + 1) × 1 subvector of γ^{·h} containing marginal level-2 coefficients for w_j. In the sample this yields

$$\hat{\tau}^2_{j(NP)} = \mathbf{w}_j'\left(\sum_{h=1}^{H}\hat{\pi}^h(\hat{\boldsymbol{\alpha}}^{\cdot h} - \hat{\boldsymbol{\alpha}}^{\cdot\cdot})(\hat{\boldsymbol{\alpha}}^{\cdot h} - \hat{\boldsymbol{\alpha}}^{\cdot\cdot})'\right)\mathbf{w}_j = \mathbf{w}_j'\hat{\boldsymbol{\Phi}}_{NP}\mathbf{w}_j. \qquad (39)$$

From equations (35) and (36) a key result is evident: if there is no marginal level-2 class slope variation for level-2 covariates in the NPMM (e.g., γ₂^{·h} = γ₂^{··} for all h, as in equation 9), the linear and quadratic terms drop out. This leaves the implied homoscedastic intercept variance expression given by equation (20). This point is illustrated in Figure 5b, in which, under these conditions, the variance of level-2 intercept residuals in a fitted MLM, û_j, remains constant across w_j (i.e., homoscedastic). Figure 5a illustrates the opposite situation: provided γ₂^{·h} ≠ γ₂^{··} for some h, there is implied heteroscedasticity of level-2 intercept residuals in a fitted MLM. Thus, an indirect interpretation of the NPMM offers the ability to explore and accommodate MLM level-2 intercept heteroscedasticity, another underused and unfamiliar MLM specification (Snijders & Bosker, 2012).

6. Relationship of NPMM to general matrix formulation of MLM

In previous sections, we demonstrated specific relationships between particular NPMM and MLM specifications in isolation. In this section, we integrate these results to (a) describe a general NPMM expression, (b) define its counterpart MLM, and (c) summarize how this general NPMM indirectly accommodates any number of covariates and any combination of implied fixed or random effects and implied homoscedasticity or heteroscedasticity of level-1 and/or level-2 residuals in the MLM. The general NPMM specification is given in equation (15) in Table 2. Here, intercepts and both level-1 and level-2 covariate slopes are allowed to freely vary across all kh class combinations. This general NPMM in equation (15) corresponds with the MLM specification in equation (16) when v_ij contains all covariates (i.e., v_ij = z_ij) and w_j contains all level-2 covariates. Specifically, the general equation (15) NPMM indirectly approximates the following in the corresponding MLM: a random intercept, random slopes for all level-1 covariates (because their γ^{·h} vary across h), level-1 heteroscedasticity for all covariates (because their γ^{kh} vary across k within h), and level-2 heteroscedasticity for all level-2 covariates (because their γ^{·h} vary across h). To summarize how the MLM parameters in equation (16) are indirectly approximated with general NPMM parameters: γ is indirectly approximated by equation (24), T by equation (27), Σ by equation (33), and Φ by equation (39). An R function for computing elements of each matrix is provided in our Appendix S1 to aid in indirect interpretation of the NPMM.
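The general computation can be sketched compactly. The function below is a minimal illustration of equations (24) and (27) only (it is not the Appendix S1 function): Gamma is assumed to be a matrix with one row per kh class combination holding the estimated coefficient vector γ^{kh}, with matching vectors pi_kh and h_index giving each row's joint probability and level-2 class.

```r
implied_gamma_T <- function(Gamma, pi_kh, h_index) {
  hs   <- sort(unique(h_index))
  pi_h <- tapply(pi_kh, h_index, sum)[as.character(hs)]   # marginal level-2 class probabilities
  # Marginal level-2 class coefficient vectors gamma^(.h): weighted means over k within h
  gamma_dot_h <- t(sapply(hs, function(h) {
    w <- pi_kh[h_index == h] / pi_h[as.character(h)]
    colSums(w * Gamma[h_index == h, , drop = FALSE])
  }))
  gamma_np <- colSums(pi_h * gamma_dot_h)                  # equation (24): implied fixed effects
  centred  <- sweep(gamma_dot_h, 2, gamma_np)
  T_np     <- t(centred) %*% (pi_h * centred)              # equation (27): implied T
  list(gamma_np = gamma_np, T_np = T_np)
}
```

For instance, for the equation (5) NPMM, Gamma would have two columns (the intercept and the x_ij slope), and the diagonal of T_np would then give the implied intercept and slope variances of equation (26).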

[Figure 5 appears here: two panels, (a) and (b), plotting estimated level-2 intercept residuals û_j against w_j.]

Figure 5. Relationship between NPMM parameters and heteroscedasticity of intercept residuals at level 2 in the MLM. (a) Level-2 heteroscedasticity in fitted MLM and (b) level-2 homoscedasticity in fitted MLM. Notes. Both plots show estimated level-2 intercept residuals in a fitted MLM, û_j, plotted by the level-2 covariate w_j. Data were generated from an NPMM. Analyses in (a) are of data generated from an NPMM with varying marginal level-2 class slopes, while analyses in (b) are of data generated from an NPMM with constant marginal level-2 class slopes. As such, the data for (a) are consistent with level-2 heteroscedasticity in a fitted MLM and the data for (b) are consistent with level-2 homoscedasticity in a fitted MLM. *Generating models included a single level-2 covariate, w_j, which is slightly different from equations (9) and (11), which also include x_ij.

7. Empirical example

We now present an empirical application implementing the indirect interpretation of the NPMM and, further, demonstrate how this interpretation can provide additional utility beyond simply implementing either an MLM or a direct interpretation of the NPMM. Using a data set of students nested within schools from the Trends in International Mathematics and Science Study (TIMSS; Mullis, Martin, Gonzalez, & Chrostowski, 2004), we investigate effects of mathematics confidence (MATHCONF, cluster-mean-centred level-1 x_ij) and percentage of students receiving free/reduced lunch (PERLUNCH, level-2 w_j) on mathematics exam scores involving data/probability (MATH) for eighth graders. Confidence should be positively associated with maths achievement (Pajares & Miller, 1994), as it is thought to enable students to better demonstrate their knowledge. School-level socioeconomic deprivation should be negatively associated with maths achievement (Bradley & Corwyn, 2002), as it is a proxy for disadvantages that impede learning (for further discussion, see Lubinski, 2009). Level-1 residual heteroscedasticity by confidence could arise if students with high or low confidence levels have more performance variability


More information

Determining the number of components in mixture models for hierarchical data

Determining the number of components in mixture models for hierarchical data Determining the number of components in mixture models for hierarchical data Olga Lukočienė 1 and Jeroen K. Vermunt 2 1 Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, 5000

More information

Determining Sample Sizes for Surveys with Data Analyzed by Hierarchical Linear Models

Determining Sample Sizes for Surveys with Data Analyzed by Hierarchical Linear Models Journal of Of cial Statistics, Vol. 14, No. 3, 1998, pp. 267±275 Determining Sample Sizes for Surveys with Data Analyzed by Hierarchical Linear Models Michael P. ohen 1 Behavioral and social data commonly

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Three-level meta-analysis of dependent effect sizes

Three-level meta-analysis of dependent effect sizes Behav Res (213) 45:576 594 DOI 1.3758/s13428-12-261-6 Three-level meta-analysis of dependent effect sizes Wim Van den Noortgate & José Antonio López-López & Fulgencio Marín-Martínez & Julio Sánchez-Meca

More information

Abstract Title Page. Title: Degenerate Power in Multilevel Mediation: The Non-monotonic Relationship Between Power & Effect Size

Abstract Title Page. Title: Degenerate Power in Multilevel Mediation: The Non-monotonic Relationship Between Power & Effect Size Abstract Title Page Title: Degenerate Power in Multilevel Mediation: The Non-monotonic Relationship Between Power & Effect Size Authors and Affiliations: Ben Kelcey University of Cincinnati SREE Spring

More information

Multilevel Analysis of Grouped and Longitudinal Data

Multilevel Analysis of Grouped and Longitudinal Data Multilevel Analysis of Grouped and Longitudinal Data Joop J. Hox Utrecht University Second draft, to appear in: T.D. Little, K.U. Schnabel, & J. Baumert (Eds.). Modeling longitudinal and multiple-group

More information

Growth models for categorical response variables: standard, latent-class, and hybrid approaches

Growth models for categorical response variables: standard, latent-class, and hybrid approaches Growth models for categorical response variables: standard, latent-class, and hybrid approaches Jeroen K. Vermunt Department of Methodology and Statistics, Tilburg University 1 Introduction There are three

More information

Two-Level Models for Clustered* Data

Two-Level Models for Clustered* Data Two-Level Models for Clustered* Data Today s Class: Fixed vs. Random Effects for Modeling Clustered Data ICC and Design Effects in Clustered Data Group-Mean-Centering vs. Grand-Mean Centering Model Extensions

More information

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

More information

An Efficient State Space Approach to Estimate Univariate and Multivariate Multilevel Regression Models

An Efficient State Space Approach to Estimate Univariate and Multivariate Multilevel Regression Models An Efficient State Space Approach to Estimate Univariate and Multivariate Multilevel Regression Models Fei Gu Kristopher J. Preacher Wei Wu 05/21/2013 Overview Introduction: estimate MLM as SEM (Bauer,

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

This is the publisher s copyrighted version of this article.

This is the publisher s copyrighted version of this article. Archived at the Flinders Academic Commons http://dspace.flinders.edu.au/dspace/ This is the publisher s copyrighted version of this article. The original can be found at: http://iej.cjb.net International

More information

Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars)

Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars) STAT:5201 Applied Statistic II Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars) School math achievement scores The data file consists of 7185 students

More information

Introduction to. Multilevel Analysis

Introduction to. Multilevel Analysis Introduction to Multilevel Analysis Tom Snijders University of Oxford University of Groningen December 2009 Tom AB Snijders Introduction to Multilevel Analysis 1 Multilevel Analysis based on the Hierarchical

More information

Random Intercept Models

Random Intercept Models Random Intercept Models Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline A very simple case of a random intercept

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Modeling Heterogeneity in Indirect Effects: Multilevel Structural Equation Modeling Strategies. Emily Fall

Modeling Heterogeneity in Indirect Effects: Multilevel Structural Equation Modeling Strategies. Emily Fall Modeling Heterogeneity in Indirect Effects: Multilevel Structural Equation Modeling Strategies By Emily Fall Submitted to the Psychology and the Faculty of the Graduate School of the University of Kansas

More information

Logistic And Probit Regression

Logistic And Probit Regression Further Readings On Multilevel Regression Analysis Ludtke Marsh, Robitzsch, Trautwein, Asparouhov, Muthen (27). Analysis of group level effects using multilevel modeling: Probing a latent covariate approach.

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective Second Edition Scott E. Maxwell Uniuersity of Notre Dame Harold D. Delaney Uniuersity of New Mexico J,t{,.?; LAWRENCE ERLBAUM ASSOCIATES,

More information

The Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs

The Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs The Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs Chad S. Briggs, Kathie Lorentz & Eric Davis Education & Outreach University Housing Southern Illinois

More information

Describing Change over Time: Adding Linear Trends

Describing Change over Time: Adding Linear Trends Describing Change over Time: Adding Linear Trends Longitudinal Data Analysis Workshop Section 7 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section

More information

Centering Predictor and Mediator Variables in Multilevel and Time-Series Models

Centering Predictor and Mediator Variables in Multilevel and Time-Series Models Centering Predictor and Mediator Variables in Multilevel and Time-Series Models Tihomir Asparouhov and Bengt Muthén Part 2 May 7, 2018 Tihomir Asparouhov and Bengt Muthén Part 2 Muthén & Muthén 1/ 42 Overview

More information

Longitudinal and Panel Data: Analysis and Applications for the Social Sciences. Table of Contents

Longitudinal and Panel Data: Analysis and Applications for the Social Sciences. Table of Contents Longitudinal and Panel Data Preface / i Longitudinal and Panel Data: Analysis and Applications for the Social Sciences Table of Contents August, 2003 Table of Contents Preface i vi 1. Introduction 1.1

More information

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University Growth Mixture Modeling and Causal Inference Booil Jo Stanford University booil@stanford.edu Conference on Advances in Longitudinal Methods inthe Socialand and Behavioral Sciences June 17 18, 2010 Center

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

ROBUSTNESS OF MULTILEVEL PARAMETER ESTIMATES AGAINST SMALL SAMPLE SIZES

ROBUSTNESS OF MULTILEVEL PARAMETER ESTIMATES AGAINST SMALL SAMPLE SIZES ROBUSTNESS OF MULTILEVEL PARAMETER ESTIMATES AGAINST SMALL SAMPLE SIZES Cora J.M. Maas 1 Utrecht University, The Netherlands Joop J. Hox Utrecht University, The Netherlands In social sciences, research

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

Diagnostic Procedures for Detecting Nonlinear Relationships Between Latent Variables

Diagnostic Procedures for Detecting Nonlinear Relationships Between Latent Variables Structural Equation Modeling, 19:157 177, 2012 Copyright Taylor & Francis Group, LLC ISSN: 1070-5511 print/1532-8007 online DOI: 10.1080/10705511.2012.659612 Diagnostic Procedures for Detecting Nonlinear

More information

Testing and Interpreting Interaction Effects in Multilevel Models

Testing and Interpreting Interaction Effects in Multilevel Models Testing and Interpreting Interaction Effects in Multilevel Models Joseph J. Stevens University of Oregon and Ann C. Schulte Arizona State University Presented at the annual AERA conference, Washington,

More information

SRMR in Mplus. Tihomir Asparouhov and Bengt Muthén. May 2, 2018

SRMR in Mplus. Tihomir Asparouhov and Bengt Muthén. May 2, 2018 SRMR in Mplus Tihomir Asparouhov and Bengt Muthén May 2, 2018 1 Introduction In this note we describe the Mplus implementation of the SRMR standardized root mean squared residual) fit index for the models

More information

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 7: Cluster Sampling Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of roups and

More information

IMPACT OF NOT FULLY ADDRESSING CROSS-CLASSIFIED MULTILEVEL STRUCTURE IN TESTING MEASUREMENT INVARIANCE AND CONDUCTING MULTILEVEL MIXTURE MODELING

IMPACT OF NOT FULLY ADDRESSING CROSS-CLASSIFIED MULTILEVEL STRUCTURE IN TESTING MEASUREMENT INVARIANCE AND CONDUCTING MULTILEVEL MIXTURE MODELING IMPACT OF NOT FULLY ADDRESSING CROSS-CLASSIFIED MULTILEVEL STRUCTURE IN TESTING MEASUREMENT INVARIANCE AND CONDUCTING MULTILEVEL MIXTURE MODELING WITHIN STRUCTURAL EQUATION MODELING FRAMEWORK A Dissertation

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs) 36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)

More information

Measurement Invariance (MI) in CFA and Differential Item Functioning (DIF) in IRT/IFA

Measurement Invariance (MI) in CFA and Differential Item Functioning (DIF) in IRT/IFA Topics: Measurement Invariance (MI) in CFA and Differential Item Functioning (DIF) in IRT/IFA What are MI and DIF? Testing measurement invariance in CFA Testing differential item functioning in IRT/IFA

More information

Review of Multiple Regression

Review of Multiple Regression Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate

More information

Multi-level Models: Idea

Multi-level Models: Idea Review of 140.656 Review Introduction to multi-level models The two-stage normal-normal model Two-stage linear models with random effects Three-stage linear models Two-stage logistic regression with random

More information

Outline

Outline 2559 Outline cvonck@111zeelandnet.nl 1. Review of analysis of variance (ANOVA), simple regression analysis (SRA), and path analysis (PA) 1.1 Similarities and differences between MRA with dummy variables

More information

Multilevel Structural Equation Modeling

Multilevel Structural Equation Modeling Multilevel Structural Equation Modeling Joop Hox Utrecht University j.hox@uu.nl http://www.joophox.net 14_15_mlevsem Multilevel Regression Three level data structure Groups at different levels may have

More information

ABSTRACT. Futoshi Yumoto, Doctor of Philosophy Fairness is necessary to successful evaluation, whether the context is simple and

ABSTRACT. Futoshi Yumoto, Doctor of Philosophy Fairness is necessary to successful evaluation, whether the context is simple and ABSTRACT Title of dissertation: EFFECTS OF UNMODELED LATENT CLASSES ON MULTILEVEL GROWTH MIXTURE ESTIMATION IN VALUE-ADDED MODELING Futoshi Yumoto, Doctor of Philosophy 2011 Dissertation directed by: Professor

More information

INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA. Belfast 9 th June to 10 th June, 2011

INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA. Belfast 9 th June to 10 th June, 2011 INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA Belfast 9 th June to 10 th June, 2011 Dr James J Brown Southampton Statistical Sciences Research Institute (UoS) ADMIN Research Centre (IoE

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

From Micro to Macro: Multilevel modelling with group-level outcomes

From Micro to Macro: Multilevel modelling with group-level outcomes From Micro to Macro: Multilevel modelling with group-level outcomes by C.A. (Marloes) Onrust s274 A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science Methodology

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Time-Invariant Predictors in Longitudinal Models

Time-Invariant Predictors in Longitudinal Models Time-Invariant Predictors in Longitudinal Models Today s Class (or 3): Summary of steps in building unconditional models for time What happens to missing predictors Effects of time-invariant predictors

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017 MLMED User Guide Nicholas J. Rockwood The Ohio State University rockwood.19@osu.edu Beta Version May, 2017 MLmed is a computational macro for SPSS that simplifies the fitting of multilevel mediation and

More information

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS NOTES FROM PRE- LECTURE RECORDING ON PCA PCA and EFA have similar goals. They are substantially different in important ways. The goal

More information

SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM)

SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM) SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM) SEM is a family of statistical techniques which builds upon multiple regression,

More information

1 Introduction. 2 Example

1 Introduction. 2 Example Statistics: Multilevel modelling Richard Buxton. 2008. Introduction Multilevel modelling is an approach that can be used to handle clustered or grouped data. Suppose we are trying to discover some of the

More information

How to run the RI CLPM with Mplus By Ellen Hamaker March 21, 2018

How to run the RI CLPM with Mplus By Ellen Hamaker March 21, 2018 How to run the RI CLPM with Mplus By Ellen Hamaker March 21, 2018 The random intercept cross lagged panel model (RI CLPM) as proposed by Hamaker, Kuiper and Grasman (2015, Psychological Methods) is a model

More information

A NEW MODEL FOR THE FUSION OF MAXDIFF SCALING

A NEW MODEL FOR THE FUSION OF MAXDIFF SCALING A NEW MODEL FOR THE FUSION OF MAXDIFF SCALING AND RATINGS DATA JAY MAGIDSON 1 STATISTICAL INNOVATIONS INC. DAVE THOMAS SYNOVATE JEROEN K. VERMUNT TILBURG UNIVERSITY ABSTRACT A property of MaxDiff (Maximum

More information

Bayesian Mixture Modeling

Bayesian Mixture Modeling University of California, Merced July 21, 2014 Mplus Users Meeting, Utrecht Organization of the Talk Organization s modeling estimation framework Motivating examples duce the basic LCA model Illustrated

More information

Ronald Heck Week 14 1 EDEP 768E: Seminar in Categorical Data Modeling (F2012) Nov. 17, 2012

Ronald Heck Week 14 1 EDEP 768E: Seminar in Categorical Data Modeling (F2012) Nov. 17, 2012 Ronald Heck Week 14 1 From Single Level to Multilevel Categorical Models This week we develop a two-level model to examine the event probability for an ordinal response variable with three categories (persist

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

More information

Multilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java)

Multilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java) Multilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java) Pepi Zulvia, Anang Kurnia, and Agus M. Soleh Citation: AIP Conference

More information

Regression in R. Seth Margolis GradQuant May 31,

Regression in R. Seth Margolis GradQuant May 31, Regression in R Seth Margolis GradQuant May 31, 2018 1 GPA What is Regression Good For? Assessing relationships between variables This probably covers most of what you do 4 3.8 3.6 3.4 Person Intelligence

More information

Spring RMC Professional Development Series January 14, Generalized Linear Mixed Models (GLMMs): Concepts and some Demonstrations

Spring RMC Professional Development Series January 14, Generalized Linear Mixed Models (GLMMs): Concepts and some Demonstrations Spring RMC Professional Development Series January 14, 2016 Generalized Linear Mixed Models (GLMMs): Concepts and some Demonstrations Ann A. O Connell, Ed.D. Professor, Educational Studies (QREM) Director,

More information

THE EFFECTS OF MULTICOLLINEARITY IN MULTILEVEL MODELS

THE EFFECTS OF MULTICOLLINEARITY IN MULTILEVEL MODELS THE EFFECTS OF MULTICOLLINEARITY IN MULTILEVEL MODELS A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy By PATRICK C. CLARK B.A., Indiana University

More information

Notation Survey. Paul E. Johnson 1 2 MLM 1 / Department of Political Science

Notation Survey. Paul E. Johnson 1 2 MLM 1 / Department of Political Science MLM 1 / 52 Notation Survey Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas 2015 MLM 2 / 52 Outline 1 Raudenbush and Bryk 2 Snijders

More information

FIT CRITERIA PERFORMANCE AND PARAMETER ESTIMATE BIAS IN LATENT GROWTH MODELS WITH SMALL SAMPLES

FIT CRITERIA PERFORMANCE AND PARAMETER ESTIMATE BIAS IN LATENT GROWTH MODELS WITH SMALL SAMPLES FIT CRITERIA PERFORMANCE AND PARAMETER ESTIMATE BIAS IN LATENT GROWTH MODELS WITH SMALL SAMPLES Daniel M. McNeish Measurement, Statistics, and Evaluation University of Maryland, College Park Background

More information

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts A Study of Statistical Power and Type I Errors in Testing a Factor Analytic Model for Group Differences in Regression Intercepts by Margarita Olivera Aguilar A Thesis Presented in Partial Fulfillment of

More information

Multilevel Analysis of Assessment Data

Multilevel Analysis of Assessment Data 8 Multilevel Analysis of Assessment Data Jee-Seon Kim University of Wisconsin-Madison Carolyn J. Anderson University of Illinois Bryan Keller Teachers College, Columbia University Contents Introduction...390

More information

Technical and Practical Considerations in applying Value Added Models to estimate teacher effects

Technical and Practical Considerations in applying Value Added Models to estimate teacher effects Technical and Practical Considerations in applying Value Added Models to estimate teacher effects Pete Goldschmidt, Ph.D. Educational Psychology and Counseling, Development Learning and Instruction Purpose

More information

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis Path Analysis PRE 906: Structural Equation Modeling Lecture #5 February 18, 2015 PRE 906, SEM: Lecture 5 - Path Analysis Key Questions for Today s Lecture What distinguishes path models from multivariate

More information

Testing Indirect Effects for Lower Level Mediation Models in SAS PROC MIXED

Testing Indirect Effects for Lower Level Mediation Models in SAS PROC MIXED Testing Indirect Effects for Lower Level Mediation Models in SAS PROC MIXED Here we provide syntax for fitting the lower-level mediation model using the MIXED procedure in SAS as well as a sas macro, IndTest.sas

More information

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12) National Advanced Placement (AP) Statistics Course Outline (Grades 9-12) Following is an outline of the major topics covered by the AP Statistics Examination. The ordering here is intended to define the

More information

Latent Variable Centering of Predictors and Mediators in Multilevel and Time-Series Models

Latent Variable Centering of Predictors and Mediators in Multilevel and Time-Series Models Latent Variable Centering of Predictors and Mediators in Multilevel and Time-Series Models Tihomir Asparouhov and Bengt Muthén August 5, 2018 Abstract We discuss different methods for centering a predictor

More information

Confidence Intervals for a Semiparametric Approach to Modeling Nonlinear Relations among Latent Variables

Confidence Intervals for a Semiparametric Approach to Modeling Nonlinear Relations among Latent Variables Structural Equation Modeling, 18:537 553, 2011 Copyright Taylor & Francis Group, LLC ISSN: 1070-5511 print/1532-8007 online DOI: 10.1080/10705511.2011.607072 Confidence Intervals for a Semiparametric Approach

More information

Online Appendices for: Modeling Latent Growth With Multiple Indicators: A Comparison of Three Approaches

Online Appendices for: Modeling Latent Growth With Multiple Indicators: A Comparison of Three Approaches Online Appendices for: Modeling Latent Growth With Multiple Indicators: A Comparison of Three Approaches Jacob Bishop and Christian Geiser Utah State University David A. Cole Vanderbilt University Contents

More information