Growth mixture modeling: Analysis with non-Gaussian random effects


CHAPTER 6

Growth mixture modeling: Analysis with non-Gaussian random effects

Bengt Muthén and Tihomir Asparouhov

Contents

6.1 Introduction
6.2 Examples
    Example 1: Clinical trials with placebo response
    Example 2: Randomized interventions with treatment effects varying across latent classes
    Example 3: High school dropout predicted by failing math achievement development
    Example 4: Age–crime curves
    Example 5: Classification of schools based on achievement development
    Other applications
6.3 Growth mixture modeling
    Specification of a simple growth model
    A general multilevel mixture model
6.4 Estimation
    Model assessment
6.5 Examples
    Analysis of Example 4: Age–crime curves
    Analysis of Example 2: Varying intervention effects on classroom aggressive behavior
    Analysis of Example 5: Classification of schools based on achievement development
6.6 Parametric versus non-parametric random-effect models
    Parametric random-effect model
    Non-parametric random-effect model
    Simulation study
6.7 Conclusions
Acknowledgments
References

6.1 Introduction

This chapter gives an overview of non-Gaussian random-effects modeling in the context of finite-mixture growth modeling developed in Muthén and Shedden (1999), Muthén (2001a, 2001b, 2004), and Muthén et al. (2002), and extended to cluster samples and cluster-level mixtures in Asparouhov and Muthén (2008). Growth mixture modeling represents

unobserved heterogeneity between the subjects in their development using both random effects (e.g., Laird and Ware, 1982) and finite mixtures (e.g., McLachlan and Peel, 2000). This allows different sets of parameter values for mixture components corresponding to different unobserved subgroups of individuals, capturing latent trajectory classes with different growth curve shapes. This chapter discusses examples motivating modeling with such trajectory classes. A general latent-variable modeling framework is presented together with its maximum likelihood estimation. Examples from criminology, mental health, and education are analyzed. The choice of a normal or a non-parametric distribution for the random effects is discussed and investigated using a simulation study. The discussion will refer to growth mixture modeling techniques as implemented in the Mplus program (Muthén and Muthén, ) and input scripts for the analyses are available at

The outline of this chapter is as follows. Section 6.2 presents examples with substantive questions that motivate growth mixture analysis. Section 6.3 describes the general model. Section 6.4 discusses estimation and model assessment. Section 6.5 illustrates the modeling with a series of examples. Section 6.6 compares the parametric and non-parametric versions of the random-effect model. Section 6.7 concludes.

6.2 Examples

The following examples show the breadth of longitudinal studies that may be approached by growth mixture modeling.

Example 1: Clinical trials with placebo response

The first example concerns analysis of data from a double-blind 8-week randomized trial on depression medication (Leuchter et al., 2002). Of particular interest is how to assess medication effects in the presence of placebo response. Placebo response is an improvement in depression ratings that is unrelated to medication. The improvement is often seen as an early steep drop in depression, often followed by a later upswing.
Figure 6.1 shows results for a two-class growth mixture model for the sample of 45 placebo group subjects using the Hamilton depression scale (Ham-D). The first two time points are before randomization and the next nine time points are after randomization. The responder class is shown in the left panel and the non-responder class in the right panel. The solid curve is the estimated mean curve, whereas the broken curves are observed individual trajectories for individuals classified as most likely belonging to this class. Placebo response confounds the estimation of the true effect of medication and is an important phenomenon, given its high prevalence of 25–60%. Because placebo response is pervasive, the statistical modeling should account for this when estimating medication effects. This can be done by acknowledging the qualitative heterogeneity in trajectory shapes for responders and non-responders using growth mixture modeling. The estimation of medication effects using growth mixture modeling is described in Muthén et al. (2007). The medication effect is estimated in line with the approach of the next example.

Example 2: Randomized interventions with treatment effects varying across latent classes

The second example concerns a randomized preventive field trial conducted in Baltimore public schools (Dolan et al., 1993; Ialongo et al., 1999). The study applied a universal intervention aimed at reducing aggressive-disruptive behavior during first and second grade to improve reading and reduce aggression with outcomes assessed through middle school and beyond (Kellam et al., 1994). Children were followed from first to seventh grade with

respect to the course of aggressive behavior, and a follow-up to age 18 also allowed for the assessment of intervention impact on more distal events, such as the probability of juvenile delinquency as indicated by juvenile court records. The intervention was administered after one pre-intervention time point in fall of first grade. Key scientific questions addressed whether the intervention reduced the slope of the aggression trajectory across the grades, whether the intervention was different in impact for children who initially display higher levels of aggression, and whether the intervention impacted distal outcomes. Analyses of these hypotheses were presented in Muthén et al. (2002). Allowing for multiple trajectory classes in the growth model gave a flexible way to assess differential effects of the intervention. The analyses focused on boys and intervention status as defined by classroom assignment in fall of first grade, resulting in a sample of 119 boys in the intervention group and 80 boys in the control group. Figure 6.2 shows results from a four-class growth mixture model for the 119 boys. For each combination of latent-trajectory class and intervention condition, the estimated mean growth curve is shown together with observed individual trajectories for individuals estimated to be most likely a member of the class. An intervention effect in terms of reducing aggressive behavior is seen for the high class and perhaps also for the low-starting (LS) class, whereas the other two classes show no effects.

Figure 6.1 Two-class growth mixture model for depression in a placebo group. (Two panels plotting Ham-D against time: Baseline, Week 1, Week 4, Week 8.)

Example 3: High school dropout predicted by failing math achievement development

The third example concerns growth mixture modeling of mathematics achievement development in US schools.
Muthén (2004) analyzed longitudinal math scores for students in grades 7–10 from the Longitudinal Study of American Youth (LSAY) and found a problematic trajectory class with an exceptionally low starting point in grade 7 as well as a low growth rate; see Figure 6.3. The class membership was strongly related to covariates such as grade 7 measures of having low schooling expectations and dropout thoughts. Taken together with the poor math development, this suggests that the class consists of students who are disengaged from school. Class membership was also highly predictive of dropping out by grade 12, a binary distal outcome. In a further analysis, Muthén (2004) carried out a growth mixture analysis where the clustering of students within schools was taken

Figure 6.2 Four-class growth mixture model for aggressive behavior in control and intervention groups. (Eight panels: High, Medium, Low, and LS classes, each shown for the control group and the intervention group; TOCA-R scores plotted against grades 1–7, with fall and spring assessments labeled 1F through 7S.)

into account by allowing random-effect variation across schools. The school variation was represented in the random effects for the growth, the random intercept in the logistic regression for dropping out, and the random intercept in the multinomial regression predicting latent-class membership as a function of student-level covariates. Furthermore, school-level covariates corresponding to poverty of the school neighborhood and teaching quality in the school were used to predict across-school variation in the random coefficients.

Figure 6.3 Three-class growth mixture model for math achievement related to high school dropout. (Three panels plotting math achievement against grades 7–10: Poor Development, 20%; Moderate Development, 28%; Good Development, 52%. Dropout rates in panel order: 69%, 8%, 1%.)

Example 4: Age–crime curves

The fourth example concerns criminal activity of 13,160 males born in Philadelphia, Pennsylvania in 1958 (D'Unger et al., 1998; D'Unger, Land, and McCall, 2002; Loughran and Nagin, 2006). Annual counts of police contacts are available from age 4 to 26 of this birth cohort. The aggregate age–crime curve follows the well-known pattern of increasing annual convictions throughout the subjects' teenage years and decreasing annual convictions thereafter. The criminology literature has focused extensively on identifying groups of individuals with similar patterns or careers of delinquent and criminal offending. To quote D'Unger et al. (1998, p. 1595): "This question of how many latent classes of criminal careers are optimal, and why the number of categories itself is important, has gained salience for criminological theory in light of recent theoretical debates." The authors go on to mention Moffitt (1993) as a key contributor to the notion of different trajectory classes, proposing a distinction between the trajectory of "life-course persistents" versus "adolescence limiteds" depending on the behaviors persisting over the life course or seen only during adolescence.
The debate continues, as seen in Sampson and Laub (2005) discussing the group-based analysis approach of Nagin (1999, 2005), Nagin and Land (1993), and Roeder, Lynch, and Nagin (1999). The Philadelphia crime data will be analyzed in a new way in this chapter. The analyses to be presented have two special features. First, the outcome variable is a count variable that is very skewed, with a large number of zeros at each point in time. Second, it is of interest to contrast the group-based approach with random-effects models.

Example 5: Classification of schools based on achievement development

The fifth example extends the achievement analyses discussed in Example 3 by using a school-level latent-class variable, enabling a classification of schools as more or less successful. The LSAY data discussed in Example 3 are from a limited number of schools and

analyses are instead performed on data from grades 8, 10, and 12 of the National Education Longitudinal Study (NELS). NELS surveyed 913 schools and a total of 14,217 students. In the analyses to be presented, student growth rate is regressed on the growth intercept in grade 8, and this relationship is allowed to vary across the school-level latent classes. It has been argued in the education literature that a weak relationship is an indicator of a school being egalitarian (e.g., Choi and Seltzer, 2006). The means of the random intercept and the intercept of the random growth rate are also allowed to vary across the school-level latent classes. Both types of school-level latent-class features are useful for determining school quality.

Other applications

Other applications of growth mixture modeling found in the literature include Verbeke and Lesaffre (1996), see also Pearson et al. (1994), who considered different groups of males with linear or exponential growth in prostate-specific antigen (PSA); Muthén and Shedden (1999) and Muthén and Muthén (2000), with application to the development of heavy drinking and alcohol dependence; Lin et al. (2002), with application to PSA and prostate cancer, combining growth mixture modeling with survival analysis; Croudace et al. (2003), with application to bladder control; Muthén et al. (2003), with application to reading failure, including the modeling of a kindergarten process for phonemic awareness linked to a later process of word recognition; and Muthén and Masyn (2005), with application to aggressive behavior and juvenile delinquency, combining growth mixture modeling and discrete-time survival analysis.
Related applications to latent-class membership representing non-participation (non-compliance) and complier-average causal effect estimation in intervention studies (Angrist, Imbens, and Rubin, 1996) are given in Jo (2002), Jo and Muthén (2003), and are also generalizable to longitudinal studies (see Yau and Little, 2001; Dunn et al., 2003; Muthén, Jo, and Brown, 2003), including time-varying compliance (Lin, Ten Have, and Elliott, 2006).

6.3 Growth mixture modeling

This section describes the general growth mixture modeling framework (see also Asparouhov and Muthén, 2008). The description is closely related to the implementation in the Mplus software version 4.2 and higher (Muthén and Muthén, ). To familiarize readers with the general Mplus modeling framework, the section starts with a simple growth example put into the conventional linear mixed-effects model as well as the Mplus modeling framework.

Specification of a simple growth model

Consider a single growth process with no latent-trajectory classes, no clustering, linear growth for a continuous outcome $Y$, a time-invariant covariate $x$ and a time-varying covariate $w$,

$$Y_{ij} = \eta_{0i} + \eta_{1i} a_{ij} + \kappa_i w_{ij} + \varepsilon_{ij}, \tag{6.1}$$

where $a_{ij}$ are time scores ($j = 1, 2, \ldots, T$), the random intercept $\eta_{0i}$ and the random slope $\eta_{1i}$ represent the growth process, $\kappa_i$ is a random slope, and $\varepsilon_{ij}$ is a normally distributed residual. The random intercepts and slopes are expressed as

$$\eta_{0i} = \alpha_0 + \gamma_0 x_i + \zeta_{0i}, \tag{6.2}$$

$$\eta_{1i} = \alpha_1 + \gamma_1 x_i + \zeta_{1i}, \tag{6.3}$$

$$\kappa_i = \alpha_2 + \gamma_2 x_i + \zeta_{2i}, \tag{6.4}$$

where the $\alpha$s and $\gamma$s are parameters and the $\zeta$s are normally distributed residuals. In multilevel terms, equation (6.1) represents level-1 variation across time and (6.2)–(6.4) represent level-2 variation across individuals. Consider the mixed linear model formulation for $Y_i = (Y_{i1}, \ldots, Y_{iT})'$,

$$Y_i = X_i \beta + Z_i b_i + e_i, \tag{6.5}$$

where some individuals may not be observed at all $T$ occasions, leading to missing data. In this example, let

$$\Lambda_i = \begin{pmatrix} 1 & a_{i1} \\ 1 & a_{i2} \\ 1 & a_{i3} \\ \vdots & \vdots \\ 1 & a_{iT} \end{pmatrix},$$

so that in (6.5) we have

$$X_i = (\Lambda_i \;\; w_i \;\; \Lambda_i x_i \;\; w_i x_i), \quad \beta = (\alpha_0, \alpha_1, \alpha_2, \gamma_0, \gamma_1, \gamma_2)',$$

$$Z_i = (\Lambda_i \;\; w_i), \quad b_i = (\zeta_{0i}, \zeta_{1i}, \zeta_{2i})', \quad e_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})'.$$

The Mplus framework uses the general model expression for observed vectors $Y_i$ and $X_i$,

$$Y_i = \nu + \Lambda \eta_i + K X_i + \varepsilon_i, \tag{6.6}$$

$$\eta_i = \alpha + B \eta_i + \Gamma X_i + \zeta_i, \tag{6.7}$$

implying

$$\begin{aligned} Y_i &= \nu + \Lambda (I - B)^{-1} \alpha + \Lambda (I - B)^{-1} \Gamma X_i + K X_i \\ &\quad + \Lambda (I - B)^{-1} \zeta_i + \varepsilon_i, \end{aligned}$$

where the first row refers to fixed effects and the second row to random effects. The regression parameter arrays $\Lambda$, $K$, $B$, and $\Gamma$ are allowed to vary across $i$ as a function of observed variables or they can be unobserved random slopes. The model equations (6.6) and (6.7) capture the level-1 and level-2 expressions for the linear growth example in (6.1) and (6.2)–(6.4). The notation of (6.6) and (6.7) follows that of the linear growth example with $B = 0$ and with the vector $X_i$ containing both the time-varying covariate $w_{ij}$ and the time-invariant covariate $x_i$. The model of (6.6) and (6.7) includes the mixed linear model of (6.5) as a special case. In latent-variable modeling terms, (6.6) is referred to as the measurement part of the model, where the latent-variable vector $\eta_i$ is measured by the indicators $Y_i$. Here, $\Lambda$ may contain parameters. A frequent example is when $a_{it} = a_t$, so that the $a_t$ can be treated as parameters, for example capturing deviations from linear growth shape (fixing two $a_t$ values for identification, typically $a_1 = 0$, $a_2 = 1$).
Another example is where multiple indicators of a factor are available at each time point, where different indicators have different factor loadings $\lambda$. With $\Lambda_i = \Lambda$, (6.6) also covers factor analysis with covariates. Furthermore, (6.7) is referred to as the structural part, containing regressions among the latent variables. The regression matrix $B$ has zero diagonal elements, but the off-diagonal elements may be used to regress random effects on each other. For example, the growth slope (growth rate) $\eta_{1i}$, or the random slope $\kappa_i$, may be expressed as a function of the intercept (initial status) $\eta_{0i}$. More generally, (6.7) also covers structural equation modeling. In this way, the extensions of (6.6) and (6.7) to finite mixtures and cluster samples presented in this chapter pertain to not only growth models but also factor analysis and structural equation models, as well as combinations of such models and growth models (Muthén, 2002).
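The equivalence between the two-level formulation (6.1)–(6.4) and the mixed linear model (6.5) can be checked numerically. The following is a minimal sketch in plain Python, not part of the chapter or of Mplus. It assumes that the level-2 covariate in (6.4) is the time-invariant $x_i$, which is what the columns of $X_i$ and the ordering of $\beta$ imply; the function names and the numeric values are hypothetical.

```python
def growth_design(a, w, x):
    """Build X_i and Z_i of (6.5) for one individual.

    a: time scores a_i1..a_iT, w: time-varying covariate w_i1..w_iT,
    x: time-invariant covariate x_i (a scalar).
    """
    # Columns of X_i: (Lambda_i, w_i, Lambda_i * x_i, w_i * x_i), matching
    # beta = (alpha0, alpha1, alpha2, gamma0, gamma1, gamma2)'.
    X = [[1.0, a_t, w_t, x, a_t * x, w_t * x] for a_t, w_t in zip(a, w)]
    # Columns of Z_i: (Lambda_i, w_i), matching b_i = (zeta0, zeta1, zeta2)'.
    Z = [[1.0, a_t, w_t] for a_t, w_t in zip(a, w)]
    return X, Z


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * v_j for m, v_j in zip(row, v)) for row in M]


def two_level_form(a, w, x, alpha, gamma, zeta, eps):
    """Outcomes from the level-1 / level-2 equations (6.1)-(6.4)."""
    eta0 = alpha[0] + gamma[0] * x + zeta[0]
    eta1 = alpha[1] + gamma[1] * x + zeta[1]
    kappa = alpha[2] + gamma[2] * x + zeta[2]  # assumes x_i enters (6.4)
    return [eta0 + eta1 * a_t + kappa * w_t + e
            for a_t, w_t, e in zip(a, w, eps)]


def mixed_model_form(a, w, x, alpha, gamma, zeta, eps):
    """Outcomes from the mixed linear model Y = X beta + Z b + e of (6.5)."""
    X, Z = growth_design(a, w, x)
    beta = list(alpha) + list(gamma)
    fixed = matvec(X, beta)
    rand = matvec(Z, zeta)
    return [f + r + e for f, r, e in zip(fixed, rand, eps)]


# Hypothetical values for one individual observed at T = 4 occasions.
a = [0.0, 1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0, 0.0]
x = 1.5
alpha, gamma = [10.0, 2.0, 0.3], [1.0, -0.5, 0.2]
zeta, eps = [0.4, -0.1, 0.05], [0.1, -0.2, 0.0, 0.3]
y_two_level = two_level_form(a, w, x, alpha, gamma, zeta, eps)
y_mixed = mixed_model_form(a, w, x, alpha, gamma, zeta, eps)
```

Both functions return the same vector, which illustrates that (6.5) only rearranges the terms of (6.1)–(6.4) into fixed and random parts.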

A general multilevel mixture model

Let $Y_{kij}$ be the $j$th observed dependent variable for individual $i$ in cluster $k$. Three types of variables are considered in the analyses to be presented: binary and ordered categorical variables, continuous normally distributed variables, and counts following the Poisson or zero-inflated Poisson distribution. Let $C_{ki}$ be a latent categorical variable for individual $i$ in cluster $k$ which takes values $1, \ldots, L$. Let $D_k$ be a cluster-level latent categorical variable for cluster $k$ which takes values $1, \ldots, M$. The choice of $L$ and $M$ will be discussed in Section . To construct a model for observed binary and ordered categorical variables we proceed in line with Muthén (1984) by defining an underlying continuous, normally distributed latent variable $Y^*_{kij}$ such that, for a set of threshold parameters $\tau_{cdsj}$,

$$[Y_{kij} = s \mid C_{ki} = c, D_k = d] \;\Leftrightarrow\; \tau_{cdsj} < Y^*_{kij} < \tau_{cd,s+1,j}.$$

For continuous normally distributed variables we define $Y^*_{kij} = Y_{kij}$. For counts $Y^*_{kij} = \log(\lambda_{kij})$, where $\lambda_{kij}$ is the rate of the Poisson distribution. Let $Y^*_{ki}$ be the $J$-dimensional vector of all dependent variables and $X_{ki}$ be the $Q$-dimensional vector of all individual-level covariates. Using latent-variable terms, the measurement part of the model is defined by

$$[Y^*_{ki} \mid C_{ki} = c, D_k = d] = \nu_{cdk} + \Lambda_{cdk} \eta_{ki} + K_{cdk} X_{ki} + \varepsilon_{ki}, \tag{6.8}$$

where $\nu_{cdk}$ is a $J$-dimensional vector of intercepts, $\Lambda_{cdk}$ is a $J \times m$ slope matrix for the $m$-dimensional random-effect vector $\eta_{ki}$, $K_{cdk}$ is a $J \times Q$ slope matrix for the covariates, and $\varepsilon_{ki}$ is a $J$-dimensional vector of residuals with mean zero and covariance matrix $\Theta_{cd}$. For a categorical variable $Y_{kij}$, a normality assumption for $\varepsilon_{kij}$ is thus equivalent to a probit regression for $Y_{kij}$ on $\eta_{ki}$ and $X_{ki}$. Alternatively, $\varepsilon_{kij}$ can have a logistic distribution, resulting in a logistic regression. For a count variable $Y_{kij}$ the residual $\varepsilon_{kij}$ is assumed to be zero.
For normally distributed continuous variables $Y_{kij}$ the residual variable $\varepsilon_{kij}$ is assumed normally distributed. The structural part of the model is defined by

$$[\eta_{ki} \mid C_{ki} = c, D_k = d] = \alpha_{cdk} + B_{cdk} \eta_{ki} + \Gamma_{cdk} X_{ki} + \zeta_{ki}, \tag{6.9}$$

where $\alpha_{cdk}$ is an $m$-dimensional vector of intercepts, $B_{cdk}$ is an $m \times m$ structural regression parameter matrix, $\Gamma_{cdk}$ is an $m \times Q$ slope parameter matrix, and $\zeta_{ki}$ is an $m$-dimensional vector of normally distributed residuals with covariance matrix $\Psi_{cd}$. The model for the latent categorical variable $C_{ki}$ is a multinomial logit model

$$\Pr(C_{ki} = c \mid D_k = d) = \frac{\exp(a_{cdk} + b'_{cdk} X_{ki})}{\sum_s \exp(a_{sdk} + b'_{sdk} X_{ki})}. \tag{6.10}$$

Some parameters have to be restricted for identification purposes. For example, the variance of $\varepsilon_{kij}$ should be 1 for categorical variables $Y_{kij}$ under probit and $\pi^2/3$ under logit. Also $a_{Ldk} = b_{Ldk} = 0$.

The multilevel part of the model is introduced as follows. Each of the intercepts, slopes or loading parameters in equations (6.8)–(6.10) can be either a fixed coefficient or a random effect that varies across clusters $k$. Let $\eta_k$ be the vector of all such random effects and let $X_k$ be the vector of all cluster-level covariates. The between-level model for $\eta_k$ is then

$$[\eta_k \mid D_k = d] = \mu_d + B_d \eta_k + \Gamma_d X_k + \zeta_k, \tag{6.11}$$

where $\mu_d$, $B_d$ and $\Gamma_d$ are fixed parameters and $\zeta_k$ is a normally distributed residual with covariance $\Psi_d$. The model for the between-level categorical variable $D$ is also a multinomial logit regression

$$\Pr(D_k = d) = \frac{\exp(a_d + b'_d X_k)}{\sum_s \exp(a_s + b'_s X_k)}. \tag{6.12}$$
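The multinomial logit form shared by (6.10) and (6.12) can be illustrated with a short sketch. This is plain Python written for illustration, not Mplus code. The function name and the numeric values are hypothetical, and the cluster and cluster-class indices are dropped for brevity. The last class serves as the reference class, in line with the identification restriction $a_L = b_L = 0$.

```python
import math


def class_probabilities(a, b, x):
    """Pr(C = c | x) for c = 1..L under a multinomial logit model.

    a: list of L intercepts; b: list of L coefficient lists; x: covariates.
    For identification the last class is the reference, so a[-1] and
    b[-1] are expected to be zero.
    """
    logits = [a_c + sum(b_cq * x_q for b_cq, x_q in zip(b_c, x))
              for a_c, b_c in zip(a, b)]
    top = max(logits)  # subtract the maximum for numerical stability
    expd = [math.exp(v - top) for v in logits]
    total = sum(expd)
    return [e / total for e in expd]


# Hypothetical two-class example with one covariate that has no effect:
# an intercept of log(2) gives odds of 2:1 against the reference class.
probs = class_probabilities([math.log(2.0), 0.0], [[0.0], [0.0]], [1.0])
```

With these values the first class has probability 2/3 and the reference class 1/3.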

Equations (6.8)–(6.12) comprise the definition of a multilevel latent-variable mixture model. There are many extensions of this model that are possible in the Mplus framework. For example, observed dependent variables can be incorporated on the between level. Other extensions arise from the fact that a regression equation can be constructed between any two variables in the model. Such equations can be fixed- or random-effect regressions. The model can also accommodate multiple latent-class variables on the within and the between level. Other types of dependent variables can also be incorporated in this model such as censored, nominal, semi-continuous, and time-to-event survival variables; see Olsen and Schafer (2001) and Asparouhov, Masyn, and Muthén (2006).

6.4 Estimation

The above model is estimated by the maximum likelihood estimator using the EM algorithm where the latent variables $C_{ki}$, $\eta_{ki}$, $D_k$ and $\eta_k$ are treated as missing data. The observed-data likelihood is given by

$$\prod_k \sum_d \Pr(D_k = d) \int \psi_k(\eta_k) \left( \prod_i \sum_c \Pr(C_{ki} = c) \int f_{ki}(Y_{ki}) \psi_{ki}(\eta_{ki}) \, d\eta_{ki} \right) d\eta_k, \tag{6.13}$$

where $f_{ki}$, $\psi_{ki}$ and $\psi_k$ are the likelihood functions for $Y_{ki}$, $\eta_{ki}$ and $\eta_k$, respectively. Numerical integration is utilized in the evaluation of the above likelihood using both adaptive and non-adaptive quadrature (see Schilling and Bock, 2005). The method can be described as follows. Suppose that $\eta$ is a continuously distributed random-effect variable with density function $\psi$. Then

$$\int f(\eta) \psi(\eta) \, d\eta \approx \sum_{q=1}^{Q} w_q f(n_q), \tag{6.14}$$

where $n_q$ are the nodes of the numerical integration and $w_q$ are the weights. The weights are computed as $w_q = \psi(n_q) / \sum_{i=1}^{Q} \psi(n_i)$. The numerical integration method approximates the continuous distribution for $\eta$ with a categorical distribution, that is, we can assume that the variable $\eta$ takes the values $n_q$ with probabilities $w_q$.
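A minimal sketch of the approximation in (6.14) with equally spaced (rectangular) nodes may help fix ideas. It is plain Python and not the Mplus implementation. The integration range, the number of nodes, and the function names are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density psi of a normal random effect."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def rectangular_rule(density, lo, hi, n_nodes):
    """Equally spaced nodes n_q with weights w_q = psi(n_q) / sum_i psi(n_i)."""
    step = (hi - lo) / (n_nodes - 1)
    nodes = [lo + q * step for q in range(n_nodes)]
    dens = [density(n) for n in nodes]
    total = sum(dens)
    weights = [d / total for d in dens]
    return nodes, weights


def integrate(f, nodes, weights):
    """Approximate the integral of f(eta) psi(eta) by sum_q w_q f(n_q)."""
    return sum(w * f(n) for n, w in zip(nodes, weights))


# Treat a standard normal eta as a categorical variable on 15 nodes.
nodes, weights = rectangular_rule(normal_pdf, -5.0, 5.0, 15)
second_moment = integrate(lambda eta: eta * eta, nodes, weights)
```

Because the weights are normalized to sum to one, the pairs of nodes and weights define a proper categorical distribution, and the approximate second moment is close to the exact value of 1 for a standard normal random effect.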
Using this method the likelihood (6.13) is approximated by

$$\begin{aligned} &\prod_k \sum_d \Pr(D_k = d) \sum_q \Pr(\eta_k = n_{qk}) \prod_i \left( \sum_c \Pr(C_{ki} = c) \sum_r \Pr(\eta_{ki} = n_{rki}) f_{ki}(Y_{ki}) \right) \\ &\quad = \prod_k \sum_{d,q} \Pr(D_k = d, \eta_k = n_{qk}) \prod_i \left( \sum_{c,r} \Pr(C_{ki} = c, \eta_{ki} = n_{rki}) f_{ki}(Y_{ki}) \right), \end{aligned} \tag{6.15}$$

where $n_{qk}$ and $n_{rki}$ are the nodes of the numerical integration. The EM algorithm is as follows. First compute the posterior distribution for the latent variables. The posterior joint distribution for $D_k$ and $\eta_k$ is computed as follows:

$$p_{dqk} = \Pr(D_k = d, \eta_k = n_{qk} \mid \text{data}) = \frac{\Pr(D_k = d, \eta_k = n_{qk}) \prod_i \left( \sum_{c,r} \Pr(C_{ki} = c, \eta_{ki} = n_{rki}) f_{ki}(Y_{ki}) \right)}{\sum_{d,q} \Pr(D_k = d, \eta_k = n_{qk}) \prod_i \left( \sum_{c,r} \Pr(C_{ki} = c, \eta_{ki} = n_{rki}) f_{ki}(Y_{ki}) \right)}.$$

The posterior conditional joint distribution for $C_{ki}$ and $\eta_{ki}$ is computed as follows:

$$p_{crki \mid dq} = \Pr(C_{ki} = c, \eta_{ki} = n_{rki} \mid D_k = d, \eta_k = n_{qk}, \text{data}) = \frac{\Pr(C_{ki} = c, \eta_{ki} = n_{rki}) f_{ki}(Y_{ki})}{\sum_{c,r} \Pr(C_{ki} = c, \eta_{ki} = n_{rki}) f_{ki}(Y_{ki})}.$$
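The posterior computation in the E-step can be sketched for a simplified single-level case. The sketch below is plain Python under strong assumptions that are not taken from the chapter: there is no clustering, the outcome is continuous, each class has a linear mean curve, and the only random effect is a random-intercept deviation approximated by a small set of nodes and weights. All names and numeric values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_class_node(y, times, class_probs, class_curves,
                         nodes, weights, resid_sd):
    """Posterior Pr(C = c, eta = n_r | y) for one individual.

    Within class c the mean at time score a is b0 + b1 * a + eta, where
    (b0, b1) = class_curves[c] and eta is a random-intercept deviation
    taking the value nodes[r] with probability weights[r].
    """
    joint = {}
    for c, pi_c in enumerate(class_probs):
        b0, b1 = class_curves[c]
        for r, (n_r, w_r) in enumerate(zip(nodes, weights)):
            lik = 1.0
            for a_t, y_t in zip(times, y):
                lik *= normal_pdf(y_t, b0 + b1 * a_t + n_r, resid_sd)
            # Pr(C = c, eta = n_r) * f(y), the summand of the likelihood
            joint[(c, r)] = pi_c * w_r * lik
    total = sum(joint.values())  # approximate marginal likelihood of y
    post = {key: value / total for key, value in joint.items()}
    return post, total


# Hypothetical two-class example: a flat low class and a rising high class.
post, marginal = posterior_class_node(
    y=[5.1, 6.0, 6.9, 8.2], times=[0.0, 1.0, 2.0, 3.0],
    class_probs=[0.6, 0.4], class_curves=[(0.0, 0.0), (5.0, 1.0)],
    nodes=[-1.0, 0.0, 1.0], weights=[0.25, 0.5, 0.25], resid_sd=1.0)
prob_high_class = sum(p for (c, r), p in post.items() if c == 1)
```

Summing the posterior over nodes gives the posterior class probabilities that are used for classifying individuals. For this trajectory nearly all of the posterior mass falls in the rising class.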

The expected complete-data log-likelihood is now given by

$$\sum_{d,q,k} p_{dqk} \log \Pr(D_k = d, \eta_k = n_{qk}) + \sum_{d,c,q,r,k,i} p_{dqk} \, p_{crki \mid dq} \log \Pr(C_{ki} = c, \eta_{ki} = n_{rki}) + \sum_{d,c,q,r,k,i} p_{dqk} \, p_{crki \mid dq} \log f_{ki}(Y_{ki}),$$

which is maximized with respect to the model parameters. An alternative algorithm for obtaining the maximum likelihood estimates can be constructed by directly optimizing (6.15) with standard maximization algorithms such as Fisher scoring and quasi-Newton algorithms. Such alternative algorithms can be used in combination with the EM algorithm to achieve faster convergence, an approach known as the accelerated EM algorithm (AEM). The AEM algorithm is implemented in Mplus. A number of different integration methods can be used in (6.14). Mplus implements three different integration methods: rectangular, Gauss–Hermite, and Monte Carlo integration. In addition, adaptive integration can be used. With this method, the integration nodes are concentrated in the area where the posterior distribution of the random effects is non-zero. The estimation implemented in Mplus allows missing-at-random data for all dependent variables (Little and Rubin, 2002). Non-ignorable missing data is discussed in Muthén et al. (2003). It should be noted that mixture models in general are prone to have multiple local maxima of the likelihood, and the use of many different sets of starting values in the iterative maximization procedure is strongly recommended. An automatic random starts procedure is implemented in the Mplus program, where starting values given by the user or produced automatically by the program are randomly perturbed.

Model assessment

For comparison of fit of models that have the same number of classes and are nested, the usual likelihood ratio chi-square difference test can be used, as long as the more restricted model does not have parameters on the border of the admissible parameter space.
Comparison of models with different numbers of classes violates this requirement, because the model with fewer classes is obtained by fixing class probabilities at zero. Deciding on the number of classes is instead typically accomplished by a Bayesian information criterion (BIC: Schwarz, 1978; Kass and Raftery, 1993),

$$\text{BIC} = -2 \log L + r \log n,$$

where $r$ is the number of free parameters in the model and $n$ is the sample size. The lower the BIC value, the better the model. The number of classes is increased until a BIC minimum is found. Although not chi-square distributed, the usual likelihood ratio statistic for comparing models with different numbers of classes can still be used, assessing the distribution of the statistic by bootstrap techniques. McLachlan and Peel (2000, Chapter 6) discuss a parametric bootstrapped likelihood ratio approach proposed by Aitkin, Anderson, and Hinde (1981). Although computationally intensive, it has been found to perform well in simulation studies using latent-class and growth mixture models, outperforming BIC in some instances (Nylund, Asparouhov, and Muthén, 2007). The fit of the model to data for continuous variables can be studied by comparing, for each class, estimated moments with moments created by weighting the individual data by the estimated conditional probabilities (Roeder, Lynch, and Nagin, 1999). To check how closely the estimated average curve within each class matches the data, it is also useful to randomly assign individuals to classes based on individual estimated conditional class

probabilities. Plots of the observed individual trajectories together with the model-estimated average trajectory can be used to check assumptions using class membership determined by pseudo-class draws (Bandeen-Roche et al., 1997). Wang, Brown, and Bandeen-Roche (2005) present methods for residual checking based on these ideas. With categorical and count outcomes, model fit may be investigated with respect to univariate and bivariate frequency tables, as well as frequencies for response patterns that do not have too small expected counts. Finally, it is important to note that the need for latent classes may be due to non-normality of the outcomes rather than substantively meaningful subgroups (see McLachlan and Peel, 2000, pp ; Bauer and Curran, 2003). To support a substantive interpretation of the latent classes, the researcher should consider not only the outcome variable in question, but also antecedents (covariates predicting latent-class membership), concurrent outcomes, and distal outcomes (predictive validity); see also related arguments in Muthén (2004).

6.5 Examples

This section presents analyses of the crime data of Example 4, the aggressive behavior data of Example 2, and the math achievement data of Example 5. The Example 4 analysis uses a growth mixture model for crime counts. Examples 2 and 5 consider multilevel growth mixture modeling of cluster data. Example 2 examines intervention effects that vary across both student-level and classroom-level latent-class variables. Example 5 considers students within schools, where student growth characteristics vary across a school-level latent-class variable.

Analysis of Example 4: Age–crime curves

The analysis of the Philadelphia data with counts of criminal activity for 13,160 males aged 4–26 will compare two different approaches, a group-based approach and growth mixture analysis (for more extended comparisons, see Kreuter and Muthén, 2007, 2008).
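Since the comparisons that follow rely on BIC, a small sketch of the computation may be useful. It uses the definition BIC = −2 log L + r log n from the model assessment discussion. The log-likelihoods, parameter counts, and sample size below are hypothetical and are not taken from the chapter's analyses.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 log L + r log n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def best_by_bic(fits, n_obs):
    """Return the model label with the lowest BIC and all BIC values.

    fits maps a model label to (log-likelihood, number of free parameters).
    """
    values = {label: bic(ll, r, n_obs) for label, (ll, r) in fits.items()}
    return min(values, key=values.get), values


# Hypothetical fits for an increasing number of classes, n = 500.
fits = {
    "1-class": (-1050.0, 5),
    "2-class": (-1000.0, 9),
    "3-class": (-998.0, 13),
}
best, values = best_by_bic(fits, n_obs=500)
```

In this made-up example the third class improves the log-likelihood by only 2 points at the cost of 4 extra parameters, so the BIC minimum is reached at two classes.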
The group-based analysis is associated with the work of Nagin and Land (1993), Nagin (1999, 2005), Roeder, Lynch, and Nagin (1999), and Jones, Nagin, and Roeder (2001). This approach is commonly seen in the criminology literature and was used by D'Unger et al. (1998), D'Unger, Land, and McCall (2002), and Loughran and Nagin (2006) for these data. The group-based analysis does not cover cluster sampling and has the further restrictions of zero within-class variances, $\Psi_c = 0$, as well as $\Theta_c = \theta I$. The group-based approach is further discussed in Muthén (2004) where it is referred to as latent-class growth analysis (LCGA), given its similarity to latent-class analysis (LCA). Both LCGA and LCA search for classes of individuals defined by conditional independence of the repeated measures given class. In contrast, a growth mixture model (GMM) allows for within-class correlations between repeated measures. Such correlation may, for example, be due to omitted time-varying covariates. If within-class correlation is ignored, a distorted class formation is obtained. Within-class correlation is obtained in GMMs by allowing for random effects with non-zero within-class variances. Both LCGA and GMMs use a zero-inflated Poisson model in line with Roeder, Lynch, and Nagin (1999). For time point $j$, individual $i$, and cluster $k$,

$$Y_{kij \mid C_{ki} = c} = \begin{cases} 0 & \text{with probability } \pi_{kij}, \\ \text{Poisson}(\lambda_{ckij}) & \text{with probability } 1 - \pi_{kij}, \end{cases}$$

where $\lambda$ is the Poisson rate. In line with previous modeling of the Philadelphia data, a quadratic growth curve is used. Drawing on (6.8) and (6.9) of the general model in Section 6.3.2, the growth mixture zero-inflated Poisson model for these data is expressed in

terms of the log rate as

$$\log \lambda_{ij \mid C_i = c} = \eta_{0i} + \eta_{1i} a_{ij} + \eta_{2i} a_{ij}^2,$$

$$\eta_{0i \mid C_i = c} = \alpha_{0c} + \zeta_{0i},$$

$$\eta_{1i \mid C_i = c} = \alpha_{1c} + \zeta_{1i},$$

$$\eta_{2i \mid C_i = c} = \alpha_{2c} + \zeta_{2i}.$$

To make analysis results comparable to the LCGA of Loughran and Nagin (2006), the small number of individuals with more than 10 criminal offenses in any given year are deleted, reducing the sample size only slightly, from 13,160 to 13,126, and the data are combined into two-year intervals. Loughran and Nagin (2006) settled on a four-class solution: non-offenders, adolescent-limited, and high and low chronic (persisting criminal activity at age 26). D'Unger et al. (1998) and D'Unger, Land, and McCall (2002) used a random subset (n = 1000) of the data and concluded based on BIC that a five-class LCGA solution was preferred. Their five classes were labeled: non-offenders, high and low adolescent-peaked, and high and low chronic. Table 6.1 gives results for 1–4 classes for GMM and 4–8 classes for LCGA. In addition to log-likelihood values, number of parameters, and BIC, the table shows fit to the data in terms of the number of standardized residuals that are significant at the 5% level for the 10 most frequent response patterns across time (comprising 78% of the data and eliminating only patterns with observed frequency less than 100). The one-class GMM is the conventional random-effects model. Here, 5 of the 10 residuals show significant misfit, illustrating the need for a more flexible model. The two- and the three-class GMMs obtain considerably improved BIC values. The three-class GMM reduces the number of significant residuals from 5 to 1, indicating the appropriateness of the mixture modeling. The four-class GMM adds relatively little improvement. The three-class GMM displays the three themes of non-offenders, adolescent-limited, and chronic. Figure 6.4 shows the mean trajectories for the three-class GMM.
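The zero-inflated Poisson model with a quadratic log rate that underlies these comparisons can be sketched as follows. This is an illustration in plain Python, not the Mplus specification. It makes two simplifying assumptions that are not in the chapter: the zero-inflation probability is held constant over time, and the growth factors are treated as given, which corresponds to an LCGA-type trajectory with zero within-class variances, or to a GMM trajectory conditional on the random effects. The counts, time scores, and parameter values are hypothetical.

```python
import math


def zip_pmf(y, rate, pi_zero):
    """Pr(Y = y) under a zero-inflated Poisson distribution.

    With probability pi_zero the count is a structural zero; otherwise
    it follows a Poisson distribution with the given rate.
    """
    poisson = math.exp(-rate) * rate ** y / math.factorial(y)
    structural_zero = pi_zero if y == 0 else 0.0
    return structural_zero + (1.0 - pi_zero) * poisson


def log_rate(a, eta0, eta1, eta2):
    """Quadratic growth curve for the log Poisson rate at time score a."""
    return eta0 + eta1 * a + eta2 * a * a


def trajectory_loglik(counts, times, eta, pi_zero):
    """Log-likelihood of one count trajectory given growth factors eta."""
    ll = 0.0
    for y, a in zip(counts, times):
        lam = math.exp(log_rate(a, *eta))
        ll += math.log(zip_pmf(y, lam, pi_zero))
    return ll


# Hypothetical trajectory peaking mid-series, with centered time scores.
times = [-2.0, -1.0, 0.0, 1.0, 2.0]
counts = [0, 1, 3, 1, 0]
peaked = trajectory_loglik(counts, times, eta=(0.5, 0.0, -0.5), pi_zero=0.2)
flat = trajectory_loglik(counts, times, eta=(-1.5, 0.0, 0.0), pi_zero=0.2)
```

A negative quadratic coefficient produces the rise and fall characteristic of an adolescent-peaked trajectory, so the peaked curve yields a higher log-likelihood for these counts than the flat, low-rate curve.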
The four-class GMM splits the adolescent-limited class into two, where the total percentage for those two classes is about the same as for the adolescent-limited class of the three-class GMM. The four-class LCGA is the same as presented in Loughran and Nagin (2006), and the five-class LCGA shows the same types of trajectory classes as in the analysis of D'Unger et al. Neither of these two models fits the data well. An eight-class LCGA is needed to get a reduction to one significant residual. In contrast, the three-class GMM has only one significant residual and the four-class GMM has none. With three classes the GMM gives a better BIC value than any of the LCGA models shown in Table 6.1. The BIC values for the four-class LCGA used in Loughran and Nagin (2006) and the five-class model used in D'Unger et al. (1998) and D'Unger, Land, and McCall (2002) are considerably worse than the BIC value for the three-class GMM. Furthermore, the three-class GMM uses two fewer parameters than the five-class LCGA but has a log-likelihood that is better by 200 points. This illustrates the importance of using random effects to allow for variations on the themes of the trajectory shapes of the classes. The LCGA approach leads to a proliferation of classes, not all of which may have substantive salience.

Table 6.1 Age-crime curves: Log-likelihood and BIC comparisons for GMM and LCGA

Model          Log-Likelihood   # Parameters   BIC      # Significant Residuals
1-class GMM    40,…             …              …        …
2-class GMM    40,…             …              …        …
3-class GMM    40,…             …              …        …
4-class GMM    40,…             …              …        …
4-class LCGA   40,…             …              …        …
5-class LCGA   40,…             …              …        …
6-class LCGA   40,…             …              …        …
7-class LCGA   40,…             …              …        …
8-class LCGA   40,…             …              …,896    1

Figure 6.4 Estimated mean trajectories from a three-class growth mixture model for criminal activities (Class 1, 64.9%; Class 2, 15.6%; Class 3, 19.6%; horizontal axis: age; vertical axis: mean).

Analysis of Example 2: Varying intervention effects on classroom aggressive behavior

The Baltimore randomized field trial introduced earlier as Example 2 was repeated for several cohorts of students. The earlier discussion considered cohort 1 data, whereas data from cohort 3 (Ialongo et al., 1999) are analyzed here. A total of 362 boys in 27 classrooms are considered over four time points: fall of first grade, spring of first grade, spring of second grade, and spring of third grade. The average number of boys per classroom is about 13.4 (362/27).

It is of interest to study whether teachers in classrooms with higher aggressiveness levels have a more difficult time successfully implementing the intervention aimed at reducing aggressive-disruptive behavior. For the first grade, there is substantial variation across classrooms in the aggressiveness scores, as evidenced by the intraclass correlations at the four time points: 0.11, 0.16, 0.04, …. In addition to student-level trajectory classes, the use of latent classes on the classroom level makes it possible to more fully explore variation in intervention effects. Drawing on the general model of Section 6.3.2, the two-level GMM is expressed as follows using a quadratic curve shape,

$$Y_{kij \mid C_{ki}=c,\, D_k=d} = \eta_{0ki} + \eta_{1ki} a_{ij} + \eta_{2ki} a_{ij}^2 + \epsilon_{kij},$$

for $j = 1, 2, 3, 4$, with variation across students within classrooms expressed as

$$\eta_{0ki \mid C_{ki}=c,\, D_k=d} = \alpha_{cdk0} + \zeta_{0ki},$$
$$\eta_{1ki \mid C_{ki}=c,\, D_k=d} = \alpha_{cdk1} + \zeta_{1ki},$$
$$\eta_{2ki \mid C_{ki}=c,\, D_k=d} = \alpha_{cdk2} + \zeta_{2ki},$$

and variation across classrooms expressed as

$$\alpha_{cdk0 \mid C_{ki}=c,\, D_k=d} = \alpha_{cd0} + \zeta_{30k},$$
$$\alpha_{cdk1 \mid C_{ki}=c,\, D_k=d} = \alpha_{cd1} + \gamma_{cd1} Z_k + \zeta_{31k},$$
$$\alpha_{cdk2 \mid C_{ki}=c,\, D_k=d} = \alpha_{cd2} + \gamma_{cd2} Z_k + \zeta_{32k}.$$

Here, $a_{i1} = 0$ to center the intercept $\eta_0$ at the pre-intervention time point. $Z$ is a treatment-control dummy variable on the classroom level. For reasons of parsimony, the student-level latent-class variable $C$ and the classroom-level latent-class variable $D$ are taken to have an additive effect on the means $\alpha_{cd0}$, $\alpha_{cd1}$, and $\alpha_{cd2}$. The $\gamma$ intervention effects are, however, allowed to vary across combinations of $C$ and $D$ classes. The linear and quadratic slopes were found to have zero variance across classrooms. The intraclass correlation is captured by the classroom variation in the random intercept of the growth model, $\alpha_{cdk0}$.

The latent categorical variable $C_{ki}$ follows the multinomial logistic regression

$$\Pr(C_{ki} = c \mid D_k = d) = \frac{\exp(a_{cdk})}{\sum_s \exp(a_{sdk})},$$

where in this application

$$a_{cdk \mid D_k=d} = a_c + \zeta_{ck}. \tag{6.16}$$

The analyses indicate that $V(\zeta_{ck}) = 0$, that is, the random intercepts for the latent-class variable $C$ do not vary across classrooms. In other applications, however, this variance can be substantial.

As a first step, a model without the classroom-level latent-class variable $D$ was explored. As judged by BIC, the conventional single-class random-effects growth model is clearly outperformed by growth mixture modeling, with a three-class model giving the lowest BIC. The log-likelihood for the conventional model is … with 14 parameters and a BIC of 8398, while the three-class GMM has a log-likelihood of … with 26 parameters and a BIC of …. The three-class model has a significant classroom variance for the random intercept.
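The multinomial logistic regression for the student-level class variable can be illustrated with a short Python sketch. The logit values are hypothetical, and fixing the last class at zero as the reference category is an assumption made here for identification; the chapter does not state the normalization it uses.

```python
import math

def class_probabilities(logits):
    """Pr(C = c) = exp(a_c) / sum_s exp(a_s), computed in a numerically stable way."""
    shift = max(logits)  # subtracting the maximum leaves the ratios unchanged
    weights = [math.exp(a - shift) for a in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical logits for three trajectory classes; the last class is the
# reference with its logit fixed at 0 (an assumption of this sketch).
a_c = [1.7, 0.4, 0.0]
probs = class_probabilities(a_c)

# A classroom-level random intercept zeta_ck, as in (6.16), would shift the
# logits for classroom k; with V(zeta_ck) = 0 every classroom shares `probs`.
probs_classroom_k = class_probabilities([a + z for a, z in zip(a_c, [0.0, 0.0, 0.0])])
```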
Second, two latent classes for $D$ were added to the model, resulting in latent classes with low versus high classroom-level aggression (51% versus 49%). The log-likelihood is … with 34 parameters and a BIC of …. This BIC is not as good as for the previous model with no classroom-level latent-class variable, but it is not known how BIC performs in settings with multilevel latent-class variables. The three student-level latent-trajectory classes show a low-increasing class of 68%, a medium-increasing class of 19%, and a high-decreasing class of 12%. The mean curves for these three latent classes are shown in Figure 6.5 as pairs of control and intervention curves. Results for the latent class consisting of classrooms with low aggression level are given in the left plot and results for the latent class consisting of classrooms with high aggression level are given in the right plot.

Figure 6.5 Estimated mean trajectories from a growth mixture model of classroom aggressive behavior (panels: Low Classroom, High Classroom; curves: Control, Treatment; horizontal axis: Fall 1st, Spring 1st, Spring 2nd, Spring 3rd; vertical axis: Means).

The plots suggest that in classrooms with a low level of aggression, students who are in the two highest trajectory classes benefit from the intervention. In classrooms with a high level of aggression, however, only students who are in the lowest trajectory class benefit from the intervention. This suggests that the intervention may be harder for teachers to implement well in high-aggressive classrooms. The results should be interpreted with caution, however, given the sample of only 27 classrooms and other competing models. An alternative model lets the $C$ and $D$ latent-class variables have an interactive effect on the random-effect growth means and lets the random-effect means of $a_{cdk}$ in (6.16) be influenced by the latent classes of $D$. This significantly improves the log-likelihood, but the increased number of parameters on the classroom level results in a less stable solution. The resulting split into 7 and 20 classrooms for the latent classes of $D$ causes estimated outcome mean differences with high variability.

Analysis of Example 5: Classification of schools based on achievement development

The NELS math achievement data from grades 8, 10, and 12, introduced earlier as Example 5, are analyzed here. NELS surveyed 913 schools and a total of 14,217 students. The NELS analysis illustrates two features of the general multilevel mixture model: taking into account the school clusters and using a school-level latent-class variable. In the NELS analysis, student growth rate is regressed on the growth intercept in grade 8 using a random slope that varies across schools. This random slope and the means of the random intercept and the intercept of the random growth rate are allowed to vary across the school-level latent classes. Letting school-level latent classes influence student-level relations helps identify the school-level latent classes. Extending the earlier simple growth example to clusters $k$ and a cluster-level latent-class variable $D_k$, variation across grades is expressed as

$$Y_{kij \mid D_k=d} = \eta_{0ki} + \eta_{1ki} a_{ij} + \epsilon_{kij},$$

for $j = 1, 2, 3$, with variation across students expressed as

$$\eta_{0ki \mid D_k=d} = \alpha_{d0} + \zeta_{0ki},$$
$$\eta_{1ki \mid D_k=d} = \alpha_{d1} + \beta_{dk}\, \eta_{0ki} + \zeta_{1ki},$$

where the variation across schools is accomplished by the variation of $\alpha_{d0}$, $\alpha_{d1}$, and $\beta_{dk}$ across the classes of $D$.
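The NELS model comparison that follows relies on BIC, which trades off the log-likelihood against the number of parameters. As a quick check, the sketch below recomputes BIC from the log-likelihoods and parameter counts reported for the one-, two-, and three-class NELS models. It assumes that the log-likelihoods are negative and that the sample size entering BIC is the 14,217 students; under those assumptions the results agree with the reported BIC values to within rounding of the log-likelihood.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p * ln(n); smaller is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

N_STUDENTS = 14217  # assumed to be the sample size used in BIC

# (label, log-likelihood, number of parameters, BIC reported in the text)
nels_models = [
    ("1-class", -31791, 10, 63678),
    ("2-class", -31545, 16, 63243),
    ("3-class", -31434, 22, 63079),
]

for label, loglik, n_params, reported in nels_models:
    computed = bic(loglik, n_params, N_STUDENTS)
    print(f"{label}: computed BIC = {computed:.1f}, reported BIC = {reported}")
```

Because the log-likelihoods are reported as integers, a discrepancy of about one BIC point is expected.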
A single-class growth model, that is, a conventional three-level analysis, obtains a log-likelihood of −31,791 with 10 parameters and a BIC of 63,678. A two-class GMM obtains a log-likelihood of −31,545 with 16 parameters and a BIC of 63,243. A three-class GMM obtains a log-likelihood of −31,434 with 22 parameters and a BIC of 63,079. A four-class model does not improve the log-likelihood further. The three-class model shows that the growth rate is significantly positively related to the growth intercept defined at grade 8 only for a class of 52% of the schools, which have average growth over the grades studied. A higher developing class of 25% and a lower developing class of 23% have small and insignificant relationships. This illustrates the possibility of finding clusters of schools with different achievement profiles. School-level covariates predicting school class membership can give further understanding of the school classes.

6.6 Parametric versus non-parametric random-effect models

Titterington, Smith, and Makov (1985) make a distinction in the use of finite-mixture modeling in terms of direct and indirect applications. A direct application uses mixtures to represent the underlying physical phenomenon, whereas with the indirect application the mixture components do not necessarily have a direct physical interpretation. The examples discussed so far can be seen as attempts at direct application, where trajectory classes are given substantive interpretation and results are presented for each mixture component rather than mixing over the classes. Examples of indirect applications include outlier detection and representation of non-normal distributions. Mixture modeling of non-normal distributions is the focus of this section. A growth model with a non-parametric representation of the random-effects distribution is presented, and a simulation study compares the use of such a model to the conventional random-effect growth model assuming normality.

It has been argued that with categorical and count outcomes, the typical normality assumption for random effects in repeated-measurement modeling may be less well supported by data (see also Aitkin, 1999). Deviations from normality may strongly affect the results. With categorical and count outcomes, maximum likelihood leads to the use of numerical integration, which is computationally heavy and becomes intractable when the number of random effects is large. Numerical integration uses fixed quadrature points and weights according to a normal distribution.
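To make the role of fixed quadrature points concrete, the following sketch approximates the marginal probability of a binary response pattern under a normally distributed random intercept using Gauss-Hermite quadrature from NumPy. It is a simplified illustration, assuming a single random intercept and a fixed slope; the function name and all numeric values are hypothetical, and this is not the integration scheme of any particular program.

```python
import numpy as np

def marginal_pattern_prob(u, time_scores, mean, var, slope, n_points=15):
    """Approximate the integral of prod_j Pr(U_j = u_j | eta0) against N(eta0; mean, var).

    Gauss-Hermite nodes x_q and weights w_q integrate against exp(-x^2), so the
    nodes are rescaled to mean + sqrt(2 * var) * x_q and the weights divided by sqrt(pi).
    """
    u = np.asarray(u)
    time_scores = np.asarray(time_scores, dtype=float)
    x, w = np.polynomial.hermite.hermgauss(n_points)
    eta0 = mean + np.sqrt(2.0 * var) * x           # fixed points on the eta0 scale
    weights = w / np.sqrt(np.pi)                   # fixed weights, summing to 1
    logits = eta0[:, None] + slope * time_scores[None, :]
    p = 1.0 / (1.0 + np.exp(-logits))
    conditional = np.prod(np.where(u[None, :] == 1, p, 1.0 - p), axis=1)
    return float(np.sum(weights * conditional))

# Hypothetical example: three binary items observed at time scores 0, 0.5, 1.
prob = marginal_pattern_prob([1, 1, 0], [0.0, 0.5, 1.0], mean=0.5, var=1.2, slope=-0.3)
```

With several random effects the grid of points grows as the product of the per-dimension points, which is the computational burden noted above. In the non-parametric alternative the points and weights are themselves estimated rather than fixed.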
A non-parametric approach instead considers a discretized distribution, estimating the points and the weights using a finite-mixture model. The latent class means are the points and the class probabilities are the weights. In this way, the non-parametric approach both avoids the normality specification and is computationally less demanding.

In this section we describe and compare the general parametric and non-parametric random-effect models. Both of these models are special cases of the general model described in equations (6.8)-(6.12). Both of these modeling alternatives attempt to capture cluster-specific effects. The difference between the two models is the underlying assumption for the distributions of the cluster-specific random effects. In the parametric model the random effects are assumed to have a conditionally normal distribution, that is, the conditional distribution of the random effects, given all covariates, is assumed to be normal. In the non-parametric model the random effects are assumed to have a non-parametric conditional distribution.

The parametric random-effect model is well established and frequently used in practice. Butler and Louis (1992) show that the normality assumption in the parametric model does not affect the fixed slopes in the model. Verbeke and Lesaffre (1996) show that more accurate estimates can be obtained for the random effects if a non-normal distribution is estimated. Aitkin (1999) gives the general modeling approach to the non-parametric random-effect models that we follow here. First we give the complete description of the two modeling alternatives and show how they fit in the general modeling framework (6.8)-(6.12).

Parametric random-effect model

This model is a special case of model (6.8)-(6.12) for the case of no categorical latent variables. The within-level model is given by

$$Y_{ki} = \nu_k + \Lambda_k \eta_{ki} + K_k X_{ki} + \epsilon_{ki}, \tag{6.17}$$
$$\eta_{ki} = \alpha + B_k \eta_{ki} + \Gamma_k X_{ki} + \zeta_{ki}. \tag{6.18}$$

The coefficients $\nu_k$, $\Lambda_k$, $K_k$, $B_k$, and $\Gamma_k$ can be either fixed coefficients that are the same across clusters or random effects that vary across clusters. Let $\eta_k$ represent the vector of all such random effects. The between-level model is described by

$$\eta_k = \mu + B \eta_k + \Gamma X_k + \zeta_k. \tag{6.19}$$

The random-effect residuals $\zeta_{ki}$ and $\zeta_k$ are assumed normally distributed. This assumption is the difference between the parametric and the non-parametric model. Note that the distributional assumption for $\epsilon_{ki}$ is determined by the type of observed variable we model.

Non-parametric random-effect model

This model is a special case of model (6.8)-(6.12) where the random effects $\eta_{ki}$ and $\eta_k$ do not have normally distributed residuals $\zeta_{ki}$ and $\zeta_k$. The within-level model is given by

$$Y_{ki} = \nu_k + \Lambda_k \eta_{ki} + K_k X_{ki} + \epsilon_{ki},$$
$$\eta_{ki \mid C_{ki}=c} = \alpha_c + B_k \eta_{ki} + \Gamma_k X_{ki}, \tag{6.20}$$
$$\Pr(C_{ki} = c) = p_c, \tag{6.21}$$

where $p_c$ are parameters to be estimated. The coefficients $\nu_k$, $\Lambda_k$, $K_k$, $B_k$, and $\Gamma_k$ can again be either fixed coefficients or random effects. Let $\eta_k$ represent the vector of all random effects. The between-level model is given by

$$\eta_{k \mid D_k=d} = \mu_d + B \eta_k + \Gamma X_k, \tag{6.22}$$
$$\Pr(D_k = d) = q_d, \tag{6.23}$$

where $q_d$ are parameters to be estimated. The random-effect model (6.20)-(6.23) can alternatively be presented as in equations (6.18)-(6.19), considering the mixture across classes,

$$\eta_{ki} = \alpha + B_k \eta_{ki} + \Gamma_k X_{ki} + \zeta_{ki},$$
$$\eta_k = \mu + B \eta_k + \Gamma X_k + \zeta_k,$$

where $\alpha = \sum_c \alpha_c p_c$, $\mu = \sum_d \mu_d q_d$, and $\zeta_{ki}$ and $\zeta_k$ are non-parametric zero-mean residuals that are freely estimated. The residual $\zeta_{ki}$ takes the values $\alpha_c - \alpha$ with probability $p_c$ and the residual $\zeta_k$ takes the values $\mu_d - \mu$ with probability $q_d$.
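The equivalence between the class-specific form and the residual form can be checked numerically for a single random effect. The sketch below uses hypothetical points and weights (they are not estimates from any analysis in this chapter) to compute the mixture mean and the support of the zero-mean residual.

```python
def mixture_mean(points, probs):
    """alpha = sum_c alpha_c * p_c for a discrete random-effect distribution."""
    return sum(a * p for a, p in zip(points, probs))

def residual_support(points, probs):
    """Values alpha_c - alpha taken by the residual, paired with their probabilities p_c."""
    alpha = mixture_mean(points, probs)
    return [(a - alpha, p) for a, p in zip(points, probs)]

# Hypothetical three-class representation of one random effect.
alpha_c = [-1.0, 0.5, 2.0]   # class means (the points)
p_c = [0.2, 0.5, 0.3]        # class probabilities (the weights)

alpha = mixture_mean(alpha_c, p_c)
zeta = residual_support(alpha_c, p_c)
residual_mean = sum(value * prob for value, prob in zeta)  # zero up to rounding error
```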
The variance and covariance for the non-parametric effects can also be computed; for example, the variance of $\zeta_{ki}$ is $\sum_c p_c (\alpha_c - \alpha)(\alpha_c - \alpha)^{\top}$.

Simulation study

A simulation study is conducted to compare the performance of the parametric and non-parametric random-effect models for data generated with non-normal random effects. Consider a logistic growth model with 10 binary items $U_1, \ldots, U_{10}$,

$$\log\left[\frac{\Pr(U_{ij} = 1)}{\Pr(U_{ij} = 0)}\right] = \eta_{0i} + \eta_{1i} a_{ij}, \tag{6.24}$$

where the time scores are $a_{ij} = (j - 1)/2$, and $\eta_0$ and $\eta_1$ are non-normal random effects. Generation of $\eta_0$ and $\eta_1$ used the following finite mixture of normal distributions:

$$0.67\, N(\mu_1, \sigma^2) + 0.09\, N(\mu_2, \sigma^2) + 0.24\, N(\mu_3, \sigma^2).$$

To generate $\eta_0$, the following parameters were used: $\mu_1 = 2$, $\mu_2 = 1$, $\mu_3 = 0$, and $\sigma = 0.4$. To generate $\eta_1$, the following parameters were used: $\mu_1 = 0.3$, $\mu_2 = 0.4$, $\mu_3 = 1$, and $\sigma = 0.1$. From these values, 100 samples of size 2000 were generated according to the model (6.24). The data were analyzed using the parametric linear model (PM) and the non-parametric linear model (NPM). PM is a conventional single-class growth model with random normal effects as in (6.17)-(6.18). Drawing on (6.20) and (6.22), the NPM is expressed as

$$Y_{ij} = \eta_{0i} + \eta_{1i} a_{ij} + \epsilon_{ij},$$
$$\eta_{0i \mid C_i=c} = \alpha_{0c},$$
$$\eta_{1i \mid C_i=c} = \alpha_{1c},$$

so that the random effects are represented by a mixture distribution. A more general form would allow within-class variation for residuals $\zeta$ as in (6.9).

The parameter estimates are summarized in Table 6.2. The means of $\eta_0$ and $\eta_1$ are denoted by $m_0$ and $m_1$ and the variances by $v_0$ and $v_1$. The covariance of $\eta_0$ and $\eta_1$ is denoted by $\rho$. The results are presented for the non-parametric model with three nodes, since three nodes were determined to be sufficient for most replications using the McLachlan and Peel (2000) parametric likelihood ratio test. The estimates on which Table 6.2 is based are computed for the mixture over the three classes in line with (6.20) and (6.22). The results in Table 6.2 clearly indicate the advantages of the NPM method. The NPM parameter estimates have substantially smaller bias and smaller mean squared error (MSE) for several parameters.

Table 6.2 Comparing the parametric (PM) and non-parametric (NPM) random-effect models

Parameter   True value   PM bias   NPM bias   PM MSE   NPM MSE
m_0         …            …         …          …        …
m_1         …            …         …          …        …
v_0         …            …         …          …        …
v_1         …            …         …          …        …
ρ           …            …         …          …        …

In general it is difficult to evaluate model fit for random-effect models. There is no general unrestricted model which can be used for comparison. In this simulated example, however, there is such a model, namely, the completely unrestricted contingency table for the binary items. In addition, the Pearson chi-square test can be used to test the fit of the model. The data were generated according to a linear growth model with non-normal random effects; that is, the true model is a linear random-effect growth model.
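The data-generating step of the simulation can be sketched with NumPy as follows. The mixture weights, the η0 parameters, and the time scores are taken from the text as printed. The assumption that η0 and η1 share the same mixture component for a given individual is made here for illustration and is not stated in the chapter, and the function names and seed are hypothetical.

```python
import numpy as np

WEIGHTS = np.array([0.67, 0.09, 0.24])  # mixture weights as printed in the text

def mixture_moments(means, sd, weights=WEIGHTS):
    """Mean and variance of a normal mixture with common standard deviation sd."""
    means = np.asarray(means, dtype=float)
    m = float(np.sum(weights * means))
    v = float(sd ** 2 + np.sum(weights * (means - m) ** 2))
    return m, v

def simulate_sample(n, means0, sd0, means1, sd1, n_items=10, seed=0):
    """One replication of binary growth data under the logistic model (6.24).

    Assumption of this sketch: eta0 and eta1 are drawn from the same mixture
    component for each individual.
    """
    rng = np.random.default_rng(seed)
    component = rng.choice(len(WEIGHTS), size=n, p=WEIGHTS)
    eta0 = rng.normal(np.asarray(means0, dtype=float)[component], sd0)
    eta1 = rng.normal(np.asarray(means1, dtype=float)[component], sd1)
    a = np.arange(n_items) / 2.0                   # a_ij = (j - 1) / 2 for j = 1..n_items
    logits = eta0[:, None] + eta1[:, None] * a[None, :]
    prob = 1.0 / (1.0 + np.exp(-logits))
    u = (rng.random((n, n_items)) < prob).astype(int)
    return u, eta0, eta1

# Moments of eta0 implied by its generating mixture (mu = 2, 1, 0; sigma = 0.4).
m0, v0 = mixture_moments([2.0, 1.0, 0.0], 0.4)

# One sample of size 2000, with the eta1 parameters passed as printed in the text.
u, eta0, eta1 = simulate_sample(2000, [2.0, 1.0, 0.0], 0.4, [0.3, 0.4, 1.0], 0.1)
```

Repeating `simulate_sample` with 100 different seeds would mirror the 100 replications described above; fitting the PM and NPM models to each sample is outside the scope of this sketch.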
Both the PM and NPM models are linear random-effect growth models but are based on different assumptions on the distribution of the random effects; neither assumption specifies the true random-effect distribution. This situation is typical in practical applications, where the true random-effect distribution is unknown and the modeling assumptions are likely to deviate from the true distribution to some extent. It is assumed that the distributional misspecification will not interfere with the basic structure of the model and that the estimated model will provide a good fit for the data despite the distributional misspecifications. The Pearson test of fit can be used to directly compare the sensitivity of the PM and NPM models. If the Pearson test rejects the model, one concludes that the model fit is poor. In practical applications the lack of model fit could incorrectly be interpreted as evidence for deficiency in the linear growth structure of the model rather than as possible misspecification in the random-effect distribution. In the current simulated example one wants the Pearson test to reject the model no more than the nominal 5% of the time. Table 6.3 contains the Pearson test of fit results for the PM linear growth model as well as the NPM linear growth model using 3, 4, 5, and 6 nodes. The rejection rate in Table 6.3 is the percentage of times the linear growth model was rejected incorrectly. Also presented are the average test statistic value and the degrees of freedom. These two values should generally be close because the expected value of the chi-square distribution is equal to the degrees of freedom. The parametric approach


More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, )

Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, ) Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, 302-308) Consider data in which multiple outcomes are collected for

More information

Ron Heck, Fall Week 3: Notes Building a Two-Level Model

Ron Heck, Fall Week 3: Notes Building a Two-Level Model Ron Heck, Fall 2011 1 EDEP 768E: Seminar on Multilevel Modeling rev. 9/6/2011@11:27pm Week 3: Notes Building a Two-Level Model We will build a model to explain student math achievement using student-level

More information

Multilevel Mixture with Known Mixing Proportions: Applications to School and Individual Level Overweight and Obesity Data from Birmingham, England

Multilevel Mixture with Known Mixing Proportions: Applications to School and Individual Level Overweight and Obesity Data from Birmingham, England 1 Multilevel Mixture with Known Mixing Proportions: Applications to School and Individual Level Overweight and Obesity Data from Birmingham, England By Shakir Hussain 1 and Ghazi Shukur 1 School of health

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Latent Class Analysis

Latent Class Analysis Latent Class Analysis Karen Bandeen-Roche October 27, 2016 Objectives For you to leave here knowing When is latent class analysis (LCA) model useful? What is the LCA model its underlying assumptions? How

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Review of Panel Data Model Types Next Steps. Panel GLMs. Department of Political Science and Government Aarhus University.

Review of Panel Data Model Types Next Steps. Panel GLMs. Department of Political Science and Government Aarhus University. Panel GLMs Department of Political Science and Government Aarhus University May 12, 2015 1 Review of Panel Data 2 Model Types 3 Review and Looking Forward 1 Review of Panel Data 2 Model Types 3 Review

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Latent class analysis and finite mixture models with Stata

Latent class analysis and finite mixture models with Stata Latent class analysis and finite mixture models with Stata Isabel Canette Principal Mathematician and Statistician StataCorp LLC 2017 Stata Users Group Meeting Madrid, October 19th, 2017 Introduction Latent

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

Longitudinal Nested Compliance Class Model in the Presence of Time-Varying Noncompliance

Longitudinal Nested Compliance Class Model in the Presence of Time-Varying Noncompliance Longitudinal Nested Compliance Class Model in the Presence of Time-Varying Noncompliance Julia Y. Lin Thomas R. Ten Have Michael R. Elliott Julia Y. Lin is a doctoral candidate (E-mail: jlin@cceb.med.upenn.edu),

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

CHAPTER 3. SPECIALIZED EXTENSIONS

CHAPTER 3. SPECIALIZED EXTENSIONS 03-Preacher-45609:03-Preacher-45609.qxd 6/3/2008 3:36 PM Page 57 CHAPTER 3. SPECIALIZED EXTENSIONS We have by no means exhausted the possibilities of LGM with the examples presented thus far. As scientific

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Latent Variable Centering of Predictors and Mediators in Multilevel and Time-Series Models

Latent Variable Centering of Predictors and Mediators in Multilevel and Time-Series Models Latent Variable Centering of Predictors and Mediators in Multilevel and Time-Series Models Tihomir Asparouhov and Bengt Muthén August 5, 2018 Abstract We discuss different methods for centering a predictor

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Bayesian Mixture Modeling

Bayesian Mixture Modeling University of California, Merced July 21, 2014 Mplus Users Meeting, Utrecht Organization of the Talk Organization s modeling estimation framework Motivating examples duce the basic LCA model Illustrated

More information

Logistic And Probit Regression

Logistic And Probit Regression Further Readings On Multilevel Regression Analysis Ludtke Marsh, Robitzsch, Trautwein, Asparouhov, Muthen (27). Analysis of group level effects using multilevel modeling: Probing a latent covariate approach.

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Strati cation in Multivariate Modeling

Strati cation in Multivariate Modeling Strati cation in Multivariate Modeling Tihomir Asparouhov Muthen & Muthen Mplus Web Notes: No. 9 Version 2, December 16, 2004 1 The author is thankful to Bengt Muthen for his guidance, to Linda Muthen

More information

Time-Invariant Predictors in Longitudinal Models

Time-Invariant Predictors in Longitudinal Models Time-Invariant Predictors in Longitudinal Models Today s Class (or 3): Summary of steps in building unconditional models for time What happens to missing predictors Effects of time-invariant predictors

More information

Multilevel Modeling: A Second Course

Multilevel Modeling: A Second Course Multilevel Modeling: A Second Course Kristopher Preacher, Ph.D. Upcoming Seminar: February 2-3, 2017, Ft. Myers, Florida What this workshop will accomplish I will review the basics of multilevel modeling

More information

Variable-Specific Entropy Contribution

Variable-Specific Entropy Contribution Variable-Specific Entropy Contribution Tihomir Asparouhov and Bengt Muthén June 19, 2018 In latent class analysis it is useful to evaluate a measurement instrument in terms of how well it identifies the

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Mplus Short Courses Topic 4. Growth Modeling With Latent Variables Using Mplus: Advanced Growth Models, Survival Analysis, And Missing Data

Mplus Short Courses Topic 4. Growth Modeling With Latent Variables Using Mplus: Advanced Growth Models, Survival Analysis, And Missing Data Mplus Short Courses Topic 4 Growth Modeling With Latent Variables Using Mplus: Advanced Growth Models, Survival Analysis, And Missing Data Linda K. Muthén Bengt Muthén Copyright 2010 Muthén & Muthén www.statmodel.com

More information

Sample Size and Power Considerations for Longitudinal Studies

Sample Size and Power Considerations for Longitudinal Studies Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous

More information

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University July 31, 2018

More information

Testing the Limits of Latent Class Analysis. Ingrid Carlson Wurpts

Testing the Limits of Latent Class Analysis. Ingrid Carlson Wurpts Testing the Limits of Latent Class Analysis by Ingrid Carlson Wurpts A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Arts Approved April 2012 by the Graduate Supervisory

More information

Glossary for the Triola Statistics Series

Glossary for the Triola Statistics Series Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling

More information

Joint Modeling of Longitudinal Item Response Data and Survival

Joint Modeling of Longitudinal Item Response Data and Survival Joint Modeling of Longitudinal Item Response Data and Survival Jean-Paul Fox University of Twente Department of Research Methodology, Measurement and Data Analysis Faculty of Behavioural Sciences Enschede,

More information

SEM for Categorical Outcomes

SEM for Categorical Outcomes This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models

Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models Massimiliano Bratti & Alfonso Miranda In many fields of applied work researchers need to model an

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts A Study of Statistical Power and Type I Errors in Testing a Factor Analytic Model for Group Differences in Regression Intercepts by Margarita Olivera Aguilar A Thesis Presented in Partial Fulfillment of

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work

More information

Sampling bias in logistic models

Sampling bias in logistic models Sampling bias in logistic models Department of Statistics University of Chicago University of Wisconsin Oct 24, 2007 www.stat.uchicago.edu/~pmcc/reports/bias.pdf Outline Conventional regression models

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Latent class analysis with multiple latent group variables

Latent class analysis with multiple latent group variables Communications for Statistical Applications and Methods 2017 Vol. 24 No. 2 173 191 https://doi.org/10.5351/csam.2017.24.2.173 Print ISSN 2287-7843 / Online ISSN 2383-4757 Latent class analysis with multiple

More information

Department of Social Systems and Management. Discussion Paper Series. Bayesian Analysis of the Latent Growth Model with Dropout

Department of Social Systems and Management. Discussion Paper Series. Bayesian Analysis of the Latent Growth Model with Dropout Department of Social Systems and Management Discussion Paper Series No.1259 Bayesian Analysis of the Latent Growth Model with Dropout by Daisuke TANAKA and Yuichiro KANAZAWA April 2010 UNIVERSITY OF TSUKUBA

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference

The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference An application to longitudinal modeling Brianna Heggeseth with Nicholas Jewell Department of Statistics

More information

Selection on Observables: Propensity Score Matching.

Selection on Observables: Propensity Score Matching. Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

Ronald Heck Week 14 1 EDEP 768E: Seminar in Categorical Data Modeling (F2012) Nov. 17, 2012

Ronald Heck Week 14 1 EDEP 768E: Seminar in Categorical Data Modeling (F2012) Nov. 17, 2012 Ronald Heck Week 14 1 From Single Level to Multilevel Categorical Models This week we develop a two-level model to examine the event probability for an ordinal response variable with three categories (persist

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Michael J. Daniels and Chenguang Wang Jan. 18, 2009 First, we would like to thank Joe and Geert for a carefully

More information

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

LONGITUDINAL STUDIES OF ACHIEVEMENT GROWTH USING LATENT VARIABLE MODELING

LONGITUDINAL STUDIES OF ACHIEVEMENT GROWTH USING LATENT VARIABLE MODELING LONGITUDINAL STUDIES OF ACHIEVEMENT GROWTH USING LATENT VARIABLE MODELING BENGT O. MUTHI~N UNIVERSITY OF CALIFORNIA, LOS ANGELES SIEK-TOON KHO0 ARIZONA STATE UNIVERSITY ABSTRACT: This article gives a pedagogical

More information

Generalization to Multi-Class and Continuous Responses. STA Data Mining I

Generalization to Multi-Class and Continuous Responses. STA Data Mining I Generalization to Multi-Class and Continuous Responses STA 5703 - Data Mining I 1. Categorical Responses (a) Splitting Criterion Outline Goodness-of-split Criterion Chi-square Tests and Twoing Rule (b)

More information