SESSION 2 ASSIGNED READING MATERIALS


Introduction to Latent Class Modeling using Latent GOLD

SESSION 2

Copyright 2012 by Statistical Innovations Inc. All rights reserved. No part of this material may be reproduced or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission from Statistical Innovations Inc.

LATENT CLASS MODELS

Jay Magidson, Statistical Innovations Inc.
Jeroen K. Vermunt, Tilburg University

Address of correspondence: Jay Magidson, Statistical Innovations Inc., Suite Concord Ave., Belmont MA; phone: (617); fax: (617)

Assigned Reading: Sage A:

4. OTHER TYPES OF LATENT CLASS MODELS

Thus far, we have focused on the traditional LC modeling approach, including some important extensions such as covariates, several latent variables, and local dependencies. Some common characteristics of these models are that they serve as scaling methods or tools for dealing with measurement error, that indicators are nominal or ordinal, and that local independence between indicators is the primary model assumption.

In this section, we discuss other types of LC models. They are used not as scaling tools, but as clustering methods, tools for dealing with unobserved heterogeneity, density estimation methods, or random-coefficients models (McLachlan and Peel, 2000). Moreover, indicators or dependent variables can be of scale types other than nominal or ordinal, and local independence is no longer the basic model assumption. As we will see, in some cases there is only one indicator or dependent variable.

The next section presents simple mixture models for univariate distributions, with examples of mixtures of normals and mixtures of Poisson distributions. Then, we extend this basic model by including predictors, yielding what is called mixture regression or LC regression models. We present an example of a mixed linear regression model, and show how the method can deal with various types of repeated measurements. Special attention is given to the relationship with hierarchical or multilevel models. Then, we present another extension of the simple mixture model; that is, a mixture model for multivariate distributions. As will be shown, the resulting LC model can be seen as a model-based alternative to standard clustering

methods like K-means. We end with a short overview of LC methods that were not discussed in detail.

4.1 Simple Mixture Models

Figure 3: Simulated Distribution from a Mixture of Two Normals (histogram of Y; Mean = 1.16, Std. Dev = 2.14)

Consider the histogram depicted in Figure 3. This generated data set of 1,000 cases is obtained from a population consisting of a mixture of two normal distributions. For 60 percent of the population, the variable of interest follows a normal distribution with a mean of 0 and a variance of 1, N(0,1); for the other 40 percent, the mean

equals 3 and the variance 4, N(3,4). The normal curve that is drawn through the histogram shows that the resulting mixture is clearly not normally distributed. A model that can be used to describe such a phenomenon is a finite mixture model (Everitt and Hand, 1981; McLachlan and Peel, 2000), which is a particular kind of LC model. The basic formula for a mixture of univariate distributions is

$$f(y \mid \theta) = \sum_{t=1}^{T} \pi_t \, f_t(y \mid \theta_t). \qquad (4)$$

The left-hand side of Equation (4) indicates that we are interested in describing the distribution of a random variable $y$, which depends on a set of unknown parameters $\theta$. The right-hand side contains two terms: $\pi_t$ is the probability of belonging to latent class or mixture component $t$, and $f_t(y \mid \theta_t)$ is the distribution of $y$ within latent class $t$ given some unknown parameters $\theta_t$. The class-specific distribution of $y$ is assumed to belong to a particular parametric family. Depending on the scale type of $y$, this can, for instance, be a normal, Poisson, binomial, exponential, or gamma distribution. The summation on the right-hand side indicates that the distribution of $y$ is a weighted mean of the class-specific distributions, where the latent class proportions serve as weights.

Mixture models like these have two important types of applications. The first is density estimation: complicated distributions can be approximated by a mixture of simple parametric distributions. Another important application type is

clustering, in which case the class-specific parameters are used to define the clusters and the posterior membership probabilities are used to classify cases into the most appropriate cluster.

Table 9: Test Results for Generated Mixture of Normals Data
Columns: Model; Log-Likelihood; BIC(LL); Number of Parameters
Rows: Equal variances (1-Class and larger models); Unequal variances (2-Class and larger models)

Table 9 presents test results for various models fitted to the data depicted in Figure 3. We estimated 1- to 4-class mixtures of normal distributions with equal and unequal within-cluster variances. As can be seen, the BIC measure identifies the correct model, the two-cluster model with unequal within-cluster variances, as best. The three-class model with equal within-cluster variances fits almost as well, showing that a simpler parametric form can sometimes be compensated by a larger number of mixture components. In the two-class model with unequal variances, the estimated probability of belonging to class one is .64. This class has an estimated mean of and a variance of The mean and variance of the other class equals 3.24 and Note that these estimates are close to the population values we used to generate this data set.
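The fitting exercise behind Table 9 can be sketched in a few lines. The following is a minimal illustration, assuming a plain EM algorithm and the common definition BIC = -2 LL + (number of parameters) x ln(N); it is not Latent GOLD's estimation routine, and the function names (`simulate_mixture`, `fit_normal_mixture`) are hypothetical. It draws 1,000 cases from the 0.6 N(0,1) + 0.4 N(3,4) population described above and compares models by BIC.

```python
import numpy as np

def normal_pdf(y, mean, var):
    """Normal density, parameterized by variance as in the text."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def simulate_mixture(n, weights, means, variances, seed=0):
    """Draw n cases: first a latent class, then y from that class's normal."""
    rng = np.random.default_rng(seed)
    cls = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means)[cls], np.sqrt(np.asarray(variances)[cls]))

def fit_normal_mixture(y, n_classes, equal_var=False, n_iter=500, seed=0):
    """EM for Equation (4) with normal class-specific densities (a sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    rng = np.random.default_rng(seed)
    pi = np.full(n_classes, 1.0 / n_classes)
    mu = rng.choice(y, size=n_classes, replace=False)  # random starting means
    var = np.full(n_classes, y.var())
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, shape (n, n_classes)
        dens = pi * normal_pdf(y[:, None], mu, var)
        post = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted proportions, means, and variances
        nk = post.sum(axis=0)
        pi = nk / n
        mu = (post * y[:, None]).sum(axis=0) / nk
        sq = post * (y[:, None] - mu) ** 2
        var = np.full(n_classes, sq.sum() / n) if equal_var else sq.sum(axis=0) / nk
        var = np.maximum(var, 1e-6)  # guard against degenerate components
    loglik = np.log((pi * normal_pdf(y[:, None], mu, var)).sum(axis=1)).sum()
    n_par = (n_classes - 1) + n_classes + (1 if equal_var else n_classes)
    bic = -2.0 * loglik + n_par * np.log(n)
    return {"pi": pi, "mu": mu, "var": var, "loglik": loglik, "n_par": n_par, "bic": bic}

y = simulate_mixture(1000, [0.6, 0.4], [0.0, 3.0], [1.0, 4.0], seed=1)
for k, eq in [(1, True), (2, True), (3, True), (2, False)]:
    fit = fit_normal_mixture(y, k, equal_var=eq)
    print(k, "equal" if eq else "unequal", round(fit["loglik"], 1), round(fit["bic"], 1))
```

Because EM only finds a local maximum, a more careful analysis would rerun each model from several starting values and keep the best log-likelihood.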

Table 10: Observed and Estimated Frequency Distribution of Packs of Hard-Candy Purchased During Last 7 Days under the 1-Class and 3-Class Poisson Model
Columns: Number of Packages; Observed Frequencies; 1-Class Model; 3-Class Model

Table 10 provides a data set taken from Dillon and Kumar (1994) that we will use as a second example. It gives the observed frequency distribution of the number of packs of hard-candy consumed by 456 respondents during the 7 days prior to the survey. Because the outcome variable is a count without a fixed maximum, it is most natural to assume that it follows a Poisson distribution. The table also reports the estimated frequency distribution obtained with a standard, or 1-class, Poisson model, as well as with a 3-class mixture Poisson model. As can be seen, the standard Poisson

model does not fit the empirical distribution at all, while the 3-class Poisson describes the data almost perfectly. This shows that a mixture of simple parametric distributions can be used to describe a quite complicated empirical distribution.

Test results obtained when applying mixture Poisson models to the hard-candy data set show that models with 2 and 3 mixture components perform much better than the standard Poisson model. As is typical, there is a saturation point at which increasing the number of classes no longer increases the log-likelihood function: in this case, it occurs at 4 classes. The 3-cluster solution is the one that is preferred according to the BIC criterion. The estimated latent class proportions in the 3-class model are 0.54, 0.28, and 0.18, and the Poisson rates are 3.48, 0.29, and This means that we identified a small cluster of heavy users (more than 11 packs in 7 days), a cluster containing slightly more than a quarter of the respondents with almost no usage, and a large group of moderate users.
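To see how estimated frequencies such as those in Table 10 follow from class proportions and Poisson rates, the sketch below evaluates a mixture Poisson distribution and scales it by the sample size of 456. The proportions and the first two rates are the ones quoted above; the third rate is not legible in this copy, so the value 12.0 used here is purely an assumption chosen to be consistent with "more than 11 packs in 7 days". The function names are hypothetical.

```python
import math

def poisson_pmf(k, rate):
    """P(Y = k) for a Poisson distribution with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def mixture_poisson_pmf(k, proportions, rates):
    """Equation (4) with Poisson class-specific distributions."""
    return sum(p * poisson_pmf(k, r) for p, r in zip(proportions, rates))

def expected_frequencies(n_cases, max_count, proportions, rates):
    """Expected number of respondents at each count 0..max_count."""
    return [n_cases * mixture_poisson_pmf(k, proportions, rates)
            for k in range(max_count + 1)]

proportions = [0.54, 0.28, 0.18]   # from the text
rates = [3.48, 0.29, 12.0]         # third rate is an ASSUMED placeholder

for k, freq in enumerate(expected_frequencies(456, 20, proportions, rates)):
    print(k, round(freq, 1))
```

Replacing `proportions` with `[1.0]` and `rates` with the sample mean gives the corresponding 1-class (standard Poisson) frequencies for comparison.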

Assigned Reading: Sage B:

4.2 LC Regression Models

In the simple mixture models discussed above, it was assumed that the mean of the chosen parametric distribution differs across latent classes. This can also be expressed by specifying a linear regression model for the mean of the distribution of interest, $\mu_t$, after applying some transformation or link function $g(\cdot)$ that depends on the scale type of the $y$ variable. For the mean of a binomial or multinomial distribution, we use a logit transformation; for a Poisson mean, a log transformation; and for a normal mean, no transformation or an identity link. The regression model has the form

$$g(\mu_t) = \beta_{0t}.$$

As can be seen, this regression model contains only an intercept and this intercept is class-specific.

Let $w$ denote a set of predictors or explanatory variables. Suppose we are no longer interested in the unconditional distribution of $y$, but in the conditional distribution of $y$ given $w$, $f(y \mid w, \theta_t)$. A natural way to express the dependency of $y$ on $w$ is by the inclusion of the set of predictors $w$ in the right-hand side of the regression equation. In the case of a single predictor $w$, the resulting LC regression model (Wedel and DeSarbo, 1994) has the form

$$g(\mu_t) = \beta_{0t} + \beta_{1t} w,$$

where $\beta_{0t}$ and $\beta_{1t}$ are the class-specific regression coefficients.
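The role of the link function can be made concrete with a short sketch. It computes class-specific means by applying the inverse link to the class-specific linear predictor, for the three links named above. The function name `class_specific_means` and the coefficient values in the usage lines are illustrative assumptions.

```python
import numpy as np

# Inverse links: map the linear predictor back to the scale of the mean.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                      # normal mean
    "log": np.exp,                                    # Poisson mean
    "logit": lambda eta: 1.0 / (1.0 + np.exp(-eta)),  # binomial probability
}

def class_specific_means(w, beta0, beta1, link):
    """Return mu[i, t] = g^{-1}(beta0[t] + beta1[t] * w[i]) for every case i and class t."""
    w = np.asarray(w, dtype=float)[:, None]           # shape (n_cases, 1)
    eta = np.asarray(beta0, dtype=float) + np.asarray(beta1, dtype=float) * w
    return INVERSE_LINKS[link](eta)

# Two classes with hypothetical coefficients, evaluated at w = 0 and w = 1.
print(class_specific_means([0.0, 1.0], [1.0, 0.0], [3.0, -1.0], "identity"))
print(class_specific_means([0.0, 1.0], [1.0, 0.0], [3.0, -1.0], "logit"))
```

Setting every entry of `beta1` to zero recovers the intercept-only model, in which classes differ only in their means.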

Figure 4: Simulated 2-Class LC Regression Model (plotted series: observed, YLC1, YLC2)

Figure 4 depicts a data set generated from a population consisting of two latent classes, with class-specific regression models equal to 1 3w and 0 1w. It also compares the estimated y values for the two-class model (YLC2) with the standard, 1-class, regression model (YLC1). As can be seen, the description given by the standard regression model is very poor compared to the 2-class model. The LC

regression modeling procedure has no problem identifying the two regression lines without prior knowledge of class membership.

In an LC regression model, the latent variable is a predictor that interacts with the observed predictors, which means that it serves as a moderator variable. Compared to a standard regression model where all predictors are observed, this basic LC regression model provides several useful functions. First, it can be used to weaken standard regression assumptions about the nature of the effects (linear, no interactions) and the error term (independent of predictors, particular distribution, homoskedastic). Second, it makes it possible to identify and correct for sources of unobserved heterogeneity. As explained below, this is especially useful in situations where there are repeated measurements or other types of dependent observations. Longitudinal data applications are sometimes referred to as LC or mixture growth models (each latent class has its own growth curve). Third, it can be used to detect outliers, since these are cases for which the primary regression model does not hold.

An important application area for LC regression modeling is clustering or segmentation (Wedel and Kamakura, 1998). In particular, ratings- and choice-based conjoint studies are designed to identify subgroups (segments) that react differently to product characteristics, which is the same as saying that these groups have different regression coefficients. This type of application is illustrated in more detail below with an empirical example.
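How two regression lines can be recovered without knowing class membership is illustrated by the sketch below, which fits a 2-class mixture of linear regressions by EM. Everything specific here is an assumption for illustration: the generating coefficients (intercepts 1 and 0, slopes 3 and -1), the error standard deviation, the starting-value heuristic (splitting cases by the sign of the 1-class residual), and the function names. It is not the procedure used to produce Figure 4.

```python
import numpy as np

def simulate_two_class_regression(n=500, seed=0):
    """Generate (w, y) from two latent classes with HYPOTHETICAL coefficients."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 2.0, size=n)
    cls = rng.integers(0, 2, size=n)
    intercepts = np.array([1.0, 0.0])   # assumed values
    slopes = np.array([3.0, -1.0])      # assumed values
    y = intercepts[cls] + slopes[cls] * w + rng.normal(0.0, 0.5, size=n)
    return w, y, cls

def fit_two_class_regression(w, y, n_iter=200):
    """EM for a 2-class LC regression with normal errors (a sketch)."""
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(w), w])
    # Start: split cases by the sign of the residual from the 1-class fit.
    pooled = np.linalg.lstsq(X, y, rcond=None)[0]
    above = (y - X @ pooled) > 0
    post = np.column_stack([above, ~above]).astype(float)
    for _ in range(n_iter):
        # M-step: class sizes and weighted least squares per class
        pi = post.mean(axis=0)
        beta = np.empty((2, 2))
        sigma2 = np.empty(2)
        for t in range(2):
            XtW = X.T * post[:, t]
            beta[t] = np.linalg.solve(XtW @ X, XtW @ y)
            sigma2[t] = (post[:, t] * (y - X @ beta[t]) ** 2).sum() / post[:, t].sum()
        # E-step: posterior class membership given the current lines
        resid = y[:, None] - X @ beta.T
        log_dens = np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2) - 0.5 * resid ** 2 / sigma2
        log_dens -= log_dens.max(axis=1, keepdims=True)
        post = np.exp(log_dens)
        post /= post.sum(axis=1, keepdims=True)
    return pi, beta, sigma2, post

w, y, _ = simulate_two_class_regression()
pi, beta, sigma2, post = fit_two_class_regression(w, y)
print("class sizes:", pi.round(2))
print("intercepts and slopes per class:", beta.round(2))
```

The posterior probabilities in `post` are what would be used to assign each case to a segment; cases that fit neither line well show up with small densities under both classes.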

Assigned Reading: Sage C:

Example: repeated measurements or clustered observations

As explained below, the LC regression model can be viewed as a random-coefficients model that, similar to multilevel or hierarchical models, can take dependencies between observations into account. This extends the application of LC regression models to situations with repeated measurements or other types of dependent observations.

We will illustrate LC regression with repeated measurements using an application to longitudinal survey data. This is, therefore, an example of an LC growth model. The data set consists of 264 participants in the 1983 to 1986 yearly waves of the British Social Attitudes Survey (McGrath and Waterton, 1986). The dependent variable is the number of yes responses on seven yes/no questions as to whether it is a woman's right to have an abortion under specific circumstances. Because this is a count variable with a fixed total, it is most natural to work with a logit link and binomial error function. The predictors that we used are the year of measurement (1=1983; 2=1984; 3=1985; 4=1986) and religion (1=Roman Catholic; 2=Protestant; 3=Other; 4=No religion). The effect of year of measurement is assumed to be class dependent and the effect of religion is assumed to be the same for all classes.

We estimated models with 1 to 5 classes, and the 4-class model turned out to perform best in terms of the BIC criterion. We also estimated more restricted models in which the time effect is assumed to be linear and/or the time effect is assumed to be class independent. These models did not describe the data as well as our four-class model, which indicates that the time trend is nonlinear and heterogeneous.
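The binomial-logit specification for a count with a fixed total of seven can be sketched as follows. All coefficient values below are hypothetical placeholders, not the estimates from this example, and the function names are assumptions; the point is only to show how a class-specific intercept and year effect combine with a class-independent religion effect.

```python
import numpy as np
from math import comb

N_QUESTIONS = 7  # fixed total of yes/no questions

def yes_probability(intercept, year_effect, religion_effect):
    """Inverse logit of the linear predictor: probability of a 'yes' on one question."""
    eta = intercept + year_effect + religion_effect
    return 1.0 / (1.0 + np.exp(-eta))

def binomial_pmf(count, p, n=N_QUESTIONS):
    """Probability of `count` yes responses out of n."""
    return comb(n, count) * p ** count * (1.0 - p) ** (n - count)

# HYPOTHETICAL parameter values for two classes and four yearly waves.
class_intercepts = [-0.5, 1.0]
class_year_effects = [[0.0, -0.1, 0.1, 0.2],
                      [0.0, 0.4, -0.6, 0.3]]
religion_effect = 0.3  # same in every class (class independent)

def response_distribution(t_class, year_index):
    """Distribution of the number of yes responses for one class in one year."""
    p = yes_probability(class_intercepts[t_class],
                        class_year_effects[t_class][year_index],
                        religion_effect)
    return [binomial_pmf(k, p) for k in range(N_QUESTIONS + 1)]

print([round(v, 3) for v in response_distribution(0, 0)])
```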

Table 11: Parameter Estimates for the Abortion Example
Columns: Parameter; Class 1; Class 2; Class 3; Class 4; Mean; Std.Dev.
Rows: Class size; Intercept; Year; Religion (Roman Catholic, Protestant, Other, No religion)

The parameters obtained with the 4-class model appear in Table 11. The parameter means across classes indicate that the attitudes are most positive at the last time point and most negative at the second time point. Furthermore, the effects of religion show that people without religion are most in favor and Roman Catholics and Others are most against abortion. Protestants have a position that is close to the no-religion group.

The class-specific parameters indicate that the 4 latent classes have very different intercepts and time patterns. The largest class, class 1, is most against abortion and class 3 is most in favor of abortion. Both latent classes are very stable over time. The overall level of latent class 2 is somewhat higher than that of class 1, and it shows somewhat more change in attitude over time. People belonging to latent class 4 are very unstable: at the first two time points they are similar to class 2, at the third time point to class 4, and at the last time point again to class 2 (this can be seen by combining the intercepts with the time effects). Class 4 could therefore be labeled as

random responders. It is interesting to note that in a three-class solution the random-responder class and class two are combined. Thus, by going from a three- to a four-class solution one identifies the interesting group with less stable attitudes.

Vermunt and Van Dijk (2001) used the same empirical example to illustrate the similarity between LC regression models and random-coefficients, multilevel, or hierarchical models. Using terminology from multilevel modeling, the time variable is a level-1 predictor and religion a level-2 predictor. The effect of the level-1 predictor time is allowed to vary across level-2 units, in this case individuals. The LC regression output can be transformed into the usual output produced by a standard multilevel or hierarchical model -- means, variances, covariances of the intercept and the three time effects -- by elementary statistical operations. The most important part of this multilevel output is what appears in the last two columns of Table 11.

A difference between LC regression analysis and standard hierarchical models is that the former does not make strong assumptions about the distribution of the random coefficients. LC regression models can, therefore, be seen as nonparametric hierarchical models in which the distribution of the random coefficients is approximated by a limited number of mass points (= latent classes). As shown by Vermunt and Van Dijk (2001), the LC approach has the practical advantage of being much less computationally intensive than parametric models, and substantively, easier-to-interpret results are often obtained.
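The "elementary statistical operations" mentioned above amount to treating the latent classes as mass points with the class sizes as weights. The sketch below computes the resulting means, covariance matrix, and standard deviations of the coefficients. The function name and the numbers in the usage lines are illustrative assumptions, not the estimates from Table 11.

```python
import numpy as np

def random_coefficient_moments(class_sizes, coefficients):
    """Means, covariances, and standard deviations of class-specific coefficients.

    class_sizes:  length-K sequence of class proportions (weights of the mass points)
    coefficients: K x P array, one row of regression coefficients per latent class
    """
    p = np.asarray(class_sizes, dtype=float)
    p = p / p.sum()                                # make sure the weights sum to 1
    B = np.asarray(coefficients, dtype=float)
    mean = p @ B                                   # weighted mean of each coefficient
    centered = B - mean
    cov = (centered * p[:, None]).T @ centered     # weighted covariance matrix
    return mean, cov, np.sqrt(np.diag(cov))

# Two hypothetical classes of equal size, with an intercept and one time effect.
mean, cov, sd = random_coefficient_moments([0.5, 0.5], [[0.0, 1.0], [2.0, 3.0]])
print(mean, sd)
print(cov)
```

The `mean` and `sd` vectors correspond to the kind of summary reported in the last two columns of a table such as Table 11, while the off-diagonal entries of `cov` give the covariances between random coefficients.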

REFERENCES

Agresti, A. (2002). Categorical Data Analysis. Second Edition. New York: Wiley.
Andrews, R.L., Ansari, A., and Currim, I.S. (2002). Hierarchical Bayes versus finite mixture conjoint analysis models: A comparison of fit, prediction, and partworth recovery. Journal of Marketing Research, 39.
Banfield, J.D., and Raftery, A.E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49.
Böckenholt, U. (2002). Comparison and choice: analyzing discrete preference data by latent class scaling models. J.A. Hagenaars and A.L. McCutcheon (eds.), Applied Latent Class Analysis. Cambridge University Press.
Clogg, C.C., and Goodman, L.A. (1984). Latent structure analysis of a set of multidimensional contingency tables. Journal of the American Statistical Association, 79.
Croon, M.A. (2002). Ordering the classes. J.A. Hagenaars and A.L. McCutcheon (eds.), Applied Latent Class Analysis. Cambridge University Press.
Dayton, C.M., and Macready, G.B. (1988). Concomitant-variable latent-class models. Journal of the American Statistical Association, 83.
Dayton, C.M. (1998). Latent Class Scaling Models. Thousand Oaks: Sage Publications.
Dillon, W.R., and Kumar, A. (1994). Latent structure and other mixture models in marketing: An integrative survey and overview. R.P. Bagozzi (ed.), Advanced Methods of Marketing Research. Cambridge: Blackwell Publishers.
Everitt, B.S., and Hand, D.J. (1981). Finite Mixture Distributions. London: Chapman and Hall.
Fraley, C., and Raftery, A.E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. Department of Statistics, University of Washington: Technical Report no. 329.
Formann, A.K. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87.
Goodman, L.A. (1974a). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61.
Goodman, L.A. (1974b). The analysis of systems of qualitative variables when some of the variables are unobservable. Part I: A modified latent structure approach. American Journal of Sociology, 79.
Haberman, S.J. (1979). Analysis of Qualitative Data, Vol 2, New Developments. New York: Academic Press.
Hagenaars, J.A. (1988). Latent structure models with direct effects between indicators: local dependence models. Sociological Methods and Research, 16.
Hagenaars, J.A. (1990). Categorical Longitudinal Data: Loglinear Analysis of Panel, Trend and Cohort Data. Newbury Park: Sage.
Heinen, T. (1996). Latent Class and Discrete Latent Trait Models: Similarities and Differences. Thousand Oaks: Sage Publications.
Hunt, L., and Jorgensen, M. (1999). Mixture model clustering using the MULTIMIX program. Australian and New Zealand Journal of Statistics, 41.
Jedidi, K., Jagpal, H.S., and DeSarbo, W.S. (1997). Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Marketing Science, 16.
Landis, J.R., and Koch, G.G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33.
Langeheine, R., and Van de Pol, F. (1994). Discrete-time mixed Markov latent class models. A. Dale and R.B. Davies (eds.), Analyzing Social and Political Change: A Casebook of Methods.
Langeheine, R., Pannekoek, J., and Van de Pol, F. (1996). Bootstrapping goodness-of-fit measures in categorical data analysis. Sociological Methods and Research, 24.
Lazarsfeld, P.F., and Henry, N.W. (1968). Latent Structure Analysis. Boston: Houghton Mifflin.
Magidson, J., and Vermunt, J.K. (2001). Latent class factor and cluster models, bi-plots and related graphical displays. Sociological Methodology, 31.
Magidson, J., and Vermunt, J.K. (2002). Latent class models for clustering: A comparison with K-means. Canadian Journal of Marketing Research, 20.
McCutcheon, A.L. (1987). Latent Class Analysis, Sage University Paper. Newbury Park: Sage Publications.
McGrath, K., and Waterton, J. (1986). British Social Attitudes, Panel Survey. London: Social and Community Planning Research, Technical Report.
McLachlan, G.J., and Peel, D. (2000). Finite Mixture Models. New York: Wiley.
Rindskopf, R., and Rindskopf, W. (1986). The value of latent class analysis in medical diagnosis. Statistics in Medicine, 5.
Uebersax, J.S. (1999). Probit latent class analysis with dichotomous or ordered category measures: conditional independence/dependence models. Applied Psychological Measurement, 23.
Uebersax, J.S., and Grove, W.M. (1990). Latent class analysis of diagnostic agreement. Statistics in Medicine, 9.
Van der Heijden, P.G.M., Gilula, Z., and Van der Ark, L.A. (1999). On a relationship between joint correspondence analysis and latent class analysis. Sociological Methodology, 29.
Vermunt, J.K. (1997). Log-linear Models for Event Histories. Thousand Oaks: Sage Publications.
Vermunt, J.K., and Magidson, J. (2000). Latent GOLD 2.0 User's Guide. Belmont, MA: Statistical Innovations Inc.
Vermunt, J.K., and Magidson, J. (2002). Latent class cluster analysis. J.A. Hagenaars and A.L. McCutcheon (eds.), Applied Latent Class Analysis. Cambridge University Press.
Vermunt, J.K., and Van Dijk, L. (2001). A nonparametric random-coefficients approach: the latent class regression model. Multilevel Modelling Newsletter, 13.
Wedel, M., and DeSarbo, W.S. (1994). A review of recent developments in latent class regression models. R.P. Bagozzi (ed.), Advanced Methods of Marketing Research. Cambridge: Blackwell Publishers.
Wedel, M., and Kamakura, W.A. (1998). Market Segmentation: Concepts and Methodological Foundations. Boston: Kluwer Academic Publishers.
Yung, Y.F. (1997). Finite mixtures in confirmatory factor-analysis models. Psychometrika, 62.

LATENT GOLD 4.0 USER'S GUIDE

Jeroen K. Vermunt & Jay Magidson
Statistical Innovations
Thinking outside the brackets! ™

CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT

Assigned Reading: LG User's Guide A:

LC REGRESSION MODEL

Various restrictions are available for intercepts and predictor effects. In addition, for models with continuous dependent variables, restrictions are available for error variances. The various restrictions include class independent restrictions, order restrictions, zero restrictions, fixed-value (offset) restrictions, and equality restrictions. By default, no restrictions are imposed.

Restrictions can be placed prior to estimating an initial model to incorporate prior knowledge. For example, the zero constraints make it possible to specify a different regression model -- with different predictors -- for each latent class based on a priori knowledge about the classes. A specific application for this is a model with a random responders class for which all predictor effects are zero. An application of the equality constraints is the possibility of defining a DFactor-like structure in which, for example, one DFactor influences the intercept and another the predictor effects. Ordering constraints are important if one has a priori knowledge on the direction of an effect. For example, the price effect on a product rating is usually assumed to be negative (or non-positive) in each latent class (segment). The estimates obtained with Regression will be constrained to be in agreement with this assumption if the price effect is specified to be Descending.

Alternatively, restrictions may be employed post hoc to estimate a new model after viewing the results from an estimated model.

For Regression, when K classes have been specified in Step 6, the Model Tab contains K individual columns, one for each of the K latent classes (1, 2, ..., K). Additional columns are labeled Class Independent and Order Restriction. When the range option is used in Step 6 to specify the estimation of models with different numbers of latent classes, class-specific columns are absent from the Model Tab and only the class independent and order restrictions can be applied. Such restrictions are applied to each of the models generated by the range option.

Figure: Model Tab for LC Regression Model.

Selected intercept and predictor effects can be set equal to zero for certain classes (No Effect), equated across two or more classes (Merge Effect), or equated across all classes (Class Independent). The effects of a predictor can be restricted to be ordered in each class (Ascending or Descending). Error variances can be equated between selected classes (Merge Effect) or between all classes (Class Independent). The effect of a numeric predictor on a dependent variable that has a scale type other than nominal can be fixed to 1 in one or more classes (Offset).

The first row consists of the label 'Intercept', followed by a separate row for each predictor. Restrictions can be set separately for each row. By default, no restrictions are imposed. This is indicated by the unique integers (1, 2, ..., K) that appear in each row, by the label 'No' indicating that the class independence restriction has not been imposed, and by the label 'None' indicating that no order restrictions have been selected for any predictors.

SPECIFYING EQUALITY RESTRICTIONS ACROSS CLASSES

The effects for a predictor may be specified as Class Independent (Fixed Effects) or Class Dependent (Random Effects, default). When the Class Independent restriction is applied, the regression coefficient for that predictor is restricted to be equal across all of the K latent classes (segments). When Class Independent is selected for a predictor, an '=' appears to the right of the predictor name in the Variables Tab. For a 1-class regression model (K=1), selection of this option has no effect.

To select this option, right click on the desired cell(s) in the column labeled 'Class Independent' to retrieve the pop-up menu.

Select 'Yes'. The 'No' changes to 'Yes' in the selected cells, and the indices in the selected rows all change to '1' to indicate that the effects for classes 2, 3, etc. are all restricted to be equal to the corresponding effects for class 1.

Alternatively, the restriction of class independence can be imposed as follows:

1. Select all the cells containing the indices for a chosen row.
2. Right click to bring up the pop-up menu.
3. Select 'Merge Effect'.

To undo these restrictions, reselect the cells, right click, and select 'Separate Effect'.

This alternative way of imposing the class independence restriction can also be used to specify equal effects between some but not all of the classes. Simply select those class indices for which the across-class equality restriction is desired and select 'Merge Effect'. The indices selected will change to be equal. 'Merge Effect' can be used more than once for a given row, so that it is possible to specify that certain effects for classes 1 and 2 be restricted to be equal and that the effects for classes 3 and 4 be equal. After imposing these restrictions, the associated indices will appear as ' '. Note: the numbers indicate to which class the effect for the class concerned is equated.

The regression intercept may also be restricted to be class independent. To set this option, right click on the row labeled 'Intercept' in the column labeled 'Class Independent' to retrieve the pop-up menu and select 'Yes'.

Note: For dependent variables having scale type 'Ordinal', a third option 'No - Simple' is also available. Rather than complete class independence ('Yes'), a 'simple' implementation of class independence is used, analogous to a class independent intercept specification in the linear regression model (continuous scale type), where the (partial) dependent variable means are taken to be class independent rather than the entire marginal dependent variable distribution. For further details of this 'class-independent simple' specification for the intercept in ordinal regression models, see section 5.3 of the Technical Guide.

SPECIFYING ZERO RESTRICTIONS

To restrict one or more effects to zero, select the desired effects, right click, and select 'No Effect'. This causes a '-' to appear in the selected cells, which indicates that the effect(s) associated with these cells are now restricted to zero.

These menu options can also be used in combination with each other to produce a desired result. For example, first selecting class independence and then selecting 'Separate Effect' in one of these cells causes the index for the selected cell to return to its default setting. However, the remaining cells in that row remain restricted to be equal, and 'Yes' automatically changes to 'No' in the Class Independent column.

SPECIFYING ORDER RESTRICTIONS

Order Restriction can be used to indicate that the regression coefficient is monotonically increasing (Ascending) or decreasing (Descending).

To impose an ordering restriction, select one or more cells containing a 'None' label and right click to retrieve the relevant pop-up menu. Select either Ascending or Descending to impose the desired ordering restriction on the selected predictor effect(s). This restriction is imposed across all classes. With a dependent variable of a scale type other than 'nominal', the usage and interpretation of this restriction is straightforward. In the case of a nominal dependent variable, the difference between parameters of adjacent categories of the dependent variable will be in agreement with the specified order restrictions (see section 5.3 of the Technical Guide).

SPECIFYING FIXED-VALUE RESTRICTIONS

With the Offset option one can indicate that the effect of a numeric predictor should be fixed to 1 in selected classes. The option is not available for Nominal dependent variables.

Note: It is possible to use the offset restriction to fix the effect to non-zero quantities other than 1 as well. For example, to restrict the effect of a numeric predictor to equal some desired constant c, you would first rescale that predictor by multiplying it by c. You would then use the rescaled version of the predictor in the model.

A specific application occurs in the case in which one would like to fix the response probability to 0 for certain predictor values for certain classes. Suppose that the dependent variable is 1='buy', 0='no buy' and that you suspect that one latent class exists that always responds 'no buy' when the predictor variables reflect a certain pattern (e.g., PRICE $50). This option could be used to identify this latent class by creating a dummy predictor variable, say coded [-100, 0], where the value -100 refers to a situation which is suspected to always result in a 'no buy' for some class (fixing a logit coefficient to -100 amounts to fixing the probability of a 1='buy' response to 0).
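The arithmetic behind the offset trick can be checked directly. The sketch below is not Latent GOLD syntax; it only evaluates a logit model in which a dummy predictor coded [-100, 0] enters with its coefficient fixed to 1, and shows the rescaling used to fix an effect to a constant c. The function names and all coefficient values are hypothetical.

```python
import numpy as np

def logistic(eta):
    """Inverse logit: probability of a 1='buy' response."""
    return 1.0 / (1.0 + np.exp(-eta))

def buy_probability(intercept, price_effect, price, offset_dummy):
    """Logit model where `offset_dummy` has its coefficient fixed to 1 (an offset)."""
    eta = intercept + price_effect * price + 1.0 * offset_dummy
    return logistic(eta)

def rescale_for_fixed_effect(z, c):
    """Fixing the effect of z to c is the same as an offset on the rescaled predictor c*z."""
    return c * np.asarray(z, dtype=float)

# Hypothetical class: intercept 0.5, price effect -0.01.
print(buy_probability(0.5, -0.01, 60.0, offset_dummy=-100.0))  # practically 0
print(buy_probability(0.5, -0.01, 60.0, offset_dummy=0.0))     # ordinary logit prediction
```

A value of -100 on the logit scale corresponds to a probability on the order of 1e-44, which is why it behaves as a structural zero for that class.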

Technical Guide for Latent GOLD 5.0: Basic and Advanced

Jeroen K. Vermunt and Jay Magidson
Statistical Innovations Inc.
(617)

This document should be cited as J.K. Vermunt and J. Magidson (2013). Technical Guide for Latent GOLD 5.0: Basic and Advanced. Belmont, Massachusetts: Statistical Innovations Inc.

Assigned Reading: LG Technical Guide A:

5 Latent Class Regression Models

The LC or FM Regression model implemented in Latent GOLD is a model with:

1. a single nominal latent variable $x$,
2. $T_i$ replications or repeated observations of a single dependent variable $y_{it}$, which may be nominal, ordinal, continuous, or a binomial or Poisson count,
3. $Q$ numeric or nominal predictors $z^{pred}_{itq}$ affecting $y_{it}$ via a GLM, where parameters may differ across latent classes,
4. zero, equality, fixed value, and order restrictions on regression coefficients,
5. $R$ numeric or nominal covariates $z^{cov}_{ir}$ affecting $x$.

The main difference between LC Regression analysis and the other forms of LC analysis implemented in Latent GOLD is that it contains a single dependent variable, which may, however, be observed more than once for each case. These multiple responses may be experimental replications, repeated measurements at different time points or occasions, clustered observations, responses on a set of questionnaire items, or other types of dependent observations. The value of the dependent variable for case $i$ at replication $t$ is denoted by $y_{it}$, and its total number of replications by $T_i$. Note that the index $i$ in $T_i$ makes it possible to deal with unequal numbers of observations per case.

In the context of LC Regression analysis, it makes sense to make a distinction between two types of exogenous variables:

1. variables influencing the latent variable, which we call covariates,
2. variables influencing the dependent variable, which we call predictors.

Covariates will be denoted by $z^{cov}_{ir}$ and predictors by $z^{pred}_{itq}$, where the index $t$ in $z^{pred}_{itq}$ reflects that the value of a predictor may change across replications. A covariate, on the other hand, has the same value across all replications of a particular case.

Note that, in fact, we are dealing with a two-level data set, where $t$ indexes the lower-level observations within higher-level observation $i$. Covariates serve as higher-level exogenous variables and predictors as lower-level exogenous variables. This illustrates that Latent GOLD can be used to estimate (nonparametric) two-level or random-coefficient models (Aitkin, 1999; Skrondal and Rabe-Hesketh, 2004). Using $t$ as an index for time points or time intervals, one obtains nonparametric random-coefficients models for longitudinal data, such as growth data, event history data, and panel models (Vermunt, 1997, 2002a, 2006; Vermunt and Van Dijk, 2002; Wedel et al., 1995; Zackin, De Gruttola, and Laird, 1996).

[16] References to LC or FM Regression analysis are Agresti (2002, section 13.2), Vermunt and Van Dijk (2001), Wedel and DeSarbo (1994), and Wedel and Kamakura (1998).
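As a minimal sketch of the two-level layout described above, the following Python fragment counts the replications $T_i$ per case and checks that each covariate keeps one value within a case. The record layout and the names `records`, `summarize_cases`, `sex`, and `price` are illustrative assumptions and are not part of Latent GOLD.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per replication t of case i.
# "sex" plays the role of a covariate (constant within a case) and "price"
# the role of a predictor (may change across replications).
records = [
    {"id": 1, "sex": 1, "price": 1, "y": 4},
    {"id": 1, "sex": 1, "price": 2, "y": 2},
    {"id": 1, "sex": 1, "price": 2, "y": 3},
    {"id": 2, "sex": 2, "price": 1, "y": 5},
    {"id": 2, "sex": 2, "price": 2, "y": 5},
]


def summarize_cases(rows, case_key, covariates):
    """Return {case: T_i}; raise if a covariate varies within a case."""
    n_reps = defaultdict(int)
    first_seen = {}
    for row in rows:
        case = row[case_key]
        n_reps[case] += 1
        values = tuple(row[name] for name in covariates)
        # Remember the first covariate values seen for this case and compare.
        if first_seen.setdefault(case, values) != values:
            raise ValueError(f"covariate changes within case {case}")
    return dict(n_reps)


replications = summarize_cases(records, "id", ["sex"])
print(replications)  # {1: 3, 2: 2}
```

Unequal values of $T_i$ (here 3 and 2) are allowed, which is exactly what the index $i$ in $T_i$ expresses.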

5.1 Probability Structure and Linear Predictors

The most general probability structure that can be used in the Regression Module takes on the following form:

$$f(y_i \mid z_i^{cov}, z_i^{pred}) = \sum_{x=1}^{K} P(x \mid z_i^{cov}) \prod_{t=1}^{T_i} f(y_{it} \mid x, z_{it}^{pred}). \qquad (11)$$

The main differences from the Cluster Module (see equation 6) are that in Regression we make a distinction between covariates and predictors, we allow for different numbers of replications per case, we assume that the conditional densities $f(y_{it} \mid x, z_{it}^{pred})$ have the same form for each $t$, and we do not allow for direct effects between the multiple responses.

In Latent GOLD's Regression Module, it is possible to specify a replication weight $v_{it}$ for each of the records of case $i$. This modifies the definition of $f(y_i \mid z_i^{cov}, z_i^{pred})$ somewhat:

$$f(y_i \mid z_i^{cov}, z_i^{pred}) = \sum_{x=1}^{K} P(x \mid z_i^{cov}) \prod_{t=1}^{T_i} \left\{ f(y_{it} \mid x, z_{it}^{pred}) \right\}^{v_{it}}. \qquad (12)$$

Two possible applications of replication weights include grouping of records and differential weighting of responses. Another possible application is the analysis of multinomial counts. With an $M$-category multinomial count response variable, one would get $M$ records for each case with responses equal to 1, 2, 3, ..., $M$, respectively, and replication weights equal to the number of times the category concerned was selected.

Another slight modification of the probability structure defined in equation (11) occurs with the zero-inflated option. Depending on the scale type, the model is expanded with either one or $M$ latent classes that are assumed to give a particular response with probability one. Zero-inflated Poisson and binomial count models are obtained by adding a single latent class to the model, yielding a model with $K + 1$ Classes. Class $K + 1$ has a zero Poisson rate or zero binomial probability, or, equivalently, a probability of one of having zero events; that is, $P(y_{it} = 0 \mid x, z_{it}^{pred}) = 1$ for $x = K + 1$. The same happens in the case of a continuous dependent variable, with the additional modification that the dependent variable is assumed to be censored-normal distributed in the other $K$ Classes. This model is sometimes referred to as a censored-inflated linear regression model. For ordinal and nominal dependent variables, $M$ Classes are added to the model, each of which responds with probability one to a certain category; that is, $P(y_{it} = m \mid x, z_{it}^{pred}) = 1$ for $x = K + m$. Such Classes are sometimes referred to as stayer Classes (in mover-stayer models) or brand-loyal Classes (in brand-loyalty models).

Because the linear predictor in $P(x \mid z_i^{cov})$ has the same form as in LC Cluster models (see equation 9), we will pay no further attention to it here. The other two linear predictors of interest, $\eta_{x,z_{it}}$ and $\eta_{m|x,z_{it}}$, require more explanation because these have a specific form in LC Regression models. For continuous and count responses, we use

$$\eta_{x,z_{it}} = \beta_{x0} + \sum_{q=1}^{Q} \beta_{xq} \, z^{pred}_{itq},$$

where $\beta_{x0}$ is a Class-specific intercept and $\beta_{xq}$ the Class-specific regression coefficient corresponding to predictor number $q$. Depending on the scale type of the response variable, this yields either a standard linear, log-linear Poisson, or binary logistic regression model with parameters that differ across latent classes. The above linear term differs from the one used in LC Cluster models in that all parameters can vary between Classes and no further restrictions are needed for identification. Another important difference is that neither $\eta$ nor $\beta$ has an index $t$, which implies that parameters do not differ across repeated responses. Differences in predicted values between the multiple responses can thus only be the result of varying predictor values. [17]

For nominal responses, we use a multinomial logit model having a linear predictor of the form

$$\eta_{m|x,z_{it}} = \beta_{xm0} + \sum_{q=1}^{Q} \beta_{xmq} \, z^{pred}_{itq},$$

with, depending on the selected coding type for nominal variables, the identification restrictions $\sum_{m=1}^{M} \beta_{xmq} = 0$, $\beta_{x1q} = 0$, or $\beta_{xMq} = 0$, for $0 \le q \le Q$.

An ordinal response variable is modeled with an adjacent-category logit model in which

$$\eta_{m|x,z_{it}} = \beta_{xm0} + \sum_{q=1}^{Q} \beta_{x \cdot q} \, y_m \, z^{pred}_{itq}.$$

Here, $y_m$ is the score assigned to category $m$ of the response variable. In the ordinal case, identifying effect or dummy constraints are needed only for the intercept term $\beta_{xm0}$.

[17] By defining an independent variable replication number or time, as well as the necessary product terms, one can make parameters replication or time dependent.
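To make equation (12) concrete, here is a minimal Python sketch for a Poisson count response with Class-specific linear predictors $\eta_{x,z_{it}} = \beta_{x0} + \sum_q \beta_{xq} z^{pred}_{itq}$. It assumes class probabilities that do not depend on covariates. The function names and all numeric values are illustrative assumptions, not Latent GOLD code.

```python
import math


def poisson_pmf(y, rate):
    """Poisson probability of observing count y at the given rate."""
    return math.exp(-rate) * rate ** y / math.factorial(y)


def case_density(y, z, v, class_probs, betas):
    """Equation (12) for one case with a Poisson count response.

    y: responses y_it; z: predictor vectors z_it; v: replication weights v_it;
    class_probs: P(x) for x = 1..K; betas[x] = [beta_x0, beta_x1, ..., beta_xQ].
    """
    total = 0.0
    for p_x, beta in zip(class_probs, betas):
        product = 1.0
        for y_t, z_t, v_t in zip(y, z, v):
            # Log-linear Poisson model: log(rate) = beta_x0 + sum_q beta_xq * z_itq
            eta = beta[0] + sum(b * z_q for b, z_q in zip(beta[1:], z_t))
            product *= poisson_pmf(y_t, math.exp(eta)) ** v_t
        total += p_x * product
    return total


# Hypothetical 2-class model with one predictor and three replications.
density = case_density(
    y=[0, 2, 1],
    z=[[1.0], [2.0], [1.0]],
    v=[1, 1, 1],  # all weights equal to 1 reproduces equation (11)
    class_probs=[0.6, 0.4],
    betas=[[0.1, 0.3], [-0.2, 0.8]],
)
print(density)
```

A replication weight of 2 on a record gives the same density as listing that record twice with weight 1, which is the "grouping of records" application mentioned above.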

5.2 Some Special Cases

The simplest probability structure for a LC Regression model occurs if there is a single response per case and no predictors; that is,

$$f(y_i) = \sum_{x=1}^{K} P(x) \, f(y_i \mid x).$$

This yields a simple univariate finite mixture model in which the mean and possibly also the variance of the distribution of $y_i$ is assumed to be Class-dependent. Such a model without predictors makes it possible to describe the unobserved heterogeneity with respect to the distribution of $y_i$ in the population under study. [18]

A more useful LC Regression model is obtained by including predictors in the model, such as,

$$f(y_i \mid z^{pred}_{i1}, z^{pred}_{i2}) = \sum_{x=1}^{K} P(x) \, f(y_i \mid x, z^{pred}_{i1}, z^{pred}_{i2}).$$

Here, $f(y_i \mid x, z^{pred}_{i1}, z^{pred}_{i2})$ denotes the distribution of the dependent variable $y$ given a person's class membership $x$ and predictor values $z^{pred}_{i1}$ and $z^{pred}_{i2}$. Depending on the type of dependent variable, the expected value of the appropriate distribution is restricted by means of a logistic, log-linear, or linear regression model. [19]

An important extension of the above LC Regression model is obtained by making class membership dependent on covariates (Kamakura, Wedel, and Agrawal, 1994; Vermunt, 1997). An example of such a model is:

$$f(y_i \mid z^{cov}_{i1}, z^{cov}_{i2}, z^{pred}_{i1}, z^{pred}_{i2}) = \sum_{x=1}^{K} P(x \mid z^{cov}_{i1}, z^{cov}_{i2}) \, f(y_i \mid x, z^{pred}_{i1}, z^{pred}_{i2}).$$

In this model, it is assumed that the probability of belonging to latent class $x$ depends on the values of $z^{cov}_{i1}$ and $z^{cov}_{i2}$. This is equivalent to the way covariates can be used in LC Cluster models.

[18] Relevant references on this topic are Böhning (2000), Dillon and Kumar (1994), Everitt and Hand (1981), Laird (1978), McLachlan and Basford (1988), McLachlan and Peel (2000), Magidson and Vermunt (2003a), and Titterington, Smith, and Makov (1985).

[19] For some applications, see Böckenholt (1993), Land, McCall and Nagin (1996), Mare (1994), and Wedel and DeSarbo (1994).
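The simplest special case above, $f(y_i) = \sum_x P(x) f(y_i \mid x)$, can be sketched in a few lines of Python. The sketch assumes normal Class-specific densities. The function names and the weights, means, and standard deviations are hypothetical values chosen for illustration.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(y, class_probs, means, sds):
    """Univariate finite mixture: sum over classes of P(x) * f(y | x)."""
    return sum(p * normal_pdf(y, m, s) for p, m, s in zip(class_probs, means, sds))


# Hypothetical 2-class mixture with Class-dependent means and variances.
value = mixture_density(1.0, class_probs=[0.7, 0.3], means=[0.0, 4.0], sds=[1.0, 2.0])
print(value)
```

With a single class the function reduces to an ordinary normal density, which corresponds to the homogeneous (1-class) model.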

As already mentioned above, there may be more than one observation per case; that is, there may be more than one replication of the same dependent and independent variables for each observational unit. Extending the above model to multiple replications yields the following probability structure:

$$f(y_i \mid z^{cov}_{i1}, z^{cov}_{i2}, z^{pred}_{i1}, z^{pred}_{i2}) = \sum_{x=1}^{K} P(x \mid z^{cov}_{i1}, z^{cov}_{i2}) \prod_{t=1}^{T_i} f(y_{it} \mid x, z^{pred}_{it1}, z^{pred}_{it2}).$$

Such a LC Regression model for repeated measures is very similar to multilevel (two-level), mixed, or random-coefficients models, in which random effects are included to deal with the dependent observations problem. The LC Regression model is, in fact, a nonparametric random-effects model (Agresti, 2002, section 13.2; Aitkin, 1999; Skrondal and Rabe-Hesketh, 2004; Vermunt and Van Dijk, 2001).

5.3 Restrictions for the Class-specific Regression Coefficients

Various types of restrictions can be imposed on the Class-specific regression coefficients: intercepts and predictor effects can be fixed to zero, restricted to be equal across certain or all Classes, and constrained to be ordered. Moreover, the effects of numeric predictors can be fixed to one. These constraints can either be used as a priori restrictions derived from theory or as post hoc restrictions on estimated models.

Certain restrictions apply to parameters within each Class, while others apply across Classes. The within-class restrictions are:

- No Effect: the specified effect(s) are set to zero in selected latent classes;
- Offset: the selected effect(s) are set to one, thus serving as an offset. [20]

The offset effect applies to numeric predictors only, and only with ordinal, continuous, Poisson count, and binomial count responses (thus not with a nominal dependent variable).

[20] The term offset stems from the generalized linear modeling framework. It refers to a regression coefficient that is fixed to 1, or equivalently, to a component that offsets the linear part of the regression model by a fixed amount. An offset plays the same role as a cell weight in log-linear and logit analysis. In log-linear and logit models, an offset is, in fact, the log of a cell weight.
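The role of an offset as the log of a cell weight can be illustrated with a small Python sketch of a log-linear Poisson model. The function `poisson_rate`, the exposure value, and the coefficients are hypothetical and serve only to show that a coefficient fixed to 1 on `log(exposure)` multiplies the rate by the exposure.

```python
import math


def poisson_rate(intercept, coefs, predictors, offset=0.0):
    """Log-linear Poisson rate; the offset enters with a coefficient fixed to 1."""
    eta = intercept + sum(b * z for b, z in zip(coefs, predictors)) + 1.0 * offset
    return math.exp(eta)


exposure = 4.0  # hypothetical cell weight (for example, exposure time)
base_rate = poisson_rate(-1.0, [0.5], [2.0])
rate = poisson_rate(-1.0, [0.5], [2.0], offset=math.log(exposure))

# The offset log(exposure) scales the rate by the exposure itself.
print(base_rate, rate)
```

Because the coefficient of the offset is not estimated, it shifts the linear predictor by a fixed amount without adding a parameter to the model.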

Between-Class restrictions are:

- Merge Effects: the intercept or the effect of a selected predictor is equated across 2 or more specified Classes;
- Class Independent: the intercept or the effect of a selected predictor is equated across all Classes;
- Order Restriction (ascending or descending): in each Class, the effect of a selected numeric predictor is assumed to have the same sign or the effects corresponding to a selected nominal predictor are assumed to be ordered (either ascending or descending). That is, for numeric predictors, the ascending restriction implies that the Class-specific coefficients should be at least zero ($\beta \ge 0$) and the descending restriction that they are at most zero ($\beta \le 0$). For nominal predictors, ascending implies that the coefficient of category $p+1$ is larger than or equal to the one of category $p$ ($\beta_p \le \beta_{p+1}$, for each $p$) and descending that the coefficient of category $p+1$ is smaller than or equal to the one of category $p$ ($\beta_p \ge \beta_{p+1}$, for each $p$).

Similarly to what was explained in the context of the order-restricted Cluster model, with a nominal dependent variable, the imposed order restriction applies to each of the adjacent category logits. In the case of ascending, this implies $\beta_{m+1} - \beta_m \ge 0$ for numeric predictors and $\beta_{m+1,p} - \beta_{m,p} \le \beta_{m+1,p+1} - \beta_{m,p+1}$ for nominal predictors (see Galindo and Vermunt, 2005; Vermunt, 1999; Vermunt and Hagenaars, 2004).

The Class-specific error variances of a continuous dependent variable are restricted to be at least 1.0e-6 times the observed variance in the sample.

The Class Independent option can be used to specify models in which some coefficients differ across Classes while others do not. This can either be on a priori grounds or can be based on the test statistics from previously estimated models. More specifically, if the Wald(=) test is not significant, it makes sense to check whether an effect can be assumed to be Class independent.
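The order restrictions for numeric and nominal predictors described above amount to simple inequality checks, sketched below in Python. The helper names and example coefficients are hypothetical. The sketch only verifies whether a given set of coefficients satisfies a restriction and does not reproduce how Latent GOLD enforces it during estimation.

```python
def numeric_order_ok(beta, direction):
    """Numeric predictor: ascending means beta >= 0, descending means beta <= 0."""
    if direction == "ascending":
        return beta >= 0
    if direction == "descending":
        return beta <= 0
    raise ValueError(f"unknown direction: {direction}")


def nominal_order_ok(betas, direction):
    """Nominal predictor: category coefficients must be monotone in p."""
    pairs = list(zip(betas, betas[1:]))  # (beta_p, beta_{p+1}) for each p
    if direction == "ascending":
        return all(b_p <= b_next for b_p, b_next in pairs)
    if direction == "descending":
        return all(b_p >= b_next for b_p, b_next in pairs)
    raise ValueError(f"unknown direction: {direction}")


# Hypothetical Class-specific price effect and category effects.
print(numeric_order_ok(-0.8, "descending"))             # True
print(nominal_order_ok([-1.0, 0.2, 0.2], "ascending"))  # True
print(nominal_order_ok([0.5, 0.1, 0.3], "descending"))  # False
```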
There is a special variant of the Class-independent option called No Simple that can be used in conjunction with the intercept in an ordinal regression model. With this option, the intercept is modeled as

$$\beta_{xm0} = \beta_{m0} + \beta_{x \cdot 0} \, y_m,$$

where $\beta_{x \cdot 0}$ is subjected to an effect or dummy coding constraint.

This specification of a Class-specific intercept is much more parsimonious and is, in fact, equivalent to how x-y relationships with ordinal y's are modeled in LC Cluster models. Rather than estimating $K \times M$ intercept terms, one now estimates only $M + K - 1$ coefficients; that is, one extra coefficient per extra latent class.

Order Restrictions are important if one has a priori knowledge about the sign of an effect. For example, the effect of price on persons' preferences is usually assumed to be negative (or, better, non-positive) for each latent class (segment). If the price effect is specified to be Descending, the resulting parameter estimate(s) will be constrained to be in agreement with this assumption. Another application of order restrictions occurs in growth modeling. Rather than assuming a specific functional form for the time dependence, one may wish to make the much less restrictive assumption that the time effect is monotonic (see, for example, Vermunt and Hagenaars, 2004).

The No Effect option makes it possible to specify a different regression equation for each latent class. More specifically, each latent class may have different sets of predictors affecting the responses. Post hoc constraints can be based on the reported z value for each of the coefficients. An example of an a priori use of this constraint is the inclusion of a random-responder class, a latent class in which all coefficients are zero, except for the intercept. This is specified in a restrictions table with rows Intercept, Predictor1, and Predictor2 and columns Class 1 to Class 4, in which the Class 1 cells for Predictor1 and Predictor2 are marked as effects equal to 0. In this example, Class 1 is the random-responder class.

Merge Effects is a much more flexible variant of Class Independent. It can be used to equate the parameters for any set of latent classes. Besides post hoc constraints, very sophisticated a priori constraints can be imposed with this option. An important application is the specification of DFactor-like structures in which each latent class corresponds to the categories of two or more latent variables. For example, consider a set of constraints of the form:

              Class 1   Class 2   Class 3   Class 4
Intercept        1         1         3         3
Predictor1       1         2         1         2
Predictor2       1         2         1         2

where the numbers in the cells indicate "equal to Class #". This restricted 4-Class model is in fact a 2-dimensional DFactor model: the categories of DFactor 1 differ with respect to the intercept and the categories of DFactor 2 with respect to the two predictor effects. Specifically, level 1 of DFactor 1 is formed by Classes 1 and 2 and level 2 by Classes 3 and 4; level 1 of DFactor 2 is formed by Classes 1 and 3 and level 2 by Classes 2 and 4.

The option Offset can be used to specify any nonzero fixed-value constraint on the Class-specific effect of a numeric predictor in models for ordinal, continuous and count responses. This means that it is possible to refine the definition of any Class (segment) by enhancing or reducing the estimated numeric predictor effect for that Class.

Recall that numeric predictor $q$ enters as $\beta_{xq} z^{pred}_{itq}$ in the linear term of the regression model. Suppose that, after estimating the model, the estimate for $\beta_{xq}$ turned out to be 1.5 for Class 1. If $z^{pred}_{itq}$ is specified to be an offset, the effect of this predictor would be reduced (1.5 would be reduced to 1) for this Class. But suppose that you wish to enhance the importance of this predictor for Class 1; say, you wish to restrict $\beta_{xq}$ to be equal to 2. The trick is to recode the predictor, replacing each code by twice the value. Thus, the recoded predictor is defined as $2 \cdot z^{pred}_{itq}$. If we restrict the effect of this recoded predictor to 1, we obtain $1 \cdot 2 \cdot z^{pred}_{itq}$, which shows that the effect of $z^{pred}_{itq}$ is equated to 2. Such recoding can be done easily within Latent GOLD, using the Replace option.

In addition to post hoc refinements to customize the definition of the resulting Classes, the offset restriction can also be used to make the latent classes conform to various theoretical structures.
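As a rough illustration of how Merge Effects produces the 2-dimensional DFactor structure just described, the Python sketch below expands an "equal to Class #" table into Class-specific coefficients. The table entries are derived from the description of the DFactor levels above (intercept shared by Classes 1 and 2 and by Classes 3 and 4; predictor effects shared by Classes 1 and 3 and by Classes 2 and 4). The names `merge_table`, `free_values`, and `expand`, and all coefficient values, are hypothetical.

```python
# "Equal to Class #" entries for Classes 1-4, derived from the DFactor description.
merge_table = {
    "Intercept": [1, 1, 3, 3],
    "Predictor1": [1, 2, 1, 2],
    "Predictor2": [1, 2, 1, 2],
}

# Hypothetical freely estimated values, keyed by the reference Class number.
free_values = {
    "Intercept": {1: 0.4, 3: -0.9},
    "Predictor1": {1: 1.2, 2: 0.1},
    "Predictor2": {1: -0.5, 2: 0.7},
}


def expand(table, values):
    """Return the coefficient of every Class implied by the merge restrictions."""
    return {
        effect: [values[effect][ref] for ref in refs]
        for effect, refs in table.items()
    }


coefs = expand(merge_table, free_values)
n_free = sum(len(set(refs)) for refs in merge_table.values())

print(coefs["Intercept"])   # [0.4, 0.4, -0.9, -0.9]
print(coefs["Predictor1"])  # [1.2, 0.1, 1.2, 0.1]
print(n_free)               # 6
```

The restricted model uses 6 free regression coefficients instead of the 12 of an unrestricted 4-Class model with the same effects.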
Probably the most important a priori application of Offset is the possibility of defining zero-inflated models for counts, which can also be specified with the special zero-inflated option discussed above. These are models containing one latent class that has a Poisson rate (binomial probability) equal to 0 and that is not affected by the other predictors. An example of a restrictions table corresponding to such a structure has rows Intercept, LargeNegative(-100), Predictor1, and Predictor2 and columns Class 1 to Class 3, with cells marked as either no effect or offset. As can be seen, Class 1 is only affected by an offset, and Classes 2 and 3 have their own intercept and predictor effects. The numeric predictor LargeNegative(-100) takes on the value -100 for all records. [21] As a result of the fixed effect of -100, the rate/probability of experiencing an event will be equal to zero. In the case of a Poisson count, the rate for someone belonging to Class 1 equals $\exp(-100) \approx 0$. In the binomial case, the probability that someone belonging to Class 1 experiences an event is $\exp(-100)/[1 + \exp(-100)] \approx 0$.

Now, we will discuss several more advanced applications of the restriction options. Suppose you assume that the effect of price is linear and negative (descending) for Classes 1-3 and unrestricted for Class 4. This can be accomplished by having two copies of the price variable in the model, say Price1 and Price2. The effect of Price1 (numeric) is specified as ordered and is fixed to zero in Class 4. The effect of Price2 is fixed to zero in Classes 1-3.

Similar tricks can be applied in LC growth models, in which one may wish to allow for a different type of time effect in each of the latent classes: for example, quadratic (time and time squared, both numeric) in Class 1, linear (time numeric) ascending in Class 2, unrestricted (time nominal) in Class 3, and no time effect in Class 4.

Suppose your assumption is that the effect of a particular predictor is at least 2. This can be accomplished by combining a fixed value constraint with an order constraint. More precisely, an additional predictor defined as $2 \cdot z^{pred}_{itq}$ is specified to be an offset and the effect of the original predictor $z^{pred}_{itq}$ defined to be ascending.

[21] It is not necessary to assume that the -100 appears in all records. The value could also be -100 if a particular condition is fulfilled (for example, if the price of the evaluated product is larger than a certain amount) and 0 otherwise. This shows that the offset option provides a much more flexible way of specifying classes with zero response probabilities than the zero-inflated option.
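The numerical effect of the LargeNegative(-100) offset can be checked directly. The short Python sketch below evaluates the Poisson rate and binomial probability implied by a linear predictor of -100; the variable names are illustrative only.

```python
import math

eta = 1.0 * -100.0  # offset: coefficient fixed to 1 times the predictor value -100

poisson_rate = math.exp(eta)                          # rate of the zero class
binomial_prob = math.exp(eta) / (1.0 + math.exp(eta))  # logistic probability
prob_zero_events = math.exp(-poisson_rate)             # Poisson P(y = 0) at that rate

print(poisson_rate)      # about 3.7e-44
print(binomial_prob)     # about 3.7e-44
print(prob_zero_events)  # 1.0 in double precision
```

The rate is not exactly zero, but it is so small that the Class behaves as a structural-zero Class for all practical purposes.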

Latent GOLD Tutorial 3: (Referred to in Lecture Notes as Pages )

Tutorial #3: LC Regression with Repeated Measures

DemoData = 'conjoint.sav'

This tutorial shows how to develop Latent Class (LC) Regression models using the sample data file conjoint.sav. You will learn how to:

- Select the dependent variable and specify its scale type
- Distinguish predictors from covariates
- Impose restrictions on the predictor effects
- Specify covariates as active or inactive
- Determine the number of latent classes (i.e., segments)
- Examine R² and various other information related to model prediction

In addition, this example illustrates several advanced options in the LC Regression Module. You will learn how to:

- Use the optional case ID variable to specify repeated observations
- Explore the Profile and ProbMeans output
- Use demographic variables as covariates to predict segment membership
- Obtain predictions based solely on the covariates
- Classify cases into latent segments

The Data

The data for this example are obtained from a hypothetical conjoint marketing study involving repeated measures where respondents were asked to provide likelihood of purchase ratings under each of several different scenarios. A partial listing of the data is shown in Figure 1.

Figure 1: Partial Listing of Conjoint Data

As suggested in Figure 1, there are 8 records for each case (there are 400 cases in total): one record for each cell in this 2x2x2 complete factorial design of different scenarios for the purchase of a product:

- FASHION (1 = Traditional; 2 = Modern)
- QUALITY (1 = Low; 2 = High)
- PRICE (1 = Lower; 2 = Higher)

The dependent variable (RATING) is a rating of purchase intent on a five-point scale. The three attributes listed above will be used as predictor variables in the model. We will also include two demographic variables as covariates in a second model:

- SEX (1 = Male; 2 = Female)
- AGE (1 = 16-24; 2 = 25-39; 3 = 40+)

The Goal

Use Latent GOLD to identify latent segments that differ with respect to the importance attached to each of the three attributes that influence an individual's purchase decision. The LC regression model allows for the fact that these importance estimates may differ across segments. That is, for one segment, price and only price may influence the decision, while a second segment may be influenced by quality and modern appearance but be price insensitive.

We will treat RATING as an ordinal dependent variable and compare several models to determine the number of segments. We will then show how to describe the demographic differences between these segments and how to classify each respondent into the most likely segment.

Estimating an LC Regression Model

Opening a Data File and Selecting the Type of Model

For this example, the data file being used is an SPSS system file. To open the file, from the menus choose:

- File > Open
- From the Files of type drop down list, select SPSS System Files if this is not already the default listing. All files with .sav extensions appear in the list (see Figure 2).

Note: If you copied the sample data file to a directory other than the default directory, change to that directory prior to retrieving the file.
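The record structure described above (8 scenario records for each of 400 cases) can be sketched in Python by enumerating the 2x2x2 factorial design. This is only a layout sketch under stated assumptions: the case IDs 1 to 400 and the record ordering are assumed, and no RATING values are generated because those come from conjoint.sav.

```python
from itertools import product

# Attribute levels as coded in the data description.
levels = {"FASHION": (1, 2), "QUALITY": (1, 2), "PRICE": (1, 2)}

# One scenario per cell of the complete 2x2x2 factorial design.
scenarios = [dict(zip(levels, combo)) for combo in product(*levels.values())]

n_cases = 400  # number of respondents stated in the tutorial
# Hypothetical IDs 1..400; each case contributes one record per scenario.
records = [{"ID": case_id, **cell} for case_id in range(1, n_cases + 1) for cell in scenarios]

print(len(scenarios))  # 8
print(len(records))    # 3200
```

Each group of 8 records sharing an ID is what the Case ID variable identifies as one case in the Regression Module.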

Figure 2: Open Dialog Box

- Select conjoint.sav and click Open to open the Viewer window.
- Highlight Model1 if it is not already highlighted.
- Right click to open the Model Selection menu (you may also double click the model name to open this menu or select the type of model from the Model Menu).
- Select Regression and the LC Regression analysis dialog box, which contains 6 tabs, will open (see Figure 3).

Figure 3: Analysis Dialog Box for LC Regression Model

Selecting the Variables for the Analysis

For this analysis, RATING will be the dependent variable.

- Select RATING in the Variables List and click Dependent to move the variable to the Dependent box.

We also need to indicate the dependent variable scale type. For this example, we will use the default scale type (Ordinal-Fixed), which takes into account the natural ordering between the 5 levels of purchase intent. By default, the fixed scores on the data file (1, 2, 3, 4 and 5) are used, which order the levels and establish equal distance between adjacent levels.

As explained above, the data contain repeated observations for each respondent (case). Therefore, we need to indicate which records belong to each case. This is accomplished using a Case ID variable, which contains a unique identification number for each case. All records belonging to the same case are assigned the same unique ID.

- Select ID in the Variables list and click Case ID to move the variable into the Case ID box.

Next, we will select the Predictors. Predictors are used as independent variables in the regression model. In the current example, we use the product attributes FASHION, QUALITY and PRICE as predictors.

- Select FASHION, QUALITY and PRICE in the Variables list and click Predictors to move the variables into the Predictors box.

Specifying the Number of Classes

The LC regression model simultaneously estimates a separate regression model for each class. A 1-class model estimates only a single regression model. It makes the standard homogeneity assumption that a single regression model holds true for all cases. In the current example, we will start by estimating a 1-class model and obtain a log-likelihood statistic to be used as a base. We will then estimate additional models, which successively increment the number of classes by 1, and assess the significance of each additional class. One assessment checks, by means of the BIC statistic, whether the improvement in the log-likelihood between successive models is large enough to justify the additional class. (The model having the lowest BIC might then be selected.) A second assessment is to utilize the p-value associated with the L² fit statistic.

- In the box titled Classes (located below the Covariates pushbutton) type 1-4 to request the estimation of 4 different LC Regression models: a 1-class model, a 2-class model, a 3-class model and a 4-class model.

Scanning the Data File

- Click Scan (located in the lower left of the Analysis dialog box) to scan the data file.

The number of distinct categories (or values) along with the scaling option appears next to each variable in the Dependent and Predictors boxes. To view category labels, frequency counts and any scores assigned to any scanned variable, double click on the variable name in the Dependent or Predictor list box. The Variables dialog box will open (see Figure 4).
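The BIC comparison used to choose the number of classes can be sketched as follows. The sketch assumes the common log-likelihood-based definition BIC = -2 LL + ln(N) x (number of parameters), with N taken as the number of cases (400). The LL values and parameter counts in `fits` are invented for illustration and are not the values reported for conjoint.sav.

```python
import math


def bic(log_likelihood, n_params, n_cases):
    """Log-likelihood-based BIC: lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + math.log(n_cases) * n_params


# Hypothetical (LL, number of parameters) for the 1- to 4-class models.
fits = {1: (-4500.0, 7), 2: (-4200.0, 15), 3: (-4100.0, 23), 4: (-4090.0, 31)}

bic_values = {k: bic(ll, npar, n_cases=400) for k, (ll, npar) in fits.items()}
best = min(bic_values, key=bic_values.get)

for k in sorted(bic_values):
    print(k, round(bic_values[k], 2))
print("lowest BIC:", best)  # lowest BIC: 3
```

In this made-up example the log-likelihood still improves from 3 to 4 classes, but not by enough to offset the penalty for the 8 additional parameters.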

- Double click the dependent variable, RATING.

Figure 4: Variables box for RATING

- Click OK to close the Variables dialog box and return to the Regression Analysis dialog box.

Estimating the Model

Now that we have selected our variables and specified the models, we are ready to estimate the models. Your analysis dialog box should look like Figure 5.

Figure 5: Regression Analysis Dialog Box with Initial Settings

- Click Estimate (located at the bottom right of the Analysis dialog box).

Viewing Output and Interpreting Results

For LC Regression models, several output files are produced. To view a summary of the models estimated (Figure 6),

- Click on the data file name, conjoint.sav, in the Outline pane.

Figure 6: Summary of Models Estimated

This output reports statistics that will assist you in determining the correct number of classes: the log-likelihood (LL) values, the BIC values, and the number of parameters in the estimated models. It is important to determine the right number of classes because specifying too few ignores class differences, while specifying too many may cause the model to be unstable. While the log-likelihood increases each time the number of classes is increased, the minimum BIC value occurs for Model3, suggesting that the 3-class solution is the best of the four estimated models. The R² increases from .37 for the 1-class model to .61 for the 3-class regression.

Note 1: Occasionally, you might obtain a local (suboptimal) solution. For these data, it is possible to obtain a local solution for the 4-class model, obtaining LL = instead of . If this occurs, click Estimate to re-estimate the 4-class model (see section 6.6 in the Technical Guide for a discussion of preventing local solutions).

Note 2: Notice that the p-values based on the model L² and reported degrees of freedom (df) are not valid assessments of fit because we are dealing with sparse data.

We will now examine the detailed output for the 3-class solution.

Profile Output

- Rename Model3 to "3-class" by clicking on its name.
- Click the + icon next to 3-class in the Outline pane to expand the listing of output for this model.
- Click Profile.

The Profile output contains information on the class sizes, the class-specific (marginal) probabilities and means of the dependent variable (see Figure 7).

Figure 7: Profile Output for 3-Class Model

The classes are always ordered from largest to smallest. It can be seen from the first row of the table that segment 1 contains about 50% of the subjects (.4986), segment 2 contains about 25% and segment 3 contains the remaining 25%. Examination of class-specific probabilities shows that overall, segment 1 is least likely to buy (27.56% are Very Unlikely to buy) and segment 3 is most likely (28.08% are Very Likely to buy). Later in this tutorial, we will show how to classify each case into the most appropriate segment.

Parameters Output

Next, we will view the Parameters output (see Figure 8).

- For the 3-class model, click Parameters.

Figure 8: Parameters Output for 3-Class Model

The beta parameter for each predictor is a measure of the influence of that predictor on RATING. The beta effect estimates under the column labeled Class 1 suggest that segment 1 is influenced in a positive way by products for which FASHION = Modern (beta = ), in a negative way by PRICE (beta = ), and not at all by QUALITY (beta is approximately 0). We also see that segment 2 is influenced by all 3 attributes, having a preference for those product choices that are modern (beta = ), and higher quality (beta = .9216), but, like segment 1, their preference also decreases as a function of price (beta = ). Members of segment 3 prefer higher quality products (beta = ), but their preference also decreases as a function of price (beta = ), and they are not influenced by FASHION.

Note that PRICE has more or less the same influence on all three segments. The Wald(=) statistic indicates that the differences in these beta effects across classes are not significant (the p-value = .67, which is much higher than .05, the standard level for assessing statistical significance). This is consistent with all 3 segments exhibiting price sensitivity to the same degree, which is confirmed when we estimate a model in which this effect is specified to be class-independent (see next section). The p-value for the Wald statistic for PRICE is 2.4x indicating that the amount of price sensitivity is highly significant.

With respect to the effect of the other two attributes we find large between-segment differences. The predictor FASHION has a strong influence on segment 1, a less strong effect on segment 2, and virtually no effect on segment 3. QUALITY has a strong effect on segment 3, a less strong effect on segment 2, and virtually no effect on segment 1. The fact that the influence of FASHION and QUALITY differs significantly between the 3 segments is confirmed by the significant p-values associated with the Wald(=) statistics for these attributes.
For example, for FASHION, the p-value = 6.2x

In summary, segment 1 could be labeled the Fashion-Oriented Segment, segment 3 the Quality-Oriented Segment, and segment 2 is the segment that takes into account all 3 attributes in their purchase decision.

To test each individual class-specific beta for statistical significance we can append standard errors, Z-statistics, or both to the output.

- Right click on the output in the Contents Pane and choose Z Statistic.

The Wald statistics are replaced by the Z-statistics for the betas. Notice that the absolute values of the z-scores associated with QUALITY for class 1 and with FASHION for class 3 fall under 2, and hence are not significant at the .05 level.

Figure 9: Parameters Output with Z-values

Restricting Certain Effects to be Zero or Class Independent

In the Parameters output above we saw that the beta estimates associated with PRICE are approximately equal for all 3 classes. To test the null hypothesis of equality, we used the Wald(=) statistic. With a p-value of .67, this null hypothesis could not be rejected. We also showed that 2 of the betas were not significantly different from 0. We will now show how to obtain a more parsimonious model by imposing zero restrictions on the 2 betas and by restricting the betas associated with PRICE to be equal across segments. This is accomplished using the Model Tab.

To specify the PRICE effects to be class independent,

- Double click 3-class to open the Analysis Dialog Box for this model
- Left click on the Model Tab
- In the row for PRICE, right click on the Class Independent column and select Yes.

Figure 10: Setting the Class-Independent effects for the variable PRICE

To specify the betas to be zero,

- Right click on the cells corresponding to the betas to be set to zero
- Select No Effect

Figure 11: Specifying betas to 0 for variable FASHION

- Click on Variables to return to the Variables Tab.
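The z-statistic rule of thumb used above (an absolute z value under 2 is not significant at the .05 level) can be sketched as a two-sided normal test. The function name and the estimates and standard errors below are hypothetical, chosen only to illustrate one significant and one non-significant beta.

```python
import math


def z_test(estimate, std_error):
    """Return the z statistic and its two-sided p-value under a standard normal."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical class-specific betas with their standard errors.
for label, est, se in [("strong effect", 1.05, 0.20), ("weak effect", 0.04, 0.11)]:
    z, p = z_test(est, se)
    verdict = "significant" if p < 0.05 else "not significant"
    print(label, round(z, 2), round(p, 4), verdict)
```

The exact two-sided cut-off at the .05 level is |z| = 1.96, which the tutorial rounds to 2.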

An "=" will appear to the right of the variable PRICE to indicate the class independent restriction.

- Click Estimate to re-estimate this model.

The output for this new model will be listed as Model5.

Viewing Output and Interpreting Results

To view the summary output,

- Click on the data file name, conjoint.sav, in the Outline pane.

Figure 12: Summary Output

The new model estimated has been added to the bottom of the list as Model5. For Model5, the fit is almost identical to Model3 and we obtain a lower (better) BIC value.

Parameters Output

- Click Parameters for Model5 to view the results containing the desired restrictions.

For PRICE, which we specified as class independent, note that the Wald(=) is now zero because the betas have been restricted to be exactly equal to each other across classes.

Adding Covariates

There is one important topic left with respect to the specification of LC regression models; that is, the use of covariates. In the Covariates list box, we can specify variables that we want to use to predict class membership. For this example we will re-estimate the 3-class model, this time including SEX and AGE as covariates.

To estimate the model specifying SEX and AGE as covariates,

- In the Outline pane, double click Model5 to open the Analysis dialog box for this model. Latent GOLD has maintained our previous settings (we will keep PRICE set as class independent).
- Select SEX and AGE in the Variables list box.
- Click Covariates to move these variables to the Covariates box.
- Right click on the variable names and select scale type Nominal.

Figure 13: Specification of Model with Covariates

Click Estimate.
The output for this new model will be listed as Model6.

Viewing Output and Interpreting Results

To view the summary output:
Click on the data file name, conjoint.sav, in the Outline pane.

Figure 14: Summary Output

The new model has been added to the bottom of the list as Model 6. Note that this model is even better than our previous 3-class models, as indicated by the lower BIC value.
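The BIC comparison in the Summary Output can be sketched as follows. This is a minimal illustration, assuming the common log-likelihood based definition BIC = -2 LL + npar ln(N); the log-likelihoods, parameter counts, and sample size are hypothetical placeholders, not values from conjoint.sav.

```python
import math

def bic(log_likelihood: float, n_params: int, n_cases: int) -> float:
    """Log-likelihood based BIC: -2*LL + npar*ln(N). Lower values are better."""
    return -2.0 * log_likelihood + n_params * math.log(n_cases)

# Hypothetical summary rows: (model name, log-likelihood, number of parameters)
models = [
    ("Model3", -3500.0, 20),  # unrestricted 3-class model (made-up numbers)
    ("Model5", -3502.0, 14),  # restricted model: slightly worse LL, fewer parameters
]
n_cases = 400  # hypothetical number of cases

scores = {name: bic(ll, npar, n_cases) for name, ll, npar in models}
best = min(scores, key=scores.get)  # model with the lowest BIC
print(scores, best)
```

With these made-up numbers the restricted model wins because the small loss in log-likelihood is outweighed by the saving of six parameters, which mirrors the pattern described for the restricted and unrestricted 3-class models.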

Parameters Output

Click Parameters for Model6 to view the results reported in Figure 15.

Figure 15: Parameters Output for 3-Class Model with Covariates

First, note that the beta parameter estimates for the 3-class model with covariates (see Figure 15) are similar to those in the original 3-class model (see Figure 8). The gamma parameters of the model for the latent distribution appear at the bottom of the Parameters output in Figure 15 under the heading Model for Classes. The p-values associated with the Wald statistics show that, overall, both effects are significant. The betas associated with SEX = Female (0.5423, , ) suggest that females are more likely than males to belong to the Fashion-Oriented Segment (segment 1), and much less likely to belong to segment 2. The AGE effects show that the youngest age group is more likely than other respondents to be in the Fashion-Oriented Segment, while the oldest age group is more likely to be in the Quality-Oriented Segment.
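How covariate effects translate into class membership probabilities can be sketched as follows. This is a minimal illustration, assuming a multinomial logit model for the classes with effect-coded parameters; all gamma values are hypothetical and only their sign pattern mirrors the interpretation given for SEX.

```python
import math

def class_probs(intercepts, effects):
    """Multinomial logit: P(class x | covariate) = exp(eta_x) / sum_k exp(eta_k)."""
    etas = [a + b for a, b in zip(intercepts, effects)]
    m = max(etas)  # subtract the maximum for numerical stability
    weights = [math.exp(e - m) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical effect-coded gammas for 3 classes (each set sums to zero).
gamma_intercept = [0.2, 0.1, -0.3]
gamma_female = [0.54, -0.60, 0.06]
gamma_male = [-g for g in gamma_female]  # effect coding for a 2-category covariate

p_female = class_probs(gamma_intercept, gamma_female)
p_male = class_probs(gamma_intercept, gamma_male)
print([round(p, 3) for p in p_female], [round(p, 3) for p in p_male])
```

A positive gamma for a class raises the probability of that class for the category in question, and a negative gamma lowers it, which is the reading applied to the SEX and AGE effects.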

Classification Output

To obtain the Classification output, you need to specify it as an option in the Output Tab before estimating your model.

To view the standard classification output for Model 6:
Double click on Model6 in the Outline pane to open the Analysis dialog box for this model.
Click on the Output Tab.
Click in the checkboxes next to 'Classification - Posterior' and 'Classification - Model' to select this output.
Click Estimate to re-estimate the model.

The newly estimated model has been added to the bottom of the list as Model 7 (it is the same as Model6 except for the additional output selected).

In the Outline pane, for Model7, click Classification.

Figure 16: Classification Output for Model 7 (partial listing)

We can see that the first respondent (ID=1) would be classified into segment 2, since segment 2 has the highest membership probability (.5660) for that respondent (see Figure 16). (For information on how to append classification scores to your original data file, see Step 9 in Chapter 5.)

Now, suppose that you wish to classify new cases into the appropriate classes but you have only covariate information on these cases. You would use the Classification - Model (Covariate Classification) output for this purpose. To view this output, click on Classification - Model.
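The modal assignment rule described above can be sketched as follows. This is a minimal illustration; only the .5660 probability for segment 2 comes from the text, and the other two probabilities are hypothetical values chosen so that the row sums to 1.

```python
def modal_class(posteriors):
    """Return the 1-based index of the class with the highest posterior probability."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k]) + 1

# Posterior membership probabilities for respondent ID=1.
# Only 0.5660 (segment 2) is taken from the text; the rest are hypothetical.
respondent_1 = [0.2500, 0.5660, 0.1840]

assigned = modal_class(respondent_1)
# Probability that this modal assignment is wrong for this respondent.
error_prob = 1.0 - max(respondent_1)
print(assigned, round(error_prob, 4))
```

Averaging the per-case error probability over all respondents gives one common way to summarize the expected share of misclassified cases.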

Figure 17. Covariate Classification Output for Model7

Tutorial #7A: Latent Class Growth Model (# seizures)

Overview

What you will learn:
To use Latent GOLD to identify distinct latent class growth trajectories in the data.
To name the identified latent class subgroups based on their growth patterns:
  classify 36 cases as 'unchanged' (Class 1 above)
  classify 17 cases as 'improved' (Class 2 above)
  classify the remaining 6 cases as 'unstable' (Class 3 above)
How to fit a latent class growth model (Poisson mixture model) to these data, which shows that those receiving the drug treatment were significantly more likely than the placebo group to improve and significantly less likely to show no change over their baseline seizure rate (p = .02).

Figure 1. A reanalysis of a longitudinal data set on counts of epileptic seizures using latent class growth models. (Data source: Thall and Vail, 1990).

The Data

Demo Data Set (.sav) = growth1.sav

In this study, 59 epileptics were randomly assigned to either an anti-seizure medication, Progabide (TRT = 1; n=31), or a placebo (TRT = 0; n=28) as an adjunct to other chemotherapy. For each, a base number of seizures was recorded for 8 weeks before the drugs were administered (BASE) and then for four consecutive 2-week periods (TIME), each preceding a visit to the clinic. We also know the age (AGE) of each participant. Data for the first 3 cases are shown below:

Figure 2. Data for the first 3 cases

The Poisson Mixture Model

Let $y_{it}$ denote the number of epileptic fits during the 2 weeks prior to visit T=t for case i. Case i is assumed to belong to one of K unobservable latent classes x = 1, 2, ..., K. We assume that $y_{it}$ follows the Poisson mixture model given below:

$$\log \mu_{it|x} = \beta^{\mathrm{LBASE}} \log b_i + \alpha_x + \beta_{xt}$$

where
$\mu_{it|x}$ = expected number of seizures for case i in period t, given membership in class x
$b_i$ = base rate of seizures for case i per 2-week baseline period = BASE/4 (labeled 'AVGBASE' in the SPSS file shown above)
$\alpha_x$ is the random intercept measuring the overall change from the baseline, and
$\beta_{xt}$ is the random effect associated with the t-th 2-week treatment period.

Hence, each latent class x identifies a distinct pattern of change in the seizure rate from the baseline to the treatment period. For identification of the $\beta_{xt}$, we use effect coding:

$$\sum_{t=1}^{4} \beta_{xt} = 0 \quad \text{for each class } x$$

In our final model, $\log b_i$ will be modeled as a class independent offset, i.e., $\beta^{\mathrm{LBASE}} = 1$, which implies that the expected % change in the seizure rate from the baseline to period t is:

$$100 \left[ \exp(\alpha_x + \beta_{xt}) - 1 \right]$$

Thus, the growth trajectory associated with latent class x is given by:

$$\mu_{it|x} / b_i = \exp(\alpha_x + \beta_{xt}), \quad t = 1, \ldots, 4$$

Tutorials #7A and #7B illustrate somewhat different approaches for using Latent GOLD to estimate 1) the trajectories and 2) the treatment effect. These different approaches agree on the following results:

Latent class #1 contains approximately 57% of the cases, who show basically no change from the baseline rate.
Class #2 (31%-32% of the cases) shows a significant reduction in seizure rate.
The remaining class(es) consist of 6 unstable cases who show a substantial increase in at least one of the treatment periods. These 6 cases were all identified as outliers in an earlier analysis of these data by Rabe-Hesketh and Skrondal (2004).
Those treated with Progabide were significantly less likely to be in class 1 ('no change') and significantly more likely to be in class 2 ('improved') than the Placebo group.

In tutorial #7A, our analysis consists of two steps. First, we estimate a pure (unsupervised) growth model to identify the K different growth patterns over the four post-baseline time periods. This growth model is pure in the sense that covariate information (AGE and TREATMENT status of the cases) is not taken into account. In step 2 we assess the relationship between these covariates and the different classes.

In tutorial #7B, we estimate the treatment effect and all the model parameters simultaneously (i.e., in a single step) by specifying TREATMENT as an active as opposed to inactive covariate.

The models and methods illustrated here are simpler to apply than approaches based on Generalized Estimating Equations that have been used in conjunction with these data by others (Diggle et al., 1994; Lee, 2004). All estimates are maximum likelihood.
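The Poisson mixture model described in this section can be sketched numerically as follows. This is a minimal illustration, assuming a log link with the log of the base rate entering as an offset (coefficient fixed at 1, as in the final model); the class sizes, alphas, betas, and the example case are all hypothetical, and the betas within each class are chosen to sum to zero to respect effect coding.

```python
import math

def expected_count(base_rate, alpha, beta_t):
    """Expected seizures in one 2-week period: base_rate * exp(alpha + beta_t)."""
    return base_rate * math.exp(alpha + beta_t)

def percent_change(alpha, beta_t):
    """Expected % change in the seizure rate relative to the baseline rate."""
    return 100.0 * (math.exp(alpha + beta_t) - 1.0)

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def mixture_likelihood(counts, base_rate, class_sizes, alphas, betas):
    """Sum over classes of class size times the product of Poisson probabilities."""
    total = 0.0
    for pi_x, a_x, b_x in zip(class_sizes, alphas, betas):
        prod = 1.0
        for y, b_xt in zip(counts, b_x):
            prod *= poisson_pmf(y, expected_count(base_rate, a_x, b_xt))
        total += pi_x * prod
    return total

# Hypothetical 3-class parameter values (not estimates from growth1.sav).
class_sizes = [0.57, 0.32, 0.11]
alphas = [0.0, -0.7, 0.6]
betas = [[0.0, 0.0, 0.0, 0.0],
         [0.1, 0.2, -0.1, -0.2],
         [0.4, -0.1, -0.2, -0.1]]

# Growth trajectory per class, expressed as % change from baseline in each period.
trajectories = [[percent_change(a, b) for b in b_x] for a, b_x in zip(alphas, betas)]

# Likelihood contribution of one hypothetical case with base rate 3 per 2 weeks.
lik = mixture_likelihood([3, 2, 4, 1], 3.0, class_sizes, alphas, betas)
print(trajectories, lik)
```

A class whose alpha and betas are all zero reproduces the baseline rate in every period, which is the 'no change' pattern; a negative alpha shifts the whole trajectory below the baseline.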

Setting up the Model

To retrieve the setups for these models:
Select File → Open and click epil.lgf.
Double click on Model1.

The Variables tab appears as follows:

Figure 3. Variables Tab for epil.lgf

The scale type for the dependent variable Y is set to Count, which means that a Poisson regression model will be estimated for each latent class.

ID2 is used in the case ID box, indicating that there are multiple records for each case.

The predictors TIME and LBASE are included in the Predictors box:

TIME is specified as a nominal predictor, which allows the estimated time trend to take on any pattern (such as a reduction during period 1 followed by an increase during period 2). Separate, distinct time trends are identified for each class (i.e., random effects are estimated for TIME).

The predictor LBASE is treated as class independent, which means that its estimate is restricted to be equal in each class. This is indicated by the symbol "=" which appears next to the variable LBASE in the Predictors box. Thus, a single fixed effect will be estimated for this predictor. Later, we will restrict this parameter estimate to 1.

Four additional variables (TRT, BASE, AGE and AVGBASE) are included in the Covariates box and treated as inactive, as indicated by the symbol <I>. Thus, these variables will not affect the estimation of the parameters, but they will be cross-tabulated by class in the output.

Notice in the Classes box that the symbol 1-4 appears. This indicates that we will estimate a 1-class, 2-class, 3-class and a 4-class model.

Click Estimate to estimate these 4 models.

After the estimation has completed:
Click the data file name growth1.sav to display the model summary table.
Right click in the model summary output table to retrieve the Model Summary Display.
Click to remove the checkmarks for L-square, df and p-value in the Model Summary Display.
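The record layout implied by this setup (one record per case per 2-week period, identified by ID2) can be sketched as follows. This is a minimal illustration, assuming LBASE is the natural log of AVGBASE = BASE/4; that definition is not stated explicitly in the text, and the case values are hypothetical.

```python
import math

def to_long_records(case_id, base, counts):
    """Expand one case into one record per 2-week treatment period (long format)."""
    avgbase = base / 4.0       # 8 baseline weeks = four 2-week periods
    lbase = math.log(avgbase)  # assumption: LBASE = ln(AVGBASE); requires base > 0
    return [
        {"ID2": case_id, "TIME": t, "Y": y, "AVGBASE": avgbase, "LBASE": lbase}
        for t, y in enumerate(counts, start=1)
    ]

# Hypothetical case: 12 seizures in the 8 baseline weeks, then 4 follow-up counts.
records = to_long_records(case_id=101, base=12, counts=[3, 2, 4, 1])
for rec in records:
    print(rec)
```

Because every record of a case shares the same ID2, the four counts are treated as repeated measurements on one case rather than as four independent observations.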

Figure 4. Model Summary Output and Model Summary Display

Notice that the BIC statistic is lowest for the 4-class model. We will examine the output for both the 3- and 4-class models here. The R² statistic is .84 for the 3-class and .92 for the 4-class model, and the misclassification error is approximately 6% for each of these models.

Click Parameters associated with the 4-class model.

Figure 5. Parameters Output for 4-class Model

Note the following:

The coefficient for LBASE is almost 1.
The TIME estimates for class 1 are very close to zero, suggesting that this class shows no change from the baseline seizure rate. We will refer to the growth pattern for class 1 as 'no change'.
The estimate for the Intercept (alpha) for class 2 is a large negative value, suggesting an overall reduction in the seizure rate for this class. Despite the small setback between periods 1 and 2 (indicated by the beta increasing from .0973 to .2976), we will refer to this as the 'improved' class.
Classes 3 and 4 show a large positive alpha, suggesting an overall increase (worsening) in the seizure rate. The betas for these classes show an abrupt increase in the number of seizures at some point during the follow-up period (time 1 for class 3 shows a beta of .3740; time 3 for class 4 shows a beta of .8410). We will see later that the cases classified into one of these unstable classes are the same ones that were identified as outliers in previous studies. We will refer to these classes as 'unstable'.

To display standard errors for these estimates, right click in the output table and select Standard Errors from the popup menu.

Figure 6. Parameters Output for 4-class model with Standard Errors

For class 1, the estimates for the alpha and beta parameters are all less than 1 standard error from zero. Later, we will further refine class 1 by setting the TIME estimates (betas) to zero. The estimates for alpha for the other classes are well above 2 standard errors in magnitude.

Note that the two right-most columns provide the Overall mean and standard deviation for the alpha and beta parameters. For LBASE, the standard deviation for the Overall effect is 0 because this is a fixed effect (i.e., it is class independent). The other estimates have non-zero standard deviations because they differ depending upon the class (i.e., they are treated as class dependent or random effects).

Click on Profile to display the Profile output for the 4-class model.
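The informal check used above (comparing each estimate with its standard error) can be sketched as follows. This is a minimal illustration; the estimates and standard errors are hypothetical, and the threshold of 2 is the rough rule of thumb used in the text rather than an exact critical value.

```python
def z_ratio(estimate, std_error):
    """Ratio of an estimate to its standard error."""
    return estimate / std_error

# Hypothetical (estimate, standard error) pairs, not values from Figure 6.
params = {
    "alpha class 1": (0.03, 0.06),
    "alpha class 2": (-0.70, 0.12),
}

# Flag parameters whose magnitude exceeds 2 standard errors.
flags = {name: abs(z_ratio(est, se)) > 2.0 for name, (est, se) in params.items()}
print(flags)
```

Parameters that are not flagged are natural candidates for a zero restriction, which is the reasoning behind later setting the class 1 TIME effects to zero.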

Figure 7. Profile Output for 4-class Model

The row labeled Class Size indicates that Class 1 (the 'no change' class) represents 57% of the cases. Class 2 (those who 'improved') contains about 31% of the cases. The remaining 12% of the cases (classes 3 and 4) are the unstable ones.

Click on the icon to the left of Model 3 to expand it.
Click on Profile to open the Profile output for the 3-class model.

Figure 8. Profile Output for 3-class Model

Notice that the class sizes for classes 1 and 2 are about the same as the corresponding classes for the 4-class model.

Click Parameters to display the Parameters output for the 3-class model.

Figure 9. Parameters Output for 3-class Model

Notice that the alpha and beta estimates for classes 1 and 2 are also similar to those for the 4-class solution. Thus, we see that classes 1 and 2 are basically identical in the 3- and 4-class solutions.

To see that Class 3 contains both unstable classes from the 4-class solution:
Click on Standard Classification Output for the 3-class model.
Scroll down to identify the cases for which Modal = 3.

These cases are #112, #126, #135, #207, #225, and #227. (The posterior membership probabilities for five of these cases are shown below.)

Figure 10. Standard Classification Output for the 3-class Model

Compare this with the Standard Classification Output for the 4-class model.

Notice that the 6 cases assigned to class 3 in the 3-class solution have posterior membership probabilities of of being in class 3. These cases are the same as those assigned to class 3 or class 4 in the 4-class solution, again with posterior probabilities equal to

These 6 cases are the same as those identified as outliers in the analyses of these data by Rabe-Hesketh and Skrondal and excluded from their study.

While strict adherence to the BIC criterion would cause us to conclude that there are more than 4 distinct time trends for these data, for our purposes in this tutorial we will focus on the 3-class solution, which identifies 3 classes that are of substantive interest. Class 1 shows no change, class 2 shows substantial improvement, and class 3 contains a small number of unstable outliers who show a substantial increase at some follow-up time period. From a substantive perspective, we will be interested in determining to what extent treatment with the drug Progabide vs. the Placebo can explain the growth pattern exhibited by class 2 as opposed to that exhibited by class 1.

We will now refine the 3-class solution by applying the following restrictions:
Restrict the beta for LBASE to 1.
Restrict the betas for TIME to 0 for class 1 (note that we might also choose to set alpha to 0 for this class).

To apply the restrictions:
Double-click on Model 3.
Click on the Model Tab.
Select the row of 1s associated with LBASE.
Right-click to retrieve the restrictions popup menu.
Select Offset.

Figure 11. Model Tab for 3-class Model

The "1"s change to "*"s.

To implement the zero restriction:
Select the 1 associated with the Class 1 TIME effect.
Right-click to retrieve the restrictions popup menu.
Select No Effect.

The 1 changes to the symbol "-" to indicate that this effect is restricted to 0.

Figure 12. Implementing the zero restriction

Click Estimate.
Click Parameters to display the parameters output.

Figure 13. Parameters Output for new 3-class Model

The estimates are similar to those of Model 3 except that the restrictions have been incorporated into the model.

To examine the treatment effect:
Click Profile.

Figure 14. Profile Output

These column proportions show that most of those in the No Change class received the placebo, while most of those in the Improved class received the drug treatment. From the means associated with AVGBASE we also see that the Unstable group had a substantially larger base rate ( ) than the other classes.

To display the corresponding row proportions:
Click ProbMeans.

Figure 15. ProbMeans Output

We see that of those who received the drug, 47.32% are in the Improved class compared to only 14.32% of those who received the placebo. Similarly, only 42.28% of the drug group showed No change compared to 73.09% of the placebo group. These numbers are in the direction expected if the drug reduced the seizure rate.

To rename this model for future use:
Click Model 5 to select it. Model 5 is now highlighted.
To enter Edit mode, click the highlighted Model 5.
Replace Model 5 by typing Final 3-class model.
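The link between the ProbMeans row proportions and the Profile column proportions can be sketched with Bayes' rule. This is a minimal illustration using the four percentages quoted above and the group sizes from the data description (31 drug, 28 placebo); the Unstable shares are obtained by subtraction, which is an assumption, since they are not quoted in the text.

```python
n = {"drug": 31, "placebo": 28}

# Row proportions P(class | treatment group) as quoted for the ProbMeans output.
row = {
    "drug": {"No change": 0.4228, "Improved": 0.4732},
    "placebo": {"No change": 0.7309, "Improved": 0.1432},
}
for group in row:
    # Assumption: the remainder of each row belongs to the Unstable class.
    row[group]["Unstable"] = 1.0 - row[group]["No change"] - row[group]["Improved"]

classes = ["No change", "Improved", "Unstable"]

# Expected number of cases in each (class, group) cell.
expected = {c: {g: row[g][c] * n[g] for g in n} for c in classes}

# Overall class sizes and column proportions P(treatment group | class).
class_size = {c: sum(expected[c].values()) / sum(n.values()) for c in classes}
col = {c: {g: expected[c][g] / sum(expected[c].values()) for g in n} for c in classes}

print({c: round(s, 3) for c, s in class_size.items()})
print({c: {g: round(p, 3) for g, p in col[c].items()} for c in classes})
```

Under these assumptions the implied class sizes come out near 57% and 32% for the No change and Improved classes, and the column proportions show a placebo majority in No change and a drug majority in Improved, in line with the Profile output described earlier.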

Figure 16. Renaming Model 5

To save this model for future use:
Click Final 3-class model to select it.
From the File Menu, select Save Definition.
Click Save.

The model definition is saved as the file Final 3-class model.lgf.


More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

Power and Sample Size Computation for Wald Tests in Latent Class Models

Power and Sample Size Computation for Wald Tests in Latent Class Models Journal of Classification 33:30-51 (2016) DOI: 10.1007/s00357-016-9199-1 Power and Sample Size Computation for Wald Tests in Latent Class Models Dereje W. Gudicha Tilburg University, The Netherlands Fetene

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Walkthrough for Illustrations. Illustration 1

Walkthrough for Illustrations. Illustration 1 Tay, L., Meade, A. W., & Cao, M. (in press). An overview and practical guide to IRT measurement equivalence analysis. Organizational Research Methods. doi: 10.1177/1094428114553062 Walkthrough for Illustrations

More information

Mixture Modeling in Mplus

Mixture Modeling in Mplus Mixture Modeling in Mplus Gitta Lubke University of Notre Dame VU University Amsterdam Mplus Workshop John s Hopkins 2012 G. Lubke, ND, VU Mixture Modeling in Mplus 1/89 Outline 1 Overview 2 Latent Class

More information

Estimated Precision for Predictions from Generalized Linear Models in Sociological Research

Estimated Precision for Predictions from Generalized Linear Models in Sociological Research Quality & Quantity 34: 137 152, 2000. 2000 Kluwer Academic Publishers. Printed in the Netherlands. 137 Estimated Precision for Predictions from Generalized Linear Models in Sociological Research TIM FUTING

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation NELS 88 Table 2.3 Adjusted odds ratios of eighth-grade students in 988 performing below basic levels of reading and mathematics in 988 and dropping out of school, 988 to 990, by basic demographics Variable

More information

Introduction to Generalized Models

Introduction to Generalized Models Introduction to Generalized Models Today s topics: The big picture of generalized models Review of maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical

More information

A general non-parametric approach to the analysis of ordinal categorical data Vermunt, Jeroen

A general non-parametric approach to the analysis of ordinal categorical data Vermunt, Jeroen Tilburg University A general non-parametric approach to the analysis of ordinal categorical data Vermunt, Jeroen Published in: Sociological Methodology Document version: Peer reviewed version Publication

More information

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES REVSTAT Statistical Journal Volume 13, Number 3, November 2015, 233 243 MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES Authors: Serpil Aktas Department of

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Modeling Joint and Marginal Distributions in the Analysis of Categorical Panel Data

Modeling Joint and Marginal Distributions in the Analysis of Categorical Panel Data SOCIOLOGICAL Vermunt et al. / JOINT METHODS AND MARGINAL & RESEARCH DISTRIBUTIONS This article presents a unifying approach to the analysis of repeated univariate categorical (ordered) responses based

More information

Daniel Boduszek University of Huddersfield

Daniel Boduszek University of Huddersfield Daniel Boduszek University of Huddersfield d.boduszek@hud.ac.uk Introduction to moderator effects Hierarchical Regression analysis with continuous moderator Hierarchical Regression analysis with categorical

More information

General structural model Part 2: Categorical variables and beyond. Psychology 588: Covariance structure and factor models

General structural model Part 2: Categorical variables and beyond. Psychology 588: Covariance structure and factor models General structural model Part 2: Categorical variables and beyond Psychology 588: Covariance structure and factor models Categorical variables 2 Conventional (linear) SEM assumes continuous observed variables

More information

Log-linear multidimensional Rasch model for capture-recapture

Log-linear multidimensional Rasch model for capture-recapture Log-linear multidimensional Rasch model for capture-recapture Elvira Pelle, University of Milano-Bicocca, e.pelle@campus.unimib.it David J. Hessen, Utrecht University, D.J.Hessen@uu.nl Peter G.M. Van der

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Reader s Guide ANALYSIS OF VARIANCE. Quantitative and Qualitative Research, Debate About Secondary Analysis of Qualitative Data BASIC STATISTICS

Reader s Guide ANALYSIS OF VARIANCE. Quantitative and Qualitative Research, Debate About Secondary Analysis of Qualitative Data BASIC STATISTICS ANALYSIS OF VARIANCE Analysis of Covariance (ANCOVA) Analysis of Variance (ANOVA) Main Effect Model I ANOVA Model II ANOVA Model III ANOVA One-Way ANOVA Two-Way ANOVA ASSOCIATION AND CORRELATION Association

More information

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p ) Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376-390) BIO656 2009 Goal: To see if a major health-care reform which took place in 1997 in Germany was

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl

More information

Partitioning variation in multilevel models.

Partitioning variation in multilevel models. Partitioning variation in multilevel models. by Harvey Goldstein, William Browne and Jon Rasbash Institute of Education, London, UK. Summary. In multilevel modelling, the residual variation in a response

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Multilevel Mixture with Known Mixing Proportions: Applications to School and Individual Level Overweight and Obesity Data from Birmingham, England

Multilevel Mixture with Known Mixing Proportions: Applications to School and Individual Level Overweight and Obesity Data from Birmingham, England 1 Multilevel Mixture with Known Mixing Proportions: Applications to School and Individual Level Overweight and Obesity Data from Birmingham, England By Shakir Hussain 1 and Ghazi Shukur 1 School of health

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Bayesian non-parametric model to longitudinally predict churn

Bayesian non-parametric model to longitudinally predict churn Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics

More information

Links Between Binary and Multi-Category Logit Item Response Models and Quasi-Symmetric Loglinear Models

Links Between Binary and Multi-Category Logit Item Response Models and Quasi-Symmetric Loglinear Models Links Between Binary and Multi-Category Logit Item Response Models and Quasi-Symmetric Loglinear Models Alan Agresti Department of Statistics University of Florida Gainesville, Florida 32611-8545 July

More information

36-720: The Rasch Model

36-720: The Rasch Model 36-720: The Rasch Model Brian Junker October 15, 2007 Multivariate Binary Response Data Rasch Model Rasch Marginal Likelihood as a GLMM Rasch Marginal Likelihood as a Log-Linear Model Example For more

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -

More information

Stat 502X Exam 2 Spring 2014

Stat 502X Exam 2 Spring 2014 Stat 502X Exam 2 Spring 2014 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed This exam consists of 12 parts. I'll score it at 10 points per problem/part

More information

A COEFFICIENT OF DETERMINATION FOR LOGISTIC REGRESSION MODELS

A COEFFICIENT OF DETERMINATION FOR LOGISTIC REGRESSION MODELS A COEFFICIENT OF DETEMINATION FO LOGISTIC EGESSION MODELS ENATO MICELI UNIVESITY OF TOINO After a brief presentation of the main extensions of the classical coefficient of determination ( ), a new index

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

26:010:557 / 26:620:557 Social Science Research Methods

26:010:557 / 26:620:557 Social Science Research Methods 26:010:557 / 26:620:557 Social Science Research Methods Dr. Peter R. Gillett Associate Professor Department of Accounting & Information Systems Rutgers Business School Newark & New Brunswick 1 Overview

More information

Introduction to GSEM in Stata

Introduction to GSEM in Stata Introduction to GSEM in Stata Christopher F Baum ECON 8823: Applied Econometrics Boston College, Spring 2016 Christopher F Baum (BC / DIW) Introduction to GSEM in Stata Boston College, Spring 2016 1 /

More information

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Communications of the Korean Statistical Society 2009, Vol 16, No 4, 697 705 Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Kwang Mo Jeong a, Hyun Yung Lee 1, a a Department

More information

Testing the Limits of Latent Class Analysis. Ingrid Carlson Wurpts

Testing the Limits of Latent Class Analysis. Ingrid Carlson Wurpts Testing the Limits of Latent Class Analysis by Ingrid Carlson Wurpts A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Arts Approved April 2012 by the Graduate Supervisory

More information

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics Linear, Generalized Linear, and Mixed-Effects Models in R John Fox McMaster University ICPSR 2018 John Fox (McMaster University) Statistical Models in R ICPSR 2018 1 / 19 Linear and Generalized Linear

More information

Interpreting and using heterogeneous choice & generalized ordered logit models

Interpreting and using heterogeneous choice & generalized ordered logit models Interpreting and using heterogeneous choice & generalized ordered logit models Richard Williams Department of Sociology University of Notre Dame July 2006 http://www.nd.edu/~rwilliam/ The gologit/gologit2

More information

DISPLAYING THE POISSON REGRESSION ANALYSIS

DISPLAYING THE POISSON REGRESSION ANALYSIS Chapter 17 Poisson Regression Chapter Table of Contents DISPLAYING THE POISSON REGRESSION ANALYSIS...264 ModelInformation...269 SummaryofFit...269 AnalysisofDeviance...269 TypeIII(Wald)Tests...269 MODIFYING

More information

Likelihood-ratio tests for order-restricted log-linear models Galindo-Garre, F.; Vermunt, Jeroen; Croon, M.A.

Likelihood-ratio tests for order-restricted log-linear models Galindo-Garre, F.; Vermunt, Jeroen; Croon, M.A. Tilburg University Likelihood-ratio tests for order-restricted log-linear models Galindo-Garre, F.; Vermunt, Jeroen; Croon, M.A. Published in: Metodología de las Ciencias del Comportamiento Publication

More information

2005 ICPSR SUMMER PROGRAM REGRESSION ANALYSIS III: ADVANCED METHODS

2005 ICPSR SUMMER PROGRAM REGRESSION ANALYSIS III: ADVANCED METHODS 2005 ICPSR SUMMER PROGRAM REGRESSION ANALYSIS III: ADVANCED METHODS William G. Jacoby Department of Political Science Michigan State University June 27 July 22, 2005 This course will take a modern, data-analytic

More information

RANDOM INTERCEPT ITEM FACTOR ANALYSIS. IE Working Paper MK8-102-I 02 / 04 / Alberto Maydeu Olivares

RANDOM INTERCEPT ITEM FACTOR ANALYSIS. IE Working Paper MK8-102-I 02 / 04 / Alberto Maydeu Olivares RANDOM INTERCEPT ITEM FACTOR ANALYSIS IE Working Paper MK8-102-I 02 / 04 / 2003 Alberto Maydeu Olivares Instituto de Empresa Marketing Dept. C / María de Molina 11-15, 28006 Madrid España Alberto.Maydeu@ie.edu

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Dynamic sequential analysis of careers

Dynamic sequential analysis of careers Dynamic sequential analysis of careers Fulvia Pennoni Department of Statistics and Quantitative Methods University of Milano-Bicocca http://www.statistica.unimib.it/utenti/pennoni/ Email: fulvia.pennoni@unimib.it

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Outline. The binary choice model. The multinomial choice model. Extensions of the basic choice model

Outline. The binary choice model. The multinomial choice model. Extensions of the basic choice model Outline The binary choice model Illustration Specification of the binary choice model Interpreting the results of binary choice models ME output The multinomial choice model Illustration Specification

More information