SESSION 2 ASSIGNED READING MATERIALS

Size: px

Start display at page:

Download "SESSION 2 ASSIGNED READING MATERIALS"

Allyson Hines
5 years ago
Views:

1 Introduction to Latent Class Modeling using Latent GOLD SESSION 2 SESSION 2 ASSIGNED READING MATERIALS Copyright 2012 by Statistical Innovations Inc. All rights reserved. No part of this material may be reproduced or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission from Statistical Innovations Inc.

2 LATENT CLASS MODELS Jay Magidson, Statistical Innovations Inc. Jeroen K. Vermunt, Tilburg University Address of correspondence: Jay Magidson Statistical Innovations Inc. Suite Concord Ave. Belmont MA phone: (617) fax: (617)

3 Assigned Reading: Sage A: 4. OTHER TYPES OF LATENT CLASS MODELS Thus far, we have focused on the traditional LC modeling approach, including some important extensions such as covariates, several latent variables, and local dependencies. Some common characteristics of these models are that they serve as scaling methods or tools for dealing with measurement error, that indicators are nominal or ordinal, and that local independence between indicators is the primary model assumption. In this section, we discuss other types of LC models. They are not used as scaling tools, but as clustering methods, tools for dealing with unobserved heterogeneity, density estimation methods, or random-coefficients models (McLachlan and Peel, 2000). Moreover, indicators or dependent variables can be of scale types other than nominal or ordinal and local independence is no longer the basic model assumption. As we will see, in some cases there is only one indicator or dependent variable. The next section presents simple mixture models for univariate distributions, with examples of mixtures of normals and mixtures of Poisson distributions. Then, we extend this basic model by including predictors, yielding what is called mixture regression or LC regression models. We present an example of a mixed linear regression model, and show how the method can deal with various types of repeated measurements. Special attention is given to the relationship with hierarchical or multilevel models. Then, we present another extension of the simple mixture model; that is, a mixture model for multivariate distributions. As will be shown, the resulting LC model can be seen as a model-based alternative to standard hierarchical clustering 3

4 methods like K-means. We end with a short overview of LC methods that were not discussed in detail. 4.1 Simple Mixture Models Figure 3: Simulated Distribution from a Mixture of Two Normals Std. Dev = 2.14 Mean = 1.16 N = Y Consider the histogram depicted in Figure 3. This generated data set of 1,000 cases is obtained from a population consisting of a mixture of two normal distributions. For 60 percent of the population, the variable of interest follows a normal distribution with a mean of 0 and a variance of 1, N(0,1); for the other 40 percent, the mean 4

5 equals 3 and the variance 4, N(3,4). The normal curve that is drawn through the histogram shows that the resulting mixture is clearly not normally distributed. A model that can be used to describe such a phenomenon is a finite mixture model (Everitt and Hand, 1981; McLachlan and Peel, 2000), which is a particular kind of LC model. The basic formula for a mixture of univariate distributions is T X f ( y ) f ( y ). (4) t1 t t The left-hand side of Equation (4) indicates that we are interested in describing the distribution of a random variable y, which depends on a set of unknown parameters X. The right-hand side contains two terms: t is the probability of belonging to latent class or mixture component t and f y ) is the distribution of y within latent class t given some unknown parameters ( t t. The class-specific distribution of y is assumed to belong to a particular parametric family. Depending on the scale-type of y, this can, for instance, be a normal, Poisson, binomial, exponential, or gamma distribution. The summation on the right-hand side indicates that the distribution of y is a weighted mean of the class-specific distributions, where the latent class proportions serve as weights. Mixture models like these have two important types of applications. The first is density estimation: complicated distributions can be approximated by a mixture of simple parametric distributions. Another important application type is 5

6 clustering, in which case the class-specific parameters are used to define the clusters and the posterior membership probabilities are used to classify cases into the most appropriate cluster. Table 9: Test Results for Generated Mixture of Normals Data Model Log- Likelihood BIC LL Number of Parameters Equal variances 1-Class Class Class Class Unequal Variances 2-Class Class Class Table 9 presents test results for various models fitted to the data depicted in Figure 3. We estimated 1- to 4-class mixtures of normal distributions with equal and unequal within cluster-variances. As can be seen, the BIC measure identifies the correct model, the two-cluster model with unequal within-cluster variances, as best. The three-class model with equal within-cluster variances fits almost as well as, showing that a simpler parametric form can sometimes be compensated by a larger number of mixture components. In the two-class model with unequal variances, the estimated probability of belonging to class one is.64. This class has an estimated mean of and a variance of The mean and variance of the other class equals 3.24 and Note that these estimates are close to the population values we used to generate this data set. 6

7 Table 10. Observed and Estimated Frequency Distribution of Packs of Hard-Candy Purchased During Last 7 Days under the 1-Class and 3-Class Poisson Model Number of Packages Frequencies Observed 1-Class Model 3-Class Model Table 10 provides a data set taken from Dillon and Kumar (1994) that we will use as a second example. It gives the observed frequency distribution of the number of packs of hard-candy consumed by 456 respondents during the 7 days prior to the survey. Because the outcome variable is a count without a fixed maximum, it is most natural to assume that it follows a Poisson distribution. The table also reports the estimated frequency distribution obtained with a standard, or 1-class, Poisson model, as well as with a 3-class mixture Poisson model. As can be seen, the standard Poisson 7

8 model does not fit the empirical distribution at all, while the 3-class Poisson describes the data almost perfectly. This shows that a mixture of simple parametric distributions can be used to describe a quite complicated empirical distribution. Test results obtained when applying mixture Poisson models to the hardcandy data set show that models with 2 and 3 mixture components perform much better than the standard Poisson model. As is typical, there is a saturation point at which increasing the number of classes no longer increases the log-likelihood function: in this case, it occurs at 4 classes. The 3-cluster solution is the one that is preferred according to the BIC criterion. The estimated latent class proportions in the 3-class model are 0.54, 0.28, and 0.18, and the Poisson rates are 3.48, 0.29, and This means that we identified a small cluster of heavy users (more than 11 packs in 7 days), a cluster containing slightly more than a quarter of the respondents with almost no usage, and a large group of moderate users. 8

9 Assigned Reading: Sage B: 4.2 LC Regression Models In the simple mixture models discussed above, it was assumed that the mean of the chosen parametric distribution differs across latent classes. This can also be expressed by specifying a linear regression model for the mean of the distribution of interest, t, after applying some transformation or link function g(..) that depends on the scale type of the y variable. For the mean of a binomial or multinomial distribution, we use a logit transformation; for a Poisson mean, a log transformation; and for a normal mean, no transformation or an identity link. The regression model has the form g( t) 0t. As can be seen, this regression model contains only an intercept and this intercept is class-specific. Let w denote a set of predictors or explanatory variables. Suppose we are no longer interested is the unconditional distribution of y, but in the conditional distribution of y given w, f y w, ). A natural way to express the dependency of y ( t on w is by the inclusion of the set of predictors w in the right-hand side of the regression equation. In the case of a single predictor w, the resulting LC regression model (Wedel and DeSarbo, 1994) has the form g( ) tw, t 0t 1 where 0t and 1t are the class-specific regression coefficients. 9

10 Figure 4: Simulated 2-Class LC Regression Model observed YLC1 YLC2 Figure 4 depicts a data set generated from a population consisting of two latent classes, with class-specific regression models equal to 1 3w and 0 1w. It also 1 2 compares the estimated y values for the two-class model (YLC2), with the standard, 1-class, regression model (YLC1). As can be seen, the description given by the standard regression model is very poor compared to the 2-class model. The LC 10

11 regression modeling procedure has no problem identifying the two regression lines without pre-knowledge of class membership. In a LC regression model, the latent variable is a predictor that interacts with the observed predictors, which means that it serves as a moderator variable. Compared to a standard regression model where all predictors are observed this basic LC regression model provides several useful functions. First, it can be used to weaken standard regression assumptions about the nature of the effects (linear, no interactions) and the error term (independent of predictors, particular distribution, homoskedastic). Second, it makes it possible to identify and correct for sources of unobserved heterogeneity. As explained below, this is especially useful in situations where there are repeated measurements or other types of dependent observations. Longitudinal data applications are sometimes referred to as LC or mixture growth models (each latent class has its own growth curve). Third, it can be used to detect outliers since these are cases for which the primary regression model does not hold. An important application area for LC regression modeling is clustering or segmentation (Wedel and Kamakura, 1998). In particular, ratings- and choice-based conjoint studies are designed to identify subgroups (segments) that react differently to product characteristics, which is the same as saying that these groups have different regression coefficients. This type of application is illustrated in more detail below with an empirical example. 11

12 Assigned Reading: Sage C: Example: repeated measurements or clustered observations As explained below, the LC regression model can be viewed as a random-coefficients model that, similar to multilevel or hierarchical models, can take dependencies between observations into account. This extends the application of LC regression models to situations with repeated measurements or other types of dependent observations. We will illustrate LC regression with repeated measurements using an application to longitudinal survey data. This is, therefore, an example of a LC growth model. The data set consist of 264 participants in the 1983 to 1986 yearly waves of the British Social Attitudes Survey (McGrath and Waterton, 1986). The dependent variable is the number of yes responses on seven yes/no questions as to whether it is woman's right to have an abortion under specific circumstances. Because this is a count variable with a fixed total, it is most natural to work with a logit link and binomial error function. The predictors that we used are the year of measurement (1=1983; 2=1984; 3=1985; 4=1986) and religion (1=Roman Catholic, 2=Protestant; 3=Other; 4=No religion). The effect of year of measurement is assumed to be classdependent and the effect of religion is assumed to be the same for all classes. We estimated models with 1 to 5 classes, and the 4-class model turned out to performs best in terms of the BIC criterion. We also estimated more restricted models in which the time effect is assumed to be linear and/or the time effect is assumed to be class independent. These models did not describe the data as well as our four-class model, which indicates that the time trend is non linear and heterogeneous. 12

13 Table 11: Parameter Estimates for the Abortion Example Parameter Class 1 Class 2 Class 3 Class 4 Mean Std.Dev. Class size Intercept Year Religion Roman Catholic Protestant Other No religion The parameters obtained with the 4-class model appear in Table 11. The parameter means across classes indicate that the attitudes are most positive at the last time point and most negative at the second time point. Furthermore, the effects of religion show that people without religion are most in favor and Roman Catholics and Others are most against abortion. Protestants have a position that is close to the no-religion group. The class-specific parameters indicate that the 4 latent classes have very different intercepts and time patterns. The largest class 1 is most against abortion and class 3 is most in favor of abortion. Both latent classes are very stable over time. The overall level of latent class 2 is somewhat higher than of class 1, and it shows somewhat more change of the attitude over time. People belonging to latent class 4 are very instable: at the first two time points they are similar to class 2, at the third time point to class 4, and at the last time point again to class 2 (this can be seen by combining the intercepts with the time effects). Class 4 could therefore be labeled as 13

14 random responders. It is interesting to note that in a three-class solution the randomresponder class and class two are combined. Thus, by going from a three- to a fourclass solution one identifies the interesting group with less stable attitudes. Vermunt and Van Dijk (2001) used the same empirical example to illustrate the similarity between LC regression models and random-coefficients, multilevel, or hierarchical models. Using terminology from multilevel modeling, the time variable is a level-1 predictor and religion a level-2 predictor. The effect of the level-1 predictor time is allowed to vary across level 2 units, in this case individuals. The LC regression output can be transformed into the usual output produced by a standard multilevel or hierarchical model -- means, variances, covariances of the intercept and the three time effects -- by elementary statistical operations. The most important part of this multilevel output is what appears in the last two columns of Table 11. A difference between LC regression analysis and standard hierarchical models is that the former does not make strong assumptions about the distribution of the random coefficients. LC regression models can, therefore, be seen as nonparametric hierarchical models in which the distribution of the random coefficients is approximated by a limited number of mass points (= latent classes). As shown by Vermunt and Van Dijk (2001), the LC approach has the practical advantage of being much less computationally intensive than parametric models, and substantively, easier-to-interpret results are often obtained. 14

15 REFERENCES Agresti, A. (2002). Categorical Data Analysis. Second Edition. New York: Wiley. Andrews, R.L., Ansari, A., and Currim, I.S. (2002). Hierarchical bayes versus finite mixture conjoint analysis models: A comparison of fit, prediction, and partworth recovery, Journal of Marketing Research, 39, Banfield, J.D., and Raftery, A.E. (1993). Model-based Gaussian and non-gaussian clustering. Biometrics, 49, Böckenholt, U. (2002). Comparison and choice: analyzing discrete preference data by latent class scaling models. J.A. Hagenaars and A.L. McCutcheon (eds.), Applied Latent Class Analysis. Cambridge University Press. Clogg, C.C., and Goodman, L.A. (1984). Latent structure analysis of a set of multidimensional contingency tables. Journal of the American Statistical Association, 79, Croon, M.A. (2002) Ordering the classes. J.A. Hagenaars and A.L. McCutcheon (eds.), Applied Latent Class Analysis. Cambridge University Press. Dayton, C.M., and Macready, G.B. (1988). Concomitant-variable latent-class models. Journal of the American Statistical Association, 83, Dayton, C.M. (1998). Latent Class Scaling Models. Thousand Oakes: Sage Publications. Dillon, W.R., and Kumar, A. (1994). Latent structure and other mixture models in marketing: An integrative survey and overview. R.P. Bagozzi (ed.), Advanced Methods of Marketing Research, , Cambridge: Blackwell Publishers. Everitt, B.S., and Hand, D.J. (1981). Finite Mixture Distributions. London: Chapman and Hall. Fraley, C. and Raftery, A.E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. Department of Statistics, University of Washington: Technical Report no. 329.: Formann, A.K. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, Goodman, L.A. (1974a). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, Goodman, L.A. (1974b). The analysis of systems of qualitative variables when some of the variables are unobservable. Part I: A modified latent structure approach, American Journal of Sociology, 79, Haberman, S.J. (1979). Analysis of Qualitative Data, Vol 2, New Developments. New York: Academic Press. Hagenaars, J.A. (1988). Latent structure models with direct effects between indicators: local dependence models. Sociological Methods and Research, 16, Hagenaars, J.A. (1990). Categorical Longitudinal Data Loglinear Analysis of Panel, Trend and Cohort Data. Newbury Park: Sage. Heinen, T. (1996). Latent Class and Discrete Latent Trait Models: Similarities and Differences. Thousand Oakes: Sage Publications. Hunt, L., and Jorgensen. M. (1999). Mixture model clustering using the MULTIMIX program. Australian and New Zeeland Journal of Statistics, 41:

16 Jedidi, K., Jagpal, H.S., and DeSarbo, W.S. (1997). Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Marketing Science, 16, Landis, J.R., and Koch, G.G. (1977). The measurement of observer agreement for categorical data, Biometrics, 33, Langeheine, R., and Van de Pol, F. (1994). Discrete-time mixed Markov latent class models. A. Dale and R.B. Davies (eds.), Analyzing social and political change: a casebook of methods, Langeheine, R., Pannekoek, J., Van de Pol, F. (1996). Bootstrapping goodness-of-fit measures in categorical data analysis. Sociological Methods and Research, 24, Lazarsfeld, P.F., and Henry, N.W. (1968). Latent Structure Analysis. Boston: Houghton Mill. Magidson, J., and Vermunt, J.K. (2001). Latent class factor and cluster models, bi-plots and related graphical displays, Sociological Methodology, 31, Magidson J. and Vermunt, J.K. (2002). Latent class models for clustering: A comparison with K-means. Canadian Journal of Marketing Research, 20, McCutcheon, A.L. (1987). Latent Class Analysis, Sage University Paper. Newbury Park: Sage Publications. McGrath, K., and Waterton, J. (1986). British Social Attitudes, Panel Survey. London: Social and Community Planning Research, Technical Report. McLachlan, G.J., and Peel, D. (2000). Finite Mixture Models. New York: Wiley. Rindskopf, R., and Rindskopf, W. (1986). The value of latent class analysis in medical diagnosis. Statistics in Medicine, 5, Uebersax JS. (1999) Probit latent class analysis with dichotomous or ordered category measures: conditional independence/dependence models. Applied Psychological Measurement, 23, Uebersax JS, and Grove, WM. (1990). Latent class analysis of diagnostic agreement. Statistics in Medicine, 9, Van der Heijden P.G.M., Gilula, Z., and Van der Ark, L.A. (1999). On a relationship between joint correspondence analysis and latent class analysis. Sociological Methodology, 29, Vermunt, J.K. (1997). Log-linear Models for Event Histories. Thousand Oakes: Sage Publications. Vermunt, J.K., and Magidson, J. (2000). Latent GOLD 2.0 User's Guide. Belmont, MA: Statistical Innovations Inc. Vermunt, J.K., and Magidson, J. (2002). Latent class cluster analysis. J.A. Hagenaars and A.L. McCutcheon (eds.), Applied Latent Class Analysis, Cambridge University Press. Vermunt, J.K. and Van Dijk. L. (2001). A nonparametric random-coefficients approach: the latent class regression model. Multilevel Modelling Newsletter, 13, Wedel, M., and DeSarbo, W.S (1994). A review of recent developments in latent class regression models. R.P. Bagozzi (ed.), Advanced Methods of Marketing Research, , Cambridge: Blackwell Publishers. Wedel, M., and Kamakura, W.A. (1998). Market Segmentation: Concepts and Methodological Foundations. Boston: Kluwer Academic Publishers. Yung, Y.F. (1997). Finite mixtures in confirmatory factor-analysis models. Psychometrika, 62,

17 LATENT GOLD 4.0 USER S GUIDE Jeroen K. Vermunt & Jay Magidson Statistical Innovations Thinking outside the brackets! TM 17

18 CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT Assigned Reading: LG U ser' s G uide A: LC REGRESSION MODEL Various restrictions are available for intercepts and predictor effects. In addition, for models with continuous dependent variables, restrictions are available for error variances. The various restrictions include class independent restrictions, order restrictions, zero restrictions, fixed-value (offset) restrictions, and equality restrictions. By default, no restrictions are imposed. Restrictions can be placed prior to estimating an initial model to incorporate prior knowledge. For example,, the zero constraints make it possible to specify a different regression model -- with different predictors -- for each latent class based of a priori knowledge about the classes. A specific application for this is a model with a random responders class for which all predictor effects are zero. An application of the equality constraints is the possibility of defining a DFactor-like structure in which, for example, one DFactor influences the intercept and another the predictor effects. Ordering constraints are important if one has a priori knowledge on the direction of an effect. For example, the price effect on a product rating is usually assumed to be negative (or non-positive) in each latent class (segment). The estimates obtained with Regression will be constrained to be in agreement with this assumption if the price effect is specified to be Descending. Alternatively, restrictions may be employed post-hoc to estimate a new model after viewing the results from an estimated model. For Regression, when K classes have been specified in Step 6, the Model Tab contains K individual columns for each of the K latent classes (1,2,,K). Additional columns are labeled Class Independent and Order Restriction. When the range option is used in Step 6 to specify the estimation of models with different numbers of latent classes, class-specific columns are absent from the Model Tab and only the class independent and order restrictions can be applied. Such restrictions are applied to each of the models generated by the range option. 18

LATENT GOLD 4.0 USER'S GUIDE Figure 5-11. Model Tab for LC Regression Model.

19 LATENT GOLD 4.0 USER'S GUIDE Figure Model Tab for LC Regression Model. Selected intercept and predictor effects can be set equal to zero for certain classes (No Effect), equated across two or more classes (Merge Effect), or equated across all classes (Class Independent). The effects of a predictor can be restricted to be ordered in each class (Ascending or Descending). Error variances can be equated between selected classes (Merge Effect) or between all classes (Class Independent). The effect of a numeric predictor on a dependent variable that has a scale type other than nominal can be fixed to 1 in one or more classes (Offset). The first row consists of the label 'Intercept', followed by separate row for each predictor. Restrictions can be set separately for each row. By default, no restrictions are imposed. This is indicated by the unique integers (1,2,,K) that appear in each row, by the label 'No' indicating that the class independence restriction has not been imposed and the label 'None' indicating that no order restrictions have been selected for any predictors. S P E C I F Y I N G E Q U A L I T Y R E S T R I C T I O N S A C R O S S C L A S S E S The effects for a predictor may be specified as Class Independent (Fixed Effects) or Class Dependent (Random Effects, default). When the Class Independent restriction is applied, the regression coefficient for that predictor is restricted to be equal between each of the K latent classes (segments). When Class Independent is selected for a predictor, an '=' appears to the right of the predictor name in the Variables Tab. For a 1-class regression model (K=1), selection of this option has no effect. To select this option Right click on the desired cell(s) in the column labeled 'Class Independent' to retrieve the popup menu. 19

20 CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT Select 'Yes'. The 'No' changes to 'Yes' in the selected cells, and the indices in the selected rows all change to '1' to indicate that the effects for classes 2, 3, etc. are all restricted to be equal to the corresponding effects for class 1. Alternatively, the restriction of class independence can be imposed as follows: Select all the cells containing the indices for a chosen row Right click to bring up the pop-up menu Select 'Merge Effect'. To undo these restrictions, reselect the cells, right click and select 'Separate Effect'. This alternative way of imposing the class independence restriction can also be used to specify equal effects between some but not all the classes. Simply select those class indices for which the across-class equality restriction is desired and select 'Merge Effect'. The indices selected will change to be equal. 'Merge Effect' can be used more than once for a given row, so that it is possible to specify that certain effects for classes 1 and 2 be restricted to be equal and that the effects for classes 3 and 4 be equal. After imposing these restrictions, the associated indices will appear as ' '. Note: the numbers indicate to which class the effect for the class concerned is equated. The regression intercept may also be restricted to be class independent. To set this option, right click on the row labeled 'Intercept' in the column labeled 'Class Independent' to retrieve the popup menu and select 'Yes'. Note: For dependent variables having scale type 'Ordinal', a third option 'No - Simple' is also available. Rather than complete class independence ('Yes'), a 'simple' implementation of class independence analogous to a class independent intercept specification in the linear regression model (continuous scale type) is used where the (partial) dependent variable means are taken to be class independent rather than the entire marginal dependent variable distribution. For further details of this 'class-independent simple' specification for the intercept in ordinal regression models, see section 5.3 of Technical Guide S P E C I F Y I N G Z E R O R E S T R I C T I O N S To restrict one or more effects to zero, select the desired effects, right click, and select 'No Effect'. This causes a '-' to appear in the selected cells which indicates that the effect(s) associated with these cells is now restricted to zero. These menu options can also be used in combination with each other to produce a desired result. For example, first selecting class independence, followed by selecting 'Separate Effect' in one of these cells causes the index for the selected cell to return to its default setting. However, the remaining cells in that row remain restricted to be equal and 'Yes' automatically changes to 'No' in the Class Independent column. S P E C I F Y I N G O R D E R R E S T R I C T I O N S Order Restriction can be used to indicate that the regression coefficient is monotonically increasing (Ascending) or decreasing (Descending). 20

21 LATENT GOLD 4.0 USER'S GUIDE To impose an ordering restriction, right click on one or more cells containing a 'None' label and right click to retrieve the relevant pop-up menu. Select either Ascending or Descending to impose the desired ordering restriction on the selected predictor effect(s). This restriction is imposed across all classes. With a dependent variable of a scale type other than 'nominal', the usage and interpretation of this restriction is straightforward. In the case of a nominal dependent variable, the difference between parameters of adjacent categories of the dependent variable will be in agreement with the specified order restrictions (see section 5.3 of Technical Guide). SPECIFYING FIXED-VALUE RESTRICTIONS With the Offset option one can indicate that the effect of a numeric predictor should be fixed to 1 in selected classes. The option is not available for Nominal dependent variables. Note: It is possible to use the offset restriction to fix the effect to non-zero quantities other than 1 as well. For example, to restrict the effect of a numeric predictor to equal some desired constant c, you would first rescale that predictor by multiplying it by c. You would then use the rescaled version of the predictor in the model. A specific application occurs in the case in which one would like to fix the response probability to 0 for certain predictor values for certain classes. Suppose that the dependent variable is 1='buy', 0='no buy' and that you suspected that one latent class existed that always responded 'no buy' when the predictor variables reflected a certain pattern (e.g., PRICE $50). This option could be used to identify this latent class by creating a dummy predictor variable, say coded [-100, 0] where the value -100 refers to a situation which is suspected to always result in a 'no buy' for some class (fixing a logit coefficient to -100 amounts to fixing the probability of a 1='buy' response to 0). 21

22 Technical Guide for Latent GOLD 5.0: Basic and Advanced 1 Jeroen K. Vermunt and Jay Magidson Statistical Innovations Inc. (617) This document should be cited as J.K. Vermunt and J. Magidson (2013). Technical Guide for Latent GOLD 5.0: Basic and Advanced. Belmont Massachusetts: StatisticalInnovations Inc. 22

23 Assigned Reading: LG Technical G uide A: 5 Latent Class Regression Models The LC or FM Regression model implemented in Latent GOLD is a model with: a single nominal latent variable x, 2. T i replications or repeated observations of a single dependent variable y it, which may be nominal, ordinal, continuous, or a binomial or Poisson count, 3. Q numeric or nominal predictors z pred itq affecting y it via a GLM, where parameters may differ across latent classes, 16 References to LC or FM Regression analysis are Agresti (2002, section 13.2), Vermunt and Van Dijk (2001), Wedel and DeSarbo (1994), and Wedel and Kamakura (1998). 23

24 4. zero, equality, fixed value, and order restrictions on regression coefficients, 5. R numeric or nominal covariates z cov ir affecting x. The main difference between LC Regression analysis and the other forms of LC analysis implemented in Latent GOLD is that it contains a single dependent variable, which may, however, be observed more than once for each case. These multiple responses may be experimental replications, repeated measurements at different time points or occasions, clustered observations, responses on a set of questionnaire items, or other types of dependent observations. The value of the dependent variable for case i at replication t is denoted by y it, and its total number of replications by T i. Note that the index i in T i makes it possible to deal with unequal numbers of observations per case. In the context of LC Regression analysis, it makes sense to make a distinction between two types of exogenous variables: 1. variables influencing the latent variable, which we call covariates, 2. variables influencing the dependent variable, which we call predictors. Covariates will be denoted by zir cov and predictors by z pred itq, where the index reflects that the value of a predictor may change across replications. t in z pred itq A covariate, on the other hand, has the same value across all replications of a particular case. Note that, in fact, we are dealing with a two-level data set, where t indexes the lower-level observations within higher-level observation i. Covariates serve as higher-level exogenous variables and predictors as lower-level exogenous variables. This illustrates that Latent GOLD can be used to estimate (nonparametric) two-level or random-coefficient models (Aitkin, 1999; Skrondal and Rabe-Hesketh, 2004). Using t as an index for time points or time intervals, one obtains nonparametric random-coefficients models for longitudinal data, such growth data, event history data, and panel models with (Vermunt, 1997, 2002a, 2006; Vermunt and Van Dijk, 2002; Wedel et al., 1995; Zackin, De Gruttola, and Laird, 1996). 24

25 5.1 Probability Structure and Linear Predictors The most general probability structure that can be used in the Regression Module takes on the following form: f(y i z cov i, z pred i ) = K P (x z cov T i i ) x=1 t=1 f(y it x, z pred it ). (11) The main differences from the Cluster Module (see equation 6) are that in Regression we make a distinction between covariates and predictors, we allow for different numbers of replications per case, we assume that the conditional ) have the same form for each t, and we do not allow for direct effects between the multiple responses. In Latent GOLD s Regression Module, it is possible to specify a replication weight v it for each of the records of case i. This modifies the definition densities f(y it x, z pred it of f(y i z cov i, z pred i ) somewhat: f(y i z cov i, z pred i ) = K P (x z cov T i i ) x=1 t=1 { f(yit x, z pred it ) } v it. (12) Two possible applications of replication weights include grouping of records and differential weighting of responses. Another possible application is the analysis of multinomial counts. With an M-category multinomial count response variable, one would get M records for each case with responses equal to 1, 2, 3,..., M, respectively, and replication weights equal to the number of times the category concerned was selected. Another slight modification of the probability structure defined in equation (11) occurs with the zero-inflated option. Depending on the scale type, the model is expanded with either one or M latent classes that are assumed to give a particular response with probability one. Zero-inflated Poisson and binomial count models are obtained by adding a single latent class to the model, yielding a model with K + 1 Classes. Class K + 1 has a zero Poisson rate or zero binomial probability, or, equivalently, a probability of one of having zero events; that is, P (y it = 0 x, z pred it ) = 1 for x = K + 1. The same happens in the case of a continuous dependent variable with the additional modification that the dependent variable is assumed to be censored-normal distributed in the other K Classes. This model is sometimes referred to a censored-inflated linear regression model. For ordinal and nominal dependent variables, M Classes are added to the model, each of which responds 25

26 with probability one to a certain category; that is, P (y it = m x, z pred it ) = 1 for x = K + m. Such Classes are sometimes referred to a stayer Classes (in mover-stayer models) or brand-loyal Classes (in brand-loyalty models). Because the linear predictor in P (x z cov i ) has the same form as in LC Cluster models (see equation 9), we will pay no further attention to it here. The other two linear predictors of interest η x,zit and η m x,zit require more explanation because these have a specific form in LC Regression models. For continuous and count responses, we use η x,zit = β x0 + Q q=1 β xq z pred itq, where β x0 is a Class-specific intercept and β xq the Class-specific regression coefficient corresponding to predictor number q. Depending on the scale type of the response variable, this yields either a standard linear, log-linear Poisson, or binary logistic regression model with parameters that differ across latent classes. The above linear term differs from the one used in LC Cluster models in that all parameters can vary between Classes and no further restrictions are needed for identification. Another important difference is that neither η or β has an index t, which implies that parameters do not differ across repeated responses. Differences in predicted values between the multiple responses can thus only be the result of varying predictor values. 17 For nominal responses, we use a multinomial logit model having a linear predictor of the form η m x,zit = β xm0 + Q q=1 β xmq z pred itq, with depending on the selected coding type for nominal variables the identification restrictions M m=1 β xmq = 0, β x1q = 0, or β xmq = 0, for 0 q Q. An ordinal response variable is modeled with an adjacent-category logit model in which η m x,zit = β xm0 + Q q=1 β x q y m z pred itq. Here, y m is the score assigned to category m of the response variable. In the ordinal case, we need only an identifying effect or dummy constraints for the intercept term β xm0. 17 By defining an independent variable replication number or time, as well as the necessary product terms, one can make parameters replication or time dependent. 26

27 5.2 Some Special Cases The simplest probability structure for a LC Regression model occurs if there is a single response per case and no predictors; that is, K f(y i ) = P (x)f(y i x). x=1 This yields a simple univariate finite mixture model in which the mean and possibly also the variance of the distribution of y i is assumed to be Classdependent. Such a model without predictors makes it possible to describe the unobserved heterogeneity with respect to the distribution of y i in the population under study. 18 A more useful LC Regression model is obtained by including predictors in the model, such as, Here, f(y i x, z pred i1 f(y i z pred i1, z pred i2 ) = K x=1 P (x)f(y i x, z pred i1, z pred i2 )., z pred i2 ) denotes the distribution of the dependent variable y given a person s class membership x and predictor values z pred i1 and z pred i2. Depending on the type of dependent variable, the expected value of the appropriate distribution is restricted by means of a logistic, log-linear, or linear regression model. 19 An important extension of the above LC Regression model is obtained by making class membership dependent on covariates (Kamakura, Wedel, and Agrawal, 1994; Vermunt, 1997). An example of such a model is: f(y i z covc i1 z cov i2, z pred i1, z pred i2 ) = K x=1 P (x z cov i1, z cov i2 )f(y i x, z pred i1, z pred i2 ) In this model, it is assumed that the probability of belonging to latent class x depends on the values of zi1 cov and zi2 cov. This is equivalent to the way covariates can be used in LC Cluster models. 18 Relevant references on this topic are Böhning (2000), Dillon and Kumar (1994), Everitt and Hand (1981), Laird (1978), McLachlan and Basford (1988), McLachlan and Peel (2000), Magidson and Vermunt (2003a), and Titterington, Smith, and Makov (1985). 19 For some applications, see Böckenholt (1993), Land, McCall and Nagin (1996), Mare (1994), and Wedel and DeSarbo (1994). 27

28 As already mentioned above, there may be more than one observation per case; that is, there may be more than one replication of the same dependent and independent variables for each observational unit. Extending the above model to multiple replications yields the following probability structure: f(y i z cov i1, z cov i2, z pred i1, z pred i2 ) = K x=1 P (x zi1 cov, z cov T i i2 ) t=1 f(y it x, z pred it1, z pred it2 ). Such a LC Regression model for repeated measures is very similar to multilevel (two-level), mixed, or random-coefficients models, in which random effects are included to deal with the dependent observations problem. The LC Regression model is, in fact, a nonparametric random-effects model (Agresti, 2002, section 13.2; Aitkin, 1999; Skrondal and Rabe-Hesketh, 2004; Vermunt and Van Dijk, 2001). 5.3 Restrictions for the Class-specific Regression Coefficients Various types of restrictions can be imposed on the Class-specific regression coefficients: intercepts and predictor effects can be fixed to zero, restricted to be equal across certain or all Classes, and constrained to be ordered. Moreover, the effects of numeric predictors can be fixed to one. These constraints can either be used as a priori restrictions derived from theory or as post hoc restrictions on estimated models. Certain restrictions apply to parameters within each Class, while others apply across Classes. The within-class restrictions are: No Effect: the specified effect(s) are set to zero selected latent classes; Offset: the selected effect(s) are set to one, thus serving as an offset. 20 The offset effect applies to numeric predictors only, and only with ordinal, continuous, Poisson count, and binomial count responses (thus not with a nominal dependent variable). 20 The term offset stems from the generalized linear modeling framework. It refers to a regression coefficient that is fixed to 1, or equivalently, to a component that offsets the linear part of the regression model by a fixed amount. An offset provides the same role as a cell weight in log-linear an logit analysis. In log-linear and logit model, an offset is, in fact, the log of a cell weight. 28

29 Between-Class restrictions are: Merge Effects: the intercept or the effect a selected predictor is equated across 2 or more specified Classes; Class Independent: the intercept or the effects of a selected predictor is equated across all Classes; Order Restriction (ascending or descending): in each Class, the effect of a selected numeric predictor is assumed to have the same sign or the effects corresponding to a selected nominal predictor are assumed to be ordered (either ascending or descending). That is, for numeric predictors, the ascending restriction implies that the Class-specific coefficients should be at least zero (β 0) and the descending restriction that they are at most zero (β 0). For nominal predictors, ascending implies that the coefficient of category p+1 is larger than or equal to the one of category p (β p β p+1, for each p) and descending that the coefficient of category p + 1 is smaller than or equal to the one of category p (β p β p+1, for each p). Similarly to what was explained in the context of the order-restricted Cluster model, with a nominal dependent variable, the imposed order restriction applies to each of the adjacent category logits. In the case of ascending, this implies β m+1 β m 0 for numeric predictors and β m+1,p β p β m+1,p+1 β m,p+1 for nominal predictors (see Galindo and Vermunt, 2005; Vermunt, 1999; Vermunt and Hagenaars, 2004). The Class-specific error variances of a continuous dependent variable are restricted to be at least 1.0e-6 times the observed variance in the sample. The Class Independent option can be used to specify models in which some coefficients differ across Classes while others do not. This can either be on a priori grounds or can be based on the test statistics from previously estimated models. More specifically, if the Wald(=) test is not significant, it makes sense to check whether an effect can be assumed to be Class independent. There is a special variant of the Class-independent option called No Simple that can be used in conjunction with the intercept in an ordinal regression model. With this option, the intercept is modeled as β xm0 = β m0 + β x 0 y m, where β x 0 is subjected to an effect or dummy coding constraint. 29

30 This specification of a Class-specific intercept is much more parsimonious and is, in fact, equivalent to how x-y relationships with ordinal y s are modeled in LC Cluster models. Rather that estimating K M intercept terms, one now estimates only M + K 1 coefficients; that is, one extra coefficient per extra latent class. Order Restrictions are important if one has a priori knowledge about the sign of an effect. For example, the effect of price on persons preferences is usually assumed to be negative or better, non-positive for each latent class (segment). If the price effect is specified to be Descending, the resulting parameter estimate(s) will be constrained to be in agreement with this assumption. Another application of order restrictions occurs in growth modeling. Rather than assuming a specific functional form for the time dependence, one may wish to make the much less restrictive assumption that the time effect is monotonic (see, for example, Vermunt and Hagenaars, 2004). The No Effect option makes it possible to specify a different regression equation for each latent class. More specifically, each latent class may have different sets of predictors affecting the responses. Post hoc constraints can be based on the reported z value for each of the coefficients. An example of an a priori use of this constraint is the inclusion of a random-responder class, a latent class in which all coefficient are zero, except for the intercept. This is specified as follows: Intercept Predictor1 Predictor2 Class 1 Class 2 Class 3 Class , where indicates that the effect is equal to 0. In this example, Class 1 is the random-responder class. Merge Effects is a much more flexible variant of Class Independent. It can be use to equate the parameters for any set of latent classes. Besides post hoc constraints, very sophisticated a priori constraints can be imposed with this option. An important application is the specification of DFactorlike structures in which each latent class corresponds to the categories of two or more latent variables. For example, consider a set of constraints of the 30

31 form: Intercept Predictor1 Predictor2 Class 1 Class 2 Class 3 Class where the numbers in the cells indicate equal to Class #. This restricted 4-Class model is in fact a 2-dimensional DFactor model: the categories of DFactor 1 differ with respect to the intercept and the categories of DFactor 2 with respect to the two predictor effects. Specifically, level 1 of DFactor 1 is formed by Classes 1 and 2 and level 2 by Classes 3 and 4; level 1 of DFactor 2 is formed by Classes 1 and 3 and level 2 by Classes 2 and 4. The option Offset can be used to specify any nonzero fixed-value constraint on the Class-specific effect of a numeric predictor in models for ordinal, continuous and count responses. This means that it is possible to refine the definition of any Class (segment) by enhancing or reducing the estimated numeric predictor effect for that Class. Recall that numeric predictor q enters as β xq z pred itq in the linear term of the regression model. Suppose, that after estimating the model, the estimate for β xq turned out to be 1.5 for Class 1. If z pred itq is specified to be an offset, the effect of this predictor would be reduced (1.5 would be reduced to 1) for this Class. But suppose that you wish to enhance the importance of this predictor for Class 1; say, you wish to restrict β xq to be equal to 2. The trick is to recode the predictor, replacing each code. If we restrict the effect of this recoded predictor to 1, we obtain 1 2 z pred itq, which shows that the effect of z pred itq is equated to 2. Such recoding can be done by twice the value. Thus, the recoded predictor is defined as 2 z pred itq easily within Latent GOLD, using the Replace option. In addition to post hoc refinements to customize the definition of the resulting Classes, the offset restriction can also be used to make the latent classes conform to various theoretical structures. Probably the most important a priori application of Offset is the possibility of defining zero-inflated models for counts, which can also be specified with the special zero-inflated option discussed above. These are models containing one latent class that has a Poisson rate (binomial probability) equal to 0 and that is not affected by the other predictors. An example of a restrictions table corresponding to, 31

32 such a structure is: Intercept LargeNegative(-100) Predictor1 Predictor2 Class 1 Class 2 Class Here, means no effect and means offset. As can be seen, Class 1 is only affected by an offset, and Classes 2 and 3 have there own intercept and predictor effects. The numeric predictor LargeNegative(-100) takes on the value -100 for all records. 21 As a result of the fixed effect of -100, the rate/probability of experiencing an event will be equal to zero. In the case of a Poisson count, the rate for someone belonging to Class 1 equals exp( 100) = 0. In the binomial case, the probability that someone belonging to Class 1 experiences an event is exp( 100)/[1 + exp( 100)] = 0. Now, we will discusses several more advanced applications of the restriction options. Suppose you assume that the effect of price is linear and negative (descending) for Classes 1-3 and unrestricted for Class 4. This can be accomplished by having two copies of the price variable in the model, say Price1 and Price2. The effect of Price1 (numeric) is specified as ordered and is fixed to zero in Class 4. The effect of Price2 is fixed to zero in Classes 1-3. Similar tricks can be applied in LC growth models, in which one may wish to allow for a different type of time effect in each of the latent classes: for example, quadratic (time and time squared, both numeric) in Class 1, linear (time numeric) ascending in Class 2, unrestricted (time nominal) in Class 3, and no time effect in Class 4. Suppose your assumption is that the effect of a particular predictor is at least 2. This can be accomplished by combining a fixed value constraint with an order constraint. More precisely, an additional predictor defined as 2 z pred itq is specified to be an offset and the effect of the original predictor z pred itq defined to be ascending. 21 It is not necessary to assume that the -100 appears in all records. The value could also be -100 if a particular condition is fulfilled for example, if the price of the evaluated product is larger than a certain amount and 0 otherwise. This shows that the offset option provides a much more flexible way of specifying classes with zero response probabilities than the zero-inflated option. 32

33 Latent GOLD Tutorial 3: (Referred to in Lecture Notes as Pages ) 33

34 Tutorial #3: LC Regression with Repeated Measures DemoData = 'conjoint.sav' This tutorial shows how to develop Latent Class (LC) Regression models using the sample data file conjoint.sav. You will learn how to: Select the dependent variable and specify its scale type Distinguish predictors from covariates Impose restrictions on the predictor effects Specify covariates as active or inactive Determine the number of latent classes (i.e., segments) Examine R 2 and various other information related to model prediction In addition, this example illustrates several advanced options in the LC Regression Module. You will learn how to: Use the optional case ID variable to specify repeated observations Explore the Profile and ProbMeans output Use demographic variables as covariates to predict segment membership Obtain predictions based solely on the covariates Classify cases into latent segments The Data The data for this example are obtained from a hypothetical conjoint marketing study involving repeated measures where respondents were asked to provide likelihood of purchase ratings under each of several different scenarios. A partial listing of the data is shown in Figure 1. Figure 1: Partial Listing of Conjoint Data

35 As suggested in Figure 55, there are 8 records for each case (there are 400 cases in total); one record for each cell in this 2x2x2 complete factorial design of different scenarios for the purchase of a product: FASHION (1 = Traditional; 2 = Modern) QUALITY (1 = Low; 2 = High) PRICE (1 = Lower; 2 = Higher) The dependent variable (RATING) is a rating of purchase intent on a five-point scale. The three attributes listed above will be used as predictor variables in the model. We will also include the two demographic variables as covariates, in a second model. SEX (1 = Male; 2 = Female) AGE (1 = 16-24; 2 = 25-39; 3 = 40+). The Goal Use Latent GOLD to identify latent segments differing with respect to the estimate of importance attached to each of the three attributes, which influence an individual s purchase decision. The LC regression model allows for the fact that these estimates may differ for different segments. That is, for one segment, price and only price may influence the decision, while a second segment may be influenced by quality and modern appearance, but is price insensitive. We will treat RATING as an ordinal dependent variable and compare several models to determine the number of segments. We will then show how to describe the demographic differences between these segments and to classify each respondent into that segment which is most likely. Estimating an LC Regression Model Opening a Data File and Selecting the Type of Model For this example, the data file being used is an SPSS system file. To open the file, from the menus choose: File Open From the Files of type drop down list, select SPSS System Files if this is not already the default listing. All files with.sav extensions appear in the list (see Figure 2). Note: If you copied the sample data file to a directory other than the default directory, change to that directory prior to retrieving the file.

Right click to open the Model Selection menu (you may also double click the model name to open this menu or

36 Figure 2: Open Dialog Box Select conjoint.sav and click Open to open the Viewer window. Highlight Model1 if it is not already highlighted. Right click to open the Model Selection menu (you may also double click the model name to open this menu or select the type of model from the Model Menu). Select Regression and the LC Regression analysis dialog box, which contains 6 tabs, will open (see Fig. 3). Figure 3: Analysis Dialog Box for LC Regression Model

37 Selecting the Variables for the Analysis For this analysis, RATING will be the dependent variable. Select RATING in the Variables List and click Dependent to move the variable to the Dependent box. We also need to indicate the dependent variable scale type. For this example, we will use the default scale type (Ordinal-Fixed) which takes into account the natural ordering between the 5 levels of purchase intent. By default, the fixed scores on the data file (1, 2, 3, 4 and 5) are used which order the levels and establish equal distance between adjacent levels. As explained above, the data contains repeated observations for each respondent (case). Therefore, we need to indicate which records belong to each case. This is accomplished using a Case ID variable, which contains a unique identification number for each case. All records belonging to the same case are assigned the same unique ID. Select ID in the Variables list and click Case ID to move the variable into the Case ID box. Next, we will select the Predictors. Predictors are used as independent variables in the regression model. In the current example, we use the product attributes FASHION, QUALITY and PRICE as predictors. Select FASHION, QUALITY and PRICE in the Variables list and click Predictors to move the variables into the Predictors box. Specifying the Number of Classes The LC regression model simultaneously estimates a separate regression model for each class. A 1-class model estimates only a single regression model. It makes the standard homogeneity assumption that a single regression model holds true for all cases. In the current example, we will start by estimating a 1- class model and obtain a log-likelihood statistic to be used as a base. We will then estimate additional models, which successively increment the number of classes by 1 and assess the significance of each additional class. One assessment consists of a check of whether the change in the log-likelihood for each pair of successive models fails to decrease by a significant amount as determined by the BIC statistic. (The model having the lowest BIC might then be selected.) A second assessment is to utilize the p-value associated with the L 2 fit statistic. In the box titled Classes (located below the Covariates pushbutton) type 1-4 to request the estimation of 4 different LC Regression models a 1-class model, a 2-class model, a 3-class model and a 4-class model. Scanning the Data File Click Scan (located in the lower left of the Analysis dialog box) to scan the data file. The number of distinct categories (or values) along with the scaling option appears next to each variable in the Dependent and Predictors boxes. To view category labels, frequency counts and any scores assigned to any scanned variable, double click on the variable name in the Dependent or Predictor list box. The Variables dialog box will open (see Figure 4).

38 Double click the dependent variable, RATING. Figure 4: Variables box for RATING Click OK to close the Variables dialog box and return to the Regression Analysis dialog box. Estimating the Model Now that we have selected our variables and specified the models, we are ready to estimate the models. Your analysis dialog box should look like Figure 5. Figure 5: Regression Analysis Dialog Box with Initial Settings Click Estimate (located at the bottom right of the Analysis dialog box).

Viewing Output and Interpreting Results For LC Regression models, several output files are produced. To view a summary of the models estimated (Figure 6), Click on the data file name, conjoint.

39 Viewing Output and Interpreting Results For LC Regression models, several output files are produced. To view a summary of the models estimated (Figure 6), Click on the data file name, conjoint.sav in the Outline pane. Figure 6: Summary of Models Estimated This output reports statistics that will assist you in determining the correct number of classes -- the loglikelihood (LL) values, the BIC values, and the number of parameters in the estimated models. It is important to determine the right number of classes because specifying too few ignores class differences, while specifying too many may cause the model to be unstable. While the log-likelihood increases each time the number of classes is increased, the minimum BIC value occurs for Model3, suggesting that the 3- class solution is the best of the four estimated models. The R 2 increases from.37 for the 1-class model to.61 for the 3-class regression. Note 1: Occasionally, you might obtain a local (suboptimal) solution. For these data, it is possible to obtain a local solution for the 4-class model, obtaining LL = instead of If this occurs, click Estimate to re-estimate the 4-class model (see section 6.6 in the Technical Guide for a discussion of preventing local solutions). Note 2: Notice that the p-values based on the model L 2 and reported degrees of freedom (df) are not valid assessments of fit because we are dealing with sparse data. We will now examine the detailed output for the 3-class solution. Profile Output Rename Model3 to "3-class" by clicking on its name. Click the + icon next to 3-class in the Outline pane to expand the listing of output for this model. Click Profile. The Profile output contains information on the class sizes, the class-specific (marginal) probabilities and means of the dependent variable (see Figure 7).

Examination of class-specific probabilities shows that overall, segment 1 is least likely to buy (only 27.56% are Very Unlikely to buy) and segment 3 is most likely (28.

40 Figure 7: Profile Output for 3-Class Model The classes are always ordered from high to low according to their size. It can be seen from the first row of the table that segment 1 contains about 50% of the subjects (.4986), segment 2 contains about 25% and segment 3 contains the remaining 25%. Examination of class-specific probabilities shows that overall, segment 1 is least likely to buy (only 27.56% are Very Unlikely to buy) and segment 3 is most likely (28.08% are Very Likely to buy). Later in this tutorial, we will show how to classify each case into the most appropriate segment. Parameters Output Next, we will view the Parameters output (see Figure 8). For the 3-class model, click Parameters. Figure 8: Parameters Output for 3-Class Model

41 The beta parameter for each predictor is a measure of the influence of that predictor on RATING. The beta effect estimates under the column labeled Class 1 suggest that segment 1 is influenced in a positive way by products for which FASHION = Modern (beta = ), in a negative way by PRICE (beta = ), and not at all by QUALITY (beta is approximately 0). We also see that segment 2 is influenced by all 3 attributes, having a preference for those product choices that are modern (beta = ), and higher quality (beta =.9216), but, like segment 1, their preference also decreases as a function of price (beta = ). Members of segment 3 prefer higher quality products (beta = ), but their preference also decreases as a function of price (beta = ), and they are not influenced by FASHION. Note that PRICE has more or less the same influence on all three segments. The Wald (=) statistic indicates that the differences in these beta effects across classes are not significant (the p-value =.67 which is much higher than.05, the standard level for assessing statistical significance). This means that all 3 segments exhibit price sensitivity to the same degree. This is confirmed when we estimate a model in which this effect is specified to be class-independent (see next section). The p-value for the Wald statistic for PRICE is 2.4x indicating that the amount of price sensitivity is highly significant. With respect to the effect of the other two attributes we find large between-segment differences. The predictor FASHION has a strong influence on segment 1, a less strong effect on segment 2, and virtually no effect on segment 3. QUALITY has a strong effect on segment 3, a less strong effect on segment 2, and virtually no effect on segment 1. The fact that the influence of FASHION and QUALITY differs significantly between the 3 segments is confirmed by the significant p-values associated with the Wald(=) statistics for these attributes. For example, for FASHION, the p-value = 6.2x In summary, segment 1 could be labeled the Fashion-Oriented Segment, segment 3 the Quality-Oriented Segment, and segment 2 is the segment that takes into account all 3 attributes in their purchase decision. To test each individual class-specific beta for statistical significance we can append standard errors, Z- statistics, or both to the output. Right click on the output in the Contents Pane and choose Z Statistic:The Wald statistics are replaced by the Z- statistics for the betas. Notice that the absolute values of the z- score associated with QUALITY for class 1 and with FASHION for class 3 fall under 2, and hence are not significant at the.05 level. Figure 9: Parameters Output with Z-values Restricting Certain Effects to be Zero or Class Independent In the Parameters output above we saw that the beta estimates associated with PRICE are approximately equal for all 3 classes. To test the null hypothesis of equality, we used the Wald(=) statistic. The low value of.67 was too small to reject this null hypothesis. We also showed that 2 of the betas were not significantly different from 0. We will now show how to obtain a more parsimonious model by imposing zero restrictions

PRICE, right click on the Class Independent column and select Yes. Figure 10.

42 on the 2 betas and by restricting the betas associated with PRICE to be equal across segments. This is accomplished using the Model Tab. To specify the PRICE effects to be class independent, Double click 3-class to open the Analysis Dialog Box for this model Left click on the Model Tab In the row for PRICE, right click on the Class Independent column and select Yes. Figure 10. Setting the Class-Independent effects for the variable PRICE To specify the betas to be zero, Right click on the cells corresponding to the betas to be set to zero Select No Effect Figure 11. Specifying betas to 0 for variable FASHION Click on Variables to return to the Variables Tab.

An = will appear to the right of the variable PRICE to indicate the class independent restriction. Click Estimate to re-estimate this model. The output for this new model will be listed as Model5.

43 An = will appear to the right of the variable PRICE to indicate the class independent restriction. Click Estimate to re-estimate this model. The output for this new model will be listed as Model5. Viewing Output and Interpreting Results To view the summary output, Click on the data file name, conjoint.sav in the Outline pane. Figure 12: Summary Output The new model estimated has been added to the bottom of the list as Models 5. For Model 5, the fit is almost identical to Model 3 and we obtain a lower (better) BIC value. Parameters Output Click Parameters for Model5 to view the results containing the desired restrictions. For PRICE, which we specified as class independent, note that the Wald(=) is now zero because the betas have been restricted to be exactly equal to each other across classes. Adding Covariates There is one important topic left with respect to the specification of LC regression models; that is, the use of covariates. In the Covariates list box, we can specify variables that we want to use to predict class membership. For this example we will re-estimate the 3-class model, this time including SEX and AGE as covariates. To estimate the model specifying SEX and AGE as covariates, In the Outline pane, double click Model5 to open the Analysis dialog box for this model. Latent GOLD has maintained our previous settings (we will keep PRICE set as class independent). Select SEX and AGE in the Variables list box. Click Covariates to move these variables to the Covariates box. Right click on the variable names and select scale type Nominal.

Viewing Output and Interpreting Results To view the summary output, Click on the data file name, conjoint.

44 Figure 13: Specification of Model with Covariates Click Estimate. The output for this new model will be listed as Model6. Viewing Output and Interpreting Results To view the summary output, Click on the data file name, conjoint.sav in the Outline pane. Figure 14: Summary Output The new model has been added to the bottom of the list as Model 6. Note that this model is even better than our previous 3-class models as indicated by the lower BIC value.

45 Parameters Output Click Parameters for Model6 to view the results reported in Figure 15. Figure 15: Parameters Output for 3-Class Model with Covariates First, note that the beta parameter estimates for the 3-class model with covariates (see Figure 15) are similar to those in the original 3-class model (see Figure 8). The gamma parameters of the model for the latent distribution appear at the bottom of the Parameters output in Figure 69 under the heading Model for Classes. The p-values associated with the Wald statistic shows that overall, both effects are significant. The betas associated with SEX = Female (0.5423, , ) suggest that females are more likely than males of belonging to the Fashion-Oriented Segment (segment 1), and much less likely to belong to segment 2. The AGE effects show that the youngest age group is more likely than other respondents to be in the Fashion-Oriented Segment while the oldest age group is more likely to be in the Quality-Oriented Segment.

46 Classification Output To obtain the Classification output, you need to specify it as an option in the Output Tab before estimating your model. To view the standard classification output for Model 6, Double click on Model6 in the Outline pane to open the Analysis dialog box for this model. Click on the Output Tab. Click in the checkbox next to Classification - Posterior and 'Classification - Model' to select this output. Click Estimate to re-estimate the model. The new model estimated has been added to the bottom of the list as Model 7 (it is the same as Model6 except for the additional output file selected). In the Outline pane, for Model7, click Classification. Figure 16: Classification Output for Model 7 (partial listing) We can see that the first respondent (ID=1) would be classified into segment 2, since segment 2 has the highest membership probability (.5660) for that respondent (see Figure 16). (For information on how to append classification scores to you original data file, see Step 9 in Chapter 5.) Now, suppose that you wish to classify new data into the appropriate classes but you only had covariate information on these cases. You would use the Classification - Model (Covariate Classification) output for this purpose. To view this output, click on Classification - Model.

47 Figure 17. Covariate Classification Output for Model7

48 Tutorial #7A: Latent Class Growth Model (# seizures) Overview What you will learn: To use Latent GOLD to identify distinct latent class growth trajectories in the data. To name the identified latent class subgroups based on their growth patterns o classify 36 cases as 'unchanged' (Class 1 above) o classify 17 cases as 'improved' (Class 2 above) o classify the remaining 6 cases as 'unstable' (Class 3 above) How to estimate a latent class growth model (Poisson mixture model) to these data that shows those receiving the drug treatment were significantly more likely than the placebo group to improve and significantly less likely to show no change over their baseline seizure rate (p =.02). 56

49 Figure 1. A reanalysis of a longitudinal data set on counts of epileptic seizures using latent class growth models. (Data source: Thall and Vail, 1990). The Data Download Demo Data Set (.sav) = growth1.sav In this study, 59 epileptics were randomly assigned to either an anti-seizure medication Progabide (TRT = 1; n=31) or a placebo (TRT = 0; n=28) as an adjunct to other chemotherapy. For each, a base number of seizures was recorded for 8 weeks before the drugs were administered (BASE) and then for four consecutive 2 week periods (TIME) each preceding a visit to the clinic. We also know the age (AGE) of each participant. Data for the first 3 cases are shown below: 57

Figure 2. Data for the first 3 cases The Poisson Mixture Model Let denote the number of epileptic fits during the 2 weeks prior to visit T=t for case i.

50 Figure 2. Data for the first 3 cases The Poisson Mixture Model Let denote the number of epileptic fits during the 2 weeks prior to visit T=t for case i. Case i is assumed to belong to one of K unobservable latent classes x=1,2,,k. We assume that follows the Poisson mixture model given below: where = base rate of seizures for case i per 2-week baseline period = BASE /4 (labeled 'AVGBASE' in the SPSS file shown above) is the random intercept measuring the overall change from the baseline and x is the random effect associated with the t th 2-week treatment period Hence, each latent class x identifies a distinct pattern of change in the seizure rate from the baseline to the treatment 58

51 period. For identification of the, we use effect coding: In our final model, will be modeled as a class independent offset i.e.,, which implies that the expected % change in the seizure rate from the baseline to period t is: Thus, the growth trajectory associated with latent class x is given by: Tutorials #7A and #7B illustrate somewhat different approaches for using Latent GOLD to estimate 1) the trajectories and 2) the treatment effect. These different approaches agree on the following results: Latent class #1 contains approximately 57% of the cases who show basically no change from the baseline rate. Class #2 (31%-32% of the cases) show a significant reduction in seizure rate. The remaining class(es) consist of 6 unstable cases who show a substantial increase in at least one of the treatment periods. These 6 cases were all identified as outliers in an earlier analysis of these data by Rabe-Hesketh and Skrondal (2004). Those treated with Progabide were significantly less likely to be in class 1 ( no change ) and significantly more likely to be in class 2 ( improved ) than the Placebo group. In tutorial #7A, our analysis consists of two steps. First, we estimate a pure (unsupervised) growth model to identify the K different growth patterns over the four post time periods. This growth model pure in the sense that covariate information (AGE and TREATMENT status of the cases) is not taken into account. During step 2 we assess the relationship between these covariates and the different classes. In tutorial #7B, we estimate the treatment effect and all the model parameters simultaneously (i.e., in a single step) by specifying TREATMENT as an active as opposed to inactive covariate. The models and methods illustrated here are simpler to apply than approaches based on Generalized Estimating Equations that have used in conjunction with these data by others (Diggle, et. al, 1994; Lee, 2004). All estimates are maximum likelihood. 59

Setting up the Model To retrieve the setups for these models: Select FILEOPEN and click epil.lgf Double click on Model1 The Variables tab appears as follows: Figure 3. Variables Tab for epil.

52 Setting up the Model To retrieve the setups for these models: Select FILEOPEN and click epil.lgf Double click on Model1 The Variables tab appears as follows: Figure 3. Variables Tab for epil.lgf The scale type for the dependent variable Y is set to Count which means that a Poisson regression model will be estimated for each latent class. ID2 is used in the case ID box, indicating that there are multiple records for each case. The predictors TIME and LBASE are included in the Predictors box: TIME is specified as a nominal predictor which allows the estimated time trend to take on any pattern (such as a reduction during period 1 followed by an increase during period 60

53 2). Separate distinct time trends are identified for each class (i.e., random effects are estimated for TIME). The predictor LBASE is treated as class independent which means that this estimate is restricted to be equal in each class. This is indicated by the symbol = which appears next to the variable LBASE in the Predictors box. Thus, a single fixed effect will be estimated for this. Later, we will restrict this parameter estimate to 1. Four additional variables (TRT, BASE, AGE and AVGBASE) are included in the Covariate box and treated as inactive as indicated by the symbol < I >. Thus, these variables will not affect the estimation of the parameters, but these variables will be cross-tabulated by class in the output. Notice in the Classes box that the symbol 1-4 appears. This indicates that we will estimate a 1- class, 2-class, 3-class and a 4-class model. Click Estimate to estimate these 4 models After the estimation has completed: Click the data file name growth1.sav to display the model summary table Right click in the model summary output table to retrieve the Model Summary Display Click to remove the checkmarks for L-square, df and p-value in the Model Summary Display 61

Figure 4. Model Summary Output and Model Summary Display Notice that the BIC statistic is lowest for the 4-class model. We will examine the output for both the 3- and 4-class models here.

54 Figure 4. Model Summary Output and Model Summary Display Notice that the BIC statistic is lowest for the 4-class model. We will examine the output for both the 3- and 4-class models here. The R2 statistic is.84 for the 3-class and.92 for the 4-class model, and the misclassification error is approximately 6% for each of these models. Click Parameters associated with the 4-class model 62

55 Figure 5. Parameters Output for 4-class Model Note the following: The coefficient for LBASE is almost 1. The TIME estimates for class 1 are very close to zero, suggesting that this class shows no change from the baseline seizure rate. We will refer to the growth pattern for class 1 as no change. The estimate for the Intercept (alpha) for class 2 is a large negative value suggesting an overall reduction in the seizure rate for this class. Despite the small setback between periods 1 and 2 (indicated by the beta increasing from.0973 to.2976) we will refer to this as the improved class. Classes 3 and 4 show a large positive alpha, suggesting an overall increase (worsening) in the seizure rate. The betas for these classes show an abrupt increase in the number of seizures at some point during the follow-up period (time 1 for class 3 shows a beta of.3740; time 3 for class 4 shows a beta of.8410). We will see later that the cases classified into one of these unstable classes are the same ones that were identified as outliers in previous studies. We will refer to these classes as unstable. To display standard errors for these estimates, right click in the output table and select Standard Errors from the popup menu. 63

Figure 6. Parameters Output for 4-class model with Standard Errors For class 1, the estimates for the alpha and beta parameters are all less than 1 standard deviation.

56 Figure 6. Parameters Output for 4-class model with Standard Errors For class 1, the estimates for the alpha and beta parameters are all less than 1 standard deviation. Later, we will further refine class 1 by setting the TIME estimates (betas) to zero. The estimates for alpha for the other classes are well above 2 standard errors. Note that the two right-most columns provide the Overall mean and standard deviation for the alpha and beta parameters. For LBASE the standard error for the Overall effect is 0 because this is a fixed effect (i.e., it is class independent). The other estimates have non-zero standard errors because they differ depending upon the class (i.e., they are treated as class dependent or random effects). Click on Profile to display the Profile output for the 4-class model. 64

57 Figure 7. Profile Output for 4-class Model The row labeled Class Size indicates that Class 1 (the no change class) represents 57% of the cases. Class 2 (those who improved) contains about 31% of the cases. The remaining 12% of the cases (classes 3 and 4) are the unstable ones. Click on the icon to the left of Model 3 to expand it Click on Profile to open the Profile output for the 3-class model Figure 8. Profile Output for 3-class Model Notice that the class sizes for classes 1 and 2 are about the same as the corresponding classes for the 4-class model. Click Parameters to display the Parameters output for the 3-class model 65

Figure 9. Parameters Output for 3-class Model Notice that the alpha and beta estimates for classes 1 and 2 are also similar to those for the 4- class solution.

58 Figure 9. Parameters Output for 3-class Model Notice that the alpha and beta estimates for classes 1 and 2 are also similar to those for the 4- class solution. Thus, we see that the classes 1 and 2 are basically identical in the 3- and 4-class solutions. To see that Class 3 contains both unstable classes from the 4-class solution: Click on Standard Classification Output for the 3-class model. Scroll down to identify the cases for which Modal = 3 These cases are #112, #126, #135, #207, #225, and #227. (The posterior membership probabilities for five of these cases are shown below). 66

59 Figure 10. Standard Classification Output for the 3-class Model Compare this with the Standard Classification Output for the 4-class model Notice that the 6 cases assigned to class 3 in the 3-class solution have posterior membership probabilities of of being in class 3. These cases are the same as those assigned to classes 3 or class 4 in the 4 class solution, again with posterior probabilities equal to These 6 cases are the same as those identified as outliers in the analyses of these data by Rabe- 67

60 Hesketh and Skrondal and excluded from their study. While strict adherence to the BIC criteria would cause us to conclude that there are more than 4 distinct time trends for these data, for our purposes in this tutorial, we will focus here on the 3 class solution which identifies 3 classes that are of substantive interest. Class 1 shows no change, class 2 shows substantial improvement and class 3 contains a small number of unstable outliers who show a substantial increase at some follow-up time period. From a substantive perspective, we will be interested in determining to what extent treatment with the drug Progabide vs. the Placebo can explain the growth pattern exhibited by class 2 as opposed to that exhibited by class 1. We will now refine the 3-class solution by applying the following restrictions: Restrict the beta for LBASE to 1 Restrict the betas for TIME to 0 for class 1 (note that we might also choose to set alpha to 0 for this class) To apply the restrictions: Double-click on Model 3 Click on Model Tab Select the row of 1s associated with LBASE Right-click to retrieve the restrictions popup menu Select Offset 68

61 Figure 11. Model Tab for 3-class Model The 1 s change to * s. To implement the zero restriction: Select the 1 associated with the Class 1 TIME effect Right-click to retrieve the restrictions popup menu Select No Effect The one changes to the symbol - to indicate that this effect is restricted to 0. 69

62 Figure 12. Implementing the zero restriction Click Estimate Click Parameters to display the parameters output 70

63 Figure 13. Parameters Output for new 3-class Model The estimates are similar to Model 3 except that the restrictions have been incorporated into the model. To examine the treatment effect: Click Profile 71

64 Figure 14. Profile Output These column proportions show that most of those in the No Change class received the placebo while most of those in the Improved class received the drug treatment. From the means associated with AVGBASE we also see that the Unstable group had a substantially larger base rate ( ) than the other classes. To display the corresponding row proportions: Click ProbMeans 72

Figure 15. ProbMeans Output We see that of those who received the drug, 47.32% are in the Improved class compared to only 14.32% of those who received the placebo. Similarly, only 42.

65 Figure 15. ProbMeans Output We see that of those who received the drug, 47.32% are in the Improved class compared to only 14.32% of those who received the placebo. Similarly, only 42.28% showed No change compared to 73.09% of the placebo group. These numbers are in the direction expected if the drug reduced the seizure rate. To rename this model for future use: Click Model 5 to select it Model 5 is now highlighted. To enter the Edit Model Click the highlighted Model 5 Replace Model 5 by typing Final 3-class model 73

66 Figure 16. Renaming Model 5 To save this model for future use: Click Final 3-class model to select it From the File Menu Select Save Definition Click Save The model definition is saved as the file Final 3-class model.lgf 74

Growth models for categorical response variables: standard, latent-class, and hybrid approaches

Growth models for categorical response variables: standard, latent-class, and hybrid approaches Jeroen K. Vermunt Department of Methodology and Statistics, Tilburg University 1 Introduction There are three