A general class of latent variable models for ordinal manifest variables with covariate effects on the manifest and latent variables

British Journal of Mathematical and Statistical Psychology (2003), 56
The British Psychological Society

Irini Moustaki*
Department of Statistics, Athens University of Economics and Business, Greece

Previous work on a general class of multidimensional latent variable models for analysing ordinal manifest variables is extended here to allow for direct covariate effects on the manifest ordinal variables and covariate effects on the latent variables. A full maximum likelihood estimation method is used to estimate all the model parameters simultaneously. Goodness-of-fit statistics and standard errors are discussed. Two examples from the 1996 British Social Attitudes Survey are used to illustrate the methodology.

1. Introduction

Latent variable analysis of ordinal variables has been discussed in papers by Samejima (1969), Muraki and Carlson (1995) and Moustaki (2000). However, those papers limit their discussion to the case where the relationships among a number of observed ordinal variables can be explained solely by a set of latent variables. In practice, there may be applications where we would like observed explanatory variables to account, together with the latent variables, for the associations among the ordinal variables, and where we also want to investigate the effect of other explanatory variables on the latent variables in the model. In this paper we extend the work of Moustaki (2000) to allow for covariate effects both on the manifest variables and on the latent variables.
The part of the model that accommodates the effect of the latent variables and a set of observed covariates on the manifest variables is called here the measurement model with direct effects (to distinguish it from the measurement model that allows only for latent variables), and the part of the model that links a set of observed covariates with the latent variables is called the structural part of the model. Covariates are allowed to affect the manifest variables either indirectly, through the latent variables, or directly. However, there may be situations where we would like to model the effect of one set of covariates on the latent variables and the effect of a different set of covariates directly on the manifest variables. In the applications section we discuss an example in which we are interested in measuring overall satisfaction (latent variable) with the National Health Service in the respondent's area from five ordinal indicators, controlling for the respondent's political affiliation (observed covariate). In addition, we allow the covariates age and gender to affect the latent construct satisfaction.

In the literature there are two main approaches to latent variable analysis. One is the structural equation modelling (SEM) approach, which provides a general framework that allows for covariate effects and is supported by commercial software such as LISREL (Jöreskog & Sörbom, 1993), EQS (Bentler, 1992) and Mplus (Muthén & Muthén, 2000). The other is the item response theory (IRT) approach. Within the IRT approach, Verhelst, Glas, and Verstralen (1994), Zwinderman (1997) and Glas (2001) discussed the Rasch or one-parameter logistic model with covariate effects, and Sammel, Ryan, and Legler (1997) discussed a unidimensional latent trait model for binary and normal outcomes that allows for covariate effects. The methodology we discuss here for ordinal variables is also based on the IRT approach. Our aim is to develop a general IRT framework similar to that of SEM; however, the models discussed here do not allow for relationships among the latent variables. A brief description of the SEM approach to fitting latent variable models for ordinal variables is given later in this section.

* Requests for reprints should be addressed to Irini Moustaki, Department of Statistics, Athens University of Economics and Business, 76 Patision Street, Athens, Greece (moustaki@aueb.gr).
In the case where there is a measurement model with no direct effects, the covariate effects on the latent variables can be estimated in one or two stages. In the one-stage approach the parameters of the measurement and structural parts of the model are estimated simultaneously. In the two-stage approach the measurement model is fitted first, and factor scores (Moustaki & Knott, 2000) are then computed and used as dependent variables in further analysis. Croon and Bolck (1997) note that in the one-stage approach it is more difficult to identify misspecifications in either the measurement or the structural part of the model. Also, because of the greater model complexity, a local rather than a global solution may be found. However, they found that the two-stage approach, in which factor scores are treated as observed variables and regressed on a set of explanatory variables, leads to biased estimates. Jöreskog and Goldberger (1975) discussed a multiple indicators and multiple causes (MIMIC) model for normal manifest variables with a single latent variable, which allows for direct and indirect effects of covariates on the latent and manifest variables, respectively. Their results show that the parameter estimates of the measurement and structural models differ between the one- and two-stage methods. They also found that the one-stage method gives more efficient parameter estimates. Muthén (1989) discusses the MIMIC model for other types of manifest variables, such as binary and ordinal, as a way of capturing heterogeneity across groups (groups are defined through the covariates). He argues that the MIMIC model is a good alternative to multi-group analysis when not enough data are available to estimate a model in each group. The MIMIC model has been developed within the SEM framework.
By this we mean that, in the ordinal case, the parameters of the measurement model are estimated from polychoric correlations, which are themselves estimated by maximum likelihood. In the SEM framework the ordinal variables $y$ are taken to be manifestations of underlying, continuous, unobserved variables $y^*$. Packages such as LISREL and Mplus fit the MIMIC model to ordinal manifest variables in two stages. More specifically, the distribution of the underlying variables $y^*$ conditional on the vector of observed covariates $\mathbf w$ is assumed to be multivariate normal with a polychoric correlation matrix $\mathbf P$. The elements of $\mathbf P$ are estimated from the bivariate distributions of the $y$ variables. For large samples, the parameters of the measurement and structural parts of the model are then estimated by weighted least squares, with the asymptotic covariance matrix of the polychoric correlations used as the weight matrix. A comparison between the LISREL-type models for ordinal variables and the models presented here without covariate effects can be found in Moustaki (2001) and Jöreskog and Moustaki (2001).

In this paper we discuss a general model framework for analysing ordinal manifest variables which allows for covariate effects on both the latent and the manifest variables using full maximum likelihood. This approach differs from the SEM approach in three ways. First, all the effects (model parameters) are estimated simultaneously. Secondly, there is no need to assume that each ordinal variable is a manifestation of an underlying variable, so no assumptions are needed about those underlying variables; instead, distributional assumptions are made for the observed ordinal variables. Thirdly, a full maximum likelihood estimation method is used. The approach extends the models for ordinal variables discussed by Samejima (1969), Muraki and Carlson (1995) and Moustaki (2000) to allow for covariate effects.

2. Model and estimation

Let $y_1, y_2, \ldots, y_p$ be the ordinal observed variables. Lower-case letters are used to denote both the variables and the values that these variables take. Let $c_i$ denote the number of categories of the $i$th variable.
The $c_i$ ordered categories have probabilities $\pi_{i1}(\mathbf z,\mathbf x), \pi_{i2}(\mathbf z,\mathbf x), \ldots, \pi_{ic_i}(\mathbf z,\mathbf x)$, which are functions of the $q \times 1$ vector of latent variables $\mathbf z$ and the $r \times 1$ vector of observed covariates $\mathbf x$. The covariates $\mathbf x$ and the latent variables $\mathbf z$ directly affect the manifest ordinal variables or, more precisely, the probability of a response in a specific category. In addition, we allow the $k \times 1$ vector of covariates $\mathbf w$ to affect $\mathbf z$. Figure 1 shows the relationships that may be modelled, using an example with three ordinal variables and three covariates. The three observed ordinal variables $\mathbf y' = (y_1, y_2, y_3)$ are indicators of a single latent variable $z_1$. The latent variable $z_1$ and the observed covariate $x_1$ account for the associations among the $y$ variables. The arrow from $x_1$ to $y_1$ indicates that the mean level (here, the thresholds) of $y_1$ may differ across values of $x_1$. Finally, the variables $\mathbf w' = (w_1, w_2)$ have an effect on the latent variable $z_1$. For example, if $w_1$ is a variable with two categories, then the arrow from $w_1$ to $z_1$ indicates that the mean of $z_1$ may differ across the two groups defined by $w_1$. Note that $x_1$ needs to be different from the variables $\mathbf w$ for identification reasons that are explained later in the paper; as a result, an arrow cannot be added from $x_1$ to $z_1$ when there is already an arrow from $x_1$ to all the $y$ variables. Both $x_1$ and $\mathbf w$ are considered fixed, and they may be correlated. Figure 1 shows all the possible relationships that can be modelled; in a given application some of them may be absent. For example, there may be only covariates affecting the latent variables, or only covariates affecting the observed ordinal indicators.

[Figure 1. Path diagram]

Measurement model with direct effects

First, we model the associations among the $y$ variables as explained by the latent variables $\mathbf z$ and the covariates $\mathbf x$. The general form given in Moustaki (2000) for the latent variable model with ordinal variables is extended here to allow for covariate effects:

$$\operatorname{link}[\gamma_{is}(\mathbf z,\mathbf x)] = \operatorname{link} P(y_i \le s \mid \mathbf z,\mathbf x) = \tau_{is} - \sum_{j=1}^{q}\alpha_{ij} z_j + \sum_{l=1}^{r}\beta_{il} x_l, \qquad i = 1,\ldots,p;\ s = 1,\ldots,c_i, \tag{1}$$

where $\gamma_{is}(\mathbf z,\mathbf x)$ is the cumulative probability of a response in category $s$ or lower to item $y_i$, that is, $\gamma_{is}(\mathbf z,\mathbf x) = \pi_{i1}(\mathbf z,\mathbf x) + \pi_{i2}(\mathbf z,\mathbf x) + \cdots + \pi_{is}(\mathbf z,\mathbf x)$. To simplify the notation, we suppress the dependence on the latent variables $\mathbf z$ and the observed covariates $\mathbf x$ and write $\gamma_{is}$. The probability that a randomly selected individual responds in category $s$ can then be derived from the cumulative probabilities as

$$\pi_{is} = \gamma_{is} - \gamma_{i,s-1}, \qquad i = 1,\ldots,p;\ s = 2,\ldots,c_i, \tag{2}$$

with $\pi_{i1} = \gamma_{i1}$.

There are a number of link functions to choose from, such as the logit, the complementary log-log, the inverse normal, the inverse Cauchy and the log-log functions. All these link functions are monotonically increasing functions that map $(0,1)$ onto $(-\infty,+\infty)$. The parameters $\tau_{is}$ are referred to as 'cut-points' on the logistic, probit or other scale, where $-\infty = \tau_{i0} < \tau_{i1} < \tau_{i2} < \cdots < \tau_{i,c_i-1} < \tau_{i,c_i} = +\infty$. We see from (1) that the coefficients of the covariates $\mathbf x$ and the latent variables $\mathbf z$ shift the cut-points. For example, suppose that there is only one covariate $x_1$, representing gender. If the effect of gender on the ordinal observed variable $y_i$ is significant given that the latent variables are in the model, then the cut-points for males and females differ by $\hat\beta_{i1}$. In other words, females and males with the same position on the latent variables are allowed to have different cumulative and response probabilities.
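Equations (1) and (2) can be made concrete with a short sketch for a single item. It assumes one latent variable and one covariate for brevity and uses only the Python standard library. The helper names (`category_probs`, `logistic`, `probit`) and all parameter values are hypothetical. The probit is included only to show that the link function can be swapped without changing the rest of the computation.

```python
import math

# Illustrative sketch; function names and parameter values are hypothetical.


def logistic(eta):
    """Inverse logit link: maps a linear predictor in (-inf, inf) to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit(eta):
    """Inverse probit link (standard normal CDF), written with math.erf."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def category_probs(tau, alpha, beta, z, x, inv_link=logistic):
    """Category probabilities pi_i1..pi_ic for one ordinal item.

    tau   -- increasing finite cut-points tau_i1 < ... < tau_{i,c-1}
    alpha -- loading alpha_i1 on the single latent variable z
    beta  -- direct effect beta_i1 of the single covariate x
    """
    # Cumulative probabilities gamma_is from (1); gamma_{i,c} = 1 by definition.
    gamma = [inv_link(t - alpha * z + beta * x) for t in tau] + [1.0]
    # Equation (2): pi_is = gamma_is - gamma_{i,s-1}, with gamma_i0 = 0.
    previous = [0.0] + gamma[:-1]
    return [g - g_prev for g, g_prev in zip(gamma, previous)]


# Hypothetical four-category item and a binary covariate (e.g. gender coded 0/1).
tau, alpha, beta = [-1.0, 0.5, 2.0], 1.2, 0.8
p_x0 = category_probs(tau, alpha, beta, z=0.0, x=0.0)
p_x1 = category_probs(tau, alpha, beta, z=0.0, x=1.0)
print([round(p, 3) for p in p_x0])
print([round(p, 3) for p in p_x1])
```

Because $\beta_{i1}$ shifts every cut-point by the same amount, moving $x$ from 0 to 1 at the same $z$ moves probability towards the lower categories when $\beta_{i1} > 0$, which is the cut-point shift described above. Passing `inv_link=probit` changes the link and leaves the rest unchanged.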
The $\alpha_{ij}$ parameters can be considered as discrimination parameters or factor loadings, since they measure the effect of the latent variables $\mathbf z$ on a function of the cumulative probability of responding up to a given category of the $i$th item, controlling for the effect of the covariates $\mathbf x$. In the case of one latent variable, the negative sign in front of the slope parameter in (1) means that, for a positive $\alpha_{i1}$, the response to item $y_i$ becomes more likely to fall at the high end of the scale as $z$ increases. The $\beta_{il}$ are regression coefficients.

[Figure 2. Response probabilities, $\tau_{i1} = -3$, $\tau_{i2} = 0$, $\tau_{i3} = 3$, $\alpha_{i1} = 1.0$]

Figure 2 gives the response probabilities $\pi_{is}$ computed from (2) for a single latent variable without covariate effects, with the logit as link function. The response probabilities are computed for an item with four categories, with threshold parameters $\tau_{i1} = -3.0$, $\tau_{i2} = 0.0$, $\tau_{i3} = 3.0$ and discrimination parameter $\alpha_{i1} = 1.0$. Figure 2 shows that an individual with a low score on the latent variable $z$ has a high probability of choosing the lowest category (category 1). Individuals with intermediate scores on the latent variable have moderate probabilities of responding in any of the four categories, and individuals with high scores have a high probability of choosing the highest category (category 4). Furthermore, the shape of the response probability function differs across the categories of the item: categories 1 and 4 have monotone decreasing and increasing response probability functions, respectively, while categories 2 and 3 have unimodal functions. Because of these different shapes, the response probabilities cannot all be described by a single monotone function of the latent variable. This is why the cumulative probabilities, which are monotone, are modelled instead. In addition to these properties, Samejima (1969) showed that the ratios $\tau_{i1}/\alpha_{i1}$ and $\tau_{i3}/\alpha_{i1}$ give the values on the $z$ scale at which the probability of a response in category 1 and in category 4, respectively, is 0.5. This holds only for the first and the last category of each item. From Fig. 2 we see that for categories 1 and 4 the probability is 0.5 at $z = -3/1$ and $z = 3/1$, respectively.

Figure 3 gives the cumulative probabilities $\gamma_{is}$ for a single latent variable without covariate effects and for the same parameter values as in Fig. 2. Except for $\gamma_{i4}$, which is constant and equal to 1, they all have the same inverted S-shape.
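The item behind Figures 2 and 3 can be reproduced numerically. The sketch below uses a hypothetical helper, `response_probs`, and only the standard library. It evaluates (2) with the logit link and checks Samejima's result that categories 1 and 4 have probability 0.5 at $z = \tau_{i1}/\alpha_{i1}$ and $z = \tau_{i3}/\alpha_{i1}$. It also locates, on a grid, the mode of the unimodal category 2 curve.

```python
import math

# Illustrative sketch of the Figure 2 item; the helper name is hypothetical.


def response_probs(z, tau=(-3.0, 0.0, 3.0), alpha=1.0):
    """pi_i1..pi_i4 for the Figure 2 item: logit link, no covariates."""
    gamma = [1.0 / (1.0 + math.exp(-(t - alpha * z))) for t in tau] + [1.0]
    previous = [0.0] + gamma[:-1]
    return [g - g_prev for g, g_prev in zip(gamma, previous)]


tau, alpha = (-3.0, 0.0, 3.0), 1.0
p_cat1 = response_probs(tau[0] / alpha)[0]   # category 1 at z = tau_i1 / alpha_i1
p_cat4 = response_probs(tau[2] / alpha)[3]   # category 4 at z = tau_i3 / alpha_i1
print(p_cat1, p_cat4)  # 0.5 0.5

# Category 2 is unimodal; its mode lies midway between tau_i1 and tau_i2 (z = -1.5).
grid = [k / 100 for k in range(-600, 601)]
mode_cat2 = max(grid, key=lambda z: response_probs(z)[1])
print(mode_cat2)
```

The mode of category 2 falls halfway between the two cut-points that bound it because the logistic density is symmetric.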
[Figure 3. Cumulative probabilities, $\tau_{i1} = -3$, $\tau_{i2} = 0$, $\tau_{i3} = 3$, $\alpha_{i1} = 1.0$]

If we put the model into the generalized linear model framework, then the random component of the model is that, conditional on the latent variables $\mathbf z$ and the covariates $\mathbf x$, each of the $p$ random response variables $y_1, \ldots, y_p$ has a distribution from the exponential family. The systematic component is the one in which $\mathbf z$ and $\mathbf x$ produce a linear predictor $\eta_{is}$ corresponding to each category of $y_i$:

$$\eta_{is} = \tau_{is} - \sum_{j=1}^{q}\alpha_{ij} z_j + \sum_{l=1}^{r}\beta_{il} x_l, \qquad i = 1,\ldots,p;\ s = 1,\ldots,c_i.$$

Finally, the link between the systematic component and the conditional means of the random component distributions is given by $\eta_{is} = v_{is}(\mu_{is})$, where $\mu_{is} = E(y_{is} \mid \mathbf z,\mathbf x)$ and $v_{is}(\cdot)$ is the link function, which can be any monotonic differentiable function.

Let $\mathbf y = (y_1, y_2, \ldots, y_p)$ represent the whole response pattern of a randomly selected individual. The density function $f(\mathbf y \mid \mathbf x)$ of the manifest variables $\mathbf y$ is

$$f(\mathbf y \mid \mathbf x) = \int g(\mathbf y \mid \mathbf z,\mathbf x)\, h(\mathbf z \mid \mathbf w,\Lambda)\, d\mathbf z, \tag{3}$$

where $g(\mathbf y \mid \mathbf z,\mathbf x)$ is the conditional density function of $\mathbf y$ given $\mathbf z$ and $\mathbf x$, and $h(\mathbf z \mid \mathbf w,\Lambda)$ is the density function of $\mathbf z$ conditional on $\mathbf w$ and $\Lambda$. The latent variables are assumed to be independent with normal distributions. The matrix of parameters $\Lambda$ is defined later. The covariates $\mathbf x$ are assumed to be fixed. Under the assumption of conditional independence of $\mathbf y$ given $\mathbf z$ and $\mathbf x$, the latent variables $\mathbf z$ and the observed covariates $\mathbf x$ account for the interrelationships among the observed ordinal variables, so that when the latent variables are held fixed the responses to the $p$ observed variables are independent:

$$g(\mathbf y \mid \mathbf z,\mathbf x) = \prod_{i=1}^{p} g(y_i \mid \mathbf z,\mathbf x). \tag{4}$$

For a manifest item $y_i$, the conditional probability of $y_i$ given $\mathbf z$ and $\mathbf x$ is

$$g(y_i \mid \mathbf z,\mathbf x) = \prod_{s=1}^{c_i} \pi_{is}(\mathbf z,\mathbf x)^{y_{is}} = \prod_{s=1}^{c_i} (\gamma_{is} - \gamma_{i,s-1})^{y_{is}}, \tag{5}$$

where $y_{is} = 1$ if the response $y_i$ is in category $s$ and $y_{is} = 0$ otherwise. Equation (5) can also be written as

$$g(y_i \mid \mathbf z,\mathbf x) = \prod_{s=1}^{c_i-1} \left(\frac{\gamma_{is}}{\gamma_{i,s+1}}\right)^{y^*_{is}} \left(\frac{\gamma_{i,s+1} - \gamma_{is}}{\gamma_{i,s+1}}\right)^{y^*_{i,s+1} - y^*_{is}}, \tag{6}$$

where $y^*_{is} = 1$ if a randomly selected individual's response to item $i$ is in category $s$ or lower and $y^*_{is} = 0$ otherwise. Taking the log of (6), we have

$$\log g(y_i \mid \mathbf z,\mathbf x) = \sum_{s=1}^{c_i-1}\left[ y^*_{is}\log\left(\frac{\gamma_{is}}{\gamma_{i,s+1} - \gamma_{is}}\right) - y^*_{i,s+1}\log\left(\frac{\gamma_{i,s+1}}{\gamma_{i,s+1} - \gamma_{is}}\right)\right] = \sum_{s=1}^{c_i-1}\left[ y^*_{is}\, v_{is}(\mathbf z,\mathbf x) - y^*_{i,s+1}\, b(v_{is}(\mathbf z,\mathbf x))\right]. \tag{7}$$

From (7) we see that each component has the form of the general expression for an exponential family distribution. More specifically,

$$v_{is}(\mathbf z,\mathbf x) = \log\left(\frac{\gamma_{is}}{\gamma_{i,s+1} - \gamma_{is}}\right), \qquad s = 1,\ldots,c_i-1, \tag{8}$$

and

$$b(v_{is}(\mathbf z,\mathbf x)) = \log\left(\frac{\gamma_{i,s+1}}{\gamma_{i,s+1} - \gamma_{is}}\right) = \log\{1 + \exp(v_{is}(\mathbf z,\mathbf x))\}, \qquad s = 1,\ldots,c_i-1. \tag{9}$$

To simplify the notation we write $v_{is}$ and $b(v_{is})$. Note that the parameter $v_{is}$ is not a linear function of the latent variables.

Structural model

As already mentioned in the Introduction, the effect of covariates on latent variables can be measured in either one or two stages. In this paper we are interested in the one-stage approach, in which the parameters of the measurement model with or without direct effects (1) and the parameters of the structural model (see (10) below) are estimated simultaneously. Let us assume that the latent variables $\mathbf z_m$ for individual $m$ are related to a set of observed covariates $\mathbf w_m$ in a simple linear manner:

$$\mathbf z_m = \Lambda \mathbf w_m + \boldsymbol\delta_m, \qquad m = 1,\ldots,n, \tag{10}$$

where $\mathbf z_m$ is a $q \times 1$ vector, $\Lambda$ is a $q \times k$ matrix of regression coefficients, $\mathbf w_m$ is a $k \times 1$ vector of covariates, and $\boldsymbol\delta_m$ is a $q \times 1$ vector of independent standard normal variables.
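As a quick check of what (10) implies, the sketch below simulates a single latent variable ($q = 1$) for one fixed covariate vector and compares the sample mean and variance with $\boldsymbol\lambda'\mathbf w_m$ and 1. The coefficient values, the covariate names (age, gender) and the helper `simulate_latent` are all hypothetical and do not come from the paper.

```python
import random
import statistics

# Illustrative sketch; the helper name and all values are hypothetical.


def simulate_latent(lam, w, n, seed=2003):
    """Draw n values of z = lam'w + delta with delta ~ N(0, 1), i.e. (10) with q = 1."""
    rng = random.Random(seed)
    mean = sum(l_k * w_k for l_k, w_k in zip(lam, w))
    return [mean + rng.gauss(0.0, 1.0) for _ in range(n)]


lam = [0.5, -0.8]   # hypothetical effects of standardized age and of gender
w = [1.2, 1.0]      # one fixed covariate vector: age = 1.2, gender = 1
draws = simulate_latent(lam, w, 100_000)
print(round(statistics.fmean(draws), 2), round(statistics.pvariance(draws), 2))
```

The sample mean should be close to $0.5 \times 1.2 - 0.8 \times 1 = -0.2$ and the sample variance close to 1.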
It follows that, conditional on the covariates $\mathbf w_m$, the latent variables $\mathbf z_m$ are normally distributed with mean $\Lambda\mathbf w_m$ and identity covariance matrix, so that each latent variable has variance 1. The covariates $\mathbf w$ are assumed to be fixed and non-stochastic. Alternatively, in the two-stage approach, one can compute latent or factor scores based on the measurement model (1) and use them as dependent variables in further analysis with the vector of covariates $\mathbf w$. To score individuals on the latent dimensions identified by the analysis, one can use the mean of the posterior distribution of the latent variable $z_j$ given the individual's response pattern, $E(z_j \mid \mathbf y_m,\mathbf x_m)$. In a $q$-factor model the posterior mean is given by

$$E(z_j \mid \mathbf y_m,\mathbf x_m) = \int_{R_{z_1}} \cdots \int_{R_{z_q}} z_j\, h(\mathbf z,\Lambda \mid \mathbf y_m,\mathbf x_m)\, d\mathbf z, \tag{11}$$

where $R_{z_j}$ denotes the range of values of $z_j$ and $h(\mathbf z,\Lambda \mid \mathbf y_m,\mathbf x_m)$ is the posterior distribution of the latent variables given the observed variables.

Model identification

We now discuss a necessary condition for the identification of the model presented in Fig. 1. This model is identified as long as the set of covariates $\mathbf x$ is different from the set of covariates $\mathbf w$, as explained below. Furthermore, the latent variable $z_1$ is assumed to have a normal distribution with variance 1. This specification identifies the scale of $z_1$, which in turn identifies the scale of the item parameters (see (10)).

Take a simple case with only one latent variable $z_1$ and one covariate $x_1$. Assume that the same covariate $x_1$ affects both some function of the cumulative probability of responding up to category $s$ of an ordinal item $y_i$, through the measurement model, and the latent variable $z_1$, through the structural part of the model. Equation (1) becomes

$$\operatorname{link}[\gamma_{is}(z_1,x_1)] = \tau_{is} - \alpha_{i1} z_1 + \beta_{i1} x_1, \qquad i = 1,\ldots,p;\ s = 1,\ldots,c_i, \tag{12}$$

and (10) becomes

$$z_1 = \lambda_1 x_1 + \delta_1. \tag{13}$$

Substituting (13) into (12) gives

$$\operatorname{link}[\gamma_{is}(z_1,x_1)] = \tau_{is} - \alpha_{i1}\delta_1 - (\alpha_{i1}\lambda_1 - \beta_{i1})\, x_1, \qquad i = 1,\ldots,p;\ s = 1,\ldots,c_i. \tag{14}$$

From (14) it is apparent that the parameters $\lambda_1$ and $\beta_{i1}$ cannot be estimated separately, because only the combination $\alpha_{i1}\lambda_1 - \beta_{i1}$ enters the model. These parameters are therefore not identified.
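The non-identification in (14) can also be seen numerically. The sketch below assumes a logit link and hypothetical parameter values. It evaluates the cumulative probability after substituting (13) into (12) for two different pairs $(\lambda_1, \beta_{i1})$ that share the same value of $\alpha_{i1}\lambda_1 - \beta_{i1}$. The two pairs give identical probabilities for every $x_1$ and $\delta_1$, so no data set could distinguish them.

```python
import math

# Illustrative sketch; the helper name and parameter values are hypothetical.


def cum_prob(tau, alpha, beta, lam, x, delta):
    """gamma_is from (12) after substituting z1 = lam * x1 + delta1 from (13)."""
    z = lam * x + delta
    return 1.0 / (1.0 + math.exp(-(tau - alpha * z + beta * x)))


tau, alpha = 0.3, 1.5
pair_a = {"lam": 0.8, "beta": 0.5}    # alpha*lam - beta = 0.7
pair_b = {"lam": 0.2, "beta": -0.4}   # alpha*lam - beta = 0.7 as well

max_gap = max(
    abs(cum_prob(tau, alpha, x=x, delta=d, **pair_a)
        - cum_prob(tau, alpha, x=x, delta=d, **pair_b))
    for x in (-1.0, 0.0, 2.0)
    for d in (-1.0, 0.0, 1.5)
)
print(max_gap)  # zero up to floating-point rounding
```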
If, instead, we had used different covariates, then (13) would have been written as

$$z_1 = \lambda_1 w_1 + \delta_1. \tag{15}$$

Substituting (15) into (12), we have

$$\operatorname{link}[\gamma_{is}(z_1,x_1)] = \tau_{is} - \alpha_{i1}\delta_1 + \gamma_{i1} w_1 + \beta_{i1} x_1, \qquad i = 1,\ldots,p;\ s = 1,\ldots,c_i, \tag{16}$$

where $\gamma_{i1} = -\alpha_{i1}\lambda_1$. Equation (16) is a measurement model with direct effects in which the latent variable is represented by the $\delta_1$ term, which is assumed to have a standard normal distribution. The model parameters in (16) are all identified, even when the covariates $w_1$ and $x_1$ are correlated. When the structural part of the model (15) is substituted into the measurement model with direct effects (12), the reduced form (16) gives the effect of the covariate $w_1$ on the cumulative probabilities of the observed ordinal variables simply by multiplying $\alpha_{i1}$ by $\lambda_1$ and reversing the sign. In the general case, with more than one latent variable and more than one covariate $w$, the direct effect of covariate $w_l$ on the cumulative probability of the observed variable $y_i$ is $-\sum_{j=1}^{q}\alpha_{ij}\lambda_{jl}$.

In other words, (16), obtained by substituting the structural part of the model into the measurement model with direct effects, is equivalent to (1) when there is no structural part. In (1) with no structural part, the latent variables $\mathbf z$ have standard normal distributions; in (10) the latent variables are represented by the term $\boldsymbol\delta$, which also has a standard normal distribution. However, the two models estimate different numbers of parameters, so the reduced-form direct effects obtained from $-\sum_{j=1}^{q}\alpha_{ij}\lambda_{jl}$ will not always be close to those obtained when model (1) is used. Although models (1) and (16) are equivalent, one might choose one over the other depending on the effects one is interested in measuring.

Model estimation

The model discussed so far consists of two components: the measurement part with direct effects (1) and the structural part (10). The aim is to estimate all the parameters simultaneously. The estimation method described below is a full maximum likelihood method, which means that the model is fitted to the whole response pattern, including both the responses to the $p$ ordinal variables and the values of the $r$ covariates. The parameters to be estimated are $\tau$, $\alpha$, $\beta$ and $\Lambda$. We start by writing down the joint density function of the random variables:

$$f(\mathbf y,\mathbf z \mid \mathbf x,\mathbf w) = g(\mathbf y \mid \mathbf z,\mathbf x,\mathbf w)\, h(\mathbf z \mid \mathbf x,\mathbf w,\Lambda). \tag{17}$$

Since $\mathbf y$ does not depend on $\mathbf w$ and $\mathbf z$ does not depend on $\mathbf x$, (17) can be written as

$$f(\mathbf y,\mathbf z \mid \mathbf x,\mathbf w) = g(\mathbf y \mid \mathbf z,\mathbf x)\, h(\mathbf z \mid \mathbf w,\Lambda). \tag{18}$$

In addition, we assume that the latent variables $\mathbf z$ and the covariates $\mathbf x$ account for the associations among the ordinal variables $\mathbf y$, so that the conditional distribution of $\mathbf y$ given the latent variables and the observed covariates is

$$g(\mathbf y \mid \mathbf z,\mathbf x) = \prod_{i=1}^{p} g(y_i \mid \mathbf z,\mathbf x).$$

Using (18), for a random sample of size $n$ the complete log-likelihood is

$$L = \sum_{m=1}^{n}\log f(\mathbf y_m,\mathbf z_m \mid \mathbf x_m,\mathbf w_m) = \sum_{m=1}^{n}\left[\sum_{i=1}^{p}\log g(y_{im} \mid \mathbf z_m,\mathbf x_m) + \log h(\mathbf z_m \mid \mathbf w_m,\Lambda)\right]. \tag{19}$$

Because $\mathbf z$ is unknown, the log-likelihood (19) is maximized using an expectation–maximization (EM) algorithm. In the expectation step the expected score function of the model parameters is computed, where the expectation is taken with respect to the posterior distribution of $\mathbf z$ given the observations, $h(\mathbf z,\Lambda \mid \mathbf y,\mathbf x)$. In the maximization step updated parameter estimates are obtained. The score function is the first derivative of the log-likelihood with respect to the parameters. The first term on the right-hand side of (19) is the distribution of the observed variables $\mathbf y$ conditional on the latent variables $\mathbf z$ and the observed covariates $\mathbf x$, and the second term is the distribution of $\mathbf z$ conditional on the observed covariates $\mathbf w$.
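The E-step needs the posterior weights $h(\mathbf z,\Lambda \mid \mathbf y_m,\mathbf x_m)$ at a set of nodes. The same computation also gives the marginal likelihood (3) and the posterior mean (11). The sketch below assumes a single latent variable with $z \sim N(\boldsymbol\lambda'\mathbf w, 1)$, one covariate $x$ and logit-link items. For simplicity it uses an evenly spaced midpoint grid in place of the Gauss–Hermite points used in the paper. All names and parameter values are hypothetical.

```python
import itertools
import math

# Illustrative sketch; a midpoint grid replaces Gauss-Hermite points, names are hypothetical.


def item_prob(y, tau, alpha, beta, z, x):
    """P(y_i = y | z, x) for a logit-link item with categories 1..len(tau)+1."""
    def gamma(s):
        if s <= 0:
            return 0.0
        if s > len(tau):
            return 1.0
        return 1.0 / (1.0 + math.exp(-(tau[s - 1] - alpha * z + beta * x)))
    return gamma(y) - gamma(y - 1)


def posterior_on_grid(ys, items, x, mean_z, n_points=121, half_width=6.0):
    """Return f(y | x, w), posterior weights at the nodes, and E(z | y, x).

    mean_z plays the role of lambda'w in (10); the prior weight of each node is the
    N(mean_z, 1) density times the grid spacing (midpoint rule).
    """
    step = 2.0 * half_width / n_points
    nodes = [mean_z - half_width + (t + 0.5) * step for t in range(n_points)]
    joint = []
    for z in nodes:
        prior = math.exp(-0.5 * (z - mean_z) ** 2) / math.sqrt(2.0 * math.pi) * step
        lik = 1.0
        for y, (tau, alpha, beta) in zip(ys, items):  # conditional independence (4)
            lik *= item_prob(y, tau, alpha, beta, z, x)
        joint.append(lik * prior)
    marginal = sum(joint)
    weights = [j / marginal for j in joint]
    post_mean = sum(z * p for z, p in zip(nodes, weights))  # equation (11)
    return marginal, weights, post_mean


items = [((-1.0, 0.0, 1.0), 1.2, 0.3),   # (tau, alpha, beta) per item, hypothetical
         ((-2.0, 0.5, 1.5), 0.8, 0.0),
         ((-0.5, 0.5, 2.0), 1.5, -0.2)]
_, _, low = posterior_on_grid([1, 1, 1], items, x=0.5, mean_z=0.0)
_, _, high = posterior_on_grid([4, 4, 4], items, x=0.5, mean_z=0.0)
print(round(low, 3), round(high, 3))

# Sanity check: f(y | x, w) summed over all 4**3 response patterns should be close to 1.
total = sum(posterior_on_grid(list(pattern), items, x=0.5, mean_z=0.0)[0]
            for pattern in itertools.product(range(1, 5), repeat=3))
print(round(total, 6))
```

Because every loading is positive, the all-lowest pattern gets a negative posterior mean and the all-highest pattern a positive one.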

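Estimation of $\Lambda$

From (19) we see that the estimation of the parameters in the matrix $\Lambda$ does not depend on the first component of the complete log-likelihood. Therefore, $\Lambda$ can be estimated separately from the rest of the parameters ($\tau$, $\alpha$ and $\beta$). In addition, the latent variables are assumed to be independent conditional on $\mathbf w$, so that $h(\mathbf z \mid \mathbf w,\Lambda) = h(z_1 \mid \mathbf w,\boldsymbol\lambda_1)\cdots h(z_q \mid \mathbf w,\boldsymbol\lambda_q)$, where $\boldsymbol\lambda_j$ is the $j$th row of $\Lambda$. The expected score function with respect to the parameter vector $\boldsymbol\lambda_j$, $j = 1,\ldots,q$, takes the form

$$ES_m(\boldsymbol\lambda_j) = \int\cdots\int S_m(\boldsymbol\lambda_j)\, h(\mathbf z,\Lambda \mid \mathbf y_m,\mathbf x_m)\, d\mathbf z, \tag{20}$$

where $h(\mathbf z,\Lambda \mid \mathbf y_m,\mathbf x_m)$ denotes the posterior distribution of the latent variables given what has been observed, and

$$S_m(\boldsymbol\lambda_j) = \frac{\partial \log h(z_j \mid \mathbf w_m,\boldsymbol\lambda_j)}{\partial \boldsymbol\lambda_j} = \mathbf w_m (z_j - \mathbf w_m'\boldsymbol\lambda_j), \qquad j = 1,\ldots,q.$$

Equation (20) becomes

$$ES_m(\boldsymbol\lambda_j) = \int\cdots\int \mathbf w_m (z_j - \mathbf w_m'\boldsymbol\lambda_j)\, h(\mathbf z,\Lambda \mid \mathbf y_m,\mathbf x_m)\, d\mathbf z. \tag{21}$$

Solving $\sum_{m=1}^{n} ES_m(\boldsymbol\lambda_j) = 0$ and approximating the integrals over $\mathbf z$ by a weighted sum over a finite number of points and weights ($n_j$ points for $z_j$), we get an explicit solution for the maximum likelihood estimator of $\boldsymbol\lambda_j$:

$$\hat{\boldsymbol\lambda}_j = \left(\sum_{m=1}^{n}\mathbf w_m\mathbf w_m'\right)^{-1} \sum_{m=1}^{n}\mathbf w_m \sum_{t_1=1}^{n_1}\cdots\sum_{t_q=1}^{n_q} z_{t_j}\, h(z_{t_1},\ldots,z_{t_q},\Lambda \mid \mathbf y_m,\mathbf x_m), \tag{22}$$

where

$$h(z_{t_1},\ldots,z_{t_q},\Lambda \mid \mathbf y_m,\mathbf x_m) = \frac{g(\mathbf y_m \mid z_{t_1},\ldots,z_{t_q},\mathbf x_m)\, h(z_{t_1} \mid \mathbf w_m,\boldsymbol\lambda_1)\cdots h(z_{t_q} \mid \mathbf w_m,\boldsymbol\lambda_q)}{f(\mathbf y_m \mid \mathbf x_m)}.$$

The points for the integral approximations are the Gauss–Hermite quadrature points given in Stroud and Secrest (1966). This approximation in effect treats the latent variables as discrete, with values $z_{t_1},\ldots,z_{t_q}$ and corresponding probabilities $h(z_{t_1} \mid \mathbf w,\boldsymbol\lambda_1),\ldots,h(z_{t_q} \mid \mathbf w,\boldsymbol\lambda_q)$. Equation (22) is updated at each step of the EM algorithm described below.

Estimation of the model parameters $\tau$, $\alpha$ and $\beta$

The estimation of the parameters $\tau$, $\alpha$ and $\beta$ depends only on the first component of (19). Let $\boldsymbol\alpha_i' = (\tau_{i1},\ldots,\tau_{i,c_i-1},\alpha_{i1},\ldots,\alpha_{iq},\beta_{i1},\ldots,\beta_{ir})$, $i = 1,\ldots,p$, be the vector of parameters for item $i$.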
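The expected score function of the parameter vector $\boldsymbol\alpha_i$, where the expectation is taken with respect to $h(\mathbf z,\Lambda \mid \mathbf y,\mathbf x)$, is

$$ES_m(\boldsymbol\alpha_i) = \int\cdots\int S_m(\boldsymbol\alpha_i)\, h(\mathbf z,\Lambda \mid \mathbf y_m,\mathbf x_m)\, d\mathbf z, \qquad m = 1,\ldots,n, \tag{23}$$

where $S_m(\boldsymbol\alpha_i) = \partial \log g(\mathbf y_m \mid \mathbf z,\mathbf x_m)/\partial\boldsymbol\alpha_i$, $i = 1,\ldots,p$.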

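Now

$$\frac{\partial \log g(\mathbf y_m \mid \mathbf z,\mathbf x_m)}{\partial\boldsymbol\alpha_i} = \sum_{s=1}^{c_i-1}\left[ y^*_{ism}\frac{\partial v_{ism}}{\partial\boldsymbol\alpha_i} - y^*_{i,s+1,m}\frac{\partial b(v_{ism})}{\partial\boldsymbol\alpha_i}\right]. \tag{24}$$

Substituting (24) into (23) gives

$$ES_m(\boldsymbol\alpha_i) = \int\cdots\int \sum_{s=1}^{c_i-1}\left[ y^*_{ism}\frac{\partial v_{ism}}{\partial\boldsymbol\alpha_i} - y^*_{i,s+1,m}\frac{\partial b(v_{ism})}{\partial\boldsymbol\alpha_i}\right] h(\mathbf z,\Lambda \mid \mathbf y_m,\mathbf x_m)\, d\mathbf z. \tag{25}$$

Solving $\sum_{m=1}^{n} ES_m(\boldsymbol\alpha_i) = 0$ and approximating the integral with Gauss–Hermite quadrature points, we get non-explicit solutions for the parameter vector $\boldsymbol\alpha_i$:

$$\sum_{t_1=1}^{n_1}\cdots\sum_{t_q=1}^{n_q}\sum_{s=1}^{c_i-1}\left[\sum_{m=1}^{n} y^*_{ism}\frac{\partial v_{ism}}{\partial\boldsymbol\alpha_i}\, h(z_{t_1},\ldots,z_{t_q},\Lambda \mid \mathbf y_m,\mathbf x_m) - \sum_{m=1}^{n} y^*_{i,s+1,m}\frac{\partial b(v_{ism})}{\partial\boldsymbol\alpha_i}\, h(z_{t_1},\ldots,z_{t_q},\Lambda \mid \mathbf y_m,\mathbf x_m)\right] = 0. \tag{26}$$

Expression (26) can be written as

$$\sum_{t_1=1}^{n_1}\cdots\sum_{t_q=1}^{n_q}\sum_{s=1}^{c_i-1}\left[r_{i,s,t_1,\ldots,t_q} - r_{i,s+1,t_1,\ldots,t_q}\right] = 0, \tag{27}$$

where

$$r_{i,s,t_1,\ldots,t_q} = \sum_{m=1}^{n} h(z_{t_1},\ldots,z_{t_q},\Lambda \mid \mathbf y_m,\mathbf x_m)\, y^*_{ism}\frac{\partial v_{ism}}{\partial\boldsymbol\alpha_i}, \tag{28}$$

$$r_{i,s+1,t_1,\ldots,t_q} = \sum_{m=1}^{n} h(z_{t_1},\ldots,z_{t_q},\Lambda \mid \mathbf y_m,\mathbf x_m)\, y^*_{i,s+1,m}\frac{\partial b(v_{ism})}{\partial\boldsymbol\alpha_i}. \tag{29}$$

From these results we see that, to compute the derivatives with respect to the model parameters for any link function, we need the first derivatives of the functions $v_{ism}$ and $b(v_{ism})$ with respect to the model parameters. The maximization of the log-likelihood is carried out by an EM algorithm. In the model without covariate effects, $v_{ism}$ and $b(v_{ism})$ do not depend on the individual $m$, so their derivatives can be taken outside the sums over $m$ in (28) and (29).

EM algorithm

The steps of the EM algorithm are defined as follows:

(1) Choose initial estimates for the model parameters $\tau_{is}$, $\alpha_{ij}$, $\beta_{il}$ and $\lambda_{jn}$, where $i = 1,\ldots,p$; $s = 1,\ldots,c_i-1$; $l = 1,\ldots,r$; $j = 1,\ldots,q$; $n = 1,\ldots,k$.
(2) Compute the values $r_{i,s,t_1,\ldots,t_q}$ and $r_{i,s+1,t_1,\ldots,t_q}$ (E-step).
(3) Obtain improved estimates by solving the non-linear maximum likelihood equations for the parameters $\tau_{is}$, $\alpha_{ij}$ and $\beta_{il}$, and by using the explicit solutions for the parameters $\lambda_{jn}$ of the latent distribution (M-step).
(4) Return to step 2 and continue until convergence is attained.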
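The update for $\boldsymbol\lambda_j$ in step (3) is explicit. The inner sum over nodes in (22) is the approximate posterior mean $E(z_j \mid \mathbf y_m,\mathbf x_m)$ from the E-step, so (22) amounts to a least-squares regression of those posterior means on $\mathbf w_m$. The sketch below shows this with the standard library only and assumes the posterior means have already been computed. The helper names are hypothetical, and a small Gaussian-elimination solver stands in for a linear-algebra library.

```python
# Illustrative sketch of the explicit M-step (22); helper names and data are hypothetical.


def solve(a, b):
    """Solve the k x k system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update_lambda_j(W, post_means):
    """Equation (22): (sum_m w_m w_m')^{-1} sum_m w_m E(z_j | y_m, x_m)."""
    k = len(W[0])
    wtw = [[sum(w[a] * w[b] for w in W) for b in range(k)] for a in range(k)]
    wte = [sum(w[a] * e for w, e in zip(W, post_means)) for a in range(k)]
    return solve(wtw, wte)


# Hypothetical covariates w_m = (age, gender) and E-step posterior means.
W = [[1.0, 0.0], [2.0, 1.0], [0.5, 1.0], [3.0, 0.0], [-1.0, 1.0]]
post_means = [0.5 * age - 0.3 * gender for age, gender in W]
print(update_lambda_j(W, post_means))
```

The posterior means here are constructed to lie exactly on $0.5\,\text{age} - 0.3\,\text{gender}$, so the update recovers $(0.5, -0.3)$. In a real EM run they would come from the E-step and change at every iteration. No constant is included in $\mathbf w$, since a common shift of the latent mean would be absorbed by the cut-points $\tau_{is}$.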
At the M-step, a one-step Fisher scoring algorithm is used to solve the non-linear maximum likelihood equations.

Sampling properties of the maximum likelihood estimates

From first-order asymptotic theory, the maximum likelihood estimates have a sampling distribution that is asymptotically normal. Asymptotically, the sampling variances and covariances of the maximum likelihood estimates of the model parameters are given by the elements of the inverse of the information matrix at the maximum likelihood solution. The standard errors given in the examples have been computed from an approximation of the inverse of the information matrix evaluated at the maximum likelihood solution. If we denote by $\boldsymbol\gamma$ the set of all model parameters, then an approximation of the information matrix is given by

$$I(\hat{\boldsymbol\gamma}) = \left\{\sum_{m=1}^{n}\frac{1}{f(\mathbf y_m,\mathbf z_m \mid \mathbf x_m,\mathbf w_m)^2}\,\frac{\partial f(\mathbf y_m,\mathbf z_m \mid \mathbf x_m,\mathbf w_m)}{\partial\gamma_j}\,\frac{\partial f(\mathbf y_m,\mathbf z_m \mid \mathbf x_m,\mathbf w_m)}{\partial\gamma_k}\right\}_{\boldsymbol\gamma = \hat{\boldsymbol\gamma}}.$$

2.6. Proportional odds model

The general measurement model with direct effects presented in (1) takes different forms depending on the link function used. There are many link functions to choose from, such as the logit, the complementary log-log, the inverse normal, the inverse Cauchy and the log-log functions. The logit and the inverse normal function, also known as the probit, are the link functions most often used in practice. The probit and logit link functions have very similar shapes and therefore give similar results. Here we discuss the logit link in more detail, since it is the one used in Section 3. When the logit link is used in (1), the model is known as the proportional odds model and is written as

$$\log\left(\frac{\gamma_{is}(\mathbf z,\mathbf x)}{1 - \gamma_{is}(\mathbf z,\mathbf x)}\right) = \tau_{is} - \sum_{j=1}^{q}\alpha_{ij} z_j + \sum_{l=1}^{r}\beta_{il} x_l, \tag{30}$$

where $s = 1,\ldots,c_i-1$ and $i = 1,\ldots,p$. From (30) we obtain

$$\gamma_{is} = P(y_i \le s \mid \mathbf z,\mathbf x) = \frac{\exp\left(\tau_{is} - \sum_{j=1}^{q}\alpha_{ij} z_j + \sum_{l=1}^{r}\beta_{il} x_l\right)}{1 + \exp\left(\tau_{is} - \sum_{j=1}^{q}\alpha_{ij} z_j + \sum_{l=1}^{r}\beta_{il} x_l\right)}, \tag{31}$$

where $s = 1,2,\ldots,c_i-1$ and $\gamma_{ic_i} = 1$. Let $\boldsymbol\alpha_i' = (-\alpha_{i1},\ldots,-\alpha_{iq},\beta_{i1},\ldots,\beta_{ir})$ and $\mathbf v' = (\mathbf z',\mathbf x')$. Then, for two individuals with scores $\mathbf v_1$ and $\mathbf v_2$, the difference between the two corresponding logits is $\boldsymbol\alpha_i'(\mathbf v_2 - \mathbf v_1)$, which does not depend on the category involved. The derivatives required in (26) for the proportional odds model are given in the Appendix. Models fitted to ordinal items should preserve the ordinality property of the items.
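The proportional odds property stated after (31) can be checked directly. For two hypothetical individuals, the difference between their logits in (30) is the same for every category $s$. The sketch assumes one latent variable, one covariate, a five-category item and made-up parameter values.

```python
import math

# Illustrative sketch; parameter values are hypothetical.


def logit(p):
    return math.log(p / (1.0 - p))


def cumulative(tau, alpha, beta, z, x):
    """gamma_is, s = 1..c_i - 1, from (31) with one latent variable and one covariate."""
    return [1.0 / (1.0 + math.exp(-(t - alpha * z + beta * x))) for t in tau]


tau, alpha, beta = [-2.0, -0.5, 1.0, 2.5], 1.3, 0.6   # five-category item
g1 = cumulative(tau, alpha, beta, z=-0.4, x=1.0)      # individual 1, v1 = (z, x)
g2 = cumulative(tau, alpha, beta, z=0.9, x=0.0)       # individual 2, v2 = (z, x)
diffs = [logit(b) - logit(a) for a, b in zip(g1, g2)]
# Every entry equals -alpha*(0.9 - (-0.4)) + beta*(0.0 - 1.0) = -2.29, up to rounding.
print([round(d, 6) for d in diffs])
```

The cut-points $\tau_{is}$ cancel in each difference, which is why the result is the same for every category.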
Models should be invariant when only a reversal of the category order occurs, but not when the categories are arbitrarily permuted. Models such as the proportional odds model, the probit and the inverse Cauchy are affected by an arbitrary permutation of the response categories, but not by a simple reversal of the category order. Under a reversal, the regression and latent-variable coefficients only change sign, and the threshold parameters change sign and order.

Goodness of fit

In theory, the goodness of fit of the model can be checked by computing a Pearson chi-square or a likelihood ratio statistic from the whole response pattern. When the number of manifest ordinal variables is large, many response patterns will have expected frequencies less than 5, and many will be so rare that they do not occur in the sample at all. From a practical point of view, therefore, these tests cannot be used.

Alternatively, we can compute the Pearson chi-square statistic or the likelihood ratio statistic only for pairs and triplets of responses. The pairwise distribution of any two variables can be displayed as a two-way contingency table, and chi-square residuals can be constructed in the usual way by comparing the observed and expected frequencies. As a rule of thumb, if we consider the residual in each cell as having a $\chi^2$ distribution with one degree of freedom, then a residual greater than 4 indicates poor fit at the 5% significance level. A study of the individual margins shows where the model does not fit. A detailed discussion of the use of these goodness-of-fit measures for ordinal variables can be found in Jöreskog and Moustaki (2001) and Bartholomew, Steele, Moustaki, and Galbraith (2002, pp. 213–234). However, for the model with covariate effects, the Pearson chi-square or likelihood ratio statistic for pairs and triplets of responses has to be computed separately for different values of the explanatory variables, which makes these residuals less informative about goodness of fit.

Alternatively, instead of testing the goodness of fit of a specified model, we can use a criterion for selecting among a set of different models. This procedure gives information about the goodness of fit of each model in comparison with the other models. It can be useful for determining the number of factors required, or for comparing the model with latent variables and covariate effects with the model with latent variables only. Sclove (1987) reviews some of the model selection criteria used in multivariate analysis, such as those due to Akaike, Schwarz and Kashyap. These criteria take into account the value of the likelihood at the maximum likelihood solution and the number of parameters estimated.
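The pairwise residual check described at the start of this subsection can be sketched as follows. The observed and expected $2 \times 2$ tables are invented for illustration; in practice the expected frequencies would be $n$ times the model-implied probabilities for each pair of responses. The function names are hypothetical.

```python
# Illustrative sketch; tables and function names are hypothetical.


def chi_square_residuals(observed, expected):
    """Cell residuals (O - E)^2 / E for the two-way table of a pair of items."""
    return [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(observed, expected)]


def poorly_fitting_cells(residuals, cutoff=4.0):
    """Cells whose residual exceeds the rule-of-thumb cutoff of 4."""
    return [(i, j)
            for i, row in enumerate(residuals)
            for j, r in enumerate(row)
            if r > cutoff]


observed = [[34, 6], [8, 52]]             # hypothetical counts for items y_1, y_2
expected = [[22.0, 18.0], [20.0, 40.0]]   # hypothetical model-implied counts
res = chi_square_residuals(observed, expected)
print(poorly_fitting_cells(res))  # [(0, 0), (0, 1), (1, 0)]
```

For the model with covariates, the same check would have to be repeated within groups defined by the explanatory variables, which is the limitation noted above.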
Akaike's criterion for determining the order of an autoregressive model in time series has also been used to determine the number of factors in factor analysis (see Akaike, 1987):

$$ \mathrm{AIC} = -2\log l(\hat{\boldsymbol{\alpha}}) + 2m, \qquad (32) $$

where l(α̂) is the maximized likelihood function, m is the number of model parameters and α̂ is the vector of all model parameters (measurement and structural). The model with the smallest AIC value is taken to be the best. In this paper we also use an information complexity criterion proposed by Bozdogan (2000), defined as

$$ \mathrm{ICOMP} = -2\log l(\hat{\boldsymbol{\alpha}}) + 2C_1\big(\hat{\mathbf{F}}^{-1}(\hat{\boldsymbol{\alpha}})\big), \qquad (33) $$

where C_1 denotes the maximal information complexity of F̂^{-1}(α̂), the estimated inverse Fisher information matrix. As with AIC, the model with the smallest ICOMP value is preferred.

3. Applications

In this section we use the proportional odds model, with a logit link function, to analyse two data sets from the 1996 British Social Attitudes Survey (BSA).¹

¹ Social and Community Planning Research, British Social Attitudes Survey, 1996 [computer file], Colchester, Essex: The Data Archive [distributor], 2 December, SN: 3921.

Example 1

The first data set consists of five ordinal manifest variables (y_1, ..., y_5) measuring attitudes to the role of government. Respondents were asked whether, on the whole, they thought it should or should not be the government's responsibility to:

provide a job for everyone who wants one [JobEvery]
keep prices under control [PriCon]
provide a decent standard of living for the unemployed [LivUnem]
reduce income differences between the rich and the poor [IncDiff]
provide decent housing for those who can't afford it [Housing]

The response alternatives given to the respondents were: definitely should be, probably should be, probably should not be, and definitely should not be. Item non-response varied between 2% and 6%. After excluding the missing values, we were left with 822 respondents. Missing values can also be incorporated into the latent variable analysis (see O'Muircheartaigh and Moustaki, 1999).

A covariate x, constructed to measure left-to-right political identification, was used after standardization as a continuous explanatory variable for the manifest ordinal variables. The 'left-right' variable is available in the 1996 BSA survey; it was constructed from a set of five items related to redistribution and equality, and is usually used for distinguishing party identification (see Heath, Jowell, Curtice, & Witherspoon, 1986).

We started the analysis by fitting the measurement model with no direct effects (equation (30) with no x variables). The estimated thresholds τ̂_is and factor loadings α̂_i1, with estimated standard errors, are given in Tables 1 and 2, respectively. The fourth column of Table 2 gives the standardized factor loadings st α̂_i1, which express the correlations between the manifest variable y_i and the latent variable z_j. For details on how to compute standardized loadings, see Bartholomew and Knott (1999). Items 3, 4 and 5 have the highest discriminating power, followed by items 1 and 2. The positive sign of the loadings indicates that the more an individual believes that the state should not be responsible for its citizens, the less likely that individual is to choose the lower categories of the ordinal variables.
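The sign interpretation above can be checked numerically, together with the earlier statement that reversing the category order only changes the signs of the loadings and the signs and order of the thresholds. The sketch assumes the cumulative logit form logit P(y ≤ s | z) = τ_s − α z, a sign convention chosen to match the interpretation in the text; the parameter values are illustrative, not estimates from Tables 1 and 2, and `reverse_item` is a hypothetical helper.

```python
import numpy as np


def category_probs(tau, alpha, z):
    """P(y = s | z) for s = 1..S, assuming logit P(y <= s | z) = tau_s - alpha * z."""
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(tau, dtype=float) - alpha * z)))
    cum = np.concatenate(([0.0], cum, [1.0]))  # P(y <= 0) = 0 and P(y <= S) = 1
    return np.diff(cum)


def reverse_item(tau, alpha):
    """Parameters for the same item after recoding s -> S + 1 - s:
    thresholds change sign and order, the loading changes sign."""
    return -np.asarray(tau, dtype=float)[::-1], -alpha


# Illustrative values for a four-category item (three thresholds).
tau, alpha = np.array([-1.5, 0.2, 1.8]), 1.1

# With alpha > 0, the lowest category becomes less likely as z increases.
p_low = [category_probs(tau, alpha, z)[0] for z in (-1.0, 0.0, 1.0)]
print(np.round(p_low, 3))

# The reversed parameterization reproduces the same probabilities in reverse order.
tau_rev, alpha_rev = reverse_item(tau, alpha)
for z in (-2.0, 0.0, 0.7):
    assert np.allclose(category_probs(tau_rev, alpha_rev, z),
                       category_probs(tau, alpha, z)[::-1])
```

The reversal step uses only the identity 1 − F(τ_{S−s} − αz) = F(−τ_{S−s} + αz), which holds for any link whose distribution function is symmetric about zero; the logistic, normal and Cauchy distributions all have this property.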
Table 1. Estimated thresholds and standard errors for the measurement model with no direct effects, Example 1
Item | Category | τ̂_is | s.e.
JobEvery
PriCon
LivUnem
IncDiff
Housing

Table 2. Estimated factor loadings, standard errors and standardized factor loadings for the measurement model with no direct effects, Example 1
Item | α̂_i1 | s.e. | st α̂_i1
JobEvery
PriCon
LivUnem
IncDiff
Housing

Table 3 gives the pairs of categories for which the chi-square residuals were greater than 4 for the one-factor measurement model with no direct effects. For example, a bad fit was detected for category 1 of item 1 combined with category 1 of item 2. These residuals are not independent and therefore cannot be summed to give an overall goodness-of-fit measure. Rather, they indicate pairs of items and categories that the model does not fit well, and they provide information for collapsing categories and for omitting items from the analysis to improve fit (Jöreskog and Moustaki, 2001).

Table 3. Chi-square residuals greater than 4 for two-way margins, Example 1
Item (1, 1), (1, 2), (2, 2) (1, 2), (1, 4) (2, 4) (4, 1) (2, 4), (3, 1), (4, 2) (4, 4) 2 (1, 4), (4, 1) (3, 3) (1, 3), (4, 1) 3 (4, 1), (4, 3) (1, 2), (1, 4), (2, 2) (3, 3), (4, 1), (4, 2) 4 (3, 4), (4, 1)

We continued by allowing the 'left-right' variable to affect the manifest variables directly, to see whether the latent variable z together with the covariate x could better explain the associations among the observed ordinal variables. The maximum likelihood estimates of the threshold parameters are given in Table 4, and the factor loadings and regression parameters in Table 5. The estimated factor loadings are all positive and of similar magnitude to those obtained when the measurement model without direct effects was fitted. Taking their standard errors into account, the estimated regression coefficients were found to be significant. This means that, depending on the individual's position on the 'left-right' scale, the thresholds for each item y_i are shifted by an amount proportional to β̂_i.
In addition, the negative sign of the regression coefficients shows that the more right-wing an individual is, the lower the probability of being in the low-level categories of the ordinal observed variables. Table 6 gives the AIC and ICOMP criteria for the models with and without the covariate. Both criteria suggest that the model with the covariate effect on the manifest variables fits better than the one without it.
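Equations (32) and (33) are simple to compute once the maximized log-likelihood, the number of parameters and the estimated inverse Fisher information matrix are available. The sketch below takes Bozdogan's maximal information complexity of an s × s matrix Σ to be C_1(Σ) = (s/2) log(tr Σ / s) − (1/2) log|Σ|; that formula is not given in this paper and is an assumption to be checked against Bozdogan (2000). The function names are hypothetical.

```python
import numpy as np


def aic(loglik, m):
    """Equation (32): AIC = -2 log l + 2m, with m the number of model parameters."""
    return -2.0 * loglik + 2.0 * m


def c1_complexity(cov):
    """Maximal information complexity C1 of a positive definite s x s matrix,
    assumed here to be (s/2) log(tr(cov)/s) - (1/2) log det(cov)."""
    cov = np.asarray(cov, dtype=float)
    s = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("matrix must be positive definite")
    return 0.5 * s * np.log(np.trace(cov) / s) - 0.5 * logdet


def icomp(loglik, inv_fisher):
    """Equation (33): ICOMP = -2 log l + 2 C1(estimated inverse Fisher information)."""
    return -2.0 * loglik + 2.0 * c1_complexity(inv_fisher)


def preferred(criteria):
    """Name of the model with the smallest criterion value (dict: name -> value)."""
    return min(criteria, key=criteria.get)
```

Under this form C_1 is zero when the inverse Fisher information is proportional to the identity and grows as its eigenvalues become more unequal, so ICOMP penalizes unequal precision and correlation among the estimates rather than simply counting parameters as AIC does.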

Table 4. Estimated thresholds and standard errors for the measurement model with direct effects, Example 1
Item | Category | τ̂_is | s.e.
JobEvery
PriCon
LivUnem
IncDiff
Housing

Table 5. Estimated factor loadings, regression parameters and standard errors for the measurement model with direct effects, Example 1
Item | α̂_i1 | s.e. | β̂_i1 | s.e.
JobEvery
PriCon
LivUnem
IncDiff
Housing

Table 6. Model selection criteria, Example 1
Criterion | Model with no covariate | Model with covariate
AIC
ICOMP

Example 2

The second application also uses data from the 1996 British Social Attitudes Survey. Five ordinal manifest variables were selected for the analysis. The items measure satisfaction with the National Health Service in the respondent's area, and more specifically with the following services provided by general practitioners (GPs):

GPs' appointment systems [Appointment]
Amount of time GP gives to each patient [AmountTime]
Being able to choose which GP to see [ChooseGP]
Quality of medical treatment by GPs [Quality]
Waiting areas at GPs' surgeries [WaitingArea]

The response alternatives given to the respondents were: in need of a lot of improvement, in need of some improvement, satisfactory, and very good. Item non-response varied between 1.5% and 2.5%. After excluding the missing values, we were left with 841 respondents.

In the analysis we were interested in measuring overall satisfaction with GPs from the five ordinal manifest variables, controlling for the respondents' political identification (measured by an observed covariate with four categories: Conservative, Labour, Liberal Democrat, and other). We also wished to measure the effect of gender and age on the latent variable satisfaction. Age is given in four categories: 18-25, 26-44, 45-64, and 65+. Gender and age are treated as dummy variables, with male and 18-25 as the respective reference categories. Allowing gender and age to affect the latent variable z (and not the manifest variables y directly) implies that all differences in the thresholds of the y variables across the groups defined by gender and age are expressed through mean differences in the common factor z.

First, we fitted the one-factor model to the five ordinal manifest variables without allowing for any covariate effects. The fourth column of Table 7 gives the estimated standardized factor loadings (st α̂_i1). These are all positive and of similar magnitude, indicating that the five ordinal items measure a single factor and all have roughly the same discriminating power. Their positive signs indicate that the more satisfied an individual is with the National Health Service in his or her area, the less likely he or she is to choose the lower categories of the ordinal variables.

Table 7. Estimated factor loadings with standard errors and standardized factor loadings for the measurement model without direct effects, Example 2
Item | α̂_i1 | s.e. | st α̂_i1
Appointment
AmountTime
ChooseGP
Quality
WaitingArea

Table 8 gives the pairs of items and categories for which the chi-square residuals are greater than 4. A substantial number of pairwise associations cannot be explained by the model, so a two-factor model might fit better. Here, instead of fitting a two-factor model, we introduce the effects of covariates both on the manifest items and on the latent variable.

We continued by fitting the one-factor model that allows for covariate effects. The maximum likelihood estimates of the threshold parameters are given in Table 9, and the factor loadings and regression parameters in Table 10. The effects of the covariates age and gender on the latent variable are given in Table 11. The estimated factor loadings all remain positive and of similar magnitude to those obtained from the one-factor model without covariate effects (see Table 7). The small changes in the estimated factor loadings indicate item factorial invariance within the groups defined by the covariates. The direct effects of the political party covariate on the manifest ordinal variables are similar across items, with the exception of item 3 (ChooseGP). With the Conservative category as the reference category, respondents who tend to vote Labour are more likely to express dissatisfaction with each of the five ordinal items than those who tend to vote Conservative. Finally, Table 11 shows that gender has no effect on overall satisfaction with the National Health Service, but that satisfaction with the Health Service increases with respondents' age, relative to the 18-25 age group. The AIC criterion for the model without the covariates is , and for the model with the covariates is . We conclude that the model with the covariate effects fits better than the one without.

Table 8. Chi-square residuals greater than 4 for two-way margins, Example 2
Item (1, 4), (3, 4) (3, 4), (4, 3) (1, 2), (1, 4) (4, 3) (4, 4) (2, 4), (3, 4) 2 (2, 4), (3, 4) (1, 2), (1, 4) (4, 3), (4, 2) 3 (1, 2), (1, 4) 4 (2, 4), (3, 3) (3, 4)

Table 9. Estimated thresholds and standard errors for the measurement model with direct effects, Example 2
Item | Category | τ̂_is | s.e.
Appointment
AmountTime
ChooseGP
Quality
WaitingArea

Table 10. Estimated factor loadings, regression parameters and standard errors for the measurement model with direct effects, Example 2
Item | α̂_i1 | s.e. | β̂_i1 (Labour) | s.e. | β̂_i2 (Liberal Democrat) | s.e. | β̂_i3 (Other) | s.e.
Appointment
AmountTime
ChooseGP
Quality
WaitingArea

Table 11. Estimated structural parameters and standard errors, Example 2
Parameter | λ̂_l | s.e.
Constant
Female

4. Conclusion

This paper attempts to generalize item response theory (IRT) models to allow for covariate effects both on the manifest and on the latent variables. We have shown that the IRT framework can be extended to cover models often fitted within the structural equation modelling (SEM) framework. The IRT approach to the analysis of ordinal variables with covariates is a full maximum likelihood method that, unlike SEM, does not require the use of underlying variables or the estimation of polychoric correlations. Furthermore, to obtain correct standard errors and goodness-of-fit tests in SEM, we need the asymptotic covariance matrix of the polychoric correlations, which requires large samples. Problems might also arise in the SEM framework when the assumption of bivariate normality for the underlying variables does not hold.
The models presented here were fitted using GENLAT 1.1 (Moustaki, 2002), which uses an EM algorithm to maximize the log-likelihood. The convergence of the EM algorithm slows down as the number of factors increases, and this is considered the main drawback of the framework presented here. By contrast, the SEM approach, which fits a factor model to the polychoric correlation matrix, does not face a computational burden related to the number of factors fitted. The EM algorithm has been found to be robust with respect to the initial values used. GENLAT can fit up to two factors to binary, nominal, ordinal and metric manifest items, and can also handle the simultaneous analysis of items with different distributions. As an alternative, the STATA routine GLLAMM (Rabe-Hesketh, Pickles, & Skrondal, 2001) fits latent variable models to ordinal items using the Newton-Raphson algorithm with adaptive quadrature instead of the EM algorithm.
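One source of the computational burden that grows with the number of factors can be illustrated under an assumption: if the integrals over k independent standard normal latent variables are approximated by a product Gauss-Hermite rule with Q nodes per factor (GENLAT's actual quadrature settings are not stated here), the integrand must be evaluated at Q^k grid points in every iteration. The helper `product_grid` is hypothetical.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def product_grid(n_points, n_factors):
    """Nodes and weights of a product Gauss-Hermite rule for n_factors
    independent N(0, 1) latent variables (n_points ** n_factors nodes)."""
    z, w = hermegauss(n_points)
    w = w / np.sqrt(2.0 * np.pi)  # normalize so the 1-D weights sum to 1
    nodes = np.array(list(itertools.product(z, repeat=n_factors)))
    weights = np.array([np.prod(c) for c in itertools.product(w, repeat=n_factors)])
    return nodes, weights


# Grid size Q**k for Q = 10 nodes per factor and k = 1, 2, 3 factors.
sizes = {k: len(product_grid(10, k)[1]) for k in (1, 2, 3)}
print(sizes)  # {1: 10, 2: 100, 3: 1000}
```

Adaptive quadrature, as used in GLLAMM, recentres and rescales the nodes for each respondent, which is one way of keeping Q small; the exponential dependence on k remains for any product rule.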

Acknowledgements

The author would like to thank the two anonymous referees for their constructive comments, which improved the structure and clarity of this paper.

References

Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317-332.
Bartholomew, D., Steele, F., Moustaki, I., & Galbraith, J. (2002). The analysis and interpretation of multivariate data for social scientists. Boca Raton, FL: Chapman & Hall/CRC.
Bartholomew, D. J., & Knott, M. (1999). Latent variable models and factor analysis (2nd ed.). London: Arnold.
Bentler, P. M. (1992). EQS: Structural equations program manual. Los Angeles: BMDP Statistical Software.
Bozdogan, H. (2000). Akaike's information criterion and recent developments in information complexity. Journal of Mathematical Psychology, 44, 62-91.
Croon, M., & Bolck, A. (1997). On the use of factor scores in structural equations models (Technical Report /7). Tilburg: Tilburg University, Work and Organization Research Centre.
Glas, C. (2001). Differential item functioning depending on general covariates. In A. Boomsma, M. A. J. van Duijn, & T. A. B. Snijders (Eds), Essays on item response theory (pp. 131-145). New York: Springer-Verlag.
Heath, A., Jowell, R., Curtice, J., & Witherspoon, S. (1986). End of award report to the ESRC: Methodological aspects of attitude research. London: SCPR.
Jöreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70, 631-639.
Jöreskog, K. G., & Moustaki, I. (2001). Factor analysis of ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36, 347-387.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8 user's reference guide. Chicago: Scientific Software International.
Moustaki, I. (2000). A latent variable model for ordinal variables. Applied Psychological Measurement, 24, 211-223.
Moustaki, I. (2001). A review of exploratory factor analysis for ordinal categorical data. In R. Cudeck, S. du Toit, & D. Sörbom (Eds), Structural equation modeling: Present and future. A festschrift in honor of Karl J. Jöreskog. Chicago: Scientific Software International.
Moustaki, I. (2002). GENLAT 1.1: A computer program for fitting a one- or two-factor latent variable model to categorical, metric and mixed observed items with missing values (Technical Report). London: London School of Economics and Political Science, Statistics Department.
Moustaki, I., & Knott, M. (2000). Generalized latent trait models. Psychometrika, 65, 391-411.
Muraki, E., & Carlson, E. (1995). Full-information factor analysis for polytomous item responses. Applied Psychological Measurement, 19, 73-90.
Muthén, B. O. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54, 557-585.
Muthén, B. O., & Muthén, L. (2000). Mplus user's guide. Los Angeles: Muthén & Muthén.
O'Muircheartaigh, C., & Moustaki, I. (1999). Symmetric pattern models: A latent variable approach to item non-response in attitude scales. Journal of the Royal Statistical Society, Series A, 162, 177-194.
Rabe-Hesketh, S., Pickles, A., & Skrondal, A. (2001). GLLAMM manual (Technical Report 2001/01). London: King's College, Institute of Psychiatry, Department of Biostatistics and Computing.
