A general class of latent variable models for ordinal manifest variables with covariate effects on the manifest and latent variables
British Journal of Mathematical and Statistical Psychology (2003), 56. The British Psychological Society

Irini Moustaki*
Department of Statistics, Athens University of Economics and Business, Greece

* Requests for reprints should be addressed to Irini Moustaki, Department of Statistics, Athens University of Economics and Business, 76 Patision Street, Athens, Greece (moustaki@aueb.gr).

Previous work on a general class of multidimensional latent variable models for analysing ordinal manifest variables is extended here to allow for direct covariate effects on the manifest ordinal variables and covariate effects on the latent variables. A full maximum likelihood estimation method is used to estimate all the model parameters simultaneously. Goodness-of-fit statistics and standard errors are discussed. Two examples from the 1996 British Social Attitudes Survey are used to illustrate the methodology.

1. Introduction

Latent variable analysis of ordinal variables has been discussed in a number of papers, including Samejima (1969), Muraki and Carlson (1995) and Moustaki (2000). However, those papers limit their discussion to the case where the relationships among a number of observed ordinal variables can be explained solely by a set of latent variables. In practice, there may be applications where we would like observed explanatory variables to account, together with the latent variables, for the associations among the ordinal variables. In addition, we might want to investigate the effect of other explanatory variables on the latent variables in the model. In this paper we extend the work discussed in Moustaki (2000) to allow for covariate effects both on the manifest variables and on the latent variables.

The part of the model that accommodates the effect of the latent variables and a set of observed covariates on the manifest variables is called here the measurement model with direct effects (to distinguish it from the measurement model that only allows for latent variables), and the part of the model that links a set of observed covariates with the
latent variables is called the structural part of the model. Covariates are allowed to affect the manifest variables indirectly through the latent variables or directly. However, there might be situations where we would like to model the effect of one set of covariates on the latent variables and the effect of a different set of covariates directly on the manifest variables. In the applications section, we discuss an example in which we are interested in measuring overall satisfaction (latent variable) with the National Health Service in the respondent's area from five ordinal indicators, controlling for the respondent's political affiliation (observed covariate). In addition, we allow the covariates age and gender to affect the latent construct satisfaction.

In the literature there are two main approaches to conducting latent variable analysis. One is the structural equation modelling (SEM) approach, which provides a general framework that allows for covariate effects and is supported by commercial software such as LISREL (Jöreskog & Sörbom, 1993), EQS (Bentler, 1992) and Mplus (Muthén & Muthén, 2000). The other is the item response theory (IRT) approach. Within the IRT approach Verhelst, Glas, and Verstralen (1994), Zwinderman (1997), and Glas (2001) discussed the Rasch or one-parameter logistic model with covariate effects, and Sammel, Ryan, and Legler (1997) discussed a unidimensional latent trait model for binary and normal outcomes that allows for covariate effects. The methodology we discuss here for ordinal variables is also based on the IRT approach. We aim in this paper to develop a general IRT framework similar to that of SEM. However, the models discussed here do not allow for relationships among the latent variables. A brief description of the SEM approach to fitting latent variable models for ordinal variables is given later in this section.
In the case where there is a measurement model with no direct effects, the covariate effects on the latent variables can be estimated in one or two stages. In the one-stage approach the parameters of the measurement and the structural part of the model are estimated simultaneously. In the two-stage approach the measurement model is fitted first, and then factor scores (Moustaki & Knott, 2000) are computed and used as dependent variables in further analysis. Croon and Bolck (1997) mention that, in the one-stage approach, it is more difficult to identify any misspecifications in either the measurement or the structural part of the model. Also, due to the greater model complexity, it is possible that a local rather than a global solution will be found. However, they found that the two-stage approach, in which factor scores are treated as observed variables and regressed on a set of explanatory variables, leads to biased estimates.

Jöreskog and Goldberger (1975) discussed a multiple indicators and multiple causes (MIMIC) model for normal manifest variables with a single latent variable which allows for direct and indirect effects of covariates on the latent and manifest variables, respectively. In their results it is apparent that parameter estimates of the measurement and the structural models differ between the one- and two-stage methods. They also found that the one-stage method gives more efficient parameter estimates. Muthén (1989) discusses the MIMIC model for other types of manifest variables, such as binary and ordinal, for capturing heterogeneity across groups (groups are defined through the covariates). He argues that the MIMIC model is a good alternative to multi-group analysis when not enough data are available to estimate a model in each group. The MIMIC model has been developed within the SEM framework.
By that we mean that the approach used in the ordinal case for estimating the parameters of the measurement model is based on polychoric correlations estimated by maximum likelihood. In the SEM framework the ordinal variables $y$ are taken to be manifestations of some underlying, continuous, unobserved variables $y^*$. Packages such as LISREL and
Mplus fit the MIMIC model to ordinal manifest variables in two stages. More specifically, the distribution of the underlying variables $y^*$ conditional on the vector of observed covariates $w$ is assumed to follow a multivariate normal distribution with a polychoric correlation matrix $P$. The elements of $P$ are estimated from the bivariate distributions of the $y$ variables. For large samples, the parameters of the measurement and structural parts of the model are estimated using weighted least squares. The asymptotic covariance matrix of the polychoric correlations is used as the weight matrix. A comparison between the LISREL-type models for ordinal variables and the models presented here without the covariate effects can be found in Moustaki (2001) and Jöreskog and Moustaki (2001).

In this paper we discuss a general model framework for analysing ordinal manifest variables which allows for covariate effects both on the latent and manifest variables using full maximum likelihood. This approach is distinct from the SEM approach in three ways. First, all the effects (model parameters) are estimated simultaneously. Secondly, there is no need to assume that each ordinal variable is a manifestation of an underlying variable, and therefore no assumptions are needed for those underlying variables. Instead, distributional assumptions are made for the observed ordinal variables. Thirdly, a full maximum likelihood estimation method is used. This approach is based on an extension of the models for ordinal variables discussed by Samejima (1969), Muraki and Carlson (1995), and Moustaki (2000) to allow for covariate effects.

2. Model and estimation

Let $y_1, y_2, \ldots, y_p$ be the ordinal observed variables. Lower-case letters are used to denote both the variables and the values that these variables take. Let $c_i$ denote the number of categories for the $i$th variable.
The $c_i$ ordered categories have probabilities $\pi_{i1}(z, x), \pi_{i2}(z, x), \ldots, \pi_{ic_i}(z, x)$, which are functions of the $q \times 1$ vector of latent variables $z$ and the $r \times 1$ vector of observed covariates $x$. The covariates $x$ and the latent variables $z$ directly affect the manifest ordinal variables or, to be more precise, the probability of a response in a specific category. In addition, we allow the $k \times 1$ vector of covariates $w$ to affect $z$.

Figure 1 shows the relationships that may be modelled, using an example of three ordinal variables and three covariates. The graph shows that the three observed ordinal variables $y' = (y_1, y_2, y_3)$ are indicators of a single latent variable $z_1$. The latent variable $z_1$ and the observed covariate $x_1$ account for the associations among the $y$ variables. The direct arrow from $x_1$ to $y_1$ indicates that the mean level (here the thresholds) for variable $y_1$ is allowed to be different for different values of the $x_1$ variable. Finally, variables $w' = (w_1, w_2)$ have an effect on the latent variable $z_1$. For example, if $w_1$ is a variable with two categories then the direct arrow from $w_1$ to $z_1$ indicates that the mean of the latent variable $z_1$ is allowed to be different across the two groups defined by the $w_1$ variable. Note that variable $x_1$ needs to be different from variables $w$ for identification reasons that will be explained later in the paper. As a result, an arrow cannot be added from $x_1$ to $z_1$ when there is already an arrow from $x_1$ going to all the $y$ variables. Both $x_1$ and $w$ are considered fixed, and they may be correlated.

Figure 1 shows all the possible relationships that can be modelled. In certain applications some of those variables might not exist. For example, there might be a case where there are only covariates affecting the latent variables or covariates that only affect the observed ordinal indicators.
Figure 1. Path diagram.

2.1. Measurement model with direct effects

First, we model the associations among the $y$ variables as explained by the latent variables $z$ and the covariates $x$. The general form given in Moustaki (2000) for the latent variable model with ordinal variables is extended here to allow for covariate effects:

$$\operatorname{link}[\gamma_{is}(z, x)] = \operatorname{link}[P(y_i \le s \mid z, x)] = \tau_{is} - \sum_{j=1}^{q} \alpha_{ij} z_j + \sum_{l=1}^{r} \beta_{il} x_l, \quad i = 1, \ldots, p;\ s = 1, \ldots, c_i, \tag{1}$$

where $\gamma_{is}(z, x)$ is the cumulative probability of a response in category $s$ or lower to item $y_i$, written as $\gamma_{is}(z, x) = \pi_{i1}(z, x) + \pi_{i2}(z, x) + \cdots + \pi_{is}(z, x)$. To simplify our notation, we will suppress the dependence on the latent variables $z$ and the observed covariates $x$, and just write $\gamma_{is}$. It follows that the probability of a randomly selected individual giving a response in category $s$ can be derived from the cumulative probabilities as

$$\pi_{is} = \gamma_{is} - \gamma_{i,s-1}, \quad i = 1, \ldots, p;\ s = 2, \ldots, c_i. \tag{2}$$

There are a number of link functions to choose from, such as the logit, the complementary log-log function, the inverse normal function, the inverse Cauchy, and the log-log function. All these link functions are monotonically increasing functions that map $(0, 1)$ onto $(-\infty, \infty)$. The parameters $\tau_{is}$ are referred to as 'cut-points' on the logistic, probit or other scale, where $\tau_{i1} < \tau_{i2} < \cdots < \tau_{i,c_i}$, $\tau_{i0} = -\infty$ and $\tau_{i,c_i} = +\infty$.

We see from (1) that the coefficients of the covariates $x$ and the latent variables $z$ shift the cut-points. For example, let us assume that there is only one covariate $x_1$, and suppose that it represents gender. If the effect of gender on the ordinal observed variable $y_i$ is significant given that the latent variables are in the model, then the cut-points will differ between males and females by $\hat{\beta}_{i1}$. In other words, females and males with the same position on the latent variables are allowed to have different cumulative and response probabilities.
The $\alpha_{ij}$ parameters can be considered as discrimination parameters or factor loadings, since they measure the effect of the latent variables $z$ on some function of the cumulative probability of responding up to a category of the $i$th item, controlling for the effect of the covariates $x$. In the case of one latent variable the negative sign on the slope parameter is used to indicate that as $z$ increases the response on the observed item $y_i$ is more likely to fall at the high end of the scale. The $\beta_{il}$ are regression coefficients.
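As a worked illustration of how a covariate shifts the cut-points in (1) under the logit link, consider the following hypothetical values, which are assumptions chosen for the arithmetic and not estimates from the paper: $\tau_{is} = 0$, $\alpha_{i1} = 1$, $\beta_{i1} = 0.5$, a single latent variable fixed at $z_1 = 0$, and a gender dummy $x_1 \in \{0, 1\}$.

```latex
\begin{align*}
x_1 = 0:\quad & \operatorname{logit}\gamma_{is} = 0 - 1(0) + 0.5(0) = 0,
  & \gamma_{is} &= \frac{e^{0}}{1 + e^{0}} = 0.5, \\
x_1 = 1:\quad & \operatorname{logit}\gamma_{is} = 0 - 1(0) + 0.5(1) = 0.5,
  & \gamma_{is} &= \frac{e^{0.5}}{1 + e^{0.5}} \approx 0.622.
\end{align*}
```

Two respondents at the same latent position therefore differ in their cumulative probabilities only through the term $\beta_{i1} x_1$, which is the same for every category $s$.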
Figure 2. Response probabilities, $\tau_{i1} = -3$, $\tau_{i2} = 0$, $\tau_{i3} = 3$, $\alpha_{i1} = 1.0$.

Figure 2 gives the response probabilities $\pi_{is}$ computed from (2) for a single latent variable without covariate effects, with the logit as a link function. The response probabilities are computed for an item with four categories for threshold parameters $\tau_{i1} = -3.0$, $\tau_{i2} = 0.0$, $\tau_{i3} = 3.0$ and for a discrimination parameter $\alpha_{i1} = 1.0$. Figure 2 shows that an individual with a low score on the latent variable $z$ has a high probability of choosing the lowest category (category 1). Individuals with intermediate scores on the latent variable have moderate probabilities of responding to any of the four categories, and individuals with high scores on the latent variable have a high probability of choosing the highest category (category 4).

Furthermore, the shape of the response probabilities differs for each category of the item. Categories 1 and 4 have response probability functions that are monotone decreasing and increasing respectively, while categories 2 and 3 have unimodal functions. Any attempt to model these response probabilities directly will fail due to their different shapes. This is the reason why the cumulative probabilities are modelled instead.

In addition to the above two properties, Samejima (1969) showed that the ratios $\tau_{i1}/\alpha_{i1}$ and $\tau_{i3}/\alpha_{i1}$ denote the values on the $z$ scale at which the probability that the response will be allocated to category 1 and 4, respectively, is 0.5. This is true only for the first and the last category of each item. From Fig. 2 we see that for categories 1 and 4 the probability is 0.5 at $z = -3/1$ and $z = 3/1$, respectively.

Figure 3 gives the cumulative probabilities $\gamma_{is}$ for a single latent variable without covariate effects and for the same parameter values as used for Fig. 2. As we can see, except for $\gamma_{i4}$, which is a constant function equal to 1, they all have the same inverted s-shape.
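The quantities plotted in Fig. 2 can be recomputed with a few lines of code. The following is a minimal sketch, assuming the logit link and the parameter values quoted for the figure; the function name `response_probs` is hypothetical and not part of any package.

```python
import numpy as np

def response_probs(z, tau, alpha):
    """Category probabilities for one ordinal item under the logit link,
    following (1)-(2) with one latent variable and no covariates."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    # Linear predictor tau_s - alpha * z for every z (rows) and cut-point (columns).
    eta = np.asarray(tau, dtype=float)[None, :] - alpha * z[:, None]
    gamma = 1.0 / (1.0 + np.exp(-eta))  # cumulative P(y <= s | z)
    # Pad with gamma_0 = 0 and gamma_c = 1, then difference: pi_s = gamma_s - gamma_{s-1}.
    gamma = np.hstack([np.zeros((z.size, 1)), gamma, np.ones((z.size, 1))])
    return np.diff(gamma, axis=1)

tau = [-3.0, 0.0, 3.0]   # thresholds used for Fig. 2
alpha = 1.0              # discrimination parameter used for Fig. 2
probs = response_probs([-3.0, 0.0, 3.0], tau, alpha)
print(np.round(probs, 4))
```

Each row of `probs` holds the four category probabilities at one value of $z$. The first row ($z = -3$) gives probability 0.5 to category 1 and the last row ($z = 3$) gives probability 0.5 to category 4, in line with the ratios $\tau_{i1}/\alpha_{i1}$ and $\tau_{i3}/\alpha_{i1}$.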
Figure 3. Cumulative probabilities, $\tau_{i1} = -3$, $\tau_{i2} = 0$, $\tau_{i3} = 3$, $\alpha_{i1} = 1.0$.

If we put the model into the generalized linear model framework then the random component of the model is that for which, conditional on the latent variables $z$ and the covariates $x$, each of the $p$ random response variables $y_1, \ldots, y_p$ has a distribution from the exponential family. The systematic component is the one in which $z$ and $x$ produce a linear predictor $\eta_{is}$ corresponding to each category of $y_i$:

$$\eta_{is} = \tau_{is} - \sum_{j=1}^{q} \alpha_{ij} z_j + \sum_{l=1}^{r} \beta_{il} x_l, \quad i = 1, \ldots, p;\ s = 1, \ldots, c_i.$$

Finally, the link between the systematic component and the conditional means of the random component distributions is given by $\eta_{is} = v_{is}(\mu_{is})$, where $\mu_{is} = E(y_{is} \mid z, x)$ and $v_{is}(\cdot)$ is the link function, which can be any monotonic differentiable function.

Let $y = (y_1, y_2, \ldots, y_p)$ represent the whole response pattern for a randomly selected individual. The density function $f(y \mid x)$ of the manifest variables $y$ is

$$f(y \mid x) = \int g(y \mid z, x)\, h(z \mid w, \Lambda)\, dz, \tag{3}$$

where $g(y \mid z, x)$ is the conditional density function of $y$ given $z$ and $x$, and $h(z \mid w, \Lambda)$ is the density function of $z$ conditional on $w$ and $\Lambda$. The latent variables are assumed to be independent with normal distributions. The matrix of parameters $\Lambda$ is defined later. The covariates $x$ are assumed to be fixed.

Under the assumption of conditional independence of $y$ with respect to $z$ and $x$, the latent variables $z$ and observed covariates $x$ account for the interrelationships among the observed ordinal variables, so that when the latent variables are held fixed the responses to the $p$ observed variables are independent:

$$g(y \mid z, x) = \prod_{i=1}^{p} g(y_i \mid z, x). \tag{4}$$
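The integral in (3), combined with the product form (4), can be approximated numerically. The sketch below is an illustration under stated assumptions: a single standard normal latent variable, no covariates, the logit link, and hypothetical parameter values. It uses NumPy's `hermegauss` Gauss-Hermite rule; the helper names `item_probs` and `marginal_prob` are invented for this example.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def item_probs(z, tau, alpha):
    """Category probabilities pi_s(z) for one item at a scalar z (logit link)."""
    gamma = 1.0 / (1.0 + np.exp(-(np.asarray(tau, dtype=float) - alpha * z)))
    gamma = np.concatenate(([0.0], gamma, [1.0]))  # gamma_0 = 0, gamma_c = 1
    return np.diff(gamma)

def marginal_prob(pattern, taus, alphas, n_nodes=21):
    """Approximate f(y) = integral of g(y | z) h(z) dz with z ~ N(0, 1)."""
    nodes, weights = hermegauss(n_nodes)       # rule for the weight exp(-z^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)   # rescale so the weights sum to 1
    total = 0.0
    for z, w in zip(nodes, weights):
        # Conditional independence (4): multiply the item-level probabilities.
        g = np.prod([item_probs(z, t, a)[y] for y, t, a in zip(pattern, taus, alphas)])
        total += w * g
    return total

# Hypothetical parameters: three items, three categories each (coded 0, 1, 2).
taus = [[-1.0, 0.5], [0.0, 1.0], [-0.5, 0.5]]
alphas = [1.0, 1.5, 0.8]
patterns = list(itertools.product(range(3), repeat=3))
total_mass = sum(marginal_prob(p, taus, alphas) for p in patterns)
print(total_mass)
```

Summing the approximated marginal probabilities over all 27 response patterns gives a simple sanity check: the total should equal 1 up to floating-point error, because the category probabilities sum to 1 at every quadrature node.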
For a manifest item $y_i$ the conditional probability of $(y_i \mid z, x)$ is given by

$$g(y_i \mid z, x) = \prod_{s=1}^{c_i} \pi_{is}(z, x)^{y_{is}} = \prod_{s=1}^{c_i} (\gamma_{is} - \gamma_{i,s-1})^{y_{is}}, \tag{5}$$

where $y_{is} = 1$ if the response $y_i$ is in category $s$ and $y_{is} = 0$ otherwise. Equation (5) can also be written as

$$g(y_i \mid z, x) = \prod_{s=1}^{c_i - 1} \left( \frac{\gamma_{is}}{\gamma_{i,s+1}} \right)^{y^*_{is}} \left( \frac{\gamma_{i,s+1} - \gamma_{is}}{\gamma_{i,s+1}} \right)^{y^*_{i,s+1} - y^*_{is}}, \tag{6}$$

where $y^*_{is} = 1$ if a randomly selected individual's response to item $i$ is in category $s$ or lower and $y^*_{is} = 0$ otherwise. If we take the log of (6), we have:

$$\log g(y_i \mid z, x) = \sum_{s=1}^{c_i - 1} \left[ y^*_{is} \log \frac{\gamma_{is}}{\gamma_{i,s+1} - \gamma_{is}} - y^*_{i,s+1} \log \frac{\gamma_{i,s+1}}{\gamma_{i,s+1} - \gamma_{is}} \right] = \sum_{s=1}^{c_i - 1} \left[ y^*_{is} v_{is}(z, x) - y^*_{i,s+1}\, b(v_{is}(z, x)) \right]. \tag{7}$$

From (7) we see that each component is in the form of the general expression for the exponential family distribution. More specifically:

$$v_{is}(z, x) = \log \frac{\gamma_{is}}{\gamma_{i,s+1} - \gamma_{is}}, \quad s = 1, \ldots, c_i - 1, \tag{8}$$

and

$$b(v_{is}(z, x)) = \log \frac{\gamma_{i,s+1}}{\gamma_{i,s+1} - \gamma_{is}} = \log\{1 + \exp(v_{is}(z, x))\}, \quad s = 1, \ldots, c_i - 1. \tag{9}$$

To simplify the notation we write $v_{is}$ and $b(v_{is})$. The parameter $v_{is}$ is not a linear function of the latent variable.

2.2. Structural model

As already mentioned in the Introduction, the effect of covariates on latent variables can be measured either in one or two stages. In this paper we are interested in the one-stage approach, where the parameters of the measurement model with or without direct effects (1) and the parameters of the structural model (see (10) below) are estimated simultaneously. Let us assume that the latent variables $z_m$ for an individual $m$ are related to a set of observed covariates $w_m$ in a simple linear manner:

$$z_m = \Lambda w_m + \delta_m, \quad m = 1, \ldots, n, \tag{10}$$

where $z_m$ is a $q \times 1$ vector, $\Lambda$ is a $q \times k$ matrix of regression coefficients, $w_m$ is a $k \times 1$ vector of covariates, and $\delta_m$ is a $q \times 1$ vector of independent standard normal variables.
It follows that the distribution of the latent variables $z_m$ conditional on the covariates $w_m$ is normal with mean $\Lambda w_m$ and variance 1. The covariates $w$ are assumed to be fixed and non-stochastic. Alternatively, in the two-stage approach, one can compute latent or factor scores
based on the measurement model (1). The latent scores can be used as dependent variables in further analysis with the vector of covariates $w$. To score the individuals on the latent dimensions identified by the analysis one can use the mean of the posterior distribution of the latent variable $z_j$ given the individual's response pattern, $E(z_j \mid y_m, x_m)$. In the $q$-factor model the posterior mean is given by

$$E(z_j \mid y_m, x_m) = \int_{R_{z_1}} \cdots \int_{R_{z_q}} z_j\, h(z, \Lambda \mid y_m, x_m)\, dz, \tag{11}$$

where $R_{z_j}$ denotes the range of values for $z_j$ and $h(z, \Lambda \mid y_m, x_m)$ is the posterior distribution of the latent variables given the observed variables.

2.3. Model identification

We now discuss a necessary condition for the identification of the model presented in Fig. 1. This model is identified as long as the set of covariates $x$ is different from the set of covariates $w$, as will shortly be explained. Furthermore, the latent variable $z_1$ is assumed to have a normal distribution with variance 1. This specification identifies the scale of $z_1$, which in turn identifies the scale for the item parameters (see (10)).

Let us take a simple case where there is only one latent variable $z_1$ and one covariate $x_1$. We assume that the same covariate $x_1$ not only affects some function of the cumulative probability of responding up to a category $s$ for an ordinal item $y_i$ through the measurement model but also affects the latent variable $z_1$ through the structural part of the model. Equation (1) becomes

$$\operatorname{link}[\gamma_{is}(z_1, x_1)] = \tau_{is} - \alpha_{i1} z_1 + \beta_{i1} x_1, \quad i = 1, \ldots, p;\ s = 1, \ldots, c_i, \tag{12}$$

and (10) becomes

$$z_1 = \lambda_1 x_1 + \delta_1. \tag{13}$$

Substituting (13) into (12) gives

$$\operatorname{link}[\gamma_{is}(z_1, x_1)] = \tau_{is} - \alpha_{i1} \delta_1 - (\alpha_{i1} \lambda_1 - \beta_{i1}) x_1, \quad i = 1, \ldots, p;\ s = 1, \ldots, c_i. \tag{14}$$

From (14) it is apparent that the parameters $\lambda_1$ and $\beta_{i1}$ enter the model only through the combination $\alpha_{i1} \lambda_1 - \beta_{i1}$. They cannot be estimated separately, and therefore these parameters are not identified.
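The lack of identification visible in (14) can be checked numerically. The following is a small sketch under the logit link with hypothetical parameter values; the function `cumulative_probs` and the constant `c` are assumptions introduced only for this illustration.

```python
import numpy as np

def cumulative_probs(delta, x, tau, alpha, beta, lam):
    """Cumulative probabilities from (12) after substituting z = lam * x + delta (13)."""
    z = lam * x + delta
    eta = np.asarray(tau, dtype=float) - alpha * z + beta * x
    return 1.0 / (1.0 + np.exp(-eta))

tau, alpha = [-1.0, 0.0, 1.0], 1.2   # hypothetical thresholds and loading
delta, x = 0.3, 1.0                  # one value of the latent residual and the covariate
c = 0.7                              # arbitrary shift applied to lambda

# Two different (lambda, beta) pairs with the same alpha * lambda - beta.
probs_a = cumulative_probs(delta, x, tau, alpha, beta=0.5, lam=0.4)
probs_b = cumulative_probs(delta, x, tau, alpha, beta=0.5 + alpha * c, lam=0.4 + c)
print(np.allclose(probs_a, probs_b))
```

Because both parameter sets share the same value of $\alpha_{i1} \lambda_1 - \beta_{i1}$, they imply identical cumulative probabilities for every $\delta_1$ and $x_1$, so no amount of data can distinguish them.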
If, instead, we had used different covariates then (13) would have been written as

$$z_1 = \lambda_1 w_1 + \delta_1. \tag{15}$$

Substituting (15) into (12), we have

$$\operatorname{link}[\gamma_{is}(z_1, x_1)] = \tau_{is} - \alpha_{i1} \delta_1 + g_{i1} w_1 + \beta_{i1} x_1, \quad i = 1, \ldots, p;\ s = 1, \ldots, c_i, \tag{16}$$

where $g_{i1} = -\alpha_{i1} \lambda_1$. Equation (16) is a measurement model with direct effects where the latent variable is represented by the $\delta_1$ term and is assumed to have a standard normal distribution. The model parameters in (16) are all identified even when the covariates $w_1$ and $x_1$ are correlated.

When substituting the structural part of the model (15) into the measurement model with direct effects (12), we can compute from the reduced form (16) estimates of the effect of the covariate $w_1$ on the cumulative probabilities of the observed ordinal variables as $-\alpha_{i1} \lambda_1$. In the general case with more than one latent variable and more than one covariate $w$, the direct effect of the covariate $w_l$ on the cumulative probability of the observed variable $y_i$ is computed by the sum $-\sum_{j=1}^{q} \alpha_{ij} \lambda_{jl}$. What we are saying is that (16), obtained by substitution of the structural part of the
model into the measurement model with direct effects, is equivalent to (1) when there is no structural part involved. In (1) when there is no structural part the latent variables $z$ have standard normal distributions. In (10) the latent variables are represented by the term $\delta$, which also has a standard normal distribution. However, the two models estimate different numbers of parameters, so that the reduced-form parameters for the direct effects obtained from $-\sum_{j=1}^{q} \alpha_{ij} \lambda_{jl}$ will not always be close to those obtained when model (1) is used. Despite the fact that models (1) and (16) are equivalent, one might choose one over the other depending on the effects that one is interested in measuring.

2.4. Model estimation

The model we have so far discussed consists of two components, the measurement part with the direct effects (1) and the structural part (10). The aim is to estimate all the parameters simultaneously. The estimation method described below is a full maximum likelihood estimation method. This means that the model is fitted to the whole response pattern, including both the responses to the $p$ ordinal variables and the values of the $r$ covariates. The parameters to be estimated are $\tau$, $\alpha$, $\beta$ and $\Lambda$.

We start by writing down the joint density function of the random variables:

$$f(y, z \mid x, w) = g(y \mid z, x, w)\, h(z \mid x, w, \Lambda). \tag{17}$$

Since $y$ does not depend on $w$ and $z$ does not depend on $x$, (17) is written as

$$f(y, z \mid x, w) = g(y \mid z, x)\, h(z \mid w, \Lambda). \tag{18}$$

In addition, we assume that the latent variables $z$ and the covariates $x$ account for the associations among the ordinal variables $y$. The conditional distribution of the $y$ variables given the latent variables and the observed covariates is written as:

$$g(y \mid z, x) = \prod_{i=1}^{p} g(y_i \mid z, x).$$

Using (18), for a random sample of size $n$ the complete log-likelihood is written as:

$$L = \sum_{m=1}^{n} \log f(y_m, z_m \mid x_m, w_m) = \sum_{m=1}^{n} \left[ \sum_{i=1}^{p} \log g(y_{im} \mid z_m, x_m) + \log h(z_m \mid w_m, \Lambda) \right]. \tag{19}$$

Because $z$ is unknown, the log-likelihood given in (19) is maximized using an expectation-maximization (EM) algorithm. In the expectation step the expected score function of the model parameters is computed. The expectation is with respect to the posterior distribution of $z$ given the observations, $h(z, \Lambda \mid y, x)$. In the maximization step updated parameter estimates are obtained. The score function is the first derivative of the log-likelihood with respect to the parameters. The first term on the right-hand side of (19) denotes the distributions of the observed variables $y$ conditional on the latent variables $z$ and the observed covariates $x$, and the second term denotes the distribution of $z$ conditional on the observed covariates $w$.
2.4.1. Estimation of $\Lambda$

From (19) we see that the estimation of the parameters contained in the matrix $\Lambda$ does not depend on the first component of the complete log-likelihood. Therefore, estimation of $\Lambda$ can be done separately from the rest of the parameters ($\tau$, $\alpha$ and $\beta$). In addition, the latent variables are assumed to be independent conditional on $w$, so that $h(z \mid w, \Lambda) = h(z_1 \mid w, \lambda_1) \cdots h(z_q \mid w, \lambda_q)$, where $\lambda_j$ is the $j$th row of the $\Lambda$ matrix. The expected score function with respect to the parameter vector $\lambda_j$, $j = 1, \ldots, q$, takes the form

$$ES_m(\lambda_j) = \int \cdots \int S_m(\lambda_j)\, h(z, \Lambda \mid y_m, x_m)\, dz, \tag{20}$$

where $h(z, \Lambda \mid y_m, x_m)$ denotes the posterior distribution of the latent variables given what has been observed, and

$$S_m(\lambda_j) = \frac{\partial \log h(z_j \mid w_m, \lambda_j)}{\partial \lambda_j} = w_m (z_j - w_m' \lambda_j), \quad j = 1, \ldots, q.$$

Equation (20) becomes:

$$ES_m(\lambda_j) = \int \cdots \int w_m (z_j - w_m' \lambda_j)\, h(z, \Lambda \mid y_m, x_m)\, dz. \tag{21}$$

Solving $\sum_{m=1}^{n} ES_m(\lambda_j) = 0$ and approximating the integrals over $z$ by a weighted summation over a finite number of points and weights, we get an explicit solution for the maximum likelihood estimator of $\lambda_j$:

$$\hat{\lambda}_j = \left( \sum_{m=1}^{n} w_m w_m' \right)^{-1} \sum_{m=1}^{n} w_m \sum_{t_1=1}^{n_1} \cdots \sum_{t_q=1}^{n_q} z_{t_j}\, h(z_{t_1}, \ldots, z_{t_q}, \Lambda \mid y_m, x_m), \tag{22}$$

where

$$h(z_{t_1}, \ldots, z_{t_q}, \Lambda \mid y_m, x_m) = \frac{g(y_m \mid z_{t_1}, \ldots, z_{t_q}, x_m)\, h(z_{t_1} \mid w_m, \lambda_1) \cdots h(z_{t_q} \mid w_m, \lambda_q)}{f(y_m, x_m)}.$$

The points for the integral approximations are the Gauss-Hermite quadrature points given in Stroud and Sechrest (1966). This approximation in effect treats the latent variables as discrete with values $z_{t_1}, \ldots, z_{t_q}$ and their corresponding probabilities $h(z_{t_1} \mid w, \lambda_1), \ldots, h(z_{t_q} \mid w, \lambda_q)$. This equation is updated at each step of the EM algorithm described in Section 2.4.3.

2.4.2. Estimation of the model parameters $\tau$, $\alpha$ and $\beta$

The estimation of the parameters $\tau$, $\alpha$ and $\beta$ depends on the first component of (19). Let $a_i' = (\tau_{i1}, \ldots, \tau_{i,c_i-1}, \alpha_{i1}, \ldots, \alpha_{iq}, \beta_{i1}, \ldots, \beta_{ir})$, $i = 1, \ldots, p$, where $a_i'$ is a vector of parameters.
The expected score function of the parameter vector $a_i$, where the expectation is taken with respect to $h(z, \Lambda \mid y, x)$, is

$$ES_m(a_i) = \int \cdots \int S_m(a_i)\, h(z, \Lambda \mid y_m, x_m)\, dz, \quad m = 1, \ldots, n, \tag{23}$$

where

$$S_m(a_i) = \frac{\partial \log g(y_m \mid z, x_m)}{\partial a_i}, \quad i = 1, \ldots, p.$$
Now

$$\frac{\partial \log g(y_m \mid z, x_m)}{\partial a_i} = \sum_{s=1}^{c_i - 1} \left[ y^*_{ism}\, v'_{ism} - y^*_{i,s+1,m}\, b'(v_{ism}) \right]. \tag{24}$$

Substitute (24) into (23):

$$ES_m(a_i) = \int \cdots \int \sum_{s=1}^{c_i - 1} \left[ y^*_{ism}\, v'_{ism} - y^*_{i,s+1,m}\, b'(v_{ism}) \right] h(z, \Lambda \mid y_m, x_m)\, dz. \tag{25}$$

Solving $\sum_{m=1}^{n} ES_m(a_i) = 0$ and approximating the integral with Gauss-Hermite quadrature points, we get non-explicit solutions for the parameter vector $a_i$:

$$\sum_{t_1=1}^{n_1} \cdots \sum_{t_q=1}^{n_q} \sum_{s=1}^{c_i - 1} \sum_{m=1}^{n} \left[ y^*_{ism} \frac{\partial v_{ism}}{\partial a_i} - y^*_{i,s+1,m} \frac{\partial b(v_{ism})}{\partial a_i} \right] h(z_{t_1}, \ldots, z_{t_q}, \Lambda \mid y_m, x_m). \tag{26}$$

Expression (26) is written as

$$\sum_{t_1=1}^{n_1} \cdots \sum_{t_q=1}^{n_q} \sum_{s=1}^{c_i - 1} \left[ r_{i,s,t_1,\ldots,t_q} - r_{i,s+1,t_1,\ldots,t_q} \right], \tag{27}$$

where

$$r_{i,s,t_1,\ldots,t_q} = \sum_{m=1}^{n} h(z_{t_1}, \ldots, z_{t_q}, \Lambda \mid y_m, x_m)\, y^*_{ism} \frac{\partial v_{ism}}{\partial a_i}, \tag{28}$$

$$r_{i,s+1,t_1,\ldots,t_q} = \sum_{m=1}^{n} h(z_{t_1}, \ldots, z_{t_q}, \Lambda \mid y_m, x_m)\, y^*_{i,s+1,m} \frac{\partial b(v_{ism})}{\partial a_i}. \tag{29}$$

From the above results we can see that, to compute the derivatives with respect to the model parameters for any link function, we need to find the first derivatives of the functions $v_{ism}$ and $b(v_{ism})$ with respect to the model parameters. The maximization of the log-likelihood is done by an EM algorithm. In the model without covariate effects the functions $v_{ism}$ and $b(v_{ism})$ do not depend on the individual $m$.

2.4.3. EM algorithm

The steps of the EM algorithm are defined as follows:

(1) Choose initial estimates for the model parameters $\tau_{is}$, $\alpha_{ij}$, $\beta_{il}$ and $\lambda_{j\nu}$, where $i = 1, \ldots, p$; $s = 1, \ldots, c_i - 1$; $l = 1, \ldots, r$; $j = 1, \ldots, q$; $\nu = 1, \ldots, k$.
(2) Compute the values $r_{i,s,t_1,\ldots,t_q}$ and $r_{i,s+1,t_1,\ldots,t_q}$ (E-step).
(3) Obtain improved estimates for the parameters by solving the non-linear maximum likelihood equations for the parameters $\tau_{is}$, $\alpha_{ij}$, $\beta_{il}$ and the explicit solutions for the parameters $\lambda_{j\nu}$ of the latent distribution (M-step).
(4) Return to step 2 and continue until convergence is attained.
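The E-step and the explicit update for the structural coefficients can be sketched in code. This is a simplified illustration and not the paper's implementation: it assumes one latent variable ($q = 1$), the logit link, no direct covariate effects, measurement parameters held fixed, and an equally spaced grid of points in place of Gauss-Hermite quadrature. All function and variable names are hypothetical, and the data are made up.

```python
import numpy as np

def posterior_weights(y, w, tau, alpha, lam, nodes):
    """E-step for one respondent: posterior probabilities of z on a fixed grid.

    y: 0-based response categories, shape (p,); w: covariates, shape (k,);
    tau: thresholds, shape (p, c-1); alpha: loadings, shape (p,);
    lam: structural coefficients, shape (k,).
    """
    # Unnormalised N(w'lam, 1) density at the grid points.
    prior = np.exp(-0.5 * (nodes - w @ lam) ** 2)
    eta = tau[:, :, None] - alpha[:, None, None] * nodes      # shape (p, c-1, T)
    gamma = 1.0 / (1.0 + np.exp(-eta))
    p, _, T = gamma.shape
    gamma = np.concatenate([np.zeros((p, 1, T)), gamma, np.ones((p, 1, T))], axis=1)
    pi = np.diff(gamma, axis=1)                               # shape (p, c, T)
    # Conditional independence: multiply the observed-category probabilities.
    lik = np.prod(pi[np.arange(p), y, :], axis=0)
    post = lik * prior
    return post / post.sum()

def update_lambda(Y, W, tau, alpha, lam, nodes):
    """One EM update of lambda: regress posterior means of z on the covariates."""
    ez = np.array([posterior_weights(y, w, tau, alpha, lam, nodes) @ nodes
                   for y, w in zip(Y, W)])
    return np.linalg.solve(W.T @ W, W.T @ ez)

nodes = np.linspace(-8.0, 8.0, 161)                       # grid standing in for quadrature points
tau = np.array([[-1.0, 1.0], [-0.5, 0.5], [0.0, 1.5]])   # hypothetical thresholds
alpha = np.array([1.0, 1.5, 0.8])                         # hypothetical loadings
Y = np.array([[0, 1, 2], [2, 2, 1], [1, 0, 0], [2, 1, 2]])
W = np.array([[0.0, 0.5], [1.0, -0.3], [0.0, 1.2], [1.0, 0.1]])
lam_new = update_lambda(Y, W, tau, alpha, np.zeros(2), nodes)
print(lam_new)
```

In a full implementation this update would alternate with a numerical M-step for the thresholds, loadings and regression coefficients, and the loop would stop once the parameter changes fall below a tolerance.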
At the M-step a one-step Fisher scoring algorithm is used to solve the non-linear maximum likelihood equations.

2.5. Sampling properties of the maximum likelihood estimates

From first-order asymptotic theory, the maximum likelihood estimates have a sampling distribution which is asymptotically normal. Asymptotically the sampling variances and covariances of the maximum likelihood estimates of the model parameters
are given by the elements of the inverse of the information matrix at the maximum likelihood solution. The standard errors given in the examples have been computed from an approximation of the inverse of the information matrix evaluated at the maximum likelihood solution. If we denote by $\gamma$ the set of all model parameters then an approximation of the information matrix is given by

$$I(\hat{\gamma}) = \left\{ \sum_{m=1}^{n} \frac{1}{f(y_m, z_m \mid x_m, w_m)^2} \frac{\partial f(y_m, z_m \mid x_m, w_m)}{\partial \gamma_j} \frac{\partial f(y_m, z_m \mid x_m, w_m)}{\partial \gamma_k} \right\}^{-1}_{\gamma = \hat{\gamma}}.$$

2.6. Proportional odds model

The general measurement model with direct effects presented in (1) takes different forms depending on the link function used. There are many link functions to choose from, such as the logit, the complementary log-log function, the inverse normal function, the inverse Cauchy, and the log-log function. The logit and the inverse normal function, also known as the probit, are the link functions most often used in practice. The probit and logit link functions have very similar shapes and therefore give similar results. Here, we discuss the logit link in more detail since it is the one used in Section 3. When the logit link is used in (1), the model is known as the proportional odds model and is written as

$$\log \left[ \frac{\gamma_{is}(z, x)}{1 - \gamma_{is}(z, x)} \right] = \tau_{is} - \sum_{j=1}^{q} \alpha_{ij} z_j + \sum_{l=1}^{r} \beta_{il} x_l, \tag{30}$$

where $s = 1, \ldots, c_i - 1$; $i = 1, \ldots, p$. From (30) we obtain:

$$\gamma_{is} = P(y_i \le s \mid z, x) = \frac{\exp\left(\tau_{is} - \sum_{j=1}^{q} \alpha_{ij} z_j + \sum_{l=1}^{r} \beta_{il} x_l\right)}{1 + \exp\left(\tau_{is} - \sum_{j=1}^{q} \alpha_{ij} z_j + \sum_{l=1}^{r} \beta_{il} x_l\right)}, \tag{31}$$

where $s = 1, 2, \ldots, c_i - 1$ and $\gamma_{i,c_i} = 1$. Let $a_i' = (-\alpha_{i1}, \ldots, -\alpha_{iq}, \beta_{i1}, \ldots, \beta_{ir})$ and $v' = (z, x)$. Then for two individuals with scores $v_1$ and $v_2$ the difference between two corresponding logits is $a_i'(v_2 - v_1)$ and does not depend on the category involved. The derivatives required in (26) for the proportional odds model are given in the Appendix.

Models fitted to ordinal items should preserve the ordinality property of the items.
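The category-free logit difference that gives the proportional odds model (30) its name can be verified directly. The sketch below assumes one latent variable and one covariate with hypothetical parameter values; the function name `logits` is invented for the example.

```python
import numpy as np

def logits(z, x, tau, alpha, beta):
    """Cumulative logits tau_s - alpha * z + beta * x from (30), for all categories s."""
    return np.asarray(tau, dtype=float) - alpha * z + beta * x

tau = np.array([-2.0, -0.5, 1.0])   # hypothetical thresholds
alpha, beta = 1.3, 0.4              # hypothetical loading and covariate effect

# Individual 2 has (z, x) = (1.0, 1.0); individual 1 has (z, x) = (-0.5, 0.0).
diff = logits(1.0, 1.0, tau, alpha, beta) - logits(-0.5, 0.0, tau, alpha, beta)
print(diff)
```

Every element of `diff` equals $-\alpha(1.0 - (-0.5)) + \beta(1 - 0) = -1.55$, so the log odds ratio between the two individuals is the same whichever cut-point is used.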
Models should be invariant when only a reversal of the categories occurs, but not when the categories are arbitrarily permuted. Models such as the proportional odds model, probit, and inverse Cauchy are affected by an arbitrary permutation of the response categories, but not by a reversal of the category order. Under a reversal there is only a change in the sign of the regression and latent coefficients and a change in sign and order for the threshold parameters.

2.7. Goodness of fit

The goodness of fit of the model can be theoretically checked by computing a Pearson chi-square or a likelihood ratio statistic from the whole response pattern. When the number of manifest ordinal variables is large, it is expected that many response patterns will have expected frequency less than 5, and many will have expected frequencies so small that they do not occur at all. So from the practical point of view these tests cannot be used.
Alternatively we can compute the Pearson chi-square statistic or likelihood ratio statistic only for pairs and triplets of responses. The pairwise distribution of any two variables can be displayed as a two-way contingency table, and chi-square residuals can be constructed in the usual way by comparing the observed and expected frequencies. As a rule of thumb, if we consider the residual in each cell as having a $\chi^2$ distribution with one degree of freedom, then a value of the residual greater than 4 is indicative of poor fit at the 5% significance level. A study of the individual margins provides information about where the model does not fit. A detailed discussion on the use of these goodness-of-fit measures for ordinal variables can be found in Jöreskog and Moustaki (2001) and Bartholomew, Steele, Moustaki, and Galbraith (2002, pp. 213-234). However, for the model with covariate effects the Pearson chi-square statistic or likelihood ratio statistic for pairs and triplets of responses has to be computed for different values of the explanatory variables. This makes these residuals less informative with respect to goodness of fit.

Alternatively, instead of testing the goodness of fit of a specified model, we could use a criterion for selecting among a set of different models. This procedure gives information about the goodness of fit for each model in comparison with other models. This can be useful for determining the number of factors required or for comparing the model with latent variables and covariate effects with the model with only latent variables. Sclove (1987) gives a review of some of the model selection criteria used in multivariate analysis, such as those due to Akaike, Schwarz and Kashyap. These criteria take into account the value of the likelihood at the maximum likelihood solution and the number of parameters estimated.
Akaike's criterion for the determination of the order of an autoregressive model in time series has also been used for the determination of the number of factors in factor analysis (see Akaike, 1987):

$$\mathrm{AIC} = -2[\log l(\hat{\mathbf{a}})] + 2m, \qquad (32)$$

where $l(\hat{\mathbf{a}})$ is the maximized likelihood function, $m$ is the number of model parameters and $\hat{\mathbf{a}}$ is a vector with all model parameters (measurement and structural). The model with the smallest AIC value is taken to be the best. In this paper we also use an information complexity criterion proposed by Bozdogan (2000). The criterion is defined as

$$\mathrm{ICOMP} = -2[\log l(\hat{\mathbf{a}})] + 2C_1(\hat{F}^{-1}(\hat{\mathbf{a}})), \qquad (33)$$

where $C_1$ denotes the maximal information complexity of $\hat{F}^{-1}(\hat{\mathbf{a}})$, which is the estimated inverse Fisher information matrix.

3. Applications

In this section we use the proportional odds model, with a logit link function, to analyse two data sets from the 1996 British Social Attitudes Survey (BSA).^1

^1 Social and Community Planning Research, British Social Attitudes Survey, 1996 {computer file}, Colchester, Essex: The Data Archive {distributor}, 2 December SN: 3921.

Example 1

The first data set consists of five ordinal manifest variables ($y_1, \ldots, y_5$), measuring attitudes to the role of government. Respondents were asked whether, on the whole,
they thought it should or should not be the government's responsibility to:

provide a job for everyone who wants one [JobEvery]
keep prices under control [PriCon]
provide a decent standard of living for the unemployed [LivUnem]
reduce income differences between the rich and the poor [IncDiff]
provide decent housing for those who can't afford it [Housing]

The response alternatives given to the respondents were: definitely should be, probably should be, probably should not be, and definitely should not be. Item non-response varied between 2% and 6%. After excluding the missing values, we were left with 822 respondents. Missing values can be incorporated into the latent variable analysis (see O'Muircheartaigh and Moustaki, 1999).

A covariate $x$ constructed to measure left-to-right political identification was used, after standardization, as a continuous explanatory variable for the manifest ordinal variables. The 'left–right' variable is available in the 1996 BSA survey; it was constructed from a set of five items related to redistribution and equality. The variable is usually used for distinguishing party identification (see Heath, Jowell, Curtice, & Witherspoon, 1986).

We started the analysis by fitting the measurement model with no direct effects (equation (30) with no $x$ variables). The estimated thresholds $\hat{\tau}_{i,s}$ and factor loadings $\hat{\alpha}_{i1}$ with estimated standard errors are given in Tables 1 and 2, respectively. The fourth column of Table 2 gives standardized factor loadings st$\hat{\alpha}_{i1}$. These express correlations between the manifest variable $y_i$ and the latent variable $z_j$. For details on how to compute standardized loadings, see Bartholomew and Knott (1999). Items 3, 4 and 5 have the highest power of discrimination, followed by items 1 and 2. Their positive signs indicate that the more an individual believes that the state should not be responsible for its citizens, the less likely it is for that individual to choose the lower categories of the ordinal variables.
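The interpretation of a positive loading can be checked numerically. The sketch below assumes the cumulative logit form logit P(y <= s | z) = tau_s - alpha * z, a sign convention chosen to match the interpretation given in the text rather than copied from equation (30); the function name and the parameter values are hypothetical and are not estimates from the paper.

```python
import numpy as np


def category_probs(thresholds, loading, z):
    """Category probabilities for one ordinal item at latent value z.

    Assumes the cumulative logit form  logit P(y <= s | z) = tau_s - alpha * z
    with increasing thresholds tau_1 < ... < tau_{m-1}.
    """
    tau = np.asarray(thresholds, dtype=float)
    cumulative = 1.0 / (1.0 + np.exp(-(tau - loading * z)))  # P(y <= s | z)
    cumulative = np.concatenate(([0.0], cumulative, [1.0]))
    return np.diff(cumulative)  # P(y = s | z), s = 1, ..., m


# Hypothetical parameter values, not estimates from the paper.
tau = [-1.0, 0.5, 2.0]
alpha = 1.5
low = category_probs(tau, alpha, z=-1.0)
high = category_probs(tau, alpha, z=1.0)
print(low.round(3), high.round(3))
```

Under this convention, moving from z = -1 to z = 1 with a positive loading lowers the probability of the first category and raises that of the last.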
Table 1. Estimated thresholds and standard errors for the measurement model with no direct effects, Example 1
Item   Category   $\hat{\tau}_{i,s}$   s.e.
JobEvery
PriCon
LivUnem
IncDiff
Housing
Table 2. Estimated factor loadings, standard errors and standardized factor loadings for the measurement model with no direct effects, Example 1
Item   $\hat{\alpha}_{i1}$   s.e.   st$\hat{\alpha}_{i1}$
JobEvery
PriCon
LivUnem
IncDiff
Housing

Table 3 gives the pairs of categories where the chi-square residuals were greater than 4 for the one-factor measurement model with no direct effects. For example, a bad fit was detected for category 1 for item 1 and category 1 for item 2. These residuals are not independent and therefore cannot be summed to give an overall goodness-of-fit measure. Rather, they indicate pairs of items and categories that cannot be fitted by the model. They provide information for collapsing categories and for omitting items from the analysis to improve fit (Jöreskog and Moustaki, 2001).

Table 3. Chi-square residuals greater than 4 for two-way margins, Example 1
Item
(1, 1), (1, 2), (2, 2) (1, 2), (1, 4) (2, 4) (4, 1) (2, 4), (3, 1), (4, 2) (4, 4)
2 (1, 4), (4, 1) (3, 3) (1, 3), (4, 1)
3 (4, 1), (4, 3) (1, 2), (1, 4), (2, 2) (3, 3), (4, 1), (4, 2)
4 (3, 4), (4, 1)

We continued by allowing the 'left–right' variable to affect the manifest variables directly. We wished to see whether the latent variable $z$ together with the covariate $x$ could better explain the associations among the observed ordinal variables. The maximum likelihood estimates of the threshold parameters are given in Table 4 and the factor loadings and regression parameters are given in Table 5. The estimated factor loadings are all positive and of similar magnitude to the loadings obtained when the measurement model without direct effects was fitted. The estimated regression coefficients, taking into account their standard errors, were found to be significant. This means that, depending on the individual's position on the 'left–right' scale, the thresholds for each item $y_i$ will be shifted by $\hat{\beta}_i$.
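To make the shift explicit, a sketch of the one-factor cumulative logit with a single covariate is given here. The sign convention is an assumption, chosen to agree with the interpretation of the coefficients given in the text; it is not copied from equation (30), and $\tau^{*}_{i,s}(x)$ is notation introduced only for this illustration.

```latex
\operatorname{logit} P(y_i \le s \mid z, x)
  = \tau_{i,s} + \beta_i x - \alpha_{i1} z
  = \tau^{*}_{i,s}(x) - \alpha_{i1} z,
\qquad
\tau^{*}_{i,s}(x) = \tau_{i,s} + \beta_i x .
```

Under this convention a one-unit increase in the standardized covariate moves every threshold of item $i$ by $\beta_i$, leaving the spacing between thresholds unchanged.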
In addition, the negative sign of the regression coefficients shows that the more right wing an individual is, the lower the probability of being in the low-level categories of the ordinal observed variables. Table 6 gives the AIC and ICOMP criteria for the models with and without the covariate. Both criteria suggest that the model with the covariate effect is a better fit than the one without the covariate effect on the manifest variables.
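For readers who wish to compute the two criteria themselves, a minimal sketch follows. It assumes a maximized log-likelihood and a positive definite estimate of the inverse Fisher information matrix are already available; the $C_1$ expression used is the standard form $\frac{s}{2}\log(\operatorname{tr}(\Sigma)/s) - \frac{1}{2}\log|\Sigma|$ with $s$ the rank of $\Sigma$, and the function names are hypothetical.

```python
import numpy as np


def aic(loglik, n_params):
    """AIC = -2 log-likelihood + 2 m, as in equation (32)."""
    return -2.0 * loglik + 2.0 * n_params


def c1_complexity(cov):
    """Maximal information complexity C1 of a positive definite matrix."""
    cov = np.asarray(cov, dtype=float)
    s = np.linalg.matrix_rank(cov)
    _, logdet = np.linalg.slogdet(cov)  # log-determinant, stable for small values
    return 0.5 * s * np.log(np.trace(cov) / s) - 0.5 * logdet


def icomp(loglik, inv_fisher):
    """ICOMP = -2 log-likelihood + 2 C1(F^{-1}), as in equation (33)."""
    return -2.0 * loglik + 2.0 * c1_complexity(inv_fisher)


# Made-up inputs for illustration only.
print(aic(-100.0, 5))
print(icomp(-100.0, np.diag([1.0, 4.0])))
```

The complexity term is zero when all eigenvalues of the matrix are equal and grows as the parameter estimates become more unequal in precision or more strongly correlated.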
Table 4. Estimated thresholds and standard errors for the measurement model with direct effects, Example 1
Item   Category   $\hat{\tau}_{i,s}$   s.e.
JobEvery
PriCon
LivUnem
IncDiff
Housing

Table 5. Estimated factor loadings, regression parameters and standard errors for the measurement model with direct effects, Example 1
Item   $\hat{\alpha}_{i1}$   s.e.   $\hat{\beta}_{i1}$   s.e.
JobEvery
PriCon
LivUnem
IncDiff
Housing

Table 6. Model selection criteria, Example 1
        Model with no covariate   Model with covariate
AIC
ICOMP

Example 2

The second application is also from the 1996 British Social Attitudes Survey. Five ordinal manifest variables were selected for the analysis. The items measure satisfaction with the National Health Service in the respondents' area, and more specifically with the following services provided by general practitioners (GPs):

GPs' appointment systems [Appointment]
Amount of time GP gives to each patient [AmountTime]
Being able to choose which GP to see [ChooseGP]
Quality of medical treatment by GPs [Quality]
Waiting areas at GPs' surgeries [WaitingArea]

The response alternatives given to the respondents were: in need of a lot of improvement, in need of some improvement, satisfactory, and very good. Item non-response varied between 1.5% and 2.5%. After excluding the missing values, we were left with 841 respondents.

In the analysis we were interested in measuring overall satisfaction with GPs from the five ordinal manifest variables, controlling for respondents' political identification (measured by an observed covariate with four categories: Conservative, Labour, Liberal Democrat, and other). We also wished to measure the effect of gender and age on the latent variable satisfaction. Age is given in four categories: 18–25, 26–44, 45–64, and 65+. Gender and age are treated as dummy variables; the categories male and 18–25 are taken to be the respective reference categories. Allowing gender and age to affect the latent variable $z$ (and not the manifest variables $y$ directly) implies that all differences in the thresholds of the $y$ variables across different groups defined by gender and age are expressed through mean differences in the common factor $z$.

First, we fitted the one-factor model to the five ordinal manifest variables without allowing for any covariate effects. The fourth column of Table 7 gives the estimated standardized factor loadings (st$\hat{\alpha}_{i1}$). These are all positive and of similar magnitude, indicating that the five ordinal items measure a single factor and all have more or less the same power of discrimination. Their positive signs indicate that the more satisfied an individual is with the National Health Service in his/her area, the less likely he/she is to choose the lower categories of the ordinal variables.

Table 7.
Estimated factor loadings with standard errors and standardized factor loadings for the measurement model without direct effects, Example 2
Item   $\hat{\alpha}_{i1}$   s.e.   st$\hat{\alpha}_{i1}$
Appointment
AmountTime
ChooseGP
Quality
WaitingArea

Table 8 gives the pairs of items and categories for which the chi-square residuals are greater than 4. There are a substantial number of pairwise associations that cannot be explained by the model, and therefore a two-factor model might prove to be a better fit. Here, instead of fitting a two-factor model, we introduce the effects of covariates both on the manifest items and on the latent variable.

We continued by fitting the one-factor model that allows for covariate effects. The maximum likelihood estimates of the threshold parameters are given in Table 9 and the factor loadings and regression parameters are given in Table 10. The effects of the covariates age and gender on the latent variable are given in Table 11. The estimated factor loadings all remain positive and of similar magnitude to those
Table 8. Chi-square residuals greater than 4 for two-way margins, Example 2
Item
(1, 4), (3, 4) (3, 4), (4, 3) (1, 2), (1, 4) (4, 3) (4, 4) (2, 4), (3, 4)
2 (2, 4), (3, 4) (1, 2), (1, 4) (4, 3), (4, 2)
3 (1, 2), (1, 4)
4 (2, 4), (3, 3) (3, 4)

Table 9. Estimated thresholds and standard errors for the measurement model with direct effects, Example 2
Item   Category   $\hat{\tau}_{i,s}$   s.e.
Appointment
AmountTime
ChooseGP
Quality
WaitingArea

Table 10. Estimated factor loadings, regression parameters and standard errors for the measurement model with direct effects, Example 2
Item   $\hat{\alpha}_{i1}$   s.e.   $\hat{\beta}_{i1}$ (Labour)   s.e.   $\hat{\beta}_{i2}$ (Liberal)   s.e.   $\hat{\beta}_{i3}$ (Other)   s.e.
Appointment
AmountTime
ChooseGP
Quality
WaitingArea
Table 11. Estimated structural parameters and standard errors, Example 2
        $\hat{\lambda}_l$   s.e.
Constant
Female

obtained from the one-factor model without covariate effects (see Table 7). The small changes in the values of the estimated factor loadings are an indication of item factorial invariance within the groups defined by the covariates. The direct effects of the political party covariate on the manifest ordinal variables are similar, with the exception of variable 3 (ChooseGP). Respondents who tend to vote Labour are more likely to express dissatisfaction with each one of the five ordinal items than those who tend to vote Conservative. The Conservative party category is used as the reference category. Finally, from Table 11 we see that gender has no effect on overall satisfaction with the National Health Service, but that as respondents' age increases so does their satisfaction with the Health Service compared with the 18–25 age group. The AIC criterion for the model without the covariates is [...], and for the model with the covariates is [...]. We conclude that the model with the covariate effects is a better fit than the one without.

4. Conclusion

This paper attempts to generalize the item response theory (IRT) models to allow for covariate effects both on the manifest and on the latent variables. We have shown that the IRT framework can be extended to cover models often fitted within the structural equation modelling (SEM) framework. The IRT approach for the analysis of ordinal variables with covariates is a full maximum likelihood method that does not require the use of underlying variables and the estimation of polychoric correlations, as SEM does. Furthermore, to obtain correct standard errors and goodness-of-fit tests in SEM, we need the asymptotic covariance matrix of the polychoric correlations, which requires large samples. Problems might also arise in the SEM framework when the assumption of bivariate normality for the underlying variables does not hold.
The models presented here were fitted using GENLAT 1.1 (Moustaki, 2002). GENLAT 1.1 uses an EM algorithm to maximize the log-likelihood. The convergence of the EM algorithm slows down as the number of factors increases, and this is considered the main drawback of the framework presented here. On the other hand, the SEM approach, which is based on the concept of fitting a factor model to the polychoric correlation matrix, does not face a computational burden related to the number of factors fitted. The EM algorithm has been found to be robust with respect to the initial values used. The program GENLAT can fit up to two factors to binary, nominal, ordinal and metric manifest items and can also handle the simultaneous analysis of items with different distributions. As an alternative, the Stata routine GLLAMM (Rabe-Hesketh, Pickles, & Skrondal, 2001) fits latent variable models to ordinal items using the Newton–Raphson algorithm with adaptive quadrature instead of the EM algorithm.
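Both programs have to integrate the latent variable out of the likelihood. As a rough illustration of that step only, the sketch below approximates the marginal probability of one response pattern under a one-factor model with fixed Gauss–Hermite quadrature. It is not the algorithm of GENLAT or GLLAMM; it assumes z ~ N(0, 1) and the cumulative logit form logit P(y_i <= s | z) = tau_{i,s} - alpha_i * z, and the function name and parameter values are hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def pattern_probability(pattern, thresholds, loadings, n_nodes=21):
    """Marginal probability of one response pattern under a one-factor model.

    pattern:    observed categories, numbered from 1, one per item
    thresholds: list of increasing threshold arrays, one per item
    loadings:   factor loadings, one per item
    Assumes logit P(y_i <= s | z) = tau_{i,s} - alpha_i * z and z ~ N(0, 1).
    """
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)  # weights now sum to 1
    conditional = np.ones_like(nodes)
    for y, tau, alpha in zip(pattern, thresholds, loadings):
        tau = np.asarray(tau, dtype=float)
        eta = tau[:, None] - alpha * nodes[None, :]
        cum = 1.0 / (1.0 + np.exp(-eta))  # P(y_i <= s | z) at each node
        cum = np.vstack([np.zeros_like(nodes), cum, np.ones_like(nodes)])
        conditional *= cum[y] - cum[y - 1]  # P(y_i = y | z) at each node
    return float(np.sum(weights * conditional))


# Hypothetical parameters for two three-category items (not estimates from the paper).
taus = [[-0.5, 1.0], [0.0, 1.5]]
alphas = [1.2, 0.8]
total = sum(
    pattern_probability(p, taus, alphas)
    for p in itertools.product([1, 2, 3], repeat=2)
)
print(total)
```

Summing the log of these pattern probabilities over respondents gives the log-likelihood that an EM or Newton–Raphson routine would then maximize; the number of quadrature points needed grows quickly with the number of factors, which is consistent with the computational burden noted above.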
Acknowledgements

The author would like to thank the two anonymous referees for their constructive comments in improving the structure and clarity of this paper.

References

Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317–332.
Bartholomew, D., Steele, F., Moustaki, I., & Galbraith, J. (2002). The analysis and interpretation of multivariate data for social scientists. Boca Raton, FL: Chapman & Hall/CRC.
Bartholomew, D. J., & Knott, M. (1999). Latent variable models and factor analysis (2nd ed.). London: Arnold.
Bentler, P. M. (1992). EQS: Structural equations program manual. Los Angeles: BMDP Statistical Software.
Bozdogan, H. (2000). Akaike's information criterion and recent developments in information complexity. Journal of Mathematical Psychology, 44, 62–91.
Croon, M., & Bolck, A. (1997). On the use of factor scores in structural equations models (Technical Report /7). Tilburg: Tilburg University, Work and Organization Research Centre.
Glas, C. (2001). Differential item functioning depending on general covariates. In A. Boomsma, M. A. J. van Duijn, & T. A. B. Snijders (Eds), Essays on item response theory (pp. 131–145). New York: Springer-Verlag.
Heath, A., Jowell, R., Curtice, J., & Witherspoon, S. (1986). End of award report to the ESRC: Methodological aspects of attitude research. London: SCPR.
Jöreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70, 631–639.
Jöreskog, K. G., & Moustaki, I. (2001). Factor analysis of ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36, 347–387.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8 user's reference guide. Chicago: Scientific Software International.
Moustaki, I. (2000). A latent variable model for ordinal variables. Applied Psychological Measurement, 24, 211–223.
Moustaki, I. (2001).
A review of exploratory factor analysis for ordinal categorical data. In R. Cudeck, S. du Toit, & D. Sörbom (Eds), Structural equation modeling: Present and future. A festschrift in honor of Karl J. Jöreskog. Chicago: Scientific Software International.
Moustaki, I. (2002). GENLAT 1.1: A computer program for fitting a one- or two-factor latent variable model to categorical, metric and mixed observed items with missing values (Technical Report). London: London School of Economics and Political Science, Statistics Department.
Moustaki, I., & Knott, M. (2000). Generalized latent trait models. Psychometrika, 65, 391–411.
Muraki, E., & Carlson, E. (1995). Full-information factor analysis for polytomous item responses. Applied Psychological Measurement, 19, 73–90.
Muthén, B. O. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54, 557–585.
Muthén, B. O., & Muthén, L. (2000). Mplus user's guide. Los Angeles: Muthén & Muthén.
O'Muircheartaigh, C., & Moustaki, I. (1999). Symmetric pattern models: A latent variable approach to item non-response in attitude scales. Journal of the Royal Statistical Society, Series A, 162, 177–194.
Rabe-Hesketh, S., Pickles, A., & Skrondal, A. (2001). GLLAMM manual (Technical Report 2001/01). London: King's College, Institute of Psychiatry, Department of Biostatistics and Computing.
More informationFöreläsning /31
1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +
More informationA Cautionary Note on the Use of LISREL s Automatic Start Values in Confirmatory Factor Analysis Studies R. L. Brown University of Wisconsin
A Cautionary Note on the Use of LISREL s Automatic Start Values in Confirmatory Factor Analysis Studies R. L. Brown University of Wisconsin The accuracy of parameter estimates provided by the major computer
More informationA class of latent marginal models for capture-recapture data with continuous covariates
A class of latent marginal models for capture-recapture data with continuous covariates F Bartolucci A Forcina Università di Urbino Università di Perugia FrancescoBartolucci@uniurbit forcina@statunipgit
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationFactor analysis. George Balabanis
Factor analysis George Balabanis Key Concepts and Terms Deviation. A deviation is a value minus its mean: x - mean x Variance is a measure of how spread out a distribution is. It is computed as the average
More informationWhat is an Ordinal Latent Trait Model?
What is an Ordinal Latent Trait Model? Gerhard Tutz Ludwig-Maximilians-Universität München Akademiestraße 1, 80799 München February 19, 2019 arxiv:1902.06303v1 [stat.me] 17 Feb 2019 Abstract Although various
More informationAn Introduction to Path Analysis
An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving
More informationNELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation
NELS 88 Table 2.3 Adjusted odds ratios of eighth-grade students in 988 performing below basic levels of reading and mathematics in 988 and dropping out of school, 988 to 990, by basic demographics Variable
More informationCitation for published version (APA): Jak, S. (2013). Cluster bias: Testing measurement invariance in multilevel data
UvA-DARE (Digital Academic Repository) Cluster bias: Testing measurement invariance in multilevel data Jak, S. Link to publication Citation for published version (APA): Jak, S. (2013). Cluster bias: Testing
More informationAssessing Factorial Invariance in Ordered-Categorical Measures
Multivariate Behavioral Research, 39 (3), 479-515 Copyright 2004, Lawrence Erlbaum Associates, Inc. Assessing Factorial Invariance in Ordered-Categorical Measures Roger E. Millsap and Jenn Yun-Tein Arizona
More informationSelection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models
Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models Massimiliano Bratti & Alfonso Miranda In many fields of applied work researchers need to model an
More informationEconomics 241B Estimation with Instruments
Economics 241B Estimation with Instruments Measurement Error Measurement error is de ned as the error resulting from the measurement of a variable. At some level, every variable is measured with error.
More informationChapter 1. GMM: Basic Concepts
Chapter 1. GMM: Basic Concepts Contents 1 Motivating Examples 1 1.1 Instrumental variable estimator....................... 1 1.2 Estimating parameters in monetary policy rules.............. 2 1.3 Estimating
More informationPackage threg. August 10, 2015
Package threg August 10, 2015 Title Threshold Regression Version 1.0.3 Date 2015-08-10 Author Tao Xiao Maintainer Tao Xiao Depends R (>= 2.10), survival, Formula Fit a threshold regression
More informationComparison between conditional and marginal maximum likelihood for a class of item response models
(1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia
More informationA Threshold-Free Approach to the Study of the Structure of Binary Data
International Journal of Statistics and Probability; Vol. 2, No. 2; 2013 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education A Threshold-Free Approach to the Study of
More informationPartitioning variation in multilevel models.
Partitioning variation in multilevel models. by Harvey Goldstein, William Browne and Jon Rasbash Institute of Education, London, UK. Summary. In multilevel modelling, the residual variation in a response
More informationAPPENDICES TO Protest Movements and Citizen Discontent. Appendix A: Question Wordings
APPENDICES TO Protest Movements and Citizen Discontent Appendix A: Question Wordings IDEOLOGY: How would you describe your views on most political matters? Generally do you think of yourself as liberal,
More informationDynamic sequential analysis of careers
Dynamic sequential analysis of careers Fulvia Pennoni Department of Statistics and Quantitative Methods University of Milano-Bicocca http://www.statistica.unimib.it/utenti/pennoni/ Email: fulvia.pennoni@unimib.it
More informationIntroducing Generalized Linear Models: Logistic Regression
Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and
More informationWhat is Latent Class Analysis. Tarani Chandola
What is Latent Class Analysis Tarani Chandola methods@manchester Many names similar methods (Finite) Mixture Modeling Latent Class Analysis Latent Profile Analysis Latent class analysis (LCA) LCA is a
More information36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)
36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)
More informationClass business PS is due Wed. Lecture 20 (QPM 2016) Multivariate Regression November 14, / 44
Multivariate Regression Prof. Jacob M. Montgomery Quantitative Political Methodology (L32 363) November 14, 2016 Lecture 20 (QPM 2016) Multivariate Regression November 14, 2016 1 / 44 Class business PS
More informationExploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement
Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Second meeting of the FIRB 2012 project Mixture and latent variable models for causal-inference and analysis
More informationGeneralized Models: Part 1
Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes
More informationApplied Psychological Measurement 2001; 25; 283
Applied Psychological Measurement http://apm.sagepub.com The Use of Restricted Latent Class Models for Defining and Testing Nonparametric and Parametric Item Response Theory Models Jeroen K. Vermunt Applied
More informationInvestigating Models with Two or Three Categories
Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might
More informationRon Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)
Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October
More informationCHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA
Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationNinth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"
Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis" June 2013 Bangkok, Thailand Cosimo Beverelli and Rainer Lanz (World Trade Organization) 1 Selected econometric
More informationFactor Analysis. Qian-Li Xue
Factor Analysis Qian-Li Xue Biostatistics Program Harvard Catalyst The Harvard Clinical & Translational Science Center Short course, October 7, 06 Well-used latent variable models Latent variable scale
More informationAn Introduction to Mplus and Path Analysis
An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression
More informationBivariate Relationships Between Variables
Bivariate Relationships Between Variables BUS 735: Business Decision Making and Research 1 Goals Specific goals: Detect relationships between variables. Be able to prescribe appropriate statistical methods
More informationLesson 7: Item response theory models (part 2)
Lesson 7: Item response theory models (part 2) Patrícia Martinková Department of Statistical Modelling Institute of Computer Science, Czech Academy of Sciences Institute for Research and Development of
More informationBayesian Analysis of Latent Variable Models using Mplus
Bayesian Analysis of Latent Variable Models using Mplus Tihomir Asparouhov and Bengt Muthén Version 2 June 29, 2010 1 1 Introduction In this paper we describe some of the modeling possibilities that are
More informationA Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts
A Study of Statistical Power and Type I Errors in Testing a Factor Analytic Model for Group Differences in Regression Intercepts by Margarita Olivera Aguilar A Thesis Presented in Partial Fulfillment of
More informationA NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL
Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl
More informationExperimental Design and Data Analysis for Biologists
Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1
More information