Computational Statistics and Data Analysis. Identifiability of extended latent class models with individual covariates


Computational Statistics and Data Analysis 52 (2008)

Identifiability of extended latent class models with individual covariates

Antonio Forcina
Dipartimento di Economia, Finanza e Statistica, University of Perugia, via Pascoli, 06100, Perugia, Italy

Article history: Received 1 March 2007. Received in revised form 17 April 2008. Accepted 28 April 2008. Available online 4 May 2008.

Abstract

Identifiability is examined for a very flexible family of latent class models introduced recently. These models allow for an association between selected pairs of response variables conditionally on the latent and are based on logistic regression models, both for the latent weights and for the conditional distributions of the response variables, in terms of subject-specific covariates. Generalized logits (global or continuation, which are relevant with ordered categorical responses and involve comparisons of cumulated probabilities) may be used as an alternative to the usual logits of type local, which are log-linear. A compact matrix formulation for the Jacobian of the parametrization and a simple algorithm for checking local identifiability numerically are described. A few examples involving causal inference are examined.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Conventional latent class models (see, for example, Goodman (1974)) assume that the association structure of a set of observed discrete responses is caused by a discrete latent variable which affects each response separately.
Subsequently, this approach has been extended in two directions: (i) by relaxing the assumption that the response variables are independent given the latent (Hagenaars (1988) used constrained latent log-linear models, while Yang and Becker (1997), Ip (2001) and Bartolucci and Forcina (2005), among others, used marginal association parameters), thus allowing, for instance, a limited number of bivariate associations conditionally on the latent; (ii) by allowing the marginal distribution of the latent and the conditional distribution of the responses to depend on individual covariates as in a generalized linear model (Dayton and Macready, 1988; Formann, 1992; Melton et al., 1994; Huang and Bandeen-Roche, 2004; Bartolucci and Forcina, 2006).

While latent class models provide powerful tools for building flexible probabilistic explanations of the data, they are also extremely vulnerable with respect to identifiability. Most software for fitting latent class models can compute an estimate of the information matrix, which provides an indirect test of local identifiability at the maximum likelihood estimate; see, for example, Latent GOLD (Vermunt and Magidson, 2005, p. 55). Local identifiability (see Section 3 for a formal definition) is a crucial issue for the interpretation of results and the validity of asymptotic approximations. When latent class models are used to formulate problems of causal inference (which are meaningful only if the model is identifiable), one may want to assess the identifiability of different models even before collecting the data. For most conventional latent class models it is well known which are identifiable and which are not. For the class of extended models discussed in this paper, instead, results are available only in very special cases.
Catchpole and Morgan (1997) investigate the closely related notion of parametric redundancy and suggest a symbolic algorithm for checking whether a model is parameter redundant; however, the actual implementation of such algorithms is not at all simple in the context described above, where the derivation of analytical results is, in most instances, a very challenging task. Instead, as we argue below, numerical tests of parametric redundancy based on the Jacobian of the transformation between the vector of canonical parameters of the manifest distribution and the vector of regression parameters are simple and fast; thus they may be used as a reliable diagnostic of model identifiability.

E-mail address: forcina@stat.unipg.it

The family of latent class models proposed by Bartolucci and Forcina (2006) is taken as the basis for the present investigation, and its basic properties are described briefly in Section 2. The connection between local identifiability and parametric redundancy is recalled in Section 3. A simple matrix formulation for the Jacobian of the transformation between the canonical parameters of the manifest distribution and the regression parameters is provided, and the properties of a numerical test of model identifiability are examined. A few applications are discussed in Section 4; some of these concern problems of causal inference.

2. Extended latent class models

The family of latent class models presented below is essentially identical to the one proposed by Bartolucci and Forcina (2006) in the context of capture-recapture data. Let $Y_1, \ldots, Y_K$ be discrete response variables, with $Y_j$ having $k_j$ categories. Suppose that observations are available on $n$ individuals, each characterized by a possibly distinct vector of covariates $z_i$. Let $r$ denote any of the $t = \prod_j k_j$ possible response configurations and let $q_r(z) = \Pr(Y_1 = r_1, \ldots, Y_K = r_K \mid z)$ denote the corresponding cell probabilities. These probabilities (usually called manifest, in contrast to the latent ones to be introduced below) may be arranged into the $t \times 1$ vector $q(z)$ in lexicographic order by letting the elements of $r$ with a given index $j$ run from 1 to $k_j$ while those with a smaller index are kept fixed. Let $\gamma(z)$ denote a vector of canonical parameters for the saturated model of the corresponding multinomial distribution; these may be defined as $\gamma(z) = H \log[q(z)]$, where $H$ is any matrix of $t - 1$ linearly independent row contrasts (see, for example, Bartolucci et al. (2007), p. 699). Let $U$ denote a discrete latent variable having $c$ categories, which is assumed to explain the dependence among the response variables due to individual heterogeneity.
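The ordering of the cells and the definition of $\gamma(z)$ can be illustrated with a minimal sketch. The function names, the choice of two responses with 2 and 3 categories, and the particular contrast matrix $H = [-1 \mid I]$ (each cell compared with the first one) are assumptions made here for illustration only; the text allows any $t - 1$ linearly independent contrasts.

```python
import itertools
import math


def response_configurations(categories):
    """List all response configurations r in lexicographic order.

    categories[j] is the number of categories of the (j+1)-th response;
    the last index runs fastest while earlier indices are kept fixed.
    """
    ranges = [range(1, k + 1) for k in categories]
    return list(itertools.product(*ranges))


def canonical_parameters(q):
    """gamma = H log(q) for the illustrative choice H = [-1 | I].

    Each element is log(q_r) - log(q_first), a contrast with the first cell.
    """
    log_q = [math.log(p) for p in q]
    return [value - log_q[0] for value in log_q[1:]]


# Hypothetical example: Y1 binary, Y2 with three categories.
categories = [2, 3]
cells = response_configurations(categories)
t = len(cells)                      # t = 2 * 3 = 6 cells
q = [1.0 / t] * t                   # uniform manifest distribution
gamma = canonical_parameters(q)     # t - 1 contrasts, all zero for a uniform q
```

With a uniform $q(z)$ every contrast vanishes, which is a quick sanity check that the rows of the chosen $H$ sum to zero.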
Let $\pi_{u,r}(z)$ denote the joint distribution of $(U, Y_1, \ldots, Y_K) \mid z$; these probabilities may be arranged into the vector $\pi(z)$, again in lexicographical order, by letting $U$ run slowest from 1 to $c$. Clearly, the manifest probabilities $q(z)$ may be obtained from the latent ones by marginalization: $q(z) = L\pi(z)$, where $L = 1_c' \otimes I_t$. Conventional latent class models assume that $Y_1, \ldots, Y_K$ are independent conditionally on $U, z$. As mentioned in the introduction, several extended models have been proposed which relax this assumption by allowing certain variables to be associated even conditionally on covariates and the latent.

In the formulation adopted in this paper the task of defining a relevant family of latent class models with a parsimonious dependence structure is achieved in two steps. First, it is assumed that the latent multinomial distribution is determined by a vector of canonical parameters $\theta(z) = H \log[\pi(z)]$ which corresponds to a hierarchical log-linear model and has a dimension $v$ considerably smaller than $ct - 1$. This is equivalent to assuming that there are $(ct - 1) - v$ higher order interactions constrained to 0. Secondly, the parameters of interest are defined as regression parameters which determine how marginal logits and log-odds ratios depend on covariates; this is explained in more detail in Section 2.1.

As concerns the first step, let $G$ be the right inverse of $H$; the joint probabilities may be computed from the canonical parameters by the following reconstruction formula:

$$\pi(z) = \frac{\exp[G\theta(z)]}{1_{ct}' \exp[G\theta(z)]}.$$

Canonical parameters can, alternatively, be defined by writing explicitly the design matrix of the log-linear model $G$; a simple way of defining $G$ is given by Bartolucci et al. (2007, p. 699). Then $H$ may be determined as

$$H = (G'G - G' 1_{ct} 1_{ct}' G / ct)^{-1} (G' - G' 1_{ct} 1_{ct}' / ct).$$
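The relations between $G$, $H$, $\theta$, $\pi$ and $q$ above can be checked numerically. The following is a minimal sketch under stated assumptions: the toy model (a binary latent and a single binary response with main effects only), the corner-point coding of $G$ and the function names are hypothetical choices made for illustration, not the design matrix of Bartolucci et al. (2007).

```python
import numpy as np


def left_inverse_contrasts(G):
    """H = (G'G - G'11'G/n)^{-1} (G' - G'11'/n), so that H G = I and H 1 = 0."""
    n = G.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n      # I - 11'/n
    return np.linalg.solve(G.T @ centering @ G, G.T @ centering)


def reconstruct(G, theta):
    """Reconstruction formula: pi = exp(G theta) / 1' exp(G theta)."""
    weights = np.exp(G @ theta)
    return weights / weights.sum()


# Hypothetical toy model: c = 2 latent classes, one binary response (t = 2).
# Rows of G are the cells (u, r) with U running slowest; columns are the
# main effects of U and of the response (no interaction).
c, t = 2, 2
G = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])
H = left_inverse_contrasts(G)

theta = np.array([0.4, -1.1])
pi = reconstruct(G, theta)

# Marginalization over the latent: L = 1_c' (Kronecker product) I_t.
L = np.kron(np.ones((1, c)), np.eye(t))
q = L @ pi
```

Because the rows of $H$ sum to zero, the normalizing constant of the reconstruction formula drops out and `H @ np.log(pi)` returns `theta`, which is what makes $G$ a right inverse of $H$.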
A visual display of the matrix $G$ (and of the matrices $C$, $M$ to be introduced below) in a typical example is available online.

2.1. Marginal models

The marginal parametrization adopted here is based on the work of Bergsma and Rudas (2002), who studied the properties of a class of models for categorical data which are, essentially, a combined set of log-linear models, each defined within a different marginal distribution. These models may be useful, for example, when certain univariate and bivariate marginals are of direct interest. In addition, with ordered categorical variables, it may be more meaningful to use parameters based on global or continuation logits, which are not log-linear (see, for example, Agresti (1990)), as an alternative to ordinary logits. In the context of latent class models with continuous covariates and a few bivariate associations, marginal models offer, for example, the ability to formulate a regression model for the logits of any response variable conditionally on the latent but marginal with respect to the other response variables. This will usually be more meaningful than the logits of the same response conditionally on other responses in addition to the latent. In the context of causal inference they provide a direct way of modeling: (i) the marginal distribution of the latent given covariates, (ii) the distribution of the endogenous treatment given the latent and the covariates, and (iii) the conditional distribution of each response given the latent, treatment and covariates.

It is well known (see, for instance, Lang (1996), p. 726) that any set of marginal parameters may be computed from the general formula $\eta(z) = C \log[M\pi(z)]$,

where $C$ is a matrix of $v$ row contrasts and $M$ is a matrix of 0's and 1's which selects the cell probabilities to be cumulated or marginalized. The $C$ and $M$ matrices are determined by the following elements: (i) the number of categories of each variable; (ii) the type of logits used for each variable: local, global or continuation; (iii) a list of the marginal parameters of interest. This list may be coded into a matrix where each row corresponds to a type of parameter and each column to a variable. Within each row a variable may be coded as 1 if active, as 0 if marginalized and as 2 if parameters of active variables are to be computed conditionally on each possible configuration of the conditioning variables. When only one variable is active, the row defines a set of univariate logits, and when two variables are active the row defines a set of log-odds ratios. A set of MATLAB functions for constructing these matrices is available from matfun.pdf

2.2. Linear models

The choice of a specific marginal parametrization is equivalent to the choice of a link function in a generalized linear model; as such it requires a suitable design matrix that specifies how each marginal parameter depends on covariates. Because the response distribution is multivariate, each individual (unit) contributes a whole vector of linear predictors $\eta_i(z_i) = X_i \beta$, where $X_i$ is a regression matrix whose elements depend on the vector of covariates $z_i$ and $\beta$ is a vector of parameters composed of intercepts and regression coefficients. Usually $X_i$ will be block diagonal with a block for each component of the model: (i) marginal distribution of the latent, (ii) conditional distribution of an endogenous treatment given the latent, (iii) univariate marginal distribution of responses given the latent and the treatment, (iv) specific bivariate marginal interactions between responses and treatment given the latent.
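The marginal link $\eta = C \log[M\pi]$ of Section 2.1 and the linear predictor $\eta_i = X_i \beta$ can be sketched together on a very small case. Everything specific here is an assumption made for illustration: a binary latent $U$ and one binary response $Y$ with local logits, the hand-built $C$ and $M$ matrices, the helper names, and a design in which one covariate affects the logit of $U$ while another affects both conditional logits of $Y$ with a common slope.

```python
import numpy as np


def expit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + np.exp(-x))


# Cells of pi are ordered (U=1,Y=1), (U=1,Y=2), (U=2,Y=1), (U=2,Y=2).
# M: the first two rows marginalize over Y, giving P(U=1) and P(U=2);
#    the last four rows keep the joint cells as they are.
M = np.vstack([np.kron(np.eye(2), np.ones((1, 2))), np.eye(4)])
# C: one contrast per pair of rows of M, giving the marginal logit of U
#    and the logits of Y within each latent class.
C = np.kron(np.eye(3), np.array([[-1.0, 1.0]]))


def marginal_parameters(pi):
    """eta = C log(M pi)."""
    return C @ np.log(M @ pi)


def design_matrix(z_u, z_y):
    """X_i for beta = (three intercepts, slope of z_u on U, common slope of z_y on Y)."""
    return np.array([[1.0, 0.0, 0.0, z_u, 0.0],
                     [0.0, 1.0, 0.0, 0.0, z_y],
                     [0.0, 0.0, 1.0, 0.0, z_y]])


beta = np.array([0.2, -0.5, 1.0, 0.7, -0.3])
X_i = design_matrix(1.5, -0.4)
eta = X_i @ beta

# Invert the link by hand for this simple case and rebuild the joint probabilities.
p_u2 = expit(eta[0])            # P(U = 2)
p_y2 = expit(eta[1:])           # P(Y = 2 | U = 1), P(Y = 2 | U = 2)
pi = np.array([(1 - p_u2) * (1 - p_y2[0]), (1 - p_u2) * p_y2[0],
               p_u2 * (1 - p_y2[1]), p_u2 * p_y2[1]])
```

Applying `marginal_parameters(pi)` recovers `eta`, so the round trip confirms that these $C$ and $M$ encode the intended logits. The last column of `X_i` has non-zero entries in two rows because the slope is shared across latent classes.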
The block diagonal structure follows from the fact that each of these components depends on distinct parameters. However, when experimenting with parsimonious models, it may be meaningful to impose linear constraints that involve parameters belonging to different blocks. For example, when modeling the univariate marginal distribution of certain response variables conditionally on the latent, one could constrain the slope coefficients to be constant across latent classes. Implementation of such constraints produces columns having non-zero elements in different blocks. In the following, let $X$ be the matrix obtained by stacking the matrices $X_i$, $i = 1, \ldots, n$, one above the other. We assume that this matrix is of full column rank $r$.

3. Assessment of local identifiability

For notational simplicity, in the following we will write $\gamma_i$ instead of $\gamma(z_i)$ to denote the vector of canonical parameters for the $i$-th individual with covariate values $z_i$. A similar convention will be used for vectors and matrices associated with the probability distribution conditionally on $z_i$. We will also write $\gamma$ to denote the vector obtained by stacking the vectors $\gamma_i$, $i = 1, \ldots, n$, one above the other. By a similar convention we will write $\theta$, $\pi$, $\eta$. Though the elements of these vectors are functions of $\beta$, this dependence will be marked explicitly only when ambiguity may arise. Following Catchpole and Morgan (1997, p. 187) we recall the following definition.

Definition 1. A model is said to be locally identifiable if, for any $\beta_0$, there exists $\delta > 0$ such that every $\beta \neq \beta_0$ for which $\gamma(\beta) = \gamma(\beta_0)$ satisfies $\|\beta - \beta_0\| > \delta$.

If this condition was violated at a parameter value $\beta_0$, there would exist a neighborhood of $\beta_0$ whose points correspond to the same manifest distribution. As a consequence, the likelihood function would be flat around $\beta_0$ and the information matrix computed at $\beta_0$ would be singular (Catchpole and Morgan, 1997, Theorems 2 and 3).
As explained below, following Catchpole and Morgan (1997), local identifiability is closely related to the rank of the matrix of derivatives of the canonical parameters of the manifest distribution with respect to the regression parameters $\beta$.

3.1. The Jacobian matrix

The matrix of derivatives of $\gamma$ with respect to $\beta$ may be computed by the chain rule as

$$D = \frac{\partial \gamma}{\partial \beta'} = \frac{\partial \gamma}{\partial \theta'} \frac{\partial \theta}{\partial \eta'} \frac{\partial \eta}{\partial \beta'} = QRX;$$

because the canonical parameters for the multinomial distribution of different individuals are distinct, $Q$ and $R$ are block diagonal matrices, so that

$$D = \begin{pmatrix} Q_1 R_1 X_1 \\ \vdots \\ Q_n R_n X_n \end{pmatrix}.$$

The first two factors within each row may be computed again by the chain rule as follows:

$$Q_i = \frac{\partial \gamma_i}{\partial q_i'} \frac{\partial q_i}{\partial \pi_i'} \frac{\partial \pi_i}{\partial \theta_i'} = H \,\mathrm{diag}(q_i)^{-1} L \Omega_i G,$$

$$R_i = \left[ \frac{\partial \eta_i}{\partial \pi_i'} \frac{\partial \pi_i}{\partial \theta_i'} \right]^{-1} = \left[ C \,\mathrm{diag}(M\pi_i)^{-1} M \Omega_i G \right]^{-1},$$

where $\Omega_i = \mathrm{diag}(\pi_i) - \pi_i \pi_i'$. The crucial assumption in the calculations above is the non-singularity of the matrix $R_i$; this follows from the fact that, within the class of marginal models considered here, there is a diffeomorphism between $\eta_i$ and $\theta_i$ (Bartolucci et al., 2007, Theorem 1). Thus, having assumed that $X$ is of full column rank, the $Q_i$ matrices are the only component which may induce rank deficiency. On the other hand, $D$ may still be of full rank even if the $Q_i$ matrices are not, because the presence of covariates may restore full rank and thus make identifiable a model which would not be so within a single stratum (see the examples in Section 4).

The results of Catchpole and Morgan (1997, Theorem 4) imply that $D$ being of full rank for any admissible $\beta$ is a necessary and sufficient condition for the model to be locally identifiable. Thus, to show that a model is not locally identifiable, it is sufficient to find a single $\beta$ for which $D$ is not of full rank; on the other hand, the fact that $D$ is of full rank on a grid of $\beta$s may only provide substantial evidence that the model is likely to be locally identifiable. For a numerical test, it is much easier to establish lack of identifiability than its opposite, because there may exist parameter points where local identifiability fails even if we have been unable to find one. The strategy for a numerical assessment of local identifiability proposed here is to randomly sample a sufficiently large set of parameter points and to examine the distribution of the inverse condition number; this falls below the numerical precision of the computation when the matrix is rank deficient.
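The sampling strategy just described can be sketched as follows. This is a hedged illustration, not the author's MATLAB code: it treats a single stratum of a latent class model with a binary latent and two conditionally independent binary responses, and it uses the log-linear parameters $\theta$ directly as the parameters being sampled, so that only the factor $Q = H\,\mathrm{diag}(q)^{-1} L \Omega G$ is needed (the non-singular factors $R$ and $X$ cannot change its rank). The design matrix `G`, the contrast matrix `H` and all function names are assumptions.

```python
import itertools
import numpy as np

# Cells (u, r1, r2) with the latent running slowest; the columns of G are the
# main effects of U, Y1, Y2 and the U:Y1 and U:Y2 interactions (corner-point coding).
cells = list(itertools.product([0, 1], repeat=3))
G = np.array([[u, r1, r2, u * r1, u * r2] for u, r1, r2 in cells], dtype=float)
L = np.kron(np.ones((1, 2)), np.eye(4))            # q = L pi sums over the latent
H = np.hstack([-np.ones((3, 1)), np.eye(3)])       # contrasts with the first manifest cell


def jacobian_Q(theta):
    """Q = H diag(q)^{-1} L Omega G evaluated at the log-linear parameters theta."""
    weights = np.exp(G @ theta)
    pi = weights / weights.sum()
    q = L @ pi
    omega = np.diag(pi) - np.outer(pi, pi)
    return H @ np.diag(1.0 / q) @ L @ omega @ G


def inverse_condition_number(D):
    """Smallest over largest singular value, with one value per column of D.

    A matrix with fewer rows than columns is column-rank deficient by
    construction, so zero is returned without further computation.
    """
    if D.shape[0] < D.shape[1]:
        return 0.0
    s = np.linalg.svd(D, compute_uv=False)
    return s[-1] / s[0] if s[0] > 0 else 0.0


def sample_inverse_condition_numbers(jacobian, size, n_points, seed=0):
    """Evaluate the inverse condition number at points drawn from N(0, 4 I)."""
    rng = np.random.default_rng(seed)
    points = rng.normal(0.0, 2.0, size=(n_points, size))   # variance 4, i.e. sd 2
    return np.array([inverse_condition_number(jacobian(p)) for p in points])


values = sample_inverse_condition_numbers(jacobian_Q, size=5, n_points=200)
```

Here `jacobian_Q` always returns a matrix with 3 rows and 5 columns, so every sampled value is zero and lack of identifiability is detected at the first point. For a model with covariates the same routine would be applied to the stacked matrix $D$, which requires the full construction of $R_i$ and $X_i$ and is not attempted in this sketch.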
Thus, if, say, on 20,000 points the inverse condition number never goes below $10^{-10}$, we may conclude, with reasonable confidence, that the model is locally identified with probability close to one.

3.2. Computational issues

The accompanying web page describes a set of MATLAB functions, available at the same address, which perform the following tasks: (1) Computation of the design matrix $G$: this requires a vector containing the number of categories for the latent and each response variable and a set of generators defining the maximal interactions to be included in the log-linear model; the matrix of generators is a binary matrix with a generator in each row. (2) Computation of the $C$ and $M$ matrices: this requires the specification of a marginal parametrization as described above. (3) Computation of the $X_i$ matrices: this may be performed by a user-defined function with $z_i$ as input argument. This function must specify which covariates affect each marginal parameter. The same function may also be used to impose suitable restrictions, for instance that certain marginal parameters are equal or have equal intercepts.

4. Some examples

In the following, a few examples are described where the presence of covariates seems to restore identifiability of models which, otherwise, would not be identifiable. Within each example, local identifiability is tested by drawing a sample of 20,000 parameter points $\beta$ from a $N(0_b, 4I_b)$, where $b$ is the size of $\beta$. When covariates are involved, 5 observations for each covariate are sampled independently from a $N(0, 4)$. A set of MATLAB functions to replicate each example is provided on the same web site mentioned above.

Simple latent class. Suppose that $Y_1, Y_2$ are binary response variables conditionally independent given a binary latent $U$; it is well known that this model is not identifiable. In fact the manifest distribution is determined by 3 canonical parameters, while the latent class model has 5 parameters (the marginal logit of $U$ and two conditional logits for each response given $U$).
Here $D$ is a $3 \times 5$ matrix and thus cannot be of full column rank. Now suppose that there are two covariates, $X_1$ affecting the logits of the latent and $X_2$ affecting the conditional logits of both responses. Under the assumption that the regression coefficient for the conditional logit of $Y_j \mid U$ does not depend on $U$, there are 8 parameters (5 intercepts and 3 slopes) and the model seems to be identifiable even if observations are available on very few individuals. The distribution of the inverse condition number for $D$ is given in Table 1; because MATLAB works with a precision of at least $10^{-12}$, this indicates that it is almost certain that no parameter values in the range $\pm 10$ make the matrix $D$ rank deficient.

Causal inference. Consider a context where a binary treatment $T$ may affect two binary response variables. If there is a binary latent $U$ which is assumed to affect both the treatment and the responses, the model is not identifiable. This is obvious because $\gamma$ for the joint distribution of $T, Y_1, Y_2$ has dimension 7 while the simplest latent class model has 9 parameters: 1 for the marginal of $U$, 2 for the conditional distribution of $T \mid U$ and 3 for the distribution of each response $Y_j \mid T, U$ (under the assumption that $T, U$ act additively on $Y_j$). Now assume that three covariates are available: $X_U$ which affects the latent, $X_T$ which affects the assignment of treatment and $X_Y$ which affects both responses. In the regression model we assume that the effect of $T$ on the responses may be different for the two latent classes but the effect of covariates does not depend on the latent. This model has 15 parameters, of which 11 are intercepts (1 for $U$, 2 for $T \mid U$ and 4 for each $Y_j \mid T, U$) and 4 are regression parameters concerning the effect of $X_U$ on $U$, $X_T$ on $T$ and $X_Y$ on $Y_1$ and $Y_2$. The results of our numerical test are given in Table 1 and indicate that the model is almost certainly locally identifiable.

Table 1
Frequency distribution of the inverse condition number of the D matrix for the latent class models described in the text.

Type of model             Condition number of D: <10^{-8}
Simple latent class       19,
Causal inference          19,
Conditional dependence    19,

Conditional dependence. Consider again a problem of causal inference with an endogenous treatment and three responses, all binary. Now assume that $Y_1, Y_2$ and $Y_1, Y_3$ are not independent given $U, T$ though, for simplicity, we assume that the two log-odds ratios do not depend on $U, T$. This model has 17 parameters while the canonical parameter of the manifest distribution of $T, Y_1, Y_2, Y_3$ has size 15. Now assume again that there are three covariates, affecting the latent, the treatment and the responses, and that the regression coefficients do not depend on the latent. This model has 22 parameters and, according to the simulation reported in Table 1, it should be locally identifiable.

The results summarized in Table 1 suggest the following strategy in order to assess local identifiability of a given latent class model numerically. Start with only a few sample points (this would normally take a few seconds); if an instance of rank deficiency is detected, the model is not identifiable. Otherwise, the model is probably locally identified. Almost absolute certainty may be achieved by a simulation of the kind reported in Table 1, which would normally take just a few minutes.

5. Discussion

As regards the flexibility of the latent class models considered here, it may be worth noting that log-linear parameters are a special case of marginal parameters where all variables are either active or conditioning. Thus the models considered by Hagenaars (1988) and their extension by Vermunt (1996) belong to the setting considered here. Though, for the sake of simplicity, only bivariate associations conditional on the latent have been considered explicitly, the definition of the $C$, $M$ matrices and the results concerning non-singularity of the $R_i$ matrices hold irrespective of the conditional association structure allowed among responses. However, in practice, it is very unlikely that interactions higher than the third order may be of interest. Note in addition that models with a complex structure of conditional dependence are very unlikely to be identifiable.

The examples indicate that the presence of individual covariates may help in restoring identifiability. The price for this is in the restrictions imposed by the linear model. It may be useful to compare a model where covariates are assumed to affect marginal logits linearly with a model where individuals are grouped into strata sharing similar covariate configurations and no restriction is imposed across strata; such a model would require a much larger number of parameters and would probably not be identifiable in most instances.

The approach advocated here has some advantages with respect to the usual diagnostic based on the information matrix: (i) it could be performed before collecting the data; (ii) it is very fast and computationally efficient because it requires neither the computation of the maximum likelihood estimate nor even writing the likelihood function; (iii) the matrix $D$ is the crucial component: it is required to compute the information matrix, whose rank depends on the rank of $D$.
Finally, note that the structure of the data may be such that the maximum likelihood estimate is close to the boundary of the parameter space, where identifiability may fail even if the model is identifiable for a wide range of the parameter values. Thus, the test proposed here cannot ensure that the information matrix is far from being singular when the maximum is close to the boundary of the parameter space.

Acknowledgments

The author would like to thank F. Bartolucci and two referees for helpful comments. The author's work was supported by the Italian MIUR funds.

References

Agresti, A., 1990. Categorical Data Analysis. Wiley and Sons, New York.
Bartolucci, F., Colombi, R., Forcina, A., 2007. An extended class of marginal link functions for modeling contingency tables by equality and inequality constraints. Statistica Sinica 17.
Bartolucci, F., Forcina, A., 2005. Likelihood inference on the underlying structure of IRT models. Psychometrika 70.
Bartolucci, F., Forcina, A., 2006. A class of latent marginal models for capture-recapture data with continuous covariates. Journal of the American Statistical Association 101.
Bergsma, W., Rudas, T., 2002. Marginal models for categorical data. Annals of Statistics 30.
Catchpole, E.A., Morgan, B.J.T., 1997. Detecting parameter redundancy. Biometrika 84.
Dayton, C.M., Macready, G.B., 1988. Concomitant-variables latent class models. Journal of the American Statistical Association 83.
Formann, A.K., 1992. Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association 87.
Goodman, L., 1974. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61.
Hagenaars, J.A., 1988. Latent structure models with direct effects between indicators: local dependence models. Sociological Methods & Research 16.
Huang, G., Bandeen-Roche, K., 2004. Building an identifiable latent class model with covariate effects on underlying and measured variables. Psychometrika 69.
Ip, E.H., 2001. Testing for local dependency in dichotomous and polytomous item response models. Psychometrika 66.
Lang, J.B., 1996. Maximum likelihood methods for a generalized class of log-linear models. Annals of Statistics 24.
Melton, B., Liang, K.Y., Pulver, A.E., 1994. Extended latent class approach to the study of familial/sporadic forms of a disease: Its application to the study of the heterogeneity of schizophrenia. Genetic Epidemiology 11.
Vermunt, J.K., 1996. Log-linear event history analysis: A general approach with missing data, unobserved heterogeneity, and latent variables. Ph.D. Thesis. Tilburg University Press, Tilburg.
Vermunt, J.K., Magidson, J., 2005. Technical Guide for Latent GOLD 4.0: Basic and Advanced. Statistical Innovations Inc., Belmont, MA.
Yang, I., Becker, M.P., 1997. Latent variable modeling of diagnostic accuracy. Biometrics 53.


More information

Statistical Power of Likelihood-Ratio and Wald Tests in Latent Class Models with Covariates

Statistical Power of Likelihood-Ratio and Wald Tests in Latent Class Models with Covariates Statistical Power of Likelihood-Ratio and Wald Tests in Latent Class Models with Covariates Abstract Dereje W. Gudicha, Verena D. Schmittmann, and Jeroen K.Vermunt Department of Methodology and Statistics,

More information

1 Introduction. 2 A regression model

1 Introduction. 2 A regression model Regression Analysis of Compositional Data When Both the Dependent Variable and Independent Variable Are Components LA van der Ark 1 1 Tilburg University, The Netherlands; avdark@uvtnl Abstract It is well

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 Logistic regression: Why we often can do what we think we can do Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 1 Introduction Introduction - In 2010 Carina Mood published an overview article

More information

Growth models for categorical response variables: standard, latent-class, and hybrid approaches

Growth models for categorical response variables: standard, latent-class, and hybrid approaches Growth models for categorical response variables: standard, latent-class, and hybrid approaches Jeroen K. Vermunt Department of Methodology and Statistics, Tilburg University 1 Introduction There are three

More information

Applied Psychological Measurement 2001; 25; 283

Applied Psychological Measurement 2001; 25; 283 Applied Psychological Measurement http://apm.sagepub.com The Use of Restricted Latent Class Models for Defining and Testing Nonparametric and Parametric Item Response Theory Models Jeroen K. Vermunt Applied

More information

The Brown and Payne model of voter transition revisited

The Brown and Payne model of voter transition revisited The Brown and Payne model of voter transition revisited Antonio Forcina and Giovanni M. Marchetti Abstract We attempt a critical assessment of the assumptions, in terms of voting behavior, underlying the

More information

2. Basic concepts on HMM models

2. Basic concepts on HMM models Hierarchical Multinomial Marginal Models Modelli gerarchici marginali per variabili casuali multinomiali Roberto Colombi Dipartimento di Ingegneria dell Informazione e Metodi Matematici, Università di

More information

Links Between Binary and Multi-Category Logit Item Response Models and Quasi-Symmetric Loglinear Models

Links Between Binary and Multi-Category Logit Item Response Models and Quasi-Symmetric Loglinear Models Links Between Binary and Multi-Category Logit Item Response Models and Quasi-Symmetric Loglinear Models Alan Agresti Department of Statistics University of Florida Gainesville, Florida 32611-8545 July

More information

Testing order restrictions in contingency tables

Testing order restrictions in contingency tables Metrika manuscript No. (will be inserted by the editor) Testing order restrictions in contingency tables R. Colombi A. Forcina Received: date / Accepted: date Abstract Though several interesting models

More information

Analysing geoadditive regression data: a mixed model approach

Analysing geoadditive regression data: a mixed model approach Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression

More information

Correspondence Analysis of Longitudinal Data

Correspondence Analysis of Longitudinal Data Correspondence Analysis of Longitudinal Data Mark de Rooij* LEIDEN UNIVERSITY, LEIDEN, NETHERLANDS Peter van der G. M. Heijden UTRECHT UNIVERSITY, UTRECHT, NETHERLANDS *Corresponding author (rooijm@fsw.leidenuniv.nl)

More information

Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data

Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data Journal of Data Science 9(2011), 43-54 Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data Haydar Demirhan Hacettepe University

More information

Using Mixture Latent Markov Models for Analyzing Change in Longitudinal Data with the New Latent GOLD 5.0 GUI

Using Mixture Latent Markov Models for Analyzing Change in Longitudinal Data with the New Latent GOLD 5.0 GUI Using Mixture Latent Markov Models for Analyzing Change in Longitudinal Data with the New Latent GOLD 5.0 GUI Jay Magidson, Ph.D. President, Statistical Innovations Inc. Belmont, MA., U.S. statisticalinnovations.com

More information

MARGINAL MODELS FOR CATEGORICAL DATA. BY WICHER P. BERGSMA 1 AND TAMÁS RUDAS 2 Tilburg University and Eötvös Loránd University

MARGINAL MODELS FOR CATEGORICAL DATA. BY WICHER P. BERGSMA 1 AND TAMÁS RUDAS 2 Tilburg University and Eötvös Loránd University The Annals of Statistics 2002, Vol. 30, No. 1, 140 159 MARGINAL MODELS FOR CATEGORICAL DATA BY WICHER P. BERGSMA 1 AND TAMÁS RUDAS 2 Tilburg University and Eötvös Loránd University Statistical models defined

More information

Tilburg University. Mixed-effects logistic regression models for indirectly observed outcome variables Vermunt, Jeroen

Tilburg University. Mixed-effects logistic regression models for indirectly observed outcome variables Vermunt, Jeroen Tilburg University Mixed-effects logistic regression models for indirectly observed outcome variables Vermunt, Jeroen Published in: Multivariate Behavioral Research Document version: Peer reviewed version

More information

Identification of discrete concentration graph models with one hidden binary variable

Identification of discrete concentration graph models with one hidden binary variable Bernoulli 19(5A), 2013, 1920 1937 DOI: 10.3150/12-BEJ435 Identification of discrete concentration graph models with one hidden binary variable ELENA STANGHELLINI 1 and BARBARA VANTAGGI 2 1 D.E.F.S., Università

More information

A Two-latent-class Model for Education Transmission

A Two-latent-class Model for Education Transmission A Two-latent-class Model for Education Transmission Antonio Forcina Dipartimento di Economia, Finanza e Statistica, via Pascoli 10, 06100 Perugia, Italy Salvatore Modica Dipartimento SEAF, Viale delle

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

The Expected Parameter Change (EPC) for Local. Dependence Assessment in Binary Data Latent. Class Models

The Expected Parameter Change (EPC) for Local. Dependence Assessment in Binary Data Latent. Class Models The Expected Parameter Change (EPC) for Local Dependence Assessment in Binary Data Latent Class Models DL Oberski JK Vermunt Dept of Methodology & Statistics, Tilburg University, The Netherlands Abstract

More information

Unsupervised Learning with Permuted Data

Unsupervised Learning with Permuted Data Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University

More information

Title: Testing for Measurement Invariance with Latent Class Analysis. Abstract

Title: Testing for Measurement Invariance with Latent Class Analysis. Abstract 1 Title: Testing for Measurement Invariance with Latent Class Analysis Authors: Miloš Kankaraš*, Guy Moors*, and Jeroen K. Vermunt Abstract Testing for measurement invariance can be done within the context

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

AVOIDING BOUNDARY ESTIMATES IN LATENT CLASS ANALYSIS BY BAYESIAN POSTERIOR MODE ESTIMATION

AVOIDING BOUNDARY ESTIMATES IN LATENT CLASS ANALYSIS BY BAYESIAN POSTERIOR MODE ESTIMATION Behaviormetrika Vol33, No1, 2006, 43 59 AVOIDING BOUNDARY ESTIMATES IN LATENT CLASS ANALYSIS BY BAYESIAN POSTERIOR MODE ESTIMATION Francisca Galindo Garre andjeroenkvermunt In maximum likelihood estimation

More information

A Guide to Modern Econometric:

A Guide to Modern Econometric: A Guide to Modern Econometric: 4th edition Marno Verbeek Rotterdam School of Management, Erasmus University, Rotterdam B 379887 )WILEY A John Wiley & Sons, Ltd., Publication Contents Preface xiii 1 Introduction

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

A dynamic model for binary panel data with unobserved heterogeneity admitting a n-consistent conditional estimator

A dynamic model for binary panel data with unobserved heterogeneity admitting a n-consistent conditional estimator A dynamic model for binary panel data with unobserved heterogeneity admitting a n-consistent conditional estimator Francesco Bartolucci and Valentina Nigro Abstract A model for binary panel data is introduced

More information

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES REVSTAT Statistical Journal Volume 13, Number 3, November 2015, 233 243 MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES Authors: Serpil Aktas Department of

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Comparing IRT with Other Models

Comparing IRT with Other Models Comparing IRT with Other Models Lecture #14 ICPSR Item Response Theory Workshop Lecture #14: 1of 45 Lecture Overview The final set of slides will describe a parallel between IRT and another commonly used

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Nesting and Equivalence Testing

Nesting and Equivalence Testing Nesting and Equivalence Testing Tihomir Asparouhov and Bengt Muthén August 13, 2018 Abstract In this note, we discuss the nesting and equivalence testing (NET) methodology developed in Bentler and Satorra

More information

Likelihood inference for a class of latent Markov models under linear hypotheses on the transition probabilities

Likelihood inference for a class of latent Markov models under linear hypotheses on the transition probabilities Likelihood inference for a class of latent Markov models under linear hypotheses on the transition probabilities Francesco Bartolucci October 21, Abstract For a class of latent Markov (LM) models for discrete

More information

Power and Sample Size Computation for Wald Tests in Latent Class Models

Power and Sample Size Computation for Wald Tests in Latent Class Models Journal of Classification 33:30-51 (2016) DOI: 10.1007/s00357-016-9199-1 Power and Sample Size Computation for Wald Tests in Latent Class Models Dereje W. Gudicha Tilburg University, The Netherlands Fetene

More information

Outline. Clustering. Capturing Unobserved Heterogeneity in the Austrian Labor Market Using Finite Mixtures of Markov Chain Models

Outline. Clustering. Capturing Unobserved Heterogeneity in the Austrian Labor Market Using Finite Mixtures of Markov Chain Models Capturing Unobserved Heterogeneity in the Austrian Labor Market Using Finite Mixtures of Markov Chain Models Collaboration with Rudolf Winter-Ebmer, Department of Economics, Johannes Kepler University

More information

Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh

Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh Constructing Latent Variable Models using Composite Links Anders Skrondal Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine Based on joint work with Sophia Rabe-Hesketh

More information

An Introduction to Multivariate Statistical Analysis

An Introduction to Multivariate Statistical Analysis An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

A general non-parametric approach to the analysis of ordinal categorical data Vermunt, Jeroen

A general non-parametric approach to the analysis of ordinal categorical data Vermunt, Jeroen Tilburg University A general non-parametric approach to the analysis of ordinal categorical data Vermunt, Jeroen Published in: Sociological Methodology Document version: Peer reviewed version Publication

More information

Likelihood-ratio tests for order-restricted log-linear models Galindo-Garre, F.; Vermunt, Jeroen; Croon, M.A.

Likelihood-ratio tests for order-restricted log-linear models Galindo-Garre, F.; Vermunt, Jeroen; Croon, M.A. Tilburg University Likelihood-ratio tests for order-restricted log-linear models Galindo-Garre, F.; Vermunt, Jeroen; Croon, M.A. Published in: Metodología de las Ciencias del Comportamiento Publication

More information

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Michael J. Daniels and Chenguang Wang Jan. 18, 2009 First, we would like to thank Joe and Geert for a carefully

More information

Discrete Multivariate Statistics

Discrete Multivariate Statistics Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are

More information

Factor Analysis. Qian-Li Xue

Factor Analysis. Qian-Li Xue Factor Analysis Qian-Li Xue Biostatistics Program Harvard Catalyst The Harvard Clinical & Translational Science Center Short course, October 7, 06 Well-used latent variable models Latent variable scale

More information

Markov-switching autoregressive latent variable models for longitudinal data

Markov-switching autoregressive latent variable models for longitudinal data Markov-swching autoregressive latent variable models for longudinal data Silvia Bacci Francesco Bartolucci Fulvia Pennoni Universy of Perugia (Italy) Universy of Perugia (Italy) Universy of Milano Bicocca

More information

Mixed-Effects Logistic Regression Models for Indirectly Observed Discrete Outcome Variables

Mixed-Effects Logistic Regression Models for Indirectly Observed Discrete Outcome Variables MULTIVARIATE BEHAVIORAL RESEARCH, 40(3), 28 30 Copyright 2005, Lawrence Erlbaum Associates, Inc. Mixed-Effects Logistic Regression Models for Indirectly Observed Discrete Outcome Variables Jeroen K. Vermunt

More information

Statistical power of likelihood ratio and Wald tests in latent class models with covariates

Statistical power of likelihood ratio and Wald tests in latent class models with covariates DOI 10.3758/s13428-016-0825-y Statistical power of likelihood ratio and Wald tests in latent class models with covariates Dereje W. Gudicha 1 Verena D. Schmittmann 1 Jeroen K. Vermunt 1 The Author(s) 2016.

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE Donald A. Pierce Oregon State Univ (Emeritus), RERF Hiroshima (Retired), Oregon Health Sciences Univ (Adjunct) Ruggero Bellio Univ of Udine For Perugia

More information

Logistic regression analysis with multidimensional random effects: A comparison of three approaches

Logistic regression analysis with multidimensional random effects: A comparison of three approaches Quality & Quantity manuscript No. (will be inserted by the editor) Logistic regression analysis with multidimensional random effects: A comparison of three approaches Olga Lukočienė Jeroen K. Vermunt Received:

More information

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs The data for the tutorial came from support.sas.com, The LOGISTIC Procedure: Conditional Logistic Regression for Matched Pairs Data :: SAS/STAT(R)

More information

D-optimal Designs for Multinomial Logistic Models

D-optimal Designs for Multinomial Logistic Models D-optimal Designs for Multinomial Logistic Models Jie Yang University of Illinois at Chicago Joint with Xianwei Bu and Dibyen Majumdar October 12, 2017 1 Multinomial Logistic Models Cumulative logit model:

More information

A general non-parametric approach to the analysis of ordinal categorical data Vermunt, Jeroen

A general non-parametric approach to the analysis of ordinal categorical data Vermunt, Jeroen Tilburg University A general non-parametric approach to the analysis of ordinal categorical data Vermunt, Jeroen Published in: Sociological Methodology Document version: Peer reviewed version Publication

More information

Decomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables

Decomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables International Journal of Statistics and Probability; Vol. 7 No. 3; May 208 ISSN 927-7032 E-ISSN 927-7040 Published by Canadian Center of Science and Education Decomposition of Parsimonious Independence

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Libraries 1997-9th Annual Conference Proceedings ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Eleanor F. Allan Follow this and additional works at: http://newprairiepress.org/agstatconference

More information

Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling

Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling J. Shults a a Department of Biostatistics, University of Pennsylvania, PA 19104, USA (v4.0 released January 2015)

More information

On the Correlations of Trend-Cycle Errors

On the Correlations of Trend-Cycle Errors On the Correlations of Trend-Cycle Errors Tatsuma Wada Wayne State University This version: December 19, 11 Abstract This note provides explanations for an unexpected result, namely, the estimated parameter

More information

Factor Analysis (10/2/13)

Factor Analysis (10/2/13) STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

Using Bayesian Priors for More Flexible Latent Class Analysis

Using Bayesian Priors for More Flexible Latent Class Analysis Using Bayesian Priors for More Flexible Latent Class Analysis Tihomir Asparouhov Bengt Muthén Abstract Latent class analysis is based on the assumption that within each class the observed class indicator

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

ECON 594: Lecture #6

ECON 594: Lecture #6 ECON 594: Lecture #6 Thomas Lemieux Vancouver School of Economics, UBC May 2018 1 Limited dependent variables: introduction Up to now, we have been implicitly assuming that the dependent variable, y, was

More information

Modeling Joint and Marginal Distributions in the Analysis of Categorical Panel Data

Modeling Joint and Marginal Distributions in the Analysis of Categorical Panel Data SOCIOLOGICAL Vermunt et al. / JOINT METHODS AND MARGINAL & RESEARCH DISTRIBUTIONS This article presents a unifying approach to the analysis of repeated univariate categorical (ordered) responses based

More information

Introduction to mtm: An R Package for Marginalized Transition Models

Introduction to mtm: An R Package for Marginalized Transition Models Introduction to mtm: An R Package for Marginalized Transition Models Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington 1 Introduction Marginalized transition

More information

TWO-STEP ESTIMATION OF MODELS BETWEEN LATENT CLASSES AND EXTERNAL VARIABLES

TWO-STEP ESTIMATION OF MODELS BETWEEN LATENT CLASSES AND EXTERNAL VARIABLES TWO-STEP ESTIMATION OF MODELS BETWEEN LATENT CLASSES AND EXTERNAL VARIABLES Zsuzsa Bakk leiden university Jouni Kuha london school of economics and political science May 19, 2017 Correspondence should

More information

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 Contents Preface to Second Edition Preface to First Edition Abbreviations xv xvii xix PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 1 The Role of Statistical Methods in Modern Industry and Services

More information

Quantitative Trendspotting. Rex Yuxing Du and Wagner A. Kamakura. Web Appendix A Inferring and Projecting the Latent Dynamic Factors

Quantitative Trendspotting. Rex Yuxing Du and Wagner A. Kamakura. Web Appendix A Inferring and Projecting the Latent Dynamic Factors 1 Quantitative Trendspotting Rex Yuxing Du and Wagner A. Kamakura Web Appendix A Inferring and Projecting the Latent Dynamic Factors The procedure for inferring the latent state variables (i.e., [ ] ),

More information

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St.

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St. Regression Graphics R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Abstract This article, which is based on an Interface tutorial, presents an overview of regression

More information

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS Ivy Liu and Dong Q. Wang School of Mathematics, Statistics and Computer Science Victoria University of Wellington New Zealand Corresponding

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information