Finite Mixture EFA in Mplus

Similar documents
STK4900/ Lecture 7. Program

Hotelling s Two- Sample T 2

General Linear Model Introduction, Classes of Linear models and Estimation

The Poisson Regression Model

4. Score normalization technical details We now discuss the technical details of the score normalization method.

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test)

Using Factor Analysis to Study the Effecting Factor on Traffic Accidents

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III

Asymptotically Optimal Simulation Allocation under Dependent Sampling

arxiv: v1 [physics.data-an] 26 Oct 2012

Robustness of classifiers to uniform l p and Gaussian noise Supplementary material

Outline for today. Maximum likelihood estimation. Computation with multivariate normal distributions. Multivariate normal distribution

Estimating function analysis for a class of Tweedie regression models

Numerical Linear Algebra

Radial Basis Function Networks: Algorithms

Introduction to Probability and Statistics

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

The Recursive Fitting of Multivariate. Complex Subset ARX Models

Partial Identification in Triangular Systems of Equations with Binary Dependent Variables

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

SAS for Bayesian Mediation Analysis

Coding Along Hermite Polynomials for Gaussian Noise Channels

Estimation of the large covariance matrix with two-step monotone missing data

LOGISTIC REGRESSION. VINAYANAND KANDALA M.Sc. (Agricultural Statistics), Roll No I.A.S.R.I, Library Avenue, New Delhi

Yixi Shi. Jose Blanchet. IEOR Department Columbia University New York, NY 10027, USA. IEOR Department Columbia University New York, NY 10027, USA

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014

Adaptive estimation with change detection for streaming data

Notes on Instrumental Variables Methods

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

SHAPE OPTOMIZATION OF H-BEAM FLANGE FOR MAXIMUM PLASTIC ENERGY DISSIPATION

Approximating min-max k-clustering

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning

Estimation of Separable Representations in Psychophysical Experiments

Convex Optimization methods for Computing Channel Capacity

8.7 Associated and Non-associated Flow Rules

An Analysis of Reliable Classifiers through ROC Isometrics

Nesting and Equivalence Testing

Improved Bounds on Bell Numbers and on Moments of Sums of Random Variables

A MIXED CONTROL CHART ADAPTED TO THE TRUNCATED LIFE TEST BASED ON THE WEIBULL DISTRIBUTION

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity

Models of Regression type: Logistic Regression Model for Binary Response Variable

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek

A generalization of Amdahl's law and relative conditions of parallelism

Hidden Predictors: A Factor Analysis Primer

Chapter 1 Fundamentals

ECE 534 Information Theory - Midterm 2

dn i where we have used the Gibbs equation for the Gibbs energy and the definition of chemical potential

Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation

Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression

Slash Distributions and Applications

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model

Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models

arxiv: v1 [cs.lg] 31 Jul 2014

A PEAK FACTOR FOR PREDICTING NON-GAUSSIAN PEAK RESULTANT RESPONSE OF WIND-EXCITED TALL BUILDINGS

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition

Distributed Rule-Based Inference in the Presence of Redundant Information

A SIMPLE PLASTICITY MODEL FOR PREDICTING TRANSVERSE COMPOSITE RESPONSE AND FAILURE

Information collection on a graph

COMMUNICATION BETWEEN SHAREHOLDERS 1

ON THE GRID REFINEMENT RATIO FOR ONE-DIMENSIONAL ADVECTIVE PROBLEMS WITH NONUNIFORM GRIDS

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data

The non-stochastic multi-armed bandit problem

1/25/2018 LINEAR INDEPENDENCE LINEAR INDEPENDENCE LINEAR INDEPENDENCE LINEAR INDEPENDENCE

Statistics II Logistic Regression. So far... Two-way repeated measures ANOVA: an example. RM-ANOVA example: the data after log transform

Topic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS

An Introduction to SEM in Mplus

Downloaded from jhs.mazums.ac.ir at 9: on Monday September 17th 2018 [ DOI: /acadpub.jhs ]

State Estimation with ARMarkov Models

One step ahead prediction using Fuzzy Boolean Neural Networks 1

Plausible Values for Latent Variables Using Mplus

Period-two cycles in a feedforward layered neural network model with symmetric sequence processing

Fuzzy Methods. Additions to Chapter 5: Fuzzy Arithmetic. Michael Hanss.

Background. GLM with clustered data. The problem. Solutions. A fixed effects approach

Improved Capacity Bounds for the Binary Energy Harvesting Channel

1 Probability Spaces and Random Variables

Named Entity Recognition using Maximum Entropy Model SEEM5680

t 0 Xt sup X t p c p inf t 0

Uniformly best wavenumber approximations by spatial central difference operators: An initial investigation

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V.

Information collection on a graph

Khinchine inequality for slightly dependent random variables

A New Asymmetric Interaction Ridge (AIR) Regression Method

arxiv: v2 [stat.me] 3 Nov 2014

Recent Advances on Computer Experiment

Version 1.0. klm. General Certificate of Education June Mathematics. Statistics 4. Mark Scheme

STABILITY ANALYSIS AND CONTROL OF STOCHASTIC DYNAMIC SYSTEMS USING POLYNOMIAL CHAOS. A Dissertation JAMES ROBERT FISHER

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

Supplementary materials for: Exploratory Structural Equation Modeling

José Alberto Mauricio. Instituto Complutense de Análisis Económico. Universidad Complutense de Madrid. Campus de Somosaguas Madrid - SPAIN

LINEAR SYSTEMS WITH POLYNOMIAL UNCERTAINTY STRUCTURE: STABILITY MARGINS AND CONTROL

Online Learning of Noisy Data with Kernels

ON THE DEVELOPMENT OF PARAMETER-ROBUST PRECONDITIONERS AND COMMUTATOR ARGUMENTS FOR SOLVING STOKES CONTROL PROBLEMS

Analysis of M/M/n/K Queue with Multiple Priorities

E( x ) [b(n) - a(n, m)x(m) ]

A-optimal diallel crosses for test versus control comparisons. Summary. 1. Introduction

Transcription:

Finite Mixture EFA in Mlus November 16, 2007 In this document we describe the Mixture EFA model estimated in Mlus. Four tyes of deendent variables are ossible in this model: normally distributed, ordered categorical with logit or robit link, Poisson distributed with the exonential link function, and censored variables. Inflation is not available for the Censored and Poisson variables. Suose that we estimate a K class model with M factors and P deendent variables. Denote the variables by Y 1,..., Y P and the normally distributed factors by η 1,..., η M. Let η be the vector of all latent factors η = (η 1,..., η M ). The Mixture model is based on a single categorical latent class variable C. For a normally distributed variable Y we estimate the following model in class k Y = ν k + λ k η + ε where ν k is the intercet arameter, λ k is a vector of loadings of dimension M, and ε is a zero mean normally distributed residual with variance θ k. For an ordered categorical variable Y we estimate the following model in class k P (Y = j) = F (τ kj λ k η) F (τ kj 1 λ k η) for j = 1,..., r where r is the number of categories that the variable Y takes. The arameters τ kj are monotonically increasing for j and for identification uroses τ kr = and τ k0 =. The function F is either the standard normal distribution function, for robit link, or the logit distribution function F (x) = 1/(1 + Ex( x)), for logit link. Alternatively we can secify the model as follows Y = j τ kj 1 Y < τ kj 1

where Y = λ k η + ε where ε is a residual with distribution F. For Poisson distributed variables we estimate the following model in class k P (Y = j) = e Y (Y ) j j! where Y = ν k + λ k η and the arameters to be estimated are again the intercet ν k and the loading vector λ k. For censored variables Y we estimate the following model in class k Y = { Y c if Y if Y > c c where c is the censoring limit and Y Y is latent normally distributed variable = ν k + λ k η + ε where ν k, λ k, and the variance θ k of the zero mean residual ε are to be estimated. The above model is for censored variables with a lower end bound. Similar model is available for censored variables with an uer end bound. We also estimate an unrestricted correlation matrix Ψ k for the factors η in class k when we estimate the model with oblique rotation. If we estimate the model with orthogonal rotation the correlation matrix is fixed to the identity matrix, i.e., the factors are assumed standard normal and orthogonal in all classes. Finally we estimate an unrestricted distribution for the latent class variable C, i.e., we estimate the arameters k = P (C = k). The above model is not identified in rincile. To be identified the model has to include an additional M(M 1) restrictions for oblique rotations or M(M 1)/2 restrictions for orthogonal rotations. Before we roceed with a loading rotation algorithm however we standardize the loadings with resect to the V ar(y ). For normally distributed Y we assume that Y = Y. We construct the standardized loadings λ k as follows λ k = λ k / (V ar(y )) 2

where V ar(y ) = λ k Ψ λ T k + θ k where for censored and normal variables θ k is as secified in the model, for categorical robit link variable it is θ k = 1, for categorical logit link variable θ k = π 2 /3 and for Poisson variables θ k = 0. Similarly we standardize the θ k arameter θk = θ k /V ar(y ) Note also that as constructed the standardized loadings are on the correlation scale, that is, if Λ k is the matrix of all standardized loadings and Θ k is the diagonal matrix with all θk on the diagonal, the estimated correlation matrix of Y = (Y1,..., YP ) is Λ kψλ T k + Θ k. We now define the rotation criteria that will identify the loadings and the factor correlation Ψ. All oblique factor rotations are defined by a square matrix H of dimension M such that HH T has ones on the diagonal. All orthogonal rotations are defined by orthogonal square matrices of dimension M, i.e., HH T = I, where I is the identity matrix. All such factor rotations lead to equivalent factor models with M factors. We estimate the rotation that minimizes the simlicity function, i.e., the rotation criteria Q(Λ H) across all rotation matrices H, where the rotation criteria can be any rotation criteria such as cf-varimax, quartimin, geomin etc, suorted by Mlus. With this additional constraint the loadings and factor correlation are uniquely defined. We now focus on the outut reorted by Mlus. For each class the rotation is erformed indeendently, since all loadings and residual covariances are class secific. In the Mlus outut we reort the rotated standardized loadings Λ H, where H is the otimal rotation. Standard errors for the rotated standardized loadings are also reorted. In addition the class secific intercets ν k are reorted, as well as the threshold arameters τ kj. These arameters are reorted in their original metric, however the threshold arameters τ kj are also reorted in the standardized correlation metric. Denote these by τ kj. Consequently the estimated robabilities for each category is comuted as follows P (Y = j C = k) = Φ 1 (τ kj) Φ 1 (τ kj 1). 3

This comutation is exact for the robit link function, however it is only aroximate for the logit link function. The Mixture EFA model estimation can be challenging in some instances. When all deendent variables are normally distributed there is no numerical integration involved in the estimation and the comutation is fairly quick, however sufficient number of random starts should be used to ensure that the global log-likelihood maximum is reached. When some of the variables are not normally distributed, i.e., Poisson, censored, and ordered categorical variables, numerical integration is used for all factors and thus the comutation will be significantly slower. With Poisson, censored, and ordered categorical variables the Mixture EFA model is ossible but because of the numerical integration and the random starts erturbation the comutational time might be substantial. Mixture EFA with binary variables is a articularly difficult model to estimate because of the flexibility of the model and fairly little information rovided by binary variables - in articular it is fairly easy to exceed or aroach the maximum degrees of freedom when only a few binary variables are used. In addition for Mixture EFA models with categorical variables, the best log-likelihood value found in multile starting value erturbations, can be difficult to relicate, again due to the flexibility of the model. Additional information on mixture factor analysis can be found in McLachlan and Peel (2000) and McLachlan et al. (2004). Mixture factor analysis with categorical variables is discussed in Muthen and Asarouhov (2006). Mixture EFA analysis is illustrated in Examle 4.4, Mlus User s Guide (Muthen and Muthen, 1998-2007). References McLachlan, G. & Peel, D. (2000). Finite mixture models. New York: John Wiley & Sons. McLachlan, G.J., Do, K.A., & Ambroise, C. (2004). Analyzing microarray gene exression data. New York: John Wiley & Sons. Muthen, B. & Asarouhov, T. (2006). Item resonse mixture modeling: Alication to tobacco deendence criteria. Addictive Behaviors, 31, 1050-1066. 4

Muthen, L.K. & Muthen, B.O. (1998-2007). Mlus User s Guide. Fifth Edition. Los Angeles, CA: Muthen & Muthen 5