Latent Class Analysis


Karen Bandeen-Roche
October 27, 2016

Objectives
For you to leave here knowing:
- When is a latent class analysis (LCA) model useful?
- What is the LCA model, and what are its underlying assumptions?
- How are LCA parameters interpreted?
- How are LCA parameters commonly estimated?
- How is LCA fit adjudicated?
- What are the considerations for identifiability / estimability?

Motivating Example: Frailty of Older Adults
"The sixth age shifts / Into the lean and slipper'd pantaloon, / With spectacles on nose and pouch on side, / His youthful hose, well sav'd, a world too wide / For his shrunk shank" (Shakespeare, As You Like It)

The Frailty Construct (Fried et al., J Gerontol, 2001; Bandeen-Roche et al., J Gerontol, 2006)

Frailty as a Latent Variable
- Underlying: status or degree of the syndrome
- Surrogates: the Fried et al. (2001) criteria
  - weight loss above threshold
  - low energy expenditure
  - low walking speed
  - weakness beyond threshold
  - exhaustion

Part I: Model

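Latent Class Model
[Path diagram: the latent variable frailty ($\eta$) is the structural part; the measurement part links $\eta$ to the observed indicators $Y_1, \dots, Y_M$, with errors $\varepsilon_1, \dots, \varepsilon_M$.]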

Well-Used Latent Variable Models
Latent variable scale \ Observed variable scale | Continuous | Discrete
Continuous | Factor analysis, LISREL | Discrete FA, IRT (item response theory)
Discrete | Latent profile, growth mixture | Latent class analysis, latent class regression
General software: Mplus, Latent GOLD, WinBUGS (Bayesian), NLMIXED (SAS)

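Analysis of Underlying Subpopulations: Latent Class Analysis
[Diagram: the population divides into latent subpopulations $1, \dots, J$ (latent class $U_i$) with prevalences $P_1, \dots, P_J$; within subpopulation $j$, items $Y_1, \dots, Y_M$ are endorsed with probabilities $\pi_{1j}, \dots, \pi_{Mj}$.]
(Lazarsfeld & Henry, Latent Structure Analysis, 1968; Goodman, Biometrika, 1974)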

Latent Variables: What?
Integrands in a hierarchical model.
- Observed variables ($i = 1, \dots, n$): $Y_i$ = M-variate; $x_i$ = P-variate
- Focus: the response ($Y$) distribution $G_{Y|x}(y \mid x)$ and its $x$-dependence
- Model: $Y_i$ is generated from a latent (underlying) $U_i$:
  - (Measurement) $F_{Y|U,x}(y \mid U = u, x; \pi)$
  - (Structural) focus on the distribution of, and regression on, $U_i$: $F_{U|x}(u \mid x; \beta)$
- Overall, hierarchical model: $G_{Y|x}(y \mid x) = \int F_{Y|U,x}(y \mid U = u, x)\, dF_{U|x}(u \mid x)$

Latent Variable Models: Latent Class Regression (LCR) Model
Model: $f_{Y|x}(y \mid x) = \sum_{j=1}^{J} P_j \prod_{m=1}^{M} \pi_{mj}^{y_m} (1 - \pi_{mj})^{1 - y_m}$
- Structural model $[U_i \mid x_i]$: $\Pr\{U_i = j\} = \Pr\{\eta_i = j\} = P_j$, $j = 1, \dots, J$
- Measurement model $[Y_i \mid U_i]$: conditional probabilities $\pi_{mj} = \Pr\{Y_{im} = 1 \mid U_i = j\} = \Pr\{Y_{im} = 1 \mid \eta_i = j\}$; $\pi$ is $M \times J$
- Compare to the general form: $G_{Y|x}(y \mid x) = \int F_{Y|U,x}(y \mid U = u, x)\, dF_{U|x}(u \mid x)$
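As a concrete illustration, the following minimal Python sketch (assuming NumPy, binary items, and no covariates, so $P_j$ does not depend on $x$) evaluates the model probability of a response pattern: the product over items expresses conditional independence within a class, and the sum over classes weights by the prevalences $P_j$. The function names `lca_pattern_prob` and `lca_loglik` are hypothetical, chosen for this example.

```python
import numpy as np

def lca_pattern_prob(y, P, pi):
    """Model probability f(y) of one binary response vector under a J-class LCA.

    y  : length-M sequence of 0/1 responses
    P  : length-J class prevalences (sum to 1)
    pi : M x J array, pi[m, j] = Pr(Y_m = 1 | class j)
    """
    y = np.asarray(y)[:, None]  # M x 1, broadcasts against the M x J array
    # Conditional independence: multiply item probabilities within each class
    f_given_class = np.prod(pi**y * (1 - pi) ** (1 - y), axis=0)  # length J
    # Mixture over classes, weighted by prevalence
    return float(np.dot(P, f_given_class))

def lca_loglik(Y, P, pi):
    """Log likelihood sum_i log f(y_i) for an n x M binary data matrix Y."""
    return sum(np.log(lca_pattern_prob(y, P, pi)) for y in Y)
```

Summing `lca_pattern_prob` over all $2^M$ response patterns should give 1, which is a quick sanity check on `P` and `pi`.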

Latent Variable Models: Latent Class Regression (LCR) Model
Model: $f_{Y|x}(y \mid x) = \sum_{j=1}^{J} P_j \prod_{m=1}^{M} \pi_{mj}^{y_m} (1 - \pi_{mj})^{1 - y_m}$
Measurement assumptions for $[Y_{im} \mid U_i]$:
- Conditional independence: $\{Y_{i1}, \dots, Y_{iM}\}$ are mutually independent conditional on $U_i$
- Reporting heterogeneity is unrelated to measured and unmeasured characteristics

Latent Variable Models: Latent Class Regression (LCR) Model
Model: $f_{Y|x}(y \mid x) = \sum_{j=1}^{J} P_j \prod_{m=1}^{M} \pi_{mj}^{y_m} (1 - \pi_{mj})^{1 - y_m}$
Measurement assumptions for $[Y_{im} \mid C_i]$:
- Conditional independence: $\{Y_{i1}, \dots, Y_{iM}\}$ are mutually independent conditional on $C_i$
- Reporting heterogeneity is unrelated to measured and unmeasured characteristics

Analysis of Underlying Subpopulations
Method: latent class analysis
- Seeks homogeneous subpopulations
- Features that characterize the latent groups:
  - prevalence in the overall population
  - proportion reporting each symptom
- Number of groups: the fewest needed to achieve homogeneity / conditional independence

Latent Class Analysis: Prediction
Of interest: $\Pr(C = j \mid Y = y)$, the posterior probability of class membership. Once the model is fit, this is a straightforward calculation:
$$\theta_{ij} = \Pr(C = j \mid Y = y) = \frac{\Pr(C = j, Y = y)}{\Pr(Y = y)} = \frac{P_j \prod_{m=1}^{M} \pi_{mj}^{y_m} (1 - \pi_{mj})^{1 - y_m}}{\sum_{k=1}^{J} P_k \prod_{m=1}^{M} \pi_{mk}^{y_m} (1 - \pi_{mk})^{1 - y_m}}$$
evaluated at $y = y_i$.
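A minimal NumPy sketch of this posterior calculation, for all individuals at once, follows. It works in log space and subtracts each row's maximum before exponentiating, a standard guard against underflow when $M$ is large; `posterior_class_probs` is a hypothetical helper name, and `P` and `pi` are assumed to be already estimated.

```python
import numpy as np

def posterior_class_probs(Y, P, pi):
    """theta[i, j] = Pr(C_i = j | Y_i = y_i) for an n x M binary matrix Y.

    P: length-J class prevalences; pi: M x J array with pi[m, j] = Pr(Y_m = 1 | C = j).
    """
    Y = np.asarray(Y, dtype=float)
    # log Pr(Y_i = y_i | C_i = j) under conditional independence: n x J
    log_f = Y @ np.log(pi) + (1 - Y) @ np.log(1 - pi)
    # Numerator: log Pr(C_i = j, Y_i = y_i) = log P_j + log Pr(Y_i = y_i | C_i = j)
    log_joint = log_f + np.log(P)
    # Denominator Pr(Y_i = y_i) is the row sum; subtract the row max first to avoid underflow
    log_joint -= log_joint.max(axis=1, keepdims=True)
    joint = np.exp(log_joint)
    return joint / joint.sum(axis=1, keepdims=True)
```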

Part II: Fitting

Estimation: Broad Strokes
- Maximum likelihood
  - EM algorithm
  - Simplex method (Dayton & Macready, 1988)
  - Possibly with weighting and a robust variance correction
- ML software
  - Specialty: Mplus, Latent GOLD
  - Stata: gllamm
  - SAS: macro
  - R: poLCA
- Bayesian: WinBUGS

Estimation Methods Other Than the EM Algorithm
- Bayesian MCMC methods (e.g., in WinBUGS)
  - A challenge: label switching
  - Reversible-jump methods
- Advantages: feasibility, philosophy
- Disadvantages:
  - Prior choice (high-dimensional; avoiding illogic)
  - Burn-in, run duration
  - May obscure identification problems

Estimation: Likelihood Maximization, E-M Algorithm
A process of averaging over missing data; in this case, the missing data are the class memberships.

Estimation: Likelihood Maximization, E-M Algorithm
Rationale: treat latent variables as missing data. Brief review:
- Complete data: $W = \{Y, U\}$, given $x$
- Complete-data log likelihood: $\ell_W(\phi \mid W) = \log F_{Y,U|x}(y, u \mid x; \phi)$, taken as a function of $\phi$
- Iterate between:
  - (k+1) E-step: evaluate $Q(\phi \mid \phi^{(k)}) = E_{U \mid Y, x}\left[\ell_W(\phi \mid W) \mid Y, x; \phi^{(k)}\right]$
  - (k+1) M-step: maximize $Q(\phi \mid \phi^{(k)})$ with respect to $\phi$
- Converges to a local likelihood maximum under regularity conditions (Dempster, Laird, and Rubin, JRSSB, 1977)

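Estimation: EM Example, Latent Class Model
Maximize the log likelihood subject to $\sum_j P_j = 1$, using a Lagrange multiplier $\psi$:
$$\max \; \mathcal{L} = \sum_{i=1}^{n} \log \left[ \sum_{j=1}^{J} P_j \prod_{m=1}^{M} \pi_{mj}^{y_{im}} (1 - \pi_{mj})^{1 - y_{im}} \right] + \psi \left( \sum_{j=1}^{J} P_j - 1 \right)$$
With $\theta_{ij}$ the posterior probability of class $j$ for person $i$:
$$\frac{\partial \mathcal{L}}{\partial \pi_{mj}} = \sum_{i=1}^{n} \theta_{ij} \frac{y_{im} - \pi_{mj}}{\pi_{mj}(1 - \pi_{mj})} = 0 \;\Rightarrow\; \pi_{mj} = \sum_{i=1}^{n} y_{im} \frac{\theta_{ij}}{\sum_{h=1}^{n} \theta_{hj}}$$
$$\frac{\partial \mathcal{L}}{\partial P_j} = \frac{1}{P_j} \sum_{i=1}^{n} \theta_{ij} - n = 0 \;\Rightarrow\; P_j = \frac{1}{n} \sum_{i=1}^{n} \theta_{ij}$$
(Here $\psi = -n$, obtained by multiplying $\frac{1}{P_j}\sum_i \theta_{ij} + \psi = 0$ by $P_j$ and summing over $j$.)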

EM Algorithm: Latent Class Model
A process of averaging over missing data; in this case, the missing data are the class memberships.
1. Choose a starting set of posterior probabilities.
2. Use them to estimate $P$ and $\pi$ (M-step).
3. Calculate the log likelihood.
4. Use the estimates of $P$ and $\pi$ to calculate posterior probabilities (E-step).
5. Repeat steps 2-4 until the log likelihood stops changing.
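The five steps can be sketched in Python as follows. This is an illustrative implementation assuming NumPy and binary items, not the algorithm as coded in Mplus, Latent GOLD, or poLCA. The M-step uses the closed-form updates $P_j = \frac{1}{n}\sum_i \theta_{ij}$ and $\pi_{mj} = \sum_i y_{im}\theta_{ij} / \sum_h \theta_{hj}$; the clipping constant, the tolerance, and the name `lca_em` are arbitrary choices of this sketch.

```python
import numpy as np

def lca_em(Y, J, max_iter=1000, tol=1e-8, seed=None):
    """EM for a J-class latent class model with binary items, following steps 1-5.

    Y: n x M matrix of 0/1 responses. Returns (P, pi, theta, loglik), where
    pi[m, j] = Pr(Y_m = 1 | class j) and theta[i, j] = Pr(C_i = j | y_i).
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    n, M = Y.shape
    # Step 1: random starting posterior probabilities (each row sums to 1)
    theta = rng.random((n, J))
    theta /= theta.sum(axis=1, keepdims=True)
    old_ll = -np.inf
    for _ in range(max_iter):
        # Step 2 (M-step): P_j = mean of theta_ij; pi_mj = theta-weighted mean of y_im
        P = theta.mean(axis=0)
        pi = (Y.T @ theta) / theta.sum(axis=0)
        pi = np.clip(pi, 1e-10, 1 - 1e-10)  # keep the logs below finite
        # Step 3: log likelihood via log Pr(C_i = j, Y_i = y_i) and log-sum-exp over j
        log_joint = Y @ np.log(pi) + (1 - Y) @ np.log(1 - pi) + np.log(P)
        row_max = log_joint.max(axis=1, keepdims=True)
        log_fy = row_max[:, 0] + np.log(np.exp(log_joint - row_max).sum(axis=1))
        ll = log_fy.sum()
        # Step 4 (E-step): posterior class probabilities at the current P, pi
        theta = np.exp(log_joint - log_fy[:, None])
        # Step 5: stop once the log likelihood stops changing
        if ll - old_ll < tol:
            break
        old_ll = ll
    return P, pi, theta, ll
```

Different `seed` values can converge to different log likelihoods, which is the local-maximum issue raised under Global and Local Maxima.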

Global and Local Maxima
Multiple starting values are very important: EM converges only to a local likelihood maximum, which need not be the global one.
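One common way to act on this, sketched below with a plain NumPy EM (the helper names `_em_once` and `lca_multistart` are hypothetical), is to run EM from many random starts, keep the fit with the largest log likelihood, and count how many starts reproduce that value. A best log likelihood reached by only one start is a warning sign that the search may not have found the global maximum.

```python
import numpy as np

def _em_once(Y, J, rng, max_iter=500, tol=1e-8):
    """One EM run from random starting posteriors; returns (loglik, P, pi)."""
    n, M = Y.shape
    theta = rng.random((n, J))
    theta /= theta.sum(axis=1, keepdims=True)
    old_ll = -np.inf
    for _ in range(max_iter):
        P = theta.mean(axis=0)  # M-step
        pi = np.clip((Y.T @ theta) / theta.sum(axis=0), 1e-10, 1 - 1e-10)
        log_joint = Y @ np.log(pi) + (1 - Y) @ np.log(1 - pi) + np.log(P)
        row_max = log_joint.max(axis=1, keepdims=True)
        log_fy = row_max[:, 0] + np.log(np.exp(log_joint - row_max).sum(axis=1))
        ll = log_fy.sum()  # log likelihood at (P, pi)
        theta = np.exp(log_joint - log_fy[:, None])  # E-step
        if ll - old_ll < tol:
            break
        old_ll = ll
    return ll, P, pi

def lca_multistart(Y, J, n_starts=20, seed=0, agree_tol=1e-4):
    """Run EM from n_starts random starts and keep the best log likelihood.

    Also returns how many starts reached that best value (within agree_tol);
    a best value found only once suggests searching with more starts.
    """
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    fits = [_em_once(Y, J, rng) for _ in range(n_starts)]
    best = max(fits, key=lambda fit: fit[0])
    n_hit = sum(abs(fit[0] - best[0]) < agree_tol for fit in fits)
    return best, n_hit
```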

Example: Frailty, Women's Health and Aging Studies (WHAS)
- Longitudinal cohort studies to investigate:
  - causes and course of physical and cognitive disability
  - physiological determinants of frailty
- Up to 7 rounds spanning 15 years
- Companion studies in the community, Baltimore, MD:
  - moderately disabled women 65+ years: n = 1002
  - mildly disabled women 70-79 years: n = 436
- This project: n = 786, aged 70-79 years at baseline
- Probability-weighted analyses
(Guralnik et al., NIA, 1995; Fried et al., J Gerontol, 2001)

Example: Latent Frailty Classes, Women's Health and Aging Study
Conditional probabilities ($\pi$) of meeting each criterion, and class prevalences ($P$):
Criterion | 2-class: Cl. 1 Non-frail | 2-class: Cl. 2 Frail | 3-class: Cl. 1 Robust | 3-class: Cl. 2 Intermediate | 3-class: Cl. 3 Frail
Weight loss | .073 | .26 | .072 | .11 | .54
Weakness | .088 | .51 | .029 | .26 | .77
Slowness | .15 | .70 | .004 | .45 | .85
Low physical activity | .078 | .51 | .000 | .28 | .70
Exhaustion | .061 | .34 | .027 | .16 | .56
Class prevalence, P (%) | 73.3 | 26.7 | 39.2 | 53.6 | 7.2
(Bandeen-Roche et al., J Gerontol, 2006)

Example: Latent Frailty Classes, Women's Health and Aging Study (continued)
Reading the table: in the 2-class model, we estimate that 26% of the frail subpopulation exhibit weight loss (conditional probability .26 for weight loss in Cl. 2, Frail).
(Bandeen-Roche et al., J Gerontol, 2006)
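To see how these published estimates translate into individual classification, the sketch below plugs the 2-class estimates from the table into the posterior-probability formula from Part I. The criteria patterns are hypothetical women, and the calculation ignores the probability weights used in the actual analysis, so it illustrates the arithmetic only; `prob_frail` is a hypothetical helper name.

```python
import numpy as np

# 2-class WHAS estimates transcribed from the table (Bandeen-Roche et al., 2006)
criteria = ["weight loss", "weakness", "slowness", "low physical activity", "exhaustion"]
P = np.array([0.733, 0.267])  # prevalences: class 1 non-frail, class 2 frail
pi = np.array([[0.073, 0.26],
               [0.088, 0.51],
               [0.15, 0.70],
               [0.078, 0.51],
               [0.061, 0.34]])  # pi[m, j] = Pr(criterion m met | class j)

def prob_frail(y):
    """Pr(frail class | pattern y of met (1) / unmet (0) criteria)."""
    y = np.asarray(y)[:, None]
    joint = P * np.prod(pi**y * (1 - pi) ** (1 - y), axis=0)  # Pr(C = j, Y = y)
    return joint[1] / joint.sum()

# Hypothetical women: none met, weight loss only, weakness + slowness, all five met
for y in ([0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 1, 0, 0], [1, 1, 1, 1, 1]):
    met = [name for name, v in zip(criteria, y) if v] or ["none"]
    print(met, round(float(prob_frail(y)), 3))
```

Meeting none of the five criteria gives a posterior probability of frailty of roughly 0.02, while meeting all five gives about 0.999.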

Part III: Evaluating Fit

Choosing the Number of Classes
- A priori theory
- Chi-square goodness of fit
- Entropy
- Information statistics: AIC, BIC, others
- Lo-Mendell-Rubin (LMR) test: not recommended (designed for normal Y)
- Bootstrapped likelihood ratio test

Entropy
Measures classification error: ranges from 0 (terrible) to 1 (perfect).
$$E = 1 - \frac{\sum_{i=1}^{N} \sum_{j=1}^{J} -\Pr(C_i = j \mid Y_i) \log \Pr(C_i = j \mid Y_i)}{N \log J}$$
(Dias & Vermunt, 2006)
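Given the posterior class probabilities $\Pr(C_i = j \mid Y_i)$ as an $N \times J$ array, the entropy formula takes a few lines of NumPy. This is a sketch; `relative_entropy` is a hypothetical name, and terms with zero probability are treated as $0 \log 0 = 0$.

```python
import numpy as np

def relative_entropy(theta):
    """Entropy E in [0, 1] from an N x J array theta[i, j] = Pr(C_i = j | Y_i)."""
    theta = np.asarray(theta, dtype=float)
    N, J = theta.shape
    # theta * log(theta), with the convention 0 * log 0 = 0
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(theta > 0, theta * np.log(theta), 0.0)
    # The numerator of the formula is -sum(plogp), so E = 1 + sum(plogp) / (N log J)
    return 1.0 + plogp.sum() / (N * np.log(J))
```

Certain assignments (every row a 0/1 indicator) give 1, and uniform posteriors (every row $1/J$) give 0.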

Information Statistics
$s$ = number of parameters; $N$ = sample size; smaller values are better.
- AIC $= -2LL + 2s$
- BIC $= -2LL + s \log N$
BIC is typically recommended:
- Theory: consistent for selection within the model family
- Nylund et al., Struct Eq Modeling, 2007
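A small sketch of the two criteria, assuming the unconstrained binary LCA so that $s = J - 1 + JM$ ($J - 1$ free prevalences plus $JM$ conditional probabilities). The function names are hypothetical; `loglik` would come from whatever software fit the model.

```python
import math

def n_params_binary_lca(M, J):
    """s for an unconstrained J-class model with M binary items:
    J - 1 free prevalences (they sum to 1) plus J * M conditional probabilities."""
    return (J - 1) + J * M

def aic(loglik, s):
    """Akaike information criterion: -2LL + 2s (smaller is better)."""
    return -2.0 * loglik + 2 * s

def bic(loglik, s, n):
    """Bayesian information criterion: -2LL + s * log(n) (smaller is better)."""
    return -2.0 * loglik + s * math.log(n)
```

With fitted log likelihoods for $J = 1, 2, 3, \dots$ in hand, one would compute `bic(ll_J, n_params_binary_lca(M, J), N)` for each and prefer the smallest.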

Likelihood Ratio Tests
LCA models with different numbers of classes are not nested in a way that permits a direct LRT. Rather, use an LRT to compare a given model with the saturated model.
- LCA parameters (binary case): $J - 1$ prevalences $P$ (they sum to 1) plus $J \cdot M$ conditional probabilities $\pi$ ($M$ items $\times$ $J$ classes)
- Saturated model parameters: $2^M - 1$
- Goodness-of-fit df: $(2^M - 1) - (J - 1 + JM) = 2^M - J(M+1)$
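The degrees-of-freedom formula and the two usual goodness-of-fit statistics against the saturated model (Pearson $X^2$ and the likelihood-ratio $G^2$) can be sketched as follows, assuming NumPy, binary items, and fitted `P` and `pi`; the function names are hypothetical. A p-value would then come from a chi-square distribution with `gof_df(M, J)` degrees of freedom (for instance via `scipy.stats.chi2.sf`), an approximation that degrades when many response-pattern cells have small expected counts.

```python
import itertools
import numpy as np

def gof_df(M, J):
    """df versus the saturated model: (2**M - 1) - (J - 1 + J*M) = 2**M - J*(M + 1)."""
    return 2**M - J * (M + 1)

def gof_statistics(Y, P, pi):
    """Pearson X^2 and likelihood-ratio G^2 comparing observed response-pattern
    counts with those expected under a fitted LCA (P: length J, pi: M x J)."""
    Y = np.asarray(Y, dtype=int)
    n, M = Y.shape
    patterns = np.array(list(itertools.product([0, 1], repeat=M)))  # 2**M x M
    # Observed count of each pattern; the first item is the most significant bit,
    # matching the order produced by itertools.product
    codes = Y @ (2 ** np.arange(M - 1, -1, -1))
    observed = np.bincount(codes, minlength=2**M)
    # Expected count n * f(y) for each pattern, under conditional independence
    f_given_class = np.exp(patterns @ np.log(pi) + (1 - patterns) @ np.log(1 - pi))
    expected = n * (f_given_class @ P)
    x2 = np.sum((observed - expected) ** 2 / expected)
    nz = observed > 0  # empty cells contribute 0 * log 0 = 0 to G^2
    g2 = 2.0 * np.sum(observed[nz] * np.log(observed[nz] / expected[nz]))
    return x2, g2
```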

Bootstrapped Likelihood Ratio Test
In the absence of knowledge about the theoretical distribution of the difference in $-2LL$, one can construct an empirical distribution from the data. In Nylund's (2006) simulation studies, this test performed best.
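A parametric version of the bootstrapped LRT for $J$ versus $J+1$ classes is sketched below: fit both models, simulate data sets from the fitted $J$-class model, refit both models to each, and locate the observed $-2\Delta LL$ in the simulated distribution. This is a hedged illustration assuming NumPy and binary items; the helper names, the small number of random starts, and the $(1 + \text{exceedances})/(B + 1)$ p-value convention are choices of this sketch, not the procedure of any particular package.

```python
import numpy as np

def fit_lca(Y, J, rng, n_starts=5, max_iter=500, tol=1e-7):
    """Best-of-n_starts EM fit of a J-class binary LCA; returns (loglik, P, pi)."""
    n, M = Y.shape
    best = None
    for _ in range(n_starts):
        theta = rng.random((n, J))
        theta /= theta.sum(axis=1, keepdims=True)
        old_ll = -np.inf
        for _ in range(max_iter):
            P = theta.mean(axis=0)
            pi = np.clip((Y.T @ theta) / theta.sum(axis=0), 1e-10, 1 - 1e-10)
            log_joint = Y @ np.log(pi) + (1 - Y) @ np.log(1 - pi) + np.log(P)
            row_max = log_joint.max(axis=1, keepdims=True)
            log_fy = row_max[:, 0] + np.log(np.exp(log_joint - row_max).sum(axis=1))
            ll = log_fy.sum()
            theta = np.exp(log_joint - log_fy[:, None])
            if ll - old_ll < tol:
                break
            old_ll = ll
        if best is None or ll > best[0]:
            best = (ll, P, pi)
    return best

def simulate_lca(n, P, pi, rng):
    """Draw n binary response vectors from an LCA with prevalences P and M x J array pi."""
    classes = rng.choice(len(P), size=n, p=P)
    return (rng.random((n, pi.shape[0])) < pi[:, classes].T).astype(float)

def bootstrap_lrt(Y, J, B=99, seed=0):
    """Parametric bootstrap LRT of J versus J + 1 classes; returns (statistic, p-value)."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    ll_J, P_J, pi_J = fit_lca(Y, J, rng)
    ll_J1 = fit_lca(Y, J + 1, rng)[0]
    observed = -2.0 * (ll_J - ll_J1)
    null_stats = []
    for _ in range(B):
        # Simulate under the null (J-class) model, then refit both models
        Yb = simulate_lca(len(Y), P_J, pi_J, rng)
        null_stats.append(-2.0 * (fit_lca(Yb, J, rng)[0] - fit_lca(Yb, J + 1, rng)[0]))
    # Share of bootstrap statistics at least as large as the observed one
    p_value = (1 + sum(s >= observed for s in null_stats)) / (B + 1)
    return observed, p_value
```

Real analyses typically use more random starts and more bootstrap replicates than these defaults; if the refits in each replicate land on local maxima, the reference distribution is distorted.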

Example: Frailty Construct Validation, Women's Health and Aging Studies
Internal convergent validity: the criteria manifest as a syndrome.
Syndrome: "a group of signs and symptoms that occur together and characterize a particular abnormality" (Merriam-Webster Medical Dictionary)

Validation: Frailty as a Syndrome
Method: latent class analysis. If the criteria characterize a syndrome, we expect:
- At least two groups (otherwise, no co-occurrence)
- No subgrouping of symptoms (otherwise, more than one abnormality is characterized)

Conditional Probabilities of Meeting Criteria in Latent Frailty Classes (WHAS)
Criterion | 2-class: Cl. 1 Non-frail | 2-class: Cl. 2 Frail | 3-class: Cl. 1 Robust | 3-class: Cl. 2 Intermediate | 3-class: Cl. 3 Frail
Weight loss | .073 | .26 | .072 | .11 | .54
Weakness | .088 | .51 | .029 | .26 | .77
Slowness | .15 | .70 | .004 | .45 | .85
Low physical activity | .078 | .51 | .000 | .28 | .70
Exhaustion | .061 | .34 | .027 | .16 | .56
Class prevalence (%) | 73.3 | 26.7 | 39.2 | 53.6 | 7.2
(Bandeen-Roche et al., J Gerontol, 2006)

Results: Frailty Syndrome Validation
Data: Women's Health and Aging Study
- Single-population (1-class) model fit: inadequate
- Two-population (2-class) model fit: good (Pearson $\chi^2$ p-value = .22; minimized AIC and BIC)
- Prevalence of the frailty criteria rises stepwise across classes; no subclustering
- Syndromic manifestation well indicated

Example: Residual Checking, Frailty Construct

Part IV: Identifiability / Estimability

Identifiability
Rough idea of non-identifiability: more unknowns than there are (independent) equations to solve for them.
Definition: Consider a family of distributions $\mathcal{F}_\Phi = \{F(y; \phi) : \phi \in \Phi\}$. The parameter $\phi \in \Phi$ is (globally) identifiable iff there is no $\phi^* \in \Phi$, $\phi^* \neq \phi$, such that $F(y; \phi) = F(y; \phi^*)$ a.e.

Identifiability: Related Concepts
- Local identifiability. Basic idea: $\phi$ is identified within a neighborhood. Definition: $F$ is locally identifiable at $\phi = \phi_0$ if there exists a neighborhood $\tau \subset \Phi$ about $\phi_0$ such that $F(y; \phi_0) \neq F(y; \phi)$ for all $\phi \in \tau$, $\phi \neq \phi_0$.
- Estimability (empirical identifiability): the information matrix for $\phi$ given $y_1, \dots, y_n$ is non-singular.

Identifiability: Latent Class Analysis (Binary Y, Measurement Only)
- Parameter dimension of the saturated model: $2^M - 1$
- Unconstrained J-class model: $J - 1 + J \cdot M$ parameters
- Need $2^M \geq J(M+1)$ (necessary, not sufficient)
- Local identifiability: evaluate the Jacobian of the likelihood function (Goodman, 1974)
- Estimability: avoid fewer than 10 allocations per cell on average, i.e., $n > 10 \cdot 2^M$ (rule of thumb)
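Goodman's local-identifiability check can be sketched numerically: build the Jacobian of the response-pattern probabilities with respect to the $J - 1 + JM$ free parameters at a given parameter value, and test whether it has full column rank. The sketch below (assuming NumPy; `lca_jacobian` and `locally_identified` are hypothetical names) uses analytic derivatives with $P_J = 1 - \sum_{j<J} P_j$. It includes all $2^M$ patterns, whose probabilities sum to 1, so the rank is at most $2^M - 1$; the rank tolerance is an arbitrary choice.

```python
import itertools
import numpy as np

def lca_jacobian(P, pi):
    """Jacobian of pattern probabilities p(y) w.r.t. (P_1..P_{J-1}, pi_11..pi_MJ).

    Columns: J - 1 prevalence derivatives, then pi in (m, j) row-major order.
    """
    M, J = pi.shape
    patterns = np.array(list(itertools.product([0, 1], repeat=M)))       # 2**M x M
    f = np.exp(patterns @ np.log(pi) + (1 - patterns) @ np.log(1 - pi))  # Pr(y | class j)
    # dp/dP_j = f_j(y) - f_J(y), because P_J = 1 - sum of the other prevalences
    cols = [f[:, j] - f[:, J - 1] for j in range(J - 1)]
    for m in range(M):
        for j in range(J):
            # d f_j / d pi_mj = f_j(y) * (y_m / pi_mj - (1 - y_m) / (1 - pi_mj))
            score = patterns[:, m] / pi[m, j] - (1 - patterns[:, m]) / (1 - pi[m, j])
            cols.append(P[j] * f[:, j] * score)
    return np.column_stack(cols)

def locally_identified(P, pi, tol=1e-8):
    """Return (full column rank?, rank, number of free parameters) at (P, pi)."""
    jac = lca_jacobian(np.asarray(P, dtype=float), np.asarray(pi, dtype=float))
    rank = np.linalg.matrix_rank(jac, tol=tol)
    return rank == jac.shape[1], rank, jac.shape[1]
```

Full rank at a randomly chosen interior point indicates local identifiability there. A rank deficit, as with $M = 2$ items and $J = 2$ classes (5 parameters, at most 3 independent pattern probabilities), shows the model cannot be identified.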

Identifiability / Estimability: Latent Class Analysis, Frailty Example
- Need $2^M \geq J(M+1)$ (necessary, not sufficient). With $M = 5$, $J = 3$: $32 \geq 3(5+1) = 18$. YES. By this criterion, we could fit up to 5 classes ($6J \leq 32$ requires $J \leq 5$).
- Local identifiability: evaluate the Jacobian of the likelihood function (Goodman, 1974)
- Estimability: $n > 10 \cdot 2^M = 10 \cdot 2^5 = 320$; here $n = 786$. YES
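The two rules of thumb for the frailty example can be checked with a few lines of Python; $M = 5$ criteria and $n = 786$ come from the slides, and `passes_counting_rule` is a hypothetical helper.

```python
# Frailty example: M = 5 criteria, n = 786 women (from the slides)
M, n = 5, 786

def passes_counting_rule(M, J):
    """Necessary, not sufficient: 2**M >= J * (M + 1), i.e. goodness-of-fit df >= 0."""
    return 2**M >= J * (M + 1)

max_J = max(J for J in range(1, 2**M + 1) if passes_counting_rule(M, J))
print("J = 3 passes:", passes_counting_rule(M, 3))
print("largest J passing the counting rule:", max_J)
print("n > 10 * 2**M:", n > 10 * 2**M, "threshold =", 10 * 2**M)
```

Here `max_J` evaluates to 5, since $6J \le 32$ fails from $J = 6$ on.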
