Latent Class Analysis


1 Latent Class Analysis Karen Bandeen-Roche October 27, 2016

2 Objectives
For you to leave here knowing
- When is the latent class analysis (LCA) model useful?
- What is the LCA model, and what are its underlying assumptions?
- How are LCA parameters interpreted?
- How are LCA parameters commonly estimated?
- How is LCA fit adjudicated?
- What are considerations for identifiability / estimability?

3 Motivating Example: Frailty of Older Adults
"... the sixth age shifts into the lean and slipper'd pantaloon, with spectacles on nose and pouch on side, his youthful hose well sav'd, a world too wide, for his shrunk shank ..." -- Shakespeare, As You Like It

4 The Frailty Construct Fried et al., J Gerontol 2001; Bandeen-Roche et al., J Gerontol, 2006

5 Frailty as a latent variable
- Underlying: status or degree of syndrome
- Surrogates: Fried et al. (2001) criteria
  - weight loss above threshold
  - low energy expenditure
  - low walking speed
  - weakness beyond threshold
  - exhaustion

6 Part I: Model

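7 Latent class model
[Path diagram: latent frailty η (structural part) drives the observed indicators Y_1, ..., Y_m, each with its own error ε_1, ..., ε_m (measurement part).]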

8 Well-used latent variable models

Latent variable scale | Observed variable scale: Continuous | Observed variable scale: Discrete
Continuous            | Factor analysis, LISREL             | Discrete FA, IRT (item response)
Discrete              | Latent profile, Growth mixture      | Latent class analysis, regression

General software: Mplus, Latent Gold, WinBUGS (Bayesian), NLMIXED (SAS)

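9 Analysis of underlying subpopulations: Latent class analysis
[Diagram: the POPULATION splits by latent class U_i into J subpopulations with prevalences P_1, ..., P_J; within each class, the items Y_1, ..., Y_M have class-specific response probabilities π (one per item and class).]
Lazarsfeld & Henry, Latent Structure Analysis, 1968; Goodman, Biometrika, 1974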

10 Latent Variables: What?
Integrands in a hierarchical model
- Observed variables ($i = 1, \ldots, n$): $Y_i$ = M-variate; $x_i$ = P-variate
- Focus: response (Y) distribution $G_{Y|x}(y \mid x)$; x-dependence
- Model:
  - $Y_i$ generated from latent (underlying) $U_i$: $F_{Y|U,x}(y \mid U = u, x; \pi)$ (Measurement)
  - Focus on distribution, regression re $U_i$: $F_{U|x}(u \mid x; \beta)$ (Structural)
- Overall, hierarchical model $F_{Y|x}$ for $G_{Y|x}$:
$$F_{Y|x}(y \mid x) = \int F_{Y|U,x}(y \mid U = u, x)\, dF_{U|x}(u \mid x)$$

11 Latent Variable Models: Latent Class Regression (LCR) Model
Model:
$$f_{Y|x}(y \mid x) = \sum_{j=1}^{J} P_j \prod_{m=1}^{M} \pi_{mj}^{y_m} (1 - \pi_{mj})^{1 - y_m}$$
Structural model: $[U_i \mid x_i] = \Pr\{U_i = j\} = \Pr\{\eta_i = j\} = P_j$, $j = 1, \ldots, J$
Measurement model: $[Y_i \mid U_i]$ = conditional probabilities; $\pi$ is $M \times J$ with
$$\pi_{mj} = \Pr\{Y_{im} = 1 \mid U_i = j\} = \Pr\{Y_{im} = 1 \mid \eta_i = j\}$$
Compare to general form:
$$F_{Y|x}(y \mid x) = \int F_{Y|U,x}(y \mid U = u, x)\, dF_{U|x}(u \mid x)$$

12 Latent Variable Models: Latent Class Regression (LCR) Model
Model:
$$f_{Y|x}(y \mid x) = \sum_{j=1}^{J} P_j \prod_{m=1}^{M} \pi_{mj}^{y_m} (1 - \pi_{mj})^{1 - y_m}$$
Measurement assumptions, $[Y_i \mid U_i]$: conditional independence
- $\{Y_{i1}, \ldots, Y_{iM}\}$ mutually independent conditional on $U_i$
- Reporting heterogeneity unrelated to measured, unmeasured characteristics

13 Latent Variable Models: Latent Class Regression (LCR) Model
Model:
$$f_{Y|x}(y \mid x) = \sum_{j=1}^{J} P_j \prod_{m=1}^{M} \pi_{mj}^{y_m} (1 - \pi_{mj})^{1 - y_m}$$
Measurement assumptions, $[Y_i \mid C_i]$: conditional independence
- $\{Y_{i1}, \ldots, Y_{iM}\}$ mutually independent conditional on $C_i$
- Reporting heterogeneity unrelated to measured, unmeasured characteristics
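The mixture form of the model can be checked numerically. The sketch below is only an illustration: the function name `pattern_probabilities` and the two-class, three-item parameter values are assumptions made here, not quantities from the slides. It enumerates all binary response patterns, applies conditional independence within each class, and mixes over classes.

```python
import itertools
import numpy as np

def pattern_probabilities(P, pi):
    """Marginal probability of every binary response pattern under an LCA model.

    P  : (J,)   class prevalences P_j, summing to 1
    pi : (M, J) conditional probabilities, pi[m, j] = Pr(Y_m = 1 | class j)
    Returns (patterns, probs): patterns is (2**M, M), probs is (2**M,).
    """
    P = np.asarray(P, dtype=float)
    pi = np.asarray(pi, dtype=float)
    M = pi.shape[0]
    patterns = np.array(list(itertools.product([0, 1], repeat=M)))
    # Conditional independence: Pr(y | class j) is a product over the M items.
    # cond[r, j] = prod_m pi[m, j]**y_rm * (1 - pi[m, j])**(1 - y_rm)
    cond = np.prod(
        np.where(patterns[:, :, None] == 1, pi[None, :, :], 1.0 - pi[None, :, :]),
        axis=1,
    )
    # Mix over classes: f(y) = sum_j P_j * Pr(y | class j)
    return patterns, cond @ P

# Hypothetical parameter values, for illustration only.
P = [0.7, 0.3]
pi = [[0.1, 0.6],
      [0.2, 0.7],
      [0.1, 0.8]]
patterns, probs = pattern_probabilities(P, pi)
```

Because the pattern probabilities form a distribution over the 2^M cells, `probs.sum()` should equal 1 up to rounding, which is a quick sanity check on any set of parameter values.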

14 Analysis of underlying subpopulations
Method: Latent class analysis
- Seeks homogeneous subpopulations
- Features that characterize latent groups
  - Prevalence in overall population
  - Proportion reporting each symptom
- Number of classes = the fewest needed to achieve homogeneity / conditional independence

15 Latent class analysis: Prediction
Of interest: $\Pr(C = j \mid Y = y)$ = posterior probability of class membership
Once the model is fit, a straightforward calculation:
$$\Pr(C = j \mid Y = y) = \frac{\Pr(C = j, Y = y)}{\Pr(Y = y)} = \frac{P_j \prod_{m=1}^{M} \pi_{mj}^{y_m} (1 - \pi_{mj})^{1 - y_m}}{\sum_{k=1}^{J} P_k \prod_{m=1}^{M} \pi_{mk}^{y_m} (1 - \pi_{mk})^{1 - y_m}}$$
$= \theta_{ij}$ when evaluated at $y_i$
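The posterior calculation is a direct application of Bayes' rule and can be sketched in a few lines. The function name `posterior_class_probs` and the parameter values are hypothetical; the sketch also assumes every π lies strictly between 0 and 1 so that the logarithms are finite.

```python
import numpy as np

def posterior_class_probs(Y, P, pi):
    """theta[i, j] = Pr(C = j | Y = y_i) for binary data Y of shape (n, M)."""
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    pi = np.asarray(pi, dtype=float)   # (M, J), assumed strictly inside (0, 1)
    # log Pr(y_i | C = j) = sum_m [ y_im log pi_mj + (1 - y_im) log(1 - pi_mj) ]
    log_cond = Y @ np.log(pi) + (1.0 - Y) @ np.log(1.0 - pi)
    log_joint = np.log(P) + log_cond                    # log Pr(C = j, Y = y_i)
    log_joint -= log_joint.max(axis=1, keepdims=True)   # stabilise the exponentials
    joint = np.exp(log_joint)
    return joint / joint.sum(axis=1, keepdims=True)     # divide by Pr(Y = y_i)

# Hypothetical two-class, three-item parameter values (illustration only).
P = [0.7, 0.3]
pi = [[0.1, 0.6],
      [0.2, 0.7],
      [0.1, 0.8]]
theta = posterior_class_probs([[0, 0, 0], [1, 1, 1]], P, pi)
```

Each row of `theta` sums to 1; under these illustrative values a respondent meeting no criteria is assigned almost entirely to the first class, and one meeting all three almost entirely to the second.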

16 Part II: Fitting

17 Estimation: Broad Strokes
- Maximum likelihood
  - EM algorithm
  - Simplex method (Dayton & Macready, 1988)
  - Possibly with weighting, robust variance correction
- ML software
  - Specialty: Mplus, Latent Gold
  - Stata: gllamm
  - SAS: macro
  - R: poLCA
- Bayesian: WinBUGS

18 Estimation: Methods other than EM algorithm
- Bayesian MCMC methods (e.g. per WinBUGS)
  - A challenge: label-switching
  - Reversible-jump methods
- Advantages: feasibility, philosophy
- Disadvantages
  - Prior choice (high-dimensional; avoiding illogic)
  - Burn-in, duration
  - May obscure identification problems

19 Estimation: Likelihood maximization: E-M algorithm
A process of averaging over missing data; in this case, the missing data are class membership.

20 Estimation: Likelihood maximization: E-M algorithm
- Rationale: LVs as missing data
- Brief review
  - Complete data: $W = \{Y, x, u\}$
  - Complete-data log likelihood: $\ell_w(\phi \mid w) = \log F_{Y,U|x}(y, u \mid x; \phi)$, taken as a function of $\phi$
  - Iterate between
    - (k+1) E-step: evaluate $Q(\phi \mid \phi^{(k)}) = E_{u|y,x}\left[\ell_w(\phi \mid W) \mid y, x; \phi^{(k)}\right]$
    - (k+1) M-step: maximize $Q(\phi \mid \phi^{(k)})$ wrt $\phi$
  - Convergence to a local likelihood maximum under regularity
Dempster, Laird, and Rubin, JRSSB, 1977

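21 Estimation: EM example: Latent Class Model
$$\max\ \mathcal{L} = \sum_{i=1}^{n} \sum_{j=1}^{J} \theta_{ij} \log\Big\{ P_j \prod_{m=1}^{M} \pi_{mj}^{y_{im}} (1 - \pi_{mj})^{1 - y_{im}} \Big\} + \psi \Big( 1 - \sum_{j=1}^{J} P_j \Big)$$
$$\frac{\partial \mathcal{L}}{\partial \pi_{mj}}:\ \sum_{i=1}^{n} \theta_{ij}\, \frac{y_{im} - \pi_{mj}}{\pi_{mj}(1 - \pi_{mj})} = 0 \ \Rightarrow\ \pi_{mj} = \frac{\sum_{i=1}^{n} \theta_{ij}\, y_{im}}{\sum_{h=1}^{n} \theta_{hj}}$$
$$\frac{\partial \mathcal{L}}{\partial P_j}:\ \sum_{i=1}^{n} \theta_{ij} - P_j\, n = 0 \ \Rightarrow\ P_j = \frac{1}{n} \sum_{i=1}^{n} \theta_{ij}$$
(The constraint $\sum_j P_j = 1$ gives $\psi = n$.)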

22 EM-Algorithm: Latent class model
A process of averaging over missing data; in this case, the missing data are class membership.
1. Choose starting set of posterior probabilities
2. Use them to estimate P and π (M-step)
3. Calculate log likelihood (LL)
4. Use estimates of P and π to calculate posterior probabilities (E-step)
5. Repeat 2-4 until LL stops changing.

23 Global and Local Maxima Multiple starting values very important!
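The five EM steps, together with the advice to use multiple starting values, can be sketched as follows. This is a minimal illustration, not the implementation used by any of the packages named earlier: the function names, the clipping constants, the convergence tolerance, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def fit_lca_em(Y, J, seed=0, max_iter=500, tol=1e-8):
    """One EM run for a J-class model on binary data Y (n, M); returns (P, pi, loglik)."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    n, M = Y.shape
    # Step 1: random starting posterior probabilities (each row sums to 1).
    theta = rng.dirichlet(np.ones(J), size=n)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # Step 2 (M-step): weighted proportions, clipped to keep the logs finite.
        P = np.clip(theta.mean(axis=0), 1e-12, None)
        pi = np.clip((Y.T @ theta) / np.maximum(theta.sum(axis=0), 1e-12),
                     1e-6, 1 - 1e-6)
        # Step 3: observed-data log likelihood at the current estimates.
        log_joint = np.log(P) + Y @ np.log(pi) + (1 - Y) @ np.log(1 - pi)
        log_marg = np.logaddexp.reduce(log_joint, axis=1)
        ll = log_marg.sum()
        # Step 4 (E-step): update posterior class membership probabilities.
        theta = np.exp(log_joint - log_marg[:, None])
        # Step 5: stop once the log likelihood stops changing.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return P, pi, ll

def fit_lca_multistart(Y, J, n_starts=20):
    """Run EM from several random starts and keep the highest log likelihood."""
    runs = [fit_lca_em(Y, J, seed=s) for s in range(n_starts)]
    return max(runs, key=lambda run: run[2])

# Simulated data from hypothetical parameters (not the frailty data).
rng = np.random.default_rng(1)
true_P = np.array([0.7, 0.3])
true_pi = np.array([[0.10, 0.70], [0.10, 0.60], [0.20, 0.80],
                    [0.10, 0.70], [0.15, 0.75]])
C = rng.choice(2, size=2000, p=true_P)
Y = (rng.random((2000, 5)) < true_pi[:, C].T).astype(int)
P_hat, pi_hat, ll_hat = fit_lca_multistart(Y, J=2)
```

Class labels are arbitrary, so the estimated classes may come back in either order. Comparing the log likelihoods reached from different seeds is a simple way to see whether some starts end in inferior local maxima.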

24 Example: Frailty
Women's Health & Aging Studies
- Longitudinal cohort studies to investigate
  - Causes / course of physical and cognitive disability
  - Physiological determinants of frailty
- Up to 7 rounds spanning 15 years
- Companion studies in community, Baltimore, MD
  - moderately disabled women 65+ years: n=1002
  - mildly disabled women [...] years: n=436
- This project: n=786, age [...] years at baseline
- Probability-weighted analyses
Guralnik et al., NIA, 1995; Fried et al., J Gerontol, 2001

25 Example: Latent Frailty Classes
Women's Health and Aging Study
Conditional probabilities (π) of meeting each criterion, by latent class

Criterion                | 2-Class: CL. 1 NON-FRAIL | 2-Class: CL. 2 FRAIL | 3-Class: CL. 1 ROBUST | 3-Class: CL. 2 INTERMED. | 3-Class: CL. 3 FRAIL
Weight Loss              | .073                     | .26                  | .072                  | .11                      | .54
Weakness                 | .088                     | .51                  | .029                  | .26                      | .77
Slowness                 | .15                      | .70                  | .004                  | .45                      | .85
Low Physical Activity    | .078                     | .51                  | .000                  | .28                      | [...]
Exhaustion               | [...]                    | [...]                | [...]                 | [...]                    | [...]
Class Prevalence (P) (%) | [...]                    | [...]                | [...]                 | [...]                    | [...]

Bandeen-Roche et al., J Gerontol, 2006

26 Example: Latent Frailty Classes
Women's Health and Aging Study
Reading the table: we estimate that 26% in the frail subpopulation (2-class model) exhibit weight loss.
Bandeen-Roche et al., J Gerontol, 2006

27 Part III: Evaluating Fit

28 Choosing the Number of Classes
- A priori theory
- Chi-square goodness of fit
- Entropy
- Information statistics: AIC, BIC, others
- Lo-Mendell-Rubin (LMR): not recommended (designed for normal Y)
- Bootstrapped likelihood ratio test

29 Entropy
Measures classification error: 0 = terrible, 1 = perfect
$$E = 1 - \frac{\sum_{i=1}^{N} \sum_{j=1}^{J} -\Pr(C_i = j \mid Y_i)\, \log \Pr(C_i = j \mid Y_i)}{N \log(J)}$$
Dias & Vermunt (2006)
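As a rough sketch, the entropy measure can be computed directly from the matrix of posterior class probabilities. The function name `relative_entropy` is an assumption for this example, and the code treats 0 * log(0) as 0 and requires at least two classes.

```python
import numpy as np

def relative_entropy(theta):
    """Relative entropy E in [0, 1] from posterior probabilities theta of shape (N, J), J >= 2."""
    theta = np.asarray(theta, dtype=float)
    N, J = theta.shape
    # Treat 0 * log(0) as 0 so that perfectly classified rows contribute nothing.
    safe = np.where(theta > 0, theta, 1.0)
    total = -(theta * np.log(safe)).sum()
    return 1.0 - total / (N * np.log(J))

perfect = relative_entropy([[1.0, 0.0], [0.0, 1.0]])    # every subject classified with certainty
useless = relative_entropy([[0.5, 0.5], [0.5, 0.5]])    # posteriors carry no information
```

The two extreme cases reproduce the endpoints of the scale: certain classification gives 1 and uniform posteriors give 0.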

30 Information Statistics
s = # of parameters; N = sample size; smaller values are better
- AIC: -2LL + 2s
- BIC: -2LL + s*log(N)
BIC is typically recommended
- Theory: consistent for selection in model family
- Nylund et al, Struct Eq Modeling, 2007
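A small worked example may help with the arithmetic. The log likelihood values below are invented purely for illustration; the parameter count s = (J-1) + J*M assumes an unconstrained model with M binary items, and N = 786 is borrowed from the frailty example only to make the numbers concrete.

```python
import math

def information_criteria(loglik, s, N):
    """Return (AIC, BIC) for a model with log likelihood loglik, s parameters, N subjects."""
    aic = -2.0 * loglik + 2.0 * s
    bic = -2.0 * loglik + s * math.log(N)
    return aic, bic

# Hypothetical log likelihoods for 1-, 2- and 3-class fits to M = 5 binary items.
N, M = 786, 5
fits = {1: -2100.0, 2: -1980.0, 3: -1975.0}
table = {J: information_criteria(ll, (J - 1) + J * M, N) for J, ll in fits.items()}
best_by_bic = min(table, key=lambda J: table[J][1])   # smaller is better
```

With these made-up values, the third class improves the log likelihood by too little to offset its six extra parameters, so both criteria favor the two-class model.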

31 Likelihood Ratio Tests
LCA models with different # of classes are NOT nested appropriately for a direct LRT.
Rather: LRT to compare a given model to the saturated model
- LCA df (binary case): $(J - 1) + J \cdot M$
  - $J - 1$: P parameters (sum to 1)
  - $J \cdot M$: π parameters (M items * J classes)
- Saturated df: $2^M - 1$
- Goodness of fit df: $2^M - J(M + 1)$
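The comparison to the saturated model can be sketched by tabulating observed and model-expected counts over the 2^M response patterns. The function name `goodness_of_fit` and the toy data are assumptions for this example; the sketch returns the Pearson and likelihood-ratio statistics with the degrees of freedom and leaves the p-value lookup to the reader's preferred chi-square routine.

```python
import itertools
import numpy as np

def goodness_of_fit(Y, P, pi):
    """Pearson X^2, likelihood-ratio G^2 and df for a fitted binary-item LCA model."""
    Y = np.asarray(Y, dtype=int)
    P = np.asarray(P, dtype=float)
    pi = np.asarray(pi, dtype=float)          # (M, J)
    n, M = Y.shape
    J = P.size
    patterns = np.array(list(itertools.product([0, 1], repeat=M)))
    # Observed count of each response pattern (first item is the most significant bit,
    # matching the row order of `patterns`).
    codes = Y @ (2 ** np.arange(M - 1, -1, -1))
    observed = np.bincount(codes, minlength=2 ** M)
    # Expected counts from the model: n * sum_j P_j * prod_m Pr(y_m | class j).
    cond = np.prod(np.where(patterns[:, :, None] == 1, pi, 1.0 - pi), axis=1)
    expected = n * (cond @ P)
    x2 = ((observed - expected) ** 2 / expected).sum()
    nz = observed > 0                          # empty cells contribute 0 to G^2
    g2 = 2.0 * (observed[nz] * np.log(observed[nz] / expected[nz])).sum()
    df = 2 ** M - J * (M + 1)
    return x2, g2, df

# Toy check: a one-class model with pi = 0.5 fits four equally frequent patterns exactly.
x2, g2, df = goodness_of_fit([[0, 0], [0, 1], [1, 0], [1, 1]], [1.0], [[0.5], [0.5]])
```

When many cells are sparse, the chi-square reference distribution for either statistic becomes unreliable, which is one reason the deck also lists information criteria and the bootstrap.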

32 Bootstrapped Likelihood Ratio Test
In the absence of knowledge about the theoretical distribution of the difference in -2LL, one can construct an empirical distribution from the data.
Per Nylund (2006) simulation studies, performs best.
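One common way to build that empirical distribution is a parametric bootstrap: fit the smaller model, simulate data sets from it, refit both models to each, and compare the observed difference in -2LL with the simulated ones. The sketch below assumes that variant; the function names, the short fixed-length EM runs, and the small number of replicates are all choices made for illustration and would be too crude for real analyses.

```python
import numpy as np

def fit_lca(Y, J, rng, n_starts=3, n_iter=100):
    """Best of several short EM runs on binary Y (n, M); returns (loglik, P, pi)."""
    Y = np.asarray(Y, dtype=float)
    best = None
    for _start in range(n_starts):
        theta = rng.dirichlet(np.ones(J), size=Y.shape[0])
        for _it in range(n_iter):
            P = np.clip(theta.mean(axis=0), 1e-12, None)
            pi = np.clip((Y.T @ theta) / np.maximum(theta.sum(axis=0), 1e-12),
                         1e-6, 1 - 1e-6)
            log_joint = np.log(P) + Y @ np.log(pi) + (1 - Y) @ np.log(1 - pi)
            log_marg = np.logaddexp.reduce(log_joint, axis=1)
            theta = np.exp(log_joint - log_marg[:, None])
        if best is None or log_marg.sum() > best[0]:
            best = (log_marg.sum(), P, pi)
    return best

def simulate_lca(n, P, pi, rng):
    """Draw n binary response vectors from an LCA model with parameters (P, pi)."""
    C = rng.choice(P.size, size=n, p=P / P.sum())
    return (rng.random((n, pi.shape[0])) < pi[:, C].T).astype(int)

def bootstrap_lrt(Y, J, n_boot=20, seed=0):
    """Parametric bootstrap p-value for J-1 classes (null) versus J classes."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y)
    ll0, P0, pi0 = fit_lca(Y, J - 1, rng)
    ll1 = fit_lca(Y, J, rng)[0]
    observed = 2.0 * (ll1 - ll0)              # observed difference in -2LL
    exceed = 0
    for _b in range(n_boot):
        Yb = simulate_lca(Y.shape[0], P0, pi0, rng)   # data generated under the null
        stat = 2.0 * (fit_lca(Yb, J, rng)[0] - fit_lca(Yb, J - 1, rng)[0])
        exceed += int(stat >= observed)
    return observed, (exceed + 1) / (n_boot + 1)

# Simulated two-class data from hypothetical parameters; test 1 class against 2.
rng = np.random.default_rng(2)
true_P = np.array([0.6, 0.4])
true_pi = np.array([[0.1, 0.8], [0.1, 0.7], [0.2, 0.8], [0.1, 0.7]])
Y = simulate_lca(300, true_P, true_pi, rng)
lrt_stat, p_value = bootstrap_lrt(Y, J=2)
```

The smallest attainable p-value is 1 / (n_boot + 1), so the number of replicates limits the resolution of the test; a real analysis would use many more replicates and run EM to convergence.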

33 Example: Frailty Construct Validation
Women's Health & Aging Studies
- Internal convergent validity
- Criteria manifestation is syndromic: "a group of signs and symptoms that occur together and characterize a particular abnormality" - Merriam-Webster Medical Dictionary

34 Validation: Frailty as a syndrome
Method: Latent class analysis
If criteria characterize syndrome:
- At least two groups (otherwise, no co-occurrence)
- No subgrouping of symptoms (otherwise, more than one abnormality characterized)

35 Conditional Probabilities of Meeting Criteria in Latent Frailty Classes (WHAS)
[Table: the five criteria (Weight Loss, Weakness, Slowness, Low Physical Activity, Exhaustion) and Class Prevalence (%) by latent class, for the 2-class model (CL. 1 NON-FRAIL, CL. 2 FRAIL) and the 3-class model (CL. 1 ROBUST, CL. 2 INTERMED., CL. 3 FRAIL).]
Bandeen-Roche et al., J Gerontol, 2006

36 Results: Frailty Syndrome Validation
Data: Women's Health and Aging Study
- Single-population model fit: inadequate
- Two-population model fit: good
  - Pearson χ² p-value = .22; minimized AIC, BIC
- Frailty criteria prevalence increases stepwise across classes: no subclustering
- Syndromic manifestation well indicated

37 Example Residual checking Frailty construct

38 Part IV: Identifiability / Estimability

39 Identifiability
Rough idea for non-identifiability: more unknowns than there are (independent) equations to solve for them
Definition: Consider a family of distributions $\mathcal{F}_\Phi = \{F(y; \phi) : \phi \in \Phi\}$. The parameter $\phi \in \Phi$ is (globally) identifiable iff there is no $\phi^* \in \Phi$, $\phi^* \ne \phi$, such that $F(y; \phi) = F(y; \phi^*)$ a.e.

40 Identifiability: Related concepts
- Local identifiability
  - Basic idea: $\phi$ identified within a neighborhood
  - Definition: $\mathcal{F}$ is locally identifiable at $\phi = \phi_0$ if there exists a neighborhood $\tau$ about $\phi_0$ such that no $\phi \ne \phi_0$ in $\tau \cap \Phi$ satisfies $F(y; \phi_0) = F(y; \phi)$
- Estimability, empirical identifiability: the information matrix for $\phi$ given $y_1, \ldots, y_n$ is non-singular.

41 Identifiability: Latent class (binary Y)
Latent class analysis (measurement only)
- Parameter dimension
  - Saturated model: $2^M - 1$
  - Unconstrained J-class model: $(J - 1) + J \cdot M$
  - Need $2^M \ge J(M + 1)$ (necessary, not sufficient)
- Local identifiability: evaluate the Jacobian of the likelihood function (Goodman, 1974)
- Estimability: avoid fewer than 10 observations allocated per cell: $n > 10 \cdot 2^M$ (rule of thumb)
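A numerical version of the Jacobian check can be sketched as below. It assumes the usual reading of the Goodman (1974) condition, namely that the matrix of derivatives of the 2^M response-pattern probabilities with respect to the free parameters should have full column rank at the parameter value of interest. The function names, the finite-difference step, the rank tolerance, and the parameter values are all illustrative assumptions.

```python
import itertools
import numpy as np

def pattern_probs(params, M, J):
    """Map free parameters (J-1 prevalences, then the M*J pi's) to 2**M pattern probabilities."""
    P = np.append(params[:J - 1], 1.0 - params[:J - 1].sum())
    pi = params[J - 1:].reshape(M, J)
    patterns = np.array(list(itertools.product([0, 1], repeat=M)))
    cond = np.prod(np.where(patterns[:, :, None] == 1, pi, 1.0 - pi), axis=1)
    return cond @ P

def jacobian_rank(params, M, J, eps=1e-6):
    """Numerical rank of d(pattern probabilities)/d(parameters), by central differences."""
    params = np.asarray(params, dtype=float)
    cols = []
    for k in range(params.size):
        step = np.zeros_like(params)
        step[k] = eps
        diff = pattern_probs(params + step, M, J) - pattern_probs(params - step, M, J)
        cols.append(diff / (2 * eps))
    jac = np.column_stack(cols)
    return int(np.linalg.matrix_rank(jac, tol=1e-6)), params.size

# Hypothetical values: 2 classes with 3 items (7 parameters, 7 saturated df) ...
p3 = np.concatenate([[0.4], np.array([[0.1, 0.7], [0.2, 0.8], [0.15, 0.6]]).ravel()])
rank3, n_par3 = jacobian_rank(p3, M=3, J=2)
# ... and 2 classes with only 2 items (5 parameters, 3 saturated df).
p2 = np.concatenate([[0.4], np.array([[0.1, 0.7], [0.2, 0.8]]).ravel()])
rank2, n_par2 = jacobian_rank(p2, M=2, J=2)
```

Full column rank (rank equal to the number of parameters) is consistent with local identifiability at that point; a rank deficit, as in the two-item case, signals that some parameters cannot be separated from the data.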

42 Identifiability / estimability: Latent class analysis
Frailty example
- Need $2^M \ge J(M + 1)$ (necessary, not sufficient)
  - M = 5; J = 3; $32 \ge 3(5 + 1) = 18$: YES
  - By this criterion, could fit up to 5 classes ($6J \le 32$)
- Local identifiability: evaluate the Jacobian of the likelihood function (Goodman, 1974)
- Estimability: $n > 10 \cdot 2^M$; n = 786 > $10 \cdot 2^5$ = 320: YES
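The counting condition and the sample-size rule of thumb are simple enough to script. The helper below is a hypothetical convenience function written for this example, using M = 5 criteria, J = 3 classes and n = 786 from the frailty analysis.

```python
def counting_checks(M, J, n):
    """Necessary (not sufficient) counting condition and the n > 10 * 2**M rule of thumb."""
    n_cells = 2 ** M                        # number of binary response patterns
    n_params = (J - 1) + J * M              # prevalences plus conditional probabilities
    return {
        "n_params": n_params,
        "counting_ok": n_cells >= J * (M + 1),
        "max_classes": n_cells // (M + 1),  # largest J with 2**M >= J * (M + 1)
        "estimability_ok": n > 10 * n_cells,
    }

frailty = counting_checks(M=5, J=3, n=786)
```

Passing these checks does not establish identifiability; it only rules out models that have more free parameters than the data table can support.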

43 Objectives
For you to leave here knowing
- When is the latent class analysis (LCA) model useful?
- What is the LCA model, and what are its underlying assumptions?
- How are LCA parameters interpreted?
- How are LCA parameters commonly estimated?
- How is LCA fit adjudicated?
- What are considerations for identifiability / estimability?
