Causal Discovery with Latent Variables: the Measurement Problem


Richard Scheines, Carnegie Mellon University
with Peter Spirtes, Clark Glymour, Ricardo Silva, Steve Riese, Erich Kummerfeld, Renjie Yang, and Max Mansolf

Outline
1. Measurement Error, Coarsening, and Conditional Independence
2. Conventional Strategies
3. Latent Variable Models to the Rescue
4. The Problem of Impurity
5. Strategies for Handling Impurity (Rank Constraints)
6. Application to Psychometric Models

Causal Structure and Testable Statistical Predictions
Causal graphs plus the Causal Markov Axiom, via d-separation, entail testable statistical predictions, e.g., conditional independence. For the chain X → Y → Z: X _||_ Z | Y, i.e., for all x, y, z: P(X = x, Z = z | Y = y) = P(X = x | Y = y) · P(Z = z | Y = y).
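To make the prediction concrete, a minimal simulation sketch (mine, not the slides'): generate the chain and check the vanishing partial correlation, which for Gaussian variables is equivalent to the conditional independence.

```python
# Simulate the chain X -> Y -> Z and check the predicted X _||_ Z | Y
# via the partial correlation of X and Z controlling for Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(size=n)
Y = 0.8 * X + rng.normal(size=n)          # X -> Y
Z = 0.8 * Y + rng.normal(size=n)          # Y -> Z

def partial_corr(a, b, c):
    """Correlation of a and b after regressing each on c (with intercept)."""
    c = np.column_stack([np.ones_like(c), c])
    ra = a - c @ np.linalg.lstsq(c, a, rcond=None)[0]
    rb = b - c @ np.linalg.lstsq(c, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

print(partial_corr(X, Z, Y))    # ~0: X _||_ Z | Y, as d-separation predicts
print(np.corrcoef(X, Z)[0, 1])  # clearly nonzero: X and Z are marginally dependent
```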

Measurement Error and Coarsening Endanger Conditional Independence
For the chain X → Z → Y, the model entails X _||_ Y | Z. But suppose we only observe a corrupted copy of Z:
Measurement error: Z′ = Z + ε
Coarsening: −∞ < Z < 0 → Z* = 0; 0 ≤ Z < i → Z* = 1; i ≤ Z < j → Z* = 2; …; k ≤ Z < ∞ → Z* = k
Then X _||_ Y | Z′ fails (unless Var(ε) = 0), and X _||_ Y | Z* fails (almost always).
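The following sketch (again mine) illustrates both failure modes: conditioning on the true Z zeroes the partial correlation, while conditioning on a noisy or coarsened copy does not.

```python
# Conditioning on a noisy or coarsened copy of Z leaves X and Y dependent.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.normal(size=n)
Z = 0.9 * X + rng.normal(size=n)             # X -> Z
Y = 0.9 * Z + rng.normal(size=n)             # Z -> Y, so X _||_ Y | Z

Z_noisy = Z + rng.normal(scale=0.7, size=n)  # measurement error: Z' = Z + eps
Z_star = (Z > 0).astype(float)               # coarsening to a binary Z*

def partial_corr(a, b, c):
    c = np.column_stack([np.ones_like(c), c])
    ra = a - c @ np.linalg.lstsq(c, a, rcond=None)[0]
    rb = b - c @ np.linalg.lstsq(c, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

print(partial_corr(X, Y, Z))        # ~0: conditioning on the true Z works
print(partial_corr(X, Y, Z_noisy))  # clearly nonzero
print(partial_corr(X, Y, Z_star))   # clearly nonzero
```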

Parameterizing Measurement Error
Y_m := Y + ε_Ym, with ε_Ym ~ N(0, σ²)
Var(Y_m) = Var(Y) + Var(ε_Ym)
Amount of measurement error = Var(ε_Ym) / Var(Y_m) = (Var(Y_m) − Var(Y)) / Var(Y_m)

Unstandardized vs. Standardized
Unstandardized: Y_m := Y + ε_Ym, with ε_Ym ~ N(0, σ²); measurement error = Var(ε_Ym) / Var(Y_m).
Standardized: Var(Y) = Var(Y_m) = 1.0 and Y_m := β_m·Y + ε′_Ym, with Var(ε′_Ym) = σ² = 1 − β_m² and β_m = ρ(Y, Y_m), so Var(ε′_Ym) = 1 − ρ(Y, Y_m)²; measurement error = (1 − ρ(Y, Y_m)²) / 1.0.
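A quick numerical check of the bookkeeping (my sketch): the error share Var(ε_Ym)/Var(Y_m) computed in the unstandardized parameterization equals 1 − ρ(Y, Y_m)².

```python
# The two parameterizations agree: the fraction of Var(Y_m) due to
# measurement error equals 1 - rho(Y, Y_m)^2.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
Y = rng.normal(size=n)                 # Var(Y) = 1
eps = rng.normal(scale=0.5, size=n)
Ym = Y + eps                           # unstandardized: Y_m := Y + eps_Ym

err_share = eps.var() / Ym.var()       # Var(eps_Ym) / Var(Y_m), ~0.2 here
rho = np.corrcoef(Y, Ym)[0, 1]
print(err_share, 1 - rho**2)           # the two quantities agree
```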

Measurement Error
[Figure: the true graph and its standardized SEM instantiated model. Parameterization 1: negligible measurement error.]

Measurement Error
[Figure: simulated data, N = 1,000.]

Measurement Error
[Figure: Parameterization 1 (negligible measurement error) — GES output and FCI output.]

Measurement Error
[Figure: Parameterization 2 (small measurement error) — GES output and FCI output (α = .05).]

Measurement Error
[Figure: Parameterization 2 (small measurement error) — GES output and FCI output (α = .01).]

Coarsening
[Figure: Parameterization 3 (coarsening) — GES output and FCI output (α = .05).]

Coarsening
Coarsening rule: L1 > 0.0 → L1c = 1.0; L1 ≤ 0.0 → L1c = 0.0.

Strategies
Setting: X → Z → Y, but we observe only Z′ = Z + ε.
1. Guess (the amount of measurement error): sensitivity analysis; Bayesian analysis (a prior over the measurement error, 1 − ρ(Z, Z′)²); bounds.

Strategies
1. Guess the measurement error: sensitivity analysis, Bayesian analysis, bounds.
2. Multiple indicators: measure Z with several indicators Z1, Z2.

Strategies
1. Parameterize measurement error: sensitivity analysis, Bayesian analysis, bounds.
2. Multiple indicators, combined into a scale: Z_scale built from Z1 and Z2 (with errors ε_Z1, ε_Z2). Does X _||_ Y | Z_scale hold?

Simulated Example: Lead and IQ
True model: Parental Resources (PR) → Lead Exposure (−0.5) and PR → IQ (1.0); no direct effect of Lead on IQ. Pseudorandom sample: N = 2,000. Test of Lead _||_ IQ | PR by regression, with dependent variable IQ and independent variables Lead and PR:

Independent variable   Coefficient estimate   p-value
PR                      0.98                  0.000
Lead                   −0.088                 0.378
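A sketch reproducing this experiment, assuming the diagram's coefficients (PR → Lead = −0.5, PR → IQ = 1.0, no direct Lead → IQ effect) and statsmodels for the regression:

```python
# With PR observed directly, the regression screens Lead off from IQ:
# the Lead coefficient is near zero and insignificant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2_000
PR = rng.normal(size=n)
Lead = -0.5 * PR + rng.normal(size=n)
IQ = 1.0 * PR + rng.normal(size=n)

X = sm.add_constant(np.column_stack([Lead, PR]))  # columns: const, Lead, PR
res = sm.OLS(IQ, X).fit()
print(res.params[1], res.pvalues[1])  # Lead coefficient ~0, large p-value
```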

Multiple Measures of the Confounder
[Figure: Lead Exposure ← Parental Resources → IQ, with indicators X1, X2, X3 of Parental Resources and errors ε1, ε2, ε3.]
X1 := Parental Resources + ε1
X2 := Parental Resources + ε2
X3 := Parental Resources + ε3

Scales Don't Preserve Conditional Independence
PR_scale = (X1 + X2 + X3) / 3

Independent variable   Coefficient estimate   p-value
PR_scale                0.290                 0.000
Lead                   −0.423                 0.000

Indicators Don't Preserve Conditional Independence
Regress IQ on Lead, X1, X2, X3:

Independent variable   Coefficient estimate   p-value
X1                      0.22                  0.002
X2                      0.45                  0.000
X3                      0.18                  0.013
Lead                   −0.414                 0.000
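And the failure just shown, in the same sketch: control for the scale or for the raw indicators instead of PR itself, and Lead comes out spuriously significant.

```python
# Replace PR with three noisy indicators X1..X3 (or their average, a "scale"):
# the regression no longer screens Lead off from IQ.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2_000
PR = rng.normal(size=n)
Lead = -0.5 * PR + rng.normal(size=n)
IQ = 1.0 * PR + rng.normal(size=n)
X1, X2, X3 = (PR + rng.normal(size=n) for _ in range(3))
scale = (X1 + X2 + X3) / 3

for controls, label in [(scale[:, None], "PR scale"),
                        (np.column_stack([X1, X2, X3]), "indicators")]:
    Z = sm.add_constant(np.column_stack([Lead, controls]))
    res = sm.OLS(IQ, Z).fit()
    # Lead is index 1 (after the constant): significantly negative both times
    print(label, "Lead coef:", round(res.params[1], 3),
          "p:", round(res.pvalues[1], 4))
```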

Strategies
1. Parameterize measurement error: sensitivity analysis, Bayesian analysis, bounds.
2. Multiple indicators: scales, or a structural equation model. In the SEM X → Z → Y with indicators Z1, Z2 of the latent Z (errors ε_Z1, ε_Z2), E(β̂_yx) = 0 ⟺ X _||_ Y | Z.

Structural Equation Models Work!
True model vs. estimated model: Lead Exposure ← Parental Resources → IQ, with PR measured by X1, X2, X3 and a free Lead → IQ coefficient β in the estimated model. In the estimated model E(β̂) = 0; here β̂ = .07 (p-value = .499). Lead and IQ are detectably screened off by PR.
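For completeness, one way to fit such a latent-variable SEM in code — a sketch assuming the semopy package and its lavaan-style model syntax (my tooling choice, not the talk's):

```python
# Fit the latent-variable SEM: PR is latent with indicators X1..X3, and the
# Lead -> IQ coefficient should be estimated near zero and insignificant.
import numpy as np
import pandas as pd
import semopy  # assumed available: pip install semopy

rng = np.random.default_rng(5)
n = 2_000
PR = rng.normal(size=n)
df = pd.DataFrame({
    "Lead": -0.5 * PR + rng.normal(size=n),
    "IQ":   1.0 * PR + rng.normal(size=n),
    "X1": PR + rng.normal(size=n),
    "X2": PR + rng.normal(size=n),
    "X3": PR + rng.normal(size=n),
})

desc = """
PRlat =~ X1 + X2 + X3
Lead ~ PRlat
IQ ~ PRlat + Lead
"""
model = semopy.Model(desc)
model.fit(df)
print(model.inspect())  # the IQ ~ Lead row plays the role of beta-hat above
```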

Test Conditional Independence Relations among Latents Generally in a SEM
Question: does L1 _||_ L2 | {Q1, Q2, ..., Qn} hold?
In the SEM, E(β̂_21) = 0 ⟺ L1 _||_ L2 | {Q1, Q2, ..., Qn}.

Pure Measurement Models: Causal Discovery among Latents Is Feasible
Independence questions (e.g., L1 _||_ L2 | {L3, L4, ..., Ln}) feed a constraint-based search over the latents, L: MIMBuild (a PC-style algorithm) returns a pattern or PAG over L.

The Problem of Impurities ("Unmodeled Complexity")
Specified model: X → Z → Y with pure indicators Z1, Z2 of the latent Z, so E(β̂_yx) = 0 ⟺ X _||_ Y | Z.
True model: the measurement model is impure, and E(β̂_yx) ≠ β_yx — the structural estimate is biased.


Strategy: Purify the Measurement Model
Full model = structural model over the latents F1, F2, F3 + measurement model connecting each latent to its indicators (x1–x4, y1–y4, z1–z4).

Local Independence / Pure Measurement Models
[Figure: an impure measurement model over F1, F2, F3 with indicators x1–x4, y1–y4, z1–z4, vs. a 1st-order pure model in which each of x1–x12 loads on exactly one of F1, F2, F3.]

Purify
[Figure: starting from the impure model, the impure indicators (here X4 and Z2) are removed, leaving a 1st-order purified measurement model over the remaining items.]

Testable / Observable Constraints
Latent variable models that are linear below the latents entail testable rank constraints on the measured covariance matrix, regardless of the structural model. Impurities selectively defeat such implications, and are thus in many circumstances detectable and localizable.

Rank-1 Constraints: Tetrad Equations
Fact: given a single latent L with loadings λ1, ..., λ4 and indicators
W = λ1·L + ε1
X = λ2·L + ε2
Y = λ3·L + ε3
Z = λ4·L + ε4
it follows that
Cov(W,X)·Cov(Y,Z) = (λ1λ2·σ²_L)(λ3λ4·σ²_L) = (λ1λ3·σ²_L)(λ2λ4·σ²_L) = Cov(W,Y)·Cov(X,Z)
so the tetrad constraints hold: σ_WX·σ_YZ = σ_WY·σ_XZ = σ_WZ·σ_XY.
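A numerical check (my sketch): simulate four indicators of a single latent and compare the three tetrad products.

```python
# Four indicators of one latent satisfy the vanishing-tetrad constraints
# s_WX*s_YZ = s_WY*s_XZ = s_WZ*s_XY up to sampling noise.
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
lam = [0.9, 0.8, 0.7, 0.6]                     # loadings lambda_1..lambda_4
L = rng.normal(size=n)
W, X, Y, Z = (l * L + rng.normal(size=n) for l in lam)

c = np.cov([W, X, Y, Z])
print(c[0, 1] * c[2, 3])   # sigma_WX * sigma_YZ
print(c[0, 2] * c[1, 3])   # sigma_WY * sigma_XZ
print(c[0, 3] * c[1, 2])   # sigma_WZ * sigma_XY -- all three approximately equal
```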

Charles Spearman (1904): Statistical Constraints → Measurement Model Structure
For a single general factor g measured by m1, m2, r1, r2:
ρ(m1,m2)·ρ(r1,r2) = ρ(m1,r1)·ρ(m2,r2) = ρ(m1,r2)·ρ(m2,r1)

Impurities Defeat Rank (e.g., Tetrad) Constraints Selectively
When F1 is the only latent behind x1–x4, all three tetrad constraints hold:
ρ(x1,x2)·ρ(x3,x4) = ρ(x1,x3)·ρ(x2,x4)
ρ(x1,x2)·ρ(x3,x4) = ρ(x1,x4)·ρ(x2,x3)
ρ(x1,x3)·ρ(x2,x4) = ρ(x1,x4)·ρ(x2,x3)
When a second latent F5 also affects two of the indicators, the tetrads involving the covariance of those two indicators fail, while the remaining constraint still holds.
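Extending that sketch to an impurity — on my reading of the figure, a second latent F5 shared by x3 and x4 — the two tetrads whose products involve cov(x3, x4) fail while the third survives:

```python
# Give x3 and x4 a second latent F5: tetrads involving cov(x3, x4) fail,
# the tetrad that avoids cov(x3, x4) still holds.
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
F1 = rng.normal(size=n)
F5 = rng.normal(size=n)                        # impurity shared by x3 and x4
x1 = 0.9 * F1 + rng.normal(size=n)
x2 = 0.8 * F1 + rng.normal(size=n)
x3 = 0.7 * F1 + 0.6 * F5 + rng.normal(size=n)
x4 = 0.6 * F1 + 0.6 * F5 + rng.normal(size=n)

c = np.cov([x1, x2, x3, x4])
print(c[0, 1] * c[2, 3], c[0, 2] * c[1, 3])  # unequal: this tetrad fails
print(c[0, 2] * c[1, 3], c[0, 3] * c[1, 2])  # approximately equal: still holds
```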

2006: Build (1st-Order) Pure Clusters (BPC)
Input: covariance matrix over the set of original items.
BPC (FOFC): 1) cluster (complicated Boolean combinations of tetrads); 2) purify.
Output: an equivalence class of 1st-order pure clusters. BPC is pointwise consistent.
Silva, R., Glymour, C., Scheines, R., and Spirtes, P. (2006). Learning the Structure of Linear Latent Variable Models. Journal of Machine Learning Research, 7, 191–246.

Case Study: Stress, Depression, and Religion
Masters students (N = 127), 61-item survey (Likert scale): Stress (St1–St21), Depression (D1–D20), Religious Coping (C1–C20).
Specified model: Stress → Depression (+), Stress → Coping (+), Coping → Depression (−), with each latent measured by all of its items. p(χ²) = 0.00 — the specified model is rejected.

Case Study: Stress, Depression, and Religion
Build Pure Clusters output: Stress: St3, St4, St16, St18, St20; Depression: Dep9, Dep13, Dep19; Coping: C9, C12, C14, C15.

Case Study: Stress, Depression, and Religion
Assume: Stress is causally prior to Depression.
Find: Stress _||_ Coping | Depression, i.e., Depression screens Coping off from Stress. Resulting model: Stress (+) → Depression (+) → Coping, with p(χ²) = 0.28.

Psychometric Models in Practice: Almost Never 1st-Order Pure or 1st-Order Purifiable
[Figure: a typical bi-factor structure.]

Local Independence / Pure Measurement Models
1st-order pure: each of x1–x12 loads on exactly one of F1, F2, F3.
2nd-order pure: additionally, a general factor G sits above F1, F2, F3, reaching the items only through the first-order latents.

Beyond 1st-Order Purity and Tetrads
Drton, M., Sturmfels, B., and Sullivant, S. (2007). Algebraic factor analysis: tetrads, pentads and beyond. Probability Theory and Related Fields, 138(3-4), 463–493.
Sullivant, S., Talaska, K., and Draisma, J. (2010). Trek Separation for Gaussian Graphical Models. Annals of Statistics, 38(3), 1665–1685.

Trek-Separation
[Definition slide; see Sullivant, Talaska, and Draisma (2010).]
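For reference, the key result, paraphrased from memory of Sullivant, Talaska, and Draisma (2010) rather than taken from the slide:

```latex
% Trek-separation rank bound, paraphrased from Sullivant, Talaska & Draisma
% (2010); see the paper for the precise hypotheses. For a linear SEM on a DAG
% and vertex sets A, B, generically in the parameters,
\[
  \operatorname{rank}\bigl(\Sigma_{A,B}\bigr)
  = \min\bigl\{\, |C_A| + |C_B| \;:\; (C_A, C_B)\ \text{t-separates } A \text{ from } B \,\bigr\}.
\]
% Vanishing tetrads are the rank-one special case with |A| = |B| = 2.
```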

2013: FTFC (2nd-Order Pure Measurement Clusters)
Input: covariance matrix of the measured items.
Output: a subset of items and clusters that are 2nd-order pure.
Kummerfeld and Ramsey.

[Figure: a 2nd-order impure model over G, F1, F2, F3 and items x1–x12.] From its covariance matrix Σ(X), FTFC recovers the 2nd-order pure clusters: {X1, X2, X3, X4, X5, X6} and {X8, X9, X10, X11, X12}.

Psychometric Models in Practice: Bi-factor vs. Higher-Order
[Figure: a bi-factor model and a higher-order model side by side.]

Model Comparison: 2nd-Order vs. Bi-factor
The 2nd-order model entails more constraints: it is simpler and has more degrees of freedom, but the two models are not simply nested. Statistical comparison trades off fit against complexity: fit to data (good) vs. complexity (bad), via p(χ²), BIC, AIC, etc. Theoretically, BIC is the desirable choice here.
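The result figures below report a difference of BICs; assuming the study's dBIC is defined as BIC(bi-factor) − BIC(2nd-order), the sign convention in the legends comes out as follows:

```python
# Sketch of the comparison statistic. The sign convention matches the
# result-figure legends: dBIC > 0 selects the 2nd-order model.
import math

def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Schwarz's BIC: -2 log-likelihood plus a complexity penalty; smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def dbic(ll_2nd, k_2nd, ll_bif, k_bif, n_obs):
    # dBIC > 0: 2nd-order preferred; dBIC < 0: bi-factor preferred
    return bic(ll_bif, k_bif, n_obs) - bic(ll_2nd, k_2nd, n_obs)
```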

Model Comparison: 2nd-Order vs. Bi-factor
Empirical finding: the bi-factor model usually wins; in 65 published studies making the comparison, all chose the bi-factor model. But what if the data-generating process is neither 2nd-order nor bi-factor? What if there is unmodeled complexity?

Simulation Study
[Figure: the true, data-generating model — a 2nd-order model.]

Simulation Study: 2 Strategies
Data come from an impure 2nd-order model.
1. Before FOFC: fit and compare pure versions of the 2nd-order and bi-factor models, with the correct clustering, on all items.
2. After FOFC/FTFC: locate and discard the impure indicators, then fit and compare pure versions of the 2nd-order and bi-factor models, with the correct clustering, on the retained items.

[Figure: the true model and the purified version of the truth, with {X12, X13, X20} removed.]

Simulation Study
[Figure: dBIC results; truth is the 2nd-order model. dBIC > 0: 2nd-order model selected; dBIC < 0: bi-factor model selected.]

Size of the Impurities
[Figure: dBIC as a function of impurity size; truth is the 2nd-order model. dBIC > 0: 2nd-order model selected; dBIC < 0: bi-factor model selected.]

Level of Inhomogeneity
Inhomogeneity: the degree to which the impurities are concentrated on a small number of indicators.
[Figure: low inhomogeneity vs. high inhomogeneity.]

Level of Inhomogeneity
[Figure: dBIC as a function of inhomogeneity; truth is the 2nd-order model. dBIC > 0: 2nd-order model selected; dBIC < 0: bi-factor model selected.]

Sample Size
[Figure: dBIC as a function of sample size; truth is the 2nd-order model. dBIC > 0: 2nd-order model selected; dBIC < 0: bi-factor model selected.]

Thanks

Measurement Model Misspecification → Structural Parameter Bias

Measurement Model Misspecification → Structural Parameter Bias
[Figure: specified model vs. true model.] Parametric measure of difference: Explained Common Variance (ECV).

Measurement Models
[Figure: unidimensional vs. correlated-trait measurement models.]

Psychometric Models in Practice: Bi-factor vs. 2nd (Higher)-Order Models
[Figure: a 2nd-order model and a bi-factor model.]