Causal Discovery with Latent Variables: the Measurement Problem

Size: px

Start display at page:

Download "Causal Discovery with Latent Variables: the Measurement Problem"

Charleen Richardson
6 years ago
Views:

1 Causal Discovery with Latent Variables: the Measurement Problem Richard Scheines Carnegie Mellon University Peter Spirtes, Clark Glymour, Ricardo Silva, Steve Riese, Erich Kummerfeld, Renjie Yang, Max Mansolf 1

2 Outline 1. Measurement Error, Coarsening and Conditional Independence 2. Conventional Strategies 3. Latent Variable Models to the Rescue 4. The Problem of Impurity 5. Strategies for Handling Impurity (Rank Constraints) 6. Application to Psychometric Models

3 Causal Structure D-separation Causal Markov Axiom Testable Statistical Predictions Causal Graphs e.g., Conditional Independence X Y Z X _ _ Z Y x,y,z P(X = x, Z=z Y=y) = P(X = x Y=y) P(Z=z Y=y) 3

4 Measurement Error and Coarsening Endanger conditional Independence X Z Y X _ _ Y Z Z Z* ε Z = Z + ε Measurement Error Coarsening: - < Z < 0 à Z* = 0 0 Z < i à Z* = 1 i Z < j à Z* = 2.. k Z < à Z* = k X _ _ Y Z (unless Var(ε ) = 0) X _ _ Y Z* (almost always) 4

5 Parameterizing Measurement Error Y Y m := Y + ε ym ε ym ~ N(0,σ 2 2 ) Y m Measurement_Error = ε Ym ε ym Var(Y m ) = Var(Y) + Var(ε Ym ) Amount of Measurement Error = Var(eYm)/Var(Ym) = Var(Ym) Var(Y)/Var(Ym) 5

6 Unstandardized Standardized Y m := Y + ε ym Y Y Y m := β* m Y + ε y m β* m ε ym ~ N(0,σ 2 ) Y m Y m ε y m ~ N(0,σ 2 ) ε ym ε y'm Var(Y) = Var(Y m ) = 1.0 Measurement Error = Var(eYm)/Var(Ym) Var(ε y m ) = σ 2 = 1 - β* m 2 β* m = ρ(y, Y m ) Var(ε y m ) = 1 - ρ(y, Y m ) 2 Measurement Error = Var(eY m)/ Var(Y m) 6 Measurement Error = 1 r(y, Y m)2/1.0

7 Measurement Error True Graph Parameterization 1: Standardized SEM IM Measurement error (negligible)

8 Measurement Error N= 1,000

9 Measurement Error Parameterization 1: Negligible Measurement Error GES Output FCI Output

10 Measurement Error Parameterization 2: Small Measurement Error GES Output FCI Output (α =.05)

11 Measurement Error Parameterization 2: Small Measurement Error GES Output FCI Output (α =.01)

12 Coarsening Parameterization 3: GES Output FCI Output (α =.05)

13 Coarsening L1 > 0.0 à L1c = 1.0 L1 0.0 à L1c = 0.0 L1 > 0.0 à L1c = 1.0 L1 0.0 à L1c = 0.0

14 Strategies X Z Y 1. Guess (the amount of measurement error): Sensitivity Analysis Bayesian Analysis Bounds Z ε Measurement Error = 1 ρ(z, Z )2 Prior over 14

15 Strategies 1. Guess the measurement error: Sensitivity Analysis Bayesian Analysis Bounds X Z Y Z ε 2. Multiple Indicators: X Z Y Z1 Z2 15

16 Strategies 1. Parameterize measurement error: Sensitivity Analysis Bayesian Analysis Bounds X Z Y Z ε 2. Multiple Indicators: Scales X Z Y Z1 Z2 X _ _ Y Z_scale εz1 Z_scale εz2 16

17 Simulated Example: Lead and IQ Parental Resources Lead Exposure IQ Psuedorandom sample: N = 2,000 Test of Lead _ _ IQ Parental Resources: Regression: Dependent Variable: IQ Independent Variables: Lead, PR Independent Variable Coefficient Estimate p-value PR Lead

18 Multiple Measures of the Confounder ε 1 ε 2 ε 3 X 1 X 2 X 3 Lead Exposure Parental Resources IQ X 1 := Parental Resources + ε 1 X 2 := Parental Resources + ε 2 X 3 := Parental Resources + ε 3 18

19 Scales don't preserve conditional independence PR_Scale ε 1 ε 2 ε 3 X 1 X 2 X 3 Lead Exposure Parental Resources IQ PR_Scale = (X 1 + X 2 + X 3 ) / 3 Independent Variable Coefficient Estimate p-value PR_scale Lead

20 Indicators Don t Preserve Conditional Independence X 1 X 2 X 3 Lead Exposure Parental Resources IQ Regress IQ on: Lead, X 1, X 2, X 3 Independent Variable Coefficient Estimate p-value X X X Lead

21 Strategies 1. Parameterize measurement error: Sensitivity Analysis Bayesian Analysis Bounds 2. Multiple Indicators: Scales SEM εx X Z Y Z ε βxy X Z Y Z1 Z2 εy εz1 εz2 E( ˆ β ) = yx 0 X _ _ Y Z 21

22 Structural Equation Models Work! True Model Estimated Model X 1 X 2 X 3 X 1 X 2 X 3 Lead Exposure Parental Resources IQ Lead Exposure Parental Resources IQ β In the Estimated Model E( ˆ) β = 0 ˆ β =.07 (p-value =.499) Lead and IQ detectibly screened off by PR 22

23 Test conditional independence relations among latents generally in a SEM Question: L 1 _ _ L 2 {Q 1, Q 2,..., Q n } β 21 E( ˆ β 21 ) = 0 L 1 _ _ L 2 {Q 1, Q 2,..., Q n }

24 Pure Measurement Models: Causal Discovery among latents feasible Independence Questions: e.g., L 1 _ _ L 2 {L 3, L 4,..., L n } Constraint Based Search over L MIMBuild (PC) Pattern, PAG over L

25 The Problem of Impurities ( Unmodeled Complexity ) Specified Model True Model εx β yx εy εx εy X Z Y X Z Y Z1 Z2 Z1 Z2 εz1 εz2 εz1 εz2 E( ˆ β ) = yx 0 X _ _ Y Z E( ˆ β yx ) β xy 25

26 Test conditional independence relations among latents generally Question: L 1 _ _ L 2 {Q 1, Q 2,..., Q n } β 21 E( ˆ β 21 ) = 0 L 1 _ _ L 2 {Q 1, Q 2,..., Q n }

27 Strategy: Purify the Measurement Model Full Model F1 F2 F3 x 1 x 2 x 3 x 4 y 1 y 2 y 3 y 4 z 1 z 2 z 3 z 4 Structural Model F1 F2 F3 Measurement Model F1 F2 F3 x 1 x 2 x 3 x 4 y 1 y 2 y 3 y 4 z 1 z 2 z 3 z 4 27

28 Local Independence / Pure Measurement Models Impure F1 F2 F3 x 1 x 2 x 3 x 4 y 1 y 2 y 3 y 4 z 1 z 2 z 3 z 4 F 1 st Order Pure F1 F2 F3 x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x 11 x 12 28

29 Purify Impure F1 F2 F3 x 1 x 2 x 3 x 4 y 1 y 2 y 3 y 4 z 1 z 2 z 3 z 4 F 1 st Order Purified F1 F2 F3 x 1 x 2 X3 y 1 y 2 y 3 y 4 z 1 z 3 z 4 X 4 Z2 29

30 Testable/Observable Contraints Latent variable models that are linear below the latents entail testable rank constraints on the measured covariance matrix regardless of the structural model. Impurities selectively defeat such implications, and are thus in many circumstances detectable and localizable. 30

31 Rank 1 Constraints: Tetrad Equations Fact: given W L λ 1 λ 2 λ 3 λ 4 X Y Z W = λ 1 L + ε 1 X = λ 2 L + ε 2 Y = λ 3 L + ε 3 Z = λ 4 L + ε 4 it follows that Cov WX Cov YZ = (λ 1 λ 2 σ 2 L ) (λ 3λ 4 σ 2 L ) = = (λ 1 λ 3 σ 2 L ) (λ 2λ 4 σ 2 L ) = Cov WY Cov XZ σ WX σ YZ = σ WY σ XZ = σ WZ σ XY tetrad constraints

32 Charles Spearman (1904) Statistical Constraints à Measurement Model Structure ρ m1,m2 * ρ r1,r2 = ρ m1,r1 * ρ m2,r2 = ρ m1,r2 * ρ m2,r1 g m1 m2 r1 r2

33 Impurities defeat rank (e.g., tetrad) constraints selectively Truth F1 Truth F1 x 1 x 2 x 3 x 4 x 1 x 2 x 3 x 4 F5 ρ x1,x2 * ρ x3,x4 = ρ x1,x3 * ρ x2,x4 ρ x1,x2 * ρ x3,x4 = ρ x1,x4 * ρ x2,x3 ρ x1,x3 * ρ x2,x4 = ρ x1,x4 * ρ x2,x3 ρ x1,x2 * ρ x3,x4 = ρ x1,x3 * ρ x2,x4 ρ x1,x2 * ρ x3,x4 = ρ x1,x4 * ρ x2,x3 ρ x1,x3 * ρ x2,x4 = ρ x1,x4 * ρ x2,x3 33

34 2006: Build (1 st -Order) Pure Clusters (BPC) Input: - Covariance matrix over set of original items BPC (FOFC) 1) Cluster (complicated boolean combinations of tetrads) 2) Purify Output: Equivalence class of 1 st order pure clusters BPC: Pointwise consistent Silva, R., Glymour, C., Scheines, R. and Spirtes, P. (2006) Learning the Structure of Latent Linear Structure Models, Journal of Machine Learning Research, 7,

35 Case Study: Stress, Depression, and Religion Masters Students (N = 127) 61 - item survey (Likert Scale) Stress: St 1 - St 21 Depression: D 1 - D 20 Religious Coping: C 1 - C 20 St 1 1 St St 21 1 Specified Model Depression Stress + + Coping - Dep 1 1 Dep Dep 20 1 P(χ2) = 0.00 C 1 C 2.. C 20 35

36 Case Study: Stress, Depression, and Religion St 3 1 Build Pure Clusters St 4 1 St 16 1 St 18 1 Stress Coping Dep 9 1 Depression Dep 13 1 Dep 19 1 St 20 1 C 9 C 12 C 15 C 14 36

37 Case Study: Stress, Depression, and Religion Assume : Stress causally prior to Depression Find : Stress _ _ Coping Depression St 3 1 St 4 1 St 16 1 St 18 1 Stress + Coping Dep 9 1 Depression Dep Dep 19 1 St 20 1 C 9 C 12 C 15 C 14 P(χ2) =

38 Psychometric Models in Practice: Almost Never 1 st Order Pure or 1 st -order Purifiable

39 Psychometric Models in Practice: Almost Never 1 st Order Pure or 1 st -order Purifiable Bi-factor structure

40 Local Independence / Pure Measurement Models 1 st Order Pure F1 F2 F3 x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x 11 x 12 2 nd Order Pure F1 F2 F3 x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x 11 x 12 G 40

41 Beyond 1 st -Order Purity and Tetrads Drton, M., Sturmfels, B., Sullivant, S. (2007) Algebraic factor analysis: tetrads, pentads and beyond, Probability Theory and Related Fields, 138, 3-4, Sullivant, S., Talaska, K., & Draisma, J. (2010). Trek Separation for Gaussian Graphical Models. Annals of Statistics, 38(3),

42 Trek-Separation 42

43 2013 FTFC (2 nd -order pure measurement clusters) Input: Covariance Matrix of measured items: Output: Subset of items and clusters that are 2 nd Order Pure Kummerfeld and Ramsey

44 2 nd Order Impure F1 F2 F3 x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x 11 x 12 G Σ(X) FTFC 2 nd -Order Pure Clusters: {X 1, X 2, X 3, X 4, X 5, X 6 } {X 8, X 9, X 10, X 11, X 12 } 44

45 Psychometric Models in Practice:Bi-factor vs. Higher-Order Bi-factor Higher-order

46 Model Comparison: 2 nd -Order vs. Bi-factor 2 nd -Order entails more constraints (simpler, more degrees of freedom, nesting not simple) Statistical comparison: fit - complexity fit to data (good) complexity (bad) p-value(χ 2 ), BIC, AIC, etc. Theoretically BIC desirable.

47 Model Comparison: 2 nd -Order vs. Bi-factor Empirical finding: Bi-factor usually wins (65 published studies comparing: all chose bi-factors) What if data-generating process is neither 2 nd -order nor bi-factor? What if there is unmodeled complexity?

48 Simulation Study: Truth: 2 nd -Order

49 Simulation Study: 2 Strategies Data from impure 2 nd -order model 1. Before FOFC: Fit and compare pure versions of 2 nd -order, bi-factor models with correct clustering on all items 2. After FOFC/FTFC Locate and discard impure indicators Fit and compare pure versions of 2 nd -order, bi-factor models with correct clustering on retained items

50 Truth Purified Version of Truth {X12, X13, X20} removed

51 Simulation Study Truth: 2 nd -Order dbic > 0: 2 nd -Order Model dbic < 0: Bi-factor Model

52 Size of the impurities dbic > 0: 2 nd -Order Model Truth: 2 nd -Order dbic < 0: Bi-factor Model

53 Level of Inhomogeneity Inhomogeneiity: the degree to which the impurities are concentrated on a small number of indicators Low Inhomogeneiity High Inhomogeneiity

54 Level of Inhomogeneity Truth: 2 nd -Order dbic > 0: 2 nd -Order Model dbic < 0: Bi-factor Model

55 Sample Size Truth: 2 nd -Order dbic > 0: 2 nd -Order Model dbic < 0: Bi-factor Model

56 Thanks 56

57 Measurement Model Misspecification à Structural Parameter Bias 57

58 Measurement Model Misspecification à Structural Parameter Bias Specified Model True Model Parametric Measure of Difference: Explained Common Variance (ECV) 58

59 Measurement Models Unidimensional Correlated Trait

60 Psychometric Models in Practice: Bi-factor vs. 2 nd (Higher)-Order Models 2 nd Order Bi-factor

Day 3: Search Continued

Center for Causal Discovery Day 3: Search Continued June 15, 2015 Carnegie Mellon University 1 Outline Models Data 1) Bridge Principles: Markov Axiom and D-separation 2) Model Equivalence 3) Model Search