Causal Discovery with Latent Variables: the Measurement Problem


Richard Scheines, Carnegie Mellon University
with Peter Spirtes, Clark Glymour, Ricardo Silva, Steve Riese, Erich Kummerfeld, Renjie Yang, and Max Mansolf

Outline
1. Measurement Error, Coarsening, and Conditional Independence
2. Conventional Strategies
3. Latent Variable Models to the Rescue
4. The Problem of Impurity
5. Strategies for Handling Impurity (Rank Constraints)
6. Application to Psychometric Models

Causal Structure and Testable Statistical Predictions
Causal graphs plus the Causal Markov Axiom, via d-separation, entail testable statistical predictions, e.g., conditional independence. For the chain X → Y → Z: X _||_ Z | Y, i.e., for all x, y, z: P(X = x, Z = z | Y = y) = P(X = x | Y = y) · P(Z = z | Y = y).
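To make the prediction concrete, a minimal simulation sketch (mine, not the slides'): generate the chain and check the vanishing partial correlation, which for Gaussian variables is equivalent to the conditional independence.

```python
# Simulate the chain X -> Y -> Z and check the predicted X _||_ Z | Y
# via the partial correlation of X and Z controlling for Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(size=n)
Y = 0.8 * X + rng.normal(size=n)          # X -> Y
Z = 0.8 * Y + rng.normal(size=n)          # Y -> Z

def partial_corr(a, b, c):
    """Correlation of a and b after regressing each on c (with intercept)."""
    c = np.column_stack([np.ones_like(c), c])
    ra = a - c @ np.linalg.lstsq(c, a, rcond=None)[0]
    rb = b - c @ np.linalg.lstsq(c, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

print(partial_corr(X, Z, Y))    # ~0: X _||_ Z | Y, as d-separation predicts
print(np.corrcoef(X, Z)[0, 1])  # clearly nonzero: X and Z are marginally dependent
```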

Measurement Error and Coarsening Endanger Conditional Independence
For the chain X → Z → Y, the model entails X _||_ Y | Z. But suppose we only observe a corrupted copy of Z:
Measurement error: Z′ = Z + ε
Coarsening: −∞ < Z < 0 → Z* = 0; 0 ≤ Z < i → Z* = 1; i ≤ Z < j → Z* = 2; …; k ≤ Z < ∞ → Z* = k
Then X _||_ Y | Z′ fails (unless Var(ε) = 0), and X _||_ Y | Z* fails (almost always).
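The following sketch (again mine) illustrates both failure modes: conditioning on the true Z zeroes the partial correlation, while conditioning on a noisy or coarsened copy does not.

```python
# Conditioning on a noisy or coarsened copy of Z leaves X and Y dependent.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.normal(size=n)
Z = 0.9 * X + rng.normal(size=n)             # X -> Z
Y = 0.9 * Z + rng.normal(size=n)             # Z -> Y, so X _||_ Y | Z

Z_noisy = Z + rng.normal(scale=0.7, size=n)  # measurement error: Z' = Z + eps
Z_star = (Z > 0).astype(float)               # coarsening to a binary Z*

def partial_corr(a, b, c):
    c = np.column_stack([np.ones_like(c), c])
    ra = a - c @ np.linalg.lstsq(c, a, rcond=None)[0]
    rb = b - c @ np.linalg.lstsq(c, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

print(partial_corr(X, Y, Z))        # ~0: conditioning on the true Z works
print(partial_corr(X, Y, Z_noisy))  # clearly nonzero
print(partial_corr(X, Y, Z_star))   # clearly nonzero
```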

Parameterizing Measurement Error
Y_m := Y + ε_Ym, with ε_Ym ~ N(0, σ²)
Var(Y_m) = Var(Y) + Var(ε_Ym)
Amount of measurement error = Var(ε_Ym) / Var(Y_m) = (Var(Y_m) − Var(Y)) / Var(Y_m)

Unstandardized vs. Standardized
Unstandardized: Y_m := Y + ε_Ym, with ε_Ym ~ N(0, σ²); measurement error = Var(ε_Ym) / Var(Y_m).
Standardized: Var(Y) = Var(Y_m) = 1.0 and Y_m := β_m·Y + ε′_Ym, with Var(ε′_Ym) = σ² = 1 − β_m² and β_m = ρ(Y, Y_m), so Var(ε′_Ym) = 1 − ρ(Y, Y_m)²; measurement error = (1 − ρ(Y, Y_m)²) / 1.0.
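A quick numerical check of the bookkeeping (my sketch): the error share Var(ε_Ym)/Var(Y_m) computed in the unstandardized parameterization equals 1 − ρ(Y, Y_m)².

```python
# The two parameterizations agree: the fraction of Var(Y_m) due to
# measurement error equals 1 - rho(Y, Y_m)^2.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
Y = rng.normal(size=n)                 # Var(Y) = 1
eps = rng.normal(scale=0.5, size=n)
Ym = Y + eps                           # unstandardized: Y_m := Y + eps_Ym

err_share = eps.var() / Ym.var()       # Var(eps_Ym) / Var(Y_m), ~0.2 here
rho = np.corrcoef(Y, Ym)[0, 1]
print(err_share, 1 - rho**2)           # the two quantities agree
```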

Measurement Error
[Figure: the true graph and its standardized SEM instantiated model. Parameterization 1: negligible measurement error.]

Measurement Error
[Figure: simulated data, N = 1,000.]

Measurement Error
[Figure: Parameterization 1 (negligible measurement error) — GES output and FCI output.]

Measurement Error
[Figure: Parameterization 2 (small measurement error) — GES output and FCI output (α = .05).]

Measurement Error
[Figure: Parameterization 2 (small measurement error) — GES output and FCI output (α = .01).]

Coarsening
[Figure: Parameterization 3 (coarsening) — GES output and FCI output (α = .05).]

Coarsening
Coarsening rule: L1 > 0.0 → L1c = 1.0; L1 ≤ 0.0 → L1c = 0.0.

Strategies
Setting: X → Z → Y, but we observe only Z′ = Z + ε.
1. Guess (the amount of measurement error): sensitivity analysis; Bayesian analysis (a prior over the measurement error, 1 − ρ(Z, Z′)²); bounds.

Strategies
1. Guess the measurement error: sensitivity analysis, Bayesian analysis, bounds.
2. Multiple indicators: measure Z with several indicators Z1, Z2.

Strategies
1. Parameterize measurement error: sensitivity analysis, Bayesian analysis, bounds.
2. Multiple indicators, combined into a scale: Z_scale built from Z1 and Z2 (with errors ε_Z1, ε_Z2). Does X _||_ Y | Z_scale hold?

Simulated Example: Lead and IQ
True model: Parental Resources (PR) → Lead Exposure (−0.5) and PR → IQ (1.0); no direct effect of Lead on IQ. Pseudorandom sample: N = 2,000. Test of Lead _||_ IQ | PR by regression, with dependent variable IQ and independent variables Lead and PR:

Independent variable   Coefficient estimate   p-value
PR                      0.98                  0.000
Lead                   −0.088                 0.378
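A sketch reproducing this experiment, assuming the diagram's coefficients (PR → Lead = −0.5, PR → IQ = 1.0, no direct Lead → IQ effect) and statsmodels for the regression:

```python
# With PR observed directly, the regression screens Lead off from IQ:
# the Lead coefficient is near zero and insignificant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2_000
PR = rng.normal(size=n)
Lead = -0.5 * PR + rng.normal(size=n)
IQ = 1.0 * PR + rng.normal(size=n)

X = sm.add_constant(np.column_stack([Lead, PR]))  # columns: const, Lead, PR
res = sm.OLS(IQ, X).fit()
print(res.params[1], res.pvalues[1])  # Lead coefficient ~0, large p-value
```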

Multiple Measures of the Confounder
[Figure: Lead Exposure ← Parental Resources → IQ, with indicators X1, X2, X3 of Parental Resources and errors ε1, ε2, ε3.]
X1 := Parental Resources + ε1
X2 := Parental Resources + ε2
X3 := Parental Resources + ε3

Scales Don't Preserve Conditional Independence
PR_scale = (X1 + X2 + X3) / 3

Independent variable   Coefficient estimate   p-value
PR_scale                0.290                 0.000
Lead                   −0.423                 0.000

Indicators Don't Preserve Conditional Independence
Regress IQ on Lead, X1, X2, X3:

Independent variable   Coefficient estimate   p-value
X1                      0.22                  0.002
X2                      0.45                  0.000
X3                      0.18                  0.013
Lead                   −0.414                 0.000
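And the failure just shown, in the same sketch: control for the scale or for the raw indicators instead of PR itself, and Lead comes out spuriously significant.

```python
# Replace PR with three noisy indicators X1..X3 (or their average, a "scale"):
# the regression no longer screens Lead off from IQ.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2_000
PR = rng.normal(size=n)
Lead = -0.5 * PR + rng.normal(size=n)
IQ = 1.0 * PR + rng.normal(size=n)
X1, X2, X3 = (PR + rng.normal(size=n) for _ in range(3))
scale = (X1 + X2 + X3) / 3

for controls, label in [(scale[:, None], "PR scale"),
                        (np.column_stack([X1, X2, X3]), "indicators")]:
    Z = sm.add_constant(np.column_stack([Lead, controls]))
    res = sm.OLS(IQ, Z).fit()
    # Lead is index 1 (after the constant): significantly negative both times
    print(label, "Lead coef:", round(res.params[1], 3),
          "p:", round(res.pvalues[1], 4))
```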

Strategies
1. Parameterize measurement error: sensitivity analysis, Bayesian analysis, bounds.
2. Multiple indicators: scales, or a structural equation model. In the SEM X → Z → Y with indicators Z1, Z2 of the latent Z (errors ε_Z1, ε_Z2), E(β̂_yx) = 0 ⟺ X _||_ Y | Z.

Structural Equation Models Work!
True model vs. estimated model: Lead Exposure ← Parental Resources → IQ, with PR measured by X1, X2, X3 and a free Lead → IQ coefficient β in the estimated model. In the estimated model E(β̂) = 0; here β̂ = .07 (p-value = .499). Lead and IQ are detectably screened off by PR.
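For completeness, one way to fit such a latent-variable SEM in code — a sketch assuming the semopy package and its lavaan-style model syntax (my tooling choice, not the talk's):

```python
# Fit the latent-variable SEM: PR is latent with indicators X1..X3, and the
# Lead -> IQ coefficient should be estimated near zero and insignificant.
import numpy as np
import pandas as pd
import semopy  # assumed available: pip install semopy

rng = np.random.default_rng(5)
n = 2_000
PR = rng.normal(size=n)
df = pd.DataFrame({
    "Lead": -0.5 * PR + rng.normal(size=n),
    "IQ":   1.0 * PR + rng.normal(size=n),
    "X1": PR + rng.normal(size=n),
    "X2": PR + rng.normal(size=n),
    "X3": PR + rng.normal(size=n),
})

desc = """
PRlat =~ X1 + X2 + X3
Lead ~ PRlat
IQ ~ PRlat + Lead
"""
model = semopy.Model(desc)
model.fit(df)
print(model.inspect())  # the IQ ~ Lead row plays the role of beta-hat above
```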

Test Conditional Independence Relations among Latents Generally in a SEM
Question: does L1 _||_ L2 | {Q1, Q2, ..., Qn} hold?
In the SEM, E(β̂_21) = 0 ⟺ L1 _||_ L2 | {Q1, Q2, ..., Qn}.

Pure Measurement Models: Causal Discovery among Latents Is Feasible
Independence questions (e.g., L1 _||_ L2 | {L3, L4, ..., Ln}) feed a constraint-based search over the latents, L: MIMBuild (a PC-style algorithm) returns a pattern or PAG over L.

The Problem of Impurities ("Unmodeled Complexity")
Specified model: X → Z → Y with pure indicators Z1, Z2 of the latent Z, so E(β̂_yx) = 0 ⟺ X _||_ Y | Z.
True model: the measurement model is impure, and E(β̂_yx) ≠ β_yx — the structural estimate is biased.


Strategy: Purify the Measurement Model
Full model = structural model over the latents F1, F2, F3 + measurement model connecting each latent to its indicators (x1–x4, y1–y4, z1–z4).

Local Independence / Pure Measurement Models
[Figure: an impure measurement model over F1, F2, F3 with indicators x1–x4, y1–y4, z1–z4, vs. a 1st-order pure model in which each of x1–x12 loads on exactly one of F1, F2, F3.]

Purify
[Figure: starting from the impure model, the impure indicators (here X4 and Z2) are removed, leaving a 1st-order purified measurement model over the remaining items.]

Testable / Observable Constraints
Latent variable models that are linear below the latents entail testable rank constraints on the measured covariance matrix, regardless of the structural model. Impurities selectively defeat such implications, and are thus in many circumstances detectable and localizable.

Rank-1 Constraints: Tetrad Equations
Fact: given a single latent L with loadings λ1, ..., λ4 and indicators
W = λ1·L + ε1
X = λ2·L + ε2
Y = λ3·L + ε3
Z = λ4·L + ε4
it follows that
Cov(W,X)·Cov(Y,Z) = (λ1λ2·σ²_L)(λ3λ4·σ²_L) = (λ1λ3·σ²_L)(λ2λ4·σ²_L) = Cov(W,Y)·Cov(X,Z)
so the tetrad constraints hold: σ_WX·σ_YZ = σ_WY·σ_XZ = σ_WZ·σ_XY.
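A numerical check (my sketch): simulate four indicators of a single latent and compare the three tetrad products.

```python
# Four indicators of one latent satisfy the vanishing-tetrad constraints
# s_WX*s_YZ = s_WY*s_XZ = s_WZ*s_XY up to sampling noise.
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
lam = [0.9, 0.8, 0.7, 0.6]                     # loadings lambda_1..lambda_4
L = rng.normal(size=n)
W, X, Y, Z = (l * L + rng.normal(size=n) for l in lam)

c = np.cov([W, X, Y, Z])
print(c[0, 1] * c[2, 3])   # sigma_WX * sigma_YZ
print(c[0, 2] * c[1, 3])   # sigma_WY * sigma_XZ
print(c[0, 3] * c[1, 2])   # sigma_WZ * sigma_XY -- all three approximately equal
```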

Charles Spearman (1904): Statistical Constraints → Measurement Model Structure
For a single general factor g measured by m1, m2, r1, r2:
ρ(m1,m2)·ρ(r1,r2) = ρ(m1,r1)·ρ(m2,r2) = ρ(m1,r2)·ρ(m2,r1)

Impurities Defeat Rank (e.g., Tetrad) Constraints Selectively
When F1 is the only latent behind x1–x4, all three tetrad constraints hold:
ρ(x1,x2)·ρ(x3,x4) = ρ(x1,x3)·ρ(x2,x4)
ρ(x1,x2)·ρ(x3,x4) = ρ(x1,x4)·ρ(x2,x3)
ρ(x1,x3)·ρ(x2,x4) = ρ(x1,x4)·ρ(x2,x3)
When a second latent F5 also affects two of the indicators, the tetrads involving the covariance of those two indicators fail, while the remaining constraint still holds.
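Extending that sketch to an impurity — on my reading of the figure, a second latent F5 shared by x3 and x4 — the two tetrads whose products involve cov(x3, x4) fail while the third survives:

```python
# Give x3 and x4 a second latent F5: tetrads involving cov(x3, x4) fail,
# the tetrad that avoids cov(x3, x4) still holds.
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
F1 = rng.normal(size=n)
F5 = rng.normal(size=n)                        # impurity shared by x3 and x4
x1 = 0.9 * F1 + rng.normal(size=n)
x2 = 0.8 * F1 + rng.normal(size=n)
x3 = 0.7 * F1 + 0.6 * F5 + rng.normal(size=n)
x4 = 0.6 * F1 + 0.6 * F5 + rng.normal(size=n)

c = np.cov([x1, x2, x3, x4])
print(c[0, 1] * c[2, 3], c[0, 2] * c[1, 3])  # unequal: this tetrad fails
print(c[0, 2] * c[1, 3], c[0, 3] * c[1, 2])  # approximately equal: still holds
```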

2006: Build (1st-Order) Pure Clusters (BPC)
Input: covariance matrix over the set of original items.
BPC (FOFC): 1) cluster (complicated Boolean combinations of tetrads); 2) purify.
Output: an equivalence class of 1st-order pure clusters. BPC is pointwise consistent.
Silva, R., Glymour, C., Scheines, R., and Spirtes, P. (2006). Learning the Structure of Linear Latent Variable Models. Journal of Machine Learning Research, 7, 191–246.

Case Study: Stress, Depression, and Religion
Masters students (N = 127), 61-item survey (Likert scale): Stress (St1–St21), Depression (D1–D20), Religious Coping (C1–C20).
Specified model: Stress → Depression (+), Stress → Coping (+), Coping → Depression (−), with each latent measured by all of its items. p(χ²) = 0.00 — the specified model is rejected.

Case Study: Stress, Depression, and Religion
Build Pure Clusters output: Stress: St3, St4, St16, St18, St20; Depression: Dep9, Dep13, Dep19; Coping: C9, C12, C14, C15.

Case Study: Stress, Depression, and Religion
Assume: Stress is causally prior to Depression.
Find: Stress _||_ Coping | Depression, i.e., Depression screens Coping off from Stress. Resulting model: Stress (+) → Depression (+) → Coping, with p(χ²) = 0.28.

Psychometric Models in Practice: Almost Never 1st-Order Pure or 1st-Order Purifiable
[Figure: a typical bi-factor structure.]

Local Independence / Pure Measurement Models
1st-order pure: each of x1–x12 loads on exactly one of F1, F2, F3.
2nd-order pure: additionally, a general factor G sits above F1, F2, F3, reaching the items only through the first-order latents.

Beyond 1st-Order Purity and Tetrads
Drton, M., Sturmfels, B., and Sullivant, S. (2007). Algebraic factor analysis: tetrads, pentads and beyond. Probability Theory and Related Fields, 138(3-4), 463–493.
Sullivant, S., Talaska, K., and Draisma, J. (2010). Trek Separation for Gaussian Graphical Models. Annals of Statistics, 38(3), 1665–1685.

Trek-Separation
[Definition slide; see Sullivant, Talaska, and Draisma (2010).]
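For reference, the key result, paraphrased from memory of Sullivant, Talaska, and Draisma (2010) rather than taken from the slide:

```latex
% Trek-separation rank bound, paraphrased from Sullivant, Talaska & Draisma
% (2010); see the paper for the precise hypotheses. For a linear SEM on a DAG
% and vertex sets A, B, generically in the parameters,
\[
  \operatorname{rank}\bigl(\Sigma_{A,B}\bigr)
  = \min\bigl\{\, |C_A| + |C_B| \;:\; (C_A, C_B)\ \text{t-separates } A \text{ from } B \,\bigr\}.
\]
% Vanishing tetrads are the rank-one special case with |A| = |B| = 2.
```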

2013: FTFC (2nd-Order Pure Measurement Clusters)
Input: covariance matrix of the measured items.
Output: a subset of items and clusters that are 2nd-order pure.
Kummerfeld and Ramsey.

[Figure: a 2nd-order impure model over G, F1, F2, F3 and items x1–x12.] From its covariance matrix Σ(X), FTFC recovers the 2nd-order pure clusters: {X1, X2, X3, X4, X5, X6} and {X8, X9, X10, X11, X12}.

Psychometric Models in Practice: Bi-factor vs. Higher-Order
[Figure: a bi-factor model and a higher-order model side by side.]

Model Comparison: 2nd-Order vs. Bi-factor
The 2nd-order model entails more constraints: it is simpler and has more degrees of freedom, but the two models are not simply nested. Statistical comparison trades off fit against complexity: fit to data (good) vs. complexity (bad), via p(χ²), BIC, AIC, etc. Theoretically, BIC is the desirable choice here.
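The result figures below report a difference of BICs; assuming the study's dBIC is defined as BIC(bi-factor) − BIC(2nd-order), the sign convention in the legends comes out as follows:

```python
# Sketch of the comparison statistic. The sign convention matches the
# result-figure legends: dBIC > 0 selects the 2nd-order model.
import math

def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Schwarz's BIC: -2 log-likelihood plus a complexity penalty; smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def dbic(ll_2nd, k_2nd, ll_bif, k_bif, n_obs):
    # dBIC > 0: 2nd-order preferred; dBIC < 0: bi-factor preferred
    return bic(ll_bif, k_bif, n_obs) - bic(ll_2nd, k_2nd, n_obs)
```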

Model Comparison: 2nd-Order vs. Bi-factor
Empirical finding: the bi-factor model usually wins; in 65 published studies making the comparison, all chose the bi-factor model. But what if the data-generating process is neither 2nd-order nor bi-factor? What if there is unmodeled complexity?

Simulation Study
[Figure: the true, data-generating model — a 2nd-order model.]

Simulation Study: 2 Strategies
Data come from an impure 2nd-order model.
1. Before FOFC: fit and compare pure versions of the 2nd-order and bi-factor models, with the correct clustering, on all items.
2. After FOFC/FTFC: locate and discard the impure indicators, then fit and compare pure versions of the 2nd-order and bi-factor models, with the correct clustering, on the retained items.

[Figure: the true model and the purified version of the truth, with {X12, X13, X20} removed.]

Simulation Study
[Figure: dBIC results; truth is the 2nd-order model. dBIC > 0: 2nd-order model selected; dBIC < 0: bi-factor model selected.]

Size of the Impurities
[Figure: dBIC as a function of impurity size; truth is the 2nd-order model. dBIC > 0: 2nd-order model selected; dBIC < 0: bi-factor model selected.]

Level of Inhomogeneity
Inhomogeneity: the degree to which the impurities are concentrated on a small number of indicators.
[Figure: low inhomogeneity vs. high inhomogeneity.]

Level of Inhomogeneity
[Figure: dBIC as a function of inhomogeneity; truth is the 2nd-order model. dBIC > 0: 2nd-order model selected; dBIC < 0: bi-factor model selected.]

Sample Size
[Figure: dBIC as a function of sample size; truth is the 2nd-order model. dBIC > 0: 2nd-order model selected; dBIC < 0: bi-factor model selected.]

Thanks

Measurement Model Misspecification → Structural Parameter Bias

Measurement Model Misspecification → Structural Parameter Bias
[Figure: specified model vs. true model.] Parametric measure of difference: Explained Common Variance (ECV).

Measurement Models
[Figure: unidimensional vs. correlated-trait measurement models.]

Psychometric Models in Practice: Bi-factor vs. 2nd (Higher)-Order Models
[Figure: a 2nd-order model and a bi-factor model.]