Day 3: Search Continued
1 Center for Causal Discovery: Day 3: Search Continued. June 15, 2015, Carnegie Mellon University
2 Outline. Models and Data. 1) Bridge Principles: Markov Axiom and D-separation. 2) Model Equivalence. 3) Model Search: A. for Patterns; B. for PAGs. 4) Multiple Regression vs. Model Search. 5) Measurement Issues and Latent Variables.
3 Search Results? 1) Charitable Giving 2) Lead and IQ 3) Timberlake and Williams
4 Constraint-based Search for Patterns: 1) Adjacency phase 2) Orientation phase
5 Constraint-based Search for Patterns: Adjacency phase. X and Y are not adjacent if they are independent conditional on some subset that doesn't contain X and Y. 1) Adjacency: begin with a fully connected undirected graph; remove the adjacency X–Y if X _||_ Y | S for some set S.
6 Causal Graph: X1 → X3 ← X2, X3 → X4. Independencies: X1 _||_ X2; X1 _||_ X4 | {X3}; X2 _||_ X4 | {X3}. Begin with the complete undirected graph over X1–X4. From X1 _||_ X2, remove X1–X2. From X1 _||_ X4 | {X3}, remove X1–X4. From X2 _||_ X4 | {X3}, remove X2–X4.
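The adjacency phase on this example can be sketched in Python. This is an illustrative toy, not Tetrad's implementation: the function names and the strict Fisher-z threshold are my own choices, assuming a linear-Gaussian parameterization of the graph X1 → X3 ← X2, X3 → X4.

```python
import itertools
import math
import numpy as np

def indep(data, i, j, cond, crit=3.5):
    """Fisher-z partial-correlation test of X_i _||_ X_j | cond.
    crit=3.5 is a deliberately strict cutoff for this demo."""
    sub = data[:, [i, j] + list(cond)]
    prec = np.linalg.inv(np.corrcoef(sub, rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    stat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(data.shape[0] - len(cond) - 3)
    return abs(stat) < crit            # True -> treat as independent

def adjacency_phase(data):
    """Begin fully connected; remove X-Y if X _||_ Y | S for some subset S
    of X's remaining neighbours, increasing |S| one step at a time."""
    p = data.shape[1]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    sepset = {}
    for depth in range(p - 1):
        for i in range(p):
            for j in sorted(adj[i]):
                for S in itertools.combinations(sorted(adj[i] - {j}), depth):
                    if indep(data, i, j, S):
                        adj[i].discard(j); adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(S)
                        break
    return adj, sepset

# Simulate the slide's graph: X1 -> X3 <- X2, X3 -> X4 (0-indexed)
rng = np.random.default_rng(0)
n = 5000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + x2 + rng.normal(size=n)
x4 = x3 + rng.normal(size=n)
adj, sepset = adjacency_phase(np.column_stack([x1, x2, x3, x4]))
# With this seed, only X1-X3, X2-X3, X3-X4 should remain
print({k: sorted(v) for k, v in adj.items()})
```

The recorded separating sets (the empty set separates X1 from X2; {X3} separates X1 from X4 and X2 from X4) are exactly what the orientation phase consumes next.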
7 Constraint-based Search for Patterns: Orientation phase. 2) Orientation. Collider test: find unshielded triples X – Y – Z and orient according to whether the set that separated X and Z contains Y. Away-from-collider test: find triples X → Y – Z whose first edge was oriented by the collider test, and orient the Y – Z connection. Repeat until no further orientations are possible, then apply the Meek rules.
8 Search: Orientation. Unshielded triple X – Y – Z. Test: given X _||_ Z | S, is Y in S? No: collider, X → Y ← Z. Yes: non-collider, i.e. X → Y → Z, X ← Y ← Z, or X ← Y → Z.
9 Search: Orientation, Away-from-Collider Test. Conditions: 1) X1 and X2 adjacent, with the edge oriented into X2; 2) X2 and X3 adjacent; 3) X1 and X3 not adjacent. Test: X1 _||_ X3 | S with X2 in S? No: orient into X2 (X2 is a collider). Yes: orient X2 → X3 (X2 is a non-collider).
10 Search: Orientation. After the adjacency phase: X1 – X3, X2 – X3, X3 – X4. Collider test on X1 – X3 – X2: X1 _||_ X2 with X3 not in the separating set, so orient X1 → X3 ← X2. Away-from-collider test on X1 → X3 – X4: X1 _||_ X4 | {X3} and X2 _||_ X4 | {X3}, so orient X3 → X4.
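The collider and away-from-collider steps on this example can be sketched with the skeleton and separating sets hard-coded (an illustrative toy; the function names are mine, not Tetrad's):

```python
def orient_colliders(adj, sepset):
    """For each unshielded triple X - Y - Z, orient X -> Y <- Z
    iff Y is absent from the set that separated X and Z."""
    arrows = set()                      # directed edges as (tail, head)
    for y in adj:
        for x in sorted(adj[y]):
            for z in sorted(adj[y]):
                if x < z and z not in adj[x] and y not in sepset[frozenset((x, z))]:
                    arrows.add((x, y)); arrows.add((z, y))
    return arrows

def away_from_collider(adj, arrows):
    """If X -> Y - Z with X, Z non-adjacent and Y - Z still unoriented,
    orient Y -> Z (otherwise Y would have been flagged as a collider)."""
    changed = True
    while changed:
        changed = False
        for x, y in list(arrows):
            for z in sorted(adj[y]):
                if z != x and z not in adj[x] and (y, z) not in arrows and (z, y) not in arrows:
                    arrows.add((y, z)); changed = True
    return arrows

# Skeleton and separating sets for X1 -> X3 <- X2, X3 -> X4 (0-indexed)
adj = {0: {2}, 1: {2}, 2: {0, 1, 3}, 3: {2}}
sepset = {frozenset((0, 1)): set(),    # X1 _||_ X2 given {}
          frozenset((0, 3)): {2},     # X1 _||_ X4 given {X3}
          frozenset((1, 3)): {2}}     # X2 _||_ X4 given {X3}
arrows = away_from_collider(adj, orient_colliders(adj, sepset))
print(sorted(arrows))   # [(0, 2), (1, 2), (2, 3)]: X1 -> X3 <- X2, X3 -> X4
```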
11 Away-from-Collider Power! Given X1 → X2 – X3 and X1 _||_ X3 | S with X2 in S, the X2 – X3 edge is oriented as X2 → X3. Why does this test also show that X2 and X3 are not confounded? If a latent common cause C of X2 and X3 existed, conditioning on the collider X2 would open the path X1 → X2 ← C → X3, so X1 _||_ X3 | S with X2 in S (and C necessarily not in S) would fail.
12 Independence Equivalence Classes: Patterns & PAGs. Patterns (Verma and Pearl, 1990): a graphical representation of d-separation equivalence among models with no latent common causes. PAGs (Richardson, 1994): a graphical representation of a d-separation equivalence class that includes models with latent common causes and sample selection bias that are d-separation equivalent over a set of measured variables X.
13 PAGs: Partial Ancestral Graphs. [Figure: a PAG over X1, X2, X3 and several of the DAGs it represents, including ones with latent common causes T1, T2.]
14 PAGs: Partial Ancestral Graphs. [Figure: a PAG over Z1, Z2, X, Y and several of the DAGs it represents, including ones with latent common causes T1, T2.]
15 PAGs: Partial Ancestral Graphs. What PAG edges mean. No edge: X1 and X2 are not adjacent. X1 o→ X2: X2 is not an ancestor of X1. X1 o–o X2: no set d-separates X2 and X1. X1 → X2: X1 is a cause of X2. X1 ↔ X2: there is a latent common cause of X1 and X2.
16 PAG Search: Orientation. Unshielded triple X o–o Y o–o Z. X _||_ Z with Y not in the separating set: collider, X o→ Y ←o Z. X _||_ Z given a set containing Y: non-collider.
17 PAG Search: Orientation. After the adjacency phase: X1 o–o X3, X2 o–o X3, X3 o–o X4. Collider test on X1 o–o X3 o–o X2: X1 _||_ X2 with X3 not in the separating set, so orient X1 o→ X3 ←o X2. Away-from-collider test on X1 o→ X3 o–o X4: X1 _||_ X4 | {X3} and X2 _||_ X4 | {X3}, so orient X3 → X4.
18 Interesting Cases. [Figure: four models M1–M4 with latent variables L, L1, L2 over measured variables such as X, Y, Z, X1, X2, Y1, Y2, Z1, Z2.]
19 Tetrad Demo and Hands-on. 1) Create a new session. 2) Select "Search from Simulated Data" from the Template menu. 3) Build graphs for the interesting cases M1, M2, M3; parameterize, instantiate, and generate sample data, N = 1,000. 4) Execute a PC search, α = .05. 5) Execute an FCI search, α = .05.
20 Regression & Causal Inference
21 Regression & Causal Inference. Typical (non-experimental) strategy: 1. Establish a prima facie case (X associated with Y); but beware omitted variable bias from a confounder Z of X and Y. 2. So, identify and measure potential confounders Z: a) prior to X, b) associated with X, c) associated with Y. 3. Statistically adjust for Z (multiple regression).
22 Regression & Causal Inference. Multiple regression, or any similar strategy, is provably unreliable for causal inference regarding X → Y with covariates Z unless: X is truly prior to Y, and X, Z, and Y are causally sufficient (no confounding).
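A quick simulation (my own illustration, not from the slides' data) of why the strategy fails when the confounder is unmeasured: with Z a common cause of X and Y and no true X → Y effect, the regression of Y on X alone is biased.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n)                 # confounder
x = z + rng.normal(size=n)             # Z -> X
y = z + rng.normal(size=n)             # Z -> Y; X has NO effect on Y

# Y on X alone: slope -> Cov(X, Y) / Var(X) = 1/2, not the true effect 0
b_naive = np.polyfit(x, y, 1)[0]
# Adjusting for the actual confounder recovers the null effect
b_adj = np.linalg.lstsq(np.column_stack([x, z, np.ones(n)]), y, rcond=None)[0][0]
print(round(b_naive, 2), round(b_adj, 2))   # ~0.5 vs ~0.0
```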
23 Tetrad Demo and Hands-on. 1) Create a new session. 2) Select "Search from Simulated Data" from the Template menu. 3) Build a graph for the interesting case M4; parameterize as a SEM, instantiate, and generate sample data, N = 1,000. 4) Execute a PC search, α = .05. 5) Execute an FCI search, α = .05.
24 Measurement
25 Measurement Error and Coarsening Endanger Conditional Independence! With X → Z → Y we have X _||_ Y | Z. Measurement error: Z* = Z + e; then X is not _||_ Y | Z* (unless Var(e) = 0). Coarsening: Z* = 0 if −∞ < Z < 0; Z* = 1 if 0 ≤ Z < i; Z* = 2 if i ≤ Z < j; …; Z* = k if the top cutpoint ≤ Z < ∞. Then X is not _||_ Y | Z* (almost always).
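Both hazards can be simulated in a few lines (illustrative code; the partial-correlation helper stands in for a proper conditional-independence test): X → Z → Y, so X and Y are screened off by Z but not by a noisy or binarized copy of Z.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(size=n)
z = x + rng.normal(size=n)             # X -> Z
y = z + rng.normal(size=n)             # Z -> Y
z_star = z + rng.normal(size=n)        # measurement error: Z* = Z + e
z_coarse = (z > 0).astype(float)       # coarsening: binarized Z

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    ra = a - np.polyfit(c, a, 1)[0] * c
    rb = b - np.polyfit(c, b, 1)[0] * c
    return np.corrcoef(ra, rb)[0, 1]

print(round(partial_corr(x, y, z), 3))        # ~0: Z screens off
print(round(partial_corr(x, y, z_star), 3))   # clearly nonzero
print(round(partial_corr(x, y, z_coarse), 3)) # clearly nonzero
```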
26 Strategies. 1. Parameterize the measurement error (Z* = Z + e): sensitivity analysis, Bayesian analysis, bounds. 2. Multiple indicators: measure Z with several indicators Z1, Z2.
27 Strategies. 1. Parameterize the measurement error: sensitivity analysis, Bayesian analysis, bounds. 2. Multiple indicators combined into a scale: Z_scale built from Z1, Z2; but X is not _||_ Y | Z_scale.
28 Pseudorandom sample: N = 2,000. Model: Parental Resources → Lead Exposure, Parental Resources → IQ. Regression of IQ on Lead and PR. [Table: coefficient estimates and p-values for PR and Lead.]
29 Multiple Measures of the Confounder. X1 := γ1·Parental Resources + e1; X2 := γ2·Parental Resources + e2; X3 := γ3·Parental Resources + e3. [Figure: Parental Resources → X1, X2, X3 (errors e1, e2, e3); Parental Resources → Lead Exposure, Parental Resources → IQ.]
30 Scales don't preserve conditional independence. PR_scale = (X1 + X2 + X3) / 3. Regression of IQ on Lead and PR_scale. [Table: coefficient estimates and p-values for PR_scale and Lead.]
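The scale failure can be reproduced in a few lines (an illustrative re-creation of the slides' setup, with loadings and effect sizes of my own choosing): Lead has no true effect on IQ, yet regressing on the averaged scale leaves a clearly nonzero Lead coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
pr = rng.normal(size=n)                           # Parental Resources (latent)
lead = -pr + rng.normal(size=n)                   # PR -> Lead
iq = pr + rng.normal(size=n)                      # PR -> IQ; Lead has NO effect
xs = [pr + rng.normal(size=n) for _ in range(3)]  # indicators X1, X2, X3
pr_scale = sum(xs) / 3                            # PR_scale = (X1 + X2 + X3) / 3

def lead_coef(control):
    """Coefficient on Lead when regressing IQ on Lead and the control."""
    A = np.column_stack([lead, control, np.ones(n)])
    return np.linalg.lstsq(A, iq, rcond=None)[0][0]

print(round(lead_coef(pr), 2))        # ~0.0 : the true confounder screens off
print(round(lead_coef(pr_scale), 2))  # ~ -0.2: the scale does not
```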
31 Indicators Don't Preserve Conditional Independence. Regress IQ on: Lead, X1, X2, X3. [Table: coefficient estimates and p-values for X1, X2, X3, and Lead.]
32 Strategies. 1. Parameterize the measurement error: sensitivity analysis, Bayesian analysis, bounds. 2. Multiple indicators: a scale, or a latent-variable SEM with Z latent and indicators Z1, Z2 (errors ez1, ez2). In the SEM, E(β̂_yx) = 0 exactly when X _||_ Y | Z.
33 Structural Equation Models Work. True model and estimated model: latent Parental Resources with indicators X1, X2, X3; Parental Resources → Lead Exposure, Parental Resources → IQ. In the structural equation model, E(β̂) = 0; here β̂ = .07 (p-value = .499): Lead and IQ are screened off by PR.
34 Coarsening is Bad. PR_binary = 0 if Parental Resources < m(PR); PR_binary = 1 if Parental Resources ≥ m(PR). Regression of IQ on Lead and PR_binary: neither PR_binary nor Lead is screened off at .05.
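A median-split version of the Lead/IQ example (an illustrative re-creation with effect sizes of my own choosing): adjusting for the binarized confounder leaves a spurious Lead coefficient, while the continuous confounder screens Lead off.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
pr = rng.normal(size=n)                  # Parental Resources
lead = -pr + rng.normal(size=n)          # PR -> Lead
iq = pr + rng.normal(size=n)             # PR -> IQ; no direct Lead -> IQ effect
pr_binary = (pr >= np.median(pr)).astype(float)

def lead_coef(control):
    """Coefficient on Lead when regressing IQ on Lead and the control."""
    A = np.column_stack([lead, control, np.ones(n)])
    return np.linalg.lstsq(A, iq, rcond=None)[0][0]

print(round(lead_coef(pr), 2))         # ~0.0 : continuous PR screens off
print(round(lead_coef(pr_binary), 2))  # clearly negative: median-split PR does not
```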
36 TV → Obesity. Proctor, et al. (2003). Television viewing and change in body fat from preschool to early adolescence: The Framingham Children's Study. International Journal of Obesity, 27. Model: TV → Exercise → Obesity (BMI), TV → Diet → Obesity (BMI). Goals: estimate the influence of TV on BMI; tease apart the mechanisms (diet, exercise).
37 Measures of Exercise, Diet. Model: TV (age 4) → Exercise and Diet (calories) → Obesity (BMI, age 11); measured proxies: Exercise_M [L, H], Diet_M [L, H]. Exercise_M = L: calories expended in exercise in the bottom two tertiles; Exercise_M = H: top tertile. Diet_M = L: calories consumed in the bottom two tertiles; Diet_M = H: top tertile.
38 Measures of Exercise, Diet. Findings: TV and Obesity are NOT screened off by Exercise_M and Diet_M; the bias in estimating the mechanisms is unknown.
39 Problems with Latent Variable SEMs. Specified model vs. true model: the true model has additional latent structure among the indicators Z1, Z2. Under the specified model, E(β̂_yx) = 0 exactly when X _||_ Y | Z; under the true model, E(β̂_yx) ≠ β_xy.
40 Latent Variable Models. Full model = structural model (relations among the latents F1, F2, F3) + measurement model (F1, F2, F3 and their indicators x1–x4, y1–y4, z1–z4).
41 Psychometric Models: Social/Personality Psychology. Latents: Stress (items St1–St21), Depression (items Dep1–Dep20), Coping via Religion (items C1–C20). Question: does Stress (+) cause Depression, and what does Coping via Religion contribute?
42 Psychometric Models: Educational Research
43 Local Independence / Pure Measurement Models. Local independence: for every pair of measured items xi, xj: xi _||_ xj | the modeled latent parents of xi. [Figure: a model that is not locally independent (an extra latent F affects several indicators) vs. a locally independent model F1, F2, F3 with indicators x1–x12.]
44 Local Independence / Pure Measurement Models. [Figure: an impure measurement model (indicators x1–x4, y1–y4, z1–z4 with an extra latent F) vs. a 1st-order pure model F1, F2, F3 with indicators x1–x12.]
45 Local Independence / Pure Measurement Models. [Figure: a 1st-order pure model F1, F2, F3 over x1–x12 vs. a 2nd-order pure model that adds a second-order latent G.]
46 Rank-1 Constraints: Tetrad Equations. Fact: given W = λ1·L + e1, X = λ2·L + e2, Y = λ3·L + e3, Z = λ4·L + e4, it follows that Cov(W,X)·Cov(Y,Z) = λ1λ2·Var(L) · λ3λ4·Var(L) = λ1λ3·Var(L) · λ2λ4·Var(L) = Cov(W,Y)·Cov(X,Z). The tetrad constraints: ρWX·ρYZ = ρWY·ρXZ = ρWZ·ρXY.
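The tetrad equations are easy to check numerically. A sketch with illustrative loadings of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
L = rng.normal(size=n)                 # the single latent, Var(L) = 1
lam = [1.0, 0.8, 1.2, 0.5]             # loadings lambda_1..lambda_4
w, x, y, z = (l * L + rng.normal(size=n) for l in lam)

C = np.cov([w, x, y, z])
t1 = C[0, 1] * C[2, 3]    # Cov(W,X) Cov(Y,Z)
t2 = C[0, 2] * C[1, 3]    # Cov(W,Y) Cov(X,Z)
t3 = C[0, 3] * C[1, 2]    # Cov(W,Z) Cov(X,Y)
# In the population all three equal lambda_1 lambda_2 lambda_3 lambda_4 Var(L)^2 = 0.48
print(round(t1, 2), round(t2, 2), round(t3, 2))
```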
47 Charles Spearman (1904): statistical constraints reveal measurement model structure. With a single general factor g measured by m1, m2, r1, r2: r(m1,m2)·r(r1,r2) = r(m1,r1)·r(m2,r2) = r(m1,r2)·r(m2,r1).
48 Impurities (deviations from local independence) defeat tetrad constraints selectively. If F1 is the only common cause of x1–x4, all three constraints hold: r(x1,x2)·r(x3,x4) = r(x1,x3)·r(x2,x4) = r(x1,x4)·r(x2,x3). If an additional latent F5 affects, say, x3 and x4, the constraints involving r(x3,x4) fail, while r(x1,x3)·r(x2,x4) = r(x1,x4)·r(x2,x3) still holds.
49 Strategies. 1. Cluster and purify the measurement model (MM) first: a) use rank constraints to find item subsets that form nth-order pure clusters; b) using the pure MM, search for the structural model by testing independence relations among latents via SEM estimation. 2. Specify an impure measurement model: a) specify a measurement model for all items; b) using the specified MM, search for the structural model by testing independence relations among latents via SEM estimation.
50 Purify. [Figure: an impure measurement model over x1–x4, y1–y4, z1–z4 is purified by dropping the impure items (here x4 and z2), yielding a 1st-order purified model.]
51 Search for Measurement Models. BPC, FOFC (Find One-Factor Clusters). Input: covariance matrix of the measured items. Output: a subset of items and clusters that are 1st-order pure. FTFC (Find Two-Factor Clusters). Input: covariance matrix of the measured items. Output: a subset of items and clusters that are 2nd-order pure.
52 BPC Case Study: Stress, Depression, and Religion. Masters students (N = 127); 61-item survey (Likert scale). Stress: St1–St21. Depression: D1–D20. Religious Coping: C1–C20. Specified model: Stress → Depression (+), Stress → Coping (+), Coping → Depression (−). P(χ²) = 0.00, so the specified model is rejected.
53 Case Study: Stress, Depression, and Religion. Build Pure Clusters output: Stress: St3, St4, St16, St18, St20. Depression: Dep9, Dep13, Dep19. Coping: C9, C12, C14, C15.
54 Case Study: Stress, Depression, and Religion. Assume: Stress is causally prior to Depression. Find: Stress _||_ Coping | Depression. [Figure: the resulting model Stress → Depression → Coping over the purified clusters, with its P(χ²).]
55 2nd-Order Pure. [Figure: F1, F2, F3 over x1–x12 with a second-order latent G and selection S(X).] FTFC 2nd-order pure clusters: {X1, X2, X3, X4, X5, X6}, {X8, X9, X10, X11, X12}.
56 Summary of Search
57 Causal Search from Passive Observation. PC, FGS: patterns (Markov equivalence class; no latent confounding). FCI: PAGs (Markov equivalence, including confounders and selection bias). CCD: linear cyclic models (no confounding). LiNGAM: a unique DAG (no confounding; linear, non-Gaussian; faithfulness not needed). BPC, FOFC, FTFC: an equivalence class of linear latent variable models. LvLiNGAM: a set of DAGs (confounders allowed). CyclicLiNGAM: a set of directed graphs (cyclic models, no confounding). Non-linear additive noise models: a unique DAG. Most of these algorithms are pointwise consistent; uniformly consistent algorithms require stronger assumptions.
58 Causal Search from Manipulations/Interventions. What sorts of manipulations/interventions have been studied? Do(X = x): replace P(X | parents(X)) with P(X = x) = 1.0. Randomize(X): replace P(X | parents(X)) with P_M(X), e.g., uniform. Soft interventions: replace P(X | parents(X)) with P_M(X | parents(X), I), P_M(I). Simultaneous interventions: reduce the number of experiments required to be guaranteed to find the truth, given an independence oracle, from N − 1 to 2·log(N). Sequential interventions. Sequential, conditional interventions. Time-sensitive interventions.
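The Do(X = x) manipulation can be made concrete on a three-variable discrete example (the numbers are my own choosing): truncate the factorization by replacing P(X | parents(X)) with a point mass, and compare with ordinary conditioning.

```python
# Confounded chain: Z -> X, Z -> Y, X -> Y (all binary)
pz = {0: 0.5, 1: 0.5}                                  # P(Z = z)
px1_z = {0: 0.2, 1: 0.8}                               # P(X = 1 | Z = z)
py1_xz = {(0, 0): 0.1, (1, 0): 0.5, (0, 1): 0.4, (1, 1): 0.9}  # P(Y = 1 | X, Z)

# Observational: P(Y = 1 | X = 1) reweights Z by P(z | X = 1)
px1 = sum(pz[zv] * px1_z[zv] for zv in pz)
p_obs = sum(pz[zv] * px1_z[zv] * py1_xz[(1, zv)] for zv in pz) / px1

# Interventional: do(X = 1) replaces P(X | Z) with 1, so Z keeps its marginal P(z)
p_do = sum(pz[zv] * py1_xz[(1, zv)] for zv in pz)

print(round(p_obs, 3), round(p_do, 3))   # 0.82 0.7
```

The gap between the two numbers is exactly the confounding through Z that randomization removes.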
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationarxiv: v1 [cs.ai] 16 Sep 2015
Causal Model Analysis using Collider v-structure with Negative Percentage Mapping arxiv:1509.090v1 [cs.ai 16 Sep 015 Pramod Kumar Parida Electrical & Electronics Engineering Science University of Johannesburg
More informationData Mining 2018 Bayesian Networks (1)
Data Mining 2018 Bayesian Networks (1) Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Data Mining 1 / 49 Do you like noodles? Do you like noodles? Race Gender Yes No Black Male 10
More informationGeneralized Do-Calculus with Testable Causal Assumptions
eneralized Do-Calculus with Testable Causal Assumptions Jiji Zhang Division of Humanities and Social Sciences California nstitute of Technology Pasadena CA 91125 jiji@hss.caltech.edu Abstract A primary
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationSEM with observed variables: parameterization and identification. Psychology 588: Covariance structure and factor models
SEM with observed variables: parameterization and identification Psychology 588: Covariance structure and factor models Limitations of SEM as a causal modeling 2 If an SEM model reflects the reality, the
More information1. what conditional independencies are implied by the graph. 2. whether these independecies correspond to the probability distribution
NETWORK ANALYSIS Lourens Waldorp PROBABILITY AND GRAPHS The objective is to obtain a correspondence between the intuitive pictures (graphs) of variables of interest and the probability distributions of
More informationSampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo. Sampling Methods. Oliver Schulte - CMPT 419/726. Bishop PRML Ch.
Sampling Methods Oliver Schulte - CMP 419/726 Bishop PRML Ch. 11 Recall Inference or General Graphs Junction tree algorithm is an exact inference method for arbitrary graphs A particular tree structure
More informationGov 2002: 4. Observational Studies and Confounding
Gov 2002: 4. Observational Studies and Confounding Matthew Blackwell September 10, 2015 Where are we? Where are we going? Last two weeks: randomized experiments. From here on: observational studies. What
More informationGRAPHICAL MARKOV MODELS,
7 th International Conference on Computational and Methodological Statistics (ERCIM 2014), 6-8 December 2014, University of Pisa, Italy: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Three sessions
More informationThe International Journal of Biostatistics
The International Journal of Biostatistics Volume 7, Issue 1 2011 Article 16 A Complete Graphical Criterion for the Adjustment Formula in Mediation Analysis Ilya Shpitser, Harvard University Tyler J. VanderWeele,
More informationProbabilistic Graphical Networks: Definitions and Basic Results
This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical
More informationDetecting marginal and conditional independencies between events and learning their causal structure.
Detecting marginal and conditional independencies between events and learning their causal structure. Jan Lemeire 1,4, Stijn Meganck 1,3, Albrecht Zimmermann 2, and Thomas Dhollander 3 1 ETRO Department,
More informationProbabilistic Graphical Models (I)
Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random
More information1 h 9 e $ s i n t h e o r y, a p p l i c a t i a n
T : 99 9 \ E \ : \ 4 7 8 \ \ \ \ - \ \ T \ \ \ : \ 99 9 T : 99-9 9 E : 4 7 8 / T V 9 \ E \ \ : 4 \ 7 8 / T \ V \ 9 T - w - - V w w - T w w \ T \ \ \ w \ w \ - \ w \ \ w \ \ \ T \ w \ w \ w \ w \ \ w \
More informationLearning from Sensor Data: Set II. Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University
Learning from Sensor Data: Set II Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University 1 6. Data Representation The approach for learning from data Probabilistic
More informationHOMEWORK (due Wed, Jan 23): Chapter 3: #42, 48, 74
ANNOUNCEMENTS: Grades available on eee for Week 1 clickers, Quiz and Discussion. If your clicker grade is missing, check next week before contacting me. If any other grades are missing let me know now.
More informationCausal Discovery with Linear Non-Gaussian Models under Measurement Error: Structural Identifiability Results
Causal Discovery with Linear Non-Gaussian Models under Measurement Error: Structural Identifiability Results Kun Zhang,MingmingGong?, Joseph Ramsey,KayhanBatmanghelich?, Peter Spirtes,ClarkGlymour Department
More informationCORRELATION. compiled by Dr Kunal Pathak
CORRELATION compiled by Dr Kunal Pathak Flow of Presentation Definition Types of correlation Method of studying correlation a) Scatter diagram b) Karl Pearson s coefficient of correlation c) Spearman s
More informationStrategies for Discovering Mechanisms of Mind using fmri: 6 NUMBERS. Joseph Ramsey, Ruben Sanchez Romero and Clark Glymour
1 Strategies for Discovering Mechanisms of Mind using fmri: 6 NUMBERS Joseph Ramsey, Ruben Sanchez Romero and Clark Glymour 2 The Numbers 20 50 5024 9205 8388 500 3 fmri and Mechanism From indirect signals
More informationAxioms of Probability? Notation. Bayesian Networks. Bayesian Networks. Today we ll introduce Bayesian Networks.
Bayesian Networks Today we ll introduce Bayesian Networks. This material is covered in chapters 13 and 14. Chapter 13 gives basic background on probability and Chapter 14 talks about Bayesian Networks.
More informationUncertain Reasoning. Environment Description. Configurations. Models. Bayesian Networks
Bayesian Networks A. Objectives 1. Basics on Bayesian probability theory 2. Belief updating using JPD 3. Basics on graphs 4. Bayesian networks 5. Acquisition of Bayesian networks 6. Local computation and
More informationREVIEW 8/2/2017 陈芳华东师大英语系
REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p
More information