Recovering the Graph Structure of Restricted Structural Equation Models


Recovering the Graph Structure of Restricted Structural Equation Models
Workshop on Statistics for Complex Networks, Eindhoven, 31st January 2013
Jonas Peters (1), J. Mooij (3), D. Janzing (2), B. Schölkopf (2), P. Bühlmann (1)
(1) Seminar for Statistics, ETH Zürich, Switzerland; (2) MPI for Intelligent Systems, Tübingen, Germany; (3) Radboud University, Nijmegen, Netherlands

How to win a Nobel Prize. F. H. Messerli: Chocolate Consumption, Cognitive Function, and Nobel Laureates, N Engl J Med 2012



What is the Problem? Given some data, what is the causal structure of the underlying mechanism? Knowing it lets us understand the (physical) process in more detail. To learn it, we can intervene (see Alain's talk tomorrow), or we can use observational data!

What is the Problem? Theoretical: given $P(X_1, \ldots, X_5)$, can we recover the DAG $G_0$ over $X_1, \ldots, X_5$? Practical: given only iid observations from $P(X_1, \ldots, X_5)$, i.e. an estimate of $P$, can we recover $G_0$?

Structural Equation Models (SEMs): The joint distribution $P(X_1, \ldots, X_p)$ satisfies a structural equation model (SEM) with DAG $G_0$ if
$X_i = f_i(X_{PA_i}, N_i), \quad 1 \le i \le p,$
where $X_{PA_i}$ are the parents of $X_i$ in $G_0$. The noise variables $N_i$ are required to be jointly independent.
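Because every $X_i$ depends only on its parents and its own noise, an SEM can be sampled by evaluating the equations in a topological order of $G_0$. The sketch below illustrates this; the helper names `sample_sem` and `topological_order`, and the concrete functions and Gaussian noise in the usage example, are hypothetical choices made only for illustration (the graph is the $G_0$ from the next slide, with 0-based node indices).

```python
import numpy as np

def topological_order(parents):
    """Order nodes so that every node comes after all of its parents."""
    order, placed = [], set()
    while len(order) < len(parents):
        progress = False
        for i, pa in enumerate(parents):
            if i not in placed and all(j in placed for j in pa):
                order.append(i)
                placed.add(i)
                progress = True
        if not progress:
            raise ValueError("graph has a cycle")
    return order

def sample_sem(parents, functions, noise_samplers, n, rng):
    """Draw n iid samples from an SEM X_i = f_i(X_PA_i, N_i).

    parents[i]        : list of parent indices of node i
    functions[i]      : callable f_i(parent_values, noise) -> values
    noise_samplers[i] : callable (n, rng) -> noise vector N_i
    """
    X = np.empty((n, len(parents)))
    for i in topological_order(parents):
        noise = noise_samplers[i](n, rng)          # fresh, independent N_i
        X[:, i] = functions[i](X[:, parents[i]], noise)
    return X

# Hypothetical example with the graph G_0: X3 -> X1, X2 -> X3, {X2, X3} -> X4
rng = np.random.default_rng(0)
parents = [[2], [], [1], [1, 2]]
functions = [
    lambda pa, e: np.tanh(pa[:, 0]) + e,
    lambda pa, e: e,
    lambda pa, e: pa[:, 0] ** 2 + e,
    lambda pa, e: pa[:, 0] * pa[:, 1] + e,
]
noise = [lambda n, r: r.normal(size=n)] * 4
X = sample_sem(parents, functions, noise, 1000, rng)
print(X.shape)  # (1000, 4)
```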

Structural Equation Models (SEMs): $P(X_1, \ldots, X_4)$ could be generated by an SEM with graph $G$:
$X_1 = f_1(N_1)$, $X_2 = f_2(X_3, X_4, N_2)$, $X_3 = f_3(X_1, N_3)$, $X_4 = f_4(X_3, N_4)$, with the $N_i$ jointly independent,
whereas it is generated by an SEM with graph $G_0$:
$X_1 = f_1(X_3, N_1)$, $X_2 = f_2(N_2)$, $X_3 = f_3(X_2, N_3)$, $X_4 = f_4(X_2, X_3, N_4)$, with the $N_i$ jointly independent.

Structural Equation Models (SEMs): $P(X_1, \ldots, X_4)$ could be generated by an SEM with graph $G$:
$X_1 = g_1(M_1)$, $X_2 = g_2(X_3, X_4, M_2)$, $X_3 = g_3(X_1, M_3)$, $X_4 = g_4(X_1, X_3, M_4)$, with the $M_i$ jointly independent,
whereas it is generated by an SEM with graph $G_0$:
$X_1 = f_1(X_3, N_1)$, $X_2 = f_2(N_2)$, $X_3 = f_3(X_2, N_3)$, $X_4 = f_4(X_2, X_3, N_4)$, with the $N_i$ jointly independent.

SEMs are not identifiable. Proposition: Given a distribution $P(X_1, \ldots, X_p)$ and any graph $G$ such that $P$ is Markov with respect to $G$, we can find an SEM with graph $G$ that generates $P$. Special case: two variables, where either direction is possible. JP: Restricted Structural Equation Models for Causal Inference, PhD Thesis 2012 (and others?)
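For the two-variable special case with linear Gaussian data, the backward SEM can be written down explicitly. The short simulation below is only an illustration under that Gaussian assumption (the coefficient `beta = 0.8` and the sample size are arbitrary): the least-squares residual $M_1$ of $X_1$ on $X_2$ is uncorrelated with $X_2$, and for jointly Gaussian variables this means independent, so $X_2 = M_2$, $X_1 = \gamma X_2 + M_1$ is an equally valid SEM.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 100_000, 0.8
x1 = rng.normal(size=n)                 # X1 = N1
x2 = beta * x1 + rng.normal(size=n)     # X2 = beta * X1 + N2 (forward SEM)

# Backward SEM X1 = gamma * X2 + M1, gamma = least-squares coefficient.
gamma = np.cov(x1, x2)[0, 1] / np.var(x2, ddof=1)
m1 = x1 - gamma * x2

# M1 is uncorrelated with X2 by construction; in the Gaussian case this
# implies independence, so the backward model is a genuine SEM.
print(round(gamma, 2), np.corrcoef(m1, x2)[0, 1])
```

In the population, $\gamma = \beta / (1 + \beta^2)$, which the sample estimate approaches as $n$ grows.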

The Idea: We gain identifiability by restricting the function class, i.e. by excluding certain combinations of functions, input distributions and noise distributions.

Two Variables - Good I: Let $X_1 = N_1$ and $X_2 = \beta X_1 + N_2$ with $\beta \neq 0$ and $N_1, N_2 \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$ (graph $X_1 \to X_2$). Then there is no linear SEM with the same error variances in the backward direction $X_2 \to X_1$.
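A short derivation makes the claim concrete. In any backward linear SEM $X_2 = M_2$, $X_1 = \gamma X_2 + M_1$ with $M_1$ independent of $X_2$, the coefficient $\gamma$ must be the regression coefficient of $X_1$ on $X_2$, so the error variances are forced:

```latex
\begin{aligned}
\operatorname{Var}(X_1) &= \sigma^2, \qquad \operatorname{Var}(X_2) = (1+\beta^2)\,\sigma^2, \qquad \operatorname{Cov}(X_1, X_2) = \beta\,\sigma^2,\\
\gamma &= \frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_2)} = \frac{\beta}{1+\beta^2},\\
\operatorname{Var}(M_1) &= \operatorname{Var}(X_1) - \gamma^2 \operatorname{Var}(X_2) = \sigma^2 - \frac{\beta^2 \sigma^2}{1+\beta^2} = \frac{\sigma^2}{1+\beta^2},\\
\operatorname{Var}(M_2) &= \operatorname{Var}(X_2) = (1+\beta^2)\,\sigma^2 .
\end{aligned}
```

The two backward error variances coincide only if $(1+\beta^2)^2 = 1$, i.e. $\beta = 0$, which is excluded.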

Two Variables - Good II: Consider a distribution corresponding to $X_1 = N_1$, $X_2 = X \ldots N_2$ (graph $X_1 \to X_2$) with $N_1 \perp N_2$, $N_1 \sim U[0.1, 0.9]$ and $N_2 \sim U[-0.15, 0.15]$.


Two Variables - Good II: Consider a distribution corresponding to $X_1 = N_1$, $X_2 = f(X_1) + N_2$ (graph $X_1 \to X_2$) with $N_1 \perp N_2$. For most combinations $(f, P(N_1), P(N_2))$ there is no backward model $X_1 = g(X_2) + M_1$, $X_2 = M_2$ (graph $X_2 \to X_1$) with $M_1 \perp M_2$. More or less one exception: (linear, Gaussian, Gaussian) with different error variances. P. Hoyer, D. Janzing, J. Mooij, JP and B. Schölkopf: Nonlinear causal discovery with additive noise models, NIPS 2008
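The asymmetry can be exploited in practice by regressing in both directions and asking in which direction the residuals look independent of the input. The sketch below is a simplified stand-in for the cited method, not the authors' implementation: it swaps in polynomial least squares for the regression and compares raw HSIC statistics (Gaussian kernel, median heuristic) instead of running a proper independence test. The functions `hsic`, `residuals` and `anm_direction` are hypothetical names, and $f(x) = x^3$ is an assumed example function; the noise distributions follow the previous slide.

```python
import numpy as np

def gram(x):
    """Gaussian-kernel Gram matrix; bandwidth chosen by the median heuristic."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def hsic(x, y):
    """Biased empirical HSIC; close to zero when x and y are independent."""
    n = len(x)
    H = np.eye(n) - 1.0 / n                  # centering matrix I - 11^T / n
    return np.trace(gram(x) @ H @ gram(y) @ H) / (n - 1) ** 2

def residuals(cause, effect, degree=5):
    """Residuals of a polynomial least-squares fit effect ~ poly(cause)."""
    coef = np.polyfit(cause, effect, degree)
    return effect - np.polyval(coef, cause)

def anm_direction(x1, x2, degree=5):
    """Prefer the direction whose residuals are less dependent on the input."""
    dep_12 = hsic(x1, residuals(x1, x2, degree))
    dep_21 = hsic(x2, residuals(x2, x1, degree))
    return ("X1 -> X2" if dep_12 < dep_21 else "X2 -> X1"), dep_12, dep_21

rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(0.1, 0.9, n)                 # N1 ~ U[0.1, 0.9]
x2 = x1 ** 3 + rng.uniform(-0.15, 0.15, n)    # assumed f(x) = x**3, N2 ~ U[-0.15, 0.15]
direction, dep_12, dep_21 = anm_direction(x1, x2)
print(direction, dep_12, dep_21)
```

In the backward direction the spread of $X_1$ given $X_2$ changes with $X_2$, so the residuals cannot be independent of the regressor; this is what the larger HSIC value picks up.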

Two Variables: Is the case of two variables easy or hard? Easy: the data can be visualized, and 2 is a very small number. Hard: it extends to the multivariate case, and there are no (conditional) independences that could be exploited.

Restricted Structural Equation Models. Assumption: $P(X_1, \ldots, X_p)$ follows a (specific type of) restricted SEM with graph $G_0$, and causal minimality holds. Theorem: Then the true causal DAG $G_0$ can be recovered from the joint distribution.

Restricted Structural Equation Models: Linear Gaussian Models with Same Error Variance
$X_i = \sum_{j \in PA_i} \beta_{ij} X_j + N_i, \quad 1 \le i \le p,$ with $N_i \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$. Assume $\beta_{ij} \neq 0$ for all $j \in PA_i$ (causal minimality).
Theorem: One can identify $G_0$ from $P(X_1, \ldots, X_p)$. JP, P. Bühlmann: Identifiability of Gaussian Structural Equation Models with Same Error Variances, ArXiv e-print 2012
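One way to see why equal error variances make $G_0$ identifiable: if every parent of $X_j$ has already been conditioned on, then $\operatorname{Var}(X_j \mid \text{those variables}) = \sigma^2$, while a variable with a missing parent has a strictly larger conditional variance. The sketch below uses this to recover a causal order greedily. Note that it is a simpler procedure swapped in for illustration, not the BIC-based greedy search presented later in this talk; `equal_variance_order`, the 4-node graph and its coefficients are hypothetical.

```python
import numpy as np

def conditional_variance(S, j, given):
    """Var(X_j | X_given) for a Gaussian vector with covariance matrix S."""
    if not given:
        return S[j, j]
    s = S[j, given]
    return S[j, j] - s @ np.linalg.solve(S[np.ix_(given, given)], s)

def equal_variance_order(X):
    """Greedily append the variable with the smallest conditional variance."""
    S = np.cov(X, rowvar=False)
    order, remaining = [], list(range(S.shape[0]))
    while remaining:
        nxt = min(remaining, key=lambda j: conditional_variance(S, j, order))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical SEM with sigma^2 = 1 and true causal order 2, 0, 3, 1
rng = np.random.default_rng(0)
n = 5000
N = rng.normal(size=(n, 4))
X = np.empty((n, 4))
X[:, 2] = N[:, 2]
X[:, 0] = 0.9 * X[:, 2] + N[:, 0]
X[:, 3] = -0.7 * X[:, 0] + 0.5 * X[:, 2] + N[:, 3]
X[:, 1] = 0.8 * X[:, 3] + N[:, 1]
print(equal_variance_order(X))
```

Once an order is known, the parents of each node can be estimated by regressing it on its predecessors.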

Restricted Structural Equation Models: Non-Linear Additive Noise Models
$X_i = f_i(X_{PA_i}) + N_i, \quad 1 \le i \le p,$ with $N_i$ iid and graph $G_0$.
Theorem: Assume causal minimality and exclude a few combinations of $f_i$, $P(N_i)$ and $P(X_{PA_i})$. Then one can identify $G_0$ from $P(X_1, \ldots, X_p)$.
P. Hoyer, D. Janzing, J. Mooij, JP and B. Schölkopf: Nonlinear causal discovery with additive noise models, NIPS 2008
JP, J. M. Mooij, D. Janzing and B. Schölkopf: Identifiability of Causal Graphs using Functional Models, UAI 2011
Very similar results hold for discrete variables: JP, D. Janzing and B. Schölkopf: Causal inference on discrete data using additive noise models, IEEE TPAMI 2011


Practical Method: The number of DAGs with 13 nodes is astronomically large. How can we find the correct SEM without enumerating all DAGs?
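The size of the search space can be computed with Robinson's recurrence for the number $a_n$ of labelled DAGs on $n$ nodes, $a_n = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k(n-k)} a_{n-k}$ with $a_0 = 1$. The function name `count_dags` is a hypothetical helper for this illustration.

```python
from math import comb

def count_dags(n_max):
    """Numbers of labelled DAGs on 0, 1, ..., n_max nodes (Robinson's recurrence)."""
    a = [1]  # a[0] = 1: the empty graph
    for n in range(1, n_max + 1):
        a.append(sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * a[n - k]
                     for k in range(1, n + 1)))
    return a

counts = count_dags(13)
print(counts[:6])            # [1, 1, 3, 25, 543, 29281]
print(f"{counts[13]:.3e}")   # number of DAGs with 13 nodes
```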

Gaussian SEM with same error variance: BIC with greedy search,
$(\hat\beta, \hat\sigma^2) = \arg\min_{\beta \in \mathcal{B},\ \sigma^2 \in \mathbb{R}^+} \Big( \ell(\beta, \sigma^2; X^{(1)}, \ldots, X^{(n)}) + \frac{\log n}{2} \|\beta\|_0 \Big),$
where $\ell$ is the negative log-likelihood and $\|\beta\|_0$ counts the nonzero coefficients. JP and P. Bühlmann: Identifiability of Gaussian SEMs with same error variances, ArXiv e-print 2012
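For a fixed DAG the penalized likelihood above has a closed form: with a shared error variance the MLE of the coefficients is ordinary least squares of each node on its parents, and $\hat\sigma^2$ is the total residual sum of squares divided by $np$. The sketch scores single candidate DAGs only; the greedy search over DAGs is omitted, and `sev_bic` and the two-variable data are hypothetical.

```python
import numpy as np

def sev_bic(X, parents):
    """Score of a DAG under a linear Gaussian SEM with one shared error variance:
    negative log-likelihood at the MLE + log(n)/2 * (number of edges).
    parents[i] lists the parent indices of node i. Smaller is better."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                 # centre the data, so no intercepts
    rss, n_edges = 0.0, 0
    for i, pa in enumerate(parents):
        y = Xc[:, i]
        if pa:
            A = Xc[:, pa]
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            y = y - A @ coef                # OLS residuals of node i
            n_edges += len(pa)
        rss += y @ y
    sigma2 = rss / (n * p)                  # MLE of the common error variance
    nll = 0.5 * n * p * (np.log(2 * np.pi * sigma2) + 1)
    return nll + 0.5 * np.log(n) * n_edges

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)                # true model X1 -> X2, equal variances
X = np.column_stack([x1, x2])
scores = {
    "X1 -> X2": sev_bic(X, [[], [0]]),
    "X2 -> X1": sev_bic(X, [[1], []]),
    "empty": sev_bic(X, [[], []]),
}
print(min(scores, key=scores.get))          # X1 -> X2
```

Both one-edge DAGs have the same penalty, but the reversed one needs unequal error variances (2 and 1/2 here), which inflates the pooled $\hat\sigma^2$.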

Nonlinear SEM: an iterated procedure that always identifies the sink node. (Improvements possible!?) J. Mooij, D. Janzing, JP and B. Schölkopf: Regression by dependence minimization and its application to causal inference, ICML 2009
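The sink-node idea can be sketched as follows: regress each remaining variable on all other remaining variables, declare as sink the one whose residuals are least dependent on its regressors, remove it, and repeat. This is a simplified stand-in, not the cited regression-by-dependence-minimization method: it assumes additive polynomial regression and compares raw HSIC values instead of running independence tests. `causal_order` and the 3-variable chain $X_1 \to X_2 \to X_3$ with uniform noise are hypothetical.

```python
import numpy as np

def gram(z):
    """Gaussian-kernel Gram matrix on standardised columns (median heuristic)."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    z = (z - z.mean(axis=0)) / z.std(axis=0)
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / np.median(sq[sq > 0]))

def hsic(a, b):
    """Biased empirical HSIC; close to zero when a and b are independent."""
    n = len(a)
    H = np.eye(n) - 1.0 / n
    return np.trace(gram(a) @ H @ gram(b) @ H) / (n - 1) ** 2

def additive_residuals(Z, y, degree=5):
    """Residuals of y after an additive polynomial least-squares fit on Z."""
    cols = [np.ones(len(y))] + [Z[:, j] ** k for j in range(Z.shape[1])
                                for k in range(1, degree + 1)]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef

def causal_order(data, degree=5):
    """Repeatedly remove the node that looks most like a sink."""
    remaining, order = list(range(data.shape[1])), []
    while len(remaining) > 1:
        scores = {}
        for k in remaining:
            others = [j for j in remaining if j != k]
            res = additive_residuals(data[:, others], data[:, k], degree)
            scores[k] = hsic(data[:, others], res)
        sink = min(scores, key=scores.get)
        order.insert(0, sink)               # sinks are found last-to-first
        remaining.remove(sink)
    return remaining + order

rng = np.random.default_rng(0)
n = 400
x1 = rng.uniform(-1, 1, n)
x2 = np.sin(3 * x1) + 0.2 * rng.uniform(-1, 1, n)
x3 = x2 ** 2 + 0.2 * rng.uniform(-1, 1, n)
print(causal_order(np.column_stack([x1, x2, x3])))
```

Only a causal order is returned; superfluous edges would still have to be pruned, e.g. by checking which regressors the residual independence actually requires.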

Experiment: Linear SEMs with same Error Variance. [Figure: structural Hamming distance to the true DAG and to the true CPDAG for GDS_SEV, GES, PC and BEST_SCORE]
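The structural Hamming distance (SHD) counts how many edge operations separate an estimated graph from the true one. The sketch below uses one common convention for DAGs (a missing, extra or reversed edge each counts as one error); it does not cover the CPDAG variant in the second panel, and `shd` is a hypothetical helper.

```python
import numpy as np

def shd(A_true, A_est):
    """Structural Hamming distance between two DAG adjacency matrices
    (A[i, j] = 1 means an edge i -> j)."""
    A_true = np.asarray(A_true, dtype=bool)
    A_est = np.asarray(A_est, dtype=bool)
    skel_true = A_true | A_true.T
    skel_est = A_est | A_est.T
    iu = np.triu_indices_from(A_true, k=1)           # each unordered pair once
    skeleton_errors = np.sum(skel_true[iu] != skel_est[iu])   # missing or extra
    both = skel_true & skel_est
    reversed_errors = np.sum((A_true != A_est)[iu] & both[iu])  # wrong orientation
    return int(skeleton_errors + reversed_errors)

true = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # 0 -> 1 -> 2
est = [[0, 0, 1], [1, 0, 1], [0, 0, 0]]    # 1 -> 0, 1 -> 2, 0 -> 2
print(shd(true, est))                       # 2: one reversed, one extra edge
```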

Experiment: Linear SEMs with same Error Variance. Table: BIC scores of GES and GDS with SEV on the microarray data sets Prostate, Lymphoma, DSM, Leukemia, Brain, NCI and Colon (smaller is better).


Experiment: How to win a Nobel Prize? There are no (or not enough) data for chocolate... but we have data for coffee!


Experiment: How to win a Nobel Prize? [Figure: number of Nobel laureates per 10 million inhabitants against coffee consumption per capita (kg)] Correlation: 0.698, p-value < …. Nobel Prize → Coffee: dependent residuals (p-value of …). Coffee → Nobel Prize: dependent residuals (p-value < …). Model class too small? Causally insufficient?

Experiment: Nonlinear SEMs with two continuous variables

Experiment: Nonlinear SEMs with three continuous variables. Random variables: $X_1$: altitude, $X_2$: temperature, $X_3$: hours of sunshine. [Figure: graph over Altitude, Sunshine and Temperature]

Experiment: Nonlinear SEMs with three continuous variables. Altitude, duration of sunshine and temperature (349 samples), fitted with a linear SEM. [Figure: p-value of a mutual independence test for each of the enumerated DAGs; DAG 20: Alt, Sun, Temp]

Experiment: Nonlinear SEMs with three continuous variables. Altitude, duration of sunshine and temperature (349 samples), fitted with a nonlinear SEM. [Figure: p-value of a mutual independence test for each of the enumerated DAGs; DAG 20: Alt, Sun, Temp]
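With three variables the experiment can afford to score every DAG. A minimal sketch of the enumeration step, assuming adjacency matrices with `A[i, j] = 1` for an edge $i \to j$ (the helpers `is_acyclic` and `all_dags` are hypothetical): it keeps the acyclic matrices among all $2^6$ directed graphs on 3 nodes. Fitting a regression per node and testing mutual independence of the residuals, which produces the p-values in the figure, is not shown.

```python
from itertools import product

import numpy as np

def is_acyclic(A):
    """Repeatedly remove nodes without incoming edges; a cycle blocks this."""
    A = np.asarray(A, dtype=bool)
    remaining = list(range(len(A)))
    while remaining:
        sources = [j for j in remaining if not A[remaining, j].any()]
        if not sources:
            return False
        for j in sources:
            remaining.remove(j)
    return True

def all_dags(p):
    """Yield the adjacency matrix of every DAG on p labelled nodes."""
    pairs = [(i, j) for i in range(p) for j in range(p) if i != j]
    for bits in product([0, 1], repeat=len(pairs)):
        A = np.zeros((p, p), dtype=int)
        for (i, j), b in zip(pairs, bits):
            A[i, j] = b
        if is_acyclic(A):
            yield A

dags = list(all_dags(3))
print(len(dags))  # 25
```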

Conclusions: Restricted SEMs exploit different assumptions than traditional methods, can identify the true DAG, and work well in practice for graphs with a small number of nodes. They are an interesting tool for causal inference that should be applied to large-scale data sets. Thank you!
