Recovering the Graph Structure of Restricted Structural Equation Models
Workshop on Statistics for Complex Networks, Eindhoven, 31st January 2013
Jonas Peters (1), J. Mooij (3), D. Janzing (2), B. Schölkopf (2), P. Bühlmann (1)
(1) Seminar for Statistics, ETH Zürich, Switzerland
(2) MPI for Intelligent Systems, Tübingen, Germany
(3) Radboud University, Nijmegen, Netherlands
How to win a Nobel Prize F. H. Messerli: Chocolate Consumption, Cognitive Function, and Nobel Laureates, N Engl J Med 2012
What is the Problem?
Given some data, what is the causal structure of the underlying mechanism?
Understand the (physical) process in more detail.
Intervene! (Alain's talk, tomorrow)
Use observational data!
What is the Problem?
Theoretical: $P(X_1, \dots, X_5) \to$ DAG $G_0$?
Practical: i.i.d. observations from $P(X_1, \dots, X_5)$ $\to$ estimated $P(X_1, \dots, X_5)$ $\to$ DAG $G_0$?
[Figure: DAG $G_0$ on the nodes $X_1, \dots, X_5$]
Structural Equation Models (SEMs)
The joint distribution $P(X_1, \dots, X_p)$ satisfies a Structural Equation Model (SEM) with DAG $G_0$ if
$$X_i = f_i(X_{PA_i}, N_i), \qquad 1 \le i \le p,$$
with $X_{PA_i}$ being the parents of $X_i$ in $G_0$. The $N_i$ are required to be jointly independent.
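The definition can be made concrete with a small sampler. This is only a sketch: the function name `sample_sem`, its argument layout, and the example functions and noise distributions are illustrative assumptions, not part of the talk. Each node is generated from its parents and its own independent noise once all parents are available.

```python
import numpy as np

def sample_sem(parents, functions, noise_samplers, n, rng):
    """Draw n i.i.d. samples from an SEM X_i = f_i(X_PA_i, N_i).

    parents[i]        : list of parent indices of node i (must define a DAG)
    functions[i]      : callable f_i(parent_values, noise) -> values of X_i
    noise_samplers[i] : callable (rng, n) -> n independent draws of N_i
    """
    p = len(parents)
    X = np.full((n, p), np.nan)
    done = set()
    while len(done) < p:
        progressed = False
        for i in range(p):
            # A node can be generated once all of its parents are available.
            if i not in done and all(j in done for j in parents[i]):
                noise = noise_samplers[i](rng, n)
                X[:, i] = functions[i](X[:, parents[i]], noise)
                done.add(i)
                progressed = True
        if not progressed:
            raise ValueError("parent sets contain a cycle, so this is not a DAG")
    return X

# Hypothetical example: X_0 = N_0, X_1 = X_0^2 + N_1 (graph X_0 -> X_1).
rng = np.random.default_rng(0)
parents = [[], [0]]
functions = [
    lambda pa, noise: noise,
    lambda pa, noise: pa[:, 0] ** 2 + noise,
]
noise_samplers = [
    lambda rng, n: rng.uniform(0.1, 0.9, size=n),
    lambda rng, n: rng.uniform(-0.15, 0.15, size=n),
]
X = sample_sem(parents, functions, noise_samplers, n=1000, rng=rng)
```

Drawing each noise variable from its own sampler call is what makes the $N_i$ jointly independent in this sketch.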
Structural Equation Models (SEMs)
$P(X_1, \dots, X_4)$ could be generated by
$X_1 = f_1(N_1)$
$X_2 = f_2(X_3, X_4, N_2)$
$X_3 = f_3(X_1, N_3)$
$X_4 = f_4(X_3, N_4)$
$N_i$ jointly independent
[Figure: DAG $G$ on $X_1, \dots, X_4$]
$P(X_1, \dots, X_4)$ is generated by
$X_1 = f_1(X_3, N_1)$
$X_2 = f_2(N_2)$
$X_3 = f_3(X_2, N_3)$
$X_4 = f_4(X_2, X_3, N_4)$
$N_i$ jointly independent
[Figure: DAG $G_0$ on $X_1, \dots, X_4$]
Structural Equation Models (SEMs)
$P(X_1, \dots, X_4)$ could be generated by
$X_1 = g_1(M_1)$
$X_2 = g_2(X_3, X_4, M_2)$
$X_3 = g_3(X_1, M_3)$
$X_4 = g_4(X_1, X_3, M_4)$
$M_i$ jointly independent
[Figure: DAG $G$ on $X_1, \dots, X_4$]
$P(X_1, \dots, X_4)$ is generated by
$X_1 = f_1(X_3, N_1)$
$X_2 = f_2(N_2)$
$X_3 = f_3(X_2, N_3)$
$X_4 = f_4(X_2, X_3, N_4)$
$N_i$ jointly independent
[Figure: DAG $G_0$ on $X_1, \dots, X_4$]
SEMs are not identifiable
Proposition: Given a distribution $P(X_1, \dots, X_p)$, for each graph $G$ with respect to which $P$ is Markov we can find an SEM with graph $G$ that generates $P$.
Special case: two variables.
JP: Restricted Structural Equation Models for Causal Inference, PhD Thesis 2012 (and others?)
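For the two-variable special case, one standard construction shows why an unrestricted SEM exists in either direction. This is a sketch that assumes the conditional distribution function $F_{X_2 \mid X_1}$ is continuous and strictly increasing; the symbol $F$ is notation introduced here, not taken from the slides.

```latex
\begin{align*}
N_2 &:= F_{X_2 \mid X_1}(X_2 \mid X_1) \sim U[0,1], \qquad N_2 \perp X_1, \\
X_1 &= N_1, \qquad X_2 = F^{-1}_{X_2 \mid X_1}(N_2 \mid X_1) =: f_2(X_1, N_2).
\end{align*}
```

Swapping the roles of $X_1$ and $X_2$ gives an SEM for the reversed graph with the same joint distribution.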
The Idea We gain identifiability by restricting the function class (excluding combinations of functions, input and noise distributions).
Two Variables - Good I
$X_1 = N_1$
$X_2 = \beta X_1 + N_2$
with $N_1, N_2 \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$.
Then there is no linear SEM with same error variances in the backward direction.
[Figure: geometric sketch in $L^2$ showing $X_1$, $\beta X_1$, $N_2$ and $X_2$]
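The claim can be checked by a short calculation. The backward coefficient $\gamma$ and the backward noise terms $M_1$, $M_2 = X_2$ are notation introduced here for illustration.

```latex
\begin{align*}
\operatorname{Var}(X_1) &= \sigma^2, \qquad
\operatorname{Var}(X_2) = (1+\beta^2)\,\sigma^2, \qquad
\operatorname{Cov}(X_1, X_2) = \beta\,\sigma^2, \\
X_1 &= \gamma X_2 + M_1, \qquad
\gamma = \frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_2)} = \frac{\beta}{1+\beta^2}, \\
\operatorname{Var}(M_1) &= \sigma^2 - \frac{\beta^2 \sigma^2}{1+\beta^2} = \frac{\sigma^2}{1+\beta^2}
\;\neq\; (1+\beta^2)\,\sigma^2 = \operatorname{Var}(M_2)
\quad \text{unless } \beta = 0.
\end{align*}
```

The backward linear model therefore needs two different error variances whenever $\beta \neq 0$.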
Two Variables - Good II
Consider a distribution corresponding to
$X_1 = N_1$
$X_2 = X_1^2 + N_2$
with $N_1 \perp N_2$, $N_1 \sim U[0.1, 0.9]$, $N_2 \sim U[-0.15, 0.15]$.
[Figure: graph $X_1 \to X_2$]
Two Variables - Good II
Consider a distribution corresponding to
$X_1 = N_1$
$X_2 = f(X_1) + N_2$
with $N_1 \perp N_2$ (graph $X_1 \to X_2$).
For most combinations $(f, P(N_1), P(N_2))$ there is no
$X_1 = g(X_2) + M_1$
$X_2 = M_2$
with $M_1 \perp M_2$ (graph $X_1 \leftarrow X_2$).
More or less one exception: (linear, Gaussian, Gaussian) with different error variances.
P. Hoyer, D. Janzing, J. Mooij, JP and B. Schölkopf: Nonlinear causal discovery with additive noise models, NIPS 2008
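The asymmetry can be illustrated numerically: regress in both directions and check in which direction the residuals look independent of the input. The sketch below is one possible way to do this, not the authors' implementation. The polynomial regression, the biased HSIC statistic with a median-distance bandwidth, and all function names are assumptions chosen for brevity; a proper analysis would use a calibrated independence test rather than comparing raw statistics.

```python
import numpy as np

def gaussian_gram(v):
    """Gaussian kernel Gram matrix with the median-distance bandwidth heuristic."""
    d2 = (v[:, None] - v[None, :]) ** 2
    bandwidth2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * bandwidth2))

def hsic(a, b):
    """Biased empirical HSIC statistic; larger values indicate stronger dependence."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K, L = gaussian_gram(a), gaussian_gram(b)
    return np.trace(K @ H @ L @ H) / n ** 2

def residual_dependence(cause, effect, degree=4):
    """Regress effect on cause with a polynomial and return HSIC(cause, residuals)."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)

# Simulated data from X1 = N1, X2 = X1^2 + N2 with uniform noise.
rng = np.random.default_rng(0)
n = 600
x1 = rng.uniform(0.1, 0.9, size=n)
x2 = x1 ** 2 + rng.uniform(-0.15, 0.15, size=n)

forward = residual_dependence(x1, x2)    # candidate model X1 -> X2
backward = residual_dependence(x2, x1)   # candidate model X2 -> X1
inferred = "X1 -> X2" if forward < backward else "X2 -> X1"
```

In the backward direction the spread of the residuals changes with $X_2$, which is the kind of dependence the statistic picks up.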
Two Variables
Is the case of two variables easy or hard?
Easy: Visualization. 2 is a very small number.
Hard: It extends to the multivariate case. There are no (cond.) independences that could be exploited.
Restricted Structural Equation Models
Assumption: Assume that $P(X_1, \dots, X_p)$ follows a (specific type of) restricted SEM with graph $G_0$ and assume causal minimality.
Theorem: Then the true causal DAG can be recovered from the joint distribution.
Restricted Structural Equation Models
Linear Gaussian Models with same Error Variance
$$X_i = \sum_{j \in PA_i} \beta_j X_j + N_i, \qquad 1 \le i \le p,$$
with $N_i \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$. Assume $\beta_j \neq 0$ (causal minimality).
Theorem: One can identify $G_0$ from $P(X_1, \dots, X_p)$.
JP, P. Bühlmann: Identifiability of Gaussian Structural Equation Models with Same Error Variances, ArXiv e-print 2012
Restricted Structural Equation Models
Non-Linear Additive Noise Models
Theorem: Let
$$X_i = f_i(X_{PA_i}) + N_i, \qquad 1 \le i \le p,$$
with $N_i$ iid and graph $G_0$. Assume causal minimality. Exclude a few combinations of $f_i$, $P(N_i)$ and $P(X_{PA_i})$. Then one can identify $G_0$ from $P(X_1, \dots, X_p)$.
P. Hoyer, D. Janzing, J. Mooij, JP and B. Schölkopf: Nonlinear causal discovery with additive noise models, NIPS 2008
JP, J. M. Mooij, D. Janzing and B. Schölkopf: Identifiability of Causal Graphs using Functional Models, UAI 2011
Very similar for discrete variables.
JP, D. Janzing and B. Schölkopf: Causal inference on discrete data using additive noise models, IEEE TPAMI 2011
Practical Method There are 18676600744432035186664816926721 DAGs with 13 nodes. How can we find the correct SEM without enumerating all DAGs?
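The size of the search space can be reproduced with Robinson's recurrence for the number of DAGs on $n$ labeled nodes. The function name `count_dags` is an illustrative choice.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of DAGs on n labeled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edge.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

print(count_dags(13))  # 18676600744432035186664816926721
```

Python integers have arbitrary precision, so the 32-digit result is exact.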
Practical Method
Gaussian SEM with same error variance: BIC with greedy search
$$(\hat\beta, \hat\sigma^2) = \operatorname*{argmin}_{\beta \in B,\ \sigma^2 \in \mathbb{R}^+} \; -\ell(\beta, \sigma^2; X^{(1)}, \dots, X^{(n)}) + \frac{\log(n)}{2} \|\beta\|_0$$
JP and P. Bühlmann: Identifiability of Gaussian SEMs with same error variances, ArXiv e-print 2012
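For a fixed DAG the penalized score can be evaluated in closed form, which is the quantity a greedy search would compare across neighbouring graphs. The sketch below only scores a given DAG and does not implement the search; the function name `sev_bic_score`, the `parents` encoding, and the centering of the data are assumptions made here.

```python
import numpy as np

def sev_bic_score(X, parents):
    """Penalized negative log-likelihood of a linear Gaussian SEM with one shared
    error variance, for the DAG given by parents[i] = list of parent indices."""
    n, p = X.shape
    X = X - X.mean(axis=0)                  # assumption: work with centered data
    rss, n_edges = 0.0, 0
    for i in range(p):
        pa = list(parents[i])
        resid = X[:, i]
        if pa:
            # For a fixed DAG, least squares per node minimizes the total RSS.
            beta, *_ = np.linalg.lstsq(X[:, pa], X[:, i], rcond=None)
            resid = X[:, i] - X[:, pa] @ beta
        rss += resid @ resid
        n_edges += len(pa)
    sigma2 = rss / (n * p)                  # ML estimate of the common variance
    neg_loglik = 0.5 * n * p * (np.log(2 * np.pi * sigma2) + 1.0)
    return neg_loglik + 0.5 * np.log(n) * n_edges

# Simulated two-variable example with beta = 1 and sigma^2 = 1.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = 1.0 * x0 + rng.normal(size=n)
X = np.column_stack([x0, x1])
score_forward = sev_bic_score(X, parents=[[], [0]])    # X0 -> X1
score_backward = sev_bic_score(X, parents=[[1], []])   # X1 -> X0
```

Both graphs have one edge, so the penalty is identical and the difference comes entirely from the pooled residual variance.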
Practical Method
Nonlinear SEM: Iterated procedure. Always identify the sink node. (Improvements possible!?)
J. Mooij, D. Janzing, JP and B. Schölkopf: Regression by dependence minimization and its application to causal inference, ICML 2009
Experiment: Linear SEMs with same Error Variance
[Figure: two panels showing the Structural Hamming Distance to the true DAG and to the true CPDAG, plotted against a, for GDS_SEV, GES, PC and BEST_SCORE]
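The Structural Hamming Distance counts the edge operations needed to turn one graph into another. A minimal sketch follows, assuming adjacency matrices with `A[i, j] = 1` for the edge $i \to j$ and counting a reversed edge as one error; other conventions count a reversal as two, and the slides do not say which was used.

```python
import numpy as np

def structural_hamming_distance(A, B):
    """Count node pairs whose edge status differs between two graphs.

    A[i, j] = 1 encodes the edge i -> j. A missing, extra or reversed edge
    each counts as one.
    """
    A, B = np.asarray(A), np.asarray(B)
    p = A.shape[0]
    distance = 0
    for i in range(p):
        for j in range(i + 1, p):
            if (A[i, j], A[j, i]) != (B[i, j], B[j, i]):
                distance += 1
    return distance

# Chain 0 -> 1 -> 2 versus the same chain with the second edge reversed.
chain = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
flipped = np.array([[0, 1, 0], [0, 0, 0], [0, 1, 0]])
d = structural_hamming_distance(chain, flipped)
```

An undirected CPDAG edge can be encoded by setting both `A[i, j]` and `A[j, i]`, so the same function covers the distance to the CPDAG under this encoding.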
Experiment: Linear SEMs with same Error Variance
Table: BIC scores of GES and GDS with SEV on microarray data (smaller is better).
             Prostate  Lymphoma  DSM   Leukemia  Brain  NCI   Colon
GES          4095      4560      2711  5456      1411   5891  3224
GDS w/ SEV   6057      5404      3236  5481      1343   6288  3201
Experiment How to win a Nobel Prize? No (not enough) data for chocolate... but we have data for coffee!
Experiment: How to win a Nobel Prize?
[Figure: number of Nobel laureates per 10 million against coffee consumption per capita (kg)]
Correlation: 0.698, p-value: $< 2.2 \cdot 10^{-16}$.
Nobel Prize $\to$ Coffee: Dependent residuals (p-value of $1.8 \cdot 10^{-11}$).
Coffee $\to$ Nobel Prize: Dependent residuals (p-value of $< 2.2 \cdot 10^{-16}$).
Model class too small? Causally insufficient?
Experiment Nonlinear SEMs with two continuous variables
Experiment: Nonlinear SEMs with three continuous variables
Random variables: $X_1$: Altitude, $X_2$: Temperature, $X_3$: Hours of sunshine
Altitude  Sunshine  Temperature
205       1552      9.7
46        1443      8.2
794       1097      6.4
325       1572      8.1
500       1368      6.2
215       1594      9.4
383       1591      7.8
54        1702      8.3
...
Experiment: Nonlinear SEMs with three continuous variables
Altitude, Duration of Sunshine, Temperature (349 samples), linear SEM
[Figure: p-value of the mutual independence test (axis from 0 to 1) for each enumerated DAG; labelled DAG 20 (Alt, Sun, Temp)]
Experiment: Nonlinear SEMs with three continuous variables
Altitude, Duration of Sunshine, Temperature (349 samples), nonlinear SEM
[Figure: p-value of the mutual independence test (axis from 0 to 0.01) for each enumerated DAG; labelled DAG 20 (Alt, Sun, Temp)]
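With three variables all candidate graphs can be enumerated explicitly. The sketch below lists every DAG on $p$ nodes by brute force and computes, for a given DAG, the residuals of regressing each variable on its parents. It uses linear least squares as a stand-in for the regression step and leaves out the mutual independence test that would then be applied to the residual columns; the function names and the ordering of the enumerated DAGs are assumptions and need not match the numbering on the slides.

```python
import itertools
import numpy as np

def is_acyclic(A):
    """A directed graph on p nodes is acyclic iff its adjacency matrix is nilpotent."""
    p = A.shape[0]
    return not np.linalg.matrix_power(A, p).any()

def enumerate_dags(p):
    """Yield the adjacency matrix (A[i, j] = 1 for i -> j) of every DAG on p nodes."""
    pairs = [(i, j) for i in range(p) for j in range(p) if i != j]
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        A = np.zeros((p, p), dtype=int)
        for (i, j), bit in zip(pairs, bits):
            A[i, j] = bit
        if is_acyclic(A):
            yield A

def linear_residuals(X, A):
    """Residuals of regressing each variable on its parents in A (with intercept)."""
    n, p = X.shape
    R = np.empty((n, p))
    for i in range(p):
        pa = np.flatnonzero(A[:, i])
        design = np.column_stack([np.ones(n), X[:, pa]])
        beta, *_ = np.linalg.lstsq(design, X[:, i], rcond=None)
        R[:, i] = X[:, i] - design @ beta
    return R

dags = list(enumerate_dags(3))   # all candidate graphs for three variables
```

Brute-force enumeration is only feasible for a handful of nodes, since the number of DAGs grows super-exponentially.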
Conclusions
Restricted SEMs ...
... exploit different assumptions than traditional methods.
... can identify the true DAG.
... work well in practice for graphs with a small number of nodes.
An interesting tool for causal inference ...
... that should be applied to large-scale data sets.
Thank you!