Simplicity of Additive Noise Models


1 Simplicity of Additive Noise Models Jonas Peters ETH Zürich - Marie Curie (IEF) Workshop on Simplicity and Causal Discovery Carnegie Mellon University 7th June 2014

2 contains joint work with... ETH Zürich: Peter Bühlmann, Jan Ernest Max-Planck-Institute Tübingen: Dominik Janzing, Bernhard Schölkopf University of Amsterdam, Radboud University Nijmegen: Joris Mooij UC Berkeley: Sivaraman Balakrishnan, Martin Wainwright

3 What is the Problem? Theoretical: can the DAG G_0 be recovered from P(X_1, ..., X_5)? Practical: can the DAG G_0 be estimated from iid observations from P(X_1, ..., X_5)? [Figure: a five-node DAG G_0 over X_1, ..., X_5]

4-5 Consider the world of Structural Equation Models. Assume P(X_1, ..., X_4) has been generated by
X_1 = f_1(X_3, N_1)
X_2 = N_2
X_3 = f_3(X_2, N_3)
X_4 = f_4(X_2, X_3, N_4)
with the N_i jointly independent and G_0 acyclic. [Figure: DAG G_0 with edges X_3 → X_1, X_2 → X_3, X_2 → X_4, X_3 → X_4]
Can the directed acyclic graph be recovered from P(X_1, ..., X_4)? No.

6-7 Consider the world of Structural Equation Models. Proposition: Given a distribution P, we can find a SEM generating P for each graph G such that P is Markov with respect to G; and P is Markov with respect to a lot of graphs.
JP: Restricted Structural Equation Models for Causal Inference, PhD Thesis 2012 (and probably others?)

8 Consider the world of Structural Equation Models. Assume P(X_1, ..., X_4) has been generated by
X_1 = f_1(X_3) + N_1
X_2 = N_2
X_3 = f_3(X_2) + N_3
X_4 = f_4(X_2, X_3) + N_4
with the N_i jointly independent and G_0 acyclic: Additive Noise Models. Can the DAG be recovered from P(X_1, ..., X_4)?
JP, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, to appear in JMLR

9-11 Consider the world of Structural Equation Models. Same model, now with Gaussian noise: N_i ~ N(0, σ_i²), jointly independent, G_0 acyclic: Additive Noise Models with Gaussian noise. Can the DAG be recovered from P(X_1, ..., X_4)? Yes, if and only if the f_i are nonlinear.
JP, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, to appear in JMLR
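
To make this setup concrete, here is a minimal R sketch that draws samples from a four-node Gaussian additive noise model of this shape. The particular nonlinearities and noise scales are illustrative choices, not taken from the talk; the data frame `dat` is reused in later sketches.

```r
# Draw n samples from a four-node Gaussian additive noise model.
# f_1, f_3, f_4 and the noise scales are illustrative (nonlinear) choices.
set.seed(1)
n  <- 500
N1 <- rnorm(n, sd = 0.5); N2 <- rnorm(n)
N3 <- rnorm(n, sd = 0.5); N4 <- rnorm(n, sd = 0.5)

X2 <- N2                      # source node
X3 <- tanh(2 * X2) + N3       # X2 -> X3
X1 <- X3^2 + N1               # X3 -> X1
X4 <- sin(X2) + X3^2 + N4     # X2 -> X4 <- X3

dat <- data.frame(X1, X2, X3, X4)
```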

12-14 Consider the world of Structural Equation Models [image slides; caption fragments: "GAUL", "GAUSS the LINEAR"]

15 Consider the world of Structural Equation Models. Proposition: Given a Gaussian distribution P, we can find a linear Gaussian SEM generating P for each graph G such that P is Markov with respect to G.
A. Hauser: Causal Inference from Interventional Data, PhD Thesis 2013 (and probably others?)

16 Consider the world of Structural Equation Models [image slide]

17 Does X cause Y or vice versa? Consider a distribution generated by Y = f(X) + N_Y with N_Y independent of X. [Graph: X → Y]

18 Does X cause Y or vice versa? Consider a distribution generated by Y = f(X) + N_Y with N_Y independent of X. [Graph: X → Y] Then, if f is nonlinear, there is no backward model X = g(Y) + M_X with M_X independent of Y. [Graph: Y → X]
JP, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, to appear in JMLR

19 Does X cause Y or vice versa? Consider a distribution corresponding to Y = X³ + N_Y with N_Y independent of X, where X ~ N(1, σ_X²) and N_Y ~ N(0, σ_N²). [Graph: X → Y]
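
A quick R simulation of this example; the variance values did not survive the transcription, so the standard deviations below are stand-ins.

```r
# Y = X^3 + N_Y with X ~ N(1, sigma_X^2) and N_Y ~ N(0, sigma_N^2);
# the sigma values here are illustrative stand-ins.
set.seed(1)
m  <- 300
X  <- rnorm(m, mean = 1, sd = 1)
NY <- rnorm(m, mean = 0, sd = 1)
Y  <- X^3 + NY
plot(X, Y, pch = 19, cex = 0.5)   # the cubic signature is clearly visible
```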

20-22 Does X cause Y or vice versa? [Scatter plots of Y against X for this example.]

23 Does X cause Y or vice versa? [Plot: gam(x ~ s(y))$residuals against Y]

24 Does X cause Y or vice versa? Method No. 1: testing for independent residuals. Regress each variable on the other and check whether the residuals are independent of the regressor (the input/explanatory variable). [Plot: gam(x ~ s(y))$residuals against Y]
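
A sketch of Method No. 1 in R for the cubic example simulated above, using mgcv for the regressions. For the independence test I use the dHSIC package here, which is one possible choice and an assumption on my part (it postdates this talk).

```r
# Method No. 1: regress each variable on the other and test whether
# the residuals are independent of the regressor.
library(mgcv)    # gam() with smooth terms s()
library(dHSIC)   # dhsic.test(): an HSIC-based independence test

d <- data.frame(x = X, y = Y)   # cubic example from above

res_xy <- gam(y ~ s(x), data = d)$residuals   # candidate model X -> Y
res_yx <- gam(x ~ s(y), data = d)$residuals   # candidate model Y -> X

dhsic.test(d$x, res_xy)$p.value   # large p: residuals look independent
dhsic.test(d$y, res_yx)$p.value   # small p: backward residuals are dependent

# Decide for the direction whose residuals pass the independence test.
```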

25 Does X cause Y or vice versa? [Benchmark figure: accuracy (%) against decision rate (%) on cause-effect pairs for IGCI, LiNGAM, Additive Noise, and PNL; significant and not-significant decisions indicated.]

26-28 Does X cause Y or vice versa? [Image slides on the correlation between countries' chocolate consumption and Nobel laureates.]
F. H. Messerli: Chocolate Consumption, Cognitive Function, and Nobel Laureates, N Engl J Med 2012

29 Does X cause Y or vice versa? No (not enough) data for chocolate

30 Does X cause Y or vice versa? No (not enough) data for chocolate... but we have data for coffee!

31-33 Does X cause Y or vice versa? [Scatter plot: # Nobel Laureates / 10 mio against coffee consumption per capita (kg). Correlation coefficient and p-value reported on the slide.]
Coffee → Nobel Prize: dependent residuals (p-value of ...).
Nobel Prize → Coffee: dependent residuals (p-value of ...).
Model class too small? Causally insufficient?
Question: when is a p-value too small?

34 Does X cause Y or vice versa? Method No. 1: testing for independent residuals. Regress each variable on the other and check whether the residuals are independent of the regressor (the input/explanatory variable). [Plot: gam(x ~ s(y))$residuals against Y]

35 Does X cause Y or vice versa? Method No. 1: testing for independent residuals. Regress each variable on the other and check whether the residuals are independent of the regressor. This translates to p > 2 variables. Nice: correct (in population); no distributional assumption about the noise. Problem: does not scale well to large p, even though it is only O(p²) independence tests. [Plot: gam(x ~ s(y))$residuals against Y]

36-37 Does X cause Y or vice versa? [image slides]

38 Does X cause Y or vice versa? Method No. 2: minimizing KL. Estimate the direction that corresponds to the closest subspace (details follow).

39 Does X cause Y or vice versa? [image slide]

40-42 Does X cause Y or vice versa?
Proposition. Assume P(X, Y) is generated by Y = βX² + N_Y with independent X ~ N(0, σ_X²) and N_Y ~ N(0, σ_NY²). Then, if β ≠ 0,
inf_{Q ∈ {Q : Y → X}} KL(P ‖ Q) = (1/2) log(1 + 2β² σ_X⁴ / σ_NY²) > 0.
This gives us finite-sample guarantees, and it speaks to model misspecification: how much non-Gaussianity can we allow for?
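
The closed form makes it easy to see how strongly identifiable the model is; a small R helper evaluating the gap for a few arbitrary parameter values:

```r
# Identifiability gap from the proposition:
# inf_Q KL(P || Q) = 1/2 * log(1 + 2 * beta^2 * sigma_X^4 / sigma_N^2)
kl_gap <- function(beta, sigma_X, sigma_N) {
  0.5 * log(1 + 2 * beta^2 * sigma_X^4 / sigma_N^2)
}
kl_gap(beta = 1,   sigma_X = 1, sigma_N = 1)  # ~0.55 nats
kl_gap(beta = 0.1, sigma_X = 1, sigma_N = 1)  # ~0.01: weak nonlinearity, tiny gap
kl_gap(beta = 1,   sigma_X = 1, sigma_N = 3)  # ~0.10: the gap shrinks with noise
```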

43-46 Does X cause Y or vice versa?
Proposition. Assume P(X, Y) is generated by Y = f(X) + N_Y with independent X ~ N(0, σ_X²) and N_Y ~ N(0, σ_NY²). Then
inf_{Q ∈ {Q : Y → X}} KL(P ‖ Q) ≥ KL(P(Y) ‖ N(0, Var Y)).
This gives us finite-sample guarantees, and it speaks to model misspecification: how much non-Gaussianity can we allow for?
Question: inf_{Q ∈ {Q : Y → X}} KL(P ‖ Q) = ...?

47 Can we reconstruct the whole causal network?

48-49 Can we reconstruct the whole causal network?
Theorem. Let P(X_1, ..., X_p) be generated by an additive noise model with Gaussian noise, X_i = f_i(X_PA_i) + N_i, with jointly independent N_i ~ N(0, σ_i²) and differentiable, nonlinear f_i. Then we can identify the corresponding DAG from P(X_1, ..., X_p).
JP, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, to appear in JMLR
Surprising: this follows from identifiability in the bivariate case.

50 Can we reconstruct the whole causal network?
Theorem. Let P(X_1, ..., X_p) be generated by a causal additive model with Gaussian noise, X_i = Σ_{k ∈ PA_i} f_{i,k}(X_k) + N_i, with jointly independent N_i ~ N(0, σ_i²) and differentiable, nonlinear f_{i,k}. Then we can identify the corresponding DAG from P(X_1, ..., X_p).
JP, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, to appear in JMLR
Surprising: this follows from identifiability in the bivariate case.

51 Can we reconstruct the whole causal network? Given P̂_n(X_1, ..., X_4): what now?

52-54 Can we reconstruct the whole causal network? Given P̂_n(X_1, ..., X_4): what now? Consider the model classes
Q_G := {Q : Q generated by a causal additive model (CAM) with DAG G}
and optimize
min over DAGs G of inf_{Q ∈ Q_G} KL(P̂_n ‖ Q),
which by maximum likelihood reduces to
min over DAGs G of Σ_{i=1}^p log Var(residuals of X_i regressed on PA_i^G).
Wait, there is no penalization on the number of edges! The optimum is attained by a fully connected graph, i.e., we are really searching over orderings.
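
In R, this score of a candidate DAG reduces to one additive regression per node; a minimal sketch, scoring the true graph of the simulated four-node data (`dat`) from earlier against a wrong one. The parent sets are hard-coded here for illustration.

```r
# Score of a candidate DAG: sum over nodes of the log residual variance
# of X_i regressed (additively) on its parents.
library(mgcv)

node_score <- function(dat, v, pa) {
  if (length(pa) == 0) return(log(var(dat[[v]])))   # source node
  f <- reformulate(paste0("s(", pa, ")"), response = v)
  log(var(residuals(gam(f, data = dat))))
}

score_dag <- function(dat, parents) {
  sum(sapply(names(parents), function(v) node_score(dat, v, parents[[v]])))
}

true_dag  <- list(X1 = "X3", X2 = character(0),
                  X3 = "X2", X4 = c("X2", "X3"))
wrong_dag <- list(X1 = character(0), X2 = "X3",
                  X3 = "X1", X4 = c("X2", "X3"))

score_dag(dat, true_dag)    # should be smaller (better) ...
score_dag(dat, wrong_dag)   # ... than the score of a misdirected DAG
```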

55-57 Can we reconstruct the whole causal network? There are ... DAGs with 13 nodes. There are only 13! ≈ 6.2 × 10⁹ orderings :-)
Idea: find a causal order. Find the order of variables that maximizes the likelihood and then perform classical variable selection. This is consistent (but intractable).
P. Bühlmann, JP and J. Ernest: CAM: Causal additive models, high-dimensional order search and penalized regression, submitted

58 Can we reconstruct the whole causal network? [Figure: matrix of candidate-edge score gains over nodes X_1, ..., X_6; include the best edge, then recompute the affected column.] STEP 1: Greedy Addition. Include the edge that leads to the largest increase of the log-likelihood.
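
A simplified sketch of this greedy step in R, repeating the node-wise score from the previous sketch so the block runs on its own. It scores every candidate edge, skips edges that would create a cycle, and stops when no edge improves the fit or after max_edges additions; the actual CAM implementation is considerably more refined and efficient.

```r
# STEP 1 (greedy addition), simplified: repeatedly add the edge with the
# largest score gain, i.e. the largest drop in log residual variance.
library(mgcv)

node_score <- function(dat, v, pa) {          # as in the scoring sketch
  if (length(pa) == 0) return(log(var(dat[[v]])))
  f <- reformulate(paste0("s(", pa, ")"), response = v)
  log(var(residuals(gam(f, data = dat))))
}

creates_cycle <- function(parents, from, to) {
  # adding from -> to closes a cycle iff 'from' is a descendant of 'to'
  visited <- character(0); stack <- to
  while (length(stack) > 0) {
    v <- stack[1]; stack <- stack[-1]
    if (v %in% visited) next
    visited <- c(visited, v)
    kids <- names(parents)[sapply(parents, function(p) v %in% p)]
    if (from %in% kids) return(TRUE)
    stack <- c(stack, kids)
  }
  FALSE
}

greedy_addition <- function(dat, max_edges = 6, tol = 1e-3) {
  vars    <- names(dat)
  parents <- setNames(rep(list(character(0)), length(vars)), vars)
  scores  <- sapply(vars, function(v) node_score(dat, v, character(0)))
  for (step in seq_len(max_edges)) {
    best <- NULL; best_gain <- tol
    for (to in vars) for (from in setdiff(vars, c(to, parents[[to]]))) {
      if (creates_cycle(parents, from, to)) next
      gain <- scores[[to]] - node_score(dat, to, c(parents[[to]], from))
      if (gain > best_gain) { best <- c(from, to); best_gain <- gain }
    }
    if (is.null(best)) break                   # no edge improves the fit
    parents[[best[2]]] <- c(parents[[best[2]]], best[1])
    scores[[best[2]]]  <- node_score(dat, best[2], parents[[best[2]]])
  }
  parents
}

greedy_addition(dat)   # on the simulated data this should recover G_0's edges
```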

59 Can we reconstruct the whole causal network? Theorem (some correctness of greedy search for CAM). Assume that the skeleton of the correct DAG does not contain any cycles (plus some mild conditions/modifications). Greedy addition then yields a correct causal order (in population).
JP, S. Balakrishnan and M. Wainwright: in progress.
STEP 1: Greedy Addition. Include the edge that leads to the largest increase of the log-likelihood.

60 Can we reconstruct the whole causal network? [Figure: dense graph over X_1, ..., X_6 pruned to a sparser one by variable selection.] STEP 2: Variable Selection. For each node, remove non-relevant edges.
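
One possible R sketch of this pruning step: refit each node on its current parents and keep only parents whose smooth terms are significant. The significance rule and threshold below are stand-ins; the CAM paper phrases this as penalized regression / covariate selection.

```r
# STEP 2 (variable selection), simplified: for each node, refit an additive
# model on its current parents and drop those with insignificant smooths.
library(mgcv)

prune_parents <- function(dat, parents, alpha = 0.001) {
  lapply(setNames(names(parents), names(parents)), function(v) {
    pa <- parents[[v]]
    if (length(pa) == 0) return(pa)
    fit <- gam(reformulate(paste0("s(", pa, ")"), response = v), data = dat)
    pa[summary(fit)$s.table[, "p-value"] < alpha]  # keep significant parents
  })
}

# e.g. prune the graph returned by the greedy step:
# prune_parents(dat, greedy_addition(dat))
```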

61 Can we reconstruct the whole causal network? [Figure: candidate parent sets over X_1, ..., X_6 reduced by preliminary neighbourhood selection.] Easy to add for high-dimensional data: STEP 0: Preliminary Neighbourhood Selection (PNS).

62 Can we reconstruct the whole causal network? Simulations: p = 100, n = 200, functions drawn from a Gaussian process. [Boxplots: SHD to the true DAG for CAM, RESIT, LiNGAM, PC, CPC, GES.]

63 Can we reconstruct the whole causal network? Simulations: p = 100, n = 200, functions drawn from a Gaussian process. [Boxplots: SID to the true DAG for CAM, RESIT, LiNGAM, and lower/upper bounds for PC, CPC, GES.]
JP and P. Bühlmann: Structural Intervention Distance (SID) for Evaluating Causal Graphs, arXiv

64 Can we reconstruct the whole causal network? Simulations: p = 10, n = 200, linear functions. [Boxplots: SHD to the true CPDAG for CAM, LiNGAM, PC, CPC, GES.]

65 Can we reconstruct the whole causal network? Real data: Arabidopsis thaliana, p = 39, n = 118; the 20 most significant edges. [Figure: isoprenoid gene network spanning the chloroplast (MEP pathway), the cytoplasm (MVA pathway), and the mitochondrion.]
Wille et al.: Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology, 5(11), 2004.

66 Can we reconstruct the whole causal network? Real data: Arabidopsis thaliana, p = 39, n = 118; stability selection with an expected number of false positives of at most 2. [Figure: isoprenoid gene network spanning the chloroplast (MEP pathway), the cytoplasm (MVA pathway), and the mitochondrion.]
Wille et al.: Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology, 5(11), 2004.

67-69 Can we reconstruct the whole causal network? Real data: is there a ground truth? Assume we are given observational data (j = ...; k = ...)
O_kj: expression level of gene j in observation k,
and some interventional data (j = ...; i = ...; in total 400 genes)
A_ij: expression level of gene j under a knock-down of gene i.
How do we evaluate causal inference methods?

70-72 Summary
Idea: Additive Noise Models. Structural assumptions such as additive noise, X_i = f_i(X_pa(i)) + N_i, lead to identifiability.
Idea: causal order + variable selection. Find the order of variables that maximizes the likelihood (by greedily adding edges) and then perform classical variable selection.
Open Questions:
Identifiability (e.g., in terms of KL distances), depending on the degree of nonlinearity, for example.
Hidden variables.
Do the assumptions (roughly) hold in practice?
How do we evaluate causal inference methods?
Dankeschön!! (Thank you!!)
