Causal inference (with statistical uncertainty) based on invariance: exploiting the power of heterogeneous data

1 Causal inference (with statistical uncertainty) based on invariance: exploiting the power of heterogeneous data. Peter Bühlmann, joint work with Jonas Peters and Nicolai Meinshausen.

2 ... and designing new perturbation experiments for large-scale biological systems

3 Goal in genomics: if we made an intervention at a single gene, what would its effect be on a phenotype of interest? We want to infer/predict such effects without actually doing the intervention, e.g. from observational data (observations of a steady-state system). But here the focus is on: observational and interventional data with known, specified interventions; changing environments or experimental settings with vaguely specified or unspecified interventions. It doesn't need to be genes, and one can generalize to interventions at more than one variable/gene.

6 Genomics. 1. Flowering of Arabidopsis thaliana. Phenotype/response variable of interest: Y = days to bolting (flowering); covariates X = gene expressions of p genes. Question: infer/predict the effect of knocking out/knocking down (or enhancing) a single gene (expression) on the phenotype/response variable Y. Randomized experiments for the top hits from a statistical analysis based on n = 47 observational data points, with some success (Stekhoven, Moraes, Sveinbjörnsson, Hennig, Maathuis & PB, 2012).

8 2. Gene expressions of yeast: p = 5360 genes. Phenotype of interest: Y = expression of the first gene, covariates X = gene expressions of all other genes; then Y = expression of the second gene, X = gene expressions of all other genes; and so on. Goal: infer/predict the effects of a single gene knock-down on all other genes.

9 Effects of single gene knock-downs on all other genes in yeast (Maathuis, Colombo, Kalisch & PB, 2010). p = 5360 genes (expression of genes); 231 gene knock-downs give intervention effects, so the truth is known in good approximation (thanks to the intervention experiments). Goal: prediction of the true large intervention effects based on observational data with no knock-downs (n = 63 observational data points). [Figure: true positives vs. false positives (0 to 4,000) for IDA, Lasso, Elastic net, and random guessing.]

10 It was a good finding... but some criticisms apply: not very robust, i.e. it depends to an unpleasant degree on how we define the truth (as found recently by J. Mooij, J. Peters and others); old, publicly available data (Hughes et al., 2000); no collaborator... just (publicly available) data, and this is good! We didn't use the interventional data for training.

11 A new and better attempt for the same problem: single gene knock-downs in yeast, measuring genome-wide expression (observational and interventional data). Collaborators: Frank Holstege, Patrick Kemmeren et al. (Utrecht); data from modern technology: Kemmeren, ..., and Holstege (Cell, 2014).

12 A new approach, developed with Jonas Peters and Nicolai Meinshausen (MPI Tübingen and ETH Zürich): based on an invariance principle; exploiting heterogeneous data (e.g. interventional data or changes of environment), i.e. an aspect of 'Big Data'; allowing for statistical confidence statements. Peters, PB and Meinshausen (2015).

13 Causal inference using invariant prediction (Peters, PB and Meinshausen, 2015). Goal: find the causal variables (components) among a p-dimensional predictor variable X for a specific response variable Y. Consider data (X^e, Y^e) ~ F^e with response variables Y^e and predictor variables X^e; here e ∈ E, the space of experimental settings, denotes an experimental setting. Heterogeneous data from different environments/experiments e ∈ E (aspects of 'Big Data').

15 Data (X^e, Y^e) ~ F^e. Example 1: E = {1, 2}, encoding observational data (1) and all potentially unspecific interventional data (2). Example 2: E = {1, 2}, encoding observational data (1) and (repeated) data from one specific intervention (2). Example 3: E = {1, 2, 3}, encoding (repeated) data from a first specific intervention (1), from a second specific intervention (2), and from all other (potentially unspecific) interventions (3).

16 Important: we have access to more than, say, only observational data. That is, we assume here a setting with e.g. observational and potentially unspecified interventional (or change-of-environment) data. This is realistic in many applications, but different from causal inference from observational data alone.

17 Assumption: invariance of causal linear predictions. There exists a true vector γ* of linear causal coefficients such that for all e ∈ E: Y^e = X^e γ* + ε^e, ε^e ~ F_ε, with S* = {j : γ*_j ≠ 0} the causal predictors; ε^e is independent of X^e_{S*}, and F_ε is the same for all e, with mean zero and finite variance. Message: the causal coefficients γ* and the causal variables X_{S*} are stable (are the same) across experimental settings! This is intuitively meaningful, and we will make it more rigorous.

19 Invariance: an important mathematical and scientific concept.

20 Make it more rigorous... For all e ∈ E: Y^e = X^e γ* + ε^e, ε^e ~ F_ε the same for all e, S* = {j : γ*_j ≠ 0} the causal predictors, ε^e independent of X^e_{S*}. We prove this invariance assumption for structural equation models (SEMs), roughly assuming: 1. there are no interventions ('manipulations') on the target/response variable Y; 2. γ* and F_ε do not change under interventions ('manipulations'). Fewer requirements than for classical Pearl do-interventions (with truncated factorization); see next...

21 Consider H_{0,γ,S}(E): γ_k = 0 if k ∉ S, and there exists F_ε such that for all e ∈ E: Y^e = X^e γ + ε^e, with ε^e independent of X^e_S and ε^e ~ F_ε the same for all e. Then S and γ are potential causal variables and coefficients. Further, H_{0,S}(E): there exists γ such that H_{0,γ,S}(E) holds. A set S ⊆ {1,..., p} is called a set of plausible causal predictors if H_{0,S}(E) holds.

22 Identifiable causal predictors under E: the set S(E) = ∩ {S : H_{0,S}(E) holds}, the intersection of all sets of plausible causal predictors. Under the invariance assumption we have S(E) ⊆ S*, and this is key to obtaining confidence bounds for identifiable causal predictors.
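The subset guarantee S(E) ⊆ S* follows from pure set logic: the true set S* always satisfies H_{0,S*}(E), so it belongs to the family being intersected, and an intersection is contained in each of its members. A minimal sketch, with a made-up family of accepted sets standing in for the test outcomes:

```python
# Hypothetical family of subsets S for which H_{0,S}(E) was not rejected.
# By the invariance assumption, the true set S_star is always in this family.
S_star = {1, 3}
accepted = [{1, 3}, {1, 2, 3}, {1, 2}]  # S_star plus two other plausible sets

# S(E): intersection of all plausible sets of causal predictors
S_E = set.intersection(*accepted)

# Since S_star is one of the intersected sets, S(E) is contained in S_star
assert S_E <= S_star
print(S_E)  # -> {1}
```

Note that accepted sets need not be supersets of S*; the guarantee only needs S* itself to be accepted.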

23 We also have by definition: S(E_1) ⊆ S(E_2) if E_1 ⊆ E_2. With more interventions / more heterogeneity / more diversity in 'Big Data', we can identify more causal predictors.

24 Identifiable causal predictors: S(E), growing as E grows; true causal predictors: S*. Question: when is S(E) = S*?

25 Consider a linear Gaussian structural equation model (SEM): X_j ← Σ_{k ∈ pa(j)} β_{jk} X_k + ε_j (j = 1,..., p + 1), ε_j Gaussian, with X_{p+1} = Y. Three types of interventions: classical do-interventions at single variables; noise (or 'soft') interventions: for e ≠ 1 (e = 1 being observational), ε^e_j ← A^e_j ε^{e=1}_j or ε^e_j ← ε^{e=1}_j + C^e_j; simultaneous noise interventions where all variables are affected: all interventional settings are pooled into a single one (e = 2), and for all j: ε^{e=2}_j ← A^{e=2}_j ε^{e=1}_j or ε^{e=2}_j ← ε^{e=1}_j + C^{e=2}_j.
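To see the invariance at work, one can simulate such a SEM under a noise intervention. The sketch below (a hypothetical three-variable structure X1 → Y → X2 with made-up coefficients, not from the talk) rescales the noise of X1 and X2 in the second environment while leaving Y's equation untouched; the regression of Y on its parent X1 then keeps the same coefficient across environments, while the regression on the non-parent X2 does not:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, env):
    # env 1: observational; env 2: noise interventions eps <- A * eps
    # on X1 and X2 (A = 3), never on Y's structural equation
    a = 3.0 if env == 2 else 1.0
    x1 = a * rng.normal(size=n)              # X1 := A^e eps1
    y = 1.5 * x1 + rng.normal(size=n)        # Y  := 1.5 X1 + eps_Y (invariant)
    x2 = 0.8 * y + a * rng.normal(size=n)    # X2 := 0.8 Y + A^e eps2
    return x1, y, x2

def slope(x, y):
    # simple least-squares slope of y on x
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

x1o, yo, x2o = simulate(20_000, env=1)
x1i, yi, x2i = simulate(20_000, env=2)

print(slope(x1o, yo), slope(x1i, yi))  # parent X1: both close to 1.5
print(slope(x2o, yo), slope(x2i, yi))  # non-parent X2: slopes differ
```

This is exactly the stability that the invariance assumption postulates for the causal coefficients, and which breaks down for non-causal regressions.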

26 For all these interventions: the β_{jk} can change under interventions, except for j = p + 1, corresponding to Y; this holds even for the classical do-interventions...

27 Theorem (Peters, PB and Meinshausen, 2015). S(E) = S* if there is: a single do-intervention for each variable other than Y, with |E| = p; or a single noise intervention for each variable other than Y, with |E| = p; or a simultaneous noise intervention for each variable other than Y, with |E| = 2. The conditions can be relaxed such that it is not necessary to intervene at all the variables.

28 Statistical confidence sets for causal predictors. Recap: H_{0,γ,S}(E): γ_k = 0 if k ∉ S, and there exists F_ε such that for all e ∈ E: Y^e = X^e γ + ε^e, with ε^e independent of X^e_S and ε^e ~ F_ε the same for all e. Here γ is a candidate for being a causal coefficient vector for a set S (not necessarily the true set S* of causal predictors). Consider the plausible causal coefficients for S: Γ_S(E) = {γ : H_{0,γ,S}(E) holds}, and the global plausible causal coefficients: Γ(E) = ∪_S Γ_S(E).

29 We have, similarly as before: Γ(E_2) ⊆ Γ(E_1) if E_1 ⊆ E_2; that is, Γ(E) shrinks as we enlarge E. With more interventions / more heterogeneity / more diversity in 'Big Data' we have fewer plausible coefficients... and in some cases eventually only the causal ones (see the identifiability result before).

30 Confidence sets with invariant prediction. 1. For each S ⊆ {1,..., p}, construct a set Γ̂_S(E), e.g. as follows: test H_{0,S}(E), i.e. the constancy of the regression parameter β^e_pred(S) and of its residual standard error σ^e(S) across e ∈ E; if H_{0,S}(E) is rejected at level α/2, set Γ̂_S(E) = ∅; otherwise, set Γ̂_S(E) = classical confidence interval for the regression parameter based on X_S. 2. Set Γ̂(E) = ∪_{S ⊆ {1,...,p}} Γ̂_S(E) (estimated plausible causal coefficients). 3. Define Ŝ(E) = ∩_{γ ∈ Γ̂(E)} {k : γ_k ≠ 0} (the common part of the plausible causal coefficients).
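The subset-search part of steps 1-3 can be sketched as follows. This is a simplified stand-in, not the exact procedure of Peters, PB and Meinshausen (2015) or the InvariantCausalPrediction R package: the invariance test here is a crude combination of Welch's t-test (residual means) and Levene's test (residual spread) on a pooled fit, and the function names and the simulated SEM are invented for illustration:

```python
import itertools

import numpy as np
from scipy import stats

def invariant_prediction_set(X, y, env, alpha=0.01):
    # Step 1: for every subset S, fit one pooled regression of y on X_S
    # and test invariance of the residuals across the two environments.
    n, p = X.shape
    accepted = []
    for size in range(p + 1):
        for S in itertools.combinations(range(p), size):
            Z = np.column_stack([X[:, list(S)], np.ones(n)])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            r = y - Z @ beta
            r1, r2 = r[env == 1], r[env == 2]
            p_mean = stats.ttest_ind(r1, r2, equal_var=False).pvalue
            p_scale = stats.levene(r1, r2).pvalue
            # crude Bonferroni correction over the two partial tests
            if min(p_mean, p_scale) > alpha / 2:
                accepted.append(set(S))
    # Steps 2-3: the estimate is the intersection of all accepted sets
    return set.intersection(*accepted) if accepted else set()

# hypothetical SEM X1 -> Y -> X2; environment 2 rescales the noise of
# X1 and X2 (never of Y)
rng = np.random.default_rng(7)

def simulate(n, scale):
    x1 = scale * rng.normal(size=n)
    y = 1.5 * x1 + rng.normal(size=n)         # Y's equation stays invariant
    x2 = 0.8 * y + scale * rng.normal(size=n)
    return np.column_stack([x1, x2]), y

(Xa, ya), (Xb, yb) = simulate(2000, 1.0), simulate(2000, 3.0)
X = np.vstack([Xa, Xb])
y = np.concatenate([ya, yb])
env = np.repeat([1, 2], 2000)

S_hat = invariant_prediction_set(X, y, env)
print(S_hat)  # at most the true parent X1 (index 0) is claimed
```

The exhaustive loop over subsets is exponential in p, which is why the practical implementations restrict or prune the search.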

31 Theorem (Peters, PB and Meinshausen, 2015). Assume P[γ* ∈ Γ̂_{S*}(E)] ≥ 1 − α (a correct confidence interval in a regression problem). Then P[γ* ∈ Γ̂(E)] ≥ 1 − α and P[Ŝ(E) ⊆ S*] ≥ 1 − α: on the safe side (conservative). We do not need to care about identifiability: if the effect is not identifiable, the method will not wrongly claim an effect. This is the first result on statistical confidence for causal predictors (the route via graphical modeling for confidence sets seems awkward).

33 Empirical results: simulations. 100 different scenarios, 1000 data sets per scenario; |E| = 2, n_obs = n_interv ∈ {100,..., 500}, p ∈ {5,..., 40}. [Figure: success probability (power to detect causal predictors) and familywise error rate P[Ŝ(E) ⊄ S*] (aimed at 0.05) for marginal correlation, regression, GES, GIES (unknown interventions), GIES (known interventions), LiNGAM, and invariant prediction.]

34 Gene perturbation experiments, Kemmeren et al. (2014): genome-wide mRNA expression in yeast, p = 6170 genes; n_obs = 160 observational samples of wild-types; n_int = 1479 interventional samples, each corresponding to a single gene-deletion strain. For our method we use |E| = 2 (observational and interventional data). Training-test split: training on all observational data and 2/3 of the interventional data, testing on the other 1/3 of the gene-deletion interventions; this is repeated for the three blocks of interventional data. Since every interventional data point is used once as a response variable, we use coverage 1 − α/n_int with α = 0.01 and n_int = 1479.

35 Results: 8 genes are found as significant causal predictors. Methods compared: invar.pred., GIES, IDA, marg.corr., rand.guess.; reported: no. of true positives* (out of 8). *: quantiles for selecting true positives among 8 random draws: 2 (95%), 3 (99%). Our invariant prediction method has the most power, and it should exhibit control against false positive selections.

36 A more detailed view. [Figure: gene-activity scatter plots. 1st significant gene: observational and interventional training data (interventions on genes other than 5954 and 4710) and the interventional test data point (intervention on gene 5954): successful. 2nd significant gene: training data (interventions on genes other than 3729 and 3730) and the test data point (intervention on gene 3729): successful. 3rd significant gene: training data (interventions on genes other than 3672 and 1475) and the test data point (intervention on gene 3672): not successful (poorer model fit?).]

37 Concluding thoughts. Robustness: the main requirement is the invariance assumption, for all e ∈ E: Y^e = X^e γ* (= X^e_{S*} γ*_{S*}) + ε^e, ε^e ~ F_ε, ε^e independent of X^e_{S*}. When making it a definition of causality: essentially no other major conditions are required (except the absence of hidden confounders). When proving it: we have shown that it holds for SEMs under various (and weaker) notions of (potentially unspecified) interventions; it can also be proved for some hidden-variable IV-type models (Peters, PB and Meinshausen, 2015). The invariance assumption is (probably) a rather robust assumption.

38 Statistical confidence statements are essential for improving replicability. We haven't included non-linear or non-Gaussian structures, cf. Imoto (2002), Mooij et al. (2009), Peters et al. (2014), Shimizu (2005): improved identifiability. Invariance assumption for non-linear additive noise models: for all e ∈ E: Y^e = f(X^e_{S*}) + ε^e, ε^e ~ F_ε, ε^e independent of X^e_{S*}.

39 Due to the robustness and the generality of heterogeneous data ('change of environment'): progress towards more confirmatory causal inference? If it isn't confirmatory, it is potentially useful for the prioritization of future experiments; experimental validations are important (simple organisms in biology are ideal for pursuing this!).

40 Thank you!
R package pcalg (Kalisch, Mächler, Colombo, Maathuis & PB)
R package InvariantCausalPrediction (Meinshausen, 2014)
References to some of our own work:
Peters, J., Bühlmann, P. and Meinshausen, N. (2015). Causal inference using invariant prediction: identification and confidence intervals. arXiv preprint.
Hauser, A. and Bühlmann, P. (2015). Jointly interventional and observational data: estimation of interventional Markov equivalence classes of directed acyclic graphs. Journal of the Royal Statistical Society, Series B, 77.
Bühlmann, P., Peters, J. and Ernest, J. (2014). CAM: causal additive models, high-dimensional order search and penalized regression. Annals of Statistics 42.
Peters, J. and Bühlmann, P. (2014). Identifiability of Gaussian structural equation models with equal error variances. Biometrika 101.
van de Geer, S. and Bühlmann, P. (2013). l0-penalized maximum likelihood for sparse directed acyclic graphs. Annals of Statistics 41.
Kalisch, M., Mächler, M., Colombo, D., Maathuis, M.H. and Bühlmann, P. (2012). Causal inference using graphical models with the R package pcalg. Journal of Statistical Software 47(11).
Stekhoven, D.J., Moraes, I., Sveinbjörnsson, G., Hennig, L., Maathuis, M.H. and Bühlmann, P. (2012). Causal stability ranking. Bioinformatics 28.
Maathuis, M.H., Colombo, D., Kalisch, M. and Bühlmann, P. (2010). Predicting causal effects in large-scale systems from observational data. Nature Methods 7.
Maathuis, M.H., Kalisch, M. and Bühlmann, P. (2009). Estimating high-dimensional intervention effects from observational data. Annals of Statistics 37.


More information

This paper has been submitted for consideration for publication in Biometrics

This paper has been submitted for consideration for publication in Biometrics Biometrics 63, 1?? March 2014 DOI: 10.1111/j.1541-0420.2005.00454.x PenPC: A Two-step Approach to Estimate the Skeletons of High Dimensional Directed Acyclic Graphs Min Jin Ha Department of Biostatistics,

More information

Proximity-Based Anomaly Detection using Sparse Structure Learning

Proximity-Based Anomaly Detection using Sparse Structure Learning Proximity-Based Anomaly Detection using Sparse Structure Learning Tsuyoshi Idé (IBM Tokyo Research Lab) Aurelie C. Lozano, Naoki Abe, and Yan Liu (IBM T. J. Watson Research Center) 2009/04/ SDM 2009 /

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2013 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Meghana Kshirsagar (mkshirsa), Yiwen Chen (yiwenche) 1 Graph

More information

Dimension Reduction. David M. Blei. April 23, 2012

Dimension Reduction. David M. Blei. April 23, 2012 Dimension Reduction David M. Blei April 23, 2012 1 Basic idea Goal: Compute a reduced representation of data from p -dimensional to q-dimensional, where q < p. x 1,...,x p z 1,...,z q (1) We want to do

More information

arxiv: v1 [stat.ml] 13 Mar 2018

arxiv: v1 [stat.ml] 13 Mar 2018 SAM: Structural Agnostic Model, Causal Discovery and Penalized Adversarial Learning arxiv:1803.04929v1 [stat.ml] 13 Mar 2018 Diviyan Kalainathan 1, Olivier Goudet 1, Isabelle Guyon 1, David Lopez-Paz 2,

More information

Shrinkage Methods: Ridge and Lasso

Shrinkage Methods: Ridge and Lasso Shrinkage Methods: Ridge and Lasso Jonathan Hersh 1 Chapman University, Argyros School of Business hersh@chapman.edu February 27, 2019 J.Hersh (Chapman) Ridge & Lasso February 27, 2019 1 / 43 1 Intro and

More information

Post-Selection Inference

Post-Selection Inference Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

More information

Empirical Likelihood Based Deviance Information Criterion

Empirical Likelihood Based Deviance Information Criterion Empirical Likelihood Based Deviance Information Criterion Yin Teng Smart and Safe City Center of Excellence NCS Pte Ltd June 22, 2016 Outline Bayesian empirical likelihood Definition Problems Empirical

More information

arxiv: v4 [math.st] 19 Jun 2018

arxiv: v4 [math.st] 19 Jun 2018 Complete Graphical Characterization and Construction of Adjustment Sets in Markov Equivalence Classes of Ancestral Graphs Emilija Perković, Johannes Textor, Markus Kalisch and Marloes H. Maathuis arxiv:1606.06903v4

More information

arxiv: v2 [cs.lg] 9 Mar 2017

arxiv: v2 [cs.lg] 9 Mar 2017 Journal of Machine Learning Research? (????)??-?? Submitted?/??; Published?/?? Joint Causal Inference from Observational and Experimental Datasets arxiv:1611.10351v2 [cs.lg] 9 Mar 2017 Sara Magliacane

More information

Robust Inverse Covariance Estimation under Noisy Measurements

Robust Inverse Covariance Estimation under Noisy Measurements .. Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang, Shou-De Lin Intel-NTU, National Taiwan University ICML 2014 1 / 30 . Table of contents Introduction.1 Introduction.2 Related

More information

Causal Inference on Discrete Data via Estimating Distance Correlations

Causal Inference on Discrete Data via Estimating Distance Correlations ARTICLE CommunicatedbyAapoHyvärinen Causal Inference on Discrete Data via Estimating Distance Correlations Furui Liu frliu@cse.cuhk.edu.hk Laiwan Chan lwchan@cse.euhk.edu.hk Department of Computer Science

More information

Least Absolute Shrinkage is Equivalent to Quadratic Penalization

Least Absolute Shrinkage is Equivalent to Quadratic Penalization Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr

More information

Center for Causal Discovery: Summer Workshop

Center for Causal Discovery: Summer Workshop Center for Causal Discovery: Summer Workshop - 2015 June 8-11, 2015 Carnegie Mellon University 1 Goals 1) Working knowledge of graphical causal models 2) Basic working knowledge of Tetrad V 3) Basic understanding

More information

High-Dimensional Learning of Linear Causal Networks via Inverse Covariance Estimation

High-Dimensional Learning of Linear Causal Networks via Inverse Covariance Estimation University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 10-014 High-Dimensional Learning of Linear Causal Networks via Inverse Covariance Estimation Po-Ling Loh University

More information

Causal Model Selection Hypothesis Tests in Systems Genetics

Causal Model Selection Hypothesis Tests in Systems Genetics 1 Causal Model Selection Hypothesis Tests in Systems Genetics Elias Chaibub Neto and Brian S Yandell SISG 2012 July 13, 2012 2 Correlation and Causation The old view of cause and effect... could only fail;

More information

An ensemble learning method for variable selection

An ensemble learning method for variable selection An ensemble learning method for variable selection Vincent Audigier, Avner Bar-Hen CNAM, MSDMA team, Paris Journées de la statistique 2018 1 / 17 Context Y n 1 = X n p β p 1 + ε n 1 ε N (0, σ 2 I) β sparse

More information

Basics of Modern Missing Data Analysis

Basics of Modern Missing Data Analysis Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing

More information

Approximate Bayesian Computation

Approximate Bayesian Computation Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate

More information

Linear Methods for Regression. Lijun Zhang

Linear Methods for Regression. Lijun Zhang Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived

More information

The lasso: some novel algorithms and applications

The lasso: some novel algorithms and applications 1 The lasso: some novel algorithms and applications Newton Institute, June 25, 2008 Robert Tibshirani Stanford University Collaborations with Trevor Hastie, Jerome Friedman, Holger Hoefling, Gen Nowak,

More information

Exam: high-dimensional data analysis January 20, 2014

Exam: high-dimensional data analysis January 20, 2014 Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

High-dimensional graphical model selection: Practical and information-theoretic limits

High-dimensional graphical model selection: Practical and information-theoretic limits 1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John

More information

Type-II Errors of Independence Tests Can Lead to Arbitrarily Large Errors in Estimated Causal Effects: An Illustrative Example

Type-II Errors of Independence Tests Can Lead to Arbitrarily Large Errors in Estimated Causal Effects: An Illustrative Example Type-II Errors of Independence Tests Can Lead to Arbitrarily Large Errors in Estimated Causal Effects: An Illustrative Example Nicholas Cornia & Joris M. Mooij Informatics Institute University of Amsterdam,

More information

BTRY 7210: Topics in Quantitative Genomics and Genetics

BTRY 7210: Topics in Quantitative Genomics and Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu February 12, 2015 Lecture 3:

More information

Group exponential penalties for bi-level variable selection

Group exponential penalties for bi-level variable selection for bi-level variable selection Department of Biostatistics Department of Statistics University of Kentucky July 31, 2011 Introduction In regression, variables can often be thought of as grouped: Indicator

More information

Causal Mediation Analysis in R. Quantitative Methodology and Causal Mechanisms

Causal Mediation Analysis in R. Quantitative Methodology and Causal Mechanisms Causal Mediation Analysis in R Kosuke Imai Princeton University June 18, 2009 Joint work with Luke Keele (Ohio State) Dustin Tingley and Teppei Yamamoto (Princeton) Kosuke Imai (Princeton) Causal Mediation

More information

Identifiability of Gaussian structural equation models with equal error variances

Identifiability of Gaussian structural equation models with equal error variances Biometrika (2014), 101,1,pp. 219 228 doi: 10.1093/biomet/ast043 Printed in Great Britain Advance Access publication 8 November 2013 Identifiability of Gaussian structural equation models with equal error

More information

Tutorial on Approximate Bayesian Computation

Tutorial on Approximate Bayesian Computation Tutorial on Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology 16 May 2016

More information

ESTIMATING HIGH-DIMENSIONAL INTERVENTION EFFECTS FROM OBSERVATIONAL DATA

ESTIMATING HIGH-DIMENSIONAL INTERVENTION EFFECTS FROM OBSERVATIONAL DATA The Annals of Statistics 2009, Vol. 37, No. 6A, 3133 3164 DOI: 10.1214/09-AOS685 Institute of Mathematical Statistics, 2009 ESTIMATING HIGH-DIMENSIONAL INTERVENTION EFFECTS FROM OBSERVATIONAL DATA BY MARLOES

More information

Advances in Cyclic Structural Causal Models

Advances in Cyclic Structural Causal Models Advances in Cyclic Structural Causal Models Joris Mooij j.m.mooij@uva.nl June 1st, 2018 Joris Mooij (UvA) Rotterdam 2018 2018-06-01 1 / 41 Part I Introduction to Causality Joris Mooij (UvA) Rotterdam 2018

More information

LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape

LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape Nikolaus Umlauf https://eeecon.uibk.ac.at/~umlauf/ Overview Joint work with Andreas Groll, Julien Hambuckers

More information

11 : Gaussian Graphic Models and Ising Models

11 : Gaussian Graphic Models and Ising Models 10-708: Probabilistic Graphical Models 10-708, Spring 2017 11 : Gaussian Graphic Models and Ising Models Lecturer: Bryon Aragam Scribes: Chao-Ming Yen 1 Introduction Different from previous maximum likelihood

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really

More information

arxiv: v1 [cs.lg] 3 Jan 2017

arxiv: v1 [cs.lg] 3 Jan 2017 Deep Convolutional Neural Networks for Pairwise Causality Karamjit Singh, Garima Gupta, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal TCS Research, New-Delhi, India January 4, 2017 arxiv:1701.00597v1

More information

Causal Discovery by Computer

Causal Discovery by Computer Causal Discovery by Computer Clark Glymour Carnegie Mellon University 1 Outline 1. A century of mistakes about causation and discovery: 1. Fisher 2. Yule 3. Spearman/Thurstone 2. Search for causes is statistical

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

Causal Inference: Discussion

Causal Inference: Discussion Causal Inference: Discussion Mladen Kolar The University of Chicago Booth School of Business Sept 23, 2016 Types of machine learning problems Based on the information available: Supervised learning Reinforcement

More information

36-463/663: Multilevel & Hierarchical Models

36-463/663: Multilevel & Hierarchical Models 36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have

More information

MSA220/MVE440 Statistical Learning for Big Data

MSA220/MVE440 Statistical Learning for Big Data MSA220/MVE440 Statistical Learning for Big Data Lecture 9-10 - High-dimensional regression Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Recap from

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Integrating expert s knowledge in constraint learning algorithm with time-dependent exposures

Integrating expert s knowledge in constraint learning algorithm with time-dependent exposures Integrating expert s knowledge in constraint learning algorithm with time-dependent exposures V. Asvatourian, S. Michiels, E. Lanoy Biostatistics and Epidemiology unit, Gustave-Roussy, Villejuif, France

More information

Statistical Models. David M. Blei Columbia University. October 14, 2014

Statistical Models. David M. Blei Columbia University. October 14, 2014 Statistical Models David M. Blei Columbia University October 14, 2014 We have discussed graphical models. Graphical models are a formalism for representing families of probability distributions. They are

More information

Graphical Model Selection

Graphical Model Selection May 6, 2013 Trevor Hastie, Stanford Statistics 1 Graphical Model Selection Trevor Hastie Stanford University joint work with Jerome Friedman, Rob Tibshirani, Rahul Mazumder and Jason Lee May 6, 2013 Trevor

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

The Monte Carlo Method: Bayesian Networks

The Monte Carlo Method: Bayesian Networks The Method: Bayesian Networks Dieter W. Heermann Methods 2009 Dieter W. Heermann ( Methods)The Method: Bayesian Networks 2009 1 / 18 Outline 1 Bayesian Networks 2 Gene Expression Data 3 Bayesian Networks

More information

Regularization and Variable Selection via the Elastic Net

Regularization and Variable Selection via the Elastic Net p. 1/1 Regularization and Variable Selection via the Elastic Net Hui Zou and Trevor Hastie Journal of Royal Statistical Society, B, 2005 Presenter: Minhua Chen, Nov. 07, 2008 p. 2/1 Agenda Introduction

More information

Regression. Oscar García

Regression. Oscar García Regression Oscar García Regression methods are fundamental in Forest Mensuration For a more concise and general presentation, we shall first review some matrix concepts 1 Matrices An order n m matrix is

More information