Marginal consistency of constraint-based causal learning


Marginal consistency of constraint-based causal learning. Anna Roumpelaki, Giorgos Borboudakis, Sofia Triantafillou and Ioannis Tsamardinos. University of Crete, Greece.

Constraint-based Causal Learning. Measure variables X1, ..., X8 and run FCI on the resulting data set to learn a causal graph. [Figure: a data table over X1, ..., X8 is fed to FCI, which outputs a partially oriented graph over the same variables.] How important is the choice of variables in learning causal relationships? Would you have learnt the same relationships if you had chosen a subset of the variables?

Maximal Ancestral Graphs. Maximal Ancestral Graphs (MAGs) capture the conditional independencies of the joint probability P over a set of variables under the Causal Markov and Faithfulness conditions. Directed edges denote causal ancestry; bi-directed edges denote confounding. No directed cycles; no almost directed cycles. MAGs are closed under marginalization: if G = (V, E) is a MAG, then the marginal graph G[V \ L] over V \ L is also a MAG. MAGs can also handle selection bias (not considered here). [Figure: X -> Y: X causes Y; X <-> Y: X and Y are confounded.]

Partially Oriented Ancestral Graphs (PAGs). Summarize pairwise features of a Markov equivalence class of MAGs. Circles denote ambiguous endpoints. Can be learnt from data using FCI. FCI is sensitive to error propagation and order-dependent (if you change the order of variables in the data set you get a different result). Extensions of FCI [4]: Order-independent FCI (iFCI) [1]: output does not depend on the order of the variables. Conservative FCI (cFCI) [3]: makes additional tests of independence for more robust orientations, and forgoes orientations that are not supported by all tests. Majority rule FCI (mFCI) [3]: conservative FCI that orients ambiguous triplets based on the majority of test results for each triplet. [Figure: X -> Y: X causes Y in all Markov equivalent MAGs; X o-> Y: X and Y are either confounded or X causes Y.]
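The examples that follow use a minimal Python sketch of a PAG, storing the two endpoint marks of every edge. The representation ("-" tail, "o" circle, ">" arrowhead) and all class and method names are illustrative assumptions, not code from the poster.

class PAG:
    def __init__(self, nodes):
        self.nodes = set(nodes)
        self._mark = {}  # (a, b) -> endpoint mark at b on the edge a *-* b

    def add_edge(self, a, b, mark_at_a, mark_at_b):
        # e.g. add_edge("X1", "X2", "o", ">") stores the edge X1 o-> X2
        self._mark[(a, b)] = mark_at_b
        self._mark[(b, a)] = mark_at_a

    def adjacent(self, a, b):
        return (a, b) in self._mark

    def mark(self, a, b):
        # Endpoint mark at b on the edge between a and b.
        return self._mark[(a, b)]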

What happens when you marginalize out variables? Measure X1, ..., X8 and run FCI on the full data set; then run FCI on the marginal data set without variables X2 and X5. [Figure: the PAG learnt from the full data set next to the PAG learnt from the marginal data set over the remaining six variables.]

Finding NAnc_P(X, Y). NAnc_P: X is not an ancestor of Y in all MAGs represented by PAG P, i.e. X has no potentially directed path to Y. [Figure: an example PAG P, with the pairs belonging to NAnc_P listed step by step.]
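A sketch of the NAnc_P computation under the PAG representation above: an edge can be traversed from u to v if it is potentially directed out of u, and NAnc_P collects the ordered pairs with no such path. The function names are illustrative.

from collections import deque

def potentially_directed_out(pag, u, v):
    # The edge u *-* v can be oriented as u -> v in some MAG iff the mark at u
    # is not an arrowhead and the mark at v is not a tail (so u -> v, u o-> v,
    # u o-o v and u -o v qualify; u <-> v and u <- v do not).
    return pag.adjacent(u, v) and pag.mark(v, u) != ">" and pag.mark(u, v) != "-"

def has_potentially_directed_path(pag, x, y):
    # BFS that only follows edges potentially directed away from the frontier;
    # reaching y means x may be an ancestor of y in some MAG represented by P.
    seen, queue = {x}, deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return True
        for v in pag.nodes - seen:
            if potentially_directed_out(pag, u, v):
                seen.add(v)
                queue.append(v)
    return False

def nanc(pag):
    # NAnc_P: ordered pairs (x, y) with no potentially directed path x ... y.
    return {(x, y) for x in pag.nodes for y in pag.nodes
            if x != y and not has_potentially_directed_path(pag, x, y)}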

Finding Anc_P(X, Y). Anc_P: X is an ancestor of Y in all MAGs represented by PAG P. Take the transitive closure of directed edges in the PAG. Some relations are not so obvious! [Figure: an example PAG, with the pairs belonging to Anc_P listed; X -> Y denotes that X causes Y in all MAGs represented by P.]
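A sketch of this first pass for Anc_P, reusing the PAG class above: collect the fully directed edges and take their transitive closure. Again the function names are illustrative.

def directed_edges(pag):
    # u -> v: tail at u, arrowhead at v.
    return {(u, v) for (u, v) in pag._mark
            if pag.mark(v, u) == "-" and pag.mark(u, v) == ">"}

def anc_closure(pag):
    # Transitive closure of the PAG's directed edges, via DFS from each node.
    out = {u: set() for u in pag.nodes}
    for u, v in directed_edges(pag):
        out[u].add(v)
    closure = set()
    for x in pag.nodes:
        stack, seen = list(out[x]), set()
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(out[v])
        closure.update((x, y) for y in seen)
    return closure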

Finding Anc_P(X, Y): Theorem. If (X, Y) is not in the transitive closure of PAG P, X is an ancestor of Y if and only if there exist U, V with U ≠ V such that: 1. there are uncovered potentially directed paths from X to Y via U and via V in P, and 2. <U, X, V> is an unshielded definite non-collider in P. A potentially directed path is uncovered if every triple on it is an unshielded non-collider. [Figure: a PAG with two such paths from X to Y, one through U and one through V.]
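A sketch of the theorem's check under the same representation: enumerate uncovered potentially directed paths by DFS and test whether two of them leave X through non-adjacent nodes. Treating every unshielded triple that is not a collider as a definite non-collider is an assumption that holds for fully oriented FCI PAGs; the arrowhead-at-X case is already excluded by the potentially-directed condition on the first edge.

def uncovered_pd_paths(pag, x, y):
    # Yield simple paths from x to y whose edges are all potentially directed
    # forward and whose consecutive triples <a, b, c> are unshielded.
    def extend(path):
        u = path[-1]
        if u == y:
            yield list(path)
            return
        for v in pag.nodes - set(path):
            if not potentially_directed_out(pag, u, v):
                continue
            if len(path) >= 2 and pag.adjacent(path[-2], v):
                continue  # the triple <path[-2], u, v> would be covered
            yield from extend(path + [v])
    yield from extend([x])

def anc_by_theorem(pag, x, y):
    # x is an ancestor of y in every MAG represented by P if uncovered p.d.
    # paths leave x through two distinct, non-adjacent nodes u and v.
    firsts = {p[1] for p in uncovered_pd_paths(pag, x, y) if len(p) > 1}
    return any(not pag.adjacent(u, v)
               for u in firsts for v in firsts if u != v)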

Finding Anc_P(X, Y). Anc_P: X is an ancestor of Y in all MAGs represented by PAG P. Take the transitive closure of directed edges in the PAG, then add the pairs identified by the theorem above. [Figure: the example PAG with the additional Anc_P pairs found via the theorem.]

Ambiguous relationships. If (X, Y) is in neither Anc_P nor NAnc_P, the relationship is ambiguous. If you marginalize out variables, (non-)ancestral relationships can become ambiguous.

Marginal Consistency. Under perfect statistical knowledge: ancestral relations in the marginal PAG are ancestral in the original PAG (over V \ L), i.e. Anc_P[V\L] ⊆ Anc_P over V \ L (d = 0, e = 0); non-ancestral relations in the marginal are non-ancestral in the original PAG (over V \ L), i.e. NAnc_P[V\L] ⊆ NAnc_P over V \ L (c = 0, f = 0). In reality, ancestral relations in the marginal can be non-ancestral in the original PAG (d) or ambiguous in the original PAG (e), and non-ancestral relations in the marginal can be ancestral in the original PAG (c) or ambiguous in the original PAG (f). [Figure: a table cross-tabulating each relation's status in the marginal PAG (rows) against its status in the original PAG (columns); c, d, e, f label the inconsistent cells.]
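Building on the sketches above, this cross-tabulation can be computed directly; classify_pair and marginal_consistency_table are illustrative names, not from the poster.

from collections import Counter

def classify_pair(pag, x, y):
    # Three-way label for an ordered pair, using the functions defined earlier.
    if (x, y) in anc_closure(pag) or anc_by_theorem(pag, x, y):
        return "ancestral"
    if not has_potentially_directed_path(pag, x, y):
        return "non-ancestral"
    return "ambiguous"

def marginal_consistency_table(original, marginal, kept_nodes):
    # Cross-tabulate each pair's label in the marginal PAG against its label
    # in the original PAG, over V \ L; the off-diagonal counts correspond to
    # the cells c, d, e, f of the poster's table.
    table = Counter()
    for x in kept_nodes:
        for y in kept_nodes:
            if x != y:
                table[classify_pair(marginal, x, y),
                      classify_pair(original, x, y)] += 1
    return table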

Experiments. 50 DAGs of 20 variables. Graph density: 0.1 and 0.2. Continuous data sets with 1000 samples. 100 random marginals of 18 and 15 variables. Algorithms used (pcalg) [2]: FCI, iFCI, mFCI.
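The poster's experiments use the R package pcalg; as a rough, hypothetical Python analogue of the data-generating setup (random DAGs, linear Gaussian data, random marginals), consider the sketch below. The edge-weight range and the thresholding scheme are assumptions, not taken from the poster.

import numpy as np

def random_linear_gaussian_data(n_vars=20, density=0.1, n_samples=1000, rng=None):
    # Sample a random DAG by thresholding the strict lower triangle of a random
    # matrix, then draw data from a linear Gaussian structural equation model;
    # the variable order 0..n_vars-1 is a topological order by construction.
    rng = rng or np.random.default_rng()
    adj = np.tril(rng.random((n_vars, n_vars)) < density, k=-1)  # adj[i, j]: j -> i
    weights = adj * rng.uniform(0.5, 1.5, size=adj.shape)
    data = np.zeros((n_samples, n_vars))
    for i in range(n_vars):
        data[:, i] = data @ weights[i] + rng.normal(size=n_samples)
    return adj, data

# A marginal data set simply drops the columns of the latent variables, e.g.:
# keep = sorted(rng.choice(20, size=18, replace=False)); marginal = data[:, keep]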

Results (Ancestral relationships). [Figure: bar charts for 18-variable and 15-variable marginals at densities 0.1 and 0.2, showing the fraction of relations that are ancestral in both graphs, ancestral in the marginal but non-ancestral in the original, and ancestral in the marginal but ambiguous in the original.]

Results (Non-ancestral relationships). [Figure: bar charts for 18-variable and 15-variable marginals at densities 0.1 and 0.2, showing the fraction of relations that are non-ancestral in both graphs, non-ancestral in the marginal but ancestral in the original, and non-ancestral in the marginal but ambiguous in the original.]

Conclusions. Constraint-based methods are sensitive to marginalization. Consistency of causal predictions drops for denser networks and smaller marginals. Non-causal predictions are very consistent. Majority rule FCI outperforms the other FCI variants.

Ranking based on marginal consistency. Can you use marginal consistency to find more robust predictions? Are predictions that are frequent in the marginals more robust? Compare with bootstrapping.
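A hedged sketch of the ranking idea, reusing classify_pair from above: learn a PAG on each random marginal and count how often each ordered pair comes out ancestral. learn_pag is a hypothetical stand-in for an FCI implementation, not an actual library call.

def rank_by_marginal_frequency(data, node_names, learn_pag,
                               n_marginals=100, keep=18, rng=None):
    # Rank ordered pairs by how often they are predicted ancestral across
    # PAGs learnt on random marginals of the data set.
    import numpy as np
    from collections import Counter
    rng = rng or np.random.default_rng()
    counts = Counter()
    for _ in range(n_marginals):
        idx = sorted(rng.choice(len(node_names), size=keep, replace=False))
        kept = [node_names[i] for i in idx]
        pag = learn_pag(data[:, idx], kept)
        for x in kept:
            for y in kept:
                if x != y and classify_pair(pag, x, y) == "ancestral":
                    counts[(x, y)] += 1
    return counts.most_common()  # most frequently ancestral pairs first

The bootstrapping baseline would instead resample rows of the full data set (rng.choice(n_samples, n_samples, replace=True)) and count over the bootstrap PAGs.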

Results. [Figure: ranking quality for 18-variable and 15-variable marginals at densities 0.1 and 0.2, with bootstrapping as the comparison.]

Conclusions / Future work. Ranking by marginal consistency can help identify robust causal predictions. Bootstrapping is more successful in ranking causal predictions (by a small margin). Ranking by marginals can become much faster by caching tests of independence (see the sketch below). Try it on much larger data sets (number of variables). Combine with bootstrapping. Try to identify a marginal that maximizes marginal consistency. Try to identify a graph that is consistent with more marginals.
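A minimal sketch of the caching idea: marginals of the same data set share most of their independence tests, since dropping variables does not change a test on the remaining columns, so memoizing results by the unordered pair and conditioning set avoids recomputation. cached_ci_test and the test argument are illustrative names.

_ci_cache = {}

def cached_ci_test(x, y, cond_set, test):
    # Normalize the key so that pair order and conditioning-set order do not
    # cause duplicate work; `test` is a stand-in for the actual CI test, e.g.
    # a Fisher-z test taking (x, y, cond_set) and returning a p-value.
    key = (min(x, y), max(x, y), frozenset(cond_set))
    if key not in _ci_cache:
        _ci_cache[key] = test(x, y, cond_set)
    return _ci_cache[key]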

References
1. Diego Colombo and Marloes H. Maathuis. Order-independent constraint-based causal structure learning. JMLR, 2014.
2. Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, and Peter Bühlmann. Causal inference using graphical models with the R package pcalg. JSS, 2012.
3. Joseph Ramsey, Jiji Zhang, and Peter Spirtes. Adjacency-faithfulness and conservative causal inference. UAI, 2006.
4. Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2000.