Marginal consistency of constraint-based causal learning
Anna Roumpelaki, Giorgos Borboudakis, Sofia Triantafillou and Ioannis Tsamardinos
University of Crete, Greece.
Constraint-based Causal Learning
[Figure: a data set measuring variables X1, ..., X8 is given as input to FCI, which outputs a causal graph over X1, ..., X8.]
How important is the choice of variables in learning causal relationships? Would you have learnt the same relationships if you had chosen a subset of the variables?
Maximal Ancestral Graphs
Maximal Ancestral Graphs (MAGs) capture the conditional independencies of the joint probability distribution P over a set of variables, under the Causal Markov and Faithfulness conditions.
Directed edges denote causal ancestry. Bi-directed edges denote confounding.
No directed cycles. No almost directed cycles.
MAGs are closed under marginalization: if G = (V, E) is a MAG, the marginal graph G[V\L] over V\L is also a MAG.
They can also handle selection bias (not considered here).
[Figure: A -> B denotes "A causes B"; A <-> B denotes "A and B are confounded".]
Partially Oriented Ancestral Graphs
Summarize pairwise features of a Markov equivalence class of MAGs. Circles denote ambiguous endpoints.
Can be learnt from data using FCI.
FCI: sensitive to error propagation; order-dependent (if you change the order of the variables in the data set you get a different result).
Extensions of FCI [4]:
Order-independent FCI [1]: output does not depend on the order of the variables.
Conservative FCI (cFCI) [3]: makes additional tests of independence for more robust orientations; forgoes orientations that are not supported by all tests.
Majority rule FCI (mFCI) [3]: conservative FCI that orients ambiguous triplets based on the majority of test results for each triplet.
[Figure: A -> B denotes "A causes B in all Markov-equivalent MAGs"; A o-> B denotes "A and B are either confounded or A causes B".]
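For reference, all three variants are available through the pcalg package [2]. A minimal sketch, assuming a numeric data matrix d of jointly Gaussian samples with named columns (d and the alpha level are illustrative assumptions, not from the poster):

```r
library(pcalg)

## Sufficient statistic for the Gaussian conditional-independence test.
suffStat <- list(C = cor(d), n = nrow(d))

## Plain FCI; skel.method = "stable" (the default) gives the
## order-independent skeleton of [1].
pag <- fci(suffStat, indepTest = gaussCItest, alpha = 0.05,
           labels = colnames(d))

## Conservative FCI (cFCI): extra independence tests; orientations not
## supported by all tests are forgone.
pag.cfci <- fci(suffStat, indepTest = gaussCItest, alpha = 0.05,
                labels = colnames(d), conservative = TRUE)

## Majority rule FCI (mFCI): ambiguous triplets oriented by the
## majority of test results.
pag.mfci <- fci(suffStat, indepTest = gaussCItest, alpha = 0.05,
                labels = colnames(d), maj.rule = TRUE)
```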
What happens when you marginalize out variables?
[Figure: FCI on the full data set over X1, ..., X8 yields one PAG; FCI on the marginal data set without variables X2, X5 yields a PAG over the remaining six variables, in which some relationships differ.]
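In pcalg terms, marginalizing out variables simply means dropping the corresponding columns before learning. A sketch, reusing the assumed data matrix d from above (the variable names X2 and X5 are illustrative):

```r
## Learn a PAG on the marginal data set without X2 and X5,
## and compare it with the PAG learnt on the full data set.
d.marg        <- d[, setdiff(colnames(d), c("X2", "X5"))]
suffStat.marg <- list(C = cor(d.marg), n = nrow(d.marg))
pag.marg      <- fci(suffStat.marg, indepTest = gaussCItest,
                     alpha = 0.05, labels = colnames(d.marg))
```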
Finding NAnc_P(X, Y)
NAnc_P: X is not an ancestor of Y in all MAGs represented by PAG P.
Equivalently: X has no potentially directed path to Y in P.
[Figure: an example PAG P, with the non-ancestral pairs read off step by step; the node labels were lost in extraction.]
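A minimal sketch of this check, assuming pcalg's amat.pag edge coding (0 = no edge, 1 = circle, 2 = arrowhead, 3 = tail, with amat[a, b] giving the mark at b, so a -> b is amat[a, b] == 2 and amat[b, a] == 3); the notion of a potentially directed edge used below is our reading of the standard definition:

```r
## An edge a *-* b is potentially directed from a to b when the mark
## at a is not an arrowhead and the mark at b is not a tail.
pd.edge <- function(amat, a, b)
  amat[a, b] != 0 && amat[b, a] != 2 && amat[a, b] != 3

## All nodes reachable from x by a potentially directed path
## (breadth-first search over potentially directed edges).
pd.reachable <- function(amat, x) {
  p <- nrow(amat); visited <- rep(FALSE, p); queue <- x
  while (length(queue) > 0) {
    a <- queue[1]; queue <- queue[-1]
    for (b in seq_len(p))
      if (!visited[b] && b != x && pd.edge(amat, a, b)) {
        visited[b] <- TRUE; queue <- c(queue, b)
      }
  }
  which(visited)
}

## NAnc_P(x, y) holds exactly when y is not reachable from x.
nanc <- function(amat, x, y) !(y %in% pd.reachable(amat, x))
```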
Finding Anc_P(X, Y)
Anc_P: X is an ancestor of Y in all MAGs represented by PAG P.
Take the transitive closure of the directed edges in the PAG.
Some relations are not so obvious!
[Figure: an example PAG with the pairs in the transitive closure listed; an extra edge A -> B marks "A causes B in all MAGs represented by P". Node labels were lost in extraction.]
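The transitive-closure step is a standard graph computation. A sketch using the same assumed amat.pag coding, via Warshall's algorithm on the directed-edge relation:

```r
## Directed edge a -> b in the PAG.
dir.edge <- function(amat, a, b) amat[a, b] == 2 && amat[b, a] == 3

## Transitive closure of the directed edges: R[x, y] is TRUE when the
## PAG contains a directed path from x to y.
anc.closure <- function(amat) {
  p <- nrow(amat)
  R <- outer(seq_len(p), seq_len(p),
             Vectorize(function(a, b) dir.edge(amat, a, b)))
  for (k in seq_len(p))
    R <- R | (R[, k] %o% R[k, ])
  R
}
```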
Finding Anc_P(X, Y): a theorem
Theorem: If (X, Y) is not in the transitive closure of PAG P, then X is an ancestor of Y in all MAGs represented by P if and only if there exist U, V with U != V such that:
1. There are uncovered potentially directed paths from X to Y via U and via V in P, and
2. <U, X, V> is an unshielded definite non-collider in P.
(A potentially directed path is uncovered when every triplet on it is an unshielded non-collider.)
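A sketch of the theorem as a check on the mark matrix, building on pd.edge above. For brevity the path search verifies only unshieldedness of consecutive triplets; the definite non-collider test on <U, X, V> follows the usual reading (a tail at X on one edge, or circles at X on both):

```r
## Is the triple through a and c unshielded (a and c non-adjacent)?
unshielded <- function(amat, a, c) amat[a, c] == 0

## <u, x, v> is an unshielded definite non-collider at x.
def.noncollider <- function(amat, u, x, v)
  unshielded(amat, u, v) &&
    (amat[u, x] == 3 || amat[v, x] == 3 ||
     (amat[u, x] == 1 && amat[v, x] == 1))

## Depth-first search for an uncovered potentially directed path from
## x to y whose first edge goes through u.
has.upd.path <- function(amat, x, u, y, path = c(x, u)) {
  if (!pd.edge(amat, x, u)) return(FALSE)
  a <- path[length(path)]
  if (a == y) return(TRUE)
  for (b in setdiff(which(amat[a, ] != 0), path))
    if (pd.edge(amat, a, b) &&
        unshielded(amat, path[length(path) - 1], b) &&
        has.upd.path(amat, x, u, y, c(path, b)))
      return(TRUE)
  FALSE
}

## The theorem: two uncovered potentially directed paths from x to y,
## via u and via v, with <u, x, v> an unshielded definite non-collider.
anc.by.theorem <- function(amat, x, y) {
  nb <- which(amat[x, ] != 0)
  for (u in nb) for (v in nb)
    if (u < v && def.noncollider(amat, u, x, v) &&
        has.upd.path(amat, x, u, y) && has.upd.path(amat, x, v, y))
      return(TRUE)
  FALSE
}
```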
Ambiguous relationships
If (X, Y) is in neither Anc_P nor NAnc_P: ambiguous relationship.
If you marginalize out variables, (non-)ancestral relationships can become ambiguous.
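Putting the pieces together, every ordered pair can be classified (a sketch, reusing the helper functions defined above):

```r
## Classify an ordered pair (x, y) in a PAG mark matrix.
classify.pair <- function(amat, x, y) {
  R <- anc.closure(amat)
  if (R[x, y] || anc.by.theorem(amat, x, y)) "ancestral"
  else if (nanc(amat, x, y))                 "non-ancestral"
  else                                       "ambiguous"
}
```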
Marginal Consistency
Under perfect statistical knowledge:
Ancestral relations in the marginal are ancestral in the original PAG (over V\L): Anc_{P[V\L]} ⊆ Anc_P (d = 0, e = 0).
Non-ancestral relations in the marginal are non-ancestral in the original PAG (over V\L): NAnc_{P[V\L]} ⊆ NAnc_P (c = 0, f = 0).
In reality:
Ancestral relations in the marginal can be non-ancestral in the original PAG (d) or ambiguous in the original PAG (e).
Non-ancestral relations in the marginal can be ancestral in the original PAG (c) or ambiguous in the original PAG (f).
[Table: original vs. marginal classification of pairs, with the inconsistent counts labelled c, d, e, f.]
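A sketch of how this cross-tabulation can be computed, assuming kept maps each marginal variable index to its index in the original PAG:

```r
## Cross-tabulate pair classifications in the original PAG (amat) and
## in a marginal PAG (amat.marg) over the kept variables.
consistency.table <- function(amat, amat.marg, kept) {
  pairs <- subset(expand.grid(i = seq_along(kept), j = seq_along(kept)),
                  i != j)
  orig <- mapply(function(i, j) classify.pair(amat, kept[i], kept[j]),
                 pairs$i, pairs$j)
  marg <- mapply(function(i, j) classify.pair(amat.marg, i, j),
                 pairs$i, pairs$j)
  table(marginal = marg, original = orig)
}
```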
Experiments
50 DAGs of 20 variables. Graph density: 0.1 and 0.2.
Continuous data sets with 1000 samples.
100 random marginals of 18 and 15 variables.
Algorithms used (pcalg) [2]: FCI, cFCI, mFCI.
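One replicate of this setup can be generated with pcalg itself (a sketch; the seed and edge-weight defaults are assumptions):

```r
library(pcalg)
set.seed(1)

g      <- randomDAG(20, prob = 0.1)   # random DAG, density 0.1 (or 0.2)
d      <- rmvDAG(1000, g)             # 1000 continuous (Gaussian) samples
kept   <- sort(sample(20, 18))        # one random 18-variable marginal
d.marg <- d[, kept]
```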
Results (Ancestral relationships)
[Figure: bar plots for 18 and 15 variables at densities 0.1 and 0.2, showing the fractions of relations that are: ancestral in both; ancestral in the marginal but non-ancestral in the original; ancestral in the marginal but ambiguous in the original.]
Results (Non-ancestral relationships)
[Figure: bar plots for 18 and 15 variables at densities 0.1 and 0.2, showing the fractions of relations that are: non-ancestral in both; non-ancestral in the marginal but ancestral in the original; non-ancestral in the marginal but ambiguous in the original.]
Conclusions
Constraint-based methods are sensitive to marginalization.
Consistency of causal predictions drops for denser networks and smaller marginals.
Non-causal predictions are very consistent.
Majority rule FCI outperforms the other FCI variants.
Ranking based on marginal consistency
Can you use marginal consistency to find more robust predictions?
Are predictions that are frequent in the marginals more robust?
Compare with bootstrapping.
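A sketch of the frequency-based ranking; the scoring details below are our assumptions rather than the poster's exact procedure:

```r
## Score each ordered pair by how often it is predicted ancestral
## across 100 random 18-variable marginals of the data matrix d.
score <- matrix(0, 20, 20)
for (r in 1:100) {
  kept <- sort(sample(20, 18))
  suff <- list(C = cor(d[, kept]), n = nrow(d))
  pag  <- fci(suff, indepTest = gaussCItest, alpha = 0.05,
              labels = as.character(kept), maj.rule = TRUE)
  R    <- anc.closure(pag@amat)
  for (i in seq_along(kept)) for (j in seq_along(kept))
    if (R[i, j]) score[kept[i], kept[j]] <- score[kept[i], kept[j]] + 1
}
## Higher-scoring predictions are ranked as more robust.
```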
Results
[Figure: ranking performance for 18 and 15 variables at densities 0.1 and 0.2, comparing marginal-consistency ranking against bootstrapping.]
Conclusion / Future work
Ranking by marginal consistency can help identify robust causal predictions.
Bootstrapping is more successful in ranking causal predictions (by a small margin).
Ranking by marginals can become much faster by caching tests of independence.
Try it for much larger data sets (in the number of variables).
Combine with bootstrapping.
Try to identify a marginal that maximizes marginal consistency.
Try to identify a graph that is consistent with more marginals.
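As an illustration of the caching idea, a memoised wrapper that any pcalg learner can use as its indepTest (a sketch; to share the cache across marginals, the keys would additionally have to be expressed in the original variable indexing):

```r
## Cache gaussCItest results so that overlapping marginals re-use the
## same tests of independence instead of recomputing them.
cache <- new.env(hash = TRUE)
cachedCItest <- function(x, y, S, suffStat) {
  key <- paste(paste(sort(c(x, y)), collapse = ","),
               paste(sort(S), collapse = ","), sep = "|")
  if (!exists(key, envir = cache, inherits = FALSE))
    assign(key, gaussCItest(x, y, S, suffStat), envir = cache)
  get(key, envir = cache)
}
```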
References
1. Diego Colombo and Marloes H. Maathuis. Order-independent constraint-based causal structure learning. JMLR 2014.
2. Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, and Peter Bühlmann. Causal inference using graphical models with the R package pcalg. JSS 2012.
3. Joseph Ramsey, Jiji Zhang, and Peter Spirtes. Adjacency-faithfulness and conservative causal inference. UAI 2006.
4. Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2000.