Towards an extension of the PC algorithm to local context-specific independencies detection


Towards an extension of the PC algorithm to local context-specific independencies detection Feb-09-2016

Outline
- Background: Bayesian Networks
- The PC algorithm
- Context-specific independence: from DAGs to LDAGs
- The PSPC algorithm

Background

Bayesian Networks (BNs) are a powerful tool for the construction of multivariate distributions from univariate independent components: B = (G, P), with
- G a Directed Acyclic Graph (DAG)
- P a probability distribution factorizing according to G (Hammersley, Clifford 1971)

Background

Each variable is conditionally independent of all its non-descendants in the graph given the value of all its parents:

$P(V) = P(X_1, \ldots, X_d) = \prod_{i=1}^{d} P(X_i \mid pa(X_i))$

Main assumptions:
- Causal Markov Condition (CMC)
- Causal Faithfulness Condition (CFC)

Computationally more efficient: $d$ small local CPTs, $|V| = d$
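As a small illustration of this factorization (hypothetical two-node network and CPTs, not from the slides), the joint probability of a full assignment is just a product of local conditional probabilities:

```python
# Minimal sketch of the BN factorization P(V) = prod_i P(X_i | pa(X_i)).
# The network A -> B and its CPTs are hypothetical, purely for illustration.
model = {
    "A": ((), {(): {0: 0.6, 1: 0.4}}),
    "B": (("A",), {(0,): {0: 0.7, 1: 0.3},
                   (1,): {0: 0.2, 1: 0.8}}),
}

def joint_prob(assignment, model):
    """P(x_1, ..., x_d) as the product of the d local CPT entries."""
    p = 1.0
    for var, (parents, cpt) in model.items():
        pa_vals = tuple(assignment[pa] for pa in parents)
        p *= cpt[pa_vals][assignment[var]]
    return p

print(joint_prob({"A": 1, "B": 0}, model))  # P(A=1) * P(B=0 | A=1) = 0.08
```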

Background Some fields of applications: Probabilistic expert systems Decision analysis Causality Data mining Complex statistical models

Background

Background

$G = (V, E)$
- $V = (X_1, \ldots, X_d)$: r.v.s as nodes in the graph
- $E \subseteq V \times V$: $(i, j) \in E$ representing (conditional) dependence among variables $X_i$ and $X_j$

Background

A toy example...
- Parents: $pa(D) = \{A, B\}$
- Children: $ch(D) = \{E\}$
- Non-descendants: $nd(D) = \{A, B, C\}$
- V-structures: $A \rightarrow D \leftarrow B$, where D is a collider

Background

$V = \{A, B, C, D, E\}$

$P(V) = P(A, B, C, D, E) = P(A)\,P(C \mid A)\,P(B)\,P(D \mid A, B)\,P(E \mid D) = \dfrac{P(A, C)\,P(A, B, D)\,P(D, E)}{P(A)\,P(D)}$

Background

Markov Equivalence Classes: $\{C \leftarrow A \rightarrow D\} \equiv_P \{C \rightarrow A \rightarrow D\}$, both entailing $C \perp D \mid A$

Background

Chain/fork over the middle node B:

$P(A, B, C) = \dfrac{P(A, B)\,P(B, C)}{P(B)}$

Background

Collider $A \rightarrow B \leftarrow C$:

$P(A, B, C) = P(A)\,P(C)\,P(B \mid A, C)$

Background

$\{A \rightarrow D \rightarrow E\} \models A \perp E \mid D$

Background

Learning and inference on BNs (Koller, Friedman 2009):
- Structure Learning*: Search-and-Score (or Bayesian) approach, Constraint-based approach*
- Parameter Estimation: ML estimation, Bayesian estimation
- Inference: Variable elimination, Belief Propagation, MAP estimation, Sampling methods

The PC algorithm

Spirtes P, Glymour C, Scheines R (1993, 1st ed.)

Causally sufficient setting: $V = O$, $H = S = \emptyset$ (no hidden or selection variables)

Sound and complete under:
i) Consistency of CI statistical tests
ii) CMC, CFC

The PC algorithm

Input: V, oracle/sample knowledge on the pattern of independencies among variables

S1 → S2 → S3 → S4

Output: A Completed Partially Directed Acyclic Graph (CPDAG) is returned, defining a Markov Equivalence Class

The PC algorithm

S1: G := complete undirected graph over V

S2: The skeleton of G is inferred and a list M of unshielded triples is returned

Lemma 1 (Zhang, Spirtes 2008; Spirtes et al. 2000): $X \notin Adj(Y; G)$ iff $\exists S \subseteq V \setminus \{X, Y\}$ s.t. $X \perp Y \mid S$
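A rough sketch of S1+S2 under an abstract CI oracle (the function `indep` below is a placeholder for oracle/sample knowledge, not the authors' implementation):

```python
from itertools import combinations

def pc_skeleton(nodes, indep):
    """Sketch of S1+S2: start from the complete graph and delete X - Y as
    soon as some conditioning set renders them independent (Lemma 1). As in
    PC, candidate sets are drawn from the current adjacencies of X. Returns
    the skeleton and the separating sets needed by S3."""
    adj = {x: set(nodes) - {x} for x in nodes}
    sepset = {}
    size = 0
    while any(len(adj[x] - {y}) >= size for x in nodes for y in adj[x]):
        for x in nodes:
            for y in list(adj[x]):
                if y not in adj[x]:      # edge already removed in this sweep
                    continue
                for s in combinations(sorted(adj[x] - {y}), size):
                    if indep(x, y, set(s)):
                        adj[x].discard(y); adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(s)
                        break
        size += 1
    return adj, sepset

# Toy oracle for the chain A -> B -> C (only A ⊥ C | {B} holds):
indep = lambda x, y, s: frozenset((x, y)) == frozenset(("A", "C")) and s == {"B"}
skel, seps = pc_skeleton(["A", "B", "C"], indep)
print(skel)  # A - B - C: the A - C edge has been removed
```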

The PC algorithm

S3: $<X, Y, Z>$ in M is eventually oriented as a v-structure according to:

Lemma 2 (Zhang, Spirtes 2008; Spirtes et al. 2000): In a DAG G, given any unshielded triple $<X, Y, Z>$, Y is a collider iff $\forall S$ s.t. $X \perp Z \mid S$, $Y \notin S$; Y is a non-collider iff $\forall S$ s.t. $X \perp Z \mid S$, $Y \in S$

S4: As many unoriented edges as possible are oriented according to the orientation rules provided by Zhang (2008)
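Continuing the sketch, Lemma 2 reduces S3 to a membership test on the recorded separating set (a simplification that assumes the `sepset` returned by the sketch above):

```python
def orient_v_structures(adj, sepset):
    """Sketch of S3: for every unshielded triple <X, Y, Z> (X - Y - Z with
    X, Z non-adjacent), orient X -> Y <- Z iff Y is not in sepset(X, Z)."""
    arrows = set()  # directed edges stored as (tail, head) pairs
    for y in adj:
        for x in adj[y]:
            for z in adj[y]:
                if x < z and z not in adj[x]:          # unshielded triple
                    if y not in sepset[frozenset((x, z))]:
                        arrows.add((x, y)); arrows.add((z, y))
    return arrows

print(orient_v_structures(skel, seps))  # empty: B is in sepset(A, C)
```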

The PC algorithm

Conservative PC algorithm (CPC, Ramsey et al. 2012):
- S3 → S3', S4 → S4' (see [2] for details)
- CFC is relaxed
- Output is an e-pattern, where unfaithful triples* are allowed
- $P \equiv_M P'$: represented by the same e-pattern!

*Triples which are not qualified as v-structures or as Markov chains

CSI

Conditional Independence (CI): Let X, Y, Z be pairwise disjoint subsets of V. X is conditionally independent of Y given Z if, $\forall (x, y, z) \in Val(X) \times Val(Y) \times Val(Z)$,

$P(x \mid y, z) = P(x \mid z)$ whenever $P(y, z) > 0$

Notation: $X \perp Y \mid Z$

CSI

CI: $X \perp Y \mid Z \iff P(x \mid y, z) = P(x \mid z)$ whenever $P(y, z) > 0$

Context-Specific Conditional Independence (CSI, Boutilier 1996): Let X, Y, Z, C be pairwise disjoint subsets of V. X is conditionally independent of Y given Z in context C = c, where $c \in Val(C)$, if it holds that, $\forall (x, y, z) \in Val(X) \times Val(Y) \times Val(Z)$,

$P(x \mid y, z, c) = P(x \mid z, c)$ whenever $P(y, z, c) > 0$

Notation: $X \perp Y \mid Z, c$
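The definition can be checked numerically on a joint table. A sketch with a hypothetical distribution in which X ⊥ Y holds in context C = 0 but not in C = 1 (all names and numbers are illustrative):

```python
from itertools import product

def is_csi(joint, X, Y, Z, C, c, vals, tol=1e-9):
    """Test X ⊥ Y | Z, C=c by verifying P(x | y, z, c) = P(x | z, c) for all
    (x, y, z) with P(y, z, c) > 0. `joint` maps tuples of (variable, value)
    pairs to probabilities."""
    def marg(fixed):
        return sum(p for a, p in joint.items()
                   if all(dict(a)[v] == w for v, w in fixed.items()))
    for x, y, z in product(vals[X], vals[Y], vals[Z]):
        pyzc = marg({Y: y, Z: z, C: c})
        if pyzc > 0:
            lhs = marg({X: x, Y: y, Z: z, C: c}) / pyzc
            rhs = marg({X: x, Z: z, C: c}) / marg({Z: z, C: c})
            if abs(lhs - rhs) > tol:
                return False
    return True

# Hypothetical joint over binary X, Y, C (Z is a dummy constant here):
vals = {"X": [0, 1], "Y": [0, 1], "Z": [0]}
joint = {}
for xv, yv, cv in product([0, 1], repeat=3):
    pxy = 0.25 if cv == 0 else (0.4 if xv == yv else 0.1)
    joint[(("X", xv), ("Y", yv), ("Z", 0), ("C", cv))] = 0.5 * pxy
print(is_csi(joint, "X", "Y", "Z", "C", 0, vals))  # True:  independent when C=0
print(is_csi(joint, "X", "Y", "Z", "C", 1, vals))  # False: coupled when C=1
```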

CSI

CI: $X \perp Y \mid Z \iff P(x \mid y, z) = P(x \mid z)$ whenever $P(y, z) > 0$

CSI: $X \perp Y \mid Z, c \iff P(x \mid y, z, c) = P(x \mid z, c)$ whenever $P(y, z, c) > 0$

Local CSI: X and Y are CSI given C = c, where X and C define a partition of $pa(Y)$

CSI

CI: $X \perp Y \mid Z \iff P(x \mid y, z) = P(x \mid z)$ whenever $P(y, z) > 0$

CSI: $X \perp Y \mid Z, c \iff P(x \mid y, z, c) = P(x \mid z, c)$ whenever $P(y, z, c) > 0$

Local CSI, e.g. (Zhang, 1998): X: Weather, Y: Income, C: Profession

From Local CSIs to LDAGs

Labelled Directed Acyclic Graphs (LDAGs, Pensar et al. 2014) account for local CSIs: $G_L = (V, E, L_E)$, where
- V is the set of nodes, corresponding to the set of r.v.s
- E is the set of oriented edges, $(i, j) \in E$ iff $X_i \in pa(X_j)$
- $L_E$ is the set of all labels, $L_E = \bigcup_{(i,j) \in E} L_{(i,j)}$

LDAGs, e.g. (Pensar 2014)

$G_L = (V, E, L_E)$, $V = \{1, 2, 3, 4\}$, $E = \{(2, 1), (3, 1), (4, 1)\}$

$L_{(2,1)} = \{(0, 1)\}$: $X_1 \perp X_2 \mid (X_3, X_4) = (0, 1)$

$L_{(4,1)} = \{(*, 1)\} = Val(X_2) \times \{1\}$: $X_1 \perp X_4 \mid X_2, X_3 = 1$
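One possible in-memory encoding of this example (a hypothetical representation, not the notation of Pensar et al.): each labelled edge carries the set of parent-context configurations in which it is inactive.

```python
# Sketch of the LDAG example above: edge (i, j) carries a label listing the
# configurations of pa(X_j) \ {X_i} under which X_j ignores X_i.
V = [1, 2, 3, 4]
E = {(2, 1), (3, 1), (4, 1)}
labels = {
    (2, 1): {(0, 1)},          # X1 ⊥ X2 | (X3, X4) = (0, 1)
    (4, 1): {(0, 1), (1, 1)},  # X1 ⊥ X4 | X2, X3 = 1: (*, 1) expanded
}

def edge_active(edge, context):
    """Edge present in the context-instantiated graph G(x) unless the
    context satisfies one of the edge's label configurations."""
    return context not in labels.get(edge, set())

print(edge_active((2, 1), (0, 1)))  # False: removed in this context
print(edge_active((2, 1), (1, 1)))  # True:  active elsewhere
```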

Extending the PC algorithm

- CSPC algorithm for undirected log-linear models (Edera et al., 2013)
- PSPC algorithm for LDAG models

Extending the PC algorithm

Input: V, oracle/sample knowledge on the pattern of independencies among variables

S1 → S2 → S3 → S4 → Unmark the unfaithful triples → CSeek routine (+ Orient Parents)

Output (best case scenario): A Completed Partially Labelled Directed Acyclic Graph (CPLDAG) is returned, defining a CSI-Equivalence Class (see Pensar et al., 2014)

Extending the PC algorithm: the CSeek routine

Discussion and future work

- Consistency and generalizations
- Assumptions (CSC, CI tests, from CMC+CFC to CMC+AFC to CMC+TFC to...) and related issues
- Computational efficiency. Idea: CSeek routine applied to unfaithful triples only, according to some threshold
- Development of the algorithm and applications
- Efficient inference on BNs with LDAGs (Zhang, Poole 1998; Poole 2003)

References

Boutilier, Craig, et al., Context-specific independence in Bayesian networks, Proceedings of the Twelfth International Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1996

Edera, Alejandro, Federico Schlüter, and Facundo Bromberg, Learning Markov networks with context-specific independences, Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on, IEEE, 2013

Isozaki, Takashi, A robust causal discovery algorithm against faithfulness violation, Information and Media Technologies 9.1 (2014): 121-131

Kalisch, Markus, and Peter Bühlmann, Estimating high-dimensional directed acyclic graphs with the PC-algorithm, The Journal of Machine Learning Research 8 (2007): 613-636

Kalisch, Markus, and Peter Bühlmann, Robustification of the PC-algorithm for Directed Acyclic Graphs, Journal of Computational and Graphical Statistics 17.4 (2008): 773-789

References

Koller, Daphne, and Nir Friedman, Probabilistic graphical models: principles and techniques, MIT Press, 2009

Lemeire, Jan, Stijn Meganck, and Francesco Cartella, Robust independence-based causal structure learning in absence of adjacency faithfulness, European Workshop on Probabilistic Graphical Models (2010): 169

Pensar, Johan, et al., Labeled directed acyclic graphs: a generalization of context-specific independence in directed graphical models, Data Mining and Knowledge Discovery 29.2 (2015): 503-533

Poole, David, and Nevin Lianwen Zhang, Exploiting contextual independence in probabilistic inference, Journal of Artificial Intelligence Research (JAIR) 18 (2003): 263-313

Ramsey, Joseph, Jiji Zhang, and Peter L. Spirtes, Adjacency-faithfulness and conservative causal inference, arXiv preprint arXiv:1206.6843 (2012)

References

Spirtes, Peter, Clark N. Glymour, and Richard Scheines, Causation, Prediction, and Search, MIT Press, 2000

Zhang, Jiji, and Peter Spirtes, Strong faithfulness and uniform consistency in causal inference, Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 2003

Zhang, Jiji, and Peter Spirtes, Detection of unfaithfulness and robust causal inference, Minds and Machines 18.2 (2008): 239-271

Zhang, Nevin Lianwen, Inference in Bayesian networks: the role of context-specific independence (1998)

Zhang, Nevin Lianwen, and David Poole, On the role of context-specific independence in probabilistic inference, IJCAI-99: Proceedings of the 16th International Joint Conference on Artificial Intelligence, Vols 1-2 (1999)

ADDITIONAL FEATURES

[1] Background: Main assumptions 1/3

Causal Markov Condition (CMC): Given a set of (causally sufficient) r.v.s V whose causal structure is represented by a DAG G,

$X \perp_G nd(X) \mid pa(X) \Rightarrow X \perp nd(X) \mid pa(X)$ (1)

- P is Markov to G whenever (1) holds
- G is an I-map of P whenever (1) holds

[1] Background: Main assumptions 2/3

Causal Faithfulness Condition (CFC): Given a set of (causally sufficient) r.v.s V whose causal structure is represented by a DAG G, the joint probability distribution P(V) is faithful to G if it holds that:

If CMC does not entail $X \perp Y \mid S$, then X is dependent on Y conditional on S in P

[1] Background: Main assumptions 3/3

Two observations on the CFC assumption:

It follows that, whenever CMC and CFC hold:

$X \perp_G nd(X) \mid pa(X) \iff X \perp nd(X) \setminus pa(X) \mid pa(X)$ (2)

- P is faithful to G whenever (2) holds
- G is a perfect I-map of P whenever (2) holds

Lebesgue measure zero argument (Meek, 1995): not too restrictive!

[2] PC algorithm continued 1/3

Given pointwise consistent statistical tests for the independence among variables, the PC procedure is pointwise consistent under CMC and CFC.

Uniform consistency? CFC → λ-strong CFC (λ-SFC) (provided uniformly consistent statistical tests)

- Robins et al. (2003, 2006) on CFC's decomposability
- Isozaki (2014) on weak CFC test-related violations

Complexity bounded by $\dfrac{d^2 (d-1)^{k-1}}{(k-1)!}$, with $k$ the maximal degree of connectivity of any vertex

[2] PC algorithm continued 2/3

S3': Let G* be the graph resulting from S1+S2 and M be the list of unshielded triples. For each $<X, Y, Z>$ in M, for every $S \subseteq Adj(X; G^*)$ and every $S \subseteq Adj(Z; G^*)$:

- If $\forall S$ s.t. $X \perp Z \mid S$, $Y \notin S$, then orient $X - Y - Z$ as $X \rightarrow Y \leftarrow Z$
- If $\forall S$ s.t. $X \perp Z \mid S$, $Y \in S$, then leave the triple unmarked
- Otherwise, mark the triple $X - Y - Z$ as unfaithful

S4': Orientation rules are applied to unoriented unshielded triples only
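S3' can be sketched as a three-way classification of each unshielded triple, again over an assumed CI oracle (hypothetical helper, not the CPC reference implementation):

```python
from itertools import chain, combinations

def classify_triple(x, y, z, adj, indep):
    """Sketch of S3': collect all S ⊆ Adj(X; G*) and S ⊆ Adj(Z; G*) with
    X ⊥ Z | S. Return 'collider' if Y lies in none of them, 'noncollider'
    if Y lies in all of them, 'unfaithful' otherwise. For an unshielded
    triple, Lemma 1 guarantees at least one separating set exists."""
    def subsets(s):
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
    seps = [set(s) for side in (adj[x] - {z}, adj[z] - {x})
            for s in subsets(sorted(side)) if indep(x, z, set(s))]
    if all(y not in s for s in seps):
        return "collider"      # orient X -> Y <- Z
    if all(y in s for s in seps):
        return "noncollider"   # leave the triple unmarked
    return "unfaithful"        # mark X - Y - Z as unfaithful
```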

[2] PC algorithm continued 3/3

e-patterns: A DAG G is represented by an e-pattern e-G if
(i) $A \in Adj(B; \text{e-}G)$ corresponds to $A \in Adj(B; G)$
(ii) $A \rightarrow B$ in G is marked as $A \rightarrow B$ in e-G
(iii) The colliders in G are either marked as such or as part of an unfaithful triple in e-G

[3] λ-strong CFC 1/3

- Gaussian setting (Zhang, Spirtes 2003; Uhler et al. 2013)
- Discrete setting* (Rudas et al. 2015)

Parametrization! (Many variations to be considered)

[3] λ-strong CFC 2/3

e.g. (Rudas et al. 2015), variation dependent case:

$V = \{A, B\}$, a set of 2 binary r.v.s parametrized as cell probabilities within $\Delta_3$ (2×2 CPT)

$\varphi_1$: log-odds ratio; $\varphi_2$: Yule's coefficient, as measures of association

Given $\lambda > 0$, P is λ-SFC to G whenever

$\varphi_1 = \left| \log \dfrac{p_{00}\,p_{11}}{p_{01}\,p_{10}} \right| > \lambda$ or $\varphi_2 = \left| \dfrac{p_{00}\,p_{11} - p_{01}\,p_{10}}{p_{00}\,p_{11} + p_{01}\,p_{10}} \right| > \lambda$

[3] λ-strong CFC 3/3

e.g. (Rudas et al. 2015), variation independent case:

$V = \{A, B\}$, a set of 2 binary r.v.s parametrized as conditional probabilities within $(0, 1)^3$ (2×2 CPT), with $\theta_1 = P(A = 0)$, $\theta_2 = P(B = 0 \mid A = 0)$, $\theta_3 = P(B = 0 \mid A = 1)$

$\varphi_3$: absolute difference between conditional probabilities, as measure of association.

Given $\lambda > 0$, P is λ-SFC to G whenever $\varphi_3 = |\theta_2 - \theta_3| > \lambda$
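All three association measures are one-liners; a sketch for checking the λ-SFC condition of a 2×2 table (absolute values taken, since the thresholds compare magnitudes of association):

```python
import math

def phi1(p00, p01, p10, p11):
    """Log-odds ratio (variation dependent parametrization)."""
    return abs(math.log((p00 * p11) / (p01 * p10)))

def phi2(p00, p01, p10, p11):
    """Yule's coefficient of association."""
    return abs((p00 * p11 - p01 * p10) / (p00 * p11 + p01 * p10))

def phi3(theta2, theta3):
    """|P(B=0 | A=0) - P(B=0 | A=1)| (variation independent case)."""
    return abs(theta2 - theta3)

# Hypothetical cell probabilities (p00, p01, p10, p11) summing to 1:
p = (0.4, 0.1, 0.1, 0.4)
lam = 0.1
print(phi1(*p) > lam or phi2(*p) > lam)  # True: P is λ-SFC to G here
```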

[4] Properties of LDAGs 1/3

Labelled Directed Acyclic Graphs (LDAGs, Pensar et al. 2014) account for local CSIs: $G_L = (V, E, L_E)$, where
- V is the set of nodes, corresponding to the set of r.v.s
- E is the set of oriented edges, $(i, j) \in E$ iff $X_i \in pa(X_j)$
- $L_E$ is the set of all labels, $L_E = \bigcup_{(i,j) \in E} L_{(i,j)}$, with $L_{(i,j)}$ a list of configurations of $L_{(i,j)} := pa(X_j) \setminus \{X_i\}$: a configuration $x_{L_{(i,j)}} \in Val(L_{(i,j)})$ belongs to the label iff $X_j \perp X_i \mid L_{(i,j)} = x_{L_{(i,j)}}$

[4] Properties of LDAGs 2/3

- Maximality
- Regularity
- CSI-faithfulness* (CS-LDAG: $G_L(x_C)$)
- CSI-equivalence: $G_L = (V, E, L_E)$ and $G'_L = (V, E', L'_E)$ belong to the same CSI-equivalence class if $G_L$ and $G'_L$ share the same skeleton and $G(x_V)$ and $G'(x_V)$ are Markov equivalent $\forall x_V \in Val(V)$
- If $\exists x_V \in Val(V)$ s.t. no label in either $L_E$ or $L'_E$ is satisfied, G and G' are Markov equivalent

[4] Properties of LDAGs 3/3

CSI-faithfulness

Def. Let B be a BN and let B(c) be the model instantiated to context C = c. If X and Y are not d-separated by Z in B and they are d-separated by Z in B(c), then they are CSI-separated by Z given context C = c in B, namely

$X \perp_{G_{C=c}} Y \mid Z \Rightarrow X \perp_G Y \mid Z, C = c$

- CSI-CFC of P to G follows from CSI-separation (Boutilier et al. 1996)
- Context-specific Hammersley-Clifford theorem (Edera et al. 2013)

[5] Further definitions: Markov Equivalence Classes

Def. Two DAGs belong to the same Markov Equivalence class (ME class) whenever they entail the same conditional independence relations among the observed variables: $G' = (V, E')$, $G'' = (V, E'')$ s.t. $P(V; G') \equiv_M P(V; G'')$

Elements of a ME class are represented by means of a Partially Directed Acyclic Graph (PDAG) or by a Completed Partially Directed Acyclic Graph (CPDAG)

[5] D-separation

A path $u = <X, \ldots, Y>$ is blocked by some subset $Z \subseteq V \setminus \{X, Y\}$ if either
- u contains a non-collider that belongs to Z, or
- u contains a collider W such that neither W nor any of its descendants belongs to Z: $(\{W\} \cup de(W)) \cap Z = \emptyset$

Def. X and Y are d-separated by Z iff Z blocks all paths between X and Y
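The definition translates almost directly into code. A brute-force sketch for toy DAGs (it enumerates every path, so it is exponential and only meant to mirror the two blocking clauses above):

```python
def descendants(g, w):
    """All descendants of w in DAG g, where g maps node -> set of children."""
    out, stack = set(), [w]
    while stack:
        for child in g[stack.pop()]:
            if child not in out:
                out.add(child); stack.append(child)
    return out

def d_separated(g, x, y, Z):
    """X ⊥_G Y | Z iff every path is blocked: it passes a non-collider in Z,
    or a collider W with ({W} ∪ de(W)) ∩ Z = ∅."""
    nbrs = {v: set(g[v]) | {u for u in g if v in g[u]} for v in g}
    def paths(u, goal, seen):
        if u == goal:
            yield [u]; return
        for v in nbrs[u] - seen:
            for rest in paths(v, goal, seen | {v}):
                yield [u] + rest
    for path in paths(x, y, {x}):
        blocked = False
        for i in range(1, len(path) - 1):
            a, w, b = path[i - 1], path[i], path[i + 1]
            collider = w in g[a] and w in g[b]       # a -> w <- b on the path
            if (not collider and w in Z) or \
               (collider and not (({w} | descendants(g, w)) & Z)):
                blocked = True; break
        if not blocked:
            return False
    return True

# The toy DAG from the Background slides, as implied by the pa/ch/nd lists:
g = {"A": {"C", "D"}, "B": {"D"}, "C": set(), "D": {"E"}, "E": set()}
print(d_separated(g, "A", "E", {"D"}))   # True: D blocks A -> D -> E
print(d_separated(g, "A", "B", set()))   # True: D is an unconditioned collider
```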