Towards an extension of the PC algorithm to local context-specific independencies detection


Towards an extension of the PC algorithm to local context-specific independencies detection Feb-09-2016

Outline
- Background: Bayesian Networks
- The PC algorithm
- Context-specific independence: from DAGs to LDAGs
- The PSPC algorithm

Background

Bayesian Networks (BNs) are a powerful tool for the construction of multivariate distributions from univariate independent components: B = (G, P), with
- G a Directed Acyclic Graph (DAG)
- P a probability distribution factorizing according to G (Hammersley, Clifford 1971)

Background

Each variable is conditionally independent of all its non-descendants in the graph given the value of all its parents:

$P(V) = P(X_1, \ldots, X_d) = \prod_{i=1}^{d} P(X_i \mid pa(X_i))$

Main assumptions:
- Causal Markov Condition (CMC)
- Causal Faithfulness Condition (CFC)

Computationally more efficient: $d$ small local CPTs, $|V| = d$
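As a small illustration of this factorization (hypothetical two-node network and CPTs, not from the slides), the joint probability of a full assignment is just a product of local conditional probabilities:

```python
# Minimal sketch of the BN factorization P(V) = prod_i P(X_i | pa(X_i)).
# The network A -> B and its CPTs are hypothetical, purely for illustration.
model = {
    "A": ((), {(): {0: 0.6, 1: 0.4}}),
    "B": (("A",), {(0,): {0: 0.7, 1: 0.3},
                   (1,): {0: 0.2, 1: 0.8}}),
}

def joint_prob(assignment, model):
    """P(x_1, ..., x_d) as the product of the d local CPT entries."""
    p = 1.0
    for var, (parents, cpt) in model.items():
        pa_vals = tuple(assignment[pa] for pa in parents)
        p *= cpt[pa_vals][assignment[var]]
    return p

print(joint_prob({"A": 1, "B": 0}, model))  # P(A=1) * P(B=0 | A=1) = 0.08
```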

Background Some fields of applications: Probabilistic expert systems Decision analysis Causality Data mining Complex statistical models

Background

Background

$G = (V, E)$
- $V = (X_1, \ldots, X_d)$: r.v.s as nodes in the graph
- $E \subseteq V \times V$: $(i, j) \in E$ representing (conditional) dependence among variables $X_i$ and $X_j$

Background

A toy example...
- Parents: $pa(D) = \{A, B\}$
- Children: $ch(D) = \{E\}$
- Non-descendants: $nd(D) = \{A, B, C\}$
- V-structures: $A \rightarrow D \leftarrow B$, where D is a collider

Background

$V = \{A, B, C, D, E\}$

$P(V) = P(A, B, C, D, E) = P(A)\,P(C \mid A)\,P(B)\,P(D \mid A, B)\,P(E \mid D) = \dfrac{P(A, C)\,P(A, B, D)\,P(D, E)}{P(A)\,P(D)}$

Background

Markov Equivalence Classes: $\{C \leftarrow A \rightarrow D\} \equiv_P \{C \rightarrow A \rightarrow D\}$, both entailing $C \perp D \mid A$

Background

Chain/fork over the middle node B:

$P(A, B, C) = \dfrac{P(A, B)\,P(B, C)}{P(B)}$

Background

Collider $A \rightarrow B \leftarrow C$:

$P(A, B, C) = P(A)\,P(C)\,P(B \mid A, C)$

Background

$\{A \rightarrow D \rightarrow E\} \models A \perp E \mid D$

Background

Learning and inference on BNs (Koller, Friedman 2009):
- Structure Learning*: Search-and-Score (or Bayesian) approach, Constraint-based approach*
- Parameter Estimation: ML estimation, Bayesian estimation
- Inference: Variable elimination, Belief Propagation, MAP estimation, Sampling methods

The PC algorithm

Spirtes P, Glymour C, Scheines R (1993, 1st ed.)

Causally sufficient setting: $V = O$, $H = S = \emptyset$ (no hidden or selection variables)

Sound and complete under:
i) Consistency of CI statistical tests
ii) CMC, CFC

The PC algorithm

Input: V, oracle/sample knowledge on the pattern of independencies among variables

S1 → S2 → S3 → S4

Output: A Completed Partially Directed Acyclic Graph (CPDAG) is returned, defining a Markov Equivalence Class

The PC algorithm

S1: G := complete undirected graph over V

S2: The skeleton of G is inferred and a list M of unshielded triples is returned

Lemma 1 (Zhang, Spirtes 2008; Spirtes et al. 2000): $X \notin Adj(Y; G)$ iff $\exists S \subseteq V \setminus \{X, Y\}$ s.t. $X \perp Y \mid S$
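A rough sketch of S1+S2 under an abstract CI oracle (the function `indep` below is a placeholder for oracle/sample knowledge, not the authors' implementation):

```python
from itertools import combinations

def pc_skeleton(nodes, indep):
    """Sketch of S1+S2: start from the complete graph and delete X - Y as
    soon as some conditioning set renders them independent (Lemma 1). As in
    PC, candidate sets are drawn from the current adjacencies of X. Returns
    the skeleton and the separating sets needed by S3."""
    adj = {x: set(nodes) - {x} for x in nodes}
    sepset = {}
    size = 0
    while any(len(adj[x] - {y}) >= size for x in nodes for y in adj[x]):
        for x in nodes:
            for y in list(adj[x]):
                if y not in adj[x]:      # edge already removed in this sweep
                    continue
                for s in combinations(sorted(adj[x] - {y}), size):
                    if indep(x, y, set(s)):
                        adj[x].discard(y); adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(s)
                        break
        size += 1
    return adj, sepset

# Toy oracle for the chain A -> B -> C (only A ⊥ C | {B} holds):
indep = lambda x, y, s: frozenset((x, y)) == frozenset(("A", "C")) and s == {"B"}
skel, seps = pc_skeleton(["A", "B", "C"], indep)
print(skel)  # A - B - C: the A - C edge has been removed
```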

The PC algorithm

S3: $<X, Y, Z>$ in M is eventually oriented as a v-structure according to:

Lemma 2 (Zhang, Spirtes 2008; Spirtes et al. 2000): In a DAG G, given any unshielded triple $<X, Y, Z>$, Y is a collider iff $\forall S$ s.t. $X \perp Z \mid S$, $Y \notin S$; Y is a non-collider iff $\forall S$ s.t. $X \perp Z \mid S$, $Y \in S$

S4: As many unoriented edges as possible are oriented according to the orientation rules provided by Zhang (2008)
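Continuing the sketch, Lemma 2 reduces S3 to a membership test on the recorded separating set (a simplification that assumes the `sepset` returned by the sketch above):

```python
def orient_v_structures(adj, sepset):
    """Sketch of S3: for every unshielded triple <X, Y, Z> (X - Y - Z with
    X, Z non-adjacent), orient X -> Y <- Z iff Y is not in sepset(X, Z)."""
    arrows = set()  # directed edges stored as (tail, head) pairs
    for y in adj:
        for x in adj[y]:
            for z in adj[y]:
                if x < z and z not in adj[x]:          # unshielded triple
                    if y not in sepset[frozenset((x, z))]:
                        arrows.add((x, y)); arrows.add((z, y))
    return arrows

print(orient_v_structures(skel, seps))  # empty: B is in sepset(A, C)
```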

The PC algorithm

Conservative PC algorithm (CPC, Ramsey et al. 2012):
- S3 → S3', S4 → S4' (see [2] for details)
- CFC is relaxed
- Output is an e-pattern, where unfaithful triples* are allowed
- $P \equiv_M P'$: represented by the same e-pattern!

*Triples which are not qualified as v-structures or as Markov chains

CSI

Conditional Independence (CI): Let X, Y, Z be pairwise disjoint subsets of V. X is conditionally independent of Y given Z if, $\forall (x, y, z) \in Val(X) \times Val(Y) \times Val(Z)$,

$P(x \mid y, z) = P(x \mid z)$ whenever $P(y, z) > 0$

Notation: $X \perp Y \mid Z$

CSI

CI: $X \perp Y \mid Z \iff P(x \mid y, z) = P(x \mid z)$ whenever $P(y, z) > 0$

Context-Specific Conditional Independence (CSI, Boutilier 1996): Let X, Y, Z, C be pairwise disjoint subsets of V. X is conditionally independent of Y given Z in context C = c, where $c \in Val(C)$, if it holds that, $\forall (x, y, z) \in Val(X) \times Val(Y) \times Val(Z)$,

$P(x \mid y, z, c) = P(x \mid z, c)$ whenever $P(y, z, c) > 0$

Notation: $X \perp Y \mid Z, c$
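The definition can be checked numerically on a joint table. A sketch with a hypothetical distribution in which X ⊥ Y holds in context C = 0 but not in C = 1 (all names and numbers are illustrative):

```python
from itertools import product

def is_csi(joint, X, Y, Z, C, c, vals, tol=1e-9):
    """Test X ⊥ Y | Z, C=c by verifying P(x | y, z, c) = P(x | z, c) for all
    (x, y, z) with P(y, z, c) > 0. `joint` maps tuples of (variable, value)
    pairs to probabilities."""
    def marg(fixed):
        return sum(p for a, p in joint.items()
                   if all(dict(a)[v] == w for v, w in fixed.items()))
    for x, y, z in product(vals[X], vals[Y], vals[Z]):
        pyzc = marg({Y: y, Z: z, C: c})
        if pyzc > 0:
            lhs = marg({X: x, Y: y, Z: z, C: c}) / pyzc
            rhs = marg({X: x, Z: z, C: c}) / marg({Z: z, C: c})
            if abs(lhs - rhs) > tol:
                return False
    return True

# Hypothetical joint over binary X, Y, C (Z is a dummy constant here):
vals = {"X": [0, 1], "Y": [0, 1], "Z": [0]}
joint = {}
for xv, yv, cv in product([0, 1], repeat=3):
    pxy = 0.25 if cv == 0 else (0.4 if xv == yv else 0.1)
    joint[(("X", xv), ("Y", yv), ("Z", 0), ("C", cv))] = 0.5 * pxy
print(is_csi(joint, "X", "Y", "Z", "C", 0, vals))  # True:  independent when C=0
print(is_csi(joint, "X", "Y", "Z", "C", 1, vals))  # False: coupled when C=1
```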

CSI

CI: $X \perp Y \mid Z \iff P(x \mid y, z) = P(x \mid z)$ whenever $P(y, z) > 0$

CSI: $X \perp Y \mid Z, c \iff P(x \mid y, z, c) = P(x \mid z, c)$ whenever $P(y, z, c) > 0$

Local CSI: X and Y are CSI given C = c, where X and C define a partition of $pa(Y)$

CSI

CI: $X \perp Y \mid Z \iff P(x \mid y, z) = P(x \mid z)$ whenever $P(y, z) > 0$

CSI: $X \perp Y \mid Z, c \iff P(x \mid y, z, c) = P(x \mid z, c)$ whenever $P(y, z, c) > 0$

Local CSI, e.g. (Zhang, 1998): X: Weather, Y: Income, C: Profession

From Local CSIs to LDAGs

Labelled Directed Acyclic Graphs (LDAGs, Pensar et al. 2014) account for local CSIs: $G_L = (V, E, L_E)$, where
- V is the set of nodes, corresponding to the set of r.v.s
- E is the set of oriented edges, $(i, j) \in E$ iff $X_i \in pa(X_j)$
- $L_E$ is the set of all labels, $L_E = \bigcup_{(i,j) \in E} L_{(i,j)}$

LDAGs, e.g. (Pensar 2014)

$G_L = (V, E, L_E)$, $V = \{1, 2, 3, 4\}$, $E = \{(2, 1), (3, 1), (4, 1)\}$

$L_{(2,1)} = \{(0, 1)\}$: $X_1 \perp X_2 \mid (X_3, X_4) = (0, 1)$

$L_{(4,1)} = \{(*, 1)\} = Val(X_2) \times \{1\}$: $X_1 \perp X_4 \mid X_2, X_3 = 1$
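One possible in-memory encoding of this example (a hypothetical representation, not the notation of Pensar et al.): each labelled edge carries the set of parent-context configurations in which it is inactive.

```python
# Sketch of the LDAG example above: edge (i, j) carries a label listing the
# configurations of pa(X_j) \ {X_i} under which X_j ignores X_i.
V = [1, 2, 3, 4]
E = {(2, 1), (3, 1), (4, 1)}
labels = {
    (2, 1): {(0, 1)},          # X1 ⊥ X2 | (X3, X4) = (0, 1)
    (4, 1): {(0, 1), (1, 1)},  # X1 ⊥ X4 | X2, X3 = 1: (*, 1) expanded
}

def edge_active(edge, context):
    """Edge present in the context-instantiated graph G(x) unless the
    context satisfies one of the edge's label configurations."""
    return context not in labels.get(edge, set())

print(edge_active((2, 1), (0, 1)))  # False: removed in this context
print(edge_active((2, 1), (1, 1)))  # True:  active elsewhere
```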

Extending the PC algorithm

- CSPC algorithm for undirected log-linear models (Edera et al., 2013)
- PSPC algorithm for LDAG models

Extending the PC algorithm

Input: V, oracle/sample knowledge on the pattern of independencies among variables

S1 → S2 → S3 → S4 → Unmark the unfaithful triples → CSeek routine (+ Orient Parents)

Output (best case scenario): A Completed Partially Labelled Directed Acyclic Graph (CPLDAG) is returned, defining a CSI-Equivalence Class (see Pensar et al., 2014)

Extending the PC algorithm: the CSeek routine

Discussion and future work

- Consistency and generalizations
- Assumptions (CSC, CI tests, from CMC+CFC to CMC+AFC to CMC+TFC to...) and related issues
- Computational efficiency. Idea: CSeek routine applied to unfaithful triples only, according to some threshold
- Development of the algorithm and applications
- Efficient inference on BNs with LDAGs (Zhang, Poole 1998; Poole 2003)

References

Boutilier, Craig, et al., Context-specific independence in Bayesian networks, Proceedings of the Twelfth International Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1996

Edera, Alejandro, Federico Schlüter, and Facundo Bromberg, Learning Markov networks with context-specific independences, Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on, IEEE, 2013

Isozaki, Takashi, A robust causal discovery algorithm against faithfulness violation, Information and Media Technologies 9.1 (2014): 121-131

Kalisch, Markus, and Peter Bühlmann, Estimating high-dimensional directed acyclic graphs with the PC-algorithm, The Journal of Machine Learning Research 8 (2007): 613-636

Kalisch, Markus, and Peter Bühlmann, Robustification of the PC-algorithm for Directed Acyclic Graphs, Journal of Computational and Graphical Statistics 17.4 (2008): 773-789

References

Koller, Daphne, and Nir Friedman, Probabilistic graphical models: principles and techniques, MIT Press, 2009

Lemeire, Jan, Stijn Meganck, and Francesco Cartella, Robust independence-based causal structure learning in absence of adjacency faithfulness, European Workshop on Probabilistic Graphical Models (2010): 169

Pensar, Johan, et al., Labeled directed acyclic graphs: a generalization of context-specific independence in directed graphical models, Data Mining and Knowledge Discovery 29.2 (2015): 503-533

Poole, David, and Nevin Lianwen Zhang, Exploiting contextual independence in probabilistic inference, Journal of Artificial Intelligence Research (JAIR) 18 (2003): 263-313

Ramsey, Joseph, Jiji Zhang, and Peter L. Spirtes, Adjacency-faithfulness and conservative causal inference, arXiv preprint arXiv:1206.6843 (2012)

References

Spirtes, Peter, Clark N. Glymour, and Richard Scheines, Causation, Prediction, and Search, MIT Press, 2000

Zhang, Jiji, and Peter Spirtes, Strong faithfulness and uniform consistency in causal inference, Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 2003

Zhang, Jiji, and Peter Spirtes, Detection of unfaithfulness and robust causal inference, Minds and Machines 18.2 (2008): 239-271

Zhang, Nevin Lianwen, Inference in Bayesian networks: the role of context-specific independence (1998)

Zhang, Nevin Lianwen, and David Poole, On the role of context-specific independence in probabilistic inference, IJCAI-99: Proceedings of the 16th International Joint Conference on Artificial Intelligence, Vols 1-2 (1999)

ADDITIONAL FEATURES

[1] Background: Main assumptions 1/3

Causal Markov Condition (CMC): Given a set of (causally sufficient) r.v.s V whose causal structure is represented by a DAG G,

$X \perp_G nd(X) \mid pa(X) \Rightarrow X \perp nd(X) \mid pa(X)$ (1)

- P is Markov to G whenever (1) holds
- G is an I-map of P whenever (1) holds

[1] Background: Main assumptions 2/3

Causal Faithfulness Condition (CFC): Given a set of (causally sufficient) r.v.s V whose causal structure is represented by a DAG G, the joint probability distribution P(V) is faithful to G if it holds that:

If CMC does not entail $X \perp Y \mid S$, then X is dependent on Y conditional on S in P

[1] Background: Main assumptions 3/3

Two observations on the CFC assumption:

It follows that, whenever CMC and CFC hold:

$X \perp_G nd(X) \mid pa(X) \iff X \perp nd(X) \setminus pa(X) \mid pa(X)$ (2)

- P is faithful to G whenever (2) holds
- G is a perfect I-map of P whenever (2) holds

Lebesgue measure zero argument (Meek, 1995): not too restrictive!

[2] PC algorithm continued 1/3

Given pointwise consistent statistical tests for the independence among variables, the PC procedure is pointwise consistent under CMC and CFC.

Uniform consistency? CFC → λ-strong CFC (λ-SFC) (provided uniformly consistent statistical tests)

- Robins et al. (2003, 2006) on CFC's decomposability
- Isozaki (2014) on weak CFC test-related violations

Complexity bounded by $\dfrac{d^2 (d-1)^{k-1}}{(k-1)!}$, with $k$ the maximal degree of connectivity of any vertex

[2] PC algorithm continued 2/3

S3': Let G* be the graph resulting from S1+S2 and M be the list of unshielded triples. For each $<X, Y, Z>$ in M, for every $S \subseteq Adj(X; G^*)$ and every $S \subseteq Adj(Z; G^*)$:

- If $\forall S$ s.t. $X \perp Z \mid S$, $Y \notin S$, then orient $X - Y - Z$ as $X \rightarrow Y \leftarrow Z$
- If $\forall S$ s.t. $X \perp Z \mid S$, $Y \in S$, then leave the triple unmarked
- Otherwise, mark the triple $X - Y - Z$ as unfaithful

S4': Orientation rules are applied to unoriented unshielded triples only
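S3' can be sketched as a three-way classification of each unshielded triple, again over an assumed CI oracle (hypothetical helper, not the CPC reference implementation):

```python
from itertools import chain, combinations

def classify_triple(x, y, z, adj, indep):
    """Sketch of S3': collect all S ⊆ Adj(X; G*) and S ⊆ Adj(Z; G*) with
    X ⊥ Z | S. Return 'collider' if Y lies in none of them, 'noncollider'
    if Y lies in all of them, 'unfaithful' otherwise. For an unshielded
    triple, Lemma 1 guarantees at least one separating set exists."""
    def subsets(s):
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
    seps = [set(s) for side in (adj[x] - {z}, adj[z] - {x})
            for s in subsets(sorted(side)) if indep(x, z, set(s))]
    if all(y not in s for s in seps):
        return "collider"      # orient X -> Y <- Z
    if all(y in s for s in seps):
        return "noncollider"   # leave the triple unmarked
    return "unfaithful"        # mark X - Y - Z as unfaithful
```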

[2] PC algorithm continued 3/3

e-patterns: A DAG G is represented by an e-pattern e-G if
(i) $A \in Adj(B; \text{e-}G)$ corresponds to $A \in Adj(B; G)$
(ii) $A \rightarrow B$ in G is marked as $A \rightarrow B$ in e-G
(iii) The colliders in G are either marked as such or as part of an unfaithful triple in e-G

[3] λ-strong CFC 1/3

- Gaussian setting (Zhang, Spirtes 2003; Uhler et al. 2013)
- Discrete setting* (Rudas et al. 2015)

Parametrization! (Many variations to be considered)

[3] λ-strong CFC 2/3

e.g. (Rudas et al. 2015), variation dependent case:

$V = \{A, B\}$, a set of 2 binary r.v.s parametrized as cell probabilities within $\Delta_3$ (2×2 CPT)

$\varphi_1$: log-odds ratio; $\varphi_2$: Yule's coefficient, as measures of association

Given $\lambda > 0$, P is λ-SFC to G whenever

$\varphi_1 = \left| \log \dfrac{p_{00}\,p_{11}}{p_{01}\,p_{10}} \right| > \lambda$ or $\varphi_2 = \left| \dfrac{p_{00}\,p_{11} - p_{01}\,p_{10}}{p_{00}\,p_{11} + p_{01}\,p_{10}} \right| > \lambda$

[3] λ-strong CFC 3/3

e.g. (Rudas et al. 2015), variation independent case:

$V = \{A, B\}$, a set of 2 binary r.v.s parametrized as conditional probabilities within $(0, 1)^3$ (2×2 CPT), with $\theta_1 = P(A = 0)$, $\theta_2 = P(B = 0 \mid A = 0)$, $\theta_3 = P(B = 0 \mid A = 1)$

$\varphi_3$: absolute difference between conditional probabilities, as measure of association.

Given $\lambda > 0$, P is λ-SFC to G whenever $\varphi_3 = |\theta_2 - \theta_3| > \lambda$
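All three association measures are one-liners; a sketch for checking the λ-SFC condition of a 2×2 table (absolute values taken, since the thresholds compare magnitudes of association):

```python
import math

def phi1(p00, p01, p10, p11):
    """Log-odds ratio (variation dependent parametrization)."""
    return abs(math.log((p00 * p11) / (p01 * p10)))

def phi2(p00, p01, p10, p11):
    """Yule's coefficient of association."""
    return abs((p00 * p11 - p01 * p10) / (p00 * p11 + p01 * p10))

def phi3(theta2, theta3):
    """|P(B=0 | A=0) - P(B=0 | A=1)| (variation independent case)."""
    return abs(theta2 - theta3)

# Hypothetical cell probabilities (p00, p01, p10, p11) summing to 1:
p = (0.4, 0.1, 0.1, 0.4)
lam = 0.1
print(phi1(*p) > lam or phi2(*p) > lam)  # True: P is λ-SFC to G here
```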

[4] Properties of LDAGs 1/3

Labelled Directed Acyclic Graphs (LDAGs, Pensar et al. 2014) account for local CSIs: $G_L = (V, E, L_E)$, where
- V is the set of nodes, corresponding to the set of r.v.s
- E is the set of oriented edges, $(i, j) \in E$ iff $X_i \in pa(X_j)$
- $L_E$ is the set of all labels, $L_E = \bigcup_{(i,j) \in E} L_{(i,j)}$, with $L_{(i,j)}$ a list of configurations of $L_{(i,j)} := pa(X_j) \setminus \{X_i\}$: a configuration $x_{L_{(i,j)}} \in Val(L_{(i,j)})$ belongs to the label iff $X_j \perp X_i \mid L_{(i,j)} = x_{L_{(i,j)}}$

[4] Properties of LDAGs 2/3

- Maximality
- Regularity
- CSI-faithfulness* (CS-LDAG: $G_L(x_C)$)
- CSI-equivalence: $G_L = (V, E, L_E)$ and $G'_L = (V, E', L'_E)$ belong to the same CSI-equivalence class if $G_L$ and $G'_L$ share the same skeleton and $G(x_V)$ and $G'(x_V)$ are Markov equivalent $\forall x_V \in Val(V)$
- If $\exists x_V \in Val(V)$ s.t. no label in either $L_E$ or $L'_E$ is satisfied, G and G' are Markov equivalent

[4] Properties of LDAGs 3/3

CSI-faithfulness

Def. Let B be a BN and let B(c) be the model instantiated to context C = c. If X and Y are not d-separated by Z in B and they are d-separated by Z in B(c), then they are CSI-separated by Z given context C = c in B, namely

$X \perp_{G_{C=c}} Y \mid Z \Rightarrow X \perp_G Y \mid Z, C = c$

- CSI-CFC of P to G follows from CSI-separation (Boutilier et al. 1996)
- Context-specific Hammersley-Clifford theorem (Edera et al. 2013)

[5] Further definitions: Markov Equivalence Classes

Def. Two DAGs belong to the same Markov Equivalence class (ME class) whenever they entail the same conditional independence relations among the observed variables: $G' = (V, E')$, $G'' = (V, E'')$ s.t. $P(V; G') \equiv_M P(V; G'')$

Elements of a ME class are represented by means of a Partially Directed Acyclic Graph (PDAG) or by a Completed Partially Directed Acyclic Graph (CPDAG)

[5] D-separation

A path $u = <X, \ldots, Y>$ is blocked by some subset $Z \subseteq V \setminus \{X, Y\}$ if either
- u contains a non-collider that belongs to Z, or
- u contains a collider W such that neither W nor any of its descendants belongs to Z: $(\{W\} \cup de(W)) \cap Z = \emptyset$

Def. X and Y are d-separated by Z iff Z blocks all paths between X and Y
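The definition translates almost directly into code. A brute-force sketch for toy DAGs (it enumerates every path, so it is exponential and only meant to mirror the two blocking clauses above):

```python
def descendants(g, w):
    """All descendants of w in DAG g, where g maps node -> set of children."""
    out, stack = set(), [w]
    while stack:
        for child in g[stack.pop()]:
            if child not in out:
                out.add(child); stack.append(child)
    return out

def d_separated(g, x, y, Z):
    """X ⊥_G Y | Z iff every path is blocked: it passes a non-collider in Z,
    or a collider W with ({W} ∪ de(W)) ∩ Z = ∅."""
    nbrs = {v: set(g[v]) | {u for u in g if v in g[u]} for v in g}
    def paths(u, goal, seen):
        if u == goal:
            yield [u]; return
        for v in nbrs[u] - seen:
            for rest in paths(v, goal, seen | {v}):
                yield [u] + rest
    for path in paths(x, y, {x}):
        blocked = False
        for i in range(1, len(path) - 1):
            a, w, b = path[i - 1], path[i], path[i + 1]
            collider = w in g[a] and w in g[b]       # a -> w <- b on the path
            if (not collider and w in Z) or \
               (collider and not (({w} | descendants(g, w)) & Z)):
                blocked = True; break
        if not blocked:
            return False
    return True

# The toy DAG from the Background slides, as implied by the pa/ch/nd lists:
g = {"A": {"C", "D"}, "B": {"D"}, "C": set(), "D": {"E"}, "E": set()}
print(d_separated(g, "A", "E", {"D"}))   # True: D blocks A -> D -> E
print(d_separated(g, "A", "B", set()))   # True: D is an unconditioned collider
```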