
Bayesian Networks
11. Structure Learning / Constraint-based

Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL)
Institute for Business Economics and Information Systems & Institute for Computer Science
University of Hildesheim
http://www.ismll.uni-hildesheim.de

Bayesian Networks
1. Checking Probabilistic Independencies
2. Markov Equivalence and DAG Patterns
3. PC Algorithm

Bayesian Networks / 1. Checking Probabilistic Independencies
The Very Last Step

Assume we know the whole structure of a Bayesian network except a single edge. This edge represents a single independence statement: check it and include the edge based on the outcome of that test.

Bayesian Networks / 1. Checking Probabilistic Independencies
Exact Check / Example (1/3)

If X and Y are independent, then p(X, Y) = p(X) p(Y).

observed counts:
           Y = 0   Y = 1
  X = 0        3       6
  X = 1        1       2

observed relative frequencies p(X, Y):
           Y = 0   Y = 1
  X = 0    0.25    0.5
  X = 1    0.083   0.167

expected relative frequencies p(X) p(Y):
           Y = 0   Y = 1
  X = 0    0.25    0.5
  X = 1    0.083   0.167

Bayesian Networks / 1. Checking Probabilistic Independencies
Exact Check / Example (2/3)

If X and Y are independent, then p(X, Y) = p(X) p(Y).

observed counts:
           Y = 0   Y = 1
  X = 0     3000    6000
  X = 1     1000    2000

observed relative frequencies p(X, Y):
           Y = 0   Y = 1
  X = 0    0.25    0.5
  X = 1    0.083   0.167

expected relative frequencies p(X) p(Y):
           Y = 0   Y = 1
  X = 0    0.25    0.5
  X = 1    0.083   0.167

Bayesian Networks / 1. Checking Probabilistic Independencies
Exact Check / Example (3/3)

If X and Y are independent, then p(X, Y) = p(X) p(Y).

observed counts:
           Y = 0   Y = 1
  X = 0     2999    6001
  X = 1     1000    2000

observed relative frequencies p(X, Y):
              Y = 0       Y = 1
  X = 0    0.2499167   0.5000833
  X = 1    0.0833333   0.1666667

expected relative frequencies p(X) p(Y):
              Y = 0       Y = 1
  X = 0    0.2499375   0.5000625
  X = 1    0.0833125   0.1666875
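The exact check can be reproduced numerically. The following sketch (Python/NumPy; not part of the original slides) builds the observed and expected relative-frequency tables of Example (1/3):

import numpy as np

counts = np.array([[3, 6],    # X = 0: Y = 0, Y = 1
                   [1, 2]])   # X = 1: Y = 0, Y = 1

n = counts.sum()
observed = counts / n                  # observed relative frequencies p(X, Y)
p_x = counts.sum(axis=1) / n           # marginal p(X)
p_y = counts.sum(axis=0) / n           # marginal p(Y)
expected = np.outer(p_x, p_y)          # expected relative frequencies p(X) p(Y)

print(observed)    # [[0.25  0.5  ], [0.083 0.167]]
print(expected)    # identical here, so the table is exactly independent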

Bayesian Networks / 1. Checking Probabilistic Independencies
Gamma function (repetition, see I.2)

Definition 1 (Gamma function).
\Gamma(a) := \int_0^\infty t^{a-1} e^{-t} \, dt,
converging for a > 0.

Lemma 1 (Γ is a generalization of the factorial).
(i) \Gamma(n) = (n-1)! for n \in \mathbb{N}.
(ii) \Gamma(a+1) / \Gamma(a) = a.

[Figure: plot of the Gamma function on x ∈ [0, 5].]

Bayesian Networks / 1. Checking Probabilistic Independencies
Incomplete Gamma Function

Definition 2 (incomplete Gamma function).
\gamma(a, x) := \int_0^x t^{a-1} e^{-t} \, dt,
defined for a > 0 and x ∈ [0, ∞].

Lemma 2. γ(a, ∞) = Γ(a).

[Figure: surface plot of γ(a, x) and plot of γ(5, x).]

Bayesian Networks / 1. Checking Probabilistic Independencies
χ² distribution

Definition 3 (chi-square distribution χ²). Its density is
p(x) := \frac{1}{2^{df/2} \, \Gamma(df/2)} \, x^{df/2 - 1} e^{-x/2},
defined on ]0, ∞[. Its cumulative distribution function (cdf) is
p(X < x) := \frac{\gamma(df/2, \, x/2)}{\Gamma(df/2)}.

[Figure: densities of χ²_1, χ²_2, χ²_3, χ²_4; of χ²_5, χ²_6, χ²_7, χ²_10; and of χ²_50, χ²_60, χ²_70.]

Bayesian Networks / 1. Checking Probabilistic Independencies
χ² distribution

Lemma 3. A χ²-distributed variable with df degrees of freedom has expectation df.

A Java implementation of the incomplete gamma function (and thus of the χ² distribution) can be found, e.g., in COLT (http://dsd.lbl.gov/~hoschek/colt/), package cern.jet.stat, class Gamma.

Be careful: sometimes
\gamma(a, x) := \frac{1}{\Gamma(a)} \int_0^x t^{a-1} e^{-t} \, dt = \frac{\gamma(a, x)}{\Gamma(a)}   (e.g., R)
or
\gamma(a, x) := \int_x^\infty t^{a-1} e^{-t} \, dt = \Gamma(a) - \gamma(a, x)   (e.g., Maple)
are referenced as "incomplete gamma function".
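For readers who want to compute these quantities directly, here is a small sketch in Python/SciPy (the slides themselves point to the COLT library in Java); it also illustrates the two alternative conventions for the incomplete gamma function mentioned above:

from scipy.special import gamma, gammainc, gammaincc
from scipy.stats import chi2

a, x, df = 2.5, 3.0, 1

lower_unregularized = gammainc(a, x) * gamma(a)   # gamma(a, x) as defined above
lower_regularized   = gammainc(a, x)              # R-style: gamma(a, x) / Gamma(a)
upper_unregularized = gammaincc(a, x) * gamma(a)  # Maple-style: Gamma(a) - gamma(a, x)

# cdf of the chi-square distribution: p(X < x) = gamma(df/2, x/2) / Gamma(df/2)
print(chi2.cdf(x, df), gammainc(df / 2, x / 2))   # both ~0.9167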

Bayesian Networks / 1. Checking Probabilistic Independencies
Count Variables / Just 2 Variables

Let X, Y be random variables, D ⊆ dom(X) × dom(Y) a dataset, and for two values x ∈ dom(X), y ∈ dom(Y) let
c_{X=x} := |{d ∈ D | d_X = x}|
c_{Y=y} := |{d ∈ D | d_Y = y}|
c_{X=x,Y=y} := |{d ∈ D | d_X = x, d_Y = y}|
be their counts.

Bayesian Networks / 1. Checking Probabilistic Independencies
χ² and G² statistics / Just 2 Variables

If X, Y are independent, then p(X, Y) = p(X) p(Y) and thus
E(c_{X=x,Y=y} \mid c_{X=x}, c_{Y=y}) = \frac{c_{X=x} \, c_{Y=y}}{|D|}.

Then the statistics
\chi^2 := \sum_{x \in dom(X)} \sum_{y \in dom(Y)} \frac{\left(c_{X=x,Y=y} - \frac{c_{X=x} c_{Y=y}}{|D|}\right)^2}{\frac{c_{X=x} c_{Y=y}}{|D|}}
as well as
G^2 := 2 \sum_{x \in dom(X)} \sum_{y \in dom(Y)} c_{X=x,Y=y} \ln \frac{c_{X=x,Y=y}}{\frac{c_{X=x} c_{Y=y}}{|D|}}
are asymptotically χ²-distributed with
df = (|dom(X)| - 1) \cdot (|dom(Y)| - 1)
degrees of freedom.
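A minimal sketch (Python/NumPy, not from the slides) of these two statistics for a plain contingency table; the helper name chi2_g2 is made up for illustration:

import numpy as np
from scipy.stats import chi2

def chi2_g2(counts):
    """counts: contingency table c[x, y] of joint counts of X and Y."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n   # c_X c_Y / |D|
    chi2_stat = ((counts - expected) ** 2 / expected).sum()
    nonzero = counts > 0                                              # 0 * ln 0 terms vanish
    g2_stat = 2.0 * (counts[nonzero] * np.log(counts[nonzero] / expected[nonzero])).sum()
    df = (counts.shape[0] - 1) * (counts.shape[1] - 1)
    return chi2_stat, g2_stat, df

stat, g2, df = chi2_g2([[3, 6], [1, 2]])
print(stat, g2, chi2.sf(stat, df))   # 0.0, 0.0, p = 1.0 -> looks independent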

Bayesian Networks / 1. Checking Probabilistic Independencies
Testing Independency / informal

Generally, the statistics have the form
\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}, \qquad G^2 = 2 \sum \text{observed} \cdot \ln \frac{\text{observed}}{\text{expected}}.

χ² = 0 and G² = 0 for exactly independent variables. The larger χ² and G², the more likely / stronger the dependency between X and Y.

Bayesian Networks / 1. Checking Probabilistic Independencies
Testing Independency / more formally

More formally, under the null hypothesis of independence of X and Y, the probability for χ² and G² to have the computed values (or even larger ones) is
p_{\chi^2_{df}}(X > \chi^2) \quad \text{and} \quad p_{\chi^2_{df}}(X > G^2).

Let p_0 be a given threshold called significance level, often chosen as 0.05 or 0.01.

If p(X > χ²) < p_0, we can reject the null hypothesis and thus accept its alternative hypothesis of dependence of X and Y, i.e., add the edge between X and Y.

If p(X > χ²) ≥ p_0, we cannot reject the null hypothesis. Here, we then will accept the null hypothesis, i.e., not add the edge between X and Y.

Bayesian Networks / 1. Checking Probabilistic Independencies
Example 1

observed:
           Y = 0   Y = 1
  X = 0        3       6
  X = 1        1       2

with margins:
           Y = 0   Y = 1
  X = 0        3       6     9
  X = 1        1       2     3
               4       8    12

expected:
           Y = 0   Y = 1
  X = 0        3       6     9
  X = 1        1       2     3
               4       8    12

χ² = G² = 0 and p(X > 0) = 1.

Hence, for any significance level, X and Y are considered independent. [Figure: nodes X and Y without an edge.]

Bayesian Networks / 1. Checking Probabilistic Independencies
Example 2 (1/2)

observed:
           Y = 0   Y = 1
  X = 0        6       1
  X = 1        2       4

with margins:
           Y = 0   Y = 1
  X = 0        6       1     7
  X = 1        2       4     6
               8       5    13

expected:
           Y = 0   Y = 1
  X = 0     4.31    2.69     7
  X = 1     3.69    2.31     6
               8       5    13

\chi^2 = \frac{(6-4.31)^2}{4.31} + \frac{(1-2.69)^2}{2.69} + \frac{(2-3.69)^2}{3.69} + \frac{(4-2.31)^2}{2.31} = 3.75,
p_{\chi^2_1}(X > 3.75) = 0.053,

i.e., with a significance level of p_0 = 0.05 we would not be able to reject the null hypothesis of independence of X and Y. [Figure: nodes X and Y without an edge.]

Bayesian Networks / 1. Checking Probabilistic Independencies
Example 2 (2/2)

If we use G² instead of χ²,
G² = 3.94, p_{\chi^2_1}(X > 3.94) = 0.047,
so with a significance level of p_0 = 0.05 we would have to reject the null hypothesis of independence of X and Y. Here, we then accept the alternative, dependence of X and Y. [Figure: nodes X and Y joined by an edge.]
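The numbers of Example 2 can be reproduced as follows (sketch in Python/SciPy, not from the slides); note how χ² and G² fall on different sides of the 0.05 threshold:

import numpy as np
from scipy.stats import chi2

counts = np.array([[6.0, 1.0],
                   [2.0, 4.0]])
n = counts.sum()
expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n

chi2_stat = ((counts - expected) ** 2 / expected).sum()      # ~3.75
g2_stat = 2.0 * (counts * np.log(counts / expected)).sum()   # ~3.94
df = 1

print(chi2.sf(chi2_stat, df))   # ~0.053 -> cannot reject independence at 0.05
print(chi2.sf(g2_stat, df))     # ~0.047 -> reject independence at 0.05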

Bayesian Networks / 1. Checking Probabilistic Independencies
Count variables / general case

Let V be a set of random variables. We write v ∈ V as abbreviation for v ∈ dom(V). For a dataset D ⊆ dom(V), each subset X ⊆ V of variables, and each configuration x ∈ X of these variables, let
c_{X=x} := |{d ∈ D | d_X = x}|
be a (random) variable containing the frequency of occurrences of X = x in the data.

Bayesian Networks / 1. Checking Probabilistic Independencies
G² statistics / general case

Let X, Y, Z ⊆ V be three disjoint subsets of variables. If I(X, Y | Z), then
p(X, Y, Z) = \frac{p(X, Z) \, p(Y, Z)}{p(Z)}
and thus, for each configuration x ∈ X, y ∈ Y, and z ∈ Z,
E(c_{X=x,Y=y,Z=z} \mid c_{X=x,Z=z}, c_{Y=y,Z=z}) = \frac{c_{X=x,Z=z} \, c_{Y=y,Z=z}}{c_{Z=z}}.

The statistic
G^2 := 2 \sum_{x \in X} \sum_{y \in Y} \sum_{z \in Z} c_{X=x,Y=y,Z=z} \ln \frac{c_{X=x,Y=y,Z=z} \, c_{Z=z}}{c_{X=x,Z=z} \, c_{Y=y,Z=z}}
is asymptotically χ²-distributed with
df = \prod_{X' \in X} (|dom X'| - 1) \cdot \prod_{Y' \in Y} (|dom Y'| - 1) \cdot \prod_{Z' \in Z} |dom Z'|
degrees of freedom.
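A sketch of the general conditional G² test following the formula above (Python/NumPy; g2_cond_test is a hypothetical helper, and encoding the data as an integer matrix with values 0..card-1 is an assumption of this example):

import numpy as np
from scipy.stats import chi2

def g2_cond_test(data, x, y, z=()):
    """G^2 test of I(X, Y | Z).
    data: 2d integer array (rows = cases, columns = variables, values 0..card-1);
    x, y: column indices; z: iterable of column indices (the conditioning set)."""
    sub = data[:, [x, y, *z]]
    card = [int(sub[:, i].max()) + 1 for i in range(sub.shape[1])]

    # multi-dimensional contingency table c_{X=x, Y=y, Z=z}
    counts = np.zeros(card)
    for row in sub:
        counts[tuple(row)] += 1

    c_xz = counts.sum(axis=1, keepdims=True)        # c_{X=x, Z=z}
    c_yz = counts.sum(axis=0, keepdims=True)        # c_{Y=y, Z=z}
    c_z = counts.sum(axis=(0, 1), keepdims=True)    # c_{Z=z}

    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = counts * c_z / (c_xz * c_yz)
    nonzero = counts > 0                            # 0 * ln(...) terms contribute nothing
    g2 = 2.0 * (counts[nonzero] * np.log(ratio[nonzero])).sum()

    df = (card[0] - 1) * (card[1] - 1) * int(np.prod(card[2:], dtype=int))
    return g2, df, chi2.sf(g2, df)                  # statistic, degrees of freedom, p-value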

Bayesian Networks / 1. Checking Probabilistic Independencies
Recommendations

Recommendations [SGS00, p. 95]:
- As a heuristic, reduce the degrees of freedom by 1 for each structural zero:
  df_reduced := df - |{(x, y, z) ∈ X × Y × Z | c_{X=x,Y=y,Z=z} = 0}|
- Use G² instead of χ².
- If |D| < 10 · df, assume conditional dependency.

Problems:
- The null hypothesis is accepted if it is not rejected (especially problematic for small samples).
- Repeated testing.
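These heuristics are straightforward to bolt onto such a test; a minimal sketch with made-up helper names:

import numpy as np

def reduced_df(counts, df):
    """Heuristic df reduction: one degree of freedom less per structural zero cell."""
    return df - int((np.asarray(counts) == 0).sum())

def enough_data(n_samples, df):
    """[SGS00] heuristic: if |D| < 10 * df, do not test and assume conditional dependency."""
    return n_samples >= 10 * df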

Bayesian Networks
1. Checking Probabilistic Independencies
2. Markov Equivalence and DAG Patterns
3. PC Algorithm

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Markov-equivalence

Definition 4. Let G, H be two graphs on a set V (undirected graphs or DAGs). G and H are called Markov-equivalent if they have the same independency model, i.e.,
I_G(X, Y | Z) ⟺ I_H(X, Y | Z)   for all X, Y, Z ⊆ V.

The notion of Markov-equivalence for undirected graphs is uninteresting, as every undirected graph is Markov-equivalent only to itself (a corollary of the uniqueness of the minimal representation!).

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Markov-equivalence

Why is Markov-equivalence important?
1. In structure learning, the set of all graphs over V is our search space. If we can restrict the search to equivalence classes, the search space becomes smaller.
2. If we interpret the edges of our graph as causal relationships between variables, it is of interest which edges are necessary (i.e., occur in all instances of the equivalence class) and which edges are only possible (i.e., occur in some instances of the equivalence class but not in others; i.e., there are alternative explanations).

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Markov-equivalence

Definition 5. Let G be a directed graph. We call a chain p_1 – p_2 – p_3 uncoupled if there is no edge between p_1 and p_3.

Lemma 4 (Markov-equivalence criterion, [PGV90]). Let G and H be two DAGs on the vertices V. G and H are Markov-equivalent if and only if
(i) G and H have the same links (u(G) = u(H)), and
(ii) G and H have the same uncoupled head-to-head meetings.

The set of uncoupled head-to-head meetings is also denoted as the V-structure of G.
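Lemma 4 translates directly into an equivalence check. A sketch in Python (representing DAGs as sets of directed edges (parent, child) is an assumption of this example):

from itertools import combinations

def skeleton(edges):
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Uncoupled head-to-head meetings x -> z <- y with x, y non-adjacent."""
    links = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    vs = set()
    for z, pa in parents.items():
        for x, y in combinations(sorted(pa), 2):
            if frozenset((x, y)) not in links:
                vs.add((frozenset((x, y)), z))
    return vs

def markov_equivalent(g, h):
    return skeleton(g) == skeleton(h) and v_structures(g) == v_structures(h)

# Example: X -> Y -> Z is equivalent to X <- Y <- Z, but not to X -> Y <- Z.
print(markov_equivalent({("X", "Y"), ("Y", "Z")}, {("Y", "X"), ("Z", "Y")}))  # True
print(markov_equivalent({("X", "Y"), ("Y", "Z")}, {("X", "Y"), ("Z", "Y")}))  # False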

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Markov-equivalence / examples

[Figure 1: Example of Markov-equivalent DAGs (two DAGs on the vertices A-G with the same links and the same uncoupled head-to-head meetings).]

[Figure 2: Which of the minimal DAG-representations of a dependency model, built for the orderings (A,B,C), (A,C,B), (B,A,C), (B,C,A), (C,A,B), (C,B,A), are equivalent? [CGH97, p. 240]]

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Directed graph patterns

Definition 6. Let V be a set and E ⊆ V² ∪ P_2(V) a set of ordered and unordered pairs of elements of V with (v, w), (w, v) ∉ E for v, w ∈ V with {v, w} ∈ E. Then G := (V, E) is called a directed graph pattern. The elements of V are called vertices, the elements of E edges: unordered pairs are called undirected edges, ordered pairs directed edges.

We say a directed graph pattern H is a pattern of the directed graph G if there is an orientation of the undirected edges of H that yields G, i.e.,
(v, w) ∈ E_H ⟹ (v, w) ∈ E_G,
{v, w} ∈ E_H ⟹ (v, w) ∈ E_G or (w, v) ∈ E_G,
(v, w) ∈ E_G ⟹ (v, w) ∈ E_H or {v, w} ∈ E_H.

[Figure 3: A directed graph pattern (vertices A-G).]

Bayesian Networks / 2. Markov Equivalence and DAG patterns
DAG patterns

Definition 7. A directed graph pattern H is called an acyclic directed graph pattern (DAG pattern) if it is the directed graph pattern of a DAG G, or equivalently, if H does not contain a completely directed cycle, i.e., there is no sequence v_1, ..., v_n ∈ V with v_1 = v_n and (v_i, v_{i+1}) ∈ E for i = 1, ..., n-1 (i.e., the directed graph obtained by dropping the undirected edges is a DAG).

[Figure 4: A DAG pattern.]
[Figure 5: A directed graph pattern that is not a DAG pattern.]

Bayesian Networks / 2. Markov Equivalence and DAG patterns
DAG patterns represent Markov equivalence classes

Lemma 5. Each Markov equivalence class corresponds uniquely to a DAG pattern G:
(i) The Markov equivalence class consists of all DAGs that G is a pattern of, i.e., that yield G by dropping the directions of those edges that are not part of an uncoupled head-to-head meeting.
(ii) The DAG pattern contains a directed edge (v, w) if all representatives of the Markov equivalence class contain this directed edge; otherwise (i.e., if some representatives have (v, w) and some others (w, v)) the DAG pattern contains the undirected edge {v, w}.

The directed edges of the DAG pattern are also called irreversible or compelled; the undirected edges are also called reversible.

Bayesian Networks / 2. Markov Equivalence and DAG patterns
DAG patterns represent Markov equivalence classes / example

[Figure 6: A DAG pattern (vertices A-G) and the representatives of its Markov equivalence class.]

Bayesian Networks / 2. Markov Equivalence and DAG patterns
DAG patterns represent Markov equivalence classes

But beware: not every DAG pattern represents a Markov-equivalence class!

Example: a pattern over the chain X – Y – Z in which only one of the two edges is directed (e.g., X → Y – Z) is not the DAG pattern of a Markov-equivalence class, but the fully undirected chain X – Y – Z is.

Bayesian Networks / 2. Markov Equivalence and DAG patterns
DAG patterns represent Markov equivalence classes

But the skeleton plus the uncoupled head-to-head meetings alone do not make a DAG pattern that represents a Markov-equivalence class either.

Example: the pattern with X → Z ← Y and the undirected edge Z – W is not a DAG pattern that represents a Markov-equivalence class, as any of its representatives also has Z → W. But the pattern with X → Z ← Y and Z → W is.

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Computing DAG patterns

So, to compute the DAG pattern that represents the equivalence class of a given DAG:
1. start with the skeleton plus all uncoupled head-to-head meetings,
2. successively add the entailed edge orientations (saturation).

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Saturating DAG patterns

[Figure: the four orientation rules, shown graphically. Rules 1 and 2 involve vertices X, Y, Z (rule 2 via "any path"); rules 3 and 4 involve vertices X, Y, Z, W.]

The dashed link in rule 4 can be W – Z, W → Z, or W ← Z (so rule 4 is actually a compact notation for 3 rules).

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Computing DAG patterns

saturate(graph pattern G = (V, E)):
    apply rules 1-4 to G until no more rule matches
    return G

dag-pattern(graph G = (V, E)):
    H := (V, F) with F := {{x, y} | (x, y) ∈ E}
    for each uncoupled head-to-head meeting X → Z ← Y in G do
        orient X → Z ← Y in H
    od
    saturate(H)
    return H

Figure 7: Algorithm for computing the DAG pattern of the Markov-equivalence class of a given DAG.

Lemma 6. For a given graph G, algorithm 7 computes correctly the DAG pattern that represents its Markov-equivalence class. Furthermore, even the rule set 1-3 suffices here and is non-redundant. See [Mee95] for a proof.
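As an illustration, the whole procedure fits into a short Python sketch (not from the slides). It uses the standard Meek rules 1-3 in place of the graphical rules summarized above, and the representation of graphs as edge sets is an assumption of this example:

from itertools import combinations

def dag_pattern(nodes, edges):
    """Compute the DAG pattern of a DAG given as directed edges (parent, child)."""
    nodes = set(nodes)
    links = {frozenset(e) for e in edges}                         # skeleton
    parents = {v: {u for (u, w) in edges if w == v} for v in nodes}

    def adjacent(a, b):
        return frozenset((a, b)) in links

    # 1. orient all uncoupled head-to-head meetings x -> z <- y
    directed = set()
    for z in nodes:
        for x, y in combinations(parents[z], 2):
            if not adjacent(x, y):
                directed.update({(x, z), (y, z)})
    undirected = {l for l in links
                  if tuple(l) not in directed and tuple(l)[::-1] not in directed}

    # 2. saturate: apply Meek rules 1-3 until no rule matches
    changed = True
    while changed:
        changed = False
        for link in list(undirected):
            a, b = tuple(link)
            for x, y in ((a, b), (b, a)):
                # rule 1: w -> x - y with w, y non-adjacent          =>  x -> y
                r1 = any((w, x) in directed and not adjacent(w, y) for w in nodes if w != y)
                # rule 2: x -> w -> y with x - y                     =>  x -> y
                r2 = any((x, w) in directed and (w, y) in directed for w in nodes)
                # rule 3: x - w1 -> y, x - w2 -> y, w1, w2 non-adj.  =>  x -> y
                r3 = any(frozenset((x, w1)) in undirected and frozenset((x, w2)) in undirected
                         and (w1, y) in directed and (w2, y) in directed and not adjacent(w1, w2)
                         for w1, w2 in combinations(nodes - {x, y}, 2))
                if r1 or r2 or r3:
                    directed.add((x, y))
                    undirected.discard(link)
                    changed = True
                    break
    return directed, undirected

# chain A -> B -> C: pattern is fully undirected; collider A -> C <- B stays directed
print(dag_pattern("ABC", {("A", "B"), ("B", "C")}))   # (set(), {frozenset({'A','B'}), frozenset({'B','C'})})
print(dag_pattern("ABC", {("A", "C"), ("B", "C")}))   # ({('A','C'), ('B','C')}, set())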

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Computing DAG patterns

What follows is an alternative algorithm for computing the DAG pattern that represents the Markov-equivalence class of a given DAG.

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Topological edge ordering

Definition 8. Let G := (V, E) be a directed graph. A bijective map
τ : {1, ..., |E|} → E
is called an ordering of the edges of G.

An edge ordering τ is called a topological edge ordering if
(i) numbers increase along all paths, i.e., τ⁻¹(x, y) < τ⁻¹(y, z) for paths x → y → z, and
(ii) shortcuts have larger numbers, i.e., for x, y, z with x → y → z and (x, z) ∈ E it is τ⁻¹(x, y) < τ⁻¹(y, z) < τ⁻¹(x, z).

[Figure 8: Example of a topological edge ordering (edges numbered 1-5).]

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Topological edge ordering

topological-edge-ordering(G = (V, E)):
    σ := topological-ordering(G)
    E′ := E
    for i = 1, ..., |E| do
        let (v, w) ∈ E′ with σ⁻¹(w) minimal and then with σ⁻¹(v) maximal
        τ(i) := (v, w)
        E′ := E′ \ {(v, w)}
    od
    return τ

Figure 9: Algorithm for computing a topological edge ordering of a DAG.
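A compact Python sketch of Figure 9 (using the standard-library graphlib for the topological node ordering; the function name is made up for illustration):

from graphlib import TopologicalSorter

def topological_edge_ordering(nodes, edges):
    # topological node ordering sigma; edges sorted by position of the head w
    # (ascending) and, for equal heads, by position of the tail v (descending)
    preds = {v: {u for (u, w) in edges if w == v} for v in nodes}
    sigma = list(TopologicalSorter(preds).static_order())
    pos = {v: i for i, v in enumerate(sigma)}
    return sorted(edges, key=lambda e: (pos[e[1]], -pos[e[0]]))

# e.g. for the chain v -> x -> y with shortcut v -> y:
print(topological_edge_ordering({"v", "x", "y"}, {("v", "x"), ("x", "y"), ("v", "y")}))
# [('v', 'x'), ('x', 'y'), ('v', 'y')]  -- the shortcut gets the largest number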

Bayesian Networks / 2. Markov Equivalence and DAG patterns

 1  dag-pattern(G = (V, E)):
 2      τ := topological-edge-ordering(G)
 3      E_irr := ∅
 4      E_rev := ∅
 5      E_rest := E
 6      while E_rest ≠ ∅ do
 7          let (y, z) ∈ E_rest with τ⁻¹(y, z) minimal
 8          [label pa(z):]
 9          if ∃ (x, y) ∈ E_irr with (x, z) ∉ E
10              E_irr := E_irr ∪ {(x, z) | x ∈ pa(z)}
11          else
12              E_irr := E_irr ∪ {(x, z) | (x, y) ∈ E_irr}
13              if ∃ (x, z) ∈ E with x ∉ {y} ∪ pa(y)
14                  E_irr := E_irr ∪ {(x, z) | (x, z) ∈ E_rest}
15              else
16                  E_rev := E_rev ∪ {(x, z) | (x, z) ∈ E_rest}
17              fi
18          fi
19          E_rest := E \ E_irr \ E_rev
20      od
21      return Ḡ := (V, E_irr ∪ {{v, w} | (v, w) ∈ E_rev})

Figure 10: Algorithm for computing the DAG pattern representing the Markov equivalence class of a DAG G. [Chi95]

Bayesian Networks / 2. Markov Equivalence and DAG patterns
A simple but important lemma

Lemma 7 ([Chi95]). Let G be a DAG and x, y, z three vertices of G that are pairwise adjacent. If any two of the connecting edges are reversible, then so is the third.

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Line 10

(The algorithm of Figure 10 is repeated here, with line 10 highlighted.)

Case analysis:
a) x = y: [diagram] cycle!
b) x ≠ y:
   case 1) x and y not adjacent: [diagram: uncoupled head-to-head meeting]
   case 2) x and y adjacent: [diagram] cycle! / [diagram] lemma!

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Line 12

(The algorithm of Figure 10 is repeated here, with line 12 highlighted.)

Case analysis:
case 1) (y, z) is irreversible: [diagram] cycle!
case 2) (y, z) is reversible: [diagram] lemma!

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Line 14

(The algorithm of Figure 10 is repeated here, with line 14 highlighted.)

Case analysis:
a) x = x': [diagram]
b) x ≠ x':
   case 1) (x', y) is irreversible: [diagram] cycle!
   case 2) (x', y) is reversible: [diagram] lemma!

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Summary (1/2)

(Conditional) probabilistic independence in estimated JPDs has to be checked by means of a statistical test (e.g., χ², G²).

For those tests, a test statistic (χ² or G²) is computed together with its probability under the assumption of independence:
1. If this probability is too small, the independence assumption is rejected and dependence is assumed.
2. If this probability exceeds a given lower bound, the independence assumption cannot be rejected and independence is assumed.

Bayesian Networks / 2. Markov Equivalence and DAG patterns
Summary (2/2)

Some DAGs encode the same independency relation (Markov equivalence). A Markov equivalence class can be represented by a DAG pattern (but not all DAG patterns represent a Markov equivalence class!).

For a given DAG, its DAG pattern can be computed by
1. starting from the undirected skeleton,
2. adding all directions of uncoupled head-to-head meetings,
3. saturating the inferred directions (using 3 rules).

Bayesian Networks
1. Checking Probabilistic Independencies
2. Markov Equivalence and DAG Patterns
3. PC Algorithm

Bayesian Networks / 3. PC Algorithm
References

[CGH97] Enrique Castillo, José Manuel Gutiérrez, and Ali S. Hadi. Expert Systems and Probabilistic Network Models. Springer, New York, 1997.

[Chi95] D. Chickering. A transformational characterization of equivalent Bayesian network structures. In Philippe Besnard and Steve Hanks, editors, Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 87-98. Morgan Kaufmann, 1995.

[Mee95] C. Meek. Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, QU, pages 403-418. Morgan Kaufmann, August 1995.

[PGV90] J. Pearl, D. Geiger, and T. S. Verma. The logic of influence diagrams. In R. M. Oliver and J. Q. Smith, editors, Influence Diagrams, Belief Networks and Decision Analysis. Wiley, Sussex, 1990. (A shorter version originally appeared in Kybernetica, Vol. 25, No. 2, 1989.)

[SGS00] Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000.