

Bayesian Networks / 1. Probabilistic Independence and Separation in Graphs

Prof. Dr. Lars Schmidt-Thieme, L. B. Marinho, K. Buza
Information Systems and Machine Learning Lab (ISMLL)
Institute of Economics and Information Systems & Institute of Computer Science
University of Hildesheim
http://www.ismll.uni-hildesheim.de

Outline of the Course

I. Probabilistic Independence and Separation in Graphs
   Key concepts: probabilistic independence, separation in graphs, Markov and Bayesian networks
II. Inference
   Key concepts: exact inference, approximate inference
III. Learning
   Key concepts: parameter learning, parameter learning with missing values, learning structure by constraint-based learning, learning structure by local search

Bayesian Networks

1. Basic Probability Calculus
2. Separation in Undirected Graphs

Bayesian Networks / 1. Basic Probability Calculus
Joint probability distributions

Pain:        Y      Y      Y      Y      N      N      N      N
Weightloss:  Y      Y      N      N      Y      Y      N      N
Vomiting:    Y      N      Y      N      Y      N      Y      N
Adeno = Y:   0.220  0.220  0.025  0.025  0.095  0.095  0.010  0.010
Adeno = N:   0.004  0.009  0.005  0.012  0.031  0.076  0.050  0.113

Figure 1: Joint probability distribution p(P, W, V, A) of the four random variables P (pain), W (weightloss), V (vomiting), and A (adeno).

Bayesian Networks / 1. Basic Probability Calculus
Joint probability distributions

Discrete JPDs are described by nested tables, multi-dimensional arrays, data cubes, or tensors having entries in [0, 1] and summing to 1.
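As a minimal sketch of this tensor view (using NumPy, which is merely one possible choice and not prescribed by the course), the JPD of Figure 1 can be stored as a 2x2x2x2 array:

```python
import numpy as np

# Joint distribution p(P, W, V, A) from Figure 1 as a 2x2x2x2 tensor.
# Axis order: Pain, Weightloss, Vomiting, Adeno; index 0 = "Y", 1 = "N".
p = np.stack(
    [np.array([0.220, 0.220, 0.025, 0.025, 0.095, 0.095, 0.010, 0.010]).reshape(2, 2, 2),
     np.array([0.004, 0.009, 0.005, 0.012, 0.031, 0.076, 0.050, 0.113]).reshape(2, 2, 2)],
    axis=-1)

# Entries lie in [0, 1] and sum to 1, as required of a JPD.
assert np.all((p >= 0) & (p <= 1)) and abs(p.sum() - 1.0) < 1e-9
```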

Bayesian Networks / 1. Basic Probability Calculus
Marginal probability distributions

Definition 1. Let p be the joint probability distribution of the random variables X := {X_1, ..., X_n} and Y ⊆ X a subset thereof. Then

    p(Y = y) := p_Y(y) := Σ_{x ∈ dom(X \ Y)} p(X \ Y = x, Y = y)

is a probability distribution of Y, called its marginal probability distribution.

Example 1. Marginal p(V, A) of the joint distribution p(P, W, V, A) of Figure 1:

Vomiting:    Y      N
Adeno = Y:   0.350  0.350
Adeno = N:   0.090  0.210
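A short sketch of how Example 1 can be recomputed from the tensor representation (again assuming NumPy; axis order and encoding as above):

```python
import numpy as np

# Joint p(P, W, V, A) from Figure 1 (axis order P, W, V, A; index 0 = "Y", 1 = "N").
p = np.stack(
    [np.array([0.220, 0.220, 0.025, 0.025, 0.095, 0.095, 0.010, 0.010]).reshape(2, 2, 2),
     np.array([0.004, 0.009, 0.005, 0.012, 0.031, 0.076, 0.050, 0.113]).reshape(2, 2, 2)],
    axis=-1)

# Marginal p(V, A): sum over the variables not in Y = {V, A},
# i.e. over Pain (axis 0) and Weightloss (axis 1).
p_VA = p.sum(axis=(0, 1))
print(p_VA)   # [[0.35, 0.09], [0.35, 0.21]]; rows = Vomiting, columns = Adeno,
              # i.e. the transpose of the table in Example 1
```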

Bayesian Networks / 1. Basic Probability Calculus
Marginal probability distributions / example

[Figure 3: A joint probability distribution of three variables (numbers in parts per 1000) together with all of its marginals.]

Bayesian Networks / 1. Basic Probability Calculus
Extreme and non-extreme probability distributions

Definition 2. By p > 0 we mean p(x) > 0 for all x ∈ dom(p). Then p is called non-extreme.

Example 2.

    (0.4  0.0)        (0.4  0.1)
    (0.3  0.3)        (0.2  0.3)

The left distribution is extreme (it contains a zero entry); the right one is non-extreme.

Bayesian Networks / 1. Basic Probability Calculus
Conditional probability distributions

Definition 3. For a JPD p and a subset Y of its variables with p_Y > 0, we define

    p_{|Y} := p / p_Y

as the conditional probability distribution of p w.r.t. Y.

A conditional probability distribution w.r.t. Y sums to 1 for every fixed value of Y, i.e.,

    (p_{|Y})_Y ≡ 1

Bayesian Networks / 1. Basic Probability Calculus
Conditional probability distributions / example

Example 3. Let p be the JPD

    p := (0.4  0.1)
         (0.2  0.3)

on two variables R (rows) and C (columns) with domains dom(R) = dom(C) = {1, 2}. The conditional probability distribution w.r.t. C is

    p_{|C} := (2/3  1/4)
              (1/3  3/4)
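A minimal sketch of this computation (NumPy assumed, as before): dividing the joint table by the column marginal reproduces p_{|C}.

```python
import numpy as np

# JPD of Example 3 on the variables R (rows) and C (columns).
p = np.array([[0.4, 0.1],
              [0.2, 0.3]])

p_C = p.sum(axis=0)        # marginal of C: [0.6, 0.4]
p_given_C = p / p_C        # divide each column by its marginal (broadcasts column-wise)

print(p_given_C)                                   # [[2/3, 1/4], [1/3, 3/4]]
assert np.allclose(p_given_C.sum(axis=0), 1.0)     # each column sums to 1
```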

Bayesian Networks / 1. Basic Probability Calculus
Chain rule

Lemma 1 (Chain rule). Let p be a JPD on variables X_1, X_2, ..., X_n with p(X_1, ..., X_{n-1}) > 0. Then

    p(X_1, X_2, ..., X_n) = p(X_n | X_1, ..., X_{n-1}) ··· p(X_2 | X_1) · p(X_1)

The chain rule provides a factorization of the JPD into some of its conditional marginals. The factorizations stemming from the chain rule are trivial, as they have as many parameters as the original JPD:

    #parameters = 2^{n-1} + 2^{n-2} + ··· + 2^1 + 2^0 = 2^n - 1

(example computation for all-binary variables)
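For instance, for the four variables of Figure 1, one ordering of the chain rule reads

    p(P, W, V, A) = p(A | P, W, V) · p(V | P, W) · p(W | P) · p(P),

and the factors together have 2^3 + 2^2 + 2^1 + 2^0 = 15 = 2^4 - 1 free parameters, exactly as many as the joint table itself.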

Bayesian Networks / 1. Basic Probability Calculus
Bayes formula

Lemma 2 (Bayes formula). Let p be a JPD and X, Y two disjoint sets of its variables with p(Y) > 0. Then

    p(X | Y) = p(Y | X) · p(X) / p(Y)

[Portrait: Thomas Bayes (1701/02 - 1761).]
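As a small worked instance using the marginal p(V, A) of Example 1 (all numbers taken from that table): p(V=Y) = 0.350 + 0.090 = 0.440, p(A=Y) = 0.700, and p(V=Y | A=Y) = 0.350 / 0.700 = 0.5, so

    p(A=Y | V=Y) = p(V=Y | A=Y) · p(A=Y) / p(V=Y) = 0.5 · 0.7 / 0.44 ≈ 0.795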

Bayesian Networks / 1. Basic Probability Calculus
Independent variables

Definition 4. Two sets X, Y of variables are called independent when

    p(X = x, Y = y) = p(X = x) · p(Y = y)    for all x and y,

or equivalently, when

    p(X = x | Y = y) = p(X = x)    for all x and all y with p(Y = y) > 0.

Bayesian Networks / 1. Basic Probability Calculus
Independent variables / example

Example 4. Let Ω be the cards of an ordinary deck, and let R = true if a card is royal, T = true if a card is a ten or a jack, and S = true if a card is a spade.

Ranks of a single suit: 2 3 4 5 6 7 8 9 10 J Q K A, of which J, Q, K are the royals.

 S   R   T   p(R, T | S)
 Y   Y   Y   1/13
 Y   Y   N   2/13
 Y   N   Y   1/13
 Y   N   N   9/13
 N   Y   Y   3/39 = 1/13
 N   Y   N   6/39 = 2/13
 N   N   Y   3/39 = 1/13
 N   N   N   27/39 = 9/13

 R   T   p(R, T)
 Y   Y   4/52 = 1/13
 Y   N   8/52 = 2/13
 N   Y   4/52 = 1/13
 N   N   36/52 = 9/13

Since p(R, T | S) = p(R, T) for all values, the sets {R, T} and {S} are independent.
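A small brute-force check of this independence, assuming a standard 52-card deck and the predicates of Example 4 (the encoding of cards as (rank, suit) pairs is an assumption for illustration):

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck; each card is a (rank, suit) pair, all equally likely.
ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
suits = ['spades', 'hearts', 'diamonds', 'clubs']
deck = list(product(ranks, suits))

def prob(event):
    """Probability of an event (a predicate on cards) under the uniform distribution."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def R(card): return card[0] in {'J', 'Q', 'K'}   # royal
def T(card): return card[0] in {'10', 'J'}       # ten or jack
def S(card): return card[1] == 'spades'          # spade

# {R, T} and {S} are independent:
# p(R=r, T=t, S=s) = p(R=r, T=t) * p(S=s) for all truth values r, t, s.
for r, t, s in product([True, False], repeat=3):
    joint = prob(lambda c: (R(c), T(c), S(c)) == (r, t, s))
    assert joint == prob(lambda c: (R(c), T(c)) == (r, t)) * prob(lambda c: S(c) == s)
```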

Bayesian Networks / 1. Basic Probability Calculus
Conditionally independent variables

Definition 5. Let X, Y, and Z be sets of variables. X and Y are called conditionally independent given Z when for all events Z = z with p(Z = z) > 0 all pairs of events X = x and Y = y are conditionally independent given Z = z, i.e.,

    p(X = x, Y = y, Z = z) = p(X = x, Z = z) · p(Y = y, Z = z) / p(Z = z)

for all x, y, and z (with p(Z = z) > 0), or equivalently

    p(X = x | Y = y, Z = z) = p(X = x | Z = z).

We write I_p(X, Y | Z) for the statement that X and Y are conditionally independent given Z.

Bayesian Networks / 1. Basic Probability Calculus
Conditionally independent variables

Example 5. Let S (shape), C (color), and L (label) be three random variables distributed as shown in Figure 4. We show I_p({L}, {S} | {C}), i.e., that label and shape are conditionally independent given the color.

 C      S       L   p(L | C, S)
 black  square  1   2/6 = 1/3
 black  square  2   4/6 = 2/3
 black  round   1   1/3
 black  round   2   2/3
 white  square  1   1/2
 white  square  2   1/2
 white  round   1   1/2
 white  round   2   1/2

 C      L   p(L | C)
 black  1   3/9 = 1/3
 black  2   6/9 = 2/3
 white  1   2/4 = 1/2
 white  2   2/4 = 1/2

Since p(L | C, S) = p(L | C) for all values, the claimed conditional independence holds.

[Figure 4: 13 objects with different shape, color, and label.]
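The same check can be done numerically. The object counts below are reconstructed from the tables of Example 5 (6 black squares, 3 black round, 2 white squares, 2 white round objects, 13 in total); this reconstruction is an assumption consistent with, but not spelled out in, the figure.

```python
from fractions import Fraction
from itertools import product

# (color, shape, label) -> number of objects, reconstructed from Example 5.
counts = {
    ('black', 'square', 1): 2, ('black', 'square', 2): 4,
    ('black', 'round',  1): 1, ('black', 'round',  2): 2,
    ('white', 'square', 1): 1, ('white', 'square', 2): 1,
    ('white', 'round',  1): 1, ('white', 'round',  2): 1,
}
n = sum(counts.values())   # 13 objects

def p(pred):
    """Probability of an event under the uniform distribution on the objects."""
    return Fraction(sum(k for (c, s, l), k in counts.items() if pred(c, s, l)), n)

# Check I_p({L}, {S} | {C}): p(L, S, C) * p(C) == p(L, C) * p(S, C) for all values.
for c0, s0, l0 in product(['black', 'white'], ['square', 'round'], [1, 2]):
    lhs = p(lambda c, s, l: (c, s, l) == (c0, s0, l0)) * p(lambda c, s, l: c == c0)
    rhs = p(lambda c, s, l: (c, l) == (c0, l0)) * p(lambda c, s, l: (c, s) == (c0, s0))
    assert lhs == rhs
```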

Bayesian Networks

1. Basic Probability Calculus
2. Separation in Undirected Graphs

Graphs

Definition 6. Let V be any set and

    E ⊆ P_2(V) := {{x, y} | x, y ∈ V}

a set of unordered pairs of V. Then G := (V, E) is called an undirected graph. The elements of V are called vertices or nodes, the elements of E edges. Let e = {x, y} ∈ E be an edge; then we call the vertices x and y incident to the edge e.

[Figure 5: Example graph on the vertices A, B, C, D, E, F, G, H.]

Graphs / Representation

The most useful methods of representing graphs are:
- symbolically, as (V, E),
- pictorially,
- numerically, using certain types of matrices (e.g., adjacency matrices).

Characteristics of Undirected Graphs

Definition 7. We call two vertices x, y ∈ V adjacent, or neighbors, if there is an edge {x, y} ∈ E. The set of all vertices adjacent to a given vertex x ∈ V is called its fan or boundary:

    fan(x) := {y ∈ V | {x, y} ∈ E}

[Figure 6: Neighbors of node E.]
[Figure 7: Boundary of the set {D, E}.]
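A minimal sketch of these notions on a small hypothetical graph (the graph, the adjacency-set representation, and the extension of the boundary to vertex sets as "neighbors outside the set" are assumptions for illustration):

```python
# A small hypothetical undirected graph, stored as adjacency sets.
V = {'A', 'B', 'C', 'D', 'E'}
E = [{'A', 'B'}, {'B', 'C'}, {'B', 'D'}, {'D', 'E'}]

adj = {v: set() for v in V}
for e in E:
    x, y = tuple(e)
    adj[x].add(y)
    adj[y].add(x)

def fan(x):
    """Neighbors (fan / boundary) of a single vertex."""
    return adj[x]

def boundary(S):
    """Boundary of a vertex set S: all neighbors of vertices of S outside of S."""
    return set().union(*(adj[v] for v in S)) - set(S)

print(fan('B'))              # {'A', 'C', 'D'}
print(boundary({'B', 'D'}))  # {'A', 'C', 'E'}
```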

Characteristics of Undirected Graphs

Definition 8. Let G = (V, E) be an undirected graph. An undirected graph G_X = (X, E_X) is called a subgraph of G iff X ⊆ V and E_X = (X × X) ∩ E.

An undirected graph is said to be complete iff its set of edges is complete, i.e., iff all possible edges are present, formally iff

    E = V × V \ {(A, A) | A ∈ V}

[Figure 8: Example of a complete graph.]

A complete subgraph is called a clique. A clique is called maximal iff it is not a subgraph of a larger clique.

[Figure 9: Example of cliques.]
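A short sketch of checking completeness and the clique property (the example graph is hypothetical and only serves to illustrate the definitions):

```python
from itertools import combinations

def is_complete(V, E):
    """True iff every pair of distinct vertices is joined by an edge."""
    return all({x, y} in E for x, y in combinations(V, 2))

def is_clique(X, E):
    """True iff the subgraph induced by the vertex set X is complete."""
    return is_complete(X, E)

# Hypothetical example: a triangle {A, B, C} with a pendant vertex D.
V = {'A', 'B', 'C', 'D'}
E = [{'A', 'B'}, {'B', 'C'}, {'A', 'C'}, {'C', 'D'}]

print(is_complete(V, E))              # False: e.g. the edge {A, D} is missing
print(is_clique({'A', 'B', 'C'}, E))  # True: {A, B, C} is a (maximal) clique
```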

Paths on Graphs

Definition 9. Let V be a set. We call V* := ∪_{i ∈ ℕ} V^i the set of finite sequences in V. The length of a sequence s ∈ V* is denoted by |s|.

Let G = (V, E) be a graph. We call

    G* := V*_G := {p ∈ V* | {p_i, p_{i+1}} ∈ E, i = 1, ..., |p| - 1}

the set of paths on G. Any contiguous subsequence of a path p ∈ G* is called a subpath of p, i.e., any path (p_i, p_{i+1}, ..., p_j) with 1 ≤ i ≤ j ≤ |p|. The subpath (p_2, p_3, ..., p_{|p|-1}) is called the interior of p. A path of length |p| ≥ 2 is called proper.

[Figure 10: Example graph on the vertices A, ..., H.] The sequences

    (A, D, G, H), (C, E, B, D), (F)

are paths on G, but the sequences

    (A, D, E, C), (A, H, C, F)

are not.

Types of Undirected Graphs

Definition 10. Let G = (V, E) be an undirected graph. Two distinct nodes A, B ∈ V are called connected in G iff there exists at least one path between them; G itself is called connected iff every two nodes are connected.

A connected undirected graph is said to be a tree if for every pair of nodes there exists a unique path.

A connected undirected graph is called multiply-connected if it contains at least one pair of nodes that are joined by more than one path.

[Figure 11: A disconnected graph.]
[Figure 12: Examples of a tree and a multiply-connected graph.]

Separation in Graphs (u-Separation)

Definition 11. Let G := (V, E) be a graph and Z ⊆ V a subset of vertices. We say two vertices x, y ∈ V are u-separated by Z in G if every path from x to y contains some vertex of Z:

    ∀ p ∈ G* with p_1 = x and p_{|p|} = y: ∃ i ∈ {1, ..., |p|}: p_i ∈ Z.

Let X, Y, Z ⊆ V be three disjoint subsets of vertices. We say the vertex sets X and Y are u-separated by Z in G if every path from any vertex of X to any vertex of Y is separated by Z, i.e., contains some vertex of Z.

We write I_G(X, Y | Z) for the statement that X and Y are u-separated by Z in G. I_G is called the u-separation relation in G.

[Figure 13: Example for u-separation: I(A, I | E).]

Separation in Graphs (u-Separation)

[Figure 14: More examples for u-separation: (a) I(A, I | E), (b) D(A, I | B), (c) I({A, C}, {D, H} | {B, E}), (d) D({A, C}, {D, H} | {E, I}), where D denotes that the sets are not u-separated.]

Properties of Ternary Relations

Definition 12. Let V be any set and I a ternary relation on P(V), i.e., I ⊆ (P(V))^3.

I is called symmetric, if
    I(X, Y | Z) ⇒ I(Y, X | Z)

I is called decomposable, if
    I(X, Y ∪ W | Z) ⇒ I(X, Y | Z) and I(X, W | Z)

I is called composable, if
    I(X, Y | Z) and I(X, W | Z) ⇒ I(X, Y ∪ W | Z)

[Figure 15: Examples for (a) symmetry and (b) decomposition.]

Properties of Ternary Relations

Definition 13.

I is called strongly unionable, if
    I(X, Y | Z) ⇒ I(X, Y | Z ∪ W)

I is called weakly unionable, if
    I(X, Y ∪ W | Z) ⇒ I(X, W | Z ∪ Y) and I(X, Y | Z ∪ W)

[Figure 16: Examples for (a) strong union and (b) weak union.]

Properties of Ternary Relations

Definition 14.

I is called contractable, if
    I(X, W | Z ∪ Y) and I(X, Y | Z) ⇒ I(X, Y ∪ W | Z)

I is called intersectable, if
    I(X, W | Z ∪ Y) and I(X, Y | Z ∪ W) ⇒ I(X, Y ∪ W | Z)

[Figure 17: Examples for (a) contraction and (b) intersection.]

Properties of Ternary Relations

Definition 15.

I is called strongly transitive, if
    I(X, Y | Z) ⇒ I(X, {v} | Z) or I({v}, Y | Z)    for all v ∈ V \ Z

I is called weakly transitive, if
    I(X, Y | Z) and I(X, Y | Z ∪ {v}) ⇒ I(X, {v} | Z) or I({v}, Y | Z)    for all v ∈ V \ Z

[Figure 18: Examples for (a) strong transitivity and (b) weak transitivity.]

Properties of Ternary Relations

Definition 16. I is called chordal, if
    I({a}, {c} | {b, d}) and I({b}, {d} | {a, c}) ⇒ I({a}, {c} | {b}) or I({a}, {c} | {d})

[Figure 19: Example for chordality on the vertices a, b, c, d.]

Properties of u-Separation / No Chordality

For u-separation the chordality property does not hold in general.

[Figure 20: Counterexample for chordality in undirected graphs (u-separation).]

Properties of u-Separation

 property              u-separation
 symmetry              +
 decomposition         +
 composition           +
 strong union          +
 weak union            +
 contraction           +
 intersection          +
 strong transitivity   +
 weak transitivity     +
 chordality            -

Breadth-First Search

Idea:
- start with the initial nodes as the border,
- iteratively replace the border by all nodes reachable from the old border in one step that have not been reached before (cf. Figure 24).

Breadth-First Search / Example

[Figure: Example run of breadth-first search on a graph with vertices A-N.]


Checking u-Separation

To test whether, for a given graph G = (V, E), two given sets X, Y ⊆ V of vertices are u-separated by a third given set Z ⊆ V of vertices, we may use standard breadth-first search to compute all vertices that can be reached from X.

    breadth-first-search(G, X):
        border := X
        reached := ∅
        while border ≠ ∅ do
            reached := reached ∪ border
            border := fan_G(border) \ reached
        od
        return reached

Figure 24: Breadth-first search algorithm for enumerating all vertices reachable from X.

For checking u-separation we have to tweak the algorithm
1. not to add vertices from Z to the border, and
2. to stop as soon as a vertex of Y has been reached.

    check-u-separation(G, X, Y, Z):
        border := X
        reached := ∅
        while border ≠ ∅ do
            reached := reached ∪ border
            border := fan_G(border) \ reached \ Z
            if border ∩ Y ≠ ∅
                return false
            fi
        od
        return true

Figure 25: Breadth-first search algorithm for checking u-separation of X and Y by Z.
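A minimal runnable Python version of the algorithm in Figure 25; the adjacency-set representation and the tiny example graph are assumptions for illustration.

```python
def check_u_separation(adj, X, Y, Z):
    """Sketch of Figure 25: True iff Z u-separates X and Y in the
    undirected graph given by adjacency sets adj[v]."""
    border, reached = set(X), set()
    while border:
        reached |= border
        # expand the border, never stepping onto Z and never revisiting vertices
        border = set().union(*(adj[v] for v in border)) - reached - set(Z)
        if border & set(Y):
            return False   # some vertex of Y is reachable around Z
    return True

# Tiny usage example on the path A - B - C:
adj = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B'}}
print(check_u_separation(adj, {'A'}, {'C'}, {'B'}))   # True:  every A-C path passes B
print(check_u_separation(adj, {'A'}, {'C'}, set()))   # False: A reaches C via B
```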