Probabilistic Reasoning: Graphical Models


1 Probabilistic Reasoning: Graphical Models Christian Borgelt Intelligent Data Analysis and Graphical Models Research Unit European Center for Soft Computing c/ Gonzalo Gutiérrez Quirós s/n, Mieres (Asturias), Spain Christian Borgelt Probabilistic Reasoning: Graphical Models 1

2 Overview
Graphical Models: Core Ideas and Notions
A Simple Example: How does it work in principle?
Conditional Independence Graphs
  conditional independence and the graphoid axioms
  separation in (directed and undirected) graphs
  decomposition/factorization of distributions
Evidence Propagation in Graphical Models
Building Graphical Models
Learning Graphical Models from Data
  quantitative (parameter) and qualitative (structure) learning
  evaluation measures and search methods
  learning by measuring the strength of marginal dependences
  learning by conditional independence tests
Summary
Christian Borgelt Probabilistic Reasoning: Graphical Models 2

3 Graphical Models: Core Ideas and Notions Decomposition: Under certain conditions a distribution δ (e.g. a probability distribution) on a multi-dimensional domain, which encodes prior or generic knowledge about this domain, can be decomposed into a set {δ 1,..., δ s } of (usually overlapping) distributions on lower-dimensional subspaces. Simplified Reasoning: If such a decomposition is possible, it is sufficient to know the distributions on the subspaces to draw all inferences in the domain under consideration that can be drawn using the original distribution δ. Such a decomposition can nicely be represented as a graph (in the sense of graph theory), and therefore it is called a Graphical Model. The graphical representation encodes conditional independences that hold in the distribution, describes a factorization of the probability distribution, indicates how evidence propagation has to be carried out. Christian Borgelt Probabilistic Reasoning: Graphical Models 3

4 A Simple Example: The Relational Case Christian Borgelt Probabilistic Reasoning: Graphical Models 4

5 A Simple Example Example Domain: 10 simple geometric objects, 3 attributes: color, shape, and size. [Table: the relation listing the ten objects; the color and shape values were shown graphically and are lost in this transcription, the size column reads small, medium, small, medium, medium, large, medium, medium, medium, large.] One object is chosen at random and examined. Inferences are drawn about the unobserved attributes. Christian Borgelt Probabilistic Reasoning: Graphical Models 5

6 The Reasoning Space [Figure: the three-dimensional reasoning space spanned by the attributes color, shape, and size.] The reasoning space consists of a finite set Ω of states. The states are described by a set of n attributes $A_i$, $i = 1, \ldots, n$, whose domains $\{a^{(i)}_1, \ldots, a^{(i)}_{n_i}\}$ can be seen as sets of propositions or events. The events in a domain are mutually exclusive and exhaustive. The reasoning space is assumed to contain the true, but unknown state $\omega_0$. Technically, the attributes $A_i$ are random variables. Christian Borgelt Probabilistic Reasoning: Graphical Models 6

7 The Relation in the Reasoning Space [Figure: the relation table over color, shape, and size and its spatial representation in the reasoning space.] Each cube represents one tuple. The spatial representation helps to understand the decomposition mechanism. However, in practice graphical models refer to (many) more than three attributes. Christian Borgelt Probabilistic Reasoning: Graphical Models 7

8 Reasoning Let it be known (e.g. from an observation) that the given object is green. This information considerably reduces the space of possible value combinations. From the prior knowledge it follows that the given object must be either a triangle or a square and either medium or large. [Figure: the reasoning space before and after restricting the color to green.] Christian Borgelt Probabilistic Reasoning: Graphical Models 8

9 Prior Knowledge and Its Projections [Figure: the three-dimensional relation together with its projections to the subspaces spanned by color and shape and by shape and size.] Christian Borgelt Probabilistic Reasoning: Graphical Models 9

10 Cylindrical Extensions and Their Intersection [Figure: the cylindrical extensions of the two projections and their intersection.] Intersecting the cylindrical extensions of the projection to the subspace spanned by color and shape and of the projection to the subspace spanned by shape and size yields the original three-dimensional relation. Christian Borgelt Probabilistic Reasoning: Graphical Models 10

11 Reasoning with Projections The reasoning result can be obtained using only the projections to the subspaces, without reconstructing the original three-dimensional relation: the color evidence is extended to the projection on color and shape, projected to the shape, extended to the projection on shape and size, and projected to the size (see the sketch below). This justifies a graph representation: color — shape — size. Christian Borgelt Probabilistic Reasoning: Graphical Models 11
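To make the scheme concrete, here is a minimal Python sketch of relational reasoning with projections. The tuples of the relation are hypothetical (the slide's table is only given graphically); only the project/extend mechanism is the point.

```python
# Hypothetical relation over (color, shape, size); the concrete tuples are made up.
relation = {
    ("green", "triangle", "medium"), ("green", "square", "large"),
    ("red",   "circle",   "small"),  ("red",   "triangle", "medium"),
    ("blue",  "square",   "medium"),
}

# Projections to the subspaces {color, shape} and {shape, size}.
r_cs = {(c, sh) for c, sh, _ in relation}
r_ss = {(sh, sz) for _, sh, sz in relation}

def propagate(color_evidence):
    """Propagate color evidence using only the two projections."""
    shapes = {sh for c, sh in r_cs if c == color_evidence}   # project to shape
    sizes = {sz for sh, sz in r_ss if sh in shapes}          # extend and project to size
    return shapes, sizes

print(propagate("green"))   # shapes and sizes still possible for a green object
```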

12 Using other Projections 1 [Figure: an alternative choice of two two-dimensional projections; intersecting their cylindrical extensions does not reproduce the original relation.] This choice of subspaces does not yield a decomposition. Christian Borgelt Probabilistic Reasoning: Graphical Models 12

13 Using other Projections 2 [Figure: another choice of two two-dimensional projections; intersecting their cylindrical extensions again does not reproduce the original relation.] This choice of subspaces does not yield a decomposition. Christian Borgelt Probabilistic Reasoning: Graphical Models 13

14 Is Decomposition Always Possible? [Figure: a modified relation, with two tuples marked 1 and 2, and its projections.] A modified relation (without tuple 1 or tuple 2) may not possess a decomposition. Christian Borgelt Probabilistic Reasoning: Graphical Models 14

15 Relational Graphical Models: Formalization Christian Borgelt Probabilistic Reasoning: Graphical Models 15

16 Possibility-Based Formalization Definition: Let Ω be a (finite) sample space. A discrete possibility measure R on Ω is a function $R : 2^\Omega \to \{0, 1\}$ satisfying 1. $R(\emptyset) = 0$ and 2. $\forall E_1, E_2 \subseteq \Omega: R(E_1 \cup E_2) = \max\{R(E_1), R(E_2)\}$. Similar to Kolmogorov's axioms of probability theory. If an event E can occur (if it is possible), then R(E) = 1, otherwise (if E cannot occur / is impossible) R(E) = 0. R(Ω) = 1 is not required, because this would exclude the empty relation. From the axioms it follows that $R(E_1 \cap E_2) \le \min\{R(E_1), R(E_2)\}$. Attributes are introduced as random variables (as in probability theory). R(A = a) and R(a) are abbreviations of $R(\{\omega \mid A(\omega) = a\})$. Christian Borgelt Probabilistic Reasoning: Graphical Models 16

17 Possibility-Based Formalization (continued) Definition: Let U = {A_1, ..., A_n} be a set of attributes defined on a (finite) sample space Ω with respective domains dom(A_i), i = 1, ..., n. A relation r_U over U is the restriction of a discrete possibility measure R on Ω to the set of all events that can be defined by stating values for all attributes in U. That is, $r_U = R|_{\mathcal{E}_U}$, where
$\mathcal{E}_U = \bigl\{ E \in 2^\Omega \bigm| \exists a_1 \in dom(A_1): \ldots \exists a_n \in dom(A_n): E = \bigwedge_{A_j \in U} A_j = a_j \bigr\}$
$= \bigl\{ E \in 2^\Omega \bigm| \exists a_1 \in dom(A_1): \ldots \exists a_n \in dom(A_n): E = \bigl\{ \omega \in \Omega \bigm| \bigwedge_{A_j \in U} A_j(\omega) = a_j \bigr\} \bigr\}.$
A relation corresponds to the notion of a probability distribution. Advantage of this formalization: No index transformation functions are needed for projections; there are just fewer terms in the conjunctions. Christian Borgelt Probabilistic Reasoning: Graphical Models 17

18 Possibility-Based Formalization (continued) Definition: Let U = {A_1, ..., A_n} be a set of attributes and r_U a relation over U. Furthermore, let $\mathcal{M} = \{M_1, \ldots, M_m\} \subseteq 2^U$ be a set of nonempty (but not necessarily disjoint) subsets of U satisfying $\bigcup_{M \in \mathcal{M}} M = U$. r_U is called decomposable w.r.t. $\mathcal{M}$ iff $\forall a_1 \in dom(A_1): \ldots \forall a_n \in dom(A_n):$
$r_U\bigl(\bigwedge_{A_i \in U} A_i = a_i\bigr) = \min_{M \in \mathcal{M}} \bigl\{ r_M\bigl(\bigwedge_{A_i \in M} A_i = a_i\bigr) \bigr\}.$
If r_U is decomposable w.r.t. $\mathcal{M}$, the set of relations $\mathcal{R}_\mathcal{M} = \{r_{M_1}, \ldots, r_{M_m}\} = \{r_M \mid M \in \mathcal{M}\}$ is called the decomposition of r_U. Equivalent to join decomposability in database theory (natural join). Christian Borgelt Probabilistic Reasoning: Graphical Models 18
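The definition can be checked mechanically: a relation is decomposable w.r.t. a family of attribute sets iff the natural join (min-combination) of its projections reproduces it exactly. A small sketch, with a hypothetical relation:

```python
# Minimal sketch: check decomposability by comparing the relation with the
# natural join of its projections. The example relation is made up.
from itertools import product

attrs = ("color", "shape", "size")
relation = {
    ("green", "triangle", "medium"), ("green", "square", "large"),
    ("red",   "circle",   "small"),  ("red",   "triangle", "medium"),
}

def project(rel, subset):
    idx = [attrs.index(a) for a in subset]
    return {tuple(t[i] for i in idx) for t in rel}

def decomposable(rel, family):
    domains = [sorted({t[i] for t in rel}) for i in range(len(attrs))]
    projections = {subset: project(rel, subset) for subset in family}
    for tup in product(*domains):                      # all value combinations
        in_join = all(
            tuple(tup[attrs.index(a)] for a in subset) in proj
            for subset, proj in projections.items()
        )
        if in_join != (tup in rel):                    # the join must reproduce rel exactly
            return False
    return True

print(decomposable(relation, [("color", "shape"), ("shape", "size")]))
```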

19 Relational Decomposition: Simple Example [Figure: the two projections and the reconstructed three-dimensional relation.] Taking the minimum of the projection to the subspace spanned by color and shape and of the projection to the subspace spanned by shape and size yields the original three-dimensional relation. Christian Borgelt Probabilistic Reasoning: Graphical Models 19

20 Conditional Possibility and Independence Definition: Let Ω be a (finite) sample space, R a discrete possibility measure on Ω, and E_1, E_2 ⊆ Ω events. Then $R(E_1 \mid E_2) = R(E_1 \cap E_2)$ is called the conditional possibility of E_1 given E_2. Definition: Let Ω be a (finite) sample space, R a discrete possibility measure on Ω, and A, B, and C attributes with respective domains dom(A), dom(B), and dom(C). A and B are called conditionally relationally independent given C, written $A \perp_R B \mid C$, iff $\forall a \in dom(A): \forall b \in dom(B): \forall c \in dom(C):$
$R(A = a, B = b \mid C = c) = \min\{R(A = a \mid C = c), R(B = b \mid C = c)\},$
or, equivalently,
$R(A = a, B = b, C = c) = \min\{R(A = a, C = c), R(B = b, C = c)\}.$
Similar to the corresponding notions of probability theory. Christian Borgelt Probabilistic Reasoning: Graphical Models 20

21 Conditional Independence: Simple Example [Figure: the example relation in the reasoning space.] Example relation describing ten simple geometric objects by three attributes: color, shape, and size. In this example relation, the color of an object is conditionally relationally independent of its size given its shape. Intuitively: if we fix the shape, the colors and sizes that are possible together with this shape can be combined freely. Alternative view: once we know the shape, the color does not provide additional information about the size (and vice versa). Christian Borgelt Probabilistic Reasoning: Graphical Models 21

22 Relational Evidence Propagation Due to the fact that color and size are conditionally independent given the shape, the reasoning result can be obtained using only the projections to the subspaces: extend the color evidence to the projection on color and shape, project to the shape, extend to the projection on shape and size, and project to the size. [Figure: the propagation scheme on the two projections.] This reasoning scheme can be formally justified with discrete possibility measures. Christian Borgelt Probabilistic Reasoning: Graphical Models 22

23 Relational Evidence Propagation, Step 1 (A: color, B: shape, C: size)
$R(B = b \mid A = a_{obs})$
$= R\bigl(\bigvee_{a \in dom(A)} A = a,\ B = b,\ \bigvee_{c \in dom(C)} C = c \bigm| A = a_{obs}\bigr)$
$\overset{(1)}{=} \max_{a \in dom(A)} \max_{c \in dom(C)} \{R(A = a, B = b, C = c \mid A = a_{obs})\}$
$\overset{(2)}{=} \max_{a \in dom(A)} \max_{c \in dom(C)} \{\min\{R(A = a, B = b, C = c), R(A = a \mid A = a_{obs})\}\}$
$\overset{(3)}{=} \max_{a \in dom(A)} \max_{c \in dom(C)} \{\min\{R(A = a, B = b), R(B = b, C = c), R(A = a \mid A = a_{obs})\}\}$
$= \max_{a \in dom(A)} \{\min\{R(A = a, B = b), R(A = a \mid A = a_{obs}), \underbrace{\max_{c \in dom(C)} R(B = b, C = c)}_{= R(B = b) \ge R(A = a, B = b)}\}\}$
$= \max_{a \in dom(A)} \{\min\{R(A = a, B = b), R(A = a \mid A = a_{obs})\}\}.$
Christian Borgelt Probabilistic Reasoning: Graphical Models 23

24 Relational Evidence Propagation, Step 1 (continued)
(1) holds because of the second axiom a discrete possibility measure has to satisfy.
(3) holds because of the fact that the relation R_ABC can be decomposed w.r.t. the set M = {{A, B}, {B, C}}. (A: color, B: shape, C: size)
(2) holds, since in the first place
$R(A = a, B = b, C = c \mid A = a_{obs}) = R(A = a, B = b, C = c, A = a_{obs}) = \begin{cases} R(A = a, B = b, C = c), & \text{if } a = a_{obs}, \\ 0, & \text{otherwise,} \end{cases}$
and secondly
$R(A = a \mid A = a_{obs}) = R(A = a, A = a_{obs}) = \begin{cases} R(A = a), & \text{if } a = a_{obs}, \\ 0, & \text{otherwise,} \end{cases}$
and therefore, since trivially $R(A = a) \ge R(A = a, B = b, C = c)$,
$R(A = a, B = b, C = c \mid A = a_{obs}) = \min\{R(A = a, B = b, C = c), R(A = a \mid A = a_{obs})\}.$
Christian Borgelt Probabilistic Reasoning: Graphical Models 24

25 Relational Evidence Propagation, Step 2 (A: color, B: shape, C: size)
$R(C = c \mid A = a_{obs})$
$= R\bigl(\bigvee_{a \in dom(A)} A = a,\ \bigvee_{b \in dom(B)} B = b,\ C = c \bigm| A = a_{obs}\bigr)$
$\overset{(1)}{=} \max_{a \in dom(A)} \max_{b \in dom(B)} \{R(A = a, B = b, C = c \mid A = a_{obs})\}$
$\overset{(2)}{=} \max_{a \in dom(A)} \max_{b \in dom(B)} \{\min\{R(A = a, B = b, C = c), R(A = a \mid A = a_{obs})\}\}$
$\overset{(3)}{=} \max_{a \in dom(A)} \max_{b \in dom(B)} \{\min\{R(A = a, B = b), R(B = b, C = c), R(A = a \mid A = a_{obs})\}\}$
$= \max_{b \in dom(B)} \{\min\{R(B = b, C = c), \underbrace{\max_{a \in dom(A)} \{\min\{R(A = a, B = b), R(A = a \mid A = a_{obs})\}\}}_{= R(B = b \mid A = a_{obs})}\}\}$
$= \max_{b \in dom(B)} \{\min\{R(B = b, C = c), R(B = b \mid A = a_{obs})\}\}.$
Christian Borgelt Probabilistic Reasoning: Graphical Models 25

26 A Simple Example: The Probabilistic Case Christian Borgelt Probabilistic Reasoning: Graphical Models 26

27 A Probability Distribution [Figure: the three-dimensional probability distribution over color, shape, and size together with its marginal distributions; all numbers in parts per 1000.] The numbers state the probability of the corresponding value combination. Compared to the example relation, the possible combinations are now frequent. Christian Borgelt Probabilistic Reasoning: Graphical Models 27

28 Reasoning: Computing Conditional Probabilities [Figure: the probability distribution after incorporating the information that the given object is green; all numbers in parts per 1000.] Using the information that the given object is green: The observed color has a posterior probability of 1. Christian Borgelt Probabilistic Reasoning: Graphical Models 28

29 Probabilistic Decomposition: Simple Example As for relational graphical models, the three-dimensional probability distribution can be decomposed into projections to subspaces, namely the marginal distribution on the subspace spanned by color and shape and the marginal distribution on the subspace spanned by shape and size. The original probability distribution can be reconstructed from the marginal distributions using the following formulae:
$\forall i, j, k:\quad P\bigl(a^{(color)}_i, a^{(shape)}_j, a^{(size)}_k\bigr) = \frac{P\bigl(a^{(color)}_i, a^{(shape)}_j\bigr) \cdot P\bigl(a^{(shape)}_j, a^{(size)}_k\bigr)}{P\bigl(a^{(shape)}_j\bigr)} = P\bigl(a^{(color)}_i, a^{(shape)}_j\bigr) \cdot P\bigl(a^{(size)}_k \mid a^{(shape)}_j\bigr).$
These equations express the conditional independence of the attributes color and size given the attribute shape, since they only hold if
$\forall i, j, k:\quad P\bigl(a^{(size)}_k \mid a^{(shape)}_j\bigr) = P\bigl(a^{(size)}_k \mid a^{(color)}_i, a^{(shape)}_j\bigr).$
Christian Borgelt Probabilistic Reasoning: Graphical Models 29
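A small numerical sketch of this reconstruction formula. The distribution below is made up so that color and size are conditionally independent given the shape; it is not the distribution from the slides.

```python
# Reconstruct a joint P(color, shape, size) from the marginals on {color, shape}
# and {shape, size}; exact because of the conditional independence.
import numpy as np

# P(color, shape, size) as a 2 x 2 x 2 array (axes: color, shape, size).
p_shape = np.array([0.6, 0.4])
p_color_given_shape = np.array([[0.7, 0.3], [0.2, 0.8]])   # rows: shape
p_size_given_shape  = np.array([[0.5, 0.5], [0.1, 0.9]])   # rows: shape
joint = np.einsum("j,ji,jk->ijk", p_shape, p_color_given_shape, p_size_given_shape)

# Marginals (projections) on the two subspaces and on shape alone.
p_cs = joint.sum(axis=2)          # P(color, shape)
p_ss = joint.sum(axis=0)          # P(shape, size)
p_s  = joint.sum(axis=(0, 2))     # P(shape)

# Reconstruction formula from the slide: P(c, s, z) = P(c, s) * P(s, z) / P(s).
reconstructed = p_cs[:, :, None] * p_ss[None, :, :] / p_s[None, :, None]
print(np.allclose(joint, reconstructed))   # True, due to the conditional independence
```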

30 Reasoning with Projections Again the same result can be obtained using only projections to subspaces (marginal probability distributions): the marginal distribution on color and shape is multiplied with the ratio of new to old color probabilities and summed over the color values to obtain the new shape distribution; this is transferred in the same way through the marginal distribution on shape and size to obtain the new size distribution. [Figure: the propagation scheme on the two marginal distributions.] This justifies a graph representation: color — shape — size. Christian Borgelt Probabilistic Reasoning: Graphical Models 30

31 Probabilistic Graphical Models: Formalization Christian Borgelt Probabilistic Reasoning: Graphical Models 31

32 Probabilistic Decomposition Definition: Let U = {A_1, ..., A_n} be a set of attributes and p_U a probability distribution over U. Furthermore, let $\mathcal{M} = \{M_1, \ldots, M_m\} \subseteq 2^U$ be a set of nonempty (but not necessarily disjoint) subsets of U satisfying $\bigcup_{M \in \mathcal{M}} M = U$. p_U is called decomposable or factorizable w.r.t. $\mathcal{M}$ iff it can be written as a product of m nonnegative functions $\phi_M : \mathcal{E}_M \to \mathbb{R}^+_0$, $M \in \mathcal{M}$, i.e., iff $\forall a_1 \in dom(A_1): \ldots \forall a_n \in dom(A_n):$
$p_U\bigl(\bigwedge_{A_i \in U} A_i = a_i\bigr) = \prod_{M \in \mathcal{M}} \phi_M\bigl(\bigwedge_{A_i \in M} A_i = a_i\bigr).$
If p_U is decomposable w.r.t. $\mathcal{M}$, the set of functions $\Phi_\mathcal{M} = \{\phi_{M_1}, \ldots, \phi_{M_m}\} = \{\phi_M \mid M \in \mathcal{M}\}$ is called the decomposition or the factorization of p_U. The functions in $\Phi_\mathcal{M}$ are called the factor potentials of p_U. Christian Borgelt Probabilistic Reasoning: Graphical Models 32

33 Conditional Independence Definition: Let Ω be a (finite) sample space, P a probability measure on Ω, and A, B, and C attributes with respective domains dom(A), dom(B), and dom(C). A and B are called conditionally probabilistically independent given C, written $A \perp_P B \mid C$, iff $\forall a \in dom(A): \forall b \in dom(B): \forall c \in dom(C):$
$P(A = a, B = b \mid C = c) = P(A = a \mid C = c) \cdot P(B = b \mid C = c)$
Equivalent formula (sometimes more convenient): $\forall a \in dom(A): \forall b \in dom(B): \forall c \in dom(C):$
$P(A = a \mid B = b, C = c) = P(A = a \mid C = c)$
Conditional independences make it possible to consider parts of a probability distribution independent of others. Therefore it is plausible that a set of conditional independences may enable a decomposition of a joint probability distribution. Christian Borgelt Probabilistic Reasoning: Graphical Models 33
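The definition translates directly into a test on a joint probability table. The sketch below compares P(a, b, c)·P(c) with P(a, c)·P(b, c), which is equivalent to the first formula but avoids division by zero; the numbers are made up.

```python
# Minimal sketch: test the conditional independence A _|_ B | C on a joint table.
import numpy as np

def conditionally_independent(joint, tol=1e-12):
    """joint[a, b, c] = P(A=a, B=b, C=c); returns True iff A _|_ B | C."""
    p_c  = joint.sum(axis=(0, 1))              # P(c)
    p_ac = joint.sum(axis=1)                   # P(a, c)
    p_bc = joint.sum(axis=0)                   # P(b, c)
    lhs = joint * p_c[None, None, :]           # P(a, b, c) * P(c)
    rhs = p_ac[:, None, :] * p_bc[None, :, :]  # P(a, c) * P(b, c)
    return np.allclose(lhs, rhs, atol=tol)

# A joint where the independence holds by construction.
p_c = np.array([0.5, 0.5])
p_a_given_c = np.array([[0.9, 0.1], [0.3, 0.7]])   # rows: c
p_b_given_c = np.array([[0.2, 0.8], [0.6, 0.4]])   # rows: c
joint = np.einsum("c,ca,cb->abc", p_c, p_a_given_c, p_b_given_c)
print(conditionally_independent(joint))            # True
```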

34 Conditional Independence: An Example Dependence (fictitious) between smoking and life expectancy. [Scatter plot; each dot represents one person. x-axis: age at death; y-axis: average number of cigarettes per day.] Weak, but clear dependence: The more cigarettes are smoked, the lower the life expectancy. (Note that this data is artificial and thus should not be seen as revealing an actual dependence.) Christian Borgelt Probabilistic Reasoning: Graphical Models 34

35 Conditional Independence: An Example Group 1 Conjectured explanation: There is a common cause, namely whether the person is exposed to stress at work. If this were correct, splitting the data should remove the dependence. Group 1: exposed to stress at work (Note that this data is artificial and therefore should not be seen as an argument against health hazards caused by smoking.) Christian Borgelt Probabilistic Reasoning: Graphical Models 35

36 Conditional Independence: An Example Group 2 Conjectured explanation: There is a common cause, namely whether the person is exposed to stress at work. If this were correct, splitting the data should remove the dependence. Group 2: not exposed to stress at work (Note that this data is artificial and therefore should not be seen as an argument against health hazards caused by smoking.) Christian Borgelt Probabilistic Reasoning: Graphical Models 36

37 Probabilistic Decomposition (continued) Chain Rule of Probability:
$\forall a_1 \in dom(A_1): \ldots \forall a_n \in dom(A_n):\quad P\bigl(\bigwedge_{i=1}^n A_i = a_i\bigr) = \prod_{i=1}^n P\bigl(A_i = a_i \bigm| \bigwedge_{j=1}^{i-1} A_j = a_j\bigr)$
The chain rule of probability is valid in general (or at least for strictly positive distributions). Chain Rule Factorization:
$\forall a_1 \in dom(A_1): \ldots \forall a_n \in dom(A_n):\quad P\bigl(\bigwedge_{i=1}^n A_i = a_i\bigr) = \prod_{i=1}^n P\bigl(A_i = a_i \bigm| \bigwedge_{A_j \in parents(A_i)} A_j = a_j\bigr)$
Conditional independence statements are used to cancel conditions. Christian Borgelt Probabilistic Reasoning: Graphical Models 37
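A minimal sketch of evaluating a chain-rule factorization along a directed graph; the three-node network A → B → C and its conditional probability tables are invented for illustration.

```python
# Evaluate P(assignment) as the product over nodes of P(node | its parents).
parents = {"A": [], "B": ["A"], "C": ["B"]}

# cpt[X][(value_of_X, tuple_of_parent_values)] = P(X = x | parents = ...)
cpt = {
    "A": {(0, ()): 0.4, (1, ()): 0.6},
    "B": {(0, (0,)): 0.7, (1, (0,)): 0.3, (0, (1,)): 0.2, (1, (1,)): 0.8},
    "C": {(0, (0,)): 0.5, (1, (0,)): 0.5, (0, (1,)): 0.9, (1, (1,)): 0.1},
}

def joint_probability(assignment):
    p = 1.0
    for var, pars in parents.items():
        parent_values = tuple(assignment[q] for q in pars)
        p *= cpt[var][(assignment[var], parent_values)]
    return p

print(joint_probability({"A": 1, "B": 0, "C": 1}))   # 0.6 * 0.2 * 0.5 = 0.06
```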

38 Reasoning with Projections Due to the fact that color and size are conditionally independent given the shape, the reasoning result can be obtained using only the projections to the subspaces: the marginal distribution on color and shape is updated with the ratio of new to old color probabilities and summed over the color values to obtain the new shape distribution, which is then transferred through the marginal distribution on shape and size to obtain the new size distribution. [Figure: the propagation scheme on the two marginal distributions.] This reasoning scheme can be formally justified with probability measures. Christian Borgelt Probabilistic Reasoning: Graphical Models 38

39 Probabilistic Evidence Propagation, Step 1 (A: color, B: shape, C: size)
$P(B = b \mid A = a_{obs})$
$= P\bigl(\bigvee_{a \in dom(A)} A = a,\ B = b,\ \bigvee_{c \in dom(C)} C = c \bigm| A = a_{obs}\bigr)$
$\overset{(1)}{=} \sum_{a \in dom(A)} \sum_{c \in dom(C)} P(A = a, B = b, C = c \mid A = a_{obs})$
$\overset{(2)}{=} \sum_{a \in dom(A)} \sum_{c \in dom(C)} P(A = a, B = b, C = c) \cdot \frac{P(A = a \mid A = a_{obs})}{P(A = a)}$
$\overset{(3)}{=} \sum_{a \in dom(A)} \sum_{c \in dom(C)} \frac{P(A = a, B = b)\, P(B = b, C = c)}{P(B = b)} \cdot \frac{P(A = a \mid A = a_{obs})}{P(A = a)}$
$= \sum_{a \in dom(A)} P(A = a, B = b) \cdot \frac{P(A = a \mid A = a_{obs})}{P(A = a)} \cdot \underbrace{\sum_{c \in dom(C)} P(C = c \mid B = b)}_{= 1}$
$= \sum_{a \in dom(A)} P(A = a, B = b) \cdot \frac{P(A = a \mid A = a_{obs})}{P(A = a)}.$
Christian Borgelt Probabilistic Reasoning: Graphical Models 39

40 Probabilistic Evidence Propagation, Step 1 (continued)
(1) holds because of Kolmogorov's axioms.
(3) holds because of the fact that the distribution p_ABC can be decomposed w.r.t. the set M = {{A, B}, {B, C}}. (A: color, B: shape, C: size)
(2) holds, since in the first place
$P(A = a, B = b, C = c \mid A = a_{obs}) = \frac{P(A = a, B = b, C = c, A = a_{obs})}{P(A = a_{obs})} = \begin{cases} \frac{P(A = a, B = b, C = c)}{P(A = a_{obs})}, & \text{if } a = a_{obs}, \\ 0, & \text{otherwise,} \end{cases}$
and secondly
$P(A = a \mid A = a_{obs}) = \frac{P(A = a, A = a_{obs})}{P(A = a_{obs})} = \begin{cases} \frac{P(A = a)}{P(A = a_{obs})}, & \text{if } a = a_{obs}, \\ 0, & \text{otherwise,} \end{cases}$
and therefore
$P(A = a, B = b, C = c \mid A = a_{obs}) = P(A = a, B = b, C = c) \cdot \frac{P(A = a \mid A = a_{obs})}{P(A = a)}.$
Christian Borgelt Probabilistic Reasoning: Graphical Models 40

41 Probabilistic Evidence Propagation, Step 2 (A: color, B: shape, C: size)
$P(C = c \mid A = a_{obs})$
$= P\bigl(\bigvee_{a \in dom(A)} A = a,\ \bigvee_{b \in dom(B)} B = b,\ C = c \bigm| A = a_{obs}\bigr)$
$\overset{(1)}{=} \sum_{a \in dom(A)} \sum_{b \in dom(B)} P(A = a, B = b, C = c \mid A = a_{obs})$
$\overset{(2)}{=} \sum_{a \in dom(A)} \sum_{b \in dom(B)} P(A = a, B = b, C = c) \cdot \frac{P(A = a \mid A = a_{obs})}{P(A = a)}$
$\overset{(3)}{=} \sum_{a \in dom(A)} \sum_{b \in dom(B)} \frac{P(A = a, B = b)\, P(B = b, C = c)}{P(B = b)} \cdot \frac{P(A = a \mid A = a_{obs})}{P(A = a)}$
$= \sum_{b \in dom(B)} \frac{P(B = b, C = c)}{P(B = b)} \cdot \underbrace{\sum_{a \in dom(A)} P(A = a, B = b) \cdot \frac{P(A = a \mid A = a_{obs})}{P(A = a)}}_{= P(B = b \mid A = a_{obs})}$
$= \sum_{b \in dom(B)} P(B = b, C = c) \cdot \frac{P(B = b \mid A = a_{obs})}{P(B = b)}.$
Christian Borgelt Probabilistic Reasoning: Graphical Models 41
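The two derived formulas can be applied directly to the marginal distributions. The sketch below uses made-up (but consistent) marginals P(color, shape) and P(shape, size); the results already sum to 1, so no renormalization is needed here.

```python
# Two-step propagation for the chain color - shape - size,
# using only the marginals P(color, shape) and P(shape, size).
import numpy as np

p_cs = np.array([[0.20, 0.10, 0.05],      # P(color, shape), rows: color, cols: shape
                 [0.05, 0.30, 0.05],
                 [0.05, 0.10, 0.10]])
p_ss = np.array([[0.15, 0.10, 0.05],      # P(shape, size), rows: shape, cols: size
                 [0.10, 0.30, 0.10],
                 [0.05, 0.05, 0.10]])
p_c = p_cs.sum(axis=1)                    # P(color)
p_s = p_cs.sum(axis=0)                    # P(shape), consistent with p_ss.sum(axis=1)

color_obs = 0                             # evidence: the color was observed

# Step 1: P(shape | obs) = sum_a P(a, shape) * P(a | obs) / P(a)
p_color_post = np.zeros_like(p_c)
p_color_post[color_obs] = 1.0
p_shape_post = (p_cs * (p_color_post / p_c)[:, None]).sum(axis=0)

# Step 2: P(size | obs) = sum_b P(b, size) * P(b | obs) / P(b)
p_size_post = (p_ss * (p_shape_post / p_s)[:, None]).sum(axis=0)

print(p_shape_post, p_size_post)
```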

42 Excursion: Possibility Theory Christian Borgelt Probabilistic Reasoning: Graphical Models 42

43 Possibility Theory The best-known calculus for handling uncertainty is, of course, probability theory. [Laplace 1812] A less well-known, but noteworthy alternative is possibility theory. [Dubois and Prade 1988] In the interpretation we consider here, possibility theory can handle uncertain and imprecise information, while probability theory, at least in its basic form, was only designed to handle uncertain information. Types of imperfect information: Imprecision: disjunctive or set-valued information about the obtaining state, which is certain: the true state is contained in the disjunction or set. Uncertainty: precise information about the obtaining state (single case), which is not certain: the true state may differ from the stated one. Vagueness: the meaning of the information is in doubt: the interpretation of the given statements about the obtaining state may depend on the user. Christian Borgelt Probabilistic Reasoning: Graphical Models 43

44 Possibility Theory: Axiomatic Approach Definition: Let Ω be a (finite) sample space. A possibility measure Π on Ω is a function $\Pi : 2^\Omega \to [0, 1]$ satisfying 1. $\Pi(\emptyset) = 0$ and 2. $\forall E_1, E_2 \subseteq \Omega: \Pi(E_1 \cup E_2) = \max\{\Pi(E_1), \Pi(E_2)\}$. Similar to Kolmogorov's axioms of probability theory. From the axioms it follows that $\Pi(E_1 \cap E_2) \le \min\{\Pi(E_1), \Pi(E_2)\}$. Attributes are introduced as random variables (as in probability theory). Π(A = a) is an abbreviation of $\Pi(\{\omega \in \Omega \mid A(\omega) = a\})$. If an event E is possible without restriction, then Π(E) = 1. If an event E is impossible, then Π(E) = 0. Christian Borgelt Probabilistic Reasoning: Graphical Models 44

45 Possibility Theory and the Context Model Interpretation of Degrees of Possibility [Gebhardt and Kruse 1993] Let Ω be the (nonempty) set of all possible states of the world, ω_0 the actual (but unknown) state. Let C = {c_1, ..., c_n} be a set of contexts (observers, frame conditions etc.) and $(C, 2^C, P)$ a finite probability space (context weights). Let $\Gamma : C \to 2^\Omega$ be a set-valued mapping, which assigns to each context the most specific correct set-valued specification of ω_0. The sets Γ(c) are called the focal sets of Γ. Γ is a random set (i.e., a set-valued random variable) [Nguyen 1978]. The basic possibility assignment induced by Γ is the mapping $\pi : \Omega \to [0, 1],\ \pi(\omega) = P(\{c \in C \mid \omega \in \Gamma(c)\}).$ Christian Borgelt Probabilistic Reasoning: Graphical Models 45

46 Example: Dice and Shakers [Table: five shakers containing a tetrahedron, a hexahedron, an octahedron, an icosahedron, and a dodecahedron, respectively; for every number the degree of possibility is obtained from the weights (here 1/5 each) of the shakers whose die can show that number.] Christian Borgelt Probabilistic Reasoning: Graphical Models 46
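A small sketch of the basic possibility assignment for the shaker example, assuming equal context weights of 1/5: the degree of possibility of a number is the total weight of the shakers whose die can show it.

```python
# Context-model interpretation: pi(omega) = P({c in C | omega in Gamma(c)}).
from fractions import Fraction

# Contexts (shakers) with weights and focal sets (the numbers the die can show).
weights = {f"shaker {i}": Fraction(1, 5) for i in range(1, 6)}
focal_sets = {
    "shaker 1": set(range(1, 5)),    # tetrahedron
    "shaker 2": set(range(1, 7)),    # hexahedron
    "shaker 3": set(range(1, 9)),    # octahedron
    "shaker 4": set(range(1, 21)),   # icosahedron
    "shaker 5": set(range(1, 13)),   # dodecahedron
}

def basic_possibility_assignment(omega):
    return sum(weights[c] for c, focal in focal_sets.items() if omega in focal)

for number in (1, 7, 15):
    print(number, basic_possibility_assignment(number))
# e.g. pi(1) = 1, pi(7) = 3/5, pi(15) = 1/5
```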

47 From the Context Model to Possibility Measures Definition: Let $\Gamma : C \to 2^\Omega$ be a random set. The possibility measure induced by Γ is the mapping $\Pi : 2^\Omega \to [0, 1],\ E \mapsto P(\{c \in C \mid E \cap \Gamma(c) \neq \emptyset\}).$ Problem: From the given interpretation it follows only:
$\forall E \subseteq \Omega:\quad \max_{\omega \in E} \pi(\omega) \ \le\ \Pi(E) \ \le\ \min\Bigl\{1, \sum_{\omega \in E} \pi(\omega)\Bigr\}.$
[Figure: two random sets with context weights c_1: 1/2, c_2: 1/4, c_3: 1/4 that induce the same basic possibility assignment π, illustrating these bounds.]
Christian Borgelt Probabilistic Reasoning: Graphical Models 47

48 From the Context Model to Possibility Measures (cont.) Attempts to solve the indicated problem:
Require the focal sets to be consonant: Definition: Let $\Gamma : C \to 2^\Omega$ be a random set with C = {c_1, ..., c_n}. The focal sets Γ(c_i), 1 ≤ i ≤ n, are called consonant iff there exists a sequence $c_{i_1}, c_{i_2}, \ldots, c_{i_n}$, $1 \le i_1, \ldots, i_n \le n$, $\forall 1 \le j < k \le n: i_j \neq i_k$, so that $\Gamma(c_{i_1}) \subseteq \Gamma(c_{i_2}) \subseteq \ldots \subseteq \Gamma(c_{i_n})$. (mass assignment theory [Baldwin et al. 1995]) Problem: The voting model is not sufficient to justify consonance.
Use the lower bound as the most pessimistic choice. [Gebhardt 1997] Problem: Basic possibility assignments represent negative information; the lower bound is actually the most optimistic choice.
Justify the lower bound from decision making purposes. [Borgelt 1995, Borgelt 2000]
Christian Borgelt Probabilistic Reasoning: Graphical Models 48

49 From the Context Model to Possibility Measures (cont.) Assume that in the end we have to decide on a single event. Each event is described by the values of a set of attributes. Then it can be useful to assign to a set of events the degree of possibility of the most possible event in the set. [Figure: example in which the degree of possibility of a set of events is the maximum of the degrees of possibility of its elements.] Christian Borgelt Probabilistic Reasoning: Graphical Models 49

50 Possibility Distributions Definition: Let X = {A_1, ..., A_n} be a set of attributes defined on a (finite) sample space Ω with respective domains dom(A_i), i = 1, ..., n. A possibility distribution π_X over X is the restriction of a possibility measure Π on Ω to the set of all events that can be defined by stating values for all attributes in X. That is, $\pi_X = \Pi|_{\mathcal{E}_X}$, where
$\mathcal{E}_X = \bigl\{ E \in 2^\Omega \bigm| \exists a_1 \in dom(A_1): \ldots \exists a_n \in dom(A_n): E = \bigwedge_{A_j \in X} A_j = a_j \bigr\}$
$= \bigl\{ E \in 2^\Omega \bigm| \exists a_1 \in dom(A_1): \ldots \exists a_n \in dom(A_n): E = \bigl\{ \omega \in \Omega \bigm| \bigwedge_{A_j \in X} A_j(\omega) = a_j \bigr\} \bigr\}.$
Corresponds to the notion of a probability distribution. Advantage of this formalization: No index transformation functions are needed for projections; there are just fewer terms in the conjunctions. Christian Borgelt Probabilistic Reasoning: Graphical Models 50

51 A Simple Example: The Possibilistic Case Christian Borgelt Probabilistic Reasoning: Graphical Models 51

52 A Possibility Distribution [Figure: the three-dimensional possibility distribution over color, shape, and size together with its maximum projections; all numbers in parts per 1000.] The numbers state the degrees of possibility of the corresponding value combination. Christian Borgelt Probabilistic Reasoning: Graphical Models 52

53 Reasoning [Figure: the possibility distribution after incorporating the information that the given object is green; all numbers in parts per 1000.] Using the information that the given object is green. Christian Borgelt Probabilistic Reasoning: Graphical Models 53

54 Possibilistic Decomposition As for relational and probabilistic networks, the three-dimensional possibility distribution can be decomposed into projections to subspaces, namely the maximum projection to the subspace color × shape and the maximum projection to the subspace shape × size. It can be reconstructed using the following formula:
$\forall i, j, k:\quad \pi\bigl(a^{(color)}_i, a^{(shape)}_j, a^{(size)}_k\bigr) = \min\bigl\{\pi\bigl(a^{(color)}_i, a^{(shape)}_j\bigr),\ \pi\bigl(a^{(shape)}_j, a^{(size)}_k\bigr)\bigr\}$
$= \min\bigl\{\max_k \pi\bigl(a^{(color)}_i, a^{(shape)}_j, a^{(size)}_k\bigr),\ \max_i \pi\bigl(a^{(color)}_i, a^{(shape)}_j, a^{(size)}_k\bigr)\bigr\}$
Note the analogy to the probabilistic reconstruction formulas. Christian Borgelt Probabilistic Reasoning: Graphical Models 54
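A small numerical sketch of this min/max reconstruction; the degrees of possibility are made up, and the joint is constructed to be decomposable so that the reconstruction from the maximum projections is exact.

```python
# Decompose a possibility distribution by maximum projection and
# reconstruct it with the minimum.
import numpy as np

pi_cs = np.array([[1.0, 0.4, 0.0],        # max-projection to color x shape
                  [0.2, 0.8, 0.6]])
pi_ss = np.array([[0.7, 1.0, 0.3],        # max-projection to shape x size
                  [0.8, 0.5, 0.2],
                  [0.6, 0.1, 0.6]])

# Build a joint that is decomposable by construction: pi(c,s,z) = min(pi_cs, pi_ss).
joint = np.minimum(pi_cs[:, :, None], pi_ss[None, :, :])

# Recompute the maximum projections and the reconstruction from them.
proj_cs = joint.max(axis=2)
proj_ss = joint.max(axis=0)
reconstructed = np.minimum(proj_cs[:, :, None], proj_ss[None, :, :])
print(np.allclose(joint, reconstructed))   # True
```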

55 Reasoning with Projections Again the same result can be obtained using only projections to subspaces (maximal degrees of possibility): the new degrees of possibility for the color are combined with the projection to color × shape by the minimum and projected to the shape by the maximum; the result is transferred through the projection to shape × size in the same way to obtain the new degrees of possibility for the size. [Figure: the propagation scheme on the two maximum projections.] This justifies a graph representation: color — shape — size. Christian Borgelt Probabilistic Reasoning: Graphical Models 55

56 Possibilistic Graphical Models: Formalization Christian Borgelt Probabilistic Reasoning: Graphical Models 56

57 Conditional Possibility and Independence Definition: Let Ω be a (finite) sample space, Π a possibility measure on Ω, and E_1, E_2 ⊆ Ω events. Then $\Pi(E_1 \mid E_2) = \Pi(E_1 \cap E_2)$ is called the conditional possibility of E_1 given E_2. Definition: Let Ω be a (finite) sample space, Π a possibility measure on Ω, and A, B, and C attributes with respective domains dom(A), dom(B), and dom(C). A and B are called conditionally possibilistically independent given C, written $A \perp_\Pi B \mid C$, iff $\forall a \in dom(A): \forall b \in dom(B): \forall c \in dom(C):$
$\Pi(A = a, B = b \mid C = c) = \min\{\Pi(A = a \mid C = c), \Pi(B = b \mid C = c)\}.$
Similar to the corresponding notions of probability theory. Christian Borgelt Probabilistic Reasoning: Graphical Models 57

58 Possibilistic Evidence Propagation, Step 1 (A: color, B: shape, C: size)
$\pi(B = b \mid A = a_{obs})$
$= \pi\bigl(\bigvee_{a \in dom(A)} A = a,\ B = b,\ \bigvee_{c \in dom(C)} C = c \bigm| A = a_{obs}\bigr)$
$\overset{(1)}{=} \max_{a \in dom(A)} \max_{c \in dom(C)} \{\pi(A = a, B = b, C = c \mid A = a_{obs})\}$
$\overset{(2)}{=} \max_{a \in dom(A)} \max_{c \in dom(C)} \{\min\{\pi(A = a, B = b, C = c), \pi(A = a \mid A = a_{obs})\}\}$
$\overset{(3)}{=} \max_{a \in dom(A)} \max_{c \in dom(C)} \{\min\{\pi(A = a, B = b), \pi(B = b, C = c), \pi(A = a \mid A = a_{obs})\}\}$
$= \max_{a \in dom(A)} \{\min\{\pi(A = a, B = b), \pi(A = a \mid A = a_{obs}), \underbrace{\max_{c \in dom(C)} \pi(B = b, C = c)}_{= \pi(B = b) \ge \pi(A = a, B = b)}\}\}$
$= \max_{a \in dom(A)} \{\min\{\pi(A = a, B = b), \pi(A = a \mid A = a_{obs})\}\}$
Christian Borgelt Probabilistic Reasoning: Graphical Models 58

59 Graphical Models: The General Theory Christian Borgelt Probabilistic Reasoning: Graphical Models 59

60 (Semi-)Graphoid Axioms Definition: Let V be a set of (mathematical) objects and $(\cdot \perp \cdot \mid \cdot)$ a three-place relation on subsets of V. Furthermore, let W, X, Y, and Z be four disjoint subsets of V. The four statements
symmetry: $(X \perp Y \mid Z) \Rightarrow (Y \perp X \mid Z)$
decomposition: $(W \cup X \perp Y \mid Z) \Rightarrow (W \perp Y \mid Z) \wedge (X \perp Y \mid Z)$
weak union: $(W \cup X \perp Y \mid Z) \Rightarrow (X \perp Y \mid Z \cup W)$
contraction: $(X \perp Y \mid Z \cup W) \wedge (W \perp Y \mid Z) \Rightarrow (W \cup X \perp Y \mid Z)$
are called the semi-graphoid axioms. A three-place relation $(\cdot \perp \cdot \mid \cdot)$ that satisfies the semi-graphoid axioms for all W, X, Y, and Z is called a semi-graphoid. The above four statements together with
intersection: $(W \perp Y \mid Z \cup X) \wedge (X \perp Y \mid Z \cup W) \Rightarrow (W \cup X \perp Y \mid Z)$
are called the graphoid axioms. A three-place relation $(\cdot \perp \cdot \mid \cdot)$ that satisfies the graphoid axioms for all W, X, Y, and Z is called a graphoid. Christian Borgelt Probabilistic Reasoning: Graphical Models 60

61 Illustration of the (Semi-)Graphoid Axioms [Figure: the decomposition, weak union, contraction, and intersection axioms illustrated by node sets W, X, Y, Z and their separating sets in a graph.] Similar to the properties of separation in graphs. Idea: Represent conditional independence by separation in graphs. Christian Borgelt Probabilistic Reasoning: Graphical Models 61

62 Separation in Graphs Definition: Let G = (V, E) be an undirected graph and X, Y, and Z three disjoint subsets of nodes. Z u-separates X and Y in G, written $\langle X \mid Z \mid Y \rangle_G$, iff all paths from a node in X to a node in Y contain a node in Z. A path that contains a node in Z is called blocked (by Z), otherwise it is called active. Definition: Let G = (V, E) be a directed acyclic graph and X, Y, and Z three disjoint subsets of nodes. Z d-separates X and Y in G, written $\langle X \mid Z \mid Y \rangle_G$, iff there is no path from a node in X to a node in Y along which the following two conditions hold: 1. every node with converging edges either is in Z or has a descendant in Z, 2. every other node is not in Z. A path satisfying the two conditions above is said to be active, otherwise it is said to be blocked (by Z). Christian Borgelt Probabilistic Reasoning: Graphical Models 62
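u-separation is easy to implement: Z separates X and Y iff no node of Y can be reached from X once the nodes in Z are removed. A minimal sketch (d-separation needs the more involved path conditions above and is not covered here):

```python
# Z u-separates X and Y iff removing Z disconnects X from Y.
from collections import deque

def u_separated(edges, x_set, y_set, z_set):
    """edges: iterable of undirected edges (u, v); x_set, y_set, z_set: node sets."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    start = set(x_set) - set(z_set)
    visited, queue = set(start), deque(start)
    while queue:
        node = queue.popleft()
        if node in y_set:
            return False                      # found an active path into Y
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited and neighbor not in z_set:
                visited.add(neighbor)
                queue.append(neighbor)
    return True

# Chain color - shape - size: shape u-separates color and size.
edges = [("color", "shape"), ("shape", "size")]
print(u_separated(edges, {"color"}, {"size"}, {"shape"}))   # True
print(u_separated(edges, {"color"}, {"size"}, set()))       # False
```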

63 Separation in Directed Acyclic Graphs Example Graph: [Figure: a directed acyclic graph over the attributes A_1, ..., A_9.] Valid Separations: $\langle \{A_1\} \mid \{A_3\} \mid \{A_4\} \rangle$, $\langle \{A_8\} \mid \{A_7\} \mid \{A_9\} \rangle$, $\langle \{A_3\} \mid \{A_4, A_6\} \mid \{A_7\} \rangle$, $\{A_1\}\ \{A_2\}$. Invalid Separations: $\langle \{A_1\} \mid \{A_4\} \mid \{A_2\} \rangle$, $\langle \{A_1\} \mid \{A_6\} \mid \{A_7\} \rangle$, $\langle \{A_4\} \mid \{A_3, A_7\} \mid \{A_6\} \rangle$, $\langle \{A_1\} \mid \{A_4, A_9\} \mid \{A_5\} \rangle$. Christian Borgelt Probabilistic Reasoning: Graphical Models 63

64 Conditional (In)Dependence Graphs Definition: Let $(\cdot \perp_\delta \cdot \mid \cdot)$ be a three-place relation representing the set of conditional independence statements that hold in a given distribution δ over a set U of attributes. An undirected graph G = (U, E) over U is called a conditional dependence graph or a dependence map w.r.t. δ, iff for all disjoint subsets X, Y, Z ⊆ U of attributes
$X \perp_\delta Y \mid Z \ \Rightarrow\ \langle X \mid Z \mid Y \rangle_G,$
i.e., if G captures by u-separation all (conditional) independences that hold in δ and thus represents only valid (conditional) dependences. Similarly, G is called a conditional independence graph or an independence map w.r.t. δ, iff for all disjoint subsets X, Y, Z ⊆ U of attributes
$\langle X \mid Z \mid Y \rangle_G \ \Rightarrow\ X \perp_\delta Y \mid Z,$
i.e., if G captures by u-separation only (conditional) independences that are valid in δ. G is said to be a perfect map of the conditional (in)dependences in δ, if it is both a dependence map and an independence map. Christian Borgelt Probabilistic Reasoning: Graphical Models 64

65 Conditional (In)Dependence Graphs Definition: A conditional dependence graph is called maximal w.r.t. a distribution δ (or, in other words, a maximal dependence map w.r.t. δ) iff no edge can be added to it so that the resulting graph is still a conditional dependence graph w.r.t. the distribution δ. Definition: A conditional independence graph is called minimal w.r.t. a distribution δ (or, in other words, a minimal independence map w.r.t. δ) iff no edge can be removed from it so that the resulting graph is still a conditional independence graph w.r.t. the distribution δ. Conditional independence graphs are sometimes required to be minimal. However, this requirement is not necessary for a conditional independence graph to be usable for evidence propagation. The disadvantage of a non-minimal conditional independence graph is that evidence propagation may be more costly computationally than necessary. Christian Borgelt Probabilistic Reasoning: Graphical Models 65

66 Limitations of Graph Representations Perfect directed map, no perfect undirected map: [Figure: a directed graph over A, B, and C.]
p_ABC              A = a1              A = a2
                   B = b1    B = b2    B = b1    B = b2
C = c1             4/24      3/24      3/24      2/24
C = c2             2/24      3/24      3/24      4/24
Perfect undirected map, no perfect directed map: [Figure: an undirected graph over A, B, C, and D.]
p_ABCD             A = a1              A = a2
                   B = b1    B = b2    B = b1    B = b2
C = c1   D = d1    1/47      1/47      1/47      2/47
         D = d2    1/47      1/47      2/47      4/47
C = c2   D = d1    1/47      2/47      1/47      4/47
         D = d2    2/47      4/47      4/47      16/47
Christian Borgelt Probabilistic Reasoning: Graphical Models 66

67 Limitations of Graph Representations There are also probability distributions for which there exists neither a directed nor an undirected perfect map: [Figure: the three attributes A, B, and C.]
p_ABC              A = a1              A = a2
                   B = b1    B = b2    B = b1    B = b2
C = c1             2/12      1/12      1/12      2/12
C = c2             1/12      2/12      2/12      1/12
In such cases either not all dependences or not all independences can be captured by a graph representation. In such a situation one usually decides to neglect some of the independence information, that is, to use only a (minimal) conditional independence graph. This is sufficient for correct evidence propagation; the existence of a perfect map is not required. Christian Borgelt Probabilistic Reasoning: Graphical Models 67

68 Markov Properties of Undirected Graphs Definition: An undirected graph G = (U, E) over a set U of attributes is said to have (w.r.t. a distribution δ) the
pairwise Markov property, iff in δ any pair of attributes which are nonadjacent in the graph are conditionally independent given all remaining attributes, i.e., iff $\forall A, B \in U, A \neq B:\ (A, B) \notin E \ \Rightarrow\ A \perp_\delta B \mid U \setminus \{A, B\},$
local Markov property, iff in δ any attribute is conditionally independent of all remaining attributes given its neighbors, i.e., iff $\forall A \in U:\ A \perp_\delta U \setminus closure(A) \mid boundary(A),$
global Markov property, iff in δ any two sets of attributes which are u-separated by a third are conditionally independent given the attributes in the third set, i.e., iff $\forall X, Y, Z \subseteq U:\ \langle X \mid Z \mid Y \rangle_G \ \Rightarrow\ X \perp_\delta Y \mid Z.$
Christian Borgelt Probabilistic Reasoning: Graphical Models 68

69 Markov Properties of Directed Acyclic Graphs Definition: A directed acyclic graph G = (U, E) over a set U of attributes is said to have (w.r.t. a distribution δ) the
pairwise Markov property, iff in δ any attribute is conditionally independent of any non-descendant not among its parents given all remaining non-descendants, i.e., iff $\forall A, B \in U:\ B \in nondescs(A) \setminus parents(A) \ \Rightarrow\ A \perp_\delta B \mid nondescs(A) \setminus \{B\},$
local Markov property, iff in δ any attribute is conditionally independent of all remaining non-descendants given its parents, i.e., iff $\forall A \in U:\ A \perp_\delta nondescs(A) \setminus parents(A) \mid parents(A),$
global Markov property, iff in δ any two sets of attributes which are d-separated by a third are conditionally independent given the attributes in the third set, i.e., iff $\forall X, Y, Z \subseteq U:\ \langle X \mid Z \mid Y \rangle_G \ \Rightarrow\ X \perp_\delta Y \mid Z.$
Christian Borgelt Probabilistic Reasoning: Graphical Models 69

70 Equivalence of Markov Properties Theorem: If a three-place relation ( δ ) representing the set of conditional independence statements that hold in a given joint distribution δ over a set U of attributes satisfies the graphoid axioms, then the pairwise, the local, and the global Markov property of an undirected graph G = (U, E) over U are equivalent. Theorem: If a three-place relation ( δ ) representing the set of conditional independence statements that hold in a given joint distribution δ over a set U of attributes satisfies the semi-graphoid axioms, then the local and the global Markov property of a directed acyclic graph G = (U, E) over U are equivalent. If ( δ ) satisfies the graphoid axioms, then the pairwise, the local, and the global Markov property are equivalent. Christian Borgelt Probabilistic Reasoning: Graphical Models 70

71 Markov Equivalence of Graphs Can two distinct graphs represent exactly the same set of conditional independence statements? The answer is relevant for learning graphical models from data, because it determines whether we can expect a unique graph as a learning result or not. Definition: Two (directed or undirected) graphs G_1 = (U, E_1) and G_2 = (U, E_2) with the same set U of nodes are called Markov equivalent iff they satisfy the same set of node separation statements (with d-separation for directed graphs and u-separation for undirected graphs), or formally, iff $\forall X, Y, Z \subseteq U:\ \langle X \mid Z \mid Y \rangle_{G_1} \ \Leftrightarrow\ \langle X \mid Z \mid Y \rangle_{G_2}.$ No two different undirected graphs can be Markov equivalent. The reason is that these two graphs, in order to be different, have to differ in at least one edge. However, the graph lacking this edge satisfies a node separation (and thus expresses a conditional independence) that is not satisfied (expressed) by the graph possessing the edge. Christian Borgelt Probabilistic Reasoning: Graphical Models 71

72 Markov Equivalence of Graphs Definition: Let G = (U, E) be a directed graph. The skeleton of G is the undirected graph $G' = (U, E')$ where E′ contains the same edges as E, but with their directions removed, or formally: $E' = \{(A, B) \in U \times U \mid (A, B) \in E \vee (B, A) \in E\}.$ Definition: Let G = (U, E) be a directed graph and A, B, C ∈ U three nodes of G. The triple (A, B, C) is called a v-structure of G iff $(A, B) \in E$ and $(C, B) \in E$, but neither $(A, C) \in E$ nor $(C, A) \in E$, that is, iff G has converging edges from A and C at B, but A and C are unconnected. Theorem: Let G_1 = (U, E_1) and G_2 = (U, E_2) be two directed acyclic graphs with the same node set U. The graphs G_1 and G_2 are Markov equivalent iff they possess the same skeleton and the same set of v-structures. Intuitively: Edge directions may be reversed if this does not change the set of v-structures. Christian Borgelt Probabilistic Reasoning: Graphical Models 72
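The theorem yields a simple equivalence test: compute the skeleton and the set of v-structures of both graphs and compare. A minimal sketch with made-up edge lists:

```python
# Markov equivalence test for DAGs: same skeleton and same v-structures.
def skeleton(edges):
    return {frozenset(e) for e in edges}

def v_structures(edges):
    edge_set = set(edges)
    nodes = {n for e in edges for n in e}
    result = set()
    for b in nodes:
        parents = [a for (a, x) in edge_set if x == b]
        for i, a in enumerate(parents):
            for c in parents[i + 1:]:
                if (a, c) not in edge_set and (c, a) not in edge_set:
                    result.add((frozenset({a, c}), b))   # converging, unconnected sources
    return result

def markov_equivalent(edges1, edges2):
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

g1 = [("A", "B"), ("B", "D"), ("C", "D")]      # A -> B -> D <- C
g2 = [("B", "A"), ("B", "D"), ("C", "D")]      # A <- B -> D <- C
g3 = [("A", "B"), ("D", "B"), ("C", "D")]      # A -> B <- D <- C
print(markov_equivalent(g1, g2))   # True: same skeleton, same v-structure at D
print(markov_equivalent(g1, g3))   # False: the v-structure changes
```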

73 Markov Equivalence of Graphs [Figure: two pairs of directed acyclic graphs over the nodes A, B, C, D.] Graphs with the same skeleton, but converging edges at different nodes, which start from connected nodes, can be Markov equivalent. Of several edges that converge at a node only a subset may actually represent a v-structure. This v-structure, however, is relevant. Christian Borgelt Probabilistic Reasoning: Graphical Models 73

74 Undirected Graphs and Decompositions Definition: A probability distribution p_V over a set V of variables is called decomposable or factorizable w.r.t. an undirected graph G = (V, E) iff it can be written as a product of nonnegative functions on the maximal cliques of G. That is, let $\mathcal{M}$ be a family of subsets of variables, such that the subgraphs of G induced by the sets $M \in \mathcal{M}$ are the maximal cliques of G. Then there exist functions $\phi_M : \mathcal{E}_M \to \mathbb{R}^+_0$, $M \in \mathcal{M}$, such that $\forall a_1 \in dom(A_1): \ldots \forall a_n \in dom(A_n):$
$p_V\bigl(\bigwedge_{A_i \in V} A_i = a_i\bigr) = \prod_{M \in \mathcal{M}} \phi_M\bigl(\bigwedge_{A_i \in M} A_i = a_i\bigr).$
Example: [Figure: an undirected graph over A_1, ..., A_6 whose maximal cliques are {A_1, A_2, A_3}, {A_3, A_5, A_6}, {A_2, A_4}, and {A_4, A_6}.]
$p_V(A_1 = a_1, \ldots, A_6 = a_6) = \phi_{A_1 A_2 A_3}(A_1 = a_1, A_2 = a_2, A_3 = a_3) \cdot \phi_{A_3 A_5 A_6}(A_3 = a_3, A_5 = a_5, A_6 = a_6) \cdot \phi_{A_2 A_4}(A_2 = a_2, A_4 = a_4) \cdot \phi_{A_4 A_6}(A_4 = a_4, A_6 = a_6).$
Christian Borgelt Probabilistic Reasoning: Graphical Models 74

75 Directed Acyclic Graphs and Decompositions Definition: A probability distribution p_U over a set U of attributes is called decomposable or factorizable w.r.t. a directed acyclic graph G = (U, E) over U, iff it can be written as a product of the conditional probabilities of the attributes given their parents in G, i.e., iff $\forall a_1 \in dom(A_1): \ldots \forall a_n \in dom(A_n):$
$p_U\bigl(\bigwedge_{A_i \in U} A_i = a_i\bigr) = \prod_{A_i \in U} P\bigl(A_i = a_i \bigm| \bigwedge_{A_j \in parents_G(A_i)} A_j = a_j\bigr).$
Example: [Figure: a directed acyclic graph over A_1, ..., A_7 with the parent sets used below.]
$P(A_1 = a_1, \ldots, A_7 = a_7) = P(A_1 = a_1) \cdot P(A_2 = a_2 \mid A_1 = a_1) \cdot P(A_3 = a_3) \cdot P(A_4 = a_4 \mid A_1 = a_1, A_2 = a_2) \cdot P(A_5 = a_5 \mid A_2 = a_2, A_3 = a_3) \cdot P(A_6 = a_6 \mid A_4 = a_4, A_5 = a_5) \cdot P(A_7 = a_7 \mid A_5 = a_5).$
Christian Borgelt Probabilistic Reasoning: Graphical Models 75

76 Conditional Independence Graphs and Decompositions Core Theorem of Graphical Models: Let p V be a strictly positive probability distribution on a set V of (discrete) variables. A directed or undirected graph G = (V, E) is a conditional independence graph w.r.t. p V if and only if p V is factorizable w.r.t. G. Definition: A Markov network is an undirected conditional independence graph of a probability distribution p V together with the family of positive functions φ M of the factorization induced by the graph. Definition: A Bayesian network is a directed conditional independence graph of a probability distribution p U together with the family of conditional probabilities of the factorization induced by the graph. Sometimes the conditional independence graph is required to be minimal, if it is to be used as the graph underlying a Markov or Bayesian network. For correct evidence propagation it is not required that the graph is minimal. Evidence propagation may just be less efficient than possible. Christian Borgelt Probabilistic Reasoning: Graphical Models 76

77 Probabilistic Graphical Models: Evidence Propagation in Undirected Trees Christian Borgelt Probabilistic Reasoning: Graphical Models 77

78 Evidence Propagation in Undirected Trees [Figure: two neighboring node processors A and B exchanging the messages $\mu_{A \to B}$ and $\mu_{B \to A}$.] Node processors communicating by message passing. The messages represent information collected in the corresponding subgraphs. Derivation of the Propagation Formulae — Computation of Marginal Distribution:
$P(A_g = a_g) = \sum_{\forall A_k \in U \setminus \{A_g\}:\ a_k \in dom(A_k)} P\bigl(\bigwedge_{A_i \in U} A_i = a_i\bigr),$
Factor Potential Decomposition w.r.t. Undirected Tree:
$P(A_g = a_g) = \sum_{\forall A_k \in U \setminus \{A_g\}:\ a_k \in dom(A_k)} \ \prod_{(A_i, A_j) \in E} \phi_{A_i A_j}(a_i, a_j).$
Christian Borgelt Probabilistic Reasoning: Graphical Models 78

79 Evidence Propagation in Undirected Trees All factor potentials have only two arguments, because we deal with a tree: the maximal cliques of a tree are simply its edges, as there are no cycles. In addition, a tree has the convenient property that by removing an edge it is split into two disconnected subgraphs. In order to be able to refer to such subgraphs, we define
$U^A_B = \{A\} \cup \{C \in U \mid C \text{ is connected to } A \text{ in } G' = (U, E \setminus \{(A, B), (B, A)\})\},$
that is, $U^A_B$ is the set of those attributes that can still be reached from the attribute A if the edge A−B is removed. Similarly, we introduce a notation for the edges in these subgraphs, namely
$E^A_B = E \cap (U^A_B \times U^A_B).$
Thus $G^A_B = (U^A_B, E^A_B)$ is the subgraph containing all attributes that can be reached from the attribute B through its neighbor A (including A itself). Christian Borgelt Probabilistic Reasoning: Graphical Models 79

80 Evidence Propagation in Undirected Trees In the next step we split the product over all edges into individual factors w.r.t. the neighbors of the goal attribute: we write one factor for each neighbor. Each of these factors captures the part of the factorization that refers to the subgraph consisting of the attributes that can be reached from the goal attribute through this neighbor, including the factor potential of the edge that connects the neighbor to the goal attribute. That is, we write:
$P(A_g = a_g) = \sum_{\forall A_k \in U \setminus \{A_g\}:\ a_k \in dom(A_k)} \ \prod_{A_h \in neighbors(A_g)} \Bigl( \phi_{A_g A_h}(a_g, a_h) \prod_{(A_i, A_j) \in E^{A_h}_{A_g}} \phi_{A_i A_j}(a_i, a_j) \Bigr).$
Note that indeed each factor of the outer product in the above formula refers only to attributes in the subgraph that can be reached from the attribute A_g through the neighbor attribute A_h defining the factor. Christian Borgelt Probabilistic Reasoning: Graphical Models 80

81 Evidence Propagation in Undirected Trees In the third step it is exploited that terms that are independent of a summation variable can be moved out of the corresponding sum. In addition we make use of $\sum_i \sum_j a_i b_j = (\sum_i a_i)(\sum_j b_j)$. This yields a decomposition of the expression for P(A_g = a_g) into factors:
$P(A_g = a_g) = \prod_{A_h \in neighbors(A_g)} \Bigl( \sum_{\forall A_k \in U^{A_h}_{A_g}:\ a_k \in dom(A_k)} \phi_{A_g A_h}(a_g, a_h) \prod_{(A_i, A_j) \in E^{A_h}_{A_g}} \phi_{A_i A_j}(a_i, a_j) \Bigr) = \prod_{A_h \in neighbors(A_g)} \mu_{A_h \to A_g}(A_g = a_g).$
Each factor represents the probabilistic influence of the subgraph that can be reached through the corresponding neighbor $A_h \in neighbors(A_g)$. Thus it can be interpreted as a message about this influence sent from A_h to A_g. Christian Borgelt Probabilistic Reasoning: Graphical Models 81

82 Evidence Propagation in Undirected Trees With this formula the propagation formula can now easily be derived. The key is to consider a single factor of the above product and to compare it to the expression for P(A_h = a_h) for the corresponding neighbor A_h, that is, to
$P(A_h = a_h) = \sum_{\forall A_k \in U \setminus \{A_h\}:\ a_k \in dom(A_k)} \ \prod_{(A_i, A_j) \in E} \phi_{A_i A_j}(a_i, a_j).$
Note that this formula is completely analogous to the formula for P(A_g = a_g) after the first step, that is, after the application of the factorization formula, with the only difference that this formula refers to A_h instead of A_g:
$P(A_g = a_g) = \sum_{\forall A_k \in U \setminus \{A_g\}:\ a_k \in dom(A_k)} \ \prod_{(A_i, A_j) \in E} \phi_{A_i A_j}(a_i, a_j).$
We now identify terms that occur in both formulas. Christian Borgelt Probabilistic Reasoning: Graphical Models 82

83 Evidence Propagation in Undirected Trees Exploiting that obviously $U = U^{A_h}_{A_g} \cup U^{A_g}_{A_h}$ and drawing on the distributive law again, we can easily rewrite this expression as a product with two factors:
$P(A_h = a_h) = \Bigl( \sum_{\forall A_k \in U^{A_h}_{A_g} \setminus \{A_h\}:\ a_k \in dom(A_k)} \ \prod_{(A_i, A_j) \in E^{A_h}_{A_g}} \phi_{A_i A_j}(a_i, a_j) \Bigr) \cdot \underbrace{\Bigl( \sum_{\forall A_k \in U^{A_g}_{A_h}:\ a_k \in dom(A_k)} \phi_{A_g A_h}(a_g, a_h) \prod_{(A_i, A_j) \in E^{A_g}_{A_h}} \phi_{A_i A_j}(a_i, a_j) \Bigr)}_{= \ \mu_{A_g \to A_h}(A_h = a_h)}.$
Christian Borgelt Probabilistic Reasoning: Graphical Models 83

84 Evidence Propagation in Undirected Trees As a consequence, we obtain the simple expression
$\mu_{A_h \to A_g}(A_g = a_g) = \sum_{a_h \in dom(A_h)} \Bigl( \phi_{A_g A_h}(a_g, a_h) \cdot \frac{P(A_h = a_h)}{\mu_{A_g \to A_h}(A_h = a_h)} \Bigr) = \sum_{a_h \in dom(A_h)} \Bigl( \phi_{A_g A_h}(a_g, a_h) \prod_{A_i \in neighbors(A_h) \setminus \{A_g\}} \mu_{A_i \to A_h}(A_h = a_h) \Bigr).$
This formula is very intuitive: In the upper form it says that all information collected at A_h (expressed as P(A_h = a_h)) should be transferred to A_g, with the exception of the information that was received from A_g. In the lower form the formula says that everything coming in through edges other than A_g−A_h has to be combined and then passed on to A_g. Christian Borgelt Probabilistic Reasoning: Graphical Models 84

85 Evidence Propagation in Undirected Trees The second form of this formula also provides us with a means to start the message computations. Obviously, the value of the message $\mu_{A_h \to A_g}(A_g = a_g)$ can immediately be computed if A_h is a leaf node of the tree. In this case the product has no factors and thus the equation reduces to
$\mu_{A_h \to A_g}(A_g = a_g) = \sum_{a_h \in dom(A_h)} \phi_{A_g A_h}(a_g, a_h).$
After all leaves have computed these messages, there must be at least one node for which messages from all but one neighbor are known. This enables this node to compute the message to the neighbor it did not receive a message from. After that, there must again be at least one node which has received messages from all but one neighbor. Hence it can send a message, and so on, until all messages have been computed. Christian Borgelt Probabilistic Reasoning: Graphical Models 85
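A minimal sketch of this message-passing scheme for the simplest undirected tree, a chain A − B − C, with made-up factor potentials; the marginal of each node is (proportional to) the product of its incoming messages.

```python
# Sum-product message passing on the chain A - B - C.
import numpy as np

phi_ab = np.array([[4.0, 1.0],     # phi_AB(a, b)
                   [2.0, 3.0]])
phi_bc = np.array([[1.0, 5.0],     # phi_BC(b, c)
                   [2.0, 2.0]])

def normalize(x):
    return x / x.sum()

# Leaf messages: sums over the leaf attribute of the connecting potential.
mu_a_to_b = phi_ab.sum(axis=0)          # mu_{A->B}(b) = sum_a phi_AB(a, b)
mu_c_to_b = phi_bc.sum(axis=1)          # mu_{C->B}(b) = sum_c phi_BC(b, c)

# Inner messages: pass on everything that came in through the other edges.
mu_b_to_a = phi_ab @ mu_c_to_b          # mu_{B->A}(a) = sum_b phi_AB(a, b) mu_{C->B}(b)
mu_b_to_c = phi_bc.T @ mu_a_to_b        # mu_{B->C}(c) = sum_b phi_BC(b, c) mu_{A->B}(b)

p_a = normalize(mu_b_to_a)              # A has the single neighbor B
p_b = normalize(mu_a_to_b * mu_c_to_b)  # product of all incoming messages
p_c = normalize(mu_b_to_c)

# Check against the brute-force joint p(a, b, c) ~ phi_AB(a, b) * phi_BC(b, c).
joint = phi_ab[:, :, None] * phi_bc[None, :, :]
joint /= joint.sum()
print(np.allclose(p_b, joint.sum(axis=(0, 2))))   # True
```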

86 Evidence Propagation in Undirected Trees Up to now we have assumed that no evidence has been added to the network, that is, that no attributes have been instantiated. However, if attributes are instantiated, the formulae change only slightly. We have to add to the joint probability distribution an evidence factor for each instantiated attribute: if U_obs is the set of observed (instantiated) attributes, we compute
$P\bigl(A_g = a_g \bigm| \textstyle\bigwedge_{A_o \in U_{obs}} A_o = a_o^{(obs)}\bigr) = \alpha \sum_{\forall A_k \in U \setminus \{A_g\}:\ a_k \in dom(A_k)} P\bigl(\bigwedge_{A_i \in U} A_i = a_i\bigr) \prod_{A_o \in U_{obs}} \underbrace{\frac{P(A_o = a_o \mid A_o = a_o^{(obs)})}{P(A_o = a_o)}}_{\text{evidence factor for } A_o},$
where the $a_o^{(obs)}$ are the observed values and α is a normalization constant,
$\alpha = \beta \prod_{A_j \in U_{obs}} P(A_j = a_j^{(obs)}) \quad\text{with}\quad \beta = P\bigl(\textstyle\bigwedge_{A_j \in U_{obs}} A_j = a_j^{(obs)}\bigr)^{-1}.$
Christian Borgelt Probabilistic Reasoning: Graphical Models 86

87 Evidence Propagation in Undirected Trees The justification for this formula is analogous to the justification for the introduction of similar evidence factors for the observed attributes in the simple three-attribute example (color/shape/size):
$P\bigl(\bigwedge_{A_i \in U} A_i = a_i \bigm| \bigwedge_{A_o \in U_{obs}} A_o = a_o^{(obs)}\bigr) = \beta\, P\bigl(\bigwedge_{A_i \in U} A_i = a_i,\ \bigwedge_{A_o \in U_{obs}} A_o = a_o^{(obs)}\bigr) = \begin{cases} \beta\, P\bigl(\bigwedge_{A_i \in U} A_i = a_i\bigr), & \text{if } \forall A_i \in U_{obs}: a_i = a_i^{(obs)}, \\ 0, & \text{otherwise,} \end{cases}$
with β as defined above, $\beta = P\bigl(\bigwedge_{A_j \in U_{obs}} A_j = a_j^{(obs)}\bigr)^{-1}$. Christian Borgelt Probabilistic Reasoning: Graphical Models 87

88 Evidence Propagation in Undirected Trees In addition, it is clear that
$\forall A_j \in U_{obs}:\quad P\bigl(A_j = a_j \mid A_j = a_j^{(obs)}\bigr) = \begin{cases} 1, & \text{if } a_j = a_j^{(obs)}, \\ 0, & \text{otherwise.} \end{cases}$
Therefore we have
$\prod_{A_j \in U_{obs}} P\bigl(A_j = a_j \mid A_j = a_j^{(obs)}\bigr) = \begin{cases} 1, & \text{if } \forall A_j \in U_{obs}: a_j = a_j^{(obs)}, \\ 0, & \text{otherwise.} \end{cases}$
Combining these equations, we arrive at the formula stated above:
$P\bigl(A_g = a_g \bigm| \textstyle\bigwedge_{A_o \in U_{obs}} A_o = a_o^{(obs)}\bigr) = \alpha \sum_{\forall A_k \in U \setminus \{A_g\}:\ a_k \in dom(A_k)} P\bigl(\bigwedge_{A_i \in U} A_i = a_i\bigr) \prod_{A_o \in U_{obs}} \underbrace{\frac{P(A_o = a_o \mid A_o = a_o^{(obs)})}{P(A_o = a_o)}}_{\text{evidence factor for } A_o}.$
Christian Borgelt Probabilistic Reasoning: Graphical Models 88

89 Evidence Propagation in Undirected Trees Note that we can neglect the normalization factor α, because it can always be recovered from the fact that a probability distribution, whether marginal or conditional, must be normalized. That is, instead of trying to determine α beforehand in order to compute $P(A_g = a_g \mid \bigwedge_{A_o \in U_{obs}} A_o = a_o^{(obs)})$ directly, we confine ourselves to computing $\frac{1}{\alpha} P(A_g = a_g \mid \bigwedge_{A_o \in U_{obs}} A_o = a_o^{(obs)})$ for all $a_g \in dom(A_g)$. Then we determine α indirectly with the equation
$\sum_{a_g \in dom(A_g)} P\bigl(A_g = a_g \bigm| \textstyle\bigwedge_{A_o \in U_{obs}} A_o = a_o^{(obs)}\bigr) = 1.$
In other words, the computed values $\frac{1}{\alpha} P(A_g = a_g \mid \bigwedge_{A_o \in U_{obs}} A_o = a_o^{(obs)})$ are simply normalized to sum 1 to compute the desired probabilities. Christian Borgelt Probabilistic Reasoning: Graphical Models 89

90 Evidence Propagation in Undirected Trees If the derivation is redone with the modified initial formula for the probability of a value of some goal attribute A_g, the evidence factors $P(A_o = a_o \mid A_o = a_o^{(obs)})/P(A_o = a_o)$ directly influence only the formula for the messages that are sent out from the instantiated attributes. Therefore we obtain the following formula for the messages that are sent from an instantiated attribute A_o:
$\mu_{A_o \to A_i}(A_i = a_i) = \sum_{a_o \in dom(A_o)} \Bigl( \phi_{A_i A_o}(a_i, a_o) \cdot \frac{P(A_o = a_o \mid A_o = a_o^{(obs)})}{P(A_o = a_o)} \cdot \frac{P(A_o = a_o)}{\mu_{A_i \to A_o}(A_o = a_o)} \Bigr) = \gamma\, \phi_{A_i A_o}(a_i, a_o^{(obs)}),$
where $\gamma = 1 / \mu_{A_i \to A_o}(A_o = a_o^{(obs)})$. Christian Borgelt Probabilistic Reasoning: Graphical Models 90

91 Evidence Propagation in Undirected Trees This formula is again very intuitive: In an undirected tree, any attribute A o u-separates all attributes in a subgraph reached through one of its neighbors from all attributes in a subgraph reached through any other of its neighbors. Consequently, if A o is instantiated, all paths through A o are blocked and thus no information should be passed from one neighbor to any other. Note that in an implementation we can neglect γ, because it is the same for all values a i dom(a i ) and thus can be incorporated into the constant α. Rewriting the Propagation Formulae in Vector Form: We need to determine the probability of all values of the goal attribute and we have to evaluate the messages for all values of the attributes that are arguments. Therefore it is convenient to write the equations in vector form, with a vector for each attribute that has as many elements as the attribute has values. The factor potentials can then be represented as matrices. Christian Borgelt Probabilistic Reasoning: Graphical Models 91

92 Probabilistic Graphical Models: Evidence Propagation in Polytrees Christian Borgelt Probabilistic Reasoning: Graphical Models 92

93 Evidence Propagation in Polytrees [Figure: two node processors A and B connected by a directed edge A → B, exchanging the messages $\pi_{A \to B}$ and $\lambda_{B \to A}$.] Idea: Node processors communicating by message passing: π-messages are sent from parent to child and λ-messages are sent from child to parent. Derivation of the Propagation Formulae — Computation of Marginal Distribution:
$P(A_g = a_g) = \sum_{\forall A_i \in U \setminus \{A_g\}:\ a_i \in dom(A_i)} P\bigl(\bigwedge_{A_j \in U} A_j = a_j\bigr)$
Chain Rule Factorization w.r.t. the Polytree:
$P(A_g = a_g) = \sum_{\forall A_i \in U \setminus \{A_g\}:\ a_i \in dom(A_i)} \ \prod_{A_k \in U} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr)$
Christian Borgelt Probabilistic Reasoning: Graphical Models 93

94 Evidence Propagation in Polytrees (continued) Decomposition w.r.t. Subgraphs:
$P(A_g = a_g) = \sum_{\forall A_i \in U \setminus \{A_g\}:\ a_i \in dom(A_i)} \Bigl( P\bigl(A_g = a_g \bigm| \bigwedge_{A_j \in parents(A_g)} A_j = a_j\bigr) \cdot \prod_{A_k \in U_+(A_g)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \cdot \prod_{A_k \in U_-(A_g)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \Bigr)$
Attribute sets underlying the subgraphs:
$U_{A \to B}(C) = \{C\} \cup \{D \in U \mid D \text{ is connected to } C \text{ in } G' = (U, E \setminus \{(A, B)\})\},$
$U_+(A) = \bigcup_{C \in parents(A)} U_{C \to A}(C), \qquad U_-(A) = \bigcup_{C \in children(A)} U_{A \to C}(C),$
$U_+(A, B) = \bigcup_{C \in parents(A) \setminus \{B\}} U_{C \to A}(C), \qquad U_-(A, B) = \bigcup_{C \in children(A) \setminus \{B\}} U_{A \to C}(C).$
Christian Borgelt Probabilistic Reasoning: Graphical Models 94

95 Evidence Propagation in Polytrees (continued) Terms that are independent of a summation variable can be moved out of the corresponding sum. This yields a decomposition into two main factors:
$P(A_g = a_g) = \Bigl( \sum_{\forall A_i \in parents(A_g):\ a_i \in dom(A_i)} P\bigl(A_g = a_g \bigm| \bigwedge_{A_j \in parents(A_g)} A_j = a_j\bigr) \Bigl[ \sum_{\forall A_i \in U_+^\circ(A_g):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_+(A_g)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \Bigr] \Bigr) \cdot \Bigl[ \sum_{\forall A_i \in U_-(A_g):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_-(A_g)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \Bigr]$
$= \pi(A_g = a_g) \cdot \lambda(A_g = a_g),$
where $U_+^\circ(A_g) = U_+(A_g) \setminus parents(A_g)$. Christian Borgelt Probabilistic Reasoning: Graphical Models 95

96 Evidence Propagation in Polytrees (continued)
$\sum_{\forall A_i \in U_+^\circ(A_g):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_+(A_g)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr)$
$= \prod_{A_p \in parents(A_g)} \Bigl( \sum_{\forall A_i \in parents(A_p):\ a_i \in dom(A_i)} P\bigl(A_p = a_p \bigm| \bigwedge_{A_j \in parents(A_p)} A_j = a_j\bigr) \Bigl[ \sum_{\forall A_i \in U_+^\circ(A_p):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_+(A_p)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \Bigr] \Bigl[ \sum_{\forall A_i \in U_-(A_p, A_g):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_-(A_p, A_g)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \Bigr] \Bigr)$
$= \prod_{A_p \in parents(A_g)} \Bigl( \pi(A_p = a_p) \Bigl[ \sum_{\forall A_i \in U_-(A_p, A_g):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_-(A_p, A_g)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \Bigr] \Bigr)$
Christian Borgelt Probabilistic Reasoning: Graphical Models 96

97 Evidence Propagation in Polytrees (continued)
$\sum_{\forall A_i \in U_+^\circ(A_g):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_+(A_g)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr)$
$= \prod_{A_p \in parents(A_g)} \Bigl( \pi(A_p = a_p) \Bigl[ \sum_{\forall A_i \in U_-(A_p, A_g):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_-(A_p, A_g)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \Bigr] \Bigr)$
$= \prod_{A_p \in parents(A_g)} \pi_{A_p \to A_g}(A_p = a_p)$
and therefore
$\pi(A_g = a_g) = \sum_{\forall A_i \in parents(A_g):\ a_i \in dom(A_i)} P\bigl(A_g = a_g \bigm| \bigwedge_{A_j \in parents(A_g)} A_j = a_j\bigr) \prod_{A_p \in parents(A_g)} \pi_{A_p \to A_g}(A_p = a_p)$
Christian Borgelt Probabilistic Reasoning: Graphical Models 97

98 Evidence Propagation in Polytrees (continued)
$\lambda(A_g = a_g) = \sum_{\forall A_i \in U_-(A_g):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_-(A_g)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr)$
$= \prod_{A_c \in children(A_g)} \Bigl( \sum_{a_c \in dom(A_c)} \Bigl[ \sum_{\forall A_i \in parents(A_c) \setminus \{A_g\}:\ a_i \in dom(A_i)} P\bigl(A_c = a_c \bigm| \bigwedge_{A_j \in parents(A_c)} A_j = a_j\bigr) \sum_{\forall A_i \in U_+^\circ(A_c, A_g):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_+(A_c, A_g)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \Bigr] \underbrace{\Bigl[ \sum_{\forall A_i \in U_-(A_c):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_-(A_c)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \Bigr]}_{= \ \lambda(A_c = a_c)} \Bigr)$
$= \prod_{A_c \in children(A_g)} \lambda_{A_c \to A_g}(A_g = a_g)$
Christian Borgelt Probabilistic Reasoning: Graphical Models 98

99 Propagation Formulae without Evidence
$\pi_{A_p \to A_c}(A_p = a_p) = \pi(A_p = a_p) \Bigl[ \sum_{\forall A_i \in U_-(A_p, A_c):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_-(A_p, A_c)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \Bigr] = \frac{P(A_p = a_p)}{\lambda_{A_c \to A_p}(A_p = a_p)}$
$\lambda_{A_c \to A_p}(A_p = a_p) = \sum_{a_c \in dom(A_c)} \lambda(A_c = a_c) \sum_{\forall A_i \in parents(A_c) \setminus \{A_p\}:\ a_i \in dom(A_i)} P\bigl(A_c = a_c \bigm| \bigwedge_{A_j \in parents(A_c)} A_j = a_j\bigr) \prod_{A_k \in parents(A_c) \setminus \{A_p\}} \pi_{A_k \to A_c}(A_k = a_k)$
Christian Borgelt Probabilistic Reasoning: Graphical Models 99

100 Evidence Propagation in Polytrees (continued) Evidence: The attributes in a set X_obs are observed.
$P\bigl(A_g = a_g \bigm| \textstyle\bigwedge_{A_k \in X_{obs}} A_k = a_k^{(obs)}\bigr) = \sum_{\forall A_i \in U \setminus \{A_g\}:\ a_i \in dom(A_i)} P\bigl(\bigwedge_{A_j \in U} A_j = a_j \bigm| \bigwedge_{A_k \in X_{obs}} A_k = a_k^{(obs)}\bigr)$
$= \alpha \sum_{\forall A_i \in U \setminus \{A_g\}:\ a_i \in dom(A_i)} P\bigl(\bigwedge_{A_j \in U} A_j = a_j\bigr) \prod_{A_k \in X_{obs}} P\bigl(A_k = a_k \bigm| A_k = a_k^{(obs)}\bigr),$
where $\alpha = P\bigl(\bigwedge_{A_k \in X_{obs}} A_k = a_k^{(obs)}\bigr)^{-1}$. Christian Borgelt Probabilistic Reasoning: Graphical Models 100

101 Propagation Formulae with Evidence
$\pi_{A_p \to A_c}(A_p = a_p) = P\bigl(A_p = a_p \bigm| A_p = a_p^{(obs)}\bigr) \cdot \pi(A_p = a_p) \Bigl[ \sum_{\forall A_i \in U_-(A_p, A_c):\ a_i \in dom(A_i)} \ \prod_{A_k \in U_-(A_p, A_c)} P\bigl(A_k = a_k \bigm| \bigwedge_{A_j \in parents(A_k)} A_j = a_j\bigr) \Bigr] = \begin{cases} \beta, & \text{if } a_p = a_p^{(obs)}, \\ 0, & \text{otherwise.} \end{cases}$
The value of β is not explicitly determined. Usually a value of 1 is used and the correct value is implicitly determined later by normalizing the resulting probability distribution for A_g. Christian Borgelt Probabilistic Reasoning: Graphical Models 101

102 Propagation Formulae with Evidence
$\lambda_{A_c \to A_p}(A_p = a_p) = \sum_{a_c \in dom(A_c)} P\bigl(A_c = a_c \bigm| A_c = a_c^{(obs)}\bigr) \cdot \lambda(A_c = a_c) \sum_{\forall A_i \in parents(A_c) \setminus \{A_p\}:\ a_i \in dom(A_i)} P\bigl(A_c = a_c \bigm| \bigwedge_{A_j \in parents(A_c)} A_j = a_j\bigr) \prod_{A_k \in parents(A_c) \setminus \{A_p\}} \pi_{A_k \to A_c}(A_k = a_k)$
Christian Borgelt Probabilistic Reasoning: Graphical Models 102

103 Probabilistic Graphical Models: Evidence Propagation in Multiply Connected Networks Christian Borgelt Probabilistic Reasoning: Graphical Models 103

104 Propagation in Multiply Connected Networks

Multiply connected networks pose a problem: there are several paths along which information can travel from one attribute (node) to another. As a consequence, the same evidence may be used twice to update the probability distribution of an attribute. Since probabilistic updating is not idempotent, incorporating the same evidence more than once usually invalidates the result.

General idea to solve this problem: transform the network into a singly connected structure.

(Figure: the multiply connected network A → B, A → C, B → D, C → D is turned into the chain A → BC → D by merging the attributes B and C; a sketch of this construction follows below.)

Merging attributes can make the polytree algorithm applicable in multiply connected networks.

Christian Borgelt Probabilistic Reasoning: Graphical Models 104
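A minimal sketch of the merging step for the diamond shown in the figure, with made-up binary conditional probability tables (all names and numbers are illustrative assumptions). It builds the table of the compound attribute BC and checks that the resulting chain defines the same joint distribution as the original network; the common factor P(A = a) cancels on both sides and is therefore left out of the comparison.

```python
from itertools import product

dom = [0, 1]   # all attributes binary in this toy example

# Hypothetical CPTs of the diamond A -> B, A -> C, B -> D, C -> D.
p_b_a = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.6}   # P(B = b | A = a), key (a, b)
p_c_a = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9}   # P(C = c | A = a), key (a, c)
p_d_bc = {(b, c, 0): (0.6 if b == c else 0.3) for b, c in product(dom, dom)}
p_d_bc.update({(b, c, 1): 1.0 - p_d_bc[(b, c, 0)] for b, c in product(dom, dom)})

# Merge B and C into the compound attribute BC: the result is the chain A -> BC -> D.
# Since B and C are conditionally independent given their common parent A,
# P(BC = (b, c) | A = a) = P(B = b | A = a) * P(C = c | A = a).
p_bc_a = {(a, (b, c)): p_b_a[(a, b)] * p_c_a[(a, c)]
          for a, b, c in product(dom, dom, dom)}
p_d_merged = {((b, c), d): p_d_bc[(b, c, d)]
              for b, c, d in product(dom, dom, dom)}

# Sanity check: the chain reproduces the joint distribution of the diamond.
for a, b, c, d in product(dom, dom, dom, dom):
    chain = p_bc_a[(a, (b, c))] * p_d_merged[((b, c), d)]
    diamond = p_b_a[(a, b)] * p_c_a[(a, c)] * p_d_bc[(b, c, d)]
    assert abs(chain - diamond) < 1e-12
print("merged network reproduces the original joint distribution")
```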

105 Triangulation and Join Tree Construction

(Figure: original graph → triangulated moral graph → maximal cliques → join tree)

A singly connected structure is obtained by triangulating the graph and then forming a tree of maximal cliques, the so-called join tree.

For evidence propagation a join tree is enhanced by so-called separators on the edges, which are the intersections of the cliques they connect (→ junction tree).

Christian Borgelt Probabilistic Reasoning: Graphical Models 105

106 Graph Triangulation

Algorithm: Graph Triangulation
Input: An undirected graph G = (V, E).
Output: A triangulated undirected graph G′ = (V, E′) with E ⊆ E′.

1. Compute an ordering of the nodes of the graph using maximum cardinality search. That is, number the nodes from 1 to n = |V|, in increasing order, always assigning the next number to the node having the largest set of previously numbered neighbors (breaking ties arbitrarily).

2. From i = n = |V| down to i = 1, recursively fill in edges between any nonadjacent neighbors of the node numbered i that have lower ranks than i (including neighbors linked to the node numbered i in previous steps). If no edges are added to the graph G, then the original graph G is triangulated; otherwise the new graph (with the added edges) is triangulated.

Christian Borgelt Probabilistic Reasoning: Graphical Models 106
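A compact Python sketch of the two steps above. Representing the graph as a dictionary that maps each node to the set of its neighbors, and breaking ties via max(), are choices of this sketch, not part of the algorithm.

```python
def triangulate(adj):
    """Triangulate an undirected graph given as {node: set(neighbors)}.

    Returns a new adjacency dictionary containing the original edges plus
    the fill-in edges: maximum cardinality search, then fill-in from the
    last numbered node down to the first, as described above."""
    adj = {v: set(ns) for v, ns in adj.items()}          # work on a copy
    order, numbered = [], set()
    # 1. Maximum cardinality search: always pick the node with the most
    #    already numbered neighbors (ties broken arbitrarily by max()).
    while len(order) < len(adj):
        v = max((v for v in adj if v not in numbered),
                key=lambda v: len(adj[v] & numbered))
        order.append(v)
        numbered.add(v)
    rank = {v: i for i, v in enumerate(order)}
    # 2. Fill in edges between nonadjacent lower-ranked neighbors,
    #    processing the nodes in decreasing order of their numbers
    #    (edges added earlier are taken into account).
    for v in reversed(order):
        lower = [u for u in adj[v] if rank[u] < rank[v]]
        for i, u in enumerate(lower):
            for w in lower[i + 1:]:
                adj[u].add(w)
                adj[w].add(u)
    return adj

# Example: a 4-cycle a-b-c-d-a gets one chord added and becomes triangulated.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(triangulate(cycle))
```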

107 Join Tree Construction

Algorithm: Join Tree Construction
Input: A triangulated undirected graph G = (V, E).
Output: A join tree G′ = (V′, E′) for G.

1. Find all maximal cliques C_1, ..., C_k of the input graph G and thus form the set V′ of vertices of the graph G′ (each maximal clique is a node).

2. Form the set E″ = {(C_i, C_j) | C_i ∩ C_j ≠ ∅} of candidate edges and assign to each edge the size of the intersection of the connected maximal cliques as a weight, that is, set w((C_i, C_j)) = |C_i ∩ C_j|.

3. Form a maximum spanning tree from the edges in E″ w.r.t. the weight w, using, for example, the algorithms proposed by [Kruskal 1956, Prim 1957]. The edges of this maximum spanning tree are the edges in E′.

Christian Borgelt Probabilistic Reasoning: Graphical Models 107
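A minimal sketch of steps 2 and 3; step 1 (enumerating the maximal cliques) is assumed to have been carried out already, and the union-find based Kruskal variant is just one possible way to build the maximum spanning tree.

```python
def join_tree(cliques):
    """Build a join tree for a list of maximal cliques (given as frozensets).

    Candidate edges are weighted by the size of the clique intersection;
    a maximum spanning tree over these edges is returned as a list of
    (clique, clique, separator size) triples."""
    edges = [(len(c1 & c2), i, j)
             for i, c1 in enumerate(cliques)
             for j, c2 in enumerate(cliques) if i < j and c1 & c2]
    parent = list(range(len(cliques)))       # union-find structure

    def find(x):                             # with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, i, j in sorted(edges, reverse=True):     # heaviest edges first (Kruskal)
        ri, rj = find(i), find(j)
        if ri != rj:                         # the edge joins two different components
            parent[ri] = rj
            tree.append((cliques[i], cliques[j], w))
    return tree

# Example: maximal cliques of a small triangulated graph.
cliques = [frozenset("abc"), frozenset("bcd"), frozenset("de")]
for c1, c2, w in join_tree(cliques):
    print(sorted(c1), "--", sorted(c2), " separator size:", w)
```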

108 Reasoning in Join/Junction Trees

Reasoning in join trees follows the same lines as for undirected trees. Multiple pieces of evidence from different branches may be incorporated into a distribution before continuing by summing/marginalizing.

(Figure: the color/shape/size example; the tables indicate which entries over the values small, medium and large are old and which are newly updated as evidence is passed along the rows and columns.)

Christian Borgelt Probabilistic Reasoning: Graphical Models 108

109 Graphical Models: Manual Model Building Christian Borgelt Probabilistic Reasoning: Graphical Models 109

110 Building Graphical Models: Causal Modeling

Manual creation of a reasoning system based on a graphical model:

  causal model of the given domain
    → (heuristics!) conditional independence graph
    → (formally provable) decomposition of the distribution
    → (formally provable) evidence propagation scheme

Problem: strong assumptions about the statistical effects of causal relations.
Nevertheless this approach often yields usable graphical models.

Christian Borgelt Probabilistic Reasoning: Graphical Models 110

111 Probabilistic Graphical Models: An Example — Danish Jersey Cattle Blood Type Determination

21 attributes:
   1 dam correct?           11 offspring ph.gr. 1
   2 sire correct?          12 offspring ph.gr. 2
   3 stated dam ph.gr. 1    13 offspring genotype
   4 stated dam ph.gr. 2    14 factor 40
   5 stated sire ph.gr. 1   15 factor 41
   6 stated sire ph.gr. 2   16 factor 42
   7 true dam ph.gr. 1      17 factor 43
   8 true dam ph.gr. 2      18 lysis 40
   9 true sire ph.gr. 1     19 lysis 41
  10 true sire ph.gr. 2     20 lysis 42
                            21 lysis 43

(Figure: the conditional independence graph over these attributes.) The grey nodes correspond to observable attributes. This graph was specified by human domain experts, based on knowledge about (causal) dependences of the variables.

Christian Borgelt Probabilistic Reasoning: Graphical Models 111

112 Probabilistic Graphical Models: An Example — Danish Jersey Cattle Blood Type Determination

The full 21-dimensional domain has a huge number of possible states (the product of the domain sizes of all 21 attributes). The Bayesian network requires only 306 conditional probabilities.

Example of a conditional probability table (attributes 2, 9, and 5): it specifies P(stated sire ph.gr. 1 | sire correct?, true sire ph.gr. 1), with one row for each combination of sire correct? ∈ {yes, no} and true sire ph.gr. 1 ∈ {F1, V1, V2}, and one column for each value F1, V1, V2 of the stated phenogroup.

The probabilities are acquired from human domain experts or estimated from historical data.

Christian Borgelt Probabilistic Reasoning: Graphical Models 112
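The saving can be made concrete by counting how many conditional probabilities a network needs. The sketch below counts free parameters of a made-up toy network; whether such a count matches a figure like the 306 quoted above depends on the counting convention (free parameters vs. all table entries), so this is an illustration, not a recomputation of the cattle network.

```python
def number_of_parameters(domain_sizes, parents):
    """Number of free conditional probabilities of a Bayesian network.

    Each attribute A contributes (|dom(A)| - 1) * prod(|dom(P)| for P in parents(A))
    free parameters, since the last entry of every conditional distribution
    follows from the others."""
    total = 0
    for attr, size in domain_sizes.items():
        table_rows = 1
        for p in parents.get(attr, []):
            table_rows *= domain_sizes[p]
        total += (size - 1) * table_rows
    return total

# Hypothetical toy network A -> C <- B (not the cattle example).
sizes = {"A": 3, "B": 2, "C": 4}
pars = {"C": ["A", "B"]}
print(number_of_parameters(sizes, pars))   # (3-1) + (2-1) + (4-1)*3*2 = 21
```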

113 Probabilistic Graphical Models: An Example — Danish Jersey Cattle Blood Type Determination

(Figure: the moral graph of the network, which is already triangulated, and the corresponding join tree.)

Christian Borgelt Probabilistic Reasoning: Graphical Models 113

114 Graphical Models and Causality Christian Borgelt Probabilistic Reasoning: Graphical Models 114

115 Graphical Models and Causality

Three elementary causal structures:

causal chain (A → B → C)
  Example: A accelerator pedal, B fuel supply, C engine speed

common cause (A ← B → C)
  Example: A ice cream sales, B temperature, C bathing accidents

common effect (A → B ← C)
  Example: A influenza, B fever, C measles

(Figures: for each structure, the relation between A and C is shown once without and once with conditioning on B.)

Christian Borgelt Probabilistic Reasoning: Graphical Models 115

116 Common Cause Assumption (Causal Markov Assumption)

(Figure: a Y-shaped tube arrangement with inlet T and outlets L and R, together with the joint probability table of L and R, whose entries are 0 and 1/2.)

Y-shaped tube arrangement into which a ball is dropped (T). Since the ball can reappear either at the left outlet (L) or the right outlet (R), the corresponding variables are dependent.

Counter argument: The cause is insufficiently described. If the exact shape, position and velocity of the ball and the tubes are known, the outlet can be determined and the variables become independent.

Counter-counter argument: Quantum mechanics states that location and momentum of a particle cannot both be measured with arbitrary precision at the same time.

Christian Borgelt Probabilistic Reasoning: Graphical Models 116

117 Sensitive Dependence on the Initial Conditions

Sensitive dependence on the initial conditions means that a small change of the initial conditions (e.g. a change of the initial position or velocity of a particle) causes a deviation that grows exponentially with time.

Many physical systems show, for arbitrary initial conditions, a sensitive dependence on the initial conditions. Because of this, quantum mechanical effects sometimes have macroscopic consequences.

Example: Billiard with round (or generally convex) obstacles. (Figure: the initial imprecision of the ball's direction grows with every collision; after four collisions it has reached about 100 degrees.)

Christian Borgelt Probabilistic Reasoning: Graphical Models 117

118 Learning Graphical Models from Data Christian Borgelt Probabilistic Reasoning: Graphical Models 118

119 Learning Graphical Models from Data

Given:   A database of sample cases from a domain of interest.
Desired: A (good) graphical model of the domain of interest.

Quantitative or Parameter Learning
  The structure of the conditional independence graph is known.
  Conditional or marginal distributions have to be estimated by standard statistical methods. (parameter estimation)

Qualitative or Structural Learning
  The structure of the conditional independence graph is not known.
  A good graph has to be selected from the set of all possible graphs. (model selection)
  Tradeoff between model complexity and model accuracy.
  Algorithms consist of a search scheme (which graphs are considered?) and a scoring function (how good is a given graph?).

Christian Borgelt Probabilistic Reasoning: Graphical Models 119

120 Danish Jersey Cattle Blood Type Determination

A fraction of the database of sample cases:

y y f1 v2 f1 v2 f1 v2 f1 v2 v2 v2 v2v2 n y n y y y f1 v2 ** ** f1 v2 ** ** ** ** f1v2 y y n y y y f1 v2 f1 f1 f1 v2 f1 f1 f1 f1 f1f1 y y n n y y f1 v2 f1 f1 f1 v2 f1 f1 f1 f1 f1f1 y y n n y y f1 v2 f1 v1 f1 v2 f1 v1 v2 f1 f1v2 y y n y y y f1 f1 ** ** f1 f1 ** ** f1 f1 f1f1 y y n n y y f1 v1 ** ** f1 v1 ** ** v1 v2 v1v2 n y y y y y f1 v2 f1 v1 f1 v2 f1 v1 f1 v1 f1v1 y y y y

21 attributes, 500 real world sample cases.
A lot of missing values (indicated by **).

Christian Borgelt Probabilistic Reasoning: Graphical Models 120

121 Learning Graphical Models from Data: Learning the Parameters Christian Borgelt Probabilistic Reasoning: Graphical Models 121

122 Learning the Parameters of a Graphical Model

Given:   A database of sample cases from a domain of interest.
         The graph underlying a graphical model for the domain.
Desired: Good values for the numeric parameters of the model.

Example: Naive Bayes Classifiers

A naive Bayes classifier is a Bayesian network with a star-like structure.
  The class attribute is the only unconditioned attribute.
  All other attributes are conditioned on the class only.

(Figure: the class attribute C in the center, with arrows to the attributes A_1, A_2, A_3, A_4, ..., A_n.)

The structure of a naive Bayes classifier is fixed once the attributes have been selected. The only remaining task is to estimate the parameters of the needed probability distributions.

Christian Borgelt Probabilistic Reasoning: Graphical Models 122

123 Probabilistic Classification

A classifier is an algorithm that assigns a class from a predefined set to a case or object, based on the values of descriptive attributes.

An optimal classifier maximizes the probability of a correct class assignment.

Let C be a class attribute with dom(C) = {c_1, ..., c_{n_C}}, whose values occur with probabilities p_i, 1 ≤ i ≤ n_C. Let q_i be the probability with which a classifier assigns class c_i (q_i ∈ {0, 1} for a deterministic classifier).

The probability of a correct assignment is

P(\text{correct assignment}) = \sum_{i=1}^{n_C} p_i q_i.

Therefore the best choice for the q_i is

q_i = \begin{cases} 1, & \text{if } p_i = \max_{k=1}^{n_C} p_k, \\ 0, & \text{otherwise.} \end{cases}

Christian Borgelt Probabilistic Reasoning: Graphical Models 123
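A small numeric illustration with made-up class probabilities: deterministically assigning the most probable class yields a higher probability of a correct assignment than "probability matching", i.e. choosing q_i = p_i.

```python
# Hypothetical class probabilities p_i and two assignment strategies q_i.
p = [0.6, 0.3, 0.1]
argmax_q = [1.0, 0.0, 0.0]      # always assign the most probable class
matching_q = p[:]               # assign each class with its own probability

def accuracy(p, q):
    """P(correct assignment) = sum_i p_i * q_i."""
    return sum(pi * qi for pi, qi in zip(p, q))

print(accuracy(p, argmax_q))    # 0.6
print(accuracy(p, matching_q))  # 0.6*0.6 + 0.3*0.3 + 0.1*0.1 = 0.46
```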

124 Probabilistic Classification (continued)

Consequence: An optimal classifier should assign the most probable class.

This argument does not change if we take descriptive attributes into account. Let U = {A_1, ..., A_m} be a set of descriptive attributes with domains dom(A_k), 1 ≤ k ≤ m. Let A_1 = a_1, ..., A_m = a_m be an instantiation of the descriptive attributes. An optimal classifier should assign the class c_i for which

P(C = c_i \mid A_1 = a_1, ..., A_m = a_m) = \max_{j=1}^{n_C} P(C = c_j \mid A_1 = a_1, ..., A_m = a_m).

Problem: We cannot store a class (or the class probabilities) for every possible instantiation A_1 = a_1, ..., A_m = a_m of the descriptive attributes. (The table size grows exponentially with the number of attributes.)

Therefore: Simplifying assumptions are necessary.

Christian Borgelt Probabilistic Reasoning: Graphical Models 124

125 Bayes Rule and Bayes Classifiers

Bayes rule is a formula that can be used to invert conditional probabilities: Let X and Y be events, P(X) > 0. Then

P(Y \mid X) = \frac{P(X \mid Y) \cdot P(Y)}{P(X)}.

Bayes rule follows directly from the definition of conditional probability:

P(Y \mid X) = \frac{P(X \cap Y)}{P(X)}  and  P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)}.

Bayes classifiers: Compute the class probabilities as

P(C = c_i \mid A_1 = a_1, ..., A_m = a_m) = \frac{P(A_1 = a_1, ..., A_m = a_m \mid C = c_i) \cdot P(C = c_i)}{P(A_1 = a_1, ..., A_m = a_m)}.

Looks unreasonable at first sight: Even more probabilities to store.

Christian Borgelt Probabilistic Reasoning: Graphical Models 125
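A tiny numeric check of the inversion (all numbers are made up); P(X) is obtained via the law of total probability before Bayes' rule is applied.

```python
# Hypothetical numbers: P(Y), P(X | Y), P(X | not Y).
p_y = 0.2
p_x_given_y = 0.9
p_x_given_not_y = 0.3

# P(X) by the law of total probability, then Bayes' rule for P(Y | X).
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)
p_y_given_x = p_x_given_y * p_y / p_x
print(p_y_given_x)   # 0.18 / 0.42 = 0.4286...
```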

126 Naive Bayes Classifiers

Naive Assumption: The descriptive attributes are conditionally independent given the class.

Abbreviations: \vec{a} stands for the instantiation A_1 = a_1, ..., A_m = a_m and p_0 = P(\vec{a}).

Bayes Rule:

P(C = c_i \mid \vec{a}) = \frac{P(A_1 = a_1, ..., A_m = a_m \mid C = c_i) \cdot P(C = c_i)}{p_0}

Chain Rule of Probability:

P(C = c_i \mid \vec{a}) = \frac{P(C = c_i)}{p_0} \prod_{k=1}^{m} P(A_k = a_k \mid A_1 = a_1, ..., A_{k-1} = a_{k-1}, C = c_i)

Conditional Independence Assumption:

P(C = c_i \mid \vec{a}) = \frac{P(C = c_i)}{p_0} \prod_{k=1}^{m} P(A_k = a_k \mid C = c_i)

Christian Borgelt Probabilistic Reasoning: Graphical Models 126

127 Naive Bayes Classifiers (continued)

Consequence: Manageable amount of data to store. Store the distributions P(C = c_i) and, for 1 ≤ j ≤ m, P(A_j = a_j | C = c_i).

Classification: Compute for all classes c_i

P(C = c_i \mid A_1 = a_1, ..., A_m = a_m) \cdot p_0 = P(C = c_i) \prod_{j=1}^{m} P(A_j = a_j \mid C = c_i)

and predict the class c_i for which this value is largest.

Relation to Bayesian Networks: the naive Bayes classifier is exactly the Bayesian network with the star-like structure shown before; its decomposition formula is

P(C = c_i, A_1 = a_1, ..., A_m = a_m) = P(C = c_i) \prod_{j=1}^{m} P(A_j = a_j \mid C = c_i)

Christian Borgelt Probabilistic Reasoning: Graphical Models 127
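A minimal sketch of the classification step, assuming the distributions P(C) and P(A_j | C) have already been estimated and are stored as plain dictionaries; all names and numbers are made up. The common factor 1/p_0 is omitted because it does not change which class attains the maximum.

```python
# Stored distributions P(C) and P(A_j | C); all names and numbers are illustrative.
prior = {"c1": 0.4, "c2": 0.6}
cond = {
    "A1": {("c1", "x"): 0.7, ("c1", "y"): 0.3, ("c2", "x"): 0.2, ("c2", "y"): 0.8},
    "A2": {("c1", 0): 0.5, ("c1", 1): 0.5, ("c2", 0): 0.9, ("c2", 1): 0.1},
}

def classify(case):
    """Return the class maximizing P(C = c) * prod_j P(A_j = a_j | C = c)."""
    scores = {c: p for c, p in prior.items()}
    for attr, value in case.items():
        for c in scores:
            scores[c] *= cond[attr][(c, value)]
    return max(scores, key=scores.get), scores

print(classify({"A1": "x", "A2": 1}))   # c1: 0.4*0.7*0.5 = 0.14, c2: 0.6*0.2*0.1 = 0.012
```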

128 Naive Bayes Classifiers: Parameter Estimation

Estimation of Probabilities: Nominal/Categorical Attributes

\hat{P}(A_j = a_j \mid C = c_i) = \frac{\#(A_j = a_j, C = c_i) + \gamma}{\#(C = c_i) + n_{A_j}\,\gamma}

#(ϕ) is the number of example cases that satisfy the condition ϕ. n_{A_j} is the number of values of the attribute A_j.

γ is called Laplace correction. γ = 0: maximum likelihood estimation. Common choices: γ = 1 or γ = 1/2.

Laplace correction helps to avoid problems with attribute values that do not occur with some class in the given data. It also introduces a bias towards a uniform distribution.

Christian Borgelt Probabilistic Reasoning: Graphical Models 128
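A short sketch of the Laplace-corrected estimator; representing the sample cases as dictionaries mapping attribute names to values, and the example data, are illustrative assumptions of this sketch.

```python
from collections import Counter

def estimate_conditional(cases, attr, class_attr, gamma=1.0):
    """Laplace-corrected estimate of P(attr = a | class = c) from a list of
    sample cases (each case is a dict); gamma = 0 gives maximum likelihood."""
    values = sorted({case[attr] for case in cases})
    classes = sorted({case[class_attr] for case in cases})
    joint = Counter((case[class_attr], case[attr]) for case in cases)
    class_count = Counter(case[class_attr] for case in cases)
    return {(c, a): (joint[(c, a)] + gamma) / (class_count[c] + len(values) * gamma)
            for c in classes for a in values}

# Tiny made-up example: the value "z" never occurs with class "c2",
# but still gets a nonzero probability thanks to the Laplace correction.
cases = [{"C": "c1", "A": "x"}, {"C": "c1", "A": "z"},
         {"C": "c2", "A": "x"}, {"C": "c2", "A": "x"}]
print(estimate_conditional(cases, "A", "C", gamma=1.0))
```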

129 Naive Bayes Classifiers: Parameter Estimation

Estimation of Probabilities: Metric/Numeric Attributes — assume a normal distribution:

P(A_j = a_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_j(c_i)} \exp\!\left( -\frac{(a_j - \mu_j(c_i))^2}{2\sigma_j^2(c_i)} \right)

Estimate of the mean value:

\hat{\mu}_j(c_i) = \frac{1}{\#(C = c_i)} \sum_{k=1}^{\#(C = c_i)} a_j(k)

Estimate of the variance:

\hat{\sigma}_j^2(c_i) = \frac{1}{\xi} \sum_{k=1}^{\#(C = c_i)} \big( a_j(k) - \hat{\mu}_j(c_i) \big)^2

ξ = #(C = c_i):      maximum likelihood estimation
ξ = #(C = c_i) − 1:  unbiased estimation

Christian Borgelt Probabilistic Reasoning: Graphical Models 129
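A minimal sketch of the estimators for a numeric attribute, with the choice between the maximum likelihood and the unbiased variance estimate made explicit; the sample values are the ages of one class of the example on the next slide, used purely for illustration.

```python
import math

def estimate_normal(values, unbiased=True):
    """Estimate mean and variance of a normal distribution for one class.

    unbiased=True divides by n - 1; unbiased=False gives the maximum
    likelihood estimate (division by n)."""
    n = len(values)
    mu = sum(values) / n
    denom = (n - 1) if unbiased else n
    var = sum((v - mu) ** 2 for v in values) / denom
    return mu, var

def normal_density(x, mu, var):
    """Density used in place of P(A_j = a_j | C = c_i) for numeric attributes."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

ages_of_one_class = [20, 37, 48, 29, 30, 54]
mu, var = estimate_normal(ages_of_one_class)
print(mu, var, normal_density(35, mu, var))
```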

130 Naive Bayes Classifiers: Simple Example 1

A simple database and the estimated (conditional) probability distributions:

No  Sex     Age  Blood pr.  Drug
 1  male    20   normal     A
 2  female  73   normal     B
 3  female  37   high       A
 4  male    33   low        B
 5  female  48   high       A
 6  male    29   normal     A
 7  female  52   normal     B
 8  male    42   low        B
 9  male    61   normal     B
10  female  30   normal     A
11  female  26   low        B
12  male    54   high       A

Estimated distributions: P(Drug) over the classes A and B; P(Sex | Drug) over male and female; P(Age | Drug) as a normal distribution with parameters µ and σ per class; P(Blood pr. | Drug) over low, normal and high.

Christian Borgelt Probabilistic Reasoning: Graphical Models 130
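The estimated values can be obtained directly from the twelve cases above; the following sketch recomputes them (using the unbiased variance estimate for the age attribute, which is one of the two options from the previous slide — the slide itself may have used the maximum likelihood estimate instead).

```python
from collections import Counter

# The twelve example cases from the table above (age in years).
cases = [
    ("male", 20, "normal", "A"), ("female", 73, "normal", "B"),
    ("female", 37, "high", "A"), ("male", 33, "low", "B"),
    ("female", 48, "high", "A"), ("male", 29, "normal", "A"),
    ("female", 52, "normal", "B"), ("male", 42, "low", "B"),
    ("male", 61, "normal", "B"), ("female", 30, "normal", "A"),
    ("female", 26, "low", "B"), ("male", 54, "high", "A"),
]

drugs = [d for _, _, _, d in cases]
n_per_drug = Counter(drugs)
p_drug = {d: n / len(cases) for d, n in n_per_drug.items()}

sex_counts = Counter((d, s) for s, _, _, d in cases)
blood_counts = Counter((d, b) for _, _, b, d in cases)

print("P(Drug):", p_drug)
print("P(Sex | Drug):", {k: v / n_per_drug[k[0]] for k, v in sex_counts.items()})
print("P(Blood pr. | Drug):", {k: v / n_per_drug[k[0]] for k, v in blood_counts.items()})

for d in sorted(set(drugs)):
    ages = [a for _, a, _, dd in cases if dd == d]
    mu = sum(ages) / len(ages)
    var = sum((a - mu) ** 2 for a in ages) / (len(ages) - 1)   # unbiased estimate
    print(f"Age | Drug {d}: mu = {mu:.1f}, sigma^2 = {var:.1f}")
```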

131 Naive Bayes Classifiers: Simple Example 1

Classification of two new cases (c_1 and c_2 are the normalization constants):

P(Drug A | male, 61, normal)   = c_1 · P(Drug A) · P(male | Drug A) · P(61 | Drug A) · P(normal | Drug A)
P(Drug B | male, 61, normal)   = c_1 · P(Drug B) · P(male | Drug B) · P(61 | Drug B) · P(normal | Drug B)

P(Drug A | female, 30, normal) = c_2 · P(Drug A) · P(female | Drug A) · P(30 | Drug A) · P(normal | Drug A)
P(Drug B | female, 30, normal) = c_2 · P(Drug B) · P(female | Drug B) · P(30 | Drug B) · P(normal | Drug B)

The numeric values are obtained by inserting the estimates from the previous slide; the class with the larger value is predicted.

Christian Borgelt Probabilistic Reasoning: Graphical Models 131

132 Naive Bayes Classifiers: Simple Example 2 Data points from 2 classes Small squares: mean values Inner ellipses: one standard deviation Outer ellipses: two standard deviations Classes overlap: classification is not perfect (Figure: the data set and the fitted naive Bayes classifier) Christian Borgelt Probabilistic Reasoning: Graphical Models 132

133 Naive Bayes Classifiers: Simple Example 3 20 data points, 2 classes Small squares: mean values Inner ellipses: one standard deviation Outer ellipses: two standard deviations Attributes are not conditionally independent given the class. Naive Bayes Classifier Christian Borgelt Probabilistic Reasoning: Graphical Models 133

134 Naive Bayes Classifiers: Iris Data 150 data points, 3 classes: Iris setosa (red), Iris versicolor (green), Iris virginica (blue) Shown: 2 out of the 4 attributes sepal length, sepal width, petal length, petal width (one on the horizontal and one on the vertical axis) 6 misclassifications on the training data (with all 4 attributes) (Figure: the data set and the fitted naive Bayes classifier) Christian Borgelt Probabilistic Reasoning: Graphical Models 134
