Sistemi Cognitivi per le

Size: px
Start display at page:

Download "Sistemi Cognitivi per le"

Transcription

1 Università degli Studi di Genova Dipartimento di Ingegneria Biofisica ed Elettronica Sistemi Cognitivi per le Telecomunicazioni Prof. Carlo Regazzoni DIBE

2 An Introduction to Bayesian Networks Markov random fields This presentation is included in data fusion as a probabilistic situation awareness tool for problem representation: state estimation. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 2

3 Outline Probabilistic graphical models Bayesian Networks Representation of Bayesian Networks Dynamic Bayesian Networks Data Fusion with Dynamic Bayesian Networks Markov random fields (MRF) MRF: conditional independence MRF: Cliques MRF: Factorization MRF: Example of MRF Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 3

4 Part I A brief introduction to probabilistic graphical models Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 4

5 Probabilistic graphical models Definition: Diagrammatic representations of probability distribution. Properties: A simple way to visualize the structure of a probabilistic model Insights into the properties of the model, including conditional independence properties Expressing complex computations in terms of graphical manipulations Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 5

6 Probabilistic graphical models A graph comprises nodes/ vertices Nodes are connected by edges/ links/ arcs Each node represents one (or a group of) random variable The links express probabilistic relationship between these random variables The link Node a Node a Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 6

7 Probabilistic graphical models Two major classes of graphical models: Bayesian Networks (directed graphical models) Markov random fields (undirected graphical models) Directed graphical models: The links of the graphs have a particular directionality indicated by arrows. Undirected graphical models: The links do not carry arrows and have no directional significance. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 7

8 Probabilistic graphical models Directed graphical models: Useful for expressing causal relationships between random variables. Undirected graphical models: Better suited to express soft constraints between random variables. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 8

9 Part II Bayesian Networks Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 9

10 Bayesian networks To motivate the use of directed graphs consider the following example: Consider the joint distribution ib ti p( a, b, c ) over the random variables a, b, and c Applying the product rule we have: (,, ) = p( c a, b) p( a, b) p a b c = p( c a, b) p( b a) p( a) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 1

11 Bayesian networks First, we introduce a node for each random variable: a b c Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 11

12 Bayesian networks Then, for each conditional distribution we add directed links (arrows) from the nodes on which the distribution is conditioned. a b c Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 12

13 Bayesian networks Explanations: ( ) For the factor p b a there will be a link from node b to node a. We say that the node a is the parent of node b. And the node b is the child of node a. ( ) For the factor p c a, b there will be links from nodes a and b to node c. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 13

14 Bayesian networks ( ) The distribution of p a, b, c is symmetrical with respect to three random variables. In the previous example, the ordering a, b, c was chosen. a b c Different ordering results in different decompositions and hence a different graphical model. (,, ) = p ( cab, ) p ( ab, ) p abc p p = (, ) ( ) ( ) p c a b p b a p a Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 14

15 Bayesian networks Other ordering can be as follows: ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ordering a, b, c p a, b, c = p c a, b p b a p a ordering b, a, c p a, b, c = p c a, b p a b p b ordering a, c, b p a, b, c = p b a, c p c a p a ordering cab,, p abc,, = p bac, p ac p c ordering b, c, a p a, b, c = p a b, c p c b p b ordering c, b, a p a, b, c = p ab, c p b c p c Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 15

16 Bayesian networks The previous example can be extended. Consider the joint distribution over K variables. By repeated application of the product rule of probability we have: px,..., x = px x,..., x px x px ( ) ( ) ( ) ( ) 1 K k 1 K Each node has incoming links from all lower numbered nodes. This graph is fully connected because there is a link between every pair of nodes. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 16

17 Bayesian networks In general, the relationship between a directed graph and the corresponding distribution over random variables is given by the product over conditional distribution of all nodes of the graph conditioned on their parents. For a graph with K nodes the distribution is given by: K p x1 x = p x parent x (,..., ) ( ( )) K k k k = 1 Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 17

18 Bayesian networks: a general example from a high-level view Task: object recognition Each observed data point corresponds to the image of one object. The latent variables have an interpretation as the position and orientation of the object. Latent (hidden) variables are the ones that are not observed. The goal: given a particular observed image, what is the posterior distribution over objects integrating over all possible positions and orientations? Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 18

19 Bayesian networks: a general example from a high-level view The graphical model is as follows: Object Position Orientation Image Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 19

20 Conditional independence Conditional independence is an important concept for probability distribution over multiple variables. Having a distribution ib ti over three random variables a, b and c, consider that the conditional distribution of a, given b and c, is such that it does not depend on b: (, ) = p( ac) p ab c We say that a is conditionally independent of b given c. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 2

21 Conditional independence The Conditional independence can be expressed in a different way: p ( a, b c) = p( ab, c) p( b c ) = ( ) p ( ) p ac p bc Therefore, the joint distribution of a and b, conditioned on c, factorizes into the product of marginal probabilities of a and b, conditioned on c. The colored-part used the equation in the previous page. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 21

22 Conditional independence The two following formulas for conditional independence are equivalent. (, ) = p( ac) p ab c (, ) = p( ac) p( bc) p abc The above formulas must hold for every ypossible value of c. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 22

23 Conditional independence The graphical model for the aforementioned conditional independence is as follows: p abc,, = p ac p bc p c ( ) ( ) ( ) ( ) c a b Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 23

24 Investigating conditional independence In order to investigate the conditional independence between a and b given c, three possible combinations of the graphical model can be investigated. These combinations are called: tail-to-tail, head-totail, and head-to-head (next slide) For each combination, two cases are considered (next slide): c is not observed (c is shown using a normal node) ) c is observed (c is shown using a shaded node) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 24

25 Investigating conditional independence c c Tail-to-tail Head-to-tail a b a b a c b a c b a b a b Head-to-head c c is not observed c c is observed Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 25

26 Investigating conditional independence tail-to-tail: we can consider a simple graphical interpretation by considering the path from a to b via c. The node c is tail-to-tail to because it is connected to the tail of two arrows. head-to-tail: the node c is connected to the head of one arrow and to the tail of the other. head-to-head: the node c is connected to the tail of two arrows. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 26

27 Investigating conditional independence Investigating the conditional independence between a and b given c, when c is tail-to-tail: c is not observed c is observed c c a b a b c is not observed c is observed Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 27

28 Investigating conditional independence Case 1: tail-to-tail, c is not observed: we marginalize both sides of the following equation (representing the graph) with respect to c. p a, b, c = p a c p b c p c ( ) ( ) ( ) ( ) c a b Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 28

29 Investigating conditional independence Marginalizing the equation with respect to c: (, ) p( ac) p( bc) p( c) p ab = c In general, it does not factorize into the product That means, given the above equation, in general: (, ) p( a) p( b) p a b Therefore, a and b are not conditionally independent given a null set (nothing has been observed). p( a) p( b) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 29

30 Investigating conditional independence Graphical interpretation: Consider the path from a to b given c. If c is not observed, the presence of the path connecting a and b causes these nodes to be dependent. c a b Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 3

31 Investigating conditional independence Case 2: tail-to-tail, c is observed. Bayes theorem p ( abc, ) = = = (,, ) p( c) p a b c ( ) ( ) ( ) p ac p b c p c ( ) p c ( ) ( ) p ac p b c Therefore, a and b are conditionally independent given c. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 31

32 Investigating conditional independence Graphical interpretation: Consider the path from a to b given c. If c is observed, the conditioned node (c) blocks the path from a to b causes these nodes to become conditionally dependent. c a b Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 32

33 Investigating conditional independence Investigating other cases in the same way we have: head-to-tail, tail c is unobserved: a and b are dependent. head-to-tail, c is observed: a and b are independent. head-to-head, c is unobserved: a and b are dependent. head-to-head, c is observed: a and b are independent. The head-to-head case is inverse with respect to the other two cases. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 33

34 Investigating conditional independence The important feature of graphical models: The conditional independence of the joint distribution can be read directly from the graph. To this end, no analytical manipulation is needed. To read the conditional independence directly from the graph, a general framework can be derived by a reasoning similar to the previous example. The framework is called d-separation (d stands for directed). ) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 34

35 Bayesian Network for solving problems: Video object tracking - an example Consider the following example: A given object should be tracked in an image sequence. To represent the object, the position of the object should be estimated in an image frame. The object position is represented by one point in the image frame called the reference point. The reference point can be any given point, e.g. the center of the object. What is the joint distribution of all random variables involving in tracking the object? Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 35

36 Bayesian Network for solving problems: Video object tracking - solution Here is an object (square) in an image frame: The image frame is a 2D matrix: Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 36

37 Bayesian Network for solving problems: Video object tracking - solution The object shape is represented by its corners. The red boxes are the object corners. Here is the corner-based representation of the object. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 37

38 Bayesian Network for solving problems: Video object tracking - solution Video object tracking solution Here is the corner-based representation of the object. It can be shown as a binary matrix: Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 38

39 Bayesian Network for solving problems: Video object tracking - solution Video object tracking solution Also it is possible to consider one of the object pixels (e.g. the center) as the object position (the red box) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 39

40 Bayesian Network for solving problems: Video object tracking - solution Also it is possible to consider one of the object pixels (e.g. the center) as the object position (the red box). So, the shape model is formed using the green arrows. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 4

41 Bayesian Network for solving problems: Video object tracking - solution The shape, represented by corners, is formed the relative coordinates of corners with respect to the position: C1 (2,2) (2,-2) C4 C2 (-2,2) (-2,-2) C3 Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 41

42 Bayesian Network for solving problems: Video object tracking - solution To form the joint probability, to track the object in the next frame, we use the following notation: I N = x y I N N Image frame at time t: t image size: Object position at time t Object shape at time t: X = t {( x,y) 1 x x N,1 y y N} { } = { } S = S C X t i,t 1 i M i,t t 1 i M M is sthe number of corners es&c C is the corner coordinates Observations at time t: Z t = { Z1 Z N } (all image pixels, that may/may not be corners) S t = { } { X X } = ( 2,2 ),( 2, 2 ),( 2,2 ),( 2, 2) corner t Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 42

43 Bayesian Network for solving problems: Video object tracking - solution In the aforementioned example, the object position at time t-1 is used to build the object shape at time t-1. At time t, a corner extractor t extracts t the corners (observations). The object position at time t should be estimated. Therefore, we need to find the joint distribution over the above mentioned variables: p ( X,S,Z) = p( X,S, Z Z ) t t-1 t t t-1 1 N Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 43

44 Bayesian Network for solving problems: Video object tracking - solution From the shape represented before, it is clear that each observation (corner) can be available or unavailable independently from other corners. Given an object at time t-1, itsshape model can be formed. The shape at time t-1, is independent from the observations at time t. The position at time t is estimated using the shape at time t-1 and observations at time t. Therefore, position depends on both observations and shape. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 44

45 Bayesian Network for solving problems: Video object tracking - solution Using Bayesian network the joint distribution can be shown using the following graph: Z Z1 2 Z N St-1 X t Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 45

46 Bayesian Network for solving problems: Video object tracking - solution Using Bayesian network the joint distribution can be shown using the following graph: Z 1 Z 2 Z N S t-1 ( X,S,Z) = ( X,S, Z Z ) = ( X Z Z, S) ( S ) ( Z ) ( Z ) p p p p p p t t-1 t t t-1 1 N t 1 N t-1 1 N N X t = p ( Xt St-1) p( St-1) p( Xt Z n) p( Z n) N n= 1 n= 1 Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 46

47 Bayesian Network for solving problems: Video object tracking - solution At the current example, we consider pixels. = 14 9 = 126 We define a function q to show if a given pixel i is a corner q ( Z i ) = 1 or it is not a corner q ( Z i ) = p ( Z i ) is the a priori probability of the pixel i to be a corner. We define it as follows: p( Z i ) =.5 i,1 i N p ( St-1) is the a priori probability of having the shape S at time t-1. since we have just one shape, p ( S t-1 ) = 1 p ( Xt St-1) is the probability that each position Xcan be the object position at time t. since no observation is still observed, all positions have equal probabilities: Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 47 I t

48 Bayesian Network for solving problems: Video object tracking - solution p ( Xt St-1) = N = 126 Therefore, the only unknown term is the posterior probability bilit of the position given some observations: For each position i if no corner is observed, If a corner is observed at position Z then: p ( Xt Z i ) ( ) { Z i Sj,t-1} Z i ε > if Xt + j :1 j M = 1 δ if Xt { Z i + Sj,t-1 } card ( S) p ( Xt Z n ) ( Xt Z n ).8 p = card S is the cardinality of the shape, i.e. the number of shape corners. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 48

49 Bayesian Network for solving problems: Video object tracking - solution p ( X Z ) To make t i a probability function, the marginal probability over all positions must sum to 1: 1 card ( S) δ card ( S) + ε( N card ( S) ) = 1 N δ = ε 1 card ( S) For example, there are 4 corners in the current shape definition, and hence, card S = ( ) 4 ε =.1 N = 126 δ =.2195 ε If we consider and since we will have: Setting the small value of is to avoid a zero matrix. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 49

50 Bayesian Network for solving problems: Video object tracking - solution The aforementioned values indicate that if a corner is observed, 122 object pixels will have a probability equal to.1 1 to be the object position and 4 object pixels will have a probability of Now, consider an example in which 6 corners are observed (next slide). The corners that will be observed in this example are as follows: a= Z31 : q( 3,3) = 1 d = Z91 : q( 7,7) = 1 b = Z 87 : q ( 3,7 ) = 1 e= Z39 : q ( 11,3 113 ) = 1 c=z : q 7,3 = 1 f =Z : q 12,6 = 1 ( ) ( ) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 5

51 Bayesian Network for solving problems: Video object tracking - solution The observations set Z - Nothing is observed Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 51

52 Bayesian Network for solving problems: p Video object tracking - solution ( ) p( ) XZ = XS -1 =.8 X I t n t t t t Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 52

53 Bayesian Network for solving problems: Video object tracking - solution a was observed a Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 53

54 Bayesian Network for solving problems: Video object tracking - solution The probabilities of 4 positions increase. a Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 54

55 The new probability values: p( XZ t 31) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 55

56 Bayesian Network for solving problems: Video object tracking - solution b was observed a b Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 56

57 Bayesian Network for solving problems: Video object tracking - solution The probabilities of 4 positions increase. a b Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 57

58 Bayesian Network for solving problems: Video object tracking - solution The new probability values: p ( X ) t Z Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 58

59 Bayesian Network for solving problems: Video object tracking - solution c was observed a c b Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 59

60 Bayesian Network for solving problems: Video object tracking - solution The probabilities of 4 positions increase. a c b Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 6

61 Bayesian Network for solving problems: Video object tracking - solution The new probability values: p ( X ) t Z Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 61

62 Bayesian Network for solving problems: Video object tracking - solution d was observed a c b d Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 62

63 Bayesian Network for solving problems: Video object tracking - solution The probabilities of 4 positions increase. a c b d Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 63

64 Bayesian Network for solving problems: Video object tracking - solution The new probability values: p ( X ) t Z Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 64

65 Bayesian Network for solving problems: Video object tracking - solution e was observed a c e b d Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 65

66 Bayesian Network for solving problems: Video object tracking - solution The probabilities of 4 positions increase. a c e b d Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 66

67 Bayesian Network for solving problems: Video object tracking - solution The new probability values: p ( X ) t Z Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 67

68 Bayesian Network for solving problems: Video object tracking - solution f was observed a c e f b d Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 68

69 Bayesian Network for solving problems: Video object tracking - solution The probabilities of 4 positions increase. a c e f b d Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 69

70 Bayesian Network for solving problems: Video object tracking - solution The new probability values: p ( X ) t Z Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 7

71 Bayesian Network for solving problems: Video object tracking - solution Now, we have all probability values. Therefore: ( X, S ) ( ) ( ) ( ) ( -1, Z X S S t t t t t-1 t-1 Z n X Z t n ) p = p p p p = n= 1 n= constant ( Xt Z31) ( Xt Z35 ) ( Xt Z39 ) ( Xt Z68 ) ( Xt Z87 ) ( Xt Z91 ) p p p p p p Note that in the above multiplication, the. symbol is for entry-by-entry multiplication not the matrix multiplication. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 71

72 Bayesian Network for solving problems: Video object tracking - solution ( XZ) ( XZ ) ( XZ ) ( XZ ) ( XZ ) ( XZ) p p p p p p t 31 t 35 t 39 t 68 t 87 t x x x x x x x x x x x x x x Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 72

73 Bayesian Network for solving problems: Video object tracking - solution Multiplying all probability matrices, the posterior probability of the position will be maximized at position (5,5). 5) Therefore, the position (5,5) is the new position of the object. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 73

74 Bayesian Network for solving problems: Video object tracking - solution a c e f b d Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 74

75 Bayesian networks Representation Our goal is to represent the joint distribution over a set of random variables: {,, } χ = X1 Xn P Let us define Val ( X 1 ) as all the possible discrete assignments of X 1 The probability space of the full joint distribution is n i=1 val(x i ) (Eq.1) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 75

76 Bayesian networks Simplest Example Consider the problem of a company trying to hire a recent college graduate. The goal is to hire intelligent students but there is not possibility to measure intelligence directly. The company has access to the Scholastic Aptitude Test (SAT) scores, which are informative but not fully indicative. In this simple example, we induce two random variables: Intelligence that can take values {i,i 1 } and SAT score that can take values {s, s 1 }. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 76

77 Bayesian networks Simplest Example In this case that our random variables are discrete and binary-valued, the number of non-redundant parameters for a full distribution is defined by 2 n 1 (the last parameter is fully defined by the others) equal 2 to 2 1= 3 i i 1 Low Intelligence High Intelligence Intelligence I S P(I,S) i s.665 i s 1.35 s s 1 Low score High score SAT i 1 s.6 i 1 s Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 77

78 Bayesian networks Simplest Example Factorizing the joint distribution pis (, ) can give us more natural casuality meaning to the parameters. Our probabilty bilt distribution ib ti can be factorized as pis (, ) = pi () ps ( I) And the original distribution pis (, ) can be represented with the following CPTs: i i I s s 1 i.95.5 i Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 78

79 Bayesian networks Simplest Example From the mathematical perspective, the last alternative leads to the exactly same joint distribution (you can prove yourself by doing the math) The 2 defined CPTs required 3 non-redundant parameters to be fully specified. 1 parameter from the binomial distribution from first table and 2 parameters for the two binomial distributions of second table (one for every possible assigment of the parents) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 79

80 Bayesian networks Second Example Now let s assume that the company has also access to the student s grade in some course. We can enhance our Bayesian Network with the variable G that can take the values {g 1, g 2, g 3 } p ( ISG,, ) = p ( I ) p ( S I ) p ( G I ) g 1 g 2 A B g 3 C Grade Intelligence SAT I g 1 g 2 g 3 i i Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 8

81 Bayesian networks Second Example Some important remarks about the enhanced model: Although all the joint distribution changed, the CPTs defined in the first example are still valid. (adding nodes to some parts of the graphs does not mean change all the graph) According to Eg.(1), the total number of non-redudant parameters to describe the full distribution is 11 Using the current factorization allows to describe the distribution with only 7 non-redundant paramteres. Which mean that the factorized distribution is more compact. The less number of parameter come from the missing link between G and S. Implicitly this implies that Grades is conditional independent from SAT given Intelligence Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 81

82 Bayesian networks Third Example Lets consider a more complex scenario where we add two random variables D and L The grades now depend on how difficult the course is. It can take the values {d, d 1 }={easy {easy, hard} And a letter recommendation that can take values {l, l 1 } = {strong, weak} Difficulty Intelligence Grade SAT Letter Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 82

83 Bayesian networks Third Example d d Difficulty Intelligence i i g 1 g 2 g 3 i d i d i 1 d i 1 d Grade Letter l l 1 g g g SAT s s 1 i.95.5 i Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 83

84 Bayesian networks Third Example What is the factorization of the presented graph? What is the probability space dimension of the full probability distribution? What is the total t number of non-redudant d parameters of the full distribution? What is the total number of non-redudant parameters of the factorized distribution? Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 84

85 Bayesian networks Third Example What is the factorization of the presented graph? pidgsl (,,,, ) = pi () pdpg ( ) ( IDpS, ) ( I) plg ( ) What is the probability space dimension of the full probability distribution? 48 What is the total t number of non-redudant d parameters of the full posterior? 47 What is the total number of non-redudant parameters of the factorized distribution? 15 Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 85

86 Bayesian networks Third Example How to calculate entrances of the full distribution using the CPTs? Let s say we want to find the joint probability of: A student being intelligent. The course to be easy. To obtain a B in given course. Obtain a good score in SAT exam. Receive a strong recommendation letter. pi d g s l pi pd pg i d ps i pl g (,,,, ) = ( ) ( ) (, ) ( ) ( ) =.3*.6*.8*.8*.4 =.468 Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 86

87 Bayesian networks Behavior with Evidence Once selected the random variables, constructed the Graphical Model and set the CPTs, we can infer the posterior probability of an even given some evidence. py ( = y E= e) The naive way obtain the this posterior probability is by eliminating the entries in the joint inconsistent with our observation e and renormalize the result entries to sum up to 1. Then we compute the probability of the event y by summing the probabilities of all of the entries in the resulting posterior that are consistent with y. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 87

88 Bayesian networks Behavior with Evidence Let s see how the probabilities change once we get evidence. The probability to obtain a strong recommendation without 1 any evidence is around 5.2% pl ( ) If we know that the student is not intelligent, this probability decreases to 38.9% 1 pl ( i).389 If we discover that the course is an easy class, the probability increases again to 51.3% 1 pl ( i, d ).513 Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 88

89 Bayesian networks Behavior with Evidence Let s see another interesting example of the effect called explaining away. Without seeing any evidence, our belief that a student is intelligent is 3% pi ( 1 ) = 3.3 If we have the evidence that the student got a C in the course, the probability of being intelligent decreases to 79%but 7.9% at the same time the probability of the course being difficult increases from 4% to 62,9% pi g 1 3 ( ).79 pd g 1 3 ( ).629 If the students submits the SAT score with a high score, his probability of being intelligent goes from 7.9% to 57.8% and the probability of the course to be difficult to 76% p i g s ( i, ).578 p d g s ( d, ).76 Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 89

90 Bayesian networks Behavior with Evidence In the last example the high SAT score outweighs the poor grade because low intelligence students are extremely unlikely to receive high scores in SAT, whereas high intelligence students can still get C s if the course is difficult. Explaining away is an instance of a general reasoning pattern called intercasual reasoning, where different causes of the same effect can interact. This type of reasoning is avery common pattern in human reasoning. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 9

91 Bayesian networks Graph Independences What are the independence that can be drawn from the graph? ( L I, D, S G ) ( S D, G, L I) Difficulty Intelligence ( G S I, D ) ( I D) ( D IS, ) Grade Letter SAT Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 91

92 Dynamic Bayesian networks Dynamic Bayesian Networks (DBNs) can be considered as an extension of Bayesian Networks to handle temporal models. The term dynamic i is due to the fact that t they are use to represent a dynamic model (A model with a variable state over time) A DBN is defined by ( B, B ) where B defines the prior probability over the state and B is a two-slice temporal Bayes net (2TBN) which defines how the systems evolves in time. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 92

93 Dynamic Bayesian networks There are two types of edges (dependencies) that can be defined in a DBN. Intra-slice topology (within a slice) and inter-slice topology (between two slices) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 93

94 Dynamic Bayesian networks The decision of how to relate two variables, if either intra-slice (aka intra-time-slice) or inter-slice (aka intertime-slice) depends on how tight the coupling is between them. If the effect of one variable on the other is inmediate (much shorter then the time granularity) the influence should manifest as intra-slice edge. If the effect is slightly longer-term the influence should manifest as inter-slice edge. An inter-slice edge connecting two instances of the same variable is called persistence-edge Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 94

95 Dynamic Bayesian networks The DBN structure must satisfy the following assumptions: B The structure and CPDs in the time. do not change over Inter-slice arcs are all from left to right, in accordance with the temporal evolution. No cycles must be present in the intra-slice arcs. Thus we can view a DBN as a compact representation from which we can generate an infinite set of Bayesian networks (one for every T>) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 95

96 Dynamic Bayesian networks Hidden Markov Models (HMMs) and Kalman Filter Model (KFM) are specific nontrivial examples of DBNs. The are formed by one hidden variable with persistence links between time steps and one observed. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 96

97 Dynamic Bayesian networks HMM HMM is characterized by one dicrete hidden node. The probabilities that have to be defined are: px ( ) that is the initial state distribution and represents the uncertainty t on the intial value of thestate. t px ( k xk 1) that is the transition model. It describes how the state evolves in time. pz ( k xk ) that is the observation model and represents how the observations are related and generated by the hidden state. It is also called likelihood. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 97

98 Dynamic Bayesian networks KFM Revisited KFM is characterized by one continuous hidden node. All nodes are assumed to be linear-gaussian distributions. ib ti The probabilities then defined as: px ( ) = Nx (, Q) Initial state Transition model Observation model px ( x ) = Ν ( Fx + Gu, Q ) k k 1 k 1 k pz ( x) = Ν( Hx, V ) k k Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 98

99 Dynamic Bayesian networks Data Fusion There mainly three ways to fuse observations in DBNs Conditionally independent fusion Linearly condiationally dependent fusion Conditionally dependent d fusion Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 99

100 Dynamic Bayesian networks Data Fusion Mathematically this relations can be expressed, i 1 defining {, 2,..., L Z k = z k z k z k } as the set of different observations (or sensors), as: Conditionally indepent fusion pz x pz x pz x pz x 1 2 L ( k k) = ( k k) ( k k)... ( k k) Linearly condiationally dependent fusion pz x pz x pz x pz x L L ( k k) = α k ( k k) + α k ( k k) α k ( k k) Subject to: L i α k = 1 Conditionally dependent fusion i pz ( x) = pz ( z, x) pz ( z, x)... pz ( x) L 1: L 1 L 1 1: L 2 1 k k k k k k k k k k 1 Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni

101 Dynamic Bayesian networks Data Fusion Let s consider the example of visual tracking: The likelihood coming from motion and color can be taken as conditionally independent (the motion of the object can be assumed not correlated to it s motion) The problem of fusing different cues in order to create active discriminative appearance models using different color spaces can be fused with conditional linear dependency where more weight is given to the cue that is more discriminative at that time step. Now consider the case where we not only want to use different color spaces but we want to actively find the color space that best separates foreground and background. Then try to find the best color description in this color space. In this case, the color description depends on the color space chosen. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 11

102 Dynamic Bayesian networks - Advance DBN structures The DBN in the figure is constructed from HMMs and is called factorial HMM. This type of model is very useful in a variety of appliacations, for example, when several sources of sound are being heard simultaneously through a single microphone. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 12

103 Dynamic Bayesian networks - Advance DBN structures The DBN in the figure is called coupled HMM. This type of model is constructed from a set of chains, with each chain having its own observation. Chains interact t via their state t variable affecting adjacent chains. This kind of HMM is useful, for example, to model interaction between different interacting objects. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 13

104 Bayesian networks belief propagation Consider the simplest tree structured network: X Y { } If evidence e= Y= y is observed, then from Bayes rule, the belief distribution of is given by: X ( ) = ( ) =α ( ) λ( ) BEL x p x p x x ( ) ( ) λ x is the likelihood vector e α p ( e ) p x is the prior probability of x = 1 Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 14

105 Bayesian networks belief propagation ( x) p( x) p( y x) λ = e = Y= = Myx Where M yx is the conditional probability matrix: ( ) ( ) ( 1 1) ( 2 1) ( n 1) ( 1 2 ) ( 2 2 ) ( n 2 ) M = p y x = p Y= y X= x = yx p y x p y x p y x p y x p y x p y x p y 1 x p y 2 x p y x ( m ) ( m ) ( n m ) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 15

106 Bayesian networks belief propagation If Y is not observed directly, but it is supported by indirect observation e= { Z= z} we still have: ( ) = ( e) =α ( ) λ( ) BEL x p x p x x X Y Z But the likelihood vector can no longer be directly obtained from M but it must reflect the matrix M yx zyas well. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 16

107 Bayesian networks belief propagation ( x ) p ( e x ) p ( e x, y ) p ( y x ) λ = = = ( ) y y ( e ) ( ) = λ ( ) p y p y x M y λ y can also be obtained from yx M z y Y separates X from the evidence Therefore, the belief of a node in a chain can be obtained by ypropagating p gthe likelihood vector: M ut M M M x u yx zy T U X Y Z λ ( t) ( u) λ λ ( x) λ ( y) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 17

108 Bayesian networks bidirectional belief propagation Now consider that two evidences are observed at the + - two sides of the chain, we call them and + e In this way we have: U X Y e - ( ) = ( e +, e ) = α ( e, e + ) ( e + ) = απ( ) λ( ) BEL x p x p x p x x x e e π ( ) ( + x p x ) = e The posterior probability of x given the evidence ( ) ( ) λ x = p e x The likelihood of x + e Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 18

109 Bayesian networks bidirectional belief propagation Again, if Y is not directly observed, we know how to calculate the likelihood vector in the chain. But, if U is not directly observed, we need + to - e e propagate + the information about π ( x) from e down the chain. posterior probability + e T U X Y Z - e π ( ) ( + ) ( + x p x p x u, ) p( + e e u e ) = = = u u p( xu) π ( u) = π ( u) M xu U separates X from the evidence Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 19

110 Bayesian networks bidirectional belief propagation + e π is forward-propagated. λ is backward-propagated. Each node computes it own after obtaining the of its parent. Each node computes it own after obtaining the of its child. π λ π ( t ) π ( u ) π ( x ) π ( y ) π ( z ) T U X Y Z λ ( t) λ ( u) λ ( x) λ ( y) λ ( z) π λ e Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 11

111 Bayesian networks: summary Properties: It specifies a factorization of the joint distribution into a product of local conditional distribution. It also defines a set of conditional independence properties that must be satisfied by the distribution that factorizes according to the graph. Drawback: Due to presence of paths having head-to-head nodes, the d-separation test is somewhat subtle. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 111

112 part III Markov Random Fields Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 112

113 Markov random fields A Markov random field (MRF) is also known as Markov network or undirected graphical model. It has a set of nodes and a set of links (like Bayesian network). The links are undirected. An MRF is an alternative graphical semantics. Using an MRF, the conditional independence is determined by simple graph separation and easier than a Bayesian network. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 113

114 Markov random fields By removing directionality from the links, the asymmetry between parent and child nodes is removed. Therefore, the head-to-head nodes no longer arise. Using an MRF, the conditional independence is determined by simple graph separation and easier than a Bayesian network. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 114

115 MRF: conditional independence Assume the following MRF: We identify three sets of nodes: A, B, and C. We want to test the conditional independence between set A and set B. A C B Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 115

116 MRF: conditional independence To test the conditional independence: We consider all possible paths that connect nodes in set A to nodes in set B. If all such paths pass through one or more nodes in C, all such paths are blocked, so the conditional independence holds. A C B Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 116

117 MRF: conditional independence To test the conditional independence (continued): If there is at least one path that is not blocked, the property does not necessarily hold. More precisely: there exist at least some distributions corresponding to the graph that do not satisfy the conditional independence relation. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 117

118 MRF: conditional independence An alternative way to test the conditional independence: To remove all nodes in set C together with all links that connect to these nodes. Is there a path that connects any node in A to any node in B? A C B Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 118

119 MRF: factorization If two nodes A and B are not connected by a link, they must be conditionally independent, given all other nodes in the graph. The reason is that: There is no direct path between A and B. All other paths pass through nodes that are observed (they are blocked). Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 119

120 MRF: factorization Markov blanket of a node A consists of the set of neighboring nodes. The conditional distribution ib ti of A, conditioned d on all other variables in the graph, depends only on the variables in the Markov blanket (previous slide). A Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 12

121 MRF: factorization The conditional independence property between and can be expressed as: x j x i (, ) { } = { } ( ) ( ) { } p x x X p x X p x X i j \ i, j i \ i, j j \ i, j X X i denotes the set of all variables, with and \{ i, j} removed. x x j Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 121

122 MRF: factorization - clique To generalize factorization, we need to define another graphical concept called clique. A clique is a subset of nodes such hthat tthere is a link between all pairs of nodes in the subset (fully connected). ) A maximal clique is a clique such that it is not possible to add any other node to it. An example of cliques appears in the next slide. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 122

123 MRF: factorization - clique Cliques of two nodes: x, x, x, x, x, x, x, x, x, x { } { } { } { } { } Maximal cliques: x, x, x, x, x, x { } { } x 1 x 2 x 3 x 4 Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 123

124 MRF: factorization - clique We define the factors in the decomposition of joint distribution to be functions of variables in the cliques. To avoid loss of generality, we can define them over the maximal cliques. We denote a clique by C The set of variables in that clique is denoted by X C We denote potential functions over the maximal ψ X cliques by ( ) C C Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 124

125 MRF: factorization The joint distribution is written as the product of potential functions. p 1 X = ψ X & Z = ψ X Z ( ) ( ) ( ) C C C C C X C Z is called partition function (normalization constant). Considering only potential functions that ψc ( X C ) we ensure that p ( X ) Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 125

126 Markov random field: illustrative example Exploiting dependencies using Graph cuts algorithm on MRF Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 126

127 Markov random field: summary Properties: It specifies a factorization of the joint distribution into a product of potential functions defined over the maximal cliques. It also defines a set of conditional independence properties. Investigating the conditional independence properties is done by simple graph separation. Determining the conditional independence properties is easier than that of Bayesian networks. Corso di Sistemi Cognitivi per le Telecomunicazioni Prof. C. Regazzoni 127

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

9 Forward-backward algorithm, sum-product on factor graphs

9 Forward-backward algorithm, sum-product on factor graphs Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 9 Forward-backward algorithm, sum-product on factor graphs The previous

More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

Machine Learning Lecture 14

Machine Learning Lecture 14 Many slides adapted from B. Schiele, S. Roth, Z. Gharahmani Machine Learning Lecture 14 Undirected Graphical Models & Inference 23.06.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed

More information

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4 ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Introduction to Probabilistic Graphical Models

Introduction to Probabilistic Graphical Models Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in

More information

Template-Based Representations. Sargur Srihari

Template-Based Representations. Sargur Srihari Template-Based Representations Sargur srihari@cedar.buffalo.edu 1 Topics Variable-based vs Template-based Temporal Models Basic Assumptions Dynamic Bayesian Networks Hidden Markov Models Linear Dynamical

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

p L yi z n m x N n xi

p L yi z n m x N n xi y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

More information

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma COMS 4771 Probabilistic Reasoning via Graphical Models Nakul Verma Last time Dimensionality Reduction Linear vs non-linear Dimensionality Reduction Principal Component Analysis (PCA) Non-linear methods

More information

Probabilistic Graphical Models (I)

Probabilistic Graphical Models (I) Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random

More information

Recall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem

Recall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem Recall from last time: Conditional probabilities Our probabilistic models will compute and manipulate conditional probabilities. Given two random variables X, Y, we denote by Lecture 2: Belief (Bayesian)

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Roman Barták Department of Theoretical Computer Science and Mathematical Logic Summary of last lecture We know how to do probabilistic reasoning over time transition model P(X t

More information

CS 188: Artificial Intelligence. Bayes Nets

CS 188: Artificial Intelligence. Bayes Nets CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew

More information

Learning from Sensor Data: Set II. Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University

Learning from Sensor Data: Set II. Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University Learning from Sensor Data: Set II Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University 1 6. Data Representation The approach for learning from data Probabilistic

More information

Bayesian Network Representation

Bayesian Network Representation Bayesian Network Representation Sargur Srihari srihari@cedar.buffalo.edu 1 Topics Joint and Conditional Distributions I-Maps I-Map to Factorization Factorization to I-Map Perfect Map Knowledge Engineering

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Directed Graphical Models or Bayesian Networks
