Course 16:198:520: Introduction To Artificial Intelligence Lecture 9. Markov Networks. Abdeslam Boularias. Monday, October 14, 2015


1 Course 16:198:520: Introduction To Artificial Intelligence. Lecture 9: Markov Networks. Abdeslam Boularias. Monday, October 14, 2015. 1 / 58

2 Overview Bayesian networks, presented in the previous lecture, are one type of graphical model. Bayesian networks are mostly used for diagnosis, for example in medicine and in business analytics. In this lecture, we focus on another important class of graphical models known as Markov Networks (a.k.a. Markov Random Fields). Markov Networks are extensively used in computer vision and image processing. Andrey Markov 2 / 58

3 Application 1: Image denoising (from Christopher Burger, Christian Schuler and Stefan Harmeling, CVPR 2012). 3 / 58

4 Application 1: Image denoising (from Pattern Recognition and Machine Learning by Christopher Bishop). [Figure: original image, noisy image, and the image reconstructed using a Markov Network.] Illustration of image de-noising using a Markov random field. The top row shows the original binary image on the left and the corrupted image after randomly changing 10% of the pixels on the right. The bottom row shows the restored images obtained using iterated conditional modes (ICM) on the left and using the graph-cut algorithm on the right. ICM produces an image where 96% of the pixels agree with the original image. 4 / 58

5 Application 1: Image denoising (from Pattern Recognition and Machine Learning by Christopher Bishop). Figure 8.31: An undirected graphical model representing a Markov random field for image de-noising, in which x_i is a binary variable denoting the state of pixel i in the unknown noise-free image, and y_i denotes the corresponding value of pixel i in the observed noisy image. We construct a graph (network), where each node corresponds to a pixel in the image. Nodes are binary random variables (e.g. blue or yellow). Nodes X describe the true color of each pixel. Nodes Y describe the observed color of each pixel. Adjacent pixels correspond to adjacent nodes in the graph. The complete energy function for the model is E(x, y) = h Σ_i x_i − β Σ_{i,j} x_i x_j − η Σ_i x_i y_i (8.42), where the term h x_i biases the model towards pixel values that have one particular sign in preference to the other. This defines a joint distribution over x and y given by p(x, y) = (1/Z) exp{−E(x, y)} (8.43). Fixing the elements of y to the values observed in the noisy image implicitly defines a conditional distribution p(x | y) over noise-free images. Inference problem: Find the Maximum A Posteriori (MAP) argmax_X P(X | Y). 5 / 58
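
As a concrete illustration of this energy-based formulation, here is a minimal Python sketch (not part of the original lecture) of the pairwise denoising model with iterated conditional modes (ICM) as the inference routine; pixels are encoded as ±1 as in Bishop, and the coefficients h, beta and eta are illustrative values.

```python
# A minimal sketch of the Bishop-style denoising energy with ICM inference.
# x[i, j], y[i, j] take values in {-1, +1}; h, beta, eta are illustrative values.
import numpy as np

def local_energy(x, y, i, j, value, h=0.0, beta=1.0, eta=2.0):
    """Terms of E(x, y) that involve pixel (i, j) when x[i, j] = value."""
    e = h * value - eta * value * y[i, j]
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):          # 4-neighbourhood
        ni, nj = i + di, j + dj
        if 0 <= ni < x.shape[0] and 0 <= nj < x.shape[1]:
            e -= beta * value * x[ni, nj]
    return e

def icm_denoise(y, sweeps=5):
    """Iterated conditional modes: greedily set each pixel to the value lowering E(x, y)."""
    x = y.copy()
    for _ in range(sweeps):
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                if local_energy(x, y, i, j, -1) < local_energy(x, y, i, j, +1):
                    x[i, j] = -1
                else:
                    x[i, j] = +1
    return x

# Usage: y = np.where(np.random.rand(32, 32) > 0.5, 1, -1); x_hat = icm_denoise(y)
```

ICM is a coordinate-wise greedy minimizer of E(x, y), which is why it only reaches a local optimum, in contrast with the graph-cut result mentioned in the figure above.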

6 Application 2: Background/Foreground separation. Segmentation of Multivariate Mixed Data via Lossy Coding and Compression. Yi Ma, Harm Derksen, Wei Hong and John Wright. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007. 6 / 58

7 Application 3: detecting graspable parts of objects from depth images 7 / 58

8 Markov Networks Markov networks are useful for applications where relations between variables are symmetrical. Bayesian networks represent causal relations, e.g. Smoking → Cancer. Markov networks represent correlations between variables, e.g. Bob is a democrat ↔ Alice is a democrat: when Bob and Alice are friends, these two variables depend on each other, but neither is the cause of the other. A Markov network is a set of random variables having a Markov property described by an undirected graph. 8 / 58

9 Markov Networks An example of a Markov random field. Each edge represents dependency. In this example: A depends on B and D. B depends on A and D. D depends on A, B, and E. E depends on D and C. C depends on E. (From Wikipedia) 9 / 58

10 Complete subgraphs (cliques) Definition A complete subgraph, or a clique, is a subgraph where every two vertices are connected to each other. Example 1-vertex cliques: C 1 = {A}, C 2 = {B}, C 3 = {C}, C 4 = {D}, C 5 = {E}. 2-vertex cliques: C 6 = {A, B}, C 7 = {A, D}, C 8 = {B, D}, C 9 = {D, E}, C 10 = {E, C}. 3-vertex cliques: C 11 = {A, B, D}. 10 / 58

11 Factor Potential Definition Let X be a set of random variables. We define a factor potential to be a function from values of X to R+. Example Let X = {BobDemocrat, AliceDemocrat}. We define the following factor potential φ: φ([BobDemocrat = true, AliceDemocrat = true]) = 21, φ([BobDemocrat = true, AliceDemocrat = false]) = 2.7, φ([BobDemocrat = false, AliceDemocrat = true]) = 3.6, φ([BobDemocrat = false, AliceDemocrat = false]) = … 11 / 58
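
In code, such a potential is just a lookup table from joint assignments to positive numbers; here is a minimal sketch (the fourth value is assumed purely for illustration).

```python
# A factor potential over X = {BobDemocrat, AliceDemocrat} as a lookup table
# mapping each joint assignment to a positive real number.
phi = {
    (True, True):   21.0,
    (True, False):  2.7,
    (False, True):  3.6,
    (False, False): 7.0,   # assumed value for illustration
}

print(phi[(True, False)])   # 2.7
```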

12 Definition of a Markov Network Definition Let X = {X_1, X_2,..., X_n} be a set of random variables and G a graph that has X as vertices. Let C = {C_1, C_2,..., C_m} be the set of all the complete subgraphs in G. Let φ_1, φ_2,..., φ_m be factor potentials defined over C_1, C_2,..., C_m respectively. X is a Markov network if: P(X_1, X_2,..., X_n) = (1/Z) φ_1(C_1) φ_2(C_2) ··· φ_m(C_m). Z is a normalization constant, called the partition function; it is defined as Z = Σ_{X_1, X_2,..., X_n} φ_1(C_1) φ_2(C_2) ··· φ_m(C_m) (each C_i is a subset of {X_1, X_2,..., X_n}). 12 / 58

13 Example 1 1-vertex cliques: C_1 = {A}, C_2 = {B}, C_3 = {C}, C_4 = {D}, C_5 = {E}. 2-vertex cliques: C_6 = {A, B}, C_7 = {A, D}, C_8 = {B, D}, C_9 = {D, E}, C_10 = {E, C}. 3-vertex cliques: C_11 = {A, B, D}. Assume the random variables {A, B, C, D, E} are boolean. Define a factor potential φ_i for each clique C_i. Example: φ_6 is defined over the clique C_6 = {A, B}: φ_6([A = true, B = true]) = 21, φ_6([A = true, B = false]) = 2.7, φ_6([A = false, B = true]) = 3.6, φ_6([A = false, B = false]) = … 13 / 58

14 Example 1 Probability of (A = true, B = false, C = true, D = true, E = false) is given as: P(A = true, B = false, C = true, D = true, E = false) = (1/Z) φ_1(C_1) φ_2(C_2) φ_3(C_3) φ_4(C_4) φ_5(C_5) φ_6(C_6) φ_7(C_7) φ_8(C_8) φ_9(C_9) φ_10(C_10) φ_11(C_11). The variables inside each clique C_i take their corresponding values from (A = true, B = false, C = true, D = true, E = false). Z = Σ_{(A,B,C,D,E) ∈ {true,false}^5} Π_{i=1}^{11} φ_i(C_i). 14 / 58
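
For a network this small, both the partition function and any joint probability can be computed by brute-force enumeration of the 2^5 assignments. A minimal sketch in Python, with randomly chosen potential tables standing in for the unspecified φ_i:

```python
# Brute-force Z and one joint probability for Example 1 (assumed potential values).
import itertools
import random

variables = ["A", "B", "C", "D", "E"]
cliques = [("A",), ("B",), ("C",), ("D",), ("E",),
           ("A", "B"), ("A", "D"), ("B", "D"), ("D", "E"), ("E", "C"),
           ("A", "B", "D")]

random.seed(0)
# One table per clique: maps the tuple of values of its variables to a positive number.
potentials = [{vals: random.uniform(0.5, 5.0)
               for vals in itertools.product([True, False], repeat=len(c))}
              for c in cliques]

def unnormalized(assignment):
    """Product of all factor potentials evaluated at a full assignment (a dict)."""
    p = 1.0
    for clique, phi in zip(cliques, potentials):
        p *= phi[tuple(assignment[v] for v in clique)]
    return p

Z = sum(unnormalized(dict(zip(variables, vals)))
        for vals in itertools.product([True, False], repeat=5))

x = {"A": True, "B": False, "C": True, "D": True, "E": False}
print("P(x) =", unnormalized(x) / Z)
```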

15 Representation, Example 2. [Figure: a Markov network over four binary random variables P_1, P_2, P_3, P_4, standing for TB Patient 1, TB Patient 2, TB Patient 3 and TB Patient 4, shown together with its tables of factor potentials. Potentials of 1-vertex cliques are denoted by π(.) and potentials of 2-vertex cliques are denoted by π(.,.).] Two variables are linked if the corresponding patients had a physical contact with each other. (a) A simple Markov network describing the tuberculosis status of four patients. The links between patients indicate which patients have been in contact with each other. (b) The same Markov network, together with the node and edge potentials. Example taken from Daphne Koller, Nir Friedman, Lise Getoor and Ben Taskar. Graphical Models in a Nutshell. 15 / 58

16 Independence properties in Markov networks Let X be a Markov network. We use the notation (X ⊥ Y | Z) to indicate that X is independent of Y given Z, i.e. P(X, Y | Z) = P(X | Z) P(Y | Z), or in other terms P(X | Y, Z) = P(X | Z). 16 / 58

17 Independence properties in Markov networks Let X be a Markov network. We use the notation (X ⊥ Y | Z) to indicate that X is independent of Y given Z, i.e. P(X, Y | Z) = P(X | Z) P(Y | Z), or in other terms P(X | Y, Z) = P(X | Z). We show here some nice independence properties in Markov networks. Independence properties are useful for fast inference: we don't have to enumerate all the possible combinations of values of all the variables. 17 / 58

18 Independence properties in Markov networks Let X be a Markov network. We use the notation (X ⊥ Y | Z) to indicate that X is independent of Y given Z, i.e. P(X, Y | Z) = P(X | Z) P(Y | Z), or in other terms P(X | Y, Z) = P(X | Z). We show here some nice independence properties in Markov networks. Independence properties are useful for fast inference: we don't have to enumerate all the possible combinations of values of all the variables. Pairwise Markov property Any two non-adjacent variables, X_i and X_j, are conditionally independent of each other given all other variables X \ {X_i, X_j}: (X_i ⊥ X_j) | X \ {X_i, X_j}. In other terms, P(X_i | X_j, X \ {X_i, X_j}) = P(X_i | X \ {X_i, X_j}). 18 / 58

19 Example A and E are independent of each other, given B, C, and D. 19 / 58
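
This property can be checked numerically on the example network. Below is a minimal sketch (with arbitrary made-up potentials, since only φ_6 is specified in the slides) that verifies P(A | E, B, C, D) = P(A | B, C, D) by brute-force summation.

```python
# Verify the pairwise Markov property (A ⊥ E | B, C, D) on the 5-node example.
import itertools
import random

random.seed(1)
vals = [True, False]
cliques = [("A", "B"), ("A", "D"), ("B", "D"), ("D", "E"), ("E", "C"), ("A", "B", "D")]
# One random positive table per clique (1-vertex potentials are left at 1 for brevity).
potentials = [{v: random.uniform(0.5, 5.0) for v in itertools.product(vals, repeat=len(c))}
              for c in cliques]

def joint(x):
    """Unnormalized joint: product of all clique potentials at the assignment x (a dict)."""
    p = 1.0
    for c, phi in zip(cliques, potentials):
        p *= phi[tuple(x[v] for v in c)]
    return p

def prob(query, given):
    """P(query | given) by summing the unnormalized joint over the remaining variables."""
    def total(cond):
        return sum(joint(dict(zip("ABCDE", assignment)))
                   for assignment in itertools.product(vals, repeat=5)
                   if all(dict(zip("ABCDE", assignment))[k] == v for k, v in cond.items()))
    return total({**query, **given}) / total(given)

given = {"B": True, "C": False, "D": True}
lhs = prob({"A": True}, {**given, "E": True})   # P(A = true | E, B, C, D)
rhs = prob({"A": True}, given)                  # P(A = true | B, C, D)
print(lhs, rhs, abs(lhs - rhs) < 1e-9)          # the two values coincide
```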

20 Independence properties in Markov networks Local Markov property A variable is conditionally independent of all other variables given its neighbors: (X_i ⊥ X \ {X_i, neighbors(X_i)}) | neighbors(X_i). 20 / 58

21 Independence properties in Markov networks Example [Figure: the tuberculosis Markov network over TB Patient 1 to TB Patient 4 (variables P_1, P_2, P_3, P_4) from Example 2, shown with its tables of factor potentials.] This Markov Network describes the following local Markov assumptions: (P_1 ⊥ P_4 | P_2, P_3), (P_2 ⊥ P_3 | P_1, P_4). 21 / 58

22 Independence properties in Markov networks Global Markov property Any two subsets of variables, X_A ⊆ X and X_B ⊆ X, are conditionally independent given a separating subset X_S ⊆ X: (X_A ⊥ X_B) | X_S, where every path from a node in X_A to a node in X_B passes through X_S. 22 / 58

23 Example X 1 X 2 X 3 X 4 X 5 X 6 X 7 X 8 X 9 X 10 X 11 X 12 X 13 X 14 X 15 X 16 X 17 X 18 X 19 X 20 X 21 A Markov Network 23 / 58

24 Example X 1 X 2 X 3 X 4 X 5 X 6 X 7 X 8 X 9 X 10 X 11 X 12 X 13 X 14 X 15 X 16 X 17 X 18 X 19 X 20 X 21 Set X S = {X 4, X 11, X 17 } separates sets X A = {X 1, X 2, X 3, X 8, X 9, X 10, X 15, X 16 } and X B = {X 5, X 6, X 7, X 12, X 13, X 14, X 18, X 19, X 20, X 21 } 24 / 58

25 Example X 1 X 2 X 3 X 4 X 5 X 6 X 7 X 8 X 9 X 10 X 11 X 12 X 13 X 14 X 15 X 16 X 17 X 18 X 19 X 20 X 21 Set X S = {X 4, X 11, X 17 } separates sets X A = {X 1, X 2, X 3, X 8, X 9, X 10, X 15, X 16 } and X B = {X 5, X 6, X 7, X 12, X 13, X 14, X 18, X 19, X 20, X 21 } Example: any path between X 8 and X 13 should pass through X 4, X 11 or X 17. 25 / 58

26 Example X 1 X 2 X 3 X 4 X 5 X 6 X 7 X 8 X 9 X 10 X 11 X 12 X 13 X 14 X 15 X 16 X 17 X 18 X 19 X 20 X 21 Set X S = {X 4, X 11, X 17 } separates sets X A = {X 1, X 2, X 3, X 8, X 9, X 10, X 15, X 16 } and X B = {X 5, X 6, X 7, X 12, X 13, X 14, X 18, X 19, X 20, X 21 } Example: any path between X 8 and X 13 should pass through X 4, X 11 or X 17. 26 / 58

27 Example X 1 X 2 X 3 X 4 X 5 X 6 X 7 X 8 X 9 X 10 X 11 X 12 X 13 X 14 X 15 X 16 X 17 X 18 X 19 X 20 X 21 Set X S = {X 4, X 11, X 17 } separates sets X A = {X 1, X 2, X 3, X 8, X 9, X 10, X 15, X 16 } and X B = {X 5, X 6, X 7, X 12, X 13, X 14, X 18, X 19, X 20, X 21 } Example: P (X 8 | X 13, X 4, X 11, X 17 ) = P (X 8 | X 4, X 11, X 17 ), and P (X 13 | X 8, X 4, X 11, X 17 ) = P (X 13 | X 4, X 11, X 17 ). 27 / 58
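
Checking whether a candidate set X_S separates X_A from X_B is a plain graph-reachability question: remove the nodes of X_S and test whether any node of X_A can still reach a node of X_B. A minimal sketch on the small 5-node network from the earlier example (the grid above works the same way, given its adjacency list):

```python
# Check graph separation, the graph-theoretic condition behind the global Markov property.
from collections import deque

edges = [("A", "B"), ("A", "D"), ("B", "D"), ("D", "E"), ("E", "C")]

def separates(edges, S, A, B):
    """True if every path from a node in A to a node in B passes through S,
    i.e. A and B are disconnected once the separator nodes S are removed."""
    S, A, B = set(S), set(A), set(B)
    adj = {}
    for u, v in edges:
        if u not in S and v not in S:       # drop edges touching the separator
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    # Breadth-first search from A; if we ever reach B, they are not separated.
    seen, queue = set(A), deque(A)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in B:
                return False
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return True

print(separates(edges, S={"D"}, A={"A", "B"}, B={"C", "E"}))   # True
print(separates(edges, S={"E"}, A={"A", "B"}, B={"C"}))        # True
print(separates(edges, S={"B"}, A={"A"}, B={"C", "E"}))        # False (path A-D-E)
```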

28 Hammersley-Clifford theorem Theorem Let X = {X_1, X_2,..., X_n} be a set of random variables and G a graph that has X as vertices. If X satisfies the global Markov independence property (and the joint distribution is strictly positive), then P(X_1, X_2,..., X_n) = (1/Z) φ_1(C_1) φ_2(C_2) ··· φ_m(C_m), where C = {C_1, C_2,..., C_m} is the set of all the complete subgraphs in G, and φ_1, φ_2,..., φ_m are factor potentials defined over C_1, C_2,..., C_m respectively. In other terms, X is a Markov network. 28 / 58

29 Hammersley-Clifford theorem Theorem Let X = {X_1, X_2,..., X_n} be a set of random variables and G a graph that has X as vertices. If X satisfies the global Markov independence property (and the joint distribution is strictly positive), then P(X_1, X_2,..., X_n) = (1/Z) φ_1(C_1) φ_2(C_2) ··· φ_m(C_m), where C = {C_1, C_2,..., C_m} is the set of all the complete subgraphs in G, and φ_1, φ_2,..., φ_m are factor potentials defined over C_1, C_2,..., C_m respectively. In other terms, X is a Markov network. Global Markov independence ⇒ Factorization into potentials. 29 / 58

30 What exactly are the factor potentials φ_i(C_i)? In the previous examples, the factor potentials φ_i(C_i) look only at the values of the variables contained in clique C_i. For example: φ([Patient 1 has TB = true]) = 30, φ([Patient 1 has TB = false]) = 3, φ([Patient 1 has TB = true, Patient 2 has TB = true]) = 21, φ([Patient 1 has TB = true, Patient 2 has TB = false]) = 2.7. Clearly, we cannot just use the same φ([Patient 1 has TB = true]) for every patient x. Every patient is different. Also, we cannot write down φ([Patient x has TB = true]) for every possible patient x: there are infinitely many patients. 30 / 58

31 What exactly are the factor potentials φ_i(C_i)? In the previous examples, the factor potentials φ_i(C_i) look only at the values of the variables contained in clique C_i. For example: φ([Patient 1 has TB = true]) = 30, φ([Patient 1 has TB = false]) = 3, φ([Patient 1 has TB = true, Patient 2 has TB = true]) = 21, φ([Patient 1 has TB = true, Patient 2 has TB = false]) = 2.7. Clearly, we cannot just use the same φ([Patient 1 has TB = true]) for every patient x. Every patient is different. Also, we cannot write down φ([Patient x has TB = true]) for every possible patient x: there are infinitely many patients. We include side information, or features, regarding each variable and each clique. For example, the medical checkup of Patient x. φ([Patient x has TB = true]) is not directly a function of Patient x, but it is a function of the features of Patient x. 31 / 58

32 Clique features Each clique C_i can be described by a vector of features f_i. Example 1: Clique C_1 = {Patient 1} is described by a vector of features f_1: f_1[0] = Age of Patient 1, f_1[1] = Blood pressure of Patient 1, f_1[2] = Body temperature of Patient 1. Example 2: Clique C_2 = {Patient 1, Patient 2} is described by a vector of features f_2: f_2[0] = Type of interaction between Patients 1 and 2, f_2[1] = Duration of interaction between Patients 1 and 2. 32 / 58

33 Logistic model Typically, we use a log-linear model with feature functions f_i to represent factor potentials φ_i: φ_i(C_i) = exp( Σ_{j=0}^{k} w_{C_i}[j] f_i[j] ), where f_i[j] is the j-th feature of the clique C_i, and w_{C_i}[j] is its corresponding weight (a value in R). w_{C_i}, like φ_i(C_i), depends on the values taken by the variables in clique C_i. 33 / 58

34 Logistic model Typically, we use a log-linear model with feature functions f_i to represent factor potentials φ_i: φ_i(C_i) = exp( Σ_{j=0}^{k} w_{C_i}[j] f_i[j] ), where f_i[j] is the j-th feature of the clique C_i, and w_{C_i}[j] is its corresponding weight (a value in R). w_{C_i}, like φ_i(C_i), depends on the values taken by the variables in clique C_i. Therefore, P(X_1, X_2,..., X_n) = (1/Z) Π_{i=1}^{m} φ_i(C_i) = (1/Z) exp( Σ_{i=1}^{m} Σ_{j=0}^{k} w_{C_i}[j] f_i[j] ). 34 / 58
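
A minimal sketch of one such log-linear potential, where the weight vector is indexed by the joint assignment of the clique's variables (all feature values and weights below are made up for illustration):

```python
# A log-linear factor potential: one weight vector per joint assignment of the
# clique's variables, shared feature vector f_i for the clique.
import math

f_i = [0.3, 1.2, 0.5]                        # features of clique C_i (e.g. patient data)

w = {                                         # one weight vector per clique assignment
    (True, True):   [0.8, 0.1, -0.2],
    (True, False):  [-0.5, 0.3, 0.0],
    (False, True):  [0.2, -0.4, 0.6],
    (False, False): [0.1, 0.1, 0.1],
}

def phi(assignment, features):
    """phi_i(C_i) = exp( sum_j w_{C_i}[j] * f_i[j] ), with w chosen by the assignment."""
    return math.exp(sum(wj * fj for wj, fj in zip(w[assignment], features)))

print(phi((True, False), f_i))
```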

35 Associative Markov Networks Associative Markov Networks are a popular variant of Markov Networks that have been successfully used in computer vision. They are a special variant of Pairwise Markov Networks. In Pairwise Markov Networks, there are two types of factor potentials: potentials φ_node associated with individual variables, and potentials φ_edge associated with edges (links between variables). A logistic model is used to represent the potentials: φ_node(X_i) = exp( Σ_k sign(X_i) w_node[k] f_i[k] ), φ_edge(X_i, X_j) = exp( Σ_k w_edge[k] f_(i,j)[k] ), where sign(X_i) = +1 if X_i = true and sign(X_i) = −1 if X_i = false. 35 / 58

36 Associative Markov Networks Associative Markov Networks are a popular variant of Markov Networks that have been successfully used in computer vision. They are a special variant of Pairwise Markov Networks. In Pairwise Markov Networks, there are two types of factor potentials: potentials φ_node associated with individual variables, and potentials φ_edge associated with edges (links between variables). A logistic model is used to represent the potentials: φ_node(X_i) = exp( Σ_k sign(X_i) w_node[k] f_i[k] ), φ_edge(X_i, X_j) = exp( Σ_k w_edge[k] f_(i,j)[k] ), where sign(X_i) = +1 if X_i = true and sign(X_i) = −1 if X_i = false. The name logistic model comes from the fact that log φ_node(X_i) = Σ_k sign(X_i) w_node[k] f_i[k]. 36 / 58

37 Associative Markov Networks In Associative Markov Networks, edge potentials are defined as φ_edge(X_i, X_j) = exp( Σ_k w_edge[k] f_(i,j)[k] ) if X_i = X_j, and φ_edge(X_i, X_j) = exp(0) = 1 if X_i ≠ X_j. 37 / 58

38 Associative Markov Networks In Associative Markov Networks, edge potentials are defined as φ_edge(X_i, X_j) = exp( Σ_k w_edge[k] f_(i,j)[k] ) if X_i = X_j, and φ_edge(X_i, X_j) = exp(0) = 1 if X_i ≠ X_j. The joint probability distribution is given as P(X) = (1/Z) exp( Σ_{X_i} Σ_k sign(X_i) w_node[k] f_i[k] + Σ_{(X_i, X_j) s.t. X_i = X_j and X_i, X_j are neighbors} Σ_k w_edge[k] f_(i,j)[k] ). 38 / 58
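
The exponent of this distribution is just a sum of weighted node features plus, for every pair of neighbours that agree, weighted edge features. A minimal sketch of that unnormalized log-score on a made-up 3-node path (the features, weights and graph are assumptions, not from the lecture):

```python
# Unnormalized log-score of a labeling under an associative pairwise model.
import numpy as np

def amn_log_score(labels, node_feats, edges, edge_feats, w_node, w_edge):
    """sum_i sign(x_i) * w_node . f_i  +  sum over equal-label edges of w_edge . f_(i,j)"""
    score = 0.0
    for i, f in enumerate(node_feats):
        sign = +1.0 if labels[i] else -1.0
        score += sign * np.dot(w_node, f)
    for (i, j), f in zip(edges, edge_feats):
        if labels[i] == labels[j]:            # associative: only equal labels contribute
            score += np.dot(w_edge, f)
    return score

# Tiny example: 3 nodes on a path 0-1-2, 2-dim node features, constant edge feature.
node_feats = np.array([[1.0, 0.2], [0.9, 0.1], [0.1, 0.8]])
edges, edge_feats = [(0, 1), (1, 2)], np.array([[1.0], [1.0]])
w_node, w_edge = np.array([2.0, -1.0]), np.array([0.5])

print(amn_log_score([True, True, False], node_feats, edges, edge_feats, w_node, w_edge))
```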

39 Application of Associative Markov Networks: Image segmentation. [Figure 4: example images and segmentations using PLSA-MRF on the 21-class data set, with topic vectors learned from labeled patches; classes include sky, building, void, flower, aeroplane, grass, bird, book, chair, road, cat, dog, boat, sign, body, bicycle, car, water, face, cow, tree and sheep. A table compares Textonboost, PLSA-MRF/P and PLSA-MRF/I across these classes.] Jakob Verbeek, Bill Triggs. Region Classification with Markov Field Aspect Models. In CVPR 2007. 39 / 58

40 Application of Associative Markov Networks: Image segmentation. [Figure 4 (detail): an example image and its segmentation into classes such as sky, building, void and flower.] Jakob Verbeek, Bill Triggs. Region Classification with Markov Field Aspect Models. In CVPR 2007. 40 / 58

41 Application of Associative Markov Networks: Image segmentation X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 sky 41 / 58

42 Learning Associative Markov Networks Each variable X_i in the network corresponds to a small patch in the image. X_i = true means patch i in the image corresponds to a flower. X_i = false means patch i in the image does not correspond to a flower. We represent the joint probability distribution over all possible values of the variables X_i as P(X) = (1/Z) exp( Σ_{X_i} Σ_k sign(X_i) w_node[k] f_i[k] + Σ_{(X_i, X_j) s.t. X_i = X_j and X_i, X_j are neighbors} Σ_k w_edge[k] f_(i,j)[k] ). The instance that has the highest probability should be the one where variables X_4, X_5, X_11, X_12, X_18 and X_19 are all true while all the remaining variables are false. 42 / 58

43 Application of Associative Markov Networks: Image segmentation X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 sky 43 / 58

44 Application of Associative Markov Networks: Image segmentation Edge features We can use a unique constant feature for the edges (links): f_(i,j)[0] = 1, ∀(i, j) s.t. X_i is a neighbor of X_j. Node features We can use the Histograms of Oriented Gradients (HOG) features to represent the patch corresponding to each variable. f_i[k] counts the number of occurrences of image gradient orientation 2πk/N in the portion i of the image (where N is the maximum number of orientations considered). We can also add RGB colours as features. 44 / 58
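
As a rough illustration (not a full HOG implementation; it omits gradient-magnitude weighting, cells and block normalization), here is a sketch of an orientation-histogram node feature plus mean RGB colour for one patch, together with the constant edge feature:

```python
# Simple patch features in the spirit of the slide: an orientation histogram
# over N bins plus mean RGB, and a single constant edge feature.
import numpy as np

def patch_features(gray_patch, rgb_patch, n_orientations=8):
    """Return [orientation histogram of length N, mean R, mean G, mean B] for one patch."""
    gy, gx = np.gradient(gray_patch.astype(float))
    angles = np.arctan2(gy, gx) % (2 * np.pi)              # orientations in [0, 2*pi)
    bins = (angles / (2 * np.pi) * n_orientations).astype(int) % n_orientations
    hist = np.bincount(bins.ravel(), minlength=n_orientations).astype(float)
    return np.concatenate([hist, rgb_patch.reshape(-1, 3).mean(axis=0)])

def edge_feature(i, j):
    """A single constant edge feature: f_(i,j)[0] = 1 for every pair of neighbours."""
    return np.array([1.0])

# Usage: feats = patch_features(np.random.rand(16, 16), np.random.rand(16, 16, 3))
```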

45 Learning Associative Markov Networks Now that we know what the feature vectors f_i and f_(i,j) are, how can we find their weight vectors w_node and w_edge? We can collect a lot of annotated examples, and find weights w_node and w_edge that maximize the likelihood. The log-likelihood function is concave and can be maximized using gradient ascent. 45 / 58

46 Learning Associative Markov Networks To learn the weights w_node and w_edge, we start by collecting examples of images of a particular object (e.g., flower). Examples of images containing a flower. For each image, we create an Associative Markov Network and label its nodes as true (object detected) or false. Red nodes are variables set to true, blue nodes are variables set to false. 46 / 58

47 Maximum Likelihood Let X_ex be an assignment of values to the variables (nodes) in one of the examples. We have L_Xex(w) = (1/Z_w) exp( Σ_{X_i ∈ X_ex} Σ_k sign(X_i) w_node[k] f_i[k] + Σ_{(X_i, X_j) s.t. X_i = X_j in X_ex and X_i, X_j are neighbors} Σ_k w_edge[k] f_(i,j)[k] ), where w = [w_node, w_edge]. L_Xex(w) is the likelihood of w given X_ex. L_Xex(w) is a function of both w and X_ex. But here, X_ex is known from the annotated example. We want to find w that maximizes L_Xex(w). We need to compute ∇_w L_Xex(w) = ( ∂L_Xex(w)/∂w_node[0], ∂L_Xex(w)/∂w_node[1],..., ∂L_Xex(w)/∂w_edge[0], ∂L_Xex(w)/∂w_edge[1],... ). 47 / 58

48 Maximum Likelihood L_Xex(w) = (1/Z_w) exp( Σ_{X_i ∈ X_ex} Σ_k sign(X_i) w_node[k] f_i[k] + Σ_{(X_i, X_j) s.t. X_i = X_j in X_ex and X_i, X_j are neighbors} Σ_k w_edge[k] f_(i,j)[k] ). ∂L_Xex(w)/∂w_node[k] = L_Xex(w) ( Σ_{X_i ∈ X_ex} sign(X_i) f_i[k] − Σ_X L_X(w) Σ_{X_i ∈ X} sign(X_i) f_i[k] ), where X ranges over arbitrary assignments of values to the variables. ∂L_Xex(w)/∂w_edge[k] = L_Xex(w) ( Σ_{(X_i, X_j) s.t. X_i = X_j in X_ex, X_i, X_j neighbors} f_(i,j)[k] − Σ_X L_X(w) Σ_{(X_i, X_j) s.t. X_i = X_j in X, X_i, X_j neighbors} f_(i,j)[k] ). 48 / 58
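
Both gradients have the same shape: observed feature counts in the annotated example minus feature counts expected under the current model. Here is a minimal sketch (tiny assumed graph and features, brute-force expectation, no regularization) of gradient ascent on the log-likelihood, which is the slides' gradient divided by L_Xex(w):

```python
# Gradient ascent on the log-likelihood of one annotated labeling, brute force.
import itertools
import numpy as np

node_feats = np.array([[1.0, 0.2], [0.9, 0.1], [0.1, 0.8]])   # 3 nodes, 2 node features
edges = [(0, 1), (1, 2)]                                       # a 3-node path
edge_feats = np.array([[1.0], [1.0]])                          # constant edge feature

def feature_counts(labels):
    """[sum_i sign(x_i) f_i , sum over equal-label edges of f_(i,j)] for one labeling."""
    g_node = sum((+1.0 if labels[i] else -1.0) * node_feats[i] for i in range(len(labels)))
    g_edge = np.zeros(edge_feats.shape[1])
    for e, (i, j) in enumerate(edges):
        if labels[i] == labels[j]:
            g_edge += edge_feats[e]
    return np.concatenate([g_node, g_edge])

def log_likelihood_gradient(labels_ex, w):
    """Observed feature counts minus their expectation under P(X; w)."""
    observed = feature_counts(labels_ex)
    labelings = list(itertools.product([True, False], repeat=len(node_feats)))
    scores = np.array([w @ feature_counts(l) for l in labelings])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    expected = sum(p * feature_counts(l) for p, l in zip(probs, labelings))
    return observed - expected

w = np.zeros(3)                                # [w_node (2 entries), w_edge (1 entry)]
for _ in range(200):                           # plain gradient ascent on log-likelihood
    w += 0.1 * log_likelihood_gradient((True, True, False), w)
print(w)
```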

49 Most Likely Explanation After we learn the weights of the model using gradient ascent, we are given a completely new image and we are asked to label its nodes as true (e.g., flower) or false. Notice that we do not need to compute the probability of each combination of values; we only need to find the combination that has the maximum probability. For binary variables, this problem can be transformed into finding a minimum cut in a graph, and solved in polynomial time. 49 / 58

50 Inference in Markov Networks We return now to general Markov Networks, and let's say we want to compute P(X_i) for some variable X_i ∈ X. The belief propagation algorithm, a.k.a. the sum-product algorithm, provides an exact answer for tree-structured graphs, and an approximate answer for general graphs (for which the algorithm is known as loopy belief propagation). 50 / 58

51 Factor graphs We have seen that the joint probability is the product of factors φ i, defined over cliques C i. Each factor involves one or more variables. Each variable is involved in one or more factors. 51 / 58

52 Factor graphs 1-vertex cliques: C 1 = {A}, C 2 = {B}, C 3 = {C}, C 4 = {D}, C 5 = {E}. 2-vertex cliques: C 6 = {A, B}, C 7 = {A, D}, C 8 = {B, D}, C 9 = {D, E}, C 10 = {E, C}. 3-vertex cliques: C 11 = {A, B, D}. A B C D E φ 1 φ 2 φ 3 φ 4 φ 5 φ 6 φ 7 φ 8 φ 9 φ 10 φ / 58

53 Factor graphs The sum-product algorithm is based on repeatedly passing messages between variables and factors until convergence. A B C D E φ 1 φ 2 φ 3 φ 4 φ 5 φ 6 φ 7 φ 8 φ 9 φ 10 φ / 58

54 The sum-product algorithm We have two types of messages: a message µ_{X_i→φ} from a variable X_i to a factor φ, and a message µ_{φ→X_i} from a factor φ to a variable X_i. A B C D E µ_{A→φ_1} µ_{A→φ_6} µ_{A→φ_7} µ_{A→φ_11} φ_1 φ_2 φ_3 φ_4 φ_5 φ_6 φ_7 φ_8 φ_9 φ_10 φ_11 54 / 58

55 The sum-product algorithm We have two types of messages: a message µ_{X_i→φ} from a variable X_i to a factor φ, and a message µ_{φ→X_i} from a factor φ to a variable X_i. A B C D E µ_{φ_6→A} µ_{φ_6→B} φ_1 φ_2 φ_3 φ_4 φ_5 φ_6 φ_7 φ_8 φ_9 φ_10 φ_11 55 / 58

56 The sum-product algorithm A message µ_{X_i→φ} from a variable X_i to a factor φ is the product of the messages from all other factors involving X_i (except the recipient): ∀ x_i ∈ domain(X_i): µ_{X_i→φ}(x_i) = Π_{φ_j ≠ φ s.t. X_i ∈ C_j} µ_{φ_j→X_i}(x_i), where the φ_j are the factors of all the cliques that contain X_i, except the factor φ (the recipient). 56 / 58

57 The sum-product algorithm A message µ_{φ→X_i} from factor φ to variable X_i is the product of the factor with the messages from all other variables, marginalized over all variables except the one associated with X_i: ∀ x_i ∈ domain(X_i): µ_{φ→X_i}(x_i) = Σ_{X s.t. X_i = x_i} φ(X) Π_{X_j ∈ C \ {X_i}} µ_{X_j→φ}(x_j), where C is the clique of variables associated with φ, and X ranges over joint values of all the variables in clique C. We sum over all the possible values of the variables in the clique (except for X_i, which is set to x_i). x_j is the value that variable X_j takes in X. 57 / 58

58 The sum-product algorithm The initial messages (at the first iteration) are all equal to 1. Upon convergence (if convergence happened), the estimated marginal distribution of each node is: P(X_i = x_i) ∝ Π_{φ_j s.t. X_i ∈ C_j} µ_{φ_j→X_i}(x_i). The same algorithm can be used for finding the Maximum A Posteriori (MAP) by replacing the sums by maxima. This algorithm is known as Max-Product. 58 / 58
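
On a tree-structured factor graph these messages give exact marginals. A minimal sketch (assumed potential values) on a tiny chain A - φ_1 - B - φ_2 - C, with a brute-force check of the marginal of B:

```python
# Sum-product messages on a chain-structured factor graph A - phi1 - B - phi2 - C,
# compared with brute-force marginalization. Potential values are assumed.
import itertools

vals = [True, False]
phi1 = {(a, b): v for (a, b), v in zip(itertools.product(vals, vals), [21.0, 2.7, 3.6, 7.0])}
phi2 = {(b, c): v for (b, c), v in zip(itertools.product(vals, vals), [1.0, 4.0, 2.0, 0.5])}

# Variable-to-factor messages from the leaves A and C: no other factors, so all ones.
mu_A_to_phi1 = {a: 1.0 for a in vals}
mu_C_to_phi2 = {c: 1.0 for c in vals}

# Factor-to-variable messages into B: multiply the factor by the incoming messages
# and sum out every variable except B.
mu_phi1_to_B = {b: sum(phi1[(a, b)] * mu_A_to_phi1[a] for a in vals) for b in vals}
mu_phi2_to_B = {b: sum(phi2[(b, c)] * mu_C_to_phi2[c] for c in vals) for b in vals}

# Marginal of B: product of the incoming factor messages, then normalize.
unnorm = {b: mu_phi1_to_B[b] * mu_phi2_to_B[b] for b in vals}
Z_b = sum(unnorm.values())
print({b: p / Z_b for b, p in unnorm.items()})

# Brute-force check: P(B = b) is proportional to sum over a, c of phi1(a,b) * phi2(b,c).
brute = {b: sum(phi1[(a, b)] * phi2[(b, c)] for a in vals for c in vals) for b in vals}
Z = sum(brute.values())
print({b: p / Z for b, p in brute.items()})
```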
