Gibbs Fields & Markov Random Fields


Statistical Techniques in Robotics (16-831, F10), Lecture #7 (Tuesday, September 21)
Lecturer: Drew Bagnell. Scribe: Bradford Neuman (some content adapted from previous scribes: Chris Dellin, Bruno Hexsel, and Byron Boots).

1 Gibbs Fields

Like a Bayes Net, a Gibbs Field is a representation of a set of random variables and their relationships. An example of a Gibbs Field is given in Figure 1; edges are undirected and connote some correlation between the connected nodes. As with a Bayes Net, fewer connections means more structure. Gibbs Fields are powerful because they imply a way to write the joint probability of the random variables as a product of functions over cliques in the graph.

Figure 1: A Gibbs Field with nodes x_0, x_1, x_2, x_3, x_4.

1.1 Cliques and Joint Probability

A clique is any fully connected subset of the graph (e.g. {x_1}, {x_0, x_1, x_2}, or {x_3, x_4}). We denote the set of all cliques in a graph as C, with a clique c_i ∈ C comprising its nodes (e.g. c_3 = {x_3, x_4}). The joint probability for any set of random variables x = {x_1, ..., x_n} represented by a Gibbs Field can be written as the product of clique potentials φ_i:

    P(x) = \frac{1}{Z} \prod_{c_i \in C} \phi_i(c_i),    (1)

with φ_i(c_i) the i-th clique potential, a function only of the values of the clique members in c_i. Each potential function φ_i must be positive, but unlike probability distribution functions, the potentials need not sum to 1. A normalization constant Z is required to create a valid probability distribution:

    Z = \sum_x \prod_{c_i \in C} \phi_i(c_i).
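To make equation (1) and the normalization constant Z concrete, here is a minimal Python sketch. It assumes binary-valued nodes and a made-up potential that simply rewards agreement within a clique; the node names follow Figure 1, but the choice of cliques and the potential itself are purely illustrative.

```python
import itertools
import math

nodes = ["x0", "x1", "x2", "x3", "x4"]
cliques = [("x0", "x1", "x2"), ("x2", "x3", "x4")]

def phi(clique, values):
    # A positive (but unnormalized) potential: larger when clique members agree.
    agree = sum(values[a] == values[b]
                for a, b in itertools.combinations(clique, 2))
    return math.exp(agree)  # strictly positive; need not sum to 1

def unnormalized_p(assignment):
    # The product of clique potentials from equation (1), before dividing by Z.
    return math.prod(phi(c, assignment) for c in cliques)

# Z sums the unnormalized product over every joint configuration of the nodes.
Z = sum(unnormalized_p(dict(zip(nodes, vals)))
        for vals in itertools.product([0, 1], repeat=len(nodes)))

def p(assignment):
    return unnormalized_p(assignment) / Z

print(p({"x0": 1, "x1": 1, "x2": 1, "x3": 0, "x4": 0}))
```

Because Z sums over every joint configuration, this brute-force normalization is only practical for very small fields; the point is simply that any positive potentials, once divided by Z, define a valid joint distribution.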

For any Gibbs Field, there is a subset Ĉ of C consisting of only the maximal cliques, those which are not proper subsets of any other clique. For example, the Gibbs Field in Figure 1 has two maximal cliques: ĉ_1 = {x_0, x_1, x_2} and ĉ_2 = {x_2, x_3, x_4}. We can write a clique potential φ̂ for each maximal clique that is the product of all the potentials of its sub-cliques. In this way, we can write the joint probability as a product over only the maximal clique potentials:

    P(x) = \frac{1}{Z} \prod_{c_i \in \hat{C}} \hat{\phi}_i(c_i).    (2)

Note that this is defined over all maximal cliques, not just the single largest one. We usually take these potentials to be functions over only the maximal cliques, as in (2).

1.2 Clique Potentials as Energy Functions

Often, clique potentials of the form φ_i(c_i) = exp(−f_i(c_i)) are used, with f_i(c_i) an energy function over the values of c_i. The energy assigned by f_i(c_i) is an indicator of the likelihood of the corresponding configuration of the clique, with a higher-energy configuration having lower probability and vice versa. In this case, (1) can be written as

    P(x) = \frac{1}{Z} \exp\left( -\sum_{c_i \in C} f_i(c_i) \right).    (3)

For example, we can write energy functions over the cliques in the example graph from Figure 1. Let f_1({x_0, x_1, x_2}) = x_0^2 + (x_1 − 5x_2 − 3)^2 and f_2({x_2, x_3, x_4}) = (x_2 − x_3)^2 + (x_2 + x_3 + x_4)^2. Then the joint probability can be written as

    P(x) = \frac{1}{Z} \exp\left[ -\left( x_0^2 + (x_1 - 5x_2 - 3)^2 \right) - \left( (x_2 - x_3)^2 + (x_2 + x_3 + x_4)^2 \right) \right].

In this form (each f_i quadratic in the x's), the Gibbs Field is known as a Gaussian Gibbs Field.
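As a quick sanity check of the energy-based form, the sketch below evaluates the two example energies and the resulting unnormalized density. The specific input vectors are arbitrary, and Z is left uncomputed (for a Gaussian Gibbs Field it is a Gaussian integral).

```python
import math

# Energy functions over the two cliques from the example above.
def f1(x0, x1, x2):
    return x0**2 + (x1 - 5 * x2 - 3)**2

def f2(x2, x3, x4):
    return (x2 - x3)**2 + (x2 + x3 + x4)**2

def unnormalized_p(x):
    # Equation (3) without the 1/Z factor: low total energy -> high probability.
    x0, x1, x2, x3, x4 = x
    return math.exp(-(f1(x0, x1, x2) + f2(x2, x3, x4)))

print(unnormalized_p([0.0, 3.0, 0.0, 0.0, 0.0]))   # zero energy -> value 1.0
print(unnormalized_p([4.0, 0.0, 2.0, -2.0, 5.0]))  # high energy -> nearly 0
```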

1.3 Moralizing: Converting a Bayes Net to a Gibbs Field

Consider the Bayes Net in Figure 2(a). Simply removing the arrows (Figure 2(b)) to create a Gibbs Field is not sufficient! In particular, in the resulting Gibbs Field, observing node B causes nodes A and C to become independent, which is the opposite of what the original Bayes Net represented! Instead, we need to moralize the graph: whenever two parents of a node are not connected ("married"), we connect them. Thus, Figure 2(c) shows the correct representation of the original Bayes Net. Note that during this conversion we actually lose information, namely that A and C are marginally independent. A small sketch of this conversion appears after the figure caption.

Figure 2: The moralizing process, on a net over nodes A, B, C, and D. (a) The original Bayes Net. (b) Incorrect: the arrows are simply removed. (c) Correct, with moralizing.
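Here is a minimal sketch of moralization, assuming the structure suggested by the figure and caption: A and C are the parents of B, and D is taken to be a child of B (the exact role of D is not recoverable from the figure, so treat that part as hypothetical). The {child: parents} encoding of the Bayes Net is likewise just a convenience for the sketch.

```python
from itertools import combinations

def moralize(parents):
    """Convert a Bayes Net, given as {child: [parent, ...]}, into the edge set
    of a Gibbs Field: keep parent-child edges (now undirected) and 'marry' any
    pair of parents that share a child."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:
            edges.add(frozenset((p, child)))   # undirected parent-child edge
        for p, q in combinations(pa, 2):
            edges.add(frozenset((p, q)))       # moralizing edge between co-parents
    return edges

# Assumed structure for Figure 2(a): A and C are parents of B; D is a child of B.
bayes_net = {"B": ["A", "C"], "D": ["B"]}
for edge in sorted(sorted(e) for e in moralize(bayes_net)):
    print(edge)
# Prints A-B, B-C, B-D, and the moralizing edge A-C of Figure 2(c).
```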

2 Markov Random Fields (MRFs)

A Markov Random Field (MRF) is an undirected graphical model that explicitly expresses the conditional independence relationships between nodes. Two nodes are conditionally independent if all paths between them are blocked by given (observed) nodes; see Figures 3(a) and 3(b) for examples. Note that this rule is much simpler than the corresponding rule for Bayes Nets. Because of the way Markov Random Fields express the relationships between nodes, they make a lot of sense as a representation of physical space.

Figure 3: Markov Random Field example. (a) Node x_1 is conditionally independent of node x_3 given x_2 and x_7: there is no unblocked path between x_1 and x_3. (b) There is at least one unblocked path from x_1 to x_3 given x_2 and x_6, so x_1 and x_3 are not (necessarily) conditionally independent.

In an MRF, a path is blocked if any node along the path is observed. If there is no unblocked path between two nodes, they are conditionally independent. A given node can be made conditionally independent of any non-neighboring node by observing each of its neighbors; this set of neighbors is often called the Markov blanket because it blankets the node from the rest of the graph. For the example in Figure 3(a), we can write a series of conditional independence relationships that are asserted by the field, for example x_1 ⊥ x_3 | {x_2, x_7}, and similar statements can be read off for other pairs of nodes and observation sets. (A small sketch of this path-blocking test appears at the end of this section.)

An MRF itself is a picture explaining these assumptions, which specify information about the underlying probability distribution. Specifically, it represents, for any given set of observed values, which nodes are conditionally independent. This does not, however, mean that these are the only conditional independencies: the fact that there is a path does not necessarily mean that there is a dependence. In fact, a distribution in which every variable is independent of every other is consistent with any set of edges, since any conditional independence read from the MRF is definitely true of it. In general, the fewer edges an MRF has, the more information it provides, since in a sparser graph paths are blocked by fewer observed variables. This means that a sparser graph specifies more conditional independencies.

One important thing to understand is the relationship between the set of possible probability distributions and the set of possible MRFs. Since a distribution of all independent variables satisfies any set of conditional independencies, the fully independent distribution

    P(x) = \prod_i P(x_i)

is consistent with every possible MRF. This also means that there can be more than one distribution consistent with a given MRF. In the other direction, there are many possible MRFs for a given distribution: for any distribution, we can draw a fully connected MRF, but this is completely uninformative. In general, it may be incredibly difficult to determine the most useful MRF for a given distribution, but we can typically draw the MRF using assumptions, for example that each node is affected only by its neighbors.
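The path-blocking test described above is just graph reachability after the observed nodes are removed. Below is a minimal sketch of that test; the adjacency list is a made-up three-node chain, not the graph of Figure 3.

```python
from collections import deque

def separated(adj, a, b, observed):
    """Return True if every path between a and b is blocked, i.e. if b cannot
    be reached from a once the observed nodes are deleted from the graph."""
    observed = set(observed)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return False                 # found an unblocked path
        for nbr in adj[node]:
            if nbr not in seen and nbr not in observed:
                seen.add(nbr)
                queue.append(nbr)
    return True                          # every path is blocked

# A toy chain x1 - x2 - x3.
adj = {"x1": ["x2"], "x2": ["x1", "x3"], "x3": ["x2"]}
print(separated(adj, "x1", "x3", {"x2"}))  # True: observing x2 blocks the only path
print(separated(adj, "x1", "x3", set()))   # False: the path x1-x2-x3 is unblocked
```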

2.1 Relationship to Gibbs Fields

MRFs and Gibbs Fields look the same in the sense that both are undirected graphs. Gibbs Fields come with an implicit potential function φ for each clique, while MRFs only specify conditional independencies. A natural question to ask is whether we can take the same graph that represents an MRF and come up with a set of φ functions which represent the same distributions. The Hammersley-Clifford theorem (also called the fundamental theorem of random fields) proves that a Markov Random Field and a Gibbs Field are equivalent with regard to the same graph (this holds as long as P(x) > 0 for all x, that is, as long as all configurations of values are possible). In other words:

Given any Markov Random Field, all joint probability distributions that satisfy its conditional independencies can be written as clique potentials over the maximal cliques of the corresponding Gibbs Field.

Given any Gibbs Field, all of its joint probability distributions satisfy the conditional independence relationships specified by the corresponding Markov Random Field.

3 Factor Graphs

One problem with MRFs and Gibbs Fields is that they do not always have the granularity we might need. Consider, for example, the Gibbs Field in Figure 4. From this picture, we would have to define φ over the maximal cliques, which in this case are the squares of four nodes. When the graph was a 4-connected grid, we only had to define functions over pairs. In order to make it clear that the functions need only be defined over each pair of neighbors in Figure 4, we define a factor graph.

Figure 4: Diagonal connections are ambiguous: does the whole block of four nodes need to be considered together, or is the distribution still defined pair-wise between neighbors?

A factor graph has two types of nodes: random variables (the same as the nodes in MRFs or Gibbs Fields) and factors, drawn as square nodes, which represent that all of the connected variable nodes depend on each other (as if they were joined by edges in a traditional Gibbs Field). This allows us to specify directly which sets of vertices the φ functions need be defined on; a small sketch of this bookkeeping follows the figure caption below. In Figure 5 we can see that the factor graph representation of Figure 4 removes the ambiguity and shows us that, indeed, only neighboring pairs should be connected.

Figure 5: In this factor graph, the square factor nodes make clear that only neighboring pairs of variables interact; the blocks of four nodes do not all need to be considered simultaneously.
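To make the distinction concrete, here is a small sketch that writes a factor graph down explicitly as variable nodes plus factor scopes, for a single 2x2 block; the two factor sets correspond to the two readings that Figures 5 and 6 distinguish. The grid size, names, and factor choices are assumptions made for illustration, not taken from the figures.

```python
# A factor graph as a bipartite structure: variable nodes plus factor nodes,
# where each factor records exactly which variables its potential phi is
# defined over.

variables = [(r, c) for r in range(2) for c in range(2)]  # a tiny 2x2 grid

# Figure-5 style: one factor per neighboring pair (pairwise potentials only).
pairwise_factors = [((0, 0), (0, 1)), ((1, 0), (1, 1)),   # horizontal neighbors
                    ((0, 0), (1, 0)), ((0, 1), (1, 1))]   # vertical neighbors

# Figure-6 style: a single factor over the whole block of four variables.
block_factors = [tuple(variables)]

def describe(name, factors):
    for i, scope in enumerate(factors):
        print(f"{name}: phi_{i} is a function of {scope}")

describe("pairwise", pairwise_factors)  # four potentials, each over a pair
describe("block", block_factors)        # one potential over all four nodes
```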

In Figure 6 we see an alternate factor graph, which instead asserts that each node in each square is dependent on every other node in that square. The factor graphs in Figures 5 and 6 are both consistent with the Gibbs Field in Figure 4.

Figure 6: In this graph, each square of four nodes must indeed be considered simultaneously.