Gibbs Field & Markov Random Field

Statistical Techniques in Robotics (16-831, F12)      Lecture #07 (Wednesday, September 19)
Gibbs Field & Markov Random Field
Lecturer: Drew Bagnell      Scribe: Minghao Ruan
(Some content adapted from previous scribes: Bradford Neuman, Byron Boots.)

1 SLAM (continued from last lecture)

Figure 1 depicts a typical SLAM system represented as a Bayes Network. x_i represents the robot state at time i; z_i is the corresponding measurement at instance i; and the l_i's are landmarks.

Figure 1: The SLAM problem represented as a Bayes Network.

It should be noted, however, that this diagram is quite strange in the following sense:

- A real robot may not be absolutely certain about which landmark it observes at any instance. (We basically assume the landmarks are broadcasting their names.)
- Not seeing a certain landmark could also provide useful information.
- We also assume one landmark per observation. This is not restrictive, because we can break an observation of multiple landmarks into two observations with no motion between the two instances.

Despite these facts, we still prefer four 2D filters (one for each landmark) to one 8D filter, which is much more expensive to maintain. Therefore we are interested in the question: given all the observations, are the landmarks conditionally independent of each other? The answer is no. For example, l_1 ⊥ l_3 | z_0 ... z_n does not hold, because observing z_0 unblocks the path between the two converging arrows (as shown in Fig 3(a)).
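This d-separation argument can be checked mechanically. Below is a minimal sketch (not part of the original notes) that builds a truncated, three-step version of the Figure 1 network with made-up landmark-to-measurement assignments and asks NetworkX whether l_1 and l_3 are d-separated, first given only the measurements and then given the measurements plus the robot states (anticipating the mapping case discussed next). The separation test is nx.is_d_separator in recent NetworkX releases and nx.d_separated in older ones.

    # Minimal d-separation check on a truncated SLAM Bayes net (illustrative only).
    import networkx as nx

    # The function was renamed between NetworkX releases; accept either name.
    d_sep = getattr(nx, "is_d_separator", None) or nx.d_separated

    G = nx.DiGraph()
    G.add_edges_from([("x0", "x1"), ("x1", "x2")])                  # motion chain
    G.add_edges_from([("x0", "z0"), ("x1", "z1"), ("x2", "z2")])    # z_i depends on x_i
    G.add_edges_from([("l1", "z0"), ("l3", "z1"), ("l1", "z2")])    # assumed landmark sightings

    measurements = {"z0", "z1", "z2"}
    states = {"x0", "x1", "x2"}

    # Observing the measurements opens the converging arrows at each z_i,
    # e.g. the path l1 -> z0 <- x0 -> x1 -> z1 <- l3, so the landmarks are
    # NOT conditionally independent given the observations alone.
    print(d_sep(G, {"l1"}, {"l3"}, measurements))            # False

    # If the robot states are also known (the mapping problem), every such
    # path is blocked at some x_i, and the landmarks decouple.
    print(d_sep(G, {"l1"}, {"l3"}, measurements | states))   # True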

(a) Observation unblocks the path. (b) Knowing the states blocks the path.

If this is a mapping problem, however, where the robot states x_i are known, the landmarks are indeed conditionally independent, because knowledge of x_i blocks the flow of information, as shown in Fig 3(b).

2 Undirected Graph

2.1 Gibbs Field

Like a Bayes Network, a Gibbs Field represents the relationships among a set of random variables, but with undirected connections. A simple Gibbs Field is shown in Figure 2.

Figure 2: A simple Gibbs Field with 5 random variables.

The first important structure in the graph is a clique, which is a subset of the vertices in which any two nodes are connected by an edge. The simplest clique is an individual node, for example x_3 and x_6. Often we are interested in the maximum cliques, which are not proper subsets of any other clique. In Fig 2, the two maximum cliques are c_1 = {x_1, x_2, x_3} and c_2 = {x_3, x_4, x_5}. For any Gibbs Field, we can define the joint probability as follows:

p(\vec{x}) = \frac{1}{Z} \prod_{c_i \in C} \Phi_{c_i}(\vec{x}_{c_i})    (1)

where Φ_{c_i} is the i-th clique potential function, which depends only on its member nodes in c_i. Each potential function has to be strictly positive, but unlike a probability distribution, it need not sum to 1. For this reason, the final product has to be normalized by summing over all possible outcomes to make it a valid probability distribution.

Since we can easily write a clique potential function as a product of the potential functions of its sub-cliques, it is convenient to define the joint probability over only the maximum cliques.

2.1.1 A Concrete Example

Suppose we have a point cloud from a laser scan and we'd like to label each point as either Vegetation (= 1) or Ground (= 0). Suppose the raw cloud has been converted to a mesh grid as in Fig 3(b), and we zoom in to a small patch as in Fig 3(c).

(a) Laser scan points. (b) Connect points in a mesh. (c) One clique of the graph.

We can then define a joint probability distribution for the configuration in Fig 3(c):

p(\vec{x}) = \frac{1}{Z} \Phi_{12}(x_1, x_2) \Phi_{13}(x_1, x_3) \Phi_{23}(x_2, x_3) \Phi_{14}(x_1, x_4)    (2)

where

\Phi_{ij}(x_i, x_j) = \begin{cases} 1 & \text{if } x_i = x_j \\ 1/2 & \text{otherwise} \end{cases}

Intuitively, the maximum probability is achieved when all vertices have the same label. In real problems, Eq. 2 also takes prior information into account. For example, a modified version of Eq. 2 may look like:

p(\vec{x}) = \frac{1}{Z} \Phi_{12}(x_1, x_2) \Phi_{13}(x_1, x_3) \Phi_{23}(x_2, x_3) \Phi_{14}(x_1, x_4) \Phi_1(x_1) \Phi_2(x_2) \Phi_3(x_3) \Phi_4(x_4)    (3)

where

\Phi_i(x_i) = \begin{cases} 1 & \text{Vegetation} \\ 0 & \text{Ground} \end{cases}
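As a sanity check on Eqs. (1)-(2), the following sketch (not part of the original notes) enumerates all 2^4 labelings of the four-point patch, multiplies the pairwise potentials, and normalizes by Z; the two all-same labelings come out as the most probable, matching the intuition above. Unary priors as in Eq. (3) could be multiplied into the same product.

    # Brute-force evaluation of the four-node model in Eq. (2); illustrative only.
    from itertools import product

    def phi(xi, xj):
        """Pairwise potential from Eq. (2): agreement -> 1, disagreement -> 1/2."""
        return 1.0 if xi == xj else 0.5

    # Edges of the patch: {x1,x2}, {x1,x3}, {x2,x3}, {x1,x4} (0-indexed here).
    edges = [(0, 1), (0, 2), (1, 2), (0, 3)]

    def unnormalized(x):
        score = 1.0
        for i, j in edges:
            score *= phi(x[i], x[j])
        return score

    configs = list(product((0, 1), repeat=4))       # 0 = Ground, 1 = Vegetation
    Z = sum(unnormalized(x) for x in configs)       # normalizer from Eq. (1)
    joint = {x: unnormalized(x) / Z for x in configs}

    print(abs(sum(joint.values()) - 1.0) < 1e-12)                      # True: a valid distribution
    print(max(joint, key=joint.get) in {(0, 0, 0, 0), (1, 1, 1, 1)})   # True: all-same labels win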

2.1.2 Clique Potential as Energy Function

As the name suggests, a potential function can be related to the energy of a state, which is an indicator of how likely a particular configuration associated with the clique is. Higher energy means unstable or unlikely, whereas lower energy means stable or more likely. The energy functions usually take the following form:

p(\vec{x}) = \frac{1}{Z} e^{-f_c(\vec{x})}    (4)

3 Markov Random Field (MRF)

Sometimes we are only interested in determining, given a graph, whether two sets of random variables are conditionally independent when a third set of variables is observed. Because of its simple rules for reading off conditional independence, the Markov Random Field is most commonly used to express such relationships between random variables. Much like the structure of a Bayes Network makes it suitable for logical reasoning, the graphical representation of the MRF makes it particularly useful for inference in physical space.

3.1 Rules of Conditional Independence

The simple answer is: two variables are conditionally independent if all paths between them are blocked (by observed nodes). The formal rules are the following:

- Local rule: A variable is conditionally independent of all other variables given all its neighbors. Fig. 3(d) shows the Markov blanket of x_3: all paths into and out of x_3 are blocked by its neighbors.
- Global rule: If all paths between two sets of variables A and B pass through another set of nodes S, then any two subsets of A and B are conditionally independent if S is observed.

Figure 3: Markov Random Field example. (d) x_3 ⊥ · | x_2, x_4, x_7; (e) x_3 ⊥ · | x_2, x_4, x_6.
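The local rule can be verified by brute force on the small model of Eq. (2): there, x_2's only neighbors are x_1 and x_3, so p(x_2 | x_1, x_3, x_4) should not depend on x_4. The sketch below (not part of the original notes) checks exactly that.

    # Numerical check of the local Markov property for the model in Eq. (2).
    from itertools import product

    def phi(a, b):
        return 1.0 if a == b else 0.5

    def unnormalized(x1, x2, x3, x4):
        return phi(x1, x2) * phi(x1, x3) * phi(x2, x3) * phi(x1, x4)

    def p_x2_given_rest(x1, x3, x4):
        """p(x2 = 1 | x1, x3, x4), computed from the joint by enumeration."""
        num = unnormalized(x1, 1, x3, x4)
        den = sum(unnormalized(x1, v, x3, x4) for v in (0, 1))
        return num / den

    # The conditional is identical for x4 = 0 and x4 = 1, for every setting of
    # the neighbors x1, x3: x4 lies outside x2's Markov blanket.
    for x1, x3 in product((0, 1), repeat=2):
        assert abs(p_x2_given_rest(x1, x3, 0) - p_x2_given_rest(x1, x3, 1)) < 1e-12
    print("p(x2 | x1, x3, x4) does not depend on x4")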

3.2 Hammersley-Clifford theorem

Both the Gibbs Field and the MRF can be derived from a given graph, but each reveals a different aspect of the problem. Like a Bayes Network, a Gibbs Field defines a joint probability, whereas a Markov Random Field mainly lists the conditional independences between variables. The two theories developed independently for a while, until the Hammersley-Clifford theorem showed that under certain conditions the Gibbs Field and the Markov Random Field are equivalent. Though the proof is not a trivial task, the conclusion is quite straightforward:

- A Gibbs Field on graph G has all the conditional independence relationships of an MRF on G.
- Every MRF can be written as a Gibbs Field on the same graph, provided p(\vec{x}) > 0 everywhere.

4 Converting Bayes Net to Gibbs Field/MRF

Let's first consider the toy problem in Fig. 4, where the conversion is done by directly removing the arrows.

Figure 4: Converting a simple Bayes Network to a Gibbs Field/MRF. (a) Bayes Net representation of a set of RVs; (b) Gibbs Field/MRF of the same set of RVs.

By definition, the joint probability for the Bayes Network is

p(\vec{x}) = p(x_1) p(x_2 \mid x_1) p(x_3 \mid x_2).

Similarly, the joint probability for the Gibbs Field/MRF can be written as a product of potential functions over cliques:

p(\vec{x}) = \frac{1}{Z} \Phi_{12}(x_1, x_2) \Phi_{23}(x_2, x_3) \Phi_1(x_1) \Phi_2(x_2) \Phi_3(x_3).

To convert the Bayes Net to a Gibbs Field/MRF, we only have to come up with potential functions that are consistent with the Bayes Network description. In this case, the following assignment satisfies the requirement:

\Phi_1(x_1) = p(x_1)
\Phi_{12}(x_1, x_2) = p(x_2 \mid x_1)
\Phi_{23}(x_2, x_3) = p(x_3 \mid x_2)
\Phi_2(x_2) = \Phi_3(x_3) = \Phi_{13}(x_1, x_3) = 1

What if we have a more complicated graph, for instance a converging triple as in Fig. 5(a)? The intuitive answer would again be to simply remove the arrows, but this is insufficient. Let's examine what happens after removing the arrows in Fig. 5(a). Observing B blocks the path in the Gibbs Field/MRF of Fig. 5(b), but the same action would have unblocked the path between A and C in the Bayes net of Fig. 5(a); therefore simply removing the arrows results in a contradiction.
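The contradiction can also be seen numerically. The sketch below (not part of the original notes) puts an illustrative distribution on the converging triple A → B ← C, with A and C independent fair coins and B = A XOR C: A and C are marginally independent, yet become strongly dependent once B is observed, so the distribution cannot satisfy the A ⊥ C | B statement that the naively arrow-stripped chain A - B - C would assert.

    # A v-structure A -> B <- C with an illustrative CPT (B = A XOR C).
    from itertools import product

    def p(a, b, c):
        return 0.25 if b == (a ^ c) else 0.0     # p(A)=p(C)=1/2, B deterministic

    # Marginal independence: p(A, C) = p(A) p(C) for every (a, c).
    for a, c in product((0, 1), repeat=2):
        p_ac = sum(p(a, b, c) for b in (0, 1))
        p_a = sum(p(a, b, cc) for b, cc in product((0, 1), repeat=2))
        p_c = sum(p(aa, b, c) for aa, b in product((0, 1), repeat=2))
        assert abs(p_ac - p_a * p_c) < 1e-12

    # Conditional dependence given B = 0: p(A=1, C=1 | B=0) != p(A=1|B=0) p(C=1|B=0).
    pb0 = sum(p(a, 0, c) for a, c in product((0, 1), repeat=2))
    p_ac_b0 = p(1, 0, 1) / pb0
    p_a_b0 = sum(p(1, 0, c) for c in (0, 1)) / pb0
    p_c_b0 = sum(p(a, 0, 1) for a in (0, 1)) / pb0
    print(p_ac_b0, p_a_b0 * p_c_b0)              # 0.5 vs 0.25: dependent once B is seen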

Figure 5: Moralizing process. (a) Bayes Net; (b) Incorrect!; (c) Correct, with moralizing.

The solution is to moralize: connect ("marry") the parents whenever there is a converging child. The downside is a loss of structure in the graph (recall from last lecture that fewer arrows mean more structure). In this case, A and C are no longer independent even if B is not observed.
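To close the loop, here is a minimal moralization sketch (not part of the original notes, written against NetworkX): it marries every pair of parents that share a child and then drops edge directions; applied to a graph shaped like Fig. 5(a) it adds the A - C edge shown in Fig. 5(c). Recent NetworkX versions also provide nx.moral_graph, so this only spells the procedure out.

    # Moralization: marry all co-parents, then drop edge directions.
    from itertools import combinations
    import networkx as nx

    def moralize(dag: nx.DiGraph) -> nx.Graph:
        """Return the moral graph of a DAG: its undirected skeleton plus an
        edge between every pair of nodes that share a child."""
        moral = dag.to_undirected()
        for child in dag.nodes:
            for u, v in combinations(dag.predecessors(child), 2):
                moral.add_edge(u, v)          # marry the parents
        return moral

    # A graph shaped like Fig. 5(a): A -> B <- C, plus a child D of B.
    bn = nx.DiGraph([("A", "B"), ("C", "B"), ("B", "D")])
    moral_edges = sorted(tuple(sorted(e)) for e in moralize(bn).edges())
    print(moral_edges)
    # [('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'D')] -- note the added A-C edge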