Markov Random Fields

Umamahesh Srinivas, ipal Group Meeting, February 25, 2011

Outline
1. Basic graph-theoretic concepts
2. Markov chains
3. Markov random fields (MRF)
4. Gauss-Markov random fields (GMRF) and applications
5. Other popular MRFs

References
1. Charles Bouman, "Markov random fields and stochastic image models," tutorial presented at ICIP 1995.
2. Mario Figueiredo, "Bayesian methods and Markov random fields," tutorial presented at CVPR 1998.

Basic graph-theoretic concepts
- A graph $G = (V, E)$ is a finite collection of nodes (vertices) $V = \{n_1, n_2, \ldots, n_N\}$ and a set of edges $E \subseteq \binom{V}{2}$. We consider only undirected graphs.
- Neighbors: two nodes $n_i, n_j \in V$ are neighbors if $(n_i, n_j) \in E$.
- Neighborhood of a node: $\mathcal{N}(n_i) = \{n_j : (n_i, n_j) \in E\}$. Neighborhood is a symmetric relation: $n_i \in \mathcal{N}(n_j) \Leftrightarrow n_j \in \mathcal{N}(n_i)$.
- Complete graph: every node is a neighbor of every other node, i.e. $\mathcal{N}(n_i) = V \setminus \{n_i\}$ for all $n_i \in V$.
- Clique: a complete subgraph of $G$. Maximal clique: a clique that cannot be enlarged, i.e. no further node can be added while retaining complete connectedness.

Illustration
- $V = \{1, 2, 3, 4, 5, 6\}$
- $E = \{(1,2), (1,3), (2,4), (2,5), (3,4), (3,6), (4,6), (5,6)\}$
- $\mathcal{N}(4) = \{2, 3, 6\}$
- Examples of cliques: $\{1\}$, $\{3, 4, 6\}$, $\{2, 5\}$
- Set of all cliques: the singletons (elements of $V$), the edges (elements of $E$), and the triangle $\{3, 4, 6\}$
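
A minimal Python sketch (names chosen for illustration, not part of the original slides) that encodes this example graph and checks the neighborhood and clique claims above:

```python
from itertools import combinations

# Example graph from the illustration slide (nodes 1..6).
V = {1, 2, 3, 4, 5, 6}
E = {(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (3, 6), (4, 6), (5, 6)}

# Undirected adjacency: store each edge in both directions.
adj = {v: set() for v in V}
for a, b in E:
    adj[a].add(b)
    adj[b].add(a)

def is_clique(nodes):
    """True if every pair of distinct nodes in `nodes` is connected by an edge."""
    return all(b in adj[a] for a, b in combinations(nodes, 2))

print(adj[4])                  # neighborhood of node 4: {2, 3, 6}
print(is_clique({3, 4, 6}))    # True  (maximal clique)
print(is_clique({2, 3, 4}))    # False (2 and 3 are not neighbors)
```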

Separation
- Let $A$, $B$, $C$ be three disjoint subsets of $V$.
- $C$ separates $A$ from $B$ if every path from a node in $A$ to a node in $B$ contains some node in $C$.
- Example: $C = \{1, 4, 6\}$ separates $A = \{3\}$ from $B = \{2, 5\}$.
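
Continuing the sketch above (it reuses `V` and `adj`), separation can be checked by deleting $C$ and testing reachability; this helper is illustrative only:

```python
from collections import deque

def separates(C, A, B):
    """True if every path from A to B passes through C:
    delete C from the graph and check that no node of A can still reach B."""
    frontier = deque(a for a in A if a not in C)
    seen = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in B:
            return False            # found an A-to-B path that avoids C
        for w in adj[v]:
            if w not in C and w not in seen:
                seen.add(w)
                frontier.append(w)
    return True

print(separates({1, 4, 6}, {3}, {2, 5}))   # True, matching the slide's example
```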

Markov chains
- Graphical model: associate each node of a graph with a random variable (or a collection thereof).
- Homogeneous 1-D Markov chain: $p(x_n \mid x_i, i < n) = p(x_n \mid x_{n-1})$
- Probability of a sequence: $p(x) = p(x_0) \prod_{n=1}^{N} p(x_n \mid x_{n-1})$
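
A small numerical sketch of the sequence-probability formula for a two-state homogeneous chain (the numbers are illustrative, not from the slides):

```python
import numpy as np

p0 = np.array([0.5, 0.5])            # p(x_0)
P = np.array([[0.9, 0.1],            # P[i, j] = p(x_n = j | x_{n-1} = i)
              [0.2, 0.8]])

def sequence_prob(x):
    """p(x) = p(x_0) * prod_{n>=1} p(x_n | x_{n-1}) for a state sequence x."""
    prob = p0[x[0]]
    for prev, cur in zip(x[:-1], x[1:]):
        prob *= P[prev, cur]
    return prob

print(sequence_prob([0, 0, 1, 1]))   # 0.5 * 0.9 * 0.1 * 0.8 = 0.036
```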

2-D Markov chains
- Advantages: simple expressions for probability; simple parameter estimation.
- Disadvantages: no natural ordering of image pixels; anisotropic model behavior.

Random fields on graphs
- Consider a collection of random variables $x = (x_1, x_2, \ldots, x_N)$ with associated joint probability distribution $p(x)$.
- Let $A$, $B$, $C$ be three disjoint subsets of $V$, and let $x_A$ denote the collection of random variables indexed by $A$.
- Conditional independence ($A \perp B \mid C$): $p(x_A, x_B \mid x_C) = p(x_A \mid x_C)\, p(x_B \mid x_C)$
- Markov random field: an undirected graphical model in which each node corresponds to a random variable (or a collection of random variables) and the edges identify conditional dependencies.

Markov properties
- Pairwise Markovianity: $(n_i, n_j) \notin E$ implies that $x_i$ and $x_j$ are independent when conditioned on all other variables: $p(x_i, x_j \mid x_{\setminus \{i,j\}}) = p(x_i \mid x_{\setminus \{i,j\}})\, p(x_j \mid x_{\setminus \{i,j\}})$
- Local Markovianity: given its neighborhood, a variable is independent of the rest of the variables: $p(x_i \mid x_{V \setminus \{i\}}) = p(x_i \mid x_{\mathcal{N}(i)})$
- Global Markovianity: let $A$, $B$, $C$ be three disjoint subsets of $V$. If "$C$ separates $A$ from $B$" implies $p(x_A, x_B \mid x_C) = p(x_A \mid x_C)\, p(x_B \mid x_C)$, then $p(\cdot)$ is global Markov w.r.t. $G$.

Hammersley-Clifford theorem
- Consider a random field $x$ on a graph $G$ such that $p(x) > 0$, and let $\mathcal{C}$ denote the set of all maximal cliques of the graph.
- If the field has the local Markov property, then $p(x)$ can be written as a Gibbs distribution: $p(x) = \frac{1}{Z} \exp\left\{-\sum_{C \in \mathcal{C}} V_C(x_C)\right\}$, where the normalizing constant $Z$ is called the partition function and the $V_C(x_C)$ are the clique potentials.
- Conversely, if $p(x)$ can be written in Gibbs form for the cliques of some graph, then it has the global Markov property.
- Fundamental consequence: every Markov random field can be specified via clique potentials.
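
As a deliberately tiny illustration of the Gibbs form, the sketch below reuses `V` and `E` from the earlier graph example, puts an illustrative pairwise potential on every edge, and computes the partition function by brute-force enumeration (only feasible for a handful of binary variables):

```python
import math
from itertools import product

beta = 1.0
nodes = sorted(V)

def energy(x):
    """Sum of pairwise clique potentials V(x_i, x_j) = beta * [x_i != x_j] over all edges."""
    return sum(beta for a, b in E if x[a] != x[b])

configs = [dict(zip(nodes, bits)) for bits in product([0, 1], repeat=len(nodes))]
Z = sum(math.exp(-energy(x)) for x in configs)     # partition function

def gibbs_prob(x):
    return math.exp(-energy(x)) / Z

print(Z)
print(gibbs_prob({n: 0 for n in nodes}))           # probability of the all-zero labeling
```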

Regular rectangular lattices
- $V = \{(i, j) : i = 1, \ldots, M,\ j = 1, \ldots, N\}$
- Order-$K$ neighborhood system: $\mathcal{N}^K(i, j) = \{(m, n) : (i - m)^2 + (j - n)^2 \le K\}$ (excluding the site $(i, j)$ itself)
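
A short helper (illustrative, not from the slides) that enumerates an order-$K$ neighborhood on an $M \times N$ lattice:

```python
def order_K_neighborhood(i, j, K, M, N):
    """Sites (m, n) on an M x N lattice with (i - m)^2 + (j - n)^2 <= K, excluding (i, j)."""
    return [(m, n)
            for m in range(1, M + 1)
            for n in range(1, N + 1)
            if (m, n) != (i, j) and (i - m) ** 2 + (j - n) ** 2 <= K]

print(order_K_neighborhood(3, 3, 1, 5, 5))        # 4 neighbors  (first-order system)
print(len(order_K_neighborhood(3, 3, 2, 5, 5)))   # 8 neighbors  (second-order system)
```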

Auto-models
- Only pairwise interactions; in terms of clique potentials: $|C| > 2 \Rightarrow V_C(\cdot) = 0$
- The simplest possible neighborhood models.

Gauss-Markov random fields (GMRF)
- Joint probability density (assuming zero mean): $p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2} x^T \Sigma^{-1} x\right\}$
- Quadratic form in the exponent: $x^T \Sigma^{-1} x = \sum_i \sum_j x_i x_j (\Sigma^{-1})_{i,j}$, i.e. an auto-model (only pairwise interactions).
- The neighborhood system is determined by the potential matrix $\Sigma^{-1}$.
- Local conditionals are univariate Gaussian.
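
A minimal numpy sketch (the precision matrix is an arbitrary illustrative choice) showing how the GMRF density is evaluated and how the neighborhood structure is read off the zeros of $\Sigma^{-1}$:

```python
import numpy as np

# Zero-mean GMRF on 4 variables; zeros of the precision matrix Q = Sigma^{-1}
# indicate pairs that are NOT neighbors.
Q = np.array([[ 2.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  2.0]])
Sigma = np.linalg.inv(Q)
n = Q.shape[0]

def gmrf_density(x):
    """p(x) = (2 pi)^(-n/2) |Sigma|^(-1/2) exp(-0.5 x^T Q x)."""
    norm = (2 * np.pi) ** (-n / 2) / np.sqrt(np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * x @ Q @ x)

print(gmrf_density(np.array([0.1, -0.2, 0.3, 0.0])))
print([j for j in range(n) if j != 0 and Q[0, j] != 0])   # neighbors of variable 0: [1]
```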

Gauss-Markov random fields
- Specification via clique potentials: $V_C(x_C) = \frac{1}{2}\left(\sum_{i \in C} \alpha_i^C x_i\right)^2 = \frac{1}{2}\left(\sum_{i \in V} \alpha_i^C x_i\right)^2$, as long as $\alpha_i^C = 0$ for $i \notin C$.
- The exponent of the GMRF density becomes: $\sum_{C \in \mathcal{C}} V_C(x_C) = \frac{1}{2} \sum_{C \in \mathcal{C}} \left(\sum_{i \in V} \alpha_i^C x_i\right)^2 = \frac{1}{2} \sum_{i \in V} \sum_{j \in V} \left(\sum_{C \in \mathcal{C}} \alpha_i^C \alpha_j^C\right) x_i x_j = \frac{1}{2} x^T \Sigma^{-1} x$

GMRF: application to image processing
- Classical image smoothing prior.
- Consider an image to be a rectangular lattice with first-order pixel neighborhoods.
- Cliques: pairs of vertically or horizontally adjacent pixels.
- Clique potentials: squares of first-order differences (an approximation of the continuous derivative): $V_{\{(i,j),(i,j-1)\}}(x_{i,j}, x_{i,j-1}) = \frac{1}{2}(x_{i,j} - x_{i,j-1})^2$
- Resulting $\Sigma^{-1}$: block-tridiagonal with tridiagonal blocks.
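
A sketch (illustrative helper name) that assembles this smoothing precision matrix for a small image so its block-tridiagonal structure can be inspected directly; note the matrix is only positive semi-definite, since constant images have zero smoothing energy:

```python
import numpy as np

def smoothing_precision(M, N):
    """Precision matrix Q (= Sigma^{-1}) of the first-order smoothing prior on an M x N
    image: the energy 0.5 * x^T Q x equals the sum of 0.5 * (x_p - x_q)^2 over all
    horizontally and vertically adjacent pixel pairs (p, q)."""
    idx = lambda i, j: i * N + j
    Q = np.zeros((M * N, M * N))
    for i in range(M):
        for j in range(N):
            for di, dj in [(0, 1), (1, 0)]:           # right and down neighbors
                i2, j2 = i + di, j + dj
                if i2 < M and j2 < N:
                    p, q = idx(i, j), idx(i2, j2)
                    Q[p, p] += 1.0; Q[q, q] += 1.0
                    Q[p, q] -= 1.0; Q[q, p] -= 1.0
    return Q

Q = smoothing_precision(3, 3)
print(Q.shape)     # (9, 9); block-tridiagonal with tridiagonal blocks
```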

Bayesian image restoration with a GMRF prior
- Observation model: $y = Hx + n$, $n \sim \mathcal{N}(0, \sigma^2 I)$
- Smoothing GMRF prior: $p(x) \propto \exp\{-\frac{1}{2} x^T \Sigma^{-1} x\}$
- MAP estimate: $\hat{x} = [\sigma^2 \Sigma^{-1} + H^T H]^{-1} H^T y$
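
Continuing the previous sketch (it reuses `smoothing_precision`), the closed-form MAP estimate is a single linear solve; the image, noise level, and the choice $H = I$ (pure denoising) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
M_rows, N_cols = 3, 3
n_pix = M_rows * N_cols
sigma = 0.1

x_true = rng.normal(size=n_pix)                 # vectorized "clean" image
H = np.eye(n_pix)                               # H = I: denoising only, no blur
y = H @ x_true + sigma * rng.normal(size=n_pix)

Q = smoothing_precision(M_rows, N_cols)         # GMRF prior precision
x_map = np.linalg.solve(sigma ** 2 * Q + H.T @ H, H.T @ y)
print(np.round(x_map.reshape(M_rows, N_cols), 3))
```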

Bayesian image restoration with a GMRF prior (results)
Figure: (a) original image, (b) blurred and slightly noisy image, (c) restored version of (b), (d) no blur, severe noise, (e) restored version of (d).
- Deblurring: good.
- Denoising: oversmoothing; edge discontinuities are smoothed out.
- How to preserve discontinuities? Other prior models: hidden/latent binary random variables; robust potential functions (e.g. $L_1$ rather than $L_2$ norm).

Compound GMRF
- Insert binary variables $v$ to turn off clique potentials.
- Modified clique potentials: $V(x_{i,j}, x_{i,j-1}, v_{i,j}) = \frac{1}{2}(1 - v_{i,j})(x_{i,j} - x_{i,j-1})^2$
- Intuitive explanation: $v_{i,j} = 0$ means the clique potential is quadratic ("on"); $v_{i,j} = 1$ means $V_C(\cdot) = 0$, i.e. no smoothing: the image has an edge at this location.
- Can choose separate latent variables $v$ and $h$ for vertical and horizontal edges respectively: $p(x \mid h, v) \propto \exp\left\{-\frac{1}{2} x^T \Sigma^{-1}(h, v)\, x\right\}$
- MAP estimate: $\hat{x} = [\sigma^2 \Sigma^{-1}(h, v) + H^T H]^{-1} H^T y$

Discontinuity-preserving restoration
- Convex potentials:
  - Generalized Gaussians: $V(x) = |x|^p$, $p \in [1, 2]$
  - Stevenson: $V(x) = x^2$ for $|x| < a$, and $V(x) = 2a|x| - a^2$ for $|x| \ge a$
  - Green: $V(x) = 2a^2 \log\cosh(x/a)$
- Non-convex potentials:
  - Blake and Zisserman: $V(x) = (\min\{|x|, a\})^2$
  - Geman and McClure: $V(x) = \frac{x^2}{x^2 + a^2}$
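
A direct transcription of these potentials into numpy (parameter defaults are illustrative), useful for plotting and comparing their tails:

```python
import numpy as np

def generalized_gaussian(x, p=1.5):
    return np.abs(x) ** p

def stevenson(x, a=1.0):
    # Quadratic near zero, linear beyond a (Huber-like, convex).
    return np.where(np.abs(x) < a, x ** 2, 2 * a * np.abs(x) - a ** 2)

def green(x, a=1.0):
    return 2 * a ** 2 * np.log(np.cosh(x / a))

def blake_zisserman(x, a=1.0):
    # Truncated quadratic (non-convex): constant cost beyond a.
    return np.minimum(np.abs(x), a) ** 2

def geman_mcclure(x, a=1.0):
    # Bounded, non-convex: saturates at 1 for large |x|.
    return x ** 2 / (x ** 2 + a ** 2)

t = np.linspace(-3.0, 3.0, 7)
print(np.round(geman_mcclure(t), 3))
```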

Ising model (2-D MRF)
- $V(x_i, x_j) = \beta\, \delta(x_i \ne x_j)$, i.e. a neighboring pair contributes $\beta$ when its labels differ and $0$ otherwise; $\beta$ is a model parameter.
- Energy function: $\sum_{C \in \mathcal{C}} V_C(x_C) = \beta \times (\text{boundary length})$
- Longer boundaries are less probable.
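
A short numpy sketch of the Ising energy for a 2-D label array, counting disagreeing horizontal and vertical neighbor pairs (i.e. the boundary length):

```python
import numpy as np

def ising_energy(labels, beta=1.0):
    """beta times the number of horizontally/vertically adjacent pixel pairs
    whose labels differ, i.e. beta times the boundary length."""
    horiz = np.sum(labels[:, 1:] != labels[:, :-1])
    vert = np.sum(labels[1:, :] != labels[:-1, :])
    return beta * (horiz + vert)

x = np.array([[0, 0, 1],
              [0, 1, 1],
              [0, 1, 1]])
print(ising_energy(x, beta=0.5))   # 0.5 * boundary length of 4 = 2.0
```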

Application: image segmentation
- A discrete MRF is used to model the segmentation field.
- Each class is represented by a value $X_s \in \{0, \ldots, M-1\}$.
- Joint distribution: $P\{Y \in dy, X = x\} = p(y \mid x)\, p(x)$
- (Bayesian) MAP estimation: $\hat{X} = \arg\max_x p_{X \mid Y}(x \mid Y) = \arg\max_x \left(\log p(y \mid x) + \log p(x)\right)$

MAP optimization for segmentation
- Data model: $p_{y \mid x}(y \mid x) = \prod_{s \in S} p(y_s \mid x_s)$
- Prior (Ising) model: $p_x(x) = \frac{1}{Z} \exp\{-\beta t_1(x)\}$, where $t_1(x)$ is the number of horizontal and vertical neighbor pairs of $x$ having different values
- MAP estimate (a hard optimization problem): $\hat{x} = \arg\min_x \{-\log p_{y \mid x}(y \mid x) + \beta t_1(x)\}$

Some proposed approaches
- Iterated conditional modes (ICM): iterative minimization with respect to each pixel
- Simulated annealing: generate samples from the posterior distribution (with a decreasing temperature)
- Multi-resolution segmentation
Figure: (a) synthetic image with three textures, (b) ICM, (c) simulated annealing, (d) multi-resolution approach.
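
A minimal ICM sketch for the segmentation objective above, assuming a Gaussian data model $p(y_s \mid x_s) = \mathcal{N}(y_s; \mu_{x_s}, \sigma^2)$ with known class means; the test image, means, and parameter values are illustrative, not from the slides:

```python
import numpy as np

def icm_segment(y, means, beta=1.0, sigma=1.0, n_iters=10):
    """Iterated conditional modes: starting from the maximum-likelihood labeling,
    repeatedly re-label each pixel to minimize its local cost
    (y_s - mu_k)^2 / (2 sigma^2) + beta * (# of 4-neighbors with a different label)."""
    means = np.asarray(means, dtype=float)
    labels = np.argmin((y[..., None] - means) ** 2, axis=-1)
    H, W = y.shape
    for _ in range(n_iters):
        for i in range(H):
            for j in range(W):
                costs = []
                for k in range(len(means)):
                    data = (y[i, j] - means[k]) ** 2 / (2 * sigma ** 2)
                    disagree = sum(labels[m, n] != k
                                   for m, n in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                                   if 0 <= m < H and 0 <= n < W)
                    costs.append(data + beta * disagree)
                labels[i, j] = int(np.argmin(costs))
    return labels

y = np.array([[0.1, 0.2, 0.9],
              [0.0, 0.8, 1.1],
              [0.1, 0.9, 1.0]])
print(icm_segment(y, means=[0.0, 1.0], beta=0.5, sigma=0.3))
```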

Summary
- Graphical models describe probability distributions whose conditional dependencies arise from a specific graph structure.
- A Markov random field is an undirected graphical model with special factorization properties.
- 2-D MRFs have been widely used as priors in image processing problems.
- The choice of potential functions leads to different optimization problems.