Markov Random Fields for Computer Vision (Part 1)

Markov Random Fields for Computer Vision (Part 1) Machine Learning Summer School (MLSS 2011) Stephen Gould stephen.gould@anu.edu.au Australian National University 13-17 June, 2011

Pixel Labeling
Label every pixel in an image with a class label from some pre-defined set, i.e., y_p ∈ L.
Interactive figure-ground segmentation (Boykov and Jolly, 2001; Boykov and Funka-Lea, 2006)
Surface context (Hoiem et al., 2005)
Semantic labeling (He et al., 2004; Shotton et al., 2006; Gould et al., 2009)
Stereo matching (Scharstein and Szeliski, 2002)
Image denoising (Felzenszwalb and Huttenlocher, 2004; Szeliski et al., 2008)

Digital Photo Montage (Agarwala et al., 2004)

Digital Photo Montage demonstration

Tutorial Overview
Part 1. Pairwise conditional Markov random fields for the pixel labeling problem (45 minutes)
Part 2. Pseudo-Boolean functions and graph cuts (1 hour)
Part 3. Higher-order terms and inference as integer programming (30 minutes)
Please ask lots of questions.

Probability Review
Bayes rule:
    P(y | x) = P(x | y) P(y) / P(x)
where P(y | x) is the posterior, P(x | y) the likelihood, and P(y) the prior.
Maximum a posteriori (MAP) inference: y* = argmax_y P(y | x).
Conditional independence: random variables y and x are conditionally independent given z if P(y, x | z) = P(y | z) P(x | z).
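
As an illustration (not part of the slides), the definitions above can be checked numerically. The prior and likelihood values below are made-up numbers for a binary label; the sketch computes the posterior with Bayes rule and picks the MAP label.

    # Hypothetical numbers: prior over a binary label y and likelihood of an
    # observation x under each label (illustrative only, not from the slides).
    prior = {0: 0.7, 1: 0.3}            # P(y)
    likelihood = {0: 0.2, 1: 0.9}       # P(x | y) for the observed x

    # Bayes rule: P(y | x) = P(x | y) P(y) / P(x), with P(x) = sum_y P(x | y) P(y)
    evidence = sum(likelihood[y] * prior[y] for y in prior)
    posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}

    # MAP inference: y* = argmax_y P(y | x)
    y_map = max(posterior, key=posterior.get)
    print(posterior, y_map)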

Graphical Models
We can exploit conditional independence assumptions to represent probability distributions in a way that is both compact and efficient for inference. This tutorial is all about one particular representation, called a Markov Random Field (MRF), and the associated inference algorithms that are used in computer vision.
[Figure: four random variables A, B, C, D connected in a cycle A-B-D-C-A; for this graph a ⊥ d | b, c.]
The joint distribution factorizes over the edges of the graph:
    P(a, b, c, d) = (1/Z) Ψ(a, b) Ψ(b, d) Ψ(d, c) Ψ(c, a)
                  = (1/Z) exp{-ψ(a, b) - ψ(b, d) - ψ(d, c) - ψ(c, a)}
where ψ = -log Ψ.
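
As a concrete check of this factorization (not from the slides), the sketch below enumerates all joint states of four binary variables on the cycle, normalizes the product of pairwise potentials, and verifies that the result sums to one. The potential values are hypothetical.

    import itertools

    # Hypothetical pairwise potential: prefers equal neighbouring values.
    def big_psi(u, v):
        return 2.0 if u == v else 1.0

    states = list(itertools.product([0, 1], repeat=4))   # (a, b, c, d)

    # Unnormalized measure along the cycle A-B-D-C-A.
    def score(a, b, c, d):
        return big_psi(a, b) * big_psi(b, d) * big_psi(d, c) * big_psi(c, a)

    Z = sum(score(*s) for s in states)                    # partition function
    P = {s: score(*s) / Z for s in states}                # joint distribution
    assert abs(sum(P.values()) - 1.0) < 1e-12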

Energy Functions
Let x be some observations (i.e., features from the image) and let y = (y_1, ..., y_n) be a vector of random variables. Then we can write the conditional probability of y given x as
    P(y | x) = (1/Z(x)) exp{-E(y; x)}
where Z(x) = Σ_{y ∈ L^n} exp{-E(y; x)} is called the partition function.
The energy function E(y; x) usually has some structured form:
    E(y; x) = Σ_c ψ_c(y_c; x)
where the ψ_c(y_c; x) are clique potentials defined over subsets of random variables y_c ⊆ y.

Conditional Markov Random Fields
    E(y; x) = Σ_c ψ_c(y_c; x)
            = Σ_{i ∈ V} ψ^U_i(y_i; x) + Σ_{ij ∈ E} ψ^P_ij(y_i, y_j; x) + Σ_{c ∈ C} ψ^H_c(y_c; x)
with unary, pairwise, and higher-order terms, respectively.
[Figure: a 3x3 grid CRF with observed variables x_1, ..., x_9 attached to label variables y_1, ..., y_9.]

Pixel Neighbourhoods
[Figure: a 3x3 grid of variables y_1, ..., y_9 shown with its 4-connected neighbourhood N_4 and its 8-connected neighbourhood N_8.]
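
As an illustration (not from the slides), a minimal sketch that builds the N_4 and N_8 edge lists for an H x W pixel grid; the row/column indexing convention is assumed.

    def grid_edges(H, W, eight_connected=False):
        """Return the list of neighbouring pixel pairs ((r, c), (r2, c2))."""
        offsets = [(0, 1), (1, 0)]                     # right, down: N_4
        if eight_connected:
            offsets += [(1, 1), (1, -1)]               # diagonals: N_8
        edges = []
        for r in range(H):
            for c in range(W):
                for dr, dc in offsets:
                    r2, c2 = r + dr, c + dc
                    if 0 <= r2 < H and 0 <= c2 < W:
                        edges.append(((r, c), (r2, c2)))
        return edges

    # A 3x3 grid has 12 edges under N_4 and 20 under N_8.
    assert len(grid_edges(3, 3)) == 12
    assert len(grid_edges(3, 3, eight_connected=True)) == 20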

Binary MRF Example
Consider the following energy function for two binary random variables, y_1 and y_2:
    E(y_1, y_2) = ψ_1(y_1) + ψ_2(y_2) + ψ_12(y_1, y_2)
with potentials
    ψ_1:  ψ_1(0) = 5,  ψ_1(1) = 2
    ψ_2:  ψ_2(0) = 1,  ψ_2(1) = 3
    ψ_12: ψ_12(0,0) = 0,  ψ_12(0,1) = 3,  ψ_12(1,0) = 4,  ψ_12(1,1) = 0
Equivalently,
    E(y_1, y_2) = 5 ȳ_1 + 2 y_1 + ȳ_2 + 3 y_2 + 3 ȳ_1 y_2 + 4 y_1 ȳ_2
where ȳ_1 = 1 - y_1 and ȳ_2 = 1 - y_2.
Graphical model: y_1 - y_2.
Probability table:
    y_1  y_2  E    P
    0    0    6    0.244
    0    1    11   0.002
    1    0    7    0.090
    1    1    5    0.664
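
The probability table can be reproduced by brute-force enumeration; a short sketch (not from the slides) using the potentials above:

    import math

    psi1 = {0: 5.0, 1: 2.0}                     # unary on y1
    psi2 = {0: 1.0, 1: 3.0}                     # unary on y2
    psi12 = {(0, 0): 0.0, (0, 1): 3.0,          # pairwise on (y1, y2)
             (1, 0): 4.0, (1, 1): 0.0}

    def E(y1, y2):
        return psi1[y1] + psi2[y2] + psi12[(y1, y2)]

    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    Z = sum(math.exp(-E(*s)) for s in states)   # partition function
    for s in states:
        # Prints E = 6, 11, 7, 5 and P = 0.244, 0.002, 0.090, 0.664.
        print(s, E(*s), math.exp(-E(*s)) / Z)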

Compactness of Representation
Consider a 1 mega-pixel image, e.g., 1000 x 1000 pixels. We want to annotate each pixel with a label from L. Let L = |L| denote the number of labels.
There are L^(10^6) possible ways to label such an image. A naive encoding, i.e., one big table, would require L^(10^6) - 1 parameters.
A pairwise MRF over N_4 requires 10^6 L parameters for the unary terms and 2 x 1000 x (1000 - 1) L^2 parameters for the pairwise terms, i.e., O(10^6 L^2). Even fewer are required if we share parameters.
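
The counts above follow directly from the grid size; a quick check (the label-set size below is an illustrative assumption, not from the slides):

    H = W = 1000
    L = 10                                        # illustrative label-set size

    unary_params = H * W * L                      # one cost per pixel per label: 10^6 L
    n4_edges = H * (W - 1) + (H - 1) * W          # horizontal + vertical edges = 2 * 1000 * 999
    pairwise_params = n4_edges * L ** 2           # one cost per edge per label pair: ~2 * 10^6 L^2

    print(unary_params, pairwise_params)          # 10000000 and 199800000 for L = 10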

Inference and Energy Minimization
We are usually interested in finding the most probable labeling,
    y* = argmax_y P(y | x) = argmin_y E(y; x).
This is known as maximum a posteriori (MAP) inference or energy minimization.
A number of techniques can be used to find y*, including:
    message-passing (dynamic programming)
    integer programming (part 3)
    graph cuts (part 2)
However, in general, inference is NP-hard.
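
The general-purpose algorithms are covered in parts 2 and 3, but the message-passing (dynamic programming) idea is easy to sketch for a chain-structured model. The min-sum recursion below is a standard construction added here for illustration, not taken from the slides; the costs in the tiny example are made up.

    import numpy as np

    def chain_map(unary, pairwise):
        """Min-sum dynamic programming on a chain.

        unary:    (n, L) array of unary costs psi_i(y_i)
        pairwise: (L, L) array of pairwise costs psi(y_i, y_{i+1}) shared by all edges
        Returns the minimizing labeling y* and its energy.
        """
        n, L = unary.shape
        cost = unary[0].copy()                    # best cost of a prefix ending in each label
        back = np.zeros((n, L), dtype=int)
        for i in range(1, n):
            # candidate[k, l] = cost of ending node i-1 at label k plus edge cost (k, l)
            candidate = cost[:, None] + pairwise
            back[i] = candidate.argmin(axis=0)
            cost = candidate.min(axis=0) + unary[i]
        y = np.zeros(n, dtype=int)
        y[-1] = cost.argmin()
        for i in range(n - 1, 0, -1):             # trace back the argmins
            y[i - 1] = back[i, y[i]]
        return y, cost.min()

    # Tiny example: 4 nodes, 2 labels, Potts smoothing cost 1.
    unary = np.array([[0.0, 2.0], [1.5, 0.5], [1.0, 1.0], [2.0, 0.0]])
    pairwise = np.array([[0.0, 1.0], [1.0, 0.0]])
    print(chain_map(unary, pairwise))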

Characterizing Markov Random Fields
Markov random fields can be characterized along a number of different dimensions:
    Label space: binary vs. multi-label; homogeneous vs. heterogeneous.
    Order: unary vs. pairwise vs. higher-order.
    Structure: chain vs. tree vs. grid vs. general graph; neighbourhood size.
    Potentials: submodular, convex, compressible.
These all affect the tractability of inference.

Markov Random Fields for Pixel Labeling
    P(y | x) ∝ P(x | y) P(y) = exp{-E(y; x)}
    E(y; x) = Σ_{i ∈ V} ψ^U_i(y_i; x) + λ Σ_{ij ∈ N_8} ψ^P_ij(y_i, y_j; x)
with unary (likelihood) and pairwise (prior) terms:
    ψ^U_i(y_i; x) = -Σ_{l ∈ L} [[y_i = l]] log P(x_i | l)    (likelihood)
    ψ^P_ij(y_i, y_j; x) = [[y_i ≠ y_j]]    (Potts prior)
Here the prior acts to smooth predictions (independent of x).
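
As an illustration of this model (not from the slides), the sketch below evaluates E(y; x) for a given labeling from an array of per-pixel negative log-likelihood costs and a Potts term weighted by λ; the edge list can be produced by the neighbourhood sketch above (N_4 or N_8).

    import numpy as np

    def pixel_labeling_energy(unary_cost, labels, edges, lam):
        """E(y; x) = sum_i psi_U(y_i; x) + lam * sum_{ij} [[y_i != y_j]] (Potts).

        unary_cost: (H, W, L) array, e.g. -log P(x_i | l)
        labels:     (H, W) integer labeling y
        edges:      list of neighbouring pixel pairs ((r, c), (r2, c2))
        """
        H, W = labels.shape
        rows, cols = np.indices((H, W))
        unary = unary_cost[rows, cols, labels].sum()      # pick the cost of each assigned label
        potts = sum(labels[p] != labels[q] for p, q in edges)
        return unary + lam * potts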

Prior Strength
[Figure: segmentation results for increasing prior strength λ = 1, 4, 16, 128, 1024.]

Interactive Segmentation Model
Label space: foreground or background, L = {0, 1}.
Unary term: Gaussian mixture models for foreground and background,
    ψ^U_i(y_i; x) = min_k { (1/2) log|Σ_k| + (1/2) (x_i - μ_k)^T Σ_k^{-1} (x_i - μ_k) - log λ_k }
Pairwise term: contrast-dependent smoothness prior,
    ψ^P_ij(y_i, y_j; x) = λ_0 + λ_1 exp(-||x_i - x_j||^2 / (2β))  if y_i ≠ y_j,  0 otherwise.
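
A small sketch of the contrast-dependent pairwise term (illustrative only; λ_0, λ_1, and β are model parameters, and the colour values below are made up):

    import numpy as np

    def contrast_pairwise(xi, xj, lam0, lam1, beta):
        """Cost added when y_i != y_j: lam0 + lam1 * exp(-||x_i - x_j||^2 / (2 beta))."""
        d2 = np.sum((np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)) ** 2)
        return lam0 + lam1 * np.exp(-d2 / (2.0 * beta))

    # Similar colours keep a strong penalty for a label change; across a strong
    # image edge the penalty drops towards lam0.
    print(contrast_pairwise([120, 80, 60], [122, 81, 58], lam0=1.0, lam1=5.0, beta=400.0))
    print(contrast_pairwise([120, 80, 60], [30, 200, 10], lam0=1.0, lam1=5.0, beta=400.0))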

Geometric/Semantic Labeling Model
Label space: pre-defined label set, e.g., L = {sky, tree, grass, ...}.
Unary term: boosted decision-tree classifiers over texton-layout features (Shotton et al., 2006),
    ψ^U_i(y_i = l; x) = -θ_l log P(φ_i(x) | l)
Pairwise term: contrast-dependent smoothness prior,
    ψ^P_ij(y_i, y_j; x) = λ_0 + λ_1 exp(-||x_i - x_j||^2 / (2β))  if y_i ≠ y_j,  0 otherwise.

Stereo Matching Model
Label space: pixel disparity, L = {0, 1, ..., 127}.
Unary term: sum of absolute differences (SAD) or normalized cross-correlation (NCC),
    ψ^U_i(y_i; x) = Σ_{(u,v) ∈ W} |x_left(u, v) - x_right(u - y_i, v)|
Pairwise term: discontinuity-preserving (truncated linear) prior,
    ψ^P_ij(y_i, y_j) = min{|y_i - y_j|, d_max}
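
A sketch of the SAD unary term and the truncated-linear pairwise term (not from the slides), assuming grayscale images and a square window centred on the pixel; border handling and the NCC alternative are omitted.

    import numpy as np

    def sad_unary(left, right, r, c, disparity, half_window=2):
        """Sum of absolute differences between a window in the left image and the
        same window shifted horizontally by `disparity` in the right image."""
        H, W = left.shape
        total = 0.0
        for u in range(r - half_window, r + half_window + 1):
            for v in range(c - half_window, c + half_window + 1):
                if 0 <= u < H and 0 <= v < W and 0 <= v - disparity < W:
                    total += abs(float(left[u, v]) - float(right[u, v - disparity]))
        return total

    def truncated_linear(yi, yj, d_max):
        """Discontinuity-preserving pairwise term: min(|y_i - y_j|, d_max)."""
        return min(abs(yi - yj), d_max)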

Image Denoising Model
Label space: pixel intensity or colour, L = {0, 1, ..., 255}.
Unary term: squared distance,
    ψ^U_i(y_i; x) = ||y_i - x_i||^2
Pairwise term: truncated L2 distance,
    ψ^P_ij(y_i, y_j) = min{||y_i - y_j||^2, d_max^2}
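
The denoising potentials translate directly into code; a minimal sketch for scalar intensities (the colour case would replace the squares with squared vector norms):

    def denoising_unary(yi, xi):
        """Squared distance between the hypothesized intensity and the observation."""
        return (float(yi) - float(xi)) ** 2

    def truncated_quadratic(yi, yj, d_max):
        """Truncated L2 pairwise term: min(|y_i - y_j|^2, d_max^2)."""
        return min((float(yi) - float(yj)) ** 2, float(d_max) ** 2)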

Digital Photo Montage Model
Label space: image index, L = {1, 2, ..., K}.
Unary term: none!
Pairwise term: seam penalty,
    ψ^P_ij(y_i, y_j; x) = ||x_{y_i}(i) - x_{y_j}(i)|| + ||x_{y_i}(j) - x_{y_j}(j)||
(or an edge-normalized variant).
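
A sketch of the seam penalty (not from the slides), assuming the K source images are stored as an indexable collection of aligned colour arrays; names and indexing are illustrative.

    import numpy as np

    def seam_penalty(images, yi, yj, pi, pj):
        """||x_{y_i}(p_i) - x_{y_j}(p_i)|| + ||x_{y_i}(p_j) - x_{y_j}(p_j)||.

        images: sequence of aligned source images, indexed by label
        yi, yj: labels (source-image indices) of the two neighbouring pixels
        pi, pj: pixel coordinates as (row, col) tuples
        """
        a = np.linalg.norm(images[yi][pi].astype(float) - images[yj][pi].astype(float))
        b = np.linalg.norm(images[yi][pj].astype(float) - images[yj][pj].astype(float))
        return a + b   # zero whenever both pixels are taken from the same source image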

end of part 1