Markov Random Fields for Computer Vision (Part 1)
Machine Learning Summer School (MLSS 2011)
Stephen Gould, stephen.gould@anu.edu.au
Australian National University
13–17 June, 2011
Pixel Labeling

Label every pixel in an image with a class label from some pre-defined set, i.e., $y_p \in \mathcal{L}$.

- Interactive figure-ground segmentation (Boykov and Jolly, 2001; Boykov and Funka-Lea, 2006)
- Surface context (Hoiem et al., 2005)
- Semantic labeling (He et al., 2004; Shotton et al., 2006; Gould et al., 2009)
- Stereo matching (Scharstein and Szeliski, 2002)
- Image denoising (Felzenszwalb and Huttenlocher, 2004; Szeliski et al., 2008)
Digital Photo Montage (Agarwala et al., 2004)
Digital Photo Montage demonstration
Tutorial Overview

Part 1. Pairwise conditional Markov random fields for the pixel labeling problem (45 minutes)
Part 2. Pseudo-Boolean functions and graph-cuts (1 hour)
Part 3. Higher-order terms and inference as integer programming (30 minutes)

Please ask lots of questions!
Probability Review

Bayes rule:

$$\underbrace{P(y \mid x)}_{\text{posterior}} = \frac{\overbrace{P(x \mid y)}^{\text{likelihood}} \; \overbrace{P(y)}^{\text{prior}}}{P(x)}$$

Maximum a posteriori (MAP) inference: $y^\star = \operatorname{argmax}_y P(y \mid x)$.

Conditional independence: random variables $y$ and $x$ are conditionally independent given $z$ if $P(y, x \mid z) = P(y \mid z)\,P(x \mid z)$.
Graphical Models

We can exploit conditional independence assumptions to represent probability distributions in a way that is both compact and efficient for inference.

This tutorial is all about one particular representation, called a Markov random field (MRF), and the associated inference algorithms that are used in computer vision.

[Figure: graphical models over variables A, B, C, D, including an undirected four-node cycle whose joint distribution factorizes as $\frac{1}{Z}\Psi(a,b)\Psi(b,d)\Psi(d,c)\Psi(c,a)$]
Graphical Models

[Figure: undirected graphical model, a four-node cycle A, B, D, C]

$$P(a,b,c,d) = \frac{1}{Z}\,\Psi(a,b)\,\Psi(b,d)\,\Psi(d,c)\,\Psi(c,a) = \frac{1}{Z}\exp\{-\psi(a,b) - \psi(b,d) - \psi(d,c) - \psi(c,a)\}$$

where $\psi = -\log \Psi$.
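As a concrete illustration (not part of the original slides), here is a minimal Python sketch of this four-variable cycle with made-up binary potential tables. It computes the partition function $Z$ by brute-force enumeration and checks that the product-of-potentials and exponentiated-energy forms agree:

```python
import itertools
import numpy as np

# Hypothetical pairwise potentials Psi(u, v) for binary variables,
# indexed as Psi[u, v]; the values are invented for illustration.
Psi_ab = np.array([[2.0, 1.0], [1.0, 3.0]])
Psi_bd = np.array([[1.0, 2.0], [2.0, 1.0]])
Psi_dc = np.array([[3.0, 1.0], [1.0, 2.0]])
Psi_ca = np.array([[1.0, 1.0], [2.0, 2.0]])

def unnormalized(a, b, c, d):
    # Product of clique potentials around the cycle A-B-D-C.
    return Psi_ab[a, b] * Psi_bd[b, d] * Psi_dc[d, c] * Psi_ca[c, a]

# Partition function by brute-force enumeration (feasible only for tiny models).
Z = sum(unnormalized(a, b, c, d)
        for a, b, c, d in itertools.product([0, 1], repeat=4))

# P(a,b,c,d) = (1/Z) prod Psi = (1/Z) exp{-sum psi} with psi = -log Psi.
for a, b, c, d in itertools.product([0, 1], repeat=4):
    p = unnormalized(a, b, c, d) / Z
    energy = -(np.log(Psi_ab[a, b]) + np.log(Psi_bd[b, d])
               + np.log(Psi_dc[d, c]) + np.log(Psi_ca[c, a]))
    assert abs(p - np.exp(-energy) / Z) < 1e-12
```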
Energy Functions

Let $x$ be some observations (i.e., features from the image) and let $y = (y_1, \ldots, y_n)$ be a vector of random variables. Then we can write the conditional probability of $y$ given $x$ as

$$P(y \mid x) = \frac{1}{Z(x)} \exp\{-E(y; x)\}$$

where $Z(x) = \sum_{y \in \mathcal{L}^n} \exp\{-E(y; x)\}$ is called the partition function.

The energy function $E(y; x)$ usually has some structured form:

$$E(y; x) = \sum_c \psi_c(y_c; x)$$

where the $\psi_c(y_c; x)$ are clique potentials defined over subsets of random variables $y_c \subseteq y$.
Conditional Markov Random Fields

$$E(y; x) = \sum_c \psi_c(y_c; x) = \underbrace{\sum_{i \in \mathcal{V}} \psi^U_i(y_i; x)}_{\text{unary}} + \underbrace{\sum_{ij \in \mathcal{E}} \psi^P_{ij}(y_i, y_j; x)}_{\text{pairwise}} + \underbrace{\sum_{c \in \mathcal{C}} \psi^H_c(y_c; x)}_{\text{higher-order}}$$

[Figure: grid-structured CRF; each label $y_i$ connects to its observation $x_i$ and to its neighbouring labels, for $i = 1, \ldots, 9$]
Pixel Neighbourhoods

[Figure: two 3×3 pixel grids over $y_1, \ldots, y_9$: the 4-connected neighbourhood $\mathcal{N}_4$ (left) and the 8-connected neighbourhood $\mathcal{N}_8$ (right)]
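Enumerating these neighbourhoods is a common building block. The following sketch (illustrative, not from the slides) returns the $\mathcal{N}_4$ or $\mathcal{N}_8$ pixel pairs of a grid:

```python
def grid_neighbours(height, width, connectivity=4):
    """Return the neighbouring pixel pairs (N4 or N8) on a height-by-width
    grid; pixels are indexed as p = row * width + col."""
    offsets = [(0, 1), (1, 0)]                    # right, down   -> N4
    if connectivity == 8:
        offsets += [(1, 1), (1, -1)]              # two diagonals -> N8
    pairs = []
    for r in range(height):
        for c in range(width):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    pairs.append((r * width + c, rr * width + cc))
    return pairs

# A 3x3 grid has 12 N4 edges and 20 N8 edges.
assert len(grid_neighbours(3, 3, 4)) == 12
assert len(grid_neighbours(3, 3, 8)) == 20
```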
Binary MRF Example

Consider the following energy function for two binary random variables, $y_1$ and $y_2$, with potentials $\psi_1(0) = 5$, $\psi_1(1) = 2$; $\psi_2(0) = 1$, $\psi_2(1) = 3$; and $\psi_{12}(0,0) = \psi_{12}(1,1) = 0$, $\psi_{12}(0,1) = 3$, $\psi_{12}(1,0) = 4$:

$$E(y_1, y_2) = \psi_1(y_1) + \psi_2(y_2) + \psi_{12}(y_1, y_2) = \underbrace{5\bar{y}_1 + 2y_1}_{\psi_1} + \underbrace{\bar{y}_2 + 3y_2}_{\psi_2} + \underbrace{3\bar{y}_1 y_2 + 4y_1\bar{y}_2}_{\psi_{12}}$$

where $\bar{y}_1 = 1 - y_1$ and $\bar{y}_2 = 1 - y_2$.

Graphical model: two connected nodes $y_1$ and $y_2$.

Probability table:

y1  y2  |  E    P
0   0   |  6    0.244
0   1   | 11    0.002
1   0   |  7    0.090
1   1   |  5    0.664
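The table can be verified mechanically. A small sketch (illustrative code, not part of the original slides) that enumerates all four states:

```python
import itertools
import math

def E(y1, y2):
    # E(y1,y2) = 5*~y1 + 2*y1 + ~y2 + 3*y2 + 3*~y1*y2 + 4*y1*~y2
    ny1, ny2 = 1 - y1, 1 - y2
    return 5*ny1 + 2*y1 + ny2 + 3*y2 + 3*ny1*y2 + 4*y1*ny2

states = list(itertools.product([0, 1], repeat=2))
Z = sum(math.exp(-E(y1, y2)) for y1, y2 in states)
for y1, y2 in states:
    print(y1, y2, E(y1, y2), round(math.exp(-E(y1, y2)) / Z, 3))
# Prints energies 6, 11, 7, 5 and probabilities 0.244, 0.002, 0.090, 0.664.
```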
Compactness of Representation

Consider a 1 mega-pixel image, i.e., $1000 \times 1000$ pixels. We want to annotate each pixel with a label from $\mathcal{L}$. Let $L = |\mathcal{L}|$.

There are $L^{10^6}$ possible ways to label such an image. A naive encoding, i.e., one big table, would require $L^{10^6} - 1$ parameters.

A pairwise MRF over $\mathcal{N}_4$ requires $10^6 L$ parameters for the unary terms and $2 \times 1000 \times (1000 - 1) \, L^2$ parameters for the pairwise terms, i.e., $O(10^6 L^2)$. Even fewer are required if we share parameters.
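The arithmetic can be checked in a few lines; the label count $L = 8$ below is an arbitrary choice for illustration:

```python
# Parameter counting for a 1000x1000 image (sketch of the slide's arithmetic).
L = 8                                    # assumed number of labels
n = 1000 * 1000                          # pixels
unary = n * L                            # one L-entry table per pixel
pairwise = 2 * 1000 * (1000 - 1) * L**2  # N4 edges, one LxL table each
print(unary + pairwise)                  # ~1.4e8, versus L**n - 1 for a flat table
```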
Inference and Energy Minimization

We are usually interested in finding the most probable labeling,

$$y^\star = \operatorname{argmax}_y P(y \mid x) = \operatorname{argmin}_y E(y; x).$$

This is known as maximum a posteriori (MAP) inference or energy minimization.

A number of techniques can be used to find $y^\star$, including:

- message-passing (dynamic programming)
- integer programming (part 3)
- graph-cuts (part 2)

However, in general, inference is NP-hard.
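For chain-structured (and, more generally, tree-structured) models, message-passing computes the exact MAP labeling in $O(nL^2)$ time. A minimal min-sum dynamic programming sketch for a chain, assuming for simplicity a single pairwise table shared by all consecutive pairs (an illustrative assumption, not from the slides):

```python
import numpy as np

def chain_map(unary, pairwise):
    """Exact MAP on a chain MRF by min-sum dynamic programming.
    unary: (n, L) array of unary costs; pairwise: (L, L) cost table shared
    by all consecutive pairs."""
    n, L = unary.shape
    cost = unary[0].copy()                 # best energy ending in each label
    back = np.zeros((n, L), dtype=int)
    for i in range(1, n):
        total = cost[:, None] + pairwise   # total[a, b]: prev=a, current=b
        back[i] = total.argmin(axis=0)
        cost = total.min(axis=0) + unary[i]
    y = np.zeros(n, dtype=int)
    y[-1] = cost.argmin()
    for i in range(n - 1, 0, -1):          # backtrack the best path
        y[i - 1] = back[i, y[i]]
    return y

# Potts pairwise on 3 labels; strong unaries win where evidence is clear.
potts = 1.0 - np.eye(3)
unary = np.array([[0.0, 2, 2], [2, 0, 2], [2, 0, 2], [2, 2, 0.0]])
print(chain_map(unary, potts))  # [0 1 1 2]
```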
Characterizing Markov Random Fields

Markov random fields can be characterized along a number of different dimensions:

- Label space: binary vs. multi-label; homogeneous vs. heterogeneous.
- Order: unary vs. pairwise vs. higher-order.
- Structure: chain vs. tree vs. grid vs. general graph; neighbourhood size.
- Potentials: submodular, convex, compressible.

These all affect the tractability of inference.
Markov Random Fields for Pixel Labeling

$$P(y \mid x) \propto P(x \mid y)\,P(y) = \exp\{-E(y; x)\}$$

$$E(y; x) = \underbrace{\sum_{i \in \mathcal{V}} \psi^U_i(y_i; x)}_{\text{unary (likelihood)}} + \lambda \underbrace{\sum_{ij \in \mathcal{N}_8} \psi^P_{ij}(y_i, y_j; x)}_{\text{pairwise (prior)}}$$

with

$$\psi^U_i(y_i; x) = -\sum_{l \in \mathcal{L}} [\![y_i = l]\!] \log P(x_i \mid l), \qquad \psi^P_{ij}(y_i, y_j; x) = \underbrace{[\![y_i \neq y_j]\!]}_{\text{Potts}}$$

Here the prior acts to smooth predictions (independent of $x$).
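A direct transcription of this energy into code might look as follows (an illustrative sketch; `grid_neighbours` refers to the helper sketched earlier):

```python
import numpy as np

def labeling_energy(y, log_lik, lam, neighbours):
    """E(y;x) = -sum_i log P(x_i | y_i) + lam * sum_ij [[y_i != y_j]].
    y: (n,) integer labels; log_lik: (n, L) per-pixel class log-likelihoods;
    neighbours: list of (i, j) pairs, e.g. from grid_neighbours above."""
    n = len(y)
    unary = -log_lik[np.arange(n), y].sum()
    potts = sum(y[i] != y[j] for i, j in neighbours)
    return unary + lam * potts
```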
Prior Strength

[Figure: segmentations with increasing smoothness, for λ = 1, 4, 16, 128, 1024]
Interactive Segmentation Model

Label space: foreground or background, $\mathcal{L} = \{0, 1\}$

Unary term: Gaussian mixture models for foreground and background

$$\psi^U_i(y_i; x) = \min_k \left\{ \tfrac{1}{2}\log|\Sigma_k| + \tfrac{1}{2}(x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k) - \log \lambda_k \right\}$$

Pairwise term: contrast-dependent smoothness prior

$$\psi^P_{ij}(y_i, y_j; x) = \begin{cases} \lambda_0 + \lambda_1 \exp\left(-\frac{\|x_i - x_j\|^2}{2\beta}\right), & \text{if } y_i \neq y_j \\ 0, & \text{otherwise} \end{cases}$$
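A sketch of how these two terms might be evaluated (illustrative only; in practice separate foreground and background GMMs would be fit to user-marked pixels, and the parameter names below are assumptions):

```python
import numpy as np

def gmm_unary(x, weights, means, covs):
    """Unary cost for one pixel colour x under a GMM: the minimum over
    components k of 0.5*log|S_k| + 0.5*(x-mu_k)^T S_k^{-1} (x-mu_k) - log w_k."""
    costs = []
    for w, mu, S in zip(weights, means, covs):
        d = x - mu
        costs.append(0.5 * np.log(np.linalg.det(S))
                     + 0.5 * d @ np.linalg.solve(S, d) - np.log(w))
    return min(costs)

def contrast_pairwise(xi, xj, yi, yj, lam0, lam1, beta):
    """Contrast-dependent smoothness: label changes cost less across
    strong colour edges (large ||xi - xj||)."""
    if yi == yj:
        return 0.0
    return lam0 + lam1 * np.exp(-np.sum((xi - xj) ** 2) / (2.0 * beta))
```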
Geometric/Semantic Labeling Model

Label space: pre-defined label set, e.g., $\mathcal{L} = \{\text{sky}, \text{tree}, \text{grass}, \ldots\}$

Unary term: boosted decision-tree classifiers over texton-layout features (Shotton et al., 2006)

$$\psi^U_i(y_i = l; x) = -\theta_l \log P(\phi_i(x) \mid l)$$

Pairwise term: contrast-dependent smoothness prior

$$\psi^P_{ij}(y_i, y_j; x) = \begin{cases} \lambda_0 + \lambda_1 \exp\left(-\frac{\|x_i - x_j\|^2}{2\beta}\right), & \text{if } y_i \neq y_j \\ 0, & \text{otherwise} \end{cases}$$
Stereo Matching Model

Label space: pixel disparity, $\mathcal{L} = \{0, 1, \ldots, 127\}$

Unary term: sum of absolute differences (SAD) or normalized cross-correlation (NCC)

$$\psi^U_i(y_i; x) = \sum_{(u,v) \in W} \left| x^{\text{left}}(u, v) - x^{\text{right}}(u - y_i, v) \right|$$

Pairwise term: discontinuity-preserving (truncated linear) prior

$$\psi^P_{ij}(y_i, y_j) = \min\{|y_i - y_j|, d_{\max}\}$$
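The SAD unary term is often computed as a cost volume, one slice per candidate disparity. A deliberately simple (and slow) sketch for grayscale images, not from the slides:

```python
import numpy as np

def sad_cost_volume(left, right, max_disp, window=5):
    """Unary stereo costs: sum of absolute differences over a square window,
    for each pixel and each candidate disparity (grayscale images)."""
    h, w = left.shape
    costs = np.full((h, w, max_disp), np.inf)
    r = window // 2
    for d in range(max_disp):
        # diff[i, j] = |left(i, j + d) - right(i, j)|
        diff = np.abs(left[:, d:] - right[:, :w - d])
        # box-sum the absolute differences (plain loops for clarity)
        for i in range(r, h - r):
            for j in range(r, w - d - r):
                costs[i, j + d, d] = diff[i - r:i + r + 1,
                                          j - r:j + r + 1].sum()
    return costs
```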
Image Denoising Model

Label space: pixel intensity or colour, $\mathcal{L} = \{0, 1, \ldots, 255\}$

Unary term: squared distance

$$\psi^U_i(y_i; x) = \|y_i - x_i\|^2$$

Pairwise term: truncated $L_2$ distance

$$\psi^P_{ij}(y_i, y_j) = \min\{\|y_i - y_j\|^2, d^2_{\max}\}$$
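Putting the two terms together, the full denoising energy for a candidate image might be evaluated as follows (illustrative sketch; the λ weighting follows the general pixel-labeling model from earlier, and `neighbours` is a pixel-pair list as sketched before):

```python
import numpy as np

def denoise_energy(y, x, lam, neighbours, d2max):
    """E(y;x) = sum_i (y_i - x_i)^2 + lam * sum_ij min((y_i - y_j)^2, d2max).
    y: (n,) candidate intensities; x: (n,) observed noisy intensities."""
    unary = np.sum((y - x) ** 2)
    pairwise = sum(min((y[i] - y[j]) ** 2, d2max) for i, j in neighbours)
    return unary + lam * pairwise
```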
Digital Photo Montage Model

Label space: image index, $\mathcal{L} = \{1, 2, \ldots, K\}$

Unary term: none!

Pairwise term: seam penalty

$$\psi^P_{ij}(y_i, y_j; x) = \|x_{y_i}(i) - x_{y_j}(i)\| + \|x_{y_i}(j) - x_{y_j}(j)\|$$

(or an edge-normalized variant)
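A sketch of the seam penalty for one pair of neighbouring pixels (illustrative; assumes the K aligned source images are stacked into a single array with flattened pixel indexing):

```python
import numpy as np

def seam_penalty(yi, yj, i, j, images):
    """Pairwise photo-montage cost: how visible is the seam if neighbouring
    pixels i and j are copied from source images yi and yj?
    images: (K, n, 3) array of aligned sources, pixels flattened to length n."""
    return (np.linalg.norm(images[yi, i] - images[yj, i])
            + np.linalg.norm(images[yi, j] - images[yj, j]))
```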
end of part 1