Pushmeet Kohli. Microsoft Research Cambridge. IbPRIA 2011
1 Pushmeet Kohli Microsoft Research Cambridge IbPRIA 2011
2 2:30 4:30 Labelling Problems Graphical Models Message Passing 4:30 5:00 - Coffee break 5:00 7:00 - Graph Cuts Move Making Algorithms Speed and Efficiency
3 Infer the values of hidden random variables Object Segmentation Pose Estimation Geometry Estimation Sky Tree Building Grass Orientation Joint angles
4
5 (n = number of pixels) z ∈ (R,G,B)^n, x ∈ {0,1}^n. n random variables: x_1, …, x_n. Joint probability: P(x_1, …, x_n). Conditional probability: P(x_1, …, x_n | z_1, …, z_n).
6 (n = number of pixels) z ∈ (R,G,B)^n, x ∈ {0,1}^n
7 (n = number of pixels) z ∈ (R,G,B)^n, x ∈ {0,1}^n. Posterior probability = Likelihood (data-dependent) × Prior (data-independent): P(x|z) = P(z|x) P(x) / P(z) ∝ P(z|x) P(x). MAP solution: x* = arg max_x P(x|z) = arg min_x E(x).
8 Posterior ∝ Likelihood × Prior: P(x|z) ∝ P(z|x) P(x), with the likelihood factorizing over pixels: P(z|x) = Π_i P(z_i|x_i).
9 (foreground/background colour models: red vs. green histograms) P(x|z) ∝ P(z|x) P(x), with the likelihood modelled by a Gaussian mixture: P(z|x) = F_GMM(z,x).
10 P(x|z) ∝ P(z|x) P(x). Log-likelihood ratio: log [P(z_i|x_i=0) / P(z_i|x_i=1)]. MAP solution using the likelihood alone: x* = arg max_x P(z|x) = arg max_x Π_i P(z_i|x_i).
11 Posterior ∝ Likelihood × Prior: P(x|z) ∝ P(z|x) P(x). The prior encourages consistency between the labellings of adjacent pixels: Π_{i,j} f(x_i,x_j).
12 P(x|z) ∝ P(z|x) P(x). MRF Ising prior: P(x) = Π_{i,j ∈ N} f_ij(x_i,x_j) = Π_{i,j ∈ N} exp{-|x_i - x_j|}.
13 Posterior probability: P(x|z) ∝ Π_i P(z_i|x_i) · Π_{i,j} P(x_i,x_j). Taking the negative log gives the energy: E(x,z,w) = Σ_i θ_i(x_i,z_i) + w Σ_{i,j} θ_ij(x_i,x_j,z_i,z_j).
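The MAP-as-energy-minimization view above can be sketched on a tiny 1-D "image" small enough for exhaustive search; the unary cost |x_i - z_i| and the sample values of z and w are illustrative choices, not from the slides:

```python
from itertools import product

def map_solution(z, w):
    """Brute-force MAP for a 1-D chain with Ising pairwise terms.
    E(x) = sum_i |x_i - z_i| + w * sum_i |x_i - x_{i+1}|."""
    n = len(z)
    best_x, best_e = None, float("inf")
    for x in product((0, 1), repeat=n):
        unary = sum(abs(xi - zi) for xi, zi in zip(x, z))
        pairwise = w * sum(abs(x[i] - x[i + 1]) for i in range(n - 1))
        e = unary + pairwise
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

# Noisy observations: the middle pixel disagrees with its neighbours.
z = [0.9, 0.2, 0.8]
print(map_solution(z, w=0.0))  # w=0: MAP follows the unary terms -> (1, 0, 1)
print(map_solution(z, w=2.0))  # strong Ising prior smooths it   -> (1, 1, 1)
```

With w = 0 the labelling is pixel-independent; increasing w trades data fidelity for smoothness, exactly the likelihood/prior balance of the slides.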
14 E(x,z,w) = Σ_i θ_i(x_i,z_i) + w Σ_{i,j} θ_ij(x_i,x_j). Image → likelihood-only solution; MRF solution (Ising prior).
15 P(x|z) = Π_i P(z_i|x_i) · Π_{i,j} P(x_i,x_j)
16 P(x|z) = Π_i P(z_i|x_i) · Π_{i,j} P(x_i,x_j,z_i,z_j). Negative log: E(x,z,w) = Σ_i θ_i(x_i,z_i) + w Σ_{i,j} θ_ij(x_i,x_j,z_i,z_j). [Boykov and Jolly 01] [Blake et al. 04] [Rother, Kolmogorov and Blake 04]
17 E(x,z,w) = Σ_i θ_i(x_i,z_i) + w Σ_{i,j} θ_ij(x_i,x_j,z_i,z_j). The pairwise cost θ_ij decreases with the colour contrast |z_i - z_j|². [Boykov and Jolly 01] [Blake et al. 04] [Rother, Kolmogorov and Blake 04]
18 E(x,z,w) = Σ_i θ_i(x_i,z_i) + w Σ_{i,j} θ_ij(x_i,x_j,z_i,z_j). Pairwise cost; global minimum (x*). [Boykov and Jolly 01] [Blake et al. 04] [Rother, Kolmogorov and Blake 04]
19 Demo
20 Factor graphs and the order of a posterior distribution (or energy function)
21 Write probability distributions as a graphical model (Gibbs distribution): P(x) ∝ exp{-E(x)}, with E(x) = θ(x_1,x_2,x_3) + θ(x_2,x_4) + θ(x_3,x_4) + θ(x_3,x_5): 4 factors over the unobserved variables x_1, …, x_5; variables appearing in the same term belong to the same factor. References on factor graphs: Pattern Recognition and Machine Learning [Bishop 08, book, chapter 8]; several lectures at the Machine Learning Summer School 2009 (see video lectures).
22 Definition of order: the arity (number of variables) of the largest factor. E(x) = θ(x_1,x_2,x_3) + θ(x_2,x_4) + θ(x_3,x_4) + θ(x_3,x_5) has one factor of arity 3 and three of arity 2, so this factor graph has order 3.
23 4-connected pairwise MRF: E(x) = Σ_{i,j ∈ N4} θ_ij(x_i,x_j) (order 2). Higher(8)-connected pairwise MRF: E(x) = Σ_{i,j ∈ N8} θ_ij(x_i,x_j) (order 2). Higher-order RF: E(x) = Σ_{i,j ∈ N4} θ_ij(x_i,x_j) + θ(x_1,…,x_n) (order n). Pairwise energy vs. higher-order energy.
24 1. Image segmentation 2. Stereo 3. Semantic Segmentation
25 1. Image segmentation 2. Stereo 3. Semantic Segmentation
26 P(x|z) ∝ exp{-E(x)}, E(x) = Σ_i θ_i(x_i,z_i) + Σ_{i,j ∈ N4} θ_ij(x_i,x_j). Observed variable z_i; unobserved (latent) variable x_i. Factor graph.
27 Goal: detect and segment the object in a test image. Use a large set of example segmentations, indexed by an exemplar w = (template, position, scale, orientation): T(1), T(2), T(3), … E(x,w): {0,1}^n × {Exemplar} → R, E(x,w) = Σ_i |T(w)_i - x_i| + Σ_{i,j ∈ N4} θ_ij(x_i,x_j), i.e. a Hamming distance to the exemplar plus a pairwise term. [Kohli et al. IJCV 08, Lempitsky et al. ECCV 08]
28 UIUC dataset; 98.8% accuracy. [Kohli et al. IJCV 08, Lempitsky et al. ECCV 08]
29
30 Training Image Pairwise Energy P(x) Test Image Test Image (60% Noise) Minimized using st-mincut or max-product message passing Result
31 Higher Order Structure not Preserved Training Image Pairwise Energy P(x) Test Image Test Image (60% Noise) Minimized using st-mincut or max-product message passing Result
32 Minimize E(X) = P(X) + Σ_c h_c(X_c), where P(X) is the pairwise part and h_c: {0,1}^|c| → R is a higher-order function over a patch c (here |c| = 10×10 = 100) that assigns a cost to each possible labelling (patterns p_1, p_2, p_3). Exploit the structure of the function to transform it into a pairwise function. Rother and Kohli, MSR Tech Report [2010]
33 Training Image Learned Patterns Test Image Test Image (60% Noise) Pairwise Result Higher-Order Result Rother and Kohli, MSR Tech Report [2010]
34 1. Image segmentation 2. Stereo 3. Semantic Segmentation
35 d=0 … d=4. Image left (a), image right (b), ground-truth depth. Images are rectified; ignore occlusion for now. Energy E(d): {0,…,D-1}^n → R over labels d_i (depth/shift).
36 Energy: E(d): {0,…,D-1}^n → R, E(d) = Σ_i θ_i(d_i) + Σ_{i,j ∈ N4} θ_ij(d_i,d_j). Unary: θ_i(d_i) = |l_i - r_{i-d_i}| (SAD, sum of absolute differences; many others possible: NCC, …); left pixel i is matched to right pixel i - d_i. Pairwise: θ_ij(d_i,d_j) = g(|d_i - d_j|).
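The SAD unary cost can be sketched on synthetic rectified scanlines; the single-pixel matching window and the sample intensities are illustrative assumptions (a real implementation would aggregate SAD over a window and add the pairwise smoothing term):

```python
def sad_cost(left, right, i, d):
    """Unary stereo cost theta_i(d) = |l_i - r_{i-d}| for a single pixel."""
    return abs(left[i] - right[i - d])

def wta_disparity(left, right, max_d):
    """Winner-take-all: per pixel, pick the disparity with the lowest unary
    cost (no pairwise smoothing -- the 'No MRF' baseline of the slides)."""
    disp = []
    for i in range(max_d, len(left)):          # need i - d >= 0
        costs = [sad_cost(left, right, i, d) for d in range(max_d + 1)]
        disp.append(min(range(max_d + 1), key=costs.__getitem__))
    return disp

# Synthetic rectified scanlines: the left image is the right one shifted by 2.
right = [0, 10, 20, 30, 40, 50, 60, 70]
left = [0, 0] + right[:-2]                     # left[i] = right[i - 2]
print(wta_disparity(left, right, max_d=3))     # recovers disparity 2 everywhere
```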
37 θ_ij(d_i,d_j) = g(|d_i - d_j|): cost grows with |d_i - d_j|; no truncation (global minimum computable). [Olga Veksler PhD thesis, Daniel Cremers et al.]
38 θ_ij(d_i,d_j) = g(|d_i - d_j|) with truncation: discontinuity-preserving potentials [Blake & Zisserman 83, 87]. No truncation: global minimum; with truncation: NP-hard optimization. [Olga Veksler PhD thesis, Daniel Cremers et al.]
39 θ_ij(d_i,d_j) = g(|d_i - d_j|): the Potts model gives smooth, piecewise-constant disparities. [Olga Veksler PhD thesis]
40 Comparison: no MRF (pixel-independent, winner-take-all); no horizontal links (efficient, since the chains are independent); pairwise MRF [Boykov et al. 01]; ground truth.
41 1. Image segmentation 2. Stereo 3. Semantic/Object Segmentation
42 Assign a label to each pixel (or Voxel in 3D) Object Segmentation Organ Detection water boat tree chair road
43 (1) No relationships between pixels. (2) Pixel groups, no relationships between groups. (3) Pairwise relationships between pixel groups. (4) Pairwise relationships between pixels. (5) Higher-order relationships between pixels.
44 Image window W, pixel to be classified P. (P, W) → pixel classifier: boosting [Shotton et al, 2006] or random decision forests. The output is the cost for assigning the label "cow".
45 Image (MSRC-21) Pairwise CRF Higher order CRF Ground Truth Grass Sheep
46 [Unwrap Mosaic, Rav-Acha, Kohli, Fitzgibbon, Rother, Siggraph 08]
47
48 E(x) = Σ_i f_i(x_i) + Σ_ij g_ij(x_i,x_j) + Σ_c h_c(x_c): unary + pairwise + higher-order terms; x takes discrete values. How to minimize E(x)?
49 NP-Hard MAXCUT CSP Space of Problems
50 Which functions are exactly solvable? Approximate solutions of NP-hard problems Scalability and Efficiency
51 Which functions are exactly solvable? Boros Hammer [1965], Kolmogorov Zabih [ECCV 2002, PAMI 2004], Ishikawa [PAMI 2003], Schlesinger [EMMCVPR 2007], Kohli Kumar Torr [CVPR 2007, PAMI 2008], Ramalingam Kohli Alahari Torr [CVPR 2008], Kohli Ladicky Torr [CVPR 2008, IJCV 2009], Zivny Jeavons [CP 2008]. Approximate solutions of NP-hard problems: Schlesinger [1976], Kleinberg and Tardos [FOCS 99], Chekuri et al. [2001], Boykov et al. [PAMI 2001], Wainwright et al. [NIPS 2001], Werner [PAMI 2007], Komodakis [PAMI 2005], Lempitsky et al. [ICCV 2007], Kumar et al. [NIPS 2007], Kumar et al. [ICML 2008], Sontag and Jaakkola [NIPS 2007], Kohli et al. [ICML 2008], Kohli et al. [CVPR 2008, IJCV 2009], Rother et al. [2009]. Scalability and efficiency: Kohli Torr [ICCV 2005, PAMI 2007], Juan and Boykov [CVPR 2006], Alahari Kohli Torr [CVPR 2008], Delong and Boykov [CVPR 2008].
52 Message passing: dynamic programming (DP), belief propagation (BP), tree-reweighted (TRW), max-sum diffusion. Combinatorial algorithms: graph cut (GC), branch & bound. Relaxation methods: convex optimization (linear programming).
53 NP-Hard MAXCUT CSP Tree Structured Space of Problems
54
55 E(x) = Σ_i f_i(x_i) + Σ_ij g_ij(x_i,x_j). Iteratively pass messages between nodes p, q, … Message update rule? Variants: belief propagation (BP), tree-reweighted belief propagation (TRW). Max-product: minimizing an energy function (MAP estimation), x* = arg max P(x) = arg min E(x), with E(x) = -ln P(x) + K. Sum-product: computing marginal probabilities, P(x_i) = Σ_{x_{V∖{i}}} P(x).
56 Chain p–q–r with costs f(x_p) + g_pq(x_p,x_q), Potts model g_pq = 2·[x_p ≠ x_q]. Message for label L_1: M_{p→q}(L_1) = min_{x_p} [f(x_p) + g_pq(x_p,L_1)] = min(5+0, 1+2, 2+2) = 3. Full message: M_{p→q} = (M(L_1), M(L_2), M(L_3)) = (3,1,2).
57 Chain p–q–r with costs f(x_p) + g_pq(x_p,x_q), Potts model g_pq = 2·[x_p ≠ x_q].
58 At node q: M_{p→q} + f(x_q) + g_qr(x_q,x_r); at the root r: M_{q→r} + f(x_r) = min_x {E(x) | x_r = j} (the min-marginal). Select the minimum-cost label at the root, then trace back the path to get the minimum-cost labelling: global minimum in linear time.
59 Chain: leaf p – q – r root. Dynamic programming: global minimum in linear time. BP: inward pass (dynamic programming) + outward pass, which gives the min-marginals.
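The chain computation above can be sketched as min-sum dynamic programming; the numbers reproduce slide 56 (f_p = (5,1,2), Potts cost 2):

```python
def potts(i, j, lam=2.0):
    """Potts pairwise cost g_pq = lam * [x_p != x_q]."""
    return lam if i != j else 0.0

def message(f, g, incoming=None):
    """Min-sum message M_{p->q}(j) = min_i [incoming(i) + f(i) + g(i, j)]."""
    K = len(f)
    inc = incoming if incoming is not None else [0.0] * K
    return [min(inc[i] + f[i] + g(i, j) for i in range(K)) for j in range(K)]

def chain_min_energy(unaries, g):
    """Forward (inward) pass on a chain: after the last message, the minimum
    over the root's costs is the global minimum energy, in linear time."""
    m = None
    for f in unaries[:-1]:
        m = message(f, g, m)
    return min(mi + fi for mi, fi in zip(m, unaries[-1]))

print(message([5, 1, 2], potts))   # -> [3.0, 1.0, 2.0], as on slide 56
```

A backward pass storing the argmins would recover the full minimum-cost labelling, and an outward pass would give min-marginals for every node.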
60-68 (animation: messages passed along the chain p–q–r with unary potentials θ_p(x_p) and pairwise potentials θ_pq(x_p,x_q); the final frame shows the min-marginal for node q and label j)
69 On graphs with loops, pass messages using the same rules (loopy BP). Empirically this often works quite well, but it may not converge and yields only pseudo min-marginals. It gives a local minimum in the tree neighbourhood [Weiss & Freeman 01], [Wainwright et al. 04], under the assumptions that BP has converged and that there are no ties in the pseudo min-marginals.
70 Naïve implementation of a message update: O(K²) [K = number of labels]. This can often be improved to O(K), e.g. for Potts interactions, truncated linear, truncated quadratic, …
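For Potts interactions the O(K) speed-up follows because every label i ≠ j pays the same penalty λ, so M(j) = min(h(j), min_i h(i) + λ), with h(i) = incoming(i) + f(i). A sketch comparing it against the naïve O(K²) update:

```python
def potts_message_naive(h, lam):
    """O(K^2) update: M(j) = min_i [h(i) + lam * (i != j)]."""
    K = len(h)
    return [min(h[i] + (lam if i != j else 0.0) for i in range(K))
            for j in range(K)]

def potts_message_fast(h, lam):
    """O(K) update for Potts: every i != j pays the same lam, so
    M(j) = min(h(j), min_i h(i) + lam). Including i = j in the second term
    is harmless because h(j) <= h(j) + lam."""
    base = min(h) + lam
    return [min(hj, base) for hj in h]

h = [5.0, 1.0, 2.0]
print(potts_message_fast(h, 2.0))   # [3.0, 1.0, 2.0], same as the naive update
```

Truncated linear and truncated quadratic costs admit a similar O(K) trick via lower-envelope (distance-transform) computations.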
71 TRW uses slightly different messages: it tries to solve an LP relaxation of the MAP problem and provides a lower bound: Lower bound ≤ E(x*) ≤ E(x̂). [Kolmogorov, Wainwright et al.]
72 Message passing is related to dynamic programming; exact on trees (hyper-trees); connections to linear programming. Popular methods: BP, TRW-S, diffusion. Message schedules have a significant effect on the solution. For more details: [Kolmogorov 06, Wainwright et al. 05]
73 n = number of variables; segmentation energy. Space of problems: NP-hard in general (MAXCUT, CSP); tree-structured: linear time; pairwise submodular: O(n³); general submodular functions: O(n⁶).
74
75 A pseudo-boolean function f: {0,1}^n → R is submodular if f(A) + f(B) ≥ f(A∨B) + f(A∧B) (elementwise OR and AND) for all A, B ∈ {0,1}^n. Example: n = 2, A = [1,0], B = [0,1]: f([1,0]) + f([0,1]) ≥ f([1,1]) + f([0,0]). Property: a sum of submodular functions is submodular. The binary image segmentation energy is submodular: E(x) = Σ_i c_i x_i + Σ_{i,j} d_ij |x_i - x_j| with d_ij ≥ 0.
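The lattice inequality can be checked by brute force for small n; a sketch (the particular coefficients c and d are illustrative):

```python
from itertools import product

def meet(a, b):
    """Elementwise AND of two binary labellings."""
    return tuple(x & y for x, y in zip(a, b))

def join(a, b):
    """Elementwise OR of two binary labellings."""
    return tuple(x | y for x, y in zip(a, b))

def is_submodular(f, n):
    """Check f(A) + f(B) >= f(A or B) + f(A and B) over all pairs A, B."""
    pts = list(product((0, 1), repeat=n))
    return all(f(a) + f(b) >= f(join(a, b)) + f(meet(a, b)) - 1e-9
               for a in pts for b in pts)

def seg_energy(x, c=(1.0, -2.0, 0.5), d=2.0):
    """Binary segmentation energy: sum c_i x_i + d * sum |x_i - x_{i+1}|."""
    unary = sum(ci * xi for ci, xi in zip(c, x))
    return unary + d * sum(abs(x[i] - x[i + 1]) for i in range(len(x) - 1))

print(is_submodular(seg_energy, 3))                    # True
print(is_submodular(lambda x: -abs(x[0] - x[1]), 2))   # False: rewards disagreement
```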
76 Discrete Analogues of Concave Functions [Lovasz, 83] Widely applied in Operations Research Applications in Machine Learning MAP Inference in Markov Random Fields Clustering [Narasimhan, Jojic, & Bilmes, NIPS 2005] Structure Learning [Narasimhan & Bilmes, NIPS 2006] Maximizing the spread of influence through a social network [Kempe, Kleinberg & Tardos, KDD 2003]
77 Polynomial-time algorithms exist for general submodular minimization: the ellipsoid algorithm [Grötschel, Lovász & Schrijver 81]; the first strongly polynomial algorithms [Iwata et al. 00] [A. Schrijver 00]; current best O(n⁵ Q + n⁶), where Q is the function-evaluation time [Orlin 07]. Symmetric functions (E(x) = E(1-x)) can be minimized in O(n³). Pairwise submodular functions E(X) = Σ_i f_i(x_i) + Σ_ij g_ij(x_i,x_j) can be transformed to st-mincut/max-flow [Hammer, 1965], with very low empirical running time, ~O(n).
78 Graph (V, E, C): vertices V = {v_1, v_2, …, v_n}, edges E = {(v_1, v_2), …}, costs C = {c_(1,2), …}, plus two special vertices, the source and the sink.
79 What is an st-cut?
80 What is an st-cut? An st-cut (S,T) divides the nodes between source and sink. What is the cost of an st-cut? The sum of the costs of all edges going from S to T (in the example: 15).
81 What is an st-cut? An st-cut (S,T) divides the nodes between source and sink. What is the cost of an st-cut? The sum of the costs of all edges going from S to T (in the example: 8). What is the st-mincut? The st-cut with the minimum cost.
82 Construct a graph such that: 1. any st-cut corresponds to an assignment of x, and 2. the cost of the cut equals the energy of x, E(x). Then the st-mincut gives the minimum-energy solution. [Hammer, 1965] [Kolmogorov and Zabih, 2002]
83 E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i,x_j) is (pairwise) submodular if for all ij: θ_ij(0,1) + θ_ij(1,0) ≥ θ_ij(0,0) + θ_ij(1,1). It is then equivalent (transformable) to E(x) = Σ_i c_i x_i + Σ_{i,j} c_ij x_i(1-x_j) with c_ij ≥ 0.
84 E(a 1,a 2 ) Source (0) a 1 a 2 Sink (1)
85 E(a 1,a 2 ) = 2a 1 Source (0) 2 a 1 a 2 Sink (1)
86 E(a 1,a 2 ) = 2a 1 + 5ā 1 Source (0) 2 a 1 a 2 5 Sink (1)
87 E(a 1,a 2 ) = 2a 1 + 5ā 1 + 9a 2 + 4ā 2 Source (0) 2 9 a 1 a Sink (1)
88 E(a 1,a 2 ) = 2a 1 + 5ā 1 + 9a 2 + 4ā 2 + 2a 1 ā 2 Source (0) 2 9 a 1 a Sink (1)
89 E(a 1,a 2 ) = 2a 1 + 5ā 1 + 9a 2 + 4ā 2 + 2a 1 ā 2 + ā 1 a Source (0) a 1 a Sink (1)
90 E(a 1,a 2 ) = 2a 1 + 5ā 1 + 9a 2 + 4ā 2 + 2a 1 ā 2 + ā 1 a Source (0) a 1 a Sink (1)
91 E(a_1,a_2) = 2a_1 + 5ā_1 + 9a_2 + 4ā_2 + 2a_1ā_2 + ā_1a_2. Cost of the cut shown = 11: a_1 = 1, a_2 = 1, E(1,1) = 11.
92 E(a_1,a_2) = 2a_1 + 5ā_1 + 9a_2 + 4ā_2 + 2a_1ā_2 + ā_1a_2. st-mincut cost = 8: a_1 = 1, a_2 = 0, E(1,0) = 8.
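The example energy from these slides can be checked by enumerating all four assignments, confirming that the st-mincut value 8 at (a_1, a_2) = (1, 0) is indeed the minimum:

```python
from itertools import product

def E(a1, a2):
    """Slides' example: E = 2a1 + 5~a1 + 9a2 + 4~a2 + 2 a1~a2 + ~a1 a2."""
    na1, na2 = 1 - a1, 1 - a2
    return 2*a1 + 5*na1 + 9*a2 + 4*na2 + 2*a1*na2 + na1*a2

table = {x: E(*x) for x in product((0, 1), repeat=2)}
print(table)              # {(0,0): 9, (0,1): 15, (1,0): 8, (1,1): 11}
best = min(table, key=table.get)
print(best, table[best])  # (1, 0) with energy 8 = st-mincut cost
```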
93 Solve the dual maximum-flow problem: compute the maximum flow between source and sink subject to edges: flow ≤ capacity; nodes: flow in = flow out. Min-cut/max-flow theorem: in every network, the maximum flow equals the cost of the st-mincut (assuming non-negative capacities).
94 Flow = 0 Source Augmenting Path Based Algorithms v 1 v Sink
95 Flow = 0 Source Augmenting Path Based Algorithms Find path from source to sink with positive capacity v 1 v Sink
96 Source v 1 v Flow = Augmenting Path Based Algorithms 1. Find path from source to sink with positive capacity 2. Push maximum possible flow through this path Sink
97 Source v 1 v 2 3 Flow = Augmenting Path Based Algorithms 1. Find path from source to sink with positive capacity 2. Push maximum possible flow through this path Sink
98 Source v 1 v 2 3 Flow = 2 2 Sink 4 Augmenting Path Based Algorithms 1. Find path from source to sink with positive capacity 2. Push maximum possible flow through this path 3. Repeat until no path can be found
99 Source v 1 v 2 3 Flow = 2 2 Sink 4 Augmenting Path Based Algorithms 1. Find path from source to sink with positive capacity 2. Push maximum possible flow through this path 3. Repeat until no path can be found
100 Source v 1 v 2 3 Flow = Sink 0 Augmenting Path Based Algorithms 1. Find path from source to sink with positive capacity 2. Push maximum possible flow through this path 3. Repeat until no path can be found
101 Source v 1 v 2 3 Flow = 6 2 Sink 0 Augmenting Path Based Algorithms 1. Find path from source to sink with positive capacity 2. Push maximum possible flow through this path 3. Repeat until no path can be found
102 Source v 1 v 2 3 Flow = 6 2 Sink 0 Augmenting Path Based Algorithms 1. Find path from source to sink with positive capacity 2. Push maximum possible flow through this path 3. Repeat until no path can be found
103 Source v 1 v 2 1 Flow = Sink 0 Augmenting Path Based Algorithms 1. Find path from source to sink with positive capacity 2. Push maximum possible flow through this path 3. Repeat until no path can be found
104 Flow = 8 Source Augmenting Path Based Algorithms Find path from source to sink with positive capacity v 1 v Sink 0 2. Push maximum possible flow through this path 3. Repeat until no path can be found
105 Source v 1 v 2 2 Flow = 8 3 Sink 0 Augmenting Path Based Algorithms 1. Find path from source to sink with positive capacity 2. Push maximum possible flow through this path 3. Repeat until no path can be found
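The augmenting-path procedure above can be sketched as Edmonds-Karp (BFS picks a shortest augmenting path). The graph below encodes the slides' energy E(a_1,a_2) = 2a_1 + 5ā_1 + 9a_2 + 4ā_2 + 2a_1ā_2 + ā_1a_2 using the standard construction (my edge orientation for the pairwise terms: a term c·x_i(1-x_j) becomes an edge j→i with capacity c), and the max-flow value matches the st-mincut cost of 8:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along a shortest augmenting path
    (found by BFS) until no s-t path with positive residual capacity remains."""
    n = len(cap)
    res = [row[:] for row in cap]              # residual capacities
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:                    # no augmenting path left
            return flow
        bottleneck, v = float("inf"), t        # bottleneck along the path
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, res[u][v])
            v = u
        v = t
        while v != s:                          # push flow, update residuals
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck

# nodes: 0 = source, 1 = a1, 2 = a2, 3 = sink
cap = [[0, 2, 9, 0],    # s -> a1 (2a1), s -> a2 (9a2)
       [0, 0, 1, 5],    # a1 -> a2 (~a1 a2), a1 -> t (5~a1)
       [0, 2, 0, 4],    # a2 -> a1 (2 a1~a2), a2 -> t (4~a2)
       [0, 0, 0, 0]]
print(max_flow(cap, 0, 3))   # 8 = st-mincut cost = minimum energy
```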
106 E(a 1,a 2 ) = 2a 1 + 5ā 1 + 9a 2 + 4ā 2 + 2a 1 ā 2 + ā 1 a Source (0) a 1 a Sink (1)
107 E(a 1,a 2 ) = 2a 1 + 5ā 1 + 9a 2 + 4ā 2 + 2a 1 ā 2 + ā 1 a 2 Source (0) a 1 a a 1 + 5ā 1 = 2(a 1 +ā 1 ) + 3ā 1 = 2 + 3ā 1 Sink (1)
108 E(a 1,a 2 ) = 2 + 3ā 1 + 9a 2 + 4ā 2 + 2a 1 ā 2 + ā 1 a 2 Source (0) a 1 a a 1 + 5ā 1 = 2(a 1 +ā 1 ) + 3ā 1 = 2 + 3ā 1 Sink (1)
109 E(a 1,a 2 ) = 2 + 3ā 1 + 9a 2 + 4ā 2 + 2a 1 ā 2 + ā 1 a 2 Source (0) a 1 a a 2 + 4ā 2 = 4(a 2 +ā 2 ) + 5ā 2 = 4 + 5ā 2 Sink (1)
110 E(a 1,a 2 ) = 2 + 3ā 1 + 5a a 1 ā 2 + ā 1 a 2 Source (0) a 1 a a 2 + 4ā 2 = 4(a 2 +ā 2 ) + 5ā 2 = 4 + 5ā 2 Sink (1)
111 E(a 1,a 2 ) = 6 + 3ā 1 + 5a 2 + 2a 1 ā 2 + ā 1 a Source (0) a 1 a Sink (1)
112 E(a 1,a 2 ) = 6 + 3ā 1 + 5a 2 + 2a 1 ā 2 + ā 1 a Source (0) a 1 a Sink (1)
113 E(a_1,a_2) = 6 + 3ā_1 + 5a_2 + 2a_1ā_2 + ā_1a_2. Reparametrize: 3ā_1 + 5a_2 + 2a_1ā_2 = 2(ā_1 + a_2 + a_1ā_2) + ā_1 + 3a_2 = 2(1 + ā_1a_2) + ā_1 + 3a_2, using the identity F1 = ā_1 + a_2 + a_1ā_2 = 1 + ā_1a_2 = F2 (check the truth table over a_1, a_2).
114 E(a_1,a_2) = 8 + ā_1 + 3a_2 + 3ā_1a_2, obtained from 3ā_1 + 5a_2 + 2a_1ā_2 = 2(1 + ā_1a_2) + ā_1 + 3a_2 with F1 = ā_1 + a_2 + a_1ā_2 = 1 + ā_1a_2 = F2.
115 E(a 1,a 2 ) = 8 + ā 1 + 3a 2 + 3ā 1 a 2 Source (0) a 1 a No more augmenting paths possible Sink (1)
116 E(a_1,a_2) = 8 + ā_1 + 3a_2 + 3ā_1a_2. Residual graph (positive coefficients). The total flow is a lower bound on the energy of the optimal solution; a tight bound makes inference of the optimal solution trivial.
117 E(a_1,a_2) = 8 + ā_1 + 3a_2 + 3ā_1a_2. Residual graph (positive coefficients). The total flow is a lower bound on the energy of the optimal solution. st-mincut cost = 8: a_1 = 1, a_2 = 0, E(1,0) = 8. Tight bound → inference of the optimal solution becomes trivial.
118 Augmenting Path and Push-Relabel n: #nodes m: #edges U: maximum edge weight Algorithms assume nonnegative edge weights [Slide credit: Andrew Goldberg]
119 Augmenting Path and Push-Relabel n: #nodes m: #edges U: maximum edge weight Algorithms assume nonnegative edge weights [Slide credit: Andrew Goldberg]
120 Ford Fulkerson: Choose any augmenting path Source a 1 a Sink
121 Ford Fulkerson: Choose any augmenting path Source a 1 a 2 0 Good Augmenting Paths Sink
122 Ford Fulkerson: Choose any augmenting path Source a 1 a 2 0 Bad Augmenting Path Sink
123 Ford Fulkerson: Choose any augmenting path Source a 1 a Sink
124 Ford Fulkerson: Choose any augmenting path n: #nodes m: #edges Source a 1 a Sink We will have to perform 2000 augmentations! Worst case complexity: O (m x Total_Flow) (Pseudo-polynomial bound: depends on flow)
125 Dinic: Choose shortest augmenting path n: #nodes m: #edges Source a 1 a Sink Worst case Complexity: O (m n 2 )
126 Specialized algorithms for vision problems Grid graphs Low connectivity (m ~ O(n)) Dual search tree augmenting path algorithm [Boykov and Kolmogorov PAMI 2004] Finds approximate shortest augmenting paths efficiently High worst-case time complexity Empirically outperforms other algorithms on vision problems Efficient code available on the web
127 E(x) = c i x i + d ij x i -x j i i,j E: {0,1} n R 0 fg 1 bg n = number of pixels x* = arg min E(x) x How to minimize E(x)? Global Minimum (x*)
128 Graph *g; For all pixels p /* Add a node to the graph */ nodeid(p) = g->add_node(); Source (0) /* Set cost of terminal edges */ set_weights(nodeid(p), fgcost(p), bgcost(p)); end for all adjacent pixels p,q add_weights(nodeid(p), nodeid(q), cost(p,q)); end g->compute_maxflow(); Sink (1) label_p = g->is_connected_to_source(nodeid(p)); // is the label of pixel p (0 or 1)
129 Graph *g; For all pixels p end /* Add a node to the graph */ nodeid(p) = g->add_node(); /* Set cost of terminal edges */ set_weights(nodeid(p), fgcost(p), bgcost(p)); Source (0) bgcost(a 1 ) bgcost(a 2 ) a 1 a 2 for all adjacent pixels p,q add_weights(nodeid(p), nodeid(q), cost(p,q)); end g->compute_maxflow(); label_p = g->is_connected_to_source(nodeid(p)); // is the label of pixel p (0 or 1) fgcost(a 1 ) fgcost(a 2 ) Sink (1)
130 Graph *g; For all pixels p end /* Add a node to the graph */ nodeid(p) = g->add_node(); /* Set cost of terminal edges */ set_weights(nodeid(p), fgcost(p), bgcost(p)); Source (0) bgcost(a 1 ) bgcost(a 2 ) cost(p,q) a 1 a 2 for all adjacent pixels p,q add_weights(nodeid(p), nodeid(q), cost(p,q)); end g->compute_maxflow(); label_p = g->is_connected_to_source(nodeid(p)); // is the label of pixel p (0 or 1) fgcost(a 1 ) fgcost(a 2 ) Sink (1)
131 Graph *g; For all pixels p end /* Add a node to the graph */ nodeid(p) = g->add_node(); /* Set cost of terminal edges */ set_weights(nodeid(p), fgcost(p), bgcost(p)); Source (0) bgcost(a 1 ) bgcost(a 2 ) cost(p,q) a 1 a 2 for all adjacent pixels p,q add_weights(nodeid(p), nodeid(q), cost(p,q)); end g->compute_maxflow(); fgcost(a 1 ) fgcost(a 2 ) Sink (1) label_p = g->is_connected_to_source(nodeid(p)); // is the label of pixel p (0 or 1) a 1 = bg a 2 = fg
132
133 Mixed (Real-Integer) Problems Higher Order Energy Functions Multi-label Problems Ordered Labels Stereo (depth labels) Unordered Labels Object segmentation ( car, `road, `person )
134 Mixed (Real-Integer) Problems Higher Order Energy Functions Multi-label Problems Ordered Labels Stereo (depth labels) Unordered Labels Object segmentation ( car, `road, `person )
135 We have seen several of them in the intro. x: binary image segmentation (x_i ∈ {0,1}); ω: non-local parameter (lives in some large set Ω), e.g. (template, position, scale, orientation). E(x,ω) = C(ω) + Σ_i θ_i(ω,x_i) + Σ_{i,j} θ_ij(ω,x_i,x_j): constant + unary potentials + pairwise potentials.
136 x: binary image segmentation (x_i ∈ {0,1}); ω: non-local parameter (lives in some large set Ω). E(x,ω) = C(ω) + Σ_i θ_i(ω,x_i) + Σ_{i,j} θ_ij(ω,x_i,x_j). {x*,ω*} = arg min_{x,ω} E(x,ω): a standard graph-cut energy if ω is fixed. [Kohli et al, 06,08] [Lempitsky et al, 08]
137 Local Method: Gradient Descent over ω ω * = arg min min E (x,ω) x ω Submodular ω* [Kohli et al, 06,08]
138 Local Method: Gradient Descent over ω ω * = arg min min E (x,ω) x ω Submodular E (x,ω 1 ) E (x,ω 2 ) Similar Energy Functions Dynamic Graph Cuts time speedup! [Kohli et al, 06,08]
139 Global Method: Branch and Mincut Produces the global optimal solution Exhaustively explores all ω in Ω in the worst case 30,000,000 shapes [Lempitsky et al, 08]
140 Mixed (Real-Integer) Problems Higher Order Energy Functions Multi-label Problems Ordered Labels Stereo (depth labels) Unordered Labels Object segmentation ( car, `road, `person )
141 n = number of pixels E: {0,1} n R 0 fg, 1 bg E(X) = c i x i + d ij x i -x j i i,j Image Unary Cost Segmentation [Boykov and Jolly 01] [Blake et al. 04] [Rother, Kolmogorov and Blake `04]
142 [Kohli et al. 07] Patch dictionary (tree). Higher-order potential over a patch p: h(x_p) = C_1 if x_i = 0 for all i ∈ p, and C_max otherwise.
143 n = number of pixels, E: {0,1}^n → R, 0 = fg, 1 = bg. E(X) = Σ_i c_i x_i + Σ_{i,j} d_ij |x_i - x_j| + Σ_p h_p(X_p), with h(x_p) = C_1 if x_i = 0 for all i ∈ p, and C_max otherwise. [Kohli et al. 07]
144 n = number of pixels, E: {0,1}^n → R, 0 = fg, 1 = bg. E(X) = Σ_i c_i x_i + Σ_{i,j} d_ij |x_i - x_j| + Σ_p h_p(X_p). Image, pairwise segmentation, final segmentation. [Kohli et al. 07]
145 Higher Order Submodular Functions Exact Transformation? Pairwise Submodular Function Billionnet and M. Minoux [DAM 1985] Kolmogorov & Zabih [PAMI 2004] Freedman & Drineas [CVPR2005] Kohli Kumar Torr [CVPR2007, PAMI 2008] Kohli Ladicky Torr [CVPR 2008, IJCV 2009] Ramalingam Kohli Alahari Torr [CVPR 2008] Zivny et al. [CP 2008] S T st-mincut
146 Higher Order Function Pairwise Submodular Function Identified transformable families of higher order function s.t. 1. Constant or polynomial number of auxiliary variables (a) added 2. All pairwise functions (g) are submodular
147 Example: H (X) = F ( x i ) concave H (X) 0 x i [Kohli et al. 08]
148 Simple example using auxiliary variables: f(x) = 0 if all x_i = 0, and C_1 otherwise, for x ∈ L = {0,1}^n (a higher-order submodular function). Then min_x f(x) = min_{x, a ∈ {0,1}} C_1·a + C_1·ā·Σ_i x_i (a quadratic submodular function): if Σ_i x_i < 1 then a = 0 (ā = 1) gives f(x) = 0; if Σ_i x_i ≥ 1 then a = 1 gives f(x) = C_1.
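The auxiliary-variable construction can be verified exhaustively for small n; a sketch (C_1 = 4 is an arbitrary choice):

```python
from itertools import product

C1 = 4.0

def f(x):
    """Higher-order potential: 0 if all x_i = 0, C1 otherwise."""
    return 0.0 if not any(x) else C1

def f_pairwise(x):
    """Transformed form: min over the auxiliary variable a of
    C1*a + C1*(1-a)*sum(x_i). Each term involves a and at most one x_i,
    so the minimization is over a quadratic (pairwise) function."""
    s = sum(x)
    return min(C1 * a + C1 * (1 - a) * s for a in (0, 1))

assert all(f(x) == f_pairwise(x) for x in product((0, 1), repeat=4))
print("transformation exact on all 2^4 labellings")
```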
149 min_x f(x) = min_{x,a} C_1·a + C_1·ā·Σ_i x_i: plotted against Σ_i x_i, the function is the lower envelope of the two linear functions C_1 and C_1·Σ_i x_i. [Kohli et al. 08]
150 a = 0 selects C_1·Σ_i x_i and a = 1 selects C_1; the lower envelope of concave (here linear) functions is concave. [Kohli et al. 08]
151 General case: min_x f(x) = min_{x, a ∈ {0,1}} f_1(x)·a + f_2(x)·ā (a quadratic submodular function); the lower envelope of concave functions is concave. [Kohli et al. 08]
152 a = 0 selects f_2(x), a = 1 selects f_1(x); the lower envelope of concave functions is concave. [Kohli et al. 08]
153 min_x max_{a ∈ {0,1}} f_1(x)·a + f_2(x)·ā: the upper envelope of linear functions is convex, which makes this a very hard problem! [Kohli and Kumar, CVPR 2010]
154 Silhouette constraints in 3D reconstruction (rays must touch the silhouette at least once) [Sinha et al. 05, Cremers et al. 08]; size/volume priors in binary segmentation (a prior on the size of the object, say n/2) [Woodford et al. 2009]. [Kohli and Kumar, CVPR 2010]
155 Mixed (Real-Integer) Problems Higher Order Energy Functions Multi-label Problems Ordered Labels Stereo (depth labels) Unordered Labels Object segmentation ( car, `road, `person )
156 min_y E(y) = Σ_i f_i(y_i) + Σ_ij g_ij(y_i,y_j), y_i ∈ L = {l_1, l_2, …, l_k}. Two approaches: exact transformation to a quadratic pseudo-boolean function (QPBF) [Roy and Cox 98] [Ishikawa 03] [Schlesinger & Flach 06] [Ramalingam, Alahari, Kohli, and Torr 08], or move-making algorithms.
157 So what is the problem? Encode the multi-label problem E_m(y_1,…,y_n), y_i ∈ L = {l_1,…,l_k}, as a binary-label problem E_b(x_1,…,x_m), x_i ∈ {0,1}, such that, with Y and X the sets of feasible solutions: 1. there is a one-one encoding function T: X → Y; 2. arg min E_m(y) = T(arg min E_b(x)).
158 Popular encoding scheme [Roy and Cox 98, Ishikawa 03, Schlesinger & Flach 06] # Nodes = n * k # Edges = m * k 2
159 Popular encoding scheme [Roy and Cox 98, Ishikawa 03, Schlesinger & Flach 06]; #nodes = n·k, #edges = m·k². Ishikawa's result: exact for E(y) = Σ_i θ_i(y_i) + Σ_ij θ_ij(y_i,y_j), y ∈ L = {l_1,…,l_k}, when θ_ij(y_i,y_j) = g(|y_i - y_j|) with g a convex function.
160 Popular encoding scheme [Roy and Cox 98, Ishikawa 03, Schlesinger & Flach 06]; #nodes = n·k, #edges = m·k². Schlesinger & Flach 06: exact whenever θ_ij(l_{i+1},l_j) + θ_ij(l_i,l_{j+1}) ≥ θ_ij(l_i,l_j) + θ_ij(l_{i+1},l_{j+1}).
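The Schlesinger & Flach condition is easy to test numerically; a sketch showing that a convex pairwise cost satisfies it while a truncated linear cost does not (the label count and truncation threshold are illustrative):

```python
def submodular_multilabel(theta, k):
    """Schlesinger & Flach condition over labels {0, ..., k-1}:
    theta(i+1, j) + theta(i, j+1) >= theta(i, j) + theta(i+1, j+1)."""
    return all(theta(i + 1, j) + theta(i, j + 1)
               >= theta(i, j) + theta(i + 1, j + 1)
               for i in range(k - 1) for j in range(k - 1))

k, T = 6, 2
convex = lambda i, j: (i - j) ** 2              # convex g(|yi - yj|)
truncated = lambda i, j: min(abs(i - j), T)     # truncated linear (robust)

print(submodular_multilabel(convex, k))     # True: exact construction exists
print(submodular_multilabel(truncated, k))  # False: truncation breaks it
```

For θ(i,j) = g(i-j) the condition reduces to g(d+1) + g(d-1) ≥ 2g(d), i.e. convexity of g, which is why truncation (and hence robust, discontinuity-preserving costs) falls outside the exactly solvable class.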
161 Image MAP Solution Scanline algorithm [Roy and Cox, 98]
162 Applicability: cannot handle truncated costs θ_ij(y_i,y_j) = g(|y_i - y_j|) with truncation (so no robust, discontinuity-preserving potentials [Blake & Zisserman 83, 87]). Computational cost: very high, since the problem size = variables × labels; grey-level image denoising on a 1 MPixel image needs ~2.5 × 10⁸ graph nodes.
163 Other less-known algorithms (T(a,b) = complexity of max-flow with a nodes and b edges): Ishikawa transformation [03]: arbitrary unary, convex and symmetric pairwise, T(nk, mk²); Schlesinger transformation [06]: arbitrary unary, submodular pairwise, T(nk, mk²); Hochbaum [01]: linear unary, convex and symmetric pairwise, T(n, m) + n log k; Hochbaum [01]: convex unary, convex and symmetric pairwise, O(mn log n log nk).
164 Min y E(y) = f i (y i ) + g ij (y i,y j ) i i,j y ϵ Labels L = {l 1, l 2,, l k } Exact Transformation to QPBF Move making algorithms [Boykov, Veksler and Zabih 2001] [Woodford, Fitzgibbon, Reid, Torr, 2008] [Lempitsky, Rother, Blake, 2008] [Veksler, 2008] [Kohli, Ladicky, Torr 2008]
165 Energy Solution Space
166 Energy Current Solution Search Neighbourhood Optimal Move Solution Space
167 Energy landscape: current solution, search neighbourhood, optimal move x^c(t) within the move space. Key property: a bigger move space gives better solutions, but finding the optimal move becomes harder.
168 Minimizing pairwise functions [Boykov, Veksler and Zabih, PAMI 2001]: a series of locally optimal moves, each of which reduces the energy; the optimal move is found by minimizing a submodular (binary) function. Move space (t): 2^n; space of solutions (x): L^n, where n = number of variables and L = number of labels. Extended to minimize higher-order functions [Kohli et al. 07, 08, 09].
169 New solution x = t·x_1 + (1-t)·x_2 from the current solution x_1 and a second solution x_2. Move energy: E_m(t) = E(t·x_1 + (1-t)·x_2); minimize over the binary move variables t. For certain x_1 and x_2, the move energy is a submodular QPBF. [Boykov, Veksler and Zabih 2001]
170 Swap move: variables labelled α, β can swap their labels. [Boykov, Veksler and Zabih 2001]
171 Variables labelled α, β can swap their labels. Example: swap "Sky" and "House" (other labels: Tree, Ground). [Boykov, Veksler and Zabih 2001]
172 The swap-move energy is submodular if: unary potentials are arbitrary; pairwise potentials form a semi-metric: θ_ij(l_a,l_b) ≥ 0 and θ_ij(l_a,l_b) = 0 ⇔ a = b. Examples: Potts model, truncated convex. [Boykov, Veksler and Zabih 2001]
173 Expansion move: variables either take the label α or retain their current label. [Boykov, Veksler and Zabih 2001]
174 Variables take the label α or retain their current label. Status: initialize with Ground; expand Sky, House and Tree in turn. [Boykov, Veksler and Zabih 2001]
175 The expansion-move energy is submodular if: unary potentials are arbitrary; pairwise potentials form a metric (semi-metric + triangle inequality): θ_ij(l_a,l_b) + θ_ij(l_b,l_c) ≥ θ_ij(l_a,l_c). Examples: Potts model, truncated linear (but not truncated quadratic). [Boykov, Veksler and Zabih 2001]
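A sketch of α-expansion on a toy Potts problem; for illustration the optimal binary move is found by enumeration rather than by the graph cut used in practice, and the unary costs are made up:

```python
from itertools import product

def energy(x, unary, lam):
    """Potts chain energy: sum_i unary[i][x_i] + lam * sum [x_i != x_{i+1}]."""
    e = sum(unary[i][xi] for i, xi in enumerate(x))
    return e + lam * sum(x[i] != x[i + 1] for i in range(len(x) - 1))

def alpha_expansion(unary, lam, labels, sweeps=3):
    """Each move lets every variable either keep its label or switch to
    alpha. The optimal binary move (over t in {0,1}^n) is found here by
    enumeration; every accepted move strictly decreases the energy."""
    n = len(unary)
    x = [0] * n
    for _ in range(sweeps):
        for alpha in labels:
            best, ebest = x, energy(x, unary, lam)
            for t in product((0, 1), repeat=n):      # t_i = 1: take alpha
                y = [alpha if ti else xi for ti, xi in zip(t, x)]
                e = energy(y, unary, lam)
                if e < ebest:
                    best, ebest = y, e
            x = best
    return x

unary = [[0, 5, 5], [5, 0, 5], [5, 1, 5], [5, 5, 0]]
x = alpha_expansion(unary, lam=2, labels=range(3))
print(x, energy(x, unary, 2))   # [0, 1, 1, 2] with energy 5
```

On this instance the moves reach the global minimum; in general α-expansion only guarantees a bound relative to the optimum (next slide).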
176 Expansion and swap can be derived as a primal-dual scheme: the solution of the dual problem gives a lower bound on the energy of the solution. Weak guarantee: E(x̂) ≤ 2(d_max/d_min)·E(x*), where d_max and d_min are the largest and smallest nonzero values of θ_ij(l_i,l_j) = g(|l_i - l_j|). [Komodakis et al 05, 07]
177 x = t·x_1 + (1-t)·x_2: new solution from a first and a second solution; minimize over the move variables t. Move types: expansion (first solution = old solution, second solution = all-α; guarantee for metrics); fusion (any solution, any solution; no guarantee). Fusion-move functions can be non-submodular!
178 x = t·x_1 + (1-t)·x_2, where x_1, x_2 can be continuous. Optical-flow example: fuse the solution x_1 from method 1 with the solution x_2 from method 2 to obtain the final solution. [Woodford, Fitzgibbon, Reid, Torr, 2008] [Lempitsky, Rother, Blake, 2008]
179 Range moves: the move variables can be multi-label, x = (t==1)·x_1 + (t==2)·x_2 + … + (t==k)·x_k. The optimal move is found using the Ishikawa transform. Useful for minimizing energies with truncated convex pairwise potentials θ_ij(y_i,y_j) = min(|y_i - y_j|², T). [Kumar and Torr, 2008] [Veksler, 2008]
180 Image Noisy Image Expansion Move Range Moves [Veksler, 2008]
181
182 3,600,000,000 Pixels Created from about MegaPixel Images [Kopf et al. (MSR Redmond) SIGGRAPH 2007 ]
183 [Kopf et al. (MSR Redmond) SIGGRAPH 2007 ]
184 Processing Videos 1 minute video of 1M pixel resolution 3.6 B pixels 3D reconstruction [500 x 500 x 500 =.125B voxels]
185 Segment the first frame; then segment the second frame. Can we do better? [Kohli & Torr, ICCV05, PAMI07]
186 Image, flow, segmentation. [Kohli & Torr, ICCV05, PAMI07]
187 Frame 1: minimize E_A to get S_A. Reuse computation: the differences between A and B yield a simpler, fast energy E_B*. Frame 2: minimize E_B to get S_B. Reparametrization gives a large speedup. [Kohli & Torr, ICCV05, PAMI07] [Komodakis & Paragios, CVPR07]
188 Original energy: E(a_1,a_2) = 2a_1 + 5ā_1 + 9a_2 + 4ā_2 + 2a_1ā_2 + ā_1a_2. Reparametrized energy: E(a_1,a_2) = 8 + ā_1 + 3a_2 + 3ā_1a_2. New energy: E(a_1,a_2) = 2a_1 + 5ā_1 + 9a_2 + 4ā_2 + 7a_1ā_2 + ā_1a_2. New reparametrized energy: E(a_1,a_2) = 8 + ā_1 + 3a_2 + 3ā_1a_2 + 5a_1ā_2. [Kohli & Torr, ICCV05, PAMI07] [Komodakis & Paragios, CVPR07]
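The reparametrization claim is checkable by enumeration: the original and reparametrized energies from slide 188 agree on every assignment, which is exactly what makes flow-based reuse across frames sound:

```python
from itertools import product

def E_orig(a1, a2):
    """Original energy: 2a1 + 5~a1 + 9a2 + 4~a2 + 2 a1~a2 + ~a1 a2."""
    na1, na2 = 1 - a1, 1 - a2
    return 2*a1 + 5*na1 + 9*a2 + 4*na2 + 2*a1*na2 + na1*a2

def E_reparam(a1, a2):
    """Reparametrized form: 8 + ~a1 + 3a2 + 3~a1 a2."""
    na1 = 1 - a1
    return 8 + na1 + 3*a2 + 3*na1*a2

# A reparametrization changes the coefficients but not the energy of any
# labelling, so both forms agree everywhere:
assert all(E_orig(*x) == E_reparam(*x) for x in product((0, 1), repeat=2))
print([E_orig(*x) for x in product((0, 1), repeat=2)])   # [9, 15, 8, 11]
```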
189 Original Problem (Large) Approximation algorithm (Slow) Approximate Solution Fast partially optimal algorithm [ Alahari Kohli & Torr CVPR 08]
190 Original Problem (Large) Approximation algorithm (Slow) Approximate Solution Fast partially optimal algorithm [Kovtun 03] Fast partially optimal algorithm [ Kohli et al. 09] Reduced Problem Solved Problem (Global Optima) Approximation algorithm (Fast) Approximate Solution [ Alahari Kohli & Torr CVPR 08]
191 3-100 times speedup. Pipeline: original problem (large) → fast partially optimal algorithms [Kovtun 03] [Kohli et al. 09] → reduced problem plus a solved part (global optima) → approximation algorithm (fast) → approximate solution. Example labelling (sky, building, airplane, grass, tree): tree-reweighted message passing takes 9.89 sec on the original problem vs. 0.30 sec total with the reduction. [Alahari Kohli & Torr CVPR 08]
192 Message passing algorithms: Belief Propagation, TRW. Exact on tree-structured graphs; connections to linear programming relaxations. Graph cuts and submodular functions. Move-making algorithms: expansion, swap, fusion. Minimizing higher-order functions.
193 Going beyond small connectivity. Representation, inference and learning of higher-order models. Topology constraints: connectivity of objects. Parallelization: GPUs. Linear programming formulations.
194
195 Past presenters: Pawan, Carsten and Andrew Blake. Special thanks to the IbPRIA team, in particular: Joost Van de Weijer, Jordi Gonzàlez, Mario Hernández Tejera.
196
197
198 Set of label assignments and associated costs p_1, p_2, p_3. Introduce a pattern variable z with (t+1) states to get a pairwise function: the unary term assigns the cost of the selected label assignment z; the pairwise terms ensure x is consistent with the pattern z.
199 E(x) = Σ_i θ_i(x_i, z_i) + Σ_{i,j Є N_4} θ_ij(x_i,x_j,z_i,z_j), with θ_ij(x_i,x_j,z_i,z_j) = |x_i - x_j| exp{-ß ||z_i - z_j||} and ß = 2(Mean(||z_i - z_j||²))⁻¹. Conditional Random Field (CRF): no pure prior, since the pairwise terms depend on the observed data. z_i: observed variable; x_i: unobserved (latent) variable. Factor graph: MRF vs CRF.
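The contrast-sensitive pairwise strength can be computed from the image in one pass. A sketch on a toy grayscale image, assuming the slide's ß = 2(Mean ||z_i - z_j||²)⁻¹ over the 4-neighbourhood (the helper name is ours):

```python
import numpy as np

def contrast_weights(z):
    """Edge strengths exp(-beta * |z_i - z_j|) for the 4-neighbourhood,
    with beta = 2 / mean(|z_i - z_j|^2), as on the slide."""
    dh2 = (z[:, 1:] - z[:, :-1]) ** 2   # squared horizontal differences
    dv2 = (z[1:, :] - z[:-1, :]) ** 2   # squared vertical differences
    beta = 2.0 / np.concatenate([dh2.ravel(), dv2.ravel()]).mean()
    return np.exp(-beta * np.sqrt(dh2)), np.exp(-beta * np.sqrt(dv2))

# A 2x3 image with an intensity step between the last two columns.
z = np.array([[0.0, 0.0, 10.0],
              [0.0, 0.0, 10.0]])
w_h, w_v = contrast_weights(z)
# Edges inside uniform regions get strength 1 (expensive to cut);
# edges across the step get lower strength, so placing the
# segmentation boundary there is cheap.
```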
200 Unary and pairwise: E(x) = Σ_i θ_i(x_i,z_i) + w Σ_{i,j} θ_ij(x_i,x_j), with x_i Є R. Convex unary and pairwise terms: θ_ij(x_i,x_j) = g(|x_i - x_j|), θ_i(x_i,z_i) = |x_i - z_i|. Can be solved globally optimally. TRW-S (discrete labels) vs HBF [Szeliski 06] (continuous labels), ~15 times faster.
201 Non-submodular energy functions. Mixed (real-integer) problems. Higher-order energy functions. Multi-label problems: ordered labels, e.g. stereo (depth labels); unordered labels, e.g. object segmentation ('car', 'road', 'person').
202 E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i,x_j), with θ_ij(0,1) + θ_ij(1,0) < θ_ij(0,0) + θ_ij(1,1) for some ij. Minimizing general non-submodular functions is NP-hard. A commonly used method is to solve a relaxation of the problem. [Boros and Hammer, 02]
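The condition above is easy to check mechanically: a binary pairwise energy is submodular iff every edge table satisfies θ_ij(0,0) + θ_ij(1,1) ≤ θ_ij(0,1) + θ_ij(1,0). A minimal sketch:

```python
def is_submodular(theta_p):
    """theta_p maps an edge (i, j) to a 2x2 cost table t[x_i][x_j].
    Submodular iff t(0,0) + t(1,1) <= t(0,1) + t(1,0) for every edge."""
    return all(t[0][0] + t[1][1] <= t[0][1] + t[1][0]
               for t in theta_p.values())

ising = {(0, 1): [[0, 1], [1, 0]]}        # penalises disagreement
frustrated = {(0, 1): [[1, 0], [0, 1]]}   # rewards disagreement
print(is_submodular(ising))       # → True  (graph cuts apply)
print(is_submodular(frustrated))  # → False (a relaxation is needed)
```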
203
204 Integer Program
205 Linear Program Feasibility Region (LOCAL Polytope)
206 Formulate the program, solve the LP, and obtain an integer solution by rounding. The integral solution is not guaranteed to be optimal; however, we can obtain multiplicative bounds on the error. Feasibility Region (LOCAL Polytope)
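The relax-and-round scheme can be made concrete on a toy two-variable binary MRF; a sketch using SciPy's linprog over the LOCAL polytope (the costs are hypothetical; on this tree-structured instance the relaxation is tight, so rounding the node marginals recovers the exact MAP labelling):

```python
import numpy as np
from scipy.optimize import linprog

# Unary and pairwise costs for two binary variables joined by one edge.
theta0 = [0.0, 2.0]
theta1 = [3.0, 0.0]
theta01 = [[0.0, 1.0], [1.0, 0.0]]        # Ising: penalise disagreement

# Variable order: mu0(0), mu0(1), mu1(0), mu1(1),
#                 mu01(0,0), mu01(0,1), mu01(1,0), mu01(1,1)
c = theta0 + theta1 + [theta01[a][b] for a in (0, 1) for b in (0, 1)]
A_eq = [
    [1, 1, 0, 0,  0,  0,  0,  0],   # mu0 sums to 1
    [0, 0, 1, 1,  0,  0,  0,  0],   # mu1 sums to 1
    [1, 0, 0, 0, -1, -1,  0,  0],   # sum_b mu01(0,b) = mu0(0)
    [0, 1, 0, 0,  0,  0, -1, -1],   # sum_b mu01(1,b) = mu0(1)
    [0, 0, 1, 0, -1,  0, -1,  0],   # sum_a mu01(a,0) = mu1(0)
    [0, 0, 0, 1,  0, -1,  0, -1],   # sum_a mu01(a,1) = mu1(1)
]
b_eq = [1, 1, 0, 0, 0, 0]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1), method="highs")
labelling = [round(res.x[1]), round(res.x[3])]   # round node marginals
print(res.fun, labelling)   # optimal value 1.0, labelling [0, 1]
```

Brute force confirms the result: the four labellings cost 3, 1, 6 and 2, and the LP attains the minimum 1 at the integral vertex (0, 1).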
207 Traditional LP solvers (Simplex, Interior Point Methods) [Bertsimas and Tsitsiklis, 1997] [Boyd and Vandenberghe, 2004] are unable to handle large instances (>10⁷ variables). Exploit the structure of the problem: special message passing solvers. Message passing algorithms have weak convergence properties [Kolmogorov and Wainwright] [Ravikumar et al., ICML 2008]. Feasibility Region (LOCAL Polytope)
208 A new, weaker relaxation: roof-dual [Boros and Hammer, 2002] [Kohli et al., ICML 2008] [Kolmogorov and Rother, PAMI 2006], which can be solved exactly by minimizing a pairwise submodular function. Obtain partially optimal solutions: x = [00*10**1111*], x_OPT = [ ]. Feasibility Region (LOCAL Polytope)
209 [Figure: the pairwise terms θ̃_pq(0,0), θ̃_pq(1,1), θ̃_pq(0,1), θ̃_pq(1,0) are split into a unary part, a pairwise submodular part, and a pairwise non-submodular part.] [Boros and Hammer, 02]
210 Double the number of variables: x_p → (x_p, x̄_p). [Boros and Hammer, 02]
211 Double the number of variables: x_p → (x_p, x̄_p), with x̄_p = 1 - x_p. Ignore the constraint and solve. [Boros and Hammer, 02]
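Why doubling the variables helps can be seen on a single pairwise term (a minimal illustration, not the full roof-dual construction): complementing one argument turns a non-submodular table into a submodular one, and as long as the dropped constraint x̄_q = 1 - x_q holds, the transformed term represents exactly the original costs.

```python
from itertools import product

def submodular(t):
    """t[x_p][x_q] is submodular iff t(0,0) + t(1,1) <= t(0,1) + t(1,0)."""
    return t[0][0] + t[1][1] <= t[0][1] + t[1][0]

theta = [[1, 0], [0, 1]]     # rewards disagreement: non-submodular
assert not submodular(theta)

# Replace x_q by its complement: theta_c(x_p, xbar_q) = theta(x_p, 1 - xbar_q).
theta_c = [[theta[a][1 - b] for b in (0, 1)] for a in (0, 1)]
assert submodular(theta_c)

# Under the constraint xbar_q = 1 - x_q the costs are unchanged.
for xp, xq in product((0, 1), repeat=2):
    assert theta_c[xp][1 - xq] == theta[xp][xq]
print("submodular after complementing:", theta_c)  # → [[0, 1], [1, 0]]
```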
212 Double the number of variables: x_p → (x_p, x̄_p), with x̄_p = 1 - x_p. Local optimality. [Boros and Hammer, 02]