Part 7: Structured Prediction and Energy Minimization (2/2)


1 Part 7: Structured Prediction and Energy Minimization (2/2) Colorado Springs, 25th June 2011

2 G: Worst-case Complexity [Diagram: a hard problem can be attacked by giving up one of Generality, Optimality, Worst-case complexity, Integrality, or Determinism; this part: worst-case complexity]

3 G: Worst-case Complexity Giving up Worst-case Complexity. Worst-case complexity is an asymptotic behaviour; it quantifies only the worst case, and the practical case might be very different. Issue: what is the distribution over inputs? Popular methods with bad or unknown worst-case complexity: the Simplex Method for Linear Programming, hash tables, branch-and-bound search.

4 G: Worst-case Complexity Branch-and-bound Search. Implicit enumeration: globally optimal. Choose 1. a partitioning of the solution space, 2. a branching scheme, 3. upper and lower bounds over partitions. Branch and bound is very flexible, with many tuning possibilities in partitioning, branching schemes and bounding functions; it can be very efficient in practice, but typically has worst-case complexity equal to exhaustive enumeration.

5 G: Worst-case Complexity Branch-and-Bound (cont) [Figure: the solution space Y partitioned into active nodes A_1, ..., A_5 (white) and closed nodes C_1, ..., C_3 (gray)] Work with a partitioning of the solution space Y: active nodes (white), closed nodes (gray).

6 G: Worst-case Complexity Branch-and-Bound (cont) [Figure: partition of Y into active (A) and closed (C) subsets] Initially: everything active. Goal: everything closed.

7 G: Worst-case Complexity Branch-and-Bound (cont) [Figure: current partitioning of Y into active nodes A_1, ..., A_5 and closed nodes C_1, ..., C_3]

8 G: Worst-case Complexity Branch-and-Bound (cont) Take an active element (here A_2).

9 G: Worst-case Complexity Branch-and-Bound (cont) Partition it into two or more subsets of Y.

10 G: Worst-case Complexity Branch-and-Bound (cont) [Figure: A_2 replaced by new nodes C_4, C_5, A_6] Evaluate bounds and set each new node active or closed. Closing is possible if we can prove that no solution in a partition can be better than a known solution of value L: if the bound satisfies \bar{g}(A) \le L, close A. (A generic sketch of the whole procedure follows below.)
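
The loop on slides 5-10 can be written generically. Below is a minimal best-first sketch for maximization (my own illustration, not from the tutorial); the partitioning routine `split`, the bound `upper_bound`, and the helpers `is_singleton` / `solution_value` are placeholders the user must supply.

```python
# Minimal best-first branch-and-bound sketch for maximizing g over Y.
# `split(A)` partitions a set A into subsets, `upper_bound(A)` returns a value
# >= max_{y in A} g(y), `is_singleton(A)` tests whether A contains one solution,
# and `solution_value(A)` evaluates g on that single solution.
import heapq

def branch_and_bound(Y, split, upper_bound, is_singleton, solution_value):
    best_val, best_sol = float("-inf"), None
    active = [(-upper_bound(Y), 0, Y)]   # max-heap via negated bounds
    counter = 1                          # tie-breaker so sets are never compared
    while active:
        neg_bound, _, A = heapq.heappop(active)
        if -neg_bound <= best_val:       # bound(A) <= L: close this partition
            continue
        if is_singleton(A):              # a single candidate solution remains
            val = solution_value(A)
            if val > best_val:
                best_val, best_sol = val, A
            continue
        for B in split(A):               # branch: partition A further
            b = upper_bound(B)
            if b > best_val:             # keep only potentially better parts active
                heapq.heappush(active, (-b, counter, B))
                counter += 1
    return best_sol, best_val
```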

11 G: Worst-case Complexity Example 1: Efficient Subwindow Search (Lampert and Blaschko, 2008) Find the bounding box that maximizes a linear scoring function:
y^* = \arg\max_{y \in Y} \langle w, \phi(y, x) \rangle

12 G: Worst-case Complexity Example 1: Efficient Subwindow Search (Lampert and Blaschko, 2008) Scoring function:
g(x, y) = \beta + \sum_{x_i \text{ within } y} w(x_i)

13 G: Worst-case Complexity Example 1: Efficient Subwindow Search (Lampert and Blaschko, 2008) Subsets B of bounding boxes are specified by interval coordinates, B \in 2^Y.

14 G: Worst-case Complexity Example 1: Efficient Subwindow Search (Lampert and Blaschko, 2008) Upper bound \bar{g}(x, B) \ge g(x, y) for all y \in B:
\bar{g}(x, B) = \beta + \sum_{x_i \text{ within } B_{\max}} \max\{w(x_i), 0\} + \sum_{x_i \text{ within } B_{\min}} \min\{w(x_i), 0\} \ge \max_{y \in B} g(x, y),
where B_{\max} and B_{\min} denote the largest and the smallest box contained in B.
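
The two sums in this bound can be evaluated in constant time per candidate set from integral images of the positive and the negative weights. The sketch below is a simplified reimplementation under my own assumptions (weights on a pixel grid, inclusive box coordinates, B given as coordinate intervals), not the authors' code.

```python
# ESS upper bound: beta + positive weights inside the largest box B_max in B
#                       + negative weights inside the smallest box B_min in B.
import numpy as np

def integral_image(a):
    return np.pad(a, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def box_sum(ii, top, bottom, left, right):
    # Sum over the inclusive rectangle [top..bottom] x [left..right].
    return ii[bottom + 1, right + 1] - ii[top, right + 1] \
         - ii[bottom + 1, left] + ii[top, left]

def ess_upper_bound(weights, B, beta=0.0):
    # B = ((top_lo, top_hi), (bot_lo, bot_hi), (lef_lo, lef_hi), (rig_lo, rig_hi))
    (top_lo, top_hi), (bot_lo, bot_hi), (lef_lo, lef_hi), (rig_lo, rig_hi) = B
    pos = integral_image(np.maximum(weights, 0.0))
    neg = integral_image(np.minimum(weights, 0.0))
    # B_max: smallest top/left, largest bottom/right.
    bound = beta + box_sum(pos, top_lo, bot_hi, lef_lo, rig_hi)
    # B_min: largest top/left, smallest bottom/right (if that box is non-empty).
    if top_hi <= bot_lo and lef_hi <= rig_lo:
        bound += box_sum(neg, top_hi, bot_lo, lef_hi, rig_lo)
    return bound
```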

15 G: Worst-case Complexity Example 1: Efficient Subwindow Search (Lampert and Blaschko, 2008) [Figure: bounding boxes found using ESS on the PASCAL VOC 2007 detection challenge]

16 G: Worst-case Complexity Example 2: Branch-and-Mincut (Lempitsky et al., 2008) Binary image segmentation with non-local interactions. [Figure: example segmentations y_1, y_2 \in Y]

17 G: Worst-case Complexity Example 2: Branch-and-Mincut (Lempitsky et al., 2008) Binary image segmentation with non-local interaction:
E(z, y) = C(y) + \sum_{p \in V} F_p(y) z_p + \sum_{p \in V} B_p(y) (1 - z_p) + \sum_{\{p,q\} \in E} P_{pq}(y) z_p z_q,
g(x, y) = \max_{z \in \{0,1\}^V} E(z, y).
Here: z \in \{0,1\}^V is a binary pixel mask, F_p(y) and B_p(y) are foreground/background unary energies, P_{pq}(y) is a standard pairwise energy, and all terms have global dependencies on y.

18 G: Worst-case Complexity Example 2: Branch-and-Mincut (Lempitsky et al., 2008) Upper bound for any subset A \subseteq Y:
\max_{y \in A} g(x, y) = \max_{y \in A} \max_{z \in \{0,1\}^V} \Big[ C(y) + \sum_{p \in V} F_p(y) z_p + \sum_{p \in V} B_p(y)(1 - z_p) + \sum_{\{p,q\} \in E} P_{pq}(y) z_p z_q \Big]
\le \max_{z \in \{0,1\}^V} \Big[ \big(\max_{y \in A} C(y)\big) + \sum_{p \in V} \big(\max_{y \in A} F_p(y)\big) z_p + \sum_{p \in V} \big(\max_{y \in A} B_p(y)\big) (1 - z_p) + \sum_{\{p,q\} \in E} \big(\max_{y \in A} P_{pq}(y)\big) z_p z_q \Big].
The right-hand side depends on z only and, in Branch-and-Mincut, is evaluated with a single graph cut.

19 [Diagram: a hard problem can be attacked by giving up one of Generality, Optimality, Worst-case complexity, Integrality, or Determinism]

20 Problem Relaxations Optimization problems (maximizing g : G \to R over Y \subseteq G) can become easier if the feasible set is enlarged, and/or the objective function is replaced with a bound. [Figure: a function g on Y and a bound h on the enlarged set Z \supseteq Y, with h(z) \ge g(y)]

21 Problem Relaxations Optimization problems (maximizing g : G \to R over Y \subseteq G) can become easier if the feasible set is enlarged, and/or the objective function is replaced with a bound. Definition (Relaxation (Geoffrion, 1974)). Given two optimization problems (g, Y, G) and (h, Z, G), the problem (h, Z, G) is said to be a relaxation of (g, Y, G) if 1. Z \supseteq Y, i.e. the feasible set of the relaxation contains the feasible set of the original problem, and 2. \forall y \in Y : h(y) \ge g(y), i.e. over the original feasible set the objective function h achieves no smaller values than the objective function g.

22 Relaxations The relaxed solution z^* provides a bound: h(z^*) \ge g(y^*) (maximization: upper bound), h(z^*) \le g(y^*) (minimization: lower bound). The relaxation is typically tractable. There is evidence that relaxations are better for learning (Kulesza and Pereira, 2007), (Finley and Joachims, 2008), (Martins et al., 2009). Drawback: z^* \in Z \setminus Y is possible. Are there principled methods to construct relaxations? Linear programming relaxations, Lagrangian relaxation, Lagrangian/dual decomposition.

24 Linear Programming Relaxation [Figure: feasible region \{y : Ay \le b\} and its integer points]
\max_y c^\top y \quad \text{s.t. } Ay \le b, \ y \text{ integer}.

26 Linear Programming Relaxation [Figure: the integer program and its LP relaxation, whose feasible set is the full polytope \{y : Ay \le b\}]
\max_y c^\top y \quad \text{s.t. } Ay \le b.
The integrality constraint is dropped.

27 MAP-MRF Linear Programming Relaxation [Figure: factor graph with variables Y_1, Y_2, unary factors \theta_1, \theta_2, and pairwise factor \theta_{1,2}]

28 MAP-MRF Linear Programming Relaxation [Figure: the same factor graph with an example labeling (y_1, y_2) = (2, 3)]
Energy: E(y; x) = \theta_1(y_1; x) + \theta_{1,2}(y_1, y_2; x) + \theta_2(y_2; x)
Probability: p(y|x) = \frac{1}{Z(x)} \exp\{-E(y; x)\}
MAP prediction: \arg\max_{y \in Y} p(y|x) = \arg\min_{y \in Y} E(y; x)

29 MAP-MRF Linear Programming Relaxation Introduce indicator variables
\mu_1 \in \{0,1\}^{Y_1}, \quad \mu_{1,2} \in \{0,1\}^{Y_1 \times Y_2}, \quad \mu_2 \in \{0,1\}^{Y_2}
with marginalization constraints
\mu_1(y_1) = \sum_{y_2 \in Y_2} \mu_{1,2}(y_1, y_2), \quad \sum_{y_1 \in Y_1} \mu_{1,2}(y_1, y_2) = \mu_2(y_2)
and normalization constraints
\sum_{y_1 \in Y_1} \mu_1(y_1) = 1, \quad \sum_{(y_1,y_2) \in Y_1 \times Y_2} \mu_{1,2}(y_1, y_2) = 1, \quad \sum_{y_2 \in Y_2} \mu_2(y_2) = 1.
The energy is now linear: E(y; x) = \langle \theta, \mu \rangle =: g(y, x)
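
A quick numerical check of the last statement on the two-variable model above (toy sizes and random potentials; my own illustration, not from the tutorial):

```python
# Check that the energy is linear in the indicator variables mu (slide 29).
import numpy as np

L1, L2 = 3, 4                                   # |Y_1|, |Y_2| (toy sizes)
rng = np.random.default_rng(0)
theta1, theta2 = rng.random(L1), rng.random(L2)
theta12 = rng.random((L1, L2))
y1, y2 = 2, 3                                   # the labeling from slide 28
mu1, mu2 = np.eye(L1)[y1], np.eye(L2)[y2]       # one-hot indicator vectors
mu12 = np.outer(mu1, mu2)                       # one-hot pairwise indicator
E_direct = theta1[y1] + theta12[y1, y2] + theta2[y2]
E_linear = theta1 @ mu1 + (theta12 * mu12).sum() + theta2 @ mu2   # <theta, mu>
assert np.isclose(E_direct, E_linear)
```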

31 MAP-MRF Linear Programming Relaxation (cont)
\max_\mu \sum_{i \in V} \sum_{y_i \in Y_i} \theta_i(y_i) \mu_i(y_i) + \sum_{\{i,j\} \in E} \sum_{(y_i,y_j) \in Y_i \times Y_j} \theta_{i,j}(y_i, y_j) \mu_{i,j}(y_i, y_j)
s.t. \sum_{y_i \in Y_i} \mu_i(y_i) = 1, \quad \forall i \in V,
\sum_{y_j \in Y_j} \mu_{i,j}(y_i, y_j) = \mu_i(y_i), \quad \forall \{i,j\} \in E, \ y_i \in Y_i,
\mu_i(y_i) \in \{0,1\}, \quad \forall i \in V, \ y_i \in Y_i,
\mu_{i,j}(y_i, y_j) \in \{0,1\}, \quad \forall \{i,j\} \in E, \ (y_i, y_j) \in Y_i \times Y_j.
This is an NP-hard integer linear program.

32 MAP-MRF Linear Programming Relaxation (cont)
\max_\mu \sum_{i \in V} \sum_{y_i \in Y_i} \theta_i(y_i) \mu_i(y_i) + \sum_{\{i,j\} \in E} \sum_{(y_i,y_j) \in Y_i \times Y_j} \theta_{i,j}(y_i, y_j) \mu_{i,j}(y_i, y_j)
s.t. \sum_{y_i \in Y_i} \mu_i(y_i) = 1, \quad \forall i \in V,
\sum_{y_j \in Y_j} \mu_{i,j}(y_i, y_j) = \mu_i(y_i), \quad \forall \{i,j\} \in E, \ y_i \in Y_i,
\mu_i(y_i) \in [0,1], \quad \forall i \in V, \ y_i \in Y_i,
\mu_{i,j}(y_i, y_j) \in [0,1], \quad \forall \{i,j\} \in E, \ (y_i, y_j) \in Y_i \times Y_j.
Relaxing the integrality constraints yields a linear program.
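
To make the relaxation concrete, here is a minimal sketch (my own illustration, not the tutorial's code) that builds and solves this LP for a single edge {1,2} with scipy.optimize.linprog; both marginalization directions of the edge are included.

```python
# Local-polytope LP relaxation for one edge {1,2}.
# LP variable order: mu_1 (L1 entries), mu_2 (L2 entries), mu_12 (L1*L2, row-major).
import numpy as np
from scipy.optimize import linprog

def map_mrf_lp(theta1, theta2, theta12):
    L1, L2 = theta12.shape
    n = L1 + L2 + L1 * L2
    c = -np.concatenate([theta1, theta2, theta12.ravel()])   # linprog minimizes
    A_eq, b_eq = [], []
    for start, length in [(0, L1), (L1, L2)]:                # normalization constraints
        row = np.zeros(n); row[start:start + length] = 1.0
        A_eq.append(row); b_eq.append(1.0)
    for y1 in range(L1):                                     # sum_{y2} mu_12 = mu_1
        row = np.zeros(n)
        row[L1 + L2 + y1 * L2: L1 + L2 + (y1 + 1) * L2] = 1.0
        row[y1] = -1.0
        A_eq.append(row); b_eq.append(0.0)
    for y2 in range(L2):                                     # sum_{y1} mu_12 = mu_2
        row = np.zeros(n)
        row[L1 + L2 + y2::L2] = 1.0
        row[L1 + y2] = -1.0
        A_eq.append(row); b_eq.append(0.0)
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0.0, 1.0), method="highs")
    return -res.fun, res.x[:L1], res.x[L1:L1 + L2]

# A single edge is a tree, so the relaxation is tight and mu comes out integral.
value, mu1, mu2 = map_mrf_lp(np.array([0.0, 1.0]), np.array([0.5, 0.0]),
                             np.array([[2.0, 0.0], [0.0, 2.0]]))
```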

33 MAP-MRF LP Relaxation, related work MAP-MRF LP analysis: (Wainwright et al., IEEE Trans. Inf. Theory, 2005), (Weiss et al., UAI 2007), (Werner, PAMI 2005), (Kolmogorov, PAMI 2006). Improving tightness: (Komodakis and Paragios, ECCV 2008), (Kumar and Torr, ICML 2008), (Sontag and Jaakkola, NIPS 2007), (Sontag et al., UAI 2008), (Werner, CVPR 2008), (Kumar et al., NIPS 2007). Algorithms based on the LP: (Globerson and Jaakkola, NIPS 2007), (Kumar and Torr, ICML 2008), (Sontag et al., UAI 2008).

34 MAP-MRF LP Relaxation, Known results 1. LOCAL is tight iff the factor graph is a forest (acyclic). 2. All labelings are vertices of LOCAL (Wainwright and Jordan, 2008). 3. For cyclic graphs there are additional fractional vertices. 4. If all factors have regular energies, the fractional solutions are never optimal (Wainwright and Jordan, 2008). 5. For models with only binary states: half-integrality, and the integral variables are certain (partially optimal).

35 Lagrangian Relaxation Assumption:
\max_y g(y) \quad \text{s.t. } y \in D, \ y \in C.
Optimizing g(y) over y \in D is easy. Optimizing over y \in D \cap C is hard. High-level idea: incorporate the constraint y \in C into the objective function. For an excellent introduction, see (Guignard, Lagrangean Relaxation, TOP 2003) and (Lemaréchal, Lagrangian Relaxation, 2001).

36 Lagrangian Relaxation (cont) Recipe 1. Express C in terms of equalities and inequalities:
C = \{ y \in G : u_i(y) = 0, \ i = 1, \dots, I, \ v_j(y) \le 0, \ j = 1, \dots, J \},
with u_i : G \to R differentiable, typically affine, and v_j : G \to R differentiable, typically convex.
2. Introduce Lagrange multipliers, yielding
\max_y g(y) \quad \text{s.t. } y \in D, \quad u_i(y) = 0 : \lambda_i, \ i = 1, \dots, I, \quad v_j(y) \le 0 : \mu_j, \ j = 1, \dots, J.

37 Lagrangian Relaxation (cont) 2. Introduce Lagrange multipliers, yielding
\max_y g(y) \quad \text{s.t. } y \in D, \quad u_i(y) = 0 : \lambda_i, \quad v_j(y) \le 0 : \mu_j.
3. Build the partial Lagrangian
\max_y g(y) + \lambda^\top u(y) + \mu^\top v(y) \quad \text{s.t. } y \in D. \qquad (1)

38 Lagrangian Relaxation (cont) 3. Build the partial Lagrangian
\max_y g(y) + \lambda^\top u(y) + \mu^\top v(y) \quad \text{s.t. } y \in D. \qquad (1)
Theorem (Weak Duality of Lagrangean Relaxation). For differentiable functions u_i : G \to R and v_j : G \to R, and for any \lambda \in R^I and any non-negative \mu \in R^J, \mu \ge 0, problem (1) is a relaxation of the original problem: its value is larger than or equal to the optimal value of the original problem.

39 Lagrangian Relaxation (cont) 3. Build the partial Lagrangian
q(\lambda, \mu) := \max_y g(y) + \lambda^\top u(y) + \mu^\top v(y) \quad \text{s.t. } y \in D. \qquad (1)
4. Minimize the upper bound with respect to \lambda, \mu:
\min_{\lambda, \mu} q(\lambda, \mu) \quad \text{s.t. } \mu \ge 0,
which is efficiently solvable if q(\lambda, \mu) can be evaluated.

40 Optimizing Lagrangian Dual Functions 4. Minimize the upper bound with respect to \lambda, \mu:
\min_{\lambda, \mu} q(\lambda, \mu) \quad \text{s.t. } \mu \ge 0. \qquad (2)
Theorem (Lagrangean Dual Function). 1. q is convex in \lambda and \mu, so problem (2) is a convex minimization problem. 2. If q is unbounded below, then the original problem is infeasible. 3. For any \lambda and \mu \ge 0, let
y(\lambda, \mu) = \arg\max_{y \in D} g(y) + \lambda^\top u(y) + \mu^\top v(y).
Then a subgradient of q can be constructed by evaluating the constraint functions at y(\lambda, \mu) as
u(y(\lambda, \mu)) \in \partial_\lambda q(\lambda, \mu), \quad v(y(\lambda, \mu)) \in \partial_\mu q(\lambda, \mu).
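
Point 3 is what makes the dual usable: every evaluation of q also yields a subgradient. A minimal projected-subgradient sketch follows; the inner maximizer over D and the functions g, u, v are user-supplied placeholders (assumptions, not part of the slides).

```python
# Projected subgradient minimization of the Lagrangian dual q(lambda, mu) of (2).
import numpy as np

def lagrangian_dual_descent(argmax_over_D, g, u, v, dim_lam, dim_mu,
                            steps=200, step0=1.0):
    """argmax_over_D(lam, mu): maximizer of g(y) + lam'u(y) + mu'v(y) over D."""
    lam, mu = np.zeros(dim_lam), np.zeros(dim_mu)
    best_q = np.inf
    for t in range(1, steps + 1):
        y = argmax_over_D(lam, mu)                    # inner maximization
        q = g(y) + lam @ u(y) + mu @ v(y)             # dual value, an upper bound
        best_q = min(best_q, q)
        alpha = step0 / np.sqrt(t)                    # diminishing step size
        lam = lam - alpha * u(y)                      # subgradient step in lambda
        mu = np.maximum(mu - alpha * v(y), 0.0)       # step in mu, projected to mu >= 0
    return lam, mu, best_q
```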

41 Example: MAP-MRF Message Passing
\max_\mu \sum_{i \in V} \sum_{y_i \in Y_i} \theta_i(y_i) \mu_i(y_i) + \sum_{\{i,j\} \in E} \sum_{(y_i,y_j) \in Y_i \times Y_j} \theta_{i,j}(y_i, y_j) \mu_{i,j}(y_i, y_j)
s.t. \sum_{y_i \in Y_i} \mu_i(y_i) = 1, \quad \forall i \in V,
\sum_{(y_i,y_j) \in Y_i \times Y_j} \mu_{i,j}(y_i, y_j) = 1, \quad \forall \{i,j\} \in E,
\sum_{y_j \in Y_j} \mu_{i,j}(y_i, y_j) = \mu_i(y_i), \quad \forall \{i,j\} \in E, \ y_i \in Y_i,
\mu_i(y_i) \in \{0,1\}, \quad \mu_{i,j}(y_i, y_j) \in \{0,1\}.

42 Example: MAP-MRF Message Passing The same integer linear program as on the previous slide. If the marginalization constraint \sum_{y_j \in Y_j} \mu_{i,j}(y_i, y_j) = \mu_i(y_i) were not there, the problem would be trivial! (Wainwright and Jordan, 2008, Section 4.1.3)

43 Example: MAP-MRF Message Passing The same program, now with Lagrange multipliers \phi_{i,j}(y_i) attached to the marginalization constraints:
\sum_{y_j \in Y_j} \mu_{i,j}(y_i, y_j) = \mu_i(y_i) : \phi_{i,j}(y_i), \quad \forall \{i,j\} \in E, \ y_i \in Y_i,
while keeping the normalization and integrality constraints. If this constraint were not there, the problem would be trivial! (Wainwright and Jordan, 2008, Section 4.1.3)

44 Example: MAP-MRF Message Passing Dualizing the marginalization constraints gives
q(\phi) := \max_\mu \sum_{i \in V} \sum_{y_i \in Y_i} \theta_i(y_i) \mu_i(y_i) + \sum_{\{i,j\} \in E} \sum_{(y_i,y_j) \in Y_i \times Y_j} \theta_{i,j}(y_i, y_j) \mu_{i,j}(y_i, y_j) + \sum_{\{i,j\} \in E} \sum_{y_i \in Y_i} \phi_{i,j}(y_i) \Big( \sum_{y_j \in Y_j} \mu_{i,j}(y_i, y_j) - \mu_i(y_i) \Big)
s.t. \sum_{y_i \in Y_i} \mu_i(y_i) = 1, \quad \forall i \in V,
\sum_{(y_i,y_j) \in Y_i \times Y_j} \mu_{i,j}(y_i, y_j) = 1, \quad \forall \{i,j\} \in E,
\mu_i(y_i) \in \{0,1\}, \quad \mu_{i,j}(y_i, y_j) \in \{0,1\}.

45 Example: MAP-MRF Message Passing Rearranging terms, the dual function decomposes over nodes and edges:
q(\phi) := \max_\mu \sum_{i \in V} \sum_{y_i \in Y_i} \Big( \theta_i(y_i) - \sum_{j \in V : \{i,j\} \in E} \phi_{i,j}(y_i) \Big) \mu_i(y_i) + \sum_{\{i,j\} \in E} \sum_{(y_i,y_j) \in Y_i \times Y_j} \big( \theta_{i,j}(y_i, y_j) + \phi_{i,j}(y_i) \big) \mu_{i,j}(y_i, y_j)
s.t. \sum_{y_i \in Y_i} \mu_i(y_i) = 1, \quad \forall i \in V,
\sum_{(y_i,y_j) \in Y_i \times Y_j} \mu_{i,j}(y_i, y_j) = 1, \quad \forall \{i,j\} \in E,
\mu_i(y_i) \in \{0,1\}, \quad \mu_{i,j}(y_i, y_j) \in \{0,1\}.
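
For the two-variable model on the following slide this decomposed dual is easy to handle: the node and edge subproblems are maximized independently, and the constraint violation at the maximizers is a subgradient of q. A sketch of subgradient descent on q(φ) for that case (my own illustration; following the transcribed derivation, only the single message φ_{1,2} appears):

```python
# Subgradient minimization of the decomposed dual q(phi) for one edge {1,2}.
import numpy as np

def dual_value_and_subgradient(theta1, theta2, theta12, phi):
    node1 = theta1 - phi                     # coefficients of mu_1(y_1)
    node2 = theta2                           # y_2 carries no message in this derivation
    edge = theta12 + phi[:, None]            # coefficients of mu_12(y_1, y_2)
    y1 = int(np.argmax(node1))
    y2 = int(np.argmax(node2))
    e1, e2 = np.unravel_index(np.argmax(edge), edge.shape)
    q = node1[y1] + node2[y2] + edge[e1, e2]
    # Subgradient wrt phi(y_1): sum_{y_2} mu_12(y_1, y_2) - mu_1(y_1) at the maximizers.
    sg = np.zeros_like(phi)
    sg[e1] += 1.0
    sg[y1] -= 1.0
    return q, sg

def minimize_dual(theta1, theta2, theta12, steps=100, step0=1.0):
    phi, best = np.zeros_like(theta1), np.inf
    for t in range(1, steps + 1):
        q, sg = dual_value_and_subgradient(theta1, theta2, theta12, phi)
        best = min(best, q)
        phi = phi - (step0 / np.sqrt(t)) * sg    # step towards a smaller upper bound
    return best, phi
```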

46 Example: MAP-MRF Message Passing Messages \phi_{1,2} \in R^{Y_1}, \phi_{2,1} \in R^{Y_2}. [Figure: the two-variable factor graph with factors \theta_1, \theta_{1,2}, \theta_2 reparametrized by the messages \phi_{1,2} and \phi_{2,1}] Parent-to-child region-graph messages (Meltzer et al., UAI 2009); max-sum diffusion reparametrization (Werner, PAMI 2007).

47 Example behaviour of objectives [Plot: primal and dual objective values versus iteration; the dual objective upper-bounds the primal objective]

48 Primal Solution Recovery Assume we solved the dual problem for (\lambda^*, \mu^*). 1. Can we obtain a primal solution y^*? 2. Can we say something about the relaxation quality? Theorem (Sufficient Optimality Conditions). If for a given \lambda and \mu \ge 0 we have u(y(\lambda, \mu)) = 0 and v(y(\lambda, \mu)) \le 0 (primal feasibility), and further \lambda^\top u(y(\lambda, \mu)) = 0 and \mu^\top v(y(\lambda, \mu)) = 0 (complementary slackness), then y(\lambda, \mu) is an optimal primal solution to the original problem, and (\lambda, \mu) is an optimal solution to the dual problem.

50 Primal Solution Recovery (cont) But we might never see a solution satisfying the optimality conditions; this happens only if there is no duality gap, i.e. q(\lambda^*, \mu^*) = g(x, y^*). Special case: for integer linear programs we can always reconstruct a primal solution to the relaxed problem
\max_y g(y) \quad \text{s.t. } y \in \mathrm{conv}(D), \ y \in C. \qquad (3)
For example, for subgradient method updates it is known that (Anstreicher and Wolsey, MathProg 2009)
\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} y(\lambda_t, \mu_t) \to y^* \text{ of } (3).
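
A sketch of the averaging step, assuming the inner maximizers y(λ_t, µ_t) from the subgradient iterations have been collected as equal-length vectors (my own illustration):

```python
# Primal recovery by averaging the inner maximizers of the subgradient iterations;
# in the integer-linear-programming case the average converges to a solution of (3).
import numpy as np

def average_primal(iterates):
    return np.mean(np.stack(iterates), axis=0)    # (1/T) * sum_t y(lambda_t, mu_t)
```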

51 Dual/Lagrangian Decomposition Additive structure:
\max_y \sum_{k=1}^{K} g_k(y) \quad \text{s.t. } y \in Y,
such that maximizing \sum_{k=1}^{K} g_k(y) is hard, but for each k,
\max_y g_k(y) \quad \text{s.t. } y \in Y
is tractable. (Guignard, Lagrangean Decomposition, MathProg 1987): the original paper for dual decomposition. (Conejo et al., Decomposition Techniques in Mathematical Programming, 2006): continuous variable problems.

53 Dual/Lagrangian Decomposition Idea 1. Selectively duplicate variables ("variable splitting"):
\max_{y_1, \dots, y_K, y} \sum_{k=1}^{K} g_k(y_k) \quad \text{s.t. } y \in Y, \quad y_k \in Y, \ k = 1, \dots, K, \quad y = y_k : \lambda_k, \ k = 1, \dots, K. \qquad (4)
2. Add coupling equality constraints between the duplicated variables. 3. Apply Lagrangian relaxation to the coupling constraints. Also known as dual decomposition and variable splitting.

55 Dual/Lagrangian Decomposition (cont) Dualizing the coupling constraints yields
\max_{y_1, \dots, y_K, y} \sum_{k=1}^{K} \big[ g_k(y_k) + \lambda_k^\top (y - y_k) \big] \quad \text{s.t. } y \in Y, \quad y_k \in Y, \ k = 1, \dots, K.

56 Dual/Lagrangian Decomposition (cont)
\max_{y_1, \dots, y_K, y} \sum_{k=1}^{K} \big( g_k(y_k) - \lambda_k^\top y_k \big) + \Big( \sum_{k=1}^{K} \lambda_k \Big)^\top y \quad \text{s.t. } y \in Y, \quad y_k \in Y, \ k = 1, \dots, K.
The problem is decomposed.

57 Dual/Lagrangian Decomposition (cont)
q(\lambda) := \max_{y_1, \dots, y_K, y} \sum_{k=1}^{K} \big( g_k(y_k) - \lambda_k^\top y_k \big) + \Big( \sum_{k=1}^{K} \lambda_k \Big)^\top y \quad \text{s.t. } y \in Y, \quad y_k \in Y, \ k = 1, \dots, K.
The dual optimization problem is
\min_\lambda q(\lambda) \quad \text{s.t. } \sum_{k=1}^{K} \lambda_k = 0,
where \{\lambda : \sum_{k=1}^{K} \lambda_k = 0\} is the domain on which q(\lambda) remains bounded. Projected subgradient method, using
(y - y_k) - \frac{1}{K} \sum_{l=1}^{K} (y - y_l) = \frac{1}{K} \sum_{l=1}^{K} y_l - y_k
as the projected subgradient with respect to \lambda_k.
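
A minimal sketch of this projected subgradient scheme (my own illustration; `solvers[k]` is an assumed user-supplied routine returning the maximizer of g_k(y) - λ_k^T y over Y as a vector). The global variable y cancels in the projected subgradient, so only the subproblem maximizers are needed.

```python
# Dual/Lagrangian decomposition with projected subgradient updates.
import numpy as np

def dual_decomposition(solvers, dim, steps=100, step0=1.0):
    K = len(solvers)
    lam = np.zeros((K, dim))        # multipliers; the update keeps sum_k lam_k = 0
    ys = None
    for t in range(1, steps + 1):
        ys = np.stack([solvers[k](lam[k]) for k in range(K)])  # K independent subproblems
        mean_y = ys.mean(axis=0)
        # Projected subgradient wrt lam_k is (1/K) sum_l y_l - y_k.
        lam -= (step0 / np.sqrt(t)) * (mean_y - ys)
    return lam, ys

# A common heuristic for a primal candidate is to average/round the y_k or to
# pick the best feasible one among them.
```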

58 Primal Interpretation Primal interpretation for the linear case due to (Guignard, 1987). Theorem (Lagrangian Decomposition Primal). Let g(x, y) = c(x)^\top y be linear; then the solution of the dual attains the value of the following relaxed primal optimization problem:
\max_y \sum_{k=1}^{K} g_k(x, y) \quad \text{s.t. } y \in \mathrm{conv}(Y_k), \ k = 1, \dots, K.

59 Primal Interpretation (cont) [Figure: the discrete feasible set D, the convex hulls conv(Y_1) and conv(Y_2), the relaxed solution y^* in their intersection, and the linear objective c^\top y]
\max_y \sum_{k=1}^{K} g_k(x, y) \quad \text{s.t. } y \in \mathrm{conv}(Y_k), \ k = 1, \dots, K.

60 Applications in Computer Vision Very broadly applicable (Komodakis et al., ICCV 2007) (Woodford et al., ICCV 2009) (Strandmark and Kahl, CVPR 2010) (Vicente et al., ICCV 2009) (Torresani et al., ECCV 2008) (Werner, TechReport 2009) Further reading (Sontag et al., Introduction to Dual Decomposition for Inference, OfML Vol, 2011)

61 Determinism [Diagram: a hard problem can be attacked by giving up one of Generality, Optimality, Worst-case complexity, Integrality, or Determinism; this part: determinism]

62 Determinism Giving up Determinism Algorithms that use randomness are non-deterministic: the result does not depend exclusively on the input, and they are allowed to return a wrong result (with low probability) and/or to have a random runtime. Is this a good idea? For some problems randomized algorithms are the only known efficient algorithms; using randomness for hard problems has a long tradition in sampling-based algorithms, physics, computational chemistry, etc.; such algorithms can be simple and effective, but proving theorems about them can be much harder.

63 Determinism Simulated Annealing Basic idea (Kirkpatrick et al., 1983): 1. Define a distribution p(y; T) that concentrates mass on states y with high values g(x, y). 2. Simulate p(y; T). 3. Increase the concentration and repeat. Defining p(y; T): Boltzmann distribution. Simulating p(y; T): MCMC, usually done using a Metropolis chain or a Gibbs sampler.

64 Determinism Simulated Annealing (cont) Definition (Boltzmann Distribution). For a finite set Y, a function g : X \times Y \to R and a temperature parameter T > 0, let
p(y; T) = \frac{1}{Z(T)} \exp\Big( \frac{g(x, y)}{T} \Big), \qquad (5)
with Z(T) = \sum_{y \in Y} \exp\big( g(x, y) / T \big), be the Boltzmann distribution for g over Y at temperature T.
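
A tiny numerical illustration of Eq. (5) (toy values, not from the slides), evaluated at the temperatures shown on the following slides:

```python
# Boltzmann distribution over a small finite state space, computed stably.
import numpy as np

def boltzmann(g_values, T):
    scaled = np.asarray(g_values, dtype=float) / T
    scaled -= scaled.max()                 # stabilize before exponentiating
    p = np.exp(scaled)
    return p / p.sum()                     # p(y; T) proportional to exp(g(y)/T)

g = np.array([10.0, 35.0, 20.0, 5.0, 30.0])       # toy objective values g(x, y)
for T in [100.0, 10.0, 4.0, 1.0, 0.1]:
    print(T, np.round(boltzmann(g, T), 3))        # mass concentrates on argmax as T -> 0
```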

65-71 Determinism Simulated Annealing (cont) [Plots: a toy objective function g over the states y, and the Boltzmann distributions p(y; T) for T = 100.0, 10.0, 4.0, 1.0, 0.1; as the temperature decreases, the probability mass concentrates on the maximizers of g]

72 Determinism Simulated Annealing (cont)
y^* = SimulatedAnnealing(Y, g, T, N)
Input:
  Y : finite feasible set
  g : X \times Y \to R, objective function
  T \in R^K : sequence of K decreasing temperatures
  N \in N^K : sequence of K step lengths
(y, y^*) \leftarrow (y_0, y_0)
for k = 1, \dots, K do
  y \leftarrow simulate the Markov chain p(y; T(k)) starting from y for N(k) steps
end for
y^* \leftarrow y
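
A runnable version of this routine under my own toy assumptions (a 1-D state space with symmetric +/-1 Metropolis proposals; not the tutorial's code):

```python
# Simulated annealing with a Metropolis chain simulating p(y; T(k)).
import numpy as np

def simulated_annealing(g, num_states, temperatures, step_lengths, seed=0):
    rng = np.random.default_rng(seed)
    y = int(rng.integers(num_states))                 # y_0
    y_best = y
    for T, N in zip(temperatures, step_lengths):
        for _ in range(N):
            y_new = (y + int(rng.choice([-1, 1]))) % num_states   # propose a neighbor
            delta = g(y_new) - g(y)
            if rng.random() < np.exp(min(0.0, delta / T)):        # Metropolis acceptance
                y = y_new
            if g(y) > g(y_best):
                y_best = y
    return y_best

g = lambda y: -((y - 70) ** 2) / 50.0                 # toy objective, maximum at y = 70
schedule = [4.0 / np.log(1.0 + k) for k in range(1, 301)]   # schedule from slide 78
print(simulated_annealing(g, 128, schedule, [10] * len(schedule)))
```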

73 Determinism Guarantees Theorem (Guaranteed Optimality (Geman and Geman, 1984)). If there exists a k_0 \ge 2 such that for all k \ge k_0 the temperature T(k) satisfies
T(k) \ge \frac{|Y| \, \big( \max_{y \in Y} g(x, y) - \min_{y \in Y} g(x, y) \big)}{\log k},
then the probability of seeing the maximizer y^* of g tends to one as k \to \infty. This schedule is too slow in practice; faster schedules are used instead, such as T(k) = T_0 \cdot \alpha^k.

75 Determinism Example: Simulated Annealing (Geman and Geman, 1984) Figure: Factor graph model 2D 8-neighbor 128-by-128 grid, 5 possible labels Pairwise Potts potentials Task: restoration Figure: Approximate sample (200 sweeps)

76 Determinism Example: Simulated Annealing (Geman and Geman, 1984) Figure: Approximate sample (200 sweeps) Figure: Corrupted with Gaussian noise 2D 8-neighbor 128-by-128 grid, 5 possible labels Pairwise Potts potentials Task: restoration

77 Determinism Example: Simulated Annealing (Geman and Geman, 1984) Figure: Corrupted input image Figure: Restoration, 25 sweeps Derive unary energies from corrupted input image (optimal) Simulated annealing, 25 Gibbs sampling sweeps

78 Determinism Example: Simulated Annealing (Geman and Geman, 1984) Figure: Corrupted input image Figure: Restoration, 300 sweeps Simulated annealing, 300 Gibbs sampling sweeps Schedule T (k) = 4.0/ log(1 + k)

79 Determinism Example: Simulated Annealing (Geman and Geman, 1984) Figure: Noise-free sample Figure: Restoration, 300 sweeps Simulated annealing, 300 Gibbs sampling sweeps Schedule T (k) = 4.0/ log(1 + k)

80 End The End... The tutorial in written form is now available in the publisher's FnT Computer Graphics and Vision series; the PDF is available on the authors' homepages. Thank you!
