Semi-Markov/Graph Cuts
1 Semi-Markov/Graph Cuts. Alireza Shafaei, University of British Columbia. August.
2-3 A Quick Review. For a general chain-structured UGM we have
$p(x_1, x_2, \dots, x_n) \propto \prod_{i=1}^{n} \phi_i(x_i) \prod_{i=2}^{n} \phi_{i,i-1}(x_i, x_{i-1}),$
illustrated by the chain X1 - X2 - X3 - X4 - X5 - X6 - X7. In this case we only have the local Markov property,
$x_i \perp x_1, \dots, x_{i-2}, x_{i+2}, \dots, x_n \mid x_{i-1}, x_{i+1}.$
Local Markov property in general UGMs: given the neighbours, we have conditional independence of the other nodes. (Marginal independence corresponds to reachability.)
4-5 A Quick Review. For chain-structured UGMs we learned the Viterbi decoding algorithm.
Forward phase:
$V_{1,s} = \phi_1(s), \qquad V_{i,s} = \max_{s'} \{\phi_i(s)\, \phi_{i,i-1}(s, s')\, V_{i-1,s'}\}.$
Backward phase: backtrack through the argmax values. Solves the decoding problem in $O(n s^2)$ instead of $O(s^n)$.
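To make the recursion concrete, here is a minimal NumPy sketch of Viterbi decoding for a chain UGM. It is not part of the slides; the array layout (node_pot of shape (n, S) for the phi_i, edge_pot of shape (n, S, S) for the phi_{i,i-1}) is an illustrative assumption.

```python
import numpy as np

def viterbi_decode(node_pot, edge_pot):
    """Decode a chain UGM. node_pot is (n, S); edge_pot is (n, S, S) with
    edge_pot[i, s, s_prev] = phi_{i,i-1}(s, s_prev) for i >= 1."""
    n, S = node_pot.shape
    V = np.zeros((n, S))
    back = np.zeros((n, S), dtype=int)
    V[0] = node_pot[0]                       # V_{1,s} = phi_1(s)
    for i in range(1, n):
        # scores[s, s_prev] = phi_i(s) * phi_{i,i-1}(s, s_prev) * V_{i-1, s_prev}
        scores = node_pot[i][:, None] * edge_pot[i] * V[i - 1][None, :]
        back[i] = scores.argmax(axis=1)      # best previous state for each s
        V[i] = scores.max(axis=1)
    # backward phase: backtrack through the argmax values
    x = np.zeros(n, dtype=int)
    x[-1] = V[-1].argmax()
    for i in range(n - 1, 0, -1):
        x[i - 1] = back[i, x[i]]
    return x

# tiny usage example with random potentials (3 states, chain of length 7)
rng = np.random.default_rng(0)
print(viterbi_decode(rng.random((7, 3)), rng.random((7, 3, 3))))
```

The two nested array dimensions give the $O(n s^2)$ cost mentioned above.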
6-9 A Quick Review. For inference in chain-structured UGMs we learned the forward-backward algorithm.
1. Forward phase (sums up paths from the beginning):
$V_{1,s} = \phi_1(s), \qquad V_{i,s} = \sum_{s'} \phi_i(s)\, \phi_{i,i-1}(s, s')\, V_{i-1,s'}, \qquad Z = \sum_s V_{n,s}.$
2. Backward phase (sums up paths to the end):
$B_{n,s} = 1, \qquad B_{i,s} = \sum_{s'} \phi_{i+1}(s')\, \phi_{i+1,i}(s', s)\, B_{i+1,s'}.$
3. Marginals are given by $p(x_i = s) \propto V_{i,s} B_{i,s}$.
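A matching sketch of the forward-backward recursions, under the same assumed array layout as the Viterbi sketch. (In practice the messages are rescaled at every step to avoid numerical underflow; that is omitted here for clarity.)

```python
import numpy as np

def forward_backward(node_pot, edge_pot):
    """Return Z and the node marginals p(x_i = s) for a chain UGM."""
    n, S = node_pot.shape
    V = np.zeros((n, S))
    B = np.ones((n, S))
    V[0] = node_pot[0]                                   # V_{1,s} = phi_1(s)
    for i in range(1, n):
        # V_{i,s} = sum_{s'} phi_i(s) * phi_{i,i-1}(s, s') * V_{i-1, s'}
        V[i] = node_pot[i] * (edge_pot[i] @ V[i - 1])
    Z = V[-1].sum()                                      # Z = sum_s V_{n,s}
    for i in range(n - 2, -1, -1):
        # B_{i,s} = sum_{s'} phi_{i+1}(s') * phi_{i+1,i}(s', s) * B_{i+1, s'}
        B[i] = edge_pot[i + 1].T @ (node_pot[i + 1] * B[i + 1])
    marginals = V * B / Z                                # p(x_i = s) proportional to V_{i,s} B_{i,s}
    return Z, marginals

rng = np.random.default_rng(0)
Z, marg = forward_backward(rng.random((7, 3)), rng.random((7, 3, 3)))
print(Z, marg.sum(axis=1))   # each row of the marginals sums to 1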
10-15 A Quick Review. For chain-structured UGMs we learned the forward-backward and Viterbi algorithms. The same idea was generalized to tree-structured UGMs. For graphs with a small cutset we learned the Cutset Conditioning method, and for graphs with small tree-width we learned the Junction Tree method. Here are two more groups of problems that we can handle exactly in polynomial time: Semi-Markov chain-structured UGMs, and binary, attractive-state UGMs.
16-19 Semi-Markov chain-structured UGMs. Local Markov property in general chain-structured UGMs: given the neighbours, we have conditional independence of the other nodes. In Semi-Markov chain-structured models: given the neighbours and their lengths, we have conditional independence of the other nodes. A subsequence of nodes can have the same state, so you can encourage smoothness. This is useful when you wish to keep track of how long you have been staying in the same state.
20-23 Semi-Markov chain-structured UGMs. Previously, the potential of each edge was a function of the neighbouring vertices, $\phi_{i,i-1}(s, s')$. In Semi-Markov chain-structured models we define the potential as $\phi_{i,i-1}(s, s', l)$: the potential of making a transition from $s'$ to $s$ after $l$ steps. You can encourage staying in certain states for a period of time.
24-29 Decoding the Semi-Markov chain-structured UGMs. Let us look at the Viterbi decoding again:
$V_{1,s} = \phi_1(s), \qquad V_{i,s} = \phi_i(s) \max_{s'} \{\phi_{i,i-1}(s, s')\, V_{i-1,s'}\}.$
How can we update the formula to solve Semi-Markov chain structures?
$V_{1,s} = \phi_1(s), \qquad V_{i,s} = \phi_i(s) \max_{s', l} \{\phi_{i,i-1}(s, s', l)\, V_{i-l,s'}\}.$
Depending on the application we can bound the maximum possible value of $l$ by $L$. For the unbounded case, $L$ is simply $n$, the total length of the chain. Note that this is different from having an order-$L$ Markov chain (why?).
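A minimal sketch of the Semi-Markov Viterbi update with the length bound L. Again not from the slides; the array layout edge_pot[i, s, s_prev, l-1] standing in for phi_{i,i-1}(s, s', l) is an assumption. Only the value of the best configuration is computed here; backtracking over the stored argmax pairs (s', l) would recover the decoding exactly as in the chain case.

```python
import numpy as np

def semi_markov_viterbi_value(node_pot, edge_pot, L):
    """node_pot: (n, S); edge_pot: (n, S, S, L) with
    edge_pot[i, s, s_prev, l-1] = phi_{i,i-1}(s, s_prev, l)."""
    n, S = node_pot.shape
    V = np.zeros((n, S))
    V[0] = node_pot[0]                                # V_{1,s} = phi_1(s)
    for i in range(1, n):
        best = np.zeros(S)
        for l in range(1, min(L, i) + 1):             # jump back l positions
            # candidate[s, s_prev] = phi_{i,i-1}(s, s_prev, l) * V_{i-l, s_prev}
            candidate = edge_pot[i, :, :, l - 1] * V[i - l][None, :]
            best = np.maximum(best, candidate.max(axis=1))
        V[i] = node_pot[i] * best                     # V_{i,s} = phi_i(s) * max_{s', l}(...)
    return V, V[-1].max()

rng = np.random.default_rng(0)
V, val = semi_markov_viterbi_value(rng.random((7, 3)), rng.random((7, 3, 3, 4)), L=4)
print(val)
```

The extra maximization over $l$ multiplies the chain cost by $L$, giving $O(n L s^2)$.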
30-33 Inference in the Semi-Markov chain-structured UGMs. Forward-backward algorithm for the Semi-Markov models.
Forward phase (sums up paths from the beginning):
$V_{1,s} = \phi_1(s), \qquad V_{i,s} = \phi_i(s) \sum_{s', l} \phi_{i,i-1}(s, s', l)\, V_{i-l,s'}, \qquad Z = \sum_s V_{n,s}.$
Backward phase (sums up paths to the end):
$B_{n,s} = 1, \qquad B_{i,s} = \sum_{s', l} \phi_{i+1}(s')\, \phi_{i+1,i}(s', s, l)\, B_{i+l,s'}.$
Marginals are given by $p(x_i = s) \propto V_{i,s} B_{i,s}$. Questions?
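A sketch of the corresponding forward pass for Z, implementing the summation recursion above with the same assumed array layout as the Viterbi sketch; the backward pass is analogous.

```python
import numpy as np

def semi_markov_partition(node_pot, edge_pot, L):
    """Compute Z with the Semi-Markov forward recursion."""
    n, S = node_pot.shape
    V = np.zeros((n, S))
    V[0] = node_pot[0]                                   # V_{1,s} = phi_1(s)
    for i in range(1, n):
        total = np.zeros(S)
        for l in range(1, min(L, i) + 1):
            # sum over s' of phi_{i,i-1}(s, s', l) * V_{i-l, s'}
            total += edge_pot[i, :, :, l - 1] @ V[i - l]
        V[i] = node_pot[i] * total
    return V[-1].sum()                                   # Z = sum_s V_{n,s}
```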
34-43 Solving with graph cuts. Restricting the structure of the graph is just one way to simplify our tasks; we can also restrict the potentials. Here we look at a group of pairwise UGMs with the following restrictions: (1) binary variables, and (2) pairwise potentials that make the problem submodular. Such models can be decoded by reformulation as a Max-Flow problem. The approach can be generalized to non-binary cases, and it can be 2-approximated when the problem is not submodular (under a different constraint). In the general case decoding is known to be NP-Hard. The following material is borrowed from Simon Prince's (UCL) slides, available at computervisionmodels.com.
44-46 The Max-Flow problem. The goal is to push as much flow as possible through a directed graph from the source to the sink. The flow along an edge cannot exceed the (non-negative) capacity $C_{ij}$ associated with that edge.
47-49 Saturation. When we push the maximum flow from source to sink, there must be at least one saturated edge on every path from source to sink; otherwise you could push more flow. The set of saturated edges therefore separates the source and the sink. This cut is the minimum cut, and its total capacity equals the maximum flow.
50-57 An example. In the figures, each edge is labelled with two numbers: current flow / total capacity. Choose any path from source to sink with spare capacity and push as much flow as possible along it, then repeat (the intermediate slides step through these augmenting paths). Eventually no further augmenting path exists, and the saturated edges partition the graph into two subgraphs.
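To experiment with the augmenting-path picture, here is a small sketch using networkx on a toy graph; the graph and its capacities are made up, not the ones drawn in the slides.

```python
import networkx as nx

# toy directed graph with capacities; 's' is the source, 't' the sink
G = nx.DiGraph()
G.add_edge('s', 'a', capacity=3.0)
G.add_edge('s', 'b', capacity=2.0)
G.add_edge('a', 'b', capacity=1.0)
G.add_edge('a', 't', capacity=2.0)
G.add_edge('b', 't', capacity=3.0)

flow_value, flow = nx.maximum_flow(G, 's', 't')              # push as much flow as possible
cut_value, (source_side, sink_side) = nx.minimum_cut(G, 's', 't')

print(flow_value, cut_value)      # equal, by the max-flow min-cut theorem
print(source_side, sink_side)     # the saturated edges cross this partition
```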
58-59 The binary MRFs. In the simplest form, let us constrain the pairwise potentials for adjacent nodes $m, n$ to be: $\phi_{m,n}(0,0) = \phi_{m,n}(1,1) = 0$, $\phi_{m,n}(1,0) = \theta_{10}$, $\phi_{m,n}(0,1) = \theta_{01}$. We will build a graph such that each cut corresponds to a configuration.
60-63 The binary MRFs (the graph construction is illustrated step by step in the figures on these slides).
64-66 The binary MRFs. In the general case, the constraint is $\theta_{10} + \theta_{01} > \theta_{11} + \theta_{00}$ (attraction). If it is met, the problem is called submodular and we can solve it in polynomial time.
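Below is a sketch of the standard source/sink construction for the simplest case above ($\phi_{m,n}(0,0) = \phi_{m,n}(1,1) = 0$), using networkx. Here the $\theta$ values and the unary terms are treated as non-negative costs to be minimized, so the minimum cut directly gives the minimum-cost labelling; the three-node chain and the particular numbers are illustrative assumptions, not the example from the slides.

```python
import networkx as nx

def decode_binary_submodular(unary, pairs, theta10, theta01):
    """Minimize sum_i unary[i][x_i] + pairwise costs theta10 / theta01 for
    disagreeing neighbours, over binary labels (theta00 = theta11 = 0,
    all costs assumed non-negative)."""
    G = nx.DiGraph()
    for i, (cost0, cost1) in enumerate(unary):
        G.add_edge('s', i, capacity=cost1)   # cut iff x_i = 1: pays the cost of label 1
        G.add_edge(i, 't', capacity=cost0)   # cut iff x_i = 0: pays the cost of label 0
    for m, n in pairs:
        G.add_edge(m, n, capacity=theta01)   # cut iff x_m = 0 and x_n = 1
        G.add_edge(n, m, capacity=theta10)   # cut iff x_m = 1 and x_n = 0
    cut_value, (source_side, _) = nx.minimum_cut(G, 's', 't')
    # nodes that stay with the source take label 0, the rest take label 1
    labels = [0 if i in source_side else 1 for i in range(len(unary))]
    return labels, cut_value

# three nodes in a chain, with pairwise costs discouraging label changes
labels, energy = decode_binary_submodular(
    unary=[(0.2, 1.0), (0.9, 0.1), (0.8, 0.3)],   # (cost of label 0, cost of label 1) per node
    pairs=[(0, 1), (1, 2)], theta10=0.5, theta01=0.5)
print(labels, energy)
```

Every s-t cut severs exactly one of the two unary edges per node plus the pairwise edges between disagreeing neighbours, which is why the minimum cut corresponds to the best configuration.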
67-71 Other cases. Another type of constraint allows approximate solutions: the pairwise potential being a metric. The Alpha Expansion algorithm (next week) uses the max-flow idea as a subroutine to do coordinate descent in the label space.
72-77 Conclusion. Decoding and inference are still efficient with Semi-Markov models, which are useful if we need to control the length of each state over a sequence. Graph cuts help with decoding on models with pairwise potentials: exact solutions in the binary and multi-label cases if the potentials are submodular, and approximate solutions in the multi-label case if the potential is a metric.
78 Thank you! Questions?