Loopy Belief Propagation for Bipartite Maximum Weight b-matching
1 Loopy Belief Propagation for Bipartite Maximum Weight b-matching. Bert Huang and Tony Jebara, Computer Science Department, Columbia University, New York, NY 10027.
2 Outline 1. Bipartite Weighted b-matching 2. Edge Weights As a Distribution 3. Efficient Max-Product 4. Convergence Proof Sketch 5. Experiments 6. Discussion
4 Bipartite Weighted b-matching
5-6 Bipartite Weighted b-matching
On a bipartite graph G = (U, V, E), with U = {u_1, ..., u_n}, V = {v_1, ..., v_n}, and E = {(u_i, v_j)} for all i, j.
A = weight matrix such that the weight of edge (u_i, v_j) is A_ij.
7-8 Bipartite Weighted b-matching
Task: find the maximum weight subset of E such that each vertex has exactly b neighbors.
Example: n = 4, b = 2.
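To make the task concrete, here is a brute-force reference solver in Python (a hypothetical helper, not one of the algorithms discussed in these slides); it enumerates edge subsets and is only feasible for tiny instances:

```python
import itertools

def brute_force_b_matching(A, b):
    """Exhaustive maximum weight b-matching for tiny instances.

    A is an n x n weight matrix with A[i][j] the weight of edge (u_i, v_j).
    A valid b-matching uses exactly b*n edges, with every vertex on both
    sides having degree exactly b. Cost is exponential in n, so this is
    only a correctness reference, not a practical algorithm.
    """
    n = len(A)
    edges = [(i, j) for i in range(n) for j in range(n)]
    best_weight, best = float("-inf"), None
    for subset in itertools.combinations(edges, b * n):
        deg_u, deg_v = [0] * n, [0] * n
        for i, j in subset:
            deg_u[i] += 1
            deg_v[j] += 1
        if all(d == b for d in deg_u) and all(d == b for d in deg_v):
            w = sum(A[i][j] for i, j in subset)
            if w > best_weight:
                best_weight, best = w, subset
    return best_weight, best

# For n = 2, b = 1 this is ordinary maximum weight perfect matching:
assert brute_force_b_matching([[1, 2], [3, 0]], 1)[0] == 5
```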
9 Bipartite Weighted b-matching
Classical application: resource allocation (manual labor).
- n workers, n tasks.
- A team of b workers is needed per task.
- A_ij = skill of worker u_i at performing task v_j.
[Figure: workers u_1, ..., u_4 on one side, tasks v_1, ..., v_4 on the other.]
10 Bipartite Weighted b-matching Alternate uses of b-matching: - Balanced k-nearest-neighbors - Each node can only be picked k times. - Robust to translations of test data. - When test data is collected under different conditions (e.g. time, location, instrument calibration).
11 Bipartite Weighted b-matching
Classical algorithms solve maximum weight b-matching in O(bn³) running time, such as:
- Blossom Algorithm (Edmonds 1965)
- Balanced Network Flow (Fremuth-Paeger & Jungnickel 1999)
12 Outline 1. Bipartite Weighted b-matching 2. Edge Weights As a Distribution 3. Efficient Max-Product 4. Convergence Proof Sketch 5. Experiments 6. Discussion
13 Edge Weights As a Distribution - Bayati, Shah, and Sharma (2005) formulated the 1-matching problem as a probability distribution. - This work generalizes to arbitrary b.
14 Edge Weights As a Distribution Variables: Each vertex chooses b neighbors.
15 Example: the possible settings for u_1 with b = 2. [Figure: six copies of u_1, each connected to a different pair from {v_1, ..., v_4}, one per (4 choose 2) = 6 possible neighbor sets.]
17 Edge Weights As a Distribution
Variables: each vertex chooses b neighbors.
For vertex u_i: X_i ⊆ V with |X_i| = b. Similarly, for v_j we have variable Y_j.
Note: each variable has (n choose b) possible settings.
18 Edge Weights As a Distribution
Weights as probabilities: since we sum weights but multiply probabilities, exponentiate:
φ(X_i) = exp( (1/2) Σ_{v_j ∈ X_i} A_ij ),  φ(Y_j) = exp( (1/2) Σ_{u_i ∈ Y_j} A_ij ).
19 Edge Weights As a Distribution Enforce b-matching: Neighbor choices must agree
20 Example: invalid settings, where neighbor choices disagree (e.g. u_1 selects v_1 but v_1 does not select u_1).
22 Edge Weights As a Distribution
Enforce b-matching: neighbor choices must agree.
Pairwise compatibility function:
ψ(X_i, Y_j) = 1(v_j ∈ X_i ⟺ u_i ∈ Y_j).
23-25 Edge Weights As a Distribution
P(X, Y) = (1/Z) ∏_{i,j=1}^n ψ(X_i, Y_j) ∏_{k=1}^n φ(X_k) φ(Y_k),
where φ(X_i) = exp( (1/2) Σ_{v_j ∈ X_i} A_ij ), φ(Y_j) = exp( (1/2) Σ_{u_i ∈ Y_j} A_ij ), and ψ(X_i, Y_j) = 1(v_j ∈ X_i ⟺ u_i ∈ Y_j).
Ignoring the Z normalization, P(X, Y) is exactly the exponentiated weight of the b-matching.
Also, since we're maximizing, we can ignore the 1/2 (it makes the math more readable).
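That claim can be checked numerically (a sketch with our own helper names): for any consistent setting, the unnormalized product of potentials equals the exponentiated matched weight, since each matched edge contributes A_ij/2 from the u side and A_ij/2 from the v side.

```python
import math

def unnormalized_p(X, Y, A):
    """Product of potentials without 1/Z. X[i] and Y[j] are index sets:
    X[i] holds the v-indices chosen by u_i, Y[j] the u-indices chosen by v_j."""
    n = len(A)
    # pairwise compatibility: psi = 0 unless (v_j in X_i) <=> (u_i in Y_j)
    for i in range(n):
        for j in range(n):
            if (j in X[i]) != (i in Y[j]):
                return 0.0
    # each side contributes half the weight of its chosen edges
    log_p = sum(0.5 * A[i][j] for i in range(n) for j in X[i])
    log_p += sum(0.5 * A[i][j] for j in range(n) for i in Y[j])
    return math.exp(log_p)

A = [[1.0, 2.0], [3.0, 4.0]]
X = [{1}, {0}]   # u_1 -> v_2, u_2 -> v_1 (a 1-matching)
Y = [{1}, {0}]   # the consistent reverse choices
# matched weight = A[0][1] + A[1][0] = 5
assert abs(unnormalized_p(X, Y, A) - math.exp(5.0)) < 1e-9
# inconsistent neighbor choices get probability zero
assert unnormalized_p([{1}, {0}], [{0}, {1}], A) == 0.0
```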
26 Outline 1. Bipartite Weighted b-matching 2. Edge Weights As a Distribution 3. Efficient Max-Product 4. Convergence Proof Sketch 5. Experiments 6. Discussion
27 Standard Max-Product
Send messages between variables:
m_{X_i}(Y_j) = (1/Z) max_{X_i} φ(X_i) ψ(X_i, Y_j) ∏_{k ≠ j} m_{Y_k}(X_i).
Fuse messages to obtain beliefs (estimates of max-marginals):
b(X_i) = (1/Z) φ(X_i) ∏_k m_{Y_k}(X_i).
28-29 Standard Max-Product
Converges to the true maximum on any tree-structured graph (Pearl 1986). We show that it converges to the correct maximum on our graph.
But what about the (n choose b)-length message and belief vectors?
30 Efficient Max-Product
- Use algebraic tricks to reduce the (n choose b)-length message vectors to scalars.
- Derive a new update rule for the scalar messages.
- Use a similar trick to maximize belief vectors efficiently.
Let's speed through the math.
31-33 Efficient Max-Product
Take advantage of the binary ψ(X_i, Y_j) function: each message vector consists of only two values,
μ_{x_i → y_j} = max_{x_i : v_j ∈ x_i} φ(x_i) ∏_{k ≠ j} m_{y_k}(x_i), used when u_i ∈ y_j,
ν_{x_i → y_j} = max_{x_i : v_j ∉ x_i} φ(x_i) ∏_{k ≠ j} m_{y_k}(x_i), used when u_i ∉ y_j.
If we rename these two values we can break up the product: each incoming message m_{y_k}(x_i) contributes its μ_{y_k → x_i} value when v_k ∈ x_i and its ν_{y_k → x_i} value when v_k ∉ x_i.
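The two-value claim can be verified numerically on a tiny instance (a sketch; the weights and indices are illustrative, incoming messages are set to 1, and the 1/2 is already dropped as above): over all (n choose b) settings of Y_j, the message from X_i takes exactly two distinct values, depending only on whether u_i ∈ Y_j.

```python
import itertools
import math

n, b, i, j = 4, 2, 0, 1
A_row = [1.0, 2.0, 3.0, 4.0]      # illustrative weights A[i][k]

def message(y_j):
    """m_{x_i}(y_j) with uniform incoming messages: max over settings x_i
    of phi(x_i) * psi(x_i, y_j)."""
    best = 0.0
    for x_i in itertools.combinations(range(n), b):
        if (j in x_i) != (i in y_j):   # psi = 0: inconsistent setting
            continue
        best = max(best, math.exp(sum(A_row[k] for k in x_i)))
    return best

values = {round(message(y_j), 9)
          for y_j in itertools.combinations(range(n), b)}
# exactly two values: mu (when u_i in y_j) and nu (when u_i not in y_j)
assert len(values) == 2
```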
34-35 Efficient Max-Product
Normalize messages by dividing the whole vector by ν_{x_i → y_j}:
μ̂_{x_i → y_j} = μ_{x_i → y_j} / ν_{x_i → y_j}  and  ν̂_{x_i → y_j} = 1.
Each (n choose b)-length vector reduces to a scalar.
36-39 Efficient Max-Product
Derive the update rule:
μ̂_{x_i → y_j} = [ max_{x_i : j ∈ x_i} φ(x_i) ∏_{k ∈ x_i \ j} μ̂_{k → i} ] / [ max_{x_i : j ∉ x_i} φ(x_i) ∏_{k ∈ x_i} μ̂_{k → i} ]
= [ max_{x_i : j ∈ x_i} ∏_{k ∈ x_i} exp(A_ik) ∏_{k ∈ x_i \ j} μ̂_{k → i} ] / [ max_{x_i : j ∉ x_i} ∏_{k ∈ x_i} exp(A_ik) ∏_{k ∈ x_i} μ̂_{k → i} ]
= exp(A_ij) [ max_{x_i : j ∈ x_i} ∏_{k ∈ x_i \ j} exp(A_ik) μ̂_{k → i} ] / [ max_{x_i : j ∉ x_i} ∏_{k ∈ x_i} exp(A_ik) μ̂_{k → i} ]
40 Efficient Max-Product
After canceling terms, the message update simplifies to
μ̂_{x_i → y_j} = exp(A_ij) / ( exp(A_il) μ̂_{y_l → x_i} ),
where l is the index of the bth greatest value of exp(A_ik) μ̂_{y_k → x_i} over k ≠ j.
And we maximize beliefs with
max_{x_i} b(x_i) = max_{x_i} φ(x_i) ∏_{k ∈ x_i} μ̂_{y_k → x_i} = max_{x_i} ∏_{k ∈ x_i} exp(A_ik) μ̂_{y_k → x_i},
i.e. x_i is set to the b indices k with the greatest exp(A_ik) μ̂_{y_k → x_i}.
Both of these updates take O(bn) time per vertex.
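The simplified updates fit in a short Python sketch (our variable names; a parallel message schedule, log-domain messages for numerical stability, and a plain sort in place of the O(bn) selection the slides assume; it assumes b ≤ n − 1 and a unique optimum):

```python
def bp_b_matching(A, b, iters=20):
    """Scalar max-product for bipartite b-matching, in log domain.

    mu[i][j] is the log-message from x_i to y_j; nu[j][i] from y_j to x_i.
    In log domain the update exp(A_ij) / (exp(A_il) * mu_hat) becomes
    A_ij - (A_il + log_mu_hat), with l the b-th greatest term over k != j.
    """
    n = len(A)
    mu = [[0.0] * n for _ in range(n)]
    nu = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        new_mu = [[0.0] * n for _ in range(n)]
        new_nu = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                vals = sorted((A[i][k] + nu[k][i] for k in range(n) if k != j),
                              reverse=True)
                new_mu[i][j] = A[i][j] - vals[b - 1]
        for j in range(n):
            for i in range(n):
                vals = sorted((A[k][j] + mu[k][j] for k in range(n) if k != i),
                              reverse=True)
                new_nu[j][i] = A[i][j] - vals[b - 1]
        mu, nu = new_mu, new_nu
    # maximize beliefs: each u_i keeps the b best A_ik + incoming log-message
    return [set(sorted(range(n), key=lambda k: A[i][k] + nu[k][i],
                       reverse=True)[:b])
            for i in range(n)]

# b = 1: the unique best perfect matching pairs u_1-v_2 and u_2-v_1
assert bp_b_matching([[1.0, 2.0], [3.0, 0.0]], 1) == [{1}, {0}]
```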
41 Outline 1. Bipartite Weighted b-matching 2. Edge Weights As a Distribution 3. Efficient Max-Product 4. Convergence Proof Sketch 5. Experiments 6. Discussion
42 Convergence Proof Sketch
43 Convergence Proof Sketch
Assumptions:
- The optimal b-matching is unique.
- ε = the difference between the weight of the best and 2nd-best b-matchings; assumed constant.
- Weights treated as constants.
44-48 Convergence Proof Sketch
Basic mechanism: the unwrapped graph T.
1. Pick a root node.
2. Copy all its neighbors.
3. Continue, but don't backtrack.
4. Continue to depth d.
[Figure: G with vertices u_1, ..., u_4 and v_1, ..., v_4, and its unwrapped tree rooted at u_1: the root connects to copies of v_1, ..., v_4, each of which connects to copies of the remaining u's, and so on to depth d.]
49 Convergence Proof Sketch
Basic mechanism: the unwrapped graph T.
- The construction follows loopy BP messages in reverse.
- The true max-marginals of the root node are exactly its beliefs at iteration d.
- The max of the unwrapped graph's distribution is the maximum weight b-matching on the tree.
50 Convergence Proof Sketch Proof by contradiction: What happens if optimal b-matching on T differs from optimal b-matching on G at root?
51 Convergence Proof Sketch
Best b-matching on T vs. best b-matching on G copied onto T.
[Figure: the unwrapped graph T of G at 3 iterations, with the matching M_T highlighted. Leaf nodes cannot have perfect b-matchings, but all inner nodes and the root do.]
52-53 Convergence Proof Sketch
There exists at least one path on T that alternates between edges that are b-matched in each of the two b-matchings.
[Figure: the alternating path highlighted on the unwrapped tree.]
54-56 Convergence Proof Sketch
Claim: if the depth d is great enough, then replacing the blue edges of this path with the red edges in the optimal b-matching on T yields a new b-matching with greater weight.
[Figure: the alternating path on the unwrapped tree, with edges colored by which b-matching they belong to.]
57 Convergence Proof Sketch
Modifying the optimal b-matching produced a better b-matching, so the original assumption is contradicted: the two b-matchings cannot disagree at the root.
We can analyze the change in weight by looking only at the edges on the path.
58-59 Convergence Proof Sketch
Loopy BP converges to the true maximum weight b-matching in d iterations, where
d ≥ n max_{i,j} A_ij / ε = O(n),
with ε the difference between the weight of the best and 2nd-best b-matchings.
Running time of the full algorithm: O(bn³).
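Plugged into code, the bound is a one-liner (illustrative numbers; ε comes from the problem instance and is generally unknown in advance, so this is an analysis tool rather than a stopping rule):

```python
def iteration_bound(A, eps):
    """d >= n * max_ij A_ij / eps iterations suffice for convergence."""
    n = len(A)
    return n * max(max(row) for row in A) / eps

# n = 2, max weight 4, optimality gap eps = 2  ->  d = 4 iterations suffice
assert iteration_bound([[1.0, 2.0], [3.0, 4.0]], 2.0) == 4.0
```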
60 Outline 1. Bipartite Weighted b-matching 2. Edge Weights As a Distribution 3. Efficient Max-Product 4. Convergence Proof Sketch 5. Experiments 6. Discussion
61 Experiments
Running time comparison against the GOBLIN graph optimization library.
- Random weights.
- Varied graph size n (starting from 3).
- Varied b from 1 to n/2.
62 Experiments
[Plots: median running times of BP and GOBLIN vs. n and b, on a t^{1/3} scale; one panel shows b = ⌈n/2⌉.]
63 Experiments: Translated Test Data
On toy data, translation cripples kNN but b-matching makes no classification errors.
[Plots: accuracy vs. amount of translation and accuracy vs. k, for b-matching and k-nearest-neighbor.]
64 Experiments MNIST Digits with pseudo-translation - Image data with background changes is like translation. - Train on MNIST digits 3, 5, and 8. - Test on new examples with various bluescreen textures.
65 Experiments
[Plots: average accuracy per background texture and average accuracy vs. k, for b-matching (BM) and k-nearest-neighbor (KNN).]
66 Outline 1. Bipartite Weighted b-matching 2. Edge Weights As a Distribution 3. Efficient Max-Product 4. Convergence Proof Sketch 5. Experiments 6. Discussion
67 Discussion
Provably convergent belief propagation for a new type of graph (b-matchings).
+ Empirically faster than previous algorithms.
+ Parallelizable.
- Only the bipartite case.
- Requires a unique maximum.
Interesting theoretical results are coming out of sum-product for approximating marginals.
More informationLecture 8: PGM Inference
15 September 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I 1 Variable elimination Max-product Sum-product 2 LP Relaxations QP Relaxations 3 Marginal and MAP X1 X2 X3 X4
More informationIntelligent Systems Discriminative Learning, Neural Networks
Intelligent Systems Discriminative Learning, Neural Networks Carsten Rother, Dmitrij Schlesinger WS2014/2015, Outline 1. Discriminative learning 2. Neurons and linear classifiers: 1) Perceptron-Algorithm
More informationFractional Belief Propagation
Fractional Belief Propagation im iegerinck and Tom Heskes S, niversity of ijmegen Geert Grooteplein 21, 6525 EZ, ijmegen, the etherlands wimw,tom @snn.kun.nl Abstract e consider loopy belief propagation
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne
More informationCS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding
CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding Tim Roughgarden October 29, 2014 1 Preamble This lecture covers our final subtopic within the exact and approximate recovery part of the course.
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationExpectation propagation for symbol detection in large-scale MIMO communications
Expectation propagation for symbol detection in large-scale MIMO communications Pablo M. Olmos olmos@tsc.uc3m.es Joint work with Javier Céspedes (UC3M) Matilde Sánchez-Fernández (UC3M) and Fernando Pérez-Cruz
More informationLinear Response for Approximate Inference
Linear Response for Approximate Inference Max Welling Department of Computer Science University of Toronto Toronto M5S 3G4 Canada welling@cs.utoronto.ca Yee Whye Teh Computer Science Division University
More informationCourse 16:198:520: Introduction To Artificial Intelligence Lecture 9. Markov Networks. Abdeslam Boularias. Monday, October 14, 2015
Course 16:198:520: Introduction To Artificial Intelligence Lecture 9 Markov Networks Abdeslam Boularias Monday, October 14, 2015 1 / 58 Overview Bayesian networks, presented in the previous lecture, are
More information3 : Representation of Undirected GM
10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:
More information(i) The optimisation problem solved is 1 min
STATISTICAL LEARNING IN PRACTICE Part III / Lent 208 Example Sheet 3 (of 4) By Dr T. Wang You have the option to submit your answers to Questions and 4 to be marked. If you want your answers to be marked,
More informationWalk-Sum Interpretation and Analysis of Gaussian Belief Propagation
Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Jason K. Johnson, Dmitry M. Malioutov and Alan S. Willsky Department of Electrical Engineering and Computer Science Massachusetts Institute
More informationPart 4: Conditional Random Fields
Part 4: Conditional Random Fields Sebastian Nowozin and Christoph H. Lampert Colorado Springs, 25th June 2011 1 / 39 Problem (Probabilistic Learning) Let d(y x) be the (unknown) true conditional distribution.
More informationLecture 18 Generalized Belief Propagation and Free Energy Approximations
Lecture 18, Generalized Belief Propagation and Free Energy Approximations 1 Lecture 18 Generalized Belief Propagation and Free Energy Approximations In this lecture we talked about graphical models and
More informationOutline: Ensemble Learning. Ensemble Learning. The Wisdom of Crowds. The Wisdom of Crowds - Really? Crowd wiser than any individual
Outline: Ensemble Learning We will describe and investigate algorithms to Ensemble Learning Lecture 10, DD2431 Machine Learning A. Maki, J. Sullivan October 2014 train weak classifiers/regressors and how
More informationExpectation Propagation Algorithm
Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,
More informationConsensus Algorithms for Camera Sensor Networks. Roberto Tron Vision, Dynamics and Learning Lab Johns Hopkins University
Consensus Algorithms for Camera Sensor Networks Roberto Tron Vision, Dynamics and Learning Lab Johns Hopkins University Camera Sensor Networks Motes Small, battery powered Embedded camera Wireless interface
More informationStatistical Approaches to Learning and Discovery
Statistical Approaches to Learning and Discovery Graphical Models Zoubin Ghahramani & Teddy Seidenfeld zoubin@cs.cmu.edu & teddy@stat.cmu.edu CALD / CS / Statistics / Philosophy Carnegie Mellon University
More informationApproximate Message Passing
Approximate Message Passing Mohammad Emtiyaz Khan CS, UBC February 8, 2012 Abstract In this note, I summarize Sections 5.1 and 5.2 of Arian Maleki s PhD thesis. 1 Notation We denote scalars by small letters
More informationThe min cost flow problem Course notes for Search and Optimization Spring 2004
The min cost flow problem Course notes for Search and Optimization Spring 2004 Peter Bro Miltersen February 20, 2004 Version 1.3 1 Definition of the min cost flow problem We shall consider a generalization
More information5. Sum-product algorithm
Sum-product algorithm 5-1 5. Sum-product algorithm Elimination algorithm Sum-product algorithm on a line Sum-product algorithm on a tree Sum-product algorithm 5-2 Inference tasks on graphical models consider
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference IV: Variational Principle II Junming Yin Lecture 17, March 21, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3 X 3 Reading: X 4
More informationThe Ising model and Markov chain Monte Carlo
The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte
More informationFinal Exam, Machine Learning, Spring 2009
Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3
More informationBelief propagation decoding of quantum channels by passing quantum messages
Belief propagation decoding of quantum channels by passing quantum messages arxiv:67.4833 QIP 27 Joseph M. Renes lempelziv@flickr To do research in quantum information theory, pick a favorite text on classical
More informationHomomorphism Testing
Homomorphism Shpilka & Wigderson, 2006 Daniel Shahaf Tel-Aviv University January 2009 Table of Contents 1 2 3 1 / 26 Algebra Definitions 1 Algebra Definitions 2 / 26 Groups Algebra Definitions Definition
More informationAdaptive Crowdsourcing via EM with Prior
Adaptive Crowdsourcing via EM with Prior Peter Maginnis and Tanmay Gupta May, 205 In this work, we make two primary contributions: derivation of the EM update for the shifted and rescaled beta prior and
More informationProbabilistic and Bayesian Machine Learning
Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/
More informationApproximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract)
Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles
More informationLecture 3: The Shape Context
Lecture 3: The Shape Context Wesley Snyder, Ph.D. UWA, CSSE NCSU, ECE Lecture 3: The Shape Context p. 1/2 Giant Quokka Lecture 3: The Shape Context p. 2/2 The Shape Context Need for invariance Translation,
More informationEnergy minimization via graph-cuts
Energy minimization via graph-cuts Nikos Komodakis Ecole des Ponts ParisTech, LIGM Traitement de l information et vision artificielle Binary energy minimization We will first consider binary MRFs: Graph
More informationExpectation Propagation in Factor Graphs: A Tutorial
DRAFT: Version 0.1, 28 October 2005. Do not distribute. Expectation Propagation in Factor Graphs: A Tutorial Charles Sutton October 28, 2005 Abstract Expectation propagation is an important variational
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More information