Bounded Inference Non-iteratively; Mini-Bucket Elimination


1 Bounded Inference Non-iteratively; Mini-Bucket Elimination. COMPSCI 276, Spring 2018, Class 6: Rina Dechter. Reading: Class Notes (8), Darwiche chapter 14.

2 Outline: Mini-bucket elimination; Weighted mini-bucket; Mini-clustering; Re-parameterization, cost-shifting; Iterative belief propagation; Iterative join-graph propagation.

3 Types of queries: Max-inference, Sum-inference, and Mixed-inference (harder). All are NP-hard: exponentially many terms. We will focus on approximation algorithms. Anytime behavior: very fast and very approximate at first, then slower and more accurate.

4 Queries. Probability of evidence (or partition function): $P(e) = \sum_{X \setminus \mathrm{var}(e)} \prod_{i=1}^{n} P(x_i \mid pa_i)$; for Markov networks, $Z = \sum_X \prod_i f_i(x_{C_i})$. Posterior marginal (belief): $P(x_i \mid e) = P(x_i, e) / P(e)$. Most Probable Explanation: $x^* = \arg\max_x P(x, e)$.

5 Bucket Elimination. Query: $P(a, e=0)$; elimination order $d, e, b, c$. $P(a, e=0) = \sum_{c,b,e,d} P(a)\,P(c\mid a)\,P(b\mid a)\,P(d\mid a,b)\,P(e\mid b,c)$. Bucket D: $P(d\mid a,b)$ generates $f_D(a,b) = \sum_d P(d\mid a,b)$. Bucket E: $P(e\mid b,c)$ with $e=0$ generates $f_E(b,c) = P(e=0\mid b,c)$. Bucket B: $P(b\mid a), f_D(a,b), f_E(b,c)$ generates $f_B(a,c) = \sum_b P(b\mid a)\,f_D(a,b)\,f_E(b,c)$. Bucket C: $P(c\mid a), f_B(a,c)$ generates $f_C(a) = \sum_c P(c\mid a)\,f_B(a,c)$. Bucket A: $P(a, e=0) = P(a)\,f_C(a)$. The bucket tree over clusters D,A,B; E,B,C; B,A,C; C,A; A carries the original functions and the messages $f_D(a,b), f_E(b,c), f_B(a,c), f_C(a)$. Time and space: exp(w*).
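
To make the scheme above concrete, here is a minimal Python sketch of bucket elimination by summation. The Factor class, the helper names, and the toy tables in the usage example are illustrative assumptions, not the lecture's actual numbers.

```python
# Minimal bucket-elimination sketch (summation); tables below are made up.
import numpy as np
from functools import reduce

class Factor:
    """A table over named discrete variables."""
    def __init__(self, scope, table):
        self.scope, self.table = list(scope), np.asarray(table, dtype=float)

    def __mul__(self, other):
        scope = self.scope + [v for v in other.scope if v not in self.scope]
        def aligned(f):
            # append singleton axes for missing variables, then reorder
            t = f.table.reshape(f.table.shape + (1,) * (len(scope) - len(f.scope)))
            order = f.scope + [v for v in scope if v not in f.scope]
            return np.transpose(t, [order.index(v) for v in scope])
        return Factor(scope, aligned(self) * aligned(other))

    def sum_out(self, var):
        i = self.scope.index(var)
        return Factor(self.scope[:i] + self.scope[i+1:], self.table.sum(axis=i))

def bucket_elimination(factors, order):
    """Place each factor in the bucket of its first variable along `order`,
    then process buckets in order: multiply, sum out, pass the message down."""
    buckets = {v: [] for v in order}
    leftover = []                      # factors mentioning no ordered variable
    for f in factors:
        bucket_var = next((v for v in order if v in f.scope), None)
        (buckets[bucket_var] if bucket_var is not None else leftover).append(f)
    for k, v in enumerate(order):
        if not buckets[v]:
            continue
        msg = reduce(lambda a, b: a * b, buckets[v]).sum_out(v)
        dest = next((u for u in order[k+1:] if u in msg.scope), None)
        (buckets[dest] if dest is not None else leftover).append(msg)
    return reduce(lambda a, b: a * b, leftover)

# Usage on the slide's network A -> {B, C}, {A, B} -> D, {B, C} -> E,
# with E clamped to e = 0 (tables are made up):
Pa  = Factor(['A'], [0.6, 0.4])
Pb  = Factor(['A', 'B'], [[0.7, 0.3], [0.2, 0.8]])      # P(b | a)
Pc  = Factor(['A', 'C'], [[0.5, 0.5], [0.9, 0.1]])      # P(c | a)
Pd  = Factor(['A', 'B', 'D'], np.full((2, 2, 2), 0.5))  # P(d | a, b)
Pe0 = Factor(['B', 'C'], [[0.1, 0.4], [0.3, 0.9]])      # P(e=0 | b, c)
f = bucket_elimination([Pa, Pb, Pc, Pd, Pe0], order=['D', 'B', 'C'])
print(f.scope, f.table)   # f(A) = P(a, e=0)
```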

6 Finding the MPE: algorithm elim-mpe (Dechter 1996). $MPE = \max_x P(x)$; the $\sum$ operator is replaced by $\max$: $MPE = \max_{a,e,d,c,b} P(a)\,P(c\mid a)\,P(b\mid a)\,P(d\mid b,a)\,P(e\mid b,c)$. Bucket B: $P(b\mid a), P(d\mid b,a), P(e\mid b,c)$ generates $h_B(a,d,c,e) = \max_b P(b\mid a)\,P(d\mid b,a)\,P(e\mid b,c)$. Bucket C: $P(c\mid a), h_B(a,d,c,e)$ generates $h_C(a,d,e)$. Bucket D: $h_C(a,d,e)$ generates $h_D(a,e)$. Bucket E: $e=0, h_D(a,e)$ generates $h_E(a)$. Bucket A: $P(a), h_E(a)$. The elimination operator is max; $w^* = 4$ is the induced width (max clique size).

7 Generating the MPE-tuple, by assigning values in reverse order: 1. $a' = \arg\max_a P(a)\,h_E(a)$. 2. $e' = 0$. 3. $d' = \arg\max_d h_C(a', d, e')$. 4. $c' = \arg\max_c P(c\mid a')\,h_B(a', d', c, e')$. 5. $b' = \arg\max_b P(b\mid a')\,P(d'\mid b, a')\,P(e'\mid b, c')$. Return $(a', b', c', d', e')$.

8 Approximate Inference: metrics of evaluation. Absolute error: given $\epsilon > 0$ and a query $p = P(x \mid e)$, an estimate $r$ has absolute error $\epsilon$ iff $|p - r| < \epsilon$. Relative error: the ratio $r/p$ lies in $[1-\epsilon, 1+\epsilon]$. Dagum and Luby 1993: approximation up to a relative error is NP-hard. Absolute error is also NP-hard if the error is less than 0.5.

9 Outline: Mini-bucket elimination; Weighted mini-bucket; Mini-clustering; Re-parameterization, cost-shifting; Iterative belief propagation; Iterative join-graph propagation.

10 Mini-Buckets: Local Inference. Computation in a bucket is time and space exponential in the number of variables involved. Therefore, partition the functions in a bucket into mini-buckets over smaller numbers of variables.

11 Decomposition Bounds: upper and lower bounds via approximate problem decomposition. Example (MAP inference): $\max_X \big( f_1(X) + f_2(X) \big) \le \max_X f_1(X) + \max_X f_2(X)$. Relaxation: two copies of $X$, no longer required to be equal. The bound is tight (equality) if $f_1, f_2$ agree on the maximizing value of $X$.
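
A quick numeric check of this decomposition bound, with made-up tables:

```python
# Decomposition bound for MAP: max_x (f1 + f2) <= max f1 + max f2.
import numpy as np

f1 = np.array([1.0, 4.0, 2.0])   # f1(x) for x in {0, 1, 2} (made up)
f2 = np.array([3.0, 0.0, 2.5])   # f2(x)

exact = (f1 + f2).max()          # single copy of x: 4.5 at x = 2
bound = f1.max() + f2.max()      # two decoupled copies of x: 7.0
print(exact, bound)              # bound >= exact; equal iff argmaxes agree
```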

12 Mini-Bucket Approximation: split a bucket into mini-buckets to bound the complexity. For example, $\sum_X \prod_i h_i \le \big(\sum_X \prod_{h \in mb_1} h\big) \cdot \big(\max_X \prod_{h \in mb_2} h\big)$ yields an upper bound. Exponential complexity decrease: a bucket over $n$ variables split into mini-buckets over $r$ and $n-r$ variables costs $O(\exp r) + O(\exp(n-r))$ instead of $O(\exp n)$.

13 Mini-Bucket Elimination, mini-buckets [Dechter & Rish 2003]: each bucket (B, C, D, E, A) is split into mini-buckets and processed separately; U = upper bound.

14 Mini-Bucket Elimination, mini-buckets [Dechter & Rish 2003]: splitting bucket B produces the upper bound U. The process can be interpreted as duplicating variable B [Kask et al. 2001, Geffner et al. 2007, Choi et al. 2007, Johnson et al. 2007].

15 Semantics of Mini-Bucket: splitting a node. Variables in different mini-buckets are renamed and duplicated (Kask et al. 2001; Geffner et al. 2007; Choi, Chavira, Darwiche 2007). Before splitting: network N with exact value U. After splitting: relaxed network N' whose exact value $\hat U$ satisfies $U \le \hat U$.

16 Relaxed Network Example

17 MBE-mpe(i): algorithm MBE-mpe. Input: $i$, the max number of variables allowed in a mini-bucket. Output: [lower bound (probability of a suboptimal solution), upper bound]. Example, MBE-mpe(3) versus BE-mpe. MBE-mpe(3): bucket B splits into mini-buckets $\{f(a,b), f(b,c)\}$ and $\{f(b,d), f(b,e)\}$ (at most 3 variables each), sending $\lambda_{B\to C}(a,c)$ and $\lambda_{B\to D}(d,e)$; bucket C: $\lambda_{B\to C}(a,c), f(c,a), f(c,e)$ sends $\lambda_{C\to E}(a,e)$; bucket D: $f(a,d), \lambda_{B\to D}(d,e)$ sends $\lambda_{D\to E}(a,e)$; bucket E: $\lambda_{C\to E}(a,e), \lambda_{D\to E}(a,e)$ sends $\lambda_{E\to A}(a)$; bucket A: $f(a), \lambda_{E\to A}(a)$ yields U = upper bound, with width $w = 2$. BE-mpe keeps bucket B whole and sends $\lambda_{B\to C}(a,c,d,e)$, with $w^* = 4$. [Dechter and Rish, 1997]

18 Mini-Bucket Decoding (min-sum version). Assign greedily in reverse order using the functions in each bucket: $\hat a = \arg\min_a f(a) + \lambda_{E\to A}(a)$; $\hat e = \arg\min_e \lambda_{C\to E}(\hat a, e) + \lambda_{D\to E}(\hat a, e)$; $\hat d = \arg\min_d f(\hat a, d) + \lambda_{B\to D}(d, \hat e)$; $\hat c = \arg\min_c \lambda_{B\to C}(\hat a, c) + f(c, \hat a) + f(c, \hat e)$; $\hat b = \arg\min_b f(\hat a, b) + f(b, \hat c) + f(b, \hat d) + f(b, \hat e)$. The cost of the greedy configuration is an upper bound, and the mini-bucket value L is a lower bound. [Dechter and Rish, 2003]
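
A minimal sketch of this greedy decode, assuming each bucket stores its functions (originals plus incoming messages) as (scope, callable) pairs; the representation and the toy costs are assumptions for illustration:

```python
# Greedy mini-bucket decoding (min-sum), a sketch.
def greedy_decode(buckets, order, domains):
    """Assign variables in reverse elimination order, minimizing the sum of
    this bucket's functions given the earlier choices."""
    assignment = {}
    for var in reversed(order):               # e.g. order = ['B','C','D','E','A']
        best_val, best_cost = None, float('inf')
        for val in domains[var]:
            assignment[var] = val
            cost = sum(f(**{v: assignment[v] for v in scope})
                       for scope, f in buckets[var])
            if cost < best_cost:
                best_val, best_cost = val, cost
        assignment[var] = best_val
    return assignment

# Toy usage with two variables and one function per bucket (made-up costs):
buckets = {
    'A': [(('A',), lambda A: [1.0, 0.2][A])],
    'B': [(('A', 'B'), lambda A, B: [[0.0, 2.0], [3.0, 0.5]][A][B])],
}
print(greedy_decode(buckets, order=['B', 'A'],
                    domains={'A': [0, 1], 'B': [0, 1]}))
```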

19 (i,m)-partitionings

20 MBE(i,m), MBE(i). Input: belief network $(P_1, \ldots, P_n)$. Output: upper and lower bounds. Initialize: put functions in buckets along the ordering. Process each bucket from $p = n$ to $1$: create (i,m)-partitions, then process each mini-bucket. (For mpe: assign values along the ordering $d$.) Return the mpe-configuration and the upper and lower bounds.

21

22 Partitioning, Refinements

23 Properties of MBE(i). Complexity: O(r exp(i)) time and O(exp(i)) space. Yields a lower bound and an upper bound; accuracy is determined by the upper/lower (U/L) bound ratio. Possible uses of mini-bucket approximations: as anytime algorithms; as heuristics in search. Other tasks admit similar mini-bucket approximations: belief updating, marginal MAP, MEU, WCSP, Max-CSP. [Dechter and Rish, 1997], [Liu and Ihler, 2011], [Liu and Ihler, 2013]

24 Anytime Approximation

25 MBE for Belief Updating and for Probability of Evidence or Partition Function. The mini-bucket idea is the same: $\sum_X f(x)\,g(x) \le \big(\sum_X f(x)\big) \cdot \big(\max_X g(x)\big)$. So we can apply a sum in each mini-bucket, or better, one sum and the rest max, or min (for a lower bound). MBE-bel-max(i,m) and MBE-bel-min(i,m) generate upper and lower bounds on beliefs, approximating BE-bel. MBE-map(i,m): max mini-buckets are maximized, sum mini-buckets are sum-max; approximates BE-map.
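
A quick numeric check of the sum/max split that MBE-bel uses, on made-up tables:

```python
# Mini-bucket inequality for summation (made-up tables):
#   sum_x f(x) g(x)  <=  (sum_x f(x)) * (max_x g(x))
import numpy as np

f = np.array([0.2, 0.5, 0.3])
g = np.array([1.0, 4.0, 2.0])

exact = (f * g).sum()          # 0.2 + 2.0 + 0.6 = 2.8
upper = f.sum() * g.max()      # 1.0 * 4.0 = 4.0
lower = f.sum() * g.min()      # 1.0 * 1.0 = 1.0
print(lower, exact, upper)     # lower <= exact <= upper
```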

26

27 Methodology for Empirical Evaluation (for mpe). Measure U/L accuracy: the closer U/mpe or mpe/L is to 1, the better. Benchmarks: random networks. Given n, e, v, generate a random DAG; for each $x_i$ and its parents, generate a table from uniform [0,1], or noisy-OR. Create k instances; for each, generate random evidence or likely evidence. Measure averages.

28 Upper/Lower bounds on CPCS networks, medical diagnosis (noisy-OR model); test case: no evidence. The plots show anytime-mpe($\epsilon$) U/L error vs. time on cpcs360b and cpcs422b; a table compares elim-mpe with anytime-mpe($\epsilon = 10^{-1}$) and anytime-mpe($\epsilon = 10^{-4}$) by time (sec) and parameter i on cpcs360 and cpcs422.

29 Outline: Mini-bucket elimination; Weighted mini-bucket; Mini-clustering; Re-parameterization, cost-shifting; Iterative belief propagation; Iterative join-graph propagation.

30 The Power Sum and Hölder Inequality

31 Working Example. Model: Markov network over A, B, C. Task: partition function. (Qiang Liu slides)

32 Mini-Bucket (Basic Principles): upper bound and lower bound. (Qiang Liu slides)

33 Hölder Inequality: for weights $w_1, w_2 > 0$ with $w_1 + w_2 = 1$, $\sum_x f_1(x) f_2(x) \le \big(\sum_x f_1(x)^{1/w_1}\big)^{w_1} \big(\sum_x f_2(x)^{1/w_2}\big)^{w_2}$; equality is achieved when $f_1^{1/w_1}$ and $f_2^{1/w_2}$ are proportional. (Qiang Liu slides) G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, Cambridge Univ. Press, London and New York, 1934.

34 Reverse Hölder Inequality: if one weight is negative (e.g., $w_1 > 0$, $w_2 < 0$ with $w_1 + w_2 = 1$), the same expression holds but the direction of the inequality reverses, giving a lower bound. (Qiang Liu slides) G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, Cambridge Univ. Press, London and New York, 1934.

35 Mini-Bucket as Hölder Inequality: the max and min mini-bucket bounds (mbe-bel-max and mbe-bel-min) are the limiting cases of the weighted bounds. (Qiang Liu slides)

36 Weighted Mini-Bucket (for summation). Exact bucket elimination: $\lambda_B(a,c,d,e) = \sum_b f(a,b)\,f(b,c)\,f(b,d)\,f(b,e)$. Weighted mini-buckets: $\lambda_B \le \big({\textstyle\sum_b^{w_1}} f(a,b) f(b,c)\big) \cdot \big({\textstyle\sum_b^{w_2}} f(b,d) f(b,e)\big) = \lambda_{B\to C}(a,c)\,\lambda_{B\to D}(d,e)$, where $\sum_x^w f(x) = \big(\sum_x f(x)^{1/w}\big)^w$ is the weighted or power sum operator, and $\sum_x^w f_1(x) f_2(x) \le \big(\sum_x^{w_1} f_1(x)\big)\big(\sum_x^{w_2} f_2(x)\big)$ where $w_1 + w_2 = w$ and $w_1, w_2 > 0$ (a lower bound if $w_1 > 0, w_2 < 0$). Buckets C, D, E, A are processed as before; U = upper bound. [Liu and Ihler, 2011]
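
The power-sum operator and its Hölder bound are easy to check numerically; this sketch uses made-up tables and weights:

```python
# Weighted (power) sum and the Hölder bound from the slide:
#   sum_x f1 f2 <= wsum(f1, w1) * wsum(f2, w2)  when w1 + w2 = 1, w1, w2 > 0.
import numpy as np

def wsum(f, w):
    """Power sum: (sum_x f(x)^(1/w))^w. As w -> 0+ this tends to max_x f(x)."""
    return np.power(np.power(f, 1.0 / w).sum(), w)

f1 = np.array([0.5, 2.0, 1.0])   # made-up tables over one variable x
f2 = np.array([1.5, 0.3, 0.8])

exact = (f1 * f2).sum()
for w1 in (0.2, 0.5, 0.9):
    print(w1, wsum(f1, w1) * wsum(f2, 1.0 - w1))   # each is >= exact
print("exact:", exact)
```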

37

38 Choosing the weights (Cauchy-Schwarz inequality). (Qiang Liu slides)

39 Weighted Mini-Bucket for Marginal MAP

40 MB and WMB for Marginal MAP: $\max_Y \sum_{X \setminus Y} \prod_j P_j$. Sum buckets use weighted sums, e.g. $\lambda_{B\to C}(a,c) = \sum_b^{w_1} f(a,b) f(b,c)$ and $\lambda_{B\to D}(d,e) = \sum_b^{w_2} f(b,d) f(b,e)$ with $w_1 + w_2 = 1$; max buckets use max, e.g. $\lambda_{E\to A}(a) = \max_e \lambda_{C\to E}(a,e)\,\lambda_{D\to E}(a,e)$ and $U = \max_a f(a)\,\lambda_{E\to A}(a)$. One can optimize over cost-shifting and weights (single-pass moment matching or iterative message passing). U = upper bound. [Liu and Ihler, 2011; 2013] [Dechter and Rish, 2003]

41 MBE-map: process max buckets with max mini-buckets, and sum buckets with one sum mini-bucket and max mini-buckets for the rest.

42 Initial partitioning

43

44 Complexity and Tractability of MBE(i,m)

45 Outline: Mini-bucket elimination; Weighted mini-bucket; Mini-clustering; Re-parameterization, cost-shifting; Iterative belief propagation; Iterative join-graph propagation.

46 Join-Tree Clustering (Cluster-Tree Elimination). Clusters: 1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG, with separators BC, BF, EF. Messages: $h_{(1,2)}(b,c) = \sum_a p(a)\,p(b\mid a)\,p(c\mid a,b)$; $h_{(2,1)}(b,c) = \sum_{d,f} p(d\mid b)\,p(f\mid c,d)\,h_{(3,2)}(b,f)$; $h_{(2,3)}(b,f) = \sum_{c,d} p(d\mid b)\,p(f\mid c,d)\,h_{(1,2)}(b,c)$; $h_{(3,2)}(b,f) = \sum_e p(e\mid b,f)\,h_{(4,3)}(e,f)$; $h_{(3,4)}(e,f) = \sum_b p(e\mid b,f)\,h_{(2,3)}(b,f)$; $h_{(4,3)}(e,f) = p(G = g_e \mid e,f)$. EXACT algorithm; time and space exp(cluster size) = exp(treewidth).

47

48 Mini-Clustering, i-bound = 3. Cluster 1 = {A,B,C}: p(a), p(b|a), p(c|a,b). Cluster 2 = {B,C,D,F} splits into mini-clusters {B,C,D}: p(d|b), $h_{(1,2)}(b,c)$ and {C,D,F}: p(f|c,d). Cluster 3 = {B,E,F}: p(e|b,f), $h^1_{(2,3)}(b)$, $h^2_{(2,3)}(f)$. Cluster 4 = {E,F,G}: p(g|e,f). Messages: $h^1_{(1,2)}(b,c) = \sum_a p(a)\,p(b\mid a)\,p(c\mid a,b)$; $h^1_{(2,3)}(b) = \sum_{c,d} p(d\mid b)\,h_{(1,2)}(b,c)$; $h^2_{(2,3)}(f) = \max_{c,d} p(f\mid c,d)$. APPROXIMATE algorithm; time and space exp(i-bound), the max number of variables in a mini-cluster.

49 Mini-Clustering Example (clusters ABC, BCDF, BEF, EFG; separators BC, BF, EF). $H_{(1,2)} = \{ h^1_{(1,2)}(b,c) = \sum_a p(a)\,p(b\mid a)\,p(c\mid a,b) \}$. $H_{(2,1)} = \{ h^1_{(2,1)}(b) = \sum_{d,f} p(d\mid b)\,h^1_{(3,2)}(b,f),\ h^2_{(2,1)}(c) = \max_{d,f} p(f\mid c,d) \}$. $H_{(2,3)} = \{ h^1_{(2,3)}(b) = \sum_{c,d} p(d\mid b)\,h^1_{(1,2)}(b,c),\ h^2_{(2,3)}(f) = \max_{c,d} p(f\mid c,d) \}$. $H_{(3,2)} = \{ h^1_{(3,2)}(b,f) = \sum_e p(e\mid b,f)\,h^1_{(4,3)}(e,f) \}$. $H_{(3,4)} = \{ h^1_{(3,4)}(e,f) = \sum_b p(e\mid b,f)\,h^1_{(2,3)}(b)\,h^2_{(2,3)}(f) \}$. $H_{(4,3)} = \{ h^1_{(4,3)}(e,f) = p(G = g_e \mid e,f) \}$.
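
A sketch of one mini-clustering step from this example, computing the two messages in $H_{(2,3)}$ with numpy; the tables are random stand-ins:

```python
# Cluster 2 = {B,C,D,F} holds p(d|b), h_{(1,2)}(b,c), p(f|c,d); with i-bound 3
# it splits into {p(d|b), h_{(1,2)}} (summed) and {p(f|c,d)} (maximized).
import numpy as np

rng = np.random.default_rng(0)
p_d_b  = rng.random((2, 2))      # indexed [b, d]
h_12   = rng.random((2, 2))      # indexed [b, c]
p_f_cd = rng.random((2, 2, 2))   # indexed [c, d, f]

# First mini-cluster: sum out C and D -> message on B
h1_23_b = np.einsum('bd,bc->b', p_d_b, h_12)
# Second mini-cluster: max out C and D -> message on F (upper-bounding piece)
h2_23_f = p_f_cd.max(axis=(0, 1))

print(h1_23_b, h2_23_f)
```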

50 Cluster-Tree Elimination vs. Mini-Clustering. CTE sends one exact message per edge, e.g. $h_{(1,2)}(b,c)$, $h_{(2,1)}(b,c)$, $h_{(2,3)}(b,f)$, $h_{(3,2)}(b,f)$, $h_{(3,4)}(e,f)$, $h_{(4,3)}(e,f)$. MC sends sets of smaller messages, e.g. $H_{(2,1)} = \{ h^1_{(2,1)}(b), h^2_{(2,1)}(c) \}$ and $H_{(2,3)} = \{ h^1_{(2,3)}(b), h^2_{(2,3)}(f) \}$.

51 Linear Block Codes: input and parity bits A ... H are transmitted through a Gaussian channel with noise $\sigma$, producing the received values a ... h and $p_1, \ldots, p_6$.

52 Probabilistic decoding of an error-correcting linear block code. State of the art: the approximate algorithm iterative belief propagation (IBP), Pearl's poly-tree algorithm applied to loopy networks.

53 Iterative Belief Propagation. Belief propagation is exact for poly-trees; IBP applies BP iteratively to cyclic networks. One step: update $BEL(u_1)$ by combining the causal ($\pi$) and diagnostic ($\lambda$) messages from the neighbors $U_2, U_3, X_1, X_2$. No guarantees of convergence; works well for many coding networks.
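
For concreteness, here is a generic loopy BP sketch. The slide's example is a Bayesian network; this sketch uses a pairwise Markov network instead (an assumption for brevity), with made-up potentials on a 3-cycle:

```python
# Loopy (iterative) belief propagation on a pairwise Markov network.
import numpy as np

def loopy_bp(psi, edges, n, card=2, iters=30):
    """psi[(i, j)]: pairwise table indexed [x_i, x_j]; m[(i, j)] is the
    message from i to j, a function of x_j."""
    m = {(i, j): np.ones(card) for i, j in edges}
    m.update({(j, i): np.ones(card) for i, j in edges})
    for _ in range(iters):
        for (i, j) in list(m):
            # product of messages into i from neighbors other than j
            incoming = np.ones(card)
            for (k, l) in m:
                if l == i and k != j:
                    incoming = incoming * m[(k, l)]
            table = psi[(i, j)] if (i, j) in psi else psi[(j, i)].T
            new = (table * incoming[:, None]).sum(axis=0)   # sum out x_i
            m[(i, j)] = new / new.sum()
    bel = [np.ones(card) for _ in range(n)]                  # beliefs:
    for (k, l) in m:                                         # product of all
        bel[l] = bel[l] * m[(k, l)]                          # incoming msgs
    return [b / b.sum() for b in bel]

# A 3-cycle (loopy) example with made-up potentials:
rng = np.random.default_rng(1)
edges = [(0, 1), (1, 2), (0, 2)]
psi = {e: rng.random((2, 2)) for e in edges}
print(loopy_bp(psi, edges, n=3))
```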

54 MBE-mpe vs. IBP: IBP is better on low-w* codes; approx-mpe is better on randomly generated (high-w*) codes. Bit error rate (BER) as a function of noise (sigma).

55 Grid 15x15, evid=10, w*=22, 10 instances: NHD, absolute error, relative error, and time (seconds) vs. i-bound, comparing MC and IBP.

56 Heuristics for partitioning (Dechter and Rish, 2003; Rollon and Dechter, 2010). Scope-based Partitioning Heuristic (SCP): aims at minimizing the number of mini-buckets in the partition by including in each mini-bucket as many functions as possible as long as the i-bound is satisfied. Alternatively, use a greedy heuristic derived from a distance function to decide which functions go into a single mini-bucket.

57 Greedy Scope-based Partitioning

58 Heuristic for Partitioning. Scope-based Partitioning Heuristic: the scope-based partition heuristic (SCP) aims at minimizing the number of mini-buckets in the partition by including in each mini-bucket as many functions as possible as long as the i-bound is satisfied. First, single-function mini-buckets are ordered decreasingly by their arity, from left to right. Then each mini-bucket is absorbed into the left-most mini-bucket with which it can be merged. The time complexity of Partition(B, i), where B is the bucket to be partitioned and |B| the number of functions in the bucket, is O(|B| log|B| + |B|^2) using the SCP heuristic. The scope-based heuristic is quite fast; its shortcoming is that it does not consider the actual information in the functions.
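
A minimal sketch of this scope-based partitioning, exercised on the bucket-B scopes from the MBE-mpe(3) example; the function names are illustrative:

```python
# Scope-based partitioning (SCP) sketch: sort functions by arity descending,
# then absorb each into the left-most mini-bucket that stays within i_bound.
def scope_based_partition(functions, i_bound):
    """functions: list of (name, scope_set). Returns mini-buckets, each a
    pair [function_list, union_scope] with at most i_bound variables."""
    ordered = sorted(functions, key=lambda f: len(f[1]), reverse=True)
    minibuckets = []
    for f in ordered:
        for mb in minibuckets:
            merged = mb[1] | f[1]
            if len(merged) <= i_bound:      # left-most feasible merge
                mb[0].append(f)
                mb[1] = merged
                break
        else:
            minibuckets.append([[f], set(f[1])])
    return minibuckets

bucket_B = [('f_ab', {'a', 'b'}), ('f_bc', {'b', 'c'}),
            ('f_bd', {'b', 'd'}), ('f_be', {'b', 'e'})]
for funcs, scope in scope_based_partition(bucket_B, i_bound=3):
    print([name for name, _ in funcs], sorted(scope))
# -> two mini-buckets over 3 variables each: {a,b,c} and {b,d,e}
```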

59 Greedy Partition as a function of a distance function h

60 Outline: Mini-bucket elimination; Weighted mini-bucket; Mini-clustering; Iterative belief propagation; Iterative join-graph propagation; Re-parameterization, cost-shifting.

61 Iterative Belief Propagation. Belief propagation is exact for poly-trees; IBP applies BP iteratively to cyclic networks (one step updates $BEL(u_1)$ from the incoming $\pi$ and $\lambda$ messages). No guarantees of convergence; works well for many coding networks. Let's combine the iterative nature with anytime behavior: IJGP.

62 Iterative Join-Graph Propagation. Loopy Belief Propagation: cyclic graphs, iterative; converges fast in practice (no guarantees though); very good approximations (e.g., turbo decoding, LDPC codes, SAT survey propagation). Mini-Clustering(i): tree decompositions; only two sets of messages (inward, outward); anytime behavior, can improve with more time by increasing the i-bound. We want to combine the iterative virtues of loopy BP with the anytime behavior of Mini-Clustering(i).

63 IJGP, the basic idea: apply Cluster Tree Elimination to any join-graph. We commit to graphs that are I-maps, and avoid cycles as long as I-mapness is not violated. Result: use minimal arc-labeled join-graphs.

64 Minimal Arc-Labeled Decomposition: (a) a fragment of an arc-labeled join-graph (clusters ABCDE, BCE, CDEF with labels BC, CDE, CE); (b) the same fragment with labels shrunk to make it a minimal arc-labeled join-graph. Use a DFS algorithm to eliminate cycles relative to each variable.

65 Minimal arc-labeled join-graph

66 Message propagation on a join-graph with clusters 1 = {A,B,C,D,E} (holding p(a), p(c), p(b|a,c), p(d|a,b,e), p(e|b,c)), 2 = {C,D,E,F}, and 3 = {B,C,E}. Minimal arc-label: sep(1,2) = {D,E}, elim(1,2) = {A,B,C}: $h_{(1,2)}(d,e) = \sum_{a,b,c} p(a)\,p(c)\,p(b\mid a,c)\,p(d\mid a,b,e)\,p(e\mid b,c)\,h_{(3,1)}(b,c)$. Non-minimal arc-label: sep(1,2) = {C,D,E}, elim(1,2) = {A,B}: $h_{(1,2)}(c,d,e) = \sum_{a,b} p(a)\,p(c)\,p(b\mid a,c)\,p(d\mid a,b,e)\,p(e\mid b,c)\,h_{(3,1)}(b,c)$.

67 IJGP Example: a belief network over variables A through J and the corresponding loopy BP join-graph, with clusters ABC, ABDE, BCE, CDEF, FGH, FGI, GHIJ and arcs labeled by shared variables.

68 Arc-Minimal Join-Graph: in an arc-minimal join-graph, the arcs labeled with any single variable should form a TREE (same clusters as before, with some labels removed).

69 Collapsing Clusters: repeatedly merging adjacent clusters (e.g., ABC and ABDE into ABCDE, then FGH and FGI into FGHI) produces a sequence of coarser join-graphs, ending with clusters ABCDE, CDEF, FGHI, GHIJ.

70 Join-Graphs: a spectrum of decompositions of the same network, from small clusters with single-variable labels (less complexity) to larger clusters approaching a join-tree (more accuracy).

71 Bounded decompositions. We want arc-labeled decompositions such that: the cluster size (internal width) is bounded by i (the accuracy parameter), and the width of the decomposition as a graph (external width) is as small as possible. Possible approaches to building decompositions: partition-based algorithms, inspired by the mini-bucket decomposition; grouping-based algorithms.

72 Constructing Join-Graphs: (a) a schematic mini-bucket(i), i = 3, with buckets G: (GFE); E: (EBF) (EF); F: (FCD) (BF); D: (DB) (CD); C: (CAB) (CB); B: (BA) (AB) (B); A: (A); (b) the corresponding arc-labeled join-graph decomposition with clusters GFE [P(G|F,E)], EBF [P(E|B,F)], FCD [P(F|C,D)], CDB [P(D|B)], CAB [P(C|A,B)], BA [P(B|A)], and A [P(A)].

73 IJGP properties. IJGP(i) applies BP to a minimal arc-labeled join-graph whose cluster size is bounded by i. On join-trees, IJGP finds exact beliefs. IJGP is a Generalized Belief Propagation algorithm (Yedidia, Freeman, Weiss 2001). Complexity of one iteration: time $O(\mathrm{deg} \cdot (n+N) \cdot d^{i+1})$, space $O(N \cdot d^i)$, where deg is the maximum degree, n the number of variables, N the number of clusters, and d the maximum domain size.

74 Empirical evaluation. Algorithms: Exact, IBP, MC, IJGP. Networks (all variables are binary): random networks, grid networks (MxM), CPCS 54, 360, 422, coding networks. Measures: absolute error, relative error, Kullback-Leibler (KL) distance, bit error rate, time.

75 Coding Networks, Bit Error Rate: BER vs. i-bound for IJGP, MC, and IBP on coding networks (N = 400, w* = 43, 30 iterations) at noise levels σ = 0.22, 0.32, 0.51, and 0.65 (500-1000 instances each).

76 CPCS 422, KL Distance: KL distance vs. i-bound for IJGP (30 iterations, at convergence), MC, and IBP (10 iterations, at convergence); one instance, w* = 23, with evidence = 0 and evidence = 30.

77 CPCS 422, KL vs. Iterations: KL distance vs. number of iterations for IJGP(3), IJGP(10), and IBP; one instance, w* = 23, with evidence = 0 and evidence = 30.

78 Coding networks, Time: time (seconds) vs. i-bound for IJGP (30 iterations), MC, and IBP (30 iterations) on coding networks (N = 400, 500 instances, w* = 43).

79 Outline: Mini-bucket elimination; Weighted mini-bucket; Mini-clustering; Iterative belief propagation; Iterative join-graph propagation; Re-parameterization, cost-shifting.

80 Cost-Shifting (Reparameterization). Given f(A,B) and f(B,C), move a function λ(B) between them: f(A,B) + λ(B) and f(B,C) - λ(B). The combined function f(A,B,C) = f(A,B) + f(B,C) is unchanged: modify the individual functions, but keep the sum of functions the same.

81 Tightening the bound. Reparameterization (or cost shifting): decrease the bound without changing the overall function. $f_1(A,B) + f_2(B,C) = F(A,B,C) = \big(f_1(A,B) + \lambda(B)\big) + \big(f_2(B,C) - \lambda(B)\big)$. The adjusting functions cancel each other, and for a good choice of λ the decomposition bound becomes exact.
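
A small numeric illustration (made-up tables) that shifting λ(B) leaves the total function unchanged while tightening the decomposition bound; matching the min-marginals over B is one standard choice of λ, shown here as a sketch:

```python
# Reparameterization demo: bound(lam) = min f1+lam + min f2-lam <= min F.
import numpy as np

f1 = np.array([[3.0, 0.0], [1.0, 2.0]])   # f1[a, b] (made up)
f2 = np.array([[0.0, 4.0], [2.0, 1.0]])   # f2[b, c]

def bound(lam):
    return (f1 + lam[None, :]).min() + (f2 - lam[:, None]).min()

F = f1[:, :, None] + f2[None, :, :]        # total F(a, b, c), unchanged by lam
print("exact min:", F.min())
print("bound, lam=0:", bound(np.zeros(2)))
# choose lam to equalize the min-marginals of the two pieces over B:
m1 = f1.min(axis=0)                        # min over a, per value of b
m2 = f2.min(axis=1)                        # min over c, per value of b
lam = (m2 - m1) / 2.0
print("bound after shift:", bound(lam))    # tightened (here: equal to exact)
```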

82 Dual Decomposition: $f_{12}(x_1,x_2)$, $f_{13}(x_1,x_3)$, $f_{23}(x_2,x_3)$ over a triangle are split into independent pieces with duplicated variables. $F = \min_x \sum_\alpha f_\alpha(x_\alpha) \ge \sum_\alpha \min_{x_\alpha} f_\alpha(x_\alpha)$. Bound the solution using decomposed optimization: solving each piece independently gives an optimistic bound.

83 Dual Decomposition with reparameterization: add functions $\lambda^i_\alpha(x_i)$ to the pieces, subject to $\sum_\alpha \lambda^j_\alpha(x_j) = 0$ for every $j$, so the overall function is unchanged. $F = \min_x \sum_\alpha f_\alpha(x_\alpha) \ge \max_\lambda \sum_\alpha \min_{x_\alpha} \big( f_\alpha(x_\alpha) + \sum_{i \in \alpha} \lambda^i_\alpha(x_i) \big)$. Bound the solution using decomposed optimization; solve independently for an optimistic bound; tighten the bound by reparameterization, enforcing the lost equality constraints via Lagrange multipliers.

84 Dual Decomposition (continued). Many names for the same class of bounds: dual decomposition [Komodakis et al. 2007]; TRW, MPLP [Wainwright et al. 2005; Globerson & Jaakkola 2007]; soft arc consistency [Cooper & Schiex 2004]; max-sum diffusion [Werner 2007].

85 Dual Decomposition (continued). Many ways to optimize the bound: sub-gradient descent [Komodakis et al. 2007; Jojic et al. 2010]; coordinate descent [Werner 2007; Globerson & Jaakkola 2007; Sontag et al. 2009; Ihler et al. 2012]; proximal optimization [Ravikumar et al. 2010]; ADMM [Meshi & Globerson 2011; Martins et al. 2011; Forouzan & Ihler 2013].

86 Optimizing the bound. Can optimize the bound in various ways, e.g. by sub-gradient descent: repeatedly update $\lambda(B)$ in the decomposition $f_1(A,B) + \lambda(B)$, $f_2(B,C) - \lambda(B)$.
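
A minimal sketch of the sub-gradient update for this two-factor decomposition; the tables and the diminishing step size are made-up assumptions:

```python
# Sub-gradient ascent on the (concave) dual bound
#   bound(lam) = min_{a,b}[f1(a,b)+lam(b)] + min_{b,c}[f2(b,c)-lam(b)].
import numpy as np

f1 = np.array([[3.0, 0.0], [1.0, 2.0]])   # f1[a, b] (made up)
f2 = np.array([[0.0, 4.0], [2.0, 1.0]])   # f2[b, c]
lam, best = np.zeros(2), -np.inf

for t in range(50):
    g1 = f1 + lam[None, :]                 # piece 1 with its copy of B
    g2 = f2 - lam[:, None]                 # piece 2 with its copy of B
    best = max(best, g1.min() + g2.min())  # current lower bound on min F
    b1 = np.unravel_index(g1.argmin(), g1.shape)[1]   # piece 1's choice of b
    b2 = np.unravel_index(g2.argmin(), g2.shape)[0]   # piece 2's choice of b
    if b1 == b2:                           # the two copies of B agree
        break
    grad = np.zeros(2)
    grad[b1], grad[b2] = grad[b1] + 1.0, grad[b2] - 1.0   # supergradient
    lam += grad / (t + 1.0)                # diminishing step size

print("best bound:", best)                 # approaches min_{a,b,c} f1 + f2
```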


90 Various Update Schemes. Can use any decomposition updates (message passing, subgradient, augmented Lagrangian, etc.). FGLP: update the original factors. JGLP: update the clique functions of the join graph. MBE-MM: mini-bucket with moment matching, applying cost-shifting within each bucket only.

91 Factor-Graph Linear Programming (FGLP): update the original factors. Tighten all factors over $x_i$ simultaneously: compute the max-marginal of each factor over $x_i$, then shift each factor so that all of its max-marginals over $x_i$ equal their average.
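
A sketch of one FGLP-style update in log space for a single variable B shared by two factors; the tables are random stand-ins:

```python
# One FGLP-style max-marginal matching step (log space, made-up tables).
import numpy as np

rng = np.random.default_rng(2)
f1 = np.log(rng.random((2, 2)))       # log f1[a, b]
f2 = np.log(rng.random((2, 2)))       # log f2[b, c]

def upper_bound(f1, f2):
    return f1.max() + f2.max()        # decomposition bound on max_{abc} f1+f2

print("before:", upper_bound(f1, f2))
g1 = f1.max(axis=0)                   # max-marginal of f1 over B
g2 = f2.max(axis=1)                   # max-marginal of f2 over B
avg = (g1 + g2) / 2.0
f1 += (avg - g1)[None, :]             # shift each factor toward the average;
f2 += (avg - g2)[:, None]             # the total f1 + f2 is unchanged
print("after:", upper_bound(f1, f2))  # never larger; often strictly smaller
```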

92 FGLP (Factor Graph Linear Programming)

93

94 Factor Graph Linear Programming

95 Complexity of FGLP

96 Mini-Bucket as Decomposition [Ihler et al. 2012]. The downward pass can be viewed as cost shifting on a join graph (e.g., regions {A,B,C}, {A,C,E}, {B,D,E}, {A,D,E}, {A,E}, {A} with separators {A,C}, {B}, {D,E}, {A,E}, {A}). One can also do cost shifting within mini-buckets: join-graph message passing. Moment-matching version: one message exchange within each bucket during the downward sweep. The optimal bound is defined by the cliques ("regions") and the cost-shifting function scopes ("coordinates"). U = upper bound.

97 MBE-MM: MBE with moment matching. Bucket B splits into mini-buckets {P(b|a), P(d|a,b)} and {P(e|b,c)}; they exchange moment-matching messages $m_{11}, m_{12}$ before each is maximized over B, producing $h_B(a,d)$ and $h_B(c,e)$. Bucket C: P(c|a), $h_B(c,e)$ generates $h_C(a,e)$; bucket D: $h_B(a,d)$ generates $h_D(a)$; bucket E: e = 0, $h_C(a,e)$ generates $h_E(a)$; bucket A: P(a), $h_E(a)$, $h_D(a)$. w = 2. MPE* is an upper bound on MPE (U); generating a solution yields a lower bound (L).

98 MBE-MM (MBE with Moment-Matching)

99 Anytime Approximation. Can tighten the bound in various ways: cost-shifting (improve consistency between cliques) or increasing the i-bound (higher-order consistency). A simple moment-matching step improves the bound significantly.


102 Outline: Mini-bucket elimination; Weighted mini-bucket; Mini-clustering; Iterative belief propagation; Iterative join-graph propagation; Re-parameterization, cost-shifting.
