Belief Propagation on Partially Ordered Sets Robert J. McEliece California Institute of Technology

1 Belief Propagation on Partially Ordered Sets. Robert J. McEliece, California Institute of Technology. [Title figure, the Hasse diagram of a poset: top row {1,2,3} {1,3,4} {2,3,5} {3,4,5}; middle row {1,3} {2,3} {3,4} {3,5}; bottom element {3}, marked +1.] International Symposium on Mathematical Theory of Networks and Systems, University of Notre Dame, August 13, 2002. With much help from Muhammed Yildirim, Jonathan Harel, Jeremy Thorpe, and Ravi Palanki.

3 A Big Tip of the Hat to: Jonathan Yedidia, William Freeman, and Yair Weiss, the authors of Generalized Belief Propagation and Free Energy Minimization, which inspired this paper.

4 Outline of the Talk: The Problem Statement; Significance of the Problem; A Statistical Physics Approach; Solution Via Poset Belief Propagation; Experimental Results; Connection Between the Statistical Physics and the BP Approaches; Open Problems; Coffee Break.

13 A General Computational Problem. Variables {x_1, ..., x_n}, with x_i ∈ A = {0, 1, ..., q-1}. R = {R_1, ..., R_M}, a collection of subsets of {1, 2, ..., n}. A set of nonnegative local potentials {α_R(x_R) : R ∈ R}. Define the global (Boltzmann) probability density function B(x) = (1/Z) ∏_{R ∈ R} α_R(x_R), where Z = ∑_{x ∈ A^n} ∏_{R ∈ R} α_R(x_R) is the partition function and F = -ln Z is the Helmholtz free energy.

15 A General Computational Problem, Continued. Problem: Compute, exactly or approximately, the Helmholtz free energy F and some or all of the local marginal densities of the Boltzmann density, for R ∈ R: B_R(x_R) = ∑_{x_{R^c} ∈ A^{R^c}} B(x) = (1/Z) ∑_{x_{R^c} ∈ A^{R^c}} ∏_{S ∈ R} α_S(x_S).

16 Applications: Appropriately interpreted, this computational problem includes: Turbo/LDPC decoding. Probabilistic inference in Bayesian networks. Finite Fourier transforms. Free energy computations in statistical physics. (But we won't discuss these applications.)

24 A Simple Example. Alphabet: A = {0, 1}. Local domains: R = {{1,2,3}, {1,3,4}, {2,3,5}, {3,4,5}}. Local potentials: α_i(x, y, z) = 1/2 if x = y = z, and 0 otherwise, for i = 1, 2, 3, 4. Answers: Z = 1/8, F = ln 8. B(x_1, x_2, x_3, x_4, x_5) = 1/2 if x_1 = x_2 = ... = x_5, and 0 otherwise. B_i(x, y, z) = 1/2 if x = y = z, and 0 otherwise.
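
A brute-force check of these answers, as a minimal Python sketch (not part of the original slides), enumerates all 2^5 configurations:

import itertools, math

A = [0, 1]
domains = [(1, 2, 3), (1, 3, 4), (2, 3, 5), (3, 4, 5)]      # the local domains R
alpha = lambda t: 0.5 if t[0] == t[1] == t[2] else 0.0      # every local potential alpha_i

Z = 0.0
marg = {R: {} for R in domains}                  # unnormalized marginals B_R
for x in itertools.product(A, repeat=5):         # x = (x1, ..., x5)
    p = 1.0
    for R in domains:
        p *= alpha(tuple(x[i - 1] for i in R))
    Z += p
    for R in domains:
        key = tuple(x[i - 1] for i in R)
        marg[R][key] = marg[R].get(key, 0.0) + p

print("Z =", Z)                                  # 0.125
print("F = -ln Z =", -math.log(Z))               # ln 8 = 2.079...
for R in domains:                                # each marginal puts mass 1/2 on (0,0,0) and (1,1,1)
    print(R, {k: v / Z for k, v in marg[R].items() if v > 0})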

25 A Statistical Physics Approach. S = {s_1, ..., s_n}: n identical particles. Spin of s_i = x_i ∈ A = {0, 1, ..., q-1}. System configuration x = (x_1, x_2, ..., x_n). E(x_1, ..., x_n) = energy of configuration x (the Hamiltonian) = -∑_{R ∈ R} ln α_R(x_R).

29 A Statistical Physics Approach. b(x) = trial probability of configuration x. Average energy: U = ∑_{x ∈ A^n} b(x) E(x). Entropy: H = -∑_{x ∈ A^n} b(x) ln b(x). Variational free energy: F(b) = U - H.

33 A Famous Theorem from Statistical Mechanics. Theorem. F(b) ≥ F, with equality if and only if b(x) = B(x) = (1/Z) e^{-E(x)}, the Boltzmann-Gibbs, or equilibrium, density.

34 Proof of the Famous Theorem. A simple calculation shows that F(b) = D(b ‖ B) + F. Since the Kullback-Leibler divergence D(b ‖ B) is nonnegative and vanishes only when b = B, it follows that F(b) ≥ F, with equality iff b = B.
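
For completeness, the simple calculation (a reconstruction, not on the original slide), using B(x) = (1/Z) e^{-E(x)} and F = -ln Z:

F(b) = U - H = ∑_x b(x) E(x) + ∑_x b(x) ln b(x)
     = ∑_x b(x) ln [ b(x) / e^{-E(x)} ]
     = ∑_x b(x) ln [ b(x) / (Z B(x)) ]
     = ∑_x b(x) ln [ b(x) / B(x) ] - ln Z
     = D(b ‖ B) + F.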

36 An Important Corollary. Corollary. F = min_{b(x)} F(b) and B(x) = argmin_{b(x)} F(b). This suggests a possible method for computing F (and B(x)), but as it stands it's too complex, and anyway it doesn't yield the marginals B_R(x_R)...

37 A Solution Using Belief Propagation on a Partially Ordered Set (But What is a Partially Ordered Set?) A finite partially ordered set (poset) is a finite set P together with a binary relation, denoted ≤, which satisfies the following three axioms: 1. For all ρ ∈ P, ρ ≤ ρ (reflexivity). 2. If ρ ≤ σ and σ ≤ ρ, then ρ = σ (antisymmetry). 3. If ρ ≤ σ and σ ≤ τ, then ρ ≤ τ (transitivity).

40 Hasse Diagrams for Three Posets

41 Overcounting Numbers for Posets. We assign an overcounting number φ(ρ) to each ρ ∈ P, such that (1) ∑_{ρ : ρ ≥ σ} φ(ρ) = 1, for all σ ∈ P. The overcounting numbers φ(ρ) are integers and are determined uniquely by (1).
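
Condition (1) can be solved top-down, since each element's overcounting number is determined by those of its superiors. A minimal Python sketch (not from the slides), applied to the cluster-variation poset on the title slide; the dictionary `above`, listing each element's strict superiors, is an assumed encoding:

# Poset #3 (cluster variational method): each element mapped to its strict superiors.
above = {
    "123": [], "134": [], "235": [], "345": [],
    "13": ["123", "134"], "23": ["123", "235"],
    "34": ["134", "345"], "35": ["235", "345"],
    "3":  ["123", "134", "235", "345", "13", "23", "34", "35"],
}

phi = {}
# Elements with fewer strict superiors sit higher in the poset, so processing them
# first guarantees every superior's phi is already known.
for rho in sorted(above, key=lambda r: len(above[r])):
    phi[rho] = 1 - sum(phi[s] for s in above[rho])   # enforce condition (1)

print(phi)   # tops: +1, pairs: -1, "3": +1 (the +1 shown on the title slide)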

42 Some Overcounting Numbers

43 How to Distribute the Local Potentials in a Given Poset P. Step 1: Assign each variable x_i to one or more elements of P. If P_i = {ρ ∈ P : x_i is assigned to ρ}, then we require: P_i is connected; P_i is closed upward, i.e., if x_i is assigned to ρ it is also assigned to all of ρ's superiors; φ(P_i) = 1, i.e., the net number of appearances of x_i is 1. We denote by D(ρ) (the local domain at ρ) the set of variables which are assigned to ρ.

44 How to Distribute the Local Potentials in a Given Poset P. Step 2: Assign each local potential α_R(x_R) to one or more elements of P. If P_R = {ρ ∈ P : α_R(x_R) is assigned to ρ}, then we require: R ⊆ D(ρ) for all ρ ∈ P_R, i.e., the local domain at ρ supports α_R(x_R); P_R is connected; P_R is closed upward, i.e., if α_R is known to ρ it is also known to all of ρ's superiors; φ(P_R) = 1, i.e., the net number of appearances of α_R(x_R) is 1.

45 Example: Generalized Belief Propagation (YF&W). R = {{1,2}, {2,3}, ..., {8,9}}. Here is the poset:

46 The Trick is to Cluster the Variables. [Poset: top row {1,2,4,5} {2,3,5,6} {4,5,7,8} {5,6,8,9}; middle row {2,5} {4,5} {5,6} {5,8}; bottom element {5}, marked +1.]

47 The PBP Algorithm: The Messages. Throughout the algorithm, each edge e = (ρ, σ) of the Hasse diagram carries a message m_e. The message m_e on the edge e is a function on the domain of σ: m_e = m_e(x_σ). Example: for an edge from ρ with domain {x_1, x_2, x_4, x_6} to σ with domain {x_2, x_4, x_6}, the message is a table of values m_e(x_2, x_4, x_6), e.g. m_e(0,0,0) = 1, m_e(0,0,1) = 0, ..., m_e(1,1,1) = 0.

48 The PBP Algorithm: Calculating Beliefs. For a given set of messages {m_e : e ∈ E}, we define the belief at ρ as the following probability density on the domain at ρ: b_ρ(x_ρ) ∝ α_ρ(x_ρ) ∏_{e ∈ E(ρ)} m_e(x_{fin(e)}), where the normalization is such that ∑_{x_ρ} b_ρ(x_ρ) = 1. Here α_ρ(x_ρ) is the local potential at ρ, and E(ρ) represents the set of messages which are fused at ρ.

50 The Messages that are Fused at ρ. b_ρ(x_ρ) ∝ α_ρ(x_ρ) ∏_{e ∈ E(ρ)} m_e(x_{fin(e)}). The messages in E(ρ) are those that originate outside ρ's field of view but terminate inside it. [Figure: an edge e ∈ E(ρ) with initial vertex init(e) outside ρ's field of view and final vertex fin(e) inside it.]

51 The PBP Algorithm: Updating the Messages. An edge e = (ρ, σ) is said to be consistent with respect to a given set {m_e : e ∈ E} of messages if ∑_{x_ρ \ x_σ} b_ρ(x_ρ) = b_σ(x_σ) for all x_σ ∈ A^{D(σ)}. In words, this says that the belief at ρ in x_σ, obtained by marginalization, agrees with the belief at σ in x_σ.

52 Example of Edge Consistency. ρ has domain {x_1, x_2, x_4, x_6} and σ has domain {x_2, x_4, x_6}. The edge e = (ρ, σ) is consistent if ∑_{x_1 ∈ A} b_ρ(x_1, x_2, x_4, x_6) = b_σ(x_2, x_4, x_6).
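
A hedged numpy sketch of this check (not from the slides), with beliefs stored as arrays indexed by the variables of each domain, an encoding assumed here for illustration:

import numpy as np

q = 2                                  # binary alphabet A = {0, 1}
rng = np.random.default_rng(0)

b_rho = rng.random((q, q, q, q))       # belief on (x1, x2, x4, x6)
b_rho /= b_rho.sum()
b_sigma = b_rho.sum(axis=0)            # marginalize out x1 -> belief on (x2, x4, x6)

# The edge (rho, sigma) is consistent when the marginalized belief matches b_sigma.
print(np.allclose(b_rho.sum(axis=0), b_sigma))   # True by construction here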

54 The PBP Algorithm: The Update Rule. When the message m_e is updated, it is adjusted so that the edge e becomes consistent. Explicitly, m_e(x_σ) ∝ [ ∑_{x_ρ \ x_σ} α_{ρ\σ}(x_ρ) ∏_{g ∈ E(ρ)\E(σ)} m_g(x_{fin(g)}) ] / [ ∏_{f ∈ E(σ)\(E(ρ) ∪ {e})} m_f(x_{fin(f)}) ]. The PBP algorithm proceeds by updating messages according to this rule. The hope is that the messages will converge to a fixed point whose associated beliefs are good approximations to the desired marginals, i.e., b_ρ(x_ρ) ≈ B_ρ(x_ρ), where B(x) is the global or Boltzmann density.

56 How Well Does it Work? A series of experiments. Local domains: R = {{1,2,3}, {1,3,4}, {2,3,5}, {3,4,5}}. Local potentials: α_1(x_1,x_2,x_3), α_2(x_1,x_3,x_4), α_3(x_2,x_3,x_5), and α_4(x_3,x_4,x_5) selected at random from the 8-simplex. There are many posets that support this choice of domains; we will investigate only three.

61 Poset Number One (Junction Graph Construction) {1,2,3} {1,3,4} {2,3,5} {3,4,5} {1} {2,3} {3,4} {3,5}

62 Poset Number Two (Factor Graph Construction) {1,2,3} {1,3,4} {3,4,5} {2,3,5} {1} {2} {3} {4} {5}
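
On a two-level poset like this one, with the local domains on top and single variables below, the PBP update specializes (up to bookkeeping) to ordinary loopy sum-product BP on the factor graph. A minimal numpy sketch of that special case, mirroring the experiments above with random potentials; the seed, iteration count, and damping value w = 0.65 (damping is discussed later in the talk) are arbitrary choices, not the talk's code:

import itertools
import numpy as np

rng = np.random.default_rng(0)
q, n = 2, 5
domains = [(1, 2, 3), (1, 3, 4), (2, 3, 5), (3, 4, 5)]
alpha = {R: rng.random((q,) * len(R)) for R in domains}    # random positive potentials

# Exact marginals by brute force, for comparison.
joint = np.zeros((q,) * n)
for x in itertools.product(range(q), repeat=n):
    joint[x] = np.prod([alpha[R][tuple(x[i - 1] for i in R)] for R in domains])
joint /= joint.sum()

def exact_marginal(R):
    return joint.sum(axis=tuple(i for i in range(n) if (i + 1) not in R))

def lift(msg, R, j):
    """Reshape a length-q message on variable j so it broadcasts over the table of R."""
    return msg.reshape([q if v == j else 1 for v in R])

# Loopy BP messages (factor -> variable and variable -> factor), damped updates.
w = 0.65
m_f2v = {(R, i): np.full(q, 1.0 / q) for R in domains for i in R}
m_v2f = {(i, R): np.full(q, 1.0 / q) for R in domains for i in R}

for _ in range(200):
    for (i, R) in list(m_v2f):       # variable i -> factor R: product of the other factors' messages
        msg = np.ones(q)
        for Rp in domains:
            if i in Rp and Rp != R:
                msg *= m_f2v[(Rp, i)]
        msg /= msg.sum()
        new = m_v2f[(i, R)] ** (1 - w) * msg ** w
        m_v2f[(i, R)] = new / new.sum()
    for (R, i) in list(m_f2v):       # factor R -> variable i: sum out the other variables of R
        t = alpha[R].copy()
        for j in R:
            if j != i:
                t = t * lift(m_v2f[(j, R)], R, j)
        msg = t.sum(axis=tuple(ax for ax, j in enumerate(R) if j != i))
        msg /= msg.sum()
        new = m_f2v[(R, i)] ** (1 - w) * msg ** w
        m_f2v[(R, i)] = new / new.sum()

for R in domains:                    # beliefs at the top elements vs. exact marginals
    b = alpha[R].copy()
    for j in R:
        b = b * lift(m_v2f[(j, R)], R, j)
    b /= b.sum()
    print(R, "max |b - B_R| =", np.abs(b - exact_marginal(R)).max())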

63 Poset Number Three (Cluster Variational Method) {1,2,3} {1,3,4} {2,3,5} {3,4,5} {1,3} {2,3} {3,4} {3,5} {3} +1

64-72 Poset Number One, results. [Plots: PBP on Poset #1, {1,2,3} {1,3,4} {2,3,5} {3,4,5} over {1} {2,3} {3,4} {3,5}; beliefs from the iterative algorithm vs. exact marginals, shown for local kernels #1-#4 and for variables #1-#5.]

73 Poset Number Two (Factor Graph Construction) {1,2,3} {1,3,4} {3,4,5} {2,3,5} {1} {2} {3} {4} {5}

74-82 Poset Number Two, results. [Plots: PBP on Poset #2, {1,2,3} {1,3,4} {3,4,5} {2,3,5} over {1} {2} {3} {4} {5}; iterative vs. exact, for local kernels #1-#4 and variables #1-#5.]

83 Poset Number Three (Cluster Variational Method) {1,2,3} {1,3,4} {2,3,5} {3,4,5} {1,3} {2,3} {3,4} {3,5} {3} +1

84-87 Poset Number Three, results with damping. [Plots: PBP on Poset #3, local kernels #1-#4; 1000 runs, w = 0.65; iterative vs. exact.]

88 The Problem is Non-Convergence, and is Easily Repaired. y_{n+1} = F(y_n) (too aggressive); y_{n+1} = (y_n F(y_n))^{1/2} (slower, but surer); y_{n+1} = y_n^{1-w} F(y_n)^w for 0 < w < 1.

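A minimal sketch of the damped update (an illustration, not the talk's code): the raw map below oscillates, while the w = 0.65 mixture converges to the fixed point.

def damped_step(y, F, w):
    """One damped update y <- y^(1-w) * F(y)^w (w = 1 recovers the raw update)."""
    return y ** (1.0 - w) * F(y) ** w

# Toy fixed-point map F(y) = 4/y, fixed point y* = 2 (purely illustrative):
# the raw iteration oscillates 1, 4, 1, 4, ..., but the damped one converges.
F = lambda y: 4.0 / y
for w in (1.0, 0.65):
    y = 1.0
    for _ in range(60):
        y = damped_step(y, F, w)
    print("w =", w, "-> y after 60 iterations:", round(y, 6))   # 1.0 vs. ~2.0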

91 Poset Number Three. [Plot: performance of GGBP (PBP) on Poset #3 for 10/w iterations; vertical axis: Kullback-Leibler divergence, summed over experiments; horizontal axis: w, the coefficient of the old message in the update rule.]

92 Poset Number Three. [Plot: the same quantities for 1000/w iterations, with w restricted to [0.3, 0.6].]

93-101 Poset Number Three, results. [Plots: PBP on Poset #3; iterative vs. exact, for local kernels #1-#4 and variables #1-#5.]

102 Why Does It Work? No one really knows, but... The PBP algorithm can be viewed as an algorithm for minimizing a certain energy function. There is a one-to-one correspondence between the fixed points of PBP and the stationary points of this energy. (It appears that the attractive fixed points correspond to the local minima of the free energy.)

105 More Precisely: The Bethe-Kikuchi Approximation. We know F = min_{b(x)} F(b(x)) and B(x) = argmin_{b(x)} F(b(x)). We approximate F(b(x)) with something that depends only on the poset P and the marginals b_ρ(x_ρ) of b(x): F_P(b(x)) = ∑_{ρ ∈ P} φ(ρ) F_ρ(b_ρ(x_ρ)), where F_ρ(b_ρ(x_ρ)) is the local free energy at ρ, defined as ∑_{x_ρ} b_ρ(x_ρ) E_ρ(x_ρ) + ∑_{x_ρ} b_ρ(x_ρ) ln b_ρ(x_ρ).

107 An Analogy. [Plot: a black curve, the exact variational free energy F(b), and a colored curve, the approximation F_P.] The hope is that min_{b_ρ(x_ρ)} F_P ≈ min_{b(x)} F(b) = F.
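
As a concrete, hedged illustration of F_P, here is a small Python sketch that evaluates the Bethe-Kikuchi free energy on poset #3 using the exact marginals of the simple example from slide 24; the encodings (`regions` with their overcounting numbers, and local energies placed only at the top elements, where that example's potentials live) are assumptions made for illustration:

import numpy as np

# Poset #3 elements (as variable tuples) with their overcounting numbers phi.
regions = {(1, 2, 3): 1, (1, 3, 4): 1, (2, 3, 5): 1, (3, 4, 5): 1,
           (1, 3): -1, (2, 3): -1, (3, 4): -1, (3, 5): -1, (3,): 1}

def exact_marginal(vars_):
    """Marginal of the slide-24 example: mass 1/2 on all-zeros and 1/2 on all-ones."""
    b = np.zeros((2,) * len(vars_))
    b[(0,) * len(vars_)] = 0.5
    b[(1,) * len(vars_)] = 0.5
    return b

def local_energy(vars_, b):
    """E_rho = -ln alpha_rho; each alpha_i = 1/2 sits at a top (3-variable) element."""
    E = np.zeros_like(b)
    if len(vars_) == 3:
        E[b > 0] = np.log(2.0)     # -ln(1/2); only needed where b_rho > 0
    return E

F_P = 0.0
for vars_, phi in regions.items():
    b = exact_marginal(vars_)
    E = local_energy(vars_, b)
    U = float(np.sum(b * E))                             # local average energy
    negH = float(np.sum(b[b > 0] * np.log(b[b > 0])))    # minus the local entropy
    F_P += phi * (U + negH)

print(F_P, np.log(8.0))   # here both come out to ln 8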

108-110 The BK Approximation on Posets #1, #2, #3. [Plots, one per poset: variational free energy F_P vs. weighted difference from the convergence point, with the error indicated.]

111 Some Open Questions. What is the best poset for a given MPD problem? Prove (or find a counterexample to) the claim that the stable fixed points of PBP correspond to the local minima of the BK variational free energy. Can the PBP algorithm always be made to converge using the smoothing trick? What is the relationship between the BK approximate free energy and the exact (Helmholtz) free energy? Can other combinatorial optimization methods, e.g. simulated annealing, be used to minimize F_P, thereby leading to alternative BP algorithms?
