Belief Propagation on Partially Ordered Sets Robert J. McEliece California Institute of Technology

1 Belief Propagation on Partially Ordered Sets. Robert J. McEliece, California Institute of Technology. [Title figure, the Hasse diagram of a poset: top row {1,2,3} {1,3,4} {2,3,5} {3,4,5}; middle row {1,3} {2,3} {3,4} {3,5}; bottom element {3}, marked +1.] International Symposium on Mathematical Theory of Networks and Systems, University of Notre Dame, August 13, 2002. With much help from Muhammed Yildirim, Jonathan Harel, Jeremy Thorpe, and Ravi Palanki.

3 A Big Tip of the Hat to: Jonathan Yedidia, William Freeman, and Yair Weiss, the authors of Generalized Belief Propagation and Free Energy Minimization, which inspired this paper.

4 Outline of the Talk: The Problem Statement; Significance of the Problem; A Statistical Physics Approach; Solution Via Poset Belief Propagation; Experimental Results; Connection Between the Statistical Physics and the BP Approaches; Open Problems; Coffee Break.

13 A General Computational Problem. Variables {x_1, ..., x_n}, with x_i ∈ A = {0, 1, ..., q-1}. R = {R_1, ..., R_M}, a collection of subsets of {1, 2, ..., n}. A set of nonnegative local potentials {α_R(x_R) : R ∈ R}. Define the global (Boltzmann) probability density function B(x) = (1/Z) ∏_{R ∈ R} α_R(x_R), where Z = ∑_{x ∈ A^n} ∏_{R ∈ R} α_R(x_R) is the partition function and F = -ln Z is the Helmholtz free energy.

15 A General Computational Problem, Continued. Problem: Compute, exactly or approximately, the Helmholtz free energy F and some or all of the local marginal densities of the Boltzmann density, for R ∈ R: B_R(x_R) = ∑_{x_{R^c} ∈ A^{R^c}} B(x) = (1/Z) ∑_{x_{R^c} ∈ A^{R^c}} ∏_{S ∈ R} α_S(x_S).

16 Applications: Appropriately interpreted, this computational problem includes: Turbo/LDPC decoding. Probabilistic inference in Bayesian networks. Finite Fourier transforms. Free energy computations in statistical physics. (But we won't discuss these applications.)

24 A Simple Example. Alphabet: A = {0, 1}. Local domains: R = {{1,2,3}, {1,3,4}, {2,3,5}, {3,4,5}}. Local potentials: α_i(x, y, z) = 1/2 if x = y = z, and 0 otherwise, for i = 1, 2, 3, 4. Answers: Z = 1/8, F = ln 8. B(x_1, x_2, x_3, x_4, x_5) = 1/2 if x_1 = x_2 = ... = x_5, and 0 otherwise. B_i(x, y, z) = 1/2 if x = y = z, and 0 otherwise.
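
A brute-force check of these answers, as a minimal Python sketch (not part of the original slides), enumerates all 2^5 configurations:

import itertools, math

A = [0, 1]
domains = [(1, 2, 3), (1, 3, 4), (2, 3, 5), (3, 4, 5)]      # the local domains R
alpha = lambda t: 0.5 if t[0] == t[1] == t[2] else 0.0      # every local potential alpha_i

Z = 0.0
marg = {R: {} for R in domains}                  # unnormalized marginals B_R
for x in itertools.product(A, repeat=5):         # x = (x1, ..., x5)
    p = 1.0
    for R in domains:
        p *= alpha(tuple(x[i - 1] for i in R))
    Z += p
    for R in domains:
        key = tuple(x[i - 1] for i in R)
        marg[R][key] = marg[R].get(key, 0.0) + p

print("Z =", Z)                                  # 0.125
print("F = -ln Z =", -math.log(Z))               # ln 8 = 2.079...
for R in domains:                                # each marginal puts mass 1/2 on (0,0,0) and (1,1,1)
    print(R, {k: v / Z for k, v in marg[R].items() if v > 0})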

25 A Statistical Physics Approach. S = {s_1, ..., s_n}: n identical particles. Spin of s_i = x_i ∈ A = {0, 1, ..., q-1}. System configuration x = (x_1, x_2, ..., x_n). E(x_1, ..., x_n) = energy of configuration x (the Hamiltonian) = -∑_{R ∈ R} ln α_R(x_R).

29 A Statistical Physics Approach. b(x) = trial probability of configuration x. Average energy: U = ∑_{x ∈ A^n} b(x) E(x). Entropy: H = -∑_{x ∈ A^n} b(x) ln b(x). Variational free energy: F(b) = U - H.

33 A Famous Theorem from Statistical Mechanics. Theorem. F(b) ≥ F, with equality if and only if b(x) = B(x) = (1/Z) e^{-E(x)}, the Boltzmann-Gibbs, or equilibrium, density.

34 Proof of the Famous Theorem. A simple calculation shows that F(b) = D(b ‖ B) + F. Since the Kullback-Leibler divergence D(b ‖ B) is nonnegative and vanishes only when b = B, it follows that F(b) ≥ F, with equality iff b = B.
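
For completeness, the simple calculation (a reconstruction, not on the original slide), using B(x) = (1/Z) e^{-E(x)} and F = -ln Z:

F(b) = U - H = ∑_x b(x) E(x) + ∑_x b(x) ln b(x)
     = ∑_x b(x) ln [ b(x) / e^{-E(x)} ]
     = ∑_x b(x) ln [ b(x) / (Z B(x)) ]
     = ∑_x b(x) ln [ b(x) / B(x) ] - ln Z
     = D(b ‖ B) + F.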

36 An Important Corollary. Corollary. F = min_{b(x)} F(b) and B(x) = argmin_{b(x)} F(b). This suggests a possible method for computing F (and B(x)), but as it stands it's too complex, and anyway it doesn't yield the marginals B_R(x_R)...

37 A Solution Using Belief Propagation on a Partially Ordered Set (But What is a Partially Ordered Set?) A finite partially ordered set (poset) is a finite set P together with a binary relation, denoted ≤, which satisfies the following three axioms: 1. For all ρ ∈ P, ρ ≤ ρ (reflexivity). 2. If ρ ≤ σ and σ ≤ ρ, then ρ = σ (antisymmetry). 3. If ρ ≤ σ and σ ≤ τ, then ρ ≤ τ (transitivity).

40 Hasse Diagrams for Three Posets

41 Overcounting Numbers for Posets. We assign an overcounting number φ(ρ) to each ρ ∈ P, such that (1) ∑_{ρ : ρ ≥ σ} φ(ρ) = 1, for all σ ∈ P. The overcounting numbers φ(ρ) are integers and are determined uniquely by (1).
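
Condition (1) can be solved top-down, since each element's overcounting number is determined by those of its superiors. A minimal Python sketch (not from the slides), applied to the cluster-variation poset on the title slide; the dictionary `above`, listing each element's strict superiors, is an assumed encoding:

# Poset #3 (cluster variational method): each element mapped to its strict superiors.
above = {
    "123": [], "134": [], "235": [], "345": [],
    "13": ["123", "134"], "23": ["123", "235"],
    "34": ["134", "345"], "35": ["235", "345"],
    "3":  ["123", "134", "235", "345", "13", "23", "34", "35"],
}

phi = {}
# Elements with fewer strict superiors sit higher in the poset, so processing them
# first guarantees every superior's phi is already known.
for rho in sorted(above, key=lambda r: len(above[r])):
    phi[rho] = 1 - sum(phi[s] for s in above[rho])   # enforce condition (1)

print(phi)   # tops: +1, pairs: -1, "3": +1 (the +1 shown on the title slide)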

42 Some Overcounting Numbers

43 How to Distribute the Local Potentials in a Given Poset P. Step 1: Assign each variable x_i to one or more elements of P. If P_i = {ρ ∈ P : x_i is assigned to ρ}, then we require: P_i is connected; P_i is closed upward, i.e., if x_i is assigned to ρ it is also assigned to all of ρ's superiors; φ(P_i) = 1, i.e., the net number of appearances of x_i is 1. We denote by D(ρ) (the local domain at ρ) the set of variables which are assigned to ρ.

44 How to Distribute the Local Potentials in a Given Poset P. Step 2: Assign each local potential α_R(x_R) to one or more elements of P. If P_R = {ρ ∈ P : α_R(x_R) is assigned to ρ}, then we require: R ⊆ D(ρ) for all ρ ∈ P_R, i.e., the local domain at ρ supports α_R(x_R); P_R is connected; P_R is closed upward, i.e., if α_R is known to ρ it is also known to all of ρ's superiors; φ(P_R) = 1, i.e., the net number of appearances of α_R(x_R) is 1.

45 Example: Generalized Belief Propagation (YF&W). R = {{1,2}, {2,3}, ..., {8,9}}. Here is the poset:

46 The Trick is to Cluster the Variables. [Poset: top row {1,2,4,5} {2,3,5,6} {4,5,7,8} {5,6,8,9}; middle row {2,5} {4,5} {5,6} {5,8}; bottom element {5}, marked +1.]

47 The PBP Algorithm: The Messages. Throughout the algorithm, each edge e = (ρ, σ) of the Hasse diagram carries a message m_e. The message m_e on the edge e is a function on the domain of σ: m_e = m_e(x_σ). Example: for an edge from ρ with domain {x_1, x_2, x_4, x_6} to σ with domain {x_2, x_4, x_6}, the message is a table of values m_e(x_2, x_4, x_6), e.g. m_e(0,0,0) = 1, m_e(0,0,1) = 0, ..., m_e(1,1,1) = 0.

48 The PBP Algorithm: Calculating Beliefs. For a given set of messages {m_e : e ∈ E}, we define the belief at ρ as the following probability density on the domain at ρ: b_ρ(x_ρ) ∝ α_ρ(x_ρ) ∏_{e ∈ E(ρ)} m_e(x_{fin(e)}), where the normalization is such that ∑_{x_ρ} b_ρ(x_ρ) = 1. Here α_ρ(x_ρ) is the local potential at ρ, and E(ρ) represents the set of messages which are fused at ρ.

50 The Messages that are Fused at ρ. b_ρ(x_ρ) ∝ α_ρ(x_ρ) ∏_{e ∈ E(ρ)} m_e(x_{fin(e)}). The messages in E(ρ) are those that originate outside ρ's field of view but terminate inside it. [Figure: an edge e ∈ E(ρ) with initial vertex init(e) outside ρ's field of view and final vertex fin(e) inside it.]

51 The PBP Algorithm: Updating the Messages. An edge e = (ρ, σ) is said to be consistent with respect to a given set {m_e : e ∈ E} of messages if ∑_{x_ρ \ x_σ} b_ρ(x_ρ) = b_σ(x_σ) for all x_σ ∈ A^{D(σ)}. In words, this says that the belief at ρ in x_σ, obtained by marginalization, agrees with the belief at σ in x_σ.

52 Example of Edge Consistency. ρ has domain {x_1, x_2, x_4, x_6} and σ has domain {x_2, x_4, x_6}. The edge e = (ρ, σ) is consistent if ∑_{x_1 ∈ A} b_ρ(x_1, x_2, x_4, x_6) = b_σ(x_2, x_4, x_6).
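
A hedged numpy sketch of this check (not from the slides), with beliefs stored as arrays indexed by the variables of each domain, an encoding assumed here for illustration:

import numpy as np

q = 2                                  # binary alphabet A = {0, 1}
rng = np.random.default_rng(0)

b_rho = rng.random((q, q, q, q))       # belief on (x1, x2, x4, x6)
b_rho /= b_rho.sum()
b_sigma = b_rho.sum(axis=0)            # marginalize out x1 -> belief on (x2, x4, x6)

# The edge (rho, sigma) is consistent when the marginalized belief matches b_sigma.
print(np.allclose(b_rho.sum(axis=0), b_sigma))   # True by construction here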

54 The PBP Algorithm: The Update Rule. When the message m_e is updated, it is adjusted so that the edge e becomes consistent. Explicitly, m_e(x_σ) ∝ [ ∑_{x_ρ \ x_σ} α_{ρ\σ}(x_ρ) ∏_{g ∈ E(ρ)\E(σ)} m_g(x_{fin(g)}) ] / [ ∏_{f ∈ E(σ)\(E(ρ) ∪ {e})} m_f(x_{fin(f)}) ]. The PBP algorithm proceeds by updating messages according to this rule. The hope is that the messages will converge to a fixed point whose associated beliefs are good approximations to the desired marginals, i.e., b_ρ(x_ρ) ≈ B_ρ(x_ρ), where B(x) is the global or Boltzmann density.

56 How Well Does it Work? A series of experiments. Local domains: R = {{1,2,3}, {1,3,4}, {2,3,5}, {3,4,5}}. Local potentials: α_1(x_1,x_2,x_3), α_2(x_1,x_3,x_4), α_3(x_2,x_3,x_5), and α_4(x_3,x_4,x_5) selected at random from the 8-simplex. There are many posets that support this choice of domains; we will investigate only three.

61 Poset Number One (Junction Graph Construction) {1,2,3} {1,3,4} {2,3,5} {3,4,5} {1} {2,3} {3,4} {3,5}

62 Poset Number Two (Factor Graph Construction) {1,2,3} {1,3,4} {3,4,5} {2,3,5} {1} {2} {3} {4} {5}
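
On a two-level poset like this one, with the local domains on top and single variables below, the PBP update specializes (up to bookkeeping) to ordinary loopy sum-product BP on the factor graph. A minimal numpy sketch of that special case, mirroring the experiments above with random potentials; the seed, iteration count, and damping value w = 0.65 (damping is discussed later in the talk) are arbitrary choices, not the talk's code:

import itertools
import numpy as np

rng = np.random.default_rng(0)
q, n = 2, 5
domains = [(1, 2, 3), (1, 3, 4), (2, 3, 5), (3, 4, 5)]
alpha = {R: rng.random((q,) * len(R)) for R in domains}    # random positive potentials

# Exact marginals by brute force, for comparison.
joint = np.zeros((q,) * n)
for x in itertools.product(range(q), repeat=n):
    joint[x] = np.prod([alpha[R][tuple(x[i - 1] for i in R)] for R in domains])
joint /= joint.sum()

def exact_marginal(R):
    return joint.sum(axis=tuple(i for i in range(n) if (i + 1) not in R))

def lift(msg, R, j):
    """Reshape a length-q message on variable j so it broadcasts over the table of R."""
    return msg.reshape([q if v == j else 1 for v in R])

# Loopy BP messages (factor -> variable and variable -> factor), damped updates.
w = 0.65
m_f2v = {(R, i): np.full(q, 1.0 / q) for R in domains for i in R}
m_v2f = {(i, R): np.full(q, 1.0 / q) for R in domains for i in R}

for _ in range(200):
    for (i, R) in list(m_v2f):       # variable i -> factor R: product of the other factors' messages
        msg = np.ones(q)
        for Rp in domains:
            if i in Rp and Rp != R:
                msg *= m_f2v[(Rp, i)]
        msg /= msg.sum()
        new = m_v2f[(i, R)] ** (1 - w) * msg ** w
        m_v2f[(i, R)] = new / new.sum()
    for (R, i) in list(m_f2v):       # factor R -> variable i: sum out the other variables of R
        t = alpha[R].copy()
        for j in R:
            if j != i:
                t = t * lift(m_v2f[(j, R)], R, j)
        msg = t.sum(axis=tuple(ax for ax, j in enumerate(R) if j != i))
        msg /= msg.sum()
        new = m_f2v[(R, i)] ** (1 - w) * msg ** w
        m_f2v[(R, i)] = new / new.sum()

for R in domains:                    # beliefs at the top elements vs. exact marginals
    b = alpha[R].copy()
    for j in R:
        b = b * lift(m_v2f[(j, R)], R, j)
    b /= b.sum()
    print(R, "max |b - B_R| =", np.abs(b - exact_marginal(R)).max())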

63 Poset Number Three (Cluster Variational Method) {1,2,3} {1,3,4} {2,3,5} {3,4,5} {1,3} {2,3} {3,4} {3,5} {3} +1

64-72 Poset Number One, results. [Plots: PBP on Poset #1, {1,2,3} {1,3,4} {2,3,5} {3,4,5} over {1} {2,3} {3,4} {3,5}; beliefs from the iterative algorithm vs. exact marginals, shown for local kernels #1-#4 and for variables #1-#5.]

73 Poset Number Two (Factor Graph Construction) {1,2,3} {1,3,4} {3,4,5} {2,3,5} {1} {2} {3} {4} {5}

74-82 Poset Number Two, results. [Plots: PBP on Poset #2, {1,2,3} {1,3,4} {3,4,5} {2,3,5} over {1} {2} {3} {4} {5}; iterative vs. exact, for local kernels #1-#4 and variables #1-#5.]

83 Poset Number Three (Cluster Variational Method) {1,2,3} {1,3,4} {2,3,5} {3,4,5} {1,3} {2,3} {3,4} {3,5} {3} +1

84-87 Poset Number Three, results with damping. [Plots: PBP on Poset #3, local kernels #1-#4; 1000 runs, w = 0.65; iterative vs. exact.]

88 The Problem is Non-Convergence, and is Easily Repaired. y_{n+1} = F(y_n) (too aggressive); y_{n+1} = (y_n F(y_n))^{1/2} (slower, but surer); y_{n+1} = y_n^{1-w} F(y_n)^w for 0 < w < 1.

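A minimal sketch of the damped update (an illustration, not the talk's code): the raw map below oscillates, while the w = 0.65 mixture converges to the fixed point.

def damped_step(y, F, w):
    """One damped update y <- y^(1-w) * F(y)^w (w = 1 recovers the raw update)."""
    return y ** (1.0 - w) * F(y) ** w

# Toy fixed-point map F(y) = 4/y, fixed point y* = 2 (purely illustrative):
# the raw iteration oscillates 1, 4, 1, 4, ..., but the damped one converges.
F = lambda y: 4.0 / y
for w in (1.0, 0.65):
    y = 1.0
    for _ in range(60):
        y = damped_step(y, F, w)
    print("w =", w, "-> y after 60 iterations:", round(y, 6))   # 1.0 vs. ~2.0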

91 Poset Number Three. [Plot: performance of GGBP (PBP) on Poset #3 for 10/w iterations; vertical axis: Kullback-Leibler divergence, summed over experiments; horizontal axis: w, the coefficient of the old message in the update rule.]

92 Poset Number Three. [Plot: the same quantities for 1000/w iterations, with w restricted to [0.3, 0.6].]

93-101 Poset Number Three, results. [Plots: PBP on Poset #3; iterative vs. exact, for local kernels #1-#4 and variables #1-#5.]

102 Why Does It Work? No one really knows, but... The PBP algorithm can be viewed as an algorithm for minimizing a certain energy function. There is a one-to-one correspondence between the fixed points of PBP and the stationary points of this energy. (It appears that the attractive fixed points correspond to the local minima of the free energy.)

105 More Precisely: The Bethe-Kikuchi Approximation. We know F = min_{b(x)} F(b(x)) and B(x) = argmin_{b(x)} F(b(x)). We approximate F(b(x)) with something that depends only on the poset P and the marginals b_ρ(x_ρ) of b(x): F_P(b(x)) = ∑_{ρ ∈ P} φ(ρ) F_ρ(b_ρ(x_ρ)), where F_ρ(b_ρ(x_ρ)) is the local free energy at ρ, defined as ∑_{x_ρ} b_ρ(x_ρ) E_ρ(x_ρ) + ∑_{x_ρ} b_ρ(x_ρ) ln b_ρ(x_ρ).

107 An Analogy. [Plot: a black curve, the exact variational free energy F(b), and a colored curve, the approximation F_P.] The hope is that min_{b_ρ(x_ρ)} F_P ≈ min_{b(x)} F(b) = F.
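
As a concrete, hedged illustration of F_P, here is a small Python sketch that evaluates the Bethe-Kikuchi free energy on poset #3 using the exact marginals of the simple example from slide 24; the encodings (`regions` with their overcounting numbers, and local energies placed only at the top elements, where that example's potentials live) are assumptions made for illustration:

import numpy as np

# Poset #3 elements (as variable tuples) with their overcounting numbers phi.
regions = {(1, 2, 3): 1, (1, 3, 4): 1, (2, 3, 5): 1, (3, 4, 5): 1,
           (1, 3): -1, (2, 3): -1, (3, 4): -1, (3, 5): -1, (3,): 1}

def exact_marginal(vars_):
    """Marginal of the slide-24 example: mass 1/2 on all-zeros and 1/2 on all-ones."""
    b = np.zeros((2,) * len(vars_))
    b[(0,) * len(vars_)] = 0.5
    b[(1,) * len(vars_)] = 0.5
    return b

def local_energy(vars_, b):
    """E_rho = -ln alpha_rho; each alpha_i = 1/2 sits at a top (3-variable) element."""
    E = np.zeros_like(b)
    if len(vars_) == 3:
        E[b > 0] = np.log(2.0)     # -ln(1/2); only needed where b_rho > 0
    return E

F_P = 0.0
for vars_, phi in regions.items():
    b = exact_marginal(vars_)
    E = local_energy(vars_, b)
    U = float(np.sum(b * E))                             # local average energy
    negH = float(np.sum(b[b > 0] * np.log(b[b > 0])))    # minus the local entropy
    F_P += phi * (U + negH)

print(F_P, np.log(8.0))   # here both come out to ln 8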

108-110 The BK Approximation on Posets #1, #2, #3. [Plots, one per poset: variational free energy F_P vs. weighted difference from the convergence point, with the error indicated.]

111 Some Open Questions. What is the best poset for a given MPD problem? Prove (or find a counterexample to) the claim that the stable fixed points of PBP correspond to the local minima of the BK variational free energy. Can the PBP algorithm always be made to converge using the smoothing trick? What is the relationship between the BK approximate free energy and the exact (Helmholtz) free energy? Can other combinatorial optimization methods, e.g. simulated annealing, be used to minimize F_P, thereby leading to alternative BP algorithms?
