Message Passing Algorithms: A Success Looking for Theoreticians


Andrea Montanari, Stanford University. June 5, 2010.

What is this talk about? A couple of examples
A ∈ F_2^{m×n}, y ∈ F_2^n:
minimize d(x, y), subject to Ax = 0.

You should not hope for easy solutions...
One of the little boxes solves it for n = … in 10^{-6} secs.
Uses a message passing algorithm + a random sparse matrix A.

Do not worry! No more hardware diagrams in this talk!!!

Second example: Learning low-rank matrices
The Netflix dataset: M = users × movies matrix, with ≈ 10^8 revealed ratings and ≈ 10^6 queried ('?') entries.

A prize awarded for: RMSE < … ;-)

A popular cost function
C(X, Y) ≡ (1/2) Σ_{(i,j) ∈ Revealed} (M_{ij} − (XY^T)_{ij})^2 + (λ/2) ‖X‖_F^2 + (λ/2) ‖Y‖_F^2,
X ∈ R^{n×r}, Y ∈ R^{m×r}  [Srebro, Rennie, Jaakkola 2005]
Non-convex. Large (!) scale.
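For concreteness, a minimal alternating-least-squares sketch (mine, not one of the talk's algorithms) for minimizing this cost, assuming the revealed entries are given as a boolean mask:

```python
import numpy as np

def als_pass(M, revealed, X, Y, lam):
    """One alternating-least-squares pass on C(X, Y): each factor update is a
    ridge regression over the revealed entries, with the other factor fixed.

    M        : n x m ratings matrix (values outside `revealed` are ignored)
    revealed : n x m boolean mask of observed entries
    X        : n x r, Y : m x r current factors
    """
    n, r = X.shape
    m = Y.shape[0]
    for i in range(n):                       # update rows of X, Y fixed
        idx = revealed[i]
        Yi = Y[idx]
        X[i] = np.linalg.solve(Yi.T @ Yi + lam * np.eye(r), Yi.T @ M[i, idx])
    for j in range(m):                       # update rows of Y, X fixed
        idx = revealed[:, j]
        Xj = X[idx]
        Y[j] = np.linalg.solve(Xj.T @ Xj + lam * np.eye(r), Xj.T @ M[idx, j])
    return X, Y
```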

Three algorithms
[figure: RMSE vs. iterations for Message Passing, Alt. Min., and OptSpace]
OptSpace = gradient descent [Keshavan-Montanari-Oh 2009]
Alternating Least Squares [Koren-Bell 2008]
Message Passing [Keshavan-Montanari 2010]

Examples everywhere! Machine learning and AI. Coding and communications. Large random structures, statistical mechanics, ...

Outline
1. k-SAT: A (Very) Simple Algorithm
2. Taking it seriously
3. Of trees and loops
4. Beyond uniqueness?
5. A recent application

k-SAT: A (Very) Simple Algorithm

k-SAT
x = (x_1, ..., x_n) ∈ {0, 1}^n
Instance (k = 3): (x_1 ∨ x_4 ∨ x_6) ∧ (x_7 ∨ x_8 ∨ x_10) ∧ (x_12 ∨ x_17 ∨ x_19), with some literals negated.

Broder-Frieze-Upfal (1993)
Pure Literal
1: Repeat:
2:   Find a pure literal x_i;
3:   Fix x_i;
x_i is a pure literal if it never appears negated, or if it only appears negated.
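A minimal sketch (mine) of the Pure Literal loop; clauses are lists of signed integers, +i for x_i and -i for its negation:

```python
def pure_literal(clauses):
    """Run Pure Literal: returns (assignment, leftover clauses);
    leftover == [] means a satisfying partial assignment was found."""
    assignment = {}
    while True:
        signs = {}
        for clause in clauses:
            for lit in clause:
                signs.setdefault(abs(lit), set()).add(lit > 0)
        pure = next((v for v, s in signs.items() if len(s) == 1), None)
        if pure is None:
            return assignment, clauses
        value = signs[pure].pop()        # True iff x_pure occurs only positively
        assignment[pure] = value
        sat = pure if value else -pure   # the literal that is now satisfied
        # Fixing x_pure satisfies (and removes) every clause containing it.
        clauses = [c for c in clauses if sat not in c]
```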

Broder-Frieze-Upfal (1993)
Random k-SAT: uniformly random formula with n variables and nα clauses.
Analysis: I. Markov chain in a reduced state space. II. ODE method.

Message passing: 1. Factor graph
[figure: bipartite factor graph on variables x_1, ..., x_8 and the four clauses below]
(x_1 ∨ x_2 ∨ x_3) ∧ (x_3 ∨ x_4 ∨ x_6) ∧ (x_3 ∨ x_5 ∨ x_6) ∧ (x_6 ∨ x_7 ∨ x_8)
[Labeled bipartite graph]

Message passing: 2. Messages
[figure: ν^{(t)}_{i→a} along edge (i, a) from variable to clause; ν̂^{(t)}_{a→i} from clause to variable]
ν^{(t)}_{i→a}, ν̂^{(t)}_{a→i} ∈ {free, cons}.

Message passing: 3. Update rules
s_{jb} = label on edge (j, b) (the sign of x_j in clause b)

ν^{(t+1)}_{i→a} = cons if s_{ib} ≠ s_{ia} and ν̂^{(t)}_{b→i} = cons for some b ∈ ∂i \ a; free otherwise.
ν̂^{(t)}_{a→i} = free if ν^{(t)}_{j→a} = free for some j ∈ ∂a \ i; cons otherwise.

Equivalent to Pure Literal!
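A sketch (my encoding) of one round of these updates; ∂i and ∂a are stored as adjacency lists and the edge labels s as a dict:

```python
def mp_round(var_clauses, clause_vars, sign, nu):
    """Compute nu_hat^(t) from nu^(t), then return nu^(t+1).

    var_clauses[i] : clauses b containing variable i   (the set ∂i)
    clause_vars[a] : variables j in clause a           (the set ∂a)
    sign[(i, a)]   : edge label s_ia (+1 or -1)
    nu[(i, a)]     : 'free' or 'cons'
    """
    nu_hat = {}
    for a, vars_a in clause_vars.items():
        for i in vars_a:
            nu_hat[(a, i)] = ('free' if any(nu[(j, a)] == 'free'
                                            for j in vars_a if j != i)
                              else 'cons')
    new_nu = {}
    for i, cls_i in var_clauses.items():
        for a in cls_i:
            new_nu[(i, a)] = ('cons' if any(sign[(i, b)] != sign[(i, a)]
                                            and nu_hat[(b, i)] == 'cons'
                                            for b in cls_i if b != a)
                              else 'free')
    return new_nu
```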

Message passing: 4. Analysis (density evolution)
φ_t = P{ν^{(t)}_{j→a} = cons},  φ̂_t = P{ν̂^{(t)}_{a→i} = cons}
φ_{t+1} = 1 − E{(1 − φ̂_t)^d} = 1 − e^{−kα φ̂_t / 2},  φ̂_t = φ_t^{k−1}.

Message passing: 4. Analysis
φ_{t+1} = 1 − exp{−kα φ_t^{k−1}/2} ≡ f(φ_t)
[figure: f(φ) vs. φ, for α > α_pl(k) and α < α_pl(k)]
α_pl(k) = sup{ α : 1 − e^{−kα x^{k−1}/2} ≤ x  ∀ x ∈ [0, 1] }

Theorem (Broder-Frieze-Upfal 93, Molloy 04)
Pure Literal finds a solution whp if α < α_pl and fails whp if α > α_pl.
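A small numeric sketch (mine) that iterates this density evolution and bisects for α_pl(k); it reproduces α_pl(4) ≈ 1.54 from the "4-SAT context" slide below:

```python
import math

def density_evolution(alpha, k, iters=50_000):
    """Iterate phi_{t+1} = 1 - exp(-k*alpha*phi_t^(k-1)/2) from phi_0 = 1."""
    phi = 1.0
    for _ in range(iters):
        phi = 1.0 - math.exp(-k * alpha * phi ** (k - 1) / 2)
    return phi

def alpha_pl(k, tol=1e-4):
    """Bisect for the largest alpha at which the iteration collapses to 0."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if density_evolution(mid, k) < 1e-6:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(alpha_pl(3), alpha_pl(4))   # roughly 1.64 and 1.54
```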

This is a proof because...
B_I(t) = ball of radius t around a uniformly random I ∈ [n];  T = some random rooted tree;  T(t) = its first t generations.

Definition (Benjamini-Schramm 1996, Aldous-Steele 2003)
The sequence of (factor) graphs G_n = (V_n = [n], E_n) converges locally to T if, for any t, B_I(t) converges in distribution to T(t).

Example
Lemma. Random k-SAT instances converge locally to Poisson Galton-Watson trees (branching Poisson(kα/2)).
[In fact a little bit more is needed...]
The analysis generalizes to other ensembles.
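A sketch (mine) of sampling the first t generations of this limiting tree, reading the figure's labels as Poisson(kα/2) clause children per variable node and omitting edge signs:

```python
import numpy as np

def sample_tree(k, alpha, t, rng=None):
    """First t generations of the limiting factor tree: each variable node
    spawns Poisson(k*alpha/2) clause children, each clause spawns k-1
    variable children. Returns nested ('var'/'clause', children) tuples."""
    rng = rng or np.random.default_rng()
    def variable(depth):
        if depth >= t:
            return ('var', [])
        return ('var', [clause(depth)
                        for _ in range(rng.poisson(k * alpha / 2))])
    def clause(depth):
        return ('clause', [variable(depth + 1) for _ in range(k - 1)])
    return variable(0)
```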

A parenthesis: Generalizations are useful
LDPC codes [Gallager 1966, Luby et al. 2001]
Code = { x ∈ {0, 1}^n : Ax = 0 mod 2 }
A = adjacency matrix of a sparse (pseudo)random graph
Luby et al. 2001: random graph with degree distributions (λ, ρ); optimization over (λ, ρ).

Relation with Gibbs measures
µ_T(x) = uniform measure over the solutions of T(∞)

Relation with Gibbs measures
What does 'uniform' mean???
Definition (DLR)
µ_T is uniform (Gibbs) if µ_T(x_{T(t)} | x_{∂T(t)}) is uniform for all t.

Relation with Gibbs measures
Lemma (Montanari-Shah 2010)
µ_T is unique if and only if α < α_u(k), with α_u(k) = (2 log k / k) {1 + o_k(1)}.
In particular α_pl(k) = α_u(k) + O(k^{−2}).

More concretely
sup_{x, x′} | µ_T(x_ø = 1 | x_{∂T(t)}) − µ_T(x_ø = 1 | x′_{∂T(t)}) | ≤ A e^{−b t}.

Relation with MCMC
To avoid pathologies: µ_{n,β}(x) ≡ (1/Z_{n,β}) exp{ −β #{clauses violated by x} }
Conjecture
The heat bath/Glauber/Gibbs sampler has τ_mix = O(n^C) whp for α < α_u(k). [Mossel-Sly 201?]

Taking it seriously

How would you do it in your dreams? I would use
Marginal(i ∈ [n])
1: …;
2: …;
3: …;
4: Return µ(x_i = 1).
µ(x_i = 1) = fraction of solutions in which x_i = 1

How would you do it in your dreams?
Solver
1: Repeat:
2:   Choose x_i;
3:   µ(x_i = 1) = Marginal(i);
4:   Fix x_i = 1 with probability µ(x_i = 1);
5:   x_i = 0 otherwise;
Samples a solution uniformly.

Message passing implementation of Marginal(i)
[figure: the factor graph of (x_1 ∨ x_2 ∨ x_3) ∧ (x_3 ∨ x_4 ∨ x_6) ∧ (x_3 ∨ x_5 ∨ x_6) ∧ (x_6 ∨ x_7 ∨ x_8)]

Message passing implementation of Marginal(i)
ν^{(t)}_{i→a}, ν̂^{(t)}_{a→i} ∈ M({0, 1}) (the simplex of probability measures on {0, 1}).
ν^{(t)}_{i→a} = (ν^{(t)}_{i→a}(0), ν^{(t)}_{i→a}(1))

Message passing implementation of Marginal(i)
s_{jb} = label on edge (j, b)

ν^{(t+1)}_{i→a}(x_i) ∝ ∏_{b ∈ ∂i \ a} ν̂^{(t)}_{b→i}(x_i)

ν̂^{(t)}_{a→i}(x_i) ∝ 1 − ∏_{j ∈ ∂a \ i} ν^{(t)}_{j→a}(x_j = s_{ja}) if x_i = s_{ia};  1 otherwise.

[Belief propagation]
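A sketch (mine) of one synchronous round of these BP equations; the clause encoding {i: s_ia}, with s_ia the value of x_i that does not satisfy clause a, is my own convention:

```python
import numpy as np

def bp_round(clauses, nu):
    """One synchronous BP round for k-SAT. clauses[a] is a dict {i: s_ia};
    nu[(i, a)] is a length-2 array over x_i. Returns (new_nu, nu_hat)."""
    var_clauses = {}
    for a, cl in enumerate(clauses):
        for i in cl:
            var_clauses.setdefault(i, []).append(a)
    nu_hat = {}
    for a, cl in enumerate(clauses):
        for i in cl:
            prod_bad = np.prod([nu[(j, a)][s] for j, s in cl.items() if j != i])
            m = np.ones(2)
            m[cl[i]] = 1.0 - prod_bad     # x_i = s_ia: others must rescue a
            nu_hat[(a, i)] = m / m.sum()
    new_nu = {}
    for (i, a) in nu:
        m = np.ones(2)
        for b in var_clauses[i]:
            if b != a:
                m *= nu_hat[(b, i)]
        new_nu[(i, a)] = m / m.sum()
    return new_nu, nu_hat
```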

What are these messy equations?
Clause a: (x_1 ∨ x_2 ∨ x_3)
N_G(x_A = ·) = number of solutions such that x_A = ·,
µ_G(x_i = 1) = N_G(x_i = 1) / (N_G(x_i = 0) + N_G(x_i = 1)).

What are these messy equations?
Clause a: (x_1 ∨ x_2 ∨ x_3)
N_G(x_1 = 1) = N_{G\a}(x_2 = 0, x_3 = 0) + N_{G\a}(x_2 = 0, x_3 = 1) + N_{G\a}(x_2 = 1, x_3 = 0) + N_{G\a}(x_2 = 1, x_3 = 1)
N_G(x_1 = 0) = N_{G\a}(x_2 = 0, x_3 = 1) + N_{G\a}(x_2 = 1, x_3 = 0) + N_{G\a}(x_2 = 1, x_3 = 1) = N_G(x_1 = 1) − N_{G\a}(x_2 = 0, x_3 = 0)

What are these messy equations?
Clause a: (x_1 ∨ x_2 ∨ x_3)
N_G(x_1 = 0) / N_G(x_1 = 1) = µ_G(x_1 = 0) / µ_G(x_1 = 1) = 1 − µ_{G\a}(x_2 = 0, x_3 = 0) ≈ 1 − µ_{G\a}(x_2 = 0) µ_{G\a}(x_3 = 0).

BP-guided decimation
1: Repeat:
2:   Choose x_i;
3:   Compute µ(x_i = 1) using BP;
4:   Fix x_i = 1 with probability µ(x_i = 1);
5:   x_i = 0 otherwise;
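A sketch (mine) of this decimation loop, reusing bp_round from the previous sketch; the simplification step and failure handling (empty clauses) are kept minimal:

```python
import random
import numpy as np

def bp_marginal(clauses, i, rounds=100):
    """Approximate mu(x_i = 1) by iterating bp_round and multiplying the
    incoming clause-to-variable messages at i."""
    nu = {(j, a): np.full(2, 0.5) for a, cl in enumerate(clauses) for j in cl}
    for _ in range(rounds):
        nu, nu_hat = bp_round(clauses, nu)
    m = np.ones(2)
    for (a, j), msg in nu_hat.items():
        if j == i:
            m *= msg
    return m[1] / m.sum()

def bp_guided_decimation(clauses, n):
    """Fix variables one at a time according to their BP marginals."""
    assignment = {}
    for i in random.sample(range(n), n):           # random variable order
        p1 = bp_marginal(clauses, i) if any(i in cl for cl in clauses) else 0.5
        assignment[i] = int(random.random() < p1)
        # Drop clauses satisfied by x_i; delete x_i from the remaining ones.
        clauses = [{j: s for j, s in cl.items() if j != i}
                   for cl in clauses
                   if not (i in cl and cl[i] != assignment[i])]
    return assignment
```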

Is it worth the effort???
Proposition
BP computes the correct marginals if α < α_u(k), where α_u(k) = (2 log k / k) {1 + o_k(1)} = α_pl(k) {1 + o_k(1)}. :-(
Proof: a general argument, wait 2 minutes.

Is it worth the effort???
[figure: success probability of BP-guided decimation vs. α, for N = 3000 up to N = 10000]
[Montanari-Ricci-Semerjian 2007]

The context: 4-SAT
α_pl(4) ≈ 1.54,  α_BPdec(4) ≈ …,  α_s(4) ≈ 9.93 [Mézard-Parisi-Zecchina 2002 (Conj)]

The context: k-SAT
α_pl(k) ≈ 2 log k / k,  α_BPdec(k) ≈ 2^k e / k,  α_s(k) ≈ 2^k log 2 [Achlioptas-Peres 2004]

I owed you a general argument

Of trees and loops

Computation tree
[figures: the computation tree T_{G,i}(t), unrolled from root i, for t = 0, 1, 2, 3]

So what?
Remark. After t iterations, BP(i, t) outputs µ_{T_{G,i}(t)}(x_i).

And now for the general argument
Theorem (Tatikonda-Jordan 2002)
1. If the Gibbs measure on T_{G,i}(∞) is unique, then BP converges.
2. Further, if G has large girth, then µ_{BP(t)}(x_i) = µ_{T_{G,i}(t)}(x_i) ≈ µ_G(x_i).
Proof: by definition.

What about graphs with many short loops?

Example: Independent sets
x_j ∈ {0, 1}
µ_G(x) = (1/Z) ∏_{(i,j) ∈ E} I(x_i x_j = 0) λ^{Σ_i x_i}
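As a ground truth for the tree approximations that follow, a brute-force evaluation of this hard-core measure on a small graph (my illustration):

```python
import itertools

def hardcore_marginal(n, edges, lam, i=0):
    """Exact marginal P(x_i = 1) under the hard-core measure by brute force:
    mu(x) is proportional to lam^(sum x) over independent sets."""
    def weight(x):
        ok = all(not (x[u] and x[v]) for (u, v) in edges)
        return lam ** sum(x) if ok else 0.0
    configs = list(itertools.product((0, 1), repeat=n))
    Z = sum(weight(x) for x in configs)
    return sum(weight(x) for x in configs if x[i] == 1) / Z

# 4-cycle, lam = 1: seven independent sets, two contain vertex 0 -> 2/7.
print(hardcore_marginal(4, [(0, 1), (1, 2), (2, 3), (3, 0)], 1.0))
```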

Weitz's Self-Avoiding Walk tree
T^{SAW}_{G,i} = truncation of T_{G,i}(∞) + boundary conditions
Lemma. µ_{T^{SAW}_{G,i}}(x_i) = µ_G(x_i).

Weitz's Self-Avoiding Walk tree
As you can see...  |T^{SAW}_{G,i}| = exp{Θ(n)}

An algorithm
Truncate T^{SAW}_{G,i} at depth Θ(log n).
Theorem (Weitz 2006)
Assume G has degree bounded by k, and that the Gibbs measure on k-regular trees is unique. Then approximate counting can be performed in polynomial time.

How general is this strategy?
Theorem (Gamarnik-Katz 2007)
Pretty general. [Uses an appropriate backtracking tree]

How practical is this strategy?
Complexity = Θ(n^{big exponent}), where the exponent depends on b in
sup_{x, x′} | µ_T(x_ø = 1 | x_{∂T(t)}) − µ_T(x_ø = 1 | x′_{∂T(t)}) | ≤ A e^{−b t}.
[Lu-Measson-Montanari 2008]

Beyond uniqueness?

Why should we hope for more? Example: NAE-SAT
One clause: (x_1 ∨ x_16 ∨ x_71) ∧ (¬x_1 ∨ ¬x_16 ∨ ¬x_71)
µ_G(x) = uniform measure over solutions

Why should we hope for more?
Theorem (Montanari-Restrepo-Tetali 2009)
For α ≤ α_⋆(k) = 2^{k−1} log 2 {1 + o_k(1)}, (G_n, µ_{G_n}) converges locally to (T, µ_T).
Theorem (Achlioptas-Moore 2002)
For NAE-SAT, α_s(k) = 2^{k−1} log 2 {1 + o_k(1)}. Proof: second moment method.

What is happening? Notions of correlation decay
Uniqueness: sup_{x, x′} Σ_{x_i} | µ(x_i | x_{i,r}) − µ(x_i | x′_{i,r}) | → 0
Extremality: Σ_{x_i, x_{i,r}} | µ(x_i, x_{i,r}) − µ(x_i) µ(x_{i,r}) | → 0
Concentration: Σ_{x_{i(1)}, ..., x_{i(l)}} | µ(x_{i(1)}, ..., x_{i(l)}) − µ(x_{i(1)}) ⋯ µ(x_{i(l)}) | → 0

What happens in k-SAT?
uniqueness | extremality | concentration | no-concentration, separated at α_u(k), α_d(k), α_c(k), α_s(k)
α_u(k) = (2 log k)/k + ...  [proved]
α_d(k) = (2^k log k)/k + ...  (α_d(4) ≈ 9.38)  [proved in NAE-SAT]
α_c(k) = 2^k log 2 − (3/2) log 2 + ...  (α_c(4) ≈ 9.547)
α_s(k) = 2^k log 2 − (1 + log 2)/2 + ...  (α_s(4) ≈ 9.93)

So what?

So what? Bethe-Peierls approximation
µ(x) = ∏_{(i,j) ∈ E} ψ_{i,j}(x_i, x_j) / Z,  x_i ∈ X
Definition
A set of messages is a collection {ν_{i→j}(·)} indexed by the directed edges of G, where ν_{i→j} ∈ M(X).

Given F ⊆ G with diam(F) ≤ 2l ≤ girth, such that deg_F(i) = deg_G(i) or 1:
ν_U(x_U) ≡ (1/C(ν_U)) ∏_{(ij) ∈ F} ψ_{ij}(x_i, x_j) ∏_{i ∈ ∂F} ν_{i→j(i)}(x_i).

Bethe states
Definition
A probability distribution ρ on X^V is an (ε, r)-Bethe state if there exists a set of messages {ν_{i→j}(·)} such that, for any F ⊆ G with diam(F) ≤ 2r, ‖ρ_U − ν_U‖_TV ≤ ε.
Theorem (Dembo-Montanari 2009)
If µ is extremal with rate δ(·), then it is an (ε, r)-Bethe state for any r < l and ε ≤ C δ(l − r).

Algorithms vs correlation decay
α_u(k) < α_d(k) < α_c(k) < α_s(k)
Conjecture
BP-guided decimation finds solutions up to α_⋆(k) = α_d(k).
Currently a huge gap!

Relation with the structure of the solution space
α_d(k) < α_c(k) < α_s(k)
[Biroli-Monasson-Weigt 1999] [Mézard-Parisi-Zecchina 2001] [Krzakala et al. 2007] [Achlioptas-Coja-Oghlan 2009]

A recent application

Noisy underdetermined linear systems
y = A x_0 + w
Estimate x_0 ∈ R^N given (y, A).
[Signal processing: Donoho, Candès, Tao, ...]  [Sketching: Indyk, Gilbert, Muthu, ...]

The LASSO
x̂(y, A) = argmin_{x ∈ R^N} C_{A,y}(x)
C_{A,y}(x) = λ ‖x‖_1 + (1/2) ‖y − Ax‖_2^2
[Tibshirani 96; Chen, Donoho 95; … papers]
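For later comparison with AMP, a sketch (mine) of the classical iterative-soft-thresholding (IST) iteration for this objective:

```python
import numpy as np

def ist(A, y, lam, iters=500):
    """Iterative soft thresholding for lam*||x||_1 + 0.5*||y - A x||_2^2."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x + A.T @ (y - A @ x) / L          # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)   # prox step
    return x
```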

Wonderful, but...
What performance should I expect?
How am I supposed to choose C_{A,y}?
What if I can design A?
Low-complexity algorithms?

A little experiment
A = real data
x_{0,i} = +1 with prob. …, 0 with prob. …, −1 with prob. …
w_i ~ N(0, 0.2)

Clinical data
[plot: MSE vs. λ for N = 200, N = 500, N = …]
A is n × N, n = 0.64 N

Gene expression data
[plot: MSE vs. λ]
A is …  [from Hastie, Tibshirani, Friedman]

A theorem
Theorem (Bayati, Montanari, 2010)
Assume A_{ij} ~ N(0, 1/n), y = A x_0 + w, and let (τ_*^2, θ_*) be the unique solution of
τ_*^2 = σ^2 + (1/δ) E{[η(X_0 + τ_* Z; θ_*) − X_0]^2},
λ = θ_* { 1 − (1/δ) E[η′(X_0 + τ_* Z; θ_*)] }.
Then, almost surely as n, N → ∞ with n = δ N,
lim_{N→∞} (1/N) ‖x̂_LASSO(λ) − x_0‖^2 = (τ_*^2 − σ^2) δ.
Conjectured in a more general context with Donoho and Maleki.

η
[figure: the soft-thresholding function η(y; θ): zero on [−θ, +θ], slope 1 outside]
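A Monte Carlo sketch (mine) of η and of the state-evolution fixed point in the theorem above; the three-point prior is a placeholder echoing the earlier experiment:

```python
import numpy as np

def eta(y, theta):
    """Soft thresholding: eta(y; theta) = sign(y) * (|y| - theta)_+."""
    return np.sign(y) * np.maximum(np.abs(y) - theta, 0.0)

def state_evolution(delta, sigma, theta, n_mc=500_000, iters=100, seed=0):
    """Fixed-point iteration for tau^2, with the expectation estimated by
    Monte Carlo; returns (tau^2, the calibrated lambda, predicted LASSO MSE)."""
    rng = np.random.default_rng(seed)
    # Placeholder prior: X0 = +-1 with prob 0.05 each, 0 with prob 0.90.
    X0 = rng.choice([-1.0, 0.0, 1.0], p=[0.05, 0.90, 0.05], size=n_mc)
    Z = rng.standard_normal(n_mc)
    tau2 = sigma ** 2 + np.mean(X0 ** 2) / delta
    for _ in range(iters):
        tau2 = sigma ** 2 + np.mean((eta(X0 + np.sqrt(tau2) * Z, theta) - X0) ** 2) / delta
    # eta'(y; theta) = 1{|y| > theta}, so E[eta'] is an exceedance probability.
    lam = theta * (1 - np.mean(np.abs(X0 + np.sqrt(tau2) * Z) > theta) / delta)
    return tau2, lam, (tau2 - sigma ** 2) * delta
```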

Proof structure
1. Construct a message passing algorithm to infer x.
2. Prove that the distribution of the messages converges weakly to …, and that their variance is tracked by a recursion.
3. Prove that message passing converges to the LASSO optimum.

Approximate message passing algorithm
x^{t+1} = η(x^t + A^T z^t; θ_t),
z^t = y − A x^t + b_t z^{t−1},
with b_t ≡ (1/n) Σ_{i=1}^{N} η′(x^{t−1}_i + (A^T z^{t−1})_i; θ_{t−1}).

Graph is dense. No local weak limit.
No edge variables: only O(n) messages.
b_t z^{t−1} is the Onsager term.
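A compact sketch (mine) of this AMP iteration, reusing eta from the previous sketch; the threshold policy θ_t proportional to the residual's RMS is an assumption, not the tuned rule:

```python
import numpy as np

def amp(A, y, kappa=2.0, iters=30):
    """AMP: x^{t+1} = eta(x^t + A^T z^t; theta_t),
            z^t = y - A x^t + b_t z^{t-1}   (Onsager correction)."""
    n, N = A.shape
    x, z = np.zeros(N), y.copy()
    for _ in range(iters):
        theta = kappa * np.sqrt(np.mean(z ** 2))   # assumed threshold policy
        x_new = eta(x + A.T @ z, theta)
        b = np.count_nonzero(x_new) / n            # (1/n) sum_i eta'(.)
        z = y - A @ x_new + b * z
        x = x_new
    return x
```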

A large gain in performance
[figure: Comparison of Different Algorithms: empirical phase-transition curves ρ vs. δ for IHT, IST, Tuned TST, LARS, OMP, ℓ1, ...]
x^{t+1} = η(x^t + A^T z^t; θ_t),  z^t = y − A x^t + b_t z^{t−1}.

Conclusion

I did not talk about
Gaussian graphical models. (Weiss)
Free energies, Generalized BP. (Yedidia-Freeman-Weiss)
Relation with convex relaxations. (Wainwright-Jordan; Bayati et al.)
Message passing to find game-theoretic equilibria. (Kanoria et al., arXiv: …)

Conclusion: A success looking for theoreticians
A super-heuristic:
Subsumes many natural heuristics.
Easy to design/optimize.
(Can be) used almost everywhere.
No example in which it beats standard methods.
Thanks!


More information

High-dimensional Statistical Models

High-dimensional Statistical Models High-dimensional Statistical Models Pradeep Ravikumar UT Austin MLSS 2014 1 Curse of Dimensionality Statistical Learning: Given n observations from p(x; θ ), where θ R p, recover signal/parameter θ. For

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

Convex optimization. Javier Peña Carnegie Mellon University. Universidad de los Andes Bogotá, Colombia September 2014

Convex optimization. Javier Peña Carnegie Mellon University. Universidad de los Andes Bogotá, Colombia September 2014 Convex optimization Javier Peña Carnegie Mellon University Universidad de los Andes Bogotá, Colombia September 2014 1 / 41 Convex optimization Problem of the form where Q R n convex set: min x f(x) x Q,

More information

Linear Sketches A Useful Tool in Streaming and Compressive Sensing

Linear Sketches A Useful Tool in Streaming and Compressive Sensing Linear Sketches A Useful Tool in Streaming and Compressive Sensing Qin Zhang 1-1 Linear sketch Random linear projection M : R n R k that preserves properties of any v R n with high prob. where k n. M =

More information

Solving Corrupted Quadratic Equations, Provably

Solving Corrupted Quadratic Equations, Provably Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin

More information

Matrix Completion from a Few Entries

Matrix Completion from a Few Entries Matrix Completion from a Few Entries Raghunandan H. Keshavan and Sewoong Oh EE Department Stanford University, Stanford, CA 9434 Andrea Montanari EE and Statistics Departments Stanford University, Stanford,

More information

Approximate Message Passing Algorithms

Approximate Message Passing Algorithms November 4, 2017 Outline AMP (Donoho et al., 2009, 2010a) Motivations Derivations from a message-passing perspective Limitations Extensions Generalized Approximate Message Passing (GAMP) (Rangan, 2011)

More information

Survey Propagation: Iterative Solutions to Constraint Satisfaction Problems

Survey Propagation: Iterative Solutions to Constraint Satisfaction Problems Survey Propagation: Iterative Solutions to Constraint Satisfaction Problems Constantine Caramanis September 28, 2003 Abstract Iterative algorithms, such as the well known Belief Propagation algorithm,

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: ony Jebara Kalman Filtering Linear Dynamical Systems and Kalman Filtering Structure from Motion Linear Dynamical Systems Audio: x=pitch y=acoustic waveform Vision: x=object

More information

Linear and conic programming relaxations: Graph structure and message-passing

Linear and conic programming relaxations: Graph structure and message-passing Linear and conic programming relaxations: Graph structure and message-passing Martin Wainwright UC Berkeley Departments of EECS and Statistics Banff Workshop Partially supported by grants from: National

More information