Message Passing Algorithms: A Success Looking for Theoreticians
Andrea Montanari, Stanford University. June 5, 2010.
What is this talk about? A couple of examples. First example: given $A \in \mathbb{F}_2^{m \times n}$ and $y \in \mathbb{F}_2^n$,
$$\text{minimize } d(x, y), \quad \text{subject to } Ax = 0.$$
You should not hope for easy solutions... One of the little boxes solves it for n = ... in $10^{-6}$ secs. It uses a message passing algorithm + $A$ random and sparse.
Do not worry! No more hardware diagrams in this talk!!!
Second example: Learning low-rank matrices. The Netflix dataset: $M$ = a users $\times$ movies matrix, with $\sim 10^8$ ratings revealed and $\sim 10^6$ entries queried for prediction.
A prize awarded for: RMSE < ... ;)
A popular cost function:
$$C(X, Y) \equiv \frac{1}{2} \sum_{(i,j) \in \text{Revealed}} \left( M_{ij} - (XY^T)_{ij} \right)^2 + \frac{\lambda}{2}\|X\|_F^2 + \frac{\lambda}{2}\|Y\|_F^2, \qquad X \in \mathbb{R}^{n \times r},\ Y \in \mathbb{R}^{m \times r}.$$
[Srebro, Rennie, Jaakkola 2005] Non-convex. Large (!) scale.
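The slide only states the cost; as a concrete illustration, here is a minimal numpy sketch of the alternating least squares idea behind [Koren-Bell 2008]: with $Y$ fixed, each row of $X$ solves a small ridge regression, and symmetrically for $Y$. The dimensions, the value of $\lambda$, and the toy data are illustrative assumptions, not the talk's setup.

```python
# Alternating least squares on C(X, Y): a minimal sketch (toy data assumed).
import numpy as np

rng = np.random.default_rng(0)
n, m, r, lam = 200, 150, 3, 1e-2
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))   # rank-r ground truth
mask = rng.random((n, m)) < 0.2                         # "Revealed" entries

X, Y = rng.normal(size=(n, r)), rng.normal(size=(m, r))
for _ in range(30):
    # With Y fixed, each row of X solves a small ridge regression.
    for i in range(n):
        Yi = Y[mask[i]]
        X[i] = np.linalg.solve(Yi.T @ Yi + lam * np.eye(r), Yi.T @ M[i, mask[i]])
    # Symmetrically for Y, using the revealed entries of each column.
    for j in range(m):
        Xj = X[mask[:, j]]
        Y[j] = np.linalg.solve(Xj.T @ Xj + lam * np.eye(r), Xj.T @ M[mask[:, j], j])

print("RMSE on revealed entries:", np.sqrt(np.mean((M - X @ Y.T)[mask] ** 2)))
```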
Three algorithms. [Plot: RMSE vs. iterations for Message Passing, Alt. Min., and OptSpace.] OptSpace = gradient descent [Keshavan-Montanari-Oh 2009]; Alternating Least Squares [Koren-Bell 2008]; Message Passing [Keshavan-Montanari 2010].
Examples everywhere! Machine learning and AI. Coding and communications. Large random structures, statistical mechanics, ...
Outline: 1. k-SAT: A (Very) Simple Algorithm; 2. Taking it seriously; 3. Of trees and loops; 4. Beyond uniqueness?; 5. A recent application.
k-SAT: A (Very) Simple Algorithm
k-SAT: $x = (x_1, \ldots, x_n) \in \{0,1\}^n$. Instance ($k = 3$): $(x_1 \lor x_4 \lor x_6) \land (x_7 \lor x_8 \lor x_{10}) \land (x_{12} \lor x_{17} \lor x_{19}) \land \ldots$
Broder-Frieze-Upfal (1993). Pure Literal: 1: Repeat: 2: Find a pure literal $x_i$; 3: Fix $x_i$. Here $x_i$ is a pure literal if it never appears negated, or if it only appears negated.
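As a concrete illustration, a minimal Python sketch of pure-literal elimination, assuming a DIMACS-style encoding (a clause is a list of signed integers, $+i$ for $x_i$ and $-i$ for its negation); it fixes all current pure literals at once, which reaches the same residual formula as fixing them one by one.

```python
# A minimal sketch of pure-literal elimination (encoding assumed, see above).
def pure_literal(clauses):
    clauses = [set(c) for c in clauses]
    while True:
        lits = set().union(*clauses) if clauses else set()
        pure = {l for l in lits if -l not in lits}
        if not pure:
            break
        # Fixing a pure literal satisfies (removes) every clause containing it.
        clauses = [c for c in clauses if not (c & pure)]
    return clauses  # empty iff the algorithm found a satisfying assignment

# Example: x2 is pure, then x1 becomes pure, and the formula empties out.
print(pure_literal([[1, 2], [-1, 2], [1, 3], [1, -3]]))
```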
Random k-SAT: uniformly random formula with $n$ variables and $n\alpha$ clauses. Analysis: I. Markov chain in a reduced state space. II. ODE method.
Message passing: 1. Factor graph. [Figure: bipartite graph with variable nodes $x_1, \ldots, x_8$ and one clause node per clause of] $(x_1 \lor x_2 \lor x_3) \land (x_3 \lor x_4 \lor x_6) \land (x_3 \lor x_5 \lor x_6) \land (x_6 \lor x_7 \lor x_8)$. [Labeled bipartite graph]
Message passing: 2. Messages. Each directed edge carries messages $\nu^{(t)}_{i \to a}$ (variable to clause) and $\hat\nu^{(t)}_{a \to i}$ (clause to variable), with $\nu^{(t)}_{i \to a}, \hat\nu^{(t)}_{a \to i} \in \{\text{free}, \text{cons}\}$.
Message passing: 3. Update rules. Let $s_{jb}$ be the label on edge $(j, b)$. Then
$$\nu^{(t+1)}_{i \to a} = \begin{cases} \text{cons} & \text{if } s_{ib} \neq s_{ia} \text{ and } \hat\nu^{(t)}_{b \to i} = \text{cons for some } b \in \partial i \setminus a, \\ \text{free} & \text{otherwise;} \end{cases}$$
$$\hat\nu^{(t)}_{a \to i} = \begin{cases} \text{free} & \text{if } \nu^{(t)}_{j \to a} = \text{free for some } j \in \partial a \setminus i, \\ \text{cons} & \text{otherwise.} \end{cases}$$
Equivalent to Pure Literal!
Message passing: 4. Analysis (density evolution). Let $\phi_t = \mathbb{P}\{\nu^{(t)}_{j \to a} = \text{cons}\}$ and $\hat\phi_t = \mathbb{P}\{\hat\nu^{(t)}_{a \to i} = \text{cons}\}$. Then
$$\phi_{t+1} = 1 - \mathbb{E}\{[1 - \hat\phi_t]^d\} = 1 - e^{-k\alpha \hat\phi_t / 2}, \qquad d \sim \text{Poisson}(k\alpha/2),$$
$$\hat\phi_t = \phi_t^{k-1}.$$
Message passing: 4. Analysis. Combining the two steps: $\phi_{t+1} = 1 - \exp\{-k\alpha\,\phi_t^{k-1}/2\} \equiv f(\phi_t)$. [Figure: $f(\phi)$ vs. $\phi$ for $\alpha > \alpha_{\rm pl}(k)$ and $\alpha < \alpha_{\rm pl}(k)$.]
$$\alpha_{\rm pl}(k) = \sup\big\{ \alpha : 1 - e^{-k\alpha x^{k-1}/2} \le x \ \ \forall x \in [0, 1] \big\}.$$
Theorem (Broder-Frieze-Upfal 93, Molloy 04). Pure Literal finds a solution whp if $\alpha < \alpha_{\rm pl}$ and fails whp if $\alpha > \alpha_{\rm pl}$.
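The sup defining $\alpha_{\rm pl}(k)$ is easy to evaluate numerically; a small sketch (grid size and tolerance are arbitrary choices) that recovers $\alpha_{\rm pl}(4) \approx 1.54$ from the slides.

```python
# A numeric check of the fixed-point condition defining alpha_pl(k), using the
# density evolution recursion phi_{t+1} = 1 - exp(-k*alpha*phi_t^(k-1)/2).
import numpy as np

def pure_literal_succeeds(k, alpha, grid=10**4):
    x = np.linspace(0.0, 1.0, grid)
    return np.all(1.0 - np.exp(-k * alpha * x ** (k - 1) / 2.0) <= x + 1e-12)

def alpha_pl(k, tol=1e-6):
    lo, hi = 0.0, 10.0
    while hi - lo > tol:              # bisection on the sup in the slide
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if pure_literal_succeeds(k, mid) else (lo, mid)
    return lo

print(alpha_pl(3), alpha_pl(4))       # approx. 1.637 (k=3) and 1.544 (k=4)
```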
This is a proof because... Let $B_I(t)$ be the ball of radius $t$ around a uniformly random $I \in [n]$, $T$ some random rooted tree, and $T(t)$ its first $t$ generations.
Definition (Benjamini-Schramm 1996, Aldous-Steele 2003). The sequence of (factor) graphs $G_n = (V_n = [n], E_n)$ converges locally to $T$ if, for any $t$, $B_I(t)$ converges in distribution to $T(t)$.
Example. Lemma: Random k-SAT instances converge locally to Poisson Galton-Watson trees. [Figure: tree in which each variable node has Poisson($k\alpha/2$) clause children of each sign.] [In fact a little bit more is needed...] The analysis generalizes to other ensembles.
A parenthesis: Generalizations are useful. LDPC codes [Gallager 1966, Luby et al. 2001]:
$$\text{Code} = \{ x \in \{0,1\}^n : Ax = 0 \bmod 2 \},$$
where $A$ is the adjacency matrix of a sparse (pseudo)random graph.
Luby et al. 2001: random graph with degree distributions $(\lambda, \rho)$; optimization over $(\lambda, \rho)$.
Relation with Gibbs measures. [Figure: Galton-Watson tree with Poisson($k\alpha/2$) branches of each sign.] $\mu_T(x)$ = uniform measure over solutions of $T(\infty)$.
Relation with Gibbs measures. What does "uniform" mean??? [Figure: $T(t=2)$ and its boundary $\partial T(t=2)$.] Definition (DLR). $\mu_T$ is uniform (Gibbs) if $\mu_T(x_{T(t)} \mid x_{\partial T(t)})$ is uniform for all $t$.
Lemma (Montanari-Shah 2010). $\mu_T$ is unique if and only if $\alpha < \alpha_u(k)$, with $\alpha_u(k) = \frac{2 \log k}{k}\{1 + o_k(1)\}$. In particular $\alpha_{\rm pl}(k) = \alpha_u(k) + O(k^{-2})$.
More concretely: [Figure: $T(t=2)$, $\partial T(t=2)$.]
$$\sup_{x, x'} \big| \mu_T(x_\varnothing = 1 \mid x_{\partial T(t)}) - \mu_T(x_\varnothing = 1 \mid x'_{\partial T(t)}) \big| \le A\, e^{-b\, t}.$$
Relation with MCMC. To avoid pathologies:
$$\mu_{n,\beta}(x) \equiv \frac{1}{Z_{n,\beta}} \exp\big\{ -\beta\, \#\{\text{clauses violated by } x\} \big\}.$$
Conjecture. The heat bath/Glauber/Gibbs sampler has $\tau_{\rm mix} = O(n^C)$ whp for $\alpha < \alpha_u(k)$. [Mossel-Sly 201?]
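For concreteness, a minimal sketch of the heat-bath (Glauber) dynamics on the softened measure $\mu_{n,\beta}$: at each step a uniformly random variable is resampled from its conditional distribution, which depends only on the clauses touching it. The formula encoding (signed integers) and the parameters are illustrative assumptions.

```python
# Heat-bath (Glauber) dynamics for mu_{n,beta}: a minimal sketch.
import math, random

def glauber(clauses, n, beta, steps, seed=0):
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n + 1)]            # x[1..n]
    sat = lambda c: any((x[abs(l)] == 1) == (l > 0) for l in c)
    for _ in range(steps):
        i = rng.randint(1, n)
        touching = [c for c in clauses if any(abs(l) == i for l in c)]
        viol = {}
        for v in (0, 1):                  # clauses violated if x_i = v
            x[i] = v
            viol[v] = sum(not sat(c) for c in touching)
        # Heat bath: resample x_i from its conditional under mu_{n,beta}.
        p1 = 1.0 / (1.0 + math.exp(-beta * (viol[0] - viol[1])))
        x[i] = 1 if rng.random() < p1 else 0
    return x[1:]

clauses = [[1, 2, 3], [-1, 2, -3], [1, -2, 3]]
print(glauber(clauses, n=3, beta=2.0, steps=1000))
```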
Taking it seriously
How would you do it in your dreams? I would use Marginal($i \in [n]$): 1: ...; 2: ...; 3: ...; 4: Return $\mu(x_i = 1)$. Here $\mu(x_i = 1)$ = fraction of solutions in which $x_i = 1$.
Solver: 1: Repeat: 2: Choose $x_i$; 3: $\mu(x_i = 1)$ = Marginal($i$); 4: Fix $x_i = 1$ with probability $\mu(x_i = 1)$; 5: $x_i = 0$ otherwise. This samples a solution uniformly.
Message passing implementation of Marginal($i$). [Figure: the factor graph of $(x_1 \lor x_2 \lor x_3) \land (x_3 \lor x_4 \lor x_6) \land (x_3 \lor x_5 \lor x_6) \land (x_6 \lor x_7 \lor x_8)$.]
Now the messages are distributions: $\nu^{(t)}_{i \to a}, \hat\nu^{(t)}_{a \to i} \in \mathcal{M}(\{0,1\})$ (the simplex of probability measures on $\{0,1\}$), i.e. $\nu^{(t)}_{i \to a} = (\nu^{(t)}_{i \to a}(0), \nu^{(t)}_{i \to a}(1))$.
Update rules ($s_{jb}$ = label on edge $(j, b)$):
$$\nu^{(t+1)}_{i \to a}(x_i) \propto \prod_{b \in \partial i \setminus a} \hat\nu^{(t)}_{b \to i}(x_i),$$
$$\hat\nu^{(t)}_{a \to i}(x_i) \propto \begin{cases} 1 - \prod_{j \in \partial a \setminus i} \nu^{(t)}_{j \to a}(x_j = s_{ja}) & \text{if } x_i = s_{ia}, \\ 1 & \text{otherwise.} \end{cases}$$
[Belief propagation]
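A self-contained numpy sketch of these updates (clauses as lists of signed integers; here $s_{ia}$ is taken to be the value of $x_i$ that violates clause $a$). The iteration count, initialization, and toy formula are illustrative choices, and each variable is assumed to appear at most once per clause.

```python
# Belief propagation for k-SAT: a minimal sketch of the updates above.
import numpy as np
from collections import defaultdict

def bp_marginals(clauses, n, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    # s[(i, a)] = value of x_i violating clause a: 0 for literal +i, 1 for -i.
    s = {(abs(l), a): (0 if l > 0 else 1)
         for a, c in enumerate(clauses) for l in c}
    nbr_v, nbr_c = defaultdict(list), defaultdict(list)
    for (i, a) in s:
        nbr_v[i].append(a)      # clauses touching variable i
        nbr_c[a].append(i)      # variables in clause a
    nu = {e: rng.dirichlet([1, 1]) for e in s}       # nu_{i->a}(.)
    nuhat = {e: np.array([0.5, 0.5]) for e in s}     # nuhat_{a->i}(.)
    for _ in range(iters):
        for (i, a), sia in s.items():                # clause -> variable
            prod = np.prod([nu[(j, a)][s[(j, a)]] for j in nbr_c[a] if j != i])
            m = np.ones(2)
            m[sia] = 1.0 - prod
            nuhat[(i, a)] = m / m.sum()
        for (i, a) in s:                             # variable -> clause
            m = np.ones(2)
            for b in nbr_v[i]:
                if b != a:
                    m *= nuhat[(i, b)]
            nu[(i, a)] = m / m.sum()
    marg = {}
    for i in range(1, n + 1):                        # mu(x_i) up to normalization
        m = np.ones(2)
        for b in nbr_v[i]:
            m *= nuhat[(i, b)]
        marg[i] = m / m.sum()
    return marg

print(bp_marginals([[1, 2, 3], [-1, 2, 4], [-2, -3, 4], [1, -4, 3]], n=4))
```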
What are these messy equations? Clause $a$: $(x_1 \lor x_2 \lor x_3)$. [Figure: clause $a$ attached to variables 1, 2, 3; the rest of the graph is $G \setminus a$.] Let $N_G(x_A = \cdot)$ = number of solutions such that $x_A = \cdot$, so
$$\mu_G(x_i = 1) = \frac{N_G(x_i = 1)}{N_G(x_i = 0) + N_G(x_i = 1)}.$$
$$N_G(x_1 = 1) = N_{G \setminus a}(x_2 = 0, x_3 = 0) + N_{G \setminus a}(x_2 = 0, x_3 = 1) + N_{G \setminus a}(x_2 = 1, x_3 = 0) + N_{G \setminus a}(x_2 = 1, x_3 = 1),$$
$$N_G(x_1 = 0) = N_{G \setminus a}(x_2 = 0, x_3 = 1) + N_{G \setminus a}(x_2 = 1, x_3 = 0) + N_{G \setminus a}(x_2 = 1, x_3 = 1) = N_G(x_1 = 1) - N_{G \setminus a}(x_2 = 0, x_3 = 0).$$
$$\frac{N_G(x_1 = 0)}{N_G(x_1 = 1)} = \frac{\mu_G(x_1 = 0)}{\mu_G(x_1 = 1)} = 1 - \mu_{G \setminus a}(x_2 = 0, x_3 = 0) \approx 1 - \mu_{G \setminus a}(x_2 = 0)\,\mu_{G \setminus a}(x_3 = 0).$$
BP-guided decimation: 1: Repeat: 2: Choose $x_i$; 3: Compute $\mu(x_i = 1)$ using BP; 4: Fix $x_i = 1$ with probability $\mu(x_i = 1)$; 5: $x_i = 0$ otherwise.
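Continuing the sketch above (and reusing the bp_marginals helper defined there), a minimal decimation loop: fixing $x_i$ removes the clauses it satisfies and shrinks the others, giving up if an empty clause appears. The variable order and the giving-up rule are illustrative choices.

```python
# BP-guided decimation, reusing bp_marginals from the previous sketch.
import numpy as np

def bp_decimate(clauses, n, seed=0):
    rng = np.random.default_rng(seed)
    assignment = {}
    for i in range(1, n + 1):
        marg = bp_marginals(clauses, n)
        xi = int(rng.random() < marg[i][1])      # fix x_i ~ mu(x_i)
        assignment[i] = xi
        lit = i if xi == 1 else -i
        new = []
        for c in clauses:
            if lit in c:
                continue                          # clause satisfied: drop it
            c = [l for l in c if l != -lit]       # falsified literal: shrink
            if not c:
                return None                       # contradiction: give up
            new.append(c)
        clauses = new
    return assignment

print(bp_decimate([[1, 2, 3], [-1, 2, 4], [-2, -3, 4], [1, -4, 3]], n=4))
```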
Is it worth the effort??? Proposition. BP computes the correct marginals if $\alpha < \alpha_u(k)$, where $\alpha_u(k) = \frac{2 \log k}{k}\{1 + o_k(1)\} = \alpha_{\rm pl}(k)\{1 + o_k(1)\}$. Proof: a general argument; wait 2 minutes.
Is it worth the effort??? [Plot: success probability of BP-guided decimation vs. $\alpha$, for sizes N = 3000, 10000, and others.] [Montanari-Ricci-Semerjian 2007]
The context, 4-SAT: $\alpha_{\rm pl}(4) \approx 1.54$; $\alpha_{\rm BP\,dec}(4) \approx \ldots$; $\alpha_s(4) \approx 9.93$ [Mézard-Parisi-Zecchina 2002 (Conj)].
The context, k-SAT: $\alpha_{\rm pl}(k) \approx \frac{2 \log k}{k}$; $\alpha_{\rm BP\,dec}(k) \approx 2^k e / k$; $\alpha_s(k) \approx 2^k \log 2$ [Achlioptas-Peres 2004].
I owed you a general argument.
Of trees and loops
Computation tree $T_{G,i}(t)$: unroll $G$ around the root $i$. [Figures: the graph with root $i$, and the computation trees $T_{G,i}(0)$, $T_{G,i}(1)$, $T_{G,i}(2)$, $T_{G,i}(3)$.]
So what? Remark. After $t$ iterations, BP($i, t$) outputs $\mu_{T_{G,i}(t)}(x_i)$.
And now for the general argument. Theorem (Tatikonda-Jordan 2002). 1. If the Gibbs measure on $T_{G,i}(\infty)$ is unique then BP converges. 2. Further, if $G$ has large girth, then $\mu_{\rm BP(t)}(x_i) = \mu_{T_{G,i}(t)}(x_i) \approx \mu_G(x_i)$. Proof: by definition.
What about graphs with many short loops?
Example: Independent sets. $x_j \in \{0,1\}$,
$$\mu_G(x) = \frac{1}{Z} \prod_{(i,j) \in E} \mathbb{I}(x_i x_j = 0)\ \lambda^{\sum_i x_i}.$$
Weitz's Self-Avoiding Walk tree: $T^{\rm SAW}_{G,i}$ = truncation of $T_{G,i}(\infty)$ + boundary conditions. Lemma. $\mu_{T^{\rm SAW}_{G,i}}(x_i) = \mu_G(x_i)$.
Weitz's Self-Avoiding Walk tree. [Figure: the graph and its SAW tree.] As you can see... $|T^{\rm SAW}_{G,i}| = \exp\{\Theta(n)\}$.
An algorithm: truncate $T^{\rm SAW}_{G,i}$ at depth $\Theta(\log n)$. Theorem (Weitz 2006). Assume $G$ has degree bounded by $k$, and the Gibbs measure on $k$-regular trees is unique. Then approximate counting can be performed in polynomial time.
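On a tree, the root marginal of the hard-core measure above is computed exactly by a one-line recursion in the occupation ratios, $R_v = \lambda \prod_c (1 + R_c)^{-1}$ over the children $c$, with $\mu(x_v = 1) = R_v/(1 + R_v)$; Weitz's algorithm runs this recursion on the truncated SAW tree. A minimal sketch on a small explicit tree (the nested-list encoding is an illustrative choice):

```python
# Exact root marginal of the hard-core model on a tree via occupation ratios.
def occupation_ratio(children, lam):
    r = lam
    for c in children:
        r /= 1.0 + occupation_ratio(c, lam)
    return r

# Root with two children; the first child has two leaf children.
# lambda = 1 counts independent sets uniformly.
tree = [[[], []], []]
R = occupation_ratio(tree, lam=1.0)
print("P(root occupied) =", R / (1.0 + R))   # 2/7 for this tree
```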
How general is this strategy? Theorem (Gamarnik-Katz 2007). Pretty general. [Uses an appropriate backtracking tree.]
How practical is this strategy? Complexity $= \Theta(n^{\text{big exponent}})$, where the big exponent depends on $b$ in
$$\sup_{x, x'} \big| \mu_T(x_\varnothing = 1 \mid x_{\partial T(t)}) - \mu_T(x_\varnothing = 1 \mid x'_{\partial T(t)}) \big| \le A\, e^{-b\, t}.$$
[Lu-Measson-Montanari 2008]
Beyond uniqueness?
Why should we hope for more? Example: NAE-SAT. One clause: $(x_1 \lor x_{16} \lor x_{71}) \land (\bar x_1 \lor \bar x_{16} \lor \bar x_{71})$. $\mu_G(x)$ = uniform measure over solutions.
Theorem (Montanari-Restrepo-Tetali 2009). For $\alpha \le \alpha_*(k) = 2^{k-1} \log 2\,\{1 + o_k(1)\}$, $(G_n, \mu_{G_n})$ converges locally to $(T, \mu_T)$.
Theorem (Achlioptas-Moore 2002). For NAE-SAT, $\alpha_s(k) = 2^{k-1} \log 2\,\{1 + o_k(1)\}$. Proof: second moment method.
What is happening? Notions of correlation decay (here $x_{i,r}$ denotes the variables at distance $r$ from $i$):
Uniqueness: $\sup_{x, x'} \sum_{x_i} \big| \mu(x_i \mid x_{i,r}) - \mu(x_i \mid x'_{i,r}) \big| \to 0$.
Extremality: $\sum_{x_i, x_{i,r}} \big| \mu(x_i, x_{i,r}) - \mu(x_i)\,\mu(x_{i,r}) \big| \to 0$.
Concentration: $\sum_{x_{i(1)}, \ldots, x_{i(l)}} \big| \mu(x_{i(1)}, \ldots, x_{i(l)}) - \mu(x_{i(1)}) \cdots \mu(x_{i(l)}) \big| \to 0$.
What happens in k-SAT? [Diagram: the $\alpha$ axis split into uniqueness / extremality / concentration / no-concentration regimes at $\alpha_u(k)$, $\alpha_d(k)$, $\alpha_c(k)$, $\alpha_s(k)$.]
$\alpha_u(k) = (2 \log k)/k + \ldots$
$\alpha_d(k) = (2^k \log k)/k + \ldots$ [proved] ($\alpha_d(4) \approx 9.38$) [proved in NAE-SAT]
$\alpha_c(k) = 2^k \log 2 - \tfrac{3}{2} \log 2 + \ldots$ ($\alpha_c(4) \approx 9.547$)
$\alpha_s(k) = 2^k \log 2 - \tfrac{1 + \log 2}{2} + \ldots$ ($\alpha_s(4) \approx 9.93$)
So what? Bethe-Peierls approximation. [Figure: graph with a marked root.] $\mu(x) = \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j)/Z$, $x_i \in \mathcal{X}$. Definition. A set of messages is a collection $\{\nu_{i \to j}(\cdot)\}$ indexed by the directed edges of $G$, where $\nu_{i \to j} \in \mathcal{M}(\mathcal{X})$.
Given $F \subseteq G$ with $\mathrm{diam}(F) \le 2l \le$ girth, such that $\deg_F(i) = \deg_G(i)$ or $1$, define
$$\nu_U(x_U) \equiv \frac{1}{C(\nu_U)} \prod_{(ij) \in F} \psi_{ij}(x_i, x_j) \prod_{i \in \partial F} \nu_{i \to j(i)}(x_i).$$
Bethe states. Definition. A probability distribution $\rho$ on $\mathcal{X}^V$ is an $(\varepsilon, r)$ Bethe state if there exists a set of messages $\{\nu_{i \to j}(\cdot)\}$ such that, for any $F \subseteq G$ with $\mathrm{diam}(F) \le 2r$, $\|\rho_U - \nu_U\|_{\rm TV} \le \varepsilon$.
Theorem (Dembo-Montanari 2009). If $\mu$ is extremal with rate $\delta(\cdot)$ then it is an $(\varepsilon, r)$ Bethe state for any $r < l$ and $\varepsilon \le C\,\delta(l - r)$.
Algorithms vs. correlation decay. [Diagram: $\alpha$ axis with $\alpha_u(k)$, $\alpha_d(k)$, $\alpha_c(k)$, $\alpha_s(k)$.] Conjecture. BP-guided decimation finds solutions up to $\alpha_*(k) \approx \alpha_d(k)$. Currently: a huge gap!
Relation with the structure of the solution space. [Figure: solution-space clustering as $\alpha$ crosses $\alpha_d(k)$, $\alpha_c(k)$, $\alpha_s(k)$.] [Biroli-Monasson-Weigt 1999; Mézard-Parisi-Zecchina 2001; Krzakala et al. 2007; Achlioptas-Coja-Oghlan 2009]
A recent application
Noisy underdetermined linear systems: $y = A x_0 + w$. Estimate $x_0 \in \mathbb{R}^N$ given $(y, A)$. [Signal processing: Donoho, Candès, Tao, ...] [Sketching: Indyk, Gilbert, Muthu, ...]
The LASSO: $\hat{x}(y, A) = \arg\min_{x \in \mathbb{R}^N} C_{A,y}(x)$, with
$$C_{A,y}(x) = \lambda \|x\|_1 + \frac{1}{2} \|y - Ax\|_2^2.$$
[Tibshirani 96; Chen, Donoho 95; ... papers]
Wonderful, but... What performance should I expect? How am I supposed to choose $C_{A,y}$? What if I can design $A$? Low-complexity algorithms?
A little experiment, and real data:
$$x_{0,i} = \begin{cases} +1 & \text{with prob. } \ldots, \\ 0 & \text{with prob. } \ldots, \\ -1 & \text{with prob. } \ldots, \end{cases} \qquad w_i \sim N(0, 0.2).$$
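A sketch of a synthetic experiment in this spirit: the sparsity level is an assumed value (the probabilities were lost in the transcription), $N(0, 0.2)$ is read as variance $0.2$, and the LASSO is solved by plain ISTA iterations on $C_{A,y}$.

```python
# Synthetic LASSO experiment: sparse +/-1 signal, Gaussian A, ISTA solver.
import numpy as np

rng = np.random.default_rng(0)
N, delta, eps, sigma2, lam = 1000, 0.64, 0.05, 0.2, 1.0   # eps: assumed
n = int(delta * N)
x0 = rng.choice([+1.0, 0.0, -1.0], size=N, p=[eps, 1 - 2 * eps, eps])
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))
y = A @ x0 + rng.normal(0.0, np.sqrt(sigma2), size=n)

soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)
L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part
x = np.zeros(N)
for _ in range(2000):                # ISTA: gradient step + soft threshold
    x = soft(x + A.T @ (y - A @ x) / L, lam / L)

print("MSE per coordinate:", np.mean((x - x0) ** 2))
```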
Clinical data. [Plot: MSE vs. $\lambda$ for N = 200, 500, and larger.] $A$ is $n \times N$, $n = 0.64 N$.
Gene expression data. [Plot: MSE vs. $\lambda$.] $A$ is ... [from Hastie, Tibshirani, Friedman]
A theorem. Theorem (Bayati, Montanari, 2010). Assume $A_{ij} \sim N(0, 1/n)$, $y = A x_0 + w$, and let $(\tau_*^2, \theta_*)$ be the unique solution of
$$\tau^2 = \sigma^2 + \frac{1}{\delta}\, \mathbb{E}\{[\eta(X_0 + \tau Z; \theta) - X_0]^2\}, \qquad \lambda = \theta\, \Big\{ 1 - \frac{1}{\delta}\, \mathbb{E}[\eta'(X_0 + \tau Z; \theta)] \Big\}.$$
Then
$$\lim_{N \to \infty} \frac{1}{N} \|\hat{x}^{\rm LASSO}(\lambda) - x_0\|^2 = (\tau_*^2 - \sigma^2)\,\delta,$$
almost surely as $n \to \infty$. Conjectured in a more general context with Donoho and Maleki.
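The fixed point $(\tau_*^2, \theta_*)$ can be approximated by Monte Carlo iteration of the first equation; in the sketch below the threshold is parametrized as $\theta = \kappa \tau$ (a common choice), the prior, $\kappa$, $\delta$, $\sigma^2$ are assumed values, and $\eta'(u; \theta) = \mathbb{1}\{|u| > \theta\}$ for the soft threshold.

```python
# State evolution fixed point via Monte Carlo (parameters assumed, see above).
import numpy as np

rng = np.random.default_rng(0)
delta, sigma2, eps, kappa = 0.64, 0.2, 0.05, 1.5
X0 = rng.choice([+1.0, 0.0, -1.0], size=2 * 10**5, p=[eps, 1 - 2 * eps, eps])
Z = rng.normal(size=X0.size)
soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

tau2 = sigma2 + np.mean(X0**2) / delta          # starting point
for _ in range(50):                             # iterate tau^2 <- F(tau^2)
    theta = kappa * np.sqrt(tau2)
    mse = np.mean((soft(X0 + np.sqrt(tau2) * Z, theta) - X0) ** 2)
    tau2 = sigma2 + mse / delta

theta = kappa * np.sqrt(tau2)
lam = theta * (1 - np.mean(np.abs(X0 + np.sqrt(tau2) * Z) > theta) / delta)
print(f"tau*^2 = {tau2:.4f}, lambda = {lam:.4f}, "
      f"predicted LASSO MSE = {(tau2 - sigma2) * delta:.4f}")
```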
Here $\eta$ is the soft thresholding function, $\eta(y; \theta) = \mathrm{sign}(y)\,(|y| - \theta)_+$. [Figure: $\eta(y; \theta)$ vs. $y$, equal to zero on $[-\theta, +\theta]$.]
Proof structure: 1. Construct a message passing algorithm to infer $x$. 2. Prove that the distribution of messages converges weakly, and that their variance is tracked by a recursion. 3. Prove that message passing converges to the LASSO optimum.
Approximate message passing algorithm:
$$x^{t+1} = \eta(x^t + A^T z^t;\ \theta_t), \qquad z^t = y - A x^t + b_t z^{t-1},$$
with
$$b_t = \frac{1}{n} \sum_{i=1}^{N} \eta'\big(x^{t-1}_i + (A^T z^{t-1})_i;\ \theta_{t-1}\big).$$
The graph is dense: no local weak limit. No edge variables: only $O(n)$ messages. $b_t z^{t-1}$ is the Onsager term.
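A self-contained numpy sketch of this iteration on the earlier synthetic setup; the threshold policy $\theta_t = \kappa\, \|z^t\| / \sqrt{n}$ is an assumed tuning, not the talk's. Note how the Onsager term $b_t z^{t-1}$ enters the residual update.

```python
# AMP for the LASSO setup: a minimal sketch (threshold policy assumed).
import numpy as np

rng = np.random.default_rng(1)
N, delta, eps, sigma2, kappa = 1000, 0.64, 0.05, 0.2, 1.5
n = int(delta * N)
x0 = rng.choice([+1.0, 0.0, -1.0], size=N, p=[eps, 1 - 2 * eps, eps])
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))
y = A @ x0 + rng.normal(0.0, np.sqrt(sigma2), size=n)

soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)
x, z = np.zeros(N), y.copy()
for t in range(30):
    theta = kappa * np.sqrt(np.mean(z**2))   # proxy for tau_t
    r = x + A.T @ z                          # effective observation
    x_new = soft(r, theta)
    b = np.count_nonzero(x_new) / n          # (1/n) sum_i eta'(r_i; theta)
    z = y - A @ x_new + b * z                # Onsager-corrected residual
    x = x_new

print("AMP MSE per coordinate:", np.mean((x - x0) ** 2))
```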
A large gain in performance. [Plot: comparison of phase-transition curves, $\rho$ vs. $\delta$, for IHT, IST, Tuned TST, LARS, OMP, $L_1$, and MP/IST.]
$$x^{t+1} = \eta(x^t + A^T z^t;\ \theta_t), \qquad z^t = y - A x^t + b_t z^{t-1}.$$
Conclusion
I did not talk about: Gaussian graphical models (Weiss). Free energies and generalized BP (Yedidia-Freeman-Weiss). Relation with convex relaxations (Wainwright-Jordan; Bayati et al). Message passing to find game-theoretical equilibria (Kanoria et al., arXiv: ...).
Conclusion: A success looking for theoreticians. A super-heuristic: subsumes many natural heuristics; easy to design/optimize; (can be) used almost everywhere; no example in which it beats standard methods. Thanks!