Message Passing Algorithms: A Success Looking for Theoreticians
Andrea Montanari, Stanford University. June 5, 2010.
What is this talk about? A couple of examples. First example: given $A \in \mathbb{F}_2^{m \times n}$ and $y \in \mathbb{F}_2^n$,
$$\text{minimize } d(x, y), \quad \text{subject to } Ax = 0.$$
You should not hope for easy solutions... One of the little boxes solves it for n = ... in $10^{-6}$ secs. It uses a message passing algorithm + $A$ random and sparse.
Do not worry! No more hardware diagrams in this talk!!!
Second example: Learning low-rank matrices. The Netflix dataset: $M$ = a users $\times$ movies matrix, with $\sim 10^8$ ratings revealed and $\sim 10^6$ entries queried for prediction.
A prize awarded for: RMSE < ... ;)
A popular cost function:
$$C(X, Y) \equiv \frac{1}{2} \sum_{(i,j) \in \text{Revealed}} \left( M_{ij} - (XY^T)_{ij} \right)^2 + \frac{\lambda}{2}\|X\|_F^2 + \frac{\lambda}{2}\|Y\|_F^2, \qquad X \in \mathbb{R}^{n \times r},\ Y \in \mathbb{R}^{m \times r}.$$
[Srebro, Rennie, Jaakkola 2005] Non-convex. Large (!) scale.
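The slide only states the cost; as a concrete illustration, here is a minimal numpy sketch of the alternating least squares idea behind [Koren-Bell 2008]: with $Y$ fixed, each row of $X$ solves a small ridge regression, and symmetrically for $Y$. The dimensions, the value of $\lambda$, and the toy data are illustrative assumptions, not the talk's setup.

```python
# Alternating least squares on C(X, Y): a minimal sketch (toy data assumed).
import numpy as np

rng = np.random.default_rng(0)
n, m, r, lam = 200, 150, 3, 1e-2
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))   # rank-r ground truth
mask = rng.random((n, m)) < 0.2                         # "Revealed" entries

X, Y = rng.normal(size=(n, r)), rng.normal(size=(m, r))
for _ in range(30):
    # With Y fixed, each row of X solves a small ridge regression.
    for i in range(n):
        Yi = Y[mask[i]]
        X[i] = np.linalg.solve(Yi.T @ Yi + lam * np.eye(r), Yi.T @ M[i, mask[i]])
    # Symmetrically for Y, using the revealed entries of each column.
    for j in range(m):
        Xj = X[mask[:, j]]
        Y[j] = np.linalg.solve(Xj.T @ Xj + lam * np.eye(r), Xj.T @ M[mask[:, j], j])

print("RMSE on revealed entries:", np.sqrt(np.mean((M - X @ Y.T)[mask] ** 2)))
```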
Three algorithms. [Plot: RMSE vs. iterations for Message Passing, Alt. Min., and OptSpace.] OptSpace = gradient descent [Keshavan-Montanari-Oh 2009]; Alternating Least Squares [Koren-Bell 2008]; Message Passing [Keshavan-Montanari 2010].
Examples everywhere! Machine learning and AI. Coding and communications. Large random structures, statistical mechanics, ...
Outline: 1. k-SAT: A (Very) Simple Algorithm; 2. Taking it seriously; 3. Of trees and loops; 4. Beyond uniqueness?; 5. A recent application.
k-SAT: A (Very) Simple Algorithm
k-SAT: $x = (x_1, \ldots, x_n) \in \{0,1\}^n$. Instance ($k = 3$): $(x_1 \lor x_4 \lor x_6) \land (x_7 \lor x_8 \lor x_{10}) \land (x_{12} \lor x_{17} \lor x_{19}) \land \ldots$
Broder-Frieze-Upfal (1993). Pure Literal: 1: Repeat: 2: Find a pure literal $x_i$; 3: Fix $x_i$. Here $x_i$ is a pure literal if it never appears negated, or if it only appears negated.
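As a concrete illustration, a minimal Python sketch of pure-literal elimination, assuming a DIMACS-style encoding (a clause is a list of signed integers, $+i$ for $x_i$ and $-i$ for its negation); it fixes all current pure literals at once, which reaches the same residual formula as fixing them one by one.

```python
# A minimal sketch of pure-literal elimination (encoding assumed, see above).
def pure_literal(clauses):
    clauses = [set(c) for c in clauses]
    while True:
        lits = set().union(*clauses) if clauses else set()
        pure = {l for l in lits if -l not in lits}
        if not pure:
            break
        # Fixing a pure literal satisfies (removes) every clause containing it.
        clauses = [c for c in clauses if not (c & pure)]
    return clauses  # empty iff the algorithm found a satisfying assignment

# Example: x2 is pure, then x1 becomes pure, and the formula empties out.
print(pure_literal([[1, 2], [-1, 2], [1, 3], [1, -3]]))
```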
Random k-SAT: uniformly random formula with $n$ variables and $n\alpha$ clauses. Analysis: I. Markov chain in a reduced state space. II. ODE method.
Message passing: 1. Factor graph. [Figure: bipartite graph with variable nodes $x_1, \ldots, x_8$ and one clause node per clause of] $(x_1 \lor x_2 \lor x_3) \land (x_3 \lor x_4 \lor x_6) \land (x_3 \lor x_5 \lor x_6) \land (x_6 \lor x_7 \lor x_8)$. [Labeled bipartite graph]
Message passing: 2. Messages. Each directed edge carries messages $\nu^{(t)}_{i \to a}$ (variable to clause) and $\hat\nu^{(t)}_{a \to i}$ (clause to variable), with $\nu^{(t)}_{i \to a}, \hat\nu^{(t)}_{a \to i} \in \{\text{free}, \text{cons}\}$.
Message passing: 3. Update rules. Let $s_{jb}$ be the label on edge $(j, b)$. Then
$$\nu^{(t+1)}_{i \to a} = \begin{cases} \text{cons} & \text{if } s_{ib} \neq s_{ia} \text{ and } \hat\nu^{(t)}_{b \to i} = \text{cons for some } b \in \partial i \setminus a, \\ \text{free} & \text{otherwise;} \end{cases}$$
$$\hat\nu^{(t)}_{a \to i} = \begin{cases} \text{free} & \text{if } \nu^{(t)}_{j \to a} = \text{free for some } j \in \partial a \setminus i, \\ \text{cons} & \text{otherwise.} \end{cases}$$
Equivalent to Pure Literal!
Message passing: 4. Analysis (density evolution). Let $\phi_t = \mathbb{P}\{\nu^{(t)}_{j \to a} = \text{cons}\}$ and $\hat\phi_t = \mathbb{P}\{\hat\nu^{(t)}_{a \to i} = \text{cons}\}$. Then
$$\phi_{t+1} = 1 - \mathbb{E}\{[1 - \hat\phi_t]^d\} = 1 - e^{-k\alpha \hat\phi_t / 2}, \qquad d \sim \text{Poisson}(k\alpha/2),$$
$$\hat\phi_t = \phi_t^{k-1}.$$
Message passing: 4. Analysis. Combining the two steps: $\phi_{t+1} = 1 - \exp\{-k\alpha\,\phi_t^{k-1}/2\} \equiv f(\phi_t)$. [Figure: $f(\phi)$ vs. $\phi$ for $\alpha > \alpha_{\rm pl}(k)$ and $\alpha < \alpha_{\rm pl}(k)$.]
$$\alpha_{\rm pl}(k) = \sup\big\{ \alpha : 1 - e^{-k\alpha x^{k-1}/2} \le x \ \ \forall x \in [0, 1] \big\}.$$
Theorem (Broder-Frieze-Upfal 93, Molloy 04). Pure Literal finds a solution whp if $\alpha < \alpha_{\rm pl}$ and fails whp if $\alpha > \alpha_{\rm pl}$.
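The sup defining $\alpha_{\rm pl}(k)$ is easy to evaluate numerically; a small sketch (grid size and tolerance are arbitrary choices) that recovers $\alpha_{\rm pl}(4) \approx 1.54$ from the slides.

```python
# A numeric check of the fixed-point condition defining alpha_pl(k), using the
# density evolution recursion phi_{t+1} = 1 - exp(-k*alpha*phi_t^(k-1)/2).
import numpy as np

def pure_literal_succeeds(k, alpha, grid=10**4):
    x = np.linspace(0.0, 1.0, grid)
    return np.all(1.0 - np.exp(-k * alpha * x ** (k - 1) / 2.0) <= x + 1e-12)

def alpha_pl(k, tol=1e-6):
    lo, hi = 0.0, 10.0
    while hi - lo > tol:              # bisection on the sup in the slide
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if pure_literal_succeeds(k, mid) else (lo, mid)
    return lo

print(alpha_pl(3), alpha_pl(4))       # approx. 1.637 (k=3) and 1.544 (k=4)
```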
This is a proof because... Let $B_I(t)$ be the ball of radius $t$ around a uniformly random $I \in [n]$, $T$ some random rooted tree, and $T(t)$ its first $t$ generations.
Definition (Benjamini-Schramm 1996, Aldous-Steele 2003). The sequence of (factor) graphs $G_n = (V_n = [n], E_n)$ converges locally to $T$ if, for any $t$, $B_I(t)$ converges in distribution to $T(t)$.
Example. Lemma: Random k-SAT instances converge locally to Poisson Galton-Watson trees. [Figure: tree in which each variable node has Poisson($k\alpha/2$) clause children of each sign.] [In fact a little bit more is needed...] The analysis generalizes to other ensembles.
A parenthesis: Generalizations are useful. LDPC codes [Gallager 1966, Luby et al. 2001]:
$$\text{Code} = \{ x \in \{0,1\}^n : Ax = 0 \bmod 2 \},$$
where $A$ is the adjacency matrix of a sparse (pseudo)random graph.
Luby et al. 2001: random graph with degree distributions $(\lambda, \rho)$; optimization over $(\lambda, \rho)$.
Relation with Gibbs measures. [Figure: Galton-Watson tree with Poisson($k\alpha/2$) branches of each sign.] $\mu_T(x)$ = uniform measure over solutions of $T(\infty)$.
Relation with Gibbs measures. What does "uniform" mean??? [Figure: $T(t=2)$ and its boundary $\partial T(t=2)$.] Definition (DLR). $\mu_T$ is uniform (Gibbs) if $\mu_T(x_{T(t)} \mid x_{\partial T(t)})$ is uniform for all $t$.
Lemma (Montanari-Shah 2010). $\mu_T$ is unique if and only if $\alpha < \alpha_u(k)$, with $\alpha_u(k) = \frac{2 \log k}{k}\{1 + o_k(1)\}$. In particular $\alpha_{\rm pl}(k) = \alpha_u(k) + O(k^{-2})$.
More concretely: [Figure: $T(t=2)$, $\partial T(t=2)$.]
$$\sup_{x, x'} \big| \mu_T(x_\varnothing = 1 \mid x_{\partial T(t)}) - \mu_T(x_\varnothing = 1 \mid x'_{\partial T(t)}) \big| \le A\, e^{-b\, t}.$$
Relation with MCMC. To avoid pathologies:
$$\mu_{n,\beta}(x) \equiv \frac{1}{Z_{n,\beta}} \exp\big\{ -\beta\, \#\{\text{clauses violated by } x\} \big\}.$$
Conjecture. The heat bath/Glauber/Gibbs sampler has $\tau_{\rm mix} = O(n^C)$ whp for $\alpha < \alpha_u(k)$. [Mossel-Sly 201?]
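For concreteness, a minimal sketch of the heat-bath (Glauber) dynamics on the softened measure $\mu_{n,\beta}$: at each step a uniformly random variable is resampled from its conditional distribution, which depends only on the clauses touching it. The formula encoding (signed integers) and the parameters are illustrative assumptions.

```python
# Heat-bath (Glauber) dynamics for mu_{n,beta}: a minimal sketch.
import math, random

def glauber(clauses, n, beta, steps, seed=0):
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n + 1)]            # x[1..n]
    sat = lambda c: any((x[abs(l)] == 1) == (l > 0) for l in c)
    for _ in range(steps):
        i = rng.randint(1, n)
        touching = [c for c in clauses if any(abs(l) == i for l in c)]
        viol = {}
        for v in (0, 1):                  # clauses violated if x_i = v
            x[i] = v
            viol[v] = sum(not sat(c) for c in touching)
        # Heat bath: resample x_i from its conditional under mu_{n,beta}.
        p1 = 1.0 / (1.0 + math.exp(-beta * (viol[0] - viol[1])))
        x[i] = 1 if rng.random() < p1 else 0
    return x[1:]

clauses = [[1, 2, 3], [-1, 2, -3], [1, -2, 3]]
print(glauber(clauses, n=3, beta=2.0, steps=1000))
```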
Taking it seriously
How would you do it in your dreams? I would use Marginal($i \in [n]$): 1: ...; 2: ...; 3: ...; 4: Return $\mu(x_i = 1)$. Here $\mu(x_i = 1)$ = fraction of solutions in which $x_i = 1$.
Solver: 1: Repeat: 2: Choose $x_i$; 3: $\mu(x_i = 1)$ = Marginal($i$); 4: Fix $x_i = 1$ with probability $\mu(x_i = 1)$; 5: $x_i = 0$ otherwise. This samples a solution uniformly.
Message passing implementation of Marginal($i$). [Figure: the factor graph of $(x_1 \lor x_2 \lor x_3) \land (x_3 \lor x_4 \lor x_6) \land (x_3 \lor x_5 \lor x_6) \land (x_6 \lor x_7 \lor x_8)$.]
Now the messages are distributions: $\nu^{(t)}_{i \to a}, \hat\nu^{(t)}_{a \to i} \in \mathcal{M}(\{0,1\})$ (the simplex of probability measures on $\{0,1\}$), i.e. $\nu^{(t)}_{i \to a} = (\nu^{(t)}_{i \to a}(0), \nu^{(t)}_{i \to a}(1))$.
Update rules ($s_{jb}$ = label on edge $(j, b)$):
$$\nu^{(t+1)}_{i \to a}(x_i) \propto \prod_{b \in \partial i \setminus a} \hat\nu^{(t)}_{b \to i}(x_i),$$
$$\hat\nu^{(t)}_{a \to i}(x_i) \propto \begin{cases} 1 - \prod_{j \in \partial a \setminus i} \nu^{(t)}_{j \to a}(x_j = s_{ja}) & \text{if } x_i = s_{ia}, \\ 1 & \text{otherwise.} \end{cases}$$
[Belief propagation]
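A self-contained numpy sketch of these updates (clauses as lists of signed integers; here $s_{ia}$ is taken to be the value of $x_i$ that violates clause $a$). The iteration count, initialization, and toy formula are illustrative choices, and each variable is assumed to appear at most once per clause.

```python
# Belief propagation for k-SAT: a minimal sketch of the updates above.
import numpy as np
from collections import defaultdict

def bp_marginals(clauses, n, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    # s[(i, a)] = value of x_i violating clause a: 0 for literal +i, 1 for -i.
    s = {(abs(l), a): (0 if l > 0 else 1)
         for a, c in enumerate(clauses) for l in c}
    nbr_v, nbr_c = defaultdict(list), defaultdict(list)
    for (i, a) in s:
        nbr_v[i].append(a)      # clauses touching variable i
        nbr_c[a].append(i)      # variables in clause a
    nu = {e: rng.dirichlet([1, 1]) for e in s}       # nu_{i->a}(.)
    nuhat = {e: np.array([0.5, 0.5]) for e in s}     # nuhat_{a->i}(.)
    for _ in range(iters):
        for (i, a), sia in s.items():                # clause -> variable
            prod = np.prod([nu[(j, a)][s[(j, a)]] for j in nbr_c[a] if j != i])
            m = np.ones(2)
            m[sia] = 1.0 - prod
            nuhat[(i, a)] = m / m.sum()
        for (i, a) in s:                             # variable -> clause
            m = np.ones(2)
            for b in nbr_v[i]:
                if b != a:
                    m *= nuhat[(i, b)]
            nu[(i, a)] = m / m.sum()
    marg = {}
    for i in range(1, n + 1):                        # mu(x_i) up to normalization
        m = np.ones(2)
        for b in nbr_v[i]:
            m *= nuhat[(i, b)]
        marg[i] = m / m.sum()
    return marg

print(bp_marginals([[1, 2, 3], [-1, 2, 4], [-2, -3, 4], [1, -4, 3]], n=4))
```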
What are these messy equations? Clause $a$: $(x_1 \lor x_2 \lor x_3)$. [Figure: clause $a$ attached to variables 1, 2, 3; the rest of the graph is $G \setminus a$.] Let $N_G(x_A = \cdot)$ = number of solutions such that $x_A = \cdot$, so
$$\mu_G(x_i = 1) = \frac{N_G(x_i = 1)}{N_G(x_i = 0) + N_G(x_i = 1)}.$$
$$N_G(x_1 = 1) = N_{G \setminus a}(x_2 = 0, x_3 = 0) + N_{G \setminus a}(x_2 = 0, x_3 = 1) + N_{G \setminus a}(x_2 = 1, x_3 = 0) + N_{G \setminus a}(x_2 = 1, x_3 = 1),$$
$$N_G(x_1 = 0) = N_{G \setminus a}(x_2 = 0, x_3 = 1) + N_{G \setminus a}(x_2 = 1, x_3 = 0) + N_{G \setminus a}(x_2 = 1, x_3 = 1) = N_G(x_1 = 1) - N_{G \setminus a}(x_2 = 0, x_3 = 0).$$
$$\frac{N_G(x_1 = 0)}{N_G(x_1 = 1)} = \frac{\mu_G(x_1 = 0)}{\mu_G(x_1 = 1)} = 1 - \mu_{G \setminus a}(x_2 = 0, x_3 = 0) \approx 1 - \mu_{G \setminus a}(x_2 = 0)\,\mu_{G \setminus a}(x_3 = 0).$$
BP-guided decimation: 1: Repeat: 2: Choose $x_i$; 3: Compute $\mu(x_i = 1)$ using BP; 4: Fix $x_i = 1$ with probability $\mu(x_i = 1)$; 5: $x_i = 0$ otherwise.
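Continuing the sketch above (and reusing the bp_marginals helper defined there), a minimal decimation loop: fixing $x_i$ removes the clauses it satisfies and shrinks the others, giving up if an empty clause appears. The variable order and the giving-up rule are illustrative choices.

```python
# BP-guided decimation, reusing bp_marginals from the previous sketch.
import numpy as np

def bp_decimate(clauses, n, seed=0):
    rng = np.random.default_rng(seed)
    assignment = {}
    for i in range(1, n + 1):
        marg = bp_marginals(clauses, n)
        xi = int(rng.random() < marg[i][1])      # fix x_i ~ mu(x_i)
        assignment[i] = xi
        lit = i if xi == 1 else -i
        new = []
        for c in clauses:
            if lit in c:
                continue                          # clause satisfied: drop it
            c = [l for l in c if l != -lit]       # falsified literal: shrink
            if not c:
                return None                       # contradiction: give up
            new.append(c)
        clauses = new
    return assignment

print(bp_decimate([[1, 2, 3], [-1, 2, 4], [-2, -3, 4], [1, -4, 3]], n=4))
```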
Is it worth the effort??? Proposition. BP computes the correct marginals if $\alpha < \alpha_u(k)$, where $\alpha_u(k) = \frac{2 \log k}{k}\{1 + o_k(1)\} = \alpha_{\rm pl}(k)\{1 + o_k(1)\}$. Proof: a general argument; wait 2 minutes.
Is it worth the effort??? [Plot: success probability of BP-guided decimation vs. $\alpha$, for sizes N = 3000, 10000, and others.] [Montanari-Ricci-Semerjian 2007]
The context, 4-SAT: $\alpha_{\rm pl}(4) \approx 1.54$; $\alpha_{\rm BP\,dec}(4) \approx \ldots$; $\alpha_s(4) \approx 9.93$ [Mézard-Parisi-Zecchina 2002 (Conj)].
The context, k-SAT: $\alpha_{\rm pl}(k) \approx \frac{2 \log k}{k}$; $\alpha_{\rm BP\,dec}(k) \approx 2^k e / k$; $\alpha_s(k) \approx 2^k \log 2$ [Achlioptas-Peres 2004].
I owed you a general argument.
Of trees and loops
Computation tree $T_{G,i}(t)$: unroll $G$ around the root $i$. [Figures: the graph with root $i$, and the computation trees $T_{G,i}(0)$, $T_{G,i}(1)$, $T_{G,i}(2)$, $T_{G,i}(3)$.]
So what? Remark. After $t$ iterations, BP($i, t$) outputs $\mu_{T_{G,i}(t)}(x_i)$.
And now for the general argument. Theorem (Tatikonda-Jordan 2002). 1. If the Gibbs measure on $T_{G,i}(\infty)$ is unique then BP converges. 2. Further, if $G$ has large girth, then $\mu_{\rm BP(t)}(x_i) = \mu_{T_{G,i}(t)}(x_i) \approx \mu_G(x_i)$. Proof: by definition.
What about graphs with many short loops?
Example: Independent sets. $x_j \in \{0,1\}$,
$$\mu_G(x) = \frac{1}{Z} \prod_{(i,j) \in E} \mathbb{I}(x_i x_j = 0)\ \lambda^{\sum_i x_i}.$$
Weitz's Self-Avoiding Walk tree: $T^{\rm SAW}_{G,i}$ = truncation of $T_{G,i}(\infty)$ + boundary conditions. Lemma. $\mu_{T^{\rm SAW}_{G,i}}(x_i) = \mu_G(x_i)$.
Weitz's Self-Avoiding Walk tree. [Figure: the graph and its SAW tree.] As you can see... $|T^{\rm SAW}_{G,i}| = \exp\{\Theta(n)\}$.
An algorithm: truncate $T^{\rm SAW}_{G,i}$ at depth $\Theta(\log n)$. Theorem (Weitz 2006). Assume $G$ has degree bounded by $k$, and the Gibbs measure on $k$-regular trees is unique. Then approximate counting can be performed in polynomial time.
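On a tree, the root marginal of the hard-core measure above is computed exactly by a one-line recursion in the occupation ratios, $R_v = \lambda \prod_c (1 + R_c)^{-1}$ over the children $c$, with $\mu(x_v = 1) = R_v/(1 + R_v)$; Weitz's algorithm runs this recursion on the truncated SAW tree. A minimal sketch on a small explicit tree (the nested-list encoding is an illustrative choice):

```python
# Exact root marginal of the hard-core model on a tree via occupation ratios.
def occupation_ratio(children, lam):
    r = lam
    for c in children:
        r /= 1.0 + occupation_ratio(c, lam)
    return r

# Root with two children; the first child has two leaf children.
# lambda = 1 counts independent sets uniformly.
tree = [[[], []], []]
R = occupation_ratio(tree, lam=1.0)
print("P(root occupied) =", R / (1.0 + R))   # 2/7 for this tree
```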
How general is this strategy? Theorem (Gamarnik-Katz 2007). Pretty general. [Uses an appropriate backtracking tree.]
How practical is this strategy? Complexity $= \Theta(n^{\text{big exponent}})$, where the big exponent depends on $b$ in
$$\sup_{x, x'} \big| \mu_T(x_\varnothing = 1 \mid x_{\partial T(t)}) - \mu_T(x_\varnothing = 1 \mid x'_{\partial T(t)}) \big| \le A\, e^{-b\, t}.$$
[Lu-Measson-Montanari 2008]
Beyond uniqueness?
Why should we hope for more? Example: NAE-SAT. One clause: $(x_1 \lor x_{16} \lor x_{71}) \land (\bar x_1 \lor \bar x_{16} \lor \bar x_{71})$. $\mu_G(x)$ = uniform measure over solutions.
Theorem (Montanari-Restrepo-Tetali 2009). For $\alpha \le \alpha_*(k) = 2^{k-1} \log 2\,\{1 + o_k(1)\}$, $(G_n, \mu_{G_n})$ converges locally to $(T, \mu_T)$.
Theorem (Achlioptas-Moore 2002). For NAE-SAT, $\alpha_s(k) = 2^{k-1} \log 2\,\{1 + o_k(1)\}$. Proof: second moment method.
What is happening? Notions of correlation decay (here $x_{i,r}$ denotes the variables at distance $r$ from $i$):
Uniqueness: $\sup_{x, x'} \sum_{x_i} \big| \mu(x_i \mid x_{i,r}) - \mu(x_i \mid x'_{i,r}) \big| \to 0$.
Extremality: $\sum_{x_i, x_{i,r}} \big| \mu(x_i, x_{i,r}) - \mu(x_i)\,\mu(x_{i,r}) \big| \to 0$.
Concentration: $\sum_{x_{i(1)}, \ldots, x_{i(l)}} \big| \mu(x_{i(1)}, \ldots, x_{i(l)}) - \mu(x_{i(1)}) \cdots \mu(x_{i(l)}) \big| \to 0$.
What happens in k-SAT? [Diagram: the $\alpha$ axis split into uniqueness / extremality / concentration / no-concentration regimes at $\alpha_u(k)$, $\alpha_d(k)$, $\alpha_c(k)$, $\alpha_s(k)$.]
$\alpha_u(k) = (2 \log k)/k + \ldots$
$\alpha_d(k) = (2^k \log k)/k + \ldots$ [proved] ($\alpha_d(4) \approx 9.38$) [proved in NAE-SAT]
$\alpha_c(k) = 2^k \log 2 - \tfrac{3}{2} \log 2 + \ldots$ ($\alpha_c(4) \approx 9.547$)
$\alpha_s(k) = 2^k \log 2 - \tfrac{1 + \log 2}{2} + \ldots$ ($\alpha_s(4) \approx 9.93$)
So what? Bethe-Peierls approximation. [Figure: graph with a marked root.] $\mu(x) = \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j)/Z$, $x_i \in \mathcal{X}$. Definition. A set of messages is a collection $\{\nu_{i \to j}(\cdot)\}$ indexed by the directed edges of $G$, where $\nu_{i \to j} \in \mathcal{M}(\mathcal{X})$.
Given $F \subseteq G$ with $\mathrm{diam}(F) \le 2l \le$ girth, such that $\deg_F(i) = \deg_G(i)$ or $1$, define
$$\nu_U(x_U) \equiv \frac{1}{C(\nu_U)} \prod_{(ij) \in F} \psi_{ij}(x_i, x_j) \prod_{i \in \partial F} \nu_{i \to j(i)}(x_i).$$
Bethe states. Definition. A probability distribution $\rho$ on $\mathcal{X}^V$ is an $(\varepsilon, r)$ Bethe state if there exists a set of messages $\{\nu_{i \to j}(\cdot)\}$ such that, for any $F \subseteq G$ with $\mathrm{diam}(F) \le 2r$, $\|\rho_U - \nu_U\|_{\rm TV} \le \varepsilon$.
Theorem (Dembo-Montanari 2009). If $\mu$ is extremal with rate $\delta(\cdot)$ then it is an $(\varepsilon, r)$ Bethe state for any $r < l$ and $\varepsilon \le C\,\delta(l - r)$.
Algorithms vs. correlation decay. [Diagram: $\alpha$ axis with $\alpha_u(k)$, $\alpha_d(k)$, $\alpha_c(k)$, $\alpha_s(k)$.] Conjecture. BP-guided decimation finds solutions up to $\alpha_*(k) \approx \alpha_d(k)$. Currently: a huge gap!
Relation with the structure of the solution space. [Figure: solution-space clustering as $\alpha$ crosses $\alpha_d(k)$, $\alpha_c(k)$, $\alpha_s(k)$.] [Biroli-Monasson-Weigt 1999; Mézard-Parisi-Zecchina 2001; Krzakala et al. 2007; Achlioptas-Coja-Oghlan 2009]
A recent application
Noisy underdetermined linear systems: $y = A x_0 + w$. Estimate $x_0 \in \mathbb{R}^N$ given $(y, A)$. [Signal processing: Donoho, Candès, Tao, ...] [Sketching: Indyk, Gilbert, Muthu, ...]
The LASSO: $\hat{x}(y, A) = \arg\min_{x \in \mathbb{R}^N} C_{A,y}(x)$, with
$$C_{A,y}(x) = \lambda \|x\|_1 + \frac{1}{2} \|y - Ax\|_2^2.$$
[Tibshirani 96; Chen, Donoho 95; ... papers]
Wonderful, but... What performance should I expect? How am I supposed to choose $C_{A,y}$? What if I can design $A$? Low-complexity algorithms?
A little experiment, and real data:
$$x_{0,i} = \begin{cases} +1 & \text{with prob. } \ldots, \\ 0 & \text{with prob. } \ldots, \\ -1 & \text{with prob. } \ldots, \end{cases} \qquad w_i \sim N(0, 0.2).$$
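A sketch of a synthetic experiment in this spirit: the sparsity level is an assumed value (the probabilities were lost in the transcription), $N(0, 0.2)$ is read as variance $0.2$, and the LASSO is solved by plain ISTA iterations on $C_{A,y}$.

```python
# Synthetic LASSO experiment: sparse +/-1 signal, Gaussian A, ISTA solver.
import numpy as np

rng = np.random.default_rng(0)
N, delta, eps, sigma2, lam = 1000, 0.64, 0.05, 0.2, 1.0   # eps: assumed
n = int(delta * N)
x0 = rng.choice([+1.0, 0.0, -1.0], size=N, p=[eps, 1 - 2 * eps, eps])
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))
y = A @ x0 + rng.normal(0.0, np.sqrt(sigma2), size=n)

soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)
L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part
x = np.zeros(N)
for _ in range(2000):                # ISTA: gradient step + soft threshold
    x = soft(x + A.T @ (y - A @ x) / L, lam / L)

print("MSE per coordinate:", np.mean((x - x0) ** 2))
```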
Clinical data. [Plot: MSE vs. $\lambda$ for N = 200, 500, and larger.] $A$ is $n \times N$, $n = 0.64 N$.
Gene expression data. [Plot: MSE vs. $\lambda$.] $A$ is ... [from Hastie, Tibshirani, Friedman]
A theorem. Theorem (Bayati, Montanari, 2010). Assume $A_{ij} \sim N(0, 1/n)$, $y = A x_0 + w$, and let $(\tau_*^2, \theta_*)$ be the unique solution of
$$\tau^2 = \sigma^2 + \frac{1}{\delta}\, \mathbb{E}\{[\eta(X_0 + \tau Z; \theta) - X_0]^2\}, \qquad \lambda = \theta\, \Big\{ 1 - \frac{1}{\delta}\, \mathbb{E}[\eta'(X_0 + \tau Z; \theta)] \Big\}.$$
Then
$$\lim_{N \to \infty} \frac{1}{N} \|\hat{x}^{\rm LASSO}(\lambda) - x_0\|^2 = (\tau_*^2 - \sigma^2)\,\delta,$$
almost surely as $n \to \infty$. Conjectured in a more general context with Donoho and Maleki.
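The fixed point $(\tau_*^2, \theta_*)$ can be approximated by Monte Carlo iteration of the first equation; in the sketch below the threshold is parametrized as $\theta = \kappa \tau$ (a common choice), the prior, $\kappa$, $\delta$, $\sigma^2$ are assumed values, and $\eta'(u; \theta) = \mathbb{1}\{|u| > \theta\}$ for the soft threshold.

```python
# State evolution fixed point via Monte Carlo (parameters assumed, see above).
import numpy as np

rng = np.random.default_rng(0)
delta, sigma2, eps, kappa = 0.64, 0.2, 0.05, 1.5
X0 = rng.choice([+1.0, 0.0, -1.0], size=2 * 10**5, p=[eps, 1 - 2 * eps, eps])
Z = rng.normal(size=X0.size)
soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

tau2 = sigma2 + np.mean(X0**2) / delta          # starting point
for _ in range(50):                             # iterate tau^2 <- F(tau^2)
    theta = kappa * np.sqrt(tau2)
    mse = np.mean((soft(X0 + np.sqrt(tau2) * Z, theta) - X0) ** 2)
    tau2 = sigma2 + mse / delta

theta = kappa * np.sqrt(tau2)
lam = theta * (1 - np.mean(np.abs(X0 + np.sqrt(tau2) * Z) > theta) / delta)
print(f"tau*^2 = {tau2:.4f}, lambda = {lam:.4f}, "
      f"predicted LASSO MSE = {(tau2 - sigma2) * delta:.4f}")
```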
Here $\eta$ is the soft thresholding function, $\eta(y; \theta) = \mathrm{sign}(y)\,(|y| - \theta)_+$. [Figure: $\eta(y; \theta)$ vs. $y$, equal to zero on $[-\theta, +\theta]$.]
Proof structure: 1. Construct a message passing algorithm to infer $x$. 2. Prove that the distribution of messages converges weakly, and that their variance is tracked by a recursion. 3. Prove that message passing converges to the LASSO optimum.
Approximate message passing algorithm:
$$x^{t+1} = \eta(x^t + A^T z^t;\ \theta_t), \qquad z^t = y - A x^t + b_t z^{t-1},$$
with
$$b_t = \frac{1}{n} \sum_{i=1}^{N} \eta'\big(x^{t-1}_i + (A^T z^{t-1})_i;\ \theta_{t-1}\big).$$
The graph is dense: no local weak limit. No edge variables: only $O(n)$ messages. $b_t z^{t-1}$ is the Onsager term.
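A self-contained numpy sketch of this iteration on the earlier synthetic setup; the threshold policy $\theta_t = \kappa\, \|z^t\| / \sqrt{n}$ is an assumed tuning, not the talk's. Note how the Onsager term $b_t z^{t-1}$ enters the residual update.

```python
# AMP for the LASSO setup: a minimal sketch (threshold policy assumed).
import numpy as np

rng = np.random.default_rng(1)
N, delta, eps, sigma2, kappa = 1000, 0.64, 0.05, 0.2, 1.5
n = int(delta * N)
x0 = rng.choice([+1.0, 0.0, -1.0], size=N, p=[eps, 1 - 2 * eps, eps])
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))
y = A @ x0 + rng.normal(0.0, np.sqrt(sigma2), size=n)

soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)
x, z = np.zeros(N), y.copy()
for t in range(30):
    theta = kappa * np.sqrt(np.mean(z**2))   # proxy for tau_t
    r = x + A.T @ z                          # effective observation
    x_new = soft(r, theta)
    b = np.count_nonzero(x_new) / n          # (1/n) sum_i eta'(r_i; theta)
    z = y - A @ x_new + b * z                # Onsager-corrected residual
    x = x_new

print("AMP MSE per coordinate:", np.mean((x - x0) ** 2))
```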
A large gain in performance. [Plot: comparison of phase-transition curves, $\rho$ vs. $\delta$, for IHT, IST, Tuned TST, LARS, OMP, $L_1$, and MP/IST.]
$$x^{t+1} = \eta(x^t + A^T z^t;\ \theta_t), \qquad z^t = y - A x^t + b_t z^{t-1}.$$
Conclusion
I did not talk about: Gaussian graphical models (Weiss). Free energies and generalized BP (Yedidia-Freeman-Weiss). Relation with convex relaxations (Wainwright-Jordan; Bayati et al). Message passing to find game-theoretical equilibria (Kanoria et al., arXiv: ...).
Conclusion: A success looking for theoreticians. A super-heuristic: subsumes many natural heuristics; easy to design/optimize; (can be) used almost everywhere; no example in which it beats standard methods. Thanks!