November 4, 2017
Outline
- AMP (Donoho et al., 2009, 2010a): motivations; derivation from a message-passing perspective; limitations; extensions
- Generalized Approximate Message Passing (GAMP) (Rangan, 2011)
- Vector Approximate Message Passing (VAMP) (Schniter et al., 2016; Rangan et al., 2017)
Compressed Sensing
- Most of the data is redundant
- Storing and transmitting it at full resolution is enormously wasteful
Compressed Sensing
- $y \in \mathbb{R}^n$: measurement vector
- $x \in \mathbb{R}^N$: unknown sparse signal vector
- $A \in \mathbb{R}^{n \times N}$: incoherent measurement matrix with $n < N$
- $w \in \mathbb{R}^n$: measurement noise
Compressed Sensing
Noiseless: $y = Ax$,
$$\min_x \|x\|_p \quad \text{subject to } y = Ax. \tag{1}$$
Noisy: $y = Ax + w$,
$$\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_p. \tag{2}$$
Compressed Sensing
- $p = 2$: $\ell_2$ minimization mostly gives unsatisfactory results, since real-world signals are often compressible
- $p = 0$: $\ell_0$ minimization gives accurate results but has the computational disadvantage of being an NP-hard problem
- $p = 1$: $\ell_1$ minimization is computationally tractable and comes with theoretical upper bounds on the reconstruction error

We focus on the $\ell_1$ case below.
Disadvantages of LP Methods
Convex optimization (LP-based) methods yield accurate reconstructions for (1) (Candès and Wakin, 2008), but:
- Realistic modern problems in spectroscopy and medical imaging demand reconstructions of objects with tens of thousands or even millions of unknowns
- Existing convex optimization algorithms are too slow on problems of this size
Iterative Shrinkage/Thresholding Algorithm (ISTA)
Notice that $\nabla_x\!\left( \tfrac{1}{2}\|y - Ax\|_2^2 \right) = A^T(Ax - y)$. Let $\eta(\cdot)$ be a scalar soft-thresholding function (applied to vectors component-wise). The ISTA update for (2) is
$$z^t = y - Ax^t, \qquad x^{t+1} = \eta\!\left(x^t + \tfrac{1}{\rho} A^T z^t;\, \lambda\right). \tag{3}$$
ISTA
- Low per-iteration cost: matrix-vector multiplications
- Convergence rate: $O(1/t)$
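To make update (3) concrete, here is a minimal NumPy sketch. The step size $1/\rho$ is set from the spectral norm of $A$, and the threshold is taken as $\lambda/\rho$ (the usual proximal-gradient scaling); both choices are assumptions of this sketch, not prescribed by the slide.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Component-wise soft thresholding eta(v; thresh)."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def ista(A, y, lam, n_iters=500):
    """ISTA for (2) with p = 1, following update (3)."""
    n, N = A.shape
    rho = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the smooth part
    x = np.zeros(N)
    for _ in range(n_iters):
        z = y - A @ x                 # residual
        x = soft_threshold(x + (A.T @ z) / rho, lam / rho)
    return x
```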
FISTA
FISTA (Beck and Teboulle, 2009) update:
$$z^t = y - Ax^t, \qquad u^{t+1} = \eta\!\left(x^t + \tfrac{1}{\rho} A^T z^t;\, \lambda\right), \qquad x^{t+1} = u^{t+1} + \frac{s_t - 1}{s_{t+1}}\left(u^{t+1} - u^t\right), \tag{4}$$
with $s_0 = 0$ and $s_{t+1} = \left(1 + \sqrt{1 + 4 s_t^2}\right)/2$ (not unique).
- Convergence rate: $O(1/t^2)$
- Faster algorithms if $A$ is a large random matrix (e.g., i.i.d. Gaussian)?
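The momentum step of (4) adds only a few lines on top of ISTA; this sketch reuses soft_threshold from the previous block and starts from $s_0 = 0$ as on the slide.

```python
def fista(A, y, lam, n_iters=500):
    """FISTA update (4): ISTA plus Nesterov-style momentum."""
    n, N = A.shape
    rho = np.linalg.norm(A, 2) ** 2
    x = np.zeros(N)
    u_prev = np.zeros(N)
    s = 0.0                                          # s_0 = 0
    for _ in range(n_iters):
        z = y - A @ x
        u = soft_threshold(x + (A.T @ z) / rho, lam / rho)
        s_next = (1.0 + np.sqrt(1.0 + 4.0 * s**2)) / 2.0
        x = u + ((s - 1.0) / s_next) * (u - u_prev)  # momentum extrapolation
        u_prev, s = u, s_next
    return u_prev                                    # last proximal iterate
```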
Markov Random Fields
Suppose that we are modeling selection preferences among persons A, B, C, D. By the Hammersley-Clifford theorem (for $p > 0$), we can model the joint probability as
$$p(A, B, C, D) = \frac{1}{Z}\, \varphi(A, B)\, \varphi(B, C)\, \varphi(C, D)\, \varphi(D, A),$$
where $Z$ is the normalization constant (MRF).
Markov Random Fields
- Marginal distributions: find $p(A)$, $p(B)$, $p(C)$, $p(D)$
- Maximizer: find $\arg\max_{a,b,c,d}\, p(a, b, c, d)$
- With $k$ nodes each taking $s$ values, brute-force methods need $O(s^k)$ computations!
Message Passing (Belief Propagation)
Message Passing
- Message from node $i$ to node $j$: $m_{i \to j}(x_j)$
- Messages are similar to likelihoods: non-negative (they don't have to sum to 1)
- A high value of $m_{i \to j}(x_j)$ indicates that node $i$ believes the marginal value $p(x_j)$ to be high
- Usually initialize all messages to 1 (or random positive values)
Message Passing
Sum-product message passing:
$$m_{i \to j}(x_j) = \sum_{x_i} \varphi(x_i, x_j) \prod_{l \in N(i) \setminus j} m_{l \to i}(x_i),$$
e.g.,
$$m_{B \to D}(x_D) = \sum_{x_B} \varphi(x_B, x_D)\, m_{A \to B}(x_B)\, m_{C \to B}(x_B).$$
Marginal distribution: $p(x_i) \propto \prod_{l \in N(i)} m_{l \to i}(x_i)$.
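As an illustration, here is a small sum-product implementation on the four-node MRF from the earlier slide, with messages initialized to 1 and normalized for numerical stability. The random potentials and iteration count are arbitrary choices for this sketch; on a tree the fixed point gives exact marginals, while on this cycle they are loopy-BP approximations.

```python
import numpy as np

# Pairwise MRF on the 4-cycle from the earlier slide:
# p(A,B,C,D) = (1/Z) phi(A,B) phi(B,C) phi(C,D) phi(D,A), each variable binary.
rng = np.random.default_rng(0)
s = 2                                      # states per node
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # A-B, B-C, C-D, D-A
phi = {e: rng.uniform(0.5, 2.0, size=(s, s)) for e in edges}

def pot(i, j):
    """Potential phi as a matrix indexed by (x_i, x_j)."""
    return phi[(i, j)] if (i, j) in phi else phi[(j, i)].T

nbrs = {i: [j for e in edges for j in e if i in e and j != i] for i in range(4)}

# messages m[i, j](x_j), initialized to 1 as on the slide
m = {(i, j): np.ones(s) for i in range(4) for j in nbrs[i]}

for _ in range(50):                        # iterate to a fixed point
    for (i, j) in list(m):
        incoming = np.ones(s)
        for l in nbrs[i]:
            if l != j:
                incoming *= m[(l, i)]
        msg = pot(i, j).T @ incoming       # sum over x_i of phi(x_i,x_j) * prod(...)
        m[(i, j)] = msg / msg.sum()        # normalize for stability

# approximate marginal p(x_i) proportional to the product of incoming messages
for i in range(4):
    b = np.ones(s)
    for l in nbrs[i]:
        b *= m[(l, i)]
    print(f"node {i}:", b / b.sum())
```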
Message Passing
Noiseless:
$$p_1(x_1, \ldots, x_N) \propto \frac{1}{Z} \prod_{i=1}^N \exp(-\beta |x_i|) \prod_{j=1}^n \delta_{\{y_j = (Ax)_j\}} \tag{5}$$
Noisy:
$$p_2(x_1, \ldots, x_N) \propto \frac{1}{Z} \prod_{i=1}^N \exp(-\beta |x_i|) \prod_{j=1}^n \exp\!\left\{-\frac{\beta}{2}\left[y_j - (Ax)_j\right]^2\right\} \tag{6}$$
Goal: find the marginal distributions $p_1(x_i)$ and $p_2(x_i)$ as $\beta \to \infty$.
Approximate Message Passing
- Construct an undirected graphical model (last slide)
- Large system limit (thermodynamic limit $N \to \infty$, $\delta = n/N$ fixed)
- Large $\beta$ limit (low-temperature limit)
- From message passing to AMP (Onsager correction)
Approximate Message Passing
Message passing for (5):
$$z^t_{a \to i} = y_a - \sum_{j \neq i} A_{aj}\, x^t_{j \to a}, \qquad x^{t+1}_{i \to a} = \eta\!\Big(\sum_{b \neq a} A_{bi}\, z^t_{b \to i};\, \tau^t\Big), \qquad \tau^{t+1} = \frac{\tau^t}{\delta} \frac{1}{N} \sum_{i=1}^N \eta'\Big(\sum_{b} A_{bi}\, z^t_{b \to i};\, \tau^t\Big). \tag{7}$$
$O(nN)$ messages passed per iteration!
Approximate Message Passing
AMP for (5):
$$z^t = y - Ax^t + \frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(A^T z^{t-1} + x^{t-1};\, \tau^{t-1}\right) \right\rangle, \qquad x^{t+1} = \eta\!\left(A^T z^t + x^t;\, \tau^t\right), \qquad \tau^t = \frac{\tau^{t-1}}{\delta} \left\langle \eta'\!\left(A^T z^{t-1} + x^{t-1};\, \tau^{t-1}\right) \right\rangle, \tag{8}$$
where $\langle \cdot \rangle$ denotes the empirical average over components.
- Efficient: vectorized updates
- Parameter free: the threshold is updated recursively (noiseless problem, no $\lambda$)
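A minimal NumPy sketch of (8), reusing soft_threshold from the ISTA block. For soft thresholding, $\langle \eta'(r;\tau) \rangle$ is simply the fraction of entries exceeding the threshold in magnitude; the initial value of $\tau$ is a heuristic assumption of this sketch.

```python
def amp(A, y, n_iters=50):
    """AMP iteration (8) with the soft-thresholding denoiser.
    Assumes A is n x N with i.i.d. entries of variance 1/n (delta = n/N)."""
    n, N = A.shape
    delta = n / N
    x = np.zeros(N)
    z = y.copy()
    tau = np.linalg.norm(y) / np.sqrt(n)         # heuristic initialization (assumption)
    for _ in range(n_iters):
        r = x + A.T @ z                          # pseudo-data
        x_new = soft_threshold(r, tau)
        onsager = np.mean(np.abs(r) > tau)       # <eta'(r; tau)> for soft thresholding
        z = y - A @ x_new + (onsager / delta) * z   # Onsager-corrected residual
        tau = (tau / delta) * onsager            # threshold recursion from (8)
        x = x_new
    return x
```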
ISTA vs. AMP (Noisy)
Recall that
$$\frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(x^{t-1} + A^T z^{t-1};\, \tau^{t-1}\right) \right\rangle = \frac{1}{n} \|x^t\|_0\, z^{t-1}.$$
ISTA:
$$z^t = y - Ax^t, \qquad x^{t+1} = \eta\!\left(x^t + \tfrac{1}{\rho} A^T z^t;\, \lambda\right)$$
AMP:
$$z^t = y - Ax^t + \frac{1}{n} \|x^t\|_0\, z^{t-1}, \qquad x^{t+1} = \eta\!\left(x^t + A^T z^t;\, \lambda^t\right)$$
Differences: step size; momentum (Onsager) term; iteration-dependent thresholding $\lambda^t = \lambda + \tau^t$, with $\tau^t$ updated similarly; see Donoho et al. (2010a).
Onsager Correction (Thouless et al., 1977)
AMP Demo
- $n = 500$, $N = 1000$, $\|x\|_0 = 50$
- $A_{ij}$ i.i.d. $\mathcal{N}(0, 1)$, scaled by $1/\sqrt{n}$
- $w$: additive white Gaussian noise (AWGN) with SNR 40 dB
- $\lambda = \sqrt{2 \log N}\, \hat\sigma$; $\lambda^t = \alpha \hat\sigma^t$ with $\alpha = 1$ and $(\hat\sigma^t)^2 = \|z^t\|_2^2 / n$

[Figure: NMSE ($\log_{10}$) vs. iterations ($\log_{10}$) for ISTA, FISTA, OWL-QN, and AMP.]
AMP Demo
[Figure: NMSE ($\log_{10}$) vs. iterations ($\log_{10}$) for ISTA, FISTA, OWL-QN, and AMP with correlated Gaussian $A$, entries $\mathcal{N}(0, \rho_0^{|j-k|})/\sqrt{n}$; $\rho_0 = 0, 0.1, 0.15$ (top), $0.17, 0.18, 0.20$ (bottom).]
AMP Demo
[Figure: NMSE ($\log_{10}$) vs. iterations ($\log_{10}$) for ISTA, FISTA, OWL-QN, and AMP; $\rho_0 = 0.2, 0.3, 0.5$, step size $s = 0.95, 0.9, 0.5$.] Line search?
State Evolution
The AMP iterates satisfy
$$r^t \triangleq x^t + A^T z^t = x + \mathcal{N}(0, \sigma_t^2 I_{N \times N}),$$
and $\varepsilon_t \triangleq \frac{1}{N} \mathbb{E}\!\left(\|x^t - x\|_2^2\right)$ obeys a scalar recursion (Donoho et al., 2010b):
$$\sigma_t^2 = \sigma_w^2 + \frac{N \varepsilon_t}{n}, \qquad \varepsilon_{t+1} = \frac{1}{N}\, \mathbb{E}\!\left(\left\|\eta\!\left(x + \mathcal{N}(0, \sigma_t^2 I_{N \times N});\, \lambda^t\right) - x\right\|_2^2\right).$$
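The scalar recursion can be evaluated by Monte Carlo instead of running AMP itself. This sketch assumes a Bernoulli-Gaussian prior for $x$ and the threshold rule $\lambda^t = \alpha\, \sigma_t$ from the demo slide; both are assumptions, and soft_threshold is reused from above.

```python
def state_evolution(sigma_w, delta, sparsity, alpha=1.0, n_iters=30, M=100_000):
    """Monte Carlo evaluation of the state-evolution recursion.
    Assumes x_i ~ Bernoulli(sparsity) x N(0, 1) and lambda_t = alpha * sigma_t."""
    rng = np.random.default_rng(1)
    x = rng.normal(size=M) * (rng.random(M) < sparsity)  # samples from the prior
    eps = np.mean(x**2)                    # eps_0 with x^0 = 0
    for _ in range(n_iters):
        sigma2 = sigma_w**2 + eps / delta  # sigma_t^2 = sigma_w^2 + N*eps_t/n
        r = x + np.sqrt(sigma2) * rng.normal(size=M)     # effective AWGN channel
        eps = np.mean((soft_threshold(r, alpha * np.sqrt(sigma2)) - x) ** 2)
    return eps                             # per-coordinate MSE at the fixed point
```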
Phase Transition
[Figure: Observed phase transitions of reconstruction algorithms.]
Universality of Phase Transition
Donoho and Tanner (2009) observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing.
Limitations of AMP
- $y$ must be a linear transformation of the signal $x$ with additive noise
- $A$ must be a large i.i.d. (sub-)Gaussian matrix
Generalized Approximate Message Passing
Recover the sparse signal $x$ given $A$ and measurements $y$:
$$v = Ax, \qquad y \sim p(y \mid v),$$
where $p(y \mid v)$ captures the non-Gaussianity. Examples:
- Binary classification (probit, logit models)
- Poisson measurements (photon-limited imaging, neural spike models)
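To make the generalized measurement model concrete, here is a hypothetical data-generation sketch for a few output channels $p(y \mid v)$ of the kind GAMP handles; the noise level and the Poisson log link are illustrative assumptions, not taken from the slides.

```python
def glm_measurements(A, x, channel, noise_std=0.1, seed=2):
    """Draw y ~ p(y | v) with v = A x for several output channels (sketch)."""
    rng = np.random.default_rng(seed)
    v = A @ x
    if channel == "awgn":                    # standard linear model
        return v + noise_std * rng.normal(size=v.shape)
    if channel == "probit":                  # binary labels, probit link
        return (v + noise_std * rng.normal(size=v.shape) > 0).astype(float)
    if channel == "poisson":                 # photon counts, log link (assumption)
        return rng.poisson(np.exp(v)).astype(float)
    raise ValueError(f"unknown channel: {channel}")
```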
GAMP vs. FISTA
$n = 200$, $N = 500$, $y = Ax + w$ ($w$ a complex noise vector)
GAMP SVM
$n = N/3$, $N = 512$, $y = \tfrac{1}{2}\left[\operatorname{sgn}(Ax + w) + 1\right]$
GAMP Mixed Gaussian
$n = 500$, $N = 1000$, $y = Ax + w$ with $w \sim \phi\, \mathcal{N}(\mu_1, 1) + (1 - \phi)\, \mathcal{N}(\mu_2, 1)$.
VAMP Standard Linear Model
Denoising:
$$x^t_1 = \eta(r^t_1, \lambda^t_1), \qquad \alpha^t_1 = \left\langle \eta'(r^t_1, \lambda^t_1) \right\rangle, \qquad \lambda^t_2 = \left(1/\alpha^t_1 - 1\right) \lambda^t_1, \qquad r^t_2 = \left(x^t_1/\alpha^t_1 - r^t_1\right)\left(\lambda^t_1/\lambda^t_2\right)$$
LMMSE estimation:
$$x^t_2 = g(r^t_2, \lambda^t_2), \qquad \alpha^t_2 = \left\langle g'(r^t_2, \lambda^t_2) \right\rangle, \qquad \lambda^{t+1}_1 = \left(1/\alpha^t_2 - 1\right) \lambda^t_2, \qquad r^{t+1}_1 = \left(x^t_2/\alpha^t_2 - r^t_2\right)\left(\lambda^t_2/\lambda^{t+1}_1\right)$$
Here
$$x^t_2 = \arg\min_{\hat x}\, \mathbb{E}\,\|x - \hat x\|_2^2 \quad \text{s.t. } \hat x = \hat W y + \hat b,$$
where $x \sim \mathcal{N}\!\left(r^t_2, (\lambda^t_2)^{-1} I\right)$ and $p(y \mid x) = \mathcal{N}(y;\, Ax,\, \sigma_w^2 I)$.
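A sketch of one way to implement the alternation above for the standard linear model. The denoiser is soft thresholding at level thresh_scale$/\sqrt{\lambda_1}$ (an assumption; the slide leaves $\eta$ generic), and $g$ is the LMMSE posterior mean, with its divergence $\langle g' \rangle$ computed from the trace of the posterior covariance. The initialization and the small numerical clamp are further assumptions.

```python
def vamp(A, y, sigma_w, n_iters=30, thresh_scale=1.0):
    """VAMP for the standard linear model, following the slide's two steps."""
    n, N = A.shape
    gw = 1.0 / sigma_w**2                        # measurement precision
    AtA, Aty = A.T @ A, A.T @ y
    r1, lam1 = np.zeros(N), 1e-3                 # initial extrinsic mean / precision
    x1 = np.zeros(N)
    for _ in range(n_iters):
        # denoising step
        t = thresh_scale / np.sqrt(lam1)
        x1 = soft_threshold(r1, t)
        a1 = max(np.mean(np.abs(r1) > t), 1e-9)  # alpha_1 = <eta'>
        lam2 = (1.0 / a1 - 1.0) * lam1
        r2 = (x1 / a1 - r1) * (lam1 / lam2)
        # LMMSE step
        C = np.linalg.inv(gw * AtA + lam2 * np.eye(N))
        x2 = C @ (gw * Aty + lam2 * r2)          # g(r2, lam2): posterior mean
        a2 = lam2 * np.trace(C) / N              # alpha_2 = <g'>
        lam1 = (1.0 / a2 - 1.0) * lam2
        r1 = (x2 / a2 - r2) * (lam2 / lam1)
    return x1
```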
VAMP
- VAMP generalizes AMP to right-rotationally invariant $A$: the distribution of $A$ is identical to that of $A V_0$ for any orthogonal $V_0$ ($V_0^T V_0 = V_0 V_0^T = I$)
- VAMP alternates between denoising (shrinkage) and LMMSE steps with Onsager corrections
- VAMP has per-iteration cost similar to AMP
- VAMP can be extended to generalized linear models
VAMP AWGN
VAMP Probit
References I
Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202.
Candès, E. J. and Wakin, M. B. (2008). An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21-30.
Donoho, D. and Tanner, J. (2009). Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 367(1906):4273-4293.
Donoho, D. L., Maleki, A., and Montanari, A. (2009). Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914-18919.
References II
Donoho, D. L., Maleki, A., and Montanari, A. (2010a). Message passing algorithms for compressed sensing: I. Motivation and construction. In 2010 IEEE Information Theory Workshop on Information Theory (ITW 2010, Cairo), pages 1-5.
Donoho, D. L., Maleki, A., and Montanari, A. (2010b). Message passing algorithms for compressed sensing: II. Analysis and validation. In 2010 IEEE Information Theory Workshop on Information Theory (ITW 2010, Cairo), pages 1-5.
Rangan, S. (2011). Generalized approximate message passing for estimation with random linear mixing. In 2011 IEEE International Symposium on Information Theory Proceedings, pages 2168-2172.
Rangan, S., Schniter, P., and Fletcher, A. K. (2017). Vector approximate message passing. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 1588-1592.
References III
Schniter, P., Rangan, S., and Fletcher, A. K. (2016). Vector approximate message passing for the generalized linear model. In 2016 50th Asilomar Conference on Signals, Systems and Computers, pages 1525-1529.
Thouless, D. J., Anderson, P. W., and Palmer, R. G. (1977). Solution of 'solvable model of a spin glass'. The Philosophical Magazine: A Journal of Theoretical, Experimental and Applied Physics, 35(3):593-601.