Approximate Message Passing Algorithms


November 4, 2017

Outline: AMP (Donoho et al., 2009, 2010a): motivations, derivation from a message-passing perspective, limitations, extensions. Generalized Approximate Message Passing (GAMP) (Rangan, 2011). Vector Approximate Message Passing (VAMP) (Schniter et al., 2016; Rangan et al., 2017).

Compressed Sensing. Most of the data is redundant, so storing and transmitting it in full is enormously wasteful.

Compressed Sensing. $y \in \mathbb{R}^n$: measurement vector; $x \in \mathbb{R}^N$: unknown sparse signal vector; $A \in \mathbb{R}^{n \times N}$: incoherent measurement matrix with $n < N$; $w \in \mathbb{R}^n$: measurement noise.
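As a concrete reference point for the algorithms below, here is a minimal NumPy sketch generating a synthetic instance of this setup; the dimensions, sparsity level, and noise level are illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and sparsity level (not taken from the slides)
n, N, k = 250, 500, 25                 # measurements, signal length, nonzeros

A = rng.standard_normal((n, N)) / np.sqrt(n)    # i.i.d. Gaussian, roughly unit-norm columns
x_true = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x_true[support] = rng.standard_normal(k)        # sparse signal

sigma_w = 0.01
y = A @ x_true + sigma_w * rng.standard_normal(n)   # noisy measurements y = Ax + w
```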

Compressed Sensing. Noiseless ($y = Ax$):
$$\min_x \|x\|_p \quad \text{subject to } y = Ax. \quad (1)$$
Noisy ($y = Ax + w$):
$$\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_p. \quad (2)$$

Compressed Sensing. $p = 2$: $\ell_2$ minimization mostly gives unsatisfactory results, since real-world signals are often compressible. $p = 0$: $\ell_0$ minimization gives accurate results but has the computational disadvantage of being NP-hard. $p = 1$: $\ell_1$ minimization is computationally tractable and comes with theoretical upper bounds on the reconstruction error. We focus on the $\ell_1$ case below.

Disadvantages of LP methods. Convex optimization (LP-based) methods yield accurate reconstructions for (1) (Candès and Wakin, 2008), but realistic modern problems in spectroscopy and medical imaging demand reconstructions of objects with tens of thousands or even millions of unknowns, and existing convex optimization algorithms are too slow at that scale.

Iterative Shrinkage/Thresholding Algorithm (ISTA). Notice that $\nabla\!\left(\tfrac{1}{2}\|y - Ax\|_2^2\right) = A^T(Ax - y)$. Let $\eta(\cdot)$ be a scalar soft-thresholding function (applied to vectors component-wise). The ISTA updates for (2) are
$$z^t = y - Ax^t, \qquad x^{t+1} = \eta\!\left(x^t + \tfrac{1}{\rho} A^T z^t;\ \lambda\right). \quad (3)$$

ISTA. Low per-iteration cost: matrix-vector multiplications. Convergence rate: $O(1/t)$.
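A minimal NumPy sketch of the ISTA iteration (3), assuming $\eta$ is the soft-thresholding denoiser and taking $\rho$ as the largest eigenvalue of $A^T A$; note that the proximal-gradient step thresholds at $\lambda/\rho$ (the slides absorb this factor into $\lambda$).

```python
import numpy as np

def soft_threshold(u, thr):
    """Component-wise soft thresholding eta(u; thr) = sign(u) * max(|u| - thr, 0)."""
    return np.sign(u) * np.maximum(np.abs(u) - thr, 0.0)

def ista(A, y, lam, n_iter=500):
    """ISTA for min_x 0.5*||y - Ax||_2^2 + lam*||x||_1, i.e. problem (2) with p = 1."""
    rho = np.linalg.norm(A, 2) ** 2    # largest eigenvalue of A^T A (gradient Lipschitz constant)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = y - A @ x                                       # residual
        x = soft_threshold(x + (A.T @ z) / rho, lam / rho)  # prox step; threshold is lam/rho
    return x
```

With the synthetic instance from the earlier sketch, `x_hat = ista(A, y, lam=0.01)` would produce an estimate of `x_true` (the value of `lam` is an arbitrary illustrative choice).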

FISTA. The FISTA (Beck and Teboulle, 2009) update:
$$z^t = y - Ax^t, \qquad u^{t+1} = \eta\!\left(x^t + \tfrac{1}{\rho} A^T z^t;\ \lambda\right), \qquad x^{t+1} = u^{t+1} + \frac{s_t - 1}{s_{t+1}}\left(u^{t+1} - u^t\right), \quad (4)$$
with $s_0 = 0$, $s_{t+1} = \bigl(1 + \sqrt{1 + 4s_t^2}\bigr)/2$ (not unique). Convergence rate: $O(1/t^2)$. Are there faster algorithms when $A$ is a large random matrix (e.g., i.i.d. Gaussian)?
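The same sketch with the Nesterov extrapolation step of (4); it assumes the `soft_threshold` helper from the ISTA sketch above and uses the common initialization $s = 1$ (the slides write $s_0 = 0$; the momentum sequence is not unique).

```python
def fista(A, y, lam, n_iter=500):
    """FISTA (Beck and Teboulle, 2009) for the same l1 problem as ISTA."""
    rho = np.linalg.norm(A, 2) ** 2
    x = u_prev = np.zeros(A.shape[1])
    s = 1.0
    for _ in range(n_iter):
        z = y - A @ x
        u = soft_threshold(x + (A.T @ z) / rho, lam / rho)   # same prox step as ISTA
        s_next = (1.0 + np.sqrt(1.0 + 4.0 * s * s)) / 2.0
        x = u + ((s - 1.0) / s_next) * (u - u_prev)          # extrapolation (momentum) step
        u_prev, s = u, s_next
    return u_prev
```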

Markov Random Fields. Suppose we are modeling selection preferences among persons A, B, C, D. By the Hammersley-Clifford theorem, we can model the joint probability (for $p > 0$) as
$$p(A, B, C, D) = \frac{1}{Z}\,\phi(A, B)\,\phi(B, C)\,\phi(C, D)\,\phi(D, A),$$
where $Z$ is the normalization constant (an MRF).

Markov Random Fields. Marginal distributions: find $p(A)$, $p(B)$, $p(C)$, $p(D)$. Maximizer: find $\arg\max_{a,b,c,d} p(a, b, c, d)$. With $k$ nodes each taking $s$ values, brute-force methods require $O(s^k)$ computations!

Message Passing (Belief Propagation)

Message Passing. Message from node $i$ to node $j$: $m_{i \to j}(x_j)$. Messages are similar to likelihoods: non-negative (they don't have to sum to 1). A high value of $m_{i \to j}(x_j)$ indicates that node $i$ believes the marginal value $p(x_j)$ to be high. Usually all messages are initialized to 1 (or to random positive values).

Message Passing. Sum-product message passing:
$$m_{i \to j}(x_j) = \sum_{x_i} \phi(x_i, x_j) \prod_{l \in N(i)\setminus j} m_{l \to i}(x_i),$$
e.g., $m_{B \to D}(x_D) = \sum_{x_B} \phi(x_B, x_D)\, m_{A \to B}(x_B)\, m_{C \to B}(x_B)$. Marginal distribution: $p(x_i) \propto \prod_{l \in N(i)} m_{l \to i}(x_i)$.
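A minimal sketch of sum-product message passing on a small tree whose structure matches the example message $m_{B \to D}$ above (edges A-B, C-B, B-D); the potentials are arbitrary illustrative values. On a tree the resulting marginals are exact (verified here against brute force); on a loopy graph such as the four-node cycle from the MRF slide, the same updates give loopy belief propagation, which is only approximate.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
S = 3                                           # each variable takes S states
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("C", "B"), ("B", "D")]    # small tree matching the m_{B->D} example

# Random positive pairwise potentials phi[(i, j)][x_i, x_j]
phi = {e: rng.uniform(0.5, 2.0, size=(S, S)) for e in edges}

def potential(i, j):
    """phi(x_i, x_j), oriented so rows index x_i and columns index x_j."""
    return phi[(i, j)] if (i, j) in phi else phi[(j, i)].T

def neighbors(i):
    return [j for e in edges for j in e if i in e and j != i]

# Sum-product: m[i->j](x_j) = sum_{x_i} phi(x_i, x_j) * prod_{l in N(i)\j} m[l->i](x_i)
messages = {(i, j): np.ones(S) for i in nodes for j in neighbors(i)}
for _ in range(10):                              # enough sweeps to converge on a tree
    for (i, j) in list(messages):
        incoming = np.ones(S)
        for l in neighbors(i):
            if l != j:
                incoming *= messages[(l, i)]
        messages[(i, j)] = potential(i, j).T @ incoming

def marginal(i):
    b = np.ones(S)
    for l in neighbors(i):
        b *= messages[(l, i)]
    return b / b.sum()

# Brute-force check over all S**4 joint configurations
joint = np.zeros((S,) * 4)
for idx in itertools.product(range(S), repeat=4):
    assignment = dict(zip(nodes, idx))
    p = 1.0
    for (i, j) in edges:
        p *= potential(i, j)[assignment[i], assignment[j]]
    joint[idx] = p
joint /= joint.sum()

print(marginal("D"))                 # sum-product marginal of D
print(joint.sum(axis=(0, 1, 2)))     # exact marginal of D (should agree)
```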

Message Passing

Message Passing. Noiseless:
$$p_1(x_1, \ldots, x_N) \propto \frac{1}{Z} \prod_{i=1}^{N} \exp(-\beta |x_i|) \prod_{j=1}^{n} \delta_{\{y_j = (Ax)_j\}}. \quad (5)$$
Noisy:
$$p_2(x_1, \ldots, x_N) \propto \frac{1}{Z} \prod_{i=1}^{N} \exp(-\beta |x_i|) \prod_{j=1}^{n} \exp\!\left\{-\frac{\beta}{2}\bigl[y_j - (Ax)_j\bigr]^2\right\}. \quad (6)$$
Find the marginal distributions $p_1(x_i)$ and $p_2(x_i)$ as $\beta \to \infty$.

Approximate Message Passing. Construct an undirected graphical model (last slide). Take the large-system limit (thermodynamic limit: $N \to \infty$ with $\delta = n/N$ fixed) and the large-$\beta$ limit (low-temperature limit). From message passing to AMP via the Onsager correction.

Approximate Message Passing. Message passing for (5):
$$z^t_{a \to i} = y_a - \sum_{j \neq i} A_{aj}\, x^t_{j \to a}, \qquad x^{t+1}_{i \to a} = \eta\!\Big(\sum_{b \neq a} A_{bi}\, z^t_{b \to i};\ \tau^t\Big), \qquad \tau^{t+1} = \frac{\tau^t}{N\delta} \sum_{i=1}^{N} \eta'\!\Big(\sum_{b} A_{bi}\, z^t_{b \to i};\ \tau^t\Big). \quad (7)$$
$O(nN)$ messages are passed per iteration!

Approximate Message Passing. AMP for (5):
$$z^t = y - Ax^t + \frac{1}{\delta} z^{t-1} \left\langle \eta'\!\left(A^T z^{t-1} + x^{t-1};\ \tau^{t-1}\right)\right\rangle, \qquad x^{t+1} = \eta\!\left(A^T z^t + x^t;\ \tau^t\right), \qquad \tau^t = \frac{\tau^{t-1}}{\delta} \left\langle \eta'\!\left(A^T z^{t-1} + x^{t-1};\ \tau^{t-1}\right)\right\rangle, \quad (8)$$
where $\langle \cdot \rangle$ denotes the average over components. Efficient: vectorized updates. Parameter free: the threshold is updated recursively (noiseless problem, no $\lambda$).
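A minimal sketch of the AMP iteration (8) with the soft-thresholding denoiser, assuming the `soft_threshold` helper from the ISTA sketch; the data-driven threshold $\lambda^t = \alpha\,\hat{\sigma}^t$ with $(\hat{\sigma}^t)^2 = \|z^t\|_2^2/n$ follows the rule used in the demo later in the talk rather than the recursive $\tau^t$ update.

```python
def amp(A, y, alpha=1.0, n_iter=50):
    """AMP with soft thresholding and the data-driven threshold lam_t = alpha * ||z||_2 / sqrt(n)."""
    n, N = A.shape
    x = np.zeros(N)
    z = y.copy()
    for _ in range(n_iter):
        lam = alpha * np.linalg.norm(z) / np.sqrt(n)   # iteration-dependent threshold
        r = x + A.T @ z                                # pseudo-data A^T z^t + x^t
        x_new = soft_threshold(r, lam)
        onsager = (np.count_nonzero(x_new) / n) * z    # (1/delta) <eta'> z^{t-1} = (||x||_0 / n) z^{t-1}
        z = y - A @ x_new + onsager                    # Onsager-corrected residual
        x = x_new
    return x
```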

ISTA vs. AMP (Noisy). Recall that $\frac{1}{\delta} z^{t-1} \left\langle \eta'\!\left(x^{t-1} + A^T z^{t-1};\ \tau^{t-1}\right)\right\rangle = \frac{1}{n}\|x^t\|_0\, z^{t-1}$.
ISTA: $z^t = y - Ax^t$, $\quad x^{t+1} = \eta\!\left(x^t + \tfrac{1}{\rho} A^T z^t;\ \lambda\right)$.
AMP: $z^t = y - Ax^t + \tfrac{1}{n}\|x^t\|_0\, z^{t-1}$, $\quad x^{t+1} = \eta\!\left(x^t + A^T z^t;\ \lambda^t\right)$.
Differences: step size, momentum (Onsager) term, and iteration-dependent thresholding $\lambda^t = \lambda + \tau^t$ with $\tau^t$ updated similarly; see Donoho et al. (2010a).

Onsager Correction (Thouless et al., 1977)

AMP Demo. $n = 500$, $N = 1000$, $\|x\|_0 = 50$; $A_{ij}$ i.i.d. $N(0, 1)$ scaled by $1/\sqrt{n}$; $w$ additive white Gaussian noise (AWGN) with SNR 40 dB; $\lambda = \sqrt{2\log N}\,\hat{\sigma}$; $\lambda^t = \alpha\,\hat{\sigma}^t$ with $\alpha = 1$ and $(\hat{\sigma}^t)^2 = \|z^t\|_2^2 / n$.
Figure: NMSE ($\log_{10}$) versus iterations ($\log_{10}$) for ISTA, FISTA, OWL-QN, and AMP.

AMP Demo. Figure: $A = N(0, [\rho_0])/\sqrt{n}$ with $\rho_0 = 0, 0.1, 0.15$ (top) and $0.17, 0.18, 0.20$ (bottom); each panel plots NMSE ($\log_{10}$) versus iterations ($\log_{10}$) for ISTA, FISTA, OWL-QN, and AMP.

AMP Demo. Figure: $\rho_0 = 0.2, 0.3, 0.5$ with step size $s = 0.95, 0.9, 0.5$; each panel plots NMSE ($\log_{10}$) versus iterations ($\log_{10}$) for ISTA, FISTA, OWL-QN, and AMP. Line search?

State Evolution. The AMP iterates $r^t \triangleq x^t + A^T z^t = x + N(0, \sigma_t^2 I_{N \times N})$ and $\varepsilon_t \triangleq \frac{1}{N}\mathbb{E}\bigl(\|x^t - x\|_2^2\bigr)$ obey a scalar recursion (Donoho et al., 2010b):
$$\sigma_t^2 = \sigma_w^2 + \frac{N}{n}\,\varepsilon_t, \qquad \varepsilon_{t+1} = \frac{1}{N}\,\mathbb{E}\Bigl(\bigl\|\eta\bigl(x + N(0, \sigma_t^2 I_{N \times N});\ \lambda^t\bigr) - x\bigr\|_2^2\Bigr).$$
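A Monte Carlo sketch of this scalar recursion, assuming the soft-thresholding denoiser, the threshold rule $\lambda^t = \alpha\,\sigma_t$, and an illustrative Bernoulli-Gaussian prior on the signal entries (a modeling choice, not specified on the slide); it also assumes the `soft_threshold` helper from the ISTA sketch.

```python
def state_evolution(sigma_w2, delta, rho=0.05, alpha=1.0, n_iter=30, n_mc=200_000):
    """Monte Carlo evaluation of the scalar state-evolution recursion:
    sigma_t^2 = sigma_w^2 + eps_t / delta,
    eps_{t+1} = E[(eta(X + sigma_t Z; lam_t) - X)^2],  Z ~ N(0, 1)."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(n_mc) * (rng.random(n_mc) < rho)   # Bernoulli-Gaussian signal entries
    eps = np.mean(x ** 2)                   # MSE of the all-zero initialization
    history = []
    for _ in range(n_iter):
        sigma2 = sigma_w2 + eps / delta
        lam = alpha * np.sqrt(sigma2)       # threshold proportional to the effective noise level
        r = x + np.sqrt(sigma2) * rng.standard_normal(n_mc)
        eps = np.mean((soft_threshold(r, lam) - x) ** 2)
        history.append(eps)
    return history
```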

Phase Transition. Figure: Observed phase transitions of reconstruction algorithms.

Universality of Phase Transition Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing (Donoho and Tanner, 2009)

Limitations of AMP. $y$ must be a linear transformation of the signal $x$ with additive noise, and $A$ must be large and i.i.d. (sub-)Gaussian.

Generalized approximate message passing. Recover the sparse signal $x$ given $A$ and measurements $y$, where $v = Ax$ and $y \sim p(y \mid v)$, with $p(y \mid v)$ capturing the non-Gaussianity. Examples: binary classification (probit, logit models); Poisson measurements (photon-limited imaging, neural spike models).

GAMP vs. FISTA n = 200, N = 500, y = Ax + w (w complex noise vector)

GAMP SVM n = N /3, N = 512, y = (1/2)[sgn(Ax + w) + 1]

GAMP Mixed Gaussian. $n = 500$, $N = 1000$, $y = Ax + w$ with $w \sim \phi\, N(\mu_1, 1) + (1 - \phi)\, N(\mu_2, 1)$.

VAMP Standard Linear Model.
Denoising:
$$x_1^t = \eta(r_1^t, \lambda_1^t), \quad \alpha_1^t = \bigl\langle \eta'(r_1^t, \lambda_1^t)\bigr\rangle, \quad \lambda_2^t = (1/\alpha_1^t - 1)\,\lambda_1^t, \quad r_2^t = \bigl(x_1^t/\alpha_1^t - r_1^t\bigr)\bigl(\lambda_1^t/\lambda_2^t\bigr).$$
LMMSE estimation:
$$x_2^t = g(r_2^t, \lambda_2^t), \quad \alpha_2^t = \bigl\langle g'(r_2^t, \lambda_2^t)\bigr\rangle, \quad \lambda_1^{t+1} = (1/\alpha_2^t - 1)\,\lambda_2^t, \quad r_1^{t+1} = \bigl(x_2^t/\alpha_2^t - r_2^t\bigr)\bigl(\lambda_2^t/\lambda_1^{t+1}\bigr),$$
where $x_2^t = \arg\min_{\hat{x}} \mathbb{E}\|x - \hat{x}\|_2^2$ subject to $\hat{x} = \hat{W} y + \hat{b}$, with $x \sim N\!\bigl(r_2^t, (\lambda_2^t)^{-1} I\bigr)$ and $p(y \mid x) = N(y;\, Ax,\, \sigma_w^2 I)$.
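A structural NumPy sketch following the update order on the slide, not a tuned implementation: $\lambda_1, \lambda_2$ are treated as precisions, the denoiser is soft-thresholding at $\theta/\sqrt{\lambda_1}$ (an illustrative choice), and the LMMSE step $g$ is the posterior mean of $x$ under $x \sim N(r_2, \lambda_2^{-1} I)$ and $y = Ax + N(0, \sigma_w^2 I)$. It assumes the `soft_threshold` helper from the ISTA sketch.

```python
def vamp(A, y, sigma_w2, theta=1.0, lam1=1.0, n_iter=30):
    """Structural sketch of the VAMP iteration (denoising + LMMSE with Onsager-style corrections)."""
    n, N = A.shape
    AtA, Aty = A.T @ A, A.T @ y
    r1 = A.T @ y                                     # crude initialization
    x2 = np.zeros(N)
    for _ in range(n_iter):
        # Denoising step
        x1 = soft_threshold(r1, theta / np.sqrt(lam1))
        alpha1 = np.clip(np.count_nonzero(x1) / N, 1e-6, 1 - 1e-6)   # <eta'(r1, lam1)>
        lam2 = (1.0 / alpha1 - 1.0) * lam1
        r2 = (x1 / alpha1 - r1) * (lam1 / lam2)
        # LMMSE estimation step
        C = np.linalg.inv(AtA / sigma_w2 + lam2 * np.eye(N))
        x2 = C @ (Aty / sigma_w2 + lam2 * r2)
        alpha2 = np.clip(lam2 * np.trace(C) / N, 1e-6, 1 - 1e-6)     # <g'(r2, lam2)>
        lam1_new = (1.0 / alpha2 - 1.0) * lam2
        r1 = (x2 / alpha2 - r2) * (lam2 / lam1_new)
        lam1 = lam1_new
    return x2
```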

VAMP. VAMP generalizes AMP to right-rotationally invariant $A$: the distribution of $A$ is identical to that of $AV_0$ for any orthogonal $V_0$ ($V_0^T V_0 = V_0 V_0^T = I$). VAMP alternates between denoising (shrinkage) and LMMSE steps, each with an Onsager correction. VAMP has per-iteration cost similar to AMP. VAMP can be extended to generalized linear models.

VAMP AWGN

VAMP Probit

References I
Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202.
Candès, E. J. and Wakin, M. B. (2008). An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21-30.
Donoho, D. and Tanner, J. (2009). Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 367(1906):4273-4293.
Donoho, D. L., Maleki, A., and Montanari, A. (2009). Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914-18919.

References II
Donoho, D. L., Maleki, A., and Montanari, A. (2010a). Message passing algorithms for compressed sensing: I. Motivation and construction. In 2010 IEEE Information Theory Workshop (ITW 2010, Cairo), pages 1-5.
Donoho, D. L., Maleki, A., and Montanari, A. (2010b). Message passing algorithms for compressed sensing: II. Analysis and validation. In 2010 IEEE Information Theory Workshop (ITW 2010, Cairo), pages 1-5.
Rangan, S. (2011). Generalized approximate message passing for estimation with random linear mixing. In 2011 IEEE International Symposium on Information Theory Proceedings, pages 2168-2172.
Rangan, S., Schniter, P., and Fletcher, A. K. (2017). Vector approximate message passing. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 1588-1592.

References III
Schniter, P., Rangan, S., and Fletcher, A. K. (2016). Vector approximate message passing for the generalized linear model. In 2016 50th Asilomar Conference on Signals, Systems and Computers, pages 1525-1529.
Thouless, D. J., Anderson, P. W., and Palmer, R. G. (1977). Solution of 'solvable model of a spin glass'. The Philosophical Magazine: A Journal of Theoretical Experimental and Applied Physics, 35(3):593-601.