
Turbo-AMP: A Graphical-Models Approach to Compressive Inference
Phil Schniter (with support from NSF CCF-1018368 and DARPA/ONR N66001-10-1-4090)
June 27, 2012

Outline:
1. Motivation: (a) the need for nonlinear inference schemes, (b) some problems of interest.
2. The focus of this talk: (a) compressive sensing (in theory), (b) compressive sensing (in practice).
3. Recent approaches to these respective problems: (a) approximate message passing (AMP), (b) turbo-AMP.
4. Illustrative applications of turbo-AMP: (a) compressive imaging, (b) compressive tracking, (c) communication over sparse channels.

Motivations for nonlinear inference: Linear inference (e.g., matched filtering, linear equalization, least-squares, Kalman filtering) has been extremely popular in engineering and statistics due to its computational efficiency and well-developed theory. Indeed, linear inference is optimal in problems well modeled by linear observations and Gaussian signal and noise. In many cases, though, linear inference is not good enough. The signal or noise may be non-Gaussian, or the observation mechanism may be nonlinear, in which case linear inference is suboptimal. For example, the observations may be compressed (i.e., sampled below the Nyquist rate), in which case nonlinear inference becomes essential. But is there an accurate and computationally efficient framework for high-dimensional nonlinear inference? For a wide (and expanding) range of problems, yes: one based on belief propagation, or message passing.

A few problems of interest:
Linear additive: y = Ax + w with A known, x ~ p_x, w ~ p_w. Examples: communications, imaging, radar, compressive sensing (CS).
Generalized linear: y ~ p(y|z) for z = Ax with A known, x ~ p_x. Examples: quantization, phase retrieval, classification.
Generalized bilinear: Y ~ p(Y|Z) for Z = AX with A ~ p_A, X ~ p_X. Examples: dictionary learning, matrix completion, robust PCA.
Parametric nonlinear: y ~ p(y|z) for z = A(θ)x with A(·) known, θ ~ p_θ, x ~ p_x. Examples: frequency estimation, calibration, autofocus.
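To make one of these classes concrete, the following minimal NumPy sketch draws a Bernoulli-Gaussian signal, forms z = Ax, and produces both linear-additive and generalized-linear observations; the dimensions and the choice of 1-bit (sign) quantization as the output channel p(y|z) are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 256, 128, 16           # signal length, measurements, sparsity (illustrative)

# Sparse signal x ~ p_x (Bernoulli-Gaussian) and a known i.i.d. Gaussian A
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.normal(size=K)
A = rng.normal(size=(M, N)) / np.sqrt(M)
nu_w = 1e-2

# Linear additive model: y = A x + w with w ~ N(0, nu_w I)
y_linear = A @ x + np.sqrt(nu_w) * rng.normal(size=M)

# Generalized linear model: y ~ p(y|z) for z = A x,
# here with 1-bit (sign) quantization as one instance of p(y|z)
z = A @ x
y_onebit = np.sign(z + np.sqrt(nu_w) * rng.normal(size=M))
```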

Compressive sensing (in theory): Say an N-length signal of interest u is sparse or compressible in a known orthonormal basis Ψ (e.g., a wavelet, Fourier, or identity basis): u = Ψx, where x has only K << N large coefficients. We observe M < N noisy linear measurements y: y = Φu + w = ΦΨx + w = Ax + w, from which we want to recover u (or, equivalently, x). If A is well behaved (e.g., satisfies the RIP), the sparsity of x can be exploited for provably accurate reconstruction with computationally efficient algorithms. Caution: one usually needs to tune an algorithmic parameter that balances sparsity against data fidelity, and if cross-validation is used, this can be expensive! Such an A results (with high probability) when Φ is constructed randomly (e.g., i.i.d. Gaussian) or semi-randomly (e.g., from random rows of a fixed unitary matrix).
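A minimal NumPy sketch of the measurement model above, showing both a fully random Φ (i.i.d. Gaussian) and a semi-random Φ (random rows of a unitary DFT); the sizes, the identity choice for Ψ, and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 512, 128, 20                       # illustrative sizes with K << N and M < N

# Coefficients x, sparse in a known orthonormal basis Psi (identity here for simplicity)
Psi = np.eye(N)
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.normal(size=K)
u = Psi @ x

# (a) fully random Phi: i.i.d. Gaussian entries
Phi_rand = rng.normal(size=(M, N)) / np.sqrt(M)

# (b) semi-random Phi: random rows of a fixed unitary matrix (here the DFT)
F = np.fft.fft(np.eye(N)) / np.sqrt(N)       # unitary N-point DFT
Phi_dft = F[rng.choice(N, M, replace=False), :]

# Noisy measurements y = Phi u + w = Phi Psi x + w = A x + w
nu_w = 1e-3
A = Phi_rand @ Psi
y = A @ x + np.sqrt(nu_w) * rng.normal(size=M)
```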

Compressive sensing (in practice): Real-world applications usually exhibit additional structure, both in the support of the large signal coefficients (e.g., block or tree structure) and among the values of the large signal coefficients (e.g., correlation, coherence), and exploiting this additional structure may be essential. But exploiting it also complicates tuning, since many more parameters are involved in the model, and mismatch in these parameters can severely bias the signal estimate. Also, many real-world applications are not content with point estimates, since the estimates may later be used for decision-making, control, etc., in which case confidence intervals, or preferably the full posterior distribution on the unknowns, are needed.

Solving the theoretical CS problem, AMP: Approximate message passing (AMP) [Donoho/Maleki/Montanari 2009/10] refers to a family of signal reconstruction algorithms designed to solve the theoretical CS problem, inspired by principled approximations of belief propagation. AMP highlights: Very computationally efficient: a form of iterative thresholding. Very high performance (for sufficiently large N, M): can be configured to produce near-MAP or near-MMSE estimates. Admits a rigorous asymptotic analysis [Bayati/Montanari 2010, Rangan 2010] (under i.i.d. Gaussian A and N, M → ∞ with fixed N/M): AMP follows a deterministic state-evolution trajectory. Agrees with analysis under the (non-rigorous) replica method. Agrees with belief propagation on sparse matrices, where the marginal posterior distributions are known to be asymptotically optimal.
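For intuition, here is a minimal sketch of a soft-thresholding AMP iteration in NumPy. It is a simplified variant in the spirit of Donoho/Maleki/Montanari, not the exact algorithm or tuning from the references; the threshold scaling alpha and the iteration count are illustrative assumptions.

```python
import numpy as np

def soft(v, thresh):
    """Soft-thresholding denoiser eta(v; thresh) and its average derivative."""
    out = np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)
    avg_deriv = np.mean(np.abs(v) > thresh)
    return out, avg_deriv

def amp_lasso(y, A, alpha=1.4, n_iter=30):
    """Minimal soft-thresholding AMP iteration (illustrative sketch)."""
    M, N = A.shape
    x = np.zeros(N)
    z = y.copy()
    for _ in range(n_iter):
        sigma = np.linalg.norm(z) / np.sqrt(M)        # empirical effective noise level
        x, eta_prime = soft(x + A.T @ z, alpha * sigma)
        z = y - A @ x + (N / M) * eta_prime * z       # residual with Onsager correction
    return x
```

With y and A from the previous sketch, x_hat = amp_lasso(y, A) returns an estimate of x; the (N / M) * eta_prime * z term is the Onsager correction that distinguishes AMP from plain iterative soft thresholding.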

Solving practical compressive inference problems, turbo-AMP: The Bayesian graphical-model framework is a flexible and powerful way to incorporate and exploit probabilistic structure. Simple sparsity with known model parameters: [factor graph with coefficient nodes x_1, ..., x_5, measurement factors for y_1, y_2, y_3, and per-coefficient prior factors, all handled by AMP] Here the posterior factors as p(x|y) ∝ ∏_{m=1}^{M} p(y_m|x) ∏_{n=1}^{N} p(x_n).

Solving practical compressive inference problems, turbo-AMP: The Bayesian graphical-model framework is a flexible and powerful way to incorporate and exploit probabilistic structure. Simple sparsity with unknown model parameters: [the same factor graph, augmented with hyperparameter nodes ν_w, ν_x, and λ attached to the measurement and prior factors] Alternatively, treat (ν_w, ν_x, λ) as deterministic unknowns and do ML estimation via EM.

Solving practical compressive inference problems, turbo-AMP: The Bayesian graphical-model framework is a flexible and powerful way to incorporate and exploit probabilistic structure. Structured sparsity with known model parameters: [factor graph in which each coefficient node x_n is coupled to a hidden support node s_n; AMP handles the measurement side and a Markov-chain (MC) block handles the support side] For these problems, AMP is used as a soft-input soft-output inference block, like a channel decoder in a turbo receiver [Schniter CISS 2010].

Solving practical compressive inference problems, turbo-AMP: The Bayesian graphical-model framework is a flexible and powerful way to incorporate and exploit probabilistic structure. Structured sparsity with unknown model parameters: [the same factor graph, augmented with hyperparameter nodes ν_w, ν_x, λ, and ρ on the measurement, coefficient, and support factors] For these problems, AMP is used as a soft-input soft-output inference block, like a channel decoder in a turbo receiver [Schniter CISS 2010].

So what approximations lead to AMP? Assume the sum-product form of AMP. Then...
1. The message from the y_i node to the x_j node is
p_{i→j}(x_j) ∝ ∫_{ {x_r}_{r≠j} } N(y_i; Σ_r a_ir x_r, ν_w) ∏_{r≠j} p_{i←r}(x_r)
≈ ∫_{z_i} N(y_i; z_i, ν_w) N(z_i; ẑ_i(x_j), ν^z_i(x_j))   (via the central limit theorem).
To compute ẑ_i(x_j) and ν^z_i(x_j), the means and variances of {p_{i←r}}_{r≠j} suffice; thus, Gaussian message passing! Remaining problem: there are 2MN messages to compute (too many!).
2. Exploiting the similarity among the messages {p_{i→j}}_{i=1}^{M}, AMP employs a Taylor-series approximation of their difference whose error vanishes as M → ∞ for dense A (and similarly for {p_{i←j}}_{j=1}^{N} as N → ∞). In the end, only O(M + N) messages need to be computed.
[bipartite factor graph with measurement factors N(y_m; [Ax]_m, ν_w), coefficient nodes x_n, prior factors p_X(x_n), and messages p_{i→j}(x_j) passed along the edges]
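After the Gaussian approximation, the sum-product computations reduce to scalar operations at each coefficient node. As a rough sketch of what such a node computes under a Bernoulli-Gaussian prior, the following function (its name and parameterization are illustrative assumptions) returns the posterior mean and variance of x given a pseudo-measurement r = x + N(0, τ):

```python
import numpy as np

def bg_mmse_denoiser(r, tau, lam, nu_x):
    """Posterior mean/variance of x from r = x + N(0, tau), under the prior
    p(x) = (1 - lam) * delta(x) + lam * N(x; 0, nu_x)."""
    var_on = nu_x + tau
    # Log-likelihoods of r under the "active" and "inactive" hypotheses
    log_on = np.log(lam) - 0.5 * np.log(2 * np.pi * var_on) - 0.5 * r**2 / var_on
    log_off = np.log(1 - lam) - 0.5 * np.log(2 * np.pi * tau) - 0.5 * r**2 / tau
    pi = 1.0 / (1.0 + np.exp(log_off - log_on))       # P(x nonzero | r)
    mean_on = (nu_x / var_on) * r                     # posterior mean if active
    var_on_post = nu_x * tau / var_on                 # posterior variance if active
    x_hat = pi * mean_on
    x_var = pi * (var_on_post + mean_on**2) - x_hat**2
    return x_hat, x_var
```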

Extrinsic information transfer (EXIT) charts: EXIT charts, developed to predict the convergence of turbo decoding [ten Brink 2001], can help in understanding the interaction between the turbo-AMP inference blocks. [EXIT chart plotting AMP mutual information versus MC mutual information, both ranging from 0 to about 0.8] In this EXIT chart, we plot the mutual information between the true support pattern {s_n} and its AMP or MC estimate.

We will now detail three applications of the turbo-AMP approach: 1. Compressive imaging, with persistence-across-scales structure in the signal support. 2. Compressive tracking, with slow-variation structure in the signal's support and coefficients. 3. Communication over sparse channels, involving a generalized linear model and AMP embedded in a larger factor graph.

1) Compressive imaging: Wavelet representations of natural images are not only sparse but also exhibit persistence across scales. This can be efficiently modeled using a Bernoulli-Gaussian hidden Markov tree:
p(x_n | s_n) = s_n N(x_n; 0, ν_j) + (1 − s_n) δ(x_n) for s_n ∈ {0, 1}
p(s_n | s_m): state-transition matrix [ p00_j, 1 − p00_j ; 1 − p11_j, p11_j ] for n ∈ children(m) and j = level(n)
with measurements y = Φu + w = ΦΨx + w, w ~ N(0, ν_w I).
The model parameters ν_w and {ν_j, p00_j, p11_j}_{j=0}^{J} are treated as random with non-informative hyperpriors (Gamma and Beta, respectively), and the corresponding messages are approximated by passing only their means.
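A minimal sketch of drawing wavelet coefficients from such a Bernoulli-Gaussian hidden-Markov-tree prior over a binary tree; the function, tree depth, and parameter values are illustrative assumptions rather than the tuned model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bg_hmt(J=6, p_root=0.9, p11=0.9, p00=0.95, nu=None):
    """Draw support s and coefficients x from a Bernoulli-Gaussian hidden-Markov-tree
    prior over a binary tree with levels j = 0, ..., J-1."""
    if nu is None:
        nu = [2.0 ** (-j) for j in range(J)]          # level-dependent variances
    s_levels, x_levels = [], []
    for j in range(J):
        if j == 0:
            s = (rng.random(1) < p_root).astype(int)  # root support node
        else:
            s_par = np.repeat(s_levels[-1], 2)        # each parent has two children
            p_on = np.where(s_par == 1, p11, 1 - p00) # P(s_child = 1 | s_parent)
            s = (rng.random(s_par.size) < p_on).astype(int)
        x = s * rng.normal(scale=np.sqrt(nu[j]), size=s.size)
        s_levels.append(s)
        x_levels.append(x)
    return s_levels, x_levels

s_levels, x_levels = sample_bg_hmt()
```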

Comparison to other methods: Average over the Microsoft Research class-recognition database (591 images), for M = 5000 random measurements of 128 × 128 images (N = 16384):

Algorithm          | Authors (year)                          | Time  | NMSE
IRWL1              | Duarte, Wakin, Baraniuk (2008)          | 363 s | -14.4 dB
ModelCS            | Baraniuk, Cevher, Duarte, Hegde (2010)  | 117 s | -17.4 dB
Variational Bayes  | He, Chen, Carin (2010)                  | 107 s | -19.0 dB
MCMC               | He, Carin (2009)                        | 742 s | -20.1 dB
Turbo-AMP          | Som, Schniter (2010)                    |  51 s | -20.7 dB

Turbo-AMP beats the other approaches simultaneously in speed and accuracy!

Comparison to other methods: [example reconstructions: true image alongside ModelCS, IRWL1, Variational Bayes, MCMC, and Turbo-AMP reconstructions]

2) Compressive tracking / dynamic compressive sensing: Now say we observe the vector sequence
y^(t) = A^(t) x^(t) + w^(t), t = 1, ..., T, with w_m^(t) i.i.d. N(0, ν_w),
where the sparse x^(t) has coefficients and support that change slowly with time t. The slowly varying sparse signal can be modeled as Bernoulli-Gaussian with Gauss-Markov coefficient evolution and Markov-chain support evolution:
x_n^(t) = s_n^(t) θ_n^(t) for s_n^(t) ∈ {0, 1} and θ_n^(t) ∈ R
θ_n^(t) = (1 − α) θ_n^(t−1) + α v_n^(t), with v_n^(t) i.i.d. N(0, ν_v)
p(s_n^(t) | s_n^(t−1)): state-transition matrix [ p00, 1 − p00 ; 1 − p11, p11 ]
where here the model parameters {ν_w, ν_v, α, p00, p11} are treated as deterministic unknowns and learned using the EM algorithm. Note: our message-passing framework allows a unified treatment of tracking (i.e., causal estimation of {x^(t)}_{t=1}^{T}) and smoothing.
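A minimal sketch that simulates such a slowly varying sparse signal, with Markov-chain support and Gauss-Markov amplitude evolution; the function name, the initialization, and the parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dynamic_sparse(N=200, T=25, lam=0.1, p11=0.95, p00=None,
                            alpha=0.05, nu_v=1.0):
    """Simulate x^(t) = s^(t) * theta^(t) with Markov-chain support s and
    Gauss-Markov amplitudes theta, following the model on the slide."""
    if p00 is None:                        # pick p00 so the chain keeps sparsity near lam
        p00 = 1 - lam * (1 - p11) / (1 - lam)
    s = (rng.random(N) < lam).astype(int)              # initial support
    theta = rng.normal(scale=np.sqrt(nu_v), size=N)    # initial amplitudes
    X = np.zeros((T, N))
    for t in range(T):
        if t > 0:
            p_on = np.where(s == 1, p11, 1 - p00)      # P(s_n^(t) = 1 | s_n^(t-1))
            s = (rng.random(N) < p_on).astype(int)
            theta = (1 - alpha) * theta + alpha * rng.normal(scale=np.sqrt(nu_v), size=N)
        X[t] = s * theta
    return X

X = simulate_dynamic_sparse()
```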

Factor graph for compressive tracking/smoothing: [factor graph coupling, at each time t = 1, ..., T, the AMP block on measurements {y_m^(t)} and coefficients {x_n^(t)} to amplitude nodes {θ_n^(t)} and support nodes {s_n^(t)}, with Markov chains linking the θ_n^(t) and the s_n^(t) across time]

Near-optimal MSE performance: [TNMSE (dB) contours over δ = M/N (more measurements) and β = E[K]/M (more active coefficients) for the support-aware Kalman smoother, DCS-AMP, and BG-AMP] With a Bernoulli-Gaussian signal, the support-aware Kalman smoother provides an oracle bound on MSE. The proposed dynamic turbo-AMP performs very close to this bound, and much better than standard AMP (which does not exploit temporal structure).

Performance on a dynamic-MRI dataset: [Frames 1, 2, 5, and 10 of a dynamic MRI image sequence: true images and the basis pursuit, Modified-CS, and Turbo-AMP reconstructions]

Algorithm                    | TNMSE (dB) | Runtime
Basis Pursuit                | -17.22     | 47 min
Modified-CS [Vaswani/Lu 09]  | -34.30     | 7.39 hrs
Turbo-AMP (filter)           | -34.62     | 8.08 sec

3) Communication over sparse channels: Consider communicating reliably over a channel that is Rayleigh block-fading with block length B, frequency-selective with delay spread N, and has a sparse impulse response x with K < N non-zero coefficients, where both the coefficients and the support are unknown to the transmitter and receiver. The ergodic capacity is C(SNR) = ((B − K)/B) log(SNR) + o(1) at high SNR. Say that, with B-subcarrier OFDM, we use M pilot subcarriers, yielding the observations y_p = D_p Φ_p Ψ x + w_p with known diagonal pilot matrix D_p, selection matrix Φ_p, and DFT matrix Ψ. In compressed channel sensing (CCS), the channel x is estimated from y_p, and the resulting x̂ is used to decode the (B − M) data subcarriers y_d = D_d Φ_d Ψ x + w_d. RIP analyses prescribe the use of M = O(K polylog N) pilots, but communicating near capacity requires using no more than M = K pilots!
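A minimal NumPy sketch of the pilot observation model y_p = D_p Φ_p Ψ x + w_p; the uniformly spaced pilot placement, BPSK pilot symbols, and noise level are illustrative assumptions rather than the design from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, K, M = 1024, 256, 64, 256          # subcarriers, delay spread, sparsity, pilots

# Sparse channel impulse response x with K non-zero taps
x = np.zeros(N, dtype=complex)
taps = rng.choice(N, K, replace=False)
x[taps] = (rng.normal(size=K) + 1j * rng.normal(size=K)) / np.sqrt(2)

# Psi: first N columns of the B-point DFT (maps channel taps to subcarrier gains)
Psi = np.exp(-2j * np.pi * np.outer(np.arange(B), np.arange(N)) / B)

# Phi_p: selection of M pilot subcarriers; D_p: diagonal matrix of known pilot symbols
pilot_idx = np.linspace(0, B - 1, M).astype(int)       # uniformly spaced pilots
Phi_p = np.eye(B)[pilot_idx, :]
pilot_syms = rng.choice([-1.0, 1.0], size=M)           # e.g. BPSK pilots
D_p = np.diag(pilot_syms)

# Pilot observations y_p = D_p Phi_p Psi x + w_p
nu_w = 1e-2
w_p = np.sqrt(nu_w / 2) * (rng.normal(size=M) + 1j * rng.normal(size=M))
y_p = D_p @ Phi_p @ Psi @ x + w_p
```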

Rethinking communication over sparse channels: The fundamental problem with the conventional CCS approach is the separation between channel estimation and data decoding. To communicate at rates near capacity, we need joint estimation and decoding, which we can do using turbo-AMP: [factor graph chaining a uniform prior on the info bits, coding and interleaving (plus pilots and training), coded bits, symbol mapping to QAM symbols, and OFDM observations y_m, which are coupled to the channel impulse response x_n under a sparse prior; SISO decoding handles the bit side while generalized AMP handles the channel side] Note: here we need the generalized AMP of [Rangan 2010]. Note: we can now place pilots at the bit level rather than the symbol level.

NMSE and BER versus pilot-to-sparsity ratio (M/K): Assume B = 1024 subcarriers with K = 64-sparse channels of length N = 256. [NMSE (dB) and BER versus pilot-to-sparsity ratio M/K at 20 dB SNR, 64-QAM, 3 bpcu, comparing LMMSE, LASSO, BP after 1, 2, and many turbo iterations, SG, and BSG] Implementable schemes: LMMSE = LMMSE-based CCS, LASSO = LASSO-based CCS, BP-n = BP after n turbo iterations. Reference schemes: SG = support-aware genie, BSG = bit- and support-aware genie. For the plots above, we used M uniformly spaced pilot subcarriers. Since the spectral efficiency is fixed, more pilots necessitate a weaker code!

Outage rate and the importance of bit-level pilots: [left: 0.001-outage rate (bpcu) versus SNR (dB); right: BER versus pilot-to-sparsity ratio M/K at 20 dB SNR, 256-QAM, 3.75 bpcu; curves for AMP, BP, LASSO, LMMSE, SG, and BSG] The solid-line rates used 64-QAM and M = 4K = N pilot subcarriers. The dashed-line rate used 256-QAM with K log2(256) pilot bits placed as MSBs. Turbo-AMP achieves the channel capacity's pre-log factor!

Conclusions: The AMP algorithm of Donoho/Maleki/Montanari offers a state-of-the-art solution to the theoretical compressed sensing problem. Using a graphical-models framework, we can handle more complicated compressive inference tasks, with structured signals (e.g., Markov structure in imaging and tracking), structured generalized-linear measurements (e.g., code structure in communications), self-tuning (e.g., of the noise variance, sparsity, and Markov parameters), and soft outputs, using the turbo-AMP approach, which leverages AMP as a sub-block. Ongoing work includes applying the turbo-AMP approach to challenging new problems and analyzing turbo-AMP convergence and performance. Matlab code is available at http://ece.osu.edu/~schniter/emturbogamp

Thanks!

Bibliography:
D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing," Proc. National Academy of Sciences, Nov. 2009.
D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing: I. Motivation and construction," Information Theory Workshop, Jan. 2010.
M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Trans. Information Theory, Feb. 2011.
S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," arXiv:1010.5141, Oct. 2010.
P. Schniter, "Turbo reconstruction of structured sparse signals," CISS (Princeton), Mar. 2010.
S. Som and P. Schniter, "Compressive imaging using approximate message passing and a Markov-tree prior," IEEE Trans. Signal Processing, 2012. (See also Asilomar 2010.)
J. Ziniel and P. Schniter, "Dynamic compressive sensing of time-varying signals via approximate message passing," arXiv:1205.4080, 2012. (See also Asilomar 2010.)
W. Lu and N. Vaswani, "Modified compressive sensing for real-time dynamic MR imaging," ICIP, Nov. 2009.
A. P. Kannu and P. Schniter, "On communication over unknown sparse frequency-selective block-fading channels," IEEE Trans. Information Theory, Oct. 2011.
P. Schniter, "A message-passing receiver for BICM-OFDM over unknown clustered-sparse channels," IEEE J. Sel. Topics Signal Processing, 2011. (See also Asilomar 2010.)