
Approximate Message Passing Algorithm with Universal Denoising and Gaussian Mixture Learning

Yanting Ma, Student Member, IEEE, Junan Zhu, Student Member, IEEE, and Dror Baron, Senior Member, IEEE

Abstract—We study compressed sensing (CS) signal reconstruction problems where an input signal is measured via matrix multiplication under additive white Gaussian noise. Our signals are assumed to be stationary and ergodic, but the input statistics are unknown; the goal is to provide reconstruction algorithms that are universal to the input statistics. We present a novel algorithmic framework that combines: (i) the approximate message passing (AMP) CS reconstruction framework, which solves the matrix channel recovery problem by iterative scalar channel denoising; (ii) a universal denoising scheme based on context quantization, which partitions the stationary ergodic signal denoising into independent and identically distributed (i.i.d.) subsequence denoising; and (iii) a density estimation approach that approximates the probability distribution of an i.i.d. sequence by fitting a Gaussian mixture (GM) model. In addition to the algorithmic framework, we provide three contributions: (i) numerical results showing that state evolution holds for non-separable Bayesian sliding-window denoisers; (ii) an i.i.d. denoiser based on a modified GM learning algorithm; and (iii) a universal denoiser that does not need information about the range of values the input takes, nor does it require the input signal to be bounded. We provide two implementations of our universal CS recovery algorithm, with one being faster and the other being more accurate. The two implementations compare favorably with existing universal reconstruction algorithms in terms of both reconstruction quality and runtime.

Index Terms—approximate message passing, compressed sensing, Gaussian mixture model, universal denoising.

Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. This work was supported in part by the National Science Foundation under Grant CCF and in part by the U.S. Army Research Office under Grant W911NF. Portions of the work appeared at the 52nd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, Oct. 2014 [1]. Yanting Ma, Junan Zhu, and Dror Baron are with the Department of Electrical and Computer Engineering, NC State University, Raleigh, NC; e-mail: {yma7, jzhu9, barondror}@ncsu.edu.

I. INTRODUCTION

A. Motivation

Many scientific and engineering problems can be approximated as linear systems of the form

y = A x + z,    (1)

where x ∈ R^N is the unknown input signal, A ∈ R^{M×N} is the matrix that characterizes the linear system, and z ∈ R^M is measurement noise. The goal is to estimate x from the measurements y given A and statistical information about z. When M ≪ N, the setup is known as compressed sensing (CS); by posing a sparsity or compressibility requirement on the signal, it is indeed possible to accurately recover x from the ill-posed linear system [2, 3]. However, we might need M > N when the signal is dense or the noise is substantial.

One popular scheme to solve the CS recovery problem is LASSO [4] (also known as basis pursuit denoising [5]):

x̂ = arg min_{x ∈ R^N} (1/2) ‖y − Ax‖_2^2 + γ ‖x‖_1,

where ‖·‖_p denotes the ℓ_p-norm, and γ reflects a trade-off between the sparsity ‖x‖_1 and the residual ‖y − Ax‖_2^2.
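As a point of reference for the formulation above, the following sketch sets up an instance of (1) with M < N and minimizes the LASSO objective with plain iterative soft thresholding (ISTA). This is an illustration only, not the reconstruction method developed in this paper; the dimensions, sparsity level, noise level, and regularization weight are arbitrary choices.

```python
import numpy as np

# Illustrative sketch of the model y = A x + z with M < N and a plain ISTA loop
# for the LASSO objective (1/2)||y - A x||_2^2 + gamma ||x||_1.
# All sizes and parameters below are arbitrary choices, not the paper's.
rng = np.random.default_rng(0)
N, M, k = 1000, 400, 30                       # signal length, measurements, nonzeros
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((M, N)) / np.sqrt(M)  # i.i.d. Gaussian, unit-norm columns on average
y = A @ x_true + 0.01 * rng.standard_normal(M)

gamma = 0.05
L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the quadratic term's gradient
soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

x = np.zeros(N)
for _ in range(500):                          # ISTA: gradient step followed by soft thresholding
    x = soft(x + A.T @ (y - A @ x) / L, gamma / L)

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

In the remainder of the paper, this simple thresholding step is replaced by far more powerful denoisers applied inside AMP.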
This approach does not require statistical information about x and z, and can be conveniently solved via standard convex optimization tools or the approximate message passing (AMP) algorithm [6]. However, the reconstruction quality is often far from optimal in terms of mean square error (MSE). Bayesian CS recovery algorithms based on message passing [7–9] usually achieve better reconstruction quality, but must know the prior for x. For parametric signals with unknown parameters, one can infer the parameters and achieve the minimum mean square error (MMSE) in some settings; examples include EM-GM-AMP-MOS [10], turboGAMP [11], and adaptive-GAMP [12]. Unfortunately, possible uncertainty about the input statistics may make it difficult to select a model class for empirical Bayes algorithms; a mismatched model can yield excess mean square error (EMSE) above the MMSE, and the EMSE can get amplified in linear inverse problems (1) compared to that in scalar estimation problems [13].

Our goal is to develop universal schemes that approach the optimal Bayesian performance for stationary ergodic signals despite not knowing the input statistics. Although others have worked on CS algorithms for independent and identically distributed (i.i.d.) signals with unknown distributions [10], we are particularly interested in developing algorithms for signals that may not be well approximated by i.i.d. models, because real-world signals often contain dependencies between different entries. For example, we will see in Fig. 6 that a chirp sound clip is reconstructed 1 dB better with models that can capture such dependencies than with i.i.d. models applied to sparse transform coefficients.

While approaches based on Kolmogorov complexity [14–17] are theoretically appealing for universal signal recovery, they are not computable in practice [18, 19]. Several algorithms based on Markov chain Monte Carlo (MCMC) [20–23] leverage the fact that for stationary ergodic signals, both the per-symbol empirical entropy and Kolmogorov complexity converge asymptotically almost surely to the entropy rate of the signal [18], and aim to minimize the empirical entropy. The best existing implementation of the MCMC approach [23] often achieves an MSE that is within 3 dB of the MMSE, which resembles a result by Donoho for universal denoising [14].

In this paper, we confine our attention to the system model defined in (1), where the input signal x is stationary and ergodic. We merge concepts from AMP [6], Gaussian mixture (GM) learning [24] for density estimation, and universal denoising for stationary ergodic signals [25, 26]. We call the resulting universal CS recovery algorithm AMP-UD (AMP with a universal denoiser). Two implementations of AMP-UD are provided, and they compare favorably with existing universal approaches in terms of reconstruction quality and runtime.

B. Related work and main results

Approximate message passing: AMP is an iterative algorithm that solves a linear inverse problem by successively converting matrix channel problems into scalar channel denoising problems with additive white Gaussian noise (AWGN). AMP has received considerable attention because of its fast convergence and the state evolution (SE) formalism [6, 27], which offers a precise characterization of the AWGN denoising problem in each iteration. AMP with separable denoisers has been rigorously proved to obey SE [27].

The focus of this paper is the reconstruction of signals that are not necessarily i.i.d., and so we need to explore non-separable denoisers. Donoho et al. [28] provide numerical results demonstrating that SE accurately predicts the phase transition of AMP when some well-behaved non-separable minimax denoisers are applied, and conjecture that SE holds for AMP with a broader class of denoisers. A compressive imaging algorithm that applies non-separable image denoisers within AMP appears in Tan et al. [29]. Rush et al. [30] apply AMP to sparse superposition decoding, and prove that SE holds for AMP with certain block-separable denoisers and that such an AMP-based decoder achieves channel capacity. A potential challenge of implementing AMP is to obtain the Onsager correction term [6], which involves the calculation of the derivative of a denoiser. Metzler et al. [31] leverage a Monte Carlo technique to approximate the derivative of a denoiser when an explicit analytical formulation of the denoiser is unavailable, and provide numerical results showing that SE holds for AMP with their approximation. Despite the encouraging results for using non-separable denoisers within AMP, a rigorous proof that SE holds for general non-separable denoisers has yet to appear. Consequently, new evidence showing that AMP obeys SE may increase the community's confidence about using non-separable denoisers within AMP. Our first contribution is that we provide numerical results showing that SE holds for non-separable Bayesian sliding-window denoisers.

Fitting Gaussian mixture models: Figueiredo and Jain [24] propose an unsupervised GM learning algorithm that fits a given data sequence with a GM model. The algorithm employs a cost function that resembles the minimum message length criterion, and the parameters are learned via expectation-maximization (EM). Our GM fitting problem involves estimating the probability density function (pdf) of a sequence x from its AWGN corrupted observations. We modify the GM fitting algorithm [24], so that a GM model can be learned from noisy data. Once the estimated pdf p̂_X of x is available, we estimate x by computing the conditional expectation with the estimated pdf p̂_X (recall that MMSE estimators rely on conditional expectation). Our second contribution is that we modify the GM learning algorithm and extend it to an i.i.d. denoiser.
Universal denoising: Our denoiser for stationary ergodic signals is inspired by a context quantization approach [26], where a universal denoiser for a stationary ergodic signal involves multiple i.i.d. denoisers for conditionally i.i.d. subsequences. Sivaramakrishnan and Weissman [26] have shown that their universal denoiser based on context quantization can achieve the MMSE asymptotically for stationary ergodic signals with known bounds. The boundedness condition of Sivaramakrishnan and Weissman [26] is partly due to their density estimation approach, in which the empirical distribution function is obtained by quantizing the bounded range of the signal. Such boundedness conditions may be undesirable in certain applications. We overcome this limitation by replacing their density estimation approach with GM model learning. Our third contribution is a universal denoiser that does not need information about the bounds or require the input signal to be bounded; we conjecture that our universal denoiser achieves the MMSE asymptotically under some technical conditions.

A flow chart of AMP-UD, which employs the AMP framework along with our modified universal denoiser (η_univ) and the GM-based i.i.d. denoiser (η_iid), is shown in Fig. 1. Based on the numerical evidence that SE holds for AMP with Bayesian sliding-window denoisers and the conjecture that our universal denoiser can achieve the MMSE, we further conjecture that AMP-UD achieves the MMSE under some technical conditions. The details of AMP-UD, including two practical implementations, are developed in Sections II–V.

The remainder of the paper is arranged as follows. In Section II, we review AMP and provide new numerical evidence that AMP obeys SE with non-separable denoisers. Section III modifies the GM fitting algorithm and extends it to an i.i.d. denoiser. In Section IV, we extend the universal denoiser based on context quantization to overcome the boundedness condition, and two implementations are provided to improve denoising quality. Our proposed AMP-UD algorithm is summarized in Section V. Numerical results are shown in Section VI, and we conclude the paper in Section VII.

II. APPROXIMATE MESSAGE PASSING WITH SLIDING-WINDOW DENOISERS

In this section, we apply non-separable Bayesian sliding-window denoisers within AMP, and provide numerical evidence that state evolution (SE) holds for AMP with this class of denoisers.

A. Review of AMP

Consider a linear system (1), where the measurement matrix A has zero-mean i.i.d. Gaussian entries with unit-norm columns on average, and z represents i.i.d. Gaussian noise with pdf p_Z(z_i) = N(z_i; 0, σ_z^2), where z_i is the i-th entry of the vector z, and N(x; µ, σ^2) denotes a Gaussian pdf:

N(x; µ, σ^2) = (1/√(2πσ^2)) exp( −(x − µ)^2 / (2σ^2) ).

[Fig. 1 shows the AMP-UD flow chart: AMP decoupling (q_t = x_t + A^T r_t and the residual update), context quantization of q_t into subsequences q_t^(1), ..., q_t^(L), i.i.d. denoising of each subsequence, reordering, and the output x̂ = x_{t_max}.]

Fig. 1. Flow chart of AMP-UD. AMP (2), (3) decouples the linear inverse problem into scalar channel denoising problems. In the t-th iteration, the universal denoiser η_univ,t(·) converts stationary ergodic signal denoising into i.i.d. subsequence denoising. Each i.i.d. denoiser η_iid,t(·) (13) outputs the denoised subsequence x_{t+1}^(l) and the derivative of the denoiser η'_iid,t(·) (16). The algorithm stops when the iteration index t reaches the predefined maximum t_max, and outputs x_{t_max} as the CS recovery result.

Note that AMP has been proved to follow SE when A is a zero-mean i.i.d. Gaussian matrix, but may diverge otherwise. Several techniques have been proposed to improve the convergence of AMP [32–35]. Moreover, other noise distributions can be supported using generalized AMP (GAMP) [9], and the noise distribution can be estimated in each GAMP iteration [12]. Such generalizations are beyond the scope of this work.

Starting with x_0 = 0, the AMP algorithm [6] proceeds iteratively according to

x_{t+1} = η_t(A^T r_t + x_t),    (2)
r_t = y − A x_t + (1/R) r_{t−1} ⟨η'_{t−1}(A^T r_{t−1} + x_{t−1})⟩,    (3)

where R = M/N represents the measurement rate, t represents the iteration index, η_t(·) is a denoising function, and ⟨u⟩ = (1/N) Σ_{i=1}^N u_i for some vector u ∈ R^N. The last term in (3) is called the Onsager correction term in statistical physics. The empirical distribution of x is assumed to converge to some probability distribution p_X on R, and the denoising function η_t(·) is separable in the original derivation of AMP [6, 27, 36]. That is, η_t(u) = (η_t(u_1), η_t(u_2), ..., η_t(u_N)) and η'_t(u) = (η'_t(u_1), η'_t(u_2), ..., η'_t(u_N)), where η'_t(·) denotes the derivative of η_t(·).

A useful property of AMP is that at each iteration, the vector A^T r_t + x_t ∈ R^N in (2) is statistically equivalent to the input signal x corrupted by AWGN, where the noise variance σ_t^2 evolves following SE in the limit of large systems (N → ∞, M/N → R):

σ_{t+1}^2 = σ_z^2 + (1/R) MSE(η_t, σ_t^2),    (4)

where MSE(η_t, σ_t^2) = E_{X,W}[ (η_t(X + σ_t W) − X)^2 ], W ∼ N(w; 0, 1), X ∼ p_X, and σ_0^2 = σ_z^2 + (1/R) E[X^2]. Formal statements for SE appear in the reference papers [27, 36]. Additionally, it is convenient to use the following estimator for σ_t^2 [27, 36]:

σ̂_t^2 = (1/M) ‖r_t‖^2.    (5)
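The iteration (2)–(3) is short enough to transcribe directly. The sketch below is our illustration (not the authors' code) of AMP with a simple separable soft-thresholding denoiser, where the threshold is tied to the noise-level estimate (5); in AMP-UD this denoiser is replaced by the universal denoiser developed in Sections III and IV.

```python
import numpy as np

def amp(y, A, n_iter=30, alpha=1.5):
    """Sketch of AMP (2)-(3) with a separable soft-threshold denoiser.
    The threshold alpha * sigma_t is an illustrative choice, not the paper's denoiser."""
    M, N = A.shape
    R = M / N
    x = np.zeros(N)
    r = np.copy(y)                                   # residual r_0 = y - A*0
    for _ in range(n_iter):
        sigma = np.linalg.norm(r) / np.sqrt(M)       # noise-level estimate (5)
        q = x + A.T @ r                              # effective scalar channel output q_t
        thr = alpha * sigma
        x_new = np.sign(q) * np.maximum(np.abs(q) - thr, 0.0)   # eta_t(q)
        deriv = (np.abs(q) > thr).astype(float)                 # eta_t'(q)
        r = y - A @ x_new + r * np.mean(deriv) / R               # Onsager-corrected residual (3)
        x = x_new
    return x
```

Note that the Onsager term in (3) only requires the empirical average of the denoiser's derivative, which is why Section V derives η'_iid in closed form.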
B. State evolution for Bayesian sliding-window denoisers

SE allows us to calculate the asymptotic MSE of linear systems from the MSE of the denoiser used within AMP. Therefore, knowing that SE holds for AMP with the denoisers that we are interested in can help us choose a good denoiser for AMP. It has been conjectured by Donoho et al. [28] that AMP with a wide range of non-separable denoisers obeys SE. We now provide new evidence to support this conjecture by constructing non-separable Bayesian denoisers within a sliding-window denoising scheme for two stationary ergodic Markov signal models, and showing that SE accurately predicts the performance of AMP with this class of denoisers for large signal dimension N. Note that for a signal that is generated by a stationary ergodic process, its empirical distribution converges to the stationary distribution; hence the condition on the input signal in the proof for SE [27] is satisfied, and our goal is to numerically verify that SE holds for AMP with non-separable sliding-window denoisers for stationary ergodic signal models. Our rationale for examining the SE performance of sliding-window denoisers is that the context quantization based universal denoiser [26], which will be used in Section IV, resembles a sliding-window denoiser.

The mathematical model for an AWGN channel denoising problem is defined as

q = x + v,    (6)

where x ∈ R^N is the input signal, v ∈ R^N is AWGN with pdf p_V(v_i) = N(v_i; 0, σ_v^2), and q ∈ R^N is a sequence of noisy observations. Note that we are interested in designing denoisers for AMP, and the noise variance of the scalar channel in each AMP iteration can be estimated as σ̂_t^2 (5). Therefore, throughout the paper we assume that the noise variance σ_v^2 is known when we discuss scalar channels.

In a separable denoiser, x_j is estimated only from its noisy observation q_j. The separable Bayesian denoiser that minimizes the MSE is point-wise conditional expectation,

x̂_j = E[X | Q = q_j] = ∫ x p(x | q_j) dx,    (7)

where Bayes' rule yields p(x | q_j) = N(q_j; x, σ_v^2) p_X(x) / p_Q(q_j). If entries of the input signal x are drawn independently from p_X, then (7) achieves the MMSE. When there are statistical dependencies among the entries of x, a sliding-window scheme can be applied to improve the MSE. We consider two Markov sources as examples that contain statistical dependencies, and emphasize that our true motivation is the richer class of stationary ergodic sources.

Example source 1: Consider a two-state Markov state machine that contains states s_0 (a zero state in which the signal entries are zero) and s_1 (a nonzero state in which the entries are nonzero).

The transition probabilities are p_10 = p(s_0 | s_1) and p_01 = p(s_1 | s_0). In the steady state, the marginal probability of state s_1 is p_01 / (p_01 + p_10). We call our first example source Markov-Gaussian (MGauss for short); it is generated by the two-state Markov machine with p_01 = 3/970 and p_10 = 1/10, and in the nonzero state the signal value follows a Gaussian distribution N(x; µ_x, σ_x^2). These state transition parameters yield 3% nonzero entries in an MGauss signal on average.

Example source 2: Our second example is a four-state Markov switching signal (M4 for short) that follows the pattern +1, +1, −1, −1, +1, +1, −1, −1, ... with 3% error probability in state transitions, resulting in the signal switching from −1 to +1 or vice versa either too early or too late; the four states s_1 = [−1 −1], s_2 = [−1 +1], s_3 = [+1 −1], and s_4 = [+1 +1] have equal marginal probabilities 0.25 in the steady state.

Bayesian sliding-window denoiser: Let θ be a binary vector, where θ_i = 0 indicates x_i = 0, and θ_i = 1 indicates x_i ≠ 0. Denoting a block (u_s, u_{s+1}, ..., u_t) of any sequence u by u_s^t for s < t, the (2k+1)-Bayesian sliding-window denoiser η_MGauss for the MGauss signal is defined as

η_MGauss,j(q_{j−k}^{j+k}) = E[X_j | Q_{j−k}^{j+k} = q_{j−k}^{j+k}]
= Σ_{θ_{j−k}^{j+k} ∈ {s_0,s_1}^{2k+1} : θ_j = s_1} [ Π_{i=j−k}^{j+k} h(q_i, θ_i; µ_x, σ_x^2, σ_v^2) · p_Θ(θ_{j−k}^{j+k}) / p_Q(q_{j−k}^{j+k}) ] · ( σ_x^2/(σ_x^2 + σ_v^2) (q_j − µ_x) + µ_x ),    (8)

where

h(q_i, θ_i; µ_x, σ_x^2, σ_v^2) = N(q_i; µ_x, σ_v^2 + σ_x^2) if θ_i = s_1, and N(q_i; 0, σ_v^2) if θ_i = s_0,
p_Θ(θ_{j−k}^{j+k}) = p(θ_{j−k}) Π_{i=j−k}^{j+k−1} p(θ_{i+1} | θ_i),
p_Q(q_{j−k}^{j+k}) = Σ p_Θ(θ_{j−k}^{j+k}) Π_{i=j−k}^{j+k} h(q_i, θ_i; µ_x, σ_x^2, σ_v^2),

and the summation is over θ_{j−k}^{j+k} ∈ {s_0, s_1}^{2k+1}. The MSE of η_MGauss,j is

MSE(η_MGauss, σ_v^2) = E[ (X_j − η_MGauss,j(Q_{j−k}^{j+k}))^2 ]
= p_01 (σ_x^2 + µ_x^2) / (p_01 + p_10) − ∫_{R^{2k+1}} (η_MGauss,j(q))^2 p_Q(q) dq.    (9)

Similarly, the (2k+1)-Bayesian sliding-window denoiser η_M4 for the M4 signal is defined as

η_M4,j(q_{j−k}^{j+k}) = E[X_j | Q_{j−k}^{j+k} = q_{j−k}^{j+k}]
= [ p_{X_j,Q}(1, q_{j−k}^{j+k}) − p_{X_j,Q}(−1, q_{j−k}^{j+k}) ] / [ p_{X_j,Q}(1, q_{j−k}^{j+k}) + p_{X_j,Q}(−1, q_{j−k}^{j+k}) ],    (10)

where

p_X(x_{j−k}^{j+k}) = p(x_{j−k}, x_{j−k+1}) Π_{i=j−k}^{j+k−2} p(x_{i+2} | x_{i+1}, x_i),
p_{X_j,Q}(x, q_{j−k}^{j+k}) = Σ p_X(x_{j−k}^{j+k}) Π_{i=j−k}^{j+k} N(q_i; x_i, σ_v^2),

where the summation is over x_{j−k}^{j+k} ∈ {−1, 1}^{2k+1} with x_j = x ∈ {−1, 1} fixed. It can be shown that

MSE(η_M4, σ_v^2) = E[ (X_j − η_M4,j(Q_{j−k}^{j+k}))^2 ]
= 4 ∫_{R^{2k+1}} [ p_{X_j,Q}(−1, q) p_{X_j,Q}(1, q) / p_Q(q) ] dq.    (11)

If AMP with η_MGauss or η_M4 obeys SE, then the noise variance σ_t^2 should evolve according to (4). As a consequence, the reconstruction error at iteration t can be predicted by evaluating (9) or (11) with σ_v^2 replaced by σ_t^2.

Numerical evidence: We apply η_MGauss (8) within AMP for MGauss signals, and η_M4 (10) within AMP for M4 signals. The window size 2k+1 is chosen to be 1 or 3 for η_MGauss, and 1 or 5 for η_M4. Note that when the window size is 1, η_MGauss and η_M4 become separable denoisers. The MSE predicted by SE is compared to the empirical MSE at each iteration, where the input signal to noise ratio (SNR = 10 log_10[(N E[X^2])/(M σ_z^2)]) is 10 dB for both MGauss and M4. It is shown in Fig. 2 for AMP with η_MGauss and η_M4 that the markers representing the empirical MSE track the lines predicted by SE, and that side information from neighboring entries helps improve the MSE.

Our SE results for the two Markov sources increase our confidence that AMP with non-separable denoisers that incorporate information from neighboring entries will track SE. The reader may have noticed from Fig. 1 that the universal denoiser η_univ(·) is acting as a set of separable denoisers η_iid(·). However, the statistical information used by η_iid(·) is learned from subsequences q_t^(1), ..., q_t^(L) of the noisy sequence q_t, and the subsequencing result is determined by the neighborhood of each entry.
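To make (8) concrete, here is a small sketch (ours, with illustrative parameter values) that evaluates the (2k+1)-window MGauss denoiser by brute-force enumeration of the 2^{2k+1} state patterns in the window; it is only practical for the short windows used above.

```python
import numpy as np
from itertools import product

def eta_mgauss(q_win, j, p01, p10, mu_x, var_x, var_v):
    """Sliding-window denoiser (8) for the Markov-Gaussian source, evaluated by
    enumerating all state patterns theta in {0,1}^(2k+1) over the window q_win.
    Index j is the position of the symbol being estimated within the window."""
    pi1 = p01 / (p01 + p10)                            # stationary P(state = s1)
    trans = np.array([[1 - p01, p01], [p10, 1 - p10]]) # rows: current state, cols: next state
    num, den = 0.0, 0.0
    for theta in product((0, 1), repeat=len(q_win)):
        p_theta = pi1 if theta[0] else 1 - pi1          # prior of the state pattern
        for a, b in zip(theta[:-1], theta[1:]):
            p_theta *= trans[a, b]
        like = 1.0                                      # likelihood of the noisy window given theta
        for qi, ti in zip(q_win, theta):
            mean, var = (mu_x, var_x + var_v) if ti else (0.0, var_v)
            like *= np.exp(-(qi - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        w = p_theta * like
        den += w
        if theta[j]:                                    # Wiener estimate contributes only when theta_j = s1
            num += w * (var_x / (var_x + var_v) * (q_win[j] - mu_x) + mu_x)
    return num / den                                    # E[X_j | Q window]
```

For the window sizes 3 and 5 used in Fig. 2 this enumeration is cheap; the cost grows exponentially in the window size, which is one reason the universal denoiser of Section IV quantizes contexts instead of enumerating them.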
The SE results for the Bayesian sliding-window denoisers motivate us to apply the universal denoiser within AMP for CS reconstruction of stationary ergodic signals with unknown input statistics. Indeed, the numerical results in Section VI show that AMP with a universal denoiser leads to a promising universal CS recovery algorithm.

III. I.I.D. DENOISING VIA GAUSSIAN MIXTURE FITTING

We will see in Section IV that context quantization maps the non-i.i.d. sequence q into conditionally independent subsequences, and we now focus our attention on denoising the resulting i.i.d. subsequences.

A. Background

The pdf of a Gaussian mixture (GM) has the form

p(x) = Σ_{s=1}^S α_s N(x; µ_s, σ_s^2),    (12)

where S is the number of Gaussian components, and Σ_{s=1}^S α_s = 1, so that p(x) is a proper pdf.

[Fig. 2 plots MSE versus iteration for the SE prediction (window sizes 1 and 3 for MGauss; 1 and 5 for M4) and the empirical MSE.]

Fig. 2. Top: Numerical verification of SE for AMP with η_MGauss (8) when the input is an MGauss signal. (N = 20000, R = M/N = 0.4, SNR = 10 dB.) Bottom: Numerical verification of SE for AMP with η_M4 (10) when the input is an M4 signal. (N = 20000, R = 0.4, SNR = 10 dB.)

Figueiredo and Jain [24] propose to fit a GM model for a given data sequence by starting with some arbitrarily large S, and inferring the structure of the mixture by letting the mixing probabilities α_s of some components be zero. This leads to an unsupervised learning algorithm that automatically determines the number of Gaussian components from data. This approach resembles the concept underlying the minimum message length (MML) criterion that selects the best overall model from the entire model space, which differs from model class selection based on the best model within each class. (All models with the same number of components belong to one model class, and different models within a model class have different parameters for each component.) This criterion can be interpreted as posing a Dirichlet prior on the mixing probabilities and performing maximum a posteriori estimation [24]. A component-wise EM algorithm that updates {α_s, µ_s, σ_s^2} sequentially in s is used to implement the MML-based approach. The main feature of the component-wise EM algorithm is that if α_s is estimated as 0, then the s-th component is immediately removed, and the expectation is recalculated before moving to the estimation of the next component.

B. Extension to denoising

Consider the scalar channel denoising problem defined in (6) with an i.i.d. input. We propose to estimate x from its Gaussian noise corrupted observations q by posing a GM prior on x, and learning the parameters of the GM model with a modified version of the algorithm by Figueiredo and Jain [24].

Initialization of EM: The EM algorithm must be initialized for each parameter, {α_s, µ_s, σ_s^2}, s = 1, ..., S. One may choose to initialize the Gaussian components with equal mixing probabilities and equal variances, with the initial values of the means randomly sampled from the input data sequence [24], which in our case is the sequence of noisy observations q. However, in CS recovery problems the input signal is often sparse, and it becomes difficult to correct the initial values if they are far from the truth. To see why a poor initialization might be problematic, consider the following scenario: a sparse binary signal that contains a few ones and is corrupted by Gaussian noise is sent to the algorithm. If the initialization levels of the µ_s's are all around zero, then the algorithm is likely to fit a Gaussian component with near-zero mean and large variance rather than two narrow Gaussian components, one of which has mean close to zero while the other has mean close to one. To address this issue, we modify the initialization to examine the maximal distance between each symbol of the input data sequence and the current initialization of the µ_s's. If the distance is greater than 0.1·σ_q, then we add a Gaussian component whose mean is initialized as the value of the symbol being examined, where σ_q^2 is the estimated variance of the noisy observations q.
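A minimal sketch of this initialization, under our reading of the rule (we treat the test as "the symbol is farther than 0.1·σ_q from every current mean"); the initial number of components and the equal-variance choice are assumptions:

```python
import numpy as np

def init_gm(q, S_init=10, rng=None):
    """Sketch of the modified EM initialization: equal weights and variances, means
    sampled from the noisy data q, plus an extra component whenever a data point lies
    farther than 0.1*std(q) from all current means (our reading of the rule above)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_q = np.std(q)
    means = list(rng.choice(q, size=S_init, replace=False))
    for qi in q:
        if np.min(np.abs(qi - np.array(means))) > 0.1 * sigma_q:
            means.append(qi)                       # add a component centered at this symbol
    means = np.array(means)
    S = len(means)
    alphas = np.full(S, 1.0 / S)                   # equal mixing probabilities
    variances = np.full(S, np.var(q))              # equal initial variances (illustrative choice)
    return alphas, means, variances
```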
We found in our simulations that the modified initialization improves the accuracy of the density estimation and speeds up the convergence of the EM algorithm; the details of the simulation are omitted for brevity.

Parameter estimation from noisy data: Two possible modifications can be made to the original GM learning algorithm [24], which is designed for clean data. We first notice that the model for the noisy data is a GM convolved with Gaussian noise, which is a new GM with larger component variances. Hence, one approach is to use the original algorithm [24] to fit a GM to the noisy data, but to remove a component immediately during the EM iterations if the estimated component variance is much smaller than the noise variance σ_v^2. Specifically, during the parameter learning process, if a component has variance that is less than 0.2σ_v^2, we assume that this low-variance component is spurious and remove it from the mixture model. However, if the component variance is between 0.2σ_v^2 and 0.9σ_v^2, then we force the component variance to be 0.9σ_v^2 and let the algorithm keep tracking this component. For component variance greater than 0.9σ_v^2, we do not adjust the algorithm. The parameters 0.2 and 0.9 are chosen because they provide reasonable MSE performance for a wide range of signals that we tested; these parameters are then fixed for our algorithm to generate the numerical results in Section VI. At the end of the parameter learning process, all remaining components with variances less than σ_v^2 are set to have variances equal to σ_v^2. That said, when subtracting the noise variance σ_v^2 from the Gaussian components of p_Q to obtain the components of p_X, we could have components with zero-valued variance, which yields deltas in p_X. Note that deltas are in general difficult to fit with a limited amount of observations, and our modification helps the algorithm estimate deltas.

Another approach is to introduce latent variables that represent the underlying clean data, and estimate the parameters of the GM for the latent variables directly. Hence, similar to the original algorithm, a component is removed only when the estimated mixing probability is non-positive.

It can be shown that the GM parameters are estimated as

α_s(t+1) = max{ Σ_{i=1}^N w_i^(s)(t) − 1, 0 } / Σ_{s': α_{s'} > 0} max{ Σ_{i=1}^N w_i^(s')(t) − 1, 0 },
µ_s(t+1) = Σ_{i=1}^N w_i^(s)(t) a_i^(s)(t) / Σ_{i=1}^N w_i^(s)(t),
σ_s^2(t+1) = Σ_{i=1}^N w_i^(s)(t) ( v_i^(s)(t) + ( a_i^(s)(t) − µ_s(t+1) )^2 ) / Σ_{i=1}^N w_i^(s)(t),

where

w_i^(s)(t) = α_s(t) N(q_i; µ_s(t), σ_v^2 + σ_s^2(t)) / Σ_{m=1}^S α_m(t) N(q_i; µ_m(t), σ_v^2 + σ_m^2(t)),
a_i^(s)(t) = σ_s^2(t)/(σ_s^2(t) + σ_v^2) (q_i − µ_s(t)) + µ_s(t),
v_i^(s)(t) = σ_v^2 σ_s^2(t) / (σ_v^2 + σ_s^2(t)).

Detailed derivations are in the supporting document [37]. We found in our simulations that the first approach converges faster and leads to lower reconstruction error, especially for discrete-valued inputs. Therefore, the simulation results presented in Section VI use the first approach.

Denoising: Once the parameters in (12) are estimated, we define a denoiser for i.i.d. signals as conditional expectation:

η_iid(q) = E[X | Q = q] = Σ_{s=1}^S E[X | Q = q, comp = s] P(comp = s | Q = q)
= Σ_{s=1}^S ( σ_s^2/(σ_s^2 + σ_v^2) (q − µ_s) + µ_s ) · α_s N(q; µ_s, σ_s^2 + σ_v^2) / Σ_{s'=1}^S α_{s'} N(q; µ_{s'}, σ_{s'}^2 + σ_v^2),    (13)

where comp is the component index, and

E[X | Q = q, comp = s] = σ_s^2/(σ_s^2 + σ_v^2) (q − µ_s) + µ_s

is the Wiener filter for component s. We have verified numerically for several distributions and low to moderate noise levels that the denoising results obtained by the GM-based i.i.d. denoiser (13) approach the MMSE within a few hundredths of a dB. For example, the favorable reconstruction results for i.i.d. sparse Laplace signals in Fig. 3 show that the GM-based denoiser approaches the MMSE.

IV. UNIVERSAL DENOISING

We have seen in Section III that an i.i.d. denoiser based on GM learning can denoise i.i.d. signals with unknown distributions. Our goal in this work is to reconstruct stationary ergodic signals that are not necessarily i.i.d. Sivaramakrishnan and Weissman [26] have proposed a universal denoising scheme for stationary ergodic signals with known bounds based on context quantization, where a stationary ergodic signal is partitioned into i.i.d. subsequences. In this section, we modify the context quantization scheme and apply the GM-based denoiser (13) to the i.i.d. subsequences, so that our universal denoiser can denoise stationary ergodic signals that are unbounded or whose bounds are unknown.

A. Background

Consider the denoising problem (6), where the input x is stationary ergodic. The main idea of the context quantization scheme [26] is to quantize the noisy symbols q to generate quantized contexts that are used to partition the unquantized symbols into subsequences. That is, given the noisy observations q ∈ R^N, define the context of q_j as c_j = [q_{j−k}^{j−1}; q_{j+1}^{j+k}] ∈ R^{2k} for j = 1+k, ..., N−k, where [a; b] denotes the concatenation of the sequences a and b. For j ≤ k or j ≥ N−k+1, the median value q_med of q is used in place of the missing symbols in the contexts. As an example, for j = k we only have k−1 symbols in q before q_k, and so the first symbol in c_k is missing; we define c_k = [q_med; q_1^{k−1}; q_{k+1}^{2k}]. Vector quantization can then be applied to the context set C = {c_j : j = 1, ..., N}, and each c_j is assigned a label l_j ∈ {1, ..., L} that represents the cluster that c_j belongs to. Finally, the L subsequences that consist of symbols from q with the same label are obtained by taking q^(l) = {q_j : l_j = l}, for l = 1, ..., L. The symbols in each subsequence q^(l) are regarded as approximately conditionally identically distributed given the common quantized contexts.
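The partitioning step is easy to prototype. The sketch below (ours; the context half-width k, the number of clusters L, and the use of scipy's k-means are illustrative choices) builds the contexts with median padding at the borders and returns the index sets of the L subsequences.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def subsequence_by_context(q, k=6, L=10, seed=0):
    """Sketch of context quantization: for each q_j form the context
    c_j = [q_{j-k}^{j-1}; q_{j+1}^{j+k}], padding missing border symbols with the
    median of q, then cluster the contexts and group q by cluster label."""
    N = len(q)
    q_med = np.median(q)
    padded = np.concatenate([np.full(k, q_med), q, np.full(k, q_med)])
    contexts = np.empty((N, 2 * k))
    for j in range(N):
        left = padded[j:j + k]                      # q_{j-k}, ..., q_{j-1} (median-padded at borders)
        right = padded[j + k + 1:j + 2 * k + 1]     # q_{j+1}, ..., q_{j+k}
        contexts[j] = np.concatenate([left, right])
    _, labels = kmeans2(contexts, L, minit='points', seed=seed)
    return [np.where(labels == l)[0] for l in range(L)]  # index sets of the L subsequences
```

Each returned index set selects one subsequence q^(l) = q[indices], which is then handed to the i.i.d. denoiser of Section III.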
The rationale underlying this concept is that a sliding-window denoiser uses information from the contexts to estimate the current symbol, and symbols with similar contexts in the noisy output of the scalar channel have similar contexts in the original signal. Therefore, symbols with similar contexts can be grouped together and denoised using the same denoiser. Note that Sivaramakrishnan and Weissman [26] propose a second subsequencing step, which further partitions each subsequence into smaller subsequences such that a symbol in a subsequence does not belong to the contexts of any other symbols in this subsequence. This step ensures that the symbols within each subsequence are mutually independent, which is crucial for theoretical analysis. However, for finite-length signals, small subsequences may occur, and they may not contain enough symbols to learn their empirical pdfs well. Therefore, we omit this second subsequencing step in our implementations.

In order to estimate the distribution of x^(l), which is the clean subsequence corresponding to q^(l), Sivaramakrishnan and Weissman [26] first estimate the pdf p_Q^(l) of q^(l) via kernel density estimation. They then quantize the range that the x_i's take values from and the levels of the empirical distribution function of x, and find a quantized distribution function that matches p_Q^(l) well. Once the distribution function of x^(l) is obtained, the conditional expectation of the symbols in the l-th subsequence can be calculated.

For error metrics that satisfy some mild technical conditions, Sivaramakrishnan and Weissman [26] have proved for stationary ergodic signals with bounded components that their universal denoiser asymptotically achieves the optimal estimation error among all sliding-window denoising schemes despite not knowing the prior for the signal. When the error metric is square error, the optimal error is the MMSE.

B. Extension to unbounded signals and signals with unknown bounds

Sivaramakrishnan and Weissman [26] have shown that one can denoise a stationary ergodic signal by (i) grouping together symbols with similar contexts and (ii) applying an i.i.d. denoiser to each group. Such a scheme is optimal in the limit of large signal dimension N. However, their denoiser assumes an input with known bounds, which might make it inapplicable to some real-world settings. In order to be able to estimate signals that take values from the entire real line, in step (ii) we apply the GM learning algorithm for density estimation, which has been discussed in detail in Section III, and compute the conditional expectation with the estimated density as our i.i.d. denoiser.

We now provide details about a modification made to step (i). The context set C is acquired in the same way as described in Section IV-A. Because the symbols in the context c_j ∈ C that are closer in index to q_j are likely to provide more information about x_j than the ones that are located further away, we add weights to the contexts before clustering. That is, for each c_j ∈ C of length 2k, the weighted context is defined as c̃_j = c_j ⊙ w, where ⊙ denotes a point-wise product, and the weights take values

w_{k_i} = e^{−β(k − k_i)} for k_i = 1, ..., k;  w_{k_i} = e^{−β(k_i − k − 1)} for k_i = k+1, ..., 2k,    (14)

for some β ≥ 0. While in noisier channels it might be necessary to use information from longer contexts, comparatively short contexts could be sufficient for cleaner channels. Therefore, the exponential decay rate β is made adaptive to the noise level in a way such that β increases with SNR. Specifically, β is chosen to be linear in SNR:

β = b_1 log_10( (‖q‖^2/N − σ_v^2) / σ_v^2 ) + b_2,    (15)

where b_1 > 0 and b_2 can be determined numerically. Specifically, we run the algorithm with a sufficiently large range of β values for various input signals at various SNR levels and measurement rates. For each setting, we select the β that achieves the best reconstruction quality. Then, b_1 and b_2 are obtained using the least squares approach. Note that b_1 and b_2 are fixed for all the simulation results presented in Section VI. If the parameters were tuned for each individual input signal, then the optimal parameter values might vary for different input signals, and the reconstruction quality might be improved. The simulation results in Section VI with fixed parameters show that even when the parameters are slightly off from the individually optimal ones, the reconstruction quality of AMP-UD is still comparable to or better than the prior art. We choose the linear relation because it is simple and fits well with our empirical optimal values for β; other choices for β might be possible.
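Equations (14) and (15) translate directly into a few lines; in the sketch below (ours) b_1 and b_2 are placeholders rather than the values fitted by least squares in the paper.

```python
import numpy as np

def context_weights(k, beta):
    """Weights (14): symbols closer in index to the current position get exponentially
    larger weight; entries 1..k are the left context, k+1..2k the right context."""
    ki = np.arange(1, 2 * k + 1)
    return np.where(ki <= k, np.exp(-beta * (k - ki)), np.exp(-beta * (ki - k - 1)))

def adaptive_beta(q, var_v, b1=1.0, b2=0.0):
    """Decay rate (15): beta grows linearly with the estimated scalar-channel SNR.
    b1 > 0 and b2 are tuning constants; the values here are placeholders, not the
    ones fitted in the paper."""
    snr_est = (np.mean(q ** 2) - var_v) / var_v
    return b1 * np.log10(max(snr_est, 1e-12)) + b2   # guard against a non-positive estimate

# Weighted contexts: c_tilde_j = c_j * context_weights(k, beta), entrywise, then cluster as before.
```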
The weighted context set C̃ = {c̃_j : j = 1, ..., N} is then sent to a k-means algorithm [38], and q^(l), l = 1, ..., L, are obtained according to the labels determined via clustering. We can now apply the GM-based i.i.d. denoiser (13) to each subsequence separately. However, one potential problem is that the GM fitting algorithm might not provide a good estimate of the model when the number of data points is small. We propose two approaches to address this small cluster issue.

Approach 1: Borrow members from nearby clusters. A post-processing step can be added to ensure that the pdf of q^(l) is estimated from no fewer than T symbols. That is, if the size of q^(l), which is denoted by B, is less than T, then T − B symbols in other clusters whose contexts are closest to the centroid of the current cluster are included to estimate the empirical pdf of q^(l); after the pdf is estimated, the extra symbols are removed, and only q^(l) is denoised with the estimated pdf. We call UD with Approach 1 UD1. (A related approach is k-nearest neighbors, where for each symbol in q we find the T symbols whose contexts are nearest to that of the current symbol and estimate its pdf from those T symbols. The k-nearest neighbors approach requires running the GM learning algorithm [24] N times in each AMP iteration, which significantly slows down the algorithm.)

Approach 2: Merge statistically similar subsequences. An alternative approach is to merge subsequences iteratively according to their statistical characterizations. The idea is to find subsequences with pdfs that are close in Kullback-Leibler (KL) distance [18], and decide whether merging them can yield a better model according to the minimum description length (MDL) [39] criterion. Denote the iteration index for the merging process by h. After the k-means algorithm, we have obtained a set of subsequences {q_h^(l) : l = 1, ..., L_h}, where L_h is the current number of subsequences. A GM pdf p_{Q,h}^(l) is learned for each subsequence q_h^(l). The MDL cost c_MDL^h for the current model is calculated as

c_MDL^h = Σ_{l=1}^{L_h} Σ_{i=1}^{|q_h^(l)|} ( −log p_{Q,h}^(l)(q_{i,h}^(l)) ) + Σ_{l=1}^{L_h} (3 m_h^(l)/2) log( |q_h^(l)| ) + 2 L_h + L_0 Σ_{l=1}^{L_h} (n_h^(l)/L_0) log( L_0/n_h^(l) ),

where q_{i,h}^(l) is the i-th entry of the subsequence q_h^(l), m_h^(l) is the number of Gaussian components in the mixture model for subsequence q_h^(l), L_0 is the number of subsequences before the merging procedure, and n_h^(l) is the number of subsequences in the initial set {q_0^(l) : l = 1, ..., L_0} that are merged to form the subsequence q_h^(l). The four terms in c_MDL^h are interpreted as follows. The first term is the negative log likelihood of the entire noisy sequence q given the current GM models. The second term is the penalty for the number of parameters used to describe the model, where we have 3 parameters (α, µ, σ^2) for each Gaussian component, and m_h^(l) components for the subsequence q_h^(l).

The third term arises from bits that are used to encode m_h^(l) for l = 1, ..., L_h, because our numerical results have shown that the number of Gaussian components rarely exceeds 4. In the fourth term, Σ_{l=1}^{L_h} (n_h^(l)/L_0) log(L_0/n_h^(l)) is the uncertainty that a subsequence from the initial set is mapped to q_h^(l) with probability n_h^(l)/L_0, for l = 1, ..., L_h. Therefore, the fourth term is the coding length for mapping the L_0 subsequences from the initial set to the current set.

We then compute the KL distance between the pdf of q_h^(s) and that of q_h^(t), for s, t = 1, ..., L_h:

D( p_{Q,h}^(s) ‖ p_{Q,h}^(t) ) = ∫ p_{Q,h}^(s)(q) log( p_{Q,h}^(s)(q) / p_{Q,h}^(t)(q) ) dq.

A symmetric L_h × L_h distance matrix D_h is obtained by letting its s-th row and t-th column be D( p_{Q,h}^(s) ‖ p_{Q,h}^(t) ) + D( p_{Q,h}^(t) ‖ p_{Q,h}^(s) ). Suppose the smallest entry in the upper triangular part of D_h (not including the diagonal) is located in the s*-th row and t*-th column; then q_h^(s*) and q_h^(t*) are temporarily merged to form a new subsequence, and a new GM pdf is learned for the merged subsequence. We now have a new model with L_{h+1} = L_h − 1 GM pdfs, and the MDL criterion c_MDL^{h+1} is calculated for the new model. If c_MDL^{h+1} is smaller than c_MDL^h, then we accept the new model and calculate a new L_{h+1} × L_{h+1} distance matrix D_{h+1}; otherwise we keep the current model and look for the next smallest entry in the upper triangular part of the current L_h × L_h distance matrix. The number of subsequences is decreased by at most one after each iteration, and the merging process ends when there is only one subsequence left, or when the smallest KL distance between two GM pdfs is greater than some threshold, which is determined numerically. We call UD with Approach 2 UD2.

We will see in Section VI that UD2 is more reliable than UD1 in terms of MSE performance, whereas UD1 is faster than UD2. This is because UD2 applies a more complicated (and thus slower) subsequencing procedure, which allows more accurate GM models to be fitted to the subsequences.

V. PROPOSED UNIVERSAL CS RECOVERY ALGORITHM

Combining the three components that have been discussed in Sections II–IV, we are now ready to introduce our proposed universal CS recovery algorithm AMP-UD. Note that the AMP-UD algorithm is designed for medium to large size problems. Specifically, the problem size should be large enough that the decoupling effect of AMP, which converts the compressed sensing problem to a series of scalar channel denoising problems, approximately holds, and that the statistical information about the input can be approximately estimated by the universal denoiser.

Consider a linear system (1), where the input signal x is stationary and ergodic with unknown distributions, and the matrix A has i.i.d. Gaussian entries. To estimate x from y given A, we apply AMP as defined in (2) and (3). In each iteration, AWGN corrupted observations, q_t = x_t + A^T r_t = x + v, are obtained, where σ_v^2 is estimated by σ̂_t^2 (5). A subsequencing approach is applied to generate i.i.d. subsequences, where Approach 1 and Approach 2 (Section IV-B) are two possible implementations. The GM-based i.i.d. denoiser (13) is then utilized to denoise each i.i.d. subsequence.

To obtain the Onsager correction term in (3), we need to calculate the derivative of η_iid (13). For q ∈ R, denoting

f(q) = Σ_{s=1}^S α_s N(q; µ_s, σ_s^2 + σ_v^2) ( σ_s^2/(σ_s^2 + σ_v^2) (q − µ_s) + µ_s ),
g(q) = Σ_{s=1}^S α_s N(q; µ_s, σ_s^2 + σ_v^2),

we have that

f'(q) = Σ_{s=1}^S α_s N(q; µ_s, σ_s^2 + σ_v^2) ( (σ_s^2 + µ_s^2 − q µ_s)/(σ_s^2 + σ_v^2) − σ_s^2 (q − µ_s)^2/(σ_s^2 + σ_v^2)^2 ),
g'(q) = Σ_{s=1}^S α_s N(q; µ_s, σ_s^2 + σ_v^2) ( −(q − µ_s)/(σ_s^2 + σ_v^2) ).

Therefore,

η'_iid(q) = ( f'(q) g(q) − f(q) g'(q) ) / (g(q))^2.    (16)
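Equations (13) and (16) map almost line by line onto code. The sketch below (ours, vectorized over the entries of a subsequence) evaluates η_iid and η'_iid for a fitted GM; the Onsager term in (3) then only needs the average of the returned derivative.

```python
import numpy as np

def gm_denoise(q, alphas, mus, vars_x, var_v):
    """GM-based i.i.d. denoiser (13) and its derivative (16), vectorized over q.
    alphas, mus, vars_x are the fitted GM parameters of p_X; var_v is the scalar
    channel noise variance."""
    q = np.atleast_1d(q)[:, None]                            # shape (N, 1)
    var_q = vars_x + var_v                                    # component variances of p_Q
    phi = np.exp(-(q - mus) ** 2 / (2 * var_q)) / np.sqrt(2 * np.pi * var_q)
    wiener = vars_x / var_q * (q - mus) + mus                 # E[X | Q=q, comp=s]
    g = np.sum(alphas * phi, axis=1)                          # g(q)
    f = np.sum(alphas * phi * wiener, axis=1)                 # f(q)
    # derivatives of f and g, following the display above
    g_prime = np.sum(alphas * phi * (-(q - mus) / var_q), axis=1)
    f_prime = np.sum(alphas * phi * ((vars_x + mus ** 2 - q * mus) / var_q
                                     - vars_x * (q - mus) ** 2 / var_q ** 2), axis=1)
    x_hat = f / g                                             # eta_iid(q), eq. (13)
    deriv = (f_prime * g - f * g_prime) / g ** 2              # eta_iid'(q), eq. (16)
    return x_hat, deriv
```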
We highlight that AMP-UD is unaware of the input SNR and also unaware of the input statistics. The noise variance σ_v^2 in the scalar channel denoising problem is estimated by the average energy of the residual (5). The input's statistical structure is learned by the universal denoiser without any prior assumptions.

It has been proved [26] that the context quantization universal denoising scheme can asymptotically achieve the MMSE for stationary ergodic signals with known bounds. We have extended the scheme to unbounded signals in Sections III and IV, and conjecture that our modified universal denoiser can asymptotically achieve the MMSE for unbounded stationary ergodic signals. AMP with MMSE-achieving separable denoisers has been proved to asymptotically achieve the MMSE in linear systems for i.i.d. inputs [27]. In Section II-B, we provided numerical evidence that SE holds for AMP with Bayesian sliding-window denoisers. Bayesian sliding-window denoisers with proper window sizes are MMSE-achieving non-separable denoisers [26]. Given that our universal denoiser resembles a Bayesian sliding-window denoiser, we conjecture that AMP-UD can achieve the MMSE for stationary ergodic inputs in the limit of large linear systems where the matrix has i.i.d. random entries. Note that we have optimized the window size for inputs of length N = 10000 via numerical experiments. We believe that the window size should increase with N, and leave the characterization of the optimal window size for future work.

VI. NUMERICAL RESULTS

We run AMP-UD1 (AMP with UD1) and AMP-UD2 (AMP with UD2) in MATLAB on a Dell OPTIPLEX 9010 running an Intel(R) Core(TM) i7 with 16GB RAM, and test them utilizing different types of signals, including synthetic signals, a chirp sound clip, and a speech signal, at various measurement rates and SNR levels, where we remind the reader that SNR is defined in Section II-B.

[Fig. 3 plots signal to distortion ratio (dB) versus measurement rate R for the MMSE, AMP-UD2, AMP-UD1, SLA-MCMC, and EM-GM-AMP-MOS at 5 dB and 10 dB SNR.]

Fig. 3. Two AMP-UD implementations, SLA-MCMC, and EM-GM-AMP-MOS reconstruction results for an i.i.d. sparse Laplace signal as a function of measurement rate (R = M/N). Note that the SDR curves for the two AMP-UD implementations and EM-GM-AMP-MOS overlap the MMSE curve. (N = 10000, SNR = 5 dB or 10 dB.)

The input signal length N is 10000 for synthetic signals and roughly 10000 for the chirp sound clip and the speech signal. The context size 2k is chosen to be 12, and the contexts are weighted according to (14) and (15). The context quantization is implemented via the k-means algorithm [38]. In order to avoid possible divergence of AMP-UD, possibly due to a bad GM fit, we employ a damping technique [32] to slow down the evolution. Specifically, damping is an extra step in the AMP iteration (3); instead of updating the value of x_{t+1} by the output of the denoiser η_t(A^T r_t + x_t), a weighted sum of η_t(A^T r_t + x_t) and x_t is taken as follows:

x_{t+1} = λ η_t(A^T r_t + x_t) + (1 − λ) x_t,

for some λ ∈ (0, 1].

Parameters for AMP-UD1: The number of clusters L is initialized as 10, and may become smaller if empty clusters occur. The lower bound T on the number of symbols required to learn the GM parameters is 256. The damping parameter λ is 0.1, and we run 100 AMP iterations.

Parameters for AMP-UD2: The initial number of clusters is set to 30, and these clusters are merged according to the scheme described in Section IV. Because each time merging occurs we need to apply the GM fitting algorithm one more time to learn a new mixture model for the merged cluster, which is computationally demanding, we apply adaptive damping [34] to reduce the number of iterations required; the number of AMP iterations is set to 30. The damping parameter is initialized to 0.5, and increases (decreases) within the range [0.01, 0.5] if the value of the scalar channel noise estimate σ̂_t^2 (5) decreases (increases).

The recovery performance is evaluated by signal to distortion ratio (SDR = 10 log_10(E[X^2]/MSE)), where the MSE is averaged over 50 random draws of x, A, and z.

[Fig. 4 plots signal to distortion ratio (dB) versus measurement rate R for AMP-UD2, AMP-UD1, SLA-MCMC, and turboGAMP at 5 dB and 10 dB SNR.]

Fig. 4. Two AMP-UD implementations, SLA-MCMC, and turboGAMP reconstruction results for a two-state Markov signal with nonzero entries drawn from a uniform distribution U[0, 1] as a function of measurement rate. Note that the SDR curves for the two AMP-UD implementations overlap at SNR = 5 dB, and they both overlap turboGAMP at SNR = 10 dB. (N = 10000, SNR = 5 dB or 10 dB.)

We compare the performance of the two AMP-UD implementations to (i) the universal CS recovery algorithm SLA-MCMC [23]; and (ii) the empirical Bayesian message passing approaches EM-GM-AMP-MOS [10] for i.i.d. inputs and turboGAMP [11] for non-i.i.d. inputs. Note that EM-GM-AMP-MOS assumes during recovery that the input is i.i.d., whereas turboGAMP is designed for non-i.i.d. inputs with a known statistical model.
We do not include results for other well-known CS algorithms such as compressive sampling matching pursuit (CoSaMP) [40], gradient projection for sparse reconstruction (GPSR) [41], or ℓ_1 minimization [2, 3], because their SDR performance is consistently weaker than that of the three algorithms being compared.

Sparse Laplace signal (i.i.d.): We tested i.i.d. sparse Laplace signals that follow the distribution p_X(x) = 0.03·L(0, 1) + 0.97·δ(x), where L(0, 1) denotes a Laplacian distribution with mean zero and variance one, and δ(·) is the delta function [42]. It is shown in Fig. 3 that the two AMP-UD implementations and EM-GM-AMP-MOS achieve the MMSE [43, 44], whereas SLA-MCMC has weaker performance, because the MCMC approach is expected to sample from the posterior and its MSE is twice the MMSE [14, 23].

Markov-uniform signal: Consider the two-state Markov state machine defined in Section II-B with p_01 = 3/970 and p_10 = 1/10. A Markov-uniform signal (MUnif for short) follows a uniform distribution U[0, 1] in the nonzero state s_1. These parameters lead to 3% nonzero entries in an MUnif signal on average. It is shown in Fig. 4 that at low SNR, the two AMP-UD implementations achieve higher SDR than SLA-MCMC and turboGAMP. At high SNR, the two AMP-UD implementations and turboGAMP have similar SDR performance, and are slightly better than SLA-MCMC. We highlight that turboGAMP needs side information about the Markovian structure of the signal, whereas the two AMP-UD implementations and SLA-MCMC do not.

[Fig. 5 plots signal to distortion ratio (dB) versus measurement rate R for AMP-UD2, AMP-UD1, SLA-MCMC, and turboGAMP at 10 dB and 15 dB SNR.]

Fig. 5. Two AMP-UD implementations, SLA-MCMC, and turboGAMP reconstruction results for a dense two-state Markov signal with nonzero entries drawn from a Rademacher (±1) distribution as a function of measurement rate. (N = 10000, SNR = 10 dB or 15 dB.)

[Fig. 6 plots signal to distortion ratio (dB) versus measurement rate R for AMP-UD2, AMP-UD1, SLA-MCMC, and EM-GM-AMP-MOS at 5 dB and 10 dB SNR.]

Fig. 6. Two AMP-UD implementations, SLA-MCMC, and EM-GM-AMP-MOS reconstruction results for a chirp sound clip as a function of measurement rate. (N = 9600, SNR = 5 dB or 10 dB.)

Dense Markov-Rademacher signal: Consider the two-state Markov state machine defined in Section II-B with p_01 = 3/70 and p_10 = 1/10. A dense Markov-Rademacher signal (MRad for short) takes values from {−1, +1} with equal probability in state s_1. These parameters lead to 30% nonzero entries in an MRad signal on average. Because the MRad signal is dense (non-sparse), we must measure it with somewhat larger measurement rates and SNRs than before. It is shown in Fig. 5 that the two AMP-UD implementations and SLA-MCMC have better overall performance than turboGAMP. AMP-UD1 outperforms SLA-MCMC except for the lowest tested measurement rate at low SNR, whereas AMP-UD2 outperforms SLA-MCMC consistently.

Chirp sound clip and speech signal: Our experiments up to this point use synthetic signals. We now evaluate the reconstruction quality of AMP-UD for two real-world signals: a chirp sound clip and a speech signal. We cut a segment of length 9600 out of the chirp and a segment of length 10560 out of the speech signal (denoted by x), and performed a short-time discrete cosine transform (DCT) with window size, number of DCT points, and hop size all being 32. The resulting short-time DCT coefficient matrix is then vectorized to form a coefficient vector θ. Denoting the short-time DCT matrix by W^{-1}, we have θ = W^{-1} x. Therefore, we can rewrite (1) as y = Φθ + z, where Φ = AW. Our goal is to reconstruct θ from the measurements y and the matrix Φ. After we obtain the estimated coefficient vector θ̂, the estimated signal is calculated as x̂ = W θ̂.

Although the coefficient vector θ may exhibit some type of memory, it is not readily modeled in closed form, and so we cannot provide a valid model for turboGAMP [11]. Therefore, we use EM-GM-AMP-MOS [10] instead of turboGAMP [11]. The SDRs for the two AMP-UD implementations, SLA-MCMC, and EM-GM-AMP-MOS [10] are plotted for the chirp in Fig. 6 and for the speech signal in Fig. 7. We can see that both AMP-UD implementations outperform EM-GM-AMP-MOS consistently, which implies that the simple i.i.d. model is suboptimal for these two real-world signals. Moreover, AMP-UD2 provides comparable and in most cases higher SDR than SLA-MCMC, which indicates that AMP-UD2 is more reliable in learning various statistical structures than SLA-MCMC. AMP-UD1 is the fastest among the four algorithms, but it may have lower reconstruction quality than AMP-UD2 and SLA-MCMC, owing to poor selection of the subsequences. It is worth mentioning that we have also run simulations on an electrocardiograph (ECG) signal, and EM-GM-AMP-MOS achieved similar SDR to the two AMP-UD implementations, which indicates that an i.i.d. model might be adequate to represent the coefficients of the ECG signal; the plot is omitted for brevity.
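As one concrete reading of this measurement setup (ours; the non-overlapping 32-sample frames and the orthonormal type-II DCT are assumptions consistent with window = hop = 32), the coefficient vector and the effective matrix Φ = AW can be realized as follows.

```python
import numpy as np
from scipy.fft import dct, idct

FRAME = 32   # window size, number of DCT points, and hop size (non-overlapping frames)

def st_dct(x):
    """theta = W^{-1} x: frame-wise orthonormal DCT of the time-domain signal."""
    return dct(x.reshape(-1, FRAME), norm='ortho', axis=1).ravel()

def st_idct(theta):
    """x = W theta: frame-wise inverse DCT (synthesis)."""
    return idct(theta.reshape(-1, FRAME), norm='ortho', axis=1).ravel()

# Measurement in the transform domain: y = Phi theta + z with Phi = A W,
# i.e., A applied to the time-domain signal synthesized from theta.
rng = np.random.default_rng(0)
N = 9600                                  # chirp segment length used in the paper
x = rng.standard_normal(N)                # stand-in for the chirp segment
theta = st_dct(x)
M = int(0.4 * N)
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = A @ st_idct(theta)                    # equals A @ x up to numerical precision
```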
Runtime: The runtime of AMP-UD1 and AMP-UD2 for MUnif, MRad, and the speech signal is typically under 5 minutes and 10 minutes, respectively, but somewhat more for signals such as sparse Laplace and the chirp sound clip that require a large number of Gaussian components to be fit. For comparison, the runtime of SLA-MCMC is typically an hour, whereas typical runtimes of EM-GM-AMP-MOS and turboGAMP are 30 minutes. To further accelerate AMP, we could consider parallel computing. That is, after clustering, the Gaussian mixture learning algorithm can be run simultaneously on different processors.

VII. CONCLUSION

In this paper, we introduced a universal compressed sensing recovery algorithm, AMP-UD, that applies our proposed universal denoiser (UD) within approximate message passing (AMP). AMP-UD is designed to reconstruct stationary ergodic signals from noisy linear measurements. The performance of two AMP-UD implementations was evaluated via simulations, where it was shown that AMP-UD achieves favorable signal to distortion ratios compared to existing universal algorithms, and that its runtime is typically faster. AMP-UD combines three existing schemes: (i) AMP [6]; (ii) universal denoising [26]; and (iii) a density estimation approach based on Gaussian mixture (GM) fitting [24].

[Fig. 7 plots signal to distortion ratio (dB) versus measurement rate R for AMP-UD2, AMP-UD1, SLA-MCMC, and EM-GM-AMP-MOS at 5 dB and 10 dB SNR.]

Fig. 7. Two AMP-UD implementations, SLA-MCMC, and EM-GM-AMP-MOS reconstruction results for a speech signal as a function of measurement rate. (N = 10560, SNR = 5 dB or 10 dB.)

In addition to the algorithmic framework, we provided three specific contributions. First, we provided numerical results showing that SE holds for non-separable Bayesian sliding-window denoisers. Second, we modified the GM learning algorithm and extended it to an i.i.d. denoiser. Third, we designed a universal denoiser that does not need to know the bounds of the input or require the input signal to be bounded. Two implementations of the universal denoiser were provided, with one being faster and the other achieving better reconstruction quality in terms of signal to distortion ratio.

There are numerous directions for future work. First, our current algorithm was designed to minimize the square error, and the denoiser could be modified to minimize other error metrics [45]. Second, AMP-UD was designed to reconstruct one-dimensional signals; in order to support applications that process multi-dimensional signals such as images, it might be instructive to employ universal image denoisers within AMP. Third, the relation between the input length and the optimal window size, as well as the exponential decay rate of the context weights, can be investigated. Finally, we can modify our work to support measurement noise with unknown distributions as an extension to adaptive generalized AMP [12].

ACKNOWLEDGEMENTS

We thank Mario Figueiredo and Tsachy Weissman for informative discussions; and Ahmad Beirami and Marco Duarte for detailed suggestions on the manuscript.

REFERENCES

[1] Y. Ma, J. Zhu, and D. Baron, "Compressed sensing via universal denoising and approximate message passing," in Proc. Allerton Conf. Commun., Control, Comput., Oct. 2014.
[2] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, Apr. 2006.
[3] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, Feb. 2006.
[4] R. Tibshirani, "Regression shrinkage and selection via the LASSO," J. Royal Stat. Soc. Series B (Methodological), vol. 58, no. 1, 1996.
[5] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comp., vol. 20, no. 1, 1998.
[6] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing," Proc. Nat. Academy Sci., vol. 106, no. 45, Nov. 2009.
[7] D. Baron, S. Sarvotham, and R. G. Baraniuk, "Bayesian compressive sensing via belief propagation," IEEE Trans. Signal Process., vol. 58, Jan. 2010.
[8] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing: I. Motivation and construction," in IEEE Inf. Theory Workshop, Jan. 2010.
[9] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), July 2011.
[10] J. Vila and P. Schniter, "Expectation-maximization Gaussian-mixture approximate message passing," IEEE Trans. Signal Process., vol. 61, no. 19, Oct. 2013.
[11] J. Ziniel, S. Rangan, and P. Schniter, "A generalized framework for learning and recovery of structured sparse signals," in Proc. IEEE Stat. Signal Process. Workshop (SSP), Aug. 2012.
[12] U. Kamilov, S. Rangan, A. K. Fletcher, and M. Unser, "Approximate message passing with consistent parameter estimation and applications to sparse learning," IEEE Trans. Inf. Theory, vol. 60, no. 5, May 2014.
[13] Y. Ma, D. Baron, and A. Beirami, "Mismatched estimation in large linear systems," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), July 2015.
[14] D. L. Donoho, "The Kolmogorov sampler," Stanford University, Stanford, CA, Department of Statistics Technical Report 2002-4, Jan. 2002.
[15] D. Donoho, H. Kakavand, and J. Mammen, "The simplest solution to an underdetermined system of linear equations," in Proc. Int. Symp. Inf. Theory (ISIT), July 2006.
[16] S. Jalali and A. Maleki, "Minimum complexity pursuit," in Proc. Allerton Conf. Commun., Control, Comput., Sept. 2011.
[17] S. Jalali, A. Maleki, and R. G. Baraniuk, "Minimum complexity pursuit for universal compressed sensing," IEEE Trans. Inf. Theory, vol. 60, no. 4, Apr. 2014.
[18] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY, USA: Wiley-Interscience, 2006.
[19] M. Li and P. M. B. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, New York, 2008.
[20] D. Baron, "Information complexity and estimation," in Workshop Inf. Theoretic Methods Sci. Eng. (WITMSE), Aug. 2011.
[21] D. Baron and M. F. Duarte, "Universal MAP estimation in compressed sensing," in Proc. Allerton Conf. Commun., Control, Comput., Sept. 2011.
[22] J. Zhu, D. Baron, and M. F. Duarte, "Complexity adaptive universal signal estimation for compressed sensing," in Proc. IEEE Stat. Signal Process. Workshop (SSP), June 2014.
[23] ——, "Recovery from linear measurements with complexity-matching universal signal estimation," IEEE Trans. Signal Process., vol. 63, no. 6, Mar. 2015.
[24] M. Figueiredo and A. Jain, "Unsupervised learning of finite mixture models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, Mar. 2002.
[25] K. Sivaramakrishnan and T. Weissman, "Universal denoising of discrete-time continuous-amplitude signals," IEEE Trans. Inf. Theory, vol. 54, no. 12, Dec. 2008.
[26] ——, "A context quantization approach to universal denoising," IEEE Trans. Signal Process., vol. 57, no. 6, June 2009.
[27] M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Trans. Inf. Theory, vol. 57, no. 2, Feb. 2011.
[28] D. Donoho, I. Johnstone, and A. Montanari, "Accurate prediction of phase transitions in compressed sensing via a connection to minimax denoising," IEEE Trans. Inf. Theory, vol. 59, no. 6, June 2013.
[29] J. Tan, Y. Ma, and D. Baron, "Compressive imaging via approximate message passing with image denoising," IEEE Trans. Signal Process., vol. 63, no. 8, Apr. 2015.
[30] C. Rush, A. Greig, and R. Venkataramanan, "Capacity-achieving sparse superposition codes via approximate message passing decoding," in Proc. Int. Symp. Inf. Theory (ISIT), June 2015.
[31] C. Metzler, A. Maleki, and R. G. Baraniuk, "From denoising to compressed sensing," arXiv preprint, June 2014.

[32] S. Rangan, P. Schniter, and A. Fletcher, "On the convergence of approximate message passing with arbitrary matrices," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), July 2014.
[33] A. Manoel, F. Krzakala, E. W. Tramel, and L. Zdeborova, "Sparse estimation with the swept approximated message-passing algorithm," arXiv preprint, June 2014.
[34] J. Vila, P. Schniter, S. Rangan, F. Krzakala, and L. Zdeborova, "Adaptive damping and mean removal for the generalized approximate message passing algorithm," in IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), Apr. 2015.
[35] S. Rangan, A. K. Fletcher, P. Schniter, and U. S. Kamilov, "Inference for generalized linear models via alternating directions and Bethe free energy minimization," in Proc. Int. Symp. Inf. Theory (ISIT), June 2015.
[36] A. Montanari, "Graphical models concepts in compressed sensing," Compressed Sensing: Theory and Applications, 2012.
[37] Y. Ma, J. Zhu, and D. Baron, "Approximate message passing algorithm with universal denoising and Gaussian mixture learning," arXiv preprint, Aug. 2015.
[38] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. Math. Stat. & Prob., vol. 1, no. 14, 1967.
[39] A. Barron, J. Rissanen, and B. Yu, "The minimum description length principle in coding and modeling," IEEE Trans. Inf. Theory, vol. 44, no. 6, Oct. 1998.
[40] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Computational Harmonic Anal., vol. 26, no. 3, May 2009.
[41] M. Figueiredo, R. Nowak, and S. J. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems," IEEE J. Select. Topics Signal Process., vol. 1, Dec. 2007.
[42] A. Papoulis, Probability, Random Variables, and Stochastic Processes. McGraw Hill Book Co.
[43] D. Guo, D. Baron, and S. Shamai, "A single-letter characterization of optimal noisy compressed sensing," in Proc. 47th Allerton Conf. Commun., Control, and Comput., Sept. 2009.
[44] S. Rangan, A. K. Fletcher, and V. K. Goyal, "Asymptotic analysis of MAP estimation via the replica method and applications to compressed sensing," IEEE Trans. Inf. Theory, vol. 58, no. 3, Mar. 2012.
[45] J. Tan, D. Carmon, and D. Baron, "Signal estimation with additive error metrics in compressed sensing," IEEE Trans. Inf. Theory, vol. 60, no. 1, Jan. 2014.

Yanting Ma (S'13) received the B.Sc. degree in communication engineering from Wuhan University, China, in 2012. Currently she is a Ph.D. candidate at North Carolina State University, in the Department of Electrical and Computer Engineering. Her research interests include information theory, convex optimization, and statistical signal processing.

Junan Zhu (S'11) received the B.E. degree in Electrical Engineering with a focus on Optoelectronics from the University of Shanghai for Science and Technology (USST), Shanghai, China, in 2011. He is currently pursuing the Ph.D. degree in Electrical Engineering at North Carolina State University (NCSU), Raleigh, NC. His research interests include compressed sensing, machine learning, optimization, distributed algorithms, and computational imaging.

Dror Baron (S'99-M'03-SM'10) received the B.Sc. (summa cum laude) and M.Sc. degrees from the Technion - Israel Institute of Technology, Haifa, Israel, in 1997 and 1999, and the Ph.D. degree from the University of Illinois at Urbana-Champaign in 2003, all in electrical engineering. From 1997 to 1999, Dr. Baron worked at Witcom Ltd. in modem design.
From 1999 to 2003, he was a research assistant at the University of Illinois at Urbana-Champaign, where he was also a Visiting Assistant Professor in 2003. From 2003 to 2006, he was a Postdoctoral Research Associate in the Department of Electrical and Computer Engineering at Rice University, Houston, TX. From 2007 to 2008, he was a quantitative financial analyst with Menta Capital, San Francisco, CA, and from 2008 to 2010 he was a Visiting Scientist in the Department of Electrical Engineering at the Technion - Israel Institute of Technology, Haifa. Since 2010, Dr. Baron has been with the Electrical and Computer Engineering Department at North Carolina State University, where he is currently an Associate Professor. Dr. Baron's research interests combine information theory, signal processing, and fast algorithms; in recent years, he has focused on compressed sensing. Dr. Baron was a recipient of the 2002 M. E. Van Valkenburg Graduate Research Award, and received honorable mention at the Robert Bohrer Memorial Student Workshop in April 2002, both at the University of Illinois. He also participated from 1994 to 1997 in the Program for Outstanding Students, comprising the top 0.5% of undergraduates at the Technion.
