
Approximate Message Passing Algorithm with Universal Denoising and Gaussian Mixture Learning

Yanting Ma, Student Member, IEEE, Junan Zhu, Student Member, IEEE, and Dror Baron, Senior Member, IEEE

Abstract—We study compressed sensing (CS) signal reconstruction problems where an input signal is measured via matrix multiplication under additive white Gaussian noise. Our signals are assumed to be stationary and ergodic, but the input statistics are unknown; the goal is to provide reconstruction algorithms that are universal to the input statistics. We present a novel algorithmic framework that combines: (i) the approximate message passing (AMP) CS reconstruction framework, which solves the matrix channel recovery problem by iterative scalar channel denoising; (ii) a universal denoising scheme based on context quantization, which partitions the stationary ergodic signal denoising into independent and identically distributed (i.i.d.) subsequence denoising; and (iii) a density estimation approach that approximates the probability distribution of an i.i.d. sequence by fitting a Gaussian mixture (GM) model. In addition to the algorithmic framework, we provide three contributions: (i) numerical results showing that state evolution holds for non-separable Bayesian sliding-window denoisers; (ii) an i.i.d. denoiser based on a modified GM learning algorithm; and (iii) a universal denoiser that does not need information about the range of values the input takes, nor does it require the input signal to be bounded. We provide two implementations of our universal CS recovery algorithm, with one being faster and the other being more accurate. The two implementations compare favorably with existing universal reconstruction algorithms in terms of both reconstruction quality and runtime.

Index Terms—approximate message passing, compressed sensing, Gaussian mixture model, universal denoising.

Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. This work was supported in part by the National Science Foundation under Grant CCF and in part by the U.S. Army Research Office under Grant W911NF. Portions of the work appeared at the 52nd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, Oct. 2014 [1]. Yanting Ma, Junan Zhu, and Dror Baron are with the Department of Electrical and Computer Engineering, NC State University, Raleigh, NC; e-mail: {yma7, jzhu9, barondror}@ncsu.edu.

I. INTRODUCTION

A. Motivation

Many scientific and engineering problems can be approximated as linear systems of the form

y = A x + z,    (1)

where x ∈ R^N is the unknown input signal, A ∈ R^{M×N} is the matrix that characterizes the linear system, and z ∈ R^M is measurement noise. The goal is to estimate x from the measurements y given A and statistical information about z. When M ≪ N, the setup is known as compressed sensing (CS); by posing a sparsity or compressibility requirement on the signal, it is indeed possible to accurately recover x from the ill-posed linear system [2, 3]. However, we might need M > N when the signal is dense or the noise is substantial.

One popular scheme to solve the CS recovery problem is LASSO [4] (also known as basis pursuit denoising [5]):

x̂ = arg min_{x ∈ R^N} (1/2) ‖y − Ax‖_2^2 + γ ‖x‖_1,

where ‖·‖_p denotes the ℓ_p-norm, and γ reflects a trade-off between the sparsity ‖x‖_1 and the residual ‖y − Ax‖_2^2.
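As a point of reference for the formulation above, the following sketch sets up an instance of (1) with M < N and minimizes the LASSO objective with plain iterative soft thresholding (ISTA). This is an illustration only, not the reconstruction method developed in this paper; the dimensions, sparsity level, noise level, and regularization weight are arbitrary choices.

```python
import numpy as np

# Illustrative sketch of the model y = A x + z with M < N and a plain ISTA loop
# for the LASSO objective (1/2)||y - A x||_2^2 + gamma ||x||_1.
# All sizes and parameters below are arbitrary choices, not the paper's.
rng = np.random.default_rng(0)
N, M, k = 1000, 400, 30                       # signal length, measurements, nonzeros
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((M, N)) / np.sqrt(M)  # i.i.d. Gaussian, unit-norm columns on average
y = A @ x_true + 0.01 * rng.standard_normal(M)

gamma = 0.05
L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the quadratic term's gradient
soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

x = np.zeros(N)
for _ in range(500):                          # ISTA: gradient step followed by soft thresholding
    x = soft(x + A.T @ (y - A @ x) / L, gamma / L)

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

In the remainder of the paper, this simple thresholding step is replaced by far more powerful denoisers applied inside AMP.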
This approach does not require statistical information about x and z, and can be conveniently solved via standard convex optimization tools or the approximate message passing (AMP) algorithm [6]. However, the reconstruction quality is often far from optimal in terms of mean square error (MSE). Bayesian CS recovery algorithms based on message passing [7–9] usually achieve better reconstruction quality, but must know the prior for x. For parametric signals with unknown parameters, one can infer the parameters and achieve the minimum mean square error (MMSE) in some settings; examples include EM-GM-AMP-MOS [10], turboGAMP [11], and adaptive-GAMP [12]. Unfortunately, possible uncertainty about the input statistics may make it difficult to select a model class for empirical Bayes algorithms; a mismatched model can yield excess mean square error (EMSE) above the MMSE, and the EMSE can get amplified in linear inverse problems (1) compared to that in scalar estimation problems [13].

Our goal is to develop universal schemes that approach the optimal Bayesian performance for stationary ergodic signals despite not knowing the input statistics. Although others have worked on CS algorithms for independent and identically distributed (i.i.d.) signals with unknown distributions [10], we are particularly interested in developing algorithms for signals that may not be well approximated by i.i.d. models, because real-world signals often contain dependencies between different entries. For example, we will see in Fig. 6 that a chirp sound clip is reconstructed 1 dB better with models that can capture such dependencies than with i.i.d. models applied to sparse transform coefficients.

While approaches based on Kolmogorov complexity [14–17] are theoretically appealing for universal signal recovery, they are not computable in practice [18, 19]. Several algorithms based on Markov chain Monte Carlo (MCMC) [20–23] leverage the fact that for stationary ergodic signals, both the per-symbol empirical entropy and Kolmogorov complexity converge asymptotically almost surely to the entropy rate of the signal [18], and aim to minimize the empirical entropy. The best existing implementation of the MCMC approach [23] often achieves an MSE that is within 3 dB of the MMSE, which resembles a result by Donoho for universal denoising [14].

In this paper, we confine our attention to the system model defined in (1), where the input signal x is stationary and ergodic. We merge concepts from AMP [6], Gaussian mixture (GM) learning [24] for density estimation, and universal denoising for stationary ergodic signals [25, 26]. We call the resulting universal CS recovery algorithm AMP-UD (AMP with a universal denoiser). Two implementations of AMP-UD are provided, and they compare favorably with existing universal approaches in terms of reconstruction quality and runtime.

B. Related work and main results

Approximate message passing: AMP is an iterative algorithm that solves a linear inverse problem by successively converting matrix channel problems into scalar channel denoising problems with additive white Gaussian noise (AWGN). AMP has received considerable attention because of its fast convergence and the state evolution (SE) formalism [6, 27], which offers a precise characterization of the AWGN denoising problem in each iteration. AMP with separable denoisers has been rigorously proved to obey SE [27].

The focus of this paper is the reconstruction of signals that are not necessarily i.i.d., and so we need to explore non-separable denoisers. Donoho et al. [28] provide numerical results demonstrating that SE accurately predicts the phase transition of AMP when some well-behaved non-separable minimax denoisers are applied, and conjecture that SE holds for AMP with a broader class of denoisers. A compressive imaging algorithm that applies non-separable image denoisers within AMP appears in Tan et al. [29]. Rush et al. [30] apply AMP to sparse superposition decoding, and prove that SE holds for AMP with certain block-separable denoisers and that such an AMP-based decoder achieves channel capacity. A potential challenge of implementing AMP is to obtain the Onsager correction term [6], which involves the calculation of the derivative of a denoiser. Metzler et al. [31] leverage a Monte Carlo technique to approximate the derivative of a denoiser when an explicit analytical formulation of the denoiser is unavailable, and provide numerical results showing that SE holds for AMP with their approximation. Despite the encouraging results for using non-separable denoisers within AMP, a rigorous proof that SE holds for general non-separable denoisers has yet to appear. Consequently, new evidence showing that AMP obeys SE may increase the community's confidence about using non-separable denoisers within AMP. Our first contribution is that we provide numerical results showing that SE holds for non-separable Bayesian sliding-window denoisers.

Fitting Gaussian mixture models: Figueiredo and Jain [24] propose an unsupervised GM learning algorithm that fits a given data sequence with a GM model. The algorithm employs a cost function that resembles the minimum message length criterion, and the parameters are learned via expectation-maximization (EM). Our GM fitting problem involves estimating the probability density function (pdf) of a sequence x from its AWGN corrupted observations. We modify the GM fitting algorithm [24], so that a GM model can be learned from noisy data. Once the estimated pdf p̂_X of x is available, we estimate x by computing the conditional expectation with the estimated pdf p̂_X (recall that MMSE estimators rely on conditional expectation). Our second contribution is that we modify the GM learning algorithm and extend it to an i.i.d. denoiser.
Universal denoising: Our denoiser for stationary ergodic signals is inspired by a context quantization approach [26], where a universal denoiser for a stationary ergodic signal involves multiple i.i.d. denoisers for conditionally i.i.d. subsequences. Sivaramakrishnan and Weissman [26] have shown that their universal denoiser based on context quantization can achieve the MMSE asymptotically for stationary ergodic signals with known bounds. The boundedness condition of Sivaramakrishnan and Weissman [26] is partly due to their density estimation approach, in which the empirical distribution function is obtained by quantizing the bounded range of the signal. Such boundedness conditions may be undesirable in certain applications. We overcome this limitation by replacing their density estimation approach with GM model learning. Our third contribution is a universal denoiser that does not need information about the bounds or require the input signal to be bounded; we conjecture that our universal denoiser achieves the MMSE asymptotically under some technical conditions.

A flow chart of AMP-UD, which employs the AMP framework along with our modified universal denoiser (η_univ) and the GM-based i.i.d. denoiser (η_iid), is shown in Fig. 1. Based on the numerical evidence that SE holds for AMP with Bayesian sliding-window denoisers and the conjecture that our universal denoiser can achieve the MMSE, we further conjecture that AMP-UD achieves the MMSE under some technical conditions. The details of AMP-UD, including two practical implementations, are developed in Sections II–V.

The remainder of the paper is arranged as follows. In Section II, we review AMP and provide new numerical evidence that AMP obeys SE with non-separable denoisers. Section III modifies the GM fitting algorithm and extends it to an i.i.d. denoiser. In Section IV, we extend the universal denoiser based on context quantization to overcome the boundedness condition, and two implementations are provided to improve denoising quality. Our proposed AMP-UD algorithm is summarized in Section V. Numerical results are shown in Section VI, and we conclude the paper in Section VII.

II. APPROXIMATE MESSAGE PASSING WITH SLIDING-WINDOW DENOISERS

In this section, we apply non-separable Bayesian sliding-window denoisers within AMP, and provide numerical evidence that state evolution (SE) holds for AMP with this class of denoisers.

A. Review of AMP

Consider a linear system (1), where the measurement matrix A has zero-mean i.i.d. Gaussian entries with unit-norm columns on average, and z represents i.i.d. Gaussian noise with pdf p_Z(z_i) = N(z_i; 0, σ_z^2), where z_i is the i-th entry of the vector z, and N(x; µ, σ^2) denotes a Gaussian pdf:

N(x; µ, σ^2) = (1/√(2πσ^2)) exp( −(x − µ)^2 / (2σ^2) ).

[Fig. 1 shows the AMP-UD flow chart: AMP decoupling (q_t = x_t + A^T r_t and the residual update), context quantization of q_t into subsequences q_t^(1), ..., q_t^(L), i.i.d. denoising of each subsequence, reordering, and the output x̂ = x_{t_max}.]

Fig. 1. Flow chart of AMP-UD. AMP (2), (3) decouples the linear inverse problem into scalar channel denoising problems. In the t-th iteration, the universal denoiser η_univ,t(·) converts stationary ergodic signal denoising into i.i.d. subsequence denoising. Each i.i.d. denoiser η_iid,t(·) (13) outputs the denoised subsequence x_{t+1}^(l) and the derivative of the denoiser η'_iid,t(·) (16). The algorithm stops when the iteration index t reaches the predefined maximum t_max, and outputs x_{t_max} as the CS recovery result.

Note that AMP has been proved to follow SE when A is a zero-mean i.i.d. Gaussian matrix, but may diverge otherwise. Several techniques have been proposed to improve the convergence of AMP [32–35]. Moreover, other noise distributions can be supported using generalized AMP (GAMP) [9], and the noise distribution can be estimated in each GAMP iteration [12]. Such generalizations are beyond the scope of this work.

Starting with x_0 = 0, the AMP algorithm [6] proceeds iteratively according to

x_{t+1} = η_t(A^T r_t + x_t),    (2)
r_t = y − A x_t + (1/R) r_{t−1} ⟨η'_{t−1}(A^T r_{t−1} + x_{t−1})⟩,    (3)

where R = M/N represents the measurement rate, t represents the iteration index, η_t(·) is a denoising function, and ⟨u⟩ = (1/N) Σ_{i=1}^N u_i for some vector u ∈ R^N. The last term in (3) is called the Onsager correction term in statistical physics. The empirical distribution of x is assumed to converge to some probability distribution p_X on R, and the denoising function η_t(·) is separable in the original derivation of AMP [6, 27, 36]. That is, η_t(u) = (η_t(u_1), η_t(u_2), ..., η_t(u_N)) and η'_t(u) = (η'_t(u_1), η'_t(u_2), ..., η'_t(u_N)), where η'_t(·) denotes the derivative of η_t(·).

A useful property of AMP is that at each iteration, the vector A^T r_t + x_t ∈ R^N in (2) is statistically equivalent to the input signal x corrupted by AWGN, where the noise variance σ_t^2 evolves following SE in the limit of large systems (N → ∞, M/N → R):

σ_{t+1}^2 = σ_z^2 + (1/R) MSE(η_t, σ_t^2),    (4)

where MSE(η_t, σ_t^2) = E_{X,W}[ (η_t(X + σ_t W) − X)^2 ], W ∼ N(w; 0, 1), X ∼ p_X, and σ_0^2 = σ_z^2 + (1/R) E[X^2]. Formal statements for SE appear in the reference papers [27, 36]. Additionally, it is convenient to use the following estimator for σ_t^2 [27, 36]:

σ̂_t^2 = (1/M) ‖r_t‖^2.    (5)
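The iteration (2)–(3) is short enough to transcribe directly. The sketch below is our illustration (not the authors' code) of AMP with a simple separable soft-thresholding denoiser, where the threshold is tied to the noise-level estimate (5); in AMP-UD this denoiser is replaced by the universal denoiser developed in Sections III and IV.

```python
import numpy as np

def amp(y, A, n_iter=30, alpha=1.5):
    """Sketch of AMP (2)-(3) with a separable soft-threshold denoiser.
    The threshold alpha * sigma_t is an illustrative choice, not the paper's denoiser."""
    M, N = A.shape
    R = M / N
    x = np.zeros(N)
    r = np.copy(y)                                   # residual r_0 = y - A*0
    for _ in range(n_iter):
        sigma = np.linalg.norm(r) / np.sqrt(M)       # noise-level estimate (5)
        q = x + A.T @ r                              # effective scalar channel output q_t
        thr = alpha * sigma
        x_new = np.sign(q) * np.maximum(np.abs(q) - thr, 0.0)   # eta_t(q)
        deriv = (np.abs(q) > thr).astype(float)                 # eta_t'(q)
        r = y - A @ x_new + r * np.mean(deriv) / R               # Onsager-corrected residual (3)
        x = x_new
    return x
```

Note that the Onsager term in (3) only requires the empirical average of the denoiser's derivative, which is why Section V derives η'_iid in closed form.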
B. State evolution for Bayesian sliding-window denoisers

SE allows us to calculate the asymptotic MSE of linear systems from the MSE of the denoiser used within AMP. Therefore, knowing that SE holds for AMP with the denoisers that we are interested in can help us choose a good denoiser for AMP. It has been conjectured by Donoho et al. [28] that AMP with a wide range of non-separable denoisers obeys SE. We now provide new evidence to support this conjecture by constructing non-separable Bayesian denoisers within a sliding-window denoising scheme for two stationary ergodic Markov signal models, and showing that SE accurately predicts the performance of AMP with this class of denoisers for large signal dimension N. Note that for a signal that is generated by a stationary ergodic process, its empirical distribution converges to the stationary distribution; hence the condition on the input signal in the proof for SE [27] is satisfied, and our goal is to numerically verify that SE holds for AMP with non-separable sliding-window denoisers for stationary ergodic signal models. Our rationale for examining the SE performance of sliding-window denoisers is that the context quantization based universal denoiser [26], which will be used in Section IV, resembles a sliding-window denoiser.

The mathematical model for an AWGN channel denoising problem is defined as

q = x + v,    (6)

where x ∈ R^N is the input signal, v ∈ R^N is AWGN with pdf p_V(v_i) = N(v_i; 0, σ_v^2), and q ∈ R^N is a sequence of noisy observations. Note that we are interested in designing denoisers for AMP, and the noise variance of the scalar channel in each AMP iteration can be estimated as σ̂_t^2 (5). Therefore, throughout the paper we assume that the noise variance σ_v^2 is known when we discuss scalar channels.

In a separable denoiser, x_j is estimated only from its noisy observation q_j. The separable Bayesian denoiser that minimizes the MSE is point-wise conditional expectation,

x̂_j = E[X | Q = q_j] = ∫ x p(x | q_j) dx,    (7)

where Bayes' rule yields p(x | q_j) = N(q_j; x, σ_v^2) p_X(x) / p_Q(q_j). If entries of the input signal x are drawn independently from p_X, then (7) achieves the MMSE. When there are statistical dependencies among the entries of x, a sliding-window scheme can be applied to improve the MSE. We consider two Markov sources as examples that contain statistical dependencies, and emphasize that our true motivation is the richer class of stationary ergodic sources.

Example source 1: Consider a two-state Markov state machine that contains states s_0 (a zero state in which the signal entries are zero) and s_1 (a nonzero state in which the entries are nonzero).

The transition probabilities are p_10 = p(s_0 | s_1) and p_01 = p(s_1 | s_0). In the steady state, the marginal probability of state s_1 is p_01 / (p_01 + p_10). We call our first example source Markov-Gaussian (MGauss for short); it is generated by the two-state Markov machine with p_01 = 3/970 and p_10 = 1/10, and in the nonzero state the signal value follows a Gaussian distribution N(x; µ_x, σ_x^2). These state transition parameters yield 3% nonzero entries in an MGauss signal on average.

Example source 2: Our second example is a four-state Markov switching signal (M4 for short) that follows the pattern +1, +1, −1, −1, +1, +1, −1, −1, ... with 3% error probability in state transitions, resulting in the signal switching from −1 to +1 or vice versa either too early or too late; the four states s_1 = [−1 −1], s_2 = [−1 +1], s_3 = [+1 −1], and s_4 = [+1 +1] have equal marginal probabilities 0.25 in the steady state.

Bayesian sliding-window denoiser: Let θ be a binary vector, where θ_i = 0 indicates x_i = 0, and θ_i = 1 indicates x_i ≠ 0. Denoting a block (u_s, u_{s+1}, ..., u_t) of any sequence u by u_s^t for s < t, the (2k+1)-Bayesian sliding-window denoiser η_MGauss for the MGauss signal is defined as

η_MGauss,j(q_{j−k}^{j+k}) = E[X_j | Q_{j−k}^{j+k} = q_{j−k}^{j+k}]
= Σ_{θ_{j−k}^{j+k} ∈ {s_0,s_1}^{2k+1} : θ_j = s_1} [ Π_{i=j−k}^{j+k} h(q_i, θ_i; µ_x, σ_x^2, σ_v^2) · p_Θ(θ_{j−k}^{j+k}) / p_Q(q_{j−k}^{j+k}) ] · ( σ_x^2/(σ_x^2 + σ_v^2) (q_j − µ_x) + µ_x ),    (8)

where

h(q_i, θ_i; µ_x, σ_x^2, σ_v^2) = N(q_i; µ_x, σ_v^2 + σ_x^2) if θ_i = s_1, and N(q_i; 0, σ_v^2) if θ_i = s_0,
p_Θ(θ_{j−k}^{j+k}) = p(θ_{j−k}) Π_{i=j−k}^{j+k−1} p(θ_{i+1} | θ_i),
p_Q(q_{j−k}^{j+k}) = Σ p_Θ(θ_{j−k}^{j+k}) Π_{i=j−k}^{j+k} h(q_i, θ_i; µ_x, σ_x^2, σ_v^2),

and the summation is over θ_{j−k}^{j+k} ∈ {s_0, s_1}^{2k+1}. The MSE of η_MGauss,j is

MSE(η_MGauss, σ_v^2) = E[ (X_j − η_MGauss,j(Q_{j−k}^{j+k}))^2 ]
= p_01 (σ_x^2 + µ_x^2) / (p_01 + p_10) − ∫_{R^{2k+1}} (η_MGauss,j(q))^2 p_Q(q) dq.    (9)

Similarly, the (2k+1)-Bayesian sliding-window denoiser η_M4 for the M4 signal is defined as

η_M4,j(q_{j−k}^{j+k}) = E[X_j | Q_{j−k}^{j+k} = q_{j−k}^{j+k}]
= [ p_{X_j,Q}(1, q_{j−k}^{j+k}) − p_{X_j,Q}(−1, q_{j−k}^{j+k}) ] / [ p_{X_j,Q}(1, q_{j−k}^{j+k}) + p_{X_j,Q}(−1, q_{j−k}^{j+k}) ],    (10)

where

p_X(x_{j−k}^{j+k}) = p(x_{j−k}, x_{j−k+1}) Π_{i=j−k}^{j+k−2} p(x_{i+2} | x_{i+1}, x_i),
p_{X_j,Q}(x, q_{j−k}^{j+k}) = Σ p_X(x_{j−k}^{j+k}) Π_{i=j−k}^{j+k} N(q_i; x_i, σ_v^2),

where the summation is over x_{j−k}^{j+k} ∈ {−1, 1}^{2k+1} with x_j = x ∈ {−1, 1} fixed. It can be shown that

MSE(η_M4, σ_v^2) = E[ (X_j − η_M4,j(Q_{j−k}^{j+k}))^2 ]
= 4 ∫_{R^{2k+1}} [ p_{X_j,Q}(−1, q) p_{X_j,Q}(1, q) / p_Q(q) ] dq.    (11)

If AMP with η_MGauss or η_M4 obeys SE, then the noise variance σ_t^2 should evolve according to (4). As a consequence, the reconstruction error at iteration t can be predicted by evaluating (9) or (11) with σ_v^2 replaced by σ_t^2.

Numerical evidence: We apply η_MGauss (8) within AMP for MGauss signals, and η_M4 (10) within AMP for M4 signals. The window size 2k+1 is chosen to be 1 or 3 for η_MGauss, and 1 or 5 for η_M4. Note that when the window size is 1, η_MGauss and η_M4 become separable denoisers. The MSE predicted by SE is compared to the empirical MSE at each iteration, where the input signal to noise ratio (SNR = 10 log_10[(N E[X^2])/(M σ_z^2)]) is 10 dB for both MGauss and M4. It is shown in Fig. 2 for AMP with η_MGauss and η_M4 that the markers representing the empirical MSE track the lines predicted by SE, and that side information from neighboring entries helps improve the MSE.

Our SE results for the two Markov sources increase our confidence that AMP with non-separable denoisers that incorporate information from neighboring entries will track SE. The reader may have noticed from Fig. 1 that the universal denoiser η_univ(·) is acting as a set of separable denoisers η_iid(·). However, the statistical information used by η_iid(·) is learned from subsequences q_t^(1), ..., q_t^(L) of the noisy sequence q_t, and the subsequencing result is determined by the neighborhood of each entry.
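To make (8) concrete, here is a small sketch (ours, with illustrative parameter values) that evaluates the (2k+1)-window MGauss denoiser by brute-force enumeration of the 2^{2k+1} state patterns in the window; it is only practical for the short windows used above.

```python
import numpy as np
from itertools import product

def eta_mgauss(q_win, j, p01, p10, mu_x, var_x, var_v):
    """Sliding-window denoiser (8) for the Markov-Gaussian source, evaluated by
    enumerating all state patterns theta in {0,1}^(2k+1) over the window q_win.
    Index j is the position of the symbol being estimated within the window."""
    pi1 = p01 / (p01 + p10)                            # stationary P(state = s1)
    trans = np.array([[1 - p01, p01], [p10, 1 - p10]]) # rows: current state, cols: next state
    num, den = 0.0, 0.0
    for theta in product((0, 1), repeat=len(q_win)):
        p_theta = pi1 if theta[0] else 1 - pi1          # prior of the state pattern
        for a, b in zip(theta[:-1], theta[1:]):
            p_theta *= trans[a, b]
        like = 1.0                                      # likelihood of the noisy window given theta
        for qi, ti in zip(q_win, theta):
            mean, var = (mu_x, var_x + var_v) if ti else (0.0, var_v)
            like *= np.exp(-(qi - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        w = p_theta * like
        den += w
        if theta[j]:                                    # Wiener estimate contributes only when theta_j = s1
            num += w * (var_x / (var_x + var_v) * (q_win[j] - mu_x) + mu_x)
    return num / den                                    # E[X_j | Q window]
```

For the window sizes 3 and 5 used in Fig. 2 this enumeration is cheap; the cost grows exponentially in the window size, which is one reason the universal denoiser of Section IV quantizes contexts instead of enumerating them.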
The SE results for the Bayesian sliding-window denoisers motivate us to apply the universal denoiser within AMP for CS reconstruction of stationary ergodic signals with unknown input statistics. Indeed, the numerical results in Section VI show that AMP with a universal denoiser leads to a promising universal CS recovery algorithm.

III. I.I.D. DENOISING VIA GAUSSIAN MIXTURE FITTING

We will see in Section IV that context quantization maps the non-i.i.d. sequence q into conditionally independent subsequences, and we now focus our attention on denoising the resulting i.i.d. subsequences.

A. Background

The pdf of a Gaussian mixture (GM) has the form

p(x) = Σ_{s=1}^S α_s N(x; µ_s, σ_s^2),    (12)

where S is the number of Gaussian components, and Σ_{s=1}^S α_s = 1, so that p(x) is a proper pdf.

[Fig. 2 plots MSE versus iteration for the SE prediction (window sizes 1 and 3 for MGauss; 1 and 5 for M4) and the empirical MSE.]

Fig. 2. Top: Numerical verification of SE for AMP with η_MGauss (8) when the input is an MGauss signal. (N = 20000, R = M/N = 0.4, SNR = 10 dB.) Bottom: Numerical verification of SE for AMP with η_M4 (10) when the input is an M4 signal. (N = 20000, R = 0.4, SNR = 10 dB.)

Figueiredo and Jain [24] propose to fit a GM model for a given data sequence by starting with some arbitrarily large S, and inferring the structure of the mixture by letting the mixing probabilities α_s of some components be zero. This leads to an unsupervised learning algorithm that automatically determines the number of Gaussian components from data. This approach resembles the concept underlying the minimum message length (MML) criterion that selects the best overall model from the entire model space, which differs from model class selection based on the best model within each class. (All models with the same number of components belong to one model class, and different models within a model class have different parameters for each component.) This criterion can be interpreted as posing a Dirichlet prior on the mixing probabilities and performing maximum a posteriori estimation [24]. A component-wise EM algorithm that updates {α_s, µ_s, σ_s^2} sequentially in s is used to implement the MML-based approach. The main feature of the component-wise EM algorithm is that if α_s is estimated as 0, then the s-th component is immediately removed, and the expectation is recalculated before moving to the estimation of the next component.

B. Extension to denoising

Consider the scalar channel denoising problem defined in (6) with an i.i.d. input. We propose to estimate x from its Gaussian noise corrupted observations q by posing a GM prior on x, and learning the parameters of the GM model with a modified version of the algorithm by Figueiredo and Jain [24].

Initialization of EM: The EM algorithm must be initialized for each parameter, {α_s, µ_s, σ_s^2}, s = 1, ..., S. One may choose to initialize the Gaussian components with equal mixing probabilities and equal variances, with the initial values of the means randomly sampled from the input data sequence [24], which in our case is the sequence of noisy observations q. However, in CS recovery problems the input signal is often sparse, and it becomes difficult to correct the initial values if they are far from the truth. To see why a poor initialization might be problematic, consider the following scenario: a sparse binary signal that contains a few ones and is corrupted by Gaussian noise is sent to the algorithm. If the initialization levels of the µ_s's are all around zero, then the algorithm is likely to fit a Gaussian component with near-zero mean and large variance rather than two narrow Gaussian components, one of which has mean close to zero while the other has mean close to one. To address this issue, we modify the initialization to examine the maximal distance between each symbol of the input data sequence and the current initialization of the µ_s's. If the distance is greater than 0.1·σ_q, then we add a Gaussian component whose mean is initialized as the value of the symbol being examined, where σ_q^2 is the estimated variance of the noisy observations q.
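A minimal sketch of this initialization, under our reading of the rule (we treat the test as "the symbol is farther than 0.1·σ_q from every current mean"); the initial number of components and the equal-variance choice are assumptions:

```python
import numpy as np

def init_gm(q, S_init=10, rng=None):
    """Sketch of the modified EM initialization: equal weights and variances, means
    sampled from the noisy data q, plus an extra component whenever a data point lies
    farther than 0.1*std(q) from all current means (our reading of the rule above)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_q = np.std(q)
    means = list(rng.choice(q, size=S_init, replace=False))
    for qi in q:
        if np.min(np.abs(qi - np.array(means))) > 0.1 * sigma_q:
            means.append(qi)                       # add a component centered at this symbol
    means = np.array(means)
    S = len(means)
    alphas = np.full(S, 1.0 / S)                   # equal mixing probabilities
    variances = np.full(S, np.var(q))              # equal initial variances (illustrative choice)
    return alphas, means, variances
```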
We found in our simulations that the modified initialization improves the accuracy of the density estimation and speeds up the convergence of the EM algorithm; the details of the simulation are omitted for brevity.

Parameter estimation from noisy data: Two possible modifications can be made to the original GM learning algorithm [24], which is designed for clean data. We first notice that the model for the noisy data is a GM convolved with Gaussian noise, which is a new GM with larger component variances. Hence, one approach is to use the original algorithm [24] to fit a GM to the noisy data, but to remove a component immediately during the EM iterations if the estimated component variance is much smaller than the noise variance σ_v^2. Specifically, during the parameter learning process, if a component has variance that is less than 0.2σ_v^2, we assume that this low-variance component is spurious and remove it from the mixture model. However, if the component variance is between 0.2σ_v^2 and 0.9σ_v^2, then we force the component variance to be 0.9σ_v^2 and let the algorithm keep tracking this component. For component variance greater than 0.9σ_v^2, we do not adjust the algorithm. The parameters 0.2 and 0.9 are chosen because they provide reasonable MSE performance for a wide range of signals that we tested; these parameters are then fixed for our algorithm to generate the numerical results in Section VI. At the end of the parameter learning process, all remaining components with variances less than σ_v^2 are set to have variances equal to σ_v^2. That said, when subtracting the noise variance σ_v^2 from the Gaussian components of p_Q to obtain the components of p_X, we could have components with zero-valued variance, which yields deltas in p_X. Note that deltas are in general difficult to fit with a limited amount of observations, and our modification helps the algorithm estimate deltas.

Another approach is to introduce latent variables that represent the underlying clean data, and estimate the parameters of the GM for the latent variables directly. Hence, similar to the original algorithm, a component is removed only when the estimated mixing probability is non-positive.

It can be shown that the GM parameters are estimated as

α_s(t+1) = max{ Σ_{i=1}^N w_i^(s)(t) − 1, 0 } / Σ_{s': α_{s'} > 0} max{ Σ_{i=1}^N w_i^(s')(t) − 1, 0 },
µ_s(t+1) = Σ_{i=1}^N w_i^(s)(t) a_i^(s)(t) / Σ_{i=1}^N w_i^(s)(t),
σ_s^2(t+1) = Σ_{i=1}^N w_i^(s)(t) ( v_i^(s)(t) + ( a_i^(s)(t) − µ_s(t+1) )^2 ) / Σ_{i=1}^N w_i^(s)(t),

where

w_i^(s)(t) = α_s(t) N(q_i; µ_s(t), σ_v^2 + σ_s^2(t)) / Σ_{m=1}^S α_m(t) N(q_i; µ_m(t), σ_v^2 + σ_m^2(t)),
a_i^(s)(t) = σ_s^2(t)/(σ_s^2(t) + σ_v^2) (q_i − µ_s(t)) + µ_s(t),
v_i^(s)(t) = σ_v^2 σ_s^2(t) / (σ_v^2 + σ_s^2(t)).

Detailed derivations are in the supporting document [37]. We found in our simulations that the first approach converges faster and leads to lower reconstruction error, especially for discrete-valued inputs. Therefore, the simulation results presented in Section VI use the first approach.

Denoising: Once the parameters in (12) are estimated, we define a denoiser for i.i.d. signals as conditional expectation:

η_iid(q) = E[X | Q = q] = Σ_{s=1}^S E[X | Q = q, comp = s] P(comp = s | Q = q)
= Σ_{s=1}^S ( σ_s^2/(σ_s^2 + σ_v^2) (q − µ_s) + µ_s ) · α_s N(q; µ_s, σ_s^2 + σ_v^2) / Σ_{s'=1}^S α_{s'} N(q; µ_{s'}, σ_{s'}^2 + σ_v^2),    (13)

where comp is the component index, and

E[X | Q = q, comp = s] = σ_s^2/(σ_s^2 + σ_v^2) (q − µ_s) + µ_s

is the Wiener filter for component s. We have verified numerically for several distributions and low to moderate noise levels that the denoising results obtained by the GM-based i.i.d. denoiser (13) approach the MMSE within a few hundredths of a dB. For example, the favorable reconstruction results for i.i.d. sparse Laplace signals in Fig. 3 show that the GM-based denoiser approaches the MMSE.

IV. UNIVERSAL DENOISING

We have seen in Section III that an i.i.d. denoiser based on GM learning can denoise i.i.d. signals with unknown distributions. Our goal in this work is to reconstruct stationary ergodic signals that are not necessarily i.i.d. Sivaramakrishnan and Weissman [26] have proposed a universal denoising scheme for stationary ergodic signals with known bounds based on context quantization, where a stationary ergodic signal is partitioned into i.i.d. subsequences. In this section, we modify the context quantization scheme and apply the GM-based denoiser (13) to the i.i.d. subsequences, so that our universal denoiser can denoise stationary ergodic signals that are unbounded or whose bounds are unknown.

A. Background

Consider the denoising problem (6), where the input x is stationary ergodic. The main idea of the context quantization scheme [26] is to quantize the noisy symbols q to generate quantized contexts that are used to partition the unquantized symbols into subsequences. That is, given the noisy observations q ∈ R^N, define the context of q_j as c_j = [q_{j−k}^{j−1}; q_{j+1}^{j+k}] ∈ R^{2k} for j = 1+k, ..., N−k, where [a; b] denotes the concatenation of the sequences a and b. For j ≤ k or j ≥ N−k+1, the median value q_med of q is used in place of the missing symbols in the contexts. As an example, for j = k we only have k−1 symbols in q before q_k, and so the first symbol in c_k is missing; we define c_k = [q_med; q_1^{k−1}; q_{k+1}^{2k}]. Vector quantization can then be applied to the context set C = {c_j : j = 1, ..., N}, and each c_j is assigned a label l_j ∈ {1, ..., L} that represents the cluster that c_j belongs to. Finally, the L subsequences that consist of symbols from q with the same label are obtained by taking q^(l) = {q_j : l_j = l}, for l = 1, ..., L. The symbols in each subsequence q^(l) are regarded as approximately conditionally identically distributed given the common quantized contexts.
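The partitioning step is easy to prototype. The sketch below (ours; the context half-width k, the number of clusters L, and the use of scipy's k-means are illustrative choices) builds the contexts with median padding at the borders and returns the index sets of the L subsequences.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def subsequence_by_context(q, k=6, L=10, seed=0):
    """Sketch of context quantization: for each q_j form the context
    c_j = [q_{j-k}^{j-1}; q_{j+1}^{j+k}], padding missing border symbols with the
    median of q, then cluster the contexts and group q by cluster label."""
    N = len(q)
    q_med = np.median(q)
    padded = np.concatenate([np.full(k, q_med), q, np.full(k, q_med)])
    contexts = np.empty((N, 2 * k))
    for j in range(N):
        left = padded[j:j + k]                      # q_{j-k}, ..., q_{j-1} (median-padded at borders)
        right = padded[j + k + 1:j + 2 * k + 1]     # q_{j+1}, ..., q_{j+k}
        contexts[j] = np.concatenate([left, right])
    _, labels = kmeans2(contexts, L, minit='points', seed=seed)
    return [np.where(labels == l)[0] for l in range(L)]  # index sets of the L subsequences
```

Each returned index set selects one subsequence q^(l) = q[indices], which is then handed to the i.i.d. denoiser of Section III.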
The rationale underlying this concept is that a sliding-window denoiser uses information from the contexts to estimate the current symbol, and symbols with similar contexts in the noisy output of the scalar channel have similar contexts in the original signal. Therefore, symbols with similar contexts can be grouped together and denoised using the same denoiser. Note that Sivaramakrishnan and Weissman [26] propose a second subsequencing step, which further partitions each subsequence into smaller subsequences such that a symbol in a subsequence does not belong to the contexts of any other symbols in this subsequence. This step ensures that the symbols within each subsequence are mutually independent, which is crucial for theoretical analysis. However, for finite-length signals, small subsequences may occur, and they may not contain enough symbols to learn their empirical pdfs well. Therefore, we omit this second subsequencing step in our implementations.

In order to estimate the distribution of x^(l), which is the clean subsequence corresponding to q^(l), Sivaramakrishnan and Weissman [26] first estimate the pdf p_Q^(l) of q^(l) via kernel density estimation. They then quantize the range that the x_i's take values from and the levels of the empirical distribution function of x, and find a quantized distribution function that matches p_Q^(l) well. Once the distribution function of x^(l) is obtained, the conditional expectation of the symbols in the l-th subsequence can be calculated.

For error metrics that satisfy some mild technical conditions, Sivaramakrishnan and Weissman [26] have proved for stationary ergodic signals with bounded components that their universal denoiser asymptotically achieves the optimal estimation error among all sliding-window denoising schemes despite not knowing the prior for the signal. When the error metric is square error, the optimal error is the MMSE.

B. Extension to unbounded signals and signals with unknown bounds

Sivaramakrishnan and Weissman [26] have shown that one can denoise a stationary ergodic signal by (i) grouping together symbols with similar contexts and (ii) applying an i.i.d. denoiser to each group. Such a scheme is optimal in the limit of large signal dimension N. However, their denoiser assumes an input with known bounds, which might make it inapplicable to some real-world settings. In order to be able to estimate signals that take values from the entire real line, in step (ii) we apply the GM learning algorithm for density estimation, which has been discussed in detail in Section III, and compute the conditional expectation with the estimated density as our i.i.d. denoiser.

We now provide details about a modification made to step (i). The context set C is acquired in the same way as described in Section IV-A. Because the symbols in the context c_j ∈ C that are closer in index to q_j are likely to provide more information about x_j than the ones that are located further away, we add weights to the contexts before clustering. That is, for each c_j ∈ C of length 2k, the weighted context is defined as c̃_j = c_j ⊙ w, where ⊙ denotes a point-wise product, and the weights take values

w_{k_i} = e^{−β(k − k_i)} for k_i = 1, ..., k;  w_{k_i} = e^{−β(k_i − k − 1)} for k_i = k+1, ..., 2k,    (14)

for some β ≥ 0. While in noisier channels it might be necessary to use information from longer contexts, comparatively short contexts could be sufficient for cleaner channels. Therefore, the exponential decay rate β is made adaptive to the noise level in a way such that β increases with SNR. Specifically, β is chosen to be linear in SNR:

β = b_1 log_10( (‖q‖^2/N − σ_v^2) / σ_v^2 ) + b_2,    (15)

where b_1 > 0 and b_2 can be determined numerically. Specifically, we run the algorithm with a sufficiently large range of β values for various input signals at various SNR levels and measurement rates. For each setting, we select the β that achieves the best reconstruction quality. Then, b_1 and b_2 are obtained using the least squares approach. Note that b_1 and b_2 are fixed for all the simulation results presented in Section VI. If the parameters were tuned for each individual input signal, then the optimal parameter values might vary for different input signals, and the reconstruction quality might be improved. The simulation results in Section VI with fixed parameters show that even when the parameters are slightly off from the individually optimal ones, the reconstruction quality of AMP-UD is still comparable to or better than the prior art. We choose the linear relation because it is simple and fits well with our empirical optimal values for β; other choices for β might be possible.
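Equations (14) and (15) translate directly into a few lines; in the sketch below (ours) b_1 and b_2 are placeholders rather than the values fitted by least squares in the paper.

```python
import numpy as np

def context_weights(k, beta):
    """Weights (14): symbols closer in index to the current position get exponentially
    larger weight; entries 1..k are the left context, k+1..2k the right context."""
    ki = np.arange(1, 2 * k + 1)
    return np.where(ki <= k, np.exp(-beta * (k - ki)), np.exp(-beta * (ki - k - 1)))

def adaptive_beta(q, var_v, b1=1.0, b2=0.0):
    """Decay rate (15): beta grows linearly with the estimated scalar-channel SNR.
    b1 > 0 and b2 are tuning constants; the values here are placeholders, not the
    ones fitted in the paper."""
    snr_est = (np.mean(q ** 2) - var_v) / var_v
    return b1 * np.log10(max(snr_est, 1e-12)) + b2   # guard against a non-positive estimate

# Weighted contexts: c_tilde_j = c_j * context_weights(k, beta), entrywise, then cluster as before.
```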
The weighted context set C̃ = {c̃_j : j = 1, ..., N} is then sent to a k-means algorithm [38], and q^(l), l = 1, ..., L, are obtained according to the labels determined via clustering. We can now apply the GM-based i.i.d. denoiser (13) to each subsequence separately. However, one potential problem is that the GM fitting algorithm might not provide a good estimate of the model when the number of data points is small. We propose two approaches to address this small cluster issue.

Approach 1: Borrow members from nearby clusters. A post-processing step can be added to ensure that the pdf of q^(l) is estimated from no fewer than T symbols. That is, if the size of q^(l), which is denoted by B, is less than T, then T − B symbols in other clusters whose contexts are closest to the centroid of the current cluster are included to estimate the empirical pdf of q^(l); after the pdf is estimated, the extra symbols are removed, and only q^(l) is denoised with the estimated pdf. We call UD with Approach 1 UD1. (A related approach is k-nearest neighbors, where for each symbol in q we find the T symbols whose contexts are nearest to that of the current symbol and estimate its pdf from those T symbols. The k-nearest neighbors approach requires running the GM learning algorithm [24] N times in each AMP iteration, which significantly slows down the algorithm.)

Approach 2: Merge statistically similar subsequences. An alternative approach is to merge subsequences iteratively according to their statistical characterizations. The idea is to find subsequences with pdfs that are close in Kullback-Leibler (KL) distance [18], and decide whether merging them can yield a better model according to the minimum description length (MDL) [39] criterion. Denote the iteration index for the merging process by h. After the k-means algorithm, we have obtained a set of subsequences {q_h^(l) : l = 1, ..., L_h}, where L_h is the current number of subsequences. A GM pdf p_{Q,h}^(l) is learned for each subsequence q_h^(l). The MDL cost c_MDL^h for the current model is calculated as

c_MDL^h = Σ_{l=1}^{L_h} Σ_{i=1}^{|q_h^(l)|} ( −log p_{Q,h}^(l)(q_{i,h}^(l)) ) + Σ_{l=1}^{L_h} (3 m_h^(l)/2) log( |q_h^(l)| ) + 2 L_h + L_0 Σ_{l=1}^{L_h} (n_h^(l)/L_0) log( L_0/n_h^(l) ),

where q_{i,h}^(l) is the i-th entry of the subsequence q_h^(l), m_h^(l) is the number of Gaussian components in the mixture model for subsequence q_h^(l), L_0 is the number of subsequences before the merging procedure, and n_h^(l) is the number of subsequences in the initial set {q_0^(l) : l = 1, ..., L_0} that are merged to form the subsequence q_h^(l). The four terms in c_MDL^h are interpreted as follows. The first term is the negative log likelihood of the entire noisy sequence q given the current GM models. The second term is the penalty for the number of parameters used to describe the model, where we have 3 parameters (α, µ, σ^2) for each Gaussian component, and m_h^(l) components for the subsequence q_h^(l).

The third term arises from bits that are used to encode m_h^(l) for l = 1, ..., L_h, because our numerical results have shown that the number of Gaussian components rarely exceeds 4. In the fourth term, Σ_{l=1}^{L_h} (n_h^(l)/L_0) log(L_0/n_h^(l)) is the uncertainty that a subsequence from the initial set is mapped to q_h^(l) with probability n_h^(l)/L_0, for l = 1, ..., L_h. Therefore, the fourth term is the coding length for mapping the L_0 subsequences from the initial set to the current set.

We then compute the KL distance between the pdf of q_h^(s) and that of q_h^(t), for s, t = 1, ..., L_h:

D( p_{Q,h}^(s) ‖ p_{Q,h}^(t) ) = ∫ p_{Q,h}^(s)(q) log( p_{Q,h}^(s)(q) / p_{Q,h}^(t)(q) ) dq.

A symmetric L_h × L_h distance matrix D_h is obtained by letting its s-th row and t-th column be D( p_{Q,h}^(s) ‖ p_{Q,h}^(t) ) + D( p_{Q,h}^(t) ‖ p_{Q,h}^(s) ). Suppose the smallest entry in the upper triangular part of D_h (not including the diagonal) is located in the s*-th row and t*-th column; then q_h^(s*) and q_h^(t*) are temporarily merged to form a new subsequence, and a new GM pdf is learned for the merged subsequence. We now have a new model with L_{h+1} = L_h − 1 GM pdfs, and the MDL criterion c_MDL^{h+1} is calculated for the new model. If c_MDL^{h+1} is smaller than c_MDL^h, then we accept the new model and calculate a new L_{h+1} × L_{h+1} distance matrix D_{h+1}; otherwise we keep the current model and look for the next smallest entry in the upper triangular part of the current L_h × L_h distance matrix. The number of subsequences is decreased by at most one after each iteration, and the merging process ends when there is only one subsequence left, or when the smallest KL distance between two GM pdfs is greater than some threshold, which is determined numerically. We call UD with Approach 2 UD2.

We will see in Section VI that UD2 is more reliable than UD1 in terms of MSE performance, whereas UD1 is faster than UD2. This is because UD2 applies a more complicated (and thus slower) subsequencing procedure, which allows more accurate GM models to be fitted to the subsequences.

V. PROPOSED UNIVERSAL CS RECOVERY ALGORITHM

Combining the three components that have been discussed in Sections II–IV, we are now ready to introduce our proposed universal CS recovery algorithm AMP-UD. Note that the AMP-UD algorithm is designed for medium to large size problems. Specifically, the problem size should be large enough that the decoupling effect of AMP, which converts the compressed sensing problem to a series of scalar channel denoising problems, approximately holds, and that the statistical information about the input can be approximately estimated by the universal denoiser.

Consider a linear system (1), where the input signal x is stationary and ergodic with unknown distributions, and the matrix A has i.i.d. Gaussian entries. To estimate x from y given A, we apply AMP as defined in (2) and (3). In each iteration, AWGN corrupted observations, q_t = x_t + A^T r_t = x + v, are obtained, where σ_v^2 is estimated by σ̂_t^2 (5). A subsequencing approach is applied to generate i.i.d. subsequences, where Approach 1 and Approach 2 (Section IV-B) are two possible implementations. The GM-based i.i.d. denoiser (13) is then utilized to denoise each i.i.d. subsequence.

To obtain the Onsager correction term in (3), we need to calculate the derivative of η_iid (13). For q ∈ R, denoting

f(q) = Σ_{s=1}^S α_s N(q; µ_s, σ_s^2 + σ_v^2) ( σ_s^2/(σ_s^2 + σ_v^2) (q − µ_s) + µ_s ),
g(q) = Σ_{s=1}^S α_s N(q; µ_s, σ_s^2 + σ_v^2),

we have that

f'(q) = Σ_{s=1}^S α_s N(q; µ_s, σ_s^2 + σ_v^2) ( (σ_s^2 + µ_s^2 − q µ_s)/(σ_s^2 + σ_v^2) − σ_s^2 (q − µ_s)^2/(σ_s^2 + σ_v^2)^2 ),
g'(q) = Σ_{s=1}^S α_s N(q; µ_s, σ_s^2 + σ_v^2) ( −(q − µ_s)/(σ_s^2 + σ_v^2) ).

Therefore,

η'_iid(q) = ( f'(q) g(q) − f(q) g'(q) ) / (g(q))^2.    (16)
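Equations (13) and (16) map almost line by line onto code. The sketch below (ours, vectorized over the entries of a subsequence) evaluates η_iid and η'_iid for a fitted GM; the Onsager term in (3) then only needs the average of the returned derivative.

```python
import numpy as np

def gm_denoise(q, alphas, mus, vars_x, var_v):
    """GM-based i.i.d. denoiser (13) and its derivative (16), vectorized over q.
    alphas, mus, vars_x are the fitted GM parameters of p_X; var_v is the scalar
    channel noise variance."""
    q = np.atleast_1d(q)[:, None]                            # shape (N, 1)
    var_q = vars_x + var_v                                    # component variances of p_Q
    phi = np.exp(-(q - mus) ** 2 / (2 * var_q)) / np.sqrt(2 * np.pi * var_q)
    wiener = vars_x / var_q * (q - mus) + mus                 # E[X | Q=q, comp=s]
    g = np.sum(alphas * phi, axis=1)                          # g(q)
    f = np.sum(alphas * phi * wiener, axis=1)                 # f(q)
    # derivatives of f and g, following the display above
    g_prime = np.sum(alphas * phi * (-(q - mus) / var_q), axis=1)
    f_prime = np.sum(alphas * phi * ((vars_x + mus ** 2 - q * mus) / var_q
                                     - vars_x * (q - mus) ** 2 / var_q ** 2), axis=1)
    x_hat = f / g                                             # eta_iid(q), eq. (13)
    deriv = (f_prime * g - f * g_prime) / g ** 2              # eta_iid'(q), eq. (16)
    return x_hat, deriv
```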
We highlight that AMP-UD is unaware of the input SNR and also unaware of the input statistics. The noise variance σ_v^2 in the scalar channel denoising problem is estimated by the average energy of the residual (5). The input's statistical structure is learned by the universal denoiser without any prior assumptions.

It has been proved [26] that the context quantization universal denoising scheme can asymptotically achieve the MMSE for stationary ergodic signals with known bounds. We have extended the scheme to unbounded signals in Sections III and IV, and conjecture that our modified universal denoiser can asymptotically achieve the MMSE for unbounded stationary ergodic signals. AMP with MMSE-achieving separable denoisers has been proved to asymptotically achieve the MMSE in linear systems for i.i.d. inputs [27]. In Section II-B, we provided numerical evidence that SE holds for AMP with Bayesian sliding-window denoisers. Bayesian sliding-window denoisers with proper window sizes are MMSE-achieving non-separable denoisers [26]. Given that our universal denoiser resembles a Bayesian sliding-window denoiser, we conjecture that AMP-UD can achieve the MMSE for stationary ergodic inputs in the limit of large linear systems where the matrix has i.i.d. random entries. Note that we have optimized the window size for inputs of length N = 10000 via numerical experiments. We believe that the window size should increase with N, and leave the characterization of the optimal window size for future work.

VI. NUMERICAL RESULTS

We run AMP-UD1 (AMP with UD1) and AMP-UD2 (AMP with UD2) in MATLAB on a Dell OPTIPLEX 9010 running an Intel(R) Core(TM) i7 with 16GB RAM, and test them utilizing different types of signals, including synthetic signals, a chirp sound clip, and a speech signal, at various measurement rates and SNR levels, where we remind the reader that SNR is defined in Section II-B.

[Fig. 3 plots signal to distortion ratio (dB) versus measurement rate R for the MMSE, AMP-UD2, AMP-UD1, SLA-MCMC, and EM-GM-AMP-MOS at 5 dB and 10 dB SNR.]

Fig. 3. Two AMP-UD implementations, SLA-MCMC, and EM-GM-AMP-MOS reconstruction results for an i.i.d. sparse Laplace signal as a function of measurement rate (R = M/N). Note that the SDR curves for the two AMP-UD implementations and EM-GM-AMP-MOS overlap the MMSE curve. (N = 10000, SNR = 5 dB or 10 dB.)

The input signal length N is 10000 for synthetic signals and roughly 10000 for the chirp sound clip and the speech signal. The context size 2k is chosen to be 12, and the contexts are weighted according to (14) and (15). The context quantization is implemented via the k-means algorithm [38]. In order to avoid possible divergence of AMP-UD, possibly due to a bad GM fit, we employ a damping technique [32] to slow down the evolution. Specifically, damping is an extra step in the AMP iteration (3); instead of updating the value of x_{t+1} by the output of the denoiser η_t(A^T r_t + x_t), a weighted sum of η_t(A^T r_t + x_t) and x_t is taken as follows:

x_{t+1} = λ η_t(A^T r_t + x_t) + (1 − λ) x_t,

for some λ ∈ (0, 1].

Parameters for AMP-UD1: The number of clusters L is initialized as 10, and may become smaller if empty clusters occur. The lower bound T on the number of symbols required to learn the GM parameters is 256. The damping parameter λ is 0.1, and we run 100 AMP iterations.

Parameters for AMP-UD2: The initial number of clusters is set to 30, and these clusters are merged according to the scheme described in Section IV. Because each time merging occurs we need to apply the GM fitting algorithm one more time to learn a new mixture model for the merged cluster, which is computationally demanding, we apply adaptive damping [34] to reduce the number of iterations required; the number of AMP iterations is set to 30. The damping parameter is initialized to 0.5, and increases (decreases) within the range [0.01, 0.5] if the value of the scalar channel noise estimate σ̂_t^2 (5) decreases (increases).

The recovery performance is evaluated by signal to distortion ratio (SDR = 10 log_10(E[X^2]/MSE)), where the MSE is averaged over 50 random draws of x, A, and z.

[Fig. 4 plots signal to distortion ratio (dB) versus measurement rate R for AMP-UD2, AMP-UD1, SLA-MCMC, and turboGAMP at 5 dB and 10 dB SNR.]

Fig. 4. Two AMP-UD implementations, SLA-MCMC, and turboGAMP reconstruction results for a two-state Markov signal with nonzero entries drawn from a uniform distribution U[0, 1] as a function of measurement rate. Note that the SDR curves for the two AMP-UD implementations overlap at SNR = 5 dB, and they both overlap turboGAMP at SNR = 10 dB. (N = 10000, SNR = 5 dB or 10 dB.)

We compare the performance of the two AMP-UD implementations to (i) the universal CS recovery algorithm SLA-MCMC [23]; and (ii) the empirical Bayesian message passing approaches EM-GM-AMP-MOS [10] for i.i.d. inputs and turboGAMP [11] for non-i.i.d. inputs. Note that EM-GM-AMP-MOS assumes during recovery that the input is i.i.d., whereas turboGAMP is designed for non-i.i.d. inputs with a known statistical model.
We do not include results for other well-known CS algorithms such as compressive sampling matching pursuit (CoSaMP) [40], gradient projection for sparse reconstruction (GPSR) [41], or ℓ_1 minimization [2, 3], because their SDR performance is consistently weaker than that of the three algorithms being compared.

Sparse Laplace signal (i.i.d.): We tested i.i.d. sparse Laplace signals that follow the distribution p_X(x) = 0.03·L(0, 1) + 0.97·δ(x), where L(0, 1) denotes a Laplacian distribution with mean zero and variance one, and δ(·) is the delta function [42]. It is shown in Fig. 3 that the two AMP-UD implementations and EM-GM-AMP-MOS achieve the MMSE [43, 44], whereas SLA-MCMC has weaker performance, because the MCMC approach is expected to sample from the posterior and its MSE is twice the MMSE [14, 23].

Markov-uniform signal: Consider the two-state Markov state machine defined in Section II-B with p_01 = 3/970 and p_10 = 1/10. A Markov-uniform signal (MUnif for short) follows a uniform distribution U[0, 1] in the nonzero state s_1. These parameters lead to 3% nonzero entries in an MUnif signal on average. It is shown in Fig. 4 that at low SNR, the two AMP-UD implementations achieve higher SDR than SLA-MCMC and turboGAMP. At high SNR, the two AMP-UD implementations and turboGAMP have similar SDR performance, and are slightly better than SLA-MCMC. We highlight that turboGAMP needs side information about the Markovian structure of the signal, whereas the two AMP-UD implementations and SLA-MCMC do not.

[Fig. 5 plots signal to distortion ratio (dB) versus measurement rate R for AMP-UD2, AMP-UD1, SLA-MCMC, and turboGAMP at 10 dB and 15 dB SNR.]

Fig. 5. Two AMP-UD implementations, SLA-MCMC, and turboGAMP reconstruction results for a dense two-state Markov signal with nonzero entries drawn from a Rademacher (±1) distribution as a function of measurement rate. (N = 10000, SNR = 10 dB or 15 dB.)

[Fig. 6 plots signal to distortion ratio (dB) versus measurement rate R for AMP-UD2, AMP-UD1, SLA-MCMC, and EM-GM-AMP-MOS at 5 dB and 10 dB SNR.]

Fig. 6. Two AMP-UD implementations, SLA-MCMC, and EM-GM-AMP-MOS reconstruction results for a chirp sound clip as a function of measurement rate. (N = 9600, SNR = 5 dB or 10 dB.)

Dense Markov-Rademacher signal: Consider the two-state Markov state machine defined in Section II-B with p_01 = 3/70 and p_10 = 1/10. A dense Markov-Rademacher signal (MRad for short) takes values from {−1, +1} with equal probability in state s_1. These parameters lead to 30% nonzero entries in an MRad signal on average. Because the MRad signal is dense (non-sparse), we must measure it with somewhat larger measurement rates and SNRs than before. It is shown in Fig. 5 that the two AMP-UD implementations and SLA-MCMC have better overall performance than turboGAMP. AMP-UD1 outperforms SLA-MCMC except for the lowest tested measurement rate at low SNR, whereas AMP-UD2 outperforms SLA-MCMC consistently.

Chirp sound clip and speech signal: Our experiments up to this point use synthetic signals. We now evaluate the reconstruction quality of AMP-UD for two real-world signals: a chirp sound clip and a speech signal. We cut a segment of length 9600 out of the chirp and a segment of length 10560 out of the speech signal (denoted by x), and performed a short-time discrete cosine transform (DCT) with window size, number of DCT points, and hop size all being 32. The resulting short-time DCT coefficient matrix is then vectorized to form a coefficient vector θ. Denoting the short-time DCT matrix by W^{-1}, we have θ = W^{-1} x. Therefore, we can rewrite (1) as y = Φθ + z, where Φ = AW. Our goal is to reconstruct θ from the measurements y and the matrix Φ. After we obtain the estimated coefficient vector θ̂, the estimated signal is calculated as x̂ = W θ̂.

Although the coefficient vector θ may exhibit some type of memory, it is not readily modeled in closed form, and so we cannot provide a valid model for turboGAMP [11]. Therefore, we use EM-GM-AMP-MOS [10] instead of turboGAMP [11]. The SDRs for the two AMP-UD implementations, SLA-MCMC, and EM-GM-AMP-MOS [10] are plotted for the chirp in Fig. 6 and for the speech signal in Fig. 7. We can see that both AMP-UD implementations outperform EM-GM-AMP-MOS consistently, which implies that the simple i.i.d. model is suboptimal for these two real-world signals. Moreover, AMP-UD2 provides comparable and in most cases higher SDR than SLA-MCMC, which indicates that AMP-UD2 is more reliable in learning various statistical structures than SLA-MCMC. AMP-UD1 is the fastest among the four algorithms, but it may have lower reconstruction quality than AMP-UD2 and SLA-MCMC, owing to poor selection of the subsequences. It is worth mentioning that we have also run simulations on an electrocardiograph (ECG) signal, and EM-GM-AMP-MOS achieved similar SDR to the two AMP-UD implementations, which indicates that an i.i.d. model might be adequate to represent the coefficients of the ECG signal; the plot is omitted for brevity.
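As one concrete reading of this measurement setup (ours; the non-overlapping 32-sample frames and the orthonormal type-II DCT are assumptions consistent with window = hop = 32), the coefficient vector and the effective matrix Φ = AW can be realized as follows.

```python
import numpy as np
from scipy.fft import dct, idct

FRAME = 32   # window size, number of DCT points, and hop size (non-overlapping frames)

def st_dct(x):
    """theta = W^{-1} x: frame-wise orthonormal DCT of the time-domain signal."""
    return dct(x.reshape(-1, FRAME), norm='ortho', axis=1).ravel()

def st_idct(theta):
    """x = W theta: frame-wise inverse DCT (synthesis)."""
    return idct(theta.reshape(-1, FRAME), norm='ortho', axis=1).ravel()

# Measurement in the transform domain: y = Phi theta + z with Phi = A W,
# i.e., A applied to the time-domain signal synthesized from theta.
rng = np.random.default_rng(0)
N = 9600                                  # chirp segment length used in the paper
x = rng.standard_normal(N)                # stand-in for the chirp segment
theta = st_dct(x)
M = int(0.4 * N)
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = A @ st_idct(theta)                    # equals A @ x up to numerical precision
```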
Runtime: The runtime of AMP-UD1 and AMP-UD2 for MUnif, MRad, and the speech signal is typically under 5 minutes and 10 minutes, respectively, but somewhat more for signals such as sparse Laplace and the chirp sound clip that require a large number of Gaussian components to be fit. For comparison, the runtime of SLA-MCMC is typically an hour, whereas typical runtimes of EM-GM-AMP-MOS and turboGAMP are 30 minutes. To further accelerate AMP, we could consider parallel computing. That is, after clustering, the Gaussian mixture learning algorithm can be run simultaneously on different processors.

VII. CONCLUSION

In this paper, we introduced a universal compressed sensing recovery algorithm, AMP-UD, that applies our proposed universal denoiser (UD) within approximate message passing (AMP). AMP-UD is designed to reconstruct stationary ergodic signals from noisy linear measurements. The performance of two AMP-UD implementations was evaluated via simulations, where it was shown that AMP-UD achieves favorable signal to distortion ratios compared to existing universal algorithms, and that its runtime is typically faster. AMP-UD combines three existing schemes: (i) AMP [6]; (ii) universal denoising [26]; and (iii) a density estimation approach based on Gaussian mixture (GM) fitting [24].

[Fig. 7 plots signal to distortion ratio (dB) versus measurement rate R for AMP-UD2, AMP-UD1, SLA-MCMC, and EM-GM-AMP-MOS at 5 dB and 10 dB SNR.]

Fig. 7. Two AMP-UD implementations, SLA-MCMC, and EM-GM-AMP-MOS reconstruction results for a speech signal as a function of measurement rate. (N = 10560, SNR = 5 dB or 10 dB.)

In addition to the algorithmic framework, we provided three specific contributions. First, we provided numerical results showing that SE holds for non-separable Bayesian sliding-window denoisers. Second, we modified the GM learning algorithm and extended it to an i.i.d. denoiser. Third, we designed a universal denoiser that does not need to know the bounds of the input or require the input signal to be bounded. Two implementations of the universal denoiser were provided, with one being faster and the other achieving better reconstruction quality in terms of signal to distortion ratio.

There are numerous directions for future work. First, our current algorithm was designed to minimize the square error, and the denoiser could be modified to minimize other error metrics [45]. Second, AMP-UD was designed to reconstruct one-dimensional signals; in order to support applications that process multi-dimensional signals such as images, it might be instructive to employ universal image denoisers within AMP. Third, the relation between the input length and the optimal window size, as well as the exponential decay rate of the context weights, can be investigated. Finally, we can modify our work to support measurement noise with unknown distributions as an extension to adaptive generalized AMP [12].

ACKNOWLEDGEMENTS

We thank Mario Figueiredo and Tsachy Weissman for informative discussions; and Ahmad Beirami and Marco Duarte for detailed suggestions on the manuscript.

REFERENCES

[1] Y. Ma, J. Zhu, and D. Baron, "Compressed sensing via universal denoising and approximate message passing," in Proc. Allerton Conf. Commun., Control, Comput., Oct. 2014.
[2] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, Apr. 2006.
[3] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, Feb. 2006.
[4] R. Tibshirani, "Regression shrinkage and selection via the LASSO," J. Royal Stat. Soc. Series B (Methodological), vol. 58, no. 1, 1996.
[5] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comp., vol. 20, no. 1, 1998.
[6] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing," Proc. Nat. Academy Sci., vol. 106, no. 45, Nov. 2009.
[7] D. Baron, S. Sarvotham, and R. G. Baraniuk, "Bayesian compressive sensing via belief propagation," IEEE Trans. Signal Process., vol. 58, Jan. 2010.
[8] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing: I. Motivation and construction," in IEEE Inf. Theory Workshop, Jan. 2010.
[9] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), July 2011.
[10] J. Vila and P. Schniter, "Expectation-maximization Gaussian-mixture approximate message passing," IEEE Trans. Signal Process., vol. 61, no. 19, Oct. 2013.
[11] J. Ziniel, S. Rangan, and P. Schniter, "A generalized framework for learning and recovery of structured sparse signals," in Proc. IEEE Stat. Signal Process. Workshop (SSP), Aug. 2012.
[12] U. Kamilov, S. Rangan, A. K. Fletcher, and M. Unser, "Approximate message passing with consistent parameter estimation and applications to sparse learning," IEEE Trans. Inf. Theory, vol. 60, no. 5, May 2014.
[13] Y. Ma, D. Baron, and A. Beirami, "Mismatched estimation in large linear systems," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), July 2015.
[14] D. L. Donoho, "The Kolmogorov sampler," Stanford University, Stanford, CA, Department of Statistics Technical Report 2002-4, Jan. 2002.
[15] D. Donoho, H. Kakavand, and J. Mammen, "The simplest solution to an underdetermined system of linear equations," in Proc. Int. Symp. Inf. Theory (ISIT), July 2006.
[16] S. Jalali and A. Maleki, "Minimum complexity pursuit," in Proc. Allerton Conf. Commun., Control, Comput., Sept. 2011.
[17] S. Jalali, A. Maleki, and R. G. Baraniuk, "Minimum complexity pursuit for universal compressed sensing," IEEE Trans. Inf. Theory, vol. 60, no. 4, Apr. 2014.
[18] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY, USA: Wiley-Interscience, 2006.
[19] M. Li and P. M. B. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, New York, 2008.
[20] D. Baron, "Information complexity and estimation," in Workshop Inf. Theoretic Methods Sci. Eng. (WITMSE), Aug. 2011.
[21] D. Baron and M. F. Duarte, "Universal MAP estimation in compressed sensing," in Proc. Allerton Conf. Commun., Control, Comput., Sept. 2011.
[22] J. Zhu, D. Baron, and M. F. Duarte, "Complexity adaptive universal signal estimation for compressed sensing," in Proc. IEEE Stat. Signal Process. Workshop (SSP), June 2014.
[23] ——, "Recovery from linear measurements with complexity-matching universal signal estimation," IEEE Trans. Signal Process., vol. 63, no. 6, Mar. 2015.
[24] M. Figueiredo and A. Jain, "Unsupervised learning of finite mixture models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, Mar. 2002.
[25] K. Sivaramakrishnan and T. Weissman, "Universal denoising of discrete-time continuous-amplitude signals," IEEE Trans. Inf. Theory, vol. 54, no. 12, Dec. 2008.
[26] ——, "A context quantization approach to universal denoising," IEEE Trans. Signal Process., vol. 57, no. 6, June 2009.
[27] M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Trans. Inf. Theory, vol. 57, no. 2, Feb. 2011.
[28] D. Donoho, I. Johnstone, and A. Montanari, "Accurate prediction of phase transitions in compressed sensing via a connection to minimax denoising," IEEE Trans. Inf. Theory, vol. 59, no. 6, June 2013.
[29] J. Tan, Y. Ma, and D. Baron, "Compressive imaging via approximate message passing with image denoising," IEEE Trans. Signal Process., vol. 63, no. 8, Apr. 2015.
[30] C. Rush, A. Greig, and R. Venkataramanan, "Capacity-achieving sparse superposition codes via approximate message passing decoding," in Proc. Int. Symp. Inf. Theory (ISIT), June 2015.
[31] C. Metzler, A. Maleki, and R. G. Baraniuk, "From denoising to compressed sensing," arXiv preprint, June 2014.

[32] S. Rangan, P. Schniter, and A. Fletcher, "On the convergence of approximate message passing with arbitrary matrices," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), July 2014.
[33] A. Manoel, F. Krzakala, E. W. Tramel, and L. Zdeborova, "Sparse estimation with the swept approximated message-passing algorithm," arXiv preprint, June 2014.
[34] J. Vila, P. Schniter, S. Rangan, F. Krzakala, and L. Zdeborova, "Adaptive damping and mean removal for the generalized approximate message passing algorithm," in IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), Apr. 2015.
[35] S. Rangan, A. K. Fletcher, P. Schniter, and U. S. Kamilov, "Inference for generalized linear models via alternating directions and Bethe free energy minimization," in Proc. Int. Symp. Inf. Theory (ISIT), June 2015.
[36] A. Montanari, "Graphical models concepts in compressed sensing," Compressed Sensing: Theory and Applications, 2012.
[37] Y. Ma, J. Zhu, and D. Baron, "Approximate message passing algorithm with universal denoising and Gaussian mixture learning," arXiv preprint, Aug. 2015.
[38] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. Math. Stat. & Prob., vol. 1, no. 14, 1967.
[39] A. Barron, J. Rissanen, and B. Yu, "The minimum description length principle in coding and modeling," IEEE Trans. Inf. Theory, vol. 44, no. 6, Oct. 1998.
[40] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Computational Harmonic Anal., vol. 26, no. 3, May 2009.
[41] M. Figueiredo, R. Nowak, and S. J. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems," IEEE J. Select. Topics Signal Process., vol. 1, Dec. 2007.
[42] A. Papoulis, Probability, Random Variables, and Stochastic Processes. McGraw Hill Book Co.
[43] D. Guo, D. Baron, and S. Shamai, "A single-letter characterization of optimal noisy compressed sensing," in Proc. 47th Allerton Conf. Commun., Control, and Comput., Sept. 2009.
[44] S. Rangan, A. K. Fletcher, and V. K. Goyal, "Asymptotic analysis of MAP estimation via the replica method and applications to compressed sensing," IEEE Trans. Inf. Theory, vol. 58, no. 3, Mar. 2012.
[45] J. Tan, D. Carmon, and D. Baron, "Signal estimation with additive error metrics in compressed sensing," IEEE Trans. Inf. Theory, vol. 60, no. 1, Jan. 2014.

Yanting Ma (S'13) received the B.Sc. degree in communication engineering from Wuhan University, China, in 2012. Currently she is a Ph.D. candidate at North Carolina State University, in the Department of Electrical and Computer Engineering. Her research interests include information theory, convex optimization, and statistical signal processing.

Junan Zhu (S'11) received the B.E. degree in Electrical Engineering with a focus on Optoelectronics from the University of Shanghai for Science and Technology (USST), Shanghai, China, in 2011. He is currently pursuing the Ph.D. degree in Electrical Engineering at North Carolina State University (NCSU), Raleigh, NC. His research interests include compressed sensing, machine learning, optimization, distributed algorithms, and computational imaging.

Dror Baron (S'99-M'03-SM'10) received the B.Sc. (summa cum laude) and M.Sc. degrees from the Technion - Israel Institute of Technology, Haifa, Israel, in 1997 and 1999, and the Ph.D. degree from the University of Illinois at Urbana-Champaign in 2003, all in electrical engineering. From 1997 to 1999, Dr. Baron worked at Witcom Ltd. in modem design.
From 1999 to 2003, he was a research assistant at the University of Illinois at Urbana-Champaign, where he was also a Visiting Assistant Professor in 2003. From 2003 to 2006, he was a Postdoctoral Research Associate in the Department of Electrical and Computer Engineering at Rice University, Houston, TX. From 2007 to 2008, he was a quantitative financial analyst with Menta Capital, San Francisco, CA, and from 2008 to 2010 he was a Visiting Scientist in the Department of Electrical Engineering at the Technion - Israel Institute of Technology, Haifa. Since 2010, Dr. Baron has been with the Electrical and Computer Engineering Department at North Carolina State University, where he is currently an Associate Professor. Dr. Baron's research interests combine information theory, signal processing, and fast algorithms; in recent years, he has focused on compressed sensing. Dr. Baron was a recipient of the 2002 M. E. Van Valkenburg Graduate Research Award, and received honorable mention at the Robert Bohrer Memorial Student Workshop in April 2002, both at the University of Illinois. He also participated from 1994 to 1997 in the Program for Outstanding Students, comprising the top 0.5% of undergraduates at the Technion.
