On the Use of A Priori Information for Sparse Signal Approximations

ITS TECHNICAL REPORT NO. 3/4

On the Use of A Priori Information for Sparse Signal Approximations

Oscar Divorra Escoda, Lorenzo Granai and Pierre Vandergheynst
Signal Processing Institute (ITS), Ecole Polytechnique Fédérale de Lausanne (EPFL)
LTS-ITS-STI-EPFL, 1015 Lausanne, Switzerland

Technical Report No. 3/4

Abstract

This report extends to the case of sparse approximations our previous study on the effects of introducing a priori knowledge into the recovery of sparse representations over overcomplete dictionaries [1]. Greedy algorithms and Basis Pursuit Denoising are considered in this work. Theoretical results show how the use of reliable a priori information (which in this work appears under the form of weights) can improve the performance of these methods. In particular, we generalize the sufficient conditions established by Tropp [2], [3] and by Gribonval and Vandergheynst [4], which guarantee the retrieval of the sparsest solution, to the case where a priori information is used. We prove how the use of prior models at the signal decomposition stage influences these sufficient conditions. The results found in this work reduce to the classical cases of [4] and [3] when no a priori information about the signal is available. Finally, examples validate and illustrate the theoretical results.

Index Terms: Sparse Approximations, Sparse Representations, Basis Pursuit Denoising, Matching Pursuit, Relaxation Algorithms, Greedy Algorithms, A Priori Knowledge, Redundant Dictionaries, Weighted Basis Pursuit Denoising, Weighted Matching Pursuit.

Contents

I Introduction
II Recovery of General Signals: Sparse Approximations
  II-A Greedy Algorithms: Weak-MP
    II-A.1 Robustness
    II-A.2 Rate of Convergence
  II-B Convex Relaxation of the Subset Selection Problem
III Including A Priori Information in Greedy Algorithms
  III-A Influence on Sparse Approximations
  III-B Rate of Convergence of Weighted-MP/OMP on Sparse Approximations
  III-C Example: Use of Footprints for ε-Sparse Approximations
IV Approximations with Weighted Basis Pursuit Denoising
  IV-A A Bayesian Approach to Weighted Basis Pursuit Denoising
  IV-B Preliminary Propositions
  IV-C Weighted Relaxed Subset Selection
  IV-D Relation with the Weighted Cumulative Coherence
  IV-E Example
V Examples: A Natural Signal Approximation with Coherent Dictionaries and an A Priori Model
  V-A A Noisy Piecewise-Smooth Natural Signal
  V-B Modeling the Relation Signal-Dictionary
  V-C Signal Approximation
  V-D Results
    V-D.1 Approximation Results with OMP
    V-D.2 Approximation Results with BPDN
    V-D.3 Capturing the Piecewise-Smooth Component with the Footprints Basis
    V-D.4 Parameter Search
VI Conclusions
References

Web page: http://ltswww.epfl.ch. The work of Oscar Divorra Escoda is partly sponsored by the IM2 NCCR. The work of Lorenzo Granai is supported by the SNF grant -669.

I. Introduction

In many applications, such as compression, denoising or source separation, one seeks an efficient representation or approximation of a signal by means of a linear expansion into a possibly overcomplete family of functions. In this setting, efficiency is often characterized by the sparseness of the associated series of coefficients. The criterion of sparseness has been studied for a long time and has become popular in the signal processing community during the last few years [5], [6], [2], [3]. Natural signals are very unlikely to be exact sparse superpositions of vectors; in fact, the set of such signals is of measure zero in $\mathbb{C}^N$ [2]. We thus extend our previous work on exact sparse representations [1] to the more useful case of sparse approximations:

$$\min_c \|f - Dc\|_2 \quad \text{s.t.} \quad \|c\|_0 \le m. \qquad (1)$$

In general, the problem of recovering the sparsest signal approximation (or representation) over a redundant dictionary is NP-hard. However, this does not rule out finding it efficiently when particular classes of dictionaries are used [7]. As demonstrated in [2], [3], [6], [4], in order to ensure the good behavior of algorithms like General Weak(α) Matching Pursuit (Weak-MP) and Basis Pursuit Denoising (BPDN), dictionaries need to be incoherent enough. Under this main hypothesis, sufficient conditions have been stated so that both methods are able to recover the atoms of the sparsest m-term expansion of a signal. However, experience and intuition dictate that good dictionaries for sparse approximations of natural signals can be very dense and, depending on the kind of signal structures to exploit, may in most cases be highly coherent. For example, consider a natural image from which an efficient approximation of edges is desired. Several approaches have been proposed where the functions in use have a very strong geometrical meaning [8], [9], [10], [11]: they represent the local orientation of edges. In order to accurately represent all orientations, the set of functions must sample the direction parameter finely, and consequently the atoms of the dictionary may be strongly correlated. Moreover, if further families of functions are included in the dictionary in order to model other components, like textures or smooth areas, the coherence of the whole set of functions can become even higher. As a further motivation, one can also observe that the set of visual primitives obtained by Olshausen and Field while studying the spatial receptive fields of simple cells in the mammalian striate cortex [12] is redundant and highly coherent.

Concerning the case of exact sparse signal representations, we introduced in [1] a way of using more coherent dictionaries with Weak-MP and Basis Pursuit (BP) while keeping the possibility of recovering the optimal solution. Two methods were proposed, based on the use of a priori information about the signal to decompose: we called them Weighted-MP and Weighted-BP. In this report we address the case of sparse signal approximations, discussing the potential of using a priori knowledge in the atom selection procedure. We do not tackle here the issue of how to find reliable and useful a priori knowledge about a signal: this problem strongly depends on the nature of the signal and on the kind of dictionary used.
The aim of this paper is the theoretical study of the weighted algorithms from the perspective of achieving sparseness. The main results are:
- the definition of the Weighted-BPDN and Weighted-MP/OMP algorithms for approximation purposes: we reformulate classic BPDN and Weak-MP in order to take a priori information into account when decomposing the signal;
- a sufficient condition under which Weighted Basis Pursuit Denoising and Weighted-MP/OMP find the best m-term signal approximation;
- a study of how adapting the decomposition algorithm depending on a priori information may help in the recovery of sparse approximations;
- an analysis of the effects of adding the a priori weights on the rate of convergence of Weak-MP;
- an empirical analysis, on natural signals, of the effect of using prior models at the decomposition stage when coherent overcomplete dictionaries are used.

II. Recovery of General Signals: Sparse Approximations

Exact sparse representations are mostly useless in practice, since the set of such signals is of measure zero in $\mathbb{C}^N$ [2]: there are very few cases where, for a given signal, one can hope to find an m-sparse representation. Sparse approximations, on the other hand, have found numerous applications, e.g. in compression, restoration, denoising or source separation.

The use of overcomplete dictionaries, although advantageous in a sense, has the drawback that finding a solution to (1) may be rather difficult or even practically infeasible. The suboptimal algorithms used instead (like Weak General Matching Pursuit algorithms, Weak-MP [4], or $\ell_1$-norm relaxation algorithms, BPDN [3]) do not necessarily supply the same solution as the problem formulated in (1). However, there exist particular situations in which they succeed in recovering the correct solution. Very important results have been found for the case of incoherent dictionaries: sufficient conditions have been established such that, for a given dictionary, Weak-MP and BPDN can be guaranteed to find the set of atoms that form the sparsest m-term approximant $f_m$ of a signal $f$. Moreover, in the case where Orthogonal Matching Pursuit (OMP) is used, the set of coefficients found by the algorithm also corresponds to the optimal one.

Prior to reviewing the results that state the sufficient conditions introduced above, let us define and recall a series of elements that will be used in the remainder of the paper.
- $f \in H$ is the function to be approximated, where $H$ is a Hilbert space. Unless otherwise stated, in this report we assume $f \in \mathbb{R}^n$.
- $\mathcal{D}$ and $D$ denote respectively the set of atoms included in the dictionary and the dictionary synthesis matrix, each of whose columns corresponds to an atom ($\mathcal{D} = \{g_i : i \in \Omega\}$).
- $f_m$ is the best m-term approximant of $f$, i.e. $f_m = D c_m$ where the support of $c_m$ is smaller than or equal to $m$, a positive integer.
- Given $n \ge 0$, $r^n$ and $f^n$ are the residual and the approximant generated by a greedy algorithm at its $n$th iteration.
- $\Gamma_{opt}$ is the optimal set of atoms that generate $f_m$. Often, in the text, this will be referred to as $\Gamma$ for simplicity.
- The best approximation of $f$ over the atoms in $\Gamma$ is called $a_\Gamma = D_\Gamma D_\Gamma^+ f$.
- $\alpha \in (0, 1]$ is the weakness factor associated with the atom selection procedure of Weak-MP algorithms [4].
- $\mu_1(m, \mathcal{D})$ is a measure of the internal cumulative coherence of $\mathcal{D}$ [2]:

$$\mu_1(m, \mathcal{D}) \triangleq \max_{|\Lambda| = m} \; \max_{i \in \Omega\setminus\Lambda} \; \sum_{\lambda\in\Lambda} |\langle g_i, g_\lambda\rangle|, \qquad (2)$$

where $\Lambda \subset \Omega$ has size $m$. Remark that the measure known as the coherence of a dictionary ($\mu$), often used to characterize redundant dictionaries, corresponds to the particular case $\mu = \mu_1(1, \mathcal{D})$. Furthermore, $\mu_1(m, \mathcal{D}) \le m\,\mu$.
- Let $\eta$ ($\eta \ge 0$) be a suboptimality factor associated with the case where the best m-term approximation cannot be reached by the algorithm in use. In such a case, the residual error energy after approximation is $(1+\eta)\|r_m\|^2$ instead of $\|r_m\|^2$.

A. Greedy Algorithms: Weak-MP

Gribonval and Vandergheynst extended the results Tropp found for the particular case of OMP to general Weak-MP. Akin to the case of signal representation, the main results consist of sufficient conditions guaranteeing that Weak-MP will recover the optimal set of atoms that generate the best m-term approximant $f_m$. Moreover, one result also establishes an upper bound on the decay of the residual energy in the approximation of a signal, depending on the internal coherence of $\mathcal{D}$, and a bound on how many correct iterations can be performed by the greedy algorithm, depending on the dictionary and the energy of $f$.

1) Robustness: The sufficient conditions found in [4] that ensure that Weak-MP will recover the set of atoms composing the best m-term approximant are stated in Theorem 1. First of all, it is necessary that the optimal set satisfies the Stability Condition [4].
If, in addition, some conditions are satisfied concerning the remaining residual energy at the $n$th iteration ($r^n$) and the optimal residual energy $r_m$, then an additional atom belonging to $\Gamma_{opt}$ will be recovered. This condition, originally called the General Recovery Condition in [2], was named, for the case of general Weak-MP, the Robustness Condition in [4].

Theorem 1 (Gribonval & Vandergheynst [4]): Let $\{r^n\}_{n\ge0}$ be a sequence of residuals computed by General MP to approximate some $f \in H$. For any integer $m$ such that

$$\mu_1(m) + \mu_1(m-1) < 1,$$

let $f_m = \sum_{\gamma\in\Gamma} c_\gamma g_\gamma$ be a best m-term approximation to $f$, and let $N = N(f, m)$ be the smallest integer such that

$$\|r^N\|^2 \le \|r_m\|^2 \left(1 + \frac{m\,(1-\mu_1(m-1))}{(1-\mu_1(m-1)-\mu_1(m))^2}\right). \qquad (3)$$

Then, for $0 \le n < N$, General MP picks up a correct atom. If no best m-term approximant exists, the same results are valid provided that $\|r_m\|^2 = \|f - f_m\|^2$ is replaced with $\|f - f_m\|^2 = (1+\eta)\|r_m\|^2$ in (3).
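For concreteness, the cumulative coherence (2) that governs the condition of Theorem 1 is easy to evaluate numerically for a dictionary with unit-norm columns. The following NumPy sketch (the function name and the toy dictionary are ours, not from the report) computes $\mu_1(m, \mathcal{D})$ directly from the definition:

```python
import numpy as np

def cumulative_coherence(D, m):
    """Babel function mu_1(m, D) of Eq. (2): for each atom g_i, sum the m
    largest absolute inner products with the other atoms, then take the
    maximum over i. Columns of D are assumed unit-norm."""
    G = np.abs(D.T @ D)              # |<g_i, g_j>| for all pairs of atoms
    np.fill_diagonal(G, 0.0)         # an atom never competes with itself
    return np.sort(G, axis=0)[-m:, :].sum(axis=0).max()

# Example: check Theorem 1's condition mu_1(m) + mu_1(m-1) < 1.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)
m = 3
print(cumulative_coherence(D, m) + cumulative_coherence(D, m - 1) < 1.0)
```

Note that $\mu_1(1, \mathcal{D})$ reduces to the usual coherence $\mu$.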

2) Rate of Convergence: We now review the main result concerning the exponential decay of the error-energy bound, as well as the bound on how many correct iterations can be performed by the greedy algorithm.

Theorem 2 (Gribonval & Vandergheynst [4]): Let $\{r^n\}_{n\ge0}$ be a sequence of residuals computed by General MP to approximate some $f \in H$. For any integer $m$ such that $\mu_1(m) + \mu_1(m-1) < 1$, we have, for $n \ge l \ge 0$,

$$\|r^n\|^2 - \|r_m\|^2 \le \left(1 - \frac{1-\mu_1(m-1)}{m}\right)^{n-l} \left(\|r^l\|^2 - \|r_m\|^2\right). \qquad (4)$$

Moreover, $N \ge m$, and for $m \ge 2$: if $\|r^0\|^2 \ge 3\|r_m\|^2/2$, then

$$N < m + \frac{m}{1-\mu_1(m-1)} \ln\frac{2\|r^0\|^2}{3\|r_m\|^2}, \quad \text{else} \quad N \le 2m. \qquad (5)$$

B. Convex Relaxation of the Subset Selection Problem

Another instance of problem (1) is given by

$$\min_c \|f - Dc\|_2^2 + \tau^2 \|c\|_0. \qquad (6)$$

Unfortunately, the function that has to be minimized is not convex. One can define the $p$-norm of a vector $c$ for any positive real $p$:

$$\|c\|_p = \left(\sum_{i\in\Omega} |c_i|^p\right)^{1/p}. \qquad (7)$$

It is well known that the smallest $p$ for which Eq. (7) is convex is $p = 1$. For this reason, the convex relaxation of the subset selection problem was introduced in [3] under the name of Basis Pursuit Denoising:

$$\min_b \|f - Db\|_2^2 + \gamma \|b\|_1. \qquad (8)$$

This problem can be solved by resorting to Quadratic Programming techniques (see also Section IV). In [3], the author studies the relation between the subset selection problem (6) and its convex relaxation (8). The next theorem shows that any coefficient vector which minimizes Eq. (8) is supported inside the optimal set of indexes.

Theorem 3 (Correlation Condition, Tropp [3]): Suppose that the maximum inner product between the residual signal and any atom satisfies the condition

$$\|D^T (f - a_\Gamma)\|_\infty < \gamma \left(1 - \sup_{i\notin\Gamma} \|D_\Gamma^+ g_i\|_1\right).$$

Then any coefficient vector $b^*$ that minimizes the function (8) must satisfy $\mathrm{support}(b^*) \subset \Gamma$.

In particular, the following theoretical result shows how the trade-off parameters $\tau$ and $\gamma$ are related.

Theorem 4 (Tropp [3]): Suppose that the coefficient vector $b^*$ minimizes the function (8) with threshold $\gamma = \tau/(1 - \sup_{i\notin\Gamma} \|D_\Gamma^+ g_i\|_1)$. Then:
1) the relaxation never selects a non-optimal atom, since $\mathrm{support}(b^*) \subset \mathrm{support}(c_{opt})$;
2) the solution of the convex relaxation is unique;
3) the following upper bound is valid:

$$\|c_{opt} - b^*\|_\infty \le \frac{\tau\,\|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty}}{1 - \sup_{i\notin\Gamma}\|D_\Gamma^+ g_i\|_1}; \qquad (9)$$

4) the support of $b^*$ contains every index $j$ for which

$$|c_{opt}(j)| > \frac{\tau\,\|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty}}{1 - \sup_{i\notin\Gamma}\|D_\Gamma^+ g_i\|_1}. \qquad (10)$$
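Theorems 3 and 4 presuppose that (8) is actually solved. Quadratic Programming, as mentioned above, is one route; as a hedged alternative illustration, a few hundred iterations of soft thresholding (our implementation choice, not the report's) minimize the same objective on small instances:

```python
import numpy as np

def bpdn_ista(D, f, gamma, n_iter=500):
    """Minimize ||f - D b||_2^2 + gamma * ||b||_1 (Eq. (8)) by iterative
    soft thresholding (ISTA): a gradient step on the quadratic term
    followed by the shrinkage induced by the l1 penalty."""
    t = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)   # step < 1 / Lipschitz const.
    b = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = b - t * 2.0 * D.T @ (D @ b - f)       # gradient step on data term
        b = np.sign(z) * np.maximum(np.abs(z) - t * gamma, 0.0)  # soft shrinkage
    return b
```

When $D$ is orthonormal the iteration converges in a single step from $b = 0$, recovering the one-step soft-shrinkage solution of the orthonormal case discussed in Section IV.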

If the dictionary we are working with is orthonormal, it follows that $\sup_{i\notin\Gamma}\|D_\Gamma^+ g_i\|_1 = 0$ and $\|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty} = 1$, and Theorem 4 becomes much stronger. In particular, we obtain that $\|c_{opt} - b^*\|_\infty \le \tau$ and that the support of $b^*$ contains every index $j$ with $|c_{opt}(j)| > \tau$ [3], [5].

A problem similar to subset selection is given by the retrieval of a sparse approximation under an error constraint:

$$\min_c \|c\|_0 \quad \text{s.t.} \quad \|f - Dc\|_2 \le \epsilon, \qquad (11)$$

whose natural convex relaxation is given by

$$\min_b \|b\|_1 \quad \text{s.t.} \quad \|f - Db\|_2 \le \epsilon. \qquad (12)$$

In this paper we are not going to explore this problem; let us just recall that if the dictionary is incoherent, then the solution to (12) for a given $\epsilon$ is at least as sparse as the solution to (11) with a tolerance somewhat smaller than $\epsilon$ [3].

From all these results, one can infer that the use of incoherent dictionaries is very important for the good behavior of greedy and $\ell_1$-norm relaxation algorithms. However, as discussed in the introduction, experience seems to teach us that the overcomplete dictionaries likely to be powerful for natural signal approximation are very redundant and have significant internal coherence. This contradiction calls for a solution. In the following sections we introduce a general approach intended to tackle this problem. In our opinion, a more careful analysis and modeling of the signal to approximate is necessary: dictionary waveforms alone are not good enough modeling elements to be exploited at the signal decomposition stage, and further analysis is required to better drive the decomposition algorithm. As in our previous work on exact signal representations [1], in this report a priori knowledge relating the signal $f$ and the dictionary $\mathcal{D}$ is considered for signal approximations.

III. Including A Priori Information in Greedy Algorithms

As seen in our previous work on the exact sparse representation of signals [1], the use of a priori information in greedy algorithms may make the difference between recovering the optimal set of components for a given approximation or not. In this section we explore the effect of using a priori knowledge in greedy algorithms on the recovery of the best m-term approximant $f_m$ of a signal $f$. First, sufficient conditions for the recovery of a correct atom from the sparsest m-term approximant are established for the case where priors are taken into account. Later, we study how prior knowledge affects the rate of convergence of greedy algorithms. Finally, a practical example is presented.

A. Influence on Sparse Approximations

An important result concerning sparse approximations is the feasibility of recovering the sparsest m-term approximation $f_m$. Akin to the statements established for the exact representation case, sufficient conditions have been determined such that, given a Weak-MP and the associated series of atoms $g_{\gamma_n}$ and residuals $r^n$ ($n \ge 0$) up to the $n$th step, a correct atom at the $(n+1)$th step can be guaranteed (see Sec. II). In Theorem 5, sufficient conditions are presented for the case where some a priori knowledge is available. The main interest of this result is to show that, if an appropriate a priori (concerning $f$ and $\mathcal{D}$) is in use, a better approximation result can be achieved. First of all, let us present the elements that take part in the following results; in order to enhance the clarity of the explanation, we first recall the definitions of the main concepts that will be used. Let us consider first the diagonal matrix $W(f, \mathcal{D})$, introduced in [1] to represent the a priori knowledge taken into account in the atom selection procedure.
Definition 1: A weighting matrix $W = W(f, \mathcal{D})$ is a square diagonal matrix of size $d \times d$. Each of the entries $w_i \in (0, 1]$ of the diagonal corresponds to the a priori likelihood of a particular atom $g_i \in \mathcal{D}$ to be part of the sparsest decomposition of $f$.

We also define $\bar w_{max}$ as the biggest of the weights corresponding to the subset of atoms belonging to $\bar\Gamma = \Omega \setminus \Gamma$, hence:

$$\bar w_{max} \triangleq \sup_{\gamma\in\bar\Gamma} w_\gamma. \qquad (13)$$

Moreover, an additional quantity is required in the results depicted below:

$$\varepsilon_{max} \triangleq \sup_{\gamma\in\Gamma} (1 - w_\gamma). \qquad (14)$$

Eqs. (13) and (14) quantify how good the a priori information is. The reader will notice that these quantities depend on the optimal set of atoms, making it impossible to establish a rule to compute them in advance; their role is to represent the influence of the quality of the prior knowledge on the results obtained below. Notice that $0 \le \varepsilon_{max} < 1$ and $0 < \bar w_{max} \le 1$. $\varepsilon_{max}$ is close to zero if good atoms (the ones belonging to $\Gamma$) are not penalized by the a priori; if the supplied a priori is a good enough model of the relation between the signal and the dictionary in use, we say that the a priori knowledge is reliable. $\bar w_{max}$ becomes small if bad atoms are strongly penalized by the prior knowledge. As we will see in the following, if the a priori is reliable and $\bar w_{max}$ is small, then the prior knowledge can have a relevant positive influence on the behavior of the greedy algorithm.

The consequence of taking the a priori matrix $W$ into account is to allow a new definition of the Babel function introduced by Tropp [2]. In [1], the fact that not all the atoms of $\mathcal{D}$ are equiprobable is taken into account: the availability of a prior should indeed be considered when judging whether a greedy algorithm will be able to recover the m-sparsest approximation of a signal $f$. As seen in Sec. II, the conditions that ensure the recoverability of the best m-term approximant rely on the internal cumulative coherence measure of a dictionary, $\mu_1(m)$. Using the a priori information, some atom interactions can be penalized or even dismissed in the cumulative coherence measure. Hence, a new signal-dependent cumulative coherence measure was introduced in [1]:

Definition 2: The Weighted Cumulative Coherence function of $\mathcal{D}$ is defined as the following data-dependent measure:

$$\mu_w(m, \mathcal{D}, f) \triangleq \max_{|\Lambda| = m} \; \max_{i\in\Omega\setminus\Lambda} \; \sum_{\lambda\in\Lambda} |\langle g_\lambda, g_i\rangle|\, w_\lambda w_i. \qquad (15)$$

Once all the necessary elements have been defined, we can finally state the result that describes the behavior of greedy algorithms using a priori information for the recovery of m-sparse approximants. As proved later in this section, the use of such knowledge implies an improvement with respect to classic Weak-MP also in the case of signal approximation.

Theorem 5: Let $\{r^n : n \ge 0\}$ be the set of residuals generated by Weighted-MP/OMP in the approximation of a signal $f$, and let $f_m$ be the best m-term approximant of $f$ over $\mathcal{D}$. Then, for any positive integer $m$ such that, for a reliable a priori information $W(f, \mathcal{D})$,

$$\mu_w(m) + \mu_w(m-1) < 1 - \varepsilon_{max},$$

$\eta \ge 0$ and

$$\|r^n\|^2 > \|f - f_m\|^2\,(1+\eta) \left(1 + \frac{m\,(1 - (\mu_w(m-1) + \varepsilon_{max}))\,(\bar w_{max})^2}{(1 - (\mu_w(m) + \mu_w(m-1) + \varepsilon_{max}))^2}\right), \qquad (16)$$

Weighted-MP/OMP will recover an atom that belongs to the optimal set expanding the best m-sparse approximant of $f$. If the best m-term approximant $f_m$ exists and can be reached, then $\eta = 0$.

This means that if the approximation error at the $n$th iteration is still bigger than a certain quantity, which depends on the optimal error $\|f - f_m\|^2$, on the internal dictionary cumulative coherence and on the reliability of the a priori information, then yet another term of the best m-term approximant can be recovered. The use of reliable a priori information makes the bound easier to satisfy compared to the case where no prior knowledge is used [4]; thus, a higher number of terms from the best m-term approximant may be recovered.
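Before turning to the proof, note that the quantities entering Theorem 5 are as easy to evaluate numerically as the unweighted ones; the following sketch extends the `cumulative_coherence` function given earlier (names and conventions are ours):

```python
import numpy as np

def weighted_cumulative_coherence(D, w, m):
    """mu_w(m, D, f) of Eq. (15): the Babel function with every inner
    product |<g_lambda, g_i>| damped by the prior weights w_lambda * w_i,
    where w is the diagonal of W, with entries in (0, 1]."""
    G = np.abs(D.T @ D) * np.outer(w, w)
    np.fill_diagonal(G, 0.0)
    return np.sort(G, axis=0)[-m:, :].sum(axis=0).max()

def prior_quality(w, gamma_opt):
    """epsilon_max of Eq. (14) and w_bar_max of Eq. (13), for a known
    optimal support gamma_opt given as a boolean mask over the atoms."""
    return (1.0 - w[gamma_opt]).max(), w[~gamma_opt].max()
```

With a perfectly reliable prior ($w_\gamma = 1$ on $\Gamma$, small elsewhere), $\varepsilon_{max} = 0$ and $\mu_w$ can be far below $\mu_1$, which is exactly what loosens condition (16).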
Proof: To demonstrate the result of Theorem 5, we follow the steps of the original proofs by Tropp [2] and Gribonval and Vandergheynst [4]; this time, however, a priori knowledge is taken into account. First of all, let us recall the following statements:
- $f_m \in \mathrm{span}(D_\Gamma)$;
- $r^n = f - f^n$;
- $r_m = f - f_m$ is such that $\langle r_m, f_m - f^n\rangle = 0$ for all $n \ge 0$; hence $\|r^n\|^2 = \|f_m - f^n\|^2 + \|f - f_m\|^2$.

In order to ensure the recovery of an atom belonging to the optimal set $\Gamma = \Gamma_{opt}$, the following needs to be satisfied:

$$\rho_w(r^n) = \frac{\|D_{\bar\Gamma}^T W_{\bar\Gamma}\, r^n\|_\infty}{\|D_\Gamma^T W_\Gamma\, r^n\|_\infty} < \alpha, \qquad (17)$$

where $\alpha \in (0, 1]$ is the weakness factor [4]. To establish (16), the previous expression has to be put in terms of $f_m$ and $f^n$. Since $r^n = (f - f_m) + (f_m - f^n)$ and $D_\Gamma^T W_\Gamma (f - f_m) = 0$, we have

$$\rho_w(r^n) \le \frac{\|D_{\bar\Gamma}^T W_{\bar\Gamma} (f - f_m)\|_\infty}{\|D_\Gamma^T W_\Gamma (f_m - f^n)\|_\infty} + \rho_w(f_m - f^n), \qquad (18)$$

where the second term can be upper bounded, since $(f_m - f^n) \in \mathrm{span}(D_\Gamma)$ [2]:

$$\rho_w(f_m - f^n) \le \frac{\mu_w(m)}{1 - (\mu_w(m-1) + \varepsilon_{max})}. \qquad (19)$$

The first term in (18) can be upper bounded in the following way. For its numerator,

$$\|D_{\bar\Gamma}^T W_{\bar\Gamma} (f - f_m)\|_\infty = \sup_{\gamma\in\bar\Gamma} |\langle g_\gamma w_\gamma, f - f_m\rangle|, \qquad (20)$$

and, by the Cauchy-Schwarz inequality,

$$\sup_{\gamma\in\bar\Gamma} |\langle g_\gamma w_\gamma, f - f_m\rangle| \le \sup_{\gamma\in\bar\Gamma} w_\gamma\, \|f - f_m\| = \bar w_{max}\, \|f - f_m\|.$$

In order to further upper bound the quotient, its denominator can be lower bounded, as shown in [2]. Indeed, by the singular value decomposition,

$$\sup_{\gamma\in\Gamma} |\langle g_\gamma w_\gamma, f_m - f^n\rangle| \ge \sqrt{\frac{\sigma^w_{min}}{m}}\; \|f_m - f^n\|, \qquad (21)$$

where $\sigma^w_{min}$ is the minimum of the squared singular values of $G = (D_\Gamma W_\Gamma)^T (D_\Gamma W_\Gamma)$, and can be bounded as

$$\sigma^w_{min} \ge 1 - \varepsilon_{max} - \mu_w(m-1). \qquad (22)$$

Moreover, $\|f - f_m\|^2$ can be written as $\|f - f_m\|^2 = (1+\eta)\|r_m\|^2$, where $\eta \ge 0$ is a suboptimality factor which indicates whether $f_m$ can be reached and, if it cannot (i.e. $\eta > 0$), sets the best reachable approximation error. Hence, the first term of (18) can be bounded as

$$\frac{\sup_{\gamma\in\bar\Gamma} w_\gamma\, \|f - f_m\|}{\sup_{\gamma\in\Gamma} |\langle g_\gamma w_\gamma, f_m - f^n\rangle|} \le \bar w_{max} \sqrt{\frac{m\,(1+\eta)}{1 - \varepsilon_{max} - \mu_w(m-1)}}\; \frac{\|r_m\|}{\|f_m - f^n\|}. \qquad (23)$$

Thus, from (23) and (19), a sufficient condition for the recovery of a correct atom can be expressed as

$$\rho_w(r^n) \le \bar w_{max} \sqrt{\frac{m\,(1+\eta)}{1 - \varepsilon_{max} - \mu_w(m-1)}}\; \frac{\|r_m\|}{\|f_m - f^n\|} + \frac{\mu_w(m)}{1 - \varepsilon_{max} - \mu_w(m-1)} < \alpha. \qquad (24)$$

Considering that $\|f_m - f^n\|^2 = \|r^n\|^2 - \|r_m\|^2$, it easily follows that

$$\bar w_{max} \sqrt{\frac{m\,(1+\eta)}{1 - \varepsilon_{max} - \mu_w(m-1)}}\; \frac{\|r_m\|}{\sqrt{\|r^n\|^2 - \|r_m\|^2}} + \frac{\mu_w(m)}{1 - \varepsilon_{max} - \mu_w(m-1)} < \alpha. \qquad (25)$$

Then, solving for $\|r^n\|^2$:

$$\|r^n\|^2 > (1+\eta)\,\|r_m\|^2 \left(1 + \frac{m\,(\bar w_{max})^2\,(1 - \varepsilon_{max} - \mu_w(m-1))}{\left(\alpha\,(1 - \varepsilon_{max} - \mu_w(m-1)) - \mu_w(m)\right)^2}\right). \qquad (26)$$

For simplicity, let us consider the case where a full-search atom selection algorithm is available. Thus, replacing $\alpha = 1$ in (26) proves Theorem 5.

The general effect of using prior knowledge can thus be summarized by the following two corollaries.

Corollary 1: Let $W(f, \mathcal{D})$ be a reliable a priori knowledge and assume $\alpha = 1$; then for any positive integer $m$ such that $\mu_1(m) + \mu_1(m-1) \ge 1$ but $\mu_w(m) + \mu_w(m-1) < 1 - \varepsilon_{max}$, Weighted-MP/OMP (unlike Weak(α)-MP/OMP) is guaranteed to recover the atoms belonging to the best m-term approximation $f_m$.

Corollary 2: Let $W(f, \mathcal{D})$ be a reliable a priori knowledge and assume $\alpha = 1$; then for any positive integer $m$ such that $\mu_1(m) + \mu_1(m-1) < 1$ and $\mu_w(m) + \mu_w(m-1) + \varepsilon_{max} < \mu_1(m) + \mu_1(m-1) < 1$, Weighted-MP/OMP has a weaker sufficient condition than MP/OMP for the recovery of correct atoms from the best m-term approximant. Hence, the correction factor on the right-hand side of expression (16) is smaller for Weighted-MP/OMP than for the pure greedy algorithm:

$$1 + \frac{m\,(1 - (\mu_w(m-1) + \varepsilon_{max}))\,(\bar w_{max})^2}{(1 - (\mu_w(m) + \mu_w(m-1) + \varepsilon_{max}))^2} \;\le\; 1 + \frac{m\,(1 - \mu_1(m-1))}{(1 - (\mu_1(m) + \mu_1(m-1)))^2}.$$

Therefore, Weighted-MP/OMP is guaranteed to recover better approximants than classic MP/OMP when reliable and good enough a priori information is in use.

B. Rate of Convergence of Weighted-MP/OMP on Sparse Approximations

The energy of the series of residuals $r^n$ ($n \ge 0$) generated by the greedy algorithm progressively converges toward zero as $n$ increases. In the same way, Weighted-MP/OMP with reliable a priori information is expected to behave better, with a faster convergence rate, than Weak-MP in the approximation case. A more accurate estimate of the dictionary coherence, conditioned on the signal to be analyzed, is available, namely $\mu_w(m)$ (where $\mu_w(m) \le \mu_1(m)$); thus a better bound on the rate of convergence can be found for Weighted-MP/OMP. To prove this we follow the path suggested in [2] for OMP and in [4] for the general Weak-MP algorithm; as before, we introduce the a priori information into the formulation. The results formally show how Weighted-MP/OMP can outperform Weak-MP when the prior knowledge is reliable enough.

Theorem 6: Let $W(f, \mathcal{D})$ be a reliable a priori information matrix and $\{r^n : n \ge 0\}$ a sequence of residuals produced by Weighted-MP/OMP. Then, as long as $r^n$ satisfies Eq. (16) (Theorem 5), Weighted-MP/OMP picks up a correct atom and

$$\|r^n\|^2 - (1+\eta)\|r_m\|^2 \le \left(1 - \alpha^2\, \frac{1 - \mu_w(m-1) - \varepsilon_{max}}{m}\right)^{n-l} \left(\|r^l\|^2 - (1+\eta)\|r_m\|^2\right), \qquad (27)$$

where $n \ge l \ge 0$. This implies that Weighted-MP/OMP, in the same way as Weak-MP, has an exponentially decaying upper bound on the rate of convergence. Moreover, in the case where reliable a priori information is used, the bound is lower than in the case where priors are not used. This result suggests that the convergence of weighted greedy algorithms may be faster than that of classic pure greedy algorithms.

Proof: Let us consider $n$ such that $r^n$ satisfies Eq. (16) of Theorem 5. Then, it is known that for Weak-MP:

$$\|r^{n+1}\|^2 \le \|r^n\|^2 - |\langle r^n, g_{k_n}\rangle|^2, \qquad (28)$$

where the inequality applies for OMP, while in the case of MP the equality holds. Moreover, considering the weighted selection rule,

$$|\langle r^n, g_{k_n}\rangle|^2 \ge \alpha^2 \sup_\gamma |\langle r^n, g_\gamma w_\gamma\rangle|^2 = \alpha^2 \sup_{\gamma\in\Gamma} |\langle f_m - f^n, g_\gamma w_\gamma\rangle|^2, \qquad (29)$$

where the last equality follows from the assumption that Eq. (16) of Theorem 5 is satisfied and because $(f - f_m) \perp \mathrm{span}(D_\Gamma)$. And by (21),

$$\|r^{n+1}\|^2 \le \|r^n\|^2 - \alpha^2\, \frac{\sigma^w_{min}}{m}\, \|f_m - f^n\|^2. \qquad (30)$$

As stated before, $\|f_m - f^n\|^2 = \|r^n\|^2 - \|r_m\|^2$, hence $\|f_m - f^{n+1}\|^2 = \|r^{n+1}\|^2 - \|r_m\|^2$, which together with (30) gives:

$$\|f_m - f^{n+1}\|^2 \le \|f_m - f^n\|^2 \left(1 - \frac{\alpha^2\,\sigma^w_{min}}{m}\right). \qquad (31)$$

Finally, by simply considering $l \le n$, by recursion it follows:

$$\|r^n\|^2 - (1+\eta)\|r_m\|^2 \le \left(1 - \frac{\alpha^2\,\sigma^w_{min}}{m}\right)^{n-l} \left(\|r^l\|^2 - (1+\eta)\|r_m\|^2\right), \qquad (32)$$

and, using the bound (22) on $\sigma^w_{min}$, the theorem is proved.

Depending on the sufficient conditions specified in Sec. III-A, the recovery of the optimal set will be guaranteed. However, it is not yet clear for how long a non-orthogonalized greedy algorithm (Weighted-MP in our case) will keep iterating over the optimal set of atoms in the approximation case. Let us define the number of correct iterations as follows.

Definition 3: Consider a Weighted-MP/OMP algorithm used for the approximation of signals. We define the number of provably correct steps $N$ as the smallest positive integer such that

$$\|r^N\|^2 \le \|f - f_m\|^2\,(1+\eta) \left(1 + \frac{m\,(1 - (\mu_w(m-1) + \varepsilon_{max}))\,(\bar w_{max})^2}{(1 - (\mu_w(m) + \mu_w(m-1) + \varepsilon_{max}))^2}\right),$$

which corresponds to the number of atoms belonging to the optimal set that it is possible to recover given a signal $f$, a dictionary and an a priori information matrix $W(f, \mathcal{D})$.

In the case of OMP and Weighted-OMP, $N$ will always be smaller than or equal to the cardinality of $\Gamma_{opt}$. For Weak-MP and Weighted-MP, provided that $\mu_w(m) + \mu_w(m-1) + \varepsilon_{max} < 1$, the provable number of correct iterations will depend on the final error that remains after the best m-term approximation has been found. In the following theorem, bounds on the quantity $N$ are given for Weighted-MP/OMP; to obtain the results we follow [4]. Before stating the theorem, the reader should note that from now on $\bar w^l_{max}$ denotes the concept of (13) for an optimal set of atoms of size $l$, i.e. for $\Gamma_l \subset \Gamma_{opt}$.

Theorem 7: Let $W(f, \mathcal{D})$ be a reliable a priori information and $\{r^n : n \ge 0\}$ a sequence of residuals produced by Weighted-MP/OMP when approximating $f$. Then, for any integer $m$ such that $\mu_w(m) + \mu_w(m-1) + \varepsilon_{max} < 1$, we have $N \ge m$, and for $m \ge 2$: if

$$\|r^0\|^2 \ge \frac{3}{2}\, \|r_m\|^2\, \frac{(\bar w_{max})^2}{1 - \varepsilon_{max}},$$

then

$$N < m + \frac{m}{1 - \varepsilon_{max} - \mu_w(m-1)} \left[\log \frac{2\|r^0\|^2}{3\|r_m\|^2} + \log \frac{1 - \varepsilon_{max}}{(\bar w_{max})^2}\right], \quad \text{else} \quad N \le 2m. \qquad (33)$$

From (33) we can see that the upper bound on the number of correct steps $N$ is higher when reliable a priori information is used. This implies that a better behavior of the algorithm is possible with respect to [4]. The term $\bar w_{max}$, which depends on the a priori information used by Weighted-MP/OMP ($\bar w_{max} \le 1$, with $\bar w_{max} = 1$ in the case of classic Weak-MP), helps increase the value of this bound, allowing Weighted-MP to perform a higher number of correct iterations. $\bar w_{max}$ represents the capacity of the a priori model to discriminate between good and bad atoms.

In order to prove Theorem 7, several intermediate results are necessary.

Lemma 1: Let $W(f, \mathcal{D})$ be a reliable a priori information and $\{r^n : n \ge 0\}$ a sequence of residuals produced by Weighted-MP/OMP; then, as long as $r^n$ satisfies Eq. (16) of Theorem 5, for any $k < m$ such that $N_k \le N_m$,

$$N_m - N_k < 1 + \frac{m}{1 - \varepsilon_{max} - \mu_w(m-1)} \left[\log\frac{\|r^k\|^2}{\|r_m\|^2} + \log\frac{1 + \lambda^w_k}{1 + \lambda^w_m}\right], \qquad (34)$$

where

$$\lambda^w_l \triangleq l\; \frac{(1 - (\mu_w(l-1) + \varepsilon_{max}))\,(\bar w^l_{max})^2}{(1 - (\mu_w(l) + \mu_w(l-1) + \varepsilon_{max}))^2}, \qquad (35)$$

in which $l$ corresponds to the size of a particular optimal set of atoms ($|\Gamma_l| = l$).

Proof: From Theorem 6 it follows that, for $l = N_k$ and $n = N_m$, defining

$$\beta^w_l \triangleq 1 - \frac{1 - \varepsilon_{max} - \mu_w(l-1)}{l}, \qquad (36)$$

where $\lambda^w_l$ is defined as in (35), and starting from the condition on the residual $\|r^N\|^2$ given in Definition 3, the following holds if $\alpha = 1$:

$$(1 + \lambda^w_m)(1+\eta)\|r_m\|^2 < \|r^{N_m - 1}\|^2 \le (\beta^w_m)^{N_m - 1 - N_k}\, \|r^{N_k}\|^2 \le (\beta^w_m)^{N_m - 1 - N_k}\,(1 + \lambda^w_k)(1+\eta)\|r^k\|^2. \qquad (37)$$

Operating on (37) as in [4], it easily follows that

$$(\beta^w_m)^{N_m - 1 - N_k} > \frac{(1 + \lambda^w_m)\,\|r_m\|^2}{(1 + \lambda^w_k)\,\|r^k\|^2},$$

thus

$$N_m - N_k < 1 + \frac{1}{-\log\beta^w_m} \left[\log\frac{\|r^k\|^2}{\|r_m\|^2} + \log\frac{1 + \lambda^w_k}{1 + \lambda^w_m}\right].$$

If $t \le 1$, then $\log(1-t) \le -t$, and so

$$\frac{1}{-\log\beta^w_m} \le \frac{m}{1 - \varepsilon_{max} - \mu_w(m-1)}.$$

This proves the result presented in (34), and so the lemma.

In order to use Lemma 1 in Theorem 7, an estimate of the argument of the second logarithm in (34) is necessary. This can be found in the following lemma.

Lemma 2: For all $m$ such that $\mu_w(m) + \mu_w(m-1) + \varepsilon_{max} < 1$ and $k < m$, we have:

$$\lambda^w_m \ge m\,(1 - (\mu_w(m-1) + \varepsilon_{max}))\,(\bar w^m_{max})^2, \qquad (38)$$

$$\frac{\lambda^w_k}{\lambda^w_m} \le \frac{k\,(1 - (\mu_w(k-1) + \varepsilon_{max}))\,(\bar w^k_{max})^2}{m\,(1 - (\mu_w(m-1) + \varepsilon_{max}))\,(\bar w^m_{max})^2}. \qquad (39)$$

Proof: Consider the definition of $\lambda^w_l$ in (35). Then, since $\mu_w(l) + \mu_w(l-1) + \varepsilon_{max} \le \mu_w(l+1) + \mu_w(l) + \varepsilon_{max} < 1$ for all $l \le m-1$, the following can be stated:

$$\frac{\lambda^w_l}{l\,(1 - (\mu_w(l-1) + \varepsilon_{max}))\,(\bar w^l_{max})^2} \le \frac{\lambda^w_{l+1}}{(l+1)\,(1 - (\mu_w(l) + \varepsilon_{max}))\,(\bar w^{l+1}_{max})^2}. \qquad (40)$$

By assuming $k + 1 \le l \le m$, the lemma is proved.

Finally, building on the results obtained from Theorem 6 and Lemmas 1 and 2, Theorem 7 can be proved.

Proof: To prove Theorem 7 we need to upper bound the factor $(1 + \lambda^w_k)/(1 + \lambda^w_m)$ in Eq. (34). For this purpose let us consider the following:

$$\frac{1 + \lambda^w_k}{1 + \lambda^w_m} \le \frac{1 + \lambda^w_k}{\lambda^w_m} = \frac{1}{\lambda^w_m} + \frac{\lambda^w_k}{\lambda^w_m}. \qquad (41)$$

Together with the results of Lemma 2, this gives:

$$\log\frac{1 + \lambda^w_k}{1 + \lambda^w_m} \le \log\left(\frac{1 + k\,(1 - (\mu_w(k-1) + \varepsilon_{max}))\,(\bar w^k_{max})^2}{m\,(1 - (\mu_w(m-1) + \varepsilon_{max}))\,(\bar w^m_{max})^2}\right). \qquad (42)$$

Hence, using Eq. (34), we obtain

$$N_m - N_k < 1 + \frac{m}{1 - \varepsilon_{max} - \mu_w(m-1)} \left[\log\frac{\|r^k\|^2}{\|r_m\|^2} + \log\frac{1 + k\,(1 - (\mu_w(k-1) + \varepsilon_{max}))\,(\bar w^k_{max})^2}{m\,(1 - (\mu_w(m-1) + \varepsilon_{max}))\,(\bar w^m_{max})^2}\right]. \qquad (43)$$

Theorem 7 is thus proved by particularizing the previous expression to the case $k = 1$, for which $N_1 = 1$, $\mu_w(0) = 0$ and $\bar w^1_{max} \le 1$. This yields

$$N_m < 2 + \frac{m}{1 - \varepsilon_{max} - \mu_w(m-1)} \left[\log\frac{\|r^1\|^2}{\|r_m\|^2} + \log\frac{1 + (1 - \varepsilon_{max})}{m\,(1 - (\mu_w(m-1) + \varepsilon_{max}))\,(\bar w^m_{max})^2}\right], \qquad (44)$$

which, bounding $\|r^1\|^2 \le \|r^0\|^2$ and simplifying the constants (using $\mu_w(m-1) < 1 - \varepsilon_{max}$), gives

$$N < m + \frac{m}{1 - \varepsilon_{max} - \mu_w(m-1)} \left[\log\frac{2\|r^0\|^2}{3\|r_m\|^2} + \log\frac{1 - \varepsilon_{max}}{(\bar w_{max})^2}\right]. \qquad (45)$$

This is only possible if

$$\|r^0\|^2 \ge \frac{3}{2}\, \|r_m\|^2\, \frac{(\bar w_{max})^2}{1 - \varepsilon_{max}},$$

which is the validity condition stated in Theorem 7.

Compared to the case where no a priori information is available [4], the condition for the validity of bound (33) is softened in our case. Moreover, the upper bound on $N$ is increased, which means that there is room for an improvement in the number of correct iterations. Indeed, under the assumption of reliable a priori information, the smaller $\bar w_{max}$ (which implies a good discrimination between $\Gamma$ and $\bar\Gamma$), the easier it is to fulfill the condition stated in Theorem 7.

C. Example: Use of Footprints for ε-Sparse Approximations

To give an example of the approximation of signals using a priori information, we consider again the case presented in [1], where a piecewise-smooth signal is represented by means of an overcomplete dictionary composed of the mixture of an orthonormal wavelet basis and a family of wavelet footprints (see [16]).

Fig. 1. Dictionary formed by the Symlet-4 [17] (left half) and its respective footprints for piecewise-constant singularities (right half). [Plot of the dictionary matrix: temporal axis vs. function index.]

Let us recall the considerations made for the dictionary. The dictionary is built as the union of an orthonormal basis defined by the Symlet-4 family of wavelets [17] and the respective family of footprints for all possible translations of the Heaviside function; the latter are used to model the discontinuities. The graphical representation of the dictionary matrix can be seen in Fig. 1, where the columns are the waveforms that compose the dictionary.

Fig. 2. Comparison of OMP-based approximations with m terms using the footprints dictionary of Fig. 1. Left: original signal. Middle: blind OMP approximation. Right: OMP with prior knowledge of the footprint locations. [Three plots: amplitude vs. temporal axis.]

The use of such a dictionary, indeed, does not at all satisfy the sufficient condition required to ensure the recovery of an optimal approximant with more than one term. Moreover, even if the best a priori were available, the dictionary would still be far from satisfying the sufficient condition based on the weighted Babel function. Nevertheless, this example is considered in this section for two main reasons. The first is that the sufficient theoretical conditions exposed in the literature are very pessimistic and reflect the worst possible case. The second is that, as previously discussed, experience seems to teach us that good dictionaries for efficient signal approximation are likely to be highly coherent, which conflicts with the incoherence requirement for the good behavior of greedy algorithms. Hence, we find this example of special interest to underline the benefits of using a priori information and additional signal modeling for nonlinear expansions. Indeed, this example shows that, by using reliable a priori knowledge, better approximations are possible not only with incoherent dictionaries (where theoretical evidence of the improvement has been given in this paper) but also with highly coherent ones.
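The weighted greedy decomposition used in this example amounts to a one-line change to OMP: the atom selection maximizes $|\langle r^n, g_i\rangle|\, w_i$ (cf. the selection ratio (17)), after which the coefficients are recomputed by orthogonal projection, exactly as in OMP. A minimal sketch of our reading of Weighted-OMP, assuming unit-norm columns (all names are ours):

```python
import numpy as np

def weighted_omp(D, f, w, n_terms):
    """Weighted-OMP: greedy selection biased by the a priori weights
    w in (0, 1]; amplitudes by least squares on the selected atoms."""
    r, support = f.astype(float).copy(), []
    for _ in range(n_terms):
        corr = np.abs(D.T @ r) * w        # weighted selection rule
        corr[support] = 0.0               # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        Ds = D[:, support]
        coef, *_ = np.linalg.lstsq(Ds, f, rcond=None)
        r = f - Ds @ coef                 # residual after reprojection
    b = np.zeros(D.shape[1])
    b[support] = coef
    return b
```

Setting `w = np.ones(D.shape[1])` recovers plain OMP, which is how the blind curves of Figs. 2 and 3 can be reproduced.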

We repeat the procedure used in [1] to estimate the a priori information based on the dictionary and the input data; we also refer the reader to Sec. V for a more detailed explanation.

Fig. 3. Rate of convergence of the error with respect to the iteration number in the experiment of Fig. 2. [Plot: residual energy vs. iteration, for OMP with a priori and plain OMP.]

Fig. 2 presents the original signal (left) together with the two approximations obtained in this example: without a priori in the middle, and with a priori on the right. The signal to be approximated has a higher number of polynomial degrees than the number of vanishing moments of the Symlet-4. The figures clearly depict the positive effect of the reliable a priori information inserted in the Weighted-OMP algorithm: with very few components, the algorithm benefits from the a priori information estimated from the signal and gives a much better approximation. A more global view of this behavioral enhancement can be seen in Fig. 3, where the rate of convergence of the approximation error is presented. The use of weights is definitely helpful, and a considerable gain in the reduction of the approximation error is achieved for a small number of terms.

IV. Approximations with Weighted Basis Pursuit Denoising

In this section, the problem of finding a sparse approximation of a signal $f$ is addressed considering a trade-off between the error and the number of elements that participate in the approximation. In statistics this problem is also called Subset Selection, and we will refer to it as $P_0$:

$$(P_0) \quad \min_c \|f - Dc\|_2^2 + \tau^2 \|c\|_0. \qquad (46)$$

Solving $P_0$ is NP-complex, so a possible way of simplifying the computation is to substitute the $\ell_0$ quasi-norm with the convex $\ell_1$ norm. This relaxation leads to the following problem which, from now on, is called $P_1$:

$$(P_1) \quad \min_b \|f - Db\|_2^2 + \gamma \|b\|_1. \qquad (47)$$

This new problem corresponds to the minimization of a convex function that can be solved with classical Quadratic Programming methods. This relaxation is similar to the one that leads to the definition of the Basis Pursuit principle for the case of exact signal representation; the fact that this paradigm is also called Basis Pursuit Denoising can be explained because it was introduced to adapt BP to the case of noisy data [3]. Note that if $D$ is orthonormal, the solution of $P_1$ can be found by a soft shrinkage of the coefficients [5], [3], while, if $D$ is a union of orthonormal subdictionaries, the problem can be solved by resorting to the Block Coordinate Relaxation method [18], faster than Quadratic Programming.

In [1] we introduced a theoretical framework for sparse representation over redundant dictionaries taking into account some a priori information about the signal. In this optic we proposed the Weighted Basis Pursuit method, which minimizes a cost function that includes weights expressing the a priori information:

$$\min_b \|W^{-1} b\|_1 \quad \text{s.t.} \quad Db = f. \qquad (48)$$

Definition 4: Given a dictionary $D$ indexed in $\Omega$ and an index subset $\Gamma \subset \Omega$, we define the Weighted Recovery Factor (WRF) as:

$$\mathrm{WRF}(\Gamma) = \sup_{i\notin\Gamma} \|(D_\Gamma W_\Gamma)^+\, g_i\, w_i\|_1. \qquad (49)$$
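Definition 4 translates directly into a few lines of code; a small sketch (our naming), convenient for checking the Exact Recovery Condition (50) below on toy dictionaries:

```python
import numpy as np

def weighted_recovery_factor(D, w, gamma):
    """WRF of Eq. (49): sup over atoms outside the index set gamma of
    ||(D_gamma W_gamma)^+ g_i w_i||_1."""
    gamma = np.asarray(gamma)
    outside = np.setdiff1d(np.arange(D.shape[1]), gamma)
    pinv = np.linalg.pinv(D[:, gamma] * w[gamma])   # (D_gamma W_gamma)^+
    return max(np.abs(pinv @ (D[:, i] * w[i])).sum() for i in outside)
```

With $W = I$ this reduces to Tropp's unweighted Exact Recovery quantity, so the same routine also checks the classical condition.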

The main results in [1] concerning WBP are contained in Proposition 1.

Proposition 1: Given a dictionary $D$ and an a priori matrix $W(f, \mathcal{D})$, Weighted Basis Pursuit is able to recover the optimal representation of a sparse signal $f = D_\Gamma b_{opt}$ if the following Exact Recovery Condition is respected:

$$\mathrm{WRF}(\Gamma) < 1. \qquad (50)$$

Moreover, a bound for the WRF is:

$$\mathrm{WRF}(\Gamma) \le \frac{\mu_w(m)}{1 - \varepsilon_{max} - \mu_w(m-1)}, \qquad (51)$$

where $\varepsilon_{max} = \sup_{\gamma\in\Gamma}(1 - w_\gamma)$ and the $w_\gamma$ are the elements of the diagonal matrix $W$. Therefore, the Exact Recovery Condition for WBP (50) holds for any index set $\Gamma$ of size at most $m$ such that

$$\mu_w(m) + \mu_w(m-1) < 1 - \varepsilon_{max}. \qquad (52)$$

A. A Bayesian Approach to Weighted Basis Pursuit Denoising

In this subsection the problem of signal approximation is studied from a Bayesian point of view. We also examine under which hypotheses BPDN finds the optimal solution. This leads us to generalize the BPDN principle through the definition of Weighted Basis Pursuit Denoising (WBPDN). Let us write again the model of our data approximation, where $\hat f$ is the approximant and $r$ is the residual:

$$f = \hat f + r = Db + r. \qquad (53)$$

Assuming $r$ to be an iid Gaussian set of variables, the data likelihood is

$$p(f \mid D, b) = \frac{1}{(2\pi\sigma_r^2)^{n/2}} \exp\left(-\frac{\|f - Db\|^2}{2\sigma_r^2}\right), \qquad (54)$$

where $\sigma_r^2$ is the variance of the residual. In the approximation problem, one aims at maximizing the posterior $p(b \mid f, D)$. Formally, by the Bayes rule, we have

$$p(b \mid f, D) = \frac{p(f \mid D, b)\, p(b)}{p(f, D)},$$

and thus, assuming $p(f, D)$ to be uniform, it follows that the most probable signal representation is:

$$b_{MAP} = \arg\max_b\; p(f \mid D, b)\, p(b). \qquad (55)$$

Let us now assume that the coefficients $b_i$ are independent and have a Laplacian distribution with standard deviation $\sigma_i$:

$$p(b_i) = \frac{1}{\sqrt 2\, \sigma_i} \exp\left(-\frac{\sqrt 2\, |b_i|}{\sigma_i}\right).$$

From (55), by computing the logarithm, it follows that

$$b_{MAP} = \arg\max_b \left[\ln p(f \mid D, b) + \sum_i \ln p(b_i)\right] = \arg\min_b \left[\frac{\|f - Db\|^2}{2\sigma_r^2} + \sqrt 2 \sum_i \frac{|b_i|}{\sigma_i}\right].$$

Making the hypothesis that $\sigma_i$ is constant for every index $i$, the previous equation means that the most probable $b$ is the one found by the BPDN algorithm [19]. In fact, this hypothesis does not often correspond to reality. On the contrary, if the variances of the coefficients are not forced to all be the same, it turns out that the most probable signal representation can be found by solving the following problem:

$$(P^w_1) \quad \min_b \|f - Db\|_2^2 + \gamma \|W^{-1} b\|_1, \qquad (56)$$

where the diagonal matrix $W$ with entries in $(0, 1]$ is defined in Section III. One can notice that in Eq. (56) the introduction of weights allows one to individually model the components of $b$. This approach is analogous to the one introduced in [20] and [21] and, from now on, we will refer to $P^w_1$ as Weighted Basis Pursuit Denoising, or WBPDN.

The assumption often made about the Gaussianity of the residual is quite restrictive. However, for another particular problem, one could make the hypothesis that this residual has a Laplacian distribution. It is then possible to prove that the most probable signal representation can be found by substituting the $\ell_2$ measure of the error with the $\ell_1$. This leads to the following minimization problem:

$$\min_b \|f - Db\|_1 + \gamma \|W^{-1} b\|_1,$$

where $W = I$ if the variances of the probability density functions of the $b_i$ are the same for each $i$. This problem is faced, for example, in [22], where it is also explained that it can be solved by Linear Programming techniques.
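Solving (56) costs no more than solving (8): in an iterative soft-thresholding scheme (our implementation choice again, not the Quadratic Programming route used in the report), the weights merely rescale the per-coefficient threshold. A hedged sketch:

```python
import numpy as np

def wbpdn_ista(D, f, w, gamma, n_iter=500):
    """Minimize ||f - D b||_2^2 + gamma * ||W^{-1} b||_1 (Eq. (56)).
    Identical to the BPDN iteration except that coefficient i is
    soft-thresholded at a level proportional to gamma / w_i: likely atoms
    (w_i near 1) are shrunk less than unlikely ones."""
    t = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
    thr = t * gamma / w                    # per-coefficient threshold
    b = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = b - t * 2.0 * D.T @ (D @ b - f)
        b = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)
    return b
```

The Bayesian reading above says exactly this: a per-coefficient Laplacian scale turns into a per-coefficient shrinkage level.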

B. Preliminary Propositions

Here some preliminary propositions are presented, allowing us to prove the results of the following two subsections. The proofs of most of them follow the arguments given by Tropp in [3]. In the following, $c$ and $b$ lie in $\mathbb{R}^{|\Gamma|}$, but sometimes they are extended to $\mathbb{R}^{|\Omega|}$ by padding with zeros; the same holds for the matrix $W$. The next proposition, similar to the Correlation Condition Lemma in [3], establishes a fundamental result for the rest of the report: it basically states that, if the atoms of $\Gamma$ have a small weighted coherence, expressed by the Weighted Recovery Factor, then the support of any vector that solves $P^w_1$ is a subset of $\Gamma$.

Lemma 3: Given an index subset $\Gamma \subset \Omega$, suppose that the following condition is satisfied:

$$\|D^T (f - a_\Gamma)\|_\infty < \frac{\gamma\,(1 - \mathrm{WRF}(\Gamma))}{\bar w_{max}}, \qquad (57)$$

where $\bar w_{max} \in (0, 1]$ is the quantity defined by equation (13). Then, any coefficient vector $b^*$ that minimizes the cost function of problem $P^w_1$ must satisfy

$$\mathrm{support}(b^*) \subset \Gamma. \qquad (58)$$

Proof: Assume that $b^*$ is a vector minimizing (56), and assume also that it uses an index outside $\Gamma$. $b^*$ can be compared with its projection $D_\Gamma^+ D b^*$, which is supported in $\Gamma$; we obtain

$$\|f - Db^*\|^2 + \gamma \|W^{-1} b^*\|_1 \le \|f - D D_\Gamma^+ D b^*\|^2 + \gamma \|W^{-1} D_\Gamma^+ D b^*\|_1,$$

which gives

$$\gamma \left(\|W^{-1} b^*\|_1 - \|W^{-1} D_\Gamma^+ D b^*\|_1\right) \le \|f - D D_\Gamma^+ D b^*\|^2 - \|f - Db^*\|^2. \qquad (59)$$

First we shall provide a lower bound on the left-hand side of the previous inequality. Let us split the vector $b^*$ into two parts, $b^* = b^*_1 + b^*_2$, where the former contains the components with indexes in $\Gamma$ and the latter the remaining components from $\Omega\setminus\Gamma$. By the triangle inequality this yields

$$\|W^{-1} b^*\|_1 - \|W^{-1} D_\Gamma^+ D b^*\|_1 \ge \|W^{-1} b^*_2\|_1 - \|W_\Gamma^{-1} D_\Gamma^+ D W_{\bar\Gamma} W_{\bar\Gamma}^{-1} b^*_2\|_1.$$

Since

$$\|W_\Gamma^{-1} D_\Gamma^+ D W_{\bar\Gamma} W_{\bar\Gamma}^{-1} b^*_2\|_1 \le \sup_{i\notin\Gamma} \|(D_\Gamma W_\Gamma)^+ g_i w_i\|_1\; \|W^{-1} b^*_2\|_1,$$

using (49) one can write that

$$\|W^{-1} b^*_2\|_1 - \|W^{-1} D_\Gamma^+ D b^*_2\|_1 \ge (1 - \mathrm{WRF}(\Gamma))\, \|W^{-1} b^*_2\|_1. \qquad (60)$$

We now provide an upper bound for the right-hand side of (59). This quantity does not depend on the weighting matrix; thus, exactly as in [3], it can be stated that

$$\|f - D D_\Gamma^+ D b^*\|^2 - \|f - Db^*\|^2 \le \|b^*_2\|_1\, \|D^T (f - a_\Gamma)\|_\infty. \qquad (61)$$

From (59), (60) and (61) it turns out that:

$$\gamma\,(1 - \mathrm{WRF}(\Gamma))\, \|W^{-1} b^*_2\|_1 \le \|b^*_2\|_1\, \|D^T (f - a_\Gamma)\|_\infty. \qquad (62)$$

Since the weights are in $(0, 1]$ and the vector $b^*_2$, by assumption, cannot be null, it can be written:

$$\gamma\,(1 - \mathrm{WRF}(\Gamma)) \le \frac{\|b^*_2\|_1}{\|W^{-1} b^*_2\|_1}\, \|D^T (f - a_\Gamma)\|_\infty \le \bar w_{max}\, \|D^T (f - a_\Gamma)\|_\infty. \qquad (63)$$

If (57) is valid, then (63) fails, and so one must discard the hypothesis that $b^*$ is non-zero on an index in $\bar\Gamma = \Omega\setminus\Gamma$.

We now focus on finding a necessary and sufficient condition for the existence and uniqueness of a minimizer of $P^w_1$. The presence of the $\ell_1$ norm implies that the cost function of this problem is non-smooth at zero; for this reason, the concept of the subdifferential is used. Given a real vector variable $x$, the subdifferential of $\|x\|_1$ is denoted by $\partial\|x\|_1$ and defined as:

$$\partial\|x\|_1 \triangleq \{u : u^T x = \|x\|_1,\; \|u\|_\infty \le 1\}.$$

The vectors $u$ that compose the subdifferential are called subgradients [23].

Lemma 4: A necessary and sufficient condition for $b^*$ to globally minimize the objective function of $P^w_1$ over all the coefficient vectors with support $\Gamma$ is that

$$c_\Gamma - b^* = \gamma\,(D_\Gamma^T D_\Gamma)^{-1}\, W_\Gamma^{-1}\, u, \qquad (64)$$

where $u$ is a vector from $\partial\|W_\Gamma^{-1} b^*\|_1$. Moreover, the minimizer is unique.

Proof: One can observe that solving $P^w_1$ is equivalent to minimizing the following function over coefficient vectors from $\mathbb{R}^{|\Gamma|}$:

$$F(b) = \|a_\Gamma - D_\Gamma b\|^2 + \gamma\, \|W_\Gamma^{-1} b\|_1.$$

A point $b$ minimizes the second term of $F(b)$ if and only if the following Fermat criterion holds (see [23], [24]): $0 \in \partial\|W_\Gamma^{-1} b\|_1$. Moreover, $b^*$ minimizes $F(b)$ if and only if $0 \in \partial F(b^*)$. In our case this means that

$$\exists\, u \in \partial\|W_\Gamma^{-1} b^*\|_1 \quad \text{s.t.} \quad D_\Gamma^T D_\Gamma b^* - D_\Gamma^T a_\Gamma + \gamma\, W_\Gamma^{-1} u = 0, \qquad (65)$$

for some vector $u$ taken from $\partial\|W_\Gamma^{-1} b^*\|_1$. Let the atoms in $\Gamma$ be linearly independent; from (65) it follows that

$$b^* - (D_\Gamma^T D_\Gamma)^{-1} D_\Gamma^T a_\Gamma + \gamma\,(D_\Gamma^T D_\Gamma)^{-1} W_\Gamma^{-1} u = 0,$$

and so

$$D_\Gamma^+ a_\Gamma - b^* = \gamma\,(D_\Gamma^T D_\Gamma)^{-1} W_\Gamma^{-1} u.$$

To conclude the proof it is sufficient to recall that $c_\Gamma = D_\Gamma^+ a_\Gamma$. If $W = I$, this result coincides with the one developed by Fuchs in [13] and by Tropp in [3] for the complex case.

Lemma 5: Suppose that $b^*$ minimizes the cost function of problem $P^w_1$. Then the following bound holds:

$$\|c_\Gamma - b^*\|_\infty \le \frac{\gamma}{w_{min}}\, \|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty}, \qquad (66)$$

where $w_{min}$ is defined as

$$w_{min} \triangleq \inf_{i\in\Gamma} w_i. \qquad (67)$$

Proof: Let us consider the necessary and sufficient condition of Lemma 4. Taking the $\ell_\infty$ norm of (64), we obtain:

$$\|c_\Gamma - b^*\|_\infty = \gamma\, \|(D_\Gamma^T D_\Gamma)^{-1} W_\Gamma^{-1} u\|_\infty \le \gamma\, \|(D_\Gamma^T D_\Gamma)^{-1} W_\Gamma^{-1}\|_{\infty,\infty}\, \|u\|_\infty.$$

By definition of the subdifferential, $\|u\|_\infty \le 1$. Inserting this into the previous equation and using the sub-multiplicative property of matrix norms ($\|AB\|_{p,q} \le \|A\|_{p,q}\,\|B\|_{p,q}$), we can prove that

$$\|c_\Gamma - b^*\|_\infty \le \gamma\, \|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty}\, \|W_\Gamma^{-1}\|_{\infty,\infty}.$$

Just apply the fact that $\|W_\Gamma^{-1}\|_{\infty,\infty} = \sup_{i\in\Gamma}(1/w_i) = 1/w_{min}$ to reach the result.

The following proposition states a result that will be used in the next subsection to prove Theorem 8. Note that here $\Gamma$ is the optimal index subset of $\Omega$, and thus $c_{opt}$ is the sparsest solution to the subset selection problem.

Proposition 2 (Tropp [3]): Given an input signal $f$ and a threshold $\tau$, suppose that the coefficient vector $c_{opt}$, having support $\Gamma$ of cardinality $m$, is the sparsest solution of the problem $P_0$. Set $f_{opt} = Dc_{opt}$. Then:
1) for every $k \in \Gamma$, $|c_{opt}(k)| \ge \tau$;
2) for every $i \notin \Gamma$, $|\langle f - f_{opt}, g_i\rangle| < \tau$.

The preliminary statements of this subsection will allow us to obtain the main results in the remainder of the report.

C. Weighted Relaxed Subset Selection

Let us now study the relationship between the results obtained by solving problems $P^w_1$ and $P_0$. Suppose that $c_{opt}$ is the sparsest solution to $P_0$ and that its support is $\Gamma_{opt}$, with $|\Gamma_{opt}| = m$. $D_\Gamma$ is the matrix containing all the atoms participating in the sparsest approximation of $f$, and $f_{opt}$ the approximant given by $c_{opt}$, i.e. $f_{opt} = Dc_{opt} = D_\Gamma D_\Gamma^+ f$. Assuming $\mathrm{WRF}(\Gamma) < 1$, we have the following result.

Theorem 8: Suppose that $b^*$ minimizes the cost function of problem $P^w_1$ with threshold

$$\gamma = \frac{\tau\, \bar w_{max}}{1 - \mathrm{WRF}(\Gamma)},$$

where $\bar w_{max}$ is defined in (13). Then:
1) WBPDN never selects a non-optimal atom, since $\mathrm{support}(b^*) \subset \Gamma_{opt}$.
2) The solution of WBPDN is unique.
3) The following upper bound is valid:

$$\|c_{opt} - b^*\|_\infty \le \frac{\tau\, \bar w_{max}}{w_{min}}\, \frac{\|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty}}{1 - \mathrm{WRF}(\Gamma)}. \qquad (68)$$

4) The support of $b^*$ contains every index $j$ for which

$$|c_{opt}(j)| > \frac{\tau\, \bar w_{max}}{w_{min}}\, \frac{\|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty}}{1 - \mathrm{WRF}(\Gamma)}. \qquad (69)$$

The scalar $w_{min}$ appearing in Eqs. (68) and (69) is defined in Eq. (67).

Proof: Concerning the first result, note that every atom indexed by $\Gamma$ has zero inner product with the optimal residual $(f - f_{opt})$, since $f_{opt}$ is the best approximation of $f$ using the atoms in $\Gamma$. Using Proposition 2 and recalling that $\mathcal{D}$ is finite, it can be stated that

$$\|D^T (f - f_{opt})\|_\infty < \tau. \qquad (70)$$

Moreover, Lemma 3 guarantees that, for any $\gamma$ such that

$$\|D^T (f - f_{opt})\|_\infty < \frac{\gamma\,(1 - \mathrm{WRF}(\Gamma))}{\bar w_{max}}, \qquad (71)$$

the solution $b^*$ to the convex problem $P^w_1$ is supported on $\Gamma$. From (70) and (71) it follows that $\mathrm{support}(b^*) \subset \Gamma$ is ensured for any $\gamma$ satisfying the condition

$$\gamma \ge \frac{\tau\, \bar w_{max}}{1 - \mathrm{WRF}(\Gamma)}. \qquad (72)$$

In the following, the smallest possible value for $\gamma$ is chosen: in this way, Eq. (72) becomes an equality. The uniqueness of the solution follows from Lemma 4. With regard to the third point, the results from Lemma 5 yield

$$\|c_{opt} - b^*\|_\infty \le \frac{\gamma}{w_{min}}\, \|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty} = \frac{\tau\, \bar w_{max}}{w_{min}}\, \frac{\|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty}}{1 - \mathrm{WRF}(\Gamma)}.$$

This proves point 3. Concerning the fourth result, one can observe that, for every index $j$ for which $|c_{opt}(j)| > (\gamma/w_{min})\, \|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty}$, the corresponding coefficient $b^*(j)$ must be different from zero. As before, substituting Eq. (72) proves the last result of the theorem.

This theorem states two important concepts. First, if the trade-off parameter is correct and the weighted cumulative coherence of the dictionary is small enough, WBPDN is able to select correct atoms for the signal expansion and will not pick wrong ones. Furthermore, the error made by the algorithm on the amplitudes of the coefficients of the selected atoms is bounded. Nevertheless, once the algorithm has recovered the atom subset, the appropriate amplitudes of the coefficients can be computed by the orthogonal projection of the signal onto the space generated by the selected atoms. This method is illustrated in Section IV-E to generate some examples.
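The reprojection just described, which is used in Section IV-E to generate the examples, is a plain least-squares debiasing on the recovered support. A minimal sketch (the threshold value is our placeholder, since the support is defined by hard-thresholding the pursuit output):

```python
import numpy as np

def debias(D, f, b, threshold=1e-2):
    """Keep the support of the (hard-thresholded) solution b and recompute
    the amplitudes by orthogonal projection of f onto the selected atoms."""
    support = np.flatnonzero(np.abs(b) > threshold)
    coef, *_ = np.linalg.lstsq(D[:, support], f, rcond=None)
    b_new = np.zeros_like(b)
    b_new[support] = coef
    return b_new
```

Applied to the outputs of BPDN and WBPDN, this yields the approximants $f_1$ and $f_w$ compared in the example of Section IV-E.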

The quantities $w_{min}$ and $\bar w_{max}$ depend on the reliability and goodness of the a priori. In particular, if $W$ tends to be optimal (i.e. its diagonal entries tend to 1 for the elements that should appear in the sparsest approximation and to 0 for the ones that should not), one can observe that $w_{min} \to 1$ and $\bar w_{max} \to 0$.

D. Relation with the Weighted Cumulative Coherence

In this subsection, the previous results are described using the weighted cumulative coherence function defined in (15). In this way a comparison is made between the results achievable by BPDN and WBPDN.

Theorem 9: Assume that the real vector $b^*$ solves $P^w_1$ with

$$\gamma = \tau\, \bar w_{max}\, \frac{1 - \varepsilon_{max} - \mu_w(m-1)}{1 - \varepsilon_{max} - \mu_w(m-1) - \mu_w(m)}.$$

Then $\mathrm{support}(b^*) \subset \Gamma_{opt}$ and

$$\|b^* - c_{opt}\|_\infty \le \frac{\tau\, \bar w_{max}}{w_{min}}\, \frac{1 - \varepsilon_{max} - \mu_w(m-1)}{(1 - \varepsilon_{max} - \mu_w(m-1) - \mu_w(m))\,(1 - \mu_1(m-1))}. \qquad (73)$$

Proof: This result can be obtained from Proposition 1 and Theorem 8, since:

$$\|b^* - c_{opt}\|_\infty \le \frac{\gamma}{w_{min}}\, \|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty} = \frac{\tau\, \bar w_{max}}{w_{min}}\, \frac{1 - \varepsilon_{max} - \mu_w(m-1)}{1 - \varepsilon_{max} - \mu_w(m-1) - \mu_w(m)}\, \|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty}.$$

The last term is the norm of the inverse Gram matrix. Since (see [3], [13], [2])

$$\|(D_\Gamma^T D_\Gamma)^{-1}\|_{\infty,\infty} \le \frac{1}{1 - \mu_1(m-1)},$$

this proves equation (73).

This result is valid in general and illustrates how the distance between the optimal signal approximation and the solution found by solving $P^w_1$ can be bounded. In case no a priori is given, the bound on the coefficient error is obtained from Eq. (73) by setting $W = I$; consequently $w_{min} = 1$, $\varepsilon_{max} = 0$ and $\bar w_{max} = 1$ (see also [3]):

$$\|b^* - c_{opt}\|_\infty \le \frac{\tau}{1 - \mu_1(m-1) - \mu_1(m)}. \qquad (74)$$

Comparing the two bounds, one can observe how the availability of a reliable prior on the signal can help in finding a sparser signal approximation. This concept is emphasized in the following corollary.

Corollary 3: Let $W(f, \mathcal{D})$ be a reliable a priori knowledge with $\bar w_{max} \le w_{min}$; then for any positive integer $m$ such that $\mu_1(m) + \mu_1(m-1) < 1$ and $\mu_w(m) + \mu_w(m-1) + \varepsilon_{max} < \mu_1(m) + \mu_1(m-1) < 1$, the error $\|b^* - c_{opt}\|_\infty$ given by the coefficients found by WBPDN is smaller than the one obtained by BPDN. Hence, the bound stated by Eq. (73) is lower than the one in Eq. (74), i.e.

$$\frac{\tau\, \bar w_{max}}{w_{min}}\, \frac{1 - \varepsilon_{max} - \mu_w(m-1)}{(1 - \varepsilon_{max} - \mu_w(m-1) - \mu_w(m))\,(1 - \mu_1(m-1))} \le \frac{\tau}{1 - \mu_1(m-1) - \mu_1(m)}.$$

Note the similarity between Corollaries 2 and 3. The proof of the latter corollary is reported in the appendix. Here the hypothesis $\bar w_{max}/w_{min} \le 1$ is made; observe that if the a priori is particularly good, this factor can be much smaller than one, and so the improvement with respect to the general case can be really big.

E. Example

We examine again the example presented in Section III-C, but this time using the Basis Pursuit Denoising and Weighted Basis Pursuit Denoising algorithms. The dictionary used for the decomposition is the one illustrated in Section III-C, as well as the input signal. For an explanation of the prior model and the extraction of the a priori matrix, see Section V. The signal f is first decomposed by solving the minimization problem illustrated in (47) (BPDN). Then, the a priori knowledge is introduced, and we solve the minimization problem illustrated in (56) (WBPDN). Both solutions were numerically found using quadratic programming techniques. The trade-off parameter γ controls the ℓ1 norm of the coefficient vector and, indirectly, its sparseness. The signal representations present many components with negligible values due to the numerical computation: a hard thresholding is performed in order to get rid of this misleading information, paying attention not to eliminate significant elements. In this way it is possible to measure the ℓ0 norm of the vector b. The data reported here refer to a fixed small threshold value. Of course, the reconstructions are computed starting from the thresholded coefficients.

Figure 4 shows the reconstructions of the input signal given by an m-term approximation found by BPDN and WBPDN.

Fig. 4. Comparison of the BPDN- and WBPDN-based approximations with m terms using the footprints dictionary of Sec. III-C. Left: original signal. Center: reconstruction using the BPDN coefficients. Right: reconstruction using the WBPDN coefficients.

Figure 6 (on the left-hand side) illustrates the mean square error of the approximations of f with m terms found by BPDN and WBPDN. Let us call b the approximation found by BPDN and b_w the one found by WBPDN. As just explained, these vectors are thresholded by removing the numerically negligible components; in this way we are able to identify a sparse support and thus a subset of the dictionary. Let us label the subdictionary found by WBPDN with D_w (composed of the atoms corresponding to the non-zero elements of b_w). Once this is given, there are no guarantees that the coefficients that represent f are optimal (see Theorems 8 and 4). These are, thus, recomputed by projecting the signal onto D_w, and a new approximation of f, named b̃_w, is found. Exactly the same is done for BPDN, ending up with a subdictionary D_Λ and a new approximation b̃. Of course, support(b̃) = support(b) and support(b̃_w) = support(b_w). Formally, the approximants found by BPDN and WBPDN are respectively:

    \tilde{f} = D_{\Lambda} D_{\Lambda}^{+} f = D \tilde{b} \quad \text{and} \quad \tilde{f}_{w} = D_{w} D_{w}^{+} f = D \tilde{b}_{w}.    (75)

In synthesis, the pursuit algorithm is used only to select a dictionary subset, and then the coefficients of the approximation are computed again by means of a simple projection; a small sketch of this select-then-reproject step is given below. Figs. 5 and 6 show how this technique considerably improves the results obtained by solving problems P_0 and P_w. Moreover, they confirm the advantages of the weighted algorithm with respect to the non-weighted one.
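A minimal NumPy sketch of this select-then-reproject step follows; the input b can come from any pursuit (for instance a quadratic-programming BPDN/WBPDN solver), and the threshold value is illustrative.

```python
import numpy as np

def select_and_reproject(D, f, b, thr=1e-2):
    # Keep the atoms whose coefficient survives a hard threshold, then
    # recompute their amplitudes by orthogonal projection, as in Eq. (75).
    idx = np.nonzero(np.abs(b) > thr)[0]                  # sparse support
    amp, *_ = np.linalg.lstsq(D[:, idx], f, rcond=None)   # D_sub^+ f
    b_tilde = np.zeros_like(b)
    b_tilde[idx] = amp
    return b_tilde, D[:, idx] @ amp                       # coefficients, approximant
```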
V. Examples: A Natural Signal Approximation with Coherent Dictionaries and an A Priori Model

In the previous sections, the example of approximating a synthetically generated piecewise-smooth signal has been presented. This successfully illustrates how we can exploit prior signal models in order to ameliorate the behavior of sub-optimal algorithms in the retrieval of sparse approximations.

In this section we show that the improvement is not limited to artificially generated signals, and we provide a very simple example where weighted algorithms perform better also with natural signals. Moreover, as we will show, the a priori weights can be automatically extracted from the data and optimized such that the performance of the weighted algorithm in use is maximized.

Continuing with the class of signals analyzed in the previous examples, we would like to approximate a signal that can be considered piecewise-smooth. This kind of signal can be efficiently approximated by using dictionaries of functions that can optimally represent both features of the signal: discontinuities and smooth parts. For this purpose, an overcomplete coherent dictionary given by footprints and wavelets is used, as in all the previous examples of the present work.
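As an indication of how such a union dictionary can be assembled, the sketch below builds normalized piecewise-constant footprints (one step atom per jump position) and joins them with an orthonormal basis; a Haar basis stands in here for the Symlet-4 basis of the report, purely to keep the sketch self-contained.

```python
import numpy as np

def haar_basis(n):
    # Orthonormal Haar basis of R^n (n a power of two); placeholder for Symlet-4.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        top = np.kron(H, [1.0, 1.0])
        bot = np.kron(np.eye(H.shape[0]), [1.0, -1.0])
        H = np.vstack([top, bot]) / np.sqrt(2.0)
    return H.T                                   # columns are the atoms

def footprint_atoms(n):
    # Piecewise-constant footprints: one normalized step per jump position.
    F = np.tril(np.ones((n, n - 1)), k=-1)       # column j jumps at sample j+1
    return F / np.linalg.norm(F, axis=0)

n = 256
D = np.hstack([haar_basis(n), footprint_atoms(n)])   # overcomplete union, n x (2n-1)
print(D.shape)
```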

Fig. 5. The original signal reconstructed from an m-term approximation computed by BPDN (left) and WBPDN (right). The comparison shows the improvement given by recomputing the projections once the algorithm has selected a subdictionary. For the errors, see Figure 6.

Fig. 6. Errors (in log scale) of the m-term approximations of f with BPDN and WBPDN. In the figure on the right, the approximations are computed by projecting the signal onto the subdictionary selected by the algorithm (see Eq. (75)).

A. A Noisy Piecewise-smooth Natural Signal

As often suggested, images may be represented by a piecewise-smooth component and a texture component [4], [5]. A way to get a natural 1D piecewise-smooth signal is to extract a column or a row from a natural image. In particular, we study two 1D signals extracted from the image cameraman, shown in Fig. 7: one corresponds to the 4th column, and the second to the 8th row.

Fig. 7. Cameraman picture used to extract the examples of real-world 1D signals with piecewise-smooth characteristics.

B. Modeling the Relation Signal-Dictionary

The dictionary in use is composed of the union of the Symlet-4 orthonormal basis and the set of piecewise-constant footprints (see Sec. III-C). Since the input signals consist of 256 samples, the dictionary is expressed by a matrix with 256 rows, one column per atom of the two sets. The modeling of the interaction between the signal and the dictionary is performed using the simple approach described previously. The weighting matrix W(f, D) is generated by means of a pre-estimation of the locations where footprints are likely to be used, together with the assumption that in such locations wavelets have less probability of being required. This discrimination is reflected in the diagonal matrix W(f, D) in the following way: locations where a footprint is likely to be placed are not penalized (thus the weighting factor remains 1). On the contrary, wavelets that overlap the footprint, and footprints considered unlikely to be used, get a penalizing factor β ∈ (0, 1]. To be more explicit, the edge-detection-based model is recalled:

Algorithm 1: W(f, D) estimation
Require: D = D_Symlet ∪ D_Footprints; define a threshold λ; define a penalty factor β.
1: f_diff = D_Footprints^+ f   {footprints location estimation (edge detection)}
2: Threshold f_diff by λ, setting greater values to 1 and the others to β.
3: W_footprints^diag = f_diff   {diagonal of the sub-matrix of W(f, D) corresponding to footprints}
4: Create W_wave^diag such that all wavelets intersecting the found footprints locations equal β, and set the others to 1.
5: W(f, D) = diag([W_wave^diag ; W_footprints^diag])

As one can observe, two parameters configure the model that generates W(f, D): a threshold λ and a penalty weight β. In practice, as shown later in this section, these can be selected by an optimization procedure such that the energy of the approximation error is minimized. A sketch of this estimation step is given below.
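A possible NumPy rendering of Algorithm 1 is sketched here. Two details are our assumptions, not the report's: the discontinuity of a footprint atom is located where the absolute difference of the atom is maximal, and the diagonal of W follows the order [wavelets | footprints].

```python
import numpy as np

def estimate_W(D_wave, D_fp, f, lam, beta):
    # Step 1: edge detection via the pseudo-inverse of the footprint subdictionary.
    f_diff = np.linalg.pinv(D_fp) @ f
    # Steps 2-3: footprint weights (1 where a footprint is likely, beta otherwise).
    w_fp = np.where(np.abs(f_diff) > lam, 1.0, beta)
    # Step 4: penalize wavelets whose support intersects a detected footprint.
    edges = [int(np.argmax(np.abs(np.diff(D_fp[:, j]))))
             for j in np.nonzero(w_fp == 1.0)[0]]
    w_wave = np.ones(D_wave.shape[1])
    for k in range(D_wave.shape[1]):
        support = np.nonzero(np.abs(D_wave[:, k]) > 1e-12)[0]
        if any(e in support for e in edges):
            w_wave[k] = beta
    # Step 5: assemble the diagonal weighting matrix W(f, D).
    return np.diag(np.concatenate([w_wave, w_fp]))
```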
C. Signal Approximation

An additional phase is inserted before the approximation. It consists of an estimation of the main features of the particular signal to approximate and of their relation with the dictionary in use. We thus summarize the general procedure in two steps:
1) Estimation of the a priori information from the real-world signal using a prior model.
2) Use of a weighted algorithm (greedy or relaxed), based on the estimated a priori knowledge, to find the appropriate atom subset to approximate the real-world signal.

Furthermore, an iterative version of this two-phase algorithm can be considered in order to optimize the parameters that configure the prior model used in the first step: the Expectation Maximization (EM) algorithm. A first approach to the parameter tuning can be a grid search, or a multi-scale grid search. More sophisticated search techniques could be used; an overview can be found in [6].

D. Results

The results obtained with the framework introduced above are illustrated in the following. First, we show the quantitative impact of using Weighted-MP/OMP and WBPDN in terms of the residual error energy. Right after, the use of the atoms of the dictionary to represent the main features of the signal is analyzed. Finally, we explore the influence of appropriately tuning the two parameters that configure our penalty model.

1) Approximation Results with OMP: Two approximation examples, where OMP and Weighted-OMP are used, are presented. Following the two-step procedure described above, we look for approximants of the signals appearing at the left of Figs. 8 and 9. The former corresponds to the 4th column of the cameraman picture and the latter to the 8th row.

The improvement in performance of Weighted-OMP in the case of sparse approximations is assessed by the rate of convergence of the residual energy. On the right-hand side of Figs. 8 and 9, the graphs show that after a certain number of iterations Weighted-MP selects better atoms than classic Weak-MP. Hence the convergence of the error improves, and this yields a gain of up to a few dB in the first example and a smaller one in the second (depending on the iteration). A sketch of the weighted selection rule is given below.
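The weighted greedy step can be sketched as follows; we assume the selection rule simply scales each correlation by the corresponding prior weight (the precise Weighted-MP rule is the one defined earlier in the report), and setting w to all ones recovers plain OMP.

```python
import numpy as np

def weighted_omp(D, f, w, n_iter):
    # At each step pick the atom maximizing the prior-weighted correlation
    # w_j * |<r, d_j>|, then re-project f onto all atoms selected so far.
    r, idx = f.copy(), []
    for _ in range(n_iter):
        j = int(np.argmax(w * np.abs(D.T @ r)))
        if j not in idx:
            idx.append(j)
        amp, *_ = np.linalg.lstsq(D[:, idx], f, rcond=None)
        r = f - D[:, idx] @ amp
    b = np.zeros(D.shape[1])
    b[idx] = amp
    return b, r
```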

2) Approximation Results with BPDN: The 1D signal extracted from the 4th column of cameraman, illustrated on the left side of Fig. 8, is approximated by using BPDN and WBPDN. As explained in Section IV-E, the pursuit algorithm is used only to select a dictionary subset; then the coefficients of the approximation are computed again by means of a simple projection. Fig. 10 shows the decay of the energy of the error as the number of selected atoms increases. It is clear how the use of the a priori helps the algorithm in finding a better approximation of the signal. The results concerning WBPDN are obtained by adopting a weighting matrix that corresponds to λ = 9 and a fixed penalty weight β. Notice that these values are not optimal for all the numbers of non-zero coefficients. Better results can be achieved by tuning β and γ appropriately for any desired m.

Fig. 8. Experiment of approximating the 1D signal extracted from the 4th column of cameraman. Left: the 1D signal used in the experiment. Right: the rate of convergence of the residual error; in red, the OMP result, and in blue, the Weighted-OMP (OMP with a priori) result.

Fig. 9. Experiment of approximating the 1D signal extracted from the 8th row of the 256x256 cameraman picture of Fig. 7. Left: the 1D signal used in the experiment. Right: the rate of convergence of the residual error; in red, the OMP result, and in blue, the Weighted-OMP (OMP with a priori) result.

Fig. 10. Error (in dB) obtained by BPDN and WBPDN when approximating the 1D signal extracted from the 4th column of the image cameraman (see Fig. 7) using different numbers of atoms. Both results are obtained by using quadratic programming to select a dictionary subset and then recomputing the coefficients by re-projecting the signal onto the span of the subdictionary. The procedure is illustrated in Section IV-E.

3) Capturing the Piecewise-smooth Component with the Footprints Basis: Here, the results intend to underline the importance of selecting the appropriate atom to represent a particular signal feature. In Fig. 11, the upper row shows the approximants of the original signal from the 4th column (Fig. 8) obtained after 5 iterations of OMP (left) and Weighted-OMP (right). The result which considers the a priori is noticeably better, in residual energy, than the approximant obtained by OMP. At this point, it is important to notice the result depicted in the lower row of Fig. 11. These waveforms represent the signal components that are captured exclusively by the footprint atoms and the Symlet-4 scaling functions. These signal components should correspond to the piecewise-smooth parts of the signal. However, as depicted by the lower row of Fig. 11, in the case of OMP (bottom left) the piecewise-smooth component captured by footprints and low-pass functions is far from what one could expect. Intuitively, one can understand that the OMP algorithm is failing in the selection of atoms. On the other hand, the result obtained by Weighted-OMP (bottom right) clearly shows that footprints and Symlet-4 scaling functions capture a much more accurate approximant of the piecewise-smooth component of the signal. Hence, a better approximation is achieved by using the a priori information. This leads to a sparser approximation too.

Fig. 11. Upper left: approximation after 5 iterations with OMP, without using a priori information. Upper right: approximation after 5 iterations with Weighted-OMP. Bottom left: signal components captured by the Symlet-4 scaling functions and the footprints when simple OMP is used. Bottom right: signal components captured by the Symlet-4 scaling functions and the footprints using Weighted-OMP.

4) Parameter Search: Finally, we show the influence of the parameters λ and β on the average quadratic error of the residues obtained by Weighted-MP, i.e.

    E\{ \| r_{n} \|^{2} \mid \lambda_{0}, \beta_{0} \} = \frac{1}{N} \sum_{n=1}^{N} \| r_{n} \|^{2}, \quad \text{s.t. } r_{n} \text{ has been obtained fixing } \lambda = \lambda_{0} \text{ and } \beta = \beta_{0}.    (76)

In Figs. 12 and 13 the magnitude of Eq. (76) is shown as a function of λ (model threshold) and β (penalty weight) for the two natural examples exposed in this work. Fig. 12 corresponds to the case of the 4th column of cameraman, and Fig. 13 concerns the 8th row. In the figures, a mapping of the meaning of the colors is available. The lower the value of E{||r_n||² | λ_0, β_0}, the more probable it is that the associated model parameters are the good ones. In this case, it can easily be observed how the optimal configuration of parameters concentrates, in both cases, in a unique global optimum. Hence, the set of optimal parameters that fit the data model can easily be found by some iterative procedure; a sketch of the grid evaluation of Eq. (76) is given below.
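The grid evaluation of Eq. (76) can be sketched as follows, reusing the estimate_W routine sketched in Section V-B; the grid ranges and the number of iterations N are illustrative.

```python
import numpy as np

def residual_energies(D, f, w, N):
    # Run N steps of the weighted greedy selection, recording ||r_n||^2.
    r, idx, out = f.copy(), [], []
    for _ in range(N):
        j = int(np.argmax(w * np.abs(D.T @ r)))
        if j not in idx:
            idx.append(j)
        amp, *_ = np.linalg.lstsq(D[:, idx], f, rcond=None)
        r = f - D[:, idx] @ amp
        out.append(float(r @ r))
    return out

def expectation_map(D_wave, D_fp, f, lams, betas, N=50):
    # E{||r_n||^2 | lam, beta} of Eq. (76), evaluated on a (lambda, beta) grid.
    D = np.hstack([D_wave, D_fp])
    E = np.zeros((len(lams), len(betas)))
    for i, lam in enumerate(lams):
        for j, beta in enumerate(betas):
            w = np.diag(estimate_W(D_wave, D_fp, f, lam, beta))
            E[i, j] = np.mean(residual_energies(D, f, w, N))
    return E   # the argmin over the grid locates the global optimum of Figs. 12-13
```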

Fig. 12. Representation of the expectation map depending on the parameters that configure the a priori model (model threshold λ versus probability weight β, from less probable to more probable) in the experiment set up in Fig. 8. The expectation corresponds to the energy of the residual error.

Fig. 13. Representation of the expectation map depending on the parameters that configure the a priori model in the experiment set up in Fig. 9. The expectation corresponds to the energy of the residual error.

VI. Conclusions

This work presents theoretical results on the performance of including a priori knowledge in algorithms for signal approximation like Weak-MP or BPDN. We introduce weighted variants called Weighted-MP/OMP and WBPDN. Theoretical results show that these algorithms may supply much better results than classic approaches for highly nonlinear signal approximations, provided that sufficiently reliable prior models are used. This reliability is theoretically represented in our results by ε_max, while the discriminative ability of the a priori is given by w_max and w_min. The fact that these quantities are normally unavailable in practice prevents us from using the sufficient conditions numerically, unlike in the case where no prior is used. Nevertheless, the results found are able to explain how weighted Greedy and BPDN algorithms behave compared to non-weighted ones. A field to explore may be to try to determine some bounds on these quantities, depending on the class of signals to approximate and on the practical estimators (those that generate the a priori weights) in use.

Practical examples concerning synthetic and natural signals have been presented, where we used a dictionary with high internal cumulative coherence. Theoretically, with such a dictionary, greedy or relaxation algorithms are not guaranteed to recover the set of atoms of the best m-term approximation: sufficient conditions for the recovery of all the correct atoms of the best m-term approximation are far from being satisfied.