On the Use of A Priori Information for Sparse Signal Approximations


ITS TECHNICAL REPORT NO. 3/4

On the Use of A Priori Information for Sparse Signal Approximations

Oscar Divorra Escoda, Lorenzo Granai and Pierre Vandergheynst
Signal Processing Institute (ITS), Ecole Polytechnique Fédérale de Lausanne (EPFL)
LTS-ITS-STI-EPFL, 1015 Lausanne, Switzerland

Abstract — This report extends to the case of sparse approximations our previous study [1] on the effects of introducing a priori knowledge into the recovery of sparse representations when overcomplete dictionaries are used. Greedy algorithms and Basis Pursuit Denoising are considered in this work. Theoretical results show how the use of reliable a priori information (which in this work appears under the form of weights) can improve the performance of these methods. In particular, we generalize the sufficient conditions established by Tropp [2], [3] and by Gribonval and Vandergheynst [4], which guarantee the retrieval of the sparsest solution, to the case where a priori information is used. We prove how the use of prior models at the signal decomposition stage influences these sufficient conditions. The results found in this work reduce to the classical ones of [4] and [3] when no a priori information about the signal is available. Finally, examples validate and illustrate the theoretical results.

Index Terms — Sparse Approximations, Sparse Representations, Basis Pursuit Denoising, Matching Pursuit, Relaxation Algorithms, Greedy Algorithms, A Priori Knowledge, Redundant Dictionaries, Weighted Basis Pursuit Denoising, Weighted Matching Pursuit.

Contents

I Introduction
II Recovery of General Signals: Sparse Approximations
  II-A Greedy Algorithms: Weak-MP
    II-A.1 Robustness
    II-A.2 Rate of Convergence
  II-B Convex Relaxation of the Subset Selection Problem
III Including A Priori Information on Greedy Algorithms
  III-A Influence on Sparse Approximations
  III-B Rate of Convergence of Weighted-MP/OMP on Sparse Approximations
  III-C Example: Use of Footprints for ɛ-sparse Approximations
IV Approximations with Weighted Basis Pursuit Denoising
  IV-A A Bayesian Approach to Weighted Basis Pursuit Denoising
  IV-B Preliminary Propositions
  IV-C Weighted Relaxed Subset Selection
  IV-D Relation with the Weighted Cumulative Coherence
  IV-E Example
V Examples: A Natural Signal Approximation with Coherent Dictionaries and an A Priori Model
  V-A A Noisy Piecewise-smooth Natural Signal
  V-B Modeling the Relation Signal-Dictionary
  V-C Signal Approximation
  V-D Results
    V-D.1 Approximation Results with OMP

Web page:
The work of Oscar Divorra Escoda is partly sponsored by the IM2 NCCR. The work of Lorenzo Granai is supported by the SNF grant -669./

    V-D.2 Approximation Results with BPDN
    V-D.3 Capturing the Piecewise-smooth Component with Footprints Basis
    V-D.4 Parameter Search
VI Conclusions
References

I. Introduction

In many applications, such as compression, denoising or source separation, one seeks an efficient representation or approximation of a signal by means of a linear expansion into a possibly overcomplete family of functions. In this setting, efficiency is often characterized by the sparseness of the associated series of coefficients. The criterion of sparseness has been studied for a long time and in the last few years has become popular in the signal processing community [5], [6], [2], [3]. Natural signals are very unlikely to be exact sparse superpositions of vectors. In fact, the set of such signals is of measure zero in C^N [2]. We thus extend our previous work on sparse exact representations [1] to the more useful case of sparse approximations:

    min_c ||f − Dc||₂  s.t.  ||c||₀ ≤ m.    (1)

In general, the problem of recovering the sparsest signal approximation (or representation) over a redundant dictionary is NP-hard. However, this does not impair the possibility of finding it efficiently when particular classes of dictionaries are used [7]. As demonstrated in [2], [3], [6], [4], in order to ensure the good behavior of algorithms like General Weak(α) Matching Pursuit (Weak-MP) and Basis Pursuit Denoising (BPDN), dictionaries need to be incoherent enough. Under this main hypothesis, sufficient conditions have been stated so that both methods are able to recover the atoms of the sparsest m-term expansion of a signal. However, experience and intuition dictate that good dictionaries for sparse approximations of natural signals can be very dense and, depending on the kind of signal structures to exploit, they may in most cases be highly coherent. For example, consider a natural image from which an efficient approximation of edges is desired.
Several approaches have been proposed where the functions in use have a very strong geometrical meaning [8], [9], [10], [11]. Indeed, they represent local orientations of edges. In order to accurately represent all orientations, the set of functions in use needs to finely sample the direction parameter. This implies that the atoms of the dictionary may have a strong correlation. Moreover, if further families of functions are included in the dictionary in order to model other components, like textures or smooth areas, the coherence of the whole set of functions can become even higher. As a further motivation, one can also observe that the set of visual primitives obtained by Olshausen and Field while studying the spatial receptive fields of simple cells in the mammalian striate cortex [12] is redundant and has a high coherence. Concerning the case of exact sparse signal representations, we introduced in [1] a way of using more coherent dictionaries with Weak-MP and Basis Pursuit (BP), while keeping the possibility of recovering the optimal solution. Two methods were proposed, based on the use of a priori information about the signal to decompose: we called them Weighted-MP and Weighted-BP. In this report we address the case of sparse signal approximations, discussing the potential of using a priori knowledge in the atom selection procedure. We do not face here the issue of how to find reliable and useful a priori knowledge about a signal. This problem strongly depends on the nature of the signal and on the kind of dictionary used. The aim of this paper is the theoretical study of the weighted algorithms in the perspective of achieving sparseness. The main results are: The definition of the Weighted-BPDN and Weighted-MP/OMP algorithms for approximation purposes. We reformulate classic BPDN and Weak-MP in order to take a priori information into account when decomposing the signal. A sufficient condition under which Weighted Basis Pursuit Denoising and Weighted-MP/OMP find the best m-term signal approximation.
A study of how adapting the decomposition algorithm depending on a priori information may help in the recovery of sparse approximations. An analysis of the effects of adding the a priori weights on the rate of convergence of Weak-MP. An empirical analysis, on natural signals, of the effect of using prior models at the decomposition stage when coherent overcomplete dictionaries are used.

II. Recovery of General Signals: Sparse Approximations

Exact sparse representations are mostly useless in practice, since the set of such signals is of measure zero in C^N [2]. There are very few cases where, for a given signal, one can hope to find an m-sparse representation. Sparse approximations, on the other hand, have found numerous applications, e.g. in compression, restoration, denoising or source separation.
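For reference, the greedy baseline analyzed in the following subsections can be sketched in a few lines. This is an illustrative Orthogonal Matching Pursuit implementation (the function name and the toy dictionary are ours, not part of the report); the theoretical results below bound when such a loop provably picks atoms of the best m-term approximant.

```python
import numpy as np

def omp(f, D, n_iter):
    """Orthogonal Matching Pursuit sketch: pick the atom most correlated
    with the residual, then recompute the orthogonal projection of f onto
    the span of all selected atoms.  D must have unit-norm columns."""
    support, r, c = [], f.copy(), None
    for _ in range(n_iter):
        k = int(np.argmax(np.abs(D.T @ r)))         # atom selection step
        if k not in support:
            support.append(k)
        Ds = D[:, support]
        c, *_ = np.linalg.lstsq(Ds, f, rcond=None)  # best coefficients on support
        r = f - Ds @ c                              # updated residual
    return support, c, r

# Toy run on an orthonormal dictionary (QR of a random matrix): a signal
# made of two atoms is recovered exactly in two iterations.
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((64, 64)))
f = 2.0 * D[:, 3] - 1.5 * D[:, 40]
support, c, r = omp(f, D, 2)
```

With an orthonormal dictionary recovery is trivial; the whole point of the report is what happens when D is redundant and coherent, where this loop can fail.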

DIVORRA, GRANAI AND VANDERGHEYNST

The use of overcomplete dictionaries, although in a sense advantageous, has the drawback that finding a solution to (1) may be rather difficult or even practically infeasible. The suboptimal algorithms that are used instead (like General Weak Matching Pursuit algorithms, Weak-MP [4], or l1-norm relaxation algorithms, BPDN [3]) do not necessarily supply the same solution as the problem formulated in (1). However, there exist particular situations in which they succeed in recovering the correct solution. Very important results have been found for the case where incoherent dictionaries are used to find sparse approximations of signals. Indeed, sufficient conditions have been found such that, for a given dictionary, Weak-MP and BPDN algorithms can be guaranteed to find the set of atoms that form the sparsest m-term approximant f_m of a signal f. Moreover, in the case where Orthogonal Matching Pursuit (OMP) is used, the set of coefficients found by the algorithm will also correspond to the optimal one. Prior to reviewing the results that state the sufficient conditions introduced above, let us define and recall a series of elements that will be used in the remainder of the paper. f ∈ H is the function to be approximated, where H is a Hilbert space; unless otherwise stated, in this report we assume f ∈ R^n. D and D denote respectively the set of atoms included in the dictionary and the dictionary synthesis matrix, where each one of the columns corresponds to an atom (D = {g_i : i ∈ Ω}). f_m is the best m-term approximant of f, i.e. f_m = D c_m, where the support of c_m is smaller than or equal to m, a positive integer. Given n ≥ 0, r^n and f^n are the residual and approximant generated by a greedy algorithm at its nth iteration. Λ_m is the optimal set of atoms that generate f_m; often, in the text, this will be referred to as Λ for simplicity. The best approximation of f over the atoms in Λ is called a_m = D_Λ D_Λ⁺ f. α ∈ (0, 1] is the weakness factor associated to the atom selection procedure of Weak-MP algorithms [4].
µ₁(m, D) is a measure of the internal cumulative coherence of D [2]:

    µ₁(m, D) ≜ max_{|Λ|=m} max_{i∈Ω\Λ} Σ_{λ∈Λ} |⟨g_i, g_λ⟩|,    (2)

where Λ ⊂ Ω has size m. Remark that the measure known as the coherence of a dictionary (µ), often used to characterize redundant dictionaries, corresponds to the particular case µ = µ₁(1, D). Furthermore, µ₁(m, D) ≤ m µ. Let η (η ≥ 0) be a suboptimality factor associated to the case where the best m-term approximation cannot be reached by the algorithm in use. In such a case, the residual error energy after approximation is (1 + η)||r_m||² instead of ||r_m||².

A. Greedy Algorithms: Weak-MP

Gribonval and Vandergheynst extended the results Tropp found for the particular case of OMP to general Weak-MP. Akin to the case of signal representation, the main results consist in sufficient conditions that guarantee that Weak-MP will recover the optimal set of atoms that generate the best m-term approximant f_m. Moreover, a further result establishes an upper bound on the decay of the residual energy in the approximation of a signal, which depends on the internal coherence of D, as well as a bound on how many correct iterations can be performed by the greedy algorithm depending on the dictionary and the energy of f.

1) Robustness: The sufficient conditions found in [4] that ensure that Weak-MP recovers the set of atoms that compose the best m-term approximant are enounced in Theorem 1. First of all, it is necessary that the optimal set Λ satisfies the Stability Condition [4]. If in addition some conditions are satisfied concerning the remaining residual energy at the nth iteration (r^n) and the optimal residual energy r_m, then an additional atom belonging to Λ will be recovered. This condition, originally called the General Recovery Condition in [2], was named, for the case of general Weak-MP, the Robustness Condition in [4].

Theorem 1: (Gribonval & Vandergheynst [4]) Let {r^n}_{n≥0} be a sequence of residuals computed by General MP to approximate some f ∈ H.
For any integer m such that µ₁(m) + µ₁(m−1) < 1, let f_m = Σ_{γ∈Λ} c_γ g_γ be a best m-term approximation to f, and let N = N(f) be the smallest integer such that

    ||r^N||² ≤ ||r_m||² (1 + m(1 − µ₁(m)) / (1 − µ₁(m) − µ₁(m−1))²).    (3)

Then, for n < N, General MP picks up a correct atom. If no best m-term approximant exists, the same results are valid provided that ||r_m||² = ||f − f_m||² is replaced with (1 + η)||r_m||² in (3).
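The cumulative coherence (Babel function) of Eq. (2) that governs Theorem 1 is directly computable from the Gram matrix of the dictionary: for each atom, sum the m largest absolute correlations with the other atoms and take the worst case. A small sketch (our own helper, for illustration):

```python
import numpy as np

def babel(D, m):
    """Cumulative coherence mu_1(m, D) of a dictionary with unit-norm
    columns: worst-case sum of the m largest absolute inner products
    between one atom and m *other* atoms."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)       # an atom does not count against itself
    G.sort(axis=1)                 # ascending; the m largest are the last m
    return float(G[:, -m:].sum(axis=1).max())

# mu_1(1, D) is the usual coherence mu, and mu_1(m, D) <= m * mu.
D = np.hstack([np.eye(3), np.ones((3, 1)) / np.sqrt(3)])
mu1 = babel(D, 1)   # coherence of this toy dictionary: 1/sqrt(3)
mu2 = babel(D, 2)   # 2/sqrt(3)
```

For this toy dictionary µ₁(2) + µ₁(1) = 3/√3 > 1, so Theorem 1 gives no guarantee: the recovery conditions genuinely restrict how coherent the dictionary may be.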

2) Rate of Convergence: In the following, the main result concerning the exponential decay of the error energy bound, as well as the bound on how many correct iterations can be performed by the greedy algorithm, is reviewed.

Theorem 2: (Gribonval & Vandergheynst [4]) Let {r^n}_{n≥0} be a sequence of residuals computed by General MP to approximate some f ∈ H. For any integer m such that µ₁(m) + µ₁(m−1) < 1, we have, for 0 ≤ l ≤ n,

    ||r^n||² − ||r_m||² ≤ (1 − α²(1 − µ₁(m−1))/m)^{n−l} (||r^l||² − ||r_m||²).    (4)

Moreover, N ≥ 1 and, for m ≥ 2: if 3||r_m||² ≤ ||f||², then

    N < m (1 + m/(1 − µ₁(m−1)) · ln(3||f||²/||r_m||²)),  else N ≤ m.    (5)

B. Convex Relaxation of the Subset Selection Problem

Another instance of problem (1) is given by

    min_c ||f − Dc||² + τ² ||c||₀.    (6)

Unfortunately, the function that has to be minimized is not convex. One can define the p-norm of a vector c for any positive real p:

    ||c||_p = (Σ_{i∈Ω} |c_i|^p)^{1/p}.    (7)

It is well known that the smallest p for which Eq. (7) is convex is 1. For this reason, the convex relaxation of the subset selection problem was introduced in [3] under the name of Basis Pursuit Denoising:

    min_b ||f − Db||² + γ ||b||₁.    (8)

This problem can be solved recurring to Quadratic Programming techniques (see also Section IV). In [3], the author studies the relation between the subset selection problem (6) and its convex relaxation (8). The next theorem shows that any coefficient vector which minimizes Eq. (8) is supported inside the optimal set of indexes.

Theorem 3: (Correlation Condition, Tropp [3]) Suppose that the maximum inner product between the residual signal and any atom satisfies the condition

    ||D^T (f − a_m)||_∞ < γ (1 − sup_{i∉Λ} ||D_Λ⁺ g_i||₁).

Then any coefficient vector b* that minimizes the function (8) must satisfy support(b*) ⊂ Λ. In particular, the following theoretical result shows how the trade-off parameters τ and γ are related.

Theorem 4: (Tropp [3]) Suppose that the coefficient vector b* minimizes the function (8) with threshold γ = τ / (1 − sup_{i∉Λ} ||D_Λ⁺ g_i||₁). Then we have that: 1) the relaxation never selects a non-optimal atom, since support(b*) ⊂ support(c_m); 2) the solution of the convex relaxation is unique.
3) The following upper bound is valid:

    ||c_m − b*||_∞ ≤ τ ||(D_Λ^T D_Λ)^{-1}||_{∞,∞} / (1 − sup_{i∉Λ} ||D_Λ⁺ g_i||₁).    (9)

4) The support of b* contains every index j for which

    |c_m(j)| > τ ||(D_Λ^T D_Λ)^{-1}||_{∞,∞} / (1 − sup_{i∉Λ} ||D_Λ⁺ g_i||₁).    (10)
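Problem (8) need not be attacked with a generic Quadratic Programming solver: any convex method for l1-penalized least squares applies. As a hedged concrete sketch (our choice of method, not the one used in the report), iterative soft-thresholding (ISTA), a proximal-gradient scheme, solves the functional up to a rescaling of γ:

```python
import numpy as np

def ista(f, D, gamma, n_iter=500):
    """Iterative soft-thresholding (ISTA) for
        min_b 0.5 * ||f - D b||_2^2 + gamma * ||b||_1,
    which matches Eq. (8) up to a rescaling of gamma.  Each iteration takes
    a gradient step on the quadratic term followed by the soft-thresholding
    proximal step of the l1 penalty; step size 1/L with L = ||D||_2^2
    guarantees convergence."""
    L = np.linalg.norm(D, 2) ** 2
    b = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = b + D.T @ (f - D @ b) / L                            # gradient step
        b = np.sign(g) * np.maximum(np.abs(g) - gamma / L, 0.0)  # prox step
    return b

# Orthonormal dictionary: the minimizer is soft-thresholding of D^T f,
# so coefficients below gamma vanish and the rest are shrunk by gamma --
# exactly the strengthened behavior described for the orthonormal case.
b = ista(np.array([3.0, 0.5, -2.0, 0.0]), np.eye(4), gamma=1.0)
```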

If the dictionary we are working with is orthonormal, it follows that sup_{i∉Λ} ||D_Λ⁺ g_i||₁ = 0 and ||(D_Λ^T D_Λ)^{-1}||_{∞,∞} = 1, and the previous theorem becomes much stronger. In particular, we obtain that ||c_m − b*||_∞ ≤ τ and that every index with |c_m(j)| > τ is kept [3], [5]. A problem similar to the subset selection is given by the retrieval of a sparse approximation under an error constraint:

    min_c ||c||₀  s.t.  ||f − Dc||₂ ≤ ɛ,    (11)

whose natural convex relaxation is given by

    min_b ||b||₁  s.t.  ||f − Db||₂ ≤ ɛ.    (12)

In this paper we are not going to explore this problem; let us just recall that if the dictionary is incoherent, then the solution to (12) for a given ɛ is at least as sparse as the solution to (11) with a tolerance ɛ′ somewhat smaller than ɛ [3]. From all these results, one can infer that the use of incoherent dictionaries is very important for the good behavior of greedy and l1-norm relaxation algorithms. However, as discussed in the introduction, experience seems to teach us that the overcomplete dictionaries which are likely to be powerful for natural signal approximation are very redundant, with a significant internal coherence. Hence, this inconsistent and contradictory situation claims for a solution to be found. In the following sections, we introduce a general approach that intends to tackle this problem. In our opinion, a more careful analysis and modeling of the signal to approximate is necessary: dictionary waveforms alone are not good enough modeling elements to be exploited at the signal decomposition stage, and further analysis is required to better drive the decomposition algorithm. As in our previous work concerning the case of exact signal representations [1], in this report a priori knowledge that relates the signal f and the dictionary D is considered for signal approximations.

III. Including A Priori Information on Greedy Algorithms

As seen in our previous work on the exact sparse representation of signals [1], the use of a priori information in greedy algorithms may make the difference between recovering the optimal set of components for a given approximation or not. In this section we explore the effect of using a priori knowledge in greedy algorithms on the recovery of the best m-term approximant f_m of a signal f. First, sufficient conditions for the recovery of a correct atom from the sparsest m-term approximant are established for the case where a prioris are taken into account. Later, we study how prior knowledge affects the rate of convergence of greedy algorithms. Finally, a practical example is presented.

A. Influence on Sparse Approximations

An important result concerning sparse approximations is the feasibility of recovering the sparsest m-term approximation f_m. Akin to the statements established for the exact representation case, sufficient conditions have been determined such that, given a Weak-MP and the associated series of atoms g_{γ_n} and residuals r^n (n ≥ 0) up to the nth step, a correct atom at the (n+1)th step can be guaranteed (see Sec. II). In Theorem 5, sufficient conditions are presented for the case when some a priori knowledge is available. The main interest of this result is to show that, if an appropriate a priori (concerning f and D) is in use, a better approximation result can be achieved. First of all, let us expose the elements that take part in the following results. In order to enhance the clarity of the explanation, we first recall the definitions of the main concepts that will be used. Let us consider first the diagonal matrix W(f, D), introduced in [1] to represent the a priori knowledge taken into account in the atom selection procedure.

Definition 1: A weighting matrix W = W(f, D) is a square diagonal matrix of size d × d.
Each of the entries w_i ∈ (0, 1] of the diagonal corresponds to the a priori likelihood of a particular atom g_i ∈ D to be part of the sparsest decomposition of f. We also define w_max as the biggest of the weights corresponding to the subset of atoms belonging to Λ̄ = Ω \ Λ, hence:

    w_max ≜ sup_{γ∈Λ̄} w_γ.    (13)
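In code, the prior is just a vector of per-atom weights (the diagonal of W), and the quantity in Eq. (13) measures how much weight leaks onto atoms outside the optimal set. A hypothetical helper (our illustration; in practice Λ is unknown, which is exactly the point made below about these quantities):

```python
import numpy as np

def w_max(w, opt_support):
    """Largest weight granted to an atom OUTSIDE the optimal set (Eq. (13)).
    w holds the diagonal entries w_i in (0, 1] of the weighting matrix W;
    a reliable prior keeps this value small."""
    w = np.asarray(w, dtype=float)
    outside = np.setdiff1d(np.arange(w.size), np.asarray(opt_support))
    return float(w[outside].max())

# Prior favouring atoms 0 and 1: the bad atoms 2 and 3 are penalized.
wm = w_max([1.0, 0.9, 0.1, 0.2], opt_support=[0, 1])
```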

Moreover, an additional quantity is required in the results depicted below:

    ɛ_max ≜ sup_{γ∈Λ} (1 − w_γ).    (14)

Eqs. (13) and (14) give information about how good the a priori information is. The reader will notice that these quantities depend on the optimal set of atoms, making it impossible to establish a rule to compute them in advance. The role of these magnitudes is to represent the influence of the quality of the prior knowledge on the results obtained below. Notice that 0 ≤ ɛ_max < 1 and 0 < w_max ≤ 1. ɛ_max is close to zero if good atoms (the ones belonging to Λ) are not penalized by the a priori. If the supplied a priori is a good enough model of the relation between the signal and the dictionary in use, we say that the a priori knowledge is reliable. w_max becomes small if bad atoms are strongly penalized by the prior knowledge. As we will see in the following, if the a priori is reliable and w_max is small, then the prior knowledge can have a relevant positive influence on the behavior of the greedy algorithm. The consequence of taking the a priori matrix W into account is to allow a new definition of the Babel function introduced by Tropp [2]. In [1], the fact that not all the atoms of D are equiprobable is taken into account. In effect, the availability of some prior should be considered when judging whether a greedy algorithm is going to be able to recover the m-sparsest approximation of a signal f or not. As seen in Sec. II, the conditions that ensure the recoverability of the best m-term approximant rely on the internal coherence measure of the dictionary, µ₁(m). Using the a priori information, some atom interactions can be penalized or even dismissed in the cumulative coherence measure. Hence, a new signal-dependent cumulative coherence measure was introduced in [1]:

Definition 2: The Weighted Cumulative Coherence function of D is defined as the following data-dependent measure:

    µ₁^w(m, D, f) ≜ max_{|Λ|=m} max_{i∈Ω\Λ} Σ_{λ∈Λ} |⟨g_λ, g_i⟩| w_λ w_i.    (15)
Once all the necessary elements have been defined, we can finally state the result that describes the behavior of greedy algorithms using a priori information for the recovery of m-sparse approximants. As proved later in this section, the use of such knowledge implies an improvement with respect to classic Weak-MP also in the case of signal approximation.

Theorem 5: Let {r^n}, n ≥ 0, be the set of residuals generated by Weighted-MP/OMP in the approximation of a signal f, and let f_m be the best m-term approximant of f over D. Then, for any positive integer m such that, for a reliable a priori information W(f, D),

    µ₁^w(m) + µ₁^w(m−1) < 1 − 2ɛ_max,  η ≥ 0, and

    ||r^n||² > ||f − f_m||² (1+η) (1 + m(1 − µ₁^w(m) − ɛ_max)(w_max)² / (1 − µ₁^w(m) − µ₁^w(m−1) − 2ɛ_max)²),    (16)

Weighted-MP/OMP will recover an atom that belongs to the optimal set Λ spanning the best m-sparse approximant of f. If the best m-term approximant f_m exists and can be reached, then η = 0. This means that if the approximation error at the nth iteration is still bigger than a certain quantity, which depends on the optimal error ||f − f_m||², the internal dictionary cumulative coherence and the reliability of the a priori information, then another term of the best m-term approximant can still be recovered. The use of reliable a priori information makes the bound easier to satisfy than when no prior knowledge is used [4]. Thus, a higher number of terms from the best m-term approximant may be recovered.

Proof: To demonstrate the result of Theorem 5, we follow the steps of the original proofs by Tropp [2] and Gribonval and Vandergheynst [4]. This time, however, a priori knowledge is taken into account. First of all, let us recall the following statements: f_m ∈ span(Λ); r^n = f − f^n; r_m = f − f_m is such that r_m ⊥ (f_m − f^n) for all n, hence ||r^n||² = ||f_m − f^n||² + ||f − f_m||². In order to ensure the recovery of an atom belonging to the optimal set Λ = Λ_m, the following needs to be satisfied:

    ρ_w(r^n) = ||D_Λ̄^T W_Λ̄ r^n||_∞ / ||D_Λ^T W_Λ r^n||_∞ < α,    (17)

where α ∈ (0, 1] is the weakness factor [4]. To establish (16), the previous expression has to be put in terms of f_m and f^n. Since r^n = (f_m − f^n) + (f − f_m) and f − f_m ⊥ span(Λ),

    ρ_w(r^n) ≤ ρ_w(f_m − f^n) + ||D_Λ̄^T W (f − f_m)||_∞ / ||D_Λ^T W (f_m − f^n)||_∞,    (18)

where the first term can be upper bounded since (f_m − f^n) ∈ span(Λ) [2]:

    ρ_w(f_m − f^n) ≤ µ₁^w(m) / (1 − µ₁^w(m−1) − ɛ_max).    (19)

The numerator of the second term of (18) can be upper bounded by the Cauchy–Schwarz inequality:

    ||D_Λ̄^T W (f − f_m)||_∞ = sup_{γ∈Λ̄} |⟨g_γ w_γ, f − f_m⟩| ≤ sup_{γ∈Λ̄} w_γ ||f − f_m|| = w_max ||f − f_m||.    (20)

In order to further upper bound the expression above, the denominator can be lower bounded, as shown in [1]. Indeed, by the singular value decomposition,

    ||D_Λ^T W (f_m − f^n)||_∞ = sup_{γ∈Λ} |⟨g_γ w_γ, f_m − f^n⟩| ≥ sqrt(σ_min^w / m) ||f_m − f^n||,    (21)

where σ_min^w is the minimum of the squared singular values of G = (D_Λ W_Λ)^T (D_Λ W_Λ), and can be bounded as

    σ_min^w ≥ (1 − ɛ_max)² − µ₁^w(m−1).    (22)

Moreover, ||f − f_m||² can be written as ||f − f_m||² = (1+η)||r_m||², where η stands for a suboptimality factor which indicates whether f_m can be reached and, if it cannot (i.e. η > 0), sets the best reachable approximation error. Hence, (20)–(22) can be combined as:

    ||D_Λ̄^T W (f − f_m)||_∞ / ||D_Λ^T W (f_m − f^n)||_∞ ≤ w_max sqrt((1+η) m / ((1−ɛ_max)² − µ₁^w(m−1))) · ||r_m|| / ||f_m − f^n||.    (23)

Thus, from (19) and (23), a sufficient condition for the recovery of a correct atom can be expressed as:

    µ₁^w(m)/(1 − µ₁^w(m−1) − ɛ_max) + w_max sqrt((1+η) m / ((1−ɛ_max)² − µ₁^w(m−1))) · ||r_m|| / ||f_m − f^n|| < α.    (24)

Considering that ||f_m − f^n||² = ||r^n||² − ||r_m||², it easily follows that

    µ₁^w(m)/(1 − µ₁^w(m−1) − ɛ_max) + w_max sqrt((1+η) m / ((1−ɛ_max)² − µ₁^w(m−1))) · ||r_m|| / sqrt(||r^n||² − ||r_m||²) < α.    (25)

Then, we solve for ||r^n||²:

    ||r^n||² > ||r_m||² (1 + (1+η) m (w_max)² / (((1−ɛ_max)² − µ₁^w(m−1)) (α − µ₁^w(m)/(1 − µ₁^w(m−1) − ɛ_max))²)).    (26)

For simplicity, let us consider the case where a full-search atom selection algorithm is available. Thus, replacing α = 1 in (26), and loosening the bound so that the factor (1+η) multiplies the whole right-hand side, proves Theorem 5. The general effect of using prior knowledge can thus be summarized by the following two corollaries.

Corollary 1: Let W(f, D) be a reliable a priori knowledge and assume α = 1; then for any positive integer m such that µ₁(m) + µ₁(m−1) ≥ 1 and µ₁^w(m) + µ₁^w(m−1) < 1 − 2ɛ_max, Weighted-MP/OMP (unlike Weak(α)-MP/OMP) is guaranteed to recover the atoms belonging to the best m-term approximation f_m.

Corollary 2: Let W(f, D) be a reliable a priori knowledge and assume α = 1; then for any positive integer m such that µ₁(m) + µ₁(m−1) < 1 and µ₁^w(m) + µ₁^w(m−1) + 2ɛ_max < µ₁(m) + µ₁(m−1) < 1, Weighted-MP/OMP has a weaker sufficient condition than MP/OMP for the recovery of correct atoms from the best m-term approximant. Hence, the correction factor on the right-hand side of expression (16) is smaller for Weighted-MP/OMP than for the pure greedy algorithm:

    1 + m(1 − µ₁^w(m) − ɛ_max)(w_max)² / (1 − µ₁^w(m) − µ₁^w(m−1) − 2ɛ_max)² ≤ 1 + m(1 − µ₁(m)) / (1 − µ₁(m) − µ₁(m−1))².

Therefore, Weighted-MP/OMP is guaranteed to recover better approximants than classic MP/OMP when reliable and good enough a priori information is in use.

B. Rate of Convergence of Weighted-MP/OMP on Sparse Approximations

The energy of the series of residuals r^n (n ≥ 0) generated by the greedy algorithm progressively converges toward zero as n increases. In the same way, Weighted-MP/OMP with reliable a priori information is expected to have a better behavior and a faster convergence rate than Weak-MP in the approximation case. A more accurate estimate of the dictionary coherence, conditioned on the signal to be analyzed, is available: µ₁^w(m), where µ₁^w(m) ≤ µ₁(m).
Then, a better bound on the rate of convergence can be found for the case of Weighted-MP/OMP. We follow the path suggested in [2] for OMP and in [4] for the case of the general Weak-MP algorithm to prove this. As before, we introduce the consideration of the a priori information into the formulation. The results formally show how Weighted-MP/OMP can outperform Weak-MP when the prior knowledge is reliable enough.

Theorem 6: Let W(f, D) be a reliable a priori information matrix and {r^n}, n ≥ 0, a sequence of residuals produced by Weighted-MP/OMP. Then, as long as r^n satisfies Eq. (16) (Theorem 5), Weighted-MP/OMP picks up a correct atom and

    ||r^n||² − (1+η)||r_m||² ≤ (1 − α² ((1−ɛ_max)² − µ₁^w(m−1)) / m)^{n−l} (||r^l||² − (1+η)||r_m||²),    (27)

where 0 ≤ l ≤ n. This implies that Weighted-MP/OMP, in the same way as Weak-MP, has an exponentially decaying upper bound on the rate of convergence. Moreover, in the case where reliable a priori information is used, the bound appears to be lower than in the case where priors are not used. This result suggests that the convergence of weighted greedy algorithms may be faster than that of classic pure greedy algorithms.

Proof: Let us consider n such that r^n satisfies Eq. (16) of Theorem 5. Then, it is known that for Weak-MP

    ||r^{n+1}||² ≤ ||r^n||² − |⟨r^n, g_{k_n}⟩|²,    (28)

where the inequality applies for OMP, while in the case of MP the equality holds. Moreover, considering the weighted selection,

    ||r^n||² − ||r^{n+1}||² ≥ α² sup_{γ∈Λ} |⟨r^n, g_γ⟩ w_γ|² = α² sup_{γ∈Λ} |⟨f_m − f^n, g_γ⟩ w_γ|²,    (29)

where the last equality follows from the assumption that Eq. (16) of Theorem 5 is satisfied and because (f − f_m) ⊥ span(Λ). And by (21),

    ||r^n||² − ||r^{n+1}||² ≥ α² (σ_min^w / m) ||f_m − f^n||².    (30)

As stated before, ||f_m − f^n||² = ||r^n||² − ||r_m||², hence ||f_m − f^n||² − ||f_m − f^{n+1}||² = ||r^n||² − ||r^{n+1}||², which together with (30) gives:

    ||f_m − f^{n+1}||² ≤ ||f_m − f^n||² (1 − α² σ_min^w / m).    (31)

Finally, by simply considering l ≤ n and applying recursion, it follows that

    ||r^n||² − (1+η)||r_m||² ≤ (1 − α² σ_min^w / m)^{n−l} (||r^l||² − (1+η)||r_m||²),    (32)

and the theorem is proved.

Depending on the sufficient conditions specified in Sec. III-A, the recovery of the optimal set will be guaranteed. However, it is not yet clear for how long a non-orthogonalized greedy algorithm (Weighted-MP in our case) will keep iterating over the optimal set of atoms in the approximation case. Let us define the number of correct iterations as follows:

Definition 3: Consider a Weighted-MP/OMP algorithm used for the approximation of signals. We define the number of provably correct steps N as the smallest positive integer such that

    ||r^N||² ≤ ||f − f_m||² (1+η) (1 + m(1 − µ₁^w(m) − ɛ_max)(w_max)² / (1 − µ₁^w(m) − µ₁^w(m−1) − 2ɛ_max)²),

which corresponds to the number of atoms belonging to the optimal set that it is possible to recover given a signal f, a dictionary and an a priori information matrix W(f, D). In the case of OMP and Weighted-OMP, N will always be smaller than or equal to the cardinality of Λ. For Weak-MP and Weighted-MP, provided that µ₁^w(m) + µ₁^w(m−1) + 2ɛ_max < 1, the provable number of correct iterations will depend on the final error that remains after the best m-term approximation has been found. In the following theorem, some bounds on the quantity N are given for Weighted-MP/OMP. To obtain the results we follow [4]. Before stating the theorem, the reader should note that, from now on, w_max,l denotes the same concept as (13) for an optimal set of atoms of size l, i.e. w_max,l ≜ sup_{γ∈Ω\Λ_l} w_γ.
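The selection rule (17) analyzed above simply biases the correlations by the weights before the maximization. A minimal sketch of one Weighted-MP iteration (our own illustrative code, with a plain MP coefficient update rather than the orthogonal one):

```python
import numpy as np

def weighted_mp_step(r, D, w):
    """One Weighted-MP iteration: maximize |<r, g_i>| * w_i, so that an
    a priori unlikely atom (small w_i) needs a proportionally larger
    correlation with the residual in order to be selected."""
    scores = np.abs(D.T @ r) * np.asarray(w, dtype=float)
    k = int(np.argmax(scores))
    coeff = float(D[:, k] @ r)          # MP coefficient of the chosen atom
    return k, r - coeff * D[:, k]       # selected index, updated residual

# Two orthonormal atoms with comparable correlations: the unweighted rule
# would pick atom 0 (|1.0| > |0.9|); the prior tips the choice to atom 1.
k, r_new = weighted_mp_step(np.array([1.0, 0.9]), np.eye(2), w=[0.5, 1.0])
```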
Theorem 7: Let W(f, D) be a reliable a priori information and {r^n}, n ≥ 0, a sequence of residuals produced by Weighted-MP/OMP when approximating f. Then, for any integer m such that µ₁^w(m) + µ₁^w(m−1) + 2ɛ_max < 1, we have N ≥ 1 and, for m ≥ 2: if 3||r_m||² ≤ ||f||² (1−ɛ_max)²/(w_max,m)², then

    N < m (1 + m/((1−ɛ_max)² − µ₁^w(m−1)) · log(3||f||² (1−ɛ_max)² / (||r_m||² (w_max,m)²))),    (33)

else N ≤ m. From (33) we can conclude that the upper bound on the number of correct steps N is higher in the case where reliable a priori information is used. This implies that a better behavior of the algorithm is possible with respect to [4]. The term (w_max,m)², which depends on the a priori information used by Weighted-MP/OMP (w_max,m ≤ 1, with w_max,m = 1 in the case of classic Weak-MP), helps increase the value of this bound, allowing Weighted-MP to perform a higher number of correct iterations. Indeed, w_max,m represents the capacity of the a priori model to discriminate between good and bad atoms. In order to prove Theorem 7, several intermediate results are necessary.

Lemma 1: Let W(f, D) be a reliable a priori information and {r^n}, n ≥ 0, a sequence of residuals produced by Weighted-MP/OMP. Then, as long as r^n satisfies Eq. (16) of Theorem 5, for any k < m such that N_k ≤ N_m,

    N_m − N_k < 1 + m/((1−ɛ_max)² − µ₁^w(m−1)) · [ log(||r_k||² / ((1+η)||r_m||²)) + log((1 + λ^w_k)/(1 + λ^w_m)) ],    (34)

where, for an optimal set of atoms of size l,

    λ^w_l ≜ l (1 − µ₁^w(l) − ɛ_max,l)(w_max,l)² / (1 − µ₁^w(l) − µ₁^w(l−1) − 2ɛ_max,l)²,    (35)

    β^w_l ≜ 1 − α² ((1−ɛ_max,l)² − µ₁^w(l−1)) / l.    (36)

Proof: From Theorem 6, with l = N_k and n = N_m, and starting from the condition on the residual ||r^N||² given in Definition 3, the following holds for α = 1:

    (1 + λ^w_m)(1+η)||r_m||² < ||r^{N_m}||² ≤ (1+η)||r_m||² + (β^w_m)^{N_m−N_k} (||r^{N_k}||² − (1+η)||r_m||²) ≤ (1+η)||r_m||² + (β^w_m)^{N_m−N_k} (1 + λ^w_k) ||r_k||².    (37)

Operating on (37) as in [4], it easily follows that (β^w_m)^{N_m−N_k} > λ^w_m (1+η)||r_m||² / ((1 + λ^w_k)||r_k||²), thus

    N_m − N_k < [ log(||r_k||² / ((1+η)||r_m||²)) + log((1 + λ^w_k)/λ^w_m) ] / log(1/β^w_m).

If t ≤ 1 then log(1/t) ≥ 1 − t, and 1 − β^w_m = α²((1−ɛ_max)² − µ₁^w(m−1))/m. This proves the result presented in (34), and so the lemma.

In order to use Lemma 1 in Theorem 7, an estimate of the argument of the second logarithm in (34) is necessary. This can be found in the following lemma.

Lemma 2: For all m such that µ₁^w(m) + µ₁^w(m−1) + 2ɛ_max < 1 and k < m, we have:

    λ^w_m ≥ m (1 − µ₁^w(m) − ɛ_max)(w_max,m)²,    (38)

    λ^w_k / λ^w_m ≤ (k (1 − µ₁^w(k) − ɛ_max,k)(w_max,k)²) / (m (1 − µ₁^w(m) − ɛ_max)(w_max,m)²).    (39)

Proof: Consider the definition of λ^w in (35). The denominator of λ^w_l is at most 1, which gives (38). Moreover, since µ₁^w(l) + µ₁^w(l−1) + 2ɛ_max,l ≤ µ₁^w(l′) + µ₁^w(l′−1) + 2ɛ_max,l′ for l ≤ l′, the following can be stated:

    λ^w_l / λ^w_{l′} ≤ (l (1 − µ₁^w(l) − ɛ_max,l)(w_max,l)²) / (l′ (1 − µ₁^w(l′) − ɛ_max,l′)(w_max,l′)²),  l ≤ l′.    (40)

By assuming k + 1 ≤ l ≤ m, the lemma is proved. Finally, building on the results obtained from Theorem 6 and Lemmas 1 and 2, Theorem 7 can be proved.

Proof: To prove Theorem 7, we need to upper bound the factor (1 + λ^w_k)/(1 + λ^w_m) in Eq. (34). For this purpose, let us consider the following:

    (1 + λ^w_k)/(1 + λ^w_m) ≤ 1 + λ^w_k/λ^w_m.    (41)

Together with the results of Lemma 2, this gives:

    log((1 + λ^w_k)/(1 + λ^w_m)) ≤ log(1 + (k(1 − µ₁^w(k) − ɛ_max,k)(w_max,k)²) / (m(1 − µ₁^w(m) − ɛ_max)(w_max,m)²)).    (42)

Hence, using Eq. (34), we obtain

    N_m − N_k < 1 + m/((1−ɛ_max)² − µ₁^w(m−1)) · [ log(||r_k||² / ((1+η)||r_m||²)) + log(1 + (k(1 − µ₁^w(k) − ɛ_max,k)(w_max,k)²) / (m(1 − µ₁^w(m) − ɛ_max)(w_max,m)²)) ].    (43)

Theorem 7 is then proved by particularizing the previous expression to the case k = m − 1 and summing the resulting bounds on N_l − N_{l−1} over l = 1, …, m. For the step N_m − N_{m−1} this yields

    N_m − N_{m−1} < 1 + m/((1−ɛ_max)² − µ₁^w(m−1)) · [ log(||r_{m−1}||² / ((1+η)||r_m||²)) + log 2 ],    (44)

since, by (39), λ^w_{m−1}/λ^w_m ≤ 1. Summing over the m steps, bounding ||r_{m−1}||² by ||f||², and controlling the remaining terms by means of (38), leads to

    N_m < m (1 + m/((1−ɛ_max)² − µ₁^w(m−1)) · log(3||f||² (1−ɛ_max)² / (||r_m||² (w_max,m)²))).    (45)

This bound is informative only if 3||r_m||² ≤ ||f||² (1−ɛ_max)²/(w_max,m)², which is the condition stated in Theorem 7; otherwise N ≤ m.

Compared to the case where no a priori information is available [4], the condition for the validity of bound (33) is softened in our case. Moreover, the upper bound on N is increased, which means that there is room for an improvement in the number of correct iterations. Indeed, under the assumption of reliable a priori information, the smaller w_max,m (which implies a good discrimination between Λ and Ω \ Λ), the easier it is to fulfill the condition stated in Theorem 7.

C. Example: Use of Footprints for ɛ-sparse Approximations

To give an example of the approximation of signals using a priori information, we consider again the case presented in [1], where a piecewise-smooth signal is represented by means of an overcomplete dictionary composed of the mixture of an orthonormal wavelet basis and a family of wavelet footprints (see [6]).

Fig. 1. Dictionary formed by the Symlet-4 [7] (left half) and the respective footprints for piecewise-constant singularities (right half).

Let us recall the considerations made on the dictionary. The dictionary is built as the union of an orthonormal basis defined by the Symlet-4 family of wavelets [7] and the respective family of footprints for all the possible translations of the Heaviside function. The latter is used to model the discontinuities. A graphical representation of the dictionary matrix can be seen in Fig. 1, where the columns are the waveforms that compose the dictionary.

Fig. 2. Comparison of OMP-based approximations using the footprints dictionary (Fig. 1). Left: original signal. Middle: blind OMP approximation. Right: OMP with prior knowledge of the footprints location.
The use of such a dictionary, indeed, does not satisfy at all the sufficient condition required to ensure the recovery of an optimal approximant with more than one term. Moreover, even if the best a priori were available, it would still be far from satisfying the sufficient condition based on the weighted Babel function. Nevertheless, such an example is considered in this section for two main reasons. The first is that the sufficient theoretical conditions exposed in the literature are very pessimistic and reflect the worst possible case. The second is that, as previously discussed, experience seems to teach us that good dictionaries for efficient approximation of signals are likely to be highly coherent. This fact conflicts with the requirement of incoherence for the good behavior of greedy algorithms. Hence, we find this example of special interest to underline the benefits of using a priori information and additional signal modeling for non-linear expansions. Indeed, this example shows that, by using reliable a priori knowledge, better approximations are possible not only with incoherent dictionaries (where theoretical evidence of the improvement has been given in this paper) but also with highly coherent ones.
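The (unweighted) cumulative coherence invoked here is straightforward to evaluate numerically for a given dictionary matrix; the following small sketch illustrates how quickly it grows for a coherent dictionary (the toy dictionary and function name are ours, and the weighted Babel function would additionally scale the inner products by the weights):

```python
import numpy as np

def babel(D, m):
    """Cumulative coherence (Babel function) mu_1(m) of a dictionary.

    D: (d, K) matrix with unit-norm columns (atoms).
    mu_1(m) is the max over atoms g of the sum of the m largest
    |<g, g_i>| over the *other* atoms i.
    """
    G = np.abs(D.T @ D)            # absolute Gram matrix
    np.fill_diagonal(G, 0.0)       # exclude each atom against itself
    rows = -np.sort(-G, axis=1)    # each row sorted in descending order
    return rows[:, :m].sum(axis=1).max()

if __name__ == "__main__":
    # toy "coherent" dictionary: identity basis plus one constant atom
    d = 8
    const = np.ones((d, 1)) / np.sqrt(d)
    D = np.hstack([np.eye(d), const])
    print(babel(D, 1))   # plain coherence, 1/sqrt(8)
    print(babel(D, 4))   # grows roughly linearly with m here
```

For the wavelet+footprint dictionary of Fig. 1 this quantity rises far beyond the values tolerated by the sufficient conditions, which is precisely the point made above.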

We repeat the procedure used in [] to estimate the a priori information based on the dictionary and the input data. We also refer the reader to Sec. V for a more detailed explanation.

Fig. 3. Rate of convergence of the error with respect to the iteration number in the experiment of Fig. 2.

Fig. 2 presents the original signal (left) together with the two approximations obtained in this example: without a priori in the middle and with a priori on the right. The signal to be approximated has a higher number of polynomial degrees than the number of vanishing moments of the Symlet-4. The figures clearly depict the positive effect of the reliable a priori information inserted in the Weighted-OMP algorithm. Indeed, with very few components, the algorithm benefits from the a priori information estimated from the signal and gives a much better approximation. A more global view of this behavioral enhancement can be seen in Fig. 3, where the rate of convergence of the approximation error is presented. The use of weights is definitely helpful, and a considerable gain in the reduction of the approximation error is achieved for a small number of terms.

IV. Approximations with Weighted Basis Pursuit Denoising

In this section, the problem of finding a sparse approximation of a signal f is addressed considering a trade-off between the error and the number of elements that participate in the approximation. In statistics this problem is also called Subset Selection, and we will refer to it as P₀:

  (P₀)  min_c ½||f − Dc||²₂ + τ²||c||₀.   (46)

Solving P₀ is NP-hard, so a possible way of simplifying the computation is to substitute the ℓ₀ quasi-norm with the convex ℓ₁ norm. This relaxation leads to the following problem that, from now on, is called P₁:

  (P₁)  min_b ½||f − Db||²₂ + γ||b||₁.   (47)

This new problem corresponds to the minimization of a convex function that can be solved with classical Quadratic Programming methods.
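Beyond generic quadratic programming, a problem of the form (47) can also be solved by simple proximal iterations; a minimal ISTA sketch follows (the step size and iteration count are illustrative choices of ours, not prescribed by the report):

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, f, gamma, n_iter=500):
    """Minimize 0.5 * ||f - D b||_2^2 + gamma * ||b||_1 by iterative
    shrinkage-thresholding (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    b = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ b - f)            # gradient of the quadratic term
        b = soft(b - grad / L, gamma / L)   # gradient step + shrinkage
    return b
```

When D is orthonormal the iteration converges in one step to the soft-shrinkage solution, consistent with the remark below about orthonormal dictionaries.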
This relaxation is similar to the one that leads to the definition of the Basis Pursuit principle for the case of exact signal representation. The fact that this paradigm is also called Basis Pursuit Denoising can be explained because it was introduced to adapt BP to the case of noisy data [3]. Note that if D is orthonormal the solution of P₁ can be found by a soft shrinkage of the coefficients [5], [3], while, if D is a union of orthonormal subdictionaries, the problem can be solved by recurring to the Block Coordinate Relaxation method [8], faster than Quadratic Programming.

In [] we introduced a theoretical framework for sparse representation over redundant dictionaries taking into account some a priori information about the signal. In this optic we proposed the Weighted Basis Pursuit method, which minimizes a cost function that includes weights expressing the a priori information:

  min_b ||W⁻¹b||₁  s.t.  Db = f.   (48)

The main results in [] concerning WBP are contained in Proposition 1.

Definition 4: Given a dictionary D indexed in Ω and an index subset Λ ⊂ Ω, we define the Weighted Recovery Factor (WRF) as:

  WRF(Λ) = sup_{i∉Λ} ||(D_Λ W_Λ)⁺ g_i||₁ · w_i.   (49)
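For small dictionaries, the quantity of Definition 4 can be evaluated directly; a numerical sketch, assuming unit-norm atoms, the diagonal weight matrix stored as a vector, and the reading of Eq. (49) given above (the function name is ours):

```python
import numpy as np

def wrf(D, w, Lam):
    """Weighted Recovery Factor of Definition 4 (numerical sketch).

    D   : (d, K) dictionary with unit-norm columns g_i.
    w   : length-K vector of diagonal weights in (0, 1].
    Lam : list of column indices forming the subset Lambda.
    """
    DW = D[:, Lam] * w[Lam]                 # columns g_i * w_i, i in Lambda
    P = np.linalg.pinv(DW)                  # (D_Lam W_Lam)^+
    outside = [i for i in range(D.shape[1]) if i not in set(Lam)]
    return max(np.abs(P @ D[:, i]).sum() * w[i] for i in outside)
```

Lowering the weights of the atoms outside Λ directly lowers the WRF, which is the mechanism by which a reliable prior eases the recovery condition.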

Proposition 1: Given a dictionary D and an a priori matrix W(f, D), Weighted Basis Pursuit is able to recover the optimal representation of a sparse signal f = D_Λ b if the following Exact Recovery Condition is respected:

  WRF(Λ) < 1.   (50)

Moreover, a bound for the WRF is:

  WRF(Λ) < µ_w(m) / (ɛ²_max − µ_w(m − 1)),   (51)

where ɛ_max = sup_γ w_γ and the w_γ are the elements of the diagonal matrix W. Therefore, the Exact Recovery Condition for WBP (50) holds for any index set of size at most m such that

  µ_w(m) + µ_w(m − 1) < ɛ²_max.   (52)

A. A Bayesian Approach to Weighted Basis Pursuit Denoising

In this subsection the problem of signal approximation is studied from a Bayesian point of view. We also examine under which hypotheses BPDN finds the optimal solution. This leads us to generalize the BPDN principle through the definition of Weighted Basis Pursuit Denoising (WBPDN). Let us write again the model of our data approximation, where f̂ is the approximant and r is the residual:

  f = f̂ + r = Db + r.   (53)

Assuming r to be an iid Gaussian set of variables, the data likelihood is

  p(f | D, b) ∝ (1/σ_r) exp(−||f − Db||²₂ / (2σ²_r)),   (54)

where σ²_r is the variance of the residual. In the approximation problem, one aims at maximizing the posterior p(b | f, D). Formally, by the Bayes rule, we have

  p(b | f, D) = p(f | D, b) p(b) / p(f, D),

and thus, assuming p(f, D) to be uniform, it follows that the most probable signal representation is:

  b_MAP = arg max_b p(f | D, b) p(b).   (55)

Let us now assume the coefficients b_i are independent and have a Laplacian distribution with standard deviation σ_i:

  p(b_i) = (1/(√2 σ_i)) exp(−√2 |b_i| / σ_i).

From (55), by computing the logarithm, it follows that

  b_MAP = arg max_b [ln p(f | D, b) + Σ_i ln p(b_i)] = arg min_b [||f − Db||²₂ / (2σ²_r) + √2 Σ_i |b_i| / σ_i].

Making the hypothesis that σ_i is constant for every index i, the previous equation means that the most probable b is the one found by the BPDN algorithm [9]. In fact, this hypothesis does not often correspond to reality.
On the contrary, if the variances of the coefficients are not forced to be all the same, it turns out that the most probable signal representation can be found by solving the following problem:

  (P_w)  min_b ½||f − Db||²₂ + γ||W⁻¹b||₁,   (56)

where the diagonal matrix W, with entries in (0, 1], is defined in Section III. One can notice that in Eq. (56) the introduction of weights allows one to individually model the components of b. This approach is analogous to the one introduced in [] and [] and, from now on, we will refer to P_w as Weighted Basis Pursuit Denoising or WBPDN.

The assumption often made about the Gaussianity of the residual is quite restrictive. However, for another particular problem, one could make the hypothesis that this residual has a Laplacian distribution. It is then possible to prove that the most probable signal representation can be found by substituting the ℓ₂ measure of the error with the ℓ₁. This leads to the following minimization problem:

  min_b ||f − Db||₁ + γ||W⁻¹b||₁,

where W = I if the variances of the probability density functions of the b_i are the same for each i. This problem is faced, for example, in [], where it is also explained that it can be solved by Linear Programming techniques.
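Under the reading of Eq. (56) as a weighted ℓ₁ penalty with per-coefficient costs γ/w_i, the weighted problem can be solved by the same proximal iterations as BPDN, simply with a coordinate-wise threshold; a sketch (step size, iteration count and function name are illustrative choices of ours):

```python
import numpy as np

def wbpdn_ista(D, f, w, gamma, n_iter=1000):
    """Sketch of WBPDN: minimize
        0.5 * ||f - D b||_2^2 + gamma * sum_i |b_i| / w_i
    by iterative soft-thresholding with per-coefficient thresholds.
    Small weights w_i shrink (penalize) the corresponding atoms harder."""
    L = np.linalg.norm(D, 2) ** 2
    t = gamma / (L * w)                      # coordinate-wise thresholds
    b = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = b - D.T @ (D @ b - f) / L        # gradient step on the data term
        b = np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
    return b
```

With w = 1 this reduces to plain BPDN, which matches the remark that W = I recovers the unweighted case.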

B. Preliminary Propositions

Here some preliminary propositions are presented, allowing us to prove the results of the following two subsections. The proofs of most of them follow the arguments given by Tropp in [3]. In the following, c and b lie in R^m, but sometimes they are extended to R^|Ω| by padding with zeros. The same is valid for the matrix W.

The next lemma, similar to the Correlation Condition Lemma in [3], establishes a fundamental result for the rest of the report: it basically states that, if the atoms outside Λ have a small weighted coherence with Λ, expressed by the Weighted Recovery Factor, then the support of any vector that solves P_w is a subset of Λ.

Lemma 3: Given an index subset Λ ⊂ Ω, let a_Λ = D_Λ D_Λ⁺ f be the best approximation of f over the atoms in Λ, and suppose that the following condition is satisfied:

  ||Dᵀ(f − a_Λ)||_∞ < (γ / w_max) (1 − WRF(Λ)),   (57)

where w_max ∈ (0, 1] is the quantity defined in Section III. Then, any coefficient vector b* that minimizes the cost function of problem P_w must satisfy

  support(b*) ⊆ Λ.   (58)

Proof: Assume that b* is a vector minimizing (56). Assume also that it uses an index outside Λ. b* can be compared with its projection D_Λ⁺Db*, which is supported in Λ, and we obtain:

  ½||f − Db*||²₂ + γ||W⁻¹b*||₁ ≤ ½||f − DD_Λ⁺Db*||²₂ + γ||W⁻¹(D_Λ⁺Db*)||₁,

which gives

  γ(||W⁻¹b*||₁ − ||W⁻¹(D_Λ⁺Db*)||₁) ≤ ½(||f − DD_Λ⁺Db*||²₂ − ||f − Db*||²₂).   (59)

First we shall provide a lower bound on the left-hand side of the previous inequality. Let us split the vector b* into two parts: b* = b_Λ + b̄, where the former vector contains the components with indexes in Λ, while the latter contains the remaining components from Ω∖Λ. This yields, by the triangle inequality, that

  ||W⁻¹b*||₁ − ||W⁻¹(D_Λ⁺Db*)||₁ ≥ ||W⁻¹b̄||₁ − ||W⁻¹D_Λ⁺Db̄||₁.

Since

  ||W⁻¹D_Λ⁺Db̄||₁ ≤ sup_{i∉Λ} ||(D_Λ W_Λ)⁺ g_i||₁ w_i · ||W⁻¹b̄||₁,

using (49), one can write that

  ||W⁻¹b*||₁ − ||W⁻¹(D_Λ⁺Db*)||₁ ≥ (1 − WRF(Λ)) ||W⁻¹b̄||₁.   (60)

We now provide an upper bound for the right-hand side of (59). This quantity does not depend on the weighting matrix; thus, exactly as in [3], it can be stated that:

  ½(||f − DD_Λ⁺Db*||²₂ − ||f − Db*||²₂) ≤ ||b̄||₁ ||Dᵀ(f − a_Λ)||_∞.   (61)

From (59), (60) and (61) it turns out that:

  γ (1 − WRF(Λ)) ||W⁻¹b̄||₁ ≤ ||b̄||₁ ||Dᵀ(f − a_Λ)||_∞.   (62)
Since the weights are in (0, 1], and the vector b̄, by assumption, cannot be null, it can be written:

  γ (1 − WRF(Λ)) ≤ (||b̄||₁ / ||W⁻¹b̄||₁) ||Dᵀ(f − a_Λ)||_∞ ≤ w_max ||Dᵀ(f − a_Λ)||_∞.   (63)

If (57) is valid, then (63) fails, and so one must discard the hypothesis that b* is non-zero for an index in Ω∖Λ.

We now focus on finding a necessary and sufficient condition for the existence and uniqueness of a minimum of P_w. The presence of the ℓ₁ norm implies that the cost function of this problem is non-smooth at zero; for this reason the concept of subdifferential is used. Given a real vector variable x, the subdifferential of ||x||₁ is denoted by ∂||x||₁ and defined as:

  ∂||x||₁ = {u : uᵀx = ||x||₁, ||u||_∞ ≤ 1}.

The vectors u that compose the subdifferential are called subgradients [].

Lemma 4: A necessary and sufficient condition for b* to globally minimize the objective function of P_w over all the coefficient vectors with support in Λ is that:

  c_Λ − b*_Λ = γ (D_ΛᵀD_Λ)⁻¹ W_Λ⁻¹ u,   (64)

where u is a vector from ∂||b*_Λ||₁. Moreover, the minimizer is unique.

Proof: One can observe that solving P_w is equivalent to minimizing the following function over coefficient vectors from R^m:

  F(b) = ½||a_Λ − D_Λ b||²₂ + γ||W_Λ⁻¹b||₁.

A point b* minimizes F(b) if and only if the following Fermat criterion holds (see [], []): 0 ∈ ∂F(b*). In our case this means that

  D_ΛᵀD_Λ b*_Λ − D_Λᵀ a_Λ + γ W_Λ⁻¹ u = 0,   (65)

for some vector u taken from ∂||b*_Λ||₁. Let the atoms in Λ be linearly independent; from (65) it follows:

  b*_Λ − (D_ΛᵀD_Λ)⁻¹ D_Λᵀ a_Λ + γ (D_ΛᵀD_Λ)⁻¹ W_Λ⁻¹ u = 0,

and so

  D_Λ⁺ a_Λ − b*_Λ = γ (D_ΛᵀD_Λ)⁻¹ W_Λ⁻¹ u.

To conclude the proof it is sufficient to recall that c_Λ = D_Λ⁺ a_Λ. If W = I, then this result coincides with the one developed by Fuchs in [] and by Tropp in [3] in the complex case.

Lemma 5: Suppose that b* minimizes the cost function of problem P_w. Then the following bound holds:

  ||c_Λ − b*_Λ||_∞ ≤ (γ / w_min) ||(D_ΛᵀD_Λ)⁻¹||_∞,∞,   (66)

where w_min is defined as

  w_min = inf_i w_i.   (67)

Proof: Let us consider the necessary and sufficient condition of Lemma 4: taking the ℓ_∞ norm of (64) we obtain:

  ||c_Λ − b*_Λ||_∞ = γ ||(D_ΛᵀD_Λ)⁻¹ W_Λ⁻¹ u||_∞ ≤ γ ||(D_ΛᵀD_Λ)⁻¹ W_Λ⁻¹||_∞,∞ ||u||_∞.

By definition of the subdifferential, ||u||_∞ ≤ 1. Inserting this into the previous equation and using the sub-multiplicative property of matrix norms (||AB||_{p,q} ≤ ||A||_{p,q} ||B||_{p,q}), we can prove that

  ||c_Λ − b*_Λ||_∞ ≤ γ ||(D_ΛᵀD_Λ)⁻¹||_∞,∞ ||W_Λ⁻¹||_∞,∞.

Just apply the fact that ||W_Λ⁻¹||_∞,∞ = sup_i (1/w_i) = 1/w_min to reach the result.

The following proposition states a result that will be used in the next subsection to prove Theorem 8. Note that here Λ is the optimal index subset of Ω, and thus c_Λ is the sparsest solution to the subset selection problem.

Proposition 2 (Tropp [3]): Given an input signal f and a threshold τ, suppose that the coefficient vector c_Λ, having support Λ of cardinality m, is the sparsest solution of the problem P₀. Set f_Λ = D_Λ c_Λ. Then:
1) ∀k ∈ Λ, |c_Λ(k)| ≥ τ.
2) ∀i ∉ Λ, |⟨f − f_Λ, g_i⟩| < τ.
These preliminary statements will allow us to obtain the main results in the rest of the report.

C. Weighted Relaxed Subset Selection

Let us now study the relationship between the results obtained by solving problems P_w and P₀. Suppose that c_Λ is the sparsest solution to P₀ and that its support is Λ, with |Λ| = m. D_Λ will be the matrix containing all the atoms participating in the sparsest approximation of f, and f_Λ will be the approximant given by c_Λ, i.e. f_Λ = Dc_Λ = DD_Λ⁺f = D_Λ D_Λ⁺ f. Assuming WRF(Λ) < 1, we have the following result.

Theorem 8: Suppose that b* minimizes the cost function of problem P_w with threshold

  γ = τ w_max / (1 − WRF(Λ)),

where w_max is defined in Section III. Then:
1) WBPDN never selects a non-optimal atom, since support(b*) ⊆ Λ.
2) The solution of WBPDN is unique.
3) The following upper bound is valid:

  ||c_Λ − b*_Λ||_∞ ≤ (τ w_max / (w_min (1 − WRF(Λ)))) ||(D_ΛᵀD_Λ)⁻¹||_∞,∞.   (68)

4) The support of b* contains every index j for which

  |c_Λ(j)| > (τ w_max / (w_min (1 − WRF(Λ)))) ||(D_ΛᵀD_Λ)⁻¹||_∞,∞.   (69)

The scalar w_min appearing in Eqs. (68) and (69) is defined in Eq. (67).

Proof: Considering the first stated result, note that every atom indexed by Λ has zero inner product with the optimal residual r_Λ = (f − f_Λ), since f_Λ is the best approximation of f using the atoms in Λ. Using Proposition 2 and recalling that D is finite, it can be stated that

  ||Dᵀ(f − f_Λ)||_∞ < τ.   (70)

Moreover, Lemma 3 guarantees that, for any γ such that

  ||Dᵀ(f − f_Λ)||_∞ < (γ / w_max)(1 − WRF(Λ)),   (71)

the solution b* to the convex problem P_w is supported on Λ. From (70) and (71) it follows that for any γ that satisfies the following condition, it is ensured that support(b*) ⊆ Λ:

  γ ≥ τ w_max / (1 − WRF(Λ)).   (72)

In the following, the smallest possible value for γ is chosen: in this way, Eq. (72) becomes an equality. The uniqueness of the solution follows from Lemma 4. With regard to the third point, the results from Lemma 5 yield

  ||c_Λ − b*_Λ||_∞ ≤ (γ / w_min) ||(D_ΛᵀD_Λ)⁻¹||_∞,∞ = (τ w_max / (w_min (1 − WRF(Λ)))) ||(D_ΛᵀD_Λ)⁻¹||_∞,∞.

This proves point 3. Concerning the fourth result of the theorem, one can observe that, for every index j for which |c_Λ(j)| > (γ / w_min) ||(D_ΛᵀD_Λ)⁻¹||_∞,∞, the corresponding coefficient b*(j) must be different from zero. As before, substituting Eq. (72) proves the last result of the Theorem.
This theorem states two important concepts. First, if the trade-off parameter is correct and the weighted cumulative coherence of the dictionary is small enough, WBPDN is able to select the correct atoms for the signal expansion and will not pick the wrong ones. Furthermore, the error achieved by the algorithm on the amplitude of the coefficients related to the selected atoms is bounded. Nevertheless, once the algorithm has recovered the atom subset, the appropriate amplitudes of the coefficients can be computed by the orthogonal projection of the signal onto the space generated by the selected atoms. This method is illustrated in Section IV-E to generate some examples.
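The projection step just described (select the support, then recompute the amplitudes by least squares) can be sketched as follows; the hard threshold used to discard numerically negligible coefficients and the function name are illustrative choices of ours:

```python
import numpy as np

def reproject(D, f, b, thr=0.1):
    """Debiasing step: keep the support selected by (W)BPDN, then
    recompute the amplitudes by orthogonal projection of f onto the
    span of the selected atoms.

    D : (d, K) dictionary, f : signal, b : (W)BPDN coefficients,
    thr : hard threshold removing numerically negligible components.
    """
    support = np.flatnonzero(np.abs(b) > thr)
    b_proj = np.zeros_like(b)
    # least-squares coefficients on the selected subdictionary
    b_proj[support], *_ = np.linalg.lstsq(D[:, support], f, rcond=None)
    return b_proj
```

By Theorem 8 the support is the correct one under the stated conditions, so this projection removes the amplitude bias introduced by the ℓ₁ penalty.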

The quantities w_min and w_max depend on the reliability and goodness of the a priori. In particular, if W tends to be optimal (i.e. its diagonal entries tend to 1 for the elements that should appear in the sparsest approximation and to 0 for the ones that should not), one can observe that w_min → 1 and w_max → 0.

D. Relation with the Weighted Cumulative Coherence

In this subsection, the previous results are described using the weighted cumulative coherence function defined in Section III. In this way a comparison is made between the results achievable by BPDN and WBPDN.

Theorem 9: Assume that the real vector b* solves P_w with

  γ = τ w_max (ɛ²_max − µ_w(m − 1)) / (ɛ²_max − µ_w(m − 1) − µ_w(m)).

Then support(b*) ⊆ Λ and

  ||b* − c_Λ||_∞ ≤ (τ w_max / w_min) (ɛ²_max − µ_w(m − 1)) / ((ɛ²_max − µ_w(m − 1) − µ_w(m)) (1 − µ₁(m − 1))).   (73)

Proof: This result can be obtained from Proposition 1 and Theorem 8, since:

  ||b* − c_Λ||_∞ ≤ (γ / w_min) ||(D_ΛᵀD_Λ)⁻¹||_∞,∞ = (τ w_max / w_min) (ɛ²_max − µ_w(m − 1)) / (ɛ²_max − µ_w(m − 1) − µ_w(m)) ||(D_ΛᵀD_Λ)⁻¹||_∞,∞.

The last term in the previous expression is the norm of the inverse Gram matrix. Since (see [3], [])

  ||(D_ΛᵀD_Λ)⁻¹||_∞,∞ ≤ 1 / (1 − µ₁(m − 1)),

this proves equation (73).

This result is valid in general and illustrates how the distance between the optimal signal approximation and the solution found by solving P_w can be bounded. In case no a priori is given, the bound on the coefficient error is obtained from Eq. (73) by setting W = I. Consequently, w_min = 1, ɛ_max = 1 and w_max = 1 (see also [3]):

  ||b* − c_Λ||_∞ ≤ τ / (1 − µ₁(m − 1) − µ₁(m)).   (74)

Comparing the two bounds, one can observe how the availability of a reliable prior on the signal can help in finding a sparser signal approximation. This concept is emphasized in the following corollary.

Corollary 3: Let W(f, D) be a reliable a priori knowledge, with w_max ≤ w_min; then for any positive integer m such that µ₁(m) + µ₁(m − 1) < 1 and µ_w(m) + µ_w(m − 1) + 1 − ɛ²_max < µ₁(m) + µ₁(m − 1), the error ||b* − c_Λ||_∞ given by the coefficients found by WBPDN is smaller than the one obtained by BPDN. Hence, the bound stated by Eq. (73) is lower than the one in Eq. (74), i.e.

  (τ w_max / w_min) (ɛ²_max − µ_w(m − 1)) / ((ɛ²_max − µ_w(m − 1) − µ_w(m)) (1 − µ₁(m − 1))) ≤ τ / (1 − µ₁(m − 1) − µ₁(m)).
Note the similarity between Corollaries 2 and 3. The proof of the latter corollary is reported in the appendix. Here the hypothesis that w_max/w_min ≤ 1 is made. Observe that if the a priori is particularly good this factor can be much smaller than one, and so the improvement with respect to the general case can be really big.

E. Example

We examine again the example presented in Section III-C, but this time using the Basis Pursuit Denoising and Weighted Basis Pursuit Denoising algorithms. The dictionary used for the decomposition is illustrated in Section III-C, as well as the input signal. For an explanation of the prior model and the extraction of the a priori matrix, see [] or Section V. The signal f is decomposed by first solving the minimization problem illustrated in (47) (BPDN). Then, the a priori knowledge is introduced, and we solve the minimization problem illustrated in (56) (WBPDN). Both solutions were numerically found using Quadratic Programming techniques. The trade-off parameter γ controls the ℓ₁ norm of the coefficient vector and, indirectly, its sparseness. The signal representations present many components with negligible values due to the numerical computation: a hard thresholding is performed in order to get rid of this misleading information, paying attention not to eliminate significant elements. In this optic it is possible to measure the ℓ₀ norm of the vector b. The data reported here refer to a fixed threshold value; of course, the reconstructions are computed starting from the thresholded coefficients. Figure 4 shows the reconstructions of the input signal given by an m-term approximation found by BPDN and WBPDN. Figure 6 (on the left-hand side) illustrates the mean square error of the m-term approximations of f found by BPDN and WBPDN.

Fig. 4. Comparison of BPDN and WBPDN based approximation with m terms using the footprints dictionary (Fig. 1). Left: original signal. Center: reconstruction using BPDN coefficients. Right: reconstruction using WBPDN coefficients.

Let us call b* the approximation found by BPDN and b*_w the one found by WBPDN.
As just explained, these vectors are thresholded, removing the numerically negligible components; in this way we are able to individuate a sparse support and so a subset of the dictionary. Let us label the subdictionary found by WBPDN with D_w (composed of the atoms corresponding to the non-zero elements of b*_w). Once this is given, there are no guarantees that the coefficients that represent f are optimal (see Theorems 8 and 4). These are thus recomputed by projecting the signal onto D_w, and a new approximation of f, named b̃_w, is found. Exactly the same is done for BPDN, ending up with a subdictionary D̃ and a new approximation b̃. Of course, support(b̃) = support(b*) and support(b̃_w) = support(b*_w). Formally, the approximants found by BPDN and WBPDN are respectively:

  f̃ = D̃ D̃⁺ f = Db̃  and  f̃_w = D_w (D_w)⁺ f = Db̃_w.   (75)

In synthesis, the pursuit algorithm is used only to select a dictionary subset, and then the coefficients of the approximation are computed again by means of a simple projection. Figs. 5 and 6 show how this technique considerably improves the results obtained by solving problems P₁ and P_w. Moreover, they confirm the advantages of the weighted algorithm with respect to the non-weighted one.

V. Examples: A Natural Signal Approximation with Coherent Dictionaries and an A Priori Model

In the previous sections, the example of approximating a synthetically generated piecewise-smooth signal has been presented. This successfully illustrates how we can exploit prior signal models in order to ameliorate the behavior of sub-optimal algorithms in the retrieval of sparse approximations. In this section we show that the improvement is not limited to artificially generated signals, and provide a very simple example where weighted algorithms perform better also with natural signals. Moreover, as we will show, the a priori weights can be automatically extracted from the data and optimized such that the performance of the weighted algorithm in use is maximized.
Continuing with the class of signals analyzed in previous examples, we would like to approximate a signal that can be considered as piecewise-smooth. This kind of signal can be efficiently approximated by using dictionaries of functions that can optimally represent both features of the signal: discontinuities and smooth parts. For this purpose an overcomplete coherent dictionary given by footprints and wavelets is used, as in [] and in all the previous examples in the present work.
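The greedy decompositions used in these examples can be sketched as follows; weighting the atom-selection rule by w_i is the natural weighted variant of OMP discussed earlier, while the implementation details (tie-breaking, stopping rule, function name) are our own choices:

```python
import numpy as np

def weighted_omp(D, f, w, n_iter):
    """Weighted Orthogonal Matching Pursuit (sketch).

    At each step the atom maximizing w_i * |<r, g_i>| is selected
    (w = 1 everywhere gives plain OMP); the residual is then updated by
    orthogonal projection onto the span of the selected atoms.
    """
    r, support = f.copy(), []
    for _ in range(n_iter):
        i = int(np.argmax(w * np.abs(D.T @ r)))   # weighted selection rule
        if i not in support:
            support.append(i)
        coef, *_ = np.linalg.lstsq(D[:, support], f, rcond=None)
        r = f - D[:, support] @ coef              # new residual
    b = np.zeros(D.shape[1])
    b[support] = coef
    return b, r
```

Small weights discourage atoms the prior deems unlikely, which is how the footprint/wavelet discrimination below is enforced during the pursuit.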

Fig. 5. The original signal reconstructed from an m-term approximation computed by BPDN (left) and WBPDN (right). The comparison shows the improvement given by recomputing the projections once the algorithm has selected a subdictionary. For the errors see Figure 6.

Fig. 6. Errors (in log scale) of the m-term approximations of f with BPDN and WBPDN. In the figure on the right, the approximations are computed by projecting the signal on the subdictionary selected by the algorithm (see Eq. (75)).

A. A Noisy Piecewise-smooth Natural Signal

As often suggested, images may be represented by a piecewise-smooth component and a texture component [], []. A way to get a natural 1D piecewise-smooth signal is to extract a column or a row from a natural image. In particular, we study two 1D signals extracted from the image cameraman, shown in Fig. 7: one corresponds to the 4th column, and the second to the 8th row.

Fig. 7. Cameraman picture used to extract the example of a real-world 1D signal with piecewise-smooth characteristics.

B. Modeling the Relation Signal–Dictionary

The dictionary in use is composed of the union of the Symlet-4 orthonormal basis and the set of piecewise-constant footprints (see Sec. III-C and Fig. 1). Since the input signals are constituted by 256 samples, the dictionary is expressed by a matrix of size 256 × 512. The modeling of the interaction between the signal and the dictionary is performed using the simple approach described in []. The weighting matrix W(f, D) is generated by means of a pre-estimation of the locations where footprints are likely to be used, together with the assumption that in such locations wavelets have less probability of being required. This discrimination is reflected in the diagonal matrix W(f, D) in the following way: locations where a footprint is likely to be placed are not penalized (thus the weighting factor remains 1). On the contrary, wavelets that overlap a footprint, and footprints considered unlikely to be used, get a penalizing factor β ∈ (0, 1]. To be more explicit, the edge detection based model used in [] is recalled:

Algorithm 1 W(f, D) estimation
Require: D = D_Symlet ∪ D_Footprints, define a threshold λ, define a penalty factor β
1: f_diff = D⁺_Footprints f  {Footprints location estimation (edge detection)}
2: Threshold f_diff by λ, putting greater values to 1 and the others to β.
3: W^diag_footprints = f_diff  {Diagonal of the sub-matrix of W(f, D) corresponding to footprints.}
4: Create W^diag_wave s.t. all wavelets intersecting the found footprints locations equal β, set to 1 otherwise.
5: W(f, D) = diag([W^diag_wave ; W^diag_footprints])

As one can observe, two parameters configure the model that generates W(f, D): a threshold λ and a penalty weight β. In practice, as shown later in this section, these can be selected by an optimization procedure such that the energy of the approximation error is minimized.

C. Signal Approximation

An additional phase is inserted before the approximation.
It consists of an estimation of the main features of the particular signal to approximate and of their relation with the dictionary in use. We thus resume the general procedure in these two steps:
1) Estimation of the a priori information from the real-world signal using a prior model.
2) Use of a weighted algorithm (greedy or relaxed), based on the estimated a priori knowledge, to find the appropriate atom subset to approximate the real-world signal.
Furthermore, an iterative version of this two-phase algorithm can be considered in order to optimize the parameters that configure the prior model used in the first step: the Expectation Maximization (EM) algorithm. A first approach for the parameter tuning can be a grid search, or a multi-scale grid search. More sophisticated search techniques could be used; an overview can be found in [].

D. Results

The results obtained from the framework introduced above are illustrated in the following. First, we show the quantitative impact of using Weighted-MP/OMP and WBPDN in terms of the residual error energy. Right after, the use of the atoms of the dictionary to represent the main features of the signal is analyzed. Finally, we explore the influence of tuning in an appropriate way the two parameters that configure our penalty model.

1) Approximation Results with OMP: Two approximation examples, where OMP and Weighted-OMP are used, are presented. Following the two-step procedure described above, we look for approximants of the signals appearing at the left of Figs. 8 and 9. The former corresponds to the 4th column of the cameraman picture and the latter to the 8th row. The improvement in performance of Weighted-OMP in the case of sparse approximations is assessed by the rate of convergence of the residual energy. On the right-hand side of Figs. 8 and 9, the graphs show that after a certain number of iterations, Weighted-MP selects better atoms than classic Weak-MP.
Hence the convergence of the error improves, and this yields a gain of up to a few dB in the first example and a smaller one in the second (depending on the iteration).

2) Approximation Results with BPDN: The 1D signal extracted from the 4th column of cameraman, illustrated on the left side of Fig. 8, is approximated by using BPDN and WBPDN. As explained in Section IV-E, the pursuit algorithm is used only to select a dictionary subset; the coefficients of the approximation are then computed again by means of a simple projection. Fig. 10 shows the decay of the energy of the error as the number of atoms selected increases. It is clear how the use of the a priori helps the algorithm in finding a better approximation of the signal. The results concerning WBPDN are obtained by adopting a weighting matrix that corresponds to λ = 9 and a fixed penalty factor β. Notice that these values are not optimal for all the numbers of non-zero coefficients. Better results can be achieved by tuning β and γ appropriately for any desired m.
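The weighting matrix referred to here is generated by the estimation algorithm of Section V-B; the following sketch implements it under simplifying assumptions of ours (footprint coefficients are mapped one-to-one to sample locations, and a wavelet is said to "intersect" an edge when its waveform is non-zero at that sample):

```python
import numpy as np

def estimate_weights(D_wave, D_foot, f, lam, beta):
    """Sketch of the W(f, D) estimation algorithm (Algorithm 1).

    Footprint locations are pre-estimated by projecting the signal on the
    footprint subdictionary (a crude edge detector); likely locations keep
    weight 1, everything else (and the wavelets overlapping a detected
    edge) is penalized by beta in (0, 1]."""
    f_diff = np.linalg.pinv(D_foot) @ f          # footprint coefficients
    likely = np.abs(f_diff) > lam                # detected edge locations
    w_foot = np.where(likely, 1.0, beta)         # footprint weights

    w_wave = np.ones(D_wave.shape[1])
    for loc in np.flatnonzero(likely):
        overlap = np.abs(D_wave[loc, :]) > 0     # atoms active at that sample
        w_wave[overlap] = beta                   # penalize overlapping wavelets
    return np.concatenate([w_wave, w_foot])      # diagonal of W(f, D)
```

The returned vector is exactly what the weighted algorithms above consume as the diagonal of W(f, D).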

Fig. 8. Experiment of approximating the 1D signal extracted from the 4th column of cameraman. Left: the 1D signal used in the experiment. Right: the rate of convergence of the residual error; in red, the OMP result; in blue, the Weighted-OMP result.

Fig. 9. Experiment of approximating the 1D signal extracted from the 8th row of the 256×256 cameraman picture from Fig. 7. Left: the 1D signal used in the experiment. Right: the rate of convergence of the residual error; in red, the OMP result; in blue, the Weighted-OMP result.

Fig. 10. Error (in dB) obtained by BPDN and WBPDN approximating the 1D signal extracted from the 4th column of the image cameraman (see Fig. 7) using different numbers of atoms. Both results are obtained by using quadratic programming to select a dictionary subset and then recomputing the coefficients by re-projecting the signal onto the span of the subdictionary. The procedure is illustrated in Section IV-E.

3) Capturing the Piecewise-smooth Component with the Footprints Basis: Here, the results intend to underline the importance of selecting the appropriate atom to represent a particular signal feature. In the upper row of Fig. 11 we can see the resulting approximants of the original signal obtained from the 4th column (Fig. 8) after 5 iterations of OMP (left) and Weighted-OMP (right). The result which considers the a priori is .5 dB better than the approximant obtained by OMP. At this point, it is important to notice the result depicted in the lower row of Fig. 11. These waveforms represent the signal components that are captured exclusively by the footprints atoms and Symlet-4 scaling functions. These signal components should correspond to the piecewise-smooth parts of the signal. However, as depicted by the lower row of Fig. 11, in the case of OMP (bottom left) the piecewise-smooth component captured by footprints and low-pass functions is far from what one could expect. Intuitively one can understand that the OMP algorithm is failing in the selection of atoms. On the other hand, the result obtained by Weighted-OMP (bottom right) clearly shows that footprints and Symlet-4 scaling functions capture a much more accurate approximant of the piecewise-smooth component of the signal. Hence, a better approximation is achieved by using the a priori information. This leads to a sparser approximation too.

Fig. 11. Upper left: approximation after 5 iterations with OMP, without using a priori information. Upper right: approximation after 5 iterations with Weighted-OMP (+.5 dB). Bottom left: signal components captured by Symlet-4 scaling functions and footprints when simple OMP is used.
Bottom right: signal components captured by Symlet-4 scaling functions and footprints using Weighted-OMP.

4) Parameter Search: Finally, we show the influence of the parameters λ and β on the average quadratic error of the residues obtained by Weighted-MP, i.e.

  E{||r_n||² | λ*, β*} = (1/N) Σ_{n=1}^{N} ||r_n||²  s.t.  r_n has been obtained fixing λ = λ* and β = β*.   (76)

In Figs. 12 and 13 the magnitude of Eq. (76) is shown as a function of λ (model threshold) and β (penalty weight) for the two natural examples exposed in this work. Fig. 12 corresponds to the case of the 4th column of cameraman, and Fig. 13 concerns the 8th row. In the figures, a mapping of the meaning of the colors is available. The lower the value of E{||r_n||² | λ, β}, the more probable it is that the associated model parameters are the good ones. It can be easily observed how the optimal configuration of parameters concentrates, in both cases, in a unique global minimum. Hence, the set of optimal parameters that fit the data model can be easily found by some iterative procedure.
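The parameter search just described can be implemented as a brute-force sweep of the (λ, β) grid; a minimal sketch, where the `residual_energy` callable (which would run Weighted-MP with the weights generated for each parameter pair and return the average residual energy of Eq. (76)) is an assumption of ours:

```python
import numpy as np
from itertools import product

def grid_search(residual_energy, lams, betas):
    """Brute-force tuning of the prior-model parameters (lambda, beta):
    return the pair minimizing the average residual energy."""
    return min(product(lams, betas), key=lambda p: residual_energy(*p))

if __name__ == "__main__":
    # toy usage: a made-up error surface with a single global minimum,
    # mimicking the unimodal expectation maps of Figs. 12 and 13
    err = lambda lam, beta: (lam - 2.0) ** 2 + (beta - 0.3) ** 2
    print(grid_search(err, np.linspace(0, 4, 41), np.linspace(0.1, 1, 10)))
```

Because the expectation maps observed here have a unique global minimum, such a sweep (or a multi-scale refinement of it) is enough to locate the optimal configuration.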

Fig. 12. Representation of the expectation map depending on the parameters (model threshold λ, probability weight β) that configure the a priori model in the experiment set up in Fig. 8. The expectation corresponds to the energy of the residual error.

Fig. 13. Representation of the expectation map depending on the parameters (model threshold λ, probability weight β) that configure the a priori model in the experiment set up in Fig. 9. The expectation corresponds to the energy of the residual error.

VI. Conclusions

This work presents theoretical results on the performance of including a priori knowledge in algorithms for signal approximation like Weak-MP or BPDN. We introduce weighted variants called Weighted-MP/OMP and WBPDN. Theoretical results show that these algorithms may supply much better results than classic approaches for highly non-linear signal approximations, provided sufficiently reliable prior models are used. This reliability is theoretically represented in our results by ɛ_max, while the discriminative ability of the a priori is given by w_max and w_min. The fact that these quantities are normally unavailable in practice prevents us from using the sufficient conditions numerically (unlike in the case where no prior is used). Nevertheless, the results found are able to explain how weighted Greedy and BPDN algorithms behave compared to non-weighted ones. A field to explore may be to determine some bounds on these quantities depending on the class of signals to approximate and on the practical estimators (those that generate the a priori weights) in use.

Practical examples concerning synthetic and natural signals have been presented, where we used a dictionary with high internal cumulative coherence. Theoretically, with such a dictionary, greedy or relaxation algorithms are not guaranteed to recover the set of atoms of the best m-term approximation.
Sufficient conditions for the recovery of all the correct atoms of the best m-term approximation are far from being satisfied.

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

Randomized Recovery for Boolean Compressed Sensing

Randomized Recovery for Boolean Compressed Sensing Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

Weighted- 1 minimization with multiple weighting sets

Weighted- 1 minimization with multiple weighting sets Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

Lecture 21. Interior Point Methods Setup and Algorithm

Lecture 21. Interior Point Methods Setup and Algorithm Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and

More information

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup) Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

Lecture 20 November 7, 2013

Lecture 20 November 7, 2013 CS 229r: Algoriths for Big Data Fall 2013 Prof. Jelani Nelson Lecture 20 Noveber 7, 2013 Scribe: Yun Willia Yu 1 Introduction Today we re going to go through the analysis of atrix copletion. First though,

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

Exact tensor completion with sum-of-squares

Exact tensor completion with sum-of-squares Proceedings of Machine Learning Research vol 65:1 54, 2017 30th Annual Conference on Learning Theory Exact tensor copletion with su-of-squares Aaron Potechin Institute for Advanced Study, Princeton David

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

arxiv: v1 [cs.ds] 3 Feb 2014

arxiv: v1 [cs.ds] 3 Feb 2014 arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/

More information

Physics 215 Winter The Density Matrix

Physics 215 Winter The Density Matrix Physics 215 Winter 2018 The Density Matrix The quantu space of states is a Hilbert space H. Any state vector ψ H is a pure state. Since any linear cobination of eleents of H are also an eleent of H, it

More information

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements 1 Copressive Distilled Sensing: Sparse Recovery Using Adaptivity in Copressive Measureents Jarvis D. Haupt 1 Richard G. Baraniuk 1 Rui M. Castro 2 and Robert D. Nowak 3 1 Dept. of Electrical and Coputer

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence Best Ar Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon Mohaad Ghavazadeh Alessandro Lazaric INRIA Lille - Nord Europe, Tea SequeL {victor.gabillon,ohaad.ghavazadeh,alessandro.lazaric}@inria.fr

More information

3.3 Variational Characterization of Singular Values

3.3 Variational Characterization of Singular Values 3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

1 Bounding the Margin

1 Bounding the Margin COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

Chaotic Coupled Map Lattices

Chaotic Coupled Map Lattices Chaotic Coupled Map Lattices Author: Dustin Keys Advisors: Dr. Robert Indik, Dr. Kevin Lin 1 Introduction When a syste of chaotic aps is coupled in a way that allows the to share inforation about each

More information

Multi-Scale/Multi-Resolution: Wavelet Transform

Multi-Scale/Multi-Resolution: Wavelet Transform Multi-Scale/Multi-Resolution: Wavelet Transfor Proble with Fourier Fourier analysis -- breaks down a signal into constituent sinusoids of different frequencies. A serious drawback in transforing to the

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

OPTIMIZATION in multi-agent networks has attracted

OPTIMIZATION in multi-agent networks has attracted Distributed constrained optiization and consensus in uncertain networks via proxial iniization Kostas Margellos, Alessandro Falsone, Sione Garatti and Maria Prandini arxiv:603.039v3 [ath.oc] 3 May 07 Abstract

More information

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution Testing approxiate norality of an estiator using the estiated MSE and bias with an application to the shape paraeter of the generalized Pareto distribution J. Martin van Zyl Abstract In this work the norality

More information

EE5900 Spring Lecture 4 IC interconnect modeling methods Zhuo Feng

EE5900 Spring Lecture 4 IC interconnect modeling methods Zhuo Feng EE59 Spring Parallel LSI AD Algoriths Lecture I interconnect odeling ethods Zhuo Feng. Z. Feng MTU EE59 So far we ve considered only tie doain analyses We ll soon see that it is soeties preferable to odel

More information

A Probabilistic and RIPless Theory of Compressed Sensing

A Probabilistic and RIPless Theory of Compressed Sensing A Probabilistic and RIPless Theory of Copressed Sensing Eanuel J Candès and Yaniv Plan 2 Departents of Matheatics and of Statistics, Stanford University, Stanford, CA 94305 2 Applied and Coputational Matheatics,

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

Hybrid System Identification: An SDP Approach

Hybrid System Identification: An SDP Approach 49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

. The univariate situation. It is well-known for a long tie that denoinators of Pade approxiants can be considered as orthogonal polynoials with respe

. The univariate situation. It is well-known for a long tie that denoinators of Pade approxiants can be considered as orthogonal polynoials with respe PROPERTIES OF MULTIVARIATE HOMOGENEOUS ORTHOGONAL POLYNOMIALS Brahi Benouahane y Annie Cuyt? Keywords Abstract It is well-known that the denoinators of Pade approxiants can be considered as orthogonal

More information

a a a a a a a m a b a b

a a a a a a a m a b a b Algebra / Trig Final Exa Study Guide (Fall Seester) Moncada/Dunphy Inforation About the Final Exa The final exa is cuulative, covering Appendix A (A.1-A.5) and Chapter 1. All probles will be ultiple choice

More information

Least Squares Fitting of Data

Least Squares Fitting of Data Least Squares Fitting of Data David Eberly, Geoetric Tools, Redond WA 98052 https://www.geoetrictools.co/ This work is licensed under the Creative Coons Attribution 4.0 International License. To view a

More information

Bootstrapping Dependent Data

Bootstrapping Dependent Data Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly

More information

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval Unifor Approxiation and Bernstein Polynoials with Coefficients in the Unit Interval Weiang Qian and Marc D. Riedel Electrical and Coputer Engineering, University of Minnesota 200 Union St. S.E. Minneapolis,

More information

List Scheduling and LPT Oliver Braun (09/05/2017)

List Scheduling and LPT Oliver Braun (09/05/2017) List Scheduling and LPT Oliver Braun (09/05/207) We investigate the classical scheduling proble P ax where a set of n independent jobs has to be processed on 2 parallel and identical processors (achines)

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information

A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING. Emmanuel J. Candès Yaniv Plan. Technical Report No November 2010

A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING. Emmanuel J. Candès Yaniv Plan. Technical Report No November 2010 A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING By Eanuel J Candès Yaniv Plan Technical Report No 200-0 Noveber 200 Departent of Statistics STANFORD UNIVERSITY Stanford, California 94305-4065

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Testing Properties of Collections of Distributions

Testing Properties of Collections of Distributions Testing Properties of Collections of Distributions Reut Levi Dana Ron Ronitt Rubinfeld April 9, 0 Abstract We propose a fraework for studying property testing of collections of distributions, where the

More information

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization Structured signal recovery fro quadratic easureents: Breaking saple coplexity barriers via nonconvex optiization Mahdi Soltanolkotabi Ming Hsieh Departent of Electrical Engineering University of Southern

More information

Recovery of Sparsely Corrupted Signals

Recovery of Sparsely Corrupted Signals TO APPEAR IN IEEE TRANSACTIONS ON INFORMATION TEORY 1 Recovery of Sparsely Corrupted Signals Christoph Studer, Meber, IEEE, Patrick Kuppinger, Student Meber, IEEE, Graee Pope, Student Meber, IEEE, and

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

arxiv: v1 [math.na] 10 Oct 2016

arxiv: v1 [math.na] 10 Oct 2016 GREEDY GAUSS-NEWTON ALGORITHM FOR FINDING SPARSE SOLUTIONS TO NONLINEAR UNDERDETERMINED SYSTEMS OF EQUATIONS MÅRTEN GULLIKSSON AND ANNA OLEYNIK arxiv:6.395v [ath.na] Oct 26 Abstract. We consider the proble

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

Topic 5a Introduction to Curve Fitting & Linear Regression

Topic 5a Introduction to Curve Fitting & Linear Regression /7/08 Course Instructor Dr. Rayond C. Rup Oice: A 337 Phone: (95) 747 6958 E ail: rcrup@utep.edu opic 5a Introduction to Curve Fitting & Linear Regression EE 4386/530 Coputational ethods in EE Outline

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

Necessity of low effective dimension

Necessity of low effective dimension Necessity of low effective diension Art B. Owen Stanford University October 2002, Orig: July 2002 Abstract Practitioners have long noticed that quasi-monte Carlo ethods work very well on functions that

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

A Bernstein-Markov Theorem for Normed Spaces

A Bernstein-Markov Theorem for Normed Spaces A Bernstein-Markov Theore for Nored Spaces Lawrence A. Harris Departent of Matheatics, University of Kentucky Lexington, Kentucky 40506-0027 Abstract Let X and Y be real nored linear spaces and let φ :

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

Physics 139B Solutions to Homework Set 3 Fall 2009

Physics 139B Solutions to Homework Set 3 Fall 2009 Physics 139B Solutions to Hoework Set 3 Fall 009 1. Consider a particle of ass attached to a rigid assless rod of fixed length R whose other end is fixed at the origin. The rod is free to rotate about

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

Introduction to Machine Learning. Recitation 11

Introduction to Machine Learning. Recitation 11 Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,

More information

arxiv: v1 [math.nt] 14 Sep 2014

arxiv: v1 [math.nt] 14 Sep 2014 ROTATION REMAINDERS P. JAMESON GRABER, WASHINGTON AND LEE UNIVERSITY 08 arxiv:1409.411v1 [ath.nt] 14 Sep 014 Abstract. We study properties of an array of nubers, called the triangle, in which each row

More information

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials Fast Montgoery-like Square Root Coputation over GF( ) for All Trinoials Yin Li a, Yu Zhang a, a Departent of Coputer Science and Technology, Xinyang Noral University, Henan, P.R.China Abstract This letter

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

Tail estimates for norms of sums of log-concave random vectors

Tail estimates for norms of sums of log-concave random vectors Tail estiates for nors of sus of log-concave rando vectors Rados law Adaczak Rafa l Lata la Alexander E. Litvak Alain Pajor Nicole Toczak-Jaegerann Abstract We establish new tail estiates for order statistics

More information

ADVANCES ON THE BESSIS- MOUSSA-VILLANI TRACE CONJECTURE

ADVANCES ON THE BESSIS- MOUSSA-VILLANI TRACE CONJECTURE ADVANCES ON THE BESSIS- MOUSSA-VILLANI TRACE CONJECTURE CHRISTOPHER J. HILLAR Abstract. A long-standing conjecture asserts that the polynoial p(t = Tr(A + tb ] has nonnegative coefficients whenever is

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

A method to determine relative stroke detection efficiencies from multiplicity distributions

A method to determine relative stroke detection efficiencies from multiplicity distributions A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Ahmed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto, Dept. of Electrical and

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

Kernel-Based Nonparametric Anomaly Detection

Kernel-Based Nonparametric Anomaly Detection Kernel-Based Nonparaetric Anoaly Detection Shaofeng Zou Dept of EECS Syracuse University Eail: szou@syr.edu Yingbin Liang Dept of EECS Syracuse University Eail: yliang6@syr.edu H. Vincent Poor Dept of

DERIVING PROPER UNIFORM PRIORS FOR REGRESSION COEFFICIENTS

N. van Erp and P. van Gelder, Structural Hydraulic and Probabilistic Design, TU Delft, Delft, The Netherlands. Abstract: In problems of model comparison

Chapter 6: Economic Inequality

Chapter 6: Economic Inequality Chapter 6: Econoic Inequality We are interested in inequality ainly for two reasons: First, there are philosophical and ethical grounds for aversion to inequality per se. Second, even if we are not interested

1 Rademacher Complexity Bounds

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire. Lecture #10. Scribe: Max Goer, March 07, 2013. Recall the following theorem from last lecture: Theorem 1. With probability
