Electronic Transactions on Numerical Analysis. Volume 38, pp. 44-68. Copyright by Kent State University. ISSN 1068-9613.

ERROR ESTIMATES FOR GENERAL FIDELITIES

MARTIN BENNING AND MARTIN BURGER

Abstract. Appropriate error estimation for regularization methods in imaging and inverse problems is of enormous importance for controlling approximation properties and understanding types of solutions that are particularly favoured. In the case of linear problems, i.e., variational methods with quadratic fidelity and quadratic regularization, the error estimation is well understood under so-called source conditions. Significant progress for nonquadratic regularization functionals has been made recently after the introduction of the Bregman distance as an appropriate error measure. The other important generalization, namely for nonquadratic fidelities, has not been analyzed so far. In this paper we develop a framework for the derivation of error estimates in the case of rather general fidelities and highlight the importance of duality for the shape of the estimates. We then specialize the approach to several important fidelities in imaging ($L^1$, Kullback-Leibler).

Key words. error estimation, Bregman distance, discrepancy principle, imaging, image processing, sparsity

AMS subject classifications. 47A5, 65, 49M3

1. Introduction. Image processing and inversion with structural prior information (e.g., sparsity, sharp edges) are of growing importance in practical applications. Such prior information is often incorporated into variational models with appropriate penalty functionals used for regularization, e.g., total variation or $\ell^1$-norms of coefficients in orthonormal bases. The error control for such models, which is of obvious relevance, is the subject of this paper.

Most imaging and inverse problems can be formulated as the computation of a function $\tilde u \in U(\Omega)$ from the operator equation
(1.1) $K\tilde u = g$,
with given data $g \in V(\Sigma)$. Here $U(\Omega)$ and $V(\Sigma)$ are Banach spaces of functions on the bounded and compact sets $\Omega$ and $\Sigma$, respectively, and $K$ denotes a linear operator $K : U(\Omega) \to V(\Sigma)$. We shall also allow $\Sigma$ to be discrete with point measures, which often corresponds to the situation encountered in practice. In the course of this work we want to call $g$ the exact data and $\tilde u$ the exact solution.

Most inverse problems are ill-posed, i.e., $K$ usually cannot be inverted continuously due to compactness of the forward operator. Furthermore, in real-life applications the exact data $g$ are usually not available. Hence, we have to solve the inverse problem
(1.2) $Ku = f$
instead of (1.1), with $u \in U(\Omega)$ and $f \in V(\Sigma)$, while $g$ and $f$ differ from each other by a certain amount. This difference is referred to as noise (or as systematic and modelling errors, which we shall not consider here). Therefore, throughout this work we want to call $f$ the noisy data. Although in general $g$ is not available, in many applications a maximum noise bound $\delta$ is given. This data error controls the maximum difference between $g$ and $f$ in some measure, depending on the type of noise. For instance, in the case of the standard $L^2$-fidelity, we have the noise bound
$\|g - f\|_{L^2(\Sigma)} \le \delta$.

Received July 3. Accepted for publication January 6. Published online March 4. Recommended by R. Ramlau.
Westfälische Wilhelms-Universität Münster, Institut für Numerische und Angewandte Mathematik, Einsteinstr. 62, D-48149 Münster, Germany (martin.{benning,burger}@wwu.de).

In order to obtain a robust approximation $\hat u$ of $\tilde u$ for (1.2), many regularization techniques have been proposed. Here we focus on the particularly important and popular class of convex variational regularization, which is of the form
(1.3) $\hat u \in \arg\min_{u \in W(\Omega)} \left\{ H_f(Ku) + \alpha J(u) \right\}$,
with $H_f : V(\Sigma) \to \mathbb{R}\cup\{+\infty\}$ and $J : W(\Omega) \to \mathbb{R}\cup\{+\infty\}$, $W(\Omega) \subset U(\Omega)$, being convex functionals. The scheme contains the general fidelity term $H_f(Ku)$, which controls the deviation from equality in (1.2), and the regularization term $\alpha J(u)$, with $\alpha > 0$ being the regularization parameter, which guarantees certain smoothness features of the solution. In the literature, schemes based on (1.3) are often referred to as variational regularization schemes. Throughout this paper we shall assume that $J$ is chosen such that a minimizer of (1.3) exists, the proof of which is not an easy task for many important choices of $H_f$; cf., e.g., [, ]. Notice that if $H_f(Ku) + \alpha J(u)$ in (1.3) is strictly convex, the set of minimizers is indeed a singleton.

Variational regularization of inverse problems based on general, convex, and in many cases singular energy functionals has been a field of growing interest and importance over the last decades. In comparison to classical Tikhonov regularization (cf. [3]), different regularization energies allow the preservation of certain features, e.g., preservation of edges with the use of total variation (TV) as a regularizer (see for instance the well-known ROF model [9]), or sparsity with respect to some bases or dictionaries.

By regularizing the inverse problem (1.2), our goal is to obtain a solution $\hat u$ close to $\tilde u$ in a robust way with respect to noise. Hence, we are interested in error estimates that describe the behaviour of the error in terms of the data error $\delta$ and in optimal parameter choices, as they are well known for quadratic fitting; see [3]. A major step for error estimates in the case of regularization with singular energies has been the introduction of generalized Bregman distances (cf. [4]) as an error measure; cf. [8]. The Bregman distance for general convex, not necessarily differentiable functionals is defined as follows.

DEFINITION 1.1 (Bregman Distance). Let $U$ be a Banach space and $J : U \to \mathbb{R}\cup\{+\infty\}$ be a convex functional with non-empty subdifferential $\partial J$. Then, the Bregman distance is defined as
$D_J(u, v) := \{ J(u) - J(v) - \langle p, u - v \rangle_U \mid p \in \partial J(v) \}$.
The Bregman distance for a specific subgradient $\zeta \in \partial J(v)$, $v \in U$, is defined as $D_J^\zeta : U \times U \to \mathbb{R}^+$ with
$D_J^\zeta(u, v) := J(u) - J(v) - \langle \zeta, u - v \rangle_U$.

Since we are dealing with duality throughout this work, we are going to write
$\langle a, b \rangle_U := \langle a, b \rangle_{U^* \times U} = \langle b, a \rangle_{U \times U^*}$,
for $a \in U^*$ and $b \in U$, as the notation for the dual product, for the sake of simplicity.

The Bregman distance is not a distance in the usual sense; at least $D_J^\zeta(u, u) = 0$ and $D_J^\zeta(u, v) \ge 0$ hold for all $\zeta \in \partial J(v)$, the latter due to the convexity of $J$. If $J$ is strictly convex, we even obtain $D_J^\zeta(u, v) > 0$ for $u \ne v$ and $\zeta \in \partial J(v)$. In general, neither a triangle inequality nor symmetry holds for the Bregman distance. The latter can be achieved by introducing the so-called symmetric Bregman distance.

DEFINITION 1.2 (Symmetric Bregman Distance). Let $U$ be a Banach space and $J : U \to \mathbb{R}\cup\{+\infty\}$ be a convex functional with non-empty subdifferential $\partial J$. Then, a symmetric Bregman distance is defined as $D_J^{symm} : U \times U \to \mathbb{R}^+$ with
$D_J^{symm}(u_1, u_2) := D_J^{p_1}(u_2, u_1) + D_J^{p_2}(u_1, u_2) = \langle u_1 - u_2, p_1 - p_2 \rangle_U$,
with $p_i \in \partial J(u_i)$ for $i \in \{1, 2\}$.

Obviously, the symmetric Bregman distance depends on the specific selection of the subgradients $p_i$, which will be suppressed in the notation for simplicity throughout this work.

Many works deal with the analysis of error propagation by considering the Bregman distance between $\hat u$, satisfying the optimality condition of a variational regularization method, and the exact solution $\tilde u$; cf. [7, 9, 7, 8]. The Bregman distance turned out to be an adequate error measure since it seems to control only those errors that can be distinguished by the regularization term. This point of view is supported by the need of so-called source conditions, which are required to obtain error estimates in the Bregman distance setting. In the case of quadratic fitting we have the source condition
$\exists\, \xi \in \partial J(\tilde u),\ \exists\, q \in L^2(\Sigma) : \quad \xi = K^* q$,
with $K^*$ denoting the adjoint operator of $K$ throughout this work. If, e.g., in the case of denoising with $K = \mathrm{Id}$, the exact image $\tilde u$ contains features that are not elements of the subgradient of $J$, error estimates for the Bregman distance cannot be applied since the source condition is not fulfilled.

Furthermore, Bregman distances according to certain regularization functionals have widely been used to replace those regularization terms, which yields inverse scale space methods with improved solutions of inverse problems; cf. [5, 6].

Most works deal with the case of quadratic fitting, with only few exceptions; see, e.g., [7]. However, in many applications, such as Positron Emission Tomography (PET), microscopy, CCD cameras, or radar, different types of noise appear. Examples are salt-and-pepper noise, Poisson noise, additive Laplace noise, and different models of multiplicative noise.

In the next section, we present some general fidelities as recently used in various imaging applications. Next, we present basic error estimates for general, convex variational regularization methods, which we apply to the specific models. Then we illustrate these estimates and test their sharpness by computational results. We conclude with a brief outlook and formulate open questions. We would also like to mention the parallel development on error estimates for variational models with non-quadratic fidelity in [7], which yields the same results as our paper in the case of Laplacian noise. Since the analysis in [7] is based on fidelities that are powers of a metric instead of the noise models we use here, most approaches appear orthogonal. In particular, we base our analysis on convexity and duality and avoid the use of triangle inequalities, which can only be used for a metric.
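To make Definitions 1.1 and 1.2 concrete, the following small sketch (our own illustration, not taken from the paper; names and test vectors are chosen freely) evaluates the Bregman distance and its symmetric version for the discrete $\ell^1$-norm $J(u) = \|u\|_1$, where subgradients are sign patterns. It shows the point made above: changes that the regularizer cannot "see" (here, rescaling entries within the support without changing their signs) produce zero Bregman distance, while support changes are penalized.

```python
import numpy as np

def l1_subgradient(v):
    """A subgradient of J(u) = ||u||_1: sign(v), choosing 0 at v = 0."""
    return np.sign(v)

def bregman_distance_l1(u, v):
    """D_J^zeta(u, v) = J(u) - J(v) - <zeta, u - v> for J = ||.||_1, zeta = sign(v)."""
    zeta = l1_subgradient(v)
    return np.sum(np.abs(u)) - np.sum(np.abs(v)) - zeta @ (u - v)

def symmetric_bregman_distance_l1(u1, u2):
    """D_J^symm(u1, u2) = <u1 - u2, p1 - p2> with p_i = sign(u_i)."""
    return (u1 - u2) @ (l1_subgradient(u1) - l1_subgradient(u2))

u_exact = np.array([1.0, 0.0, 2.0, 0.0])
u_same_sign = np.array([0.5, 0.0, 3.0, 0.0])    # same support and signs as u_exact
u_new_support = np.array([1.0, 0.7, 2.0, 0.0])  # one additional nonzero entry

print(bregman_distance_l1(u_same_sign, u_exact))              # 0.0: invisible to J
print(bregman_distance_l1(u_new_support, u_exact))            # 0.7: support change is penalized
print(symmetric_bregman_distance_l1(u_new_support, u_exact))  # 0.7 as well
```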

2. Non-quadratic fidelities. In many applications fidelities different from the standard $L^2$-fidelity are considered, usually to incorporate different a priori knowledge on the distribution of the noise. Exemplary applications are Synthetic Aperture Radar, Positron Emission Tomography, or optical nanoscopy. In the following, we present three particular fidelities, for which we will derive specific estimates later on.

2.1. General norm fidelity. Typical non-quadratic fidelity terms are norms in general, i.e.,
$H_f(Ku) := \|Ku - f\|_{V(\Sigma)}$,
without taking a power of it. The corresponding variational problem is given via
(2.1) $\hat u \in \arg\min_{u \in W(\Omega)} \left\{ \|Ku - f\|_{V(\Sigma)} + \alpha J(u) \right\}$.
The optimality condition of (2.1) can be computed as
(2.2) $K^* \hat s + \alpha \hat p = 0, \quad \hat s \in \partial\|K\hat u - f\|_{V(\Sigma)}, \quad \hat p \in \partial J(\hat u)$.
In the following we want to present two special cases of this general class of fidelity terms that have been investigated in several applications.

2.1.1. $L^1$ fidelity. A typical non-quadratic, nondifferentiable fidelity term, used in applications involving Laplace-distributed or impulsive noise (e.g., salt-and-pepper noise), is the $L^1$-fidelity; see for instance [, , 3]. The related variational problem is given via
(2.3) $\hat u = \arg\min_{u \in W(\Omega)} \int_\Sigma \left| (Ku)(y) - f(y) \right| d\mu(y) + \alpha J(u)$.
The optimality condition of (2.3) can easily be computed as
$K^* \hat s + \alpha \hat p = 0, \quad \hat s \in \mathrm{sign}(K\hat u - f), \quad \hat p \in \partial J(\hat u)$,
with $\mathrm{sign}(x)$ being the signum function, i.e.,
$\mathrm{sign}(x) = \begin{cases} 1 & \text{for } x > 0, \\ [-1, 1] & \text{for } x = 0, \\ -1 & \text{for } x < 0. \end{cases}$

2.1.2. BV fidelity. In order to separate an image into texture and structure, Meyer proposed in [3] a modification of the ROF model via
$F(u, v) := \|v\|_{BV^*(\Omega)} + \lambda \sup_{q \in C_0^1(\Omega;\mathbb{R}^2),\, \|q\|_\infty \le 1} \int_\Omega u\, \mathrm{div}\, q \; dx$,
with respect to $u$ (structure) and $v$ (texture) for a given image $f = u + v$, and with the norm $\|\cdot\|_{BV^*(\Omega)}$ defined as
$\|w\|_{BV^*(\Omega)} := \inf_{p} \left\| \sqrt{p_1^2 + p_2^2} \right\|_{L^\infty(\Omega)}$ subject to $\mathrm{div}\, p = w$.
Initially this norm has been introduced as the $G$-norm. In this context, we are going to consider error estimates for the variational model
$\hat u \in \arg\min_{u \in W(\Omega)} \left\{ \|Ku - f\|_{BV^*} + \alpha J(u) \right\}$
with its corresponding optimality condition
$K^* \hat s + \alpha \hat p = 0, \quad \hat s \in \partial\|K\hat u - f\|_{BV^*}, \quad \hat p \in \partial J(\hat u)$.

2.2. Kullback-Leibler fidelity. In applications such as Positron Emission Tomography or optical nanoscopy, sampled data usually obey a Poisson process. For that reason, fidelities other than the $L^2$ fidelity have to be incorporated into the variational framework. The most popular fidelity in this context is the Kullback-Leibler divergence (cf. [4]), i.e.,
$H_f(Ku) = \int_\Sigma \left[ f(y) \ln\!\left( \frac{f(y)}{(Ku)(y)} \right) - f(y) + (Ku)(y) \right] d\mu(y)$.
Furthermore, due to the nature of the applications and their data, the function $u$ usually represents a density that needs to be positive. The related variational minimization problem with an additional positivity constraint therefore reads as
$\hat u \in \arg\min_{u \in W(\Omega),\, u \ge 0} \int_\Sigma \left[ f(y) \ln\!\left( \frac{f(y)}{(Ku)(y)} \right) - f(y) + (Ku)(y) \right] d\mu(y) + \alpha J(u)$.
With the natural scaling assumption $K^* 1 = 1$, we obtain the complementarity conditions
(2.4) $\hat u \ge 0, \quad K^*\!\left( 1 - \frac{f}{K\hat u} \right) + \alpha \hat p \ge 0, \quad \left\langle \hat u,\, K^*\!\left( 1 - \frac{f}{K\hat u} \right) + \alpha \hat p \right\rangle = 0, \quad \hat p \in \partial J(\hat u)$.

2.3. Multiplicative noise fidelity. In applications such as Synthetic Aperture Radar the data are supposed to be corrupted by multiplicative noise, i.e., $f = (Ku)\, v$, where $v$ represents the noise following a certain probability law and $Ku > 0$ is assumed. In [1], Aubert and Aujol assumed $v$ to follow a gamma law with mean one and derived the data fidelity
$H_f(Ku) = \int_\Sigma \left[ \ln\!\left( \frac{(Ku)(y)}{f(y)} \right) + \frac{f(y)}{(Ku)(y)} - 1 \right] d\mu(y)$.
Hence, the corresponding variational minimization problem reads as
(2.5) $\hat u \in \arg\min_{u \in W(\Omega)} \int_\Sigma \left[ \ln\!\left( \frac{(Ku)(y)}{f(y)} \right) + \frac{f(y)}{(Ku)(y)} - 1 \right] d\mu(y) + \alpha J(u)$,
with the formal optimality condition
$K^*\!\left( \frac{K\hat u - f}{(K\hat u)^2} \right) + \alpha \hat p = 0, \quad \hat p \in \partial J(\hat u)$.
One main drawback of (2.5) is that the fidelity term is not globally convex and therefore does not allow unconditional use of the general error estimates which we are going to derive in Section 3.

In order to convexify this speckle noise removal model, Huang et al. suggested in [9] the substitution $z(y) := \ln\big((Ku)(y)\big)$ to obtain the entirely convex optimization problem
(2.6) $\hat z = \arg\min_{z \in W(\Sigma)} \int_\Sigma \left[ z(y) + f(y) e^{-z(y)} - \ln(f(y)) - 1 \right] d\mu(y) + \alpha J(z)$,
with optimality condition
(2.7) $1 - f(y) e^{-\hat z(y)} + \alpha \hat p = 0$ for $\hat p \in \partial J(\hat z)$.
This model is a special case of the general multiplicative noise model presented in [3]. We mention that in the case of total variation regularization a contrast change as above is not harmful, since the structural properties (edges and piecewise constant regions) are preserved.

3. Results for general models. After introducing some frequently used non-quadratic variational schemes, we now present general error estimates for convex variational schemes. These basic estimates allow us to derive specific error estimates for the models presented in Section 2. Furthermore, we explore duality and obtain an error estimate depending on the convex conjugates of the fidelity and regularization terms. In order to derive estimates in the Bregman distance setting we need to introduce the so-called source condition
(SC) $\exists\, \xi \in \partial J(\tilde u),\ \exists\, q \in V^*(\Sigma) : \quad \xi = K^* q$.
As described in Section 1, the source condition (SC) in some sense ensures that a solution $\tilde u$ contains features that can be distinguished by the regularization term.

3.1. Basic estimates. In this section we derive basic error estimates in the Bregman distance measure for general variational regularization methods. To find a suitable solution of the inverse problem (1.2) close to the unknown exact solution $\tilde u$ of (1.1), we consider methods of the form (1.3). We denote a solution of (1.3), which fulfills the optimality condition due to the Karush-Kuhn-Tucker (KKT) conditions, by $\hat u$. First of all, we derive a rather general estimate for the Bregman distance $D_J^\xi(\hat u, \tilde u)$.

LEMMA 3.1. Let $\tilde u$ denote the exact solution of the inverse problem (1.1) and let the source condition (SC) be fulfilled. Furthermore, let the functional $J : W(\Omega) \to \mathbb{R}\cup\{+\infty\}$ be convex. If there exists a solution $\hat u$ that satisfies (1.3) for $\alpha > 0$, then the error estimate
$H_f(K\hat u) + \alpha D_J^\xi(\hat u, \tilde u) \le H_f(g) - \alpha \langle q, K\hat u - g \rangle_{V(\Sigma)}$
holds.

Proof. Since $\hat u$ is an existing minimal solution satisfying (1.3), we have
$H_f(K\hat u) + \alpha J(\hat u) \le H_f(K\tilde u) + \alpha J(\tilde u) = H_f(g) + \alpha J(\tilde u)$.
If we subtract $\alpha \big( J(\tilde u) + \langle \xi, \hat u - \tilde u \rangle_{U(\Omega)} \big)$ on both sides, we end up with
$H_f(K\hat u) + \alpha D_J^\xi(\hat u, \tilde u) \le H_f(g) - \alpha \langle \xi, \hat u - \tilde u \rangle_{U(\Omega)} = H_f(g) - \alpha \langle q, K\hat u - g \rangle_{V(\Sigma)}$,
where the last equality follows from $\xi = K^* q$ and $K\tilde u = g$.
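For orientation, the following short specialization (our own addition, not contained in the original text) shows how Lemma 3.1 recovers the classical estimate for the quadratic fidelity $H_f(v) = \frac12\|v - f\|^2_{L^2(\Sigma)}$ under the noise bound $\|f - g\|_{L^2(\Sigma)} \le \delta$; only the splitting $K\hat u - g = (K\hat u - f) + (f - g)$ and Young's inequality are used:
$$\tfrac12\|K\hat u - f\|^2_{L^2} + \alpha D_J^\xi(\hat u, \tilde u) \le \tfrac12\|g - f\|^2_{L^2} - \alpha\langle q, K\hat u - f\rangle - \alpha\langle q, f - g\rangle \le \tfrac12\delta^2 + \tfrac12\|K\hat u - f\|^2_{L^2} + \tfrac{\alpha^2}{2}\|q\|^2_{L^2} + \alpha\delta\|q\|_{L^2},$$
hence
$$D_J^\xi(\hat u, \tilde u) \le \frac{\big(\delta + \alpha\|q\|_{L^2}\big)^2}{2\alpha},$$
which is the well-known quadratic-fitting bound and yields the optimal order $O(\delta)$ for the parameter choice $\alpha \sim \delta$.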

Notice that $J$ needs to be convex in order to guarantee the non-negativity of $D_J^\xi(\hat u, \tilde u)$ and therefore to ensure a meaningful estimate. In contrast to that, the data fidelity $H_f$ does not necessarily need to be convex, which makes Lemma 3.1 a very general estimate. Furthermore, the estimate also holds for any $u$ for which we can guarantee
$H_f(Ku) + \alpha J(u) \le H_f(K\tilde u) + \alpha J(\tilde u)$,
a property that obviously might be hard to prove for a specific $u$, but which might be useful to study non-optimal approximations of $\hat u$. Nevertheless, we are mainly going to deal with a specific class of convex variational problems that allows us to derive sharper estimates, similar to Lemma 3.1 but for $D_J^{symm}(\hat u, \tilde u)$. Before we prove these estimates, we define the class of problems that we further want to investigate.

DEFINITION 3.2. We define the class $\mathcal{C}(\Phi, \Psi, \Theta)$ as follows: $(H, J, K) \in \mathcal{C}(\Phi, \Psi, \Theta)$ if
- $K : \Theta \to \Phi$ is a linear operator between the Banach spaces $\Theta$ and $\Phi$,
- $H : \Phi \to \mathbb{R}\cup\{+\infty\}$ is proper, convex, and lower semi-continuous,
- $J : \Psi \to \mathbb{R}\cup\{+\infty\}$ is proper, convex, and lower semi-continuous,
- there exists a $u_0$ with $Ku_0 \in \mathrm{dom}(H)$ and $u_0 \in \mathrm{dom}(J)$, such that $H$ is continuous at $Ku_0$.

With this definition we assume more regularity of the considered functionals and are now able to derive the same estimate as in Lemma 3.1, but for $D_J^{symm}(\hat u, \tilde u)$ instead of $D_J^\xi(\hat u, \tilde u)$.

THEOREM 3.3 (Basic Estimate I). Let $(H_f, J, K) \in \mathcal{C}(V(\Sigma), W(\Omega), U(\Omega))$, for compact and bounded sets $\Omega$ and $\Sigma$. Then, if the source condition (SC) is fulfilled, the error estimate
(3.1) $H_f(K\hat u) + \alpha D_J^{symm}(\hat u, \tilde u) \le H_f(g) - \alpha \langle q, K\hat u - g \rangle_{V(\Sigma)}$
holds.

Proof. Since $H_f$ and $J$ are convex, the optimality condition of (1.3) is given via
$0 \in \partial\big( H_f(K\hat u) + \alpha J(\hat u) \big)$.
Since both $H_f$ and $J$ are proper, lower semi-continuous, and convex, and since there exists $u_0$ with $Ku_0 \in \mathrm{dom}(H_f)$ and $u_0 \in \mathrm{dom}(J)$ such that $H_f$ is continuous at $Ku_0$, we have
$\partial\big( H_f(Ku) + \alpha J(u) \big) = \partial\big( H_f(Ku) \big) + \alpha\, \partial J(u)$ for all $u \in W(\Omega)$,
due to [, Section 5, Proposition 5.6]. Due to the linear mapping properties of $K$, we furthermore have
$\partial\big( H_f(Ku) \big) = K^* \partial H_f(Ku)$.
Hence, the equality $K^* \hat\eta + \alpha \hat p = 0$ holds for $\hat\eta \in \partial H_f(K\hat u)$ and $\hat p \in \partial J(\hat u)$. If we subtract $\alpha \xi$, with $\xi$ fulfilling (SC), and take the duality product with $\hat u - \tilde u$, we obtain
$\langle K^* \hat\eta, \hat u - \tilde u \rangle_{U(\Omega)} + \alpha \langle \hat p - \xi, \hat u - \tilde u \rangle_{U(\Omega)} = -\alpha \langle \xi, \hat u - \tilde u \rangle_{U(\Omega)}$,
which, using $\xi = K^* q$ and $K\tilde u = g$, equals
$\langle \hat\eta, K\hat u - g \rangle_{V(\Sigma)} + \alpha D_J^{symm}(\hat u, \tilde u) = -\alpha \langle q, K\hat u - g \rangle_{V(\Sigma)}$.

Since $H_f$ is convex, the Bregman distance $D_{H_f}^{\hat\eta}(g, K\hat u)$ is non-negative, i.e.,
$D_{H_f}^{\hat\eta}(g, K\hat u) = H_f(g) - H_f(K\hat u) - \langle \hat\eta, g - K\hat u \rangle_{V(\Sigma)} \ge 0$, for $\hat\eta \in \partial H_f(K\hat u)$.
Hence, we obtain
$\langle \hat\eta, K\hat u - g \rangle_{V(\Sigma)} \ge H_f(K\hat u) - H_f(g)$.
As a consequence, this yields (3.1).

We can further generalize the estimate of Theorem 3.3 to obtain the second important general estimate of this work.

THEOREM 3.4 (Basic Estimate II). Let $(H_f, J, K) \in \mathcal{C}(V(\Sigma), W(\Omega), U(\Omega))$ for compact and bounded sets $\Omega$ and $\Sigma$. Then, if the source condition (SC) is fulfilled, the error estimate
(3.2) $(1-c) H_f(K\hat u) + \alpha D_J^{symm}(\hat u, \tilde u) \le (1+c) H_f(g) + \big( -\alpha \langle q, f - g \rangle_{V(\Sigma)} - c H_f(g) \big) + \big( \alpha \langle q, f - K\hat u \rangle_{V(\Sigma)} - c H_f(K\hat u) \big)$
holds for $c \in\, ]0,1[$.

Proof. Due to Theorem 3.3, we have
$H_f(K\hat u) + \alpha D_J^{symm}(\hat u, \tilde u) \le H_f(g) - \alpha \langle q, K\hat u - g \rangle_{V(\Sigma)}$.
The left-hand side is equivalent to
$(1-c) H_f(K\hat u) + \alpha D_J^{symm}(\hat u, \tilde u) + c H_f(K\hat u)$,
while the right-hand side can be rewritten as
$(1+c) H_f(g) - \alpha \langle q, K\hat u - g \rangle_{V(\Sigma)} - c H_f(g)$
for $c \in\, ]0,1[$, without affecting the inequality. The dual product $\langle q, K\hat u - g \rangle_{V(\Sigma)}$ is equivalent to $\langle q, f + K\hat u - g - f \rangle_{V(\Sigma)}$, and hence we have
$-\alpha \langle q, K\hat u - g \rangle_{V(\Sigma)} = -\alpha \langle q, f - g \rangle_{V(\Sigma)} + \alpha \langle q, f - K\hat u \rangle_{V(\Sigma)}$.
Subtracting $c H_f(K\hat u)$ on both sides and replacing $-\alpha \langle q, K\hat u - g \rangle_{V(\Sigma)}$ as above yields (3.2).

In Section 4 these two basic estimates will allow us to easily derive specific error estimates for the noise models described in Section 2.

3.2. A dual perspective. In the following we provide a formal analysis in terms of Fenchel duality, which highlights a general way to obtain error estimates and provides further insight. In order to make the approach rigorous one needs to check detailed properties of all functionals allowing to pass to dual problems formally (cf. []), which is however not our goal here. In order to formulate the dual approach we redefine the fidelity to
$G_f(Ku - f) := H_f(Ku)$
and introduce the convex conjugates
$G_f^*(q) = \sup_{v \in V(\Sigma)} \big( \langle q, v \rangle_{V(\Sigma)} - G_f(v) \big), \qquad J^*(p) = \sup_{u \in W(\Omega)} \big( \langle p, u \rangle_{U(\Omega)} - J(u) \big)$

for $q \in V^*(\Sigma)$ and $p \in U^*(\Omega)$. Under appropriate conditions, the Fenchel duality theorem (cf. [, Chapter 3, Section 4]) implies the primal-dual relation
$\min_{u \in W(\Omega)} \left[ \frac{1}{\alpha} G_f(Ku - f) + J(u) \right] = -\min_{q \in V^*(\Sigma)} \left[ J^*(K^* q) - \langle q, f \rangle_{V(\Sigma)} + \frac{1}{\alpha} G_f^*(\alpha q) \right]$,
as well as a relation between the minimizers $\hat u$ of the primal and $\hat q$ of the dual problem, namely,
$K^* \hat q \in \partial J(\hat u), \quad \hat u \in \partial J^*(K^* \hat q)$.
More precisely, the optimality condition for the dual problem becomes
$K\hat u - f + r = 0, \quad r \in \partial G_f^*(\alpha \hat q)$.
If the exact solution $\tilde u$ satisfies a source condition with source element $d$ (i.e., $K^* d \in \partial J(\tilde u)$), then we can use the dual optimality condition and take the duality product with $\hat q - d$, which yields
$\langle K\hat u - K\tilde u, \hat q - d \rangle_{V(\Sigma)} = \frac{1}{\alpha} \langle r, \alpha d - \alpha \hat q \rangle_{V(\Sigma)} + \langle f - g, \hat q - d \rangle_{V(\Sigma)}$.
One observes that the left-hand side equals
$D_J^{symm}(\hat u, \tilde u) = \langle \hat u - \tilde u, K^* \hat q - K^* d \rangle_{U(\Omega)}$,
i.e., the Bregman distance we want to estimate. Using $r \in \partial G_f^*(\alpha \hat q)$, we find
$\langle r, \alpha d - \alpha \hat q \rangle_{V(\Sigma)} \le G_f^*(\alpha d) - G_f^*(\alpha \hat q)$.
Under the standard assumption $G_f(0) = 0$, we find that $G_f^*$ is nonnegative, and hence in the noise-free case $f = g$ we end up with the estimate
$D_J^{symm}(\hat u, \tilde u) \le \frac{1}{\alpha} G_f^*(\alpha d)$.
Hence the error in terms of $\alpha$ is determined by the properties of the convex conjugate of $G_f$. For typical smooth fidelities $G_f$ we have $G_f^*(0) = 0$ and $(G_f^*)'(0) = 0$. Hence $\frac{1}{\alpha} G_f^*(\alpha d)$ will at least grow linearly for small $\alpha$, as confirmed by our results below.

In the applications to specific noise models our strategy will be to estimate the terms on the right-hand side of (3.2) by quantities like $G_f^*(\alpha d)$ and then work out the detailed dependence on $\alpha$.

4. Application to specific noise models. We want to use the basic error estimates derived in Section 3 to derive specific error estimates for the noise models presented in Section 2. In the following it is assumed that the operator $K$ satisfies the conditions of Theorem 3.3 and Theorem 3.4.

4.1. General norm fidelity. With the use of Theorem 3.3 we can, in analogy to the error estimates for the exact penalization model in [8], obtain the following estimate for $H_f(Ku) := \|Ku - f\|_{V(\Sigma)}$, with $\hat u$ satisfying the optimality condition (2.2) and $\tilde u$ being the exact solution of (1.1).

THEOREM 4.1. Let $\hat u$ satisfy the optimality condition (2.2) and let $\tilde u$ denote the exact solution of (1.1). Furthermore, let the difference between the exact data $g$ and the noisy data $f$ be

bounded in the $V(\Sigma)$-norm, i.e., $\|f - g\|_{V(\Sigma)} \le \delta$, and let (SC) hold. Then, for the symmetric Bregman distance $D_J^{symm}(\hat u, \tilde u)$ for a specific regularization functional $J$, such that $(H_f, J, K) \in \mathcal{C}(V(\Sigma), W(\Omega), U(\Omega))$ is satisfied, the estimate
(4.1) $\big( 1 - \alpha \|q\|_{V^*(\Sigma)} \big) H_f(K\hat u) + \alpha D_J^{symm}(\hat u, \tilde u) \le \big( 1 + \alpha \|q\|_{V^*(\Sigma)} \big) \delta$
holds. Furthermore, for $\alpha < 1/\|q\|_{V^*(\Sigma)}$, we obtain
(4.2) $D_J^{symm}(\hat u, \tilde u) \le \delta \left( \frac{1}{\alpha} + \|q\|_{V^*(\Sigma)} \right)$.

Proof. Since $(H_f, J, K) \in \mathcal{C}(V(\Sigma), W(\Omega), U(\Omega))$, Theorem 3.3 yields
$H_f(K\hat u) + \alpha D_J^{symm}(\hat u, \tilde u) \le H_f(g) - \alpha \langle q, K\hat u - g \rangle_{V(\Sigma)}$
$\le \delta - \alpha \big( \langle q, K\hat u - f \rangle_{V(\Sigma)} + \langle q, f - g \rangle_{V(\Sigma)} \big)$
$\le \delta + \alpha \|q\|_{V^*(\Sigma)} \big( \|K\hat u - f\|_{V(\Sigma)} + \|f - g\|_{V(\Sigma)} \big)$
$\le \delta + \alpha \|q\|_{V^*(\Sigma)} \|K\hat u - f\|_{V(\Sigma)} + \alpha \|q\|_{V^*(\Sigma)} \delta$,
which, inserting $H_f(K\hat u) = \|K\hat u - f\|_{V(\Sigma)}$, leads us to (4.1). If we set $\alpha < 1/\|q\|_{V^*(\Sigma)}$, the first term on the left-hand side of (4.1) is non-negative and can be dropped; dividing by $\alpha$ then yields (4.2).

As expected from the dual perspective above, we obtain in the case of exact data ($\delta = 0$), for $\alpha$ sufficiently small,
$D_J^{symm}(\hat u, \tilde u) = 0, \quad H_g(K\hat u) = 0$.
For larger $\alpha$ no useful estimate is obtained. In the noisy case we can choose $\alpha$ small but independent of $\delta$ and hence obtain $D_J^{symm}(\hat u, \tilde u) = O(\delta)$.

We remark on the necessity of the source condition (SC). In usual converse results one proves that a source condition needs to hold if the distance between the reconstruction and the exact solution satisfies a certain asymptotic in $\delta$; cf. [5]. Such results so far exist only for quadratic fidelity and special regularizations and cannot be expected for general Bregman distance estimates, even less so with non-quadratic fidelity models. We shall therefore only look at the asymptotics of $H_f$ in the noise-free case and argue that for this asymptotic the source condition is necessary, at least in some sense. In the case of a general norm fidelity this is particularly simple due to the asymptotic exactness for $\alpha$ small. The optimality condition $K^* \hat s + \alpha \hat p = 0$ can be rewritten as
$\hat p = K^* q, \quad \hat p \in \partial J(\hat u), \quad q \in V^*(\Sigma)$,
with $q = -\hat s / \alpha$. Since $\hat u$ is a solution minimizing $J$ for $\alpha$ sufficiently small, we see that if the asymptotic in $\alpha$ holds, there exists a solution of $Ku = g$ with minimal $J$ satisfying (SC).

4.2. Poisson noise. In the case of Poisson noise the source condition can be written as
(SC$_{L^1}$) $\exists\, \xi \in \partial J(\tilde u),\ \exists\, q \in L^\infty(\Sigma) : \quad \xi = K^* q$,
and we have the Kullback-Leibler fidelity
$H_f(Ku) = \int_\Sigma \left[ f(y) \ln\!\left( \frac{f(y)}{(Ku)(y)} \right) - f(y) + (Ku)(y) \right] d\mu(y)$
together with a positivity constraint $u \ge 0$. Theorem 3.4 will allow us to derive an error estimate of the same order as known for quadratic fidelities. Before that, we have to prove the following lemma.

LEMMA 4.2. Let $\alpha$ and $\varphi$ be positive real numbers, i.e., $\alpha, \varphi \in \mathbb{R}^+$. Furthermore, let $\gamma \in \mathbb{R}$ be a real number and $c \in\, ]0,1[$. Then the family of functions
$h_n(x) := (-1)^n \alpha\gamma (\varphi - x) - c \left( \varphi \ln\!\left( \frac{\varphi}{x} \right) - \varphi + x \right)$, for $x > 0$ and $n \in \mathbb{N}$,
is strictly concave, with unique maxima at
$x_{h_n} = \frac{\varphi}{1 + (-1)^n \frac{\alpha}{c}\gamma}$.
They are therefore bounded by
$h_n(x) < h_n(x_{h_n}) = (-1)^n \alpha\gamma\varphi - c\varphi \ln\!\left( 1 + (-1)^n \frac{\alpha}{c}\gamma \right)$
for $\frac{\alpha}{c}|\gamma| < 1$ and $x \ne x_{h_n}$.

Proof. It is easy to see that $h_n''(x) = -\frac{c\varphi}{x^2} < 0$ and, hence, $h_n$ is strictly concave for all $n \in \mathbb{N}$. The unique maxima $x_{h_n}$ can be computed via $h_n'(x_{h_n}) = 0$. Finally, since $h_n$ is strictly concave for all $n \in \mathbb{N}$, $h_n(x_{h_n})$ has to be a global maximum.

Furthermore, we have to ensure the existence of $u_0$ with $Ku_0 \in \mathrm{dom}(H_f)$ and $u_0 \in \mathrm{dom}(J)$, such that $H_f$ is continuous at $Ku_0$. If, e.g., $\mathrm{dom}(J) = BV(\Omega)$, we do not obtain continuity of $H_f$ at $Ku_0$ if $K$ maps to, e.g., $L^1(\Sigma)$. Therefore, we restrict $K$ to map to $L^\infty(\Sigma)$. However, we still keep (SC$_{L^1}$) to derive the error estimates, which corresponds to an interpretation of $K$ mapping to $L^1(\Sigma)$. This implies more regularity than needed, since one usually uses $q$ in the dual of the image space, which would mean $q \in (L^\infty(\Sigma))^*$. For the latter we are not able to derive the same estimates. Note, however, that the assumption of $K$ mapping to $L^\infty(\Sigma)$ is used only to deal with the positivity of $Ku$.

With the help of Lemma 4.2 and this restriction on $K$ we are able to prove the following error estimate.

THEOREM 4.3. Let $\hat u$ satisfy the optimality condition (2.4) with $K : U(\Omega) \to L^\infty(\Sigma)$ satisfying $\mathcal{N}(K) = \{0\}$, let $\tilde u$ denote the exact solution of (1.1), and let $f$ be a probability density, i.e., $\int_\Sigma f \, d\mu(y) = 1$. Assume that the difference between the noisy data $f$ and the exact data $g$ is bounded in the Kullback-Leibler measure, i.e.,
$\int_\Sigma \left[ f \ln\!\left( \frac{f}{g} \right) - f + g \right] d\mu(y) \le \delta$,
and that (SC$_{L^1}$) holds. Then, for $c \in\, ]0,1[$ and $\alpha < \frac{c}{\|q\|_{L^\infty(\Sigma)}}$, the symmetric Bregman distance $D_J^{symm}(\hat u, \tilde u)$ for a specific regularization functional $J$, such that $(H_f, J, K) \in \mathcal{C}(L^\infty(\Sigma), W(\Omega), U(\Omega))$

is satisfied, is bounded via
(4.3) $(1-c) H_f(K\hat u) + \alpha D_J^{symm}(\hat u, \tilde u) \le (1+c)\delta - c \ln\!\left( 1 - \frac{\alpha}{c} \|q\|_{L^\infty(\Sigma)} \right)$.

Proof. We have $(H_f, J, K) \in \mathcal{C}(L^\infty(\Sigma), W(\Omega), U(\Omega))$. Using an analogous proof as for Theorem 3.4, with the non-negativity of $\hat u$ being incorporated in a variational inequality, we can still derive (3.2) in this case. Hence, we have to investigate the terms
$-\alpha \langle q, f - g \rangle_{L^1(\Sigma)} - c H_f(g)$ and $\alpha \langle q, f - K\hat u \rangle_{L^1(\Sigma)} - c H_f(K\hat u)$.
If we consider both functionals pointwise and require $\frac{\alpha}{c}\|q\|_{L^\infty(\Sigma)} < 1$, then we can use Lemma 4.2 (with $\varphi = f$, $\gamma = q$, and $x = g$, respectively $x = K\hat u$) to estimate
$-\alpha \langle q, f - g \rangle_{L^1(\Sigma)} - c H_f(g) \le \int_\Sigma \left[ -\alpha q f - c f \ln\!\left( 1 - \frac{\alpha}{c} q \right) \right] d\mu(y)$
and
$\alpha \langle q, f - K\hat u \rangle_{L^1(\Sigma)} - c H_f(K\hat u) \le \int_\Sigma \left[ \alpha q f - c f \ln\!\left( 1 + \frac{\alpha}{c} q \right) \right] d\mu(y)$.
Adding these terms together yields the estimate
$(1-c) H_f(K\hat u) + \alpha D_J^{symm}(\hat u, \tilde u) \le (1+c) H_f(g) - c \int_\Sigma f \left[ \ln\!\left( 1 - \frac{\alpha}{c} q \right) + \ln\!\left( 1 + \frac{\alpha}{c} q \right) \right] d\mu(y)$,
with $H_f(g) \le \delta$. It is easy to see that for $\alpha < \frac{c}{\|q\|_{L^\infty(\Sigma)}}$ we have
$\ln\!\left( 1 - \frac{\alpha}{c} q \right) + \ln\!\left( 1 + \frac{\alpha}{c} q \right) = \ln\!\left( 1 - \frac{\alpha^2}{c^2} q^2 \right) \ge \ln\!\left( 1 - \frac{\alpha}{c}\|q\|_{L^\infty(\Sigma)} \right)$.
Hence, for positive $f$ we obtain
$(1-c) H_f(K\hat u) + \alpha D_J^{symm}(\hat u, \tilde u) \le (1+c)\delta - c \ln\!\left( 1 - \frac{\alpha}{c}\|q\|_{L^\infty(\Sigma)} \right) \int_\Sigma f \, d\mu(y)$,
and, since $\int_\Sigma f \, d\mu(y) = 1$, the estimate (4.3) holds.

One observes from a Taylor approximation of the second term on the right-hand side of (4.3) around $\alpha = 0$ that
$H_f(K\hat u) = O(\delta + \alpha), \quad D_J^{symm}(\hat u, \tilde u) = O\!\left( \frac{\delta}{\alpha} + \alpha \right)$
for small $\alpha$, which is analogous to the quadratic case.

REMARK 4.4. The assumption $\mathcal{N}(K) = \{0\}$ is very strict. If $\mathcal{N}(K)$ is larger, the error estimate is still satisfied, since $H_f$ then is convex (no longer strictly convex) and the terms $-\alpha\langle q, f - g\rangle_{L^1(\Sigma)} - cH_f(g)$ and $\alpha\langle q, f - K\hat u\rangle_{L^1(\Sigma)} - cH_f(K\hat u)$ are concave instead of strictly concave. Hence, Lemma 4.2 can still be applied to find an upper estimate; the only difference is that there can be more than just one maximum.
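Note that the noise level $\delta$ in Theorem 4.3 is measured in the Kullback-Leibler sense rather than in a norm. The following small helper (our own illustration; the function name, array shapes, and the Poisson test data are choices made here, not prescribed by the paper) evaluates this quantity for discrete, nonnegative data, which is how $\delta$ would be computed in experiments like those of Section 6.

```python
import numpy as np

def kl_divergence(f, g):
    """Kullback-Leibler type distance  sum_y [ f ln(f/g) - f + g ]
    between nonnegative arrays f and g (g > 0 assumed).
    Entries with f = 0 contribute just g, since x ln x -> 0 as x -> 0."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    ratio = np.where(f > 0, f / g, 1.0)
    out = np.where(f > 0, f * np.log(ratio) - f + g, g)
    return out.sum()

# toy check: the noise level delta for Poisson-perturbed data
rng = np.random.default_rng(0)
g = rng.uniform(1.0, 5.0, size=(32, 32))
f = rng.poisson(g).astype(float)
print("KL noise level delta =", kl_divergence(f, g))
```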

4.3. Multiplicative noise. In the case of multiplicative noise we are going to examine model (2.6) instead of (2.5), since (2.6) is convex for all $z$ and therefore allows the application of Theorem 3.4. The source condition differs slightly, since there is no operator in that type of model. Therefore, we get
(zSC$_{L^1}$) $\exists\, \xi \in \partial J(\tilde z),\ \exists\, q \in L^\infty(\Sigma) : \quad \xi = q$.
In analogy to the Poisson case, we first have to prove the following lemma in order to derive qualitative and quantitative error estimates in the case of multiplicative noise.

LEMMA 4.5. Let $\alpha$ and $\varphi$ be positive real numbers, i.e., $\alpha, \varphi \in \mathbb{R}^+$. Furthermore, let $\gamma \in \mathbb{R}$ be a real number and $c \in\, ]0,1[$. Then the family of functions
$k_n(x) := (-1)^n \alpha\gamma \big( \ln(\varphi) - x \big) - c \big( x + \varphi e^{-x} - \ln(\varphi) - 1 \big)$, for $x \in \mathbb{R}$ and $n \in \mathbb{N}$,
is strictly concave, and for $\frac{\alpha}{c}|\gamma| < 1$ each $k_n$ has its unique maximum at
$x_{k_n} = \ln\!\left( \frac{c\varphi}{c + (-1)^n \alpha\gamma} \right)$.
Hence, $k_n$ is bounded via
$k_n(x) < k_n(x_{k_n}) = \big( c + (-1)^n \alpha\gamma \big) \ln\!\left( \frac{c + (-1)^n \alpha\gamma}{c} \right) - (-1)^n \alpha\gamma$
for $x \ne x_{k_n}$.

Proof. Similarly to Lemma 4.2, it can easily be shown that $k_n''(x) = -c\varphi e^{-x} < 0$ for all $x$ and hence the $k_n$ are strictly concave for all $n \in \mathbb{N}$. The maximizers $x_{k_n}$ are computed from $k_n'(x_{k_n}) = 0$. Since the $k_n$ are strictly concave, $k_n(x_{k_n})$ has to be a global maximum for all $n \in \mathbb{N}$.

With the help of Lemma 4.5, we are able to prove the following error estimate.

THEOREM 4.6. Let $\hat z$ satisfy the optimality condition (2.7) and let $\tilde z$ denote the solution of $\tilde z = \ln(K\tilde u) = \ln(g)$, with $\tilde u$ being the exact solution of (1.1). Assume that the difference between the noisy data $f$ and the exact data $g$ is bounded in the measure of (2.5), i.e.,
$\int_\Sigma \left[ \ln\!\left( \frac{g}{f} \right) + \frac{f}{g} - 1 \right] d\mu(y) \le \delta$,
and that (zSC$_{L^1}$) holds. Then, for $c \in\, ]0,1[$ and $\alpha < c/\|q\|_{L^\infty(\Sigma)}$, the symmetric Bregman distance $D_J^{symm}(\hat z, \tilde z)$ for a specific regularization functional $J$, such that $(H_f, J, \mathrm{Id}) \in \mathcal{C}(L^\infty(\Sigma), W(\Sigma), U(\Sigma))$

is satisfied, is bounded via
(4.4) $(1-c) H_f(\hat z) + \alpha D_J^{symm}(\hat z, \tilde z) \le (1+c)\delta + \alpha \|q\|_{L^\infty(\Sigma)} \ln\!\left( \frac{c + \alpha\|q\|_{L^\infty(\Sigma)}}{c - \alpha\|q\|_{L^\infty(\Sigma)}} \right)$.

Proof. First of all, we have $(H_f, J, \mathrm{Id}) \in \mathcal{C}(L^\infty(\Sigma), W(\Sigma), U(\Sigma))$, and there exists $z_0 \in \mathrm{dom}(H_f) \cap \mathrm{dom}(J)$ such that $H_f$ is continuous at $z_0$. Hence, we can apply Theorem 3.4 to obtain (3.2). Therefore, we have to consider the functionals
$-\alpha \langle q, \ln f - \tilde z \rangle_{L^1(\Sigma)} - c H_f(\tilde z)$ and $\alpha \langle q, \ln f - \hat z \rangle_{L^1(\Sigma)} - c H_f(\hat z)$
pointwise. Due to Lemma 4.5 (applied with $\varphi = f$, $\gamma = q$, and $x = \tilde z$, respectively $x = \hat z$) the sum of both terms is, for $\alpha < c/\|q\|_{L^\infty(\Sigma)}$, bounded by
$\int_\Sigma \left[ c \ln\!\left( 1 - \frac{\alpha^2}{c^2} q^2 \right) + \alpha q \ln\!\left( \frac{c + \alpha q}{c - \alpha q} \right) \right] d\mu(y)$.
It is easy to see that
$\int_\Sigma \alpha q \ln\!\left( \frac{c + \alpha q}{c - \alpha q} \right) d\mu(y) \le \alpha \|q\|_{L^\infty(\Sigma)} \ln\!\left( \frac{c + \alpha\|q\|_{L^\infty(\Sigma)}}{c - \alpha\|q\|_{L^\infty(\Sigma)}} \right)$.
Furthermore, it easily can be verified that the function $l(x) := \ln\!\left( 1 - \frac{\alpha^2}{c^2} x^2 \right)$ is strictly concave and has its unique global maximum $l(x) = 0$ at $x = 0$. Hence, considering the logarithm pointwise, $\int_\Sigma c \ln\!\left( 1 - \frac{\alpha^2}{c^2} q^2 \right) d\mu(y) \le 0$ has to hold. Inserting these estimates, together with $(1+c)H_f(\tilde z) \le (1+c)\delta$, into (3.2) yields (4.4).

Again, a Taylor approximation of the second term on the right-hand side of (4.4) around $\alpha = 0$ yields the asymptotic behaviour
$H_f(\hat z) = O(\delta + \alpha), \quad D_J^{symm}(\hat z, \tilde z) = O\!\left( \frac{\delta}{\alpha} + \alpha \right)$.

5. A-posteriori parameter choice. Before we start discussing computational aspects and examples, we want to briefly consider a-posteriori parameter choices for variational problems. A typical a-posteriori parameter choice rule is the discrepancy principle. For a general norm fidelity $\|Ku - f\|_{V(\Sigma)}$ the discrepancy principle states that, for a given noise bound $\|f - g\|_{V(\Sigma)} \le \delta$, the solution $\hat u$ of a regularized variational problem should satisfy $\|K\hat u - f\|_{V(\Sigma)} \le \delta$, i.e.,
(5.1) $\hat u \in \arg\min_{u \in W(\Omega)} J(u)$,
subject to
(5.2) $\|Ku - f\|_{V(\Sigma)} \le \delta$.
We can reformulate this problem as
(5.3) $\hat u \in \arg\min_{u \in W(\Omega)} \left\{ \mathcal{X}_\delta\big( \|Ku - f\|_{V(\Sigma)} \big) + J(u) \right\}$

with $\mathcal{X}_\delta$ being the characteristic function
$\mathcal{X}_\delta(v) := \begin{cases} 0 & \text{if } v \le \delta, \\ +\infty & \text{else.} \end{cases}$
With the use of the triangle inequality of the norm and the monotonicity and convexity of the characteristic function, it easily can be seen that $\mathcal{X}_\delta(\|Ku - f\|_{V(\Sigma)})$ is convex, and by setting $H_f(Ku) = \mathcal{X}_\delta(\|Ku - f\|_{V(\Sigma)})$ we can apply Lemma 3.1 to obtain the following theorem.

THEOREM 5.1. Let $\tilde u$ denote the exact solution of (1.1) and let the source condition (SC) be fulfilled. If there exists a minimal solution $\hat u$ satisfying (5.1) subject to (5.2), and if $\|f - g\|_{V(\Sigma)}$ is also bounded by $\delta$, then the error estimate
$D_J^\xi(\hat u, \tilde u) \le 2\delta \|q\|_{V^*(\Sigma)}$
holds.

Proof. If we apply Lemma 3.1 to the variational problem (5.3), we obtain
$\mathcal{X}_\delta(\|K\hat u - f\|_{V(\Sigma)}) + D_J^\xi(\hat u, \tilde u) \le \mathcal{X}_\delta(\|f - g\|_{V(\Sigma)}) - \langle q, K\hat u - g \rangle_{V(\Sigma)}$,
where both characteristic function terms vanish since $\|K\hat u - f\|_{V(\Sigma)} \le \delta$ and $\|f - g\|_{V(\Sigma)} \le \delta$. Hence,
$D_J^\xi(\hat u, \tilde u) \le -\langle q, K\hat u - f \rangle_{V(\Sigma)} - \langle q, f - g \rangle_{V(\Sigma)} \le \|q\|_{V^*(\Sigma)} \big( \|K\hat u - f\|_{V(\Sigma)} + \|f - g\|_{V(\Sigma)} \big) \le 2\delta \|q\|_{V^*(\Sigma)}$.

REMARK 5.2. Obviously, a discrepancy principle can also be considered for general fidelities, not only for norm fidelities, i.e., we may replace $\|Ku - f\|_{V(\Sigma)}$ in (5.2) by a general fidelity $H_f(Ku)$. In that case we can apply the same convex-combination trick as in Theorem 3.4 and subsequently compute estimates for $-\alpha\langle q, f - g\rangle_{V(\Sigma)} - cH_f(g)$ and $\alpha\langle q, f - K\hat u\rangle_{V(\Sigma)} - cH_f(K\hat u)$, as in Lemma 4.2 and Lemma 4.5, to obtain error estimates for $D_J^\xi(\hat u, \tilde u)$.

6. Computational tests. In the following we present some numerical results validating the practical applicability of the error estimates as well as their sharpness. We shall focus on two models, namely the $L^1$-fidelity, due to the interesting asymptotic exactness, and the Kullback-Leibler fidelity, due to its practical importance.

6.1. Laplace noise. In order to validate the asymptotic exactness (or non-exactness) in the case of Laplacian noise, we investigate a denoising approach with quadratic regularization, i.e., the minimization
(6.1) $\int_\Omega |u - f| \, dx + \frac{\alpha}{2} \int_\Omega \big( |\nabla u|^2 + u^2 \big) dx \to \min_{u \in H^1(\Omega)}$,
whose optimality condition is
$-\alpha \Delta u + \alpha u + s = 0, \quad s \in \mathrm{sign}(u - f)$.

FIG. 6.1. The Bregman distance error between $\hat u$ and $g$ for $\alpha \in [10^{-3}, 1]$. As soon as $\alpha$ drops below the threshold $1/\|q\|_{L^\infty}$ predicted by Theorem 4.1, the error equals zero.

A common approach to the numerical minimization of functionals such as (6.1) is a smooth approximation of the $L^1$-norm, e.g., by replacing $|u - f|$ with $\sqrt{|u - f|^2 + \epsilon}$ for small $\epsilon$. Such an approximation will however alter the asymptotic properties and destroy the possibility to have asymptotic exactness. Hence we shall use a dual approach as an alternative, which we derive from the dual characterization of the one-norm,
$\inf_u \left[ \int_\Omega |u - f| \, dx + \frac{\alpha}{2} \int_\Omega \big( |\nabla u|^2 + u^2 \big) dx \right] = \inf_u \sup_{|s| \le 1} \left[ \int_\Omega (u - f)\, s \, dx + \frac{\alpha}{2} \int_\Omega \big( |\nabla u|^2 + u^2 \big) dx \right] = \sup_{|s| \le 1} \inf_u \left[ \int_\Omega (u - f)\, s \, dx + \frac{\alpha}{2} \int_\Omega \big( |\nabla u|^2 + u^2 \big) dx \right]$.
Exchanging infimum and supremum in the last formula can easily be justified with standard methods in convex analysis; cf. []. The infimum can be calculated exactly by solving
$-\alpha \Delta u + \alpha u = -s$
with homogeneous Neumann boundary conditions, and hence we obtain after simple manipulation the dual problem
$\int_\Omega s\, A^{-1} s \, dx + 2\alpha \int_\Omega f s \, dx \to \min_{s \in L^\infty(\Omega),\ |s| \le 1}$,
with the notation $A := -\Delta + I$ (subject to homogeneous Neumann boundary conditions). This bound-constrained quadratic problem can be solved with efficient methods; we simply use a projected gradient approach, i.e.,
$s^{k+1} = \mathcal{P}\big( s^k - \tau ( A^{-1} s^k + \alpha f ) \big)$,
where $\tau > 0$ is a damping parameter and $\mathcal{P}$ is the pointwise projection operator
$(\mathcal{P}v)(x) = \begin{cases} v(x) & \text{if } |v(x)| \le 1, \\ \frac{v(x)}{|v(x)|} & \text{else.} \end{cases}$
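The following one-dimensional sketch (our own; grid size, step size, iteration count, and the finite-difference Neumann Laplacian are choices made here, not taken from the paper) implements the projected gradient iteration above and recovers the primal solution via $\hat u = -\frac{1}{\alpha} A^{-1} \hat s$.

```python
import numpy as np

def l1_h1_denoise(f, alpha, h, tau=0.2, iters=2000):
    """Dual projected gradient for  min_u  int |u-f| + (alpha/2) int (|u'|^2 + u^2),
    via  min_{|s|<=1}  <s, A^{-1} s> + 2*alpha*<f, s>  with  A = -Laplacian + I (Neumann)."""
    n = f.size
    # finite-difference Laplacian with homogeneous Neumann boundary conditions
    L = np.zeros((n, n))
    for i in range(n):
        L[i, i] = -2.0
        L[i, max(i - 1, 0)] += 1.0
        L[i, min(i + 1, n - 1)] += 1.0
    A = np.eye(n) - L / h**2          # A = I - Laplacian
    A_inv = np.linalg.inv(A)
    s = np.zeros(n)
    for _ in range(iters):
        grad = A_inv @ s + alpha * f
        s = np.clip(s - tau * grad, -1.0, 1.0)   # pointwise projection onto |s| <= 1
    u = -A_inv @ s / alpha                        # primal solution: alpha * A * u = -s
    return u, s

# toy run: noise-free data g(x) = cos(x) on [0, 2*pi]
n = 200
x = np.linspace(0.0, 2 * np.pi, n)
h = x[1] - x[0]
g = np.cos(x)
u_hat, _ = l1_h1_denoise(g, alpha=0.1, h=h)
print("discrete L2 error:", np.sqrt(np.sum((u_hat - g) ** 2) * h))
```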

Due to the quadratic $H^1$ regularization, we obtain
(6.2) $D_{H^1}^{symm}(\hat u, g) = 2\, D_{H^1}(\hat u, g) = \|\hat u - g\|_{H^1(\Omega)}^2$,
and the source condition becomes
(SC$_{H^1}$) $q(x) = -g''(x) + g(x)$ for $x \in \Omega$, $\quad \frac{\partial g}{\partial n}(x) = 0$ for $x \in \partial\Omega$, and $q \in L^\infty(\Omega)$.
In the following we present two examples and their related error estimates.

EXAMPLE 6.1. For our first data example we choose $g(x) = \cos(x)$, for $x \in [0, 2\pi]$. Since $g \in C^\infty([0, 2\pi])$ and $g'(0) = g'(2\pi) = 0$, the source condition (SC$_{H^1}$) is fulfilled. Hence, the error estimates derived in Section 4 should apply. First of all we check (4.1) and (4.2) numerically for noise-free data, i.e., $f = g$ and $\delta = 0$. The estimates predict that as soon as $\alpha < 1/\|q\|_{L^\infty([0, 2\pi])}$ holds (here $q(x) = -g''(x) + g(x) = 2\cos(x)$), the regularized solution $\hat u$ should be identical to the exact solution $g$ in the Bregman distance setting (6.2). This is also found in computational practice, as Figure 6.1 confirms.

In the following we want to illustrate the sharpness of (4.2) in the case of non-zero $\delta$. For this reason, we have generated Laplace-distributed random variables and have added them to $g$ to obtain $f$. We have generated random variables with different values for the variance of the Laplace distribution to obtain different noise levels $\delta$ in the $L^1$-measure. Figure 6.2 shows $g$ and an exemplary noisy version of $g$.

FIG. 6.2. Exact $g(x) = \cos(x)$ and a noisy version $f(x)$, corrupted by Laplace noise with mean zero.

In the following, we computed $\delta$ as the $L^2$-norm over $[0, 2\pi]$, in order to adjust the dimension of $\delta$ to the $H^1$-norm used in the above example. In order to validate (4.2) we produced many noisy functions $f$ with different noise levels $\delta$. For five fixed values of $\alpha$ (among them $\alpha = 0.4$, $\alpha = 0.5$, and $\alpha = 0.6$) we have plotted the symmetric Bregman distances between the regularized solutions $\hat u$ and $g$, the regression line of these distances, and the error bound given via (4.2); the results can be seen in Figure 6.3. It can be observed that for the two smallest values of $\alpha$ the computed Bregman distances lie beneath that bound, while for $\alpha = 0.5$, $\alpha = 0.6$, and the largest value of $\alpha$ the error bound is violated, which seems to be a good indicator of the sharpness of (4.2).

FIG. 6.3. The computed symmetric Bregman distances for five fixed values of $\alpha$, plotted against the noise level $\delta$, together with the error bound $\delta(1/\alpha + \|q\|)$ of (4.2) and a regression line of the distances. In panels (a) and (b) the computed Bregman distances lie beneath the error bound derived in (4.2), while the distances in (c), (d), and (e) partly violate this bound. Panel (f) shows the logarithmically scaled error bound in comparison to the logarithmically scaled regression line of the Bregman distances for the smallest $\alpha$. It can be observed that the slope of the regression line is smaller than the slope of the error bound. Hence, for the particular choice of $g(x) = \cos(x)$ there might exist an even stronger error bound than (4.2).

EXAMPLE 6.2. In order to validate the need for the source condition (SC$_{H^1}$) we want to consider two more examples: $g_1(x) = \sin(x)$ and $g_2(x) = |x - \pi|$, $x \in [0, 2\pi]$. Both functions violate (SC$_{H^1}$): $g_1$ does not fulfill the Neumann boundary conditions, while the second derivative of $g_2$ is a $\delta$-distribution centered at $\pi$ and therefore not integrable. In the case of $g_2$ there does not exist a $q$ such that there could exist an $\alpha$ to guarantee (4.2). Nevertheless, in order to visualize that there exists no such error bound, we want to introduce

a reference bound $\delta\left( \frac{1}{\alpha} + \|w\|_{L^\infty([0, 2\pi])} \right)$, with $w(x) := -g_2''(x) + g_2(x)$ for $x \in [0, \pi[\, \cup\, ]\pi, 2\pi]$, which yields $\|w\|_{L^\infty([0, 2\pi])} = \pi$. As in Example 6.1 we want to begin with the case of exact data, i.e., $f = g$. If we plot the symmetric Bregman distance against $\alpha$, we obtain the graphs displayed in Figure 6.4. It can be seen that for $g_1$ as well as for $g_2$ the error tends to zero only if $\alpha$ gets very small.

FIG. 6.4. The symmetric Bregman distances $D_{H^1}^{symm}(\hat u, g_1)$ (a) and $D_{H^1}^{symm}(\hat u, g_2)$ (b), for $\alpha \in [10^{-3}, 1]$.

To illustrate the importance of the source condition in the noisy case with non-zero $\delta$, we have proceeded as in Example 6.1. We generated Laplace-type noise and added it to $g_1$ and $g_2$ to obtain $f_1$ and $f_2$ for different error values $\delta$. Figure 6.5 shows the Bregman distance error in comparison to the error bound given via (4.2) and in comparison to the reference bound described above, respectively. It can be seen that, in contrast to Example 6.1, the error and reference bounds are completely violated, even for small $\alpha$. Furthermore, in the worst case of $g_2$ for the largest $\alpha$ considered, the slope of the logarithmically scaled regression line is equal to the slope of the reference bound, which indicates that the error will presumably never get beyond this reference bound in general. The results support the need for the source condition to obtain quantitative error estimates.

FIG. 6.5. The computed Bregman distances for violated (SC$_{H^1}$). Panels (a) and (b) show the Bregman distances $D_{H^1}^{symm}(\hat u, g_1)$ for $\alpha = 0.4$ and $\alpha = 0.6$, respectively. Panels (c) and (d) represent the Bregman distances $D_{H^1}^{symm}(\hat u, g_2)$ for $\alpha = 0.4$ and the largest $\alpha$. Furthermore, panels (e) and (f) show the logarithmically scaled versions of the error bound and the reference bound in comparison to a regression line of the Bregman distances.

6.2. Compressive sensing with Poisson noise. For the validation of the error estimates in the Poisson case we consider the following discrete inverse problem. Given a two-dimensional function $u$ at discrete sampling points, i.e., $u = (u_{i,j})_{i=1,\dots,n,\ j=1,\dots,m}$, we investigate the operator $K : \ell^2(\Omega) \to \ell^2(\Sigma)$ and the operator equation $Ku = g$ with
$g_{i,j} = \sum_{k=1}^n \varphi_{i,k}\, u_{k,j}$, for $i = 1, \dots, l$, $j = 1, \dots, m$,
where the $\varphi_{i,k} \in [0,1]$ are uniformly distributed random numbers, $\Omega = \{1, \dots, n\} \times \{1, \dots, m\}$, $\Sigma = \{1, \dots, l\} \times \{1, \dots, m\}$, and $l \ll n$, such that $K$ has a large nullspace. Furthermore, we consider $f$ instead of $g$, with $f$ being corrupted by Poisson noise.

6.2.1. Sparsity regularization. In the case of sparsity regularization, we assume $u$ to have a sparse representation with respect to a certain basis. Therefore, we consider an operator $B : \ell^2(\Theta) \to \ell^2(\Omega)$ such that $u = Bc$ holds for coefficients $c \in \ell^2(\Theta)$. If we want to apply a regularization that minimizes the $\ell^1$-norm of the coefficients, we obtain the minimization problem
(6.3) $\int_\Sigma \left[ f \ln\!\left( \frac{f}{KBc} \right) - f + KBc \right] d\mu(y) + \alpha \sum_{i,j} |c_{i,j}| \to \min_{c \in \ell^2(\Theta)}$.
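The discrete setting just described is easy to reproduce. The following sketch (our own; the dimensions, random seed, and the particular nonzero coefficient indices are illustrative assumptions, only the values $128$, $16$, and $c_{4,4} = 48$ being suggested by the example below) builds the random operator $K$, a sparse set of DCT coefficients, Poisson-corrupted data $f$, and evaluates the objective of (6.3).

```python
import numpy as np
from scipy.fft import idctn

def make_operator(l, n, rng):
    """(K u)_{i,j} = sum_k phi_{i,k} u_{k,j} with phi uniform in [0,1]."""
    return rng.uniform(0.0, 1.0, size=(l, n))

def synthesize(phi, c):
    """u = B c via the orthonormal 2-D inverse DCT, then g = K u."""
    u = idctn(c, norm="ortho")
    return u, phi @ u

def objective(c, phi, f, alpha):
    """Objective of (6.3): KL(f, K B c) + alpha * ||c||_1."""
    _, Ku = synthesize(phi, c)
    safe_f = np.where(f > 0, f, 1.0)
    kl = np.where(f > 0, f * np.log(safe_f / Ku) - f + Ku, Ku).sum()
    return kl + alpha * np.abs(c).sum()

rng = np.random.default_rng(1)
n = m = 32
l = 8                                    # l << n, so K has a large nullspace
phi = make_operator(l, n, rng)
c_true = np.zeros((n, m))
c_true[0, 0], c_true[1, 1], c_true[4, 4] = 128.0, 16.0, 48.0   # sparse cosine coefficients
u_true, g = synthesize(phi, c_true)
f = rng.poisson(g).astype(float)         # Poisson-corrupted data
print("objective at the true coefficients:", objective(c_true, phi, f, alpha=0.05))
```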

Notice that the measure $\mu$ in (6.3) is a point measure, so the integral is indeed a sum over all discrete samples; we write it as an integral in order to keep the notation introduced in the previous sections. To perform a numerical computation, we use a forward-backward splitting algorithm based on a gradient descent of the optimality condition.

FIG. 6.6. In (a) we can see $\tilde u$ as defined via $\tilde u = B\tilde c$. A top view of $\tilde u$ can be seen in (b). In (c) and (d) the plots of $g = K\tilde u$ and its noisy version $f$ are displayed, while (e) and (f) show top-view illustrations of (c) and (d).

For an initial set of coefficients $c^0_{i,j}$, $(i,j) \in \Theta$, we compute the iterates
$c_{i,j}^{k+\frac12} = c_{i,j}^{k} - \tau \left[ B^T K^T \left( 1 - \frac{f}{KBc^k} \right) \right]_{i,j}$,
$c_{i,j}^{k+1} = \mathrm{sign}\!\left( c_{i,j}^{k+\frac12} \right) \left( \left| c_{i,j}^{k+\frac12} \right| - \tau\alpha \right)_+$,
with $\tau > 0$ being a damping parameter and $(a)_+ := \max\{0, a\}$.
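A compact implementation of this forward-backward splitting is sketched below (our own, under the same toy assumptions as before; the step size, iteration count, initial guess, and the small positivity floor are our choices, the floor being added precisely because, as noted next, the scheme does not guarantee positivity). Here $B$ and $B^T$ are realized by the orthonormal 2-D discrete cosine transform.

```python
import numpy as np
from scipy.fft import dctn, idctn

def fbs_sparse_poisson(phi, f, alpha, tau=0.5, iters=2000):
    """Forward-backward splitting for (6.3): a gradient step on the Kullback-Leibler
    fidelity followed by soft-thresholding of the DCT coefficients c."""
    l, n = phi.shape
    m = f.shape[1]
    c = dctn(np.ones((n, m)), norm="ortho")       # start from a constant, positive image
    for _ in range(iters):
        Ku = phi @ idctn(c, norm="ortho")         # K B c
        Ku = np.maximum(Ku, 1e-10)                # guard: positivity of u is not guaranteed
        grad = dctn(phi.T @ (1.0 - f / Ku), norm="ortho")   # B^T K^T (1 - f / (K B c))
        c_half = c - tau * grad
        c = np.sign(c_half) * np.maximum(np.abs(c_half) - tau * alpha, 0.0)  # soft threshold
    return c

# toy data matching the sketch after (6.3)
rng = np.random.default_rng(1)
n = m = 32
l = 8
phi = rng.uniform(0.0, 1.0, size=(l, n))
c_true = np.zeros((n, m))
c_true[0, 0], c_true[1, 1], c_true[4, 4] = 128.0, 16.0, 48.0
f = rng.poisson(phi @ idctn(c_true, norm="ortho")).astype(float)
c_hat = fbs_sparse_poisson(phi, f, alpha=0.05)
print("largest recovered coefficients:", np.sort(np.abs(c_hat).ravel())[-5:])
```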

The disadvantage of this approach is that positivity of $u$ is not guaranteed. However, it is easy to implement and produces satisfactory results, as we will see in the following. As a computational example, we choose $u$ to be sparse with respect to the cosine basis. We therefore define $B : \ell^2(\Theta) \to \ell^2(\Omega)$ via the two-dimensional cosine transform
(6.4) $c_{a,b} = \gamma(a)\gamma(b) \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} u_{i,j} \cos\!\left( \frac{\pi (2i+1) a}{2n} \right) \cos\!\left( \frac{\pi (2j+1) b}{2m} \right)$
for $a \in \{0, \dots, n-1\}$, $b \in \{0, \dots, m-1\}$, and
$\gamma(x) = \begin{cases} \sqrt{1/n} & \text{for } x = 0, \\ \sqrt{2/n} & \text{for } x \ne 0. \end{cases}$
Remember that since the cosine transform defined as in (6.4) is orthonormal, we have $B^{-1} = B^T$. We set $\tilde u = B\tilde c$ with $\tilde c$ being zero except for three coefficients, chosen as $4\sqrt{nm}$, $\frac{1}{2}\sqrt{nm}$, and $\tilde c_{4,4} = \frac{3}{2}\sqrt{nm}$. With this choice of $\tilde c$ we guarantee $\tilde u > 0$. Furthermore, we obtain $g = K\tilde u$ with $g > 0$. Finally, we generate a noisy version $f$ of $g$ by replacing every sample $g_{i,j}$ with a Poisson random number $f_{i,j}$ with expected value $g_{i,j}$. As an example we chose $n = m = 32$.

FIG. 6.7. The computed subgradient $\xi$ (a) and the function $q$ (b), which are related to each other by the source condition (SC$_{\ell^1}$). The computed $q$ has minimal norm among all $q$ satisfying (SC$_{\ell^1}$); $\|q\|_{\ell^\infty} \approx 8.9 \cdot 10^{-3}$.

Hence we obtain the nonzero coefficients $128$, $16$, and $\tilde c_{4,4} = 48$, while the other coefficients remain zero. Furthermore, we obtain $\tilde u$, $g$, and $f$ as described in Figure 6.6. The operator dimension is chosen to be $l = 8$. The damping parameter is set to the constant value $\tau = 0.5$. The source condition for this discrete data example becomes
(SC$_{\ell^1}$) $\exists\, \xi \in \partial\|\tilde c\|_{\ell^1(\Omega)},\ \exists\, q \in \ell^\infty(\Sigma) : \quad \xi = B^T K^T q$.
It can be shown that the subgradient of $\|c\|_{\ell^1(\Omega)}$ is simply
(6.5) $\partial |c_{i,j}| = \mathrm{sign}(c_{i,j}) = \begin{cases} 1 & \text{for } c_{i,j} > 0, \\ [-1, 1] & \text{for } c_{i,j} = 0, \\ -1 & \text{for } c_{i,j} < 0, \end{cases}$

for all $(i,j) \in \Omega$. Hence, to validate the error estimates and their sharpness, we have computed $\xi$ and $q$ with minimal norm, in order to satisfy (SC$_{\ell^1}$) and (6.5). The computational results can be seen in Figure 6.7, where $\|q\|_{\ell^\infty} \approx 8.9 \cdot 10^{-3}$ holds. With $\xi$ computed, the symmetric Bregman distance can easily be calculated via
$D_{\ell^1}^{symm}(\hat c, \tilde c) = \langle \hat p - \xi, \hat c - \tilde c \rangle_{\ell^2(\Omega)}$ for $\hat p \in \partial\|\hat c\|_{\ell^1}$.
We did several computations with different values for $\alpha$ and the constant $c \in\, ]0,1[$ (not to be confused with the cosine transform coefficients) to support the error estimate (4.3). The results can be seen in Table 6.1. We note that Theorem 4.3 can only be applied if $\int_\Sigma f \, d\mu(y) = 1$, which is obviously not the case for our numerical example. But, due to the proof of Theorem 4.3, the only modification that has to be made is to multiply the logarithmic term in (4.3) by $\int_\Sigma f \, d\mu(y)$ to obtain a valid estimate.

7. Outlook and open questions. We have seen that, under rather natural source conditions, error estimates in Bregman distances can be extended from the well-known quadratic fitting (Gaussian noise) case to general convex fidelities. We have also seen that the appropriate definition of the noise level in the convex case is not directly related to the norm difference, but rather to the data fidelity. With this definition the estimates indeed yield the same asymptotic order with respect to regularization parameter and noise level as in the quadratic case. The constants are again related to the smoothness of the solution (the norm of the source element); with the technique used in the general case one obtains slightly larger constants than in the original estimates. The latter is caused by the fact that the general approach to error estimation cannot exploit the linearity present in the case of quadratic fidelity.

Error estimation is also important for approaches other than variational ones, in particular for iterative or flow methods such as scale space methods, inverse scale space methods, or Bregman iterations. The derivation of such estimates will need a further understanding of dual iterations or flows, which are simple gradient flows in the case of quadratic fidelity but have a much more complicated structure in general.

Another interesting question for future research, which is also of practical importance, is an understanding of error estimation in a stochastic framework. It will be an interesting task to further quantify uncertainty or to provide a comprehensive stochastic framework. In the case of Gaussian noise such steps have been made, e.g., by lifting pointwise estimates to estimates for the distributions in different metrics (cf. [4, 5, 6]) or by direct statistical approaches; cf. [3]. In the case of Poisson noise a direct estimation of mean deviations has been developed in parallel in [8], which uses similar techniques as our estimation and also includes a novel statistical characterization of the noise level in terms of measurement times. We think that our approach of estimating pointwise errors via fidelities, i.e., data likelihoods (more precisely, negative log-likelihoods), has an enormous potential to generalize to a statistical setting. In particular, we expect the derivation of confidence regions by further studies of distributions of the log-likelihood for different noise models.

Acknowledgments. This work has been supported by the German Research Foundation (DFG) through the project "Regularization with singular energies" and SFB 656: Molecular Cardiovascular Imaging.
The second author acknowledges further support by the German Federal Ministry of Education and Research (BMBF) through the project INVERS: Deconvolution with sparsity constraints in nanoscopy. The authors thank Dirk Lorenz (University of Bremen), Nico Bissantz (University of Bochum), Thorsten Hohage (University of Göttingen), and Christoph Brune (University of Münster) for useful and stimulating discussions. Furthermore, the authors thank the referees for valuable suggestions.

TABLE 6.1. Computational results for various combinations of $c \in\, ]0,1[$ and $\alpha$. The first two columns contain the values of $c$ and $\alpha$. The remaining columns denote: (a) $(1-c)H_f(KB\hat c)$, (b) $\alpha D_{\ell^1}^{symm}(\hat c, \tilde c)$, (c) $(1+c)\delta$, (d) $-c \big( \int_\Sigma f \, d\mu \big) \ln\!\big( 1 - \frac{\alpha}{c}\|q\|_{\ell^\infty} \big)$, (e) the resulting right-hand side $(1+c)\delta - (1-c)H_f(KB\hat c) - c \big( \int_\Sigma f \, d\mu \big) \ln\!\big( 1 - \frac{\alpha}{c}\|q\|_{\ell^\infty} \big)$. It can be seen that column (e) is always larger than column (b) and hence the error estimate (4.3) is always fulfilled.

REFERENCES

[1] G. Aubert and J.-F. Aujol, A variational approach to remove multiplicative noise, SIAM J. Appl. Math., 68 (2008), pp. 925-946.
[2] J. Bardsley and A. Luttman, Total variation-penalized Poisson likelihood estimation for ill-posed problems,